issue: # #34545
Print warn log instead of check health fail if orphan channel cp meta is
found in health check request.
Signed-off-by: jaime <yun.zhang@zilliz.com>
See also #34483
Some lint issues are introduced due to lack of static check run. This PR
fixes these problems.
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #34304
cosine is more widely used in float vectors, and cosine and hamming
distance are 'metrics' which have good geometric properties
Signed-off-by: chasingegg <chao.gao@zilliz.com>
The import is dependent on syncTask, which in turn relies on the
allocator. This PR pre-allocate the necessary IDs for import syncTask.
issue: https://github.com/milvus-io/milvus/issues/33957
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
Log will be confusing when `Reassign` channel operation failed for both
success & failure log will be printed in row. This PR continue the loop
to avoid this output.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
When import failed, mark the import segment as dropped instead of drop
it directly to prevent generating orphaned files.
issue: https://github.com/milvus-io/milvus/issues/34068
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
issue: #33540
1. gorwing L0 segments is invisible to datacoord.
2. flushed L0 segments need to clean by datacoord.
Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
Due to the removal of injection and syncSegments from the compaction, we
need to ensure that no compaction is successfully executed during the
rolling upgrade. This PR renames Compaction to CompactionV2, with the
following effects:
- New datacoord + old datanode: Utilizes the CompactionV2 interface,
resulting in the datanode error "CompactionV2 not implemented," causing
compaction to fail;
- Old datacoord + new datanode: Utilizes the CompactionV1 interface,
resulting in the datanode error "CompactionV1 not implemented," causing
compaction to fail.
issue: https://github.com/milvus-io/milvus/issues/32809
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
1. Fill log ID of stats log from import
2. Add a check to validate the log ID before writing to meta
issue: https://github.com/milvus-io/milvus/issues/33476
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
Related to #25309
- Remove ctx from struct
- Add ctx parameters for internal check logic methods
- Add Waitgroup to make sure worker goroutine quit before close returns
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Query slot of compaction in datanode, and transfer the control logic for
limiting compaction tasks from datacoord to the datanode.
issue: https://github.com/milvus-io/milvus/issues/32809
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
See also #33062
This PR:
- Add `lock.RWMutex` & `lock.Mutex` alias to switch implementation based
on build flags
- When build flags has `test` in it, use `go-deadlock` to detect
possible deadlocks
- Replace all `sync.RWMutex` & `sync.Mutex` in datacoord pkg
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #33005
1. add `MemorySize` field for insert binlog.
2. `LogSize` means the file size in the storage object.
3. `MemorySize` means the size of the data in the memory.
---------
Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
Signed-off-by: cai.zhang <cai.zhang@zilliz.com>
See also #32165
This PR modify the `SelectSegments` interface to utilizing collection id
information when selecting segment with provided collection
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #19095,#29655,#31718
- Change `ListWithPrefix` to `WalkWithPrefix` of OOS into a pipeline
mode.
- File garbage collection is performed in other goroutine.
- Segment Index Recycle clean index file too.
---------
Signed-off-by: chyezh <chyezh@outlook.com>
Related to https://github.com/milvus-io/milvus/issues/32165
1. nodeid based channel store access should use map access instead of
iteration.
2. The join-ish functions calls are slow when # collections/segments
increases (e.g. 10k).
e.g.
getNumRowsOfCollectionUnsafe is O(num_segments); GetAllCollectionNumRows
is of O(num_collections*num_segments).
Signed-off-by: yiwangdr <yiwangdr@gmail.com>
See also #32165
`ChannelManager.Match` is a frequent operation for datacoord. When the
collection number is large, iteration over all channels will cost lots
of CPU time and time consuming.
This PR change the data structure storing datanode-channel info to map
avoiding this iteration when checking channel existence.
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #32029
lack of logic to clean collection info in datacoord's meta, This PR
clean collection info after drop channel, to avoid collection info leak
in datacoord
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
Tidy the following test codes
- Remove channel in newTestServer
- Remove newTestServerWithMeta
- Remove newTestServer2
- Remove testDataCoordBase
- Use the same func for handleTTmsg and handleRPCTTmsg
See also: #31620
---------
Signed-off-by: yangxuan <xuan.yang@zilliz.com>
issue: #29892
This PR:
1. Move the process of gathering materialized search info to when the
search plan is created, before it goes to each segment, to avoid
repeated work and access the plan node under multi-threaded
circumstances.
2. Enforce the supported MV type to `VARCHAR`
3. Add integration test
Signed-off-by: Patrick Weizhi Xu <weizhi.xu@zilliz.com>
See also #32066
This PR make coordinator register successful and let
`ProcessActiveStandBy` run async. And roles may receive stop signal and
notify servers.
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Related to #31791
This segment meta is implemented in COW pattern. All modification on
segment info shall happen on the copied version of it.
This PR clones the child segment info for `GetSegmentInfo` in case data
race problem.
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Feature Introduced:
1. Ensure ImportV2 waits for the index to be built
Enhancements Introduced:
1. Utilization of local time for timeout ts instead of allocating ts
from rootcoord.
3. Enhanced input file length check for binlog import.
4. Removal of duplicated manager in datanode.
5. Renaming of executor to scheduler in datanode.
6. Utilization of a thread pool in the scheduler in datanode.
issue: https://github.com/milvus-io/milvus/issues/28521
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
issue: #31662#31409
during FilterIndexedSegment in GetRecoveryInfo, it try to acquire index
meta's read lock for every segment. when a collection has thousands of
segments, which may blocked for more than 10 seconds and even longer.
cause `AddSegmentIndex` may also triggered frequently, which try to get
the write lock.
This PR avoid acquire index meta's lock for each segment
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #29892
This PR
1. Pass Materialized View (MV) search information obtained from the
expression parsing planning procedure to Knowhere. It only performs when
MV is enabled and the partition key is involved in the expression. The
search information includes:
1. Touched field_id and the count of related categories in the
expression. E.g., `color == red && color == blue` yields `field_id ->
2`.
2. Whether the expression only includes AND (&&) logical operator,
default `true`.
3. Whether the expression has NOT (!) operator, default `false`.
4. Store if turning on MV on the proxy to eliminate reading from
paramtable for every search request.
5. Renames to MV.
## Rebuttals
1. Did not write in `ExtractInfoPlanNodeVisitor` since the new scalar
framework was introduced and this part might be removed in the future.
2. Currently only interested in `==` and `in` expression, `string` data
type, anything else is a bonus.
3. Leave handling expressions like `F == A || F == A` for future works
of the optimizer.
## Detailed MV Info

Signed-off-by: Patrick Weizhi Xu <weizhi.xu@zilliz.com>
"-1" as `InvalidPartitionID` previously used as All partition place
holder in delete cases. It's confusing and hard to maintain when a const
var has more than one meaning.
This PR add `AllPartitionsID` to replace these usages in delete
scenarios.
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
/kind improvement
fix: #31272
This pr add more metrics, which are:
- Slow query count, which the duration considered as slow can be
configurable;
- Number of deleted entities;
- Number of entities imported;
- Number of entities per collection;
- Number of loaded entities per collection;
- Number of indexed entities;
- Number of indexed entities, per collection, per index and whether it's
a vetor index;
- Quota states (LongTimeTickDelay, MemoryExhuasted, DiskQuotaExhuasted)
per database;
---------
Signed-off-by: longjiquan <jiquan.long@zilliz.com>
See also #31362
This PR make datacoord garbage collection scan operation using differet
interval than other opeartion.
This interval is a newly added param item, which default value is 7*24
hours.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
didn't mark the compact as failure if it's simply an rpc error when
GetCompactionPlansResults
see #31352
Signed-off-by: xiaofanluan <xiaofan.luan@zilliz.com>
With the presence of L0 segments, there's no longer a need to add import
segments to the datanode.
issue: https://github.com/milvus-io/milvus/issues/28521
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
add sparse float vector support to different milvus components,
including proxy, data node to receive and write sparse float vectors to
binlog, query node to handle search requests, index node to build index
for sparse float column, etc.
https://github.com/milvus-io/milvus/issues/29419
---------
Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
1. The Import APIs now provide detailed progress information for each
imported file, including details such as file name, file size, progress,
and more.
2. The APIs now return the collection name and the completion time.
3. Other modifications include changing jobID to jobId and other similar
adjustments.
issue: https://github.com/milvus-io/milvus/issues/28521
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
Replacing the current import API v1 implementation with the v2
implementation.
issue: https://github.com/milvus-io/milvus/issues/28521
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
This PR includes the following adjustments:
1. To prevent channelCP update task backlog, only one task with the same
vchannel is retained in the updater. Additionally, the lastUpdateTime is
refreshed after the flowgraph submits the update task, rather than in
the callBack function.
2. Batch updates of multiple vchannel checkpoints are performed in the
UpdateChannelCheckpoint RPC (default batch size is 128). Additionally,
the lock for channelCPs in DataCoord meta has been switched from key
lock to global lock.
3. The concurrency of UpdateChannelCheckpoint RPCs in the datanode has
been reduced from 1000 to 10.
issue: https://github.com/milvus-io/milvus/issues/30004
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
Co-authored-by: jaime <yun.zhang@zilliz.com>
Co-authored-by: congqixia <congqi.xia@zilliz.com>
See also #31103
This PR add `listIndexes` API for datacoor server to list all indexes
for provided collection.
Comparing to the existing `DescribeIndex` API, the new one does NOT
check the segment index building progress to ease the burden when
invoking it
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Trigger l0 compaction when l0 views don't change
So that leftover l0 segments would be compacted in the end.
1. Refresh LevelZero plans in comactionPlanHandler, remove the meta
dependency
of compaction trigger v2
2. Add ForceTrigger method for CompactionView interface
3. rename mu to taskGuard
4. Add a new TriggerTypeLevelZeroViewIDLE
5. Add an idleTicker for compaction view manager
See also: #30098, #30556
Signed-off-by: yangxuan <xuan.yang@zilliz.com>
---------
Signed-off-by: yangxuan <xuan.yang@zilliz.com>
This PR introduces novel managerial roles for importv2:
1. ImportMeta: To manage all the import tasks;
2. ImportScheduler: To process tasks and modify their states;
3. ImportChecker: To ascertain the completion of all tasks and instigate
relevant operations.
issue: https://github.com/milvus-io/milvus/issues/28521
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
See also #30538
Previously the `SelectSegments` changed to clone all return value
preventing possible update to returned info.
Since meta is implemented following COW rules, this shall not happen and
any update on segment shall have copy before it.
This PR:
- Remove clone for read-only Get segment info
- Add Segment Operator abstraction for changing segment
- Implemnt COW for updating MaxRowNum
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
If DC restarted, those unkonwn compaction tasks
will never get call back in DN, so that the segments in the compaction
task will be locked, unable to sync and compaction again, blocking cp
advance and compaction executing.
See also: #30137
---------
Signed-off-by: yangxuan <xuan.yang@zilliz.com>
1. add coordinator graceful stop timeout to 5s
2. change the order of datacoord component while stop
3. change querynode grace stop timeout to 900s, and we should
potentially change this to 600s when graceful stop is smooth
issue: #30310
also see pr: #30306
---------
Signed-off-by: chyezh <chyezh@outlook.com>
Compaction would copy logPaths from comapctFrom segA to compactTo segB,
and previous code would copy the logPath directly, causing there're
full-logPaths-of-segA in compactTo segB's meta. So, for the next
compaction of segB, if segA has been GCed, Download would report error
"The sperified key not found".
This PR makes sure compactTo segment's meta contains logID only. And
this PR also refines CompleteComapctionMutation, increasing some
readability and merge two methods into one.
See also: #30496
Signed-off-by: yangxuan <xuan.yang@zilliz.com>
patch search cache param from index configs when index meta could not
get the search cache size key
#issue: #30113
Signed-off-by: xianliang.li <xianliang.li@zilliz.com>
Refine compaction interfaces in datacoord, support compaction result
with more than one segment. Prepare for major compaction.
related: #30633
Signed-off-by: wayblink <anyang.wang@zilliz.com>
1. Increase maxCount of L0 compaction tasks to 30
This could reduce the l0 compaction task number by 30% for
high-frequently-generated-small l0 segments, with the maximum size 64MB
stay not changed. So that l0 segments would accumulate slower and
decrease the mem presure caused by L0 segment for QueryNode
2. Add force Trigger for later manual timely l0 compaction triggers.
See also: #30191, #30556
Signed-off-by: yangxuan <xuan.yang@zilliz.com>
This pr decoups importing segment from flush process by:
1. Exclude the importing segment from the flush policy, this approch
avoids notifying the datanode to flush the importing segment, which may
not exist.
2. When RootCoord call Flush, DataCoord directly set the importing
segment state to `Flushed`.
issue: https://github.com/milvus-io/milvus/issues/30359
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
syncMgr.Block() will lock the segment when executing compaction.
Previous implementation was unable to Unblock thoese segments when
compaction failed. If next compaction of the same segments arrives,
it'll stuck forever and block all later compation tasks.
This PR makes sure compaction executor would Unblock these segments
after a failure compaction.
Apart form that, this PR also refines some logs and clean some codes of
compaction, compactor:
1. Log segment count instead of segmentIDs to avoid logging too many
segments
2. Flush RPC returns L1 segments only, skip L0 and L2
3. CompactionType is checked in `Compaction`, no need to check again
inside compactor
4. Use ligter method to replace `getSegmentMeta`
5. Log information for L0 compaction when encounters an error
See also: #30213
---------
Signed-off-by: yangxuan <xuan.yang@zilliz.com>
See also #30167
After support open telemetry tracing, we want to have traceID as well,
this PR adds util functions to set traceID with span & propagate traceID
between different context.
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #25639
/kind improvement
When the number of vector columns increases, the number of rows per
segment will decrease. In order to reduce the impact on vector indexing
performance, it is necessary to increase the segment max limit.
If a collection has multiple vector fields with memory and disk indices
on different vector fields, the size limit after segment compaction is
the minimum of segment.maxSize and segment.diskSegmentMaxSize.
Signed-off-by: xige-16 <xi.ge@zilliz.com>
---------
Signed-off-by: xige-16 <xi.ge@zilliz.com>
After #28873, PartitionID and CollectionID should be filled in
CompactionSegmentBinlog so that DataNode can compose
the correct logPath. However There're some places left forgotten to fill
in the information, causing Datanode downloading `xxx/0/0/xxxx/xxxx`
binlogs during compaction
See also: #30213
Signed-off-by: yangxuan <xuan.yang@zilliz.com>
Resolves#30167
This PR add tracing for all compaction from the task start in datacoord
and execution procedures in datanode.
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
This PR discontinuing the subscription to the mq and, instead, employing
the channel checkpoint as the DML and starting position for the import
segments.
issue: https://github.com/milvus-io/milvus/issues/30106
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
See also: #28873
When datanode returns error or go offline during GetCompactionResult
call, the compress binlog logic will panic since it was using a nil
result
This PR move it after the CheckRPCCall error to prevent this case.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
don't store logPath in meta to reduce memory, when service get
segmentinfo, generate logpath from logid.
#28885
Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
If segment has more than 128 log fils, drop segment will exceed etcd txn
ops limit, which will failed the drop segment request
This PR drop segment meta info with prefix, to avoid drop segment meta
failed
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
See also #27675
For `GetRecoveryInfo` & `GetRecoveryInfoV2`, Level zero segment ids
shall be specified in vchan info so that querycoord could re-fetch
current segment info during watch procedure without having all segment
info
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
THe results don't meet our requirements, and the code hasn't been
maintained for a long time.
See also: #29447
Signed-off-by: yangxuan <xuan.yang@zilliz.com>