fix: channel meta `mergeFlushSegment` is not idempotent, which may cause
data loss when updating the buffer of a compacted segment, because the
buffer may be applied to a segment that has already been covered.
relate: https://github.com/milvus-io/milvus/issues/31548
Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
Cherry-pick from master
pr: #31616
See also #28491, #31240
When the number of collections is large, querycoord saves collection
targets one by one, which is slow and may block querycoord from exiting.
In a local run, a scenario with 500 collections may take about 40
seconds to save the collection targets.
This PR changes the `SaveCollectionTarget` interface into a batch one
and groups the collections into batches of 16 to accelerate this
procedure.
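
The batching idea can be illustrated with a minimal Go sketch. It is not
the querycoord code: `chunkIDs` and `saveBatch` are hypothetical names,
and only the batch size of 16 comes from the description above.

```go
package main

import "fmt"

// batchSize mirrors the "batches of 16" figure from the description.
const batchSize = 16

// chunkIDs splits ids into consecutive slices of at most size elements.
func chunkIDs(ids []int64, size int) [][]int64 {
	if size <= 0 {
		return nil
	}
	batches := make([][]int64, 0, (len(ids)+size-1)/size)
	for start := 0; start < len(ids); start += size {
		end := start + size
		if end > len(ids) {
			end = len(ids)
		}
		batches = append(batches, ids[start:end])
	}
	return batches
}

// saveBatch is a hypothetical stand-in for one batched meta-store write.
func saveBatch(ids []int64) error {
	fmt.Printf("saving %d collection targets in one call\n", len(ids))
	return nil
}

func main() {
	ids := make([]int64, 500)
	for i := range ids {
		ids[i] = int64(i)
	}
	// One write per batch instead of one write per collection.
	for _, batch := range chunkIDs(ids, batchSize) {
		if err := saveBatch(batch); err != nil {
			fmt.Println("save failed:", err)
			return
		}
	}
}
```

With 500 collections this issues 32 batched writes instead of 500
individual ones.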
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Cherry-pick from master
pr: #31473
See also #31470, #31506
This PR adds nodeID assignment verification before updating channel
checkpoints.
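
A rough sketch of what such a verification could look like, assuming an
in-memory model in which each channel has exactly one assigned node. The
type `checkpointStore` and its fields are hypothetical and are not taken
from the Milvus code base.

```go
package main

import (
	"errors"
	"fmt"
)

var errNodeMismatch = errors.New("channel is not assigned to the reporting node")

// checkpointStore is a hypothetical model: it tracks which node each
// channel is assigned to and the latest checkpoint per channel.
type checkpointStore struct {
	assignment  map[string]int64  // channel name -> assigned node ID
	checkpoints map[string]uint64 // channel name -> checkpoint timestamp
}

// updateCheckpoint applies the update only if nodeID currently owns the channel.
func (s *checkpointStore) updateCheckpoint(nodeID int64, channel string, ts uint64) error {
	owner, ok := s.assignment[channel]
	if !ok || owner != nodeID {
		return fmt.Errorf("%w: channel=%s node=%d", errNodeMismatch, channel, nodeID)
	}
	s.checkpoints[channel] = ts
	return nil
}

func main() {
	store := &checkpointStore{
		assignment:  map[string]int64{"dml-channel-0": 1},
		checkpoints: map[string]uint64{},
	}
	fmt.Println(store.updateCheckpoint(1, "dml-channel-0", 100)) // accepted: node 1 owns the channel
	fmt.Println(store.updateCheckpoint(2, "dml-channel-0", 200)) // rejected: node 2 does not
}
```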
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
/kind improvement
pr: #31271
fix: https://github.com/milvus-io/milvus/issues/31272
This PR adds more metrics:
Slow query count, where the duration above which a query counts as slow
is configurable;
Number of deleted entities;
Number of entities per collection;
Number of loaded entities per collection;
Number of indexed entities;
Number of indexed entities, broken down by collection, by index, and by
whether the index is a vector index;
Quota states (LongTimeTickDelay, MemoryExhausted, DiskQuotaExhausted)
per database;
---------
Signed-off-by: longjiquan <jiquan.long@zilliz.com>
issue: #28491
pr: #31240
After querycoord restarts, it pulls a new target, which includes the
channel and segment lists. Once the segments loaded on querynodes have
reached the target, the collection can serve search/query. But if the
segment list has changed in the meantime, then after querycoord pulls a
new target it takes a few minutes to catch up with the target's segment
distribution, and until then query/search fails due to missing segments.
This PR saves the current loaded target to the meta store during
querycoord's stop process and recovers it when querycoord starts, to
speed up target recovery.
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #30816
pr: #31202
Balance channel blocks until the leader view catches up with the current
target, and only then starts to unsubscribe the old delegator. This
makes sure that the new delegator can serve search before the old
delegator is released. But another piece of logic in segment_checker
skips loading segments during balance channel. So if a query node
crashes during balance channel, the new delegator can never catch up
with the target, and balance channel is stuck forever.
This PR removes the rule that skips loading segments during balance
channel to avoid this logical deadlock.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #31222
pr: #31256
grpcclient's `call` func returns an unrecoverable error, and the
caller's retry policy then also breaks because of this unrecoverable
error.
This PR introduces `retry.Handle`. The new func takes a `func() (bool,
error)` as its input parameter, which returns `shouldRetry` directly, so
that grpcclient does not have to return an unrecoverable error.
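
To illustrate the shape of such a helper, here is a minimal sketch. It
is not the actual `retry.Handle` implementation: the function `handle`,
its `attempts` and `sleep` parameters, and the `demo` helper are
assumptions made for this example. Only the `func() (bool, error)`
callback signature comes from the description above.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// handle calls fn until it succeeds, reports shouldRetry == false,
// runs out of attempts, or ctx is done.
func handle(ctx context.Context, attempts int, sleep time.Duration, fn func() (bool, error)) error {
	if attempts < 1 {
		attempts = 1
	}
	var lastErr error
	for i := 0; i < attempts; i++ {
		shouldRetry, err := fn()
		if err == nil {
			return nil
		}
		lastErr = err
		if !shouldRetry {
			// The callee decided that retrying cannot help.
			return err
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(sleep):
		}
	}
	return lastErr
}

// demo simulates a call that fails `failures` times before succeeding and
// returns how many times it was invoked together with the final error.
func demo(failures int, retryable bool) (int, error) {
	calls := 0
	err := handle(context.Background(), 5, time.Millisecond, func() (bool, error) {
		calls++
		if calls <= failures {
			return retryable, errors.New("simulated RPC failure")
		}
		return false, nil
	})
	return calls, err
}

func main() {
	calls, err := demo(2, true)
	fmt.Println("retryable failures:", calls, err)
	calls, err = demo(2, false)
	fmt.Println("non-retryable failure:", calls, err)
}
```

Because the callback reports `shouldRetry` itself, the callee no longer
needs to encode "do not retry" in a special error value.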
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #30531
pr: #31277
Failing to get a client from `shardClientMgr` doesn't mean the query
node is unavailable, because the ref counter policy in `shardClientMgr`
cleans up the client if no collection uses the query node as shard
leader.
This PR fixes the bug where a node was set unreachable when getting the
shard client failed.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #31351
pr: #31380
This PR fixes a bug where search doesn't expire the shard leader cache
when sending a request to a query node fails, which makes every request
keep trying to connect to an offline query node.
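
The intended behavior can be sketched as follows. This is a simplified
model, not the proxy code: `leaderCache`, `search`, and `demo` are
hypothetical names, and a single leader per collection is assumed.

```go
package main

import (
	"errors"
	"fmt"
)

// leaderCache is a hypothetical cache of shard leaders per collection.
type leaderCache struct {
	leaders map[string]int64 // collection -> cached shard leader node ID
	lookups int              // number of times the leader had to be resolved
}

// get returns the cached leader, resolving and caching it on a miss.
func (c *leaderCache) get(collection string, resolve func(string) int64) int64 {
	if id, ok := c.leaders[collection]; ok {
		return id
	}
	c.lookups++
	id := resolve(collection)
	c.leaders[collection] = id
	return id
}

// expire drops the cached leader so the next request resolves it again.
func (c *leaderCache) expire(collection string) {
	delete(c.leaders, collection)
}

// search sends a request to the cached leader and expires the cache
// entry if the send fails.
func search(c *leaderCache, collection string, resolve func(string) int64, send func(int64) error) error {
	node := c.get(collection, resolve)
	if err := send(node); err != nil {
		c.expire(collection)
		return fmt.Errorf("send to node %d failed: %w", node, err)
	}
	return nil
}

// demo runs two searches: node 1 is offline, and a new leader (node 2)
// is reported before the second search.
func demo() (lookups int, first, second error) {
	cache := &leaderCache{leaders: map[string]int64{}}
	current := int64(1)
	resolve := func(string) int64 { return current }
	send := func(node int64) error {
		if node == 1 {
			return errors.New("node 1 is offline")
		}
		return nil
	}
	first = search(cache, "c1", resolve, send)
	current = 2
	second = search(cache, "c1", resolve, send)
	return cache.lookups, first, second
}

func main() {
	lookups, first, second := demo()
	fmt.Println(lookups, first, second)
}
```

Without the `expire` call, the second search would reuse the cached
offline node and fail again.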
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #31162
pr: #31379
When the scope CurrentTargetFirst/NextTargetFirst is given, both the
current and the next target are expected to be scanned.
This PR fixes the wrong behavior of CurrentTargetFirst/NextTargetFirst
in the target manager, which may cause unexpected tasks to be generated,
and load collection may get stuck forever due to a dirty leader view.
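
One way to read the expected semantics is a lookup that tries the
preferred target first and falls back to the other one. The sketch below
models that reading only. The `targets` type, the plain `CurrentTarget`
and `NextTarget` scopes, and the string payload are assumptions for
illustration, not the target manager's real data structures.

```go
package main

import "fmt"

type scope int

const (
	CurrentTarget scope = iota
	NextTarget
	CurrentTargetFirst
	NextTargetFirst
)

// targets is a hypothetical pair of segment maps (segment ID -> description).
type targets struct {
	current map[int64]string
	next    map[int64]string
}

// candidates returns the targets to scan, in order, for the given scope.
func (t targets) candidates(s scope) []map[int64]string {
	switch s {
	case CurrentTarget:
		return []map[int64]string{t.current}
	case NextTarget:
		return []map[int64]string{t.next}
	case CurrentTargetFirst:
		return []map[int64]string{t.current, t.next}
	case NextTargetFirst:
		return []map[int64]string{t.next, t.current}
	}
	return nil
}

// getSegment scans the candidate targets in order and returns the first hit.
func (t targets) getSegment(id int64, s scope) (string, bool) {
	for _, m := range t.candidates(s) {
		if v, ok := m[id]; ok {
			return v, true
		}
	}
	return "", false
}

func main() {
	tg := targets{
		current: map[int64]string{1: "current"},
		next:    map[int64]string{1: "next", 2: "next"},
	}
	fmt.Println(tg.getSegment(1, CurrentTargetFirst)) // found in the current target
	fmt.Println(tg.getSegment(2, CurrentTargetFirst)) // falls back to the next target
	fmt.Println(tg.getSegment(2, CurrentTarget))      // not found: only the current target is scanned
}
```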
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
Cherry-pick from master
pr: #31388
See also #30806
`formatKey` may cost a lot of CPU on string processing under high QPS.
This PR adds a formattedKeys cache to avoid string operations on each
param value lookup.
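
A minimal sketch of this kind of cache, assuming a `sync.Map` keyed by
the raw key. The normalization in this `formatKey` (lowercasing and
stripping `/`, `_`, and `.`) is an assumption for the example and may
differ from the real function. The `keyCache` type is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// keyReplacer strips separator characters; built once and reused.
var keyReplacer = strings.NewReplacer("/", "", "_", "", ".", "")

// formatKey is a stand-in normalizer: it lowercases the key and removes separators.
func formatKey(key string) string {
	return keyReplacer.Replace(strings.ToLower(key))
}

// keyCache memoizes formatted keys so the string work runs once per distinct key.
type keyCache struct {
	formatted sync.Map // raw key (string) -> formatted key (string)
}

// get returns the formatted form of key, computing it only on a cache miss.
func (c *keyCache) get(key string) string {
	if v, ok := c.formatted.Load(key); ok {
		return v.(string)
	}
	f := formatKey(key)
	c.formatted.Store(key, f)
	return f
}

func main() {
	cache := &keyCache{}
	fmt.Println(cache.get("QueryNode.Cache_Size")) // computed and cached
	fmt.Println(cache.get("QueryNode.Cache_Size")) // served from the cache
}
```

`sync.Map` suits this access pattern because each key is written once
and then read many times from concurrent goroutines.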
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>