issue: #38237
pr: #38345
This PR uses the better compression level only for proto messages larger
than 1MB, and a lighter compression level for smaller proto messages,
which gives better latency in most cases.
This PR reduces the latency from 22.7s to 4.7s with 10000 collections,
each of which has 1000 segments.
before this PR:
BenchmarkTargetManager-8 1 22781536357 ns/op 566407275088 B/op 11188282 allocs/op
after this PR:
BenchmarkTargetManager-8 1 4729566944 ns/op 36713248864 B/op 10963615 allocs/op
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
Co-authored-by: wei liu <wei.liu@zilliz.com>
Cherry-pick from 2.4
pr: #38032
issue: #38031
Cause: `cli.SyncSegments` was called with a ctx that had already been
overridden and canceled, so the SyncSegments RPC always failed.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Co-authored-by: wei liu <wei.liu@zilliz.com>
Related to #37630
The data distribution became too large when the number of segments was
huge. This PR trims the index info struct and returns only the needed info.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Related to #37630
The TSafe manager is too complex for the current implementation, and each
delegator needs one goroutine waiting for tsafe update events.
Tsafe updating can be executed in the pipeline instead. This PR removes the
tsafe manager and simplifies the entire tsafe updating logic.
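A minimal sketch of pushing tsafe from the pipeline instead of running a watcher goroutine per delegator, assuming readers wait on a `sync.Cond` for a guarantee timestamp; `delegator`, `UpdateTSafe` and `WaitTSafe` are hypothetical names, not the Milvus API.

```go
package main

import (
	"fmt"
	"sync"
)

// delegator is a hypothetical simplification: the pipeline calls
// UpdateTSafe directly, so no per-delegator goroutine listens for events.
type delegator struct {
	mu    sync.Mutex
	cond  *sync.Cond
	tsafe uint64
}

func newDelegator() *delegator {
	d := &delegator{}
	d.cond = sync.NewCond(&d.mu)
	return d
}

// UpdateTSafe is invoked by the pipeline after it consumes a batch.
// Stale (smaller) timestamps are ignored so tsafe never moves backwards.
func (d *delegator) UpdateTSafe(ts uint64) {
	d.mu.Lock()
	if ts > d.tsafe {
		d.tsafe = ts
		d.cond.Broadcast() // wake readers waiting for a newer tsafe
	}
	d.mu.Unlock()
}

// WaitTSafe blocks until tsafe reaches ts (e.g. a search's guarantee ts).
func (d *delegator) WaitTSafe(ts uint64) uint64 {
	d.mu.Lock()
	defer d.mu.Unlock()
	for d.tsafe < ts {
		d.cond.Wait()
	}
	return d.tsafe
}

func main() {
	d := newDelegator()
	done := make(chan uint64)
	go func() { done <- d.WaitTSafe(100) }() // a waiting search request
	// Simulated pipeline advancing tsafe as it consumes messages.
	for _, ts := range []uint64{50, 100, 150} {
		d.UpdateTSafe(ts)
	}
	fmt.Println(<-done >= 100) // true
}
```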
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Related to #37630
Previously, the loaded collection metrics were calculated by scanning all
loaded segments in the segment manager, which was a slow and buggy
implementation.
This PR:
- Move collection num metrics to collection manager
- Remove deprecated loaded partition metrics update logic
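A minimal sketch of maintaining the collection count where collections are added and removed, instead of deriving it from a segment scan. To stay dependency-free, `gauge` is a hypothetical stand-in for a Prometheus gauge, and `collectionManager`, `Put` and `Remove` are illustrative names.

```go
package main

import (
	"fmt"
	"sync"
)

// gauge is a hypothetical stand-in for a Prometheus gauge.
type gauge struct{ v int }

func (g *gauge) Inc() { g.v++ }
func (g *gauge) Dec() { g.v-- }

// collectionManager owns the loaded-collection metric and updates it on
// every membership change, so no segment scan is needed.
type collectionManager struct {
	mu          sync.Mutex
	collections map[int64]struct{}
	loadedNum   *gauge
}

func newCollectionManager(g *gauge) *collectionManager {
	return &collectionManager{collections: make(map[int64]struct{}), loadedNum: g}
}

// Put counts a collection only the first time it is added.
func (m *collectionManager) Put(id int64) {
	m.mu.Lock()
	defer m.mu.Unlock()
	if _, ok := m.collections[id]; !ok {
		m.collections[id] = struct{}{}
		m.loadedNum.Inc()
	}
}

// Remove decrements only if the collection was actually loaded.
func (m *collectionManager) Remove(id int64) {
	m.mu.Lock()
	defer m.mu.Unlock()
	if _, ok := m.collections[id]; ok {
		delete(m.collections, id)
		m.loadedNum.Dec()
	}
}

func main() {
	g := &gauge{}
	m := newCollectionManager(g)
	m.Put(1)
	m.Put(2)
	m.Put(1) // duplicate put does not double count
	m.Remove(2)
	fmt.Println(g.v) // 1
}
```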
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
When a standby component transitions to active, its state changes to
Initialize. If the initialization takes too long (exceeding the liveness
probe's maximum retries), the standby pod is stopped and fails to start.
This PR removes the Initialize state during standby transitions in
rolling upgrades. The state now switches directly from standby to
healthy, preventing health check failures.
issue: https://github.com/milvus-io/milvus/issues/37630
pr: https://github.com/milvus-io/milvus/pull/38308
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
Cherry-pick from pr #38311
Related to #37630
Add a secondary index by vchannel to reduce the `GetBy` rlock holding
time when the number of segments is large.
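A minimal sketch of a vchannel secondary index next to a per-node primary index, assuming a single `sync.RWMutex` guards both; `segment`, `segmentDist`, `Update` and `GetByChannel` are hypothetical names. Lookups by channel read one bucket under the read lock, and the index upkeep is paid on the write path.

```go
package main

import (
	"fmt"
	"sync"
)

// segment is a trimmed-down, hypothetical view of a distributed segment.
type segment struct {
	ID       int64
	Node     int64
	VChannel string
}

// segmentDist keeps the primary index by node plus a secondary index by
// vchannel, so a channel lookup does not scan every node's segments.
type segmentDist struct {
	mu        sync.RWMutex
	byNode    map[int64][]segment
	byChannel map[string][]segment
}

func newSegmentDist() *segmentDist {
	return &segmentDist{
		byNode:    make(map[int64][]segment),
		byChannel: make(map[string][]segment),
	}
}

// Update replaces the segments reported by node and keeps both indexes in sync.
func (d *segmentDist) Update(node int64, segs ...segment) {
	d.mu.Lock()
	defer d.mu.Unlock()
	// Remove this node's previous entries from their channel buckets.
	for _, old := range d.byNode[node] {
		kept := make([]segment, 0, len(d.byChannel[old.VChannel]))
		for _, s := range d.byChannel[old.VChannel] {
			if s.Node != node {
				kept = append(kept, s)
			}
		}
		if len(kept) == 0 {
			delete(d.byChannel, old.VChannel)
		} else {
			d.byChannel[old.VChannel] = kept
		}
	}
	// Insert the new entries into both indexes.
	fresh := make([]segment, 0, len(segs))
	for _, s := range segs {
		s.Node = node
		fresh = append(fresh, s)
		d.byChannel[s.VChannel] = append(d.byChannel[s.VChannel], s)
	}
	d.byNode[node] = fresh
}

// GetByChannel holds the read lock only while copying one bucket.
func (d *segmentDist) GetByChannel(vchannel string) []segment {
	d.mu.RLock()
	defer d.mu.RUnlock()
	return append([]segment(nil), d.byChannel[vchannel]...)
}

func main() {
	d := newSegmentDist()
	d.Update(1, segment{ID: 1, VChannel: "ch-0"}, segment{ID: 2, VChannel: "ch-1"})
	d.Update(2, segment{ID: 3, VChannel: "ch-0"})
	fmt.Println(len(d.GetByChannel("ch-0"))) // 2
	d.Update(1)                              // node 1 now reports no segments
	fmt.Println(len(d.GetByChannel("ch-0"))) // 1
}
```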
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: https://github.com/milvus-io/milvus/issues/38142
The current balance channel policy only considers the current collection's
distribution, so if every collection has 1 channel and all channels have
been loaded on the same querynode, balance channel won't be triggered
after the querynode number increases.
This PR enables the score-based balance channel policy to:
1. distribute all channels evenly across multiple querynodes
2. distribute each collection's channels evenly across multiple
querynodes.
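A minimal sketch of how a score that mixes a global term and a per-collection term serves both goals. The actual Milvus scoring formula is not given here, so `collectionWeight`, `channel` and `assign` are hypothetical, and the sketch shows greedy assignment rather than the full balance loop.

```go
package main

import "fmt"

// channel is a hypothetical (collection, vchannel) pair.
type channel struct {
	Collection int64
	Name       string
}

// collectionWeight is a hypothetical knob: holding another channel of the
// same collection costs more than holding a channel of any collection.
const collectionWeight = 2

// assign places channels one at a time on the lowest-scoring node, where
// score = channels on the node (all collections)
//       + collectionWeight * channels of this collection on the node.
// Ties go to the node listed first.
func assign(nodes []int64, channels []channel) map[string]int64 {
	plan := make(map[string]int64)
	if len(nodes) == 0 {
		return plan
	}
	global := make(map[int64]int)            // node -> channel count
	perColl := make(map[int64]map[int64]int) // collection -> node -> channel count
	for _, ch := range channels {
		if perColl[ch.Collection] == nil {
			perColl[ch.Collection] = make(map[int64]int)
		}
		best, bestScore := 0, 0
		for i, n := range nodes {
			score := global[n] + collectionWeight*perColl[ch.Collection][n]
			if i == 0 || score < bestScore {
				best, bestScore = i, score
			}
		}
		node := nodes[best]
		plan[ch.Name] = node
		global[node]++
		perColl[ch.Collection][node]++
	}
	return plan
}

func main() {
	// Four single-channel collections: a per-collection view sees nothing
	// to balance, but the global term spreads them 2/2.
	chs := []channel{{1, "c1_dml_0"}, {2, "c2_dml_0"}, {3, "c3_dml_0"}, {4, "c4_dml_0"}}
	count := make(map[int64]int)
	for _, node := range assign([]int64{1, 2}, chs) {
		count[node]++
	}
	fmt.Println(count[1], count[2]) // 2 2
}
```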
pr: https://github.com/milvus-io/milvus/pull/38143
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
Co-authored-by: Wei Liu <wei.liu@zilliz.com>
issue: https://github.com/milvus-io/milvus/issues/38237
QueryCoord saves each collection's target during the stop process, which
is used by the new QueryCoord for fast recovery. But if the Milvus cluster
has thousands of collections, this makes QueryCoord's stop process much
slower than expected.
This PR refines the implementation to save a collection's target to etcd
whenever the target is updated, and to clean it up when the collection is
released.
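A minimal sketch of persisting a target on update and removing it on release, so shutdown no longer has to write every collection's target. `kv` is a hypothetical subset of an etcd-like store (backed here by an in-memory map), the key layout is made up, and the target is a plain string for simplicity.

```go
package main

import (
	"fmt"
	"sync"
)

// kv is a hypothetical subset of a metadata store such as etcd.
type kv interface {
	Save(key, value string) error
	Remove(key string) error
}

// memKV is an in-memory kv used only for this example.
type memKV struct {
	mu   sync.Mutex
	data map[string]string
}

func (m *memKV) Save(key, value string) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.data[key] = value
	return nil
}

func (m *memKV) Remove(key string) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	delete(m.data, key)
	return nil
}

// targetManager writes each target when it changes instead of at stop time.
type targetManager struct {
	store kv
}

// targetKey uses a hypothetical key layout.
func targetKey(collectionID int64) string {
	return fmt.Sprintf("querycoord/target/%d", collectionID)
}

// UpdateTarget persists the serialized target right after it is replaced.
func (t *targetManager) UpdateTarget(collectionID int64, target string) error {
	return t.store.Save(targetKey(collectionID), target)
}

// ReleaseCollection cleans up the persisted target.
func (t *targetManager) ReleaseCollection(collectionID int64) error {
	return t.store.Remove(targetKey(collectionID))
}

func main() {
	store := &memKV{data: make(map[string]string)}
	tm := &targetManager{store: store}
	_ = tm.UpdateTarget(1, "target-v1")
	_ = tm.UpdateTarget(2, "target-v1")
	_ = tm.UpdateTarget(1, "target-v2") // overwrites the previous target
	_ = tm.ReleaseCollection(2)
	fmt.Println(store.data) // map[querycoord/target/1:target-v2]
}
```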
pr: https://github.com/milvus-io/milvus/pull/38238
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
Co-authored-by: Wei Liu <wei.liu@zilliz.com>
issue: #33285
pr: #37815
- remove the RPC layer of the coordinator when running in standalone or
mixcoord mode
- move health check into init
---------
Signed-off-by: chyezh <chyezh@outlook.com>
Related to #37630
This PR adds a new util coll2Replicas secondary index to reduce map
access and iteration when getting replicas by collection.
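A minimal sketch of a collection-to-replicas index kept consistent on every put and remove. `replica`, `replicaManager` and its methods are hypothetical names, and locking is omitted on the assumption that callers already hold the manager's lock.

```go
package main

import "fmt"

// replica is a hypothetical, minimal replica record.
type replica struct {
	ID           int64
	CollectionID int64
}

// replicaManager keeps replicas by ID plus a coll2Replicas secondary index,
// so a lookup by collection reads one small map instead of iterating over
// every replica. Locking is omitted; callers are assumed to hold a lock.
type replicaManager struct {
	replicas      map[int64]*replica
	coll2Replicas map[int64]map[int64]*replica // collectionID -> replicaID -> replica
}

func newReplicaManager() *replicaManager {
	return &replicaManager{
		replicas:      make(map[int64]*replica),
		coll2Replicas: make(map[int64]map[int64]*replica),
	}
}

// Put adds or replaces a replica and keeps the secondary index consistent.
func (m *replicaManager) Put(r *replica) {
	if old, ok := m.replicas[r.ID]; ok {
		m.unindex(old)
	}
	m.replicas[r.ID] = r
	if m.coll2Replicas[r.CollectionID] == nil {
		m.coll2Replicas[r.CollectionID] = make(map[int64]*replica)
	}
	m.coll2Replicas[r.CollectionID][r.ID] = r
}

// Remove deletes a replica from both the primary map and the index.
func (m *replicaManager) Remove(id int64) {
	if old, ok := m.replicas[id]; ok {
		m.unindex(old)
		delete(m.replicas, id)
	}
}

// unindex drops r from its collection bucket and deletes empty buckets.
func (m *replicaManager) unindex(r *replica) {
	bucket := m.coll2Replicas[r.CollectionID]
	delete(bucket, r.ID)
	if len(bucket) == 0 {
		delete(m.coll2Replicas, r.CollectionID)
	}
}

// GetByCollection touches only the replicas of one collection.
func (m *replicaManager) GetByCollection(collectionID int64) []*replica {
	bucket := m.coll2Replicas[collectionID]
	out := make([]*replica, 0, len(bucket))
	for _, r := range bucket {
		out = append(out, r)
	}
	return out
}

func main() {
	m := newReplicaManager()
	m.Put(&replica{ID: 1, CollectionID: 100})
	m.Put(&replica{ID: 2, CollectionID: 100})
	m.Put(&replica{ID: 3, CollectionID: 200})
	m.Remove(2)
	fmt.Println(len(m.GetByCollection(100)), len(m.GetByCollection(200))) // 1 1
}
```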
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
1. DataNode: Skip generating the BF (bloom filter) during the insert
phase (the BF will be regenerated during the sync phase).
2. QueryNode: Skip generating or maintaining the BF for growing segments;
deletion checks will be handled in segcore.
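A minimal sketch of the DataNode side only: inserts just buffer primary keys, and the bloom filter is built once per synced batch. The tiny filter (FNV-1a with double hashing, about 10 bits per key, 4 probes) and the names `bloom` and `writeBuffer` are hypothetical illustrations, not the Milvus implementation.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/fnv"
)

// bloom is a tiny bloom filter over int64 primary keys, for illustration.
type bloom struct {
	bits []uint64
	m    uint64 // number of bits
	k    int    // number of hash probes
}

func newBloom(m uint64, k int) *bloom {
	return &bloom{bits: make([]uint64, (m+63)/64), m: m, k: k}
}

// probes derives k bit positions from one FNV-1a hash via double hashing.
func (b *bloom) probes(pk int64) []uint64 {
	var buf [8]byte
	binary.LittleEndian.PutUint64(buf[:], uint64(pk))
	h := fnv.New64a()
	h.Write(buf[:]) // hash.Hash writes never return an error
	h1 := h.Sum64()
	h2 := h1>>32 | 1 // odd step
	out := make([]uint64, b.k)
	for i := range out {
		out[i] = (h1 + uint64(i)*h2) % b.m
	}
	return out
}

func (b *bloom) Add(pk int64) {
	for _, p := range b.probes(pk) {
		b.bits[p/64] |= 1 << (p % 64)
	}
}

// MayContain has no false negatives, only false positives.
func (b *bloom) MayContain(pk int64) bool {
	for _, p := range b.probes(pk) {
		if b.bits[p/64]&(1<<(p%64)) == 0 {
			return false
		}
	}
	return true
}

// writeBuffer is hypothetical: Insert only records primary keys, and the
// bloom filter is built once, when the batch is synced.
type writeBuffer struct {
	pks []int64
}

func (w *writeBuffer) Insert(pks ...int64) { w.pks = append(w.pks, pks...) }

// Sync builds the bloom filter for the buffered batch and resets the buffer.
func (w *writeBuffer) Sync() *bloom {
	bf := newBloom(uint64(len(w.pks))*10+64, 4) // ~10 bits per key
	for _, pk := range w.pks {
		bf.Add(pk)
	}
	w.pks = w.pks[:0]
	return bf
}

func main() {
	var w writeBuffer
	w.Insert(1, 2, 3)
	bf := w.Sync()
	fmt.Println(bf.MayContain(2)) // true
}
```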
issue: https://github.com/milvus-io/milvus/issues/37630
pr: https://github.com/milvus-io/milvus/pull/38129
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
issue: https://github.com/milvus-io/milvus/issues/37764
- add a local client to call local server directly for
querycoord/rootcoord/datacoord.
- enable the local client if milvus is running in mixcoord or standalone mode.
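A minimal sketch of the local-client idea: callers depend on one client interface, and in standalone or mixcoord mode they get an implementation that calls the in-process server directly. All type and function names here (`queryCoordClient`, `localClient`, `grpcClient`, `newQueryCoordClient`, the request/response types) are hypothetical placeholders for the real protos and clients.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Hypothetical request/response types standing in for the real protos.
type showCollectionsRequest struct{ DBName string }
type showCollectionsResponse struct{ Names []string }

// queryCoordClient is the interface callers depend on; both the gRPC client
// and the local client satisfy it.
type queryCoordClient interface {
	ShowCollections(ctx context.Context, req *showCollectionsRequest) (*showCollectionsResponse, error)
}

// queryCoordServer stands in for the coordinator's server implementation.
type queryCoordServer struct{ names []string }

func (s *queryCoordServer) ShowCollections(ctx context.Context, req *showCollectionsRequest) (*showCollectionsResponse, error) {
	return &showCollectionsResponse{Names: s.names}, nil
}

// localClient forwards calls straight to the in-process server, skipping
// serialization and the network round trip.
type localClient struct{ srv *queryCoordServer }

func (c *localClient) ShowCollections(ctx context.Context, req *showCollectionsRequest) (*showCollectionsResponse, error) {
	return c.srv.ShowCollections(ctx, req)
}

// grpcClient would wrap a real gRPC connection; its body is elided here.
type grpcClient struct{}

func (c *grpcClient) ShowCollections(ctx context.Context, req *showCollectionsRequest) (*showCollectionsResponse, error) {
	return nil, errors.New("grpc client not wired up in this sketch")
}

// newQueryCoordClient returns the local client when the coordinator runs in
// the same process (standalone or mixcoord mode), else the remote client.
func newQueryCoordClient(sameProcess bool, srv *queryCoordServer) queryCoordClient {
	if sameProcess && srv != nil {
		return &localClient{srv: srv}
	}
	return &grpcClient{}
}

func main() {
	srv := &queryCoordServer{names: []string{"c1"}}
	cli := newQueryCoordClient(true, srv)
	resp, err := cli.ShowCollections(context.Background(), &showCollectionsRequest{})
	fmt.Println(resp.Names, err) // [c1] <nil>
}
```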
Signed-off-by: chyezh <chyezh@outlook.com>
---------
Signed-off-by: chyezh <chyezh@outlook.com>
Co-authored-by: Zhen Ye <chyezh@outlook.com>
1. Introduce a data view mechanism for DataCoord, attempting to update
each collection's data view periodically.
2. QueryCoord maintains a cache of data view versions. Before
batch-fetching recovery info, it retrieves all versions and only fetches
recovery info for collections with updated versions.
3. Return DataCoord's current data view when fetching RecoverInfo.
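A minimal sketch of the QueryCoord-side version cache in steps 2 and 3, assuming versions are plain integers compared for inequality; `dataViewCache`, `changedCollections` and `record` are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
)

// dataViewCache is a hypothetical QueryCoord-side cache of the last data
// view version seen per collection.
type dataViewCache struct {
	versions map[int64]int64 // collectionID -> data view version
}

// changedCollections compares freshly batch-fetched versions with the cache
// and returns, in ascending order, the collections whose view changed (or
// that are new). Only these need a RecoverInfo fetch.
func (c *dataViewCache) changedCollections(latest map[int64]int64) []int64 {
	var changed []int64
	for id, v := range latest {
		if old, ok := c.versions[id]; !ok || old != v {
			changed = append(changed, id)
		}
	}
	sort.Slice(changed, func(i, j int) bool { return changed[i] < changed[j] })
	return changed
}

// record stores the data view version returned alongside RecoverInfo.
func (c *dataViewCache) record(collectionID, version int64) {
	c.versions[collectionID] = version
}

func main() {
	cache := &dataViewCache{versions: map[int64]int64{1: 1, 2: 5}}
	latest := map[int64]int64{1: 1, 2: 6, 3: 1}
	toFetch := cache.changedCollections(latest)
	fmt.Println(toFetch) // [2 3]
	for _, id := range toFetch {
		cache.record(id, latest[id])
	}
	fmt.Println(cache.changedCollections(latest)) // []
}
```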
issue: https://github.com/milvus-io/milvus/issues/37743,
https://github.com/milvus-io/milvus/issues/37630
pr: https://github.com/milvus-io/milvus/pull/37863
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
issue: #37764
- add a local client to call local server directly for
querycoord/rootcoord/datacoord.
- enable the local client if milvus is running in mixcoord or standalone mode.
Signed-off-by: chyezh <chyezh@outlook.com>
issue: #33285
pr: #37722
- move most cgo operations related to search/query into the segcore
package so they can be reused by streamingnode.
- add Go unit tests for segcore operations.
Signed-off-by: chyezh <chyezh@outlook.com>