Commit Graph

231 Commits (2.5)

Author SHA1 Message Date
wei liu ad0bf9cad8
enhance: Optimize channel node balancing for uneven QN distribution (#42786) (#43423)
issue: #42860
pr: #42786
Fix channel node allocation when QueryNode count is not a multiple of
channel count. The previous algorithm used simple division which caused
uneven distribution with remainders.

Key improvements:
- Implement smart remainder distribution algorithm
- Refactor large function into focused helper functions
- Support two-phase rebalancing (release then allocate)
- Handle edge cases like insufficient nodes gracefully

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-07-21 17:04:54 +08:00
wei liu 4952b8c416
enhance: apply load config changes after QueryCoord restart (#43108) (#43236)
issue: #43107
pr: #43108
- Add checkLoadConfigChanges() to apply load config during startup
- Call config check in startQueryCoord() after restart
- Skip auto-updates for collections with user-specified replica numbers
- Add is_user_specified_replica_mode field to preserve user settings
- Add comprehensive unit tests with mockey

Ensures existing collections use latest cluster-level config after
restart.

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-07-14 10:22:50 +08:00
SimFG 6e18ededab
fix: [2.5] mockery too unavailable after upgrade golang version (#41522)
- issue: ##41291
- pr: #41481

Signed-off-by: SimFG <bang.fu@zilliz.com>
2025-04-25 14:40:40 +08:00
SimFG 18eb627533
fix: [2.5] Update logging context and upgrade dependencies (#41319)
- issue: #41291
- pr: #41318

---------

Signed-off-by: SimFG <bang.fu@zilliz.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2025-04-24 23:50:40 +08:00
liliu-z cb0f984155
enhance: Revert "separate for index completed (#40873)" (#41152)
This reverts commit 23e579e324. #40873

issue: #39519

Signed-off-by: Li Liu <li.liu@zilliz.com>
2025-04-08 17:36:30 +08:00
Chun Han 23e579e324
separate for index completed (#40873)
related: https://github.com/milvus-io/milvus/issues/40781

Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
2025-04-05 10:20:24 +08:00
Xianhui Lin 249d5b9b41
fix: jsonstats check if cache schema is nil lazy describecollection (#41068)
fix: jsonstats check if cache schema is nil lazy describecollection
pr:https://github.com/milvus-io/milvus/pull/38039
issue:https://github.com/milvus-io/milvus/issues/36995

---------

Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com>
2025-04-03 00:32:21 +08:00
wei liu d185a8f941
enhance: Balance the collection with the largest row count first (#40958)
issue: #37651
pr: #40297
this PR enable to balance the collection with largest row count first,
to avoid temporary migration of small table data to new nodes during
their onboarding, only to be moved out again after the large table
balance, which would cause unnecessary load.

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-03-31 16:14:21 +08:00
Xianhui Lin f5e9dea2aa
fix: [2.5]fix the garbage cleanup logic of jsonkey stats && improve json key stats filer (#40039)
fix: fix the garbage collection cleanup logic of jsonkey stats &&
improve json key stats filer
issue: https://github.com/milvus-io/milvus/issues/36995
https://github.com/milvus-io/milvus/issues/40034
https://github.com/milvus-io/milvus/issues/40041
https://github.com/milvus-io/milvus/issues/40106
https://github.com/milvus-io/milvus/issues/40138
pr: https://github.com/milvus-io/milvus/pull/38039

---------

Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com>
2025-03-13 20:18:10 +08:00
Bingyi Sun 683b26ffb7
feat: cherry pick json path index (#40313)
issue: #35528 
pr: #36750 
this pr includes json path index pr and some related prs:
1. update tantivy version #39253 
2. json path index #36750 
3. fall back to brute force #40076 
4. term filter #40140 
5. bug fix #40336

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-03-10 22:14:05 +08:00
congqixia 709594f158
enhance: [2.5] Use v2 package name for pkg module (#40117)
Cherry-pick from master
pr: #39990
Related to #39095

https://go.dev/doc/modules/version-numbers

Update pkg version according to golang dep version convention

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-02-23 00:46:01 +08:00
Xianhui Lin f0964f769d
enhance: [2.5]Add json key inverted index in stats for optimization (#39876)
Add json key inverted index in stats for optimization
issue: https://github.com/milvus-io/milvus/issues/36995
pr: https://github.com/milvus-io/milvus/pull/38039

---------

Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-02-16 20:12:15 +08:00
yihao.dai 6773fb10a8
enhance: [2.5] Read metadata concurrently to accelerate recovery (#38900)
Read metadata such as segments, binlogs, and partitions concurrently at
the collection level.

issue: https://github.com/milvus-io/milvus/issues/37630

pr: https://github.com/milvus-io/milvus/pull/38403

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-01-16 17:53:01 +08:00
yihao.dai 9d2a0e775c
fix: [2.5] Fix slow dist handle and slow observe (#38905)
1. Provide partition&channel level indexing in the collection target.
2. Make SegmentAction not wait for distribution.
3. Remove scheduler and target manager mutex
4. Optimize logging to reduce CPU overhead.

issue: https://github.com/milvus-io/milvus/issues/37630

pr: https://github.com/milvus-io/milvus/pull/38566

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-01-16 17:07:02 +08:00
Zhen Ye adfc3f945e
enhance: record memory size (uncompressed) item for index (#38844)
issue: #38715 
pr: #38770

- Current milvus use a serialized index size(compressed) for estimate
resource for loading.
- Add a new field MemSize (before compressing) for index to estimate
resource.

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-01-14 10:33:06 +08:00
jaime b0afe32c98
fix: unstable ut in leader_vew_manager.go file (#39162)
issue: #38672
pr: #39161

Signed-off-by: jaime <yun.zhang@zilliz.com>
2025-01-10 19:54:57 +08:00
Zhen Ye 95809ca767
enhance: make new go package to manage proto (#39128)
issue: #39095
pr: #39114

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-01-10 10:53:01 +08:00
jaime 0693634f62
enhance: add db name in replica description (#38673)
issue: #36621
pr: #38672

Signed-off-by: jaime <yun.zhang@zilliz.com>
2025-01-09 19:43:04 +08:00
wei liu cb0618b2d4
fix: [2.5] Querycoord will trigger unexpected balance task after restart (#38725)
issue: https://github.com/milvus-io/milvus/issues/38606
pr: https://github.com/milvus-io/milvus/pull/38630

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-12-25 16:14:49 +08:00
yihao.dai d3c174b0f1
enhance: Accelerate observe collection (#38028)
1. A collection should observe the channel only once.
2. A collection should check the CollectionLoadPercent for updates only
once.
3. Skip saving coll/partition meta if there are no changes, primarily to
accelerate collection observation after recovery.

issue: https://github.com/milvus-io/milvus/issues/37630

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-12-17 14:14:45 +08:00
tinswzy 27229f7907
enhance: refine exists log print with ctx (#38080)
issue: #35917 
Refines exists log print with ctx

Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>
2024-12-14 22:36:44 +08:00
Zhen Ye 833c74aa66
enhance: add detail, replica count for resource group (#38314)
issue: #30647

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2024-12-13 14:14:50 +08:00
wei liu 950203aba0
enhance: Optimize save colelction target latency (#38345)
issue: #38237
this PR only use better compression level for proto msg which is larger
than 1MB, and use a lighter compression level for smaller proto msg,
which could get a better latency in most case.

this PR could reduce the latency from 22.7s to 4.7s with 10000
collctions and each collections has 1000 segments.

before this PR:
BenchmarkTargetManager-8 1 22781536357 ns/op 566407275088 B/op 11188282
allocs/op
after this PR:
BenchmarkTargetManager-8 1 4729566944 ns/op 36713248864 B/op 10963615
allocs/op

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-12-11 10:12:43 +08:00
congqixia 36946cc9ce
enhance: Set loaded collection/partition number to metrics (#38271)
Related to #36456
Previous PR: #38471 #38233

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-12-06 16:18:40 +08:00
congqixia 051bc280dd
enhance: Make dynamic load/release partition follow targets (#38059)
Related to #37849

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-12-05 16:24:40 +08:00
congqixia 32645fc28a
enhance: Unify querycoord meta metrics (#38233)
Related to #36456

Unify collection/partition number metrics to collection manager in case
of unwant missing modification

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-12-05 15:48:39 +08:00
tinswzy 7944538ade
enhance: Add ctx param to KV operation interfaces (#38154)
issue: #35917 
Refine KV operation interfaces by adding a ctx param

Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>
2024-12-05 15:16:41 +08:00
tinswzy e76802f910
enhance: refine querycoord meta/catalog related interfaces to ensure that each method includes a ctx parameter (#37916)
issue: #35917 
This PR refine the querycoord meta related interfaces to ensure that
each method includes a ctx parameter.

Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>
2024-11-25 11:14:34 +08:00
jaime 7bbfe86bcd
enhance: add list index and segment index retrieval API for WebUI (#37861)
issue: #36621

Signed-off-by: jaime <yun.zhang@zilliz.com>
2024-11-22 16:58:34 +08:00
congqixia b34bfb98a0
enhance: Refine Replica manager colle2Replicas secondary index (#37906)
Related to #37630

This PR add a new util coll2Replicas secondary index to reduce map
access & iteration while get replicas by collection

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-11-22 11:54:32 +08:00
yihao.dai b6612e02b4
enhance: Reduce GetIndexInfos calls (#37695)
Batch `GetIndexInfos` calls for segments to reduce RPC calls.

issue: https://github.com/milvus-io/milvus/issues/37634

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-11-19 14:24:31 +08:00
congqixia b0bd290a6e
enhance: Use internal json(sonic) to replace std json lib (#37708)
Related to #35020

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-11-18 10:46:31 +08:00
jaime 1d06d4324b
fix: Int64 overflow in JSON encoding (#37657)
issue: ##36621

- For simple types in a struct, add "string" to the JSON tag for
automatic string conversion during JSON encoding.
- For complex types in a struct, replace "int64" with "string."

Signed-off-by: jaime <yun.zhang@zilliz.com>
2024-11-14 22:52:30 +08:00
jaime 1e8ea4a7e7
feat: add segment/channel/task/slow query render (#37561)
issue: #36621

Signed-off-by: jaime <yun.zhang@zilliz.com>
2024-11-12 17:44:29 +08:00
wei liu 266f8ef1f5
fix: Search may return less result after qn recover (#36549)
issue: #36293 #36242
after qn recover, delegator may be loaded in new node, after all segment
has been loaded, delegator becomes serviceable. but delegator's target
version hasn't been synced, and if search/query comes, delegator will
use wrong target version to filter out a empty segment list, which
caused empty search result.

This pr will block delegator's serviceable status until target version
is synced

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-11-12 16:34:28 +08:00
wei liu 61a5b15ada
fix: Lost loading collection's updateTs after qc restart (#37538)
issue: #37537

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-11-11 14:34:28 +08:00
yihao.dai ff9bdf7029
fix: Fix load slowly (#37454)
When there're a lot of loaded collections, they would occupy the target
observer scheduler’s pool. This prevents loading collections from
updating the current target in time, slowing down the load process.
This PR adds a separate target dispatcher for loading collections.

issue: https://github.com/milvus-io/milvus/issues/37166

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-11-09 07:48:26 +08:00
wei liu a03157838b
enhance: Enable node assign policy on resource group (#36968)
issue: #36977
with node_label_filter on resource group, user can add label on
querynode with env `MILVUS_COMPONENT_LABEL`, then resource group will
prefer to accept node which match it's node_label_filter.

then querynode's can't be group by labels, and put querynodes with same
label to same resource groups.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-11-08 11:18:27 +08:00
jaime f348bd9441
feat: add segment,pipeline, replica and resourcegroup api for WebUI (#37344)
issue: #36621

Signed-off-by: jaime <yun.zhang@zilliz.com>
2024-11-07 11:52:25 +08:00
wei liu 8714774305
fix: search/query failed due to segment not loaded (#37403)
issue: #36970
cause release segment and balance channel may happen at same time, and
before new delegator become serviceable, if release segment exeuctes on
new delegator, and search/query comes on old delegator, then release
segment and query segment happens in parallel, if release segment
execute first in worker, then search/query will got a SegmentNodeLoaded
error.

This PR add serviceable filter on delegator, then all load/release
segment operation will happens on serviceable delegator.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-11-06 15:10:25 +08:00
wei liu f029314e20
fix: Dynamic release parition may fail search/query. (#37049)
issue: #33550
cause wrong impl of UpdateCollectionNextTarget, if ReleaseCollection and
UpdateCollectionNextTarget happens at same time, the the released
partition's segment list may be add to target again, and delegator will
be marked as unserviceable due to lack of segment.

This PR fix the impl of UpdateCollectionNextTarget

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-10-24 01:03:28 +08:00
Bingyi Sun 6851738fd1
fix: fix `make generate-mockery` panic with go1.22 (#36830)
https://github.com/milvus-io/milvus/issues/36831
Fix `make generate-mockery` panic.

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2024-10-17 12:11:31 +08:00
sthuang 4493aa2142
fix: querycoord collection num metric (#36471)
related to: #36456

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2024-09-26 14:23:13 +08:00
wei liu 3cd0b26285
enhance: Enable dynamic update loaded collection's replica (#35822)
issue: #35821
After collection loaded, if we need to increase/decrease collection's
replica, we need to release and load it again.

milvus offers 4 solution to update loaded collection's replica, this PR
aims to dynamic change the replica number without release, and after
replica number changed, milvus will execute load replica or release
replica in async, and the replica loaded status can be checked by
getReplicas API.

Notice that if set too much replicas than querynode can afford,the new
replica won't be loaded successfully until enough querynode joins.

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-09-25 10:13:18 +08:00
congqixia f985173da0
fix: Fill load field list from old version load info (#35993)
See also #35959

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-09-05 16:57:05 +08:00
wei liu c84ea5465c
fix: Fix some replicas don't participate in the query after the failure recovery (#35850)
issue: #35846
querycoord will notify proxy to update shard leader cache after
delegator location changes, but during querynode's failure recovery,
some delegator may become unserviceable due to lacking of segments, and
back to serviceable after segment loaded, so we also need to notify
proxy to invalidate shard leader cache when delegator serviceable state
changes.

This PR will maintain querynode's serviceable state during heartbeat,
and notify proxy to invalidate shard leader cache if serviceable state
changes.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-09-03 15:39:03 +08:00
Xiaofan 0dc5e89007
enhance: reduce the log level of frequent log (#35652)
fix #35651

Signed-off-by: xiaofanluan <xiaofan.luan@zilliz.com>
2024-08-25 16:20:57 +08:00
congqixia 2fbc628994
feat: Support field partial load collection (#35416)
Related to #35415

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-08-20 16:49:02 +08:00
wei liu c0200eec39
enhance: limit getSegmentInfo batch size to avoid excced grpc message limit (#35394)
issue: #35395

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-08-15 19:17:00 +08:00
SimFG b2cc4b0776
feat: add the rbac msg and send them to the replicate channel (#35392)
- issue: #35391

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-08-15 12:06:52 +08:00