Commit Graph

22 Commits (2.5)

Author SHA1 Message Date
wei liu 4a05180f88
enhance: [2.5] support balancing multiple collections in single trigger (#41875) (#42134)
issue: #41874
pr: #41875
- Optimize balance_checker to support balancing multiple collections
simultaneously
- Add new parameters for segment and channel balancing batch sizes
- Add enableBalanceOnMultipleCollections parameter
- Update tests for balance checker

This change improves resource utilization by allowing the system to
balance multiple collections in a single trigger with configurable batch
sizes.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-05-28 23:18:30 +08:00
wei liu 2e8445c2ef
fix: balance checker may enter infinite normal balance loop after balance suspension (#41196)
issue: #41194 
pr: #41195
- Refactor hasUnbalancedCollection flag handling to function scope
- Ensure tracking sets clearance when no balance needed
- Add deferred cleanup for both normal/stopping balance paths
- Add unit tests for collection tracking scenarios

The changes ensure tracking sets (normalBalanceCollectionsCurrentRound
and stoppingBalanceCollectionsCurrentRound) are properly cleared when:
- All collections in current round are balanced
- Balance checks return early due to unready targets
- Balance feature flags are disabled

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-04-10 15:18:28 +08:00
wei liu 37a533fe6d
fix: [2.5] Address manual balance and balance check issues (#41038)
issue: #37651
pr: #41037
- Fix context propagation for manual balance segment task creation from
PR #38080.
- Optimize stopping balance by preventing redundant checks per round,
addressing performance regression from PR #40297.
- Decrease default `checkBalanceInterval` from 3000ms to 300ms.
- Correct minor log messages in `BalanceChecker`.

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-04-03 01:26:23 +08:00
wei liu d185a8f941
enhance: Balance the collection with the largest row count first (#40958)
issue: #37651
pr: #40297
this PR enable to balance the collection with largest row count first,
to avoid temporary migration of small table data to new nodes during
their onboarding, only to be moved out again after the large table
balance, which would cause unnecessary load.

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-03-31 16:14:21 +08:00
wei liu b64bb63e77
enhance: [2.5] Add trigger interval config for auto balance (#39154) (#39918)
issue: #39156
pr: #39154

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-03-27 16:40:23 +08:00
congqixia 709594f158
enhance: [2.5] Use v2 package name for pkg module (#40117)
Cherry-pick from master
pr: #39990
Related to #39095

https://go.dev/doc/modules/version-numbers

Update pkg version according to golang dep version convention

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-02-23 00:46:01 +08:00
Zhen Ye 95809ca767
enhance: make new go package to manage proto (#39128)
issue: #39095
pr: #39114

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-01-10 10:53:01 +08:00
wei liu cb0618b2d4
fix: [2.5] Querycoord will trigger unexpected balance task after restart (#38725)
issue: https://github.com/milvus-io/milvus/issues/38606
pr: https://github.com/milvus-io/milvus/pull/38630

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-12-25 16:14:49 +08:00
tinswzy e76802f910
enhance: refine querycoord meta/catalog related interfaces to ensure that each method includes a ctx parameter (#37916)
issue: #35917 
This PR refine the querycoord meta related interfaces to ensure that
each method includes a ctx parameter.

Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>
2024-11-25 11:14:34 +08:00
wei liu 30a99b66c1
fix: Fix logic dead lock when delegator has high memory usage (#36065)
issue: #36064
when delegator has high memory usage, load l0 segment will failed. and
balance segment task will blocked by load segment task, then delegator
cann't free memory by moving out some segment, causes a logic dead lock.

this PR remove the limit for balance, we permit segment and balance
execute in parallel. which won't cause side effect due to:
1. one segment can only has one task in qc's scheduler, and load/release
task will replace balance task if necessary
2. balance speed has been limited, and it won't block load segment task.

3. if collection has load task and balance task at same time, load task
will be scheduled first due to high proirity.

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-09-09 10:21:06 +08:00
jaime 9630974fbb
enhance: move rocksmq from internal to pkg module (#33881)
issue: #33956

Signed-off-by: jaime <yun.zhang@zilliz.com>
2024-06-25 21:18:15 +08:00
wei liu 2013d97243
enhance: Enable to dynamic update balancer policy in querycoord (#33037)
issue: #33036
This PR enable to dynamic update balancer policy without restart
querycoord.

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-05-21 14:29:39 +08:00
wei liu a7f6193bfc
fix: query node may stuck at stopping progress (#33104)
issue: #33103 
when try to do stopping balance for stopping query node, balancer will
try to get node list from replica.GetNodes, then check whether node is
stopping, if so, stopping balance will be triggered for this replica.

after the replica refactor, replica.GetNodes only return rwNodes, and
the stopping node maintains in roNodes, so balancer couldn't find
replica which contains stopping node, and stopping balance for replica
won't be triggered, then query node will stuck forever due to
segment/channel doesn't move out.

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-05-20 10:21:38 +08:00
chyezh 48fe977a9d
enhance: declarative resource group api (#31930)
issue: #30647

- Add declarative resource group api

- Add config for resource group management

- Resource group recovery enhancement

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2024-04-15 08:13:19 +08:00
chyezh 9f9ef8ac32
enhance: transfer resource group and dbname to querynode when load (#30936)
issue: #30931

Signed-off-by: chyezh <chyezh@outlook.com>
2024-03-21 11:59:12 +08:00
chyezh ff4237bb90
enhance: add hostname into node info (#30673)
issue: https://github.com/milvus-io/milvus/issues/30647

- Address may be reused in k8s environment. Using hostname can be
better.

Signed-off-by: chyezh <chyezh@outlook.com>
2024-03-15 10:45:06 +08:00
wei liu 6dd7297178
fix: Skip generate balance task when target not ready (#30724)
issue: #30723

This PR skip generate balance task when collection's target isn't ready.
also refine the check stale logic in query coord's scheduler, if channel
exist in current or next target, task won't be canceled.

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-02-23 10:32:53 +08:00
SimFG 26f06dd732
Format the code (#27275)
Signed-off-by: SimFG <bang.fu@zilliz.com>
2023-09-21 09:45:27 +08:00
Enwei Jiao fb0705df1b
Decouple basetable and componentparam (#26725)
Signed-off-by: Enwei Jiao <enwei.jiao@zilliz.com>
2023-09-05 10:31:48 +08:00
Bingyi Sun a3e22786ed
Move meta store to kv catalog (#25915)
Signed-off-by: sunby <sunbingyi1992@gmail.com>
2023-07-31 13:57:04 +08:00
yiwangdr 4387f36897
make etcdKV private (#24778)
Signed-off-by: yiwangdr <yiwangdr@gmail.com>
2023-06-13 10:52:38 +08:00
MrPresent-Han b517bc9e6a
refine balance mechanism including:(#23454) (#23763) (#23791)
1. balance granuity to replica to avoid influence unrelated replicas
2. avoid balance back and forth

Signed-off-by: MrPresent-Han <jamesharden11122@gmail.com>
2023-05-04 12:22:40 +08:00