issue: #42176
pr: #42177
Remove the mutual exclusion constraints between channel and segment
balance tasks to allow them to run concurrently.
Changes include:
- Remove permitBalanceChannel() and permitBalanceSegment() methods from
RoundRobinBalancer
- Update ChannelLevelScoreBalancer, MultiTargetBalancer,
RowCountBasedBalancer, and ScoreBasedBalancer to remove constraint
checks
- Allow segment balance tasks to proceed even when channel balance tasks
are running
- Update test cases to reflect new behavior where balance tasks no
longer block each other
- Improve error handling in task executor by preferring serviceable
shard leaders for segment release operations
- Add fallback logic to find latest shard leader when serviceable leader
is not available
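
As a rough illustration of the leader selection described in the last two items, here is a minimal sketch. The `shardLeader` type, its fields, and the use of a version number to decide which leader is "latest" are assumptions made for this example, not the actual querycoord types.

```go
package main

// shardLeader is a hypothetical, simplified stand-in for a shard leader view.
type shardLeader struct {
	NodeID      int64
	Version     int64 // assumption: a higher version means a more recent leader
	Serviceable bool
}

// pickLeader prefers the most recent serviceable leader. If no leader is
// serviceable, it falls back to the most recent leader overall.
// The second return value is false when no leader exists at all.
func pickLeader(leaders []shardLeader) (shardLeader, bool) {
	var serviceable, latest shardLeader
	foundServiceable, foundLatest := false, false
	for _, l := range leaders {
		if !foundLatest || l.Version > latest.Version {
			latest, foundLatest = l, true
		}
		if l.Serviceable && (!foundServiceable || l.Version > serviceable.Version) {
			serviceable, foundServiceable = l, true
		}
	}
	if foundServiceable {
		return serviceable, true
	}
	return latest, foundLatest
}
```

In this sketch a release task would be sent to the node returned by `pickLeader`, and skipped only when the function reports that no leader exists.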
This change improves the efficiency of load balancing by removing
unnecessary coordination overhead between different types of balance
operations.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #41874
pr: #41875
- Optimize balance_checker to support balancing multiple collections
simultaneously
- Add new parameters for segment and channel balancing batch sizes
- Add enableBalanceOnMultipleCollections parameter
- Update tests for balance checker
This change improves resource utilization by allowing the system to
balance multiple collections in a single trigger with configurable batch
sizes.
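
The batching behavior can be pictured with the sketch below. It is only an illustration under stated assumptions: `balanceConfig`, its field names, and the representation of plans as strings are hypothetical and do not mirror the real balance_checker code.

```go
package main

// balanceConfig is a hypothetical holder for the knobs named in this commit.
type balanceConfig struct {
	SegmentBatchSize          int  // max segment plans per trigger
	EnableMultipleCollections bool // stand-in for enableBalanceOnMultipleCollections
}

// collectSegmentPlans walks collections in order and gathers pending plans
// until the batch size is reached. When balancing multiple collections is
// disabled, it stops after the first collection that produced any plan.
func collectSegmentPlans(collections []int64, pending map[int64][]string, cfg balanceConfig) []string {
	var out []string
	for _, id := range collections {
		plans := pending[id]
		if len(plans) == 0 {
			continue
		}
		remaining := cfg.SegmentBatchSize - len(out)
		if remaining <= 0 {
			break
		}
		if len(plans) > remaining {
			plans = plans[:remaining]
		}
		out = append(out, plans...)
		if !cfg.EnableMultipleCollections || len(out) >= cfg.SegmentBatchSize {
			break
		}
	}
	return out
}
```

With the flag disabled the function behaves like a one-collection-per-trigger policy, which is the behavior this sketch assumes existed before the change.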
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #38970
pr: #38971
The stopping balance for channels still uses the row_count_based policy,
which may cause channel imbalance in the multi-collection case.
This PR implements a score-based stopping balance channel policy.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
The task limit in assignSegment/assignChannel applies to both load
tasks and balance tasks.
This PR removes the load task limit and only limits the number of
balance tasks in one round.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #38142
The current balance channel policy only considers the current
collection's distribution, so if every collection has 1 channel and all
channels have been loaded on the same querynode, balance channel won't
be triggered after the querynode count increases.
This PR enables the score-based balance channel policy, to achieve:
1. distribute all channels evenly across multiple querynodes
2. distribute each collection's channel evenly across multiple
querynodes.
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #35917
This PR refines the querycoord meta-related interfaces to ensure that
each method includes a ctx parameter.
Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>
issue: #33550
Balance segment and balance channel execute at the same time, which
causes a bunch of corner cases.
This PR disables simultaneous balance of segments and channels.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #36464
This PR enables balance on querynodes with different mem capacities: a
query node with more mem capacity will be assigned more records, and the
query node with the largest difference between assignedScore and
currentScore will have a higher priority to carry the new segment.
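
A minimal sketch of this selection rule follows. It assumes that assignedScore is a node's share of the total score in proportion to its memory capacity; that formula, the `queryNode` type, and the function name are assumptions for illustration and are not taken from the actual balancer.

```go
package main

// queryNode is a hypothetical, simplified view of a query node.
type queryNode struct {
	ID           int64
	MemCapacity  float64 // any consistent unit, e.g. GiB
	CurrentScore float64 // score of what the node already holds
}

// pickTarget returns the node whose assignedScore exceeds its currentScore
// by the largest margin. assignedScore is assumed here to be the total
// score split in proportion to memory capacity.
func pickTarget(nodes []queryNode) (int64, bool) {
	var totalMem, totalScore float64
	for _, n := range nodes {
		totalMem += n.MemCapacity
		totalScore += n.CurrentScore
	}
	if len(nodes) == 0 || totalMem <= 0 {
		return 0, false
	}
	var bestID int64
	var bestGap float64
	found := false
	for _, n := range nodes {
		assigned := totalScore * n.MemCapacity / totalMem
		gap := assigned - n.CurrentScore
		if !found || gap > bestGap {
			bestID, bestGap, found = n.ID, gap, true
		}
	}
	return bestID, found
}
```

For two nodes holding the same score, the one with twice the memory has the larger gap and is chosen for the next segment.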
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #35087
After qc restarts and while the target is not ready yet, if dist_handler
tries to update the segment dist, it will set the legacy level on l0
segments, which may cause l0 segments to be moved to other nodes and
make search/query fail.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #34715
If a collection's segment list no longer changes, the next target will
be empty most of the time. Balance segment checks whether a segment
exists in both the current and next target, so the balance could be
blocked because the next target is empty.
This PR permits a segment to be moved if the next target is empty, to
avoid balance getting stuck.
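
The relaxed check can be sketched as below. The function name and the representation of targets as ID sets are assumptions for illustration, not the real target manager interface.

```go
package main

// canMoveSegment reports whether a segment may be moved by balance.
// The segment must exist in the current target. It must also exist in the
// next target, unless the next target is empty, in which case the check on
// the next target is skipped.
func canMoveSegment(segmentID int64, current, next map[int64]struct{}) bool {
	if _, ok := current[segmentID]; !ok {
		return false
	}
	if len(next) == 0 {
		return true // an empty next target no longer blocks balance
	}
	_, ok := next[segmentID]
	return ok
}
```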
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #34798
After we removed the task priority on query coord, to avoid load/release
segment being blocked by too many balance tasks, we limit the balance
task size in each round. At the same time, we reduce the balance
interval to trigger balance more frequently.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #34095
When a new query node comes online, the segment_checker,
channel_checker, and balance_checker simultaneously attempt to allocate
segments to it. If this occurs during the execution of a load task and
the distribution of the new query node hasn't been updated, the query
coordinator may mistakenly view the new query node as empty. As a
result, it assigns segments or channels to it, potentially overloading
the new query node with more segments or channels than expected.
This PR measures the workload of the executing tasks on the target query
node to prevent assigning an excessive number of segments to it.
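
One way to picture the idea is the sketch below, which adds the segments carried by executing tasks to the segments already reported in the distribution before comparing nodes. The function name, the map-based inputs, and the tie-break on node ID are assumptions for illustration only.

```go
package main

// leastLoadedNode picks the node with the smallest effective load, where
// effective load = segments already in the distribution + segments that
// executing tasks are about to place on the node.
// Ties are broken by the lower node ID to keep the result deterministic.
func leastLoadedNode(nodes []int64, distSegments, executingSegments map[int64]int) (int64, bool) {
	var best int64
	bestLoad, found := 0, false
	for _, id := range nodes {
		load := distSegments[id] + executingSegments[id]
		if !found || load < bestLoad || (load == bestLoad && id < best) {
			best, bestLoad, found = id, load, true
		}
	}
	return best, found
}
```

Without the `executingSegments` term, a freshly started node with an empty distribution would always look like the best target, which is the overload scenario described above.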
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #33103
When doing stopping balance for a stopping query node, the balancer
gets the node list from replica.GetNodes and then checks whether each
node is stopping; if so, stopping balance is triggered for this replica.
After the replica refactor, replica.GetNodes only returns rwNodes, and
stopping nodes are kept in roNodes, so the balancer can't find the
replica that contains a stopping node and stopping balance for the
replica won't be triggered. The query node then gets stuck forever
because its segments/channels are never moved out.
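
A sketch of a check that also looks at read-only nodes is shown below. The `replicaNodes` type and the `needsStoppingBalance` helper are hypothetical; the message above does not spell out how the fix is implemented, so this only illustrates the idea of consulting roNodes as well as rwNodes.

```go
package main

// replicaNodes is a hypothetical, simplified view of a replica's nodes.
type replicaNodes struct {
	rwNodes []int64 // read-write nodes
	roNodes []int64 // read-only nodes, where stopping nodes are kept
}

// needsStoppingBalance reports whether any node of the replica, read-write
// or read-only, is stopping.
func needsStoppingBalance(r replicaNodes, stopping map[int64]bool) bool {
	for _, id := range r.roNodes {
		if stopping[id] {
			return true
		}
	}
	for _, id := range r.rwNodes {
		if stopping[id] {
			return true
		}
	}
	return false
}
```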
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
related to #32165
1. for all the managers, support a collection-level index
2. remove the collection-level filter to avoid extra cpu usage when
the number of collections increases
Signed-off-by: xiaofanluan <xiaofan.luan@zilliz.com>
issue: #31091
This PR adds a GetByFilter interface to the leader view manager,
replacing the various get funcs.
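
The general shape of such an interface can be sketched with composable filter functions. The `leaderView` type, the filter constructors, and the lowercase `getByFilter` helper are hypothetical names for this example and are not the actual signatures in the leader view manager.

```go
package main

// leaderView is a hypothetical, simplified leader view.
type leaderView struct {
	NodeID       int64
	CollectionID int64
	Channel      string
}

// viewFilter decides whether a view should be included in the result.
type viewFilter func(leaderView) bool

// withNodeID keeps only views on the given node.
func withNodeID(id int64) viewFilter {
	return func(v leaderView) bool { return v.NodeID == id }
}

// withCollectionID keeps only views of the given collection.
func withCollectionID(id int64) viewFilter {
	return func(v leaderView) bool { return v.CollectionID == id }
}

// getByFilter returns the views that satisfy every filter.
// With no filters it returns all views.
func getByFilter(views []leaderView, filters ...viewFilter) []leaderView {
	var out []leaderView
	for _, v := range views {
		keep := true
		for _, f := range filters {
			if !f(v) {
				keep = false
				break
			}
		}
		if keep {
			out = append(out, v)
		}
	}
	return out
}
```

A single entry point of this kind lets callers combine conditions, for example `getByFilter(views, withCollectionID(100), withNodeID(1))`, instead of adding one getter per combination.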
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #30647
- ReplicaManager now manages read-only nodes and always persists the
node distribution of each replica.
- All segment/channel checkers use ReplicaManager, not ResourceManager,
to get read-only or read-write nodes.
- ReplicaManager now guarantees that a querynode is applied to only one
replica in the same collection (replicas in the same collection never
hold the same querynode at the same time).
- ReplicaManager guarantees a fair node count assignment policy if
multiple replicas of a collection are assigned to one resource group.
- Move some parameter checks into ReplicaManager to avoid data races.
- Allow transferring a replica to a resource group that has already
loaded a replica of the same collection
- Allow transferring nodes between resource groups that have loaded
replicas of the same collection
---------
Signed-off-by: chyezh <chyezh@outlook.com>
issue: #31091
This PR adds a GetByFilter interface to the channel dist manager,
replacing the various get funcs.
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #29261
This PR adds a RESTful API for devops to execute a rolling upgrade,
including suspend/resume balance and manual transfer of
segments/channels.
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #30816
Balance channel will be stuck until the leader view catches up with the
current target, and only then starts to unsub the old delegator. This
makes sure that the new delegator can serve search before the old
delegator is released. But another piece of logic in segment_checker
skips loading segments during balance channel. So during balance
channel, if a query node crashes, the new delegator can never catch up
with the target and gets stuck forever.
This PR removes the rule that skips loading segments during balance
channel to avoid the logical deadlock here.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #30983 #30982
The balancer called the wrong interface to get the segment/channel list
in a replica and so got a wrong average segment/channel number, which
made each node appear to have fewer segments/channels than average, and
balance wasn't triggered in the multi-replica case.
This PR fixes balance segment/channel not being triggered on multiple
replicas.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #28622
After we support balance segment with growing segment count #28623, if
we balance segments and channels at the same time, some segments need to
be rebalanced after balance channel finishes.
This PR skips balance segment when channels need to be balanced.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #23726
/kind improvement
1. enable auto balance channel between nodes in querycoord
2. make `genSegmentPlan` reuse the `AssignSegment` logic
3. make `genChannelPlan` reuse the `AssignChannel` logic
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #28622
A query node with a delegator will have more rows than other query
nodes because the delegator loads all growing rows.
This PR enables balance segment based on the number of growing rows in
the leader view.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
1. set balance granularity to replica to avoid influencing unrelated
replicas
2. avoid balancing back and forth
Signed-off-by: MrPresent-Han <jamesharden11122@gmail.com>