milvus

Commit Graph

Author	SHA1	Message	Date
wei liu	4631657304	fix: Unstable integration case TestBalanceOnSingleReplica (#43552 ) issue: #42930 Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-07-25 10:52:55 +08:00
wei liu	b298218a29	enhance: [2.5] Remove balance constraints between channel and segment tasks (#42410 ) issue: #42176 pr: #42177 Remove the mutual exclusion constraints between channel and segment balance tasks to allow them to run concurrently. Changes include: - Remove permitBalanceChannel() and permitBalanceSegment() methods from RoundRobinBalancer - Update ChannelLevelScoreBalancer, MultiTargetBalancer, RowCountBasedBalancer, and ScoreBasedBalancer to remove constraint checks - Allow segment balance tasks to proceed even when channel balance tasks are running - Update test cases to reflect new behavior where balance tasks no longer block each other - Improve error handling in task executor by preferring serviceable shard leaders for segment release operations - Add fallback logic to find latest shard leader when serviceable leader is not available This change improves the efficiency of load balancing by removing unnecessary coordination overhead between different types of balance operations. Signed-off-by: Wei Liu <wei.liu@zilliz.com> Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-06-03 10:16:32 +08:00
wei liu	4a05180f88	enhance: [2.5] support balancing multiple collections in single trigger (#41875 ) (#42134 ) issue: #41874 pr: #41875 - Optimize balance_checker to support balancing multiple collections simultaneously - Add new parameters for segment and channel balancing batch sizes - Add enableBalanceOnMultipleCollections parameter - Update tests for balance checker This change improves resource utilization by allowing the system to balance multiple collections in a single trigger with configurable batch sizes. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-05-28 23:18:30 +08:00
SimFG	6e18ededab	fix: [2.5] mockery too unavailable after upgrade golang version (#41522 ) - issue: ##41291 - pr: #41481 Signed-off-by: SimFG <bang.fu@zilliz.com>	2025-04-25 14:40:40 +08:00
liliu-z	cb0f984155	enhance: Revert "separate for index completed (#40873 )" (#41152 ) This reverts commit `23e579e324`. #40873 issue: #39519 Signed-off-by: Li Liu <li.liu@zilliz.com>	2025-04-08 17:36:30 +08:00
Chun Han	23e579e324	separate for index completed (#40873 ) related: https://github.com/milvus-io/milvus/issues/40781 Signed-off-by: MrPresent-Han <chun.han@gmail.com> Co-authored-by: MrPresent-Han <chun.han@gmail.com>	2025-04-05 10:20:24 +08:00
congqixia	709594f158	enhance: [2.5] Use v2 package name for pkg module (#40117 ) Cherry-pick from master pr: #39990 Related to #39095 https://go.dev/doc/modules/version-numbers Update pkg version according to golang dep version convention Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-02-23 00:46:01 +08:00
wei liu	bf54f47c34	enhance: [2.5] use rated logger for high frequency log in dist handler (#39452 ) (#39928 ) pr: #39452 Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-02-18 14:32:52 +08:00
wei liu	969e34d540	fix: [2.5]uneven distribution caused by executing task delta cache leak (#39759 ) issue: #39681 pr: #39702 this PR maintain workload effect in action instead of computing workload effect from target, which may cause leak if target changes. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-02-11 14:32:46 +08:00
wei liu	51994158d9	fix: channel unbalance during stopping balance progress (#38971 ) (#39200 ) issue: #38970 pr: #38971 cause the stopping balance channel still use the row_count_based policy, which may causes channel unbalance in multi-collection case. This PR impl a score based stopping balance channel policy. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-01-14 18:25:00 +08:00
Zhen Ye	95809ca767	enhance: make new go package to manage proto (#39128 ) issue: #39095 pr: #39114 --------- Signed-off-by: chyezh <chyezh@outlook.com>	2025-01-10 10:53:01 +08:00
wei liu	cb0618b2d4	fix: [2.5] Querycoord will trigger unexpected balance task after restart (#38725 ) issue: https://github.com/milvus-io/milvus/issues/38606 pr: https://github.com/milvus-io/milvus/pull/38630 Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-12-25 16:14:49 +08:00
wei liu	659847c11f	enhance: Remove load task limit in one round (#38436 ) the task limit in assignSegment/assignChannel will works for both load task and balance task. this PR remove the load task limit, only limit balance task num in one round. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-12-16 19:30:43 +08:00
wei liu	e279ccf109	enhance: Enable score based balance channel policy (#38143 ) issue: #38142 current balance channel policy only consider current collection's distribution, so if all collections has 1 channel, and all channels has been loaded on same querynode, after querynode num increase, balance channel won't be triggered. This PR enable score based balance channel policy, to achieve: 1. distribute all channels evenly across multiple querynodes 2. distribute each collection's channel evenly across multiple querynodes. --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-12-11 17:20:43 +08:00
tinswzy	e76802f910	enhance: refine querycoord meta/catalog related interfaces to ensure that each method includes a ctx parameter (#37916 ) issue: #35917 This PR refine the querycoord meta related interfaces to ensure that each method includes a ctx parameter. Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>	2024-11-25 11:14:34 +08:00
wei liu	965bda6e60	enhance: Add channel name to shard leader log in meta cache (#37856 ) Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-11-21 19:24:31 +08:00
wei liu	0a440e0d38	fix: Prevent simultaneous balance of segments and channels (#37850 ) issue: #33550 balance segment and balance segment execute at same time, which will cause bounch of corner case. This PR disable simultaneous balance of segments and channels Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-11-21 17:56:55 +08:00
Bingyi Sun	6851738fd1	fix: fix `make generate-mockery` panic with go1.22 (#36830 ) https://github.com/milvus-io/milvus/issues/36831 Fix `make generate-mockery` panic. Signed-off-by: sunby <sunbingyi1992@gmail.com>	2024-10-17 12:11:31 +08:00
congqixia	3fe0f82923	enhance: Add balance report log for qc balancer (#36747 ) Related to #36746 --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-10-11 10:25:24 +08:00
wei liu	470bb0cc3f	enhance: Enable balance on querynode with different mem capacity (#36466 ) issue: #36464 This PR enable balance on querynode with different mem capacity, for query node which has more mem capactity will be assigned more records, and query node with the largest difference between assignedScore and currentScore will have a higher priority to carry the new segment. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-09-30 16:15:17 +08:00
wei liu	30a99b66c1	fix: Fix logic dead lock when delegator has high memory usage (#36065 ) issue: #36064 when delegator has high memory usage, load l0 segment will failed. and balance segment task will blocked by load segment task, then delegator cann't free memory by moving out some segment, causes a logic dead lock. this PR remove the limit for balance, we permit segment and balance execute in parallel. which won't cause side effect due to: 1. one segment can only has one task in qc's scheduler, and load/release task will replace balance task if necessary 2. balance speed has been limited, and it won't block load segment task. 3. if collection has load task and balance task at same time, load task will be scheduled first due to high proirity. --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-09-09 10:21:06 +08:00
wei liu	e09dc3be58	enhance: Mark query node as read only after suspend (#35492 ) issue: #34985 #35493 after querynode has been suspended, it's not allow to load segment/channel on it, which means the node is read only. to be compatible with resource group design, after query node has been suspend, we remove it from it's original resource group, make it a read only query node in replica. then two things will happens: 1. it's original resource group will be lacking of query nodes, query coord will assign new node to it. 2. querycoord will try to move out all segments/channels after querynode has been suspended Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-08-20 14:02:54 +08:00
Bingyi Sun	ae1b81ac1a	fix: fix panic when generating plans (#35309 ) issue: https://github.com/milvus-io/milvus/issues/35335 Signed-off-by: sunby <sunbingyi1992@gmail.com>	2024-08-07 18:14:16 +08:00
jaime	fcec4c21b9	fix: check collection health(queryable) fail for releasing collection (#34947 ) issue: #34946 Signed-off-by: jaime <yun.zhang@zilliz.com>	2024-08-02 17:20:15 +08:00
wei liu	27b6d58981	fix: Set legacy level to l0 segment after qc restart (#35197 ) issue: #35087 after qc restarts, and target is not ready yet, if dist_handler try to update segment dist, it will set legacy level to l0 segment, which may cause l0 segment be moved to other node, cause search/query failed. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-08-02 10:18:13 +08:00
wei liu	e9d61daa3f	enhance: Reduce delegator memory overloaded factor to 0.1 (#35092 ) Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-08-01 10:21:50 +08:00
wei liu	03912a8788	enhance: Avoid balance stuck after segment list become stable (#34728 ) issue: #34715 if collection's segment list doesn't changes anymore, then the next target will be empty at most time, and balance segment will check whether segment exist in both current and next target, so the balance cloud be blocked due to next target is empty. This PR permit segment to be moved if next target is empty, to avoid balance stuck. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-07-31 18:09:48 +08:00
congqixia	de8a266d8a	enhance: Enable linux code checker (#35084 ) See also #34483 --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-07-30 15:53:51 +08:00
wei liu	166fc902b0	enhance: Limit collection's normal balance speed (#34810 ) issue: #34798 after we remove the task priority on query coord, to avoid load/release segment blocked by too much balance task, we limit the balance task size in each round. at same time, we reduce the balance interval to trigger balance more frequently. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-07-24 19:11:44 +08:00
wei liu	92de49e38c	fix: Segment may bounce between delegator and worker (#34830 ) issue: #34595 pr#34596 to we add an overloaded factor to segment in delegator, which cause same segment got different score in delegator and worker. which may cause segment bounce between delegator and worker. This PR use average score to compute the delegator overloaded factor, to avoid segment bounce between delegator and worker. --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-07-23 15:57:49 +08:00
wei liu	acb33bba4d	enhance: Preserve fixed-size memory in delegator node for growing segment. (#34596 ) issue: #34595 When consuming insert data on the delegator node, QueryCoord will move out some sealed segments to manage its memory usage. After the growing segment gets flushed, some sealed segments from other workers will be moved back to the delegator node. To avoid the frequent movement of segments, we estimate the maximum growing row count and preserve a fixed-size memory in the delegator node. --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-07-15 20:51:46 +08:00
wei liu	8123bea1ae	enhance: Avoid assign too much segment/channels to new querynode (#34096 ) issue: #34095 When a new query node comes online, the segment_checker, channel_checker, and balance_checker simultaneously attempt to allocate segments to it. If this occurs during the execution of a load task and the distribution of the new query node hasn't been updated, the query coordinator may mistakenly view the new query node as empty. As a result, it assigns segments or channels to it, potentially overloading the new query node with more segments or channels than expected. This PR measures the workload of the executing tasks on the target query node to prevent assigning an excessive number of segments to it. --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-06-27 19:06:05 +08:00
jaime	9630974fbb	enhance: move rocksmq from internal to pkg module (#33881 ) issue: #33956 Signed-off-by: jaime <yun.zhang@zilliz.com>	2024-06-25 21:18:15 +08:00
yihao.dai	3540eee977	enhance: Support L0 import (#33514 ) issue: https://github.com/milvus-io/milvus/issues/33157 --------- Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2024-06-07 14:17:20 +08:00
wei liu	a7f6193bfc	fix: query node may stuck at stopping progress (#33104 ) issue: #33103 when try to do stopping balance for stopping query node, balancer will try to get node list from replica.GetNodes, then check whether node is stopping, if so, stopping balance will be triggered for this replica. after the replica refactor, replica.GetNodes only return rwNodes, and the stopping node maintains in roNodes, so balancer couldn't find replica which contains stopping node, and stopping balance for replica won't be triggered, then query node will stuck forever due to segment/channel doesn't move out. --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-05-20 10:21:38 +08:00
wei liu	e2332bdc17	enhance: Enable channel exclusive balance policy (#32911 ) issue: #32910 * split replica's node list to channels when create replicas * balance nodes among channels when node change happens * implement channel level balance, let balance happens in channel level Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-05-10 17:27:31 +08:00
Xiaofan	02ace25c68	enhance: reduce the cpu usage when collection number is high (#32245 ) related to #32165 1. for all the manager, support collection level index 2. remove collection level filter to avoid extra cpu usage when collection number increases Signed-off-by: xiaofanluan <xiaofan.luan@zilliz.com>	2024-04-26 11:49:25 +08:00
wei liu	4822b109bd	fix: Skip to load l0 segment on old version query node (#32124 ) issue: #32107 during rolling upgrade progress, skip to load l0 segment on old version query node --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-04-15 11:23:23 +08:00
chyezh	48fe977a9d	enhance: declarative resource group api (#31930 ) issue: #30647 - Add declarative resource group api - Add config for resource group management - Resource group recovery enhancement --------- Signed-off-by: chyezh <chyezh@outlook.com>	2024-04-15 08:13:19 +08:00
wei liu	c4806b69c4	enhance: Refactor leader view manager interface (#31133 ) issue: #31091 This PR add GetByFilter interface in leader view manager, instead of all kind of get func --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-04-10 15:13:36 +08:00
chyezh	a2502bde75	enhance: replica manager enhancement (#31496 ) issue: #30647 - ReplicaManager manage read only node now, and always do persistent of node distribution of replica. - All segment/channel checker using ReplicaManager to get read-only node or read-write node, but not ResourceManager. - ReplicaManager promise that only apply unique querynode to one replica in same collection now (replicas in same collection never hold same querynode at same time). - ReplicaManager promise that fairly node count assignment policy if multi replicas of collection is assigned to one resource group. - Move some parameters check into ReplicaManager to avoid data race. - Allow transfer replica to resource group that already load replica of same collection - Allow transfer node between resource groups that load replica of same collection --------- Signed-off-by: chyezh <chyezh@outlook.com>	2024-04-05 04:57:16 +08:00
wei liu	0944a1f790	enhance: Refactor channel dist manager interface (#31119 ) issue: #31091 This PR add GetByFilter interface in channel dist manager, instead of all kind of get func --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-04-02 10:23:14 +08:00
wei liu	92971707de	enhance: Add restful api for devops to execute rolling upgrade (#29998 ) issue: #29261 This PR Add restful api for devops to execute rolling upgrade, including suspend/resume balance and manual transfer segments/channels. --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-03-27 16:15:19 +08:00
chyezh	9f9ef8ac32	enhance: transfer resource group and dbname to querynode when load (#30936 ) issue: #30931 Signed-off-by: chyezh <chyezh@outlook.com>	2024-03-21 11:59:12 +08:00
chyezh	ff4237bb90	enhance: add hostname into node info (#30673 ) issue: https://github.com/milvus-io/milvus/issues/30647 - Address may be reused in k8s environment. Using hostname can be better. Signed-off-by: chyezh <chyezh@outlook.com>	2024-03-15 10:45:06 +08:00
wei liu	06b191b164	fix: Balance channel stuck forever due to logic dead lock (#31202 ) issue: #30816 cause balance channel will stuck until leader view catch up the current target, then start to unsub the old delegator. which make sure that the new delegator can provide search before release old delegator. but another logic in segment_checker skip loading segment during balance channel. so during balance channel, if query node crash, new delegator can't catch up target forever, then stuck forever. This PR remove the rule that skip loading segment during balance channel to avoid the logic dead lock here. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-03-13 15:05:04 +08:00
wei liu	06df9b8462	fix: Balance segment/channel won't be trigger on multi replicas (#31107 ) issue: #30983 #30982 cause balancer call wrong interface to get segment/channel list in replica, then got a wrong average segment/channel number, which make each node have less segment/channel than average, and the balance won't be trigger in multi replica case. This PR fix that balance segment/channel won't be trigger on multi replicas Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-03-11 20:35:04 +08:00
wei liu	efe8cecc88	enhance: refactor segment dist manager interface (#31073 ) issue: #31091 This PR add `GetByFilter` interface in segment dist manager, instead of all kind of get func Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-03-08 16:29:01 +08:00
Bingyi Sun	ece9d273a7	enhance: some patches for #30636 (#30664 ) Signed-off-by: sunby <sunbingyi1992@gmail.com>	2024-02-26 11:42:55 +08:00
Bingyi Sun	564b12c661	enhance: make balance cost threshold configurable (#30636 ) Signed-off-by: sunby <sunbingyi1992@gmail.com>	2024-02-19 15:24:50 +08:00

1 2 3

111 Commits (2.5)