milvus

Commit Graph

Author	SHA1	Message	Date
wei liu	75463725b3	fix: skip loading non-existent L0 segments to prevent load blocking (#43576 ) issue: #43557 In 2.5 branch, L0 segments must be loaded before other segments. If an L0 segment has been garbage collected but is still in the target list, the load operation would keep failing, preventing other segments from being loaded. This patch adds a segment existence check for L0 segments in getSealedSegmentDiff. Only L0 segments that actually exist will be included in the load list. Changes: - Add checkSegmentExist function parameter to SegmentChecker constructor - Filter L0 segments by existence check in getSealedSegmentDiff - Add unit tests using mockey to verify the fix behavior Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-07-31 14:33:38 +08:00
congqixia	3d58b2ecee	fix: [2.5] Make controller wait checker worker quit (#42704 ) (#42726 ) Cherry-pick from master pr: #42704 Related to #42702 This patch add wait logic for `CheckerController` Nil check already exists due to code branching Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-06-16 15:14:38 +08:00
wei liu	d2ff390a52	fix: Segment may be released prematurely during balance channel (#42043 ) issue: #41143 pr: #42090 Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-05-29 18:36:35 +08:00
wei liu	4a05180f88	enhance: [2.5] support balancing multiple collections in single trigger (#41875 ) (#42134 ) issue: #41874 pr: #41875 - Optimize balance_checker to support balancing multiple collections simultaneously - Add new parameters for segment and channel balancing batch sizes - Add enableBalanceOnMultipleCollections parameter - Update tests for balance checker This change improves resource utilization by allowing the system to balance multiple collections in a single trigger with configurable batch sizes. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-05-28 23:18:30 +08:00
wei liu	2e8445c2ef	fix: balance checker may enter infinite normal balance loop after balance suspension (#41196 ) issue: #41194 pr: #41195 - Refactor hasUnbalancedCollection flag handling to function scope - Ensure tracking sets clearance when no balance needed - Add deferred cleanup for both normal/stopping balance paths - Add unit tests for collection tracking scenarios The changes ensure tracking sets (normalBalanceCollectionsCurrentRound and stoppingBalanceCollectionsCurrentRound) are properly cleared when: - All collections in current round are balanced - Balance checks return early due to unready targets - Balance feature flags are disabled Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-04-10 15:18:28 +08:00
liliu-z	cb0f984155	enhance: Revert "separate for index completed (#40873 )" (#41152 ) This reverts commit `23e579e324`. #40873 issue: #39519 Signed-off-by: Li Liu <li.liu@zilliz.com>	2025-04-08 17:36:30 +08:00
Chun Han	23e579e324	separate for index completed (#40873 ) related: https://github.com/milvus-io/milvus/issues/40781 Signed-off-by: MrPresent-Han <chun.han@gmail.com> Co-authored-by: MrPresent-Han <chun.han@gmail.com>	2025-04-05 10:20:24 +08:00
wei liu	37a533fe6d	fix: [2.5] Address manual balance and balance check issues (#41038 ) issue: #37651 pr: #41037 - Fix context propagation for manual balance segment task creation from PR #38080. - Optimize stopping balance by preventing redundant checks per round, addressing performance regression from PR #40297. - Decrease default `checkBalanceInterval` from 3000ms to 300ms. - Correct minor log messages in `BalanceChecker`. --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-04-03 01:26:23 +08:00
Xianhui Lin	249d5b9b41	fix: jsonstats check if cache schema is nil lazy describecollection (#41068 ) fix: jsonstats check if cache schema is nil lazy describecollection pr:https://github.com/milvus-io/milvus/pull/38039 issue:https://github.com/milvus-io/milvus/issues/36995 --------- Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com>	2025-04-03 00:32:21 +08:00
wei liu	d185a8f941	enhance: Balance the collection with the largest row count first (#40958 ) issue: #37651 pr: #40297 this PR enable to balance the collection with largest row count first, to avoid temporary migration of small table data to new nodes during their onboarding, only to be moved out again after the large table balance, which would cause unnecessary load. --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-03-31 16:14:21 +08:00
wei liu	b64bb63e77	enhance: [2.5] Add trigger interval config for auto balance (#39154 ) (#39918 ) issue: #39156 pr: #39154 Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-03-27 16:40:23 +08:00
Xianhui Lin	8bdff401a3	fix: fix indexchecker schema released (#40809 ) pr:https://github.com/milvus-io/milvus/pull/38039 issue:https://github.com/milvus-io/milvus/issues/36995 Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com>	2025-03-20 18:05:22 +08:00
Xianhui Lin	705b3c90a5	fix: Failed to rolling upgrade from v2.5.6 to new 2.5 version when enable JsonKeyStats (#40661 ) fix: Failed to rolling upgrade from v2.5.6 to new 2.5 version when enable JsonKeyStats. The reason is that the file path of the jsonkeyindex has changed. issue: https://github.com/milvus-io/milvus/issues/40649, https://github.com/milvus-io/milvus/issues/40669 https://github.com/milvus-io/milvus/issues/40707 master-pr: https://github.com/milvus-io/milvus/pull/38039 --------- Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com>	2025-03-18 17:32:16 +08:00
Xianhui Lin	f5e9dea2aa	fix: [2.5]fix the garbage cleanup logic of jsonkey stats && improve json key stats filer (#40039 ) fix: fix the garbage collection cleanup logic of jsonkey stats && improve json key stats filer issue: https://github.com/milvus-io/milvus/issues/36995 https://github.com/milvus-io/milvus/issues/40034 https://github.com/milvus-io/milvus/issues/40041 https://github.com/milvus-io/milvus/issues/40106 https://github.com/milvus-io/milvus/issues/40138 pr: https://github.com/milvus-io/milvus/pull/38039 --------- Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com>	2025-03-13 20:18:10 +08:00
Bingyi Sun	683b26ffb7	feat: cherry pick json path index (#40313 ) issue: #35528 pr: #36750 this pr includes json path index pr and some related prs: 1. update tantivy version #39253 2. json path index #36750 3. fall back to brute force #40076 4. term filter #40140 5. bug fix #40336 --------- Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-03-10 22:14:05 +08:00
Xianhui Lin	a4eb2ce224	fix: [2.5]Revert qc statschecker for json key stats (#40125 ) Revert qc statschecker for json key stats issue:https://github.com/milvus-io/milvus/issues/36995 pr:https://github.com/milvus-io/milvus/pull/39876 Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com>	2025-02-24 13:31:55 +08:00
congqixia	709594f158	enhance: [2.5] Use v2 package name for pkg module (#40117 ) Cherry-pick from master pr: #39990 Related to #39095 https://go.dev/doc/modules/version-numbers Update pkg version according to golang dep version convention Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-02-23 00:46:01 +08:00
Xianhui Lin	c1de61ff7c	fix: [2.5]Replace the position of EnabledJSONKeyStats (#40108 ) Replace the position of EnabledJSONKeyStats issue: https://github.com/milvus-io/milvus/issues/36995 pr: https://github.com/milvus-io/milvus/pull/38039 --------- Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com>	2025-02-22 14:35:54 +08:00
wei liu	e42c944e04	fix: [2.5] querycoord panic in corner case (#40058 ) issue: #40050 pr: #40057 Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-02-21 11:19:58 +08:00
Xianhui Lin	f0964f769d	enhance: [2.5]Add json key inverted index in stats for optimization (#39876 ) Add json key inverted index in stats for optimization issue: https://github.com/milvus-io/milvus/issues/36995 pr: https://github.com/milvus-io/milvus/pull/38039 --------- Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2025-02-16 20:12:15 +08:00
wei liu	969e34d540	fix: [2.5]uneven distribution caused by executing task delta cache leak (#39759 ) issue: #39681 pr: #39702 this PR maintain workload effect in action instead of computing workload effect from target, which may cause leak if target changes. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-02-11 14:32:46 +08:00
wei liu	4fd56e4773	fix: Prevent leader checker from generating excessive duplicate leader tasks (#39000 ) (#39160 ) issue: #39001 pr: #39000 Background: Segment Load Version: Each segment load request assigns a timestamp as its version. When multiple copies of a segment are loaded on different QueryNodes, the leader checker uses this version to identify the latest copy and updates the routing table in the leader view to point to it. Delegator Router Version: When a delegator builds a route to a QueryNode that has loaded a segment, it also records the segment's version. Router Table Update Logic: If the leader checker detects that the version of a segment in the routing table does not match the version in the worker, it updates the routing table to point to the QueryNode with the latest version. Additionally, it updates the segment's load version in the QueryNode during this process. Issue: When a channel is undergoing load balancing, the leader checker may sync the routing table to a new delegator. This sync operation modifies the segment's load version, which invalidates the routing in the old delegator. Subsequently, the leader checker updates the routing table in the old delegator, breaking the routing in the new delegator. This cycle continues, causing repeated updates and inconsistencies. Fix: This PR introduces two changes to address the issue: 1. Use NodeID to verify whether the delegator's routing table needs an update, avoiding unnecessary modifications. 2. Ensure compatibility by using the latest segment's load version as the version recorded in the routing table. These changes resolve the cyclic updates and prevent the leader checker from generating excessive duplicate tasks, ensuring routing stability across delegators during load balancing. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-01-14 18:11:06 +08:00
Zhen Ye	95809ca767	enhance: make new go package to manage proto (#39128 ) issue: #39095 pr: #39114 --------- Signed-off-by: chyezh <chyezh@outlook.com>	2025-01-10 10:53:01 +08:00
wei liu	cb0618b2d4	fix: [2.5] Querycoord will trigger unexpected balance task after restart (#38725 ) issue: https://github.com/milvus-io/milvus/issues/38606 pr: https://github.com/milvus-io/milvus/pull/38630 Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-12-25 16:14:49 +08:00
wei liu	659847c11f	enhance: Remove load task limit in one round (#38436 ) The task limit in assignSegment/assignChannel applies to both load tasks and balance tasks. This PR removes the load task limit and only limits the number of balance tasks per round. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-12-16 19:30:43 +08:00
tinswzy	27229f7907	enhance: refine exists log print with ctx (#38080 ) issue: #35917 Refines exists log print with ctx Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>	2024-12-14 22:36:44 +08:00
wei liu	e279ccf109	enhance: Enable score based balance channel policy (#38143 ) issue: #38142 The current balance channel policy only considers the current collection's distribution, so if every collection has 1 channel and all channels have been loaded on the same querynode, balance channel won't be triggered after the querynode count increases. This PR enables a score based balance channel policy to achieve: 1. distribute all channels evenly across multiple querynodes 2. distribute each collection's channels evenly across multiple querynodes. --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-12-11 17:20:43 +08:00
tinswzy	e76802f910	enhance: refine querycoord meta/catalog related interfaces to ensure that each method includes a ctx parameter (#37916 ) issue: #35917 This PR refine the querycoord meta related interfaces to ensure that each method includes a ctx parameter. Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>	2024-11-25 11:14:34 +08:00
yihao.dai	b6612e02b4	enhance: Reduce GetIndexInfos calls (#37695 ) Batch `GetIndexInfos` calls for segments to reduce RPC calls. issue: https://github.com/milvus-io/milvus/issues/37634 --------- Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2024-11-19 14:24:31 +08:00
congqixia	9539739781	enhance: Release compacted growing segment if in dropped list (#37245 ) See also #37205 Previously releasing growing segments could be triggered by two conditions: - Sealed Segment with same id is loaded - Segment start position is before target checkpoint ts Which has a worst case that the corresponding sealed segment is compacted and the checkpoint is pinned by a growing l0 segment. This PR introduces a new rule that: a growing segment could be released if the segment id appeared in current target dropped segment id list. Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-10-29 18:04:21 +08:00
jaime	9d16b972ea	feat: add tasks page into management WebUI (#37002 ) issue: #36621 1. Add API to access task runtime metrics, including: - build index task - compaction task - import task - balance (including load/release of segments/channels and some leader tasks on querycoord) - sync task 2. Add a debug model to the webpage by using debug=true or debug=false in the URL query parameters to enable or disable debug mode. Signed-off-by: jaime <yun.zhang@zilliz.com>	2024-10-28 10:13:29 +08:00
wei liu	39a91eb100	fix: Delegator may become unserviceable after querycoord restart (#37055 ) issue: #37054 After a querycoord restart, segment_checker may release segments by mistake because the next target isn't ready yet. This PR requires that segment release happen only after the next target is ready. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-10-24 12:21:28 +08:00
wei liu	3cd0b26285	enhance: Enable dynamic update loaded collection's replica (#35822 ) issue: #35821 After a collection is loaded, increasing or decreasing its replica number requires releasing and loading it again. milvus offers 4 solution to update loaded collection's replica; this PR aims to change the replica number dynamically without a release. After the replica number changes, milvus loads or releases replicas asynchronously, and the replica load status can be checked with the getReplicas API. Note that if more replicas are requested than the querynodes can afford, the new replicas won't be loaded successfully until enough querynodes join. --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-09-25 10:13:18 +08:00
wei liu	fb2a41a94c	fix: Clean dirty segment/channel on querynode (#36202 ) issue: #36201 After a querynode has been removed from a replica, all dirty segments/channels on it should be released. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-09-13 18:15:08 +08:00
wei liu	30a99b66c1	fix: Fix logic deadlock when delegator has high memory usage (#36065 ) issue: #36064 When the delegator has high memory usage, loading an L0 segment fails, and balance segment tasks are blocked by the load segment task, so the delegator can't free memory by moving segments out, which causes a logical deadlock. This PR removes the limit for balance and permits load and balance tasks to execute in parallel, which causes no side effect because: 1. one segment can only have one task in qc's scheduler, and a load/release task will replace the balance task if necessary; 2. balance speed has been limited, so it won't block load segment tasks; 3. if a collection has a load task and a balance task at the same time, the load task is scheduled first due to its higher priority. --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-09-09 10:21:06 +08:00
wei liu	c84ea5465c	fix: Fix some replicas don't participate in the query after the failure recovery (#35850 ) issue: #35846 querycoord will notify proxy to update shard leader cache after delegator location changes, but during querynode's failure recovery, some delegator may become unserviceable due to lacking of segments, and back to serviceable after segment loaded, so we also need to notify proxy to invalidate shard leader cache when delegator serviceable state changes. This PR will maintain querynode's serviceable state during heartbeat, and notify proxy to invalidate shard leader cache if serviceable state changes. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-09-03 15:39:03 +08:00
congqixia	86691656f3	enhance: Change frequent balancer debug log to rated one (#35749 ) The "skip balance" log is too frequent at debug level. This PR changes it into a rated one. Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-08-29 10:07:00 +08:00
Chun Han	3faef63a25	enhance: add log for partition stats( #30376 ) (#35219 ) related: #30376 Signed-off-by: MrPresent-Han <chun.han@gmail.com> Co-authored-by: MrPresent-Han <chun.han@gmail.com>	2024-08-02 19:34:22 +08:00
wei liu	166fc902b0	enhance: Limit collection's normal balance speed (#34810 ) issue: #34798 after we remove the task priority on query coord, to avoid load/release segment blocked by too much balance task, we limit the balance task size in each round. at same time, we reduce the balance interval to trigger balance more frequently. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-07-24 19:11:44 +08:00
wei liu	40e39ef7c9	fix: Avoid segment lack caused by deduplicate segment task (#34782 ) issue: #34781 When a balance segment task hasn't finished yet, query coord may find 2 loaded copies of a segment and generate a task to deduplicate them, which may cancel the balance task. The old copy is then released while the new copy, not yet ready, is canceled, so search fails due to the missing segment. This PR sets the deduplicate segment task's priority to low, to avoid the balance segment task being canceled by the deduplicate task. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-07-22 16:35:43 +08:00
congqixia	b284b81a47	fix: Check partition in current target when observing partition load status (#34282 ) See also #34234 `LoadPartitions` does not guarantee the current target has loading partitions if there are some partitions already loaded before. This PR check current target contains the partition to load when advancing loading percentage to 100. Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-07-01 17:40:07 +08:00
wei liu	f7ecafe77d	enhance: Skip update index for L0 segment (#34099 ) Trying to update the index for an L0 segment fails with `index not found`. This PR skips updating the index for L0 segments. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-07-01 10:26:06 +08:00
jaime	9630974fbb	enhance: move rocksmq from internal to pkg module (#33881 ) issue: #33956 Signed-off-by: jaime <yun.zhang@zilliz.com>	2024-06-25 21:18:15 +08:00
Chun Han	f7af323d1e	fix: sync partition stats blocking balance task (#33741 ) (#33742 ) related: #33741 Signed-off-by: MrPresent-Han <chun.han@zilliz.com>	2024-06-11 14:21:56 +08:00
wayblink	a1232fafda	feat: Major compaction (#33620 ) #30633 Signed-off-by: wayblink <anyang.wang@zilliz.com> Co-authored-by: MrPresent-Han <chun.han@zilliz.com>	2024-06-10 21:34:08 +08:00
wei liu	2013d97243	enhance: Enable to dynamic update balancer policy in querycoord (#33037 ) issue: #33036 This PR enable to dynamic update balancer policy without restart querycoord. --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-05-21 14:29:39 +08:00
wei liu	a7f6193bfc	fix: query node may get stuck in stopping progress (#33104 ) issue: #33103 When doing stopping balance for a stopping query node, the balancer gets the node list from replica.GetNodes and checks whether each node is stopping; if so, stopping balance is triggered for this replica. After the replica refactor, replica.GetNodes only returns rwNodes, while the stopping node is kept in roNodes, so the balancer can't find the replica that contains the stopping node. Stopping balance for that replica is never triggered, and the query node is stuck forever because its segments/channels don't move out. --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-05-20 10:21:38 +08:00
wei liu	e2332bdc17	enhance: Enable channel exclusive balance policy (#32911 ) issue: #32910 * split replica's node list to channels when create replicas * balance nodes among channels when node change happens * implement channel level balance, let balance happens in channel level Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-05-10 17:27:31 +08:00
wei liu	fad8f0afa5	enhance: enable stopping balance after balance has been suspended (#32812 ) issue: #32811 Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-05-08 10:15:29 +08:00
wei liu	ba02d54a30	enhance: update shard leader cache when leader location changed (#32470 ) issue: #32466 This PR updates the proxy's shard leader cache when a shard leader's location changes, so that in a query node failover case the proxy can find the recovered replica. --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-05-08 10:05:29 +08:00
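Commit 75463725b3 describes a filter in getSealedSegmentDiff. On the 2.5 branch, L0 segments must load before other segments, so an L0 segment that has already been garbage collected but is still in the target would block every other load. The commit keeps such an L0 segment in the load list only if it still exists. The following is a minimal Go sketch of that rule, not the actual QueryCoord code. Only the name `checkSegmentExist` comes from the commit message. `Segment`, `SegmentLevel`, and `filterLoadList` are hypothetical simplifications.

```go
package main

import "fmt"

// SegmentLevel is a hypothetical stand-in for the segment level enum.
type SegmentLevel int

const (
	LevelL0 SegmentLevel = iota
	LevelL1
)

// Segment is a hypothetical, simplified view of a sealed segment in the target.
type Segment struct {
	ID    int64
	Level SegmentLevel
}

// filterLoadList keeps every non-L0 segment unchanged, but drops any L0
// segment for which checkSegmentExist reports false (e.g. already garbage
// collected), so a load that can never succeed does not block the others.
func filterLoadList(toLoad []Segment, checkSegmentExist func(id int64) bool) []Segment {
	out := make([]Segment, 0, len(toLoad))
	for _, s := range toLoad {
		if s.Level == LevelL0 && !checkSegmentExist(s.ID) {
			continue // L0 segment no longer exists: skip it
		}
		out = append(out, s)
	}
	return out
}

func main() {
	// Hypothetical existence data: segments 1 and 3 still exist.
	existing := map[int64]bool{1: true, 3: true}
	exists := func(id int64) bool { return existing[id] }

	target := []Segment{
		{ID: 1, Level: LevelL0}, // exists: kept
		{ID: 2, Level: LevelL0}, // missing L0: skipped
		{ID: 3, Level: LevelL1}, // non-L0: kept
		{ID: 4, Level: LevelL1}, // non-L0: kept (existence is only checked for L0)
	}
	fmt.Println(filterLoadList(target, exists))
}
```

Passing the existence check in as a function value mirrors the commit's note that `checkSegmentExist` is a constructor parameter of SegmentChecker. That style also lets tests swap in a stub without touching storage.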


147 Commits (2.5)