milvus

Author	SHA1	Message	Date
wei liu	39754af727	fix: Fix L0 segment loading delegator selection in QueryCoord (#43795) issue: #43794 Fix the issue where appropriate delegators were not correctly selected for L0 segments during loading, which could cause load failures or incorrect delegator assignments. Changes include: - Add special handling for L0 segments in delegator selection logic - Find delegators that are missing the L0 segment for direct loading - Fall back to existing serviceable delegator selection when no suitable delegator is found for L0 segments - Add comprehensive test coverage for L0 segment loading scenarios - Test delegator selection when some delegators are missing segments - Test fallback behavior when all delegators already have the segment - Test error handling when no delegators are available Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-08-19 16:35:47 +08:00
wei liu	80d1ef74ce	fix: apply load config changes failed after restart (#43555) issue: #43107 pr: #43554 --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-08-01 20:17:37 +08:00
wei liu	75463725b3	fix: skip loading non-existent L0 segments to prevent load blocking (#43576) issue: #43557 In 2.5 branch, L0 segments must be loaded before other segments. If an L0 segment has been garbage collected but is still in the target list, the load operation would keep failing, preventing other segments from being loaded. This patch adds a segment existence check for L0 segments in getSealedSegmentDiff. Only L0 segments that actually exist will be included in the load list. Changes: - Add checkSegmentExist function parameter to SegmentChecker constructor - Filter L0 segments by existence check in getSealedSegmentDiff - Add unit tests using mockey to verify the fix behavior Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-07-31 14:33:38 +08:00
wei liu	4631657304	fix: Unstable integration case TestBalanceOnSingleReplica (#43552) issue: #42930 Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-07-25 10:52:55 +08:00
wei liu	ad0bf9cad8	enhance: Optimize channel node balancing for uneven QN distribution (#42786) (#43423) issue: #42860 pr: #42786 Fix channel node allocation when QueryNode count is not a multiple of channel count. The previous algorithm used simple division which caused uneven distribution with remainders. Key improvements: - Implement smart remainder distribution algorithm - Refactor large function into focused helper functions - Support two-phase rebalancing (release then allocate) - Handle edge cases like insufficient nodes gracefully --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-07-21 17:04:54 +08:00
wei liu	b08d9efe69	fix: Prevent delegator unserviceable due to shard leader change (#42689) (#43309) issue: #42098 #42404 pr: #42689 Fix critical issue where concurrent balance segment and balance channel operations cause delegator view inconsistency. When shard leader switches between load and release phases of segment balance, it results in loading segments on old delegator but releasing on new delegator, making the new delegator unserviceable. The root cause is that balance segment modifies delegator views, and if these modifications happen on different delegators due to leader change, it corrupts the delegator state and affects query availability. Changes include: - Add shardLeaderID field to SegmentTask to track delegator for load - Record shard leader ID during segment loading in move operations - Skip release if shard leader changed from the one used for loading - Add comprehensive unit tests for leader change scenarios This ensures balance segment operations are atomic on single delegator, preventing view corruption and maintaining delegator serviceability. --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-07-15 17:46:51 +08:00
wei liu	4952b8c416	enhance: apply load config changes after QueryCoord restart (#43108) (#43236) issue: #43107 pr: #43108 - Add checkLoadConfigChanges() to apply load config during startup - Call config check in startQueryCoord() after restart - Skip auto-updates for collections with user-specified replica numbers - Add is_user_specified_replica_mode field to preserve user settings - Add comprehensive unit tests with mockey Ensures existing collections use latest cluster-level config after restart. --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-07-14 10:22:50 +08:00
congqixia	2531ebda27	fix: [2.5] Check field mmap property before apply collection level one (#43091) Cherry-pick from master pr: #43090 Related to #43089 --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-07-03 14:32:45 +08:00
congqixia	3d58b2ecee	fix: [2.5] Make controller wait checker worker quit (#42704) (#42726) Cherry-pick from master pr: #42704 Related to #42702 This patch adds wait logic for `CheckerController`. The nil check already exists due to code branching. Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-06-16 15:14:38 +08:00
Zhen Ye	edca441eae	fix: filter the streaming query node from resource group when upgrading (#42594) issue: #42492 pr: #38677 - filter out the streaming query node from 2.6.0 to avoid loading sealed segments on it. Signed-off-by: chyezh <chyezh@outlook.com>	2025-06-09 22:10:35 +08:00
wei liu	f06de7eca6	fix: Fix delegator selection logic in releaseSegment (#42572) issue: #42568 Fix incorrect delegator selection during the segment release process, which was introduced by pr #42410 - Add serviceable filter to prioritize available shard leaders - Fix fallback logic with channel-specific lookup - Add early return when no leader found - Add comprehensive unit tests for all scenarios Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-06-06 19:24:33 +08:00
Xianhui Lin	a1927e22a5	fix: add ShowLoadCollections and ShowLoadPartitions for mixcoord compatibility (#42514) issue: https://github.com/milvus-io/milvus/issues/42492 Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com>	2025-06-05 15:46:33 +08:00
wei liu	b298218a29	enhance: [2.5] Remove balance constraints between channel and segment tasks (#42410) issue: #42176 pr: #42177 Remove the mutual exclusion constraints between channel and segment balance tasks to allow them to run concurrently. Changes include: - Remove permitBalanceChannel() and permitBalanceSegment() methods from RoundRobinBalancer - Update ChannelLevelScoreBalancer, MultiTargetBalancer, RowCountBasedBalancer, and ScoreBasedBalancer to remove constraint checks - Allow segment balance tasks to proceed even when channel balance tasks are running - Update test cases to reflect new behavior where balance tasks no longer block each other - Improve error handling in task executor by preferring serviceable shard leaders for segment release operations - Add fallback logic to find latest shard leader when serviceable leader is not available This change improves the efficiency of load balancing by removing unnecessary coordination overhead between different types of balance operations. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-06-03 10:16:32 +08:00
wei liu	d2ff390a52	fix: Segment may be released prematurely during balance channel (#42043) issue: #41143 pr: #42090 Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-05-29 18:36:35 +08:00
aoiasd	198ff1f150	enhance: [2.5] support run analyzer by loaded collection field (#42119) relate: https://github.com/milvus-io/milvus/issues/42094 pr: https://github.com/milvus-io/milvus/pull/42113 Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>	2025-05-29 10:26:30 +08:00
wei liu	4a05180f88	enhance: [2.5] support balancing multiple collections in single trigger (#41875) (#42134) issue: #41874 pr: #41875 - Optimize balance_checker to support balancing multiple collections simultaneously - Add new parameters for segment and channel balancing batch sizes - Add enableBalanceOnMultipleCollections parameter - Update tests for balance checker This change improves resource utilization by allowing the system to balance multiple collections in a single trigger with configurable batch sizes. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-05-28 23:18:30 +08:00
yihao.dai	7c8370ccd2	fix: [2.5] Fix ants.Pool goroutine leak (#41893) 1. Release the pool after it is no longer in use. 2. Upgrade ants.Pool to fix the goroutine leak issue (see https://github.com/panjf2000/ants/pull/287). issue: https://github.com/milvus-io/milvus/issues/41838 pr: https://github.com/milvus-io/milvus/pull/41892 --------- Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2025-05-16 19:12:22 +08:00
SimFG	6e18ededab	fix: [2.5] mockery tool unavailable after upgrading golang version (#41522) - issue: #41291 - pr: #41481 Signed-off-by: SimFG <bang.fu@zilliz.com>	2025-04-25 14:40:40 +08:00
SimFG	18eb627533	fix: [2.5] Update logging context and upgrade dependencies (#41319) - issue: #41291 - pr: #41318 --------- Signed-off-by: SimFG <bang.fu@zilliz.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2025-04-24 23:50:40 +08:00
wei liu	2e8445c2ef	fix: balance checker may enter infinite normal balance loop after balance suspension (#41196) issue: #41194 pr: #41195 - Refactor hasUnbalancedCollection flag handling to function scope - Ensure tracking sets clearance when no balance needed - Add deferred cleanup for both normal/stopping balance paths - Add unit tests for collection tracking scenarios The changes ensure tracking sets (normalBalanceCollectionsCurrentRound and stoppingBalanceCollectionsCurrentRound) are properly cleared when: - All collections in current round are balanced - Balance checks return early due to unready targets - Balance feature flags are disabled Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-04-10 15:18:28 +08:00
liliu-z	cb0f984155	enhance: Revert "separate for index completed (#40873)" (#41152) This reverts commit `23e579e324`. #40873 issue: #39519 Signed-off-by: Li Liu <li.liu@zilliz.com>	2025-04-08 17:36:30 +08:00
Chun Han	23e579e324	separate for index completed (#40873) related: https://github.com/milvus-io/milvus/issues/40781 Signed-off-by: MrPresent-Han <chun.han@gmail.com> Co-authored-by: MrPresent-Han <chun.han@gmail.com>	2025-04-05 10:20:24 +08:00
wei liu	37a533fe6d	fix: [2.5] Address manual balance and balance check issues (#41038) issue: #37651 pr: #41037 - Fix context propagation for manual balance segment task creation from PR #38080. - Optimize stopping balance by preventing redundant checks per round, addressing performance regression from PR #40297. - Decrease default `checkBalanceInterval` from 3000ms to 300ms. - Correct minor log messages in `BalanceChecker`. --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-04-03 01:26:23 +08:00
Xianhui Lin	249d5b9b41	fix: jsonstats check if cache schema is nil lazy describecollection (#41068) pr: https://github.com/milvus-io/milvus/pull/38039 issue: https://github.com/milvus-io/milvus/issues/36995 --------- Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com>	2025-04-03 00:32:21 +08:00
wei liu	d185a8f941	enhance: Balance the collection with the largest row count first (#40958) issue: #37651 pr: #40297 This PR balances the collection with the largest row count first. This avoids temporarily migrating small-table data to new nodes during their onboarding only to move it out again after the large table is balanced, which would cause unnecessary load. --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-03-31 16:14:21 +08:00
wei liu	b64bb63e77	enhance: [2.5] Add trigger interval config for auto balance (#39154) (#39918) issue: #39156 pr: #39154 Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-03-27 16:40:23 +08:00
Xianhui Lin	8bdff401a3	fix: fix indexchecker schema released (#40809) pr: https://github.com/milvus-io/milvus/pull/38039 issue: https://github.com/milvus-io/milvus/issues/36995 Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com>	2025-03-20 18:05:22 +08:00
Xianhui Lin	705b3c90a5	fix: Failed to rolling upgrade from v2.5.6 to new 2.5 version when enable JsonKeyStats (#40661) Rolling upgrade from v2.5.6 to a newer 2.5 version failed when JsonKeyStats was enabled, because the file path of the jsonkeyindex had changed. issue: https://github.com/milvus-io/milvus/issues/40649, https://github.com/milvus-io/milvus/issues/40669, https://github.com/milvus-io/milvus/issues/40707 master-pr: https://github.com/milvus-io/milvus/pull/38039 --------- Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com>	2025-03-18 17:32:16 +08:00
Xianhui Lin	f5e9dea2aa	fix: [2.5] fix the garbage cleanup logic of jsonkey stats && improve json key stats filter (#40039) fix: fix the garbage collection cleanup logic of jsonkey stats && improve json key stats filter issue: https://github.com/milvus-io/milvus/issues/36995 https://github.com/milvus-io/milvus/issues/40034 https://github.com/milvus-io/milvus/issues/40041 https://github.com/milvus-io/milvus/issues/40106 https://github.com/milvus-io/milvus/issues/40138 pr: https://github.com/milvus-io/milvus/pull/38039 --------- Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com>	2025-03-13 20:18:10 +08:00
Bingyi Sun	683b26ffb7	feat: cherry pick json path index (#40313) issue: #35528 pr: #36750 this pr includes json path index pr and some related prs: 1. update tantivy version #39253 2. json path index #36750 3. fall back to brute force #40076 4. term filter #40140 5. bug fix #40336 --------- Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-03-10 22:14:05 +08:00
yihao.dai	893caee467	fix: [2.5] Fix task delta cache data race (#40262) issue: https://github.com/milvus-io/milvus/issues/40258 pr: https://github.com/milvus-io/milvus/pull/40259 Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2025-03-02 16:52:10 +08:00
wei liu	82c000a4b2	fix: task delta cache leak due to duplicate task id (#40184) issue: #40052 pr: #40183 The task delta cache relies on task IDs being unique: it calls incDeltaCache in AddTask and decDeltaCache in RemoveTask. However, the taskID allocator is not atomic, so two tasks can end up with the same taskID. In that case incDeltaCache is called twice but decDeltaCache only once, which leaks the delta cache. --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-02-28 10:22:08 +08:00
wei liu	14f05650e3	enhance: clean shard location cache after collection released (#40228) issue: #40077 pr: #40088 Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-02-27 19:42:05 +08:00
Xianhui Lin	a4eb2ce224	fix: [2.5] Revert qc statschecker for json key stats (#40125) issue: https://github.com/milvus-io/milvus/issues/36995 pr: https://github.com/milvus-io/milvus/pull/39876 Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com>	2025-02-24 13:31:55 +08:00
congqixia	709594f158	enhance: [2.5] Use v2 package name for pkg module (#40117) Cherry-pick from master pr: #39990 Related to #39095 https://go.dev/doc/modules/version-numbers Update pkg version according to golang dep version convention Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-02-23 00:46:01 +08:00
Xianhui Lin	c1de61ff7c	fix: [2.5] Replace the position of EnabledJSONKeyStats (#40108) issue: https://github.com/milvus-io/milvus/issues/36995 pr: https://github.com/milvus-io/milvus/pull/38039 --------- Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com>	2025-02-22 14:35:54 +08:00
yihao.dai	b8a758b6c4	enhance: [2.5] Add get vector latency metric and refine request limit error message (#40085) issue: https://github.com/milvus-io/milvus/issues/40078 pr: https://github.com/milvus-io/milvus/pull/40083 Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2025-02-21 20:19:55 +08:00
wei liu	82fb0bf9c1	fix: [2.5] task delta cache leak on reduce task (#40056) issue: #40052 pr: #40055 Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-02-21 16:49:54 +08:00
wei liu	e42c944e04	fix: [2.5] querycoord panic in corner case (#40058) issue: #40050 pr: #40057 Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-02-21 11:19:58 +08:00
wei liu	3c2d8c1419	enhance: [2.5] Add management api to check querycoord balance status (#37784) (#39909) issue: #37783 pr: #37784 Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-02-19 10:56:49 +08:00
wei liu	bf54f47c34	enhance: [2.5] use rated logger for high frequency log in dist handler (#39452) (#39928) pr: #39452 Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-02-18 14:32:52 +08:00
Xianhui Lin	f0964f769d	enhance: [2.5] Add json key inverted index in stats for optimization (#39876) issue: https://github.com/milvus-io/milvus/issues/36995 pr: https://github.com/milvus-io/milvus/pull/38039 --------- Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2025-02-16 20:12:15 +08:00
congqixia	9407a3c9b1	fix: [2.5] Check collection released before target checks (#39843) Cherry-pick from master pr: #39841 Related to #39840 In the previous code, the target could be updated asynchronously. This PR makes removing a collection from the target observer block until all related tasks in the dispatchers are removed, preventing the metrics from being updated after the collection is released. --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-02-13 20:00:15 +08:00
wei liu	82dc57ace0	fix: [skip e2e][2.5] pr conflict cause ut failed (#39810) Related to https://github.com/milvus-io/milvus/pull/39701 & https://github.com/milvus-io/milvus/issues/39681 Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-02-12 11:44:51 +08:00
congqixia	4322a0d49a	fix: [2.5] Resolve conflict on qc task test (#39797) Cherry-pick from master pr: #39796 Related to #39701 & #39681 Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-02-11 18:52:45 +08:00
wei liu	11cba57dc7	fix: [2.5] load collection gets stuck if compaction/gc happens (#39761) issue: #39680 pr: #39701 If compaction/gc happens, load collection may get stuck due to SegmentNotFound; in that case we should trigger UpdateNextTarget to get a new data view for executing the loading operation. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-02-11 15:48:50 +08:00
wei liu	969e34d540	fix: [2.5] uneven distribution caused by executing task delta cache leak (#39759) issue: #39681 pr: #39702 This PR maintains the workload effect in the action instead of computing it from the target, which could leak if the target changes. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-02-11 14:32:46 +08:00
jaime	ddc5b299ad	enhance: expose more metrics data (#39466) issue: #36621 #39417 pr: #39456 1. Adjust the server-side cache size. 2. Add source information for configurations. 3. Add node ID for compaction and indexing tasks. 4. Resolve localhost access issues to fix health check failures for etcd. Signed-off-by: jaime <yun.zhang@zilliz.com>	2025-02-07 11:48:45 +08:00
yihao.dai	4464966462	enhance: [2.5] Remove frequent observe log (#39414) /kind improvement pr: https://github.com/milvus-io/milvus/pull/39413 Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2025-01-20 11:01:10 +08:00
yihao.dai	89a183c7c2	enhance: [2.5] enable task delta cache (#39349) When there are many segment tasks in the querycoord scheduler, the traversal in GetSegmentTaskDelta checks becomes time-consuming. This PR adds caching for segment deltas. issue: https://github.com/milvus-io/milvus/issues/37630 pr: https://github.com/milvus-io/milvus/pull/39307 Signed-off-by: bigsheeper <yihao.dai@zilliz.com> Co-authored-by: Wei Liu <wei.liu@zilliz.com>	2025-01-17 12:01:03 +08:00
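
Commit 82c000a4b2 explains the task delta cache leak: the cache is incremented in AddTask and decremented in RemoveTask, so a duplicated task ID is counted twice but removed once. The Go sketch below illustrates that invariant under simplified assumptions. The names task, idAllocator and deltaCache are hypothetical stand-ins, not the QueryCoord types, and the duplicate-ID guard in AddTask is an extra safeguard added for illustration rather than a description of the upstream fix.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// task is a hypothetical, simplified scheduler task: while in flight it adds
// `effect` units of pending workload to node `nodeID`.
type task struct {
	id     int64
	nodeID int64
	effect int
}

// idAllocator hands out task IDs. atomic.Int64 (Go 1.19+) makes Alloc safe
// for concurrent callers, so no two tasks ever receive the same ID.
type idAllocator struct {
	next atomic.Int64
}

func (a *idAllocator) Alloc() int64 {
	return a.next.Add(1)
}

// deltaCache keeps a per-node sum of the workload effect of in-flight tasks.
// It increments in AddTask and decrements in RemoveTask, keyed by task ID.
type deltaCache struct {
	mu    sync.Mutex
	tasks map[int64]task // tasks currently counted, keyed by task ID
	delta map[int64]int  // nodeID -> pending workload delta
}

func newDeltaCache() *deltaCache {
	return &deltaCache{tasks: make(map[int64]task), delta: make(map[int64]int)}
}

// AddTask counts t once. A second task with an already-counted ID is rejected
// instead of incrementing again; with only one matching RemoveTask, that
// second increment would never be undone.
func (c *deltaCache) AddTask(t task) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	if _, exists := c.tasks[t.id]; exists {
		return false
	}
	c.tasks[t.id] = t
	c.delta[t.nodeID] += t.effect
	return true
}

// RemoveTask reverses exactly what AddTask recorded for id; unknown IDs are ignored.
func (c *deltaCache) RemoveTask(id int64) {
	c.mu.Lock()
	defer c.mu.Unlock()
	t, ok := c.tasks[id]
	if !ok {
		return
	}
	delete(c.tasks, id)
	c.delta[t.nodeID] -= t.effect
}

// Delta returns the pending workload delta recorded for nodeID.
func (c *deltaCache) Delta(nodeID int64) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.delta[nodeID]
}

func main() {
	alloc := &idAllocator{}
	cache := newDeltaCache()

	// Allocate IDs concurrently; each goroutine writes only its own slot.
	ids := make([]int64, 100)
	var wg sync.WaitGroup
	for i := range ids {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			ids[i] = alloc.Alloc()
		}(i)
	}
	wg.Wait()

	for _, id := range ids {
		cache.AddTask(task{id: id, nodeID: 1, effect: 1})
	}
	fmt.Println("pending delta after adds:", cache.Delta(1))

	for _, id := range ids {
		cache.RemoveTask(id)
	}
	fmt.Println("pending delta after removes:", cache.Delta(1))
}
```

Because every allocated ID is unique, the program reports a pending delta of 100 after the adds and 0 after the removes; with a non-atomic allocator and no guard, a collision would leave a permanent positive residue.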

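Commit ad0bf9cad8 describes distributing channel nodes when the QueryNode count is not a multiple of the channel count, since plain division leaves the remainder unassigned. A minimal sketch of one way to spread that remainder, assuming a hypothetical assignNodesToChannels helper and plain int64 node IDs; it only computes a target split and does not model the commit's release-then-allocate phases or any other balancer constraints, so it should not be read as the Milvus algorithm.

```go
package main

import (
	"fmt"
	"sort"
)

// assignNodesToChannels is a hypothetical helper that spreads QueryNodes over
// channels when the node count is not a multiple of the channel count.
// Every channel gets len(nodes)/len(channels) nodes, and the remainder is
// handed out one node at a time to the first channels in sorted order, so
// per-channel counts differ by at most one and no node is left unassigned.
func assignNodesToChannels(nodes []int64, channels []string) map[string][]int64 {
	result := make(map[string][]int64, len(channels))
	if len(channels) == 0 {
		return result
	}

	// Work on sorted copies so the result does not depend on input order.
	sortedNodes := append([]int64(nil), nodes...)
	sort.Slice(sortedNodes, func(i, j int) bool { return sortedNodes[i] < sortedNodes[j] })
	sortedChannels := append([]string(nil), channels...)
	sort.Strings(sortedChannels)

	base := len(sortedNodes) / len(sortedChannels)
	remainder := len(sortedNodes) % len(sortedChannels)

	next := 0
	for i, ch := range sortedChannels {
		count := base
		if i < remainder {
			count++ // the first `remainder` channels absorb one extra node each
		}
		// The full slice expression caps capacity so appending to one channel's
		// slice cannot overwrite the nodes assigned to the next channel.
		// With fewer nodes than channels, base is 0 and trailing channels get none.
		result[ch] = sortedNodes[next : next+count : next+count]
		next += count
	}
	return result
}

func main() {
	nodes := []int64{1, 2, 3, 4, 5, 6, 7}
	channels := []string{"ch-a", "ch-b", "ch-c"}
	assignment := assignNodesToChannels(nodes, channels)
	for _, ch := range channels {
		fmt.Println(ch, assignment[ch])
	}
}
```

For 7 nodes and 3 channels this prints ch-a [1 2 3], ch-b [4 5] and ch-c [6 7], whereas dividing 7 by 3 alone would give every channel 2 nodes and leave one node idle.
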
680 Commits (2.5)