milvus

Commit Graph

Author	SHA1	Message	Date
wei liu	ecc2ac0426	fix: apply load config changes failed after restart (#43554 ) issue: #43107 --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-08-01 20:13:37 +08:00
Spade A	faeb7fd410	feat: impl StructArray -- create schema, insert, and retrieve data (#42855 ) Ref https://github.com/milvus-io/milvus/issues/42148 https://github.com/milvus-io/milvus/pull/42406 impls the segcore part of storage for handling with VectorArray. This PR: 1. impls the go part of storage for VectorArray 2. impls the collection creation with StructArrayField and VectorArray 3. insert and retrieve data from the collection. --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com> Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com> Signed-off-by: SpadeA-Tang <u6748471@anu.edu.au>	2025-07-27 01:30:55 +08:00
Zhen Ye	e9ab73e93d	enhance: add schema version at recovery storage (#43500 ) issue: #43072, #43289 - manage the schema version at recovery storage. - update the schema when creating collection or alter schema. - get schema at write buffer based on version. - recover the schema when upgrading from 2.5. --------- Signed-off-by: chyezh <chyezh@outlook.com>	2025-07-23 21:38:54 +08:00
Zhen Ye	df7e507c49	fix: balance may not trigger at balance checker when upgrading (#43462 ) issue: #43416 Signed-off-by: chyezh <chyezh@outlook.com>	2025-07-22 16:02:53 +08:00
Zhen Ye	25b76e1fde	fix: cannot auto balance the channel from old arch to streamingnode (#43424 ) issue: #43416, #43413 - also fix the panic on streamingnode when concurrent sync Signed-off-by: chyezh <chyezh@outlook.com>	2025-07-20 23:00:52 +08:00
Zhen Ye	3aacd179f7	fix: balance channel before balance segment when upgrading (#43346 ) issue: #43117, #42966, #43373 - also fix channel balance may not work at 2.6. - fix error lost at delete path - add mvcc into s/q log - change the log level for TestCoordDownSearch Signed-off-by: chyezh <chyezh@outlook.com>	2025-07-17 20:16:52 +08:00
wei liu	039564199c	fix: Prevent duplicate segment results in count queries (#43173 ) issue: #41570 Fix issue where growing and sealed segments could be searched simultaneously, causing inflated count() results. This was caused by logic introduced in PR #42009 that made sealed segments readable before target version advancement. Changes include: - Fix conditional filtering logic in PinReadableSegments to prevent sealed segments from becoming readable prematurely - Use target version filter for full results (ratio=1.0) to ensure sealed segments only become readable after target advancement - Use query view segment list filter for partial results (ratio<1.0) to maintain backward compatibility - Simplify target version setting in AddDistributions to prevent premature segment readability - Add logging for redundant growing segments during sync - Add comprehensive unit tests covering the duplicate segment scenario This fix ensures count() queries return accurate results by preventing the same segment from being counted in both growing and sealed states. --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-07-14 11:10:49 +08:00
wei liu	b2597c6329	enhance: apply load config changes after QueryCoord restart (#43108 ) issue: #43107 - Add checkLoadConfigChanges() to apply load config during startup - Call config check in startQueryCoord() after restart - Skip auto-updates for collections with user-specified replica numbers - Add is_user_specified_replica_mode field to preserve user settings - Add comprehensive unit tests with mockey Ensures existing collections use latest cluster-level config after restart. --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-07-10 14:28:48 +08:00
congqixia	1fae5230fe	fix: Check field mmap property before apply collection level one (#43090 ) Related to #43089 --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-07-03 14:30:44 +08:00
congqixia	7bc7b18ed5	fix: [AddField] Prevent concurrent load during UpdateSchema (#43043 ) Related to #43028 This PR: - Add mutex to prevent concurrent load segment & schema change - Add schema version field in load meta - Update schema in PutOrRef if schema version is larger --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-07-02 17:38:44 +08:00
Zhen Ye	ecb24e7232	enhance: use multi-process framework in integration test (#42976 ) issue: #41609 - add env `MILVUS_NODE_ID_FOR_TESTING` to set up a node id for milvus process. - add env `MILVUS_CONFIG_REFRESH_INTERVAL` to set up the refresh interval of paramtable. - Init paramtable when calling `paramtable.Get()`. - add new multi process framework for integration test. - change all integration tests to multi-process. - merge some test cases into one suite to speed it up. - modify some tests, which need to wait for issues #42966, #42685. - remove the waittssync for delete collection to fix issue: #42989 --------- Signed-off-by: chyezh <chyezh@outlook.com>	2025-06-30 14:22:43 +08:00
wei liu	c919340763	enhance: Optimize channel node balancing for uneven QN distribution (#42786 ) issue: #42860 Fix channel node allocation when QueryNode count is not a multiple of channel count. The previous algorithm used simple division which caused uneven distribution with remainders. Key improvements: - Implement smart remainder distribution algorithm - Refactor large function into focused helper functions - Support two-phase rebalancing (release then allocate) - Handle edge cases like insufficient nodes gracefully --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-06-30 12:14:42 +08:00
wei liu	be492c2939	fix: Add missing keylocks in ReleasePartition operation (#42940 ) issue: #42098 Fix concurrent access issue by adding proper locking around ReleasePartition operation to prevent race conditions when releasing partitions on the same collection. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-06-25 21:48:42 +08:00
wei liu	bf5fde1431	fix: Prevent delegator unserviceable due to shard leader change (#42689 ) issue: #42098 #42404 Fix critical issue where concurrent balance segment and balance channel operations cause delegator view inconsistency. When shard leader switches between load and release phases of segment balance, it results in loading segments on old delegator but releasing on new delegator, making the new delegator unserviceable. The root cause is that balance segment modifies delegator views, and if these modifications happen on different delegators due to leader change, it corrupts the delegator state and affects query availability. Changes include: - Add shardLeaderID field to SegmentTask to track delegator for load - Record shard leader ID during segment loading in move operations - Skip release if shard leader changed from the one used for loading - Add comprehensive unit tests for leader change scenarios This ensures balance segment operations are atomic on single delegator, preventing view corruption and maintaining delegator serviceability. --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-06-19 12:10:38 +08:00
Bingyi Sun	6bebb68727	fix: Return all targets segments in ListLoadedSegments (#42728 ) issue: https://github.com/milvus-io/milvus/issues/42412 Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-06-18 11:20:38 +08:00
Chun Han	001619aef9	feat: supporting load priority for loading (#42413 ) related: #40781 Signed-off-by: MrPresent-Han <chun.han@gmail.com> Co-authored-by: MrPresent-Han <chun.han@gmail.com>	2025-06-17 15:22:38 +08:00
congqixia	9653ec8d8c	fix: [AddField] Remove load list check on querycoord (#42736 ) Related to #42735 Load field list shall work as hint after tiered storage impl, so the load list comparison is meaningless and blocks loading with an empty list after adding a new field. This PR removes the check logic entirely. Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-06-17 09:50:37 +08:00
Bingyi Sun	1bf960b1a8	enhance: Check loaded segments before gc (#42639 ) issue: https://github.com/milvus-io/milvus/issues/42412 --------- Signed-off-by: sunby <sunbingyi1992@gmail.com>	2025-06-13 17:44:38 +08:00
congqixia	d59002d45e	fix: Make controller wait checker worker quit and add nil protection (#42704 ) Related to #42702 This patch adds wait logic for `CheckerController` and a nil check for the channel checker to avoid panicking during the server/testcase stop procedure Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-06-13 13:20:35 +08:00
wei liu	e7c0a6ffbb	enhance: Refine QueryNode task parallelism based on CPU core count (#42166 ) issue: #42165 Implement dynamic task execution capacity calculation based on QueryNode CPU core count instead of static configuration for better resource utilization. Changes include: - Add CpuCoreNum() method and WithCpuCoreNum() option to NodeInfo - Implement GetTaskExecutionCap() for dynamic capacity calculation - Add QueryNodeTaskParallelismFactor parameter for tuning - Update proto definition to include cpu_core_num field - Add unit tests for new functionality This allows QueryCoord to automatically adjust task parallelism based on actual hardware resources. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-06-11 13:20:35 +08:00
wei liu	317e7999da	fix: ReleasePartition cause delegator unserviceable. (#42423 ) issue: #42098 #42404 related to: #42009 #41937 Implement new method to handle partition removal from next target without directly modifying current target. Changes include: - Add RemovePartitionFromNextTarget method and deprecate RemovePartition - Update target_observer to use new method for ReleasePartition operations - Add unit tests and mock methods for new functionality This ensures that all changes to next target propagate to delegator's query view. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-06-09 19:02:34 +08:00
cai.zhang	5566a85bcc	enhance: Add proxy task queue metrics (#42156 ) issue: #42155 Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>	2025-06-04 11:26:32 +08:00
Zhen Ye	508264f953	fix: querynode upgrade from 2.5 gets stuck (#42502 ) issue: #42492 - consider the old RO query node (not streaming node) when balancing channel. - querynode graceful stop can be done if only L0 segments exist. Signed-off-by: chyezh <chyezh@outlook.com>	2025-06-04 11:20:30 +08:00
wei liu	aa66072a1c	enhance: Remove inadvertently introduced goccy/go-json dependency (#42146 ) Remove the 'goccy/go-json' library, which was inadvertently introduced, and revert to using the standard internal JSON handling. Changes include: - Removed dependency on 'github.com/goccy/go-json' in go.mod and go.sum. - Replaced import of 'goccy/go-json' with 'internal/json' in 'internal/querycoordv2/task/scheduler.go'. This correction ensures the project continues to use the intended JSON processing libraries and avoids unnecessary external dependencies. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-06-03 17:38:32 +08:00
wei liu	2669d14ba0	refactor: Remove balance constraints between channel and segment tasks (#42177 ) issue: #42176 Remove the mutual exclusion constraints between channel and segment balance tasks to allow them to run concurrently. Changes include: - Remove permitBalanceChannel() and permitBalanceSegment() methods from RoundRobinBalancer - Update ChannelLevelScoreBalancer, MultiTargetBalancer, RowCountBasedBalancer, and ScoreBasedBalancer to remove constraint checks - Allow segment balance tasks to proceed even when channel balance tasks are running - Update test cases to reflect new behavior where balance tasks no longer block each other This change improves the efficiency of load balancing by removing unnecessary coordination overhead between different types of balance operations. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-05-30 18:14:25 +08:00
wei liu	eabb62e3ab	fix: Segment may be released prematurely during balance channel (#42090 ) issue: #41143 Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-05-29 18:36:35 +08:00
aoiasd	2ae4d80120	enhance: support run analyzer by loaded collection field (#42113 ) relate: https://github.com/milvus-io/milvus/issues/42094 Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>	2025-05-29 10:54:30 +08:00
wei liu	54619eaa2c	feat: Implement partial result support on node down (#42009 ) issue: https://github.com/milvus-io/milvus/issues/41690 This commit implements partial search result functionality when query nodes go down, improving system availability during node failures. The changes include: - Enhanced load balancing in proxy (lb_policy.go) to handle node failures with retry support - Added partial search result capability in querynode delegator and distribution logic - Implemented tests for various partial result scenarios when nodes go down - Added metrics to track partial search results in querynode_metrics.go - Updated parameter configuration to support partial result required data ratio - Replaced old partial_search_test.go with more comprehensive partial_result_on_node_down_test.go - Updated proto definitions and improved retry logic These changes improve query resilience by returning partial results to users when some query nodes are unavailable, ensuring that queries don't completely fail when a portion of data remains accessible. --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-05-28 00:12:28 +08:00
wei liu	78010262f0	enhance: Optimize shard serviceable mechanism (#41937 ) issue: https://github.com/milvus-io/milvus/issues/41690 - Merge leader view and channel management into ChannelDistManager, allowing a channel to have multiple delegators. - Improve shard leader switching to ensure a single replica only has one shard leader per channel. The shard leader handles all resource loading and query requests. - Refine the serviceable mechanism: after QC completes loading, sync the query view to the delegator. The delegator then determines its serviceable status based on the query view. - When a delegator encounters forwarding query or deletion failures, mark the corresponding segment as offline and transition it to an unserviceable state. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-05-22 11:38:24 +08:00
wei liu	4e1208f4f6	enhance: support balancing multiple collections in single trigger (#41875 ) issue: #41874 - Optimize balance_checker to support balancing multiple collections simultaneously - Add new parameters for segment and channel balancing batch sizes - Add enableBalanceOnMultipleCollections parameter - Update tests for balance checker This change improves resource utilization by allowing the system to balance multiple collections in a single trigger with configurable batch sizes. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-05-21 21:38:25 +08:00
yihao.dai	65dd3982d8	fix: Fix ants.Pool goroutine leak (#41892 ) 1. Release the pool after it is no longer in use. 2. Upgrade ants.Pool to fix the goroutine leak issue (see [PR #287](https://github.com/panjf2000/ants/pull/287)). issue: https://github.com/milvus-io/milvus/issues/41838 --------- Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2025-05-19 17:56:22 +08:00
Zhen Ye	5fd47c3c89	fix: mockery tool unavailable after upgrading golang version (#41481 ) issue: #41291 pr: #41318 Signed-off-by: chyezh <chyezh@outlook.com>	2025-04-24 10:46:43 +08:00
SimFG	91d40fa558	fix: Update logging context and upgrade dependencies (#41318 ) - issue: #41291 --------- Signed-off-by: SimFG <bang.fu@zilliz.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2025-04-23 10:52:38 +08:00
congqixia	b36c88f3c8	enhance: [AddField] Broadcast schema change via WAL (#41373 ) Related to #39718 Add Broadcast logic for collection schema change and notifies: - Streamnode - Delegator - Streamnode - Flush component - QueryNodes via grpc --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-04-22 16:28:37 +08:00
Xianhui Lin	f9febe3bae	enhance: Merge RootCoord, DataCoord And QueryCoord into MixCoord (#41006 ) Merge RootCoord, DataCoord And QueryCoord into MixCoord Make Session into one issue : https://github.com/milvus-io/milvus/issues/37764 --------- Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com>	2025-04-11 16:36:30 +08:00
wei liu	a839d94c9e	fix: balance checker may enter infinite normal balance loop after balance suspension (#41195 ) issue: #41194 - Refactor hasUnbalancedCollection flag handling to function scope - Ensure tracking sets clearance when no balance needed - Add deferred cleanup for both normal/stopping balance paths - Add unit tests for collection tracking scenarios The changes ensure tracking sets (normalBalanceCollectionsCurrentRound and stoppingBalanceCollectionsCurrentRound) are properly cleared when: - All collections in current round are balanced - Balance checks return early due to unready targets - Balance feature flags are disabled Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-04-10 15:22:29 +08:00
Xianhui Lin	3bc24c264f	enhance: Add json key inverted index in stats for optimization (#38039 ) Add json key inverted index in stats for optimization https://github.com/milvus-io/milvus/issues/36995 --------- Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2025-04-10 15:20:28 +08:00
wei liu	99270103cf	fix: Offline segment block delegator recovery (#40827 ) issue: #39937 Before PR #39552, whenever a segment was missing in either the `current target` or the `next target`, we would trigger `load segment` to recover the delegator. However, restoring only the missing segments in the `next target` is sufficient to advance the target and complete the recovery process. In PR #39552, we removed the scheduling of L0 segments along with this unnecessary `load segment` logic. However, this exposed a new issue: if the `current target` still has missing segments and there is a flaw in the `checkDelegatorDataReady` logic, it could block the recovery of a delegator that contains `offline segments`. Since `offline segments` are cleaned up asynchronously in this scenario, this PR removes their blocking effect on delegator recovery, ensuring a smoother failure recovery process. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-04-07 14:56:22 +08:00
wei liu	bf8547578f	fix: Address manual balance and balance check issues (#41037 ) issue: #37651 - Fix context propagation for manual balance segment task creation from PR #38080. - Optimize stopping balance by preventing redundant checks per round, addressing performance regression from PR #40297. - Decrease default `checkBalanceInterval` from 3000ms to 300ms. - Correct minor log messages in `BalanceChecker`. --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-04-03 15:48:27 +08:00
smellthemoon	cb1e86e17c	enhance: support add field (#39800 ) After this PR is merged, insert, upsert, build index, query, and search are supported on the added field. These operations are available on the added field only after the add field request completes, which is a synchronous operation. Compaction will be supported in the next PR. #39718 --------- Signed-off-by: lixinguo <xinguo.li@zilliz.com> Co-authored-by: lixinguo <xinguo.li@zilliz.com>	2025-04-02 14:24:31 +08:00
wei liu	c02892e9fb	enhance: Balance the collection with the largest row count first (#40297 ) issue: #37651 This PR balances the collection with the largest row count first, to avoid temporarily migrating small-table data to new nodes during their onboarding, only for it to be moved out again after the large table is balanced, which would cause unnecessary load. --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-03-31 16:00:19 +08:00
wei liu	0420dc1eb1	fix: use correct delete checkpoint to prevent premature data cleanup (#40366 ) issue: #40292 related to #39552 - Fix incorrect delete checkpoint usage in SyncDistribution - Change checkpoint parameter from action.GetCheckpoint() to action.GetDeleteCP() in SyncTargetVersion call - This resolves the issue where delete buffer data was being cleaned prematurely due to wrong checkpoint reference Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-03-12 15:00:08 +08:00
yihao.dai	c368113233	fix: Fix task delta cache data race (#40259 ) issue: https://github.com/milvus-io/milvus/issues/40258 Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2025-03-02 16:52:09 +08:00
wei liu	b0806bb900	fix: task delta cache leak due to duplicate task id (#40183 ) issue: #40052 The task delta cache relies on the taskID being unique: it calls incDeltaCache at AddTask and decDeltaCache at RemoveTask. But the taskID allocator is not atomic, so two tasks can get the same taskID; in that case incDeltaCache is called twice but decDeltaCache only once, which causes a delta cache leak. --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-02-28 10:22:08 +08:00
wei liu	94f55df7fb	enhance: clean shard location cache after collection released (#40088 ) issue: #40077 Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-02-27 19:42:05 +08:00
wei liu	69b8b89369	enhance: Remove QueryCoord's scheduling of L0 segments (#39552 ) issue: #39551 This PR removes querycoord's scheduling of L0 segments: - only load L0 segment when watching channel - only release L0 segment when releasing channel or syncing data distribution --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-02-26 21:38:00 +08:00
congqixia	cb7f2fa6fd	enhance: Use v2 package name for pkg module (#39990 ) Related to #39095 https://go.dev/doc/modules/version-numbers Update pkg version according to golang dep version convention --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-02-22 23:15:58 +08:00
yihao.dai	2a037a97f1	enhance: Add get vector latency metric and refine request limit error message (#40083 ) issue: https://github.com/milvus-io/milvus/issues/40078 Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2025-02-21 19:41:55 +08:00
wei liu	7d2c948c69	fix: task delta cache leak on reduce task (#40055 ) issue: #40052 Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-02-21 16:47:54 +08:00
wei liu	07578041ba	fix: querycoord panic in corner case (#40057 ) issue: #40050 Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-02-21 11:19:58 +08:00


696 Commits (master)
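
The #40183 entry attributes the delta cache leak to a task ID allocator that was not atomic, so two tasks could receive the same ID. As a minimal sketch of the general remedy, assuming nothing about the actual Milvus code, the hypothetical `taskIDAllocator` below hands out IDs with `sync/atomic` so concurrent callers never see a duplicate.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// taskIDAllocator is a hypothetical type for illustration; it is not the
// allocator used in Milvus.
type taskIDAllocator struct {
	next int64
}

// Alloc returns a unique ID even when called from many goroutines, because
// the increment and the read happen as one atomic operation.
func (a *taskIDAllocator) Alloc() int64 {
	return atomic.AddInt64(&a.next, 1)
}

// countDistinctIDs starts `workers` goroutines that each allocate
// `perWorker` IDs, and returns how many distinct IDs were handed out.
func countDistinctIDs(workers, perWorker int) int {
	var alloc taskIDAllocator
	var mu sync.Mutex
	seen := make(map[int64]struct{})
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < perWorker; i++ {
				id := alloc.Alloc()
				mu.Lock()
				seen[id] = struct{}{}
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return len(seen)
}

func main() {
	fmt.Println(countDistinctIDs(8, 1000)) // prints 8000
}
```

With unique IDs, every increment keyed by task ID has exactly one matching decrement, which is the invariant the commit message says the delta cache relies on.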