milvus

Commit Graph

Author	SHA1	Message	Date
wei liu	f49d618382	fix: Querycoord will trigger unexpected balance task after restart (#38630 ) issue: #38606 --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-12-25 19:30:48 +08:00
wei liu	25f0c82ceb	fix: Fix update loading collection's load config doesn't work (#38595 ) issue: #38594 --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-12-25 18:02:51 +08:00
wei liu	9c3f59dbbe	fix: Prevent balancer from overloading the same QueryNode (#38719 ) issue: #38718 The balancer calculates the workload of executing tasks as an ongoing score for target nodes. However, a logic issue arises when GetSegmentTaskDelta or GetChannelTaskDelta is called with collectionID=-1, which incorrectly returns zero. Due to the incorrect global score, the executing task's workload is not properly reflected for each collection. Consequently, each collection submits its own balance task, leading to the balancer assigning excessive tasks to the same QueryNode. --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-12-25 16:36:49 +08:00
jaime	5afd0c0a2b	fix: Revert "Expose metrics of stanby coordinators (#27698 )" (#38620 ) issue: #38608 Signed-off-by: jaime <yun.zhang@zilliz.com>	2024-12-23 11:46:57 +08:00
jaime	78438ef41e	fix: revert optimize CPU usage for CheckHealth requests (#35589 ) (#38555 ) issue: #35563 Signed-off-by: jaime <yun.zhang@zilliz.com>	2024-12-19 00:38:45 +08:00
yihao.dai	d3c174b0f1	enhance: Accelerate observe collection (#38028 ) 1. A collection should observe the channel only once. 2. A collection should check the CollectionLoadPercent for updates only once. 3. Skip saving coll/partition meta if there are no changes, primarily to accelerate collection observation after recovery. issue: https://github.com/milvus-io/milvus/issues/37630 Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2024-12-17 14:14:45 +08:00
jaime	28fdbc4e30	enhance: optimize CPU usage for CheckHealth requests (#35589 ) issue: #35563 1. Use an internal health checker to monitor the cluster's health state, storing the latest state on the coordinator node. The CheckHealth request retrieves the cluster's health from this latest state on the proxy sides, which enhances cluster stability. 2. Each health check will assess all collections and channels, with detailed failure messages temporarily saved in the latest state. 3. Use CheckHealth request instead of the heavy GetMetrics request on the querynode and datanode Signed-off-by: jaime <yun.zhang@zilliz.com>	2024-12-17 11:02:45 +08:00
SimFG	2afe2eaf3e	feat: support to replicate collection when the services contains the system tt msg (#37559 ) - issue: #37105 --------- Signed-off-by: SimFG <bang.fu@zilliz.com>	2024-12-17 09:08:46 +08:00
wei liu	659847c11f	enhance: Remove load task limit in one round (#38436 ) The task limit in assignSegment/assignChannel applies to both load tasks and balance tasks. This PR removes the load task limit and only limits the number of balance tasks in one round. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-12-16 19:30:43 +08:00
wei liu	40f9db491e	fix: Fix SyncDistribution may cost too much time on retry (#38454 ) issue: #38428 Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-12-16 11:38:44 +08:00
tinswzy	27229f7907	enhance: refine existing log print with ctx (#38080 ) issue: #35917 Refines existing log prints with ctx Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>	2024-12-14 22:36:44 +08:00
Zhen Ye	833c74aa66	enhance: add detail, replica count for resource group (#38314 ) issue: #30647 --------- Signed-off-by: chyezh <chyezh@outlook.com>	2024-12-13 14:14:50 +08:00
wei liu	e279ccf109	enhance: Enable score based balance channel policy (#38143 ) issue: #38142 The current balance channel policy only considers the current collection's distribution, so if every collection has 1 channel and all channels have been loaded on the same querynode, balance channel won't be triggered after the number of querynodes increases. This PR enables the score based balance channel policy to achieve: 1. distribute all channels evenly across multiple querynodes 2. distribute each collection's channels evenly across multiple querynodes. --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-12-11 17:20:43 +08:00
Zhen Ye	d3ae8e9232	fix: delay the wait other coord logic in query coord after query coord change into standby state (#38259 ) issue: https://github.com/milvus-io/milvus/issues/37764 - After removing the rpc layer from mixcoord, the querycoord in standby mode will be blocked forever during a rolling deployment --------- Signed-off-by: chyezh <chyezh@outlook.com>	2024-12-11 15:48:42 +08:00
wei liu	950203aba0	enhance: Optimize save collection target latency (#38345 ) issue: #38237 This PR uses a better compression level only for proto msgs larger than 1MB and a lighter compression level for smaller proto msgs, which gives better latency in most cases. It reduces the latency from 22.7s to 4.7s with 10000 collections, each with 1000 segments. Before this PR: BenchmarkTargetManager-8 1 22781536357 ns/op 566407275088 B/op 11188282 allocs/op. After this PR: BenchmarkTargetManager-8 1 4729566944 ns/op 36713248864 B/op 10963615 allocs/op. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-12-11 10:12:43 +08:00
congqixia	7ea9c983d2	enhance: Add mockery package config for QC&QN (#38340 ) Related to #38339 Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-12-10 19:18:42 +08:00
wei liu	856e2aad7d	fix: Leader task stuck and retry again and again (#38202 ) issue: #38201 A leader task needs to update the delegator's distribution and only succeeds after the distribution change has been applied to the delegator. But the delegator rejects the distribution change if its version is older than the current version in the delegator, which causes the leader task to get stuck and retry forever. This PR removes the leader task finish check. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-12-10 19:16:42 +08:00
wei liu	f04986fceb	enhance: Remove constraint on release segment task (#38297 ) issue: #38305 Since balance segment and balance channel can no longer happen at the same time, the constraint that release segment must happen on a serviceable shard leader is unnecessary. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-12-10 11:18:49 +08:00
jaime	8ed019735c	enhance: add disk stats within system metrics (#38033 ) issue: #36621 Signed-off-by: jaime <yun.zhang@zilliz.com>	2024-12-06 16:32:41 +08:00
congqixia	36946cc9ce	enhance: Set loaded collection/partition number to metrics (#38271 ) Related to #36456 Previous PR: #38471 #38233 Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-12-06 16:18:40 +08:00
congqixia	6ff19481f0	enhance: Resolve compilation error due to PR conflict (#38252 ) Related pr: #38233 #38059 Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-12-05 19:26:40 +08:00
congqixia	051bc280dd	enhance: Make dynamic load/release partition follow targets (#38059 ) Related to #37849 --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-12-05 16:24:40 +08:00
congqixia	32645fc28a	enhance: Unify querycoord meta metrics (#38233 ) Related to #36456 Unify collection/partition number metrics in the collection manager to avoid unintentionally missing a modification Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-12-05 15:48:39 +08:00
tinswzy	7944538ade	enhance: Add ctx param to KV operation interfaces (#38154 ) issue: #35917 Refine KV operation interfaces by adding a ctx param Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>	2024-12-05 15:16:41 +08:00
tinswzy	e76802f910	enhance: refine querycoord meta/catalog related interfaces to ensure that each method includes a ctx parameter (#37916 ) issue: #35917 This PR refine the querycoord meta related interfaces to ensure that each method includes a ctx parameter. Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>	2024-11-25 11:14:34 +08:00
jaime	7bbfe86bcd	enhance: add list index and segment index retrieval API for WebUI (#37861 ) issue: #36621 Signed-off-by: jaime <yun.zhang@zilliz.com>	2024-11-22 16:58:34 +08:00
congqixia	b34bfb98a0	enhance: Refine Replica manager coll2Replicas secondary index (#37906 ) Related to #37630 This PR adds a new coll2Replicas secondary index util to reduce map access & iteration when getting replicas by collection --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-11-22 11:54:32 +08:00
wei liu	965bda6e60	enhance: Add channel name to shard leader log in meta cache (#37856 ) Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-11-21 19:24:31 +08:00
wei liu	0a440e0d38	fix: Prevent simultaneous balance of segments and channels (#37850 ) issue: #33550 Balance segment and balance channel executing at the same time causes a bunch of corner cases. This PR disables simultaneous balance of segments and channels Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-11-21 17:56:55 +08:00
wei liu	b983ef9fca	fix: Channel may be released after balance (#37862 ) issue: #37830 Because the dist handler doesn't set the channel's version, if the channel checker tries to dedup channels, it may release the new delegator after balance finishes. This PR sets the proper version for the channel. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-11-21 10:40:31 +08:00
congqixia	b8d31ebed8	enhance: Remove unnecessary segment clone updating dist (#37797 ) Related to #37630 Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-11-20 11:26:31 +08:00
yihao.dai	b6612e02b4	enhance: Reduce GetIndexInfos calls (#37695 ) Batch `GetIndexInfos` calls for segments to reduce RPC calls. issue: https://github.com/milvus-io/milvus/issues/37634 --------- Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2024-11-19 14:24:31 +08:00
congqixia	6d86b9022e	enhance: Provide secondary index criteria when filter leaderview (#37777 ) Related to #37630 --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-11-19 10:12:30 +08:00
jaime	257ecab84b	enhance: remove collection queryable check from health check (#37712 ) Signed-off-by: jaime <yun.zhang@zilliz.com>	2024-11-18 10:50:38 +08:00
congqixia	b0bd290a6e	enhance: Use internal json(sonic) to replace std json lib (#37708 ) Related to #35020 Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-11-18 10:46:31 +08:00
wei liu	a1b6be1253	fix: Delegator stuck at unserviceable status (#37694 ) issue: #37679 pr #36549 introduced a logic error that updates the current target when only part of the channels are ready. This PR fixes the logic error and lets the dist handler keep pulling distribution on querynode until all delegators become serviceable. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-11-15 10:20:31 +08:00
jaime	1d06d4324b	fix: Int64 overflow in JSON encoding (#37657 ) issue: #36621 - For simple types in a struct, add "string" to the JSON tag for automatic string conversion during JSON encoding. - For complex types in a struct, replace "int64" with "string." Signed-off-by: jaime <yun.zhang@zilliz.com>	2024-11-14 22:52:30 +08:00
wei liu	1304b40552	fix: Balance channel may stuck at increasing replica number case (#37641 ) issue: #37640 Fixes an issue introduced by pr #36549: balance channel waits until the new delegator becomes serviceable, but the new delegator must sync the target version before it becomes serviceable, and syncing the target version must wait until all replicas finish loading. So if increasing the replica number and balance channel happen at the same time, a logical deadlock occurs. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-11-14 10:08:31 +08:00
jaime	1e8ea4a7e7	feat: add segment/channel/task/slow query render (#37561 ) issue: #36621 Signed-off-by: jaime <yun.zhang@zilliz.com>	2024-11-12 17:44:29 +08:00
wei liu	266f8ef1f5	fix: Search may return less result after qn recover (#36549 ) issue: #36293 #36242 After a qn recovers, the delegator may be loaded on a new node; after all segments have been loaded, the delegator becomes serviceable. But the delegator's target version hasn't been synced, and if a search/query comes, the delegator will use the wrong target version and filter out an empty segment list, which causes an empty search result. This PR blocks the delegator's serviceable status until the target version is synced --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-11-12 16:34:28 +08:00
congqixia	f5b06a3c9f	enhance: Invalidate collection cache when release collection (#37577 ) Related to #37395 --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-11-12 10:16:29 +08:00
wei liu	61a5b15ada	fix: Lost loading collection's updateTs after qc restart (#37538 ) issue: #37537 Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-11-11 14:34:28 +08:00
congqixia	5e90f348fc	enhance: Handle legacy proxy load fields request (#37565 ) Related to #35415 In rolling upgrade, a legacy proxy may dispatch a load request with an empty load field list. The upgraded querycoord may mistakenly report an error that the load field list has changed. This PR: - Auto fills an empty load field list with all user field ids - Refines the error message when the load field list updates - Refines load job unit tests with service cases Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-11-11 10:14:26 +08:00
sthuang	70605cf5b3	enhance: Support custom privilege group for RBAC (#37087 ) issue: #37031 --------- Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>	2024-11-09 08:44:28 +08:00
yihao.dai	ff9bdf7029	fix: Fix load slowly (#37454 ) When there're a lot of loaded collections, they would occupy the target observer scheduler’s pool. This prevents loading collections from updating the current target in time, slowing down the load process. This PR adds a separate target dispatcher for loading collections. issue: https://github.com/milvus-io/milvus/issues/37166 --------- Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2024-11-09 07:48:26 +08:00
congqixia	dcc1b506dc	enhance: Add context trace for querycoord queryable check (#37524 ) When the check health logic fails because a collection is not queryable, the reason is hard to find in the log. This PR adds context to the log with trace id and prints unqueryable collection info. Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-11-08 14:00:26 +08:00
wei liu	a03157838b	enhance: Enable node assign policy on resource group (#36968 ) issue: #36977 With node_label_filter on a resource group, users can add a label to a querynode with env `MILVUS_COMPONENT_LABEL`; the resource group will then prefer to accept nodes that match its node_label_filter. Querynodes can then be grouped by label, with querynodes sharing the same label placed in the same resource group. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-11-08 11:18:27 +08:00
Xiaofan	e073906a19	enhance: optimize describe collection and index (#37490 ) fix #37489 combine multiple describe collection and list index into one call Signed-off-by: xiaofanluan <xiaofan.luan@zilliz.com>	2024-11-08 10:18:34 +08:00
jaime	f348bd9441	feat: add segment,pipeline, replica and resourcegroup api for WebUI (#37344 ) issue: #36621 Signed-off-by: jaime <yun.zhang@zilliz.com>	2024-11-07 11:52:25 +08:00
wei liu	8714774305	fix: search/query failed due to segment not loaded (#37403 ) issue: #36970 Release segment and balance channel may happen at the same time. Before the new delegator becomes serviceable, if release segment executes on the new delegator and a search/query arrives on the old delegator, release segment and query segment happen in parallel; if release segment executes first in the worker, the search/query gets a SegmentNotLoaded error. This PR adds a serviceable filter on delegators, so all load/release segment operations happen on a serviceable delegator. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-11-06 15:10:25 +08:00


618 Commits (f49d618382984af9a1e3c6752d83836658983cec)