milvus


Author	SHA1	Message	Date
jaime	9630974fbb	enhance: move rocksmq from internal to pkg module (#33881 ) issue: #33956 Signed-off-by: jaime <yun.zhang@zilliz.com>	2024-06-25 21:18:15 +08:00
congqixia	07c25a19d9	fix: Make querycoord panic when rg metastore sync fails (#34106 ) See also #34047 When `unassignNode` fails to sync the resource group with the node removed, querycoord panics. Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-06-24 21:38:02 +08:00
Chun Han	ca7ef26e4b	fix: sync part stats task cannot be finished(#30376 ) (#34027 ) related: #30376 also: refine log output for query_coord task by rephrasing action string Signed-off-by: MrPresent-Han <chun.han@gmail.com> Co-authored-by: MrPresent-Han <chun.han@gmail.com>	2024-06-24 10:16:02 +08:00
wei liu	935bc1fb71	fix: Fix GetReplicas API return nil status (#33715 ) issue: #33702 Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-06-20 14:40:15 +08:00
wayblink	5fac2fa1d2	fix: Panic if ProcessActiveStandBy returns error (#33369 ) #33368 Signed-off-by: wayblink <anyang.wang@zilliz.com>	2024-06-19 11:16:00 +08:00
wei liu	02945959d9	enhance: Avoid iterating the whole segment list for each task's process (#33943 ) When querycoord processes a segment task, it iterates the whole segment list to check whether the segment is loaded, which costs too much CPU when there are thousands of segments. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-06-19 10:19:58 +08:00
Chun Han	f7af323d1e	fix: sync partition stats blocking balance task(#33741 ) (#33742 ) related: #33741 Signed-off-by: MrPresent-Han <chun.han@zilliz.com>	2024-06-11 14:21:56 +08:00
wayblink	a1232fafda	feat: Major compaction (#33620 ) #30633 Signed-off-by: wayblink <anyang.wang@zilliz.com> Co-authored-by: MrPresent-Han <chun.han@zilliz.com>	2024-06-10 21:34:08 +08:00
yihao.dai	3540eee977	enhance: Support L0 import (#33514 ) issue: https://github.com/milvus-io/milvus/issues/33157 --------- Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2024-06-07 14:17:20 +08:00
SimFG	ecee7d90d4	enhance: try to speed up the loading of small collections (#33570 ) - issue: #33569 Signed-off-by: SimFG <bang.fu@zilliz.com>	2024-06-07 08:25:53 +08:00
wei liu	b13932bb55	enhance: Enable database level replica num and resource groups for loading collection (#33052 ) issue: #30040 This PR introduces two database level props: 1. database.replica.number 2. database.resource_groups Users can set these two database props with the AlterDatabase API and then load a collection without specifying replica_num and resource groups; the database level load params will be used when loading collections. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-05-29 10:59:43 +08:00
wei liu	6275c75013	fix: Watch channel task may get stuck forever until qn becomes offline (#33394 ) issue: #32901 pr #32814 introduced a compatibility issue: after upgrading to the latest milvus, query coord may skip updating the dist because lastModifyTs doesn't change. For an old version querynode, the lastModifyTs in GetDataDistributionResponse is always 0, which makes qc skip updating the dist; qc then keeps retrying the watch channel task again and again. This PR adds compatibility with old version querynodes: when lastModifyTs is 0, qc will update its data distribution. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-05-27 15:01:42 +08:00
wei liu	303470fc35	fix: Clean offline node from resource group after qc restart (#33232 ) issue: #33200 #33207 pr#33104 caused offline nodes to be kept in the resource group after qc recovers; an offline node could then be assigned to a new replica as an rwNode, and requests sent to it fail with NodeNotFound. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-05-22 10:03:40 +08:00
wei liu	33bd6eed28	fix: Clean offline node from replica after qc recover (#33213 ) issue: #33200 #33207 pr#33104 removed this logic by mistake, which caused offline nodes to be kept in the replica after qc recovers, so requests sent to an offline qn get a NodeNotFound error. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-05-21 15:41:39 +08:00
wei liu	2013d97243	enhance: Enable dynamic update of balancer policy in querycoord (#33037 ) issue: #33036 This PR enables updating the balancer policy dynamically without restarting querycoord. --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-05-21 14:29:39 +08:00
jaime	0d99db23b8	fix: metrics leak on the coord nodes (#33075 ) issue: #32980 Signed-off-by: jaime <yun.zhang@zilliz.com>	2024-05-20 22:03:39 +08:00
wei liu	a7f6193bfc	fix: query node may get stuck in stopping progress (#33104 ) issue: #33103 When doing stopping balance for a stopping query node, the balancer gets the node list from replica.GetNodes and checks whether each node is stopping; if so, stopping balance is triggered for that replica. After the replica refactor, replica.GetNodes only returns rwNodes, while the stopping node is kept in roNodes, so the balancer can't find the replica containing the stopping node, stopping balance is never triggered for it, and the query node gets stuck forever because its segments/channels are never moved out. --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-05-20 10:21:38 +08:00
wei liu	f1c9986974	enhance: Skip return data distribution if no change happen (#32814 ) issue: #32813 --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-05-17 10:11:37 +08:00
cai.zhang	6ea7633bd5	enhance: Add memory size for binlog (#33025 ) issue: #33005 1. add `MemorySize` field for insert binlog. 2. `LogSize` means the file size in the storage object. 3. `MemorySize` means the size of the data in the memory. --------- Signed-off-by: Cai Zhang <cai.zhang@zilliz.com> Signed-off-by: cai.zhang <cai.zhang@zilliz.com>	2024-05-15 12:59:34 +08:00
SimFG	1d48d0aeb2	enhance: use different value to get related data size according to segment type (#33017 ) issue: #30436 Signed-off-by: SimFG <bang.fu@zilliz.com>	2024-05-14 14:59:33 +08:00
congqixia	861977ab60	fix: Start `LeaderCacheObserver` before `SyncAll` (#33035 ) Related to #33033 Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-05-14 13:25:32 +08:00
wei liu	cba2c7a3be	enhance: clean channel node info in meta store (#32988 ) issue: #32910 see also: #32911 When channel exclusive mode is enabled, the replica records channel node info in the meta store. If the balance policy changes so that channel exclusive mode is disabled, we should clean up the channel node info in the meta store and stop balancing nodes between channels. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-05-14 10:05:40 +08:00
chyezh	293f14a8b9	fix: remove redundant replica recover (#32985 ) issue: #22288 - replica recover should be only triggered by replica recover Signed-off-by: chyezh <chyezh@outlook.com>	2024-05-13 15:25:32 +08:00
Xiaofan	b044e5503e	enhance: Improve load speed (#32898 ) fix #32897 add memory check when loading collection Signed-off-by: xiaofanluan <xiaofan.luan@zilliz.com>	2024-05-11 10:29:31 +08:00
chyezh	1c84a1c9b6	fix: lru related issue fixup patch (#32916 ) issue: #32206, #32801 - search failure with some assertion, segment not loaded and resource insufficient. - segment leak when query segments --------- Signed-off-by: chyezh <chyezh@outlook.com>	2024-05-10 19:17:30 +08:00
wei liu	e2332bdc17	enhance: Enable channel exclusive balance policy (#32911 ) issue: #32910 * split replica's node list to channels when create replicas * balance nodes among channels when node change happens * implement channel level balance, let balance happens in channel level Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-05-10 17:27:31 +08:00
wei liu	04a8ec69f6	fix: Segment on stopping query node can't be released successfully (#32929 ) issue: #32901 Because the release segment request needs to be sent to the delegator, it needs replica info to find the segment's delegator. But the stopping query node is marked as read-only in the replica, and `replica.Contains()` only returns true for rwNodes, so the replica info can't be found for the stopping query node and the segment release is blocked. This PR makes `replica.Contains()` return true for both roNode and rwNode. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-05-10 14:33:30 +08:00
Bingyi Sun	b7ef8da360	fix: set channel checkpoint to delta position (#32878 ) issue: https://github.com/milvus-io/milvus/issues/32853 --------- Signed-off-by: sunby <sunbingyi1992@gmail.com>	2024-05-10 11:51:30 +08:00
congqixia	efa58ae423	enhance: Utilize coll2replica mapping when getting rg by collection (#32892 ) See also #32165 The old `GetResourceGroupByCollection` implementation iterates all replicas to match the collection id, which is slow and CPU-consuming. This PR makes it utilize the coll2Replicas mapping by calling `GetByCollection` and mapping replicas into resource groups. Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-05-09 19:37:30 +08:00
congqixia	acb0417a9f	enhance: Avoid iteration over channel results when update leaderview (#32887 ) See also #32165 Cache channel name to channel info to avoid iteration over channel results when updating leader view version. Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-05-09 15:41:30 +08:00
wei liu	fad8f0afa5	enhance: enable stopping balance after balance has been suspended (#32812 ) issue: #32811 Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-05-08 10:15:29 +08:00
wei liu	ba02d54a30	enhance: update shard leader cache when leader location changed (#32470 ) issue: #32466 This PR updates the proxy's shard leader cache when the shard location changes, so that in the query node failover case the proxy can find the recovered replica --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-05-08 10:05:29 +08:00
yihao.dai	9db3aa18bc	enhance: Remove deprecated EnableIndex (#32704 ) /kind improvement Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2024-05-07 17:11:30 +08:00
chyezh	b904c8d377	enhance: resource group unittest refactor (#32739 ) issue: #30647 Signed-off-by: chyezh <chyezh@outlook.com>	2024-05-06 10:17:34 +08:00
wei liu	d900e68440	fix: fix GetShardLeaders return empty node list (#32685 ) issue: #32449 To avoid GetShardLeaders returning an empty node list, this PR adds a node list check on both the client side and the server side. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-04-29 14:19:26 +08:00
chyezh	ef4c875d4c	fix: resource group ut may fail (#32688 ) issue: https://github.com/milvus-io/milvus/issues/30647 Signed-off-by: chyezh <chyezh@outlook.com>	2024-04-29 14:17:26 +08:00
wei liu	c0555d4b45	fix: Remove read only node from replica immediately after node down (#32666 ) issue: #32665 Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-04-28 20:25:25 +08:00
congqixia	4cdf6c3c41	fix: Check partition nil before observe load progress (#32659 ) See also #32441 #32615 Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-04-28 16:29:25 +08:00
congqixia	a239e9110e	enhance: Apply node-indexing and cache optimization for channel dist (#32595 ) See also #32165 --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-04-28 16:19:24 +08:00
Xiaofan	02ace25c68	enhance: reduce the cpu usage when collection number is high (#32245 ) related to #32165 1. for all the manager, support collection level index 2. remove collection level filter to avoid extra cpu usage when collection number increases Signed-off-by: xiaofanluan <xiaofan.luan@zilliz.com>	2024-04-26 11:49:25 +08:00
chyezh	f06509bf97	fix: get replica should not report error when no querynode serve (#32536 ) issue: #30647 - Remove error report if there's no query node serving. It's hard for programmers to use it to do resource management. - Change resource group `transferNode` logic to keep compatible with old version sdk. --------- Signed-off-by: chyezh <chyezh@outlook.com>	2024-04-25 19:25:24 +08:00
chyezh	b287fbaa2e	fix: return collection on recovering but not collection not loaded when target is not recovered (#32447 ) issue: #32398 Signed-off-by: chyezh <chyezh@outlook.com>	2024-04-25 11:21:26 +08:00
congqixia	f30c22626e	enhance: Pre-cache result for frequent filters (#32580 ) See also #32165 Add segment dist and leader view filter criterion struct to store frequent filter conditions. Add collection/channel filter results for these two meta --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-04-25 11:13:25 +08:00
congqixia	37ca32dbba	enhance: Make SegmentDistManager filter use node index (#32533 ) See also #32165 Change `SegmentDistFilter` to an interface in order to provide node index when filtering segment dist. Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-04-24 16:53:24 +08:00
smellthemoon	96d95e7743	enhance: fix pass error msg as channel name (#32511 ) Signed-off-by: lixinguo <xinguo.li@zilliz.com> Co-authored-by: lixinguo <xinguo.li@zilliz.com>	2024-04-23 16:45:22 +08:00
congqixia	bfebdecf3e	enhance: Make LeaderView Manager filter use map index (#32505 ) See also #32165 Change `LeaderViewFilter` to an interface that provides the map key, to avoid iterating all key-values in LeaderViewManager Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-04-23 11:07:24 +08:00
chyezh	21a9de5c8e	fix: resource group ut fixup (#32509 ) issue: #30647 Signed-off-by: chyezh <chyezh@outlook.com>	2024-04-23 10:01:23 +08:00
congqixia	d7ff1bbe5c	enhance: Make querycoordv2 collection observer task driven (#32441 ) See also #32440 - Add loadTask in collection observer - For load collection/partitions, load task shall timeout as a whole - Change related constructor to load jobs --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-04-22 10:39:22 +08:00
congqixia	01c16fe6e3	enhance: Manually release pool after saving targets (#32358 ) See also #31632 Release conc.Pool after usage to clean up workers and stop background purge and ticktock. Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-04-19 13:51:21 +08:00
chyezh	a8c8a6bb0f	fix: parameter check of TransferReplica and TransferNode (#32297 ) issue: #30647 - Same dst and src resource group should not be allowed in `TransferReplica` and `TransferNode`. - Remove redundant parameter check. Signed-off-by: chyezh <chyezh@outlook.com>	2024-04-17 15:27:19 +08:00
yiwangdr	7deda4d5e9	enhance: speed up GetByCollectionAndNode (#32232 ) Related to https://github.com/milvus-io/milvus/issues/32165 Avoid iterating through all replicas/collections if possible. Iteration is expensive when there are large number of replicas/collections. Signed-off-by: yiwangdr <yiwangdr@gmail.com>	2024-04-17 10:23:25 +08:00
congqixia	72c172a7d7	enhance: Remove duplicated collectionID label for task latency (#32308 ) `CollectionID` already exists in channel name, so remove it to save metrics traffic. Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-04-16 18:55:19 +08:00
chyezh	70e3d5b495	fix: wrong node id in TestCheckNodesInReplica (#32268 ) issue: #31930 Signed-off-by: chyezh <chyezh@outlook.com>	2024-04-15 17:38:17 +08:00
wei liu	4822b109bd	fix: Skip to load l0 segment on old version query node (#32124 ) issue: #32107 during rolling upgrade progress, skip to load l0 segment on old version query node --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-04-15 11:23:23 +08:00
congqixia	dc11cbd123	enhance: Maintain collection-partitions mapping in qc meta (#32227 ) Related to #32165 Add collection to partitionIDs mapping to avoid iteration over all loaded partitions when trying to get all partitions with collection id --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-04-15 10:05:19 +08:00
chyezh	48fe977a9d	enhance: declarative resource group api (#31930 ) issue: #30647 - Add declarative resource group api - Add config for resource group management - Resource group recovery enhancement --------- Signed-off-by: chyezh <chyezh@outlook.com>	2024-04-15 08:13:19 +08:00
wei liu	68dec7dcd4	fix: Use correct ts to avoid exclude segment list leak (#31991 ) issue: #31990 Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-04-12 10:39:19 +08:00
congqixia	b9a487608a	fix: Make `ResourceGroup.nodes` concurrent safe (#32159 ) See also #32158 Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-04-11 17:53:18 +08:00
congqixia	25a1c9ecf0	fix: Make coordinator `Register` not blocked on ProcessActiveStandby (#32069 ) See also #32066 This PR makes coordinator registration succeed and lets `ProcessActiveStandBy` run async. Roles may then receive the stop signal and notify servers. --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-04-10 18:49:18 +08:00
chyezh	a3d6110957	fix: ut failure (#32120 ) issue: #30647 Signed-off-by: chyezh <chyezh@outlook.com>	2024-04-10 17:30:48 +08:00
chyezh	0be67e7f99	fix: ut failure (#32119 ) issue: #30647 Signed-off-by: chyezh <chyezh@outlook.com>	2024-04-10 17:23:27 +08:00
wei liu	c4806b69c4	enhance: Refactor leader view manager interface (#31133 ) issue: #31091 This PR adds a GetByFilter interface to the leader view manager, replacing the various get funcs --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-04-10 15:13:36 +08:00
wei liu	177ddda47f	fix: Check stale should check leader task's leader id (#31962 ) issue: #30816 Check stale rules for leader task: 1. a reduce leader task should keep executing until the leader's node becomes offline. 2. a grow leader task should keep executing until the leader's node becomes stopping. This PR checks the leader node's stopping state for grow leader tasks Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-04-09 15:33:25 +08:00
zhenshan.cao	089c805e0a	enhance: Refactor hybrid search (#32020 ) issue: https://github.com/milvus-io/milvus/issues/25639 https://github.com/milvus-io/milvus/issues/31368 Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>	2024-04-09 14:21:18 +08:00
yiwangdr	1cd15d9322	test: support segment release in integration test (#31190 ) issue: #29507 Notice that api_testonly.go files should be guarded by compiler tag `test`, so that production build rules don't compile them and these APIs don't get misused. Signed-off-by: yiwangdr <yiwangdr@gmail.com>	2024-04-09 11:39:17 +08:00
chyezh	a2502bde75	enhance: replica manager enhancement (#31496 ) issue: #30647 - ReplicaManager now manages read-only nodes and always persists the node distribution of replicas. - All segment/channel checkers use ReplicaManager, not ResourceManager, to get read-only or read-write nodes. - ReplicaManager now guarantees that each querynode is assigned to only one replica of a collection (replicas of the same collection never hold the same querynode at the same time). - ReplicaManager guarantees a fair node count assignment policy when multiple replicas of a collection are assigned to one resource group. - Move some parameter checks into ReplicaManager to avoid data races. - Allow transferring a replica to a resource group that already loads a replica of the same collection - Allow transferring nodes between resource groups that load replicas of the same collection --------- Signed-off-by: chyezh <chyezh@outlook.com>	2024-04-05 04:57:16 +08:00
congqixia	c2aad513c0	fix: Check collection nil before check load status (#31850 ) See also #31849 Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-04-03 10:07:13 +08:00
congqixia	56e371c478	fix: Check replica exists before get latest leader (#31848 ) See also #31847 Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-04-03 10:05:22 +08:00
wei liu	7471a8005f	fix: querycoord panic after node down (#31831 ) issue: #30519 Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-04-03 10:03:22 +08:00
congqixia	0feee53631	enhance: Add back unit test for compactor and fix some TODOs (#31829 ) This PR adds back the compactor "Unhandled" data type unit test and fixes some TODO behaviors Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-04-02 20:35:14 +08:00
Bingyi Sun	91cb529ba6	fix: get latest collection info when checking index (#31744 ) issue: https://github.com/milvus-io/milvus/issues/31727 --------- Signed-off-by: sunby <sunbingyi1992@gmail.com>	2024-04-02 14:43:13 +08:00
wei liu	0944a1f790	enhance: Refactor channel dist manager interface (#31119 ) issue: #31091 This PR adds a GetByFilter interface to the channel dist manager, replacing the various get funcs --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-04-02 10:23:14 +08:00
congqixia	16d869c57e	enhance: Add EmbedEtcd testutil and remove etcd dep of task pkg (#31802 ) See also #20478 Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-04-02 09:59:14 +08:00
wei liu	bb500d66c7	fix: Remove segment from leader view can't be executed (#31663 ) issue: #31664 Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-04-01 10:39:12 +08:00
wei liu	c311932d5f	fix: Update segment's version in leader task (#31643 ) issue: #31468 1. when segment's version in leader view doesn't match segment's version in dist, should update leader view 2. after call loadDeltalog, should update segment's load version with latest ts 3. change leader task's priority from high to low, to avoid leader task replace segment task and balance task --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-04-01 10:37:21 +08:00
wei liu	92971707de	enhance: Add restful api for devops to execute rolling upgrade (#29998 ) issue: #29261 This PR Add restful api for devops to execute rolling upgrade, including suspend/resume balance and manual transfer segments/channels. --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-03-27 16:15:19 +08:00
wei liu	5d752498e7	fix: Skip release duplicate l0 segment (#31540 ) issue: #31480 #31481 A task that releases a duplicate l0 segment may cause missing segments when executed on the old delegator, and may break the new delegator's leader view when executed on the new delegator. This PR skips releasing duplicate l0 segments in segment_checker, because l0 segments will be released when the channel is unsubscribed --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-03-27 12:53:10 +08:00
congqixia	8e5865f630	enhance: Save collection targets by batches (#31616 ) See also #28491 #31240 When the collection number is large, querycoord saves collection targets one by one, which is slow and may block querycoord from exiting. In a local run, a 500-collection scenario took about 40 seconds to save collection targets. This PR changes the `SaveCollectionTarget` interface into a batch one and groups collections into batches of 16 to accelerate this procedure. Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-03-27 00:09:08 +08:00
congqixia	73858b23bc	fix: Make target observer auto/manual task mutual exclusive (#31584 ) See also #30867 Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-03-26 09:57:08 +08:00
wei liu	6438d65459	fix: Grow task stuck at stopping node (#31487 ) issue: #30816 This PR fixes grow tasks getting stuck at stopping nodes Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-03-25 18:57:07 +08:00
congqixia	4d2142d041	fix: Check latest leader exists before using it (#31500 ) See also #31495 --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-03-22 18:25:07 +08:00
wei liu	03eaa5d478	fix: Load segment task promote failed (#31430 ) issue: #30816 pr #31319 introduced the logic that the segment checker needs to load level zero segments that exist only in the current target. This PR fixes load segment task promotion failing when a segment belongs only to the current target --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-03-21 18:09:07 +08:00
chyezh	9f9ef8ac32	enhance: transfer resource group and dbname to querynode when load (#30936 ) issue: #30931 Signed-off-by: chyezh <chyezh@outlook.com>	2024-03-21 11:59:12 +08:00
wei liu	7c7375031d	enhance: Add metrics for task latency in querycoord scheduler (#31405 ) This PR adds metrics for task latency in the querycoord scheduler, so if any kind of task gets stuck, it's easy to figure out from the metrics --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-03-20 19:29:06 +08:00
congqixia	a647b84f3e	enhance: Add AllPartitionsID const to replace InvalidPartitionID (#31438 ) "-1" as `InvalidPartitionID` was previously used as an all-partitions placeholder in delete cases. It's confusing and hard to maintain when a const var has more than one meaning. This PR adds `AllPartitionsID` to replace these usages in delete scenarios. --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-03-20 19:01:05 +08:00
congqixia	c3d53eb1bf	enhance: Remove metrics when target removed (#31399 ) See also #31390 --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-03-20 10:09:08 +08:00
congqixia	194a611814	enhance: Add metrics for querycoord current target cp lag (#31391 ) See also #31390 Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-03-19 14:07:05 +08:00
wei liu	3e7e9f15cd	fix: Wrong behavior of CurrentTargetFirst/NextTargetFirst in target manager (#31379 ) issue: #31162 When given scope CurrentTargetFirst/NextTargetFirst, it's expected to scan both current and next target. This PR fixes the wrong behavior of CurrentTargetFirst/NextTargetFirst in target manager, which may cause unexpected tasks to be generated and load collection to get stuck forever due to a dirty leader view. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-03-19 11:49:05 +08:00
wei liu	c26c1b33c2	fix: Transfer l0 segment to new delegator after balance (#31319 ) issue: #30186 during channel balance, after new delegator loaded, instead of syncing l0 segment's location to new delegator, we should load l0 segment on new delegator, and release the old l0 segment, then start to release old delegator. --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-03-19 09:59:05 +08:00
wei liu	4dfdb1a443	fix: save current target after target observer stop (#31315 ) issue: #28491 Should save target to meta store after target observer stops, in case the target changed Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-03-18 12:27:04 +08:00
wei liu	d79aa58b37	enhance: Speed up target recovery after query coord restart (#31240 ) issue: #28491 After querycoord restarts, it pulls a new target, which includes the channel and segment lists. When the segments loaded on querynodes reach the target, the collection can serve search/query. But if the segment list changes over time, after querycoord pulls a new target it takes a few minutes to catch up with the target's segment distribution, and before that, query/search fails due to missing segments. This PR saves the current loaded target to the meta store in querycoord's stop progress and recovers it when query coord starts, to speed up target recovery. --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-03-15 14:19:03 +08:00
chyezh	ff4237bb90	enhance: add hostname into node info (#30673 ) issue: https://github.com/milvus-io/milvus/issues/30647 - Address may be reused in k8s environment. Using hostname can be better. Signed-off-by: chyezh <chyezh@outlook.com>	2024-03-15 10:45:06 +08:00
jaime	db79be3ae0	fix: ctx cancel should be the last step while stopping server (#31220 ) issue: #31219 Signed-off-by: jaime <yun.zhang@zilliz.com>	2024-03-15 10:33:05 +08:00
congqixia	773c64ecbb	fix: Set nodeID when remove distribution (#31259 ) See also #30930 Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-03-14 15:09:03 +08:00
wei liu	06b191b164	fix: Balance channel stuck forever due to logic dead lock (#31202 ) issue: #30816 Balance channel waits until the leader view catches up with the current target before unsubscribing the old delegator, which ensures the new delegator can provide search before the old delegator is released. But another rule in segment_checker skips loading segments during balance channel, so if a query node crashes during balance channel, the new delegator can never catch up with the target and balance gets stuck forever. This PR removes the rule that skips loading segments during balance channel to avoid this logical deadlock. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-03-13 15:05:04 +08:00
congqixia	5b51c20293	fix: Use `Remove` sync type for distribution removal (#31215 ) See also #31214 Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-03-13 06:11:04 +08:00
wei liu	06df9b8462	fix: Balance segment/channel won't be triggered on multi replicas (#31107 ) issue: #30983 #30982 Because the balancer called the wrong interface to get the segment/channel list in a replica, it got a wrong average segment/channel number, which made every node have fewer segments/channels than the average, so balance was never triggered in the multi replica case. This PR fixes balance segment/channel not being triggered on multi replicas Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-03-11 20:35:04 +08:00
wei liu	ddd918ba04	enhance: change frequent log to rated level (#31084 ) This PR changes the frequent log of check shard leader to rated level --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-03-08 16:39:02 +08:00
wei liu	efe8cecc88	enhance: refactor segment dist manager interface (#31073 ) issue: #31091 This PR adds a `GetByFilter` interface to the segment dist manager, replacing the various get funcs Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-03-08 16:29:01 +08:00
wei liu	22df5061c1	fix: Leader checker can't update segment's load version (#31040 ) issue: #30890 When the leader checker finds that the leader view has an older load version of a segment, it tries to correct the leader view, but the sync action doesn't specify the latest load version, so the update operation fails. This PR fixes the leader checker being unable to update a segment's load version and repeatedly generating the same task for the scheduler. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-03-08 11:57:01 +08:00


561 Commits (463c47ced186c20ca680b4f1a382ee6dba9db421)