milvus

Commit Graph

Author	SHA1	Message	Date
congqixia	9539739781	enhance: Release compacted growing segment if in dropped list (#37245 ) See also #37205 Previously releasing growing segments could be triggered by two conditions: - Sealed Segment with same id is loaded - Segment start position is before target checkpoint ts Which has a worst case that the corresponding sealed segment is compacted and the checkpoint is pinned by a growing l0 segment. This PR introduces a new rule that: a growing segment could be released if the segment id appeared in current target dropped segment id list. Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-10-29 18:04:21 +08:00
jaime	9d16b972ea	feat: add tasks page into management WebUI (#37002 ) issue: #36621 1. Add API to access task runtime metrics, including: - build index task - compaction task - import task - balance (including load/release of segments/channels and some leader tasks on querycoord) - sync task 2. Add a debug mode to the webpage, enabled or disabled by using debug=true or debug=false in the URL query parameters. Signed-off-by: jaime <yun.zhang@zilliz.com>	2024-10-28 10:13:29 +08:00
wei liu	39a91eb100	fix: Delegator may becomes unserviceable after querycoord restart (#37055 ) issue: #37054 After querycoord restarts, segment_checker may release segments by mistake because the next target isn't ready yet. This PR requires that segment release happens only after the next target is ready. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-10-24 12:21:28 +08:00
wei liu	3cd0b26285	enhance: Enable dynamic update loaded collection's replica (#35822 ) issue: #35821 After a collection is loaded, increasing or decreasing its replica number requires releasing and loading it again. Milvus offers 4 solutions to update a loaded collection's replicas; this PR aims to dynamically change the replica number without release. After the replica number changes, Milvus executes load replica or release replica asynchronously, and the replica load status can be checked with the getReplicas API. Note that if more replicas are set than the querynodes can afford, the new replicas won't be loaded successfully until enough querynodes join. --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-09-25 10:13:18 +08:00
wei liu	fb2a41a94c	fix: Clean dirty segment/channel on querynode (#36202 ) issue: #36201 after querynode has been remove from replica, all dirty segment/channel on it should be released. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-09-13 18:15:08 +08:00
wei liu	30a99b66c1	fix: Fix logic dead lock when delegator has high memory usage (#36065 ) issue: #36064 When the delegator has high memory usage, loading an l0 segment will fail, and the balance segment task will be blocked by the load segment task, so the delegator can't free memory by moving out some segments, which causes a logic dead lock. This PR removes the limit for balance and permits segment and balance tasks to execute in parallel, which won't cause side effects because: 1. one segment can only have one task in qc's scheduler, and a load/release task will replace a balance task if necessary 2. balance speed has been limited, so it won't block load segment tasks. 3. if a collection has a load task and a balance task at the same time, the load task will be scheduled first due to its higher priority. --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-09-09 10:21:06 +08:00
wei liu	c84ea5465c	fix: Fix some replicas don't participate in the query after the failure recovery (#35850 ) issue: #35846 querycoord will notify proxy to update shard leader cache after delegator location changes, but during querynode's failure recovery, some delegator may become unserviceable due to lacking of segments, and back to serviceable after segment loaded, so we also need to notify proxy to invalidate shard leader cache when delegator serviceable state changes. This PR will maintain querynode's serviceable state during heartbeat, and notify proxy to invalidate shard leader cache if serviceable state changes. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-09-03 15:39:03 +08:00
congqixia	86691656f3	enhance: Change frequent balancer debug log to rated one (#35749 ) The "skip balance" log is too frequent at debug level. This PR changes it into a rated one. Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-08-29 10:07:00 +08:00
Chun Han	3faef63a25	enhance: add log for partition stats( #30376 ) (#35219 ) related: #30376 Signed-off-by: MrPresent-Han <chun.han@gmail.com> Co-authored-by: MrPresent-Han <chun.han@gmail.com>	2024-08-02 19:34:22 +08:00
wei liu	166fc902b0	enhance: Limit collection's normal balance speed (#34810 ) issue: #34798 after we remove the task priority on query coord, to avoid load/release segment blocked by too much balance task, we limit the balance task size in each round. at same time, we reduce the balance interval to trigger balance more frequently. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-07-24 19:11:44 +08:00
wei liu	40e39ef7c9	fix: Avoid segment lack caused by deduplicate segment task (#34782 ) issue: #34781 When a balance segment task hasn't finished yet, query coord may find 2 loaded copies of the segment and generate a deduplicate task, which may cancel the balance task. The old copy is then released while the new copy is canceled before it is ready, so search fails due to segment lack. This PR sets the deduplicate segment task's priority to low, to avoid the balance segment task being canceled by the deduplicate task. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-07-22 16:35:43 +08:00
congqixia	b284b81a47	fix: Check partition in current target when observing partition load status (#34282 ) See also #34234 `LoadPartitions` does not guarantee the current target has loading partitions if there are some partitions already loaded before. This PR check current target contains the partition to load when advancing loading percentage to 100. Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-07-01 17:40:07 +08:00
wei liu	f7ecafe77d	enhance: Skip update index for L0 segment (#34099 ) Trying to update the index for an l0 segment fails with `index not found`. This PR skips the index update for l0 segments. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-07-01 10:26:06 +08:00
jaime	9630974fbb	enhance: move rocksmq from internal to pkg module (#33881 ) issue: #33956 Signed-off-by: jaime <yun.zhang@zilliz.com>	2024-06-25 21:18:15 +08:00
Chun Han	f7af323d1e	fix: sync partition stats blocking balance task(#33741 ) (#33742 ) related: #33741 Signed-off-by: MrPresent-Han <chun.han@zilliz.com>	2024-06-11 14:21:56 +08:00
wayblink	a1232fafda	feat: Major compaction (#33620 ) #30633 Signed-off-by: wayblink <anyang.wang@zilliz.com> Co-authored-by: MrPresent-Han <chun.han@zilliz.com>	2024-06-10 21:34:08 +08:00
wei liu	2013d97243	enhance: Enable to dynamic update balancer policy in querycoord (#33037 ) issue: #33036 This PR enable to dynamic update balancer policy without restart querycoord. --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-05-21 14:29:39 +08:00
wei liu	a7f6193bfc	fix: query node may stuck at stopping progress (#33104 ) issue: #33103 when try to do stopping balance for stopping query node, balancer will try to get node list from replica.GetNodes, then check whether node is stopping, if so, stopping balance will be triggered for this replica. after the replica refactor, replica.GetNodes only return rwNodes, and the stopping node maintains in roNodes, so balancer couldn't find replica which contains stopping node, and stopping balance for replica won't be triggered, then query node will stuck forever due to segment/channel doesn't move out. --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-05-20 10:21:38 +08:00
wei liu	e2332bdc17	enhance: Enable channel exclusive balance policy (#32911 ) issue: #32910 * split replica's node list to channels when create replicas * balance nodes among channels when node change happens * implement channel level balance, let balance happens in channel level Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-05-10 17:27:31 +08:00
wei liu	fad8f0afa5	enhance: enable stopping balance after balance has been suspended (#32812 ) issue: #32811 Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-05-08 10:15:29 +08:00
wei liu	ba02d54a30	enhance: update shard leader cache when leader location changed (#32470 ) issue: #32466 this PR enhance that when shard location changed, update proxy's shard leader cache. in case of query node failover case, proxy can find replica recover --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-05-08 10:05:29 +08:00
Xiaofan	02ace25c68	enhance: reduce the cpu usage when collection number is high (#32245 ) related to #32165 1. for all the manager, support collection level index 2. remove collection level filter to avoid extra cpu usage when collection number increases Signed-off-by: xiaofanluan <xiaofan.luan@zilliz.com>	2024-04-26 11:49:25 +08:00
wei liu	4822b109bd	fix: Skip to load l0 segment on old version query node (#32124 ) issue: #32107 during rolling upgrade progress, skip to load l0 segment on old version query node --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-04-15 11:23:23 +08:00
chyezh	48fe977a9d	enhance: declarative resource group api (#31930 ) issue: #30647 - Add declarative resource group api - Add config for resource group management - Resource group recovery enhancement --------- Signed-off-by: chyezh <chyezh@outlook.com>	2024-04-15 08:13:19 +08:00
wei liu	c4806b69c4	enhance: Refactor leader view manager interface (#31133 ) issue: #31091 This PR add GetByFilter interface in leader view manager, instead of all kind of get func --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-04-10 15:13:36 +08:00
wei liu	177ddda47f	fix: Check stale should check leader task's leader id (#31962 ) issue: #30816 check stale rules for leader task: 1. for reduce leader task, it should keep executing until leader's node become offline. 2. for grow leader task,it should keep executing until leader's node become stopping. This PR check leader node's stopping state for grow leader task Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-04-09 15:33:25 +08:00
chyezh	a2502bde75	enhance: replica manager enhancement (#31496 ) issue: #30647 - ReplicaManager manage read only node now, and always do persistent of node distribution of replica. - All segment/channel checker using ReplicaManager to get read-only node or read-write node, but not ResourceManager. - ReplicaManager promise that only apply unique querynode to one replica in same collection now (replicas in same collection never hold same querynode at same time). - ReplicaManager promise that fairly node count assignment policy if multi replicas of collection is assigned to one resource group. - Move some parameters check into ReplicaManager to avoid data race. - Allow transfer replica to resource group that already load replica of same collection - Allow transfer node between resource groups that load replica of same collection --------- Signed-off-by: chyezh <chyezh@outlook.com>	2024-04-05 04:57:16 +08:00
Bingyi Sun	91cb529ba6	fix: get latest collection info when checking index (#31744 ) issue: https://github.com/milvus-io/milvus/issues/31727 --------- Signed-off-by: sunby <sunbingyi1992@gmail.com>	2024-04-02 14:43:13 +08:00
wei liu	0944a1f790	enhance: Refactor channel dist manager interface (#31119 ) issue: #31091 This PR add GetByFilter interface in channel dist manager, instead of all kind of get func --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-04-02 10:23:14 +08:00
wei liu	bb500d66c7	fix: Remove segment from leader view can't be executed (#31663 ) issue: #31664 Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-04-01 10:39:12 +08:00
wei liu	c311932d5f	fix: Update segment's version in leader task (#31643 ) issue: #31468 1. when segment's version in leader view doesn't match segment's version in dist, should update leader view 2. after call loadDeltalog, should update segment's load version with latest ts 3. change leader task's priority from high to low, to avoid leader task replace segment task and balance task --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-04-01 10:37:21 +08:00
wei liu	92971707de	enhance: Add restful api for devops to execute rolling upgrade (#29998 ) issue: #29261 This PR Add restful api for devops to execute rolling upgrade, including suspend/resume balance and manual transfer segments/channels. --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-03-27 16:15:19 +08:00
wei liu	5d752498e7	fix: Skip release duplicate l0 segment (#31540 ) issue: #31480 #31481 release duplicate l0 segment task, which execute on old delegator may cause segment lack, and execute on new delegator may break new delegator's leader view. This PR skip release duplicate l0 segment by segment_checker, cause l0 segment will be released with unsub channel --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-03-27 12:53:10 +08:00
congqixia	4d2142d041	fix: Check latest leader exists before using it (#31500 ) See also #31495 --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-03-22 18:25:07 +08:00
chyezh	9f9ef8ac32	enhance: transfer resource group and dbname to querynode when load (#30936 ) issue: #30931 Signed-off-by: chyezh <chyezh@outlook.com>	2024-03-21 11:59:12 +08:00
wei liu	c26c1b33c2	fix: Transfer l0 segment to new delegator after balance (#31319 ) issue: #30186 during channel balance, after new delegator loaded, instead of syncing l0 segment's location to new delegator, we should load l0 segment on new delegator, and release the old l0 segment, then start to release old delegator. --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-03-19 09:59:05 +08:00
chyezh	ff4237bb90	enhance: add hostname into node info (#30673 ) issue: https://github.com/milvus-io/milvus/issues/30647 - Address may be reused in k8s environment. Using hostname can be better. Signed-off-by: chyezh <chyezh@outlook.com>	2024-03-15 10:45:06 +08:00
wei liu	06b191b164	fix: Balance channel stuck forever due to logic dead lock (#31202 ) issue: #30816 Balance channel waits until the leader view catches up with the current target before it starts to unsub the old delegator, which makes sure that the new delegator can serve search before the old delegator is released. But another rule in segment_checker skips loading segments during balance channel. So if a query node crashes during balance channel, the new delegator can never catch up with the target and gets stuck forever. This PR removes the rule that skips loading segments during balance channel to avoid the logic dead lock here. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-03-13 15:05:04 +08:00
wei liu	ddd918ba04	enhance: change frequency log to rated level (#31084 ) This PR change frequency log of check shard leader to rated level --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-03-08 16:39:02 +08:00
wei liu	efe8cecc88	enhance: refactor segment dist manager interface (#31073 ) issue: #31091 This PR add `GetByFilter` interface in segment dist manager, instead of all kind of get func Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-03-08 16:29:01 +08:00
wei liu	22df5061c1	fix: Leader checker can't update segment's load version (#31040 ) issue: #30890 When the leader checker finds that the leader view has an older load version of a segment, it tries to correct the leader view. But the sync action doesn't specify the latest load version, so the update operation fails. This PR fixes the leader checker being unable to update the segment's load version and repeatedly generating the same task for the scheduler. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-03-08 11:57:01 +08:00
wei liu	2a047103d6	fix: Dirty sealed segment won't release after channel balance (#31095 ) issue: #31074 This PR fix dirty sealed segment doesn't release after channel balance, dirty sealed segment means segment doesn't exist in targets. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-03-07 16:23:01 +08:00
wei liu	6dd7297178	fix: Skip generate balance task when target not ready (#30724 ) issue: #30723 This PR skip generate balance task when collection's target isn't ready. also refine the check stale logic in query coord's scheduler, if channel exist in current or next target, task won't be canceled. --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-02-23 10:32:53 +08:00
congqixia	7b91fa3db8	fix: Make leader checker generate leader task instead of segment task (#30258 ) See also #30150 For leader view distribution with offline nodes, a release task can never be sent to querynode due to the targetNode online check logic. Even if the request is dispatched, a normal release task does not have the "force" flag when calling `delegator.ReleaseSegment`. This PR adds a new type of querycoord task: LeaderTask, the responsibility of which is to rectify leader view distribution. --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-02-21 11:08:51 +08:00
aoiasd	f84d9a589a	fix: channel checker reduce balancing channels. (#30087 ) Ignore leader unavailable when channel checker judge repeat channel to avoid channel checker remove channels balancing. relate: https://github.com/milvus-io/milvus/issues/29841 https://github.com/milvus-io/milvus/issues/29838 Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>	2024-01-26 10:59:00 +08:00
wei liu	f69f65ff68	fix: Leader checker can't remove segment from leader view (#30151 ) issue: #30150 This PR fix three problems: 1. leader checker use wrong node id when generate release task, which cause the release task finished immediately 2. the release request generated by leader_checker doesn't set the `force` flag, the operation to clean leader view on delegator will fail. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-01-20 18:58:58 +08:00
wei liu	f8695aef9d	fix: Trigger leader checker too frequency (#29991 ) issue: #29841 This PR fixes the leader checker using the wrong check interval, which caused the leader checker to trigger too frequently. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-01-17 19:40:53 +08:00
smellthemoon	595ec2559c	enhance: change some frequent log level (#29953 ) Signed-off-by: lixinguo <xinguo.li@zilliz.com> Co-authored-by: lixinguo <xinguo.li@zilliz.com>	2024-01-14 10:19:16 +08:00
wei liu	565fc3a019	enhance: Skip generate load segment task (#29724 ) issue: #29814 If the channel is not subscribed yet, the generated load segment task will be removed from the task scheduler, because the load segment task needs to be transferred to the worker node by the shard leader. This PR skips generating load segment tasks when the channel is not subscribed yet. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-01-12 18:56:58 +08:00
congqixia	c4ddfff2a7	enhance: make Load process traceable in querycoord (#29806 ) See also #29803 This PR: - Add trace span for collection/partition load - Use TraceSpan to generate Segment/ChannelTasks when loading - Refine BaseTask trace tag usage --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-01-10 09:58:49 +08:00


118 Commits (a2ecff1fb71c5a1228a76cf761db57c5beccbc64)