Commit Graph

87 Commits (f7a59766df42c7e448e841920625fed4453e2e8c)

Author SHA1 Message Date
wei liu 92971707de
enhance: Add restful api for devops to execute rolling upgrade (#29998)
issue: #29261
This PR adds a RESTful API for DevOps to execute rolling upgrades,
including suspending/resuming balance and manually transferring
segments/channels.

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-03-27 16:15:19 +08:00
wei liu 5d752498e7
fix: Skip release duplicate l0 segment (#31540)
issue: #31480 #31481

A task that releases a duplicate L0 segment may cause missing segments
when it executes on the old delegator, and may break the new delegator's
leader view when it executes on the new delegator.

This PR makes segment_checker skip releasing duplicate L0 segments,
because L0 segments are released when the channel is unsubscribed.

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-03-27 12:53:10 +08:00
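The segment_checker rule described above can be pictured with a small Go sketch. All names here (`Segment`, `SegmentLevel`, `duplicatesToRelease`) are hypothetical. Keeping the copy with the highest version is an assumption. The sketch only shows L0 segments being excluded when choosing which duplicate copies to release.

```go
package main

import "fmt"

// SegmentLevel is a hypothetical stand-in for a segment's level.
type SegmentLevel int

const (
	LevelL0 SegmentLevel = iota
	LevelL1
)

// Segment is a hypothetical, simplified view of one loaded segment copy.
type Segment struct {
	ID      int64
	NodeID  int64
	Level   SegmentLevel
	Version int64 // assumed: a higher version means a newer load
}

// duplicatesToRelease keeps the newest copy of each segment and returns the
// older copies for release. L0 segments are skipped entirely, since they are
// released when the channel is unsubscribed.
func duplicatesToRelease(dist []Segment) []Segment {
	newest := make(map[int64]Segment)
	for _, s := range dist {
		if s.Level == LevelL0 {
			continue
		}
		if cur, ok := newest[s.ID]; !ok || s.Version > cur.Version {
			newest[s.ID] = s
		}
	}
	var toRelease []Segment
	for _, s := range dist {
		if s.Level != LevelL0 && s != newest[s.ID] {
			toRelease = append(toRelease, s)
		}
	}
	return toRelease
}

func main() {
	dist := []Segment{
		{ID: 1, NodeID: 10, Level: LevelL1, Version: 1},
		{ID: 1, NodeID: 11, Level: LevelL1, Version: 2},
		{ID: 2, NodeID: 10, Level: LevelL0, Version: 1},
		{ID: 2, NodeID: 11, Level: LevelL0, Version: 2},
	}
	fmt.Println(duplicatesToRelease(dist)) // [{1 10 1 1}]
}
```

Only the older L1 copy is returned. Both copies of the L0 segment stay in place until the channel unsubscription releases them.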
congqixia 4d2142d041
fix: Check latest leader exists before using it (#31500)
See also #31495

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-03-22 18:25:07 +08:00
chyezh 9f9ef8ac32
enhance: transfer resource group and dbname to querynode when load (#30936)
issue: #30931

Signed-off-by: chyezh <chyezh@outlook.com>
2024-03-21 11:59:12 +08:00
wei liu c26c1b33c2
fix: Transfer l0 segment to new delegator after balance (#31319)
issue: #30186

During channel balance, after the new delegator is loaded, we should load
the L0 segment on the new delegator and release the old L0 segment,
instead of syncing the L0 segment's location to the new delegator. Only
then should we start releasing the old delegator.

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-03-19 09:59:05 +08:00
chyezh ff4237bb90
enhance: add hostname into node info (#30673)
issue: https://github.com/milvus-io/milvus/issues/30647

- An address may be reused in a k8s environment, so identifying a node by
hostname is more reliable.

Signed-off-by: chyezh <chyezh@outlook.com>
2024-03-15 10:45:06 +08:00
wei liu 06b191b164
fix: Balance channel stuck forever due to logic dead lock (#31202)
issue: #30816

Channel balance waits until the leader view catches up with the current
target before unsubscribing the old delegator. This ensures that the
new delegator can serve search before the old delegator is released.
However, another rule in segment_checker skipped loading segments during
channel balance. So if a query node crashed during channel balance, the
new delegator could never catch up with the target, and the balance was
stuck forever.

This PR removes the rule that skips loading segments during channel
balance, to avoid this logic deadlock.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-03-13 15:05:04 +08:00
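The deadlock above comes from two rules interacting. The Go sketch below models what happens once the skip rule is removed. It uses hypothetical names (`missingSegments`, `caughtUp`) and treats a leader view as a set of segment IDs, which is a simplification. Segments lost in a crash are reported as missing even while the channel is balancing, so they get reloaded and the new delegator can catch up.

```go
package main

import (
	"fmt"
	"sort"
)

// missingSegments returns target segments absent from the new delegator's
// leader view, sorted by ID. Following the fix, it does not skip channels
// that are being balanced, so segments lost in a query node crash are
// reloaded and the new delegator can still catch up.
func missingSegments(leaderView map[int64]bool, target []int64) []int64 {
	var missing []int64
	for _, id := range target {
		if !leaderView[id] {
			missing = append(missing, id)
		}
	}
	sort.Slice(missing, func(i, j int) bool { return missing[i] < missing[j] })
	return missing
}

// caughtUp reports whether the leader view covers the whole current target;
// only then may the old delegator be unsubscribed.
func caughtUp(leaderView map[int64]bool, target []int64) bool {
	return len(missingSegments(leaderView, target)) == 0
}

func main() {
	target := []int64{1, 2, 3}
	view := map[int64]bool{1: true, 2: true} // segment 3 was lost in a crash
	fmt.Println(caughtUp(view, target), missingSegments(view, target)) // false [3]
	for _, id := range missingSegments(view, target) {
		view[id] = true // the segment checker loads the missing segment
	}
	fmt.Println(caughtUp(view, target)) // true
}
```

With the old rule, `missingSegments` would have returned nothing for a balancing channel. `caughtUp` would then stay false forever.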
wei liu ddd918ba04
enhance: change frequency log to rated level (#31084)
This PR changes the frequent shard-leader check log to a rated
(rate-limited) log level.

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-03-08 16:39:02 +08:00
wei liu efe8cecc88
enhance: refactor segment dist manager interface (#31073)
issue: #31091
This PR adds a `GetByFilter` interface to the segment dist manager,
replacing the various kinds of get functions.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-03-08 16:29:01 +08:00
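Only the method name `GetByFilter` comes from the commit message. The Go sketch below is a hypothetical illustration of the filter-based pattern and does not reproduce the actual Milvus signature. The names `SegmentFilter`, `WithNodeID`, `WithChannel` and the toy `SegmentDistManager` are all assumptions. The idea is that one variadic getter combines composable predicates, replacing a dedicated getter for every lookup combination.

```go
package main

import "fmt"

// Segment is a hypothetical, simplified distribution entry.
type Segment struct {
	ID      int64
	NodeID  int64
	Channel string
}

// SegmentFilter is a hypothetical predicate type; the real one may differ.
type SegmentFilter func(s *Segment) bool

// WithNodeID matches segments loaded on the given node.
func WithNodeID(nodeID int64) SegmentFilter {
	return func(s *Segment) bool { return s.NodeID == nodeID }
}

// WithChannel matches segments that belong to the given channel.
func WithChannel(channel string) SegmentFilter {
	return func(s *Segment) bool { return s.Channel == channel }
}

// SegmentDistManager is a toy stand-in for the segment dist manager.
type SegmentDistManager struct {
	segments []*Segment
}

// GetByFilter returns the segments that satisfy all filters; with no
// filters it returns every segment.
func (m *SegmentDistManager) GetByFilter(filters ...SegmentFilter) []*Segment {
	var out []*Segment
	for _, s := range m.segments {
		match := true
		for _, f := range filters {
			if !f(s) {
				match = false
				break
			}
		}
		if match {
			out = append(out, s)
		}
	}
	return out
}

func main() {
	m := &SegmentDistManager{segments: []*Segment{
		{ID: 1, NodeID: 1, Channel: "ch-0"},
		{ID: 2, NodeID: 2, Channel: "ch-0"},
		{ID: 3, NodeID: 1, Channel: "ch-1"},
	}}
	for _, s := range m.GetByFilter(WithNodeID(1), WithChannel("ch-0")) {
		fmt.Println(s.ID) // 1
	}
}
```

Adding a new lookup dimension then means adding one filter constructor rather than another getter method.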
wei liu 22df5061c1
fix: Leader checker can't update segment's load version (#31040)
issue: #30890

When the leader checker finds that the leader view holds an older load
version of a segment, it tries to correct the leader view. But the sync
action doesn't specify the latest load version, so the update operation
fails.

This PR fixes the leader checker failing to update the segment's load
version and repeatedly generating the same task for the scheduler.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-03-08 11:57:01 +08:00
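Why a missing version makes the update fail can be shown with a hypothetical Go model. Here a leader view accepts a sync only when the version is newer than the one it holds. `LeaderView`, `SyncAction` and `Apply` are illustrative names, not the Milvus types, and the "newer version wins" rule is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// LeaderView is a hypothetical leader view: segment ID -> loaded version.
type LeaderView struct {
	Versions map[int64]int64
}

// SyncAction is a hypothetical sync request sent by the leader checker.
type SyncAction struct {
	SegmentID int64
	Version   int64 // must carry the latest load version
}

var errStaleVersion = errors.New("sync action does not carry a newer version")

// Apply updates the leader view only when the action carries a newer version.
func (v *LeaderView) Apply(a SyncAction) error {
	if cur, ok := v.Versions[a.SegmentID]; ok && a.Version <= cur {
		return errStaleVersion
	}
	v.Versions[a.SegmentID] = a.Version
	return nil
}

func main() {
	view := &LeaderView{Versions: map[int64]int64{100: 3}}
	latest := int64(5) // version recorded in the distribution

	// Before the fix: the version is left unset, so the update fails and
	// the checker keeps generating the same task.
	fmt.Println(view.Apply(SyncAction{SegmentID: 100}))

	// After the fix: the action carries the latest load version.
	err := view.Apply(SyncAction{SegmentID: 100, Version: latest})
	fmt.Println(err, view.Versions[100]) // <nil> 5
}
```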
wei liu 2a047103d6
fix: Dirty sealed segment won't release after channel balance (#31095)
issue: #31074
This PR fixes dirty sealed segments not being released after channel
balance. A dirty sealed segment is a segment that doesn't exist in the
targets.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-03-07 16:23:01 +08:00
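The definition of a dirty sealed segment translates into a simple set difference. The Go sketch below assumes that "the targets" means the current and the next target. The function name `dirtySegments` is hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// dirtySegments returns loaded sealed segments that exist in neither the
// current nor the next target; these should be released after balance.
func dirtySegments(loaded []int64, current, next map[int64]bool) []int64 {
	var dirty []int64
	for _, id := range loaded {
		if !current[id] && !next[id] {
			dirty = append(dirty, id)
		}
	}
	sort.Slice(dirty, func(i, j int) bool { return dirty[i] < dirty[j] })
	return dirty
}

func main() {
	current := map[int64]bool{1: true, 2: true}
	next := map[int64]bool{2: true, 3: true}
	fmt.Println(dirtySegments([]int64{4, 1, 3, 5}, current, next)) // [4 5]
}
```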
wei liu 6dd7297178
fix: Skip generate balance task when target not ready (#30724)
issue: #30723

This PR skips generating balance tasks when the collection's target
isn't ready. It also refines the stale-check logic in QueryCoord's
scheduler: if the channel exists in the current or next target, the task
won't be canceled.

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-02-23 10:32:53 +08:00
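The Go sketch below expresses both rules from this commit as predicates. It uses a hypothetical `Target` type whose readiness is a plain boolean, which is an assumption. The real readiness check and scheduler types are not shown in this log.

```go
package main

import "fmt"

// Target is a hypothetical snapshot of a collection target.
type Target struct {
	Ready    bool
	Channels map[string]bool
}

// shouldGenerateBalance skips balance task generation while the
// collection's current target is not ready.
func shouldGenerateBalance(current Target) bool {
	return current.Ready
}

// isChannelTaskStale reports whether a channel task should be canceled:
// only when its channel exists in neither the current nor the next target.
func isChannelTaskStale(channel string, current, next Target) bool {
	return !current.Channels[channel] && !next.Channels[channel]
}

func main() {
	current := Target{Ready: true, Channels: map[string]bool{"ch-0": true}}
	next := Target{Channels: map[string]bool{"ch-0": true, "ch-1": true}}
	fmt.Println(shouldGenerateBalance(current))            // true
	fmt.Println(isChannelTaskStale("ch-1", current, next)) // false: in next target
	fmt.Println(isChannelTaskStale("ch-2", current, next)) // true
}
```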
congqixia 7b91fa3db8
fix: Make leader checker generate leader task instead of segment task (#30258)
See also #30150

For a leader view distribution with offline nodes, a release task can
never be sent to the querynode because of the targetNode online check
logic. Even if the request is dispatched, a normal release task does not
set the "force" flag when calling `delegator.ReleaseSegment`.

This PR adds a new type of querycoord task, LeaderTask, whose
responsibility is to rectify the leader view distribution.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-02-21 11:08:51 +08:00
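The difference between a normal release task and the new LeaderTask can be modeled in a few lines of Go. Everything here is hypothetical. The names `ReleaseRequest`, `segmentTaskRequest` and `leaderTaskRequest` are made up, and reducing "online check" to a callback is a simplification. The model shows that a segment task is gated on the target node being online and never sets Force. A leader task needs only the leader to be online and always sets Force.

```go
package main

import "fmt"

// ReleaseRequest is a hypothetical release request; Force asks the
// delegator to drop the entry from its leader view unconditionally.
type ReleaseRequest struct {
	SegmentID int64
	Force     bool
}

// segmentTaskRequest models a normal release task: it is only built when
// the target node is online, and it never sets Force.
func segmentTaskRequest(target, segmentID int64, online func(int64) bool) *ReleaseRequest {
	if !online(target) {
		return nil // never sent while the target node is offline
	}
	return &ReleaseRequest{SegmentID: segmentID}
}

// leaderTaskRequest models a leader task: it only needs the leader
// (delegator) to be online and always sets Force.
func leaderTaskRequest(leader, segmentID int64, online func(int64) bool) *ReleaseRequest {
	if !online(leader) {
		return nil
	}
	return &ReleaseRequest{SegmentID: segmentID, Force: true}
}

func main() {
	online := func(node int64) bool { return node == 1 } // node 2 is offline
	fmt.Println(segmentTaskRequest(2, 100, online) == nil) // true
	req := leaderTaskRequest(1, 100, online)
	fmt.Println(req.SegmentID, req.Force) // 100 true
}
```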
aoiasd f84d9a589a
fix: channel checker reduce balancing channels. (#30087)
Ignore leader unavailability when the channel checker judges repeated
channels, to prevent the channel checker from removing channels that are
being balanced.
Related: https://github.com/milvus-io/milvus/issues/29841
https://github.com/milvus-io/milvus/issues/29838

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2024-01-26 10:59:00 +08:00
wei liu f69f65ff68
fix: Leader checker can't remove segment from leader view (#30151)
issue: #30150

This PR fixes three problems:
1. The leader checker used the wrong node ID when generating the release
task, which caused the release task to finish immediately.
2. The release request generated by leader_checker didn't set the
`force` flag, so the operation to clean the leader view on the delegator
failed.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-01-20 18:58:58 +08:00
wei liu f8695aef9d
fix: Trigger leader checker too frequency (#29991)
issue: #29841
This PR fixes the leader checker using the wrong check interval, which
caused the leader checker to trigger too frequently.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-01-17 19:40:53 +08:00
smellthemoon 595ec2559c
enhance: change some frequent log level (#29953)
Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2024-01-14 10:19:16 +08:00
wei liu 565fc3a019
enhance: Skip generate load segment task (#29724)
issue: #29814
If the channel is not subscribed yet, the generated load segment task
will be removed from the task scheduler, because the load segment task
has to be forwarded to the worker node by the shard leader.

This PR skips generating load segment tasks when the channel is not
subscribed yet.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-01-12 18:56:58 +08:00
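The rule reduces to a filter on the channel's subscription state. In the Go sketch below, the names `SegmentInfo` and `loadTasksFor` are hypothetical, and representing "subscribed" as a set of channel names is a simplification.

```go
package main

import "fmt"

// SegmentInfo is a hypothetical target segment with its channel.
type SegmentInfo struct {
	ID      int64
	Channel string
}

// loadTasksFor returns the IDs of missing segments whose channel is already
// subscribed; the rest are skipped, since their load tasks could not be
// forwarded by a shard leader and would only be removed by the scheduler.
func loadTasksFor(missing []SegmentInfo, subscribed map[string]bool) []int64 {
	var ids []int64
	for _, s := range missing {
		if subscribed[s.Channel] {
			ids = append(ids, s.ID)
		}
	}
	return ids
}

func main() {
	missing := []SegmentInfo{{ID: 1, Channel: "ch-0"}, {ID: 2, Channel: "ch-1"}}
	fmt.Println(loadTasksFor(missing, map[string]bool{"ch-0": true})) // [1]
}
```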
congqixia c4ddfff2a7
enhance: make Load process traceable in querycoord (#29806)
See also #29803

This PR:
- Add trace span for collection/partition load
- Use TraceSpan to generate Segment/ChannelTasks when loading
- Refine BaseTask trace tag usage

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-01-10 09:58:49 +08:00
congqixia b5f039a221
fix: Assertion all async invocations in test case (#29737)
Resolves: #29736

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-01-07 15:54:47 +08:00
wei liu e98c62abbb
enhance: refactor leader_observer to leader_checker (#29454)
issue: #29453 

Syncing distribution by RPC also calls loadSegment/releaseSegment,
which may cause all kinds of concurrency issues on the same segment, such
as a concurrent load and release of one segment.
This PR adds leader_checker, which generates load/release tasks to
correct the leader view instead of syncing distribution by RPC.

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-01-05 15:54:55 +08:00
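The leader_checker idea, comparing the leader view against the actual distribution and emitting corrective tasks, can be sketched in Go. The `Task` type, its string `Kind` and the segment-to-node maps are hypothetical simplifications of the real QueryCoord structures.

```go
package main

import (
	"fmt"
	"sort"
)

// Task is a hypothetical task the leader checker hands to the scheduler.
type Task struct {
	Kind      string // "load" or "release"
	SegmentID int64
	NodeID    int64
}

// leaderCheck compares a delegator's leader view (segment -> node) with the
// actual distribution and emits tasks that correct the view, instead of
// pushing the distribution to the delegator over a sync RPC.
func leaderCheck(leaderView, dist map[int64]int64) []Task {
	var tasks []Task
	for seg, node := range dist {
		if viewNode, ok := leaderView[seg]; !ok || viewNode != node {
			tasks = append(tasks, Task{Kind: "load", SegmentID: seg, NodeID: node})
		}
	}
	for seg, node := range leaderView {
		if _, ok := dist[seg]; !ok {
			tasks = append(tasks, Task{Kind: "release", SegmentID: seg, NodeID: node})
		}
	}
	// Map iteration order is random; sort so the output is deterministic.
	sort.Slice(tasks, func(i, j int) bool { return tasks[i].SegmentID < tasks[j].SegmentID })
	return tasks
}

func main() {
	view := map[int64]int64{1: 10, 3: 11}
	dist := map[int64]int64{1: 10, 2: 11}
	fmt.Println(leaderCheck(view, dist)) // [{load 2 11} {release 3 11}]
}
```

Because each correction becomes an ordinary task, the scheduler can serialize operations on the same segment. This avoids the concurrent load/release cases described above.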
Bingyi Sun 10bb2431d8
test: add checker unittests (#28954)
issue: https://github.com/milvus-io/milvus/issues/28610

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2023-12-05 10:56:33 +08:00
Bingyi Sun 45e6801ce4
feat: Add checker activation service interfaces (#28850)
issue: #28610

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2023-12-04 17:38:37 +08:00
Bingyi Sun 8514a39d1a
feat: Add checker activation (#28611)
issue: https://github.com/milvus-io/milvus/issues/28610

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2023-11-24 18:08:24 +08:00
wei liu 86ec6f4832
fix load index for stopping node (#28047)
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-11-03 07:58:18 +08:00
congqixia 5d2eba2c2f
Set qcv2 index task priority to Low (#28117)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-11-02 23:22:17 +08:00
congqixia 852be152de
Change task sourceID to stringer interface (#27965)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-10-27 01:08:12 +08:00
wei liu e0222b2ce3
refine target manager code style (#27883)
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-10-25 00:44:12 +08:00
yah01 635efdf170
Schedule loading L0 segments first (#27593)
Signed-off-by: yah01 <yah2er0ne@outlook.com>
2023-10-19 11:14:06 +08:00
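Scheduling L0 segments first amounts to a stable partition of the pending load tasks. The Go sketch below assumes a hypothetical `LoadTask` with an `IsL0` flag. It uses `sort.SliceStable`, so tasks keep their original relative order within each group.

```go
package main

import (
	"fmt"
	"sort"
)

// LoadTask is a hypothetical load-segment task.
type LoadTask struct {
	SegmentID int64
	IsL0      bool
}

// orderL0First moves L0 segment tasks to the front while keeping the
// original relative order within each group.
func orderL0First(tasks []LoadTask) {
	sort.SliceStable(tasks, func(i, j int) bool { return tasks[i].IsL0 && !tasks[j].IsL0 })
}

func main() {
	tasks := []LoadTask{{1, false}, {2, true}, {3, false}, {4, true}}
	orderL0First(tasks)
	fmt.Println(tasks) // [{2 true} {4 true} {1 false} {3 false}]
}
```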
congqixia cd5f03f80c
Add var-name sub linter in revive (#27424)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-10-07 10:09:31 +08:00
yah01 a8ce1b6686
Refine QueryCoord stopping (#27371)
Signed-off-by: yah01 <yah2er0ne@outlook.com>
2023-09-27 16:27:27 +08:00
SimFG 26f06dd732
Format the code (#27275)
Signed-off-by: SimFG <bang.fu@zilliz.com>
2023-09-21 09:45:27 +08:00
congqixia c45c32fad4
Set task reason for collection released (#26962)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-09-10 15:15:17 +08:00
MrPresent-Han 2101f2d289
fix unstable checker id due to go map iteration(#26943) (#26944)
Signed-off-by: MrPresent-Han <chun.han@zilliz.com>
2023-09-10 10:11:16 +08:00
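Go deliberately randomizes map iteration order. Any ID assigned while ranging over a map can therefore change between runs. One common remedy is to sort the keys before assigning IDs, and the Go sketch below shows that approach. It is a hypothetical illustration (`assignCheckerIDs` is a made-up name), and the actual fix in #26944 may have taken a different approach.

```go
package main

import (
	"fmt"
	"sort"
)

// assignCheckerIDs gives each checker a stable ID by iterating over the
// sorted names instead of ranging over the map directly.
func assignCheckerIDs(checkers map[string]struct{}) map[string]int {
	names := make([]string, 0, len(checkers))
	for name := range checkers {
		names = append(names, name)
	}
	sort.Strings(names)
	ids := make(map[string]int, len(names))
	for i, name := range names {
		ids[name] = i + 1
	}
	return ids
}

func main() {
	checkers := map[string]struct{}{"segment": {}, "channel": {}, "balance": {}}
	ids := assignCheckerIDs(checkers)
	fmt.Println(ids["balance"], ids["channel"], ids["segment"]) // 1 2 3
}
```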
congqixia 758aad705d
Fix checker using default interval after manual check (#26953)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-09-09 08:29:16 +08:00
Enwei Jiao fb0705df1b
Decouple basetable and componentparam (#26725)
Signed-off-by: Enwei Jiao <enwei.jiao@zilliz.com>
2023-09-05 10:31:48 +08:00
wei liu 5602b22531
refine checker code style (#26759)
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-09-01 11:57:01 +08:00
wei liu 949c320185
remove pull target from qc recover (#26775)
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-09-01 11:17:01 +08:00
congqixia 065d1a962e
Add sourceID output for task.String and fill reduce channel reason (#26447)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-08-18 13:50:19 +08:00
yah01 889424b3f9
Fix load index with empty file list (#26236)
Signed-off-by: yah01 <yah2er0ne@outlook.com>
2023-08-09 18:39:16 +08:00
wei liu 6f89620a43
remove pull target rpc from lock (#26054)
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-08-04 10:31:06 +08:00
Bingyi Sun a3e22786ed
Move meta store to kv catalog (#25915)
Signed-off-by: sunby <sunbingyi1992@gmail.com>
2023-07-31 13:57:04 +08:00
wei liu 1748c54fd7
skip load/release segment when more than one delegator exist (#25718)
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-07-24 19:01:01 +08:00
congqixia 76e03fe6d3
Set reason for balance, index checker generated tasks (#25865)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-07-24 17:07:00 +08:00
congqixia 1045c88102
Support replace indexed field in QueryCoord (#25747)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-07-19 21:22:58 +08:00
wei liu 6534396b3d
enable config different interval for different checker (#25514)
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-07-19 16:50:57 +08:00
Bingyi Sun 5afea0e5bf
Fix querycoord crash (#25638)
Signed-off-by: sunby <bingyi.sun@zilliz.com>
Co-authored-by: sunby <bingyi.sun@zilliz.com>
2023-07-17 16:56:35 +08:00
congqixia 7d00020c9e
Reduce DataScope to historical for segment release task (#25489)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-07-12 09:12:28 +08:00
congqixia b68fa2049b
Fix querycoord segment checker nil reference (#25290)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-07-03 18:48:24 +08:00
wei liu 68ae199a9f
load segment with target version, avoid read redundant segment (#24929)
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-06-27 11:48:45 +08:00