Commit Graph

90 Commits (89d95901c50ed595ba613b30a3169f4e36257356)

Author SHA1 Message Date
congqixia d635495885
fix: [2.3] Make coordinator `Register` not blocked on ProcessActiveStandby(#32069) (#32133)
Cherry-pick from master
pr: #32069
See also #32066

This PR makes coordinator registration succeed without blocking and lets
`ProcessActiveStandBy` run asynchronously. Roles can then receive the stop
signal and notify the servers.
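
As a rough illustration of the idea described above (not the actual Milvus code), the sketch below returns from registration immediately and runs the blocking standby step in a goroutine that a stop signal can cancel. The names `registerAsync`, `stopWhileStandby`, and `waitForActive` are hypothetical.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// registerAsync returns immediately and runs the potentially blocking
// active-standby step in a goroutine. The returned channel receives that
// step's result exactly once. (Hypothetical helper, for illustration only.)
func registerAsync(ctx context.Context, processActiveStandBy func(context.Context) error) <-chan error {
	result := make(chan error, 1)
	go func() {
		result <- processActiveStandBy(ctx)
	}()
	return result
}

// stopWhileStandby simulates a standby coordinator that receives a stop
// signal before it becomes active. It reports whether the background step
// observed the cancellation.
func stopWhileStandby() bool {
	ctx, cancel := context.WithCancel(context.Background())

	// Assumed standby step: blocks until the context is cancelled.
	waitForActive := func(ctx context.Context) error {
		<-ctx.Done()
		return ctx.Err()
	}

	result := registerAsync(ctx, waitForActive) // does not block
	cancel()                                    // the stop signal arrives
	return errors.Is(<-result, context.Canceled)
}

func main() {
	fmt.Println("standby step cancelled on stop:", stopWhileStandby())
}
```

Because registration no longer waits for the standby step, the caller stays free to handle a stop signal while the coordinator is still in standby.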

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-04-11 17:33:21 +08:00
wei liu 0bf595a513
enhance: Speed up target recovery after query coord restart (#31240) (#31449)
issue: #28491
pr: #31240

After querycoord restarts, it pulls a new target, which includes the
channel and segment lists. Once the segments loaded on the querynodes reach
the target, the collection can serve search/query. But if the segment
list changes over time, then after querycoord pulls a new target it takes a
few minutes to catch up with the target's segment distribution, and until
then query/search fails due to missing segments.

This PR saves the current loaded target to the meta store during querycoord's
stop process and recovers it when querycoord starts, to shorten the target
recovery time.
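
The save-on-stop and recover-on-start flow described above might look roughly like the following sketch. It is a simplified illustration under stated assumptions, not the querycoord implementation: the `target` type, the in-memory `metaStore` map, the `target/<collectionID>` key layout, and the JSON encoding are all hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// target is a hypothetical, simplified stand-in for a collection's target.
type target struct {
	Channels []string `json:"channels"`
	Segments []int64  `json:"segments"`
}

// metaStore is an in-memory stand-in for the real meta storage.
type metaStore map[string][]byte

// saveTargets persists every current target; meant to run during stop.
func saveTargets(store metaStore, current map[int64]target) error {
	for collectionID, t := range current {
		data, err := json.Marshal(t)
		if err != nil {
			return err
		}
		store[fmt.Sprintf("target/%d", collectionID)] = data
	}
	return nil
}

// recoverTargets reloads the saved targets; meant to run during start.
func recoverTargets(store metaStore) (map[int64]target, error) {
	recovered := make(map[int64]target)
	for key, data := range store {
		var collectionID int64
		if _, err := fmt.Sscanf(key, "target/%d", &collectionID); err != nil {
			continue // not a target key
		}
		var t target
		if err := json.Unmarshal(data, &t); err != nil {
			return nil, err
		}
		recovered[collectionID] = t
	}
	return recovered, nil
}

func main() {
	store := metaStore{}
	current := map[int64]target{
		100: {Channels: []string{"dml-ch-0"}, Segments: []int64{1, 2, 3}},
	}
	if err := saveTargets(store, current); err != nil {
		panic(err)
	}
	recovered, err := recoverTargets(store)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(recovered[100].Segments), "segments recovered")
}
```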

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-03-22 10:27:17 +08:00
jaime 5ddb0b435f
fix: revoke session may be ignored due to server context cancellation in advance (#31213)
issue: #31219
pr: #31220

Signed-off-by: jaime <yun.zhang@zilliz.com>
2024-03-14 19:05:04 +08:00
SimFG ef84d40e54
enhance: [2.3] make the watch dm channel request better compatibility (#30954)
pr: #30952
issue: https://github.com/milvus-io/milvus/issues/30938

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-03-01 16:09:01 +08:00
chyezh 77e123762f
enhance: add graceful stop timeout to avoid node stop hang under extreme cases (#30320)
1. add a graceful stop timeout of 5s for the coordinators and proxy.
3. add a graceful stop timeout of 900s for the other worker nodes; we should
potentially change this to 600s when graceful stop is smooth
4. change the stop order of the datacoord components.
5. `LivenessCheck` no longer performs graceful shutdown.
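
A graceful stop bounded by a timeout, as listed above, can be sketched as follows. This is a minimal illustration and not the code from the PR: `stopWithTimeout` and `simulateStop` are hypothetical names, and the millisecond values stand in for the 5s and 900s settings.

```go
package main

import (
	"fmt"
	"time"
)

// stopWithTimeout runs gracefulStop in a goroutine and waits at most timeout
// for it to finish. It returns true if the graceful stop completed in time;
// otherwise it calls forceStop and returns false.
func stopWithTimeout(timeout time.Duration, gracefulStop, forceStop func()) bool {
	done := make(chan struct{})
	go func() {
		defer close(done)
		gracefulStop()
	}()

	select {
	case <-done:
		return true
	case <-time.After(timeout):
		forceStop()
		return false
	}
}

// simulateStop models a graceful stop that needs workMs milliseconds and is
// allowed timeoutMs milliseconds. It reports whether the stop was graceful
// and whether the forced path ran.
func simulateStop(workMs, timeoutMs int) (graceful, forced bool) {
	graceful = stopWithTimeout(
		time.Duration(timeoutMs)*time.Millisecond,
		func() { time.Sleep(time.Duration(workMs) * time.Millisecond) },
		func() { forced = true },
	)
	return graceful, forced
}

func main() {
	fmt.Println(simulateStop(0, 500))  // work finishes well within the timeout
	fmt.Println(simulateStop(500, 20)) // work hangs past the timeout
}
```

In this sketch a hanging graceful stop keeps running in its goroutine after the timeout; a real process would typically exit right after the forced stop.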

issue: https://github.com/milvus-io/milvus/issues/30310
pr: #30317
also see: https://github.com/milvus-io/milvus/pull/30306

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2024-01-27 08:45:02 +08:00
congqixia 9e8eb2aa51
fix: Revert leader checker related check (#30262)
See also #30150
PR reverted: #29984 #30152

Currently this scenario could not be covered by ut/it/e2e test cases
Revert it for now

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-01-25 12:39:02 +08:00
wei liu 7d73032582
enhance: refactor leader_observer to leader_checker (#29454) (#29984)
issue: #29453
pr: #29452
Syncing distribution by RPC also calls loadSegment/releaseSegment,
which may cause all kinds of concurrency issues on the same segment, such as
concurrent load and release of one segment.
This PR adds leader_checker, which generates load/release tasks to correct
the leader view instead of syncing distribution by RPC.

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-01-18 14:08:54 +08:00
wei liu 26b1853c54
fix: Auto balance param can't be updated by dynamic(#29501) (#29502)
pr: #29501
This PR fixes the issue that the auto balance param can't be updated dynamically.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-12-27 14:30:53 +08:00
SimFG 74e72ce27e
enhance: [2.3] Support to get the param value in the runtime (#29298)
pr: #29297
/kind improvement

Signed-off-by: SimFG <bang.fu@zilliz.com>
2023-12-21 20:36:43 +08:00
MrPresent-Han 5f4ac437b2
enhance: [Cherry-pick] Moving etcd client into session (#27069) (#28996)
relate: #26694
pr: https://github.com/milvus-io/milvus/pull/27069

Signed-off-by: Filip Haltmayer <filip.haltmayer@zilliz.com>
Signed-off-by: MrPresent-Han <chun.han@zilliz.com>
Co-authored-by: Filip Haltmayer <81822489+filip-halt@users.noreply.github.com>
2023-12-07 16:22:34 +08:00
jaime 9378f78218
enhance: Add logs for each step during service initialization (#28687)
/kind improvement
pr: #28624

Signed-off-by: jaime <yun.zhang@zilliz.com>
2023-11-27 17:54:26 +08:00
congqixia 6512b12fba
enhance: [cherry-pick] Make etcd kv request timeout configurable (#28661) (#28701)
Cherry-pick from master
pr: #28661
See also #28660
This PR adds a request timeout config item for etcd kv requests and
syncs the default timeout to the same value for the etcdKV & tikv configs.
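
One common way to apply a configurable per-request timeout to a KV call is to derive a deadline-bound context for each request, roughly as sketched below. This is an illustration only and uses no etcd client code: the `timedKV` type, its `backend` function, and `loadTimesOut` are hypothetical.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// timedKV wraps a KV backend and bounds every request by requestTimeout.
// Both the type and the backend signature are assumptions for this sketch.
type timedKV struct {
	requestTimeout time.Duration
	backend        func(ctx context.Context, key string) (string, error)
}

// Load issues one request under a context that expires after requestTimeout.
func (kv *timedKV) Load(key string) (string, error) {
	ctx, cancel := context.WithTimeout(context.Background(), kv.requestTimeout)
	defer cancel()
	return kv.backend(ctx, key)
}

// loadTimesOut reports whether a request against a backend with the given
// latency exceeds the configured timeout (both in milliseconds).
func loadTimesOut(latencyMs, timeoutMs int) bool {
	kv := &timedKV{
		requestTimeout: time.Duration(timeoutMs) * time.Millisecond,
		backend: func(ctx context.Context, key string) (string, error) {
			select {
			case <-time.After(time.Duration(latencyMs) * time.Millisecond):
				return "value-of-" + key, nil
			case <-ctx.Done():
				return "", ctx.Err()
			}
		},
	}
	_, err := kv.Load("some/key")
	return errors.Is(err, context.DeadlineExceeded)
}

func main() {
	fmt.Println("slow backend timed out:", loadTimesOut(500, 20))
	fmt.Println("fast backend timed out:", loadTimesOut(0, 500))
}
```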

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-11-24 21:16:26 +08:00
wei liu d3f149c403
fix unstable auto balance config ut (#28289)
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-11-09 10:02:19 +08:00
yah01 385507ce47
Fix the target updated before version updated to cause data missing (#28257)
Signed-off-by: yah01 <yah2er0ne@outlook.com>
2023-11-08 18:54:18 +08:00
wei liu 918333817e
Disable auto balance when old node exists (#28191) (#28224)
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-11-08 07:10:17 +08:00
wei liu 87e8d04ed7
fix sync distribution with wrong version (#28130) (#28170)
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-11-06 11:38:18 +08:00
wei liu 178db7b0f0
check stopping node during start qc (#27859)
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-10-24 12:20:11 +08:00
jaime ec1fe3549e
Add a stop hook to clean session (#27564)
Signed-off-by: jaime <yun.zhang@zilliz.com>
2023-10-16 10:24:10 +08:00
yah01 be980fbc38
Refine state check (#27541)
Signed-off-by: yah01 <yah2er0ne@outlook.com>
2023-10-11 21:01:35 +08:00
yah01 a8ce1b6686
Refine QueryCoord stopping (#27371)
Signed-off-by: yah01 <yah2er0ne@outlook.com>
2023-09-27 16:27:27 +08:00
jaime 7f7c71ea7d
Decoupling client and server API in types interface (#27186)
Co-authored-by: aoiasd <zhicheng.yue@zilliz.com>

Signed-off-by: jaime <yun.zhang@zilliz.com>
2023-09-26 09:57:25 +08:00
SimFG 26f06dd732
Format the code (#27275)
Signed-off-by: SimFG <bang.fu@zilliz.com>
2023-09-21 09:45:27 +08:00
yiwangdr 337edc321b
tikv integration (#26246)
Signed-off-by: yiwangdr <yiwangdr@gmail.com>
2023-09-07 07:25:14 +08:00
SimFG 28681276e2
Improve the retry of the rpc client (#26795)
Signed-off-by: SimFG <bang.fu@zilliz.com>
2023-09-06 17:43:14 +08:00
yah01 3349db4aa7
Refine errors to remove changes breaking design (#26521)
Signed-off-by: yah01 <yah2er0ne@outlook.com>
2023-09-04 09:57:09 +08:00
wei liu 949c320185
remove pull target from qc recover (#26775)
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-09-01 11:17:01 +08:00
yihao.dai 63b86b32a6
Add server id validation interceptor (#26395)
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2023-08-17 20:20:20 +08:00
Enwei Jiao 7d61355ab0
Refactor log for Query (#26310)
Signed-off-by: Enwei Jiao <enwei.jiao@zilliz.com>
2023-08-14 18:57:32 +08:00
Bingyi Sun a3e22786ed
Move meta store to kv catalog (#25915)
Signed-off-by: sunby <sunbingyi1992@gmail.com>
2023-07-31 13:57:04 +08:00
congqixia 1045c88102
Support replace indexed field in QueryCoord (#25747)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-07-19 21:22:58 +08:00
wei liu 68ae199a9f
load segment with target version, avoid read redundant segment (#24929)
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-06-27 11:48:45 +08:00
xige-16 33c2012675
Add more metrics (#25081)
Signed-off-by: xige-16 <xi.ge@zilliz.com>
2023-06-26 17:52:44 +08:00
Bingyi Sun b88e74a109
Fix querycoord close error (#25034)
Signed-off-by: sunby <bingyi.sun@zilliz.com>
Co-authored-by: sunby <bingyi.sun@zilliz.com>
2023-06-21 11:02:42 +08:00
congqixia 41af0a98fa
Use go-api/v2 for milvus-proto (#24770)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-06-09 01:28:37 +08:00
Jiquan Long 30415e1b83
Fix metric QueryCoordNumCollections (#24053) (#24107)
Signed-off-by: longjiquan <jiquan.long@zilliz.com>
2023-05-15 16:33:22 +08:00
congqixia ed81eaa963
Make CollectionObserver trigger checker more frequently during load procedure (#23928)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-05-08 14:06:41 +08:00
Xiaofan 87d790f052
Fix panic caused by upgrade (#23833)
Signed-off-by: xiaofan-luan <xiaofan.luan@zilliz.com>
2023-05-02 14:06:37 +08:00
wei liu 1deac692a0
fix nodeup block (#23634)
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-04-25 19:20:37 +08:00
wei liu 4336ed8609
fix node up (#23415)
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-04-20 09:52:31 +08:00
cai.zhang 43a9e175a3
Exit component process when session key is deleted (#21658) (#22164)
Signed-off-by: cai.zhang <cai.zhang@zilliz.com>
2023-04-12 20:12:28 +08:00
Xiaofan 680ad482b7
Check balance checker chore to 10s (#23304)
Signed-off-by: xiaofan-luan <xiaofan.luan@zilliz.com>
2023-04-09 16:14:32 +08:00
jaime c9d0c157ec
Move some modules from internal to public package (#22572)
Signed-off-by: jaime <yun.zhang@zilliz.com>
2023-04-06 19:14:32 +08:00
MrPresent-Han afd874b736
enhance segment balance by considering global rowCount (#22914) (#23056)
Signed-off-by: MrPresent-Han <jamesharden11122@gmail.com>
Co-authored-by: xiaofan-luan <xiaofan.luan@zilliz.com>
2023-04-03 14:16:25 +08:00
yah01 75737c65ac
Refine error handle of QueryCoord (#23068)
Signed-off-by: yah01 <yang.cen@zilliz.com>
2023-03-31 10:54:29 +08:00
zhenshan.cao 1287ca699a
Refine usage of TimeRecorder.Record (#23142)
Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>
2023-03-30 18:56:22 +08:00
yah01 081572d31c
Refactor QueryNode (#21625)
Signed-off-by: yah01 <yang.cen@zilliz.com>
Co-authored-by: Congqi Xia <congqi.xia@zilliz.com>
Co-authored-by: aoiasd <zhicheng.yue@zilliz.com>
2023-03-27 00:42:00 +08:00
yihao.dai 1f718118e9
Dynamic load/release partitions (#22655)
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2023-03-20 14:55:57 +08:00
SimFG b57e476089
Fix the nil point about the session (#22748)
Signed-off-by: SimFG <bang.fu@zilliz.com>
2023-03-14 20:07:54 +08:00
wei liu c162c6ecc0
fix assign node err (#22479)
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-03-01 11:11:47 +08:00
Enwei Jiao 697dedac7e
Use cockroachdb/errors to replace other error pkg (#22390)
Signed-off-by: Enwei Jiao <enwei.jiao@zilliz.com>
2023-02-26 11:31:49 +08:00