congqixia
25a1c9ecf0
fix: Make coordinator `Register` not blocked on ProcessActiveStandby ( #32069 )
See also #32066
This PR makes the coordinator `Register` call succeed without waiting and lets
`ProcessActiveStandBy` run asynchronously, so roles can still receive a stop
signal and notify their servers.
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-04-10 18:49:18 +08:00
congqixia
357fe814ce
fix: Remove unnecessary deleteSession operation ( #31647 )
See also #31628
The `Revoke` operation already deletes all keys attached to the lease. The
extra `deleteSession` operation may mistakenly remove the session key of the
next epoch and leave the session status in a chaotic state.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-03-29 13:57:11 +08:00
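The lease semantics behind #31647 can be illustrated with a toy in-memory store. This model rests on assumptions and is not the etcd client: `leaseKV`, `put`, `revoke`, and `deleteKey` are hypothetical. Revoking a lease removes every key attached to it. A later unconditional delete of the same key name can therefore only hit a key written by the next epoch.

```go
package main

import "fmt"

// leaseKV is a toy in-memory model of etcd-style leases (not the etcd client API).
type leaseKV struct {
	values map[string]string
	leases map[string]int64 // key -> ID of the lease it is attached to
}

func newLeaseKV() *leaseKV {
	return &leaseKV{values: map[string]string{}, leases: map[string]int64{}}
}

// put stores key with val, attached to the given lease.
func (kv *leaseKV) put(key, val string, lease int64) {
	kv.values[key] = val
	kv.leases[key] = lease
}

// revoke deletes every key attached to the lease, mirroring etcd Revoke semantics.
func (kv *leaseKV) revoke(lease int64) {
	for k, l := range kv.leases {
		if l == lease {
			delete(kv.values, k) // deleting during range is safe in Go
			delete(kv.leases, k)
		}
	}
}

// deleteKey removes a key unconditionally, regardless of which lease owns it.
func (kv *leaseKV) deleteKey(key string) {
	delete(kv.values, key)
	delete(kv.leases, key)
}

func main() {
	kv := newLeaseKV()
	const key = "session/coord" // hypothetical session key
	kv.put(key, "epoch-1", 1)
	kv.revoke(1)              // the old session's keys are already gone
	kv.put(key, "epoch-2", 2) // the next epoch registers under a new lease
	_, ok := kv.values[key]
	fmt.Println("after revoke only, new session present:", ok)
	kv.deleteKey(key) // a redundant, late deleteSession
	_, ok = kv.values[key]
	fmt.Println("after extra delete, new session present:", ok)
}
```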
jaime
db79be3ae0
fix: ctx cancel should be the last step while stopping server ( #31220 )
issue: #31219
Signed-off-by: jaime <yun.zhang@zilliz.com>
2024-03-15 10:33:05 +08:00
yiwangdr
32cff25f97
enhance: decrease coordinator init time ( #29822 )
This PR mainly improves two items:
1. The target observer should refresh the loading status during init. An
uninitialized loading status blocks search/query. Currently, the target
observer refreshes every 10 seconds, i.e. we'd need to wait 10s for
no reason. That's also why we constantly see the false log
"collection unloaded" when mixcoord restarts.
2. Delete the session when the service is stopped, so that the new service
doesn't need to wait for the previous session to expire (~10s).
Item 1 is the major improvement of this PR and should speed up init
time by 10s.
Item 2 is not a big concern in most cases, as coordinators usually shut
down after stop(). In those cases, a coordinator restart changes the serverID,
which in turn triggers existing logic that deletes the expired
session. This PR only fixes the rare cases where the serverID doesn't change.
Integration test:
`go test -tags dynamic -v -coverprofile=profile.out -covermode=atomic tests/integration/coordrecovery/coord_recovery_test.go -timeout=20m`
Performance after the change:
Average init time of coordinators: 10s
Hardware: M2 Pro
Test setup: 1000 collections with 1000 rows (dim=128) per collection.
issue: #29409
Signed-off-by: yiwangdr <yiwangdr@gmail.com>
2024-02-05 14:00:12 +08:00
smellthemoon
1c1f2a1371
enhance: change some logs ( #29579 )
related #29588
Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2024-01-05 16:12:48 +08:00
wei liu
5b45a138b1
disable auto balance when old node exists ( #28191 )
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-11-07 14:02:20 +08:00
Xiaofan
da19e49daf
Support purge old session for standalone ( #28184 )
Signed-off-by: xiaofan-luan <xiaofan.luan@zilliz.com>
2023-11-06 21:21:42 +08:00
wei liu
ecec5dfcfd
fix retry on offline node ( #28079 )
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-11-03 10:14:16 +08:00
yah01
9658367a3c
Refine chunk manager errors ( #27590 )
Signed-off-by: yah01 <yah2er0ne@outlook.com>
2023-10-31 12:18:15 +08:00
Filip Haltmayer
6b1a106a31
Moving etcd client into session ( #27069 )
Signed-off-by: Filip Haltmayer <filip.haltmayer@zilliz.com>
2023-10-27 07:36:12 +08:00
SimFG
9b0ecbdca7
Support replicating the mq message ( #27240 )
Signed-off-by: SimFG <bang.fu@zilliz.com>
2023-10-20 14:26:09 +08:00
jaime
ac2d1bb5c2
Support receiving signals from parent process ( #27756 )
Signed-off-by: jaime <yun.zhang@zilliz.com>
2023-10-18 20:20:11 +08:00
congqixia
2f201c25e2
Remove deprecated io/ioutil usage ( #27747 )
The `io/ioutil` package is deprecated; use the replacements in the `io` and `os` packages instead.
Also adds a golangci-lint rule to block future references.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Co-authored-by: guoguangwu <guoguangwu@magic-shield.com>
2023-10-17 20:32:09 +08:00
jaime
ec1fe3549e
Add a stop hook to clean session ( #27564 )
Signed-off-by: jaime <yun.zhang@zilliz.com>
2023-10-16 10:24:10 +08:00
Jiquan Long
e4f73cc805
Add host & enable_disk to session ( #27507 )
Signed-off-by: longjiquan <jiquan.long@zilliz.com>
2023-10-08 20:05:31 +08:00
Jiquan Long
5c1abfa2cc
Print the server id when active-standby switch ( #27119 )
Signed-off-by: longjiquan <jiquan.long@zilliz.com>
2023-10-07 10:01:31 +08:00
Jiquan Long
0f14d18201
Optimize the codec code of session ( #27360 )
Signed-off-by: longjiquan <jiquan.long@zilliz.com>
2023-10-01 10:33:30 +08:00
SimFG
c9653b1683
Add some log and improve TestSessionProcessActiveStandBy test case ( #27403 )
Signed-off-by: SimFG <bang.fu@zilliz.com>
2023-09-28 09:35:27 +08:00
foxspy
5db4a0489e
dynamic index version control ( #27335 )
Co-authored-by: longjiquan <jiquan.long@zilliz.com>
2023-09-25 21:39:27 +08:00
wei liu
9433a24f5d
fix component not exiting when liveness check fails ( #27236 )
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-09-22 19:13:25 +08:00
SimFG
26f06dd732
Format the code ( #27275 )
Signed-off-by: SimFG <bang.fu@zilliz.com>
2023-09-21 09:45:27 +08:00
congqixia
16b35e07b3
Fix `TestSessionSuite/TestKeepAliveRetryActiveCancel` unit test logic ( #27231 )
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-09-20 18:59:23 +08:00
congqixia
f0d0651989
Do not reset connection immediately if grpc code is `Canceled` or `DeadlineExceeded` ( #27014 )
We found many connection resets and cancellations caused by the recent retry change.
The current implementation resets the connection no matter what the error code is.
To match the previous retry behavior, the connection is no longer reset on `Canceled` or `DeadlineExceeded` errors unless they happen too often.
Also adds a config item, minResetInterval, for gRPC connection reset.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-09-13 15:01:18 +08:00
congqixia
adfb5298c6
Refine `TestSessionProcessActiveStandBy` unit test logic ( #26980 )
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-09-11 18:13:17 +08:00
wei liu
0e2085b77f
fix dc standby to active ( #26810 )
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-09-06 10:41:49 +08:00
Enwei Jiao
fb0705df1b
Decouple basetable and componentparam ( #26725 )
Signed-off-by: Enwei Jiao <enwei.jiao@zilliz.com>
2023-09-05 10:31:48 +08:00
congqixia
145387fdcb
Bump proto go-api to v2.3.0 ( #26561 )
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-08-23 20:18:23 +08:00
congqixia
2b367b6bb0
Fix sessionutil Liveness check blocking in watch forever ( #26248 )
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-08-10 14:07:16 +08:00
congqixia
7dfc8fbf0a
Fix data race on keepAliveCancel ( #26087 )
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-08-02 18:55:07 +08:00
congqixia
8b11636e72
Cancel previous ctx for session retry keepalive ( #26050 )
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-08-02 12:09:05 +08:00
wayblink
587237a3c9
Fix dead loop in session ( #25451 )
Signed-off-by: wayblink <anyang.wang@zilliz.com>
2023-07-13 18:02:29 +08:00
yah01
cd29b863d0
Fix data race in session ( #25354 )
Signed-off-by: yah01 <yang.cen@zilliz.com>
2023-07-06 14:52:25 +08:00
wayblink
b7ecb7f56b
Disable retryKeepAlive when LivenessCheck's Context closes ( #25161 )
Signed-off-by: wayblink <anyang.wang@zilliz.com>
2023-06-27 17:08:45 +08:00
wayblink
b752a29995
Add timeout for keepalive in session ( #25077 )
Signed-off-by: wayblink <anyang.wang@zilliz.com>
2023-06-26 12:30:44 +08:00
SimFG
0c3f92d7d7
Improve the panic code about the rootcoord/session/rocksmq ( #24859 ) ( #25024 )
Signed-off-by: SimFG <bang.fu@zilliz.com>
2023-06-21 11:24:42 +08:00
yah01
ebd0279d3f
Check error by Error() and NoError() for better report message ( #24736 )
Signed-off-by: yah01 <yang.cen@zilliz.com>
2023-06-08 15:36:36 +08:00
congqixia
d0c2fa5d19
Fix retryKeepAlive assertion panic ( #24667 )
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-06-07 10:08:36 +08:00
wayblink
5fb5b072ae
Retry keepalive when keepalive channel closes ( #24581 )
Signed-off-by: wayblink <anyang.wang@zilliz.com>
2023-06-01 16:14:35 +08:00
congqixia
74bba2320a
Fix session stop/goingStop stuck after connection lost ( #24131 )
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-05-16 14:51:22 +08:00
cai.zhang
43a9e175a3
Exit component process when session key is deleted ( #21658 ) ( #22164 )
Signed-off-by: cai.zhang <cai.zhang@zilliz.com>
2023-04-12 20:12:28 +08:00
jaime
c9d0c157ec
Move some modules from internal to public package ( #22572 )
Signed-off-by: jaime <yun.zhang@zilliz.com>
2023-04-06 19:14:32 +08:00
yah01
7da870f512
Remove useCustomConfig and simplify the session type ( #23166 )
Signed-off-by: yah01 <yang.cen@zilliz.com>
2023-04-03 20:10:24 +08:00
congqixia
732986aa04
Remove fmt.Print from internal package ( #22722 )
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-03-14 17:36:05 +08:00
Enwei Jiao
697dedac7e
Use cockroachdb/errors to replace other error pkg ( #22390 )
Signed-off-by: Enwei Jiao <enwei.jiao@zilliz.com>
2023-02-26 11:31:49 +08:00
zhenshan.cao
e768437681
Correct usage of Timer and Ticker ( #22228 )
Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>
2023-02-23 18:59:45 +08:00
congqixia
f2575e5fa8
Add unconvert & durationcheck linters and fix issues ( #22161 )
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-02-15 17:22:34 +08:00
yah01
b1f31da77a
Fix activate standby server ignores all errors ( #22073 )
Signed-off-by: yah01 <yang.cen@zilliz.com>
2023-02-09 15:24:31 +08:00
wayblink
d41cc0b21b
Revoke session to only delete session key created by this node ( #21935 )
Signed-off-by: wayblink <anyang.wang@zilliz.com>
2023-02-02 16:37:52 +08:00
wayblink
de584b508e
Fix active-standby switch fail bug ( #21755 )
Signed-off-by: wayblink <anyang.wang@zilliz.com>
2023-01-17 11:43:43 +08:00
wayblink
6a722396bd
Integration test framework ( #21283 )
Signed-off-by: wayblink <anyang.wang@zilliz.com>
2023-01-12 19:49:40 +08:00