Commit Graph

115 Commits (54797b42860f5917458996b8b5fa734b1f9b15bf)

Author SHA1 Message Date
congqixia 25a1c9ecf0
fix: Make coordinator `Register` not blocked on ProcessActiveStandby (#32069)
See also #32066

This PR makes coordinator registration succeed without blocking and lets
`ProcessActiveStandBy` run asynchronously. Roles can then receive the stop
signal and notify servers.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-04-10 18:49:18 +08:00
congqixia 357fe814ce
fix: Remove unnecessary deleteSession operation (#31647)
See also #31628

The `Revoke` operation deletes all keys attached to the lease. The extra
`deleteSession` operation may mistakenly remove the session key of the next
epoch and leave the session status in chaos.
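To illustrate why the extra delete is harmful, here is a small Go model, assuming a hypothetical in-memory `fakeKV` that mimics etcd's rule that revoking a lease deletes exactly the keys attached to it. Revoking the old lease never touches a key re-registered under a new lease, while a name-based delete that arrives late removes it.

```go
package main

// fakeKV is a hypothetical in-memory model of an etcd-like store in which
// keys can be attached to a lease; it only illustrates the ordering problem.
type fakeKV struct {
	values map[string]string // key -> value
	leases map[string]int64  // key -> ID of the lease the key is attached to
}

func newFakeKV() *fakeKV {
	return &fakeKV{values: map[string]string{}, leases: map[string]int64{}}
}

// PutWithLease stores key=value and attaches the key to leaseID.
func (kv *fakeKV) PutWithLease(key, value string, leaseID int64) {
	kv.values[key] = value
	kv.leases[key] = leaseID
}

// Revoke deletes every key attached to leaseID, and only those keys.
func (kv *fakeKV) Revoke(leaseID int64) {
	for k, id := range kv.leases {
		if id == leaseID {
			delete(kv.values, k)
			delete(kv.leases, k)
		}
	}
}

// Delete removes a key by name, regardless of which lease currently owns it.
func (kv *fakeKV) Delete(key string) {
	delete(kv.values, key)
	delete(kv.leases, key)
}
```

A late `Delete(key)` after the next epoch has registered wipes the new session, which is the failure mode the commit removes.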

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-03-29 13:57:11 +08:00
jaime db79be3ae0
fix: ctx cancel should be the last step while stopping server (#31220)
issue: #31219

Signed-off-by: jaime <yun.zhang@zilliz.com>
2024-03-15 10:33:05 +08:00
yiwangdr 32cff25f97
enhance: decrease coordinator init time (#29822)
This PR mainly improves two items:
1. The target observer should refresh the loading status during init. An
uninitialized loading status blocks search/query. Currently, the target
observer refreshes every 10 seconds, i.e. we'd need to wait 10s for
no reason. That's also why we constantly see the false log
"collection unloaded" upon mixcoord restarts.
2. Delete the session when the service is stopped, so that the new service
doesn't need to wait for the previous session to expire (~10s).

Item 1 is the major improvement of this PR and should speed up init
time by 10s.
Item 2 is not a big concern in most cases, as coordinators usually shut
down after stop(). In those cases, a coordinator restart triggers a serverID
change, which in turn triggers existing logic that deletes the expired
session. This PR only fixes the rare cases where the serverID doesn't change.

integration test:
`go test -tags dynamic -v -coverprofile=profile.out -covermode=atomic tests/integration/coordrecovery/coord_recovery_test.go -timeout=20m`
Performance after the change:
Average init time of coordinators: 10s
Hardware: M2 Pro
Test setup: 1000 collections with 1000 rows (dim=128) per collection.


issue: #29409

Signed-off-by: yiwangdr <yiwangdr@gmail.com>
2024-02-05 14:00:12 +08:00
smellthemoon 1c1f2a1371
enhance: change some logs (#29579)
related #29588

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2024-01-05 16:12:48 +08:00
wei liu 5b45a138b1
disable auto balance when old node exists (#28191)
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-11-07 14:02:20 +08:00
Xiaofan da19e49daf
Support purge old session for standalone (#28184)
Signed-off-by: xiaofan-luan <xiaofan.luan@zilliz.com>
2023-11-06 21:21:42 +08:00
wei liu ecec5dfcfd
fix retry on offline node (#28079)
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-11-03 10:14:16 +08:00
yah01 9658367a3c
Refine chunk manager errors (#27590)
Signed-off-by: yah01 <yah2er0ne@outlook.com>
2023-10-31 12:18:15 +08:00
Filip Haltmayer 6b1a106a31
Moving etcd client into session (#27069)
Signed-off-by: Filip Haltmayer <filip.haltmayer@zilliz.com>
2023-10-27 07:36:12 +08:00
SimFG 9b0ecbdca7
Support to replicate the mq message (#27240)
Signed-off-by: SimFG <bang.fu@zilliz.com>
2023-10-20 14:26:09 +08:00
jaime ac2d1bb5c2
Support receive signals from parent process (#27756)
Signed-off-by: jaime <yun.zhang@zilliz.com>
2023-10-18 20:20:11 +08:00
congqixia 2f201c25e2
Remove deprecated io/ioutil usage (#27747)
The `io/ioutil` package is deprecated; use the replacements in the `io` and `os` packages instead.
Also adds a golangci-lint rule to block future references.
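For reference, the standard-library replacements (Go 1.16+) look like this. The helper `roundTrip` is hypothetical; the commit does not say which golangci-lint rule was added, so no lint configuration is shown.

```go
package main

import (
	"io"
	"os"
	"path/filepath"
	"strings"
)

// roundTrip (hypothetical helper) writes data to a file in a fresh temporary
// directory and reads it back, using the replacements for io/ioutil:
//
//	ioutil.TempDir   -> os.MkdirTemp
//	ioutil.WriteFile -> os.WriteFile
//	ioutil.ReadFile  -> os.ReadFile
//	ioutil.ReadAll   -> io.ReadAll
func roundTrip(data string) (string, error) {
	dir, err := os.MkdirTemp("", "ioutil-migration-")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "data.txt")
	if err := os.WriteFile(path, []byte(data), 0o644); err != nil {
		return "", err
	}
	fromFile, err := os.ReadFile(path)
	if err != nil {
		return "", err
	}
	// io.ReadAll accepts any io.Reader; here a strings.Reader.
	fromReader, err := io.ReadAll(strings.NewReader(string(fromFile)))
	if err != nil {
		return "", err
	}
	return string(fromReader), nil
}
```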

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Co-authored-by: guoguangwu <guoguangwu@magic-shield.com>
2023-10-17 20:32:09 +08:00
jaime ec1fe3549e
Add a stop hook to clean session (#27564)
Signed-off-by: jaime <yun.zhang@zilliz.com>
2023-10-16 10:24:10 +08:00
Jiquan Long e4f73cc805
Add host & enable_disk to session (#27507)
Signed-off-by: longjiquan <jiquan.long@zilliz.com>
2023-10-08 20:05:31 +08:00
Jiquan Long 5c1abfa2cc
Print the server id when active-standby switch (#27119)
Signed-off-by: longjiquan <jiquan.long@zilliz.com>
2023-10-07 10:01:31 +08:00
Jiquan Long 0f14d18201
Optimize the codec code of session (#27360)
Signed-off-by: longjiquan <jiquan.long@zilliz.com>
2023-10-01 10:33:30 +08:00
SimFG c9653b1683
Add some log and improve TestSessionProcessActiveStandBy test case (#27403)
Signed-off-by: SimFG <bang.fu@zilliz.com>
2023-09-28 09:35:27 +08:00
foxspy 5db4a0489e
dynamic index version control (#27335)
Co-authored-by: longjiquan <jiquan.long@zilliz.com>
2023-09-25 21:39:27 +08:00
wei liu 9433a24f5d
fix component not exit when liveness check failed (#27236)
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-09-22 19:13:25 +08:00
SimFG 26f06dd732
Format the code (#27275)
Signed-off-by: SimFG <bang.fu@zilliz.com>
2023-09-21 09:45:27 +08:00
congqixia 16b35e07b3
Fix `TestSessionSuite/TestKeepAliveRetryActiveCancel` unit test logic (#27231)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-09-20 18:59:23 +08:00
congqixia f0d0651989
Do not reset connection immediately if grpc code is `Canceled` or `DeadlineExceeded` (#27014)
We found many connection resets and cancellations caused by the recent retry change.
The current implementation resets the connection no matter what the error code is.
To match the previous retry behavior, skip resetting the connection on cancel errors unless they happen too often.

Also adds a config item, `minResetInterval`, for gRPC connection resets.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-09-13 15:01:18 +08:00
congqixia adfb5298c6
Refine `TestSessionProcessActiveStandBy` unit test logic (#26980)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-09-11 18:13:17 +08:00
wei liu 0e2085b77f
fix dc standby to active (#26810)
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-09-06 10:41:49 +08:00
Enwei Jiao fb0705df1b
Decouple basetable and componentparam (#26725)
Signed-off-by: Enwei Jiao <enwei.jiao@zilliz.com>
2023-09-05 10:31:48 +08:00
congqixia 145387fdcb
Bump proto go-api to v2.3.0 (#26561)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-08-23 20:18:23 +08:00
congqixia 2b367b6bb0
Fix sessionutil Liveness check block in watch forever (#26248)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-08-10 14:07:16 +08:00
congqixia 7dfc8fbf0a
Fix data race on keepAliveCancel (#26087)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-08-02 18:55:07 +08:00
congqixia 8b11636e72
Cancel previous ctx for session retry keepalive (#26050)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-08-02 12:09:05 +08:00
wayblink 587237a3c9
Fix dead loop in session (#25451)
Signed-off-by: wayblink <anyang.wang@zilliz.com>
2023-07-13 18:02:29 +08:00
yah01 cd29b863d0
Fix data race in session (#25354)
Signed-off-by: yah01 <yang.cen@zilliz.com>
2023-07-06 14:52:25 +08:00
wayblink b7ecb7f56b
Disable retryKeepAlive when LivenessCheck's Context close (#25161)
Signed-off-by: wayblink <anyang.wang@zilliz.com>
2023-06-27 17:08:45 +08:00
wayblink b752a29995
Add timeout for keepalive in session (#25077)
Signed-off-by: wayblink <anyang.wang@zilliz.com>
2023-06-26 12:30:44 +08:00
SimFG 0c3f92d7d7
Improve the panic code about the rootcoord/session/rocksmq (#24859) (#25024)
Signed-off-by: SimFG <bang.fu@zilliz.com>
2023-06-21 11:24:42 +08:00
yah01 ebd0279d3f
Check error by Error() and NoError() for better report message (#24736)
Signed-off-by: yah01 <yang.cen@zilliz.com>
2023-06-08 15:36:36 +08:00
congqixia d0c2fa5d19
Fix retryKeepAlive assertion panic (#24667)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-06-07 10:08:36 +08:00
wayblink 5fb5b072ae
Retry keepalive when keepalive channel close (#24581)
Signed-off-by: wayblink <anyang.wang@zilliz.com>
2023-06-01 16:14:35 +08:00
congqixia 74bba2320a
Fix session stop/goingStop stuck after connection lost (#24131)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-05-16 14:51:22 +08:00
cai.zhang 43a9e175a3
Exit component process when session key is deleted (#21658) (#22164)
Signed-off-by: cai.zhang <cai.zhang@zilliz.com>
2023-04-12 20:12:28 +08:00
jaime c9d0c157ec
Move some modules from internal to public package (#22572)
Signed-off-by: jaime <yun.zhang@zilliz.com>
2023-04-06 19:14:32 +08:00
yah01 7da870f512
Remove useCustomConfig and simplify the session type (#23166)
Signed-off-by: yah01 <yang.cen@zilliz.com>
2023-04-03 20:10:24 +08:00
congqixia 732986aa04
Remove fmt.Print from internal package (#22722)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-03-14 17:36:05 +08:00
Enwei Jiao 697dedac7e
Use cockroachdb/errors to replace other error pkg (#22390)
Signed-off-by: Enwei Jiao <enwei.jiao@zilliz.com>
2023-02-26 11:31:49 +08:00
zhenshan.cao e768437681
Correct usage of Timer and Ticker (#22228)
Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>
2023-02-23 18:59:45 +08:00
congqixia f2575e5fa8
Add unconvert & durationcheck linters and fix issues (#22161)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-02-15 17:22:34 +08:00
yah01 b1f31da77a
Fix activate standby server ignores all errors (#22073)
Signed-off-by: yah01 <yang.cen@zilliz.com>
2023-02-09 15:24:31 +08:00
wayblink d41cc0b21b
Revoke session to only delete session key created by this node (#21935)
Signed-off-by: wayblink <anyang.wang@zilliz.com>
2023-02-02 16:37:52 +08:00
wayblink de584b508e
Fix active-standby switch fail bug (#21755)
Signed-off-by: wayblink <anyang.wang@zilliz.com>
2023-01-17 11:43:43 +08:00
wayblink 6a722396bd
Integration test framework (#21283)
Signed-off-by: wayblink <anyang.wang@zilliz.com>
2023-01-12 19:49:40 +08:00