issue: #37718
This PR refine the shard client ref counter, dec ref counter won't
release client anymore, and only permit shard client manager to remove
client.
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #37115
the old implementation update shard cache and shard client manager at
same time, which causes lots of conor case due to concurrent issue
without lock.
This PR decouple shard client manager from shard cache, so only shard
cache will be updated if delegator changes. and make sure shard client
manager will always return the right client, and create a new client if
not exist. in case of client leak, shard client manager will purge
client in async for every 10 minutes.
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #36490
After the query node changes from a delegator to a worker, proxy should
skip this querynode's health check.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #35859
This PR introduce two new param: toleranceFactor and checkRequestNum,
after every checkRequestNum request has been assigned, try to compute
querynode's workload score.
if the diff is less than the toleranceFactor, replica selection policy
will fallback to round_robin, which reduce the average cost to about
500ns.
if the diff is larger than the toleranceFactor, replica selection policy
will compute querynode's score to select the target node with smallest
score in every assigment.
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #30531
cause get client from `shardClientMgr`, doesn't means query node is
unavailable. because of the ref counter policy in `shardClientMgr`,
which will clean the client, if no collection use qn as shard leader.
This PR fix that set node unreachable when get shard client failed.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>