issue: #39551
This PR remove querycoord's scheduling of l0 segments:
- only load l0 segment when watch channel
- only release l0 segment when release channel or sync data distribution
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #39681
this PR maintain workload effect in action instead of computing workload
effect from target, which may cause leak if target changes.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
the task limit in assignSegment/assignChannel will works for both load
task and balance task.
this PR remove the load task limit, only limit balance task num in one
round.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #35917
This PR refine the querycoord meta related interfaces to ensure that
each method includes a ctx parameter.
Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>
See also #37205
Previously releasing growing segments could be triggered by two
conditions:
- Sealed Segment with same id is loaded
- Segment start position is before target checkpoint ts
Which has a worst case that the corresponding sealed segment is
compacted and the checkpoint is pinned by a growing l0 segment.
This PR introduces a new rule that: a growing segment could be released
if the segment id appeared in current target dropped segment id list.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #37054
after querycoord restart, segment_checker may release segment by mistake
due to next target isn't ready yet.
This PR requires release segment must happens after next target is
ready.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #35821
After collection loaded, if we need to increase/decrease collection's
replica, we need to release and load it again.
milvus offers 4 solution to update loaded collection's replica, this PR
aims to dynamic change the replica number without release, and after
replica number changed, milvus will execute load replica or release
replica in async, and the replica loaded status can be checked by
getReplicas API.
Notice that if set too much replicas than querynode can afford,the new
replica won't be loaded successfully until enough querynode joins.
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #36201
after querynode has been remove from replica, all dirty segment/channel
on it should be released.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #34781
when balance segment hasn't finished yet, query coord may found 2 loaded
copy of segment, then it will generate task to deduplicate, which may
cancel the balance task. then the old copy has been released, and the
new copy hasn't be ready yet but canceled, then search failed by segment
lack.
this PR set deduplicate segment task's proirity to low, to avoid balance
segment task canceled by deduplicate task.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
See also #34234
`LoadPartitions` does not guarantee the current target has loading
partitions if there are some partitions already loaded before.
This PR check current target contains the partition to load when
advancing loading percentage to 100.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
try to update index for l0 segment, will failed by `index not found`
This PR skip update index for l0 segment
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #33103
when try to do stopping balance for stopping query node, balancer will
try to get node list from replica.GetNodes, then check whether node is
stopping, if so, stopping balance will be triggered for this replica.
after the replica refactor, replica.GetNodes only return rwNodes, and
the stopping node maintains in roNodes, so balancer couldn't find
replica which contains stopping node, and stopping balance for replica
won't be triggered, then query node will stuck forever due to
segment/channel doesn't move out.
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #32910
* split replica's node list to channels when create replicas
* balance nodes among channels when node change happens
* implement channel level balance, let balance happens in channel level
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #31091
This PR add GetByFilter interface in leader view manager, instead of all
kind of get func
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #30647
- ReplicaManager manage read only node now, and always do persistent of
node distribution of replica.
- All segment/channel checker using ReplicaManager to get read-only node
or read-write node, but not ResourceManager.
- ReplicaManager promise that only apply unique querynode to one replica
in same collection now (replicas in same collection never hold same
querynode at same time).
- ReplicaManager promise that fairly node count assignment policy if
multi replicas of collection is assigned to one resource group.
- Move some parameters check into ReplicaManager to avoid data race.
- Allow transfer replica to resource group that already load replica of
same collection
- Allow transfer node between resource groups that load replica of same
collection
---------
Signed-off-by: chyezh <chyezh@outlook.com>
issue: #31091
This PR add GetByFilter interface in channel dist manager, instead of
all kind of get func
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #29261
This PR Add restful api for devops to execute rolling upgrade, including
suspend/resume balance and manual transfer segments/channels.
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #31480#31481
release duplicate l0 segment task, which execute on old delegator may
cause segment lack, and execute on new delegator may break new
delegator's leader view.
This PR skip release duplicate l0 segment by segment_checker, cause l0
segment will be released with unsub channel
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #30186
during channel balance, after new delegator loaded, instead of syncing
l0 segment's location to new delegator, we should load l0 segment on new
delegator, and release the old l0 segment, then start to release old
delegator.
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #30816
cause balance channel will stuck until leader view catch up the current
target, then start to unsub the old delegator. which make sure that the
new delegator can provide search before release old delegator. but
another logic in segment_checker skip loading segment during balance
channel. so during balance channel, if query node crash, new delegator
can't catch up target forever, then stuck forever.
This PR remove the rule that skip loading segment during balance channel
to avoid the logic dead lock here.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #29814
if channel is not subscribed yet, the generated load segment task will
be remove from task scheduler due to the load segment task need to be
transfer to worker node by shard leader.
This PR skip generate load segment task when channel is not subscribed
yet.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
See also #29803
This PR:
- Add trace span for collection/partition load
- Use TraceSpan to generate Segment/ChannelTasks when loading
- Refine BaseTask trace tag usage
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #29453
sync distribution by rpc will also call loadSegment/releaseSegment,
which may cause all kinds of concurrent case on same segment, such as
concurrent load and release on one segment.
This PR add leader_checker which generate load/release task to correct
the leader view, instead of calling sync distribution by rpc
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>