issue: #39551
This PR removes querycoord's scheduling of L0 segments:
- load an L0 segment only when watching a channel
- release an L0 segment only when releasing a channel or syncing data distribution
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #39680
If compaction or GC happens, loading a collection may get stuck on
SegmentNotFound. In that case we should trigger UpdateNextTarget to get
a new data view on which to execute the load operation.
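The recovery path described above can be sketched as follows. This is a minimal illustration, not the querycoord implementation: `ErrSegmentNotFound`, `loadAll`, `loadTarget`, and the `refresh` callback (standing in for triggering UpdateNextTarget) are all hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrSegmentNotFound is a hypothetical sentinel error for this sketch.
var ErrSegmentNotFound = errors.New("segment not found")

// loadAll loads every segment in the view and stops at the first error.
func loadAll(view []int64, load func(int64) error) error {
	for _, id := range view {
		if err := load(id); err != nil {
			return err
		}
	}
	return nil
}

// loadTarget loads the current view. When a segment is gone (for example
// after compaction or GC), it asks for a fresh view and starts over, at most
// maxRefresh times, instead of retrying the stale view forever.
func loadTarget(view []int64, load func(int64) error, refresh func() []int64, maxRefresh int) error {
	for attempt := 0; ; attempt++ {
		err := loadAll(view, load)
		if err == nil {
			return nil
		}
		if !errors.Is(err, ErrSegmentNotFound) || attempt >= maxRefresh {
			return err
		}
		view = refresh() // stands in for triggering UpdateNextTarget
	}
}

func main() {
	existing := map[int64]bool{3: true} // segments 1 and 2 were compacted into 3
	load := func(id int64) error {
		if !existing[id] {
			return ErrSegmentNotFound
		}
		return nil
	}
	refresh := func() []int64 { return []int64{3} }
	fmt.Println(loadTarget([]int64{1, 2}, load, refresh, 1)) // <nil>
}
```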
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #39681
This PR maintains the workload effect in the action itself instead of
computing it from the target, which may cause a leak if the target
changes.
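A minimal sketch of why storing the effect in the action avoids the leak. `Action`, `Ledger`, and the `Workload` field are hypothetical names chosen for illustration; the real types are not shown in this message.

```go
package main

import "fmt"

// Action is a hypothetical stand-in for a scheduler action. It carries the
// workload effect it was created with instead of looking it up in the target.
type Action struct {
	NodeID   int64
	Workload int // fixed at creation time, e.g. the segment's row count
}

// Ledger tracks the in-flight workload per node (illustrative only).
type Ledger struct {
	perNode map[int64]int
}

func NewLedger() *Ledger { return &Ledger{perNode: make(map[int64]int)} }

// Add applies the action's own workload effect.
func (l *Ledger) Add(a Action) { l.perNode[a.NodeID] += a.Workload }

// Remove subtracts exactly what Add applied, so nothing leaks even if the
// target changed between the two calls.
func (l *Ledger) Remove(a Action) { l.perNode[a.NodeID] -= a.Workload }

// Load returns the in-flight workload currently attributed to nodeID.
func (l *Ledger) Load(nodeID int64) int { return l.perNode[nodeID] }

func main() {
	l := NewLedger()
	a := Action{NodeID: 1, Workload: 1000}
	l.Add(a)
	// The target may change here; the action still remembers its effect.
	l.Remove(a)
	fmt.Println(l.Load(1)) // 0
}
```

If the effect were recomputed from the target at removal time, a segment that had left the target would subtract zero and the node would keep a phantom load.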
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #36621, #39417
1. Adjust the server-side cache size.
2. Add source information for configurations.
3. Add node ID for compaction and indexing tasks.
4. Resolve localhost access issues to fix health check failures for
etcd.
Signed-off-by: jaime <yun.zhang@zilliz.com>
issue: https://github.com/milvus-io/milvus/issues/37630
Reduce the frequency of metric updates to avoid holding the mutex for
long periods.
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
When there are many segment tasks in the querycoord scheduler, the
traversal performed by `GetSegmentTaskDelta` becomes time-consuming.
This PR adds caching for segment deltas.
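One way such a cache could look is a running sum keyed by node and collection. This is a sketch under assumptions: `DeltaCache`, `deltaKey`, and the method names are hypothetical, and the actual cache layout in querycoord may differ.

```go
package main

import "fmt"

// deltaKey identifies one (node, collection) pair. All names in this sketch
// are hypothetical.
type deltaKey struct {
	nodeID       int64
	collectionID int64
}

// DeltaCache keeps a running sum of segment deltas so that lookups do not
// have to traverse every task in the scheduler.
type DeltaCache struct {
	deltas map[deltaKey]int
}

func NewDeltaCache() *DeltaCache {
	return &DeltaCache{deltas: make(map[deltaKey]int)}
}

// Add records the delta of a task that entered the scheduler.
func (c *DeltaCache) Add(nodeID, collectionID int64, delta int) {
	c.deltas[deltaKey{nodeID, collectionID}] += delta
}

// Remove undoes the delta of a task that left the scheduler.
func (c *DeltaCache) Remove(nodeID, collectionID int64, delta int) {
	k := deltaKey{nodeID, collectionID}
	c.deltas[k] -= delta
	if c.deltas[k] == 0 {
		delete(c.deltas, k) // drop empty entries so the map stays small
	}
}

// Get is a single map lookup instead of a scan over all tasks.
func (c *DeltaCache) Get(nodeID, collectionID int64) int {
	return c.deltas[deltaKey{nodeID, collectionID}]
}

func main() {
	c := NewDeltaCache()
	c.Add(1, 100, 500)
	c.Add(1, 100, 300)
	c.Remove(1, 100, 500)
	fmt.Println(c.Get(1, 100)) // 300
}
```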
issue: https://github.com/milvus-io/milvus/issues/37630
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
Co-authored-by: Wei Liu <wei.liu@zilliz.com>
1. Provide partition- and channel-level indexing in the collection target.
2. Make `SegmentAction` not wait for distribution.
3. Remove scheduler and target manager mutex.
4. Optimize logging to reduce CPU overhead.
issue: https://github.com/milvus-io/milvus/issues/37630
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
issue: #38718
The balancer calculates the workload of executing tasks as an ongoing
score for target nodes. However, a logic issue arises when
GetSegmentTaskDelta or GetChannelTaskDelta is called with
collectionID=-1, which incorrectly returns zero.
Due to the incorrect global score, the executing task's workload is not
properly reflected for each collection. Consequently, each collection
submits its own balance task, leading to the balancer assigning
excessive tasks to the same QueryNode.
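The intended wildcard behaviour can be sketched like this. `Task`, `AllCollections`, and `SegmentTaskDelta` are hypothetical simplifications; only the convention that collectionID=-1 means "all collections" comes from the description above.

```go
package main

import "fmt"

// AllCollections is a hypothetical name for the wildcard; the description
// above uses collectionID=-1 for the same purpose.
const AllCollections int64 = -1

// Task is a minimal stand-in for an executing segment task.
type Task struct {
	NodeID       int64
	CollectionID int64
	Delta        int
}

// SegmentTaskDelta sums the deltas of tasks on nodeID. A collectionID of
// AllCollections matches every collection rather than none.
func SegmentTaskDelta(tasks []Task, nodeID, collectionID int64) int {
	sum := 0
	for _, t := range tasks {
		if t.NodeID != nodeID {
			continue
		}
		if collectionID != AllCollections && t.CollectionID != collectionID {
			continue
		}
		sum += t.Delta
	}
	return sum
}

func main() {
	tasks := []Task{{1, 100, 500}, {1, 200, 300}, {2, 100, 50}}
	fmt.Println(SegmentTaskDelta(tasks, 1, 100))            // 500
	fmt.Println(SegmentTaskDelta(tasks, 1, AllCollections)) // 800
}
```

With the wildcard returning zero, the global score for node 1 would ignore both in-flight tasks, which is how several collections could each decide the node was idle.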
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #38201
A leader task must update the delegator's distribution, and it succeeds
only after the distribution change has been applied to the delegator.
However, the delegator rejects a distribution change whose version is
older than the delegator's current version, which causes the leader task
to get stuck and retry forever.
This PR removes the leader task's finish check.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #38305
Now that balancing segments and balancing channels at the same time is
disabled, the constraint that a segment release must happen on a
serviceable shard leader is unnecessary.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #35917
This PR refines the querycoord meta-related interfaces to ensure that
each method includes a ctx parameter.
Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>
issue: #33550
Balance segment and balance channel can execute at the same time, which
causes a bunch of corner cases.
This PR disables simultaneous balancing of segments and channels.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #36621
- For simple types in a struct, add "string" to the JSON tag for
automatic string conversion during JSON encoding.
- For complex types in a struct, replace "int64" with "string".
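Both rules can be illustrated with the standard `encoding/json` package, whose `,string` tag option encodes a numeric field as a JSON string. `TaskInfo` and its field names are hypothetical, and reading "complex types" as slices of IDs is an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// TaskInfo is a hypothetical struct used only for illustration.
type TaskInfo struct {
	// The ",string" option makes encoding/json emit the number as a JSON string.
	TaskID int64 `json:"task_id,string"`
	// The option does not apply to slice elements, so the IDs are kept as strings.
	SegmentIDs []string `json:"segment_ids"`
}

// encode marshals t and returns the JSON text, or "" on error.
func encode(t TaskInfo) string {
	b, err := json.Marshal(t)
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	fmt.Println(encode(TaskInfo{TaskID: 42, SegmentIDs: []string{"1", "2"}}))
	// {"task_id":"42","segment_ids":["1","2"]}
}
```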
Signed-off-by: jaime <yun.zhang@zilliz.com>
issue: #36970
Release segment and balance channel may happen at the same time. If,
before the new delegator becomes serviceable, a release segment executes
on the new delegator while a search/query arrives at the old delegator,
then the release and the query run in parallel. If the release executes
first in the worker, the search/query gets a SegmentNotLoaded error.
This PR adds a serviceable filter on delegators, so that all load/release
segment operations happen on a serviceable delegator.
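A rough sketch of such a filter, assuming a simplified `Delegator` view with a `Serviceable` flag. Both the type and `serviceableOnly` are hypothetical.

```go
package main

import "fmt"

// Delegator is a hypothetical view of a shard leader.
type Delegator struct {
	NodeID      int64
	Channel     string
	Serviceable bool
}

// serviceableOnly keeps the delegators that can currently serve search/query,
// so load/release operations are never sent to a delegator that is still
// warming up.
func serviceableOnly(ds []Delegator) []Delegator {
	out := make([]Delegator, 0, len(ds))
	for _, d := range ds {
		if d.Serviceable {
			out = append(out, d)
		}
	}
	return out
}

func main() {
	ds := []Delegator{
		{NodeID: 1, Channel: "ch-0", Serviceable: true},  // old delegator
		{NodeID: 2, Channel: "ch-0", Serviceable: false}, // new, not ready yet
	}
	for _, d := range serviceableOnly(ds) {
		fmt.Println("release segment via node", d.NodeID)
	}
}
```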
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #36621
1. Add API to access task runtime metrics, including:
- build index task
- compaction task
- import task
- balance (including load/release of segments/channels and some leader
tasks on querycoord)
- sync task
2. Add a debug mode to the webpage, enabled or disabled with debug=true
or debug=false in the URL query parameters.
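For illustration, reading such a flag with the Go standard library might look like the sketch below. The helper `debugEnabled`, the example URL, and the choice to treat a missing or malformed value as false are assumptions; the webpage itself may well parse the parameter on the client side.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// debugEnabled reports whether the URL carries debug=true. A missing or
// malformed value is treated as false; that default is an assumption.
func debugEnabled(rawURL string) bool {
	u, err := url.Parse(rawURL)
	if err != nil {
		return false
	}
	on, err := strconv.ParseBool(u.Query().Get("debug"))
	return err == nil && on
}

func main() {
	// example.com is a placeholder host, not a real endpoint.
	fmt.Println(debugEnabled("http://example.com/page?debug=true"))  // true
	fmt.Println(debugEnabled("http://example.com/page?debug=false")) // false
}
```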
Signed-off-by: jaime <yun.zhang@zilliz.com>
issue: #36536
Query coord uses `segmentTaskDelta/channelTaskDelta` to measure the
workload executing on each querynode in the scheduler. These values were
maintained by `scheduler.Add(task)` and `scheduler.remove(task)`, but
`scheduler.remove(task)` was being called in unexpected ways, which
produced wrong `segmentTaskDelta/channelTaskDelta` values, affected the
segment assignment logic, and left segments unbalanced.
This PR computes `segmentTaskDelta/channelTaskDelta` on access instead,
so that a stale value cannot affect assignment.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #35821
After a collection is loaded, increasing or decreasing its replica
number requires releasing and loading it again.
Milvus offers 4 solutions to update a loaded collection's replica
number. This PR aims to change the replica number dynamically without a
release. After the replica number changes, Milvus loads or releases
replicas asynchronously, and the replica load status can be checked with
the getReplicas API.
Note that if more replicas are requested than the querynodes can afford,
the new replicas won't load successfully until enough querynodes join.
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #36426
The old constraint allowed only segments in the current target to be
balanced. That is wrong: a segment that exists only in the next target
could not be moved off a stopping node.
By design, stopping balance must move every segment off the node through
balance tasks, so the old constraint should be removed.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #35846
Querycoord notifies the proxy to update its shard leader cache after a
delegator's location changes. But during querynode failure recovery,
some delegators may become unserviceable because they lack segments, and
become serviceable again once the segments are loaded, so we also need
to notify the proxy to invalidate the shard leader cache when a
delegator's serviceable state changes.
This PR maintains the querynode's serviceable state during heartbeat and
notifies the proxy to invalidate the shard leader cache when that state
changes.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #34095
When a new query node comes online, the segment_checker,
channel_checker, and balance_checker simultaneously attempt to allocate
segments to it. If this occurs during the execution of a load task and
the distribution of the new query node hasn't been updated, the query
coordinator may mistakenly view the new query node as empty. As a
result, it assigns segments or channels to it, potentially overloading
the new query node with more segments or channels than expected.
This PR measures the workload of the executing tasks on the target query
node to prevent assigning an excessive number of segments to it.
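A simplified picture of the idea: count in-flight work when scoring a node. `NodeLoad`, `score`, and `pickNode` are hypothetical, and the real balancer uses its own scoring rather than a plain segment count.

```go
package main

import "fmt"

// NodeLoad is a hypothetical summary of one query node.
type NodeLoad struct {
	NodeID    int64
	Loaded    int // segments already reported in the node's distribution
	Executing int // segments that in-flight tasks are about to add
}

// score counts executing work too, so a freshly joined node with many pending
// loads no longer looks empty.
func (n NodeLoad) score() int { return n.Loaded + n.Executing }

// pickNode returns the ID of the node with the lowest score, or -1 if nodes
// is empty.
func pickNode(nodes []NodeLoad) int64 {
	best := int64(-1)
	bestScore := 0
	for i, n := range nodes {
		if i == 0 || n.score() < bestScore {
			best, bestScore = n.NodeID, n.score()
		}
	}
	return best
}

func main() {
	nodes := []NodeLoad{
		{NodeID: 1, Loaded: 10, Executing: 0},
		{NodeID: 2, Loaded: 0, Executing: 12}, // new node, distribution not updated yet
	}
	fmt.Println(pickNode(nodes)) // 1
}
```

Without the `Executing` term, node 2 would score 0 and keep receiving assignments from every checker.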
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
When querycoord processes a segment task, it iterates over the whole
segment list to check whether the segment is loaded, which costs too
much CPU when there are thousands of segments.
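The message does not say how the scan was avoided. One common approach, shown here purely as an assumption, is to index loaded segment IDs in a set so each check is a single lookup. `SegmentIndex` and `buildIndex` are hypothetical names.

```go
package main

import "fmt"

// SegmentIndex is a hypothetical set of loaded segment IDs.
type SegmentIndex map[int64]struct{}

// buildIndex pays the O(n) cost once instead of on every task check.
func buildIndex(loaded []int64) SegmentIndex {
	idx := make(SegmentIndex, len(loaded))
	for _, id := range loaded {
		idx[id] = struct{}{}
	}
	return idx
}

// Has answers "is this segment loaded?" with one map lookup.
func (s SegmentIndex) Has(id int64) bool {
	_, ok := s[id]
	return ok
}

func main() {
	idx := buildIndex([]int64{101, 102, 103})
	fmt.Println(idx.Has(102), idx.Has(999)) // true false
}
```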
Signed-off-by: Wei Liu <wei.liu@zilliz.com>