issue: #35563
1. Use an internal health checker to monitor the cluster's health state,
storing the latest state on the coordinator node. The CheckHealth
request retrieves the cluster's health from this latest state on the
proxy sides, which enhances cluster stability.
2. Each health check will assess all collections and channels, with
detailed failure messages temporarily saved in the latest state.
3. Use CheckHealth request instead of the heavy GetMetrics request on
the querynode and datanode
Signed-off-by: jaime <yun.zhang@zilliz.com>
Relatedt #36887
DirectFoward streaming delete will cause memory usage explode if the
segments number was large. This PR add batching delete API and using it
for direct forward implementation.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #36464
This PR enable balance on querynode with different mem capacity, for
query node which has more mem capactity will be assigned more records,
and query node with the largest difference between assignedScore and
currentScore will have a higher priority to carry the new segment.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #35821
After collection loaded, if we need to increase/decrease collection's
replica, we need to release and load it again.
milvus offers 4 solution to update loaded collection's replica, this PR
aims to dynamic change the replica number without release, and after
replica number changed, milvus will execute load replica or release
replica in async, and the replica loaded status can be checked by
getReplicas API.
Notice that if set too much replicas than querynode can afford,the new
replica won't be loaded successfully until enough querynode joins.
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #33744
This PR includes the following changes:
1. Added a new task type to the task scheduler in datacoord: stats task,
which sorts segments by primary key.
2. Implemented segment sorting in indexnode.
3. Added a new field `FieldStatsLog` to SegmentInfo to store token index
information.
---------
Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
issue: #35087
after qc restarts, and target is not ready yet, if dist_handler try to
update segment dist, it will set legacy level to l0 segment, which may
cause l0 segment be moved to other node, cause search/query failed.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
See also #34746
This PR add segment level field in response of
`GetPersistentSegmentInfo` and `GetQuerySegmentInfo`
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #32910
* split replica's node list to channels when create replicas
* balance nodes among channels when node change happens
* implement channel level balance, let balance happens in channel level
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #30647
- Add declarative resource group api
- Add config for resource group management
- Resource group recovery enhancement
---------
Signed-off-by: chyezh <chyezh@outlook.com>
issue: #30647
- ReplicaManager manage read only node now, and always do persistent of
node distribution of replica.
- All segment/channel checker using ReplicaManager to get read-only node
or read-write node, but not ResourceManager.
- ReplicaManager promise that only apply unique querynode to one replica
in same collection now (replicas in same collection never hold same
querynode at same time).
- ReplicaManager promise that fairly node count assignment policy if
multi replicas of collection is assigned to one resource group.
- Move some parameters check into ReplicaManager to avoid data race.
- Allow transfer replica to resource group that already load replica of
same collection
- Allow transfer node between resource groups that load replica of same
collection
---------
Signed-off-by: chyezh <chyezh@outlook.com>
issue: #29261
This PR Add restful api for devops to execute rolling upgrade, including
suspend/resume balance and manual transfer segments/channels.
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #28491
after querycoord restart, it will pull a new target, which include
channel and segment list. when segments loaded on querynode has reached
the target, the collection could provide search/query. but if segment
list changes by time, ater querycoord pull a new target, it will takes a
few minutes to catch up the target's segment distribution. and before
that, query/search will fail due to lack of segments.
This PR save the current loaded target to meta storein querycoord's stop
progress, and recover it when query coord starts, to speed up the target
recovery time.
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
See also #29625
This PR:
- Add a new implemention of `DeleteBuffer`: listDeleteBuffer
- holds cacheBlock slice
- `Put` method append new delete data into last block
- when a block is full, append a new block into the list
- Add `TryDiscard` method for `DeleteBuffer` interface
- For doubleCacheBuffer, do nothing
- For listDeleteBuffer, try to evict "old" blocks, which are blocks
before the first block whose start ts is behind provided ts
- Add checkpoint field for `UpdateVersion` sync action, which shall be
used to discard old cache delete block
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
See also #27515
When Delegator processes delete data, it forwards delete data with only
segment id specified. When two segments has same segment id but one is
growing and the other is sealed, the delete will be applied to both
segments which causes delete data out of order when concurrent load
segment occurs.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #28622
query node with delegator will has more rows than other query node due
to delgator loads all growing rows.
This PR enable the balance segment which based on the num of growing
rows in leader view.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>