Commit Graph

45 Commits (lite)

Author SHA1 Message Date
wei liu 40f9db491e
fix: Fix SyncDistribution may cost too much time on retry (#38454)
issue: #38428

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-12-16 11:38:44 +08:00
congqixia 051bc280dd
enhance: Make dynamic load/release partition follow targets (#38059)
Related to #37849

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-12-05 16:24:40 +08:00
tinswzy e76802f910
enhance: refine querycoord meta/catalog related interfaces to ensure that each method includes a ctx parameter (#37916)
issue: #35917 
This PR refine the querycoord meta related interfaces to ensure that
each method includes a ctx parameter.

Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>
2024-11-25 11:14:34 +08:00
wei liu a1b6be1253
fix: Delegator stuck at unserviceable status (#37694)
issue: #37679

pr #36549 introduce the logic error which update current target when
only parts of channel is ready.

This PR fix the logic error and let dist handler keep pull distribution
on querynode until all delegator becomes serviceable.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-11-15 10:20:31 +08:00
wei liu 1304b40552
fix: Balance channel may stuck at increasing replica number case (#37641)
issue: #37640
fix the pr #36549
cause balance channel will wait until new delegator becomes serviceable,
but new delegator need to sync target version then becomes serviceable,
and sync target version need to be wait all replica load done. so if
increasing replica number and balance channel happens at same time,
logic dead lock occurs.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-11-14 10:08:31 +08:00
wei liu 266f8ef1f5
fix: Search may return less result after qn recover (#36549)
issue: #36293 #36242
after qn recover, delegator may be loaded in new node, after all segment
has been loaded, delegator becomes serviceable. but delegator's target
version hasn't been synced, and if search/query comes, delegator will
use wrong target version to filter out a empty segment list, which
caused empty search result.

This pr will block delegator's serviceable status until target version
is synced

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-11-12 16:34:28 +08:00
yihao.dai ff9bdf7029
fix: Fix load slowly (#37454)
When there're a lot of loaded collections, they would occupy the target
observer scheduler’s pool. This prevents loading collections from
updating the current target in time, slowing down the load process.
This PR adds a separate target dispatcher for loading collections.

issue: https://github.com/milvus-io/milvus/issues/37166

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-11-09 07:48:26 +08:00
Xiaofan e073906a19
enhance: optimize describe collection and index (#37490)
fix #37489
combine multiple describe collection and list index into one call

Signed-off-by: xiaofanluan <xiaofan.luan@zilliz.com>
2024-11-08 10:18:34 +08:00
wei liu 75676fbd11
fix: Fix dynamic release partition may fail search/query request (#35919)
issue: #33550
cause concurrent issue may occur between remove parition in target
manager and sync segment list to delegator. when it happens, some
segment may be released in delegator, and those segment may also be
synced to delegator, which cause delegator become unserviceable due to
lack of necessary segments, then search/query fails.

this PR make sure that all write access to target_manager will be
executed in serial to avoid the concurrent issues.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-09-05 18:47:03 +08:00
congqixia 09ef3f1b4f
fix: Make sure querycoord observers started once (#35811)
Related to #35809

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-08-29 14:45:00 +08:00
jaime fcec4c21b9
fix: check collection health(queryable) fail for releasing collection (#34947)
issue: #34946

Signed-off-by: jaime <yun.zhang@zilliz.com>
2024-08-02 17:20:15 +08:00
congqixia b284b81a47
fix: Check partition in current target when observing partition load status (#34282)
See also #34234

`LoadPartitions` does not guarantee the current target has loading
partitions if there are some partitions already loaded before.

This PR check current target contains the partition to load when
advancing loading percentage to 100.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-07-01 17:40:07 +08:00
wayblink a1232fafda
feat: Major compaction (#33620)
#30633

Signed-off-by: wayblink <anyang.wang@zilliz.com>
Co-authored-by: MrPresent-Han <chun.han@zilliz.com>
2024-06-10 21:34:08 +08:00
wei liu c4806b69c4
enhance: Refactor leader view manager interface (#31133)
issue: #31091
This PR add GetByFilter interface in leader view manager, instead of all
kind of get func

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-04-10 15:13:36 +08:00
congqixia 73858b23bc
fix: Make target observer auto/manual task mutual exclusive (#31584)
See also #30867

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-03-26 09:57:08 +08:00
chyezh 9f9ef8ac32
enhance: transfer resource group and dbname to querynode when load (#30936)
issue: #30931

Signed-off-by: chyezh <chyezh@outlook.com>
2024-03-21 11:59:12 +08:00
congqixia c886aa29ff
enhance: Use `ListIndexes` instead of `DescribeIndex` for qc broker (#31122)
See also #31103

Since querycoord need index meta information from datacoord only, broker
shall use `ListIndexes` to skip segment index building check logic in
datacoord

This PR is also related to #30538, in which DescribeIndex caused lots of
memory usage and lead to OOM eventually

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-03-07 21:43:03 +08:00
yiwangdr 32cff25f97
enhance: decrease coordinator init time (#29822)
This PR mainly improve two items:
1. Target observer should refresh loading status during init time. An
uninitialized loading status blocks search/query. Currently, the target
observer refreshes every 10 seconds, i.e. we'd need to wait for 10s for
no reason. That's also the reason why we constantly see false log
"collection unloaded" upon mixcoord restarts.
2. Delete session when service is stopped. So that the new service
doesn't need to wait for the previous session to expire (~10s).

Item 1 is the major improvement of this PR, which should speed up init
time by 10s.
Item 2 is not a big concern in most cases as coordinators usually shut
down after stop(). In those cases, coordinator restart triggers serverID
change which further triggers an existing logic that deletes expired
session. This PR only fixes rare cases where serverID doesn't change.

integration test:
`go test -tags dynamic -v -coverprofile=profile.out -covermode=atomic
tests/integration/coordrecovery/coord_recovery_test.go -timeout=20m`
Performance after the change:
Average init time of coordinators: 10s
Hardware: M2 Pro
Test setup: 1000 collections with 1000 rows (dim=128) per collection.


issue: #29409

Signed-off-by: yiwangdr <yiwangdr@gmail.com>
2024-02-05 14:00:12 +08:00
congqixia da7c3cbd88
enhance: make delegator delete buffer holding all delete from cp (#29626)
See also #29625

This PR:
- Add a new implemention of `DeleteBuffer`: listDeleteBuffer
  - holds cacheBlock slice
  - `Put` method append new delete data into last block
  - when a block is full, append a new block into the list
- Add `TryDiscard` method for `DeleteBuffer` interface
  - For doubleCacheBuffer, do nothing
- For listDeleteBuffer, try to evict "old" blocks, which are blocks
before the first block whose start ts is behind provided ts
- Add checkpoint field for `UpdateVersion` sync action, which shall be
used to discard old cache delete block

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-01-04 17:02:46 +08:00
yah01 9819090247
enhance: add more logs for target updating (#29090)
- add more logs about the condition satisfying

---------

Signed-off-by: yah01 <yah2er0ne@outlook.com>
Signed-off-by: yah01 <yang.cen@zilliz.com>
2023-12-12 14:06:43 +08:00
aoiasd b4af6f8c40
fix: sync action load segment with lack collection index info list (#28788)
relate: https://github.com/milvus-io/milvus/issues/28779
https://github.com/milvus-io/milvus/issues/28637

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2023-12-04 18:14:34 +08:00
wei liu d081fd5481
enhance: Change some frequency log to rated level (#28897)
This pr change some frequency log's level to rated.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-12-04 10:38:35 +08:00
yah01 ecb3f585c3
Fix passing the wrong dropped list from current target (#28265)
Signed-off-by: yah01 <yah2er0ne@outlook.com>
2023-11-08 17:02:18 +08:00
yah01 1b90630633
Fix the target updated before version updated to cause data missing (#28250)
Signed-off-by: yah01 <yah2er0ne@outlook.com>
2023-11-08 11:36:22 +08:00
wei liu e0222b2ce3
refine target manager code style (#27883)
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-10-25 00:44:12 +08:00
congqixia 93a877f55e
Make qcv2 target&leader observer execute in parallel (#27844)
- Add `taskDispatcher` to submit and run task async safely
- Change `LeaderObeserver` and `TargetObserver` schedule and manual check action to submitting task into dispatcher
- Fix logic problem in collection observer when manual check return false

See also #27494

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-10-24 10:14:11 +08:00
wei liu 55e5f80e24
update collection target after observer start (#27774)
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-10-19 21:52:10 +08:00
congqixia eca79d149c
Add ctx control for observer manual check methods (#27531)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-10-09 11:07:33 +08:00
yah01 a8ce1b6686
Refine QueryCoord stopping (#27371)
Signed-off-by: yah01 <yah2er0ne@outlook.com>
2023-09-27 16:27:27 +08:00
wei liu 949c320185
remove pull target from qc recover (#26775)
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-09-01 11:17:01 +08:00
Enwei Jiao 7d61355ab0
Refactor log for Query (#26310)
Signed-off-by: Enwei Jiao <enwei.jiao@zilliz.com>
2023-08-14 18:57:32 +08:00
wei liu 6f89620a43
remove pull target rpc from lock (#26054)
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-08-04 10:31:06 +08:00
yah01 848ef7418b
Fix the refresh may be notified finished early (#24438)
Signed-off-by: yah01 <yah2er0ne@outlook.com>
2023-05-26 18:43:26 +08:00
yah01 296380d6e6
Support async refresh (#23107)
Signed-off-by: yah01 <yang.cen@zilliz.com>
2023-04-12 15:06:28 +08:00
jaime c9d0c157ec
Move some modules from internal to public package (#22572)
Signed-off-by: jaime <yun.zhang@zilliz.com>
2023-04-06 19:14:32 +08:00
yah01 2b81933d13
Refine logs of DistHandler (#22879)
Signed-off-by: yah01 <yang.cen@zilliz.com>
2023-03-21 18:11:57 +08:00
yihao.dai 1f718118e9
Dynamic load/release partitions (#22655)
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2023-03-20 14:55:57 +08:00
zhenshan.cao e768437681
Correct usage of Timer and Ticker (#22228)
Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>
2023-02-23 18:59:45 +08:00
Ten Thousand Leaves defb7660c8
Add refresh option for LoadCollection/LoadPartitions interfaces (#21776)
/kind improvement

Signed-off-by: Yuchen Gao <yuchen.gao@zilliz.com>
2023-01-18 16:41:44 +08:00
yah01 c8f89907b6
Fix current target may be updated to an invalid target (#21742)
Signed-off-by: yah01 <yang.cen@zilliz.com>
2023-01-17 11:41:51 +08:00
yah01 837e3162d7
Fix ready notifiers leak when collection released (#21712)
Signed-off-by: yah01 <yang.cen@zilliz.com>
2023-01-14 21:55:41 +08:00
yah01 32fb409e57
Fix may update the current target to an unavailable target when node down (#21698)
Signed-off-by: yah01 <yang.cen@zilliz.com>
2023-01-13 17:11:41 +08:00
yah01 5ba1a94858
Fix observers may update current target to a unfinished next target (#21107)
Signed-off-by: yah01 <yang.cen@zilliz.com>

Signed-off-by: yah01 <yang.cen@zilliz.com>
2022-12-12 15:15:22 +08:00
Enwei Jiao 89b810a4db
Refactor all params into ParamItem (#20987)
Signed-off-by: Enwei Jiao <enwei.jiao@zilliz.com>

Signed-off-by: Enwei Jiao <enwei.jiao@zilliz.com>
2022-12-07 18:01:19 +08:00
wei liu c5cd92d36e
update target (#19296)
Signed-off-by: Wei Liu <wei.liu@zilliz.com>

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2022-11-07 19:37:04 +08:00