116 Commits (a56b24054ff019430d09615e93582c3ea1434bd8)

Author SHA1 Message Date
yihao.dai 004a1875dc
enhance: Introduce batch subscription in msgdispatcher (#39863)
Introduce a batch subscription mechanism in msgdispatcher: the
msgdispatcher now includes a vchannel watch task queue, where all
vchannels in the queue will subscribe to the MQ only once and pull
messages from the oldest vchannel checkpoint to the latest.

issue: https://github.com/milvus-io/milvus/issues/39862

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-03-05 14:38:02 +08:00
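
The batch subscription described in #39863 can be pictured with a small sketch. This is not the msgdispatcher code: `watchTask`, `message`, `oldestCheckpoint`, and `dispatchBatch` are hypothetical names, and treating the checkpoint as an exclusive lower bound is an assumption made for illustration.

```go
package main

// watchTask is a hypothetical queued request to watch one vchannel.
type watchTask struct {
	VChannel   string
	Checkpoint uint64 // position this vchannel has already consumed up to
}

// message is a hypothetical MQ record tagged with its target vchannel.
type message struct {
	VChannel string
	Ts       uint64
}

// oldestCheckpoint returns the smallest checkpoint in the batch, which is
// where a single shared subscription would have to start reading.
// The second result is false when the batch is empty.
func oldestCheckpoint(tasks []watchTask) (uint64, bool) {
	if len(tasks) == 0 {
		return 0, false
	}
	oldest := tasks[0].Checkpoint
	for _, t := range tasks[1:] {
		if t.Checkpoint < oldest {
			oldest = t.Checkpoint
		}
	}
	return oldest, true
}

// dispatchBatch walks the pulled messages once and hands each vchannel only
// the messages newer than its own checkpoint (assumed exclusive bound).
// Messages for vchannels outside the batch are ignored.
func dispatchBatch(tasks []watchTask, msgs []message) map[string][]uint64 {
	checkpoints := make(map[string]uint64, len(tasks))
	for _, t := range tasks {
		checkpoints[t.VChannel] = t.Checkpoint
	}
	out := make(map[string][]uint64, len(tasks))
	for _, m := range msgs {
		cp, ok := checkpoints[m.VChannel]
		if !ok || m.Ts <= cp {
			continue
		}
		out[m.VChannel] = append(out[m.VChannel], m.Ts)
	}
	return out
}
```

The point of the sketch is that one read pass starting at the oldest checkpoint serves every queued vchannel, instead of one subscription per vchannel.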
congqixia cb7f2fa6fd
enhance: Use v2 package name for pkg module (#39990)
Related to #39095

https://go.dev/doc/modules/version-numbers

Update pkg version according to golang dep version convention

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-02-22 23:15:58 +08:00
yihao.dai a5a83a0904
fix: Fix consume blocked due to too many consumers (#38455)
This PR limits the maximum number of consumers per pchannel to 10 for
each QueryNode and DataNode.

issue: https://github.com/milvus-io/milvus/issues/37630

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-01-15 21:37:01 +08:00
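
A per-pchannel consumer cap like the one in #38455 could look roughly like the following. This is a minimal sketch under assumptions: `consumerLimiter` and its methods are hypothetical and are not taken from the Milvus code base; only the limit of 10 comes from the commit message.

```go
package main

import "sync"

// maxConsumersPerPChannel mirrors the limit of 10 stated in the commit message.
const maxConsumersPerPChannel = 10

// consumerLimiter is a hypothetical helper that counts consumers per pchannel.
type consumerLimiter struct {
	mu     sync.Mutex
	limit  int
	counts map[string]int
}

func newConsumerLimiter(limit int) *consumerLimiter {
	return &consumerLimiter{limit: limit, counts: make(map[string]int)}
}

// TryAcquire reserves a consumer slot on pchannel and reports whether it
// succeeded. It never blocks; callers are expected to retry later.
func (l *consumerLimiter) TryAcquire(pchannel string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.counts[pchannel] >= l.limit {
		return false
	}
	l.counts[pchannel]++
	return true
}

// Release frees a slot previously reserved with TryAcquire.
func (l *consumerLimiter) Release(pchannel string) {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.counts[pchannel] > 0 {
		l.counts[pchannel]--
	}
}
```

A node would create one limiter with `newConsumerLimiter(maxConsumersPerPChannel)` and call `TryAcquire` before opening a new consumer.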
Zhen Ye bb8d1ab3bf
enhance: make new go package to manage proto (#39114)
issue: #39095

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-01-10 10:49:01 +08:00
XuanYang-cn c0b855dc75
fix: ChannelManager concurrent Release and Watch bug (#38590)
See also: #38589

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2024-12-19 22:50:47 +08:00
SimFG 2afe2eaf3e
feat: support to replicate collection when the services contains the system tt msg (#37559)
- issue: #37105

---------

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-12-17 09:08:46 +08:00
tinswzy 27229f7907
enhance: refine exists log print with ctx (#38080)
issue: #35917
Refine existing log printing to include ctx

Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>
2024-12-14 22:36:44 +08:00
wei liu 97a44b62fd
fix: Data race in datacoord channel manager (#37866)
issue: #37865

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-11-21 19:00:32 +08:00
yihao.dai 0fc0d1a888
fix: Limit the concurrency of channel tasks (#37740)
Limit the maximum concurrency of channel tasks for each DataNode to
prevent excessive subscriptions from causing DataNode OOM.

issue: https://github.com/milvus-io/milvus/issues/37665

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-11-18 16:26:30 +08:00
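
Bounding task concurrency, as #37740 describes, is commonly done with a counting semaphore. The sketch below uses a buffered channel for that; `runLimited` is a hypothetical function, and the actual mechanism used in DataNode may differ.

```go
package main

import (
	"sync"
	"sync/atomic"
)

// runLimited runs every task, but never more than limit at the same time.
// It returns the highest number of tasks observed running concurrently.
// A limit below 1 is treated as 1 so the function cannot block forever.
func runLimited(tasks []func(), limit int) int32 {
	if limit < 1 {
		limit = 1
	}
	sem := make(chan struct{}, limit) // one token per running task
	var wg sync.WaitGroup
	var running, peak int32

	for _, task := range tasks {
		wg.Add(1)
		sem <- struct{}{} // blocks while limit tasks are already running
		go func(task func()) {
			defer wg.Done()
			defer func() { <-sem }() // return the token when the task ends

			n := atomic.AddInt32(&running, 1)
			// Record a new peak if this task raised the running count.
			for {
				p := atomic.LoadInt32(&peak)
				if n <= p || atomic.CompareAndSwapInt32(&peak, p, n) {
					break
				}
			}
			task()
			atomic.AddInt32(&running, -1)
		}(task)
	}
	wg.Wait()
	return peak
}
```

Because the token is acquired before the goroutine starts, excess tasks wait in the submitting loop rather than piling up as live subscriptions.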
jaime f348bd9441
feat: add segment, pipeline, replica and resourcegroup api for WebUI (#37344)
issue: #36621

Signed-off-by: jaime <yun.zhang@zilliz.com>
2024-11-07 11:52:25 +08:00
Zhen Ye cae9e1c732
fix: drop collection failed if enable streaming service (#37444)
issue: #36858

- Start channel manager on datacoord, but with an empty assign policy in
streaming service.
- Allow a collection in dropping state to be recovered by the flusher, to
make sure that Milvus consumes the dropCollection message.
- Add backoff for flusher lifetime.
- Remove the proxy watcher from timetick at rootcoord in streaming
service.

Also see the better fixup: #37176

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2024-11-07 10:26:26 +08:00
wei liu d51a808851
fix: Rootcoord stuck at graceful stop progress (#36880)
issue: #34553
When rootcoord triggers a graceful stop, it blocks until all in-flight
RPCs have finished. For a create collection request, rootcoord has to
block until datacoord finishes watching all channels, but datacoord needs
to call `rootcoord.Alloc` while watching a channel, and rootcoord no
longer responds to new requests. As a result, create collection gets
stuck, and so does the graceful stop.

This PR removes the `rootcoord.Alloc` call to resolve the logical
deadlock during graceful stop.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-10-17 12:15:25 +08:00
Zhen Ye 99dff06391
enhance: using streaming service in insert/upsert/flush/delete/querynode (#35406)
issue: #33285

- using streaming service in insert/upsert/flush/delete/querynode
- fixup flusher bugs and refactor the flush operation
- enable streaming service for dml and ddl
- pass the e2e when enabling streaming service
- pass the integration tests when enabling streaming service

---------

Signed-off-by: chyezh <chyezh@outlook.com>
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-08-29 10:03:08 +08:00
XuanYang-cn f12e368a76
fix: Fill nil schema so that Milvus can watch channel for those upgraded from 2.2 to 2.4 #35695 (#35694)
See also: [#35701](https://github.com/milvus-io/milvus/issues/35701)

---------

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2024-08-27 10:36:59 +08:00
congqixia c992a61a23
enhance: Separate allocator pkg in datacoord (#35622)
Related to #28861

Move allocator interface and implementation into separate package. Also
update some unittest logic.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-08-22 10:06:56 +08:00
XuanYang-cn 314f4d995b
enhance: Tidy dc channel manager (#34515)
See also: #34518

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2024-07-09 18:26:12 +08:00
jaime 21fc5f5d46
enhance: Remove datanode reporting TT based on MQ implementation (#34421)
issue: #34420

Signed-off-by: jaime <yun.zhang@zilliz.com>
2024-07-05 15:48:09 +08:00
congqixia d51d0954bd
enhance: Continue loop when reassign channel fails (#34331)
The log is confusing when a `Reassign` channel operation fails, because
both the success and failure logs are printed in a row. This PR continues
the loop on failure to avoid this output.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-07-04 14:20:10 +08:00
jaime d1f57aa4ba
enhance: remove deprecated code within channel manager (#34340)
issue: https://github.com/milvus-io/milvus/issues/33994

only remove deprecated code, no additional changes.

Signed-off-by: jaime <yun.zhang@zilliz.com>
2024-07-03 19:46:09 +08:00
jaime d6afb31b94
enhance: make subfunctions of datanode component modular (#33992)
issue: #33994

also remove deprecated channel manager based on the etcd implementation

Signed-off-by: jaime <yun.zhang@zilliz.com>
2024-07-01 14:46:07 +08:00
jaime 9630974fbb
enhance: move rocksmq from internal to pkg module (#33881)
issue: #33956

Signed-off-by: jaime <yun.zhang@zilliz.com>
2024-06-25 21:18:15 +08:00
yiwangdr e895cfed84
fix: reduce redundant map operations in datacoord (#33343)
More refactoring will follow.
issue: #33342

Signed-off-by: yiwangdr <yiwangdr@gmail.com>
2024-05-24 12:47:40 +08:00
congqixia 8cf2cf5c94
enhance: Add `go-deadlock` as unittest only dependency (#33063)
See also #33062

This PR:

- Adds `lock.RWMutex` & `lock.Mutex` aliases to switch implementation
  based on build flags
- When the build flags include `test`, uses `go-deadlock` to detect
  possible deadlocks
- Replaces all `sync.RWMutex` & `sync.Mutex` in datacoord pkg

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-05-15 16:33:34 +08:00
yiwangdr b1eacb2ae8
feat: datacoord/node watch based on rpc (#32036)
issue: https://github.com/milvus-io/milvus/issues/25309

Signed-off-by: yiwangdr <yiwangdr@gmail.com>
2024-05-07 15:49:30 +08:00
yiwangdr 037de8e4d3
enhance: speed up minor functions calls in datacoord (#32389)
Related to https://github.com/milvus-io/milvus/issues/32165

1. nodeID-based channel store access should use a map lookup instead of
iteration.

2. The join-like function calls are slow when the number of
collections/segments grows (e.g. 10k). For example,
getNumRowsOfCollectionUnsafe is O(num_segments), and
GetAllCollectionNumRows is O(num_collections*num_segments).

Signed-off-by: yiwangdr <yiwangdr@gmail.com>
2024-04-20 07:55:21 +08:00
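
To illustrate the complexity concern in #32389: counting rows for every collection by rescanning all segments per collection costs O(num_collections*num_segments), while grouping in one pass costs O(num_segments). The sketch below shows the single-pass grouping only; `segmentInfo` and `numRowsPerCollection` are hypothetical names, and this is not necessarily how the commit itself addresses the problem.

```go
package main

// segmentInfo is a hypothetical, stripped-down view of a segment's metadata.
type segmentInfo struct {
	CollectionID int64
	NumRows      int64
}

// numRowsPerCollection sums row counts per collection in a single pass over
// the segments, so the cost is O(num_segments) regardless of how many
// collections exist.
func numRowsPerCollection(segments []segmentInfo) map[int64]int64 {
	rows := make(map[int64]int64)
	for _, s := range segments {
		rows[s.CollectionID] += s.NumRows
	}
	return rows
}
```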
congqixia 83da08c388
enhance: Use map instead of slice to maintain channel info (#32273)
See also #32165

`ChannelManager.Match` is a frequent operation for datacoord. When the
number of collections is large, iterating over all channels consumes a
lot of CPU time.

This PR changes the data structure storing datanode-channel info to a
map, avoiding this iteration when checking channel existence.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-04-16 15:57:19 +08:00
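
The slice-to-map change in #32273 can be sketched as a nested map keyed by node and then by channel name, which turns an existence check into two constant-time lookups. `nodeChannelIndex` and its methods are hypothetical names chosen for illustration and do not reproduce the datacoord types.

```go
package main

// nodeChannelIndex is a hypothetical index from datanode ID to the set of
// channel names assigned to that node.
type nodeChannelIndex struct {
	byNode map[int64]map[string]struct{}
}

func newNodeChannelIndex() *nodeChannelIndex {
	return &nodeChannelIndex{byNode: make(map[int64]map[string]struct{})}
}

// Add records that channel is assigned to nodeID.
func (idx *nodeChannelIndex) Add(nodeID int64, channel string) {
	chans, ok := idx.byNode[nodeID]
	if !ok {
		chans = make(map[string]struct{})
		idx.byNode[nodeID] = chans
	}
	chans[channel] = struct{}{}
}

// Remove drops the assignment and cleans up empty per-node sets.
func (idx *nodeChannelIndex) Remove(nodeID int64, channel string) {
	chans, ok := idx.byNode[nodeID]
	if !ok {
		return
	}
	delete(chans, channel)
	if len(chans) == 0 {
		delete(idx.byNode, nodeID)
	}
}

// Match reports whether channel is assigned to nodeID without iterating
// over any channel list. Reading from a missing inner map is safe in Go.
func (idx *nodeChannelIndex) Match(nodeID int64, channel string) bool {
	_, ok := idx.byNode[nodeID][channel]
	return ok
}
```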
wei liu 0d849a6c0a
fix: fix collectionInfo leak in datacoord (#32175)
issue: #32029

Datacoord's meta lacked logic to clean up collection info. This PR
cleans up collection info after a channel is dropped, to avoid a
collection info leak in datacoord.

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-04-15 16:33:19 +08:00
smellthemoon 1c1f2a1371
enhance: change some logs (#29579)
related #29588

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2024-01-05 16:12:48 +08:00
XuanYang-cn 623939c9f5
enhance: Remove not in use policies (#29448)
The results don't meet our requirements, and the code hasn't been
maintained for a long time.

See also: #29447

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2023-12-28 10:38:46 +08:00
XuanYang-cn ae180d1628
enhance: Change ChannelManager to interface (#29300)
Rewrite cluster test
issue: #28854

---------

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2023-12-25 19:24:46 +08:00
wei liu fdbca10e23
fix: Fix channel manager bg checker exit when disable auto balance (#28459)
issue: #28454

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-11-20 18:20:22 +08:00
XuanYang-cn a153950b10
Change channel to Interface (#27839)
This PR changes `*channel` into RWChannel interface

See also: #25309

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2023-11-13 11:16:18 +08:00
wei liu 14c8a90517
Fix auto balance block channel reassign after datanode restart (#28275)
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-11-09 19:00:25 +08:00
wei liu 5b45a138b1
disable auto balance when old node exists (#28191)
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-11-07 14:02:20 +08:00
jaime 6749957e71
Refine RPC call in unwatch drop channel (#27864)
Signed-off-by: jaime <yun.zhang@zilliz.com>
2023-10-24 17:46:15 +08:00
Xiaofan 2ea7579dbb
Reduce rpc size for GetRecoveryInfoV2 (#27483)
Signed-off-by: xiaofan-luan <xiaofan.luan@zilliz.com>
2023-10-23 21:44:09 +08:00
jaime d2dbbbc11b
Reduce write lock scope in channel manager (#27823)
Signed-off-by: jaime <yun.zhang@zilliz.com>
2023-10-21 07:58:16 +08:00
congqixia 49516d44b4
Add ctx parameter and log tracer for watch and selectNodes (#27809)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-10-20 04:22:11 +08:00
MrPresent-Han cb71a3e235
rm dependency to rc when getting recovery info (#25363) (#27405)
Signed-off-by: MrPresent-Han <chun.han@zilliz.com>
2023-10-09 18:51:32 +08:00
SimFG 26f06dd732
Format the code (#27275)
Signed-off-by: SimFG <bang.fu@zilliz.com>
2023-09-21 09:45:27 +08:00
congqixia 8d13717cac
Fill Collection start position timestamp in WatchInfo (#26370)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-08-16 09:05:32 +08:00
Enwei Jiao 66fdc71479
Refactor logs in DataCoord & DataNode (#25574)
Signed-off-by: Enwei Jiao <enwei.jiao@zilliz.com>
2023-07-14 15:56:31 +08:00
yiwangdr c7b851f870
add interface for non-watch metakv (#25092)
Signed-off-by: yiwangdr <yiwangdr@gmail.com>
2023-06-26 09:20:44 +08:00
Xiaofan 72c5e2a41a
Fix channel reassigned to other datanodes (#25015)
Signed-off-by: xiaofan-luan <xiaofan.luan@zilliz.com>
2023-06-21 21:26:42 +08:00
congqixia 41af0a98fa
Use go-api/v2 for milvus-proto (#24770)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-06-09 01:28:37 +08:00
congqixia 4a22af6e1a
Unwatch channel in watch buffer (#23548)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-04-20 10:34:31 +08:00
congqixia d83654c33f
Add Close method for ChannelManager in datacoord (#23493)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-04-18 17:54:31 +08:00
zhenshan.cao 4a32b842e8
Improve the check logic of channel remove (#23473)
Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>
2023-04-18 02:58:30 +08:00
congqixia ba84f52119
Fix watcher loop quit and channel shouldDrop logic (#23402)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-04-14 09:54:28 +08:00
jaime c9d0c157ec
Move some modules from internal to public package (#22572)
Signed-off-by: jaime <yun.zhang@zilliz.com>
2023-04-06 19:14:32 +08:00