297 Commits (d3ae8e9232c25d6d52a661f28b9b2a32d0a8d98e)

Author SHA1 Message Date
Zhen Ye 18bef5e062
fix: fix crash when enable standby and streaming (#38239)
issue: #38125

- connect kv at standby mode.
- make balancer initialization lazy.
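
Lazy initialization of this kind is commonly built on `sync.Once`. The sketch below only illustrates the idea: `balancer`, `lazyBalancer`, and `Get` are hypothetical names, not the types touched by this commit.

```go
package main

import (
	"fmt"
	"sync"
)

// balancer is a hypothetical stand-in for a component that is costly to build.
type balancer struct{ id int }

// lazyBalancer defers construction until the first call to Get.
type lazyBalancer struct {
	once  sync.Once
	build func() *balancer
	inst  *balancer
}

// Get builds the balancer on first use and returns the same instance afterwards.
func (l *lazyBalancer) Get() *balancer {
	l.once.Do(func() { l.inst = l.build() })
	return l.inst
}

func main() {
	builds := 0
	lb := &lazyBalancer{build: func() *balancer {
		builds++
		return &balancer{id: 1}
	}}
	fmt.Println("builds before first use:", builds)
	lb.Get()
	lb.Get()
	fmt.Println("builds after two uses:", builds)
}
```

A node that stays in standby never calls `Get`, so it never pays for the construction.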

Signed-off-by: chyezh <chyezh@outlook.com>
2024-12-06 11:52:40 +08:00
tinswzy 7944538ade
enhance: Add ctx param to KV operation interfaces (#38154)
issue: #35917 
Refine KV operation interfaces by adding a ctx param
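
As a rough illustration of what a ctx-aware KV interface looks like, here is a minimal sketch. `BaseKV`, `memKV`, and the method set are assumptions made for the example and are not the actual Milvus interfaces.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
)

// BaseKV is a hypothetical, trimmed-down KV interface in which every
// operation takes a context as its first parameter.
type BaseKV interface {
	Save(ctx context.Context, key, value string) error
	Load(ctx context.Context, key string) (string, error)
}

var errKeyNotFound = errors.New("key not found")

// memKV is an in-memory implementation used only for this sketch.
type memKV struct {
	mu   sync.RWMutex
	data map[string]string
}

func newMemKV() *memKV { return &memKV{data: make(map[string]string)} }

func (m *memKV) Save(ctx context.Context, key, value string) error {
	// Refuse to start work for a cancelled or expired context.
	if err := ctx.Err(); err != nil {
		return err
	}
	m.mu.Lock()
	defer m.mu.Unlock()
	m.data[key] = value
	return nil
}

func (m *memKV) Load(ctx context.Context, key string) (string, error) {
	if err := ctx.Err(); err != nil {
		return "", err
	}
	m.mu.RLock()
	defer m.mu.RUnlock()
	v, ok := m.data[key]
	if !ok {
		return "", errKeyNotFound
	}
	return v, nil
}

// demo saves and loads a key, then shows that a cancelled context is rejected.
func demo() (string, error) {
	var kv BaseKV = newMemKV()
	ctx, cancel := context.WithCancel(context.Background())
	if err := kv.Save(ctx, "a", "1"); err != nil {
		return "", err
	}
	v, err := kv.Load(ctx, "a")
	if err != nil {
		return "", err
	}
	cancel()
	return v, kv.Save(ctx, "b", "2")
}

func main() {
	v, err := demo()
	fmt.Println("loaded:", v, "save after cancel:", err)
}
```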

Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>
2024-12-05 15:16:41 +08:00
tinswzy 1dbb6cd7cb
enhance: refine the datacoord meta related interfaces (#37957)
issue: #35917 
This PR refines the meta-related APIs in datacoord to allow the ctx to
be passed down to the catalog operation interfaces

Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>
2024-11-26 19:46:34 +08:00
Zhen Ye b9a10a2f68
enhance: remove the rpc layer of coordinator when enabling standalone or mixcoord (#37815)
issue: #37764

- add a local client to call local server directly for
querycoord/rootcoord/datacoord.
- enable local client if milvus is running mixcoord or standalone mode.
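
One way to picture the local client is as a second implementation of the same coordinator interface that forwards calls in-process. The sketch below is an assumption-laden illustration: `Coordinator`, `server`, `remoteClient`, `localClient`, and `newClient` are all hypothetical, and the "remote" client only counts calls instead of using gRPC.

```go
package main

import "fmt"

// Coordinator is a hypothetical, minimal coordinator API.
type Coordinator interface {
	GetState() string
}

// server is the in-process implementation of the API.
type server struct{}

func (s *server) GetState() string { return "Healthy" }

// remoteClient stands in for an RPC-backed client. In this sketch it only
// counts the calls that would have crossed the network.
type remoteClient struct {
	target Coordinator
	calls  int
}

func (r *remoteClient) GetState() string {
	r.calls++
	return r.target.GetState()
}

// localClient forwards directly to the server living in the same process.
type localClient struct{ srv *server }

func (l *localClient) GetState() string { return l.srv.GetState() }

// newClient picks the local client when the server runs in the same process.
func newClient(srv *server, sameProcess bool) Coordinator {
	if sameProcess {
		return &localClient{srv: srv}
	}
	return &remoteClient{target: srv}
}

func main() {
	c := newClient(&server{}, true)
	fmt.Printf("%T %s\n", c, c.GetState())
}
```

Callers depend only on the interface, so switching between the two clients needs no change at the call sites.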

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2024-11-23 21:36:33 +08:00
jaime 7bbfe86bcd
enhance: add list index and segment index retrieval API for WebUI (#37861)
issue: #36621

Signed-off-by: jaime <yun.zhang@zilliz.com>
2024-11-22 16:58:34 +08:00
cai.zhang 4dc684126e
enhance: Handoff growing segment after sorted (#37385)
issue: #33744 

1. Segments generated from inserts will be loaded as growing until they
are sorted by primary key.
2. This PR may increase memory pressure on the delegator, but we need to
test the performance of stats. In local testing, the speed of stats is
greater than the insert speed.

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-11-07 16:08:24 +08:00
jaime f348bd9441
feat: add segment,pipeline, replica and resourcegroup api for WebUI (#37344)
issue: #36621

Signed-off-by: jaime <yun.zhang@zilliz.com>
2024-11-07 11:52:25 +08:00
Zhen Ye cae9e1c732
fix: drop collection failed if enable streaming service (#37444)
issue: #36858

- Start channel manager on datacoord, but with empty assign policy in
streaming service.
- Allow the flusher to recover collections in the dropping state so that
Milvus consumes the dropCollection message.
- Add backoff for flusher lifetime.
- remove the proxy watcher from timetick at rootcoord in streaming
service.

Also see the better fixup: #37176

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2024-11-07 10:26:26 +08:00
jaime 9d16b972ea
feat: add tasks page into management WebUI (#37002)
issue: #36621

1. Add API to access task runtime metrics, including:
  - build index task
  - compaction task
  - import task
  - balance (including load/release of segments/channels and some leader
    tasks on querycoord)
  - sync task
2. Add a debug mode to the webpage; use debug=true or debug=false in the
URL query parameters to enable or disable it.
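
A toggle like this can be read from the query string with the standard library. The following is a minimal sketch, not the WebUI code: `debugEnabled` is a hypothetical helper, and treating missing or malformed values as "off" is an assumption.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// debugEnabled reports whether the raw query string turns debug mode on.
// Missing or unparsable values are treated as "off" (an assumption).
func debugEnabled(rawQuery string) bool {
	values, err := url.ParseQuery(rawQuery)
	if err != nil {
		return false
	}
	enabled, err := strconv.ParseBool(values.Get("debug"))
	return err == nil && enabled
}

func main() {
	for _, q := range []string{"debug=true", "debug=false", "page=tasks"} {
		fmt.Printf("%q -> %v\n", q, debugEnabled(q))
	}
}
```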

Signed-off-by: jaime <yun.zhang@zilliz.com>
2024-10-28 10:13:29 +08:00
wei liu d51a808851
fix: Rootcoord stuck at graceful stop progress (#36880)
issue: #34553
When rootcoord triggers a graceful stop, it blocks until all in-flight
RPCs finish. For a create collection request, rootcoord must wait until
datacoord finishes watching all channels, but datacoord needs to call
`rootcoord.Alloc` while watching a channel, and rootcoord no longer
responds to new requests. As a result, the create collection request
hangs, and so does the graceful stop.

This PR removes the `rootcoord.Alloc` call to break this logical
deadlock during graceful stop.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-10-17 12:15:25 +08:00
yihao.dai 1f47d5510b
fix: Fix import segments leak in segment manager (#36602)
Directly add import segments from the meta, eliminating the dependency
on the segment manager.

issue: https://github.com/milvus-io/milvus/issues/34648

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-10-08 10:11:22 +08:00
yihao.dai 9e8cafcbe2
enhance: Skip loading bf in datanode (#36367)
Skip loading bf in datanode:
1. When watching vchannels, skip loading bloom filters for segments.
2. Bypass bloom filter checks for delete messages, directly writing to
L0 segments.
3. Remove flushed segments proactively after flush.

issue: https://github.com/milvus-io/milvus/issues/34585

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-09-26 10:11:15 +08:00
yihao.dai a61668c77e
feat: Introduce stats task for import (#35868)
This PR introduces a stats task for import:
1. Define new `Stats` and `IndexBuilding` states for importJob
2. Add new stats step to the import process: trigger the stats task and
wait for its completion
3. Abort stats task if import job failed

issue: https://github.com/milvus-io/milvus/issues/33744

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-09-15 15:17:08 +08:00
cai.zhang 8395c8a8db
enhance: Update stats task to optional (#35947)
issue: #33744

---------

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-09-12 20:37:08 +08:00
cai.zhang 2c9bb4dfa3
feat: Support stats task to sort segment by PK (#35054)
issue: #33744 

This PR includes the following changes:
1. Added a new task type to the task scheduler in datacoord: stats task,
which sorts segments by primary key.
2. Implemented segment sorting in indexnode.
3. Added a new field `FieldStatsLog` to SegmentInfo to store token index
information.

---------

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-09-02 14:19:03 +08:00
Zhen Ye 99dff06391
enhance: using streaming service in insert/upsert/flush/delete/querynode (#35406)
issue: #33285

- using streaming service in insert/upsert/flush/delete/querynode
- fixup flusher bugs and refactor the flush operation
- enable streaming service for dml and ddl
- pass the e2e when enabling streaming service
- pass the integration test when enabling streaming service

---------

Signed-off-by: chyezh <chyezh@outlook.com>
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-08-29 10:03:08 +08:00
jaime b7ea1defd3
fix: mistaken deletions may occur during GC channel checkpoints (#35707)
issue: #35706

Signed-off-by: jaime <yun.zhang@zilliz.com>
2024-08-28 10:11:05 +08:00
congqixia 582d2eec79
enhance: Move datanode/indexnode manager to session pkg (#35634)
Related to #28861

Move the session manager and worker manager to the session package. Also
rename each manager after its corresponding node (datanode, indexnode).

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-08-22 16:02:56 +08:00
congqixia c992a61a23
enhance: Separate allocator pkg in datacoord (#35622)
Related to #28861

Move allocator interface and implementation into separate package. Also
update some unittest logic.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-08-22 10:06:56 +08:00
chyezh bef06e5acf
enhance: add streaming coord and node grpc service register (#34655)
issue: #33285

- register streaming coord service into datacoord.
- add new streaming node role.
- add global static switch to enable streaming service or not.

Signed-off-by: chyezh <chyezh@outlook.com>
2024-07-25 12:19:44 +08:00
jaime 3cd24f7548
fix: collection meta is not removed after gc in DataCoord (#34883)
issue: #34847

Signed-off-by: jaime <yun.zhang@zilliz.com>
2024-07-22 21:11:48 +08:00
XuanYang-cn e0b39d8bf4
fix: Milvus panic when compaction disabled and dropping a collection (#34103)
See also: #31059

---------

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2024-07-11 14:44:52 +08:00
XuanYang-cn 314f4d995b
enhance: Tidy dc channel manager (#34515)
See also: #34518

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2024-07-09 18:26:12 +08:00
jaime 21fc5f5d46
enhance: Remove datanode reporting TT based on MQ implementation (#34421)
issue: #34420

Signed-off-by: jaime <yun.zhang@zilliz.com>
2024-07-05 15:48:09 +08:00
jaime d6afb31b94
enhance: make subfunctions of datanode component modular (#33992)
issue: #33994

also remove deprecated channel manager based on the etcd implementation

Signed-off-by: jaime <yun.zhang@zilliz.com>
2024-07-01 14:46:07 +08:00
jaime 9630974fbb
enhance: move rocksmq from internal to pkg module (#33881)
issue: #33956

Signed-off-by: jaime <yun.zhang@zilliz.com>
2024-06-25 21:18:15 +08:00
wayblink 5fac2fa1d2
fix: Panic if ProcessActiveStandBy returns error (#33369)
#33368

Signed-off-by: wayblink <anyang.wang@zilliz.com>
2024-06-19 11:16:00 +08:00
wayblink a1232fafda
feat: Major compaction (#33620)
#30633

Signed-off-by: wayblink <anyang.wang@zilliz.com>
Co-authored-by: MrPresent-Han <chun.han@zilliz.com>
2024-06-10 21:34:08 +08:00
cai.zhang 27cc9f2630
enhance: Support analyze data (#33651)
issue: #30633

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
Co-authored-by: chasingegg <chao.gao@zilliz.com>
2024-06-06 17:37:51 +08:00
zhenshan.cao ac4f3997ce
enhance: Reconstructing Compaction to possess persistence capability (#33265)
issue: #33586

Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>
2024-06-05 10:17:50 +08:00
cai.zhang 77637180fa
enhance: Periodically synchronize segments to datanode watcher (#33420)
issue: #32809

---------

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-05-30 13:37:44 +08:00
SimFG 2453181218
fix: not found database name in the datacoord meta object (#33411)
- issue: #33410

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-05-28 10:09:48 +08:00
yihao.dai 7730b910b9
enhance: Decouple compaction from shard (#33138)
Decouple compaction from shard, remove dependencies on shards (e.g.
SyncSegments, injection).

issue: https://github.com/milvus-io/milvus/issues/32809

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-05-24 09:07:41 +08:00
yihao.dai 32560263fa
enhance: Query slot for compaction task (#32881)
Query slot of compaction in datanode, and transfer the control logic for
limiting compaction tasks from datacoord to the datanode.

issue: https://github.com/milvus-io/milvus/issues/32809

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-05-17 18:19:38 +08:00
congqixia 8cf2cf5c94
enhance: Add `go-deadlock` as unittest only dependency (#33063)
See also #33062

This PR:

- Add `lock.RWMutex` & `lock.Mutex` alias to switch implementation based
  on build flags
- When the build flags include `test`, use `go-deadlock` to detect
  possible deadlocks
- Replace all `sync.RWMutex` & `sync.Mutex` in datacoord pkg
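
The alias approach can be sketched as follows. This is a single-file illustration under stated assumptions: in a real layout the aliases would live in their own package, in a file whose build constraint excludes the `test` tag, with a sibling file compiled only under that tag aliasing the same names to the `go-deadlock` types. The `counter` type is hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// Default aliases. A sibling file, built only when the "test" tag is set,
// would declare the same names as aliases of the go-deadlock mutex types.
type (
	Mutex   = sync.Mutex
	RWMutex = sync.RWMutex
)

// counter is a hypothetical caller. It names only the alias, so the
// underlying implementation can change without touching this code.
type counter struct {
	mu RWMutex
	n  int
}

func (c *counter) Inc() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.n++
}

func (c *counter) Value() int {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.n
}

func main() {
	c := &counter{}
	c.Inc()
	fmt.Println(c.Value())
}
```

Because these are type aliases rather than new types, code that previously used `sync.RWMutex` keeps the same method set and zero-value behavior.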

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-05-15 16:33:34 +08:00
yiwangdr b1eacb2ae8
feat: datacoord/node watch based on rpc (#32036)
issue: https://github.com/milvus-io/milvus/issues/25309

Signed-off-by: yiwangdr <yiwangdr@gmail.com>
2024-05-07 15:49:30 +08:00
wayblink 42d0412e93
enhance: Add channelCPs in FlushResponse (#32044)
#32609

Signed-off-by: wayblink <anyang.wang@zilliz.com>
2024-04-30 09:45:27 +08:00
SimFG c012e6786f
feat: support rate limiter based on db and partition levels (#31070)
issue: https://github.com/milvus-io/milvus/issues/30577
co-author: @jaime0815

---------

Signed-off-by: Patrick Weizhi Xu <weizhi.xu@zilliz.com>
Signed-off-by: SimFG <bang.fu@zilliz.com>
Co-authored-by: Patrick Weizhi Xu <weizhi.xu@zilliz.com>
2024-04-12 16:01:19 +08:00
XuanYang-cn 4617d22482
enhance: Use channel manager interface in server_test (#31621)
Tidy the following test codes

    - Remove channel in newTestServer
    - Remove newTestServerWithMeta
    - Remove newTestServer2
    - Remove testDataCoordBase
    - Use the same func for handleTTmsg and handleRPCTTmsg

See also: #31620

---------

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2024-04-12 14:59:20 +08:00
congqixia 25a1c9ecf0
fix: Make coordinator `Register` not blocked on ProcessActiveStandby (#32069)
See also #32066

This PR lets the coordinator register successfully and runs
`ProcessActiveStandBy` asynchronously. Roles can then receive a stop
signal and notify the servers.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-04-10 18:49:18 +08:00
jaime bd853be8c7
enhance: Add db label for some usual metrics (#30956)
issue: #31782

Signed-off-by: jaime <yun.zhang@zilliz.com>
2024-04-02 14:27:13 +08:00
yihao.dai 4e264003bf
enhance: Ensure ImportV2 waits for the index to be built and refine some logic (#31629)
Feature Introduced:
1. Ensure ImportV2 waits for the index to be built

Enhancements Introduced:
1. Utilization of local time for timeout ts instead of allocating ts
from rootcoord.
2. Enhanced input file length check for binlog import.
3. Removal of duplicated manager in datanode.
4. Renaming of executor to scheduler in datanode.
5. Utilization of a thread pool in the scheduler in datanode.

issue: https://github.com/milvus-io/milvus/issues/28521

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-04-01 20:09:13 +08:00
congqixia fe2f34d76f
fix: Use server ctx instead of loopCtx for datacoord LivenessCheck (#31691)
See also #31689

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-03-29 10:37:11 +08:00
SimFG b1a1cca10b
feat: add more operation detail info for better allocation (#30438)
issue: #30436

---------

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-03-28 06:33:11 +08:00
congqixia 16c661c722
enhance: Use different interval for gc scan (#31363)
See also #31362

This PR makes the datacoord garbage collection scan operation use a
different interval than other operations.

This interval is a newly added param item whose default value is 7*24
hours.
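
To make the idea concrete, the sketch below keeps two independent intervals and falls back to the 7*24 hour default for the scan. All names (`gcOption`, `newGCOption`, `shouldScan`, `defaultScanInterval`) are hypothetical and do not reflect the actual parameter or struct names.

```go
package main

import (
	"fmt"
	"time"
)

// defaultScanInterval mirrors the 7*24 hour default stated in the commit message.
const defaultScanInterval = 7 * 24 * time.Hour

// gcOption is a hypothetical option struct with separate intervals for
// routine GC work and for the more expensive full scan.
type gcOption struct {
	checkInterval time.Duration
	scanInterval  time.Duration
}

// newGCOption falls back to the default scan interval when none is configured.
func newGCOption(check, scan time.Duration) gcOption {
	if scan <= 0 {
		scan = defaultScanInterval
	}
	return gcOption{checkInterval: check, scanInterval: scan}
}

// shouldScan reports whether a full scan is due, given the time of the last one.
func (o gcOption) shouldScan(lastScan, now time.Time) bool {
	return now.Sub(lastScan) >= o.scanInterval
}

func main() {
	opt := newGCOption(time.Hour, 0)
	last := time.Now().Add(-8 * 24 * time.Hour)
	fmt.Println(opt.scanInterval, opt.shouldScan(last, time.Now()))
}
```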

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-03-19 11:27:06 +08:00
jaime db79be3ae0
fix: ctx cancel should be the last step while stopping server (#31220)
issue: #31219

Signed-off-by: jaime <yun.zhang@zilliz.com>
2024-03-15 10:33:05 +08:00
XuanYang-cn def72947c7
fix: Trigger l0 compaction when l0 views don't change (#30729)
Trigger l0 compaction when l0 views don't change

So that leftover l0 segments would be compacted in the end.

1. Refresh LevelZero plans in compactionPlanHandler, remove the meta
dependency of compaction trigger v2
2. Add ForceTrigger method for CompactionView interface
3. rename mu to taskGuard
4. Add a new TriggerTypeLevelZeroViewIDLE
5. Add an idleTicker for compaction view manager

See also: #30098, #30556

---------

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2024-03-05 16:37:00 +08:00
jaime 4b0c3dd377
enhance: index meta use independent rather than global meta lock (#30869)
issue: https://github.com/milvus-io/milvus/issues/30837

Signed-off-by: jaime <yun.zhang@zilliz.com>
2024-03-04 16:56:59 +08:00
yihao.dai a434d33e75
feat: Add import scheduler and manager (#29367)
This PR introduces novel managerial roles for importv2:
1. ImportMeta: To manage all the import tasks;
2. ImportScheduler: To process tasks and modify their states;
3. ImportChecker: To ascertain the completion of all tasks and instigate
relevant operations.

issue: https://github.com/milvus-io/milvus/issues/28521

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-03-01 18:31:02 +08:00
chyezh 0c7474d7e8
enhance: add graceful stop timeout to avoid node stop hang under extreme cases (#30317)
1. set the coordinator graceful stop timeout to 5s
2. change the stop order of the datacoord components
3. change the querynode graceful stop timeout to 900s; we should
potentially change this to 600s once graceful stop is smooth
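
A bounded graceful stop can be sketched as a stop routine raced against a timer. The helper below is a hypothetical illustration (`stopWithTimeout` is not a Milvus function), and the durations in `main` are chosen only to keep the demo fast.

```go
package main

import (
	"fmt"
	"time"
)

// stopWithTimeout runs stop in its own goroutine and waits at most timeout.
// It returns true if stop finished in time and false if the wait was abandoned.
func stopWithTimeout(stop func(), timeout time.Duration) bool {
	done := make(chan struct{})
	go func() {
		defer close(done)
		stop()
	}()

	timer := time.NewTimer(timeout)
	defer timer.Stop()

	select {
	case <-done:
		return true
	case <-timer.C:
		return false
	}
}

func main() {
	// A stop routine that returns immediately finishes within the limit.
	fast := stopWithTimeout(func() {}, 5*time.Second)

	// A stop routine that hangs is abandoned once the timeout elapses.
	hang := make(chan struct{})
	slow := stopWithTimeout(func() { <-hang }, 50*time.Millisecond)
	close(hang)

	fmt.Println("fast stop completed:", fast, "hung stop completed:", slow)
}
```

Note that the abandoned goroutine keeps running in this sketch; the caller is expected to exit the process or force a shutdown after a timeout.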

issue: #30310
also see pr: #30306

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2024-02-29 17:01:50 +08:00