646 Commits (c38bca8d80cf995c570b7defbdc27cad4900ddaa)

Author SHA1 Message Date
yihao.dai a5a83a0904
fix: Fix consume blocked due to too many consumers (#38455)
This PR limits the maximum number of consumers per pchannel to 10 for
each QueryNode and DataNode.
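
A minimal sketch of such a per-pchannel cap, for illustration only. The type `consumerLimiter`, its methods, and the error value are hypothetical names, not the Milvus API. The sketch rejects the extra consumer; whether the actual change rejects or queues it is not stated in this message.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// maxConsumersPerPChannel mirrors the limit of 10 stated in the commit message.
const maxConsumersPerPChannel = 10

// errTooManyConsumers is a hypothetical error used only in this sketch.
var errTooManyConsumers = errors.New("too many consumers on pchannel")

// consumerLimiter counts active consumers per pchannel (hypothetical type).
type consumerLimiter struct {
	mu     sync.Mutex
	active map[string]int
}

func newConsumerLimiter() *consumerLimiter {
	return &consumerLimiter{active: make(map[string]int)}
}

// Acquire registers one consumer on the pchannel, or fails if the cap is reached.
func (l *consumerLimiter) Acquire(pchannel string) error {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.active[pchannel] >= maxConsumersPerPChannel {
		return errTooManyConsumers
	}
	l.active[pchannel]++
	return nil
}

// Release frees one consumer slot on the pchannel.
func (l *consumerLimiter) Release(pchannel string) {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.active[pchannel] > 0 {
		l.active[pchannel]--
	}
}

func main() {
	l := newConsumerLimiter()
	for i := 0; i < 11; i++ {
		if err := l.Acquire("pchannel-0"); err != nil {
			fmt.Println("consumer", i, "rejected:", err)
		}
	}
}
```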

issue: https://github.com/milvus-io/milvus/issues/37630

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-01-15 21:37:01 +08:00
presburger 38881bf591
enhance: prevent multiple query nodes from causing excessive occupancy of a single node, leading to GPU memory overflow (#39276) (#38617)
issue: #39276

Signed-off-by: yusheng.ma <yusheng.ma@zilliz.com>
2025-01-15 20:15:01 +08:00
congqixia d89768f9e0
enhance: Unify LoadStateLock RLock & PinIf (#39206)
Related to #39205

This PR merges `RLock` and `PinIfNotReleased` into a single `PinIf` function,
preventing a segment from being released before in-flight read operations finish.
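
The idea of a conditional pin can be sketched as follows. This is a simplified stand-in, not the Milvus `LoadStateLock`: the type, the state enum, and the `Unpin`/`Release` methods are assumptions made for illustration. The point it shows is that the state check and the pin happen under one mutex, so a release cannot slip in between them.

```go
package main

import (
	"fmt"
	"sync"
)

// loadState is a hypothetical enum for this sketch.
type loadState int

const (
	stateLoaded loadState = iota
	stateReleased
)

// loadStateLock is a simplified stand-in, not the Milvus type.
type loadStateLock struct {
	mu    sync.Mutex
	cond  *sync.Cond
	state loadState
	pins  int
}

func newLoadStateLock() *loadStateLock {
	l := &loadStateLock{state: stateLoaded}
	l.cond = sync.NewCond(&l.mu)
	return l
}

// PinIf pins the segment only when pred accepts the current state.
func (l *loadStateLock) PinIf(pred func(loadState) bool) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if !pred(l.state) {
		return false
	}
	l.pins++
	return true
}

// Unpin drops one pin and wakes a waiting Release when no pins remain.
func (l *loadStateLock) Unpin() {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.pins--
	if l.pins == 0 {
		l.cond.Broadcast()
	}
}

// Release waits until no reader holds a pin, then marks the segment released.
func (l *loadStateLock) Release() {
	l.mu.Lock()
	defer l.mu.Unlock()
	for l.pins > 0 {
		l.cond.Wait()
	}
	l.state = stateReleased
}

// isNotReleased is the predicate a reader would pass to PinIf.
func isNotReleased(s loadState) bool { return s != stateReleased }

func main() {
	l := newLoadStateLock()
	fmt.Println(l.PinIf(isNotReleased)) // pin succeeds while loaded
	l.Unpin()
	l.Release()
	fmt.Println(l.PinIf(isNotReleased)) // pin is refused after release
}
```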

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-01-14 18:38:59 +08:00
Zhen Ye fd84ed817c
enhance: add broadcast operation for msgstream (#39040)
issue: #38399

- make the broadcast service available for msgstream by reusing the
streaming service architecture

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-01-14 15:14:59 +08:00
Ted Xu 4355b485e5
enhance: remove compaction parallelism control (#39081)
See #39080

---------

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2025-01-10 13:23:00 +08:00
sthuang 6bc799061e
fix: fix privilege group list and list collections (#38684)
related: #37031
* built-in privilege group privileges returned by listPrivilegeGroups() should be
the same as in milvus.yaml
* collections granted through a collection-level built-in privilege group
should be listed in showCollections()

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2024-12-25 18:00:51 +08:00
Zhen Ye 5395ec19ad
enhance: add mem size for index file and fallback to multiply with serialized size (#38716)
issue: #38715

Signed-off-by: chyezh <chyezh@outlook.com>
2024-12-25 14:46:49 +08:00
jaime 78438ef41e
fix: revert optimize CPU usage for CheckHealth requests (#35589) (#38555)
issue: #35563

Signed-off-by: jaime <yun.zhang@zilliz.com>
2024-12-19 00:38:45 +08:00
jaime 29e620fa6d
fix: sync task still running after DataNode has stopped (#38377)
issue: #38319

Signed-off-by: jaime <yun.zhang@zilliz.com>
2024-12-17 18:06:44 +08:00
jaime 28fdbc4e30
enhance: optimize CPU usage for CheckHealth requests (#35589)
issue: #35563
1. Use an internal health checker to monitor the cluster's health state,
storing the latest state on the coordinator node. The CheckHealth
request retrieves the cluster's health from this latest state on the
proxy side, which improves cluster stability.
2. Each health check assesses all collections and channels, with
detailed failure messages temporarily saved in the latest state.
3. Use the CheckHealth request instead of the heavy GetMetrics request on
the querynode and datanode.
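
The caching pattern in point 1 can be sketched roughly as below. All names (`healthChecker`, `healthState`, `refresh`, `run`) are hypothetical, and the initial "healthy" state before the first probe is an assumption of this sketch. A background loop runs the expensive probe on an interval, while `CheckHealth` only reads the stored result.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// healthState is a hypothetical snapshot of the cluster health.
type healthState struct {
	Healthy bool
	Reasons []string // detailed failure messages from the last check
}

// healthChecker caches the result of an expensive probe (hypothetical type).
type healthChecker struct {
	mu     sync.RWMutex
	latest healthState
	probe  func() healthState
}

func newHealthChecker(probe func() healthState) *healthChecker {
	// Assumption: report healthy until the first probe has run.
	return &healthChecker{probe: probe, latest: healthState{Healthy: true}}
}

// refresh runs the expensive probe once and stores the result.
func (c *healthChecker) refresh() {
	s := c.probe()
	c.mu.Lock()
	c.latest = s
	c.mu.Unlock()
}

// run refreshes the cached state on every tick until stop is closed.
func (c *healthChecker) run(interval time.Duration, stop <-chan struct{}) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			c.refresh()
		case <-stop:
			return
		}
	}
}

// CheckHealth answers from the cached state without running the probe.
func (c *healthChecker) CheckHealth() healthState {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.latest
}

func main() {
	calls := 0
	c := newHealthChecker(func() healthState {
		calls++
		return healthState{Healthy: false, Reasons: []string{"channel ch-1 lagging"}}
	})
	c.refresh()
	for i := 0; i < 1000; i++ {
		_ = c.CheckHealth()
	}
	fmt.Println("probe calls:", calls, "healthy:", c.CheckHealth().Healthy)
}
```

In a long-running service, `run` would be started in its own goroutine; `main` calls `refresh` directly only to keep the example deterministic.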

Signed-off-by: jaime <yun.zhang@zilliz.com>
2024-12-17 11:02:45 +08:00
SimFG 2afe2eaf3e
feat: support to replicate collection when the services contains the system tt msg (#37559)
- issue: #37105

---------

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-12-17 09:08:46 +08:00
sthuang c2855a5c74
enhance: add privilege group privilege into built-in privilege group (#38393)
related issue: https://github.com/milvus-io/milvus/issues/37031

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2024-12-12 17:20:42 +08:00
Ted Xu dc85d8e968
enhance: improve mix compaction performance by removing max segment limitations (#38344)
See #37234

---------

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2024-12-11 20:38:42 +08:00
wei liu e279ccf109
enhance: Enable score based balance channel policy (#38143)
issue: #38142
The current balance channel policy only considers the current collection's
distribution. If every collection has 1 channel and all channels are
loaded on the same querynode, channel balancing is not triggered after
the number of querynodes increases.

This PR enables a score-based balance channel policy to:
1. distribute all channels evenly across multiple querynodes
2. distribute each collection's channels evenly across multiple
querynodes.
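
To illustrate the two goals, here is a small greedy sketch. The scoring formula is an assumption made for this example and is not taken from the PR: a node's score is the number of channels from the same collection it already holds (weighted heavily) plus its total channel count, and each channel goes to the lowest-scoring node.

```go
package main

import "fmt"

// channel is a hypothetical (collection, name) pair.
type channel struct {
	Collection string
	Name       string
}

// assign places channels greedily on the node with the lowest score.
// Score (an assumption of this sketch): same-collection channels on the
// node times 1000, plus the node's total channel count.
func assign(channels []channel, nodes []string) map[string][]channel {
	placed := make(map[string][]channel, len(nodes))
	if len(nodes) == 0 {
		return placed
	}
	for _, ch := range channels {
		best, bestScore := "", 0
		for _, n := range nodes {
			same := 0
			for _, p := range placed[n] {
				if p.Collection == ch.Collection {
					same++
				}
			}
			score := same*1000 + len(placed[n])
			if best == "" || score < bestScore {
				best, bestScore = n, score
			}
		}
		placed[best] = append(placed[best], ch)
	}
	return placed
}

func main() {
	// Four collections with one channel each, two querynodes.
	chs := []channel{
		{"c1", "ch-0"}, {"c2", "ch-0"}, {"c3", "ch-0"}, {"c4", "ch-0"},
	}
	placed := assign(chs, []string{"qn-1", "qn-2"})
	fmt.Println(len(placed["qn-1"]), len(placed["qn-2"]))
}
```

A policy that looks at one collection at a time sees a single channel per collection and has nothing to move; counting the node's total load is what spreads single-channel collections across querynodes.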

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-12-11 17:20:43 +08:00
cai.zhang 41b19c6b1d
enhance: Determine the number of buffers based on the resource limits of the DataNode (#38209)
issue: #28410

---------

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-12-08 18:02:40 +08:00
shaoyue 1f66b9ebfb
feat: add config field to set internal tls sni (#38124)
/cc @xiaofan-luan @jaime0815 @nish112022

part of https://github.com/milvus-io/milvus/issues/36864

Signed-off-by: haorenfsa <haorenfsa@gmail.com>
2024-12-04 14:56:47 +08:00
sthuang 23dc313c44
fix: fix grant/revoke v2 meta and unclear error messages (#38110)
related issue: https://github.com/milvus-io/milvus/issues/37031

fixed issues:
#37974: better error messages for grant v2 interface
#37903: fix meta built-in privilege group object name
#37843: better error messages for custom privilege group interface 
#38002: fix built-in privilege group meta to pass proxy interceptor
check
#38008: fix revoke v2 to support revoking v1 granted privileges

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2024-12-02 11:36:39 +08:00
SimFG 2208b7c2ef
fix: the too long default root password does not take effect (#37983)
- issue: #36987

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-11-26 17:24:35 +08:00
Zhen Ye 2b4f211d84
enhance: add switch for local rpc enabled (#37985)
issue: #33285

- Add switch for local rpc

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2024-11-26 17:00:54 +08:00
congqixia 83df725146
enhance: Revert default l0 forward policy to FilterByBF (#37867)
Related to #37767

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-11-22 09:00:33 +08:00
XuanYang-cn d7dcc752f1
enhance: Increase task capacity and clean illegal task (#37896)
1. A taskQueueCapacity of 256 is too small for production when we want to
rewrite the entire collection.

2. Tasks should be cleaned up when they cannot be recovered; otherwise
their meta will remain in etcd forever.

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2024-11-21 18:48:32 +08:00
wei liu 0a440e0d38
fix: Prevent simultaneous balance of segments and channels (#37850)
issue: #33550
Balance segment and balance channel execute at the same time, which
causes a bunch of corner cases.

This PR disables simultaneous balancing of segments and channels.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-11-21 17:56:55 +08:00
nish112022 484c6b5c44
feat: Added code for Internal-tls (#36865)
issue : https://github.com/milvus-io/milvus/issues/36864

I have a few questions regarding my approach. I will consolidate them
here for feedback and review. Thanks.

---------

Signed-off-by: Nischay Yadav <nischay.yadav@ibm.com>
Signed-off-by: Nischay <Nischay.Yadav@ibm.com>
2024-11-20 06:00:32 +08:00
cai.zhang dae4160466
enhance: Whether to enable mergeSort mode when performing mixCompaction (#37664)
issue: #37579

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-11-19 11:28:31 +08:00
sthuang 2d72ad33f2
enhance: RBAC built in privilege groups (#37720)
issue: #37031

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2024-11-18 20:38:39 +08:00
yihao.dai 0fc0d1a888
fix: Limit the concurrency of channel tasks (#37740)
Limit the maximum concurrency of channel tasks for each DataNode to
prevent excessive subscriptions from causing DataNode OOM.

issue: https://github.com/milvus-io/milvus/issues/37665

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-11-18 16:26:30 +08:00
Zhen Ye 81fa7dd52c
fix: add ddl and dcl concurrency to avoid competition (#37672)
issue: #37166

Signed-off-by: chyezh <chyezh@outlook.com>
2024-11-15 15:04:31 +08:00
congqixia 66bf254437
enhance: Enable `RemoteLoad` l0 forward policy by default (#37678)
Related to #35303

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-11-15 10:22:30 +08:00
Ted Xu a9538f6e96
fix: config file validator missed the file tail validation (#37608)
See: #32168

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2024-11-13 15:14:30 +08:00
XuanYang-cn a45a288a25
fix: Separate L0 and Mix trigger interval (#37190)
See also: #37108

- Add MixCompactionTriggerInterval, default 60s
- Add L0CompactionTriggerInterval, default 10s
- Export Single related compaction configs
- Raise SingleCompactionDeltaLogMaxSize from 2MB to 16MB

---------

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2024-11-12 10:56:37 +08:00
Bingyi Sun c1eccce2fa
enhance: enable multiple chunked segment by default (#37570)
Signed-off-by: sunby <sunbingyi1992@gmail.com>
2024-11-12 09:20:28 +08:00
foxspy 3224e58c5b
enhance: add unify vector index config management (#36846)
issue: #34298

Signed-off-by: xianliang.li <xianliang.li@zilliz.com>
2024-11-01 06:18:21 +08:00
XuanYang-cn 4926021c02
fix: Skip mark compaction timeout for mix and l0 compaction (#37118)
A timeout is a poor fit for long-running tasks, especially with a
static timeout config. We should monitor execution progress and fail the
task if the progress has been stale for a long time.

This PR is a small patch that stops DC from marking compaction tasks as
timed out while it is still waiting for DN to finish. The design is
self-conflicting. After this PR, mix and L0 compaction are no longer
controlled by the DC timeout, but clustering is still under timeout control.

The compaction queue capacity has grown for priority calculation, so
timed-out compactions appear more often; when one times out, the queued
tasks time out too, and no compaction succeeds afterwards.

See also: #37108, #37015

---------

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2024-10-28 14:33:29 +08:00
yihao.dai f0b3942a08
enhance: Limit import job number (#36891)
issue: https://github.com/milvus-io/milvus/issues/36890

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-10-23 16:01:28 +08:00
Zhen Ye f3d9d05a28
fix: use binlog counter to trigger flush but not stats log (#37037)
issue: #36804

Signed-off-by: chyezh <chyezh@outlook.com>
2024-10-23 15:07:29 +08:00
jaime 4746f47282
feat: management WebUI homepage (#36822)
issue: #36784
1. Implement an embedded web server for WebUI access.  
2. Complete the homepage development.

Home page demo:
<img width="2177" alt="iShot_2024-10-10_17 57 34"
src="https://github.com/user-attachments/assets/38539917-ce09-4e54-a5b5-7f4f7eaac353">

Signed-off-by: jaime <yun.zhang@zilliz.com>
2024-10-23 11:29:28 +08:00
Ted Xu 50da48a30d
enhance: adding mix compaction first prioritizer (#36956)
Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2024-10-18 11:37:24 +08:00
congqixia 1184319644
fix: Load original key if ts is MaxTimestamp (#36934)
Related to #36933

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-10-17 14:11:29 +08:00
Bingyi Sun a75bb85f3a
feat: support chunked column for sealed segment (#35764)
This PR splits sealed segment to chunked data to avoid unnecessary
memory copy and save memory usage when loading segments so that loading
can be accelerated.

To support rollback to previous version, we add an option
`multipleChunkedEnable` which is false by default.
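
As a rough illustration of the chunked layout, the sketch below keeps a column as a list of chunks and finds a row by binary search over chunk start offsets, so chunks never have to be copied into one contiguous buffer. The type `chunkedColumn` and its methods are hypothetical and use `int64` values only; the actual segment implementation is not shown in this message.

```go
package main

import (
	"fmt"
	"sort"
)

// chunkedColumn is a hypothetical int64 column stored as separate chunks.
type chunkedColumn struct {
	chunks  [][]int64
	offsets []int // offsets[i] is the global row index of the first row in chunks[i]
	rows    int
}

// AppendChunk adds a chunk by reference; the chunk's data is not copied.
func (c *chunkedColumn) AppendChunk(chunk []int64) {
	if len(chunk) == 0 {
		return
	}
	c.offsets = append(c.offsets, c.rows)
	c.chunks = append(c.chunks, chunk)
	c.rows += len(chunk)
}

// At returns the value at a global row index, locating its chunk by binary search.
func (c *chunkedColumn) At(row int) (int64, bool) {
	if row < 0 || row >= c.rows {
		return 0, false
	}
	// Find the last chunk whose start offset is <= row.
	i := sort.Search(len(c.offsets), func(i int) bool { return c.offsets[i] > row }) - 1
	return c.chunks[i][row-c.offsets[i]], true
}

func main() {
	var col chunkedColumn
	col.AppendChunk([]int64{10, 11, 12})
	col.AppendChunk([]int64{20, 21})
	v, ok := col.At(3) // first row of the second chunk
	fmt.Println(v, ok)
}
```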

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2024-10-12 15:04:52 +08:00
Ted Xu 5fc731795b
enhance: Datacoord to support prioritization of compaction tasks (#36547)
See #36550

This PR made 2 changes:

1. Introducing a prioritization mechanism: if
`dataCoord.compaction.taskPrioritizer` is set to `level`, compaction
tasks are always executed in the priority order L0 > Mix > Clustering.
2. `dataCoord.compaction.maxParallelTaskNum` now controls the
parallelism of executing tasks, not the combined number of queued and
executing tasks.
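
The `level` ordering can be sketched with a stable sort. The `task` struct and constant names below are hypothetical; only the ordering L0 > Mix > Clustering comes from the message. Keeping arrival order within a type is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"sort"
)

// compactionType is ordered so that a smaller value means higher priority.
type compactionType int

const (
	l0Compaction compactionType = iota // highest priority
	mixCompaction
	clusteringCompaction
)

// task is a hypothetical queued compaction task.
type task struct {
	ID   int64
	Type compactionType
}

// prioritize returns a copy ordered L0 first, then Mix, then Clustering,
// keeping arrival order within the same type.
func prioritize(tasks []task) []task {
	out := append([]task(nil), tasks...)
	sort.SliceStable(out, func(i, j int) bool { return out[i].Type < out[j].Type })
	return out
}

func main() {
	queue := []task{
		{1, clusteringCompaction},
		{2, mixCompaction},
		{3, l0Compaction},
		{4, mixCompaction},
	}
	for _, t := range prioritize(queue) {
		fmt.Println(t.ID)
	}
}
```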

---------

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2024-10-09 19:11:20 +08:00
XuanYang-cn c84bdfa766
fix: raise l0 compaction memory ratio to 0.5 (#36690)
5 percent of free memory is too little for L0 compaction. This PR
raises it to 50 percent.

See also: #36614

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2024-10-09 17:19:24 +08:00
yihao.dai 0fc2a4aa53
enhance: Optimize import scheduling and add time cost metric (#36601)
1. Optimize the import scheduling strategy:
a. Revise slot weights, calculating them based on the number of files
and segments for both import and pre-import tasks.
b. Ensure that the DN executes tasks in ascending order of task ID.
2. Add time cost metric and log.

issue: https://github.com/milvus-io/milvus/issues/36600,
https://github.com/milvus-io/milvus/issues/36518

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-10-09 14:41:20 +08:00
Rijin-N a05a37a583
enhance: GCS native support (GCS implemented using Google Cloud Storage libraries) (#36214)
Native support for Google cloud storage using the Google Cloud Storage
libraries. Authentication is performed using GCS service account
credentials JSON.

Currently, Milvus supports Google Cloud Storage using S3-compatible APIs
via the AWS SDK. This approach has the following limitations:

1. Overhead: Translating requests between S3-compatible APIs and GCS can
introduce additional overhead.
2. Compatibility Limitations: Some features of the original S3 API may
not fully translate or work as expected with GCS.

This enhancement is needed to address these limitations.

Related Issue: #36212
2024-09-30 13:23:32 +08:00
Zhen Ye a6545b2e29
fix: refactor milvus config and change default txn timeout (#36522)
issue: #36498

Signed-off-by: chyezh <chyezh@outlook.com>
2024-09-29 11:01:15 +08:00
jaime 52cce4de58
fix: inaccurate size estimation for encoded array data (#36373)
issue: #36029

Signed-off-by: jaime <yun.zhang@zilliz.com>
2024-09-24 14:51:14 +08:00
congqixia 1833913f44
enhance: Add streaming forward policy switch for delegator (#36330)
Related to #35303

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-09-23 18:01:12 +08:00
SimFG c50fe71163
fix: long buffering causes mq to be unable to receive messages. (#36420)
- issue: #36397

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-09-23 16:33:18 +08:00
yihao.dai 763fd0dfc5
enhance: Use a separate mmap config for chunk cache (#36276)
issue: https://github.com/milvus-io/milvus/issues/35273

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-09-15 16:23:09 +08:00
Chun Han b8b4aea4f5
enhance: restrict max group size(#33544) (#36223)
related: #33544

Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
2024-09-14 15:45:08 +08:00
aoiasd c22a2cebb6
fix: split stream query result to avoid grpc response too large error (#36090)
relate: https://github.com/milvus-io/milvus/issues/36089

---------

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2024-09-13 15:07:09 +08:00