648 Commits (master)

Author SHA1 Message Date
yihao.dai 760223f80a
fix: use separate warmup pool and disable warmup by default (#33348)
1. use a small warmup pool to reduce the impact of warmup
2. change the warmup pool to nonblocking mode
3. disable warmup by default
4. remove the maximum size limit of 16 for the load pool

issue: https://github.com/milvus-io/milvus/issues/32772

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
Co-authored-by: xiaofanluan <xiaofan.luan@zilliz.com>
2024-05-27 01:25:40 +08:00
congqixia 5cdc6ae489
enhance: Sync `deleteBufBytes` config value to default config (#33320)
The delete buffer size is set to 64MB in milvus.yaml, but the default
setup should be 16MB.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-05-24 10:41:40 +08:00
aoiasd 1b4e28b97f
enhance: Check the proxy rate limiter when delete gets data by query. (#30891)
relate: https://github.com/milvus-io/milvus/issues/30927

---------

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2024-05-23 20:03:40 +08:00
wei liu c7be2ce33a
enhance: Decrease bloom filter fp rate to reduce delete impact (#33301)
When Milvus processes a delete record, it uses bloom filters to find the
segment the record belongs to. A higher bloom filter false-positive (fp)
rate causes more delete records to be forwarded to the wrong segments.

This PR decreases the bloom filter's default fp rate to 0.001.
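
To illustrate the trade-off behind a lower fp rate, here is a minimal sketch based on the textbook bloom filter sizing formulas (bits per key = -ln(p) / (ln 2)^2, hash count = bits per key * ln 2). The function names are hypothetical and the 0.01 baseline is an assumption chosen for comparison only; neither comes from the Milvus code base.

```go
package main

import (
	"fmt"
	"math"
)

// bitsPerKey returns the theoretical number of filter bits needed per
// inserted key for an optimally sized bloom filter with false-positive
// rate p: m/n = -ln(p) / (ln 2)^2.
func bitsPerKey(p float64) float64 {
	return -math.Log(p) / (math.Ln2 * math.Ln2)
}

// optimalHashes returns the matching hash-function count,
// k = (m/n) * ln 2, rounded to the nearest integer.
func optimalHashes(p float64) int {
	return int(math.Round(bitsPerKey(p) * math.Ln2))
}

func main() {
	// 0.01 is a hypothetical baseline; 0.001 is the new default named above.
	for _, p := range []float64{0.01, 0.001} {
		fmt.Printf("fp=%.3f bits/key=%.2f hashes=%d\n",
			p, bitsPerKey(p), optimalHashes(p))
	}
}
```

Under these formulas, going from 0.01 to 0.001 costs roughly 50% more filter memory per key in exchange for about ten times fewer misrouted deletes.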

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-05-23 18:15:41 +08:00
shaoting-huang de7901121f
Upgrade go from 1.20 to 1.21 (#33047)
Signed-off-by: shaoting-huang [shaoting-huang@zilliz.com]

issue: https://github.com/milvus-io/milvus/issues/32982

# Background
Go 1.21, which is now quite stable, introduces several improvements and
changes over Go 1.20. According to the
[Go 1.21 Release Notes](https://tip.golang.org/doc/go1.21), the biggest
difference in Go 1.21 is that Profile-Guided Optimization (PGO) is
enabled by default, which can improve performance by around 2-14%. The
PGO workflow is summarized below:
1. Build Initial Binary (Without PGO)
2. Deploy to the Production Environment
3. Run the program and collect Performance Analysis Data (CPU pprof)
4. Analyze the Collected Data and Select a Performance Profile for PGO
5. Place the Performance Analysis File in the Main Package Directory and
Name It default.pgo
6. go build Detects the default.pgo File and Enables PGO
7. Build and Release the Updated Binary (With PGO)
8. Iterate and Repeat the Above Steps
<img width="657" alt="Screenshot 2024-05-14 at 15 57 01"
src="https://github.com/milvus-io/milvus/assets/167743503/b08d4300-0be1-44dc-801f-ce681dabc581">

# What does this PR do
This PR reports three experiments: a search benchmark on the Zilliz test
platform, a search benchmark with the open-source
[VectorDBBench](https://github.com/zilliztech/VectorDBBench?tab=readme-ov-file),
and a search benchmark with PGO. We run the search benchmark on both the
Zilliz test platform and VectorDBBench to reduce reliance on a single
experimental result. In addition, we validate the performance
improvement from PGO.

## Search Benchmark Report by Zilliz Test Platform
An upgrade to Go 1.21 was conducted on a Milvus Standalone server,
equipped with 16 CPUs and 64GB of memory. The search performance was
evaluated using a 1 million entry local dataset with an L2 metric type
in a 768-dimensional space. The system was tested for concurrent
searches with 50 concurrent tasks for 1 hour, each with a 20-second
interval. The reason for using one server rather than two servers to
compare is to guarantee the same data source and same segment state
after compaction.

Test Sequence:
1. Go 1.20 Initial Run: Insert data, build index, load index, and
search.
2. Go 1.20 Rebuild: Rebuild the index with the same dataset, load index,
and search.
3. Go 1.21 Load: Upgrade to Go 1.21 on the same server. Then load the
index from the second run, and search.
4. Go 1.21 Rebuild: Rebuild the index with the same dataset, load index,
and search.

Search Metrics: 
| Metric | Go 1.20 | Go 1.20 Rebuild Index | Go 1.21 | Go 1.21 Rebuild Index |
|----------------------|------------|------------|------------|------------|
| `search requests` | 10,942,683 | 16,131,726 | 16,200,887 | 16,331,052 |
| `search fails` | 0 | 0 | 0 | 0 |
| `search RT_avg` (ms) | 16.44 | 11.15 | 11.11 | 11.02 |
| `search RT_min` (ms) | 1.30 | 1.28 | 1.31 | 1.26 |
| `search RT_max` (ms) | 446.61 | 233.22 | 235.90 | 147.93 |
| `search TP50` (ms) | 11.74 | 10.46 | 10.43 | 10.35 |
| `search TP99` (ms) | 92.30 | 25.76 | 25.36 | 25.23 |
| `search RPS` | 3,039 | 4,481 | 4,500 | 4,536 |

### Key Findings
The index build time was 340.39 ms with Go 1.20 and 337.60 ms with Go
1.21, a negligible difference in index construction. In search
operations, however, Go 1.21 performs slightly better than Go 1.20,
handling more concurrent requests with lower response times.

## Search Benchmark Report by VectorDBBench
Follow
[VectorDBBench](https://github.com/zilliztech/VectorDBBench?tab=readme-ov-file)
to create a VectorDBBench test for Go 1.20 and Go 1.21. We test the
search performance with Go 1.20 and Go 1.21 (without PGO) on the Milvus
Standalone system. The tests were conducted using the Cohere dataset
with 1 million entries in a 768-dimensional space, utilizing the COSINE
metric type.

Search Metrics: 
| Metric | Go 1.20 | Go 1.21 without PGO |
|------------------------------------------|---------|--------|
| Load Duration (seconds) | 1195.95 | 976.37 |
| Queries Per Second (QPS) | 841.62 | 875.89 |
| 99th Percentile Serial Latency (seconds) | 0.0047 | 0.0076 |
| Recall | 0.9487 | 0.9489 |

### Key Findings
Go 1.21 shows a shorter load duration and a higher search QPS, while the
99th percentile serial latency was higher (0.0076 s vs. 0.0047 s).

## PGO Performance Test
Milvus already exposes
[net/http/pprof](https://pkg.go.dev/net/http/pprof) through its metrics
endpoint, so we can fetch the CPU profile directly by running
`curl -o default.pgo
"http://${MILVUS_SERVER_IP}:${MILVUS_SERVER_PORT}/debug/pprof/profile?seconds=${TIME_SECOND}"`
during the first search run and save it as default.pgo. We then build
Milvus with PGO and run the search again on the same index.
The results are shown below:

Search Metrics:
| Metric | Go 1.21 Without PGO | Go 1.21 With PGO | Change (%) |
|----------------------|-----------|-----------|---------|
| `search Requests` | 2,644,583 | 2,837,726 | +7.30% |
| `search Fails` | 0 | 0 | N/A |
| `search RT_avg` (ms) | 11.34 | 10.57 | -6.78% |
| `search RT_min` (ms) | 1.39 | 1.32 | -5.18% |
| `search RT_max` (ms) | 349.72 | 143.72 | -58.91% |
| `search TP50` (ms) | 10.57 | 9.93 | -6.05% |
| `search TP99` (ms) | 26.14 | 24.16 | -7.56% |
| `search RPS` | 4,407 | 4,729 | +7.30% |

### Key Findings
PGO led to a notable enhancement in search performance, particularly in
reducing the maximum response time by about 59% and increasing the
search QPS by 7.3%.
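
The Change (%) column is a relative change against the non-PGO run. A minimal sketch of that arithmetic follows; `percentChange` is a hypothetical helper, not code from this PR. Because the table shows rounded values, recomputing some rows from the displayed numbers may differ from the reported percentages in the last digit.

```go
package main

import "fmt"

// percentChange returns the relative change from before to after, in percent.
func percentChange(before, after float64) float64 {
	return (after - before) / before * 100
}

func main() {
	// Inputs are the "search Requests" and "search RT_max" rows from the table above.
	fmt.Printf("search requests: %+.2f%%\n", percentChange(2644583, 2837726))
	fmt.Printf("search RT_max:   %+.2f%%\n", percentChange(349.72, 143.72))
}
```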

### Further Analysis
Generate a differential flame graph between the two CPU profiles by
running `go tool pprof -http=:8000 -diff_base nopgo.pgo pgo.pgo -normalize`

<img width="1894" alt="goprofiling"
src="https://github.com/milvus-io/milvus/assets/167743503/ab9e91eb-95c7-4963-acd9-d1c3c73ee010">
Further insight into HnswIndexNode and the Milvus Search Handler
<img width="1906" alt="hnsw"
src="https://github.com/milvus-io/milvus/assets/167743503/a04cf4a0-7c97-4451-b3cf-98afc20a0b05">
<img width="1873" alt="search_handler"
src="https://github.com/milvus-io/milvus/assets/167743503/5f4d3982-18dd-4115-8e76-460f7f534c7f">

After applying PGO to the Milvus server, the CPU utilization of the
faiss::fvec_L2 function decreased. This optimization significantly
enhances the performance of the
[HnswIndexNode::Search::searchKnn](e0c9c41aa2/src/index/hnsw/hnsw.cc (L203))
method, which is frequently invoked by Knowhere during high-concurrency
searches. As the Go release notes explain, functions may be inlined more
aggressively by the Go compiler during the second build, using the CPU
profile collected from the first run. As a result, the search handler
efficiency within Milvus DataNode has improved, allowing the server to
process a higher number of search queries per second (QPS).



# Conclusion
The combination of Go 1.21 and PGO has led to substantial enhancements
in search performance for Milvus server, particularly in terms of search
QPS and response times, making it more efficient for handling
high-concurrency search operations.

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2024-05-22 13:21:39 +08:00
yihao.dai 32560263fa
enhance: Query slot for compaction task (#32881)
Query the compaction slots from the datanode, and transfer the control
logic for limiting compaction tasks from datacoord to the datanode.

issue: https://github.com/milvus-io/milvus/issues/32809

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-05-17 18:19:38 +08:00
wei liu cba2c7a3be
enhance: clean channel node info in meta store (#32988)
issue: #32910
see also: #32911
When channel exclusive mode is enabled, the replica records channel node
info in the meta store. If the balance policy changes so that channel
exclusive mode is disabled, we should clean up the channel node info in
the meta store and stop balancing nodes between channels.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-05-14 10:05:40 +08:00
foxspy f6777267e3
enhance: add score compute consistency config for knowhere (#32997)
issue: https://github.com/milvus-io/milvus/issues/32583
related: #32584

Signed-off-by: xianliang.li <xianliang.li@zilliz.com>
2024-05-13 14:21:31 +08:00
Bingyi Sun 4724779b3b
enhance: remove fallback keys for config generator (#32946)
Signed-off-by: sunby <sunbingyi1992@gmail.com>
2024-05-13 13:33:31 +08:00
yiwangdr 855192eb3d
fix: sync milvus.yaml (#32920)
issue: https://github.com/milvus-io/milvus/issues/25309

Signed-off-by: yiwangdr <yiwangdr@gmail.com>
2024-05-10 17:29:31 +08:00
aoiasd 54a51b1236
enhance: Support dynamic config for opentelemetry trace (#32169)
relate: https://github.com/milvus-io/milvus/issues/31940

---------

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2024-05-09 17:43:30 +08:00
chyezh 641f702f64
fix: add request resource timeout for lazy load, refactor context usage in cache (#32709)
issue: #32663

- Use new param to control request resource timeout for lazy load.

- Remove the timeout parameter of `Do` and remove `DoWait`; use `context`
to control the timeout.

- Use `VersionedNotifier` to avoid notify event lost and broadcast,
remove the redundant goroutine in cache.

related dev pr: #32684

Signed-off-by: chyezh <chyezh@outlook.com>
2024-05-07 16:33:30 +08:00
Bingyi Sun fecd9c21ba
feat: LRU cache implementation (#32567)
issue: https://github.com/milvus-io/milvus/issues/32783
This pr is the implementation of lru cache on branch lru-dev.

Signed-off-by: sunby <sunbingyi1992@gmail.com>
Co-authored-by: chyezh <chyezh@outlook.com>
Co-authored-by: MrPresent-Han <chun.han@zilliz.com>
Co-authored-by: Ted Xu <ted.xu@zilliz.com>
Co-authored-by: jaime <yun.zhang@zilliz.com>
Co-authored-by: wayblink <anyang.wang@zilliz.com>
2024-05-06 20:29:30 +08:00
chyezh 2586c2f1b3
enhance: use WalkWithPrefix api for oss, enable pipelined file gc (#31740)
issue: #19095,#29655,#31718

- Change `ListWithPrefix` of OSS to `WalkWithPrefix`, which works in a
pipeline mode.

- File garbage collection is performed in another goroutine.

- Segment index recycling cleans up index files too.

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2024-04-25 20:41:27 +08:00
Ted Xu 744a54a534
enhance: enforce milvus.yaml assertion in UT (#32357)
See #32168

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2024-04-19 16:47:20 +08:00
Ted Xu 78d32bd8b2
enhance: update milvus.yaml (#31832)
See #32168

---------

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2024-04-16 16:17:19 +08:00
edward.zeng b7ff85638d
fix: mvcc database space exceeded for embed etcd (#32048)
Fix #30314

Signed-off-by: Edward Zeng <jie.zeng@zilliz.com>
2024-04-12 21:39:19 +08:00
jaime 371e6d2c1a
enhance: refine sync memory watermark configuration (#32140)
Signed-off-by: jaime <yun.zhang@zilliz.com>
2024-04-11 20:07:24 +08:00
yihao.dai 49d109de18
enhance: Use an individual buffer size parameter for imports (#31833)
Use an individual buffer size parameter for imports and set buffer size
to 64MB.

issue: https://github.com/milvus-io/milvus/issues/28521

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-04-08 21:07:18 +08:00
yihao.dai 4e264003bf
enhance: Ensure ImportV2 waits for the index to be built and refine some logic (#31629)
Feature Introduced:
1. Ensure ImportV2 waits for the index to be built

Enhancements Introduced:
1. Utilization of local time for timeout ts instead of allocating ts
from rootcoord.
2. Enhanced input file length check for binlog import.
3. Removal of duplicated manager in datanode.
4. Renaming of executor to scheduler in datanode.
5. Utilization of a thread pool in the scheduler in datanode.

issue: https://github.com/milvus-io/milvus/issues/28521

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-04-01 20:09:13 +08:00
Bingyi Sun fbff46a005
enhance: add lazyload global config (#31610)
Signed-off-by: sunby <sunbingyi1992@gmail.com>
2024-03-27 20:23:10 +08:00
groot 5be395354c
fix: minio ssl compatible issue (#31607)
issue: https://github.com/milvus-io/milvus/issues/30709

Signed-off-by: yhmo <yihua.mo@zilliz.com>
2024-03-27 14:41:20 +08:00
presburger fe1961ff14
enhance: add comments for gpu mem pool setting (#31231)
Signed-off-by: yusheng.ma <yusheng.ma@zilliz.com>
2024-03-25 14:41:07 +08:00
yihao.dai f65a796d18
enhance: Add max file num limit and max file size limit for import (#31497)
The max number of import files per request should not exceed 1024 by
default (configurable).
The import file size allowed for importing should not exceed 16GB by
default (configurable).
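
A check of this kind can be sketched as below. The constant and function names are hypothetical, and 16GB is interpreted here as 16 GiB, which is an assumption; the real limits are read from configuration.

```go
package main

import "fmt"

const (
	// Hypothetical defaults mirroring the limits described above.
	defaultMaxFileNum             = 1024
	defaultMaxFileSizeBytes int64 = 16 << 30 // 16 GiB (assumed interpretation of "16GB")
)

// validateImportFiles rejects a request that has too many files or
// contains any file larger than the per-file size limit.
func validateImportFiles(sizes []int64) error {
	if len(sizes) > defaultMaxFileNum {
		return fmt.Errorf("too many import files: %d > %d", len(sizes), defaultMaxFileNum)
	}
	for i, s := range sizes {
		if s > defaultMaxFileSizeBytes {
			return fmt.Errorf("file %d is too large: %d bytes > %d bytes", i, s, defaultMaxFileSizeBytes)
		}
	}
	return nil
}

func main() {
	fmt.Println(validateImportFiles([]int64{1 << 20, 2 << 30})) // within both limits
	fmt.Println(validateImportFiles([]int64{17 << 30}))         // exceeds the size limit
}
```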

issue: https://github.com/milvus-io/milvus/issues/28521

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-03-22 18:13:06 +08:00
yihao.dai 0fe5e90e8b
enhance: Remove import v1 (#31403)
Remove all code and logic related to import v1.

issue: https://github.com/milvus-io/milvus/issues/28521

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-03-22 15:29:09 +08:00
Chun Han c3264ca3e3
feat: support segment pruner (#31003)
related: #30376
2024-03-22 13:57:06 +08:00
groot c81909bfab
enhance: Support MinIO TLS connection (#31311)
issue: https://github.com/milvus-io/milvus/issues/30709
pr: #31292

Signed-off-by: yhmo <yihua.mo@zilliz.com>
Co-authored-by: Chen Rao <chenrao317328@163.com>
2024-03-21 11:15:20 +08:00
Bingyi Sun 9dbd67879f
enhance: use mmap prefix to define all mmap related configs (#31436)
Signed-off-by: sunby <sunbingyi1992@gmail.com>
2024-03-20 17:55:08 +08:00
Jiquan Long dc2cdbe387
enhance: add more metrics (#31271)
/kind improvement
fix: #31272 

This PR adds more metrics, which are:
- Slow query count, where the duration considered slow is
configurable;
- Number of deleted entities;
- Number of entities imported;
- Number of entities per collection;
- Number of loaded entities per collection;
- Number of indexed entities;
- Number of indexed entities, per collection, per index and whether it's
a vector index;
- Quota states (LongTimeTickDelay, MemoryExhuasted, DiskQuotaExhuasted)
per database;

---------

Signed-off-by: longjiquan <jiquan.long@zilliz.com>
2024-03-19 15:23:06 +08:00
congqixia 16c661c722
enhance: Use different interval for gc scan (#31363)
See also #31362

This PR makes the datacoord garbage collection scan operation use a
different interval than other operations.

This interval is a newly added param item, whose default value is 7*24
hours.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-03-19 11:27:06 +08:00
Bingyi Sun bdc70dfc6a
feat: Add global mmap enable configuration (#31267)
https://github.com/milvus-io/milvus/issues/31279

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2024-03-18 15:17:10 +08:00
XuanYang-cn ff80d2fd8c
enhance: Enable L0 by default (#30998)
Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2024-03-08 15:53:02 +08:00
yihao.dai c411cb4a49
enhance: Prevent the backlog of channelCP update tasks, perform batch updates of channelCPs (#30941)
This PR includes the following adjustments:
1. To prevent channelCP update task backlog, only one task with the same
vchannel is retained in the updater. Additionally, the lastUpdateTime is
refreshed after the flowgraph submits the update task, rather than in
the callBack function.
2. Batch updates of multiple vchannel checkpoints are performed in the
UpdateChannelCheckpoint RPC (default batch size is 128). Additionally,
the lock for channelCPs in DataCoord meta has been switched from key
lock to global lock.
3. The concurrency of UpdateChannelCheckpoint RPCs in the datanode has
been reduced from 1000 to 10.
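
The batching in item 2 can be illustrated with a small chunking helper. This is a sketch only: `splitIntoBatches` and the channel names are hypothetical, and the real implementation batches checkpoint messages rather than strings.

```go
package main

import "fmt"

// splitIntoBatches cuts items into consecutive slices of at most batchSize
// elements, so that each slice can be sent in a single RPC.
func splitIntoBatches[T any](items []T, batchSize int) [][]T {
	if batchSize <= 0 {
		return nil
	}
	var out [][]T
	for start := 0; start < len(items); start += batchSize {
		end := start + batchSize
		if end > len(items) {
			end = len(items)
		}
		out = append(out, items[start:end])
	}
	return out
}

func main() {
	// 300 hypothetical vchannels whose checkpoints need updating.
	channels := make([]string, 300)
	for i := range channels {
		channels[i] = fmt.Sprintf("vchannel-%d", i)
	}
	// 128 is the default batch size mentioned above.
	for i, b := range splitIntoBatches(channels, 128) {
		fmt.Printf("RPC %d carries %d checkpoints\n", i, len(b))
	}
}
```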

issue: https://github.com/milvus-io/milvus/issues/30004

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
Co-authored-by: jaime <yun.zhang@zilliz.com>
Co-authored-by: congqixia <congqi.xia@zilliz.com>
2024-03-07 20:39:02 +08:00
Jiquan Long a88c896733
enhance: purge client infos periodically (#31037)
https://github.com/milvus-io/milvus/issues/31007

---------

Signed-off-by: longjiquan <jiquan.long@zilliz.com>
2024-03-06 12:50:59 +08:00
congqixia 8c2615f840
enhance: Add unit(seconds) for newly added connection manager param (#31023)
See also #31007 #31008 #31009

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-03-05 14:50:59 +08:00
congqixia 3b5ce73ded
enhance: Change proxy connection manager to concurrent safe (#31008)
See also #31007

This PR:
- Add param item for connection manager behavior: TTL & check interval
- Change clientInfo map to concurrent map

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-03-05 10:39:00 +08:00
yihao.dai a434d33e75
feat: Add import scheduler and manager (#29367)
This PR introduces novel managerial roles for importv2:
1. ImportMeta: To manage all the import tasks;
2. ImportScheduler: To process tasks and modify their states;
3. ImportChecker: To ascertain the completion of all tasks and instigate
relevant operations.

issue: https://github.com/milvus-io/milvus/issues/28521

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-03-01 18:31:02 +08:00
groot 85de56e894
fix: Clean kafka default configuration (#30924)
issue: #30917

Signed-off-by: yhmo <yihua.mo@zilliz.com>
2024-03-01 18:17:03 +08:00
MrPresent-Han 17a2fd048e
feat: support set up knowhere-build-pool-size on querynode(#29650) (#30922)
related: #29650

Signed-off-by: MrPresent-Han <chun.han@zilliz.com>
2024-02-29 18:15:00 +08:00
groot ba6d33cd57
fix: Support TLS for kafka connection (#30468)
#27977

Add extra configurations in milvus.yaml to pass certificates for kafka.

Signed-off-by: yhmo <yihua.mo@zilliz.com>
2024-02-28 18:43:07 +08:00
chyezh 941dc755df
feat: add collection level flush rate control (#29567)
Add flush rate control at the collection level to avoid generating too
many segments.
The default is 0.1 qps.

issue: #29477

Signed-off-by: chyezh <ye.zhen@zilliz.com>
2024-02-18 15:32:50 +08:00
XuanYang-cn e0ed5647b3
fix: Limit L0 Compaction segment size and count (#30374)
See also: #30191

---------

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2024-02-01 20:39:03 +08:00
yihao.dai c5918290e6
feat: Add import executor and manager for datanode (#29438)
This PR introduces novel importv2 roles for datanode:
1. Executor: To execute tasks. An import task is divided into the
following steps: read data -> hash data -> sync data;
2. Manager: To manage all the tasks;

issue: https://github.com/milvus-io/milvus/issues/28521

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-01-31 20:45:04 +08:00
cai.zhang 47af347d0e
enhance: Limit index pool size of standalone server (#30170)
issue: #29926

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-01-30 16:47:03 +08:00
Bingyi Sun 406bf14e84
enhance: Add growing row count weight (#30271)
Signed-off-by: sunby <sunbingyi1992@gmail.com>
2024-01-29 14:05:02 +08:00
xige-16 033eae9e73
enhance: Set segment.maxSize param to 1024M (#30139)
issue: #25639 
/kind improvement

When the number of vector columns increases, the number of rows per
segment will decrease. In order to reduce the impact on vector indexing
performance, it is necessary to increase the segment max limit.

If a collection has multiple vector fields with memory and disk indices
on different vector fields, the size limit after segment compaction is
the minimum of segment.maxSize and segment.diskSegmentMaxSize.

Signed-off-by: xige-16 <xi.ge@zilliz.com>

---------

Signed-off-by: xige-16 <xi.ge@zilliz.com>
2024-01-29 10:17:02 +08:00
congqixia 7ced0af197
enhance: Enlarge default datanode sync parallel to 256 (#30270)
See also #27675

Now that sync parallelism in the datanode can be controlled globally,
the default should be changed to a value more suitable for most use cases.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-01-26 11:35:00 +08:00
yihao.dai c02fb64ad6
enhance: Allows proactive warming up of chunk cache (#30182)
Allows proactive warming up of chunk cache. Original vector data will be
asynchronously loaded into the chunk cache during the load process. It
has the potential to significantly reduce query/search latency for a
certain duration after the load, albeit with a concurrent increase in
disk usage.

issue: https://github.com/milvus-io/milvus/issues/30181

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-01-25 19:55:39 +08:00
MrPresent-Han 2a0eb1d2e6
feat: support general capacity restrict for cloud-side resoure contro… (#29845)
related: #29844

Signed-off-by: MrPresent-Han <chun.han@zilliz.com>
2024-01-16 16:32:53 +08:00
wei liu 07057fcf7c
fix: Unexpected rpc msg size limit (#29682)
`clientMaxSendSize` and `serverMaxRecvSize` together limit the rpc
request size, so they should use the same config value. Likewise,
`serverMaxSendSize` and `clientMaxRecvSize` together limit the rpc
response size, so they should use the same config value too.

This PR fixes the unexpected rpc msg size limit caused by misunderstanding
and misusing these rpc config items.
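
The pairing can be made concrete with a small sketch: a request must pass both the client send limit and the server receive limit, so the effective limit is the smaller of the two, and the same holds for responses. The struct and the numbers below are hypothetical illustrations, not Milvus defaults.

```go
package main

import "fmt"

// rpcLimits groups the four size limits named above (in bytes).
type rpcLimits struct {
	ClientMaxSendSize int
	ServerMaxRecvSize int
	ServerMaxSendSize int
	ClientMaxRecvSize int
}

func minInt(a, b int) int {
	if a < b {
		return a
	}
	return b
}

// EffectiveRequestLimit is the largest request that both sides accept.
func (l rpcLimits) EffectiveRequestLimit() int {
	return minInt(l.ClientMaxSendSize, l.ServerMaxRecvSize)
}

// EffectiveResponseLimit is the largest response that both sides accept.
func (l rpcLimits) EffectiveResponseLimit() int {
	return minInt(l.ServerMaxSendSize, l.ClientMaxRecvSize)
}

func main() {
	// Hypothetical mismatched values: the smaller side silently wins.
	l := rpcLimits{
		ClientMaxSendSize: 256 << 20,
		ServerMaxRecvSize: 64 << 20,
		ServerMaxSendSize: 512 << 20,
		ClientMaxRecvSize: 512 << 20,
	}
	fmt.Println("request limit (MiB): ", l.EffectiveRequestLimit()>>20)
	fmt.Println("response limit (MiB):", l.EffectiveResponseLimit()>>20)
}
```

Using one config value for each pair avoids the situation where a request allowed by one side is rejected by the other.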

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-01-05 15:56:47 +08:00
MrPresent-Han ed644983e2
enhance: add param for bloomfilter(#29388) (#29490)
related: #29388

Signed-off-by: MrPresent-Han <chun.han@zilliz.com>
2023-12-28 18:10:46 +08:00
xige-16 0a70e8b601
enhance: Remove multiple vector field limit (#27827)
issue: https://github.com/milvus-io/milvus/issues/25639

/kind improvement
Signed-off-by: xige-16 <xi.ge@zilliz.com>

Signed-off-by: xige-16 <xi.ge@zilliz.com>
2023-12-28 16:40:46 +08:00
wei liu 839a72129e
fix: Auto balance param can't be updated by dynamic (#29501)
This PR fixes the auto balance param not being dynamically updatable.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-12-27 14:30:53 +08:00
MrPresent-Han 7c7003bff6
enhance:refine the range of chunk size config value(#29388) (#29389)
related: #29388

Signed-off-by: MrPresent-Han <chun.han@zilliz.com>
2023-12-26 17:36:46 +08:00
cqy123456 4c979538a4
enhance: update cagra index params in config and add params check (#29045)
issue: https://github.com/milvus-io/milvus/issues/29230
This PR does two things for the cagra index:
 a. the milvus yaml config supports gpu memory settings

 b. add a cagra-params check

Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>
Co-authored-by: yusheng.ma <yusheng.ma@zilliz.com>
2023-12-26 11:04:47 +08:00
zhagnlu a602171d06
enhance: Refactor runtime and expr framework (#28166)
#28165

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2023-12-18 12:04:42 +08:00
aoiasd b5ee563914
fix: accesslog can not print search expression (#28899)
relate: https://github.com/milvus-io/milvus/issues/28893

---------

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2023-12-13 18:50:42 +08:00
wayblink 51f870da7e
feat: Introduce channelCheckpointUpdater to reduce goroutine use in ttNode (#28570)
/kind enhancement

Signed-off-by: wayblink <anyang.wang@zilliz.com>
2023-12-12 13:48:42 +08:00
Enwei Jiao 0e65e90338
enhance: Support otlp with insecure (#29115)
issue: https://github.com/milvus-io/milvus/issues/28914

Signed-off-by: Enwei Jiao <enwei.jiao@zilliz.com>
2023-12-12 11:14:37 +08:00
wayblink 6736f65345
feat: skip some empty ttMsg in Datanode flowgraph (#28756)
/kind feature

Signed-off-by: wayblink <anyang.wang@zilliz.com>
2023-12-07 01:00:37 +08:00
shaoyue 4a067a4c8c
enhance: Add proxy.ginLogSkipPaths (#28945)
Fix #28944

/cc @xiaofan-luan

Signed-off-by: shaoyue.chen <shaoyue.chen@zilliz.com>
2023-12-06 10:26:35 +08:00
aoiasd 3cc4209d26
enhance: pack proxy connection code and support accesslog print SDK_Version (#28835)
relate: https://github.com/milvus-io/milvus/issues/28086
https://github.com/milvus-io/milvus/issues/28940

---------

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2023-12-05 16:50:47 +08:00
shaoyue 8b2b0d412c
enhance: storageType default value change to remote (#28792)
/kind enhancement
/cc @PowderLi

Signed-off-by: shaoyue.chen <shaoyue.chen@zilliz.com>
2023-11-30 11:34:27 +08:00
cqy123456 3b1b14dd78
fix: update binlog index memory usage before loading segments (#28528)
issue: #27678 
When interimIndex = true, the memory prediction should be updated with
the memory usage of the binlog index build process.

Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>
2023-11-29 16:42:27 +08:00
aoiasd 89d8ce2f73
enhance: refine access log to support format access log by yaml and print name info. (#28319)
relate: https://github.com/milvus-io/milvus/issues/28086

---------

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2023-11-28 15:32:31 +08:00
Bingyi Sun 4fedff6d47
feat: integrate storage v2 into the write path (#28440)
#28378

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2023-11-23 17:26:24 +08:00
XuanYang-cn e88bbaac24
feat: Add universal levelzero segment switch (#28483)
See also: #27349

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2023-11-20 19:04:22 +08:00
SimFG cfb6edea61
Support to trace the grpc request (#28349)
Signed-off-by: SimFG <bang.fu@zilliz.com>
2023-11-13 20:16:18 +08:00
XuanYang-cn f8aa46419a
Add LevelZeroCompaction configs (#28190)
See also: #27606

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2023-11-13 11:18:19 +08:00
wei liu 5b45a138b1
disable auto balance when old node exists (#28191)
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-11-07 14:02:20 +08:00
Xiaofan da19e49daf
Support purge old session for standalone (#28184)
Signed-off-by: xiaofan-luan <xiaofan.luan@zilliz.com>
2023-11-06 21:21:42 +08:00
zhenshan.cao b596d8e75b
Change storageType to the default minio (#28140)
Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>
2023-11-03 10:50:17 +08:00
Enwei Jiao 8ae9c947ae
Use OpenDAL to access object store (#25642)
Signed-off-by: Enwei Jiao <enwei.jiao@zilliz.com>
2023-11-01 09:00:14 +08:00
cqy123456 4fbe3c9142
replace loaded binlog with binlog index for search performance (#27673)
Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>
2023-11-01 02:20:15 +08:00
yah01 2af46d7333
Increase the ChunkManager request timeout (#28015)
Signed-off-by: yah01 <yah2er0ne@outlook.com>
2023-10-31 09:06:13 +08:00
aoiasd 53246b1b38
Set accesslog default to close and use stdout (#27891)
Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2023-10-25 10:30:10 +08:00
wei liu 40723a292e
reduce compact parallel task num (#27899)
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-10-25 09:40:12 +08:00
jaime 4640928280
Fix initialization and backward compatibility issue for GRPC compression (#27894)
Signed-off-by: jaime <yun.zhang@zilliz.com>
2023-10-24 22:14:14 +08:00
Sheldon 351c64b606
fix some typos (#27851)
1. fix some typos in md,yaml #22893

Signed-off-by: Sheldon <chuanfeng.liu@zilliz.com>
2023-10-24 09:30:10 +08:00
Xiaofan 2ea7579dbb
Reduce rpc size for GetRecoveryInfoV2 (#27483)
Signed-off-by: xiaofan-luan <xiaofan.luan@zilliz.com>
2023-10-23 21:44:09 +08:00
zhagnlu 6060dd7ea8
Add chunk manager request timeout (#27692)
Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2023-10-23 20:08:08 +08:00
SimFG 9b0ecbdca7
Support to replicate the mq message (#27240)
Signed-off-by: SimFG <bang.fu@zilliz.com>
2023-10-20 14:26:09 +08:00
smellthemoon 4b0ec156b3
Set channel work pool size in datanode (#27728)
Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2023-10-19 08:28:08 +08:00
yihao.dai 106c17f304
Make read ahead policy in ChunkCache configurable (#27291)
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2023-09-28 15:47:27 +08:00
foxspy 5db4a0489e
dynamic index version control (#27335)
Co-authored-by: longjiquan <jiquan.long@zilliz.com>
2023-09-25 21:39:27 +08:00
foxspy 370b6fde58
milvus support multi index engine (#27178)
Co-authored-by: longjiquan <jiquan.long@zilliz.com>
2023-09-22 09:59:26 +08:00
yihao.dai fe01d54eca
Set kafka read timeout to 10s and make it configurable (#27238)
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2023-09-20 22:01:27 +08:00
smellthemoon d1f6825b86
Specify component ip (#27161)
Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2023-09-20 11:49:25 +08:00
chyezh 791e6ef6c6
[Improvement] add pulsar metrics and fix timeout (#26907)
Signed-off-by: chyezh <ye.zhen@zilliz.com>
2023-09-13 12:03:19 +08:00
yiwangdr 337edc321b
tikv integration (#26246)
Signed-off-by: yiwangdr <yiwangdr@gmail.com>
2023-09-07 07:25:14 +08:00
SimFG 28681276e2
Improve the retry of the rpc client (#26795)
Signed-off-by: SimFG <bang.fu@zilliz.com>
2023-09-06 17:43:14 +08:00
wei liu 1097776477
stop heartbeat if reach heartbeat limit (#26728)
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-09-04 17:51:48 +08:00
XuanYang-cn b2e7cbdf4b
Remove TimeTravel in compactor (#26785)
Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2023-09-04 17:41:48 +08:00
zhagnlu 7056e9c0f7
Increase minio log level to avoid unnecessary log (#26776)
Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2023-09-04 17:25:48 +08:00
chyezh 0530fd80c9
[Fixup] remove nats from default (#26791)
Signed-off-by: chyezh <ye.zhen@zilliz.com>
2023-09-04 10:01:04 +08:00
MrPresent-Han 7d5a4b2994
add more event for segcore search(#26277) (#26688)
Signed-off-by: MrPresent-Han <chun.han@zilliz.com>
2023-08-30 14:15:01 +08:00
jaime c603f1c244
Remove mysql metastore (#26633)
Signed-off-by: jaime <yun.zhang@zilliz.com>
2023-08-29 14:36:26 +08:00
congqixia ee7aef9272
Make pulsar request timeout configurable (#26525)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-08-23 20:46:23 +08:00
SimFG d13ca54414
Change the default gracefulStopTimeout and only warn when fail to refresh policy cache (#26450)
Signed-off-by: SimFG <bang.fu@zilliz.com>
2023-08-22 09:04:21 +08:00
wayblink c5a1b41f95
Update session ttl to 60s (#26346)
Signed-off-by: wayblink <anyang.wang@zilliz.com>
2023-08-18 22:52:20 +08:00
wei liu 79ccf06cf6
refine proxy to querynode heartbeat interval (#26426)
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-08-18 14:26:19 +08:00
wei liu 74133a3996
refine retry on grpc (#26360)
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-08-16 19:36:18 +08:00
chyezh 4c4b2903f9
[Fixup] nats log configuration, open nats log at info level by default (#26168)
Signed-off-by: chyezh <ye.zhen@zilliz.com>
2023-08-14 14:21:32 +08:00
yah01 48422dd4c5
Fix spawn too many threads (#26293)
- Lower the thread pool cap
- Limit CGO calls concurrency

Signed-off-by: yah01 <yah2er0ne@outlook.com>
2023-08-11 18:29:29 +08:00
zhagnlu 411f9ac823
Upgrade minio-go and add region and virtual host config for segcore chunk manager (#26194)
Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2023-08-11 10:37:36 +08:00
MrPresent-Han 3421956afa
modify default value for sync max parallel to mitigate oom on dn(#26763) (#26231)
Signed-off-by: MrPresent-Han <chun.han@zilliz.com>
2023-08-10 10:23:15 +08:00
congqixia 767955ec6b
Reduce MQ buffer length and flowgraph wait queue length to 16 (#26179)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-08-09 10:05:14 +08:00
yihao.dai b6effd7345
Disable deny writing when the growing segment size exceeds the watermark (#26163)
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2023-08-08 12:43:07 +08:00
wei liu 49902f1b37
adjust default value of queryNode.enableDisk (#25404)
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-08-04 10:07:06 +08:00
MrPresent-Han 47392b0a0f
support metrics mutex to monitor cost of locks(#26102) (#26103)
Signed-off-by: MrPresent-Han <chun.han@zilliz.com>
2023-08-03 15:03:06 +08:00
MrPresent-Han 5634ba777d
add new threadpool with various priority to avoid deadlock(#25781) (#26028)
Signed-off-by: MrPresent-Han <chun.han@zilliz.com>
2023-08-03 09:31:07 +08:00
zhagnlu 833674c1cb
add glog configurable function and redirect aws log to segcore log (#25664)
Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2023-07-27 19:49:02 +08:00
xige-16 6f18587f35
Fix small segment compaction (#21327)
Signed-off-by: xige-16 <xi.ge@zilliz.com>
2023-07-26 14:49:01 +08:00
chyezh e24a8b3606
[Feature] Add benchmark and retention configuration for nmq (#25768)
Signed-off-by: chyezh <ye.zhen@zilliz.com>
2023-07-25 19:33:01 +08:00
yah01 45b89cfc71
Lower the predicting memory usage factor to 1 (#25884)
Signed-off-by: yah01 <yang.cen@zilliz.com>
2023-07-25 16:37:01 +08:00
MrPresent-Han f4e72cb170
remove sync segmentLastExpire every time when assigning(#25271) (#25316) (#25557)
Signed-off-by: MrPresent-Han <chun.han@zilliz.com>
2023-07-24 14:11:07 +08:00
Bingyi Sun 9f31fc9a31
Add broker timeout config item. (#25855)
Signed-off-by: sunby <sunbingyi1992@gmail.com>
2023-07-24 14:07:00 +08:00
cai.zhang b15e34db21
Add constraint for compaction based indexed segments (#25709)
Signed-off-by: cai.zhang <cai.zhang@zilliz.com>
2023-07-23 21:31:00 +08:00
aoiasd 3545b1a608
Refine rocksmq (#25031)
Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2023-07-18 20:18:57 +08:00
congqixia 8d343bf75a
Make compaction rpc timeout and parallel maximum configurable (#25672)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-07-18 14:25:20 +08:00
Xiaofan dbf0130803
Faster garbage collect on compacted data (#25088)
Signed-off-by: xiaofan-luan <xiaofan.luan@zilliz.com>
2023-07-12 10:14:29 +08:00
yah01 8b06941da3
Reduce the load memory usage predict factor to 2 (#25469)
Signed-off-by: yah01 <yang.cen@zilliz.com>
2023-07-11 13:06:28 +08:00
smellthemoon d63323d117
Add rate limit and deny write in upsert (#25351)
Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2023-07-11 11:20:34 +08:00
wei liu 342cfcad46
decrease default value of group max nq (#25380)
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-07-07 15:40:25 +08:00
congqixia efdd71c640
Make cgo pool size larger than worker pool size (#25318)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-07-05 16:56:26 +08:00
foxspy 31173727b2
growing segment index memory opt & get vector bugfix (#25272)
Signed-off-by: xianliang <xianliang.li@zilliz.com>
2023-07-05 00:04:25 +08:00
chyezh d7d61f529c
[Feature|Pick] enable scheduler policy and add user-task-polling policy (#24839)
Signed-off-by: chyezh <ye.zhen@zilliz.com>
2023-07-03 18:24:25 +08:00
smellthemoon b30517d303
Enlarge timeout to prevent health check failure (#25173)
Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2023-06-29 17:10:23 +08:00
jaime 18df2ba6fd
[Cherry-Pick] Support Database (#24769)
Support Database(#23742)
Fix db nonexists error for FlushAll (#24222)
Fix check collection limits fails (#24235)
backward compatibility with empty DB name (#24317)
Fix GetFlushAllState with DB (#24347)
Remove db from global meta cache after drop database (#24474)
Fix db name is empty for describe collection response (#24603)
Add RBAC for Database API (#24653)
Fix miss load the same name collection during recover stage (#24941)

RBAC supports Database validation (#23609)
Fix to list grant with db return empty (#23922)
Optimize PrivilegeAll permission check (#23972)
Add the default db value for the rbac request (#24307)

Signed-off-by: jaime <yun.zhang@zilliz.com>
Co-authored-by: SimFG <bang.fu@zilliz.com>
Co-authored-by: longjiquan <jiquan.long@zilliz.com>
2023-06-25 17:20:43 +08:00
Xiaofan 72c5e2a41a
Fix channel reassigned to other datanodes (#25015)
Signed-off-by: xiaofan-luan <xiaofan.luan@zilliz.com>
2023-06-21 21:26:42 +08:00
cai.zhang c9e456c6eb
Remove metric_type check and fix some minor bugs (#24921)
Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>
Co-authored-by: zhenshan.cao <zhenshan.cao@zilliz.com>
2023-06-19 09:54:41 +08:00
yihao.dai c73219a54d
Limit the number of concurrent sync tasks and allow only one sync task for the same segment (#24881)
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2023-06-16 14:14:39 +08:00
MrPresent-Han ce5cb3c0c5
[skip e2e] modify default used balancer for 2.3 (#24937)
Signed-off-by: MrPresent-Han <chun.han@zilliz.com>
2023-06-16 14:00:45 +08:00
wayblink bfae6b49af
Remove datanode timetick mq, use rpc to report instead (#23156)
Signed-off-by: wayblink <anyang.wang@zilliz.com>
2023-06-14 14:16:38 +08:00
chyezh f97127ae55
add nats mq wrappers (#24445)
bug fixup, configurable natsmq, add unittest, pass e2e.

move natsmq to pkg project

Signed-off-by: chyezh <ye.zhen@zilliz.com>
Co-authored-by: yiwangdr <yiwangdr@gmail.com>
2023-06-07 10:00:37 +08:00
Jiquan Long 29ae1229b6
Support AutoIndex (#24387)
Signed-off-by: longjiquan <jiquan.long@zilliz.com>
2023-05-29 20:35:28 +08:00
wei liu 1deac47069
reduce grpc client dial timeout (#24427)
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-05-29 10:03:28 +08:00
Bingyi Sun 46a8ca5b5b
Change log level to info (#24339)
Signed-off-by: sunby <bingyi.sun@zilliz.com>
Co-authored-by: sunby <bingyi.sun@zilliz.com>
2023-05-24 10:15:25 +08:00
MrPresent-Han 7744573d3d
support params for import maxfilesize(#24191) (#24192)
Signed-off-by: MrPresent-Han <chun.han@zilliz.com>
2023-05-18 18:59:27 +08:00
yihao.dai 7384d83d2c
Support rate limit based on growing segments size (#24121)
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2023-05-17 09:57:22 +08:00
wei liu 4c956fab73
enable config collection level rate limit (#24012)
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-05-12 18:13:26 +08:00
MrPresent-Han b517bc9e6a
refine balance mechanism including:(#23454) (#23763) (#23791)
1. balance granularity to replica to avoid influencing unrelated replicas
2. avoid balance back and forth

Signed-off-by: MrPresent-Han <jamesharden11122@gmail.com>
2023-05-04 12:22:40 +08:00
yihao.dai 8c060215e9
Add collection number quota per DB (#23656)
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2023-04-28 11:02:35 +08:00
smellthemoon 45fbe1d1a7
Change proxy max shard num (#23777)
Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2023-04-27 22:20:35 +08:00
yah01 eab94489ba
Refine the merge algorithm (#23767)
Signed-off-by: yah01 <yang.cen@zilliz.com>
2023-04-27 18:26:35 +08:00
Bingyi Sun f289aed63a
Fix nightly tests timeout (#23751)
Signed-off-by: sunby <bingyi.sun@zilliz.com>
Co-authored-by: sunby <bingyi.sun@zilliz.com>
2023-04-27 14:56:35 +08:00
smellthemoon 912cf4ef0f
Change some configurations, include change the defaultChannelNum to 16 (#23617)
Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2023-04-27 14:26:35 +08:00
yihao.dai ed8836cd15
Add disk quota at the collection level (#23704)
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2023-04-26 21:52:36 +08:00
SimFG 5cd21893c8
Fix superusers' password verification problem (#23733)
Signed-off-by: SimFG <bang.fu@zilliz.com>
2023-04-26 21:16:34 +08:00
foxspy 6f4ed517de
add growing segment index (#23615)
Signed-off-by: xianliang <xianliang.li@zilliz.com>
2023-04-26 10:14:41 +08:00