Commit Graph

2114 Commits (master)

Author SHA1 Message Date
Spade A 8e1ce15146
fix: ngram index is mistakenly used for unsopported operations (#43955)
issue: https://github.com/milvus-io/milvus/issues/43917

1. fix ngrma index to be mistakenly used for unsopported operation
2. fix potential uaf problem

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-08-21 14:41:46 +08:00
zhagnlu d904c4e677
enhance: optimize compare expr performance for pk field (#43154)
#43153

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-08-21 10:59:46 +08:00
congqixia 7963b17ac1
fix: Revert "fix: Use `folly::SharedMutex` preventing starvation (#43937)" (#43959)
Related to #43958

This reverts commit 580350495a.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-08-21 10:09:47 +08:00
Spade A d6a428e880
feat: impl StructArray -- support create index for vector array (embedding list) and search on it (#43726)
Ref https://github.com/milvus-io/milvus/issues/42148

This PR supports create index for vector array (now, only for
`DataType.FLOAT_VECTOR`) and search on it.
The index type supported in this PR is `EMB_LIST_HNSW` and the metric
type is `MAX_SIM` only.

The way to use it:
```python
milvus_client = MilvusClient("xxx:19530")
schema = milvus_client.create_schema(enable_dynamic_field=True, auto_id=True)
...
struct_schema = milvus_client.create_struct_array_field_schema("struct_array_field")
...
struct_schema.add_field("struct_float_vec", DataType.ARRAY_OF_VECTOR, element_type=DataType.FLOAT_VECTOR, dim=128, max_capacity=1000)
...
schema.add_struct_array_field(struct_schema)
index_params = milvus_client.prepare_index_params()
index_params.add_index(field_name="struct_float_vec", index_type="EMB_LIST_HNSW", metric_type="MAX_SIM", index_params={"nlist": 128})
...
milvus_client.create_index(COLLECTION_NAME, schema=schema, index_params=index_params)
```

Note: This PR uses `Lims` to convey offsets of the vector array to
knowhere where vectors of multiple vector arrays are concatenated and we
need offsets to specify which vectors belong to which vector array.

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>
2025-08-20 10:27:46 +08:00
Alexander Guzhva cfdb17a088
enhance: Fix ArithHelperI64 for SVE in bitset (#43952)
missing ArithHelperI64<ArithOpType::Div, CmpOp>

Signed-off-by: Alexandr Guzhva <alexanderguzhva@gmail.com>
2025-08-19 22:48:58 +08:00
Alexander Guzhva e179a5635f
enhance: remove duplicate code in ArithHelperF32 in SVE for bitset (#43950)
fixes a problem of https://github.com/milvus-io/milvus/pull/43949

Signed-off-by: Alexandr Guzhva <alexanderguzhva@gmail.com>
2025-08-19 22:35:47 +08:00
liliu-z 7dd2b103b0
enhance: Fix template declaration order for ArithHelperF32 in SVE implementation (#43949)
Signed-off-by: Li Liu <li.liu@zilliz.com>
2025-08-19 21:58:22 +08:00
congqixia 580350495a
fix: Use `folly::SharedMutex` preventing starvation (#43937)
Related to #43936

This PR:
- Use `folly::SharedMutex` instead of `std::shared_mutex` preventing
starvation
- Use `folly::SharedMutex::WriteHolder/ReadHolder` instead of
std::shared_lock and std::unique_lock to get better performance

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-08-19 20:05:46 +08:00
aoiasd dcf04a58b8
feat: support use score function on segment search and use filter (#43868)
relate: https://github.com/milvus-io/milvus/issues/43867
Support boost function score, multiply by the weight if match filter.

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-08-19 16:15:45 +08:00
Gao b602b4187d
enhance: upgrade aws-sdk from 1.9.234 to 1.11.352 (#43916)
issue: #43908

Signed-off-by: chasingegg <chao.gao@zilliz.com>
2025-08-19 11:11:45 +08:00
yihao.dai 64ab3d2681
enhance: Improve error message when query vector and dim mismatch (#43835)
/kind improvement

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-08-18 01:07:44 +08:00
foxspy 647c2bca2d
enhance: Support streaming read and write of vector index files (#43824)
issue: #42032

Signed-off-by: xianliang.li <xianliang.li@zilliz.com>
2025-08-15 23:41:43 +08:00
Alexander Guzhva ebb10dfae0
enhance: better auto-detect of SVE options for the bitset library (#43833)
Enables the compilation of SVE code for the bitset library if a C++
compiler supports it.

There are two conditions for enabling the SVE code
* a C++ compiler needs to have a `-march=armv8-a+sve`
* `arm_sve.h` header must be available

AFAIK, `gcc 7 does not support SVE`, `gcc 8` and `gcc 9` support SVE,
but have no `arm_sve.h` file, and only `gcc 10` has both.

Signed-off-by: Alexandr Guzhva <alexanderguzhva@gmail.com>
2025-08-15 22:37:44 +08:00
sthuang 5e4eb4a6e0
enhance: [StorageV2] bump storage version (#43871)
related: https://github.com/milvus-io/milvus/issues/43869

bump storage version. include the following feature:
* https://github.com/milvus-io/milvus-storage/pull/231
* https://github.com/milvus-io/milvus-storage/pull/232
* https://github.com/milvus-io/milvus-storage/pull/233

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-08-15 17:59:43 +08:00
sthuang c102fa8b0b
enhance: [StorageV2] zero copy for packed writer record batch (#43779)
The Out of Memory (OOM) error occurs because a handler retains the
entire ImportRecordBatch in memory. Consequently, even when child arrays
within the batch are flushed, the memory for the complete batch is not
released. We temporarily fixed by deep copying record batch in #43724.

The proposed fix is to split the RecordBatch into smaller sub-batches by
column group. These sub-batches will be transferred via CGO, then
reassembled before being written to storage using the Storage V2 API.
Thus we can achieve zero-copy and only transferring references in CGO.

related: #43310

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-08-15 10:11:44 +08:00
congqixia f032044125
enhance: Refine segcore param change callback (#43838)
Related to #43230

This PR
- Move segcore setup function to `initcore` package to remove cgo
dependency from pkg
- Register core callback only for components depends on segcore
- Rectify `UpdateLogLevel` implementation

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-08-13 19:31:44 +08:00
zhagnlu b7c7df9440
fix: fix delete consumer bug for cocurrency R-W (#43831)
#41570

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-08-12 11:37:42 +08:00
Gao 81a0915c29
enhance: add milvus-common module to decouple knwhere & segcore (#43624)
issue: https://github.com/milvus-io/milvus/issues/42032
https://github.com/milvus-io/milvus/issues/41435

based on pr: https://github.com/milvus-io/milvus/pull/42124

---------

Signed-off-by: chasingegg <chao.gao@zilliz.com>
Co-authored-by: xianliang.li <xianliang.li@zilliz.com>
2025-08-11 14:09:42 +08:00
zhagnlu 5b83975d39
enhance:convert multi not equal to not in (#43690)
#43689

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-08-08 10:37:40 +08:00
sparknack 169be30a76
enhance: cachinglayer: reserve resource for inevictable cachecell (#43602)
issue: #41435

---------

Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>
2025-08-08 10:35:49 +08:00
zhagnlu c04d678ad4
enhance: make segcore params effective without restarting milvus (#43231)
#43230

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-08-08 10:33:48 +08:00
congqixia 1561a4ae8c
enhance: [StorageV2] Avoid create local parent dir if fs remote (#43790)
Related to #43752
milvus-storage pr: milvus-io/milvus-storage#230

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-08-08 10:19:40 +08:00
congqixia b6199acb05
enhance: Utilize `search_batch_pks` for `search_ids` of PkTerm (#43751)
Related to #43660

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-08-07 14:19:40 +08:00
congqixia d414f6bd4d
enhance: Add assertion preventing reload same field (#43736)
Related to #43725

This patch add assertion preventing segment reloading same field column.
Also improve the message info when pk already exists.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-08-05 19:35:39 +08:00
yihao.dai cb7be8885d
enhance: Deep copy arraw array (#43724)
Deep copy arrow array and make a new RecordBatch with the copied array.

issue: https://github.com/milvus-io/milvus/issues/43310

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-08-05 00:31:38 +08:00
Chun Han d826d6ac91
fix: try to get span raw data for variable length data type(#43544) (#43705)
related: #43544

Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
2025-08-04 11:15:38 +08:00
aoiasd 4f02b06abc
enhance: support set lindera dict build dir and download url in yaml (#43541)
relate: https://github.com/milvus-io/milvus/issues/43120

---------

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-08-04 09:47:38 +08:00
congqixia 4aff581007
enhance: Pass callback in search batch pks to void large result (#43695)
Related to #43660

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-08-02 17:57:37 +08:00
Buqian Zheng 01baf582d5
fix: GroupChunkTranslator to correctly identify vector field (#43706)
issue: #43653

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2025-08-02 00:49:37 +08:00
Bingyi Sun b59bc5e2c0
fix: make json path index non exists offsets compatible with 2.5 (#43691)
issue: https://github.com/milvus-io/milvus/issues/43666

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-08-01 23:22:23 +08:00
Buqian Zheng b0226ef47c
fix: added more comprehensive container limit detection (#43693)
issue: #41435

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2025-08-01 20:37:37 +08:00
Xianhui Lin 0f0edff7f0
fix: increment offset for null data rows in JsonKeyStats (#43679)
fix: increment offset for null data rows in JsonKeyStatsInvertedIndex
issue: https://github.com/milvus-io/milvus/issues/43151

Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com>
2025-08-01 15:53:37 +08:00
congqixia 5f2f4eb3d6
enhance: Ignore entry with same ts when DeleteRecord search pks (#43669)
Related to #43660

This patch reduces the unwanted offset&ts entries having same timestamp
of delete record. Under large amount of upsert, this false hit could
increase large amount of memory usage while applying delete.

The next step could be passing a callback to `search_pk_func_` to handle
hit entry streamingly.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-08-01 10:15:36 +08:00
Ted Xu e37cd19da2
enhance: enable storage v2 by default (#43652)
Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2025-08-01 08:59:36 +08:00
zhagnlu 239f743a18
fix: add enable_mmap key to load config (#43672)
#43670

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-07-31 21:35:37 +08:00
sparknack 4aabe23a45
enhance: update flat_hash_map.hpp to a modified version (#43506)
issue: #41435

Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>
2025-07-31 20:09:36 +08:00
congqixia f29964bd17
fix: Add padding for sorted index preventing 0 length mmap (#43663)
Related to #43655

This patch add a padding when writing mmap file for ScalarSortedIndex in
case of mmap falure due to 0 mmap length.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-07-31 18:53:36 +08:00
zhagnlu 708e426bb3
enhance: using set element for string term type (#43049)
issue: #43048

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-07-31 10:35:37 +08:00
zhagnlu 31801f5937
fix: fix pk in [..] skip next batch when using multi-chunk segment (#43618)
#43494

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-07-31 10:15:37 +08:00
congqixia 089f02bcca
fix: [StorageV2] Align null bitmap offset for fixed-length datatype (#43654)
Related to #43626

Similar to previous pr #43321, null bitmap could be dislocated if the
bitset ptr does not count the offset of array

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-07-31 09:55:36 +08:00
congqixia 6a74a7de66
enhance: Make DeleteRecord search pks by batch and PinAll (#43640)
Related to #43592

When delete records are large, search pk one by one will result into
many `Pincells` call which creates lots of futures.

This patch make search pk execute in batch to reduce this cost.

Also add `GetAllChunks` API to utilize `PinAllCells` to reduce pins.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-07-30 19:15:36 +08:00
sthuang a2c7ed2780
fix: [StorageV2] sort field binlogs paths for packed reader and writer (#43585)
key changes:
* fix unstable storage v2 compaction unit test by guaranteeing the order
of paths during sync.
* bump milvus-storage version, include
https://github.com/milvus-io/milvus-storage/pull/222
https://github.com/milvus-io/milvus-storage/pull/223
https://github.com/milvus-io/milvus-storage/pull/224
https://github.com/milvus-io/milvus-storage/pull/225
https://github.com/milvus-io/milvus-storage/pull/226
* Also fix the below related oom issue.
related: https://github.com/milvus-io/milvus/issues/43310

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-07-30 08:09:36 +08:00
congqixia 4fe55e3008
fix: [StorageV2] Use separate channel for `get_cells` (#43632)
Related to #43584

There might be concurrent calls on `translator.get_cells`. The channel
cannot be shared among these calls, otherwise the logic will break.

This patch create new channel for each `get_cells` invocation in case of
data race.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-07-29 20:59:38 +08:00
foxspy d57890449f
enhance: update knowhere version (#43528)
issue: #42937

Signed-off-by: xianliang.li <xianliang.li@zilliz.com>
2025-07-29 17:21:36 +08:00
Buqian Zheng 052fb6c562
feat: add time based eviction to data managed by cachinglayer (#43490)
issue: https://github.com/milvus-io/milvus/issues/41435

also added disk capacity protection

---------

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2025-07-29 16:17:35 +08:00
Bingyi Sun a765cd1eaa
enhance: unlink mmap file when chunk and index are destructed (#43524)
issue: https://github.com/milvus-io/milvus/issues/41636

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-07-29 16:05:36 +08:00
congqixia 268f1cdace
fix: Hold field shared_ptr in case of being released (#43614)
Related to #43584

Directly accessing `fields_` in `get_raw_data` may have race if load vec
index happens concurrently during getting raw data.

This PR make `bulk_subscript` hold shared_ptr of field column prevent
field column being release during reading it.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-07-29 12:15:36 +08:00
aoiasd c9412434c8
enhance: add char group tokenizer (#42793)
relate: https://github.com/milvus-io/milvus/issues/42792
Add char group tokenizer which support use costum char group or use some
build-in char group as delimiters.

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-07-29 11:11:35 +08:00
congqixia f666d89919
fix: [StorageV2] Access future result to get exception if any (#43613)
Related to #43584

When `LoadWithStrategy` throw exception, the ex was wrapped in the
returned future. If the future is not handled, this exception would be
ignored.

This patch add `future.get()` to get exception if any.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-07-28 22:33:35 +08:00
Spade A 864d1b93b1
enhance: enable stlsort with mmap support (#43359)
issue: https://github.com/milvus-io/milvus/issues/43358

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-07-28 15:32:55 +08:00