722 Commits (master)

Author SHA1 Message Date
congqixia cc42d49769
fix: [StorageV2][AddField] Handle lack binlog rows in storage v2 (#42186)
Related to #39173 #39718

In storage v2, `lack_bin_rows` cannot be used because a field id is not a
column group id, so the two will never match.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-05-31 02:44:30 +08:00
Chun Han ed0df38605
enhance: resize high priority wqthreadpool dynamically(#40838) (#41549) (#41929)
related: #40838
pr: https://github.com/milvus-io/milvus/pull/41549

Signed-off-by: MrPresent-Han <chun.han@gmail.com>
2025-05-30 10:18:36 +08:00
cqy123456 5fe7015f63
enhance: InterimIndex support more index type and data type (#41021)
issue: https://github.com/milvus-io/milvus/issues/27678
cherry pick from : https://github.com/milvus-io/milvus/pull/39180,
https://github.com/milvus-io/milvus/pull/40429

Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>
2025-05-28 08:40:28 +08:00
Xianhui Lin 6a0e182e13
enhance: support TTL expiration with queries returning no results (#42086)
support TTL expiration with queries returning no results
issue:https://github.com/milvus-io/milvus/issues/41959

Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com>
2025-05-27 18:28:27 +08:00
Buqian Zheng 2e3539319d
feat: vector field raw data to mmap by default (#41975)
issue: https://github.com/milvus-io/milvus/issues/41435

should address https://github.com/milvus-io/milvus/issues/41774

this PR also: 
* added caching layer memory overhead metric
* re-enable TextMatch.GrowingLoadData test

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2025-05-22 11:56:25 +08:00
foxspy 3dbad0306a
fix: Add bypass thread pool mode to avoid growing indexes blocking insert/load (#41012)
issue: #40825

Signed-off-by: xianliang.li <xianliang.li@zilliz.com>
2025-05-20 14:30:24 +08:00
congqixia f2a8330f87
fix: [StorageV2] Use correct group building index (#41925)
Related to #39173 #41534

This PR fixes an issue where building a mem index may report a datatype
mismatch error when the collection splits fields into multiple groups.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-05-20 13:26:23 +08:00
Buqian Zheng b0260d8676
feat: manual evict cache after built interim index (#41836)
issue: https://github.com/milvus-io/milvus/issues/41435

This PR also makes HasRawData of ChunkedSegmentSealedImpl return its answer
based on metadata, without needing to load the cache just to answer this
simple question.

---------

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2025-05-16 16:34:23 +08:00
Buqian Zheng cae0091071
feat: make SkipIndex lazyload (#41826)
issue: https://github.com/milvus-io/milvus/issues/41435

this PR also:

1. fixed the skip index for VARCHAR. before this PR, skip index of
VARCHAR uses the minmax of the entire column as the minmax of chunk 0,
and provides no minmax for other chunks.
2. refactored some skip index loading related code
3. partly fixed a bug in test_expr.cpp
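
To illustrate item 1, a per-chunk min/max skip index can be sketched as follows. This is a minimal sketch and not the Milvus implementation: `ChunkMinMax`, `BuildSkipIndex`, and `CanSkipChunkEq` are hypothetical names, and chunks are modeled as plain vectors of strings.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical per-chunk statistics: one min/max pair per chunk.
struct ChunkMinMax {
    std::string min;
    std::string max;
};

// Build one min/max entry per chunk, rather than storing the whole-column
// min/max on chunk 0 only. Each chunk is assumed to be non-empty.
std::vector<ChunkMinMax>
BuildSkipIndex(const std::vector<std::vector<std::string>>& chunks) {
    std::vector<ChunkMinMax> index;
    index.reserve(chunks.size());
    for (const auto& chunk : chunks) {
        assert(!chunk.empty());
        auto range = std::minmax_element(chunk.begin(), chunk.end());
        index.push_back(ChunkMinMax{*range.first, *range.second});
    }
    return index;
}

// An equality predicate can skip a chunk when the value lies outside
// that chunk's [min, max] range.
bool
CanSkipChunkEq(const ChunkMinMax& stats, const std::string& value) {
    return value < stats.min || value > stats.max;
}
```

With one entry per chunk, a lookup for a value that falls outside a chunk's range can skip that chunk without reading its rows.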

---------

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2025-05-15 01:30:23 +08:00
cai.zhang 4ead8caaba
fix: prevent crash when contains_all/any is used with empty array (#41739)
issue: #41348 

related and optimized by #41347

---------

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
Co-authored-by: Sangho Park <hoyaspark@gmail.com>
2025-05-14 14:32:22 +08:00
zhagnlu f094d026f8
fix: add params to ignore config type exception (#41776)
#41707

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-05-13 13:48:56 +08:00
Buqian Zheng ff5c2770e5
feat: cachinglayer: various improvements (#41546)
issue: https://github.com/milvus-io/milvus/issues/41435

this PR is based on https://github.com/milvus-io/milvus/pull/41436. 

Improvements include:

- Lazy Load support for Storage v1
- Use Low/High watermark to control eviction
- Caching Layer related config changes
- Removed ChunkCache related configs and code in golang
- Add `PinAllCells` helper method to CacheSlot class
- Modified ValueAt, RawAt, PrimitiveRawAt to Bulk version, to reduce
caching layer overhead
- Removed some unclear templated bulk_subscript methods
- CachedSearchIterator to store PinWrapper when searching on
ChunkedColumn, and removed unused constructor.
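
The commit message does not describe the watermark policy itself. One common reading of low/high watermark eviction is sketched below, assuming LRU order and byte-based accounting; `WatermarkCache` and its members are hypothetical and do not reflect the caching layer's real classes.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <utility>

// Hypothetical LRU-ordered cache accounting; not the Milvus caching layer.
class WatermarkCache {
 public:
    WatermarkCache(std::size_t low, std::size_t high)
        : low_(low), high_(high) {
        assert(low <= high);
    }

    // Record a newly loaded cell. Once usage exceeds the high watermark,
    // evict the oldest cells until usage is at or below the low watermark.
    // Returns the number of cells evicted.
    std::size_t
    Insert(int cell_id, std::size_t bytes) {
        cells_.emplace_back(cell_id, bytes);
        used_ += bytes;
        std::size_t evicted = 0;
        if (used_ > high_) {
            while (used_ > low_ && !cells_.empty()) {
                used_ -= cells_.front().second;
                cells_.pop_front();
                ++evicted;
            }
        }
        return evicted;
    }

    std::size_t
    used() const {
        return used_;
    }

 private:
    std::size_t low_;
    std::size_t high_;
    std::size_t used_ = 0;
    std::list<std::pair<int, std::size_t>> cells_;  // front = oldest cell
};
```

The gap between the two watermarks means eviction runs in occasional bursts rather than on every insert once the cache is nearly full.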

---------

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2025-05-10 09:19:16 +08:00
zhagnlu f674e232b9
fix: GetValueFromConfig return nullopt instead of exception for null value (#41709)
#41707

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-05-09 11:18:53 +08:00
zhagnlu 39e7ad33d7
enhance: add optimize for like expr (#41066)
#41065

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-05-08 14:28:52 +08:00
foxspy e2ddbe4962
feat: add cachinglayer to index (#41653)
issue: #41435

Signed-off-by: xianliang.li <xianliang.li@zilliz.com>
2025-05-08 10:12:54 +08:00
Bingyi Sun 4c08090687
feat: Add json index support for json contains expr (#41478)
issue: #35528

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-05-06 11:44:52 +08:00
sthuang e9442f575d
feat: storage v2 seal segment load (#41567)
Storage v2 chunked sealed segment loading is based on the caching layer. A
cell unit in storage v2 is a parquet row group in remote object storage,
containing all fields. Therefore, each field needs a proxy to perform
single-field operations.

related: #39173

---------

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-04-30 14:22:58 +08:00
sthuang 6c377b6e86
feat: Storage v2 index and stats raw data (#41534)
related: #39173

---------

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-04-30 08:48:54 +08:00
Buqian Zheng 3de904c7ea
feat: add cachinglayer to sealed segment (#41436)
issue: https://github.com/milvus-io/milvus/issues/41435

---------

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2025-04-28 10:52:40 +08:00
congqixia b5443ddbd0
enhance: [AddField] Reopen loaded segments after AddField (#41529)
Related to #39718

This PR:
- Add reopen logic for growing & sealed segments
- Lazy reopen when schema version increases
- Add FinishLoad api for loading progress

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-04-26 08:48:39 +08:00
Buqian Zheng 1c8b9c127d
fix: Make sure segment in ut is destroyed before static MmapManager singleton (#41508)
issue: #41507

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2025-04-25 18:50:38 +08:00
congqixia b36c88f3c8
enhance: [AddField] Broadcast schema change via WAL (#41373)
Related to #39718

Add Broadcast logic for collection schema change and notifies:
- Streamnode - Delegator
- Streamnode - Flush component
- QueryNodes via grpc

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-04-22 16:28:37 +08:00
Spade A 5b1430f27e
enhance: tantivy collector set bitset directly (#39748)
fix: #39755

The following shows a simple benchmark that inserts 1M docs where all
rows are "hello". Latency is measured at the segcore level on a 9900K CPU:
master: 2.62ms
this PR: 2.11ms

Benchmark code:

```
TEST(TextMatch, TestPerf) {
    auto schema = GenTestSchema({}, true);
    auto seg = CreateSealedSegment(schema, empty_index_meta);
    int64_t N = 1000000;
    uint64_t seed = 19190504;
    auto raw_data = DataGen(schema, N, seed);
    auto str_col = raw_data.raw_->mutable_fields_data()
                       ->at(1)
                       .mutable_scalars()
                       ->mutable_string_data()
                       ->mutable_data();
    for (int64_t i = 0; i < N - 1; i++) {
        str_col->at(i) = "hello";
    }
    SealedLoadFieldData(raw_data, *seg);
    seg->CreateTextIndex(FieldId(101));

    auto now = std::chrono::high_resolution_clock::now();
    auto expr = GetMatchExpr(schema, "hello", OpType::TextMatch);
    auto final = ExecuteQueryExpr(expr, seg.get(), N, MAX_TIMESTAMP);
    auto end = std::chrono::high_resolution_clock::now();
    auto duration =
        std::chrono::duration_cast<std::chrono::microseconds>(end - now);
    // duration is measured in microseconds, so label it accordingly
    std::cout << "TextMatch query time: " << duration.count() << "us"
              << std::endl;
}
```

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-04-20 23:02:41 +08:00
Ted Xu d50781c8cc
enhance: support nullable group by keys (#41313)
See #36264

---------

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2025-04-18 10:08:34 +08:00
Spade A 62293cb582
fix: revert batch add (#41374)
issue: #41375

todo: fix the problems described in the issue.

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-04-17 22:32:38 +08:00
sthuang 1f1c836fb9
feat: Storage v2 growing segment load (#41001)
Support parallel loading of sealed and growing segments in storage v2
format by asynchronously reading row groups.
related: #39173

---------

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-04-16 17:14:33 +08:00
Bingyi Sun a953eaeaf0
enhance: support binary range expression for json path index (#41025)
issue: #35528

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-04-15 19:32:33 +08:00
Chun Han 59b14d38f5
enhance: Optimize index format for improved load performance(#40838) (#40839)
related: https://github.com/milvus-io/milvus/issues/40838

Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
2025-04-15 03:10:30 +08:00
Bingyi Sun bf617115ca
enhance: Remove single chunk segment related codes (#39249)
https://github.com/milvus-io/milvus/issues/39112

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-04-11 18:56:29 +08:00
Xianhui Lin 3bc24c264f
enhance: Add json key inverted index in stats for optimization (#38039)
Add json key inverted index in stats for optimization
https://github.com/milvus-io/milvus/issues/36995

---------

Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-04-10 15:20:28 +08:00
Spade A e9fa30f462
fix: remove single segment logic in V7 (#41159)
Ref: https://github.com/milvus-io/milvus/issues/40823

It does not make sense to create a single-segment tantivy index for
old versions such as 2.4 using tantivy V7, so this PR removes the
relevant code.

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-04-09 19:54:27 +08:00
sthuang 50e02e3598
enhance: update packed reader api (#41055)
related: https://github.com/milvus-io/milvus/issues/39173

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-04-09 10:18:26 +08:00
Spade A c6a0c2ab64
enhance: process tantivy document add by batch (#40124)
issue: https://github.com/milvus-io/milvus/issues/40006

This PR makes tantivy add documents in batches. Adding documents in batches
can greatly reduce the latency of scheduling the add operation (calling
tantivy `add_document` only schedules the add operation and returns
immediately after scheduling), because each call involves a tokio
block_on, which is relatively heavy.

Reducing the scheduling cost does not necessarily reduce the overall latency
if the index writer threads do not process indexing quickly enough.
But if scheduling itself is slow, overall performance can still be limited
even when the index writer threads process indexing very fast (for example,
with more threads).

The following code benchmarks the PR (note that the duration only covers
scheduling, without commit):
```
fn test_performance() {
    let field_name = "text";
    let dir = TempDir::new().unwrap();
    let mut index_wrapper = IndexWriterWrapper::create_text_writer(
        field_name,
        dir.path().to_str().unwrap(),
        "default",
        "",
        1,
        50_000_000,
        false,
        TantivyIndexVersion::V7,
    )
    .unwrap();

    let mut batch = vec![];
    for i in 0..1_000_000 {
        batch.push(format!("hello{:04}", i));
    }
    let batch_ref = batch.iter().map(|s| s.as_str()).collect::<Vec<_>>();

    let now = std::time::Instant::now();
    index_wrapper
        .add_data_by_batch(&batch_ref, Some(0))
        .unwrap();
    let elapsed = now.elapsed();
    println!("add_data_by_batch elapsed: {:?}", elapsed);
}
```
Latency roughly reduces from 1.4s to 558ms.

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-04-08 19:50:24 +08:00
Bingyi Sun da21640ac3
fix: Fix the bug that null data can not be filtered by null expr (#41124)
issue: https://github.com/milvus-io/milvus/issues/41063

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-04-08 19:12:24 +08:00
Bingyi Sun 355f62d6c9
fix: Align brute force search with json index for exists expr (#41116)
issue: #35528

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-04-07 15:42:23 +08:00
zhagnlu 0a378dc308
fix:fix format error for json (#41026)
#40963

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-04-07 10:22:22 +08:00
Bingyi Sun fcb03b5bd1
feat: add json null/exists expression (#41004)
issue: #35528

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-04-03 17:48:21 +08:00
Spade A f552ec67dd
fix: support building tantivy index with low version(5) (#40822)
fix: https://github.com/milvus-io/milvus/issues/40823
To solve the problem in the issue, we have to support building the tantivy
index with a low version for query nodes that run a low tantivy version.

This PR does two things:
1. refactors the IndexWriterWrapper code to make it concise
2. enables IndexWriterWrapper to build the tantivy index with different
tantivy crates

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-04-02 18:46:20 +08:00
smellthemoon cb1e86e17c
enhance: support add field (#39800)
After this PR is merged, insert, upsert, build index, query, and search
are supported on the added field.
These operations are available on the added field only after the add field
request completes, which is a synchronous operation.

Compaction will be supported in the next PR.
#39718

---------

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2025-04-02 14:24:31 +08:00
Bingyi Sun 27ff3a42e7
enhance: Record simdjson error (#41003)
issue: #35528

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-03-31 17:56:19 +08:00
Bingyi Sun 9676365af9
fix: Fix json index not equal filter (#40647)
issue: #35528

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-03-27 23:06:23 +08:00
zhagnlu 7fdb2e144f
enhance:change multi or expr to in expr (#40757)
#40752

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-03-25 11:06:18 +08:00
aoiasd 92bdf7a0c1
enhance: support run analyzer return detailed token (#40458)
relate: https://github.com/milvus-io/milvus/issues/39705

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-03-19 15:48:15 +08:00
zhagnlu 6c55db44f1
enhance: reorder sub expr for conjunct expr (#39872)
Two points:
(1) Reorder the sub-expressions of a conjunct expr to postpone heavy operations.
Sequence: int(column) -> index(column) -> string(column) -> light conjunct
...... -> json(column) -> heavy conjunct -> two_column_compare
(2) Support pre-filtering during expr execution: skip scanning raw data that
has already been filtered out by the result of a preceding expr.
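
The two points can be sketched as follows. This is a minimal sketch under stated assumptions, not the Milvus executor: `ExprKind`, `ReorderConjuncts`, `RowPredicate`, and `EvalConjunct` are hypothetical names, the cost classes simply follow the sequence listed above, and predicates are modeled as per-row callbacks.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical cost classes, ordered from cheapest to most expensive,
// following the sequence in the commit message.
enum class ExprKind {
    IntColumn,
    IndexedColumn,
    StringColumn,
    LightConjunct,
    JsonColumn,
    HeavyConjunct,
    TwoColumnCompare,
};

// (1) Stable sort keeps the original order among sub-expressions of equal cost.
void
ReorderConjuncts(std::vector<ExprKind>& sub_exprs) {
    std::stable_sort(sub_exprs.begin(), sub_exprs.end(),
                     [](ExprKind a, ExprKind b) {
                         return static_cast<int>(a) < static_cast<int>(b);
                     });
}

using RowPredicate = std::function<bool(std::size_t)>;

// (2) Evaluate AND-ed predicates in order. A row already rejected by an
// earlier predicate is never passed to a later one.
std::vector<bool>
EvalConjunct(const std::vector<RowPredicate>& predicates,
             std::size_t row_count) {
    std::vector<bool> result(row_count, true);
    for (const RowPredicate& predicate : predicates) {
        for (std::size_t row = 0; row < row_count; ++row) {
            if (result[row]) {
                result[row] = predicate(row);
            }
        }
    }
    return result;
}
```

Placing cheap predicates first shrinks the set of surviving rows before the expensive predicates run, which is where the pre-filter saves work.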

#39869

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-03-19 14:50:14 +08:00
zhagnlu 7ebe3d7038
enhance: refine chunk access logic and add some comment on data (#40618)
#40367

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-03-16 22:20:08 +08:00
Spade A 3db56560fb
fix: fix concurrent issues in null offset (#40363)
issue: #40308
This PR fixes two concurrency issues:
1. Elements in null_offset are used to set a bitset whose size is
initialized from the tantivy document count. However, some documents may
be null in null_offset but not yet committed in tantivy, so an
out-of-range access occurs.
2. null_offset can be read and written concurrently, but there is no
synchronization protection.
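
One way to address both issues is sketched below, assuming a mutex-guarded container and a bounds check when applying offsets. `NullOffsets` is a hypothetical class for illustration; the actual fix in the PR may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical holder for offsets of null rows; not the Milvus class.
class NullOffsets {
 public:
    void
    Add(std::size_t offset) {
        std::lock_guard<std::mutex> lock(mutex_);
        offsets_.push_back(offset);
    }

    // Set bits for null rows, ignoring offsets at or beyond the bitset size
    // (rows recorded as null but not yet committed to the index).
    void
    Apply(std::vector<bool>& bitset) const {
        std::lock_guard<std::mutex> lock(mutex_);
        for (std::size_t offset : offsets_) {
            if (offset < bitset.size()) {
                bitset[offset] = true;
            }
        }
    }

 private:
    mutable std::mutex mutex_;  // guards offsets_ for readers and writers
    std::vector<std::size_t> offsets_;
};
```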

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-03-05 17:48:00 +08:00
Bingyi Sun 7040ba1c12
enhance: make json path index support term filter (#40140)
issue: #35528

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-03-04 11:56:02 +08:00
zhagnlu 8c19e5c4a7
enhance: decrease delete record dump snapshot limit (#40101)
#40100

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-03-02 17:55:59 +08:00
Chun Han 259f9106ad
enhance: refine variable-length-type memory usage(#38736) (#39578)
related: #38736

Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
2025-02-27 21:13:58 +08:00
Spade A 476cf61d98
fix: random sample consider empty input (#40201)
issue: #40198

Fix random sample not handling empty input, that is, the case where no data
is hit by the filter expression.

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-02-26 16:15:58 +08:00
Bingyi Sun db4769281c
fix: Fall back to a brute-force search if json index type unmatched (#40076)
issue: https://github.com/milvus-io/milvus/issues/35528
If the query data type does not match the index type, fall back to a
brute-force search

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-02-24 16:25:57 +08:00
sthuang 3eb3af5f08
feat: explicitly specify column groups for storage v2 api (#39790)
* use the new packed reader and writer api to be compatible with current
etcd meta
* For the new packed writer API: column groups and paths are explicitly
defined by users and won't split column groups by memory in storage v2.
Packed writer follows the user-defined column groups to split arrow
record and write into the corresponding file path.
* For the new packed reader API: read paths are explicitly defined by
users.
related: #39173

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-02-21 22:03:54 +08:00
Spade A 52c7d7dd80
fix: offset combined with term should be based on Token positions in phrase match (#39931)
fix: #39711

Unlike an English sentence, where each word is parsed exactly once, one
after another, with position length 1, a Chinese word may be parsed into
multiple tokens with position length larger than 1.

For example, "badminton and skiing" will be parsed to Token{ start: 0,
length: 1, text: "badminton" }, Token{ start: 1, length: 1, text: "and"
}, and Token{ start: 2, length: 1, text: "skiing" }.

For Chinese, by contrast, "羽毛球和滑雪" may be parsed to Token{ start:
0, length: 2, text: "羽毛" }, Token{ start: 0, length: 3, text: "羽毛球" },
Token{ start: 3, length: 1, text: "和" }, and Token{ start: 4, length: 2,
text: "滑雪" }.

This PR fixes the code so that it handles this situation.
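
The idea in the commit title, pairing each term with its token position rather than its index in the token list, can be sketched as follows. `Token` mirrors the notation above; `TermsWithOffsets` is a hypothetical helper and not the tantivy binding's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical token shape mirroring Token{ start, length, text } above.
struct Token {
    std::size_t start;
    std::size_t length;
    std::string text;
};

// Pair each term with the tokenizer-reported start position instead of its
// index in the token list, so overlapping tokens keep their real offsets.
std::vector<std::pair<std::size_t, std::string>>
TermsWithOffsets(const std::vector<Token>& tokens) {
    std::vector<std::pair<std::size_t, std::string>> terms;
    terms.reserve(tokens.size());
    for (const Token& token : tokens) {
        terms.emplace_back(token.start, token.text);
    }
    return terms;
}
```

For the Chinese example above this yields offsets 0, 0, 3, 4, whereas numbering the tokens by list index would give 0, 1, 2, 3.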

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-02-18 20:38:51 +08:00
Spade A 0dc21f0aeb
feat: support random sample (#39532)
issue: #39541

This PR implements random sample, the syntax is:
```
filter="random_sample(factor)"
or
filter="boolean_expression && random_sample(factor)"

where
factor is a float in the range (0, 1) and
boolean_expression is an expression such as
 '1 <= number < 10', 'color in ["red", "blue"]' or others
```
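
The commit message gives only the syntax. A minimal sketch of the semantics, assuming each row that passes the filter is kept independently with probability `factor` (Bernoulli sampling), is shown below. `RandomSample` and its `seed` parameter are hypothetical, and the real implementation may use a different sampling strategy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical: `hits` holds the row offsets that passed the boolean filter.
// Each offset is kept independently with probability `factor`.
std::vector<std::size_t>
RandomSample(const std::vector<std::size_t>& hits,
             double factor,
             std::uint64_t seed) {
    assert(factor > 0.0 && factor < 1.0);
    std::vector<std::size_t> sampled;
    if (hits.empty()) {
        return sampled;  // nothing passed the filter, nothing to sample
    }
    std::mt19937_64 rng(seed);
    std::bernoulli_distribution keep(factor);
    for (std::size_t offset : hits) {
        if (keep(rng)) {
            sampled.push_back(offset);
        }
    }
    return sampled;
}
```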

---------

Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>
Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-02-18 12:40:50 +08:00
zhagnlu 316534e065
enhance: optimize delete init construct code (#39327)
#39326

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-02-17 21:05:26 +08:00
Bingyi Sun b59555057d
feat: support json index (#36750)
https://github.com/milvus-io/milvus/issues/35528

This PR adds json index support for json and dynamic fields. For now, this
index can only serve unary queries such as 'a["b"] > 1'. More filter types
will be supported later.

basic usage:
```
collection.create_index("json_field", {"index_type": "INVERTED",
    "params": {"json_cast_type": DataType.STRING, "json_path":
'json_field["a"]["b"]'}})
```

There are some limitations when using this index:
1. If a record does not have the json path you specify, it will be
ignored and there will not be an error.
2. If a value of the json path fails to be cast to the type you specify,
it will be ignored and there will not be an error.
3. A specific json path can have only one json index.
4. If you try to create more than one json index for one json field, the
sdk (pymilvus<=2.4.7) may return immediately because of its internal
implementation. This will be fixed in a later version.
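
Limitations 1 and 2 can be illustrated with a small sketch that builds index entries for a numeric cast type. It is not the Milvus index builder: `BuildDoubleIndex` is a hypothetical helper, values at the json path are modeled as strings, and an empty string stands for a record without that path.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical: raw_values[i] is the text found at the indexed json path for
// row i, or an empty string when the row has no such path.
std::vector<std::pair<double, std::size_t>>
BuildDoubleIndex(const std::vector<std::string>& raw_values) {
    std::vector<std::pair<double, std::size_t>> entries;
    for (std::size_t row = 0; row < raw_values.size(); ++row) {
        try {
            std::size_t consumed = 0;
            double value = std::stod(raw_values[row], &consumed);
            // Accept the value only if the whole text parsed as a number.
            if (consumed == raw_values[row].size()) {
                entries.emplace_back(value, row);
            }
        } catch (const std::logic_error&) {
            // Missing path or failed cast: skip the row without an error.
        }
    }
    return entries;
}
```

Rows skipped this way are simply absent from the index, so a query served by the index will not return them.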

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-02-15 14:06:15 +08:00
sthuang c4ae9f4ece
feat: introduce third-party milvus-storage (#39418)
related: https://github.com/milvus-io/milvus/issues/39173

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-01-24 17:21:13 +08:00
Cai Yudong 5730b69e56
feat: Enable more VECTOR_INT8 unittest (#39569)
Issue: #38666

Signed-off-by: Cai Yudong <yudong.cai@zilliz.com>
2025-01-24 17:03:07 +08:00
zhagnlu 8117d59f85
fix:fix GetValueFromConfig for bool type (#39526)
#39525

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-01-24 16:17:05 +08:00
Cai Yudong 341d6c1eb7
feat: Update segcore for VECTOR_INT8 (#39415)
Issue: #38666

Signed-off-by: Cai Yudong <yudong.cai@zilliz.com>
2025-01-21 11:03:03 +08:00
Bingyi Sun 140c5a0a75
enhance: add unit test for string pk (#39329)
https://github.com/milvus-io/milvus/issues/39107

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2025-01-20 19:03:04 +08:00
Cai Yudong 5b35fc700d
enhance: [skip-e2e] Use template to remove duplicate unittest (#39396)
Issue: #38666

Signed-off-by: Cai Yudong <yudong.cai@zilliz.com>
2025-01-18 10:33:01 +08:00
Cai Yudong 64feeb0e2b
enhance: Rename API GenDataset to GenFieldData in unittest (#39386)
Issue: #38666

Signed-off-by: Cai Yudong <yudong.cai@zilliz.com>
2025-01-17 15:55:03 +08:00
Spade A 8c4ba70a4c
fix: enable to build index with single segment (#39233)
fix https://github.com/milvus-io/milvus/issues/39232

---------

Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>
2025-01-16 11:01:06 +08:00
Zhen Ye 3e788f0fbd
enhance: record memory size (uncompressed) item for index (#38770)
issue: #38715

- Milvus currently uses the serialized (compressed) index size to estimate
the resources needed for loading.
- Add a new field `MemSize` (size before compression) for indexes to estimate
resources.

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-01-14 10:33:06 +08:00
Buqian Zheng 5e38f01e5b
enhance: update knowhere version (#39212)
Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2025-01-14 10:21:05 +08:00
Alexander Guzhva 3447ff7310
enhance: [bitset] extend op_find() to be able to search both 0 and 1 (#39176)
issue: #39124 

`bitset::find_first()` and `bitset::find_next()` now accept one more
parameter, which allows searching for a `0` bit instead of a `1` bit.
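
The exact signatures are not given in the commit message. The behavior can be sketched with a standalone helper over `std::vector<bool>`; `FindFirstFrom` and its `is_set` parameter are hypothetical and are not the Milvus bitset API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: index of the first bit at position >= `from` whose
// value equals `is_set`, or bits.size() when no such bit exists.
std::size_t
FindFirstFrom(const std::vector<bool>& bits, std::size_t from, bool is_set) {
    assert(from <= bits.size());
    for (std::size_t i = from; i < bits.size(); ++i) {
        if (bits[i] == is_set) {
            return i;
        }
    }
    return bits.size();
}
```

Searching for `0` bits directly avoids flipping the whole bitset first when the caller is interested in the unset positions.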

Signed-off-by: Alexandr Guzhva <alexanderguzhva@gmail.com>
2025-01-14 09:50:58 +08:00
Cai Yudong 2a02bbe3ee
enhance: Use template to remove unittest duplication (#39144)
Issue: #38666

Signed-off-by: Cai Yudong <yudong.cai@zilliz.com>
2025-01-13 09:58:57 +08:00
Spade A 032292a432
feat: support phrase match query (#38869)
The relevant issue: https://github.com/milvus-io/milvus/issues/38930

---------

Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>
2025-01-12 20:24:58 +08:00
Spade A 8abf6c9149
fix: build text index when loading field data (#39070)
fix: https://github.com/milvus-io/milvus/issues/39053
may fix https://github.com/milvus-io/milvus/issues/38644 which could be
caused by https://github.com/milvus-io/milvus/issues/39053

---------

Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>
2025-01-09 15:24:56 +08:00
Gao f0dae81494
fix: set iterative filter hint to false when no expr specified (#39033)
issue: https://github.com/milvus-io/milvus/issues/39013

Signed-off-by: chasingegg <chao.gao@zilliz.com>
2025-01-08 12:56:56 +08:00
Chun Han 3739446a33
enhance: refine array view to optimize memory usage(#38736) (#38808)
related: #38736

700m data, array_length=10
non-mmap_offsets_uint64: 2.0G
mmap_offsets_uint64: 1.1G
mmap_offsets_uint32: 880MB

Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
2025-01-07 13:26:55 +08:00
smellthemoon 907fc24f85
enhance: support null expr (#38772)
#31728

---------

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2025-01-02 14:16:54 +08:00
Bingyi Sun 2557e3f2a9
enhance: Initialize field id to avoid negative number (#38789)
Signed-off-by: sunby <sunbingyi1992@gmail.com>
2024-12-27 18:00:50 +08:00
Patrick Weizhi Xu 85f462be1a
enhance: speed up search iterator stage 1 (#37947)
issue: #37548

Signed-off-by: Patrick Weizhi Xu <weizhi.xu@zilliz.com>
2024-12-26 10:32:49 +08:00
Ted Xu acc8fb7af6
enhance: eliminate compile warnings (part2) (#38535)
See #38435

---------

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2024-12-25 15:30:50 +08:00
Zhen Ye b537a72309
fix: inverted index out of range (#38577)
issue: #38546, #38486

Signed-off-by: chyezh <chyezh@outlook.com>
2024-12-19 15:20:47 +08:00
zhagnlu 9afcc5bc5c
fix:fix incorrect dir operations when create or load inverted index (#38359)
#37944

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2024-12-17 20:06:45 +08:00
Bingyi Sun dd4f33ae19
fix: Fix chunked segment can not warmup using mmap (#38492)
issue: #38410

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2024-12-17 13:42:45 +08:00
Ted Xu 33aecb0655
fix: build break on target test_cpp under OSX (#38479)
See: #38434

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2024-12-17 13:38:45 +08:00
Bingyi Sun 3e2a2f278b
enhance: Handle rust error in c++ (#38113)
https://github.com/milvus-io/milvus/issues/37930

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2024-12-16 19:40:45 +08:00
zhagnlu 01de0afc4e
enhance: refactor delete mvcc function (#38066)
#37413

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2024-12-15 18:02:43 +08:00
Ted Xu 3038383e36
fix: UT compile broken under osx (#38432)
See: #38434

---------

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2024-12-13 16:24:43 +08:00
zhagnlu efbfa1cc3e
fix:fix ut failed for debug (#38384)
#38382

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2024-12-13 14:38:43 +08:00
Gao 994fc544e7
enhance: support iterative filter execution (#37363)
issue: #37360

---------

Signed-off-by: chasingegg <chao.gao@zilliz.com>
2024-12-11 11:32:44 +08:00
cqy123456 8216345b07
enhance: reduce copy of bitset and id conversion of brute-force search (#37675)
issue: https://github.com/milvus-io/milvus/issues/37798

Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>
2024-11-19 15:48:40 +08:00
Bingyi Sun 6b82320953
fix: Fix using wrong upperbound when searching by pk (#37769)
issue: https://github.com/milvus-io/milvus/issues/37649

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2024-11-19 10:16:31 +08:00
smellthemoon 3d28d99411
fix: to use the correct offset in span (#37780)
#37734

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2024-11-18 21:56:30 +08:00
aoiasd e9391acf80
fix: bm25 brute force search need index params k1 and b (#37721)
relate: https://github.com/milvus-io/milvus/issues/35853

---------

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2024-11-18 15:44:31 +08:00
Zhen Ye 3f1614e9d9
enhance: add trace_id into segcore logs (#37656)
issue: #37655

Signed-off-by: chyezh <chyezh@outlook.com>
2024-11-18 10:20:30 +08:00
zhagnlu e4b6773d0a
fix: fix create text index dir conflict bug (#37693)
#37623

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2024-11-15 18:26:30 +08:00
Bingyi Sun 65d3c6622a
enhance: Optimize GetChunkIDByOffset and add ut (#37704)
Signed-off-by: sunby <sunbingyi1992@gmail.com>
2024-11-15 14:16:31 +08:00
Bingyi Sun d1596297d9
fix: Fix query failure with inverted index (#37686)
https://github.com/milvus-io/milvus/issues/37649

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2024-11-15 10:28:31 +08:00
Bingyi Sun 1b4f7e3ac1
enhance: Add more expr ut for chunked segment (#37600)
related pr: #37570

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2024-11-14 18:40:32 +08:00
smellthemoon 3389a6b500
enhance: support null in text match index (#37517)
#37508

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2024-11-13 11:08:29 +08:00
Chun Han 2d29dcd30c
enhance:refine group_strict_size parameter(#37482) (#37483)
related: #37482

Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
2024-11-12 09:56:28 +08:00
aoiasd 12951f0abb
enhance: rename tokenizer to analyzer and check analyzer params (#37478)
relate: https://github.com/milvus-io/milvus/issues/35853

---------

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2024-11-10 16:12:26 +08:00
Bingyi Sun 40ba5a3414
fix: fix chunked segment term filter expression and add ut (#37392)
issue: https://github.com/milvus-io/milvus/issues/37143

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2024-11-07 11:04:19 -08:00
aoiasd d67853fa89
feat: Tokenizer support build with params and clone for concurrency (#37048)
relate: https://github.com/milvus-io/milvus/issues/35853
https://github.com/milvus-io/milvus/issues/36751

---------

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2024-11-06 17:48:24 +08:00
cai.zhang 625b6176cd
fix: Search for pk using raw data to reduce the overhead caused by views (#37202)
issue: #37152

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-11-05 20:36:24 +08:00