Commit Graph

587 Commits (b983ef9fcae01f9287bcf03c13de8f9a3e51dca6)

Author SHA1 Message Date
cqy123456 8216345b07
enhance: reduce copy of bitset and id conversion of brurtforce search (#37675)
issue: https://github.com/milvus-io/milvus/issues/37798

Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>
2024-11-19 15:48:40 +08:00
Bingyi Sun 6b82320953
fix: Fix using wrong upperbound when searching by pk (#37769)
issue: https://github.com/milvus-io/milvus/issues/37649

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2024-11-19 10:16:31 +08:00
smellthemoon 3d28d99411
fix: to use the correct offset in span (#37780)
#37734

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2024-11-18 21:56:30 +08:00
aoiasd e9391acf80
fix: bm25 brute force search need index params k1 and b (#37721)
relate: https://github.com/milvus-io/milvus/issues/35853

---------

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2024-11-18 15:44:31 +08:00
Zhen Ye 3f1614e9d9
enhance: add trace_id into segcore logs (#37656)
issue: #37655

Signed-off-by: chyezh <chyezh@outlook.com>
2024-11-18 10:20:30 +08:00
zhagnlu e4b6773d0a
fix: fix create text index dir conflict bug (#37693)
#37623

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2024-11-15 18:26:30 +08:00
Bingyi Sun 65d3c6622a
enhance: Optimize GetChunkIDByOffset and add ut (#37704)
Signed-off-by: sunby <sunbingyi1992@gmail.com>
2024-11-15 14:16:31 +08:00
Bingyi Sun d1596297d9
fix: Fix query failure with inverted index (#37686)
https://github.com/milvus-io/milvus/issues/37649

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2024-11-15 10:28:31 +08:00
Bingyi Sun 1b4f7e3ac1
enhance: Add more expr ut for chunked segment (#37600)
related pr: #37570

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2024-11-14 18:40:32 +08:00
smellthemoon 3389a6b500
enhance: support null in text match index (#37517)
#37508

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2024-11-13 11:08:29 +08:00
Chun Han 2d29dcd30c
enhance:refine group_strict_size parameter(#37482) (#37483)
related: #37482

Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
2024-11-12 09:56:28 +08:00
aoiasd 12951f0abb
enhance: rename tokenizer to analyzer and check analyzer params (#37478)
relate: https://github.com/milvus-io/milvus/issues/35853

---------

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2024-11-10 16:12:26 +08:00
Bingyi Sun 40ba5a3414
fix: fix chunked segment term filter expression and add ut (#37392)
issue: https://github.com/milvus-io/milvus/issues/37143

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2024-11-07 11:04:19 -08:00
aoiasd d67853fa89
feat: Tokenizer support build with params and clone for concurrency (#37048)
relate: https://github.com/milvus-io/milvus/issues/35853
https://github.com/milvus-io/milvus/issues/36751

---------

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2024-11-06 17:48:24 +08:00
cai.zhang 625b6176cd
fix: Search for pk using raw data to reduce the overhead caused by views (#37202)
issue: #37152

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-11-05 20:36:24 +08:00
Bingyi Sun bd04cac4b3
fix: fix group by on chunked segment (#37292)
https://github.com/milvus-io/milvus/issues/37244

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2024-11-05 17:12:23 +08:00
zhenshan.cao 63843dce33
fix: Fix conan gdal building problem (#37338)
issue:https://github.com/milvus-io/milvus/issues/27576

Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>
2024-10-31 21:04:16 +08:00
Hao Tan 67c4340565
feat: Geospatial Data Type and GIS Function Support for milvus server (#35990)
issue:https://github.com/milvus-io/milvus/issues/27576

# Main Goals
1. Create and describe collections with geospatial fields, enabling both
client and server to recognize and process geo fields.
2. Insert geospatial data as payload values in the insert binlog, and
print the values for verification.
3. Load segments containing geospatial data into memory.
4. Ensure query outputs can display geospatial data.
5. Support filtering on GIS functions for geospatial columns.

# Solution
1. **Add Type**: Modify the Milvus core by adding a Geospatial type in
both the C++ and Go code layers, defining the Geospatial data structure
and the corresponding interfaces.
2. **Dependency Libraries**: Introduce necessary geospatial data
processing libraries. In the C++ source code, use Conan package
management to include the GDAL library. In the Go source code, add the
go-geom library to the go.mod file.
3. **Protocol Interface**: Revise the Milvus protocol to provide
mechanisms for Geospatial message serialization and deserialization.
4. **Data Pipeline**: Facilitate interaction between the client and
proxy using the WKT format for geospatial data. The proxy will convert
all data into WKB format for downstream processing, providing column
data interfaces, segment encapsulation, segment loading, payload
writing, and cache block management.
5. **Query Operators**: Implement simple display and support for filter
queries. Initially, focus on filtering based on spatial relationships
for a single column of geospatial literal values, providing parsing and
execution for query expressions.
6. **Client Modification**: Enable the client to handle user input for
geospatial data and facilitate end-to-end testing.Check the modification
in pymilvus.

---------

Signed-off-by: tasty-gumi <1021989072@qq.com>
2024-10-31 20:58:20 +08:00
smellthemoon b8492498ac
fix: mask with valid data when preCheckOverflow (#37221)
#37175

---------

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2024-10-31 10:44:26 +08:00
Bingyi Sun 90948e9444
fix: add SearchOnSealed unit test and fix a bug (#37241)
issue: https://github.com/milvus-io/milvus/issues/37244

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2024-10-30 10:26:19 +08:00
smellthemoon 44ddcb5a63
fix: not check has_value before get value in JSON (#37128)
https://github.com/milvus-io/milvus/issues/36236
also: https://github.com/milvus-io/milvus/issues/37113

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2024-10-25 17:19:28 +08:00
Yinzuo Jiang 3628593d20
feat: Implement custom function module in milvus expr (#36560)
OSPP 2024 project:
https://summer-ospp.ac.cn/org/prodetail/247410235?list=org&navpage=org

Solutions:

- parser (planparserv2)
    - add CallExpr in planparserv2/Plan.g4
    - update parser_visitor and show_visitor
- grpc protobuf
    - add CallExpr in plan.proto
- execution (`core/src/exec`)
- add `CallExpr` `ValueExpr` and `ColumnExpr` (both logical and
physical) for function call and function parameters
- function factory (`core/src/exec/expression/function`)
    - create a global hashmap when starting milvus (see server.go)
- the global hashmap stores function signatures and their function
pointers, the CallExpr in execution engine can get the function pointer
by function signature.
- custom functions
    - empty(string)
    - starts_with(string, string)
- add cpp/go unittests and E2E tests

closes: #36559

Signed-off-by: Yinzuo Jiang <jiangyinzuo@foxmail.com>
2024-10-25 15:25:30 +08:00
smellthemoon 6ef014d931
fix: get correct size when sealed segment chunked (#37062)
#37019

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2024-10-25 12:01:31 +08:00
yellow-shine 8902e2220e
enhance: enable asan for cpp unittest (#37041)
https://github.com/milvus-io/milvus/issues/35854

Signed-off-by: chyezh <chyezh@outlook.com>
Co-authored-by: chyezh <chyezh@outlook.com>
2024-10-23 17:21:27 +08:00
Alexander Guzhva 5a1f752272
enhance: [bitset] multiple 'and' and 'or' in a single op (#33345)
issue #34117
* Refactoring
* Added a capability to perform multiple bitwise `and` and `or`
operations in a single op
* AVX2, AVX512, ARM NEON, ARM SVE backed bitwise `and`, `op`, `xor` and
`sub` ops
* more unit tests for bitset
* fixed a bug in `or_with_count` for certain bitset sizes
* fixed a bug for certain offset values for inplace operations that take
two bitsets

Signed-off-by: Alexandr Guzhva <alexanderguzhva@gmail.com>
2024-10-22 16:25:33 +08:00
smellthemoon 6bedc7e8c8
fix: not set valid_data in bitmap index when mmap (#37023)
#37013

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2024-10-22 12:03:26 +08:00
foxspy 3de57ec4fa
enhance: add vector index mgr to remove vector index type dependency (#36843)
issue: #34298

Signed-off-by: xianliang.li <xianliang.li@zilliz.com>
2024-10-17 22:15:25 +08:00
smellthemoon eb3e4583ec
enhance: all op(Null) is false in expr (#35527)
#31728

---------

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2024-10-17 21:14:30 +08:00
cqy123456 b474374ea5
enhance: use growingMmapEnabled to control the behavior of interim index, not vectorField (#36500)
issue:https://github.com/milvus-io/milvus/issues/36392
related pr: https://github.com/milvus-io/milvus/pull/36391

Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>
2024-10-17 20:25:24 +08:00
cqy123456 aa904be6ec
enhance: support sparse vector mmap in growing segment type (#36566)
issue: https://github.com/milvus-io/milvus/issues/32984
related pr: https://github.com/milvus-io/milvus/pull/36565

Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>
2024-10-15 10:59:23 +08:00
Zhen Ye f46c3acea9
fix: heap buffer overflow when unittest at index wrapper (#36838)
issue: #35852

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2024-10-14 18:13:22 +08:00
Bingyi Sun a75bb85f3a
feat: support chunked column for sealed segment (#35764)
This PR splits sealed segment to chunked data to avoid unnecessary
memory copy and save memory usage when loading segments so that loading
can be accelerated.

To support rollback to previous version, we add an option
`multipleChunkedEnable` which is false by default.

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2024-10-12 15:04:52 +08:00
aoiasd db34572c56
feat: support load and query with bm25 metric (#36071)
relate: https://github.com/milvus-io/milvus/issues/35853

---------

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2024-10-11 10:23:20 +08:00
Buqian Zheng f7b811450d
feat: add enable_tokenizer params to VarChar field (#36480)
issue: #35922

add an enable_tokenizer param to varchar field: must be set to true so
that a varchar field can enable_match or used as input of BM25 function

---------

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2024-10-10 20:33:21 +08:00
SimFG 130a923dec
enhance: the estimate method when loading the collection (#36307)
- issue: #36530

---------

Signed-off-by: SimFG <bang.fu@zilliz.com>
Signed-off-by: xianliang.li <xianliang.li@zilliz.com>
Co-authored-by: xianliang.li <xianliang.li@zilliz.com>
2024-10-09 17:35:19 +08:00
Rijin-N a05a37a583
enhance: GCS native support (GCS implemented using Google Cloud Storage libraries) (#36214)
Native support for Google cloud storage using the Google Cloud Storage
libraries. Authentication is performed using GCS service account
credentials JSON.

Currently, Milvus supports Google Cloud Storage using S3-compatible APIs
via the AWS SDK. This approach has the following limitations:

1. Overhead: Translating requests between S3-compatible APIs and GCS can
introduce additional overhead.
2. Compatibility Limitations: Some features of the original S3 API may
not fully translate or work as expected with GCS.

To address these limitations, This enhancement is needed.

Related Issue: #36212
2024-09-30 13:23:32 +08:00
yihao.dai 8ed34dce84
enhance: Reopen chunk cache cpp ut (#33622)
issue: https://github.com/milvus-io/milvus/issues/33210

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-09-28 18:19:15 +08:00
Buqian Zheng 8495bc6bbc
fix: fix broken Sparse Float Vector raw data mmap (#36183)
issue: https://github.com/milvus-io/milvus/issues/36182

* improved `Column.h` to make the code much more readable and
maintainable, and added detailed comments.
* fixed an issue where `ArrayColumn::NumRows()` always returns 0 when
the mmap backing storage is a file.
* removed unused `ColumnBase` constructors and unnecessary members so we
don't get confused.
* Updated `test_chunk_cache.cpp` to make the tests parameterized: to
test both mmap enabled and disabled. Added sparse field in the test to
add coverage.
* re-enabled test `Sealed::GetSparseVectorFromChunkCache`. 
* But 2 other disabled tests `Sealed::WarmupChunkCache` and
`Sealed::GetVectorFromChunkCache` remain disabled, there seems to be
errors. @bigsheeper PTAL.

---------

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2024-09-25 18:59:13 +08:00
zhagnlu 489087d18b
enhance: refactor executor framework V2 (#35251)
#32636

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2024-09-13 20:57:09 +08:00
Jiquan Long 89bf226f0b
feat: support keyword text match (#35923)
fix: #35922

---------

Signed-off-by: longjiquan <jiquan.long@zilliz.com>
2024-09-10 15:11:08 +08:00
congqixia 851f3b9883
fix: Make legacy non-lexicographic branch break swtich (#36125)
Related to #35941
Previous PR: #36034

This patch makes the switch branching logic correct and make the unit
test work for cases which does not select the whole dataset.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-09-10 10:15:07 +08:00
Chun Han 4641fd9195
enhance: make search groupby stop when reaching topk groups (#35814)
related: #33544

Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
2024-09-02 18:25:03 +08:00
Zhen Ye b2eb9fe2a7
fix: memory leak in unittest and open the USE_ASAN option when build unittest (#35855)
issue: #35854

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2024-09-02 15:59:04 +08:00
cai.zhang 2c9bb4dfa3
feat: Support stats task to sort segment by PK (#35054)
issue: #33744 

This PR includes the following changes:
1. Added a new task type to the task scheduler in datacoord: stats task,
which sorts segments by primary key.
2. Implemented segment sorting in indexnode.
3. Added a new field `FieldStatsLog` to SegmentInfo to store token index
information.

---------

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-09-02 14:19:03 +08:00
Jiquan Long a52ba3d09d
enhance: allow many segments for inverted index (#35616)
fix: https://github.com/milvus-io/milvus/issues/35615

---------

Signed-off-by: longjiquan <jiquan.long@zilliz.com>
2024-08-28 11:30:59 +08:00
zhagnlu 4d2f96c760
enhance: support bitmap mmap (#35399)
#32900

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2024-08-27 16:34:59 +08:00
yihao.dai f2b83d316b
enhance: Support memory mode chunk cache (#35347)
Chunk cache supports loading raw vectors into memory.

issue: https://github.com/milvus-io/milvus/issues/35273

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-08-25 15:42:58 +08:00
Zhen Ye a773836b89
enhance: optimize milvus core building (#35610)
issue: #35549,#35611,#35633

- remove milvus_segcore milvus_indexbuilder..., add libmilvus_core
- core building only link once
- move opendal compilation into cmake
- fix odr

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2024-08-23 12:35:02 +08:00
smellthemoon 80dbe87759
enhance: support null value in index (#35238)
#31728

---------

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2024-08-16 15:30:54 +08:00
Buqian Zheng f4a91e135b
enhance: Allow empty sparse row (#34700)
issue: #29419

* If a sparse vector with 0 non-zero value is inserted, no ANN search on
this sparse vector field will return it as a result. User may retrieve
this row via scalar query or ANN search on another vector field though.
* If the user uses an empty sparse vector as the query vector for a ANN
search, no neighbor will be returned.

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2024-08-16 14:14:54 +08:00