103 Commits (621dbc9107a031dc472b9865593390b887c0b746)

Author SHA1 Message Date
yihao.dai 8ed34dce84
enhance: Reopen chunk cache cpp ut (#33622)
issue: https://github.com/milvus-io/milvus/issues/33210

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-09-28 18:19:15 +08:00
Buqian Zheng 8495bc6bbc
fix: fix broken Sparse Float Vector raw data mmap (#36183)
issue: https://github.com/milvus-io/milvus/issues/36182

* improved `Column.h` to make the code much more readable and
maintainable, and added detailed comments.
* fixed an issue where `ArrayColumn::NumRows()` always returns 0 when
the mmap backing storage is a file.
* removed unused `ColumnBase` constructors and unnecessary members so we
don't get confused.
* Updated `test_chunk_cache.cpp` to make the tests parameterized: to
test both mmap enabled and disabled. Added sparse field in the test to
add coverage.
* re-enabled test `Sealed::GetSparseVectorFromChunkCache`. 
* The 2 other disabled tests, `Sealed::WarmupChunkCache` and
`Sealed::GetVectorFromChunkCache`, remain disabled; there seem to be
errors. @bigsheeper PTAL.

---------

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2024-09-25 18:59:13 +08:00
zhagnlu 489087d18b
enhance: refactor executor framework V2 (#35251)
#32636

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2024-09-13 20:57:09 +08:00
Zhen Ye b2eb9fe2a7
fix: memory leak in unittest and open the USE_ASAN option when build unittest (#35855)
issue: #35854

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2024-09-02 15:59:04 +08:00
yihao.dai f2b83d316b
enhance: Support memory mode chunk cache (#35347)
Chunk cache supports loading raw vectors into memory.

issue: https://github.com/milvus-io/milvus/issues/35273

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-08-25 15:42:58 +08:00
smellthemoon 80dbe87759
enhance: support null value in index (#35238)
#31728

---------

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2024-08-16 15:30:54 +08:00
zhagnlu 4b553b0333
enhance: revert remove duplicated pk function (#35103)
issue: #34778
 Revert "fix: fix query count(*) concurrently"
 Revert "enhance: mark duplicated pk as deleted "

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2024-08-05 10:48:17 +08:00
smellthemoon 475c333fa2
enhance: add valid_data in span (#35030)
#31728

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2024-08-02 15:40:14 +08:00
smellthemoon 5616b7e8d2
enhance: support null in c data_datacodec and load null value (#32183)
1. support reading and writing null in segcore
    valid_data is stored in fieldData (as uint8_t to save memory).
2. support loading null
    the binlog reader reads and writes data into the column (sealed segment)
    or insertRecord (growing segment). In a sealed segment, valid_data is
    stored directly. In a growing segment, considering the prior
    implementation and code readability, uint8_t is converted to
    fbvector<bool>, which may be optimized in the future.
3. retrieve valid_data
    valid_data is parsed in search/query.
#31728

---------

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2024-07-23 16:07:51 +08:00
zhagnlu 804dd5409a
enhance: mark duplicated pk as deleted (#34586)
fix #34247

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2024-07-16 14:25:39 +08:00
zhagnlu 3030e4625e
enhance: refactor variable column to reduce memory cost (#33875)
#33874

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2024-06-30 20:16:06 +08:00
cqy123456 32f685ff12
enhance: growing segment support mmap (#32633)
issue: https://github.com/milvus-io/milvus/issues/32984

Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>
2024-06-18 14:42:00 +08:00
Buqian Zheng 8cb350598c
enhance: Improve GetVectorById of Sparse Float Vector (#33209)
issue: #29419

* sparse float vector to support raw data mmap

For getting vectors from the chunk cache, I added a unit test but marked it
as skipped due to a known issue. I have tested it locally.

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2024-06-12 10:09:55 +08:00
chyezh e19d17076f
fix: delete may lost when enable lru cache, some field should be reset when ReleaseData (#32012)
issue: #30361

- Deletes may be lost when a segment in the LRU cache is not in the
data-loaded state. Skip filtering to fix it.

- `stats_` and `variable_fields_avg_size_` should be reset when
`ReleaseData`

- Remove the repeated delta log load operation in the LRU cache.

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2024-04-16 11:17:20 +08:00
Buqian Zheng 96cfae55a5
feat: [Sparse Float Vector] segcore to support sparse vector search and get raw vector by id (#30629)
This PR adds the ability to search/get sparse float vectors in segcore,
and added unit tests by modifying lots of existing tests into
parameterized ones.

https://github.com/milvus-io/milvus/issues/29419

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2024-03-12 09:16:30 -07:00
Cai Yudong 8a219e0102
feat: Support knowhere trace using OpenTelemetry (#30750)
Issue: #21508

Signed-off-by: Yudong Cai <yudong.cai@zilliz.com>
2024-02-28 12:29:00 +08:00
yah01 57397b1307
enhance: add new LRU cache impl (#30360)
- remove the unused LRU cache
- add new LRU cache impl which wraps github.com/karlseguin/ccache

related #30361

---------

Signed-off-by: yah01 <yang.cen@zilliz.com>
2024-02-27 20:58:40 +08:00
MrPresent-Han 77eb6defb1
feat: support groupby on growing and non-indexed sealed segment(#30307) (#30644)
related: #30308

Signed-off-by: MrPresent-Han <chun.han@zilliz.com>
2024-02-21 14:02:53 +08:00
zhagnlu e8a6f1ea2b
fix: erase pk empty check when pk index replace raw data (#30432)
#30350

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2024-02-07 14:56:47 +08:00
yihao.dai c02fb64ad6
enhance: Allows proactive warming up of chunk cache (#30182)
Allows proactive warming up of chunk cache. Original vector data will be
asynchronously loaded into the chunk cache during the load process. It
has the potential to significantly reduce query/search latency for a
certain duration after the load, albeit with a concurrent increase in
disk usage.

issue: https://github.com/milvus-io/milvus/issues/30181

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-01-25 19:55:39 +08:00
Xu Tong e429965f32
Add float16 approve for multi-type part (#28427)
issue:https://github.com/milvus-io/milvus/issues/22837

Add bfloat16 vector, add the index part of float16 vector.

Signed-off-by: Writer-X <1256866856@qq.com>
2024-01-11 15:48:51 +08:00
zhenshan.cao 60e88fb833
fix: Restore the MVCC functionality. (#29749)
When the TimeTravel functionality was previously removed, it
inadvertently affected the MVCC functionality within the system. This PR
aims to reintroduce the internal MVCC functionality as follows:

1. Add MvccTimestamp to the requests of Search/Query and the results of
Search internally.
2. When the delegator receives a Query/Search request and there is no
MVCC timestamp set in the request, set the delegator's current tsafe as
the MVCC timestamp of the request. If the request already has an MVCC
timestamp, do not modify it.
3. When the Proxy handles Search and triggers the second phase ReQuery,
divide the ReQuery into different shards and pass the MVCC timestamp to
the corresponding Query requests.
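
The delegator rule in step 2 can be sketched as follows. This is only an illustration of the described behavior, not the Milvus implementation: the `Request` class, the `mvcc_timestamp` field name, the `assign_mvcc_timestamp` helper, and the use of `0` to mean "not set" are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical stand-in for a Search/Query request."""
    mvcc_timestamp: int = 0  # assumption: 0 means "no MVCC timestamp set"


def assign_mvcc_timestamp(req: Request, delegator_tsafe: int) -> Request:
    """Apply the rule from step 2.

    If the request carries no MVCC timestamp, use the delegator's current
    tsafe; if it already has one, leave it unchanged.
    """
    if req.mvcc_timestamp == 0:
        req.mvcc_timestamp = delegator_tsafe
    return req


# A request without a timestamp picks up the delegator's tsafe.
fresh = assign_mvcc_timestamp(Request(), delegator_tsafe=100)

# A request that already has a timestamp (e.g. a ReQuery) keeps it.
requery = assign_mvcc_timestamp(Request(mvcc_timestamp=42), delegator_tsafe=100)
```

Keeping an existing timestamp is what lets the second-phase ReQuery in step 3 read at the same snapshot as the original Search.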

issue: #29656

Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>
2024-01-09 11:38:48 +08:00
MrPresent-Han 9e2e7157e9
feat: support search_group_by for milvus(#25324) (#28983)
related: #25324

Search GroupBy function, used to aggregate result entities based on a
specific scalar column.
several points to mention:

1. Temporarily, the whole groupby is implemented separately from the
iterative expr framework **for the first period**
2. In the long term, the groupBy operation will be incorporated into the
iterative expr framework:https://github.com/milvus-io/milvus/pull/28166
3. This PR includes some unrelated mocked interfaces regarding alterIndex
for reasons not worth mentioning. All of this unrelated content
will be removed before the final PR is merged. This version of the PR is
only for review
4. All other related details were commented in the files comparison

Signed-off-by: MrPresent-Han <chun.han@zilliz.com>
2024-01-05 15:50:47 +08:00
Gao 9b52cb6417
enhance: improve reducing results when many segments are filtered (#29073)
Do not fill invalid ids for empty results; doing so incurs useless
memory overhead and extra reduce overhead when nq and topk are large.

---------

Signed-off-by: chasingegg <chao.gao@zilliz.com>
2023-12-20 12:56:42 +08:00
zhagnlu a602171d06
enhance: Refactor runtime and expr framework (#28166)
#28165

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2023-12-18 12:04:42 +08:00
yah01 f7d2ab6677
enhance: reduce 1x copy for variable length field while retrieving (#28345)
- Reduce 1x copy for varchar/string/JSON/array types while retrieving
- Reduce 1x copy for int8/int16 while retrieving

Signed-off-by: yah01 <yah2er0ne@outlook.com>
2023-11-15 18:08:20 +08:00
MrPresent-Han 836f300536
support skip-index based on chunk-metrics to accelerate expr filter(#27925) (#28297)
related: #27925

Signed-off-by: MrPresent-Han <chun.han@zilliz.com>
2023-11-15 11:20:19 +08:00
Xu Tong 8ec85f5f4c
Add template for VectorMemIndex (#28324)
Signed-off-by: Writer-X <1256866856@qq.com>
2023-11-11 13:20:22 +08:00
yah01 dc89730a50
Support collection-level mmap control (#26901)
Signed-off-by: yah01 <yah2er0ne@outlook.com>
2023-11-02 23:52:16 +08:00
Enwei Jiao 8ae9c947ae
Use OpenDAL to access object store (#25642)
Signed-off-by: Enwei Jiao <enwei.jiao@zilliz.com>
2023-11-01 09:00:14 +08:00
yihao.dai 106c17f304
Make read ahead policy in ChunkCache configurable (#27291)
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2023-09-28 15:47:27 +08:00
foxspy 5db4a0489e
dynamic index version control (#27335)
Co-authored-by: longjiquan <jiquan.long@zilliz.com>
2023-09-25 21:39:27 +08:00
foxspy 370b6fde58
milvus support multi index engine (#27178)
Co-authored-by: longjiquan <jiquan.long@zilliz.com>
2023-09-22 09:59:26 +08:00
yah01 93e2eb78c9
Delete only if primary keys exist (#25292)
Signed-off-by: yah01 <yang.cen@zilliz.com>
2023-09-20 19:03:25 +08:00
cai.zhang a362bb1457
Support array datatype (#26369)
Signed-off-by: cai.zhang <cai.zhang@zilliz.com>
2023-09-19 14:23:23 +08:00
yihao.dai bb6711f28c
Add ChunkCache: support get vector from storage (#26142)
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2023-09-15 10:21:20 +08:00
Enwei Jiao ca1349708b
Remove time travel related testcase (#26119)
Signed-off-by: Enwei Jiao <enwei.jiao@zilliz.com>
2023-08-10 18:53:17 +08:00
yah01 53c3bf053e
Fix unstable sealed segment bruteforce unittest (#25867)
Signed-off-by: yah01 <yah2er0ne@outlook.com>
2023-07-25 09:05:00 +08:00
yah01 dd5f896dc8
Load batch by batch (#25212)
This will significantly reduce the memory usage while loading
- 1x memory usage and MBs overhead for buffer (memory mode)
- only MBs overhead for buffer (mmap mode)

Signed-off-by: yah01 <yang.cen@zilliz.com>
2023-07-06 13:58:27 +08:00
Enwei Jiao 816158e4af
Remove outdated searchplan (#25282)
Signed-off-by: Enwei Jiao <enwei.jiao@zilliz.com>
2023-07-04 18:30:25 +08:00
xige-16 04082b3de2
Migrate the ability to upload and download binlog to cpp (#22984)
Signed-off-by: xige-16 <xi.ge@zilliz.com>
2023-06-25 14:38:44 +08:00
PowderLi 3f4356df10
fix the spelling of `field` (#25008)
Signed-off-by: PowderLi <min.li@zilliz.com>
2023-06-21 14:00:42 +08:00
yah01 a413842e38
Fix deleted data is still visible (#24849)
Signed-off-by: yah01 <yang.cen@zilliz.com>
2023-06-16 17:16:41 +08:00
foxspy 6f4ed517de
add growing segment index (#23615)
Signed-off-by: xianliang <xianliang.li@zilliz.com>
2023-04-26 10:14:41 +08:00
yihao.dai 092d743917
Add support for getting vectors by ids (#23450)
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2023-04-23 09:00:32 +08:00
yah01 546080dcdd
Support to retrieve json (#23563)
Signed-off-by: yah01 <yang.cen@zilliz.com>
2023-04-21 11:46:32 +08:00
Cai Yudong 5f4673fd16
Optimize unittest to save runtime (#23248)
Signed-off-by: Yudong Cai <yudong.cai@zilliz.com>
2023-04-07 14:20:29 +08:00
yah01 081572d31c
Refactor QueryNode (#21625)
Signed-off-by: yah01 <yang.cen@zilliz.com>
Co-authored-by: Congqi Xia <congqi.xia@zilliz.com>
Co-authored-by: aoiasd <zhicheng.yue@zilliz.com>
2023-03-27 00:42:00 +08:00
Jiquan Long 8139106b51
Feat: count entities by expression (#22765)
Signed-off-by: longjiquan <jiquan.long@zilliz.com>
2023-03-16 19:31:55 +08:00
Jiquan Long a36fefb009
Fix cpplint (#22657)
Signed-off-by: longjiquan <jiquan.long@zilliz.com>
2023-03-10 09:47:54 +08:00