Commit Graph

40 Commits (6f1e9cd0f4aab81ff1dc0351d2f30942c6070e84)

Author SHA1 Message Date
zhagnlu 976b6fc0e4
enhance: change opendal as compile configurable (#30384)
#30373

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2024-02-20 19:16:52 +08:00
Jiquan Long 3f46c6d459
feat: support inverted index (#28783)
issue: https://github.com/milvus-io/milvus/issues/27704

Add an inverted index for some data types in Milvus. This index type can
save a lot of memory compared to loading all data into RAM, and it speeds
up term queries and range queries.

Supported: `INT8`, `INT16`, `INT32`, `INT64`, `FLOAT`, `DOUBLE`, `BOOL`
and `VARCHAR`.

Not supported: `ARRAY` and `JSON`.

Note:
- The inverted index for `VARCHAR` is not yet designed to serve full-text
search. Every row is treated as a single whole keyword instead of being
tokenized into multiple terms.
- The inverted index doesn't support retrieval well, so if you create an
inverted index on a field, operations that depend on the raw data will
fall back to chunk storage, which costs some performance. Examples are
comparisons between two columns and retrieval of output fields.
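
To illustrate the first note, here is a minimal pure-Python sketch of the difference between indexing each row as one whole keyword and indexing tokenized terms. It is a toy model, not the Milvus implementation; the function names `build_whole_value_index` and `build_tokenized_index` are hypothetical and exist only for this example.

```python
from collections import defaultdict


def build_whole_value_index(rows):
    """Map each complete string to the ids of rows holding exactly that string."""
    index = defaultdict(list)
    for row_id, value in enumerate(rows):
        index[value].append(row_id)
    return dict(index)


def build_tokenized_index(rows):
    """Map each whitespace-separated token to the ids of rows containing it."""
    index = defaultdict(list)
    for row_id, value in enumerate(rows):
        for token in set(value.split()):
            index[token].append(row_id)
    return dict(index)


rows = ["red apple", "green apple", "red apple"]
whole = build_whole_value_index(rows)
tokenized = build_tokenized_index(rows)

# Whole-keyword indexing only matches the full stored value.
print(whole.get("red apple", []))  # [0, 2]
print(whole.get("apple", []))      # []

# A tokenized (full-text style) index would match the single term.
print(tokenized.get("apple", []))  # [0, 1, 2]
```

Under the whole-keyword model, a lookup for `"apple"` finds nothing because no row stores exactly that string, which is the behavior the note describes for `VARCHAR` fields.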

The inverted index is easy to use.

Take the collection below as an example:

```python
from pymilvus import Collection, CollectionSchema, DataType, FieldSchema

# Assumes a connection to Milvus is already open (e.g. via connections.connect()).
dim = 128  # example vector dimension; the original snippet leaves `dim` undefined

fields = [
    FieldSchema(name="pk", dtype=DataType.VARCHAR, is_primary=True, auto_id=False, max_length=100),
    FieldSchema(name="int8", dtype=DataType.INT8),
    FieldSchema(name="int16", dtype=DataType.INT16),
    FieldSchema(name="int32", dtype=DataType.INT32),
    FieldSchema(name="int64", dtype=DataType.INT64),
    FieldSchema(name="float", dtype=DataType.FLOAT),
    FieldSchema(name="double", dtype=DataType.DOUBLE),
    FieldSchema(name="bool", dtype=DataType.BOOL),
    FieldSchema(name="varchar", dtype=DataType.VARCHAR, max_length=1000),
    FieldSchema(name="random", dtype=DataType.DOUBLE),
    FieldSchema(name="embeddings", dtype=DataType.FLOAT_VECTOR, dim=dim),
]
schema = CollectionSchema(fields)
collection = Collection("demo", schema)
```

Then create an inverted index on each field:

```python
index_type = "INVERTED"
collection.create_index("int8", {"index_type": index_type})
collection.create_index("int16", {"index_type": index_type})
collection.create_index("int32", {"index_type": index_type})
collection.create_index("int64", {"index_type": index_type})
collection.create_index("float", {"index_type": index_type})
collection.create_index("double", {"index_type": index_type})
collection.create_index("bool", {"index_type": index_type})
collection.create_index("varchar", {"index_type": index_type})
```

Term queries and range queries on these fields are then sped up
automatically by the inverted index:

```python
result = collection.query(expr='int64 in [1, 2, 3]', output_fields=["pk"])
result = collection.query(expr='int64 < 5', output_fields=["pk"])
result = collection.query(expr='int64 > 2997', output_fields=["pk"])
result = collection.query(expr='1 < int64 < 5', output_fields=["pk"])
```
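
As a rough intuition for why these expressions benefit from an inverted index, the sketch below keeps a value-to-row-ids mapping plus a sorted key list, so a term query becomes a dictionary lookup and a range query becomes a binary search over the keys. This is a toy model under stated assumptions, not how Milvus implements the index; the class `ToyInvertedIndex` and its methods are hypothetical names used only here.

```python
import bisect
from collections import defaultdict


class ToyInvertedIndex:
    """Toy inverted index: value -> list of row ids, plus sorted distinct values."""

    def __init__(self, values):
        postings = defaultdict(list)
        for row_id, value in enumerate(values):
            postings[value].append(row_id)
        self.postings = dict(postings)
        self.keys = sorted(self.postings)

    def term(self, candidates):
        """Rows whose value is one of `candidates` (like `field in [...]`)."""
        out = []
        for candidate in set(candidates):
            out.extend(self.postings.get(candidate, []))
        return sorted(out)

    def range(self, low=None, high=None):
        """Rows with low < value < high; a bound of None means unbounded."""
        start = 0 if low is None else bisect.bisect_right(self.keys, low)
        end = len(self.keys) if high is None else bisect.bisect_left(self.keys, high)
        out = []
        for key in self.keys[start:end]:
            out.extend(self.postings[key])
        return sorted(out)


values = [4, 1, 7, 3, 1, 9]  # row id is the position in this list
idx = ToyInvertedIndex(values)

print(idx.term([1, 2, 3]))       # like 'int64 in [1, 2, 3]' -> [1, 3, 4]
print(idx.range(high=5))         # like 'int64 < 5'          -> [0, 1, 3, 4]
print(idx.range(low=7))          # like 'int64 > 7'          -> [5]
print(idx.range(low=1, high=5))  # like '1 < int64 < 5'      -> [0, 3]
```

Neither query scans the raw column: the work is proportional to the number of matching keys and rows rather than to the total row count.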

---------

Signed-off-by: longjiquan <jiquan.long@zilliz.com>
2023-12-31 19:50:47 +08:00
Bingyi Sun 36f69ea031
feat: integrate storagev2 in building index of segcore (#28768)
issue: https://github.com/milvus-io/milvus/issues/28655

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2023-12-05 16:48:54 +08:00
Enwei Jiao 8ae9c947ae
Use OpenDAL to access object store (#25642)
Signed-off-by: Enwei Jiao <enwei.jiao@zilliz.com>
2023-11-01 09:00:14 +08:00
Enwei Jiao 0f2f4a0a75
Remove useless parameters for Makefile (#27622)
Signed-off-by: Enwei Jiao <enwei.jiao@zilliz.com>
2023-10-11 20:45:35 +08:00
Enwei Jiao 4aed32ff61
Use librdkafka for all platform (#25538)
Signed-off-by: Enwei Jiao <enwei.jiao@zilliz.com>
2023-07-13 15:34:33 +08:00
yah01 60fdd7e4f4
Introduce simdjson (#23644)
Signed-off-by: yah01 <yang.cen@zilliz.com>
2023-04-26 10:30:34 +08:00
Cai Yudong ab3cbdfc61
Partial change to prepare for GPU index type support (#22591)
Signed-off-by: Yudong Cai <yudong.cai@zilliz.com>
2023-03-14 23:21:56 +08:00
Enwei Jiao b25b3ef431
Integration with Velox (#22102)
Signed-off-by: Enwei Jiao <enwei.jiao@zilliz.com>
2023-02-16 17:26:35 +08:00
jaime e05ac56283
Revert build arrow with Conan (#21258)
Signed-off-by: yun.zhang <yun.zhang@zilliz.com>
2022-12-16 18:27:24 +08:00
Enwei Jiao 958e94f6f0
Use Conan as c++ package manager (#19920)
Signed-off-by: Enwei Jiao <enwei.jiao@zilliz.com>
2022-11-23 10:39:11 +08:00
zhagnlu 05bbb12d30
Skip build aws sdk when not need diskann (#19965) (#19966)
Signed-off-by: zhagnlu <lu.zhang@zilliz.com>
Co-authored-by: zhagnlu <lu.zhang@zilliz.com>
2022-10-21 18:35:29 +08:00
mumon b48c37cb44
Fix typo in core (#19661)
Signed-off-by: Ziyu Wang <15871035978@163.com>
2022-10-12 17:05:23 +08:00
xige-16 428840178c
Support diskann index for vector field (#19093)
Signed-off-by: xige-16 <xi.ge@zilliz.com>
2022-09-21 20:16:51 +08:00
xige-16 4de1bfe5bc
Add cpp data codec (#18538)
Signed-off-by: xige-16 <xi.ge@zilliz.com>
Co-authored-by: zhagnlu <lu.zhang@zilliz.com>
2022-09-09 22:12:34 +08:00
bigsheeper cef8b1e7cc
Enable jemalloc (#18349)
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2022-07-20 22:22:31 +08:00
Enwei Jiao 16c3aedc15
refine compile configuration (#17502)
Signed-off-by: Enwei Jiao <enwei.jiao@zilliz.com>
2022-06-24 21:12:15 +08:00
bigsheeper 92d06b2e30
Purge memory by the memory state and try to purge after each search (#17565)
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2022-06-17 17:46:10 +08:00
bigsheeper cdcdfa1ea5
Disable jemalloc and use malloc_trim instead (#17538)
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2022-06-14 11:12:09 +08:00
bigsheeper 2d9a52206d
Use jemalloc in QueryNode, DataNode and IndexNode (#17470)
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2022-06-10 11:24:08 +08:00
Xiaofan 9579a645c6
Support compile marisa on Macos (#17261)
Signed-off-by: xiaofan-luan <xiaofan.luan@zilliz.com>
2022-05-31 13:28:02 +08:00
Enwei Jiao d28a2db46c
move arrow from storage to core (#17061)
Signed-off-by: Enwei Jiao <jiaoew2011@gmail.com>
2022-05-22 20:03:58 +08:00
Jiquan Long fd589baca7
Integrates marisa trie index (#16192)
Signed-off-by: dragondriver <jiquan.long@zilliz.com>
2022-04-01 15:31:29 +08:00
jaime 307a8ce535
Support compile and run on Mac (#15491)
Co-authored-by: jaime <yun.zhang@zilliz.com>
Co-authored-by: Cai Yudong <yudong.cai@zilliz.com>
Co-authored-by: Jenny Li <jing.li@zilliz.com>
Co-authored-by: Nemo <yuchen.gao@zilliz.com>
Signed-off-by: yun.zhang <yun.zhang@zilliz.com>
2022-02-09 14:27:46 +08:00
Cai Yudong 1ae249adb5
Update profiler CMakeLists.txt (#13001)
Signed-off-by: yudong.cai <yudong.cai@zilliz.com>
2021-12-08 23:23:06 +08:00
Cai Yudong 85efcd8582
Move fiu nlohmann and easylogging to core/thirdparty (#12981)
Signed-off-by: yudong.cai <yudong.cai@zilliz.com>
2021-12-08 18:45:07 +08:00
Cai Yudong 45bac3e4ec
Move profiler under core/thirdparty (#12949)
Signed-off-by: yudong.cai <yudong.cai@zilliz.com>
2021-12-08 13:01:04 +08:00
Cai Yudong 7b97b155e8
Remove duplicated thirdparty (#12925)
Signed-off-by: yudong.cai <yudong.cai@zilliz.com>
2021-12-08 10:37:37 +08:00
ZhiShen 6a770e5c38
Make knowhere compile independently (#7606)
Make knowhere compile independently

1. Make knowhere compile independently
    * Add gtest, arrow, and some other libraries to index.
    * Add cache, log and some other files to knowhere.
    * Add CMakeLists files to index's thirdparty.

2. Modified the compilation content of knowhere
    * Delete some content of compile library.
    * Add IMPORTED_GLOBAL property to faiss.

3. Change the compilation location of some libraries
    * Make OpenBLAS compiled in thirdpartycore.cmake.
    * Make faiss compiled in thirdparty/CMakeLists.

Change the content of knowhere/CMakeLists

1. Change easyloggingpp and nlohmann into index/thirdparty.
2. Change MILVUS_THIRDPARTY_SRC into KNOWHERE_THIRDPARTY_SRC.

Delete FindOpenBLAS

1. Delete Openblas.cmake.

2. The search task for openBlas is assigned to ThirdpartyCore.

3. Some changes were made to build.sh in index.

Fix the OpenBLAS compilation problem

Delete the if-else in the compilation of faiss.

Now, when compiling faiss, it will find OpenBLAS as intended.

Fix some problem:

1. delete arrow

2. set openblas_source to AUTO

3. change a include_dir

4. delete MKL

5. delete the CMakeLists in index/utils,cache,log

Change variable build_test to knowhere_build_test in index/build.sh

Change the include location of  GNUInstallDirs

set CMAKE_INSTALL_LIBDIR

Resolves: milvus-io#5183
See also: milvus-io#6604

Signed-off-by: Shen Zhi <m13120163046@163.com>
2021-10-13 17:06:33 +08:00
FluorineDog af1900b42a Remove ConcurrentBitsetPtr in segcore
Signed-off-by: FluorineDog <guilin.gou@zilliz.com>
2021-03-09 16:16:43 +08:00
FluorineDog ef98dab2a9 Support segcore config
Signed-off-by: FluorineDog <guilin.gou@zilliz.com>
2021-03-04 17:09:48 +08:00
xige-16 4c491471ee Add release collection and release partition interface for query node
Signed-off-by: xige-16 <xi.ge@zilliz.com>
2021-02-24 15:58:55 +08:00
xige-16 7a7a73e89c Fix high memory usage in pulsarTtStream
Signed-off-by: xige-16 <xi.ge@zilliz.com>
2021-02-23 11:40:12 +08:00
FluorineDog 15dd17488e Support benchmark
Signed-off-by: FluorineDog <guilin.gou@zilliz.com>
2021-02-23 10:47:21 +08:00
XuanYang-cn e6f726e73a Add cache for thirdparty files cache
Signed-off-by: XuanYang-cn <xuan.yang@zilliz.com>
2020-12-08 18:51:07 +08:00
quicksilver d66d48c6b6 Enable UnitTest
Signed-off-by: quicksilver <zhifeng.zhang@zilliz.com>
2020-10-27 15:51:16 +08:00
shengjh 77b2fcf015 Refactor manipulationreq and add tso(pre-alloc from master)
Signed-off-by: shengjh <1572099106@qq.com>
2020-10-27 12:01:27 +08:00
quicksilver eb64839aef Update build environment
Signed-off-by: quicksilver <zhifeng.zhang@zilliz.com>
2020-10-26 15:45:18 +08:00
FluorineDog e84b0180c9 Refactor cmake and build script and add timed benchmark
Signed-off-by: FluorineDog <guilin.gou@zilliz.com>
2020-10-23 18:01:24 +08:00
zhenshan.cao 64295db471 Refactor master and proxy and add etcdutil
Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>
2020-10-15 21:31:50 +08:00