issue: https://github.com/milvus-io/milvus/issues/27704
Add an inverted index for some data types in Milvus. Compared to loading
all data into RAM, this index type can save a lot of memory and speed up
term queries and range queries.
Supported: `INT8`, `INT16`, `INT32`, `INT64`, `FLOAT`, `DOUBLE`, `BOOL`
and `VARCHAR`.
Not supported: `ARRAY` and `JSON`.
Note:
- The inverted index for `VARCHAR` is not yet designed to serve full-text
search. Every row is treated as a single keyword instead of being
tokenized into multiple terms.
- The inverted index does not support retrieval well, so if you create an
inverted index on a field, operations that depend on the raw data will
fall back to chunk storage, which brings some performance loss. Examples
are comparisons between two columns and retrieval of output fields.
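To make the notes above concrete, here is a minimal pure-Python sketch of the idea behind an inverted index: each distinct value maps to the rows that contain it. It is illustrative only and does not reflect the actual Milvus implementation; `ToyInvertedIndex`, `term_query` and `range_query` are hypothetical names. String values are stored as one keyword each, so a term query matches whole values only.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


class ToyInvertedIndex:
    """Hypothetical toy index: maps each distinct value to the row ids holding it."""

    def __init__(self, values):
        postings = defaultdict(list)
        for row_id, value in enumerate(values):
            # The whole value is the key; strings are NOT tokenized.
            postings[value].append(row_id)
        self._postings = dict(postings)
        # Sorted distinct values allow range queries via binary search.
        self._keys = sorted(self._postings)

    def term_query(self, candidates):
        """Row ids whose value equals any candidate (like `field in [...]`)."""
        rows = []
        for value in candidates:
            rows.extend(self._postings.get(value, []))
        return sorted(rows)

    def range_query(self, lower, upper):
        """Row ids with lower < value < upper (both bounds exclusive)."""
        start = bisect_right(self._keys, lower)
        stop = bisect_left(self._keys, upper)
        rows = []
        for key in self._keys[start:stop]:
            rows.extend(self._postings[key])
        return sorted(rows)


ints = ToyInvertedIndex([3, 1, 4, 1, 5, 9, 2, 6])
print(ints.term_query([1, 2, 3]))   # rows whose value is 1, 2 or 3
print(ints.range_query(1, 5))       # rows with 1 < value < 5

words = ToyInvertedIndex(["hello world", "hello", "world"])
print(words.term_query(["hello"]))  # only the row whose whole value is "hello"
```

Note that the index answers "which rows match" without touching the raw column, which is why operations that need the raw values fall back to chunk storage.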
The inverted index is easy to use.
Take the collection below as an example:
```python
from pymilvus import Collection, CollectionSchema, DataType, FieldSchema

# Assumes a connection to Milvus has already been established.
dim = 128  # example vector dimension (assumed value)

fields = [
    FieldSchema(name="pk", dtype=DataType.VARCHAR, is_primary=True, auto_id=False, max_length=100),
    FieldSchema(name="int8", dtype=DataType.INT8),
    FieldSchema(name="int16", dtype=DataType.INT16),
    FieldSchema(name="int32", dtype=DataType.INT32),
    FieldSchema(name="int64", dtype=DataType.INT64),
    FieldSchema(name="float", dtype=DataType.FLOAT),
    FieldSchema(name="double", dtype=DataType.DOUBLE),
    FieldSchema(name="bool", dtype=DataType.BOOL),
    FieldSchema(name="varchar", dtype=DataType.VARCHAR, max_length=1000),
    FieldSchema(name="random", dtype=DataType.DOUBLE),
    FieldSchema(name="embeddings", dtype=DataType.FLOAT_VECTOR, dim=dim),
]
schema = CollectionSchema(fields)
collection = Collection("demo", schema)
```
Then create an inverted index on each field:
```python
index_type = "INVERTED"
collection.create_index("int8", {"index_type": index_type})
collection.create_index("int16", {"index_type": index_type})
collection.create_index("int32", {"index_type": index_type})
collection.create_index("int64", {"index_type": index_type})
collection.create_index("float", {"index_type": index_type})
collection.create_index("double", {"index_type": index_type})
collection.create_index("bool", {"index_type": index_type})
collection.create_index("varchar", {"index_type": index_type})
```
Term queries and range queries on these fields are then sped up
automatically by the inverted index:
```python
result = collection.query(expr='int64 in [1, 2, 3]', output_fields=["pk"])
result = collection.query(expr='int64 < 5', output_fields=["pk"])
result = collection.query(expr='int64 > 2997', output_fields=["pk"])
result = collection.query(expr='1 < int64 < 5', output_fields=["pk"])
```
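When the filter values come from application code, expression strings of the form shown above can be assembled programmatically. The sketch below uses two hypothetical helpers, `term_expr` and `range_expr`, which are not part of pymilvus; they only build strings for numeric fields and make no call to Milvus.

```python
def term_expr(field, values):
    """Build a term filter such as 'int64 in [1, 2, 3]' (hypothetical helper).

    Intended for numeric values only; string quoting is not handled here.
    """
    joined = ", ".join(str(v) for v in values)
    return f"{field} in [{joined}]"


def range_expr(field, lower=None, upper=None):
    """Build an exclusive range filter such as '1 < int64 < 5' (hypothetical helper)."""
    if lower is None and upper is None:
        raise ValueError("at least one bound is required")
    if lower is None:
        return f"{field} < {upper}"
    if upper is None:
        return f"{field} > {lower}"
    return f"{lower} < {field} < {upper}"


print(term_expr("int64", [1, 2, 3]))
print(range_expr("int64", upper=5))
print(range_expr("int64", lower=2997))
print(range_expr("int64", lower=1, upper=5))
```

The resulting string can be passed as `expr`, for example `collection.query(expr=term_expr("int64", [1, 2, 3]), output_fields=["pk"])`.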
---------
Signed-off-by: longjiquan <jiquan.long@zilliz.com>
Make knowhere compile independently
1. Make knowhere compile independently
* Add gtest, arrow, and some other libraries to index.
* Add cache, log and some other files to knowhere.
* Add CMakeLists files to index's thirdparty.
2. Modified the compilation content of knowhere
* Delete some content of compile library.
* Add IMPORTED_GLOBAL property to faiss.
3. Change the compilation location of some libraries
* Compile OpenBLAS in thirdpartycore.cmake.
* Compile faiss in thirdparty/CMakeLists.
Change the content of knowhere/CMakeLists
1. Move easyloggingpp and nlohmann into index/thirdparty.
2. Rename MILVUS_THIRDPARTY_SRC to KNOWHERE_THIRDPARTY_SRC.
Delete FindOpenBLAS
1. Delete Openblas.cmake.
2. Searching for OpenBLAS is now handled by ThirdpartyCore.
3. Some changes were made to build.sh in index.
Fix the OpenBLAS compilation problem
Delete the if-else in the compilation of faiss;
faiss now finds OpenBLAS as intended when it is compiled.
Fix some problems:
1. delete arrow
2. set openblas_source to AUTO
3. change a include_dir
4. delete MKL
5. delete the CMakeLists in index/utils, cache, log
Change variable build_test to knowhere_build_test in index/build.sh
Change the include location of GNUInstallDirs
set CMAKE_INSTALL_LIBDIR
Resolves: milvus-io#5183
See also: milvus-io#6604
Signed-off-by: Shen Zhi <m13120163046@163.com>