issue: https://github.com/milvus-io/milvus/issues/27576
# Main Goals
1. Create and describe collections with geospatial fields, enabling both
client and server to recognize and process geo fields.
2. Insert geospatial data as payload values in the insert binlog, and
print the values for verification.
3. Load segments containing geospatial data into memory.
4. Ensure query outputs can display geospatial data.
5. Support filtering on GIS functions for geospatial columns.
# Solution
1. **Add Type**: Modify the Milvus core by adding a Geospatial type in
both the C++ and Go code layers, defining the Geospatial data structure
and the corresponding interfaces.
2. **Dependency Libraries**: Introduce necessary geospatial data
processing libraries. In the C++ source code, use Conan package
management to include the GDAL library. In the Go source code, add the
go-geom library to the go.mod file.
3. **Protocol Interface**: Revise the Milvus protocol to provide
mechanisms for Geospatial message serialization and deserialization.
4. **Data Pipeline**: Exchange geospatial data between the client and
proxy in WKT format. The proxy converts all data into WKB format for
downstream processing (see the conversion sketch after this list),
providing column data interfaces, segment encapsulation, segment
loading, payload writing, and cache block management.
5. **Query Operators**: Implement simple display and support for filter
queries. Initially, focus on filtering a single geospatial column
against geospatial literal values by spatial relationship, and provide
parsing and execution for the corresponding query expressions.
6. **Client Modification**: Enable the client to handle user input for
geospatial data and support end-to-end testing. See the corresponding
changes in pymilvus.
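
A minimal sketch of the proxy-side WKT-to-WKB conversion using the go-geom library mentioned above; the exact package paths and function signatures may differ by go-geom version:

```go
package main

import (
	"encoding/binary"
	"fmt"

	"github.com/twpayne/go-geom/encoding/wkb"
	"github.com/twpayne/go-geom/encoding/wkt"
)

// wktToWKB parses a client-supplied WKT string and re-encodes it as WKB,
// the format used for downstream storage and query processing.
func wktToWKB(s string) ([]byte, error) {
	g, err := wkt.Unmarshal(s) // parse the WKT text into a geometry
	if err != nil {
		return nil, err
	}
	return wkb.Marshal(g, binary.LittleEndian) // serialize as little-endian WKB
}

func main() {
	payload, err := wktToWKB("POINT (30.0 -10.0)")
	if err != nil {
		panic(err)
	}
	fmt.Printf("WKB payload: %x\n", payload)
}
```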
---------
Signed-off-by: tasty-gumi <1021989072@qq.com>
Fix valid data not being appended when transferring to the insert
record, and add a small check for the groupBy field.
#35924
Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
issue: #34357
Go Parquet uses dictionary encoding by default, and it falls back to
plain encoding if the dictionary size exceeds the dictionary page size
limit. Users can specify a custom fallback encoding by passing
`parquet.WithEncoding(ENCODING_METHOD)` in the writer properties.
However, Go Parquet [falls back to plain
encoding](e65c1e295d/go/parquet/file/column_writer_types.gen.go.tmpl (L238))
rather than the custom encoding method users provide. Therefore, this
patch only turns off dictionary encoding for the primary key.
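
A minimal sketch of disabling dictionary encoding for just the primary-key column, assuming the apache/arrow Go parquet writer-properties API; the import path version and the column name `pk_field` are illustrative:

```go
package main

import (
	"github.com/apache/arrow/go/v12/parquet"
)

// buildWriterProps keeps dictionary encoding on by default but turns it off
// for the primary-key column, so mostly-unique keys are written with plain
// encoding instead of building a dictionary that overflows and falls back.
func buildWriterProps() *parquet.WriterProperties {
	return parquet.NewWriterProperties(
		parquet.WithDictionaryDefault(true),          // other columns keep dictionary encoding
		parquet.WithDictionaryFor("pk_field", false), // illustrative primary-key column name
	)
}

func main() {
	_ = buildWriterProps()
}
```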
With a 5-million auto-ID primary key benchmark, the Parquet file size
drops from 13.93 MB to 8.36 MB when dictionary encoding is turned off,
reducing primary key storage space by 40%.
Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
Add sparse float vector support to various Milvus components: the proxy
and data node receive and write sparse float vectors to the binlog, the
query node handles search requests, the index node builds indexes for
sparse float columns, and so on.
https://github.com/milvus-io/milvus/issues/29419
---------
Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
fix: #29757
In the previous code, `ColumnBasedInsertMsgToInsertData` added an empty
field if the insertMsg parameter did not have the column schema defined,
which could lead to unexpected behavior in caller functions.
This PR:
- Adds a check for missing columns (sketched after this list)
- Adds a column length check
- Generates BlobInfo for the `ColumnBasedInsertMsgToInsertData` result
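
A minimal sketch of the missing-column and length checks; the function and field names here are illustrative, not the actual Milvus internals:

```go
package main

import "fmt"

// validateColumns is an illustrative version of the added checks: every field
// declared in the schema must be present in the insert message, and every
// column must carry the expected number of rows.
func validateColumns(schemaFields []string, columns map[string][]any, expectedRows int) error {
	for _, name := range schemaFields {
		rows, ok := columns[name]
		if !ok {
			return fmt.Errorf("field %q is missing from the insert message", name)
		}
		if len(rows) != expectedRows {
			return fmt.Errorf("field %q has %d rows, expected %d", name, len(rows), expectedRows)
		}
	}
	return nil
}

func main() {
	cols := map[string][]any{"id": {1, 2, 3}, "vec": {0.1, 0.2, 0.3}}
	if err := validateColumns([]string{"id", "vec", "label"}, cols, 3); err != nil {
		fmt.Println("validation failed:", err)
	}
}
```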
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Benchmarked Milvus with https://github.com/qdrant/vector-db-benchmark
using the 'deep-image-96-angular' dataset. Perf profiling during the
'upload + index' stage of vector-db-benchmark shows the following hot
spots.
```
39.59%--github.com/milvus-io/milvus/internal/storage.MergeInsertData
|
|--21.43%--github.com/milvus-io/milvus/internal/storage.MergeFieldData
|  |
|  |--17.22%--runtime.memmove
|  |
|  |--1.53%--asm_exc_page_fault
|  ......
|
|--18.16%--runtime.memmove
|
|--1.66%--asm_exc_page_fault
......
```
The hot code path is in storage.MergeInsertData(), which updates
buffer.buffer by creating a new 'InsertData' instance and merging both
the old buffer.buffer and addedBuffer into it. The hot spots appear when
the Go runtime calls runtime.memmove to move buffer.buffer, which is
large (>1 MB).
To avoid this overhead, update storage.MergeInsertData() to append
addedBuffer to buffer.buffer instead of moving both buffer.buffer and
addedBuffer into a new 'InsertData'. This change removes the
'runtime.memmove' hot spots from the perf profiling output.
Additionally, the 'upload + index' time, one of the performance metrics
of vector-db-benchmark, is reduced by around 60% with this change.
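
A minimal sketch of the difference, using a plain float32 slice to stand in for a field buffer; the real storage.MergeInsertData operates on typed FieldData implementations:

```go
package main

import "fmt"

// mergeByCopy mirrors the old behavior: both the existing buffer and the
// added rows are copied into a freshly allocated slice, so the large existing
// buffer is moved (runtime.memmove) on every merge.
func mergeByCopy(buffer, added []float32) []float32 {
	merged := make([]float32, 0, len(buffer)+len(added))
	merged = append(merged, buffer...) // copies the whole existing buffer
	merged = append(merged, added...)
	return merged
}

// mergeByAppend mirrors the new behavior: only the added rows are appended
// onto the existing buffer, so the existing data is not re-copied unless the
// slice has to grow.
func mergeByAppend(buffer, added []float32) []float32 {
	return append(buffer, added...)
}

func main() {
	buf := []float32{1, 2, 3}
	add := []float32{4, 5}
	fmt.Println(mergeByCopy(buf, add))
	fmt.Println(mergeByAppend(buf, add))
}
```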
Signed-off-by: Cathy Zhang <cathy.zhang@intel.com>