issue: #43427
pr: #37417
Support R-Tree index for geometry datatype.
---------
Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
Co-authored-by: ZhuXi <150327960+Yinwei-Yu@users.noreply.github.com>
issue: #43427
pr: #37417
This pr's main goal is merge #37417 to milvus 2.5 without conflicts.
# Main Goals
1. Create and describe collections with geospatial type
2. Insert geospatial data into the insert binlog
3. Load segments containing geospatial data into memory
4. Enable query and search can display geospatial data
5. Support using GIS funtions like ST_EQUALS in query
# Solution
1. **Add Type**: Modify the Milvus core by adding a Geospatial type in
both the C++ and Go code layers, defining the Geospatial data structure
and the corresponding interfaces.
2. **Dependency Libraries**: Introduce necessary geospatial data
processing libraries. In the C++ source code, use Conan package
management to include the GDAL library. In the Go source code, add the
go-geom library to the go.mod file.
3. **Protocol Interface**: Revise the Milvus protocol to provide
mechanisms for Geospatial message serialization and deserialization.
4. **Data Pipeline**: Facilitate interaction between the client and
proxy using the WKT format for geospatial data. The proxy will convert
all data into WKB format for downstream processing, providing column
data interfaces, segment encapsulation, segment loading, payload
writing, and cache block management.
5. **Query Operators**: Implement simple display and support for filter
queries. Initially, focus on filtering based on spatial relationships
for a single column of geospatial literal values, providing parsing and
execution for query expressions.Now only support brutal search
6. **Client Modification**: Enable the client to handle user input for
geospatial data and facilitate end-to-end testing.Check the modification
in pymilvus.
---------
Signed-off-by: Yinwei Li <yinwei.li@zilliz.com>
Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
Co-authored-by: cai.zhang <cai.zhang@zilliz.com>
pr: #43833
Signed-off-by: Alexandr Guzhva <alexanderguzhva@gmail.com>
Signed-off-by: Li Liu <li.liu@zilliz.com>
Co-authored-by: Alexander Guzhva <alexanderguzhva@gmail.com>
issue: #43040
pr: #42665
This patch introduces a disk file writer that supports Direct IO.
Currently, it is exclusively utilized during the QueryNode load process.
Below is its parameters:
1. `common.diskWriteMode` This parameter controls the write mode of the
local disk, which is used to write temporary data downloaded from remote
storage. Currently, only QueryNode uses 'common.diskWrite*' parameters.
Support for other components will be added in the future.
The options include 'direct' and 'buffered'. The default value is
'buffered'.
2. `common.diskWriteBufferSizeKb` Disk write buffer size in KB, only
used when disk write mode is 'direct', default is 64KB.
Current valid range is [4, 65536]. If the value is not aligned to 4KB,
it will be rounded up to the nearest multiple of 4KB.
3. `common.diskWriteNumThreads` This parameter controls the number of
writer threads used for disk write operations. The valid range is [0,
hardware_concurrency]. It is designed to limit the maximum concurrency
of disk write operations to reduce the impact on disk read performance.
For example, if you want to limit the maximum concurrency of disk write
operations to 1, you can set this parameter to 1.
The default value is 0, which means the caller will perform write
operations directly without using an additional writer thread pool. In
this case, the maximum concurrency of disk write operations is
determined by the caller's thread pool size.
Both parameters can be updated during runtime.
---------
Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>
Cherry-pick from master
pr: #43321
Related to #43262
This patch fixes following logic bug:
- When multiple chunks are loaded and size cannot be divided by 8, just
appending uint8_t as bitmap will cause null bitmap dislocation
- `null_bitmap_data()` points to start of whole row group, which may not
stand for current `arrow::Array`
The current solutions is:
- Reorganize the null_bitmap with currect size & offset
- Pass `array->offset()` in tuple to info the current offset
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #42900
pr: #43179
Upd: also handles Inf and NaN values, and the division by zero case for
fp32 and fp64
Signed-off-by: Alexandr Guzhva <alexanderguzhva@gmail.com>
related: #42803
1. add a new thread pools using folly::CPUThreadPoolExecutor, named by
FThreadPools
2. reading vectors from chunkcache will use the separated
CHUNKCACHE_POOL to avoid being influenced by load collection
3. Note. For safety on cloud side on 2.5.x, only read-chunk-cache
operations is using this newly created thread pools other caller points
for threadpool will be mutated in the near future
4. master-branch doesn't need this pr as caching layer unified the chunk
cache behaviour
Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>