issue: https://github.com/milvus-io/milvus/issues/27576
# Main Goals
1. Create and describe collections with geospatial fields, enabling both
client and server to recognize and process geo fields.
2. Insert geospatial data as payload values in the insert binlog, and
print the values for verification.
3. Load segments containing geospatial data into memory.
4. Ensure query outputs can display geospatial data.
5. Support filtering on GIS functions for geospatial columns.
# Solution
1. **Add Type**: Modify the Milvus core by adding a Geospatial type in
both the C++ and Go code layers, defining the Geospatial data structure
and the corresponding interfaces.
2. **Dependency Libraries**: Introduce necessary geospatial data
processing libraries. In the C++ source code, use Conan package
management to include the GDAL library. In the Go source code, add the
go-geom library to the go.mod file.
3. **Protocol Interface**: Revise the Milvus protocol to provide
mechanisms for Geospatial message serialization and deserialization.
4. **Data Pipeline**: Exchange geospatial data between the client and
proxy in WKT format. The proxy converts all data into WKB format for
downstream processing (see the conversion sketch after this list),
providing column data interfaces, segment encapsulation, segment
loading, payload writing, and cache block management.
5. **Query Operators**: Implement simple display and support for filter
queries. Initially, focus on filtering a single geospatial column
against geospatial literal values by spatial relationship, and provide
parsing and execution for the corresponding query expressions.
6. **Client Modification**: Enable the client to handle user input for
geospatial data and support end-to-end testing. See the corresponding
changes in pymilvus.
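
A minimal sketch of the proxy-side WKT-to-WKB conversion using the go-geom library mentioned above; the exact package paths and function signatures may differ by go-geom version:

```go
package main

import (
	"encoding/binary"
	"fmt"

	"github.com/twpayne/go-geom/encoding/wkb"
	"github.com/twpayne/go-geom/encoding/wkt"
)

// wktToWKB parses a client-supplied WKT string and re-encodes it as WKB,
// the format used for downstream storage and query processing.
func wktToWKB(s string) ([]byte, error) {
	g, err := wkt.Unmarshal(s) // parse the WKT text into a geometry
	if err != nil {
		return nil, err
	}
	return wkb.Marshal(g, binary.LittleEndian) // serialize as little-endian WKB
}

func main() {
	payload, err := wktToWKB("POINT (30.0 -10.0)")
	if err != nil {
		panic(err)
	}
	fmt.Printf("WKB payload: %x\n", payload)
}
```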
---------
Signed-off-by: tasty-gumi <1021989072@qq.com>
Fix valid data not being appended when transferring to the insert
record, and add a small check for the groupBy field.
#35924
Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
issue: #34357
Go Parquet uses dictionary encoding by default, and it falls back to
plain encoding if the dictionary size exceeds the dictionary page size
limit. Users can specify a custom fallback encoding by passing
`parquet.WithEncoding(ENCODING_METHOD)` in the writer properties.
However, Go Parquet [falls back to plain
encoding](e65c1e295d/go/parquet/file/column_writer_types.gen.go.tmpl (L238))
rather than the custom encoding method users provide. Therefore, this
patch only turns off dictionary encoding for the primary key.
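
A minimal sketch of disabling dictionary encoding for just the primary-key column, assuming the apache/arrow Go parquet writer-properties API; the import path version and the column name `pk_field` are illustrative:

```go
package main

import (
	"github.com/apache/arrow/go/v12/parquet"
)

// buildWriterProps keeps dictionary encoding on by default but turns it off
// for the primary-key column, so mostly-unique keys are written with plain
// encoding instead of building a dictionary that overflows and falls back.
func buildWriterProps() *parquet.WriterProperties {
	return parquet.NewWriterProperties(
		parquet.WithDictionaryDefault(true),          // other columns keep dictionary encoding
		parquet.WithDictionaryFor("pk_field", false), // illustrative primary-key column name
	)
}

func main() {
	_ = buildWriterProps()
}
```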
With a 5-million auto-ID primary key benchmark, the Parquet file size
drops from 13.93 MB to 8.36 MB when dictionary encoding is turned off,
reducing primary key storage space by 40%.
Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
Add sparse float vector support to various Milvus components: the proxy
and data node receive and write sparse float vectors to the binlog, the
query node handles search requests, the index node builds indexes for
sparse float columns, and so on.
https://github.com/milvus-io/milvus/issues/29419
---------
Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
fix: #29757
In the previous code, `ColumnBasedInsertMsgToInsertData` added an empty
field if the insertMsg parameter did not have the column schema defined,
which could lead to unexpected behavior in caller functions.
This PR:
- Adds a check for missing columns (sketched after this list)
- Adds a column length check
- Generates BlobInfo for the `ColumnBasedInsertMsgToInsertData` result
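
A minimal sketch of the missing-column and length checks; the function and field names here are illustrative, not the actual Milvus internals:

```go
package main

import "fmt"

// validateColumns is an illustrative version of the added checks: every field
// declared in the schema must be present in the insert message, and every
// column must carry the expected number of rows.
func validateColumns(schemaFields []string, columns map[string][]any, expectedRows int) error {
	for _, name := range schemaFields {
		rows, ok := columns[name]
		if !ok {
			return fmt.Errorf("field %q is missing from the insert message", name)
		}
		if len(rows) != expectedRows {
			return fmt.Errorf("field %q has %d rows, expected %d", name, len(rows), expectedRows)
		}
	}
	return nil
}

func main() {
	cols := map[string][]any{"id": {1, 2, 3}, "vec": {0.1, 0.2, 0.3}}
	if err := validateColumns([]string{"id", "vec", "label"}, cols, 3); err != nil {
		fmt.Println("validation failed:", err)
	}
}
```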
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Benchmarked Milvus with https://github.com/qdrant/vector-db-benchmark
using the 'deep-image-96-angular' dataset. Perf profiling during the
'upload + index' stage of vector-db-benchmark shows the following hot
spots.
```
39.59%--github.com/milvus-io/milvus/internal/storage.MergeInsertData
|
|--21.43%--github.com/milvus-io/milvus/internal/storage.MergeFieldData
|  |
|  |--17.22%--runtime.memmove
|  |
|  |--1.53%--asm_exc_page_fault
|  ......
|
|--18.16%--runtime.memmove
|
|--1.66%--asm_exc_page_fault
......
```
The hot code path is in storage.MergeInsertData(), which updates
buffer.buffer by creating a new 'InsertData' instance and merging both
the old buffer.buffer and addedBuffer into it. The hot spots appear when
the Go runtime calls runtime.memmove to move buffer.buffer, which is
large (>1 MB).
To avoid this overhead, update storage.MergeInsertData() to append
addedBuffer to buffer.buffer instead of moving both buffer.buffer and
addedBuffer into a new 'InsertData'. This change removes the
'runtime.memmove' hot spots from the perf profiling output.
Additionally, the 'upload + index' time, one of the performance metrics
of vector-db-benchmark, is reduced by around 60% with this change.
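
A minimal sketch of the difference, using a plain float32 slice to stand in for a field buffer; the real storage.MergeInsertData operates on typed FieldData implementations:

```go
package main

import "fmt"

// mergeByCopy mirrors the old behavior: both the existing buffer and the
// added rows are copied into a freshly allocated slice, so the large existing
// buffer is moved (runtime.memmove) on every merge.
func mergeByCopy(buffer, added []float32) []float32 {
	merged := make([]float32, 0, len(buffer)+len(added))
	merged = append(merged, buffer...) // copies the whole existing buffer
	merged = append(merged, added...)
	return merged
}

// mergeByAppend mirrors the new behavior: only the added rows are appended
// onto the existing buffer, so the existing data is not re-copied unless the
// slice has to grow.
func mergeByAppend(buffer, added []float32) []float32 {
	return append(buffer, added...)
}

func main() {
	buf := []float32{1, 2, 3}
	add := []float32{4, 5}
	fmt.Println(mergeByCopy(buf, add))
	fmt.Println(mergeByAppend(buf, add))
}
```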
Signed-off-by: Cathy Zhang <cathy.zhang@intel.com>