milvus

Author	SHA1	Message	Date
yihao.dai	ad950368fe	enhance: Fix parquet import OOM (#43756 ) Each ColumnReader consumes ReaderProperties.BufferSize memory independently. Therefore, the bufferSize should be divided by the number of columns to ensure total memory usage stays within the intended limit. issue: https://github.com/milvus-io/milvus/issues/43755 Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2025-08-08 18:57:40 +08:00
wei liu	46dfe260da	enhance: Add timestamp filtering support to L0Reader (#43747 ) issue: #43745 Add timestamp filtering capability to L0Reader to match the functionality available in the regular Reader. This enhancement allows filtering delete records based on timestamp range during L0 import operations. Changes include: - Add tsStart and tsEnd fields to l0Reader struct for timestamp filtering - Modify NewL0Reader function signature to accept tsStart and tsEnd parameters - Implement timestamp filtering logic in Read method to skip records outside the specified range - Update L0ImportTask and L0PreImportTask to parse timestamp parameters from request options and pass them to NewL0Reader - Add comprehensive test case TestL0Reader_ReadWithTsFilter to verify ts filtering functionality using mockey framework Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-08-06 16:49:39 +08:00
yihao.dai	a29b3272b0	fix: Improve import memory management to prevent OOM (#43568 ) 1. Use blocking memory allocation to wait until memory becomes available 2. Perform memory allocation at the file level instead of per task 3. Limit Parquet file reader batch size to prevent excessive memory consumption 4. Limit import buffer size from 20% to 10% of total memory issue: https://github.com/milvus-io/milvus/issues/43387, https://github.com/milvus-io/milvus/issues/43131 --------- Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2025-07-28 21:25:35 +08:00
Spade A	faeb7fd410	feat: impl StructArray -- create schema, insert, and retrieve data (#42855 ) Ref https://github.com/milvus-io/milvus/issues/42148 https://github.com/milvus-io/milvus/pull/42406 implements the segcore part of storage for handling VectorArray. This PR: 1. implements the Go part of storage for VectorArray 2. implements collection creation with StructArrayField and VectorArray 3. inserts and retrieves data from the collection. --------- Signed-off-by: SpadeA <tangchenjie1210@gmail.com> Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com> Signed-off-by: SpadeA-Tang <u6748471@anu.edu.au>	2025-07-27 01:30:55 +08:00
Ted Xu	9041bf1b9a	fix: including shouldCopy parameter in file readers (#43578 ) This parameter determines whether the returned value should be a copy or a reference from the arrow array. The updates enhance memory management and provide more control over data handling during deserialization. See #43186 --------- Signed-off-by: Ted Xu <ted.xu@zilliz.com>	2025-07-26 17:30:55 +08:00
yihao.dai	9fbd41a97d	fix: Adjust binlog and parquet reader buffer size for import (#43495 ) 1. Modify the binlog reader to stop reading a fixed 4096 rows and instead use the calculated bufferSize to avoid generating small binlogs. 2. Use a fixed bufferSize (32MB) for the Parquet reader to prevent OOM. issue: https://github.com/milvus-io/milvus/issues/43387 --------- Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2025-07-23 21:28:54 +08:00
yihao.dai	1984be646c	fix: Fix storagev2 binlog import (#43221 ) issue: https://github.com/milvus-io/milvus/issues/43218 --------- Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2025-07-13 22:52:49 +08:00
congqixia	5a9efb3f81	enhance: [StorageV2] Refine storage rw option usage & validation (#43175 ) Related to #39173 This PR: - Make all datanode tasks pass storage config via the storage config option - Remove legacy comments, rootPath & bucketName parameters - Fix clustering compaction option behavior - Add validation logic for `rwOptions` - Use correct storageType from storageConfig - Add storage config in sync task --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-07-11 01:14:48 +08:00
groot	1ee8cea35b	enhance: bulkinsert handle nullable/defaultValue/functionOutput fields (#42956 ) issue: https://github.com/milvus-io/milvus/issues/42173 Signed-off-by: yhmo <yihua.mo@zilliz.com>	2025-07-04 14:20:44 +08:00
cai.zhang	ebe1c95bb1	enhance: Add Size interface to FileReader to eliminate the StatObject call during Read (#42908 ) issue: #42907 --------- Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>	2025-06-25 14:36:41 +08:00
Zhen Ye	43f0c56ce7	fix: limit the concurrency of zstd compression and decrease the memory usage of binlog generation (#42630 ) issue: #42028 - limit the concurrency of zstd compression. - zstd.go modified from `github.com/apache/arrow/go/v17/parquet/compress/ztsd.go` - may be related to #42129 Signed-off-by: chyezh <chyezh@outlook.com>	2025-06-11 09:06:34 +08:00
groot	14563ad2b3	enhance: bulkinsert handles nullable/default (#42127 ) issue: https://github.com/milvus-io/milvus/issues/42096, https://github.com/milvus-io/milvus/issues/42130 Signed-off-by: yhmo <yihua.mo@zilliz.com>	2025-05-28 18:02:28 +08:00
Ted Xu	7660be0993	feat: bulk insert support storage v2 (#41843 ) See #39173 --------- Signed-off-by: Ted Xu <ted.xu@zilliz.com>	2025-05-19 10:34:24 +08:00
yihao.dai	6c1a37fca1	fix: Fix import reader goroutine leak (#41869 ) Close the chunk manager's reader after the import completes to prevent goroutine leaks. issues: https://github.com/milvus-io/milvus/issues/41868 --------- Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2025-05-16 10:18:35 +08:00
yihao.dai	71b14fc32b	enhance: Skip disk quota check for l0 import (#41571 ) issue: https://github.com/milvus-io/milvus/issues/41569 Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2025-04-29 10:46:54 +08:00
yihao.dai	16eb5eb921	enhance: Accelerate delete filtering during binlog import (#41551 ) Use map for deleteData instead of slice to accelerate delete filtering during binlog import. issue: https://github.com/milvus-io/milvus/issues/41550 --------- Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2025-04-27 18:56:38 +08:00
SimFG	91d40fa558	fix: Update logging context and upgrade dependencies (#41318 ) - issue: #41291 --------- Signed-off-by: SimFG <bang.fu@zilliz.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2025-04-23 10:52:38 +08:00
yihao.dai	b4cb8a4b13	enhance: Add UTF-8 string validation for import (#40694 ) issue: https://github.com/milvus-io/milvus/issues/40684 --------- Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2025-04-01 19:04:21 +08:00
groot	aae3a3598e	enhance: bulkinsert supports parsing sparse vector from parquet struct (#40927 ) issue: https://github.com/milvus-io/milvus/issues/40777 Signed-off-by: yhmo <yihua.mo@zilliz.com>	2025-03-31 14:20:30 +08:00
Buqian Zheng	03b63bf982	fix: use NewInsertDataWithFunctionOutputField when importing binlog file (#40741 ) issue: https://github.com/milvus-io/milvus/issues/40740 Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>	2025-03-19 10:50:14 +08:00
groot	9fbfcda48e	fix: Fix a crash issue of bulkinsert (#40331 ) issue: https://github.com/milvus-io/milvus/issues/40291 pr: https://github.com/milvus-io/milvus/pull/40304 Signed-off-by: yhmo <yihua.mo@zilliz.com>	2025-03-14 18:14:07 +08:00
yihao.dai	bab30a41bf	enhance: Improve import error msgs (#40567 ) issue: https://github.com/milvus-io/milvus/issues/40208 Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2025-03-13 21:02:07 +08:00
Ted Xu	df4285c9ef	enhance: API integration with storage v2 in clustering-compactions (#40133 ) See #39173 --------- Signed-off-by: Ted Xu <ted.xu@zilliz.com>	2025-03-13 14:12:06 +08:00
Xiaofan	fb48b3c7ac	fix: empty sparse row in importer (#40585 ) fix #40584: the parquet bulk writer cannot finish a 0-dim sparse vector. Signed-off-by: xiaofanluan <xiaofan.luan@zilliz.com>	2025-03-13 01:29:41 +08:00
jaime	c8a96377bb	enhance: move object storage client creation to pkg package (#40440 ) issue: #40439 Signed-off-by: jaime <yun.zhang@zilliz.com>	2025-03-12 20:38:07 +08:00
yihao.dai	2ca2e2dbc8	fix: Fix parsing import endTs (#40332 ) Parsing import beginTs, endTs as a hybrid timestamp. issue: https://github.com/milvus-io/milvus/issues/40326 --------- Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2025-03-10 17:38:04 +08:00
sthuang	90acc8a58f	enhance: upgrade go arrow version from 12.0.1 to 17.0.0 (#39916 ) related: https://github.com/milvus-io/milvus/issues/39915 Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>	2025-02-25 10:30:02 +08:00
congqixia	cb7f2fa6fd	enhance: Use v2 package name for pkg module (#39990 ) Related to #39095 https://go.dev/doc/modules/version-numbers Update pkg version according to golang dep version convention --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-02-22 23:15:58 +08:00
Cai Yudong	7476eb3625	feat: Support bulk insert for Int8Vector (#39499 ) Issue: #38666 Signed-off-by: Cai Yudong <yudong.cai@zilliz.com>	2025-01-23 10:19:06 +08:00
Zhen Ye	bb8d1ab3bf	enhance: make new go package to manage proto (#39114 ) issue: #39095 --------- Signed-off-by: chyezh <chyezh@outlook.com>	2025-01-10 10:49:01 +08:00
smellthemoon	92a2d608ac	fix: Bulk insert failed when the nullable/default_value field does not exist (#39063 ) #39036 Signed-off-by: lixinguo <xinguo.li@zilliz.com> Co-authored-by: lixinguo <xinguo.li@zilliz.com>	2025-01-09 19:27:03 +08:00
yihao.dai	2b53b0905e	fix: Fix 0 read count during import (#38694 ) issue: https://github.com/milvus-io/milvus/issues/38693 Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2024-12-24 20:06:49 +08:00
congqixia	b0bd290a6e	enhance: Use internal json(sonic) to replace std json lib (#37708 ) Related to #35020 Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-11-18 10:46:31 +08:00
zhenshan.cao	63843dce33	fix: Fix conan gdal building problem (#37338 ) issue: https://github.com/milvus-io/milvus/issues/27576 Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>	2024-10-31 21:04:16 +08:00
Hao Tan	67c4340565	feat: Geospatial Data Type and GIS Function Support for milvus server (#35990 ) issue: https://github.com/milvus-io/milvus/issues/27576 # Main Goals 1. Create and describe collections with geospatial fields, enabling both client and server to recognize and process geo fields. 2. Insert geospatial data as payload values in the insert binlog, and print the values for verification. 3. Load segments containing geospatial data into memory. 4. Ensure query outputs can display geospatial data. 5. Support filtering on GIS functions for geospatial columns. # Solution 1. Add Type: Modify the Milvus core by adding a Geospatial type in both the C++ and Go code layers, defining the Geospatial data structure and the corresponding interfaces. 2. Dependency Libraries: Introduce necessary geospatial data processing libraries. In the C++ source code, use Conan package management to include the GDAL library. In the Go source code, add the go-geom library to the go.mod file. 3. Protocol Interface: Revise the Milvus protocol to provide mechanisms for Geospatial message serialization and deserialization. 4. Data Pipeline: Facilitate interaction between the client and proxy using the WKT format for geospatial data. The proxy will convert all data into WKB format for downstream processing, providing column data interfaces, segment encapsulation, segment loading, payload writing, and cache block management. 5. Query Operators: Implement simple display and support for filter queries. Initially, focus on filtering based on spatial relationships for a single column of geospatial literal values, providing parsing and execution for query expressions. 6. Client Modification: Enable the client to handle user input for geospatial data and facilitate end-to-end testing. Check the modifications in pymilvus. --------- Signed-off-by: tasty-gumi <1021989072@qq.com>	2024-10-31 20:58:20 +08:00
yihao.dai	b45cf2d49f	enhance: Add max length check for csv import (#37077 ) 1. Add max length check for csv import. 2. Tidy import options. 3. Tidy common import util functions. issue: https://github.com/milvus-io/milvus/issues/34150 --------- Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2024-10-25 14:37:29 +08:00
OxalisCu	60e51f1076	fix: unicode replacement character (0xFFFD) is not supported as csv delimiter (#36310 ) https://github.com/milvus-io/milvus/issues/36309 Signed-off-by: OxalisCu <2127298698@qq.com>	2024-10-17 14:45:40 +08:00
smellthemoon	463c47ced1	enhance: support default value in import (#36700 ) https://github.com/milvus-io/milvus/issues/31728 Signed-off-by: lixinguo <xinguo.li@zilliz.com> Co-authored-by: lixinguo <xinguo.li@zilliz.com>	2024-10-17 12:05:24 +08:00
Buqian Zheng	82c5cf2fa2	feat: add bulk insert support for Functions (#36715 ) issue: https://github.com/milvus-io/milvus/issues/35853 and https://github.com/milvus-io/milvus/issues/35856 Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>	2024-10-12 17:19:20 +08:00
smellthemoon	b60164b882	enhance: support null in bulk insert of binlog to help backup null (#36526 ) https://github.com/milvus-io/milvus/issues/36341 Signed-off-by: lixinguo <xinguo.li@zilliz.com> Co-authored-by: lixinguo <xinguo.li@zilliz.com>	2024-09-26 14:35:14 +08:00
smellthemoon	89397d1e66	enhance: adjust parquet reader type check with null type (#36266 ) #36252 Remove the unneeded type check, so that parquet files written with a null-type writer can be imported successfully. Signed-off-by: lixinguo <xinguo.li@zilliz.com> Co-authored-by: lixinguo <xinguo.li@zilliz.com>	2024-09-19 18:43:10 +08:00
smellthemoon	fc1bdd4c84	fix: to forbid bulk insert with nullable field in numpy files (#36246 ) #36241 Signed-off-by: lixinguo <xinguo.li@zilliz.com> Co-authored-by: lixinguo <xinguo.li@zilliz.com>	2024-09-14 15:35:07 +08:00
OxalisCu	3a381bc247	enhance: Bulkinsert supports null in csv formats (#35912 ) see details in this issue https://github.com/milvus-io/milvus/issues/35911 --------- Signed-off-by: OxalisCu <2127298698@qq.com>	2024-09-09 19:17:07 +08:00
cai.zhang	2c9bb4dfa3	feat: Support stats task to sort segment by PK (#35054 ) issue: #33744 This PR includes the following changes: 1. Added a new task type to the task scheduler in datacoord: stats task, which sorts segments by primary key. 2. Implemented segment sorting in indexnode. 3. Added a new field `FieldStatsLog` to SegmentInfo to store token index information. --------- Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>	2024-09-02 14:19:03 +08:00
Xiaofan	50fcfe8ef1	enhance: add nan and inf check (#35683 ) fix #35594 add float check on files Signed-off-by: xiaofanluan <xiaofan.luan@zilliz.com>	2024-08-25 15:22:57 +08:00
OxalisCu	ed4eaffc9d	enhance: add csv support for bulkinsert (#34938 ) See this issue for details: #34937 --------- Signed-off-by: OxalisCu <2127298698@qq.com>	2024-08-21 17:47:01 +08:00
Ted Xu	41646c8439	feat: integrate new deltalog format (#35522 ) See #34123 --------- Signed-off-by: Ted Xu <ted.xu@zilliz.com>	2024-08-20 19:06:56 +08:00
smellthemoon	80a7c78f28	enhance: import supports null in parquet and json formats (#35558 ) #31728 --------- Signed-off-by: lixinguo <xinguo.li@zilliz.com> Co-authored-by: lixinguo <xinguo.li@zilliz.com>	2024-08-20 16:50:55 +08:00
nish112022	3948bd4e79	fix: Added check for validating varchar,array max length (#35499 ) issue : https://github.com/milvus-io/milvus/issues/34150 This is for numpy,parquet,json readers. --------- Signed-off-by: Nischay Yadav <nischay.yadav@ibm.com>	2024-08-20 11:42:55 +08:00
yihao.dai	b71e058bc5	enhance: Add import option to skip disk quota check (#35274 ) Add an option to skip the disk quota check for backup-restore import. issue: https://github.com/milvus-io/milvus/issues/33775 Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2024-08-05 16:40:16 +08:00
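
The buffer-size reasoning in ad950368fe (#43756) can be illustrated with a minimal Go sketch. It is not the Milvus implementation: the function name `perColumnBufferSize` is hypothetical, and the 32 MB budget is only borrowed from the figure mentioned in 9fbd41a97d (#43495). It shows the arithmetic described in the commit message: if every column reader allocates its own buffer, the total budget has to be divided by the number of columns to keep overall usage within the limit.

```go
package main

import "fmt"

// perColumnBufferSize is a hypothetical helper (not a Milvus or Arrow API).
// It splits a total read-buffer budget evenly across column readers, since
// each reader is assumed to allocate its own buffer of the returned size.
func perColumnBufferSize(totalBudget int64, numColumns int) int64 {
	if numColumns <= 0 {
		// Nothing to divide by; fall back to the whole budget.
		return totalBudget
	}
	per := totalBudget / int64(numColumns)
	if per < 1 {
		// Keep the buffer size positive even for very wide schemas.
		per = 1
	}
	return per
}

func main() {
	const totalBudget = int64(32 << 20) // assumed 32 MB overall budget

	for _, cols := range []int{1, 8, 64} {
		per := perColumnBufferSize(totalBudget, cols)
		// Total usage is per-column size times the number of columns,
		// which never exceeds the budget because of integer division.
		fmt.Printf("columns=%d perColumn=%d total=%d\n", cols, per, per*int64(cols))
	}
}
```

Without the division, a file with 64 columns would reserve 64 times the intended budget, which matches the OOM symptom the commit describes.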

85 Commits (master)