Commit Graph

22 Commits (99da94ee3247e58580ac6eaaf4d5b61e48541ef2)

Author SHA1 Message Date
wayblink a26e965e6a
enhance:[cherry-pick] Add compaction task slot usage logic (#34625)
issue: #34544
pr: #34581

---------

Signed-off-by: wayblink <anyang.wang@zilliz.com>
2024-07-18 09:55:43 +08:00
aoiasd 07daa8f12b
enhance:[Cherry-pick] avoid maintain checkpoint info in sync manager (#33413) (#34285)
relate: https://github.com/milvus-io/milvus/issues/32915
pr: https://github.com/milvus-io/milvus/pull/33413

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2024-07-03 19:02:09 +08:00
congqixia f741bb7526
enhance: [2.4] Avoid merging insert data when buffering insert msgs (#34205)
Cherry-pick from master
pr: #33526 #33817
See also #33561

This PR:
- Use zero copy when buffering insert messages
- Make `storage.InsertCodec` support serialize multiple insert data
chunk into same batch binlog files

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-06-27 10:14:05 +08:00
congqixia 3cf526e0cc
fix: [2.4] Deep copy ImportTask.segmentsInfo to prevent data race (#34090) (#34126)
Cherry-pick from master
pr: #34090
See also #34089

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-06-26 11:50:11 +08:00
yihao.dai b71a404776
fix: Check if the import job exists (#33672) (#33673)
issue: https://github.com/milvus-io/milvus/issues/33671

pr: https://github.com/milvus-io/milvus/pull/33672

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-06-10 21:50:29 +08:00
yihao.dai ed1dee9e38
enhance: Support L0 import (#33514) (#33712)
issue: https://github.com/milvus-io/milvus/issues/33157

pr: https://github.com/milvus-io/milvus/pull/33514

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-06-08 11:17:52 +08:00
yihao.dai e81ae1e5a4
fix: Fix import segment size is uneven (#33605) (#33634)
The data coordinator computed the appropriate number of import segments,
thus when importing in the data node, one can randomly select a segment.

issue: https://github.com/milvus-io/milvus/issues/33604

pr: https://github.com/milvus-io/milvus/pull/33605

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-06-05 18:49:52 +08:00
yihao.dai e282e1408e
enhance: Abstract Execute interface for import/preimport task (#33234) (#33607)
Abstract Execute interface for import/preimport task, simplify import
scheduler.

issue: https://github.com/milvus-io/milvus/issues/33157

pr: https://github.com/milvus-io/milvus/pull/33234

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-06-05 11:17:56 +08:00
congqixia 2c1e8f4774
enhance: Use `struct{}` for sync task future result (#32673)
Related to #27675

Use `struct{}` instead `error` for sync task future result type to
reduce result size and preventing logci error.

Also change some unused parameter to `_` to suppress lint warning

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-04-29 10:59:26 +08:00
yihao.dai 558feed5ed
fix: Use pk from binlog during import (#32118)
During binlog import, even if the primary key's autoID is set to true,
the primary key from the binlog should be used instead of being
reassigned.

issue: https://github.com/milvus-io/milvus/discussions/31943,
https://github.com/milvus-io/milvus/issues/28521

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-04-16 14:51:20 +08:00
yihao.dai aa96843d31
fix: Fix import hanging and improve logging output (#32166)
Fix import hanging when the previous import task failed, and improve
parquet import logging outout.

issue: https://github.com/milvus-io/milvus/issues/31834

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-04-13 22:03:23 +08:00
yihao.dai 49d109de18
enhance: Use an individual buffer size parameter for imports (#31833)
Use an individual buffer size parameter for imports and set buffer size
to 64MB.

issue: https://github.com/milvus-io/milvus/issues/28521

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-04-08 21:07:18 +08:00
yihao.dai 4e264003bf
enhance: Ensure ImportV2 waits for the index to be built and refine some logic (#31629)
Feature Introduced:
1. Ensure ImportV2 waits for the index to be built

Enhancements Introduced:
1. Utilization of local time for timeout ts instead of allocating ts
from rootcoord.
3. Enhanced input file length check for binlog import.
4. Removal of duplicated manager in datanode.
5. Renaming of executor to scheduler in datanode.
6. Utilization of a thread pool in the scheduler in datanode.

issue: https://github.com/milvus-io/milvus/issues/28521

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-04-01 20:09:13 +08:00
yihao.dai 31cf849f68
enhance: Support retriving file size from importutilv2.Reader (#31533)
To reduce the overhead caused by listing the S3 objects, add an
interface to importutil.Reader to retrieve file sizes.

issue: https://github.com/milvus-io/milvus/issues/31532,
https://github.com/milvus-io/milvus/issues/28521

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-03-25 20:29:07 +08:00
yihao.dai f65a796d18
enhance: Add max file num limit and max file size limit for import (#31497)
The max number of import files per request should not exceed 1024 by
default (configurable).
The import file size allowed for importing should not exceed 16GB by
default (configurable).

issue: https://github.com/milvus-io/milvus/issues/28521

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-03-22 18:13:06 +08:00
yihao.dai 776709e5ff
fix: Fix binlog import (#31310)
Fix binlog import functionality by removing the existing check and
refining the size retrieval process.

issue: https://github.com/milvus-io/milvus/issues/31221,
https://github.com/milvus-io/milvus/issues/28521

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-03-17 20:59:04 +08:00
yihao.dai 811316d2ba
fix: Fix binlog import and refine error reporting (#31241)
1. Fix binlog import with partition key.
2. Refine binlog import error reportins.
3. Avoid division by zero when retrieving import progress.

issue: https://github.com/milvus-io/milvus/issues/31221,
https://github.com/milvus-io/milvus/issues/28521

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-03-15 10:55:05 +08:00
yihao.dai b5c67948b7
enhance: Enhance and modify the return content of ImportV2 (#31192)
1. The Import APIs now provide detailed progress information for each
imported file, including details such as file name, file size, progress,
and more.
2. The APIs now return the collection name and the completion time.
3. Other modifications include changing jobID to jobId and other similar
adjustments.

issue: https://github.com/milvus-io/milvus/issues/28521

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-03-13 19:51:03 +08:00
yihao.dai a434d33e75
feat: Add import scheduler and manager (#29367)
This PR introduces novel managerial roles for importv2:
1. ImportMeta: To manage all the import tasks;
2. ImportScheduler: To process tasks and modify their states;
3. ImportChecker: To ascertain the completion of all tasks and instigate
relevant operations.

issue: https://github.com/milvus-io/milvus/issues/28521

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-03-01 18:31:02 +08:00
yihao.dai 18b979d9b4
enhance: Extend support for varchar autoID to BulkInsertV2 (#30477)
issue: https://github.com/milvus-io/milvus/issues/30476

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-02-04 16:57:05 +08:00
yihao.dai 7ce876a072
fix: Decoupling importing segment from flush process (#30402)
This pr decoups importing segment from flush process by:
1. Exclude the importing segment from the flush policy, this approch
avoids notifying the datanode to flush the importing segment, which may
not exist.
2. When RootCoord call Flush, DataCoord directly set the importing
segment state to `Flushed`.

issue: https://github.com/milvus-io/milvus/issues/30359

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-02-03 13:01:12 +08:00
yihao.dai c5918290e6
feat: Add import executor and manager for datanode (#29438)
This PR introduces novel importv2 roles for datanode:
1. Executor: To execute tasks, a import task will be divided into the
following steps: read data -> hash data -> sync data;
2. Manager: To manage all the tasks;

issue: https://github.com/milvus-io/milvus/issues/28521

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-01-31 20:45:04 +08:00