Related to #43230
This PR
- Move segcore setup function to `initcore` package to remove cgo
dependency from pkg
- Register core callback only for components depends on segcore
- Rectify `UpdateLogLevel` implementation
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #43745
Add timestamp filtering capability to L0Reader to match the
functionality available in the regular Reader. This enhancement allows
filtering delete records based on timestamp range during L0 import
operations.
Changes include:
- Add tsStart and tsEnd fields to l0Reader struct for timestamp
filtering
- Modify NewL0Reader function signature to accept tsStart and tsEnd
parameters
- Implement timestamp filtering logic in Read method to skip records
outside the specified range
- Update L0ImportTask and L0PreImportTask to parse timestamp parameters
from request options and pass them to NewL0Reader
- Add comprehensive test case TestL0Reader_ReadWithTsFilter to verify ts
filtering functionality using mockey framework
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
1. Use blocking memory allocation to wait until memory becomes available
2. Perform memory allocation at the file level instead of per task
3. Limit Parquet file reader batch size to prevent excessive memory
consumption
4. Limit import buffer size from 20% to 10% of total memory
issue: https://github.com/milvus-io/milvus/issues/43387,
https://github.com/milvus-io/milvus/issues/43131
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
Ref https://github.com/milvus-io/milvus/issues/42148https://github.com/milvus-io/milvus/pull/42406 impls the segcore part of
storage for handling with VectorArray.
This PR:
1. impls the go part of storage for VectorArray
2. impls the collection creation with StructArrayField and VectorArray
3. insert and retrieve data from the collection.
---------
Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>
Signed-off-by: SpadeA-Tang <u6748471@anu.edu.au>
This parameter determines whether the returned value should be a copy or
a reference from the arrow array. The updates enhance memory management
and provide more control over data handling during deserialization.
See #43186
---------
Signed-off-by: Ted Xu <ted.xu@zilliz.com>
See: #43186
In this PR:
1. Flush renamed to FlushChunk, while a new Flush primitive is
introduced to serialize values to records.
2. Segment mapping in clustering compaction now process data by records
instead of values, it calls flush to all buffers after each record is
processed.
Signed-off-by: Ted Xu <ted.xu@zilliz.com>
Realted #43520
Datanode may have memory leakage when reader returns error. In
previously mention issue, datanodes got OOM killed due to continueous
error in read path.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Related to #43522
Currently, passing partial schema to storage v2 packed reader may
trigger SEGV during clustering compaction unit test.
This patch implement `NeededFields` differently in each `RecordReader`
imlementation. For now, v2 will implemented as no-op. This will be
supported after packed reader support this API.
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #43072, #43289
- manage the schema version at recovery storage.
- update the schema when creating collection or alter schema.
- get schema at write buffer based on version.
- recover the schema when upgrading from 2.5.
---------
Signed-off-by: chyezh <chyezh@outlook.com>
1. Modify the binlog reader to stop reading a fixed 4096 rows and
instead use the calculated bufferSize to avoid generating small binlogs.
2. Use a fixed bufferSize (32MB) for the Parquet reader to prevent OOM.
issue: https://github.com/milvus-io/milvus/issues/43387
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
Correct read and buffer size to 64MB to prevent OOM during clustering
compaction.
issue: https://github.com/milvus-io/milvus/issues/43310
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
issue: #43107
- Add checkLoadConfigChanges() to apply load config during startup
- Call config check in startQueryCoord() after restart
- Skip auto-updates for collections with user-specified replica numbers
- Add is_user_specified_replica_mode field to preserve user settings
- Add comprehensive unit tests with mockey
Ensures existing collections use latest cluster-level config after
restart.
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
- Introduce dynamic buffer sizing to avoid generating small binlogs
during import
- Refactor import slot calculation based on CPU and memory constraints
- Implement dynamic pool sizing for sync manager and import tasks
according to CPU core count
issue: https://github.com/milvus-io/milvus/issues/43131
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
Related to #39173
This PR
- Close packed reader after sort
- Release arrow.Record preventing memory leakage
- Invoke `pack_reader->Close()` for CloseReader
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Fix issues in end-to-end tests:
1. **Split column groups based on schema**, rather than estimating by
average chunk row size. **Ensure column group consistency within a
segment**, to avoid errors caused by loading multiple column group
chunks simultaneously.
2. **Use sorted segmentId** when generating the stats binlog path, to
ensure consistent and correct file path resolution.
3. **Determine field IDs as follows**:
For multi-column column groups, retrieve the field ID list from
metadata.
For single-column column groups, use the column group ID directly as the
field ID.
related: #39173fix: #42862
---------
Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
1. Optimize the import process: skip subsequent steps and mark the task
as complete if the number of imported rows is 0.
2. Improve import integration tests:
a. Add a test to verify that autoIDs are not duplicated
b. Add a test for the corner case where all data is deleted
c. Shorten test execution time
3. Enhance import logging:
a. Print imported segment information upon completion
b. Include file name in failure logs
issue: https://github.com/milvus-io/milvus/issues/42488,
https://github.com/milvus-io/milvus/issues/42518
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
Related to #39173
Like logic in #41919, storage v2 fs shall use complete paths with
bucketName prefix to be compatible with its definition. This PR fills
bucket name from config when creating reader for compaction tasks.
NOTE: the bucket name shall be read from task params config for
compaction task pooling.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Increase insert buffer size from 16MB to 64MB, while keeping delete
buffer size at 16MB.
issue: https://github.com/milvus-io/milvus/issues/42518
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
Related to #42530
The cluster id is missing when drop worker drop causing redoing task on
report duplicated task error.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Remove the unlimited logID mechanism and switch to redundantly
allocating a large number of IDs.
issue: https://github.com/milvus-io/milvus/issues/42518
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
Remove the hardcoded batchSize of 100,000 and instead trigger a write
every 64MB based on actual data size. This prevents sort stats from
generating excessively large binlog files.
issue: https://github.com/milvus-io/milvus/issues/42400
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
issue: #41976
- make drop partition message as a broadcast message.
- add gc when drop partition message is acked.
- add a call back to handle the broadcast message when ack.
- the ack operation of broadcast message will retry until success.
Signed-off-by: chyezh <chyezh@outlook.com>
This PR fixes a minor typo in a log message in the `PreExecute` method
of `internal/datanode/index/task_index.go`.
Corrected "dimesion" to "dimension".
Signed-off-by: hckex <33862757+hckex@users.noreply.github.com>
This PR fixes a minor typo in the README file of the datanode module.
Corrected "imformation" to "information".
Signed-off-by: hckex <33862757+hckex@users.noreply.github.com>