cai.zhang
2a516697c2
enhance: [2.5] Only download necessary fields during clustering analyze phase ( #43362 )
...
issue: #43310
master pr: #43322
Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-07-23 10:18:53 +08:00
yihao.dai
7c8370ccd2
fix: [2.5] Fix ants.Pool goroutine leak ( #41893 )
...
1. Release the pool after it is no longer in use.
2. Upgrade ants.Pool to fix the goroutine leak issue (see
https://github.com/panjf2000/ants/pull/287 ).
issue: https://github.com/milvus-io/milvus/issues/41838
pr: https://github.com/milvus-io/milvus/pull/41892
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-05-16 19:12:22 +08:00
yihao.dai
a7c818cadb
fix: [2.5] Fix no candidate segments error for small import ( #41772 )
...
When autoID is enabled, the preimport task estimates row distribution by
evenly dividing the total row count (numRows) across all vchannels:
`estimatedCount = numRows / vchannelNum`.
However, the actual import task hashes the real auto-generated IDs to
determine the target vchannel. This mismatch can make the estimated row
distribution inaccurate in corner cases such as:
- Importing 1 row into 2 vchannels:
  • Preimport: 1 / 2 = 0 → both v0 and v1 are estimated to have 0 rows
  • Import: real autoID (e.g., 457975852966809057) hashes to v1
    → actual result: v0 = 0, v1 = 1
To resolve such corner cases, we now allocate at least one segment for
each vchannel when autoID is enabled, ensuring all vchannels are prepared
to receive data even if no rows are estimated for them.
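The corner case can be reproduced with a few lines of Go. This is a minimal sketch, not the code from the PR: `estimateRows`, `segmentsPerVchannel`, and the `rowsPerSegment` parameter are hypothetical names introduced only for illustration.

```go
package main

import "fmt"

// estimateRows mirrors the preimport estimate described above:
// every vchannel is assigned numRows / vchannelNum (integer division).
func estimateRows(numRows, vchannelNum int) []int {
	est := make([]int, vchannelNum)
	for i := range est {
		est[i] = numRows / vchannelNum
	}
	return est
}

// segmentsPerVchannel is a hypothetical helper that turns the estimates into
// a segment count per vchannel. With autoID enabled it never returns 0, so
// every vchannel has a segment ready even when its estimate is 0 rows.
func segmentsPerVchannel(est []int, rowsPerSegment int, autoID bool) []int {
	segs := make([]int, len(est))
	for i, rows := range est {
		segs[i] = (rows + rowsPerSegment - 1) / rowsPerSegment // ceiling division
		if autoID && segs[i] == 0 {
			segs[i] = 1
		}
	}
	return segs
}

func main() {
	est := estimateRows(1, 2) // 1 row, 2 vchannels
	fmt.Println("estimated rows:", est)
	fmt.Println("segments without the fix:", segmentsPerVchannel(est, 1024, false))
	fmt.Println("segments with the fix:", segmentsPerVchannel(est, 1024, true))
}
```

Without the minimum of one segment, both vchannels end up with zero candidate segments, which matches the "no candidate segments" error in the title.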
issue: https://github.com/milvus-io/milvus/issues/41759
pr: https://github.com/milvus-io/milvus/pull/41771
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-05-14 10:36:22 +08:00
SimFG
18eb627533
fix: [2.5] Update logging context and upgrade dependencies ( #41319 )
...
- issue: #41291
- pr: #41318
---------
Signed-off-by: SimFG <bang.fu@zilliz.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2025-04-24 23:50:40 +08:00
yihao.dai
27ea5d14dc
fix: [2.5] Fix delete data loss due to duplicate binlogID ( #40976 )
...
With concurrent L0 compaction
(https://github.com/milvus-io/milvus/pull/36816 ), delta logs might be
written to the same L1 segment, causing logID duplication when using the
incremental beginLogID. This PR removes the beginLogID mechanism and
instead passes a log ID range, where the number of IDs in the range
equals the number of compaction segment binlogs multiplied by an
expansion factor.
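The idea of reserving a disjoint ID range per compaction can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: `logIDRange`, `rangeSize`, and `allocator` are hypothetical names, and the expansion factor value used in `main` is arbitrary.

```go
package main

import (
	"fmt"
	"sync"
)

// logIDRange is a half-open range [Begin, End) of log IDs (hypothetical type).
type logIDRange struct {
	Begin, End int64
}

// rangeSize follows the rule in the text: the number of IDs equals the
// number of compaction segment binlogs multiplied by an expansion factor.
func rangeSize(binlogCount, expansionFactor int64) int64 {
	return binlogCount * expansionFactor
}

// allocator hands out disjoint ranges from a monotonically increasing
// counter, so two concurrent compactions can never share a log ID.
type allocator struct {
	mu   sync.Mutex
	next int64
}

func (a *allocator) alloc(n int64) logIDRange {
	a.mu.Lock()
	defer a.mu.Unlock()
	r := logIDRange{Begin: a.next, End: a.next + n}
	a.next = r.End
	return r
}

func main() {
	a := &allocator{next: 1000}
	// Two compactions, each reserving its own range up front.
	r1 := a.alloc(rangeSize(3, 4))
	r2 := a.alloc(rangeSize(5, 4))
	fmt.Printf("compaction 1: [%d, %d)\n", r1.Begin, r1.End)
	fmt.Printf("compaction 2: [%d, %d)\n", r2.Begin, r2.End)
}
```

Because each task draws IDs only from its own reserved range, writers targeting the same L1 segment cannot collide the way two counters starting from a shared beginLogID could.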
issue: https://github.com/milvus-io/milvus/issues/40207
pr: https://github.com/milvus-io/milvus/pull/40960
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-03-28 14:34:21 +08:00
XuanYang-cn
281260e48a
fix: Massive memory cost when compacting ( #40763 )
...
Download binlogs in batches instead of downloading all of a segment's binlogs at once.
See also: #40761
---------
Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2025-03-20 11:28:11 +08:00
XuanYang-cn
f455923ac9
enhance: Use correct counter metrics for overall wa calculation ( #40394 ) ( #40679 )
...
pr: #40394
- Use CounterVec to calculate sum of increase during a time period.
- Use entries number instead of binlog size
Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2025-03-17 15:06:19 +08:00
congqixia
709594f158
enhance: [2.5] Use v2 package name for pkg module ( #40117 )
...
Cherry-pick from master
pr: #39990
Related to #39095
https://go.dev/doc/modules/version-numbers
Update pkg version according to golang dep version convention
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-02-23 00:46:01 +08:00
XuanYang-cn
8067113133
enhance: [cp25]Enable to observe write amplification ( #39661 ) ( #39743 )
...
pr: #39661
Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2025-02-17 16:00:17 +08:00
congqixia
a48749cc11
enhance: [2.5] Use mockery pkg config for datacoord&datanode ( #39567 ) ( #39577 )
...
Cherry-pick from master
pr: #39567
Related to #38339
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-01-24 17:21:13 +08:00
XuanYang-cn
afef5fed60
fix: Clustering compaction ignoring deltalogs ( #39133 )
...
See also: #39131
pr: #39132
Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2025-01-10 14:07:05 +08:00
Zhen Ye
95809ca767
enhance: make new go package to manage proto ( #39128 )
...
issue: #39095
pr: #39114
---------
Signed-off-by: chyezh <chyezh@outlook.com>
2025-01-10 10:53:01 +08:00
XuanYang-cn
b457c2f415
enhance: [2.5]Add missing delete metrics ( #38634 ) ( #38747 )
...
Add 2 counter metrics:
- Total delete entries from deltalog: milvus_datanode_compaction_delete_count
- Total missing deletes: milvus_datanode_compaction_missing_delete_count
See also: #34665
pr: #38634
Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2025-01-07 11:20:56 +08:00
tinswzy
27229f7907
enhance: refine exists log print with ctx ( #38080 )
...
issue: #35917
Refine existing log statements to print with ctx.
Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>
2024-12-14 22:36:44 +08:00
cai.zhang
6ffc57c8dc
fix: Fix sorting buffer in clustering compaction ( #38417 )
...
issue: #28410
Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-12-13 10:12:49 +08:00
Ted Xu
dc85d8e968
enhance: improve mix compaction performance by removing max segment limitations ( #38344 )
...
See #37234
---------
Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2024-12-11 20:38:42 +08:00
cai.zhang
41b19c6b1d
enhance: Determine the number of buffers based on the resource limits of the DataNode ( #38209 )
...
issue: #28410
---------
Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-12-08 18:02:40 +08:00
cai.zhang
dae4160466
enhance: Whether to enable mergeSort mode when performing mixCompaction ( #37664 )
...
issue: #37579
Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-11-19 11:28:31 +08:00
yihao.dai
81879425e1
enhance: Optimize the performance of stats task ( #37374 )
...
1. Increase the writer's `batchSize` to avoid multiple serialization
operations.
2. Perform asynchronous upload of binlog files to prevent blocking the
data processing flow.
3. Reduce multiple calls to `writer.Flush()`.
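Points 1 and 2 above can be sketched with the standard library alone. This is a simplified illustration, not the stats task code: `asyncWriter` and `uploadFunc` are hypothetical names, and rows are plain ints standing in for serialized records.

```go
package main

import (
	"fmt"
	"sync"
)

// uploadFunc is a hypothetical stand-in for uploading one binlog batch.
type uploadFunc func(batch []int) error

// asyncWriter buffers rows up to batchSize, then uploads each full batch in
// a background goroutine so the caller keeps processing data.
type asyncWriter struct {
	batchSize int
	upload    uploadFunc
	buf       []int
	wg        sync.WaitGroup
	mu        sync.Mutex
	err       error // first upload error, if any
}

func (w *asyncWriter) Write(row int) {
	w.buf = append(w.buf, row)
	if len(w.buf) >= w.batchSize {
		w.flush()
	}
}

// flush hands the current buffer to a goroutine and starts a fresh buffer.
func (w *asyncWriter) flush() {
	if len(w.buf) == 0 {
		return
	}
	batch := w.buf
	w.buf = make([]int, 0, w.batchSize)
	w.wg.Add(1)
	go func() {
		defer w.wg.Done()
		if err := w.upload(batch); err != nil {
			w.mu.Lock()
			if w.err == nil {
				w.err = err
			}
			w.mu.Unlock()
		}
	}()
}

// Close flushes the remaining rows and waits for all uploads to finish.
func (w *asyncWriter) Close() error {
	w.flush()
	w.wg.Wait()
	return w.err
}

func main() {
	uploaded := make(chan int, 8)
	w := &asyncWriter{batchSize: 4, upload: func(b []int) error {
		uploaded <- len(b)
		return nil
	}}
	for i := 0; i < 10; i++ {
		w.Write(i)
	}
	if err := w.Close(); err != nil {
		fmt.Println("upload failed:", err)
		return
	}
	close(uploaded)
	total := 0
	for n := range uploaded {
		total += n
	}
	fmt.Println("rows uploaded:", total)
}
```

A larger `batchSize` means fewer flushes and fewer uploads for the same number of rows, which is the trade-off the three points describe.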
issue: https://github.com/milvus-io/milvus/issues/37373
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-11-08 10:08:27 +08:00
Ted Xu
bc9562feb1
enhance: avoid memory copy and serde in mix compaction ( #37479 )
...
See: #37234
---------
Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2024-11-07 16:30:57 -08:00
aoiasd
b4c749dcd5
fix: merge sort segment loss data ( #37400 )
...
relate: https://github.com/milvus-io/milvus/issues/37238
Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2024-11-07 11:18:26 +08:00
Ted Xu
b792b199d7
enhance: load deltalogs on demand when doing compactions ( #37310 )
...
See #37234
---------
Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2024-11-01 16:40:21 +08:00
Ted Xu
262a994d6d
enhance: generally improve the performance of mix compactions ( #37163 )
...
See #37234
---------
Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2024-10-29 18:12:20 +08:00
cai.zhang
04c306e63f
fix: Fix clustering compaction task leak ( #36800 )
...
issue: #36686
bug reason:
- The clustering compaction tasks on the datanode were never cleaned up.
- Each clustering compaction task holds a mapping from clustering key
to buffer, which caused a large memory leak.
fix:
- Have datacoord clean up the tasks on the datanode when clustering
compaction finishes.
- Reset the mapping from clustering key to buffer on the datanode when
clustering finishes.
Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-10-17 20:43:30 +08:00
aoiasd
5ec4163d0f
feat: support bm25 logs mixcompaction ( #36072 )
...
relate: https://github.com/milvus-io/milvus/issues/35853
---------
Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2024-10-14 16:57:22 +08:00
CharlesFeng
7c8b71e26c
fix: BinlogDeserializeReader leak in mix_compactor.go ( #36270 )
...
https://github.com/milvus-io/milvus/issues/36269
Signed-off-by: fengjun2016 <jornfeng@gmail.com>
2024-10-11 15:41:20 +08:00
XuanYang-cn
290ceb4e84
enhance: Add more info in logs ( #36731 )
...
Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2024-10-10 17:51:25 +08:00
wayblink
00a5025949
enhance: support clustering compaction on null value ( #36372 )
...
issue: #36055
Signed-off-by: wayblink <anyang.wang@zilliz.com>
2024-09-30 14:33:17 +08:00
cai.zhang
2adca8b754
fix: Fix data race for clustering compaction ( #36440 )
...
issue: #36438
Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-09-28 17:19:21 +08:00
aoiasd
139787371e
feat: support embedding bm25 sparse vector and flush bm25 stats log ( #36036 )
...
relate: https://github.com/milvus-io/milvus/issues/35853
---------
Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2024-09-19 10:57:12 +08:00
cai.zhang
8395c8a8db
enhance: Update stats task to optional ( #35947 )
...
issue: #33744
---------
Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-09-12 20:37:08 +08:00
smellthemoon
3f75bf1f20
fix: clustering compact not support null ( #36152 )
...
#36055
Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2024-09-11 14:49:06 +08:00
XuanYang-cn
2687747278
fix: Set an empty segment if compaction deleted all inserts ( #36044 )
...
See also: #36038
---------
Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2024-09-09 14:23:05 +08:00
Chun Han
e480b103bd
feat: supporting hybrid search group_by ( #35982 )
...
related: #35096
Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
2024-09-08 17:09:04 +08:00
SimFG
5247631289
fix: fill the metric type field in the LoadMetaInfo object ( #35962 )
...
- issue: #35960
Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-09-05 20:50:23 -07:00
cai.zhang
90bdb171ab
fix: Fix data race for clustering compaction writer ( #35957 )
...
issue: #35950
Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-09-05 04:07:10 +08:00
yihao.dai
6fd33285e1
fix: Fix compile error ( #35901 )
...
/kind improvement
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-09-02 14:50:35 +08:00
cai.zhang
2c9bb4dfa3
feat: Support stats task to sort segment by PK ( #35054 )
...
issue: #33744
This PR includes the following changes:
1. Added a new task type to the task scheduler in datacoord: stats task,
which sorts segments by primary key.
2. Implemented segment sorting in indexnode.
3. Added a new field `FieldStatsLog` to SegmentInfo to store token index
information.
---------
Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-09-02 14:19:03 +08:00
yihao.dai
1413ffe9b1
enhance: Rename preAllocatedSegments ( #35871 )
...
Rename `preAllocatedSegments` to `preAllocatedSegmentIDs` to avoid
confusion.
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-09-01 17:09:01 +08:00
XuanYang-cn
323400c190
enhance: Enable to write multiple segments in mix compactor ( #35705 )
...
Prevent segments from being written larger than maxSize * expansionRate
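The size cap can be pictured as a writer that rolls over to a new output segment before the limit is crossed. This is a rough sketch under assumed names (`segmentWriter`, `newSegmentWriter`, `write`), not the mix compactor's actual writer.

```go
package main

import "fmt"

// segmentWriter is a hypothetical writer that tracks bytes written per
// output segment and opens a new segment before the limit is exceeded.
type segmentWriter struct {
	limit int64   // maxSize * expansionRate
	sizes []int64 // bytes written to each output segment
}

func newSegmentWriter(maxSize int64, expansionRate float64) *segmentWriter {
	return &segmentWriter{limit: int64(float64(maxSize) * expansionRate)}
}

// write adds one row; it starts a new segment when the current one would
// otherwise grow past the limit.
func (w *segmentWriter) write(rowSize int64) {
	n := len(w.sizes)
	if n == 0 || w.sizes[n-1]+rowSize > w.limit {
		w.sizes = append(w.sizes, 0)
		n++
	}
	w.sizes[n-1] += rowSize
}

func main() {
	w := newSegmentWriter(100, 1.25) // limit = 125 bytes
	for i := 0; i < 10; i++ {
		w.write(30)
	}
	fmt.Println("segment sizes:", w.sizes)
}
```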
See also: #35584
---------
Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2024-08-30 11:29:01 +08:00
congqixia
ab532ae199
enhance: Add back BF lazy load logic for datanode watch channel ( #35646 )
...
Add back lazy loading of statslogs when watching a DML channel on the datanode.
Related to #22994 #27675
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-08-22 19:42:57 +08:00
Ted Xu
41646c8439
feat: integrate new deltalog format ( #35522 )
...
See #34123
---------
Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2024-08-20 19:06:56 +08:00
XuanYang-cn
967f38672a
enhance: Add integration tests for l0 ( #35429 )
...
See also: #34796
---------
Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2024-08-19 10:56:54 +08:00
cai.zhang
1bbf7a3c0e
enhance: Optimize the use of locks and avoid double flush clustering buffer writer ( #35486 )
...
issue: #35436
---------
Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-08-16 02:24:58 +08:00
cai.zhang
196b343a94
fix: Fix data race for clustering compaction ( #35435 )
...
issue: #35436
Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-08-13 17:10:20 +08:00
cai.zhang
aaab827a16
fix: Fix the issue of missing stats log after clustering compaction ( #35266 )
...
issue: #35265
Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-08-08 14:24:17 +08:00
yihao.dai
a4439cc911
enhance: Implement flusher in streamingNode ( #34942 )
...
- Implement flusher to:
- Manage the pipelines (creation, deletion, etc.)
- Manage the segment write buffer
- Manage sync operation (including receive flushMsg and execute flush)
- Add a new `GetChannelRecoveryInfo` RPC in DataCoord.
- Reorganize packages: `flushcommon` and `datanode`.
issue: https://github.com/milvus-io/milvus/issues/33285
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-08-02 18:30:23 +08:00
wayblink
95462668ca
enhance: unify time in clustering compaction task to unix ( #35167 )
...
#34495
Signed-off-by: wayblink <anyang.wang@zilliz.com>
2024-08-02 10:30:19 +08:00
zhenshan.cao
aa247f192d
enhance: remove unused code for StorageV2 ( #35132 )
...
issue: https://github.com/milvus-io/milvus/issues/34168
Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>
2024-08-01 12:08:13 +08:00
cai.zhang
9412002d7d
fix: Fix data race for clustering buffer writer ( #35145 )
...
issue: #34495
Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-08-01 11:20:13 +08:00