Commit Graph

129 Commits (master)

Author SHA1 Message Date
Zhen Ye f5cee0012a
fix: remove panic for message type in recovery storage and marshal log (#43976)
issue: #43897

Signed-off-by: chyezh <chyezh@outlook.com>
2025-08-21 14:23:47 +08:00
Zhen Ye a86b6f2a54
enhance: extend the stats manage at streaming shard manager for L0 (#43371)
issue: #42416

- Rename the InsertMetric into ModifiedMetric.
- Add L0 control configuration.
- Add some L0 current state collect.

Signed-off-by: chyezh <chyezh@outlook.com>
2025-08-18 20:41:46 +08:00
Zhen Ye 7b005c48bf
enhance: support util template generation for messages (#43881)
issue: #43880

Signed-off-by: chyezh <chyezh@outlook.com>
2025-08-18 01:19:44 +08:00
congqixia f032044125
enhance: Refine segcore param change callback (#43838)
Related to #43230

This PR
- Move segcore setup function to `initcore` package to remove cgo
dependency from pkg
- Register core callback only for components depends on segcore
- Rectify `UpdateLogLevel` implementation

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-08-13 19:31:44 +08:00
Zhen Ye 8ff118a9ff
fix: call IntoMessageProto instead of Payload when rpc (#43678)
issue: #43677

Signed-off-by: chyezh <chyezh@outlook.com>
2025-08-06 14:45:40 +08:00
Zhen Ye 3e3775fb81
fix: panics when describe collection internal failure (#43630)
issue: #43629

- also fix the scanner_switchable panic underlying wal scanner return
context error.

Signed-off-by: chyezh <chyezh@outlook.com>
2025-07-29 20:33:36 +08:00
Zhen Ye 070aabd27e
enhance: fix remove flushing state of segment (#43560)
issue: #43559, #42884

- also fix the data lost when streaming resuming from old arch message.

Signed-off-by: chyezh <chyezh@outlook.com>
2025-07-25 18:08:54 +08:00
Zhen Ye e9ab73e93d
enhance: add schema version at recovery storage (#43500)
issue: #43072, #43289

- manage the schema version at recovery storage.
- update the schema when creating collection or alter schema.
- get schema at write buffer based on version.
- recover the schema when upgrading from 2.5.

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-07-23 21:38:54 +08:00
Zhen Ye b142589942
enhance: support all partitions in shard manager for L0 segment (#43385)
issue: #42416

- change the key from partitionID into PartitionUniqueKey to support
AllPartitionsID

Signed-off-by: chyezh <chyezh@outlook.com>
2025-07-18 11:40:51 +08:00
Zhen Ye 07fa2cbdd3
enhance: wal balance consider the wal status on streamingnode (#43265)
issue: #42995

- don't balance the wal if the producing-consuming lag is too long.
- don't balance if the rebalance is set as false.
- don't balance if the wal is balanced recently.

Signed-off-by: chyezh <chyezh@outlook.com>
2025-07-18 11:10:51 +08:00
Zhen Ye ffc8c0730c
fix: wrong metric for sn timetick (#43312)
issue: #43266

Signed-off-by: chyezh <chyezh@outlook.com>
2025-07-14 20:40:50 +08:00
Zhen Ye 15a6631147
enhance: add quota limit based on sn consuming lag (#43105)
issue: #42995

- The consuming lag at streaming node will be reported to coordinator.
- The consuming lag will trigger the write limit and deny by quota
center.
- Set the ttProtection by default.

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-07-11 14:10:49 +08:00
Zhen Ye f598ca2b4e
fix: block at msgpack adaptor and wrong metrics (#43235)
issue: #43018

Signed-off-by: chyezh <chyezh@outlook.com>
2025-07-11 10:14:49 +08:00
Zhen Ye 490c5d5088
fix: lost message version after compatible message modification (#43217)
issue: #43018

Signed-off-by: chyezh <chyezh@outlook.com>
2025-07-10 10:36:48 +08:00
Zhen Ye 46b6f1b9e2
fix: panic when logging a old message should be skipped (#43076)
issue: #43074

- fix: panic when logging a old message should be skipped, #43074 
- fix: make the ack of broadcaster idompotent, #43026
- fix: lost dropping collection when upgrading, #43092
- fix: panic when DropPartition happen after DropCollection, #43027,
#43078

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-07-04 16:04:44 +08:00
Zhen Ye e97e44d56e
enhance: limit the gc concurrency when cpu is high (#43059)
issue: #42833

Signed-off-by: chyezh <chyezh@outlook.com>
2025-07-04 09:22:43 +08:00
cai.zhang f6b2a71c95
enhance: Remove chunkmanager-related dependencies from datanode (#43021)
issue: #41611

---------

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2025-07-03 14:44:45 +08:00
Zhen Ye 8367e4ec6a
fix: set 72h for wal retention (#42910)
issue: #42706

Signed-off-by: chyezh <chyezh@outlook.com>
2025-06-27 17:36:43 +08:00
Zhen Ye 3602817c53
fix: dynamic log level for streaming node (#42964)
issue: #42963

Signed-off-by: chyezh <chyezh@outlook.com>
2025-06-26 19:12:50 +08:00
Zhen Ye 1f66b650e9
fix: pulsar cannot work properly if backlog exceed (#42653)
issue: #42649

- the sync operation of different pchannel is concurrent now.
- add a option to notify the backlog clear automatically.
- make pulsar walimpls can be recovered from backlog exceed.

Signed-off-by: chyezh <chyezh@outlook.com>
2025-06-13 14:28:37 +08:00
yihao.dai e6da4a64b5
fix: Pre-check import message to prevent pipeline block indefinitely (#42415)
Pre-check import message to prevent pipeline block indefinitely.

issue: https://github.com/milvus-io/milvus/issues/42414

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
Co-authored-by: chyezh <chyezh@outlook.com>
2025-06-11 13:40:38 +08:00
Zhen Ye 0567f512b3
fix: streamingnode get stucked when stop (#42501)
issue: #42498

- fix: sealed segment cannot be flushed after upgrading
- fix: get mvcc panic when upgrading
- ignore the L0 segment when graceful stop of querynode.

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-06-05 12:22:31 +08:00
Zhen Ye fc010e44a8
fix: release memory after pop from heap (#42482)
issue: #42481

Signed-off-by: chyezh <chyezh@outlook.com>
2025-06-04 10:00:32 +08:00
Zhen Ye e479467582
fix: panic when upgrading from old arch (#42422)
issue: #42405

- add delete rows into header when upsert.

Signed-off-by: chyezh <chyezh@outlook.com>
2025-05-31 22:56:29 +08:00
Zhen Ye 66cc194ab2
enhance: add partition gc at streaming arch (#42179)
issue: #41976

- make drop partition message as a broadcast message.
- add gc when drop partition message is acked.
- add a call back to handle the broadcast message when ack.
- the ack operation of broadcast message will retry until success.

Signed-off-by: chyezh <chyezh@outlook.com>
2025-05-29 23:20:30 +08:00
Zhen Ye 4bad293655
enhance: make upgrading from 2.5.x less down time (#42082)
issue: #40532

- start timeticksync at rootcoord if the streaming service is not
available
- stop timeticksync if the streaming service is available
- open a read-only wal if some nodes in cluster is not upgrading to 2.6
- allow to open read-write wal after all nodes in cluster is upgrading
to 2.6

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-05-29 23:02:29 +08:00
Zhen Ye b94cee2413
fix: growing segment from old arch is not flushed after upgrading (#42164)
issue: #42162

- enhance: add read ahead buffer size issue #42129
- fix: rocksmq consumer's close operation may get stucked
- fix: growing segment from old arch is not flushed after upgrading

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-05-29 23:00:28 +08:00
Zhen Ye 38c804fb01
fix: more stable recovery graceful closing and stable unittest (#42013)
issue: #41544

Signed-off-by: chyezh <chyezh@outlook.com>
2025-05-23 17:52:26 +08:00
congqixia 244aa30076
fix: Lock before reading flusher cp sampling truncate cp (#42019)
Related to #42018

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-05-22 21:38:28 +08:00
Zhen Ye c9b0748ff9
enhance: add delete rows into delete msg header and more metric (#41952)
issue: #41544

- add delete rows into delete messsage header
- add more insert/delete metrics
- fix non-broadcast message has broadcast header

Signed-off-by: chyezh <chyezh@outlook.com>
2025-05-22 20:28:26 +08:00
Zhen Ye 458ab86894
fix: stop retry if collection not found too much when get recovery from coord (#41980)
issue: #41966

Signed-off-by: chyezh <chyezh@outlook.com>
2025-05-22 16:22:24 +08:00
Zhen Ye 59ab274dbe
fix: use flusher and recovery checkpoint together to determine the truncate position (#41934)
issue: #41544

- unify the log field of message
- use the minimum one of flusher and recovery storage checkpoint as the
truncate position

Signed-off-by: chyezh <chyezh@outlook.com>
2025-05-20 16:10:24 +08:00
yihao.dai 65dd3982d8
fix: Fix ants.Pool goroutine leak (#41892)
1. Release the pool after it is no longer in use.
2. Upgrade ants.Pool to fix the goroutine leak issue (see [PR
#287](https://github.com/panjf2000/ants/pull/287)).

issue: https://github.com/milvus-io/milvus/issues/41838

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-05-19 17:56:22 +08:00
Zhen Ye 59dff668dc
enhance: schema change without manual flush (#41882)
issue: #39718

- remove the manual flush message from schema change operation
- add flush segment id handle into schema change processes

Signed-off-by: chyezh <chyezh@outlook.com>
Co-authored-by: congqixia <congqi.xia@zilliz.com>
2025-05-19 10:14:22 +08:00
Zhen Ye a3d5ad135e
fix: recover a dropped collection from wal if create collection message can be seen (#41902)
issue: #41654

Signed-off-by: chyezh <chyezh@outlook.com>
2025-05-17 07:38:21 +08:00
Zhen Ye d3fff1769e
fix: streaming node panic with when binary size is set as zero (#41879)
issue: #41853

- persist the estimated binary size for insert message into wal.
- add metric to record the total growing rows of channel.

Signed-off-by: chyezh <chyezh@outlook.com>
2025-05-16 11:12:22 +08:00
Zhen Ye ae43230703
enhance: set jemalloc prof disable by default (#41850)
issue: #40730

- add assertion for insert message
- add more buffer for seal notifier

Signed-off-by: chyezh <chyezh@outlook.com>
2025-05-15 20:10:23 +08:00
Zhen Ye 0a465bb5b7
enhance: use recovery+shardmanager, remove segment assignment interceptor (#41824)
issue: #41544

- add lock interceptor into wal.
- use recovery and shardmanager to replace the original implementation
of segment assignment.
- remove redundant implementation and unittest.
- remove redundant proto definition.
- use 2 streamingnode in e2e.

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-05-14 23:00:23 +08:00
Zhen Ye 21d6d1669e
fix: wal should be reopen if wal append receive the fence error (#41807)
issue: #41544

Signed-off-by: chyezh <chyezh@outlook.com>
2025-05-14 01:02:56 +08:00
Zhen Ye 7beafe99a7
enhance: implement wal garbage collector with truncate api (#41770)
issue: #41544

- add a truncator implementation into wal recovery storage.
- add metrics for recovery storage.

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-05-13 22:08:56 +08:00
Zhen Ye 61b6ca5b73
enhance: add in mem shard manager (#41749)
issue: #41544

- Implement in-memory shard manager to maintain the shard state at write
ahead.
- Remove all rpc and meta operation at write ahead, make the segment
assignment logic only use wal and memory.
- Refactor global stats management, add node-level flush policy.
- Fix the recovery storage inconsistency bug when graceful close.

Signed-off-by: chyezh <chyezh@outlook.com>
2025-05-13 12:04:56 +08:00
Zhen Ye e675da76e4
enhance: simplify the proto message, make segment assignment code more clean (#41671)
issue: #41544

- simplify the proto message for flush and create segment.
- simplify the msg handler for flowgraph.

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-05-11 20:49:00 +08:00
Zhen Ye 452d6fb709
fix: write buffer leak if the wal flusher is cancelled when recovery (#41719)
issue: #41715

Signed-off-by: chyezh <chyezh@outlook.com>
2025-05-10 09:32:56 +08:00
Zhen Ye 3dd9a1147b
enhance: add lock interceptor and recoverable txn manager (#41640)
issue: #41544

- add a lock interceptor at vchannel granularity.
- make txn manager recoverable and add FailTxnAtVChannel operation.

Signed-off-by: chyezh <chyezh@outlook.com>
2025-05-09 11:14:53 +08:00
Zhen Ye de8f0af20d
enhance: use dispatcher at delegator when enable streaming (#41266)
issue: #38399

- add an adaptor type to adapt the streaming service client and
msgstream client to reuse the msgdispatcher.

Signed-off-by: chyezh <chyezh@outlook.com>
2025-05-06 01:12:53 +08:00
Zhen Ye dfbb02a5f7
enhance: make streaming message as a log field for easier coding (#41545)
issue: #41544

- implement message can be logged as a field by zap.
- fix too many slow log for woodpecker.

Signed-off-by: chyezh <chyezh@outlook.com>
2025-04-28 14:38:42 +08:00
Zhen Ye 6a15790799
enhance: add interface for message and fix write ahead buffer (#41470)
issue: #41439

- add IsPersisted and VChannel interface for message
- add WithNotPersisted() for message builder
- fix the persisted time tick lost at write ahead buffer

Signed-off-by: chyezh <chyezh@outlook.com>
2025-04-27 10:24:38 +08:00
Zhen Ye a3d621cb5e
fix: remove the concurrent limits for streaming service (#41484)
issue: #41479

Signed-off-by: chyezh <chyezh@outlook.com>
2025-04-24 20:36:38 +08:00
Zhen Ye ecfc868dcb
fix: write buffer not unregistered when datasyncservice is gone (#41496)
issue: #41495

Signed-off-by: chyezh <chyezh@outlook.com>
2025-04-24 19:38:38 +08:00
congqixia b36c88f3c8
enhance: [AddField] Broadcast schema change via WAL (#41373)
Related to #39718

Add Broadcast logic for collection schema change and notifies:
- Streamnode - Delegator
- Streamnode - Flush component
- QueryNodes via grpc

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-04-22 16:28:37 +08:00