issue: #47178
This commit introduces a rate limiting mechanism for Write-Ahead Logging
(WAL) operations to prevent overload during high traffic. Key changes
include:
- Added `RateLimitObserver` to monitor and control the rate of DML
operations.
- Add Adaptive RateLimitController to apply the strategy of rate limit.
- WAL will slow down if the recovery-storage works on catchup mode or
node memory is high.
- Updated `WAL` and related components to handle rate limit states,
including rejection and slowdown.
- Introduced new error codes for rate limit rejection in the streaming
error handling.
- Enhanced tests to cover the new rate limiting functionality.
These changes aim to improve the stability and performance of the
streaming service under load.
---------
Signed-off-by: chyezh <chyezh@outlook.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Related to #46199
## Summary
Remove 5 unused or misused Go dependencies to reduce module bloat and
consolidate overlapping libraries:
- **`mgutz/ansi`** → replaced with inline ANSI escape codes (only used
for 3 color constants in migration console)
- **`valyala/fastjson`** → replaced with `tidwall/gjson` (only 1 file
used fastjson; gjson is already used in 22+ files)
- **`google.golang.org/grpc/examples`** → replaced with existing
`rootcoordpb` (test file pulled in entire grpc examples repo for a mock
server)
- **`remeh/sizedwaitgroup`** → replaced with `chan` semaphore +
`sync.WaitGroup` (only 2 files, trivial pattern)
- **`pkg/errors`** → replaced with `cockroachdb/errors` (the project
standard; `pkg/errors` was used in 1 file)
## Behavior change: DeleteLog.Parse() fail-fast on missing fields
The `fastjson` → `gjson` migration adds explicit `Exists()` validation
for `ts`, `pk`, and `pkType` fields in the JSON parsing branch.
Previously, both fastjson and gjson would silently return zero values
for missing fields, causing `dl.Pk` to remain nil and panicking
downstream. The new code fails fast with a descriptive error at parse
time. This is a defensive improvement (the original code had identical
silent-failure behavior).
## Performance impact
| Change | Path type | Perf delta | Matters? |
|--------|-----------|------------|----------|
| `pkg/errors` → `cockroachdb/errors` | Cold (offline CLI tool
`config-docs-generator`) | Negligible | No |
| `mgutz/ansi` → inline ANSI codes | Cold (offline CLI tool
`migration/console`) | Marginally faster (eliminates map lookup) | No |
| `fastjson` → `gjson` (`DeleteLog.Parse`) | Warm — old-format deltalog
deserialization only | **~2.5x slower** per JSON parse (143ns→361ns) |
**No** — see below |
| `grpc/examples` → `rootcoordpb` | Test only (`client_test.go`) | None
| No |
| `sizedwaitgroup` → chan+WaitGroup | Test only (`wal_test.go`,
`test_framework.go`) | None | No |
### fastjson → gjson regression detail
`DeleteLog.Parse()` is called per-row during deltalog deserialization,
but **only for the legacy single-field format**. The new multi-field
parquet format (`newDeltalogMultiFieldReader`) reads pk/ts as separate
Arrow columns and bypasses `Parse()` entirely. Legacy deltalogs are
rewritten to parquet format during compaction, so this is a dying code
path. Additionally, deltalog loading is I/O-bound — the JSON parse cost
(~361ns/row) is negligible compared to disk read and Arrow
deserialization overhead.
Benchmark (Go 1.24, arm64):
```
BenchmarkFastjsonSmall-4 8,315,624 143.1 ns/op 0 B/op 0 allocs/op
BenchmarkGjsonOptimized-4 3,321,613 361.4 ns/op 96 B/op 1 allocs/op
```
## Test plan
- [x] CI build passes
- [x] CI code-check passes
- [ ] CI ut-go passes
- [ ] CI e2e passes
- [x] Boundary test cases added (bare number, missing pkType/ts/pk)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Signed-off-by: Li Liu <li.liu@zilliz.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
issue: #47647
Refactor the cluster-level broadcast mechanism to decouple the message
package from the channel registration lifecycle:
- Replace internal provider pattern with opaque ClusterChannels type
passed externally to WithClusterLevelBroadcast()
- Add channel package singleton (syncutil.Future) exposing
GetClusterChannels() and GetPChannelNames() blocking accessors
- Add PChannel() interface to MutableMessage/ImmutableMessage for
deriving physical channel from virtual channel
- Validate non-control-channel entries are physical channels using
funcutil.IsPhysicalChannel and use funcutil.IsOnPhysicalChannel for
control channel matching
- Move control channel substitution logic into WithClusterLevelBroadcast
to simplify callers (datacoord, coordinator, assignment service)
- Add lock interceptor unit tests and cluster broadcast test coverage
- Add integration test for FlushAll with streaming node restart to
verify data integrity across node lifecycle
---------
Signed-off-by: chyezh <chyezh@outlook.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
design doc:
https://github.com/milvus-io/milvus-design-docs/blob/main/design_docs/20260105-external_table.md
issue: #45881
This change introduces manual refresh capability for external
collections, allowing users to trigger on-demand data synchronization
from external sources. It replaces the legacy update mechanism with a
more robust job-task hierarchy and persistent state management.
Key changes:
- Add RefreshExternalCollection, GetRefreshExternalCollectionProgress,
and ListRefreshExternalCollectionJobs APIs across Client, Proxy,
and DataCoord
- Implement ExternalCollectionRefreshManager to manage refresh jobs
with a 1:N Job-Task hierarchy
- Add ExternalCollectionRefreshMeta for persistent storage of jobs and
tasks in the metastore
- Add ExternalCollectionRefreshChecker for task state management and
worker assignment
- Implement ExternalCollectionRefreshInspector for periodic job
cleanup
- Use WAL Broadcast mechanism for distributed consistency and
idempotency
- Replace legacy external_collection_inspector and update tasks with
the new refresh-based implementation
- Add comprehensive unit tests for refresh job lifecycle and state
transitions
design doc:
https://github.com/milvus-io/milvus-design-docs/blob/main/design_docs/20260105-external_table.md
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
Related to #46358
Add a new BatchUpdateManifest API that allows updating manifest versions
for multiple segments in a single request. The update is broadcast via
the streaming WAL to ensure consistency across the cluster. This
includes the proto definitions, proxy task, datacoord service handler,
meta operator, streaming message type registration, and associated
tests.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
when default key is provided, all the new database will be encrypted by
default and use the
default key as the key.
See also: #40013
---------
Signed-off-by: yangxuan <xuan.yang@zilliz.com>
issue: #44726
Introduce an immutable option to prevent accidental modification of
critical configurations.
Support switching of WAL implementation.
Note: This PR depends on [milvus-proto PR
#503](https://github.com/milvus-io/milvus-proto/pull/503) being merged
first.
Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>
issue: #44358
Implement complete snapshot management system including creation,
deletion, listing, description, and restoration capabilities across all
system components.
Key features:
- Create snapshots for entire collections
- Drop snapshots by name with proper cleanup
- List snapshots with collection filtering
- Describe snapshot details and metadata
Components added/modified:
- Client SDK with full snapshot API support and options
- DataCoord snapshot service with metadata management
- Proxy layer with task-based snapshot operations
- Protocol buffer definitions for snapshot RPCs
- Comprehensive unit tests with mockey framework
- Integration tests for end-to-end validation
Technical implementation:
- Snapshot metadata storage in etcd with proper indexing
- File-based snapshot data persistence in object storage
- Garbage collection integration for snapshot cleanup
- Error handling and validation across all operations
- Thread-safe operations with proper locking mechanisms
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant/assumption: snapshots are immutable point‑in‑time
captures identified by (collection, snapshot name/ID); etcd snapshot
metadata is authoritative for lifecycle (PENDING → COMMITTED → DELETING)
and per‑segment manifests live in object storage (Avro / StorageV2). GC
and restore logic must see snapshotRefIndex loaded
(snapshotMeta.IsRefIndexLoaded) before reclaiming or relying on
segment/index files.
- New capability added: full end‑to‑end snapshot subsystem — client SDK
APIs (Create/Drop/List/Describe/Restore + restore job queries),
DataCoord SnapshotWriter/Reader (Avro + StorageV2 manifests),
snapshotMeta in meta, SnapshotManager orchestration
(create/drop/describe/list/restore), copy‑segment restore
tasks/inspector/checker, proxy & RPC surface, GC integration, and
docs/tests — enabling point‑in‑time collection snapshots persisted to
object storage and restorations orchestrated across components.
- Logic removed/simplified and why: duplicated recursive
compaction/delta‑log traversal and ad‑hoc lookup code were consolidated
behind two focused APIs/owners (Handler.GetDeltaLogFromCompactTo for
delta traversal and SnapshotManager/SnapshotReader for snapshot I/O).
MixCoord/coordinator broker paths were converted to thin RPC proxies.
This eliminates multiple implementations of the same traversal/lookup,
reducing divergence and simplifying responsibility boundaries.
- Why this does NOT introduce data loss or regressions: snapshot
create/drop use explicit two‑phase semantics (PENDING → COMMIT/DELETING)
with SnapshotWriter writing manifests and metadata before commit; GC
uses snapshotRefIndex guards and
IsRefIndexLoaded/GetSnapshotBySegment/GetSnapshotByIndex checks to avoid
removing referenced files; restore flow pre‑allocates job IDs, validates
resources (partitions/indexes), performs rollback on failure
(rollbackRestoreSnapshot), and converts/updates segment/index metadata
only after successful copy tasks. Extensive unit and integration tests
exercise pending/deleting/GC/restore/error paths to ensure idempotence
and protection against premature deletion.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #46500
- simplify the run_go_codecov.sh to make sure the set -e to protect any
sub command failure.
- remove all embed etcd in test to make full test can be run at local.
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## PR Summary: Simplify Go Unit Tests by Removing Embedded etcd and
Async Startup Scaffolding
**Core Invariant:**
This PR assumes that unit tests can be simplified by running without
embedded etcd servers (delegating to environment-based or external etcd
instances via `kvfactory.GetEtcdAndPath()` or `ETCD_ENDPOINTS`) and by
removing goroutine-based async startup scaffolding in favor of
synchronous component initialization. Tests remain functionally
equivalent while becoming simpler to run and debug locally.
**What is Removed or Simplified:**
1. **Embedded etcd test infrastructure deleted**: Removes
`EmbedEtcdUtil` type and its public methods (SetupEtcd,
TearDownEmbedEtcd) from `pkg/util/testutils/embed_etcd.go`, removes the
`StartTestEmbedEtcdServer()` helper from `pkg/util/etcd/etcd_util.go`,
and removes etcd embedding from test suites (e.g., `TaskSuite`,
`EtcdSourceSuite`, `mixcoord/client_test.go`). Tests now either skip
etcd-dependent tests (via `MILVUS_UT_WITHOUT_KAFKA=1` environment flag
in `kafka_test.go`) or source etcd from external configuration (via
`kvfactory.GetEtcdAndPath()` in `task_test.go`, or `ETCD_ENDPOINTS`
environment variable in `etcd_source_test.go`). This eliminates the
overhead of spinning up temporary etcd servers for unit tests.
2. **Async startup scaffolding replaced with synchronous
initialization**: In `internal/proxy/proxy_test.go` and
`proxy_rpc_test.go`, the `startGrpc()` method signature removes the
`sync.WaitGroup` parameter; components are now created, prepared, and
run synchronously in-place rather than in goroutines (e.g., `go
testServer.startGrpc(ctx, &p)` becomes `testServer.startGrpc(ctx, &p)`
running synchronously). Readiness checks (e.g., `waitForGrpcReady()`)
remain in place to ensure startup safety without concurrency constructs.
This simplifies control flow and reduces debugging complexity.
3. **Shell script orchestration unified with proper error handling**: In
`scripts/run_go_codecov.sh` and `scripts/run_intergration_test.sh`,
per-package inline test invocations are consolidated into a single
`test_cmd()` function with unified `TEST_CMD_WITH_ARGS` array containing
race, coverage, verbose, and other flags. The problematic `set -ex` is
replaced with `set -e` alone (removing debug output noise while
preserving strict error semantics), ensuring the scripts fail fast on
any command failure.
**Why No Regression:**
- Test assertions and code paths remain unchanged; only deployment
source of etcd (embedded → external) and startup orchestration (async →
sync) change.
- Readiness verification (e.g., `waitForGrpcReady()`) is retained,
ensuring components are initialized before test execution.
- Test flags (race detection, coverage, verbosity) are uniformly applied
across all packages via unified `TEST_CMD_WITH_ARGS`, preserving test
coverage and quality.
- `set -e` alone is sufficient for strict failure detection without the
`-x` flag's verbose output.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Signed-off-by: chyezh <chyezh@outlook.com>
issue: #45841
- CPP log make the multi log line in one debug, remove the "\n\t".
- remove some log that make no sense.
- slow down some log like ChannelDistManager.
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: logging is purely observational — this PR only
reduces, consolidates, or reformats diagnostic output (removing
per-item/noise logs, consolidating batched logs, and converting
multi-line log strings) while preserving all control flow, return
values, and state mutations across affected code paths.
- Removed / simplified logic: deleted low-value per-operation debug/info
logs (e.g., ListIndexes, GetRecoveryInfo, GcConfirm,
push-to-reorder-buffer, several streaming/wal/debug traces), replaced
per-item inline logs with single batched deferred logs in
querynodev2/delegator (logExcludeInfo) and CleanInvalid, changed C++
PlanNode ToString() multi-line output to compact single-line bracketed
format (removed "\n\t"), and added thresholded interceptor logging
(InterceptorMetrics.ShouldBeLogged) and message-type-driven log levels
to avoid verbose entries.
- Why this does NOT cause data loss or behavioral regression: no
function signatures, branching, state updates, persistence calls, or
return values were changed — examples: ListIndexes still returns the
same Status/IndexInfos; GcConfirm still constructs and returns
resp.GetGcFinished(); Insert and CleanInvalid still perform the same
insert/removal operations (only their per-item logging was aggregated);
PlanNode ToString changes only affect emitted debug strings. All error
handling and control flow paths remain intact.
- Enhancement intent: reduce log volume and improve signal-to-noise for
debugging by removing redundant, noisy logs and emitting concise,
rate-/threshold-limited summaries while preserving necessary diagnostics
and original program behavior.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Signed-off-by: chyezh <chyezh@outlook.com>
Introduce a ScannerStartupDelay configuration to enable WAL write-only
recovery, allowing fence messages to be persisted during
primary–secondary switchover when the StreamingNode is trapped in crash
loops.
issue: https://github.com/milvus-io/milvus/issues/46368
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Added a configurable WAL scanner pause/resume and a consumer request
flag to optionally ignore pause signals.
* **Metrics**
* Added a scanner pause gauge and pause-duration tracking for WAL
scanning.
* **Tests**
* Added coverage for pause-consumption behavior and cleanup in stream
client tests.
* **Chores**
* Consolidated flush-all logging into a single field and added a helper
for bulk message conversion.
<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
### **User description**
issue: #46507
we use the assign/unassign api to manage the consumer manually, the
commit operation will generate a new consumer group which is not what we
want. so we disable the auto commit to avoid it, also see:
https://github.com/confluentinc/confluent-kafka-python/issues/250#issuecomment-331377925
___
### **PR Type**
Bug fix
___
### **Description**
- Disable auto-commit in Kafka consumer configuration
- Prevents unwanted consumer group creation from manual offset
management
- Clarifies offset reset behavior with explanatory comments
___
### Diagram Walkthrough
```mermaid
flowchart LR
A["Kafka Consumer Config"] --> B["Set enable.auto.commit to false"]
B --> C["Prevent auto consumer group creation"]
A --> D["Set auto.offset.reset to earliest"]
D --> E["Handle deleted offsets gracefully"]
```
<details><summary><h3>File Walkthrough</h3></summary>
<table><thead><tr><th></th><th align="left">Relevant
files</th></tr></thead><tbody><tr><td><strong>Bug
fix</strong></td><td><table>
<tr>
<td>
<details>
<summary><strong>builder.go</strong><dd><code>Disable auto-commit and
add configuration comments</code>
</dd></summary>
<hr>
pkg/streaming/walimpls/impls/kafka/builder.go
<ul><li>Added <code>enable.auto.commit</code> configuration set to
<code>false</code> to prevent <br>automatic consumer group creation<br>
<li> Added explanatory comments for both <code>auto.offset.reset</code>
and <br><code>enable.auto.commit</code> settings<br> <li> Clarifies that
manual assign/unassign API is used for consumer <br>management</ul>
</details>
</td>
<td><a
href="https://github.com/milvus-io/milvus/pull/46508/files#diff-4b5635821fdc8b585d16c02d8a3b59079d8e667b2be43a073265112d72701add">+7/-0</a>
</td>
</tr>
</table></td></tr></tbody></table>
</details>
___
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
## Bug Fixes
* Kafka consumer now reads from the earliest available messages and
auto-commit has been disabled to support manual offset management.
<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Signed-off-by: chyezh <chyezh@outlook.com>
issue: #43897
also for issue: #46166
add ack_sync_up flag into broadcast message header, which indicates that
whether the broadcast operation is need to be synced up between the
streaming node and the coordinator.
If the ack_sync_up is false, the broadcast operation will be acked once
the recovery storage see the message at current vchannel, the fast ack
operation can be applied to speed up the broadcast operation.
If the ack_sync_up is true, the broadcast operation will be acked after
the checkpoint of current vchannel reach current message.
The fast ack operation can not be applied to speed up the broadcast
operation, because the ack operation need to be synced up with streaming
node.
e.g. if truncate collection operation want to call ack once callback
after the all segment are flushed at current vchannel, it should set the
ack_sync_up to be true.
TODO: current implementation doesn't promise the ack sync up semantic,
it only promise FastAck operation will not be applied, wait for 3.0 to
implement the ack sync up semantic. only for truncate api now.
---------
Signed-off-by: chyezh <chyezh@outlook.com>
This PR:
1. Define and implement the new FlushAllMessage.
2. Refactor FlushAll to flush the entire cluster.
issue: https://github.com/milvus-io/milvus/issues/45919
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
issue: #44369
woodpecker related[ issue:
#59](https://github.com/zilliztech/woodpecker/issues/59)
Refactor the WAL retention logic in Milvus StreamingNode:
- Remove the simple sampling-based truncation mechanism.
- After flush, WAL data is directly truncated.
- The retention control is now delegated to the underlying message queue
(MQ) implementation.
Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>
issue: #45452
- alias/rename related DDL should use database level exclusive lock
- alias cannot use as the resource key of lock, use collection name
instead
- transfer replica should use WAL-based framework
Signed-off-by: chyezh <chyezh@outlook.com>
issue: #45403, #45463
- fix the Nightly E2E failures.
- fix the wrong update timetick of altering collection to fix the
related load failure.
Signed-off-by: chyezh <chyezh@outlook.com>
issue: #44172, #45210, #44851
kafka will auto reset the offset to "latest" if the offset is
Out-of-range. the recovery of milvus wal cannot read any message from
that. So once the offset is out-of-range, kafka should read from eariest
to read the latest uncleared data.
https://kafka.apache.org/documentation/#consumerconfigs_auto.offset.reset
Signed-off-by: chyezh <chyezh@outlook.com>
issue: #43897
- Load/Release collection/partition is implemented by WAL-based DDL
framework now.
- Support AlterLoadConfig/DropLoadConfig in wal now.
- Load/Release operation can be synced by new CDC now.
- Refactor some UT for load/release DDL.
---------
Signed-off-by: chyezh <chyezh@outlook.com>
issue: #43897
- Part of collection/index related DDL is implemented by WAL-based DDL
framework now.
- Support following message type in wal, CreateCollection,
DropCollection, CreatePartition, DropPartition, CreateIndex, AlterIndex,
DropIndex.
- Part of collection/index related DDL can be synced by new CDC now.
- Refactor some UT for collection/index DDL.
- Add Tombstone scheduler to manage the tombstone GC for collection or
partition meta.
- Move the vchannel allocation into streaming pchannel manager.
---------
Signed-off-by: chyezh <chyezh@outlook.com>
issue: #45088, #45086
- Message on control channel should trigger the checkpoint update.
- LastConfrimedMessageID should be recovered from the minimum of
checkpoint or the LastConfirmedMessageID of uncommitted txn.
- Add more log info for wal debugging.
---------
Signed-off-by: chyezh <chyezh@outlook.com>
issue: #44697, #44696
- The DDL executing order of secondary keep same with order of control
channel timetick now.
- filtering the control channel operation on shard manager of
streamingnode to avoid wrong vchannel of create segment.
- fix that the immutable txn message lost replicate header.
---------
Signed-off-by: chyezh <chyezh@outlook.com>
issue: #43897
- UpdateReplicateConfig operation will broadcast AlterReplicateConfig
message into all pchannels with cluster-exclusive-lock.
- Begin txn message will use commit message timetick now (to avoid
timetick rollback when CDC with txn message).
- If current cluster is secondary, the UpdateReplicateConfig will wait
until the replicate configuration is consistent with the config
replicated from primary.
---------
Signed-off-by: chyezh <chyezh@outlook.com>
issue: #43897
- Return LastConfirmedMessageID when wal append operation.
- Add resource-key-based locker for broadcast-ack operation to protect
the coord state when executing ddl.
- Resource-key-based locker is held until the broadcast operation is
acked.
- ResourceKey support shared and exclusive lock.
- Add FastAck execute ack right away after the broadcast done to speed
up ddl.
- Ack callback will support broadcast message result now.
- Add tombstone for broadcaster to avoid to repeatedly commit DDL and
ABA issue.
---------
Signed-off-by: chyezh <chyezh@outlook.com>
issue: #44123
- support replicate message in wal of milvus.
- support CDC-replicate recovery from wal.
- fix some CDC replicator bugs
Signed-off-by: chyezh <chyezh@outlook.com>
issue: #43897
- add ddl messages proto and add some message utilities.
- support shard/exclusive resource-key-lock.
- add all ddl callbacks future into broadcast registry.
---------
Signed-off-by: chyezh <chyezh@outlook.com>
issue: #43897
- The original broadcast ack operation need to recover message from
etcd, which can not support cdc.
- immutable message will set as the ack parameter to fix it.
Signed-off-by: chyezh <chyezh@outlook.com>
1. Enable Milvus to read cipher configs
2. Enable cipher plugin in binlog reader and writer
3. Add a testCipher for unittests
4. Support pooling for datanode
5. Add encryption in storagev2
See also: #40321
Signed-off-by: yangxuan <xuan.yang@zilliz.com>
---------
Signed-off-by: yangxuan <xuan.yang@zilliz.com>
issue: #43785
- pulsar client will print log into milvus logger now.
- pulsar client open the metric by default.
- upgrade the pulsar client to v0.15.1, and use offical repo.
- the fixing of milvus-io/pulsar-client-go is already covered by
official v0.15.1.
Signed-off-by: chyezh <chyezh@outlook.com>