Commit Graph

1766 Commits (17532517c611cafe5ec7a79bda47c9f296e82682)

Author SHA1 Message Date
sijie-ni-0214 98934d3212
enhance: make thread pool max threads size configurable (#48379)
## Summary

- Replace the hardcoded thread pool max threads limit (16) with a
configurable parameter `common.threadCoreCoefficient.maxThreadsSize`,
default 16, only effective when greater than 0
- Add missing Resize watchers for middle and low priority thread pools
- When `maxThreadsSize` changes dynamically, update the limit first then
resize all pools to ensure correct ordering
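
The cap described above could be sketched as follows. This is a minimal illustration that assumes the pool size is derived as CPU count times a coefficient; `clampThreads` and its parameters are hypothetical names, not the Milvus implementation.

```go
package main

import "fmt"

// clampThreads returns cpuNum*coefficient, capped at maxThreads.
// The cap only applies when maxThreads > 0, mirroring the
// "only effective when greater than 0" rule in the commit message.
// Hypothetical helper for illustration only.
func clampThreads(cpuNum, coefficient, maxThreads int) int {
	n := cpuNum * coefficient
	if n < 1 {
		n = 1
	}
	if maxThreads > 0 && n > maxThreads {
		n = maxThreads
	}
	return n
}

func main() {
	fmt.Println(clampThreads(8, 10, 16)) // 16: the cap applies
	fmt.Println(clampThreads(8, 10, 0))  // 80: cap disabled
	fmt.Println(clampThreads(1, 4, 16))  // 4: already below the cap
}
```

When the limit changes at runtime, the ordering in the commit message matters: the new cap has to be stored before each pool recomputes its size, otherwise a pool could resize against the stale limit.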

issue: #48378

Signed-off-by: sijie-ni-0214 <sijie.ni@zilliz.com>
2026-03-24 06:07:29 +08:00
Chun Han cc9c60eeaf
fix: HashTable dynamic rehash and group count limit for GROUP BY aggregation (#48174)
## Summary
- HashTable now dynamically rehashes (doubles capacity) when load factor
exceeds 7/8, fixing crash when GROUP BY cardinality > ~1792
- Added configurable `queryNode.segcore.maxGroupByGroups` (default 100K)
to cap total groups and prevent OOM on both C++ (per-segment HashTable)
and Go (cross-segment agg reducer) layers
- Added 4 C++ unit tests covering rehash basic/correctness, max groups
limit, and multiple rehash rounds

issue: #47569

## Test plan
- [ ] C++ unit tests: `--gtest_filter="*HashTableRehash*:*MaxGroups*"`
- [ ] E2E: GROUP BY aggregation with >2K unique values should succeed
- [ ] E2E: Set `queryNode.segcore.maxGroupByGroups` to small value,
verify clear error message

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-23 18:21:29 +08:00
tinswzy aed7c8bcfb
enhance: update WP version v0.1.25 (#45011)
issue: #43638
Update WP (Woodpecker) to the latest version.
Introduce the beta version of the WP service mode.

---------

Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>
2026-03-23 05:57:30 +08:00
yihao.dai 89bd75fab7
fix: apply denylist retry to pack_writer writeLog and binlog import (#48402)
### Summary
Follow-up to #48152 which applied denylist retry to parquet/json/csv
imports but missed two other paths.

- **fix(High)**: `pack_writer.go` `writeLog` now skips retry only for
non-retryable errors (permission denied, bucket not found, invalid
credentials, etc.), matching the denylist strategy in
`retryable_reader.go`.
- **fix(Medium)**: Binlog import's `WithDownloader` callbacks now use
`multiReadWithRetry`, skipping retry only for non-retryable errors.
Previously all transient failures were not retried.
- **fix(Low)**: `IsMilvusError` in `merr/utils.go` switched from
`errors.Cause` (root only) to `errors.As` (full chain traversal).
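
The difference between root-only inspection and full chain traversal can be illustrated with the standard library alone. In this sketch `codedError` is a hypothetical stand-in for a Milvus error type, and `rootCause` approximates what a root-only lookup sees; it is not the `merr` code.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// codedError is a hypothetical typed error that wraps a lower-level cause.
type codedError struct {
	code  int
	cause error
}

func (e *codedError) Error() string { return fmt.Sprintf("code %d: %v", e.code, e.cause) }
func (e *codedError) Unwrap() error { return e.cause }

// rootCause unwraps until the innermost error is reached.
func rootCause(err error) error {
	for {
		next := errors.Unwrap(err)
		if next == nil {
			return err
		}
		err = next
	}
}

// isCodedRootOnly only inspects the innermost error.
func isCodedRootOnly(err error) bool {
	_, ok := rootCause(err).(*codedError)
	return ok
}

// isCodedFullChain inspects every error in the chain via errors.As.
func isCodedFullChain(err error) bool {
	var target *codedError
	return errors.As(err, &target)
}

// sampleError builds: wrapper -> codedError -> io.ErrUnexpectedEOF.
func sampleError() error {
	return fmt.Errorf("read binlog: %w", &codedError{code: 7, cause: io.ErrUnexpectedEOF})
}

func main() {
	err := sampleError()
	fmt.Println(isCodedRootOnly(err), isCodedFullChain(err)) // false true
}
```

Because the typed error sits in the middle of the chain, the root-only check misses it while `errors.As` finds it.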

### Out of Scope
- `pack_writer_v2.go` / `pack_writer_v3.go` — same retry pattern but
different code path (multi-part upload); separate fix.
- `writeDelta` — no retry wrapper; separate concern.

issue: #48153

---------

Signed-off-by: Yihao Dai <yihao.dai@zilliz.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-21 18:25:27 +08:00
yihao.dai cc792486a8
feat: Support force promote for primary-secondary failover (#47352)
Add force_promote flag to UpdateReplicateConfiguration API for disaster
recovery.

Changes:

- Add ForcePromote field to UpdateReplicateConfigurationRequest

- Refactor UpdateReplicateConfiguration to accept request object instead
of separate params

- Add WithForcePromote() method to ReplicateConfigurationBuilder

- Implement force promote validation and handling in assignment service

- Add integration tests for force promote scenarios

Force promote allows a secondary cluster to immediately become
standalone primary

when the original primary is unavailable, enabling active-passive
failover.

issue: https://github.com/milvus-io/milvus/issues/47351

design doc:
https://github.com/milvus-io/milvus-design-docs/blob/main/design_docs/20260202-force_promote_failover.md

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
Signed-off-by: Yihao Dai <yihao.dai@zilliz.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-20 16:09:28 +08:00
Zhen Ye 48ba5fbfcd
fix: use sliding window for old version message lastConfirmedMessageID to prevent long catchup (#48390)
issue: #48389

Previously, all old version (v0) WAL messages shared the same
lastConfirmedMessageID pointing to the very first v0 message. When a
tailing scanner fell back to catchup mode (e.g., due to WAL ownership
change), it would restart from this extremely old position, causing
catchup times of 14+ minutes during which tsafe could not advance and
all search requests would timeout.

This change replaces the fixed first-message ID with a configurable
sliding window (default size 30). The lastConfirmedMessageID now points
to the message N positions back, bounding the WAL replay distance on
fallback to at most N messages.
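
One way to picture the sliding window is the sketch below. It assumes the window keeps the most recent N message IDs and reports the oldest of them as the confirmed position; `confirmWindow` and `push` are hypothetical names, and the exact off-by-one behavior of the real implementation is not stated in the commit message.

```go
package main

import "fmt"

// confirmWindow keeps the IDs of the most recent `size` messages.
// Hypothetical type for illustration only.
type confirmWindow struct {
	size int
	ids  []int64
}

// push records a new message ID and returns the ID that a scanner
// would restart from on fallback: the oldest ID still in the window.
func (w *confirmWindow) push(id int64) int64 {
	w.ids = append(w.ids, id)
	if len(w.ids) > w.size {
		w.ids = w.ids[1:]
	}
	return w.ids[0]
}

func main() {
	w := &confirmWindow{size: 3}
	for id := int64(1); id <= 5; id++ {
		fmt.Println(id, "->", w.push(id))
	}
}
```

With a fixed first-message ID the replay distance grows without bound; with the window it never exceeds the window size.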

Signed-off-by: chyezh <chyezh@outlook.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 11:20:34 +08:00
sthuang c54f2e47a1
fix: split QueryCoord executor into channel/non-channel pools to prevent deadlock (#48309)

related: #48308 

When hundreds of channels need rebalancing (e.g. during upgrade),
channel tasks could fill the entire per-node executor pool, blocking
segment/leader tasks and causing a deadlock. This fix splits the single
pool into two independent pools:
- Channel task pool: controlled by channelTaskCapFraction (default 0.1)
- Non-channel task pool: remainder of total capacity
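
The capacity split could look roughly like this. The rounding and the minimum of one slot per pool are assumptions made for the sketch; `splitCapacity` is a hypothetical helper, not the QueryCoord code.

```go
package main

import "fmt"

// splitCapacity divides a per-node executor capacity into a channel-task
// pool and a non-channel pool. Assumptions: the channel share is rounded
// down, and each pool always keeps at least one slot.
func splitCapacity(total int, channelFraction float64) (channelCap, otherCap int) {
	if total < 2 {
		total = 2 // need at least one slot per pool
	}
	channelCap = int(float64(total) * channelFraction)
	if channelCap < 1 {
		channelCap = 1
	}
	if channelCap > total-1 {
		channelCap = total - 1
	}
	return channelCap, total - channelCap
}

func main() {
	ch, other := splitCapacity(256, 0.1)
	fmt.Println(ch, other) // 25 231
}
```

Since the two pools are independent, a burst of channel tasks can exhaust only its own share, and segment or leader tasks keep making progress.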

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2026-03-19 14:05:27 +08:00
congqixia e4a3b02d95
enhance: add all-replicas-ready checkpoint metric for query coord (#48305)
The existing current_target_checkpoint_unix_seconds metric advances as
soon as any single replica is ready per channel, which does not reflect
the true minimum state of the cluster in multi-replica mode.

Add a new metric current_target_all_replicas_checkpoint_unix_seconds
that only updates when all replicas have at least one ready delegator
for a given channel, correctly reflecting the cluster-wide minimum
checkpoint state.
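
As an illustration of the "all replicas ready" rule, the sketch below reports a value only when every replica is ready for the channel, and then takes the minimum. `replicaState` and `allReplicasCheckpoint` are hypothetical names; how the real metric derives each replica's checkpoint is not described in the commit message.

```go
package main

import "fmt"

// replicaState is a hypothetical per-channel view of one replica.
type replicaState struct {
	ready      bool  // at least one ready delegator for the channel
	checkpoint int64 // checkpoint in unix seconds
}

// allReplicasCheckpoint returns the minimum checkpoint across replicas,
// and false when any replica is not ready (the metric is left untouched).
func allReplicasCheckpoint(states []replicaState) (int64, bool) {
	if len(states) == 0 {
		return 0, false
	}
	min := states[0].checkpoint
	for _, s := range states {
		if !s.ready {
			return 0, false
		}
		if s.checkpoint < min {
			min = s.checkpoint
		}
	}
	return min, true
}

func main() {
	fmt.Println(allReplicasCheckpoint([]replicaState{{true, 100}, {true, 90}}))  // 90 true
	fmt.Println(allReplicasCheckpoint([]replicaState{{true, 100}, {false, 0}})) // 0 false
}
```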

issue: #48304

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-19 12:19:27 +08:00
Zhen Ye bd6054a2fd
enhance: improve replica and node assignment stability during scale-up/down (#48275)
issue: #48239

Problem:
During replica scale-up/down, RG name changes cause cross-RG node
migration. The async resource_observer uses non-deterministic map
iteration to select which nodes go to which RG, breaking the original
node-to-replica mapping. This can leave a newly created replica without
QN nodes, causing query unavailability when the old replicas are
released.

Root cause:
1. ReassignReplicaToRG iterates maps non-deterministically, so the
existing replica may be transferred to any RG rather than the
lexicographically smallest one.
2. When UpdateResourceGroups zeroes an RG and creates new ones, nodes
transit through __recycle_resource_group with non-deterministic
selection at each hop, scrambling the original node assignment.

Fix:
1. Sort RG iteration in ReassignReplicaToRG and resource_observer to
ensure deterministic replica-to-RG mapping: existing replica goes to
lex-smallest RG during scale-up, lex-smallest RG's replica is preserved
during scale-down.
2. Add transferNodesOnRGSwap in updateResourceGroups: when a batch
update zeroes an RG (old request=N,limit=N -> new 0,0) and another RG
takes the same config (new request=N,limit=N), directly swap their node
lists before persisting, bypassing the async observer entirely.
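
The determinism part of the fix rests on a general Go property: map iteration order is unspecified, so any selection made while ranging over a map can differ between runs. A minimal sketch of the sorted alternative follows; the function names and the map shape are illustrative, not the actual `ReassignReplicaToRG` signature.

```go
package main

import (
	"fmt"
	"sort"
)

// sortedRGNames returns resource group names in lexicographic order,
// so callers iterate deterministically instead of ranging over the map.
func sortedRGNames(rgs map[string][]int64) []string {
	names := make([]string, 0, len(rgs))
	for name := range rgs {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

// smallestRG picks the lexicographically smallest resource group,
// the target the commit message describes for the existing replica.
func smallestRG(rgs map[string][]int64) (string, bool) {
	names := sortedRGNames(rgs)
	if len(names) == 0 {
		return "", false
	}
	return names[0], true
}

func main() {
	rgs := map[string][]int64{"rg_b": {2}, "rg_a": {1}, "rg_c": {3}}
	fmt.Println(sortedRGNames(rgs)) // [rg_a rg_b rg_c]
	fmt.Println(smallestRG(rgs))    // rg_a true
}
```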

Extra fix:
- Skip DQL forwarding to legacy proxy in cluster mode to avoid
legacy-proxy overloading. The API proto GetShardLeader is only lost in
standalone mode.

---------

Signed-off-by: chyezh <chyezh@outlook.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 09:47:30 +08:00
Zhen Ye 446f06eb02
enhance: Implement rate limiting in WAL append operations (#47179)
issue: #47178

This commit introduces a rate limiting mechanism for Write-Ahead Logging
(WAL) operations to prevent overload during high traffic. Key changes
include:

- Added `RateLimitObserver` to monitor and control the rate of DML
operations.
- Added an adaptive RateLimitController that applies the rate-limit
strategy.
- WAL slows down when the recovery storage is in catchup mode or node
memory is high.
- Updated `WAL` and related components to handle rate limit states,
including rejection and slowdown.
- Introduced new error codes for rate limit rejection in the streaming
error handling.
- Enhanced tests to cover the new rate limiting functionality.

These changes aim to improve the stability and performance of the
streaming service under load.

---------

Signed-off-by: chyezh <chyezh@outlook.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-18 16:19:26 +08:00
tinswzy c7cea10912
fix: disable ConditionWrite when using AK/SK on Aliyun OSS (#48310)
issue: #43638
wp related https://github.com/zilliztech/woodpecker/issues/115

Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>
2026-03-18 12:39:26 +08:00
yihao.dai 0bfcae0cfb
enhance: switch import retry from allowlist to denylist strategy (#48152)
### Summary
Switches Milvus import operations from allowlist (only retry specific
errors) to denylist strategy (retry all errors except
permanent/validation ones). This improves reliability by automatically
retrying transient failures while still failing fast on permanent
errors.
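
The two strategies differ only in which side of the decision is enumerated. A minimal denylist retry loop, assuming sentinel errors and a fixed attempt count, might look like this; the error values and the `retry` helper are hypothetical and much simpler than the Milvus retry utilities.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical sentinel errors standing in for the merr error types.
var (
	errPermissionDenied = errors.New("permission denied")
	errBucketNotFound   = errors.New("bucket not found")
	errTransient        = errors.New("connection reset")
)

// isNonRetryable is the denylist: only errors listed here stop the retry loop.
func isNonRetryable(err error) bool {
	return errors.Is(err, errPermissionDenied) || errors.Is(err, errBucketNotFound)
}

// retry runs op up to attempts times and returns the number of calls made.
// It stops early on success or on a denylisted error.
func retry(attempts int, op func() error) (int, error) {
	var err error
	for i := 1; i <= attempts; i++ {
		if err = op(); err == nil || isNonRetryable(err) {
			return i, err
		}
	}
	return attempts, err
}

// failNTimes returns an operation that fails n times with err, then succeeds.
func failNTimes(n int, err error) func() error {
	return func() error {
		if n > 0 {
			n--
			return err
		}
		return nil
	}
}

func main() {
	fmt.Println(retry(5, failNTimes(2, errTransient)))        // 3 <nil>
	fmt.Println(retry(5, failNTimes(2, errPermissionDenied))) // 1 permission denied
}
```

Under an allowlist, an unanticipated transient error would fail immediately; under the denylist it is retried by default, and only known permanent failures short-circuit.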

### Changes
1. **New error types** (pkg/util/merr/errors.go):
   - Added 6 new non-retryable error types for permanent and validation
     errors
   - ErrIoPermissionDenied, ErrIoBucketNotFound, ErrIoInvalidCredentials
   - ErrIoInvalidArgument, ErrIoInvalidRange, ErrIoEntityTooLarge

2. **Denylist function** (pkg/util/merr/utils.go):
   - Implemented IsNonRetryableErr() to identify errors that should NOT be
     retried
   - Checks for permanent errors (permission, credentials, not found)
   - Checks for client validation errors (invalid argument, range, size)

3. **Cloud provider error mapping**
   (internal/storage/remote_chunk_manager.go):
   - Enhanced MinIO/S3 error mapping: 11 error codes
   - Enhanced Azure Blob error mapping: 4 error codes
   - Enhanced GCP Cloud Storage error mapping: 3 HTTP status codes
   - Maps provider-specific errors to Milvus error types

4. **Import retry logic**
   (internal/util/importutilv2/common/retryable_reader.go):
   - Updated Read() to use denylist: retry all errors except non-retryable
     ones
   - Preserves EOF handling (no retry on EOF)

5. **Write retry logic** (internal/flushcommon/syncmgr/pack_writer.go):
   - Updated writeLog() with denylist retry
   - Updated writeDelta() with denylist retry

### Testing
- Added comprehensive unit tests for IsNonRetryableErr()
- Added tests for all cloud provider error mappings
- Updated import read/write retry tests with denylist scenarios
- static-check passed

### Issue
issue: #48153

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-18 12:03:26 +08:00
congqixia 3ec434da8b
fix: prevent CASCachedValue permanent failure with FallbackKeys (#48313)
When a ParamItem's primary key equals DefaultValue and a FallbackKey has
a different value, getWithRaw() overwrote `raw` with the fallback value.
CASCachedValue then re-reads the primary key and compares it against
`raw` — mismatch causes CAS to permanently fail. The cache is never
populated, forcing every call through the write-lock path and causing
goroutine contention on proxy search hot paths.

Introduce `effectiveRaw` to separate the CAS comparison value (always
the primary key's raw value) from the computation value (may come from
fallback), so CAS succeeds and the cache is properly populated.

Related to #48312

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2026-03-18 11:03:26 +08:00
Chun Han 182b35c347
feat: add segcore ORDER BY support for query (#48125)
## Summary
- Add `QueryOrderByNode` operator with `SortBuffer` for heap-based
partial sort in segcore
- Add `FillOrderByResult` path in `SegmentInterface` that bypasses
`find_first` for ORDER BY queries
- Parse `order_by_fields` from query plan proto into segcore plan nodes
- Proto changes: add `OrderByField` message and `order_by_fields` field
in `QueryPlanNode`
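
The idea behind a heap-based partial sort can be sketched in a few lines: keep only the best k rows in a bounded heap instead of sorting everything. The example below is in Go with integer keys for brevity; the actual `SortBuffer` is C++ and handles multiple keys, nulls, and data types, so treat the names here as illustrative.

```go
package main

import (
	"container/heap"
	"fmt"
)

// maxHeap keeps the largest retained value at the root so it can be evicted first.
type maxHeap []int

func (h maxHeap) Len() int           { return len(h) }
func (h maxHeap) Less(i, j int) bool { return h[i] > h[j] }
func (h maxHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *maxHeap) Push(x any)        { *h = append(*h, x.(int)) }
func (h *maxHeap) Pop() any {
	old := *h
	n := len(old)
	v := old[n-1]
	*h = old[:n-1]
	return v
}

// topKAscending returns the k smallest values in ascending order,
// using O(k) memory and O(n log k) time.
func topKAscending(values []int, k int) []int {
	if k <= 0 {
		return nil
	}
	h := &maxHeap{}
	for _, v := range values {
		if h.Len() < k {
			heap.Push(h, v)
		} else if v < (*h)[0] {
			(*h)[0] = v // replace the current worst candidate
			heap.Fix(h, 0)
		}
	}
	out := make([]int, h.Len())
	for i := len(out) - 1; i >= 0; i-- {
		out[i] = heap.Pop(h).(int)
	}
	return out
}

func main() {
	fmt.Println(topKAscending([]int{5, 1, 9, 3, 7, 2}, 3)) // [1 2 3]
}
```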

issue: #41675

## Test plan
- [x] C++ unit test `test_sort_buffer.cpp` covers SortBuffer with
various data types, null handling, multi-key sort
- [ ] Full E2E coverage in the follow-up Go-layer PR

design doc:
https://github.com/milvus-io/milvus-design-docs/blob/main/design_docs/20260203-query-orderby.md

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-17 11:39:26 +08:00
Xiaofan 32ed2df763
fix: preserve trailing slash semantics for meta KV prefix paths (#48053)
issue: #47998
- add util.GetPath/GetPrefixPath and replace path.Join in etcd/tikv key
composition
- normalize metastore prefix builders and Walk/Load/Remove prefix usage
with trailing /
- fix suffix snapshot prefix/root handling when root is empty
- add regression tests for trailing-slash behavior and prefix isolation
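
The underlying behavior is that Go's `path.Join` cleans its result and therefore drops a trailing slash, which lets a prefix for ID `1` also match keys under ID `10`. The sketch below demonstrates this; `prefixPath` is a hypothetical helper and not the actual `util.GetPrefixPath` signature.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// naivePrefix joins with path.Join, which strips any trailing slash.
func naivePrefix(elem ...string) string {
	return path.Join(elem...)
}

// prefixPath joins the elements and guarantees exactly one trailing slash.
// Hypothetical helper; an empty join stays empty.
func prefixPath(elem ...string) string {
	p := path.Join(elem...)
	if p == "" {
		return ""
	}
	return strings.TrimSuffix(p, "/") + "/"
}

// matches reports whether key falls under prefix, as a KV prefix scan would.
func matches(prefix, key string) bool {
	return strings.HasPrefix(key, prefix)
}

func main() {
	key := "meta/collection/10/field"
	fmt.Println(naivePrefix("meta", "collection", "1/"))               // meta/collection/1
	fmt.Println(matches(naivePrefix("meta", "collection", "1/"), key)) // true: unintended match
	fmt.Println(matches(prefixPath("meta", "collection", "1"), key))   // false: isolated
}
```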

Signed-off-by: xiaofanluan <xiaofan.luan@zilliz.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-16 21:13:42 +08:00
Li Liu f1bb8356eb
enhance: upgrade Go dependencies (casbin, gin, lo, cockroachdb/errors) (#47943)
## Summary
- Upgrade Go dependencies across root, pkg, and client modules:
  - `casbin/casbin` v2.44.2 → v2.135.0
  - `gin-gonic/gin` v1.9.1 → v1.11.0
  - `samber/lo` v1.27.0 → v1.52.0
  - `cockroachdb/errors` v1.9.1 → v1.12.0
  - `google.golang.org/protobuf` v1.36.5 → v1.36.9
- Adapt source code to breaking API changes:
- `lo.Last()` now returns `(T, bool)` instead of `(T, error)` (lo v1.52)
- `gin.LogFormatterParams.Keys` is now `map[any]any` instead of
`map[string]any` (gin v1.11)

issue: https://github.com/milvus-io/milvus/issues/33482

## Test plan
- [x] `go mod tidy` clean on all three modules (root, pkg, client)
- [x] Local lint passes with no new errors
- [ ] CI code-check passes
- [ ] CI ut-go passes
- [ ] CI e2e passes

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Signed-off-by: Li Liu <li.liu@zilliz.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-16 21:11:26 +08:00
junjiejiangjjj 4b68cd657a
enhance: Add gemini embedding (#48215)
https://github.com/milvus-io/milvus/issues/35856

Signed-off-by: junjie.jiang <junjie.jiang@zilliz.com>
2026-03-16 15:15:26 +08:00
Li Liu 2be995f927
fix: fix flaky TestCheckCtxValid and TestProxy port binding (#48246)
## Summary
- **TestCheckCtxValid**: replace `time.Sleep(25ms)` with
`assert.Eventually` to eliminate timing dependency — 5ms margin between
sleep and context deadline was too tight on busy CI nodes
(build-ut-ciloop #91)
- **TestProxy/TestProxyRpcLimit**: bind to `localhost:0` instead of
hardcoded port 19530 to avoid TCP TIME_WAIT conflicts when ciloop reruns
tests (build-ut-ciloop #123)
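
Binding to port 0 asks the operating system for any free port, which is what removes the conflict on reruns. A minimal sketch using only the standard library is shown below; `listenEphemeral` is a hypothetical helper, and it binds `127.0.0.1:0` rather than `localhost:0` to avoid depending on name resolution.

```go
package main

import (
	"fmt"
	"net"
)

// listenEphemeral opens a TCP listener on an OS-assigned port and
// returns the listener together with the port that was chosen.
func listenEphemeral() (net.Listener, int, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return nil, 0, err
	}
	return ln, ln.Addr().(*net.TCPAddr).Port, nil
}

func main() {
	ln, port, err := listenEphemeral()
	if err != nil {
		panic(err)
	}
	defer ln.Close()
	fmt.Println("listening on port", port)
}
```

A test then dials the reported port instead of a fixed one, so concurrent or repeated runs do not compete for the same address.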

## Test plan
- [x] TestCheckCtxValid: 500/500 passed post-fix
- [ ] TestProxy: syntax verified (gofmt OK), CI will validate full
compilation

issue: https://github.com/milvus-io/milvus/issues/48118

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Signed-off-by: Li Liu <li.liu@zilliz.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-16 15:09:26 +08:00
cai.zhang fc4982bd62
fix: prevent node panic when unsupported types used as ClusteringKey (#48184)
## Summary

Fixes #47540

- **Bug 1 (MixCoord panic):** `compactionInspector.analyzeScheduler` was
never initialized in production code. When FloatVector is used as
ClusteringKey, the `doAnalyze()` path calls `analyzeScheduler.Enqueue()`
on a nil pointer → panic. Fixed by passing `analyzeScheduler` to
`newCompactionInspector`.
- **Bug 2 (DataNode round-robin panic):** `validateClusteringKey()` had
no field type whitelist, so JSON/Bool/Array passed schema validation.
During clustering compaction, `NewScalarFieldValue()` panics on
unsupported types. Fixed by adding `IsClusteringKeyType()` check to
reject unsupported types at collection creation time.
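
The second fix amounts to validating the field type up front instead of panicking later during compaction. The sketch below assumes an illustrative type enum and accepted set; only the rejection of JSON, Bool, and Array comes from the commit message, and the real `IsClusteringKeyType` list is defined in Milvus.

```go
package main

import (
	"errors"
	"fmt"
)

// dataType is a hypothetical stand-in for the schema field type enum.
type dataType int

const (
	typeBool dataType = iota
	typeInt64
	typeFloat
	typeVarChar
	typeJSON
	typeArray
)

var errUnsupportedClusteringKey = errors.New("unsupported clustering key type")

// isClusteringKeyType is an allowlist; the accepted set here is an assumption.
func isClusteringKeyType(t dataType) bool {
	switch t {
	case typeInt64, typeFloat, typeVarChar:
		return true
	}
	return false
}

// validateClusteringKey rejects unsupported types at collection creation
// time, so later stages never see them.
func validateClusteringKey(t dataType) error {
	if !isClusteringKeyType(t) {
		return errUnsupportedClusteringKey
	}
	return nil
}

func main() {
	fmt.Println(validateClusteringKey(typeInt64)) // <nil>
	fmt.Println(validateClusteringKey(typeJSON))  // unsupported clustering key type
}
```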

## Test plan

- [x] `TestIsClusteringKeyType` — verifies supported/unsupported type
classification
- [x] `TestClusteringKey` — new sub-tests for JSON, Bool, Array as
ClusteringKey (all rejected)
- [ ] Existing `TestClusteringKey` sub-tests (normal, multiple keys,
vector key) still pass
- [ ] `TestCompactionPlanHandler*` tests pass with updated
`newCompactionInspector` signature

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-15 21:21:25 +08:00
Spade A f163e94ff1
feat: impl StructArray -- support element-level query (#47906)
issue: https://github.com/milvus-io/milvus/issues/42148
design doc:
https://github.com/milvus-io/milvus-design-docs/blob/main/design_docs/20260306-struct.md

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2026-03-13 17:55:29 +08:00
aoiasd 9d31bf4f1f
fix: add FileResource privileges to RBAC v2 built-in privilege groups (#48126)
## Summary
- Add FileResource privileges to v2 built-in privilege groups in
`pkg/util/constant.go`

issue: #47893

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2026-03-13 15:33:24 +08:00
Buqian Zheng c87f15698f
feat: update scalar index serialized format to V3 (#47690)
This PR added support for scalar index format V3, packing each scalar
index into a single file, reducing the amount of metadata in ETCD, and
number of files in S3.

~All C++ UTs are updated to run with V3 format. Production code will
still run on V2.~

issue: https://github.com/milvus-io/milvus/issues/47417
design doc:
https://github.com/milvus-io/milvus-design-docs/blob/main/design_docs/20260209-scalar-index-unified-format.md

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2026-03-13 14:51:25 +08:00
Xiaofan 9551cf48a0
fix: use t.TempDir() in TestOnEvent to prevent temp dir collision (#48187)
## Summary

Fixes #48186

Remaining fix for `TestOnEvent` in `pkg/config/manager_test.go` on
master branch. The `assert.NoError` inside `assert.Eventually` and
hardcoded etcd dir were already fixed; this replaces the last
`os.MkdirTemp("", "milvus")` with `t.TempDir()` for the yaml config file
directory.

- Use `t.TempDir()` for the yaml file directory so Go's test framework
handles isolation and cleanup, preventing directory collisions in
parallel CI runs

## Test plan
- [ ] `go test -v -count=1 -tags "dynamic,test" -run TestOnEvent
./pkg/config/`

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Signed-off-by: xiaofanluan <xiaofan.luan@zilliz.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-11 20:01:24 +08:00
Gao 5b3382abdf
enhance: add big topk optimization property (#47848)
issue: #48011 

## Summary
- Add a new collection-level property `bigtopk_optimization.enabled`
that allows collections to use significantly higher TopK limits (up to
1M by default vs 16384) for search and query operations
- When enabled, auto-index creation selects an IVF-based index type
(configurable via `autoIndex.params.bigTopK.build`, default IVF_SQ8)
instead of the default HNSW, which is better suited for large TopK
retrieval scenarios
- Introduce dedicated quota parameters (`quotaAndLimits.limits.bigTopK`
and `quotaAndLimits.limits.bigMaxQueryResultWindow`) to control the
relaxed limits independently
- The property can be set at collection creation or via alter
collection, but changing it requires dropping any existing vector index
first
- This PR also fixes the partition key isolation alter bug in which
unrelated properties could affect the isolation vector index check

---------

Signed-off-by: chasingegg <chao.gao@zilliz.com>
2026-03-11 12:31:23 +08:00
sparknack 6dc8418eac
enhance: add async warmup policy support for caching layer (#47627)
issue: #47902

Integrate milvus-common commit 7b54b6e which adds
CacheWarmupPolicy_Async and a prefetch thread pool for background cache
warmup. This enables segments to be marked as loaded immediately while
cache warming happens asynchronously, reducing load latency.

---------

Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>
2026-03-10 14:45:23 +08:00
sthuang 3cd435d6ac
fix: [RBAC] grant cleanup on drop and migration on rename and meta cache interceptor (#48140)
related: #48061
related: #48062
related: #48137 

- Reconstruct logical etcd keys to avoid a double rootPath prefix in
delete/migrate operations
- Use typeutil.After for privilege name extraction instead of the broken
length-based substring
- Match wildcard dbName and use DefaultTenant consistently
- Move grant cleanup to the DropCollection ack callback
- Enable resolveAliasForPrivilege by default and fix the alias cache
- Passed RBAC alias e2e tests

---------

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
Signed-off-by: Li Liu <li.liu@zilliz.com>
Co-authored-by: Li Liu <li.liu@zilliz.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 10:13:24 +08:00
congqixia 1bd23fda87
enhance: [loon] move storage v3 deltalog files under basePath/_delta/ to align with loon convention (#48150)
Related to #44956

Previously, V3 deltalogs were written to
{rootPath}/delta_log/{collID}/{partID}/{segID}/{logID}, separate from
the segment's basePath. The manifest used complex "../" relative paths
to bridge the two locations. This change writes V3 deltalogs directly to
{basePath}/_delta/{logID}, aligning with the C loon library's native
_delta/ convention and simplifying the manifest relative path to just
the logID filename. Legacy V1 segments and existing manifests remain
backward compatible.
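
The two layouts can be compared with a small path-building sketch. The function names and the sample `basePath` value are hypothetical; only the path templates come from the commit message.

```go
package main

import (
	"fmt"
	"path"
	"strconv"
)

func id(v int64) string { return strconv.FormatInt(v, 10) }

// legacyDeltaLogPath follows {rootPath}/delta_log/{collID}/{partID}/{segID}/{logID}.
func legacyDeltaLogPath(rootPath string, collID, partID, segID, logID int64) string {
	return path.Join(rootPath, "delta_log", id(collID), id(partID), id(segID), id(logID))
}

// v3DeltaLogPath follows {basePath}/_delta/{logID}.
func v3DeltaLogPath(basePath string, logID int64) string {
	return path.Join(basePath, "_delta", id(logID))
}

// manifestEntry is the relative path recorded in the manifest under the
// new layout: just the logID filename.
func manifestEntry(logID int64) string {
	return id(logID)
}

func main() {
	fmt.Println(legacyDeltaLogPath("files", 1, 2, 3, 4)) // files/delta_log/1/2/3/4
	fmt.Println(v3DeltaLogPath("files/segment_3", 4))    // files/segment_3/_delta/4
	fmt.Println(manifestEntry(4))                        // 4
}
```

Keeping the deltalog under the segment's own base path is what removes the need for "../" hops in the manifest.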

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 10:01:25 +08:00
Bingyi Sun 39606eec74
fix: Optimize namespace compaction and query implementation (#46512)
### **User description**
issue: https://github.com/milvus-io/milvus/issues/44011
Updated the DescribeCollection functionality to exclude the namespace
field from the response schema. This change ensures that the field is
not returned in the collection description.

___

### **PR Type**
Bug fix, Enhancement


___

### **Description**
- Filter namespace field from DescribeCollection response schema

- Add support for PartitionKeySortCompaction in compaction task handling

- Fix lambda capture issues in C++ expression evaluation code

- Add comprehensive test coverage for namespace field filtering


___

### Diagram Walkthrough


```mermaid
flowchart LR
  A["DescribeCollection Request"] --> B["Filter Fields"]
  B --> C["Exclude Dynamic Fields"]
  B --> D["Exclude Namespace Field"]
  C --> E["Response Schema"]
  D --> E
  F["Compaction Task"] --> G["Create/Complete Task"]
  G --> H["Support PartitionKeySortCompaction"]
  H --> I["Task Execution"]
```



### File Walkthrough

**Bug fix**

- **service_provider.go** (internal/proxy/service_provider.go,
[+2/-1](https://github.com/milvus-io/milvus/pull/46512/files#diff-e72e29bf5e62a9c5a797c0045e0d6f427d5c49e587c848f68e81ceabaa3e2a0c)):
Filter namespace field from DescribeCollection response
  - Added import for `common` package to access `NamespaceFieldName`
    constant
  - Modified field filtering logic in `DescribeCollection` to exclude
    namespace field alongside dynamic fields
  - Updated filter condition to check both `IsDynamic` flag and field
    name equality
- **task.go** (internal/proxy/task.go,
[+1/-1](https://github.com/milvus-io/milvus/pull/46512/files#diff-b4aff1bcd223cde92858085e1be028292b0899b08c9664aa590a952f01106e3d)):
Exclude namespace field from task execution
  - Updated `describeCollectionTask.Execute` to filter out namespace
    field from response schema
  - Modified condition to skip fields that are either dynamic or have
    namespace field name
  - Ensures namespace field is not included in collection description
    results
- **Expr.cpp** (internal/core/src/exec/expression/Expr.cpp,
[+16/-16](https://github.com/milvus-io/milvus/pull/46512/files#diff-1051ba05ef883a88cf621aa1ca7ba4f3b433be44f574877c1888f39eed42cb50)):
Fix lambda capture and remove duplicate includes
  - Removed duplicate include of `expr/ITypeExpr.h`
  - Removed unused include of `fmt/format.h`
  - Fixed lambda capture issues in `SetNamespaceSkipIndex` function by
    using pointer capture
  - Changed from reference capture `[&]` to explicit pointer and value
    captures to avoid dangling references
  - Extracted namespace field ID and value as const variables for safer
    lambda capture

**Tests**

- **service_provider_test.go** (internal/proxy/service_provider_test.go,
[+65/-0](https://github.com/milvus-io/milvus/pull/46512/files#diff-47cdd329529238ca17e2a5c9f66f548cf941e19d332203a099d0fa8a9e67dd47)):
Add test for namespace field filtering
  - Added imports for `schemapb` and `common` packages
  - Created new test
    `TestCachedProxyServiceProvider_DescribeCollection_FilterNamespaceField`
  - Test verifies namespace and dynamic fields are filtered while user
    fields are preserved
  - Uses mock cache to simulate collection metadata with namespace field
- **task_test.go** (internal/proxy/task_test.go,
[+90/-0](https://github.com/milvus-io/milvus/pull/46512/files#diff-ab14021f2a0170ad1c610628871f98c5cd4386ce95efe7af94b4509bdea4ccf7)):
Add namespace field filtering test case
  - Added comprehensive test
    `TestDescribeCollectionTask_FilterNamespaceField`
  - Test creates schema with namespace field as partition key and
    dynamic metadata field
  - Verifies both namespace and dynamic fields are excluded from
    describe result
  - Confirms user fields like id and fvec are properly included

**Enhancement**

- **compaction_inspector.go** (internal/datacoord/compaction_inspector.go,
[+1/-1](https://github.com/milvus-io/milvus/pull/46512/files#diff-1c884001f2e84de177fea22b584f3de70a6e73695dbffa34031be9890d17da6d)):
Support PartitionKeySortCompaction in task creation
  - Added `datapb.CompactionType_PartitionKeySortCompaction` to switch
    case in `createCompactTask`
  - Now handles PartitionKeySortCompaction alongside MixCompaction and
    SortCompaction
  - Routes PartitionKeySortCompaction to `newMixCompactionTask` handler
- **meta.go** (internal/datacoord/meta.go,
[+1/-1](https://github.com/milvus-io/milvus/pull/46512/files#diff-1b1c74d883d233d2457813f0708bd3a3418102555615d6abca5e04b9815e6c37)):
Handle PartitionKeySortCompaction in mutation completion
  - Added `datapb.CompactionType_PartitionKeySortCompaction` to switch
    case in `CompleteCompactionMutation`
  - Routes PartitionKeySortCompaction to
    `completeSortCompactionMutation` handler
  - Ensures proper mutation completion for partition key sort compaction
    type

___



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: schemapb.CollectionSchema.EnableNamespace is the
authoritative per-collection flag for namespace behavior — it is
persisted on Collection models, copied into DescribeCollection
responses, and used at runtime to derive
datapb.SegmentInfo.IsNamespaceSorted and
datapb.CompactionSegmentBinlogs/CompactionSegment.IsNamespaceSorted
(i.e., per-collection EnableNamespace → per-segment IsNamespaceSorted).
- Removed/simplified logic: eliminated property-based namespace toggles
and partition-key-sort special-cases (TriggerTypePartitionKeySort /
TriggerTypeClusteringPartitionKeySort and
IsPartitionKeySortCompactionEnabled) and removed the dual namespace-skip
API in expression evaluation (SetNamespaceSkipFunc /
SetNamespaceSkipIndex); routing now reuses existing Mix/Sort/Clustering
handlers and uniformly treats IsNamespaceSorted alongside IsSorted in
compaction, index, and inspection checks.
- Why no data loss or behavior regression: wire compatibility preserved
(SegmentInfo field number unchanged when renaming
is_partition_key_sorted → is_namespace_sorted), binlog/segment IDs,
manifest paths and storage versions are unchanged, and compaction
requests only add/pass an extra boolean flag — no binlog content,
identifiers, or storage layout are modified; DescribeCollection now only
changes user-facing filtering of schema fields (internal stored schema
unchanged) and unit tests were added for filtering to prevent
regressions.
- Bug fix (milvus-io/milvus#44011): fixes leakage of internal
namespace/meta fields in DescribeCollection by filtering out fields with
IsDynamic==true and the Namespace field name (common.NamespaceFieldName)
and by propagating EnableNamespace through
DescribeCollectionTask.Execute; tests added in
internal/proxy/service_provider_test.go and internal/proxy/task_test.go
validate the fix.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2026-03-09 21:11:23 +08:00
congqixia 6b910a0074
enhance: bump OpenTelemetry to v1.40.0 to fix CWE-426 untrusted search path (#48058)
Related to #48070

Upgrade go.opentelemetry.io/otel and related packages from v1.34.0 to
v1.40.0 across all Go modules to address CWE-426 (Untrusted Search Path)
vulnerability. Also bumps transitive dependencies including auto/sdk
v1.1.0 -> v1.2.1, go-logr v1.4.2 -> v1.4.3, and golang.org/x/sys v0.38.0
-> v0.40.0.

See: https://cwe.mitre.org/data/definitions/426.html

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2026-03-09 10:57:22 +08:00
sijie-ni-0214 f40965c65a
enhance: optimize mixcoord cpu usage (#47618)
issue: https://github.com/milvus-io/milvus/issues/47055

---------

Signed-off-by: sijie-ni-0214 <sijie.ni@zilliz.com>
2026-03-08 22:31:21 +08:00
Xiaofan d69bdd288c
fix: fix macOS 15 ARM64 compilation issues (#7437) (#47810)
issue: #47809

---------

Signed-off-by: xiaofanluan <xiaofan.luan@zilliz.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-06 23:19:21 +08:00
congqixia d88c5655e1
enhance: enable Loon FFI by default and support storage v3 in file managers (#47984)
Related to #44956

Enable useLoonFFI config by default and extend DiskFileManagerImpl and
MemFileManagerImpl to handle STORAGE_V3 using the same code paths as
STORAGE_V2 for caching raw data, optional fields to disk and memory.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2026-03-06 19:03:21 +08:00
wei liu 53790e6a46
fix: Prevent GC from deleting index files referenced by snapshots (#48022)
## Summary

- Add buildID tracking to `SnapshotRefIndex` to precisely protect
snapshot-referenced index files from GC deletion
- Fix the `segIdx==nil` branch in `recycleUnusedIndexFiles` where orphan
index files were deleted without snapshot checks (root cause of
persistent RestoreSnapshot failures after PR #47669)
- Add snapshot protection to `recycleUnusedBinLogWithChecker`,
`recycleUnusedTextIndexFiles`, and `recycleUnusedJSONStatsFiles`
- Use per-collection `IsRefIndexLoadedForCollection` instead of global
`IsAllRefIndexLoaded` where collection context is available

issue: #47658

## Root Cause

Previous fix PR #47669 added snapshot protection to GC paths but missed
the `segIdx==nil` branch in `recycleUnusedIndexFiles`. After compaction,
`recycleUnusedSegIndexes` removes old segment index metadata. Later,
`recycleUnusedIndexFiles` walks storage, finds orphan index files, and
`CheckCleanSegmentIndex(buildID)` returns `(true, nil)`. The
`segIdx==nil` branch deleted all files without any snapshot reference
check.

## Fix Approach

Add `build_ids` field to `SnapshotMetadata` proto and
`SnapshotRefIndex`. When a snapshot is saved, all buildIDs from segment
index files are extracted and persisted. GC functions use
`GetSnapshotByBuildID(buildID)` for direct set lookups instead of
relying on `segIdx` for collection/index context.
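
The set lookup described above can be sketched roughly as follows. This is a minimal illustration, not the datacoord implementation: the `snapshotRefs` type and its `add` and `canRecycle` methods are hypothetical names, and only the idea of keying protection on buildID comes from the text.

```go
package main

import "fmt"

// snapshotRefs maps an index buildID to the snapshot that references it.
// The type and its methods are hypothetical, for illustration only.
type snapshotRefs map[int64]string

// add records every buildID captured when a snapshot is saved.
func (r snapshotRefs) add(snapshot string, buildIDs ...int64) {
	for _, id := range buildIDs {
		r[id] = snapshot
	}
}

// canRecycle reports whether GC may delete the files of a buildID.
// It needs no segment index metadata, so it also covers orphan files.
func (r snapshotRefs) canRecycle(buildID int64) bool {
	_, referenced := r[buildID]
	return !referenced
}

func main() {
	refs := snapshotRefs{}
	refs.add("snap-1", 101, 102)
	fmt.Println(refs.canRecycle(101), refs.canRecycle(999))
}
```

Because the check depends only on the buildID, it still applies after compaction has removed the segment index metadata.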

## Test plan

- [x] All 433 datacoord unit tests pass
- [x] `make milvus` build OK
- [x] `make lint-fix` lint OK
- [ ] E2E snapshot performance test suite (RS-01, RS-04, SI-02
scenarios)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2026-03-06 18:39:21 +08:00
yihao.dai 5748d8df4f
enhance: add per-cluster TLS config for CDC outbound mTLS connections (#47968)
## Summary
- Add `BuildTLSConfig` helper and `TLSConfig` field to SDK
`ClientConfig` for mTLS support
- Add `GetClusterTLSConfig(clusterID)` for dynamic per-cluster
paramtable lookup via
`tls.clusters.<clusterID>.{caPemPath,clientPemPath,clientKeyPath}`
- CDC `NewMilvusClient` reads per-cluster TLS config by target cluster
ID, enabling different certs per target cluster

All target clusters' certs are pre-configured on every node, so CDC
topology switchover (e.g., A→B,C to B→A,C) works without process
restart.
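
As a rough sketch of what building a client-side mTLS config involves, the following uses only the Go standard library. The function names `clusterTLSKeys` and `clientTLSConfig` are hypothetical and are not the `BuildTLSConfig` or `GetClusterTLSConfig` helpers from this change; only the key pattern is taken from the summary. PEM bytes are passed in directly here, on the assumption that the caller reads them from the configured paths.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
)

// clusterTLSKeys builds the per-cluster config keys following the
// tls.clusters.<clusterID>.{caPemPath,clientPemPath,clientKeyPath} pattern.
func clusterTLSKeys(clusterID string) [3]string {
	prefix := "tls.clusters." + clusterID + "."
	return [3]string{prefix + "caPemPath", prefix + "clientPemPath", prefix + "clientKeyPath"}
}

// clientTLSConfig (hypothetical) assembles a client config that trusts the
// given CA and presents the given client certificate.
func clientTLSConfig(caPEM, certPEM, keyPEM []byte) (*tls.Config, error) {
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caPEM) {
		return nil, errors.New("no valid CA certificate found")
	}
	cert, err := tls.X509KeyPair(certPEM, keyPEM)
	if err != nil {
		return nil, fmt.Errorf("invalid client cert/key pair: %w", err)
	}
	return &tls.Config{RootCAs: pool, Certificates: []tls.Certificate{cert}}, nil
}

func main() {
	fmt.Println(clusterTLSKeys("cluster-b"))
	// An invalid CA is rejected before any connection is attempted.
	_, err := clientTLSConfig([]byte("not a PEM block"), nil, nil)
	fmt.Println(err)
}
```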

## Test plan
- [x] Unit tests for `BuildTLSConfig` (valid certs, missing CA, invalid
cert pair)
- [x] Unit tests for `GetClusterTLSConfig` (per-cluster lookup, missing
config)
- [x] Unit tests for `buildCDCTLSConfig` (no config, partial config,
invalid CA, per-cluster isolation)

issue: #47843

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-06 11:27:20 +08:00
sthuang f5b6db2929
fix: resolve collection alias in RBAC and clean up grants on collection lifecycle (#47851)
related: https://github.com/milvus-io/milvus/issues/47850
1. Privilege interceptor: resolve collection alias to real collection
   name before RBAC permission check, so that operating via alias checks
   permission against the real collection, not the alias name.

2. MetaCache alias cache: add aliasInfo cache with positive/negative
   entries to avoid repeated DescribeAlias RPC calls. Cache is
   invalidated on alias removal, collection removal, and database
   removal.

3. Catalog grant cleanup: add DeleteGrantByCollectionName and
   MigrateGrantCollectionName to RootCoordCatalog interface and kv
   implementation. On collection drop, delete all associated grants;
   on collection rename, migrate grants to the new name.

4. Feature flag: add proxy.resolveAliasForPrivilege config to
   enable/disable alias resolution in the privilege interceptor.
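
A cache with positive and negative entries, as in item 2, could look roughly like the sketch below. All names (`aliasCache`, `resolve`, `invalidate`) are hypothetical, the `lookup` callback merely stands in for the DescribeAlias RPC, and locking is omitted for brevity.

```go
package main

import "fmt"

// aliasEntry caches one lookup result.
type aliasEntry struct {
	collection string
	found      bool // false marks a negative entry: the name is not an alias
}

// aliasCache is a hypothetical, non-thread-safe illustration.
type aliasCache struct {
	entries map[string]aliasEntry
	lookup  func(name string) (string, bool) // stands in for the RPC
	calls   int                              // number of lookups performed
}

func newAliasCache(lookup func(string) (string, bool)) *aliasCache {
	return &aliasCache{entries: map[string]aliasEntry{}, lookup: lookup}
}

// resolve returns the real collection name, consulting the RPC stand-in
// only when neither a positive nor a negative entry is cached.
func (c *aliasCache) resolve(name string) string {
	e, ok := c.entries[name]
	if !ok {
		c.calls++
		coll, found := c.lookup(name)
		e = aliasEntry{collection: coll, found: found}
		c.entries[name] = e
	}
	if e.found {
		return e.collection
	}
	return name
}

// invalidate drops an entry, e.g. on alias or collection removal.
func (c *aliasCache) invalidate(name string) { delete(c.entries, name) }

func main() {
	c := newAliasCache(func(n string) (string, bool) {
		if n == "orders_alias" {
			return "orders_v2", true
		}
		return "", false
	})
	fmt.Println(c.resolve("orders_alias"), c.resolve("plain"), c.calls)
}
```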

---------

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2026-03-05 18:39:22 +08:00
sijie-ni-0214 b0a6a75f2b
enhance: optimize qn load speed (#47423)
issue: https://github.com/milvus-io/milvus/issues/47422

---------

Signed-off-by: sijie-ni-0214 <sijie.ni@zilliz.com>
2026-03-05 16:55:21 +08:00
jiaqizho ca89df5512
enhance: support configurable TLS minimum version for object storage connections (#48000)
Related to https://github.com/milvus-io/milvus/issues/44999

Currently Milvus doesn't allow users to control the TLS version used
when connecting to object storage (MinIO/S3/Azure/GCP). Some
environments require enforcing TLS 1.3 for compliance, but there's no
way to set that today.

This adds a new config option `minio.ssl.tlsMinVersion` that lets users
specify the minimum TLS version ("1.0", "1.1", "1.2", "1.3", or
"default"). It works across all supported storage backends including
MinIO/S3, Azure Blob, and GCP native. The setting is plumbed through
paramtable, proto StorageConfig, and all the places that create storage
clients (compaction, datacoord, datanode, storagev2, etc.).
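
For illustration, parsing such a setting into a `crypto/tls` version constant might look like the sketch below. The function name `parseTLSMinVersion` is hypothetical, and mapping "default" to zero (which leaves the choice to `crypto/tls`) is an assumption about the intended behavior rather than something stated here.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"strings"
)

// parseTLSMinVersion (hypothetical) converts the accepted string values
// into the uint16 expected by tls.Config.MinVersion.
func parseTLSMinVersion(s string) (uint16, error) {
	switch strings.TrimSpace(s) {
	case "default":
		return 0, nil // assumption: zero lets crypto/tls apply its own default
	case "1.0":
		return tls.VersionTLS10, nil
	case "1.1":
		return tls.VersionTLS11, nil
	case "1.2":
		return tls.VersionTLS12, nil
	case "1.3":
		return tls.VersionTLS13, nil
	default:
		return 0, fmt.Errorf("unsupported TLS minimum version %q", s)
	}
}

func main() {
	v, err := parseTLSMinVersion("1.3")
	if err != nil {
		panic(err)
	}
	cfg := &tls.Config{MinVersion: v}
	fmt.Printf("0x%04x\n", cfg.MinVersion)
}
```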

For the GCP native backend, this also adds proper UseIAM/ADC support
that was previously missing, since the TLS transport injection needed to
handle both credential modes correctly.

Also fixed the GCP MinIO-compatible path to reuse any custom transport
(e.g. with TLS config) as the backend for the OAuth2 token wrapping,
instead of always creating a new default transport.

Unit tests cover the TLS version parsing, HTTP client construction, and
version enforcement (proving a TLS 1.3 client correctly rejects a TLS
1.2-only server). Integration tests are included but gated behind
environment variables.

Signed-off-by: jiaqizho <jiaqi.zhou@zilliz.com>
2026-03-04 19:45:21 +08:00
Chun Han 9031783ea6
enhance: involve text index when estimating memory cost for loading (#47899)
related: #47539

Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-04 19:15:21 +08:00
wei liu 220c691500
feat: [ExternalTable Part4] Support data mapping for external collections (#47730)
design doc:
https://github.com/milvus-io/milvus-design-docs/blob/main/design_docs/20260105-external_table.md

issue: https://github.com/milvus-io/milvus/issues/45881
## Summary
- Pre-allocate segment IDs in DataCoord, pass to DataNode for direct
final-path manifest writes (eliminating two-phase ID workflow)
- Add FFI bridges for file exploration (`ExploreFiles`, `GetFileInfo`)
and manifest creation (`CreateManifestForSegment`,
`ReadFragmentsFromManifest`)
- Implement fragment-to-segment balancing with configurable target rows
per segment
- Add `ExternalSpec` parser for external data format configuration
- Extend `UpdateExternalCollectionRequest` proto with schema, storage
config, and pre-allocated segment ID fields
- Add E2E test for external collection refresh with data verification
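
One simple way to balance fragments into segments against a target row count is a greedy pass like the sketch below. This is an assumed strategy for illustration only: the PR does not describe its algorithm here, and the `fragment` type and `groupFragments` function are hypothetical.

```go
package main

import "fmt"

// fragment is a hypothetical stand-in for one external data fragment.
type fragment struct {
	id   int
	rows int64
}

// groupFragments packs fragments in order, starting a new segment whenever
// adding the next fragment would exceed targetRows. A single fragment larger
// than the target still gets a segment of its own.
func groupFragments(frags []fragment, targetRows int64) [][]fragment {
	var groups [][]fragment
	var cur []fragment
	var curRows int64
	for _, f := range frags {
		if len(cur) > 0 && curRows+f.rows > targetRows {
			groups = append(groups, cur)
			cur, curRows = nil, 0
		}
		cur = append(cur, f)
		curRows += f.rows
	}
	if len(cur) > 0 {
		groups = append(groups, cur)
	}
	return groups
}

func main() {
	frags := []fragment{{1, 40}, {2, 40}, {3, 40}, {4, 100}, {5, 10}}
	for i, g := range groupFragments(frags, 100) {
		fmt.Println("segment", i, g)
	}
}
```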

> **Note**: This PR includes Part3 changes (PR #47303). After Part3 is
merged, this PR will be rebased to only contain Part4-specific changes.

## Test plan
- [x] Unit tests for `task_refresh_external_collection.go` (28 tests)
- [x] Unit tests for `task_update.go` and fragment utilities (40 tests)
- [x] Unit tests for FFI bridges (`exttable_test.go`, 9 tests)
- [x] Unit tests for `ExternalSpec` parser
- [x] Unit tests for paramtable config
- [x] Integration test with real Parquet files
- [x] `make lint-fix` passes
- [ ] E2E test with MinIO backend

---------

Signed-off-by: Jiquan Long <jiquan.long@zilliz.com>
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2026-03-04 18:09:21 +08:00
Sulimov Dmitriy 7ae9e51fcf
feat: Add Yandex Cloud (YC) text embedding support (#47939)
issue: #47938

- Updated milvus.yaml to include configuration for Yandex Cloud model
service.
- Implemented CreateYCEmbeddingServer function to mock Yandex Cloud
embedding service.
- Added support for YC provider in text embedding function logic.
- Enhanced error handling for unsupported providers.
- Added unit tests for Yandex Cloud embedding functionality and its
disabled state.

Design documents location:
https://github.com/milvus-io/milvus-design-docs/blob/main/design_docs/20260227-yc-text-embedding-provider.md

---------

Signed-off-by: edddoubled <vrs_2.1@yandex.ru>
Co-authored-by: edddoubled <vrs_2.1@yandex.ru>
2026-03-03 22:37:19 +08:00
Zhen Ye 46a43fc3a5
fix: fast-fail ServerIDMismatch for node connections and increase walBalancer operationTimeout (#47981)
For node connections (isNode=true), ServerIDMismatch now returns
needRetry=false immediately instead of retrying 10 times with
exponential backoff (~52.6s). Retrying is futile because the NodeID
injected via the interceptor at connection time never changes during
retry. Coord connections keep existing retry behavior.

Also increase streaming.walBalancer.operationTimeout default from 30s to
30m.
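
The fast-fail behavior can be sketched as a retry loop that consults a classifier before backing off. The names below (`errServerIDMismatch`, `needRetry`, `callWithRetry`) are hypothetical, and the attempt count and delays are parameters of the sketch, not the values used by the actual client.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

var errServerIDMismatch = errors.New("server ID mismatch")

// needRetry reports whether an error is worth retrying. For node
// connections a server ID mismatch cannot heal on retry, so it fails fast.
func needRetry(err error, isNode bool) bool {
	if isNode && errors.Is(err, errServerIDMismatch) {
		return false
	}
	return true
}

// callWithRetry runs fn up to attempts times with exponential backoff and
// returns how many attempts were made along with the last error.
func callWithRetry(attempts int, base time.Duration, isNode bool, fn func() error) (int, error) {
	var err error
	delay := base
	for i := 1; i <= attempts; i++ {
		if err = fn(); err == nil {
			return i, nil
		}
		if !needRetry(err, isNode) {
			return i, err
		}
		if i < attempts {
			time.Sleep(delay)
			delay *= 2
		}
	}
	return attempts, err
}

func main() {
	fail := func() error { return errServerIDMismatch }
	n, _ := callWithRetry(5, time.Millisecond, true, fail)
	m, _ := callWithRetry(5, time.Millisecond, false, fail)
	fmt.Println("node attempts:", n, "coord attempts:", m)
}
```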

issue: #46182

Signed-off-by: chyezh <chyezh@outlook.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 22:41:20 +08:00
wei liu 4c91d25468
enhance: unify copy segment logic with manifest path support (#47751)
## Summary

issue: #44358

Refactors the copy segment pipeline for snapshot restore, consolidating
the implementation with a unified file collection approach that
eliminates code duplication between manifest mode and binlog mode.

Key changes:
- **Unified file collection**: Add `SegmentFiles` struct and
`collectSegmentFiles` with manifest/pb automatic fallback
- **Manifest path support (StorageV3+)**: Resolve binlog paths from
packed manifest instead of protobuf fields, with `storage_version` field
added to `CopySegmentSource` proto
- **Fail-fast error handling**: `SyncCopySegmentTask` errors now call
`markTaskAndJobFailed` instead of just logging warnings
- **Testability improvements**: Extract `copyFile`/`listAllFiles`
helpers, refactor all tests from mockery to mockey

Architecture:
- InsertBinlogs: try manifest (S3 list) first, fall back to pb if empty
- Other 7 types: always from pb (not yet in manifest)
- Single unified flow: collect -> map -> copy -> build result

Additional fixes:
- Fix `BinlogTypeBM25` constant to match actual path format
(`bm25_stats`)
- Improve error messages with segment ID and path context
- Reduce `CopySegmentAndIndexFiles` from 136 lines to 93 lines

## Test plan

- [x] Unit tests pass (`go test -tags dynamic,test` for `importv2`
package)
- [x] `make lint-fix` passes
- [x] Snapshot E2E tests pass (7/7):
  - TestCreateSnapshot
  - TestSnapshotRestoreWithMultiSegment
  - TestSnapshotRestoreWithMultiShardMultiPartition
  - TestSnapshotRestoreWithMultiFields
  - TestSnapshotRestoreEmptyCollection
  - TestSnapshotRestoreWithJSONStats
  - TestSnapshotRestoreAfterDropPartitionAndCollection

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2026-03-02 15:37:19 +08:00
yihao.dai 564279e3c3
enhance: allow pchannel count increase in ReplicateConfiguration (#47792)
## Summary

- Allow pchannel count increase (append-only) in
`validateClusterConsistency` for CU scaling
- Existing pchannels must be preserved at the same positions; decrease
and reorder are still rejected
- Equal pchannel count across clusters is still enforced in
`validateClusterBasic`
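
The append-only rule amounts to requiring that the current pchannel list be a prefix of the updated one. A minimal sketch follows, with a hypothetical function name; it is not the actual `validateClusterConsistency` code.

```go
package main

import (
	"errors"
	"fmt"
)

// validatePChannelChange (hypothetical) accepts an updated pchannel list
// only if it keeps every existing pchannel at the same position.
func validatePChannelChange(current, updated []string) error {
	if len(updated) < len(current) {
		return errors.New("pchannel count decrease is not allowed")
	}
	for i, name := range current {
		if updated[i] != name {
			return fmt.Errorf("pchannel at position %d changed from %q to %q", i, name, updated[i])
		}
	}
	return nil
}

func main() {
	current := []string{"ch-0", "ch-1"}
	fmt.Println(validatePChannelChange(current, []string{"ch-0", "ch-1", "ch-2"}))
	fmt.Println(validatePChannelChange(current, []string{"ch-1", "ch-0", "ch-2"}))
}
```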

## Test plan

- [x] Unit tests pass for `pkg/util/replicateutil/...`
- [x] Verified pchannel increase (append) is accepted
- [x] Verified pchannel decrease is rejected
- [x] Verified pchannel reorder/replace is rejected

issue: https://github.com/milvus-io/milvus/issues/47791

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 11:25:18 +08:00
cai.zhang 40b5a43689
enhance: Add phase-level timing logs and metrics for sort compaction (#47673)
issue: #47671

---------

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2026-03-02 10:01:20 +08:00
yihao.dai a355d81134
enhance: add configurable skip list for replicate message types (#47777)
- Add `streaming.replication.skipMessageTypes` config parameter
(comma-separated message type names, default
`AlterResourceGroup,DropResourceGroup`)
- On the secondary side, `overwriteReplicateMessage()` checks incoming
message type against skip set and returns `IgnoreOperation`, which
`ReplicateStreamServer` already handles gracefully
- Configurable at runtime (refreshable)
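
Turning the comma-separated value into a lookup set could be sketched as below. The helper names are hypothetical and the real check lives in `overwriteReplicateMessage()`; only the default value is taken from the description above.

```go
package main

import (
	"fmt"
	"strings"
)

// parseSkipSet (hypothetical) splits a comma-separated list of message
// type names into a set, ignoring surrounding whitespace and empty items.
func parseSkipSet(raw string) map[string]struct{} {
	set := make(map[string]struct{})
	for _, name := range strings.Split(raw, ",") {
		name = strings.TrimSpace(name)
		if name != "" {
			set[name] = struct{}{}
		}
	}
	return set
}

// shouldSkip reports whether a message type is in the skip set.
func shouldSkip(set map[string]struct{}, msgType string) bool {
	_, ok := set[msgType]
	return ok
}

func main() {
	set := parseSkipSet("AlterResourceGroup,DropResourceGroup")
	fmt.Println(shouldSkip(set, "DropResourceGroup"), shouldSkip(set, "Insert"))
}
```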

issue: https://github.com/milvus-io/milvus/issues/47776

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-01 16:37:24 +08:00
yihao.dai 650bc7f9a5
enhance: support different replica numbers on secondary CDC cluster (#47780)
## Summary
- Add `use_local_replica_config` flag to `AlterLoadConfigMessageHeader`
so the secondary CDC cluster can use its own
`ClusterLevelLoadReplicaNumber` / `ClusterLevelLoadResourceGroups`
instead of blindly applying the primary's replica config
- Secondary's `ReplicateService` sets the flag on every replicated
`AlterLoadConfigMessage`; `LoadCollectionJob` reads local config when
flag is set
- Use `generateReplicas` for idempotent replica generation, ensuring WAL
replay does not create duplicate replicas
- Default to 1 replica in `__default_resource_group` when local config
is not explicitly set (instead of falling back to primary's config)

## Test plan
- [x] Unit test: replicated AlterLoadConfig gets
`UseLocalReplicaConfig=true`
- [x] Unit test: local config overrides primary config when set
- [x] Unit test: defaults to 1 replica in `__default_resource_group`
when local config not set
- [x] Unit test: flag=false uses primary config directly
- [x] Unit test: `getLocalReplicaConfig` first load, idempotent replay,
not set, no RGs, alloc error

issue: #47779

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2026-02-27 21:19:20 +08:00
cai.zhang 331b277368
fix: Use physical time for entity-level TTL filtering (#47518)
## Problem
In strong consistency mode, when query_timestamp is significantly older
than current physical time (e.g., replaying old data with
guaranteeTimestamp from the past), entity-level TTL filtering
incorrectly used query_timestamp instead of current physical time. This
caused:
- Valid data to be incorrectly filtered as "expired"
- Zero search/query results even when data should be visible

## Root Cause
The TTL filter expression (`ttl_field > physical_us OR ttl_field IS
NULL`) was using physical time derived from query_timestamp, which could
be hours or days old in strong consistency scenarios. Entity TTL should
always use current physical time, not query timestamp.

## Solution
1. **C++ Core Changes**:
   - Add `physical_time_us_` parameter to QueryContext (defaults to
     query_timestamp)
   - Pass physical_time_us through Search/Retrieve API chain:
     * QueryContext constructor → Search/Retrieve C API → segment
       implementation
   - CreateTTLFieldFilterExpression now uses
     QueryContext::get_physical_time_us()
   - Add ExecuteQueryExpr overload with physical_time_us for testing

2. **Go Layer Changes**:
   - Extract physical time from guaranteeTimestamp in query tasks:
     * query_task.go: Search operations
     * query_stream_task.go: Streaming search operations
   - Use tsoutil.ParseHybridTs() to convert TSO timestamp to physical time
   - Pass physical_time_us to C++ Search/Retrieve calls via
     plan.go/segment.go

3. **Testing**:
   - EntityTTLTest.cpp: Core TTL filtering with physical_time_us
   - EntityTTLEdgeCaseTest.cpp: Edge cases (no TTL field, zero
     physical_time_us)
   - Tests verify QueryContext stores/returns physical_time_us correctly
   - Tests validate TTL filtering uses physical_time_us, not
     query_timestamp

## Behavior
- **Before**: TTL filter used query_timestamp (old in strong
consistency)
- **After**: TTL filter uses current physical time (from
guaranteeTimestamp)
- **Backward compatible**: When physical_time_us=0, falls back to
query_timestamp

## Technical Details
- TSO timestamp conversion: `tsoutil.ParseHybridTs(timestamp)` →
physical_ms
- Physical time in microseconds: `physical_us = physical_ms * 1000`
- TTL expression unchanged: `ttl_field > physical_us OR ttl_field IS
NULL`
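
The conversion and the filter can be illustrated with the self-contained sketch below. It does not call `tsoutil`; instead it assumes a hybrid timestamp layout in which the low 18 bits hold a logical counter and the remaining bits hold physical milliseconds. The constant `logicalBits` and both function names are assumptions made for this example.

```go
package main

import "fmt"

// logicalBits is the assumed width of the logical counter in a hybrid
// timestamp; it is an assumption of this sketch.
const logicalBits = 18

// physicalMicros extracts physical milliseconds from a hybrid timestamp
// and converts them to microseconds (physical_us = physical_ms * 1000).
func physicalMicros(hybridTs uint64) int64 {
	physicalMs := int64(hybridTs >> logicalBits)
	return physicalMs * 1000
}

// visible mirrors `ttl_field > physical_us OR ttl_field IS NULL`;
// a nil pointer stands for a NULL TTL value.
func visible(ttlUs *int64, physicalUs int64) bool {
	return ttlUs == nil || *ttlUs > physicalUs
}

func main() {
	ts := uint64(1_700_000_000_000)<<logicalBits | 5
	now := physicalMicros(ts)
	later, earlier := now+1, now-1
	fmt.Println(visible(&later, now), visible(&earlier, now), visible(nil, now))
}
```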

Fixes #47413

---------

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2026-02-27 19:01:20 +08:00
Li Liu 72918439ef
enhance: remove unused Go dependencies (ansi, fastjson, grpc/examples, sizedwaitgroup) (#47852)
Related to #46199

## Summary

Remove 5 unused or misused Go dependencies to reduce module bloat and
consolidate overlapping libraries:

- **`mgutz/ansi`** → replaced with inline ANSI escape codes (only used
for 3 color constants in migration console)
- **`valyala/fastjson`** → replaced with `tidwall/gjson` (only 1 file
used fastjson; gjson is already used in 22+ files)
- **`google.golang.org/grpc/examples`** → replaced with existing
`rootcoordpb` (test file pulled in entire grpc examples repo for a mock
server)
- **`remeh/sizedwaitgroup`** → replaced with `chan` semaphore +
`sync.WaitGroup` (only 2 files, trivial pattern)
- **`pkg/errors`** → replaced with `cockroachdb/errors` (the project
standard; `pkg/errors` was used in 1 file)
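
The `chan` semaphore plus `sync.WaitGroup` pattern that replaces `sizedwaitgroup` can be sketched as follows. The helper `runLimited` is a hypothetical name, and the peak tracking exists only so the sketch can demonstrate the bound; it is not part of the pattern itself.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// runLimited runs every task with at most limit tasks in flight and
// returns the peak concurrency it observed.
func runLimited(limit int, tasks []func()) int32 {
	sem := make(chan struct{}, limit) // buffered channel acts as a semaphore
	var wg sync.WaitGroup
	var running, peak int32
	for _, task := range tasks {
		sem <- struct{}{} // acquire a slot; blocks while limit tasks are running
		wg.Add(1)
		go func(t func()) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot after the task finishes
			n := atomic.AddInt32(&running, 1)
			for {
				p := atomic.LoadInt32(&peak)
				if n <= p || atomic.CompareAndSwapInt32(&peak, p, n) {
					break
				}
			}
			t()
			atomic.AddInt32(&running, -1)
		}(task)
	}
	wg.Wait()
	return atomic.LoadInt32(&peak)
}

func main() {
	tasks := make([]func(), 10)
	for i := range tasks {
		tasks[i] = func() {}
	}
	fmt.Println("peak within limit:", runLimited(3, tasks) <= 3)
}
```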

## Behavior change: DeleteLog.Parse() fail-fast on missing fields

The `fastjson` → `gjson` migration adds explicit `Exists()` validation
for `ts`, `pk`, and `pkType` fields in the JSON parsing branch.
Previously, both fastjson and gjson would silently return zero values
for missing fields, causing `dl.Pk` to remain nil and panicking
downstream. The new code fails fast with a descriptive error at parse
time. This is a defensive improvement (the original code had identical
silent-failure behavior).
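
The fail-fast idea can be illustrated with the standard library alone. Note that this sketch deliberately uses `encoding/json` rather than `gjson`, so it is not the code from this change; the `deleteRecord` type and `parseDeleteRecord` function are hypothetical, and only the field names `ts`, `pk`, and `pkType` come from the description above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// deleteRecord is a hypothetical stand-in for a parsed delete log entry.
type deleteRecord struct {
	Ts     uint64
	PkType int64
	Pk     json.RawMessage // kept raw; its decoding depends on PkType
}

// parseDeleteRecord rejects input that lacks any required field instead of
// silently producing zero values.
func parseDeleteRecord(data string) (*deleteRecord, error) {
	var fields map[string]json.RawMessage
	if err := json.Unmarshal([]byte(data), &fields); err != nil {
		return nil, fmt.Errorf("invalid delete record: %w", err)
	}
	for _, name := range []string{"ts", "pk", "pkType"} {
		if _, ok := fields[name]; !ok {
			return nil, fmt.Errorf("delete record missing field %q", name)
		}
	}
	rec := &deleteRecord{Pk: fields["pk"]}
	if err := json.Unmarshal(fields["ts"], &rec.Ts); err != nil {
		return nil, fmt.Errorf("invalid ts: %w", err)
	}
	if err := json.Unmarshal(fields["pkType"], &rec.PkType); err != nil {
		return nil, fmt.Errorf("invalid pkType: %w", err)
	}
	return rec, nil
}

func main() {
	_, err := parseDeleteRecord(`{"ts": 42, "pkType": 5}`)
	fmt.Println(err)
}
```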

## Performance impact

| Change | Path type | Perf delta | Matters? |
|--------|-----------|------------|----------|
| `pkg/errors` → `cockroachdb/errors` | Cold (offline CLI tool `config-docs-generator`) | Negligible | No |
| `mgutz/ansi` → inline ANSI codes | Cold (offline CLI tool `migration/console`) | Marginally faster (eliminates map lookup) | No |
| `fastjson` → `gjson` (`DeleteLog.Parse`) | Warm (old-format deltalog deserialization only) | **~2.5x slower** per JSON parse (143ns→361ns) | **No** (see below) |
| `grpc/examples` → `rootcoordpb` | Test only (`client_test.go`) | None | No |
| `sizedwaitgroup` → chan+WaitGroup | Test only (`wal_test.go`, `test_framework.go`) | None | No |

### fastjson → gjson regression detail

`DeleteLog.Parse()` is called per-row during deltalog deserialization,
but **only for the legacy single-field format**. The new multi-field
parquet format (`newDeltalogMultiFieldReader`) reads pk/ts as separate
Arrow columns and bypasses `Parse()` entirely. Legacy deltalogs are
rewritten to parquet format during compaction, so this is a dying code
path. Additionally, deltalog loading is I/O-bound — the JSON parse cost
(~361ns/row) is negligible compared to disk read and Arrow
deserialization overhead.

Benchmark (Go 1.24, arm64):
```
BenchmarkFastjsonSmall-4       8,315,624    143.1 ns/op    0 B/op   0 allocs/op
BenchmarkGjsonOptimized-4      3,321,613    361.4 ns/op   96 B/op   1 allocs/op
```

## Test plan

- [x] CI build passes
- [x] CI code-check passes
- [ ] CI ut-go passes
- [ ] CI e2e passes
- [x] Boundary test cases added (bare number, missing pkType/ts/pk)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Signed-off-by: Li Liu <li.liu@zilliz.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-27 00:49:27 +08:00
Gao d91a86aef8
enhance: auto warmup for the big tenant collection (#47630)
issue: #47371

---------

Signed-off-by: chasingegg <chao.gao@zilliz.com>
2026-02-26 22:49:19 +08:00
zhuwenxing c5102683d3
enhance: increase default maxVectorFieldNum from 4 to 10 (#47866)
## Summary
- Increase the default `maxVectorFieldNum` from **4** to **10** to
accommodate the growing variety of vector types (dense, sparse,
function-based) supported by Milvus
- Update related test constants and hardcoded values in both Go and
Python test suites

## Changes
- `configs/milvus.yaml`: default config value 4 → 10
- `pkg/util/paramtable/component_param.go`: Go param default "4" → "10"
- `tests/go_client/common/consts.go`: Go test constant 4 → 10
- `tests/python_client/common/common_type.py`: Python test constant 4 →
10
- `tests/python_client/milvus_client/test_milvus_client_collection.py`:
replace hardcoded "4" in error message with constant reference

Closes #47402

## Test plan
- [x] Verify collection creation with up to 10 vector fields succeeds
- [x] Verify collection creation with 11+ vector fields fails with
proper error message
- [x] Run existing Go integration tests (`tests/go_client`)
- [x] Run existing Python client tests (`tests/python_client`)

Signed-off-by: zhuwenxing <wenxing.zhu@zilliz.com>
2026-02-26 14:18:47 +08:00