Layer 2 (OpenSSL FIPS) changes:
- Add programmatic FIPS activation via OSSL_LIB_CTX_load_config in
boring_enabled.go (gated by //go:build boringcrypto)
- Add openssl-fips.cnf with fips + default providers and
default_properties = fips=yes
- Use absolute .include path for fipsmodule.cnf — OpenSSL resolves
relative .include from the process working directory, not the config
file's directory, causing silent FIPS provider load failure
- Add RAND_bytes probe after config load to verify the FIPS provider is
truly functional (EVP_default_properties_is_fips_enabled only checks the
property string, not whether the provider loaded)
- Dockerfiles: add openssl fipsinstall + OPENSSL_MODULES env var
- Log OpenSSL FIPS status from C++ via
EVP_default_properties_is_fips_enabled
Layer 1 (Go BoringCrypto) changes:
- Add GOEXPERIMENT=boringcrypto build flag (conditional on
MILVUS_FIPS_ENABLED=ON)
- Add boringEnabled() build-tagged functions for startup logging
s2n-tls upgrade:
- Override s2n 1.4.1 (from aws-c-io) to 1.6.0 in conanfile.py. s2n 1.4.1
only detects FIPS via the legacy OPENSSL_FIPS define (not set by OpenSSL
3.x). s2n 1.6.0 adds EVP_default_properties_is_fips_enabled() detection
so s2n enters FIPS mode and uses RAND_bytes() through the FIPS provider.
See also: #48202, #48301
Signed-off-by: yangxuan <xuan.yang@zilliz.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
## Summary
Removes the `IsTriggerKill` / SIGINT block from
`datacoord.stopServiceWatch()` and `querycoordv2.watchNodes()`.
### Root Cause
In MixCoord mode all three coordinators share the same etcd `Session`
object. During shutdown, when any coordinator calls `session.Stop()` it
cancels the shared context, closing the other coordinators' etcd watches
— triggering `stopServiceWatch()` / `watchNodes()` which sent SIGINT to
the process during a normal, coordinated teardown.
### Why remove SIGINT?
- `go s.Stop()` already handles unexpected session loss — SIGINT was
just a backstop in case `Stop()` hangs
- No other component (rootcoord, proxy, datanode, querynode) has this
logic
- The false-positive risk (killing the process during normal shutdown)
outweighs the marginal benefit
issue: #48242
---------
Signed-off-by: xiaofanluan <xiaofan.luan@zilliz.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The config gen-yaml tool initialized BaseTable with default options,
which connected to etcd and loaded environment variables. This caused
runtime config values (e.g. mq.type=pulsar from etcd) to override
code-defined defaults, producing incorrect generated yaml files.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
## Summary
- Replace spf13/viper with yaml.v3 in the only 2 files that used it
(printer.go, server_test.go)
- Remove viper + 7 transitive deps from go.mod (8 packages total)
- C++ ghost dep removal (libsodium, simde) deferred to separate PR to
avoid CI conan issues
### Removed packages
spf13/viper, fsnotify/fsnotify, hashicorp/hcl, mitchellh/mapstructure,
pelletier/go-toml, spf13/afero, spf13/jwalterweatherman, subosito/gotenv
## Test plan
- [ ] CI passes (Go build + code-check)
- [ ] `go mod tidy` produces no diff
- [ ] querynodev2 TestInit_QueryHook passes with new yamlConfigWriter
issue: https://github.com/milvus-io/milvus/issues/47919
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Signed-off-by: Li Liu <li.liu@zilliz.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
## Summary
- Use `getEffectiveVersion()` instead of hardcoded `common.Version` for
startup banner and `milvus_build_info` Prometheus metric
- Dev builds now correctly display `master-20260228-add6e4c6d4` instead
of always showing `2.6.6`
- Release builds are unaffected (git tag matches semver, so
`getEffectiveVersion()` returns the same value)
## Motivation
The startup banner (`Version:` line) and
`milvus_build_info{version="..."}` metric currently always show the
hardcoded semver from `common.Version` (e.g. `2.6.6`), regardless of
whether the binary is a release build or a dev build. This makes it
impossible to identify the actual build version from logs or monitoring
dashboards.
`getEffectiveVersion()` already exists and is used by the `GetVersion`
RPC and `Connect` RPC (via `MILVUS_GIT_BUILD_TAGS` env var, introduced
in PR #47822). This PR makes the banner and metrics consistent with
those APIs.
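The fallback rule this PR relies on (an empty or "unknown" injected version yields the hardcoded semver) can be sketched as below. This is a minimal illustration, not the actual `getEffectiveVersion()` source: `effectiveVersion`, `injectedVersion`, and `hardcodedVersion` are hypothetical stand-ins for the function, the ldflags-injected `MilvusVersion`, and `common.Version.String()`.

```go
package main

import "fmt"

// injectedVersion stands in for the value injected at build time via ldflags
// (MilvusVersion in the PR). It is empty when nothing was injected.
var injectedVersion = ""

// hardcodedVersion stands in for common.Version.String().
const hardcodedVersion = "2.6.6"

// effectiveVersion prefers the injected build version and falls back to the
// hardcoded semver when the injected value is empty or "unknown".
func effectiveVersion(injected, fallback string) string {
	if injected == "" || injected == "unknown" {
		return fallback
	}
	return injected
}

func main() {
	// No injected value: the hardcoded semver is used.
	fmt.Println(effectiveVersion(injectedVersion, hardcodedVersion))
	// Dev build: the injected branch-date-commit string is used.
	fmt.Println(effectiveVersion("master-20260228-add6e4c6d4", hardcodedVersion))
}
```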
### What changes
| Component | Before | After (release) | After (dev) |
|-----------|--------|-----------------|-------------|
| Banner `Version:` | `2.6.6` | `2.6.6` | `master-20260228-add6e4c6d4` |
| Metric `milvus_build_info{version=}` | `2.6.6` | `2.6.6` | `master-20260228-add6e4c6d4` |
| Session etcd version | `2.6.6` | `2.6.6` (unchanged) | `2.6.6` (unchanged) |
| `GetVersion` RPC | `master-20260228-...` | `2.6.6` (unchanged) | `master-20260228-...` (unchanged) |
### What is NOT affected
- `common.Version` itself (still hardcoded semver, used for session
compatibility)
- Session registration in etcd (still uses `common.Version` for cluster
version negotiation)
- Coordinator version range checks (still uses `common.Version`)
## Test plan
- [x] `go vet ./cmd/milvus/...` passes
- [x] Verified `getEffectiveVersion()` fallback: when `MilvusVersion` is
empty or "unknown", returns `common.Version.String()`
- [x] Verified that no dashboard on the internal 4am Grafana queries
`milvus_build_info`, so there is zero PromQL impact
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Signed-off-by: yanliang567 <82361606+yanliang567@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Related to #46199
## Summary
Remove 5 unused or misused Go dependencies to reduce module bloat and
consolidate overlapping libraries:
- **`mgutz/ansi`** → replaced with inline ANSI escape codes (only used
for 3 color constants in migration console)
- **`valyala/fastjson`** → replaced with `tidwall/gjson` (only 1 file
used fastjson; gjson is already used in 22+ files)
- **`google.golang.org/grpc/examples`** → replaced with existing
`rootcoordpb` (test file pulled in entire grpc examples repo for a mock
server)
- **`remeh/sizedwaitgroup`** → replaced with `chan` semaphore +
`sync.WaitGroup` (only 2 files, trivial pattern)
- **`pkg/errors`** → replaced with `cockroachdb/errors` (the project
standard; `pkg/errors` was used in 1 file)
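For reference, the `chan` semaphore + `sync.WaitGroup` pattern that replaces `sizedwaitgroup` looks roughly like the sketch below. `runLimited` is a hypothetical helper written for this illustration, not code from the affected test files; the peak counter exists only to make the bound observable.

```go
package main

import (
	"fmt"
	"sync"
)

// runLimited runs every task with at most limit tasks in flight and returns
// the peak concurrency it observed. limit must be at least 1.
func runLimited(limit int, tasks []func()) int {
	sem := make(chan struct{}, limit) // buffered channel used as a counting semaphore
	var wg sync.WaitGroup
	var mu sync.Mutex
	running, peak := 0, 0

	for _, task := range tasks {
		sem <- struct{}{} // blocks while limit tasks are already running
		wg.Add(1)
		go func(task func()) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot

			mu.Lock()
			running++
			if running > peak {
				peak = running
			}
			mu.Unlock()

			task()

			mu.Lock()
			running--
			mu.Unlock()
		}(task)
	}
	wg.Wait()
	return peak
}

func main() {
	tasks := make([]func(), 8)
	for i := range tasks {
		tasks[i] = func() {}
	}
	fmt.Println("peak within limit:", runLimited(3, tasks) <= 3)
}
```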
## Behavior change: DeleteLog.Parse() fail-fast on missing fields
The `fastjson` → `gjson` migration adds explicit `Exists()` validation
for `ts`, `pk`, and `pkType` fields in the JSON parsing branch.
Without this check, both fastjson and gjson silently return zero values
for missing fields, which leaves `dl.Pk` nil and causes a panic
downstream. The new code fails fast with a descriptive error at parse
time. This is a defensive improvement (the original fastjson code had
the same silent-failure behavior).
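The fail-fast idea can be sketched with the standard library alone. Note the substitution: the PR uses gjson's `Exists()`, while this sketch uses `encoding/json` with a presence check on a raw-message map so that it has no third-party dependency. `requireFields` and the sample values are hypothetical; only the field names `ts`, `pk`, and `pkType` come from the description above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// requireFields returns an error naming the first required key that is
// missing from a JSON object, instead of letting zero values flow downstream.
func requireFields(data []byte, keys ...string) error {
	var obj map[string]json.RawMessage
	if err := json.Unmarshal(data, &obj); err != nil {
		// Covers non-object input such as a bare number.
		return fmt.Errorf("invalid delete log JSON: %w", err)
	}
	for _, k := range keys {
		if _, ok := obj[k]; !ok {
			return fmt.Errorf("delete log is missing required field %q", k)
		}
	}
	return nil
}

func main() {
	complete := []byte(`{"pk":1,"ts":42,"pkType":5}`) // sample values only
	partial := []byte(`{"pk":1,"ts":42}`)

	fmt.Println(requireFields(complete, "ts", "pk", "pkType"))
	fmt.Println(requireFields(partial, "ts", "pk", "pkType"))
}
```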
## Performance impact
| Change | Path type | Perf delta | Matters? |
|--------|-----------|------------|----------|
| `pkg/errors` → `cockroachdb/errors` | Cold (offline CLI tool `config-docs-generator`) | Negligible | No |
| `mgutz/ansi` → inline ANSI codes | Cold (offline CLI tool `migration/console`) | Marginally faster (eliminates map lookup) | No |
| `fastjson` → `gjson` (`DeleteLog.Parse`) | Warm (old-format deltalog deserialization only) | **~2.5x slower** per JSON parse (143ns→361ns) | **No** (see below) |
| `grpc/examples` → `rootcoordpb` | Test only (`client_test.go`) | None | No |
| `sizedwaitgroup` → chan+WaitGroup | Test only (`wal_test.go`, `test_framework.go`) | None | No |
### fastjson → gjson regression detail
`DeleteLog.Parse()` is called per-row during deltalog deserialization,
but **only for the legacy single-field format**. The new multi-field
parquet format (`newDeltalogMultiFieldReader`) reads pk/ts as separate
Arrow columns and bypasses `Parse()` entirely. Legacy deltalogs are
rewritten to parquet format during compaction, so this is a dying code
path. Additionally, deltalog loading is I/O-bound — the JSON parse cost
(~361ns/row) is negligible compared to disk read and Arrow
deserialization overhead.
Benchmark (Go 1.24, arm64):
```
BenchmarkFastjsonSmall-4 8,315,624 143.1 ns/op 0 B/op 0 allocs/op
BenchmarkGjsonOptimized-4 3,321,613 361.4 ns/op 96 B/op 1 allocs/op
```
## Test plan
- [x] CI build passes
- [x] CI code-check passes
- [ ] CI ut-go passes
- [ ] CI e2e passes
- [x] Boundary test cases added (bare number, missing pkType/ts/pk)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Signed-off-by: Li Liu <li.liu@zilliz.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
## Summary
- Make `GetVersion` API return version strings consistent with Docker
image tags, replacing the hardcoded semver
- Assemble `MilvusVersion` in Makefile via `git describe --exact-match
--tags` (release) or `branch-date-commit` (dev), inject via ldflags
- Both `detail=False` (`GetVersion` RPC) and `detail=True` (`Connect`
RPC `server_info.build_tags`) return the new format
| Scenario | Before | After |
|----------|--------|-------|
| Release (tag v2.6.11) | `2.6.11` | `2.6.11` |
| Dev (master branch) | `2.6.11` | `master-20260224-2d14975d18` |
| Feature branch | `2.6.11` | `feature-xxx-20260224-2d14975d18` |
Banner still displays semver via `common.Version.String()` — unchanged.
issue: #47823
## Test plan
- [x] Makefile dry-run verified: `MilvusVersion` correctly injected in
all 4 ldflags targets
- [x] Tag scenario verified: `v2.6.11` → `2.6.11` (v prefix stripped)
- [x] Dev scenario verified:
`feature/getversion-docker-tag-format-20260224-2d14975d18`
- [x] Go syntax verified: `gofmt` and `go vet` pass
- [ ] Full `make build-go` (requires C++ env)
- [ ] E2E: `client.get_server_version()` and
`client.get_server_version(detail=True)`
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Signed-off-by: yanliang567 <82361606+yanliang567@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
issue: #44726
Introduce an immutable option to prevent accidental modification of
critical configurations.
Support switching of WAL implementation.
Note: This PR depends on [milvus-proto PR
#503](https://github.com/milvus-io/milvus-proto/pull/503) being merged
first.
Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>
issue: #45640
- With async logging, there is no ordering guarantee between the C log
and the Go log, and the C log format is inconsistent with the Go log
format. We therefore disable glog's own output and forward each log
entry to the Go side, where the async zap logger handles it.
- Use CGO to intercept all cgo logging and guarantee the order between
C logs and Go logs.
- Also fix the metric name and add a new metric that counts logging
operations.
- TODO: once woodpecker uses the Milvus logger, we can enlarge the
logging buffer.
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: all C (glog) and Go logs must be routed through the
same zap async pipeline so ordering and formatting are preserved; this
PR ensures every glog emission is captured and forwarded to zap before
any async buffering diverges the outputs.
- Logic removed/simplified: direct glog outputs and hard
stdout/stderr/log_dir settings are disabled (configs/glog.conf and flags
in internal/core/src/config/ConfigKnowhere.cpp) because they are
redundant once a single zap sink handles all logs; logging metrics were
simplified from per-length/volatile gauges to totalized counters
(pkg/metrics/logging_metrics.go & pkg/log/*), removing duplicate
length-tracking and making accounting consistent.
- No data loss or behavior regression (concrete code paths): Google
logging now adds a GoZapSink (internal/core/src/common/logging_c.h,
logging_c.cpp) that calls the exported CGO bridge goZapLogExt
(internal/util/cgo/logging/logging.go). Go side uses
C.GoStringN/C.GoString to capture full message and file, maps glog
severities to zapcore levels, preserves caller info, and writes via the
existing zap async core (same write path used by Go logs). The C++
send() trims glog's trailing newline and forwards exact buffers/lengths,
so message content, file, line, and severity are preserved and
serialized through the same async writer—no log entries are dropped or
reordered relative to Go logs.
- Capability added (where it takes effect): a CGO bridge that forwards
glog into zap—new Go-exported function goZapLogExt
(internal/util/cgo/logging/logging.go), a GoZapSink in C++ that forwards
glog sends (internal/core/src/common/logging_c.h/.cpp), and blank
imports of the cgo initializer across multiple packages (various
internal/* files) to ensure the bridge is registered early so all C logs
are captured.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Signed-off-by: chyezh <chyezh@outlook.com>
Generate a library that wraps the Go expression parser and embed it
into libmilvus-core.so.
issue: https://github.com/milvus-io/milvus/issues/45702
see `internal/core/src/plan/milvus_plan_parser.h` for the exposed
interface
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Introduced C++ API for plan parsing with schema registration and
expression parsing capabilities.
* Plan parser now available as shared libraries instead of a standalone
binary tool.
* **Refactor**
* Reorganized build system to produce shared library artifacts instead
of executable binaries.
* Build outputs relocated to standardized library and include
directories.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
issue: #45640
- Logs may be dropped if the underlying file system is busy.
- Use an async write syncer so that log operations do not block the
main Milvus system.
- Remove some log dependencies from the util functions to avoid a
dependency loop.
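A rough sketch of the trade-off described above: writes go to a bounded queue drained by a background goroutine, and an entry is dropped rather than blocking the caller when the queue is full. This is not the zap-based syncer from the PR; `asyncWriter`, `TryWrite`, and `memSink` are hypothetical names for this illustration.

```go
package main

import (
	"fmt"
	"io"
)

// asyncWriter queues writes on a buffered channel; a background goroutine
// drains the queue into the underlying writer.
type asyncWriter struct {
	queue chan []byte
	done  chan struct{}
}

func newAsyncWriter(out io.Writer, size int) *asyncWriter {
	w := &asyncWriter{queue: make(chan []byte, size), done: make(chan struct{})}
	go func() {
		defer close(w.done)
		for p := range w.queue {
			_, _ = out.Write(p) // a slow sink only delays this goroutine
		}
	}()
	return w
}

// TryWrite enqueues a copy of p and reports whether it was accepted.
// It never blocks: when the queue is full the entry is dropped.
// It must not be called after Close.
func (w *asyncWriter) TryWrite(p []byte) bool {
	buf := append([]byte(nil), p...)
	select {
	case w.queue <- buf:
		return true
	default:
		return false
	}
}

// Close flushes queued entries and waits for the drain goroutine to exit.
func (w *asyncWriter) Close() {
	close(w.queue)
	<-w.done
}

// memSink is an in-memory io.Writer used only for the demonstration.
type memSink struct{ data []byte }

func (m *memSink) Write(p []byte) (int, error) {
	m.data = append(m.data, p...)
	return len(p), nil
}

func main() {
	sink := &memSink{}
	w := newAsyncWriter(sink, 16)
	w.TryWrite([]byte("first\n"))
	w.TryWrite([]byte("second\n"))
	w.Close()
	fmt.Print(string(sink.data))
}
```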
---------
Signed-off-by: chyezh <chyezh@outlook.com>
Related to #44620
Related to unstable ut "internal/querycoordv2 TestServer/TestNodeUp"
Introduce SessionWatcher interface to fix race condition and goroutine
leak that caused unstable unit test TestServer/TestNodeUp.
Changes:
- Add SessionWatcher interface with EventChannel() and Stop() methods
- Refactor WatchServices() to return SessionWatcher instead of raw
channel
- Fix cleanup order in QueryCoordV2: stop watcher before session
- Update DataCoord, ConnectionManager to use SessionWatcher
- Add MockSessionWatcher for testing
Fixes race condition between session context cancellation and internal
loop exit. Eliminates goroutine leak by providing explicit lifecycle
management.
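A minimal sketch of what such an interface and its lifecycle could look like, under stated assumptions: only the method names `EventChannel()` and `Stop()` come from the change list above. The `SessionEvent` type, the `chanWatcher` implementation, and the exact signatures are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// SessionEvent is a placeholder event type for this sketch.
type SessionEvent struct{ NodeID int64 }

// SessionWatcher exposes events plus an explicit way to end the watch.
type SessionWatcher interface {
	EventChannel() <-chan SessionEvent
	Stop()
}

type chanWatcher struct {
	events chan SessionEvent
	stopCh chan struct{}
	done   chan struct{}
	once   sync.Once
}

// newChanWatcher forwards events from source until Stop is called or
// source is closed.
func newChanWatcher(source <-chan SessionEvent) *chanWatcher {
	w := &chanWatcher{
		events: make(chan SessionEvent),
		stopCh: make(chan struct{}),
		done:   make(chan struct{}),
	}
	go func() {
		defer close(w.done)
		defer close(w.events)
		for {
			select {
			case <-w.stopCh:
				return
			case ev, ok := <-source:
				if !ok {
					return
				}
				select {
				case w.events <- ev:
				case <-w.stopCh:
					return
				}
			}
		}
	}()
	return w
}

func (w *chanWatcher) EventChannel() <-chan SessionEvent { return w.events }

// Stop is idempotent and returns only after the forwarding goroutine has
// exited, so no goroutine outlives the watcher.
func (w *chanWatcher) Stop() {
	w.once.Do(func() { close(w.stopCh) })
	<-w.done
}

func main() {
	source := make(chan SessionEvent, 1)
	source <- SessionEvent{NodeID: 1}

	var w SessionWatcher = newChanWatcher(source)
	fmt.Println("node up:", (<-w.EventChannel()).NodeID)

	w.Stop() // stop the watcher first, then tear down the session
	_, open := <-w.EventChannel()
	fmt.Println("event channel open:", open)
}
```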
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #45068
When component.Prepare() fails (e.g., net listener creation error), the
sign channel was never closed, causing runComponent to block
indefinitely at <-sign. This resulted in the entire process hanging
after logging the error message.
Changes:
- Move close(sign) to defer statement in runComponent goroutine
- Ensures sign channel is always closed regardless of success/failure
- Allows proper error propagation through future.Await() mechanism
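The shape of the fix can be sketched as follows. This is a simplified illustration, not the actual `runComponent`: the `component` type and `errListen` are hypothetical, and the real code propagates errors through `future.Await()` rather than a plain return value.

```go
package main

import (
	"errors"
	"fmt"
)

// errListen simulates the net listener creation error from the report.
var errListen = errors.New("listen tcp: address already in use")

// component is a hypothetical stand-in for a Milvus component.
type component struct{ prepareErr error }

func (c *component) Prepare() error { return c.prepareErr }

// runComponent starts the component in a goroutine and waits on sign.
// Because close(sign) is deferred, the waiter is released on both the
// success path and the failure path.
func runComponent(c *component) error {
	sign := make(chan struct{})
	var err error
	go func() {
		defer close(sign)
		if err = c.Prepare(); err != nil {
			return // before the fix, returning here left sign open forever
		}
		// The real component would keep running from here.
	}()
	<-sign
	return err
}

func main() {
	fmt.Println(runComponent(&component{}))
	fmt.Println(runComponent(&component{prepareErr: errListen}))
}
```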
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #43897
- Alter collection/database is now implemented by the WAL-based DDL
framework.
- AlterCollection/AlterDatabase are now supported in the WAL.
- Alter operations can now be synced by the new CDC.
- Refactor some UTs for alter DDL.
---------
Signed-off-by: chyezh <chyezh@outlook.com>
related: https://github.com/milvus-io/milvus/issues/43687
We used to run the temporary analyzer and the validate analyzer on the
proxy, but the proxy should not be a computation-heavy node. This PR
moves all analyzer calculations to the streaming node.
---------
Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
issue: #44014
- On standalone, the internal query node needs to load segments and
watch channels, so it is not an embedded querynode in the streamingnode
and does not carry `LabelStreamingNodeEmbeddedQueryNode`. The channel
dist manager therefore cannot confirm that a standalone node is an
embededStreamingNode.
The bug was introduced by #44099.
Signed-off-by: chyezh <chyezh@outlook.com>
related: #39173
Core Features
* Parquet File Analysis: Analyze Milvus binlog Parquet files with
metadata extraction
* MinIO Integration: Direct connection to MinIO storage for remote file
analysis
* Vector Data Deserialization: Specialized handling of Milvus vector
data in binlog files
* Interactive CLI: Command-line interface with interactive exploration
Analysis Capabilities
* Metadata & Vector Analysis: Extract schema info, row counts, and
vector statistics
* Data Export: Export data to JSON format with configurable limits
* Query Functionality: Search for specific records by ID
* Batch Processing: Analyze multiple Parquet files simultaneously
User Experience
* Verbose Output: Detailed logging for debugging
* Error Handling: Robust error handling for file access and parsing
* Flexible Output: Support for single file and batch analysis formats
---------
Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
Co-authored-by: nico <109071306+NicoYuan1986@users.noreply.github.com>
Ref https://github.com/milvus-io/milvus/issues/42148
https://github.com/milvus-io/milvus/pull/42406 implements the segcore
part of storage for handling VectorArray.
This PR:
1. implements the Go part of storage for VectorArray
2. implements collection creation with StructArrayField and VectorArray
3. supports inserting data into and retrieving data from the collection.
---------
Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>
Signed-off-by: SpadeA-Tang <u6748471@anu.edu.au>
issue: #42833
- Also fix the error metric for async cgo.
- Also make sure the roles are visible when a node starts up, #43041.
Signed-off-by: chyezh <chyezh@outlook.com>
issue: #41445
- Stop multiple nodes concurrently (otherwise the streamingnode stop
is blocked by the querynode).
- Change how the vchannel count is updated when a collection is being
dropped.
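Stopping nodes concurrently can be sketched as below. This is an illustration only: `stopper`, `stopAll`, and the two toy node types are hypothetical. The toy nodes are built so that one `Stop()` cannot finish until the other has started, which means a sequential loop in list order would hang while the concurrent version completes.

```go
package main

import (
	"fmt"
	"sync"
)

// stopper is a hypothetical minimal interface for anything that can stop.
type stopper interface{ Stop() }

// stopAll calls Stop on every node in its own goroutine and waits for all
// of them, so one slow Stop does not hold up the others.
func stopAll(nodes ...stopper) {
	var wg sync.WaitGroup
	for _, n := range nodes {
		wg.Add(1)
		go func(n stopper) {
			defer wg.Done()
			n.Stop()
		}(n)
	}
	wg.Wait()
}

// queryNode signals that its shutdown has begun.
type queryNode struct{ stopping chan struct{} }

func (q *queryNode) Stop() { close(q.stopping) }

// streamingNode cannot finish stopping until the query node has begun to stop.
type streamingNode struct {
	waitFor <-chan struct{}
	stopped bool
}

func (s *streamingNode) Stop() {
	<-s.waitFor
	s.stopped = true
}

func main() {
	q := &queryNode{stopping: make(chan struct{})}
	s := &streamingNode{waitFor: q.stopping}
	stopAll(s, q) // stopping s and then q sequentially would block forever
	fmt.Println("streaming node stopped:", s.stopped)
}
```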
Signed-off-by: chyezh <chyezh@outlook.com>
enhance: update MixCoord registration in MilvusRoles
The `runMixCoord` function in `MilvusRoles` was updated to use the
`RegisterMixCoord` function from the `rootcoord_metrics` package instead
of `RegisterRootCoord`. This change aligns with the recent modifications
made to the `rootcoord_metrics` package.
issue:https://github.com/milvus-io/milvus/issues/41338
---------
Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com>
Merge RootCoord, DataCoord, and QueryCoord into MixCoord
Merge their sessions into one
issue: https://github.com/milvus-io/milvus/issues/37764
---------
Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com>
#35856
1. Add function-related configuration in milvus.yaml
2. Add null and empty value check to TextEmbeddingFunction
Signed-off-by: junjie.jiang <junjie.jiang@zilliz.com>
issue: #39735
related to #39726
- Removed CPU profile dump from util.go's pprof collection
- Avoid potential blocking in StopCPUProfile() during shutdown
- Maintain goroutine/heap/block/mutex profiles for diagnostics
- Ensure safe shutdown timeout handling without profile stalls
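As a rough sketch of collecting only the retained profiles with the standard `runtime/pprof` package: the four names below are predefined `pprof.Lookup` profiles, and the CPU profile is not available through `Lookup` at all. `collectProfiles` and `diagnosticProfiles` are hypothetical names, not the helpers in `util.go`.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime/pprof"
)

// diagnosticProfiles lists the profiles kept for diagnostics. The CPU
// profile is deliberately absent, so StopCPUProfile is never involved.
var diagnosticProfiles = []string{"goroutine", "heap", "block", "mutex"}

// collectProfiles snapshots each named profile into memory.
func collectProfiles() (map[string][]byte, error) {
	out := make(map[string][]byte, len(diagnosticProfiles))
	for _, name := range diagnosticProfiles {
		p := pprof.Lookup(name)
		if p == nil {
			return nil, fmt.Errorf("profile %q not found", name)
		}
		var buf bytes.Buffer
		// debug=0 writes the gzip-compressed protobuf format.
		if err := p.WriteTo(&buf, 0); err != nil {
			return nil, fmt.Errorf("write %s profile: %w", name, err)
		}
		out[name] = buf.Bytes()
	}
	return out, nil
}

func main() {
	profiles, err := collectProfiles()
	if err != nil {
		fmt.Println("collect failed:", err)
		return
	}
	for _, name := range diagnosticProfiles {
		fmt.Printf("%s: %d bytes\n", name, len(profiles[name]))
	}
}
```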
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>