Commit Graph

173 Commits (17532517c611cafe5ec7a79bda47c9f296e82682)

Author SHA1 Message Date
cai.zhang fc4982bd62
fix: prevent node panic when unsupported types used as ClusteringKey (#48184)
## Summary

Fixes #47540

- **Bug 1 (MixCoord panic):** `compactionInspector.analyzeScheduler` was
never initialized in production code. When FloatVector is used as
ClusteringKey, the `doAnalyze()` path calls `analyzeScheduler.Enqueue()`
on a nil pointer → panic. Fixed by passing `analyzeScheduler` to
`newCompactionInspector`.
- **Bug 2 (DataNode round-robin panic):** `validateClusteringKey()` had
no field type whitelist, so JSON/Bool/Array passed schema validation.
During clustering compaction, `NewScalarFieldValue()` panics on
unsupported types. Fixed by adding `IsClusteringKeyType()` check to
reject unsupported types at collection creation time.

## Test plan

- [x] `TestIsClusteringKeyType` — verifies supported/unsupported type
classification
- [x] `TestClusteringKey` — new sub-tests for JSON, Bool, Array as
ClusteringKey (all rejected)
- [ ] Existing `TestClusteringKey` sub-tests (normal, multiple keys,
vector key) still pass
- [ ] `TestCompactionPlanHandler*` tests pass with updated
`newCompactionInspector` signature

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-15 21:21:25 +08:00
marcelo-cjl 52f10023a3
fix: change queryPreExecute merge logic from row-based to column-based (#47122)
issue: #46953 
- GenNullableFieldData only supported scalar types, causing error when
partial upsert after adding nullable vector fields

issue: #47160  
- pickFieldData used row index directly to access Contents array without
converting to data index, causing index out of range panic

relate: #45993


Row-based processing had two problems:
- Nullable vector handling was tightly coupled with other field types,
requiring complex special-case logic
- Update path was inefficient for nullable vectors: data had to be read
first then updated in-place. For example, changing a valid vector to
null requires shifting subsequent data since null values are not stored,
which is very inefficient.

Column-based processing solves both:
- Each field type is processed independently, nullable vector logic is
cleanly separated from other types
- Nullable vectors are appended directly without read-then-update,
avoiding expensive data shifting operations - Iterate over fields
instead of rows when merging upsert and existing data

key changes:
- Remove nullable vector special handling: nullableVectorMergeContext,
buildNullableVectorIdxMap, rebuildNullableVectorFieldData, etc.
- Use generic column utilities: AppendFieldDataByColumn,
UpdateFieldDataByColumn
- Add vector type support to GenNullableFieldData
- Add unit test for nullable vector upsert scenarios

---------

Signed-off-by: marcelo-cjl <marcelo.chen@zilliz.com>
2026-02-10 18:10:41 +08:00
Spade A ba29e0ba48
fix: stl-sort allows struct scalar fields (#47626)
issue: https://github.com/milvus-io/milvus/issues/47620

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2026-02-09 10:31:52 +08:00
Li Liu 8fb3fc927e
enhance: Support fp32 to fp16/bf16 auto-conversion in search (#47133)
Add automatic conversion of fp32 search vectors to fp16/bf16 when the
collection field uses lower precision types. This allows users to search
fp16/bf16 collections with fp32 query vectors (which is what most
embedding models output).

Changes:
- Add vector type matching utility (isVectorTypeMatch)
- Add float16/bfloat16 range validation before conversion
- Add ConvertPlaceholderGroup function for type conversion
- Integrate conversion into normal search and hybrid search
- Add GetFieldByID helper function in typeutil

The conversion happens in the Proxy layer before requests are sent to
QueryNode. Out-of-range values are rejected with clear error messages.

issue: #47207

Signed-off-by: Li Liu <li.liu@zilliz.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-21 14:45:30 +08:00
cqy123456 eb63a363bd
enhance: support minhash function in DIDO (#45322)
issue : https://github.com/milvus-io/milvus/issues/41746
This PR adds MinHash "DIDO" (Data In, Data Out) support to Milvus, which
allows computing MinHash signatures on-the-fly during search operations
instead of requiring pre-stored vectors.

  Key changes:
- Implemented SIMD-optimized C++ MinHash computation (AVX2/AVX512 for
x86, NEON/SVE for ARM)
- Added runtime CPU detection and function hooks to automatically select
the best SIMD implementation
- Integrated MinHash computation into search pipeline (brute force
search, growing segment search)
- Added support for LSH-based MinHash search with configurable band
width and bit width parameters
- Enabled direct text-to-signature conversion during query execution,
reducing storage overhead

This enables efficient text deduplication and similarity search without
storing pre-computed MinHash vectors.

Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>
2026-01-14 14:19:27 +08:00
wei liu b79a655f1c
feat: [ExternalTable Part1] Enable create external collection (#46886)
issue: #45881

- Add ExternalSource and ExternalSpec fields to collection schema
- Add ExternalField mapping for field schema to map external columns
- Implement ValidateExternalCollectionSchema() to enforce restrictions:
  - No primary key (virtual PK generated automatically)
  - No dynamic fields, partition keys, clustering keys, or auto ID
  - No text match or function features
  - All user fields must have external_field mapping
- Return virtual PK schema for external collections in
GetPrimaryFieldSchema()
- Skip primary key validation for external collections during creation
- Add comprehensive unit tests and integration tests
- Add design document and user guide

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
Co-authored-by: sunby <sunbingyi1992@gmail.com>
2026-01-13 10:15:27 +08:00
Chun Han b7ee93fc52
feat: support query aggregtion(#36380) (#44394)
related: #36380

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: aggregation is centralized and schema-aware — all
aggregate functions are created via the exec Aggregate registry
(milvus::exec::Aggregate) and validated by ValidateAggFieldType, use a
single in-memory accumulator layout (Accumulator/RowContainer) and
grouping primitives (GroupingSet, HashTable, VectorHasher), ensuring
consistent typing, null semantics and offsets across planner → exec →
reducer conversion paths (toAggregateInfo, Aggregate::create,
GroupingSet, AggResult converters).

- Removed / simplified logic: removed ad‑hoc count/group-by and reducer
code (CountNode/PhyCountNode, GroupByNode/PhyGroupByNode, cntReducer and
its tests) and consolidated into a unified AggregationNode →
PhyAggregationNode + GroupingSet + HashTable execution path and
centralized reducers (MilvusAggReducer, InternalAggReducer,
SegcoreAggReducer). AVG now implemented compositionally (SUM + COUNT)
rather than a bespoke operator, eliminating duplicate implementations.

- Why this does NOT cause data loss or regressions: existing data-access
and serialization paths are preserved and explicitly validated —
bulk_subscript / bulk_script_field_data and FieldData creation are used
for output materialization; converters (InternalResult2AggResult ↔
AggResult2internalResult, SegcoreResults2AggResult ↔
AggResult2segcoreResult) enforce shape/type/row-count validation; proxy
and plan-level checks (MatchAggregationExpression,
translateOutputFields, ValidateAggFieldType, translateGroupByFieldIds)
reject unsupported inputs (ARRAY/JSON, unsupported datatypes) early.
Empty-result generation and explicit error returns guard against silent
corruption.

- New capability and scope: end-to-end GROUP BY and aggregation support
added across the stack — proto (plan.proto, RetrieveRequest fields
group_by_field_ids/aggregates), planner nodes (AggregationNode,
ProjectNode, SearchGroupByNode), exec operators (PhyAggregationNode,
PhyProjectNode) and aggregation core (Aggregate implementations:
Sum/Count/Min/Max, SimpleNumericAggregate, RowContainer, GroupingSet,
HashTable) plus proxy/querynode reducers and tests — enabling grouped
and global aggregation (sum, count, min, max, avg via sum+count) with
schema-aware validation and reduction.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
2026-01-06 16:29:25 +08:00
aoiasd 90809d1d86
fix: highlight with multi analyzer failed (#46527)
relate: https://github.com/milvus-io/milvus/issues/46498

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: text fields configured with multi_analyzer_params must
include a "by_field" string that names another field containing per-row
analyzer choices; schemaInfo.GetMultiAnalyzerNameFieldID caches and
returns the dependent field ID (or 0 if none) and relies on that mapping
to make per-row analyzer names available to the highlighter.
- What changed / simplified: the highlighter is now schema-aware —
addTaskWithSearchText accepts *schemaInfo and uses
GetMultiAnalyzerNameFieldID to resolve the analyzer-name field;
resolution and caching moved into schemaInfo.multiAnalyzerFieldMap
(meta_cache.go), eliminating ad-hoc/typeutil-only lookups and duplicated
logic; GetMultiAnalyzerParams now gates on EnableAnalyzer(),
centralizing analyzer enablement checks.
- Why this fixes the bug (root cause): fixes #46498 — previously the
highlighter failed when the analyzer-by-field was not in output_fields.
The change (1) populates task.AnalyzerNames (defaulting missing names to
"default") when multi-analyzer is configured and (2) appends the
analyzer-name field ID to LexicalHighlighter.extraFields so FieldIDs
includes it; the operator then requests the analyzer-name column at
search time, ensuring per-row analyzer selection is available for
highlighting.
- No data-loss or regression: when no multi-analyzer is configured
GetMultiAnalyzerNameFieldID returns 0 and behavior is unchanged; the
patch only adds the analyzer-name field to requested output IDs (no
mutation of stored data). Error handling on malformed params is
preserved (errors are returned instead of silently changing data), and
single-analyzer behavior remains untouched.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-12-30 11:55:21 +08:00
Buqian Zheng dce44f2a20
test: reduce test time for TestSparsePlaceholderGroupSize (#46637)
issue: #44452

## Summary
Reduce test combinations in `TestSparsePlaceholderGroupSize` to decrease
test execution time:
- `nqs`: from `[1, 10, 100, 1000, 10000]` to `[1, 100, 10000]`
- `averageNNZs`: from `[1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024,
2048]` to `[1, 4, 16, 64, 256, 1024]`

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## TestSparsePlaceholderGroupSize Test Reduction

**Core Invariant:** The sparse vector NNZ estimation algorithm
(`EstimateSparseVectorNNZFromPlaceholderGroup`) must maintain accuracy
within bounded error thresholds—individual cases < 10% error and no more
than 2% of cases exceeding 5% error—across representative parameter
ranges.

**Test Coverage Optimized, Not Removed:** Test combinations reduced from
60 to 18 by pruning redundant parameter points while retaining critical
coverage: nqs now tests [1, 100, 10000] (min, mid, max) and averageNNZs
tests [1, 4, 16, 64, 256, 1024] (exponential spacing). Variant
generation logic (powers of 2 scaling) remains unchanged, ensuring error
scenarios are still exercised.

**No Behavioral Regression:** The algorithm under test is untouched;
only test case frequency decreases. The same assertions validate error
bounds are satisfied—individual assertions (`assert.Less(errorRatio,
10.0)`) and statistical assertions (`assert.Less(largeErrorRatio, 2.0)`)
remain identical, confirming that estimation quality is still verified.

**Why Safe:** Exponential spacing of removed parameters (e.g., nqs: 10,
1000 removed; averageNNZs: 2, 8, 32, 128, 512, 2048 removed) addresses
diminishing returns—intermediate values provide no new error scenarios
beyond what surrounding powers-of-2 values expose, while keeping test
execution time proportional to coverage value gained.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2025-12-29 09:51:21 +08:00
marcelo-cjl 3b599441fd
feat: Add nullable vector support for proxy and querynode (#46305)
related: #45993 

This commit extends nullable vector support to the proxy layer,
querynode,
and adds comprehensive validation, search reduce, and field data
handling
    for nullable vectors with sparse storage.
    
    Proxy layer changes:
- Update validate_util.go checkAligned() with getExpectedVectorRows()
helper
      to validate nullable vector field alignment using valid data count
- Update checkFloatVectorFieldData/checkSparseFloatVectorFieldData for
      nullable vector validation with proper row count expectations
- Add FieldDataIdxComputer in typeutil/schema.go for logical-to-physical
      index translation during search reduce operations
- Update search_reduce_util.go reduceSearchResultData to use
idxComputers
      for correct field data indexing with nullable vectors
- Update task.go, task_query.go, task_upsert.go for nullable vector
handling
    - Update msg_pack.go with nullable vector field data processing
    
    QueryNode layer changes:
    - Update segments/result.go for nullable vector result handling
- Update segments/search_reduce.go with nullable vector offset
translation
    
    Storage and index changes:
- Update data_codec.go and utils.go for nullable vector serialization
- Update indexcgowrapper/dataset.go and index.go for nullable vector
indexing
    
    Utility changes:
- Add FieldDataIdxComputer struct with Compute() method for efficient
      logical-to-physical index mapping across multiple field data
- Update EstimateEntitySize() and AppendFieldData() with fieldIdxs
parameter
    - Update funcutil.go with nullable vector support functions

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Full support for nullable vector fields (float, binary, float16,
bfloat16, int8, sparse) across ingest, storage, indexing, search and
retrieval; logical↔physical offset mapping preserves row semantics.
  * Client: compaction control and compaction-state APIs.

* **Bug Fixes**
* Improved validation for adding vector fields (nullable + dimension
checks) and corrected search/query behavior for nullable vectors.

* **Chores**
  * Persisted validity maps with indexes and on-disk formats.

* **Tests**
  * Extensive new and updated end-to-end nullable-vector tests.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: marcelo-cjl <marcelo.chen@zilliz.com>
2025-12-24 10:13:19 +08:00
Xiaofan ca2e27f576
enhance: remove uncessary segment size estimation and make it configurable (#46302)
fix #46300
remove unused segment size estimation, and make size estimation configurable

Signed-off-by: xiaofanluan <xiaofan.luan@zilliz.com>
2025-12-13 02:58:46 +08:00
Chun Han d9f8e38d6a
fix: query failed for int value on edge(#46075) (#46126)
related: #46075

Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
2025-12-10 15:59:12 +08:00
Zhen Ye c3fe6473b8
enhance: support async write syncer for milvus logging (#45805)
issue: #45640

- log may be dropped if the underlying file system is busy.
- use async write syncer to avoid the log operation block the milvus
major system.
- remove some log dependency from the until function to avoid
dependency-loop.

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-11-28 17:43:11 +08:00
zhenshan.cao bec6d1d1e1
enhance: timestamptz support groupby (#45762)
issue: https://github.com/milvus-io/milvus/issues/45761

Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>
2025-11-21 18:39:05 +08:00
zhenshan.cao 352a8d06ec
fix: Partial update panic with TIMESTAMPTZ (#45740)
issue: https://github.com/milvus-io/milvus/issues/45729

Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>
2025-11-20 21:20:12 +08:00
zhenshan.cao 6327c9a514
fix: Fix bugs related to TimestampTz (#45111)
issue: https://github.com/milvus-io/milvus/issues/44527
https://github.com/milvus-io/milvus/issues/44537
https://github.com/milvus-io/milvus/issues/44538
https://github.com/milvus-io/milvus/issues/44585
https://github.com/milvus-io/milvus/issues/44622

Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>
2025-11-04 16:51:33 +08:00
Spade A ce2862d325
fix: fix parquet import bug in STRUCT (#45028)
issue: https://github.com/milvus-io/milvus/issues/45006
ref: https://github.com/milvus-io/milvus/issues/42148

Previsouly, the parquet import is implemented based on that the STRUCT
in the parquet files is hanlded in the way that each field in struct is
stored in a single column.
However, in the user's perspective, the array of STRUCT contains data is
something like STRUCT_A:
for one row, [struct{field1_1, field2_1, field3_1}, struct{field1_2,
field2_2, field3_2}, ...], rather than {[field1_1, field1_2, ...],
[field2_1, field2_2, ...], [field3_1, field3_2, field3_3, ...]}.

This PR fixes this.

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-10-27 10:26:06 +08:00
Spade A d8591f9548
fix: csv/json import with STRUCT adapts concatenated struct name (#45000)
After https://github.com/milvus-io/milvus/pull/44557, the field name in
STRUCT field becomes STRUCT_NAME[FIELD_NAME]
This PR make import consider the change.

issue: https://github.com/milvus-io/milvus/issues/45006
ref: https://github.com/milvus-io/milvus/issues/42148

TODO: parquet is much more complex than csv/json, and I will leave it to
a separate PR.

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-10-24 10:22:15 +08:00
aoiasd cfeb095ad7
enhance: forbid build analyzer at proxy (#44067)
relate: https://github.com/milvus-io/milvus/issues/43687
We used to run the temporary analyzer and validate analyzer on the
proxy, but the proxy should not be a computation-heavy node. This PR
move all analyzer calculations to the streaming node.

---------

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-10-23 10:58:12 +08:00
wei liu 529c98520c
enhance: Add nullable support for Geometry and Timestamptz types (#44846)
issue: #44800
This commit enhances the upsert and validation logic to properly handle
nullable Geometry (WKT/WKB) and Timestamptz data types:

- Add ToCompressedFormatNullable support for TimestamptzData,
GeometryWktData, and GeometryData to filter out null values during data
compression
- Implement GenNullableFieldData for Timestamptz and Geometry types to
generate nullable field data structures
- Update FillWithNullValue to handle both GeometryData and
GeometryWktData with null value filling logic
- Add UpdateFieldData support for Timestamptz, GeometryData, and
GeometryWktData field updates
- Comprehensive unit tests covering all new data type handling scenarios

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-10-15 14:04:00 +08:00
cai.zhang 19346fa389
feat: Geospatial Data Type and GIS Function support for milvus (#44547)
issue: #43427

This pr's main goal is merge #37417 to milvus 2.5 without conflicts.

# Main Goals

1. Create and describe collections with geospatial type
2. Insert geospatial data into the insert binlog
3. Load segments containing geospatial data into memory
4. Enable query and search can display  geospatial data
5. Support using GIS funtions like ST_EQUALS in query
6. Support R-Tree index for geometry type

# Solution

1. **Add Type**: Modify the Milvus core by adding a Geospatial type in
both the C++ and Go code layers, defining the Geospatial data structure
and the corresponding interfaces.
2. **Dependency Libraries**: Introduce necessary geospatial data
processing libraries. In the C++ source code, use Conan package
management to include the GDAL library. In the Go source code, add the
go-geom library to the go.mod file.
3. **Protocol Interface**: Revise the Milvus protocol to provide
mechanisms for Geospatial message serialization and deserialization.
4. **Data Pipeline**: Facilitate interaction between the client and
proxy using the WKT format for geospatial data. The proxy will convert
all data into WKB format for downstream processing, providing column
data interfaces, segment encapsulation, segment loading, payload
writing, and cache block management.
5. **Query Operators**: Implement simple display and support for filter
queries. Initially, focus on filtering based on spatial relationships
for a single column of geospatial literal values, providing parsing and
execution for query expressions.Now only support brutal search
7. **Client Modification**: Enable the client to handle user input for
geospatial data and facilitate end-to-end testing.Check the modification
in pymilvus.

---------

Signed-off-by: Yinwei Li <yinwei.li@zilliz.com>
Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
Co-authored-by: ZhuXi <150327960+Yinwei-Yu@users.noreply.github.com>
2025-09-28 19:43:05 +08:00
junjiejiangjjj f07979f91d
enhance: add support for controlling function output field insertion (#44162)
#44053

Signed-off-by: junjie.jiang <junjie.jiang@zilliz.com>
2025-09-24 17:26:04 +08:00
Tianx 2c0c5ef41e
feat: timestamptz expression & index & timezone (#44080)
issue: https://github.com/milvus-io/milvus/issues/27467

>My plan is as follows.
>- [x] M1 Create collection with timestamptz field
>- [x] M2 Insert timestamptz field data
>- [x] M3 Retrieve timestamptz field data
>- [x] M4 Implement handoff
>- [x] M5 Implement compare operator
>- [x] M6 Implement extract operator
 >- [x] M8 Support database/collection level default timezone
>- [x] M7 Support STL-SORT index for datatype timestamptz

---

The third PR of issue: https://github.com/milvus-io/milvus/issues/27467,
which completes M5, M6, M7, M8 described above.

## M8 Default Timezone

We will be able to use alter_collection() and alter_database() in a
future Python SDK release to modify the default timezone at the
collection or database level.

For insert requests, the timezone will be resolved using the following
order of precedence: String Literal-> Collection Default -> Database
Default.
For retrieval requests, the timezone will be resolved in this order:
Query Parameters -> Collection Default -> Database Default.
In both cases, the final fallback timezone is UTC.


## M5: Comparison Operators

We can now use the following expression format to filter on the
timestamptz field:

- `timestamptz_field [+/- INTERVAL 'interval_string'] {comparison_op}
ISO 'iso_string' `

- The interval_string follows the ISO 8601 duration format, for example:
P1Y2M3DT1H2M3S.

- The iso_string follows the ISO 8601 timestamp format, for example:
2025-01-03T00:00:00+08:00.

- Example expressions: "tsz + INTERVAL 'P0D' != ISO
'2025-01-03T00:00:00+08:00'" or "tsz != ISO
'2025-01-03T00:00:00+08:00'".

## M6: Extract

We will be able to extract sepecific time filed by kwargs in a future
Python SDK release.
The key is `time_fields`, and value should be one or more of "year,
month, day, hour, minute, second, microsecond", seperated by comma or
space. Then the result of each record would be an array of int64.



## M7: Indexing Support

Expressions without interval arithmetic can be accelerated using an
STL-SORT index. However, expressions that include interval arithmetic
cannot be indexed. This is because the result of an interval calculation
depends on the specific timestamp value. For example, adding one month
to a date in February results in a different number of added days than
adding one month to a date in March.

--- 

After this PR, the input / output type of timestamptz would be iso
string. Timestampz would be stored as timestamptz data, which is int64_t
finally.

> for more information, see https://en.wikipedia.org/wiki/ISO_8601

---------

Signed-off-by: xtx <xtianx@smail.nju.edu.cn>
2025-09-23 10:24:12 +08:00
yihao.dai 51f69f32d0
feat: Add CDC support (#44124)
This PR implements a new CDC service for Milvus 2.6, providing log-based
cross-cluster replication.

issue: https://github.com/milvus-io/milvus/issues/44123

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
Signed-off-by: chyezh <chyezh@outlook.com>
Co-authored-by: chyezh <chyezh@outlook.com>
2025-09-16 16:32:01 +08:00
Spade A eb793531b9
feat: impl StructArray -- support import for CSV/JSON/PARQUET/BINLOG (#44201)
Ref https://github.com/milvus-io/milvus/issues/42148

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-09-15 20:41:59 +08:00
wei liu 18371773dd
enhance: Optimize partial update merge logic by unifying nullable format (#44197)
issue: #43980
This commit optimizes the partial update merge logic by standardizing
nullable field representation before merge operations to avoid corner
cases during the merge process.

Key changes:
- Unify nullable field data format to FULL FORMAT before merge execution
- Add extensive unit tests for bounds checking and edge cases

The optimization ensures:
- Consistent nullable field representation across SDK and internal
- Proper handling of null values during merge operations
- Prevention of index out-of-bounds errors in vector field updates
- Better error handling and validation for partial update scenarios

This resolves issues where different nullable field formats could cause
merge failures or data corruption during partial update operations.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-09-10 17:27:56 +08:00
wei liu 5ef793c393
fix: Fix panic when upsert with partial_update=true on empty table (#44155)
issue: #43980
Fix panic issue caused by incorrect nullable field merging logic when
upsert converts to insert operation on empty tables.

- Add AppendFieldDataWithNullData to handle nullable field merging
- Fix existing data merge with skipAppendNullData=false
- Fix insert data merge with skipAppendNullData=true
- Add unit tests for nullable field data appending scenarios

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-09-02 16:47:52 +08:00
wei liu 16af4e230a
fix: Prevent panic in upsert due to missing nullable fields [Proxy] (#44070)
issue: #43980
Fixes a panic that occurred when a partial update was converted to an
insert due to a non-existent primary key. The panic was caused by
missing nullable fields that were not provided in the original partial
update request.

The upsert pre-execution logic is refactored to handle this correctly:
- Explicitly splits upsert data into 'insert' and 'update' batches.
- Automatically generates data for missing nullable or default-value
fields during inserts, preventing the panic.
- Enhances `typeutil.UpdateFieldData` to support different source and
destination indexes for flexible data merging.
- Adds comprehensive unit tests for mixed upsert, pure insert, and pure
update scenarios.

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-08-29 18:33:51 +08:00
Spade A 8456f824be
feat: impl StructArray -- miscellaneous staffs for struct array (#43960)
Ref https://github.com/milvus-io/milvus/issues/42148

1. enable storage v2
2. implement some missing staffs
3. fix some bugs and add tests

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
2025-08-26 21:35:53 +08:00
Tianx c0d62268ac
feat: add timesatmptz data type (#44005)
issue: https://github.com/milvus-io/milvus/issues/27467
>
https://github.com/milvus-io/milvus/issues/27467#issuecomment-3092211420
> * [x]  M1 Create collection with timestamptz field
> * [x]  M2 Insert timestamptz field data
> * [x]  M3 Retrieve timestamptz field data
> * [x]  M4 Implement handoff[ ]  

The second PR of issue:
https://github.com/milvus-io/milvus/issues/27467, which completes M1-M4
described above.

---------

Signed-off-by: xtx <xtianx@smail.nju.edu.cn>
2025-08-26 15:59:53 +08:00
Spade A d6a428e880
feat: impl StructArray -- support create index for vector array (embedding list) and search on it (#43726)
Ref https://github.com/milvus-io/milvus/issues/42148

This PR supports create index for vector array (now, only for
`DataType.FLOAT_VECTOR`) and search on it.
The index type supported in this PR is `EMB_LIST_HNSW` and the metric
type is `MAX_SIM` only.

The way to use it:
```python
milvus_client = MilvusClient("xxx:19530")
schema = milvus_client.create_schema(enable_dynamic_field=True, auto_id=True)
...
struct_schema = milvus_client.create_struct_array_field_schema("struct_array_field")
...
struct_schema.add_field("struct_float_vec", DataType.ARRAY_OF_VECTOR, element_type=DataType.FLOAT_VECTOR, dim=128, max_capacity=1000)
...
schema.add_struct_array_field(struct_schema)
index_params = milvus_client.prepare_index_params()
index_params.add_index(field_name="struct_float_vec", index_type="EMB_LIST_HNSW", metric_type="MAX_SIM", index_params={"nlist": 128})
...
milvus_client.create_index(COLLECTION_NAME, schema=schema, index_params=index_params)
```

Note: This PR uses `Lims` to convey offsets of the vector array to
knowhere where vectors of multiple vector arrays are concatenated and we
need offsets to specify which vectors belong to which vector array.

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>
2025-08-20 10:27:46 +08:00
wei liu d3c95eaa77
enhance: Support partial field updates with upsert API (#42877)
issue: #29735
Implement partial field update functionality for upsert operations,
supporting scalar, vector, and dynamic JSON fields without requiring all
collection fields.

Changes:
- Add queryPreExecute to retrieve existing records before upsert
- Implement UpdateFieldData function for merging data
- Add IDsChecker utility for efficient primary key lookups
- Fix JSON data creation in tests using proper map marshaling
- Add test cases for partial updates of different field types

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2025-08-19 11:15:45 +08:00
Spade A faeb7fd410
feat: impl StructArray -- create schema, insert, and retrieve data (#42855)
Ref https://github.com/milvus-io/milvus/issues/42148

https://github.com/milvus-io/milvus/pull/42406 impls the segcore part of
storage for handling with VectorArray.
This PR:
1. impls the go part of storage for VectorArray
2. impls the collection creation with StructArrayField and VectorArray
3. insert and retrieve data from the collection.

---------

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>
Signed-off-by: SpadeA-Tang <u6748471@anu.edu.au>
2025-07-27 01:30:55 +08:00
XuanYang-cn 4dcaa97682
fix: Use diskSegmentMaxSize for coll with sparse and dense vectors (#43194)
Previous code uses diskSegmentMaxSize if and only if all of the
collection's vector fields are indexed with DiskANN index.

When introducing sparse vectors, since sparse vector cannot be indexed
with DiskANN index, collections with both dense and sparse vectors will
use maxSize instead.

This PR changes the requirments of using diskSegmentMaxSize to all dense
vectors are indexed with DiskANN indexs, ignoring sparse vector fields.

See also: #43193

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2025-07-16 18:04:52 +08:00
Chun Han 07745439b5
fix: empty search groupby result causing crash(#43137) (#43214)
related: #43137

Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
2025-07-10 12:04:48 +08:00
congqixia 74ea57bac1
enhance: Remove unused load field check from proxy (#42816)
Related to #42489

Since load list works as hint after cachelayer implemented, the related
check logic could be removed to keep code logic clean.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-06-19 19:34:47 +08:00
Zhen Ye fc010e44a8
fix: release memory after pop from heap (#42482)
issue: #42481

Signed-off-by: chyezh <chyezh@outlook.com>
2025-06-04 10:00:32 +08:00
Ted Xu ae32203d3a
fix: support group by with nullable grouping keys (#41797)
See #36264

In this PR:
- Enhanced error handling in parse of grouping field.
- Fixed null handling in reduce tasks in proxy nodes. 
- Updated tests to reflect changes in error handling and data processing
logic.

---------

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2025-05-17 20:54:22 +08:00
aoiasd f52c2909c4
feat: support multi analyzer for bm25 function (#41351)
relate: https://github.com/milvus-io/milvus/issues/41213

---------

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2025-04-23 18:22:38 +08:00
SimFG 91d40fa558
fix: Update logging context and upgrade dependencies (#41318)
- issue: #41291

---------

Signed-off-by: SimFG <bang.fu@zilliz.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2025-04-23 10:52:38 +08:00
Xianhui Lin f9febe3bae
enhance: Merge RootCoord, DataCoord And QueryCoord into MixCoord (#41006)
Merge RootCoord, DataCoord And QueryCoord into MixCoord
Make Session into one
issue : https://github.com/milvus-io/milvus/issues/37764

---------

Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com>
2025-04-11 16:36:30 +08:00
Xianhui Lin 3bc24c264f
enhance: Add json key inverted index in stats for optimization (#38039)
Add json key inverted index in stats for optimization
https://github.com/milvus-io/milvus/issues/36995

---------

Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2025-04-10 15:20:28 +08:00
yihao.dai b4cb8a4b13
enhance: Add UTF-8 string validation for import (#40694)
issue: https://github.com/milvus-io/milvus/issues/40684

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-04-01 19:04:21 +08:00
Ted Xu 688505ab1c
enhance: cleanup lint check exclusions (#40829)
See: #40828

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2025-03-21 18:12:14 +08:00
Buqian Zheng c12abf4e2a
enhance: improve sparse query nnz metric (#40713)
add query type and field id label; add metric for hybrid search

issue: https://github.com/milvus-io/milvus/issues/35853

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2025-03-18 17:20:16 +08:00
Zhen Ye f6fb4bc442
fix: backoff will retry infinitely after reaching max elapse (#40589)
issue: #40588

Signed-off-by: chyezh <chyezh@outlook.com>
2025-03-13 16:24:06 +08:00
Zhen Ye 96a010da7a
fix: msgstream adaptor may not gc quickly after comsumed (#40555)
issue: #40540

Signed-off-by: chyezh <chyezh@outlook.com>
2025-03-12 14:00:04 +08:00
congqixia cb7f2fa6fd
enhance: Use v2 package name for pkg module (#39990)
Related to #39095

https://go.dev/doc/modules/version-numbers

Update pkg version according to golang dep version convention

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-02-22 23:15:58 +08:00
yihao.dai d72d2281ca
fix: Fix concurrent map (#39775)
issue: https://github.com/milvus-io/milvus/issues/39778

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2025-02-22 09:51:54 +08:00
Patrick Weizhi Xu 04fff74a56
feat: introduce Text data type (#39874)
issue: https://github.com/milvus-io/milvus/issues/39818

This PR mimics Varchar data type, allows insert, search, query, delete,
full-text search and others.
Functionalities related to filter expressions are disabled temporarily. 

Storage changes for Text data type will be in the following PRs.

Signed-off-by: Patrick Weizhi Xu <weizhi.xu@zilliz.com>
2025-02-19 11:04:51 +08:00