Commit Graph

500 Commits (master)

Author SHA1 Message Date
Gifi Siby f54e8e9554
fix: Folders are also considered for WalkWithObjects to adhere bulk insert failure on using GCS buckets (#39352)
Related task: [#39134
](https://github.com/milvus-io/milvus/issues/39134)

Previous PR: [#39150 ](https://github.com/milvus-io/milvus/pull/39150)
2025-01-19 02:49:03 +08:00
Cai Yudong 5bf1b2b929
feat: Support Int8Vector in go (#38990)
Issue: #38666

Signed-off-by: Cai Yudong <yudong.cai@zilliz.com>
2025-01-14 20:43:06 +08:00
Zhen Ye bb8d1ab3bf
enhance: make new go package to manage proto (#39114)
issue: #39095

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2025-01-10 10:49:01 +08:00
zhagnlu 6ee94d00b9
fix:fix calculate arrow nest type and add ut (#38527)
#37767

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2024-12-18 11:54:44 +08:00
zhagnlu c3edc85359
fix: fix wrong size of arrow array for zero-copy mode (#38449)
#37767

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2024-12-15 18:14:43 +08:00
congqixia 1ed686783f
enhance: Use `PrimaryKeys` to replace interface slice for segment delete (#37880)
Related to #35303

Reduce temporary memory usage for PK interface for segment delete.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-11-22 11:52:33 +08:00
congqixia b0bd290a6e
enhance: Use internal json(sonic) to replace std json lib (#37708)
Related to #35020

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-11-18 10:46:31 +08:00
Buqian Zheng 00edec2ebd
fix: incorrect dim set when creating BM25 doc sparse array (#37717)
issue: https://github.com/milvus-io/milvus/issues/35853

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2024-11-16 16:36:30 +08:00
XuanYang-cn 31a8d08bdd
fix: Correct varchar primarykey size calculation (#37617)
See also: #37582

---------

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2024-11-14 14:16:38 +08:00
Ted Xu 31d0c84f67
fix: panic calclulating data size on writing binlogs (#37575)
See: #37568

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2024-11-11 14:08:27 +08:00
XuanYang-cn ce2fa3de01
fix: Set correct memory size (#37551)
See also: #37549

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2024-11-09 01:42:26 +08:00
Ted Xu bc9562feb1
enhance: avoid memory copy and serde in mix compaction (#37479)
See: #37234

---------

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2024-11-07 16:30:57 -08:00
congqixia 92028b7ff7
enhance: Use cancel label for ctx canceled storage op (#37468)
Previously failed label is used for canceled storage op, which may cause
wrong alarm when user cancel load operation or etc. This PR utilizes
cancel label when such case happens.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-11-06 19:38:24 +08:00
zhenshan.cao 63843dce33
fix: Fix conan gdal building problem (#37338)
issue:https://github.com/milvus-io/milvus/issues/27576

Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>
2024-10-31 21:04:16 +08:00
Hao Tan 67c4340565
feat: Geospatial Data Type and GIS Function Support for milvus server (#35990)
issue:https://github.com/milvus-io/milvus/issues/27576

# Main Goals
1. Create and describe collections with geospatial fields, enabling both
client and server to recognize and process geo fields.
2. Insert geospatial data as payload values in the insert binlog, and
print the values for verification.
3. Load segments containing geospatial data into memory.
4. Ensure query outputs can display geospatial data.
5. Support filtering on GIS functions for geospatial columns.

# Solution
1. **Add Type**: Modify the Milvus core by adding a Geospatial type in
both the C++ and Go code layers, defining the Geospatial data structure
and the corresponding interfaces.
2. **Dependency Libraries**: Introduce necessary geospatial data
processing libraries. In the C++ source code, use Conan package
management to include the GDAL library. In the Go source code, add the
go-geom library to the go.mod file.
3. **Protocol Interface**: Revise the Milvus protocol to provide
mechanisms for Geospatial message serialization and deserialization.
4. **Data Pipeline**: Facilitate interaction between the client and
proxy using the WKT format for geospatial data. The proxy will convert
all data into WKB format for downstream processing, providing column
data interfaces, segment encapsulation, segment loading, payload
writing, and cache block management.
5. **Query Operators**: Implement simple display and support for filter
queries. Initially, focus on filtering based on spatial relationships
for a single column of geospatial literal values, providing parsing and
execution for query expressions.
6. **Client Modification**: Enable the client to handle user input for
geospatial data and facilitate end-to-end testing.Check the modification
in pymilvus.

---------

Signed-off-by: tasty-gumi <1021989072@qq.com>
2024-10-31 20:58:20 +08:00
Ted Xu 262a994d6d
enhance: generally improve the performance of mix compactions (#37163)
See #37234

---------

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2024-10-29 18:12:20 +08:00
congqixia 3106384fc4
enhance: Return deltadata for `DeleteCodec.Deserialize` (#37214)
Related to #35303 #30404

This PR change return type of `DeleteCodec.Deserialize` from
`storage.DeleteData` to `DeltaData`, which
reduces the memory usage of interface header.

Also refine `storage.DeltaData` methods to make it easier to usage.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-10-29 12:04:24 +08:00
congqixia 7774b7275e
enhance: Replace PrimaryKey slice with PrimaryKeys saving memory (#37127)
Related to #35303

Slice of `storage.PrimaryKey` will have extra interface cost for each
element, which may cause notable memory usage when delta row count
number is large.

This PR replaces PrimaryKey slice with PrimaryKeys interface saving the
extra interface cost.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-10-28 10:29:30 +08:00
Buqian Zheng 383350c120
feat: added more checks for function creation check (#36766)
issue: https://github.com/milvus-io/milvus/issues/35853

* BM25 Function now takes no params, k1, b should be passed via index
params
* support BM25 full text search when metric type is not present in
search request
* add more strict validation with functions at collection creation time

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2024-10-13 17:43:22 +08:00
Buqian Zheng 82c5cf2fa2
feat: add bulk insert support for Functions (#36715)
issue: https://github.com/milvus-io/milvus/issues/35853 and
https://github.com/milvus-io/milvus/issues/35856

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2024-10-12 17:19:20 +08:00
aoiasd db34572c56
feat: support load and query with bm25 metric (#36071)
relate: https://github.com/milvus-io/milvus/issues/35853

---------

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2024-10-11 10:23:20 +08:00
Rijin-N a05a37a583
enhance: GCS native support (GCS implemented using Google Cloud Storage libraries) (#36214)
Native support for Google cloud storage using the Google Cloud Storage
libraries. Authentication is performed using GCS service account
credentials JSON.

Currently, Milvus supports Google Cloud Storage using S3-compatible APIs
via the AWS SDK. This approach has the following limitations:

1. Overhead: Translating requests between S3-compatible APIs and GCS can
introduce additional overhead.
2. Compatibility Limitations: Some features of the original S3 API may
not fully translate or work as expected with GCS.

To address these limitations, This enhancement is needed.

Related Issue: #36212
2024-09-30 13:23:32 +08:00
smellthemoon b60164b882
enhance: support null in bulk insert of binlog to help backup null (#36526)
https://github.com/milvus-io/milvus/issues/36341

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2024-09-26 14:35:14 +08:00
aoiasd 139787371e
feat: support embedding bm25 sparse vector and flush bm25 stats log (#36036)
relate: https://github.com/milvus-io/milvus/issues/35853

---------

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2024-09-19 10:57:12 +08:00
CharlesFeng 62f4a6a112
fix: binlog reader not released in time (#36078)
https://github.com/milvus-io/milvus/issues/36077

Signed-off-by: fengjun2016 <jornfeng@gmail.com>
2024-09-07 08:15:06 +08:00
smellthemoon 21b135c7c2
fix: not append valid data when transfer to insert record (#36027)
fix not append valid data when transfer to insert record and add a tiny
check when in groupBy field.
#35924

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2024-09-06 14:53:04 +08:00
smellthemoon a3f2f044d6
fix: not set nullable when stream writer write headers (#35799)
#35802

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2024-08-29 20:59:00 +08:00
Ted Xu 41646c8439
feat: integrate new deltalog format (#35522)
See #34123

---------

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2024-08-20 19:06:56 +08:00
smellthemoon 80a7c78f28
enhance: import supports null in parquet and json formats (#35558)
#31728

---------

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2024-08-20 16:50:55 +08:00
cai.zhang 1bbf7a3c0e
enhance: Optimize the use of locks and avoid double flush clustering buffer writer (#35486)
issue: #35436

---------

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-08-16 02:24:58 +08:00
congqixia c0ee25afd8
fix: Use k locations only for basic BF test location (#35380)
Related to #35379

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-08-09 07:52:22 +08:00
congqixia de8a266d8a
enhance: Enable linux code checker (#35084)
See also #34483

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-07-30 15:53:51 +08:00
wei liu c45f38aa61
enhance: Update protobuf-go to protobuf-go v2 (#34394)
issue: #34252

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-07-29 11:31:51 +08:00
congqixia 4ee6c69217
enhance: Add Segment Level in milvus segment info APIs (#34763)
See also #34746

This PR add segment level field in response of
`GetPersistentSegmentInfo` and `GetQuerySegmentInfo`

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-07-26 10:01:46 +08:00
Chun Han c46c401112
fix: refine handling type for segment pruner(#34923) (#34925)
related: #34923

Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
2024-07-25 13:57:45 +08:00
smellthemoon 5616b7e8d2
enhance: support null in c data_datacodec and load null value (#32183)
1. support read and write null in segcore
    will store valid_data(use uint8_t type to save memory) in fieldData.
2. support load null
binlog reader read and write data into column(sealed segment),
insertRecord(growing segment). In sealed segment, store valid_data
directly. In growing segment, considering prior implementation and easy
code reading, it covert uint8_t to fbvector<bool>, which may optimize in
future.
3.  retrieve valid_data.
    parse valid_data in search/query.
#31728

---------

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2024-07-23 16:07:51 +08:00
shaoting-huang 88b373b024
enhance: binlog primary key turn off dict encoding (#34358)
issue: #34357 

Go Parquet uses dictionary encoding by default, and it will fall back to
plain encoding if the dictionary size exceeds the dictionary size page
limit. Users can specify custom fallback encoding by using
`parquet.WithEncoding(ENCODING_METHOD)` in writer properties. However,
Go Parquet [fallbacks to plain
encoding](e65c1e295d/go/parquet/file/column_writer_types.gen.go.tmpl (L238))
rather than custom encoding method users provide. Therefore, this patch
only turns off dictionary encoding for the primary key.

With a 5 million auto ID primary key benchmark, the parquet file size
improves from 13.93 MB to 8.36 MB when dictionary encoding is turned
off, reducing primary key storage space by 40%.

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2024-07-17 17:47:44 +08:00
congqixia eb4bfa3281
fix: Revert reuse deserialize result to fix data overwritten (#34683)
See also #34637

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-07-15 22:31:38 +08:00
congqixia 531092c031
enhance: Add lint rule to forbid gogo protobuf (#34594)
github.com/gogo/protobuf is deprecated and could be error prune after
upgrade protobuf message to v2.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-07-12 10:19:35 +08:00
SimFG 5016038781
enhance: release the record in delete codec and add some log for compaction (#34454)
/kind improvement

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-07-09 15:40:17 +08:00
Ted Xu eae4dfca7b
fix: reuse deserialize result to help improve memory management (#34507)
Fixed #33268
The original reuse is broken by #33359

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2024-07-09 14:12:10 +08:00
congqixia 3333160b8d
enhance: Fix lint issues from recent PRs (#34482)
See also #34483
Some lint issues are introduced due to lack of static check run. This PR
fixes these problems.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-07-09 10:06:24 +08:00
shaoting-huang f4dd7c7efb
enhance: add delta log stream new format reader and writer (#34116)
issue: #34123

Benchmark case: The benchmark run the go benchmark function
`BenchmarkDeltalogFormat` which is put in the Files changed. It tests
the performance of serializing and deserializing from two different data
formats under a 10 million delete log dataset.

Metrics: The benchmarks measure the average time taken per operation
(ns/op), memory allocated per operation (MB/op), and the number of
memory allocations per operation (allocs/op).
| Test Name | Avg Time (ns/op) | Time Comparison | Memory Allocation
(MB/op) | Memory Comparison | Allocation Count (allocs/op) | Allocation
Comparison |

|---------------------------------|------------------|-----------------|---------------------------|-------------------|------------------------------|------------------------|
| one_string_format_reader | 2,781,990,000 | Baseline | 2,422 | Baseline
| 20,336,539 | Baseline |
| pk_ts_separate_format_reader | 480,682,639 | -82.72% | 1,765 | -27.14%
| 20,396,958 | +0.30% |
| one_string_format_writer | 5,483,436,041 | Baseline | 13,900 |
Baseline | 70,057,473 | Baseline |
| pk_and_ts_separate_format_writer| 798,591,584 | -85.43% | 2,178 |
-84.34% | 30,270,488 | -56.78% |

Both read and write operations show significant improvements in both
speed and memory allocation.

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2024-07-06 09:08:09 +08:00
Chun Han fcafdb6d5f
enhance: reconstruct scalar part's code for segment-pruner(#30376) (#34346)
related: #30376
1. support more complex expr
2. add more ut test for unrelated fields

Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
2024-07-04 16:36:09 +08:00
congqixia 0fd0fcfe1d
enhance: Fix lint issues & sdk testcase (#34399)
Some lint issue is not detect due to recent static check pipeline issue.
This PR fixes these problem and Go milvusclient testcases.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-07-03 19:42:10 +08:00
smellthemoon ef3ced8138
fix: descriptor event in previous version not has nullable to parse error (#34235)
#34176

---------

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2024-07-01 16:38:06 +08:00
congqixia e04f1f9748
enhance: Add unittest for `storage.DeleteLog` (#34190)
See also #33787
Backport unit test part in #34188

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-06-26 17:14:04 +08:00
congqixia fd922d921a
enhance: Add nilness linter and fix some small issues (#34049)
Add `nilness` for govet linter and fixed some detected issues

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-06-24 14:52:03 +08:00
Chun Han ca7ef26e4b
fix: sync part stats task cannot be finished(#30376) (#34027)
related: #30376
also: refine log output for query_coord task by rephrasing action string

Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
2024-06-24 10:16:02 +08:00
Ted Xu 78885a44c4
fix: turn on compression on stream writers (#34067)
See #31679

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2024-06-24 10:08:02 +08:00