Commit Graph

497 Commits (984a605d47c5624fdd1509f487306ecb0f89e810)

Author SHA1 Message Date
smellthemoon 2a1356985d
enhance: support null in go payload (#32296)
#31728

---------

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2024-06-19 17:08:00 +08:00
Ted Xu 6d5747cb3e
feat: adding deltalog stream reader and writer (#33844)
See #31679

---------

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2024-06-19 14:42:01 +08:00
shaoting-huang 8cdc0e6233
fix: fix data codec writer close (#33818)
issue:#33813

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2024-06-18 13:59:57 +08:00
congqixia f993b2913b
enhance: Reserve space of payload writer when serialize data (#33817)
See also #33561 #33562

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-06-17 12:06:04 +08:00
XuanYang-cn f67b6dc2b0
fix: DeleteData merge wrong data casuing data loss (#33820)
See also: #33819

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2024-06-14 17:57:56 +08:00
shaoting-huang 0ecd694305
enhance: legacy code clean up (#33838)
issue: #33839

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2024-06-14 14:25:56 +08:00
wei liu ab93d9c23d
enhance: Use BatchPkExist to reduce bloom filter func call cost (#33611)
issue:#33610

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-06-13 17:57:56 +08:00
congqixia 512ea6be5f
enhance: Avoid merging insert data when buffering insert msgs (#33562)
See also #33561

This PR:
- Use zero copy when buffering insert messages
- Make `storage.InsertCodec` support serialize multiple insert data
chunk into same batch binlog files

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-06-13 11:15:56 +08:00
congqixia b39dfc25dc
enhance: Use fastjson lib for unmarshal delete log (#33787)
```
goos: linux
goarch: amd64
GOMAXPROC=1
cpu: Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
BenchmarkJsonSerdeStd             343872              3568 ns/op            1335 B/op         25 allocs/op
BenchmarkJsonSerdeFastjson       5124177               234.9 ns/op            16 B/op          1 allocs/op
```

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-06-12 20:41:57 +08:00
wayblink a1232fafda
feat: Major compaction (#33620)
#30633

Signed-off-by: wayblink <anyang.wang@zilliz.com>
Co-authored-by: MrPresent-Han <chun.han@zilliz.com>
2024-06-10 21:34:08 +08:00
wei liu c6a1c49e02
enhance: Use Blocked Bloom Filter instead of basic bloom fitler impl. (#33405)
issue: #32995
To speed up the construction and querying of Bloom filters, we chose a
blocked Bloom filter instead of a basic Bloom filter implementation.

WARN: This PR is compatible with old version bf impl, but if fall back
to old milvus version, it may causes bloom filter deserialize failed.

In single Bloom filter test cases with a capacity of 1,000,000 and a
false positive rate (FPR) of 0.001, the blocked Bloom filter is 5 times
faster than the basic Bloom filter in both querying and construction, at
the cost of a 30% increase in memory usage.

- Block BF construct time	{"time": "54.128131ms"}
- Block BF size	                {"size": 3021578}
- Block BF Test cost	        {"time": "55.407352ms"}
- Basic BF construct time	{"time": "210.262183ms"}
- Basic BF size	                {"size": 2396308}
- Basic BF Test cost	        {"time": "192.596229ms"}

In multi Bloom filter test cases with a capacity of 100,000, an FPR of
0.001, and 100 Bloom filters, we reuse the primary key locations for all
Bloom filters to avoid repeated hash computations. As a result, the
blocked Bloom filter is also 5 times faster than the basic Bloom filter
in querying.

- Block BF TestLocation cost    {"time": "529.97183ms"}
- Basic BF TestLocation cost	{"time": "3.197430181s"}

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-05-31 17:49:45 +08:00
wei liu 322a4c5b8c
enhance: Remove StringPrimaryKey to reduce unnecessary copy and function call cost (#33486)
issue: #33497

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-05-31 15:41:45 +08:00
congqixia 73c9b80a7d
enhance: Store locations for largest K in `LocationCache` (#33429)
See also #32642

`LocationCache` used map to store different locations for different K
which may cause lots of CPU time when get locations many times.

This PR change the implementation of LocationCache to store only the
location for the largest K used to totally remove the map access
operation.

See pprof from test of @XuanYang-cn 

![image](https://github.com/milvus-io/milvus/assets/84113973/ad17cff8-62ad-4d78-9bb0-f6df0512f4ea)

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-05-29 10:05:42 +08:00
Ted Xu 066c8ea175
feat: stream reader/writer to support nulls (#33080)
See: #31728

---------

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2024-05-27 16:27:42 +08:00
congqixia 970bf18a49
fix: Allocate new slice for each batch in streaming reader (#33359)
Related to #33268

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-05-24 18:07:41 +08:00
Ted Xu a8bd9bea39
fix: adding blob memory size in binlog serde (#33324)
See: #33280

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2024-05-24 10:33:40 +08:00
Cai Yudong 4004e4c545
enhance: Optimize bulk insert unittest (#33224)
Issue: #22837

Signed-off-by: Cai Yudong <yudong.cai@zilliz.com>
2024-05-24 10:23:41 +08:00
Ted Xu a9c7ce72b8
enhance: enable stream writer in compactions (#32612)
See #31679

---------

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2024-05-17 15:05:37 +08:00
cai.zhang 6ea7633bd5
enhance: Add memory size for binlog (#33025)
issue: #33005
1. add `MemorySize` field for insert binlog.
2. `LogSize` means the file size in the storage object.
3. `MemorySize` means the size of the data in the memory.

---------

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
Signed-off-by: cai.zhang <cai.zhang@zilliz.com>
2024-05-15 12:59:34 +08:00
Cai Yudong 4fc7915c70
enhance: unify data generation test APIs (#32955)
Issue: #22837

Signed-off-by: Cai Yudong <yudong.cai@zilliz.com>
2024-05-14 14:33:33 +08:00
congqixia 0e5765b116
enhance: Utilize `TestLocations` ability to accelerate write & compaction (#32948)
See also #32642

This PR reuses hash locations for bloom filter prediction utilizing
`storage.Location`, like enhancement #32642.

Also adds a utility struct in storage: `LocationCache` to storage
locations for variable K (numbers of hash functions)

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-05-13 10:15:32 +08:00
wei liu 5038036ece
enhance: Reuse hash locations during access bloom fitler (#32642)
issue: #32530 

when try to match segment bloom filter with pk, we can reuse the hash
locations. This PR maintain the max hash Func, and compute hash location
once for all segment, reuse hash location can speed up bf access

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-05-07 06:13:47 -07:00
Cai Yudong bcdbd1966e
feat: Support sparse float vector bulk insert for binlog/json/parquet (#32649)
Issue: #22837

Signed-off-by: Cai Yudong <yudong.cai@zilliz.com>
2024-05-07 18:43:30 +08:00
aoiasd 31dca3249e
enhance: add type info for payload writer error message and add log when querynode find new collection (#32522)
relate: https://github.com/milvus-io/milvus/issues/32668

---------

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2024-05-07 14:45:29 +08:00
Aldrin cb8dbc3c83
fix: Removed minio bucket after use in test (#32624)
issue: https://github.com/milvus-io/milvus/issues/32616

- Forcefully deleted the non empty minio bucket with dummy data.

Signed-off-by: Aldrin <imagesai32@gmail.com>
2024-04-28 13:51:26 +08:00
chyezh 2586c2f1b3
enhance: use WalkWithPrefix api for oss, enable piplined file gc (#31740)
issue: #19095,#29655,#31718

- Change `ListWithPrefix` to `WalkWithPrefix` of OOS into a pipeline
mode.

- File garbage collection is performed in other goroutine.

- Segment Index Recycle clean index file too.

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2024-04-25 20:41:27 +08:00
Buqian Zheng 8a1017a152
enhance: add helpers to parse sparse float vector in JSON (#32543)
issue: #29419

added helper functions to parse JSON representation of sparse float
vectors, will be used by both the restful server and the import utils.

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2024-04-25 14:47:24 +08:00
Cai Yudong 5fc439c600
feat: Bulk insert support fp16/bf16 (#32157)
Issue: #22837

Signed-off-by: Cai Yudong <yudong.cai@zilliz.com>
2024-04-22 10:05:22 +08:00
Ted Xu dc5ea6f17c
feat: adding binlog streaming writer (#31537)
See #31679

---------

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2024-04-11 10:33:20 +08:00
aoiasd 5b693c466d
fix: delegator filter out all partition's delete msg when loading segment (#31585)
May cause deleted data queryable a period of time.
relate: https://github.com/milvus-io/milvus/issues/31484
https://github.com/milvus-io/milvus/issues/31548

---------

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2024-04-09 15:21:24 +08:00
Cai Yudong 00438f408f
enhance: Unify data type check APIs for go (#31887)
Issue: #22837

Signed-off-by: Cai Yudong <yudong.cai@zilliz.com>
2024-04-07 14:27:22 +08:00
cqy123456 976928ecd1
fix: fix fp16/bf16 some code missing and add more fp16/bf16 test (#31612)
issue: #31534

Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>
2024-03-28 14:11:10 +08:00
SimFG b1a1cca10b
feat: add more operation detail info for better allocation (#30438)
issue: #30436

---------

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-03-28 06:33:11 +08:00
groot 5be395354c
fix: minio ssl compatible issue (#31607)
issue: https://github.com/milvus-io/milvus/issues/30709

Signed-off-by: yhmo <yihua.mo@zilliz.com>
2024-03-27 14:41:20 +08:00
yihao.dai 31cf849f68
enhance: Support retriving file size from importutilv2.Reader (#31533)
To reduce the overhead caused by listing the S3 objects, add an
interface to importutil.Reader to retrieve file sizes.

issue: https://github.com/milvus-io/milvus/issues/31532,
https://github.com/milvus-io/milvus/issues/28521

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-03-25 20:29:07 +08:00
Chun Han c3264ca3e3
feat: support segment pruner (#31003)
related: #30376
2024-03-22 13:57:06 +08:00
groot c81909bfab
enhance: Support MinIO TLS connection (#31311)
issue: https://github.com/milvus-io/milvus/issues/30709
pr: #31292

Signed-off-by: yhmo <yihua.mo@zilliz.com>
Co-authored-by: Chen Rao <chenrao317328@163.com>
2024-03-21 11:15:20 +08:00
Buqian Zheng d7dbc3c9d8
fix: [sparse float vector] support the new streaming deserialize reader (#31325)
issue: https://github.com/milvus-io/milvus/issues/31324

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2024-03-17 13:59:04 +08:00
Buqian Zheng 3c80083f51
feat: [Sparse Float Vector] add sparse vector support to milvus components (#30630)
add sparse float vector support to different milvus components,
including proxy, data node to receive and write sparse float vectors to
binlog, query node to handle search requests, index node to build index
for sparse float column, etc.

https://github.com/milvus-io/milvus/issues/29419

---------

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2024-03-13 14:32:54 -07:00
Ted Xu 987d9023a5
enhance: Enable binlog deserialize reader in datanode compaction (#31036)
See #30863

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2024-03-08 18:25:02 +08:00
wayblink 875036b81b
feat: Define FieldValue, FieldStats and PartitionStats (#30286)
Define FieldValue, FieldStats, PartitionStats
FieldValue is largely copied from PrimaryKey
FieldStats is largely copied from PrimaryKeyStats
PartitionStats is map[segmentid][]FieldStats
Each partition can have a PartitionStats file

/kind feature
related: #30287
related: #30633

---------

Signed-off-by: wayblink <anyang.wang@zilliz.com>
2024-03-06 20:42:37 -08:00
Ted Xu 71adafa933
enhance: adding a streaming deserialize reader for binlogs (#30860)
See #30863

---------

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2024-03-04 19:31:09 +08:00
yihao.dai a434d33e75
feat: Add import scheduler and manager (#29367)
This PR introduces novel managerial roles for importv2:
1. ImportMeta: To manage all the import tasks;
2. ImportScheduler: To process tasks and modify their states;
3. ImportChecker: To ascertain the completion of all tasks and instigate
relevant operations.

issue: https://github.com/milvus-io/milvus/issues/28521

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-03-01 18:31:02 +08:00
SimFG 229fc4f755
enhance: retry to read when the s3 get the unexpect eof error (#30861)
/kind improvement
issue: #30877

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-02-28 16:28:53 +08:00
Ted Xu 12acaf3e4f
enhance: Adding a generic stream payload reader (#30682)
See: #30404

---------

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2024-02-21 17:10:52 +08:00
wayblink f976385421
enhance: replace binlogIO with io.BinlogIO in datanode (#29725)
#30633

Signed-off-by: wayblink <anyang.wang@zilliz.com>
2024-02-20 14:38:51 +08:00
cai.zhang 77ba3ce3f3
enhance: Use virtual host for tencent cloud (#30650)
Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-02-20 14:08:51 +08:00
aoiasd a0537156c0
enhance: delete codc deserialize data by stream batch (#30407)
relate: https://github.com/milvus-io/milvus/issues/30404

---------

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2024-02-06 17:04:25 +08:00
XuanYang-cn d744962aa1
fix: Correct Size calculation of DeleteData (#30397)
This PR would correct the actual deltalog size

See also: #30191

---------

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2024-02-02 10:47:04 +08:00
congqixia e677af19b0
enhance: Add PrimaryKeys interface to reduce memory usage (#30405)
See also #30404

`PrimaryKey` is used to hold pk values for both int64 & varchar data
type. Since it is an interface it may occupies more memory than pure
slices when holding a group of pks.

This PR add `PrimaryKeys` interface when some other module need to hold
lots of PrimaryKeys.
By using this interface, it could reduce the memory of pk slice to half
when using Int64 Pk data type and reduce interface cost for each row of
varchar as well.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-02-01 09:57:11 +08:00
yihao.dai c5918290e6
feat: Add import executor and manager for datanode (#29438)
This PR introduces novel importv2 roles for datanode:
1. Executor: To execute tasks, a import task will be divided into the
following steps: read data -> hash data -> sync data;
2. Manager: To manage all the tasks;

issue: https://github.com/milvus-io/milvus/issues/28521

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-01-31 20:45:04 +08:00
cai.zhang 6cf2f09b60
feat: Support tencent cloud object storage for milvus (#30163)
issue: #30162

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-01-23 11:28:56 +08:00
cai.zhang 6bfa826320
fix: Fix bug for read data from azure (#30007)
issue: #30005

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-01-22 15:44:54 +08:00
Xu Tong e429965f32
Add float16 approve for multi-type part (#28427)
issue:https://github.com/milvus-io/milvus/issues/22837

Add bfloat16 vector, add the index part of float16 vector.

Signed-off-by: Writer-X <1256866856@qq.com>
2024-01-11 15:48:51 +08:00
congqixia f18a7191f2
enhance: make `ColumnBasedInsertMsgToInsertData` check field missing (#29758)
fix: #29757

In previous code, `ColumnBasedInsertMsgToInsertData` adds empty field if
the insertMsg parameter does not have the column schema defined. This
may lead to unexpected behavior of caller functions.

This PR:
- Add column missing check
- Add column length check
- Generate BlobInfo for ColumnBasedInsertMsgToInsertData result

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-01-09 11:50:48 +08:00
yihao.dai 3d07b6682c
feat: Add import reader for numpy (#29253)
This PR implements a new numpy reader for import.

issue: https://github.com/milvus-io/milvus/issues/28521

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-01-08 19:42:49 +08:00
yah01 97e4ec5a69
enhance: use random root path for minio unit tests (#29753)
this avoids the conflicts while running multiple unit tests

Signed-off-by: yah01 <yah2er0ne@outlook.com>
2024-01-08 15:58:48 +08:00
yihao.dai 23183ffb0f
feat: Add import reader for json (#29252)
This PR implements a new json reader for import.

issue: https://github.com/milvus-io/milvus/issues/28521

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-01-05 18:12:48 +08:00
smellthemoon 1c1f2a1371
enhance:change some logs (#29579)
related #29588

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2024-01-05 16:12:48 +08:00
yihao.dai 3561586edf
feat: Add import reader for binlog (#28910)
This PR defines the new import reader interfaces and implement a binlog
reader for import.

issue: https://github.com/milvus-io/milvus/issues/28521

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-01-05 11:48:47 +08:00
cai.zhang dc8b5c1130
enhance: Read azure file without ReadAll (#29602)
issue: #29292

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-01-04 20:50:46 +08:00
Jiquan Long 3f46c6d459
feat: support inverted index (#28783)
issue: https://github.com/milvus-io/milvus/issues/27704

Add inverted index for some data types in Milvus. This index type can
save a lot of memory compared to loading all data into RAM and speed up
the term query and range query.

Supported: `INT8`, `INT16`, `INT32`, `INT64`, `FLOAT`, `DOUBLE`, `BOOL`
and `VARCHAR`.

Not supported: `ARRAY` and `JSON`.

Note:
- The inverted index for `VARCHAR` is not designed to serve full-text
search now. We will treat every row as a whole keyword instead of
tokenizing it into multiple terms.
- The inverted index don't support retrieval well, so if you create
inverted index for field, those operations which depend on the raw data
will fallback to use chunk storage, which will bring some performance
loss. For example, comparisons between two columns and retrieval of
output fields.

The inverted index is very easy to be used.

Taking below collection as an example:

```python
fields = [
		FieldSchema(name="pk", dtype=DataType.VARCHAR, is_primary=True, auto_id=False, max_length=100),
		FieldSchema(name="int8", dtype=DataType.INT8),
		FieldSchema(name="int16", dtype=DataType.INT16),
		FieldSchema(name="int32", dtype=DataType.INT32),
		FieldSchema(name="int64", dtype=DataType.INT64),
		FieldSchema(name="float", dtype=DataType.FLOAT),
		FieldSchema(name="double", dtype=DataType.DOUBLE),
		FieldSchema(name="bool", dtype=DataType.BOOL),
		FieldSchema(name="varchar", dtype=DataType.VARCHAR, max_length=1000),
		FieldSchema(name="random", dtype=DataType.DOUBLE),
		FieldSchema(name="embeddings", dtype=DataType.FLOAT_VECTOR, dim=dim),
]
schema = CollectionSchema(fields)
collection = Collection("demo", schema)
```

Then we can simply create inverted index for field via:

```python
index_type = "INVERTED"
collection.create_index("int8", {"index_type": index_type})
collection.create_index("int16", {"index_type": index_type})
collection.create_index("int32", {"index_type": index_type})
collection.create_index("int64", {"index_type": index_type})
collection.create_index("float", {"index_type": index_type})
collection.create_index("double", {"index_type": index_type})
collection.create_index("bool", {"index_type": index_type})
collection.create_index("varchar", {"index_type": index_type})
```

Then, term query and range query on the field can be speed up
automatically by the inverted index:

```python
result = collection.query(expr='int64 in [1, 2, 3]', output_fields=["pk"])
result = collection.query(expr='int64 < 5', output_fields=["pk"])
result = collection.query(expr='int64 > 2997', output_fields=["pk"])
result = collection.query(expr='1 < int64 < 5', output_fields=["pk"])
```

---------

Signed-off-by: longjiquan <jiquan.long@zilliz.com>
2023-12-31 19:50:47 +08:00
MrPresent-Han ed644983e2
enhance: add param for bloomfilter(#29388) (#29490)
related: #29388

Signed-off-by: MrPresent-Han <chun.han@zilliz.com>
2023-12-28 18:10:46 +08:00
congqixia 6a86ac0ac6
fix: Align minio object storage ut to new minio server behavior (#29014)
See also #29013

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-12-06 15:42:43 +08:00
yihao.dai b4353ca4ce
enhance: Remove vector chunk manager (#28569)
We have implemented the chunkcache (in cpp) to retrieve vectors, hence
rendering the vectorchunkcache (in golang) obsolete.

issue: https://github.com/milvus-io/milvus/issues/28568

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2023-11-30 18:00:33 +08:00
XuanYang-cn aae7e62729
feat: Add levelzero compaction in DN (#28470)
See also: #27606

---------

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2023-11-30 14:30:28 +08:00
cai.zhang f5f4f0872e
enhance: Support importing data with parquet file (#28608)
issue: #28272

Numpy does not support array type import. 
Array type data is imported through parquet.

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2023-11-29 20:52:27 +08:00
yihao.dai 4bd426dbe7
fix: Fix minio latency monitoring for get operation (#28510)
see also: https://github.com/milvus-io/milvus/issues/28509

Currently Minio latency monitoring for get operation only collects the
duration of getting object (which just returns an io.Reader and does not
really read from minio), this pr will correct this behavior.

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2023-11-28 10:00:27 +08:00
congqixia 8a9ab69369
fix: Skip statslog generation flushing empty L0 segment (#28733)
See also #27675

When L0 segment contains only delta data, merged statslog shall be
skiped when performing sync task

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-11-25 15:10:25 +08:00
yah01 cc952e0486
enhance: optimize forwarding level0 deletions by respecting partition (#28456)
- Cache the level 0 deletions after loading level0 segments
- Divide the level 0 deletions by partition
related: #27349

---------

Signed-off-by: yah01 <yah2er0ne@outlook.com>
2023-11-21 18:24:22 +08:00
congqixia 2b3fa8f67b
fix: Add length check for `storage.NewPrimaryKeyStats` (#28576)
See also #28575
Add zero-length check for `storage.NewPrimaryKeyStats`. This function
shall return error when non-positive rowNum passed.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-11-21 10:28:21 +08:00
Bingyi Sun 59355cb3dc
Update arrow version to v12 (#28425)
issue: https://github.com/milvus-io/milvus/issues/28423

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2023-11-15 10:36:19 +08:00
congqixia e576271a24
Fix buffer FieldData has no `ElementType` and array logsize always zero (#28295)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-11-09 14:16:20 +08:00
yah01 ece592a42f
Deliver L0 segments delete records (#27722)
Signed-off-by: yah01 <yah2er0ne@outlook.com>
2023-11-07 01:44:18 +08:00
PowderLi 0252871d30
fix azure ListObjects (#27931)
Signed-off-by: PowderLi <min.li@zilliz.com>
2023-11-01 11:34:14 +08:00
Enwei Jiao 8ae9c947ae
Use OpenDAL to access object store (#25642)
Signed-off-by: Enwei Jiao <enwei.jiao@zilliz.com>
2023-11-01 09:00:14 +08:00
yah01 9658367a3c
Refine chunk manager errors (#27590)
Signed-off-by: yah01 <yah2er0ne@outlook.com>
2023-10-31 12:18:15 +08:00
zhenshan.cao 6c3f29d003
Identify service providers based on addresses (#27907)
Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>
2023-10-25 17:28:10 +08:00
zhagnlu 6060dd7ea8
Add chunk manager request timeout (#27692)
Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2023-10-23 20:08:08 +08:00
XuanYang-cn 7358c3527b
Add iterators (#27643)
See also: #27606

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2023-10-18 19:34:08 +08:00
congqixia 2f201c25e2
Remove deprecated io/ioutil usage (#27747)
`io/ioutil` package is deprecated, use `io`,`os` package replacement
also added golangci-lint rule to block future reference

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Co-authored-by: guoguangwu <guoguangwu@magic-shield.com>
2023-10-17 20:32:09 +08:00
XuanYang-cn 2f16339aac
Enhance InsertData and FieldData (#27436)
1. Add NewInsertData
2. Add GetRowNum(), GetMemorySize(), and, Append() for InsertData
3. Add AppendRow() for FieldData for compaction

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2023-10-17 17:36:11 +08:00
congqixia 670cb386e7
Add back `gocritic` linter and fix related issues (#27289)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-09-22 10:05:26 +08:00
SimFG 26f06dd732
Format the code (#27275)
Signed-off-by: SimFG <bang.fu@zilliz.com>
2023-09-21 09:45:27 +08:00
congqixia cc9974979f
Add staticcheck linter and fix existing problems (#27174)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-09-19 10:05:22 +08:00
PowderLi 4feb3fa7c6
support azure (#26398)
Signed-off-by: PowderLi <min.li@zilliz.com>
2023-09-19 10:01:23 +08:00
Xu Tong 9166011c4a
Add float16 vector (#25852)
Signed-off-by: Writer-X <1256866856@qq.com>
2023-09-08 10:03:16 +08:00
bjzhjing 548c82eca5
Refactor storage.MergeInsertData() to optimize the merging process (#26839)
Benchmark Milvus with https://github.com/qdrant/vector-db-benchmark and
specify the datasets as 'deep-image-96-angular'. Meanwhile, do perf
profiling during 'upload + index' stage of vector-db-benchmark and see
the following hot spots.

39.59%--github.com/milvus-io/milvus/internal/storage.MergeInsertData
        |
        |--21.43%--github.com/milvus-io/milvus/internal/storage.MergeFieldData
        |          |
        |          |--17.22%--runtime.memmove
        |                     |
        |                     |--1.53%--asm_exc_page_fault
        |                     ......
        |
        |--18.16%--runtime.memmove
                   |
                   |--1.66%--asm_exc_page_fault
                   ......

The hot code path is in storage.MergeInsertData() which updates
buffer.buffer by creating a new 'InsertData' instance and merging both
the old buffer.buffer and addedBuffer into it. When it calls golang
runtime.memmove to move buffer.buffer which is with big size (>1M), the
hot spots appear.

To avoid the above overhead, update storage.MergeInsertData() by
appending addedBuffer to buffer.buffer, instead of moving buffer.buffer
and addedBuffer to a new 'InsertData'. This change removes the hot spots
'runtime.memmove' from perf profiling output. Additionally, the 'upload
+ index' time, which is one performance metric of vector-db-benchmark,
is reduced around 60% with this change.

Signed-off-by: Cathy Zhang <cathy.zhang@intel.com>
2023-09-05 21:41:48 +08:00
Enwei Jiao fb0705df1b
Decouple basetable and componentparam (#26725)
Signed-off-by: Enwei Jiao <enwei.jiao@zilliz.com>
2023-09-05 10:31:48 +08:00
zhagnlu 411f9ac823
Upgrade minio-go and add region and virtual host config for segcore chunk manager (#26194)
Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2023-08-11 10:37:36 +08:00
congqixia 2770ac4df5
Fix nilness linter errors (#26218)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-08-09 11:31:15 +08:00
zhenshan.cao 2c6c7749e2
Enable print_log support json data type (#26118)
Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>
2023-08-04 11:27:05 +08:00
xige-16 f33451b3d8
Write the cache file to the cacheStorage.rootpath dir (#25715)
Signed-off-by: xige-16 <xi.ge@zilliz.com>
2023-07-28 10:59:02 +08:00
xige-16 94d6cbb238
Fix querynode panic when binlog ts wrong (#25635)
Signed-off-by: xige-16 <xi.ge@zilliz.com>
2023-07-18 10:41:20 +08:00
xige-16 33c2012675
Add more metrics (#25081)
Signed-off-by: xige-16 <xi.ge@zilliz.com>
2023-06-26 17:52:44 +08:00
Xiaofan e8911ebda7
Add retry time when lazy load BF (#25096)
Signed-off-by: xiaofan-luan <xiaofan.luan@zilliz.com>
2023-06-25 11:32:43 +08:00
PowderLi 3f4356df10
fix the spelling of `field` (#25008)
Signed-off-by: PowderLi <min.li@zilliz.com>
2023-06-21 14:00:42 +08:00
yah01 8bc5282eb3
Fix datanode always retries to load stats even file corrupted (#25012)
Signed-off-by: yah01 <yang.cen@zilliz.com>
2023-06-20 16:40:42 +08:00
Enwei Jiao 1ef8f0fceb
Remove cgo PayloadWriter (#24892)
Signed-off-by: Enwei Jiao <enwei.jiao@zilliz.com>
2023-06-14 18:04:38 +08:00
yah01 a9dccec03a
Add go payload writer (#24656) (#24762)
Signed-off-by: yah01 <yang.cen@zilliz.com>
2023-06-09 13:52:39 +08:00