Commit Graph

132 Commits (cc8f7aa11013041f85cd04a6a4c52657ed07443c)

Author SHA1 Message Date
SimFG 5016038781
enhance: release the record in delete codec and add some log for compaction (#34454)
/kind improvement

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-07-09 15:40:17 +08:00
congqixia 2f691f1e67
enhance: Unify DeleteLog parsing code (#34009)
See also #33787

Delete log parsing is scattered across many places, which makes it hard
to maintain.

This PR abstracts the common parsing logic into the `DeleteLog.Parse` method to
unify the implementation and make it easier to replace the JSON parsing library.
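
The idea of routing every caller through one parse entry point can be sketched roughly as follows. This is an illustrative Python sketch, not the Go code from the PR; the `DeleteLog` fields (`pk`, `ts`), the JSON layout, and the helper `load_delete_logs` are assumptions made for the example.

```python
import json
from dataclasses import dataclass


@dataclass
class DeleteLog:
    """Hypothetical delete record: a primary key and a timestamp."""
    pk: int
    ts: int

    @classmethod
    def parse(cls, raw: str) -> "DeleteLog":
        # Single entry point: swapping the JSON library only touches this method.
        obj = json.loads(raw)
        return cls(pk=int(obj["pk"]), ts=int(obj["ts"]))


def load_delete_logs(lines):
    # Callers never decode JSON themselves; they always go through DeleteLog.parse.
    return [DeleteLog.parse(line) for line in lines]


logs = load_delete_logs(['{"pk": 1, "ts": 100}', '{"pk": 2, "ts": 101}'])
```

With this shape, replacing `json.loads` with a faster decoder is a one-line change that no caller needs to know about.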

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-06-21 16:54:01 +08:00
shaoting-huang 5f02e52561
enhance: Refactor data codec deserialize (#33923)
#33922

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2024-06-20 11:17:59 +08:00
smellthemoon 2a1356985d
enhance: support null in go payload (#32296)
#31728

---------

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2024-06-19 17:08:00 +08:00
shaoting-huang 8cdc0e6233
fix: fix data codec writer close (#33818)
issue:#33813

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2024-06-18 13:59:57 +08:00
congqixia f993b2913b
enhance: Reserve space of payload writer when serialize data (#33817)
See also #33561 #33562

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-06-17 12:06:04 +08:00
XuanYang-cn f67b6dc2b0
fix: DeleteData merge wrong data causing data loss (#33820)
See also: #33819

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2024-06-14 17:57:56 +08:00
shaoting-huang 0ecd694305
enhance: legacy code clean up (#33838)
issue: #33839

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2024-06-14 14:25:56 +08:00
congqixia 512ea6be5f
enhance: Avoid merging insert data when buffering insert msgs (#33562)
See also #33561

This PR:
- Uses zero copy when buffering insert messages
- Makes `storage.InsertCodec` support serializing multiple insert data
chunks into the same batch of binlog files
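
A rough sketch of the two points above, in Python rather than the Go used by the PR. The `Chunk` type, `InsertBuffer`, and `serialize` are hypothetical names for illustration only; the per-field dictionary merely stands in for one batch of binlog files.

```python
from typing import Dict, Iterable, List

# Hypothetical stand-in for an insert data chunk: field name -> column values.
Chunk = Dict[str, List[int]]


class InsertBuffer:
    """Keeps references to incoming chunks instead of copying rows into one big chunk."""

    def __init__(self) -> None:
        self.chunks: List[Chunk] = []

    def buffer(self, chunk: Chunk) -> None:
        # No merge and no copy: the buffer only stores a reference.
        self.chunks.append(chunk)


def serialize(chunks: Iterable[Chunk]) -> Dict[str, List[int]]:
    """Write every chunk into one per-field output, i.e. a single batch."""
    out: Dict[str, List[int]] = {}
    for chunk in chunks:
        for field, values in chunk.items():
            out.setdefault(field, []).extend(values)
    return out


buf = InsertBuffer()
first = {"pk": [1, 2], "age": [30, 41]}
second = {"pk": [3], "age": [25]}
buf.buffer(first)
buf.buffer(second)
batch = serialize(buf.chunks)
```

The copy is deferred to serialization time, so buffering stays cheap no matter how many messages arrive.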

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-06-13 11:15:56 +08:00
congqixia b39dfc25dc
enhance: Use fastjson lib for unmarshal delete log (#33787)
```
goos: linux
goarch: amd64
GOMAXPROC=1
cpu: Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
BenchmarkJsonSerdeStd             343872              3568 ns/op            1335 B/op         25 allocs/op
BenchmarkJsonSerdeFastjson       5124177               234.9 ns/op            16 B/op          1 allocs/op
```

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-06-12 20:41:57 +08:00
cai.zhang 6ea7633bd5
enhance: Add memory size for binlog (#33025)
issue: #33005
1. add `MemorySize` field for insert binlog.
2. `LogSize` means the file size in the storage object.
3. `MemorySize` means the size of the data in the memory.
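
The distinction between the two sizes can be illustrated with a small Python sketch. It assumes a hypothetical column of 1000 int64 values and uses `zlib` purely as a stand-in for whatever encoding the storage layer applies; the variable names are illustrative and do not come from the PR.

```python
import struct
import zlib

# Hypothetical column of 1000 int64 values held in memory.
values = list(range(1000))
in_memory = struct.pack(f"<{len(values)}q", *values)

# zlib is only a stand-in for the on-disk encoding of the stored object.
stored = zlib.compress(in_memory)

memory_size = len(in_memory)  # analogous to MemorySize: bytes once loaded
log_size = len(stored)        # analogous to LogSize: bytes of the stored file
```

Because the two numbers can differ substantially, tracking them separately lets memory planning rely on the in-memory figure rather than the file size.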

---------

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
Signed-off-by: cai.zhang <cai.zhang@zilliz.com>
2024-05-15 12:59:34 +08:00
Buqian Zheng 3c80083f51
feat: [Sparse Float Vector] add sparse vector support to milvus components (#30630)
add sparse float vector support to different milvus components,
including proxy, data node to receive and write sparse float vectors to
binlog, query node to handle search requests, index node to build index
for sparse float column, etc.

https://github.com/milvus-io/milvus/issues/29419

---------

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2024-03-13 14:32:54 -07:00
Ted Xu 71adafa933
enhance: adding a streaming deserialize reader for binlogs (#30860)
See #30863

---------

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2024-03-04 19:31:09 +08:00
Ted Xu 12acaf3e4f
enhance: Adding a generic stream payload reader (#30682)
See: #30404

---------

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2024-02-21 17:10:52 +08:00
wayblink f976385421
enhance: replace binlogIO with io.BinlogIO in datanode (#29725)
#30633

Signed-off-by: wayblink <anyang.wang@zilliz.com>
2024-02-20 14:38:51 +08:00
aoiasd a0537156c0
enhance: delete codec deserialize data by stream batch (#30407)
relate: https://github.com/milvus-io/milvus/issues/30404

---------

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2024-02-06 17:04:25 +08:00
XuanYang-cn d744962aa1
fix: Correct Size calculation of DeleteData (#30397)
This PR corrects the calculation of the actual deltalog size.

See also: #30191

---------

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2024-02-02 10:47:04 +08:00
Xu Tong e429965f32
Add float16 approve for multi-type part (#28427)
issue:https://github.com/milvus-io/milvus/issues/22837

Add bfloat16 vector, add the index part of float16 vector.

Signed-off-by: Writer-X <1256866856@qq.com>
2024-01-11 15:48:51 +08:00
Jiquan Long 3f46c6d459
feat: support inverted index (#28783)
issue: https://github.com/milvus-io/milvus/issues/27704

Add inverted index for some data types in Milvus. This index type can
save a lot of memory compared to loading all data into RAM and speed up
the term query and range query.

Supported: `INT8`, `INT16`, `INT32`, `INT64`, `FLOAT`, `DOUBLE`, `BOOL`
and `VARCHAR`.

Not supported: `ARRAY` and `JSON`.

Note:
- The inverted index for `VARCHAR` is not designed to serve full-text
search now. We will treat every row as a whole keyword instead of
tokenizing it into multiple terms.
- The inverted index does not support retrieval well, so if you create an
inverted index for a field, operations that depend on the raw data
will fall back to chunk storage, which brings some performance
loss. Examples are comparisons between two columns and retrieval of
output fields.
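
To make the behavior described above concrete, here is a minimal, self-contained Python sketch of an inverted index. It is not the Milvus implementation; the `InvertedIndex` class and its methods are hypothetical. It maps each distinct value to the row IDs that contain it, which serves term and range queries, and it treats each `VARCHAR` row as one whole keyword.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


class InvertedIndex:
    """Maps each distinct value to the sorted list of row IDs holding it."""

    def __init__(self, column):
        postings = defaultdict(list)
        for row_id, value in enumerate(column):
            postings[value].append(row_id)
        self.postings = dict(postings)
        self.keys = sorted(self.postings)  # sorted keys enable range scans

    def term(self, values):
        """Rows whose value is one of `values` (like `field in [...]`)."""
        rows = []
        for v in values:
            rows.extend(self.postings.get(v, []))
        return sorted(rows)

    def range(self, low, high):
        """Rows with low < value < high (both bounds exclusive)."""
        start = bisect_right(self.keys, low)
        end = bisect_left(self.keys, high)
        rows = []
        for key in self.keys[start:end]:
            rows.extend(self.postings[key])
        return sorted(rows)


ints = InvertedIndex([5, 1, 3, 1, 9])
names = InvertedIndex(["red apple", "apple", "red apple"])
```

Note that `names.term(["red"])` finds nothing, because `"red apple"` is indexed as a single keyword rather than tokenized. The index also stores only row IDs, so returning raw values still needs another storage path.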

The inverted index is easy to use.

Take the collection below as an example:

```python
from pymilvus import Collection, CollectionSchema, DataType, FieldSchema

# Assumes a connection to Milvus is already open (e.g. via connections.connect).
dim = 128  # example vector dimension; not specified in the original message

fields = [
    FieldSchema(name="pk", dtype=DataType.VARCHAR, is_primary=True, auto_id=False, max_length=100),
    FieldSchema(name="int8", dtype=DataType.INT8),
    FieldSchema(name="int16", dtype=DataType.INT16),
    FieldSchema(name="int32", dtype=DataType.INT32),
    FieldSchema(name="int64", dtype=DataType.INT64),
    FieldSchema(name="float", dtype=DataType.FLOAT),
    FieldSchema(name="double", dtype=DataType.DOUBLE),
    FieldSchema(name="bool", dtype=DataType.BOOL),
    FieldSchema(name="varchar", dtype=DataType.VARCHAR, max_length=1000),
    FieldSchema(name="random", dtype=DataType.DOUBLE),
    FieldSchema(name="embeddings", dtype=DataType.FLOAT_VECTOR, dim=dim),
]
schema = CollectionSchema(fields)
collection = Collection("demo", schema)
```

Then create an inverted index on each field:

```python
index_type = "INVERTED"
collection.create_index("int8", {"index_type": index_type})
collection.create_index("int16", {"index_type": index_type})
collection.create_index("int32", {"index_type": index_type})
collection.create_index("int64", {"index_type": index_type})
collection.create_index("float", {"index_type": index_type})
collection.create_index("double", {"index_type": index_type})
collection.create_index("bool", {"index_type": index_type})
collection.create_index("varchar", {"index_type": index_type})
```

Term queries and range queries on these fields are then sped up
automatically by the inverted index:

```python
result = collection.query(expr='int64 in [1, 2, 3]', output_fields=["pk"])
result = collection.query(expr='int64 < 5', output_fields=["pk"])
result = collection.query(expr='int64 > 2997', output_fields=["pk"])
result = collection.query(expr='1 < int64 < 5', output_fields=["pk"])
```

---------

Signed-off-by: longjiquan <jiquan.long@zilliz.com>
2023-12-31 19:50:47 +08:00
XuanYang-cn aae7e62729
feat: Add levelzero compaction in DN (#28470)
See also: #27606

---------

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2023-11-30 14:30:28 +08:00
congqixia 8a9ab69369
fix: Skip statslog generation flushing empty L0 segment (#28733)
See also #27675

When an L0 segment contains only delta data, merged statslog generation
shall be skipped when performing the sync task.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-11-25 15:10:25 +08:00
yah01 cc952e0486
enhance: optimize forwarding level0 deletions by respecting partition (#28456)
- Cache the level 0 deletions after loading level0 segments
- Divide the level 0 deletions by partition
related: #27349
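
The caching and per-partition split described above might look like the following Python sketch. The record layout `(partition_id, primary_key, timestamp)` and the helper `deletions_for` are assumptions for illustration and are not taken from the PR.

```python
from collections import defaultdict

# Hypothetical delete records: (partition_id, primary_key, timestamp).
deletions = [(10, 1, 100), (11, 2, 101), (10, 3, 102)]

# Build the cache once, after the level 0 segments are loaded,
# keyed by partition so each forward touches only relevant records.
cache = defaultdict(list)
for partition_id, pk, ts in deletions:
    cache[partition_id].append((pk, ts))


def deletions_for(partition_id):
    """Return only the deletions that belong to the given partition."""
    return cache.get(partition_id, [])
```

A segment in partition 10 then receives two records instead of all three, and a partition with no deletions receives none.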

---------

Signed-off-by: yah01 <yah2er0ne@outlook.com>
2023-11-21 18:24:22 +08:00
yah01 ece592a42f
Deliver L0 segments delete records (#27722)
Signed-off-by: yah01 <yah2er0ne@outlook.com>
2023-11-07 01:44:18 +08:00
XuanYang-cn 2f16339aac
Enhance InsertData and FieldData (#27436)
1. Add NewInsertData
2. Add GetRowNum(), GetMemorySize(), and Append() for InsertData
3. Add AppendRow() for FieldData for compaction

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2023-10-17 17:36:11 +08:00
SimFG 26f06dd732
Format the code (#27275)
Signed-off-by: SimFG <bang.fu@zilliz.com>
2023-09-21 09:45:27 +08:00
Xu Tong 9166011c4a
Add float16 vector (#25852)
Signed-off-by: Writer-X <1256866856@qq.com>
2023-09-08 10:03:16 +08:00
congqixia 2770ac4df5
Fix nilness linter errors (#26218)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-08-09 11:31:15 +08:00
xige-16 94d6cbb238
Fix querynode panic when binlog ts wrong (#25635)
Signed-off-by: xige-16 <xi.ge@zilliz.com>
2023-07-18 10:41:20 +08:00
PowderLi 3f4356df10
fix the spelling of `field` (#25008)
Signed-off-by: PowderLi <min.li@zilliz.com>
2023-06-21 14:00:42 +08:00
congqixia 41af0a98fa
Use go-api/v2 for milvus-proto (#24770)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-06-09 01:28:37 +08:00
Enwei Jiao d3af451d92
Upgrade golangci-lint (#24707)
Signed-off-by: Enwei Jiao <enwei.jiao@zilliz.com>
2023-06-07 19:34:36 +08:00
aoiasd c84bdcea49
merge stats log when segment flushing or compacting (#23570)
Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2023-05-29 10:21:28 +08:00
Enwei Jiao 967a97b9bd
Support json & array types (#23408)
Signed-off-by: yah01 <yang.cen@zilliz.com>
Co-authored-by: yah01 <yang.cen@zilliz.com>
2023-04-20 11:32:31 +08:00
jaime c9d0c157ec
Move some modules from internal to public package (#22572)
Signed-off-by: jaime <yun.zhang@zilliz.com>
2023-04-06 19:14:32 +08:00
yah01 081572d31c
Refactor QueryNode (#21625)
Signed-off-by: yah01 <yang.cen@zilliz.com>
Co-authored-by: Congqi Xia <congqi.xia@zilliz.com>
Co-authored-by: aoiasd <zhicheng.yue@zilliz.com>
2023-03-27 00:42:00 +08:00
Xiaofan 949d5d078f
Fix memory calculation in dataCodec (#21800)
Signed-off-by: xiaofan-luan <xiaofan.luan@zilliz.com>
2023-01-28 11:09:52 +08:00
congqixia f745d7f489
Fix compaction target segment rowNum is always 0 (#20937)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2022-12-01 20:33:17 +08:00
Xiaofan 633a749880
Reduce IndexCodec Load Memory (#20621)
Signed-off-by: xiaofan-luan <xiaofan.luan@zilliz.com>
2022-11-18 10:47:08 +08:00
Xiaofan 2bfecf5b4e
Refine bloomfilter and memory usage (#20168)
Signed-off-by: xiaofan-luan <xiaofan.luan@zilliz.com>
2022-10-31 17:41:34 +08:00
SimFG a55f739608
Separate public proto files (#19782)
Signed-off-by: SimFG <bang.fu@zilliz.com>
2022-10-16 20:49:27 +08:00
SimFG d7f38a803d
Separate some proto files (#19218)
Signed-off-by: SimFG <bang.fu@zilliz.com>
2022-09-16 16:56:49 +08:00
xige-16 4de1bfe5bc
Add cpp data codec (#18538)
Signed-off-by: xige-16 <xi.ge@zilliz.com>
Co-authored-by: zhagnlu lu.zhang@zilliz.com
2022-09-09 22:12:34 +08:00
xige-16 e40061b864
Update binlog event format (#18347)
Signed-off-by: xige-16 <xi.ge@zilliz.com>
2022-08-11 14:06:38 +08:00
yah01 70f8bea4b4
Avoid growing slice as deserializing binlogs (#17421)
Signed-off-by: yah01 <yang.cen@zilliz.com>
2022-06-08 11:46:06 +08:00
yah01 7af02fa531
Improve load performance, load binlogs concurrently per file, deserialize binlogs concurrently per field/segment (#16514)
Signed-off-by: yah01 <yang.cen@zilliz.com>
2022-04-25 15:57:47 +08:00
godchen bb7a0766fe
Add dependency factory (#16204)
Signed-off-by: godchen0212 <qingxiang.chen@zilliz.com>
2022-04-07 22:05:32 +08:00
xige-16 99984b88e1
Support delete varChar value (#16229)
Signed-off-by: xige-16 <xi.ge@zilliz.com>
2022-04-02 17:43:29 +08:00
Jiquan Long ba37531456
Add support for loading multiple indexes (#16138)
Signed-off-by: dragondriver <jiquan.long@zilliz.com>
2022-03-30 21:11:28 +08:00
Xiaofan 801eeffbcc
Replace cgo parquet reader to go parquet reader (#16199)
Signed-off-by: xiaofan-luan <xiaofan.luan@zilliz.com>
2022-03-30 15:21:28 +08:00
xige-16 205c92e54b
Support insert string data (#15993)
Signed-off-by: xige-16 <xi.ge@zilliz.com>
2022-03-25 14:27:25 +08:00