20 Commits (master)

Author SHA1 Message Date
Ted Xu 878ce56079
fix: correct memory size estimation on arrays ()
See: 

---------

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2025-03-05 16:54:09 +08:00
sthuang 90acc8a58f
enhance: upgrade go arrow version from 12.0.1 to 17.0.0 ()
related: https://github.com/milvus-io/milvus/issues/39915

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-02-25 10:30:02 +08:00
congqixia cb7f2fa6fd
enhance: Use v2 package name for pkg module ()
Related to 

https://go.dev/doc/modules/version-numbers

Update pkg version according to golang dep version convention

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2025-02-22 23:15:58 +08:00
Ted Xu 8562a102ec
enhance: API integration with storage v2 in mix-compactions ()
See 

---------

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2025-02-22 14:23:54 +08:00
smellthemoon 8b974c5742
enhance: support compact if lack of binlog ()
https://github.com/milvus-io/milvus/issues/39718

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2025-02-22 10:51:56 +08:00
sthuang 3eb3af5f08
feat: explicitly specify column groups for storage v2 api ()
* use the new packed reader and writer api to be compatible with current
etcd meta
* For the new packed writer API: column groups and paths are explicitly
defined by users and won't split column groups by memory in storage v2.
Packed writer follows the user-defined column groups to split arrow
record and write into the corresponding file path.
* For the new packed reader API: read paths are explicitly defined by
users.
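
The writer behavior described above can be pictured with a small sketch. It is only an illustration, not the storage v2 API: the `ColumnGroup` type, the `splitByGroups` helper, and the file paths are hypothetical, and the Arrow record is modeled as plain `int64` columns.

```go
package main

import "fmt"

// ColumnGroup is a hypothetical description of one user-defined group:
// which columns it holds and which file path they are written to.
type ColumnGroup struct {
	Path    string
	Columns []int // indices into the record's columns
}

// splitByGroups returns, for each group path, the columns that belong to it.
// The record is modeled as a slice of int64 columns for brevity.
func splitByGroups(record [][]int64, groups []ColumnGroup) (map[string][][]int64, error) {
	out := make(map[string][][]int64, len(groups))
	for _, g := range groups {
		for _, idx := range g.Columns {
			if idx < 0 || idx >= len(record) {
				return nil, fmt.Errorf("group %q: column %d out of range", g.Path, idx)
			}
			out[g.Path] = append(out[g.Path], record[idx])
		}
	}
	return out, nil
}

func main() {
	record := [][]int64{{1, 2}, {10, 20}, {100, 200}}
	groups := []ColumnGroup{
		{Path: "group0.parquet", Columns: []int{0, 1}},
		{Path: "group1.parquet", Columns: []int{2}},
	}
	parts, err := splitByGroups(record, groups)
	if err != nil {
		panic(err)
	}
	for _, g := range groups {
		fmt.Println(g.Path, parts[g.Path])
	}
}
```

The point of the sketch is that the split is driven entirely by the caller's group definitions, never by a memory threshold.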
related: 

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-02-21 22:03:54 +08:00
Ted Xu 2978b0890e
enhance: iterative download data during compaction to reduce memory cost ()
See 

---------

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2025-02-13 10:36:47 +08:00
sthuang 15c8798b93
feat: storage v2 serde reader and writer ()
related: https://github.com/milvus-io/milvus/issues/39173

---------

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2025-02-11 16:00:46 +08:00
Ted Xu 427b6a4c94
enhance: reduce stats task cost by skipping ser/de ()
See 

---------

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2025-02-06 17:14:45 +08:00
congqixia b0bd290a6e
enhance: Use internal json(sonic) to replace std json lib ()
Related to 

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-11-18 10:46:31 +08:00
Ted Xu 31d0c84f67
fix: panic calculating data size on writing binlogs ()
See: 

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2024-11-11 14:08:27 +08:00
Ted Xu bc9562feb1
enhance: avoid memory copy and serde in mix compaction ()
See: 

---------

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2024-11-07 16:30:57 -08:00
Ted Xu 262a994d6d
enhance: generally improve the performance of mix compactions ()
See 

---------

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2024-10-29 18:12:20 +08:00
smellthemoon a3f2f044d6
fix: nullable not set when stream writer writes headers ()


Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2024-08-29 20:59:00 +08:00
Ted Xu 41646c8439
feat: integrate new deltalog format ()
See 

---------

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2024-08-20 19:06:56 +08:00
shaoting-huang 88b373b024
enhance: binlog primary key turn off dict encoding ()
issue:  

Go Parquet uses dictionary encoding by default and falls back to plain
encoding when the dictionary size exceeds the dictionary page size
limit. Users can specify a custom fallback encoding with
`parquet.WithEncoding(ENCODING_METHOD)` in the writer properties. However,
Go Parquet [falls back to plain
encoding](e65c1e295d/go/parquet/file/column_writer_types.gen.go.tmpl (L238))
rather than the encoding the user provides. Therefore, this patch
only turns off dictionary encoding for the primary key.

In a benchmark with 5 million auto ID primary keys, the parquet file size
shrinks from 13.93 MB to 8.36 MB when dictionary encoding is turned
off, reducing primary key storage space by 40%.
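
A rough size model can illustrate why dictionary encoding does not pay off for unique keys. This is a sketch under simplifying assumptions, not the Parquet implementation: it ignores run-length encoding, page headers, and compression, so it does not reproduce the file sizes above. The helper names `plainSize` and `dictSize` are hypothetical.

```go
package main

import (
	"fmt"
	"math/bits"
)

// plainSize returns the bytes needed to store n int64 values back to back.
func plainSize(n int) int {
	return n * 8
}

// dictSize models dictionary encoding: each distinct value is stored once
// (8 bytes), and every row stores a bit-packed index into the dictionary.
// Simplification: no run-length encoding, page headers, or compression.
func dictSize(values []int64) int {
	if len(values) == 0 {
		return 0
	}
	distinct := make(map[int64]struct{}, len(values))
	for _, v := range values {
		distinct[v] = struct{}{}
	}
	// Bits needed to address any dictionary entry (at least 1).
	width := bits.Len(uint(len(distinct) - 1))
	if width == 0 {
		width = 1
	}
	indexBytes := (len(values)*width + 7) / 8
	return len(distinct)*8 + indexBytes
}

func main() {
	// Auto-generated IDs: every value is unique.
	unique := make([]int64, 1000)
	for i := range unique {
		unique[i] = int64(i)
	}
	// A low-cardinality column for contrast.
	repeated := make([]int64, 1000)
	for i := range repeated {
		repeated[i] = int64(i % 4)
	}
	fmt.Println("unique:   plain", plainSize(len(unique)), "dict", dictSize(unique))
	fmt.Println("repeated: plain", plainSize(len(repeated)), "dict", dictSize(repeated))
}
```

In this model the dictionary for a unique column is as large as the plain data, and the per-row indices are pure overhead, whereas a low-cardinality column shrinks substantially.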

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2024-07-17 17:47:44 +08:00
congqixia 3333160b8d
enhance: Fix lint issues from recent PRs ()
See also 
Some lint issues were introduced because the static checks were not run.
This PR fixes these problems.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-07-09 10:06:24 +08:00
shaoting-huang f4dd7c7efb
enhance: add delta log stream new format reader and writer ()
issue: 

Benchmark case: the Go benchmark function `BenchmarkDeltalogFormat`,
included in the files changed by this PR, tests the performance of
serializing and deserializing the two data formats on a dataset of
10 million delete logs.

Metrics: The benchmarks measure the average time taken per operation
(ns/op), memory allocated per operation (MB/op), and the number of
memory allocations per operation (allocs/op).

| Test Name | Avg Time (ns/op) | Time Comparison | Memory Allocation (MB/op) | Memory Comparison | Allocation Count (allocs/op) | Allocation Comparison |
|---|---|---|---|---|---|---|
| one_string_format_reader | 2,781,990,000 | Baseline | 2,422 | Baseline | 20,336,539 | Baseline |
| pk_ts_separate_format_reader | 480,682,639 | -82.72% | 1,765 | -27.14% | 20,396,958 | +0.30% |
| one_string_format_writer | 5,483,436,041 | Baseline | 13,900 | Baseline | 70,057,473 | Baseline |
| pk_and_ts_separate_format_writer | 798,591,584 | -85.43% | 2,178 | -84.34% | 30,270,488 | -56.78% |

Both read and write operations show significant improvements in both
speed and memory allocation.

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2024-07-06 09:08:09 +08:00
congqixia 2f691f1e67
enhance: Unify DeleteLog parsing code ()
See also 

Delete log parsing logic is scattered across many places, which is not
recommended and is hard to maintain.

This PR abstracts the common parsing logic into the `DeleteLog.Parse`
method to unify the implementation and make it easier to replace the
JSON parsing library.
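
The idea can be sketched as follows. This is a simplified illustration, not the actual Milvus type: the `Pk` and `Ts` fields, their JSON tags, and the use of `encoding/json` are assumptions made for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// DeleteLog is a simplified, hypothetical shape: one primary key and one
// timestamp. The real type may differ.
type DeleteLog struct {
	Pk int64  `json:"pk"`
	Ts uint64 `json:"ts"`
}

// Parse fills dl from a serialized entry. Keeping the decoding in one method
// means that swapping the JSON library touches only this function.
func (dl *DeleteLog) Parse(val string) error {
	if err := json.Unmarshal([]byte(val), dl); err != nil {
		return fmt.Errorf("parse delete log %q: %w", val, err)
	}
	return nil
}

func main() {
	var dl DeleteLog
	if err := dl.Parse(`{"pk":42,"ts":1000}`); err != nil {
		panic(err)
	}
	fmt.Println(dl.Pk, dl.Ts)
}
```

Callers that previously decoded entries inline would call `Parse` instead, so the choice of JSON library stays an internal detail of the method.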

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-06-21 16:54:01 +08:00
Ted Xu 6d5747cb3e
feat: adding deltalog stream reader and writer ()
See 

---------

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2024-06-19 14:42:01 +08:00