milvus/internal/storage
shaoting-huang 88b373b024
enhance: binlog primary key turn off dict encoding (#34358)
issue: #34357 

Go Parquet uses dictionary encoding by default and falls back to
plain encoding once the dictionary size exceeds the dictionary page size
limit. Users can specify a custom encoding with
`parquet.WithEncoding(ENCODING_METHOD)` in the writer properties. However,
on fallback Go Parquet [falls back to plain
encoding](e65c1e295d/go/parquet/file/column_writer_types.gen.go.tmpl (L238))
rather than the encoding method the user provides. Therefore, this patch
only turns off dictionary encoding for the primary key.

In a benchmark with 5 million auto-ID primary keys, the Parquet file size
shrinks from 13.93 MB to 8.36 MB when dictionary encoding is turned
off, reducing primary key storage space by about 40%.
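
As a quick sanity check on the figure above, the reduction can be recomputed from the two reported file sizes. This is only a minimal sketch: the helper name `reductionPercent` is hypothetical and is not part of the patch.

```go
package main

import "fmt"

// reductionPercent returns how much smaller `after` is than `before`,
// as a percentage of `before`. Hypothetical helper for illustration only.
func reductionPercent(before, after float64) float64 {
	return (before - after) / before * 100
}

func main() {
	// File sizes in MB as reported in the benchmark above.
	before, after := 13.93, 8.36
	fmt.Printf("saved %.2f MB (%.1f%%)\n", before-after, reductionPercent(before, after))
}
```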

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2024-07-17 17:47:44 +08:00
aliyun Identify service providers based on addresses (#27907) 2023-10-25 17:28:10 +08:00
gcp Format the code (#27275) 2023-09-21 09:45:27 +08:00
tencent feat: Support tencent cloud object storage for milvus (#30163) 2024-01-23 11:28:56 +08:00
OWNERS [skip ci]Update OWNERS files (#11898) 2021-11-16 15:41:11 +08:00
azure_object_storage.go enhance: Add nilness linter and fix some small issues (#34049) 2024-06-24 14:52:03 +08:00
azure_object_storage_test.go enhance: use WalkWithPrefix api for oss, enable piplined file gc (#31740) 2024-04-25 20:41:27 +08:00
binlog_iterator.go enhance: legacy code clean up (#33838) 2024-06-14 14:25:56 +08:00
binlog_iterator_test.go enhance: binlog primary key turn off dict encoding (#34358) 2024-07-17 17:47:44 +08:00
binlog_reader.go fix: descriptor event in previous version not has nullable to parse error (#34235) 2024-07-01 16:38:06 +08:00
binlog_test.go enhance: binlog primary key turn off dict encoding (#34358) 2024-07-17 17:47:44 +08:00
binlog_util.go Move some modules from internal to public package (#22572) 2023-04-06 19:14:32 +08:00
binlog_util_test.go Format the code (#27275) 2023-09-21 09:45:27 +08:00
binlog_writer.go enhance: binlog primary key turn off dict encoding (#34358) 2024-07-17 17:47:44 +08:00
binlog_writer_test.go enhance: binlog primary key turn off dict encoding (#34358) 2024-07-17 17:47:44 +08:00
data_codec.go enhance: binlog primary key turn off dict encoding (#34358) 2024-07-17 17:47:44 +08:00
data_codec_test.go enhance: binlog primary key turn off dict encoding (#34358) 2024-07-17 17:47:44 +08:00
data_sorter.go feat: [Sparse Float Vector] add sparse vector support to milvus components (#30630) 2024-03-13 14:32:54 -07:00
data_sorter_test.go enhance: add helpers to parse sparse float vector in JSON (#32543) 2024-04-25 14:47:24 +08:00
delta_data.go enhance: Unify DeleteLog parsing code (#34009) 2024-06-21 16:54:01 +08:00
delta_data_test.go enhance: Add unittest for `storage.DeleteLog` (#34190) 2024-06-26 17:14:04 +08:00
event_data.go enhance: Fix lint issues from recent PRs (#34482) 2024-07-09 10:06:24 +08:00
event_header.go Move some modules from internal to public package (#22572) 2023-04-06 19:14:32 +08:00
event_reader.go enhance: support null in go payload (#32296) 2024-06-19 17:08:00 +08:00
event_test.go enhance: binlog primary key turn off dict encoding (#34358) 2024-07-17 17:47:44 +08:00
event_writer.go enhance: binlog primary key turn off dict encoding (#34358) 2024-07-17 17:47:44 +08:00
event_writer_test.go enhance: binlog primary key turn off dict encoding (#34358) 2024-07-17 17:47:44 +08:00
factory.go enhance: use WalkWithPrefix api for oss, enable piplined file gc (#31740) 2024-04-25 20:41:27 +08:00
field_stats.go feat: Major compaction (#33620) 2024-06-10 21:34:08 +08:00
field_stats_test.go feat: Major compaction (#33620) 2024-06-10 21:34:08 +08:00
field_value.go enhance: reconstruct scalar part's code for segment-pruner(#30376) (#34346) 2024-07-04 16:36:09 +08:00
field_value_test.go feat: Define FieldValue, FieldStats and PartitionStats (#30286) 2024-03-06 20:42:37 -08:00
index_data_codec.go enhance: support null in go payload (#32296) 2024-06-19 17:08:00 +08:00
index_data_codec_test.go enhance: Add memory size for binlog (#33025) 2024-05-15 12:59:34 +08:00
insert_data.go enhance: Add lint rule to forbid gogo protobuf (#34594) 2024-07-12 10:19:35 +08:00
insert_data_test.go enhance: support null in go payload (#32296) 2024-06-19 17:08:00 +08:00
local_chunk_manager.go enhance: use WalkWithPrefix api for oss, enable piplined file gc (#31740) 2024-04-25 20:41:27 +08:00
local_chunk_manager_test.go enhance: use WalkWithPrefix api for oss, enable piplined file gc (#31740) 2024-04-25 20:41:27 +08:00
minio_object_storage.go enhance: use WalkWithPrefix api for oss, enable piplined file gc (#31740) 2024-04-25 20:41:27 +08:00
minio_object_storage_test.go fix: Removed minio bucket after use in test (#32624) 2024-04-28 13:51:26 +08:00
options.go enhance: Support MinIO TLS connection (#31311) 2024-03-21 11:15:20 +08:00
partition_stats.go fix: sync part stats task cannot be finished(#30376) (#34027) 2024-06-24 10:16:02 +08:00
partition_stats_test.go feat: Define FieldValue, FieldStats and PartitionStats (#30286) 2024-03-06 20:42:37 -08:00
payload.go enhance: support null in go payload (#32296) 2024-06-19 17:08:00 +08:00
payload_reader.go enhance: support null in go payload (#32296) 2024-06-19 17:08:00 +08:00
payload_reader_test.go enhance: support null in go payload (#32296) 2024-06-19 17:08:00 +08:00
payload_test.go enhance: binlog primary key turn off dict encoding (#34358) 2024-07-17 17:47:44 +08:00
payload_writer.go enhance: binlog primary key turn off dict encoding (#34358) 2024-07-17 17:47:44 +08:00
payload_writer_test.go enhance: binlog primary key turn off dict encoding (#34358) 2024-07-17 17:47:44 +08:00
pk_statistics.go enhance: Use BatchPkExist to reduce bloom filter func call cost (#33611) 2024-06-13 17:57:56 +08:00
primary_key.go enhance: Remove StringPrimaryKey to reduce unnecessary copy and function call cost (#33486) 2024-05-31 15:41:45 +08:00
primary_key_test.go Use go-api/v2 for milvus-proto (#24770) 2023-06-09 01:28:37 +08:00
primary_keys.go enhance: Add PrimaryKeys interface to reduce memory usage (#30405) 2024-02-01 09:57:11 +08:00
primary_keys_test.go enhance: Add PrimaryKeys interface to reduce memory usage (#30405) 2024-02-01 09:57:11 +08:00
print_binlog.go enhance: support null in go payload (#32296) 2024-06-19 17:08:00 +08:00
print_binlog_test.go enhance: binlog primary key turn off dict encoding (#34358) 2024-07-17 17:47:44 +08:00
remote_chunk_manager.go enhance: use WalkWithPrefix api for oss, enable piplined file gc (#31740) 2024-04-25 20:41:27 +08:00
remote_chunk_manager_test.go enhance: use WalkWithPrefix api for oss, enable piplined file gc (#31740) 2024-04-25 20:41:27 +08:00
serde.go enhance: binlog primary key turn off dict encoding (#34358) 2024-07-17 17:47:44 +08:00
serde_events.go enhance: binlog primary key turn off dict encoding (#34358) 2024-07-17 17:47:44 +08:00
serde_events_test.go enhance: binlog primary key turn off dict encoding (#34358) 2024-07-17 17:47:44 +08:00
serde_test.go enhance: add delta log stream new format reader and writer (#34116) 2024-07-06 09:08:09 +08:00
stats.go enhance: Use Blocked Bloom Filter instead of basic bloom fitler impl. (#33405) 2024-05-31 17:49:45 +08:00
stats_test.go enhance: Use Blocked Bloom Filter instead of basic bloom fitler impl. (#33405) 2024-05-31 17:49:45 +08:00
storage_test.go enhance: Remove vector chunk manager (#28569) 2023-11-30 18:00:33 +08:00
types.go enhance: use WalkWithPrefix api for oss, enable piplined file gc (#31740) 2024-04-25 20:41:27 +08:00
unsafe.go enhance: support null in go payload (#32296) 2024-06-19 17:08:00 +08:00
unsafe_test.go [skip e2e]Update license for storage unsafe (#14452) 2021-12-28 20:03:56 +08:00
utils.go enhance: binlog primary key turn off dict encoding (#34358) 2024-07-17 17:47:44 +08:00
utils_test.go enhance: support null in go payload (#32296) 2024-06-19 17:08:00 +08:00