From 09fe8b638ae28c7fe1a2d5fc510ff0fb00594ab1 Mon Sep 17 00:00:00 2001
From: Xiaofan <83447078+xiaofan-luan@users.noreply.github.com>
Date: Thu, 7 Oct 2021 02:03:02 +0800
Subject: [PATCH] [skip ci]Add document for sealed segment variable and functions (#9364)

Signed-off-by: xiaofan-luan
---
 docs/design_docs/segcore/segment_sealed.md | 32 ++++++++++++++++++++--
 1 file changed, 30 insertions(+), 2 deletions(-)

diff --git a/docs/design_docs/segcore/segment_sealed.md b/docs/design_docs/segcore/segment_sealed.md
index 7e5edbc3cb..129b2472b7 100644
--- a/docs/design_docs/segcore/segment_sealed.md
+++ b/docs/design_docs/segcore/segment_sealed.md
@@ -1,7 +1,7 @@
 # SegmentSealed
 SegmentSealed has extra interface rather than segment_inferface:
 
-1. `LoadIndex(loadIndexInfo)`: load the index. indexInfo containts
+1. `LoadIndex(loadIndexInfo)`: load the index. indexInfo contains
     1. `FieldId`
     2. `IndexParams`: index paramters in KV structure KV
     3. `VecIndex`: vector index
@@ -10,4 +10,32 @@ SegmentSealed has extra interface rather than segment_inferface:
 3. `DropIndex(fieldId)`: drop and release exist index of specified field
 4. `DropFieldData(fieldId)`: drop and release exist data for specified field
 
-Search is executatble as long as all the column involved in the search are loaded.
\ No newline at end of file
+Search is executable as long as all the columns involved in the search are loaded.
+
+# SegmentSealedImpl internal data definition
+1. `row_count_opt_`:
+    1. Set to the row count when the first entity is loaded
+    2. Every column loaded afterwards must have the same row count
+2. `xxx_ready_bitset_` `system_ready_count_`
+    1. Used to record whether the corresponding column is loaded. Each bit in the bitset corresponds to a FieldOffset
+    2. A query is executable if and only if the following conditions are met:
+        1. `system_ready_count_ == 2`, which means both system columns (RowId and Timestamp) are loaded
+        2. The scalar columns involved in the query have been loaded
+        3. For the vector columns involved in the query, either the original data or the index is loaded
+3. `scalar_indexings_`: stores the scalar indexes
+
+    1. Uses StructuredSortedIndex in knowhere
+4. `primary_key_index_`: stores the index for the pk column
+    1. Uses the brand new ScalarIndexBase format
+    2. **Note: the functions here may overlap with scalar indexes. It is recommended to replace the scalar index with ScalarIndexBase**
+5. `field_datas_`: stores the original data
+    1. The `aligned_vector` format guarantees that `int/float` data are aligned
+6. `SealedIndexingRecord vecindexs_`: stores the vector indexes
+7. `row_ids_/timestamps_`: RowId/Timestamp data
+8. `TimestampIndex`: index for the Timestamp column
+9. `schema`: schema
+
+# SegmentSealedImpl internal function definition
+1. Most functions implement the corresponding functions of the segment interface and are not repeated here
+2. `update_row_count`: used to update the row_count field
+3. `mask_with_timestamps`: uses the Timestamp column to update the search bitmask, which supports the Time Travel function
\ No newline at end of file
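
The query-readiness rule that this patch documents for `xxx_ready_bitset_` and `system_ready_count_` can be sketched as a small check. This is a minimal illustration and not the segcore implementation: `ReadyState`, `QueryField`, and `is_query_executable` are hypothetical names, and the two `std::vector<bool>` members merely stand in for the ready bitsets indexed by field offset.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the readiness bookkeeping of a sealed segment.
struct ReadyState {
    int system_ready_count = 0;          // 2 once both RowId and Timestamp are loaded
    std::vector<bool> field_data_ready;  // raw column data loaded, indexed by field offset
    std::vector<bool> vec_index_ready;   // vector index loaded, indexed by field offset
};

// Hypothetical description of one column referenced by a query.
struct QueryField {
    std::size_t offset;  // field offset of the column
    bool is_vector;      // true for vector columns, false for scalar columns
};

// Returns true only if every condition listed in the document holds.
bool is_query_executable(const ReadyState& s, const std::vector<QueryField>& fields) {
    // Condition 1: both system columns are loaded.
    if (s.system_ready_count != 2) {
        return false;
    }
    for (const auto& f : fields) {
        // Unknown offsets are treated as not loaded.
        if (f.offset >= s.field_data_ready.size() || f.offset >= s.vec_index_ready.size()) {
            return false;
        }
        const bool has_data = s.field_data_ready[f.offset];
        // Condition 3: a vector column may be served by its index instead of raw data.
        const bool has_index = f.is_vector && s.vec_index_ready[f.offset];
        // Condition 2: a scalar column needs its data, so has_index is always false for it.
        if (!has_data && !has_index) {
            return false;
        }
    }
    return true;
}
```

With this sketch, a vector column whose raw data was dropped still counts as ready while its index is loaded, whereas a scalar column at the same offset would not.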