Segment Interface
External Interface
- get_row_count: Get the number of entities in the segment
- get_schema: Get the corresponding collection schema in the segment
- GetMemoryUsageInBytes: Get memory usage of a segment
- Search(plan, placeholderGroup, timestamp) -> QueryResult: Perform search operations according to the plan containing search parameters and predicate conditions, and return search results. Ensure that the time of all search results is before the specified timestamp (MVCC)
- FillTargetEntry(plan, &queryResult): Fill the missing column data for search results based on target columns in the plan
For design details, see ${milvus_root}/internal/core/src/segcore/SegmentInterface.h
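The MVCC rule described for Search can be sketched as follows. This is a minimal illustration, not the segcore implementation: it assumes row timestamps are stored in non-decreasing insertion order and that a row written exactly at the query timestamp counts as visible. Both are assumptions, and `active_count` and `filter_visible` are hypothetical helper names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

using Timestamp = std::uint64_t;

// Hypothetical helper: number of leading rows visible at query time `ts`.
// Assumption: `row_timestamps` is sorted in non-decreasing order, so the
// visible rows form a prefix that a binary search can find.
inline std::int64_t
active_count(const std::vector<Timestamp>& row_timestamps, Timestamp ts) {
    assert(std::is_sorted(row_timestamps.begin(), row_timestamps.end()));
    // First row whose timestamp is strictly greater than `ts`.
    auto it = std::upper_bound(row_timestamps.begin(), row_timestamps.end(), ts);
    return static_cast<std::int64_t>(it - row_timestamps.begin());
}

// Hypothetical helper: keep only the hits whose row is visible at `ts`.
inline std::vector<std::int64_t>
filter_visible(const std::vector<std::int64_t>& hit_offsets,
               const std::vector<Timestamp>& row_timestamps,
               Timestamp ts) {
    const std::int64_t visible = active_count(row_timestamps, ts);
    std::vector<std::int64_t> kept;
    for (std::int64_t offset : hit_offsets) {
        if (offset >= 0 && offset < visible) {
            kept.push_back(offset);
        }
    }
    return kept;
}
```

With row timestamps {10, 20, 20, 30, 40} and a query timestamp of 20, the first three rows are visible, so hits at offsets 3 and 4 are dropped.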
Basic Concepts:
- Segment: Data is sharded into segments based on write timestamp; the sharding logic is controlled by the data coordinator.
- Chunk: A further division of segment data; a chunk holds continuous data for each column.
  - There is only one chunk in each sealed segment.
  - In a growing segment, chunks are currently divided by a fixed number of rows. As data is ingested, the number of chunks increases.
- Span: Similar to std::span; points to continuous data in memory.
- SystemField: An extra field that stores system info, currently including the RowID and Timestamp fields.
- SegOffset: The entity identifier in the segment.
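To make the relationship between chunks, spans, and segment offsets concrete, here is a rough sketch of a growing-segment column split into fixed-size chunks. The `Span` struct and the `ChunkedColumn` class are hypothetical stand-ins written for this example, not the segcore types, and the sketch assumes a segment offset is simply the row's insertion index.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a span: a non-owning view of contiguous data.
template <typename T>
struct Span {
    const T* data;
    std::size_t size;
};

// Hypothetical column of a growing segment, divided by a fixed row count.
template <typename T>
class ChunkedColumn {
 public:
    explicit ChunkedColumn(std::size_t size_per_chunk)
        : size_per_chunk_(size_per_chunk) {
        assert(size_per_chunk > 0);
    }

    // Append one value, opening a new chunk when the last one is full.
    void append(const T& value) {
        if (chunks_.empty() || chunks_.back().size() == size_per_chunk_) {
            chunks_.emplace_back();
            chunks_.back().reserve(size_per_chunk_);
        }
        chunks_.back().push_back(value);
        ++row_count_;
    }

    std::size_t num_chunk() const { return chunks_.size(); }

    // Contiguous view over one chunk of this column.
    Span<T> chunk_data(std::size_t chunk_id) const {
        assert(chunk_id < chunks_.size());
        return Span<T>{chunks_[chunk_id].data(), chunks_[chunk_id].size()};
    }

    // Assumption: a segment offset is the row's insertion index, so it maps
    // to (chunk_id, offset within chunk) by division and remainder.
    const T& at(std::int64_t seg_offset) const {
        assert(seg_offset >= 0);
        const std::size_t row = static_cast<std::size_t>(seg_offset);
        assert(row < row_count_);
        return chunks_[row / size_per_chunk_][row % size_per_chunk_];
    }

 private:
    std::size_t size_per_chunk_;
    std::size_t row_count_ = 0;
    std::vector<std::vector<T>> chunks_;
};
```

Appending ten rows with four rows per chunk yields three chunks, the last holding only two rows. A sealed segment would correspond to the degenerate case of a single chunk covering every row.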
SegmentInternalInterface internal functions
- num_chunk(): total number of chunks
- size_per_chunk(): length of each chunk
- get_active_count(Timestamp): entity count after filtering by Timestamp
- chunk_data(FieldOffset, chunk_id) -> Span<T>: return continuous data for the specified column and chunk
- chunk_scalar_index(FieldOffset, chunk_id) -> const StructuredIndex<T>&: return the inverted index of the specified column and chunk
- num_chunk_index: the number of indexes (including scalar and vector indexes) that have been created:
  - In a growing segment, this value is the number of chunks for which the inverted index has been created. In these chunks, the index can be used to speed up the calculation.
  - In a sealed segment, this value must be 1
- debug(): used to print extra information while debugging
- vector_search(vec_count, query..., timestamp, bitset, output): Search the vector column
  - vec_count: specifies how many entities participate in the vector search calculation; the rest are filtered out because their timestamp is larger than the specified timestamp. This is mainly used in growing segments for multi-version concurrency control (MVCC)
  - query...: multiple variables that jointly specify the search parameters and the search vector
  - timestamp: used for time travel, to filter out data by timestamp. Mainly for sealed segments
  - bitset: calculated bit mask value as an output
  - output: output QueryResult
- bulk_subscript(FieldOffset|SystemField, seg_offsets..., output):
  - given seg_offsets, calculate results[i] = FieldData[seg_offsets[i]], for GetEntityByIds
  - FieldData is defined by FieldOffset or SystemField
- search_ids(IdArray, timestamp) -> pair<IdArray, SegOffsets>:
  - Find the corresponding segment offsets according to the primary keys in an id array
  - The returned order is not guaranteed, but the two returned fields must correspond to each other one by one.
  - Entities without PKs will not be returned
- check_search(Plan): check if the Plan is valid
  - It mainly checks whether the columns used in the plan have been loaded
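The gather performed by bulk_subscript and the lookup performed by search_ids can be sketched as follows. This is an illustration under stated assumptions, not the segcore code: the column is a plain `std::vector`, primary keys are unique 64-bit integers held in a hypothetical `pk_index` map, the timestamp argument is omitted for brevity, and an id with no match in the segment is simply skipped, which is one reading of the rule that such entities are not returned.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

using SegOffset = std::int64_t;

// Gather: results[i] = field_data[seg_offsets[i]].
template <typename T>
std::vector<T>
bulk_subscript(const std::vector<T>& field_data,
               const std::vector<SegOffset>& seg_offsets) {
    std::vector<T> results;
    results.reserve(seg_offsets.size());
    for (SegOffset offset : seg_offsets) {
        assert(offset >= 0);
        const std::size_t row = static_cast<std::size_t>(offset);
        assert(row < field_data.size());
        results.push_back(field_data[row]);
    }
    return results;
}

// Lookup: map primary keys to segment offsets.
// Assumption: `pk_index` is a hypothetical primary key -> offset map with
// unique keys. Ids that are not found are skipped, so the two returned
// arrays stay aligned element by element.
inline std::pair<std::vector<std::int64_t>, std::vector<SegOffset>>
search_ids(const std::unordered_map<std::int64_t, SegOffset>& pk_index,
           const std::vector<std::int64_t>& ids) {
    std::vector<std::int64_t> found_ids;
    std::vector<SegOffset> found_offsets;
    for (std::int64_t id : ids) {
        auto it = pk_index.find(id);
        if (it != pk_index.end()) {
            found_ids.push_back(it->first);
            found_offsets.push_back(it->second);
        }
    }
    return {found_ids, found_offsets};
}
```

A GetEntityByIds-style call would chain the two: search_ids resolves the requested keys to offsets, and bulk_subscript then reads each requested column at those offsets. This sketch happens to preserve input order, but callers should rely only on the pairwise correspondence of the two arrays.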