commit
aa64c908d0
|
@ -1,252 +1,237 @@
|
|||
/*
|
||||
|
||||
Series:
|
||||
Package tsi1 provides a memory-mapped index implementation that supports
|
||||
high cardinality series.
|
||||
|
||||
╔══════Series List═════╗
|
||||
║ ┌───────────────────┐║
|
||||
║ │ Term List │║
|
||||
║ ├───────────────────┤║
|
||||
║ │ Series Data │║
|
||||
║ ├───────────────────┤║
|
||||
║ │ Trailer │║
|
||||
║ └───────────────────┘║
|
||||
╚══════════════════════╝
|
||||
Overview
|
||||
|
||||
╔══════════Term List═══════════╗
|
||||
║ ┌──────────────────────────┐ ║
|
||||
║ │ Term Count <uint32> │ ║
|
||||
║ └──────────────────────────┘ ║
|
||||
║ ┏━━━━━━━━━━━━━━━━━━━━━━━━━━┓ ║
|
||||
║ ┃ ┌──────────────────────┐ ┃ ║
|
||||
║ ┃ │ len(Term) <varint> │ ┃ ║
|
||||
║ ┃ ├──────────────────────┤ ┃ ║
|
||||
║ ┃ │ Term <byte...> │ ┃ ║
|
||||
║ ┃ └──────────────────────┘ ┃ ║
|
||||
║ ┗━━━━━━━━━━━━━━━━━━━━━━━━━━┛ ║
|
||||
║ ┏━━━━━━━━━━━━━━━━━━━━━━━━━━┓ ║
|
||||
║ ┃ ┌──────────────────────┐ ┃ ║
|
||||
║ ┃ │ len(Term) <varint> │ ┃ ║
|
||||
║ ┃ ├──────────────────────┤ ┃ ║
|
||||
║ ┃ │ Term <byte...> │ ┃ ║
|
||||
║ ┃ └──────────────────────┘ ┃ ║
|
||||
║ ┗━━━━━━━━━━━━━━━━━━━━━━━━━━┛ ║
|
||||
╚══════════════════════════════╝
|
||||
The top-level object in tsi1 is the Index. It is the primary access point from
|
||||
the rest of the system. The Index is composed of LogFile and IndexFile objects.
|
||||
|
||||
╔═════════Series Data══════════╗
|
||||
║ ┌──────────────────────────┐ ║
|
||||
║ │ Series Count <uint32> │ ║
|
||||
║ └──────────────────────────┘ ║
|
||||
║ ┏━━━━━━━━━━━━━━━━━━━━━━━━━━┓ ║
|
||||
║ ┃ ┌──────────────────────┐ ┃ ║
|
||||
║ ┃ │ Flag <uint8> │ ┃ ║
|
||||
║ ┃ ├──────────────────────┤ ┃ ║
|
||||
║ ┃ │ len(Series) <varint> │ ┃ ║
|
||||
║ ┃ ├──────────────────────┤ ┃ ║
|
||||
║ ┃ │ Series <byte...> │ ┃ ║
|
||||
║ ┃ └──────────────────────┘ ┃ ║
|
||||
║ ┗━━━━━━━━━━━━━━━━━━━━━━━━━━┛ ║
|
||||
║ ... ║
|
||||
╚══════════════════════════════╝
|
||||
Log files are small write-ahead log files that record new series immediately
|
||||
in the order that they are received. The data within the file is indexed
|
||||
in-memory so it can be quickly accessed. When the system is restarted, this log
|
||||
file is replayed and the in-memory representation is rebuilt.
|
||||
|
||||
╔════════════Trailer══════════════╗
|
||||
║ ┌─────────────────────────────┐ ║
|
||||
║ │ Term List Offset <uint64> │ ║
|
||||
║ ├─────────────────────────────┤ ║
|
||||
║ │ Term List Size <uint64> │ ║
|
||||
║ ├─────────────────────────────┤ ║
|
||||
║ │ Series Data Offset <uint64> │ ║
|
||||
║ ├─────────────────────────────┤ ║
|
||||
║ │ Series Data Pos <uint64> │ ║
|
||||
║ ├─────────────────────────────┤ ║
|
||||
║ │ Sketch Offset <uint64> │ ║
|
||||
║ ├─────────────────────────────┤ ║
|
||||
║ │ Sketch Size <uint64> │ ║
|
||||
║ ├─────────────────────────────┤ ║
|
||||
║ │ Tomb Sketch Offset <uint64> │ ║
|
||||
║ ├─────────────────────────────┤ ║
|
||||
║ │ Tomb Sketch Size <uint64> │ ║
|
||||
║ └─────────────────────────────┘ ║
|
||||
╚═════════════════════════════════╝
|
||||
Index files also contain series information; however, they are highly indexed
|
||||
so that reads can be performed quickly. Index files are built through a process
|
||||
called compaction where a log file or multiple index files are merged together.
|
||||
|
||||
|
||||
Tag Block:
|
||||
Operations
|
||||
|
||||
╔═══════Tag Block════════╗
|
||||
║┌──────────────────────┐║
|
||||
║│ Tag Values Block │║
|
||||
║├──────────────────────┤║
|
||||
║│ ... │║
|
||||
║├──────────────────────┤║
|
||||
║│ Tag Keys Block │║
|
||||
║├──────────────────────┤║
|
||||
║│ Trailer │║
|
||||
║└──────────────────────┘║
|
||||
╚════════════════════════╝
|
||||
The index can perform many tasks related to series, measurement, and tag data.
|
||||
All data is inserted by adding a series to the index. When adding a series,
|
||||
the measurement, tag keys, and tag values are all extracted and indexed
|
||||
separately.
|
||||
|
||||
╔═══════Tag Values Block═══════╗
|
||||
║ ║
|
||||
║ ┏━━━━━━━━Value List━━━━━━━━┓ ║
|
||||
║ ┃ ┃ ║
|
||||
║ ┃┏━━━━━━━━━Value━━━━━━━━━━┓┃ ║
|
||||
║ ┃┃┌──────────────────────┐┃┃ ║
|
||||
║ ┃┃│ Flag <uint8> │┃┃ ║
|
||||
║ ┃┃├──────────────────────┤┃┃ ║
|
||||
║ ┃┃│ len(Value) <varint> │┃┃ ║
|
||||
║ ┃┃├──────────────────────┤┃┃ ║
|
||||
║ ┃┃│ Value <byte...> │┃┃ ║
|
||||
║ ┃┃├──────────────────────┤┃┃ ║
|
||||
║ ┃┃│ len(Series) <varint> │┃┃ ║
|
||||
║ ┃┃├──────────────────────┤┃┃ ║
|
||||
║ ┃┃│SeriesIDs <uint32...> │┃┃ ║
|
||||
║ ┃┃└──────────────────────┘┃┃ ║
|
||||
║ ┃┗━━━━━━━━━━━━━━━━━━━━━━━━┛┃ ║
|
||||
║ ┃ ... ┃ ║
|
||||
║ ┗━━━━━━━━━━━━━━━━━━━━━━━━━━┛ ║
|
||||
║ ┏━━━━━━━━Hash Index━━━━━━━━┓ ║
|
||||
║ ┃ ┌──────────────────────┐ ┃ ║
|
||||
║ ┃ │ len(Values) <uint32> │ ┃ ║
|
||||
║ ┃ ├──────────────────────┤ ┃ ║
|
||||
║ ┃ │Value Offset <uint64> │ ┃ ║
|
||||
║ ┃ ├──────────────────────┤ ┃ ║
|
||||
║ ┃ │ ... │ ┃ ║
|
||||
║ ┃ └──────────────────────┘ ┃ ║
|
||||
║ ┗━━━━━━━━━━━━━━━━━━━━━━━━━━┛ ║
|
||||
╚══════════════════════════════╝
|
||||
Once a series has been added, it can be removed in several ways. First, the
|
||||
individual series can be removed. Second, it can be removed as part of a bulk
|
||||
operation by deleting the entire measurement.
|
||||
|
||||
╔════════Tag Key Block═════════╗
|
||||
║ ║
|
||||
║ ┏━━━━━━━━━Key List━━━━━━━━━┓ ║
|
||||
║ ┃ ┃ ║
|
||||
║ ┃┏━━━━━━━━━━Key━━━━━━━━━━━┓┃ ║
|
||||
║ ┃┃┌──────────────────────┐┃┃ ║
|
||||
║ ┃┃│ Flag <uint8> │┃┃ ║
|
||||
║ ┃┃├──────────────────────┤┃┃ ║
|
||||
║ ┃┃│Value Offset <uint64> │┃┃ ║
|
||||
║ ┃┃├──────────────────────┤┃┃ ║
|
||||
║ ┃┃│ len(Key) <varint> │┃┃ ║
|
||||
║ ┃┃├──────────────────────┤┃┃ ║
|
||||
║ ┃┃│ Key <byte...> │┃┃ ║
|
||||
║ ┃┃└──────────────────────┘┃┃ ║
|
||||
║ ┃┗━━━━━━━━━━━━━━━━━━━━━━━━┛┃ ║
|
||||
║ ┃ ... ┃ ║
|
||||
║ ┗━━━━━━━━━━━━━━━━━━━━━━━━━━┛ ║
|
||||
║ ┏━━━━━━━━Hash Index━━━━━━━━┓ ║
|
||||
║ ┃ ┌──────────────────────┐ ┃ ║
|
||||
║ ┃ │ len(Keys) <uint32> │ ┃ ║
|
||||
║ ┃ ├──────────────────────┤ ┃ ║
|
||||
║ ┃ │ Key Offset <uint64> │ ┃ ║
|
||||
║ ┃ ├──────────────────────┤ ┃ ║
|
||||
║ ┃ │ ... │ ┃ ║
|
||||
║ ┃ └──────────────────────┘ ┃ ║
|
||||
║ ┗━━━━━━━━━━━━━━━━━━━━━━━━━━┛ ║
|
||||
╚══════════════════════════════╝
|
||||
|
||||
╔════════════Trailer══════════════╗
|
||||
║ ┌─────────────────────────────┐ ║
|
||||
║ │ Hash Index Offset <uint64> │ ║
|
||||
║ ├─────────────────────────────┤ ║
|
||||
║ │ Tag Set Size <uint64> │ ║
|
||||
║ ├─────────────────────────────┤ ║
|
||||
║ │ Tag Set Version <uint16> │ ║
|
||||
║ └─────────────────────────────┘ ║
|
||||
╚═════════════════════════════════╝
|
||||
The query engine needs to be able to look up series in a variety of ways such
|
||||
as by measurement name, by tag value, or by using regular expressions. The
|
||||
index provides an API to iterate over subsets of series and perform set
|
||||
operations such as unions and intersections.
|
||||
|
||||
|
||||
Measurements
|
||||
Log File Layout
|
||||
|
||||
╔══════════Measurements Block═══════════╗
|
||||
║ ║
|
||||
║ ┏━━━━━━━━━Measurement List━━━━━━━━━━┓ ║
|
||||
║ ┃ ┃ ║
|
||||
║ ┃┏━━━━━━━━━━Measurement━━━━━━━━━━━┓ ┃ ║
|
||||
║ ┃┃┌─────────────────────────────┐ ┃ ┃ ║
|
||||
║ ┃┃│ Flag <uint8> │ ┃ ┃ ║
|
||||
║ ┃┃├─────────────────────────────┤ ┃ ┃ ║
|
||||
║ ┃┃│ Tag Block Offset <uint64> │ ┃ ┃ ║
|
||||
║ ┃┃├─────────────────────────────┤ ┃ ┃ ║
|
||||
║ ┃┃│ len(Name) <varint> │ ┃ ┃ ║
|
||||
║ ┃┃├─────────────────────────────┤ ┃ ┃ ║
|
||||
║ ┃┃│ Name <byte...> │ ┃ ┃ ║
|
||||
║ ┃┃├─────────────────────────────┤ ┃ ┃ ║
|
||||
║ ┃┃│ len(Series) <uint32> │ ┃ ┃ ║
|
||||
║ ┃┃├─────────────────────────────┤ ┃ ┃ ║
|
||||
║ ┃┃│ SeriesIDs <uint32...> │ ┃ ┃ ║
|
||||
║ ┃┃└─────────────────────────────┘ ┃ ┃ ║
|
||||
║ ┃┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛ ┃ ║
|
||||
║ ┃ ... ┃ ║
|
||||
║ ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛ ║
|
||||
║ ┏━━━━━━━━━━━━Hash Index━━━━━━━━━━━━━┓ ║
|
||||
║ ┃ ┌───────────────────────────────┐ ┃ ║
|
||||
║ ┃ │ len(Measurements) <uint32> │ ┃ ║
|
||||
║ ┃ ├───────────────────────────────┤ ┃ ║
|
||||
║ ┃ │ Measurement Offset <uint64> │ ┃ ║
|
||||
║ ┃ ├───────────────────────────────┤ ┃ ║
|
||||
║ ┃ │ ... │ ┃ ║
|
||||
║ ┃ └───────────────────────────────┘ ┃ ║
|
||||
║ ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛ ║
|
||||
║ ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ ║
|
||||
║ ┃ Sketch ┃ ║
|
||||
║ ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛ ║
|
||||
║ ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ ║
|
||||
║ ┃ Tombstone Sketch ┃ ║
|
||||
║ ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛ ║
|
||||
║ ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ ║
|
||||
║ ┃ Trailer ┃ ║
|
||||
║ ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛ ║
|
||||
╚═══════════════════════════════════════╝
|
||||
The write-ahead log file into which series are first inserted appends
|
||||
all new operations sequentially. It is composed of a sequence of log
|
||||
entries. An entry contains a flag to specify the operation type, the measurement
|
||||
name, the tag set, and a checksum.
|
||||
|
||||
╔════════════Trailer══════════════╗
|
||||
║ ┌─────────────────────────────┐ ║
|
||||
║ │ Block Offset <uint64> │ ║
|
||||
║ ├─────────────────────────────┤ ║
|
||||
║ │ Block Size <uint64> │ ║
|
||||
║ ├─────────────────────────────┤ ║
|
||||
║ │ Hash Index Offset <uint64> │ ║
|
||||
║ ├─────────────────────────────┤ ║
|
||||
║ │ Hash Index Size <uint64> │ ║
|
||||
║ ├─────────────────────────────┤ ║
|
||||
║ │ Sketch Offset <uint64> │ ║
|
||||
║ ├─────────────────────────────┤ ║
|
||||
║ │ Sketch Size <uint64> │ ║
|
||||
║ ├─────────────────────────────┤ ║
|
||||
║ │ Tomb Sketch Offset <uint64> │ ║
|
||||
║ ├─────────────────────────────┤ ║
|
||||
║ │ Tomb Sketch Size <uint64> │ ║
|
||||
║ ├─────────────────────────────┤ ║
|
||||
║ │ Block Version <uint16> │ ║
|
||||
║ └─────────────────────────────┘ ║
|
||||
╚═════════════════════════════════╝
|
||||
┏━━━━━━━━━LogEntry━━━━━━━━━┓
|
||||
┃ ┌──────────────────────┐ ┃
|
||||
┃ │ Flag │ ┃
|
||||
┃ ├──────────────────────┤ ┃
|
||||
┃ │ Measurement │ ┃
|
||||
┃ ├──────────────────────┤ ┃
|
||||
┃ │ Key/Value │ ┃
|
||||
┃ ├──────────────────────┤ ┃
|
||||
┃ │ Key/Value │ ┃
|
||||
┃ ├──────────────────────┤ ┃
|
||||
┃ │ Key/Value │ ┃
|
||||
┃ ├──────────────────────┤ ┃
|
||||
┃ │ Checksum │ ┃
|
||||
┃ └──────────────────────┘ ┃
|
||||
┗━━━━━━━━━━━━━━━━━━━━━━━━━━┛
|
||||
|
||||
When the log file is replayed, if the checksum is incorrect or the entry is
|
||||
incomplete (because of a partially failed write) then the log is truncated.
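The replay rule above can be sketched in Go. This is a minimal sketch, not the
tsi1 implementation: the entry follows the fields described above (flag,
measurement name, tag set, checksum), but the exact byte encoding, the use of
CRC-32, and the names appendEntry, readEntry, and validPrefix are assumptions
made for illustration.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/crc32"
)

// appendUvarint appends v to buf as an unsigned varint.
func appendUvarint(buf []byte, v uint64) []byte {
	var tmp [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(tmp[:], v)
	return append(buf, tmp[:n]...)
}

// appendString appends a varint length prefix followed by the bytes of s.
func appendString(buf []byte, s string) []byte {
	buf = appendUvarint(buf, uint64(len(s)))
	return append(buf, s...)
}

// appendEntry appends one hypothetical log entry: flag, name, tag count,
// key/value pairs, then a CRC-32 over all preceding bytes of the entry.
func appendEntry(buf []byte, flag byte, name string, tags [][2]string) []byte {
	start := len(buf)
	buf = append(buf, flag)
	buf = appendString(buf, name)
	buf = appendUvarint(buf, uint64(len(tags)))
	for _, kv := range tags {
		buf = appendString(buf, kv[0])
		buf = appendString(buf, kv[1])
	}
	var sum [4]byte
	binary.BigEndian.PutUint32(sum[:], crc32.ChecksumIEEE(buf[start:]))
	return append(buf, sum[:]...)
}

// readEntry parses the entry at the start of data and returns its size.
// ok is false if the entry is incomplete or its checksum does not match.
func readEntry(data []byte) (size int, ok bool) {
	if len(data) < 1 {
		return 0, false
	}
	pos, fail := 1, false // pos starts after the flag byte
	readUvarint := func() uint64 {
		if fail {
			return 0
		}
		v, n := binary.Uvarint(data[pos:])
		if n <= 0 {
			fail = true
			return 0
		}
		pos += n
		return v
	}
	skipString := func() {
		n := readUvarint()
		if fail || n > uint64(len(data)-pos) {
			fail = true
			return
		}
		pos += int(n)
	}
	skipString() // measurement name
	tagN := readUvarint()
	for i := uint64(0); i < tagN && !fail; i++ {
		skipString() // tag key
		skipString() // tag value
	}
	if fail || len(data)-pos < 4 {
		return 0, false
	}
	if crc32.ChecksumIEEE(data[:pos]) != binary.BigEndian.Uint32(data[pos:]) {
		return 0, false
	}
	return pos + 4, true
}

// validPrefix returns the number of leading bytes that hold complete,
// checksum-valid entries. Replay would truncate the log to this length.
func validPrefix(data []byte) int {
	pos := 0
	for pos < len(data) {
		size, ok := readEntry(data[pos:])
		if !ok {
			break
		}
		pos += size
	}
	return pos
}

func main() {
	wal := appendEntry(nil, 1, "cpu", [][2]string{{"host", "a"}})
	wal = appendEntry(wal, 1, "mem", [][2]string{{"host", "b"}})
	good := len(wal)
	wal = append(wal, 1, 3, 'c') // simulate a partially written third entry
	fmt.Println("valid bytes:", validPrefix(wal), "of", len(wal), "expected", good)
}
```

Stopping at the first bad entry, rather than skipping it, keeps the replayed
state a strict prefix of what was written.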
|
||||
|
||||
|
||||
WAL
|
||||
Index File Layout
|
||||
|
||||
╔═════════════WAL══════════════╗
|
||||
║ ║
|
||||
║ ┏━━━━━━━━━━Entry━━━━━━━━━━━┓ ║
|
||||
║ ┃ ┌──────────────────────┐ ┃ ║
|
||||
║ ┃ │ Flag <uint8> │ ┃ ║
|
||||
║ ┃ ├──────────────────────┤ ┃ ║
|
||||
║ ┃ │ len(Name) <varint> │ ┃ ║
|
||||
║ ┃ ├──────────────────────┤ ┃ ║
|
||||
║ ┃ │ Name <byte...> │ ┃ ║
|
||||
║ ┃ ├──────────────────────┤ ┃ ║
|
||||
║ ┃ │ len(Tags) <varint> │ ┃ ║
|
||||
║ ┃ ├──────────────────────┤ ┃ ║
|
||||
║ ┃ │ len(Key0) <varint> │ ┃ ║
|
||||
║ ┃ ├──────────────────────┤ ┃ ║
|
||||
║ ┃ │ Key0 <byte...> │ ┃ ║
|
||||
║ ┃ ├──────────────────────┤ ┃ ║
|
||||
║ ┃ │ len(Value0) <varint> │ ┃ ║
|
||||
║ ┃ ├──────────────────────┤ ┃ ║
|
||||
║ ┃ │ Value0 <byte...> │ ┃ ║
|
||||
║ ┃ ├──────────────────────┤ ┃ ║
|
||||
║ ┃ │ ... │ ┃ ║
|
||||
║ ┃ ├──────────────────────┤ ┃ ║
|
||||
║ ┃ │ Checksum <uint32> │ ┃ ║
|
||||
║ ┃ └──────────────────────┘ ┃ ║
|
||||
║ ┗━━━━━━━━━━━━━━━━━━━━━━━━━━┛ ║
|
||||
║ ... ║
|
||||
╚══════════════════════════════╝
|
||||
The index file is composed of three main block types: one series block, one or more
|
||||
tag blocks, and one measurement block. At the end of the index file is a
|
||||
trailer that records metadata such as the offsets to these blocks.
|
||||
|
||||
|
||||
Series Block Layout
|
||||
|
||||
The series block stores raw series keys in sorted order. It also provides hash
|
||||
indexes so that series can be looked up quickly. Hash indexes are inserted
|
||||
periodically so that memory size is limited at write time. Once all the series
|
||||
and hash indexes have been written, a list of index entries is written
|
||||
so that hash indexes can be looked up via binary search.
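The binary search over index entries might look like the following sketch. It
assumes, purely for illustration, that each index entry records the smallest
series key covered by one hash index together with that hash index's offset;
the indexEntry type and the findHashIndex function are hypothetical names and
do not come from the package.

```go
package main

import (
	"fmt"
	"sort"
)

// indexEntry is a hypothetical index entry: the smallest series key covered
// by one hash index and the offset of that hash index within the block.
type indexEntry struct {
	minKey string
	offset int64
}

// findHashIndex returns the offset of the hash index whose key range could
// contain key, or -1 if key sorts before every entry. Entries must be sorted
// by minKey, which holds when series keys are written in sorted order.
func findHashIndex(entries []indexEntry, key string) int64 {
	// Find the first entry whose minKey is greater than key.
	// The entry just before it is the one that covers key.
	i := sort.Search(len(entries), func(i int) bool {
		return entries[i].minKey > key
	})
	if i == 0 {
		return -1
	}
	return entries[i-1].offset
}

func main() {
	entries := []indexEntry{
		{"cpu,host=a", 4096},
		{"disk,host=a", 8192},
		{"mem,host=a", 12288},
	}
	fmt.Println(findHashIndex(entries, "cpu,host=z"))
}
```

Only the small list of index entries is searched this way; the lookup inside
the selected hash index is a separate step that this sketch leaves out.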
|
||||
|
||||
The end of the block contains two HyperLogLog++ sketches which track the
|
||||
estimated number of created series and deleted series. The sketches are followed by
|
||||
a trailer which contains metadata about the block.
|
||||
|
||||
┏━━━━━━━SeriesBlock━━━━━━━━┓
|
||||
┃ ┌──────────────────────┐ ┃
|
||||
┃ │ Series Key │ ┃
|
||||
┃ ├──────────────────────┤ ┃
|
||||
┃ │ Series Key │ ┃
|
||||
┃ ├──────────────────────┤ ┃
|
||||
┃ │ Series Key │ ┃
|
||||
┃ ├──────────────────────┤ ┃
|
||||
┃ │ │ ┃
|
||||
┃ │ Hash Index │ ┃
|
||||
┃ │ │ ┃
|
||||
┃ ├──────────────────────┤ ┃
|
||||
┃ │ Series Key │ ┃
|
||||
┃ ├──────────────────────┤ ┃
|
||||
┃ │ Series Key │ ┃
|
||||
┃ ├──────────────────────┤ ┃
|
||||
┃ │ Series Key │ ┃
|
||||
┃ ├──────────────────────┤ ┃
|
||||
┃ │ │ ┃
|
||||
┃ │ Hash Index │ ┃
|
||||
┃ │ │ ┃
|
||||
┃ ├──────────────────────┤ ┃
|
||||
┃ │ Index Entries │ ┃
|
||||
┃ ├──────────────────────┤ ┃
|
||||
┃ │ HLL Sketches │ ┃
|
||||
┃ ├──────────────────────┤ ┃
|
||||
┃ │ Trailer │ ┃
|
||||
┃ └──────────────────────┘ ┃
|
||||
┗━━━━━━━━━━━━━━━━━━━━━━━━━━┛
|
||||
|
||||
|
||||
Tag Block Layout
|
||||
|
||||
After the series block come one or more tag blocks. One of these blocks exists
|
||||
for every measurement in the index file. The block is structured as a sorted
|
||||
list of values for each key and then a sorted list of keys. Each of these lists
|
||||
has its own hash index for fast direct lookups.
|
||||
|
||||
┏━━━━━━━━Tag Block━━━━━━━━━┓
|
||||
┃ ┌──────────────────────┐ ┃
|
||||
┃ │ Value │ ┃
|
||||
┃ ├──────────────────────┤ ┃
|
||||
┃ │ Value │ ┃
|
||||
┃ ├──────────────────────┤ ┃
|
||||
┃ │ Value │ ┃
|
||||
┃ ├──────────────────────┤ ┃
|
||||
┃ │ │ ┃
|
||||
┃ │ Hash Index │ ┃
|
||||
┃ │ │ ┃
|
||||
┃ └──────────────────────┘ ┃
|
||||
┃ ┌──────────────────────┐ ┃
|
||||
┃ │ Value │ ┃
|
||||
┃ ├──────────────────────┤ ┃
|
||||
┃ │ Value │ ┃
|
||||
┃ ├──────────────────────┤ ┃
|
||||
┃ │ │ ┃
|
||||
┃ │ Hash Index │ ┃
|
||||
┃ │ │ ┃
|
||||
┃ └──────────────────────┘ ┃
|
||||
┃ ┌──────────────────────┐ ┃
|
||||
┃ │ Key │ ┃
|
||||
┃ ├──────────────────────┤ ┃
|
||||
┃ │ Key │ ┃
|
||||
┃ ├──────────────────────┤ ┃
|
||||
┃ │ │ ┃
|
||||
┃ │ Hash Index │ ┃
|
||||
┃ │ │ ┃
|
||||
┃ └──────────────────────┘ ┃
|
||||
┃ ┌──────────────────────┐ ┃
|
||||
┃ │ Trailer │ ┃
|
||||
┃ └──────────────────────┘ ┃
|
||||
┗━━━━━━━━━━━━━━━━━━━━━━━━━━┛
|
||||
|
||||
Each entry for values contains a sorted list of offsets for series keys that use
|
||||
that value. Series iterators can be built around a single tag key value or
|
||||
multiple iterators can be merged with set operators such as union or
|
||||
intersection.
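Because each list of series offsets is sorted, union and intersection can be
computed in a single pass over both inputs. The sketch below works on plain
slices rather than the package's iterators; the function names and the uint64
offset type are assumptions for illustration, and each input is assumed to
contain no duplicates.

```go
package main

import "fmt"

// intersect returns the offsets present in both sorted slices.
func intersect(a, b []uint64) []uint64 {
	var out []uint64
	for i, j := 0, 0; i < len(a) && j < len(b); {
		switch {
		case a[i] < b[j]:
			i++
		case a[i] > b[j]:
			j++
		default: // equal: the offset is in both lists
			out = append(out, a[i])
			i++
			j++
		}
	}
	return out
}

// union returns the offsets present in either sorted slice, each one once.
func union(a, b []uint64) []uint64 {
	var out []uint64
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		switch {
		case a[i] < b[j]:
			out = append(out, a[i])
			i++
		case a[i] > b[j]:
			out = append(out, b[j])
			j++
		default: // equal: emit once and advance both
			out = append(out, a[i])
			i++
			j++
		}
	}
	// At most one of the inputs still has elements left.
	out = append(out, a[i:]...)
	return append(out, b[j:]...)
}

func main() {
	hostA := []uint64{1, 3, 5, 7}    // series with host=a (made-up offsets)
	regionW := []uint64{3, 4, 7, 9}  // series with region=west (made-up offsets)
	fmt.Println(intersect(hostA, regionW))
	fmt.Println(union(hostA, regionW))
}
```

The output of either function is itself sorted, so results can be fed into
further set operations without re-sorting.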
|
||||
|
||||
|
||||
Measurement Block Layout
|
||||
|
||||
The measurement block stores a sorted list of measurements, their associated
|
||||
series offsets, and the offset to their tag block. This allows all series for
|
||||
a measurement to be traversed quickly and it allows fast direct lookups of
|
||||
measurements and their tags.
|
||||
|
||||
This block also contains HyperLogLog++ sketches for new and deleted
|
||||
measurements.
|
||||
|
||||
┏━━━━Measurement Block━━━━━┓
|
||||
┃ ┌──────────────────────┐ ┃
|
||||
┃ │ Measurement │ ┃
|
||||
┃ ├──────────────────────┤ ┃
|
||||
┃ │ Measurement │ ┃
|
||||
┃ ├──────────────────────┤ ┃
|
||||
┃ │ Measurement │ ┃
|
||||
┃ ├──────────────────────┤ ┃
|
||||
┃ │ │ ┃
|
||||
┃ │ Hash Index │ ┃
|
||||
┃ │ │ ┃
|
||||
┃ ├──────────────────────┤ ┃
|
||||
┃ │ HLL Sketches │ ┃
|
||||
┃ ├──────────────────────┤ ┃
|
||||
┃ │ Trailer │ ┃
|
||||
┃ └──────────────────────┘ ┃
|
||||
┗━━━━━━━━━━━━━━━━━━━━━━━━━━┛
|
||||
|
||||
|
||||
Manifest file
|
||||
|
||||
The index is simply an ordered set of log and index files. These files can be
|
||||
merged together or rewritten but their order must always be the same. This is
|
||||
because series, measurements, and tags can be marked as deleted (tombstoned)
|
||||
and this action needs to be tracked in time order.
|
||||
|
||||
Whenever the set of active files is changed, a manifest file is written to
|
||||
track the set. The manifest specifies the ordering of files and, on startup,
|
||||
all files not in the manifest are removed from the index directory.
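The startup cleanup can be sketched as a set difference between the directory
listing and the manifest. The staleFiles function and the file names below are
hypothetical, and the sketch assumes the directory listing contains only log
and index files (not the manifest itself).

```go
package main

import (
	"fmt"
	"sort"
)

// staleFiles returns the names in dir that the manifest does not list.
// These are the files that would be removed on startup.
func staleFiles(manifest, dir []string) []string {
	active := make(map[string]struct{}, len(manifest))
	for _, name := range manifest {
		active[name] = struct{}{}
	}
	var stale []string
	for _, name := range dir {
		if _, ok := active[name]; !ok {
			stale = append(stale, name)
		}
	}
	sort.Strings(stale) // deterministic order for logging or testing
	return stale
}

func main() {
	// Made-up file names; the manifest order is the required file order.
	manifest := []string{"0003.log", "0002.idx"}
	dir := []string{"0001.idx", "0002.idx", "0003.log", "0002.idx.compacting"}
	fmt.Println(staleFiles(manifest, dir))
}
```

Files left behind by an interrupted compaction are never listed in the
manifest, so they fall out of this difference and are cleaned up.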
|
||||
|
||||
|
||||
Compacting index files
|
||||
|
||||
Compaction is the process of taking files and merging them together into a
|
||||
single file. There are two stages of compaction within TSI.
|
||||
|
||||
First, once a log file exceeds a size threshold, it is compacted into an
|
||||
index file. This threshold is relatively small because log files must maintain
|
||||
their index on the heap, which TSI tries to avoid. Small log files are also very
|
||||
quick to convert into an index file so this is done aggressively.
|
||||
|
||||
Second, once a contiguous set of index files exceeds a factor (e.g. 10x),
|
||||
they are all merged together into a single index file and the old files are
|
||||
discarded. Because all blocks are written in sorted order, the new index file
|
||||
can be streamed, which minimizes memory use.
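The reason sorted inputs allow streaming is that a merge only ever needs the
current head of each input. The sketch below merges several sorted key lists
and emits each distinct key once; slices stand in for files read sequentially,
tombstone handling is ignored, and mergeSorted is a hypothetical name rather
than a function of the package.

```go
package main

import "fmt"

// mergeSorted emits the distinct keys of several sorted inputs in sorted
// order. It keeps one cursor per input and never buffers the merged output.
func mergeSorted(inputs [][]string, emit func(key string)) {
	pos := make([]int, len(inputs))
	for {
		// Find the smallest key at the head of any input.
		smallest, found := "", false
		for i, in := range inputs {
			if pos[i] < len(in) && (!found || in[pos[i]] < smallest) {
				smallest, found = in[pos[i]], true
			}
		}
		if !found {
			return // every input is exhausted
		}
		emit(smallest)
		// Advance every input past this key so duplicates are emitted once.
		for i, in := range inputs {
			for pos[i] < len(in) && in[pos[i]] == smallest {
				pos[i]++
			}
		}
	}
}

func main() {
	files := [][]string{
		{"cpu,host=a", "mem,host=a"},
		{"cpu,host=a", "cpu,host=b"},
		{"disk,host=a"},
	}
	mergeSorted(files, func(key string) { fmt.Println(key) })
}
```

With many inputs, a heap of cursors would replace the linear scan for the
smallest head, but the memory profile stays the same: one cursor per file.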
|
||||
|
||||
|
||||
Concurrency
|
||||
|
||||
Index files are immutable, so they do not require fine-grained locks. However,
|
||||
compactions require that we track which files are in use so they are not
|
||||
discarded too soon. This is done by using reference counting with file sets.
|
||||
|
||||
A file set is simply an ordered list of index files. When the current file set
|
||||
is obtained from the index, a counter is incremented to track its usage. Once
|
||||
the user is done with the file set, it is released and the counter is
|
||||
decremented. A file cannot be removed from the file system until this counter
|
||||
returns to zero.
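A minimal sketch of this reference-counting scheme follows. The fileSet type,
its method names, and the remove callback are assumptions for illustration;
the package's own types may be organized differently.

```go
package main

import (
	"fmt"
	"sync"
)

// fileSet is a hypothetical reference-counted list of index files.
type fileSet struct {
	mu       sync.Mutex
	refs     int
	obsolete bool              // set once a compaction has replaced the set
	files    []string
	remove   func(name string) // called once a file is safe to delete
}

// Retain records one more user of the file set.
func (fs *fileSet) Retain() {
	fs.mu.Lock()
	defer fs.mu.Unlock()
	fs.refs++
}

// Release records that one user is done with the file set.
func (fs *fileSet) Release() {
	fs.mu.Lock()
	defer fs.mu.Unlock()
	fs.refs--
	fs.removeIfUnused()
}

// MarkObsolete is called after a compaction supersedes the file set.
func (fs *fileSet) MarkObsolete() {
	fs.mu.Lock()
	defer fs.mu.Unlock()
	fs.obsolete = true
	fs.removeIfUnused()
}

// removeIfUnused deletes the files only when the set is obsolete and the
// counter has returned to zero. The caller must hold fs.mu.
func (fs *fileSet) removeIfUnused() {
	if !fs.obsolete || fs.refs > 0 {
		return
	}
	for _, name := range fs.files {
		fs.remove(name)
	}
	fs.files = nil
}

func main() {
	fs := &fileSet{
		files:  []string{"0001.idx", "0002.idx"}, // made-up names
		remove: func(name string) { fmt.Println("removing", name) },
	}
	fs.Retain()       // a query starts using the file set
	fs.MarkObsolete() // a compaction finishes; files must survive for now
	fs.Release()      // the query finishes; files can now be removed
}
```

The mutex here protects only the counter and flag, not the file contents,
which matches the point above that immutable index files need no data locks.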
|
||||
|
||||
Besides the reference counting, there are no other locking mechanisms when
|
||||
reading or writing index files. Log files, however, do require a lock whenever
|
||||
they are accessed. This is another reason to minimize log file size.
|
||||
|
||||
|
||||
*/
|
||||