Continue work on storage engine doc
parent 761220c90a
commit fe88a9d6e4
@@ -30,6 +30,7 @@ Major topics include:
The storage engine handles data from the point an API request is received through writing it to the physical disk.

Data is written to InfluxDB using [line protocol](/v2.0/reference/line-) sent via HTTP POST request to the `/write` endpoint.
Batches of [points](/v2.0/reference/glossary/#point) are sent to InfluxDB, compressed, and written to a WAL for immediate durability.
(A *point* is a series key, field value, and timestamp.)
The points are also written to an in-memory cache and become immediately queryable.
The cache is periodically written to disk in the form of [TSM](#time-structured-merge-tree-tsm) files.
As TSM files accumulate, they are combined and compacted into higher-level TSM files.
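As a rough illustration of that first step, here is a minimal Go sketch that POSTs a small batch of line protocol to the `/write` endpoint. The host, port, and example points are assumptions for illustration; a real deployment also needs authentication and organization/bucket parameters.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

func main() {
	// Two points in line protocol:
	// measurement,tag_key=tag_value field_key=field_value timestamp
	batch := strings.Join([]string{
		"cpu,host=server01 usage=0.64 1556813561098000000",
		"cpu,host=server02 usage=0.25 1556813561098000000",
	}, "\n")

	// POST the batch to the /write endpoint; the URL is illustrative.
	resp, err := http.Post(
		"http://localhost:8086/write",
		"text/plain; charset=utf-8",
		strings.NewReader(batch),
	)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```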
@@ -50,21 +51,11 @@ When a client sends a write request, the following occurs:
4. Return success to caller.

`fsync()` takes the file and pushes pending writes all the way through any buffers and caches to disk.
As a system call, `fsync()` has a kernel context switch which is expensive in terms of time, but guarantees your data is safe on disk.

{{% note %}}
To `fsync()` less frequently, batch your points.
{{% /note %}}
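A minimal sketch of that batching pattern in Go, assuming a simple append-only WAL file (the file name and entry encoding are illustrative):

```go
package main

import "os"

// appendBatch writes a whole batch of encoded entries to the WAL and
// then calls fsync once, so the cost of the system call is amortized
// across every point in the batch.
func appendBatch(wal *os.File, batch [][]byte) error {
	for _, entry := range batch {
		if _, err := wal.Write(entry); err != nil {
			return err
		}
	}
	// Push pending writes through OS buffers and caches to disk.
	return wal.Sync()
}

func main() {
	wal, err := os.OpenFile("_000001.wal", os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0o644)
	if err != nil {
		panic(err)
	}
	defer wal.Close()

	batch := [][]byte{[]byte("point-1\n"), []byte("point-2\n")}
	if err := appendBatch(wal, batch); err != nil {
		panic(err)
	}
}
```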
When the storage engine restarts, the WAL file is read back into the in-memory database.
InfluxDB then answers requests to the `/read` endpoint.
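A rough sketch of that replay step, assuming a toy newline-delimited `seriesKey value` entry format (the real WAL stores compressed binary blocks):

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

func main() {
	wal, err := os.Open("_000001.wal")
	if err != nil {
		panic(err)
	}
	defer wal.Close()

	// Rebuild the in-memory cache: series key -> field values.
	cache := map[string][]string{}
	scanner := bufio.NewScanner(wal)
	for scanner.Scan() {
		parts := strings.SplitN(scanner.Text(), " ", 2)
		if len(parts) == 2 {
			cache[parts[0]] = append(cache[parts[0]], parts[1])
		}
	}
	fmt.Println("series restored:", len(cache))
}
```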
<!-- TODO is this still true? -->
<!-- On the file system, the WAL is made up of sequentially numbered files (`_000001.wal`). -->
<!-- The file numbers are monotonically increasing and referred to as WAL segments. -->
<!-- When a segment reaches 10MB in size, it is closed and a new one is opened. Each WAL segment stores multiple compressed blocks of writes and deletes. -->
<!-- Each entry in the WAL follows a [TLV standard](https://en.wikipedia.org/wiki/Type-length-value) with a single byte representing the type of entry (write or delete), a 4 byte `uint32` for the length of the compressed block, and then the compressed block. -->
{{% note %}}
Once you receive a response to a write request, your data is on disk!
{{% /note %}}
@@ -78,22 +69,20 @@ Data is not compressed in the cache.
The cache is recreated on restart by re-reading the WAL files on disk back into memory.
The cache is queried at runtime and merged with the data stored in TSM files.

Queries execute on a copy of the data that is made from the cache at query processing time.
This way, writes that come in while a query is running do not affect the result.
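A minimal sketch of that copy-on-query idea (the cache structure and value types are illustrative assumptions, not the engine's actual types):

```go
package main

import (
	"fmt"
	"sync"
)

type Cache struct {
	mu     sync.RWMutex
	series map[string][]float64 // series key -> field values
}

// Snapshot copies one series' values under a read lock; the query then
// runs against the copy, so concurrent writes cannot change its result.
func (c *Cache) Snapshot(key string) []float64 {
	c.mu.RLock()
	defer c.mu.RUnlock()
	vals := make([]float64, len(c.series[key]))
	copy(vals, c.series[key])
	return vals
}

func main() {
	c := &Cache{series: map[string][]float64{
		"cpu,host=server01": {0.64, 0.25},
	}}
	fmt.Println(c.Snapshot("cpu,host=server01"))
}
```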
Deletes sent to the cache will clear out the given key or the specific time range for the given key.
## Time-Structured Merge Tree (TSM)
To efficiently compact and store data,
the storage engine groups field values by [series](/v2.0/reference/key-concepts/data-elements/#series) key,
and then orders those field values by time.
(A *series key* is defined by measurement, tag key and value, and field key.)

The storage engine uses a **Time-Structured Merge Tree** (TSM) data format.
TSM files store compressed series data in a columnar format.
@@ -101,13 +90,7 @@ To improve efficiency, the storage engine only stores differences (or *deltas*)
Storing data in columns lets the storage engine read by series key and ignore anything it doesn't need.
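For instance, timestamps in a block can be stored as differences from the previous timestamp; for regularly sampled series, the deltas are small, repetitive values that compress far better than raw 64-bit timestamps. A sketch with illustrative values:

```go
package main

import "fmt"

// deltaEncode keeps the first timestamp as-is and stores each later
// one as the difference from its predecessor.
func deltaEncode(ts []int64) []int64 {
	if len(ts) == 0 {
		return nil
	}
	out := make([]int64, len(ts))
	out[0] = ts[0]
	for i := 1; i < len(ts); i++ {
		out[i] = ts[i] - ts[i-1]
	}
	return out
}

func main() {
	ts := []int64{1556813561, 1556813571, 1556813581, 1556813591}
	fmt.Println(deltaEncode(ts)) // [1556813561 10 10 10]
}
```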
After fields are stored safely in TSM files, the WAL is truncated and the cache is cleared.
<!-- TODO what next? -->
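In outline, that cleanup step might look like the following sketch (the WAL path and cache type are assumptions for illustration):

```go
package main

import "os"

// afterSnapshot runs once the cache contents are safely in a TSM file:
// truncate the WAL and return a fresh, empty cache.
func afterSnapshot(walPath string, cache map[string][]float64) (map[string][]float64, error) {
	if err := os.Truncate(walPath, 0); err != nil {
		return cache, err
	}
	return map[string][]float64{}, nil
}

func main() {
	cache := map[string][]float64{"cpu,host=server01": {0.64}}
	cache, err := afterSnapshot("_000001.wal", cache)
	if err != nil {
		panic(err)
	}
	_ = cache
}
```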
There’s a lot of logic and sophistication in the TSM compaction code.
@@ -116,7 +99,7 @@ organize values for a series together into long runs to best optimize compression
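As a toy illustration of the "long runs" goal mentioned above, the sketch below merges several small time-ordered blocks for one series into a single time-sorted run (the types are illustrative, not the engine's):

```go
package main

import (
	"fmt"
	"sort"
)

type point struct {
	ts  int64
	val float64
}

// compact merges several time-ordered blocks for one series key into a
// single long, time-sorted run, so similar values sit together and
// compress well.
func compact(blocks [][]point) []point {
	var run []point
	for _, b := range blocks {
		run = append(run, b...)
	}
	sort.Slice(run, func(i, j int) bool { return run[i].ts < run[j].ts })
	return run
}

func main() {
	merged := compact([][]point{
		{{10, 1.0}, {30, 3.0}},
		{{20, 2.0}, {40, 4.0}},
	})
	fmt.Println(merged) // [{10 1} {20 2} {30 3} {40 4}]
}
```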
## Time Series Index (TSI)
As data cardinality (the number of series) grows, queries read more series keys and become slower.

The **Time Series Index** (TSI) keeps queries fast as data cardinality grows.
@@ -127,28 +110,3 @@ The TSI stores series keys grouped by measurement, tag, and field.
The TSI answers two questions well:

1. What measurements, tags, and fields exist?
2. Given a measurement, tags, and fields, what series keys exist?
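A toy index in that spirit, using nested maps from measurement and tag pairs to series keys (the structure and names are illustrative, not the on-disk TSI format):

```go
package main

import "fmt"

// index maps measurement -> tag key -> tag value -> series keys,
// so both questions can be answered without scanning raw data.
type index map[string]map[string]map[string][]string

func (ix index) add(measurement, tagKey, tagValue, seriesKey string) {
	if ix[measurement] == nil {
		ix[measurement] = map[string]map[string][]string{}
	}
	if ix[measurement][tagKey] == nil {
		ix[measurement][tagKey] = map[string][]string{}
	}
	ix[measurement][tagKey][tagValue] = append(ix[measurement][tagKey][tagValue], seriesKey)
}

// seriesFor answers question 2: which series keys exist for a given
// measurement and tag pair?
func (ix index) seriesFor(measurement, tagKey, tagValue string) []string {
	return ix[measurement][tagKey][tagValue]
}

func main() {
	ix := index{}
	ix.add("cpu", "host", "server01", "cpu,host=server01 usage")
	fmt.Println(ix.seriesFor("cpu", "host", "server01"))
}
```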
<!-- ## Shards -->
<!-- A shard contains: -->
<!-- WAL files -->
<!-- TSM files -->
<!-- TSI files -->
<!-- Shards are time-bounded -->
<!-- Retention policies have properties: duration and shard duration -->
<!-- colder shards get more compacted -->

<!-- =========== QUESTIONS -->
<!-- Which parts of cache and WAL are configurable? -->
<!-- Should we even mention shards? -->

<!-- =========== OTHER -->
<!-- V1 -->
<!-- - FileStore - The FileStore mediates access to all TSM files on disk. -->
<!-- It ensures that TSM files are installed atomically when existing ones are replaced as well as removing TSM files that are no longer used. -->
<!-- - Compactor - The Compactor is responsible for converting less optimized Cache and TSM data into more read-optimized formats. -->
<!-- It does this by compressing series, removing deleted data, optimizing indices and combining smaller files into larger ones. -->
<!-- - Compaction Planner - The Compaction Planner determines which TSM files are ready for a compaction and ensures that multiple concurrent compactions do not interfere with each other. -->
<!-- - Compression - Compression is handled by various Encoders and Decoders for specific data types. -->
<!-- Some encoders are fairly static and always encode the same type the same way; -->
<!-- others switch their compression strategy based on the shape of the data. -->
<!-- - Writers/Readers - Each file type (WAL segment, TSM files, tombstones, etc.) has Writers and Readers for working with the formats. -->