Continue work on storage engine doc
parent 761220c90a
commit fe88a9d6e4
@@ -30,6 +30,7 @@ Major topics include:
The storage engine handles data from the point an API request is received through writing it to the physical disk.

Data is written to InfluxDB using [line protocol](/v2.0/reference/line-) sent via HTTP POST request to the `/write` endpoint.
Batches of [points](/v2.0/reference/glossary/#point) are sent to InfluxDB, compressed, and written to a WAL for immediate durability.
(A *point* is a series key, field value, and timestamp.)
The points are also written to an in-memory cache and become immediately queryable.
The cache is periodically written to disk in the form of [TSM](#time-structured-merge-tree-tsm) files.
As TSM files accumulate, they are combined and compacted into higher-level TSM files.
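As a rough illustration of that first step, here is a minimal Go sketch that POSTs a small batch of line protocol to the `/write` endpoint. The host, port, and example points are assumptions for illustration; a real deployment also needs authentication and organization/bucket parameters.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

func main() {
	// Two points in line protocol:
	// measurement,tag_key=tag_value field_key=field_value timestamp
	batch := strings.Join([]string{
		"cpu,host=server01 usage=0.64 1556813561098000000",
		"cpu,host=server02 usage=0.25 1556813561098000000",
	}, "\n")

	// POST the batch to the /write endpoint; the URL is illustrative.
	resp, err := http.Post(
		"http://localhost:8086/write",
		"text/plain; charset=utf-8",
		strings.NewReader(batch),
	)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```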
@@ -50,21 +51,11 @@ When a client sends a write request, the following occurs:
4. Return success to caller.

`fsync()` takes the file and pushes pending writes all the way through any buffers and caches to disk.
As a system call, `fsync()` has a kernel context switch which is expensive in terms of time, but guarantees your data is safe on disk.

{{% note %}}
To `fsync()` less frequently, batch your points.
{{% /note %}}
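A minimal sketch of that batching pattern in Go, assuming a simple append-only WAL file (the file name and entry encoding are illustrative):

```go
package main

import "os"

// appendBatch writes a whole batch of encoded entries to the WAL and
// then calls fsync once, so the cost of the system call is amortized
// across every point in the batch.
func appendBatch(wal *os.File, batch [][]byte) error {
	for _, entry := range batch {
		if _, err := wal.Write(entry); err != nil {
			return err
		}
	}
	// Push pending writes through OS buffers and caches to disk.
	return wal.Sync()
}

func main() {
	wal, err := os.OpenFile("_000001.wal", os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0o644)
	if err != nil {
		panic(err)
	}
	defer wal.Close()

	batch := [][]byte{[]byte("point-1\n"), []byte("point-2\n")}
	if err := appendBatch(wal, batch); err != nil {
		panic(err)
	}
}
```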
When the storage engine restarts, the WAL file is read back into the in-memory database.
InfluxDB then answers requests to the `/read` endpoint.
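A rough sketch of that replay step, assuming a toy newline-delimited `seriesKey value` entry format (the real WAL stores compressed binary blocks):

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

func main() {
	wal, err := os.Open("_000001.wal")
	if err != nil {
		panic(err)
	}
	defer wal.Close()

	// Rebuild the in-memory cache: series key -> field values.
	cache := map[string][]string{}
	scanner := bufio.NewScanner(wal)
	for scanner.Scan() {
		parts := strings.SplitN(scanner.Text(), " ", 2)
		if len(parts) == 2 {
			cache[parts[0]] = append(cache[parts[0]], parts[1])
		}
	}
	fmt.Println("series restored:", len(cache))
}
```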
<!-- TODO is this still true? -->
<!-- On the file system, the WAL is made up of sequentially numbered files (`_000001.wal`). -->
<!-- The file numbers are monotonically increasing and referred to as WAL segments. -->
<!-- When a segment reaches 10MB in size, it is closed and a new one is opened. Each WAL segment stores multiple compressed blocks of writes and deletes. -->
<!-- Each entry in the WAL follows a [TLV standard](https://en.wikipedia.org/wiki/Type-length-value) with a single byte representing the type of entry (write or delete), a 4 byte `uint32` for the length of the compressed block, and then the compressed block. -->
{{% note %}}
Once you receive a response to a write request, your data is on disk!
{{% /note %}}
@@ -78,22 +69,20 @@ Data is not compressed in the cache.
The cache is recreated on restart by re-reading the WAL files on disk back into memory.
The cache is queried at runtime and merged with the data stored in TSM files.

Queries execute on a copy of the data that is made from the cache at query processing time.
This way, writes that come in while a query is running do not affect the result.
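A minimal sketch of that copy-on-query idea (the cache structure and value types are illustrative assumptions, not the engine's actual types):

```go
package main

import (
	"fmt"
	"sync"
)

type Cache struct {
	mu     sync.RWMutex
	series map[string][]float64 // series key -> field values
}

// Snapshot copies one series' values under a read lock; the query then
// runs against the copy, so concurrent writes cannot change its result.
func (c *Cache) Snapshot(key string) []float64 {
	c.mu.RLock()
	defer c.mu.RUnlock()
	vals := make([]float64, len(c.series[key]))
	copy(vals, c.series[key])
	return vals
}

func main() {
	c := &Cache{series: map[string][]float64{
		"cpu,host=server01": {0.64, 0.25},
	}}
	fmt.Println(c.Snapshot("cpu,host=server01"))
}
```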
Deletes sent to the cache will clear out the given key or the specific time range for the given key.
## Time-Structured Merge Tree (TSM)
To efficiently compact and store data,
the storage engine groups field values by [series](/v2.0/reference/key-concepts/data-elements/#series) key,
and then orders those field values by time.
(A *series key* is defined by measurement, tag key and value, and field key.)

The storage engine uses a **Time-Structured Merge Tree** (TSM) data format.
TSM files store compressed series data in a columnar format.
@@ -101,13 +90,7 @@ To improve efficiency, the storage engine only stores differences (or *deltas*)
Storing data in columns lets the storage engine read by series key and ignore anything it doesn't need.
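For instance, timestamps in a block can be stored as differences from the previous timestamp; for regularly sampled series, the deltas are small, repetitive values that compress far better than raw 64-bit timestamps. A sketch with illustrative values:

```go
package main

import "fmt"

// deltaEncode keeps the first timestamp as-is and stores each later
// one as the difference from its predecessor.
func deltaEncode(ts []int64) []int64 {
	if len(ts) == 0 {
		return nil
	}
	out := make([]int64, len(ts))
	out[0] = ts[0]
	for i := 1; i < len(ts); i++ {
		out[i] = ts[i] - ts[i-1]
	}
	return out
}

func main() {
	ts := []int64{1556813561, 1556813571, 1556813581, 1556813591}
	fmt.Println(deltaEncode(ts)) // [1556813561 10 10 10]
}
```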
After fields are stored safely in TSM files, the WAL is truncated and the cache is cleared.
<!-- TODO what next? -->
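In outline, that cleanup step might look like the following sketch (the WAL path and cache type are assumptions for illustration):

```go
package main

import "os"

// afterSnapshot runs once the cache contents are safely in a TSM file:
// truncate the WAL and return a fresh, empty cache.
func afterSnapshot(walPath string, cache map[string][]float64) (map[string][]float64, error) {
	if err := os.Truncate(walPath, 0); err != nil {
		return cache, err
	}
	return map[string][]float64{}, nil
}

func main() {
	cache := map[string][]float64{"cpu,host=server01": {0.64}}
	cache, err := afterSnapshot("_000001.wal", cache)
	if err != nil {
		panic(err)
	}
	_ = cache
}
```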
There’s a lot of logic and sophistication in the TSM compaction code.
@@ -116,7 +99,7 @@ organize values for a series together into long runs to best optimize compression
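As a toy illustration of the "long runs" goal mentioned above, the sketch below merges several small time-ordered blocks for one series into a single time-sorted run (the types are illustrative, not the engine's):

```go
package main

import (
	"fmt"
	"sort"
)

type point struct {
	ts  int64
	val float64
}

// compact merges several time-ordered blocks for one series key into a
// single long, time-sorted run, so similar values sit together and
// compress well.
func compact(blocks [][]point) []point {
	var run []point
	for _, b := range blocks {
		run = append(run, b...)
	}
	sort.Slice(run, func(i, j int) bool { return run[i].ts < run[j].ts })
	return run
}

func main() {
	merged := compact([][]point{
		{{10, 1.0}, {30, 3.0}},
		{{20, 2.0}, {40, 4.0}},
	})
	fmt.Println(merged) // [{10 1} {20 2} {30 3} {40 4}]
}
```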
## Time Series Index (TSI)
As data cardinality (the number of series) grows, queries read more series keys and become slower.

The **Time Series Index** (TSI) keeps queries fast as data cardinality grows.
@@ -127,28 +110,3 @@ The TSI stores series keys grouped by measurement, tag, and field.
The TSI answers two questions well:

1. What measurements, tags, and fields exist?
2. Given a measurement, tags, and fields, what series keys exist?
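A toy index in that spirit, using nested maps from measurement and tag pairs to series keys (the structure and names are illustrative, not the on-disk TSI format):

```go
package main

import "fmt"

// index maps measurement -> tag key -> tag value -> series keys,
// so both questions can be answered without scanning raw data.
type index map[string]map[string]map[string][]string

func (ix index) add(measurement, tagKey, tagValue, seriesKey string) {
	if ix[measurement] == nil {
		ix[measurement] = map[string]map[string][]string{}
	}
	if ix[measurement][tagKey] == nil {
		ix[measurement][tagKey] = map[string][]string{}
	}
	ix[measurement][tagKey][tagValue] = append(ix[measurement][tagKey][tagValue], seriesKey)
}

// seriesFor answers question 2: which series keys exist for a given
// measurement and tag pair?
func (ix index) seriesFor(measurement, tagKey, tagValue string) []string {
	return ix[measurement][tagKey][tagValue]
}

func main() {
	ix := index{}
	ix.add("cpu", "host", "server01", "cpu,host=server01 usage")
	fmt.Println(ix.seriesFor("cpu", "host", "server01"))
}
```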
<!-- ## Shards -->
<!-- A shard contains: -->
<!-- WAL files -->
<!-- TSM files -->
<!-- TSI files -->
<!-- Shards are time-bounded -->
<!-- Retention policies have properties: duration and shard duration -->
<!-- colder shards get more compacted -->

<!-- =========== QUESTIONS -->
<!-- Which parts of cache and WAL are configurable? -->
<!-- Should we even mention shards? -->

<!-- =========== OTHER -->
<!-- V1 -->
<!-- - FileStore - The FileStore mediates access to all TSM files on disk. -->
<!-- It ensures that TSM files are installed atomically when existing ones are replaced as well as removing TSM files that are no longer used. -->
<!-- - Compactor - The Compactor is responsible for converting less optimized Cache and TSM data into more read-optimized formats. -->
<!-- It does this by compressing series, removing deleted data, optimizing indices and combining smaller files into larger ones. -->
<!-- - Compaction Planner - The Compaction Planner determines which TSM files are ready for a compaction and ensures that multiple concurrent compactions do not interfere with each other. -->
<!-- - Compression - Compression is handled by various Encoders and Decoders for specific data types. -->
<!-- Some encoders are fairly static and always encode the same type the same way; -->
<!-- others switch their compression strategy based on the shape of the data. -->
<!-- - Writers/Readers - Each file type (WAL segment, TSM files, tombstones, etc.) has Writers and Readers for working with the formats. -->