Add compaction information (#4084)

* add word to dict
* add compaction info tsi details
* misc edits
* add compaction info to oss 2.2+
2022-06-16 14:49:57 -07:00 · 2022-06-16 14:49:57 -07:00 · 9b4de3b064
parent 516662cf12
commit 9b4de3b064
3 changed files with 40 additions and 9 deletions
--- a/.vscode/settings.json
+++ b/.vscode/settings.json
@@ -1,6 +1,7 @@
 {
    "cSpell.words": [
        "CLOCKFACE",
+        "compactible",
        "dbrp",
        "downsample",
        "eastus",
--- a/content/influxdb/v1.8/concepts/tsi-details.md
+++ b/content/influxdb/v1.8/concepts/tsi-details.md
@@ -55,7 +55,6 @@ This command works at the server-level but you can optionally add database, rete

 For details on this command, see [influx inspect buildtsi](/influxdb/v1.8/tools/influx_inspect/#buildtsi).

-
 ## Understanding TSI

 ### File organization
@@ -86,12 +85,29 @@ The following occurs when a write comes into the system:

 ### Compaction

-Once the LogFile exceeds a threshold (5MB), then a new active log file is created and the previous one begins compacting into an IndexFile.
-This first index file is at level 1 (L1).
-The log file is considered level 0 (L0).
+When compactions are enabled, InfluxDB checks every second whether compactions are needed. If there haven't been writes during the `compact-full-write-cold-duration` period (by default, `4h`), InfluxDB compacts all TSM files. Otherwise, InfluxDB groups TSM files into compaction levels (determined by the number of times the files have been compacted), and attempts to combine files and compress them more efficiently.

-Index files can also be created by merging two smaller index files together.
-For example, if contiguous two L1 index files exist then they can be merged into an L2 index file.
+Once the `LogFile` exceeds a threshold (`5MB`), InfluxDB creates a new active log file, and the previous one begins compacting into an `IndexFile`. This first index file is at level 1 (L1). The log file is considered level 0 (L0). Index files can also be created by merging two smaller index files together.
+For example, if two contiguous L1 index files exist, InfluxDB merges them into an L2 index file.
+
+InfluxDB schedules compactions preferentially, using the following guidelines:
+
+- The lower the level (the fewer times a file has been compacted), the more weight is given to compacting it.
+- The more compactible files in a level, the higher the priority given to that level. If the number of files in each level is equal, lower levels are compacted first.
+- If a higher level has more candidates for compaction, it may be compacted before a lower level. InfluxDB multiplies the number of collection groups (collections of files to compact into a single next-generation file) by a specified weight (0.4, 0.3, 0.2, and 0.1) per level, to determine the compaction priority.
+
+#### Important compaction configuration settings
+
+Compaction workloads are driven by the ingestion rate of the database and the following throttling configuration settings:
+
+- `cache-snapshot-memory-size`: Specifies the `write-cache` size kept in memory before data is written to a TSM file.
+- `cache-snapshot-write-cold-duration`: If a cache has not reached `cache-snapshot-memory-size`, the length of time it can go without incoming writes before its data is written to a TSM file.
+- `max-concurrent-compactions`: The number of compactions that can run at once.
+- `compact-throughput`: Controls the average disk IO used by the compaction engine.
+- `compact-throughput-burst`: Controls the maximum disk IO used by the compaction engine.
+- `compact-full-write-cold-duration`: How long a shard must receive no writes or deletes before it is scheduled for full compaction.
+
+These configuration settings are especially beneficial for systems with irregular loads, limiting compactions during periods of high usage, and letting compactions catch up during periods of lower load. In systems with stable loads, if compactions interfere with other operations, typically, the system is undersized for its load, and configuration changes won't help much.

 ### Reads

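The scheduling guidelines added to the Compaction section above (collection groups per level multiplied by a per-level weight, with ties going to the lower level) can be sketched as follows. This is a minimal illustration, not InfluxDB's implementation: `LevelQueue`, `priority`, and `schedule` are hypothetical names, and mapping the weight 0.4 to level 1 down to 0.1 for level 4 is an assumption based on "the lower the level, the more weight". Weights are stored in tenths so that ties compare exactly.

```python
from dataclasses import dataclass
from typing import List

# Per-level weights from the text (0.4, 0.3, 0.2, 0.1), stored in tenths.
# Assumption: the largest weight belongs to the lowest level.
LEVEL_WEIGHTS_TENTHS = {1: 4, 2: 3, 3: 2, 4: 1}


@dataclass
class LevelQueue:
    level: int   # compaction level (1 = compacted the fewest times)
    groups: int  # collection groups waiting to be compacted at this level


def priority(queue: LevelQueue) -> int:
    """Number of collection groups multiplied by the level's weight."""
    return queue.groups * LEVEL_WEIGHTS_TENTHS[queue.level]


def schedule(queues: List[LevelQueue]) -> List[int]:
    """Return levels ordered from highest to lowest priority.

    Levels with no pending groups are skipped; ties go to the lower level.
    """
    pending = [q for q in queues if q.groups > 0]
    ordered = sorted(pending, key=lambda q: (-priority(q), q.level))
    return [q.level for q in ordered]
```

Under these assumptions, a level 3 with ten pending groups outranks a level 1 with two, which matches the third guideline: a higher level with enough candidates can be compacted before a lower one.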
--- a/content/influxdb/v2.2/reference/internals/shards.md
+++ b/content/influxdb/v2.2/reference/internals/shards.md
@@ -104,18 +104,30 @@ historical data, InfluxDB writes to older shards that must first be un-compacted
 When the backfill is complete, InfluxDB re-compacts the older shards.

 ### Shard compaction
-InfluxDB compacts shards at regular intervals to compress time series data and optimize disk usage.
+
+InfluxDB compacts shards at regular intervals to compress time series data and optimize disk usage. When compactions are enabled, InfluxDB checks every second whether shard compactions are needed. If there haven't been writes during the `compact-full-write-cold-duration` period (by default, `4h`), InfluxDB compacts all TSM files. Otherwise, InfluxDB groups TSM files into compaction levels (determined by the number of times the files have been compacted), and attempts to combine files and compress them more efficiently.
+
 InfluxDB uses the following four compaction levels:

- **Level 1 (L1):** InfluxDB flushes all newly written data held in an in-memory cache to disk.
+- **Level 0 (L0):** The log file (`LogFile`) is considered level 0 (L0). Once this file exceeds a `5MB` threshold, InfluxDB creates a new active log file, and the previous one begins compacting into an `IndexFile`. This first index file is at level 1 (L1).
+- **Level 1 (L1):** InfluxDB flushes all newly written data held in an in-memory cache to disk as an `IndexFile`.
 - **Level 2 (L2):** InfluxDB compacts up to eight L1-compacted files into one or more L2 files by
     combining multiple blocks containing the same series into fewer blocks in one or more new files.
 - **Level 3 (L3):** InfluxDB iterates over L2-compacted file blocks (over a certain size)
  and combines multiple blocks containing the same series into one block in a new file.
- **Level 4 (L4):** **Full compaction**—InfluxDB iterates over L3-compacted file blocks
+- **Level 4 (L4):** **Full compaction**: InfluxDB iterates over L3-compacted file blocks
  and combines multiple blocks containing the same series into one block in a new file.

+InfluxDB schedules compactions preferentially, using the following guidelines:
+
+- The lower the level (fewer times the file has been compacted), the more weight is given to compacting the file.
+- The more compactible files in a level, the higher the priority given to compacting that level. If the number of files in each level is equal, lower levels are compacted first.
+- If a higher level has more candidates for compaction, it may be compacted before a lower level. InfluxDB multiplies the number of collection groups (collections of files to compact into a single next-generation file) by a specified weight (0.4, 0.3, 0.2, and 0.1) per level, to determine the compaction priority.
+
 ##### Shard compaction-related configuration settings
+
+The following configuration settings are especially beneficial for systems with irregular loads, because they limit compactions during periods of high usage, and let compactions catch up during periods of lower load:
+
 - [`storage-compact-full-write-cold-duration`](/influxdb/v2.2/reference/config-options/#storage-compact-full-write-cold-duration)
 - [`storage-compact-throughput-burst`](/influxdb/v2.2/reference/config-options/#storage-compact-throughput-burst)
 - [`storage-max-concurrent-compactions`](/influxdb/v2.2/reference/config-options/#storage-max-concurrent-compactions)
@@ -123,6 +135,8 @@ InfluxDB uses the following four compaction levels:
 - [`storage-series-file-max-concurrent-snapshot-compactions`](/influxdb/v2.2/reference/config-options/#storage-series-file-max-concurrent-snapshot-compactions)
 - [`storage-series-file-max-concurrent-snapshot-compactions`](/influxdb/v2.2/reference/config-options/#storage-series-file-max-concurrent-snapshot-compactions)

+In systems with stable loads, if compactions interfere with other operations, typically, the system is undersized for its load, and configuration changes won't help much.
+
 ## Shard deletion
 The InfluxDB **retention enforcement service** routinely checks for shard groups
 older than their bucket's retention period.
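For the shard compaction check described in the `shards.md` change above (full compaction when no writes arrive during the `compact-full-write-cold-duration` period, level-based compaction otherwise), the decision might look like the sketch below. The function name `plan_compaction` and its return values are hypothetical, and treating a shard as cold exactly at the threshold is an assumption; only the `4h` default comes from the text.

```python
from datetime import datetime, timedelta

# Default for `compact-full-write-cold-duration` quoted in the text.
FULL_WRITE_COLD_DURATION = timedelta(hours=4)


def plan_compaction(
    last_write: datetime,
    now: datetime,
    cold_duration: timedelta = FULL_WRITE_COLD_DURATION,
) -> str:
    """Choose between a full compaction and a level-based compaction.

    Returns "full" when no write has arrived for at least `cold_duration`
    (assumption: the boundary itself counts as cold), otherwise "level".
    """
    if now - last_write >= cold_duration:
        return "full"
    return "level"
```

In this sketch, a shard last written five hours ago would be planned for full compaction, while one written ten minutes ago would stay on level-based compaction.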