fix: typos

pull/24376/head
Nga Tran 2021-12-15 17:26:52 -05:00
parent e877bc7033
commit 36b2cb021d
1 changed files with 9 additions and 9 deletions

@@ -43,7 +43,7 @@ Figure 1: Data Organization in an IOx Server
```
Each database of IOx owns three catalogs demonstrated in Figure 2. The `In-Memory Catalog` (or simply the `Catalog`), contains metadata to access all chunk types in both memory and Object Store described in [IOx Data Organization and LifeCycle](link). The `Preserved Catalog` is used to rebuild the database in case it is shutdown or unexpected destroyed by accidents or disasters. The `Query Catalog` includes extra information to access the right chunks and their statistics quickly for running queries, recording necessary log info, and reporting the results back to users. The `Other Properties` are a set of properties created/rebuilt with the database based on some default configuration parameters of the IOx Server or specified by the user who creates it.
Each database of IOx owns three catalogs, shown in Figure 2. The `In-Memory Catalog` (or simply the `Catalog`) contains metadata to access all chunk types in both memory and Object Store described in [IOx Data Organization and LifeCycle](link). The `Preserved Catalog` is used to rebuild the database in case it is shut down or unexpectedly destroyed by accidents or disasters. The `Query Catalog` includes extra information to access the right chunks and their statistics quickly for running queries, recording necessary logs, and reporting the results back to users. The `Other Properties` are a set of properties created/rebuilt with the database based on some default configuration parameters of the IOx Server or specified by the user who creates it.
```text
┌───────────┐
@@ -67,14 +67,14 @@ Figure 2: Metadata of a Database
```
## In-memory Catalog
`In-Memory Catalog` implemented in the code as [Catalog](https://github.com/influxdata/influxdb_iox/blob/8a2410e161996603a4147e319c71e5bb38ca9cb7/server/src/db/catalog.rs#L88) and shown in Figure 3 represents the structure of its actual `Data Organization` and hence looks similar to Figure 1. Note that the bottom layer of Figure 3 shows physical chunks of data (O-MUB 1, RUB 2, O-MUB 3, F-MUB 4, RUB 5, and OS 5) that the catalog does not store but points to. In short, the `In-Memory Catalog` contains a set of [`Tables`](https://github.com/influxdata/influxdb_iox/blob/8a2410e161996603a4147e319c71e5bb38ca9cb7/server/src/db/catalog/table.rs#L20), each of which consists of a set of [`Partitions`](https://github.com/influxdata/influxdb_iox/blob/8a2410e161996603a4147e319c71e5bb38ca9cb7/server/src/db/catalog/partition.rs#L116). A `Partition` includes a set of [`CatalogChunks`](https://github.com/influxdata/influxdb_iox/blob/76befe94ad14cd121d6fc5c58aa112997d9e211a/server/src/db/catalog/chunk.rs#L195), each of which is represented by a [`ChunkStage`](https://github.com/influxdata/influxdb_iox/blob/76befe94ad14cd121d6fc5c58aa112997d9e211a/server/src/db/catalog/chunk.rs#L130) that points to the corresponding Data Chunk(s). The previous document, [IOx Data Organization and LifeCycle](link), has described the chunk stages and chunk types supported in IOx.
`In-Memory Catalog` implemented in the code as [Catalog](https://github.com/influxdata/influxdb_iox/blob/8a2410e161996603a4147e319c71e5bb38ca9cb7/server/src/db/catalog.rs#L88) and shown in Figure 3 represents the structure of its actual `Data Organization` and hence looks similar to Figure 1. Note that the bottom layer of Figure 3 shows physical chunks of data (`O-MUB 1`, `RUB 2`, `O-MUB 3`, `F-MUB 4`, `RUB 5`, and `OS 5`) that the catalog does not store but points to. In short, the `In-Memory Catalog` contains a set of [`Tables`](https://github.com/influxdata/influxdb_iox/blob/8a2410e161996603a4147e319c71e5bb38ca9cb7/server/src/db/catalog/table.rs#L20), each of which consists of a set of [`Partitions`](https://github.com/influxdata/influxdb_iox/blob/8a2410e161996603a4147e319c71e5bb38ca9cb7/server/src/db/catalog/partition.rs#L116). A `Partition` includes a set of [`CatalogChunks`](https://github.com/influxdata/influxdb_iox/blob/76befe94ad14cd121d6fc5c58aa112997d9e211a/server/src/db/catalog/chunk.rs#L195), each of which is represented by a [`ChunkStage`](https://github.com/influxdata/influxdb_iox/blob/76befe94ad14cd121d6fc5c58aa112997d9e211a/server/src/db/catalog/chunk.rs#L130) that points to the corresponding Data Chunk(s). Please refer to the previous document, [IOx Data Organization and LifeCycle](link), for the description of chunk stages and chunk types supported in IOx.
Each object of the catalog contains information for us to operate the chunk lifecycle, run queries, as well as measure the health of the system [^prop]. For example, the `In-Memory Catalog` object itself includes [`CatalogMetrics`](https://github.com/influxdata/influxdb_iox/blob/873ce27b0c0c5e9da33e6f4fae8c5be7c163c16c/server/src/db/catalog/metrics.rs#L21) that measure how often its tables, partitions, and chunks are locked, what kinds of locks (e.g. shared or exclusive), and their lock wait time[^lock]. This information helps the IOx team to measure lock contention to adjust the database with a right setup. The `(L)` next to each object or property indicates they are modifiable but must be locked while doing so. Locking ensures the catalog consistency and integrity but will bring down the throughput and performance of the system, hence should be always measured for appropriate actions.
Each object of the catalog contains information for us to operate the chunk lifecycle, run queries, and measure the health of the system[^prop]. For example, the `In-Memory Catalog` object itself includes [`CatalogMetrics`](https://github.com/influxdata/influxdb_iox/blob/873ce27b0c0c5e9da33e6f4fae8c5be7c163c16c/server/src/db/catalog/metrics.rs#L21) that measure how often its tables, partitions, and chunks are locked, which kinds of locks are taken (e.g. shared or exclusive), and how long callers wait for them[^lock]. This information helps the IOx team measure lock contention and tune the database setup accordingly. The `(L)` next to an object or property indicates that it is modifiable but must be locked while being modified. Locking ensures the consistency and integrity of the catalog but reduces the throughput and performance of the system, so it should always be measured so that appropriate action can be taken.
Another example of information we handle is the `Partition`'s `PersistenceWindows` that keep track of ingested data within a partition to determine when it can be persisted. This allows IOx to receive out of order writes in their timestamps while persisting mostly in non-time overlapping Object Store files. The `CatalogChunk`'s `LifecycleAction` is another example that keeps track of on-going action for each chunk (e.g. `compacting`, `persisting`) to avoid running the same job on the same chunk.
Another example of information we handle is the `Partition`'s `PersistenceWindows`, which keep track of ingested data within a partition to determine when it can be persisted. This allows IOx to accept writes whose timestamps arrive out of order while still persisting Object Store files that mostly do not overlap in time. The `CatalogChunk`'s `LifecycleAction` is another example: it keeps track of the ongoing action of a chunk (e.g. `compacting`, `persisting`) to avoid running the same job on the chunk twice.
[^prop] To make Figure 3 easy to read, the properties listed in each object are just some examples. Click on the corresponding link to see the full set of properties in the code.
[^lock] Transaction and locks will be described more in a separate document.
[^prop]: To make Figure 3 easy to read, the properties listed in each object are just some examples. Click on the corresponding link to see the full set of properties in the code.
[^lock]: Transaction and locks will be described more in a separate document.
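The containment hierarchy and the role of `LifecycleAction` described above can be sketched in a few lines of Rust. This is a minimal illustration, not the IOx implementation: the struct fields, the `add_chunk` and `try_start_action` helpers, the string/integer keys, and the use of `std::sync::RwLock` are all assumptions made for this example. The real types carry many more properties (see the links above) and use their own lock wrappers and metrics.

```rust
use std::collections::BTreeMap;
use std::sync::RwLock;

/// Hypothetical stand-in for the lifecycle action tracked per chunk.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LifecycleAction {
    Compacting,
    Persisting,
}

/// Simplified chunk entry: metadata only, the chunk data lives elsewhere.
#[derive(Debug, Default)]
pub struct CatalogChunk {
    lifecycle_action: Option<LifecycleAction>,
}

impl CatalogChunk {
    /// Registers `action`, or reports the action that is already running.
    pub fn start_action(&mut self, action: LifecycleAction) -> Result<(), LifecycleAction> {
        match self.lifecycle_action {
            Some(running) => Err(running),
            None => {
                self.lifecycle_action = Some(action);
                Ok(())
            }
        }
    }
}

/// A partition owns a set of lockable chunks (the `(L)` marker in Figure 3).
#[derive(Debug, Default)]
pub struct Partition {
    chunks: BTreeMap<u32, RwLock<CatalogChunk>>,
}

/// A table owns a set of lockable partitions.
#[derive(Debug, Default)]
pub struct Table {
    partitions: BTreeMap<String, RwLock<Partition>>,
}

/// The catalog owns the set of tables.
#[derive(Debug, Default)]
pub struct Catalog {
    tables: RwLock<BTreeMap<String, Table>>,
}

impl Catalog {
    /// Creates the table, partition, and chunk entries if they are missing.
    pub fn add_chunk(&self, table: &str, partition: &str, chunk_id: u32) {
        let mut tables = self.tables.write().unwrap();
        let table = tables.entry(table.to_string()).or_default();
        let partition_lock = table.partitions.entry(partition.to_string()).or_default();
        let mut partition = partition_lock.write().unwrap();
        let _chunk = partition.chunks.entry(chunk_id).or_default();
    }

    /// Returns `None` if the chunk is unknown; otherwise the outcome of
    /// registering `action` while holding the chunk's exclusive lock.
    pub fn try_start_action(
        &self,
        table: &str,
        partition: &str,
        chunk_id: u32,
        action: LifecycleAction,
    ) -> Option<Result<(), LifecycleAction>> {
        let tables = self.tables.read().unwrap();
        let partition = tables.get(table)?.partitions.get(partition)?.read().unwrap();
        let mut chunk = partition.chunks.get(&chunk_id)?.write().unwrap();
        let outcome = chunk.start_action(action);
        Some(outcome)
    }
}
```

In this sketch, a second job requested for a chunk that is already compacting is rejected with the action in progress, which is the duplicate-work check the text attributes to `LifecycleAction`. Shared locks are taken on the outer levels and an exclusive lock only on the chunk being modified; whether IOx orders its locks this way is not stated in this document.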
```text
┌──────────────────┐
@@ -97,7 +97,7 @@ Another example of information we handle is the `Partition`'s `PersistenceWindow
┌─────────────────────────────┐ ▼ ┌─────────────────────────────┐
│ Partition 1 (L) │ │ Partition m (L) │ ...
│ --------------------------- │ ... │ --------------------------- │
│ PersistenceWindow │ │ PersistenceWindow
│ PersistenceWindows │ │ PersistenceWindows
│ CreatedTime & LastWriteTime │ │ CreatedTime & LastWriteTime │
│ NextChunkOrder │ │ NextChunkOrder │
│ PartitionMetrics │ │ PartitionMetrics │
@@ -135,7 +135,7 @@ Another example of information we handle is the `Partition`'s `PersistenceWindow
Figure 3: In-Memory Catalog
```
## Preserved Catalog
Since the `In-Memory Catalog` is always in memory, we need a way to save it incrementally and rebuild it as quick as possible in case of accidents or disasters, or in case we need to restart an IOx Server, or when we want to move a database to different IOx Server. [`Preserved Catalog`](https://github.com/influxdata/influxdb_iox/blob/76befe94ad14cd121d6fc5c58aa112997d9e211a/parquet_catalog/src/core.rs#L212) is used for those purposes. Like data, IOx's `Preserved Catalog` is saved as Parquet files in Object Store such as Amazon S3 and only the catalog information of persisted chunks are stored in the Preserved Catalog. This is the result of our data lifecycle design that all data must be saved in the durable object store and that we do not need to touch the `Preserved Catalog' if nothing is persisted.
Since the `In-Memory Catalog` is always in memory, we need a way to save it incrementally and rebuild it as quickly as possible in case of accidents or disasters, when we need to restart an IOx Server, or when we want to move a database to a different IOx Server. The [`Preserved Catalog`](https://github.com/influxdata/influxdb_iox/blob/76befe94ad14cd121d6fc5c58aa112997d9e211a/parquet_catalog/src/core.rs#L212) is used for those purposes. Like data, IOx's `Preserved Catalog` is saved as Parquet files in an Object Store such as Amazon S3, and only the catalog information of persisted chunks is stored in the `Preserved Catalog`. This follows from the data lifecycle design: all data must be saved in the durable object store, and we do not need to touch the `Preserved Catalog` if nothing is persisted. Every time we persist data to Object Store, we also incrementally update the `Preserved Catalog` accordingly.
The `Preserved Catalog` keeps [`DatabaseCheckpoints`](https://github.com/influxdata/influxdb_iox/blob/b39e01f7ba4f5d19f92862c5e87b90a40879a6c9/persistence_windows/src/checkpoint.rs#L554), each of which represents a sequence number of writes before which all the writes are persisted. To restart a database, the `In-Memory Catalog` is rebuilt to the state of the latest `DatabaseCheckpoint` found in the `Preserved Catalog`, then all the writes after that are (re)ingested[^write].
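The restart procedure described above can be illustrated with a small Rust sketch. It rests on strong simplifying assumptions: a checkpoint is reduced to a single sequence number (`min_unpersisted`), the write source is an in-memory slice, and the names `Checkpoint`, `SequencedWrite`, `latest_checkpoint`, and `writes_to_replay` are hypothetical. The linked `DatabaseCheckpoint` type is richer than this.

```rust
/// Hypothetical sequenced write: a sequence number plus an opaque payload.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SequencedWrite {
    pub sequence_number: u64,
    pub payload: String,
}

/// Hypothetical, heavily simplified checkpoint: every write with a sequence
/// number below `min_unpersisted` is assumed to be persisted already.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Checkpoint {
    pub min_unpersisted: u64,
}

/// Picks the latest checkpoint, taken here to be the one covering the most writes.
pub fn latest_checkpoint(checkpoints: &[Checkpoint]) -> Option<Checkpoint> {
    checkpoints.iter().copied().max_by_key(|c| c.min_unpersisted)
}

/// Returns the writes that must be (re)ingested once the catalog has been
/// rebuilt to `checkpoint`. Without a checkpoint, everything is replayed.
pub fn writes_to_replay(
    log: &[SequencedWrite],
    checkpoint: Option<Checkpoint>,
) -> Vec<SequencedWrite> {
    let start = checkpoint.map_or(0, |c| c.min_unpersisted);
    log.iter()
        .filter(|w| w.sequence_number >= start)
        .cloned()
        .collect()
}
```

With checkpoints at sequence numbers 2 and 4 and a log of writes 1 through 5, the sketch restores to the checkpoint at 4 and replays writes 4 and 5 only, since everything before the checkpoint is already covered by persisted chunks.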
@@ -148,7 +148,7 @@ Since Preserved Catalog only saves data to a consistent state and includes the w
## Query Catalog
The [`Query Catalog`](https://github.com/influxdata/influxdb_iox/blob/218042784fdb3bd0aa2c13dcaaf2f39190d42329/server/src/db/access.rs#L123), shown in Figure 4, is an `In-Memory Catalog` plus extra information to access the right chunks and their statistics quickly for running queries, logging necessary information or metrics, and reporting the results back to users. For instance, since IOx uses [DataFusion](https://github.com/apache/arrow-datafusion) to plan and run queries, `UserTables` provide the interface that DataFusion requires for doing so. Similarly, since we want to query our own catalog data, such as the number of tables, partitions, chunk types and so on, we construct a `SystemTables` interface to pass to DataFusion for that purpose. `ChunkAccess` helps find the chunks of the queried table and prune those that do not include data needed by the query[^query]. `QueryLog` logs necessary information such as when queries are run and what types they are.
[^query] Chunk pruning, query planning, optimization, and execution are beyond the scope of this document.
[^query]: Chunk pruning, query planning, optimization, and execution are beyond the scope of this document.
```text
┌────────────────┐