add edits to design principles and key concepts

pull/537/head
Kelly 2019-10-17 17:24:28 -07:00
parent afef2b65a7
commit 6da1faa07e
2 changed files with 94 additions and 88 deletions


@ -6,33 +6,31 @@ weight: 7
menu:
v2_0_ref:
name: Design principles
v2.0/tags: [InfluxDB design principles]
v2.0/tags: [key concepts, design principles]
---
InfluxDB 2.0 implements optimal design principles for time series data. Some of these design principles may have associated tradeoffs in performance.
InfluxDB implements optimal design principles for time series data. Some of these design principles may have associated tradeoffs in performance.
### Design principles and tradeoffs
- **Time-ordered data**
#### Data sent multiple times is duplicate data
To improve performance, data is written in time-ascending order.
To simplify conflict resolution and increase write performance, InfluxDB assumes data sent multiple times is duplicate data. Duplicate data isn't stored. In rare circumstances, data may be overwritten.
- **Strict update and delete permissions**
#### Strict permissions to update and delete data
To increase query and write performance, InfluxDB tightly restricts **update** and **delete** permissions. Time series data is predominantly new data that is never updated. Deletes generally only affect data that isn't being written to, and contentious updates never occur.
To increase query and write performance, InfluxDB restricts access to update and delete data. Time series data is predominantly new data that is never updated. Deletes almost always affect data that isn't being written to, and contentious updates never occur. As a result, update and delete functionality is significantly restricted.
- **Handle read and write queries first**
#### Data ordered by timestamps in ascending order
Writing and querying the data is more important than having a strongly consistent view. Multiple clients can write to InfluxDB at high loads. Query results may not include the most recent points if the database is under heavy load.
To significantly improve performance, data is written in time-ascending order. Writing points with random times (or in non-ascending order) is not performant.
- **Schemaless design**
#### Queries first, consistency second
InfluxDB uses a schemaless design to better manage discontinuous data. Time series data is often ephemeral, meaning the data appears for a few hours and then goes away. For example, a new host starts, reports for a while, and then shuts down.
Writing and querying the data is more important than having a strongly consistent view. Multiple clients can write to InfluxDB at high loads. Query results may not include the most recent points if the database is under heavy load.
- **Datasets over individual points**
#### Schemaless design
Because the data set is more important than an individual point, InfluxDB implements powerful tools to aggregate data and handle large data sets. Points are differentiated by timestamp and series, so they don't have IDs in the traditional sense.
To better manage discontinuous data, InfluxDB uses a schemaless design. Many time series are ephemeral: they appear for only a few hours and then go away, for example, a new host that starts, reports for a while, and then shuts down. As a tradeoff, a few database functions aren't supported, for example, cross-table joins.
- **Duplicate data**
#### Datasets over individual points
Because the data set is more important than an individual point, InfluxDB implements powerful tools to aggregate data and handle large data sets. Points are differentiated by timestamp and series, so they don't have IDs in the traditional sense.
To simplify conflict resolution and increase write performance, InfluxDB assumes data sent multiple times is duplicate data. Identical points aren't stored twice. If a new field value is submitted for a point, InfluxDB updates the point with the most recent field value. In rare circumstances, data may be overwritten. Learn more about [duplicate points](/v2.0/write-data/best-practices/duplicate-points/).
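The duplicate-handling behavior described above can be illustrated with a small sketch. This is a toy in-memory model, not InfluxDB's storage engine: the `write_point` helper and the dictionary-based `store` are hypothetical, and the sketch assumes a point is identified by its measurement, tag set, and timestamp.

```python
from typing import Dict, Tuple

# Hypothetical model: a point is identified by measurement, tag set, and
# timestamp; the value stored under that key is the point's field set.
PointKey = Tuple[str, Tuple[Tuple[str, str], ...], int]


def write_point(store: Dict[PointKey, Dict[str, float]],
                measurement: str,
                tags: Dict[str, str],
                fields: Dict[str, float],
                timestamp_ns: int) -> None:
    """Merge a point into the store, keeping the most recent field values."""
    key = (measurement, tuple(sorted(tags.items())), timestamp_ns)
    # An identical write changes nothing; a new value for an existing
    # field key replaces the old value.
    store.setdefault(key, {}).update(fields)


store: Dict[PointKey, Dict[str, float]] = {}
tags = {"location": "klamath", "scientist": "anderson"}
write_point(store, "census", tags, {"bees": 23}, 1566086400000000000)
write_point(store, "census", tags, {"bees": 23}, 1566086400000000000)  # duplicate, no new point
write_point(store, "census", tags, {"bees": 25}, 1566086400000000000)  # overwrites bees
```

After the three writes, the store holds a single point whose `bees` field carries the most recent value.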


@ -1,21 +1,39 @@
---
title: InfluxDB key concepts
description: >
Concepts related to InfluxDB 2.0.
Concepts related to InfluxDB.
weight: 7
menu:
v2_0_ref:
name: Key concepts
v2.0/tags: [InfluxDB key concepts]
v2.0/tags: [key concepts]
---
Before working with InfluxDB 2.0, it's helpful to learn a few key concepts, including:
- [InfluxDB table structure](#influxdb-layout)
- [InfluxDB data elements](#influxdb-data-elements)
- [InfluxDB table structure](#influxdb-layout)
- [InfluxDB design principles](/v2.0/reference/design-principles)
<!--- [InfluxDB 2.0 platform](/v2.0/reference/) -->
### InfluxDB data elements
InfluxDB 2.0 includes the following data elements:
- [timestamp](#timestamp)
- [field key](#field-key)
- [field value](#field-value)
- [field set](#field-set)
- [tag key](#tag-key)
- [tag value](#tag-value)
- [tag set](#tag-set)
- [measurement](#measurement)
- [series](#series)
- [point](#point)
- [bucket](#bucket)
- [organization](#organization)
### InfluxDB table structure
InfluxDB 2.0 uses the following table structure to store data:
@ -30,68 +48,61 @@ For specifications on the InfluxDB 2.0 table structure, see [Tables](/v2.0/refer
**_Tip:_** To visualize your table structure in the InfluxDB user interface, click the **Data Explorer** icon
in the sidebar, create a query, click **Submit**, and then select **View Raw Data**.
### InfluxDB data elements
InfluxDB 2.0 includes the following data elements:
| Data elements ||||
|:----|:----------|:---------|:-----------|
|[timestamp](#timestamp)|[field key](#field-key)|[field value](#field-value)|[field set](#field-set)|
|[tag key](#tag-key)|[tag value](#tag-value)|[tag set](#tag-set)|[measurement](#measurement)|
|[series](#series)|[point](#point)|[bucket](#bucket)|[organization](#organization)|
### Sample data
The sample data below shows a number of bees counted by two scientists (`anderson` and `mullen`) in two locations (`1` and `2`) from 12 AM to 6 AM on August 18, 2019. The sample data is stored in a bucket `my_bucket` and retained for the duration of the retention policy specified in the [bucket](#bucket).
The sample data below shows a number of bees counted by two scientists (`anderson` and `mullen`) in two locations (`klamath` and `portland`) from 12 AM to 6 AM on August 18, 2019. The sample data is stored in the bucket, `my_bucket`, and retained for the duration of the retention policy specified in the [bucket](#bucket).
**_Tip:_** Hover over purple terms to get acquainted with InfluxDB terminology and layout.
bucket: `my_bucket`
**bucket:** `my_bucket`
| _time | _measurement| <span class="tooltip" data-tooltip-text="Field key">_field</span>|<span class="tooltip" data-tooltip-text="Field value">_value</span>|<span class="tooltip" data-tooltip-text="Tag key">location</span>|<span class="tooltip" data-tooltip-text="Tag key">scientist</span>|
|:------------------- |:------------|:--|:---|:-------|:------|
| 2019-08-18T00:00:00Z | census|bees |23 | 1 |anderson|
| 2019-08-18T00:00:00Z | census|bees |30 | 1 |mullen |
| 2019-08-18T00:06:00Z | census|bees |28 | 2 |anderson|
| | | | | | | |
| <span class="tooltip" data-tooltip-text="Timestamp">2019-08-18T00:06:00Z</span>| <span class="tooltip" data-tooltip-text="measurement"> census</span>| <span class="tooltip" data-tooltip-text="Field key">ants</span>| <span class="tooltip" data-tooltip-text="Field value">3</span> | <span class="tooltip" data-tooltip-text="Tag value">2</span> |<span class="tooltip" data-tooltip-text="Tag value">mullen</span>|
| _time | _measurement | <span class ="tooltip" data-tooltip-text ="Tag key">location</span> | <span class ="tooltip" data-tooltip-text ="Tag key">scientist</span> | _field | _value |
|:------------------- |:------------ |:------- |:------ |:-- |:------ |
| 2019-08-18T00:00:00Z | census | klamath | anderson | bees | 23 |
| 2019-08-18T00:00:00Z | census | portland | mullen | ants | 30 |
| 2019-08-18T00:06:00Z | census | klamath | anderson | bees | 28 |
| <span class="tooltip" data-tooltip-text="Timestamp">2019-08-18T00:06:00Z</span> | <span class="tooltip" data-tooltip-text="measurement"> census</span> | <span class ="tooltip" data-tooltip-text ="Tag value">portland</span> | <span class ="tooltip" data-tooltip-text ="Tag value">mullen</span> | <span class ="tooltip" data-tooltip-text ="Field key">ants</span> | <span class ="tooltip" data-tooltip-text ="Field value">32</span> |
#### Timestamp
All data stored in InfluxDB has a `_time` column that stores timestamps. Timestamps show the date and time in [RFC3339](https://www.ietf.org/rfc/rfc3339.txt) UTC associated with data. Timestamp precision is important. When you search data within a specified time interval, make sure the timestamp precision you're searching matches the timestamp precision in your dataset.
All data stored in InfluxDB has a `_time` column that stores timestamps. On disk, timestamps are stored in epoch nanosecond format. InfluxDB formats timestamps to show the date and time associated with the data in [RFC3339](https://www.ietf.org/rfc/rfc3339.txt) UTC format. Timestamp precision is important when you write data.
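The relationship between the two timestamp representations can be sketched in a few lines. The helper names below are hypothetical, and the sketch assumes whole-second RFC3339 timestamps in UTC that end in `Z`, as in the sample data.

```python
from datetime import datetime, timezone

NANOS_PER_SECOND = 1_000_000_000


def rfc3339_to_epoch_ns(ts: str) -> int:
    """Convert a whole-second RFC3339 UTC timestamp to epoch nanoseconds."""
    dt = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return int(dt.timestamp()) * NANOS_PER_SECOND


def epoch_ns_to_rfc3339(ns: int) -> str:
    """Convert epoch nanoseconds back to RFC3339 UTC, dropping sub-second precision."""
    dt = datetime.fromtimestamp(ns // NANOS_PER_SECOND, tz=timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:%M:%SZ")


print(rfc3339_to_epoch_ns("2019-08-18T00:00:00Z"))  # 1566086400000000000
print(epoch_ns_to_rfc3339(1566086760000000000))     # 2019-08-18T00:06:00Z
```

Writing a timestamp at the wrong precision (for example, seconds where nanoseconds are expected) would place the point at a very different time, which is why precision matters at write time.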
#### Measurement
The `_measurement` column shows the name of the measurement `census`. Measurement names are strings. A measurement acts as a container for tags, fields, and timestamps. Use a measurement name that describes your data. The name `census` tells us that the field values record the number of `bees` and `ants`. A single measurement can belong to different [buckets](#bucket).
The `_measurement` column shows the name of the measurement `census`. Measurement names are strings. A measurement acts as a container for tags, fields, and timestamps. Use a measurement name that describes your data. The name `census` tells us that the field values record the number of `bees` and `ants`.
#### Fields
A field includes a field key stored in (`_field`) and associated field value(s) stored in (`_value`).
A field includes a field key stored in the `_field` column and a field value stored in the `_value` column.
##### Field key
The field keys `bees` and `ants` are strings that store the names of the fields.
A field key is a string that represents the name of the field. In the sample data above, `bees` and `ants` are field keys.
##### Field values
The field values are your data; they can be strings, floats, integers, or Booleans. A field value always has an associated timestamp. The field values in the sample data show the number of `bees` at specified times: `23`, `30`, and `28` and the number of `ants` at a specified time: `3`.
A field value represents the value of an associated field. Field values can be strings, floats, integers, or booleans. The field values in the sample data show the number of `bees` at specified times (`23` and `28`) and the number of `ants` at specified times (`30` and `32`).
##### Field sets
A field set is a collection of field key-value pairs. The sample data includes the following four field sets:
A field set is a collection of field key-value pairs associated with a timestamp. The sample data includes the following two field sets:
- `bees = 23`
- `bees = 30`
- `bees = 28`
- `ants = 3`
```bash
#### Fields aren't indexed
census bees=23i,ants=30i 1566086400000000000
census bees=28i,ants=32i 1566086760000000000
-----------------
Field set
```
Fields are required in InfluxDB data and are not indexed. Queries that filter field values must scan all field values to match query conditions. As a result, queries on tags are more performant than queries on fields. Store commonly queried metadata in tags.
{{% note %}}
**Fields aren't indexed:** Fields are required in InfluxDB data and are not indexed. Queries that filter field values must scan all field values to match query conditions. As a result, queries on tags are more performant than queries on fields. **Store commonly queried metadata in tags.**
{{% /note %}}
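The line protocol lines shown above pair a measurement with a field set and an epoch nanosecond timestamp. The following sketch builds such a line from Python values. The `to_line_protocol` and `format_field_value` helpers are hypothetical, and the sketch skips escaping of spaces, commas, and equals signs in keys and tag values, which real line protocol requires.

```python
from typing import Dict, Union

FieldValue = Union[bool, int, float, str]


def format_field_value(value: FieldValue) -> str:
    """Render a field value using line protocol type conventions."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"       # integers carry a trailing "i"
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'        # strings are double-quoted


def to_line_protocol(measurement: str,
                     tags: Dict[str, str],
                     fields: Dict[str, FieldValue],
                     timestamp_ns: int) -> str:
    """Build one line: measurement[,tag_set] field_set timestamp."""
    if not fields:
        raise ValueError("at least one field is required")
    tag_part = "".join(f",{k}={v}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{k}={format_field_value(v)}" for k, v in fields.items())
    return f"{measurement}{tag_part} {field_part} {timestamp_ns}"


print(to_line_protocol("census", {}, {"bees": 23, "ants": 30}, 1566086400000000000))
# census bees=23i,ants=30i 1566086400000000000
```

Tags are optional in this sketch and fields are not, which mirrors the requirement that every point carries at least one field.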
#### Tags
The last two columns in the sample data, `location` and `scientist`, are tags.
The columns in the sample data, `location` and `scientist`, are tags.
Tags include tag keys and tag values that are stored as strings and metadata.
##### Tag keys
@ -100,66 +111,63 @@ The tag keys in the sample data are `location` and `scientist`.
##### Tag values
The tag key `location` has two tag values: `1` and `2`.
The tag key `location` has two tag values: `klamath` and `portland`.
The tag key `scientist` also has two tag values: `anderson` and `mullen`.
##### Tag sets
The collection of tag key-value pairs makes up a tag set. The sample data includes the following four tag sets:
- `location = 1`, `scientist = anderson`
- `location = 2`, `scientist = anderson`
- `location = 1`, `scientist = mullen`
- `location = 2`, `scientist = mullen`
```bash
location = klamath, scientist = anderson
location = portland, scientist = anderson
location = klamath, scientist = mullen
location = portland, scientist = mullen
```
#### Tags are indexed
Tags are optional. You don't need tags in your data structure, but it's typically a good idea to include tags.
{{% note %}}
**Tags are indexed:** Tags are optional. You don't need tags in your data structure, but it's typically a good idea to include tags.
Because tags are indexed, queries on tags are faster than queries on fields. This makes tags ideal for storing commonly-queried metadata.
{{% /note %}}
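The difference between indexed tags and unindexed fields can be illustrated with a toy model. This is a conceptual sketch only: the list of dictionaries, the `tag_index` inverted index, and both lookup helpers are hypothetical and do not reflect how InfluxDB actually stores or indexes data.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# The four rows of the sample data, modeled as plain dictionaries.
points = [
    {"time": "2019-08-18T00:00:00Z", "tags": {"location": "klamath", "scientist": "anderson"}, "field": "bees", "value": 23},
    {"time": "2019-08-18T00:00:00Z", "tags": {"location": "portland", "scientist": "mullen"}, "field": "ants", "value": 30},
    {"time": "2019-08-18T00:06:00Z", "tags": {"location": "klamath", "scientist": "anderson"}, "field": "bees", "value": 28},
    {"time": "2019-08-18T00:06:00Z", "tags": {"location": "portland", "scientist": "mullen"}, "field": "ants", "value": 32},
]

# Build an inverted index once: (tag key, tag value) -> positions of matching points.
tag_index: Dict[Tuple[str, str], List[int]] = defaultdict(list)
for position, point in enumerate(points):
    for tag_pair in point["tags"].items():
        tag_index[tag_pair].append(position)


def find_by_tag(key: str, value: str) -> List[dict]:
    """Look up matching points directly through the index."""
    return [points[i] for i in tag_index.get((key, value), [])]


def find_by_field_value(field: str, value: int) -> List[dict]:
    """Scan every point, because field values are not indexed."""
    return [p for p in points if p["field"] == field and p["value"] == value]


klamath_points = find_by_tag("location", "klamath")
bees_23 = find_by_field_value("bees", 23)
```

The tag lookup touches only the positions listed in the index, while the field lookup visits every point; that gap grows with the size of the dataset.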
#### Why your schema matters
If most of your queries focus on values in the fields, for example, a query to find when 23 bees were counted:
`SELECT * FROM census WHERE bees = 23`
```bash
from(bucket: "bucket-name")
  |> range(start: 2019-08-17T00:00:00Z, stop: 2019-08-19T00:00:00Z)
  |> filter(fn: (r) => r._field == "bees" and r._value == 23)
```
InfluxDB scans every field value in the dataset for `bees` before the query returns a response. If our sample `census` data grew to millions of rows, you could optimize your query by rearranging your [schema](/v2.0/reference/glossary/#schema) so the fields (`bees` and `ants`) become tags and the tags (`location` and `scientist`) become fields:
| _time | _measurement | _field | _value |<span class="tooltip" data-tooltip-text="Tag key">bees</span>|<span class="tooltip" data-tooltip-text="Tag key">ants</span>|
|:----------------------|--------------|--------|--------|-------|-------|
| 2019-08-18T00:00:00Z | census |scientist |anderson | 23 | |
| <span class="tooltip" data-tooltip-text="Timestamp">2015-08-18T00:00:00Z</span> | <span class="tooltip" data-tooltip-text="Measurement name">census</span> |<span class="tooltip" data-tooltip-text="Field key">scientist</span> | <span class="tooltip" data-tooltip-text="Field value">mullen</span> | <span class="tooltip" data-tooltip-text="Tag value">30</span> |
| 2019-08-18T00:06:00Z | census |scientist| anderson| 28 | |
| | | | | | |
| 2019-08-18T00:00:00Z | census |location | 1 | 23 | |
| 2019-08-18T00:00:00Z | census |location | 1 | 30 | |
| 2019-08-18T00:06:00Z | census |location | 2 | 28 | |
| | | | | | |
| 2019-08-18T00:06:00Z | census |location | 2 | | 3 |
| _time | _measurement | <span class ="tooltip" data-tooltip-text ="Tag key">bees</span> | _field | _value |
|:------------------- |:------------ |:------- |:-- |:------ |
| 2019-08-18T00:00:00Z | census | 23 | location | klamath |
| 2019-08-18T00:00:00Z | census | 23 | scientist | anderson |
| 2019-08-18T00:06:00Z | census | <span class ="tooltip" data-tooltip-text ="Tag value">28</span> | <span class ="tooltip" data-tooltip-text ="Field key">location</span> | <span class ="tooltip" data-tooltip-text ="Field value">klamath</span> |
| 2019-08-18T00:06:00Z | census | 28 | scientist | anderson |
| _time | _measurement | <span class ="tooltip" data-tooltip-text ="Tag key">ants</span> | _field | _value |
|:------------------- |:------------ |:------- |:-- |:------ |
| 2019-08-18T00:00:00Z | census | 30 | location | portland |
| 2019-08-18T00:00:00Z | census | 30 | scientist | mullen |
| 2019-08-18T00:06:00Z | census | <span class ="tooltip" data-tooltip-text ="Tag value">32</span> | <span class ="tooltip" data-tooltip-text ="Field key">location</span> | <span class ="tooltip" data-tooltip-text ="Field value">portland</span>|
| 2019-08-18T00:06:00Z | census | 32 | scientist | mullen |
Now that `bees` and `ants` are tags, InfluxDB doesn't have to scan all `_field` and `_value` columns. This makes your queries faster.
#### Series
Now that you're familiar with measurements, field sets, and tag sets, it's time to discuss **series keys** and **series**. A series key is the collection of data that shares a measurement, tag set, and field key. For example, the [sample data](#sample-data) includes four unique series:
Now that you're familiar with measurements, field sets, and tag sets, it's time to discuss **series keys** and **series**. A series is a collection of points that share a measurement, tag set, and field key. For example, the [sample data](#sample-data) includes two unique series:
| _measurement | tag set | _field |
|:-------------|:---------------------------------|:-------|
| census |<span class="tooltip" data-tooltip-text="Tag 1">location = 1</span>,scientist = anderson|<span class="tooltip" data-tooltip-text="Field key">bees</span>|
| census |location = 2,<span class="tooltip" data-tooltip-text="Tag 2">scientist = anderson</span> |bees |
| census |location = 1,scientist = mullen |bees|
| census |location = 2,scientist = mullen |ants|
| _measurement | tag set | _field |
|:------------- |:------------------------------- |:------ |
| census | <span class="tooltip" data-tooltip-text="Tag set">location=klamath,scientist=anderson</span> | <span class="tooltip" data-tooltip-text="Field key">bees</span>|
| census | location=portland,scientist=mullen | ants |
A **series** is a group of field values for a unique series key. In a series, field values (`_values`) are ordered by timestamp (`_time`) in ascending order.
| _time | _values |
|---------------------------|-------------|
| `2019-08-18T00:00:00Z` | `23` |
| `2019-08-18T00:00:00Z` | `30` |
| `2019-08-18T00:06:00Z` | `28` |
| `2019-08-18T00:06:00Z` | `3` |
Understanding the concept of a series is essential when designing your [schema](/v2.0/reference/glossary/#schema) and when working with your data in InfluxDB.
Understanding the concept of a series is essential when designing your [schema](/v2.0/reference/glossary/#schema) and working with your data in InfluxDB.
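The grouping of points into series can be sketched as follows. The `series_key` helper and the dictionary representation of points are hypothetical; the sketch only illustrates the idea that points sharing a measurement, tag set, and field key belong to one series, with values ordered by timestamp.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

points = [
    {"time": "2019-08-18T00:06:00Z", "measurement": "census", "tags": {"location": "portland", "scientist": "mullen"}, "field": "ants", "value": 32},
    {"time": "2019-08-18T00:00:00Z", "measurement": "census", "tags": {"location": "klamath", "scientist": "anderson"}, "field": "bees", "value": 23},
    {"time": "2019-08-18T00:00:00Z", "measurement": "census", "tags": {"location": "portland", "scientist": "mullen"}, "field": "ants", "value": 30},
    {"time": "2019-08-18T00:06:00Z", "measurement": "census", "tags": {"location": "klamath", "scientist": "anderson"}, "field": "bees", "value": 28},
]


def series_key(point: dict) -> Tuple[str, Tuple[Tuple[str, str], ...], str]:
    """A series key combines the measurement, the tag set, and the field key."""
    return (point["measurement"], tuple(sorted(point["tags"].items())), point["field"])


# Group (time, value) pairs by series key. Same-format RFC3339 UTC strings
# sort chronologically, so sorting by the string yields time-ascending order.
series: Dict[tuple, List[Tuple[str, int]]] = defaultdict(list)
for point in sorted(points, key=lambda p: p["time"]):
    series[series_key(point)].append((point["time"], point["value"]))

for key, values in series.items():
    print(key, values)
```

With the four sample rows, the sketch produces one series for `bees` counted by `anderson` at `klamath` and one for `ants` counted by `mullen` at `portland`.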
#### Point