diff --git a/content/v2.0/reference/key-concepts.md b/content/v2.0/reference/key-concepts.md
index aa71f6d79..c53fd26f0 100644
--- a/content/v2.0/reference/key-concepts.md
+++ b/content/v2.0/reference/key-concepts.md
@@ -1,7 +1,7 @@
 ---
 title: InfluxDB key concepts
 description: >
-  Concepts related to InfluxData products and platforms.
+  Concepts related to InfluxDB 2.0.
 weight: 7
 menu:
   v2_0_ref:
@@ -9,182 +9,176 @@ menu:
 v2.0/tags: [InfluxDB key concepts]
 ---
-Learn the key concepts of InfluxDB 2.0, including:
+Before working with InfluxDB 2.0, it's helpful to learn a few key concepts, including:
-| database | field key | field set |
-| field value | measurement | point |
-| retention policy | series | tag key |
-| tag set | tag value | timestamp |
+- [InfluxDB table structure](#influxdb-table-structure)
+- [InfluxDB data elements](#influxdb-data-elements)
+- [InfluxDB design principles](/v2.0/reference/design-principles)
+
-To look up a specific term, see the [Glossary](/v2.0/reference/glossary/).
+### InfluxDB table structure
+
+InfluxDB 2.0 uses the following table structure to store data:
+
+- **Annotation rows:** include the following rows: `#group`, `#datatype`, and `#default`.
+- **Header row:** describes the data label for each column.
+- **Data columns:** include the following columns: annotation, result, and table.
+- **Data rows:** all rows that contain time series data. See [sample data](#sample-data) below.
+
+For specifications on the InfluxDB 2.0 table structure, see [Tables](/v2.0/reference/annotated-csv/#tables).
+
+**_Tip:_** To visualize your table structure in the InfluxDB user interface, click the **Data Explorer** icon
+in the sidebar, create a query, click **Submit**, and then select **View Raw Data**.
+
+### InfluxDB data elements
+
+InfluxDB 2.0 includes the following data elements:
+
+| Data elements ||||
+|:----|:----------|:---------|:-----------|
+|[timestamp](#timestamp)|[field key](#field-key)|[field value](#field-value)|[field set](#field-set)|
+|[tag key](#tag-key)|[tag value](#tag-value)|[tag set](#tag-set)|[measurement](#measurement)|
+|[series](#series)|[point](#point)|[bucket](#bucket)|[organization](#organization)|
 ### Sample data
-To demonstrate key concepts, we'll use the sample data below: the number of butterflies and honeybees counted by two scientists (`anderson` and `mullen`) in two locations (`1` and `2`) from 12 AM to 6:12 AM on August 18, 2019. This sample data is stored in a bucket `my_bucket` and retained for the duration of the `default` retention policy.
+The sample data below shows the number of bees and ants counted by two scientists (`anderson` and `mullen`) in two locations (`1` and `2`) from 12:00 AM to 12:06 AM on August 18, 2019.
The sample data is stored in a bucket `my_bucket` and retained for the retention period specified for the [bucket](#bucket).
-*Hint:* Hover over the links for tooltips<> to get acquainted with InfluxDB terminology and the layout.
+**_Tip:_** Hover over purple terms to get acquainted with InfluxDB terminology and layout.
-| Flag | Description | Input type |
-|:---- |:----------- |:----------: |
-| `-h`, `--help` | Help for the `find` command | |
-| `-i`, `--id` | The authorization ID | string |
-| `-o`, `--org` | The organization | string |
-| `--org-id` | The organization ID | string |
-| `-u`, `--user` | The user | string |
-| `--user-id` | The user ID | string |
+bucket: `my_bucket`
-name: census
-\-------------------------------------
-time                  butterflies  honeybees  location  scientist
-2015-08-18T00:00:00Z  12           23         1         anderson
-2015-08-18T00:00:00Z  1            30         1         mullen
-2015-08-18T00:06:00Z  11           28         1         anderson
-2015-08-18T00:06:00Z  3            28         1         mullen
-2015-08-18T05:54:00Z  2            11         2         anderson
-2015-08-18T06:00:00Z  1            10         2         anderson
-2015-08-18T06:06:00Z  8            23         2         mullen
-2015-08-18T06:12:00Z  7            22         2         mullen
+| _time | _measurement | _field | _value | location | scientist |
+|:---------------------|:-------------|:-------|:-------|:---------|:----------|
+| 2019-08-18T00:00:00Z | census | bees | 23 | 1 | anderson |
+| 2019-08-18T00:00:00Z | census | bees | 30 | 1 | mullen |
+| 2019-08-18T00:06:00Z | census | bees | 28 | 2 | anderson |
+| | | | | | |
+| 2019-08-18T00:06:00Z | census | ants | 3 | 2 | mullen |
-### Discussion
+#### Timestamp
-InfluxDB is a time series database so it makes sense to start with time.
-In the sample data, you'll notice a column called `time`. All data stored in InfluxDB have a `time` column that stores timestamps. **Timestamps** shows the date and time in [RFC3339](https://www.ietf.org/rfc/rfc3339.txt) UTC associated with particular data.
+All data stored in InfluxDB has a `_time` column that stores timestamps. Timestamps show the date and time in [RFC3339](https://www.ietf.org/rfc/rfc3339.txt) UTC associated with data. Timestamp precision is important. When you search data within a specified time interval, make sure the timestamp precision in your query matches the timestamp precision in your dataset.
-The next two columns, `butterflies` and `honeybees`, are fields.
-Fields include field keys and field values. The **field keys** (`butterflies` and `honeybees`) are strings and store metadata. The field key:
+#### Measurement
-- `butterflies` includes field values `12`-`7`
-- `honeybees` includes field values `23`-`22`
+The `_measurement` column shows the name of the measurement `census`. Measurement names are strings. A measurement acts as a container for tags, fields, and timestamps. Use a measurement name that describes your data. The name `census` tells us that the field values record the number of `bees` and `ants`. A single measurement can belong to different [buckets](#bucket).
-**Field values** are your data; they can be strings, floats, integers, or Booleans, and, because InfluxDB is a time series database, a field value is always associated with a timestamp.
+#### Fields
-Field values in the sample data include:
+A field includes a field key, stored in the `_field` column, and an associated field value, stored in the `_value` column.
-```
-12 23
-1 30
-11 28
-3 28
-2 11
-1 10
-8 23
-7 22
-```
+##### Field key
-The collection of field-key and field-value pairs make up a **field set**.
The eight field sets in the sample data:
+A field key is a string that stores the name of the field. The field keys in the sample data are `bees` and `ants`.
-* `butterflies = 12 honeybees = 23`
-* `butterflies = 1 honeybees = 30`
-* `butterflies = 11 honeybees = 28`
-* `butterflies = 3 honeybees = 28`
-* `butterflies = 2 honeybees = 11`
-* `butterflies = 1 honeybees = 10`
-* `butterflies = 8 honeybees = 23`
-* `butterflies = 7 honeybees = 22`
+##### Field values
-Fields are required in InfluxDB data and are not indexed.
-[Queries](/influxdb/v0.10/concepts/glossary/#query) filtering field values must scan all values to match conditions in the query. As a result, queries on fields are not as performant as queries on tags. In general, avoid storing commonly queried metadata in fields.
+The field values are your data; they can be strings, floats, integers, or Booleans. A field value always has an associated timestamp. The field values in the sample data show the number of `bees` at specified times (`23`, `30`, and `28`) and the number of `ants` at a specified time (`3`).
+
+##### Field sets
+
+A field set is a collection of field key-value pairs. The sample data includes the following four field sets:
+
+- `bees = 23`
+- `bees = 30`
+- `bees = 28`
+- `ants = 3`
+
+#### Fields aren't indexed
+
+Fields are required in InfluxDB data and are not indexed. Queries that filter field values must scan all field values to match query conditions. As a result, queries on tags are more performant than queries on fields. Store commonly queried metadata in tags.
+
+#### Tags
 The last two columns in the sample data, `location` and `scientist`, are tags.
-Tags include tag keys and tag values.
-Both **tag keys** and **tag values** are stored as strings and record metadata.
+Tags include tag keys and tag values, which are stored as strings and record metadata.
+
+##### Tag keys
+
 The tag keys in the sample data are `location` and `scientist`.
+
+##### Tag values
+
 The tag key `location` has two tag values: `1` and `2`.
 The tag key `scientist` also has two tag values: `anderson` and `mullen`.
-In the data above, the **tag set** is the different combinations of all the tag key-value pairs.
-The four tag sets in the sample data are:
+##### Tag sets
-* `location = 1`, `scientist = anderson`
-* `location = 2`, `scientist = anderson`
-* `location = 1`, `scientist = mullen`
-* `location = 2`, `scientist = mullen`
+A collection of tag key-value pairs makes up a tag set. The sample data includes the following four tag sets:
-Tags are optional.
-You don't need to have tags in your data structure, but it's generally a good idea to make use of them because, unlike fields, tags are indexed.
-This means that queries on tags are faster and that tags are ideal for storing commonly-queried metadata.
+- `location = 1`, `scientist = anderson`
+- `location = 2`, `scientist = anderson`
+- `location = 1`, `scientist = mullen`
+- `location = 2`, `scientist = mullen`
-> **Why indexing matters: The schema case study**
+#### Tags are indexed
-> Say you notice that most of your queries focus on the values of the field keys `honeybees` and `butterflies`:
+Tags are optional. You don't need tags in your data structure, but it's typically a good idea to include them.
+Because tags are indexed, queries on tags are faster than queries on fields. This makes tags ideal for storing commonly queried metadata.
-> `SELECT * FROM census WHERE butterflies = 1`
-> `SELECT * FROM census WHERE honeybees = 23`
+#### Why your schema matters
-> Because fields aren't indexed, InfluxDB scans every value of `butterflies` in the first query and every value of `honeybees` in the second query before it provides a response.
-That behavior can hurt query response times - especially on a much larger scale.
-To optimize your queries, it may be beneficial to rearrange your [schema](/influxdb/v0.10/concepts/glossary/#schema) such that the fields (`butterflies` and `honeybees`) become the tags and the tags (`location` and `scientist`) become the fields:
+Suppose most of your queries filter on field values, for example, a query to find when 23 bees were counted:
-| _time | _measurement | scientist | _field | _value |
-|----------------------|--------------|-----------|--------|-------|
-| 2015-08-18T00:00:00Z | census | mullen | honeybees | 23 |
-| 2015-08-18T00:00:00Z | census | langstroth | butterflies | 33 |
-| 2015-08-18T00:06:00Z | census | anderson | butterflies | 45 |
-| 2015-08-18T00:06:00Z | census | mullen | honeybees | 10 |
+`SELECT * FROM census WHERE bees = 23`
-> Now that `butterflies` and `honeybees` are tags, InfluxDB won't have to scan every one of their values when it performs the queries above - this means that your queries are even faster.
+InfluxDB scans every `bees` field value in the dataset before the query returns a response. If the sample `census` data grew to millions of rows, you could optimize the query by rearranging your [schema](/v2.0/reference/glossary/#schema) so that the fields (`bees` and `ants`) become tags and the tags (`location` and `scientist`) become fields:
-The **measurement** acts as a container for tags, fields, and the `time` column, and the measurement name is the description of the data that are stored in the associated fields.
-Measurement names are strings, and, for any SQL users out there, a measurement is conceptually similar to a table.
-The only measurement in the sample data is `census`.
-The name `census` tells us that the field values record the number of `butterflies` and `honeybees` - not their size, direction, or some sort of happiness index.
+| _time | _measurement | _field | _value | bees | ants |
+|:---------------------|:-------------|:----------|:---------|:-----|:-----|
+| 2019-08-18T00:00:00Z | census | scientist | anderson | 23 | |
+| 2019-08-18T00:00:00Z | census | scientist | mullen | 30 | |
+| 2019-08-18T00:06:00Z | census | scientist | anderson | 28 | |
+| 2019-08-18T00:06:00Z | census | scientist | mullen | | 3 |
+| | | | | | |
+| 2019-08-18T00:00:00Z | census | location | 1 | 23 | |
+| 2019-08-18T00:00:00Z | census | location | 1 | 30 | |
+| 2019-08-18T00:06:00Z | census | location | 2 | 28 | |
+| | | | | | |
+| 2019-08-18T00:06:00Z | census | location | 2 | | 3 |
-A single measurement can belong to different retention policies.
-A **retention policy** describes how long InfluxDB keeps data (`DURATION`) and how many copies of this data is stored in the cluster (`REPLICATION`).
-If you're interested in reading more about retention policies, check out [Database Management](/influxdb/v0.10/query_language/database_management/#retention-policy-management).
+Now that `bees` and `ants` are tags, InfluxDB doesn't have to scan every value in the `_field` and `_value` columns. This makes your queries faster.
-In the sample data, everything in the `census` measurement belongs to the `default` retention policy.
-InfluxDB automatically creates that retention policy; it has an infinite duration and a replication factor set to the number of nodes in the cluster.
+#### Series
-Now that you're familiar with measurements, tag sets, and retention policies it's time to discuss series.
-In InfluxDB, a **series** is the collection of data that share a retention policy, measurement, and tag set.
-The data above consist of four series:
+Now that you're familiar with measurements, field sets, and tag sets, it's time to discuss **series keys** and **series**. A series key is the collection of data that shares a measurement, tag set, and field key.
For example, the [sample data](#sample-data) includes four unique series keys:
-| Arbitrary series number | Bucket | Measurement | Tag set |
-|---|---|---|---|
-| series 1 | `default` | `census` | `location = 1`,`scientist = anderson` |
-| series 2 | `default` | `census` | `location = 2`,`scientist = anderson` |
-| series 3 | `default` | `census` | `location = 1`,`scientist = mullen` |
-| series 4 | `default` | `census` | `location = 2`,`scientist = mullen` |
+| _measurement | tag set | _field |
+|:-------------|:----------------------------------|:-------|
+| census | location = 1,scientist = anderson | bees |
+| census | location = 2,scientist = anderson | bees |
+| census | location = 1,scientist = mullen | bees |
+| census | location = 2,scientist = mullen | ants |
-Understanding the concept of a series is essential when designing your [schema](/influxdb/v0.10/concepts/glossary/#schema) and when working with your data in InfluxDB.
+A **series** is a group of field values for a unique series key. In a series, field values (`_value`) are ordered by timestamp (`_time`) in ascending order.
-Finally, a **point** is the field set in the same series with the same timestamp.
-For example, here's a single point:
-```
-name: census
------------------
-time butterflies honeybees location scientist
-2015-08-18T00:00:00Z 1 30 1 mullen
-```
+| _time | _value |
+|---------------------------|-------------|
+| `2019-08-18T00:00:00Z` | `23` |
+| `2019-08-18T00:00:00Z` | `30` |
+| `2019-08-18T00:06:00Z` | `28` |
+| `2019-08-18T00:06:00Z` | `3` |
-The series in the example is defined by the retention policy (`default`), the measurement (`census`), and the tag set (`location = 1`, `scientist = mullen`).
-The timestamp for the point is `2015-08-18T00:00:00Z`.
+Understanding the concept of a series is essential when designing your [schema](/v2.0/reference/glossary/#schema) and when working with your data in InfluxDB.
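The grouping of sample rows into series can be sketched in a few lines of Python. This is an illustration only: the tuple-based `series_key` and the `group_into_series` helper are hypothetical names chosen for this sketch and say nothing about how InfluxDB represents series internally.

```python
from collections import defaultdict

# Rows copied from the sample data: (_time, _measurement, _field, _value, tags).
SAMPLE_ROWS = [
    ("2019-08-18T00:00:00Z", "census", "bees", 23, {"location": "1", "scientist": "anderson"}),
    ("2019-08-18T00:00:00Z", "census", "bees", 30, {"location": "1", "scientist": "mullen"}),
    ("2019-08-18T00:06:00Z", "census", "bees", 28, {"location": "2", "scientist": "anderson"}),
    ("2019-08-18T00:06:00Z", "census", "ants", 3, {"location": "2", "scientist": "mullen"}),
]


def series_key(measurement, tags, field):
    """Build a hashable key from a measurement, tag set, and field key (hypothetical layout)."""
    return (measurement, tuple(sorted(tags.items())), field)


def group_into_series(rows):
    """Group field values by series key and order each group by timestamp."""
    series = defaultdict(list)
    for time, measurement, field, value, tags in rows:
        series[series_key(measurement, tags, field)].append((time, value))
    for values in series.values():
        # RFC3339 UTC strings in one fixed format sort chronologically as text.
        values.sort(key=lambda pair: pair[0])
    return dict(series)


series = group_into_series(SAMPLE_ROWS)
for key, values in series.items():
    print(key, values)
```

With the four sample rows, each row has a different combination of measurement, tag set, and field key, so the sketch yields four series with one value each.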
-All of the stuff we've just covered is stored in a bucket call database `my_bucket`.
-An InfluxDB **database** is similar to traditional relational databases and serves as a logical container for users, retention policies, continuous queries, and, of course, your time series data.
-See [users](/influxdb/v0.10/administration/authentication_and_authorization/) and [continuous queries](/influxdb/v0.10/query_language/continuous_queries/) for more on those topics.
+#### Point
-Databases can have several users, continuous queries, retention policies, and measurements.
-InfluxDB is a schemaless database, so you can easily add new measurements, tags, and fields at any time.
+A **point** includes the series key, a field value, and a timestamp. For example, a single point from the [sample data](#sample-data) looks like this:
-If you're just starting out, we recommend taking a look at [Getting Started](/influxdb/v0.10/introduction/getting_started/) and the [Writing Data](/influxdb/v0.10/guides/writing_data/) and [Querying Data](/influxdb/v0.10/guides/querying_data/) guides.
-May our time series database serve you well 🕔.
+`2019-08-18T00:00:00Z bees 30 census 1 mullen`
+
+#### Bucket
+
+All InfluxDB data is stored in a bucket. A **bucket** combines the concept of a database and a retention period (the duration of time that each data point persists). A bucket belongs to an organization. For more information about buckets, see [Manage buckets](https://v2.docs.influxdata.com/v2.0/organizations/buckets/).
+
+#### Organization
+
+An InfluxDB **organization** is a workspace for a group of [users](/v2.0/users/). All [dashboards](/v2.0/visualize-data/dashboards/), [tasks](/v2.0/process-data/), buckets, and users belong to an organization. For more information about organizations, see [Manage organizations](https://v2.docs.influxdata.com/v2.0/organizations/).
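To make the point from the Point section concrete, the sketch below renders it in InfluxDB line protocol (`measurement,tag_set field_set timestamp`), which this page does not otherwise cover. The `to_line_protocol` helper is a hypothetical name for illustration; it assumes an integer field value, a whole-second UTC timestamp, and nanosecond precision, and it skips the escaping that real line protocol requires for spaces and commas.

```python
from datetime import datetime, timezone


def to_line_protocol(measurement, tags, field_key, field_value, rfc3339_time):
    """Render one point as a line protocol string (simplified: integer field, no escaping)."""
    # Tag set: comma-separated key=value pairs, sorted by key.
    tag_part = ",".join(f"{key}={value}" for key, value in sorted(tags.items()))
    # Parse the RFC3339 UTC timestamp and convert it to nanoseconds since the Unix epoch.
    moment = datetime.strptime(rfc3339_time, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    nanoseconds = int(moment.timestamp()) * 1_000_000_000
    # The trailing "i" marks the field value as an integer.
    return f"{measurement},{tag_part} {field_key}={field_value}i {nanoseconds}"


line = to_line_protocol(
    "census",
    {"location": "1", "scientist": "mullen"},
    "bees",
    30,
    "2019-08-18T00:00:00Z",
)
print(line)  # census,location=1,scientist=mullen bees=30i 1566086400000000000
```

The bucket and organization are not part of the line itself; they are supplied separately when the point is written.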
+
+If you're just starting out, we recommend taking a look at the following guides:
+
+- [Getting Started](/influxdb/v0.10/introduction/getting_started/)
+- [Writing Data](/influxdb/v0.10/guides/writing_data/)
+- [Querying Data](/influxdb/v0.10/guides/querying_data/)