InfluxDB schema design

Improve InfluxDB schema design and data layout. Store unique values in fields and other tips to reduce high cardinality in InfluxDB and make your data more performant.

Every InfluxDB use case is unique and your schema reflects that uniqueness. There are, however, general guidelines to follow and pitfalls to avoid when designing your schema:

Store data in tags or fields?
Avoid keywords as tag or field names
Avoid too many series
Avoid the same name for a tag and a field
Avoid encoding data in measurement names
Avoid putting more than one piece of information in one tag

Store data in tags or fields?

Tags are indexed and fields are not. This means that queries on tags are more performant than queries on fields.

In general, your queries should guide what gets stored as a tag and what gets stored as a field:

Store commonly queried metadata in tags.
Store data in fields if each data point contains a different value.
Store numeric values as fields (tag values only support string values).
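To make the tag/field split concrete, here is a minimal Python sketch that formats one point as line protocol. The helper name `to_line_protocol` is hypothetical, and the sketch assumes keys and values contain no characters that need escaping (spaces, commas, equals signs, or quotes).

```python
def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Format one point as line protocol (no escaping of special characters)."""
    # Tag values are always strings; sort tag keys so the output is stable.
    tag_part = "".join(f",{key}={tags[key]}" for key in sorted(tags))

    def format_field(value):
        if isinstance(value, bool):  # check bool first: bool is a subclass of int
            return "true" if value else "false"
        if isinstance(value, int):
            return f"{value}i"  # integers carry an "i" suffix
        if isinstance(value, float):
            return repr(value)
        return f'"{value}"'  # string field values are double-quoted

    field_part = ",".join(
        f"{key}={format_field(value)}" for key, value in fields.items()
    )
    return f"{measurement}{tag_part} {field_part} {timestamp_ns}"


point = to_line_protocol(
    "weather_sensor",
    tags={"region": "north", "crop": "blueberries"},  # queried metadata
    fields={"temp": 50.1},  # numeric value that changes per point
    timestamp_ns=1472515200000000000,
)
print(point)
# weather_sensor,crop=blueberries,region=north temp=50.1 1472515200000000000
```

The metadata you filter on (`crop`, `region`) goes in the tag set, while the measured value (`temp`) keeps its numeric type as a field.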

Avoid keywords as tag or field names

Avoiding keywords is not required, but it simplifies writing queries because you won't have to wrap tag or field names in double quotes. See Flux keywords to avoid.

Also, if a tag or field name contains characters other than letters and underscores ([A-Za-z_]), you must use bracket notation in Flux.

Avoid too many series

Tags containing highly variable information like UUIDs, hashes, and random strings lead to a large number of series in the database, also known as high series cardinality.

High series cardinality is a primary driver of high memory usage for many database workloads. When you write to InfluxDB, it uses the measurements and tags to create indexes that speed up reads. However, when too many indexes are created, both writes and reads may start to slow down. Therefore, if a system has memory constraints, consider storing high-cardinality data as a field rather than a tag.
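The effect can be sketched in Python by counting series in a few sample lines. The sketch assumes a series is a unique combination of measurement, tag set, and field key. The `requests` measurement, the `request_id` values, and the `series_keys` helper are all hypothetical, and the parser ignores line protocol escaping.

```python
def series_keys(lines):
    """Return the set of (measurement, tag set, field key) combinations."""
    keys = set()
    for line in lines:
        # Simplified parse: assumes no escaped spaces or commas,
        # and no spaces inside string field values.
        head, field_set, *_ = line.split(" ")
        measurement, *tag_pairs = head.split(",")
        tags = tuple(sorted(tag_pairs))
        for pair in field_set.split(","):
            field_key = pair.split("=", 1)[0]
            keys.add((measurement, tags, field_key))
    return keys


# Hypothetical data: a unique request ID stored as a tag...
as_tag = [
    "requests,host=a,request_id=6f1c duration=12 1472515200000000000",
    "requests,host=a,request_id=9b2e duration=15 1472515201000000000",
    "requests,host=a,request_id=c07d duration=11 1472515202000000000",
]
# ...and the same data with the request ID stored as a field.
as_field = [
    'requests,host=a request_id="6f1c",duration=12 1472515200000000000',
    'requests,host=a request_id="9b2e",duration=15 1472515201000000000',
    'requests,host=a request_id="c07d",duration=11 1472515202000000000',
]

print(len(series_keys(as_tag)))    # 3: one new series per request
print(len(series_keys(as_field)))  # 2: one series per field key
```

With the ID as a tag, the series count grows with every new request. With the ID as a field, it stays fixed at the number of field keys per tag set.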

Avoid the same name for a tag and a field

Avoid using the same name for a tag and field key, which may result in unexpected behavior when querying data.
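One way to catch such collisions before writing is a small check on each line, sketched here in Python. The `shared_keys` helper and the sample line are hypothetical, and the parser assumes no escaped characters and no spaces inside string field values.

```python
def shared_keys(line):
    """Return keys that appear both as a tag key and as a field key."""
    # Simplified parse: the first space-separated token holds the measurement
    # and tag set, the second holds the field set.
    head, field_set = line.split(" ")[:2]
    tag_keys = {pair.split("=", 1)[0] for pair in head.split(",")[1:]}
    field_keys = {pair.split("=", 1)[0] for pair in field_set.split(",")}
    return tag_keys & field_keys


line = "weather_sensor,region=north,temp=high temp=50.1 1472515200000000000"
print(shared_keys(line))  # {'temp'}
```

An empty result means the line uses distinct names for its tags and fields.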

Avoid encoding data in measurement names

InfluxDB queries merge data that falls within the same measurement; it's better to differentiate data with tags than with detailed measurement names. If you encode data in a measurement name, you must use a regular expression to query the data, making some queries more complicated or impossible.

Example:

Consider the following schema represented by line protocol.

Schema 1 - Data encoded in the measurement name
-------------
blueberries.plot-1.north temp=50.1 1472515200000000000
blueberries.plot-2.midwest temp=49.8 1472515200000000000

The long measurement names (blueberries.plot-1.north) with no tags are similar to Graphite metrics. Encoding the plot and region in the measurement name makes the data more difficult to query.

For example, calculating the average temperature of both plots 1 and 2 is more difficult with schema 1, because each plot is stored in its own measurement. Compare this to schema 2:

Schema 2 - Data encoded in tags
-------------
weather_sensor,crop=blueberries,plot=1,region=north temp=50.1 1472515200000000000
weather_sensor,crop=blueberries,plot=2,region=midwest temp=49.8 1472515200000000000

Use Flux to calculate the average temp for blueberries in the north region:

Flux

// Schema 1 - Query for data encoded in the measurement name
from(bucket: "<database>/<retention_policy>")
  |> range(start: 2016-08-30T00:00:00Z)
  |> filter(fn: (r) => r._measurement =~ /\.north$/ and r._field == "temp")
  |> mean()

// Schema 2 - Query for data encoded in tags
from(bucket: "<database>/<retention_policy>")
  |> range(start: 2016-08-30T00:00:00Z)
  |> filter(fn: (r) => r._measurement == "weather_sensor" and r.region == "north" and r._field == "temp")
  |> mean()
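If you already have data shaped like schema 1, a small preprocessing step can rewrite it before it is written. The following Python sketch assumes every measurement name follows the `<crop>.plot-<n>.<region>` pattern from the example. The `measurement_to_tags` helper is hypothetical, and `weather_sensor` is the target measurement taken from schema 2.

```python
def measurement_to_tags(line, measurement="weather_sensor"):
    """Rewrite a schema 1 line so the encoded values become tags."""
    # Split off the measurement name; the rest (fields and timestamp) is kept as-is.
    name, rest = line.split(" ", 1)
    crop, plot, region = name.split(".")
    plot_number = plot.split("-", 1)[1]  # "plot-1" -> "1"
    return f"{measurement},crop={crop},plot={plot_number},region={region} {rest}"


schema_1 = "blueberries.plot-1.north temp=50.1 1472515200000000000"
print(measurement_to_tags(schema_1))
# weather_sensor,crop=blueberries,plot=1,region=north temp=50.1 1472515200000000000
```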

Avoid putting more than one piece of information in one tag

Splitting a single tag with multiple pieces into separate tags simplifies your queries and reduces the need for regular expressions.

Consider the following schema represented by line protocol.

Schema 1 - Multiple data encoded in a single tag
-------------
weather_sensor,crop=blueberries,location=plot-1.north temp=50.1 1472515200000000000
weather_sensor,crop=blueberries,location=plot-2.midwest temp=49.8 1472515200000000000

The schema 1 data encodes two separate parameters, the plot and the region, in one long tag value (plot-1.north). Compare this to the following schema represented in line protocol.

Schema 2 - Data encoded in multiple tags
-------------
weather_sensor,crop=blueberries,plot=1,region=north temp=50.1 1472515200000000000
weather_sensor,crop=blueberries,plot=2,region=midwest temp=49.8 1472515200000000000

Schema 2 is preferable because, with multiple tags, you don't need a regular expression. The following Flux examples show how to calculate the average temp for blueberries in the north region for both schema 1 and schema 2.

Flux

// Schema 1 - Query for multiple data encoded in a single tag
from(bucket: "<database>/<retention_policy>")
  |> range(start: 2016-08-30T00:00:00Z)
  |> filter(fn: (r) => r._measurement == "weather_sensor" and r.location =~ /\.north$/ and r._field == "temp")
  |> mean()

// Schema 2 - Query for data encoded in multiple tags
from(bucket: "<database>/<retention_policy>")
  |> range(start: 2016-08-30T00:00:00Z)
  |> filter(fn: (r) => r._measurement == "weather_sensor" and r.region == "north" and r._field == "temp")
  |> mean()
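A combined tag can be split into separate tags before the data is written. This Python sketch assumes the `location` tag always has the `plot-<n>.<region>` form shown above and that no characters need escaping. The `split_location_tag` helper is hypothetical.

```python
def split_location_tag(line):
    """Replace the combined location tag with separate plot and region tags."""
    head, rest = line.split(" ", 1)
    measurement, *tag_pairs = head.split(",")
    tags = dict(pair.split("=", 1) for pair in tag_pairs)

    # "plot-1.north" -> plot "1", region "north"
    plot, region = tags.pop("location").split(".")
    tags["plot"] = plot.split("-", 1)[1]
    tags["region"] = region

    # Emit tags sorted by key so the output is stable.
    tag_part = "".join(f",{key}={tags[key]}" for key in sorted(tags))
    return f"{measurement}{tag_part} {rest}"


schema_1 = "weather_sensor,crop=blueberries,location=plot-1.north temp=50.1 1472515200000000000"
print(split_location_tag(schema_1))
# weather_sensor,crop=blueberries,plot=1,region=north temp=50.1 1472515200000000000
```

Other tags on the line, such as `crop`, pass through unchanged.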
