fix: write-data section: consistent page titles, database and table nomenclature in dedicated and clustered

pull/5503/head
Jason Stirnaman 2024-06-25 11:51:26 -05:00
parent f9bd0891d9
commit 9d542dd8d6
8 changed files with 197 additions and 530 deletions

View File

@@ -13,46 +13,61 @@ menu:
Use the following guidelines to design your [schema](/influxdb/cloud-dedicated/reference/glossary/#schema)
for simpler and more performant queries.
- [InfluxDB data structure](#influxdb-data-structure)
- [Primary keys](#primary-keys)
- [Tags versus fields](#tags-versus-fields)
- [Schema restrictions](#schema-restrictions)
- [Do not use duplicate names for tags and fields](#do-not-use-duplicate-names-for-tags-and-fields)
- [Measurements can contain up to 250 columns](#measurements-can-contain-up-to-250-columns)
- [Tables can contain up to 250 columns](#tables-can-contain-up-to-250-columns)
- [Design for performance](#design-for-performance)
- [Avoid wide schemas](#avoid-wide-schemas)
- [Avoid sparse schemas](#avoid-sparse-schemas)
- [Measurement schemas should be homogenous](#measurement-schemas-should-be-homogenous)
- [Table schemas should be homogenous](#table-schemas-should-be-homogenous)
- [Use the best data type for your data](#use-the-best-data-type-for-your-data)
- [Design for query simplicity](#design-for-query-simplicity)
- [Keep measurement names, tags, and fields simple](#keep-measurement-names-tags-and-fields-simple)
- [Keep table names, tags, and fields simple](#keep-table-names-tags-and-fields-simple)
- [Avoid keywords and special characters](#avoid-keywords-and-special-characters)
## InfluxDB data structure
The InfluxDB data model organizes time series data into buckets and measurements.
A bucket can contain multiple measurements. Measurements contain multiple
tags and fields.
The {{% product-name %}} data model organizes time series data into databases and tables.
A database can contain multiple tables.
Tables contain multiple tags and fields.
- **Bucket**: Named location where time series data is stored.
In the InfluxDB SQL implementation, a bucket is synonymous with a _database_.
A bucket can contain multiple _measurements_.
- **Measurement**: Logical grouping for time series data.
In the InfluxDB SQL implementation, a measurement is synonymous with a _table_.
All _points_ in a given measurement should have the same _tags_.
A measurement contains multiple _tags_ and _fields_.
- **Tags**: Key-value pairs that store metadata string values for each point--for example,
a value that identifies or differentiates the data source or context--for example, host,
location, station, etc.
Tag values may be null.
- **Fields**: Key-value pairs that store data for each point--for example,
temperature, pressure, stock price, etc.
Field values may be null, but at least one field value is not null on any given row.
- **Timestamp**: Timestamp associated with the data.
When stored on disk and queried, all data is ordered by time.
In InfluxDB, a timestamp is a nanosecond-scale [unix timestamp](#unix-timestamp) in UTC.
A timestamp is never null.
<!-- vale InfluxDataDocs.v3Schema = NO -->
- **Database**: A named location where time series data is stored.
In {{% product-name %}}, _database_ is synonymous with _bucket_ in InfluxDB Cloud Serverless and InfluxDB TSM implementations.
A database can contain multiple _tables_.
- **Table**: A logical grouping for time series data.
In {{% product-name %}}, _table_ is synonymous with _measurement_ in InfluxDB Cloud Serverless and InfluxDB TSM implementations.
All _points_ in a given table should have the same _tags_.
A table contains multiple _tags_ and _fields_.
- **Tags**: Key-value pairs that store metadata string values for each point--for example,
a value that identifies or differentiates the data source or context, such as host,
location, or station.
Tag values may be null.
- **Fields**: Key-value pairs that store data for each point--for example,
temperature, pressure, stock price, etc.
Field values may be null, but at least one field value is not null on any given row.
- **Timestamp**: Timestamp associated with the data.
When stored on disk and queried, all data is ordered by time.
In InfluxDB, a timestamp is a nanosecond-scale [Unix timestamp](/influxdb/cloud-dedicated/reference/glossary/#unix-timestamp) in UTC.
A timestamp is never null.
{{% note %}}
#### What happened to buckets and measurements?
If you're coming from InfluxDB Cloud Serverless or InfluxDB powered by the TSM storage engine, you're likely familiar
with the concepts of _bucket_ and _measurement_.
_Bucket_ in TSM or InfluxDB Cloud Serverless is synonymous with
_database_ in {{% product-name %}}.
_Measurement_ in TSM or InfluxDB Cloud Serverless is synonymous with
_table_ in {{% product-name %}}.
{{% /note %}}
<!-- vale InfluxDataDocs.v3Schema = YES -->
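For a concrete picture of how these elements fit together, the following minimal Python sketch uses the `Point` builder from the `influxdb_client` Python library to assemble one row for a hypothetical `home` table; the tag, field, and timestamp values are illustrative only, and the target database is whichever one you pass to your write client.

```python
from datetime import datetime, timezone
from influxdb_client import Point

# One point destined for the "home" table. The database it lands in is the
# one you pass to the write client; it isn't part of the point itself.
point = (
    Point("home")                               # table (measurement) name
    .tag("room", "Kitchen")                     # tag: metadata string
    .field("temp", 22.5)                        # field: float value
    .field("co", 9)                             # field: integer value
    .time(datetime(2022, 1, 1, 8, 0, tzinfo=timezone.utc))  # UTC timestamp
)

# The generated line protocol carries the table name, tag set, field set,
# and timestamp for this single row.
print(point.to_line_protocol())
```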
### Primary keys
@@ -91,39 +106,38 @@ cardinality doesn't affect the overall performance of your database.
### Do not use duplicate names for tags and fields
Tags and fields within the same measurement can't be named the same.
Tags and fields within the same table can't be named the same.
All tags and fields are stored as unique columns in a table representing the
measurement on disk.
If you attempt to write a measurement that contains tags or fields with the same name,
table on disk.
If you attempt to write a table that contains tags or fields with the same name,
the write fails due to a column conflict.
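As a minimal sketch (key names here are made up), the following Python snippet builds one point that reuses `temp` as both a tag key and a field key--a write like this would be rejected with a column conflict--and one that keeps the names distinct.

```python
from influxdb_client import Point

# "temp" appears as both a tag key and a field key, so both would map to the
# same column in the "home" table and the write would fail with a column conflict.
conflicting = Point("home").tag("temp", "high").field("temp", 22.5)

# Distinct names avoid the conflict--for example, a "temp_range" tag and a "temp" field.
valid = Point("home").tag("temp_range", "high").field("temp", 22.5)
```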
### Measurements can contain up to 250 columns
### Tables can contain up to 250 columns
A measurement can contain **up to 250 columns**. Each row requires a time column,
but the rest represent tags and fields stored in the measurement.
Therefore, a measurement can contain one time column and 249 total field and tag columns.
If you attempt to write to a measurement and exceed the 250 column limit, the
A table can contain **up to 250 columns**. Each row requires a time column,
but the rest represent tags and fields stored in the table.
Therefore, a table can contain one time column and 249 total field and tag columns.
If you attempt to write to a table and exceed the 250 column limit, the
write request fails and InfluxDB returns an error.
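If you build rows programmatically, a simple pre-write check can catch an over-wide record before it reaches InfluxDB. The helper below is a hypothetical sketch, not part of any client library, and it only guards a single record--the 250-column limit applies to all columns ever written to the table.

```python
MAX_COLUMNS = 250  # one time column plus up to 249 tag and field columns

def exceeds_column_limit(tags: dict, fields: dict) -> bool:
    """Return True if a single record with these keys would need more than 250 columns."""
    return 1 + len(tags) + len(fields) > MAX_COLUMNS

tags = {"room": "Kitchen", "sensor_id": "TLM0101"}
fields = {"temp": 22.5, "hum": 36.2, "co": 9}
assert not exceeds_column_limit(tags, fields)  # 1 time + 2 tags + 3 fields = 6 columns
```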
---
## Design for performance
How you structure your schema within a measurement can affect the overall
performance of queries against that measurement.
How you structure your schema within a table can affect the overall
performance of queries against that table.
The following guidelines help to optimize query performance:
- [Avoid wide schemas](#avoid-wide-schemas)
- [Avoid sparse schemas](#avoid-sparse-schemas)
- [Measurement schemas should be homogenous](#measurement-schemas-should-be-homogenous)
- [Table schemas should be homogenous](#table-schemas-should-be-homogenous)
- [Use the best data type for your data](#use-the-best-data-type-for-your-data)
### Avoid wide schemas
A wide schema is one with many tags and fields and corresponding columns for each.
With the InfluxDB v3 storage engine, wide schemas don't impact query execution performance.
Because v3 is a columnar database, it executes queries only against columns selected in the query.
Because InfluxDB v3 is a columnar database, it executes queries only against columns selected in the query.
Although a wide schema won't affect query performance, it can lead to the following:
@@ -131,11 +145,11 @@ Although a wide schema won't affect query performance, it can lead to the follow
- Decreased sorting performance due to complex primary keys with [too many tags](#avoid-too-many-tags).
The InfluxDB v3 storage engine has a
[limit of 250 columns per measurement](#measurements-can-contain-up-to-250-columns).
[limit of 250 columns per table](#tables-can-contain-up-to-250-columns).
To avoid a wide schema, limit the number of tags and fields stored in a measurement.
To avoid a wide schema, limit the number of tags and fields stored in a table.
If you need to store more than 249 total tags and fields, consider segmenting
your fields into a separate measurement.
your fields into a separate table.
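As a sketch of that segmentation (the table and field names here are hypothetical), you might split one wide logical record into two narrower tables that share the same tag set and timestamp:

```python
from influxdb_client import Point

# Split a wide record into two tables that share the same tag set.
shared_tags = {"factory": "A1", "machine_id": "M-42"}

core = Point("machine_core").field("temp", 71.3).field("rpm", 1200)
extended = Point("machine_extended").field("vibration_x", 0.02).field("vibration_y", 0.01)

for key, value in shared_tags.items():
    core.tag(key, value)
    extended.tag(key, value)
```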
#### Avoid too many tags
@@ -146,8 +160,9 @@ A point that contains more tags has a more complex primary key, which could impa
A sparse schema is one where, for many rows, columns contain null values.
These generally stem from the following:
- [non-homogenous measurement schemas](#measurement-schemas-should-be-homogenous)
These generally stem from the following:
- [non-homogenous table schemas](#table-schemas-should-be-homogenous)
- [writing individual fields with different timestamps](#writing-individual-fields-with-different-timestamps)
Sparse schemas require the InfluxDB query engine to evaluate many
@@ -167,34 +182,38 @@ In contrast, if you report fields at different times while using the same tagset
This requires slightly more resources at ingestion time, but then gets resolved at persistence time or compaction time
and avoids a sparse schema.
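As a minimal sketch of the difference (values and timestamps are illustrative), compare writing each field as its own row with batching the full field set for a tag set at one timestamp:

```python
from datetime import datetime, timedelta, timezone
from influxdb_client import Point

ts = datetime(2022, 1, 1, 8, 0, tzinfo=timezone.utc)

# Each field arrives as its own row, so every row leaves the other
# field columns null at write time--a recipe for a sparse schema.
per_field = [
    Point("home").tag("room", "Kitchen").field("temp", 22.5).time(ts),
    Point("home").tag("room", "Kitchen").field("hum", 36.2).time(ts + timedelta(seconds=5)),
]

# All fields for the tag set share one timestamp, producing a single dense row.
dense = (
    Point("home")
    .tag("room", "Kitchen")
    .field("temp", 22.5)
    .field("hum", 36.2)
    .time(ts)
)
```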
### Measurement schemas should be homogenous
### Table schemas should be homogenous
Data stored within a measurement should be "homogenous," meaning each row should
Data stored within a table should be "homogenous," meaning each row should
have the same tag and field keys.
All rows stored in a measurement share the same columns, but if a point doesn't
All rows stored in a table share the same columns, but if a point doesn't
include a value for a column, the column value is null.
A measurement full of null values has a ["sparse" schema](#avoid-sparse-schemas).
A table full of null values has a ["sparse" schema](#avoid-sparse-schemas).
{{< expand-wrapper >}}
{{% expand "View example of a sparse, non-homogenous schema" %}}
Non-homogenous schemas are often caused by writing points to a measurement with
Non-homogenous schemas are often caused by writing points to a table with
inconsistent tag or field sets.
In the following example, data is collected from two
different sources and each source returns data with different tag and field sets.
{{< flex >}}
{{% flex-content %}}
##### Source 1 tags and fields:
- tags:
- source
- code
- crypto
- fields:
- price
{{% /flex-content %}}
{{% flex-content %}}
{{% /flex-content %}}
{{% flex-content %}}
##### Source 2 tags and fields:
- tags:
- src
- currency
@@ -202,10 +221,10 @@ different sources and each source returns data with different tag and field sets
- fields:
- cost
- volume
{{% /flex-content %}}
{{< /flex >}}
{{% /flex-content %}}
{{< /flex >}}
These sets of data written to the same measurement result in a measurement
These sets of data written to the same table result in a table
full of null values (also known as a _sparse schema_):
| time | source | src | code | currency | crypto | price | cost | volume |
@@ -230,25 +249,25 @@ querying over many long string values can negatively affect performance.
## Design for query simplicity
Naming conventions for measurements, tag keys, and field keys can simplify or
Naming conventions for tables, tag keys, and field keys can simplify or
complicate the process of writing queries for your data.
The following guidelines help to ensure writing queries for your data is as
simple as possible.
- [Keep measurement names, tags, and fields simple](#keep-measurement-names-tags-and-fields-simple)
- [Keep table names, tags, and fields simple](#keep-table-names-tags-and-fields-simple)
- [Avoid keywords and special characters](#avoid-keywords-and-special-characters)
### Keep measurement names, tags, and fields simple
### Keep table names, tags, and fields simple
Use one tag or one field for each data attribute.
If your source data contains multiple data attributes in a single parameter,
split each attribute into its own tag or field.
Measurement names, tag keys, and field keys should be simple and accurately
Table names, tag keys, and field keys should be simple and accurately
describe what each contains.
Keep names free of data.
The most common cause of a complex naming convention is when you try to "embed"
data attributes into a measurement name, tag key, or field key.
data attributes into a table name, tag key, or field key.
When each key and value represents one attribute (not multiple concatenated attributes) of your data,
you'll reduce the need for regular expressions in your queries.
@@ -258,7 +277,7 @@ Without regular expressions, your queries will be easier to write and more perfo
For example, consider the following [line protocol](/influxdb/cloud-dedicated/reference/syntax/line-protocol/) that embeds multiple attributes (location, model, and ID) into a `sensor` tag value:
```
```text
home,sensor=loc-kitchen.model-A612.id-1726ZA temp=72.1
home,sensor=loc-bath.model-A612.id-2635YB temp=71.8
```
@@ -309,7 +328,7 @@ are less performant than simple equality expressions.
The better approach would be to write each sensor attribute as a separate tag:
```
```text
home,location=kitchen,sensor_model=A612,sensor_id=1726ZA temp=72.1
home,location=bath,sensor_model=A612,sensor_id=2635YB temp=71.8
```
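In client code, the same one-tag-per-attribute approach translates directly--for example, a minimal sketch with the Python client's `Point` builder, using the values from the line protocol above:

```python
from influxdb_client import Point

# One tag per attribute: location, sensor_model, and sensor_id each get their own tag.
kitchen = (
    Point("home")
    .tag("location", "kitchen")
    .tag("sensor_model", "A612")
    .tag("sensor_id", "1726ZA")
    .field("temp", 72.1)
)
bath = (
    Point("home")
    .tag("location", "bath")
    .tag("sensor_model", "A612")
    .tag("sensor_id", "2635YB")
    .field("temp", 71.8)
)
```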
@@ -351,19 +370,19 @@ or regular expressions.
### Avoid keywords and special characters
To simplify query writing, avoid using reserved keywords or special characters
in measurement names, tag keys, and field keys.
in table names, tag keys, and field keys.
- [SQL keywords](/influxdb/cloud-dedicated/reference/sql/#keywords)
- [InfluxQL keywords](/influxdb/cloud-dedicated/reference/influxql/#keywords)
When using SQL or InfluxQL to query measurements, tags, and fields with special
When using SQL or InfluxQL to query tables, tags, and fields with special
characters or keywords, you have to wrap these keys in **double quotes**.
```sql
SELECT
"example-field", "tag@1-23"
FROM
"example-measurement"
"example-table"
WHERE
"tag@1-23" = 'ABC'
```

View File

@@ -5,7 +5,7 @@ description: >
to InfluxDB Cloud Dedicated.
menu:
influxdb_cloud_dedicated:
name: Write line protocol
name: Write line protocol data
parent: Write data
weight: 101
related:

View File

@@ -6,7 +6,7 @@ description: >
menu:
influxdb_cloud_dedicated:
name: Use client libraries
parent: Write line protocol
parent: Write line protocol data
identifier: write-client-libs
weight: 103
related:

View File

@@ -6,7 +6,7 @@ description: >
menu:
influxdb_cloud_dedicated:
name: Use the influxctl CLI
parent: Write line protocol
parent: Write line protocol data
identifier: write-influxctl
weight: 101
related:
@@ -89,7 +89,7 @@ Provide the following:
{{% tab-content %}}
{{% influxdb/custom-timestamps %}}
{{% code-placeholders "DATABASE_(NAME|TOKEN)|(LINE_PROTOCOL_FILEPATH)" %}}
{{% code-placeholders "DATABASE_(NAME|TOKEN)" %}}
```sh
influxctl write \
@@ -123,7 +123,7 @@ Replace the following:
{{% /tab-content %}}
{{% tab-content %}}
{{% code-placeholders "DATABASE_(NAME|TOKEN)|(LINE_PROTOCOL_FILEPATH)" %}}
{{% code-placeholders "DATABASE_(NAME|TOKEN)|(\$LINE_PROTOCOL_FILEPATH)" %}}
1. In your terminal, enter the following command to create the sample data file:

View File

@@ -1,379 +0,0 @@
---
title: Write line protocol data to InfluxDB Cloud Serverless
description: >
Use Telegraf and API clients to write line protocol data
to InfluxDB Cloud Serverless.
menu:
influxdb_cloud_serverless:
name: Write line protocol data
parent: Write data
weight: 103
related:
- /influxdb/cloud-serverless/reference/syntax/line-protocol/
- /influxdb/cloud-serverless/reference/syntax/annotated-csv/
- /influxdb/cloud-serverless/reference/cli/influx/write/
- /influxdb/cloud-serverless/get-started/write/
---
Learn the fundamentals of constructing and writing line protocol data.
Use tools like Telegraf and InfluxDB client libraries to
build line protocol, and then write it to an InfluxDB bucket.
You can use these tools to build line protocol from scratch or transform
your data to line protocol.
However, if you already have CSV data, you might want to use tools that [consume CSV
and write it to InfluxDB as line protocol](/influxdb/cloud-serverless/write-data/csv).
<!-- TOC -->
- [Line protocol](#line-protocol)
- [Line protocol elements](#line-protocol-elements)
- [Line protocol element parsing](#line-protocol-element-parsing)
- [Construct line protocol](#construct-line-protocol)
- [Example home schema](#example-home-schema)
- [Set up your project](#set-up-your-project)
- [Construct points and write line protocol](#construct-points-and-write-line-protocol)
- [Run the example](#run-the-example)
- [Home sensor data line protocol](#home-sensor-data-line-protocol)
<!-- /TOC -->
## Line protocol
All data written to InfluxDB is written using [line protocol](/influxdb/cloud-serverless/reference/syntax/line-protocol/), a text-based
format that lets you provide the necessary information to write a data point to InfluxDB.
### Line protocol elements
In InfluxDB, a point contains a measurement name, one or more fields, a timestamp, and optional tags that provide metadata about the observation.
Each line of line protocol contains the following elements:
{{< req type="key" >}}
- {{< req "\*" >}} **measurement**: String that identifies the [measurement](/influxdb/cloud-serverless/reference/glossary/#measurement) to store the data in.
- **tag set**: Comma-delimited list of key value pairs, each representing a tag.
Tag keys and values are unquoted strings. _Spaces, commas, and equal characters must be escaped._
- {{< req "\*" >}} **field set**: Comma-delimited list of key value pairs, each representing a field.
Field keys are unquoted strings. _Spaces and commas must be escaped._
Field values can be [strings](/influxdb/cloud-serverless/reference/syntax/line-protocol/#string) (quoted),
[floats](/influxdb/cloud-serverless/reference/syntax/line-protocol/#float),
[integers](/influxdb/cloud-serverless/reference/syntax/line-protocol/#integer),
[unsigned integers](/influxdb/cloud-serverless/reference/syntax/line-protocol/#uinteger),
or [booleans](/influxdb/cloud-serverless/reference/syntax/line-protocol/#boolean).
- **timestamp**: [Unix timestamp](/influxdb/cloud-serverless/reference/syntax/line-protocol/#unix-timestamp)
associated with the data. InfluxDB supports up to nanosecond precision.
_If the precision of the timestamp is not in nanoseconds, you must specify the
precision when writing the data to InfluxDB._
#### Line protocol element parsing
- **measurement**: Everything before the _first unescaped comma before the first whitespace_.
- **tag set**: Key-value pairs between the _first unescaped comma_ and the _first unescaped whitespace_.
- **field set**: Key-value pairs between the _first and second unescaped whitespaces_.
- **timestamp**: Integer value after the _second unescaped whitespace_.
- Lines are separated by the newline character (`\n`).
Line protocol is whitespace sensitive.
---
{{< influxdb/line-protocol >}}
---
_For schema design recommendations, see [InfluxDB schema design](/influxdb/cloud-serverless/write-data/best-practices/schema-design/)._
## Construct line protocol
With a basic understanding of line protocol, you can now construct line protocol
and write data to InfluxDB.
Consider a use case where you collect data from sensors in your home.
Each sensor collects temperature, humidity, and carbon monoxide readings.
### Example home schema
To collect this data, use the following schema:
- **measurement**: `home`
- **tags**
- `room`: Living Room or Kitchen
- **fields**
- `temp`: temperature in °C (float)
- `hum`: percent humidity (float)
- `co`: carbon monoxide in parts per million (integer)
- **timestamp**: Unix timestamp in _second_ precision
Data is collected hourly beginning at
{{% influxdb/custom-timestamps-span %}}**2022-01-01T08:00:00Z (UTC)** until **2022-01-01T20:00:00Z (UTC)**{{% /influxdb/custom-timestamps-span %}}.
### Set up your project
The examples in this guide assume you followed [Set up InfluxDB](/influxdb/cloud-serverless/get-started/setup/) and [Write data set up](/influxdb/cloud-serverless/get-started/write/#set-up-your-project-and-credentials) instructions in [Get started](/influxdb/cloud-serverless/get-started/).
After setting up InfluxDB and your project, you should have the following:
- InfluxDB Cloud Serverless credentials:
- [Bucket](/influxdb/cloud-serverless/admin/buckets/)
- [Token](/influxdb/cloud-serverless/admin/tokens/)
- [Region URL](/influxdb/cloud-serverless/reference/regions/)
- A directory for your project.
- Credentials stored as environment variables or in a project configuration file--for example, a `.env` ("dotenv") file.
- Client libraries installed for writing data to InfluxDB.
The following example shows how to construct `Point` objects that follow the [example `home` schema](#example-home-schema), and then write the points as line protocol to an
{{% product-name %}} bucket.
### Construct points and write line protocol
{{< tabs-wrapper >}}
{{% tabs %}}
[Go](#)
[Node.js](#)
[Python](#)
{{% /tabs %}}
{{% tab-content %}}
<!-- BEGIN GO SETUP SAMPLE -->
1. Create a file for your module--for example: `write-point.go`.
2. In `write-point.go`, enter the following sample code:
```go
package main
import (
"os"
"time"
"fmt"
"github.com/influxdata/influxdb-client-go/v2"
)
func main() {
// Set a log level constant
const debugLevel uint = 4
/**
* Instantiate a client with a configuration object
* that contains your InfluxDB URL and token.
**/
clientOptions := influxdb2.DefaultOptions().
SetBatchSize(20).
SetLogLevel(debugLevel).
SetPrecision(time.Second)
client := influxdb2.NewClientWithOptions(os.Getenv("INFLUX_URL"),
os.Getenv("INFLUX_TOKEN"),
clientOptions)
/**
* Create an asynchronous, non-blocking write client.
* Provide your InfluxDB org and bucket as arguments
**/
writeAPI := client.WriteAPI(os.Getenv("INFLUX_ORG"), "get-started")
// Get the errors channel for the asynchronous write client.
errorsCh := writeAPI.Errors()
/** Create a point.
* Provide measurement, tags, and fields as arguments.
**/
p := influxdb2.NewPointWithMeasurement("home").
AddTag("room", "Kitchen").
AddField("temp", 72.0).
AddField("hum", 20.2).
AddField("co", 9).
SetTime(time.Now())
// Define a proc for handling errors.
go func() {
for err := range errorsCh {
fmt.Printf("write error: %s\n", err.Error())
}
}()
// Write the point asynchronously
writeAPI.WritePoint(p)
// Send pending writes from the buffer to the bucket.
writeAPI.Flush()
// Ensure background processes finish and release resources.
client.Close()
}
```
<!-- END GO SETUP SAMPLE -->
{{% /tab-content %}}
{{% tab-content %}}
<!-- BEGIN NODE.JS SETUP SAMPLE -->
1. Create a file for your module--for example: `write-point.js`.
2. In `write-point.js`, enter the following sample code:
```js
'use strict'
/** @module write
* Use the JavaScript client library for Node.js. to create a point and write it to InfluxDB
**/
import {InfluxDB, Point} from '@influxdata/influxdb-client'
/** Get credentials from the environment **/
const url = process.env.INFLUX_URL
const token = process.env.INFLUX_TOKEN
const org = process.env.INFLUX_ORG
/**
* Instantiate a client with a configuration object
* that contains your InfluxDB URL and token.
**/
const influxDB = new InfluxDB({url, token})
/**
* Create a write client configured to write to the bucket.
* Provide your InfluxDB org and bucket.
**/
const writeApi = influxDB.getWriteApi(org, 'get-started')
/**
* Create a point and add tags and fields.
* To add a field, call the field method for your data type.
**/
const point1 = new Point('home')
.tag('room', 'Kitchen')
.floatField('temp', 72.0)
.floatField('hum', 20.2)
.intField('co', 9)
console.log(` ${point1}`)
/**
* Add the point to the batch.
**/
writeApi.writePoint(point1)
/**
* Flush pending writes in the batch from the buffer and close the write client.
**/
writeApi.close().then(() => {
console.log('WRITE FINISHED')
})
```
<!-- END NODE.JS SETUP SAMPLE -->
{{% /tab-content %}}
{{% tab-content %}}
<!-- BEGIN PYTHON SETUP SAMPLE -->
1. Create a file for your module--for example: `write-point.py`.
2. In `write-point.py`, enter the following sample code:
```python
import os
from influxdb_client import InfluxDBClient, Point
# Instantiate a client with a configuration object
# that contains your InfluxDB URL and token.
# InfluxDB ignores the org argument, but the client requires it.
client = InfluxDBClient(url=os.getenv('INFLUX_URL'),
token=os.getenv('INFLUX_TOKEN'),
org='ignored')
# Create an array of points with tags and fields.
points = [Point("home")
.tag("room", "Kitchen")
.field("temp", 25.3)
.field('hum', 20.2)
.field('co', 9)]
# Execute code after a successful write request.
# Callback methods receive the configuration and data sent in the request.
def success_callback(self, data):
print(f"{data}")
print(f"WRITE FINISHED")
# Create a write client.
# Optionally, provide callback methods to execute on request success, error, and completion.
with client.write_api(success_callback=success_callback) as write_api:
# Write the data to the bucket.
write_api.write(bucket='get-started',
record=points,
content_encoding="identity",
content_type="text/plain; charset=utf-8",)
# Flush the write buffer and release resources.
write_api.close()
```
<!-- END PYTHON SETUP PROJECT -->
{{% /tab-content %}}
{{< /tabs-wrapper >}}
The sample code does the following:
1. Instantiates a client configured with the InfluxDB URL and API token.
2. Uses the client to instantiate a **write client** with credentials.
3. Constructs a `Point` object with the [measurement](/influxdb/cloud-serverless/reference/glossary/#measurement) name (`"home"`).
4. Adds a tag and fields to the point.
5. Adds the point to a batch to be written to the bucket.
6. Sends the batch to InfluxDB and waits for the response.
7. Executes callbacks for the response, flushes the write buffer, and releases resources.
### Run the example
To run the sample and write the data to your {{% product-name %}} bucket, enter the following command in your terminal:
{{< code-tabs-wrapper >}}
{{% code-tabs %}}
[Go](#)
[Node.js](#)
[Python](#)
{{% /code-tabs %}}
{{% code-tab-content %}}
<!-- BEGIN GO RUN EXAMPLE -->
```sh
go run write-point.go
```
<!-- END GO RUN EXAMPLE -->
{{% /code-tab-content %}}
{{% code-tab-content %}}
<!-- BEGIN NODE.JS RUN EXAMPLE -->
```sh
node write-point.js
```
<!-- END NODE.JS RUN EXAMPLE -->
{{% /code-tab-content %}}
{{% code-tab-content %}}
<!-- BEGIN PYTHON RUN EXAMPLE -->
```sh
python write-point.py
```
<!-- END PYTHON RUN EXAMPLE -->
{{% /code-tab-content %}}
{{< /code-tabs-wrapper >}}
The example logs the point as line protocol to stdout, and then writes the point to the bucket.
The line protocol is similar to the following:
{{% influxdb/custom-timestamps %}}
#### Home sensor data line protocol
```sh
home,room=Kitchen co=9i,hum=20.2,temp=72 1641024000
```
{{% /influxdb/custom-timestamps %}}

View File

@@ -13,50 +13,61 @@ menu:
Use the following guidelines to design your [schema](/influxdb/clustered/reference/glossary/#schema)
for simpler and more performant queries.
<!-- TOC -->
- [InfluxDB data structure](#influxdb-data-structure)
- [Primary keys](#primary-keys)
- [Tags versus fields](#tags-versus-fields)
- [Schema restrictions](#schema-restrictions)
- [Do not use duplicate names for tags and fields](#do-not-use-duplicate-names-for-tags-and-fields)
- [Measurements can contain up to 205 columns](#measurements-can-contain-up-to-205-columns)
- [Tables can contain up to 250 columns](#tables-can-contain-up-to-250-columns)
- [Design for performance](#design-for-performance)
- [Avoid wide schemas](#avoid-wide-schemas)
- [Avoid too many tags](#avoid-too-many-tags)
- [Avoid sparse schemas](#avoid-sparse-schemas)
- [Writing individual fields with different timestamps](#writing-individual-fields-with-different-timestamps)
- [Measurement schemas should be homogenous](#measurement-schemas-should-be-homogenous)
- [Table schemas should be homogenous](#table-schemas-should-be-homogenous)
- [Use the best data type for your data](#use-the-best-data-type-for-your-data)
- [Design for query simplicity](#design-for-query-simplicity)
- [Keep measurement names, tags, and fields simple](#keep-measurement-names-tags-and-fields-simple)
- [Keep table names, tags, and fields simple](#keep-table-names-tags-and-fields-simple)
- [Avoid keywords and special characters](#avoid-keywords-and-special-characters)
<!-- /TOC -->
## InfluxDB data structure
The InfluxDB data model organizes time series data into buckets and measurements.
A bucket can contain multiple measurements. Measurements contain multiple
tags and fields.
The {{% product-name %}} data model organizes time series data into databases and tables.
A database can contain multiple tables.
Tables contain multiple tags and fields.
- **Bucket**: Named location where time series data is stored.
In the InfluxDB SQL implementation, a bucket is synonymous with a _database_.
A bucket can contain multiple _measurements_.
- **Measurement**: Logical grouping for time series data.
In the InfluxDB SQL implementation, a measurement is synonymous with a _table_.
All _points_ in a given measurement should have the same _tags_.
A measurement contains multiple _tags_ and _fields_.
- **Tags**: Key-value pairs that store metadata string values for each point--for example,
a value that identifies or differentiates the data source or context--for example, host,
location, station, etc.
Tag values may be null.
- **Fields**: Key-value pairs that store data for each point--for example,
temperature, pressure, stock price, etc.
Field values may be null, but at least one field value is not null on any given row.
- **Timestamp**: Timestamp associated with the data.
When stored on disk and queried, all data is ordered by time.
In InfluxDB, a timestamp is a nanosecond-scale [unix timestamp](#unix-timestamp) in UTC.
A timestamp is never null.
<!-- vale InfluxDataDocs.v3Schema = NO -->
- **Database**: A named location where time series data is stored.
In {{% product-name %}}, _database_ is synonymous with _bucket_ in InfluxDB Cloud Serverless and InfluxDB TSM implementations.
A database can contain multiple _tables_.
- **Table**: A logical grouping for time series data.
In {{% product-name %}}, _table_ is synonymous with _measurement_ in InfluxDB Cloud Serverless and InfluxDB TSM implementations.
All _points_ in a given table should have the same _tags_.
A table contains multiple _tags_ and _fields_.
- **Tags**: Key-value pairs that store metadata string values for each point--for example,
a value that identifies or differentiates the data source or context, such as host,
location, or station.
Tag values may be null.
- **Fields**: Key-value pairs that store data for each point--for example,
temperature, pressure, stock price, etc.
Field values may be null, but at least one field value is not null on any given row.
- **Timestamp**: Timestamp associated with the data.
When stored on disk and queried, all data is ordered by time.
In InfluxDB, a timestamp is a nanosecond-scale [Unix timestamp](/influxdb/clustered/reference/glossary/#unix-timestamp) in UTC.
A timestamp is never null.
{{% note %}}
#### What happened to buckets and measurements?
If you're coming from InfluxDB Cloud Serverless or InfluxDB powered by the TSM storage engine, you're likely familiar
with the concepts of _bucket_ and _measurement_.
_Bucket_ in TSM or InfluxDB Cloud Serverless is synonymous with
_database_ in {{% product-name %}}.
_Measurement_ in TSM or InfluxDB Cloud Serverless is synonymous with
_table_ in {{% product-name %}}.
{{% /note %}}
<!-- vale InfluxDataDocs.v3Schema = YES -->
### Primary keys
@@ -95,32 +106,32 @@ cardinality doesn't affect the overall performance of your database.
### Do not use duplicate names for tags and fields
Tags and fields within the same measurement can't be named the same.
Tags and fields within the same table can't be named the same.
All tags and fields are stored as unique columns in a table representing the
measurement on disk.
If you attempt to write a measurement that contains tags or fields with the same name,
table on disk.
If you attempt to write a table that contains tags or fields with the same name,
the write fails due to a column conflict.
### Measurements can contain up to 250 columns
### Tables can contain up to 250 columns
A measurement can contain **up to 250 columns**. Each row requires a time column,
but the rest represent tags and fields stored in the measurement.
Therefore, a measurement can contain one time column and 249 total field and tag columns.
If you attempt to write to a measurement and exceed the 250 column limit, the
A table can contain **up to 250 columns**. Each row requires a time column,
but the rest represent tags and fields stored in the table.
Therefore, a table can contain one time column and 249 total field and tag columns.
If you attempt to write to a table and exceed the 250 column limit, the
write request fails and InfluxDB returns an error.
---
## Design for performance
How you structure your schema within a measurement can affect the overall
performance of queries against that measurement.
How you structure your schema within a table can affect the overall
performance of queries against that table.
The following guidelines help to optimize query performance:
- [Avoid wide schemas](#avoid-wide-schemas)
- [Avoid sparse schemas](#avoid-sparse-schemas)
- [Measurement schemas should be homogenous](#measurement-schemas-should-be-homogenous)
- [Table schemas should be homogenous](#table-schemas-should-be-homogenous)
- [Use the best data type for your data](#use-the-best-data-type-for-your-data)
### Avoid wide schemas
@@ -134,11 +145,11 @@ Although a wide schema won't affect query performance, it can lead to the follow
- Decreased sorting performance due to complex primary keys with [too many tags](#avoid-too-many-tags).
The InfluxDB v3 storage engine has a
[limit of 250 columns per measurement](#measurements-can-contain-up-to-250-columns).
[limit of 250 columns per table](#tables-can-contain-up-to-250-columns).
To avoid a wide schema, limit the number of tags and fields stored in a measurement.
To avoid a wide schema, limit the number of tags and fields stored in a table.
If you need to store more than 249 total tags and fields, consider segmenting
your fields into a separate measurement.
your fields into a separate table.
#### Avoid too many tags
@@ -149,8 +160,9 @@ A point that contains more tags has a more complex primary key, which could impa
A sparse schema is one where, for many rows, columns contain null values.
These generally stem from the following:
- [non-homogenous measurement schemas](#measurement-schemas-should-be-homogenous)
These generally stem from the following:
- [non-homogenous table schemas](#table-schemas-should-be-homogenous)
- [writing individual fields with different timestamps](#writing-individual-fields-with-different-timestamps)
Sparse schemas require the InfluxDB query engine to evaluate many
@@ -170,34 +182,38 @@ In contrast, if you report fields at different times while using the same tagset
This requires slightly more resources at ingestion time, but then gets resolved at persistence time or compaction time
and avoids a sparse schema.
### Measurement schemas should be homogenous
### Table schemas should be homogenous
Data stored within a measurement should be "homogenous," meaning each row should
Data stored within a table should be "homogenous," meaning each row should
have the same tag and field keys.
All rows stored in a measurement share the same columns, but if a point doesn't
All rows stored in a table share the same columns, but if a point doesn't
include a value for a column, the column value is null.
A measurement full of null values has a ["sparse" schema](#avoid-sparse-schemas).
A table full of null values has a ["sparse" schema](#avoid-sparse-schemas).
{{< expand-wrapper >}}
{{% expand "View example of a sparse, non-homogenous schema" %}}
Non-homogenous schemas are often caused by writing points to a measurement with
Non-homogenous schemas are often caused by writing points to a table with
inconsistent tag or field sets.
In the following example, data is collected from two
different sources and each source returns data with different tag and field sets.
{{< flex >}}
{{% flex-content %}}
##### Source 1 tags and fields:
- tags:
- source
- code
- crypto
- fields:
- price
{{% /flex-content %}}
{{% flex-content %}}
{{% /flex-content %}}
{{% flex-content %}}
##### Source 2 tags and fields:
- tags:
- src
- currency
@@ -205,10 +221,10 @@ different sources and each source returns data with different tag and field sets
- fields:
- cost
- volume
{{% /flex-content %}}
{{< /flex >}}
{{% /flex-content %}}
{{< /flex >}}
These sets of data written to the same measurement will result in a measurement
These sets of data written to the same table result in a table
full of null values (also known as a _sparse schema_):
| time | source | src | code | currency | crypto | price | cost | volume |
@@ -225,27 +241,33 @@ full of null values (also known as a _sparse schema_):
{{% /expand %}}
{{< /expand-wrapper >}}
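One way to keep a table homogenous is to validate incoming records against the tag and field keys the table expects before writing. The sketch below is hypothetical--the expected keys mirror "Source 1" in the example above, and the Source 2 values are made up.

```python
# Keys that "Source 1" in the example above writes to the table.
EXPECTED_TAGS = {"source", "code", "crypto"}
EXPECTED_FIELDS = {"price"}

def is_homogenous(tags: dict, fields: dict) -> bool:
    """True when a record carries exactly the tag and field keys the table expects."""
    return set(tags) == EXPECTED_TAGS and set(fields) == EXPECTED_FIELDS

# A "Source 2" record uses different keys (src, currency, cost, volume), so it fails
# the check; rename or map its keys--or write it to its own table--before ingesting.
source_2_tags = {"src": "sensor-2", "currency": "EUR", "crypto": "bitcoin"}
source_2_fields = {"cost": 1234.5, "volume": 10.0}
assert not is_homogenous(source_2_tags, source_2_fields)
```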
### Use the best data type for your data
When writing data to a field, use the most appropriate [data type](/influxdb/clustered/reference/glossary/#data-type) for your data--write integers as integers, decimals as floats, and booleans as booleans.
A query against a field that stores integers outperforms a query against string data;
querying over many long string values can negatively affect performance.
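With the `influxdb_client` Python library's `Point` builder, the Python type you pass determines the type stored--a quick sketch (field names and values are examples):

```python
from influxdb_client import Point

point = (
    Point("home")
    .field("co", 9)             # Python int   -> InfluxDB integer
    .field("temp", 22.5)        # Python float -> InfluxDB float
    .field("door_open", True)   # Python bool  -> InfluxDB boolean
    .field("status", "ok")      # Python str   -> InfluxDB string
)

# Passing "9" (a string) instead of 9 would create a string column and make
# numeric comparisons slower and more awkward to query.
print(point.to_line_protocol())
```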
## Design for query simplicity
Naming conventions for measurements, tag keys, and field keys can simplify or
Naming conventions for tables, tag keys, and field keys can simplify or
complicate the process of writing queries for your data.
The following guidelines help to ensure writing queries for your data is as
simple as possible.
- [Keep measurement names, tags, and fields simple](#keep-measurement-names-tags-and-fields-simple)
- [Keep table names, tags, and fields simple](#keep-table-names-tags-and-fields-simple)
- [Avoid keywords and special characters](#avoid-keywords-and-special-characters)
### Keep measurement names, tags, and fields simple
### Keep table names, tags, and fields simple
Use one tag or one field for each data attribute.
If your source data contains multiple data attributes in a single parameter,
split each attribute into its own tag or field.
Measurement names, tag keys, and field keys should be simple and accurately
Table names, tag keys, and field keys should be simple and accurately
describe what each contains.
Keep names free of data.
The most common cause of a complex naming convention is when you try to "embed"
data attributes into a measurement name, tag key, or field key.
data attributes into a table name, tag key, or field key.
When each key and value represents one attribute (not multiple concatenated attributes) of your data,
you'll reduce the need for regular expressions in your queries.
@@ -255,7 +277,7 @@ Without regular expressions, your queries will be easier to write and more perfo
For example, consider the following [line protocol](/influxdb/clustered/reference/syntax/line-protocol/) that embeds multiple attributes (location, model, and ID) into a `sensor` tag value:
```
```text
home,sensor=loc-kitchen.model-A612.id-1726ZA temp=72.1
home,sensor=loc-bath.model-A612.id-2635YB temp=71.8
```
@@ -306,7 +328,7 @@ are less performant than simple equality expressions.
The better approach would be to write each sensor attribute as a separate tag:
```
```text
home,location=kitchen,sensor_model=A612,sensor_id=1726ZA temp=72.1
home,location=bath,sensor_model=A612,sensor_id=2635YB temp=71.8
```
@@ -348,19 +370,19 @@ or regular expressions.
### Avoid keywords and special characters
To simplify query writing, avoid using reserved keywords or special characters
in measurement names, tag keys, and field keys.
in table names, tag keys, and field keys.
- [SQL keywords](/influxdb/clustered/reference/sql/#keywords)
- [InfluxQL keywords](/influxdb/clustered/reference/influxql/#keywords)
When using SQL or InfluxQL to query measurements, tags, and fields with special
When using SQL or InfluxQL to query tables, tags, and fields with special
characters or keywords, you have to wrap these keys in **double quotes**.
```sql
SELECT
"example-field", "tag@1-23"
FROM
"example-measurement"
"example-table"
WHERE
"tag@1-23" = 'ABC'
```

View File

@@ -4,7 +4,7 @@ description: >
Use Telegraf and API clients to write line protocol data to InfluxDB Clustered.
menu:
influxdb_clustered:
name: Write line protocol
name: Write line protocol data
parent: Write data
weight: 101
related:
@@ -21,34 +21,35 @@ your data to line protocol.
However, if you already have CSV data, you might want to use tools that
[consume CSV and write it to InfluxDB as line protocol](/influxdb/clustered/write-data/csv/).
<!-- TOC -->
- [Line protocol](#line-protocol)
- [Line protocol elements](#line-protocol-elements)
- [Line protocol element parsing](#line-protocol-element-parsing)
- [Write line protocol to InfluxDB](#write-line-protocol-to-influxdb)
<!-- /TOC -->
## Line protocol
All data written to InfluxDB is written using [line protocol](/influxdb/clustered/reference/syntax/line-protocol/), a text-based
All data written to InfluxDB is written using
[line protocol](/influxdb/clustered/reference/syntax/line-protocol/), a text-based
format that lets you provide the necessary information to write a data point to InfluxDB.
### Line protocol elements
In InfluxDB, a point contains a measurement name, one or more fields, a timestamp, and optional tags that provide metadata about the observation.
In InfluxDB, a point contains a measurement name, one or more fields,
a timestamp, and optional tags that provide metadata about the observation.
Each line of line protocol contains the following elements:
{{< req type="key" >}}
- {{< req "\*" >}} **measurement**: A string that identifies the [table](/influxdb/clustered/reference/glossary/#table) to store the data in.
- {{< req "\*" >}} **measurement**: A string that identifies the
[table](/influxdb/clustered/reference/glossary/#table) to store the data in.
- **tag set**: Comma-delimited list of key value pairs, each representing a tag.
Tag keys and values are unquoted strings. _Spaces, commas, and equal characters must be escaped._
- {{< req "\*" >}} **field set**: Comma-delimited list of key value pairs, each representing a field.
Tag keys and values are unquoted strings. _Spaces, commas, and equal characters
must be escaped._
- {{< req "\*" >}} **field set**: Comma-delimited list of key value pairs, each
representing a field.
Field keys are unquoted strings. _Spaces and commas must be escaped._
Field values can be [strings](/influxdb/clustered/reference/syntax/line-protocol/#string) (quoted),
Field values can be [strings](/influxdb/clustered/reference/syntax/line-protocol/#string)
(quoted),
[floats](/influxdb/clustered/reference/syntax/line-protocol/#float),
[integers](/influxdb/clustered/reference/syntax/line-protocol/#integer),
[unsigned integers](/influxdb/clustered/reference/syntax/line-protocol/#uinteger),
@@ -60,8 +61,10 @@ Each line of line protocol contains the following elements:
#### Line protocol element parsing
- **measurement**: Everything before the _first unescaped comma before the first whitespace_.
- **tag set**: Key-value pairs between the _first unescaped comma_ and the _first unescaped whitespace_.
- **measurement**: Everything before the _first unescaped comma before the first
whitespace_.
- **tag set**: Key-value pairs between the _first unescaped comma_ and the _first
unescaped whitespace_.
- **field set**: Key-value pairs between the _first and second unescaped whitespaces_.
- **timestamp**: Integer value after the _second unescaped whitespace_.
- Lines are separated by the newline character (`\n`).
@@ -73,7 +76,8 @@ Each line of line protocol contains the following elements:
---
_For schema design recommendations, see [InfluxDB schema design](/influxdb/clustered/write-data/best-practices/schema-design/)._
_For schema design recommendations, see
[InfluxDB schema design](/influxdb/clustered/write-data/best-practices/schema-design/)._
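To see where these splits fall in practice, the following minimal Python sketch assembles one line by hand; the table, tag, and field values are illustrative, and note the escaped space in the tag value and the second-precision timestamp, which you'd declare when writing.

```python
# Assemble one line of line protocol by hand to see where a parser splits it.
measurement = "home"                    # identifies the table; everything before the first unescaped comma
tag_set = "room=Living\\ Room"          # between the first comma and the first unescaped space
field_set = "temp=21.1,hum=35.9,co=0i"  # between the first and second unescaped spaces
timestamp = "1641024000"                # after the second unescaped space (second precision)

line = f"{measurement},{tag_set} {field_set} {timestamp}"
print(line)  # home,room=Living\ Room temp=21.1,hum=35.9,co=0i 1641024000
```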
## Write line protocol to InfluxDB

View File

@@ -12,7 +12,8 @@
"lint-staged": "^15.2.5",
"postcss": ">=8.4.31",
"postcss-cli": ">=9.1.0",
"prettier": "^3.2.5"
"prettier": "^3.2.5",
"prettier-plugin-sql": "^0.18.0"
},
"dependencies": {
"axios": "^1.6.0",