fix(monolith): Ports edits to Core, misc. fixes

pull/5759/head
Jason Stirnaman 2025-01-13 04:03:19 -06:00
parent fd8e38b639
commit d82149e6e6
2 changed files with 195 additions and 113 deletions


## Getting Started with InfluxDB 3 Core
InfluxDB is a database built to collect, process, transform, and store event and time series data. It is ideal for use cases that require real-time ingest and fast query response times to build user interfaces, monitoring, and automation solutions.
Common use cases include:
- Monitoring sensor data
- Server monitoring
- Application performance monitoring
- Network monitoring
- Financial market and trading analytics
- Behavioral analytics
InfluxDB is optimized for scenarios where near real-time data monitoring is essential and queries need to return quickly to support user experiences such as dashboards and interactive user interfaces.
{{% product-name %}} is the InfluxDB 3 open source release.
Core's feature highlights include:
* Diskless architecture with object storage support (or local disk with no dependencies)
* Fast query response times (under 10ms for last-value queries, or 30ms for distinct metadata)
* Embedded Python VM for plugins and triggers
The Enterprise version adds onto Core's functionality with:
* Row-level delete support (coming soon)
* Integrated admin UI (coming soon)
For more information, see the [Enterprise guide](/influxdb3/enterprise/get-started/).
### What's in this guide
This guide covers InfluxDB 3 Core (the open source release), including the following topics:
* [Install and startup](#install-and-startup)
* [Data Model](#data-model)
* [Write data to the database](#write-data)
* [Query the database](#query-the-database)
* [Last Values Cache](#last-values-cache)
* [Distinct Values Cache](#distinct-values-cache)
* [Python plugins and the processing engine](#python-plugins-and-the-processing-engine)
* [Diskless architecture](#diskless-architecture)
### Install and startup
{{% product-name %}} runs on **Linux**, **macOS**, and **Windows**.
[Run the install script](#run-the-install-script) to get started quickly,
regardless of your operating system.
Or, if you prefer, you can download and install {{% product-name %}} from [build artifacts and Docker images](#optional-download-build-artifacts-and-docker-images).
#### Run the install script
Enter the following command to use [curl](https://curl.se/download.html) to download the script and install {{% product-name %}}, regardless of your operating system:
```bash
curl -O https://www.influxdata.com/d/install_influxdb3.sh && sh install_influxdb3.sh
```
To verify that the download and installation completed successfully, run the following command:
```bash
influxdb3 --version
```
If your system doesn't locate `influxdb3`, then `source` the configuration file (for example, .bashrc, .zshrc) for your shell--for example:
```zsh
source ~/.zshrc
```
#### Optional: Download build artifacts and Docker images
Download the latest build artifacts and Docker images from the links below. These are updated with every merge into `main`.
##### {{% product-name %}} (latest):
* Docker: [quay.io/influxdb/influxdb3-core:latest](https://quay.io/influxdb/influxdb3-core:latest)
* [Linux | x86 | musl](https://dl.influxdata.com/influxdb/snapshots/influxdb3-core_x86_64-unknown-linux-musl.tar.gz)
* [Linux | x86 | gnu](https://dl.influxdata.com/influxdb/snapshots/influxdb3-core_x86_64-unknown-linux-gnu.tar.gz)
* [macOS | Darwin](https://dl.influxdata.com/influxdb/snapshots/influxdb3-core_aarch64-apple-darwin.tar.gz)
* [Windows | x86](https://dl.influxdata.com/influxdb/snapshots/influxdb3-core_x86_64-pc-windows-gnu.tar.gz)
#### Start InfluxDB
To start your InfluxDB instance, use the `influxdb3 serve` command
and provide an object store configuration and a unique `writer-id`.
- `--object-store`: InfluxDB supports various storage options, including the local file system, memory, S3 (and compatible services like Ceph or Minio), Google Cloud Storage, and Azure Blob Storage.
- `--writer-id`: This string identifier determines the path under which all files written by this instance will be stored in the configured storage location.
The following examples show how to start InfluxDB with different object store configurations:
```bash
# MEMORY
influxdb3 serve --writer-id=local01 --object-store=memory
```
```bash
# FILESYSTEM
influxdb3 serve --writer-id=local01 --object-store=file --data-dir ~/.influxdb3
```
```bash
# S3
influxdb3 serve --writer-id=local01 --object-store=s3 --bucket=[BUCKET] --aws-access-key=[AWS ACCESS KEY] --aws-secret-access-key=[AWS SECRET ACCESS KEY]
```
```bash
# Minio/Open Source Object Store (Uses the AWS S3 API, with additional parameters)
influxdb3 serve --writer-id=local01 --object-store=s3 --bucket=[BUCKET] --aws-access-key=[AWS ACCESS KEY] --aws-secret-access-key=[AWS SECRET ACCESS KEY] --aws-endpoint=[ENDPOINT] --aws-allow-http
```
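If you'd rather run the Docker image linked above, the following is a sketch of an equivalent start command--it assumes the image's entrypoint is the `influxdb3` binary and that the server listens on the default port 8181:

```bash
docker run -it -p 8181:8181 quay.io/influxdb/influxdb3-core:latest \
  serve --writer-id=local01 --object-store=memory
```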
### Data Model
The database server contains logical databases, which have tables, which have columns. Compared to previous versions of InfluxDB you can think of a database as a `bucket` in v2 or as a `db/retention_policy` in v1. A `table` is equivalent to a `measurement`, which has columns that can be of type `tag` (a string dictionary), `int64`, `float64`, `uint64`, `bool`, or `string` and finally every table has a `time` column that is a nanosecond precision timestamp.
In InfluxDB 3, every table has a primary key--the ordered set of tags and the time--for its data.
This is the sort order used for all Parquet files that get created. When you create a table, either through an explicit call or by writing data into a table for the first time, it sets the primary key to the tags in the order they arrived. This is immutable. Although InfluxDB is still a _schema-on-write_ database, the tag column definitions for a table are immutable.
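For example, if the first write to a new `cpu` table is the following point, the primary key becomes `(host, region, time)`, and later writes to `cpu` can't introduce new tags:

```
cpu,host=Alpha,region=us-west usage_percent=25.3
```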
Tags should hold unique identifying information, such as `sensor_id`, `building_id`, or `trace_id`. All other data should be kept in fields. You will be able to add fast last N value and distinct value lookups later for any column, whether it is a field or a tag.
### Write Data
InfluxDB is a schema-on-write database. You can start writing data and InfluxDB creates the logical database, tables, and their schemas on the fly.
After a schema is created, InfluxDB validates future write requests against it before accepting the data.
Subsequent requests can add new fields on-the-fly, but can't add new tags.
InfluxDB 3 Core is optimized for recent data only--it accepts writes for data with timestamps from the last 72 hours. It will persist that data in Parquet files for access by third-party systems for longer term historical analysis and queries. If you require longer historical queries with a compactor that optimizes data organization, consider using [InfluxDB 3 Enterprise](/influxdb3/enterprise/get-started/).
**Note**: write requests to the database _don't_ return until a WAL file has been flushed to the configured object store, which by default happens once per second.
This means that individual write requests may not complete quickly, but you can make many concurrent requests to get higher total throughput. In the future, we will add an API parameter to make requests that do not wait for the WAL flush to return.
The database has three write API endpoints that respond to HTTP `POST` requests:
* `/write?db=mydb&precision=ns`
* `/api/v2/write?db=mydb&precision=ns`
* `/api/v3/write?db=mydb&precision=ns`
{{% product-name %}} provides the `/write` and `/api/v2` endpoints for backward compatibility with clients that can write data to previous versions of InfluxDB.
However, these APIs differ from the APIs in the previous versions in the following ways:
- Tags in a table (measurement) are _immutable_
- A tag and a field can't have the same name within a table.
The `/api/v3/write` endpoint accepts the same line protocol syntax as previous versions, and brings new functionality that lets you accept or reject partial writes using the `accept_partial` parameter (`true` is default).
The following code block is an example of [line protocol](/influxdb3/core/reference/syntax/line-protocol/), which shows the table name followed by tags, which are an ordered, comma-separated list of key/value pairs where the values are strings, followed by a comma-separated list of key/value pairs that are the fields, and ending with an optional timestamp. The timestamp by default is a nanosecond epoch, but you can specify a different precision through the `precision` query parameter.
```
cpu,host=Alpha,region=us-west,application=webserver val=1i,usage_percent=20.5,status="OK"
cpu,host=Bravo,region=us-central,application=database val=5i,usage_percent=80.5,status="OK"
cpu,host=Alpha,region=us-west,application=webserver val=6i,usage_percent=25.3,status="Warn"
```
If you save the preceding line protocol to a file (for example, `server_data`), then you can use the `influxdb3` CLI to write the data--for example:
```bash
influxdb3 write --database=mydb --file=server_data
```
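You can also write over HTTP. The following is a sketch using the v3 endpoint--the host and port assume the default configuration, and `accept_partial=false` rejects the whole batch if any line is invalid:

```bash
curl "http://127.0.0.1:8181/api/v3/write?db=mydb&accept_partial=false" \
  --data-binary @server_data
```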
The written data goes into WAL files, created once per second, and into an in-memory queryable buffer. Later, InfluxDB snapshots the WAL and persists the data into object storage as Parquet files.
We'll cover the [diskless architecture](#diskless-architecture) later in this document.
#### Create a Database or Table
To create a database without writing data into it, use the `create` subcommand--for example:
```bash
influxdb3 create database mydb
```
To learn more about a subcommand, use the `-h, --help` flag:
```bash
influxdb3 create -h
```
### Query the database
InfluxDB 3 now supports native SQL for querying, in addition to InfluxQL, an SQL-like language customized for time series queries.
> [!Note]
> Flux, the language introduced in InfluxDB 2.0, is **not** supported in InfluxDB 3.
The quickest way to get started querying is to use the `influxdb3` CLI (which uses the Flight SQL API over HTTP2).
The `query` subcommand includes options to help ensure that the right database is queried with the correct permissions. Only the `--database` option is required, but depending on your specific setup, you may need to pass other options, such as host, port, and token.
| Option | Description | Required |
|---------|-------------|--------------|
| `--host` | The host URL of the running {{% product-name %}} server [default: http://127.0.0.1:8181] | No |
| `--database` | The name of the database to operate on | Yes |
| `--token` | The token for authentication with the {{% product-name %}} server | No |
| `--language` | The query language used to format the provided query string [default: sql] [possible values: sql, influxql] | No |
| `--format` | The format in which to output the query [default: pretty] [possible values: pretty, json, json_lines, csv, parquet] | No |
| `--output` | Put all query output into `output` | No |
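For example, a sketch that combines these options to write query results to a CSV file (the output path is hypothetical):

```bash
influxdb3 query --database=servers --format=csv --output=/tmp/cpu.csv "SELECT * FROM cpu"
```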
#### Example: query `SHOW TABLES` on the `servers` database:
```
$ influxdb3 query --database=servers "SHOW TABLES"
+---------------+--------------------+--------------+------------+
```
#### Example: query the `cpu` table, limiting to 10 rows:
```
$ influxdb3 query --database=servers "SELECT DISTINCT usage_percent, time FROM cpu LIMIT 10"
```
### Query using the CLI for InfluxQL
[InfluxQL](/influxdb3/core/reference/influxql/) is an SQL-like language developed by InfluxData with features tailored for working with InfluxDB. It's compatible with all versions of InfluxDB, making it a good choice for interoperability across different InfluxDB installations.
To query using InfluxQL, enter the `influxdb3 query` subcommand and specify `influxql` in the language option--for example:
```bash
influxdb3 query --database=servers --lang=influxql "SELECT DISTINCT usage_percent FROM cpu WHERE time >= now() - 1d"
```
### Query using the API
InfluxDB 3 supports Flight (gRPC) APIs and an HTTP API.
To query your database using the HTTP API, send a request to the `/api/v3/query_sql` or `/api/v3/query_influxql` endpoints.
In the request, specify the database name in the `db` parameter
and a query in the `q` parameter.
You can pass parameters in the query string or inside a JSON object.
Use the `format` parameter to specify the response format: `pretty`, `jsonl`, `parquet`, `csv`, and `json`. Default is `json`.
##### Example: Query passing URL-encoded parameters
The following example sends an HTTP `GET` request with a URL-encoded SQL query:
```bash
curl -v "http://127.0.0.1:8181/api/v3/query_sql?db=servers&q=select+*+from+cpu+limit+5"
```
##### Example: Query passing JSON parameters
The following example sends an HTTP `POST` request with parameters in a JSON payload:
```bash
curl http://127.0.0.1:8181/api/v3/query_sql --data '{"db": "servers", "q": "select * from cpu limit 5"}'
```
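The `format` parameter works with either style--for example, a sketch requesting CSV output:

```bash
curl "http://127.0.0.1:8181/api/v3/query_sql?db=servers&q=select+*+from+cpu+limit+5&format=csv"
```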
### Query using the Python client
Use the InfluxDB 3 Python library to interact with the database and integrate with your application.
We recommend installing the required packages in a Python virtual environment for your specific project.
To get started, install the `influxdb3-python` package.
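Using pip (inside your project's virtual environment):

```bash
pip install influxdb3-python
```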
Then instantiate a client. The connection details below are assumptions for a local, no-auth server:

```py
from influxdb_client_3 import InfluxDBClient3

client = InfluxDBClient3(
    host="http://127.0.0.1:8181",
    database="servers"
)
```
The following example shows how to query using SQL, and then
use PyArrow to explore the schema and process results:
```py
from influxdb_client_3 import InfluxDBClient3
# Connection details are assumptions for a local, no-auth server
client = InfluxDBClient3(host="http://127.0.0.1:8181", database="servers")

# query() returns a PyArrow Table by default
table = client.query(query="SELECT * FROM cpu LIMIT 10", language="sql")

print("\n#### View Schema information\n")
print(table.schema)
print("\n#### Use PyArrow to read the specified columns\n")
print(table.column('usage_percent'))
print(table.select(['host', 'usage_percent']))
print(table.select(['time', 'host', 'usage_percent']))
print("\n#### Use PyArrow compute functions to aggregate data\n")
print(table.group_by('host').aggregate([]))
print(table.group_by('application').aggregate([('usage_percent', 'mean')]))
```
For more information about the Python client library, see the [`influxdb3-python` repository](https://github.com/InfluxCommunity/influxdb3-python) in GitHub.
## Last values cache
{{% product-name %}} supports a **last-n values cache** which stores the last N values in a series or column hierarchy in memory. This gives the database the ability to answer these kinds of queries in under 10 milliseconds.
You can use the `influxdb3` CLI to create a last value cache.
```
Usage: $ influxdb3 create last-cache [OPTIONS] -d <DATABASE_NAME> -t <TABLE>
Options:
```
You can create a last value cache per time series, but be mindful of high cardinality tables that could take excessive memory.
Here's an example of creating and using this cache:
```
| host | application | time | usage_percent | status |
| Bravo | database | 2024-12-11T10:01:00 | 80.5 | OK |
| Alpha | webserver | 2024-12-11T10:02:00 | 25.3 | Warn |
```
```bash
influxdb3 create last-cache --database=servers --table=cpu --cache-name=cpuCache --key-columns=host,application --value-columns=usage_percent,status --count=5
```
To leverage the LVC, you need to specifically call on it using the `last_cache()` function--for example:

```bash
influxdb3 query --database=servers "SELECT * FROM last_cache('cpu', 'cpuCache') WHERE host = 'Bravo'"
```
{{% note %}}
#### Only works with SQL
The Last Value Cache only works with SQL, not InfluxQL; SQL is the default language.
{{% /note %}}
### Delete a Last Values Cache
### Distinct Values Cache
Similar to the Last Values Cache, the database can cache in RAM the distinct values for a single column in a table or a hierarchy of columns. This is useful for fast metadata lookups, which can return in under 30 milliseconds. Many of the options are similar to the last value cache. See the CLI output for more information:
```bash
influxdb3 create distinct_cache -h
```
### Python Plugins and the Processing Engine
{{% note %}}
#### Only supported in Docker
As of this writing, the Processing Engine is only supported in Docker environments.
We expect it to launch in non-Docker environments soon. We're still in very active development creating the API and developer experience; things will break and change fast. Join our <a href="https://discord.com/invite/eMnhxPyj">Discord</a> to ask questions and give feedback.
{{% /note %}}
InfluxDB 3 has an embedded Python VM for running code inside the database. Currently, we only support plugins that get triggered on WAL file flushes, but more will be coming soon. Specifically, plugins will be able to be triggered by:
* On WAL flush: sends a batch of write data to a plugin once a second (can be configured).
* On Snapshot (persist of Parquet files): sends the metadata to a plugin to do further processing against the Parquet data or send the information elsewhere (for example, adding it to an Iceberg Catalog).
* On Schedule: executes plugin on a schedule configured by the user, and is useful for data collection and deadman monitoring.
* On Request: binds a plugin to an HTTP endpoint at `/api/v3/plugins/<name>` where request headers and content are sent to the plugin, which can then parse, process, and send the data into the database or to third-party services.
Plugins work in two parts: plugins and triggers. A plugin is the generic Python code. After you've loaded a plugin into the server, you can create many triggers of that plugin. A trigger has a plugin, a database, and a trigger-spec, which can be either `all_tables` or `table:my_table_name`, where `my_table_name` is the name of the table you want to filter the plugin to.
You can also specify a list of key/value pairs as arguments supplied to a trigger. This lets you have many triggers of the same plugin, each with different arguments to check for different things. These commands will give you useful information:
```bash
influxdb3 create plugin -h
influxdb3 create trigger -h
```
> [!Note]
> #### Plugins only work with x86 Docker
> For now, plugins only work with the x86 Docker image.
Before we try to load up a plugin and create a trigger for it, we should write one and test it out. To test out and run plugins, you'll need to create a plugin directory. Start up your server with the `--plugin-dir` argument and point it at your plugin directory (note that you'll need to make this available in your Docker container).
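Here's a minimal sketch of a WAL-flush plugin, assuming the `process_writes(influxdb3_local, table_batches, args=None)` entry point; the `table_batch` keys and the `influxdb3_local.info()` and `influxdb3_local.query()` helpers are illustrative assumptions:

```py
# plugins/test_plugin.py -- a minimal sketch of a WAL-flush plugin.
# The process_writes entry point comes from the guide; the table_batch
# keys and the influxdb3_local helpers are illustrative assumptions.
def process_writes(influxdb3_local, table_batches, args=None):
    # args holds the optional key/value pairs supplied by the trigger
    if args and "arg1" in args:
        influxdb3_local.info(f"arg1: {args['arg1']}")

    for table_batch in table_batches:
        table_name = table_batch["table_name"]
        rows = table_batch["rows"]
        influxdb3_local.info(f"table: {table_name}, row count: {len(rows)}")

    # Querying runs against the live server; comment this out if the
    # table is empty.
    # results = influxdb3_local.query("SELECT * FROM cpu LIMIT 5")
```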
Then you'll want to drop a file into that plugin directory. You can use the example from above, but comment out the section where it queries (unless you write some data to that table, in which case leave it in!).
To test what a plugin will do before actually loading it into the server or creating a trigger that calls it, enter the following command:
`influxdb3 test wal_plugin -h`
The important arguments are `lp` or `file`, which read line protocol inline or from a file and yield it as a test to your new plugin.
`--input-arguments` are key/value pairs separated by commas--for example:
```bash
--input-arguments "arg1=foo,arg2=bar"
```
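A hypothetical full invocation--the exact flag spellings are assumptions based on the argument names above, so check `influxdb3 test wal_plugin -h` for the real syntax:

```bash
influxdb3 test wal_plugin -d mydb \
  --lp "cpu,host=Alpha,region=us-west,application=webserver usage_percent=25.3" \
  --input-arguments "arg1=foo,arg2=bar" \
  test_plugin.py
```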
If you execute a query within the plugin, it will query against the live server you're sending this request to. Any writes you do will not be sent into the server, but instead returned back to you.
```bash
influxdb3 create plugin -d mydb --code-filename="/Users/pauldix/.influxdb3/plugi
influxdb3 create trigger -d mydb --plugin=test_plugin --trigger-spec="table:foo" trigger1
```
After you've tested it, you can create the plugin in the server (the file will need to be there in the plugin-dir) and then create a trigger to trigger it on WAL flushes.
### Diskless Architecture
InfluxDB 3 is able to operate using only object storage with no locally attached disk. While it can use only a disk with no dependencies, the ability to operate without one is a new capability with this release. The figure below illustrates the write path for data landing in the database.
{{< img-hd src="/img/influxdb/influxdb-3-write-path.png" alt="Write Path for InfluxDB 3 Core & Enterprise" />}}
As write requests come in to the server, they are parsed and validated and put into an in-memory WAL buffer. This buffer is flushed every second by default (can be changed through configuration), which will create a WAL file. Once the data is flushed to disk, it is put into a queryable in-memory buffer and then a response is sent back to the client that the write was successful. That data will now show up in queries to the server.
InfluxDB periodically snapshots the WAL to persist the oldest data in the queryable buffer, allowing the server to remove old WAL files. By default, the server will keep up to 900 WAL files buffered up (15 minutes of data) and attempt to persist the oldest 10 minutes, keeping the most recent 5 minutes around.
When the data is persisted out of the queryable buffer it is put into the configured object store as Parquet files. Those files are also put into an in-memory cache so that queries against the most recently persisted data do not have to go to object storage.


## Getting Started with InfluxDB 3 Enterprise

Core's feature highlights include:
* Parquet file persistence
* Compatibility with InfluxDB 1.x and 2.x write APIs
The Enterprise version adds onto Core's functionality with:
* Historical query capability and single series indexing
* High availability
### What's in this guide
This guide covers Enterprise as well as InfluxDB 3 Core, including the following topics:
* [Install and startup](#install-and-startup)
* [Data Model](#data-model)
* [Write data to the database](#write-data)
* [Query the database](#query-the-database)
* [Last Values Cache](#last-values-cache)
If your system doesn't locate `influxdb3`, then `source` the configuration file (for example, .bashrc, .zshrc) for your shell--for example:
```zsh
source ~/.zshrc
```
_During the alpha period, licenses are valid until May 7, 2025._

### Data Model
The database server contains logical databases, which have tables, which have columns. Compared to previous versions of InfluxDB you can think of a database as a `bucket` in v2 or as a `db/retention_policy` in v1. A `table` is equivalent to a `measurement`, which has columns that can be of type `tag` (a string dictionary), `int64`, `float64`, `uint64`, `bool`, or `string` and finally every table has a `time` column that is a nanosecond precision timestamp.
In InfluxDB 3, every table has a primary key--the ordered set of tags and the time--for its data.
This is the sort order used for all Parquet files that get created. When you create a table, either through an explicit call or by writing data into a table for the first time, it sets the primary key to the tags in the order they arrived. This is immutable. Although InfluxDB is still a "schema on write" database, the tag column definitions for a table are immutable.
This is the sort order used for all Parquet files that get created. When you create a table, either through an explicit call or by writing data into a table for the first time, it sets the primary key to the tags in the order they arrived. This is immutable. Although InfluxDB is still a _schema-on-write_ database, the tag column definitions for a table are immutable.
Tags should hold unique identifying information, such as `sensor_id`, `building_id`, or `trace_id`. All other data should be kept in fields. You will be able to add fast last N value and distinct value lookups later for any column, whether it is a field or a tag.
However, these APIs differ from the APIs in the previous versions in the following ways:
- Tags in a table (measurement) are _immutable_
- A tag and a field can't have the same name within a table.
The `/api/v3/write` endpoint accepts the same line protocol syntax as previous versions, and brings new functionality that lets you accept or reject partial writes using the `accept_partial` parameter (`true` is default).
The following code block is an example of [line protocol](/influxdb3/enterprise/reference/syntax/line-protocol/), which shows the table name followed by tags, which are an ordered, comma-separated list of key/value pairs where the values are strings, followed by a comma-separated list of key/value pairs that are the fields, and ending with an optional timestamp. The timestamp by default is a nanosecond epoch, but you can specify a different precision through the `precision` query parameter.
The `query` subcommand includes options to help ensure that the right database is queried with the correct permissions. Only the `--database` option is required, but depending on your specific setup, you may need to pass other options, such as host, port, and token.

| Option | Description | Required |
|---------|-------------|--------------|
| `--host` | The host URL of the running {{% product-name %}} server [default: http://127.0.0.1:8181] | No |
| `--database` | The name of the database to operate on | Yes |
| `--token` | The token for authentication with the {{% product-name %}} server | No |
| `--language` | The query language used to format the provided query string [default: sql] [possible values: sql, influxql] | No |
| `--format` | The format in which to output the query [default: pretty] [possible values: pretty, json, json_lines, csv, parquet] | No |
| `--output` | Put all query output into `output` | No |
Then instantiate a client. The connection details below are assumptions for a local, no-auth server:

```py
from influxdb_client_3 import InfluxDBClient3

client = InfluxDBClient3(
    host="http://127.0.0.1:8181",
    database="servers"
)
```
The following example shows how to query using SQL, and then
use PyArrow to explore the schema and process results:
```py
from influxdb_client_3 import InfluxDBClient3

# Connection details are assumptions for a local, no-auth server
client = InfluxDBClient3(host="http://127.0.0.1:8181", database="servers")

# query() returns a PyArrow Table by default
table = client.query(query="SELECT * FROM cpu LIMIT 10", language="sql")

print("\n#### View Schema information\n")
print(table.schema)

print("\n#### Use PyArrow to read the specified columns\n")
print(table.column('usage_percent'))
print(table.select(['host', 'usage_percent']))
print(table.select(['time', 'host', 'usage_percent']))

print("\n#### Use PyArrow compute functions to aggregate data\n")
print(table.group_by('host').aggregate([]))
print(table.group_by('application').aggregate([('usage_percent', 'mean')]))
```

### Python Plugins and the Processing Engine

{{% note %}}
#### Only supported in Docker
As of this writing, the Processing Engine is only supported in Docker environments.
We expect it to launch in non-Docker environments soon. We're still in very active development creating the API and developer experience; things will break and change fast. Join our <a href="https://discord.com/invite/eMnhxPyj">Discord</a> to ask questions and give feedback.
{{% /note %}}
InfluxDB 3 has an embedded Python VM for running code inside the database. Currently, we only support plugins that get triggered on WAL file flushes, but more will be coming soon. Specifically, plugins will be able to be triggered by:
* On WAL flush: sends a batch of write data to a plugin once a second (can be configured).
* On Snapshot (persist of Parquet files): sends the metadata to a plugin to do further processing against the Parquet data or send the information elsewhere (for example, adding it to an Iceberg Catalog).
* On Schedule: executes plugin on a schedule configured by the user, and is useful for data collection and deadman monitoring.
* On Request: binds a plugin to an HTTP endpoint at `/api/v3/plugins/<name>` where request headers and content are sent to the plugin, which can then parse, process, and send the data into the database or to third-party services.
Plugins work in two parts: plugins and triggers. A plugin is the generic Python code. After you've loaded a plugin into the server, you can create many triggers of that plugin. A trigger has a plugin, a database, and a trigger-spec, which can be either `all_tables` or `table:my_table_name`, where `my_table_name` is the name of the table you want to filter the plugin to.
```bash
influxdb3 create plugin -h
influxdb3 create trigger -h
```
> [!Note]
> #### Plugins only work with x86 Docker
> For now, plugins only work with the x86 Docker image.
Before we try to load up a plugin and create a trigger for it, we should write one and test it out. To test out and run plugins, you'll need to create a plugin directory. Start up your server with the `--plugin-dir` argument and point it at your plugin directory (note that you'll need to make this available in your Docker container).
Then you'll want to drop a file into that plugin directory. You can use the example from above, but comment out the section where it queries (unless you write some data to that table, in which case leave it in!).
To test what a plugin will do before actually loading it into the server or creating a trigger that calls it, enter the following command:
`influxdb3 test wal_plugin -h`
The important arguments are `lp` or `file`, which read line protocol inline or from a file and yield it as a test to your new plugin.
`--input-arguments` are key/value pairs separated by commas--for example:
```bash
--input-arguments "arg1=foo,arg2=bar"
```
If you execute a query within the plugin, it will query against the live server you're sending this request to. Any writes you do will not be sent into the server, but instead returned back to you.
```bash
influxdb3 create plugin -d mydb --code-filename="/Users/pauldix/.influxdb3/plugi
influxdb3 create trigger -d mydb --plugin=test_plugin --trigger-spec="table:foo" trigger1
```
After you've tested it, you can create the plugin in the server (the file will need to be there in the plugin-dir) and then create a trigger to trigger it on WAL flushes.
### Diskless Architecture
InfluxDB 3 is able to operate using only object storage with no locally attached disk. While it can use only a disk with no dependencies, the ability to operate without one is a new capability with this release. The figure below illustrates the write path for data landing in the database.
{{< img-hd src="/img/influxdb/influxdb-3-write-path.png" alt="Write Path for InfluxDB 3 Core & Enterprise" />}}
As write requests come in to the server, they are parsed and validated and put into an in-memory WAL buffer. This buffer is flushed every second by default (can be changed through configuration), which will create a WAL file. Once the data is flushed to disk, it is put into a queryable in-memory buffer and then a response is sent back to the client that the write was successful. That data will now show up in queries to the server.
InfluxDB periodically snapshots the WAL to persist the oldest data in the queryable buffer, allowing the server to remove old WAL files. By default, the server will keep up to 900 WAL files buffered up (15 minutes of data) and attempt to persist the oldest 10 minutes, keeping the most recent 5 minutes around.
When the data is persisted out of the queryable buffer it is put into the configured object store as Parquet files. Those files are also put into an in-memory cache so that queries against the most recently persisted data do not have to go to object storage.
### Multi-Server Setup
To accelerate performance on specific queries, you can define non-primary keys to index on:
```
Create Usage: $ influxdb3 file-index create --host=http://127.0.0.1:8585 -d <DATABASE> -t <TABLE> <COLUMNS>
Delete Usage: $ influxdb3 file-index delete --host=http://127.0.0.1:8585 -d <DATABASE> -t <TABLE>
```
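For example, following the usage above, a sketch that indexes the `host` and `application` columns of the `cpu` table:

```bash
influxdb3 file-index create --host=http://127.0.0.1:8585 -d servers -t cpu host,application
```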