InfluxDB Clustered partial writes (#5594)

* updated status code information for partial writes

* docs: add environment variable tuning explanation (#5579)

* WIP clustered partial writes

* fixed yaml error

* fixed duplicate key in clustered api docs

* Apply suggestions from code review

Co-authored-by: Jason Stirnaman <jstirnaman@influxdata.com>

* Update content/influxdb/clustered/admin/env-vars.md

* Apply suggestions from code review

* add placeholder release notes for next clustered version

* update clustered release notes, remove option license info

---------

Co-authored-by: Jack <56563911+jdockerty@users.noreply.github.com>
Co-authored-by: Jason Stirnaman <jstirnaman@influxdata.com>
pull/5627/head
Scott Anderson 2024-09-27 09:30:39 -06:00 committed by GitHub
parent 329e619a78
commit a62f69ffaf
8 changed files with 348 additions and 102 deletions

View File

@ -68,7 +68,7 @@ influx3
influxctl influxctl
influxd influxd
influxdata.com influxdata.com
iox (iox|IOx)
keep-url keep-url
lat lat
locf locf

View File

@ -646,17 +646,33 @@ paths:
'204': '204':
description: Write data is correctly formatted and accepted for writing to the database. description: Write data is correctly formatted and accepted for writing to the database.
'400': '400':
description: |
Data from the batch was rejected and not written. The response body indicates if a partial write occurred or all data was rejected.
If a partial write occurred, then some points from the batch are written and queryable.
The response body contains details about the [rejected points](/influxdb/clustered/write-data/troubleshoot/#troubleshoot-rejected-points), up to 100 points.
content: content:
application/json: application/json:
examples:
rejectedAllPoints:
summary: Rejected all points
value:
code: invalid
line: 2
message: 'no data written, errors encountered on line(s): error message for first rejected point</n> error message for second rejected point</n> error message for Nth rejected point (up to 100 rejected points)'
partialWriteErrorWithRejectedPoints:
summary: Partial write rejects some points
value:
code: invalid
line: 2
message: 'partial write has occurred, errors encountered on line(s): error message for first rejected point</n> error message for second rejected point</n> error message for Nth rejected point (up to 100 rejected points)'
schema: schema:
$ref: '#/components/schemas/LineProtocolError' $ref: '#/components/schemas/LineProtocolError'
description: Line protocol poorly formed and no points were written. Response can be used to determine the first malformed line in the body line-protocol. All data in body was rejected and not written.
'401': '401':
content: content:
application/json: application/json:
schema: schema:
$ref: '#/components/schemas/Error' $ref: '#/components/schemas/Error'
description: Token doesn't have sufficient permissions to write to this database or the database doesn't exist. description: Token doesn't have sufficient permissions to write to this database or the database doesn't exist.
'403': '403':
content: content:
application/json: application/json:

View File

@ -7,7 +7,7 @@ description: >
menu: menu:
influxdb_clustered: influxdb_clustered:
parent: Administer InfluxDB Clustered parent: Administer InfluxDB Clustered
weight: 208 weight: 209
--- ---
{{< product-name >}} generates a valid access token (known as the _admin token_) {{< product-name >}} generates a valid access token (known as the _admin token_)

View File

@ -0,0 +1,171 @@
---
title: Manage environment variables in your InfluxDB Cluster
description: >
Use environment variables to define settings for individual components in your
InfluxDB cluster.
menu:
influxdb_clustered:
parent: Administer InfluxDB Clustered
name: Manage environment variables
weight: 208
---
Use environment variables to define settings for individual components in your
InfluxDB cluster and adjust your cluster's running configuration.
Define environment variables for each component in your `AppInstance` resource.
InfluxDB Clustered components support various environment variables.
While many of these variables have default settings, you can customize them by
setting your own values.
{{% warn %}}
#### Overriding default settings may affect overall cluster performance
{{% product-name %}} components have complex interactions that can be affected
when overriding default configuration settings.
Changing these settings may impact overall cluster performance.
Before making configuration changes using environment variables, consider
consulting [InfluxData Support](https://support.influxdata.com/) to identify any
potential unintended consequences.
{{% /warn %}}
## AppInstance component schema
In your `AppInstance` resource, configure individual component settings in the
`spec.package.spec.components` property. This property supports the following
InfluxDB Clustered component keys:
- `ingester`
- `querier`
- `router`
- `compactor`
- `garbage-collector`
```yaml
apiVersion: kubecfg.dev/v1alpha1
kind: AppInstance
metadata:
  name: influxdb
  namespace: influxdb
spec:
  package:
    # ...
    spec:
      components:
        ingester:
          # Ingester settings ...
        querier:
          # Querier settings ...
        router:
          # Router settings ...
        compactor:
          # Compactor settings ...
        garbage-collector:
          # Garbage collector settings ...
```
_For more information about components in the InfluxDB v3 storage engine, see
the [InfluxDB v3 storage engine architecture](/influxdb/clustered/reference/internals/storage-engine/)._
## Set environment variables for a component
1. Under the specific component property, use the
`<component>.template.containers.iox.env` property to define environment
variables.
2. In the `env` property, structure each environment variable as a key-value pair.
For example, to configure environment variables for the Garbage collector:
```yaml
apiVersion: kubecfg.dev/v1alpha1
kind: AppInstance
metadata:
  name: influxdb
  namespace: influxdb
spec:
  package:
    # ...
    spec:
      components:
        garbage-collector:
          template:
            containers:
              iox:
                env:
                  INFLUXDB_IOX_GC_OBJECTSTORE_CUTOFF: '6h'
                  INFLUXDB_IOX_GC_PARQUETFILE_CUTOFF: '6h'
```
3. Use `kubectl apply` to apply the configuration changes to your cluster and
add or update environment variables in each component.
<!-- pytest.mark.skip -->
```bash
kubectl apply \
  --filename myinfluxdb.yml \
  --namespace influxdb
```
{{% note %}}
#### Update environment variables instead of removing them
Most configuration settings that can be overridden by environment variables have
default values that are used if the environment variable is unset. Removing
environment variables from your `AppInstance` resource configuration will not
remove those environment variables entirely; instead, they will revert to their
default settings. To revert to the default settings, simply unset the
environment variable or update the value in your `AppInstance` resource to the
default value.
In the preceding example, the `INFLUXDB_IOX_GC_OBJECTSTORE_CUTOFF` environment
variable is set to `6h`. If you remove `INFLUXDB_IOX_GC_OBJECTSTORE_CUTOFF` from
the `env` property, the cutoff reverts to its default setting of `30d`.
{{% /note %}}
{{< expand-wrapper >}}
{{% expand "View example of environment variables in all components" %}}
```yaml
apiVersion: kubecfg.dev/v1alpha1
kind: AppInstance
metadata:
  name: influxdb
  namespace: influxdb
spec:
  package:
    # ...
    spec:
      components:
        ingester:
          template:
            containers:
              iox:
                env:
                  INFLUXDB_IOX_WAL_ROTATION_PERIOD_SECONDS: '360'
        querier:
          template:
            containers:
              iox:
                env:
                  INFLUXDB_IOX_EXEC_MEM_POOL_BYTES: '10737418240' # 10GiB
        router:
          template:
            containers:
              iox:
                env:
                  INFLUXDB_IOX_MAX_HTTP_REQUESTS: '5000'
        compactor:
          template:
            containers:
              iox:
                env:
                  INFLUXDB_IOX_EXEC_MEM_POOL_PERCENT: '80'
        garbage-collector:
          template:
            containers:
              iox:
                env:
                  INFLUXDB_IOX_GC_OBJECTSTORE_CUTOFF: '6h'
                  INFLUXDB_IOX_GC_PARQUETFILE_CUTOFF: '6h'
```
{{% /expand %}}
{{< /expand-wrapper >}}
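Every component in the preceding examples nests its variables under the same
`template.containers.iox.env` path. The following Python sketch shows one way to
generate that fragment from a plain mapping. The `components_fragment` helper is
hypothetical and not part of any InfluxDB tooling; it only reproduces the
structure and component keys documented on this page.

```python
import json
from typing import Dict

# Component keys listed in the "AppInstance component schema" section.
COMPONENT_KEYS = {"ingester", "querier", "router", "compactor", "garbage-collector"}


def components_fragment(env_by_component: Dict[str, Dict[str, str]]) -> dict:
    """Build the value of `spec.package.spec.components` (hypothetical helper)."""
    fragment = {}
    for component, env in env_by_component.items():
        if component not in COMPONENT_KEYS:
            raise ValueError(f"unsupported component key: {component}")
        # Each component nests its variables under template.containers.iox.env.
        fragment[component] = {
            "template": {"containers": {"iox": {"env": dict(env)}}}
        }
    return fragment


if __name__ == "__main__":
    example = components_fragment(
        {"garbage-collector": {"INFLUXDB_IOX_GC_OBJECTSTORE_CUTOFF": "6h"}}
    )
    print(json.dumps(example, indent=2))
```

The helper rejects unknown component keys early, so a typo such as `garbage_collector` fails before the fragment reaches your `AppInstance` resource.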

View File

@ -27,20 +27,6 @@ the InfluxDB Clustered software.
- [License expiry logs](#license-expiry-logs) - [License expiry logs](#license-expiry-logs)
- [Query brownout](#query-brownout) - [Query brownout](#query-brownout)
{{% note %}}
#### License enforcement is currently an opt-in feature
In currently available versions of InfluxDB Clustered, license enforcement is an
opt-in feature that allows InfluxData to introduce license enforcement to
customers, and allows customers to deactivate the feature if issues arise.
In the future, all releases of InfluxDB Clustered will require customers to
configure an active license before they can use the product.
To opt into license enforcement, include the `useLicensedBinaries` feature flag
in your `AppInstance` resource _([See the example below](#enable-feature-flag))_.
To deactivate license enforcement, remove the `useLicensedBinaries` feature flag.
{{% /note %}}
## Install your InfluxDB license ## Install your InfluxDB license
{{% note %}} {{% note %}}
@ -64,22 +50,6 @@ install your license.
kubectl apply --filename license.yml --namespace influxdb kubectl apply --filename license.yml --namespace influxdb
``` ```
4. <span id="enable-feature-flag"></span>
Update your `AppInstance` resource to include the `useLicensedBinaries` feature flag.
Add the `useLicensedBinaries` entry to the `.spec.package.spec.featureFlags`
property--for example:
```yml
apiVersion: kubecfg.dev/v1alpha1
kind: AppInstance
# ...
spec:
package:
spec:
featureFlags:
- useLicensedBinaries
```
InfluxDB Clustered detects the `License` resource and extracts the credentials InfluxDB Clustered detects the `License` resource and extracts the credentials
into a secret required by InfluxDB Clustered Kubernetes pods. into a secret required by InfluxDB Clustered Kubernetes pods.
Pods validate the license secret both at startup and periodically (roughly once Pods validate the license secret both at startup and periodically (roughly once
@ -115,7 +85,6 @@ license enforcement.
### A valid license is required ### A valid license is required
_When you include the `useLicensedBinaries` feature flag_,
Kubernetes pods running in your InfluxDB cluster must have a valid `License` Kubernetes pods running in your InfluxDB cluster must have a valid `License`
resource to run. Licenses are issued by InfluxData. If there is no `License` resource to run. Licenses are issued by InfluxData. If there is no `License`
resource installed in your cluster, one of two things may happen: resource installed in your cluster, one of two things may happen:

View File

@ -17,20 +17,6 @@ related:
Install your InfluxDB Clustered license in your cluster to authorize the use Install your InfluxDB Clustered license in your cluster to authorize the use
of the InfluxDB Clustered software. of the InfluxDB Clustered software.
{{% note %}}
#### License enforcement is currently an opt-in feature
In currently available versions of InfluxDB Clustered, license enforcement is an
opt-in feature that allows InfluxData to introduce license enforcement to
customers, and allows customers to deactivate the feature if issues arise.
In the future, all releases of InfluxDB Clustered will require customers to
configure an active license before they can use the product.
To opt into license enforcement, include the `useLicensedBinaries` feature flag
in your `AppInstance` resource _([See the example below](#enable-feature-flag))_.
To deactivate license enforcement, remove the `useLicensedBinaries` feature flag.
{{% /note %}}
## Install your InfluxDB license ## Install your InfluxDB license
1. If you haven't already, 1. If you haven't already,
@ -46,46 +32,6 @@ To deactivate license enforcement, remove the `useLicensedBinaries` feature flag
kubectl apply --filename license.yml --namespace influxdb kubectl apply --filename license.yml --namespace influxdb
``` ```
4. <span id="enable-feature-flag"></span>
Update your `AppInstance` resource to activate the `useLicensedBinaries` feature flag:
- If configuring the `AppInstance` resource directly, add the
`useLicensedBinaries` entry to the `.spec.package.spec.featureFlags`
property.
- If using the [InfluxDB Clustered Helm chart](https://github.com/influxdata/helm-charts/tree/master/charts/influxdb3-clustered), add the `useLicensedBinaries` entry to the
`featureFlags` property in your `values.yaml`.
{{< code-tabs-wrapper >}}
{{% code-tabs %}}
[AppInstance](#)
[Helm](#)
{{% /code-tabs %}}
{{% code-tab-content %}}
```yml
apiVersion: kubecfg.dev/v1alpha1
kind: AppInstance
# ...
spec:
package:
spec:
featureFlags:
- useLicensedBinaries
```
{{% /code-tab-content %}}
{{% code-tab-content %}}
```yml
# values.yaml
featureFlags:
- useLicensedBinaries
```
{{% /code-tab-content %}}
{{< /code-tabs-wrapper >}}
InfluxDB Clustered detects the `License` resource and extracts the credentials InfluxDB Clustered detects the `License` resource and extracts the credentials
into a secret required by InfluxDB Clustered Kubernetes pods. into a secret required by InfluxDB Clustered Kubernetes pods.
Pods validate the license secret both at startup and periodically (roughly once Pods validate the license secret both at startup and periodically (roughly once
@ -97,7 +43,8 @@ If you are currently using a non-licensed preview release of InfluxDB Clustered
and want to upgrade to a licensed release, do the following: and want to upgrade to a licensed release, do the following:
1. [Install an InfluxDB license](#install-your-influxdb-license) 1. [Install an InfluxDB license](#install-your-influxdb-license)
2. If you [use the `AppInstance` resource configuration](/influxdb/clustered/install/configure-cluster/directly/) to configure your cluster, in your `myinfluxdb.yml`, 2. If you [use the `AppInstance` resource configuration](/influxdb/clustered/install/configure-cluster/directly/)
to configure your cluster, in your `myinfluxdb.yml`,
update the package version defined in `spec.package.image` to use a licensed update the package version defined in `spec.package.image` to use a licensed
release. release.

View File

@ -26,6 +26,129 @@ identified below with the <span class="cf-icon Shield pink"></span> icon.
--- ---
## 20240925-1257864 {date="2024-09-25" .checkpoint}
### Quickstart
```yaml
spec:
  package:
    image: us-docker.pkg.dev/influxdb2-artifacts/clustered/influxdb:202409XX-XXXXXXX
```
### Highlights
#### Default to partial write semantics
In InfluxDB Clustered 20240925-1257864+, "partial writes" are enabled by default.
With partial writes enabled, InfluxDB accepts write requests that contain
invalid or malformed lines of line protocol, writes the valid lines, and rejects
the invalid lines. Previously, if any line protocol in a batch was invalid, the
entire batch was rejected and no data was written.
To disable partial writes and revert to the previous behavior, set the
`INFLUXDB_IOX_PARTIAL_WRITES_ENABLED` environment variable on your cluster's
Ingester to `false`. Define this environment variable in the
`spec.package.spec.components.ingester.template.containers.iox.env` property in
your `AppInstance` resource.
{{< expand-wrapper >}}
{{% expand "View example of disabling partial writes in your `AppInstance` resource" %}}
```yaml
apiVersion: kubecfg.dev/v1alpha1
kind: AppInstance
metadata:
  name: influxdb
  namespace: influxdb
spec:
  package:
    spec:
      components:
        ingester:
          template:
            containers:
              iox:
                env:
                  INFLUXDB_IOX_PARTIAL_WRITES_ENABLED: false
```
{{% /expand %}}
{{< /expand-wrapper >}}
For more information about defining variables in your InfluxDB cluster, see
[Manage environment variables in your InfluxDB Cluster](/influxdb/clustered/admin/env-vars/).
##### Write API behaviors
When you submit a write request that includes invalid or malformed line protocol,
the InfluxDB write API returns a 400 response code and does the following:
- With partial writes _enabled_:
  - Writes all valid points and rejects all invalid points.
  - Includes details about the [rejected points](/influxdb/clustered/write-data/troubleshoot/#troubleshoot-rejected-points)
    (up to 100 points) in the response body.
- With partial writes _disabled_:
  - Rejects all points in the batch.
  - Includes an error message and the first malformed line of line protocol in
    the response body.
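
Because both outcomes return a 400 response code, a client has to inspect the
response body to tell them apart. The following Python sketch does this by
checking how the `message` field begins. The `classify_rejection` helper is
hypothetical, and the message prefixes are taken from the example response
bodies in the API reference; matching on message text is an assumption, not a
documented contract.

```python
import json


def classify_rejection(body: str) -> str:
    """Classify a 400 write response body (hypothetical helper).

    Returns "partial" if some points were written, "none_written" if the
    message reports that no data was written, and "unknown" otherwise.
    """
    try:
        message = json.loads(body).get("message", "")
    except (ValueError, AttributeError):
        # Not JSON, or JSON that is not an object.
        return "unknown"
    if not isinstance(message, str):
        return "unknown"
    if message.startswith("partial write has occurred"):
        return "partial"
    if message.startswith("no data written"):
        return "none_written"
    return "unknown"
```

A result of `"unknown"` only means the body did not match either example prefix; treat it as a failed write and inspect the body manually.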
#### Deploy and use the Catalog service by default
The Catalog service is a new IOx component that centralizes access to the
InfluxDB Catalog among Ingesters, Queriers, Compactors, and Garbage Collectors.
This is expected to improve overall Catalog query performance and reduce
ninety-ninth percentile (p99) latencies.
### Upgrade notes
#### License now required
A valid license token is now required to start up your InfluxDB Cluster.
To avoid possible complications, ensure you have a valid license token. If you
do not, contact your InfluxData sales representative to get a license token
**before upgrading to this release**.
#### Removed prometheusOperator feature flag
The `prometheusOperator` feature flag has been removed.
**If you currently have this feature flag enabled in your `AppInstance` resource,
remove it before upgrading to this release.**
This flag was deprecated in a previous release, but from this release forward,
enabling this feature flag may cause errors.
The installation of the Prometheus operator should be handled externally.
### Changes
#### Deployment
- Introduces the `nodeAffinity` and CPU/memory requests settings for "granite"
components. Previously, these settings were only available for core IOx
components.
- Prior to this release, many of the IOx dashboards deployed with the `grafana`
feature flag were showing "no data." This has been fixed and now all
dashboards should display actual data.
#### Database Engine
- Adjusted compactor concurrency scaling heuristic to improve performance as
memory and CPU scale.
- Adjusted default `INFLUXDB_IOX_COMPACTION_PARTITION_MINUTE_THRESHOLD` from
`20m` to `100m` to help the Compactor rediscover cool partitions more quickly.
#### Configuration
- Introduces the `podAntiAffinity` setting for InfluxDB Clustered components.
Previously, the scheduling of pods was influenced by the Kubernetes
scheduler's default behavior. For further details, see the
[Kubernetes pod affinity documentation](https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#types-of-inter-pod-affinity-and-anti-affinity).
---
## 20240819-1176644 {date="2024-08-19" .checkpoint} ## 20240819-1176644 {date="2024-08-19" .checkpoint}
### Quickstart ### Quickstart
@ -463,7 +586,7 @@ mounted into your existing Grafana instance.
An authentication component, previously known as `authz`, has been consolidated An authentication component, previously known as `authz`, has been consolidated
into the `token-management` service. into the `token-management` service.
There is now a temporary `Job` in place, `delete-authz-schema`, that Now there is a temporary `Job` in place, `delete-authz-schema`, that
automatically removes the `authz` schema from the configured PostgreSQL database. automatically removes the `authz` schema from the configured PostgreSQL database.
### Changes ### Changes
@ -704,7 +827,7 @@ the `create-admin-token` job.
#### Deployment #### Deployment
- Increase HTTP write request limit from 10MB to 50MB. - Increase HTTP write request limit from 10 MB to 50 MB.
- Added support for [Telegraf Operator](https://github.com/influxdata/telegraf-operator). - Added support for [Telegraf Operator](https://github.com/influxdata/telegraf-operator).
We have added the `telegraf.influxdata.com/port` annotation to all the pods. We have added the `telegraf.influxdata.com/port` annotation to all the pods.
No configuration is required. We don't yet provide a way to specify the No configuration is required. We don't yet provide a way to specify the

View File

@ -5,7 +5,8 @@ weight: 106
description: > description: >
Troubleshoot issues writing data. Troubleshoot issues writing data.
Find response codes for failed writes. Find response codes for failed writes.
Discover how writes fail, from exceeding rate or payload limits, to syntax errors and schema conflicts. Discover how writes fail, from exceeding rate or payload limits, to syntax
errors and schema conflicts.
menu: menu:
influxdb_clustered: influxdb_clustered:
name: Troubleshoot issues name: Troubleshoot issues
@ -17,7 +18,8 @@ related:
- /influxdb/clustered/reference/internals/durability/ - /influxdb/clustered/reference/internals/durability/
--- ---
Learn how to avoid unexpected results and recover from errors when writing to {{% product-name %}}. Learn how to avoid unexpected results and recover from errors when writing to
{{% product-name %}}.
- [Handle write responses](#handle-write-responses) - [Handle write responses](#handle-write-responses)
- [Review HTTP status codes](#review-http-status-codes) - [Review HTTP status codes](#review-http-status-codes)
@ -26,12 +28,26 @@ Learn how to avoid unexpected results and recover from errors when writing to {{
## Handle write responses ## Handle write responses
In {{% product-name %}}, writes are synchronous. {{% product-name %}} does the following when you send a write request:
After InfluxDB validates the request and ingests the data, it sends a _success_ response (HTTP `204` status code) as an acknowledgement that the data is written and queryable.
To ensure that InfluxDB handles writes in the order you request them, wait for the acknowledgement before you send the next request.
If InfluxDB successfully writes all the request data to the database, it returns _success_ (HTTP `204` status code). 1. Validates the request.
The first rejected point in a batch causes InfluxDB to reject the entire batch and respond with an [HTTP error status](#review-http-status-codes). 2. If successful, attempts to ingest data from the request body; otherwise,
responds with an [error status](#review-http-status-codes).
3. Ingests or rejects data in the batch and returns one of the following HTTP
status codes:
- `204 No Content`: All data in the batch is ingested.
- `400 Bad Request`: Some or all of the data has been rejected.
Data that has not been rejected is ingested and queryable.
The response body contains error details about
[rejected points](#troubleshoot-rejected-points), up to 100 points.
Writes are synchronous--the response status indicates the final status of the
write and all ingested data is queryable.
To ensure that InfluxDB handles writes in the order you request them,
wait for the response before you send the next request.
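
The wait-for-the-response pattern can be sketched in Python as follows. This is
a minimal illustration, not a client implementation: `write_in_order` is a
hypothetical helper, and `send` stands in for whatever HTTP client call you use
to post a batch of line protocol and return the status code and response body.

```python
from typing import Callable, Iterable, List, Tuple

# (status_code, response_body) as returned by your HTTP client of choice.
Response = Tuple[int, str]


def write_in_order(
    batches: Iterable[str], send: Callable[[str], Response]
) -> List[Tuple[int, str, str]]:
    """Send batches one at a time and record (index, outcome, body) for each."""
    results = []
    for index, batch in enumerate(batches):
        # Block until the response arrives before sending the next batch.
        status, body = send(batch)
        if status == 204:
            outcome = "ingested"
        elif status == 400:
            # Some or all points were rejected; the body has the details.
            outcome = "rejected_some_or_all"
        else:
            outcome = "error"
        results.append((index, outcome, body))
    return results
```

Passing `send` in as a parameter keeps the ordering logic independent of the HTTP library and makes it straightforward to exercise with a stub.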
### Review HTTP status codes ### Review HTTP status codes
@ -42,7 +58,7 @@ Write requests return the following status codes:
| HTTP response code | Message | Description | | HTTP response code | Message | Description |
| :-------------------------------| :--------------------------------------------------------------- | :------------- | | :-------------------------------| :--------------------------------------------------------------- | :------------- |
| `204 "Success"` | | If InfluxDB ingested the data | | `204 "Success"` | | If InfluxDB ingested the data |
| `400 "Bad request"` | `message` contains the first malformed line | If data is malformed | | `400 "Bad request"` | error details about rejected points, up to 100 points: `line` contains the first rejected line, `message` describes rejections | If some or all request data isn't allowed (for example, if it is malformed or falls outside of the bucket's retention period)--the response body indicates whether a partial write has occurred or if all data has been rejected |
| `401 "Unauthorized"` | | If the `Authorization` header is missing or malformed or if the [token](/influxdb/clustered/admin/tokens/) doesn't have [permission](/influxdb/clustered/reference/cli/influxctl/token/create/#examples) to write to the database. See [examples using credentials](/influxdb/clustered/get-started/write/#write-line-protocol-to-influxdb) in write requests. | | `401 "Unauthorized"` | | If the `Authorization` header is missing or malformed or if the [token](/influxdb/clustered/admin/tokens/) doesn't have [permission](/influxdb/clustered/reference/cli/influxctl/token/create/#examples) to write to the database. See [examples using credentials](/influxdb/clustered/get-started/write/#write-line-protocol-to-influxdb) in write requests. |
| `404 "Not found"` | requested **resource type** (for example, "organization" or "database"), and **resource name** | If a requested resource (for example, organization or database) wasn't found | | `404 "Not found"` | requested **resource type** (for example, "organization" or "database"), and **resource name** | If a requested resource (for example, organization or database) wasn't found |
| `500 "Internal server error"` | | Default status for an error | | `500 "Internal server error"` | | Default status for an error |
@ -62,6 +78,10 @@ If you notice data is missing in your database, do the following:
## Troubleshoot rejected points ## Troubleshoot rejected points
InfluxDB rejects points that fall within the same partition (default partitioning is measurement and day) as existing bucket data and have a different data type for an existing field. InfluxDB rejects points that fall within the same partition (default partitioning
is by measurement and day) as existing bucket data and have a different data type
for an existing field.
Check for [field data type](/influxdb/clustered/reference/syntax/line-protocol/#data-types-and-format) differences between the rejected data point and points within the same database and partition--for example, did you attempt to write `string` data to an `int` field? Check for [field data type](/influxdb/clustered/reference/syntax/line-protocol/#data-types-and-format)
differences between the rejected data point and points within the same database
and partition--for example, did you attempt to write `string` data to an `int` field?
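
To check for this kind of conflict before writing, you can compare the type of
each new field value against the type already stored for that field. The
following Python sketch infers a type from a raw line protocol field value. The
`field_type` and `type_conflicts` helpers are hypothetical, they operate on
field values that have already been split out of a line, and they do not parse
full line protocol.

```python
import re

BOOLEANS = {"t", "T", "true", "True", "TRUE", "f", "F", "false", "False", "FALSE"}
FLOAT_RE = re.compile(r"[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?")


def field_type(raw: str) -> str:
    """Infer the line protocol data type of a raw field value literal."""
    if len(raw) >= 2 and raw.startswith('"') and raw.endswith('"'):
        return "string"
    if raw in BOOLEANS:
        return "boolean"
    if re.fullmatch(r"[+-]?\d+i", raw):
        return "integer"
    if re.fullmatch(r"\d+u", raw):
        return "uinteger"
    if FLOAT_RE.fullmatch(raw):
        return "float"
    return "unknown"


def type_conflicts(existing_types: dict, new_fields: dict) -> list:
    """Return names of fields whose new value type differs from the existing type."""
    return sorted(
        name
        for name, raw in new_fields.items()
        if name in existing_types and field_type(raw) != existing_types[name]
    )
```

For example, if `temp` is already stored as an integer, a new value of `"hot"` is reported as a conflict, while a field that does not exist yet is ignored.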