From a62f69ffaf4c0ae1ef11da592e807a3224760aaf Mon Sep 17 00:00:00 2001 From: Scott Anderson Date: Fri, 27 Sep 2024 09:30:39 -0600 Subject: [PATCH] InfluxDB Clustered partial writes (#5594) * updated status code information for partial writes * docs: add environment variable tuning explanation (#5579) * WIP clustered partial writes * fixed yaml error * fixed duplicate key in clustered api docs * Apply suggestions from code review Co-authored-by: Jason Stirnaman * Update content/influxdb/clustered/admin/env-vars.md * Apply suggestions from code review * add placeholder release notes for next clustered version * update clustered release notes, remove option license info --------- Co-authored-by: Jack <56563911+jdockerty@users.noreply.github.com> Co-authored-by: Jason Stirnaman --- .../vocabularies/InfluxDataDocs/accept.txt | 2 +- api-docs/clustered/v2/ref.yml | 20 +- .../admin/bypass-identity-provider.md | 2 +- content/influxdb/clustered/admin/env-vars.md | 171 ++++++++++++++++++ content/influxdb/clustered/admin/licensing.md | 31 ---- .../influxdb/clustered/install/licensing.md | 57 +----- .../reference/release-notes/clustered.md | 127 ++++++++++++- .../clustered/write-data/troubleshoot.md | 40 +++- 8 files changed, 348 insertions(+), 102 deletions(-) create mode 100644 content/influxdb/clustered/admin/env-vars.md diff --git a/.ci/vale/styles/config/vocabularies/InfluxDataDocs/accept.txt b/.ci/vale/styles/config/vocabularies/InfluxDataDocs/accept.txt index dae2608d1..6c0a4fb86 100644 --- a/.ci/vale/styles/config/vocabularies/InfluxDataDocs/accept.txt +++ b/.ci/vale/styles/config/vocabularies/InfluxDataDocs/accept.txt @@ -68,7 +68,7 @@ influx3 influxctl influxd influxdata.com -iox +(iox|IOx) keep-url lat locf diff --git a/api-docs/clustered/v2/ref.yml b/api-docs/clustered/v2/ref.yml index 239859457..b617a7103 100644 --- a/api-docs/clustered/v2/ref.yml +++ b/api-docs/clustered/v2/ref.yml @@ -646,17 +646,33 @@ paths: '204': description: Write data is correctly formatted 
and accepted for writing to the database. '400': + description: | + Data from the batch was rejected and not written. The response body indicates if a partial write occurred or all data was rejected. + If a partial write occurred, then some points from the batch are written and queryable. + The response body contains details about the [rejected points](/influxdb/clustered/write-data/troubleshoot/#troubleshoot-rejected-points), up to 100 points. content: application/json: + examples: + rejectedAllPoints: + summary: Rejected all points + value: + code: invalid + line: 2 + message: 'no data written, errors encountered on line(s): error message for first rejected point error message for second rejected point error message for Nth rejected point (up to 100 rejected points)' + partialWriteErrorWithRejectedPoints: + summary: Partial write rejects some points + value: + code: invalid + line: 2 + message: 'partial write has occurred, errors encountered on line(s): error message for first rejected point error message for second rejected point error message for Nth rejected point (up to 100 rejected points)' schema: $ref: '#/components/schemas/LineProtocolError' - description: Line protocol poorly formed and no points were written. Response can be used to determine the first malformed line in the body line-protocol. All data in body was rejected and not written. '401': content: application/json: schema: $ref: '#/components/schemas/Error' - description: Token doesn't have sufficient permissions to write to this database or the database doesn't exist. + description: Token doesn't have sufficient permissions to write to this database or the database doesn't exist. 
'403': content: application/json: diff --git a/content/influxdb/clustered/admin/bypass-identity-provider.md b/content/influxdb/clustered/admin/bypass-identity-provider.md index c9a3e6caa..7edd944ed 100644 --- a/content/influxdb/clustered/admin/bypass-identity-provider.md +++ b/content/influxdb/clustered/admin/bypass-identity-provider.md @@ -7,7 +7,7 @@ description: > menu: influxdb_clustered: parent: Administer InfluxDB Clustered -weight: 208 +weight: 209 --- {{< product-name >}} generates a valid access token (known as the _admin token_) diff --git a/content/influxdb/clustered/admin/env-vars.md b/content/influxdb/clustered/admin/env-vars.md new file mode 100644 index 000000000..a17bd8201 --- /dev/null +++ b/content/influxdb/clustered/admin/env-vars.md @@ -0,0 +1,171 @@ +--- +title: Manage environment variables in your InfluxDB Cluster +description: > + Use environment variables to define settings for individual components in your + InfluxDB cluster. +menu: + influxdb_clustered: + parent: Administer InfluxDB Clustered + name: Manage environment variables +weight: 208 +--- + +Use environment variables to define settings for individual components in your +InfluxDB cluster and adjust your cluster's running configuration. +Define environment variables for each component in your `AppInstance` resource. + +InfluxDB Clustered components support various environment variables. +While many of these variables have default settings, you can customize them by +setting your own values. + +{{% warn %}} +#### Overriding default settings may affect overall cluster performance + +{{% product-name %}} components have complex interactions that can be affected +when overriding default configuration settings. +Changing these settings may impact overall cluster performance. +Before making configuration changes using environment variables, consider +consulting [InfluxData Support](https://support.influxdata.com/) to identify any +potential unintended consequences. 
+{{% /warn %}} + +## AppInstance component schema + +In your `AppInstance` resource, configure individual component settings in the +`spec.package.spec.components` property. This property supports the following +InfluxDB Clustered component keys: + +- `ingester` +- `querier` +- `router` +- `compactor` +- `garbage-collector` + +```yaml +apiVersion: kubecfg.dev/v1alpha1 +kind: AppInstance +metadata: + name: influxdb + namespace: influxdb +spec: + package: + # ... + spec: + components: + ingester: + # Ingester settings ... + querier: + # Querier settings ... + router: + # Router settings. ... + compactor: + # Compactor settings ... + garbage-collector: + # Garbage collector settings ... +``` + +_For more information about components in the InfluxDB v3 storage engine, see +the [InfluxDB v3 storage engine architecture](/influxdb/clustered/reference/internals/storage-engine/)._ + +## Set environment variables for a component + +1. Under the specific component property, use the + `.template.containers.iox.env` property to define environment + variables. +2. In the `env` property, structure each environment variable as a key-value pair. + For example, to configure environment variables for the Garbage collector: + + ```yaml + apiVersion: kubecfg.dev/v1alpha1 + kind: AppInstance + metadata: + name: influxdb + namespace: influxdb + spec: + package: + # ... + spec: + components: + garbage-collector: + template: + containers: + iox: + env: + INFLUXDB_IOX_GC_OBJECTSTORE_CUTOFF: '6h' + INFLUXDB_IOX_GC_PARQUETFILE_CUTOFF: '6h' + ``` + +3. Use `kubectl apply` to apply the configuration changes to your cluster and + add or update environment variables in each component. 
+ + + + ```bash + kubectl apply \ + --filename myinfluxdb.yml \ + --namespace influxdb + ``` +{{% note %}} +#### Update environment variables instead of removing them + +Most configuration settings that can be overridden by environment variables have +default values that are used if the environment variable is unset. Removing +environment variables from your `AppInstance` resource configuration will not +remove those environment variables entirely; instead, they will revert to their +default settings. To revert to the default settings, simply unset the +environment variable or update the value in your `AppInstance` resource to the +default value. + +In the preceding example, the `INFLUXDB_IOX_GC_OBJECTSTORE_CUTOFF` environment +variable is set to `6h`. If you remove `INFLUXDB_IOX_GC_OBJECTSTORE_CUTOFF` from +the `env` property, the cutoff reverts to its default setting of `30d`. +{{% /note %}} + +{{< expand-wrapper >}} +{{% expand "View example of environment variables in all components" %}} + +```yaml +apiVersion: kubecfg.dev/v1alpha1 +kind: AppInstance +metadata: + name: influxdb + namespace: influxdb +spec: + package: + # ... 
+ spec: + components: + ingester: + template: + containers: + iox: + env: + INFLUXDB_IOX_WAL_ROTATION_PERIOD_SECONDS: '360' + querier: + template: + containers: + iox: + env: + INFLUXDB_IOX_EXEC_MEM_POOL_BYTES: '10737418240' # 10GiB + router: + template: + containers: + iox: + env: + INFLUXDB_IOX_MAX_HTTP_REQUESTS: '5000' + compactor: + template: + containers: + iox: + env: + INFLUXDB_IOX_EXEC_MEM_POOL_PERCENT: '80' + garbage-collector: + template: + containers: + iox: + env: + INFLUXDB_IOX_GC_OBJECTSTORE_CUTOFF: '6h' + INFLUXDB_IOX_GC_PARQUETFILE_CUTOFF: '6h' +``` +{{% /expand %}} +{{< /expand-wrapper >}} diff --git a/content/influxdb/clustered/admin/licensing.md b/content/influxdb/clustered/admin/licensing.md index 90f716bbe..bda66025d 100644 --- a/content/influxdb/clustered/admin/licensing.md +++ b/content/influxdb/clustered/admin/licensing.md @@ -27,20 +27,6 @@ the InfluxDB Clustered software. - [License expiry logs](#license-expiry-logs) - [Query brownout](#query-brownout) -{{% note %}} -#### License enforcement is currently an opt-in feature - -In currently available versions of InfluxDB Clustered, license enforcement is an -opt-in feature that allows InfluxData to introduce license enforcement to -customers, and allows customers to deactivate the feature if issues arise. -In the future, all releases of InfluxDB Clustered will require customers to -configure an active license before they can use the product. - -To opt into license enforcement, include the `useLicensedBinaries` feature flag -in your `AppInstance` resource _([See the example below](#enable-feature-flag))_. -To deactivate license enforcement, remove the `useLicensedBinaries` feature flag. -{{% /note %}} - ## Install your InfluxDB license {{% note %}} @@ -64,22 +50,6 @@ install your license. kubectl apply --filename license.yml --namespace influxdb ``` -4. - Update your `AppInstance` resource to include the `useLicensedBinaries` feature flag. 
- Add the `useLicensedBinaries` entry to the `.spec.package.spec.featureFlags` - property--for example: - - ```yml - apiVersion: kubecfg.dev/v1alpha1 - kind: AppInstance - # ... - spec: - package: - spec: - featureFlags: - - useLicensedBinaries - ``` - InfluxDB Clustered detects the `License` resource and extracts the credentials into a secret required by InfluxDB Clustered Kubernetes pods. Pods validate the license secret both at startup and periodically (roughly once @@ -115,7 +85,6 @@ license enforcement. ### A valid license is required -_When you include the `useLicensedBinaries` feature flag_, Kubernetes pods running in your InfluxDB cluster must have a valid `License` resource to run. Licenses are issued by InfluxData. If there is no `License` resource installed in your cluster, one of two things may happen: diff --git a/content/influxdb/clustered/install/licensing.md b/content/influxdb/clustered/install/licensing.md index df245008f..a15b6ea0c 100644 --- a/content/influxdb/clustered/install/licensing.md +++ b/content/influxdb/clustered/install/licensing.md @@ -17,20 +17,6 @@ related: Install your InfluxDB Clustered license in your cluster to authorize the use of the InfluxDB Clustered software. -{{% note %}} -#### License enforcement is currently an opt-in feature - -In currently available versions of InfluxDB Clustered, license enforcement is an -opt-in feature that allows InfluxData to introduce license enforcement to -customers, and allows customers to deactivate the feature if issues arise. -In the future, all releases of InfluxDB Clustered will require customers to -configure an active license before they can use the product. - -To opt into license enforcement, include the `useLicensedBinaries` feature flag -in your `AppInstance` resource _([See the example below](#enable-feature-flag))_. -To deactivate license enforcement, remove the `useLicensedBinaries` feature flag. -{{% /note %}} - ## Install your InfluxDB license 1. 
If you haven't already, @@ -46,46 +32,6 @@ To deactivate license enforcement, remove the `useLicensedBinaries` feature flag kubectl apply --filename license.yml --namespace influxdb ``` -4. - Update your `AppInstance` resource to activate the `useLicensedBinaries` feature flag: - - - If configuring the `AppInstance` resource directly, add the - `useLicensedBinaries` entry to the `.spec.package.spec.featureFlags` - property. - - If using the [InfluxDB Clustered Helm chart](https://github.com/influxdata/helm-charts/tree/master/charts/influxdb3-clustered), add the `useLicensedBinaries` entry to the - `featureFlags` property in your `values.yaml`. - - {{< code-tabs-wrapper >}} -{{% code-tabs %}} -[AppInstance](#) -[Helm](#) -{{% /code-tabs %}} -{{% code-tab-content %}} - -```yml -apiVersion: kubecfg.dev/v1alpha1 -kind: AppInstance -# ... -spec: - package: - spec: - featureFlags: - - useLicensedBinaries -``` - -{{% /code-tab-content %}} -{{% code-tab-content %}} - -```yml -# values.yaml - -featureFlags: - - useLicensedBinaries -``` - -{{% /code-tab-content %}} - {{< /code-tabs-wrapper >}} - InfluxDB Clustered detects the `License` resource and extracts the credentials into a secret required by InfluxDB Clustered Kubernetes pods. Pods validate the license secret both at startup and periodically (roughly once @@ -97,7 +43,8 @@ If you are currently using a non-licensed preview release of InfluxDB Clustered and want to upgrade to a licensed release, do the following: 1. [Install an InfluxDB license](#install-your-influxdb-license) -2. If you [use the `AppInstance` resource configuration](/influxdb/clustered/install/configure-cluster/directly/) to configure your cluster, in your `myinfluxdb.yml`, +2. If you [use the `AppInstance` resource configuration](/influxdb/clustered/install/configure-cluster/directly/) + to configure your cluster, in your `myinfluxdb.yml`, update the package version defined in `spec.package.image` to use a licensed release. 
diff --git a/content/influxdb/clustered/reference/release-notes/clustered.md b/content/influxdb/clustered/reference/release-notes/clustered.md index 8ad663db5..cf1b3f234 100644 --- a/content/influxdb/clustered/reference/release-notes/clustered.md +++ b/content/influxdb/clustered/reference/release-notes/clustered.md @@ -26,6 +26,129 @@ identified below with the icon. --- +## 20240925-1257864 {date="2024-09-25" .checkpoint} + +### Quickstart + +```yaml +spec: + package: + image: us-docker.pkg.dev/influxdb2-artifacts/clustered/influxdb:202409XX-XXXXXXX +``` + +### Highlights + +#### Default to partial write semantics + +In InfluxDB Clustered 20240925-1257864+, "partial writes" are enabled by default. +With partial writes enabled, InfluxDB accepts write requests with invalid or +malformed lines of line protocol and successfully writes valid lines and rejects +invalid lines. Previously, if any line protocol in a batch was invalid, the +entire batch was rejected and no data was written. + +To disable partial writes and revert to the previous behavior, set the +`INFLUXDB_IOX_PARTIAL_WRITES_ENABLED` environment variable on your cluster's +Ingester to `false`. Define this environment variable in the +`spec.package.spec.components.ingester.template.containers.iox.env` property in +your `AppInstance` resource. + +{{< expand-wrapper >}} +{{% expand "View example of disabling partial writes in your `AppInstance` resource" %}} + +```yaml +apiVersion: kubecfg.dev/v1alpha1 +kind: AppInstance +metadata: + name: influxdb + namespace: influxdb +spec: + package: + spec: + components: + ingester: + template: + containers: + iox: + env: + INFLUXDB_IOX_PARTIAL_WRITES_ENABLED: false +``` + +{{% /expand %}} +{{< /expand-wrapper >}} + +For more information about defining variables in your InfluxDB cluster, see +[Manage environment variables in your InfluxDB Cluster](/influxdb/clustered/admin/env-vars/).
+ +##### Write API behaviors + +When submitting a write request that includes invalid or malformed line protocol, +the InfluxDB write API returns a 400 response code and does the following: + +- With partial writes _enabled_: + + - Writes all valid points and rejects all invalid points. + - Includes details about the [rejected points](/influxdb/clustered/write-data/troubleshoot/#troubleshoot-rejected-points) + (up to 100 points) in the response body. + +- With partial writes _disabled_: + + - Rejects all points in the batch. + - Includes an error message and the first malformed line of line protocol in + the response body. + +#### Deploy and use the Catalog service by default + +The Catalog service is a new IOx component that centralizes access to the +InfluxDB Catalog among Ingesters, Queriers, Compactors, and Garbage Collectors. +This is expected to improve Catalog query performance overall with an expected +drop in ninety-ninth percentile (p99) latencies. + +### Upgrade notes + +#### License now required + +A valid license token is now required to start up your InfluxDB Cluster. +To avoid possible complications, ensure you have a valid license token. If you +do not, contact your InfluxData sales representative to get a license token +**before upgrading to this release**. + +#### Removed prometheusOperator feature flag + +The `prometheusOperator` feature flag has been removed. +**If you currently have this feature flag enabled in your `AppInstance` resource, +remove it before upgrading to this release.** +This flag was deprecated in a previous release, but from this release forward, +enabling this feature flag may cause errors. + +The installation of the Prometheus operator should be handled externally. + +### Changes + +#### Deployment + +- Introduces the `nodeAffinity` and CPU/Memory requests setting for "granite" + components. Previously, these settings were only available for core IOx + components. 
+- Prior to this release, many of the IOx dashboards deployed with the `grafana` + feature flag were showing "no data." This has been fixed and now all + dashboards should display actual data. + +#### Database Engine + +- Adjusted compactor concurrency scaling heuristic to improve performance as + memory and CPU scale. +- Adjusted default `INFLUXDB_IOX_COMPACTION_PARTITION_MINUTE_THRESHOLD` from + `20m` to `100m` to help compactor more quickly rediscover cool partitions. + +#### Configuration + +- Introduces the `podAntiAffinity` setting for InfluxDB Clustered components. + Previously, the scheduling of pods was influenced by the Kubernetes + scheduler's default behavior. For further details, see the + [Kubernetes pod affinity documentation](https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#types-of-inter-pod-affinity-and-anti-affinity). + +--- + ## 20240819-1176644 {date="2024-08-19" .checkpoint} ### Quickstart @@ -463,7 +586,7 @@ mounted into your existing Grafana instance. An authentication component, previously known as `authz`, has been consolidated into the `token-management` service. -There is now a temporary `Job` in place, `delete-authz-schema`, that +Now there is a temporary `Job` in place, `delete-authz-schema`, that automatically removes the `authz` schema from the configured PostgreSQL database. ### Changes @@ -704,7 +827,7 @@ the `create-admin-token` job. #### Deployment -- Increase HTTP write request limit from 10MB to 50MB. +- Increase HTTP write request limit from 10 MB to 50 MB. - Added support for [Telegraf Operator](https://github.com/influxdata/telegraf-operator). We have added the `telegraf.influxdata.com/port` annotation to all the pods. No configuration is required. 
We don't yet provide a way to specify the diff --git a/content/influxdb/clustered/write-data/troubleshoot.md b/content/influxdb/clustered/write-data/troubleshoot.md index 1c7c9b5f5..5e397c5e4 100644 --- a/content/influxdb/clustered/write-data/troubleshoot.md +++ b/content/influxdb/clustered/write-data/troubleshoot.md @@ -5,7 +5,8 @@ weight: 106 description: > Troubleshoot issues writing data. Find response codes for failed writes. - Discover how writes fail, from exceeding rate or payload limits, to syntax errors and schema conflicts. + Discover how writes fail, from exceeding rate or payload limits, to syntax + errors and schema conflicts. menu: influxdb_clustered: name: Troubleshoot issues @@ -17,7 +18,8 @@ related: - /influxdb/clustered/reference/internals/durability/ --- -Learn how to avoid unexpected results and recover from errors when writing to {{% product-name %}}. +Learn how to avoid unexpected results and recover from errors when writing to +{{% product-name %}}. - [Handle write responses](#handle-write-responses) - [Review HTTP status codes](#review-http-status-codes) @@ -26,12 +28,26 @@ Learn how to avoid unexpected results and recover from errors when writing to {{ ## Handle write responses -In {{% product-name %}}, writes are synchronous. -After InfluxDB validates the request and ingests the data, it sends a _success_ response (HTTP `204` status code) as an acknowledgement that the data is written and queryable. -To ensure that InfluxDB handles writes in the order you request them, wait for the acknowledgement before you send the next request. +{{% product-name %}} does the following when you send a write request: -If InfluxDB successfully writes all the request data to the database, it returns _success_ (HTTP `204` status code). -The first rejected point in a batch causes InfluxDB to reject the entire batch and respond with an [HTTP error status](#review-http-status-codes). +1. Validates the request. +2. 
If successful, attempts to ingest data from the request body; otherwise, + responds with an [error status](#review-http-status-codes). +3. Ingests or rejects data in the batch and returns one of the following HTTP + status codes: + + - `204 No Content`: All data in the batch is ingested. + - `400 Bad Request`: Some or all of the data has been rejected. + Data that has not been rejected is ingested and queryable. + +The response body contains error details about +[rejected points](#troubleshoot-rejected-points), up to 100 points. + +Writes are synchronous--the response status indicates the final status of the +write and all ingested data is queryable. + +To ensure that InfluxDB handles writes in the order you request them, +wait for the response before you send the next request. ### Review HTTP status codes @@ -42,7 +58,7 @@ Write requests return the following status codes: | HTTP response code | Message | Description | | :-------------------------------| :--------------------------------------------------------------- | :------------- | | `204 "Success"` | | If InfluxDB ingested the data | -| `400 "Bad request"` | `message` contains the first malformed line | If data is malformed | +| `400 "Bad request"` | error details about rejected points, up to 100 points: `line` contains the first rejected line, `message` describes rejections | If some or all request data isn't allowed (for example, if it is malformed or falls outside of the bucket's retention period)--the response body indicates whether a partial write has occurred or if all data has been rejected | | `401 "Unauthorized"` | | If the `Authorization` header is missing or malformed or if the [token](/influxdb/clustered/admin/tokens/) doesn't have [permission](/influxdb/clustered/reference/cli/influxctl/token/create/#examples) to write to the database. See [examples using credentials](/influxdb/clustered/get-started/write/#write-line-protocol-to-influxdb) in write requests. 
| | `404 "Not found"` | requested **resource type** (for example, "organization" or "database"), and **resource name** | If a requested resource (for example, organization or database) wasn't found | | `500 "Internal server error"` | | Default status for an error | @@ -62,6 +78,10 @@ If you notice data is missing in your database, do the following: ## Troubleshoot rejected points -InfluxDB rejects points that fall within the same partition (default partitioning is measurement and day) as existing bucket data and have a different data type for an existing field. +InfluxDB rejects points that fall within the same partition (default partitioning +is by measurement and day) as existing bucket data and have a different data type +for an existing field. -Check for [field data type](/influxdb/clustered/reference/syntax/line-protocol/#data-types-and-format) differences between the rejected data point and points within the same database and partition--for example, did you attempt to write `string` data to an `int` field? +Check for [field data type](/influxdb/clustered/reference/syntax/line-protocol/#data-types-and-format) +differences between the rejected data point and points within the same database +and partition--for example, did you attempt to write `string` data to an `int` field?
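
Reviewer sketch (not part of the patch): one way a client might tell a partial write from a fully rejected batch, using the status codes and response-body shape described in this change. It assumes the `message` field begins with the phrases shown in the `ref.yml` examples above ("partial write has occurred", "no data written"). The function name and the returned labels are hypothetical and do not come from any InfluxDB client library.

```python
import json

# Message prefixes taken from the example response bodies in this patch.
# Treat them as assumptions; the exact wording may vary by release.
PARTIAL_PREFIX = "partial write has occurred"
REJECTED_PREFIX = "no data written"


def classify_write_response(status_code, body_text=""):
    """Return 'written', 'partial', 'rejected', 'unknown', or 'error'."""
    if status_code == 204:
        # All data in the batch was ingested.
        return "written"
    if status_code != 400:
        # 401, 404, 500, and so on are not about rejected points.
        return "error"
    try:
        message = json.loads(body_text).get("message", "")
    except (TypeError, ValueError, AttributeError):
        # Body is missing, not JSON, or not a JSON object.
        return "unknown"
    if not isinstance(message, str):
        return "unknown"
    if message.startswith(PARTIAL_PREFIX):
        # Some points were written and are queryable; others were rejected.
        return "partial"
    if message.startswith(REJECTED_PREFIX):
        # Nothing from the batch was written.
        return "rejected"
    return "unknown"
```

A caller could retry only the rejected lines on `partial` and resend the corrected batch on `rejected`. Matching on message text is brittle, so treat `unknown` as a signal to log the full response body.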