InfluxDB Clustered partial writes (#5594)

* updated status code information for partial writes
* docs: add environment variable tuning explanation (#5579)
* WIP clustered partial writes
* fixed yaml error
* fixed duplicate key in clustered api docs
* Apply suggestions from code review

Co-authored-by: Jason Stirnaman <jstirnaman@influxdata.com>

* Update content/influxdb/clustered/admin/env-vars.md
* Apply suggestions from code review
* add placeholder release notes for next clustered version
* update clustered release notes, remove option license info

---------

Co-authored-by: Jack <56563911+jdockerty@users.noreply.github.com>
Co-authored-by: Jason Stirnaman <jstirnaman@influxdata.com>
2024-09-27 09:30:39 -06:00 · 2024-09-27 09:30:39 -06:00 · a62f69ffaf
parent 329e619a78
commit a62f69ffaf
8 changed files with 348 additions and 102 deletions
--- a/.ci/vale/styles/config/vocabularies/InfluxDataDocs/accept.txt
+++ b/.ci/vale/styles/config/vocabularies/InfluxDataDocs/accept.txt
@ -68,7 +68,7 @@ influx3
 influxctl
 influxd
 influxdata.com
-iox
+(iox|IOx)
 keep-url
 lat
 locf
--- a/api-docs/clustered/v2/ref.yml
+++ b/api-docs/clustered/v2/ref.yml
@ -646,17 +646,33 @@ paths:
        '204':
          description: Write data is correctly formatted and accepted for writing to the database.
        '400':
          description: |
            Data from the batch was rejected and not written. The response body indicates if a partial write occurred or all data was rejected.
            If a partial write occurred, then some points from the batch are written and queryable. 
            The response body contains details about the [rejected points](/influxdb/clustered/write-data/troubleshoot/#troubleshoot-rejected-points), up to 100 points.
          content:
            application/json:
              examples:
                rejectedAllPoints:
                  summary: Rejected all points
                  value:
                    code: invalid
                    line: 2
                    message: 'no data written, errors encountered on line(s): error message for first rejected point</n> error message for second rejected point</n> error message for Nth rejected point (up to 100 rejected points)'
                partialWriteErrorWithRejectedPoints:
                  summary: Partial write rejects some points
                  value:
                    code: invalid
                    line: 2
                    message: 'partial write has occurred, errors encountered on line(s): error message for first rejected point</n> error message for second rejected point</n> error message for Nth rejected point (up to 100 rejected points)'
              schema:
                $ref: '#/components/schemas/LineProtocolError'
        '401':
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/Error'
-          description: Token doesn't have sufficient permissions to write to this  database or the database doesn't exist.
+          description: Token doesn't have sufficient permissions to write to this database or the database doesn't exist.
        '403':
          content:
            application/json:
--- a/content/influxdb/clustered/admin/bypass-identity-provider.md
+++ b/content/influxdb/clustered/admin/bypass-identity-provider.md
@ -7,7 +7,7 @@ description: >
 menu:
  influxdb_clustered:
    parent: Administer InfluxDB Clustered
-weight: 208
+weight: 209
 ---
 {{< product-name >}} generates a valid access token (known as the _admin token_)
--- a/content/influxdb/clustered/admin/env-vars.md
+++ b/content/influxdb/clustered/admin/env-vars.md
@ -0,0 +1,171 @@
 ---
 title: Manage environment variables in your InfluxDB Cluster
 description: >
  Use environment variables to define settings for individual components in your
  InfluxDB cluster.
 menu:
  influxdb_clustered:
    parent: Administer InfluxDB Clustered
    name: Manage environment variables
 weight: 208
 ---
 Use environment variables to define settings for individual components in your
 InfluxDB cluster and adjust your cluster's running configuration.
 Define environment variables for each component in your `AppInstance` resource.
 InfluxDB Clustered components support various environment variables.
 While many of these variables have default settings, you can customize them by
 setting your own values.
 {{% warn %}}
 #### Overriding default settings may affect overall cluster performance
 {{% product-name %}} components have complex interactions that can be affected
 when overriding default configuration settings.
 Changing these settings may impact overall cluster performance.
 Before making configuration changes using environment variables, consider
 consulting [InfluxData Support](https://support.influxdata.com/) to identify any
 potential unintended consequences.
 {{% /warn %}}
 ## AppInstance component schema
 In your `AppInstance` resource, configure individual component settings in the
 `spec.package.spec.components` property. This property supports the following
 InfluxDB Clustered component keys:
 - `ingester`
 - `querier`
 - `router`
 - `compactor`
 - `garbage-collector`
 ```yaml
 apiVersion: kubecfg.dev/v1alpha1
 kind: AppInstance
 metadata:
   name: influxdb
   namespace: influxdb
 spec:
   package:
     # ...
     spec:
       components:
         ingester:
           # Ingester settings ...
         querier:
           # Querier settings ...
         router:
           # Router settings ...
         compactor:
           # Compactor settings ...
         garbage-collector:
           # Garbage collector settings ...
 ```
 _For more information about components in the InfluxDB v3 storage engine, see
 the [InfluxDB v3 storage engine architecture](/influxdb/clustered/reference/internals/storage-engine/)._
 ## Set environment variables for a component
 1.  Under the specific component property, use the
    `<component>.template.containers.iox.env` property to define environment
    variables.
 2.  In the `env` property, structure each environment variable as a key-value pair.
    For example, to configure environment variables for the Garbage collector:
    ```yaml
    apiVersion: kubecfg.dev/v1alpha1
    kind: AppInstance
    metadata:
      name: influxdb
      namespace: influxdb
    spec:
      package:
        # ...
        spec:
          components:
            garbage-collector:
              template:
                containers:
                  iox:
                    env:
                      INFLUXDB_IOX_GC_OBJECTSTORE_CUTOFF: '6h'
                      INFLUXDB_IOX_GC_PARQUETFILE_CUTOFF: '6h'
    ```
 3.  Use `kubectl apply` to apply the configuration changes to your cluster and
    add or update environment variables in each component.
    <!-- pytest.mark.skip -->
    ```bash
    kubectl apply \
      --filename myinfluxdb.yml \
      --namespace influxdb
    ```
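
 After you apply the change, you can confirm that a component picked up the new
 values. The following is a minimal sketch that assumes the operator copies the
 `env` entries from your `AppInstance` resource into each pod's container spec.
 The `env_var_value` helper is hypothetical: it reads `NAME=VALUE` lines and
 prints the value of one variable. The commented `kubectl` pipeline shows how you
 might feed it, where `GC_POD` is a placeholder for a garbage collector pod name.
 ```bash
 #!/usr/bin/env bash
 # Hypothetical helper: read NAME=VALUE lines on stdin and print the value of
 # the variable named in $1. Returns 1 if the variable isn't found.
 env_var_value() {
   local name="$1" line
   while IFS= read -r line; do
     if [ "${line%%=*}" = "$name" ]; then
       # Strip everything up to the first "=" so values may contain "=".
       printf '%s\n' "${line#*=}"
       return 0
     fi
   done
   return 1
 }
 # Example usage against a running cluster (requires kubectl access).
 # GC_POD is a placeholder; list pod names with:
 #   kubectl get pods --namespace influxdb
 #
 # kubectl get pod "$GC_POD" --namespace influxdb \
 #   --output jsonpath='{range .spec.containers[*].env[*]}{.name}={.value}{"\n"}{end}' \
 #   | env_var_value INFLUXDB_IOX_GC_OBJECTSTORE_CUTOFF
 ```
 If the variable isn't listed, check that you applied the updated `AppInstance`
 resource and that the component's pods have restarted.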
 {{% note %}}
 #### Update environment variables instead of removing them
 Most configuration settings that can be overridden by environment variables have
 default values that are used if the environment variable is unset. Removing an
 environment variable from your `AppInstance` resource doesn't remove the
 underlying setting; instead, the setting reverts to its default value.
 To revert a setting to its default, either remove the environment variable or
 set its value in your `AppInstance` resource to the default value.
 In the preceding example, the `INFLUXDB_IOX_GC_OBJECTSTORE_CUTOFF` environment
 variable is set to `6h`. If you remove `INFLUXDB_IOX_GC_OBJECTSTORE_CUTOFF` from
 the `env` property, the cutoff reverts to its default setting of `30d`.
 {{% /note %}}
 {{< expand-wrapper >}}
 {{% expand "View example of environment variables in all components" %}}
 ```yaml
 apiVersion: kubecfg.dev/v1alpha1
 kind: AppInstance
 metadata:
   name: influxdb
   namespace: influxdb
 spec:
   package:
     # ...
     spec:
       components:
         ingester:
           template:
             containers:
               iox:
                 env:
                   INFLUXDB_IOX_WAL_ROTATION_PERIOD_SECONDS: '360'
         querier:
           template:
             containers:
               iox:
                 env:
                   INFLUXDB_IOX_EXEC_MEM_POOL_BYTES: '10737418240' # 10GiB
         router:
           template:
             containers:
               iox:
                 env:
                   INFLUXDB_IOX_MAX_HTTP_REQUESTS: '5000'
         compactor:
           template:
             containers:
               iox:
                 env:
                   INFLUXDB_IOX_EXEC_MEM_POOL_PERCENT: '80'
         garbage-collector:
           template:
             containers:
               iox:
                 env:
                   INFLUXDB_IOX_GC_OBJECTSTORE_CUTOFF: '6h'
                   INFLUXDB_IOX_GC_PARQUETFILE_CUTOFF: '6h'
 ```
 {{% /expand %}}
 {{< /expand-wrapper >}}
--- a/content/influxdb/clustered/admin/licensing.md
+++ b/content/influxdb/clustered/admin/licensing.md
@ -27,20 +27,6 @@ the InfluxDB Clustered software.
    - [License expiry logs](#license-expiry-logs)
    - [Query brownout](#query-brownout)
 {{% note %}}
 #### License enforcement is currently an opt-in feature
 In currently available versions of InfluxDB Clustered, license enforcement is an
 opt-in feature that allows InfluxData to introduce license enforcement to
 customers, and allows customers to deactivate the feature if issues arise.
 In the future, all releases of InfluxDB Clustered will require customers to
 configure an active license before they can use the product.
 To opt into license enforcement, include the `useLicensedBinaries` feature flag
 in your `AppInstance` resource _([See the example below](#enable-feature-flag))_.
 To deactivate license enforcement, remove the `useLicensedBinaries` feature flag.
 {{% /note %}}
 ## Install your InfluxDB license
 {{% note %}}
@ -64,22 +50,6 @@ install your license.
    kubectl apply --filename license.yml --namespace influxdb
    ```
 4.  <span id="enable-feature-flag"></span>
    Update your `AppInstance` resource to include the `useLicensedBinaries` feature flag.
    Add the `useLicensedBinaries` entry to the `.spec.package.spec.featureFlags`
    property--for example:
    ```yml
    apiVersion: kubecfg.dev/v1alpha1
    kind: AppInstance
    # ...
    spec:
      package:
        spec:
          featureFlags:
            - useLicensedBinaries
    ```
 InfluxDB Clustered detects the `License` resource and extracts the credentials
 into a secret required by InfluxDB Clustered Kubernetes pods.
 Pods validate the license secret both at startup and periodically (roughly once
@ -115,7 +85,6 @@ license enforcement.
 ### A valid license is required
 _When you include the `useLicensedBinaries` feature flag_,
 Kubernetes pods running in your InfluxDB cluster must have a valid `License`
 resource to run. Licenses are issued by InfluxData. If there is no `License`
 resource installed in your cluster, one of two things may happen:
--- a/content/influxdb/clustered/install/licensing.md
+++ b/content/influxdb/clustered/install/licensing.md
@ -17,20 +17,6 @@ related:
 Install your InfluxDB Clustered license in your cluster to authorize the use
 of the InfluxDB Clustered software.
 {{% note %}}
 #### License enforcement is currently an opt-in feature
 In currently available versions of InfluxDB Clustered, license enforcement is an
 opt-in feature that allows InfluxData to introduce license enforcement to
 customers, and allows customers to deactivate the feature if issues arise.
 In the future, all releases of InfluxDB Clustered will require customers to
 configure an active license before they can use the product.
 To opt into license enforcement, include the `useLicensedBinaries` feature flag
 in your `AppInstance` resource _([See the example below](#enable-feature-flag))_.
 To deactivate license enforcement, remove the `useLicensedBinaries` feature flag.
 {{% /note %}}
 ## Install your InfluxDB license
 1.  If you haven't already,
@ -46,46 +32,6 @@ To deactivate license enforcement, remove the `useLicensedBinaries` feature flag
    kubectl apply --filename license.yml --namespace influxdb
    ```
 4.  <span id="enable-feature-flag"></span>
    Update your `AppInstance` resource to activate the `useLicensedBinaries` feature flag:
    - If configuring the `AppInstance` resource directly, add the
      `useLicensedBinaries` entry to the `.spec.package.spec.featureFlags`
      property.
    - If using the [InfluxDB Clustered Helm chart](https://github.com/influxdata/helm-charts/tree/master/charts/influxdb3-clustered), add the `useLicensedBinaries` entry to the
    `featureFlags` property in your `values.yaml`.
    {{< code-tabs-wrapper >}}
 {{% code-tabs %}}
 [AppInstance](#)
 [Helm](#)
 {{% /code-tabs %}}
 {{% code-tab-content %}}
 ```yml
 apiVersion: kubecfg.dev/v1alpha1
 kind: AppInstance
 # ...
 spec:
  package:
    spec:
      featureFlags:
        - useLicensedBinaries
 ```
 {{% /code-tab-content %}}
 {{% code-tab-content %}}
 ```yml
 # values.yaml
 featureFlags:
  - useLicensedBinaries
 ```
 {{% /code-tab-content %}}
    {{< /code-tabs-wrapper >}}
 InfluxDB Clustered detects the `License` resource and extracts the credentials
 into a secret required by InfluxDB Clustered Kubernetes pods.
 Pods validate the license secret both at startup and periodically (roughly once
@ -97,7 +43,8 @@ If you are currently using a non-licensed preview release of InfluxDB Clustered
 and want to upgrade to a licensed release, do the following:
 1.  [Install an InfluxDB license](#install-your-influxdb-license)
-2.  If you [use the `AppInstance` resource configuration](/influxdb/clustered/install/configure-cluster/directly/) to configure your cluster, in your `myinfluxdb.yml`,
+2.  If you [use the `AppInstance` resource configuration](/influxdb/clustered/install/configure-cluster/directly/)
    to configure your cluster, in your `myinfluxdb.yml`,
    update the package version defined in `spec.package.image` to use a licensed
    release.
--- a/content/influxdb/clustered/reference/release-notes/clustered.md
+++ b/content/influxdb/clustered/reference/release-notes/clustered.md
@ -26,6 +26,129 @@ identified below with the <span class="cf-icon Shield pink"></span> icon.
 ---
 ## 20240925-1257864 {date="2024-09-25" .checkpoint} 
 ### Quickstart
 ```yaml
 spec:
  package:
     image: us-docker.pkg.dev/influxdb2-artifacts/clustered/influxdb:20240925-1257864
 ```
 ### Highlights
 #### Default to partial write semantics
 In InfluxDB Clustered 20240925-1257864+, "partial writes" are enabled by default.
 With partial writes enabled, InfluxDB accepts write requests that contain invalid
 or malformed lines of line protocol, writes the valid lines, and rejects the
 invalid lines. Previously, if any line protocol in a batch was invalid, the
 entire batch was rejected and no data was written.
 To disable partial writes and revert to the previous behavior, set the
 `INFLUXDB_IOX_PARTIAL_WRITES_ENABLED` environment variable on your cluster's
 Ingester to `false`. Define this environment variable in the
 `spec.package.spec.components.ingester.template.containers.iox.env` property in
 your `AppInstance` resource.
 {{< expand-wrapper >}}
 {{% expand "View example of disabling partial writes in your `AppInstance` resource" %}}
 ```yaml
 apiVersion: kubecfg.dev/v1alpha1
 kind: AppInstance
 metadata:
   name: influxdb
   namespace: influxdb
 spec:
   package:
     spec:
       components:
         ingester:
           template:
             containers:
               iox:
                 env:
                   INFLUXDB_IOX_PARTIAL_WRITES_ENABLED: 'false'
 ```
 {{% /expand %}}
 {{< /expand-wrapper >}}
 For more information about defining environment variables in your InfluxDB cluster, see
 [Manage environment variables in your InfluxDB Cluster](/influxdb/clustered/admin/env-vars/).
 ##### Write API behaviors
 When you submit a write request that includes invalid or malformed line protocol,
 the InfluxDB write API returns a 400 response code and does the following:
 - With partial writes _enabled_:
   - Writes all valid points and rejects all invalid points.
   - Includes details about the [rejected points](/influxdb/clustered/write-data/troubleshoot/#troubleshoot-rejected-points)
     (up to 100 points) in the response body.
 - With partial writes _disabled_:
   - Rejects all points in the batch.
   - Includes an error message and the first malformed line of line protocol in
     the response body.
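
 A client can tell these outcomes apart from the status code and the response
 body. The following minimal sketch assumes the response body contains the
 message text shown in the write API reference examples (`partial write has
 occurred` or `no data written`). The `classify_write_response` function and its
 output labels are hypothetical; confirm the exact messages your cluster returns
 before relying on them.
 ```bash
 #!/usr/bin/env bash
 # Hypothetical helper: classify a write response from its HTTP status code ($1)
 # and response body ($2). The message substrings come from the API reference
 # examples and are an assumption, not a guaranteed contract.
 classify_write_response() {
   local status="$1" body="$2"
   if [ "$status" = "204" ]; then
     echo "all-written"
   elif [ "$status" != "400" ]; then
     echo "error-${status}"
   else
     case "$body" in
       *"partial write has occurred"*) echo "partial-write" ;;
       *"no data written"*) echo "all-rejected" ;;
       *) echo "rejected" ;;
     esac
   fi
 }
 # A 400 response with partial writes enabled prints: partial-write
 classify_write_response 400 '{"code":"invalid","line":2,"message":"partial write has occurred, errors encountered on line(s): ..."}'
 ```
 Matching on message text is fragile by design; the status code alone tells you
 only that some or all points were rejected, not which case occurred.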
 #### Deploy and use the Catalog service by default
 The Catalog service is a new IOx component that centralizes access to the
 InfluxDB Catalog among Ingesters, Queriers, Compactors, and Garbage Collectors.
 This change is expected to improve overall Catalog query performance and reduce
 ninety-ninth percentile (p99) latencies.
 ### Upgrade notes
 #### License now required
 A valid license token is now required to start up your InfluxDB Cluster.
 To avoid possible complications, ensure you have a valid license token. If you
 do not, contact your InfluxData sales representative to get a license token
 **before upgrading to this release**.
 #### Removed prometheusOperator feature flag
 The `prometheusOperator` feature flag has been removed.
 **If you currently have this feature flag enabled in your `AppInstance` resource,
 remove it before upgrading to this release.**
 This flag was deprecated in a previous release, but from this release forward,
 enabling this feature flag may cause errors.
 If you need the Prometheus operator, install and manage it separately from
 InfluxDB Clustered.
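
 Before you upgrade, you can check whether the `prometheusOperator` flag is still
 set. This is a minimal sketch: `has_feature_flag` is a hypothetical helper that
 matches a flag name as a whole word in the output of a `kubectl` JSONPath query.
 The commented usage assumes your `AppInstance` resource is named `influxdb` in
 the `influxdb` namespace and that the `appinstance` resource name resolves in
 your cluster.
 ```bash
 #!/usr/bin/env bash
 # Hypothetical helper: succeed if the flag named in $1 appears as a whole word
 # in the feature flag list read from stdin (for example, ["grafana","..."]).
 has_feature_flag() {
   grep -qw -- "$1"
 }
 # Example usage against a running cluster (requires kubectl access):
 # if kubectl get appinstance influxdb --namespace influxdb \
 #      --output jsonpath='{.spec.package.spec.featureFlags}' \
 #      | has_feature_flag prometheusOperator; then
 #   echo "Remove the prometheusOperator feature flag before upgrading."
 # fi
 ```
 Whole-word matching (`grep -w`) avoids false positives from flag names that
 merely start with `prometheusOperator`.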
 ### Changes
 #### Deployment
 - Introduces the `nodeAffinity` and CPU/memory request settings for "granite"
  components. Previously, these settings were only available for core IOx
  components.
 - Fixes an issue where many of the IOx dashboards deployed with the `grafana`
  feature flag displayed "no data." All dashboards now display data.
 #### Database Engine
 - Adjusted compactor concurrency scaling heuristic to improve performance as
  memory and CPU scale.
 - Adjusted default `INFLUXDB_IOX_COMPACTION_PARTITION_MINUTE_THRESHOLD` from
  `20m` to `100m` to help the compactor more quickly rediscover cool partitions.
 #### Configuration
 - Introduces the `podAntiAffinity` setting for InfluxDB Clustered components.
  Previously, the scheduling of pods was influenced by the Kubernetes
  scheduler's default behavior. For further details, see the
  [Kubernetes pod affinity documentation](https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#types-of-inter-pod-affinity-and-anti-affinity).
 ---
 ## 20240819-1176644 {date="2024-08-19" .checkpoint}
 ### Quickstart
@ -463,7 +586,7 @@ mounted into your existing Grafana instance.
 An authentication component, previously known as `authz`, has been consolidated
 into the `token-management` service.
-There is now a temporary `Job` in place, `delete-authz-schema`, that
+Now there is a temporary `Job` in place, `delete-authz-schema`, that
 automatically removes the `authz` schema from the configured PostgreSQL database.
 ### Changes
@ -704,7 +827,7 @@ the `create-admin-token` job.
 #### Deployment
- Increase HTTP write request limit from 10MB to 50MB.
+- Increase HTTP write request limit from 10 MB to 50 MB.
 - Added support for [Telegraf Operator](https://github.com/influxdata/telegraf-operator).
  We have added the `telegraf.influxdata.com/port` annotation to all the pods.
  No configuration is required. We don't yet provide a way to specify the
--- a/content/influxdb/clustered/write-data/troubleshoot.md
+++ b/content/influxdb/clustered/write-data/troubleshoot.md
@ -5,7 +5,8 @@ weight: 106
 description: >
  Troubleshoot issues writing data.
  Find response codes for failed writes.
-  Discover how writes fail, from exceeding rate or payload limits, to syntax errors and schema conflicts.
+  Discover how writes fail, from exceeding rate or payload limits, to syntax
  errors and schema conflicts.
 menu:
  influxdb_clustered:
    name: Troubleshoot issues
@ -17,7 +18,8 @@ related:
  - /influxdb/clustered/reference/internals/durability/
 ---
-Learn how to avoid unexpected results and recover from errors when writing to {{% product-name %}}.
+Learn how to avoid unexpected results and recover from errors when writing to
 {{% product-name %}}.
 - [Handle write responses](#handle-write-responses)
  - [Review HTTP status codes](#review-http-status-codes)
@ -26,12 +28,26 @@ Learn how to avoid unexpected results and recover from errors when writing to {{
 ## Handle write responses
-In {{% product-name %}}, writes are synchronous.
+{{% product-name %}} does the following when you send a write request:
 After InfluxDB validates the request and ingests the data, it sends a _success_ response (HTTP `204` status code) as an acknowledgement that the data is written and queryable.
 To ensure that InfluxDB handles writes in the order you request them, wait for the acknowledgement before you send the next request.
-If InfluxDB successfully writes all the request data to the database, it returns _success_ (HTTP `204` status code).
+1.  Validates the request.
-The first rejected point in a batch causes InfluxDB to reject the entire batch and respond with an [HTTP error status](#review-http-status-codes).
+2.  If successful, attempts to ingest data from the request body; otherwise,
    responds with an [error status](#review-http-status-codes).
 3.  Ingests or rejects data in the batch and returns one of the following HTTP
    status codes:
    - `204 No Content`: All data in the batch is ingested.
    - `400 Bad Request`: Some or all of the data has been rejected.
      Data that has not been rejected is ingested and queryable.
 The response body contains error details about
 [rejected points](#troubleshoot-rejected-points), up to 100 points.
 Writes are synchronous--the response status indicates the final status of the
 write and all ingested data is queryable.
 To ensure that InfluxDB handles writes in the order you request them,
 wait for the response before you send the next request.
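
 The following minimal sketch shows one way to send batches in order, waiting for
 each response before sending the next and stopping at the first non-`204`
 status. `write_batches_in_order` and `curl_send_batch` are hypothetical helpers,
 and `INFLUX_HOST`, `DATABASE_NAME`, and `DATABASE_TOKEN` are placeholder
 variables. The request assumes the `/api/v2/write` endpoint (with the database
 name in the `bucket` parameter) and bearer-token authorization; check the write
 API reference for your cluster.
 ```bash
 #!/usr/bin/env bash
 # Hypothetical sender: POST one line protocol file ($1) and print the HTTP
 # status code. The response body is saved next to the file for inspection.
 curl_send_batch() {
   curl --silent --output "$1.response" --write-out '%{http_code}' \
     --request POST \
     "https://${INFLUX_HOST}/api/v2/write?bucket=${DATABASE_NAME}&precision=s" \
     --header "Authorization: Bearer ${DATABASE_TOKEN}" \
     --header "Content-Type: text/plain; charset=utf-8" \
     --data-binary "@$1"
 }
 # Send each batch file in order using the sender function named in $1.
 # Each call blocks until InfluxDB responds, so a batch is sent only after the
 # previous response arrives. Stops at the first status other than 204.
 write_batches_in_order() {
   local sender="$1" batch status
   shift
   for batch in "$@"; do
     status="$("$sender" "$batch")"
     if [ "$status" != "204" ]; then
       echo "stopped at ${batch}: HTTP ${status}" >&2
       return 1
     fi
   done
 }
 # Usage: write_batches_in_order curl_send_batch batch1.lp batch2.lp
 ```
 Stopping at the first `400` is a choice made for this sketch: with partial
 writes, some points from that batch may already be written and queryable, so
 inspect the saved response before deciding whether to resend anything.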
 ### Review HTTP status codes
@ -42,7 +58,7 @@ Write requests return the following status codes:
 | HTTP response code              | Message                                                                 | Description    |
 | :-------------------------------| :---------------------------------------------------------------        | :------------- |
 | `204 "Success"`                 |                                                                         | If InfluxDB ingested the data |
-| `400 "Bad request"`             | `message` contains the first malformed line                             | If data is malformed    |
+| `400 "Bad request"`             | error details about rejected points, up to 100 points: `line` contains the first rejected line, `message` describes rejections | If some or all request data isn't allowed (for example, if it is malformed or falls outside of the database's retention period)--the response body indicates whether a partial write has occurred or if all data has been rejected |
 | `401 "Unauthorized"`            |                                                                         | If the `Authorization` header is missing or malformed or if the [token](/influxdb/clustered/admin/tokens/) doesn't have [permission](/influxdb/clustered/reference/cli/influxctl/token/create/#examples) to write to the database. See [examples using credentials](/influxdb/clustered/get-started/write/#write-line-protocol-to-influxdb) in write requests. |
 | `404 "Not found"`               | requested **resource type** (for example, "organization" or "database"), and **resource name**     | If a requested resource (for example, organization or database) wasn't found |
 | `500 "Internal server error"`   |                                                                         | Default status for an error |
@ -62,6 +78,10 @@ If you notice data is missing in your database, do the following:
 ## Troubleshoot rejected points
-InfluxDB rejects points that fall within the same partition (default partitioning is measurement and day) as existing bucket data and have a different data type for an existing field.
+InfluxDB rejects points that fall within the same partition (default partitioning
 is by measurement and day) as existing data in the database and have a different
 data type for an existing field.
-Check for [field data type](/influxdb/clustered/reference/syntax/line-protocol/#data-types-and-format) differences between the rejected data point and points within the same database and partition--for example, did you attempt to write `string` data to an `int` field?
+Check for [field data type](/influxdb/clustered/reference/syntax/line-protocol/#data-types-and-format)
 differences between the rejected data point and points within the same database
 and partition--for example, did you attempt to write `string` data to an `int` field?
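
 To compare types, it helps to know how line protocol infers a field's type from
 its value: a trailing `i` marks an integer, a trailing `u` an unsigned integer,
 double quotes a string, `t`/`true`/`f`/`false` (and their capitalized variants)
 a boolean, and any other numeric value a float. The hypothetical `lp_field_type`
 function below is a simplified sketch of those rules and doesn't validate
 values. For example, if `temp=21.5` was already written to a partition, a later
 point with `temp="warm"` in the same partition is rejected because `temp`
 changes from float to string.
 ```bash
 #!/usr/bin/env bash
 # Hypothetical, simplified helper: print the line protocol data type implied by
 # a field value ($1). It doesn't check whether the value is well formed.
 lp_field_type() {
   case "$1" in
     \"*\") echo "string" ;;
     t|T|true|True|TRUE|f|F|false|False|FALSE) echo "boolean" ;;
     *[0-9]i) echo "integer" ;;
     *[0-9]u) echo "uinteger" ;;
     *) echo "float" ;;
   esac
 }
 # The same field written with two different types:
 lp_field_type '21.5'    # float
 lp_field_type '"warm"'  # string
 ```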