finalize clustered install reorg for preview

2024-10-04 11:18:14 -06:00 · 2024-10-04 11:18:14 -06:00 · 598ce37fb4
parent b625b59b25
commit 598ce37fb4
12 changed files with 383 additions and 19 deletions
--- a/content/influxdb/clustered/install/customize-cluster/_index.md
+++ b/content/influxdb/clustered/install/customize-cluster/_index.md
@ -22,6 +22,8 @@ related:
 This phase of the installation process customizes the scale and configuration of
 your InfluxDB cluster to meet the needs of your specific workload.

+## Phase 2 process
+
 {{< children type="ordered-list" >}}

 {{< page-nav prev="/influxdb/clustered/install/set-up-cluster/test-cluster/" prevText="Test your cluster" next="/influxdb/clustered/install/customize-cluster/scale/" nextText="Customize cluster scale" >}}
--- a/content/influxdb/clustered/install/optimize-cluster/_index.md
+++ b/content/influxdb/clustered/install/optimize-cluster/_index.md
@ -1,7 +1,8 @@
 ---
 title: Optimize your InfluxDB cluster
 description: >
-  ....
+  Test your cluster with a production-like workload and optimize your cluster
+  for your workload.
 menu:
  influxdb_clustered:
    name: Optimize your cluster
@ -16,10 +17,31 @@ metadata:
  - Phase 3
 ---

- Simulate a production-like workload
- Define your schema
- Define your query patterns
- Optimize for your workload:
-  - Querying by specific tag values? Partition by those tags.
-  - Is your schema wide? SELECT specific columns in queries rather than wildcards.
+The goal of this phase of the installation process is to simulate a
+production-like workload against your InfluxDB cluster and make changes to
+optimize your cluster for your workload.

+{{% note %}}
+Depending on your requirements, this phase is likely to take the longest of all
+the installation phases.
+{{% /note %}}
+
+## Identify performance requirements {note="Recommended"}
+
+Before beginning this process, we recommend identifying performance requirements
+and goals--for example:
+
+- Writes per second
+- Query concurrency
+- Query response time
+- etc.
+
+This gives you specific metrics to test against and tune toward.
+Consult with [InfluxData support](https://support.influxdata.com) as you make
+changes to meet these requirements and goals.
+
+## Phase 3 process
+
+{{< children type="ordered-list" >}}
+
+{{< page-nav prev="/influxdb/clustered/install/customize-cluster/config/" prevText="Customize cluster configuration" next="/influxdb/clustered/install/optimize-cluster/design-schema/" nextText="Design your schema" >}}
--- a/content/influxdb/clustered/install/optimize-cluster/design-schema.md
+++ b/content/influxdb/clustered/install/optimize-cluster/design-schema.md
@ -0,0 +1,88 @@
+---
+title: Design your schema
+description: >
+  Use schema design guidelines to improve write and query performance in your
+  InfluxDB cluster.
+menu:
+  influxdb_clustered:
+    name: Design your schema
+    parent: Optimize your cluster
+weight: 201
+related:
+  - /influxdb/clustered/write-data/best-practices/schema-design/
+---
+
+Schema design can have a significant impact on both write and query performance
+in your InfluxDB cluster. The items below cover high-level considerations and
+recommendations. For detailed recommendations, see
+[Schema design recommendations](/influxdb/clustered/write-data/best-practices/schema-design/).
+
+## Understand the difference between tags and fields
+
+In the [InfluxDB data structure](/influxdb/clustered/write-data/best-practices/schema-design/#influxdb-data-structure),
+there are three main "categories" of information--timestamps, tags, and fields.
+Understanding the difference between what should be a tag and what should be a
+field is important when designing your schema.
+
+Use the following guidelines to determine what should be tags versus fields:
+
+- Use tags to store metadata that provides information about the source or
+  context of the data.
+- Use fields to store measured values.
+- Field values typically change over time. Tag values do not.
+- Tag values can only be strings.
+- Field values can be any of the following data types:
+  - Integer
+  - Unsigned integer
+  - Float
+  - String
+  - Boolean
+
+For more information, see [Tags versus fields](/influxdb/clustered/write-data/best-practices/schema-design/#tags-versus-fields).
+
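The tag and field guidelines above can be sketched as a small line protocol formatter. This is a minimal illustration, not part of the commit: the function names, the `home` table, and the sample values are assumptions. It relies only on the line protocol type markers (an `i` suffix for integers, double quotes for strings, `true`/`false` for booleans).

```python
def _escape(text: str, chars: str) -> str:
    """Backslash-escape each character in `chars` that appears in `text`."""
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text


def format_field_value(value) -> str:
    """Render a Python value as a typed line protocol field value."""
    # bool must be checked before int because bool is a subclass of int.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # integer fields carry an "i" suffix
    if isinstance(value, float):
        return repr(value)
    if isinstance(value, str):
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    # Unsigned integers (a "u" suffix) are not handled in this sketch.
    raise TypeError(f"unsupported field type: {type(value).__name__}")


def to_line_protocol(table: str, tags: dict, fields: dict, timestamp_ns: int) -> str:
    """Build one line: tags hold string metadata, fields hold measured values."""
    if not fields:
        raise ValueError("a point needs at least one field")
    tag_part = "".join(
        f",{_escape(k, ', =')}={_escape(str(v), ', =')}"
        for k, v in sorted(tags.items())
    )
    field_part = ",".join(
        f"{_escape(k, ', =')}={format_field_value(v)}" for k, v in fields.items()
    )
    return f"{_escape(table, ', ')}{tag_part} {field_part} {timestamp_ns}"


line = to_line_protocol(
    "home",
    {"room": "Kitchen"},                    # metadata about the source
    {"temp": 22.5, "co": 7, "ok": True},    # measured values
    1700000000000000000,
)
print(line)
```

Tag values are always converted to strings here, while each field keeps its own type, which mirrors the distinction the guidelines draw.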
+## Schema restrictions
+
+InfluxDB enforces the following schema restrictions:
+
+- You cannot use the same name for a tag and a field in the same table.
+- There is a limit to the number of columns you can store in a table.
+  By default, tables can have up to 250 columns.
+
+For more information, see [InfluxDB schema restrictions](/influxdb/clustered/write-data/best-practices/schema-design/#schema-restrictions).
+
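The two restrictions above can be checked before any data is written. The sketch below is hypothetical: `check_schema` is not an InfluxDB API, and it assumes the 250-column default counts the `time` column together with every distinct tag and field.

```python
MAX_COLUMNS = 250  # default per-table limit stated in the restrictions above


def check_schema(tag_keys, field_keys, max_columns=MAX_COLUMNS):
    """Return a list of human-readable problems; an empty list means no issues."""
    problems = []

    # A name cannot be used for both a tag and a field in the same table.
    shared = sorted(set(tag_keys) & set(field_keys))
    if shared:
        problems.append(f"names used as both tag and field: {', '.join(shared)}")

    # Assumption: the limit covers the time column plus all tags and fields.
    column_count = 1 + len(set(tag_keys) | set(field_keys))
    if column_count > max_columns:
        problems.append(f"{column_count} columns exceeds the limit of {max_columns}")

    return problems


print(check_schema(["room", "temp"], ["temp", "co"]))
```

Running a check like this against a planned schema during load testing surfaces collisions earlier than a rejected write would.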
+## Design for performance
+
+The following guidelines help to ensure write and query performance:
+
+{{% caption %}}
+Follow the links below for more detailed information.
+{{% /caption %}}
+
+- [Avoid wide schemas](/influxdb/clustered/write-data/best-practices/schema-design/#avoid-wide-schemas):
+  A wide schema is one with a large number of columns (tags and fields).
+- [Avoid sparse schemas](/influxdb/clustered/write-data/best-practices/schema-design/#avoid-sparse-schemas):
+  A sparse schema is one where, for many rows, columns contain null values.
+- [Keep table schemas homogenous](/influxdb/clustered/write-data/best-practices/schema-design/#table-schemas-should-be-homogenous):
+  A homogenous table schema is one where every row has values for all tags and fields.
+- [Use the best data type for your data](/influxdb/clustered/write-data/best-practices/schema-design/#use-the-best-data-type-for-your-data):
+  Write integers as integers, decimals as floats, and booleans as booleans.
+  Queries against a field that stores integers outperform queries against string data.
+
+## Design for query simplicity
+
+The following guidelines help to ensure that, when querying data, the schema
+makes it easy to write queries:
+
+{{% caption %}}
+Follow the links below for more detailed information.
+{{% /caption %}}
+
+- [Keep table names, tags, and fields simple](/influxdb/clustered/write-data/best-practices/schema-design/#keep-table-names-tags-and-fields-simple):
+  Use one tag or one field for each data attribute.
+  If your source data contains multiple data attributes in a single parameter,
+  split each attribute into its own tag or field.
+- [Avoid keywords and special characters](/influxdb/clustered/write-data/best-practices/schema-design/#avoid-keywords-and-special-characters):
+  Reserved keywords or special characters in table names, tag keys, and field
+  keys make writing queries more complex.
+
+{{< page-nav prev="/influxdb/clustered/install/optimize-cluster/" prevText="Optimize your cluster" next="/influxdb/clustered/install/optimize-cluster/write-methods/" nextText="Identify write methods" >}}
--- a/content/influxdb/clustered/install/optimize-cluster/optimize-querying.md
+++ b/content/influxdb/clustered/install/optimize-cluster/optimize-querying.md
@ -0,0 +1,94 @@
+---
+title: Optimize querying
+seotitle: Optimize querying in your InfluxDB cluster
+description: >
+  Define your typical query patterns and employ optimizations to ensure query
+  performance.
+menu:
+  influxdb_clustered:
+    name: Optimize querying
+    parent: Optimize your cluster
+weight: 204
+related:
+  - /influxdb/clustered/query-data/troubleshoot-and-optimize/optimize-queries/
+  - /influxdb/clustered/admin/custom-partitions/
+  - /influxdb/clustered/query-data/troubleshoot-and-optimize/troubleshoot/
+  - /influxdb/clustered/query-data/troubleshoot-and-optimize/analyze-query-plan/
+  - /influxdb/clustered/query-data/troubleshoot-and-optimize/report-query-performance-issues/
+---
+
+With data written to your cluster, you can now begin to define and test your
+typical query patterns and employ optimizations to ensure query performance.
+
+## Define your query patterns
+
+Understanding your typical query pattern helps to prioritize what optimizations
+can be made to ensure your query performance meets your requirements.
+For example, consider the following questions:
+
+- **Do you typically query data by specific tag values?**  
+  [Apply custom partitioning](/influxdb/clustered/admin/custom-partitions/) to
+  your target database or table to partition by those tags. Partitioning by
+  commonly queried tags helps InfluxDB quickly identify where the relevant
+  data is in storage and improves query performance.
+- **Do you query tables with [wide schemas](/influxdb/clustered/write-data/best-practices/schema-design/#avoid-wide-schemas)?**  
+  Avoid using wildcards (`*`) in your `SELECT` statement. Select specific
+  columns you want returned in your query results. The more columns queried, the
+  less performant the query.
+- **Do you query large, historical time ranges?**  
+  Use [time-based aggregation methods to downsample your data](/influxdb/clustered/query-data/sql/aggregate-select/#downsample-data-by-applying-interval-based-aggregates)
+  and return aggregate values per interval of time instead of all the data.
+
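Two of the patterns above, selecting explicit columns and downsampling a long time range, can be sketched as query builders. The helper names, the `home` table, and its columns are assumptions for illustration. The SQL uses `DATE_BIN`, which DataFusion provides. Values are interpolated directly for readability, so this is not safe for untrusted input.

```python
def select_columns(table: str, columns: list, start: str, stop: str) -> str:
    """Build a SELECT that names each column instead of using a wildcard."""
    if not columns or "*" in columns:
        raise ValueError("list the columns you need rather than using *")
    col_list = ", ".join(f'"{c}"' for c in columns)
    return (
        f'SELECT {col_list} FROM "{table}" '
        f"WHERE time >= '{start}' AND time < '{stop}'"
    )


def downsample(table: str, field: str, interval: str, start: str, stop: str) -> str:
    """Build a query that returns one average per interval, not every row."""
    return (
        f"SELECT DATE_BIN(INTERVAL '{interval}', time) AS bucket, "
        f'AVG("{field}") AS "avg_{field}" FROM "{table}" '
        f"WHERE time >= '{start}' AND time < '{stop}' "
        "GROUP BY 1 ORDER BY 1"
    )


narrow = select_columns(
    "home", ["time", "room", "temp"],
    "2024-01-01T00:00:00Z", "2024-01-02T00:00:00Z",
)
hourly = downsample(
    "home", "temp", "1 hour",
    "2024-01-01T00:00:00Z", "2024-02-01T00:00:00Z",
)
print(narrow)
print(hourly)
```

Both builders always bound the time range, which keeps test queries comparable from run to run.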
+## Decide on your query language
+
+InfluxDB Clustered supports both [SQL](/influxdb/clustered/reference/sql/) and
+[InfluxQL](/influxdb/clustered/reference/influxql/)--a SQL-like query language
+designed for InfluxDB v1 and specifically for querying time series data.
+
+### SQL 
+
+The InfluxDB SQL implementation is a full-featured SQL query engine powered by
+[Apache DataFusion](https://datafusion.apache.org/). It benefits from a robust
+upstream community that is constantly improving the functionality and performance
+of the engine. Some time series-specific queries (such as time-based aggregates)
+are more verbose in SQL than in InfluxQL, but they are still possible.
+
+### InfluxQL
+
+InfluxQL is designed specifically for time series data and simplifies many 
+time series-related operations like aggregating based on time, technical
+analysis, and forecasting. It isn't as full-featured as SQL and requires some
+understanding of the InfluxDB v1 data model.
+
+## Optimize your queries
+
+View the [query optimization and troubleshooting documentation](/influxdb/clustered/query-data/troubleshoot-and-optimize/optimize-queries/)
+for guidance and information on how to troubleshoot and optimize queries that do
+not perform as expected.
+
+### Analyze queries
+
+Both SQL and InfluxQL support the `EXPLAIN` and `EXPLAIN ANALYZE` statements
+that return detailed information about your query's planning and execution.
+This can provide insight into possible optimizations you can make for a specific
+query. For more information, see
+[Analyze a query plan](/influxdb/clustered/query-data/troubleshoot-and-optimize/analyze-query-plan/).
+
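As a small sketch of the statements described above, the helper below wraps an existing query in `EXPLAIN` or `EXPLAIN ANALYZE`. The function name and sample query are assumptions, and the output still has to be read as described in the linked query plan documentation.

```python
def explain(query: str, analyze: bool = False) -> str:
    """Prefix a query with EXPLAIN, or EXPLAIN ANALYZE when analyze is True."""
    prefix = "EXPLAIN ANALYZE" if analyze else "EXPLAIN"
    # Drop surrounding whitespace and a trailing semicolon before prefixing.
    return f"{prefix} {query.strip().rstrip(';')}"


plan_query = explain('SELECT "temp" FROM "home";')
run_query = explain('SELECT "temp" FROM "home";', analyze=True)
print(plan_query)
print(run_query)
```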
+## Custom-partition data
+
+InfluxDB Clustered lets you define how data is stored to ensure queries are
+performant. [Custom partitioning](/influxdb/clustered/admin/custom-partitions/)
+lets you define how InfluxDB partitions data and can be used to structure your
+data so it's easier for InfluxDB to identify where the data you typically query
+is in storage. For more information, see
+[Manage data partitioning](/influxdb/clustered/admin/custom-partitions/).
+
+## Report query performance issues
+
+If you have a query that isn't meeting your performance requirements despite
+implementing query optimizations, please follow the process described in
+[Report query performance issues](/influxdb/clustered/query-data/troubleshoot-and-optimize/report-query-performance-issues/)
+to gather information for InfluxData engineers so they can help identify any
+potential solutions.
+
+{{< page-nav prev="/influxdb/clustered/install/optimize-cluster/simulate-load/" prevText="Simulate load" next="/influxdb/clustered/install/secure-cluster/" nextText="Phase 4: Secure your cluster" >}}
--- a/content/influxdb/clustered/install/optimize-cluster/simulate-load.md
+++ b/content/influxdb/clustered/install/optimize-cluster/simulate-load.md
@ -0,0 +1,36 @@
+---
+title: Simulate a production-like load
+description: >
+  Simulate a production-like load that writes data to your InfluxDB cluster.
+menu:
+  influxdb_clustered:
+    name: Simulate load
+    parent: Optimize your cluster
+weight: 203
+---
+
+With your schema defined, you can begin to simulate a production-like load that
+writes data to your InfluxDB cluster. This process helps to ensure that your
+schema works as designed and that both your cluster's scale and configuration
+are able to meet your cluster's write requirements.
+
+{{% warn %}}
+We do not recommend writing production data to your InfluxDB cluster at this point.
+{{% /warn %}}
+
+## Load testing tools
+
+Contact your [InfluxData sales representative](https://influxdata.com/contact-sales)
+for information about tools that you can use to load test your InfluxDB cluster.
+There are tools available that can simulate your schema and desired write
+concurrency to ensure your cluster performs under production-like load.
+
+<!-- TO-DO: Would love to be able to list available tools here -->
+
+## Use your own tools
+
+You can also build and use your own tools to load test a production-like workload.
+Use Telegraf, client libraries, or the InfluxDB API to build out tests that
+simulate writes to your cluster.
+
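If you build your own load test, a seeded generator keeps runs repeatable. The sketch below only produces line protocol batches and leaves sending them to Telegraf, a client library, or the API. The `sim_load` table, its tag and field names, and the default sizes are all hypothetical.

```python
import random


def generate_batches(num_batches, batch_size, num_hosts=10,
                     start_ns=1_700_000_000_000_000_000,
                     step_ns=1_000_000_000, seed=42):
    """Yield newline-joined line protocol batches of synthetic host metrics."""
    rng = random.Random(seed)  # seeded so every run produces the same data
    timestamp = start_ns
    for _ in range(num_batches):
        lines = []
        for _ in range(batch_size):
            host = f"host{rng.randrange(num_hosts):03d}"
            cpu = round(rng.uniform(0, 100), 2)        # float field
            mem = rng.randrange(0, 16_000_000_000)     # integer field
            lines.append(
                f"sim_load,host={host} cpu={cpu},mem_bytes={mem}i {timestamp}"
            )
            timestamp += step_ns
        yield "\n".join(lines)


batches = list(generate_batches(num_batches=3, batch_size=5))
print(batches[0])
```

Adjusting `num_hosts`, `batch_size`, and `step_ns` changes the cardinality, request size, and point density, which lets the generated load approximate the shape of your production schema.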
+{{< page-nav prev="/influxdb/clustered/install/optimize-cluster/write-methods/" prevText="Identify write methods" next="/influxdb/clustered/install/optimize-cluster/optimize-querying/" nextText="Optimize querying" >}}
--- a/content/influxdb/clustered/install/optimize-cluster/write-methods.md
+++ b/content/influxdb/clustered/install/optimize-cluster/write-methods.md
@ -0,0 +1,125 @@
+---
+title: Identify write methods
+seotitle: Identify methods for writing to your InfluxDB cluster
+description: >
+  Identify the most appropriate and useful tools and methods for writing data to
+  your InfluxDB cluster.
+menu:
+  influxdb_clustered:
+    name: Identify write methods
+    parent: Optimize your cluster
+weight: 202
+related:
+  - /telegraf/v1/
+  - /telegraf/v1/plugins/
+  - /influxdb/clustered/write-data/use-telegraf/configure/
+  - /influxdb/clustered/reference/client-libraries/
+  - /influxdb/clustered/write-data/best-practices/optimize-writes/
+---
+
+Many different tools are available for writing data into your InfluxDB cluster.
+Based on your use case, you should identify the most appropriate tools and
+methods to use. Below is a summary of some of the tools that are available
+(this list is not exhaustive).
+
+## Telegraf
+
+[Telegraf](/telegraf/v1/) is a data collection agent that collects data from
+various sources, parses the data into
+[line protocol](/influxdb/clustered/reference/syntax/line-protocol/), and then
+writes the data to InfluxDB.
+Telegraf is plugin-based and provides hundreds of
+[plugins that collect, aggregate, process, and write data](/telegraf/v1/plugins/).
+
+If you need to collect data from well-established systems and technologies,
+Telegraf likely already supports a plugin for collecting that data.
+Some of the most common use cases are:
+
+- Monitoring system metrics (memory, CPU, disk usage, etc.)
+- Monitoring Docker containers
+- Monitoring network devices via SNMP
+- Collecting data from a Kafka queue
+- Collecting data from an MQTT broker
+- Collecting data from HTTP endpoints
+- Scraping data from a Prometheus exporter
+- Parsing logs
+
+For more information about using Telegraf with InfluxDB Clustered, see
+[Use Telegraf to write data to InfluxDB Clustered](/influxdb/clustered/write-data/use-telegraf/configure/).
+
+## InfluxDB client libraries
+
+[InfluxDB client libraries](/influxdb/clustered/reference/client-libraries/) are
+language-specific packages that integrate with InfluxDB APIs. They simplify
+integrating InfluxDB with your own custom application and standardize
+interactions between your application and your InfluxDB cluster.
+With client libraries, you can collect and write whatever time series data is
+useful for your application.
+
+InfluxDB Clustered includes backwards-compatible write APIs, so if you are
+currently using an InfluxDB v1 or v2 client library, you can continue to use the
+same client library to write data to your cluster.
+
+{{< expand-wrapper >}}
+{{% expand "View available InfluxDB client libraries" %}}
+
+<!-- TO-DO: Somehow automate this list -->
+
+- [InfluxDB v3 client libraries](/influxdb/clustered/reference/client-libraries/v3/)
+  - [C# .NET](/influxdb/clustered/reference/client-libraries/v3/csharp/)
+  - [Go](/influxdb/clustered/reference/client-libraries/v3/go/)
+  - [Java](/influxdb/clustered/reference/client-libraries/v3/java/)
+  - [JavaScript](/influxdb/clustered/reference/client-libraries/v3/javascript/)
+  - [Python](/influxdb/clustered/reference/client-libraries/v3/python/)
+- [InfluxDB v2 client libraries](/influxdb/clustered/reference/client-libraries/v2/)
+  - [Arduino](/influxdb/clustered/reference/client-libraries/v2/arduino/)
+  - [C#](/influxdb/clustered/reference/client-libraries/v2/csharp/)
+  - [Dart](/influxdb/clustered/reference/client-libraries/v2/dart/)
+  - [Go](/influxdb/clustered/reference/client-libraries/v2/go/)
+  - [Java](/influxdb/clustered/reference/client-libraries/v2/java/)
+  - [JavaScript](/influxdb/clustered/reference/client-libraries/v2/javascript/)
+  - [Kotlin](/influxdb/clustered/reference/client-libraries/v2/kotlin/)
+  - [PHP](/influxdb/clustered/reference/client-libraries/v2/php/)
+  - [Python](/influxdb/clustered/reference/client-libraries/v2/python/)
+  - [R](/influxdb/clustered/reference/client-libraries/v2/r/)
+  - [Ruby](/influxdb/clustered/reference/client-libraries/v2/ruby/)
+  - [Scala](/influxdb/clustered/reference/client-libraries/v2/scala/)
+  - [Swift](/influxdb/clustered/reference/client-libraries/v2/swift/)
+- [InfluxDB v1 client libraries](/influxdb/clustered/reference/client-libraries/v1/)
+
+{{% /expand %}}
+{{< /expand-wrapper >}}
+
+## InfluxDB HTTP write APIs
+
+InfluxDB Clustered provides backwards-compatible HTTP write APIs for writing
+data to your cluster. The [InfluxDB client libraries](#influxdb-client-libraries)
+use these APIs, but if you choose not to use a client library, you can integrate
+directly with the API. Because these APIs are backwards compatible, you can use
+existing InfluxDB API integrations with your InfluxDB cluster.
+
+- [InfluxDB v2 API for InfluxDB Clustered](/influxdb/clustered/api/v2/)
+- [InfluxDB v1 API for InfluxDB Clustered](/influxdb/clustered/api/v1/)
+  
+## Write optimizations
+
+As you decide on and integrate tooling to write data to your InfluxDB cluster,
+there are things you can do to ensure your write pipeline is as performant as
+possible. The list below provides links to more detailed descriptions of these
+optimizations in the [Optimize writes](/influxdb/clustered/write-data/best-practices/optimize-writes/)
+documentation:
+
+- [Batch writes](/influxdb/clustered/write-data/best-practices/optimize-writes/#batch-writes)
+- [Sort tags by key](/influxdb/clustered/write-data/best-practices/optimize-writes/#sort-tags-by-key)
+- [Use the coarsest time precision possible](/influxdb/clustered/write-data/best-practices/optimize-writes/#use-the-coarsest-time-precision-possible)
+- [Use gzip compression](/influxdb/clustered/write-data/best-practices/optimize-writes/#use-gzip-compression)
+- [Synchronize hosts with NTP](/influxdb/clustered/write-data/best-practices/optimize-writes/#synchronize-hosts-with-ntp)
+- [Write multiple data points in one request](/influxdb/clustered/write-data/best-practices/optimize-writes/#write-multiple-data-points-in-one-request)
+- [Pre-process data before writing](/influxdb/clustered/write-data/best-practices/optimize-writes/#pre-process-data-before-writing)
+
+{{% note %}}
+[Telegraf](#telegraf) and [InfluxDB client libraries](#influxdb-client-libraries)
+leverage many of these optimizations by default.
+{{% /note %}}
+
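Three of the optimizations listed above (batching, sorting tags by key, and gzip compression) can be sketched with the standard library alone. The helper names and sample data are assumptions. The code only prepares request bodies and headers, and how they are sent depends on the write API or client you choose.

```python
import gzip
from typing import Iterable, Iterator


def sorted_tag_set(tags: dict) -> str:
    """Render tags as key=value pairs ordered by key."""
    return ",".join(f"{k}={v}" for k, v in sorted(tags.items()))


def batched(lines: Iterable[str], batch_size: int) -> Iterator[list]:
    """Group lines so each request carries many points instead of one."""
    batch = []
    for line in lines:
        batch.append(line)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # emit the final, partially filled batch
        yield batch


def gzip_payload(batch: list) -> tuple:
    """Compress a batch and return the body with the headers that describe it."""
    body = gzip.compress("\n".join(batch).encode("utf-8"))
    headers = {
        "Content-Encoding": "gzip",
        "Content-Type": "text/plain; charset=utf-8",
    }
    return body, headers


tags = sorted_tag_set({"room": "Kitchen", "building": "A"})
points = [f"home,{tags} temp={20 + i}i {1700000000 + i}" for i in range(7)]
payloads = [gzip_payload(b) for b in batched(points, batch_size=3)]
print(len(payloads), "requests for", len(points), "points")
```

The timestamps in this example are in seconds, so a request carrying them would also need to declare second precision, in line with the coarsest-precision recommendation.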
+{{< page-nav prev="/influxdb/clustered/install/optimize-cluster/design-schema/" prevText="Design your schema" next="/influxdb/clustered/install/optimize-cluster/simulate-load/" nextText="Simulate load" >}}
--- a/content/influxdb/clustered/install/secure-cluster/_index.md
+++ b/content/influxdb/clustered/install/secure-cluster/_index.md
@ -20,6 +20,8 @@ metadata:
 This phase of the installation process prepares your InfluxDB cluster for
 production use by enabling security options to ensure your cluster is secured.

+## Phase 4 process
+
 {{< children type="ordered-list" >}}

-{{< page-nav prev="/influxdb/clustered/install/set-up-cluster/optimize-cluster/" prevText="Phase 3: Optimize your cluster" next="/influxdb/clustered/install/secure-cluster/tls/" nextText="Set up TLS" >}}
+{{< page-nav prev="/influxdb/clustered/install/optimize-cluster/optimize-querying/" prevText="Optimize querying" next="/influxdb/clustered/install/secure-cluster/tls/" nextText="Set up TLS" >}}
--- a/content/influxdb/clustered/install/secure-cluster/auth.md
+++ b/content/influxdb/clustered/install/secure-cluster/auth.md
@ -196,7 +196,7 @@ The following are important fields in the JSON object that are necessary to
 connect your InfluxDB cluster and administrative tools to Keycloak:

 - **jwks_uri**: Used in your InfluxDB cluster configuration file.
-  _See [Configure your cluster--Configure your OAuth2 provider](/influxdb/clustered/install/set-up-cluster/configure-cluster/#configure-your-oauth2-provider)_.
+  _See [Configure your cluster to connect to your identity provider](#configure-your-cluster-to-connect-to-your-identity-provider)_.
 - **device_authorization_endpoint**: Used in your [`influxctl` configuration file](#configure-influxctl) (`profile.auth.oauth2.device_url`)
 - **token_endpoint**: Used in your [`influxctl` configuration file](#configure-influxctl) (`profile.auth.oauth2.token_url`)

@ -306,7 +306,7 @@ The following are important fields in the JSON object that are necessary to
 connect your InfluxDB cluster and administrative tools to Keycloak:

 - **jwks_uri**: Used in your InfluxDB cluster configuration file.
-  _See [Configure your cluster--Configure your OAuth2 provider](/influxdb/clustered/install/set-up-cluster/configure-cluster/?t=Microsoft+Entra+ID#configure-your-oauth2-provider)_.
+  _See [Configure your cluster to connect to your identity provider](#configure-your-cluster-to-connect-to-your-identity-provider)_.
 - **device_authorization_endpoint**: Used in your [`influxctl` configuration file](#configure-influxctl) (`profile.auth.oauth2.device_url`)
 - **token_endpoint**: Used in your [`influxctl` configuration file](#configure-influxctl) (`profile.auth.oauth2.token_url`)

--- a/content/influxdb/clustered/install/secure-cluster/tls.md
+++ b/content/influxdb/clustered/install/secure-cluster/tls.md
@ -67,7 +67,7 @@ Provide the TLS certificate secret to the InfluxDB configuration in the
 ### Configure ingress

 Update your `AppInstance` resource to reference the secret that
-[contains your TLS certificate and key](#set-up-cluster-ingress).
+[contains your TLS certificate and key](#set-up-ingress-tls).
 The examples below use the name `ingress-tls`.

 - **If modifying the `AppInstance` resource directly**, reference the TLS secret
--- a/content/influxdb/clustered/install/set-up-cluster/_index.md
+++ b/content/influxdb/clustered/install/set-up-cluster/_index.md
@ -21,11 +21,8 @@ The first phase of installing InfluxDB Clustered is to get a basic InfluxDB
 cluster up and running with as few external dependencies as possible and confirm
 you can write and query data.

+## Phase 1 process
+
 {{< children type="ordered-list" >}}

-
- Use internal admin authorization to bypass the need to integrate with an
-  identity provider. This is a temporary measure while setting and testing your
-  cluster. Before moving into production, you will
-
 {{< page-nav next="/influxdb/clustered/install/set-up-cluster/prerequisites/" nextText="Set up prerequisites" >}}
--- a/content/influxdb/clustered/install/set-up-cluster/configure-cluster/directly.md
+++ b/content/influxdb/clustered/install/set-up-cluster/configure-cluster/directly.md
@ -348,8 +348,6 @@ connect your cluster to your prerequisites.
 - [Configure the object store](#configure-the-object-store)
 - [Configure the catalog database](#configure-the-catalog-database)
 - [Configure local storage for ingesters](#configure-local-storage-for-ingesters)
- [Configure your OAuth2 provider](#configure-your-oauth2-provider)
- [Configure the size of your cluster](#configure-the-size-of-your-cluster)

 #### Configure ingress

--- a/content/influxdb/clustered/reference/client-libraries/v3/python.md
+++ b/content/influxdb/clustered/reference/client-libraries/v3/python.md
@ -14,7 +14,7 @@ aliases:
  - /influxdb/clustered/reference/client-libraries/v3/pyinflux3/
 related:
  - /influxdb/clustered/query-data/execute-queries/troubleshoot/
-list_code_example: >
+list_code_example: |
  <!-- Import for tests and hide from users.
  ```python
  import os