finalize clustered install reorg for preview

staging/commandbar-clustered-install
Scott Anderson 2024-10-04 11:18:14 -06:00
parent b625b59b25
commit 598ce37fb4
12 changed files with 383 additions and 19 deletions


@ -22,6 +22,8 @@ related:
This phase of the installation process customizes the scale and configuration of
your InfluxDB cluster to meet the needs of your specific workload.
## Phase 2 process
{{< children type="ordered-list" >}}
{{< page-nav prev="/influxdb/clustered/install/set-up-cluster/test-cluster/" prevText="Test your cluster" next="/influxdb/clustered/install/customize-cluster/scale/" nextText="Customize cluster scale" >}}


@ -1,7 +1,8 @@
---
title: Optimize your InfluxDB cluster
description: >
....
Test your cluster with a production-like workload and optimize your cluster
for your workload.
menu:
influxdb_clustered:
name: Optimize your cluster
@ -16,10 +17,31 @@ metadata:
- Phase 3
---
- Simulate a production-like workload
- Define your schema
- Define your query patterns
- Optimize for your workload:
- Querying by specific tag values? Partition by those tags.
- Is your schema wide? SELECT specific columns in queries rather than wildcards.
The goal of this phase of the installation process is to simulate a
production-like workload against your InfluxDB cluster and make changes to
optimize your cluster for your workload.
{{% note %}}
Depending on your requirements, this phase is likely to take the longest of all
the installation phases.
{{% /note %}}
## Identify performance requirements {note="Recommended"}
Before beginning this process, we recommend identifying performance requirements
and goals--for example:
- Writes per second
- Query concurrency
- Query response time
- etc.
This gives you specific metrics to test against and adjust toward.
Consult with [InfluxData support](https://support.influxdata.com) as you make
changes to meet these requirements and goals.
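One way to keep these goals testable is to record them as explicit targets and
compare each test run against them. The following is a minimal Python sketch of
that idea. The `PerformanceTargets` class, its field names, and all numbers are
hypothetical placeholders, not InfluxDB settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PerformanceTargets:
    """Hypothetical targets; replace the values with your own requirements."""

    min_writes_per_second: float
    min_query_concurrency: int
    max_query_response_ms: float


def unmet_targets(targets, measured):
    """Return the names of the targets that a test run did not meet.

    `measured` is a dict with the keys "writes_per_second",
    "query_concurrency", and "query_response_ms".
    """
    unmet = []
    if measured["writes_per_second"] < targets.min_writes_per_second:
        unmet.append("writes_per_second")
    if measured["query_concurrency"] < targets.min_query_concurrency:
        unmet.append("query_concurrency")
    if measured["query_response_ms"] > targets.max_query_response_ms:
        unmet.append("query_response_ms")
    return unmet


# Example values only; they do not describe any real cluster.
targets = PerformanceTargets(
    min_writes_per_second=50_000,
    min_query_concurrency=20,
    max_query_response_ms=500,
)
run = {"writes_per_second": 62_000, "query_concurrency": 20, "query_response_ms": 740}
print(unmet_targets(targets, run))
```

Anything the function returns is a candidate for another round of scaling,
configuration, or query changes.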
## Phase 3 process
{{< children type="ordered-list" >}}
{{< page-nav prev="/influxdb/clustered/install/customize-cluster/config/" prevText="Customize cluster configuration" next="/influxdb/clustered/install/optimize-cluster/design-schema/" nextText="Design your schema" >}}


@ -0,0 +1,88 @@
---
title: Design your schema
description: >
Use schema design guidelines to improve write and query performance in your
InfluxDB cluster.
menu:
influxdb_clustered:
name: Design your schema
parent: Optimize your cluster
weight: 201
related:
- /influxdb/clustered/write-data/best-practices/schema-design/
---
Schema design can have a significant impact on both write and query performance
in your InfluxDB cluster. The items below cover high-level considerations and
recommendations. For detailed recommendations, see
[Schema design recommendations](/influxdb/clustered/write-data/best-practices/schema-design/).
## Understand the difference between tags and fields
In the [InfluxDB data structure](/influxdb/clustered/write-data/best-practices/schema-design/#influxdb-data-structure),
there are three main "categories" of information--timestamps, tags, and fields.
Understanding the difference between what should be a tag and what should be a
field is important when designing your schema.
Use the following guidelines to determine what should be tags versus fields:
- Use tags to store metadata that provides information about the source or
context of the data.
- Use fields to store measured values.
- Field values typically change over time. Tag values do not.
- Tag values can only be strings.
- Field values can be any of the following data types:
- Integer
- Unsigned integer
- Float
- String
- Boolean
For more information, see [Tags versus fields](/influxdb/clustered/write-data/best-practices/schema-design/#tags-versus-fields).
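To make the tag and field guidelines concrete, the following Python sketch
builds one line of line protocol from a set of string tags and typed fields.
It is a simplified illustration, not a replacement for a client library: the
helper names and the `home` table are hypothetical, and unsigned integers
(suffix `u`) are left out because Python has no separate unsigned type.

```python
def _escape(value, chars):
    """Backslash-escape each character in `chars` that appears in `value`."""
    for ch in chars:
        value = value.replace(ch, "\\" + ch)
    return value


def format_field_value(value):
    """Format a Python value as a line protocol field value."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # integers carry an `i` suffix
    if isinstance(value, float):
        return repr(value)
    if isinstance(value, str):
        return '"' + _escape(value, '\\"') + '"'
    raise TypeError(f"unsupported field type: {type(value).__name__}")


def to_line_protocol(table, tags, fields, timestamp_ns):
    """Build one line: tags are metadata strings, fields are measured values."""
    if not fields:
        raise ValueError("a point needs at least one field")
    tag_part = "".join(
        f",{_escape(key, ', =')}={_escape(value, ', =')}"
        for key, value in sorted(tags.items())  # tags sorted by key
    )
    field_part = ",".join(
        f"{_escape(key, ', =')}={format_field_value(value)}"
        for key, value in fields.items()
    )
    return f"{_escape(table, ', ')}{tag_part} {field_part} {timestamp_ns}"


line = to_line_protocol(
    "home",
    tags={"room": "Living Room"},
    fields={"temp": 21.5, "co": 3, "occupied": True, "status": "ok"},
    timestamp_ns=1728000000000000000,
)
print(line)
```

The room name describes where the data comes from, so it is a tag. The
temperature, CO level, occupancy, and status are measured values, so they are
fields, each written with its own data type.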
## Schema restrictions
InfluxDB enforces the following schema restrictions:
- You cannot use the same name for a tag and a field in the same table.
- There is a limit to the number of columns you can store in a table.
By default, tables can have up to 250 columns.
For more information, see [InfluxDB schema restrictions](/influxdb/clustered/write-data/best-practices/schema-design/#schema-restrictions).
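You can check a planned schema against these restrictions before writing any
data. The following Python sketch is one hedged way to do that. The
`schema_problems` function is hypothetical, and counting the `time` column
toward the limit is a conservative assumption of this sketch.

```python
MAX_COLUMNS = 250  # default per-table column limit described above


def schema_problems(tag_keys, field_keys, max_columns=MAX_COLUMNS):
    """Return a list of human-readable problems found in a planned table schema."""
    problems = []

    # A name cannot be used for both a tag and a field in the same table.
    shared = sorted(set(tag_keys) & set(field_keys))
    if shared:
        problems.append(f"names used as both tag and field: {', '.join(shared)}")

    # Assumption: count the time column along with tags and fields.
    column_count = len(set(tag_keys) | set(field_keys)) + 1
    if column_count > max_columns:
        problems.append(f"{column_count} columns exceeds the limit of {max_columns}")

    return problems


print(schema_problems(["room", "status"], ["temp", "status"]))
```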
## Design for performance
The following guidelines help to ensure write and query performance:
{{% caption %}}
Follow the links below for more detailed information.
{{% /caption %}}
- [Avoid wide schemas](/influxdb/clustered/write-data/best-practices/schema-design/#avoid-wide-schemas):
A wide schema is one with a large number of columns (tags and fields).
- [Avoid sparse schemas](/influxdb/clustered/write-data/best-practices/schema-design/#avoid-sparse-schemas):
A sparse schema is one where, for many rows, columns contain null values.
- [Keep table schemas homogenous](/influxdb/clustered/write-data/best-practices/schema-design/#table-schemas-should-be-homogenous):
A homogenous table schema is one where every row has values for all tags and fields.
- [Use the best data type for your data](/influxdb/clustered/write-data/best-practices/schema-design/#use-the-best-data-type-for-your-data):
Write integers as integers, decimals as floats, and booleans as booleans.
Queries against a field that stores integers outperform queries against string data.
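To get a rough sense of how sparse a planned schema is, you can measure how
often each column is empty in a sample of your source data. The following
Python sketch assumes the sample is a list of dictionaries. The function name
and the sample rows are hypothetical.

```python
def null_ratio_by_column(rows, columns):
    """Return, for each column, the fraction of rows where it is missing or None."""
    if not rows:
        return {column: 0.0 for column in columns}
    return {
        column: sum(1 for row in rows if row.get(column) is None) / len(rows)
        for column in columns
    }


# Hypothetical sample rows destined for a single table.
rows = [
    {"room": "Kitchen", "temp": 22.1, "hum": 36.0},
    {"room": "Kitchen", "temp": 22.4},
    {"room": "Office", "temp": 21.0, "co": 2},
    {"room": "Office", "temp": 21.3},
]
ratios = null_ratio_by_column(rows, ["room", "temp", "hum", "co"])
print(ratios)
```

Columns with a high ratio are mostly null and may be candidates for a
separate table.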
## Design for query simplicity
The following guidelines help to ensure that, when querying data, the schema
makes it easy to write queries:
{{% caption %}}
Follow the links below for more detailed information.
{{% /caption %}}
- [Keep table names, tags, and fields simple](/influxdb/clustered/write-data/best-practices/schema-design/#keep-table-names-tags-and-fields-simple):
Use one tag or one field for each data attribute.
If your source data contains multiple data attributes in a single parameter,
split each attribute into its own tag or field.
- [Avoid keywords and special characters](/influxdb/clustered/write-data/best-practices/schema-design/#avoid-keywords-and-special-characters):
Reserved keywords or special characters in table names, tag keys, and field
keys make writing queries more complex.
{{< page-nav prev="/influxdb/clustered/install/optimize-cluster/" prevText="Optimize your cluster" next="/influxdb/clustered/install/optimize-cluster/write-methods/" nextText="Identify write methods" >}}


@ -0,0 +1,94 @@
---
title: Optimize querying
seotitle: Optimize querying in your InfluxDB cluster
description: >
Define your typical query patterns and employ optimizations to ensure query
performance.
menu:
influxdb_clustered:
name: Optimize querying
parent: Optimize your cluster
weight: 204
related:
- /influxdb/clustered/query-data/troubleshoot-and-optimize/optimize-queries/
- /influxdb/clustered/admin/custom-partitions/
- /influxdb/clustered/query-data/troubleshoot-and-optimize/troubleshoot/
- /influxdb/clustered/query-data/troubleshoot-and-optimize/analyze-query-plan/
- /influxdb/clustered/query-data/troubleshoot-and-optimize/report-query-performance-issues/
---
With data written to your cluster, you can now begin to define and test your
typical query patterns and employ optimizations to ensure query performance.
## Define your query patterns
Understanding your typical query patterns helps you prioritize which
optimizations to make so that query performance meets your requirements.
For example, consider the following questions:
- **Do you typically query data by specific tag values?**
[Apply custom partitioning](/influxdb/clustered/admin/custom-partitions/) to
your target database or table to partition by those tags. Partitioning by
commonly-queried tags helps InfluxDB to quickly identify where the relevant
data is in storage and improves query performance.
- **Do you query tables with [wide schemas](/influxdb/clustered/write-data/best-practices/schema-design/#avoid-wide-schemas)?**
Avoid using wildcards (`*`) in your `SELECT` statement. Select specific
columns you want returned in your query results. The more columns queried, the
less performant the query.
- **Do you query large, historical time ranges?**
Use [time-based aggregation methods to downsample your data](/influxdb/clustered/query-data/sql/aggregate-select/#downsample-data-by-applying-interval-based-aggregates) and return aggregate
values per interval of time instead of all the data.
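The following Python sketch shows what two of these patterns can look like as
SQL query strings: selecting named columns instead of `*`, and downsampling
with `DATE_BIN`. The helper functions and the `home`, `room`, and `temp` names
are hypothetical. The helpers interpolate strings directly, so use them only
with trusted identifiers.

```python
def select_columns(table, columns, where=None):
    """Build a SELECT that names each column instead of using a wildcard."""
    if not columns:
        raise ValueError("name at least one column")
    query = f"SELECT {', '.join(columns)} FROM {table}"
    return f"{query} WHERE {where}" if where else query


def downsample(table, tag, field, interval, time_range):
    """Build a query that returns the average of `field` per interval and tag value."""
    return (
        f"SELECT DATE_BIN(INTERVAL '{interval}', time) AS time, {tag}, "
        f"AVG({field}) AS avg_{field} "
        f"FROM {table} "
        f"WHERE time >= now() - INTERVAL '{time_range}' "
        f"GROUP BY 1, {tag} ORDER BY 1"
    )


narrow_query = select_columns("home", ["time", "room", "temp"], "room = 'Kitchen'")
hourly_query = downsample("home", "room", "temp", "1 hour", "30 days")
print(narrow_query)
print(hourly_query)
```

The first query reads three columns and filters on a tag value. The second
returns one averaged row per hour and room instead of every raw point in the
30-day range.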
## Decide on your query language
InfluxDB Clustered supports both [SQL](/influxdb/clustered/reference/sql/) and
[InfluxQL](/influxdb/clustered/reference/influxql/)--a SQL-like query language
designed for InfluxDB v1 and specifically for querying time series data.
### SQL
The InfluxDB SQL implementation is a full-featured SQL query engine powered by
[Apache DataFusion](https://datafusion.apache.org/). It benefits from a robust
upstream community that is constantly improving the functionality and performance
of the engine. Some time series-specific queries (such as time-based aggregates)
are more verbose in SQL than in InfluxQL, but they are still possible.
### InfluxQL
InfluxQL is designed specifically for time series data and simplifies many
time series-related operations like aggregating based on time, technical
analysis, and forecasting. It isn't as full-featured as SQL and requires some
understanding of the InfluxDB v1 data model.
## Optimize your queries
View the [query optimization and troubleshooting documentation](/influxdb/clustered/query-data/troubleshoot-and-optimize/optimize-queries/)
for guidance and information on how to troubleshoot and optimize queries that do
not perform as expected.
### Analyze queries
Both SQL and InfluxQL support the `EXPLAIN` and `EXPLAIN ANALYZE` statements
that return detailed information about your query's planning and execution.
This can provide insight into possible optimizations you can make for a specific
query. For more information, see
[Analyze a query plan](/influxdb/clustered/query-data/troubleshoot-and-optimize/analyze-query-plan/).
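As a small illustration, the following Python sketch wraps an existing query
string in either statement. The `explain` helper and the sample query are
hypothetical. Keep in mind that `EXPLAIN ANALYZE` executes the query in order
to report runtime metrics.

```python
def explain(query, analyze=False):
    """Prefix a query with EXPLAIN, or EXPLAIN ANALYZE when `analyze` is True."""
    prefix = "EXPLAIN ANALYZE" if analyze else "EXPLAIN"
    # Drop surrounding whitespace and a trailing semicolon before prefixing.
    return f"{prefix} {query.strip().rstrip(';')}"


plan_query = explain("SELECT time, room, temp FROM home WHERE room = 'Kitchen';")
analyze_query = explain("SELECT time, room, temp FROM home", analyze=True)
print(plan_query)
print(analyze_query)
```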
## Custom-partition data
InfluxDB Clustered lets you define how data is partitioned in storage to help
ensure queries are performant. Use
[custom partitioning](/influxdb/clustered/admin/custom-partitions/) to structure
your data so that InfluxDB can more easily identify where the data you typically
query is in storage. For more information, see
[Manage data partitioning](/influxdb/clustered/admin/custom-partitions/).
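The following Python sketch is a conceptual model of why partitioning by a
commonly queried tag helps. It groups rows by a tag value and a day, then shows
that a filter on that tag only needs some of the groups. It does not reproduce
how InfluxDB stores data: the key format, the function names, and the sample
rows are illustrative assumptions.

```python
from collections import defaultdict
from datetime import datetime, timezone


def partition_key(row, tag, time_format="%Y-%m-%d"):
    """Build an illustrative key from a tag value and the formatted row time."""
    day = datetime.fromtimestamp(row["time"], tz=timezone.utc).strftime(time_format)
    return f"{row[tag]}|{day}"


def partition_rows(rows, tag, time_format="%Y-%m-%d"):
    """Group rows into partitions keyed by tag value and time."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[partition_key(row, tag, time_format)].append(row)
    return dict(partitions)


# Hypothetical rows; `time` is a Unix timestamp in seconds.
rows = [
    {"time": 1728000000, "room": "Kitchen", "temp": 22.1},
    {"time": 1728003600, "room": "Kitchen", "temp": 22.4},
    {"time": 1728000000, "room": "Office", "temp": 21.0},
    {"time": 1728086400, "room": "Kitchen", "temp": 22.8},
]
partitions = partition_rows(rows, tag="room")

# A query filtered to room = 'Office' only needs the matching partitions.
office_keys = [key for key in partitions if key.startswith("Office|")]
print(sorted(partitions), office_keys)
```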
## Report query performance issues
If you have a query that isn't meeting your performance requirements despite
implementing query optimizations, please follow the process described in
[Report query performance issues](/influxdb/clustered/query-data/troubleshoot-and-optimize/report-query-performance-issues/)
to gather information for InfluxData engineers so they can help identify any
potential solutions.
{{< page-nav prev="/influxdb/clustered/install/optimize-cluster/simulate-load/" prevText="Simulate load" next="/influxdb/clustered/install/secure-cluster/" nextText="Phase 4: Secure your cluster" >}}


@ -0,0 +1,36 @@
---
title: Simulate a production-like load
description: >
Simulate a production-like load that writes data to your InfluxDB cluster.
menu:
influxdb_clustered:
name: Simulate load
parent: Optimize your cluster
weight: 203
---
With your schema defined, you can begin to simulate a production-like load that
writes data to your InfluxDB cluster. This process helps to ensure that your
schema works as designed and that both your cluster's scale and configuration
are able to meet your cluster's write requirements.
{{% warn %}}
We do not recommend writing production data to your InfluxDB cluster at this point.
{{% /warn %}}
## Load testing tools
Contact your [InfluxData sales representative](https://influxdata.com/contact-sales)
for information about tools that you can use to load test your InfluxDB cluster.
There are tools available that can simulate your schema and desired write
concurrency to ensure your cluster performs under production-like load.
<!-- TO-DO: Would love to be able to list available tools here -->
## Use your own tools
You can also build and use your own tools to load test a production-like workload.
Use Telegraf, client libraries, or the InfluxDB API to build out tests that
simulate writes to your cluster.
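As a starting point for your own tooling, the following Python sketch generates
synthetic line protocol in batches and measures how many lines per second a
write function accepts. Everything in it is a hypothetical placeholder: the
function names, the `home` table, and the `send` callable, which stands in for
whatever client or HTTP call you use. Tag values are assumed to contain no
spaces, commas, or equals signs.

```python
import random
import time


def synthetic_batches(table, tag_key, tag_values, field_key, batch_size, batch_count, seed=0):
    """Yield batches of line protocol with random float field values."""
    rng = random.Random(seed)  # seeded so that test runs are repeatable
    timestamp_ns = time.time_ns()
    for _ in range(batch_count):
        lines = []
        for _ in range(batch_size):
            tag_value = rng.choice(tag_values)
            value = round(rng.uniform(0, 100), 2)
            lines.append(f"{table},{tag_key}={tag_value} {field_key}={value} {timestamp_ns}")
            timestamp_ns += 1  # keep every point unique
        yield "\n".join(lines)


def run_load(batches, send):
    """Send each batch with `send` and return (lines_written, lines_per_second)."""
    lines_written = 0
    started = time.perf_counter()
    for batch in batches:
        send(batch)
        lines_written += batch.count("\n") + 1
    elapsed = time.perf_counter() - started
    return lines_written, (lines_written / elapsed if elapsed > 0 else float("inf"))


# Collect batches in memory instead of writing to a cluster.
sent = []
written, rate = run_load(
    synthetic_batches("home", "room", ["Kitchen", "Office"], "temp", batch_size=100, batch_count=5),
    send=sent.append,
)
print(written, len(sent))
```

Replace `sent.append` with a function that writes to your cluster, and run
several copies in parallel to approximate your desired write concurrency.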
{{< page-nav prev="/influxdb/clustered/install/optimize-cluster/write-methods/" prevText="Identify write methods" next="/influxdb/clustered/install/optimize-cluster/optimize-querying/" nextText="Optimize querying" >}}


@ -0,0 +1,125 @@
---
title: Identify write methods
seotitle: Identify methods for writing to your InfluxDB cluster
description:
Identify the most appropriate and useful tools and methods for writing data to
your InfluxDB cluster.
menu:
influxdb_clustered:
name: Identify write methods
parent: Optimize your cluster
weight: 202
related:
- /telegraf/v1/
- /telegraf/v1/plugins/
- /influxdb/clustered/write-data/use-telegraf/configure/
- /influxdb/clustered/reference/client-libraries/
- /influxdb/clustered/write-data/best-practices/optimize-writes/
---
Many different tools are available for writing data into your InfluxDB cluster.
Based on your use case, you should identify the most appropriate tools and
methods to use. Below is a summary of some of the tools that are available
(this list is not exhaustive).
## Telegraf
[Telegraf](/telegraf/v1/) is a data collection agent that collects data from
various sources, parses the data into
[line protocol](/influxdb/clustered/reference/syntax/line-protocol/), and then
writes the data to InfluxDB.
Telegraf is plugin-based and provides hundreds of
[plugins that collect, aggregate, process, and write data](/telegraf/v1/plugins/).
If you need to collect data from well-established systems and technologies,
Telegraf likely already supports a plugin for collecting that data.
Some of the most common use cases are:
- Monitoring system metrics (memory, CPU, disk usage, etc.)
- Monitoring Docker containers
- Monitoring network devices via SNMP
- Collecting data from a Kafka queue
- Collecting data from an MQTT broker
- Collecting data from HTTP endpoints
- Scraping data from a Prometheus exporter
- Parsing logs
For more information about using Telegraf with InfluxDB Clustered, see
[Use Telegraf to write data to InfluxDB Clustered](/influxdb/clustered/write-data/use-telegraf/configure/).
## InfluxDB client libraries
[InfluxDB client libraries](/influxdb/clustered/reference/client-libraries/) are
language-specific packages that integrate with InfluxDB APIs. They simplify
integrating InfluxDB with your own custom application and standardize
interactions between your application and your InfluxDB cluster.
With client libraries, you can collect and write whatever time series data is
useful for your application.
InfluxDB Clustered includes backwards-compatible write APIs, so if you are
currently using an InfluxDB v1 or v2 client library, you can continue to use the
same client library to write data to your cluster.
{{< expand-wrapper >}}
{{% expand "View available InfluxDB client libraries" %}}
<!-- TO-DO: Somehow automate this list -->
- [InfluxDB v3 client libraries](/influxdb/clustered/reference/client-libraries/v3/)
- [C# .NET](/influxdb/clustered/reference/client-libraries/v3/csharp/)
- [Go](/influxdb/clustered/reference/client-libraries/v3/go/)
- [Java](/influxdb/clustered/reference/client-libraries/v3/java/)
- [JavaScript](/influxdb/clustered/reference/client-libraries/v3/javascript/)
- [Python](/influxdb/clustered/reference/client-libraries/v3/python/)
- [InfluxDB v2 client libraries](/influxdb/clustered/reference/client-libraries/v2/)
- [Arduino](/influxdb/clustered/reference/client-libraries/v2/arduino/)
- [C#](/influxdb/clustered/reference/client-libraries/v2/csharp/)
- [Dart](/influxdb/clustered/reference/client-libraries/v2/dart/)
- [Go](/influxdb/clustered/reference/client-libraries/v2/go/)
- [Java](/influxdb/clustered/reference/client-libraries/v2/java/)
- [JavaScript](/influxdb/clustered/reference/client-libraries/v2/javascript/)
- [Kotlin](/influxdb/clustered/reference/client-libraries/v2/kotlin/)
- [PHP](/influxdb/clustered/reference/client-libraries/v2/php/)
- [Python](/influxdb/clustered/reference/client-libraries/v2/python/)
- [R](/influxdb/clustered/reference/client-libraries/v2/r/)
- [Ruby](/influxdb/clustered/reference/client-libraries/v2/ruby/)
- [Scala](/influxdb/clustered/reference/client-libraries/v2/scala/)
- [Swift](/influxdb/clustered/reference/client-libraries/v2/swift/)
- [InfluxDB v1 client libraries](/influxdb/clustered/reference/client-libraries/v1/)
{{% /expand %}}
{{< /expand-wrapper >}}
## InfluxDB HTTP write APIs
InfluxDB Clustered provides backwards-compatible HTTP write APIs for writing
data to your cluster. The [InfluxDB client libraries](#influxdb-client-libraries)
use these APIs, but if you choose not to use a client library, you can integrate
directly with the API. Because these APIs are backwards compatible, you can use
existing InfluxDB API integrations with your InfluxDB cluster.
- [InfluxDB v2 API for InfluxDB Clustered](/influxdb/clustered/api/v2/)
- [InfluxDB v1 API for InfluxDB Clustered](/influxdb/clustered/api/v1/)
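If you integrate with the API directly, a write is an HTTP `POST` with line
protocol in the request body. The following Python sketch builds such a request
for the v2-compatible write endpoint using only the standard library and does
not send it. The host, database name, and token are hypothetical placeholders.
Check the API references above for the authoritative parameters.

```python
import urllib.parse
import urllib.request


def build_write_request(host, database, token, line_protocol, precision="ns"):
    """Build (but do not send) a request for the v2-compatible write endpoint."""
    query = urllib.parse.urlencode({"bucket": database, "precision": precision})
    return urllib.request.Request(
        url=f"https://{host}/api/v2/write?{query}",
        data=line_protocol.encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Token {token}",
            "Content-Type": "text/plain; charset=utf-8",
        },
    )


# Placeholder values; replace them with your cluster host, database, and token.
request = build_write_request(
    host="cluster-host.example",
    database="mydb",
    token="DATABASE_TOKEN",
    line_protocol="home,room=Kitchen temp=22.1 1728000000000000000",
)
print(request.get_method(), request.full_url)
```

To send the request, pass it to `urllib.request.urlopen()` and handle any
`urllib.error.HTTPError` that the call raises.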
## Write optimizations
As you decide on and integrate tooling to write data to your InfluxDB cluster,
there are things you can do to ensure your write pipeline is as performant as
possible. The list below provides links to more detailed descriptions of these
optimizations in the [Optimize writes](/influxdb/clustered/write-data/best-practices/optimize-writes/)
documentation:
- [Batch writes](/influxdb/clustered/write-data/best-practices/optimize-writes/#batch-writes)
- [Sort tags by key](/influxdb/clustered/write-data/best-practices/optimize-writes/#sort-tags-by-key)
- [Use the coarsest time precision possible](/influxdb/clustered/write-data/best-practices/optimize-writes/#use-the-coarsest-time-precision-possible)
- [Use gzip compression](/influxdb/clustered/write-data/best-practices/optimize-writes/#use-gzip-compression)
- [Synchronize hosts with NTP](/influxdb/clustered/write-data/best-practices/optimize-writes/#synchronize-hosts-with-ntp)
- [Write multiple data points in one request](/influxdb/clustered/write-data/best-practices/optimize-writes/#write-multiple-data-points-in-one-request)
- [Pre-process data before writing](/influxdb/clustered/write-data/best-practices/optimize-writes/#pre-process-data-before-writing)
{{% note %}}
[Telegraf](#telegraf) and [InfluxDB client libraries](#influxdb-client-libraries)
leverage many of these optimizations by default.
{{% /note %}}
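If you build your own write pipeline, the following Python sketch shows one way
to apply two of these optimizations: batching lines and compressing each batch
with gzip. The function names are hypothetical and the batch size is an
arbitrary example value. See the linked documentation for sizing guidance.

```python
import gzip


def chunk_lines(lines, batch_size):
    """Split a list of line protocol lines into batches of at most `batch_size`."""
    for start in range(0, len(lines), batch_size):
        yield lines[start:start + batch_size]


def compress_batch(lines):
    """Join a batch into one request body and gzip it.

    Returns the compressed body and the headers that describe it.
    """
    body = gzip.compress("\n".join(lines).encode("utf-8"))
    headers = {
        "Content-Encoding": "gzip",
        "Content-Type": "text/plain; charset=utf-8",
    }
    return body, headers


# Hypothetical points for a `home` table.
lines = [f"home,room=Kitchen temp={i} {i}" for i in range(25)]
batches = list(chunk_lines(lines, batch_size=10))
body, headers = compress_batch(batches[0])
print(len(batches), headers["Content-Encoding"])
```

Each compressed body and its headers can then be sent as one write request, so
many points travel in a single, smaller payload.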
{{< page-nav prev="/influxdb/clustered/install/optimize-cluster/design-schema/" prevText="Design your schema" next="/influxdb/clustered/install/optimize-cluster/simulate-load/" nextText="Simulate load" >}}


@ -20,6 +20,8 @@ metadata:
This phase of the installation process prepares your InfluxDB cluster for
production use by enabling security options to ensure your cluster is secured.
## Phase 4 process
{{< children type="ordered-list" >}}
{{< page-nav prev="/influxdb/clustered/install/set-up-cluster/optimize-cluster/" prevText="Phase 3: Optimize your cluster" next="/influxdb/clustered/install/secure-cluster/tls/" nextText="Set up TLS" >}}
{{< page-nav prev="/influxdb/clustered/install/optimize-cluster/optimize-querying/" prevText="Optimize querying" next="/influxdb/clustered/install/secure-cluster/tls/" nextText="Set up TLS" >}}


@ -196,7 +196,7 @@ The following are important fields in the JSON object that are necessary to
connect your InfluxDB cluster and administrative tools to Keycloak:
- **jwks_uri**: Used in your InfluxDB cluster configuration file.
_See [Configure your cluster--Configure your OAuth2 provider](/influxdb/clustered/install/set-up-cluster/configure-cluster/#configure-your-oauth2-provider)_.
_See [Configure your cluster to connect to your identity provider](#configure-your-cluster-to-connect-to-your-identity-provider)_.
- **device_authorization_endpoint**: Used in your [`influxctl` configuration file](#configure-influxctl) (`profile.auth.oauth2.device_url`)
- **token_endpoint**: Used in your [`influxctl` configuration file](#configure-influxctl) (`profile.auth.oauth2.token_url`)
@ -306,7 +306,7 @@ The following are important fields in the JSON object that are necessary to
connect your InfluxDB cluster and administrative tools to Keycloak:
- **jwks_uri**: Used in your InfluxDB cluster configuration file.
_See [Configure your cluster--Configure your OAuth2 provider](/influxdb/clustered/install/set-up-cluster/configure-cluster/?t=Microsoft+Entra+ID#configure-your-oauth2-provider)_.
_See [Configure your cluster to connect to your identity provider](#configure-your-cluster-to-connect-to-your-identity-provider)_.
- **device_authorization_endpoint**: Used in your [`influxctl` configuration file](#configure-influxctl) (`profile.auth.oauth2.device_url`)
- **token_endpoint**: Used in your [`influxctl` configuration file](#configure-influxctl) (`profile.auth.oauth2.token_url`)


@ -67,7 +67,7 @@ Provide the TLS certificate secret to the InfluxDB configuration in the
### Configure ingress
Update your `AppInstance` resource to reference the secret that
[contains your TLS certificate and key](#set-up-cluster-ingress).
[contains your TLS certificate and key](#set-up-ingress-tls).
The examples below use the name `ingress-tls`.
- **If modifying the `AppInstance` resource directly**, reference the TLS secret


@ -21,11 +21,8 @@ The first phase of installing InfluxDB Clustered is to get a basic InfluxDB
cluster up and running with as few external dependencies as possible and confirm
you can write and query data.
## Phase 1 process
{{< children type="ordered-list" >}}
- Use internal admin authorization to bypass the need to integrate with an
identity provider. This is a temporary measure while setting and testing your
cluster. Before moving into production, you will
{{< page-nav next="/influxdb/clustered/install/set-up-cluster/prerequisites/" nextText="Set up prerequisites" >}}


@ -348,8 +348,6 @@ connect your cluster to your prerequisites.
- [Configure the object store](#configure-the-object-store)
- [Configure the catalog database](#configure-the-catalog-database)
- [Configure local storage for ingesters](#configure-local-storage-for-ingesters)
- [Configure your OAuth2 provider](#configure-your-oauth2-provider)
- [Configure the size of your cluster](#configure-the-size-of-your-cluster)
#### Configure ingress


@ -14,7 +14,7 @@ aliases:
- /influxdb/clustered/reference/client-libraries/v3/pyinflux3/
related:
- /influxdb/clustered/query-data/execute-queries/troubleshoot/
list_code_example: >
list_code_example: |
<!-- Import for tests and hide from users.
```python
import os