From 42efdf968bb5350a1e6b5240053db644c88b183a Mon Sep 17 00:00:00 2001 From: Jason Stirnaman Date: Fri, 20 Sep 2024 15:38:13 -0500 Subject: [PATCH 1/4] chore(clustered): Gather system information for reporting query performance issues. --- .../optimize-queries.md | 3 + .../report-query-performance-issues.md | 89 ++++++++++++++++++- 2 files changed, 88 insertions(+), 4 deletions(-) diff --git a/content/influxdb/clustered/query-data/troubleshoot-and-optimize/optimize-queries.md b/content/influxdb/clustered/query-data/troubleshoot-and-optimize/optimize-queries.md index b7751578e..69a66af6f 100644 --- a/content/influxdb/clustered/query-data/troubleshoot-and-optimize/optimize-queries.md +++ b/content/influxdb/clustered/query-data/troubleshoot-and-optimize/optimize-queries.md @@ -91,3 +91,6 @@ less efficient. Learn how to [analyze a query plan](/influxdb/clustered/query-data/troubleshoot-and-optimize/analyze-query-plan/) to troubleshoot queries and find performance bottlenecks. + +If you need help troubleshooting, follow the guidelines to +[report query performance issues](/influxdb/clustered/query-data/troubleshoot-and-optimize/report-query-performance-issues/). diff --git a/content/influxdb/clustered/query-data/troubleshoot-and-optimize/report-query-performance-issues.md b/content/influxdb/clustered/query-data/troubleshoot-and-optimize/report-query-performance-issues.md index 0bd923a5b..e33a7aaa7 100644 --- a/content/influxdb/clustered/query-data/troubleshoot-and-optimize/report-query-performance-issues.md +++ b/content/influxdb/clustered/query-data/troubleshoot-and-optimize/report-query-performance-issues.md @@ -7,6 +7,8 @@ menu: name: Report query performance issues parent: Troubleshoot and optimize queries weight: 402 +related: + - /influxdb/clustered/admin/query-system-data/ --- These guidelines are intended to faciliate collaboration between InfluxData @@ -23,13 +25,17 @@ queries](/influxdb/clustered/query-data//troubleshoot-and-optimize). 6. 
[Reduce query noise](#reduce-query-noise) 7. [Establish baseline single-query performance](#establish-baseline-single-query-performance) 8. [Run queries at multiple load scales](#run-queries-at-multiple-load-scales) -9. [Gather debug info](#gather-debug-info) +9. [Gather debug information](#gather-debug-information) 1. [Kubernetes-specific information](#kubernetes-specific-information) 2. [Clustered-specific information](#clustered-specific-information) 3. [Query analysis](#query-analysis) 1. [EXPLAIN](#explain) 2. [EXPLAIN VERBOSE](#explain-verbose) 3. [EXPLAIN ANALYZE](#explain-analyze) +10. [Gather system information](#gather-system-information) + - [Collect table information](#collect-table-information) + - [Collect compaction information for the table](#collect-compaction-information-for-the-table) + - [Collect partition information for multiple tables](#collect-partition-information-for-multiple-tables) {{% note %}} Please note that this document may change from one support engagement to the @@ -141,14 +147,21 @@ As an example, consider the following test plan outline: 4. Run 10 concurrent instances of Query A and allow the cluster to recover for 1 minute. 5. Run 20 concurrent instances of Query A and allow the cluster to recover for 1 minute. 6. Run 40 concurrent instances of Query A and allow the cluster to recover for 1 minute. -7. Provide InfluxData the debug information [described below](#gather-debug-info). +7. Provide InfluxData the debug information [described below](#gather-debug-information). {{% note %}} This is just an example. You don't have to go beyond the scale where queries get slower but you may also need to go further than what's outlined here. {{% /note %}} -### Gather debug info +### Capture dashboard screens + +If you have set up alerts and dashboards for monitoring your cluster, capture +screenshots of dashboard events for Queriers, Compactors, and Ingesters. 
+ +See [system query examples](/influxdb/clustered/admin/query-system-data/#system-query-examples). + +### Gather debug information The following debug information should be collected shortly _after_ a problematic query has been tried against your InfluxDB cluster. @@ -165,7 +178,7 @@ kubectl cluster-info dump --namespace influxdb --output-directory "${DATETIME}-c tar -czf "${DATETIME}-cluster-info.tar.gz" "${DATETIME}-cluster-info/" ``` -#### Clustered-Specific Info +#### Clustered-specific information **Outputs:** @@ -310,3 +323,71 @@ curl --get "https://{{< influxdb/host >}}/query" \ {{< /code-tabs-wrapper >}} {{% /code-placeholders %}} + +### Gather system information + +{{% warn %}} +#### May impact cluster performance + +Querying InfluxDB v3 system tables may impact write and query +performance of your {{< product-name omit=" Clustered" >}} cluster. +Use filters to [optimize queries to reduce impact to your cluster](/influxdb/clustered/admin/query-system-data/#optimize-queries-to-reduce-impact-to-your-cluster). + + + +#### System tables are subject to change + +System tables are not part of InfluxDB's stable API and may change with new releases. +The provided schema information and query examples are valid as of **September 20, 2024**. +If you detect a schema change or a non-functioning query example, please +[submit an issue](https://github.com/influxdata/docs-v2/issues/new/choose). + + +{{% /warn %}} + +If queries are slow for a specific table, run the following system queries to collect information for troubleshooting. 
+ +- [Collect table information](#collect-table-information) +- [Collect compaction information for the table](#collect-compaction-information-for-the-table) +- [Collect partition information for multiple tables](#collect-partition-information-for-multiple-tables) + +In your queries, replace the following: + +- {{% code-placeholder-key %}}`TABLE_NAME`{{% /code-placeholder-key %}}: the + table to retrieve information about + +#### Collect table information + +{{% code-placeholders "TABLE_NAME" %}} +```sql +SELECT * +FROM system.tables +WHERE table_name = 'TABLE_NAME'; +``` +{{% /code-placeholders%}} + +#### Collect compaction information for the table + +{{% code-placeholders "TABLE_NAME" %}} +```sql +SELECT * +FROM system.compactor +WHERE table_name = 'TABLE_NAME'; +``` +{{% /code-placeholders%}} + +#### Collect partition information for multiple tables + +If the same queries are slow on more than 1 table, also run the following query to collect the size and +number of partitions for all tables: + +{{% code-placeholders "TABLE_NAME" %}} +```sql +SELECT table_name, + COUNT(*) as partition_count, + MAX(last_new_file_created_at) as last_new_file_created_at, + SUM(total_size_mb) as total_size_mb +FROM system.partitions +GROUP BY table_name; +``` +{{% /code-placeholders%}} From 40533f3ee0cacb5aac903dad17a47504dcd195c0 Mon Sep 17 00:00:00 2001 From: Jason Stirnaman Date: Mon, 23 Sep 2024 13:42:00 -0500 Subject: [PATCH 2/4] fix(clustered): filter system queries. 
--- .../clustered/admin/query-system-data.md | 10 +++++ .../report-query-performance-issues.md | 38 +++++++++++++++---- 2 files changed, 41 insertions(+), 7 deletions(-) diff --git a/content/influxdb/clustered/admin/query-system-data.md b/content/influxdb/clustered/admin/query-system-data.md index 3f4e4a18c..e3841e600 100644 --- a/content/influxdb/clustered/admin/query-system-data.md +++ b/content/influxdb/clustered/admin/query-system-data.md @@ -182,6 +182,16 @@ WHERE ``` {{% /code-placeholders %}} +{{% code-placeholders "TABLE_NAME|PARTITION_KEY" %}} +```sql +SELECT * +FROM system.compactor +WHERE + table_name = 'TABLE_NAME' + AND partition_key = 'PARTITION_KEY'; +``` +{{% /code-placeholders %}} + ##### Filter by partition ID When querying the `system.partitions` or `system.compactor` table, use the `WHERE` clause to diff --git a/content/influxdb/clustered/query-data/troubleshoot-and-optimize/report-query-performance-issues.md b/content/influxdb/clustered/query-data/troubleshoot-and-optimize/report-query-performance-issues.md index e33a7aaa7..a3bf8dd30 100644 --- a/content/influxdb/clustered/query-data/troubleshoot-and-optimize/report-query-performance-issues.md +++ b/content/influxdb/clustered/query-data/troubleshoot-and-optimize/report-query-performance-issues.md @@ -351,10 +351,15 @@ If queries are slow for a specific table, run the following system queries to co - [Collect compaction information for the table](#collect-compaction-information-for-the-table) - [Collect partition information for multiple tables](#collect-partition-information-for-multiple-tables) +To [optimize system queries](/influxdb/clustered/admin/query-system-data/#optimize-queries-to-reduce-impact-to-your-cluster), use `table_name`, `partition_key`, and +`partition_id` filters. 
In your queries, replace the following: -- {{% code-placeholder-key %}}`TABLE_NAME`{{% /code-placeholder-key %}}: the - table to retrieve information about +- {{% code-placeholder-key %}}`TABLE_NAME`{{% /code-placeholder-key %}}: the table to retrieve partitions for +- {{% code-placeholder-key %}}`PARTITION_ID`{{% /code-placeholder-key %}}: a [partition ID](/influxdb/clustered/admin/query-system-data/#retrieve-a-partition-id) (int64) +- {{% code-placeholder-key %}}`PARTITION_KEY`{{% /code-placeholder-key %}}: a [partition key](/influxdb/clustered/admin/custom-partitions/#partition-keys) + derived from the table's partition template. + The default format is `%Y-%m-%d` (for example, `2024-01-01`). #### Collect table information @@ -368,13 +373,32 @@ WHERE table_name = 'TABLE_NAME'; #### Collect compaction information for the table -{{% code-placeholders "TABLE_NAME" %}} +Query the `system.compactor` table to collect compaction information--for example, run one of the following +queries: + +{{% code-placeholders "TABLE_NAME|PARTITION_KEY" %}} + ```sql -SELECT * -FROM system.compactor -WHERE table_name = 'TABLE_NAME'; +SELECT * +FROM system.compactor +WHERE + table_name = 'TABLE_NAME' + AND partition_key = 'PARTITION_KEY'; ``` -{{% /code-placeholders%}} + +{{% /code-placeholders %}} + +{{% code-placeholders "TABLE_NAME|PARTITION_ID" %}} + +```sql +SELECT * +FROM system.compactor +WHERE + table_name = 'TABLE_NAME' + AND partition_id = 'PARTITION_ID'; +``` + +{{% /code-placeholders %}} #### Collect partition information for multiple tables From f0e67a38ddf927e20d7fa1a7eb016aaf6a8d2f9b Mon Sep 17 00:00:00 2001 From: Jason Stirnaman Date: Mon, 23 Sep 2024 15:24:17 -0500 Subject: [PATCH 3/4] fix(clustered): Apply suggestion - don't mention dashboards. 
--- .../report-query-performance-issues.md | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/content/influxdb/clustered/query-data/troubleshoot-and-optimize/report-query-performance-issues.md b/content/influxdb/clustered/query-data/troubleshoot-and-optimize/report-query-performance-issues.md index a3bf8dd30..3f5ad0271 100644 --- a/content/influxdb/clustered/query-data/troubleshoot-and-optimize/report-query-performance-issues.md +++ b/content/influxdb/clustered/query-data/troubleshoot-and-optimize/report-query-performance-issues.md @@ -154,12 +154,14 @@ This is just an example. You don't have to go beyond the scale where queries get but you may also need to go further than what's outlined here. {{% /note %}} + ### Gather debug information @@ -412,6 +414,7 @@ SELECT table_name, MAX(last_new_file_created_at) as last_new_file_created_at, SUM(total_size_mb) as total_size_mb FROM system.partitions +WHERE table_name IN ('foo', 'bar', 'baz') GROUP BY table_name; ``` {{% /code-placeholders%}} From a1865368fa62a85a8259db09d8af82da254668a8 Mon Sep 17 00:00:00 2001 From: wayne Date: Tue, 24 Sep 2024 13:12:51 -0600 Subject: [PATCH 4/4] fix(clustered): update admin token revocation steps (#5612) --- .../clustered/admin/bypass-identity-provider.md | 15 +++++++-------- 1 file changed, 7 insertions(+), 8 deletions(-) diff --git a/content/influxdb/clustered/admin/bypass-identity-provider.md b/content/influxdb/clustered/admin/bypass-identity-provider.md index 7d5c7ec3a..c9a3e6caa 100644 --- a/content/influxdb/clustered/admin/bypass-identity-provider.md +++ b/content/influxdb/clustered/admin/bypass-identity-provider.md @@ -63,13 +63,13 @@ The only way to revoke the token is to do the following: {{% code-placeholders "INFLUXDB_NAMESPACE|KEY_GEN_JOB|001" %}} -1. Delete the `rsa-keys` secret from your InfluxDB cluster's context and namespace: +1. 
Delete the `rsa-keys` and `admin-token` secrets from your InfluxDB cluster's context and namespace: ```sh - kubectl delete secrets/rsa-keys --namespace INFLUXDB_NAMESPACE + kubectl delete secret rsa-keys admin-token --namespace INFLUXDB_NAMESPACE ``` -2. Rerun the `key-gen` job: +2. Rerun the `key-gen` and `create-admin-token` jobs: 1. List the jobs in your InfluxDB namespace to find the key-gen job pod: @@ -78,12 +78,11 @@ The only way to revoke the token is to do the following: kubectl get jobs --namespace INFLUXDB_NAMESPACE ``` - 2. Run the key-gen job and increment the job number as needed: - + 2. Delete the key-gen and create-admin-token jobs so they will be re-created by kubit: + ```sh - kubectl create job \ - --from=job/KEY_GEN_JOB key-gen-001 \ - --namespace INFLUXDB_NAMESPACE + kubectl delete job/KEY_GEN_JOB job/CREATE_ADMIN_TOKEN_JOB \ + --namespace INFLUXDB_NAMESPACE ``` 3. Restart the `token-management` service: