Merge pull request #5615 from influxdata/5576-add-optimizations-to-system-table-queries

fix(v3): Apply suggestion from code review to combine query-system-da…
2024-09-25 09:31:32 -05:00 · 2024-09-25 09:31:32 -05:00 · 39dfd2306b
parent e4f348e797 5a9954f85a
commit 39dfd2306b
4 changed files with 166 additions and 176 deletions
--- a/content/influxdb/cloud-dedicated/admin/query-system-data.md
+++ b/content/influxdb/cloud-dedicated/admin/query-system-data.md
@@ -10,15 +10,19 @@ menu:
 weight: 105
 related:
  - /influxdb/cloud-dedicated/reference/cli/influxctl/query/
+  - /influxdb/cloud-dedicated/reference/internals/system-tables/
 --- 

 {{< product-name >}} stores data related to queries, tables, partitions, and
 compaction in _system tables_ within your cluster.
+System tables contain time series data used by and generated from the
+{{< product-name >}} internal monitoring system.
 You can query the cluster system tables for information about your cluster.

 - [Query system tables](#query-system-tables)
  - [Optimize queries to reduce impact to your cluster](#optimize-queries-to-reduce-impact-to-your-cluster)
 - [System tables](#system-tables)
+  - [Understanding system table data distribution](#understanding-system-table-data-distribution)
  - [system.queries](#systemqueries)
  - [system.tables](#systemtables)
  - [system.partitions](#systempartitions)
@@ -50,7 +54,6 @@ If you detect a schema change or a non-functioning query example, please

 ## Query system tables

-
 {{% note %}}
 Querying system tables with `influxctl` requires **`influxctl` v2.8.0 or newer**.
 {{% /note %}}
@@ -259,6 +262,21 @@ Use the `AND`, `OR`, or `IN` keywords to combine filters in your query.
 _System tables are [subject to change](#system-tables-are-subject-to-change)._
 {{% /warn %}}

+### Understanding system table data distribution
+
+Data in `system.tables`, `system.partitions`, and `system.compactor` includes
+data for all [InfluxDB Queriers](/influxdb/cloud-dedicated/reference/internals/storage-engine/#querier) in your cluster.
+The data comes from the catalog, and because all the queriers share one catalog,
+the results from these three tables derive from the same source data,
+regardless of which querier you connect to.
+
+However, the `system.queries` table is different: its data is local to each Querier.
+`system.queries` contains a non-persisted log of queries run against the
+Querier to which your query is routed.
+The query log is specific to that Querier and isn't shared across
+Queriers in your cluster.
+Logs are scoped to the specified database.
+
 - [system.queries](#systemqueries)
 - [system.tables](#systemtables)
 - [system.partitions](#systempartitions)
@@ -266,12 +284,18 @@ _System tables are [subject to change](#system-tables-are-subject-to-change)._

 ### system.queries

-The `system.queries` table contains an unpersisted log of queries run against
-the current [InfluxDB Querier](/influxdb/cloud-dedicated/reference/internals/storage-engine/#querier)
-to which your query is routed.
-The query log is specific to the current Querier and is not shared across Queriers
-in your cluster.
-Logs are scoped to the specified database.
+The `system.queries` table stores log entries for queries executed against the specified namespace (database) on the node that is _currently handling queries_.
+`system.queries` reflects a process-local, in-memory, namespace-scoped query log.
+
+While this table may be useful for debugging and monitoring queries, keep the following in mind:
+
+- Records stored in `system.queries` are transient and volatile.
+  - InfluxDB deletes `system.queries` records during pod restarts.
+  - Queries for one namespace can evict records from another namespace.
+- Data reflects the state of a specific pod answering queries for the namespace.
+  - Data isn't shared across queriers in your cluster.
+  - A query for records in `system.queries` can return different results
+    depending on the pod the request was routed to.
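The behavior described in the list above can be illustrated with a toy model. The following Python sketch is not InfluxDB code: the `QuerierLog` class, the capacity of 3, and oldest-first eviction are assumptions made for illustration only. It shows how a bounded, in-memory, per-process log can return different namespace-scoped views on each querier, lose records to another namespace's queries, and empty on restart.

```python
from collections import deque


class QuerierLog:
    """Toy model of one querier's in-memory query log (not InfluxDB code)."""

    def __init__(self, capacity):
        # A bounded deque drops its oldest entry once it is full.
        # Oldest-first eviction is an assumption made for this sketch.
        self._entries = deque(maxlen=capacity)

    def record(self, namespace, query_text):
        # All namespaces share the same bounded log on this querier.
        self._entries.append((namespace, query_text))

    def view(self, namespace):
        # The view is scoped to the requesting namespace.
        return [text for ns, text in self._entries if ns == namespace]

    def restart(self):
        # Nothing is persisted, so a restart empties the log.
        self._entries.clear()


# Two hypothetical queriers, each with its own log of at most 3 entries.
querier_a = QuerierLog(capacity=3)
querier_b = QuerierLog(capacity=3)

querier_a.record("sensors", "SELECT * FROM home")
querier_b.record("sensors", "SELECT * FROM office")

# The same namespace sees different records depending on the querier.
view_a = querier_a.view("sensors")
view_b = querier_b.view("sensors")

# Three queries for another namespace fill querier A's log and evict
# the "sensors" record.
for i in range(3):
    querier_a.record("metrics", f"SELECT {i}")
view_a_after_eviction = querier_a.view("sensors")

# A restart clears whatever is left on querier B.
querier_b.restart()
view_b_after_restart = querier_b.view("sensors")

print(view_a, view_b, view_a_after_eviction, view_b_after_restart)
```

In practice, this means an empty result from `system.queries` doesn't prove that a query never ran; the record may live on another querier or may already have been dropped.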

 {{< expand-wrapper >}}
 {{% expand "View `system.queries` schema" %}}
@@ -280,9 +304,9 @@ The `system.queries` table contains the following columns:

 - id
 - phase
- issue_time
- query_type
- query_text  
+- **issue_time**: timestamp when the query was issued
+- **query_type**: syntax type of the query (`sql`, `flightsql`, or `influxql`)
+- **query_text**: query statement text
 - partitions
 - parquet_files
 - plan_duration
@@ -291,14 +315,20 @@ The `system.queries` table contains the following columns:
 - end2end_duration
 - compute_duration
 - max_memory
- success
+- **success**: execution status (boolean) of the query
 - running
 - cancelled
- trace_id
+- **trace_id**: trace ID for debugging and monitoring events

 {{% /expand %}}
 {{< /expand-wrapper >}}

+{{% note %}}
+_When listing measurements (tables) available within a namespace,
+some clients and query tools may include the `queries` table in the list of
+namespace tables._
+{{% /note %}}
+
 ### system.tables

 The `system.tables` table contains information about tables in the specified database.
@@ -372,6 +402,7 @@ The examples in this section include `WHERE` filters to [optimize queries and re
 - [Query logs](#query-logs)
  - [View all stored query logs](#view-all-stored-query-logs)
  - [View query logs for queries with end-to-end durations above a threshold](#view-query-logs-for-queries-with-end-to-end-durations-above-a-threshold)
+  - [View query logs for a specific query within a time interval](#view-query-logs-for-a-specific-query-within-a-time-interval)
 - [Partitions](#partitions)
  - [View the partition template of a specific table](#view-the-partition-template-of-a-specific-table)
  - [View all partitions for a table](#view-all-partitions-for-a-table)
@@ -414,6 +445,46 @@ WHERE
  end2end_duration::BIGINT > (50 * 1000000)
 ```
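The threshold in the preceding query is easier to adjust when it is expressed in milliseconds. The following Python sketch builds the same filter from a millisecond value. It assumes that casting `end2end_duration` to `BIGINT` yields nanoseconds; `slow_query_sql` is a hypothetical helper name and isn't part of any InfluxDB client.

```python
NANOSECONDS_PER_MILLISECOND = 1_000_000


def slow_query_sql(threshold_ms):
    """Build a system.queries filter for queries slower than threshold_ms.

    Hypothetical helper; assumes the BIGINT cast of end2end_duration
    is a nanosecond count.
    """
    # int() rejects non-numeric input before it reaches the SQL text.
    threshold_ns = int(threshold_ms) * NANOSECONDS_PER_MILLISECOND
    return (
        "SELECT *\n"
        "FROM system.queries\n"
        "WHERE\n"
        f"  end2end_duration::BIGINT > {threshold_ns}"
    )


sql = slow_query_sql(50)
print(sql)
```

Passing `50` reproduces the `50 * 1000000` comparison used above as a single literal.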

+### View query logs for a specific query within a time interval
+
+{{< code-tabs >}}
+{{% tabs %}}
+[SQL](#)
+[Python](#)
+{{% /tabs %}}
+{{% code-tab-content %}}
+<!-----------------------------------BEGIN SQL------------------------------>
+```sql
+SELECT *
+FROM system.queries
+WHERE issue_time >= now() - INTERVAL '1 day'
+  AND query_text LIKE '%select * from home%'
+```
+<!-----------------------------------END SQL------------------------------>
+{{% /code-tab-content %}}
+{{% code-tab-content %}}
+<!-----------------------------------BEGIN PYTHON------------------------------>
+```python
+from influxdb_client_3 import InfluxDBClient3
+client = InfluxDBClient3(token=DATABASE_TOKEN,  # Uppercase names are placeholders.
+                         host=HOSTNAME,
+                         org='',
+                         database=DATABASE_NAME)
+client.query('select * from home')  # Run a query so that it's logged.
+reader = client.query('''
+                      SELECT *
+                      FROM system.queries
+                      WHERE issue_time >= now() - INTERVAL '1 day'
+                      AND query_text LIKE '%select * from home%'
+                      ''',
+                      language='sql',
+                      headers=[(b"iox-debug", b"true")],
+                      mode="reader")
+```
+<!-----------------------------------END PYTHON------------------------------>
+{{% /code-tab-content %}}
+{{< /code-tabs >}}
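When the query text to search for comes from a variable, the SQL string has to be assembled carefully. The following Python sketch shows one way to build the statement used in the preceding example. `query_log_sql` is a hypothetical helper name, not part of the InfluxDB client library, and the sketch only handles single-quote escaping.

```python
def query_log_sql(query_fragment, interval="1 day"):
    """Build a system.queries lookup for query text containing a fragment.

    Hypothetical helper for illustration only.
    """
    # Double single quotes so that values stay inside their SQL string literals.
    escaped_fragment = query_fragment.replace("'", "''")
    escaped_interval = interval.replace("'", "''")
    # Note: any % or _ characters in the fragment still act as LIKE wildcards.
    return (
        "SELECT *\n"
        "FROM system.queries\n"
        f"WHERE issue_time >= now() - INTERVAL '{escaped_interval}'\n"
        f"  AND query_text LIKE '%{escaped_fragment}%'"
    )


sql = query_log_sql("select * from home")
print(sql)
```

The resulting string can be passed as the first argument to `client.query()` in place of the inline statement.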
+
 --- 

 ### Partitions
--- a/content/influxdb/cloud-dedicated/reference/internals/system-tables.md
+++ b/content/influxdb/cloud-dedicated/reference/internals/system-tables.md
@@ -1,89 +0,0 @@
---
-title: InfluxDB system tables
-description: >
-  InfluxDB system measurements contain time series data used by and generated from the
-  InfluxDB internal monitoring system.
-menu:
-  influxdb_cloud_dedicated:
-    name: System tables
-    parent: InfluxDB internals
-weight: 103
-influxdb/cloud-dedicated/tags: [tables, information schema]
-related:
-  - /influxdb/cloud-dedicated/reference/sql/information-schema/
---
-
-InfluxDB system measurements contain time series data used by and generated from the
-InfluxDB internal monitoring system.
-
-Each {{% product-name %}} namespace includes the following system measurements:
-
-<!-- TOC -->
-
- [system.queries measurement](#systemqueries-measurement)
-  - [system.queries schema](#systemqueries-schema)
-
-## system.queries measurement
-
-The `system.queries` measurement stores log entries for queries executed for the provided namespace (database) on the node that is currently handling queries.
-
-```python
-from influxdb_client_3 import InfluxDBClient3
-client = InfluxDBClient3(token = DATABASE_TOKEN,
-                          host = HOSTNAME,
-                          org = '',
-                          database=DATABASE_NAME)
-client.query('select * from home')
-reader = client.query('''
-                      SELECT *
-                      FROM system.queries
-                      WHERE issue_time >= now() - INTERVAL '1 day'
-                      AND query_text LIKE '%select * from home%'
-                      ''',
-                    language='sql',
-                    headers=[(b"iox-debug", b"true")],
-                    mode="reader")
-print("# system.queries schema\n")
-print(reader.schema)
-```
-
-<!--pytest-codeblocks:expected-output-->
-
-`system.queries` has the following schema:
-
-```python
-# system.queries schema
-
-issue_time: timestamp[ns] not null
-query_type: string not null
-query_text: string not null
-completed_duration: duration[ns]
-success: bool not null
-trace_id: string
-```
-
-_When listing measurements (tables) available within a namespace, some clients and query tools may include the `queries` table in the list of namespace tables._
-
-`system.queries` reflects a process-local, in-memory, namespace-scoped query log.
-The query log isn't shared across instances within the same deployment.
-While this table may be useful for debugging and monitoring queries, keep the following in mind:
-
- Records stored in `system.queries` are volatile.
-  - Records are lost on pod restarts.
-  - Queries for one namespace can evict records from another namespace.
- Data reflects the state of a specific pod answering queries for the namespace----the log view is scoped to the requesting namespace and queries aren't leaked across namespaces.
-  - A query for records in `system.queries` can return different results depending on the pod the request was routed to.
-
-**Data retention:** System data can be transient and is deleted on pod restarts.
-The log size per instance is limited and the log view is scoped to the requesting namespace.
-
-### system.queries schema
-
- **system.queries** _(measurement)_
-  - **fields**:
-      - **issue_time**: timestamp when the query was issued
-      - **query_type**: type (syntax: `sql`, `flightsql`, or `influxql`) of the query
-      - **query_text**: query statement text
-      - **success**: execution status (boolean) of the query
-      - **completed_duration**: time (duration) that the query took to complete
-      - **trace_id**: trace ID for debugging and monitoring events
--- a/content/influxdb/clustered/admin/query-system-data.md
+++ b/content/influxdb/clustered/admin/query-system-data.md
@@ -10,15 +10,19 @@ menu:
 weight: 105
 related:
  - /influxdb/clustered/reference/cli/influxctl/query/
+  - /influxdb/clustered/reference/internals/system-tables/
 --- 

 {{< product-name >}} stores data related to queries, tables, partitions, and
 compaction in _system tables_ within your cluster.
+System tables contain time series data used by and generated from the
+{{< product-name >}} internal monitoring system.
 You can query the cluster system tables for information about your cluster.

 - [Query system tables](#query-system-tables)
  - [Optimize queries to reduce impact to your cluster](#optimize-queries-to-reduce-impact-to-your-cluster)
 - [System tables](#system-tables)
+  - [Understanding system table data distribution](#understanding-system-table-data-distribution)
  - [system.queries](#systemqueries)
  - [system.tables](#systemtables)
  - [system.partitions](#systempartitions)
@@ -50,7 +54,6 @@ If you detect a schema change or a non-functioning query example, please

 ## Query system tables

-
 {{% note %}}
 Querying system tables with `influxctl` requires **`influxctl` v2.8.0 or newer**.
 {{% /note %}}
@@ -269,6 +272,21 @@ Use the `AND`, `OR`, or `IN` keywords to combine filters in your query.
 _System tables are [subject to change](#system-tables-are-subject-to-change)._
 {{% /warn %}}

+### Understanding system table data distribution
+
+Data in `system.tables`, `system.partitions`, and `system.compactor` includes
+data for all [InfluxDB Queriers](/influxdb/clustered/reference/internals/storage-engine/#querier) in your cluster.
+The data comes from the catalog, and because all the queriers share one catalog,
+the results from these three tables derive from the same source data,
+regardless of which querier you connect to.
+
+However, the `system.queries` table is different: its data is local to each Querier.
+`system.queries` contains a non-persisted log of queries run against the
+Querier to which your query is routed.
+The query log is specific to that Querier and isn't shared across
+Queriers in your cluster.
+Logs are scoped to the specified database.
+
 - [system.queries](#systemqueries)
 - [system.tables](#systemtables)
 - [system.partitions](#systempartitions)
@@ -276,12 +294,18 @@ _System tables are [subject to change](#system-tables-are-subject-to-change)._

 ### system.queries

-The `system.queries` table contains an unpersisted log of queries run against
-the current [InfluxDB Querier](/influxdb/clustered/reference/internals/storage-engine/#querier)
-to which your query is routed.
-The query log is specific to the current Querier and is not shared across Queriers
-in your cluster.
-Logs are scoped to the specified database.
+The `system.queries` table stores log entries for queries executed against the specified namespace (database) on the node that is _currently handling queries_.
+`system.queries` reflects a process-local, in-memory, namespace-scoped query log.
+
+While this table may be useful for debugging and monitoring queries, keep the following in mind:
+
+- Records stored in `system.queries` are transient and volatile.
+  - InfluxDB deletes `system.queries` records during pod restarts.
+  - Queries for one namespace can evict records from another namespace.
+- Data reflects the state of a specific pod answering queries for the namespace.
+  - Data isn't shared across queriers in your cluster.
+  - A query for records in `system.queries` can return different results
+    depending on the pod the request was routed to.

 {{< expand-wrapper >}}
 {{% expand "View `system.queries` schema" %}}
@@ -290,9 +314,9 @@ The `system.queries` table contains the following columns:

 - id
 - phase
- issue_time
- query_type
- query_text  
+- **issue_time**: timestamp when the query was issued
+- **query_type**: syntax type of the query (`sql`, `flightsql`, or `influxql`)
+- **query_text**: query statement text
 - partitions
 - parquet_files
 - plan_duration
@@ -301,14 +325,20 @@ The `system.queries` table contains the following columns:
 - end2end_duration
 - compute_duration
 - max_memory
- success
+- **success**: execution status (boolean) of the query
 - running
 - cancelled
- trace_id
+- **trace_id**: trace ID for debugging and monitoring events

 {{% /expand %}}
 {{< /expand-wrapper >}}

+{{% note %}}
+_When listing measurements (tables) available within a namespace,
+some clients and query tools may include the `queries` table in the list of
+namespace tables._
+{{% /note %}}
+
 ### system.tables

 The `system.tables` table contains information about tables in the specified database.
@@ -382,6 +412,7 @@ The examples in this section include `WHERE` filters to [optimize queries and re
 - [Query logs](#query-logs)
  - [View all stored query logs](#view-all-stored-query-logs)
  - [View query logs for queries with end-to-end durations above a threshold](#view-query-logs-for-queries-with-end-to-end-durations-above-a-threshold)
+  - [View query logs for a specific query within a time interval](#view-query-logs-for-a-specific-query-within-a-time-interval)
 - [Partitions](#partitions)
  - [View the partition template of a specific table](#view-the-partition-template-of-a-specific-table)
  - [View all partitions for a table](#view-all-partitions-for-a-table)
@@ -424,6 +455,46 @@ WHERE
  end2end_duration::BIGINT > (50 * 1000000)
 ```

+### View query logs for a specific query within a time interval
+
+{{< code-tabs >}}
+{{% tabs %}}
+[SQL](#)
+[Python](#)
+{{% /tabs %}}
+{{% code-tab-content %}}
+<!-----------------------------------BEGIN SQL------------------------------>
+```sql
+SELECT *
+FROM system.queries
+WHERE issue_time >= now() - INTERVAL '1 day'
+  AND query_text LIKE '%select * from home%'
+```
+<!-----------------------------------END SQL------------------------------>
+{{% /code-tab-content %}}
+{{% code-tab-content %}}
+<!-----------------------------------BEGIN PYTHON------------------------------>
+```python
+from influxdb_client_3 import InfluxDBClient3
+client = InfluxDBClient3(token=DATABASE_TOKEN,  # Uppercase names are placeholders.
+                         host=HOSTNAME,
+                         org='',
+                         database=DATABASE_NAME)
+client.query('select * from home')  # Run a query so that it's logged.
+reader = client.query('''
+                      SELECT *
+                      FROM system.queries
+                      WHERE issue_time >= now() - INTERVAL '1 day'
+                      AND query_text LIKE '%select * from home%'
+                      ''',
+                      language='sql',
+                      headers=[(b"iox-debug", b"true")],
+                      mode="reader")
+```
+<!-----------------------------------END PYTHON------------------------------>
+{{% /code-tab-content %}}
+{{< /code-tabs >}}
+
 --- 

 ### Partitions
--- a/content/influxdb/clustered/reference/internals/system-tables.md
+++ b/content/influxdb/clustered/reference/internals/system-tables.md
@@ -1,63 +0,0 @@
---
-title: InfluxDB system tables
-description: >
-  InfluxDB system measurements contain time series data used by and generated from the
-  InfluxDB internal monitoring system.
-menu:
-  influxdb_clustered:
-    name: System tables
-    parent: InfluxDB internals
-weight: 103
-influxdb/clustered/tags: [tables, information schema]
-related:
-  - /influxdb/clustered/reference/sql/information-schema/
---
-
-{{% warn %}}
-Queries of InfluxDB system tables may affect production performance while
-system tables are accessed.
-
-System tables are not currently part of the stable API and the schema may change
-in subsequent releases.
-{{% /warn %}}
-
-InfluxDB system measurements contain time series data used by and generated from the
-InfluxDB internal monitoring system.
-
-Each InfluxDB Clustered namespace includes the following system measurements:
-
- [queries](#_queries-system-measurement)
-
-## queries system measurement
-
-The `system.queries` measurement stores log entries for queries executed for the provided namespace (database) on the node that is currently handling queries.
-
-The following example shows how to list queries recorded in the `system.queries` measurement:
-
-```sql
-SELECT issue_time, query_type, query_text, success FROM system.queries;
-```
-
-_When listing measurements (tables) available within a namespace, some clients and query tools may include the `queries` table in the list of namespace tables._
-
-`system.queries` reflects a process-local, in-memory, namespace-scoped query log.
-While this table may be useful for debugging and monitoring queries, keep the following in mind:
-
- Records stored in `system.queries` are volatile.
-  - Records are lost on pod restarts.
-  - Queries for one namespace can evict records from another namespace.
- Data reflects the state of a specific pod answering queries for the namespace.
-  - A query for records in `system.queries` can return different results depending on the pod the request was routed to.
-
-**Data retention:** System data can be transient and is deleted on pod restarts.
-
-### queries measurement schema
-
- **system.queries** _(measurement)_
-  - **fields**:
-      - **issue_time**: timestamp when the query was issued
-      - **query_type**: type (syntax: `sql`, `flightsql`, or `influxql`) of the query
-      - **query_text**: query statement text
-      - **success**: execution status (boolean) of the query
-      - **completed_duration**: time (duration) that the query took to complete
-      - **trace_id**: trace ID for debugging and monitoring events