Merge pull request #5615 from influxdata/5576-add-optimizations-to-system-table-queries

fix(v3): Apply suggestion from code review to combine query-system-da…
pull/5620/head
Jason Stirnaman 2024-09-25 09:31:32 -05:00 committed by GitHub
commit 39dfd2306b
GPG Key ID: B5690EEEBB952194
4 changed files with 166 additions and 176 deletions


@ -10,15 +10,19 @@ menu:
weight: 105
related:
- /influxdb/cloud-dedicated/reference/cli/influxctl/query/
- /influxdb/cloud-dedicated/reference/internals/system-tables/
---
{{< product-name >}} stores data related to queries, tables, partitions, and
compaction in _system tables_ within your cluster.
System tables contain time series data used by and generated from the
{{< product-name >}} internal monitoring system.
You can query the cluster system tables for information about your cluster.
- [Query system tables](#query-system-tables)
- [Optimize queries to reduce impact to your cluster](#optimize-queries-to-reduce-impact-to-your-cluster)
- [System tables](#system-tables)
- [Understanding system table data distribution](#understanding-system-table-data-distribution)
- [system.queries](#systemqueries)
- [system.tables](#systemtables)
- [system.partitions](#systempartitions)
@ -50,7 +54,6 @@ If you detect a schema change or a non-functioning query example, please
## Query system tables
{{% note %}}
Querying system tables with `influxctl` requires **`influxctl` v2.8.0 or newer**.
{{% /note %}}
@ -259,6 +262,21 @@ Use the `AND`, `OR`, or `IN` keywords to combine filters in your query.
_System tables are [subject to change](#system-tables-are-subject-to-change)._
{{% /warn %}}
### Understanding system table data distribution
Data in `system.tables`, `system.partitions`, and `system.compactor` includes
data for all [InfluxDB Queriers](/influxdb/cloud-dedicated/reference/internals/storage-engine/#querier) in your cluster.
The data comes from the catalog, and because all the queriers share one catalog,
the results from these three tables derive from the same source data,
regardless of which querier you connect to.
However, the `system.queries` table is different: its data is local to each Querier.
`system.queries` contains a non-persisted log of queries run against the current
querier to which your query is routed.
The query log is specific to the current Querier and isn't shared across
queriers in your cluster.
Logs are scoped to the specified database.
- [system.queries](#systemqueries)
- [system.tables](#systemtables)
- [system.partitions](#systempartitions)
@ -266,12 +284,18 @@ _System tables are [subject to change](#system-tables-are-subject-to-change)._
### system.queries
The `system.queries` table contains an unpersisted log of queries run against
the current [InfluxDB Querier](/influxdb/cloud-dedicated/reference/internals/storage-engine/#querier)
to which your query is routed.
The query log is specific to the current Querier and is not shared across Queriers
in your cluster.
Logs are scoped to the specified database.
The `system.queries` table stores log entries for queries executed for the provided namespace (database) on the node that is _currently handling queries_.
`system.queries` reflects a process-local, in-memory, namespace-scoped query log.
While this table may be useful for debugging and monitoring queries, keep the following in mind:
- Records stored in `system.queries` are transient and volatile.
- InfluxDB deletes `system.queries` records during pod restarts.
- Queries for one namespace can evict records from another namespace.
- Data reflects the state of a specific pod answering queries for the namespace.
- Data isn't shared across queriers in your cluster.
- A query for records in `system.queries` can return different results
depending on the pod the request was routed to.
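
The cross-namespace eviction described in the list above can be pictured with a toy model.
The following is a minimal sketch, not InfluxDB's implementation: the `ToyQueryLog` class, its `max_entries` limit, and the oldest-first eviction policy are all assumptions made for illustration.

```python
from collections import deque


class ToyQueryLog:
    """Hypothetical model of one bounded, in-memory log on a single querier."""

    def __init__(self, max_entries):
        # deque(maxlen=...) drops the oldest entry when a new one arrives at capacity
        self._entries = deque(maxlen=max_entries)

    def record(self, namespace, query_text):
        # Every namespace writes into the same bounded log
        self._entries.append((namespace, query_text))

    def view(self, namespace):
        # A reader only sees entries for the namespace it asks about
        return [text for ns, text in self._entries if ns == namespace]


log = ToyQueryLog(max_entries=3)
log.record("db_a", "SELECT 1")
log.record("db_b", "SELECT 2")
log.record("db_b", "SELECT 3")
log.record("db_b", "SELECT 4")  # the log is full, so the db_a entry is evicted

print(log.view("db_a"))  # []
print(log.view("db_b"))  # ['SELECT 2', 'SELECT 3', 'SELECT 4']
```

In this model, activity in `db_b` alone is enough to empty the `db_a` view, which is why an empty result from `system.queries` doesn't necessarily mean that no queries ran.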
{{< expand-wrapper >}}
{{% expand "View `system.queries` schema" %}}
@ -280,9 +304,9 @@ The `system.queries` table contains the following columns:
- id
- phase
- issue_time
- query_type
- query_text
- **issue_time**: timestamp when the query was issued
- **query_type**: type (syntax: `sql`, `flightsql`, or `influxql`) of the query
- **query_text**: query statement text
- partitions
- parquet_files
- plan_duration
@ -291,14 +315,20 @@ The `system.queries` table contains the following columns:
- end2end_duration
- compute_duration
- max_memory
- success
- **success**: execution status (boolean) of the query
- running
- cancelled
- trace_id
- **trace_id**: trace ID for debugging and monitoring events
{{% /expand %}}
{{< /expand-wrapper >}}
{{% note %}}
_When listing measurements (tables) available within a namespace,
some clients and query tools may include the `queries` table in the list of
namespace tables._
{{% /note %}}
### system.tables
The `system.tables` table contains information about tables in the specified database.
@ -372,6 +402,7 @@ The examples in this section include `WHERE` filters to [optimize queries and re
- [Query logs](#query-logs)
- [View all stored query logs](#view-all-stored-query-logs)
- [View query logs for queries with end-to-end durations above a threshold](#view-query-logs-for-queries-with-end-to-end-durations-above-a-threshold)
- [View query logs for a specific query within a time interval](#view-query-logs-for-a-specific-query-within-a-time-interval)
- [Partitions](#partitions)
- [View the partition template of a specific table](#view-the-partition-template-of-a-specific-table)
- [View all partitions for a table](#view-all-partitions-for-a-table)
@ -414,6 +445,46 @@ WHERE
end2end_duration::BIGINT > (50 * 1000000)
```
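
The `50 * 1000000` in the filter above reads as 50 milliseconds expressed in nanoseconds.
If you build this query from application code, a small helper can make the unit conversion explicit.
This is a minimal sketch: `slow_query_sql` is a hypothetical helper name, and it assumes that `end2end_duration::BIGINT` yields nanoseconds and that the full statement selects all columns from `system.queries`.

```python
NANOSECONDS_PER_MILLISECOND = 1_000_000


def slow_query_sql(threshold_ms):
    """Build a system.queries filter for end-to-end durations above threshold_ms."""
    if threshold_ms < 0:
        raise ValueError("threshold_ms must be non-negative")
    # Assumption: the BIGINT cast of end2end_duration is a nanosecond count
    threshold_ns = int(threshold_ms) * NANOSECONDS_PER_MILLISECOND
    return (
        "SELECT *\n"
        "FROM system.queries\n"
        f"WHERE end2end_duration::BIGINT > {threshold_ns}"
    )


print(slow_query_sql(50))
```

The returned string can be passed to whichever client you already use to query system tables.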
### View query logs for a specific query within a time interval
{{< code-tabs >}}
{{% tabs %}}
[SQL](#)
[Python](#)
{{% /tabs %}}
{{% code-tab-content %}}
<!-----------------------------------BEGIN SQL------------------------------>
```sql
SELECT *
FROM system.queries
WHERE issue_time >= now() - INTERVAL '1 day'
AND query_text LIKE '%select * from home%'
```
<!-----------------------------------END SQL------------------------------>
{{% /code-tab-content %}}
{{% code-tab-content %}}
<!-----------------------------------BEGIN PYTHON------------------------------>
```python
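from influxdb_client_3 import InfluxDBClient3

# DATABASE_TOKEN, HOSTNAME, and DATABASE_NAME are placeholders for your own values
client = InfluxDBClient3(token=DATABASE_TOKEN,
                         host=HOSTNAME,
                         org='',
                         database=DATABASE_NAME)

# Run a query so that an entry for it can appear in the query log
client.query('select * from home')

# Look for log entries for that query issued within the last day.
# The request includes the iox-debug header.
reader = client.query('''
                      SELECT *
                      FROM system.queries
                      WHERE issue_time >= now() - INTERVAL '1 day'
                        AND query_text LIKE '%select * from home%'
                      ''',
                      language='sql',
                      headers=[(b"iox-debug", b"true")],
                      mode="reader")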
```
<!-----------------------------------END PYTHON------------------------------>
{{% /code-tab-content %}}
{{< /code-tabs >}}
---
### Partitions


@ -1,89 +0,0 @@
---
title: InfluxDB system tables
description: >
InfluxDB system measurements contain time series data used by and generated from the
InfluxDB internal monitoring system.
menu:
influxdb_cloud_dedicated:
name: System tables
parent: InfluxDB internals
weight: 103
influxdb/cloud-dedicated/tags: [tables, information schema]
related:
- /influxdb/cloud-dedicated/reference/sql/information-schema/
---
InfluxDB system measurements contain time series data used by and generated from the
InfluxDB internal monitoring system.
Each {{% product-name %}} namespace includes the following system measurements:
<!-- TOC -->
- [system.queries measurement](#systemqueries-measurement)
- [system.queries schema](#systemqueries-schema)
## system.queries measurement
The `system.queries` measurement stores log entries for queries executed for the provided namespace (database) on the node that is currently handling queries.
```python
from influxdb_client_3 import InfluxDBClient3
client = InfluxDBClient3(token = DATABASE_TOKEN,
host = HOSTNAME,
org = '',
database=DATABASE_NAME)
client.query('select * from home')
reader = client.query('''
SELECT *
FROM system.queries
WHERE issue_time >= now() - INTERVAL '1 day'
AND query_text LIKE '%select * from home%'
''',
language='sql',
headers=[(b"iox-debug", b"true")],
mode="reader")
print("# system.queries schema\n")
print(reader.schema)
```
<!--pytest-codeblocks:expected-output-->
`system.queries` has the following schema:
```python
# system.queries schema
issue_time: timestamp[ns] not null
query_type: string not null
query_text: string not null
completed_duration: duration[ns]
success: bool not null
trace_id: string
```
_When listing measurements (tables) available within a namespace, some clients and query tools may include the `queries` table in the list of namespace tables._
`system.queries` reflects a process-local, in-memory, namespace-scoped query log.
The query log isn't shared across instances within the same deployment.
While this table may be useful for debugging and monitoring queries, keep the following in mind:
- Records stored in `system.queries` are volatile.
- Records are lost on pod restarts.
- Queries for one namespace can evict records from another namespace.
- Data reflects the state of a specific pod answering queries for the namespace. The log view is scoped to the requesting namespace, and queries aren't leaked across namespaces.
- A query for records in `system.queries` can return different results depending on the pod the request was routed to.
**Data retention:** System data can be transient and is deleted on pod restarts.
The log size per instance is limited and the log view is scoped to the requesting namespace.
### system.queries schema
- **system.queries** _(measurement)_
- **fields**:
- **issue_time**: timestamp when the query was issued
- **query_type**: type (syntax: `sql`, `flightsql`, or `influxql`) of the query
- **query_text**: query statement text
- **success**: execution status (boolean) of the query
- **completed_duration**: time (duration) that the query took to complete
- **trace_id**: trace ID for debugging and monitoring events


@ -10,15 +10,19 @@ menu:
weight: 105
related:
- /influxdb/clustered/reference/cli/influxctl/query/
- /influxdb/clustered/reference/internals/system-tables/
---
{{< product-name >}} stores data related to queries, tables, partitions, and
compaction in _system tables_ within your cluster.
System tables contain time series data used by and generated from the
{{< product-name >}} internal monitoring system.
You can query the cluster system tables for information about your cluster.
- [Query system tables](#query-system-tables)
- [Optimize queries to reduce impact to your cluster](#optimize-queries-to-reduce-impact-to-your-cluster)
- [System tables](#system-tables)
- [Understanding system table data distribution](#understanding-system-table-data-distribution)
- [system.queries](#systemqueries)
- [system.tables](#systemtables)
- [system.partitions](#systempartitions)
@ -50,7 +54,6 @@ If you detect a schema change or a non-functioning query example, please
## Query system tables
{{% note %}}
Querying system tables with `influxctl` requires **`influxctl` v2.8.0 or newer**.
{{% /note %}}
@ -269,6 +272,21 @@ Use the `AND`, `OR`, or `IN` keywords to combine filters in your query.
_System tables are [subject to change](#system-tables-are-subject-to-change)._
{{% /warn %}}
### Understanding system table data distribution
Data in `system.tables`, `system.partitions`, and `system.compactor` includes
data for all [InfluxDB Queriers](/influxdb/clustered/reference/internals/storage-engine/#querier) in your cluster.
The data comes from the catalog, and because all the queriers share one catalog,
the results from these three tables derive from the same source data,
regardless of which querier you connect to.
However, the `system.queries` table is different: its data is local to each Querier.
`system.queries` contains a non-persisted log of queries run against the current
querier to which your query is routed.
The query log is specific to the current Querier and isn't shared across
queriers in your cluster.
Logs are scoped to the specified database.
- [system.queries](#systemqueries)
- [system.tables](#systemtables)
- [system.partitions](#systempartitions)
@ -276,12 +294,18 @@ _System tables are [subject to change](#system-tables-are-subject-to-change)._
### system.queries
The `system.queries` table contains an unpersisted log of queries run against
the current [InfluxDB Querier](/influxdb/clustered/reference/internals/storage-engine/#querier)
to which your query is routed.
The query log is specific to the current Querier and is not shared across Queriers
in your cluster.
Logs are scoped to the specified database.
The `system.queries` table stores log entries for queries executed for the provided namespace (database) on the node that is _currently handling queries_.
`system.queries` reflects a process-local, in-memory, namespace-scoped query log.
While this table may be useful for debugging and monitoring queries, keep the following in mind:
- Records stored in `system.queries` are transient and volatile.
- InfluxDB deletes `system.queries` records during pod restarts.
- Queries for one namespace can evict records from another namespace.
- Data reflects the state of a specific pod answering queries for the namespace.
- Data isn't shared across queriers in your cluster.
- A query for records in `system.queries` can return different results
depending on the pod the request was routed to.
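
Because repeated requests can be routed to different pods, you might combine the rows from several requests into one view.
The following is a minimal sketch under stated assumptions: `merge_query_logs` is a hypothetical helper, rows are plain dictionaries you have already fetched, and the `id` column is assumed to identify a logged query uniquely.

```python
def merge_query_logs(*result_sets):
    """Combine rows from several system.queries requests, dropping duplicate ids."""
    merged = {}
    for rows in result_sets:
        for row in rows:
            # Keep the first copy of each id; later copies are treated as duplicates
            merged.setdefault(row["id"], row)
    # Sort by issue_time so the combined log reads chronologically
    return sorted(merged.values(), key=lambda row: row["issue_time"])


# Illustrative rows only; real results contain more columns
first_request = [
    {"id": "a1", "issue_time": "2024-09-25T09:00:00Z", "query_text": "SELECT 1"},
    {"id": "b2", "issue_time": "2024-09-25T09:05:00Z", "query_text": "SELECT 2"},
]
second_request = [
    {"id": "b2", "issue_time": "2024-09-25T09:05:00Z", "query_text": "SELECT 2"},
    {"id": "c3", "issue_time": "2024-09-25T09:02:00Z", "query_text": "SELECT 3"},
]

combined = merge_query_logs(first_request, second_request)
print([row["id"] for row in combined])  # ['a1', 'c3', 'b2']
```

Merging doesn't guarantee a complete log: you can't choose which pod answers a request, and records may already have been evicted or lost in a restart.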
{{< expand-wrapper >}}
{{% expand "View `system.queries` schema" %}}
@ -290,9 +314,9 @@ The `system.queries` table contains the following columns:
- id
- phase
- issue_time
- query_type
- query_text
- **issue_time**: timestamp when the query was issued
- **query_type**: type (syntax: `sql`, `flightsql`, or `influxql`) of the query
- **query_text**: query statement text
- partitions
- parquet_files
- plan_duration
@ -301,14 +325,20 @@ The `system.queries` table contains the following columns:
- end2end_duration
- compute_duration
- max_memory
- success
- **success**: execution status (boolean) of the query
- running
- cancelled
- trace_id
- **trace_id**: trace ID for debugging and monitoring events
{{% /expand %}}
{{< /expand-wrapper >}}
{{% note %}}
_When listing measurements (tables) available within a namespace,
some clients and query tools may include the `queries` table in the list of
namespace tables._
{{% /note %}}
### system.tables
The `system.tables` table contains information about tables in the specified database.
@ -382,6 +412,7 @@ The examples in this section include `WHERE` filters to [optimize queries and re
- [Query logs](#query-logs)
- [View all stored query logs](#view-all-stored-query-logs)
- [View query logs for queries with end-to-end durations above a threshold](#view-query-logs-for-queries-with-end-to-end-durations-above-a-threshold)
- [View query logs for a specific query within a time interval](#view-query-logs-for-a-specific-query-within-a-time-interval)
- [Partitions](#partitions)
- [View the partition template of a specific table](#view-the-partition-template-of-a-specific-table)
- [View all partitions for a table](#view-all-partitions-for-a-table)
@ -424,6 +455,46 @@ WHERE
end2end_duration::BIGINT > (50 * 1000000)
```
### View query logs for a specific query within a time interval
{{< code-tabs >}}
{{% tabs %}}
[SQL](#)
[Python](#)
{{% /tabs %}}
{{% code-tab-content %}}
<!-----------------------------------BEGIN SQL------------------------------>
```sql
SELECT *
FROM system.queries
WHERE issue_time >= now() - INTERVAL '1 day'
AND query_text LIKE '%select * from home%'
```
<!-----------------------------------END SQL------------------------------>
{{% /code-tab-content %}}
{{% code-tab-content %}}
<!-----------------------------------BEGIN PYTHON------------------------------>
```python
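from influxdb_client_3 import InfluxDBClient3

# DATABASE_TOKEN, HOSTNAME, and DATABASE_NAME are placeholders for your own values
client = InfluxDBClient3(token=DATABASE_TOKEN,
                         host=HOSTNAME,
                         org='',
                         database=DATABASE_NAME)

# Run a query so that an entry for it can appear in the query log
client.query('select * from home')

# Look for log entries for that query issued within the last day.
# The request includes the iox-debug header.
reader = client.query('''
                      SELECT *
                      FROM system.queries
                      WHERE issue_time >= now() - INTERVAL '1 day'
                        AND query_text LIKE '%select * from home%'
                      ''',
                      language='sql',
                      headers=[(b"iox-debug", b"true")],
                      mode="reader")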
```
<!-----------------------------------END PYTHON------------------------------>
{{% /code-tab-content %}}
{{< /code-tabs >}}
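
If the query text you search for comes from a variable, it helps to escape single quotes before you embed it in the `LIKE` pattern.
This is a minimal sketch: `query_log_sql` and its parameters are hypothetical, and it only doubles single quotes (standard SQL string escaping); it doesn't escape the `%` or `_` wildcards.

```python
def query_log_sql(query_fragment, interval="1 day"):
    """Build the system.queries lookup shown above for an arbitrary query fragment."""
    # Double single quotes so each value stays inside its SQL string literal
    escaped_fragment = query_fragment.replace("'", "''")
    escaped_interval = interval.replace("'", "''")
    return (
        "SELECT *\n"
        "FROM system.queries\n"
        f"WHERE issue_time >= now() - INTERVAL '{escaped_interval}'\n"
        f"  AND query_text LIKE '%{escaped_fragment}%'"
    )


sql = query_log_sql("select * from home")
print(sql)
```

For example, `query_log_sql("room = 'Kitchen'")` produces a pattern that contains `room = ''Kitchen''`, which keeps the statement well formed.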
---
### Partitions


@ -1,63 +0,0 @@
---
title: InfluxDB system tables
description: >
InfluxDB system measurements contain time series data used by and generated from the
InfluxDB internal monitoring system.
menu:
influxdb_clustered:
name: System tables
parent: InfluxDB internals
weight: 103
influxdb/clustered/tags: [tables, information schema]
related:
- /influxdb/clustered/reference/sql/information-schema/
---
{{% warn %}}
Queries of InfluxDB system tables may affect production performance while
system tables are accessed.
System tables are not currently part of the stable API and the schema may change
in subsequent releases.
{{% /warn %}}
InfluxDB system measurements contain time series data used by and generated from the
InfluxDB internal monitoring system.
Each InfluxDB Clustered namespace includes the following system measurements:
- [queries](#_queries-system-measurement)
## queries system measurement
The `system.queries` measurement stores log entries for queries executed for the provided namespace (database) on the node that is currently handling queries.
The following example shows how to list queries recorded in the `system.queries` measurement:
```sql
SELECT issue_time, query_type, query_text, success FROM system.queries;
```
_When listing measurements (tables) available within a namespace, some clients and query tools may include the `queries` table in the list of namespace tables._
`system.queries` reflects a process-local, in-memory, namespace-scoped query log.
While this table may be useful for debugging and monitoring queries, keep the following in mind:
- Records stored in `system.queries` are volatile.
- Records are lost on pod restarts.
- Queries for one namespace can evict records from another namespace.
- Data reflects the state of a specific pod answering queries for the namespace.
- A query for records in `system.queries` can return different results depending on the pod the request was routed to.
**Data retention:** System data can be transient and is deleted on pod restarts.
### queries measurement schema
- **system.queries** _(measurement)_
- **fields**:
- **issue_time**: timestamp when the query was issued
- **query_type**: type (syntax: `sql`, `flightsql`, or `influxql`) of the query
- **query_text**: query statement text
- **success**: execution status (boolean) of the query
- **completed_duration**: time (duration) that the query took to complete
- **trace_id**: trace ID for debugging and monitoring events