5143-Add Optimize Queries page with query analysis help (#5165)

* chore(test): Use my python client fork (pending approval) to allow custom headers.

* feature(query): Add Optimize Queries page with query analysis help

- Closes #5143 (Client library query traces: Python)
- Dedicated and Clustered examples for enabling query tracing and extracting headers
- System.queries table
- Explain and Analyze
- For now, skip tests for sample Flight responses until we add code samples.

* Update content/influxdb/clustered/query-data/execute-queries/optimize-queries.md

Co-authored-by: Scott Anderson <sanderson@users.noreply.github.com>

* Update content/influxdb/cloud-dedicated/query-data/execute-queries/optimize-queries.md

Co-authored-by: Scott Anderson <sanderson@users.noreply.github.com>

* Update content/influxdb/cloud-dedicated/query-data/execute-queries/optimize-queries.md

Co-authored-by: Scott Anderson <sanderson@users.noreply.github.com>

* Update content/influxdb/cloud-dedicated/query-data/execute-queries/optimize-queries.md

Co-authored-by: Scott Anderson <sanderson@users.noreply.github.com>

* Update content/influxdb/clustered/query-data/execute-queries/optimize-queries.md

Co-authored-by: Scott Anderson <sanderson@users.noreply.github.com>

* Update content/influxdb/clustered/query-data/execute-queries/optimize-queries.md

Co-authored-by: Scott Anderson <sanderson@users.noreply.github.com>

* Update content/influxdb/clustered/query-data/execute-queries/optimize-queries.md

Co-authored-by: Scott Anderson <sanderson@users.noreply.github.com>

* Update content/influxdb/clustered/query-data/execute-queries/optimize-queries.md

Co-authored-by: Scott Anderson <sanderson@users.noreply.github.com>

* Update content/influxdb/cloud-dedicated/query-data/execute-queries/optimize-queries.md

Co-authored-by: Scott Anderson <sanderson@users.noreply.github.com>

* Update content/influxdb/cloud-dedicated/query-data/execute-queries/optimize-queries.md

Co-authored-by: Scott Anderson <sanderson@users.noreply.github.com>

* feat(v3): influx-trace-id for dedicated, tracing not ready for clustered (Client library query traces: Python #5143)

---------

Co-authored-by: Scott Anderson <sanderson@users.noreply.github.com>
Jason Stirnaman 2023-10-16 15:08:40 -05:00 committed by GitHub
parent 6be4bbd3bc
commit 5ad8e80361
10 changed files with 740 additions and 21 deletions


@@ -25,6 +25,7 @@ related:
- /influxdb/cloud-dedicated/query-data/sql/
- /influxdb/cloud-dedicated/reference/influxql/
- /influxdb/cloud-dedicated/reference/sql/
- /influxdb/cloud-dedicated/query-data/execute-queries/troubleshoot/
list_code_example: |
```py
@@ -305,7 +306,7 @@ and specify the following arguments:
#### Example {#execute-query-example}
The following examples shows how to use SQL or InfluxQL to select all fields in a measurement, and then output the results formatted as a Markdown table.
The following example shows how to use SQL or InfluxQL to select all fields in a measurement, and then use PyArrow functions to extract metadata and aggregate data.
{{% code-tabs-wrapper %}}
{{% code-tabs %}}


@@ -0,0 +1,442 @@
---
title: Optimize queries
description: >
Optimize your SQL and InfluxQL queries to improve performance and reduce their memory and compute (CPU) requirements.
weight: 401
menu:
influxdb_cloud_dedicated:
name: Optimize queries
parent: Execute queries
influxdb/cloud-dedicated/tags: [query, sql, influxql]
related:
- /influxdb/cloud-dedicated/query-data/sql/
- /influxdb/cloud-dedicated/query-data/influxql/
- /influxdb/cloud-dedicated/query-data/execute-queries/troubleshoot/
- /influxdb/cloud-dedicated/reference/client-libraries/v3/
---
Use the following tools to help you identify performance bottlenecks and troubleshoot problems in queries:
<!-- TOC -->
- [EXPLAIN and ANALYZE](#explain-and-analyze)
- [Enable trace logging](#enable-trace-logging)
- [Avoid unnecessary tracing](#avoid-unnecessary-tracing)
- [Syntax](#syntax)
- [Example](#example)
- [Tracing response header](#tracing-response-header)
- [Trace response header syntax](#trace-response-header-syntax)
- [Inspect Flight response headers](#inspect-flight-response-headers)
- [Retrieve query information](#retrieve-query-information)
<!-- /TOC -->
## EXPLAIN and ANALYZE
To view the query engine's execution plan and metrics for an SQL or InfluxQL query, prepend [`EXPLAIN`](/influxdb/cloud-dedicated/reference/sql/explain/) or [`EXPLAIN ANALYZE`](/influxdb/cloud-dedicated/reference/sql/explain/#explain-analyze) to the query.
The report can reveal query bottlenecks such as a large number of table scans or parquet files, and can help triage the question, "Is the query slow due to the amount of work required or due to a problem with the schema, compactor, etc.?"
The following example shows how to use the InfluxDB v3 Python client library and pandas to view `EXPLAIN` and `EXPLAIN ANALYZE` results for a query:
<!-- Import for tests and hide from users.
```python
import os
```
-->
{{% code-placeholders "DATABASE_(NAME|TOKEN)" %}}
<!--pytest-codeblocks:cont-->
```python
from influxdb_client_3 import InfluxDBClient3
import pandas as pd
import tabulate # Required for pandas.to_markdown()
# Instantiate an InfluxDB client.
client = InfluxDBClient3(token = f"DATABASE_TOKEN",
host = f"{{< influxdb/host >}}",
database = f"DATABASE_NAME")
sql_explain = '''EXPLAIN
SELECT temp
FROM home
WHERE time >= now() - INTERVAL '90 days'
AND room = 'Kitchen'
ORDER BY time'''
table = client.query(sql_explain)
df = table.to_pandas()
print(df.to_markdown(index=False))
assert df.shape == (2, 2), f'Expect 2 rows and 2 columns, got {df.shape}'
assert 'physical_plan' in df.plan_type.values, "Expect physical_plan"
assert 'logical_plan' in df.plan_type.values, "Expect logical_plan"
```
{{< expand-wrapper >}}
{{% expand "View EXPLAIN example results" %}}
| plan_type | plan |
|:--------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| logical_plan | Projection: home.temp |
| | Sort: home.time ASC NULLS LAST |
| | Projection: home.temp, home.time |
| | TableScan: home projection=[room, temp, time], full_filters=[home.time >= TimestampNanosecond(1688676582918581320, None), home.room = Dictionary(Int32, Utf8("Kitchen"))] |
| physical_plan | ProjectionExec: expr=[temp@0 as temp] |
| | SortExec: expr=[time@1 ASC NULLS LAST] |
| | EmptyExec: produce_one_row=false |
{{% /expand %}}
{{< /expand-wrapper >}}
<!--pytest-codeblocks:cont-->
```python
sql_explain_analyze = '''EXPLAIN ANALYZE
SELECT *
FROM home
WHERE time >= now() - INTERVAL '90 days'
ORDER BY time'''
table = client.query(sql_explain_analyze)
df = table.to_pandas()
print(df.to_markdown(index=False))
assert df.shape == (1,2)
assert 'Plan with Metrics' in df.plan_type.values, "Expect plan metrics"
client.close()
```
{{% /code-placeholders %}}
Replace the following:
- {{% code-placeholder-key %}}`DATABASE_NAME`{{% /code-placeholder-key %}}: your {{% product-name %}} database
- {{% code-placeholder-key %}}`DATABASE_TOKEN`{{% /code-placeholder-key %}}: a [database token](/influxdb/cloud-dedicated/admin/tokens/) with sufficient permissions to the specified database
{{< expand-wrapper >}}
{{% expand "View EXPLAIN ANALYZE example results" %}}
| plan_type | plan |
|:------------------|:-----------------------------------------------------------------------------------------------------------------------|
| Plan with Metrics | ProjectionExec: expr=[temp@0 as temp], metrics=[output_rows=0, elapsed_compute=1ns] |
| | SortExec: expr=[time@1 ASC NULLS LAST], metrics=[output_rows=0, elapsed_compute=1ns, spill_count=0, spilled_bytes=0] |
| | EmptyExec: produce_one_row=false, metrics=[]
{{% /expand %}}
{{< /expand-wrapper >}}
## Enable trace logging
When you enable trace logging for a query, InfluxDB propagates your _trace ID_ through system processes and collects additional log information.
InfluxDB Support can then use the trace ID that you provide to filter, collate, and analyze log information for the query run.
The tracing system follows the [OpenTelemetry traces](https://opentelemetry.io/docs/concepts/signals/traces/) model for providing observability into a request.
{{% warn %}}
#### Avoid unnecessary tracing
Only enable tracing for a query when you need to request troubleshooting help from InfluxDB Support.
To manage resources, InfluxDB has an upper limit for the number of trace requests.
Too many traces can cause InfluxDB to evict log information.
{{% /warn %}}
To enable tracing for a query, include the `influx-trace-id` header in your query request.
### Syntax
Use the following syntax for the `influx-trace-id` header:
```http
influx-trace-id: TRACE_ID:1112223334445:0:1
```
In the header value, replace the following:
- `TRACE_ID`: a unique string, 8-16 bytes long, encoded as hexadecimal (32 maximum hex characters).
The trace ID should uniquely identify the query run.
- `:1112223334445:0:1`: InfluxDB constant values (required, but ignored)
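As a minimal sketch using only Python's standard library, you can assemble a valid header value like this (the client examples below build the same value inside the query code):

```python
import secrets

# Generate a random 8-byte trace ID and encode it as hexadecimal.
trace_id = secrets.token_bytes(8).hex()

# Append the required InfluxDB constant values to form the header value.
header_value = f"{trace_id}:1112223334445:0:1"
print(header_value)
```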
### Example
The following examples show how to create and pass a trace ID to enable query tracing in InfluxDB:
{{< tabs-wrapper >}}
{{% tabs %}}
[Python with FlightCallOptions](#)
[Python with FlightClientMiddleware](#python-with-flightclientmiddleware)
{{% /tabs %}}
{{% tab-content %}}
<!---- BEGIN PYTHON WITH FLIGHTCALLOPTIONS ---->
Use the `InfluxDBClient3` InfluxDB Python client and pass the `headers` argument in the
`query()` method.
<!-- Import for tests and hide from users.
```python
import os
```
-->
{{% code-placeholders "DATABASE_(NAME|TOKEN)|APP_REQUEST_ID" %}}
<!--pytest-codeblocks:cont-->
```python
from influxdb_client_3 import InfluxDBClient3
import secrets
def use_flightcalloptions_trace_header():
print('# Use FlightCallOptions to enable tracing.')
client = InfluxDBClient3(token=f"DATABASE_TOKEN",
host=f"{{< influxdb/host >}}",
database=f"DATABASE_NAME")
# Generate a trace ID for the query:
# 1. Generate a random 8-byte value as bytes.
# 2. Encode the value as hexadecimal.
random_bytes = secrets.token_bytes(8)
trace_id = random_bytes.hex()
# Append required constants to the trace ID.
trace_value = f"{trace_id}:1112223334445:0:1"
# Encode the header key and value as bytes.
# Create a list of header tuples.
    headers = [(b"influx-trace-id", trace_value.encode('utf-8'))]
sql = "SELECT * FROM home WHERE time >= now() - INTERVAL '30 days'"
influxql = "SELECT * FROM home WHERE time >= -90d"
# Use the query() headers argument to pass the list as FlightCallOptions.
client.query(sql, headers=headers)
client.close()
use_flightcalloptions_trace_header()
```
{{% /code-placeholders %}}
<!---- END PYTHON WITH FLIGHTCALLOPTIONS ---->
{{% /tab-content %}}
{{% tab-content %}}
<!---- BEGIN PYTHON WITH MIDDLEWARE ---->
Use the `InfluxDBClient3` InfluxDB Python client and `flight.ClientMiddleware` to pass and inspect headers.
### Tracing response header
With tracing enabled and a valid trace ID in the request, InfluxDB's `DoGet` action response contains a header with the trace ID that you sent.
#### Trace response header syntax
```http
trace-id: TRACE_ID
```
### Inspect Flight response headers
To inspect Flight response headers when using a client library, pass a `FlightClientMiddleware` instance that defines a middleware callback function for the `onHeadersReceived` event (the particular function name you use depends on the client library language).
The following example uses Python client middleware that adds request headers and extracts the trace ID from the `DoGet` response headers:
<!-- Import for tests and hide from users.
```python
import os
```
-->
{{% code-placeholders "DATABASE_(NAME|TOKEN)|APP_REQUEST_ID" %}}
<!--pytest-codeblocks:cont-->
```python
import pyarrow.flight as flight
class TracingClientMiddleWareFactory(flight.ClientMiddleware):
# Defines a custom middleware factory that returns a middleware instance.
def __init__(self):
self.request_headers = []
self.response_headers = []
self.traces = []
def addRequestHeader(self, header):
self.request_headers.append(header)
def addResponseHeader(self, header):
self.response_headers.append(header)
def addTrace(self, traceid):
self.traces.append(traceid)
def createTrace(self, traceid):
# Append InfluxDB constants to the trace ID.
trace = f"{traceid}:1112223334445:0:1"
# To the list of request headers,
# add a tuple with the header key and value as bytes.
self.addRequestHeader((b"influx-trace-id", trace.encode('utf-8')))
def start_call(self, info):
return TracingClientMiddleware(info.method, self)
class TracingClientMiddleware(flight.ClientMiddleware):
# Defines middleware with client event callback methods.
def __init__(self, method, callback_obj):
self._method = method
self.callback = callback_obj
def call_completed(self, exception):
print('callback: call_completed')
if(exception):
print(f" ...with exception: {exception}")
def sending_headers(self):
print('callback: sending_headers: ', self.callback.request_headers)
if len(self.callback.request_headers) > 0:
return dict(self.callback.request_headers)
def received_headers(self, headers):
self.callback.addResponseHeader(headers)
# For the DO_GET action, extract the trace ID from the response headers.
if str(self._method) == "FlightMethod.DO_GET" and "trace-id" in headers:
trace_id = headers["trace-id"][0]
self.callback.addTrace(trace_id)
from influxdb_client_3 import InfluxDBClient3
import secrets
def use_middleware_trace_header():
print('# Use Flight client middleware to enable tracing.')
# Instantiate the middleware.
res = TracingClientMiddleWareFactory()
# Instantiate the client, passing in the middleware instance that provides
# event callbacks for the request.
client = InfluxDBClient3(token=f"DATABASE_TOKEN",
host=f"{{< influxdb/host >}}",
database=f"DATABASE_NAME",
flight_client_options={"middleware": (res,)})
# Generate a trace ID for the query:
# 1. Generate a random 8-byte value as bytes.
# 2. Encode the value as hexadecimal.
random_bytes = secrets.token_bytes(8)
trace_id = random_bytes.hex()
res.createTrace(trace_id)
sql = "SELECT * FROM home WHERE time >= now() - INTERVAL '30 days'"
client.query(sql)
client.close()
assert trace_id in res.traces[0], "Expect trace ID in DoGet response."
use_middleware_trace_header()
```
{{% /code-placeholders %}}
<!---- END PYTHON WITH MIDDLEWARE ---->
{{% /tab-content %}}
{{< /tabs-wrapper >}}
Replace the following:
- {{% code-placeholder-key %}}`DATABASE_NAME`{{% /code-placeholder-key %}}: your {{% product-name %}} database
- {{% code-placeholder-key %}}`DATABASE_TOKEN`{{% /code-placeholder-key %}}: a [database token](/influxdb/cloud-dedicated/admin/tokens/) with sufficient permissions to the specified database
{{% note %}}
Store or log your query trace ID to ensure you can provide it to InfluxDB Support for troubleshooting.
{{% /note %}}
After you run your query with tracing enabled, do the following:
- Remove the tracing header from subsequent runs of the query (to [avoid unnecessary tracing](#avoid-unnecessary-tracing)).
- Provide the trace ID in a request to InfluxDB Support.
## Retrieve query information
In addition to the SQL standard `information_schema`, {{% product-name %}} contains _system_ tables that provide access to
InfluxDB-specific information.
The information in each system table is scoped to the namespace you're querying;
you can only retrieve system information for that particular instance.
To get information about queries you've run on the current instance, use SQL to query the [`system.queries` table](/influxdb/cloud-dedicated/reference/internals/system-tables/#systemqueries-measurement), which contains information from the querier instance currently handling queries.
If you [enabled trace logging for the query](#enable-trace-logging), the `trace-id` appears in the `system.queries.trace_id` column for the query.
The `system.queries` table is an InfluxDB v3 **debug feature**.
To enable the feature and query `system.queries`, include an `"iox-debug"` header set to `"true"` and use SQL to query the table.
The following sample code shows how to use the Python client library to do the following:
1. Enable tracing for a query.
2. Retrieve the trace ID record from `system.queries`.
<!-- Import for tests and hide from users.
```python
import os
```
-->
{{% code-placeholders "DATABASE_(NAME|TOKEN)|APP_REQUEST_ID" %}}
<!--pytest-codeblocks:cont-->
```python
from influxdb_client_3 import InfluxDBClient3
import secrets
import pandas
def get_query_information():
print('# Get query information')
client = InfluxDBClient3(token = f"DATABASE_TOKEN",
host = f"{{< influxdb/host >}}",
database = f"DATABASE_NAME")
random_bytes = secrets.token_bytes(16)
trace_id = random_bytes.hex()
trace_value = (f"{trace_id}:1112223334445:0:1").encode('utf-8')
sql = "SELECT * FROM home WHERE time >= now() - INTERVAL '30 days'"
try:
client.query(sql, headers=[(b'influx-trace-id', trace_value)])
client.close()
except Exception as e:
print("Query error: ", e)
client = InfluxDBClient3(token = f"DATABASE_TOKEN",
host = f"{{< influxdb/host >}}",
database = f"DATABASE_NAME")
import time
df = pandas.DataFrame()
for i in range(0, 5):
time.sleep(1)
# Use SQL
# To query the system.queries table for your trace ID, pass the following:
# - the iox-debug: true request header
# - an SQL query for the trace_id column
reader = client.query(f'''SELECT compute_duration, query_type, query_text,
success, trace_id
FROM system.queries
WHERE issue_time >= now() - INTERVAL '1 day'
AND trace_id = '{trace_id}'
ORDER BY issue_time DESC
''',
headers=[(b"iox-debug", b"true")],
mode="reader")
df = reader.read_all().to_pandas()
if df.shape[0]:
break
  assert df.shape == (1, 5), "Expect a row for the query trace ID."
print(df)
get_query_information()
```
{{% /code-placeholders %}}
The output is similar to the following:
```text
compute_duration query_type query_text success trace_id
0 days sql SELECT compute_duration, quer... True 67338...
```


@@ -24,6 +24,7 @@ Learn how to handle responses and troubleshoot errors encountered when querying
- [Internal Error: Received RST_STREAM](#internal-error-received-rst_stream)
- [Internal Error: stream terminated by RST_STREAM with NO_ERROR](#internal-error-stream-terminated-by-rst_stream-with-no_error)
- [Invalid Argument: Invalid ticket](#invalid-argument-invalid-ticket)
- [Timeout: Deadline exceeded](#timeout-deadline-exceeded)
- [Unauthenticated: Unauthenticated](#unauthenticated-unauthenticated)
- [Unauthorized: Permission denied](#unauthorized-permission-denied)
- [FlightUnavailableError: Could not get default pem root certs](#flightunavailableerror-could-not-get-default-pem-root-certs)
@@ -80,7 +81,8 @@ SELECT co, delete, hum, room, temp, time
The Python client library outputs the following schema representation:
```py
<!--pytest.mark.skip-->
```python
Schema:
co: int64
-- field metadata --
@@ -175,7 +177,7 @@ _For a list of gRPC codes that servers and clients may return, see [Status codes
**Example**:
```sh
```structuredtext
Flight returned internal error, with message: Received RST_STREAM with error code 2. gRPC client debug context: UNKNOWN:Error received from peer ipv4:34.196.233.7:443 {grpc_message:"Received RST_STREAM with error code 2"}
```
@@ -192,11 +194,12 @@ Flight returned internal error, with message: Received RST_STREAM with error cod
**Example**:
<!--pytest.mark.skip-->
```sh
pyarrow._flight.FlightInternalError: Flight returned internal error, with message: stream terminated by RST_STREAM with error code: NO_ERROR. gRPC client debug context: UNKNOWN:Error received from peer ipv4:3.123.149.45:443 {created_time:"2023-07-26T14:12:44.992317+02:00", grpc_status:13, grpc_message:"stream terminated by RST_STREAM with error code: NO_ERROR"}. Client context: OK
```
**Potential Reasons**:
**Potential reasons**:
- The server terminated the stream, but there wasn't any specific error associated with it.
- Possible network disruption, even if it's temporary.
@@ -208,21 +211,35 @@ pyarrow._flight.FlightInternalError: Flight returned internal error, with messag
**Example**:
<!--pytest.mark.skip-->
```sh
pyarrow.lib.ArrowInvalid: Flight returned invalid argument error, with message: Invalid ticket. Error: Invalid ticket. gRPC client debug context: UNKNOWN:Error received from peer ipv4:54.158.68.83:443 {created_time:"2023-08-31T17:56:42.909129-05:00", grpc_status:3, grpc_message:"Invalid ticket. Error: Invalid ticket"}. Client context: IOError: Server never sent a data message. Detail: Internal
```
**Potential Reasons**:
**Potential reasons**:
- The request is missing the database name or some other required metadata value.
- The request contains bad query syntax.
<!-- END -->
#### Timeout: Deadline exceeded
<!--pytest.mark.skip-->
```sh
pyarrow._flight.FlightTimedOutError: Flight returned timeout error, with message: Deadline Exceeded. gRPC client debug context: UNKNOWN:Deadline Exceeded {grpc_status:4, created_time:"2023-09-27T15:30:58.540385-05:00"}. Client context: IOError: Server never sent a data message. Detail: Internal
```
**Potential reasons**:
- The server's response time exceeded the number of seconds allowed by the client.
See how to specify `timeout` in [FlightCallOptions](https://arrow.apache.org/docs/python/generated/pyarrow.flight.FlightCallOptions.html#pyarrow.flight.FlightCallOptions).
#### Unauthenticated: Unauthenticated
**Example**:
<!--pytest.mark.skip-->
```sh
Flight returned unauthenticated error, with message: unauthenticated. gRPC client debug context: UNKNOWN:Error received from peer ipv4:34.196.233.7:443 {grpc_message:"unauthenticated", grpc_status:16, created_time:"2023-08-28T15:38:33.380633-05:00"}. Client context: IOError: Server never sent a data message. Detail: Internal
```
@@ -238,6 +255,7 @@ Flight returned unauthenticated error, with message: unauthenticated. gRPC clien
**Example**:
<!--pytest.mark.skip-->
```sh
pyarrow._flight.FlightUnauthorizedError: Flight returned unauthorized error, with message: Permission denied. gRPC client debug context: UNKNOWN:Error received from peer ipv4:54.158.68.83:443 {grpc_message:"Permission denied", grpc_status:7, created_time:"2023-08-31T17:51:08.271009-05:00"}. Client context: IOError: Server never sent a data message. Detail: Internal
```
@@ -254,6 +272,7 @@ pyarrow._flight.FlightUnauthorizedError: Flight returned unauthorized error, wit
If unable to locate a root certificate for _gRPC+TLS_, the Flight client returns errors similar to the following:
<!--pytest.mark.skip-->
```sh
UNKNOWN:Failed to load file... filename:"/usr/share/grpc/roots.pem",
children:[UNKNOWN:No such file or directory


@@ -16,34 +16,68 @@ related:
InfluxDB system measurements contain time series data used by and generated from the
InfluxDB internal monitoring system.
Each InfluxDB Cloud Dedicated namespace includes the following system measurements:
Each {{% product-name %}} namespace includes the following system measurements:
- [queries](#_queries-system-measurement)
<!-- TOC -->
## queries system measurement
- [system.queries measurement](#systemqueries-measurement)
- [system.queries schema](#systemqueries-schema)
## system.queries measurement
The `system.queries` measurement stores log entries for queries executed for the provided namespace (database) on the node that is currently handling queries.
The following example shows how to list queries recorded in the `system.queries` measurement:
```python
from influxdb_client_3 import InfluxDBClient3
client = InfluxDBClient3(token = DATABASE_TOKEN,
host = HOSTNAME,
org = '',
database=DATABASE_NAME)
client.query('select * from home')
reader = client.query('''
SELECT *
FROM system.queries
WHERE issue_time >= now() - INTERVAL '1 day'
AND query_text LIKE '%select * from home%'
''',
language='sql',
headers=[(b"iox-debug", b"true")],
mode="reader")
print("# system.queries schema\n")
print(reader.schema)
```
```sql
SELECT issue_time, query_type, query_text, success FROM system.queries;
<!--pytest-codeblocks:expected-output-->
`system.queries` has the following schema:
```python
# system.queries schema
issue_time: timestamp[ns] not null
query_type: string not null
query_text: string not null
completed_duration: duration[ns]
success: bool not null
trace_id: string
```
_When listing measurements (tables) available within a namespace, some clients and query tools may include the `queries` table in the list of namespace tables._
`system.queries` reflects a process-local, in-memory, namespace-scoped query log.
The query log isn't shared across instances within the same deployment.
While this table may be useful for debugging and monitoring queries, keep the following in mind:
- Records stored in `system.queries` are volatile.
- Records are lost on pod restarts.
- Queries for one namespace can evict records from another namespace.
- Data reflects the state of a specific pod answering queries for the namespace.
- Data reflects the state of a specific pod answering queries for the namespace; the log view is scoped to the requesting namespace and queries aren't leaked across namespaces.
- A query for records in `system.queries` can return different results depending on the pod the request was routed to.
**Data retention:** System data can be transient and is deleted on pod restarts.
The log size per instance is limited and the log view is scoped to the requesting namespace.
### queries measurement schema
### system.queries schema
- **system.queries** _(measurement)_
- **fields**:


@@ -26,6 +26,7 @@ related:
- /influxdb/cloud-serverless/query-data/sql/
- /influxdb/cloud-serverless/reference/influxql/
- /influxdb/cloud-serverless/reference/sql/
- /influxdb/cloud-serverless/query-data/execute-queries/troubleshoot/
list_code_example: |
```py
@@ -33,7 +34,7 @@ list_code_example: |
# Instantiate an InfluxDB client
client = InfluxDBClient3(
host='cloud2.influxdata.com',
host='{{< influxdb/host >}}',
token='DATABASE_TOKEN',
database='DATABASE_NAME'
)
@@ -306,7 +307,7 @@ and specify the following arguments:
#### Example {#execute-query-example}
The following examples show how to use SQL or InfluxQL to select all fields in a measurement, and then output the results formatted as a Markdown table.
The following example shows how to use SQL or InfluxQL to select all fields in a measurement, and then use PyArrow functions to extract metadata and aggregate data.
{{% code-tabs-wrapper %}}
{{% code-tabs %}}


@@ -0,0 +1,106 @@
---
title: Optimize queries
description: >
Optimize your SQL and InfluxQL queries to improve performance and reduce their memory and compute (CPU) requirements.
weight: 401
menu:
influxdb_cloud_serverless:
name: Optimize queries
parent: Execute queries
influxdb/cloud-serverless/tags: [query, sql, influxql]
related:
- /influxdb/cloud-serverless/query-data/sql/
- /influxdb/cloud-serverless/query-data/influxql/
- /influxdb/cloud-serverless/query-data/execute-queries/troubleshoot/
- /influxdb/cloud-serverless/reference/client-libraries/v3/
---
## Troubleshoot query performance
Use the following tools to help you identify performance bottlenecks and troubleshoot problems in queries:
<!-- TOC -->
- [Troubleshoot query performance](#troubleshoot-query-performance)
- [EXPLAIN and ANALYZE](#explain-and-analyze)
- [Enable trace logging](#enable-trace-logging)
<!-- /TOC -->
### EXPLAIN and ANALYZE
To view the query engine's execution plan and metrics for an SQL query, prepend [`EXPLAIN`](/influxdb/cloud-serverless/reference/sql/explain/) or [`EXPLAIN ANALYZE`](/influxdb/cloud-serverless/reference/sql/explain/#explain-analyze) to the query.
The report can reveal query bottlenecks such as a large number of table scans or parquet files, and can help triage the question, "Is the query slow due to the amount of work required or due to a problem with the schema, compactor, etc.?"
The following example shows how to use the InfluxDB v3 Python client library and pandas to view `EXPLAIN` and `EXPLAIN ANALYZE` results for a query:
<!-- Import for tests and hide from users.
```python
import os
```
-->
<!--pytest-codeblocks:cont-->
{{% code-placeholders "BUCKET_NAME|API_TOKEN|APP_REQUEST_ID" %}}
```python
from influxdb_client_3 import InfluxDBClient3
import pandas as pd
import tabulate # Required for pandas.to_markdown()
def explain_and_analyze():
print('Use SQL EXPLAIN and ANALYZE to view query plan information.')
# Instantiate an InfluxDB client.
client = InfluxDBClient3(token = f"API_TOKEN",
host = f"{{< influxdb/host >}}",
database = f"BUCKET_NAME")
sql_explain = '''EXPLAIN SELECT *
FROM home
WHERE time >= now() - INTERVAL '90 days'
ORDER BY time'''
table = client.query(sql_explain)
df = table.to_pandas()
sql_explain_analyze = '''EXPLAIN ANALYZE SELECT *
FROM home
WHERE time >= now() - INTERVAL '90 days'
ORDER BY time'''
table = client.query(sql_explain_analyze)
# Combine the Dataframes and output the plan information.
df = pd.concat([df, table.to_pandas()])
assert df.shape == (3, 2) and df.columns.to_list() == ['plan_type', 'plan']
print(df[['plan_type', 'plan']].to_markdown(index=False))
client.close()
explain_and_analyze()
```
{{% /code-placeholders %}}
Replace the following:
- {{% code-placeholder-key %}}`BUCKET_NAME`{{% /code-placeholder-key %}}: your {{% product-name %}} database
- {{% code-placeholder-key %}}`API_TOKEN`{{% /code-placeholder-key %}}: a [database token](/influxdb/cloud-serverless/admin/tokens/) with sufficient permissions to the specified database
The output is similar to the following:
```markdown
| plan_type | plan |
|:------------------|:---------------------------------------------------------------------------------------------------------------------------------------------|
| logical_plan | Sort: home.time ASC NULLS LAST |
| | TableScan: home projection=[co, hum, room, sensor, temp, time], full_filters=[home.time >= TimestampNanosecond(1688491380936276013, None)] |
| physical_plan | SortExec: expr=[time@5 ASC NULLS LAST] |
| | EmptyExec: produce_one_row=false |
| Plan with Metrics | SortExec: expr=[time@5 ASC NULLS LAST], metrics=[output_rows=0, elapsed_compute=1ns, spill_count=0, spilled_bytes=0] |
| | EmptyExec: produce_one_row=false, metrics=[]
```
### Enable trace logging
Customers with an {{% product-name %}} [annual or support contract](https://www.influxdata.com/influxdb-cloud-pricing/) can [contact InfluxData Support](https://support.influxdata.com/) to enable tracing and request help troubleshooting your query.
With tracing enabled, InfluxDB Support can trace system processes and analyze log information for a query instance.
The tracing system follows the [OpenTelemetry traces](https://opentelemetry.io/docs/concepts/signals/traces/) model for providing observability into a request.


@@ -197,7 +197,7 @@ Flight returned internal error, with message: Received RST_STREAM with error cod
pyarrow._flight.FlightInternalError: Flight returned internal error, with message: stream terminated by RST_STREAM with error code: NO_ERROR. gRPC client debug context: UNKNOWN:Error received from peer ipv4:3.123.149.45:443 {created_time:"2023-07-26T14:12:44.992317+02:00", grpc_status:13, grpc_message:"stream terminated by RST_STREAM with error code: NO_ERROR"}. Client context: OK
```
**Potential Reasons**:
**Potential reasons**:
- The server terminated the stream, but there wasn't any specific error associated with it.
- Possible network disruption, even if it's temporary.
@@ -213,7 +213,7 @@ pyarrow._flight.FlightInternalError: Flight returned internal error, with messag
ArrowInvalid: Flight returned invalid argument error, with message: bucket "otel5" not found. gRPC client debug context: UNKNOWN:Error received from peer ipv4:3.123.149.45:443 {grpc_message:"bucket \"otel5\" not found", grpc_status:3, created_time:"2023-08-09T16:37:30.093946+01:00"}. Client context: IOError: Server never sent a data message. Detail: Internal
```
**Potential reasons**:
- The specified bucket doesn't exist.
@@ -227,7 +227,7 @@ ArrowInvalid: Flight returned invalid argument error, with message: bucket "otel
pyarrow.lib.ArrowInvalid: Flight returned invalid argument error, with message: Invalid ticket. Error: Invalid ticket. gRPC client debug context: UNKNOWN:Error received from peer ipv4:54.158.68.83:443 {created_time:"2023-08-31T17:56:42.909129-05:00", grpc_status:3, grpc_message:"Invalid ticket. Error: Invalid ticket"}. Client context: IOError: Server never sent a data message. Detail: Internal
```
**Potential reasons**:
- The request is missing the bucket name or some other required metadata value.
- The request contains bad query syntax.


@@ -0,0 +1,115 @@
---
title: Optimize queries
description: >
Optimize your SQL and InfluxQL queries to improve performance and reduce their memory and compute (CPU) requirements.
weight: 401
menu:
influxdb_clustered:
name: Optimize queries
parent: Execute queries
influxdb/clustered/tags: [query, sql, influxql]
related:
- /influxdb/clustered/query-data/sql/
- /influxdb/clustered/query-data/influxql/
- /influxdb/clustered/query-data/execute-queries/troubleshoot/
- /influxdb/clustered/reference/client-libraries/v3/
---
Use the following tools to help you identify performance bottlenecks and troubleshoot problems in queries:
<!-- TOC -->
- [EXPLAIN and ANALYZE](#explain-and-analyze)
<!-- /TOC -->
### EXPLAIN and ANALYZE
To view the query engine's execution plan and metrics for an SQL query, prepend [`EXPLAIN`](/influxdb/clustered/reference/sql/explain/) or [`EXPLAIN ANALYZE`](/influxdb/clustered/reference/sql/explain/#explain-analyze) to the query.
The report can reveal query bottlenecks, such as a large number of table scans or Parquet files, and can help answer the question, "Is the query slow because of the amount of work required, or because of a problem with the schema, compactor, etc.?"
The following example shows how to use the InfluxDB v3 Python client library and pandas to view `EXPLAIN` and `EXPLAIN ANALYZE` results for a query:
<!-- Import for tests and hide from users.
```python
import os
```
-->
{{% code-placeholders "DATABASE_(NAME|TOKEN)" %}}
<!--pytest-codeblocks:cont-->
```python
from influxdb_client_3 import InfluxDBClient3
import pandas as pd
import tabulate # Required for pandas.to_markdown()
# Instantiate an InfluxDB client.
client = InfluxDBClient3(token=f"DATABASE_TOKEN",
                         host=f"{{< influxdb/host >}}",
                         database=f"DATABASE_NAME")
sql_explain = '''EXPLAIN
SELECT temp
FROM home
WHERE time >= now() - INTERVAL '90 days'
AND room = 'Kitchen'
ORDER BY time'''
table = client.query(sql_explain)
df = table.to_pandas()
print(df.to_markdown(index=False))
assert df.shape == (2, 2), f'Expect 2 rows and 2 columns; got {df.shape}'
assert 'physical_plan' in df.plan_type.values, "Expect physical_plan"
assert 'logical_plan' in df.plan_type.values, "Expect logical_plan"
```
{{< expand-wrapper >}}
{{% expand "View EXPLAIN example results" %}}
| plan_type | plan |
|:--------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| logical_plan | Projection: home.temp |
| | Sort: home.time ASC NULLS LAST |
| | Projection: home.temp, home.time |
| | TableScan: home projection=[room, temp, time], full_filters=[home.time >= TimestampNanosecond(1688676582918581320, None), home.room = Dictionary(Int32, Utf8("Kitchen"))] |
| physical_plan | ProjectionExec: expr=[temp@0 as temp] |
| | SortExec: expr=[time@1 ASC NULLS LAST] |
| | EmptyExec: produce_one_row=false |
{{% /expand %}}
{{< /expand-wrapper >}}
<!--pytest-codeblocks:cont-->
```python
sql_explain_analyze = '''EXPLAIN ANALYZE
SELECT *
FROM home
WHERE time >= now() - INTERVAL '90 days'
ORDER BY time'''
table = client.query(sql_explain_analyze)
df = table.to_pandas()
print(df.to_markdown(index=False))
assert df.shape == (1, 2), f'Expect 1 row and 2 columns; got {df.shape}'
assert 'Plan with Metrics' in df.plan_type.values, "Expect plan metrics"
client.close()
```
{{% /code-placeholders %}}
Replace the following:
- {{% code-placeholder-key %}}`DATABASE_NAME`{{% /code-placeholder-key %}}: your {{% product-name %}} database
- {{% code-placeholder-key %}}`DATABASE_TOKEN`{{% /code-placeholder-key %}}: a [database token](/influxdb/clustered/admin/tokens/) with sufficient permissions to the specified database
{{< expand-wrapper >}}
{{% expand "View EXPLAIN ANALYZE example results" %}}
| plan_type | plan |
|:------------------|:-----------------------------------------------------------------------------------------------------------------------|
| Plan with Metrics | ProjectionExec: expr=[temp@0 as temp], metrics=[output_rows=0, elapsed_compute=1ns] |
| | SortExec: expr=[time@1 ASC NULLS LAST], metrics=[output_rows=0, elapsed_compute=1ns, spill_count=0, spilled_bytes=0] |
| | EmptyExec: produce_one_row=false, metrics=[] |
{{% /expand %}}
{{< /expand-wrapper >}}
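If you run `EXPLAIN` regularly, you can also inspect the plan text programmatically for operators that indicate heavy work. The following is a minimal sketch; `plan_signals` is a hypothetical helper, not part of the client library, and the operator names are taken from the example plans above:

```python
def plan_signals(plan_text):
    """Count plan operators that often indicate heavy work in a query plan.

    Many TableScan operators, or ParquetExec operators spanning many files,
    can point to a query scanning more data than expected.
    """
    return {
        "table_scans": plan_text.count("TableScan"),
        "parquet_execs": plan_text.count("ParquetExec"),
    }

# Example: join the `plan` column of an EXPLAIN result DataFrame into one
# string, then scan it:
#   signals = plan_signals("\n".join(df["plan"].astype(str)))
```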


@@ -197,7 +197,7 @@ Flight returned internal error, with message: Received RST_STREAM with error cod
pyarrow._flight.FlightInternalError: Flight returned internal error, with message: stream terminated by RST_STREAM with error code: NO_ERROR. gRPC client debug context: UNKNOWN:Error received from peer ipv4:3.123.149.45:443 {created_time:"2023-07-26T14:12:44.992317+02:00", grpc_status:13, grpc_message:"stream terminated by RST_STREAM with error code: NO_ERROR"}. Client context: OK
```
**Potential reasons**:
- The server terminated the stream, but there wasn't any specific error associated with it.
- Possible network disruption, even if it's temporary.
@@ -213,7 +213,7 @@ pyarrow._flight.FlightInternalError: Flight returned internal error, with messag
pyarrow.lib.ArrowInvalid: Flight returned invalid argument error, with message: Invalid ticket. Error: Invalid ticket. gRPC client debug context: UNKNOWN:Error received from peer ipv4:54.158.68.83:443 {created_time:"2023-08-31T17:56:42.909129-05:00", grpc_status:3, grpc_message:"Invalid ticket. Error: Invalid ticket"}. Client context: IOError: Server never sent a data message. Detail: Internal
```
**Potential reasons**:
- The request is missing the database name or some other required metadata value.
- The request contains bad query syntax.


@@ -1,5 +1,6 @@
## Code sample dependencies
# Temporary fork for passing headers in query options.
influxdb3-python @ git+https://github.com/jstirnaman/influxdb3-python@4abd41c710e79f85333ba81258b10daff54d05b0
pandas
## Tabulate for printing pandas DataFrames.
tabulate