diff --git a/content/influxdb/cloud-dedicated/query-data/execute-queries/client-libraries/python.md b/content/influxdb/cloud-dedicated/query-data/execute-queries/client-libraries/python.md index 57717b767..d07a4cd74 100644 --- a/content/influxdb/cloud-dedicated/query-data/execute-queries/client-libraries/python.md +++ b/content/influxdb/cloud-dedicated/query-data/execute-queries/client-libraries/python.md @@ -25,6 +25,7 @@ related: - /influxdb/cloud-dedicated/query-data/sql/ - /influxdb/cloud-dedicated/reference/influxql/ - /influxdb/cloud-dedicated/reference/sql/ + - /influxdb/cloud-dedicated/query-data/execute-queries/troubleshoot/ list_code_example: | ```py @@ -305,7 +306,7 @@ and specify the following arguments: #### Example {#execute-query-example} -The following examples shows how to use SQL or InfluxQL to select all fields in a measurement, and then output the results formatted as a Markdown table. +The following example shows how to use SQL or InfluxQL to select all fields in a measurement, and then use PyArrow functions to extract metadata and aggregate data. {{% code-tabs-wrapper %}} {{% code-tabs %}} diff --git a/content/influxdb/cloud-dedicated/query-data/execute-queries/optimize-queries.md b/content/influxdb/cloud-dedicated/query-data/execute-queries/optimize-queries.md new file mode 100644 index 000000000..83f320b5d --- /dev/null +++ b/content/influxdb/cloud-dedicated/query-data/execute-queries/optimize-queries.md @@ -0,0 +1,442 @@ +--- +title: Optimize queries +description: > + Optimize your SQL and InfluxQL queries to improve performance and reduce their memory and compute (CPU) requirements. 
+weight: 401 +menu: + influxdb_cloud_dedicated: + name: Optimize queries + parent: Execute queries +influxdb/cloud-dedicated/tags: [query, sql, influxql] +related: + - /influxdb/cloud-dedicated/query-data/sql/ + - /influxdb/cloud-dedicated/query-data/influxql/ + - /influxdb/cloud-dedicated/query-data/execute-queries/troubleshoot/ + - /influxdb/cloud-dedicated/reference/client-libraries/v3/ +--- + +Use the following tools to help you identify performance bottlenecks and troubleshoot problems in queries: + + + +- [EXPLAIN and ANALYZE](#explain-and-analyze) +- [Enable trace logging](#enable-trace-logging) + - [Avoid unnecessary tracing](#avoid-unnecessary-tracing) + - [Syntax](#syntax) + - [Example](#example) + - [Tracing response header](#tracing-response-header) + - [Trace response header syntax](#trace-response-header-syntax) + - [Inspect Flight response headers](#inspect-flight-response-headers) +- [Retrieve query information](#retrieve-query-information) + + + +## EXPLAIN and ANALYZE + +To view the query engine's execution plan and metrics for an SQL or InfluxQL query, prepend [`EXPLAIN`](/influxdb/cloud-dedicated/reference/sql/explain/) or [`EXPLAIN ANALYZE`](/influxdb/cloud-dedicated/reference/sql/explain/#explain-analyze) to the query. +The report can reveal query bottlenecks such as a large number of table scans or parquet files, and can help triage the question, "Is the query slow due to the amount of work required or due to a problem with the schema, compactor, etc.?" + +The following example shows how to use the InfluxDB v3 Python client library and pandas to view `EXPLAIN` and `EXPLAIN ANALYZE` results for a query: + + + +{{% code-placeholders "DATABASE_(NAME|TOKEN)" %}} + + +```python +from influxdb_client_3 import InfluxDBClient3 +import pandas as pd +import tabulate # Required for pandas.to_markdown() + +# Instantiate an InfluxDB client. 
+client = InfluxDBClient3(token = f"DATABASE_TOKEN", + host = f"{{< influxdb/host >}}", + database = f"DATABASE_NAME") + +sql_explain = '''EXPLAIN + SELECT temp + FROM home + WHERE time >= now() - INTERVAL '90 days' + AND room = 'Kitchen' + ORDER BY time''' + +table = client.query(sql_explain) +df = table.to_pandas() +print(df.to_markdown(index=False)) + +assert df.shape == (2, 2), f'Expect {df.shape} to have 2 columns, 2 rows' +assert 'physical_plan' in df.plan_type.values, "Expect physical_plan" +assert 'logical_plan' in df.plan_type.values, "Expect logical_plan" +``` + +{{< expand-wrapper >}} +{{% expand "View EXPLAIN example results" %}} +| plan_type | plan | +|:--------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| logical_plan | Projection: home.temp | +| | Sort: home.time ASC NULLS LAST | +| | Projection: home.temp, home.time | +| | TableScan: home projection=[room, temp, time], full_filters=[home.time >= TimestampNanosecond(1688676582918581320, None), home.room = Dictionary(Int32, Utf8("Kitchen"))] | +| physical_plan | ProjectionExec: expr=[temp@0 as temp] | +| | SortExec: expr=[time@1 ASC NULLS LAST] | +| | EmptyExec: produce_one_row=false | +{{% /expand %}} +{{< /expand-wrapper >}} + + + +```python +sql_explain_analyze = '''EXPLAIN ANALYZE + SELECT * + FROM home + WHERE time >= now() - INTERVAL '90 days' + ORDER BY time''' + +table = client.query(sql_explain_analyze) +df = table.to_pandas() +print(df.to_markdown(index=False)) + +assert df.shape == (1,2) +assert 'Plan with Metrics' in df.plan_type.values, "Expect plan metrics" + +client.close() +``` +{{% /code-placeholders %}} + +Replace the following: + +- {{% code-placeholder-key %}}`DATABASE_NAME`{{% /code-placeholder-key %}}: your {{% product-name %}} database +- {{% code-placeholder-key %}}`DATABASE_TOKEN`{{% /code-placeholder-key %}}: a [database 
token](/influxdb/cloud-dedicated/admin/tokens/) with sufficient permissions to the specified database + +{{< expand-wrapper >}} +{{% expand "View EXPLAIN ANALYZE example results" %}} +| plan_type | plan | +|:------------------|:-----------------------------------------------------------------------------------------------------------------------| +| Plan with Metrics | ProjectionExec: expr=[temp@0 as temp], metrics=[output_rows=0, elapsed_compute=1ns] | +| | SortExec: expr=[time@1 ASC NULLS LAST], metrics=[output_rows=0, elapsed_compute=1ns, spill_count=0, spilled_bytes=0] | +| | EmptyExec: produce_one_row=false, metrics=[] +{{% /expand %}} +{{< /expand-wrapper >}} + +## Enable trace logging + +When you enable trace logging for a query, InfluxDB propagates your _trace ID_ through system processes and collects additional log information. + +InfluxDB Support can then use the trace ID that you provide to filter, collate, and analyze log information for the query run. +The tracing system follows the [OpenTelemetry traces](https://opentelemetry.io/docs/concepts/signals/traces/) model for providing observability into a request. + +{{% warn %}} +#### Avoid unnecessary tracing + +Only enable tracing for a query when you need to request troubleshooting help from InfluxDB Support. +To manage resources, InfluxDB has an upper limit for the number of trace requests. +Too many traces can cause InfluxDB to evict log information. +{{% /warn %}} + +To enable tracing for a query, include the `influx-trace-id` header in your query request. + +### Syntax + +Use the following syntax for the `influx-trace-id` header: + +```http +influx-trace-id: TRACE_ID:1112223334445:0:1 +``` + +In the header value, replace the following: + +- `TRACE_ID`: a unique string, 8-16 bytes long, encoded as hexadecimal (32 maximum hex characters). + The trace ID should uniquely identify the query run. 
+- `:1112223334445:0:1`: InfluxDB constant values (required, but ignored) + +### Example + +The following examples show how to create and pass a trace ID to enable query tracing in InfluxDB: + +{{< tabs-wrapper >}} +{{% tabs %}} +[Python with FlightCallOptions](#) +[Python with FlightClientMiddleware](#python-with-flightclientmiddleware) +{{% /tabs %}} +{{% tab-content %}} + +Use the `InfluxDBClient3` InfluxDB Python client and pass the `headers` argument in the +`query()` method. + + + +{{% code-placeholders "DATABASE_(NAME|TOKEN)|APP_REQUEST_ID" %}} + + + +```python +from influxdb_client_3 import InfluxDBClient3 +import secrets + +def use_flightcalloptions_trace_header(): + print('# Use FlightCallOptions to enable tracing.') + client = InfluxDBClient3(token=f"DATABASE_TOKEN", + host=f"{{< influxdb/host >}}", + database=f"DATABASE_NAME") + + # Generate a trace ID for the query: + # 1. Generate a random 8-byte value as bytes. + # 2. Encode the value as hexadecimal. + random_bytes = secrets.token_bytes(8) + trace_id = random_bytes.hex() + + # Append required constants to the trace ID. + trace_value = f"{trace_id}:1112223334445:0:1" + + # Encode the header key and value as bytes. + # Create a list of header tuples. + headers = [((b"influx-trace-id", trace_value.encode('utf-8')))] + + sql = "SELECT * FROM home WHERE time >= now() - INTERVAL '30 days'" + influxql = "SELECT * FROM home WHERE time >= -90d" + + # Use the query() headers argument to pass the list as FlightCallOptions. + client.query(sql, headers=headers) + + client.close() + +use_flightcalloptions_trace_header() +``` + +{{% /code-placeholders %}} + +{{% /tab-content %}} +{{% tab-content %}} + +Use the `InfluxDBClient3` InfluxDB Python client and `flight.ClientMiddleware` to pass and inspect headers. + +### Tracing response header + +With tracing enabled and a valid trace ID in the request, InfluxDB's `DoGet` action response contains a header with the trace ID that you sent. 
+
+#### Trace response header syntax
+
+```http
+trace-id: TRACE_ID
+```
+
+### Inspect Flight response headers
+
+To inspect Flight response headers when using a client library, pass a `FlightClientMiddleware` instance that defines a middleware callback function for the `onHeadersReceived` event (the particular function name you use depends on the client library language).
+
+The following example uses Python client middleware that adds request headers and extracts the trace ID from the `DoGet` response headers:
+
+{{% code-placeholders "DATABASE_(NAME|TOKEN)|APP_REQUEST_ID" %}}
+
+```python
+import pyarrow.flight as flight
+
+class TracingClientMiddleWareFactory(flight.ClientMiddlewareFactory):
+    # Defines a custom middleware factory that returns a middleware instance.
+    def __init__(self):
+        self.request_headers = []
+        self.response_headers = []
+        self.traces = []
+
+    def addRequestHeader(self, header):
+        self.request_headers.append(header)
+
+    def addResponseHeader(self, header):
+        self.response_headers.append(header)
+
+    def addTrace(self, traceid):
+        self.traces.append(traceid)
+
+    def createTrace(self, traceid):
+        # Append InfluxDB constants to the trace ID.
+        trace = f"{traceid}:1112223334445:0:1"
+
+        # To the list of request headers,
+        # add a tuple with the header key and value as bytes.
+        self.addRequestHeader((b"influx-trace-id", trace.encode('utf-8')))
+
+    def start_call(self, info):
+        return TracingClientMiddleware(info.method, self)
+
+class TracingClientMiddleware(flight.ClientMiddleware):
+    # Defines middleware with client event callback methods.
+ def __init__(self, method, callback_obj): + self._method = method + self.callback = callback_obj + + def call_completed(self, exception): + print('callback: call_completed') + if(exception): + print(f" ...with exception: {exception}") + + def sending_headers(self): + print('callback: sending_headers: ', self.callback.request_headers) + if len(self.callback.request_headers) > 0: + return dict(self.callback.request_headers) + + def received_headers(self, headers): + self.callback.addResponseHeader(headers) + # For the DO_GET action, extract the trace ID from the response headers. + if str(self._method) == "FlightMethod.DO_GET" and "trace-id" in headers: + trace_id = headers["trace-id"][0] + self.callback.addTrace(trace_id) + +from influxdb_client_3 import InfluxDBClient3 +import secrets + +def use_middleware_trace_header(): + print('# Use Flight client middleware to enable tracing.') + + # Instantiate the middleware. + res = TracingClientMiddleWareFactory() + + # Instantiate the client, passing in the middleware instance that provides + # event callbacks for the request. + client = InfluxDBClient3(token=f"DATABASE_TOKEN", + host=f"{{< influxdb/host >}}", + database=f"DATABASE_NAME", + flight_client_options={"middleware": (res,)}) + + # Generate a trace ID for the query: + # 1. Generate a random 8-byte value as bytes. + # 2. Encode the value as hexadecimal. + random_bytes = secrets.token_bytes(8) + trace_id = random_bytes.hex() + + res.createTrace(trace_id) + + sql = "SELECT * FROM home WHERE time >= now() - INTERVAL '30 days'" + + client.query(sql) + client.close() + assert trace_id in res.traces[0], "Expect trace ID in DoGet response." 
+
+use_middleware_trace_header()
+```
+{{% /code-placeholders %}}
+
+{{% /tab-content %}}
+{{< /tabs-wrapper >}}
+
+Replace the following:
+
+- {{% code-placeholder-key %}}`DATABASE_NAME`{{% /code-placeholder-key %}}: your {{% product-name %}} database
+- {{% code-placeholder-key %}}`DATABASE_TOKEN`{{% /code-placeholder-key %}}: a [database token](/influxdb/cloud-dedicated/admin/tokens/) with sufficient permissions to the specified database
+
+{{% note %}}
+Store or log your query trace ID to ensure you can provide it to InfluxDB Support for troubleshooting.
+{{% /note %}}
+
+After you run your query with tracing enabled, do the following:
+
+- Remove the tracing header from subsequent runs of the query (to [avoid unnecessary tracing](#avoid-unnecessary-tracing)).
+- Provide the trace ID in a request to InfluxDB Support.
+
+## Retrieve query information
+
+In addition to the SQL standard `information_schema`, {{% product-name %}} contains _system_ tables that provide access to
+InfluxDB-specific information.
+The information in each system table is scoped to the namespace you're querying;
+you can only retrieve system information for that particular instance.
+
+To get information about queries you've run on the current instance, use SQL to query the [`system.queries` table](/influxdb/cloud-dedicated/reference/internals/system-tables/#systemqueries-measurement), which contains information from the querier instance currently handling queries.
+If you [enabled trace logging for the query](#enable-trace-logging), the `trace-id` appears in the `system.queries.trace_id` column for the query.
+
+The `system.queries` table is an InfluxDB v3 **debug feature**.
+To enable the feature and query `system.queries`, include an `"iox-debug"` header set to `"true"` and use SQL to query the table.
+
+The following sample code shows how to use the Python client library to do the following:
+
+1. Enable tracing for a query.
+2. Retrieve the trace ID record from `system.queries`.
+
+{{% code-placeholders "DATABASE_(NAME|TOKEN)|APP_REQUEST_ID" %}}
+
+```python
+from influxdb_client_3 import InfluxDBClient3
+import secrets
+import time
+import pandas
+
+def get_query_information():
+    print('# Get query information')
+
+    client = InfluxDBClient3(token = f"DATABASE_TOKEN",
+                             host = f"{{< influxdb/host >}}",
+                             database = f"DATABASE_NAME")
+
+    random_bytes = secrets.token_bytes(16)
+    trace_id = random_bytes.hex()
+    trace_value = (f"{trace_id}:1112223334445:0:1").encode('utf-8')
+    sql = "SELECT * FROM home WHERE time >= now() - INTERVAL '30 days'"
+
+    try:
+        client.query(sql, headers=[(b'influx-trace-id', trace_value)])
+        client.close()
+    except Exception as e:
+        print("Query error: ", e)
+
+    client = InfluxDBClient3(token = f"DATABASE_TOKEN",
+                             host = f"{{< influxdb/host >}}",
+                             database = f"DATABASE_NAME")
+
+    df = pandas.DataFrame()
+
+    for i in range(0, 5):
+        time.sleep(1)
+        # Use SQL to query the system.queries table for your trace ID.
+        # Pass the following:
+        #   - the iox-debug: true request header
+        #   - an SQL query for the trace_id column
+        reader = client.query(f'''SELECT compute_duration, query_type, query_text,
+                                    success, trace_id
+                                  FROM system.queries
+                                  WHERE issue_time >= now() - INTERVAL '1 day'
+                                  AND trace_id = '{trace_id}'
+                                  ORDER BY issue_time DESC
+                               ''',
+                              headers=[(b"iox-debug", b"true")],
+                              mode="reader")
+
+        df = reader.read_all().to_pandas()
+        if df.shape[0]:
+            break
+
+    assert df.shape == (1, 5), "Expect a row for the query trace ID."
+    print(df)
+
+get_query_information()
+```
+{{% /code-placeholders %}}
+
+The output is similar to the following:
+
+```text
+compute_duration query_type                        query_text  success  trace_id
+          0 days        sql  SELECT compute_duration, quer...     True  67338...
+``` diff --git a/content/influxdb/cloud-dedicated/query-data/execute-queries/troubleshoot.md b/content/influxdb/cloud-dedicated/query-data/execute-queries/troubleshoot.md index 78b9fc2dd..8556702d0 100644 --- a/content/influxdb/cloud-dedicated/query-data/execute-queries/troubleshoot.md +++ b/content/influxdb/cloud-dedicated/query-data/execute-queries/troubleshoot.md @@ -24,6 +24,7 @@ Learn how to handle responses and troubleshoot errors encountered when querying - [Internal Error: Received RST_STREAM](#internal-error-received-rst_stream) - [Internal Error: stream terminated by RST_STREAM with NO_ERROR](#internal-error-stream-terminated-by-rst_stream-with-no_error) - [Invalid Argument: Invalid ticket](#invalid-argument-invalid-ticket) + - [Timeout: Deadline exceeded](#timeout-deadline-exceeded) - [Unauthenticated: Unauthenticated](#unauthenticated-unauthenticated) - [Unauthorized: Permission denied](#unauthorized-permission-denied) - [FlightUnavailableError: Could not get default pem root certs](#flightunavailableerror-could-not-get-default-pem-root-certs) @@ -80,7 +81,8 @@ SELECT co, delete, hum, room, temp, time The Python client library outputs the following schema representation: -```py + +```python Schema: co: int64 -- field metadata -- @@ -175,7 +177,7 @@ _For a list of gRPC codes that servers and clients may return, see [Status codes **Example**: -```sh +```structuredtext Flight returned internal error, with message: Received RST_STREAM with error code 2. gRPC client debug context: UNKNOWN:Error received from peer ipv4:34.196.233.7:443 {grpc_message:"Received RST_STREAM with error code 2"} ``` @@ -192,11 +194,12 @@ Flight returned internal error, with message: Received RST_STREAM with error cod **Example**: + ```sh pyarrow._flight.FlightInternalError: Flight returned internal error, with message: stream terminated by RST_STREAM with error code: NO_ERROR. 
gRPC client debug context: UNKNOWN:Error received from peer ipv4:3.123.149.45:443 {created_time:"2023-07-26T14:12:44.992317+02:00", grpc_status:13, grpc_message:"stream terminated by RST_STREAM with error code: NO_ERROR"}. Client context: OK ``` -**Potential Reasons**: +**Potential reasons**: - The server terminated the stream, but there wasn't any specific error associated with it. - Possible network disruption, even if it's temporary. @@ -208,21 +211,35 @@ pyarrow._flight.FlightInternalError: Flight returned internal error, with messag **Example**: + ```sh pyarrow.lib.ArrowInvalid: Flight returned invalid argument error, with message: Invalid ticket. Error: Invalid ticket. gRPC client debug context: UNKNOWN:Error received from peer ipv4:54.158.68.83:443 {created_time:"2023-08-31T17:56:42.909129-05:00", grpc_status:3, grpc_message:"Invalid ticket. Error: Invalid ticket"}. Client context: IOError: Server never sent a data message. Detail: Internal ``` -**Potential Reasons**: +**Potential reasons**: - The request is missing the database name or some other required metadata value. - The request contains bad query syntax. +#### Timeout: Deadline exceeded + + +```sh +pyarrow._flight.FlightTimedOutError: Flight returned timeout error, with message: Deadline Exceeded. gRPC client debug context: UNKNOWN:Deadline Exceeded {grpc_status:4, created_time:"2023-09-27T15:30:58.540385-05:00"}. Client context: IOError: Server never sent a data message. Detail: Internal +``` + +**Potential reasons**: + +- The server's response time exceeded the number of seconds allowed by the client. + See how to specify `timeout` in [FlightCallOptions](https://arrow.apache.org/docs/python/generated/pyarrow.flight.FlightCallOptions.html#pyarrow.flight.FlightCallOptions). + #### Unauthenticated: Unauthenticated **Example**: + ```sh Flight returned unauthenticated error, with message: unauthenticated. 
gRPC client debug context: UNKNOWN:Error received from peer ipv4:34.196.233.7:443 {grpc_message:"unauthenticated", grpc_status:16, created_time:"2023-08-28T15:38:33.380633-05:00"}. Client context: IOError: Server never sent a data message. Detail: Internal ``` @@ -238,6 +255,7 @@ Flight returned unauthenticated error, with message: unauthenticated. gRPC clien **Example**: + ```sh pyarrow._flight.FlightUnauthorizedError: Flight returned unauthorized error, with message: Permission denied. gRPC client debug context: UNKNOWN:Error received from peer ipv4:54.158.68.83:443 {grpc_message:"Permission denied", grpc_status:7, created_time:"2023-08-31T17:51:08.271009-05:00"}. Client context: IOError: Server never sent a data message. Detail: Internal ``` @@ -254,6 +272,7 @@ pyarrow._flight.FlightUnauthorizedError: Flight returned unauthorized error, wit If unable to locate a root certificate for _gRPC+TLS_, the Flight client returns errors similar to the following: + ```sh UNKNOWN:Failed to load file... filename:"/usr/share/grpc/roots.pem", children:[UNKNOWN:No such file or directory diff --git a/content/influxdb/cloud-dedicated/reference/internals/system-tables.md b/content/influxdb/cloud-dedicated/reference/internals/system-tables.md index e75d0d7af..84abaae47 100644 --- a/content/influxdb/cloud-dedicated/reference/internals/system-tables.md +++ b/content/influxdb/cloud-dedicated/reference/internals/system-tables.md @@ -16,34 +16,68 @@ related: InfluxDB system measurements contain time series data used by and generated from the InfluxDB internal monitoring system. 
-Each InfluxDB Cloud Dedicated namespace includes the following system measurements: +Each {{% product-name %}} namespace includes the following system measurements: -- [queries](#_queries-system-measurement) + -## queries system measurement +- [system.queries measurement](#systemqueries-measurement) + - [system.queries schema](#systemqueries-schema) + +## system.queries measurement The `system.queries` measurement stores log entries for queries executed for the provided namespace (database) on the node that is currently handling queries. -The following example shows how to list queries recorded in the `system.queries` measurement: +```python +from influxdb_client_3 import InfluxDBClient3 +client = InfluxDBClient3(token = DATABASE_TOKEN, + host = HOSTNAME, + org = '', + database=DATABASE_NAME) +client.query('select * from home') +reader = client.query(''' + SELECT * + FROM system.queries + WHERE issue_time >= now() - INTERVAL '1 day' + AND query_text LIKE '%select * from home%' + ''', + language='sql', + headers=[(b"iox-debug", b"true")], + mode="reader") +print("# system.queries schema\n") +print(reader.schema) +``` -```sql -SELECT issue_time, query_type, query_text, success FROM system.queries; + + +`system.queries` has the following schema: + +```python +# system.queries schema + +issue_time: timestamp[ns] not null +query_type: string not null +query_text: string not null +completed_duration: duration[ns] +success: bool not null +trace_id: string ``` _When listing measurements (tables) available within a namespace, some clients and query tools may include the `queries` table in the list of namespace tables._ `system.queries` reflects a process-local, in-memory, namespace-scoped query log. +The query log isn't shared across instances within the same deployment. While this table may be useful for debugging and monitoring queries, keep the following in mind: - Records stored in `system.queries` are volatile. - Records are lost on pod restarts. 
- Queries for one namespace can evict records from another namespace.
-- Data reflects the state of a specific pod answering queries for the namespace.
+- Data reflects the state of a specific pod answering queries for the namespace. The log view is scoped to the requesting namespace, and queries aren't leaked across namespaces.
- A query for records in `system.queries` can return different results depending on the pod the request was routed to.

**Data retention:** System data can be transient and is deleted on pod restarts.
+The log size per instance is limited and the log view is scoped to the requesting namespace.

-### queries measurement schema
+### system.queries schema

- **system.queries** _(measurement)_
  - **fields**:
diff --git a/content/influxdb/cloud-serverless/query-data/execute-queries/client-libraries/python.md b/content/influxdb/cloud-serverless/query-data/execute-queries/client-libraries/python.md
index c1e9dfbb4..0fea598b5 100644
--- a/content/influxdb/cloud-serverless/query-data/execute-queries/client-libraries/python.md
+++ b/content/influxdb/cloud-serverless/query-data/execute-queries/client-libraries/python.md
@@ -26,6 +26,7 @@ related:
  - /influxdb/cloud-serverless/query-data/sql/
  - /influxdb/cloud-serverless/reference/influxql/
  - /influxdb/cloud-serverless/reference/sql/
+  - /influxdb/cloud-serverless/query-data/execute-queries/troubleshoot/
 list_code_example: |
   ```py
@@ -33,7 +34,7 @@ list_code_example: |

  # Instantiate an InfluxDB client
  client = InfluxDBClient3(
-      host='cloud2.influxdata.com',
+      host='{{< influxdb/host >}}',
      token='DATABASE_TOKEN',
      database='DATABASE_NAME'
  )
@@ -306,7 +307,7 @@ and specify the following arguments:

 #### Example {#execute-query-example}

-The following examples show how to use SQL or InfluxQL to select all fields in a measurement, and then output the results formatted as a Markdown table. 
+The following example shows how to use SQL or InfluxQL to select all fields in a measurement, and then use PyArrow functions to extract metadata and aggregate data. {{% code-tabs-wrapper %}} {{% code-tabs %}} diff --git a/content/influxdb/cloud-serverless/query-data/execute-queries/optimize-queries.md b/content/influxdb/cloud-serverless/query-data/execute-queries/optimize-queries.md new file mode 100644 index 000000000..71dd821fa --- /dev/null +++ b/content/influxdb/cloud-serverless/query-data/execute-queries/optimize-queries.md @@ -0,0 +1,106 @@ +--- +title: Optimize queries +description: > + Optimize your SQL and InfluxQL queries to improve performance and reduce their memory and compute (CPU) requirements. +weight: 401 +menu: + influxdb_cloud_serverless: + name: Optimize queries + parent: Execute queries +influxdb/cloud-serverless/tags: [query, sql, influxql] +related: + - /influxdb/cloud-serverless/query-data/sql/ + - /influxdb/cloud-serverless/query-data/influxql/ + - /influxdb/cloud-serverless/query-data/execute-queries/troubleshoot/ + - /influxdb/cloud-serverless/reference/client-libraries/v3/ +--- + +## Troubleshoot query performance + +Use the following tools to help you identify performance bottlenecks and troubleshoot problems in queries: + + + +- [Troubleshoot query performance](#troubleshoot-query-performance) + - [EXPLAIN and ANALYZE](#explain-and-analyze) + - [Enable trace logging](#enable-trace-logging) + + + +### EXPLAIN and ANALYZE + +To view the query engine's execution plan and metrics for an SQL query, prepend [`EXPLAIN`](/influxdb/cloud-serverless/reference/sql/explain/) or [`EXPLAIN ANALYZE`](/influxdb/cloud-serverless/reference/sql/explain/#explain-analyze) to the query. +The report can reveal query bottlenecks such as a large number of table scans or parquet files, and can help triage the question, "Is the query slow due to the amount of work required or due to a problem with the schema, compactor, etc.?" 
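
Because plan output is plain text, you can also scan it programmatically for operators that suggest heavy work. The following sketch counts `ParquetExec` scan nodes in a plan; the plan string is an illustrative example, not output from a live query:

```python
# Count operators in EXPLAIN output that often indicate heavy work.
# The plan text below is illustrative, not output from a live query.
plan = """ProjectionExec: expr=[temp@0 as temp]
  SortExec: expr=[time@1 ASC NULLS LAST]
    ParquetExec: file_groups={2 groups}
    ParquetExec: file_groups={3 groups}"""

def count_operator(plan_text, operator):
    # Count the plan lines that start with the given operator name.
    return sum(line.strip().startswith(operator)
               for line in plan_text.splitlines())

print(count_operator(plan, "ParquetExec"))  # 2
```

A high scan count relative to the amount of data returned can point to schema or compaction issues rather than query complexity.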
+ +The following example shows how to use the InfluxDB v3 Python client library and pandas to view `EXPLAIN` and `EXPLAIN ANALYZE` results for a query: + + + +{{% code-placeholders "BUCKET_NAME|API_TOKEN|APP_REQUEST_ID" %}} +```python +from influxdb_client_3 import InfluxDBClient3 +import pandas as pd +import tabulate # Required for pandas.to_markdown() + +def explain_and_analyze(): + print('Use SQL EXPLAIN and ANALYZE to view query plan information.') + + # Instantiate an InfluxDB client. + client = InfluxDBClient3(token = f"API_TOKEN", + host = f"{{< influxdb/host >}}", + database = f"BUCKET_NAME") + + sql_explain = '''EXPLAIN SELECT * + FROM home + WHERE time >= now() - INTERVAL '90 days' + ORDER BY time''' + + table = client.query(sql_explain) + df = table.to_pandas() + + sql_explain_analyze = '''EXPLAIN ANALYZE SELECT * + FROM home + WHERE time >= now() - INTERVAL '90 days' + ORDER BY time''' + + table = client.query(sql_explain_analyze) + + # Combine the Dataframes and output the plan information. 
+ df = pd.concat([df, table.to_pandas()]) + + assert df.shape == (3, 2) and df.columns.to_list() == ['plan_type', 'plan'] + print(df[['plan_type', 'plan']].to_markdown(index=False)) + + client.close() + +explain_and_analyze() +``` +{{% /code-placeholders %}} + +Replace the following: + +- {{% code-placeholder-key %}}`BUCKET_NAME`{{% /code-placeholder-key %}}: your {{% product-name %}} database +- {{% code-placeholder-key %}}`API_TOKEN`{{% /code-placeholder-key %}}: a [database token](/influxdb/cloud-serverless/admin/tokens/) with sufficient permissions to the specified database + +The output is similar to the following: + +```markdown +| plan_type | plan | +|:------------------|:---------------------------------------------------------------------------------------------------------------------------------------------| +| logical_plan | Sort: home.time ASC NULLS LAST | +| | TableScan: home projection=[co, hum, room, sensor, temp, time], full_filters=[home.time >= TimestampNanosecond(1688491380936276013, None)] | +| physical_plan | SortExec: expr=[time@5 ASC NULLS LAST] | +| | EmptyExec: produce_one_row=false | +| Plan with Metrics | SortExec: expr=[time@5 ASC NULLS LAST], metrics=[output_rows=0, elapsed_compute=1ns, spill_count=0, spilled_bytes=0] | +| | EmptyExec: produce_one_row=false, metrics=[] +``` + +### Enable trace logging + +Customers with an {{% product-name %}} [annual or support contract](https://www.influxdata.com/influxdb-cloud-pricing/) can [contact InfluxData Support](https://support.influxdata.com/) to enable tracing and request help troubleshooting your query. +With tracing enabled, InfluxDB Support can trace system processes and analyze log information for a query instance. +The tracing system follows the [OpenTelemetry traces](https://opentelemetry.io/docs/concepts/signals/traces/) model for providing observability into a request. 
diff --git a/content/influxdb/cloud-serverless/query-data/execute-queries/troubleshoot.md b/content/influxdb/cloud-serverless/query-data/execute-queries/troubleshoot.md index bf93b79a5..c62c07512 100644 --- a/content/influxdb/cloud-serverless/query-data/execute-queries/troubleshoot.md +++ b/content/influxdb/cloud-serverless/query-data/execute-queries/troubleshoot.md @@ -197,7 +197,7 @@ Flight returned internal error, with message: Received RST_STREAM with error cod pyarrow._flight.FlightInternalError: Flight returned internal error, with message: stream terminated by RST_STREAM with error code: NO_ERROR. gRPC client debug context: UNKNOWN:Error received from peer ipv4:3.123.149.45:443 {created_time:"2023-07-26T14:12:44.992317+02:00", grpc_status:13, grpc_message:"stream terminated by RST_STREAM with error code: NO_ERROR"}. Client context: OK ``` -**Potential Reasons**: +**Potential reasons**: - The server terminated the stream, but there wasn't any specific error associated with it. - Possible network disruption, even if it's temporary. @@ -213,7 +213,7 @@ pyarrow._flight.FlightInternalError: Flight returned internal error, with messag ArrowInvalid: Flight returned invalid argument error, with message: bucket "otel5" not found. gRPC client debug context: UNKNOWN:Error received from peer ipv4:3.123.149.45:443 {grpc_message:"bucket \"otel5\" not found", grpc_status:3, created_time:"2023-08-09T16:37:30.093946+01:00"}. Client context: IOError: Server never sent a data message. Detail: Internal ``` -**Potential Reasons**: +**Potential reasons**: - The specified bucket doesn't exist. @@ -227,7 +227,7 @@ ArrowInvalid: Flight returned invalid argument error, with message: bucket "otel pyarrow.lib.ArrowInvalid: Flight returned invalid argument error, with message: Invalid ticket. Error: Invalid ticket. 
 gRPC client debug context: UNKNOWN:Error received from peer ipv4:54.158.68.83:443 {created_time:"2023-08-31T17:56:42.909129-05:00", grpc_status:3, grpc_message:"Invalid ticket. Error: Invalid ticket"}. Client context: IOError: Server never sent a data message. Detail: Internal
 ```
 
-**Potential Reasons**:
+**Potential reasons**:
 
 - The request is missing the bucket name or some other required metadata value.
 - The request contains bad query syntax.
diff --git a/content/influxdb/clustered/query-data/execute-queries/optimize-queries.md b/content/influxdb/clustered/query-data/execute-queries/optimize-queries.md
new file mode 100644
index 000000000..bc6cc067c
--- /dev/null
+++ b/content/influxdb/clustered/query-data/execute-queries/optimize-queries.md
@@ -0,0 +1,115 @@
+---
+title: Optimize queries
+description: >
+  Optimize your SQL and InfluxQL queries to improve performance and reduce their memory and compute (CPU) requirements.
+weight: 401
+menu:
+  influxdb_clustered:
+    name: Optimize queries
+    parent: Execute queries
+influxdb/clustered/tags: [query, sql, influxql]
+related:
+  - /influxdb/clustered/query-data/sql/
+  - /influxdb/clustered/query-data/influxql/
+  - /influxdb/clustered/query-data/execute-queries/troubleshoot/
+  - /influxdb/clustered/reference/client-libraries/v3/
+---
+
+Use the following tools to help you identify performance bottlenecks and troubleshoot problems in queries:
+
+- [EXPLAIN and ANALYZE](#explain-and-analyze)
+
+## EXPLAIN and ANALYZE
+
+To view the query engine's execution plan and metrics for an SQL query, prepend [`EXPLAIN`](/influxdb/clustered/reference/sql/explain/) or [`EXPLAIN ANALYZE`](/influxdb/clustered/reference/sql/explain/#explain-analyze) to the query.
+The report can reveal query bottlenecks such as a large number of table scans or Parquet files, and can help triage the question, "Is the query slow due to the amount of work required or due to a problem with the schema, compactor, etc.?"
+
+The following example shows how to use the InfluxDB v3 Python client library and pandas to view `EXPLAIN` and `EXPLAIN ANALYZE` results for a query:
+
+{{% code-placeholders "DATABASE_(NAME|TOKEN)" %}}
+
+```python
+from influxdb_client_3 import InfluxDBClient3
+import pandas as pd
+import tabulate  # Required for pandas.to_markdown()
+
+# Instantiate an InfluxDB client.
+client = InfluxDBClient3(token="DATABASE_TOKEN",
+                         host="{{< influxdb/host >}}",
+                         database="DATABASE_NAME")
+
+sql_explain = '''EXPLAIN
+                 SELECT temp
+                 FROM home
+                 WHERE time >= now() - INTERVAL '90 days'
+                 AND room = 'Kitchen'
+                 ORDER BY time'''
+
+table = client.query(sql_explain)
+df = table.to_pandas()
+print(df.to_markdown(index=False))
+
+assert df.shape == (2, 2), f'Expect 2 rows and 2 columns, got {df.shape}'
+assert 'physical_plan' in df.plan_type.values, "Expect physical_plan"
+assert 'logical_plan' in df.plan_type.values, "Expect logical_plan"
+```
+
+{{< expand-wrapper >}}
+{{% expand "View EXPLAIN example results" %}}
+| plan_type     | plan                                                                                                                                                                      |
+|:--------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| logical_plan  | Projection: home.temp                                                                                                                                                     |
+|               | Sort: home.time ASC NULLS LAST                                                                                                                                            |
+|               | Projection: home.temp, home.time                                                                                                                                          |
+|               | TableScan: home projection=[room, temp, time], full_filters=[home.time >= TimestampNanosecond(1688676582918581320, None), home.room = Dictionary(Int32, Utf8("Kitchen"))] |
+| physical_plan | ProjectionExec: expr=[temp@0 as temp]                                                                                                                                     |
+|               | SortExec: expr=[time@1 ASC NULLS LAST]                                                                                                                                    |
+|               | EmptyExec: produce_one_row=false                                                                                                                                          |
+{{% /expand %}}
+{{< /expand-wrapper >}}
+
+```python
+sql_explain_analyze = '''EXPLAIN ANALYZE
+                         SELECT *
+                         FROM home
+                         WHERE time >= now() - INTERVAL '90 days'
+                         ORDER BY time'''
+
+table = client.query(sql_explain_analyze)
+df = table.to_pandas()
+print(df.to_markdown(index=False))
+
+assert df.shape == (1, 2), f'Expect 1 row and 2 columns, got {df.shape}'
+assert 'Plan with Metrics' in df.plan_type.values, "Expect plan metrics"
+
+client.close()
+```
+{{% /code-placeholders %}}
+
+Replace the following:
+
+- {{% code-placeholder-key %}}`DATABASE_NAME`{{% /code-placeholder-key %}}: your {{% product-name %}} database
+- {{% code-placeholder-key %}}`DATABASE_TOKEN`{{% /code-placeholder-key %}}: a [database token](/influxdb/clustered/admin/tokens/) with sufficient permissions to the specified database
+
+{{< expand-wrapper >}}
+{{% expand "View EXPLAIN ANALYZE example results" %}}
+| plan_type         | plan                                                                                                                 |
+|:------------------|:-------------------------------------------------------------------------------------------------------------------|
+| Plan with Metrics | ProjectionExec: expr=[temp@0 as temp], metrics=[output_rows=0, elapsed_compute=1ns]                                  |
+|                   | SortExec: expr=[time@1 ASC NULLS LAST], metrics=[output_rows=0, elapsed_compute=1ns, spill_count=0, spilled_bytes=0] |
+|                   | EmptyExec: produce_one_row=false, metrics=[]                                                                         |
+{{% /expand %}}
+{{< /expand-wrapper >}}
diff --git a/content/influxdb/clustered/query-data/execute-queries/troubleshoot.md b/content/influxdb/clustered/query-data/execute-queries/troubleshoot.md
index f9b636eb6..be5ec2fac 100644
--- a/content/influxdb/clustered/query-data/execute-queries/troubleshoot.md
+++ b/content/influxdb/clustered/query-data/execute-queries/troubleshoot.md
@@ -197,7 +197,7 @@ Flight returned internal error, with message: Received RST_STREAM with error cod
 pyarrow._flight.FlightInternalError: Flight returned internal error, with message: stream terminated by RST_STREAM with error code: NO_ERROR. gRPC client debug context: UNKNOWN:Error received from peer ipv4:3.123.149.45:443 {created_time:"2023-07-26T14:12:44.992317+02:00", grpc_status:13, grpc_message:"stream terminated by RST_STREAM with error code: NO_ERROR"}.
 Client context: OK
 ```
 
-**Potential Reasons**:
+**Potential reasons**:
 
 - The server terminated the stream, but there wasn't any specific error associated with it.
 - Possible network disruption, even if it's temporary.
@@ -213,7 +213,7 @@ pyarrow._flight.FlightInternalError: Flight returned internal error, with messag
 pyarrow.lib.ArrowInvalid: Flight returned invalid argument error, with message: Invalid ticket. Error: Invalid ticket. gRPC client debug context: UNKNOWN:Error received from peer ipv4:54.158.68.83:443 {created_time:"2023-08-31T17:56:42.909129-05:00", grpc_status:3, grpc_message:"Invalid ticket. Error: Invalid ticket"}. Client context: IOError: Server never sent a data message. Detail: Internal
 ```
 
-**Potential Reasons**:
+**Potential reasons**:
 
 - The request is missing the database name or some other required metadata value.
 - The request contains bad query syntax.
diff --git a/test/requirements.txt b/test/requirements.txt
index 2c42bb011..33d7077c5 100644
--- a/test/requirements.txt
+++ b/test/requirements.txt
@@ -1,5 +1,6 @@
 ## Code sample dependencies
-influxdb3-python
+# Temporary fork for passing headers in query options.
+influxdb3-python @ git+https://github.com/jstirnaman/influxdb3-python@4abd41c710e79f85333ba81258b10daff54d05b0
 pandas
 ## Tabulate for printing pandas DataFrames.
 tabulate
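The `requirements.txt` change above pins the client to a specific commit of a fork using a PEP 508 direct reference (`<name> @ <url>`, with the pinned revision after the last `@` in the URL). A hedged sketch of splitting such a requirement line into its package name and pinned revision — useful for verifying the pin, though the variable names here are purely illustrative:

```python
# The direct-reference requirement line from test/requirements.txt above.
requirement = ("influxdb3-python @ "
               "git+https://github.com/jstirnaman/influxdb3-python"
               "@4abd41c710e79f85333ba81258b10daff54d05b0")

# PEP 508 direct reference: "<name> @ <url>"; the git revision follows the last "@".
name, url = (part.strip() for part in requirement.split(" @ ", 1))
revision = url.rsplit("@", 1)[1]
print(name, revision)  # influxdb3-python 4abd41c710e79f85333ba81258b10daff54d05b0
```

The same `name @ git+URL@revision` string also works directly with `pip install`, which is how a temporary fork like this one can be tried out before it lands in `requirements.txt`.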