Learn how to set appropriate query timeouts for InfluxDB 3 to balance performance and resource protection.

Query timeouts prevent resource monopolization while allowing legitimate queries to complete successfully.
The key is finding the "goldilocks zone": timeouts that are not too short (causing legitimate queries to fail) and not too long (allowing runaway queries to monopolize resources).

- [Understanding query timeouts](#understanding-query-timeouts)
- [How query routing affects timeout strategy](#how-query-routing-affects-timeout-strategy)
- [Timeout configuration best practices](#timeout-configuration-best-practices)
- [InfluxDB 3 client library examples](#influxdb-3-client-library-examples)
- [Monitoring and troubleshooting](#monitoring-and-troubleshooting)

## Understanding query timeouts

Query timeouts define the maximum duration a query can run before being canceled.
In {{% product-name %}}, timeouts serve multiple purposes:

- **Resource protection**: Prevent runaway queries from monopolizing system resources
- **Performance optimization**: Ensure responsive system behavior for time-sensitive operations
- **Cost control**: Limit compute resource consumption
- **User experience**: Provide predictable response times for applications and dashboards

Query execution time includes network latency, query planning, data retrieval, processing, and result serialization.

### The "goldilocks zone" for query timeouts

Optimal timeouts are:
- **Long enough**: To accommodate normal query execution under typical load
- **Short enough**: To prevent resource monopolization and provide reasonable feedback
- **Adaptive**: Adjusted based on query type, system load, and historical performance

## How query routing affects timeout strategy

InfluxDB 3 uses round-robin query routing to balance load across multiple queriers.
This creates a "checkout line" effect that influences timeout strategy.

> [!Note]
> #### Concurrent query execution
>
> InfluxDB 3 supports concurrent query execution, which helps minimize the impact of intensive or inefficient queries.
> However, you should still use appropriate timeouts and optimize your queries for best performance.

### The checkout line analogy

Consider a grocery store with multiple checkout lines:
- Customers (queries) are distributed across lines (queriers)
- A slow customer (long-running query) can block others in the same line
- More checkout lines (queriers) provide more alternatives when retrying

If one querier is unhealthy or has been hijacked by a "noisy neighbor" query (one that is excessively resource hungry), giving up sooner may save time. It's like jumping to a cashier with no customers in line. However, if all queriers are overloaded, short timeouts with quick retries may exacerbate the problem: you wouldn't jump to the end of another line if the cashier is already starting to scan your items.

### Noisy neighbor effects

In distributed systems:
- A single long-running query can impact other queries on the same querier
- Shorter timeouts with retries can help queries find less congested queriers
- The effectiveness depends on the number of available queriers

### When shorter timeouts help

- **Multiple queriers available**: Retries can find less congested queriers
- **Uneven load distribution**: Some queriers may be significantly less busy
- **Temporary congestion**: Brief spikes in query load or resource usage

### When shorter timeouts hurt

- **Few queriers**: Limited alternatives for retries
- **System-wide congestion**: All queriers are equally busy
- **Expensive query planning**: High overhead for query preparation

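
As a rough illustration of this trade-off, the following toy model sends each attempt to the next querier in strict rotation and ignores all other traffic. The function name, the latency numbers, and the routing model itself are assumptions made for this sketch; it is not a description of how {{% product-name %}} schedules queries internally.

```python
from itertools import cycle


def time_to_result(latencies, timeout, max_attempts):
    """Return total seconds until a result arrives, or None if every attempt times out.

    latencies: hypothetical seconds each querier would need to answer the query.
    """
    router = cycle(range(len(latencies)))  # strict round-robin over queriers
    elapsed = 0.0
    for _ in range(max_attempts):
        latency = latencies[next(router)]
        if latency <= timeout:
            return elapsed + latency
        elapsed += timeout  # the attempt was abandoned after `timeout` seconds
    return None


# One congested querier: a short timeout lets the retry land on a healthy one.
print(time_to_result([50, 2, 2], timeout=5, max_attempts=3))    # 7.0
# Every querier congested: short timeouts only discard work already in progress.
print(time_to_result([50, 50, 50], timeout=5, max_attempts=3))  # None
print(time_to_result([50, 50, 50], timeout=60, max_attempts=3)) # 50.0
```

In the first case the retry costs 5 seconds but avoids a 50-second wait. In the second case the same short timeout never returns a result, while a longer timeout would have succeeded on the first attempt.
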
## Timeout configuration best practices

### Make timeouts adjustable

Configure timeouts that can be modified without service restarts using environment variables, configuration files, runtime APIs, or per-query overrides. Design your client applications to easily adjust timeouts on the fly, allowing you to respond quickly to performance changes and test different timeout strategies without code changes.

See the [InfluxDB 3 client library examples](#influxdb-3-client-library-examples)
for how to configure timeouts in Python.

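
One possible way to keep timeouts out of application code is to read them from the environment with safe fallbacks. This is a minimal sketch: the `QUERY_TIMEOUT_*_SECONDS` variable names and the `load_timeouts` helper are hypothetical, not settings that InfluxDB or its client libraries recognize.

```python
import os

# Fallback values in seconds; the class names are illustrative.
DEFAULTS = {"ui": 10.0, "api": 60.0, "batch": 300.0}


def load_timeouts(env=os.environ):
    """Return per-class timeouts in seconds, letting environment variables override defaults."""
    timeouts = {}
    for name, default in DEFAULTS.items():
        # Hypothetical variable name, for example QUERY_TIMEOUT_UI_SECONDS
        raw = env.get(f"QUERY_TIMEOUT_{name.upper()}_SECONDS")
        try:
            value = float(raw) if raw is not None else default
        except ValueError:
            value = default  # ignore unparsable values
        # Reject zero, negative, and NaN values
        timeouts[name] = value if value > 0 else default
    return timeouts


timeouts = load_timeouts()
```

Environment variables are typically read when a process starts, so this approach avoids code changes but not necessarily restarts. To change values in a running process, the same function could read from a configuration file or a runtime API instead.
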
### Use tiered timeout strategies

Implement different timeout classes based on query characteristics.

#### Starting point recommendations

{{% hide-in "cloud-serverless" %}}
| Query Type | Recommended Timeout | Use Case | Rationale |
|------------|-------------------|-----------|-----------|
| UI and dashboard | 10 seconds | Interactive dashboards, real-time monitoring | Users expect immediate feedback |
| Generic default | 60 seconds | Application queries, APIs | Balances performance and reliability |
| Mixed workload | 2 minutes | Development, testing environments | Accommodates various query types |
| Analytical and background | 5 minutes | Reports, batch processing, ETL operations | Complex queries need more time |
{{% /hide-in %}}

{{% show-in "cloud-serverless" %}}
| Query Type | Recommended Timeout | Use Case | Rationale |
|------------|-------------------|-----------|-----------|
| UI and dashboard | 10 seconds | Interactive dashboards, real-time monitoring | Users expect immediate feedback |
| Generic default | 30 seconds | Application queries, APIs | Serverless optimized for shorter queries |
| Mixed workload | 60 seconds | Development, testing environments | Limited by serverless execution model |
| Analytical and background | 2 minutes | Reports, batch processing | Complex queries within serverless limits |
{{% /show-in %}}

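
A tiered strategy can be as simple as a lookup from query class to timeout, with room for a per-query override. The sketch below is illustrative: the `QueryClass` names and the `timeout_for` helper are hypothetical, and the numbers are placeholders that you should replace with the starting points recommended for your deployment.

```python
from enum import Enum


class QueryClass(Enum):
    UI = "ui"
    DEFAULT = "default"
    MIXED = "mixed"
    ANALYTICAL = "analytical"


# Placeholder timeouts in seconds; adjust to your deployment's recommendations.
TIMEOUT_SECONDS = {
    QueryClass.UI: 10,
    QueryClass.DEFAULT: 60,
    QueryClass.MIXED: 120,
    QueryClass.ANALYTICAL: 300,
}


def timeout_for(query_class, override=None):
    """Return the timeout for a query class, unless the caller supplies an override."""
    if override is not None:
        return override
    return TIMEOUT_SECONDS.get(query_class, TIMEOUT_SECONDS[QueryClass.DEFAULT])


dashboard_timeout = timeout_for(QueryClass.UI)
one_off_report_timeout = timeout_for(QueryClass.ANALYTICAL, override=45)
```

Keeping the mapping in one place makes it easier to tune a whole class of queries at once.
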
{{% show-in "enterprise, core" %}}
> [!Tip]
> #### Use caching
> Where immediate feedback is crucial, consider using [Last Value Cache](/influxdb3/version/admin/manage-last-value-caches/) to speed up queries for recent values and [Distinct Value Cache](/influxdb3/version/admin/manage-distinct-value-caches/) to speed up queries for distinct values.
{{% /show-in %}}

### Implement progressive timeout and retry logic

Consider using more sophisticated retry strategies rather than simple fixed retries:

1. **Exponential backoff**: Increase the delay between retry attempts
2. **Jitter**: Add randomness to retry delays to prevent thundering herd effects
3. **Circuit breakers**: Stop retrying when the system is overloaded
4. **Deadline propagation**: Respect the overall operation deadline across all attempts

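
The following sketch combines three of these ideas: exponential backoff, jitter (the "full jitter" variant, which picks a random delay between zero and the exponential ceiling), and an overall deadline. The `backoff_delays` and `run_with_deadline` helpers are hypothetical names, and `operation` stands in for any callable that accepts a timeout in seconds. A circuit breaker is left out to keep the example short.

```python
import random
import time


def backoff_delays(base=1.0, cap=30.0, attempts=4, rng=random.random):
    """Yield one full-jitter delay (seconds) per attempt, capped at `cap`."""
    for attempt in range(attempts):
        ceiling = min(cap, base * 2 ** attempt)
        yield rng() * ceiling


def run_with_deadline(operation, deadline_seconds, attempt_timeout,
                      attempts=4, sleep=time.sleep, clock=time.monotonic):
    """Retry operation(timeout) until it succeeds, attempts run out, or the deadline passes."""
    deadline = clock() + deadline_seconds
    last_error = None
    for attempt, delay in enumerate(backoff_delays(attempts=attempts)):
        remaining = deadline - clock()
        if remaining <= 0:
            break
        try:
            # Never give one attempt more time than the overall deadline allows.
            return operation(min(attempt_timeout, remaining))
        except Exception as error:  # narrow this to timeout errors in real code
            last_error = error
        if attempt < attempts - 1:
            # Do not sleep past the deadline.
            sleep(min(delay, max(0.0, deadline - clock())))
    raise TimeoutError("operation did not succeed before the deadline") from last_error
```

With a client like the one in the examples later on this page, `operation` could be a small function such as `lambda t: client.query(query=sql, timeout=t)`.
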
### Warning signs

Consider these indicators that timeouts may need adjustment:

- **Timeouts > 10 minutes**: Usually indicate [query optimization](/influxdb3/version/query-data/troubleshoot-and-optimize/optimize-queries/) opportunities
- **High retry rates**: May indicate timeouts are too aggressive
- **Resource utilization spikes**: Long-running queries may need shorter timeouts
- **User complaints**: Reports of slow or failed queries suggest rebalancing performance and user experience

### Environment-specific considerations

- **Development**: Use longer timeouts for debugging
- **Production**: Use shorter timeouts with monitoring
- **Cost-sensitive**: Use aggressive timeouts and [query optimization](/influxdb3/version/query-data/troubleshoot-and-optimize/optimize-queries/)

### Experimental and ad-hoc queries

When introducing a new query to your application or when issuing ad-hoc queries to a database with many users, your query might be the "noisy neighbor" (the shopping cart overloaded with groceries). By setting a tighter timeout on experimental queries, you can reduce the impact on other users.

## InfluxDB 3 client library examples

### Python client with timeout configuration

Configure timeouts in the InfluxDB 3 Python client:

```python { placeholders="DATABASE_NAME|HOST_URL|AUTH_TOKEN" }
from influxdb_client_3 import InfluxDBClient3

# Configure different timeout classes (in seconds)
ui_timeout = 10      # For dashboard queries
api_timeout = 60     # For application queries
batch_timeout = 300  # For analytical queries

# Create client with default timeout
client = InfluxDBClient3(
    host="https://{{< influxdb/host >}}",
    database="DATABASE_NAME",
    token="AUTH_TOKEN",
    timeout=api_timeout  # Python client uses seconds
)

# Quick query with short timeout
def query_latest_data():
    try:
        result = client.query(
            query="SELECT * FROM sensors WHERE time >= now() - INTERVAL '5 minutes' ORDER BY time DESC LIMIT 10",
            timeout=ui_timeout
        )
        return result.to_pandas()
    except Exception as e:
        print(f"Quick query failed: {e}")
        return None

# Analytical query with longer timeout
def query_daily_averages():
    query = """
    SELECT
      DATE_TRUNC('day', time) as day,
      room,
      AVG(temperature) as avg_temp,
      COUNT(*) as readings
    FROM sensors
    WHERE time >= now() - INTERVAL '30 days'
    GROUP BY DATE_TRUNC('day', time), room
    ORDER BY day DESC, room
    """

    try:
        result = client.query(
            query=query,
            timeout=batch_timeout
        )
        return result.to_pandas()
    except Exception as e:
        print(f"Analytical query failed: {e}")
        return None
```

Replace the following:

{{% hide-in "cloud-serverless" %}}
- {{% code-placeholder-key %}}`DATABASE_NAME`{{% /code-placeholder-key %}}: the name of the database to query{{% /hide-in %}}
{{% show-in "cloud-serverless" %}}
- {{% code-placeholder-key %}}`DATABASE_NAME`{{% /code-placeholder-key %}}: the name of the bucket to query{{% /show-in %}}
{{% show-in "clustered,cloud-dedicated" %}}
- {{% code-placeholder-key %}}`AUTH_TOKEN`{{% /code-placeholder-key %}}: a [database token](/influxdb3/clustered/admin/tokens/#database-tokens) with _read_ access to the specified database.{{% /show-in %}}
{{% show-in "cloud-serverless" %}}
- {{% code-placeholder-key %}}`AUTH_TOKEN`{{% /code-placeholder-key %}}: an [API token](/influxdb3/cloud-serverless/admin/tokens/) with _read_ access to the specified bucket.{{% /show-in %}}
{{% show-in "enterprise,core" %}}
- {{% code-placeholder-key %}}`AUTH_TOKEN`{{% /code-placeholder-key %}}: your {{% token-link "database" %}} with read permissions on the specified database{{% /show-in %}}

### Basic retry logic implementation

Implement simple retry strategies with progressive timeouts:

```python
import time


def query_with_retry(client, query: str, initial_timeout: int = 60, max_retries: int = 2):
    """Execute query with basic retry and progressive timeout increase"""

    for attempt in range(max_retries + 1):
        # Progressive timeout: add 30 seconds on each retry
        timeout_seconds = initial_timeout + attempt * 30

        try:
            result = client.query(
                query=query,
                timeout=timeout_seconds
            )
            return result

        except Exception as e:
            # Out of retries: report and re-raise the last error
            if attempt == max_retries:
                print(f"Query failed after {max_retries + 1} attempts: {e}")
                raise

            # Simple linear backoff delay: 2s, 4s, ...
            delay = 2 * (attempt + 1)
            # Timeout that the next attempt will use
            next_timeout = initial_timeout + (attempt + 1) * 30
            print(f"Query attempt {attempt + 1} failed: {e}")
            print(f"Retrying in {delay} seconds with timeout {next_timeout}s...")
            time.sleep(delay)


# Usage example
# `client` is an InfluxDB 3 Python client instance, created as in the previous example
result = query_with_retry(
    client=client,
    query="SELECT * FROM large_table WHERE time >= now() - INTERVAL '1 day'",
    initial_timeout=60,
    max_retries=2
)
```

## Monitoring and troubleshooting

### Key metrics to monitor

Track these essential timeout-related metrics:

- **Query duration percentiles**: P50, P95, P99 execution times
- **Timeout rate**: Percentage of queries that time out
- **Error rates**: Timeout errors vs. other failure types
- **Resource utilization**: CPU and memory usage during query execution

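
If your application records how long each query takes, the first two metrics can be computed with the Python standard library. This is a minimal sketch: the `summarize` helper and its inputs are hypothetical, and it assumes you collect durations (in seconds) for completed queries and a count of timed-out queries yourself.

```python
import statistics


def summarize(durations, timed_out):
    """Summarize query timings.

    durations: seconds taken by each completed query (at least two values).
    timed_out: number of queries that hit their timeout.
    """
    # With n=100, quantiles() returns 99 cut points; index i-1 is the i-th percentile.
    cuts = statistics.quantiles(durations, n=100, method="inclusive")
    total = len(durations) + timed_out
    return {
        "p50": cuts[49],
        "p95": cuts[94],
        "p99": cuts[98],
        "timeout_rate": timed_out / total,
    }


stats = summarize([0.8, 1.2, 0.9, 4.5, 1.1, 0.7, 12.0, 1.0], timed_out=2)
print(stats)
```

A P99 close to the configured timeout, combined with a rising timeout rate, is one signal that either the timeout or the queries need attention.
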
### Common timeout issues

#### High timeout rates

**Symptoms**: Many queries exceed their timeout limits

**Common causes**:
- Timeouts set too aggressively for query complexity
- System resource constraints
- Inefficient query patterns

**Solutions**:
1. Analyze query performance patterns
2. [Optimize slow queries](/influxdb3/version/query-data/troubleshoot-and-optimize/optimize-queries/) or increase timeouts appropriately
3. Scale system resources

#### Inconsistent query performance

**Symptoms**: The same queries are sometimes fast and sometimes time out

**Common causes**:

- Resource contention from concurrent queries
- Data compaction state (queries may be faster after compaction completes)

**Solutions**:

1. Analyze query patterns to identify and optimize slow queries
2. Implement retry logic with exponential backoff in your client applications
3. Adjust timeout values based on observed query performance patterns
{{% show-in "enterprise,core" %}}
4. Implement [Last Value Cache](/influxdb3/version/admin/manage-last-value-caches/) to speed up queries for recent values
5. Implement [Distinct Value Cache](/influxdb3/version/admin/manage-distinct-value-caches/) to speed up queries for distinct values
{{% /show-in %}}

> [!Note]
> Regular analysis of timeout patterns helps identify optimization opportunities and system scaling needs.