Merge branch 'master' into jts-askai-group-filters

jts-askai-group-filters
Jason Stirnaman 2025-11-12 15:13:33 -05:00 committed by GitHub
commit 2d3f0e44dc
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194
1 changed file with 24 additions and 4 deletions

@ -38,7 +38,6 @@ cluster efficiency.
- [Migrate to specialized nodes](#migrate-to-specialized-nodes)
- [Manage configurations](#manage-configurations)
## Specialize nodes for specific workloads
In an {{% product-name %}} cluster, you can dedicate nodes to specific tasks:
@ -65,6 +64,7 @@ influxdb3 serve --mode=all
```
Available modes:
- `all`: All capabilities enabled (default)
- `ingest`: Data ingestion and line protocol parsing
- `query`: Query execution and data retrieval
@ -103,6 +103,7 @@ influxdb3 \
```
**Configuration rationale:**
- **12 IO threads**: Handle multiple concurrent writers (Telegraf agents, applications)
- **20 DataFusion threads**: Required for data snapshot operations that convert buffered writes to Parquet files
- **60% memory pool**: Balance between write buffers and data snapshot operations
@ -126,6 +127,7 @@ du -sh /path/to/data/wal/
```
> [!Important]
>
> #### Scale IO threads with concurrent writers
>
> If you see only 2 CPU cores at 100% on a large ingester, increase
@ -158,6 +160,7 @@ influxdb3 \
```
**Configuration rationale:**
- **4 IO threads**: Minimal, just for HTTP request handling
- **60 DataFusion threads**: Maximum parallelism for query execution
- **90% memory pool**: Maximize memory for complex aggregations
@ -211,6 +214,7 @@ influxdb3 \
```
**Configuration rationale:**
- **2 IO threads**: Minimal, compaction is DataFusion-intensive
- **30 DataFusion threads**: Maximum threads for sort/merge operations
- **24h gen2 duration**: Time-based compaction strategy
@ -396,6 +400,7 @@ GROUP BY table_name;
```
#### Query nodes
```sql
-- Monitor query performance
SELECT
@ -408,6 +413,7 @@ WHERE issue_time > now() - INTERVAL '5 minutes'
```
#### Compactor nodes
```sql
-- Monitor compaction progress
SELECT
@ -447,6 +453,7 @@ curl -X POST "http://query-01:8181/api/v3/query_sql" \
```
> [!Tip]
>
> ### Extend monitoring with plugins
>
> Enhance your cluster monitoring capabilities using the InfluxDB 3 processing engine. The [InfluxDB 3 plugins library](https://github.com/influxdata/influxdb3_plugins) includes several monitoring and alerting plugins:
@ -466,6 +473,7 @@ Use the [monitoring queries](#monitor-cluster-wide-metrics) to identify the foll
#### High CPU with low throughput (Ingest nodes)
**Detection query:**
```sql
-- Check for high failed query rate indicating parsing issues
SELECT
@ -477,6 +485,7 @@ WHERE issue_time > now() - INTERVAL '5 minutes';
```
**Symptoms:**
- Only 2 CPU cores at 100% on large machines
- High write latency despite available resources
- Failed queries due to parsing timeouts
@ -486,6 +495,7 @@ WHERE issue_time > now() - INTERVAL '5 minutes';
#### Memory pressure alerts (Query nodes)
**Detection query:**
```sql
-- Monitor queries with high memory usage or failures
SELECT
@ -498,6 +508,7 @@ WHERE issue_time > now() - INTERVAL '5 minutes'
```
**Symptoms:**
- Queries failing with out-of-memory errors
- High memory usage approaching pool limits
- Slow query execution times
@ -507,6 +518,7 @@ WHERE issue_time > now() - INTERVAL '5 minutes'
#### Compaction falling behind (Compactor nodes)
**Detection query:**
```sql
-- Check compaction event frequency and success rate
SELECT
@ -519,6 +531,7 @@ GROUP BY event_type;
```
**Symptoms:**
- Decreasing compaction event frequency
- Growing number of small Parquet files
- Increasing query times due to file fragmentation
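The "growing number of small Parquet files" symptom can also be spot-checked from the shell when the object store is a local directory. The following is a minimal sketch under that assumption; the function name `count_small_parquet`, the directory argument, and the 1024 KiB default threshold are all hypothetical and not part of the original guide.

```bash
#!/usr/bin/env bash
# Count Parquet files below a size threshold under a directory.
# Assumption: the object store is a local filesystem path.
# The 1024 KiB default is an illustrative value, not a documented one.
count_small_parquet() {
  local dir="$1"
  local max_kb="${2:-1024}"
  # -size -Nk matches files whose size, rounded up to KiB, is below N.
  find "$dir" -type f -name '*.parquet' -size "-${max_kb}k" | wc -l | tr -d ' '
}

# Example (hypothetical path):
# count_small_parquet /path/to/data 1024
```

Running the function periodically and comparing the counts gives a rough trend; a steadily rising count would be consistent with compaction falling behind.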
@ -530,6 +543,7 @@ GROUP BY event_type;
### Ingest node issues
**Problem**: Low throughput despite available CPU
```bash
# Check: Are only 2 cores busy?
top -H -p $(pgrep influxdb3)
@ -539,6 +553,7 @@ top -H -p $(pgrep influxdb3)
```
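To make the "only 2 cores busy" check less dependent on reading an interactive `top` screen, the per-thread output can be counted in batch mode. This is a sketch only: the helper name `count_busy_threads` and the 90% default threshold are assumptions, and it relies on the default procps-ng `top` column layout, where `%CPU` is the ninth field.

```bash
#!/usr/bin/env bash
# Count threads at or above a CPU threshold in `top -H -b -n 1` output (stdin).
# Assumption: default procps-ng columns, so %CPU is field 9.
count_busy_threads() {
  local threshold="${1:-90}"
  awk -v thr="$threshold" '
    # Only count rows that come after the "PID USER ..." header row.
    seen_header && NF >= 9 && ($9 + 0) >= thr { busy++ }
    $1 == "PID" { seen_header = 1 }
    END { print busy + 0 }
  '
}

# Example (assumes a single influxdb3 process on the host):
# top -H -b -n 1 -p "$(pgrep -o influxdb3)" | count_busy_threads 90
```

A result that stays at 2 on a machine with many cores would match the symptom described for ingest nodes.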
**Problem**: Data snapshot creation affecting ingest
```bash
# Check: DataFusion threads at 100% during data snapshots to Parquet
# Solution: Reserve more DataFusion threads for snapshot operations
@ -548,6 +563,7 @@ top -H -p $(pgrep influxdb3)
### Query node issues
**Problem**: Slow queries despite resources
```bash
# Check: Memory pressure
free -h
@ -557,6 +573,7 @@ free -h
```
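For scripted checks, the memory-pressure reading can be reduced to a single number. The sketch below assumes the procps-ng `free` layout in which the `Mem:` row carries an `available` column as its seventh field; the helper name `mem_available_pct` is hypothetical.

```bash
#!/usr/bin/env bash
# Print available memory as a whole-number percentage of total memory.
# Reads `free -b` style output on stdin.
# Assumption: the Mem: row has "total" in field 2 and "available" in field 7.
mem_available_pct() {
  awk '$1 == "Mem:" && $2 > 0 { printf "%d\n", ($7 * 100) / $2 }'
}

# Example:
# free -b | mem_available_pct
```

What counts as "too low" depends on the memory pool size configured for the node, so any alert threshold is left to the operator.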
**Problem**: Poor cache hit rates
```bash
# Solution: Increase Parquet cache
--parquet-mem-cache-size=10GB
@ -565,6 +582,7 @@ free -h
### Compactor node issues
**Problem**: Compaction falling behind
```bash
# Check: Compaction queue length
# Solution: Add more compactor nodes or increase threads
@ -603,6 +621,7 @@ node3: --mode=compact --num-io-threads=2
## Manage configurations
<!--
### Use configuration files
Create node-specific configuration files:
@ -629,6 +648,7 @@ Launch with configuration:
```bash
influxdb3 serve --config ingester.toml
```
-->
### Configure using environment variables