Merge branch 'master' into jts-askai-group-filters
commit 2d3f0e44dc
|
|
@@ -38,7 +38,6 @@ cluster efficiency.
|
||||||
- [Migrate to specialized nodes](#migrate-to-specialized-nodes)
|
- [Migrate to specialized nodes](#migrate-to-specialized-nodes)
|
||||||
- [Manage configurations](#manage-configurations)
|
- [Manage configurations](#manage-configurations)
|
||||||
|
|
||||||
|
|
||||||
## Specialize nodes for specific workloads
|
## Specialize nodes for specific workloads
|
||||||
|
|
||||||
In an {{% product-name %}} cluster, you can dedicate nodes to specific tasks:
|
In an {{% product-name %}} cluster, you can dedicate nodes to specific tasks:
|
||||||
|
|
@@ -65,6 +64,7 @@ influxdb3 serve --mode=all
|
||||||
```
|
```
|
||||||
|
|
||||||
Available modes:
|
Available modes:
|
||||||
|
|
||||||
- `all`: All capabilities enabled (default)
|
- `all`: All capabilities enabled (default)
|
||||||
- `ingest`: Data ingestion and line protocol parsing
|
- `ingest`: Data ingestion and line protocol parsing
|
||||||
- `query`: Query execution and data retrieval
|
- `query`: Query execution and data retrieval
|
||||||
|
|
@@ -103,6 +103,7 @@ influxdb3 \
|
||||||
```
|
```
|
||||||
|
|
||||||
**Configuration rationale:**
|
**Configuration rationale:**
|
||||||
|
|
||||||
- **12 IO threads**: Handle multiple concurrent writers (Telegraf agents, applications)
|
- **12 IO threads**: Handle multiple concurrent writers (Telegraf agents, applications)
|
||||||
- **20 DataFusion threads**: Required for data snapshot operations that convert buffered writes to Parquet files
|
- **20 DataFusion threads**: Required for data snapshot operations that convert buffered writes to Parquet files
|
||||||
- **60% memory pool**: Balance between write buffers and data snapshot operations
|
- **60% memory pool**: Balance between write buffers and data snapshot operations
|
||||||
|
|
@@ -126,6 +127,7 @@ du -sh /path/to/data/wal/
|
||||||
```
|
```
|
||||||
|
|
||||||
> [!Important]
|
> [!Important]
|
||||||
|
>
|
||||||
> #### Scale IO threads with concurrent writers
|
> #### Scale IO threads with concurrent writers
|
||||||
>
|
>
|
||||||
> If you see only 2 CPU cores at 100% on a large ingester, increase
|
> If you see only 2 CPU cores at 100% on a large ingester, increase
|
||||||
|
|
@@ -158,6 +160,7 @@ influxdb3 \
|
||||||
```
|
```
|
||||||
|
|
||||||
**Configuration rationale:**
|
**Configuration rationale:**
|
||||||
|
|
||||||
- **4 IO threads**: Minimal, just for HTTP request handling
|
- **4 IO threads**: Minimal, just for HTTP request handling
|
||||||
- **60 DataFusion threads**: Maximum parallelism for query execution
|
- **60 DataFusion threads**: Maximum parallelism for query execution
|
||||||
- **90% memory pool**: Maximize memory for complex aggregations
|
- **90% memory pool**: Maximize memory for complex aggregations
|
||||||
|
|
@@ -211,6 +214,7 @@ influxdb3 \
|
||||||
```
|
```
|
||||||
|
|
||||||
**Configuration rationale:**
|
**Configuration rationale:**
|
||||||
|
|
||||||
- **2 IO threads**: Minimal, compaction is DataFusion-intensive
|
- **2 IO threads**: Minimal, compaction is DataFusion-intensive
|
||||||
- **30 DataFusion threads**: Maximum threads for sort/merge operations
|
- **30 DataFusion threads**: Maximum threads for sort/merge operations
|
||||||
- **24h gen2 duration**: Time-based compaction strategy
|
- **24h gen2 duration**: Time-based compaction strategy
|
||||||
|
|
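The thread counts in the three "Configuration rationale" lists appear to split a fixed core budget between IO and DataFusion threads (12 + 20, 4 + 60, and 2 + 30). The following is a minimal sketch of that arithmetic, assuming the two pools together should equal the node's core count; the source implies this but does not state it, and the core counts (32, 64, 32) are inferred from the sums. The helper name `split_threads` is hypothetical and is not part of the `influxdb3` CLI.

```bash
#!/usr/bin/env bash
# Hypothetical helper: given a node's total cores and the number of IO
# threads, print how many threads remain for DataFusion.
# Assumption: IO threads + DataFusion threads = total cores.
split_threads() {
  local total_cores=$1 io_threads=$2
  if (( io_threads >= total_cores )); then
    echo "IO threads must be fewer than total cores" >&2
    return 1
  fi
  echo $(( total_cores - io_threads ))
}

split_threads 32 12   # ingest profile from the rationale above
split_threads 64 4    # query profile
split_threads 32 2    # compactor profile
```

On a machine with a different core count, the same subtraction gives a starting point only; the right split depends on the workload, so treat the result as a first guess to tune.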
@@ -396,6 +400,7 @@ GROUP BY table_name;
|
||||||
```
|
```
|
||||||
|
|
||||||
#### Query nodes
|
#### Query nodes
|
||||||
|
|
||||||
```sql
|
```sql
|
||||||
-- Monitor query performance
|
-- Monitor query performance
|
||||||
SELECT
|
SELECT
|
||||||
|
|
@@ -408,6 +413,7 @@ WHERE issue_time > now() - INTERVAL '5 minutes'
|
||||||
```
|
```
|
||||||
|
|
||||||
#### Compactor nodes
|
#### Compactor nodes
|
||||||
|
|
||||||
```sql
|
```sql
|
||||||
-- Monitor compaction progress
|
-- Monitor compaction progress
|
||||||
SELECT
|
SELECT
|
||||||
|
|
@@ -447,6 +453,7 @@ curl -X POST "http://query-01:8181/api/v3/query_sql" \
|
||||||
```
|
```
|
||||||
|
|
||||||
> [!Tip]
|
> [!Tip]
|
||||||
|
>
|
||||||
> ### Extend monitoring with plugins
|
> ### Extend monitoring with plugins
|
||||||
>
|
>
|
||||||
> Enhance your cluster monitoring capabilities using the InfluxDB 3 processing engine. The [InfluxDB 3 plugins library](https://github.com/influxdata/influxdb3_plugins) includes several monitoring and alerting plugins:
|
> Enhance your cluster monitoring capabilities using the InfluxDB 3 processing engine. The [InfluxDB 3 plugins library](https://github.com/influxdata/influxdb3_plugins) includes several monitoring and alerting plugins:
|
||||||
|
|
@@ -466,6 +473,7 @@ Use the [monitoring queries](#monitor-cluster-wide-metrics) to identify the foll
|
||||||
#### High CPU with low throughput (Ingest nodes)
|
#### High CPU with low throughput (Ingest nodes)
|
||||||
|
|
||||||
**Detection query:**
|
**Detection query:**
|
||||||
|
|
||||||
```sql
|
```sql
|
||||||
-- Check for high failed query rate indicating parsing issues
|
-- Check for high failed query rate indicating parsing issues
|
||||||
SELECT
|
SELECT
|
||||||
|
|
@@ -477,6 +485,7 @@ WHERE issue_time > now() - INTERVAL '5 minutes';
|
||||||
```
|
```
|
||||||
|
|
||||||
**Symptoms:**
|
**Symptoms:**
|
||||||
|
|
||||||
- Only 2 CPU cores at 100% on large machines
|
- Only 2 CPU cores at 100% on large machines
|
||||||
- High write latency despite available resources
|
- High write latency despite available resources
|
||||||
- Failed queries due to parsing timeouts
|
- Failed queries due to parsing timeouts
|
||||||
|
|
@@ -486,6 +495,7 @@ WHERE issue_time > now() - INTERVAL '5 minutes';
|
||||||
#### Memory pressure alerts (Query nodes)
|
#### Memory pressure alerts (Query nodes)
|
||||||
|
|
||||||
**Detection query:**
|
**Detection query:**
|
||||||
|
|
||||||
```sql
|
```sql
|
||||||
-- Monitor queries with high memory usage or failures
|
-- Monitor queries with high memory usage or failures
|
||||||
SELECT
|
SELECT
|
||||||
|
|
@@ -498,6 +508,7 @@ WHERE issue_time > now() - INTERVAL '5 minutes'
|
||||||
```
|
```
|
||||||
|
|
||||||
**Symptoms:**
|
**Symptoms:**
|
||||||
|
|
||||||
- Queries failing with out-of-memory errors
|
- Queries failing with out-of-memory errors
|
||||||
- High memory usage approaching pool limits
|
- High memory usage approaching pool limits
|
||||||
- Slow query execution times
|
- Slow query execution times
|
||||||
|
|
@@ -507,6 +518,7 @@ WHERE issue_time > now() - INTERVAL '5 minutes'
|
||||||
#### Compaction falling behind (Compactor nodes)
|
#### Compaction falling behind (Compactor nodes)
|
||||||
|
|
||||||
**Detection query:**
|
**Detection query:**
|
||||||
|
|
||||||
```sql
|
```sql
|
||||||
-- Check compaction event frequency and success rate
|
-- Check compaction event frequency and success rate
|
||||||
SELECT
|
SELECT
|
||||||
|
|
@@ -519,6 +531,7 @@ GROUP BY event_type;
|
||||||
```
|
```
|
||||||
|
|
||||||
**Symptoms:**
|
**Symptoms:**
|
||||||
|
|
||||||
- Decreasing compaction event frequency
|
- Decreasing compaction event frequency
|
||||||
- Growing number of small Parquet files
|
- Growing number of small Parquet files
|
||||||
- Increasing query times due to file fragmentation
|
- Increasing query times due to file fragmentation
|
||||||
|
|
@@ -530,6 +543,7 @@ GROUP BY event_type;
|
||||||
### Ingest node issues
|
### Ingest node issues
|
||||||
|
|
||||||
**Problem**: Low throughput despite available CPU
|
**Problem**: Low throughput despite available CPU
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
# Check: Are only 2 cores busy?
|
# Check: Are only 2 cores busy?
|
||||||
top -H -p $(pgrep influxdb3)
|
top -H -p $(pgrep influxdb3)
|
||||||
|
|
@@ -539,6 +553,7 @@ top -H -p $(pgrep influxdb3)
|
||||||
```
|
```
|
||||||
|
|
||||||
**Problem**: Data snapshot creation affecting ingest
|
**Problem**: Data snapshot creation affecting ingest
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
# Check: DataFusion threads at 100% during data snapshots to Parquet
|
# Check: DataFusion threads at 100% during data snapshots to Parquet
|
||||||
# Solution: Reserve more DataFusion threads for snapshot operations
|
# Solution: Reserve more DataFusion threads for snapshot operations
|
||||||
|
|
@@ -548,6 +563,7 @@ top -H -p $(pgrep influxdb3)
|
||||||
### Query node issues
|
### Query node issues
|
||||||
|
|
||||||
**Problem**: Slow queries despite resources
|
**Problem**: Slow queries despite resources
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
# Check: Memory pressure
|
# Check: Memory pressure
|
||||||
free -h
|
free -h
|
||||||
|
|
@@ -557,6 +573,7 @@ free -h
|
||||||
```
|
```
|
||||||
|
|
||||||
**Problem**: Poor cache hit rates
|
**Problem**: Poor cache hit rates
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
# Solution: Increase Parquet cache
|
# Solution: Increase Parquet cache
|
||||||
--parquet-mem-cache-size=10GB
|
--parquet-mem-cache-size=10GB
|
||||||
|
|
@@ -565,6 +582,7 @@ free -h
|
||||||
### Compactor node issues
|
### Compactor node issues
|
||||||
|
|
||||||
**Problem**: Compaction falling behind
|
**Problem**: Compaction falling behind
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
# Check: Compaction queue length
|
# Check: Compaction queue length
|
||||||
# Solution: Add more compactor nodes or increase threads
|
# Solution: Add more compactor nodes or increase threads
|
||||||
|
|
@@ -603,6 +621,7 @@ node3: --mode=compact --num-io-threads=2
|
||||||
|
|
||||||
## Manage configurations
|
## Manage configurations
|
||||||
|
|
||||||
|
<!--
|
||||||
### Use configuration files
|
### Use configuration files
|
||||||
|
|
||||||
Create node-specific configuration files:
|
Create node-specific configuration files:
|
||||||
|
|
@@ -629,6 +648,7 @@ Launch with configuration:
|
||||||
```bash
|
```bash
|
||||||
influxdb3 serve --config ingester.toml
|
influxdb3 serve --config ingester.toml
|
||||||
```
|
```
|
||||||
|
-->
|
||||||
|
|
||||||
### Configure using environment variables
|
### Configure using environment variables
|
||||||
|
|
||||||
|
|
|
||||||