Sync plugin documentation: basic_transformation, downsampler (#6784)

* sync: update plugin documentation from influxdb3_plugins@main

Plugins: basic_transformation, downsampler

* Apply suggestions from code review

---------

Co-authored-by: jstirnaman <jstirnaman@users.noreply.github.com>
Co-authored-by: Jason Stirnaman <jstirnaman@influxdata.com>
copilot/fix-typo-day-to-doy
github-actions[bot] 2026-02-02 15:13:57 -06:00 committed by GitHub
parent 7f28155055
commit 30637d8ae8
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194
4 changed files with 1390 additions and 1234 deletions

View File

@ -1,35 +1,42 @@
The Basic Transformation Plugin enables real-time and scheduled transformation of time series data in InfluxDB 3.
Transform field and tag names, convert values between units, and apply custom string replacements to standardize or clean your data.
The plugin supports both scheduled batch processing of historical data and real-time transformation as data is written.
The Basic Transformation Plugin enables real-time and scheduled transformation of time series data in {{% product-name %}}. Transform field and tag names, convert values between units, and apply custom string replacements to standardize or clean your data. The plugin supports both scheduled batch processing of historical data and real-time transformation as data is written.
## Configuration
Plugin parameters may be specified as key-value pairs in the `--trigger-arguments` flag (CLI) or in the `trigger_arguments` field (API) when creating a trigger. Some plugins support TOML configuration files, which can be specified using the plugin's `config_file_path` parameter.
If a plugin supports multiple trigger specifications, some parameters may depend on the trigger specification that you use.
### Plugin metadata
This plugin includes a JSON metadata schema in its docstring that defines supported trigger types and configuration parameters.
### Required parameters
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `measurement` | string | required | Source measurement containing data to transform |
| `target_measurement` | string | required | Destination measurement for transformed data |
| `target_database` | string | current database | Database for storing transformed data |
| `dry_run` | string | `"false"` | When `"true"`, logs transformations without writing |
| Parameter | Type | Default | Description |
|----------------------|--------|------------------|---------------------------------------------------|
| `measurement` | string | required | Source measurement containing data to transform |
| `target_measurement` | string | required | Destination measurement for transformed data |
| `target_database` | string | current database | Database for storing transformed data |
| `dry_run` | string | "false" | When "true", logs transformations without writing |
### Transformation parameters
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `names_transformations` | string | none | Field/tag name transformation rules. Format: `'field1:"transform1 transform2".field2:"transform3"'` |
| `values_transformations` | string | none | Field value transformation rules. Format: `'field1:"transform1".field2:"transform2"'` |
| `custom_replacements` | string | none | Custom string replacements. Format: `'rule_name:"find=replace"'` |
| `custom_regex` | string | none | Regex patterns for field matching. Format: `'pattern_name:"temp%"'` |
| Parameter | Type | Default | Description |
|--------------------------|--------|---------|-----------------------------------------------------------------------------------------------------|
| `names_transformations` | string | none | Field/tag name transformation rules. Format: `'field1:"transform1 transform2".field2:"transform3"'` |
| `values_transformations` | string | none | Field value transformation rules. Format: `'field1:"transform1".field2:"transform2"'` |
| `custom_replacements` | string | none | Custom string replacements. Format: `'rule_name:"find=replace"'` |
| `custom_regex` | string | none | Regex patterns for field matching. Format: `'pattern_name:"temp%"'` |
### Data selection parameters
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `window` | string | required (scheduled only) | Historical data window. Format: `<number><unit>` (for example, `"30d"`, `"1h"`) |
| `included_fields` | string | all fields | Dot-separated list of fields to include (for example, `"temp.humidity"`) |
| `excluded_fields` | string | none | Dot-separated list of fields to exclude |
| `filters` | string | none | Query filters. Format: `'field:"operator value"'` |
| Parameter | Type | Default | Description |
|-------------------|--------|---------------------------|----------------------------------------------------------------------|
| `window` | string | required (scheduled only) | Historical data window. Format: `<number><unit>` (for example, "30d", "1h") |
| `included_fields` | string | all fields and tags | Dot-separated list of fields and tags to include (for example, "temp.humidity.location") |
| `excluded_fields` | string | none | Dot-separated list of fields and tags to exclude |
| `filters` | string | none | Query filters. Format: `'field:"operator value"'` |
### TOML configuration
@ -37,24 +44,24 @@ The plugin supports both scheduled batch processing of historical data and real-
|--------------------|--------|---------|----------------------------------------------------------------------------------|
| `config_file_path` | string | none | TOML config file path relative to `PLUGIN_DIR` (required for TOML configuration) |
*To use a TOML configuration file, set the `PLUGIN_DIR` environment variable and specify the `config_file_path` in the trigger arguments.* This is in addition to the `--plugin-dir` flag when starting InfluxDB 3.
*To use a TOML configuration file, set the `PLUGIN_DIR` environment variable and specify the `config_file_path` in the trigger arguments.* This is in addition to the `--plugin-dir` flag when starting {{% product-name %}}.
#### Example TOML configurations
- [basic_transformation_config_scheduler.toml](https://github.com/influxdata/influxdb3_plugins/blob/master/influxdata/basic_transformation/basic_transformation_config_scheduler.toml) - for scheduled triggers
- [basic_transformation_config_data_writes.toml](https://github.com/influxdata/influxdb3_plugins/blob/master/influxdata/basic_transformation/basic_transformation_config_data_writes.toml) - for data write triggers
```bash
influxdb3 create trigger \
--database mydb \
--plugin-filename basic_transformation.py \
--trigger-spec "every:1d" \
--trigger-arguments config_file_path=basic_transformation_config_scheduler.toml \
basic_transform_trigger
```
For more information on using TOML configuration files, see the Using TOML Configuration Files section in the [influxdb3_plugins/README.md](https://github.com/influxdata/influxdb3_plugins/blob/master/README.md).
For more information on using TOML configuration files, see the Using TOML Configuration Files section in the [influxdb3_plugins
/README.md](https://github.com/influxdata/influxdb3_plugins/blob/master/README.md).
## Data requirements
The plugin assumes that the table schema is already defined in the database, as it relies on this schema to retrieve field and tag names required for processing.
## Software Requirements
- **{{% product-name %}}**: with the Processing Engine enabled
- **Python packages**:
- `pint` (for unit conversions)
## Schema requirements
@ -67,7 +74,8 @@ The plugin assumes that the table schema is already defined in the database, as
## Installation steps
1. Start {{% product-name %}} with the Processing Engine enabled (`--plugin-dir /path/to/plugins`)
1. Start {{% product-name %}} with the Processing Engine enabled (`--plugin-dir /path/to/plugins`):
```bash
influxdb3 serve \
--node-id node0 \
@ -75,15 +83,11 @@ The plugin assumes that the table schema is already defined in the database, as
--data-dir ~/.influxdb3 \
--plugin-dir ~/.plugins
```
2. Install required Python packages:
- `pint` (for unit conversions)
```bash
influxdb3 install package pint
```
## Trigger setup
### Scheduled transformation
@ -98,7 +102,6 @@ influxdb3 create trigger \
--trigger-arguments 'measurement=temperature,window=24h,target_measurement=temperature_normalized,names_transformations=temp:"snake",values_transformations=temp:"convert_degC_to_degF"' \
hourly_temp_transform
```
### Real-time transformation
Transform data as it's written:
@ -108,10 +111,9 @@ influxdb3 create trigger \
--database mydb \
--plugin-filename gh:influxdata/basic_transformation/basic_transformation.py \
--trigger-spec "all_tables" \
--trigger-arguments 'measurement=sensor_data,target_measurement=sensor_data_clean,names_transformations=.*:"snake alnum_underscore_only"' \
--trigger-arguments 'measurement=sensor_data,target_measurement=sensor_data_clean,names_transformations=.*:"snake remove_special_chars normalize_underscores"' \
realtime_clean
```
## Example usage
### Example 1: Temperature unit conversion
@ -137,15 +139,14 @@ influxdb3 query \
--database weather \
"SELECT * FROM temps_fahrenheit"
```
### Expected output
```
location | temperature | time
---------|-------------|-----
office | 72.5 | 2024-01-01T00:00:00Z
```
location | temperature | time
---------|-------------|-----
office | 72.5 | 2024-01-01T00:00:00Z
**Transformation details:**
- Before: `Temperature=22.5` (Celsius)
- After: `temperature=72.5` (Fahrenheit, field name converted to snake_case)
@ -159,7 +160,7 @@ influxdb3 create trigger \
--database sensors \
--plugin-filename gh:influxdata/basic_transformation/basic_transformation.py \
--trigger-spec "all_tables" \
--trigger-arguments 'measurement=raw_sensors,target_measurement=clean_sensors,names_transformations=.*:"snake alnum_underscore_only collapse_underscore trim_underscore"' \
--trigger-arguments 'measurement=raw_sensors,target_measurement=clean_sensors,names_transformations=.*:"remove_special_chars snake collapse_underscore trim_underscore"' \
field_cleaner
# Write data with inconsistent field names
@ -172,15 +173,14 @@ influxdb3 query \
--database sensors \
"SELECT * FROM clean_sensors"
```
### Expected output
```
device | room_temperature | humidity | time
--------|------------------|----------|-----
sensor1 | 20.1 | 45.2 | 2024-01-01T00:00:00Z
```
device | room_temperature | humidity | time
--------|------------------|----------|-----
sensor1 | 20.1 | 45.2 | 2024-01-01T00:00:00Z
**Transformation details:**
- Before: `"Room Temperature"=20.1`, `"__Humidity_%"=45.2`
- After: `room_temperature=20.1`, `humidity=45.2` (field names standardized)
@ -197,7 +197,47 @@ influxdb3 create trigger \
--trigger-arguments 'measurement=products,window=7d,target_measurement=products_updated,values_transformations=status:"status_replace",custom_replacements=status_replace:"In Stock=available.Out of Stock=unavailable"' \
status_updater
```
## Using TOML Configuration Files
This plugin supports using TOML configuration files to specify all plugin arguments. This is useful for complex configurations or when you want to version control your plugin settings.
### Important Requirements
**To use TOML configuration files, you must set the `PLUGIN_DIR` environment variable in the {{% product-name %}} host environment.** This is required in addition to the `--plugin-dir` flag when starting {{% product-name %}}:
- `--plugin-dir` tells {{% product-name %}} where to find plugin Python files
- `PLUGIN_DIR` environment variable tells the plugins where to find TOML configuration files
### Setting Up TOML Configuration
1. **Start {{% product-name %}} with the PLUGIN_DIR environment variable set**:
```bash
PLUGIN_DIR=~/.plugins influxdb3 serve \
--node-id node0 \
--object-store file \
--data-dir ~/.influxdb3 \
--plugin-dir ~/.plugins
```
2. **Copy the example TOML configuration file to your plugin directory**:
```bash
cp basic_transformation_config_scheduler.toml ~/.plugins/
# or for data writes:
cp basic_transformation_config_data_writes.toml ~/.plugins/
```
3. **Edit the TOML file** to match your requirements. The TOML file contains all the arguments defined in the plugin's argument schema (see the JSON schema in the docstring at the top of basic_transformation.py).
4. **Create a trigger using the `config_file_path` argument**:
```bash
influxdb3 create trigger \
--database mydb \
--plugin-filename basic_transformation.py \
--trigger-spec "every:1d" \
--trigger-arguments config_file_path=basic_transformation_config_scheduler.toml \
basic_transform_trigger
```
## Code overview
### Files
@ -213,8 +253,8 @@ Logs are stored in the `_internal` database (or the database where the trigger i
```bash
influxdb3 query --database _internal "SELECT * FROM system.processing_engine_logs WHERE trigger_name = 'your_trigger_name'"
```
Log columns:
- **event_time**: Timestamp of the log event
- **trigger_name**: Name of the trigger that generated the log
- **log_level**: Severity level (INFO, WARN, ERROR)
@ -223,32 +263,80 @@ Log columns:
### Main functions
#### `process_scheduled_call(influxdb3_local, call_time, args)`
Handles scheduled transformation tasks.
Queries historical data within the specified window and applies transformations.
Handles scheduled transformation tasks. Queries historical data within the specified window and applies transformations.
Key operations:
1. Parses configuration from arguments
2. Queries source measurement with filters
3. Applies name and value transformations
4. Writes transformed data to target measurement
#### `process_writes(influxdb3_local, table_batches, args)`
Handles real-time transformation during data writes.
Processes incoming data batches and applies transformations before writing.
Handles real-time transformation during data writes. Processes incoming data batches and applies transformations before writing.
Key operations:
1. Filters relevant table batches
2. Applies transformations to each row
3. Writes to target measurement immediately
#### `apply_transformations(value, transformations)`
Core transformation engine that applies a chain of transformations to a value.
Supported transformations:
- String operations: `lower`, `upper`, `snake`
- Space handling: `space_to_underscore`, `remove_space`
- Character filtering: `alnum_underscore_only`
- Underscore management: `collapse_underscore`, `trim_underscore`
**Case conversions:**
- `lower` - Convert to lowercase
- `upper` - Convert to uppercase
- `snake` - Convert to snake_case
- `camel` - Convert to camelCase
- `pascal` - Convert to PascalCase
- `kebab` - Convert to kebab-case
- `title` - Convert to Title Case
- `capitalize_first` - Capitalize first letter only
- `capitalize_words` - Capitalize each word
**String cleaning and normalization:**
- `space_to_underscore` - Replace spaces with underscores
- `remove_space` - Remove all spaces
- `alnum_underscore_only` - Keep only alphanumeric and underscore characters
- `collapse_underscore` - Collapse multiple underscores into one
- `trim_underscore` - Remove leading/trailing underscores
- `normalize_whitespace` - Normalize whitespace to single spaces
- `normalize_dashes` - Normalize dashes and underscores to dashes
- `normalize_underscores` - Normalize dashes and spaces to underscores
**Character filtering:**
- `remove_digits` - Remove all digits
- `remove_punctuation` - Remove punctuation marks
- `keep_alphanumeric` - Keep only letters and numbers
- `remove_special_chars` - Remove special characters (keep letters, numbers, spaces, _, -)
**String extraction and filtering:**
- `extract_numbers_only` - Extract only numeric characters
- `extract_letters_only` - Extract only alphabetic characters
**Mathematical operations (for numeric values):**
- `abs` - Absolute value
- `round2` - Round to 2 decimal places
- `sqrt` - Square root
- `ln` - Natural logarithm
- `floor` - Round down to nearest integer
- `ceil` - Round up to nearest integer
**Value conversion and clamping:**
- `to_percentage` - Multiply by 100 (convert to percentage)
- `from_percentage` - Divide by 100 (convert from percentage)
- `clamp_min_zero` - Limit minimum value to zero
- `clamp_max_hundred` - Limit maximum value to 100
- `boolean_to_int` - Convert boolean values to 1/0
**Other operations:**
- `reverse` - Reverse the string
- Unit conversions: `convert_<from>_to_<to>`
- Custom replacements: User-defined string substitutions
@ -257,53 +345,57 @@ Supported transformations:
### Common issues
#### Issue: Transformations not applying
**Solution**: Check that field names match exactly (case-sensitive).
Use regex patterns for flexible matching:
**Solution**: Check that field names match exactly (case-sensitive). Use regex patterns for flexible matching:
```bash
--trigger-arguments 'custom_regex=temp_fields:"temp%",values_transformations=temp_fields:"convert_degC_to_degF"'
```
#### Issue: "Permission denied" errors in logs
**Solution**: Ensure the plugin file has execute permissions:
```bash
chmod +x ~/.plugins/basic_transformation.py
```
#### Issue: Unit conversion failing
**Solution**: Verify unit names are valid pint units.
Common units:
**Solution**: Verify unit names are valid pint units. Common units:
- Temperature: `degC`, `degF`, `degK`
- Length: `meter`, `foot`, `inch`
- Time: `second`, `minute`, `hour`
#### Issue: No data in target measurement
**Solution**:
**Solution**:
1. Check dry_run is not set to "true"
2. Verify source measurement contains data
3. Check logs for errors:
```bash
influxdb3 query \
--database _internal \
"SELECT * FROM system.processing_engine_logs WHERE trigger_name = 'your_trigger_name'"
```
### Debugging tips
1. **Enable dry run** to test transformations:
```bash
--trigger-arguments 'dry_run=true,...'
```
2. **Use specific time windows** for testing:
```bash
--trigger-arguments 'window=1h,...'
```
3. **Check field names** in source data:
```bash
influxdb3 query --database mydb "SHOW FIELD KEYS FROM measurement"
```
### Performance considerations
- Field name caching reduces query overhead (1-hour cache)

View File

@ -1,41 +1,47 @@
The Downsampler Plugin enables time-based data aggregation and downsampling in InfluxDB 3.
Reduce data volume by aggregating measurements over specified time intervals using functions like avg, sum, min, max, derivative, or median.
The plugin supports both scheduled batch processing of historical data and on-demand downsampling through HTTP requests.
Each downsampled record includes metadata about the original data points compressed.
The Downsampler Plugin enables time-based data aggregation and downsampling in {{% product-name %}}. Reduce data volume by aggregating measurements over specified time intervals using functions like avg, sum, min, max, median, count, stddev, first_value, last_value, var, or approx_median. The plugin supports both scheduled batch processing of historical data and on-demand downsampling through HTTP requests. Each downsampled record includes metadata about the original data points compressed.
## Configuration
Plugin parameters may be specified as key-value pairs in the `--trigger-arguments` flag (CLI) or in the `trigger_arguments` field (API) when creating a trigger. Some plugins support TOML configuration files, which can be specified using the plugin's `config_file_path` parameter.
If a plugin supports multiple trigger specifications, some parameters may depend on the trigger specification that you use.
### Plugin metadata
This plugin includes a JSON metadata schema in its docstring that defines supported trigger types and configuration parameters.
### Required parameters
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `source_measurement` | string | required | Source measurement containing data to downsample |
| `target_measurement` | string | required | Destination measurement for downsampled data |
| `window` | string | required (scheduled only) | Time window for each downsampling job. Format: `<number><unit>` (for example, `"1h"`, `"1d"`) |
| Parameter | Type | Default | Description |
|----------------------|--------|---------------------------|------------------------------------------------------------------------------------|
| `source_measurement` | string | required | Source measurement containing data to downsample |
| `target_measurement` | string | required | Destination measurement for downsampled data |
| `window` | string | required (scheduled only) | Time window for each downsampling job. Format: `<number><unit>` (for example, "1h", "1d") |
### Aggregation parameters
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `interval` | string | `"10min"` | Time interval for downsampling. Format: `<number><unit>` (for example, `"10min"`, `"2h"`, `"1d"`) |
| `calculations` | string | "avg" | Aggregation functions. Single function or dot-separated field:aggregation pairs |
| `specific_fields` | string | all fields | Dot-separated list of fields to downsample (for example, `"co.temperature"`) |
| `excluded_fields` | string | none | Dot-separated list of fields to exclude from downsampling |
| Parameter | Type | Default | Description |
|-------------------|--------|------------|--------------------------------------------------------------------------------------|
| `interval` | string | "10min" | Time interval for downsampling. Format: `<number><unit>` (for example, "10min", "2h", "1d") |
| `calculations` | string | "avg" | Aggregation functions. Single function or dot-separated field:aggregation pairs |
| `specific_fields` | string | all fields | Dot-separated list of fields to downsample (for example, "co.temperature") |
| `excluded_fields` | string | none | Dot-separated list of fields and tags to exclude from downsampling results |
### Filtering parameters
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `tag_values` | string | none | Tag filters. Format: `tag:value1@value2@value3` for multiple values |
| `offset` | string | "0" | Time offset to apply to the window |
| Parameter | Type | Default | Description |
|--------------|--------|---------|---------------------------------------------------------------------|
| `tag_values` | string | none | Tag filters. Format: `tag:value1@value2@value3` for multiple values |
| `offset` | string | "0" | Time offset to apply to the window |
### Advanced parameters
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `target_database` | string | "default" | Database for storing downsampled data |
| `max_retries` | integer | 5 | Maximum number of retries for write operations |
| `batch_size` | string | "30d" | Time interval for batch processing (HTTP mode only) |
| Parameter | Type | Default | Description |
|-------------------|---------|-----------|-----------------------------------------------------|
| `target_database` | string | "default" | Database for storing downsampled data |
| `max_retries` | integer | 5 | Maximum number of retries for write operations |
| `batch_size` | string | "30d" | Time interval for batch processing (HTTP mode only) |
### TOML configuration
@ -43,25 +49,26 @@ Each downsampled record includes metadata about the original data points compres
|--------------------|--------|---------|----------------------------------------------------------------------------------|
| `config_file_path` | string | none | TOML config file path relative to `PLUGIN_DIR` (required for TOML configuration) |
*To use a TOML configuration file, set the `PLUGIN_DIR` environment variable and specify the `config_file_path` in the trigger arguments.* This is in addition to the `--plugin-dir` flag when starting InfluxDB 3.
*To use a TOML configuration file, set the `PLUGIN_DIR` environment variable and specify the `config_file_path` in the trigger arguments.* This is in addition to the `--plugin-dir` flag when starting {{% product-name %}}.
#### Example TOML configuration
[downsampling_config_scheduler.toml](https://github.com/influxdata/influxdb3-plugins/blob/main/content/shared/influxdb3-plugins/plugins-library/official/downsampling_config_scheduler.toml)
[downsampling_config_scheduler.toml](https://github.com/influxdata/influxdb3_plugins/blob/master/influxdata/downsampler/downsampling_config_scheduler.toml)
For more information on using TOML configuration files, see the Using TOML Configuration Files section in the [influxdb3_plugins
/README.md](/README.md).
For more information on using TOML configuration files, see the Using TOML Configuration Files section in the [influxdb3_plugins/README.md](https://github.com/influxdata/influxdb3_plugins/blob/master/README.md).
## Schema management
Each downsampled record includes three additional metadata columns:
- `record_count`—the number of original points compressed into this single downsampled row
- `time_from`—the minimum timestamp among the original points in the interval
- `time_to`—the maximum timestamp among the original points in the interval
- `record_count` — the number of original points compressed into this single downsampled row
- `time_from` — the minimum timestamp among the original points in the interval
- `time_to` — the maximum timestamp among the original points in the interval
## Installation steps
1. Start {{% product-name %}} with the Processing Engine enabled (`--plugin-dir /path/to/plugins`)
1. Start {{% product-name %}} with the Processing Engine enabled (`--plugin-dir /path/to/plugins`):
```bash
influxdb3 serve \
--node-id node0 \
@ -69,7 +76,6 @@ Each downsampled record includes three additional metadata columns:
--data-dir ~/.influxdb3 \
--plugin-dir ~/.plugins
```
2. No additional Python packages required for this plugin.
## Trigger setup
@ -86,7 +92,6 @@ influxdb3 create trigger \
--trigger-arguments 'source_measurement=cpu_metrics,target_measurement=cpu_hourly,interval=1h,window=6h,calculations=avg,specific_fields=usage_user.usage_system' \
cpu_hourly_downsample
```
### On-demand downsampling
Trigger downsampling via HTTP requests:
@ -98,7 +103,6 @@ influxdb3 create trigger \
--trigger-spec "request:downsample" \
downsample_api
```
## Example usage
### Example 1: CPU metrics hourly aggregation
@ -124,15 +128,14 @@ influxdb3 query \
--database system_metrics \
"SELECT * FROM cpu_hourly WHERE time >= now() - 1d"
```
### Expected output
```
host | usage_user | usage_system | usage_idle | record_count | time_from | time_to | time
--------|------------|--------------|------------|--------------|---------------------|---------------------|-----
server1 | 44.8 | 11.9 | 43.3 | 60 | 2024-01-01T00:00:00Z| 2024-01-01T00:59:59Z| 2024-01-01T01:00:00Z
```
host | usage_user | usage_system | usage_idle | record_count | time_from | time_to | time
--------|------------|--------------|------------|--------------|---------------------|---------------------|-----
server1 | 44.8 | 11.9 | 43.3 | 60 | 2024-01-01T00:00:00Z| 2024-01-01T00:59:59Z| 2024-01-01T01:00:00Z
**Aggregation details:**
- Before: 60 individual CPU measurements over 1 hour
- After: 1 aggregated measurement with averages and metadata
- Metadata shows original record count and time range
@ -160,13 +163,11 @@ influxdb3 query \
--database sensors \
"SELECT * FROM environment_10min WHERE time >= now() - 1h"
```
### Expected output
```
location | temperature | humidity | pressure | record_count | time
---------|-------------|----------|----------|--------------|-----
office | 22.3 | 44.8 | 1015.1 | 10 | 2024-01-01T00:10:00Z
```
location | temperature | humidity | pressure | record_count | time
---------|-------------|----------|----------|--------------|-----
office | 22.3 | 44.8 | 1015.1 | 10 | 2024-01-01T00:10:00Z
### Example 3: HTTP API downsampling with backfill
@ -203,8 +204,8 @@ Logs are stored in the `_internal` database (or the database where the trigger i
```bash
influxdb3 query --database _internal "SELECT * FROM system.processing_engine_logs WHERE trigger_name = 'your_trigger_name'"
```
Log columns:
- **event_time**: Timestamp of the log event (with nanosecond precision)
- **trigger_name**: Name of the trigger that generated the log
- **log_level**: Severity level (INFO, WARN, ERROR)
@ -213,88 +214,144 @@ Log columns:
### Main functions
#### `process_scheduled_call(influxdb3_local, call_time, args)`
Handles scheduled downsampling tasks.
Queries historical data within the specified window and applies aggregation functions.
Handles scheduled downsampling tasks. Queries historical data within the specified window and applies aggregation functions.
Key operations:
1. Parses configuration from arguments or TOML file
2. Queries source measurement with optional tag filters
3. Applies time-based aggregation with specified functions
4. Writes downsampled data with metadata columns
#### `process_http_request(influxdb3_local, request_body, args)`
Handles HTTP-triggered on-demand downsampling.
Processes batch downsampling with configurable time ranges for backfill scenarios.
Handles HTTP-triggered on-demand downsampling. Processes batch downsampling with configurable time ranges for backfill scenarios.
Key operations:
1. Parses JSON request body parameters
2. Processes data in configurable time batches
3. Applies aggregation functions to historical data
4. Returns processing statistics and results
#### `aggregate_data(data, interval, calculations)`
Core aggregation engine that applies statistical functions to time-series data.
Supported aggregation functions:
- `avg`: Average value
- `sum`: Sum of values
- `min`: Minimum value
- `max`: Maximum value
- `derivative`: Rate of change
- `median`: Median value
- `count`: Count of values
- `stddev`: Standard deviation
- `first_value`: First value in time interval
- `last_value`: Last value in time interval
- `var`: Variance of values
- `approx_median`: Approximate median (faster than exact median)
## Troubleshooting
### Common issues
#### Issue: No data in target measurement
**Solution**: Check that source measurement exists and contains data in the specified time window:
```bash
influxdb3 query --database mydb "SELECT COUNT(*) FROM source_measurement WHERE time >= now() - 1h"
```
#### Issue: Aggregation function not working
**Solution**: Verify field names and aggregation syntax. Use SHOW FIELD KEYS to check available fields:
```bash
influxdb3 query --database mydb "SHOW FIELD KEYS FROM source_measurement"
```
#### Issue: Tag filters not applied
**Solution**: Check tag value format. Use @ separator for multiple values:
```bash
--trigger-arguments 'tag_values=host:server1@server2@server3'
```
#### Issue: HTTP endpoint not accessible
**Solution**: Verify the trigger was created with correct request specification:
```bash
influxdb3 list triggers --database mydb
```
### Debugging tips
1. **Check execution logs** with task ID filtering:
```bash
influxdb3 query --database _internal \
"SELECT * FROM system.processing_engine_logs WHERE log_text LIKE '%task_id%' ORDER BY event_time DESC LIMIT 10"
```
2. **Test with smaller time windows** for debugging:
```bash
--trigger-arguments 'window=5min,interval=1min'
```
3. **Verify field types** before aggregation:
```bash
influxdb3 query --database mydb "SELECT * FROM source_measurement LIMIT 1"
```
### Performance considerations
- **Batch processing**: Use appropriate batch_size for HTTP requests to balance memory usage and performance
- **Field filtering**: Use specific_fields to process only necessary data
- **Retry logic**: Configure max_retries based on network reliability
- **Metadata overhead**: Metadata columns add ~20% storage overhead but provide valuable debugging information
#### Consolidate calculations in fewer triggers
For best performance, define a single trigger per measurement that performs all necessary field calculations.
Avoid creating multiple separate triggers that each handle only one field or calculation.
Internal testing showed significant performance differences based on trigger design:
- **Many triggers** (one calculation each): When 134 triggers were created, each handling a single calculation for a measurement, the cluster showed degraded performance with high CPU and memory usage.
- **Consolidated triggers** (all calculations per measurement): When triggers were restructured so each one performed all necessary field calculations for a measurement, CPU usage dropped to approximately 4% and memory remained stable.
#### Recommended {.green}
Combine all field calculations for a measurement in one trigger:
```bash
influxdb3 create trigger \
--database mydb \
--plugin-filename gh:influxdata/downsampler/downsampler.py \
--trigger-spec "every:1h" \
--trigger-arguments 'source_measurement=temperature,target_measurement=temperature_hourly,interval=1h,window=6h,calculations=temp:avg.temp:max.temp:min,specific_fields=temp' \
temperature_hourly_downsample
```
#### Not recommended {.orange}
Multiple triggers for the same measurement creates unnecessary overhead:
```bash
# Avoid creating multiple triggers for calculations on the same measurement
influxdb3 create trigger ... --trigger-arguments 'calculations=temp:avg' avg_trigger
influxdb3 create trigger ... --trigger-arguments 'calculations=temp:max' max_trigger
influxdb3 create trigger ... --trigger-arguments 'calculations=temp:min' min_trigger
```
#### Use specific_fields to limit processing
If your measurement contains fields that you don't need to downsample, use the `specific_fields` parameter to specify only the relevant ones.
Without this parameter, the downsampler processes all fields and applies the default aggregation (such as `avg`) to fields not listed in your calculations, which can lead to unnecessary processing and storage.
```bash
# Only downsample the 'temp' field, ignore other fields in the measurement
--trigger-arguments 'specific_fields=temp'
# Downsample multiple specific fields
--trigger-arguments 'specific_fields=temp.humidity.pressure'
```
#### Additional performance tips
- **Batch processing**: Use appropriate `batch_size` for HTTP requests to balance memory usage and performance
- **Retry logic**: Configure `max_retries` based on network reliability
- **Metadata overhead**: Metadata columns add approximately 20% storage overhead but provide valuable debugging information
- **Index optimization**: Tag filters are more efficient than field filters for large datasets
## Report an issue

View File

@ -59,6 +59,7 @@
"markdown-link": "^0.1.1",
"mermaid": "^11.10.0",
"p-limit": "^5.0.0",
"playwright": "^1.58.1",
"turndown": "^7.2.2",
"vanillajs-datepicker": "^1.3.4"
},
@ -98,7 +99,11 @@
"debug:inspect": "node scripts/puppeteer/inspect-page.js"
},
"type": "module",
"browserslist": ["last 2 versions", "not dead", "not IE 11"],
"browserslist": [
"last 2 versions",
"not dead",
"not IE 11"
],
"engines": {
"node": ">=16.0.0"
},

2194
yarn.lock

File diff suppressed because it is too large Load Diff