Sync plugin documentation: basic_transformation, downsampler (#6784)
* sync: update plugin documentation from influxdb3_plugins@main (plugins: basic_transformation, downsampler)
* Apply suggestions from code review

Co-authored-by: Jason Stirnaman <jstirnaman@influxdata.com>
The Basic Transformation Plugin enables real-time and scheduled transformation of time series data in {{% product-name %}}. Transform field and tag names, convert values between units, and apply custom string replacements to standardize or clean your data. The plugin supports both scheduled batch processing of historical data and real-time transformation as data is written.

## Configuration

Plugin parameters may be specified as key-value pairs in the `--trigger-arguments` flag (CLI) or in the `trigger_arguments` field (API) when creating a trigger. Some plugins support TOML configuration files, which can be specified using the plugin's `config_file_path` parameter.

If a plugin supports multiple trigger specifications, some parameters may depend on the trigger specification that you use.

### Plugin metadata

This plugin includes a JSON metadata schema in its docstring that defines supported trigger types and configuration parameters.

### Required parameters

| Parameter            | Type   | Default          | Description                                         |
|----------------------|--------|------------------|-----------------------------------------------------|
| `measurement`        | string | required         | Source measurement containing data to transform     |
| `target_measurement` | string | required         | Destination measurement for transformed data        |
| `target_database`    | string | current database | Database for storing transformed data               |
| `dry_run`            | string | `"false"`        | When `"true"`, logs transformations without writing |

### Transformation parameters

| Parameter                | Type   | Default | Description |
|--------------------------|--------|---------|-----------------------------------------------------------------------------------------------------|
| `names_transformations`  | string | none    | Field/tag name transformation rules. Format: `'field1:"transform1 transform2".field2:"transform3"'` |
| `values_transformations` | string | none    | Field value transformation rules. Format: `'field1:"transform1".field2:"transform2"'`               |
| `custom_replacements`    | string | none    | Custom string replacements. Format: `'rule_name:"find=replace"'`                                    |
| `custom_regex`           | string | none    | Regex patterns for field matching. Format: `'pattern_name:"temp%"'`                                 |
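The space-separated transformation chains in `names_transformations` compose left to right. The sketch below re-implements a few of the name transformations for illustration only; the helper names and regexes are assumptions, not the plugin's actual code:

```python
import re

# Illustrative re-implementations of a few documented name transformations.
TRANSFORMS = {
    "snake": lambda s: re.sub(r"[\s\-]+", "_", s.strip()).lower(),
    "remove_special_chars": lambda s: re.sub(r"[^0-9A-Za-z _\-]", "", s),
    "collapse_underscore": lambda s: re.sub(r"_+", "_", s),
    "trim_underscore": lambda s: s.strip("_"),
}

def apply_chain(name: str, chain: str) -> str:
    """Apply a space-separated chain of transformations, left to right."""
    for step in chain.split():
        name = TRANSFORMS[step](name)
    return name

print(apply_chain("__Room Temperature__",
                  "remove_special_chars snake collapse_underscore trim_underscore"))
# -> room_temperature
```

Each step receives the output of the previous one, which is why the order of transformations in the chain matters.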

### Data selection parameters

| Parameter         | Type   | Default                   | Description |
|-------------------|--------|---------------------------|--------------------------------------------------------------------------------------------|
| `window`          | string | required (scheduled only) | Historical data window. Format: `<number><unit>` (for example, `"30d"`, `"1h"`)            |
| `included_fields` | string | all fields and tags       | Dot-separated list of fields and tags to include (for example, `"temp.humidity.location"`) |
| `excluded_fields` | string | none                      | Dot-separated list of fields and tags to exclude                                           |
| `filters`         | string | none                      | Query filters. Format: `'field:"operator value"'`                                          |
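The `<number><unit>` window format can be parsed into a duration with a few lines. A hedged sketch; the accepted unit names here are inferred from the examples in this document and may not match the plugin exactly:

```python
import re
from datetime import timedelta

# Assumed unit suffixes, based on the examples ("30d", "1h", "10min") in this doc.
UNITS = {"s": "seconds", "min": "minutes", "h": "hours", "d": "days", "w": "weeks"}

def parse_window(window: str) -> timedelta:
    """Parse a window string like '30d' or '1h' into a timedelta."""
    m = re.fullmatch(r"(\d+)([a-z]+)", window)
    if not m or m.group(2) not in UNITS:
        raise ValueError(f"bad window: {window!r}")
    return timedelta(**{UNITS[m.group(2)]: int(m.group(1))})

print(parse_window("30d"))  # 30 days, 0:00:00
```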

### TOML configuration

| Parameter          | Type   | Default | Description |
|--------------------|--------|---------|----------------------------------------------------------------------------------|
| `config_file_path` | string | none    | TOML config file path relative to `PLUGIN_DIR` (required for TOML configuration) |

*To use a TOML configuration file, set the `PLUGIN_DIR` environment variable and specify the `config_file_path` in the trigger arguments.* This is in addition to the `--plugin-dir` flag when starting {{% product-name %}}.

#### Example TOML configurations

- [basic_transformation_config_scheduler.toml](https://github.com/influxdata/influxdb3_plugins/blob/master/influxdata/basic_transformation/basic_transformation_config_scheduler.toml) - for scheduled triggers
- [basic_transformation_config_data_writes.toml](https://github.com/influxdata/influxdb3_plugins/blob/master/influxdata/basic_transformation/basic_transformation_config_data_writes.toml) - for data write triggers

```bash
influxdb3 create trigger \
  --database mydb \
  --plugin-filename basic_transformation.py \
  --trigger-spec "every:1d" \
  --trigger-arguments config_file_path=basic_transformation_config_scheduler.toml \
  basic_transform_trigger
```

For more information on using TOML configuration files, see the Using TOML Configuration Files section in the [influxdb3_plugins/README.md](https://github.com/influxdata/influxdb3_plugins/blob/master/README.md).

## Data requirements

The plugin assumes that the table schema is already defined in the database, as it relies on this schema to retrieve the field and tag names required for processing.

## Software requirements

- **{{% product-name %}}**: with the Processing Engine enabled
- **Python packages**:
  - `pint` (for unit conversions)

## Schema requirements

## Installation steps

1. Start {{% product-name %}} with the Processing Engine enabled (`--plugin-dir /path/to/plugins`):

   ```bash
   influxdb3 serve \
     --node-id node0 \
     --object-store file \
     --data-dir ~/.influxdb3 \
     --plugin-dir ~/.plugins
   ```

2. Install required Python packages:

   - `pint` (for unit conversions)

   ```bash
   influxdb3 install package pint
   ```

## Trigger setup

### Scheduled transformation

```bash
influxdb3 create trigger \
  --database mydb \
  --plugin-filename gh:influxdata/basic_transformation/basic_transformation.py \
  --trigger-spec "every:1h" \
  --trigger-arguments 'measurement=temperature,window=24h,target_measurement=temperature_normalized,names_transformations=temp:"snake",values_transformations=temp:"convert_degC_to_degF"' \
  hourly_temp_transform
```

### Real-time transformation

Transform data as it's written:

```bash
influxdb3 create trigger \
  --database mydb \
  --plugin-filename gh:influxdata/basic_transformation/basic_transformation.py \
  --trigger-spec "all_tables" \
  --trigger-arguments 'measurement=sensor_data,target_measurement=sensor_data_clean,names_transformations=.*:"snake remove_special_chars normalize_underscores"' \
  realtime_clean
```

## Example usage

### Example 1: Temperature unit conversion

Query the transformed data:

```bash
influxdb3 query \
  --database weather \
  "SELECT * FROM temps_fahrenheit"
```

### Expected output

```
location | temperature | time
---------|-------------|-----
office | 72.5 | 2024-01-01T00:00:00Z
```

**Transformation details:**

- Before: `Temperature=22.5` (Celsius)
- After: `temperature=72.5` (Fahrenheit, field name converted to snake_case)
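As a quick sanity check of the conversion in this example, shown with plain arithmetic (the plugin itself performs unit conversions via `pint`):

```python
def celsius_to_fahrenheit(c: float) -> float:
    # degF = degC * 9/5 + 32
    return c * 9 / 5 + 32

print(celsius_to_fahrenheit(22.5))  # 72.5, matching the expected output above
```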

```bash
influxdb3 create trigger \
  --database sensors \
  --plugin-filename gh:influxdata/basic_transformation/basic_transformation.py \
  --trigger-spec "all_tables" \
  --trigger-arguments 'measurement=raw_sensors,target_measurement=clean_sensors,names_transformations=.*:"remove_special_chars snake collapse_underscore trim_underscore"' \
  field_cleaner

# Write data with inconsistent field names
```

```bash
influxdb3 query \
  --database sensors \
  "SELECT * FROM clean_sensors"
```

### Expected output

```
device | room_temperature | humidity | time
--------|------------------|----------|-----
sensor1 | 20.1 | 45.2 | 2024-01-01T00:00:00Z
```

**Transformation details:**

- Before: `"Room Temperature"=20.1`, `"__Humidity_%"=45.2`
- After: `room_temperature=20.1`, `humidity=45.2` (field names standardized)

```bash
influxdb3 create trigger \
  --database mydb \
  --plugin-filename gh:influxdata/basic_transformation/basic_transformation.py \
  --trigger-spec "every:7d" \
  --trigger-arguments 'measurement=products,window=7d,target_measurement=products_updated,values_transformations=status:"status_replace",custom_replacements=status_replace:"In Stock=available.Out of Stock=unavailable"' \
  status_updater
```
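The `custom_replacements` rule above names a rule and packs `find=replace` pairs separated by dots. A hypothetical parser for that format (the format comes from the parameter table; the parsing code is illustrative, not the plugin's):

```python
def parse_replacements(spec: str) -> tuple[str, dict[str, str]]:
    """Parse 'rule_name:"find1=replace1.find2=replace2"' into a name and mapping."""
    name, _, body = spec.partition(":")
    pairs = (pair.split("=", 1) for pair in body.strip('"').split("."))
    return name, {find: replace for find, replace in pairs}

name, rules = parse_replacements(
    'status_replace:"In Stock=available.Out of Stock=unavailable"'
)
print(name, rules)
# status_replace {'In Stock': 'available', 'Out of Stock': 'unavailable'}
```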
## Using TOML Configuration Files

This plugin supports using TOML configuration files to specify all plugin arguments. This is useful for complex configurations or when you want to version control your plugin settings.

### Important Requirements

**To use TOML configuration files, you must set the `PLUGIN_DIR` environment variable in the {{% product-name %}} host environment.** This is required in addition to the `--plugin-dir` flag when starting {{% product-name %}}:

- `--plugin-dir` tells {{% product-name %}} where to find plugin Python files
- `PLUGIN_DIR` environment variable tells the plugins where to find TOML configuration files

### Setting Up TOML Configuration

1. **Start {{% product-name %}} with the PLUGIN_DIR environment variable set**:

   ```bash
   PLUGIN_DIR=~/.plugins influxdb3 serve \
     --node-id node0 \
     --object-store file \
     --data-dir ~/.influxdb3 \
     --plugin-dir ~/.plugins
   ```

2. **Copy the example TOML configuration file to your plugin directory**:

   ```bash
   cp basic_transformation_config_scheduler.toml ~/.plugins/
   # or for data writes:
   cp basic_transformation_config_data_writes.toml ~/.plugins/
   ```

3. **Edit the TOML file** to match your requirements. The TOML file contains all the arguments defined in the plugin's argument schema (see the JSON schema in the docstring at the top of `basic_transformation.py`).

4. **Create a trigger using the `config_file_path` argument**:

   ```bash
   influxdb3 create trigger \
     --database mydb \
     --plugin-filename basic_transformation.py \
     --trigger-spec "every:1d" \
     --trigger-arguments config_file_path=basic_transformation_config_scheduler.toml \
     basic_transform_trigger
   ```

## Code overview

### Files

Logs are stored in the `_internal` database (or the database where the trigger is configured). To view logs:

```bash
influxdb3 query --database _internal "SELECT * FROM system.processing_engine_logs WHERE trigger_name = 'your_trigger_name'"
```

Log columns:

- **event_time**: Timestamp of the log event
- **trigger_name**: Name of the trigger that generated the log
- **log_level**: Severity level (INFO, WARN, ERROR)

### Main functions

#### `process_scheduled_call(influxdb3_local, call_time, args)`

Handles scheduled transformation tasks. Queries historical data within the specified window and applies transformations.

Key operations:

1. Parses configuration from arguments
2. Queries source measurement with filters
3. Applies name and value transformations
4. Writes transformed data to target measurement

#### `process_writes(influxdb3_local, table_batches, args)`

Handles real-time transformation during data writes. Processes incoming data batches and applies transformations before writing.

Key operations:

1. Filters relevant table batches
2. Applies transformations to each row
3. Writes to target measurement immediately

#### `apply_transformations(value, transformations)`

Core transformation engine that applies a chain of transformations to a value.

Supported transformations:

**Case conversions:**

- `lower` - Convert to lowercase
- `upper` - Convert to uppercase
- `snake` - Convert to snake_case
- `camel` - Convert to camelCase
- `pascal` - Convert to PascalCase
- `kebab` - Convert to kebab-case
- `title` - Convert to Title Case
- `capitalize_first` - Capitalize first letter only
- `capitalize_words` - Capitalize each word

**String cleaning and normalization:**

- `space_to_underscore` - Replace spaces with underscores
- `remove_space` - Remove all spaces
- `alnum_underscore_only` - Keep only alphanumeric and underscore characters
- `collapse_underscore` - Collapse multiple underscores into one
- `trim_underscore` - Remove leading/trailing underscores
- `normalize_whitespace` - Normalize whitespace to single spaces
- `normalize_dashes` - Normalize dashes and underscores to dashes
- `normalize_underscores` - Normalize dashes and spaces to underscores

**Character filtering:**

- `remove_digits` - Remove all digits
- `remove_punctuation` - Remove punctuation marks
- `keep_alphanumeric` - Keep only letters and numbers
- `remove_special_chars` - Remove special characters (keep letters, numbers, spaces, _, -)

**String extraction and filtering:**

- `extract_numbers_only` - Extract only numeric characters
- `extract_letters_only` - Extract only alphabetic characters

**Mathematical operations (for numeric values):**

- `abs` - Absolute value
- `round2` - Round to 2 decimal places
- `sqrt` - Square root
- `ln` - Natural logarithm
- `floor` - Round down to nearest integer
- `ceil` - Round up to nearest integer

**Value conversion and clamping:**

- `to_percentage` - Multiply by 100 (convert to percentage)
- `from_percentage` - Divide by 100 (convert from percentage)
- `clamp_min_zero` - Limit minimum value to zero
- `clamp_max_hundred` - Limit maximum value to 100
- `boolean_to_int` - Convert boolean values to 1/0

**Other operations:**

- `reverse` - Reverse the string
- Unit conversions: `convert_<from>_to_<to>`
- Custom replacements: User-defined string substitutions
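The numeric transformations compose the same way as the string ones: each step feeds the next. A hedged sketch with a hypothetical dispatch table (illustrative only, not the plugin's code):

```python
import math

# Illustrative implementations of a few documented numeric transformations.
VALUE_TRANSFORMS = {
    "abs": abs,
    "round2": lambda v: round(v, 2),
    "sqrt": math.sqrt,
    "floor": math.floor,
    "ceil": math.ceil,
    "to_percentage": lambda v: v * 100,
    "from_percentage": lambda v: v / 100,
    "clamp_min_zero": lambda v: max(v, 0),
    "clamp_max_hundred": lambda v: min(v, 100),
    "boolean_to_int": lambda v: int(bool(v)),
}

def transform_value(value, chain):
    """Apply a space-separated chain of value transformations, left to right."""
    for step in chain.split():
        value = VALUE_TRANSFORMS[step](value)
    return value

print(transform_value(-12.3456, "abs round2"))                   # 12.35
print(transform_value(1.75, "to_percentage clamp_max_hundred"))  # 100
```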
### Common issues

#### Issue: Transformations not applying

**Solution**: Check that field names match exactly (case-sensitive). Use regex patterns for flexible matching:

```bash
--trigger-arguments 'custom_regex=temp_fields:"temp%",values_transformations=temp_fields:"convert_degC_to_degF"'
```

#### Issue: "Permission denied" errors in logs

**Solution**: Ensure the plugin file has execute permissions:

```bash
chmod +x ~/.plugins/basic_transformation.py
```

#### Issue: Unit conversion failing

**Solution**: Verify unit names are valid pint units. Common units:

- Temperature: `degC`, `degF`, `degK`
- Length: `meter`, `foot`, `inch`
- Time: `second`, `minute`, `hour`

#### Issue: No data in target measurement

**Solution**:

1. Check that `dry_run` is not set to `"true"`
2. Verify that the source measurement contains data
3. Check the logs for errors:

```bash
influxdb3 query \
  --database _internal \
  "SELECT * FROM system.processing_engine_logs WHERE trigger_name = 'your_trigger_name'"
```

### Debugging tips

1. **Enable dry run** to test transformations:

   ```bash
   --trigger-arguments 'dry_run=true,...'
   ```

2. **Use specific time windows** for testing:

   ```bash
   --trigger-arguments 'window=1h,...'
   ```

3. **Check field names** in source data:

   ```bash
   influxdb3 query --database mydb "SHOW FIELD KEYS FROM measurement"
   ```

### Performance considerations

- Field name caching reduces query overhead (1-hour cache)

---

The Downsampler Plugin enables time-based data aggregation and downsampling in {{% product-name %}}. Reduce data volume by aggregating measurements over specified time intervals using functions like avg, sum, min, max, median, count, stddev, first_value, last_value, var, or approx_median. The plugin supports both scheduled batch processing of historical data and on-demand downsampling through HTTP requests. Each downsampled record includes metadata about the original data points that were compressed.

## Configuration

Plugin parameters may be specified as key-value pairs in the `--trigger-arguments` flag (CLI) or in the `trigger_arguments` field (API) when creating a trigger. Some plugins support TOML configuration files, which can be specified using the plugin's `config_file_path` parameter.

If a plugin supports multiple trigger specifications, some parameters may depend on the trigger specification that you use.

### Plugin metadata

This plugin includes a JSON metadata schema in its docstring that defines supported trigger types and configuration parameters.

### Required parameters

| Parameter            | Type   | Default                   | Description |
|----------------------|--------|---------------------------|-----------------------------------------------------------------------------------------------|
| `source_measurement` | string | required                  | Source measurement containing data to downsample                                              |
| `target_measurement` | string | required                  | Destination measurement for downsampled data                                                  |
| `window`             | string | required (scheduled only) | Time window for each downsampling job. Format: `<number><unit>` (for example, `"1h"`, `"1d"`) |

### Aggregation parameters

| Parameter         | Type   | Default    | Description |
|-------------------|--------|------------|---------------------------------------------------------------------------------------------------|
| `interval`        | string | `"10min"`  | Time interval for downsampling. Format: `<number><unit>` (for example, `"10min"`, `"2h"`, `"1d"`) |
| `calculations`    | string | `"avg"`    | Aggregation functions. Single function or dot-separated `field:aggregation` pairs                 |
| `specific_fields` | string | all fields | Dot-separated list of fields to downsample (for example, `"co.temperature"`)                      |
| `excluded_fields` | string | none       | Dot-separated list of fields and tags to exclude from downsampling results                        |
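How an `interval` like `"10min"` partitions time can be sketched as fixed-width buckets. Epoch alignment is an assumption made for illustration; the plugin may align buckets differently:

```python
from datetime import datetime, timedelta, timezone

def bucket_start(ts: datetime, interval: timedelta) -> datetime:
    """Return the start of the fixed-width bucket containing ts (epoch-aligned)."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    n = (ts - epoch) // interval  # how many whole intervals since the epoch
    return epoch + n * interval

ts = datetime(2024, 1, 1, 0, 17, 42, tzinfo=timezone.utc)
print(bucket_start(ts, timedelta(minutes=10)))  # 2024-01-01 00:10:00+00:00
```

All points whose timestamps fall in the same bucket are aggregated into one output row.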

### Filtering parameters

| Parameter    | Type   | Default | Description |
|--------------|--------|---------|---------------------------------------------------------------------|
| `tag_values` | string | none    | Tag filters. Format: `tag:value1@value2@value3` for multiple values |
| `offset`     | string | `"0"`   | Time offset to apply to the window                                  |
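The `tag:value1@value2@value3` filter format splits naturally into a tag name and a list of accepted values. A hypothetical parser (the format comes from the table above; comma-separating multiple clauses is an assumption, and the code is not the plugin's):

```python
def parse_tag_values(spec: str) -> dict[str, list[str]]:
    """Parse 'tag:value1@value2' clauses (comma-separated, an assumed extension)."""
    filters: dict[str, list[str]] = {}
    for clause in spec.split(","):
        tag, _, values = clause.partition(":")
        filters[tag.strip()] = values.split("@")
    return filters

print(parse_tag_values("host:server1@server2"))
# {'host': ['server1', 'server2']}
```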

### Advanced parameters

| Parameter         | Type    | Default     | Description |
|-------------------|---------|-------------|-----------------------------------------------------|
| `target_database` | string  | `"default"` | Database for storing downsampled data               |
| `max_retries`     | integer | 5           | Maximum number of retries for write operations      |
| `batch_size`      | string  | `"30d"`     | Time interval for batch processing (HTTP mode only) |

### TOML configuration

| Parameter          | Type   | Default | Description |
|--------------------|--------|---------|----------------------------------------------------------------------------------|
| `config_file_path` | string | none    | TOML config file path relative to `PLUGIN_DIR` (required for TOML configuration) |

*To use a TOML configuration file, set the `PLUGIN_DIR` environment variable and specify the `config_file_path` in the trigger arguments.* This is in addition to the `--plugin-dir` flag when starting {{% product-name %}}.

#### Example TOML configuration

[downsampling_config_scheduler.toml](https://github.com/influxdata/influxdb3_plugins/blob/master/influxdata/downsampler/downsampling_config_scheduler.toml)

For more information on using TOML configuration files, see the Using TOML Configuration Files section in the [influxdb3_plugins/README.md](https://github.com/influxdata/influxdb3_plugins/blob/master/README.md).

## Schema management

Each downsampled record includes three additional metadata columns:

- `record_count`: the number of original points compressed into this single downsampled row
- `time_from`: the minimum timestamp among the original points in the interval
- `time_to`: the maximum timestamp among the original points in the interval
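The metadata columns above can be derived directly from the raw points in one interval. A hedged sketch of how one downsampled row could be built (illustrative only; the sample values are invented):

```python
from statistics import mean

# Raw points that fall in a single downsampling interval (sample data).
points = [
    {"time": "2024-01-01T00:00:00Z", "temperature": 22.0},
    {"time": "2024-01-01T00:04:00Z", "temperature": 22.4},
    {"time": "2024-01-01T00:08:00Z", "temperature": 22.5},
]

row = {
    "temperature": round(mean(p["temperature"] for p in points), 2),  # avg
    "record_count": len(points),                       # points compressed
    "time_from": min(p["time"] for p in points),       # earliest original timestamp
    "time_to": max(p["time"] for p in points),         # latest original timestamp
}
print(row)
```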

## Installation steps

1. Start {{% product-name %}} with the Processing Engine enabled (`--plugin-dir /path/to/plugins`):

   ```bash
   influxdb3 serve \
     --node-id node0 \
     --object-store file \
     --data-dir ~/.influxdb3 \
     --plugin-dir ~/.plugins
   ```

2. No additional Python packages are required for this plugin.

## Trigger setup

### Scheduled downsampling

```bash
influxdb3 create trigger \
  --database system_metrics \
  --plugin-filename gh:influxdata/downsampler/downsampler.py \
  --trigger-spec "every:1h" \
  --trigger-arguments 'source_measurement=cpu_metrics,target_measurement=cpu_hourly,interval=1h,window=6h,calculations=avg,specific_fields=usage_user.usage_system' \
  cpu_hourly_downsample
```

### On-demand downsampling

Trigger downsampling via HTTP requests:

```bash
influxdb3 create trigger \
  --database mydb \
  --plugin-filename gh:influxdata/downsampler/downsampler.py \
  --trigger-spec "request:downsample" \
  downsample_api
```
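Once created, a request trigger can be invoked over HTTP. A hypothetical invocation: the endpoint path follows from the `request:downsample` trigger spec, while the port and body fields are assumptions based on the parameters documented above:

```shell
curl -X POST "http://localhost:8181/api/v3/engine/downsample" \
  --header "Content-Type: application/json" \
  --data '{
    "source_measurement": "cpu_metrics",
    "target_measurement": "cpu_hourly",
    "interval": "1h",
    "calculations": "avg",
    "batch_size": "30d"
  }'
```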

## Example usage

### Example 1: CPU metrics hourly aggregation

Query the downsampled data:

```bash
influxdb3 query \
  --database system_metrics \
  "SELECT * FROM cpu_hourly WHERE time >= now() - 1d"
```

### Expected output

```
host | usage_user | usage_system | usage_idle | record_count | time_from | time_to | time
--------|------------|--------------|------------|--------------|---------------------|---------------------|-----
server1 | 44.8 | 11.9 | 43.3 | 60 | 2024-01-01T00:00:00Z| 2024-01-01T00:59:59Z| 2024-01-01T01:00:00Z
```

**Aggregation details:**

- Before: 60 individual CPU measurements over 1 hour
- After: 1 aggregated measurement with averages and metadata
- Metadata shows original record count and time range

```bash
influxdb3 query \
  --database sensors \
  "SELECT * FROM environment_10min WHERE time >= now() - 1h"
```

### Expected output

```
location | temperature | humidity | pressure | record_count | time
---------|-------------|----------|----------|--------------|-----
office | 22.3 | 44.8 | 1015.1 | 10 | 2024-01-01T00:10:00Z
```

### Example 3: HTTP API downsampling with backfill

Logs are stored in the `_internal` database (or the database where the trigger is configured). To view logs:

```bash
influxdb3 query --database _internal "SELECT * FROM system.processing_engine_logs WHERE trigger_name = 'your_trigger_name'"
```

Log columns:
|
||||
|
||||
- **event_time**: Timestamp of the log event (with nanosecond precision)
|
||||
- **trigger_name**: Name of the trigger that generated the log
|
||||
- **log_level**: Severity level (INFO, WARN, ERROR)
|
||||
|
|
### Main functions

#### `process_scheduled_call(influxdb3_local, call_time, args)`

Handles scheduled downsampling tasks. Queries historical data within the specified window and applies aggregation functions.

Key operations:

1. Parses configuration from arguments or TOML file
2. Queries source measurement with optional tag filters
3. Applies time-based aggregation with specified functions
4. Writes downsampled data with metadata columns
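The window arithmetic behind step 2 can be sketched as querying from `call_time - window` up to `call_time`. This is a minimal illustration under that assumption, not the plugin's source:

```python
from datetime import datetime, timedelta, timezone

def query_window(call_time: datetime, window: timedelta):
    """Hypothetical helper: the time range a scheduled run would query,
    assuming the run covers the `window` preceding its call time."""
    return call_time - window, call_time

# A trigger firing at 06:00 UTC with window=6h would query 00:00-06:00
start, end = query_window(
    datetime(2024, 1, 1, 6, 0, tzinfo=timezone.utc), timedelta(hours=6)
)
```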
#### `process_http_request(influxdb3_local, request_body, args)`

Handles HTTP-triggered on-demand downsampling. Processes batch downsampling with configurable time ranges for backfill scenarios.

Key operations:

1. Parses JSON request body parameters
2. Processes data in configurable time batches
3. Applies aggregation functions to historical data
4. Returns processing statistics and results
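Step 2's batching amounts to splitting the requested range into fixed-size chunks. A minimal sketch of that idea (a hypothetical helper for illustration; the plugin's actual batching logic may differ):

```python
from datetime import datetime, timedelta, timezone

def time_batches(start: datetime, end: datetime, batch: timedelta):
    """Yield consecutive (batch_start, batch_end) chunks covering [start, end)."""
    cursor = start
    while cursor < end:
        yield cursor, min(cursor + batch, end)
        cursor += batch

# A 24-hour backfill processed in 6-hour batches
batches = list(time_batches(
    datetime(2024, 1, 1, tzinfo=timezone.utc),
    datetime(2024, 1, 2, tzinfo=timezone.utc),
    timedelta(hours=6),
))
```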
#### `aggregate_data(data, interval, calculations)`

Core aggregation engine that applies statistical functions to time-series data.

Supported aggregation functions:

- `avg`: Average value
- `sum`: Sum of values
- `min`: Minimum value
- `max`: Maximum value
- `derivative`: Rate of change
- `median`: Median value
- `count`: Count of values
- `stddev`: Standard deviation
- `first_value`: First value in time interval
- `last_value`: Last value in time interval
- `var`: Variance of values
- `approx_median`: Approximate median (faster than exact median)
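The semantics of several of these functions can be shown with plain Python over one interval's values (illustrative only; treating `stddev` as the sample standard deviation is an assumption about the plugin's behavior):

```python
import statistics

# Field values for a single time interval, assumed time-ordered
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

aggregates = {
    "avg": sum(values) / len(values),
    "sum": sum(values),
    "min": min(values),
    "max": max(values),
    "median": statistics.median(values),
    "count": len(values),
    "stddev": statistics.stdev(values),  # sample standard deviation
    "first_value": values[0],
    "last_value": values[-1],
}
```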
## Troubleshooting

### Common issues

#### Issue: No data in target measurement

**Solution**: Check that the source measurement exists and contains data in the specified time window:

```bash
influxdb3 query --database mydb "SELECT COUNT(*) FROM source_measurement WHERE time >= now() - 1h"
```
#### Issue: Aggregation function not working

**Solution**: Verify field names and aggregation syntax. Use `SHOW FIELD KEYS` to check the available fields:

```bash
influxdb3 query --database mydb "SHOW FIELD KEYS FROM source_measurement"
```
#### Issue: Tag filters not applied

**Solution**: Check the tag value format. Use the `@` separator for multiple values:

```bash
--trigger-arguments 'tag_values=host:server1@server2@server3'
```
#### Issue: HTTP endpoint not accessible

**Solution**: Verify that the trigger was created with the correct request specification:

```bash
influxdb3 list triggers --database mydb
```
### Debugging tips

1. **Check execution logs** with task ID filtering:

   ```bash
   influxdb3 query --database _internal \
     "SELECT * FROM system.processing_engine_logs WHERE log_text LIKE '%task_id%' ORDER BY event_time DESC LIMIT 10"
   ```

2. **Test with smaller time windows** for debugging:

   ```bash
   --trigger-arguments 'window=5min,interval=1min'
   ```

3. **Verify field types** before aggregation:

   ```bash
   influxdb3 query --database mydb "SELECT * FROM source_measurement LIMIT 1"
   ```
### Performance considerations
#### Consolidate calculations in fewer triggers

For best performance, define a single trigger per measurement that performs all necessary field calculations. Avoid creating multiple separate triggers that each handle only one field or calculation.

Internal testing showed significant performance differences based on trigger design:

- **Many triggers** (one calculation each): With 134 triggers, each handling a single calculation for a measurement, the cluster showed degraded performance with high CPU and memory usage.
- **Consolidated triggers** (all calculations per measurement): When triggers were restructured so that each performed all necessary field calculations for a measurement, CPU usage dropped to approximately 4% and memory remained stable.
#### Recommended {.green}

Combine all field calculations for a measurement in one trigger:

```bash
influxdb3 create trigger \
  --database mydb \
  --plugin-filename gh:influxdata/downsampler/downsampler.py \
  --trigger-spec "every:1h" \
  --trigger-arguments 'source_measurement=temperature,target_measurement=temperature_hourly,interval=1h,window=6h,calculations=temp:avg.temp:max.temp:min,specific_fields=temp' \
  temperature_hourly_downsample
```
#### Not recommended {.orange}

Creating multiple triggers for the same measurement adds unnecessary overhead:

```bash
# Avoid creating multiple triggers for calculations on the same measurement
influxdb3 create trigger ... --trigger-arguments 'calculations=temp:avg' avg_trigger
influxdb3 create trigger ... --trigger-arguments 'calculations=temp:max' max_trigger
influxdb3 create trigger ... --trigger-arguments 'calculations=temp:min' min_trigger
```
#### Use specific_fields to limit processing

If your measurement contains fields that you don't need to downsample, use the `specific_fields` parameter to specify only the relevant ones. Without this parameter, the downsampler processes all fields and applies the default aggregation (such as `avg`) to fields not listed in your calculations, which can lead to unnecessary processing and storage.

```bash
# Only downsample the 'temp' field; ignore other fields in the measurement
--trigger-arguments 'specific_fields=temp'

# Downsample multiple specific fields
--trigger-arguments 'specific_fields=temp.humidity.pressure'
```
#### Additional performance tips

- **Batch processing**: Use an appropriate `batch_size` for HTTP requests to balance memory usage and performance
- **Retry logic**: Configure `max_retries` based on network reliability
- **Metadata overhead**: Metadata columns add approximately 20% storage overhead but provide valuable debugging information
- **Index optimization**: Tag filters are more efficient than field filters for large datasets
## Report an issue