docs-v2/content/shared/influxdb3-plugins/plugins-library/official/mad-check.md

345 lines
14 KiB
Markdown

The MAD-Based Anomaly Detection Plugin provides real-time anomaly detection for time series data in InfluxDB 3 using Median Absolute Deviation (MAD).
Detect outliers in your field values as data is written, with configurable thresholds for both count-based and duration-based alerts.
The plugin maintains in-memory deques for efficient computation and integrates with the Notification Sender Plugin to deliver alerts via multiple channels.
## Configuration
### Required parameters
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `measurement` | string | required | Source measurement to monitor for anomalies |
| `mad_thresholds` | string | required | MAD threshold conditions. Format: `field:k:window_count:threshold` |
| `senders` | string | required | Dot-separated list of notification channels (for example, `"slack.discord"`) |
### MAD threshold parameters
| Component | Description | Example |
|-----------|-------------|---------|
| `field_name` | The numeric field to monitor | `temp` |
| `k` | MAD multiplier for anomaly threshold | `2.5` |
| `window_count` | Number of recent points for MAD computation | `20` |
| `threshold` | Count (integer) or duration (for example, `"2m"`, `"1h"`) | `5` or `2m` |
Multiple thresholds are separated by `@`: `temp:2.5:20:5@load:3:10:2m`
### Optional parameters
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `influxdb3_auth_token` | string | env var | API token for InfluxDB 3 (or use INFLUXDB3_AUTH_TOKEN env var) |
| `state_change_count` | string | "0" | Maximum allowed value flips before suppressing notifications |
| `notification_count_text` | string | see below | Template for count-based alerts with variables: $table, $field, $threshold_count, $tags |
| `notification_time_text` | string | see below | Template for duration-based alerts with variables: $table, $field, $threshold_time, $tags |
| `notification_path` | string | "notify" | URL path for the notification sending plugin |
| `port_override` | string | "8181" | Port number where InfluxDB accepts requests |
| `config_file_path` | string | none | Path to TOML config file relative to PLUGIN_DIR |
Default notification templates:
- Count: `"MAD count alert: Field $field in $table outlier for $threshold_count consecutive points. Tags: $tags"`
- Time: `"MAD duration alert: Field $field in $table outlier for $threshold_time. Tags: $tags"`
### Notification channel parameters
#### Slack
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `slack_webhook_url` | string | Yes | Webhook URL from Slack |
| `slack_headers` | string | No | Base64-encoded HTTP headers |
#### Discord
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `discord_webhook_url` | string | Yes | Webhook URL from Discord |
| `discord_headers` | string | No | Base64-encoded HTTP headers |
#### HTTP
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `http_webhook_url` | string | Yes | Custom webhook URL for POST requests |
| `http_headers` | string | No | Base64-encoded HTTP headers |
#### SMS/WhatsApp (via Twilio)
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `twilio_sid` | string | Yes | Twilio Account SID (or use TWILIO_SID env var) |
| `twilio_token` | string | Yes | Twilio Auth Token (or use TWILIO_TOKEN env var) |
| `twilio_from_number` | string | Yes | Sender phone number |
| `twilio_to_number` | string | Yes | Recipient phone number |
## Schema requirements
The plugin assumes that the table schema is already defined in the database, as it relies on this schema to retrieve field and tag names required for processing.
> [!WARNING]
> #### Requires existing schema
>
> By design, the plugin returns an error if the schema doesn't exist or doesn't contain the expected columns.
### TOML configuration
| Parameter | Type | Default | Description |
|--------------------|--------|---------|----------------------------------------------------------------------------------|
| `config_file_path` | string | none | TOML config file path relative to `PLUGIN_DIR` (required for TOML configuration) |
*To use a TOML configuration file, set the `PLUGIN_DIR` environment variable and specify the `config_file_path` in the trigger arguments.* This is in addition to the `--plugin-dir` flag when starting InfluxDB 3.
#### Example TOML configuration
[mad_anomaly_config_data_writes.toml](https://github.com/influxdata/influxdb3_plugins/blob/master/influxdata/mad_check/mad_anomaly_config_data_writes.toml)
For more information on using TOML configuration files, see the Using TOML Configuration Files section in the [influxdb3_plugins
/README.md](https://github.com/influxdata/influxdb3_plugins/blob/master/README.md).
## Installation
1. Start {{% product-name %}} with the Processing Engine enabled (`--plugin-dir /path/to/plugins`)
2. Install required Python packages:
- `requests` (for notification delivery)
```bash
influxdb3 install package requests
```
3. Install and configure the official [Notifier plugin](https://github.com/influxdata/influxdb3_plugins/tree/main/influxdata/notifier)
## Trigger setup
### Real-time anomaly detection
Detect anomalies as data is written:
```bash
influxdb3 create trigger \
--database mydb \
--plugin-filename gh:influxdata/mad_check/mad_check_plugin.py \
--trigger-spec "all_tables" \
--trigger-arguments 'measurement=cpu,mad_thresholds="temp:2.5:20:5@load:3:10:2m",senders=slack,slack_webhook_url="https://hooks.slack.com/services/..."' \
mad_anomaly_detector
```
## Example usage
### Example 1: Basic count-based anomaly detection
Detect when temperature exceeds 2.5 MADs from the median for 5 consecutive points:
```bash
# Create trigger for count-based detection
influxdb3 create trigger \
--database sensors \
--plugin-filename gh:influxdata/mad_check/mad_check_plugin.py \
--trigger-spec "all_tables" \
--trigger-arguments 'measurement=environment,mad_thresholds="temperature:2.5:20:5",senders=slack,slack_webhook_url="https://hooks.slack.com/services/YOUR/WEBHOOK/URL"' \
temp_anomaly_detector
# Write test data with an anomaly
influxdb3 write \
--database sensors \
"environment,room=office temperature=22.1"
influxdb3 write \
--database sensors \
"environment,room=office temperature=22.3"
influxdb3 write \
--database sensors \
"environment,room=office temperature=45.8" # Anomaly
# Continue writing anomalous values...
```
### Expected results
- Plugin maintains a 20-point window of recent temperature values
- Computes median and MAD from this window
- When temperature exceeds median ± 2.5*MAD for 5 consecutive points, sends Slack notification
- Notification includes: "MAD count alert: Field temperature in environment outlier for 5 consecutive points. Tags: room=office"
### Example 2: Duration-based anomaly detection with multiple fields
Monitor CPU load and memory usage with different thresholds:
```bash
# Create trigger with multiple thresholds
influxdb3 create trigger \
--database monitoring \
--plugin-filename gh:influxdata/mad_check/mad_check_plugin.py \
--trigger-spec "all_tables" \
--trigger-arguments 'measurement=system_metrics,mad_thresholds="cpu_load:3:30:2m@memory_used:2.5:30:5m",senders=slack.discord,slack_webhook_url="https://hooks.slack.com/...",discord_webhook_url="https://discord.com/api/webhooks/..."' \
system_anomaly_detector
```
### Expected results
- Monitors two fields independently:
- `cpu_load`: Alerts when exceeds 3 MADs for 2 minutes
- `memory_used`: Alerts when exceeds 2.5 MADs for 5 minutes
- Sends notifications to both Slack and Discord
### Example 3: Anomaly detection with flip suppression
Prevent alert fatigue from rapidly fluctuating values:
```bash
# Create trigger with flip suppression
influxdb3 create trigger \
--database iot \
--plugin-filename gh:influxdata/mad_check/mad_check_plugin.py \
--trigger-spec "all_tables" \
--trigger-arguments 'measurement=sensor_data,mad_thresholds="vibration:2:50:10",state_change_count=3,senders=http,http_webhook_url="https://api.example.com/alerts",notification_count_text="Vibration anomaly detected on $table. Field: $field, Tags: $tags"' \
vibration_monitor
```
### Expected results
- Detects vibration anomalies exceeding 2 MADs for 10 consecutive points
- If values flip between normal/anomalous more than 3 times in the 50-point window, suppresses notifications
- Sends custom formatted message to HTTP endpoint
## Using TOML Configuration Files
This plugin supports using TOML configuration files to specify all plugin arguments.
### Important Requirements
**To use TOML configuration files, you must set the `PLUGIN_DIR` environment variable in the InfluxDB 3 host environment.**
### Setting Up TOML Configuration
1. **Start InfluxDB 3 with the PLUGIN_DIR environment variable set**:
```bash
PLUGIN_DIR=~/.plugins influxdb3 serve --node-id node0 --object-store file --data-dir ~/.influxdb3 --plugin-dir ~/.plugins
```
2. **Copy the example TOML configuration file to your plugin directory**:
```bash
cp mad_anomaly_config_data_writes.toml ~/.plugins/
```
3. **Edit the TOML file** to match your requirements:
```toml
# Required parameters
measurement = "cpu"
mad_thresholds = "temp:2.5:20:5@load:3:10:2m"
senders = "slack"
# Notification settings
slack_webhook_url = "https://hooks.slack.com/services/..."
notification_count_text = "Custom alert: $field anomaly detected"
```
4. **Create a trigger using the `config_file_path` argument**:
```bash
influxdb3 create trigger \
--database mydb \
--plugin-filename mad_check_plugin.py \
--trigger-spec "all_tables" \
--trigger-arguments config_file_path=mad_anomaly_config_data_writes.toml \
mad_toml_trigger
```
## Code overview
### Files
- `mad_check_plugin.py`: The main plugin code containing the handler for data write triggers
- `mad_anomaly_config_data_writes.toml`: Example TOML configuration file
### Logging
Logs are stored in the `_internal` database in the `system.processing_engine_logs` table:
```bash
influxdb3 query --database _internal "SELECT * FROM system.processing_engine_logs WHERE trigger_name = 'your_trigger_name'"
```
Log columns:
- **event_time**: Timestamp of the log event
- **trigger_name**: Name of the trigger that generated the log
- **log_level**: Severity level (INFO, WARN, ERROR)
- **log_text**: Message describing the action or error
### Main functions
#### `process_writes(influxdb3_local, table_batches, args)`
Handles real-time anomaly detection on incoming data.
Key operations:
1. Filters table batches for the specified measurement
2. Maintains in-memory deques of recent values per field
3. Computes MAD for each monitored field
4. Tracks consecutive outliers and duration
5. Sends notifications when thresholds are met
### Key algorithms
#### MAD (Median Absolute Deviation) Calculation
```python
median = statistics.median(values)
mad = statistics.median([abs(x - median) for x in values])
threshold = k * mad
is_anomaly = abs(value - median) > threshold
```
#### Flip Detection
Counts transitions between normal and anomalous states within the window to prevent alert fatigue from rapidly changing values.
## Troubleshooting
### Common issues
#### Issue: No notifications being sent
**Solution**:
1. Verify the Notification Sender Plugin is installed and running
2. Check webhook URLs are correct:
```bash
influxdb3 query --database _internal "SELECT * FROM system.processing_engine_logs WHERE log_text LIKE '%notification%'"
```
3. Ensure notification channel parameters are provided for selected senders
#### Issue: "Invalid MAD thresholds format" error
**Solution**: Check threshold format is correct:
- Count-based: `field:k:window:count` (for example, `temp:2.5:20:5`)
- Duration-based: `field:k:window:duration` (for example, `temp:2.5:20:2m`)
- Multiple thresholds separated by `@`
#### Issue: Too many false positive alerts
**Solution**:
1. Increase the k multiplier (for example, from 2.5 to 3.0)
2. Increase the threshold count or duration
3. Enable flip suppression with `state_change_count`
4. Increase the window size for more stable statistics
#### Issue: Missing anomalies (false negatives)
**Solution**:
1. Decrease the k multiplier
2. Decrease the threshold count or duration
3. Check if data has seasonal patterns that affect the median
### Debugging tips
1. **Monitor deque sizes**:
```bash
influxdb3 query --database _internal "SELECT * FROM system.processing_engine_logs WHERE log_text LIKE '%Deque%'"
```
2. **Check MAD calculations**:
```bash
influxdb3 query --database _internal "SELECT * FROM system.processing_engine_logs WHERE log_text LIKE '%MAD:%'"
```
3. **Test with known anomalies**:
Write test data with obvious outliers to verify detection
### Performance considerations
- **Memory usage**: Each field maintains a deque of `window_count` values
- **Computation**: MAD is computed on every data write for monitored fields
- **Caching**: Measurement and tag names are cached for 1 hour
- **Notification retries**: Failed notifications retry up to 3 times with exponential backoff
## Report an issue
For plugin issues, see the Plugins repository [issues page](https://github.com/influxdata/influxdb3_plugins/issues).
## Find support for {{% product-name %}}
The [InfluxDB Discord server](https://discord.gg/9zaNCW2PRT) is the best place to find support for InfluxDB 3 Core and InfluxDB 3 Enterprise.
For other InfluxDB versions, see the [Support and feedback](#bug-reports-and-feedback) options.