The System Metrics Plugin provides comprehensive system monitoring capabilities for InfluxDB 3, collecting CPU, memory, disk, and network metrics from the host system.
Monitor per-core CPU statistics, memory usage breakdowns, disk I/O performance, and network interface statistics.
Metric collection is configurable, and the plugin includes error handling and retry logic for reliable monitoring.

## Configuration

### Required parameters

No parameters are required. By default, the plugin collects all system metrics.

### System monitoring parameters

| Parameter         | Type    | Default     | Description                                                                    |
|-------------------|---------|-------------|--------------------------------------------------------------------------------|
| `hostname`        | string  | `localhost` | Hostname to tag all metrics with for system identification                     |
| `include_cpu`     | boolean | `true`      | Include comprehensive CPU metrics collection (overall and per-core statistics) |
| `include_memory`  | boolean | `true`      | Include memory metrics collection (RAM usage, swap statistics, page faults)    |
| `include_disk`    | boolean | `true`      | Include disk metrics collection (partition usage, I/O statistics, performance) |
| `include_network` | boolean | `true`      | Include network metrics collection (interface statistics and error counts)     |
| `max_retries`     | integer | `3`         | Maximum retry attempts on failure                                              |

### TOML configuration

| Parameter          | Type   | Default | Description                                                                      |
|--------------------|--------|---------|----------------------------------------------------------------------------------|
| `config_file_path` | string | none    | TOML config file path relative to `PLUGIN_DIR` (required for TOML configuration) |

*To use a TOML configuration file, set the `PLUGIN_DIR` environment variable and specify the `config_file_path` in the trigger arguments.* This is in addition to the `--plugin-dir` flag when starting InfluxDB 3.
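
Trigger arguments are supplied as `key=value` pairs, so a plugin typically has to convert them to the types listed in the parameter tables. The following is a minimal sketch of that step, assuming the arguments arrive as a dictionary of strings. `DEFAULTS` and `parse_args` are hypothetical names used for illustration and are not taken from `system_metrics.py`.

```python
# Hypothetical helper for illustration; not the plugin's actual code.
# Defaults mirror the "System monitoring parameters" table.
DEFAULTS = {
    "hostname": "localhost",
    "include_cpu": True,
    "include_memory": True,
    "include_disk": True,
    "include_network": True,
    "max_retries": 3,
}


def parse_args(args):
    """Merge string-valued trigger arguments over the documented defaults."""
    config = dict(DEFAULTS)
    for key, raw in (args or {}).items():
        if key not in DEFAULTS:
            continue  # this sketch ignores keys it does not know about
        default = DEFAULTS[key]
        # bool must be checked before int because bool is a subclass of int.
        if isinstance(default, bool):
            config[key] = str(raw).strip().lower() in ("true", "1", "yes")
        elif isinstance(default, int):
            config[key] = int(raw)
        else:
            config[key] = str(raw)
    return config


if __name__ == "__main__":
    # Same arguments as hostname=web-server-01,include_disk=false,max_retries=5
    print(parse_args({
        "hostname": "web-server-01",
        "include_disk": "false",
        "max_retries": "5",
    }))
```

Checking `bool` before `int` matters here: in Python, `True` is also an `int`, so reversing the order would try to parse `"false"` as a number.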

#### Example TOML configuration

[system_metrics_config_scheduler.toml](https://github.com/influxdata/influxdb3_plugins/blob/master/influxdata/system_metrics/system_metrics_config_scheduler.toml)

For more information on using TOML configuration files, see the Using TOML Configuration Files section in the [influxdb3_plugins/README.md](https://github.com/influxdata/influxdb3_plugins/blob/master/README.md).

## Installation steps

1. Start {{% product-name %}} with the Processing Engine enabled (`--plugin-dir /path/to/plugins`)
2. Install required Python packages:

   - `psutil` (for system metrics collection)

   ```bash
   influxdb3 install package psutil
   ```

## Trigger setup

### Basic scheduled trigger

Monitor system performance every 30 seconds:

```bash
influxdb3 create trigger \
  --database system_monitoring \
  --plugin-filename gh:influxdata/system_metrics/system_metrics.py \
  --trigger-spec "every:30s" \
  system_metrics_trigger
```

### Custom configuration

Monitor specific metrics with a custom hostname:

```bash
influxdb3 create trigger \
  --database system_monitoring \
  --plugin-filename gh:influxdata/system_metrics/system_metrics.py \
  --trigger-spec "every:30s" \
  --trigger-arguments hostname=web-server-01,include_disk=false,max_retries=5 \
  system_metrics_custom_trigger
```

## Example usage

### Example 1: Web server monitoring

Monitor web server performance every 15 seconds with network statistics:

```bash
# Create trigger for web server monitoring
influxdb3 create trigger \
  --database web_monitoring \
  --plugin-filename gh:influxdata/system_metrics/system_metrics.py \
  --trigger-spec "every:15s" \
  --trigger-arguments hostname=web-server-01,include_network=true \
  web_server_metrics

# Query recent CPU metrics
influxdb3 query \
  --database web_monitoring \
  "SELECT * FROM system_cpu WHERE time >= now() - interval '5 minutes' LIMIT 5"
```

### Expected output

```
+---------------+-------+------+--------+------+--------+------+-------+-------+------------------+
| host          | cpu   | user | system | idle | iowait | nice | load1 | load5 | time             |
+---------------+-------+------+--------+------+--------+------+-------+-------+------------------+
| web-server-01 | total | 12.5 | 5.3    | 81.2 | 0.8    | 0.0  | 0.85  | 0.92  | 2024-01-15 10:00 |
| web-server-01 | total | 13.1 | 5.5    | 80.4 | 0.7    | 0.0  | 0.87  | 0.93  | 2024-01-15 10:01 |
| web-server-01 | total | 11.8 | 5.1    | 82.0 | 0.9    | 0.0  | 0.83  | 0.91  | 2024-01-15 10:02 |
+---------------+-------+------+--------+------+--------+------+-------+-------+------------------+
```

### Example 2: Database server monitoring

Focus on CPU and disk metrics for a database server:

```bash
# Create trigger for database server
influxdb3 create trigger \
  --database db_monitoring \
  --plugin-filename gh:influxdata/system_metrics/system_metrics.py \
  --trigger-spec "every:30s" \
  --trigger-arguments hostname=db-primary,include_disk=true,include_cpu=true,include_network=false \
  database_metrics

# Query disk usage
influxdb3 query \
  --database db_monitoring \
  "SELECT * FROM system_disk_usage WHERE host = 'db-primary'"
```

### Example 3: High-frequency monitoring

Collect all metrics every 10 seconds with higher retry tolerance:

```bash
# Create high-frequency monitoring trigger
influxdb3 create trigger \
  --database system_monitoring \
  --plugin-filename gh:influxdata/system_metrics/system_metrics.py \
  --trigger-spec "every:10s" \
  --trigger-arguments hostname=critical-server,max_retries=10 \
  high_freq_metrics
```

## Code overview

### Files

- `system_metrics.py`: The main plugin code containing system metrics collection logic
- `system_metrics_config_scheduler.toml`: Example TOML configuration file for scheduled triggers

### Logging

Logs are stored in the `_internal` database (or the database where the trigger is created) in the `system.processing_engine_logs` table.
To view logs:

```bash
influxdb3 query --database _internal "SELECT * FROM system.processing_engine_logs WHERE trigger_name = 'your_trigger_name'"
```

Log columns:

- **event_time**: Timestamp of the log event
- **trigger_name**: Name of the trigger that generated the log
- **log_level**: Severity level (INFO, WARN, ERROR)
- **log_text**: Message describing the action or error

### Main functions

#### `process_scheduled_call(influxdb3_local, call_time, args)`

The main entry point for scheduled triggers.
Collects system metrics based on configuration and writes them to InfluxDB.

Key operations:

1. Parses configuration from arguments
2. Collects CPU, memory, disk, and network metrics based on configuration
3. Writes metrics to InfluxDB with proper error handling and retry logic

#### `collect_cpu_metrics(influxdb3_local, hostname)`

Collects CPU utilization and performance metrics including per-core statistics and system load averages.

#### `collect_memory_metrics(influxdb3_local, hostname)`

Collects memory usage statistics including RAM, swap, and page fault information.

#### `collect_disk_metrics(influxdb3_local, hostname)`

Collects disk usage and I/O statistics for all mounted partitions.

#### `collect_network_metrics(influxdb3_local, hostname)`

Collects network interface statistics including bytes transferred and error counts.
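
The control flow described for `process_scheduled_call` (run only the enabled collectors, retry on failure, and keep going if one collector fails) can be sketched roughly as follows. This is an illustration under assumptions, not the plugin's implementation: `with_retries` and `run_collectors` are hypothetical names, the collectors are plain callables that take a hostname, and nothing is written to InfluxDB.

```python
import time


def with_retries(task, max_retries=3, delay_seconds=0.0):
    """Call task() until it succeeds or max_retries attempts have failed."""
    attempts = max(1, max_retries)
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return task()
        except Exception as exc:  # a real plugin might catch narrower errors
            last_error = exc
            if attempt < attempts:
                time.sleep(delay_seconds)
    raise last_error


def run_collectors(collectors, config):
    """Run each enabled collector; a failure in one does not stop the rest."""
    results = {}
    for name, collect in collectors.items():
        # Skip metric types disabled via include_<name>=false.
        if not config.get(f"include_{name}", True):
            continue
        try:
            results[name] = with_retries(
                lambda c=collect: c(config["hostname"]),
                config.get("max_retries", 3),
            )
        except Exception as exc:
            results[name] = f"failed: {exc}"
    return results


if __name__ == "__main__":
    # Stand-in collectors; the real ones gather data from the host system.
    demo = {
        "cpu": lambda host: {"host": host, "user": 12.5},
        "disk": lambda host: {"host": host, "percent": 40.0},
    }
    print(run_collectors(demo, {"hostname": "web-server-01", "include_disk": False}))
```

Isolating each collector in its own `try` block is one way to get the behavior described in the troubleshooting section, where the plugin keeps collecting other metrics even if one type cannot be read.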

### Measurements and Fields

#### system_cpu

Overall CPU statistics and metrics:

- **Tags**: `host`, `cpu=total`
- **Fields**: `user`, `system`, `idle`, `iowait`, `nice`, `irq`, `softirq`, `steal`, `guest`, `guest_nice`, `frequency_current`, `frequency_min`, `frequency_max`, `ctx_switches`, `interrupts`, `soft_interrupts`, `syscalls`, `load1`, `load5`, `load15`

#### system_cpu_cores

Per-core CPU statistics:

- **Tags**: `host`, `core` (core number)
- **Fields**: `usage`, `user`, `system`, `idle`, `iowait`, `nice`, `irq`, `softirq`, `steal`, `guest`, `guest_nice`, `frequency_current`, `frequency_min`, `frequency_max`

#### system_memory

System memory statistics:

- **Tags**: `host`
- **Fields**: `total`, `available`, `used`, `free`, `active`, `inactive`, `buffers`, `cached`, `shared`, `slab`, `percent`

#### system_swap

Swap memory statistics:

- **Tags**: `host`
- **Fields**: `total`, `used`, `free`, `percent`, `sin`, `sout`

#### system_memory_faults

Memory page fault information (when available):

- **Tags**: `host`
- **Fields**: `page_faults`, `major_faults`, `minor_faults`, `rss`, `vms`, `dirty`, `uss`, `pss`

#### system_disk_usage

Disk partition usage:

- **Tags**: `host`, `device`, `mountpoint`, `fstype`
- **Fields**: `total`, `used`, `free`, `percent`

#### system_disk_io

Disk I/O statistics:

- **Tags**: `host`, `device`
- **Fields**: `reads`, `writes`, `read_bytes`, `write_bytes`, `read_time`, `write_time`, `busy_time`, `read_merged_count`, `write_merged_count`

#### system_disk_performance

Calculated disk performance metrics:

- **Tags**: `host`, `device`
- **Fields**: `read_bytes_per_sec`, `write_bytes_per_sec`, `read_iops`, `write_iops`, `avg_read_latency_ms`, `avg_write_latency_ms`, `util_percent`

#### system_network

Network interface statistics:

- **Tags**: `host`, `interface`
- **Fields**: `bytes_sent`, `bytes_recv`, `packets_sent`, `packets_recv`, `errin`, `errout`, `dropin`, `dropout`

## Troubleshooting

### Common issues

#### Issue: Permission errors on disk I/O metrics

Some disk I/O metrics may require elevated permissions.

**Solution**: The plugin continues collecting other metrics even if some require elevated permissions. Consider running InfluxDB 3 with appropriate permissions if disk I/O metrics are critical.

#### Issue: Missing psutil library

```
ERROR: No module named 'psutil'
```

**Solution**: Install the psutil package:

```bash
influxdb3 install package psutil
```

#### Issue: High CPU usage from plugin

If the plugin causes high CPU usage, consider:

- Increasing the trigger interval (for example, from `every:10s` to `every:30s`)
- Disabling unnecessary metric types
- Reducing the number of disk partitions monitored

#### Issue: No data being collected

**Solution**:

1. Check that the trigger is active:

   ```bash
   influxdb3 query --database _internal "SELECT * FROM system.processing_engine_logs WHERE trigger_name = 'your_trigger_name'"
   ```

2. Verify system permissions allow access to system metrics
3. Check that the psutil package is properly installed

### Debugging tips

1. **Check recent metrics collection**:

   ```bash
   # List all system metric measurements (InfluxQL)
   influxdb3 query \
     --database system_monitoring \
     --language influxql \
     "SHOW MEASUREMENTS WITH MEASUREMENT =~ /^system_/"

   # Check recent CPU metrics
   influxdb3 query \
     --database system_monitoring \
     "SELECT COUNT(*) FROM system_cpu WHERE time >= now() - interval '1 hour'"
   ```

2. **Monitor plugin logs**:

   ```bash
   influxdb3 query \
     --database _internal \
     "SELECT * FROM system.processing_engine_logs WHERE trigger_name = 'system_metrics_trigger' ORDER BY event_time DESC LIMIT 10"
   ```

3. **Test metric collection manually**:

   ```bash
   influxdb3 test schedule_plugin \
     --database system_monitoring \
     --schedule "0 0 * * * ?" \
     system_metrics.py
   ```

### Performance considerations

- The plugin collects comprehensive system metrics efficiently using the psutil library
- Metric collection is optimized to minimize system overhead
- Error handling and retry logic ensure reliable operation
- Configurable metric types allow focusing on relevant metrics only

## Report an issue

For plugin issues, see the Plugins repository [issues page](https://github.com/influxdata/influxdb3_plugins/issues).

## Find support for {{% product-name %}}

The [InfluxDB Discord server](https://discord.gg/9zaNCW2PRT) is the best place to find support for InfluxDB 3 Core and InfluxDB 3 Enterprise.
For other InfluxDB versions, see the [Support and feedback](#bug-reports-and-feedback) options.