The System Metrics Plugin monitors the host system for InfluxDB 3, collecting CPU, memory, disk, and network metrics. It reports per-core CPU statistics, memory usage breakdowns, disk I/O performance, and network interface statistics. Each metric type can be enabled or disabled individually, and failures are retried up to a configurable limit (max_retries).
Configuration
Required parameters
None. With no arguments, the plugin collects all system metrics using the defaults listed below.
System monitoring parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| hostname | string | localhost | Hostname to tag all metrics with for system identification |
| include_cpu | boolean | true | Include comprehensive CPU metrics collection (overall and per-core statistics) |
| include_memory | boolean | true | Include memory metrics collection (RAM usage, swap statistics, page faults) |
| include_disk | boolean | true | Include disk metrics collection (partition usage, I/O statistics, performance) |
| include_network | boolean | true | Include network metrics collection (interface statistics and error counts) |
| max_retries | integer | 3 | Maximum retry attempts on failure with graceful error handling |
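Values in `--trigger-arguments` are written as `key=value` text, so boolean and integer parameters have to be converted before use. The following is a minimal sketch of applying the defaults from the table above, assuming argument values arrive as strings. `parse_config` and `_to_bool` are hypothetical helpers for illustration, not the plugin's actual code.

```python
from typing import Optional

# Defaults copied from the parameter table above.
DEFAULTS = {
    "hostname": "localhost",
    "include_cpu": True,
    "include_memory": True,
    "include_disk": True,
    "include_network": True,
    "max_retries": 3,
}


def _to_bool(value) -> bool:
    """Interpret common string spellings of a boolean (assumed convention)."""
    if isinstance(value, bool):
        return value
    return str(value).strip().lower() in {"true", "1", "yes", "on"}


def parse_config(args: Optional[dict]) -> dict:
    """Merge trigger arguments over the documented defaults (hypothetical helper)."""
    config = dict(DEFAULTS)
    for key, raw in (args or {}).items():
        if key not in DEFAULTS:
            continue  # ignore keys this sketch does not know, such as config_file_path
        default = DEFAULTS[key]
        if isinstance(default, bool):  # check bool first: bool is a subclass of int
            config[key] = _to_bool(raw)
        elif isinstance(default, int):
            config[key] = int(raw)
        else:
            config[key] = str(raw)
    return config


# Mirrors: --trigger-arguments hostname=web-server-01,include_disk=false,max_retries=5
config = parse_config(
    {"hostname": "web-server-01", "include_disk": "false", "max_retries": "5"}
)
```

Parameters that are not passed keep their defaults, so in this example `include_cpu`, `include_memory`, and `include_network` stay enabled.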
TOML configuration
| Parameter | Type | Default | Description |
|---|---|---|---|
| config_file_path | string | none | TOML config file path relative to PLUGIN_DIR (required for TOML configuration) |
To use a TOML configuration file, set the PLUGIN_DIR environment variable and specify the config_file_path in the trigger arguments. This is in addition to the --plugin-dir flag when starting InfluxDB 3.
Example TOML configuration
system_metrics_config_scheduler.toml
For more information on using TOML configuration files, see the Using TOML Configuration Files section in the influxdb3_plugins/README.md.
Installation steps
- Start {{% product-name %}} with the Processing Engine enabled (--plugin-dir /path/to/plugins)
- Install required Python packages:
  - psutil (for system metrics collection)
  influxdb3 install package psutil
Trigger setup
Basic scheduled trigger
Monitor system performance every 30 seconds:
influxdb3 create trigger \
--database system_monitoring \
--plugin-filename gh:influxdata/system_metrics/system_metrics.py \
--trigger-spec "every:30s" \
system_metrics_trigger
Custom configuration
Monitor specific metrics with a custom hostname:
influxdb3 create trigger \
--database system_monitoring \
--plugin-filename gh:influxdata/system_metrics/system_metrics.py \
--trigger-spec "every:30s" \
--trigger-arguments hostname=web-server-01,include_disk=false,max_retries=5 \
system_metrics_custom_trigger
Example usage
Example 1: Web server monitoring
Monitor web server performance every 15 seconds with network statistics:
# Create trigger for web server monitoring
influxdb3 create trigger \
--database web_monitoring \
--plugin-filename gh:influxdata/system_metrics/system_metrics.py \
--trigger-spec "every:15s" \
--trigger-arguments hostname=web-server-01,include_network=true \
web_server_metrics
# Query recent CPU metrics
influxdb3 query \
--database web_monitoring \
"SELECT * FROM system_cpu WHERE time >= now() - interval '5 minutes' LIMIT 5"
Expected output
+---------------+-------+------+--------+------+--------+-------+-------+-----------+------------------+
| host | cpu | user | system | idle | iowait | nice | load1 | load5 | time |
+---------------+-------+------+--------+------+--------+-------+-------+-----------+------------------+
| web-server-01 | total | 12.5 | 5.3 | 81.2 | 0.8 | 0.0 | 0.85 | 0.92 | 2024-01-15 10:00 |
| web-server-01 | total | 13.1 | 5.5 | 80.4 | 0.7 | 0.0 | 0.87 | 0.93 | 2024-01-15 10:01 |
| web-server-01 | total | 11.8 | 5.1 | 82.0 | 0.9 | 0.0 | 0.83 | 0.91 | 2024-01-15 10:02 |
+---------------+-------+------+--------+------+--------+-------+-------+-----------+------------------+
Example 2: Database server monitoring
Focus on CPU and disk metrics for a database server by disabling network collection:
# Create trigger for database server
influxdb3 create trigger \
--database db_monitoring \
--plugin-filename gh:influxdata/system_metrics/system_metrics.py \
--trigger-spec "every:30s" \
--trigger-arguments hostname=db-primary,include_disk=true,include_cpu=true,include_network=false \
database_metrics
# Query disk usage
influxdb3 query \
--database db_monitoring \
"SELECT * FROM system_disk_usage WHERE host = 'db-primary'"
Example 3: High-frequency monitoring
Collect all metrics every 10 seconds with higher retry tolerance:
# Create high-frequency monitoring trigger
influxdb3 create trigger \
--database system_monitoring \
--plugin-filename gh:influxdata/system_metrics/system_metrics.py \
--trigger-spec "every:10s" \
--trigger-arguments hostname=critical-server,max_retries=10 \
high_freq_metrics
Code overview
Files
- system_metrics.py: The main plugin code containing system metrics collection logic
- system_metrics_config_scheduler.toml: Example TOML configuration file for scheduled triggers
Logging
Logs are stored in the _internal database (or the database where the trigger is created) in the system.processing_engine_logs table. To view logs:
influxdb3 query --database _internal "SELECT * FROM system.processing_engine_logs WHERE trigger_name = 'your_trigger_name'"
Log columns:
- event_time: Timestamp of the log event
- trigger_name: Name of the trigger that generated the log
- log_level: Severity level (INFO, WARN, ERROR)
- log_text: Message describing the action or error
Main functions
process_scheduled_call(influxdb3_local, call_time, args)
The main entry point for scheduled triggers. Collects system metrics based on configuration and writes them to InfluxDB.
Key operations:
- Parses configuration from arguments
- Collects CPU, memory, disk, and network metrics based on configuration
- Writes metrics to InfluxDB with proper error handling and retry logic
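The retry step in the list above can be pictured with a short sketch. It treats `max_retries` as the total number of attempts, which is an assumption, since this page does not say whether the first attempt counts. `write_with_retry` and `flaky_write` are hypothetical names, not functions from system_metrics.py.

```python
import time
from typing import Callable


def write_with_retry(write_fn: Callable[[], None], max_retries: int = 3,
                     delay_seconds: float = 0.0) -> bool:
    """Call write_fn until it succeeds or max_retries attempts are used up."""
    for attempt in range(1, max_retries + 1):
        try:
            write_fn()
            return True
        except Exception as exc:
            # Log and keep going instead of letting one failure end the run.
            print(f"write attempt {attempt}/{max_retries} failed: {exc}")
            if attempt < max_retries and delay_seconds > 0:
                time.sleep(delay_seconds)
    return False


# Example: a writer that fails twice, then succeeds on the third attempt.
calls = {"count": 0}


def flaky_write() -> None:
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("temporary failure")


succeeded = write_with_retry(flaky_write, max_retries=3)
```

Returning a boolean rather than raising lets the caller log the failure and wait for the next scheduled run.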
collect_cpu_metrics(influxdb3_local, hostname)
Collects CPU utilization and performance metrics including per-core statistics and system load averages.
collect_memory_metrics(influxdb3_local, hostname)
Collects memory usage statistics including RAM, swap, and page fault information.
collect_disk_metrics(influxdb3_local, hostname)
Collects disk usage and I/O statistics for all mounted partitions.
collect_network_metrics(influxdb3_local, hostname)
Collects network interface statistics including bytes transferred and error counts.
Measurements and Fields
system_cpu
Overall CPU statistics and metrics:
- Tags: host, cpu=total
- Fields: user, system, idle, iowait, nice, irq, softirq, steal, guest, guest_nice, frequency_current, frequency_min, frequency_max, ctx_switches, interrupts, soft_interrupts, syscalls, load1, load5, load15
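To make the tag and field layout concrete, here is a minimal sketch that formats one system_cpu point as InfluxDB line protocol, using values from the expected output earlier on this page. `to_line_protocol` is a hypothetical helper that handles only float fields and performs no escaping. The plugin itself may build its points differently.

```python
def to_line_protocol(measurement: str, tags: dict, fields: dict,
                     timestamp_ns: int) -> str:
    """Format one point as line protocol (float fields only, no escaping)."""
    # Tags are sorted by key, the recommended order for line protocol.
    tag_part = ",".join(f"{key}={value}" for key, value in sorted(tags.items()))
    field_part = ",".join(f"{key}={float(value)}" for key, value in fields.items())
    return f"{measurement},{tag_part} {field_part} {timestamp_ns}"


line = to_line_protocol(
    "system_cpu",
    {"host": "web-server-01", "cpu": "total"},
    {"user": 12.5, "system": 5.3, "idle": 81.2},
    1705312800000000000,  # 2024-01-15 10:00:00 UTC in nanoseconds
)
```

The tags (`host`, `cpu`) identify the series, while the fields carry the sampled values.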
system_cpu_cores
Per-core CPU statistics:
- Tags: host, core (core number)
- Fields: usage, user, system, idle, iowait, nice, irq, softirq, steal, guest, guest_nice, frequency_current, frequency_min, frequency_max
system_memory
System memory statistics:
- Tags: host
- Fields: total, available, used, free, active, inactive, buffers, cached, shared, slab, percent
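The `percent` field deserves a note. psutil documents its memory percentage as (total - available) / total * 100, so it can differ from used / total. The sketch below reproduces that calculation, assuming the plugin passes the psutil value through unchanged, which this page does not confirm. `memory_percent` is a hypothetical helper.

```python
def memory_percent(total: int, available: int) -> float:
    """Percentage of memory in use: (total - available) / total * 100."""
    if total <= 0:
        return 0.0  # avoid division by zero when no total is reported
    return round((total - available) / total * 100, 1)


# Example: 16 GiB total with 4 GiB available.
GIB = 1024 ** 3
percent = memory_percent(16 * GIB, 4 * GIB)
```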
system_swap
Swap memory statistics:
- Tags: host
- Fields: total, used, free, percent, sin, sout
system_memory_faults
Memory page fault information (when available):
- Tags: host
- Fields: page_faults, major_faults, minor_faults, rss, vms, dirty, uss, pss
system_disk_usage
Disk partition usage:
- Tags: host, device, mountpoint, fstype
- Fields: total, used, free, percent
system_disk_io
Disk I/O statistics:
- Tags: host, device
- Fields: reads, writes, read_bytes, write_bytes, read_time, write_time, busy_time, read_merged_count, write_merged_count
system_disk_performance
Calculated disk performance metrics:
- Tags: host, device
- Fields: read_bytes_per_sec, write_bytes_per_sec, read_iops, write_iops, avg_read_latency_ms, avg_write_latency_ms, util_percent
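This page does not give the formulas behind these calculated fields. One common way to derive them is to difference two samples of the cumulative system_disk_io counters, as sketched below. It assumes that read_time, write_time, and busy_time are in milliseconds. `DiskSample` and `disk_performance` are hypothetical names, not the plugin's code.

```python
from dataclasses import dataclass


@dataclass
class DiskSample:
    """Cumulative I/O counters for one device at one point in time."""
    timestamp: float   # seconds
    reads: int         # completed read operations
    writes: int        # completed write operations
    read_bytes: int
    write_bytes: int
    read_time: int     # milliseconds spent reading (assumed unit)
    write_time: int    # milliseconds spent writing (assumed unit)
    busy_time: int     # milliseconds the device was busy (assumed unit)


def disk_performance(prev: DiskSample, curr: DiskSample) -> dict:
    """Derive per-second rates and average latencies from two samples."""
    elapsed = curr.timestamp - prev.timestamp
    if elapsed <= 0:
        return {}  # cannot compute rates without a positive interval
    d_reads = curr.reads - prev.reads
    d_writes = curr.writes - prev.writes
    d_busy = curr.busy_time - prev.busy_time
    return {
        "read_bytes_per_sec": (curr.read_bytes - prev.read_bytes) / elapsed,
        "write_bytes_per_sec": (curr.write_bytes - prev.write_bytes) / elapsed,
        "read_iops": d_reads / elapsed,
        "write_iops": d_writes / elapsed,
        "avg_read_latency_ms": (curr.read_time - prev.read_time) / d_reads if d_reads else 0.0,
        "avg_write_latency_ms": (curr.write_time - prev.write_time) / d_writes if d_writes else 0.0,
        # Busy milliseconds as a share of elapsed milliseconds, capped at 100.
        "util_percent": min(100.0, d_busy / (elapsed * 1000) * 100),
    }


prev = DiskSample(0.0, 0, 0, 0, 0, 0, 0, 0)
curr = DiskSample(10.0, 100, 50, 1_000_000, 500_000, 200, 150, 2500)
perf = disk_performance(prev, curr)
```

In this example the device performed 100 reads in 10 seconds, which gives 10 read IOPS, an average read latency of 2 ms, and 25% utilization.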
system_network
Network interface statistics:
- Tags: host, interface
- Fields: bytes_sent, bytes_recv, packets_sent, packets_recv, errin, errout, dropin, dropout
Troubleshooting
Common issues
Issue: Permission errors on disk I/O metrics
Some disk I/O metrics may require elevated permissions.
Solution: The plugin will continue collecting other metrics even if some require elevated permissions. Consider running InfluxDB 3 with appropriate permissions if disk I/O metrics are critical.
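The "continue collecting other metrics" behavior can be sketched as a loop that isolates each collector. This is an illustration under assumptions, not the plugin's implementation: `run_collectors` and `collect_disk` are hypothetical, and the `include_<name>` lookup simply mirrors the documented parameter names.

```python
def run_collectors(collectors: dict, config: dict) -> dict:
    """Run each enabled collector and record failures instead of aborting."""
    results = {}
    for name, collect in collectors.items():
        if not config.get(f"include_{name}", True):
            results[name] = "skipped"
            continue
        try:
            collect()
            results[name] = "ok"
        except PermissionError as exc:
            # A collector that needs elevated permissions does not stop the rest.
            results[name] = f"permission error: {exc}"
        except Exception as exc:
            results[name] = f"failed: {exc}"
    return results


def collect_disk() -> None:
    raise PermissionError("disk I/O counters not readable")


results = run_collectors(
    {"cpu": lambda: None, "disk": collect_disk, "network": lambda: None},
    {"include_network": False},
)
```

Here the CPU collector still succeeds even though the disk collector raises a permission error, and the network collector is skipped because it is disabled.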
Issue: Missing psutil library
ERROR: No module named 'psutil'
Solution: Install the psutil package:
influxdb3 install package psutil
Issue: High CPU usage from plugin
If the plugin causes high CPU usage, consider:
- Increasing the trigger interval (for example, from every:10s to every:30s)
- Disabling unnecessary metric types
- Reducing the number of disk partitions monitored
Issue: No data being collected
Solution:
- Check that the trigger is active by looking for its entries in the plugin logs:
  influxdb3 query --database _internal "SELECT * FROM system.processing_engine_logs WHERE trigger_name = 'your_trigger_name'"
- Verify system permissions allow access to system metrics
- Check that the psutil package is properly installed
Debugging tips
- Check recent metrics collection:
  # List all system metric measurements (SHOW MEASUREMENTS is InfluxQL, so select that query language)
  influxdb3 query \
  --database system_monitoring \
  --language influxql \
  "SHOW MEASUREMENTS WITH MEASUREMENT =~ /^system_/"
  # Check recent CPU metrics
  influxdb3 query \
  --database system_monitoring \
  "SELECT COUNT(*) FROM system_cpu WHERE time >= now() - interval '1 hour'"
- Monitor plugin logs:
  influxdb3 query \
  --database _internal \
  "SELECT * FROM system.processing_engine_logs WHERE trigger_name = 'system_metrics_trigger' ORDER BY event_time DESC LIMIT 10"
- Test metric collection manually:
  influxdb3 test schedule_plugin \
  --database system_monitoring \
  --schedule "0 0 * * * ?" \
  system_metrics.py
Performance considerations
- The plugin collects comprehensive system metrics efficiently using the psutil library
- Metric collection is optimized to minimize system overhead
- Error handling and retry logic ensure reliable operation
- Configurable metric types allow focusing on relevant metrics only
Report an issue
For plugin issues, see the Plugins repository issues page.
Find support for {{% product-name %}}
The InfluxDB Discord server is the best place to find support for InfluxDB 3 Core and InfluxDB 3 Enterprise. For other InfluxDB versions, see the Support and feedback options.