The System Metrics Plugin monitors the host system for InfluxDB 3, collecting CPU, memory, disk, and network metrics. It reports per-core CPU statistics, memory usage breakdowns, disk I/O performance, and network interface statistics. You choose which metric types to collect, and the plugin retries failed operations up to a configurable limit.
Configuration
Required parameters
No required parameters. All metric types are collected by default.
System monitoring parameters
Parameter | Type | Default | Description |
---|---|---|---|
hostname | string | localhost | Hostname to tag all metrics with for system identification |
include_cpu | boolean | true | Include comprehensive CPU metrics collection (overall and per-core statistics) |
include_memory | boolean | true | Include memory metrics collection (RAM usage, swap statistics, page faults) |
include_disk | boolean | true | Include disk metrics collection (partition usage, I/O statistics, performance) |
include_network | boolean | true | Include network metrics collection (interface statistics and error counts) |
max_retries | integer | 3 | Maximum retry attempts on failure with graceful error handling |
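The defaults in the table can be applied to incoming trigger arguments roughly as follows. This is a minimal sketch, not the plugin's actual code: `MetricsConfig`, `parse_config`, and the accepted boolean spellings are hypothetical, and it assumes trigger arguments arrive as a dictionary of strings.

```python
from dataclasses import dataclass

# Accepted boolean spellings are an assumption for this sketch.
TRUE_VALUES = {"true", "1", "yes", "on"}
FALSE_VALUES = {"false", "0", "no", "off"}


@dataclass
class MetricsConfig:
    """Defaults mirror the parameter table above."""
    hostname: str = "localhost"
    include_cpu: bool = True
    include_memory: bool = True
    include_disk: bool = True
    include_network: bool = True
    max_retries: int = 3


def _to_bool(value, default):
    """Interpret a trigger-argument value as a boolean, else keep the default."""
    if isinstance(value, bool):
        return value
    text = str(value).strip().lower()
    if text in TRUE_VALUES:
        return True
    if text in FALSE_VALUES:
        return False
    return default


def parse_config(args):
    """Build a MetricsConfig from a dict of trigger arguments (may be None)."""
    args = args or {}
    cfg = MetricsConfig()
    cfg.hostname = str(args.get("hostname", cfg.hostname))
    for name in ("include_cpu", "include_memory", "include_disk", "include_network"):
        if name in args:
            setattr(cfg, name, _to_bool(args[name], getattr(cfg, name)))
    try:
        cfg.max_retries = max(0, int(args.get("max_retries", cfg.max_retries)))
    except (TypeError, ValueError):
        pass  # keep the default when the value is not an integer
    return cfg
```

For example, `parse_config({"include_disk": "false", "max_retries": "5"})` disables disk metrics, raises the retry limit to 5, and leaves every other setting at its default.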
TOML configuration
Parameter | Type | Default | Description |
---|---|---|---|
config_file_path | string | none | TOML config file path relative to PLUGIN_DIR (required for TOML configuration) |
To use a TOML configuration file, set the PLUGIN_DIR environment variable and specify the config_file_path in the trigger arguments. This is in addition to the --plugin-dir flag when starting InfluxDB 3.
Example TOML configuration
system_metrics_config_scheduler.toml
For more information on using TOML configuration files, see the Using TOML Configuration Files section in the influxdb3_plugins/README.md.
Installation steps
- Start {{% product-name %}} with the Processing Engine enabled (--plugin-dir /path/to/plugins)
- Install required Python packages:
  - psutil (for system metrics collection)

  influxdb3 install package psutil
Trigger setup
Basic scheduled trigger
Monitor system performance every 30 seconds:
influxdb3 create trigger \
--database system_monitoring \
--plugin-filename gh:influxdata/system_metrics/system_metrics.py \
--trigger-spec "every:30s" \
system_metrics_trigger
Custom configuration
Monitor specific metrics with a custom hostname:
influxdb3 create trigger \
--database system_monitoring \
--plugin-filename gh:influxdata/system_metrics/system_metrics.py \
--trigger-spec "every:30s" \
--trigger-arguments hostname=web-server-01,include_disk=false,max_retries=5 \
system_metrics_custom_trigger
Example usage
Example 1: Web server monitoring
Monitor web server performance every 15 seconds with network statistics:
# Create trigger for web server monitoring
influxdb3 create trigger \
--database web_monitoring \
--plugin-filename gh:influxdata/system_metrics/system_metrics.py \
--trigger-spec "every:15s" \
--trigger-arguments hostname=web-server-01,include_network=true \
web_server_metrics
# Query recent CPU metrics
influxdb3 query \
--database web_monitoring \
"SELECT * FROM system_cpu WHERE time >= now() - interval '5 minutes' LIMIT 5"
Expected output
+---------------+-------+------+--------+------+--------+-------+-------+-----------+------------------+
| host          | cpu   | user | system | idle | iowait | nice  | load1 | load5     | time             |
+---------------+-------+------+--------+------+--------+-------+-------+-----------+------------------+
| web-server-01 | total | 12.5 | 5.3    | 81.2 | 0.8    | 0.0   | 0.85  | 0.92      | 2024-01-15 10:00 |
| web-server-01 | total | 13.1 | 5.5    | 80.4 | 0.7    | 0.0   | 0.87  | 0.93      | 2024-01-15 10:01 |
| web-server-01 | total | 11.8 | 5.1    | 82.0 | 0.9    | 0.0   | 0.83  | 0.91      | 2024-01-15 10:02 |
+---------------+-------+------+--------+------+--------+-------+-------+-----------+------------------+
Example 2: Database server monitoring
Focus on CPU and disk metrics for a database server:
# Create trigger for database server
influxdb3 create trigger \
--database db_monitoring \
--plugin-filename gh:influxdata/system_metrics/system_metrics.py \
--trigger-spec "every:30s" \
--trigger-arguments hostname=db-primary,include_disk=true,include_cpu=true,include_network=false \
database_metrics
# Query disk usage
influxdb3 query \
--database db_monitoring \
"SELECT * FROM system_disk_usage WHERE host = 'db-primary'"
Example 3: High-frequency monitoring
Collect all metrics every 10 seconds with higher retry tolerance:
# Create high-frequency monitoring trigger
influxdb3 create trigger \
--database system_monitoring \
--plugin-filename gh:influxdata/system_metrics/system_metrics.py \
--trigger-spec "every:10s" \
--trigger-arguments hostname=critical-server,max_retries=10 \
high_freq_metrics
Code overview
Files
- system_metrics.py: The main plugin code containing system metrics collection logic
- system_metrics_config_scheduler.toml: Example TOML configuration file for scheduled triggers
Logging
Logs are stored in the _internal database (or the database where the trigger is created) in the system.processing_engine_logs table. To view logs:
influxdb3 query --database _internal "SELECT * FROM system.processing_engine_logs WHERE trigger_name = 'your_trigger_name'"
Log columns:
- event_time: Timestamp of the log event
- trigger_name: Name of the trigger that generated the log
- log_level: Severity level (INFO, WARN, ERROR)
- log_text: Message describing the action or error
Main functions
process_scheduled_call(influxdb3_local, call_time, args)
The main entry point for scheduled triggers. Collects system metrics based on configuration and writes them to InfluxDB.
Key operations:
- Parses configuration from arguments
- Collects CPU, memory, disk, and network metrics based on configuration
- Writes metrics to InfluxDB with proper error handling and retry logic
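The retry step in the last bullet can be sketched as a small wrapper. This illustrates the general technique only: `write_with_retry`, the linear backoff, and the reading of max_retries as "retries after the first attempt" are assumptions, not the plugin's confirmed behavior.

```python
import time


def write_with_retry(write_fn, max_retries=3, base_delay=0.0):
    """Call write_fn until it succeeds or max_retries retries are used up.

    Returns the number of attempts made; re-raises the last error on failure.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            write_fn()
            return attempts
        except Exception:
            if attempts > max_retries:
                raise  # retries exhausted: surface the last error
            # Linear backoff; the delay policy here is an assumption.
            time.sleep(base_delay * attempts)
```

With max_retries=3 the callable runs at most four times: one initial attempt plus three retries.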
collect_cpu_metrics(influxdb3_local, hostname)
Collects CPU utilization and performance metrics including per-core statistics and system load averages.
collect_memory_metrics(influxdb3_local, hostname)
Collects memory usage statistics including RAM, swap, and page fault information.
collect_disk_metrics(influxdb3_local, hostname)
Collects disk usage and I/O statistics for all mounted partitions.
collect_network_metrics(influxdb3_local, hostname)
Collects network interface statistics including bytes transferred and error counts.
Measurements and Fields
system_cpu
Overall CPU statistics and metrics:
- Tags: host, cpu=total
- Fields: user, system, idle, iowait, nice, irq, softirq, steal, guest, guest_nice, frequency_current, frequency_min, frequency_max, ctx_switches, interrupts, soft_interrupts, syscalls, load1, load5, load15
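To make the tag and field layout concrete, the sketch below renders one system_cpu point as InfluxDB line protocol. `to_line_protocol` is a hypothetical helper written for illustration; the plugin may build its points differently, and treating ctx_switches as an integer field is an assumption.

```python
def _escape(text):
    """Escape commas, equals signs, and spaces in tag/field keys and tag values."""
    return str(text).replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ")


def to_line_protocol(measurement, tags, fields, timestamp_ns=None):
    """Render one point; measurement names are assumed to need no escaping."""
    tag_part = "".join(f",{_escape(k)}={_escape(v)}" for k, v in sorted(tags.items()))
    field_items = []
    for key, value in fields.items():
        if isinstance(value, bool):
            rendered = "true" if value else "false"
        elif isinstance(value, int):
            rendered = f"{value}i"  # integer fields carry an "i" suffix
        else:
            rendered = repr(float(value))
        field_items.append(f"{_escape(key)}={rendered}")
    line = f"{measurement}{tag_part} {','.join(field_items)}"
    if timestamp_ns is not None:
        line += f" {timestamp_ns}"
    return line


print(to_line_protocol(
    "system_cpu",
    {"host": "web-server-01", "cpu": "total"},
    {"user": 12.5, "idle": 81.2, "ctx_switches": 1024},
))
```

Tags are sorted by key, which line protocol recommends for write performance.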
system_cpu_cores
Per-core CPU statistics:
- Tags: host, core (core number)
- Fields: usage, user, system, idle, iowait, nice, irq, softirq, steal, guest, guest_nice, frequency_current, frequency_min, frequency_max
system_memory
System memory statistics:
- Tags: host
- Fields: total, available, used, free, active, inactive, buffers, cached, shared, slab, percent
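The percent field is easy to misread as used / total. psutil documents it as derived from available memory instead, and the sketch below assumes the plugin passes that value through unchanged; `memory_percent` is a hypothetical helper.

```python
def memory_percent(total, available):
    """Percentage of memory in use, following psutil's documented formula."""
    if total <= 0:
        return 0.0
    return round((total - available) / total * 100, 1)


# 16 GiB total with 4 GiB available
print(memory_percent(16 * 1024**3, 4 * 1024**3))
```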
system_swap
Swap memory statistics:
- Tags: host
- Fields: total, used, free, percent, sin, sout
system_memory_faults
Memory page fault information (when available):
- Tags: host
- Fields: page_faults, major_faults, minor_faults, rss, vms, dirty, uss, pss
system_disk_usage
Disk partition usage:
- Tags: host, device, mountpoint, fstype
- Fields: total, used, free, percent
system_disk_io
Disk I/O statistics:
- Tags: host, device
- Fields: reads, writes, read_bytes, write_bytes, read_time, write_time, busy_time, read_merged_count, write_merged_count
system_disk_performance
Calculated disk performance metrics:
- Tags: host, device
- Fields: read_bytes_per_sec, write_bytes_per_sec, read_iops, write_iops, avg_read_latency_ms, avg_write_latency_ms, util_percent
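Calculated fields like these are typically derived from two successive snapshots of the cumulative counters in system_disk_io. The following is a sketch of that approach: `disk_performance` and its formulas are assumptions, not the plugin's confirmed calculations. It takes read_time, write_time, and busy_time in milliseconds, as psutil reports them.

```python
COUNTERS = ("reads", "writes", "read_bytes", "write_bytes",
            "read_time", "write_time", "busy_time")


def disk_performance(prev, curr, elapsed_s):
    """Derive per-second rates from two cumulative I/O counter snapshots.

    prev and curr are dicts keyed by system_disk_io field names.
    """
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    # Clamp at zero so a counter reset does not produce negative rates.
    d = {k: max(0, curr[k] - prev[k]) for k in COUNTERS}
    return {
        "read_bytes_per_sec": d["read_bytes"] / elapsed_s,
        "write_bytes_per_sec": d["write_bytes"] / elapsed_s,
        "read_iops": d["reads"] / elapsed_s,
        "write_iops": d["writes"] / elapsed_s,
        "avg_read_latency_ms": d["read_time"] / d["reads"] if d["reads"] else 0.0,
        "avg_write_latency_ms": d["write_time"] / d["writes"] if d["writes"] else 0.0,
        # Share of the interval the device was busy, capped at 100%.
        "util_percent": min(100.0, d["busy_time"] * 100.0 / (elapsed_s * 1000.0)),
    }
```

Because the inputs are cumulative, the first collection after startup has no previous snapshot and cannot yield rates, so a collector built this way would skip or zero these fields on its first run.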
system_network
Network interface statistics:
- Tags: host, interface
- Fields: bytes_sent, bytes_recv, packets_sent, packets_recv, errin, errout, dropin, dropout
Troubleshooting
Common issues
Issue: Permission errors on disk I/O metrics
Some disk I/O metrics may require elevated permissions.
Solution: The plugin will continue collecting other metrics even if some require elevated permissions. Consider running InfluxDB 3 with appropriate permissions if disk I/O metrics are critical.
Issue: Missing psutil library
ERROR: No module named 'psutil'
Solution: Install the psutil package:
influxdb3 install package psutil
Issue: High CPU usage from plugin
If the plugin causes high CPU usage, consider:
- Increasing the trigger interval (for example, from every:10s to every:30s)
- Disabling unnecessary metric types
- Reducing the number of disk partitions monitored
Issue: No data being collected
Solution:
- Check that the trigger is active:
influxdb3 query --database _internal "SELECT * FROM system.processing_engine_logs WHERE trigger_name = 'your_trigger_name'"
- Verify system permissions allow access to system metrics
- Check that the psutil package is properly installed
Debugging tips
- Check recent metrics collection:

# List all system metric measurements
influxdb3 query \
--database system_monitoring \
"SHOW MEASUREMENTS WHERE measurement =~ /^system_/"
# Check recent CPU metrics
influxdb3 query \
--database system_monitoring \
"SELECT COUNT(*) FROM system_cpu WHERE time >= now() - interval '1 hour'"
- Monitor plugin logs:

influxdb3 query \
--database _internal \
"SELECT * FROM system.processing_engine_logs WHERE trigger_name = 'system_metrics_trigger' ORDER BY event_time DESC LIMIT 10"
- Test metric collection manually:

influxdb3 test schedule_plugin \
--database system_monitoring \
--schedule "0 0 * * * ?" \
system_metrics.py
Performance considerations
- The plugin collects comprehensive system metrics efficiently using the psutil library
- Metric collection is optimized to minimize system overhead
- Error handling and retry logic ensure reliable operation
- Configurable metric types allow focusing on relevant metrics only
Report an issue
For plugin issues, see the Plugins repository issues page.
Find support for {{% product-name %}}
The InfluxDB Discord server is the best place to find support for InfluxDB 3 Core and InfluxDB 3 Enterprise. For other InfluxDB versions, see the Support and feedback options.