docs-v2

12 KiB

Raw Permalink Blame History

The System Metrics Plugin provides comprehensive system monitoring capabilities for InfluxDB 3, collecting CPU, memory, disk, and network metrics from the host system. Monitor detailed performance insights including per-core CPU statistics, memory usage breakdowns, disk I/O performance, and network interface statistics. Features configurable metric collection with robust error handling and retry logic for reliable monitoring.

Configuration

Required parameters

No required parameters - all system metrics are collected by default with sensible defaults.

System monitoring parameters

Parameter	Type	Default	Description
`hostname`	string	`localhost`	Hostname to tag all metrics with for system identification
`include_cpu`	boolean	`true`	Include comprehensive CPU metrics collection (overall and per-core statistics)
`include_memory`	boolean	`true`	Include memory metrics collection (RAM usage, swap statistics, page faults)
`include_disk`	boolean	`true`	Include disk metrics collection (partition usage, I/O statistics, performance)
`include_network`	boolean	`true`	Include network metrics collection (interface statistics and error counts)
`max_retries`	integer	`3`	Maximum retry attempts on failure with graceful error handling

TOML configuration

Parameter	Type	Default	Description
`config_file_path`	string	none	TOML config file path relative to `PLUGIN_DIR` (required for TOML configuration)

To use a TOML configuration file, set the PLUGIN_DIR environment variable and specify the config_file_path in the trigger arguments. This is in addition to the --plugin-dir flag when starting InfluxDB 3.

Example TOML configuration

system_metrics_config_scheduler.toml

For more information on using TOML configuration files, see the Using TOML Configuration Files section in the influxdb3_plugins /README.md.

Installation steps

Start {{% product-name %}} with the Processing Engine enabled (--plugin-dir /path/to/plugins)
Install required Python packages:
- psutil (for system metrics collection)
```
influxdb3 install package psutil
```

Trigger setup

Basic scheduled trigger

Monitor system performance every 30 seconds:

influxdb3 create trigger \
  --database system_monitoring \
  --plugin-filename gh:influxdata/system_metrics/system_metrics.py \
  --trigger-spec "every:30s" \
  system_metrics_trigger

Custom configuration

Monitor specific metrics with custom hostname:

influxdb3 create trigger \
  --database system_monitoring \
  --plugin-filename gh:influxdata/system_metrics/system_metrics.py \
  --trigger-spec "every:30s" \
  --trigger-arguments hostname=web-server-01,include_disk=false,max_retries=5 \
  system_metrics_custom_trigger

Example usage

Example 1: Web server monitoring

Monitor web server performance every 15 seconds with network statistics:

# Create trigger for web server monitoring
influxdb3 create trigger \
  --database web_monitoring \
  --plugin-filename gh:influxdata/system_metrics/system_metrics.py \
  --trigger-spec "every:15s" \
  --trigger-arguments hostname=web-server-01,include_network=true \
  web_server_metrics

# Query recent CPU metrics
influxdb3 query \
  --database web_monitoring \
  "SELECT * FROM system_cpu WHERE time >= now() - interval '5 minutes' LIMIT 5"

Expected output

+---------------+-------+------+--------+------+--------+-------+-------+-----------+------------------+
| host          | cpu   | user | system | idle | iowait | nice  | load1 | load5     | time             |
+---------------+-------+------+--------+------+--------+-------+-------+-----------+------------------+
| web-server-01 | total | 12.5 | 5.3    | 81.2 | 0.8    | 0.0   | 0.85  | 0.92      | 2024-01-15 10:00 |
| web-server-01 | total | 13.1 | 5.5    | 80.4 | 0.7    | 0.0   | 0.87  | 0.93      | 2024-01-15 10:01 |
| web-server-01 | total | 11.8 | 5.1    | 82.0 | 0.9    | 0.0   | 0.83  | 0.91      | 2024-01-15 10:02 |
+---------------+-------+------+--------+------+--------+-------+-------+-----------+------------------+

Example 2: Database server monitoring

Focus on CPU and disk metrics for database server:

# Create trigger for database server
influxdb3 create trigger \
  --database db_monitoring \
  --plugin-filename gh:influxdata/system_metrics/system_metrics.py \
  --trigger-spec "every:30s" \
  --trigger-arguments hostname=db-primary,include_disk=true,include_cpu=true,include_network=false \
  database_metrics

# Query disk usage
influxdb3 query \
  --database db_monitoring \
  "SELECT * FROM system_disk_usage WHERE host = 'db-primary'"

Example 3: High-frequency monitoring

Collect all metrics every 10 seconds with higher retry tolerance:

# Create high-frequency monitoring trigger
influxdb3 create trigger \
  --database system_monitoring \
  --plugin-filename gh:influxdata/system_metrics/system_metrics.py \
  --trigger-spec "every:10s" \
  --trigger-arguments hostname=critical-server,max_retries=10 \
  high_freq_metrics

Code overview

Files

system_metrics.py: The main plugin code containing system metrics collection logic
system_metrics_config_scheduler.toml: Example TOML configuration file for scheduled triggers

Logging

Logs are stored in the _internal database (or the database where the trigger is created) in the system.processing_engine_logs table. To view logs:

influxdb3 query --database _internal "SELECT * FROM system.processing_engine_logs WHERE trigger_name = 'your_trigger_name'"

Log columns:

event_time: Timestamp of the log event
trigger_name: Name of the trigger that generated the log
log_level: Severity level (INFO, WARN, ERROR)
log_text: Message describing the action or error

Main functions

`process_scheduled_call(influxdb3_local, call_time, args)`

The main entry point for scheduled triggers. Collects system metrics based on configuration and writes them to InfluxDB.

Key operations:

Parses configuration from arguments
Collects CPU, memory, disk, and network metrics based on configuration
Writes metrics to InfluxDB with proper error handling and retry logic

`collect_cpu_metrics(influxdb3_local, hostname)`

Collects CPU utilization and performance metrics including per-core statistics and system load averages.

`collect_memory_metrics(influxdb3_local, hostname)`

Collects memory usage statistics including RAM, swap, and page fault information.

`collect_disk_metrics(influxdb3_local, hostname)`

Collects disk usage and I/O statistics for all mounted partitions.

`collect_network_metrics(influxdb3_local, hostname)`

Collects network interface statistics including bytes transferred and error counts.

Measurements and Fields

system_cpu

Overall CPU statistics and metrics:

Tags: host, cpu=total
Fields: user, system, idle, iowait, nice, irq, softirq, steal, guest, guest_nice, frequency_current, frequency_min, frequency_max, ctx_switches, interrupts, soft_interrupts, syscalls, load1, load5, load15

system_cpu_cores

Per-core CPU statistics:

Tags: host, core (core number)
Fields: usage, user, system, idle, iowait, nice, irq, softirq, steal, guest, guest_nice, frequency_current, frequency_min, frequency_max

system_memory

System memory statistics:

Tags: host
Fields: total, available, used, free, active, inactive, buffers, cached, shared, slab, percent

system_swap

Swap memory statistics:

Tags: host
Fields: total, used, free, percent, sin, sout

system_memory_faults

Memory page fault information (when available):

Tags: host
Fields: page_faults, major_faults, minor_faults, rss, vms, dirty, uss, pss

system_disk_usage

Disk partition usage:

Tags: host, device, mountpoint, fstype
Fields: total, used, free, percent

system_disk_io

Disk I/O statistics:

Tags: host, device
Fields: reads, writes, read_bytes, write_bytes, read_time, write_time, busy_time, read_merged_count, write_merged_count

system_disk_performance

Calculated disk performance metrics:

Tags: host, device
Fields: read_bytes_per_sec, write_bytes_per_sec, read_iops, write_iops, avg_read_latency_ms, avg_write_latency_ms, util_percent

system_network

Network interface statistics:

Tags: host, interface
Fields: bytes_sent, bytes_recv, packets_sent, packets_recv, errin, errout, dropin, dropout

Troubleshooting

Common issues

Issue: Permission errors on disk I/O metrics

Some disk I/O metrics may require elevated permissions.

Solution: The plugin will continue collecting other metrics even if some require elevated permissions. Consider running InfluxDB 3 with appropriate permissions if disk I/O metrics are critical.

Issue: Missing psutil library

ERROR: No module named 'psutil'

Solution: Install the psutil package:

influxdb3 install package psutil

Issue: High CPU usage from plugin

If the plugin causes high CPU usage, consider:

Increasing the trigger interval (for example, from every:10s to every:30s)
Disabling unnecessary metric types
Reducing the number of disk partitions monitored

Issue: No data being collected

Solution:

Check that the trigger is active:

influxdb3 query --database _internal "SELECT * FROM system.processing_engine_logs WHERE trigger_name = 'your_trigger_name'"

Verify system permissions allow access to system metrics
Check that the psutil package is properly installed

Debugging tips

Check recent metrics collection:

# List all system metric measurements
influxdb3 query \
  --database system_monitoring \
  "SHOW MEASUREMENTS WHERE measurement =~ /^system_/"

# Check recent CPU metrics
influxdb3 query \
  --database system_monitoring \
  "SELECT COUNT(*) FROM system_cpu WHERE time >= now() - interval '1 hour'"

Monitor plugin logs:

influxdb3 query \
  --database _internal \
  "SELECT * FROM system.processing_engine_logs WHERE trigger_name = 'system_metrics_trigger' ORDER BY time DESC LIMIT 10"

Test metric collection manually:

influxdb3 test schedule_plugin \
  --database system_monitoring \
  --schedule "0 0 * * * ?" \
  system_metrics.py

Performance considerations

The plugin collects comprehensive system metrics efficiently using the psutil library
Metric collection is optimized to minimize system overhead
Error handling and retry logic ensure reliable operation
Configurable metric types allow focusing on relevant metrics only

Report an issue

For plugin issues, see the Plugins repository issues page.

Find support for {{% product-name %}}

The InfluxDB Discord server is the best place to find support for InfluxDB 3 Core and InfluxDB 3 Enterprise. For other InfluxDB versions, see the Support and feedback options.

12 KiB Raw Permalink Blame History