---
title: Troubleshoot high disk usage
list_title: High disk usage
description: Identify and troubleshoot high disk usage when using the InfluxData 1.x TICK stack.
menu:
  platform:
    name: Disk usage
    parent: Troubleshoot
weight: 4
---
It's important that the components of your TICK stack do not run out of disk space.
A machine at 100% disk usage will not function properly.

In a [monitoring dashboard](/platform/monitoring/influxdata-platform/monitoring-dashboards/), high disk usage
appears in the **Disk Utilization %** metric and looks similar to the following:

![High disk usage](/img/platform/troubleshooting-disk-usage.png)

## Potential causes

### Old data not being downsampled
InfluxDB uses retention policies and continuous queries to downsample older data and preserve disk space.
If you use an infinite retention policy or one with a very long duration, high-resolution
data consumes more and more disk space over time.
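
To check whether any of your retention policies are infinite, list them with InfluxQL.
A `duration` of `0s` means the policy keeps data forever.
The database name below (`telegraf`) is only an example; substitute your own:

```sql
-- List retention policies for a database.
-- A duration of 0s means the policy keeps data forever.
SHOW RETENTION POLICIES ON "telegraf"
```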

### Log data not being dropped
Log data is incredibly useful in a monitoring solution, but it often requires
more disk space than other types of time series data.
Log data is commonly stored in a retention policy with an infinite duration (the default),
meaning it is never dropped.
This inevitably leads to high disk utilization.

## Solutions

### Remove unnecessary data
The simplest solution to high disk utilization is to remove old or unnecessary data.
You can do this by brute force (deleting or dropping data) or, more gracefully,
by tuning the duration of your retention policies and adjusting the downsampling
rates in your continuous queries.
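
For example, the following InfluxQL statements sketch both approaches.
The database, retention policy, and measurement names are placeholders; adjust them to match your schema:

```sql
-- Brute force: drop an entire measurement, or delete points older than 90 days.
DROP MEASUREMENT "unused_measurement"
DELETE FROM "syslog" WHERE time < now() - 90d

-- More graceful: shorten an existing retention policy so older data
-- expires automatically.
ALTER RETENTION POLICY "autogen" ON "telegraf" DURATION 30d
```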

#### Log data retention policies
Log data should only be stored in a finite
[retention policy](/influxdb/v1/query_language/database_management/#retention-policy-management).
The duration of your retention policy is determined by how long you want to keep
log data around.
Whether or not you use a [continuous query](/influxdb/v1/query_language/continuous_queries/)
to downsample log data at the end of its retention period is up to you, but old log
data should either be downsampled or dropped altogether.
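
As an illustration, the statements below create a 14-day retention policy for raw log data
and a continuous query that downsamples hourly log counts into a longer-lived policy.
The database, policy, measurement, and field names are examples only; adapt them to your schema:

```sql
-- Keep raw log data for 14 days, then let it expire automatically.
CREATE RETENTION POLICY "logs_14d" ON "telegraf" DURATION 14d REPLICATION 1

-- Keep downsampled, hourly log counts for one year.
CREATE RETENTION POLICY "logs_1y" ON "telegraf" DURATION 52w REPLICATION 1

-- Downsample raw logs into hourly counts.
CREATE CONTINUOUS QUERY "cq_downsample_logs" ON "telegraf"
BEGIN
  SELECT count("message") AS "log_count"
  INTO "telegraf"."logs_1y"."syslog_hourly"
  FROM "telegraf"."logs_14d"."syslog"
  GROUP BY time(1h), *
END
```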

### Scale your machine's disk capacity
If removing or downsampling data isn't an option, you can always scale your machine's
disk capacity. How this is done depends on your hardware or virtualization configuration
and is not covered in this documentation.

## Recommendations

### Set up a disk usage alert
To preempt disk utilization issues, create a task that alerts you if disk usage
crosses certain thresholds. The example TICKscript [below](#example-tickscript-alert-for-disk-usage)
sets warning and critical disk usage thresholds and sends a message to Slack
whenever those thresholds are crossed.

_For information about Kapacitor tasks and alerts, see the [Kapacitor alerts](/kapacitor/v1/working/alerts/) documentation._

#### Example TICKscript alert for disk usage
```js
// Disk usage alerts
// Alert when disks are this % full
var warn_threshold = 80
var crit_threshold = 90

// Use a larger period here, as the telegraf data can be a little late
// if the server is under load.
var period = 10m

// How often to query for the period.
var every = 20m

var data = batch
  |query('''
    SELECT last(used_percent) FROM "telegraf"."default".disk
    WHERE ("path" = '/influxdb/conf' or "path" = '/')
  ''')
    .period(period)
    .every(every)
    .groupBy('host', 'path')

data
  |alert()
    .id('Alert: Disk Usage, Host: {{ index .Tags "host" }}, Path: {{ index .Tags "path" }}')
    .warn(lambda: "last" > warn_threshold)
    .message('{{ .ID }}, Used Percent: {{ index .Fields "last" | printf "%0.0f" }}%')
    .details('')
    .stateChangesOnly()
    .slack()

data
  |alert()
    .id('Alert: Disk Usage, Host: {{ index .Tags "host" }}, Path: {{ index .Tags "path" }}')
    .crit(lambda: "last" > crit_threshold)
    .message('{{ .ID }}, Used Percent: {{ index .Fields "last" | printf "%0.0f" }}%')
    .details('')
    .slack()
```
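
To activate the alert, save the script to a file (for example, `disk_alert.tick`)
and register it as a batch task with the Kapacitor CLI using `kapacitor define`
(pointing at the script and at the `telegraf`.`default` database and retention policy it queries),
then start it with `kapacitor enable`.
The task and file names here are examples; use whatever fits your environment.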