Merge pull request #986 from influxdata/query/percentages

Calculating percentages in Flux
2020-04-29 11:30:18 -06:00 · 2020-04-29 11:30:18 -06:00 · 6d784927f6
parent 9b1e68b01a 4d1a817a49
commit 6d784927f6
4 changed files with 316 additions and 91 deletions
--- a/assets/styles/layouts/article/_lists.scss
+++ b/assets/styles/layouts/article/_lists.scss
@@ -13,21 +13,32 @@ ul {
 }
 ol {
-list-style: none;
-counter-reset: item;
-  li {
-    position: relative;
-    counter-increment: item;
-    &:before {
-      content: counter(item) ". ";
-      position: absolute;
-      left: -1.6em;
-      color: $article-bold;
-      font-weight: bold;
-    }
-    ul {
-      counter-reset: item;
-    }
+  list-style-type: none;
+  counter-reset: item;
+  margin: 0;
+  padding: 0;
+}
+
+ol > li {
+  display: table;
+  counter-increment: item;
+  margin-bottom: 0.6em;
+
+  &:before {
+    content: counters(item, ".") ". ";
+    display: table-cell;
+    padding-right: 0.6em;
    letter-spacing: .05rem;
    color: $article-bold;
    font-weight: bold;
  }
 }
 li ol > li {
  margin: .5rem 0;
  &:before {
    content: counters(item, ".") ".";
  }
 }
--- a/content/v2.0/query-data/flux/calculate-percentages.md
+++ b/content/v2.0/query-data/flux/calculate-percentages.md
@@ -0,0 +1,216 @@
 ---
 title: Calculate percentages with Flux
 list_title: Calculate percentages
 description: >
  Use [`pivot()` or `join()`](/v2.0/query-data/flux/mathematic-operations/#pivot-vs-join)
  and the [`map()` function](/v2.0/reference/flux/stdlib/built-in/transformations/map/)
  to align operand values into rows and calculate a percentage.
 menu:
  v2_0:
    name: Calculate percentages
    parent: Query with Flux
 weight: 206
 aliases:
 - /v2.0/query-data/guides/manipulate-timestamps/
 related:
  - /v2.0/query-data/flux/mathematic-operations
  - /v2.0/reference/flux/stdlib/built-in/transformations/map
  - /v2.0/reference/flux/stdlib/built-in/transformations/pivot
  - /v2.0/reference/flux/stdlib/built-in/transformations/join
 list_query_example: percentages
 ---
 Calculating percentages from queried data is a common use case for time series data.
 To calculate a percentage in Flux, all operands must exist in the same row.
 Use `map()` to re-map values in the row and calculate a percentage.
 **To calculate percentages**
 1. Use [`from()`](/v2.0/reference/flux/stdlib/built-in/inputs/from/),
   [`range()`](/v2.0/reference/flux/stdlib/built-in/transformations/range/) and
   [`filter()`](/v2.0/reference/flux/stdlib/built-in/transformations/filter/) to query operands.
 2. Use [`pivot()` or `join()`](/v2.0/query-data/flux/mathematic-operations/#pivot-vs-join)
   to align operand values into rows.
 3. Use [`map()`](/v2.0/reference/flux/stdlib/built-in/transformations/map/)
   to divide the numerator operand value by the denominator operand value and multiply by 100.
 {{% note %}}
 The following examples use `pivot()` to align operands into rows because
 `pivot()` works in most cases and is more performant than `join()`.
 _See [Pivot vs join](/v2.0/query-data/flux/mathematic-operations/#pivot-vs-join)._
 {{% /note %}}
 ```js
 from(bucket: "example-bucket")
  |> range(start: -1h)
  |> filter(fn: (r) => r._measurement == "m1" and r._field =~ /field[1-2]/ )
  |> pivot(rowKey:["_time"], columnKey: ["_field"], valueColumn: "_value")
  |> map(fn: (r) => ({ r with _value: r.field1 / r.field2 * 100.0 }))
 ```
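The row-alignment idea can be modeled outside Flux. The following is a minimal Python sketch, not Flux and not an InfluxDB API. The helpers `pivot()` and `map_percent()` are hypothetical. They imitate `pivot(rowKey: ["_time"], columnKey: ["_field"], valueColumn: "_value")` followed by the `map()` call above, using made-up `field1` and `field2` values.

```python
from collections import defaultdict


def pivot(rows, row_key="_time", column_key="_field", value_column="_value"):
    """Collect rows that share a row_key value into one row,
    turning each column_key value into its own column."""
    grouped = defaultdict(dict)
    for row in rows:
        out = grouped[row[row_key]]
        out[row_key] = row[row_key]
        out[row[column_key]] = row[value_column]
    # One output row per row_key value, in ascending order
    return [grouped[key] for key in sorted(grouped)]


def map_percent(row, numerator, denominator):
    """Keep existing columns (like `r with`) and set _value to a percentage."""
    return {**row, "_value": row[numerator] / row[denominator] * 100.0}


# Hypothetical input: one row per field per timestamp
rows = [
    {"_time": "2020-01-01T00:00:00Z", "_field": "field1", "_value": 1.0},
    {"_time": "2020-01-01T00:00:00Z", "_field": "field2", "_value": 4.0},
    {"_time": "2020-01-01T00:00:10Z", "_field": "field1", "_value": 3.0},
    {"_time": "2020-01-01T00:00:10Z", "_field": "field2", "_value": 4.0},
]

result = [map_percent(row, "field1", "field2") for row in pivot(rows)]
for row in result:
    print(row["_time"], row["_value"])
```

Before the pivot, `field1` and `field2` for the same timestamp sit in separate rows, so no single row has both operands. After the pivot, each row carries both, and the percentage is a per-row calculation.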
 ## GPU monitoring example
 The following example queries data from the `gpu-monitor` bucket and calculates the
 percentage of GPU memory used over time.
 Data includes the following:
 - **`gpu` measurement**
 - **`mem_used` field**: used GPU memory in bytes
 - **`mem_total` field**: total GPU memory in bytes
 ### Query mem_used and mem_total fields
 ```js
 from(bucket: "gpu-monitor")
  |> range(start: 2020-01-01T00:00:00Z)
  |> filter(fn: (r) => r._measurement == "gpu" and r._field =~ /mem_/)
 ```
 ###### Returns the following stream of tables:
 | _time                | _measurement | _field   | _value     |
 |:-----                |:------------:|:------:  | ------:    |
 | 2020-01-01T00:00:00Z | gpu          | mem_used | 2517924577 |
 | 2020-01-01T00:00:10Z | gpu          | mem_used | 2695091978 |
 | 2020-01-01T00:00:20Z | gpu          | mem_used | 2576980377 |
 | 2020-01-01T00:00:30Z | gpu          | mem_used | 3006477107 |
 | 2020-01-01T00:00:40Z | gpu          | mem_used | 3543348019 |
 | 2020-01-01T00:00:50Z | gpu          | mem_used | 4402341478 |
 <p style="margin:-2.5rem 0;"></p>
 | _time                | _measurement | _field    | _value     |
 |:-----                |:------------:|:------:   | ------:    |
 | 2020-01-01T00:00:00Z | gpu          | mem_total | 8589934592 |
 | 2020-01-01T00:00:10Z | gpu          | mem_total | 8589934592 |
 | 2020-01-01T00:00:20Z | gpu          | mem_total | 8589934592 |
 | 2020-01-01T00:00:30Z | gpu          | mem_total | 8589934592 |
 | 2020-01-01T00:00:40Z | gpu          | mem_total | 8589934592 |
 | 2020-01-01T00:00:50Z | gpu          | mem_total | 8589934592 |
 ### Pivot fields into columns
 Use `pivot()` to pivot the `mem_used` and `mem_total` fields into columns.
 Output includes `mem_used` and `mem_total` columns with values for each corresponding `_time`.
 ```js
 // ...
  |> pivot(rowKey:["_time"], columnKey: ["_field"], valueColumn: "_value")
 ```
 ###### Returns the following:
 | _time                | _measurement | mem_used   | mem_total  |
 |:-----                |:------------:| --------:  | ---------: |
 | 2020-01-01T00:00:00Z | gpu          | 2517924577 | 8589934592 |
 | 2020-01-01T00:00:10Z | gpu          | 2695091978 | 8589934592 |
 | 2020-01-01T00:00:20Z | gpu          | 2576980377 | 8589934592 |
 | 2020-01-01T00:00:30Z | gpu          | 3006477107 | 8589934592 |
 | 2020-01-01T00:00:40Z | gpu          | 3543348019 | 8589934592 |
 | 2020-01-01T00:00:50Z | gpu          | 4402341478 | 8589934592 |
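To show what the pivot does to the sample data, here is a small Python model (not Flux) that uses the first three timestamps from the tables above. It assumes each `_time` has exactly one `mem_used` and one `mem_total` value. `pivot_fields` is a hypothetical helper, not an InfluxDB function.

```python
def pivot_fields(rows):
    """Pivot _field values into columns, one output row per _time
    (modeling rowKey: ["_time"], columnKey: ["_field"], valueColumn: "_value")."""
    pivoted = {}
    for row in rows:
        # Create the output row the first time a timestamp is seen,
        # carrying the _measurement column along
        out = pivoted.setdefault(
            row["_time"], {"_time": row["_time"], "_measurement": row["_measurement"]}
        )
        out[row["_field"]] = row["_value"]
    return [pivoted[t] for t in sorted(pivoted)]


# First three mem_used points from the sample data; mem_total is constant
mem_used = [
    ("2020-01-01T00:00:00Z", 2517924577),
    ("2020-01-01T00:00:10Z", 2695091978),
    ("2020-01-01T00:00:20Z", 2576980377),
]
rows = [
    {"_time": t, "_measurement": "gpu", "_field": "mem_used", "_value": v}
    for t, v in mem_used
] + [
    {"_time": t, "_measurement": "gpu", "_field": "mem_total", "_value": 8589934592}
    for t, _ in mem_used
]

table = pivot_fields(rows)
for row in table:
    print(row)
```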
 ### Map new values
 Each row now contains the values necessary to calculate a percentage.
 Use `map()` to re-map values in each row.
 Divide `mem_used` by `mem_total` and multiply by 100 to return the percentage.
 {{% note %}}
 To return a precise float percentage value that includes decimal points, the example
 below casts integer field values to floats and multiplies by a float value (`100.0`).
 {{% /note %}}
 ```js
 // ...
  |> map(fn: (r) => ({
    _time: r._time,
    _measurement: r._measurement,
    _field: "mem_used_percent",
    _value: float(v: r.mem_used) / float(v: r.mem_total) * 100.0
  }))
 ```
 ###### Returns the following:
 | _time                | _measurement | _field           | _value  |
 |:-----                |:------------:|:------:          | ------: |
 | 2020-01-01T00:00:00Z | gpu          | mem_used_percent | 29.31   |
 | 2020-01-01T00:00:10Z | gpu          | mem_used_percent | 31.37   |
 | 2020-01-01T00:00:20Z | gpu          | mem_used_percent | 30.00   |
 | 2020-01-01T00:00:30Z | gpu          | mem_used_percent | 35.00   |
 | 2020-01-01T00:00:40Z | gpu          | mem_used_percent | 41.25   |
 | 2020-01-01T00:00:50Z | gpu          | mem_used_percent | 51.25   |
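The percentages in this table can be cross-checked with a short Python sketch. Python is used here only to reproduce the arithmetic; it is not Flux. The values come from the sample tables above. Rounding to two decimals is only there to match how the table displays the results.

```python
# Sample values copied from the mem_used and mem_total tables above
mem_total = 8589934592
mem_used = [2517924577, 2695091978, 2576980377, 3006477107, 3543348019, 4402341478]

# Cast both operands to float, divide, and scale by 100, mirroring
# float(v: r.mem_used) / float(v: r.mem_total) * 100.0.
# round(..., 2) only matches the two-decimal display in the results table.
mem_used_percent = [round(float(used) / float(mem_total) * 100.0, 2) for used in mem_used]
print(mem_used_percent)
```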
 ### Full query
 ```js
 from(bucket: "gpu-monitor")
  |> range(start: 2020-01-01T00:00:00Z)
  |> filter(fn: (r) => r._measurement == "gpu" and r._field =~ /mem_/)
  |> pivot(rowKey:["_time"], columnKey: ["_field"], valueColumn: "_value")
  |> map(fn: (r) => ({
    _time: r._time,
    _measurement: r._measurement,
    _field: "mem_used_percent",
    _value: float(v: r.mem_used) / float(v: r.mem_total) * 100.0
  }))
 ```
 ## Examples
 #### Calculate percentages using multiple fields
 ```js
 from(bucket: "example-bucket")
  |> range(start: -1h)
  |> filter(fn: (r) => r._measurement == "example-measurement")
  |> filter(fn: (r) =>
    r._field == "used_system" or
    r._field == "used_user" or
    r._field == "total"
  )
  |> pivot(rowKey:["_time"], columnKey: ["_field"], valueColumn: "_value")
  |> map(fn: (r) => ({ r with
    _value: float(v: r.used_system + r.used_user) / float(v: r.total) * 100.0
  }))
 ```
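The calculation in this example can be modeled in Python (not Flux). The rows are hypothetical post-`pivot()` rows with one column per field. The sketch mirrors the Flux expression: the numerator fields are summed first, then the sum is converted to a float and divided by `total`.

```python
# Hypothetical rows as they might look after pivot(): one column per field
rows = [
    {"_time": "2020-01-01T00:00:00Z", "used_system": 2, "used_user": 3, "total": 10},
    {"_time": "2020-01-01T00:00:10Z", "used_system": 1, "used_user": 1, "total": 8},
]


def with_percent(row):
    # Sum the numerator fields, cast to float, then divide and scale,
    # mirroring float(v: r.used_system + r.used_user) / float(v: r.total) * 100.0
    value = float(row["used_system"] + row["used_user"]) / float(row["total"]) * 100.0
    # Keep the existing columns, like `r with`
    return {**row, "_value": value}


result = [with_percent(r) for r in rows]
print([r["_value"] for r in result])  # [50.0, 25.0]
```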
 #### Calculate percentages using multiple measurements
 1. Ensure measurements are in the same [bucket](/v2.0/reference/glossary/#bucket).
 2. Use `filter()` to include data from both measurements.
 3. Use `group()` to ungroup data and return a single table.
 4. Use `pivot()` to pivot fields into columns.
 5. Use `map()` to re-map rows and perform the percentage calculation.
 <!-- -->
 ```js
 from(bucket: "example-bucket")
  |> range(start: -1h)
  |> filter(fn: (r) =>
    (r._measurement == "m1" or r._measurement == "m2") and
    (r._field == "field1" or r._field == "field2")    
  )
  |> group()
  |> pivot(rowKey:["_time"], columnKey: ["_field"], valueColumn: "_value")
  |> map(fn: (r) => ({ r with  _value: r.field1 / r.field2 * 100.0 }))
 ```
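The following rough Python model (not Flux) shows why `group()` comes before `pivot()`. Once all rows are in a single table, rows from both measurements that share a `_time` collapse into one row, so `field1` and `field2` end up side by side.

The sketch rests on a few assumptions:

- `field1` is written only to `m1` and `field2` only to `m2`.
- The pivot keeps only the field columns. This is a simplification.

If both measurements wrote the same field at the same timestamp, the pivoted values would collide; in this model, the last row wins.

```python
from collections import defaultdict

# Hypothetical input: field1 is stored in m1 and field2 in m2,
# so each timestamp has exactly one value per field
m1 = [
    {"_time": "2020-01-01T00:00:00Z", "_measurement": "m1", "_field": "field1", "_value": 1.5},
    {"_time": "2020-01-01T00:00:10Z", "_measurement": "m1", "_field": "field1", "_value": 3.0},
]
m2 = [
    {"_time": "2020-01-01T00:00:00Z", "_measurement": "m2", "_field": "field2", "_value": 6.0},
    {"_time": "2020-01-01T00:00:10Z", "_measurement": "m2", "_field": "field2", "_value": 6.0},
]

# group() with no columns: all rows end up in a single table
table = m1 + m2

# pivot(): one row per _time, one column per _field
# (this model keeps only the pivoted field columns)
pivoted = defaultdict(dict)
for row in table:
    pivoted[row["_time"]][row["_field"]] = row["_value"]

# map(): field1 as a percentage of field2
result = {
    time: cols["field1"] / cols["field2"] * 100.0
    for time, cols in sorted(pivoted.items())
}
print(result)
```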
 #### Calculate percentages using multiple data sources
 ```js
 import "sql"
 import "influxdata/influxdb/secrets"
 pgUser = secrets.get(key: "POSTGRES_USER")
 pgPass = secrets.get(key: "POSTGRES_PASSWORD")
 pgHost = secrets.get(key: "POSTGRES_HOST")
 t1 = sql.from(
  driverName: "postgres",
  dataSourceName: "postgresql://${pgUser}:${pgPass}@${pgHost}",
  query:"SELECT id, name, available FROM exampleTable"
 )
 t2 = from(bucket: "example-bucket")
  |> range(start: -1h)
  |> filter(fn: (r) =>
    r._measurement == "example-measurement" and
    r._field == "example-field"
  )
 join(tables: {t1: t1, t2: t2}, on: ["id"])
  |> map(fn: (r) => ({ r with _value: r._value_t2 / r.available_t1 * 100.0 }))
 ```
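The join step can be modeled in Python (not Flux) as a sketch under several labeled assumptions:

- **Inner join.** Only rows whose `id` appears in both inputs are kept.
- **Column renaming.** Every non-key column is renamed to `<column>_<table name>`, so the names line up with `_value_t2` and `available_t1` in the `map()` call above. This page does not show whether Flux's `join()` renames columns that exist in only one input.
- **Sample data.** The rows and the `join_on` helper are hypothetical.

```python
def join_on(tables, on):
    """Inner-join exactly two tables (lists of row dicts) on the `on` columns.

    Every non-key column is renamed to <column>_<table name> (an assumption
    made so the names match _value_t2 and available_t1 in the Flux example).
    """
    (left_name, left), (right_name, right) = tables.items()
    joined = []
    for lrow in left:
        for rrow in right:
            if all(lrow[key] == rrow[key] for key in on):
                row = {key: lrow[key] for key in on}
                row.update({f"{col}_{left_name}": val for col, val in lrow.items() if col not in on})
                row.update({f"{col}_{right_name}": val for col, val in rrow.items() if col not in on})
                joined.append(row)
    return joined


# Hypothetical rows: t1 stands in for the SQL result, t2 for the InfluxDB query
t1 = [
    {"id": "a", "name": "alpha", "available": 200.0},
    {"id": "b", "name": "beta", "available": 40.0},
]
t2 = [
    {"id": "a", "_value": 50.0},
    {"id": "b", "_value": 30.0},
    {"id": "c", "_value": 99.0},  # no matching id in t1, so the inner join drops it
]

rows = join_on({"t1": t1, "t2": t2}, on=["id"])
for row in rows:
    # Same calculation as the map() call above
    row["_value"] = row["_value_t2"] / row["available_t1"] * 100.0

print([(row["id"], row["_value"]) for row in rows])
```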
--- a/content/v2.0/query-data/flux/mathematic-operations.md
+++ b/content/v2.0/query-data/flux/mathematic-operations.md
@@ -18,6 +18,7 @@ related:
  - /v2.0/reference/flux/stdlib/built-in/transformations/aggregates/reduce/
  - /v2.0/reference/flux/language/operators/
  - /v2.0/reference/flux/stdlib/built-in/transformations/type-conversions/
  - /v2.0/query-data/flux/calculate-percentages/
 list_query_example: map_math
 ---
@@ -98,6 +99,7 @@ percent(sample: 20.0, total: 80.0)
 To transform multiple values in an input stream, your function needs to:
 - [Handle piped-forward data](/v2.0/query-data/flux/custom-functions/#functions-that-manipulate-piped-forward-data).
 - Each operand necessary for the calculation exists in each row _(see [Pivot vs join](#pivot-vs-join) below)_.
 - Use the [`map()` function](/v2.0/reference/flux/stdlib/built-in/transformations/map) to iterate over each row.
 The example `multiplyByX()` function below includes:
@@ -178,93 +180,57 @@ bytesToGB = (tables=<-) =>
 ### Calculate a percentage
 To calculate a percentage, use simple division, then multiply the result by 100.
 {{% note %}}
 Operands in percentage calculations should always be floats.
 {{% /note %}}
 ```js
 > 1.0 / 4.0 * 100.0
 25.0
 ```
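The following quick Python cross-check (Python, not Flux) reproduces the result above and shows why the note asks for float operands. The `percent` helper mirrors the `percent(sample: 20.0, total: 80.0)` call that appears earlier on that page. Modeling integer division with floor division (`//`) is an assumption: it treats integer division as discarding the fractional part before the multiplication by 100.

```python
def percent(sample: float, total: float) -> float:
    """Divide first, then multiply by 100."""
    return sample / total * 100.0


print(percent(1.0, 4.0))    # 25.0
print(percent(20.0, 80.0))  # 25.0

# With integer-only arithmetic that discards the fractional part
# (modeled here with floor division), 1 / 4 becomes 0 before scaling.
print(1 // 4 * 100)         # 0
```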
-#### User vs system CPU usage
+_For an in-depth look at calculating percentages, see [Calculate percentages](/v2.0/query-data/flux/calculate-percentages)._
 The example below calculates the percentage of total CPU used by the `user` vs the `system`.
-{{< code-tabs-wrapper >}}
-{{% code-tabs %}}
-[Comments](#)
-[No Comments](#)
+## Pivot vs join
+To query and use values in mathematical operations in Flux, operand values must
+exist in a single row.
+Both `pivot()` and `join()` will do this, but there are important differences between the two:
 {{% /code-tabs %}}
-{{% code-tab-content %}}
+#### Pivot is more performant
 `pivot()` reads and operates on a single stream of data.
 `join()` requires two streams of data and the overhead of reading and combining
 both streams can be significant, especially for larger data sets.
 #### Use join for multiple data sources
 Use `join()` when querying data from different buckets or data sources.
 ##### Pivot fields into columns for mathematic calculations
 ```js
-// Custom function that converts usage_user and
-// usage_system columns to floats
-usageToFloat = (tables=<-) =>
-  tables
-    |> map(fn: (r) => ({
-      _time: r._time,
+data
+  |> pivot(rowKey:["_time"], columnKey: ["_field"], valueColumn: "_value")
+  |> map(fn: (r) => ({ r with
+    _value: (r.field1 + r.field2) / r.field3 * 100.0
+  }))
+```
      usage_user: float(v: r.usage_user),
      usage_system: float(v: r.usage_system)
      })
    )
-// Define the data source and filter user and system CPU usage
-// from 'cpu-total' in the 'cpu' measurement
-from(bucket: "example-bucket")
+##### Join multiple data sources for mathematic calculations
+```js
+import "sql"
 import "influxdata/influxdb/secrets"
 pgUser = secrets.get(key: "POSTGRES_USER")
 pgPass = secrets.get(key: "POSTGRES_PASSWORD")
 pgHost = secrets.get(key: "POSTGRES_HOST")
 t1 = sql.from(
  driverName: "postgres",
  dataSourceName: "postgresql://${pgUser}:${pgPass}@${pgHost}",
  query:"SELECT id, name, available FROM exampleTable"
 )
 t2 = from(bucket: "example-bucket")
  |> range(start: -1h)
  |> filter(fn: (r) =>
-    r._measurement == "cpu" and
-    r._field == "usage_user" or
+    r._measurement == "example-measurement" and
+    r._field == "example-field"
    r._field == "usage_system" and
    r.cpu == "cpu-total"
  )
-  // Pivot the output tables so usage_user and usage_system are in each row
-  |> pivot(rowKey: ["_time"], columnKey: ["_field"], valueColumn: "_value")
+join(tables: {t1: t1, t2: t2}, on: ["id"])
+  |> map(fn: (r) => ({ r with _value: r._value_t2 / r.available_t1 * 100.0 }))
  // Convert usage_user and usage_system to floats
  |> usageToFloat()
  // Map over each row and calculate the percentage of
  // CPU used by the user vs the system
  |> map(fn: (r) => ({
      // Preserve existing columns in each row
      r with
      usage_user: r.usage_user / (r.usage_user + r.usage_system) * 100.0,
      usage_system: r.usage_system / (r.usage_user +  r.usage_system) * 100.0
    })
  )
 ```
 {{% /code-tab-content %}}
 {{% code-tab-content %}}
 ```js
 usageToFloat = (tables=<-) =>
  tables
    |> map(fn: (r) => ({
      _time: r._time,
      usage_user: float(v: r.usage_user),
      usage_system: float(v: r.usage_system)
      })
    )
 from(bucket: "example-bucket")
  |> range(start: timeRangeStart, stop: timeRangeStop)
  |> filter(fn: (r) =>
    r._measurement == "cpu" and
    r._field == "usage_user" or
    r._field == "usage_system" and
    r.cpu == "cpu-total"
  )
  |> pivot(rowKey: ["_time"], columnKey: ["_field"], valueColumn: "_value")
  |> usageToFloat()
  |> map(fn: (r) => ({
      r with
      usage_user: r.usage_user / (r.usage_user + r.usage_system) * 100.0,
      usage_system: r.usage_system / (r.usage_user +  r.usage_system) * 100.0
    })
  )
 ```
 {{% /code-tab-content %}}
 {{< /code-tabs-wrapper >}}
--- a/data/query_examples.yml
+++ b/data/query_examples.yml
@@ -332,6 +332,38 @@ moving_average:
      | 2020-01-01T00:06:00Z | 1.325  |
      | 2020-01-01T00:06:00Z | 1.150  |
 percentages:
  -
    code: |
      ```js
      data
        |> pivot(rowKey:["_time"], columnKey: ["_field"], valueColumn: "_value")
        |> map(fn: (r) => ({
          _time: r._time,
          _field: "used_percent",
          _value: float(v: r.used) / float(v: r.total) * 100.0
        }))
      ```
    input: |
      | _time                | _field  | _value |
      |:-----                |:------: | ------:|
      | 2020-01-01T00:00:00Z | used    | 2.5    |
      | 2020-01-01T00:00:10Z | used    | 3.1    |
      | 2020-01-01T00:00:20Z | used    | 4.2    |
      | _time                | _field  | _value |
      |:-----                |:------: | ------:|
      | 2020-01-01T00:00:00Z | total   | 8.0    |
      | 2020-01-01T00:00:10Z | total   | 8.0    |
      | 2020-01-01T00:00:20Z | total   | 8.0    |
    output: |
      | _time                | _field       | _value |
      |:-----                |:------:      | ------:|
      | 2020-01-01T00:00:00Z | used_percent | 31.25  |
      | 2020-01-01T00:00:10Z | used_percent | 38.75  |
      | 2020-01-01T00:00:20Z | used_percent | 52.50  |
 quantile:
  -
    code: |