Merge pull request #986 from influxdata/query/percentages

Calculating percentages in Flux
2020-04-29 11:30:18 -06:00
parent 9b1e68b01a 4d1a817a49
commit 6d784927f6
4 changed files with 316 additions and 91 deletions
--- a/assets/styles/layouts/article/_lists.scss
+++ b/assets/styles/layouts/article/_lists.scss
@@ -13,21 +13,32 @@ ul {
 }

 ol {
-list-style: none;
-counter-reset: item;
-  li {
-    position: relative;
-    counter-increment: item;
-    &:before {
-      content: counter(item) ". ";
-      position: absolute;
-      left: -1.6em;
-      color: $article-bold;
-      font-weight: bold;
-    }
-    ul {
-      counter-reset: item;
-    }
+  list-style-type: none;
+  counter-reset: item;
+  margin: 0;
+  padding: 0;
+}
+
+ol > li {
+  display: table;
+  counter-increment: item;
+  margin-bottom: 0.6em;
+
+  &:before {
+    content: counters(item, ".") ". ";
+    display: table-cell;
+    padding-right: 0.6em;
+    letter-spacing: .05rem;
+    color: $article-bold;
+    font-weight: bold;
+  }
+}
+
+li ol > li {
+  margin: .5rem 0;
+
+  &:before {
+    content: counters(item, ".") ".";
  }
 }

--- a/content/v2.0/query-data/flux/calculate-percentages.md
+++ b/content/v2.0/query-data/flux/calculate-percentages.md
@@ -0,0 +1,216 @@
+---
+title: Calculate percentages with Flux
+list_title: Calculate percentages
+description: >
+  Use [`pivot()` or `join()`](/v2.0/query-data/flux/mathematic-operations/#pivot-vs-join)
+  and the [`map()` function](/v2.0/reference/flux/stdlib/built-in/transformations/map/)
+  to align operand values into rows and calculate a percentage.
+menu:
+  v2_0:
+    name: Calculate percentages
+    parent: Query with Flux
+weight: 206
+aliases:
+ - /v2.0/query-data/guides/calculate-percentages/
+related:
+  - /v2.0/query-data/flux/mathematic-operations
+  - /v2.0/reference/flux/stdlib/built-in/transformations/map
+  - /v2.0/reference/flux/stdlib/built-in/transformations/pivot
+  - /v2.0/reference/flux/stdlib/built-in/transformations/join
+list_query_example: percentages
+---
+
+Calculating percentages from queried data is a common use case for time series data.
+To calculate a percentage in Flux, both operands must exist in the same row.
+Use `map()` to re-map the values in each row and calculate the percentage.
+
+**To calculate percentages**
+
+1. Use [`from()`](/v2.0/reference/flux/stdlib/built-in/inputs/from/),
+   [`range()`](/v2.0/reference/flux/stdlib/built-in/transformations/range/) and
+   [`filter()`](/v2.0/reference/flux/stdlib/built-in/transformations/filter/) to query operands.
+2. Use [`pivot()` or `join()`](/v2.0/query-data/flux/mathematic-operations/#pivot-vs-join)
+   to align operand values into rows.
+3. Use [`map()`](/v2.0/reference/flux/stdlib/built-in/transformations/map/)
+   to divide the numerator operand value by the denominator operand value and multiply by 100.
+
+{{% note %}}
+The following examples use `pivot()` to align operands into rows because
+`pivot()` works in most cases and is more performant than `join()`.
+_See [Pivot vs join](/v2.0/query-data/flux/mathematic-operations/#pivot-vs-join)._
+{{% /note %}}
+
+```js
+from(bucket: "example-bucket")
+  |> range(start: -1h)
+  |> filter(fn: (r) => r._measurement == "m1" and r._field =~ /field[1-2]/ )
+  |> pivot(rowKey:["_time"], columnKey: ["_field"], valueColumn: "_value")
+  |> map(fn: (r) => ({ r with _value: r.field1 / r.field2 * 100.0 }))
+```
+
+## GPU monitoring example
+The following example queries data from the `gpu-monitor` bucket and calculates the
+percentage of GPU memory used over time.
+Data includes the following:
+
+- **`gpu` measurement**
+- **`mem_used` field**: used GPU memory in bytes
+- **`mem_total` field**: total GPU memory in bytes
+
+### Query mem_used and mem_total fields
+```js
+from(bucket: "gpu-monitor")
+  |> range(start: 2020-01-01T00:00:00Z)
+  |> filter(fn: (r) => r._measurement == "gpu" and r._field =~ /mem_/)
+```
+
+###### Returns the following stream of tables:
+
+| _time                | _measurement | _field   | _value     |
+|:-----                |:------------:|:------:  | ------:    |
+| 2020-01-01T00:00:00Z | gpu          | mem_used | 2517924577 |
+| 2020-01-01T00:00:10Z | gpu          | mem_used | 2695091978 |
+| 2020-01-01T00:00:20Z | gpu          | mem_used | 2576980377 |
+| 2020-01-01T00:00:30Z | gpu          | mem_used | 3006477107 |
+| 2020-01-01T00:00:40Z | gpu          | mem_used | 3543348019 |
+| 2020-01-01T00:00:50Z | gpu          | mem_used | 4402341478 |
+
+<p style="margin:-2.5rem 0;"></p>
+
+| _time                | _measurement | _field    | _value     |
+|:-----                |:------------:|:------:   | ------:    |
+| 2020-01-01T00:00:00Z | gpu          | mem_total | 8589934592 |
+| 2020-01-01T00:00:10Z | gpu          | mem_total | 8589934592 |
+| 2020-01-01T00:00:20Z | gpu          | mem_total | 8589934592 |
+| 2020-01-01T00:00:30Z | gpu          | mem_total | 8589934592 |
+| 2020-01-01T00:00:40Z | gpu          | mem_total | 8589934592 |
+| 2020-01-01T00:00:50Z | gpu          | mem_total | 8589934592 |
+
+### Pivot fields into columns
+Use `pivot()` to pivot the `mem_used` and `mem_total` fields into columns.
+Output includes `mem_used` and `mem_total` columns with values for each corresponding `_time`.
+
+```js
+// ...
+  |> pivot(rowKey:["_time"], columnKey: ["_field"], valueColumn: "_value")
+```
+
+###### Returns the following:
+
+| _time                | _measurement | mem_used   | mem_total  |
+|:-----                |:------------:| --------:  | ---------: |
+| 2020-01-01T00:00:00Z | gpu          | 2517924577 | 8589934592 |
+| 2020-01-01T00:00:10Z | gpu          | 2695091978 | 8589934592 |
+| 2020-01-01T00:00:20Z | gpu          | 2576980377 | 8589934592 |
+| 2020-01-01T00:00:30Z | gpu          | 3006477107 | 8589934592 |
+| 2020-01-01T00:00:40Z | gpu          | 3543348019 | 8589934592 |
+| 2020-01-01T00:00:50Z | gpu          | 4402341478 | 8589934592 |
+
+### Map new values
+Each row now contains the values necessary to calculate a percentage.
+Use `map()` to re-map values in each row.
+Divide `mem_used` by `mem_total` and multiply by 100 to return the percentage.
+
+{{% note %}}
+To return a precise float percentage value that includes decimal places, the example
+below casts integer field values to floats and multiplies by a float value (`100.0`).
+{{% /note %}}
+
+```js
+// ...
+  |> map(fn: (r) => ({
+    _time: r._time,
+    _measurement: r._measurement,
+    _field: "mem_used_percent",
+    _value: float(v: r.mem_used) / float(v: r.mem_total) * 100.0
+  }))
+```
+##### Query results:
+
+| _time                | _measurement | _field           | _value  |
+|:-----                |:------------:|:------:          | ------: |
+| 2020-01-01T00:00:00Z | gpu          | mem_used_percent | 29.31   |
+| 2020-01-01T00:00:10Z | gpu          | mem_used_percent | 31.37   |
+| 2020-01-01T00:00:20Z | gpu          | mem_used_percent | 30.00   |
+| 2020-01-01T00:00:30Z | gpu          | mem_used_percent | 35.00   |
+| 2020-01-01T00:00:40Z | gpu          | mem_used_percent | 41.25   |
+| 2020-01-01T00:00:50Z | gpu          | mem_used_percent | 51.25   |
+
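+To illustrate why the float conversion matters, the sketch below contrasts the two
+forms of the calculation for the first row of the example data.
+It assumes `mem_used` and `mem_total` are stored as integers and that Flux integer
+division truncates the fractional part; treat it as an illustration, not verified output.
+
+```js
+// Integer operands (assumption: integer division truncates).
+// The quotient is expected to truncate to 0 before it is multiplied by 100.
+2517924577 / 8589934592 * 100
+
+// Float operands keep the fractional part.
+// The result is approximately 29.31, matching the first row of the query results.
+float(v: 2517924577) / float(v: 8589934592) * 100.0
+```
+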
+### Full query
+```js
+from(bucket: "gpu-monitor")
+  |> range(start: 2020-01-01T00:00:00Z)
+  |> filter(fn: (r) => r._measurement == "gpu" and r._field =~ /mem_/ )
+  |> pivot(rowKey:["_time"], columnKey: ["_field"], valueColumn: "_value")
+  |> map(fn: (r) => ({
+    _time: r._time,
+    _measurement: r._measurement,
+    _field: "mem_used_percent",
+    _value: float(v: r.mem_used) / float(v: r.mem_total) * 100.0
+  }))
+```
+
+## Examples
+
+#### Calculate percentages using multiple fields
+```js
+from(bucket: "example-bucket")
+  |> range(start: -1h)
+  |> filter(fn: (r) => r._measurement == "example-measurement")
+  |> filter(fn: (r) =>
+    r._field == "used_system" or
+    r._field == "used_user" or
+    r._field == "total"
+  )
+  |> pivot(rowKey:["_time"], columnKey: ["_field"], valueColumn: "_value")
+  |> map(fn: (r) => ({ r with
+    _value: float(v: r.used_system + r.used_user) / float(v: r.total) * 100.0
+  }))
+```
+
+#### Calculate percentages using multiple measurements
+
+1. Ensure measurements are in the same [bucket](/v2.0/reference/glossary/#bucket).
+2. Use `filter()` to include data from both measurements.
+3. Use `group()` to ungroup data and return a single table.
+4. Use `pivot()` to pivot fields into columns.
+5. Use `map()` to re-map rows and perform the percentage calculation.
+
+<!-- -->
+```js
+from(bucket: "example-bucket")
+  |> range(start: -1h)
+  |> filter(fn: (r) =>
+    (r._measurement == "m1" or r._measurement == "m2") and
+    (r._field == "field1" or r._field == "field2")
+  )
+  |> group()
+  |> pivot(rowKey:["_time"], columnKey: ["_field"], valueColumn: "_value")
+  |> map(fn: (r) => ({ r with _value: r.field1 / r.field2 * 100.0 }))
+```
+
+#### Calculate percentages using multiple data sources
+```js
+import "sql"
+import "influxdata/influxdb/secrets"
+
+pgUser = secrets.get(key: "POSTGRES_USER")
+pgPass = secrets.get(key: "POSTGRES_PASSWORD")
+pgHost = secrets.get(key: "POSTGRES_HOST")
+
+t1 = sql.from(
+  driverName: "postgres",
+  dataSourceName: "postgresql://${pgUser}:${pgPass}@${pgHost}",
+  query:"SELECT id, name, available FROM exampleTable"
+)
+
+t2 = from(bucket: "example-bucket")
+  |> range(start: -1h)
+  |> filter(fn: (r) =>
+    r._measurement == "example-measurement" and
+    r._field == "example-field"
+  )
+
+join(tables: {t1: t1, t2: t2}, on: ["id"])
+  |> map(fn: (r) => ({ r with _value: r._value_t2 / r.available_t1 * 100.0 }))
+```
--- a/content/v2.0/query-data/flux/mathematic-operations.md
+++ b/content/v2.0/query-data/flux/mathematic-operations.md
@@ -18,6 +18,7 @@ related:
  - /v2.0/reference/flux/stdlib/built-in/transformations/aggregates/reduce/
  - /v2.0/reference/flux/language/operators/
  - /v2.0/reference/flux/stdlib/built-in/transformations/type-conversions/
+  - /v2.0/query-data/flux/calculate-percentages/
 list_query_example: map_math
 ---

@@ -98,6 +99,7 @@ percent(sample: 20.0, total: 80.0)
 To transform multiple values in an input stream, your function needs to:

 - [Handle piped-forward data](/v2.0/query-data/flux/custom-functions/#functions-that-manipulate-piped-forward-data).
+- Ensure each operand necessary for the calculation exists in each row _(see [Pivot vs join](#pivot-vs-join) below)_.
 - Use the [`map()` function](/v2.0/reference/flux/stdlib/built-in/transformations/map) to iterate over each row.

 The example `multiplyByX()` function below includes:
@@ -178,93 +180,57 @@ bytesToGB = (tables=<-) =>
 ### Calculate a percentage
 To calculate a percentage, use simple division, then multiply the result by 100.

-{{% note %}}
-Operands in percentage calculations should always be floats.
-{{% /note %}}
-
 ```js
 > 1.0 / 4.0 * 100.0
 25.0
 ```

-#### User vs system CPU usage
-The example below calculates the percentage of total CPU used by the `user` vs the `system`.
+_For an in-depth look at calculating percentages, see [Calculate percentages](/v2.0/query-data/flux/calculate-percentages)._

-{{< code-tabs-wrapper >}}
-{{% code-tabs %}}
-[Comments](#)
-[No Comments](#)
-{{% /code-tabs %}}
+## Pivot vs join
+To query and use values in mathematical operations in Flux, operand values must
+exist in a single row.
+Both `pivot()` and `join()` will do this, but there are important differences between the two:

-{{% code-tab-content %}}
+#### Pivot is more performant
+`pivot()` reads and operates on a single stream of data.
+`join()` requires two streams of data, and the overhead of reading and combining
+both streams can be significant, especially for larger data sets.
+
+#### Use join for multiple data sources
+Use `join()` when querying data from different buckets or data sources.
+
+##### Pivot fields into columns for mathematic calculations
 ```js
-// Custom function that converts usage_user and
-// usage_system columns to floats
-usageToFloat = (tables=<-) =>
-  tables
-    |> map(fn: (r) => ({
-      _time: r._time,
-      usage_user: float(v: r.usage_user),
-      usage_system: float(v: r.usage_system)
-      })
-    )
+data
+  |> pivot(rowKey:["_time"], columnKey: ["_field"], valueColumn: "_value")
+  |> map(fn: (r) => ({ r with
+    _value: (r.field1 + r.field2) / r.field3 * 100.0
+  }))
+```

-// Define the data source and filter user and system CPU usage
-// from 'cpu-total' in the 'cpu' measurement
-from(bucket: "example-bucket")
+##### Join multiple data sources for mathematic calculations
+```js
+import "sql"
+import "influxdata/influxdb/secrets"
+
+pgUser = secrets.get(key: "POSTGRES_USER")
+pgPass = secrets.get(key: "POSTGRES_PASSWORD")
+pgHost = secrets.get(key: "POSTGRES_HOST")
+
+t1 = sql.from(
+  driverName: "postgres",
+  dataSourceName: "postgresql://${pgUser}:${pgPass}@${pgHost}",
+  query:"SELECT id, name, available FROM exampleTable"
+)
+
+t2 = from(bucket: "example-bucket")
  |> range(start: -1h)
  |> filter(fn: (r) =>
-    r._measurement == "cpu" and
-    r._field == "usage_user" or
-    r._field == "usage_system" and
-    r.cpu == "cpu-total"
+    r._measurement == "example-measurement" and
+    r._field == "example-field"
  )

-  // Pivot the output tables so usage_user and usage_system are in each row
-  |> pivot(rowKey: ["_time"], columnKey: ["_field"], valueColumn: "_value")
-
-  // Convert usage_user and usage_system to floats
-  |> usageToFloat()
-
-  // Map over each row and calculate the percentage of
-  // CPU used by the user vs the system
-  |> map(fn: (r) => ({
-      // Preserve existing columns in each row
-      r with
-      usage_user: r.usage_user / (r.usage_user + r.usage_system) * 100.0,
-      usage_system: r.usage_system / (r.usage_user +  r.usage_system) * 100.0
-    })
-  )
+join(tables: {t1: t1, t2: t2}, on: ["id"])
+  |> map(fn: (r) => ({ r with _value: r._value_t2 / r.available_t1 * 100.0 }))
 ```
-{{% /code-tab-content %}}
-
-{{% code-tab-content %}}
-```js
-usageToFloat = (tables=<-) =>
-  tables
-    |> map(fn: (r) => ({
-      _time: r._time,
-      usage_user: float(v: r.usage_user),
-      usage_system: float(v: r.usage_system)
-      })
-    )
-
-from(bucket: "example-bucket")
-  |> range(start: timeRangeStart, stop: timeRangeStop)
-  |> filter(fn: (r) =>
-    r._measurement == "cpu" and
-    r._field == "usage_user" or
-    r._field == "usage_system" and
-    r.cpu == "cpu-total"
-  )
-  |> pivot(rowKey: ["_time"], columnKey: ["_field"], valueColumn: "_value")
-  |> usageToFloat()
-  |> map(fn: (r) => ({
-      r with
-      usage_user: r.usage_user / (r.usage_user + r.usage_system) * 100.0,
-      usage_system: r.usage_system / (r.usage_user +  r.usage_system) * 100.0
-    })
-  )
-```
-{{% /code-tab-content %}}
-{{< /code-tabs-wrapper >}}
--- a/data/query_examples.yml
+++ b/data/query_examples.yml
@@ -332,6 +332,38 @@ moving_average:
      | 2020-01-01T00:06:00Z | 1.325  |
      | 2020-01-01T00:06:00Z | 1.150  |

+percentages:
+  -
+    code: |
+      ```js
+      data
+        |> pivot(rowKey:["_time"], columnKey: ["_field"], valueColumn: "_value")
+        |> map(fn: (r) => ({
+          _time: r._time,
+          _field: "used_percent",
+          _value: float(v: r.used) / float(v: r.total) * 100.0
+        }))
+      ```
+    input: |
+      | _time                | _field  | _value |
+      |:-----                |:------: | ------:|
+      | 2020-01-01T00:00:00Z | used    | 2.5    |
+      | 2020-01-01T00:00:10Z | used    | 3.1    |
+      | 2020-01-01T00:00:20Z | used    | 4.2    |
+
+      | _time                | _field  | _value |
+      |:-----                |:------: | ------:|
+      | 2020-01-01T00:00:00Z | total   | 8.0    |
+      | 2020-01-01T00:00:10Z | total   | 8.0    |
+      | 2020-01-01T00:00:20Z | total   | 8.0    |
+    output: |
+      | _time                | _field       | _value |
+      |:-----                |:------:      | ------:|
+      | 2020-01-01T00:00:00Z | used_percent | 31.25  |
+      | 2020-01-01T00:00:10Z | used_percent | 38.75  |
+      | 2020-01-01T00:00:20Z | used_percent | 52.50  |
+
+
 quantile:
  -
    code: |