fleshed out custom aggregate function doc, resolves #149

2019-04-23 17:26:53 -06:00 · 2019-04-23 17:26:53 -06:00 · 4fd2a33516
parent 1a89406071
commit 4fd2a33516
3 changed files with 249 additions and 43 deletions
--- a/content/v2.0/query-data/guides/custom-aggregate.md
+++ b/content/v2.0/query-data/guides/custom-aggregate.md
@@ -1,43 +0,0 @@
 ---
 title: Create custom aggregate functions
 description: Create your own custom aggregate functions in Flux using the `reduce()` function. transform and manipulate data.
 v2.0/tags: [functions, custom, flux]
 menu:
  v2_0:
    name: Create custom aggregates
    parent: How-to guides
 weight: 208
 ---
 ## Characteristics of an aggregate
 Before creating a custom aggregate function, you must understand the characteristics of an aggregate function.
 - Takes all records/rows in a table and combines them into a single row.
 - Aggregates operator on input tables individually.
 ## How reduce() works
 - You start with an identity. It's an object. The identity defines the initial values for the `accumulator` parameter in the reduce `fn`.
 - The identity/accumulator object is passed through the reduce function and transformed. It outputs a new accumulator object with updated values for the same parameters.
 - The new object is passed back into the reduce function.
 - This cycle repeats until all records in the table have been read and modified.
 - It produces a table with a single record.
 ## Examples
 ### Average
 ```js
 average(tables=<-, outputField="average") =>
  tables
    |> reduce(
      identity: {sum: 0.0, count: 1.0, avg: 0.0}
      fn: (r, accumulator) => ({
        sum: accumulator.sum + r._value,
        count: accumulator.count + 1.0,
        avg: accumulator.sum / accumulator.count
      })
    )    
    |> set(key: "_field", value: outputField)
    |> duplicate(column: "avg", as: "_value")
 ```
 **Note:** `accumulator.x` represents the x attribute of the accumulator object before it is passed back into the `fn`.
--- a/content/v2.0/query-data/guides/custom-functions/_index.md
+++ b/content/v2.0/query-data/guides/custom-functions/_index.md
--- a/content/v2.0/query-data/guides/custom-functions/custom-aggregate.md
+++ b/content/v2.0/query-data/guides/custom-functions/custom-aggregate.md
@@ -0,0 +1,249 @@
 ---
 title: Create custom aggregate functions
 description: Create your own custom aggregate functions in Flux using the `reduce()` function.
 v2.0/tags: [functions, custom, flux, aggregates]
 menu:
  v2_0:
    name: Custom aggregate functions
    parent: Create custom functions
 weight: 301
 ---
 Flux provides a number of built-in [aggregate functions](/v2.0/reference/flux/functions/built-in/transformations/aggregates/)
 that aggregate data in specific ways.
 However, these built-in functions may not meet your specific needs.
 The [`reduce()` function](/v2.0/reference/flux/functions/built-in/transformations/aggregates/reduce/)
 provides a way to create custom aggregate functions in Flux.
 ## Aggregate function characteristics
 Aggregate functions all have the same basic characteristics:
 - They operate on individual input tables and aggregate a table's records into a single record.
 - The output table has the same group key as the input table.
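
To make these two characteristics concrete, here is a minimal Python sketch (not Flux) that models each input table as a group key plus a list of rows. The `aggregate_tables` helper and the sample data are hypothetical and exist only for illustration; they are not part of any Flux API.

```python
# Illustrative model only: a "table" is a (group_key, rows) pair.
# The helper name and sample data are hypothetical, not part of Flux.

def aggregate_tables(tables, aggregate):
    """Apply `aggregate` to each table separately, producing one record per table."""
    output = []
    for group_key, rows in tables:
        values = [row["_value"] for row in rows]
        # The output record keeps the input table's group key columns.
        record = dict(group_key)
        record["_value"] = aggregate(values)
        output.append((group_key, [record]))
    return output

tables = [
    ({"host": "a"}, [{"host": "a", "_value": 1.0}, {"host": "a", "_value": 3.0}]),
    ({"host": "b"}, [{"host": "b", "_value": 10.0}]),
]

result = aggregate_tables(tables, sum)
for group_key, rows in result:
    print(group_key, rows)
```

Each input table yields exactly one output row, and the two tables never influence each other's result.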
 ## How reduce() works
 The `reduce()` function operates on one row at a time using the function defined in
 the [`fn` parameter](/v2.0/reference/flux/functions/built-in/transformations/aggregates/reduce/#fn).
 The function maps keys to specific values using two objects specified by the
 following parameters:
 | Parameter     | Description                                                              |
 |:---------:    |:-----------                                                              |
 | `r`           | An object that represents the row or record.                             |
 | `accumulator` | An object that contains values used in each row's aggregate calculation. |
 The `reduce()` function's [`identity` parameter](/v2.0/reference/flux/functions/built-in/transformations/aggregates/reduce/#identity)
 defines the initial `accumulator` object.
 ##### Example reduce() function
 ```js
 |> reduce(fn: (r, accumulator) => ({
    sum: r._value + accumulator.sum,
    product: r._value * accumulator.product
  }),
  identity: {sum: 0.0, product: 1.0}
 )
 ```
 After processing a row, `reduce()` produces an output object and uses it as the
 `accumulator` object when processing the next row.
 {{% note %}}
 Because `reduce()` uses the output object as the accumulator when processing the next row,
 keys mapped in the `reduce()` function must match the `identity` object's keys.
 {{% /note %}}
 This cycle repeats until `reduce()` processes all records in the table.
 When it processes the last record, it outputs a table containing a single record
 with columns for each mapped key.
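
The cycle described above can be modeled in a few lines of Python. This is only a sketch of the behavior, not how Flux implements `reduce()`; the `reduce_table` helper and the sample rows are hypothetical names chosen for this illustration.

```python
# Illustrative Python model of the reduce() cycle; this is not Flux.
# `reduce_table` and the sample rows are hypothetical names for this sketch.

def reduce_table(rows, fn, identity):
    # The identity object is the accumulator for the first row.
    accumulator = dict(identity)
    for r in rows:
        output = fn(r, accumulator)
        # Mapped keys must match the identity keys, because the output
        # becomes the accumulator for the next row.
        if output.keys() != identity.keys():
            raise ValueError("fn output keys must match identity keys")
        accumulator = output
    # The result is a single record with one column per mapped key.
    return accumulator

rows = [{"_value": 2.0}, {"_value": 3.0}, {"_value": 4.0}]

record = reduce_table(
    rows,
    fn=lambda r, accumulator: {
        "sum": r["_value"] + accumulator["sum"],
        "product": r["_value"] * accumulator["product"],
    },
    identity={"sum": 0.0, "product": 1.0},
)
print(record)  # {'sum': 9.0, 'product': 24.0}
```

The key check in the loop mirrors the note above: an output object with different keys could not serve as the next accumulator.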
 ### Reduce process example
 The example `reduce()` function [above](#example-reduce-function), which produces
 a sum and product of all values in a table, would work as follows:
 ##### Sample table
 ```txt
                  _time   _value
 -----------------------  -------
 2019-04-23T16:10:49.00Z      1.6
 2019-04-23T16:10:59.00Z      2.3
 2019-04-23T16:11:09.00Z      0.7
 2019-04-23T16:11:19.00Z      1.2
 2019-04-23T16:11:29.00Z      3.8
 ```
 #### Processing the first row
 `reduce()` uses the row data to define `r` and the `identity` object to define `accumulator`.
 ```txt
 Input Objects
 -------------
 r: { _time: 2019-04-23T16:10:49.00Z, _value: 1.6 }
 accumulator: { sum: 0.0, product: 1.0 }
 Mappings
 --------
 sum: 1.6 + 0.0
 product: 1.6 * 1.0
 Output Object
 -------------
 { sum: 1.6, product: 1.6 }
 ```
 #### Processing the second row
 `reduce()` uses the output object from the first row as the `accumulator` object
 when processing the second row:
 ```txt
 Input Objects
 -------------
 r: { _time: 2019-04-23T16:10:59.00Z, _value: 2.3 }
 accumulator: { sum: 1.6, product: 1.6 }
 Mappings
 --------
 sum: 2.3 + 1.6
 product: 2.3 * 1.6
 Output Object
 -------------
 { sum: 3.9, product: 3.68 }
 ```
 #### Processing all other rows
 The cycle continues until all other rows are processed.
 Using the [sample table](#sample-table), the final output object would be:
 ##### Final output object
 ```txt
 { sum: 9.6, product: 11.74656 }
 ```
 And the output table would look like:
 ##### Output table
 ```txt
 sum    product
 ----  ---------
 9.6   11.74656
 ```
 {{% note %}}
 Because `_time` is not part of the group key and is not mapped in the `reduce()` function,
 it is dropped from the output table.
 {{% /note %}}
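
To double-check the arithmetic in this walkthrough, the following Python sketch (again, a model rather than Flux) replays the same sum and product mappings over the sample table's `_value` column and records the accumulator after every row. Values are rounded to five decimal places for display only, since floating-point results can differ in the last digits.

```python
# Python trace of the walkthrough above (illustrative, not Flux).
values = [1.6, 2.3, 0.7, 1.2, 3.8]  # _value column of the sample table

accumulator = {"sum": 0.0, "product": 1.0}  # the identity object
steps = []
for value in values:
    accumulator = {
        "sum": value + accumulator["sum"],
        "product": value * accumulator["product"],
    }
    # Round only for display; the raw floats may carry tiny rounding errors.
    steps.append({key: round(val, 5) for key, val in accumulator.items()})

for number, step in enumerate(steps, start=1):
    print(f"after row {number}: {step}")
```

The first two printed steps should correspond to the output objects shown for the first and second rows, and the last step to the final output object.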
 ## Custom aggregate function examples
 ### Custom averaging function
 This example illustrates how to create a custom aggregate function that averages values in a table.
 However, the built-in [`mean()` function](/v2.0/reference/flux/functions/built-in/transformations/aggregates/mean/)
 does the same thing and is much more performant.
 {{< code-tabs-wrapper >}}
 {{% code-tabs %}}
 [Comments](#)
 [No Comments](#)
 {{% /code-tabs %}}
 {{% code-tab-content %}}
 ```js
 average = (tables=<-, outputField="average") =>
  tables
    |> reduce(
      // Define the initial accumulator object
       identity: {
         count: 0.0,
         sum:   0.0,
         avg:   0.0
       },
       fn: (r, accumulator) => ({
         // Increment the counter on each reduce loop
         count: accumulator.count + 1.0,
         // Add the _value to the existing sum
         sum:   accumulator.sum + r._value,
         // Divide the updated sum by the updated count for a new average
         avg:   (accumulator.sum + r._value) / (accumulator.count + 1.0)
       })
    )
    // Drop the sum and count columns since they are no longer needed
    |> drop(columns: ["sum", "count"])
    // Set the _field column of the output table to whatever
    // is provided in the outputField parameter
    |> set(key: "_field", value: outputField)
     // Rename the avg column to _value
    |> rename(columns: {avg: "_value"})
 ```
 {{% /code-tab-content %}}
 {{% code-tab-content %}}
 ```js
 average = (tables=<-, outputField="average") =>
  tables
    |> reduce(
       identity: {
         count: 0.0,
         sum:   0.0,
         avg:   0.0
       },
       fn: (r, accumulator) => ({
         count: accumulator.count + 1.0,
         sum:   accumulator.sum + r._value,
         avg:   (accumulator.sum + r._value) / (accumulator.count + 1.0)
       })
    )
    |> drop(columns: ["sum", "count"])    
    |> set(key: "_field", value: outputField)
    |> rename(columns: {avg: "_value"})
 ```
 {{% /code-tab-content %}}
 {{< /code-tabs-wrapper >}}
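
One way to sanity-check a running-average accumulator before using it in a query is to model it in Python and compare the result with a known mean. The sketch below is not Flux. It assumes the accumulator starts with a count of 0 and computes the average from the updated sum and count, so every row, including the last one, contributes to the result.

```python
# Illustrative Python check of a running-average accumulator (not Flux).
from statistics import mean

def average(values):
    # Assumed identity: nothing counted and nothing summed yet.
    accumulator = {"count": 0.0, "sum": 0.0, "avg": 0.0}
    for value in values:
        count = accumulator["count"] + 1.0
        total = accumulator["sum"] + value
        # Use the updated count and sum so the current row is included.
        accumulator = {"count": count, "sum": total, "avg": total / count}
    return accumulator["avg"]

values = [1.0, 2.0, 3.0, 6.0]
print(average(values))  # 3.0
print(mean(values))     # 3.0
```

If the average were computed from the previous sum and count instead, the last row would be left out of the result, which is an easy mistake to catch with a comparison like this.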
 ### Aggregate multiple columns
 Built-in aggregate functions only operate on one column.
 To aggregate multiple columns at once, use the `reduce()` function to create a custom aggregate function.
 The following function expects input tables to have `c1_value` and `c2_value` columns
 and generates the average for each.
 ```js
 multiAvg = (tables=<-) =>
  tables
    |> reduce(
       identity: {
         count:  0.0,
         c1_sum: 0.0,
         c1_avg: 0.0,
         c2_sum: 0.0,
         c2_avg: 0.0
       },
       fn: (r, accumulator) => ({
         count:  accumulator.count + 1.0,
         c1_sum: accumulator.c1_sum + r.c1_value,
         c1_avg: (accumulator.c1_sum + r.c1_value) / (accumulator.count + 1.0),
         c2_sum: accumulator.c2_sum + r.c2_value,
         c2_avg: (accumulator.c2_sum + r.c2_value) / (accumulator.count + 1.0)
       })
    )
 ```
 ### Aggregate gross and net profit
 This example aggregates gross and net profit.
 It expects `profit` and `expenses` columns in the input tables.
 ```js
 profitSummary = (tables=<-) =>
  tables
    |> reduce(
      identity: {
        gross: 0.0,
        net:   0.0
       },
      fn: (r, accumulator) => ({
        gross: accumulator.gross + r.profit,
        net:   accumulator.net + r.profit - r.expenses
      })
    )
 ```