Merge pull request #792 from influxdata/flux/geo-package

Flux Geo package
pull/794/head
Scott Anderson 2020-03-05 08:43:32 -07:00 committed by GitHub
commit a4e351111e
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
8 changed files with 736 additions and 1 deletions


@ -0,0 +1,124 @@
---
title: Flux Geo package
list_title: Geo package
description: >
  The Flux Geo package provides tools for working with geo-temporal data,
  such as filtering and grouping by geographic location.
  Import the `experimental/geo` package.
menu:
  v2_0_ref:
    name: Geo
    parent: Experimental
weight: 201
v2.0/tags: [functions, package, geo]
---
The Flux Geo package provides tools for working with geo-temporal data,
such as filtering and grouping by geographic location.
Import the `experimental/geo` package:
```js
import "experimental/geo"
```
{{< children type="functions" show="pages" >}}
## Geo schema requirements
The Geo package uses the Go implementation of the [S2 Geometry Library](https://s2geometry.io/).
Functions in the Geo package require the following:
- a **`s2_cell_id` tag** containing an **S2 cell ID as a token** (more information [below](#s2-cell-ids))
- a **`lat` field** containing the **latitude in decimal degrees** (WGS 84)
- a **`lon` field** containing the **longitude in decimal degrees** (WGS 84)
#### Schema recommendations
- a tag that identifies the data source
- a tag that identifies the point type (for example: `start`, `stop`, `via`)
- a field that identifies the track or route (for example: `id`, `tid`)
##### Examples of geo-temporal line protocol
```
taxi,pt=start,s2_cell_id=89c2594 tip=3.75,dist=14.3,lat=40.744614,lon=-73.979424,tid=1572566401123234345i 1572566401947779410
bike,id=biker-007,pt=via,s2_cell_id=89c25dc lat=40.753944,lon=-73.992035,tid=1572588100i 1572567115
```
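As a rough illustration of the schema above, the following Python sketch assembles a line protocol record from a measurement, tags, fields, and a nanosecond timestamp. The helper `to_line_protocol` is hypothetical (it is not part of any InfluxDB client library), it performs no escaping of spaces or commas, and it assumes every field value is either an integer or a float.

```python
def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Format one point as line protocol (no escaping; int and float fields only)."""
    tag_part = ",".join(f"{key}={value}" for key, value in tags.items())
    field_parts = []
    for key, value in fields.items():
        if isinstance(value, int):
            field_parts.append(f"{key}={value}i")   # integers carry an `i` suffix
        else:
            field_parts.append(f"{key}={value!r}")  # floats are written as-is
    return f"{measurement},{tag_part} {','.join(field_parts)} {timestamp_ns}"


# Rebuild the taxi record from the example above.
line = to_line_protocol(
    "taxi",
    tags={"pt": "start", "s2_cell_id": "89c2594"},
    fields={"tip": 3.75, "dist": 14.3, "lat": 40.744614, "lon": -73.979424,
            "tid": 1572566401123234345},
    timestamp_ns=1572566401947779410,
)
print(line)
```

Keeping `s2_cell_id` as a tag and `lat`/`lon` as fields matches the requirements listed above; a production writer would also need to escape special characters.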
## S2 Cell IDs
Pass **latitude** and **longitude** to the S2 Geometry Library to build an S2 cell ID,
then use the `s2.CellID.ToToken` method to convert it to a token for the `s2_cell_id` tag.
Specify your [S2 Cell ID level](https://s2geometry.io/resources/s2cell_statistics.html).
{{% note %}}
For faster filtering, use higher S2 Cell ID levels.
Note, however, that higher levels increase
[series cardinality](/v2.0/reference/glossary/#series-cardinality).
{{% /note %}}
Language-specific implementations of the S2 Geometry Library provide methods for
generating S2 Cell ID tokens. For example:
- **Go:** [`s2.CellID.ToToken()`](https://godoc.org/github.com/golang/geo/s2#CellID.ToToken)
- **Python:** [`s2sphere.CellId.to_token()`](https://s2sphere.readthedocs.io/en/latest/api.html#s2sphere.CellId)
- **JavaScript:** [`s2.cellid.toToken()`](https://github.com/mapbox/node-s2/blob/master/API.md#cellidtotoken---string)
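To make the token format more concrete, here is a small Python sketch that uses only the standard library. It relies on two properties of S2 cell IDs: a token is the hexadecimal form of the 64-bit cell ID with trailing zeros removed, and the position of the lowest set bit encodes the cell level. The function names are hypothetical, and the sketch does not compute a cell ID from latitude and longitude; use one of the libraries above for that. It also assumes a valid, non-zero cell ID.

```python
def token_to_id(token: str) -> int:
    """Expand a token to the 64-bit cell ID by right-padding with zeros."""
    return int(token.ljust(16, "0"), 16)


def id_to_token(cell_id: int) -> str:
    """Hex-encode the 64-bit cell ID and strip the trailing zeros."""
    return format(cell_id, "016x").rstrip("0")


def cell_level(cell_id: int) -> int:
    """Read the level (0-30) from the position of the lowest set bit."""
    lowest_bit = cell_id & -cell_id
    trailing_zeros = lowest_bit.bit_length() - 1
    return 30 - trailing_zeros // 2


def cell_face(cell_id: int) -> int:
    """The top three bits select one of the six cube faces (0-5)."""
    return cell_id >> 61


cell_id = token_to_id("89c2594")  # token from the taxi example above
print(hex(cell_id), cell_level(cell_id), cell_face(cell_id))
```

Both tokens in the line protocol examples above decode to level 11, which is the kind of information the `s2cellIDLevel` parameter of the Geo functions describes.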
## Region definitions
Many functions in the Geo package filter data based on geographic region.
Define geographic regions using the following shapes:
- [box](#box)
- [circle](#circle)
- [polygon](#polygon)
### box
Define a box-shaped region by specifying an object containing the following properties:
- **minLat:** minimum latitude in decimal degrees (WGS 84) _(Float)_
- **maxLat:** maximum latitude in decimal degrees (WGS 84) _(Float)_
- **minLon:** minimum longitude in decimal degrees (WGS 84) _(Float)_
- **maxLon:** maximum longitude in decimal degrees (WGS 84) _(Float)_
##### Example box-shaped region
```js
{
  minLat: 40.51757813,
  maxLat: 40.86914063,
  minLon: -73.65234375,
  maxLon: -72.94921875
}
```
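The following Python sketch shows one way to read a box-shaped region: a point belongs to the box when its latitude and longitude both fall between the corresponding bounds. The function `in_box` is hypothetical, and treating the bounds as inclusive and ignoring boxes that cross the antimeridian are assumptions of this sketch, not documented behavior of the Geo package.

```python
def in_box(lat, lon, box):
    """Return True when (lat, lon) lies inside the box, bounds included."""
    return (box["minLat"] <= lat <= box["maxLat"]
            and box["minLon"] <= lon <= box["maxLon"])


box = {
    "minLat": 40.51757813,
    "maxLat": 40.86914063,
    "minLon": -73.65234375,
    "maxLon": -72.94921875,
}

print(in_box(40.69335938, -73.30078125, box))  # center of the box
print(in_box(40.744614, -73.979424, box))      # taxi point, west of minLon
```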
### circle
Define a circular region by specifying an object containing the following properties:
- **lat**: latitude of the circle center in decimal degrees (WGS 84) _(Float)_
- **lon**: longitude of the circle center in decimal degrees (WGS 84) _(Float)_
- **radius**: radius of the circle in kilometers (km) _(Float)_
##### Example circular region
```js
{
  lat: 40.69335938,
  lon: -73.30078125,
  radius: 20.0
}
```
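A circular region can be pictured as "every point within `radius` kilometers of the center". The Python sketch below checks that condition with the haversine great-circle formula. The Geo package itself relies on S2 geometry, so the haversine formula, the mean Earth radius of 6371 km, and the function names are assumptions made for illustration only.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def in_circle(lat, lon, circle):
    """Return True when (lat, lon) is within circle['radius'] km of the center."""
    return distance_km(lat, lon, circle["lat"], circle["lon"]) <= circle["radius"]


circle = {"lat": 40.69335938, "lon": -73.30078125, "radius": 20.0}

print(in_circle(40.70, -73.20, circle))          # a few kilometers from the center
print(in_circle(40.744614, -73.979424, circle))  # taxi point, well outside 20 km
```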
### polygon
Define a custom polygon region using an object containing the following properties:
- **points**: points that define the custom polygon _(Array of objects)_

  Define each point with an object containing the following properties:

    - **lat**: latitude in decimal degrees (WGS 84) _(Float)_
    - **lon**: longitude in decimal degrees (WGS 84) _(Float)_
##### Example polygonal region
```js
{
  points: [
    {lat: 40.671659, lon: -73.936631},
    {lat: 40.706543, lon: -73.749177},
    {lat: 40.791333, lon: -73.880327}
  ]
}
```
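For intuition about polygonal regions, the Python sketch below tests whether a point lies inside a polygon with the classic ray-casting method. It treats latitude and longitude as flat x/y coordinates, which is only a reasonable approximation for small regions; the Geo package works with S2 geometry on the sphere, so this is an illustrative stand-in, and `in_polygon` is a hypothetical name.

```python
def in_polygon(lat, lon, points):
    """Ray casting: count how many edges a ray from the point toward larger longitudes crosses."""
    inside = False
    count = len(points)
    for i in range(count):
        a = points[i]
        b = points[(i + 1) % count]  # wrap around to close the polygon
        # Only edges that straddle the point's latitude can be crossed.
        if (a["lat"] > lat) != (b["lat"] > lat):
            # Longitude at which the edge crosses the point's latitude.
            cross_lon = a["lon"] + (lat - a["lat"]) * (b["lon"] - a["lon"]) / (b["lat"] - a["lat"])
            if lon < cross_lon:
                inside = not inside
    return inside


polygon = [
    {"lat": 40.671659, "lon": -73.936631},
    {"lat": 40.706543, "lon": -73.749177},
    {"lat": 40.791333, "lon": -73.880327},
]

# Centroid of the triangle, which always lies inside it.
centroid_lat = sum(p["lat"] for p in polygon) / 3
centroid_lon = sum(p["lon"] for p in polygon) / 3
print(in_polygon(centroid_lat, centroid_lon, polygon))
print(in_polygon(40.0, -73.0, polygon))
```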


@ -0,0 +1,68 @@
---
title: geo.asTracks() function
description: >
  The geo.asTracks() function groups rows into tracks (sequential, related data points).
menu:
  v2_0_ref:
    name: geo.asTracks
    parent: Geo
weight: 301
v2.0/tags: [functions, geo]
---
The `geo.asTracks()` function groups rows into tracks (sequential, related data points).
_**Function type:** Transformation_
```js
import "experimental/geo"
geo.asTracks(
  groupBy: ["id","tid"],
  orderBy: ["_time"]
)
```
## Parameters
### groupBy
Columns to group by.
These columns should uniquely identify each track.
Default is `["id","tid"]`.
_**Data type:** Array of strings_
### orderBy
Columns to order results by.
Default is `["_time"]`.
_**Data type:** Array of strings_
## Examples
##### Group tracks in a box-shaped region
```js
import "experimental/geo"

region = {
  minLat: 40.51757813,
  maxLat: 40.86914063,
  minLon: -73.65234375,
  maxLon: -72.94921875
}

from(bucket: "example-bucket")
  |> range(start: -1h)
  |> filter(fn: (r) => r._measurement == "example-measurement")
  |> geo.gridFilter(region: region)
  |> geo.toRows(correlationKey: ["_time", "id"])
  |> geo.asTracks()
```
## Function definition
```js
asTracks = (tables=<-, groupBy=["id","tid"], orderBy=["_time"]) =>
  tables
    |> group(columns: groupBy)
    |> sort(columns: orderBy)
```
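The definition above is a group followed by a sort. The Python sketch below mirrors that idea on plain dictionaries: rows that share the same `groupBy` values form one track, and each track is ordered by the `orderBy` columns. The function `as_tracks` and the sample rows are hypothetical and only illustrate the semantics; they are not how Flux executes the query.

```python
from collections import defaultdict


def as_tracks(rows, group_by=("id", "tid"), order_by=("_time",)):
    """Group rows into tracks keyed by the group_by columns, then sort each track."""
    tracks = defaultdict(list)
    for row in rows:
        key = tuple(row.get(column) for column in group_by)
        tracks[key].append(row)
    for track in tracks.values():
        track.sort(key=lambda row: tuple(row[column] for column in order_by))
    return dict(tracks)


# Hypothetical, already pivoted rows (one row per point).
rows = [
    {"id": "biker-007", "tid": 1, "_time": 30, "lat": 40.7540, "lon": -73.9920},
    {"id": "biker-008", "tid": 2, "_time": 20, "lat": 40.7000, "lon": -73.9000},
    {"id": "biker-007", "tid": 1, "_time": 10, "lat": 40.7530, "lon": -73.9910},
]

tracks = as_tracks(rows)
for key, track in tracks.items():
    print(key, [row["_time"] for row in track])
```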


@ -0,0 +1,183 @@
---
title: geo.filterRows() function
description: >
  The geo.filterRows() function filters data by a specified geographic region with
  the option of strict filtering.
menu:
  v2_0_ref:
    name: geo.filterRows
    parent: Geo
weight: 301
v2.0/tags: [functions, geo]
related:
  - /v2.0/reference/flux/stdlib/experimental/geo/gridfilter/
  - /v2.0/reference/flux/stdlib/experimental/geo/strictfilter/
---
The `geo.filterRows()` function filters data by a specified geographic region with
the option of strict filtering.
This function is a combination of [`geo.gridFilter()`](/v2.0/reference/flux/stdlib/experimental/geo/gridfilter/)
and [`geo.strictFilter()`](/v2.0/reference/flux/stdlib/experimental/geo/strictfilter/).
_**Function type:** Transformation_
```js
import "experimental/geo"
geo.filterRows(
  region: {lat: 40.69335938, lon: -73.30078125, radius: 20.0},
  minSize: 24,
  maxSize: -1,
  level: -1,
  s2cellIDLevel: -1,
  correlationKey: ["_time"],
  strict: true
)
```
## Parameters
### region
The region containing the desired data points.
Specify object properties for the shape.
_See [Region definitions](/v2.0/reference/flux/stdlib/experimental/geo/#region-definitions)._
_**Data type:** Object_
### minSize
Minimum number of cells that cover the specified region.
Default is `24`.
_**Data type:** Integer_
### maxSize
Maximum number of cells that cover the specified region.
Default is `-1`.
_**Data type:** Integer_
### level
[S2 cell level](https://s2geometry.io/resources/s2cell_statistics.html) of grid cells.
Default is `-1`.
_**Data type:** Integer_
{{% warn %}}
`level` is mutually exclusive with `minSize` and `maxSize` and must be less than
or equal to `s2cellIDLevel`.
{{% /warn %}}
### s2cellIDLevel
[S2 Cell level](https://s2geometry.io/resources/s2cell_statistics.html) used in `s2_cell_id` tag.
Default is `-1`.
_**Data type:** Integer_
{{% note %}}
When set to `-1`, `geo.filterRows()` attempts to automatically detect the S2 Cell ID level.
{{% /note %}}
### correlationKey
List of columns used to uniquely identify a row for output.
Default is `["_time"]`.
_**Data type:** Array of strings_
### strict
Enable strict geographic data filtering, which filters points by longitude (`lon`) and latitude (`lat`).
For S2 grid cells that are only partially covered by the defined region, strict filtering
returns only the points whose coordinates fall inside the region.
Default is `true`.
_**Data type:** Boolean_
## Examples
##### Strictly filter data in a box-shaped region
```js
import "experimental/geo"

from(bucket: "example-bucket")
  |> range(start: -1h)
  |> filter(fn: (r) => r._measurement == "example-measurement")
  |> geo.filterRows(
    region: {
      minLat: 40.51757813,
      maxLat: 40.86914063,
      minLon: -73.65234375,
      maxLon: -72.94921875
    }
  )
```
##### Approximately filter data in a circular region
The following example returns points located in S2 grid cells that are at least partially
covered by the defined region, even though some points may be located outside of the region.
```js
import "experimental/geo"

from(bucket: "example-bucket")
  |> range(start: -1h)
  |> filter(fn: (r) => r._measurement == "example-measurement")
  |> geo.filterRows(
    region: {
      lat: 40.69335938,
      lon: -73.30078125,
      radius: 20.0
    },
    strict: false
  )
```
##### Filter data in a polygonal region
```js
import "experimental/geo"

from(bucket: "example-bucket")
  |> range(start: -1h)
  |> filter(fn: (r) => r._measurement == "example-measurement")
  |> geo.filterRows(
    region: {
      points: [
        {lat: 40.671659, lon: -73.936631},
        {lat: 40.706543, lon: -73.749177},
        {lat: 40.791333, lon: -73.880327}
      ]
    }
  )
```
## Function definition
{{% truncate %}}
```js
filterRows = (
  tables=<-,
  region,
  minSize=24,
  maxSize=-1,
  level=-1,
  s2cellIDLevel=-1,
  correlationKey=["_time"],
  strict=true
) => {
  _rows =
    tables
      |> gridFilter(
        region,
        minSize: minSize,
        maxSize: maxSize,
        level: level,
        s2cellIDLevel: s2cellIDLevel
      )
      |> toRows(correlationKey)
  _result =
    if strict then
      _rows
        |> strictFilter(region)
    else
      _rows
  return _result
}
}
```
{{% /truncate %}}
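The definition above is a two-stage filter: a coarse pass on S2 cells, then an optional exact pass on coordinates. The Python sketch below imitates that flow on plain dictionaries. Everything in it is hypothetical: `filter_rows`, the set of covering tokens, the stand-in region test, and the third sample row are invented for illustration, and the rows are assumed to be pivoted already so that each one carries `lat` and `lon`.

```python
def filter_rows(rows, grid_tokens, in_region, strict=True):
    """Coarse filter by S2 cell token, then optionally by exact coordinates."""
    # Stage 1: keep rows whose cell token belongs to the covering set.
    candidates = [row for row in rows if row["s2_cell_id"] in grid_tokens]
    if not strict:
        return candidates
    # Stage 2: keep only rows whose coordinates really fall inside the region.
    return [row for row in candidates if in_region(row["lat"], row["lon"])]


rows = [
    {"s2_cell_id": "89c2594", "lat": 40.744614, "lon": -73.979424},
    {"s2_cell_id": "89c25dc", "lat": 40.753944, "lon": -73.992035},
    {"s2_cell_id": "89e8214", "lat": 41.0, "lon": -72.0},  # hypothetical row
]
grid_tokens = {"89c2594", "89c25dc"}  # hypothetical covering of the region


def in_region(lat, lon):
    """Hypothetical stand-in for an exact region test."""
    return lat <= 40.75


loose = filter_rows(rows, grid_tokens, in_region, strict=False)
exact = filter_rows(rows, grid_tokens, in_region, strict=True)
print(len(loose), len(exact))
```

The coarse pass is cheap because it only compares tag values, which is why the exact pass runs on the already reduced set.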


@ -0,0 +1,134 @@
---
title: geo.gridFilter() function
description: >
  The geo.gridFilter() function filters data by a specified geographic region.
menu:
  v2_0_ref:
    name: geo.gridFilter
    parent: Geo
weight: 301
v2.0/tags: [functions, geo]
related:
  - /v2.0/reference/flux/stdlib/experimental/geo/strictfilter/
  - /v2.0/reference/flux/stdlib/experimental/geo/filterRows/
---
The `geo.gridFilter()` function filters data by a specified geographic region.
It compares input data to a set of S2 Cell ID tokens located in the specified [region](#region).
{{% note %}}
S2 grid cells may not perfectly align with the defined region, so results may include
data with coordinates outside the region, but inside S2 grid cells partially covered by the region.
Use [`geo.toRows()`](/v2.0/reference/flux/stdlib/experimental/geo/toRows/) and
[`geo.strictFilter()`](/v2.0/reference/flux/stdlib/experimental/geo/strictfilter/)
after `geo.gridFilter()` to precisely filter points.
{{% /note %}}
_**Function type:** Transformation_
```js
import "experimental/geo"
geo.gridFilter(
  region: {lat: 40.69335938, lon: -73.30078125, radius: 20.0},
  minSize: 24,
  maxSize: -1,
  level: -1,
  s2cellIDLevel: -1
)
```
## Parameters
### region
The region containing the desired data points.
Specify object properties for the shape.
_See [Region definitions](/v2.0/reference/flux/stdlib/experimental/geo/#region-definitions)._
_**Data type:** Object_
### minSize
Minimum number of cells that cover the specified region.
Default is `24`.
_**Data type:** Integer_
### maxSize
Maximum number of cells that cover the specified region.
Default is `-1`.
_**Data type:** Integer_
### level
[S2 cell level](https://s2geometry.io/resources/s2cell_statistics.html) of grid cells.
Default is `-1`.
_**Data type:** Integer_
{{% warn %}}
`level` is mutually exclusive with `minSize` and `maxSize` and must be less than
or equal to `s2cellIDLevel`.
{{% /warn %}}
### s2cellIDLevel
[S2 Cell level](https://s2geometry.io/resources/s2cell_statistics.html) used in `s2_cell_id` tag.
Default is `-1`.
_**Data type:** Integer_
{{% note %}}
When set to `-1`, `geo.gridFilter()` attempts to automatically detect the S2 Cell ID level.
{{% /note %}}
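One way to picture the comparison against grid cells is sketched below in Python. It assumes that a stored `s2_cell_id` token is coarsened to the grid level (its parent cell) and then looked up in the set of tokens that cover the region. That reading is an assumption about the mechanism, and the helper names, the covering set, and the third token are hypothetical.

```python
def parent_token(token, level):
    """Token of the ancestor cell at `level` (must not be finer than the token's own level)."""
    cell_id = int(token.ljust(16, "0"), 16)
    marker = 1 << (2 * (30 - level))            # bit that marks the end of a level-`level` ID
    parent = (cell_id & ~(marker - 1)) | marker  # drop finer bits, set the marker bit
    return format(parent, "016x").rstrip("0")


def grid_filter(rows, grid_tokens, grid_level):
    """Keep rows whose cell, coarsened to grid_level, is part of the covering set."""
    return [row for row in rows
            if parent_token(row["s2_cell_id"], grid_level) in grid_tokens]


rows = [
    {"s2_cell_id": "89c2594"},  # level 11, from the taxi example
    {"s2_cell_id": "89c25dc"},  # level 11, from the bike example
    {"s2_cell_id": "89e8214"},  # hypothetical level 11 token elsewhere
]
grid_tokens = {"89c25c"}        # hypothetical covering made of one level 9 cell

kept = grid_filter(rows, grid_tokens, grid_level=9)
print([row["s2_cell_id"] for row in kept])
```

Under this reading, the requirement that `level` be less than or equal to `s2cellIDLevel` is natural: a stored cell can be coarsened to a parent, but it cannot be refined into a smaller cell.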
## Examples
##### Filter data in a box-shaped region
```js
import "experimental/geo"

from(bucket: "example-bucket")
  |> range(start: -1h)
  |> filter(fn: (r) => r._measurement == "example-measurement")
  |> geo.gridFilter(
    region: {
      minLat: 40.51757813,
      maxLat: 40.86914063,
      minLon: -73.65234375,
      maxLon: -72.94921875
    }
  )
```
##### Filter data in a circular region
```js
import "experimental/geo"

from(bucket: "example-bucket")
  |> range(start: -1h)
  |> filter(fn: (r) => r._measurement == "example-measurement")
  |> geo.gridFilter(
    region: {
      lat: 40.69335938,
      lon: -73.30078125,
      radius: 20.0
    }
  )
```
##### Filter data in a custom polygon region
```js
import "experimental/geo"

from(bucket: "example-bucket")
  |> range(start: -1h)
  |> filter(fn: (r) => r._measurement == "example-measurement")
  |> geo.gridFilter(
    region: {
      points: [
        {lat: 40.671659, lon: -73.936631},
        {lat: 40.706543, lon: -73.749177},
        {lat: 40.791333, lon: -73.880327}
      ]
    }
  )
```


@ -0,0 +1,70 @@
---
title: geo.groupByArea() function
description: >
  The geo.groupByArea() function groups rows by geographic area.
menu:
  v2_0_ref:
    name: geo.groupByArea
    parent: Geo
weight: 301
v2.0/tags: [functions, geo]
---
The `geo.groupByArea()` function groups rows by geographic area.
Area sizes are determined by the specified [`level`](#level).
Each geographic area is assigned a unique identifier which is stored in the [`newColumn`](#newcolumn).
Results are grouped by `newColumn`.
_**Function type:** Transformation_
```js
import "experimental/geo"
geo.groupByArea(
  newColumn: "geoArea",
  level: 3,
  s2cellIDLevel: -1
)
```
## Parameters
### newColumn
Name of the new column that stores the unique identifier for a geographic area.
_**Data type:** String_
### level
[S2 Cell level](https://s2geometry.io/resources/s2cell_statistics.html) used
to determine the size of each geographic area.
_**Data type:** Integer_
### s2cellIDLevel
[S2 Cell level](https://s2geometry.io/resources/s2cell_statistics.html) used in `s2_cell_id` tag.
Default is `-1`.
_**Data type:** Integer_
{{% note %}}
When set to `-1`, `geo.groupByArea()` attempts to automatically detect the S2 Cell ID level.
{{% /note %}}
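To illustrate how `level` controls area size, the Python sketch below groups rows by the token of the S2 cell that encloses them at a given level. It assumes that the area identifier is that enclosing cell's token and that it can be derived from the `s2_cell_id` tag; both are assumptions for illustration, and `area_token` and `group_by_area` are hypothetical names.

```python
from collections import defaultdict


def area_token(token, level):
    """Token of the enclosing cell at `level` (coarser than or equal to the token's level)."""
    cell_id = int(token.ljust(16, "0"), 16)
    marker = 1 << (2 * (30 - level))
    return format((cell_id & ~(marker - 1)) | marker, "016x").rstrip("0")


def group_by_area(rows, new_column, level):
    """Add the area identifier as new_column and group rows by it."""
    groups = defaultdict(list)
    for row in rows:
        area = area_token(row["s2_cell_id"], level)
        groups[area].append({**row, new_column: area})
    return dict(groups)


rows = [
    {"s2_cell_id": "89c2594", "lat": 40.744614, "lon": -73.979424},
    {"s2_cell_id": "89c25dc", "lat": 40.753944, "lon": -73.992035},
]

coarse = group_by_area(rows, "geoArea", level=3)
fine = group_by_area(rows, "geoArea", level=10)
print(sorted(coarse), sorted(fine))
```

At level 3 both sample points share one large cell, while at level 10 they land in different cells, so lower levels produce fewer and larger areas.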
## Examples
```js
import "experimental/geo"

region = {
  minLat: 40.51757813,
  maxLat: 40.86914063,
  minLon: -73.65234375,
  maxLon: -72.94921875
}

from(bucket: "example-bucket")
  |> range(start: -1h)
  |> filter(fn: (r) => r._measurement == "example-measurement")
  |> geo.gridFilter(region: region)
  |> geo.toRows()
  |> geo.groupByArea(newColumn: "geoArea", level: 3)
```


@ -0,0 +1,100 @@
---
title: geo.strictFilter() function
description: >
  The geo.strictFilter() function filters data by latitude and longitude.
menu:
  v2_0_ref:
    name: geo.strictFilter
    parent: Geo
weight: 301
v2.0/tags: [functions, geo]
related:
  - /v2.0/reference/flux/stdlib/experimental/geo/gridfilter/
  - /v2.0/reference/flux/stdlib/experimental/geo/filterRows/
  - /v2.0/reference/flux/stdlib/experimental/geo/toRows/
---
The `geo.strictFilter()` function filters data by latitude and longitude in a specified region.
This filter is more strict than [`geo.gridFilter()`](/v2.0/reference/flux/stdlib/experimental/geo/gridfilter/),
but for the best performance, use `geo.strictFilter()` **after** `geo.gridFilter()`.
{{% note %}}
`geo.strictFilter()` requires `lat` and `lon` columns in each row.
Use [`geo.toRows()`](/v2.0/reference/flux/stdlib/experimental/geo/toRows/)
to pivot `lat` and `lon` fields into each row **before** using `geo.strictFilter()`.
{{% /note %}}
_**Function type:** Transformation_
```js
import "experimental/geo"
geo.strictFilter(
  region: {lat: 40.69335938, lon: -73.30078125, radius: 20.0}
)
```
## Parameters
### region
The region containing the desired data points.
Specify object properties for the shape.
_See [Region definitions](/v2.0/reference/flux/stdlib/experimental/geo/#region-definitions)._
_**Data type:** Object_
## Examples
##### Filter data in a box-shaped region
```js
import "experimental/geo"

from(bucket: "example-bucket")
  |> range(start: -1h)
  |> filter(fn: (r) => r._measurement == "example-measurement")
  |> geo.toRows()
  |> geo.strictFilter(
    region: {
      minLat: 40.51757813,
      maxLat: 40.86914063,
      minLon: -73.65234375,
      maxLon: -72.94921875
    }
  )
```
##### Filter data in a circular region
```js
import "experimental/geo"

from(bucket: "example-bucket")
  |> range(start: -1h)
  |> filter(fn: (r) => r._measurement == "example-measurement")
  |> geo.toRows()
  |> geo.strictFilter(
    region: {
      lat: 40.69335938,
      lon: -73.30078125,
      radius: 20.0
    }
  )
```
##### Filter data in a custom polygon region
```js
import "experimental/geo"

from(bucket: "example-bucket")
  |> range(start: -1h)
  |> filter(fn: (r) => r._measurement == "example-measurement")
  |> geo.toRows()
  |> geo.strictFilter(
    region: {
      points: [
        {lat: 40.671659, lon: -73.936631},
        {lat: 40.706543, lon: -73.749177},
        {lat: 40.791333, lon: -73.880327}
      ]
    }
  )
```


@ -0,0 +1,56 @@
---
title: geo.toRows() function
description: >
  The geo.toRows() function ...
menu:
  v2_0_ref:
    name: geo.toRows
    parent: Geo
weight: 301
v2.0/tags: [functions, geo]
related:
  - /v2.0/reference/flux/stdlib/built-in/transformations/pivot/
---
The `geo.toRows()` function pivots data into row-wise sets based on time or other correlation columns.
For geo-temporal datasets, output rows include `lat` and `lon` columns required by
many Geo package functions.
_**Function type:** Transformation_
```js
import "experimental/geo"
geo.toRows(
  correlationKey: ["_time"]
)
```
## Parameters
### correlationKey
List of columns used to uniquely identify a row for output.
Default is `["_time"]`.
_**Data type:** Array of strings_
## Examples
```js
import "experimental/geo"
from(bucket: "example-bucket")
  |> range(start: -1h)
  |> filter(fn: (r) => r._measurement == "example-measurement")
  |> geo.toRows()
```
## Function definition
```js
toRows = (tables=<-, correlationKey=["_time"]) =>
  tables
    |> pivot(
      rowKey: correlationKey,
      columnKey: ["_field"],
      valueColumn: "_value"
    )
```
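The pivot in the definition above turns one record per field into one row per correlation key. The Python sketch below shows the same reshaping on plain dictionaries. The function `to_rows` and the sample records are hypothetical, and the sketch ignores Flux details such as group keys and multiple tables.

```python
def to_rows(records, correlation_key=("_time",)):
    """Pivot field/value records into one row per unique correlation key."""
    rows = {}
    for record in records:
        key = tuple(record[column] for column in correlation_key)
        # Start a new row with the correlation columns the first time a key is seen.
        row = rows.setdefault(key, {column: record[column] for column in correlation_key})
        row[record["_field"]] = record["_value"]
    return list(rows.values())


# Hypothetical input: each record holds a single field value.
records = [
    {"_time": 1, "_field": "lat", "_value": 40.744614},
    {"_time": 1, "_field": "lon", "_value": -73.979424},
    {"_time": 2, "_field": "lat", "_value": 40.753944},
    {"_time": 2, "_field": "lon", "_value": -73.992035},
]

pivoted = to_rows(records)
for row in pivoted:
    print(row)
```

After the pivot, every row carries both `lat` and `lon`, which is the shape that coordinate-based functions such as `geo.strictFilter()` expect.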


@ -6,7 +6,7 @@
{{ else if (eq $show "sections") }}
{{ .Scratch.Set "pages" .Page.Sections }}
{{ else if (eq $show "pages") }}
- {{ .Scratch.Set "pages" .Page.Pages }}
+ {{ .Scratch.Set "pages" .Page.RegularPages }}
{{ end }}
{{ $pages := .Scratch.Get "pages" }}