Remove all my old design documentation

parent 302d2435a5
commit 99ad1bf228

docs/design.md

@@ -1,218 +0,0 @@

## Chronograf

[TOC]

### Design Philosophy

1. Present a uniform interface to the front-end covering the Plutonium and InfluxDB OSS offerings.
2. Simplify the front-end's interaction with the time-series database.
3. Ease of setup and use.
4. Extensible as a base for future applications.
5. There will be an open source version of this.
6. Stress parallel development across all teams.
7. First-class support of on-prem.
8. Release to cloud first.

### Initial Goals

1. Produce pre-canned graphs for devops Telegraf data such as Docker container or system stats.
2. Up and running in 2 minutes.
3. User administration for Influx Enterprise.
4. Leverage our existing enterprise front-end code.
5. Leverage lessons learned from the enterprise back-end code.

### Versions

Each version will add more features around monitoring various devops components.

#### Features

1. v1
   - Data explorer for both OSS and Enterprise
   - Dashboards for Telegraf system metrics
   - User and Role administration
   - Proxy queries over OSS and Enterprise
   - Authenticate against OSS/Enterprise

2. v2
   - Telegraf agent service
   - Additional dashboards for the Telegraf agent

### Supported Versions of TICK Stack

We will support only version 1.0 of the TICK stack.

### Closed Source vs Open Source

- Ideally, we would use the soon-to-be open source Plutonium client to interact with Influx Enterprise. This would mean that this application could be entirely open source. (We should check with Todd and Nate.)
- However, if in the future we want to deliver a closed source version, we'll use the open source version as a library. The open source library will define certain routes (`/users`, `/whatever`); the closed source version will either override those routes or add new ones. This implies that the closed source version is simply additional or modified routes on the server.
- Survey the experience of closed source development with Jason and Nathaniel.

### Repository

#### Structure

Both the JavaScript and Go source will live in the same repository.

#### Builds

The JavaScript build will be decoupled from the Go build process.

Asset compilation will happen during the backend-server build.

This allows the front-end team to swap in a mocked, auto-generated swagger backend for testing and development.

##### JavaScript

Webpack, with static asset compilation during the backend-server build.

##### Go

We'll use GDM as the vendoring solution to maintain consistency with other pieces of the TICK stack.

*Future work*: switch to the community vendoring solution once it is mature.

### API

#### REST

We'll use a swagger interface definition to specify the API and JSON validation. The goal is to emphasize designing to an interface, facilitating parallel development.

At first, we'll autogenerate the Go HTTP server from the swagger definition. This frees the team from implementing HTTP validation and the like, and lets them work strictly on business logic. Towards the end of the development cycle we'll implement the HTTP routing and JSON validation ourselves.
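
To make the division of labor concrete, here is a hypothetical sketch of the hand-written business-logic interface sitting behind the generated routing/validation layer. All names are ours for illustration and not taken from the swagger definition:

```go
package server

// ProxyService is the hand-written business logic behind the
// swagger-generated routing/validation layer. Names here are
// hypothetical; the real interface falls out of the swagger definition.
type ProxyService interface {
	// Query forwards an already-validated query to a data source
	// and returns formatted results.
	Query(sourceID string, q QueryRequest) (QueryResult, error)
}

// QueryRequest mirrors the JSON body accepted by /proxy.
type QueryRequest struct {
	Query  string `json:"query"`
	Format string `json:"format"`
}

// QueryResult is the formatted response handed back to the front-end.
type QueryResult struct {
	Results interface{} `json:"results"`
}
```
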
#### Queries

Features would include:

1. Load balancing against all data nodes in the cluster.
1. Formatting the output results to be simple to use in the front-end.
1. Decimating the results to minimize network traffic (see the sketch after this list).
1. Using parameters to move the query time range.
1. Allowing different types of response protocols (HTTP GET, WebSocket, etc.).
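
For decimation, a naive sketch in Go, assuming a simple drop-points strategy; a real implementation might aggregate per bucket instead:

```go
package proxy

// Point is one time/value pair in a query result.
type Point struct {
	Time  int64
	Value float64
}

// decimate keeps every nth point so that at most maxPoints survive.
// Naive drop-based sketch; a real implementation might aggregate
// (e.g. min/max per bucket) to preserve spikes.
func decimate(points []Point, maxPoints int) []Point {
	if maxPoints <= 0 || len(points) <= maxPoints {
		return points
	}
	step := (len(points) + maxPoints - 1) / maxPoints // ceiling division
	out := make([]Point, 0, maxPoints)
	for i := 0; i < len(points); i += step {
		out = append(out, points[i])
	}
	return out
}
```
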
- **`/proxy`:** used to send queries directly to the Influx backend. This should be most useful for the data explorer or other ad hoc query functionality.

##### `/proxy` Queries

Queries to the `/proxy` endpoint do not create new REST resources; instead, the endpoint returns the results of the query directly.

This endpoint uses POST with a JSON object specifying the query and its parameters. The response will be the results of the query or the errors from the backend InfluxDB.

Errors in the 4xx range come from the InfluxDB data source.

```sequence
App->/proxy: POST query
Note right of /proxy: Query Validation
Note right of /proxy: Load balance query
/proxy->Influx/Relay/Cluster: SELECT
Influx/Relay/Cluster-->/proxy: Time Series
Note right of /proxy: Format
/proxy-->App: Formatted results
```

Request:

```http
POST /enterprise/v1/sources/{id}/proxy HTTP/1.1
Accept: application/json
Content-Type: application/json

{
    "query": "SELECT * from telegraf where time > $value",
    "format": "dygraph"
}
```

Response:

```http
HTTP/1.1 200 OK
Content-Type: application/json

{
    "results": "..."
}
```

Error Response:

```http
HTTP/1.1 400 Bad Request
Content-Type: application/json

{
    "code": 400,
    "message": "error parsing query: found..."
}
```

##### Load balancing

Use simple round-robin load balancing for requests to data nodes.
Discover active data nodes using the Plutonium meta client.
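
A minimal Go sketch of the round-robin selection; the Plutonium meta client discovery call is omitted since its exact API isn't pinned down here:

```go
package proxy

import "sync/atomic"

// balancer hands out data node addresses in round-robin order.
// The node list would be refreshed from the Plutonium meta client;
// that discovery call is stubbed out here.
type balancer struct {
	nodes []string // active data node URLs
	next  uint64
}

// pick returns the next data node in round-robin order.
func (b *balancer) pick() string {
	n := atomic.AddUint64(&b.next, 1)
	return b.nodes[(n-1)%uint64(len(b.nodes))]
}
```
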
#### Backend-server store

We will build an interface for storing API resources.

Some API resources could come from the Influx data source (like users); most will be stored in a key/value or relational store.

Version 1.1 will use boltdb as the key/value store.

Future versions will support more HA data stores.
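
A sketch of what that store interface might look like with a boltdb implementation behind it; the bucket and key layout are illustrative only:

```go
package store

import "github.com/boltdb/bolt"

// Store is the interface API resources are read and written through,
// so boltdb can later be swapped for an HA data store.
type Store interface {
	Get(bucket, key string) ([]byte, error)
	Put(bucket, key string, value []byte) error
}

// BoltStore implements Store on top of a single boltdb file.
type BoltStore struct{ db *bolt.DB }

func (s *BoltStore) Put(bucket, key string, value []byte) error {
	return s.db.Update(func(tx *bolt.Tx) error {
		b, err := tx.CreateBucketIfNotExists([]byte(bucket))
		if err != nil {
			return err
		}
		return b.Put([]byte(key), value)
	})
}

func (s *BoltStore) Get(bucket, key string) ([]byte, error) {
	var v []byte
	err := s.db.View(func(tx *bolt.Tx) error {
		b := tx.Bucket([]byte(bucket))
		if b == nil {
			return nil // bucket absent: treat as not found
		}
		// Copy the value; bolt slices are only valid inside the tx.
		v = append(v, b.Get([]byte(key))...)
		return nil
	})
	return v, err
}
```
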
##### Objects

1. Data source

   - Version 1.1 will have only one data source, which may be:
     - InfluxDB
     - InfluxDB Enterprise (this means clustering)
     - possibly InfluxDB Relay
   - Will provide metadata describing the data source (e.g. number of nodes).

1. User

   - Version 1.1 will be a one-to-one mapping to Influx users.

1. Dashboards

   - Pre-canned dashboards for Telegraf.
   - Includes the location of query resources.

1. Queries

   - Used to construct InfluxQL.

1. Sessions

   - We could simply use the JWT token as the session information.

1. Server Configuration

   - Any setting that would normally live in a config file elsewhere in the TICK stack will be exposed through an updatable API.
   - License/organization info, modules (pre-canned dashboards, query builder, customer dashboards, config builder), usage and debug history/info, support contacts.

#### Authentication

We want the backend data store (Influx OSS or Influx meta) to handle authentication so that the web server has less responsibility.

We'll use JWT throughout.
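
As a sketch of the verification side, assuming we pick the github.com/dgrijalva/jwt-go library (the library choice is not settled here):

```go
package auth

import (
	"errors"

	jwt "github.com/dgrijalva/jwt-go"
)

// validate parses and verifies a JWT, returning its claims.
// secret would be shared with whatever issued the token.
func validate(tokenString string, secret []byte) (jwt.MapClaims, error) {
	token, err := jwt.Parse(tokenString, func(t *jwt.Token) (interface{}, error) {
		// Reject tokens signed with an unexpected algorithm.
		if _, ok := t.Method.(*jwt.SigningMethodHMAC); !ok {
			return nil, errors.New("unexpected signing method")
		}
		return secret, nil
	})
	if err != nil || !token.Valid {
		return nil, errors.New("invalid token")
	}
	return token.Claims.(jwt.MapClaims), nil
}
```
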
### Testing

Talk with Mark and Michael about larger testing efforts; this will impact the repository layout.
There is a potentially large testing matrix of components.

#### Integration Testing

Because we are pulling together so many TICK stack components, we will need strong integration testing.

- Stress testing.
- Benchmarking pathological queries.
- End-to-end testing: Telegraf -> Plutonium -> Chronograf = expected graph.
- It would be nice to translate user stories into integration tests.
- If someone finds an integration bug, we need a test so it never happens again.
- Upgrade testing.
- Version support.

#### Usability and Experience Testing

1. Owned by the design team.
1. We are trying to attract the devops crowd.

   - Deployment experience
   - Ease of use
   - Speed to accomplish a task, e.g. find specific info, change a setting

@@ -1,261 +0,0 @@

## TL;DR

* Use the kapacitor `vars` JSON structure as the serialization format.
* Create a chronograf endpoint that generates kapacitor tickscripts for all the different UI options.

### Proposal

1. Use kapacitor's `vars` JSON structure as the definition of the alert.
2. Create a service that generates tickscripts.

Currently, there are several alert "triggers" in the wireframe, including:

* threshold
* relative value
* deadman

Also, there are several alert destinations, like:

* slack
* pagerduty
* victorops

Finally, the type of kapacitor tickscript needs to be specified as either:

* `batch`
* `stream`

The generator would take input like this:

```json
{
    "name": "I'm so triggered",
    "version": "1",
    "trigger": "threshold",
    "alerts": ["slack"],
    "type": "stream",
    "vars": {
        "database": {"type": "string", "value": "telegraf"},
        "rp": {"type": "string", "value": "autogen"},
        "measurement": {"type": "string", "value": "disk"},
        "where_filter": {"type": "lambda", "value": "\"cpu\" == 'cpu-total'"},
        "groups": {"type": "list", "value": [{"type": "string", "value": "host"}, {"type": "string", "value": "dc"}]},
        "field": {"type": "string", "value": "used_percent"},
        "crit": {"type": "lambda", "value": "\"stat\" > 92.0"},
        "window": {"type": "duration", "value": "10s"},
        "file": {"type": "string", "value": "/tmp/disk_alert_log.txt"}
    }
}
```

and would produce:

```json
{
    "name": "I'm so triggered",
    "version": "1",
    "trigger": "threshold",
    "alerts": ["slack"],
    "type": "stream",
    "script": "...",
    "vars": {
        "database": {"type": "string", "value": "telegraf"},
        "rp": {"type": "string", "value": "autogen"},
        "measurement": {"type": "string", "value": "disk"},
        "where_filter": {"type": "lambda", "value": "\"cpu\" == 'cpu-total'"},
        "groups": {"type": "list", "value": [{"type": "string", "value": "host"}, {"type": "string", "value": "dc"}]},
        "field": {"type": "string", "value": "used_percent"},
        "crit": {"type": "lambda", "value": "\"stat\" > 92.0"},
        "window": {"type": "duration", "value": "10s"},
        "file": {"type": "string", "value": "/tmp/disk_alert_log.txt"}
    }
}
```

The cool thing is that the `script`, `vars`, and `type` fields can be used directly when POSTing to `/kapacitor/v1/tasks`.
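
A Go sketch of forwarding the generator's output to kapacitor; the `dbrps` value is hard-coded here for brevity, where the real service would derive it from the vars:

```go
package alerts

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// createTask forwards a generated alert to kapacitor's task endpoint.
func createTask(kapacitorURL, id, taskType, script string, vars json.RawMessage) error {
	body, err := json.Marshal(map[string]interface{}{
		"id":     id,
		"type":   taskType, // "stream" or "batch"
		"dbrps":  []map[string]string{{"db": "telegraf", "rp": "autogen"}},
		"script": script,
		"vars":   vars,
	})
	if err != nil {
		return err
	}
	resp, err := http.Post(kapacitorURL+"/kapacitor/v1/tasks", "application/json", bytes.NewReader(body))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode >= 300 {
		return fmt.Errorf("kapacitor returned %s", resp.Status)
	}
	return nil
}
```
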
### kapacitor vars

kapacitor `vars` looks like this:

```json
{
    "field_name": {
        "value": <VALUE>,
        "type": <TYPE>,
        "description": "my cool comment"
    },
    "another_field": {
        "value": <VALUE>,
        "type": <TYPE>,
        "description": "can I be a cool comment?"
    }
}
```

The following table lists the valid types and example values.

| Type | Example Value | Description |
| ---- | ------------- | ----------- |
| bool | true | "true" or "false" |
| int | 42 | Any integer value |
| float | 2.5 or 67 | Any numeric value |
| duration | "1s" or 1000000000 | Any integer value interpreted as nanoseconds, or an InfluxQL duration string (i.e. 10000000000 is 10s) |
| string | "a string" | Any string value |
| regex | "^abc.*xyz" | Any string value that represents a valid Go regular expression (https://golang.org/pkg/regexp/) |
| lambda | "\"value\" > 5" | Any string that is a valid TICKscript lambda expression |
| star | "" | No value is required; a star type var represents the literal `*` in TICKscript (i.e. `.groupBy(*)`) |
| list | [{"type": TYPE, "value": VALUE}] | A list of var objects. Currently lists may only contain string or star vars |

See the [kapacitor](https://github.com/influxdata/kapacitor/blob/master/client/API.md#vars) var API documentation.
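
On the chronograf side, these vars map naturally onto a small Go type (naming is hypothetical):

```go
package alerts

// Var mirrors one entry in kapacitor's vars JSON object.
type Var struct {
	Value       interface{} `json:"value"`
	Type        string      `json:"type"` // bool, int, float, duration, string, regex, lambda, star, list
	Description string      `json:"description,omitempty"`
}

// Vars is the full vars blob keyed by field name.
type Vars map[string]Var
```
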
## Tickscripts

Create kapacitor scripts like the following. This script takes the last 10s of disk info and averages used_percent per host. If the average used_percent is greater than 92%, it goes critical and logs to a file.

```javascript
stream
    |from()
        .database('telegraf')
        .retentionPolicy('autogen')
        .measurement('disk')
        .groupBy('host')
    |window()
        .period(10s)
        .every(10s)
    |mean('used_percent')
        .as('stat')
    |alert()
        .id('{{ index .Tags "host"}}/disk_used')
        .message('{{ .ID }}:{{ index .Fields "stat" }}')
        .crit(lambda: "stat" > 92)
        .log('/tmp/disk_alert_log.txt')
```

### Variables

Kapacitor also has the ability to create variables, so good style would be:

```javascript
var crit = 92
var period = 10s
var every = 10s

stream
    |from()
        .database('telegraf')
        .retentionPolicy('autogen')
        .measurement('disk')
        .groupBy('host')
    |window()
        .period(period)
        .every(every)
    |mean('used_percent')
        .as('stat')
    |alert()
        .id('{{ index .Tags "host"}}/disk_used')
        .message('{{ .ID }}:{{ index .Fields "stat" }}')
        .crit(lambda: "stat" > crit)
        .log('/tmp/disk_alert_log.txt')
```

To define this script using the kapacitor API:

```http
POST /kapacitor/v1/tasks HTTP/1.1

{
    "id": "TASK_ID",
    "type": "stream",
    "dbrps": [{"db": "telegraf", "rp": "autogen"}],
    "script": "stream | \n from() \n .database('telegraf') \n .retentionPolicy('autogen') \n .measurement('disk') \n .groupBy('host') | \n window() \n .period(period) \n .every(every) | \n mean('used_percent') \n .as('stat') | \n alert() \n .id('{{ index .Tags \"host\"}}/disk_used') \n .message('{{ .ID }}:{{ index .Fields \"stat\" }}') \n .crit(lambda: \"stat\" > crit) \n .log('/tmp/disk_alert_log.txt') \n ",
    "vars": {
        "crit": {
            "value": 92,
            "type": "int"
        },
        "period": {
            "value": "10s",
            "type": "duration"
        },
        "every": {
            "value": "10s",
            "type": "duration"
        }
    }
}
```

### Templates

Kapacitor also has templates, which decouple the data from the script itself. However, the template structure does not allow nodes themselves to be templated.

```javascript
// Which database to use
var database string
// DB's retention policy
var rp = 'autogen'
// Which measurement to consume
var measurement string
// Optional where filter
var where_filter = lambda: TRUE
// Optional list of group by dimensions
var groups = 'host'
// Which field to process
var field string
// Critical criteria, has access to 'mean' field
var crit lambda
// How much data to window
var window duration
// File for the alert
var file = '/tmp/disk_alert_log.txt'
// ID of the alert
var id = '{{ index .Tags "host"}}/disk_used'
// Message of the alert
var message = '{{ .ID }}:{{ index .Fields "stat" }}'

stream
    |from()
        .database(database)
        .retentionPolicy(rp)
        .measurement(measurement)
        .where(where_filter)
        .groupBy(groups)
    |window()
        .period(window)
        .every(window)
    |mean(field)
        .as('stat')
    |alert()
        .id(id)
        .message(message)
        .crit(crit)
        .log(file)
```

In the template above, some of the vars are only typed and some have default values. Each `var` can be defined and/or overridden by a JSON blob. For example:

```json
{
    "database": {"type": "string", "value": "telegraf"},
    "rp": {"type": "string", "value": "autogen"},
    "measurement": {"type": "string", "value": "disk"},
    "where_filter": {"type": "lambda", "value": "\"cpu\" == 'cpu-total'"},
    "groups": {"type": "list", "value": [{"type": "string", "value": "host"}, {"type": "string", "value": "dc"}]},
    "field": {"type": "string", "value": "used_percent"},
    "crit": {"type": "lambda", "value": "\"stat\" > 92.0"},
    "window": {"type": "duration", "value": "10s"},
    "file": {"type": "string", "value": "/tmp/disk_alert_log.txt"}
}
```

To use templates in kapacitor, one posts the template to [`/kapacitor/v1/templates`](https://docs.influxdata.com/kapacitor/v1.0/api/api/#templates).

The template can have a unique id. To use it, one just sends the `template-id` and the JSON blob as `vars` to the tasks endpoint.

Doing so separates the responsibility of the template creator from that of the task creator.
Additionally, it is possible to list all the templates via [`/kapacitor/v1/templates`](https://docs.influxdata.com/kapacitor/v1.0/api/api/#list-templates).

Finally, each template in the list reports the variables that are required to be set.
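
A Go sketch of creating a task from a stored template using the documented `template-id` field; `dbrps` is again hard-coded for brevity:

```go
package alerts

import (
	"bytes"
	"encoding/json"
	"net/http"
)

// taskFromTemplate creates a kapacitor task that references a stored
// template by id and supplies only the vars blob.
func taskFromTemplate(kapacitorURL, taskID, templateID string, vars json.RawMessage) error {
	body, err := json.Marshal(map[string]interface{}{
		"id":          taskID,
		"template-id": templateID,
		"dbrps":       []map[string]string{{"db": "telegraf", "rp": "autogen"}},
		"vars":        vars,
	})
	if err != nil {
		return err
	}
	resp, err := http.Post(kapacitorURL+"/kapacitor/v1/tasks", "application/json", bytes.NewReader(body))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	return nil
}
```
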

@@ -1,30 +0,0 @@

## Query Proxy

The query proxy will be a façade over InfluxDB, InfluxDB Enterprise Cluster, and InfluxDB Relay.

It will provide a uniform interface to `SELECT` a time range of data.

```http
POST /enterprise/v1/sources/{id}/query HTTP/1.1
Accept: application/json
Content-Type: application/json

{
    "query": "SELECT * from telegraf",
    "format": "dygraph",
    "max_points": 1000,
    "type": "http"
}
```

Response:

```http
HTTP/1.1 202 Accepted

{
    "link": {
        "rel": "self",
        "href": "/enterprise/v1/sources/{id}/query/{qid}",
        "type": "http"
    }
}
```
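
A Go sketch of the façade as a single interface, with one implementation each for OSS, Enterprise Cluster, and Relay; the names are illustrative:

```go
package sources

import "time"

// TimeSeries is the uniform query interface the proxy presents.
// InfluxDB OSS, an Enterprise cluster, and Relay each get their
// own implementation behind it.
type TimeSeries interface {
	// Query runs a SELECT over [start, stop) and returns results
	// already shaped for the requested format (e.g. "dygraph"),
	// decimated down to at most maxPoints.
	Query(q string, start, stop time.Time, format string, maxPoints int) (interface{}, error)
}
```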