diff --git a/docs/design.md b/docs/design.md
deleted file mode 100644
index 5c5eaaf63b..0000000000
--- a/docs/design.md
+++ /dev/null

## Chronograf
[TOC]

### Design Philosophy

1. Present a uniform interface to the front end covering both the Plutonium and InfluxDB OSS offerings.
2. Simplify the front-end interaction with the time-series database.
3. Ease of setup and use.
4. Extensible as a base for future applications.
5. There will be an open source version of this.
6. Stress parallel development across all teams.
7. First-class support of on-prem.
8. Release to cloud first.

### Initial Goals

1. Produce pre-canned graphs for devops telegraf data for docker containers or system stats.
2. Up and running in 2 minutes.
3. User administration for Influx Enterprise.
4. Leverage our existing enterprise front-end code.
5. Leverage lessons learned from the enterprise back-end code.

### Versions

Each version will contain more and more features around monitoring various devops components.

#### Features

1. v1

    - Data explorer for both OSS and Enterprise
    - Dashboards for telegraf system metrics
    - User and role administration
    - Proxy queries over OSS and Enterprise
    - Authenticate against OSS/Enterprise

2. v2

    - Telegraf agent service
    - Additional dashboards for the telegraf agent

### Supported Versions of TICK Stack

We will only support version 1.0 of the TICK stack.

### Closed Source vs. Open Source

- Ideally, we would use the soon-to-be open source plutonium client to interact with Influx Enterprise. This would mean that this application could be entirely open source. (We should check with Todd and Nate.)
- However, if in the future we want to deliver a closed source version, we'll use the open source version as a library. The open source library will define certain routes (`/users`, `/whatever`); the closed source version will either override those routes or add new ones. This implies that the closed source version is simply additional or manipulated routes on the server.
- Survey the experience of closed source with Jason and Nathaniel.

### Repository

#### Structure

Both the javascript and Go source will live in the same repository.

#### Builds

The javascript build will be decoupled from the Go build process; asset compilation will happen during the build of the backend server.

This allows the front-end team to swap in a mocked, auto-generated swagger backend for testing and development.

##### Javascript

Webpack, with static asset compilation during the backend-server build.

##### Go

We'll use GDM as the vendoring solution to maintain consistency with other pieces of the TICK stack.

*Future work*: we must switch to the community vendoring solution when it actually seems mature.

### API

#### REST

We'll use a swagger interface definition to specify the API and JSON validation. The goal is to emphasize designing to an interface, which facilitates parallel development.

At first, we'll autogenerate the Go HTTP server from the swagger definition. This frees the team from implementing HTTP validation and the like, so it can work strictly on business logic. Towards the end of the development cycle we'll implement the HTTP routing and JSON validation ourselves.

#### Queries

Features include:

1. Load balancing against all data nodes in the cluster.
2. Formatting the output results to be simple to use in the frontend.
3. Decimating the results to minimize network traffic.
4. Using parameters to move the query time range.
5. Allowing different types of response protocols (HTTP GET, websocket, etc.).

- **`/proxy`**: used to send queries directly to the Influx backend. Most useful for the data explorer or other ad hoc query functionality.

##### `/proxy` Queries

Queries to the `/proxy` endpoint do not create new REST resources. Instead, the endpoint returns the results of the query.

This endpoint uses POST with a JSON object to specify the query and its parameters. The endpoint's response is the result of the query, or the errors from the backend InfluxDB.

Errors in the 4xx range come from the InfluxDB data source.

```sequence
App->/proxy: POST query
Note right of /proxy: Query Validation
Note right of /proxy: Load balance query
/proxy->Influx/Relay/Cluster: SELECT
Influx/Relay/Cluster-->/proxy: Time Series
Note right of /proxy: Format
/proxy-->App: Formatted results
```

Request:

```http
POST /enterprise/v1/sources/{id}/proxy HTTP/1.1
Accept: application/json
Content-Type: application/json

{
    "query": "SELECT * from telegraf where time > $value",
    "format": "dygraph"
}
```

Response:

```http
HTTP/1.1 200 OK
Content-Type: application/json

{
    "results": "..."
}
```

Error Response:

```http
HTTP/1.1 400 Bad Request
Content-Type: application/json

{
    "code": 400,
    "message": "error parsing query: found..."
}
```

##### Load balancing

Use simple round-robin load balancing of requests to data nodes. Discover the active data nodes using the Plutonium meta client.
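As a sketch, the round-robin selection could look like the following Go type. This is illustrative only: the static `nodes` slice stands in for addresses discovered via the meta client, and `Balancer` is not an existing API.

```go
package balance

import "sync/atomic"

// Balancer hands out data-node addresses in round-robin order.
// In practice the node list would be refreshed from the Plutonium
// meta client; here it is assumed to be a static, non-empty slice.
type Balancer struct {
	nodes []string // e.g. []string{"data1:8086", "data2:8086"}
	next  uint64
}

func NewBalancer(nodes []string) *Balancer {
	return &Balancer{nodes: nodes}
}

// Next returns the data node that should receive the next query.
// atomic.AddUint64 keeps the counter safe for concurrent callers.
func (b *Balancer) Next() string {
	n := atomic.AddUint64(&b.next, 1)
	return b.nodes[(n-1)%uint64(len(b.nodes))]
}
```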
#### Backend-server store

We will build an interface for storing API resources.

Some API resources could come from the influx data source (like users); most will be stored in a key/value or relational store.

Version 1.1 will use boltdb as the key/value store. Future versions will support more HA data stores.

##### Objects

1. Data sources

    - Version 1.1 will have only one data source:
        - InfluxDB
        - InfluxDB Enterprise (this means clustering)
        - possibly InfluxDB relay
    - Will provide metadata describing the data source (e.g. number of nodes).

2. Users

    - Version 1.1 will be a one-to-one mapping to influx.

3. Dashboards

    - Pre-canned dashboards for telegraf.
    - Includes the location of query resources.

4. Queries

    - Used to construct InfluxQL.

5. Sessions

    - We could simply use the JWT token as the session information.

6. Server configuration

    - Any setting that in TICK stack land would normally live in a file, we'll expose through an updatable API.
    - License/organization info, modules (pre-canned dashboards, query builder, customer dashboards, config builder), usage and debug history/info, support contacts.
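As a sketch, the store interface might look like the following, using dashboards as the example resource. The package and type names are illustrative; a boltdb-backed implementation (and later HA implementations) would satisfy the same interface.

```go
package store

import "context"

// Dashboard is one of the API resources the backend server persists.
type Dashboard struct {
	ID      string
	Name    string
	Queries []string // locations of the query resources backing each graph
}

// DashboardStore abstracts the persistence layer so that boltdb can be
// swapped for an HA store in a later version without touching handlers.
type DashboardStore interface {
	All(ctx context.Context) ([]Dashboard, error)
	Get(ctx context.Context, id string) (Dashboard, error)
	Add(ctx context.Context, d Dashboard) (Dashboard, error)
	Update(ctx context.Context, d Dashboard) error
	Delete(ctx context.Context, id string) error
}
```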
#### Authentication

We want the backend data store (influx OSS or influx meta) to handle authentication so that the web server has less responsibility.

We'll use JWT throughout.

### Testing

Talk with Mark and Michael about larger efforts; this will impact the repository layout. There is a potentially large testing matrix of components.

#### Integration Testing

Because we are pulling together so many TICK stack components, we will need strong integration testing.

- Stress testing.
    - Benchmark pathological queries.
- End-to-end testing: Telegraf -> Plutonium -> Chronograf = expected graph.
- It would be nice to translate user stories to integration tests.
- If someone finds a bug in the integration, we need a test so it never happens again.
- Upgrade testing.
- Version support.

#### Usability and Experience Testing

1. Owned by design team.
2. We are trying to attract the devops crowd.

    - Deployment experience.
    - Ease of use.
    - Speed to accomplish a task, e.g. find specific info, change a setting.

diff --git a/docs/kapacitor.md b/docs/kapacitor.md
deleted file mode 100644
index d5805c2c4a..0000000000
--- a/docs/kapacitor.md
+++ /dev/null

## TL;DR

* Use the kapacitor `vars` JSON structure as the serialization format.
* Create a chronograf endpoint that generates kapacitor tickscripts for all the different UI options.

### Proposal

1. Use kapacitor's `vars` JSON structure as the definition of the alert.
2. Create a service that generates tickscripts.

Currently, there are several alert "triggers" in the wireframe, including:

* threshold
* relative value
* deadman

There are also several alert destinations, such as:

* slack
* pagerduty
* victorops

Finally, the type of kapacitor tickscript needs to be specified as either:

* `batch`
* `stream`

The generator would take input like this:

```json
{
    "name": "I'm so triggered",
    "version": "1",
    "trigger": "threshold",
    "alerts": ["slack"],
    "type": "stream",
    "vars": {
        "database": {"type" : "string", "value" : "telegraf" },
        "rp": {"type" : "string", "value" : "autogen" },
        "measurement": {"type" : "string", "value" : "disk" },
        "where_filter": {"type": "lambda", "value": "\"cpu\" == 'cpu-total'"},
        "groups": {"type": "list", "value": [{"type":"string", "value":"host"},{"type":"string", "value":"dc"}]},
        "field": {"type" : "string", "value" : "used_percent" },
        "crit": {"type" : "lambda", "value" : "\"stat\" > 92.0" },
        "window": {"type" : "duration", "value" : "10s" },
        "file": {"type" : "string", "value" : "/tmp/disk_alert_log.txt" }
    }
}
```

and would produce the same document with the generated `script` added:

```json
{
    "name": "I'm so triggered",
    "version": "1",
    "trigger": "threshold",
    "alerts": ["slack"],
    "type": "stream",
    "script": "...",
    "vars": {
        "database": {"type" : "string", "value" : "telegraf" },
        "rp": {"type" : "string", "value" : "autogen" },
        "measurement": {"type" : "string", "value" : "disk" },
        "where_filter": {"type": "lambda", "value": "\"cpu\" == 'cpu-total'"},
        "groups": {"type": "list", "value": [{"type":"string", "value":"host"},{"type":"string", "value":"dc"}]},
        "field": {"type" : "string", "value" : "used_percent" },
        "crit": {"type" : "lambda", "value" : "\"stat\" > 92.0" },
        "window": {"type" : "duration", "value" : "10s" },
        "file": {"type" : "string", "value" : "/tmp/disk_alert_log.txt" }
    }
}
```

The cool thing is that the `script`, `vars`, and `type` fields can be used directly when POSTing to `/kapacitor/v1/tasks`.
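As a sketch, the generator's input and output might map onto Go types like these. The `tickgen` package and type names are illustrative, not an existing chronograf API:

```go
package tickgen

// Var mirrors one entry in kapacitor's vars JSON structure.
type Var struct {
	Type        string      `json:"type"` // bool, int, float, duration, string, regex, lambda, star, or list
	Value       interface{} `json:"value"`
	Description string      `json:"description,omitempty"`
}

// AlertRequest is what the UI would send to the generator endpoint.
type AlertRequest struct {
	Name    string         `json:"name"`
	Version string         `json:"version"`
	Trigger string         `json:"trigger"` // threshold, relative, or deadman
	Alerts  []string       `json:"alerts"`  // slack, pagerduty, victorops, ...
	Type    string         `json:"type"`    // batch or stream
	Vars    map[string]Var `json:"vars"`
}

// AlertResponse echoes the request and adds the generated tickscript;
// its script, type, and vars fields can be POSTed to /kapacitor/v1/tasks.
type AlertResponse struct {
	AlertRequest
	Script string `json:"script"`
}
```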
### kapacitor vars

kapacitor `vars` look like this:

```json
{
    "field_name" : {
        "value": VALUE,
        "type": TYPE,
        "description": "my cool comment"
    },
    "another_field" : {
        "value": VALUE,
        "type": TYPE,
        "description": "can I be a cool comment?"
    }
}
```

The following table lists the valid types and example values.

| Type | Example Value | Description |
| ---- | ------------- | ----------- |
| bool | true | "true" or "false" |
| int | 42 | Any integer value |
| float | 2.5 or 67 | Any numeric value |
| duration | "1s" or 1000000000 | Any integer value interpreted as nanoseconds, or an InfluxQL duration string (i.e. 10000000000 is 10s) |
| string | "a string" | Any string value |
| regex | "^abc.*xyz" | Any string value that represents a valid Go regular expression (https://golang.org/pkg/regexp/) |
| lambda | "\"value\" > 5" | Any string that is a valid TICKscript lambda expression |
| star | "" | No value is required; a star-type var represents the literal `*` in TICKscript (i.e. `.groupBy(*)`) |
| list | [{"type": TYPE, "value": VALUE}] | A list of var objects. Currently lists may only contain string or star vars |

See the [kapacitor](https://github.com/influxdata/kapacitor/blob/master/client/API.md#vars) var API documentation.

## Tickscripts

Create kapacitor scripts like this one, which takes the last 10s of disk info and averages the used_percent per host. If the average used_percent is greater than 92%, it will crit and log to a file.

```javascript
stream
    |from()
        .database('telegraf')
        .retentionPolicy('autogen')
        .measurement('disk')
        .groupBy('host')
    |window()
        .period(10s)
        .every(10s)
    |mean('used_percent')
        .as('stat')
    |alert()
        .id('{{ index .Tags "host"}}/disk_used')
        .message('{{ .ID }}:{{ index .Fields "stat" }}')
        .crit(lambda: "stat" > 92)
        .log('/tmp/disk_alert_log.txt')
```

### Variables

kapacitor can also declare variables, so better style would be:

```javascript
var crit = 92
var period = 10s
var every = 10s

stream
    |from()
        .database('telegraf')
        .retentionPolicy('autogen')
        .measurement('disk')
        .groupBy('host')
    |window()
        .period(period)
        .every(every)
    |mean('used_percent')
        .as('stat')
    |alert()
        .id('{{ index .Tags "host"}}/disk_used')
        .message('{{ .ID }}:{{ index .Fields "stat" }}')
        .crit(lambda: "stat" > crit)
        .log('/tmp/disk_alert_log.txt')
```

To define this script using the kapacitor API:

```http
POST /kapacitor/v1/tasks HTTP/1.1

{
    "id" : "TASK_ID",
    "type" : "stream",
    "dbrps": [{"db": "telegraf", "rp" : "autogen"}],
    "script": "stream | \n from() \n .database('telegraf') \n .retentionPolicy('autogen') \n .measurement('disk') \n .groupBy('host') | \n window() \n .period(period) \n .every(every) | \n mean('used_percent') \n .as('stat') | \n alert() \n .id('{{ index .Tags \"host\"}}/disk_used') \n .message('{{ .ID }}:{{ index .Fields \"stat\" }}') \n .crit(lambda: \"stat\" > crit) \n .log('/tmp/disk_alert_log.txt') \n ",
    "vars" : {
        "crit": {
            "value": 92,
            "type": "int"
        },
        "period": {
            "value": "10s",
            "type": "duration"
        },
        "every": {
            "value": "10s",
            "type": "duration"
        }
    }
}
```

### Templates

kapacitor also has templates, which decouple the data from the script itself. Note, however, that the template structure does not allow nodes themselves to be templated.

```javascript
// Which database to use
var database string
// The DB's retention policy
var rp = 'autogen'
// Which measurement to consume
var measurement string
// Optional where filter
var where_filter = lambda: TRUE
// Optional list of group by dimensions
var groups = 'host'
// Which field to process
var field string
// Critical criteria, has access to 'mean' field
var crit lambda
// How much data to window
var window duration
// File for the alert
var file = '/tmp/disk_alert_log.txt'
// ID of the alert
var id = '{{ index .Tags "host"}}/disk_used'
// Message of the alert
var message = '{{ .ID }}:{{ index .Fields "stat" }}'

stream
    |from()
        .database(database)
        .retentionPolicy(rp)
        .measurement(measurement)
        .where(where_filter)
        .groupBy(groups)
    |window()
        .period(window)
        .every(window)
    |mean(field)
        .as('stat')
    |alert()
        .id(id)
        .message(message)
        .crit(crit)
        .log(file)
```

In the template above, some of the vars are only typed and some have default values. Each `var` can be defined and/or overridden by a `JSON` blob. For example:

```json
{
    "database": {"type" : "string", "value" : "telegraf" },
    "rp": {"type" : "string", "value" : "autogen" },
    "measurement": {"type" : "string", "value" : "disk" },
    "where_filter": {"type": "lambda", "value": "\"cpu\" == 'cpu-total'"},
    "groups": {"type": "list", "value": [{"type":"string", "value":"host"},{"type":"string", "value":"dc"}]},
    "field": {"type" : "string", "value" : "used_percent" },
    "crit": {"type" : "lambda", "value" : "\"stat\" > 92.0" },
    "window": {"type" : "duration", "value" : "10s" },
    "file": {"type" : "string", "value" : "/tmp/disk_alert_log.txt" }
}
```

To use templates in kapacitor, one POSTs the template to [`/kapacitor/v1/templates`](https://docs.influxdata.com/kapacitor/v1.0/api/api/#templates). Each template has a unique id; to use one, a client sends that `template-id` along with the `JSON` blob as `vars` to the tasks endpoint, as sketched below.

Doing so separates the responsibility of the template creator from that of the task creator. Additionally, it is possible to list all templates via [`/kapacitor/v1/templates`](https://docs.influxdata.com/kapacitor/v1.0/api/api/#list-templates), and each template in the list includes the variables that must be set.
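As a sketch, the pair of calls might look like this, assuming a template with id `generic_mean_alert` (the ids and the elided `script` body are illustrative):

```http
POST /kapacitor/v1/templates HTTP/1.1

{
    "id": "generic_mean_alert",
    "type": "stream",
    "script": "..."
}
```

and then, to create a concrete task from it:

```http
POST /kapacitor/v1/tasks HTTP/1.1

{
    "id": "disk_used_alert",
    "template-id": "generic_mean_alert",
    "dbrps": [{"db": "telegraf", "rp": "autogen"}],
    "vars": {
        "measurement": {"type" : "string", "value" : "disk" },
        "field": {"type" : "string", "value" : "used_percent" },
        "crit": {"type" : "lambda", "value" : "\"stat\" > 92.0" },
        "window": {"type" : "duration", "value" : "10s" }
    }
}
```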
diff --git a/docs/queries.md b/docs/queries.md
deleted file mode 100644
index f331546a30..0000000000
--- a/docs/queries.md
+++ /dev/null

## Queries

The query proxy will be a façade over InfluxDB, InfluxDB Enterprise Cluster, and InfluxDB Relay.

It will provide a uniform interface to `SELECT` a time range of data.

```http
POST /enterprise/v1/sources/{id}/query HTTP/1.1
Accept: application/json
Content-Type: application/json

{
    "query": "SELECT * from telegraf",
    "format": "dygraph",
    "max_points": 1000,
    "type": "http"
}
```

Response:

```http
HTTP/1.1 202 Accepted

{
    "link": {
        "rel": "self",
        "href": "/enterprise/v1/sources/{id}/query/{qid}",
        "type": "http"
    }
}
```
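Since the proxy answers with a 202 and a self link rather than with the data itself, the client presumably fetches the finished result from that link. A hypothetical follow-up under that assumption (the response shape is illustrative, mirroring the `/proxy` endpoint; this document does not specify it):

```http
GET /enterprise/v1/sources/{id}/query/{qid} HTTP/1.1
Accept: application/json
```

```http
HTTP/1.1 200 OK
Content-Type: application/json

{
    "results": "..."
}
```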