From dfcc083efa1595a344f33a935d9c8e563b0bd49c Mon Sep 17 00:00:00 2001
From: Paul Dix
Date: Thu, 31 Oct 2013 13:36:56 -0400
Subject: [PATCH] Update readme with pointers to influxdb.org site. Move design stuff over to a notes page

---
 README.md       | 83 +++---------------------------------------------
 design_notes.md | 82 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 87 insertions(+), 78 deletions(-)
 create mode 100644 design_notes.md

diff --git a/README.md b/README.md
index 75af550832..ef73572227 100644
--- a/README.md
+++ b/README.md
@@ -1,83 +1,10 @@
-chronosdb [![Build Status](https://travis-ci.org/influxdb/influxdb.png?branch=master)](https://travis-ci.org/influxdb/influxdb)
+InfluxDB [![Build Status](https://travis-ci.org/influxdb/influxdb.png?branch=master)](https://travis-ci.org/influxdb/influxdb)
 =========
 
-Scalable datastore for metrics, events, and real-time analytics
+InfluxDB is an open source distributed time series database that has no external dependencies. It's useful for metrics, events, and analytics with a built in HTTP API so you don't have to write any server side code to get up and running. InfluxDB is designed to answer queries in real-time. That means every data point is indexed as it comes in and is immediately available in queries that should return in < 100ms. It's designed to be scalable, simple to install and manage, and fast to get data in and out.
 
-Requirements
-------------
+Read an [overview of the design goals and reasons for the project](http://influxdb.org/overview/).
 
-* horizontal scalable
-* http interface
-* udp interface (low priority)
-* persistent
-* metadata for time series
-* perform functions quickly (count, unique, sum, etc.)
-* group by time intervals (e.g. count ticks every 5 minutes)
-* joining multiple time series to generate new timeseries
-* schema-less
-* sql-like query language
-* support multiple databases with read/write api key
-* single time series should scale horizontally (no hot spots)
-* dynamic cluster changes and data balancing
-* pubsub layer
-* continuous queries (keep connection open and return new points as they arrive)
-* Delete ranges of points from any number of timeseries (that should reflect in disk space usage)
-* querying should support one or more timeseries (possibly with regex to match on)
+Check out the [getting started guide](http://influxdb.org/docs/) to read about how to install InfluxDB, start writing data, and issue queries in just a few minutes.
 
-New Requirements
-----------------
-* Easy to backup and restore
-* Large time range queries with one column ?
-* Optimize for HDD access ?
-* What are the common use cases that we should optimize for ?
-
-Modules
--------
-
-
-       +--------------------+   +--------------------+
-       |                    |   |                    |
-       |  WebConsole/docs   |   |      Http API      |
-       |                    |   |                    |
-       +------------------+-+   +-+------------------+
-                          |       |
-                          |       |
-                    +-----+-------+-----------+
-                    |                         |
-                    |     Lang. Bindings      |
-                    |                         |
-                    +-----------------+       |
-                    |                 |       |
-                    |  Query Engine   |       |
-                    |                 |       |
-                    +-----------------+-------+
-                    |                         |
-               +----+ Coordinator (consensus) +-----+
-               |    |                         |     |
-               |    +-------------------------+     |
-               |                                    |
-               |                                    |
-      +--------+-----------+                +-------+------------+
-      |                    |                |                    |
-      |   Storage Engine   |                |   Storage Engine   |
-      |                    |                |                    |
-      +--------+-----------+                +-------+------------+
-
-Replication & Concensus Notes
------------------------------
-
-Single raft cluster for which machines are in cluster and who owns which locations.
-1. When a write comes into a server, figure out which machine owns the data, proxy out to that.
-2. The machine proxies to the server, which assigns a sequence number
-3. Each machine in the cluster asks the other machines that own hash ring locations what their latest sequence number is every 10 seconds (this is read repair)
-
-For example, take machines A, B, and C. Say B and C own ring location #2. If a write comes into A it will look up the configuration and pick B or C at random to proxy the write to. Say it goes to B. B assigns a sequence number of 1. It keeps a log for B2 of the writes. It will also keep a log for C2's writes. It then tries to write #1 to C.
-
-If the write is marked as a quorum write, then B won't return a success to A until the data has been written to both B and C. Every so often both B and C will ask each other what their latest writes are.
-
-Taking the example further, if we had server D that also owned ring location 2. B would ask C for writes to C2. If C is down it will ask D for writes to C2. This will ensure that if C fails no data will be lost.
-
-Coding Style
-------------
-
-1. Public functions should be at the top of the file, followed by a comment `// private functions` and all private functions.
+See the [list of libraries for different languages](http://influxdb.org/docs/libraries/javascript.html). Or see the [HTTP API documentation to start writing a library for your favorite language](http://influxdb.org/docs/api/http.html).
diff --git a/design_notes.md b/design_notes.md
new file mode 100644
index 0000000000..70660a2fc4
--- /dev/null
+++ b/design_notes.md
@@ -0,0 +1,82 @@
+Just some notes about requirements, design, and clustering.
+
+Scalable datastore for metrics, events, and real-time analytics
+
+Requirements
+------------
+
+* horizontally scalable
+* http interface
+* udp interface (low priority)
+* persistent
+* metadata for time series (low priority)
+* perform functions quickly (count, unique, sum, etc.)
+* group by time intervals (e.g. count ticks every 5 minutes)
+* joining multiple time series to generate new timeseries
+* schema-less
+* sql-like query language
+* support multiple databases with authentication
+* single time series should scale horizontally (no hot spots)
+* dynamic cluster changes and data balancing
+* pubsub layer
+* continuous queries (keep connection open and return new points as they arrive)
+* Delete ranges of points from any number of timeseries (that should reflect in disk space usage)
+* querying should support one or more timeseries (possibly with regex to match on)
+
+New Requirements
+----------------
+* Easy to backup and restore
+* Large time range queries with one column ?
+* Optimize for HDD access ?
+* What are the common use cases that we should optimize for ?
+
+Modules
+-------
+
+
+       +--------------------+   +--------------------+
+       |                    |   |                    |
+       |  WebConsole/docs   |   |      Http API      |
+       |                    |   |                    |
+       +------------------+-+   +-+------------------+
+                          |       |
+                          |       |
+                    +-----+-------+-----------+
+                    |                         |
+                    |     Lang. Bindings      |
+                    |                         |
+                    +-----------------+       |
+                    |                 |       |
+                    |  Query Engine   |       |
+                    |                 |       |
+                    +-----------------+-------+
+                    |                         |
+               +----+ Coordinator (consensus) +-----+
+               |    |                         |     |
+               |    +-------------------------+     |
+               |                                    |
+               |                                    |
+      +--------+-----------+                +-------+------------+
+      |                    |                |                    |
+      |   Storage Engine   |                |   Storage Engine   |
+      |                    |                |                    |
+      +--------+-----------+                +-------+------------+
+
+Replication & Consensus Notes
+-----------------------------
+
+Single raft cluster for which machines are in cluster and who owns which locations.
+1. When a write comes into a server, figure out which machine owns the data, proxy out to that.
+2. The machine proxies to the server, which assigns a sequence number
+3. Each machine in the cluster asks the other machines that own hash ring locations what their latest sequence number is every 10 seconds (this is read repair)
+
+For example, take machines A, B, and C. Say B and C own ring location #2. If a write comes into A it will look up the configuration and pick B or C at random to proxy the write to. Say it goes to B. B assigns a sequence number of 1. It keeps a log for B2 of the writes. It will also keep a log for C2's writes. It then tries to write #1 to C.
+
+If the write is marked as a quorum write, then B won't return a success to A until the data has been written to both B and C. Every so often both B and C will ask each other what their latest writes are.
+
+Taking the example further, suppose we had a server D that also owned ring location 2. B would ask C for writes to C2. If C is down it will ask D for writes to C2. This will ensure that if C fails no data will be lost.
+
+Coding Style
+------------
+
+1. Public functions should be at the top of the file, followed by a comment `// private functions` and all private functions.
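
Reviewer's sketch (not part of the patch): the replication notes in design_notes.md describe a write path in which the owning server assigns a sequence number, copies the write to the other owners of the ring location, and peers periodically ask each other for writes they are missing. The Go below is a minimal illustration of that flow under stated assumptions. `Server`, `Write`, `Accept`, and `CatchUp` are hypothetical names, not InfluxDB's API; it keeps one in-memory log per server instead of one log per owner, and it leaves out the raft configuration and the 10 second timer.

```go
package main

import (
	"errors"
	"fmt"
)

// Write is one replicated write. Seq is assigned by the server that accepts it.
type Write struct {
	Seq  uint64
	Data string
}

// Server is a hypothetical stand-in for one machine owning a ring location.
type Server struct {
	Name string
	Down bool
	Log  []Write
	next uint64
}

var errDown = errors.New("server down")

// Accept assigns the next sequence number, logs the write locally, then
// tries to copy it to each replica. For a quorum write it returns an error
// as soon as a replica fails to acknowledge (the local copy is kept).
func (s *Server) Accept(data string, replicas []*Server, quorum bool) (uint64, error) {
	if s.Down {
		return 0, errDown
	}
	s.next++
	w := Write{Seq: s.next, Data: data}
	s.Log = append(s.Log, w)
	for _, r := range replicas {
		if err := r.apply(w); err != nil && quorum {
			return w.Seq, fmt.Errorf("write %d not acknowledged by %s: %w", w.Seq, r.Name, err)
		}
	}
	return w.Seq, nil
}

// CatchUp asks peer for every write newer than this server's latest
// sequence number and appends them. It returns how many were copied.
func (s *Server) CatchUp(peer *Server) int {
	have := s.LatestSeq()
	copied := 0
	for _, w := range peer.Log {
		if w.Seq > have && s.apply(w) == nil {
			copied++
		}
	}
	return copied
}

// LatestSeq reports the highest sequence number in the log, or 0 if empty.
func (s *Server) LatestSeq() uint64 {
	if len(s.Log) == 0 {
		return 0
	}
	return s.Log[len(s.Log)-1].Seq
}

// private functions

// apply appends an already-sequenced write to this server's log.
func (s *Server) apply(w Write) error {
	if s.Down {
		return errDown
	}
	s.Log = append(s.Log, w)
	return nil
}

func main() {
	b := &Server{Name: "B"}
	c := &Server{Name: "C", Down: true}

	// C is down, so a non-quorum write succeeds on B alone.
	seq, err := b.Accept("cpu=0.5", []*Server{c}, false)
	fmt.Println("seq:", seq, "err:", err)

	// Once C is back, it pulls what it missed from B.
	c.Down = false
	fmt.Println("copied:", c.CatchUp(b))
}
```

With a third owner D, the same `CatchUp` call against D would cover the case in the notes where C is down and B asks D for C's writes instead.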