Commit Graph

71 Commits (3034d3fb5481ed58fa0306693be46a07420a66bd)

Author SHA1 Message Date
Stuart Carnie f3d45ba301 influxdata/influxdb/influxql -> influxdata/influxql 2017-10-30 14:40:26 -07:00
Stuart Carnie e9313876ab EXPLAIN ANALYZE
* Introduces EXPLAIN ANALYZE command, which
  produces a detailed tree of operations used to
  execute the query.

introduce context.Context to APIs

metrics package

* create groups of named measurements
* safe for concurrent access

tracing package

EXPLAIN ANALYZE implementation for OSS

Serialize EXPLAIN ANALYZE traces from remote nodes

use context.Background for tests

group with other stdlib packages

additional documentation and remove unused API

use influxdb/pkg/testing/assert

remove testify reference
2017-10-20 08:01:37 -07:00
Edd Robinson 9be7c5aaa6 Run relevant engine tests on both indexes 2017-08-23 10:47:01 +01:00
Jonathan A. Sternberg 9a2357c2c0 Separate the query engine into a separate package
This change provides a clear separation between the query engine
mechanics and the query language so that the language can be parsed and
dealt with separate from the query engine itself.
2017-08-16 13:38:43 -05:00
David Norton 1d8d739418 fix #8677: check for snapshot size == 0 2017-08-16 09:43:56 -04:00
Jason Wilder 778000435a Conver all keys from string to []byte in TSM engine
This switches all the interfaces that take string series key to
take a []byte.  This eliminates many small allocations where we
convert between to two repeatedly.  Eventually, this change should
propogate futher up the stack.
2017-07-28 11:00:50 -06:00
Stuart Carnie 46796d932f add database to index, engine and shard; call AuthorizeSeriesRead 2017-05-26 13:21:50 -07:00
Jason Wilder 2cac46ebbc Convert usage of strings to []byte
Measurement name and field were converted between []byte and string
repetively causing lots of garbage.  This switches the code to use
[]byte in the write path.
2017-05-12 14:05:19 -06:00
Joe LeGasse 087d9f4670 tsm: fixed test to not require sorted backup tarball 2017-05-11 12:00:19 -04:00
Jason Wilder f87fd7c7ed Stop background compaction goroutines when shard is cold
Each shard has a number of goroutines for compacting different levels
of TSM files.  When a shard goes cold and is fully compacted, these
goroutines are still running.

This change will stop background shard goroutines when the shard goes
cold and start them back up if new writes arrive.
2017-05-03 16:31:57 -06:00
Jason Wilder 3d1c0cd981 Don't return compaction plans for files already part of a plan
The compactor prevents the same file from being compacted by different
compaction runs, but it can result in warning errors in the logs that
are confusing.

This adds compaction plan tracking to the planner so that files are
only part of one plan at a given time.
2017-05-03 16:31:57 -06:00
Ben Johnson afe41f1c80
Fix tsm1/tsi1 broken tests. 2017-03-21 12:21:48 -06:00
Ben Johnson 358b1e0b05
Merge remote-tracking branch 'upstream/master' into tsi 2017-03-15 10:13:32 -06:00
Jason Wilder a024003f2c Merge branch '1.2' into jw-merge-12 2017-02-22 12:13:29 -07:00
Ben Johnson 78a9bb2527 Remove Tags.shouldCopy, replace with forceCopy on series creation.
Previously, tags had a `shouldCopy` flag to indicate if those tags
referenced an underlying buffer and should be copied to allow GC.
Unfortunately, this prevented tags from being copied that were
created and referenced the mmap which caused segfaults.

This change removes the `shouldCopy` flag and replaces it with a
`forceCopy` argument in `CreateSeriesIfNotExists()`. This allows
the write path to indicate that tags must be cloned on insert.
2017-02-21 11:13:35 -07:00
Ben Johnson d91e6eabac
Add max-values-per-tag to inmem index. 2017-02-06 11:14:13 -07:00
Ben Johnson 047c21f4d9
Merge remote-tracking branch 'upstream/master' into tsi 2017-01-24 09:28:58 -07:00
Edd Robinson feb7a2842c Use unbuffered error channels in tests 2017-01-17 10:53:15 -08:00
Edd Robinson 292b30b82b Fix subtle bugs and remove dead code from tsdb 2017-01-17 09:47:34 -08:00
Jonathan A. Sternberg d7c8c7ca4f Support subquery execution in the query language
This adds query syntax support for subqueries and adds support to the
query engine to execute queries on subqueries.

Subqueries act as a source for another query. It is the equivalent of
writing the results of a query to a temporary database, executing
a query on that temporary database, and then deleting the database
(except this is all performed in-memory).

The syntax is like this:

    SELECT sum(derivative) FROM (SELECT derivative(mean(value)) FROM cpu GROUP BY *)

This will execute derivative and then sum the result of those derivatives.
Another example:

    SELECT max(min) FROM (SELECT min(value) FROM cpu GROUP BY host)

This would let you find the maximum minimum value of each host.

There is complete freedom to mix subqueries with auxiliary fields. The only
caveat is that the following two queries:

    SELECT mean(value) FROM cpu
    SELECT mean(value) FROM (SELECT value FROM cpu)

Have different performance characteristics. The first will calculate
`mean(value)` at the shard level and will be faster, especially when it comes to
clustered setups. The second will process the mean at the top level and will not
include that optimization.
2017-01-07 13:00:48 -06:00
Ben Johnson 2b3cd415e2
Fixing rebase. 2017-01-06 09:52:16 -07:00
Edd Robinson 0f9b2bfe6a
Fix tests 2017-01-05 10:16:15 -07:00
Ben Johnson cb93f10120
Remove per-shard in-memory index. 2017-01-05 10:11:09 -07:00
Ben Johnson 409b0165f5
shared in-memory index 2017-01-05 10:09:57 -07:00
Ben Johnson a812502ea3
reintegrating in-memory index 2017-01-05 10:07:35 -07:00
Ben Johnson 2a81351992
Implement tsdb.Index interface on tsi1.Index. 2017-01-05 10:00:43 -07:00
Edd Robinson 149b1cef1d
Fix 32bit overflow; limit capacity 2017-01-05 09:59:10 -07:00
Edd Robinson d19fbf5ab4
Wire in HLL estimator 2017-01-05 09:54:03 -07:00
Edd Robinson 2b8efefef4
Initial index interface 2017-01-05 09:51:43 -07:00
Edd Robinson 05bc4dec00
Refactor 2017-01-05 09:50:23 -07:00
Edd Robinson c535e3899a
Remove in-memory index from Shard and Store 2017-01-05 09:47:09 -07:00
Edd Robinson 66edb32182 Sharded Cache using a hash ring 2016-12-14 18:23:36 +00:00
Edd Robinson d3e6d4e7ca Add benchmarks 2016-12-14 18:21:50 +00:00
Jason Wilder e8a28cfbab Expose Shard.LastModified
This returns the LastModified time of the shard.  The LastModified
time is the wall time when a change to the shards state occurred.
It uses the WAL or FileStore to determine the max mod time.
2016-11-23 10:04:07 -07:00
Jonathan A. Sternberg a515aeda39 Optimize first/last when no group by interval is present
The `first()` and `last()` functions response rate would increase linear
to the number of points even though it seems like it shouldn't. This
optimization greatly reduces the amount of time to return a response
when no `GROUP BY time(...)` clause is present in a query.
2016-10-25 09:57:31 -05:00
Jonathan A. Sternberg 3681bc8a43 Filter out series within shards that do not have data for that series
Previously, we would return a full tag set for every shard and the tag
set would include all series that existed in the database index
including series that didn't physically exist within that shard. This
led to the tag sets returned being incredibly huge when we had high
cardinality but sparse data. Since the data was sparse, it was
unexpected that it would cause such a large strain on the system by most
people.

Now we filter out the series ids that are not assigned to the current
shard when computing a tag set for that shard. This lowers the memory
usage for high cardinality sparse data drastically and allows queries on
those to complete successfully.

This does not resolve issues for high cardinality data in every shard
that is also spread out over a long series of time. That situation isn't
nearly as common as the above situation though.
2016-10-20 14:15:34 -05:00
Jason Wilder 7f96d78b79 Make encoder re-usable
This allows encoders to be re-used and maintained in a pool to
avoid allocating new ones on every compactions and write of an encoded
block.  The pool used is not a sync.Pool to ensure that the encoders
will not be garbage collected.
2016-09-26 12:19:15 -06:00
Jason Wilder 190537a557 Fix DeleteSeries when multiple fields exists
The logic for determining whether a series key was already in the
the set of TSM series was too restrictive.  It allowed only the first
field of a series to be added leaving all the remaing fields.
2016-08-31 20:35:35 -06:00
Ben Johnson 8aa224b22d
reduce memory allocations in index
This commit changes the index to point to index data in the shards
instead of keeping it in-memory on the heap.
2016-08-16 14:09:00 -06:00
Jason Wilder 0264966f5c Add index optimize planning step
For larger datasets, it's possible for shards to get into a state where
many large, dense TSM files exist.  While the shard is still hot for
writes, full compactions will skip these files since they are already
fairly optimized and full compactions are expensive.  If the write volume
is large enough, the shard can accumulate lots of these files.  When
a file is in this state, it's index can contain every series which
causes startup times to increase since each file must parse the full
set of series keys for every file.  If the number of series is high,
the index can be quite large causing large amount of disk IO at startup.

To fix this, a optmize compaction is run when a full compaction planning
step decides there is nothing to do.  The optimize compaction combines
and spreads the data and series keys across all files resulting in each
file containing the full series data for that shard and a subset of the
total set of keys in the shard.

This allows a shard to only store a series key once in the shard reducing
storage size as well allows a shard to only load each key once at startup.
2016-07-14 11:32:36 -06:00
rw 92e7fec5cf Benchmarks to count allocs in WritePoints. 2016-05-26 17:13:14 -07:00
Jonathan A. Sternberg 23f6a706bb Support cast syntax for selecting a specific type
Casting syntax is done with the PostgreSQL syntax `field1::float` to
specify which type should be used when selecting a field. You can also
do `field1::field` or `tag1::tag` to specify that a field or tag should
be selected.

This makes it possible to select a tag when a field key and a tag key
conflict with each other in a measurement. It also means it's possible
to choose a field with a specific type if multiple shards disagree. If
no types are given, the same ordering for how a type is chosen is used
to determine which type to return.

The FieldDimensions method has been updated to return the data type for
the fields that get returned. The SeriesKeys function has also been
removed since it is no longer needed. SeriesKeys was originally used for
the fill iterator, but then expanded to be used by auxiliary iterators
for determining the channel iterator types. The fill iterator doesn't
need it anymore and the auxiliary types are better served by
FieldDimensions implementing that functionality, so SeriesKeys is no
longer needed.

Fixes #6519.
2016-05-16 12:08:29 -04:00
Jason Wilder e0304ae3d5 Fix shards not getting assigned to series on restart
Also, simplifies the LoadMetaDataIndex func to not require a *Shard
2016-05-02 11:36:05 -06:00
Jonathan A. Sternberg 7ec2a991d5 Modify all of the iterators to allow returning an error on Next()
This also switches the remaining iterators to be lazy so they can return
errors properly. They needed to be converted to lazy initialization
anyway, which has the side effect of making it much easier for us to
propagate the underlying error during initialization.

Updated the Emitter to return errors when it cannot read properly from
the iterators.
2016-04-18 11:17:55 -04:00
Jonathan A. Sternberg bd5fdd797d Propagate the limit option to the low level iterators
When a GROUP BY or multiple sources are used, the top level limit
iterator requires reading the entire iterator stream so it can find all
of the tag groups it needs to return. For large data series, this ends
up with the limit iterator discarding a lot of output.

This change adds a new lower level limit iterator on each series itself
so that there are fewer data points that have to be thrown away by the
top level iterator.

Fixes #5553.
2016-04-15 18:23:54 -04:00
Ben Johnson 525e22c92b
tsm1 query engine alloc reduction
This commit makes a number of performance improvements to
reduce allocations during query execution. Several objects
and buffers are now reused across the components to avoid
allocations.

Previously a simple `count(value)` query across 1M points
would require 26,000+ allocations. After the changes in
this commit that number has been reduced to 88.
2016-04-11 14:50:59 -06:00
Jason Wilder 3f4c5a5585 Fix race on measurementFields
Both Shard and Engine had the same reference to the measurementField map,
but they each protected it with their own locks.  This causes a race when
write and queries are occurring because writes can add new fields to the
map while queries are reading from it.

The fix moves the ownership to the Engine and provides protected accessors
to that Shard now users.  For the most parts, the access on shard were old
dead code.

Fixing the measurementFields map race created a new race on the internal
fields map.  This is now unexported and protected via MeasurementFields
exported funcs.

Fixes #6188
2016-04-01 18:57:01 -06:00
Mark Rushakoff fb83374389 Track stats for number of series, measurements
Per database: track number of series and measurements
Per measurement: track number of series
2016-02-24 08:10:16 -08:00
Ben Johnson 5a0d1ab7c1 rename influxdb/influxdb to influxdata/influxdb
This commit changes all the import and URL references from:

    github.com/influxdb/influxdb

to:

    github.com/influxdata/influxdb
2016-02-10 10:26:18 -07:00
Ben Johnson b4cb770a7f refactor aux iterators 2016-02-10 09:40:27 -07:00