Now that each shard-local index is maintaining a bitset of series ids,
tracking the series present in the local shard's tsm engine, there is no
need to track shards in the `inmem` index.
This commit removes the methods associated with tracking those
series/shard relationships.
This commit adds a bitset into each shard's in-memory index, to be used to
track undeleted series ids. Currently tsi1 support is not implemented.
When new series are added to the shard, the series id is added
to the bitset. When series are deleted from the shard, the series
ids are removed from the bitset.
Becasue each shard shares the same inmem index reference, the bitset
is stored in the `ShardIndex`, which is local to each shard, and then
different references are passed into the shared `Index` object, depending
on which shard is writing the series.
Since possibly v0.9 DELETE SERIES has had the unwanted side effect of
removing series from the index when the last traces of series data are
removed from TSM. This occurred because the inmem index was rebuilt on
startup, and if there was no TSM data for a series then there could be
not series to add to the index.
This commit returns to the original (documented) DROP/DETETE SERIES
behaviour. As such, when issuing DROP SERIES all instances of matching
series will be removed from both the TSM engine and the index. When
issuing DELETE SERIES only TSM data will be removed.
It is up to the operator to remove series from the index.
NB, this commit does not address how to remove series data from the
series file when a shard rolls over.
This adds the capability to the engine to force a full compaction
to be scheduled. When called, it snapshots any data in the cache,
aborts running compactions and prevents level plans from returning
level plans.
The previous sha was taken from a revision on a devel branch that I
thought would continue staying in the tree after it was merged. That
revision was rebased away and the API was changed for the logger.
This updates the usage of the logger and adds a simple package for
constructing the base logger.
The 1.0 version of zap changed the format of the default console logger
so this change moves over to this new logger instead of attempting to
retain backwards compatibility with the old format.
When a meta query does not include a time component then it can be
answered exclusively by the index. This should result in a much faster
query execution that if the TSM engine was engaged.
This commit rewrites the following queries such that they make use
of the index where no time component is present:
- SHOW MEASUREMENTS
- SHOW SERIES
- SHOW TAG KEYS
- SHOW FIELD KEYS
* batch cursors return slices of timestamps and values to reduce call
overhead. Significantly improved iteration.
* added CreateCursor API to Shard, Engine
* moved build*Cursor to code gen
* Introduces EXPLAIN ANALYZE command, which
produces a detailed tree of operations used to
execute the query.
introduce context.Context to APIs
metrics package
* create groups of named measurements
* safe for concurrent access
tracing package
EXPLAIN ANALYZE implementation for OSS
Serialize EXPLAIN ANALYZE traces from remote nodes
use context.Background for tests
group with other stdlib packages
additional documentation and remove unused API
use influxdb/pkg/testing/assert
remove testify reference
This instructs the kernel that it can release memory used by mmap'd
TSM files when they are not actively being used. It the mappings are
use, the kernel will fault the pages back in. On linux, this causes
RES memory to drop immediately when run.
It prints the statistics of each iterator that will access the storage
engine. For each access of the storage engine, it will print the number
of shards that will potentially be accessed, the number of files that
may be accessed, the number of series that will be created, the number
of blocks, and the size of those blocks.
This change provides a clear separation between the query engine
mechanics and the query language so that the language can be parsed and
dealt with separate from the query engine itself.
Previously pseudo iterators could be created for meta data such
as series, measurement, and tag data. These iterators were created
at a higher level and lacked a lot of the power of the query engine.
This commit moves system iterators down to the series level and
supports the following:
- _name
- _seriesKey
- _tagKey
- _tagValue
- _fieldKey
These can be used as normal fields such as:
SELECT _seriesKey FROM cpu
This will return all the series keys for `cpu`.
* introduced UnsignedValue type
* leveraged existing int64 compression algorithms (RLE, Simple 8B)
* tsm and WAL can read and write UnsignedValue
* compaction is aware of UnsignedValue
* unsigned support to model, cursors and write points
NOTE: there is no support to create unsigned points, as the line
protocol has not been modified.
Measurement name and field were converted between []byte and string
repetively causing lots of garbage. This switches the code to use
[]byte in the write path.
The monitor goroutine ran for each shard and updated disk stats
as well as logged cardinality warnings. This goroutine has been
removed by making the disks stats more lightweight and callable
direclty from Statisics and move the logging to the tsdb.Store. The
latter allows one goroutine to handle all shards.
Each shard has a number of goroutines for compacting different levels
of TSM files. When a shard goes cold and is fully compacted, these
goroutines are still running.
This change will stop background shard goroutines when the shard goes
cold and start them back up if new writes arrive.
The Point is intended to be immutable after being parsed since it
is shared by several goroutines. When dropping a field (e.g. time),
corrupted data can result if one goroutine is delete the field
while another is marshaling the underlying byte slices.
To avoid this, the shard will just skip invalid fields and series
instead of trying to mutate them by deleting them.
If a bad query is run, kill query and limits would not kick in until
after it started executing. Some bad queries that involve high
cardinality can cause the server to OOM just from planning which
defeats the purpose of the max-select-series limit.
This change primarily fixes max-select-series limit so that the query
is killed earlier and has the side effect that kill query now can kill
a query while it's being planned.
Calling DiskSize can be expensive with many shards. Since the stats
collection runs this every 10s by default, it can be expensive and
wasteful to calculate the stats when nothing has changed. This avoids
re-calculating the shard size unless something has chagned.
A measurement name that begins with an underscore and does not conflict
with one of the reserved measurement names will now be passed untouched
to the underlying shards rather than being intercepted as an empty
measurement.
A user still shouldn't rely on measurements that begin with underscores
to always be accessible, but this will prevent the most common use case
from causing unexpected behavior since we will very rarely, if ever, add
additional system sources.
Previously, tags had a `shouldCopy` flag to indicate if those tags
referenced an underlying buffer and should be copied to allow GC.
Unfortunately, this prevented tags from being copied that were
created and referenced the mmap which caused segfaults.
This change removes the `shouldCopy` flag and replaces it with a
`forceCopy` argument in `CreateSeriesIfNotExists()`. This allows
the write path to indicate that tags must be cloned on insert.
They rebased a revision we were previously relying upon that allowed us
to use the vanity name so we are reverting back to an older version with
the old import path.
This adds query syntax support for subqueries and adds support to the
query engine to execute queries on subqueries.
Subqueries act as a source for another query. It is the equivalent of
writing the results of a query to a temporary database, executing
a query on that temporary database, and then deleting the database
(except this is all performed in-memory).
The syntax is like this:
SELECT sum(derivative) FROM (SELECT derivative(mean(value)) FROM cpu GROUP BY *)
This will execute derivative and then sum the result of those derivatives.
Another example:
SELECT max(min) FROM (SELECT min(value) FROM cpu GROUP BY host)
This would let you find the maximum minimum value of each host.
There is complete freedom to mix subqueries with auxiliary fields. The only
caveat is that the following two queries:
SELECT mean(value) FROM cpu
SELECT mean(value) FROM (SELECT value FROM cpu)
Have different performance characteristics. The first will calculate
`mean(value)` at the shard level and will be faster, especially when it comes to
clustered setups. The second will process the mean at the top level and will not
include that optimization.