Each shard has a number of goroutines for compacting different levels
of TSM files. When a shard goes cold and is fully compacted, these
goroutines are still running.
This change will stop background shard goroutines when the shard goes
cold and start them back up if new writes arrive.
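A minimal sketch of that lifecycle, with illustrative names (the real engine manages several compaction levels and its own locking):

    package tsm1

    import (
        "sync"
        "time"
    )

    type engine struct {
        mu   sync.Mutex
        done chan struct{} // non-nil while compaction goroutines are running
        wg   sync.WaitGroup
    }

    // enableCompactions starts the background goroutines, e.g. when a new
    // write arrives on a cold shard.
    func (e *engine) enableCompactions() {
        e.mu.Lock()
        defer e.mu.Unlock()
        if e.done != nil {
            return // already running
        }
        e.done = make(chan struct{})
        e.wg.Add(1)
        go e.compactLoop(e.done)
    }

    // disableCompactions signals the goroutines to exit and waits for them,
    // e.g. once the shard is cold and fully compacted.
    func (e *engine) disableCompactions() {
        e.mu.Lock()
        done := e.done
        e.done = nil
        e.mu.Unlock()
        if done != nil {
            close(done)
            e.wg.Wait()
        }
    }

    func (e *engine) compactLoop(done chan struct{}) {
        defer e.wg.Done()
        t := time.NewTicker(time.Second)
        defer t.Stop()
        for {
            select {
            case <-done:
                return // shard went cold; stop compacting
            case <-t.C:
                // run one compaction pass for this level
            }
        }
    }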
The Point is intended to be immutable after being parsed since it
is shared by several goroutines. When dropping a field (e.g. `time`),
corrupted data can result if one goroutine is deleting the field
while another is marshaling the underlying byte slices.
To avoid this, the shard will just skip invalid fields and series
instead of trying to mutate them by deleting them.
If a bad query is run, the KILL QUERY command and the query limits would not
kick in until after it started executing. Some bad queries that involve high
cardinality can cause the server to OOM just from planning, which
defeats the purpose of the max-select-series limit.
This change primarily fixes the max-select-series limit so that the query
is killed earlier, with the side effect that KILL QUERY can now kill
a query while it's being planned.
Calling DiskSize can be expensive with many shards. Since the stats
collection runs this every 10s by default, it is wasteful to calculate
the stats when nothing has changed. This avoids re-calculating the shard
size unless something has changed.
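A sketch of the caching idea with hypothetical names; LastModified and walkDataFiles stand in for however the shard tracks changes and sums file sizes:

    package tsdb

    import (
        "sync"
        "time"
    )

    type shard struct {
        mu           sync.Mutex
        sizeCache    int64
        lastSizeCalc time.Time
    }

    // DiskSize returns a cached size unless the shard changed since the last call.
    func (s *shard) DiskSize() int64 {
        s.mu.Lock()
        defer s.mu.Unlock()

        if !s.LastModified().After(s.lastSizeCalc) {
            return s.sizeCache // nothing changed; skip walking the files
        }
        s.sizeCache = s.walkDataFiles() // expensive: stats every TSM/WAL file
        s.lastSizeCalc = time.Now()
        return s.sizeCache
    }

    // Placeholders for illustration only.
    func (s *shard) LastModified() time.Time { return time.Time{} } // max WAL/FileStore mod time
    func (s *shard) walkDataFiles() int64    { return 0 }           // sum of on-disk file sizes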
A measurement name that begins with an underscore and does not conflict
with one of the reserved measurement names will now be passed untouched
to the underlying shards rather than being intercepted as an empty
measurement.
A user still shouldn't rely on measurements that begin with underscores
to always be accessible, but this will prevent the most common use case
from causing unexpected behavior since we will very rarely, if ever, add
additional system sources.
Previously, tags had a `shouldCopy` flag to indicate if those tags
referenced an underlying buffer and should be copied to allow GC.
Unfortunately, this prevented tags that were created referencing the
mmap'd data from being copied, which caused segfaults.
This change removes the `shouldCopy` flag and replaces it with a
`forceCopy` argument in `CreateSeriesIfNotExists()`. This allows
the write path to indicate that tags must be cloned on insert.
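A hedged sketch of the new behavior with simplified types (not the actual index code): when forceCopy is set, the tags are cloned before the index retains them, so they can never point into a caller's reusable write buffer.

    package tsdb

    type Tag struct{ Key, Value []byte }
    type Tags []Tag

    // clone copies every key and value into freshly allocated memory.
    func (a Tags) clone() Tags {
        out := make(Tags, len(a))
        for i, t := range a {
            out[i] = Tag{
                Key:   append([]byte(nil), t.Key...),
                Value: append([]byte(nil), t.Value...),
            }
        }
        return out
    }

    type index struct{ series map[string]Tags }

    // CreateSeriesIfNotExists stores the series. forceCopy tells the index that
    // the caller's tag slices reference a write buffer and must be cloned on insert.
    func (idx *index) CreateSeriesIfNotExists(key string, tags Tags, forceCopy bool) {
        if _, ok := idx.series[key]; ok {
            return
        }
        if forceCopy {
            tags = tags.clone()
        }
        idx.series[key] = tags
    }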
They rebased a revision we were previously relying upon that allowed us
to use the vanity name, so we are reverting to an older version with
the old import path.
This adds query syntax support for subqueries and adds support to the
query engine to execute queries on subqueries.
Subqueries act as a source for another query. It is the equivalent of
writing the results of a query to a temporary database, executing
a query on that temporary database, and then deleting the database
(except this is all performed in-memory).
The syntax is like this:
SELECT sum(derivative) FROM (SELECT derivative(mean(value)) FROM cpu GROUP BY *)
This will execute derivative and then sum the result of those derivatives.
Another example:
SELECT max(min) FROM (SELECT min(value) FROM cpu GROUP BY host)
This would let you find the maximum minimum value of each host.
There is complete freedom to mix subqueries with auxiliary fields. The only
caveat is that the following two queries have different performance
characteristics:
SELECT mean(value) FROM cpu
SELECT mean(value) FROM (SELECT value FROM cpu)
The first will calculate
`mean(value)` at the shard level and will be faster, especially when it comes to
clustered setups. The second will process the mean at the top level and will not
include that optimization.
It looks like the real import path to the project is go.uber.org/zap
instead of github.com/uber-go/zap since the example in the project
references that path.
The logging library has been switched to use uber-go/zap. While the
logging has been changed to use structured logging, this commit does not
change any of the logging statements to take advantage of the new
structured log or new log levels. Those changes will come in future
commits.
This returns the LastModified time of the shard. The LastModified
time is the wall time when a change to the shard's state occurred.
It uses the WAL or FileStore to determine the max mod time.
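A minimal sketch of that calculation (field names are illustrative):

    package tsm1

    import "time"

    type engine struct {
        walLastWrite time.Time // wall time of the last WAL write
        fileStoreMod time.Time // wall time of the last TSM file change
    }

    // LastModified returns the most recent of the WAL and FileStore mod times.
    func (e *engine) LastModified() time.Time {
        if e.walLastWrite.After(e.fileStoreMod) {
            return e.walLastWrite
        }
        return e.fileStoreMod
    }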
A new sorted slice was created by the monitor func every 10s. The
tag keys don't need to be sorted, so this avoids the allocation of that
slice and another allocation during sorting.
When a limit is exceeded, we return errors and sometimes log (if appropriate)
that a limit was exceeded. The messages don't always provide an indication
as to where or how they are configured.
Instead, return the config option (which is easily searchable) as well as the limit
currently set and the value that exceeded it when possible.
Previously, we would return a full tag set for every shard and the tag
set would include all series that existed in the database index
including series that didn't physically exist within that shard. This
led to the tag sets returned being incredibly huge when we had high
cardinality but sparse data. Since the data was sparse, most people
did not expect it to place such a large strain on the system.
Now we filter out the series ids that are not assigned to the current
shard when computing a tag set for that shard. This lowers the memory
usage for high cardinality sparse data drastically and allows queries on
those to complete successfully.
This does not resolve issues for high cardinality data in every shard
that is also spread out over a long span of time. That situation isn't
nearly as common as the above situation though.
This changes the behavior of the max-series-per-database and
max-values-per-tag limits to drop points that would exceed the limits
and allow the remaining points to be written. Previously, the whole
batch would fail and return a 500 error to the client.
This will now write the allowed points and return a `partial write`
error indicating that some of the points were dropped, how many were
dropped, and one of the problem measurements and tags.
The FieldIterator is used to scan over the fields of a point, providing
information about each field and delaying parsing/decoding of the value
until it is needed.
This change uses this new type to avoid the allocation of a map for the
fields which is then thrown away as soon as the points get converted
into columns within the datastore.
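A hedged sketch of the idea; the interface below is illustrative rather than the exact models API:

    package tsdb

    // FieldIterator walks the encoded fields of a point in place, decoding a
    // value only when the caller asks for it.
    type FieldIterator interface {
        Next() bool                   // advance to the next field; false when done
        FieldKey() []byte             // key of the current field, no copy made
        FloatValue() (float64, error) // decode the current value on demand
    }

    // appendColumns feeds decoded fields straight into column buffers, never
    // building a throwaway map[string]interface{} per point.
    func appendColumns(it FieldIterator, keys [][]byte, vals []float64) ([][]byte, []float64, error) {
        for it.Next() {
            v, err := it.FloatValue()
            if err != nil {
                return keys, vals, err
            }
            keys = append(keys, it.FieldKey())
            vals = append(vals, v)
        }
        return keys, vals, nil
    }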
When deleting a shard, the shard is locked and then removed from the
index. Removal from the index can be slow if there are a lot of
series. During this time, the shard is still expected to exist by
the meta store and tsdb store so stats collections, queries and writes
could all be run on this shard while it's locked. This can cause everything
to lock up until the unindexing completes and the shard can be unlocked.
Fixes #7226
This commit fixes the `MaxSelectSeriesN` limit which was broken by
the implementation of lazy iterators. The setting previously limited
the total number of series but the new implementation limits the
concurrent number of series being processed.
The `SHOW MEASUREMENTS` and `SHOW TAG VALUES` queries cannot go through
the query engine and still get the speed they need. They also only need access to
the database index and do not need access to specific shards. This
removes the query rewriting that was done to turn these two queries into
a select statement and reimplements them inside of the coordinator as an
interface on the TSDBStore.
Truncate the time interval output of the monitor service to be on even
time intervals rather than on every minute based on the start time. This
normalizes the output from the monitor service.
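For example, Go's time.Truncate produces even interval boundaries regardless of when the service started (a small illustration, not the monitor code itself):

    package main

    import (
        "fmt"
        "time"
    )

    func main() {
        interval := 10 * time.Second
        start := time.Date(2016, 12, 1, 10, 0, 7, 0, time.UTC) // service started at :07

        // Emitting at start.Add(interval) would land on :17, :27, ... forever
        // tied to the start time. Truncating aligns output to :10, :20, ...
        next := start.Add(interval).Truncate(interval)
        fmt.Println(next) // 2016-12-01 10:00:10 +0000 UTC
    }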
The tsdb package had a substantial amount of dead code related to the
old query engine still in there. It is no longer used, so it was removed
since it was left unmaintained. There is likely more code in the same
situation that wasn't found as part of this cleanup.
influxql has dead code that shows up because of code generation, so it
is not included in this pruning.
Updated `influx_inspect` to use the `FieldDimensions` method instead of `FieldCodec`
(more reliable anyway). The `influx_tsm` program used its own vendored
copy of `FieldCodec` so it is not affected by this change. `FieldCodec`
was only used for the `b1` and `bz1` engines which were removed in 0.12,
but the code that created the field codec was never removed. This
limited the maximum number of fields to 255 even though that restriction
was removed with the `tsm1` engine.
Fixes #6869.
The TSDBStore interface needs to also allow for a remote TSDBStore, but the
DatabaseIndex is only available for a local TSDB instance. Moved the optimized
SHOW TAG VALUES path to do a type assertion to the LocalTSDBStore struct
instead of always attempting to use the optimized version.
If the TSDBStore is not local and does not have the DatabaseIndex, it
will default to using the distributed query instead.
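A small sketch of that dispatch (interface names abbreviated; the fast path and fallback are placeholders):

    package coordinator

    type TSDBStore interface{}

    // LocalTSDBStore is the in-process store that exposes the DatabaseIndex.
    type LocalTSDBStore interface {
        TagValuesFromIndex() error // hypothetical fast-path accessor
    }

    func runDistributedQuery(TSDBStore) error { return nil } // placeholder fallback

    // showTagValues uses the optimized local path only when the store really is
    // local; a remote store falls back to the distributed query.
    func showTagValues(store TSDBStore) error {
        if local, ok := store.(LocalTSDBStore); ok {
            return local.TagValuesFromIndex()
        }
        return runDistributedQuery(store)
    }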
This commit optimizes `SHOW TAG VALUES` so that it avoids the
`SELECT` query engine execution and iterator creation. There
are also optimizations to reduce individual memory allocations
and to reduce in-memory heap size by only operating on one
measurement at a time.
Execution time has been reduced to approximately 900ms for
500,000 rows. This is about 2µs per row. Of this time,
approximately 1µs is spent retrieving and sorting the row
and 1µs is spent encoding into JSON and writing to the
response body.
For restoring a shard, we need to be able to have the shard open,
but disabled. It was racy to open it and then disable it separately
since writes/queries could occur in between.
If you use a statement like this:
SELECT value FROM one..cpu, two..cpu
It will access both the `one` and `two` databases as if you had selected
the `cpu` measurement twice for both of them. Updated the `tsdb.Shard`
create iterator function to filter out any sources that do not apply to
that shard so this duplication doesn't happen.
Fixes #6701.
The list of field keys in the index may have differed from the field
keys in the actual shard. Fixing `SHOW FIELD KEYS` so it relies only on
the shard rather than the index.
Fixes #6659.
Casting syntax is done with the PostgreSQL syntax `field1::float` to
specify which type should be used when selecting a field. You can also
do `field1::field` or `tag1::tag` to specify that a field or tag should
be selected.
This makes it possible to select a tag when a field key and a tag key
conflict with each other in a measurement. It also means it's possible
to choose a field with a specific type if multiple shards disagree. If
no types are given, the same ordering for how a type is chosen is used
to determine which type to return.
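For example, if a measurement has both a tag and a field named `host`, the tag can still be selected explicitly:
SELECT host::tag, value::float FROM cpu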
The FieldDimensions method has been updated to return the data type for
the fields that get returned. The SeriesKeys function has also been
removed since it is no longer needed. SeriesKeys was originally used for
the fill iterator, but then expanded to be used by auxiliary iterators
for determining the channel iterator types. The fill iterator doesn't
need it anymore and the auxiliary types are better served by
FieldDimensions implementing that functionality, so SeriesKeys is no
longer needed.
Fixes #6519.
Before #6038 was merged, we needed to filter "name" so that it didn't
accidentally hit the code path that used "name" to check the name of a
measurement. This was changed to "_name" to avoid a conflict with a
legitimate tag that used "name" as the key.
SHOW TAG VALUES was never modified to remove the code that filtered out
"name". This removes that line of code so a condition with "name"
doesn't get removed erroneously.
Example:
SHOW TAG VALUES WITH KEY = host WHERE "name" = 'jsternberg'
Fixes #6581.
This commit changes the `SeriesIterator` to process one measurement
at a time and uses a `floatFastDedupeIterator` to avoid point
encoding during deduplication.
This removes the dropMeta param from tsdb.Store.DeleteSeries and
lets the shard determine when to remove the meta data from the index
based on what series still have data in the shard.
This uncovered a nasty bug in compactions where a fully deleted series would
prematurely end the compactions and not carry forward the rest of the data
in the TSM file. This is now fixed as well.
When a shard is closed and removed due to retention policy enforcement,
the series contained in the shard would still exist in the index, causing
a memory leak. Restarting the server would cause them not to be loaded.
Fixes #6457
This has various benefits:
- Users embedding InfluxDB within other Go programs can specify a different logger / prefix easily.
- More consistent with code used elsewhere in InfluxDB (e.g. services, other `run.Server.*` fields, etc).
- This is also more efficient, because it means `executeQuery` no longer allocates a single `*log.Logger` each time it is called.
This also switches the remaining iterators to be lazy so they can return
errors properly. They needed to be converted to lazy initialization
anyway, which has the side effect of making it much easier for us to
propagate the underlying error during initialization.
Updated the Emitter to return errors when it cannot read properly from
the iterators.
Both Shard and Engine held a reference to the same measurementFields map,
but they each protected it with their own locks. This causes a race when
writes and queries are occurring because writes can add new fields to the
map while queries are reading from it.
The fix moves ownership to the Engine and provides protected accessors
that Shard now uses. For the most part, the accesses on Shard were old
dead code.
Fixing the measurementFields map race created a new race on the internal
fields map. This is now unexported and protected via MeasurementFields
exported funcs.
Fixes #6188
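A condensed sketch of the protected-accessor pattern described above (names are illustrative):

    package tsm1

    import "sync"

    // MeasurementFields keeps its fields map unexported and locked internally.
    type MeasurementFields struct{}

    // Engine owns the map; Shard goes through accessors instead of sharing it.
    type Engine struct {
        mu     sync.RWMutex
        fields map[string]*MeasurementFields
    }

    // MeasurementFields returns the field set for a measurement, creating it if
    // needed, so writers and queries never touch the map without a lock.
    func (e *Engine) MeasurementFields(name string) *MeasurementFields {
        e.mu.RLock()
        mf := e.fields[name]
        e.mu.RUnlock()
        if mf != nil {
            return mf
        }

        e.mu.Lock()
        defer e.mu.Unlock()
        if e.fields == nil {
            e.fields = make(map[string]*MeasurementFields)
        }
        if mf = e.fields[name]; mf == nil {
            mf = &MeasurementFields{}
            e.fields[name] = mf
        }
        return mf
    }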
This commit adds an `IteratorStats` that holds aggregate
iterator processing information. A method is also added to
`Iterator` to return the stats:
Stats() influxql.IteratorStats
The remote iterators will also emit their stats in the point
stream upon first connection, on a given interval, and then
finally once the last point has been sent.
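A sketch of how those stats can be merged as they arrive; the field names are assumptions about what the stats carry, not necessarily the exact influxql definition:

    package influxql

    // IteratorStats holds aggregate processing counters for an iterator.
    // The field names here are illustrative assumptions.
    type IteratorStats struct {
        SeriesN int // number of series processed
        PointN  int // number of points processed
    }

    // Add merges stats received from another iterator (e.g. a remote iterator
    // that periodically emits its stats in the point stream) into a total.
    func (s *IteratorStats) Add(other IteratorStats) {
        s.SeriesN += other.SeriesN
        s.PointN += other.PointN
    }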
This commit moves the `tsdb.Store.ExpandSources()` function onto
the `influxql.IteratorCreator` and provides support for issuing
source expansion across a cluster.