The Window function will now check, before it adjusts the offset, whether it is going to overflow or underflow. If it would do either, it sets the start or end time to MinTime or MaxTime.
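A minimal sketch of the guard in Go, with hypothetical names and illustrative constants (the real code lives in the query engine's window logic):

```go
package window

import "math"

// InfluxDB reserves the extreme int64 values; MinTime and MaxTime here
// are illustrative stand-ins just inside those extremes.
const (
	MinTime = math.MinInt64 + 1
	MaxTime = math.MaxInt64 - 1
)

// shift moves a window bound by offset, clamping to MinTime or MaxTime
// instead of wrapping around on overflow or underflow.
func shift(t, offset int64) int64 {
	if offset > 0 && t > MaxTime-offset {
		return MaxTime // adding the offset would overflow
	}
	if offset < 0 && t < MinTime-offset {
		return MinTime // adding the offset would underflow
	}
	return t + offset
}
```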
The following functions require ordered input but were not guaranteed to receive ordered input:
* `distinct()`
* `sample()`
* `holt_winters()`
* `holt_winters_with_fit()`
* `derivative()`
* `non_negative_derivative()`
* `difference()`
* `moving_average()`
* `elapsed()`
* `cumulative_sum()`
* `top()`
* `bottom()`
These function calls have now been modified to request that their input be ordered by the query engine (see the sketch below). This prevents the improper output that could result from multiple series or multiple shards being merged together incorrectly when no time grouping was specified.
Two additional functions were already correct to begin with (so there are no bugs with these two, but their names are included for completeness):
* `median()`
* `percentile()`
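A minimal sketch of the ordering request, using hypothetical types (the real options struct lives in the query engine):

```go
// IteratorOptions is a stand-in for the engine's real options struct.
type IteratorOptions struct {
	Ordered bool // request input sorted by time
	// ... other options elided ...
}

// optionsForCall marks the calls that require time-ordered input so the
// engine sorts merged series and shards instead of emitting them in
// arrival order.
func optionsForCall(name string, opt IteratorOptions) IteratorOptions {
	switch name {
	case "distinct", "sample", "holt_winters", "holt_winters_with_fit",
		"derivative", "non_negative_derivative", "difference",
		"moving_average", "elapsed", "cumulative_sum", "top", "bottom":
		opt.Ordered = true
	}
	return opt
}
```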
If a query contained multiple selectors combined with math, the query engine would mistakenly treat each selector as the only selector in the query and would not match their timestamps.
Fixed the query engine to pass down whether a function should be treated as a selector, so queries like `max(value) * 1, min(value) * 1` will match the timestamps of the results.
Additionally, support unary addition and subtraction for variables, calls, and parenthesized expressions. `-value` is now equivalent to `-1 * value`.
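For example, these two queries now produce the same result:

SELECT -value FROM cpu
SELECT -1 * value FROM cpu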
This code was added to address some slow startup issues. It is believed to be the cause of some segfault panics that occur at query time when the underlying MMAP array has been unmapped. The current structure of the code makes this change unnecessary now.
The timezone for a query can now be added to the end with something like `TZ("America/Los_Angeles")` and it will localize the results of the query to that timezone. The offset will automatically be set to the offset for that timezone, and offsets will automatically adjust for daylight saving time, so grouping by a day will result in a 25-hour day once a year and a 23-hour day on another day of the year.
The automatic adjustment of intervals for timezone offset changes will only happen if the GROUP BY interval is greater than the timezone offset. That means grouping by an hour or less will not be affected by daylight saving time, but a 2-hour or 1-day interval will be.
The default timezone is UTC and existing queries are unaffected by this
change.
When times are returned as strings (when `epoch=1` is not used), the results will be returned in RFC3339 format, localized to the requested timezone.
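The 23- and 25-hour days fall out of ordinary calendar arithmetic. A small standalone Go illustration (not InfluxDB code) shows the effect of bucketing by local calendar day across the US spring-forward date:

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	loc, err := time.LoadLocation("America/Los_Angeles")
	if err != nil {
		panic(err)
	}
	// 2017-03-12 is the spring-forward date in the US.
	start := time.Date(2017, time.March, 11, 0, 0, 0, 0, loc)
	for i := 0; i < 3; i++ {
		next := start.AddDate(0, 0, 1) // advance one calendar day
		fmt.Printf("%s -> %v\n", start.Format("2006-01-02"), next.Sub(start))
		start = next
	}
	// Output:
	// 2017-03-11 -> 24h0m0s
	// 2017-03-12 -> 23h0m0s
	// 2017-03-13 -> 24h0m0s
}
```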
There is a lot of confusion in the code about whether a range is [start, end) or [start, end]. This is not made easier by the fact that it acts one way in some areas and another way in others, but it is usually [start, end]. The `time = ?` syntax assumed the range was [start, end) and added an extra nanosecond to the end time to accommodate that, but the range was actually [start, end], which caused it to include one extra nanosecond when it shouldn't have.
This change fixes it so exactly one timestamp is selected when `time = ?`
is used.
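A minimal sketch of the corrected rewrite, with a hypothetical helper name: because the engine's range is inclusive on both ends, equality maps to identical bounds with no extra nanosecond.

```go
// timeRangeForEquals returns the window for a `time = t` condition.
// The engine treats ranges as [start, end] (inclusive on both ends),
// so equality uses identical bounds rather than end = t + 1ns.
func timeRangeForEquals(t int64) (start, end int64) {
	return t, t
}
```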
This commit adds a caching mechanism to the Data object, such that
when large numbers of users exist in the system, the cost of determining
if there is at least one admin user will be low.
To ensure that previously marshalled Data objects contain the correct
cached admin user value, we exhaustively determine if there is an admin
user present whenever we unmarshal a Data object.
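A sketch of the mechanism, with hypothetical field and method names:

```go
// UserInfo and Data are simplified stand-ins for the meta types.
type UserInfo struct {
	Name  string
	Admin bool
}

type Data struct {
	Users           []UserInfo
	adminUserExists bool // cached; rebuilt on mutation and on unmarshal
}

// hasAdminUser scans every user; used to (re)build the cache.
func (d *Data) hasAdminUser() bool {
	for _, u := range d.Users {
		if u.Admin {
			return true
		}
	}
	return false
}

// postUnmarshal rebuilds the cache exhaustively, since previously
// marshalled Data objects predate the cached value.
func (d *Data) postUnmarshal() {
	d.adminUserExists = d.hasAdminUser()
}
```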
The liner dependency now handles the scenario where the terminal width is reported as zero. Previously, liner would panic when it tried to divide by the width (which was zero). Now it falls back to a dumb prompt rather than attempting to use a smart prompt and panicking.
When there were multiple series and anything other than the last series had null values, the series would start using the first point from the next series to interpolate points.
Interpolation should not cross series boundaries. Now, the linear fill checks that the next point is within the same series before using it for interpolation.
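A sketch of the boundary check, with hypothetical types:

```go
// point is a simplified stand-in for the engine's point type.
type point struct {
	Name  string // measurement name
	Tags  string // canonical tag set, e.g. "host=serverA"
	Time  int64
	Value float64
}

// sameSeries reports whether interpolating between a and b is allowed;
// linear fill must not cross series boundaries.
func sameSeries(a, b point) bool {
	return a.Name == b.Name && a.Tags == b.Tags
}

// linearFill interpolates the value at time t between prev and next.
// Callers check sameSeries(prev, next) first.
func linearFill(prev, next point, t int64) float64 {
	frac := float64(t-prev.Time) / float64(next.Time-prev.Time)
	return prev.Value + frac*(next.Value-prev.Value)
}
```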
When rewriting fields, wildcards within binary expressions were skipped. An error is now raised whenever a wildcard is found within a binary expression in order to prevent the panic that would otherwise occur.
Instead of incrementing the `queryOk` statistic whether or not the continuous query ran, it is now only incremented when the query is actually executed.
Fsyncs to the WAL can cause higher IO with lots of small writes or slower disks. This reworks the previous WAL fsyncing to remove the extra goroutine and the hard-coded 100ms delay. Writes to the WAL still maintain the invariant that they do not return to the caller until the write is fsync'd.
This also adds a new config option, `wal-fsync-delay` (default `0s`), which can be increased if a delay is desired. This is somewhat useful for systems with slower disks, but the current default works well as is.
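A much-simplified sketch of the invariant and the optional batching, with hypothetical names (the real code also handles sync errors and shutdown):

```go
package wal

import (
	"os"
	"sync"
	"time"
)

type wal struct {
	mu      sync.Mutex
	f       *os.File
	delay   time.Duration // wal-fsync-delay; 0s = fsync on every write
	pending chan struct{} // non-nil while a delayed fsync is scheduled
}

// WriteAndSync never returns before the write is fsync'd. With a
// non-zero delay, writers arriving in the same window share one fsync.
func (w *wal) WriteAndSync(b []byte) error {
	w.mu.Lock()
	if _, err := w.f.Write(b); err != nil {
		w.mu.Unlock()
		return err
	}
	if w.delay == 0 {
		defer w.mu.Unlock()
		return w.f.Sync() // fsync before returning to the caller
	}
	if w.pending == nil {
		// First writer in this window schedules one fsync for the group.
		done := make(chan struct{})
		w.pending = done
		time.AfterFunc(w.delay, func() {
			w.mu.Lock()
			w.pending = nil
			w.f.Sync() // sync error handling elided in this sketch
			w.mu.Unlock()
			close(done)
		})
	}
	done := w.pending
	w.mu.Unlock()
	<-done // block until our write is durable
	return nil
}
```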
A single-line comment starts with `--` (just like SQL) and reads until the end of the line. A multiline comment is enclosed in `/* */`. Multiline comments cannot be nested.
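For example:

SELECT mean(value) FROM cpu WHERE time > now() - 1h -- trailing hour only
SELECT /* inline note */ mean(value) FROM cpu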
`max-row-limit` has been set to 10000 since 1.0, but due to a bug it was effectively 0 (disabled). 1.2 fixed this bug via #7368, but this caused a breaking change with Grafana and any users upgrading from <1.2 who had not disabled the config manually.
If blocks containing overlapping ranges of time were partially recombined, it was possible for some points to get dropped during compactions. This occurred because the window of time of the points we need to merge did not account for the partial blocks created from a prior merge.
Fixes #8084
Under high query load, a race exists between the cache and the WAL. Since writes currently hit the cache first, they are available for query before they hit the WAL. If the WAL is writing to and accessing the Value slice at the same time that a query needing to dedup the same slice is run, a race occurs.
To fix this, the cache now copies the values instead of storing the slice passed in. Another way to fix this might be to have writes go to the WAL before the cache. I think the latter would be better, but it introduces some larger write path issues that we'd need to also address; e.g., if the cache were full, writes to the WAL would need to be rejected to avoid filling the disk.
Copying the slice in the cache is simpler for now and does not appear to
dramatically affect performance.
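A sketch of the copy, with hypothetical types:

```go
package cache

import "sync"

// Value is a simplified stand-in for the cache's value type.
type Value struct {
	UnixNano int64
	Value    float64
}

type cache struct {
	mu    sync.Mutex
	store map[string][]Value
}

// WriteMulti copies the incoming slice so the cache never aliases the
// caller's memory, which the WAL may still be reading.
func (c *cache) WriteMulti(key string, values []Value) {
	buf := make([]Value, len(values))
	copy(buf, values) // defensive copy removes the cache/WAL race
	c.mu.Lock()
	c.store[key] = append(c.store[key], buf...)
	c.mu.Unlock()
}
```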
A measurement name that begins with an underscore and does not conflict
with one of the reserved measurement names will now be passed untouched
to the underlying shards rather than being intercepted as an empty
measurement.
A user still shouldn't rely on measurements that begin with underscores
to always be accessible, but this will prevent the most common use case
from causing unexpected behavior since we will very rarely, if ever, add
additional system sources.
Previously, a tag could only be selected within a subquery if the subquery used a selector, but aggregates like `mean()` were not filtered out, so the query would panic instead.
Now it is possible to select the tag directly instead of rewriting the
query in an invalid way.
Some queries in this form will still not work though. For example, the
following still does not function (but also doesn't panic):
SELECT count(host) FROM (SELECT mean, host FROM (SELECT mean(value) FROM cpu GROUP BY host))
The builder used for subqueries does not handle parentheses, and a set of parentheses wrapping a field would cause it to panic. This code now reduces the expression so the parentheses are removed before it is processed.
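A sketch of the reduction, assuming influxql's `ParenExpr` AST node (the helper name is hypothetical):

```go
// reduceParens strips any number of wrapping parentheses from an
// expression before the field is processed.
func reduceParens(expr influxql.Expr) influxql.Expr {
	for {
		p, ok := expr.(*influxql.ParenExpr)
		if !ok {
			return expr
		}
		expr = p.Expr
	}
}
```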
Previously, tags had a `shouldCopy` flag to indicate whether those tags referenced an underlying buffer and should be copied to allow GC. Unfortunately, this prevented tags that were created and referenced the mmap from being copied, which caused segfaults.
This change removes the `shouldCopy` flag and replaces it with a
`forceCopy` argument in `CreateSeriesIfNotExists()`. This allows
the write path to indicate that tags must be cloned on insert.
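A sketch of the new argument, with a hypothetical receiver and fields:

```go
package index

import (
	"sync"

	"github.com/influxdata/influxdb/models"
)

type index struct {
	mu     sync.Mutex
	series map[string]models.Tags
}

// CreateSeriesIfNotExists clones the tags when forceCopy is set, so the
// stored tags no longer reference the caller's buffer or an mmap.
func (idx *index) CreateSeriesIfNotExists(key string, tags models.Tags, forceCopy bool) {
	if forceCopy {
		tags = tags.Clone() // detach from the underlying buffer
	}
	idx.mu.Lock()
	if _, ok := idx.series[key]; !ok {
		idx.series[key] = tags
	}
	idx.mu.Unlock()
}
```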
This commit introduces a new interface type, influxql.Authorizer, that
is passed as part of a statement's execution context and determines
whether the context is permitted to access a given database. In the
future, the Authorizer interface may be expanded to other resources
besides databases. In this commit, the Authorizer interface is
specifically used to determine which databases are returned when
executing SHOW DATABASES.
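A sketch of the shape of the interface (the exact method set here is an assumption):

```go
// Authorizer decides what the current execution context may access.
type Authorizer interface {
	// AuthorizeDatabase returns true if the context is permitted to
	// access the named database with the given privilege.
	AuthorizeDatabase(p influxql.Privilege, name string) bool
}
```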
When HTTP authentication is enabled, the existing meta.UserInfo struct
implements Authorizer, meaning admin users can SHOW every database, and
non-admin users can SHOW only databases for which they have read and/or
write permission.
When HTTP authentication is disabled, all databases are visible through
SHOW DATABASES.
This addresses a long-standing issue where Chronograf or Grafana would
be unable to list databases if the logged-in user did not have admin
privileges.
Fixes #4785.
The following types of queries would panic:
SELECT mean, host FROM (SELECT mean(value) FROM cpu GROUP BY host)
SELECT top(sum, host, 3) FROM (SELECT sum(value) FROM cpu GROUP BY host)
These queries _should_ work, but due to a current limitation with
aggregate functions, the aggregate functions won't return any auxiliary
fields. So even if a tag is not an auxiliary field, it is treated that
way by the query engine and this query will fail.
Fixing this properly will take a longer period of time. This fix just
prevents the panic from killing the server while we fix this for real.
Series keys are in ascending alphabetical order, not descending alphabetical order, even when the results are ordered by descending time. This fixes the ordering so points are returned in descending order. The emitter also had the conditions for choosing which iterator to use in the wrong direction (which only affects aggregates with `FILL(none)`).
This fixes LIMIT and OFFSET when they are used in a subquery whose grouping differs from the grouping of the outer query. When organizing tag sets, the grouping of the outer query is used so the final result is in the correct order. But, unfortunately, the optimization incorrectly limited the number of points based on the grouping in the outer query rather than the grouping in the inner query.
The ideal solution would be to use the outer grouping to further
organize it by the grouping for the inner subquery, but that's more
difficult to do at the moment. As an easier fix, the query engine now
limits the output of each series. This may result in these types of
queries being slower in some situations like this one:
SELECT mean(value) FROM (SELECT value FROM cpu GROUP BY host LIMIT 1)
This will be slower in a situation where the `cpu` measurement has a
high cardinality and many different tags.
This also fixes `last()` and `first()` when they are used in a subquery
because those functions use `LIMIT 1` as an internal optimization.
The code that checked whether a query was authorized did not account for sources that were subqueries. Now, the check for the required privileges descends into the subquery and adds the subquery's required privileges to the list of required privileges for the entire query.
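A sketch of the descent using influxql AST types (the helper name and privilege details are assumptions):

```go
// requiredPrivileges now recurses into subquery sources instead of
// only inspecting measurements.
func requiredPrivileges(sources influxql.Sources) []influxql.ExecutionPrivilege {
	var privs []influxql.ExecutionPrivilege
	for _, src := range sources {
		switch s := src.(type) {
		case *influxql.Measurement:
			privs = append(privs, influxql.ExecutionPrivilege{
				Name:      s.Database,
				Privilege: influxql.ReadPrivilege,
			})
		case *influxql.SubQuery:
			// Descend into the subquery's FROM clause.
			privs = append(privs, requiredPrivileges(s.Statement.Sources)...)
		}
	}
	return privs
}
```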
When using `non_negative_derivative()` and `last()` in a math aggregate
with each other, the math would not be matched with each other because
one of those aggregates would emit one fewer point than the others. The
math iterators have been modified so they now track the name and tags of
a point and match based on those.
This isn't necessarily ideal and may come to bite us in the future. We
don't necessarily have a defined structure for all iterators so it can
be difficult to know which of two points is supposed to come first in
the ordering. This uses the common ordering that usually makes sense,
but the query engine is getting complicated enough where I am not 100%
certain that this is correct in all circumstances.
Adds a `tsdb.Series.ForEachTag()` function for safely iterating over a series' tags within the context of a lock. This prevents tags from being dereferenced during iteration, which can cause a segfault.
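A sketch of the accessor with simplified fields (the real series type carries more state):

```go
package tsdb

import (
	"sync"

	"github.com/influxdata/influxdb/models"
)

// Series is a simplified stand-in for tsdb's series type.
type Series struct {
	mu   sync.RWMutex
	tags models.Tags
}

// ForEachTag runs fn for every tag while the lock is held, so the tags
// cannot be swapped out and dereferenced mid-iteration.
func (s *Series) ForEachTag(fn func(models.Tag)) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	for _, t := range s.tags {
		fn(t)
	}
}
```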
During development, I at some point decided that the dimensions should be expanded based on what was available rather than what was present in the subquery. I no longer remember the rationale, but it doesn't make sense or seem to be particularly useful.
Expanding dimensions now just uses the values specified in the subquery
rather than expanding to all available dimensions of the measurement in
the subquery.
With the new shard mapper implementation, regexes were just ignored, so it attempted to look up the field type inside a measurement with no name (which cannot possibly exist). It would therefore think the field didn't exist and map it as the unknown type.
Fixes #7822
This change first ensures that databases and retention policies exist
before attempting to remove them from the Store. It also adds some
checks in the `DeleteDatabase` and `DeleteRetentionPolicy` to ensure
that maliciously named entries won't remove anything outside of the
configured data directory.
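A sketch of the containment check, with hypothetical names:

```go
package store

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

type Store struct {
	path string // configured data directory
}

// pathFor resolves name under the data directory and rejects names
// (e.g. "../other") that would escape it.
func (s *Store) pathFor(name string) (string, error) {
	base := filepath.Clean(s.path)
	p := filepath.Join(base, name) // Join also cleans ".." segments
	if !strings.HasPrefix(p, base+string(os.PathSeparator)) {
		return "", fmt.Errorf("invalid name: %q", name)
	}
	return p, nil
}
```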