* off by default, enabled by `query-stats-enabled`
* writes to cq_query measurement of configured monitor database
* see CHANGELOG for schema of individual points
They rebased a revision we were previously relying upon that allowed us
to use the vanity name so we are reverting back to an older version with
the old import path.
URL=http://localhost:8086 go test -parallel 1 ./cmd/influxd/run
will run the tests over HTTP against localhost:8086. They currently
need to be run serially since they all write to the same DB.
This commit introduces a new interface type, influxql.Authorizer, that
is passed as part of a statement's execution context and determines
whether the context is permitted to access a given database. In the
future, the Authorizer interface may be expanded to other resources
besides databases. In this commit, the Authorizer interface is
specifically used to determine which databases are returned when
executing SHOW DATABASES.
When HTTP authentication is enabled, the existing meta.UserInfo struct
implements Authorizer, meaning admin users can SHOW every database, and
non-admin users can SHOW only databases for which they have read and/or
write permission.
When HTTP authentication is disabled, all databases are visible through
SHOW DATABASES.
This addresses a long-standing issue where Chronograf or Grafana would
be unable to list databases if the logged-in user did not have admin
privileges.
Fixes#4785.
The following types of queries will panic:
SELECT mean, host FROM (SELECT mean(value) FROM cpu GROUP BY host)
SELECT top(sum, host, 3) FROM (SELECT sum(value) FROM cpu GROUP BY host)
These queries _should_ work, but due to a current limitation with
aggregate functions, the aggregate functions won't return any auxiliary
fields. So even if a tag is not an auxiliary field, it is treated that
way by the query engine and this query will fail.
Fixing this properly will take a longer period of time. This fix just
prevents the panic from killing the server while we fix this for real.
The order of series keys is in ascending alphabetical order, not
descending alphabetical order, when it is ordered by descending time.
This fixes the ordering so points are returned in descending order. The
emitter also had the conditions for choosing which iterator to use in
the wrong direction (which only affects aggregates with `FILL(none)`).
When using `non_negative_derivative()` and `last()` in a math aggregate
with each other, the math would not be matched with each other because
one of those aggregates would emit one fewer point than the others. The
math iterators have been modified so they now track the name and tags of
a point and match based on those.
This isn't necessarily ideal and may come to bite us in the future. We
don't necessarily have a defined structure for all iterators so it can
be difficult to know which of two points is supposed to come first in
the ordering. This uses the common ordering that usually makes sense,
but the query engine is getting complicated enough where I am not 100%
certain that this is correct in all circumstances.
Fixes#7906
In an attempt to reduce the overhead of using regex for exact matches,
the query parser will replace `=~ /^thing$/` with `== 'thing'`, but the
conditions being checked would ignore if any flags were set on the
expression, so `=~ /(?i)^THING$/` was replaced with `== 'THING'`, which
will fail unless the case was already exact. This change ensures that no
flags have been changed from those defaulted by the parser.
Fixes#7906
In an attempt to reduce the overhead of using regex for exact matches,
the query parser will replace `=~ /^thing$/` with `== 'thing'`, but the
conditions being checked would ignore if any flags were set on the
expression, so `=~ /(?i)^THING$/` was replaced with `== 'THING'`, which
will fail unless the case was already exact. This change ensures that no
flags have been changed from those defaulted by the parser.
With the new shard mapper implementation, regexes were just ignored so
it attempted to look up the field type inside of a measurement with no
name (which cannot possibly exist) so it would think the field didn't
exist and map it as the unknown type.
With the new shard mapper implementation, regexes were just ignored so
it attempted to look up the field type inside of a measurement with no
name (which cannot possibly exist) so it would think the field didn't
exist and map it as the unknown type.
Previously, only time expressions got propagated inwards. The reason for
this was simple. If the outer query was going to filter to a specific
time range, then it would be unnecessary for the inner query to output
points within that time frame. It started as an optimization, but became
a feature because there was no reason to have the user repeat the same
time clause for the inner query as the outer query. So we allowed an
aggregate query with an interval to pass validation in the subquery if
the outer query had a time range. But `GROUP BY` clauses were not
propagated because that same logic didn't apply to them. It's not an
optimization there. So while grouping by a tag in the outer query
without grouping by it in the inner query was useless, there wasn't any
particular reason to care.
Then a bug was found where wildcards would propagate the dimensions
correctly, but the outer query containing a group by with the inner
query omitting it wouldn't correctly filter out the outer group by. We
could fix that filtering, but on further review, I had been seeing
people make that same mistake a lot. People seem to just believe that
the grouping should be propagated inwards. Instead of trying to fight
what the user wanted and explicitly erase groupings that weren't
propagated manually, we might as well just propagate them for the user
to make their lives easier. There is no useful situation where you would
want to group into buckets that can't physically exist so we might as
well do _something_ useful.
This will also now propagate time intervals to inner queries since the
same applies there. But, while the interval propagates, the following
query will not pass validation since it is still not possible to use a
grouping interval with a raw query (even if the inner query is an
aggregate):
SELECT * FROM (SELECT mean(value) FROM cpu) WHERE time > now() - 5m GROUP BY time(1m)
This also means wildcards will behave a bit differently. They will
retrieve dimensions from the sources in the inner query rather than just
using the dimensions in the group by.
Fixing top() and bottom() to return the correct auxiliary fields.
Unfortunately, we were not copying the buffer with the auxiliary fields
so those values would be overwritten by a later point.
Previously, only time expressions got propagated inwards. The reason for
this was simple. If the outer query was going to filter to a specific
time range, then it would be unnecessary for the inner query to output
points within that time frame. It started as an optimization, but became
a feature because there was no reason to have the user repeat the same
time clause for the inner query as the outer query. So we allowed an
aggregate query with an interval to pass validation in the subquery if
the outer query had a time range. But `GROUP BY` clauses were not
propagated because that same logic didn't apply to them. It's not an
optimization there. So while grouping by a tag in the outer query
without grouping by it in the inner query was useless, there wasn't any
particular reason to care.
Then a bug was found where wildcards would propagate the dimensions
correctly, but the outer query containing a group by with the inner
query omitting it wouldn't correctly filter out the outer group by. We
could fix that filtering, but on further review, I had been seeing
people make that same mistake a lot. People seem to just believe that
the grouping should be propagated inwards. Instead of trying to fight
what the user wanted and explicitly erase groupings that weren't
propagated manually, we might as well just propagate them for the user
to make their lives easier. There is no useful situation where you would
want to group into buckets that can't physically exist so we might as
well do _something_ useful.
This will also now propagate time intervals to inner queries since the
same applies there. But, while the interval propagates, the following
query will not pass validation since it is still not possible to use a
grouping interval with a raw query (even if the inner query is an
aggregate):
SELECT * FROM (SELECT mean(value) FROM cpu) WHERE time > now() - 5m GROUP BY time(1m)
This also means wildcards will behave a bit differently. They will
retrieve dimensions from the sources in the inner query rather than just
using the dimensions in the group by.
Fixing top() and bottom() to return the correct auxiliary fields.
Unfortunately, we were not copying the buffer with the auxiliary fields
so those values would be overwritten by a later point.
This adds query syntax support for subqueries and adds support to the
query engine to execute queries on subqueries.
Subqueries act as a source for another query. It is the equivalent of
writing the results of a query to a temporary database, executing
a query on that temporary database, and then deleting the database
(except this is all performed in-memory).
The syntax is like this:
SELECT sum(derivative) FROM (SELECT derivative(mean(value)) FROM cpu GROUP BY *)
This will execute derivative and then sum the result of those derivatives.
Another example:
SELECT max(min) FROM (SELECT min(value) FROM cpu GROUP BY host)
This would let you find the maximum minimum value of each host.
There is complete freedom to mix subqueries with auxiliary fields. The only
caveat is that the following two queries:
SELECT mean(value) FROM cpu
SELECT mean(value) FROM (SELECT value FROM cpu)
Have different performance characteristics. The first will calculate
`mean(value)` at the shard level and will be faster, especially when it comes to
clustered setups. The second will process the mean at the top level and will not
include that optimization.
It looks like the real import path to the project is go.uber.org/zap
instead of github.com/uber-go/zap since the example in the project
references that path.
This was needed when we were on go 1.4 but hasn't been needed since go
1.5. It was kept because we weren't sure if we were going to have to
rollback to an older version of Go at that time and we kept it so we
wouldn't forget to readd it.
Now that we are on go 1.7 with go 1.4 deprecated, there is no going back
so we might as well remove this so people can set GOMAXPROCS to a custom
value using environment variables.
The logging library has been switched to use uber-go/zap. While the
logging has been changed to use structured logging, this commit does not
change any of the logging statements to take advantage of the new
structured log or new log levels. Those changes will come in future
commits.
`percentile()` is supposed to be a selector and return the time of the
point, but that only got changed when the input was a float. Updating
the integer processor to also return the time of the point rather than
the beginning of the interval.
The `partial` tag has been added to the JSON response of a series and
the result so that a client knows when more of the series or result will
be sent in a future JSON chunk.
This helps interactive clients who don't want to wait for all of the
data to know if it is done processing the current series or the current
result. Previously, the client had to guess if the next chunk would
refer to the same result or a new result and it had to match the name
and tags of the two series to know if they were the same series. Now,
the client just needs to check the `partial` field included with the
response to know if it should expect more.
Fixed `max-row-limit` so it counts rows instead of results and it
truncates the response when the `max-row-limit` is reached.
There are 2 new keys in the configuration file.
- security-level: "none", "sign", or "encrypt".
- auth-file: The location of the user/password file.
Please see the collectd network doc for more details.
When a query would use a grouping with two different aggregates, it was
possible for one of the aggregates to return a value from a different
series key than the second aggregate. When these series keys didn't
match, the returned grouping would be screwed up because it sorted by
time before checking for name and tags.
This did not happen when the aggregates returned values for the same
series keys because then the iterators were aligned with each other.
When a limit is exceeded, we return errors and sometimes log (if appropriate)
that a limit was exceeded. The messages don't always provide an indication
as to where or how they are configured.
Instead, return the config option (easily searchable for) as well as the limit
currently set and the value that exceeded it when possible.
The admin console would dynamically discover the version from the
InfluxDB server, but for patch releases, it included the patch in the
link to the documentation and that wasn't a valid link.
Truncate the version so the documentation url is correct since we only
do documentation for `major.minor`.
Changes the default time boundaries for raw queries so raw queries will
range until the end of time. Aggregate queries continue to have their
default end time be `now()`.
The `cumulative_sum()` function can be used to sum each new point and
output the current total. For the following points:
cpu value=2 0
cpu value=4 10
cpu value=6 20
This would output the following points:
> SELECT cumulative_sum(value) FROM cpu
time value
---- -----
0 2
10 6
20 12
As can be seen, each new point adds to the sum of the previous point and
outputs the value with the same timestamp.
The function can also be used with an aggregate like `derivative()`.
> SELECT cumulative_sum(mean(value) FROM cpu WHERE time >= now() - 10m GROUP BY time(1m)
The tags passed into Statistics() calls are not supposed to be modified.
The balanceWriter in subscribers tried to modify them triggering a panic
because they can be nil.
It is now possible to use a mixed duration unit like `1h30m`. The
duration units can be in whatever order as long as they are connected to
each other.
There is a change to the scanner. A token such as `10x` will be scanned
as a duration literal, but will then fail to parse as an invalid
duration. This should not be a breaking change as there is no situation
where `10m10` was a valid order of tokens for the parser.
Fixes#3634.
Instead of having the parser set the defaults, the command will set the
defaults so that the constants for that are actually used. This way we
can also identify which things the user provided and which ones we are
filling with default values.
This allows the meta client to be able to make smarter decisions when
determining if the user requested a conflict or if the requested
capabilities match with what is currently available. If you just say
`CREATE DATABASE WITH NAME myrp`, the user doesn't really care what the
duration of the retention policy is and just wants to use the default.
Now, we can use that information to determine if an existing retention
policy would conflict with what the user requested rather than returning
an error if a default value ever gets changed since the meta client
command can communicate intent more easily.
This commit fixes the `MaxSelectSeriesN` limit which was broken by
the implementation of lazy iterators. The setting previously limited
the total number of series but the new implementation limits the
concurrent number of series being processed.
Updating the help formatting for all of the commands for consistency.
Removed tabs from the output in favor of using spaces so it is more
clear how the output is intended to look.
Truncate the time interval output of the monitor service to be on even
time intervals rather than on every minute based on the start time. This
normalizes the output from the monitor service.
Removing the data only build tag because it is no longer used. It was
previously used for building a data only node from the open source build
that could be used in clustering, but all of the RPC code was removed
since the separation was too much of a hindrance on development whenever
we found a bug.
Since the original purpose of this is now useless, there's no point in
having a confusing build tag around.
Normalize the output for the various help options so they all follow the
same format and display all relevant options.
Removing some of the unused config options from the configuration file
and updating the help documentation. Removing some remaining references
to clustering within the open source version.
This adds support for using regex expressions in SHOW TAG VALUES when
selecting the key. Also supporting the `!=` operation for the
comparison. Now you can do any of the following:
SHOW TAG VALUES WITH KEY != "region"
SHOW TAG VALUES WITH KEY =~ /region/
SHOW TAG VALUES WITH KEY !~ /region/
It also adds a new SetLiteral AST node that will potentially be used in
the future to allow set operations for other comparisons in the future.
Fixes#4532.