This adds support for using regex expressions in SHOW TAG VALUES when
selecting the key. Also supporting the `!=` operation for the
comparison. Now you can do any of the following:
SHOW TAG VALUES WITH KEY != "region"
SHOW TAG VALUES WITH KEY =~ /region/
SHOW TAG VALUES WITH KEY !~ /region/
It also adds a new SetLiteral AST node that will potentially be used in
the future to allow set operations for other comparisons in the future.
Fixes#4532.
The graphite service will attempt to create the retention policy and use
it. If the retention policy doesn't exist, it will be created with the
default options.
Fixes#5655.
The current code would compare every string literal it crossed and tried
to coerce them to time literals if the _looked_ like date/time strings.
The only time the TimeLiteral was used is when comparing to the the
'time' value in a where clause. This change moves the string parsing
code until we attempt to compare 'time' to a string, at which point we
know we need/want a TimeLiteral, and not just an ordinary string.
Fixes#6727
The level planner would keep including the same TSM files to be
recompacted even if they were already quite compacted and split
across several TSM files.
Fixes#6683
The default retention policy name is changed to "autogen" instead of
"default" since it ends up being ambiguous when we tell a user to check
the default retention policy, it is uncertain if we are referring to the
default retention policy (which can be changed) or the retention policy
with the name "default".
Now the automatically generated retention policy name is "autogen".
The default retention policy is now also configurable through the
configuration file so an administrator can customize what they think
should be the default.
Fixes#3733.
A query's String method is called multiple times per query. This commit
ensures all calls to query.String share use of a strings.NewReplacer.
This approximately halves the number of allocations for the benchmarked
query.
If you use a statement like this:
SELECT value FROM one..cpu, two..cpu
It will access both the `one` and `two` databases as if you had selected
the `cpu` measurement twice for both of them. Updated the `tsdb.Shard`
create iterator function to filter out any sources that do not apply to
that shard so this duplication doesn't happen.
Fixes#6701.
Use the latest documentation by default if the server version can't be
found. If it can be found, update the documentation link to that
specific version so it always points at the exact documentation relevant
for the server version.
Fixes#5906.
The limit optimization was put into the wrong place and caused only part
of the shard to be read when a limit was used. The optimization is
possible, but requires a bit of refactoring to the code here so the call
iterator is created per series before handed to the limit iterator.
Fixes#6661.
Due to an bug in TSM tombstone files, it was possible to create
empty tombstone files. At startup, the TSM file would error out
and not load the TSM file.
Instead, treat it as an empty v1 file so the TSM file can load
correctly.
Fixes#6641
The HTTPS configuration for the httpd service only had an option to
specify the certificate file and the same file would be used for both
the certificate and private key file (they could be concatenated
together).
This adds an additional option to specify the files differently from
each other while still allowing the previous behavior. If only
`https-certificate` is specified, the httpd service will try to load the
private key from the `https-certificate` file. If a separate
`https-private-key` file is specified, the private key will be loaded
from there instead.
Fixes#1310.
The parser can be passed a map of keys to literal values to be replaced
into the query. Parameters are preceded by a dollar sign (`$`). If a
parameter key is missing, an error is thrown by the parser.
Fixes#2926.
If there were duplicate points in multiple blocks, we would correctly
dedup the points and mark the regions of the blocks we've read.
Unfortunately, we were not excluding the already points as the cursor
moved to points in the later blocks which could cause points to be
return twice incorrectly.
Fixes#6611
The list of field keys in the index may have differed from the field
keys in the actual shard. Fixing `SHOW FIELD KEYS` so it relies only on
the shard rather than the index.
Fixes#6659.
When authenticating a request, check that an admin user exists instead
of checking for len(users) > 0. This prevents getting stuck with no
admin user and being unable to create one.
Casting syntax is done with the PostgreSQL syntax `field1::float` to
specify which type should be used when selecting a field. You can also
do `field1::field` or `tag1::tag` to specify that a field or tag should
be selected.
This makes it possible to select a tag when a field key and a tag key
conflict with each other in a measurement. It also means it's possible
to choose a field with a specific type if multiple shards disagree. If
no types are given, the same ordering for how a type is chosen is used
to determine which type to return.
The FieldDimensions method has been updated to return the data type for
the fields that get returned. The SeriesKeys function has also been
removed since it is no longer needed. SeriesKeys was originally used for
the fill iterator, but then expanded to be used by auxiliary iterators
for determining the channel iterator types. The fill iterator doesn't
need it anymore and the auxiliary types are better served by
FieldDimensions implementing that functionality, so SeriesKeys is no
longer needed.
Fixes#6519.
Drop database was closing and deleting each shard dir individually and
serially. It would then delete the empty database dirs.
This changes drop database to close all shards in parallel and run
one os.RemoveAll to remove everything under the db dir which is more
efficient.
This also reworked the locking to avoid locking the tsdb.Store for
long periods of time. That can cause queries and writes for other
databases to block as well.
The http connection limit is for any HTTP operation and is independent
of the other connection limits. It should be set to a higher value than
the query limit. The difference between this and the query limit is it
will close out the connection immediately without any further
processing.
This is the equivalent of the `max_connections` option in PostgreSQL.
Also removes some unused config options from the cluster config.
Fixes#6559.
On data sets with many series and potentially large series keys,
the cost of parsing the key and re-indexing can be high.
Loading the TSM keys into the index was being done repeatedly for
series that were already index by an earlier TSM file. This was
wasted worked and slows down shard loading.
Parsing the key was also innefficient and allocated a new string
slice. This was simplified to remove that allocation.
#6261 Added distinct improvement as a bugfix
is a fix that I'm looking for in 0.13 -- so wanted to add it to the CHANGELOG, in the event others are interested.
The cursors were returning the wrong value in the case when points
existed in both the cache and tsm files with the same timestamp. The
cache value should have been returned, but the tsm value was returned
incorrectly.
Fixes#6439
This commit changes the `SeriesIterator` to process one measurement
at a time and uses a `floatFastDedupeIterator` to avoid point
encoding during deduplication.
If a shard is empty for a specific field and the field type is something
other than a float, a nil iterator would get returned from one of the
empty shards and cause the combined iterators to be cast to the float
type and all other iterator types to be discarded (or for integers, to
be cast).
This is rare since most aggregates don't accept strings or booleans, but
for queries like:
SELECT distinct(string) FROM mydata
It would result in nothing getting returned if one of the shards didn't
have a value for `string`.
This change modifies the query engine to return nil for the shards
instead of a fake iterator and then to only use the fake iterator if the
final aggregate iterator is nil (meaning that no iterators could be
constructed for the field from any shard).
Fixes#6495.
An offset of `time(1m, now())` will anchor the offset to the current
time of the query. The default offset is `0s` which is the current
default anyway.
This fixes#2074 by making time zone offset support unnecessary. Time
comparisons can use timezones inside of the time clause and the offset
needed for non-hour timezone differences can be used as part of the
offset argument.
The code for parsing a key our of the WAL or TSM files in the engine
was naive and didn't account for measurements with escape chars. This
uses the correct parsing code to parse and load them correctly.
Fixes#6496
The time of the point will be returned with a selector when there is no
group by interval and when there is only one selector. Any other
conditions will return the start time of the interval.
Fixes#5890.
In order to follow REST a bit more carefully, all write operations
should go through a POST in the future. We still allow read operations
through either GET or POST (similar to the Graphite /render endpoint),
but write operations will trigger a returned warning as part of the JSON
response and will eventually return an error.
Also updates the Golang client libraries to always use POST instead of
GET.
Fixes#6290.
When a shard is closed and removed due to retention policy enforcement,
the series contained in the shard would still exists in the index causing
a memory leak. Restarting the server would cause them not to be loaded.
Fixes#6457
This also removes the dependency on `os/user` and uses the `HOME`
environment variable which is more common on Linux and Mac OS X for
customizing the history file location.
Removing this import also lets the `influx` binary be cross-compiled as
`os/user` relies on cgo.
This commit removes support for `SHOW SERVERS` and `DROP SERVER`
from the `influxql` package. It also removes extraneous cluster
testing code from `cmd/influxd/run`.
Fixes#6465
The derivative function had an arbitrary limitation that would cause it
to set the value to zero if the previous value was after the next value.
This caused all `ORDER BY desc` queries with `derivative()` to always
return zero values.
Fixes#4675.
Binary math inside of a where condition was previously disallowed. Now,
these types of queries are just passed verbatim down to the underlying
query engine which can handle it.
We may want to revisit this when it comes to tags at some point as it
prevents the more efficient filtering of tags that a simple expression
allows, but it allows a query like this to be done:
SELECT * FROM cpu WHERE value + 2 < 5
So while it can be better, this is a good initial implementation to
provide this functionality. There are very rare situations where a tag
may be used appropriately in one of these circumstances.
Fixes#3558.
Sanitizing is now done through pattern matching rather than parsing the
query and replacing the password in the query. This prevents
accidentally redacting the wrong part of a query and revealing what the
password is through association.
Fixes#3883.
The config path previously could only be specified through the command
line options. This made it very difficult to set a default config path
without using any option.
Now the environment variable can be set so the default configuration
path is set to a specific place, but can be overwritten using the
command line option.
The primary purpose of this is so the Docker container can have a
default configuration file, but not have to parse the command line
options to figure out if a different configuration file has been
specified while still allowing the user to only type `influxd` and have
the program start correctly.
This might also help #6392 as it would allow a default configuration
location to be included with the package by setting an environment
variable.
A default search path is also provided now with checking the following
paths for a config file when none is specified:
* `~/.influxdb/influxdb.conf`
* `/etc/influxdb/influxdb.conf`
The config command has also been modified to read this config file
before outputting a sample config.
This has various benefits:
- Users embedding InfluxDB within other Go programs can specify a different logger / prefix easily.
- More consistent with code used elsewhere in InfluxDB (e.g. services, other `run.Server.*` fields, etc).
- This is also more efficient, because it means `executeQuery` no longer allocates a single `*log.Logger` each time it is called.
The cache max memory size is an approximate size and can prevent a
shard from loading at startup. This change disable the max size
at startup to prevent this problem and sets the limt back after
reloading.
Fixes#6109
If users properly call client.Close(), then this will make sure that
established tcp connections dont continually grow when creating new http
clients.
This fixes the case where users are creating new http clients on top of
existing _valid_ connections.
This was encountered in Telegraf when we were recreating our http
clients after getting write failures that were unrelated to the actual
connection being severed (such as typos in the retention policy, see
https://github.com/influxdata/telegraf/issues/1058)
The series keys within a tag set were previously not sorted which would
cause the output to be non-deterministic. This sorts the output series
by their keys so it has a consistent output especially when using
limits.
Fixes#3166.
The optional sections of the command consumed the semicolon token and
didn't put it back for the outer loop. The code shouldn't explicitly
check for a semicolon or EOF anyway, so these checks were removed and
the token gets unscanned if it doesn't match the optional token that the
parser is looking for.
Fixes#6398.
For aggregate queries, derivatives will now alter the start time to one
interval behind and will use that interval to find the derivative of the
first point instead of giving no value for that interval. Null values
will still be discarded so if the interval before the one you are
querying is null, then it will be discarded like if it were in the
middle of the query. You can use `fill(0)` to fill in these values.
This does not apply to raw queries yet.
Also modified the derivative and difference aggregates to use the stream
iterator instead of the reduce slice iterator for space efficiency.
Fixes#3247. Contributes to #5943.
The deprecated message is now attached to a new attribute returned with
the results. This message can then be read by clients to warn a user
about upcoming changes to the query engine.
The `influx` client has already been modified to read this message and
print it out for every format except CSV.
The first warning message is a deprecated message about removing `IF NOT
EXISTS` from `CREATE DATABASE`.
The message will also be printed to the server log.
Fixes#5707.
The `percentile()` call previously did not validate that the first
argument was a variable reference and that would let an invalid query
slip by that would panic the query engine.
Added checking for this case and also included test cases for the other
calls that require a variable reference as the first argument.
Fixes#6379.
This commit makes a number of performance improvements to
reduce allocations during query execution. Several objects
and buffers are now reused across the components to avoid
allocations.
Previously a simple `count(value)` query across 1M points
would require 26,000+ allocations. After the changes in
this commit that number has been reduced to 88.
A missing tag on a point was sometimes treated as `""` and sometimes
treated as a separate `null` entity. This change modifies the equality
operations to always treat a missing tag as an empty string.
Empty tags are *not* indexed and do not have the same performance as a
tag that exists.
Fixes#3773.
If a point had no tags at all and was asked for the subset of tags with
at least one key, it would return a new set of tags that was completely
empty. In contrast, if the point had any tags at all, it would return a
set of tags with the tag value being an empty string. This lead to
a point with no tags being treated differently than a point with at
least one tag.
Fixing this so the tag value will always be an empty string for
consistency. A missing tag should always be empty.
The following query was fixed previously:
SELECT 'value' FROM cpu
This ended up hitting the `buildExprIterator()` code path and was
handled properly. But this query:
SELECT 'value', value FROM cpu
This took a different code path that would trigger a panic because it
triggered a panic instead of an error condition. This code path has now
been modified to trigger an error instead of a panic.
Fixes#6248.
Influxdb uses gdm for the dependency management. For gdm to work on Windows, it needs go 1.6. Influx still uses go 1.4.3. So go 1.6 will be sed for the pre-build stages, and go 1.4.3 to run the tests.
This arrangements is working in AppVeyor now.
A bigger refactor of these functions is needed to support #3290, but
this will work for the more common case that someone uses double quotes
instead of single quotes when surrounding a time literal.
Fixes#3932.
FormValue() would attempt to parse the body of a request when the
content-type is set to `application/x-www-form-urlencoded`. The write
handler never wants url-encoded forms, and should only ever check the
URL for query parameters.
Fixes#6061
The simple moving average will gradually emit points instead of waiting
until the end. This should apply to derivative and difference in the
future too.
Fixes#6112.
Related to #6140, but won't actually fix that problem. It will correctly
stop new queries from being started during shutdown and will send the
interrupt signal to queries during shutdown.
Since the interrupt signal is asynchronous, there isn't currently a way
to wait for the queries to complete themselves before shutting down the
engine.
The difference function is implemented very similar to how derivative is
implemented. It is an aggregate function that acts over the entire
aggregate. This function will also have the same problems that
derivative has with getting values from the previous interval or point.
This will be fixed separately as part of #5943.
Fixes#1825.
Partially fixes#6094.
Prior to this when passing the same query and CQ name in a CREATE
CONTINUOUS QUERY command an error would be returned. This means the
command was not behaving in a similar way to other commands.a
Now when running the command with the same CQ name and query string no
error will be returned. Note, this change does not parse the query, it
simply compares a normalised query string to the existing one on the CQ.
- Improved handling of tar and zip package outputs
- Added support for statically-compiled binary outputs
- Modifyed the nightly version format to be more human-readable
Allows configuration of shard group duration at database creation, and retention
policy create/alter time.
Query examples:
```
CREATE DATABASE testdb WITH DURATION 90d SHARD DURATION 30m NAME rp_testdb
CREATE RETENTION POLICY rp_testdb2 ON testdb DURATION INF REPLICATION 1 SHARD DURATION 30m
ALTER RETENTION POLICY rp_testdb2 ON testdb SHARD DURATION 1h
```
This can be useful with long duration retention policies with lots of data, where
you can split into smaller shards to relieve memory pressure.
This commit adds a configurable limit to the number of series that
can be returned from a `SELECT` statement. The limit is checked
immediately after planning and is determined by the use of iterator
stats.
Fixes#6076
This allows multiple semicolons in a row now and also requires that a
semicolon separate commands. The query specification says this is
required, but a boolean error in `ParseQuery` makes one semicolon
optional and multiple semicolons an error.
Fixes#5728.
If an OR was used, merging filters between different expressions would
not work correctly. If one of the sides had a set of series ids with a
condition and the other side had no series ids associated with the
expression, all of the series from the side with a condition would have
the condition ignored. Instead of defaulting a non-existant series
filter to true, it should just be false and the evaluation of the one
side that does exist should take care of determining if the series id
should be included or not. The AND condition used false correctly so did
not have to be changed.
If a tag did not exist and `!=` or `!~` were used, it would return false
even though the neither a field or a tag equaled those values. This has
now been modified to correctly return the correct series ids and the
correct condition.
Also fixed a panic that would occur when a tag caused a field access to
become unnecessary. The filter using the field access still got created
and used even though it was unnecessary, resulting in an attempted
access to a non-initialized map.
Fixes#5152 and a bunch of other miscellaneous issues.
This commit adds an `IteratorStats` that holds aggregate
iterator processing information. A method is also added to
`Iterator` to return the stats:
Stats() influxql.IteratorStats
The remote iterators will also emit their stats in the point
stream upon first connection, on a given interval, and then
finally once the last point has been sent.
Numbers in the query without any decimal will now be emitted as integers
instead and be parsed as an IntegerLiteral. This ensures we keep the
original context that a query was issued with and allows us to act more
similar to how programming languages are typically structured when it
comes to floats and ints.
This adds functionality for dealing with integers promoting to floats in
the various different places where math are used.
Fixes#5744 and #5629.
`SHOW TAG VALUES` output has been modified to print the measurement name
for every measurement and to return the output in two columns: key and
value. An example output might be:
> SHOW TAG VALUES WITH KEY IN (host, region)
name: cpu
---------
key value
host server01
region useast
name: mem
---------
key value
host server02
region useast
`measurementsByExpr` has been taught how to handle reserved keys (ones
with an underscore at the beginning) to allow reusing that function and
skipping over expressions that don't matter to the call.
Fixes#5593.
This should fix#5865
This commit also removes the dependecy on the influxql package constants
that were used to write b1 and bz1 files and have changed since the
release of 0.10
... by extracting the db/rp from the given path.
Now that the code has "standardized" on extracting db/rp this way, the
ShardLocation struct is no longer necessary and thus has been removed.
We're back on the previous style of passing the path and walPath to
NewShard.
Previously, CQs with the same name would be stored in the last run map
the same way. This caused only one of the CQs to run because after the
first one ran it would update the last run time for all CQs with the
same name.
Add the database name to the CQ ID in the last run map to differentiate
between CQs in different databases.
Fixes#5814.
Fixes#5612, #5573 and #5518.
Using the MetaExecuter, queries that need to run on both data nodes
and optionally the meta store will be executed across all data nodes
in the cluster.
This fixes a couple of issues with starting meta-only nodes.
1. We were always calling CreateDataNode regardless of whether the the
node is running data services. We only call that now when node is
data enabled.
2. The node.json was created along-side creating the data node. Since
we are not creatinga a data node, this didn't happen anymore. There
wasn't a simple way to do this in one place so it's actually handle
for when creating a meta or a data node now. Since the ID assigned
to the node is the same regardless of role this works in all combinations
of roles.
3. The JoinMetaServer didn't return the ID of the joining node which
created some races when multiple nodes were joining. The join call now
returns that information to the caller.
Fixes#5754
The cache had some incorrect logic for determine when a series needed
to be deduplicated. The logic was checking for unsorted points and
not considering duplicate points. This would manifest itself as many
points (duplicate) points being returned from the cache and after a
snapshot compaction run, the points would disappear because snapshot
compaction always deduplicates and sorts the points.
Added a test that reproduces the issue.
Fixes#5719
Fixes#5653 and #5394.
Previously dropping retention policies did not propogate to local TSDB
shards. Instead, the retention policiess would just be removed from the
Meta Store.
This PR adds ensures that data associated with retention policies is
removed, when the retention policy is dropped.
Also, it cleans up a couple of other methods in `tsdb`, including the
requirement to provide (redundant) shardIDs when deleting databases.
Previously, meta.Client would drop the default retention policy when
trying to create a database with a retention policy. The RPC has now
been modified to include the desired retention policy in the
CreateDatabase command and have it use that retention policy information
instead of the default configuration when provided.
This also lowers the number of RPC calls for
CreateDatabaseWithRetentionPolicy to only a single RPC call instead of
two.
Protections have also been included so creating a retention policy with
different parameters will return an error similar to if you tried to
modify the retention policy separately.
Fixes#5696.
This removes the MetaServers property from node.json to eliminate one
of the four places those addresses are stored on disk. We always use
the values that come through the config (via file, env var or -join arg).
Since we are pinned to go 1.4.3, we're using the same dependency
manager as telegraf to make builds more reproducible. We'll re-evaluate
vendoring when we can move off of 1.4.3.
Previously, the lease redirect was invalid causing anything relying
on a lease for execution (eg. continuous queries) to cease functioning.
The name/nodeid URL param parsing has been moved up to the top of the
handler so the options can be forwarded on to the real leader.
X-Github-Closes: #5592
We were closing the cursor when we read the last block which caused
the internal state to be cleared. In a group by query, we seeked multiple
times so depending on the group by interval and how the data was laid out
in the blocks, we woudl close the cursor and the last block would get skipped.
Fixes#5193
This makes the following syntax possible:
CREATE CONTINUOUS QUERY mycq ON mydb
RESAMPLE EVERY 1m FOR 1h
BEGIN
SELECT mean(value) INTO cpu_mean FROM cpu GROUP BY time(5m)
END
The RESAMPLE option customizes how often an interval will be sampled and
the duration. The interval is customized with EVERY. Any intervals
within the resampling duration on a multiple of the resample interval
will be updated with the new results from the query.
The duration is customized with FOR. This determines how long an
interval will participate in resampling.
Both options are optional. If RESAMPLE is in the syntax, at least one of
the two needs to be given. The default for both is the interval of the
continuous query.
The service also improves tracking of the last run time and the logic of
when a query for an interval should be run. When determining the oldest
interval to run for a query, the continuous query service determines
what would have been the optimal time to perform the next query based on
the last run time. It then uses this time to determine the oldest
interval that should be run using the resample duration and will
resample all intervals between this time and the current time as opposed
to potentially forgetting about the last run in an interval if the
continuous query service gets delayed for some reason.
This removes the previous config options for customizing continuous
queries since they are no longer relevant and adds a new option of
customizing the run interval. The run interval determines how often the
continuous query service polls for when it should execute a query. This
option defaults to 1s, but can be set to 1m if the least common factor
of all continuous queries' intervals is a higher value (like 1m).
Quotes are not supposed to be significant in field keys, but are
significant in field values. The code as it currently was would
consider quotes in a key to be significant, but the later parser that
would unmarshal the fields from the byte string did not consider those
quotes to be significant. This meant that the following string:
"a=1
The line protocol parser would see a mismatched quote instead of a valid
input to the line protocol. But more nefariously, the following string:
"a=1"=2
The line protocol parser would ignore the first equals since it is
located in the quotation marks and think this was a valid input. It
would then pass it on to the field parser who would panic and die when
it tried to parse `1"=2` as a number.
Fixes#4076.
Changed non-interactive mode to send everything through the CLI's parser the same way the interactive mode works.
Added multiline support for -execute flag.
This commit fixes a deadlock that occurs during b1 flushes. It's
caused by taking locks in a different order. In the flush, b1
locks the engine and then bolt. However, in the query cursor, a
lock is obtained on bolt first (via `DB.Begin()`) and then the
engine is locked while reading from the engine's cache.
The canonical graphite implementation will read and discard NaN values
instead of throwing an error when reading on the line receiver protocol.
Since this is the default behavior for graphite, InfluxDB should have
the same behavior for compatibility.
Previously, a NaN value would result in an error printed to the console.
When you have a large number of NaN values being sent every minute, this
results in the log file filling with useless messages.
This change ensures that if there are any fields in the WHERE clause of
an aggregate that are different from the fields in the SELECT clause,
that the cursors also decode those fields. Otherwise WHERE clauses of
the form 'SELECT f(w) FROM x WHERE y=z' will return incorrect results
Fixes issue #4701.