The following would, erroneously, not strip the tag from the inner
query:
SELECT value FROM (SELECT value FROM cpu GROUP BY host)
The inner query is supposed to group by the host tag, but the outer
query should strip it away since it is no longer part of the grouping.
This fixes things so that tags are stripped from the result when they
are not requested in the grouping.
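As a minimal sketch of the intended behaviour (the helper below is
hypothetical, not the engine's actual code), tags carried up from the
subquery are dropped unless the outer grouping asks for them:

    package main

    import "fmt"

    // stripTags keeps only the tags that the outer query still groups by
    // and drops the rest.
    func stripTags(tags map[string]string, groupBy map[string]struct{}) map[string]string {
        out := make(map[string]string, len(groupBy))
        for k, v := range tags {
            if _, ok := groupBy[k]; ok {
                out[k] = v
            }
        }
        return out
    }

    func main() {
        inner := map[string]string{"host": "server01"} // inner query grouped by host
        outer := map[string]struct{}{}                 // outer query has no GROUP BY
        fmt.Println(stripTags(inner, outer))           // prints: map[]
    }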
It was previously allowed for a subquery's tag to be used within a
function (such as `count()`) as long as the subquery itself referenced a
field at some point to perform the join.
This functionality regressed in 1.6 because of a change in how
subqueries were executed that did not treat a tag the same way as a
string field.
This fixes that regression and adds a test case to avoid hitting that
regression again.
This change simplifies the math engine so it doesn't use a complicated
set of nested iterators. That way, math has to change in one fewer
place.
It also greatly simplifies the query engine: we can now create the
necessary iterators, join them by time, name, and tags, and then use the
cursor interface to read them and eval to compute the result. This lets
the auxiliary iterators and all of their complexity be removed.
This also makes use of the new eval functionality that was recently
added to the influxql package.
No math functions have been added, but the scaffolding has been included
so things like trigonometry functions are just a single commit away.
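As a rough illustration of the cursor-plus-eval shape described above
(all of the names here are hypothetical, not the influxql package's
API):

    package main

    import "fmt"

    // row stands in for a cursor row: values joined by time, name, and tags.
    type row struct {
        time   int64
        values map[string]float64
    }

    // evalBinary mimics the eval step for a simple binary expression such
    // as max(value) * 2: it looks up the named value in each joined row
    // and applies the operator.
    func evalBinary(rows []row, field string, factor float64) []float64 {
        out := make([]float64, 0, len(rows))
        for _, r := range rows {
            out = append(out, r.values[field]*factor)
        }
        return out
    }

    func main() {
        rows := []row{{time: 10, values: map[string]float64{"max": 4}}}
        fmt.Println(evalBinary(rows, "max", 2)) // [8]
    }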
This also introduces a small breaking change. Because of the call
optimization, the same selector call can now be used multiple times and
the query is still treated as having a single selector. So if you do
this:
SELECT max(value) * 2, max(value) / 2 FROM cpu
This will now return the timestamp of the max value rather than zero
since this query is considered to have only a single selector rather
than multiple separate selectors. If any aspect of the selector is
different, such as different selector functions or different arguments,
the selectors will be treated as aggregates, as in the old behavior.
This reverts commit 9aeae7ce82, reversing
changes made to 35b44cc2f0.
The contributor was unable to sign the contributor license agreement so
we have to revert this commit.
The changelog generator inserted the changelog entry into version 1.0
because it couldn't find any tags. Jenkins had been configured not to
fetch any tags, so the changelog generator assumed this was the first
version of the software.
Fetching tags is now enabled on Jenkins.
On the master or release branches, `git changelog` will be run to
automatically update the changelog based on the pull requests that have
been merged, and it will then commit and push those changes.
batchWrite was using the last database and retention policy read from
the input file. Because batchWrite was called only every batchSize lines
or at EOF, databases with fewer than batchSize points could have their
data imported into the incorrect database.
This change forces a flush with batchWrite whenever processDML reads a
change in database or retention policy.
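A minimal sketch of the fix (the types and method shapes here are
hypothetical, not the importer's actual code):

    package main

    import "fmt"

    // importer is a stand-in for the import state: flush the pending
    // batch whenever the DML context switches to a new database or
    // retention policy, instead of only every batchSize lines or at EOF.
    type importer struct {
        db, rp string
        batch  []string
    }

    func (i *importer) batchWrite() {
        if len(i.batch) == 0 {
            return
        }
        fmt.Printf("writing %d points to %s.%s\n", len(i.batch), i.db, i.rp)
        i.batch = i.batch[:0]
    }

    func (i *importer) processDML(db, rp, line string) {
        if db != i.db || rp != i.rp {
            i.batchWrite() // force a flush before switching targets
            i.db, i.rp = db, rp
        }
        i.batch = append(i.batch, line)
    }

    func main() {
        imp := &importer{}
        imp.processDML("db1", "autogen", "cpu value=1")
        imp.processDML("db2", "autogen", "cpu value=2") // flushes the db1 batch first
        imp.batchWrite()
    }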
Remove some miscellaneous whitespace at the end of changelog entries and
fix two older entries. One accidentally included an inline HTML tag and
didn't display. The other was missing a newline between two changelog
entries and so didn't display correctly as separate bullet points.
* Live Restore + Enterprise data format compatibility
* Extended ImportData to import all DBs if no db name given
* Added a new enterprise data test, and the backup command now prints the backup file paths at conclusion
* Added whole-system backup test
* Update to use protobuf in all enterprise data cases
* Update test to do cross-testing with the enterprise version
* Incremental enterprise backup format support
The string `node <n>` can be used to specify which data node the data
should be retrieved from. This uses the `node_id=X` query parameter,
which is already supported but wasn't exposed anywhere in the client
library.
We use this feature enough internally when attempting to find
inconsistencies or network errors that it is easier if this is just
supported. Otherwise, I continue having to recompile the CLI program
every time I need to do this.
To clear a previously set node, you can use `node 0` or `node clear`.
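Under the hood this amounts to setting that query parameter; a rough
sketch (the helper is hypothetical):

    package main

    import (
        "fmt"
        "net/url"
    )

    // queryValues shows how the client can attach the node_id parameter;
    // a node ID of 0 simply omits the parameter, which clears any
    // previously selected data node.
    func queryValues(q string, nodeID int) url.Values {
        v := url.Values{}
        v.Set("q", q)
        if nodeID > 0 {
            v.Set("node_id", fmt.Sprint(nodeID))
        }
        return v
    }

    func main() {
        fmt.Println(queryValues("SELECT * FROM cpu", 2).Encode())
        // node_id=2&q=SELECT+%2A+FROM+cpu
    }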
There is a strange race condition where a query can be killed and finish
at approximately the same time. If this happens, the query gets
retrieved by the killing task, the query gets closed by the normal
processing thread, and then the killing task attempts to kill the query
afterwards. Since the close doesn't mark the query as already killed
(since it's not killed, merely unused), the killing thread attempts to
close the channel again.
Mark the query as killed whenever it is closed to prevent a double close
from happening. This should never cause the status to be erroneously
reported since the query status is removed from the query table within
the same lock scope.
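In code, the pattern is roughly the following (a simplified sketch with
a hypothetical struct, not the real query task):

    package main

    import "sync"

    // queryTask: Close marks the query as killed under the same lock, so
    // a later kill sees the flag and does not close the channel a second
    // time.
    type queryTask struct {
        mu     sync.Mutex
        closed bool
        done   chan struct{}
    }

    func (q *queryTask) Close() {
        q.mu.Lock()
        defer q.mu.Unlock()
        if q.closed {
            return
        }
        q.closed = true
        close(q.done)
    }

    func (q *queryTask) kill() {
        q.Close() // safe even if the query already finished and closed itself
    }

    func main() {
        q := &queryTask{done: make(chan struct{})}
        q.Close()
        q.kill() // no double close panic
    }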
When refactoring the query engine, I thought calling
`count(distinct(value))` multiple times was disallowed, so the refactor
made that impossible.
It turns out that this pattern is allowed: since the distinct is nested,
it is aggregated anyway and can be combined with other aggregates.
This removes the erroneously placed restriction.
This commit firstly ensures that a shard's size on disk is accurately
reported when using the tsi1 index, by including the on-disk size of the
tsi1 index in the calculation.
Secondly, this commit adds support for shard streaming/copying when
using the tsi1 index. Prior to this, a tsi1 index would not be correctly
restored when streaming shards.
If the close happens when next is being called, it can result in a race
condition where the current iterator gets set to nil after the initial
check.
This also fixes the finalizer so it runs the close method in a goroutine
instead of running it by itself. This is because all finalizers run on
the same goroutine so a close that takes a long time can cause a backup
for all finalizers. This also removes the redundant call to
`runtime.SetFinalizer` from the finalizer itself because a finalizer,
when called, has already cleared itself.
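A minimal sketch of the finalizer pattern described above (the iterator
type is hypothetical):

    package main

    import (
        "runtime"
        "time"
    )

    // iterator is a stand-in; the point is the finalizer pattern: run
    // Close on its own goroutine so a slow Close can't stall the single
    // goroutine that executes all finalizers, and don't re-register the
    // finalizer, since it has already been cleared by the time it runs.
    type iterator struct{ done chan struct{} }

    func (itr *iterator) Close() { close(itr.done) }

    func newIterator() *iterator {
        itr := &iterator{done: make(chan struct{})}
        runtime.SetFinalizer(itr, func(itr *iterator) {
            go itr.Close() // avoid blocking the finalizer goroutine
        })
        return itr
    }

    func main() {
        _ = newIterator()
        runtime.GC()
        time.Sleep(10 * time.Millisecond) // give the finalizer a chance to run
    }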
Change the CLI to support quoted database names in `use` statements.
This also allows for all database names to be specified, including names
that contain spaces.
If we don't detect a server version, then there's a good chance that
we're not speaking to an InfluxDB server. We should warn the user about
this to make it easier for them to debug.
The previous SHA was taken from a revision on a devel branch that I
thought would continue staying in the tree after it was merged. That
revision was rebased away and the logger's API was changed.
This updates the usage of the logger and adds a simple package for
constructing the base logger.
The 1.0 version of zap changed the format of the default console logger
so this change moves over to this new logger instead of attempting to
retain backwards compatibility with the old format.
Update support in the `toml` package for parsing human-readable byte
sizes. Supported size suffixes are "k" or "K" for kibibytes, "m" or "M"
for mebibytes, and "g" or "G" for gibibytes. If no size suffix is
specified, bytes are assumed.
In the config, `cache-max-memory-size` and `cache-snapshot-memory-size` are
now typed as `toml.Size` and support the new syntax.
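For reference, the suffix handling amounts to something like this (a
sketch only, not the actual `toml.Size` implementation):

    package main

    import (
        "fmt"
        "strconv"
        "strings"
    )

    // parseSize multiplies the numeric part by powers of 1024 according
    // to the suffix; no suffix means plain bytes.
    func parseSize(s string) (uint64, error) {
        mult := uint64(1)
        switch {
        case strings.HasSuffix(s, "k"), strings.HasSuffix(s, "K"):
            mult, s = 1<<10, s[:len(s)-1]
        case strings.HasSuffix(s, "m"), strings.HasSuffix(s, "M"):
            mult, s = 1<<20, s[:len(s)-1]
        case strings.HasSuffix(s, "g"), strings.HasSuffix(s, "G"):
            mult, s = 1<<30, s[:len(s)-1]
        }
        n, err := strconv.ParseUint(s, 10, 64)
        if err != nil {
            return 0, err
        }
        return n * mult, nil
    }

    func main() {
        fmt.Println(parseSize("25m")) // 26214400 <nil>
    }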
Windows computers may produce a UTF-16 file from the command line that
contains a byte order mark. Along with handling the UTF-8 byte order
mark, this also handles UTF-16 for better Windows compatibility.
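A simplified sketch of the detection (the real importer would also need
to transcode UTF-16 input after recognizing the mark):

    package main

    import (
        "bytes"
        "fmt"
    )

    // stripBOM recognizes the UTF-8 byte order mark as well as both
    // UTF-16 byte order marks at the start of the input.
    func stripBOM(b []byte) []byte {
        switch {
        case bytes.HasPrefix(b, []byte{0xEF, 0xBB, 0xBF}): // UTF-8 BOM
            return b[3:]
        case bytes.HasPrefix(b, []byte{0xFE, 0xFF}), // UTF-16 big-endian BOM
            bytes.HasPrefix(b, []byte{0xFF, 0xFE}): // UTF-16 little-endian BOM
            return b[2:]
        }
        return b
    }

    func main() {
        fmt.Printf("%q\n", stripBOM([]byte("\xef\xbb\xbf# DML")))
    }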
Fixes #8819.
Previously, the process of dropping expired shards according to the
retention policy duration was managed by two independent goroutines in
the retention policy service. This behaviour was introduced in #2776,
at a time when there were both data and meta nodes in the OSS codebase.
The idea was that only the leader meta node would run the meta data
deletions in the first goroutine, and all other nodes would run the
local deletions in the second goroutine.
InfluxDB no longer operates that way, so we ended up with two
independent goroutines carrying out actions that really depended on each
other.
If the second goroutine runs before the first then it may not see the
meta data changes indicating shards should be deleted and it won't
delete any shards locally. Shortly after this the first goroutine will
run and remove the meta data for the shard groups.
This results in a situation where it looks like the shards have gone,
but in fact they remain on disk (and importantly, their series within
the index) until the next time the second goroutine runs. By default
that's 30 minutes.
In the case where the shards to be removed contained the last
occurrences of some series, and the database was already at its maximum
series limit (or tag limit for that matter), it was possible that no
further new series could be inserted.
When a downstream server such as a proxy or load balancer between
InfluxDB and the client produces an error, the client currently does
not make this very obvious.
This change introduces checks on both the content type and the
InfluxDB version header to identify whether a request was served by
InfluxDB itself, and returns a more appropriate error in the cases
where it can be determined that a downstream issue is at play.
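A rough sketch of the kind of check this adds (the exact header and
content-type logic in the client may differ; `X-Influxdb-Version` is the
header InfluxDB normally sets):

    package main

    import (
        "fmt"
        "net/http"
    )

    // fromInfluxDB: InfluxDB normally sets a version header and responds
    // with JSON, so a response missing both most likely came from a proxy
    // or load balancer in between.
    func fromInfluxDB(resp *http.Response) bool {
        if resp.Header.Get("X-Influxdb-Version") != "" {
            return true
        }
        return resp.Header.Get("Content-Type") == "application/json"
    }

    func main() {
        resp := &http.Response{Header: http.Header{"Content-Type": []string{"text/html"}}}
        if !fromInfluxDB(resp) {
            fmt.Println("received a response from a downstream server, not InfluxDB")
        }
    }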
All of these services start up goroutines and then wait for the
goroutines to finish. Each of them has a `tsdb.PointBatcher` that may
return a point during the shutdown sequence. During the shutdown
sequence, a lock was held. This lock may get accessed when attempting to
write the point that came back from the `tsdb.PointBatcher`. This caused
the read lock attempt to wait forever for the write lock to be unlocked
during `Close()`.
This modifies these methods so that the write lock is released while
waiting for goroutines to finish in these three services.
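The pattern applied to each of the three services is roughly the
following (a minimal sketch with a hypothetical service type):

    package main

    import "sync"

    // service releases the write lock before waiting on its goroutines,
    // so a goroutine that still needs the read lock to write a final
    // batched point can finish.
    type service struct {
        mu   sync.RWMutex
        done chan struct{}
        wg   sync.WaitGroup
    }

    func (s *service) Close() error {
        s.mu.Lock()
        close(s.done)
        s.mu.Unlock() // release before waiting, not after

        s.wg.Wait() // goroutines may still need s.mu.RLock() while draining
        return nil
    }

    func main() {
        s := &service{done: make(chan struct{})}
        s.wg.Add(1)
        go func() {
            defer s.wg.Done()
            <-s.done
            s.mu.RLock() // e.g. writing a final point from the PointBatcher
            s.mu.RUnlock()
        }()
        _ = s.Close()
    }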
If multiple tombstones existed for a series and ended up causing the
full data to be deleted, the blocks were not removed from the offsets
in the index. This caused the TSMReader to report that a key exists
but does not have any data.
During a compaction, every key should have at least one value. Since
this invariant was broken, the compaction aborted early and ended up
dropping all series keys that are lexicographically greater than where
the breakage occurred. This would cause data to be dropped during the
compaction.
This allows the query:
SELECT mean(value) FROM cpu GROUP BY time(1d)
to function in a way that makes sense. The upper limit is implicitly
the `now()` starting time and the lower limit will be whichever interval
the lowest point falls into.
When no lower bound is specified and `max-select-buckets` is specified,
the query will only consider points that would satisfy
`max-select-buckets`. So if you have one point written in 1970, have
another point within the last minute, and then do the above query with
`max-select-buckets` being equal to 10, the older point from 1970 will
not be considered.
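As a hypothetical illustration of one plausible reading of that limit
(this helper is not the planner's actual code):

    package main

    import (
        "fmt"
        "time"
    )

    // implicitLowerBound: with no lower time bound, only the newest
    // maxBuckets intervals, ending at the interval containing now(), are
    // considered.
    func implicitLowerBound(now time.Time, interval time.Duration, maxBuckets int) time.Time {
        return now.Truncate(interval).Add(-time.Duration(maxBuckets-1) * interval)
    }

    func main() {
        now := time.Date(2017, 10, 1, 12, 0, 0, 0, time.UTC)
        fmt.Println(implicitLowerBound(now, 24*time.Hour, 10))
        // roughly ten days back from now(); a point written in 1970
        // falls outside this window and is not considered
    }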
This commit replaces the previous hashing algorithm used by pkg.Filter
with one based on xxhash. Further, drawing on the hashing literature, we
can represent k hashes with only two hash functions, where previously
the Filter was using four.
Additionally, unlike `murmur3`, `xxhash` is allocation-free, so
allocations are dramatically reduced when inserting and checking for
hashes.
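The two-hash trick is the standard one from the literature: derive k
probe positions as g_i(x) = h1(x) + i*h2(x). A sketch, assuming the
commonly used `github.com/cespare/xxhash` package and that the two
halves of one 64-bit sum serve as h1 and h2 (the Filter's real
derivation may differ):

    package main

    import (
        "fmt"

        "github.com/cespare/xxhash"
    )

    // locations derives k probe positions from a single xxhash sum
    // instead of computing k independent hashes.
    func locations(data []byte, k int, m uint64) []uint64 {
        sum := xxhash.Sum64(data)
        h1, h2 := sum&0xffffffff, sum>>32
        locs := make([]uint64, k)
        for i := uint64(0); i < uint64(k); i++ {
            locs[i] = (h1 + i*h2) % m
        }
        return locs
    }

    func main() {
        fmt.Println(locations([]byte("cpu,host=server01"), 4, 1<<20))
    }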