It prints the statistics of each iterator that will access the storage
engine. For each access of the storage engine, it will print the number
of shards that will potentially be accessed, the number of files that
may be accessed, the number of series that will be created, the number
of blocks, and the size of those blocks.
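The per-iterator summary can be thought of as a small cost struct along these lines (a sketch; the field names are illustrative rather than necessarily the exact ones used):

```go
// Sketch of a per-iterator cost summary; field names are illustrative.
type IteratorCost struct {
	NumShards  int64 // shards that will potentially be accessed
	NumFiles   int64 // files that may be accessed
	NumSeries  int64 // series that will be created
	BlocksRead int64 // number of blocks that will be read
	BlockSize  int64 // total size of those blocks in bytes
}
```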
This is used quite a bit to determine which fields are needed in a
condition. When the condition gets large, memory usage begins to slow it
down considerably, and it doesn't deduplicate the names it returns.
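A minimal sketch of one way to address the duplicate problem, deduplicating referenced names with a map while walking the expression (this assumes the `github.com/influxdata/influxql` import path and is not the actual implementation):

```go
package main

import (
	"fmt"

	"github.com/influxdata/influxql" // assumed import path
)

// uniqueRefs walks a condition and returns each referenced name once,
// using a map to avoid the duplicate entries a naive walk would produce.
func uniqueRefs(cond influxql.Expr) []string {
	seen := make(map[string]struct{})
	var names []string
	influxql.WalkFunc(cond, func(n influxql.Node) {
		ref, ok := n.(*influxql.VarRef)
		if !ok {
			return
		}
		if _, dup := seen[ref.Val]; dup {
			return
		}
		seen[ref.Val] = struct{}{}
		names = append(names, ref.Val)
	})
	return names
}

func main() {
	expr, _ := influxql.ParseExpr(`host = 'a' OR (host = 'b' AND value > 10)`)
	fmt.Println(uniqueRefs(expr)) // [host value]
}
```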
influx_inspect walks the data and wal directories building a list of
files to export. It then opens, reads, and exports each. If the file was
deleted between the time it was added to the list and the time the
inspect tool attempts to read it, the file is now skipped without
emitting an error.
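The skipping behaviour amounts to treating a missing file as non-fatal when opening it; a minimal sketch, with a hypothetical `export` callback standing in for the tool's actual export logic:

```go
package exporter // illustrative sketch, not the actual influx_inspect code

import "os"

// exportFiles opens and exports each listed file, silently skipping any
// file that was deleted between being listed and being opened.
func exportFiles(paths []string, export func(*os.File) error) error {
	for _, p := range paths {
		f, err := os.Open(p)
		if os.IsNotExist(err) {
			continue // removed after it was listed; skip without an error
		} else if err != nil {
			return err
		}
		if err := export(f); err != nil {
			f.Close()
			return err
		}
		f.Close()
	}
	return nil
}
```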
Previously, subqueries would honor their own ordering. We never really
supported that, and I have no idea if it would work since most parts of
the query engine assume that points are being delivered in only one
ordering.
Subqueries have now been modified so that if a person tries to use a
different ordering, they get an error when running the query. If they
specify an ordering in the topmost query, that ordering gets propagated
to all subqueries.
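For example, a query along these lines (a hypothetical illustration) is now rejected because the subquery asks for a different ordering than the outer query:
SELECT value FROM (SELECT value FROM cpu ORDER BY time ASC) ORDER BY time DESC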
Fixes #8699.
When merging streams of system iterators we don't use tags or time.
Instead we add series keys (in the case of, for example, `SHOW SERIES`)
to the `Aux` field of the iterators' elements. This is because we only
emit merged and sorted sets of series keys to the client.
We currently use `SortedMergeHeap`s to merge together multiple
iterators, and the comparator function did not consider `Aux` fields
when determining which item to pop off the heap next during a merge. As
such, `SHOW SERIES` and `SHOW TAG KEYS` (any meta query that gets
converted into a special type of `SELECT`) were returning results in
arbitrary order.
This issue was never noticed on the `inmem` index because the streams
are always duplicates of each other, and of course it doesn't matter if
you arbitrarily merge together two identical, sorted streams...
The issue first manifested itself on the `tsi1` index, but this fix will
apply to both indexes.
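As a rough illustration of the fix, with simplified stand-in types rather than the actual query engine code, the heap's ordering function needs a final tie-break on the `Aux` value:

```go
// mergeItem is a stand-in for an element buffered from one input iterator.
type mergeItem struct {
	name string // measurement name
	time int64  // timestamp
	aux  string // e.g. the series key carried in the Aux field
}

type sortedMergeHeap []mergeItem

func (h sortedMergeHeap) Len() int      { return len(h) }
func (h sortedMergeHeap) Swap(i, j int) { h[i], h[j] = h[j], h[i] }

// Less orders by name and time as before, then falls back to the Aux
// value so merged meta-query output is deterministic and sorted.
func (h sortedMergeHeap) Less(i, j int) bool {
	if h[i].name != h[j].name {
		return h[i].name < h[j].name
	}
	if h[i].time != h[j].time {
		return h[i].time < h[j].time
	}
	return h[i].aux < h[j].aux // previously ignored, giving arbitrary order
}
```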
Other applications or services sometimes expose a header containing a
unique ID, which can then be included in logging or response information
to allow an operator to link inter-service requests. The most common
header name used by services in the wild appears to be `X-Request-ID`,
but `Request-Id` is also used.
This commit adds support for specifying either `X-Request-ID` or
`Request-Id` headers, which will then be used by InfluxDB when logging
request information, and also in the `X-Request-ID` and `Request-Id`
response headers.
We populate both `X-Request-ID` and `Request-Id` to maintain backwards
compatibility with previous versions, and to support the more common
`X-Request-ID` header name.
If both `X-Request-ID` and `Request-Id` are specified, then
`X-Request-ID` is used.
If neither header is specified, then in line with previous behaviour, we
generate a v1 UUID.
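A sketch of the selection logic described above; the package, helper name, and UUID import are illustrative rather than the exact code:

```go
package httpd // sketch only

import (
	"net/http"

	"github.com/influxdata/influxdb/uuid" // assumed import path
)

// requestID picks the request ID from the incoming headers, preferring
// X-Request-ID over Request-Id and generating a v1 UUID when neither is
// present, then echoes the chosen ID back on both response headers.
func requestID(w http.ResponseWriter, r *http.Request) string {
	rid := r.Header.Get("X-Request-ID")
	if rid == "" {
		rid = r.Header.Get("Request-Id")
	}
	if rid == "" {
		rid = uuid.TimeUUID().String() // v1 (time-based) UUID
	}
	w.Header().Set("X-Request-ID", rid)
	w.Header().Set("Request-Id", rid)
	return rid
}
```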
If a large compaction was running and was aborted, it could leave some
tmp files around for files that it had fully written. The currently
active file was cleaned up, but already completed ones were not. This
would occur when a TSM file needed to roll over due to size.
The ConditionExpr function is more accurate because it parses the
condition and ensures that time conditions are actually used correctly.
That means that attempting to combine conditions with OR will not result
in the query silently pretending it's an AND, and nested conditions work
correctly so there is only one way to read the query.
It also extracts the non-time conditions into a separate condition so we
can stop attempting to parse around the time conditions in lower layers
of the storage engine. This change does not remove those hacks, but a
following commit should be able to sanitize the condition and remove
them.
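A hedged usage sketch of `ConditionExpr` (the exact signature and import path may differ slightly from what's shown):

```go
package main

import (
	"fmt"
	"time"

	"github.com/influxdata/influxql" // assumed import path
)

func main() {
	expr, _ := influxql.ParseExpr(`host = 'server01' AND time >= now() - 1h`)

	// ConditionExpr validates how time is used and splits the condition:
	// the returned expression has the time comparisons stripped out, and
	// the time range is resolved separately.
	cond, timeRange, err := influxql.ConditionExpr(expr, &influxql.NowValuer{Now: time.Now()})
	if err != nil {
		fmt.Println(err) // e.g. time conditions combined with OR are rejected
		return
	}
	fmt.Println(cond)                         // host = 'server01'
	fmt.Println(timeRange.Min, timeRange.Max) // resolved time bounds
}
```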
This commit provides more insight into server errors by both setting
the error on a response header and, in the case of server errors (5xx),
logging those error messages to the HTTPD log, if `[http] log_enabled =
true`.
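A simplified sketch of the behaviour; the package, helper, and header name are illustrative:

```go
package httpd // sketch only

import (
	"log"
	"net/http"
)

// writeError puts the error message on a response header and, for 5xx
// responses, also writes it to the HTTP request log when logging is on.
func writeError(w http.ResponseWriter, logger *log.Logger, logEnabled bool, status int, err error) {
	w.Header().Set("X-InfluxDB-Error", err.Error()) // header name is illustrative
	w.WriteHeader(status)
	if logEnabled && status >= 500 {
		logger.Printf("[%d] %s", status, err)
	}
}
```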
Previously, pseudo iterators could be created for metadata such
as series, measurement, and tag data.
at a higher level and lacked a lot of the power of the query engine.
This commit moves system iterators down to the series level and
supports the following:
- _name
- _seriesKey
- _tagKey
- _tagValue
- _fieldKey
These can be used as normal fields such as:
SELECT _seriesKey FROM cpu
This will return all the series keys for `cpu`.
There was a change to speed up deleting and dropping measurements
that executed the deletes in parallel for all shards at once (#7015).
When TSI was merged in #7618, the series keys passed into Shard.DeleteMeasurement
were removed and were expanded lower down. This causes memory to blow up
when a delete across many shards occurs as we now expand the set of series
keys N times instead of just once as before.
While running the deletes in parallel would be ideal, there have been a number
of optimizations in the delete path that make running deletes serially pretty
good. This change just limits the concurrency of the deletes which keeps memory
more stable.
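The limit can be implemented with a plain semaphore channel; a minimal sketch of the idea, with illustrative names:

```go
// runLimited executes the per-shard delete functions with at most
// `limit` running concurrently, which keeps memory usage stable.
func runLimited(deletes []func() error, limit int) error {
	sem := make(chan struct{}, limit)
	errC := make(chan error, len(deletes))
	for _, del := range deletes {
		sem <- struct{}{} // blocks once `limit` deletes are in flight
		go func(del func() error) {
			defer func() { <-sem }()
			errC <- del()
		}(del)
	}
	var firstErr error
	for range deletes {
		if err := <-errC; err != nil && firstErr == nil {
			firstErr = err
		}
	}
	return firstErr
}
```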
When snapshots and compactions are disabled, the check to see if
the compaction should be aborted occurs only in between writing TSM
files. If a large compaction is running, it might take a while for the
current file to finish writing, causing long delays.
This now interrupts compactions while iterating over the blocks to
write, which allows them to abort immediately.
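Conceptually, the write loop now polls an interrupt channel on every block rather than only between output files; a simplified sketch (not the real compactor types):

```go
package compact // illustrative sketch, not the real compactor

import "errors"

// writeBlocks writes each block, checking the interrupt channel first so
// an abort takes effect immediately instead of only between output files.
func writeBlocks(blocks [][]byte, interrupt <-chan struct{}, write func([]byte) error) error {
	for _, block := range blocks {
		select {
		case <-interrupt:
			return errors.New("compaction aborted") // stop mid-file
		default:
		}
		if err := write(block); err != nil {
			return err
		}
	}
	return nil
}
```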
When a time zone shift happened at the edge of a time bucket, the logic
was incorrect. When we added an hour to the day with a time grouping of
2 hours, we took the hour away from the previous bucket instead of the
next bucket so the first bucket of the day would be from midnight to 1
AM and the second bucket would include 1 AM to 4 AM (2 AM to 3 AM does
not exist when shifting forwards). The correct buckets should have been
12 AM to 2 AM for the first bucket of the day and 3 AM to 4 AM for the
second (since 2 AM to 3 AM does not exist).
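For reference, a small Go snippet illustrating the shift itself (the specific date and time zone are just for illustration):

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	loc, _ := time.LoadLocation("America/Los_Angeles")
	// 2017-03-12 is a spring-forward day in this zone: the wall clock
	// jumps from 2 AM to 3 AM, so two hours after midnight reads 3 AM.
	midnight := time.Date(2017, 3, 12, 0, 0, 0, 0, loc)
	fmt.Println(midnight.Add(2 * time.Hour)) // 2017-03-12 03:00:00 -0700 PDT
}
```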
This also modifies the tests to use table tests and subtests.
This calculates the start and end time along with any time zones shifts
so that continuous queries are run at the correct time when a time zone
is included in the query.
Removing the forced `Connection: close` header from the `/query`
endpoint. This was originally added because of golang/go#13165, but it
seems like it's possible to use pipelining with Go 1.8 and HTTP/1.1,
just not recommended.
After some testing, it appears that the channel returned by
`ResponseWriter.CloseNotify()` will not send a value if the connection
was not interrupted. We already account for this in `/query` by
signaling another channel when the request has finished so the goroutine
can exit.
Since the handler already accounts for the possibility that the channel
will not signal and since `CloseNotify()` doesn't interfere with a
pipelined request, we can remove the forced `Connection: close` that was
added to force clients to establish a new connection.
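A simplified sketch of that pattern; the package, handler, and variable names are illustrative:

```go
package httpd // sketch only

import "net/http"

// handleQuery watches for a dropped connection while the query runs, but
// also listens on a done channel so the watcher goroutine exits even
// though CloseNotify never signals for requests that finish normally.
func handleQuery(w http.ResponseWriter, r *http.Request) {
	closing := w.(http.CloseNotifier).CloseNotify()
	done := make(chan struct{})
	defer close(done)

	go func() {
		select {
		case <-closing:
			// client disconnected mid-request: abort the running query here
		case <-done:
			// request finished normally; nothing to do
		}
	}()

	// ... execute the query and stream the results to w ...
}
```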