Array cursors are enabled for storage RPC calls
tsm1:
* Implemented cursors that utilize Array decoders
storage:
* Abstractions to easily switch to Array cursors
there were two problems with this code:
1. the send on pending did not imply that the handler was running
2. there was a race starting the handler with timing out
1 is fixed by sending to a begin channel inside the handler. it is
then guaranteed that the timeout handler code has been entered.
2 is fixed by attempting to acquire the semaphore channel once before
checking the timeout channel. in this way, if there is capacity, which
in this test there is known to be, it is guaranteed to be taken. if
we check with the timer at the same time and the timer has already
fired, there is a pseudorandom chance the timer will be taken even
if there is capacity.
* Update Prometheus remote write to use metric name as measurement name and value as the field name.
* Update Prometheus remote read to use the storage.Read method to bypass the InfluxQL query engine.
multiple users have attempted to run influxdb in a docker container
with a windows host and a volume mounted from windows. that causes
problems because it apparently uses samba/cifs which does not
support fsync on directories. this patchset will, if it receives an EINVAL
on directory fsync, as is what appears to happen on samba/cifs, then it
will ignore it. this should help.
fixes#9833.
fixes#9630.
This commit adds `debug-pprof-enabled` which will start the default
`net/http/pprof` endpoint and bind against `localhost:6060`. This
will help to debug startup performance issues.
This commit adds throttling to the HTTP write endpoints based on
queue depth and, optionally, timeout. Two queues exist: `enqueued`
and `current`. The `current` queue is the number of concurrent
requests that can be processed. The `enqueued` queue limits the
maximum number of requests that can be waiting to be processed.
If the timeout is exceeded or the `enqueued` queue is full then
a `"503 Service unavailable"` code is returned and the error is
logged.
By default these options are turned off.
If the columns change between series, it will now act as if it was a new
statement id and reprint the headers. This only happens with show
diagnostics at the moment and we shouldn't add this functionality
anywhere else anyway.
Along with modifying ExecutionContext to be a context and have the
TaskManager return the context itself, this also creates a Monitor
interface and exposes the Monitor through the Context. This way, we can
access the monitor from within the query.Select method and keep all of
the limits inside of the query package instead of leaking them into the
statement executor.
An eventual goal is to remove the InterruptCh from the IteratorOptions
and use the Context instead, but for now, we'll just assign the done
channel from the Context to the IteratorOptions so at least they refer
to the same channel.
Remove the `Query` prefix from some structs and interfaces. They were
there so when the query engine was in the same package as influxql,
these would be differentiated. Now that the package name is query, the
extra prefix seems redundant.
* Live Restore + Enterprise data format compatability
* Extended ImportData to import all DB's if no db name given
* Added a new enterprise data test, and backup command now prints the backup file paths at conclusion
* Added whole-system backup test
* Update to use protobuf in all enterprise data cases
* Update to test to do cross-testing with enterprise version
* incremental enterprise backup format support
* only call ParseTags when necessary
* remove dependency on inmem.Series in tsdb test package
* Measurement and Series are no longer exported. Their use is restricted
to the inmem package
* improve Measurement and Series types by exporting immutable
fields and removing unnecessary APIs and locks
Reduced startup time from 28s to 17s. Overall improvement including
#9162 reduces startup from 46s to 17s for 1MM series across 14 shards.
* did not handle cached values correctly
* sort shards by time in either ascending or descending
order depending on the RPC request ordering to ensure they
are traversed in the correct order.
The previous sha was taken from a revision on a devel branch that I
thought would continue staying in the tree after it was merged. That
revision was rebased away and the API was changed for the logger.
This updates the usage of the logger and adds a simple package for
constructing the base logger.
The 1.0 version of zap changed the format of the default console logger
so this change moves over to this new logger instead of attempting to
retain backwards compatibility with the old format.
TODO(sgc): implement `writer` type that handles all the details
of writing frames to the RPC stream. Additional responsibilities
of writer include
* point frame recycling to reduce memory pressure
* skip empty point frames
* skip series frames with no points
When a meta query does not include a time component then it can be
answered exclusively by the index. This should result in a much faster
query execution that if the TSM engine was engaged.
This commit rewrites the following queries such that they make use
of the index where no time component is present:
- SHOW MEASUREMENTS
- SHOW SERIES
- SHOW TAG KEYS
- SHOW FIELD KEYS
Fixes#8819.
Previously, the process of dropping expired shards according to the
retention policy duration, was managed by two independent goroutines in
the retention policy service. This behaviour was introduced in #2776,
at a time when there were both data and meta nodes in the OSS codebase.
The idea was that only the leader meta node would run the meta data
deletions in the first goroutine, and all other nodes would run the
local deletions in the second goroutine.
InfluxDB no longer operates in that way and so we ended up with two
independent goroutines that were carrying out an action that was really
dependent on each other.
If the second goroutine runs before the first then it may not see the
meta data changes indicating shards should be deleted and it won't
delete any shards locally. Shortly after this the first goroutine will
run and remove the meta data for the shard groups.
This results in a situation where it looks like the shards have gone,
but in fact they remain on disk (and importantly, their series within
the index) until the next time the second goroutine runs. By default
that's 30 minutes.
In the case where the shards to be removed would have removed the last
occurences of some series, then it's possible that if the database was already at its
maximum series limit (or tag limit for that matter), no further new series
can be inserted.
* Fprint* functions
* No nakedness
* clarify panic messages
* spacing between case statements
* remove break in favor of return
* remove goto in favor of for { continue }
All of these services start up goroutines and then wait for the
goroutines to finish. Each of them has a `tsdb.PointBatcher` that may
return a point during the shutdown sequence. During the shutdown
sequence, a lock was held. This lock may get accessed when attempting to
write the point that came back from the `tsdb.PointBatcher`. This caused
the read lock attempt to wait forever for the write lock to be unlocked
during `Close()`.
This modifies these methods so that the write lock is released while
waiting for goroutines to finish in these three services.