* Extend storage service protobuf with TagKeys and TagValues
Co-authored-by: Michael Desa <mjdesa@gmail.com>
Co-authored-by: Jacob Marble <jacobmarble@influxdata.com>
* Extend the reads.Store interface with new TagKeys and TagValues APIs
* Extend readservice.store to implement refactored reads.Store interface
* Implement a StringIterator gRPC writer / serializer
* Implement a StringIterator gRPC reader / deserializer
* Implement a StringIterator merger
Storage converts references to _value in filter predicates to $.
It then considers any predicate that does not reference $ to be
a tag predicate. Tag predicates are used to construct series index
cursors.
This commit fixes a bug where field comparisons were being included
in tag predicates due to the HasFieldValueKey function searching
for comparison expressions referencing _value instead of $. Because
all references to _value had already been replaced. An expression
of the form '$ < 3000' would be considered a tag expression and
therefore would be mistakenly included as a tag predicate.
Fixes#13159.
A cursor can be in process (local) or remote. Remote cursors have
already applied the `hasPoints` check, to reduce network traffic.
Testing whether the cursor has points again here:
https://github.com/influxdata/influxdb/blob/master/storage/reads/reader.go#L221
will always return `false` for a remote cursor that has applied the
NoPoints optimization.
This temporary fix allows the `hasPoints` function to differentiate
a streamCursor and always return true in that case.
The storage engine will now drop any points that contain invalid tag
data. Special tag keys for the measurement and field key will be
excepted from this validation.
We will want to validate that all tag key/value data is valid unicode.
This commit changes the validation helper to only validate provided
tags, since measurements are currently very likely to contain invalid
utf-8 characters.
There are two exceptions to the tag validation: the validation of the
special tag keys for measurements and field keys.
When the WAL was moved up, the validation that happened at the cache
was skipped. This moves the field type validation for a batch of
points up ahead of the WAL again.
It is possible a StreamReader (gRPC) may return an empty response. This
change adds retry and bail-out support. When a bail-out occurs,
reads.ErrStreamNoData is returned.
StorageReadClient adapts a gRPC Storage_ReadClient to provide
cursors.CursorStats by reading the trailer after receiving the final
message from the stream.
This commit adds the pkg/lifecycle.Resource to help manage opening,
closing, and leasing out references to some resource. A resource
cannot be closed until all acquired references have been released.
If the debug_ref tag is enabled, all resource acquisitions keep
track of the stack trace that created them and have a finalizer
associated with them to print on stderr if they are leaked. It also
registers a handler on SIGUSR2 to dump all of the currently live
resources.
Having resources tracked in a uniform way with a data type allows us
to do more sophisticated tracking with the debug_ref tag, as well.
For example, we could panic the process if a resource cannot be
closed within a certain time frame, or attempt to figure out the
DAG of resource ownership dynamically.
This commit also fixes many issues around resources, correctness
during error scenarios, reporting of errors, idempotency of
close, tracking of memory for some data structures, resource leaks
in tests, and out of order dependency closes in tests.
It turns out that LastModified and DiskSize are unused, and so it
was easy to change to not care about the WAL.
This hooks up metrics and starts the WAL again.
At the cost of some nil checks, we don't have to have an interface, defend against
subtle bugs with nils in non-nil interfaces, an empty implementation, etc.
Also, the tsm1 engine is losing the WAL anyway.
Because the WAL relies on the tsm1.Value type, we move that into its own
tsm1/value package and set up some aliases forwarding them into tsm1. This
also required adding some methods and changing consumers to avoid the
unexported fields. I imagine this step will be useful one day when we make
the write path more efficient with respect to consuming points.
This commit additionally fixes some issues with generation. The iterator.tmpldata
and generation for array_cursor_* were removed accidentally when removing
iterators, making those generated files stale. Restore that and regenerate.
No change in functionality.
This commit improves the performance of a mass delete on the TSI index
by deleting at the measurement level instead of deleting each series
individually.
I did this with a dumb editor macro, so some comments changed too.
Also rename root package from platform to influxdb.
In interest of minimizing risk, anyone importing the root package has
now aliased it to "platform" so that no changes beyond imports were
necessary in those files.
Lastly, replace the old platform module to local path /dev/null so that
nobody can accidentally reintroduce a platform dependency while
migrating platform code to influxdb.
The storage bucket service wraps another bucket service, and invokes
actions on a storage engine based upon the actions taken upon buckets.
Currently, the storage bucket service will delete bucket data from the
storage engine when the bucket is deleted via the bucket service.
A standard Makefile is used now in all subdirs that run go generate.
Make will only generate the file if its source files changed.
The checkgenerate target runs clean to ensure all targets a generated
fresh.
The table interface was modified to expose the arrow buffers. The
storage table has now been converted to use this interface with the same
fixes so that it exposes arrow buffers.
The influxql package has also been updated to use the `DoArrow` method
from the `flux.Table` interface.
Moves the check to determine if a series has data for the current time
range into the groupBySort function, which is consistent with the
groupNoneSort function.
The flux query controller was updated to include a Shutdown method a
while ago. Explicitly handle query controller creation and shutdown
where applicable.
In influxd, this ensures that outstanding queries are handled before the
process dies. In tests, this ensures that query controller goroutines
aren't leaked, which drastically simplifies reading full stack traces.
This change also registers query controller metrics with the prometheus
registry in influxd.