* chore: upgrade Go to 1.17
* fix: update circleci container to go1.17
* fix: correct unsafe conversion of []byte to string
Co-authored-by: Sam Arnold <sarnold@influxdata.com>
* chore: harmonize OSS and Enterprise subscriber initialization
* refactor: chanwriter close blocks until chanwriter is flushed (see the sketch after this list)
* refactor: remove redundant state
* test: add test for blocked subscriber during subscription update
* fix: blocked subscriber does not fail subscription update
* fix: only iterate matching subscriptions
* fix: use serialized points in subscriber queues
* feat: add total-buffer-bytes config parameter to subscriptions
* fix: put subscription serialization on the write path
* fix: subscription service only needs regular mutex, not RWMutex
* fix: review comments
* chore: update changelog
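A minimal sketch of the chanwriter pattern from the refactor above, assuming a channel-backed writer drained by a background goroutine; the `chanWriter` type and its methods are illustrative, not the actual implementation:

```go
package main

import "fmt"

// chanWriter is a hypothetical channel-backed writer: writes are queued on a
// channel and drained by a background goroutine, and Close blocks until every
// queued write has been flushed.
type chanWriter struct {
	ch   chan []byte
	done chan struct{}
}

func newChanWriter() *chanWriter {
	w := &chanWriter{ch: make(chan []byte, 64), done: make(chan struct{})}
	go func() {
		defer close(w.done)
		for b := range w.ch {
			fmt.Printf("flushing %d bytes\n", len(b)) // stand-in for the real sink
		}
	}()
	return w
}

func (w *chanWriter) Write(b []byte) { w.ch <- b }

// Close stops accepting writes and blocks until the drain goroutine has
// flushed everything already queued.
func (w *chanWriter) Close() {
	close(w.ch)
	<-w.done
}

func main() {
	w := newChanWriter()
	w.Write([]byte("point"))
	w.Close() // returns only after "point" has been flushed
}
```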
Added a check for valid UTF-8 strings in measurement names, tag names,
tag values, and field names when writing to subscriptions. Failing points
are not sent to subscribers, and the errors are logged at debug level.
Closes https://github.com/influxdata/influxdb/issues/21557
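A minimal sketch of that validation, using only the standard `unicode/utf8` package; the helper name and point layout here are hypothetical, not the actual subscriber code:

```go
package main

import (
	"log"
	"unicode/utf8"
)

// validUTF8Point is a hypothetical check mirroring the one described above:
// the measurement name, tag keys/values, and field keys must all be valid UTF-8.
func validUTF8Point(measurement string, tags map[string]string, fields []string) bool {
	if !utf8.ValidString(measurement) {
		return false
	}
	for k, v := range tags {
		if !utf8.ValidString(k) || !utf8.ValidString(v) {
			return false
		}
	}
	for _, f := range fields {
		if !utf8.ValidString(f) {
			return false
		}
	}
	return true
}

func main() {
	bad := "cpu\xff" // invalid UTF-8 in the measurement name
	if !validUTF8Point(bad, nil, nil) {
		// failing points are dropped, not sent to subscribers
		log.Printf("debug: dropping point with invalid UTF-8 measurement %q", bad)
	}
}
```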
* fix(models): grow tag index buffer if needed (#20137)
(cherry picked from commit cae14176aa)
Co-authored-by: Tristan Su <foobar@users.noreply.github.com>
fixes #21303
* chore: Update flux to 0.67
* chore: Builds against 0.68 flux
* chore: Builds against 0.80.0
* chore: Builds against 0.90.0
* chore: Everything builds on latest flux
* chore: goimports fixed
* chore: fix tests locally
* chore: fix CI dockerfiles
* chore: clean up some unused code
* chore: remove flux repl and Spec in flux query json
* chore: port flux end to end tests from 2.x
* chore: fix up goimports
* chore: remove 32 bit build support
The flux in influxdb has been upgraded to use v0.33.2. Many of the storage
engine interfaces changed during this upgrade, so code had to change to
accommodate the new interfaces and remove the old ones.
Included in this commit is a patch file for the changes that were made.
A patch was generated for the following packages:
* `flux/stdlib/influxdata/influxdb`
* `storage/reads`
* `tsdb/cursors`
These are the three packages that are in common with version 2 of the
database and the first of these packages contains the specific
implementations that are used for version 1.
It is very possible that the next time we upgrade this, the patch will
not apply cleanly just like it wouldn't have applied cleanly to this
update. The patch is mostly meant to document exactly what changed
during the copy over to help ensure we don't forget things when adapting
the interfaces.
Add a patch file to hopefully make this easier in the future
This integrates the influxdb 1.x series with the latest version of Flux
and updates the code to use it. It also removes the dependency on
platform and copies the necessary code from storage into the 1.x series
so the dependency is no longer needed.
The flux functions specific to 1.x have been moved to the same structure
that flux changed to with having a `stdlib` directory instead of a
`functions` directory. It also adds a `databases()` function that
returns the databases from the meta client.
This commit deletes most of the code to service reads from influxdb
and pulls it in from platform instead.
Of note, the models.Tag and models.Tags types are now aliases to the
platform models.Tag and models.Tags types. Additionally, many types
in the tsdb package relating to cursors are also aliases to the same
types in the platform cursors package.
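Type aliases (Go 1.9+) are what make this transparent to existing callers; a minimal sketch of the pattern, with the `platform` import path standing in for the real dependency (not compilable without it):

```go
// Sketch of the aliasing described above: the 1.x models package re-exports
// the platform types, so existing code keeps compiling while both repos
// share a single definition.
package models

import platform "github.com/influxdata/platform/models"

// Tag and Tags are aliases, not new types: a models.Tags value is a
// platform models.Tags value, so no conversion is needed at call sites.
type Tag = platform.Tag
type Tags = platform.Tags
```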
This updates the platform and flux repos to the current master in the
Gopkg.lock.
When `influx_inspect buildtsi` is used to create a new `tsi1` index, spaces in measurement names are escaped, so measurement "a b" is changed to "a\ b".
This change modifies `models.ParseKeyBytes()` and `models.ParseName()` to unescape measurement names. `models.ParseKeyBytes()` returns unescaped tag keys, so this seems like the natural place to unescape measurement names.
We also followed `scanMeasurement()` to see what other code could be problematic, and this should be everything (the result of one other use of `scanMeasurement()` is later escaped).
Removed `tsdb.MeasurementFromSeriesKey()`. These methods are exported, so we checked for side effects in other InfluxData repositories.
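A minimal sketch of the unescaping involved, assuming backslash-escaped spaces and commas as in line protocol; `unescapeMeasurement` is a hypothetical helper, not the actual models API:

```go
package main

import (
	"fmt"
	"strings"
)

// unescapeMeasurement is a hypothetical helper that reverses the escaping
// buildtsi applies, so `a\ b` becomes `a b` again.
func unescapeMeasurement(name string) string {
	return strings.NewReplacer(`\ `, " ", `\,`, ",").Replace(name)
}

func main() {
	fmt.Println(unescapeMeasurement(`a\ b`)) // prints: a b
}
```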
This commit adds the `-sanitize` flag to `influx_inspect deletetsm`
which will delete all keys that contain invalid UTF-8, non-printable
characters, or the Unicode replacement character.
Usage:
```sh
$ influx_inspect deletetsm -sanitize PATH
```
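A minimal sketch of the kind of key check `-sanitize` implies, using the standard `unicode` and `unicode/utf8` packages; `isDirtyKey` is a hypothetical name, not the actual deletetsm code:

```go
package main

import (
	"fmt"
	"unicode"
	"unicode/utf8"
)

// isDirtyKey is a hypothetical predicate matching the description above: a
// key is deleted if it is invalid UTF-8, contains a non-printable rune, or
// contains the Unicode replacement character.
func isDirtyKey(key []byte) bool {
	if !utf8.Valid(key) {
		return true
	}
	for _, r := range string(key) {
		if r == utf8.RuneError || !unicode.IsPrint(r) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(isDirtyKey([]byte("cpu,host=a")))  // false
	fmt.Println(isDirtyKey([]byte("cpu\x00host"))) // true: non-printable NUL
}
```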
This commit fixes a parsing bug that was causing extra tags to be
generated when the incoming point contained escaped commas.
As an optimisation, the slice of tags associated with a point was
being pre-allocated using the number of commas in the series key as a hint to
the appropriate size. The hinting did not account for escaped (literal)
commas in the key, though, so extra (empty) tag key and value pairs could
end up in the tags structure associated with a parsed point.
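A minimal sketch of the corrected hinting, counting only unescaped commas; the helper name is illustrative:

```go
package main

import "fmt"

// countUnescapedCommas is a hypothetical version of the size hint: a comma
// preceded by a backslash is a literal value, not a tag separator, and must
// not inflate the pre-allocated tag slice.
func countUnescapedCommas(key []byte) int {
	n := 0
	for i, c := range key {
		if c == ',' && (i == 0 || key[i-1] != '\\') {
			n++
		}
	}
	return n
}

func main() {
	// One real tag separator plus one escaped comma inside the tag value.
	key := []byte(`cpu,region=us\,west`)
	fmt.Println(countUnescapedCommas(key)) // 1, so only one tag is allocated
}
```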
* only call ParseTags when necessary
* remove dependency on inmem.Series in tsdb test package
* Measurement and Series are no longer exported. Their use is restricted
to the inmem package
* improve Measurement and Series types by exporting immutable
fields and removing unnecessary APIs and locks
Reduced startup time from 28s to 17s. Overall improvement including
#9162 reduces startup from 46s to 17s for 1MM series across 14 shards.
Deleting high cardinality series could take a very long time, cause
write timeouts, and even deadlock the process. This fixes these issues
by changing the approach for cleaning up the indexes and reducing lock
contention.
The prior approach deleted each series and updated every index (inmem)
during the delete. This was very slow and caused the index to be locked
while items in a slice were removed one by one. This has been changed
to mark series as deleted and then rebuild the index asynchronously,
which speeds up the process.
There was also a deadlock that could occur when deleting the field set.
Deleting the field set held a write lock, and the function it invoked under
the lock could try to take a read lock on the field set, which would then
deadlock. This approach was also very slow and caused write timeouts.
It now uses a faster approach that checks for the existence of the
measurement in the cache and file store, which does not take write locks.
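The deadlock follows from the fact that sync.RWMutex is not reentrant: a goroutine holding the write lock blocks forever if it then asks for the read lock. A minimal sketch of the hazard and the usual fix (an unexported variant that assumes the lock is held); the names are illustrative, not the actual field set code:

```go
package main

import (
	"fmt"
	"sync"
)

type fieldSet struct {
	mu     sync.RWMutex
	fields map[string]struct{}
}

// has takes the read lock; fine for external callers.
func (s *fieldSet) has(name string) bool {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.hasLocked(name)
}

// hasLocked assumes the caller already holds the lock.
func (s *fieldSet) hasLocked(name string) bool {
	_, ok := s.fields[name]
	return ok
}

func (s *fieldSet) delete(name string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	// Calling s.has(name) here would deadlock: we hold the write lock, and
	// has() would block waiting for the read lock. Use the *Locked variant.
	if s.hasLocked(name) {
		delete(s.fields, name)
	}
}

func main() {
	s := &fieldSet{fields: map[string]struct{}{"value": {}}}
	s.delete("value")
	fmt.Println(s.has("value")) // false
}
```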
Previously, pseudo iterators could be created for metadata such
as series, measurement, and tag data. These iterators were created
at a higher level and lacked much of the power of the query engine.
This commit moves system iterators down to the series level and
supports the following:
- _name
- _seriesKey
- _tagKey
- _tagValue
- _fieldKey
These can be used as normal fields, such as:
```sql
SELECT _seriesKey FROM cpu
```
This will return all the series keys for `cpu`.
* introduced UnsignedValue type
* leveraged existing int64 compression algorithms (RLE, Simple 8B)
* tsm and WAL can read and write UnsignedValue
* compaction is aware of UnsignedValue
* unsigned support in models, cursors, and write points
NOTE: there is no support to create unsigned points, as the line
protocol has not been modified.
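Since the unsigned encoding rides on the int64 compressors, the natural bridge is a lossless bit reinterpretation between uint64 and int64; a minimal sketch of that idea, not the actual encoder:

```go
package main

import "fmt"

// Reinterpreting the bits lets an unsigned value reuse the existing int64
// compression (RLE, Simple 8B) without losing information: the conversion
// is a bijection and round-trips exactly.
func toSigned(u uint64) int64   { return int64(u) }
func toUnsigned(i int64) uint64 { return uint64(i) }

func main() {
	v := uint64(18446744073709551615) // max uint64
	i := toSigned(v)                  // -1 as int64: same 64 bits
	fmt.Println(toUnsigned(i) == v)   // true: exact round trip
}
```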
The series key stored in TSM files includes the field. We validated
the series length using only the measurement and tag set which allowed
very large field names to overflow. This now checks the series key
as the measurement + tagset + field + the tsm field key separator size.
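A minimal sketch of the corrected bound, with `maxKeyLength` and the separator as hypothetical stand-ins for the real TSM constants:

```go
package main

import "fmt"

// Hypothetical stand-ins for the real TSM limits: the check must cover the
// full stored key, not just measurement + tag set.
const (
	maxKeyLength = 65535
	keyFieldSep  = "#!~#" // separator between series key and field key
)

// seriesKeyTooLong validates the length the way the fix describes:
// series key (measurement + tag set) + separator + field key.
func seriesKeyTooLong(seriesKey, fieldKey []byte) bool {
	return len(seriesKey)+len(keyFieldSep)+len(fieldKey) > maxKeyLength
}

func main() {
	series := []byte("cpu,host=a")
	field := make([]byte, maxKeyLength) // an oversized field name
	fmt.Println(seriesKeyTooLong(series, field[:8])) // false
	fmt.Println(seriesKeyTooLong(series, field))     // true: field overflows
}
```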
Measurement names and fields were converted between []byte and string
repeatedly, causing lots of garbage. This switches the code to use
[]byte in the write path.
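The usual Go idiom here is to keep []byte end to end and convert only at a boundary, since every []byte-to-string conversion copies; a minimal sketch, with the index map as a hypothetical stand-in for the write path:

```go
package main

import "fmt"

// addSeries keeps the measurement as []byte all the way through the call
// chain; the single string conversion happens at the map boundary instead
// of once per intermediate function.
func addSeries(index map[string]int, name []byte) {
	index[string(name)]++
}

func main() {
	index := make(map[string]int)
	name := []byte("cpu")
	addSeries(index, name)
	addSeries(index, name)
	fmt.Println(index["cpu"]) // 2
}
```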
The Point is intended to be immutable after being parsed since it
is shared by several goroutines. When dropping a field (e.g. time),
corrupted data can result if one goroutine is deleting the field
while another is marshaling the underlying byte slices.
To avoid this, the shard will just skip invalid fields and series
instead of trying to mutate them by deleting them.
There was a check to ensure that fields exist when unmarshalBinary
is called. This created a map and other garbage just to see if any
fields exist.
This changes it to use a FieldIterator that does not allocate as
much as the other method.
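A minimal sketch of the allocation-light iteration, assuming the FieldIterator API on models.Point in the 1.x models package (Next/FieldKey as used here; treat the exact method set as an assumption):

```go
package main

import (
	"fmt"

	"github.com/influxdata/influxdb/models"
)

func main() {
	pts, err := models.ParsePointsString("cpu,host=a value=1i,load=0.5 1000000000")
	if err != nil {
		panic(err)
	}
	// FieldIterator walks the encoded fields in place instead of building a
	// map, so a "does any field exist?" check allocates far less.
	iter := pts[0].FieldIterator()
	hasFields := false
	for iter.Next() {
		hasFields = true
		fmt.Printf("field: %s\n", iter.FieldKey())
	}
	fmt.Println("has fields:", hasFields)
}
```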
If a field named time was written and subsequently dropped, it could
leave a trailing comma in the series key, causing it to be unparseable
in other parts of the code.