If there is a significant amount of data in the WAL, then building the
TSI index can be problematic without being able to set the max cache
size to something larger.
This commit adds an option to set the maximum cache size.
This commit fixes an issue with the series file compaction process
where tombstones are lost after compaction and series existence
checks are not correct. This commit also fixes some smaller flushing
issues within the series file that are mainly related to testing.
* the protocol service definition, ReadRequest and ReadResponse are
reused across projects, rather than requiring redefinition.
* the ReadRequest protocol buffer definition removes the concept of a
database and retention policy, replacing it with a field named
ReadSource of type google.protobuf.Any. OSS requests will use the
ReadSource message structure defined local to this package, which
defines fields to represent a Database and RetentionPolicy. Other
implementations can provide their own data structure allowing the
remainder of the ReadRequest to be reused.
* The RPC service and Store are expected to be redefined to handle their
specific requirements for resolving a ReadSource
* ResultSet and GroupResultSet are interfaces representing non-grouping
and grouping read behavior respectively. Calling NewResultSet or
NewGroupResultSet will construct instances of these types
* The ResponseWriter type is exported to deal with serialization of
the ResultSet and GroupResultSet types
When adding many series using offline tooling, it's likely that every
series involves an entry being appended to a LogFile. Typically an entry
is 11 or 12 bytes, but the default bufio.Writer buffer size is only 4K.
This means by default a write of 10,000 new series would involve ~30
buffer flushes.
This commit makes the buffer configurable, and sets the value in
`buildtsi` such that it reflects the number of series being written to
the LogFile.
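
The flush arithmetic above can be checked with a small sketch. It is not the
LogFile code: `countingWriter` and `countFlushes` are hypothetical names, and
the 12-byte entry is the upper figure quoted above. It counts how many times a
`bufio.Writer` hands data to the underlying writer for a given buffer size.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
)

// countingWriter counts calls to Write. Each call corresponds to one
// buffer flush performed by the bufio.Writer wrapped around it.
type countingWriter struct {
	w      io.Writer
	writes int
}

func (c *countingWriter) Write(p []byte) (int, error) {
	c.writes++
	return c.w.Write(p)
}

// countFlushes writes n entries of entrySize bytes through a bufio.Writer
// with a buffer of bufSize bytes and returns the number of underlying writes,
// including the final explicit Flush.
func countFlushes(n, entrySize, bufSize int) int {
	cw := &countingWriter{w: io.Discard}
	bw := bufio.NewWriterSize(cw, bufSize)
	entry := make([]byte, entrySize)
	for i := 0; i < n; i++ {
		bw.Write(entry) // errors ignored: io.Discard never fails
	}
	bw.Flush()
	return cw.writes
}

func main() {
	// Default-sized 4K buffer versus a buffer sized to hold every entry.
	fmt.Println("4K buffer:   ", countFlushes(10000, 12, 4096))
	fmt.Println("sized buffer:", countFlushes(10000, 12, 10000*12))
}
```

Sizing the buffer to the expected number of entries collapses the flushes
into a single write, which is the effect the configurable buffer is after.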
* utilizes `tsm1.Compactor#CompactFull` to fully compact the specified
shard
* the WAL is unmodified
* added `-verbose` option to show progress as TSM files are opened
This should not have caused correctness issues, but it is an unintended
side effect that exporting data may cause compactions to run. It is
possible that a compaction would not run to completion, leaving .tmp
files around after an export.
* Update Prometheus remote write to use metric name as measurement name and value as the field name.
* Update Prometheus remote read to use the storage.Read method to bypass the InfluxQL query engine.
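
A minimal sketch of the remote write mapping described above, under stated
assumptions: `sample` and `toLineProtocol` are hypothetical names, the metric
name is taken from the conventional Prometheus `__name__` label, the remaining
labels become tags, and escaping of special characters is omitted. It is an
illustration only, not the conversion code in this commit.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
	"strconv"
	"strings"
	"time"
)

// sample is a simplified, hypothetical stand-in for one Prometheus sample.
type sample struct {
	Labels      map[string]string
	Value       float64
	TimestampMs int64 // Prometheus timestamps are in milliseconds
}

// toLineProtocol renders a sample as a line protocol point: the metric name
// becomes the measurement, other labels become tags, and the sample value is
// stored in a field named "value". Escaping is deliberately left out.
func toLineProtocol(s sample) (string, error) {
	name := s.Labels["__name__"]
	if name == "" {
		return "", errors.New("sample has no metric name")
	}

	// Sort tag keys so the output is deterministic.
	keys := make([]string, 0, len(s.Labels))
	for k := range s.Labels {
		if k != "__name__" {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys)

	var b strings.Builder
	b.WriteString(name)
	for _, k := range keys {
		b.WriteString("," + k + "=" + s.Labels[k])
	}
	b.WriteString(" value=" + strconv.FormatFloat(s.Value, 'f', -1, 64))
	// Convert milliseconds to nanoseconds for the point timestamp.
	b.WriteString(" " + strconv.FormatInt(s.TimestampMs*int64(time.Millisecond), 10))
	return b.String(), nil
}

func main() {
	line, err := toLineProtocol(sample{
		Labels:      map[string]string{"__name__": "http_requests_total", "job": "api"},
		Value:       3,
		TimestampMs: 1500,
	})
	fmt.Println(line, err)
}
```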
This commit adds `debug-pprof-enabled` which will start the default
`net/http/pprof` endpoint and bind against `localhost:6060`. This
will help to debug startup performance issues.
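
For reference, a minimal sketch of what such an option amounts to in Go. The
option is modeled here as a command-line flag purely for illustration, and
`servePprof` is a hypothetical helper; the actual wiring in this commit may
differ.

```go
package main

import (
	"flag"
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/ handlers on http.DefaultServeMux
)

// pprofAddr is the bind address named in the commit message.
const pprofAddr = "localhost:6060"

// servePprof serves the default pprof endpoint when enabled and blocks until
// the listener fails. When disabled it returns nil immediately. A real server
// would normally run this in its own goroutine early during startup.
func servePprof(enabled bool) error {
	if !enabled {
		return nil
	}
	// A nil handler means http.DefaultServeMux, where pprof registered itself.
	return http.ListenAndServe(pprofAddr, nil)
}

func main() {
	enabled := flag.Bool("debug-pprof-enabled", false, "serve net/http/pprof on "+pprofAddr)
	flag.Parse()
	if err := servePprof(*enabled); err != nil {
		log.Fatal(err)
	}
}
```

With the endpoint up, a profile can be collected using the standard tooling,
for example `go tool pprof http://localhost:6060/debug/pprof/profile`.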
- Expose io for testing
- Initialize logging only when present
- Fix nil cases when replacing retention policies
- Use meta client when getting shard groups
- Disallow updating retention policies
- Delete shard files, not shard groups, when replacing shards
- Add duration and replication options for retention policy
This commit adds the `-sanitize` flag to `influx_inspect deletetsm`
which will delete all keys that contain invalid UTF-8, non-printable
characters, or the Unicode replacement character.
Usage:
```sh
$ influx_inspect deletetsm -sanitize PATH
```
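
A rough sketch of the kind of check this implies, using only the standard
library. `needsSanitizing` is a hypothetical name and the exact rules applied
by `deletetsm` may differ; this only illustrates the three conditions listed
above.

```go
package main

import (
	"fmt"
	"unicode"
	"unicode/utf8"
)

// needsSanitizing reports whether key contains invalid UTF-8, a
// non-printable rune, or the Unicode replacement character (U+FFFD).
func needsSanitizing(key []byte) bool {
	for len(key) > 0 {
		r, size := utf8.DecodeRune(key)
		// An invalid encoding decodes as (RuneError, 1) and a literal U+FFFD
		// decodes as (RuneError, 3); both cases are rejected here.
		if r == utf8.RuneError || !unicode.IsPrint(r) {
			return true
		}
		key = key[size:]
	}
	return false
}

func main() {
	for _, key := range [][]byte{
		[]byte("cpu,host=a#!~#value"),
		[]byte("cpu\x00,host=a"),
		{0xff, 0xfe},
	} {
		fmt.Printf("%q needs sanitizing: %v\n", key, needsSanitizing(key))
	}
}
```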
Does some basic sanity checks. It's hard to be more exhaustive without
either taking a crazy amount of time or being non-deterministic,
but at least this makes sure we barf in some cases.
Updated flags and help text, and removed documentation for deprecated legacy options. Updated documentation to describe the syntax and options for the newer -portable format. Legacy support remains, but is referenced only in the online documentation.
A format.Writer is an abstraction for reading data from
storage.ResultSet and writing to various formats. Those included are:
* binary: efficient binary format using protocol buffers. This is the
expected format for the import tooling. The data is written in the
desired shard group shape so that it can be read and streamed to
TSM files without further transformation.
* line: line protocol, used for exporting field type conflicts or as an
alternative, lossless export format
* text: two debugging modes for outputting series or series and values
in a more efficient format than line protocol.
* discard: reads and discards the source data. This can be useful for
benchmarking and profiling the read and decode performance.