When adding many series using offline tooling, it's likely that every
series involves an entry being appended to a LogFile. Typically an entry
is 11 or 12 bytes, but the default bufio.Writer buffer size is only 4K.
This means by default a write of 10,000 new series would involve ~30
buffer flushes.
This commit makes the buffer configurable, and sets the value in
`buildtsi` such that it reflects the number of series being written to
the LogFile.
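
The flush arithmetic above can be checked with a small sketch. It is not the LogFile code: `countingWriter` and `flushCount` are hypothetical names used only for illustration, and the 12-byte entry size is the upper figure quoted above. The only real API used is `bufio.NewWriterSize`.

```go
package main

import (
	"bufio"
	"fmt"
)

// countingWriter (hypothetical) records how many times the buffered
// writer hands data to the underlying writer.
type countingWriter struct{ writes int }

func (w *countingWriter) Write(p []byte) (int, error) {
	w.writes++
	return len(p), nil
}

// flushCount appends n entries of entrySize bytes through a bufio.Writer
// with the given buffer size and returns the number of underlying writes,
// including the final explicit Flush.
func flushCount(n, entrySize, bufSize int) int {
	cw := &countingWriter{}
	bw := bufio.NewWriterSize(cw, bufSize)
	entry := make([]byte, entrySize)
	for i := 0; i < n; i++ {
		bw.Write(entry)
	}
	bw.Flush()
	return cw.writes
}

func main() {
	// Default-sized 4K buffer versus a buffer sized for the whole batch.
	fmt.Println(flushCount(10000, 12, 4096))
	fmt.Println(flushCount(10000, 12, 10000*12))
}
```

Sizing the buffer from the expected number of series trades memory for fewer writes to the underlying file, which is the trade-off the commit makes for `buildtsi`.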
* utilizes `tsm1.Compactor#CompactFull` to fully compact the specified
shard
* the WAL is unmodified
* added `-verbose` option to show progress as TSM files are opened
This should not have caused correctness issues, but it is an unintended
side effect that exporting data may cause compactions to run. It is
possible that a compaction would not run to completion, leaving .tmp
files around after an export.
* Update Prometheus remote write to use metric name as measurement name and value as the field name.
* Update Prometheus remote read to use the storage.Read method to bypass the InfluxQL query engine.
This commit adds `debug-pprof-enabled` which will start the default
`net/http/pprof` endpoint and bind against `localhost:6060`. This
will help to debug startup performance issues.
- Expose io for testing
- Initialize logging only when present
- Fix nil cases when replacing retention policies
- Use meta client when getting shard groups
- Disallow updating retention policies
- Delete shard files, not shard groups, when replacing shards
- Add duration and replication options for retention policy
This commit adds the `-sanitize` flag to `influx_inspect deletetsm`
which will delete all keys that contain invalid, non-printable, or
replacement character unicode.
Usage:
```sh
$ influx_inspect deletetsm -sanitize PATH
```
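
A minimal sketch of the kind of check `-sanitize` implies, assuming a key counts as bad if it is not valid UTF-8, contains a non-printable rune, or contains the Unicode replacement character. `needsSanitize` is a hypothetical helper, not the function used by `influx_inspect`.

```go
package main

import (
	"fmt"
	"unicode"
	"unicode/utf8"
)

// needsSanitize (hypothetical) reports whether key contains invalid UTF-8,
// a non-printable rune, or the Unicode replacement character U+FFFD.
func needsSanitize(key []byte) bool {
	if !utf8.Valid(key) {
		return true
	}
	for _, r := range string(key) {
		// U+FFFD is printable according to unicode.IsPrint, so it needs
		// its own check.
		if r == utf8.RuneError || !unicode.IsPrint(r) {
			return true
		}
	}
	return false
}

func main() {
	keys := [][]byte{
		[]byte("cpu,host=a"),
		[]byte("cpu\x00,host=a"),
		[]byte("cpu\xff,host=a"),
		[]byte("cpu\uFFFD,host=a"),
	}
	for _, k := range keys {
		fmt.Printf("%q -> %v\n", k, needsSanitize(k))
	}
}
```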
Does some basic sanity checks. It's hard to be more exhaustive without
either taking a crazy amount of time or being non-deterministic,
but at least this makes sure we barf in some cases.
Updated flags and help text, and removed documentation for deprecated legacy options. Updated documentation to describe the syntax and options for the newer -portable format. Legacy support remains, but is only referenced in the online documentation.
A format.Writer is an abstraction for reading data from
storage.ResultSet and writing to various formats. Those included are:
* binary: efficient binary format using protocol buffers. This is the
expected format for the import tooling. The data is written in the
desired shard group shape so that it can be read and streamed to
TSM files without further transformation.
* line: line protocol, used for exporting field type conflicts or as an
alternative, lossless export format
* text: two debugging modes for outputting series or series and values
in a more efficient format than line protocol.
* discard: reads and discards the source data. This can be useful for
benchmarking and profiling the read and decode performance.
This code has been duplicated to other projects and its implementations
have grown out of sync. Now the code can live as a package-level
function rather than a method coupled with particular structs.
Remove the `Query` prefix from some structs and interfaces. They were
there so when the query engine was in the same package as influxql,
these would be differentiated. Now that the package name is query, the
extra prefix seems redundant.
For any systems that want to read the log file in a specific format,
the logo being printed on restart may cause problems, since the log
parser would have to be aware of the logo's existence or be capable of
ignoring lines it couldn't parse.
This gives an option to disable the printed logo if required.
Like other logging options, this will fail if the configuration file
itself is invalid.
This also adds support for both the v1 and v2 APIs to set the Proxy
function on the underlying http.Transport.
The functionality for proxy environment variables is provided by the
[http.ProxyFromEnvironment](https://godoc.org/net/http#ProxyFromEnvironment).
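
As a rough illustration of setting the Proxy function on an `http.Transport`, the sketch below builds a plain `http.Client`. The helper names `newProxyAwareClient` and `usesProxyHook` are hypothetical and are not part of either client API.

```go
package main

import (
	"fmt"
	"net/http"
)

// newProxyAwareClient (hypothetical) returns a client whose transport
// consults HTTP_PROXY, HTTPS_PROXY and NO_PROXY for each request.
func newProxyAwareClient() *http.Client {
	return &http.Client{
		Transport: &http.Transport{
			Proxy: http.ProxyFromEnvironment,
		},
	}
}

// usesProxyHook reports whether the client's transport is an
// *http.Transport with a Proxy function set.
func usesProxyHook(c *http.Client) bool {
	tr, ok := c.Transport.(*http.Transport)
	return ok && tr.Proxy != nil
}

func main() {
	fmt.Println(usesProxyHook(newProxyAwareClient()))
	// A zero-value client has no explicit transport configured.
	fmt.Println(usesProxyHook(&http.Client{}))
}
```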
* Live Restore + Enterprise data format compatibility
* Extended ImportData to import all DBs if no db name given
* Added a new enterprise data test, and backup command now prints the backup file paths at conclusion
* Added whole-system backup test
* Update to use protobuf in all enterprise data cases
* Update test to do cross-testing with enterprise version
* Incremental enterprise backup format support
The string `node <n>` can be used to specify which data node the data
should be retrieved from. This uses the `node_id=X` query parameter that
is supported, but wasn't exposed anywhere in the client library.
We use this feature enough internally when attempting to find
inconsistencies or network errors that it is easier if this is just
supported. Otherwise, I continue having to recompile the CLI program
every time I need to do this.
To clear a previously set node, you can use `node 0` or `node clear`.
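
The behavior described above can be sketched as follows. This is an assumption-laden illustration, not the CLI's code: `parseNode` and `nodeQuery` are hypothetical helpers, and the sketch assumes that a node ID of 0 means no `node_id` parameter is sent.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
	"strings"
)

// parseNode (hypothetical) interprets the argument that follows `node`.
// It returns 0 for "0" or "clear", meaning no node is pinned.
func parseNode(arg string) (int, error) {
	arg = strings.TrimSpace(arg)
	if arg == "clear" {
		return 0, nil
	}
	id, err := strconv.Atoi(arg)
	if err != nil || id < 0 {
		return 0, fmt.Errorf("invalid node id %q", arg)
	}
	return id, nil
}

// nodeQuery (hypothetical) builds the query string for a request,
// adding node_id only when a node is pinned.
func nodeQuery(id int) string {
	params := url.Values{}
	if id > 0 {
		params.Set("node_id", strconv.Itoa(id))
	}
	return params.Encode()
}

func main() {
	for _, arg := range []string{"2", "0", "clear", "abc"} {
		id, err := parseNode(arg)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("node %s -> query %q\n", arg, nodeQuery(id))
	}
}
```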
Change the CLI to support quoted database names in `use` statements.
This also allows for all database names to be specified, including names
that contain spaces.
If we don't detect a server version, then there's a good chance that
we're not speaking to an InfluxDB server. We should warn the user about
this to make it easier for them to debug.
The previous sha was taken from a revision on a devel branch that I
thought would continue staying in the tree after it was merged. That
revision was rebased away and the API was changed for the logger.
This updates the usage of the logger and adds a simple package for
constructing the base logger.
The 1.0 version of zap changed the format of the default console logger
so this change moves over to this new logger instead of attempting to
retain backwards compatibility with the old format.
This commit carries out the initial refactor of the tsi1.Index into
tsi1.Partition. We then create a new tsi1.Index that will be an
abstraction over a collection of Partitions.
The integration test was intended to use the temporary directory for the
files that were created, but `INFLUXDB_WAL_DIR` is supposed to be
`INFLUXDB_DATA_WAL_DIR`.
Windows computers may produce a UTF-16 file from the command line that
contains a byte-order-mark. Along with handling the UTF-8
byte-order-mark, this also handles the UTF-16 byte-order-mark for
better Windows compatibility.
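
One way to handle both byte-order-marks using only the standard library is sketched below. It is an illustration under assumptions rather than the code in this commit: `decodeWithBOM` and `decodeUTF16` are hypothetical helpers, and input without a byte-order-mark is assumed to be UTF-8 already.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"unicode/utf16"
)

var (
	bomUTF8    = []byte{0xEF, 0xBB, 0xBF}
	bomUTF16LE = []byte{0xFF, 0xFE}
	bomUTF16BE = []byte{0xFE, 0xFF}
)

// decodeWithBOM (hypothetical) strips a UTF-8 byte-order-mark, or
// transcodes UTF-16 to UTF-8 when a UTF-16 byte-order-mark is present.
func decodeWithBOM(b []byte) string {
	switch {
	case bytes.HasPrefix(b, bomUTF8):
		return string(b[len(bomUTF8):])
	case bytes.HasPrefix(b, bomUTF16LE):
		return decodeUTF16(b[len(bomUTF16LE):], binary.LittleEndian)
	case bytes.HasPrefix(b, bomUTF16BE):
		return decodeUTF16(b[len(bomUTF16BE):], binary.BigEndian)
	}
	return string(b)
}

// decodeUTF16 converts 16-bit code units in the given byte order to a
// UTF-8 string. A trailing odd byte is ignored.
func decodeUTF16(b []byte, order binary.ByteOrder) string {
	units := make([]uint16, 0, len(b)/2)
	for i := 0; i+1 < len(b); i += 2 {
		units = append(units, order.Uint16(b[i:]))
	}
	return string(utf16.Decode(units))
}

func main() {
	le := []byte{0xFF, 0xFE, 'h', 0, 'i', 0}
	fmt.Println(decodeWithBOM(le))
}
```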
* Fprint* functions
* No nakedness
* clarify panic messages
* spacing between case statements
* remove break in favor of return
* remove goto in favor of for { continue }
Changes the `influx_inspect inmem2tsi` tool to stream each TSM/WAL
file and convert to a TSI index instead of loading the entire shard's
in-memory index first.
This adds a new flag -exact that will return exact counts instead of
estimates. The default is to return estimates since exact counting
on a problem shard could consume a lot of memory.
influx_inspect walks the data and wal directories building a list of
files to export. It then opens, reads, and exports each. If the file was
deleted between the time it was added to the list and the time the
inspect tool attempts to read it, the file is now skipped without
emitting an error.
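
The skip-if-deleted behavior can be sketched as below. `exportFiles` is a hypothetical stand-in for the export loop and not the function in `influx_inspect`. The sketch assumes that only "file does not exist" errors are skipped and that any other open error is still returned.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// exportFiles (hypothetical) opens each path and passes it to export.
// A file that vanished after the list was built is skipped silently.
// It returns the number of files that were exported.
func exportFiles(paths []string, export func(*os.File) error) (int, error) {
	exported := 0
	for _, p := range paths {
		f, err := os.Open(p)
		if err != nil {
			if errors.Is(err, os.ErrNotExist) {
				continue // deleted since the directory walk
			}
			return exported, err
		}
		err = export(f)
		f.Close()
		if err != nil {
			return exported, err
		}
		exported++
	}
	return exported, nil
}

func main() {
	tmp, err := os.CreateTemp("", "export-*.tsm")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	tmp.Close()
	defer os.Remove(tmp.Name())

	// The second path does not exist and is skipped without an error.
	paths := []string{tmp.Name(), tmp.Name() + ".deleted"}
	n, err := exportFiles(paths, func(f *os.File) error { return nil })
	fmt.Println(n, err)
}
```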
There are several places in the code where comma-ok map retrieval was
being used poorly. Some were benign, like checking existence before
issuing an unconditional delete with no cleanup. Others were potentially
far more serious: assuming that if 'ok' was true, then the resulting
pointer retrieved from the map would be non-nil. `nil` is a perfectly
valid value to store in a map of pointers, and the comma-ok syntax is
meant for when membership is distinct from having a non-zero value.
There were only one or two cases that I saw where it was being used
correctly for maps of pointers.
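
The distinction can be illustrated with a short sketch. The `shard` type and `lookup` helper are hypothetical and exist only to show the pattern.

```go
package main

import "fmt"

// shard is a hypothetical value type used only for this illustration.
type shard struct{ id int }

// lookup returns the shard for id only when it is present and non-nil.
// A missing key and a stored nil both yield a nil pointer, so a plain
// index followed by a nil check covers both cases.
func lookup(m map[int]*shard, id int) (*shard, bool) {
	s := m[id]
	return s, s != nil
}

func main() {
	m := map[int]*shard{1: {id: 1}, 2: nil}

	// Comma-ok reports membership, not usability: key 2 is present
	// even though the stored pointer is nil.
	s, ok := m[2]
	fmt.Println(ok, s == nil)

	// delete is a no-op for missing keys, so no existence check is needed.
	delete(m, 3)

	_, usable := lookup(m, 2)
	fmt.Println(usable)
}
```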
This change provides a clear separation between the query engine
mechanics and the query language so that the language can be parsed and
dealt with separate from the query engine itself.
This switches all the interfaces that take a string series key to
take a []byte. This eliminates many small allocations where we
convert between the two repeatedly. Eventually, this change should
propagate further up the stack.
The Points channel is nil until after the subscriber service is opened.
If it is appended before it's opened, the PointsWriter holds onto the
old reference.
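
The ordering problem can be reproduced in miniature. The types below are simplified, hypothetical stand-ins for the subscriber service and the PointsWriter, not the real ones. The sketch shows only that a channel value copied before `Open` stays nil afterwards.

```go
package main

import "fmt"

// subscriberService is a simplified stand-in: its channel is nil until Open.
type subscriberService struct{ points chan string }

func (s *subscriberService) Open() { s.points = make(chan string, 1) }

// pointsWriter is a simplified stand-in that stores the channel values
// it is given at registration time.
type pointsWriter struct{ subscribers []chan<- string }

func (w *pointsWriter) addSubscriber(c chan<- string) {
	w.subscribers = append(w.subscribers, c)
}

// wire registers the service's channel with a writer, either before or
// after Open, and reports whether the writer ended up with a usable
// (non-nil) channel.
func wire(appendBeforeOpen bool) bool {
	svc := &subscriberService{}
	w := &pointsWriter{}
	if appendBeforeOpen {
		w.addSubscriber(svc.points) // copies the nil channel value
		svc.Open()                  // the writer never sees the new channel
	} else {
		svc.Open()
		w.addSubscriber(svc.points)
	}
	return w.subscribers[0] != nil
}

func main() {
	fmt.Println(wire(true))
	fmt.Println(wire(false))
}
```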
* off by default, enabled by `query-stats-enabled`
* writes to cq_query measurement of configured monitor database
* see CHANGELOG for schema of individual points
Measurement name and field were converted between []byte and string
repeatedly, causing lots of garbage. This switches the code to use
[]byte in the write path.