If a bind-address of :8088 is used, cluster nodes cannot
connect to those nodes because there is no hostname portion
of the address. When we see a bind-address without a hostname and no
hostname is specified in the config, use the OS hostname, falling back
to localhost if that lookup fails.
* Improve the ping endpoint so that it can optionally check for leader agreement across all meta servers
* Add Ping method to the meta client
* Fix ClusterID tests
* Remove WaitForLeader from meta client and remove unnecessary references to it
* Updated CreateShardGroup to not return an error if it already exists so it's idempotent
* Removed old test making sure you can't delete the default RP. You can delete it now; there was no reason to disallow it.
* Wired up the UpdateRetentionPolicy functionality
* Add dir, hostname, and bind address to top level config since it applies to services other than meta
* Add enabled flags to example toml for data and meta services
* Wire up add/remove raft peers and meta servers to meta service
* Update DROP SERVER to be either DROP META SERVER or DROP DATA SERVER
* Bring over statement executor from old meta package
* Start meta service client implementation
* Update meta service test to use the client
* Wire up node ID/meta server storage information
This changes backup and restore to work for TSM. It breaks it for b1 and bz1, but since those are getting removed it's ok.
The backup runs against any host that is specified and can back up the metastore, a database, a specific retention policy, or a specific shard. It can also take incremental backups with the `since` flag, which will only back up TSM files that have been created since that timestamp.
The backup is safe to run online. However, shards that are still hot for writes won't be able to create new TSM files while the backup for that single shard runs. If the backup isn't too large and the write throughput isn't too high, this shouldn't be a problem, since the writes will just go into the WAL cache.
One of the first unit tests in the cli tests called the Run method.
Since the Run method called os.Exit, the unit tests were reported as
having succeeded. When parallel is set to 1, this skips _all_ unit tests after
the first one. When parallel is set to a higher value, unit tests run by
other processes still get run.
This changes the Run method to return an error (if one occurred). This
error can then be printed out and a bad exit status can be used to exit
the program from the main program instead. That causes the unit tests
to run correctly regardless of how many parallel processes are running.
Also added an additional option to the CLI called `IgnoreSignals`. If
this is set to true, then signals are not registered with the process.
Setting signals doesn't really work in unit tests so it's good to ensure
they don't get set in the first place.
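A rough sketch of the shape described above, assuming a simplified `CommandLine` type. Only `Run` and `IgnoreSignals` come from the text; the `Fail` and `osSignals` fields are hypothetical stand-ins so the example is self-contained.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"os/signal"
)

// CommandLine is a simplified stand-in for the CLI type.
type CommandLine struct {
	IgnoreSignals bool
	Fail          bool           // hypothetical: forces Run to fail
	osSignals     chan os.Signal // hypothetical: set only when signals are registered
}

// Run returns an error instead of calling os.Exit, so a test that
// calls it does not terminate the test process.
func (c *CommandLine) Run() error {
	if !c.IgnoreSignals {
		c.osSignals = make(chan os.Signal, 1)
		signal.Notify(c.osSignals, os.Interrupt)
		defer signal.Stop(c.osSignals)
	}
	if c.Fail {
		return errors.New("command failed")
	}
	return nil
}

func main() {
	c := &CommandLine{}
	// Only the main program decides the exit status.
	if err := c.Run(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```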
In addition to fixing the influx cli tests, this adds a mock client to
the cli test for Use. PR #5183 added a validation for `use` to only be
able to select public databases so `_internal` couldn't be chosen. To
implement this, the `SHOW DATABASES` command was used by the internal
client.
Some of the unit tests in `cli_test.go` don't set the client to
anything. `TestParseCommand_Use` previously didn't, but now it needs to
have a client in the unit test with an empty test server.
This has a few changes in it (unfortunately). The main change is to run compactions
concurrently. While implementing this, a few query and performance bugs showed up that
are also fixed by this commit.
Changed non-interactive mode to send everything through the CLI's parser the same way the interactive mode works.
Added multiline support for -execute flag.
Server registration and stats reporting have been removed from what was
once http://enterprise.influxdata.com. The app that lived there now
runs at http://usage.influxdata.com, so that the subdomain can
eventually be repurposed. Because we also want to repurpose the
`enterprise-client` repo, we have also renamed that to `usage-client`.
InfluxDB no longer needs the `registration` service now, since all of
the endpoints it communicates with simply discard the data provided to
them.
Add StressTest type and auxiliary interfaces
Add config structs
Move generator to config
Add utility methods used in stress
Add basic components for a stress test
Add touches
Add configuration options
Add unified results handlers
Add final print out of results
Add Success function to response type
Add query support
Send query results
Add comments to run.go
Change Basic to BasicWriter
Add basic query
Add incomplete README
Abstract out response handling
Change plugin to basic
Add responseHandler type
Add additional parameter to Query function
Add todo comments and cleanup main
Lower hard coded value
Add flag for profiling
Fix race condition
Wait at the right place
Change point from struct to interface
Improve generic write throughput
Reorganize
Fastest State
Add toml config
Add test server
Add basic working version of config file
Move config file logic into its own file
Fix broken config file
Add query count to stress config
Add support for concurrency and batch interval
Reorder config option
Remove unneeded init
Remove old stress package
Move new stress code into stress directory
Rework influx_stress tool
Do something reasonable if no config is given
Remove unneeded comments
Add tests for stress package
Add comments and reorganize code
Add more comments
Count lines posted correctly
Add NewConfig method
Fix style issues
Add backticks to flag description
Fix grammar
Remove `StartTimer` calls where appropriate
Fix comment language
Change Reader to Querier
Reorder defer
Fix issues brought up by golint
Add more comments
Add more detailed Readme
Increase counter appropriately
Add return errors where appropriate
Add test coverage
Move `now()` from QueryClient to QueryGenerator
This change ensures that if any fields in the WHERE clause of
an aggregate query differ from the fields in the SELECT clause,
the cursors also decode those fields. Otherwise, queries of
the form 'SELECT f(w) FROM x WHERE y=z' will return incorrect results.
Fixes issue #4701.
match the info provided by the influx --help output,
and added history command
Reverted description for pretty command
+ minor edits
Removed duplication of command names
Signed-off-by: Anes Hasicic <anes.hasicic@gmail.com>
When unpacking the meta, the Store `Addr` is built
against the hostname and the `bind-address` port.
We can use this resolved address for the `RemoteAddr`
as well since according to the clustering docs the
`hostname must be resolved by all members in the cluster`
This change moves the logic to detect and display the Enterprise
registration hint into the same logic check as that which decides if the
successful-connection message should be displayed.
Fixes #4514.
This changes the HTTP line protocol handler to behave similarly to the other
handlers in that it will write as many points as possible. Previously, we
would fail the entire batch if one point failed. This can happen more frequently
now with NaN being more explicitly unsupported. Now it will write as many points
as parse successfully and return a "partial write" error to the client with the
lines that failed to parse.
This commit refactors the tsdb query engine to use separate aggregate
and raw execution paths, encapsulates cursor functionality, and removes
the TagSetCursor from the aggregate path. By removing the TagSetCursor,
we can pass sets of unordered values to the map functions and bypass
the `container/heap` entirely.
Registration also involves statistics and diagnostics upload, for the
purposes of remote management. This means there will be long-running
goroutines in effect. Therefore move the code to a service model.
If a tsm file was partially written, we were not able to read the
raw block data because we panicked/exited when reading the corrupted
index. This allows us to read the raw blocks if we can.
For aggregate queries, having a null result means that you haven't
got any data for that time period. CQs used this as a signal that
the measurement was not created and dropped the entire write.
INTO queries can have any structure, including wildcards, so dropping
the entire query isn't going to work. Instead, just drop the nulls
returned.
Avoids a panic if a series ID exists in the tsm file but not in the IDs file.
Also handles the case where we don't have an ids file and just a tsm file.
Since INTO queries need to have absolute information about the database
to work, we need to create a loopback interface back to the cluster
in order to perform them.
This will read a tsm file and dump index, block and compression level info from the file.
It reads the file directly as opposed to reading it through the tsm engine which should
help with debugging and troubleshooting data file issues.
The implementation is not pretty but the output is very useful. In the future, we can
add data extraction, recovery and verification functionality if needed.
It is now possible to configure arbitrarily many tags in a generic
format. That is, specifying the config option `tag_count=10` will add 10
tags to a series that are of the form `tag-key-n=tag-value` where n
ranges from 0 to 9.
Save current state
This commit changes `tsdb.mapFunc` to use `tsdb.MapInput` instead
of an iterator. This will make it easier and faster to pass blocks
of values from the new storage engine into the engine.
If the memory gets 5x above the partition size threshold, the WAL will start returning write failures to the clients. This will allow them to back off their write volume.
Also updated the stress script to track failed requests and output messages on failure and when it returns to success.
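A simplified sketch of that backpressure check, assuming a single partition that tracks its in-memory size. The type, field, and error names are hypothetical, and "5x above" is interpreted here as exceeding five times the threshold.

```go
package main

import (
	"errors"
	"fmt"
)

// errWriteBackoff is a hypothetical error returned to clients so
// they can reduce their write volume.
var errWriteBackoff = errors.New("write throughput too high, back off")

// partition is a hypothetical stand-in for a WAL partition.
type partition struct {
	memorySize    uint64 // bytes currently held in memory
	sizeThreshold uint64 // size at which a flush would normally start
}

// write rejects new data once memory exceeds 5x the threshold.
func (p *partition) write(n uint64) error {
	if p.memorySize > 5*p.sizeThreshold {
		return errWriteBackoff
	}
	p.memorySize += n
	return nil
}

// flush simulates the partition being written out to disk.
func (p *partition) flush() { p.memorySize = 0 }

func main() {
	p := &partition{sizeThreshold: 10}
	for i := 0; i < 8; i++ {
		if err := p.write(10); err != nil {
			fmt.Println("write", i, "failed:", err)
			continue
		}
		fmt.Println("write", i, "ok")
	}
}
```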
Previously the measurement that was getting written into InfluxDB was
hard coded in the `Run` method as `cpu`. Now you can specify
measurements by passing an `-m` flag to `influx_stress`.
The `-m` flag accepts a comma separated list of measurements. (e.g.
`influx_stress -m cpu,mem,disk`)
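One way the comma-separated value might be handled is sketched below. The helper name `parseMeasurements`, the flag description, and the `cpu` default are assumptions, not taken from the tool.

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

// parseMeasurements splits a comma-separated list and drops empty
// entries. The helper name is hypothetical.
func parseMeasurements(s string) []string {
	var out []string
	for _, m := range strings.Split(s, ",") {
		if m = strings.TrimSpace(m); m != "" {
			out = append(out, m)
		}
	}
	return out
}

func main() {
	// Defaulting to "cpu" mirrors the previously hard-coded value;
	// that default is an assumption of this sketch.
	m := flag.String("m", "cpu", "comma-separated list of measurements")
	flag.Parse()
	fmt.Println(parseMeasurements(*m))
}
```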
The `sort` methods were failing as responseTimes couldn't be cast to an
[]int. To fix this `ResponseTimes` now implements the `sort.Interface`
interface.
This commit abstracts out the body of `main()` and moves it into a separate
function `Runner` in `runner/runner.go`. Additionally, two new types,
`Timer` and `Config`, were introduced.
`Runner` takes in a `*Config` and returns the total number of points
written `totalPoints`, a slice of response times `responseTimes`, and a
timer with the starting and ending times of the test `*Timer`.
This prevents users from putting their system into an awkward
state. It is a policy that all databases must have at least a default
retention policy.
Fixes issue #3699.
The server was closing by stopping the most depended-on services first,
which caused various panics while higher-level services were still processing
tasks when the server closed.
Fixes #3881