This fixes several issues related to the bind address and hostname:
* Allows bind addresses where a hostname or IP is not specified to
work correct and bind to all interfaces by default.
* Fixes the top-level "hostname" config option to allow overridding
all bind address hostnames. This allows a node to advertise a different
hostname than what is defined in the bind address setting.
* Adds the -hostname command-line option back to allow specifing
both -join and -hostname as command-line flags.
* Enforces a configuration precedence and overriding ability defined
as config file is overridden by env vars which are overriden by command-line
flags.
Fixes#5670#5671
The name of the column will be every measurement located inside of the
math expression in the order they are encountered in within the
expression.
Also handle `*influxql.ParenExpr` in the function
`(*influxql.Field).Name()`
Fixes#5730.
Previously, meta.Client would drop the default retention policy when
trying to create a database with a retention policy. The RPC has now
been modified to include the desired retention policy in the
CreateDatabase command and have it use that retention policy information
instead of the default configuration when provided.
This also lowers the number of RPC calls for
CreateDatabaseWithRetentionPolicy to only a single RPC call instead of
two.
Protections have also been included so creating a retention policy with
different parameters will return an error similar to if you tried to
modify the retention policy separately.
Fixes#5696.
A case (#5606) was found where a lot of data unexpectedly disappeared from a database
following a TSM conversion.
The proximate cause was an inconsistency between the root Bolt DB bucket list
and the meta data in the "series" bucket of the same shard. There were apparently valid
series in Bolt DB buckets that were no longer referenced by the meta data
in the "series" bucket - so-called orphaned series; since the conversion
process only iterated across the series found in the meta data, the conversion process
caused the orphaned series to be removed from the converted shards. This resulted in the
unexpected removal of data from the TSM shards that had previously been accessible
(despite the meta data inconsistency) in the b1 shards.
The root cause of the meta data inconsistency in the case above was a failure, in versions prior
to v0.9.3 (actually 3348dab) to update the "series" bucket with series that had been created in
previous shards during the life of the same influxd process instance.
This fix is required to avoid data loss during TSM conversions for shards that were created with
versions of influx that did not include 3348dab (e.g. prior to v0.9.3).
Analysis-by: Jon Seymour <jon@wildducktheories.com>
This removes the MetaServers property from node.json to eliminate one
of the four places those addresses are stored on disk. We always use
the values that come through the config (via file, env var or -join arg).
top() and bottom() point ordering was incorrect and using an inefficient
method of sorting. It has now been updated to use a heap and ordering is
being done by value first and time second (with earlier times always
taking priority).
Removed unit tests that test using `time` inside of the query to get the
real time instead of the interval time and only allowing the default
behavior. We will have another mechanism to get the real time during an
interval, but the current method is deprecated.
The top() and bottom() methods now have integer support.
It had the time values for the selectors being returned equal the actual
points time. We have decided to have the time always be the interval
time and adding another feature later that can return the selected
point's time in the future.
The history file is cleared before WriteHistory is called after each
command/exit() to prevent exponential file growth.
This commit addresses issue #5436, please see PR for full explanation.
Under highly conncurrent write load, the coordinating node would
create a connection to any other node that is part of the replica
group. Since each connection can be expensive, OOM sitations could
occur because there was no bounds on the number of new connections
that would be created. If writes on a remote node were slow, connections
could pile up an exacerbate the problem.
This switches the pool to be bounded and has a checkout that is blocking
with a timeout. If a connection is available, it's returned immediately.
If the pool still has room for more connections, it will create one if needed.
Otherwise, the call will block until a connection becomes available or
the timeout expires. In the case of a timeout, it is propogated back up
to the PointsWriter that determine what do return to the client.
If a bind-address of :8088 is used, cluster nodes cannot
connect to those nodes because there is no hostname portion
of the address. When we see a bind-address without a hostname,
use the os hostname or localhost if that fails if it is not specified
in the config already.
* Improve the ping endpoint so that it can optionally check for leader agreement across all meta servers
* Add Ping method to the meta client
* Fix ClusterID tests
* Remove WaitForLeader from meta client and remove unnecessary references to it
* Updated CreateShardGroup to not return an error if it already exists so it's idempotent
* Removed old test making sure you can't delete the default RP. You can delete it now, there was no reason to disallow it.
* Wired up the UpdateRetentionPolicy functionality
* Add dir, hostname, and bind address to top level config since it applies to services other than meta
* Add enabled flags to example toml for data and meta services
* Wire up add/remove raft peers and meta servers to meta service
* Update DROP SERVER to be either DROP META SERVER or DROP DATA SERVER
* Bring over statement executor from old meta package
* Start meta service client implementation
* Update meta service test to use the client
* Wire up node ID/meta server storage information
This changes backup and restore to work for TSM. It breaks it for b1 and bz1, but since those are getting removed it's ok.
The backup runs against any host that is specified and can backup either the metasstore, a database, specific retention policy, or a specific shard. It can also take incremental backups with the `since` flag, which will only backup TSM files that have been created since that timestamp.
The backup is safe to run online. However, for shards that are still hot for writes, they won't be able to create new TSM files while the backup for that single shard runs. If the backup isn't too large and the write throughput isn't too high this shouldn't be a problem since the writes will just go into the WAL cache.
One of the first unit tests in the cli tests called the Run method.
Since the Run method called os.Exit, it reported the unit tests as
succeeded. When parallel is set to 1, this skips _all_ unit tests after
the first one. When parallel is set to a higher value, unit tests run by
other processes still get run.
This changes the Run method to return an error (if one occurred). This
error can then be printed out and a bad exit status can be used to exit
the program from the main program instead. That causes the unit tests
to run correctly regardless of how many parallel processes are running.
Also added an additional option to the CLI called `IgnoreSignals`. If
this is set to true, then signals are not registered with the process.
Setting signals doesn't really work in unit tests so it's good to ensure
they don't get set in the first place.
In addition to fixing the influx cli tests, this adds a mock client to
the cli test for Use. PR #5183 added a validation for `use` to only be
able to select public databases so `_internal` couldn't be chosen. To
implement this, the `SHOW DATABASES` command was used by the internal
client.
Some of the unit tests in `cli_test.go` don't set the client to
anything. `TestParseCommand_Use` previously didn't, but now it needs to
have a client in the unit test with an empty test server.
This has a few changes in it (unfortuantely). The main change is to run compactions
concurrently. While implementing this, a few query and performance bugs showed up that
are also fixed by this commit.
Changed non-interactive mode to send everything through the CLI's parser the same way the interactive mode works.
Added multiline support for -execute flag.
Server registration and stats reporting has been removed from what was
once http://enterprise.influxdata.com. The app that lived there, now
runs at http://usage.influxdata.com, so that the subdomain can
eventually be repurposed. Because we also want to repurpose the
`enterprise-client` repo, we have also renamed that to `usage-client`.
InfluxDB no longer needs the `registration` service now, since all of
the endpoints it communicates with simply discard the data provided to
them.
Add StressTest type and auxillary interfaces
Add config structs
Move generator to config
Add utility methods used in stress
Add basic components for a stress test
Add touches
Add configuration options
Add unified results handlers
Add final print out of results
Add Success function to response type
Add query support
Send query results
Add comments to run.go
Change Basic to BasicWriter
Add basic query
Add incomplete README
Abstract out response handling
Change plugin to basic
Add responseHandler type
Add additional parameter to Query function
Add todo comments and cleanup main
Lower hard coded value
Add flag for profiling
Fix race condition
Wait at the right place
Chane point from struct to interface
Improve generic write throughput
Reorganize
Fastest State
Add toml config
Add test server
Add basic working version of config file
Move config file logic into its own file
Fix broken config file
Add query count to stress config
Add support for concurrency and batch interval
Reorder config option
Remove unneeded init
Remove old stress package
Move new stress code into stress directory
Rework influx_stress tool
Do something reasonable if no config is given
Remove unneeded comments
Add tests for stress package
Add comments and reorganize code
Add more comments
Count lines posted correctly
Add NewConfig method
Fix style issues
Add backticks to flag description
Fix grammar
Remove `StartTimer` calls where appropriate
Fix comment language
Change Reader to Querier
Reorder defer
Fix issues bought up by golint
Add more comments
Add more detailed Readme
Increase counter appropriately
Add return errors where appropriate
Add test coverage
Move `now()` from QueryClient to QueryGenerator
This change ensures that if there are any fields in the WHERE clause of
an aggregate that are different from the fields in the SELECT clause,
that the cursors also decode those fields. Otherwise WHERE clauses of
the form 'SELECT f(w) FROM x WHERE y=z' will return incorrect results
Fixes issue #4701.
match the info provided by the influx --help output,
and added history command
Reverted description for pretty command
+ minor edits
Removed duplication of command names
Signed-off-by: Anes Hasicic <anes.hasicic@gmail.com>
When unpacking the meta, the Store `Addr` is built
against the hostname and the `bind-address` port.
We can use this resolved address for the `RemoteAddr`
as well since according to the clustering docs the
`hostname must be resolved by all members in the cluster`
This change moves the logic to detect and display the Enterprise
registration hint into the same logic check as that which decides if the
successful-connection message should be displayed.
Fixes#4514.
This changes the HTTP line protocol handler to behave similar to the other
handler in that they will write as many points as possible. Previously, we
would fail the entire batch if one point failed. This can happen more frequently
now with NaN being more explicitly unsupported. Now it will write as many points
that parse successfully and return a "partial write" error to the client with the
lines that failed to parse.
This commit refactors the tsdb query engine to use separate aggregate
and raw execution paths, encapsulates cursor functionality, and removes
the TagSetCursor from the aggregate path. By removing the TagSetCursor,
we can pass sets of unordered values to the map functions and bypass
the `container/heap` entirely.
Registration also involves statistics and diagnostics upload, for the
purposes of remote management. This means there will be long-running
goroutines in effect. Therefore move the code to a service model.
If a tsm file was partially written, we were not able to read the
raw block data because we panic/exited when reading the corrupted
index. This allows us to read the raw blocks if we can.
For aggregate queries, having a null result means that you haven't
got any data for that time period. CQs used this as a signal that
the measurement was not created and dropped the entire write.
INTO queries can have any structure, including wildcards, so dropping
the entire query isn't going to work. Instead, just drop the nulls
returned.
Avoids a panic if a series ID exists in the tsm file but not in the IDs file.
Also handles the case were we don't have an ids file and just a tsm file.
Since INTO queries need to have absolute information about the database
to work, we need to create a loopback interface back to the cluster
in order to perform them.
This will read a tsm file and dump index, block and compression level info from the file.
It reads the file directly as opposed to reading it through the tsm engine which should
help with debugging and troubleshooting data file issues.
The implementation is not pretty but the output is very useful. In the future, we can
add data extraction, recovery and verification functionality if needed.