Previously, meta.Client would drop the default retention policy when
trying to create a database with a retention policy. The RPC has now
been modified to include the desired retention policy in the
CreateDatabase command and to use that retention policy information,
when provided, instead of the default configuration.
This also reduces CreateDatabaseWithRetentionPolicy to a single RPC
call instead of two.
Protections have also been added so that creating a retention policy
with different parameters returns an error, just as attempting to
modify the retention policy separately would.
Fixes #5696.
This removes the MetaServers property from node.json to eliminate one
of the four places those addresses are stored on disk. We always use
the values that come through the config (via file, env var or -join arg).
I was trying to create a Diagnostics Client in the tsdb package, but
IIRC importing `monitor` caused an import cycle of:
tsdb -> monitor -> cluster -> tsdb.
Moving Diagnostics to its own package will allow further use of
diagnostics.Client without running into import cycles.
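As a rough sketch, the extracted package needs little more than the client interface and its payload type, so packages like tsdb can depend on it without pulling in monitor (the shapes below follow the monitor code but are illustrative here):

    package diagnostics

    // Client is implemented by any subsystem that can report diagnostics.
    type Client interface {
        Diagnostics() (*Diagnostics, error)
    }

    // Diagnostics is a table of diagnostic information.
    type Diagnostics struct {
        Columns []string
        Rows    [][]interface{}
    }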
Meta HTTP commands are cluster-level requests and were showing up in
the main log, creating a lot of noise. Switch them to use the
ClusterTracing config option, which is disabled by default.
Previously, the lease redirect was invalid, causing anything relying
on a lease for execution (e.g. continuous queries) to cease functioning.
The name/nodeid URL param parsing has been moved up to the top of the
handler so the options can be forwarded on to the real leader.
X-Github-Closes: #5592
* Pass the configured precision string to point parsing (see the sketch below)
* Add a Precision option to the UDP config
* Default the configured precision to match current behavior (nanoseconds, which is what ParsePoints appears to assume)
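A minimal sketch of the idea, assuming the ParsePointsWithPrecision helper from the influxdb models package (the config and function names here are illustrative):

    package udp

    import (
        "time"

        "github.com/influxdata/influxdb/models"
    )

    // Config carries the new Precision option (e.g. "n", "u", "ms", "s").
    type Config struct {
        Precision string
    }

    // parsePoints forwards the configured precision to point parsing
    // instead of assuming nanoseconds.
    func parsePoints(c Config, buf []byte) ([]models.Point, error) {
        return models.ParsePointsWithPrecision(buf, time.Now().UTC(), c.Precision)
    }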
Possible fix for #5437. meta.Client.RetentionPolicy acquired a read-lock and
then called Database which called data() which acquired a read-lock again.
If a write-lock was taken between these two read-locks (likely by
Authenticate), the write-lock would block, and the second read-lock
would also block, causing a deadlock.
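A minimal sketch of the hazard with Go's sync.RWMutex, and the usual fix of taking the lock once and delegating to an unexported, lock-free helper (the types and methods here are illustrative, not the actual meta.Client code):

    package meta

    import "sync"

    type client struct {
        mu   sync.RWMutex
        data map[string]string
    }

    // Bad: takes a read-lock, then calls Database, which takes it again.
    // If another goroutine's mu.Lock() lands between the two RLock calls,
    // the writer waits on the first reader while the second RLock waits
    // behind the writer: a deadlock.
    func (c *client) RetentionPolicyBad(db string) string {
        c.mu.RLock()
        defer c.mu.RUnlock()
        return c.Database(db) // re-acquires c.mu -> potential deadlock
    }

    func (c *client) Database(db string) string {
        c.mu.RLock()
        defer c.mu.RUnlock()
        return c.database(db)
    }

    // Fix: acquire the lock once and use the lock-free helper internally.
    func (c *client) RetentionPolicy(db string) string {
        c.mu.RLock()
        defer c.mu.RUnlock()
        return c.database(db)
    }

    func (c *client) database(db string) string { return c.data[db] }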
Previously, if you issued a CQ with a resample interval higher than
the query interval, such as the following:
    CREATE CONTINUOUS QUERY cq ON db
    RESAMPLE EVERY 4m
    BEGIN
      SELECT mean(value) INTO cpu_mean FROM cpu GROUP BY time(2m)
    END
This would result in strange behavior: the FOR value defaulted to the
GROUP BY interval, and the minimum time that had to pass before a CQ
ran was the resample interval, so the appropriate intervals would not
run even if you set the resample duration to a higher value.
This tweaks the CQ runner to set the minimum interval before a bucket
becomes capable of running to the lower of the query interval or the
resample interval instead of always using the resample interval.
It also sets the default resample duration to be the higher value of the
query interval or the resample interval so the above query gets a
default of 4m instead of 2m and will execute 2 queries every 4 minutes.
If you manually set the resample duration to a lower value than the
resample interval, the old behavior will still happen and should be
considered an error.
This also makes the parser return an error when creating a continuous
query with a resample duration below the resample interval or query
interval (whichever is higher).
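The two rules above reduce to a min and a max over the GROUP BY and RESAMPLE EVERY intervals; a small sketch of that logic (helper names are illustrative, not the actual CQ runner code):

    package main

    import (
        "fmt"
        "time"
    )

    // minRunInterval: the shortest time that must pass before a bucket
    // may run -- the lower of the GROUP BY interval and RESAMPLE EVERY.
    func minRunInterval(groupBy, every time.Duration) time.Duration {
        if every > 0 && every < groupBy {
            return every
        }
        return groupBy
    }

    // defaultResampleFor: the default FOR duration -- the higher of the
    // two, so EVERY 4m with GROUP BY time(2m) covers both 2m buckets.
    func defaultResampleFor(groupBy, every time.Duration) time.Duration {
        if every > groupBy {
            return every
        }
        return groupBy
    }

    func main() {
        g, e := 2*time.Minute, 4*time.Minute
        fmt.Println(minRunInterval(g, e), defaultResampleFor(g, e)) // 2m0s 4m0s
    }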
Fixes #5286.
If a bind-address of :8088 is used, cluster nodes cannot connect to
those nodes because the address has no hostname portion. When we see a
bind-address without a hostname and none is already specified in the
config, use the OS hostname, falling back to localhost if that cannot
be determined.
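A minimal sketch of the defaulting logic (the function name is illustrative):

    package main

    import (
        "fmt"
        "net"
        "os"
    )

    // DefaultHost fills in a missing hostname portion of a bind address
    // such as ":8088" so other cluster nodes have something to dial.
    func DefaultHost(addr string) (string, error) {
        host, port, err := net.SplitHostPort(addr)
        if err != nil {
            return "", err
        }
        if host != "" {
            return addr, nil
        }
        if host, err = os.Hostname(); err != nil || host == "" {
            host = "localhost" // fall back if the OS hostname is unknown
        }
        return net.JoinHostPort(host, port), nil
    }

    func main() {
        addr, _ := DefaultHost(":8088")
        fmt.Println(addr) // e.g. "myhost:8088"
    }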
* Improve the ping endpoint so that it can optionally check for leader agreement across all meta servers
* Add Ping method to the meta client
* Fix ClusterID tests
* Remove WaitForLeader from meta client and remove unnecessary references to it
* Updated CreateShardGroup to not return an error if it already exists so it's idempotent
* Removed old test making sure you can't delete the default RP. You can delete it now, there was no reason to disallow it.
* Wired up the UpdateRetentionPolicy functionality
This ensures that the meta service will gracefully handle host name changes in a single server configuration.
It also changes the raft setup to use the user-specified bind address (and thus hostname) instead of pulling the address off the listener, which returns the IP. This lets users see hostnames instead of IPs in the meta store, making it easier to read, and means that underlying IPs can change without causing problems in a cluster.
* Increase the sleep on error in client exec so that, if a server goes down, we don't max out retries before a new leader is elected
* Update and add close logic to the service, handler, raft state, and the client
* Add dir, hostname, and bind address to top level config since it applies to services other than meta
* Add enabled flags to example toml for data and meta services
* Wire up add/remove raft peers and meta servers to meta service
* Update DROP SERVER to be either DROP META SERVER or DROP DATA SERVER
* Bring over statement executor from old meta package
* Start meta service client implementation
* Update meta service test to use the client
* Wire up node ID/meta server storage information
This changes backup and restore to work for TSM. It breaks them for b1 and bz1, but since those engines are being removed, that's OK.
The backup runs against any specified host and can back up the metastore, a database, a specific retention policy, or a specific shard. It can also take incremental backups with the `since` flag, which will only back up TSM files created since that timestamp.
The backup is safe to run online. However, shards that are still hot for writes won't be able to create new TSM files while the backup for that single shard runs. If the backup isn't too large and the write throughput isn't too high, this shouldn't be a problem, since the writes will just go into the WAL cache.
This makes the following syntax possible:
    CREATE CONTINUOUS QUERY mycq ON mydb
    RESAMPLE EVERY 1m FOR 1h
    BEGIN
      SELECT mean(value) INTO cpu_mean FROM cpu GROUP BY time(5m)
    END
The RESAMPLE option customizes how often an interval will be sampled and
the duration. The interval is customized with EVERY. Any intervals
within the resampling duration on a multiple of the resample interval
will be updated with the new results from the query.
The duration is customized with FOR. This determines how long an
interval will participate in resampling.
Both options are optional. If RESAMPLE is in the syntax, at least one of
the two needs to be given. The default for both is the interval of the
continuous query.
The service also improves tracking of the last run time and the logic
of when a query for an interval should be run. When determining the
oldest interval to run for a query, the continuous query service
determines what would have been the optimal time to perform the next
query based on the last run time. It then uses this time and the
resample duration to determine the oldest interval that should be run,
and resamples all intervals between that time and the current time,
rather than potentially forgetting about the last run in an interval
if the continuous query service gets delayed for some reason.
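Roughly, the service works back from the optimal next run time rather than from "now"; a sketch of the idea under that reading (not the service's actual code):

    package main

    import (
        "fmt"
        "time"
    )

    // nextRunTime: the optimal time the next query would have run after
    // lastRun, given the resample interval.
    func nextRunTime(lastRun time.Time, every time.Duration) time.Time {
        return lastRun.Truncate(every).Add(every)
    }

    // oldestInterval: how far back to resample from that time, given the
    // FOR duration. Every interval between the two is re-run, so a
    // delayed service doesn't skip the intervals it missed.
    func oldestInterval(next time.Time, resampleFor time.Duration) time.Time {
        return next.Add(-resampleFor)
    }

    func main() {
        lastRun := time.Date(2016, 1, 1, 0, 4, 0, 0, time.UTC)
        next := nextRunTime(lastRun, 4*time.Minute)
        fmt.Println(next, oldestInterval(next, time.Hour))
    }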
This removes the previous config options for customizing continuous
queries since they are no longer relevant and adds a new option of
customizing the run interval. The run interval determines how often the
continuous query service polls for when it should execute a query. This
option defaults to 1s, but can be set to 1m if the greatest common
divisor of all continuous queries' intervals is a higher value (like 1m).
Server registration and stats reporting have been removed from what was
once http://enterprise.influxdata.com. The app that lived there now
runs at http://usage.influxdata.com, so that the subdomain can
eventually be repurposed. Because we also want to repurpose the
`enterprise-client` repo, we have also renamed that to `usage-client`.
InfluxDB no longer needs the `registration` service now, since all of
the endpoints it communicates with simply discard the data provided to
them.
The canonical graphite implementation will read and discard NaN values
instead of throwing an error when reading on the line receiver protocol.
Since this is the default behavior for graphite, InfluxDB should have
the same behavior for compatibility.
Previously, a NaN value would result in an error printed to the console.
When you have a large number of NaN values being sent every minute, this
results in the log file filling with useless messages.
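A sketch of the discard behavior for the "name value timestamp" line format (the parser shape is illustrative, not the actual graphite service code):

    package main

    import (
        "fmt"
        "math"
        "strconv"
        "strings"
    )

    // parseLine parses a "name value timestamp" graphite line. A NaN
    // value is read and silently discarded (ok=false, nil error), like
    // the canonical graphite implementation, instead of returning an
    // error that would flood the log.
    func parseLine(line string) (name string, value float64, ok bool, err error) {
        fields := strings.Fields(line)
        if len(fields) != 3 {
            return "", 0, false, fmt.Errorf("received %q which doesn't have three fields", line)
        }
        v, err := strconv.ParseFloat(fields[1], 64)
        if err != nil {
            return "", 0, false, err
        }
        if math.IsNaN(v) {
            return "", 0, false, nil // discard, don't error
        }
        return fields[0], v, true, nil
    }

    func main() {
        _, _, ok, err := parseLine("cpu.load nan 1450000000")
        fmt.Println(ok, err) // false <nil>
    }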
Go style -- and existing runtime stats -- do not use underscores, but
instead use camel case. This change makes the internal stats adhere to
that convention.
This changes the HTTP line protocol handler to behave like the other
handlers in that it will write as many points as possible. Previously,
we would fail the entire batch if one point failed, which can happen
more frequently now that NaN is explicitly unsupported. Now it writes
as many points as parse successfully and returns a "partial write"
error to the client listing the lines that failed to parse.
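A sketch of the per-line handling, with parse and write standing in for point parsing and storage (all names illustrative):

    package httpd

    import (
        "fmt"
        "strings"
    )

    // writeLines writes every line that parses instead of failing the
    // whole batch, and reports the bad lines in one "partial write" error.
    func writeLines(lines []string, parse func(string) error, write func(string)) error {
        var failed []string
        for _, l := range lines {
            if err := parse(l); err != nil {
                failed = append(failed, fmt.Sprintf("%s: %v", l, err))
                continue
            }
            write(l)
        }
        if len(failed) > 0 {
            return fmt.Errorf("partial write:\n%s", strings.Join(failed, "\n"))
        }
        return nil
    }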
NaN float values are not supported by the existing engine or the tsm1
engines. This changes NewPoint to return an error if any field value
is NaN. It also allows us to validate fields to prevent
other unsupported types from sneaking in through other input plugins.
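A sketch of the validation using a stripped-down Point type (not the real models package):

    package models

    import (
        "fmt"
        "math"
        "time"
    )

    type Fields map[string]interface{}

    type Point struct {
        Name   string
        Fields Fields
        Time   time.Time
    }

    // NewPoint rejects NaN float fields up front so unsupported values
    // can't sneak in through input plugins and reach the engines.
    func NewPoint(name string, fields Fields, t time.Time) (*Point, error) {
        for key, value := range fields {
            if fv, ok := value.(float64); ok && math.IsNaN(fv) {
                return nil, fmt.Errorf("NaN is an unsupported value for field %s", key)
            }
        }
        return &Point{Name: name, Fields: fields, Time: t}, nil
    }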
Registration also involves statistics and diagnostics upload, for the
purposes of remote management. This means there will be long-running
goroutines in effect. Therefore move the code to a service model.
Every time the purge check ran, a new segment was added.
This meant the list of almost-empty files in the HH directories would
grow continually.
I believe this change addresses the issues with hinted-handoff not fully replicating all data to nodes that come back online after an outage. A detailed explanation follows.
During testing of hinted-handoff (HH) under various scenarios, HH stats showed that the HH Processor was occasionally encountering errors while unmarshalling hinted data. This error was not handled correctly, and in clusters with more than 3 nodes it could cause the HH service to stall until the node was restarted. This was the high-level reason why HH data was not being replicated.
Furthermore, by watching the hinted-handoff data at the byte level, it could be seen that HH segment block lengths were getting randomly set to 0 while the block data itself was fine (block data contains hinted writes). This was the root cause of the unmarshalling errors outlined above. It, in turn, was tracked down to the HH system opening each segment file multiple times concurrently, which is not file-level thread-safe, so these multiple open calls were corrupting the file.
Finally, the reason a segment file was being opened multiple times in parallel was that WriteShard on the HH Processor was checking for node queues in an unsafe manner. Since WriteShard can be called concurrently, this added queues for the same node more than once, and each queue addition results in opening segment files.
This change fixes the locking in WriteShard such that the check for an existing HH queue for a given node is performed in a synchronized manner.
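The fix amounts to re-checking queue existence under the write lock; a sketch of that pattern (type names are illustrative, not the actual hh package):

    package hh

    import "sync"

    type queue struct{ /* opens segment files on creation */ }

    type Processor struct {
        mu     sync.RWMutex
        queues map[uint64]*queue
    }

    // queueForNode returns the HH queue for a node, creating it at most
    // once. Previously the existence check was unsynchronized, so
    // concurrent WriteShard calls could add (and open segment files for)
    // the same node twice, corrupting the segments.
    func (p *Processor) queueForNode(nodeID uint64) *queue {
        p.mu.RLock()
        q, ok := p.queues[nodeID]
        p.mu.RUnlock()
        if ok {
            return q
        }

        p.mu.Lock()
        defer p.mu.Unlock()
        if q, ok = p.queues[nodeID]; ok { // re-check under the write lock
            return q
        }
        q = &queue{} // open segment files exactly once
        p.queues[nodeID] = q
        return q
    }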
Without this change, if hinted-handoff was disabled the service would
correctly reject new writes, but it would still process any data already
sitting in hinted-handoff queues. With this change the service is
completely disabled.
With this change, Graphite TCP connections are tracked on a per-service
basis. This allows a closing Graphite service to first shut down any
active connections, thereby unblocking the rest of shutdown.
This work exposed small shortcomings in the existing Diagnostics
system, and that code has also been tweaked.
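A sketch of per-service connection tracking (field names are illustrative; handler goroutines are assumed to Add/Done on the WaitGroup):

    package graphite

    import (
        "net"
        "sync"
    )

    type Service struct {
        mu    sync.Mutex
        conns map[net.Conn]struct{}
        wg    sync.WaitGroup // one Add per handler goroutine
    }

    func (s *Service) trackConn(c net.Conn) {
        s.mu.Lock()
        defer s.mu.Unlock()
        if s.conns == nil {
            s.conns = make(map[net.Conn]struct{})
        }
        s.conns[c] = struct{}{}
    }

    func (s *Service) untrackConn(c net.Conn) {
        s.mu.Lock()
        defer s.mu.Unlock()
        delete(s.conns, c)
    }

    // Close first closes every active TCP connection so blocked reads
    // return, then waits for their handler goroutines, unblocking the
    // rest of shutdown.
    func (s *Service) Close() error {
        s.mu.Lock()
        for c := range s.conns {
            c.Close()
        }
        s.mu.Unlock()
        s.wg.Wait()
        return nil
    }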
Fixes issue #4017
With this change, the generic batcher used by many inputs can now be
buffered. Testing shows that this improves the performance of the
Graphite input by 10-100%, with the biggest improvements at lower
numbers of connections.
If a timestamp larger than the maximum epoch value was sent via
graphite, it would overflow when marshaled and unmarshaled back from
the raft log. The overflow caused the shard group to be created with
the wrong timestamp, which caused a panic when writing the point. The
panic occurred because timestamps that were supposed to exist in a map
created by MapShards did not actually exist, so a nil ShardGroup was
used.
The change prevents creating a point with an invalid timestamp. Since
graphite uses timestamps in seconds, the valid range is known up front
and can be enforced; a check for the minimum is added as well.
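Because graphite timestamps are whole seconds, the bounds that survive nanosecond conversion are known; a sketch of such a range check (constants and names are illustrative, not the actual fix):

    package graphite

    import (
        "fmt"
        "math"
        "time"
    )

    // The largest and smallest second-resolution timestamps that still
    // fit in an int64 of nanoseconds, so a point outside this range can
    // be rejected before it reaches the raft log and overflows.
    var (
        minTimestamp = time.Unix(math.MinInt64/int64(time.Second), 0)
        maxTimestamp = time.Unix(math.MaxInt64/int64(time.Second), 0)
    )

    func validateTimestamp(t time.Time) error {
        if t.Before(minTimestamp) || t.After(maxTimestamp) {
            return fmt.Errorf("timestamp out of range: %v", t)
        }
        return nil
    }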
Fixes #3785.
This change adds support for diagnostics by decomposing the existing
interface into two interfaces -- one for stats, and the other for
diags. It also adds some basic monitoring of the system, network, and
the Go runtime.
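The decomposed interfaces might look roughly like this (a sketch; the real method signatures may differ):

    package monitor

    // StatsClient is implemented by anything that reports statistics.
    type StatsClient interface {
        Statistics() (map[string]interface{}, error)
    }

    // DiagsClient is implemented by anything that reports diagnostics.
    type DiagsClient interface {
        Diagnostics() (map[string]interface{}, error)
    }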
Through profiling of writes, point.Fields() and point.Name() were called
repeatedly in PointsWriter and the Shard. These calls are somewhat
expensive when writing large batches, so we cache them to avoid wasting
CPU cycles.
Using influx_stress with default settings:
Before:
    Wrote 10000000 points at average rate of 202570
    Average response time: 235.450355ms
After:
    Wrote 10000000 points at average rate of 246120
    Average response time: 182.881008ms
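The caching pattern is simply lazy memoization on the point; a sketch with stubbed-out parsing (not the actual point implementation):

    package tsdb

    type point struct {
        key  []byte
        data []byte

        cachedName   string                 // measurement name, parsed once
        cachedFields map[string]interface{} // fields, unmarshalled once
    }

    // Fields unmarshals the field data on first use and caches the
    // result, since PointsWriter and Shard both need it for every point
    // in a batch.
    func (p *point) Fields() map[string]interface{} {
        if p.cachedFields == nil {
            p.cachedFields = unmarshalFields(p.data)
        }
        return p.cachedFields
    }

    func (p *point) Name() string {
        if p.cachedName == "" {
            p.cachedName = parseName(p.key)
        }
        return p.cachedName
    }

    func unmarshalFields(b []byte) map[string]interface{} {
        return map[string]interface{}{} // decoding elided in this sketch
    }

    func parseName(key []byte) string {
        return string(key) // series-key parsing elided in this sketch
    }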
We now redact the credentials in the logger, so the function implemented
by the deleted lines is redundant.
Signed-off-by: Jon Seymour <jon@wildducktheories.com>
Previously, password redaction only occurred inside the authentication
handler. That handler is not on the request path for OPTIONS requests
and, in any case, would not be invoked for them because the CORS
handler returns early on OPTIONS requests.
Now, we change the response logger to explicitly replace any
occurrence of the 'p' parameter in the query string with
'[REDACTED]' prior to logging the response.
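A sketch of the redaction using the standard net/url package (the function name is illustrative):

    package main

    import (
        "fmt"
        "net/url"
    )

    // redactPassword rewrites a request query string, replacing any
    // value of the "p" (password) parameter with [REDACTED] before the
    // line is logged, so OPTIONS requests that bypass the auth handler
    // are covered too.
    func redactPassword(rawQuery string) string {
        q, err := url.ParseQuery(rawQuery)
        if err != nil || len(q["p"]) == 0 {
            return rawQuery
        }
        q.Set("p", "[REDACTED]")
        return q.Encode()
    }

    func main() {
        fmt.Println(redactPassword("q=SHOW+DATABASES&u=admin&p=secret"))
        // p=%5BREDACTED%5D&q=SHOW+DATABASES&u=admin (Encode sorts keys)
    }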
Signed-off-by: Jon Seymour <jon@wildducktheories.com>
If no chunking was requested by the user, the coordinating node buffers
all results in RAM before emitting a single result. However, buffering
was not merging results for rows that contained data for the same
series. This change fixes that.
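The missing merge step just folds rows with the same series key together; a sketch with a stripped-down row type (not the actual influxql code):

    package main

    import "fmt"

    // Row is a minimal stand-in for an influxql result row.
    type Row struct {
        Name   string
        Tags   string // serialized tag set
        Values [][]interface{}
    }

    // mergeRows appends values for rows that belong to the same series
    // (same name and tags) instead of emitting duplicate rows, which is
    // what the unchunked buffering path failed to do.
    func mergeRows(rows []Row) []Row {
        var out []Row
        index := make(map[string]int)
        for _, r := range rows {
            key := r.Name + "\x00" + r.Tags
            if i, ok := index[key]; ok {
                out[i].Values = append(out[i].Values, r.Values...)
                continue
            }
            index[key] = len(out)
            out = append(out, r)
        }
        return out
    }

    func main() {
        rows := mergeRows([]Row{
            {Name: "cpu", Values: [][]interface{}{{1, 10.0}}},
            {Name: "cpu", Values: [][]interface{}{{2, 20.0}}},
        })
        fmt.Println(len(rows), rows[0].Values) // 1 [[1 10] [2 20]]
    }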
Fixes issue #3242.
If the hinted-handoff segment is corrupt, the size read could be
invalid, and attempting to create a slice using that size causes a
panic. Ideally we'd have a checksum on the segment record, but for now
just return an error when the size is larger than the segment file.
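A sketch of the bounds check on a record's length prefix (the 8-byte big-endian prefix here is an assumption for illustration, not the actual segment layout):

    package hh

    import (
        "encoding/binary"
        "fmt"
        "os"
    )

    // nextRecordSize reads a record's length prefix and refuses to
    // allocate a slice for it when the claimed size exceeds the segment
    // file itself, returning an error instead of panicking on a corrupt
    // segment.
    func nextRecordSize(f *os.File, pos int64) (int64, error) {
        fi, err := f.Stat()
        if err != nil {
            return 0, err
        }
        var prefix [8]byte
        if _, err := f.ReadAt(prefix[:], pos); err != nil {
            return 0, err
        }
        sz := int64(binary.BigEndian.Uint64(prefix[:]))
        if sz > fi.Size() {
            return 0, fmt.Errorf("record size %d exceeds segment size %d: corrupt segment", sz, fi.Size())
        }
        return sz, nil
    }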
Fixes #3687.
* Capitalize first letter of message
* Log all services starting consistently
* Remove some extraneous log statements in meta.Store
* Log data dirs for meta, data and hinted handoff
If a short write has occurred, we do not have enough bytes to determine
the size of the payload. This is a corrupted record that we should drop.
Instead of panicking, log the error and advance the queue, since the
error at this location is currently unrecoverable.
Fixes #3436.