This fixes a couple of issues with starting meta-only nodes.
1. We were always calling CreateDataNode regardless of whether the the
node is running data services. We only call that now when node is
data enabled.
2. The node.json was created along-side creating the data node. Since
we are not creatinga a data node, this didn't happen anymore. There
wasn't a simple way to do this in one place so it's actually handle
for when creating a meta or a data node now. Since the ID assigned
to the node is the same regardless of role this works in all combinations
of roles.
3. The JoinMetaServer didn't return the ID of the joining node which
created some races when multiple nodes were joining. The join call now
returns that information to the caller.
Fixes#5754
The join option was incorrectly exposed on the meta config. It should
be at the top-level as a string and propogate down to the meta config
as a slice.
The cache had some incorrect logic for determine when a series needed
to be deduplicated. The logic was checking for unsorted points and
not considering duplicate points. This would manifest itself as many
points (duplicate) points being returned from the cache and after a
snapshot compaction run, the points would disappear because snapshot
compaction always deduplicates and sorts the points.
Added a test that reproduces the issue.
Fixes#5719
The intent of this change is to avoid writing caches created for
snapshot cache instances into the tsm1_cache measurement. We can do
this by avoiding use of the NewCache constructor. All other methods
are only intended to be called from on the engine cache - never
on a snapshot.
Signed-off-by: Jon Seymour <jon@wildducktheories.com>
Since we are not locking but relying on atomic arithmetic,
use Add rather than Set. Will also result in slightly less garbage
being created.
Signed-off-by: Jon Seymour <jon@wildducktheories.com>
The intent of this change is to ensure that all statistic fields of the
resulting tsm1_cache measurement are initialized on initialization of
the cache. That way, any consumer of those measurements doesn't
have to deal with the null case.
Signed-off-by: Jon Seymour <jon@wildducktheories.com>
Influx does not support fields with empty names or points
with no fields.
NewPoint is changed to validate that all field names are non-empty.
AddField is removed because we now require that all fields are
specified on construction.
NewPointFromByte is changed to return an error if a unmarshaled
binary point does not have any fields.
newFieldsFromBinary is changed to prevent an infinite loop that
can arise while attempting to parse corrupt binary point data.
TestNewPointsWithBytesWithCorruptData is changed to reflect the
change in the behaviour of NewPointFromByte.
Signed-off-by: Jon Seymour <jon@wildducktheories.com>
Complementing and extending the changes in #5758.
Add 2 level statistics:
* snapshotCount
* cacheAgeMs
Add 2 counter statistics
* cachedBytes
* WALCompactionTimeMs
snapshotCount can be used to measure transient write errors that are causing snapshots to accumulate
cacheAgeMs can be used to guage the level of write activity into the cache
The differences between cachedBytes stats sampled at different times can be used to calculate cache throughput rates
The ratio (cachedBytes-diskBytes)/WALCompactionTimeMs can be used calculate WAL compaction throughput.
The ratio of difference between first and last WAL compaction time over the interval
length is an estimate of percentage of cache throughput consumed.
Signed-off-by: Jon Seymour <jon@wildducktheories.com>
Fixes#5653 and #5394.
Previously dropping retention policies did not propogate to local TSDB
shards. Instead, the retention policiess would just be removed from the
Meta Store.
This PR adds ensures that data associated with retention policies is
removed, when the retention policy is dropped.
Also, it cleans up a couple of other methods in `tsdb`, including the
requirement to provide (redundant) shardIDs when deleting databases.
These tests check that NewPoint and NewPointFromBytes return an error if:
* arguments specify a point with no fields
* arguments specify a point with a field that has an empty name
These tests also check that Point.Fields() always returns, even in the presence
of corrupt binary data read with NewPointFromBytes.
These tests fail at this commit and are fixed by a subsequent commit.
Signed-off-by: Jon Seymour <jon@wildducktheories.com>
Dropping a meta node that had already been removed from the config
would fail because the raft.RemovePeers call would return an error
that the address was unknown. This change skips calling RemovePeer
if it doesn't exist.
Dropping a non-existing ID would hang for 10 seconds becuase the
meta.Client retryUntilExec didn't differentiate before command errors
and redirect errors. In this case, the command would return an error
but we'd try 10 more times and ultimately give up and return the error.
We now return immediately if the command returned and error because
retrying it will not succeed.
Finally, the join loop had no delay and would immediately try to join
the other nodes hundreds of times a second. We now pause a second if we've
tried every node at least once.
This fixes several issues related to the bind address and hostname:
* Allows bind addresses where a hostname or IP is not specified to
work correct and bind to all interfaces by default.
* Fixes the top-level "hostname" config option to allow overridding
all bind address hostnames. This allows a node to advertise a different
hostname than what is defined in the bind address setting.
* Adds the -hostname command-line option back to allow specifing
both -join and -hostname as command-line flags.
* Enforces a configuration precedence and overriding ability defined
as config file is overridden by env vars which are overriden by command-line
flags.
Fixes#5670#5671