Commit Graph

9685 Commits (667ada3906710ca0f52b4054ce68e9388a7c2155)

Author SHA1 Message Date
Jon Seymour 530b86ba7d tsm: cache: restore the semantics of cachedBytes and memSize stats
Fixes #5805.

This commit undoes a regression introduced by #5789.

Signed-off-by: Jon Seymour <jon@wildducktheories.com>
2016-02-24 06:16:46 +11:00
Jon Seymour 3475356dc9 tsm: cache: fix semantics of snapshotCount statistic to make it useful.
Fix for #5804.

The commit for #5789 rendered the semantics of snapshotCount statistic
useless. This commit restores semantics that have diagnostic value to
this statistic.

Signed-off-by: Jon Seymour <jon@wildducktheories.com>
2016-02-24 06:13:54 +11:00
Jason Wilder e0b23fd5b0 Merge pull request #5765 from oiooj/master
No need check Meta.Dir twice
2016-02-23 11:20:24 -07:00
Jason Wilder 92ae2a0e2d Merge pull request #5787 from chris-ramon/handler-query-authorizer
Improvement on `run.NewServer` related to `meta.QueryAuthorizer`.
2016-02-23 11:11:56 -07:00
Gunnar 54957a4838 Merge pull request #5785 from influxdata/ga-build
Implement --generate option in build script
2016-02-23 09:54:39 -08:00
Jason Wilder e7c29d5a37 Merge pull request #5789 from influxdata/jw-5686
Simplify cache snapshotting
2016-02-23 10:46:54 -07:00
gunnaraasen e2d83e53cc Implement -generate option in build script 2016-02-23 09:08:25 -08:00
Jason Wilder 017c24c98e Simplify cache snapshotting
The Cache had support for taking multiple snapshots to support writing
multiple snapshots to TSM files concurrently if that happened to be
a bottleneck.  In practice, this is never a bottleneck and we only
run one snappshoting goroutine continously per shard which has worked
well for all workloads.

The multiple snapshot support introduces some unhandled failure scenarios
where wal segments could be removed without writing them to TSM files.  If
a snapshot compaction fails to write due to transient disk errors, subsequent
snapshots will continue, but the failed one will not be retried.  When the
subsequent ones succeeded, all closed wal segments are removed causing data
loss.

This change simplifies the snapshotting capability to ensure that there is only
ever one snapshot.  If one fails, the next snapshot will update the existing
snapshot and retry all of old and new data.

Fixes #5686
2016-02-23 09:38:51 -07:00
Jason Wilder 0df6d558c2 Merge pull request #5800 from influxdata/jw-5757-regression
Fix data nodes not getting created
2016-02-23 09:22:03 -07:00
Jason Wilder 9ead458399 Fix data nodes not getting created
This fixes a regression introduced in #5757 due to the node.ID getting
assigned by both the meta and data services.  When both roles are active,
the data CreateDataNode path was not getting called because a node ID was
already assigned.

This fixes the issue by seeing if a DataNode already exists for our node
ID, and if it does not, we create one.
2016-02-23 09:01:02 -07:00
Jonathan A. Sternberg 53056d862b Eliminating dead code in `(*influxql.SelectStatement).RewriteWildcards()`
The dimensions array in `RewriteWildcards` gets emptied by an earlier
section of the code and then tries to iterate over that empty slice to
append it to the list of dimensions.

That makes the loop dead code that can't ever be hit.

Also improve the efficiency of this method by not creating a new slice
when there are no wildcards. We already check at the beginning of the
function if there is a wildcard out of necessity. There's no point in
making a new slice and copying the contents if we know that there will
be no wildcards to expand.

It also improves memory efficiency by assuming that if a wildcard
exists, there is only one and the pre-allocated slice can take advantage
of that. If there are multiple wildcards, then a new slice will have to
be created in the middle of the loop to raise the capacity.
2016-02-23 10:36:01 -05:00
Edd Robinson 6f1c02fdbe Reconfigure shards and shard groups on node deletion
Fixes #5680.

When dropping a data node, the following will now happen on the
Meta Store.

  1) If any shards no longer have any owners (because the data node
     being dropped is the only owner), they will be reassigned a
     new owner from within their respective shard group.
  2) If a shard group no longer has any shards/data nodes, they will
     be marked as deleted.

When a shard is being assigned a new owner a data node with the fewest
number of shards in the shard group will be selected as the new owner.

Finally, checking the validity of a data node's ID now happens in the
Meta store, rather than in the state machine.
2016-02-23 15:35:43 +00:00
Jonathan A. Sternberg f7ef382596 Remove dimensions from field wildcards
When a wildcard is specified for the field but not the dimensions, the
dimensions get added to the list of fields as part of
`RewriteWildcards()`.

But when a dimension was given with no wildcard, the dimension didn't
get removed from the wildcard in the fields section. This teaches the
rewriter to disclude dimensions explicitly included from being expanded
as a field. Now this statement when a measurement has one tag named host
and a field named value:

    SELECT * FROM cpu GROUP BY host

Would expand to this:

    SELECT value FROM cpu GROUP BY host

Instead of this:

    SELECT host, value FROM cpu GROUP BY host

If you want the latter behavior, you can include it like this:

    SELECT host, * FROM cpu GROUP BY host

Fixes #5770.
2016-02-23 10:22:56 -05:00
Jonathan A. Sternberg 2837f641d3 Update toml dependency for slice panic when reading the config
The bug was fixed by BurntSushi/toml#84.

Also adding gdm install to `make tools`.
2016-02-23 08:45:01 -05:00
Edd Robinson 08ca148724 Set the retention policy on the store 2016-02-23 11:32:07 +00:00
Chris Ramón f235852c0b updates changelog 2016-02-23 00:03:32 -05:00
Chris Ramón e52accaf90 adds missing srv.Handler.QueryAuthorizer 2016-02-23 00:02:48 -05:00
Jason Wilder 2894234b1e Merge pull request #5757 from influxdata/jw-cluster
Meta node only fixes
2016-02-22 15:44:07 -07:00
Jonathan A. Sternberg 50753de032 Merge pull request #5782 from influxdata/js-5777-audit-panics-in-influxql
Remove the non-unreachable panics in the new query engine
2016-02-22 17:18:57 -05:00
Jason Wilder 6f39b355bc Code cleanups 2016-02-22 15:06:05 -07:00
Jason Wilder a2d3d44505 Fix creating meta only nodes
This fixes a couple of issues with starting meta-only nodes.

1. We were always calling CreateDataNode regardless of whether the the
node is running data services.  We only call that now when node is
data enabled.
2. The node.json was created along-side creating the data node. Since
we are not creatinga a data node, this didn't happen anymore.  There
wasn't a simple way to do this in one place so it's actually handle
for when creating a meta or a data node now.  Since the ID assigned
to the node is the same regardless of role this works in all combinations
of roles.
3. The JoinMetaServer didn't return the ID of the joining node which
created some races when multiple nodes were joining.  The join call now
returns that information to the caller.

Fixes #5754
2016-02-22 15:06:05 -07:00
Jason Wilder 194d8d4693 Ensure monitor store is disabled for meta only nodes
We can't store points locally so ensure it's disabled for now.
2016-02-22 15:05:47 -07:00
Jason Wilder a437002969 Fix join option in config file
The join option was incorrectly exposed on the meta config.  It should
be at the top-level as a string and propogate down to the meta config
as a slice.
2016-02-22 15:05:46 -07:00
Mark Rushakoff 7f457b8852 Merge pull request #5786 from influxdata/mr-fix-tsm1-test-compilation
Fix non-compiling test
2016-02-22 14:04:05 -08:00
Mark Rushakoff 191de2670c Fix non-compiling test 2016-02-22 13:49:11 -08:00
Mark Rushakoff fc5c8597ab Merge pull request #5758 from influxdata/mr-disk-stats
Track cache, WAL, filestore stats within tsm1 engine
2016-02-22 13:01:55 -08:00
Mark Rushakoff 688863cec5 Update changelog 2016-02-22 12:51:52 -08:00
Jason Wilder e25b5abf61 Merge pull request #5751 from influxdata/jw-5719
Fix cache not deduplicating points in some cases
2016-02-22 13:41:17 -07:00
Jason Wilder aa2e878019 Fix cache not deduplicating points in some cases
The cache had some incorrect logic for determine when a series needed
to be deduplicated.  The logic was checking for unsorted points and
not considering duplicate points.  This would manifest itself as many
points (duplicate) points being returned from the cache and after a
snapshot compaction run, the points would disappear because snapshot
compaction always deduplicates and sorts the points.

Added a test that reproduces the issue.

Fixes #5719
2016-02-22 13:24:42 -07:00
Jonathan A. Sternberg 7a03df2af1 Remove the non-unreachable panics in the new query engine
The only panics left are ones that should be unreachable unless there is
a bug.

Fixes #5777.
2016-02-22 12:52:43 -05:00
Mark Rushakoff c7223157a6 Merge commit 'c93da21' into mr-disk-stats 2016-02-22 09:32:56 -08:00
Jonathan A. Sternberg b6a0b6a65a Merge pull request #5742 from influxdata/js-ensure-non-empty-column-names
Ensure column names get implicitly renamed with conflicts
2016-02-22 08:55:38 -05:00
Jonathan A. Sternberg 87e04b1a46 Merge pull request #5776 from influxdata/js-5773-unsupported-call-panic
Replace a panic with returning an error when an unsupported call is used
2016-02-22 08:17:59 -05:00
Edd Robinson 10b9befd82 Merge pull request #5716 from jonseymour/js-tolerate-empty-field-names
models: tolerate empty field names when unpacking binary points
2016-02-22 12:44:45 +00:00
Jon Seymour c93da21a61 tsm: cache: only use NewCache for engine cache's snapshots use a simpler constructor
The intent of this change is to avoid writing caches created for
snapshot cache instances into the tsm1_cache measurement. We can do
this by avoiding use of the NewCache constructor. All other methods
are only intended to be called from on the engine cache - never
on a snapshot.

Signed-off-by: Jon Seymour <jon@wildducktheories.com>
2016-02-22 15:17:43 +11:00
Jonathan A. Sternberg 6982d5310e Replace a panic with returning an error when an unsupported call is used
Fixes #5773.
2016-02-21 19:39:14 -05:00
Mark Rushakoff 2ab79e75eb Merge pull request #5775 from jonseymour/jss-5499-extend-tsm-cache-stats
tsm: cache: during writes, update the memSize statistic outside the lock
2016-02-21 14:36:56 -08:00
Jon Seymour 510ee2c790 tsm: cache: during writes, update the memSize statistic outside the lock
Since we are not locking but relying on atomic arithmetic,
use Add rather than Set. Will also result in slightly less garbage
being created.

Signed-off-by: Jon Seymour <jon@wildducktheories.com>
2016-02-22 08:26:35 +11:00
Mark Rushakoff feceb4dae1 Merge pull request #5769 from jonseymour/jss-5499-extend-tsm-cache-stats
tsm: cache: ensure all statistics are initialised on cache creation.
2016-02-21 07:25:49 -08:00
Jon Seymour 9c6efe99f1 tsm: cache: ensure all statistics are initialised on cache creation.
The intent of this change is to ensure that all statistic fields of the
resulting tsm1_cache measurement are initialized on initialization of
the cache. That way, any consumer of those measurements doesn't
have to deal with the null case.

Signed-off-by: Jon Seymour <jon@wildducktheories.com>
2016-02-21 15:33:50 +11:00
Mark Rushakoff 04645188fa Merge pull request #5762 from jonseymour/jss-5499-extend-tsm-cache-stats
tsm: cache: add cache throughput related statistics.
2016-02-20 10:15:17 -08:00
Jon Seymour a8877badcd Update CHANGELOG for #5716, #5664
Note that this series also includes cherry-pick of #5697.

Signed-off-by: Jon Seymour <jon@wildducktheories.com>
2016-02-21 03:56:37 +11:00
oiooj f1c027543c No need check Meta.Dir twice 2016-02-20 23:54:24 +08:00
Jon Seymour d46e0407a0 Merge #5716
RHS merges cleanly with 0.10.0 maintenance branch.

Signed-off-by: Jon Seymour <jon@wildducktheories.com>
2016-02-20 22:24:03 +11:00
Jon Seymour 9491846047 models: improve handling of points with empty field names or with no fields
Influx does not support fields with empty names or points
with no fields.

NewPoint is changed to validate that all field names are non-empty.

AddField is removed because we now require that all fields are
specified on construction.

NewPointFromByte is changed to return an error if a unmarshaled
binary point does not have any fields.

newFieldsFromBinary is changed to prevent an infinite loop that
can arise while attempting to parse corrupt binary point data.

TestNewPointsWithBytesWithCorruptData is changed to reflect the
change in the behaviour of NewPointFromByte.

Signed-off-by: Jon Seymour <jon@wildducktheories.com>
2016-02-20 22:22:26 +11:00
Jon Seymour 6697c721fb tsm: cache: add cache throughput related statistics.
Complementing and extending the changes in #5758.

Add 2 level statistics:

  * snapshotCount
  * cacheAgeMs

Add 2 counter statistics

  * cachedBytes
  * WALCompactionTimeMs

snapshotCount can be used to measure transient write errors that are causing snapshots to accumulate

cacheAgeMs can be used to guage the level of write activity into the cache

The differences between cachedBytes stats sampled at different times can be used to calculate cache throughput rates

The ratio (cachedBytes-diskBytes)/WALCompactionTimeMs can be used calculate WAL compaction throughput.

The ratio of difference between first and last WAL compaction time over the interval
length is an estimate of percentage of cache throughput consumed.

Signed-off-by: Jon Seymour <jon@wildducktheories.com>
2016-02-20 22:18:57 +11:00
Mark Rushakoff 602043e11b Add disk stats for FileStore 2016-02-19 16:37:34 -08:00
Mark Rushakoff d99c09cedd Add stats for current and old WAL segment sizes 2016-02-19 16:37:34 -08:00
Mark Rushakoff e76967efb6 Add stats to tsm1.Cache 2016-02-19 16:37:34 -08:00
Gunnar c1de263858 Merge pull request #5710 from influxdata/ga-admin
Move admin UI assets to the admin service directory
2016-02-19 15:36:23 -08:00