Commit Graph

9760 Commits (b12cf04a73cce3780fb469bede6e4467dd9255f2)

Author SHA1 Message Date
Goutham Veeramachaneni 8ff1143aaa Added Pull Request Template
As we are already making use of the template in Contributing.md, it makes sense to use GitHub's new feature! Check https://github.com/blog/2111-issue-and-pull-request-templates
2016-02-25 20:25:23 +05:30
Jon Seymour 4d98a1cf28 tsm: cache: remove unnecessary lock escalation.
Previously, we needed a write lock on the cache because it was the
only lock we had available to guard updates to entry.values and
entry.needSort.

However, now that we have an entry-scoped lock for this purpose, we
don't need the cache write lock. Since merged() doesn't
modify the .store or the c.snapshot.sort, there is no need for
a write lock on the cache to protect the cache.

So, we don't need to escalate here - we simply rely on the entry lock
to protect the entries we are iterating over.

Signed-off-by: Jon Seymour <jon@wildducktheories.com>
2016-02-26 01:31:54 +11:00
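
The locking arrangement described in 4d98a1cf28 and 452d77cbaf can be illustrated with a minimal sketch. This is not the tsm1 code: the `cache` and `entry` types, their fields, and the `write`/`values` methods are hypothetical simplifications. It assumes each entry carries its own mutex guarding `values` and `needSort`, so a reader that triggers a sort only needs the cache-wide read lock to find the entry.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// entry holds the values for one key. Its own mutex guards values and
// needSort, so sorting does not require the cache-wide write lock.
// (Hypothetical type, simplified for illustration.)
type entry struct {
	mu       sync.Mutex
	values   []int64
	needSort bool
}

// add appends a value and records whether the slice is now out of order.
func (e *entry) add(v int64) {
	e.mu.Lock()
	defer e.mu.Unlock()
	if n := len(e.values); n > 0 && v < e.values[n-1] {
		e.needSort = true
	}
	e.values = append(e.values, v)
}

// sorted sorts in place if needed, under the entry lock, and returns a copy.
func (e *entry) sorted() []int64 {
	e.mu.Lock()
	defer e.mu.Unlock()
	if e.needSort {
		sort.Slice(e.values, func(i, j int) bool { return e.values[i] < e.values[j] })
		e.needSort = false
	}
	return append([]int64(nil), e.values...)
}

// cache maps keys to entries. The RWMutex guards only the map itself.
type cache struct {
	mu    sync.RWMutex
	store map[string]*entry
}

func newCache() *cache { return &cache{store: map[string]*entry{}} }

// write needs the cache write lock only to create a missing entry.
func (c *cache) write(key string, v int64) {
	c.mu.Lock()
	e, ok := c.store[key]
	if !ok {
		e = &entry{}
		c.store[key] = e
	}
	c.mu.Unlock()
	e.add(v)
}

// values takes only the cache read lock; the entry lock protects the sort.
func (c *cache) values(key string) []int64 {
	c.mu.RLock()
	e := c.store[key]
	c.mu.RUnlock()
	if e == nil {
		return nil
	}
	return e.sorted()
}

func main() {
	c := newCache()
	for _, v := range []int64{3, 1, 2} {
		c.write("cpu", v)
	}
	// Several readers may trigger the sort concurrently without escalating
	// to the cache write lock.
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = c.values("cpu")
		}()
	}
	wg.Wait()
	fmt.Println(c.values("cpu")) // prints [1 2 3]
}
```

The design point is that the read path never has to drop the read lock and reacquire a write lock, which is the escalation the commit removes.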
Edd Robinson aa845cec7e Check for shards needing conversion. Fixes #5723 2016-02-25 13:21:13 +00:00
Jason Wilder 452d77cbaf tsm: cache: introduce entry locks.
Based on @jwilder's alternative to the 'dirty' slice that featured
in previous iterations of this fix.

Suggested-by: Jason Wilder <jason@influxdb.com>
Signed-off-by: Jon Seymour <jon@wildducktheories.com>
2016-02-26 00:05:38 +11:00
Jon Seymour eb7eec078d tsm: cache: introduce commit lock to Cache
Currently two compactors can execute Engine.WriteSnapshot at once.

This isn't thread safe since both threads want to make modifications to
Cache.snapshot at the same time.

This commit introduces a lock which is acquired during Snapshot() and
released during ClearSnapshot(), ensuring that at most one thread
executes within Engine.WriteSnapshot() at once.

To ensure that we always release this lock, but only release the
snapshot resources on a successful commit, we modify ClearSnapshot() to
accept a boolean which indicates whether the write was successful or not
and guarantee to call this function if Snapshot() has been called.

Signed-off-by: Jon Seymour <jon@wildducktheories.com>
2016-02-25 12:10:37 +11:00
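
A rough sketch of the commit-lock protocol from eb7eec078d follows. The `Cache` type here is a hypothetical stand-in with a plain map as its store, and `writeSnapshot` and `errDisk` are illustrative names, not functions from the engine. It assumes the lock is taken in `Snapshot()` and always released in `ClearSnapshot(success)`, with the snapshot discarded only when the write succeeded.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errDisk simulates a transient write failure (illustrative only).
var errDisk = errors.New("transient disk error")

// Cache is a hypothetical, heavily simplified stand-in for the tsm1 cache.
type Cache struct {
	mu       sync.RWMutex // guards store and snapshot
	commit   sync.Mutex   // held from Snapshot() until ClearSnapshot()
	store    map[string][]float64
	snapshot map[string][]float64
}

func NewCache() *Cache { return &Cache{store: map[string][]float64{}} }

func (c *Cache) Write(key string, v float64) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.store[key] = append(c.store[key], v)
}

// Snapshot acquires the commit lock and folds the live store into the
// snapshot, so data from a previously failed write is carried forward.
func (c *Cache) Snapshot() map[string][]float64 {
	c.commit.Lock()
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.snapshot == nil {
		c.snapshot = map[string][]float64{}
	}
	for k, vs := range c.store {
		c.snapshot[k] = append(c.snapshot[k], vs...)
	}
	c.store = map[string][]float64{}
	return c.snapshot
}

// ClearSnapshot always releases the commit lock, but drops the snapshot
// only when the write succeeded.
func (c *Cache) ClearSnapshot(success bool) {
	c.mu.Lock()
	if success {
		c.snapshot = nil
	}
	c.mu.Unlock()
	c.commit.Unlock()
}

// writeSnapshot guarantees ClearSnapshot is called once Snapshot has been.
func writeSnapshot(c *Cache, write func(map[string][]float64) error) (err error) {
	snap := c.Snapshot()
	defer func() { c.ClearSnapshot(err == nil) }()
	return write(snap)
}

func main() {
	c := NewCache()
	c.Write("cpu", 1)

	// A failed write releases the lock but keeps the snapshot for a retry.
	err := writeSnapshot(c, func(map[string][]float64) error { return errDisk })
	fmt.Println(err, len(c.snapshot["cpu"]))

	// The next attempt sees both the retained and the new value.
	c.Write("cpu", 2)
	err = writeSnapshot(c, func(s map[string][]float64) error {
		fmt.Println(len(s["cpu"]))
		return nil
	})
	fmt.Println(err, c.snapshot == nil)
}
```

Using a deferred call tied to the named return value is one way to honor the "always release, only clear on success" rule without repeating the cleanup on every return path.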
Jon Seymour 45d025db99 tsm: cache: add tests to demonstrate thread safety vulnerabilities
There are two tests that show two different vulnerabilities.

One test shows that Cache.Deduplicate modifies entries in a snapshot's
store without a lock while cache readers are deduplicating those same
entries while correctly locked.

A second test shows that two threads trying to execute the methods
that Engine.WriteSnapshot calls will cause concurrent, unsynchronized
mutating access to the snapshot's store and entries.

The tests fail at this commit and are fixed by subsequent commits.

Signed-off-by: Jon Seymour <jon@wildducktheories.com>
2016-02-25 12:10:31 +11:00
Jon Seymour d7d81f79da tsm: cache: add a test that demonstrates concurrent reads are safe
Signed-off-by: Jon Seymour <jon@wildducktheories.com>
2016-02-25 12:06:10 +11:00
Sean Beckett bb378732c0 Update README.md 2016-02-24 16:40:21 -08:00
Jonathan A. Sternberg 2fb3c0c42c Merge pull request #5822 from Gouthamve/master
Linted cmd/ packages. Refs #4098
2016-02-24 18:26:03 -05:00
Goutham Veeramachaneni b1d7e59546 Lint cmd/ packages
Related to #4098

Lint cmd/influxd/

* Errors cannot end with punctuation
* Better comment for exported method
* Better control flow when return is present

Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>

Linted cmd/influx_tsm

* Added comments to exported fields
* Removed punctuation at the end of errors

Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>

Linted cmd/influx_tsm/b1 and cmd/influx_tsm/bz1

* Added comments to exported fields

Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>

Linted cmd/influx_tsm/tsdb

* Added comments to exported fields
* range k, _ :=  can be written as range k :=
* removed else when return is present
* Added consistency to receiver names in methods

Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>

Fix typos

Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>
2016-02-25 01:44:23 +05:30
Jonathan A. Sternberg eb27143422 Merge pull request #5819 from influxdata/js-5814-cqs-with-the-same-name
Fix running multiple CQs with the same name
2016-02-24 14:44:19 -05:00
Jonathan A. Sternberg 0730573e27 Fix running multiple CQs with the same name
Previously, CQs with the same name would be stored in the last run map
under the same key. This caused only one of the CQs to run because after
the first one ran it would update the last run time for all CQs with the
same name.

Add the database name to the CQ ID in the last run map to differentiate
between CQs in different databases.

Fixes #5814.
2016-02-24 12:23:52 -05:00
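
The fix in 0730573e27 amounts to keying the last-run map by database as well as by CQ name. The sketch below is an assumption about the shape of that idea, not the continuous query service itself: `cqKey`, `lastRuns`, `shouldRun`, `markRun`, `start`, and `interval` are all hypothetical names.

```go
package main

import (
	"fmt"
	"time"
)

// cqKey identifies a continuous query by database and name, so two CQs
// with the same name in different databases get separate entries.
// (Hypothetical type; the real ID format is not shown in the commit.)
type cqKey struct{ database, name string }

// lastRuns records when each CQ last ran.
type lastRuns map[cqKey]time.Time

// shouldRun reports whether the CQ has never run or is due again.
func (l lastRuns) shouldRun(database, name string, now time.Time, every time.Duration) bool {
	last, ok := l[cqKey{database, name}]
	return !ok || now.Sub(last) >= every
}

// markRun stores the run time for this database/name pair only.
func (l lastRuns) markRun(database, name string, now time.Time) {
	l[cqKey{database, name}] = now
}

// Fixed values used by the demonstration below.
var start = time.Date(2016, 2, 24, 12, 0, 0, 0, time.UTC)

const interval = time.Minute

func main() {
	runs := lastRuns{}
	// With a name-only key, the second database would be skipped here.
	for _, db := range []string{"db0", "db1"} {
		if runs.shouldRun(db, "downsample", start, interval) {
			runs.markRun(db, "downsample", start)
			fmt.Println("ran", db)
		}
	}
}
```

A struct key is used here instead of a concatenated string so that database or CQ names containing the separator cannot collide.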
Jonathan A. Sternberg 60e0d16475 Merge pull request #5798 from influxdata/js-5770-remove-dimensions-from-field-wildcards
Remove dimensions from field wildcards
2016-02-24 12:23:00 -05:00
Ben Johnson 23ef15b704 Merge pull request #5753 from influxdata/er-meta-queries
Support mutable meta queries in a cluster
2016-02-24 10:18:50 -07:00
David Norton b30647667b fix MetaExecutor max write connections 2016-02-24 11:24:45 -05:00
David Norton 0022cfe6fd add test for meta.MetaExecutor.ExecuteStatement 2016-02-24 11:24:45 -05:00
David Norton 13b3567449 check number of nodes in meta_executor 2016-02-24 11:24:45 -05:00
Edd Robinson f9c95ad266 More useful error message for node failure 2016-02-24 11:24:45 -05:00
David Norton bb97bb86d4 remove cruft from defer conn close 2016-02-24 11:24:45 -05:00
Edd Robinson 8add49fd96 Ensures meta queries work in clusters.
Fixes #5612, #5573 and #5518.

Using the MetaExecutor, queries that need to run on both data nodes
and optionally the meta store will be executed across all data nodes
in the cluster.
2016-02-24 11:24:45 -05:00
David Norton 4d4e382ddf Add a Meta Executor.
The Meta Executor will allow data nodes to execute queries
remotely on each other, via RPC calls.
2016-02-24 11:24:22 -05:00
Mark Rushakoff fb83374389 Track stats for number of series, measurements
Per database: track number of series and measurements
Per measurement: track number of series
2016-02-24 08:10:16 -08:00
Jonathan A. Sternberg 3cdb4c1c12 Merge pull request #5797 from influxdata/js-update-toml-dependency
Update toml dependency for slice panic when reading the config
2016-02-24 10:02:21 -05:00
Edd Robinson 16995b6c23 Add ShardError to provide context about shard that errored 2016-02-24 13:33:07 +00:00
Jason Wilder e32e5ff481 Merge pull request #5807 from jonseymour/jss-5804+5805
tsm: cache: undo statistics regressions #5804, #5805.
2016-02-23 13:46:27 -07:00
Jon Seymour 530b86ba7d tsm: cache: restore the semantics of cachedBytes and memSize stats
Fixes #5805.

This commit undoes a regression introduced by #5789.

Signed-off-by: Jon Seymour <jon@wildducktheories.com>
2016-02-24 06:16:46 +11:00
Jon Seymour 3475356dc9 tsm: cache: fix semantics of snapshotCount statistic to make it useful.
Fix for #5804.

The commit for #5789 rendered the semantics of snapshotCount statistic
useless. This commit restores semantics that have diagnostic value to
this statistic.

Signed-off-by: Jon Seymour <jon@wildducktheories.com>
2016-02-24 06:13:54 +11:00
Jason Wilder e0b23fd5b0 Merge pull request #5765 from oiooj/master
No need check Meta.Dir twice
2016-02-23 11:20:24 -07:00
Jason Wilder 92ae2a0e2d Merge pull request #5787 from chris-ramon/handler-query-authorizer
Improvement on `run.NewServer` related to `meta.QueryAuthorizer`.
2016-02-23 11:11:56 -07:00
Gunnar 54957a4838 Merge pull request #5785 from influxdata/ga-build
Implement --generate option in build script
2016-02-23 09:54:39 -08:00
Jason Wilder e7c29d5a37 Merge pull request #5789 from influxdata/jw-5686
Simplify cache snapshotting
2016-02-23 10:46:54 -07:00
gunnaraasen e2d83e53cc Implement -generate option in build script 2016-02-23 09:08:25 -08:00
Jason Wilder 017c24c98e Simplify cache snapshotting
The Cache had support for taking multiple snapshots to support writing
multiple snapshots to TSM files concurrently if that happened to be
a bottleneck.  In practice, this is never a bottleneck and we only
run one snapshotting goroutine continuously per shard which has worked
well for all workloads.

The multiple snapshot support introduces some unhandled failure scenarios
where wal segments could be removed without writing them to TSM files.  If
a snapshot compaction fails to write due to transient disk errors, subsequent
snapshots will continue, but the failed one will not be retried.  When the
subsequent ones succeed, all closed wal segments are removed, causing data
loss.

This change simplifies the snapshotting capability to ensure that there is only
ever one snapshot.  If one fails, the next snapshot will update the existing
snapshot and retry all of old and new data.

Fixes #5686
2016-02-23 09:38:51 -07:00
Jason Wilder 0df6d558c2 Merge pull request #5800 from influxdata/jw-5757-regression
Fix data nodes not getting created
2016-02-23 09:22:03 -07:00
Jason Wilder 9ead458399 Fix data nodes not getting created
This fixes a regression introduced in #5757 due to the node.ID getting
assigned by both the meta and data services.  When both roles are active,
the data CreateDataNode path was not getting called because a node ID was
already assigned.

This fixes the issue by seeing if a DataNode already exists for our node
ID, and if it does not, we create one.
2016-02-23 09:01:02 -07:00
Jonathan A. Sternberg 53056d862b Eliminating dead code in `(*influxql.SelectStatement).RewriteWildcards()`
The dimensions array in `RewriteWildcards` gets emptied by an earlier
section of the code, and the code then tries to iterate over that empty
slice to append it to the list of dimensions.

That makes the loop dead code that can't ever be hit.

Also improve the efficiency of this method by not creating a new slice
when there are no wildcards. We already check at the beginning of the
function if there is a wildcard out of necessity. There's no point in
making a new slice and copying the contents if we know that there will
be no wildcards to expand.

It also improves memory efficiency by assuming that if a wildcard
exists, there is only one and the pre-allocated slice can take advantage
of that. If there are multiple wildcards, then a new slice will have to
be created in the middle of the loop to raise the capacity.
2016-02-23 10:36:01 -05:00
Edd Robinson 6f1c02fdbe Reconfigure shards and shard groups on node deletion
Fixes #5680.

When dropping a data node, the following will now happen on the
Meta Store.

  1) If any shards no longer have any owners (because the data node
     being dropped is the only owner), they will be reassigned a
     new owner from within their respective shard group.
  2) If a shard group no longer has any shards/data nodes, it will
     be marked as deleted.

When a shard is being assigned a new owner, the data node with the fewest
shards in the shard group will be selected as the new owner.

Finally, checking the validity of a data node's ID now happens in the
Meta store, rather than in the state machine.
2016-02-23 15:35:43 +00:00
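
The reassignment rules described in 6f1c02fdbe can be sketched as follows. The `Shard` and `ShardGroup` types and the `dropNode` and `leastLoaded` functions are hypothetical simplifications, not the meta store's data model. Breaking ties by lowest node ID is an assumption added to keep the result deterministic.

```go
package main

import (
	"fmt"
	"sort"
)

// Shard and ShardGroup are hypothetical, simplified stand-ins for the
// meta store's shard metadata.
type Shard struct {
	ID     uint64
	Owners []uint64 // IDs of the data nodes that own this shard
}

type ShardGroup struct {
	Shards  []Shard
	Deleted bool
}

// dropNode removes nodeID from every shard in the group. Shards left
// without an owner are handed to the remaining node that owns the fewest
// shards in the group. A group with no remaining nodes is marked deleted.
func dropNode(sg *ShardGroup, nodeID uint64) {
	counts := map[uint64]int{} // shards owned per remaining node
	for i := range sg.Shards {
		sh := &sg.Shards[i]
		kept := sh.Owners[:0]
		for _, o := range sh.Owners {
			if o != nodeID {
				kept = append(kept, o)
				counts[o]++
			}
		}
		sh.Owners = kept
	}
	if len(counts) == 0 {
		sg.Deleted = true
		return
	}
	for i := range sg.Shards {
		sh := &sg.Shards[i]
		if len(sh.Owners) > 0 {
			continue
		}
		owner := leastLoaded(counts)
		sh.Owners = append(sh.Owners, owner)
		counts[owner]++
	}
}

// leastLoaded returns the node with the fewest shards. Ties go to the
// lowest node ID (an assumption made for determinism).
func leastLoaded(counts map[uint64]int) uint64 {
	ids := make([]uint64, 0, len(counts))
	for id := range counts {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(i, j int) bool { return ids[i] < ids[j] })
	best := ids[0]
	for _, id := range ids[1:] {
		if counts[id] < counts[best] {
			best = id
		}
	}
	return best
}

func main() {
	sg := &ShardGroup{Shards: []Shard{
		{ID: 1, Owners: []uint64{1}},
		{ID: 2, Owners: []uint64{2}},
		{ID: 3, Owners: []uint64{2, 3}},
	}}
	dropNode(sg, 1)
	// Node 2 owns two shards and node 3 owns one, so shard 1 goes to node 3.
	fmt.Println(sg.Shards[0].Owners, sg.Deleted) // prints [3] false
}
```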
Jonathan A. Sternberg f7ef382596 Remove dimensions from field wildcards
When a wildcard is specified for the field but not the dimensions, the
dimensions get added to the list of fields as part of
`RewriteWildcards()`.

But when a dimension was given with no wildcard, the dimension didn't
get removed from the wildcard in the fields section. This teaches the
rewriter to exclude explicitly included dimensions from being expanded
as a field. Now, when a measurement has one tag named host and a field
named value, this statement:

    SELECT * FROM cpu GROUP BY host

Would expand to this:

    SELECT value FROM cpu GROUP BY host

Instead of this:

    SELECT host, value FROM cpu GROUP BY host

If you want the latter behavior, you can include it like this:

    SELECT host, * FROM cpu GROUP BY host

Fixes #5770.
2016-02-23 10:22:56 -05:00
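
The expansion rule from f7ef382596 can be approximated with a small function over plain string slices. This is only a sketch: `rewriteWildcard` and its parameters are hypothetical, it works on names instead of the influxql AST, and emitting tags before fields in the given order is an assumption.

```go
package main

import "fmt"

// rewriteWildcard expands each "*" in fields into the measurement's tag
// keys and field keys, skipping any tag that already appears in the
// GROUP BY dimensions. Explicitly named fields are kept as-is.
// (Hypothetical helper; the real rewriter operates on the influxql AST.)
func rewriteWildcard(fields, dimensions, tagKeys, fieldKeys []string) []string {
	grouped := make(map[string]bool, len(dimensions))
	for _, d := range dimensions {
		grouped[d] = true
	}
	var out []string
	for _, f := range fields {
		if f != "*" {
			out = append(out, f)
			continue
		}
		for _, t := range tagKeys {
			if !grouped[t] {
				out = append(out, t)
			}
		}
		out = append(out, fieldKeys...)
	}
	return out
}

func main() {
	tags, vals := []string{"host"}, []string{"value"}

	// SELECT * FROM cpu GROUP BY host
	fmt.Println(rewriteWildcard([]string{"*"}, []string{"host"}, tags, vals)) // prints [value]

	// SELECT host, * FROM cpu GROUP BY host
	fmt.Println(rewriteWildcard([]string{"host", "*"}, []string{"host"}, tags, vals)) // prints [host value]

	// SELECT * FROM cpu
	fmt.Println(rewriteWildcard([]string{"*"}, nil, tags, vals)) // prints [host value]
}
```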
Jonathan A. Sternberg 2837f641d3 Update toml dependency for slice panic when reading the config
The bug was fixed by BurntSushi/toml#84.

Also adding gdm install to `make tools`.
2016-02-23 08:45:01 -05:00
Edd Robinson 08ca148724 Set the retention policy on the store 2016-02-23 11:32:07 +00:00
Chris Ramón f235852c0b updates changelog 2016-02-23 00:03:32 -05:00
Chris Ramón e52accaf90 adds missing srv.Handler.QueryAuthorizer 2016-02-23 00:02:48 -05:00
Jason Wilder 2894234b1e Merge pull request #5757 from influxdata/jw-cluster
Meta node only fixes
2016-02-22 15:44:07 -07:00
Jonathan A. Sternberg 50753de032 Merge pull request #5782 from influxdata/js-5777-audit-panics-in-influxql
Remove the non-unreachable panics in the new query engine
2016-02-22 17:18:57 -05:00
Jason Wilder 6f39b355bc Code cleanups 2016-02-22 15:06:05 -07:00
Jason Wilder a2d3d44505 Fix creating meta only nodes
This fixes a couple of issues with starting meta-only nodes.

1. We were always calling CreateDataNode regardless of whether the
node is running data services.  We only call that now when the node is
data enabled.
2. The node.json was created along-side creating the data node. Since
we are not creating a data node, this didn't happen anymore.  There
wasn't a simple way to do this in one place so it's actually handled
when creating either a meta or a data node now.  Since the ID assigned
to the node is the same regardless of role this works in all combinations
of roles.
3. The JoinMetaServer didn't return the ID of the joining node which
created some races when multiple nodes were joining.  The join call now
returns that information to the caller.

Fixes #5754
2016-02-22 15:06:05 -07:00
Jason Wilder 194d8d4693 Ensure monitor store is disabled for meta only nodes
We can't store points locally so ensure it's disabled for now.
2016-02-22 15:05:47 -07:00
Jason Wilder a437002969 Fix join option in config file
The join option was incorrectly exposed on the meta config.  It should
be at the top-level as a string and propagate down to the meta config
as a slice.
2016-02-22 15:05:46 -07:00
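
One way the string-to-slice propagation described in a437002969 might look is sketched below. The `Config` and `MetaConfig` types, the `JoinPeers` field, the `propagateJoin` method, and the comma-separated format are all assumptions made for illustration; the commit message does not spell them out.

```go
package main

import (
	"fmt"
	"strings"
)

// MetaConfig and Config are hypothetical, trimmed-down config types.
type MetaConfig struct {
	JoinPeers []string // peers to join, as consumed by the meta service
}

type Config struct {
	Join string // top-level option, assumed to be comma-separated
	Meta MetaConfig
}

// propagateJoin splits the top-level join string and stores the
// non-empty, trimmed peers on the meta config.
func (c *Config) propagateJoin() {
	c.Meta.JoinPeers = nil
	for _, p := range strings.Split(c.Join, ",") {
		if p = strings.TrimSpace(p); p != "" {
			c.Meta.JoinPeers = append(c.Meta.JoinPeers, p)
		}
	}
}

func main() {
	c := &Config{Join: "meta0:8091, meta1:8091"}
	c.propagateJoin()
	fmt.Println(c.Meta.JoinPeers) // prints [meta0:8091 meta1:8091]
}
```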
Mark Rushakoff 7f457b8852 Merge pull request #5786 from influxdata/mr-fix-tsm1-test-compilation
Fix non-compiling test
2016-02-22 14:04:05 -08:00
Mark Rushakoff 191de2670c Fix non-compiling test 2016-02-22 13:49:11 -08:00