Commit Graph

9159 Commits (bc992fea5e6ccd1cd9f92350a669f1c04747994b)

Author SHA1 Message Date
runner bc992fea5e fix munmap bug on Windows
2016-01-31 10:46:46 +08:00
Jason Wilder 60df13fb56 Merge pull request #5476 from influxdata/jw-fixes
Fix two panics in the WAL
2016-01-29 09:08:35 -07:00
Jason Wilder 924275b337 Fix panic preventing wal file truncation
Fixes #5455
2016-01-28 21:50:51 -07:00
Jason Wilder 01193668cf Fix nil pointer panic when dropping collectd points
Fixes #5449
2016-01-28 21:36:59 -07:00
joelegasse 834744a4c3 Merge pull request #5472 from influxdata/jl-backup-dir
Update influx_tsm to use a backup directory
2016-01-28 13:35:57 -05:00
Joe LeGasse d2816dbb34 Updated CHANGELOG 2016-01-28 13:11:26 -05:00
Joe LeGasse 8f3131b97d Update influx_tsm to use a backup directory 2016-01-28 12:36:34 -05:00
Cory LaNou 797126c570 Merge pull request #5447 from influxdata/fix-5426
Allow for node upgrade
2016-01-28 11:21:56 -06:00
Cory LaNou 0f6c75ab7d make tests pass 2016-01-28 11:03:44 -06:00
Cory LaNou 23de1b15aa error out upgrading a cluster 2016-01-28 10:05:56 -06:00
Cory LaNou 51f6c64134 make new and upgrade behavior the same for meta/data node numbering 2016-01-28 10:05:56 -06:00
Cory LaNou 53323737b2 no longer need nil check 2016-01-28 10:05:54 -06:00
Cory LaNou d70b694d7d fix misc meta startup bugs 2016-01-28 10:05:53 -06:00
Cory LaNou 31c2e7012a allow for node upgrade 2016-01-28 10:05:53 -06:00
joelegasse 9ab5de0a34 Merge pull request #5454 from joelegasse/influx_tsm
TSM conversion tool improvements
2016-01-28 10:38:47 -05:00
Joe LeGasse 482772997e Updated tests for 'influx_tsm'
Also changed some things to fix failing tests on CircleCI, and
removed an old TODO item
2016-01-28 09:34:00 -05:00
Jason Wilder d771eb72f7 Merge pull request #5467 from influxdata/jw-backup-node
Backup node.json with metastore backup
2016-01-27 23:06:14 -07:00
Jason Wilder 716714364a Backup node.json with metastore backup 2016-01-27 17:39:54 -07:00
Joe LeGasse 908259340b Improvements to influx_tsm
- Improved logging and status updates
- Added a debug HTTP endpoint (see the sketch after this entry)
- Properly resume/skip backed-up files
2016-01-27 16:13:23 -08:00
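
The commit above does not say how the debug HTTP endpoint is implemented; as an illustration only, one common way to add such an endpoint to a Go tool like influx_tsm is to serve net/http/pprof on a side listener. The -debug flag name and address below are assumptions for this sketch, not the tool's actual interface.

    package main

    import (
        "flag"
        "log"
        "net/http"
        _ "net/http/pprof" // registers /debug/pprof/ handlers on http.DefaultServeMux
    )

    func main() {
        // Hypothetical flag; influx_tsm's real flags may differ.
        debugAddr := flag.String("debug", "localhost:6060", "address for the debug HTTP endpoint")
        flag.Parse()

        go func() {
            // Profiling data is then available at http://<addr>/debug/pprof/.
            log.Println(http.ListenAndServe(*debugAddr, nil))
        }()

        // ... the conversion work itself would run here ...
        select {} // this sketch just blocks so the endpoint stays up
    }
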
Todd Persen 0328ac1a7e Fix typo in README.md 2016-01-27 15:16:37 -08:00
Todd Persen 265e3aff8c Update CHANGELOG.md 2016-01-27 15:15:16 -08:00
Todd Persen 3723680747 Merge pull request #5460 from sczk/issue-5436
Prevent exponential growth in ~/.influx_history
2016-01-27 15:12:31 -08:00
Jason Wilder 9528c3ea70 Merge pull request #5465 from influxdata/jw-remote-writes
Optimize remote writes
2016-01-27 15:47:02 -07:00
Ben Johnson bc312ceb14 Merge pull request #5463 from benbjohnson/tsm1-perf
tsm1 query performance improvements
2016-01-27 15:22:05 -07:00
Jason Wilder 1d165d38a9 Optimize Cache entry.add
This reduces some of the lock contention when writing to the cache.
When a new entry is created, it avoids an allocation.  It also skips
the check for whether the values need sorting when we already know a sort is needed.
2016-01-27 14:26:42 -07:00
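
A minimal sketch of the idea in the commit above, assuming a simplified cache entry rather than the real tsm1 types: track a needSort flag so add() only compares the incoming timestamp against the last one instead of re-checking ordering, and build new entries around the initial values so no extra allocation is needed.

    package main

    import "sync"

    type value struct {
        unixNano int64
        v        float64
    }

    type entry struct {
        mu       sync.Mutex
        values   []value
        needSort bool // set once we know the slice is out of order
    }

    // newEntry wraps the initial values directly, avoiding a second
    // allocation for an empty slice that is then appended to.
    func newEntry(vals []value) *entry {
        return &entry{values: vals}
    }

    func (e *entry) add(vals []value) {
        e.mu.Lock()
        defer e.mu.Unlock()
        for _, v := range vals {
            // If we already know a sort is needed, skip the ordering check.
            if !e.needSort && len(e.values) > 0 && v.unixNano < e.values[len(e.values)-1].unixNano {
                e.needSort = true
            }
            e.values = append(e.values, v)
        }
    }

    func main() {
        e := newEntry([]value{{1, 1.0}})
        e.add([]value{{3, 3.0}, {2, 2.0}}) // the out-of-order write flips needSort
    }
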
Jason Wilder d54f930c2d Don't parse points twice when receiving remote writes
The monitoring stats were causing points to be parsed twice, consuming
more CPU time just parsing points.
2016-01-27 14:24:56 -07:00
Jason Wilder 47c5ade858 Use faster point parsing for remote writes
Parsing the line protocol again on the receiving side of the remote
write consumes a lot of CPU.  This uses a different marshaling format
that is much faster to parse after we already parsed the point on
the write side.
2016-01-27 14:24:09 -07:00
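
The commit above does not spell out which marshaling format was adopted, so the following is only a generic illustration of the idea: ship the already-parsed representation in a binary encoding (encoding/gob here) so the receiving node can decode it directly instead of re-parsing line protocol. The parsedPoint type and its fields are assumptions for this sketch.

    package main

    import (
        "bytes"
        "encoding/gob"
        "fmt"
    )

    // parsedPoint stands in for a point that was already parsed on the
    // coordinating node; the field names are hypothetical.
    type parsedPoint struct {
        Name   string
        Tags   map[string]string
        Fields map[string]float64
        Time   int64
    }

    func main() {
        p := parsedPoint{
            Name:   "cpu",
            Tags:   map[string]string{"host": "server01"},
            Fields: map[string]float64{"value": 0.64},
            Time:   1453939200000000000,
        }

        // Write side: encode the parsed point once.
        var buf bytes.Buffer
        if err := gob.NewEncoder(&buf).Encode(p); err != nil {
            panic(err)
        }

        // Receive side: decode directly, no line-protocol parsing needed.
        var out parsedPoint
        if err := gob.NewDecoder(&buf).Decode(&out); err != nil {
            panic(err)
        }
        fmt.Println(out.Name, out.Fields["value"])
    }
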
Ben Johnson 98baf078d0 tsm1 query performance improvements 2016-01-27 13:42:32 -07:00
Joe LeGasse 4f89c15bd3 Replaced more log.Print();os.Exit(1) with log.Fatal() 2016-01-27 07:25:46 -08:00
Adam Svoboda 40e04d89fc Prevent exponential growth in ~/.influx_history
The history file is now cleared before WriteHistory is called after each
command and on exit, to prevent exponential file growth.

This commit addresses issue #5436; please see the PR for a full explanation.
2016-01-26 20:53:41 -06:00
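
A minimal sketch of the fix described above, under the assumption that the CLI's line editor rewrites its full in-memory history on every save: re-creating (and therefore truncating) the file before writing keeps exactly one copy of the history, whereas appending would duplicate it after every command. The writeHistory helper and file path here are hypothetical.

    package main

    import (
        "fmt"
        "os"
        "path/filepath"
    )

    // writeHistory stands in for the line editor's history-writing call;
    // the name and signature are assumptions for this sketch.
    func writeHistory(f *os.File, history []string) error {
        for _, line := range history {
            if _, err := fmt.Fprintln(f, line); err != nil {
                return err
            }
        }
        return nil
    }

    func saveHistory(history []string) error {
        path := filepath.Join(os.TempDir(), ".influx_history_example")

        // os.Create truncates an existing file, clearing the previous
        // contents before the full history is written back out. Opening
        // with O_APPEND instead would duplicate the history on every save.
        f, err := os.Create(path)
        if err != nil {
            return err
        }
        defer f.Close()

        return writeHistory(f, history)
    }

    func main() {
        if err := saveHistory([]string{"SHOW DATABASES", "exit"}); err != nil {
            fmt.Fprintln(os.Stderr, err)
        }
    }
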
Todd Persen e5fa969306 Update CHANGELOG.md 2016-01-26 18:30:05 -08:00
Todd Persen 308323f48f Merge pull request #5459 from influxdata/tp-ping-to-status
Remove MetaClient.Ping from `/ping` and move it to `/status`
2016-01-26 18:26:37 -08:00
Jason Wilder 1255bb9cb0 Merge pull request #5458 from influxdata/jw-write-pool
Use a bounded pool for remote writes
2016-01-26 19:13:33 -07:00
Todd Persen 66e6375973 Move status request metrics to their own label 2016-01-26 18:10:02 -08:00
Todd Persen 06e91dfca1 Remove MetaClient.Ping from `/ping` and move it to `/status` 2016-01-26 17:58:44 -08:00
Jason Wilder 5abdb42a7d Use a bounded pool for remote writes
Under highly concurrent write load, the coordinating node would
create a connection to any other node that is part of the replica
group.  Since each connection can be expensive, OOM situations could
occur because there was no bound on the number of new connections
that would be created.  If writes on a remote node were slow, connections
could pile up and exacerbate the problem.

This switches the pool to be bounded and has a checkout that is blocking
with a timeout.  If a connection is available, it's returned immediately.
If the pool still has room for more connections, it will create one if needed.
Otherwise, the call will block until a connection becomes available or
the timeout expires.  In the case of a timeout, it is propagated back up
to the PointsWriter, which determines what to return to the client.
2016-01-26 17:08:36 -07:00
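
A minimal sketch of a bounded pool with a blocking, timed checkout of the kind described above; the types, pool size, and timeout are assumptions rather than the actual cluster pool code.

    package main

    import (
        "errors"
        "fmt"
        "net"
        "time"
    )

    // pool bounds the total number of connections: each dial consumes a slot,
    // and idle connections are parked on a buffered channel.
    type pool struct {
        conns chan net.Conn // idle connections
        slots chan struct{} // remaining capacity for new connections
        dial  func() (net.Conn, error)
    }

    func newPool(size int, dial func() (net.Conn, error)) *pool {
        p := &pool{
            conns: make(chan net.Conn, size),
            slots: make(chan struct{}, size),
            dial:  dial,
        }
        for i := 0; i < size; i++ {
            p.slots <- struct{}{}
        }
        return p
    }

    // get returns an idle connection immediately if one is available, dials a
    // new one while the pool still has room, and otherwise blocks until a
    // connection is returned or the timeout expires.
    func (p *pool) get(timeout time.Duration) (net.Conn, error) {
        select {
        case c := <-p.conns:
            return c, nil
        default:
        }
        select {
        case c := <-p.conns:
            return c, nil
        case <-p.slots:
            c, err := p.dial()
            if err != nil {
                p.slots <- struct{}{} // give the slot back on a failed dial
                return nil, err
            }
            return c, nil
        case <-time.After(timeout):
            // A timeout is propagated to the caller (the PointsWriter in the
            // commit's description), which decides what to return to the client.
            return nil, errors.New("connection pool: checkout timed out")
        }
    }

    // put parks a connection for reuse.
    func (p *pool) put(c net.Conn) { p.conns <- c }

    func main() {
        p := newPool(2, func() (net.Conn, error) {
            return net.DialTimeout("tcp", "example.com:80", time.Second)
        })
        c, err := p.get(500 * time.Millisecond)
        fmt.Println(c != nil, err)
        if c != nil {
            p.put(c)
        }
    }
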
Jason Wilder 697f48b4e6 Merge pull request #5445 from influxdata/jw-oom
Reduce lock contention in engine.WritePoints
2016-01-26 11:30:31 -07:00
Joe LeGasse cdde2959af Limit parallelism for 'influx_tsm -parallel' 2016-01-26 09:11:09 -08:00
Jason Wilder 372302bcbd Reduce lock contention in Cache.WriteMulti
A write-lock was taken the whole time, but we only need the write
lock at the end.
2016-01-25 16:48:34 -07:00
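
A minimal sketch of the lock narrowing described above, using a much-simplified cache rather than the real tsm1 Cache: the work that does not need exclusive access happens before the write lock, which is then held only for the final update.

    package main

    import "sync"

    type cache struct {
        mu    sync.RWMutex
        store map[string][]float64
        size  uint64
    }

    func (c *cache) WriteMulti(values map[string][]float64) {
        // Size accounting (and, in the real cache, per-entry work) can be
        // done without holding the cache-wide write lock.
        var added uint64
        for _, vals := range values {
            added += uint64(len(vals))
        }

        // Only the shared map and size update need the write lock,
        // and only at the end.
        c.mu.Lock()
        for key, vals := range values {
            c.store[key] = append(c.store[key], vals...)
        }
        c.size += added
        c.mu.Unlock()
    }

    func main() {
        c := &cache{store: make(map[string][]float64)}
        c.WriteMulti(map[string][]float64{"cpu,host=server01": {0.1, 0.2}})
    }
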
Jason Wilder 5bee8880db Reduce lock contention in engine.WritePoints
Writing the snapshot would deduplicate the snapshot points
while still holding the engine write-lock.  This can be expensive
under high load and cause writes to back up and OOM the server.

Instead, grab the snapshot under the lock and dedup it after releasing
the lock.

Possible fix for #5442
2016-01-25 15:37:34 -07:00
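
A minimal sketch of the change described above, not the engine's real code: copy the snapshot out while holding the lock, then deduplicate it after the lock is released so concurrent writes are not blocked behind the expensive step.

    package main

    import (
        "sort"
        "sync"
    )

    type engine struct {
        mu    sync.RWMutex
        cache []int64 // stand-in for cached point timestamps
    }

    func (e *engine) writeSnapshot() []int64 {
        // Grab the snapshot under the lock...
        e.mu.Lock()
        snapshot := make([]int64, len(e.cache))
        copy(snapshot, e.cache)
        e.cache = e.cache[:0]
        e.mu.Unlock()

        // ...and deduplicate it after releasing the lock.
        return dedup(snapshot)
    }

    // dedup sorts and removes duplicate timestamps; this is the expensive
    // step that previously ran while the engine write lock was held.
    func dedup(ts []int64) []int64 {
        sort.Slice(ts, func(i, j int) bool { return ts[i] < ts[j] })
        out := ts[:0]
        for i, t := range ts {
            if i == 0 || t != ts[i-1] {
                out = append(out, t)
            }
        }
        return out
    }

    func main() {
        e := &engine{cache: []int64{3, 1, 3, 2}}
        _ = e.writeSnapshot() // [1 2 3]
    }
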
Jason Wilder 9d3c9329a6 Merge pull request #5438 from influxdata/jw-deadlock
Remove double read-lock in meta client
2016-01-24 23:38:56 -07:00
Jason Wilder ac0c593d8d Prevent double-read locking meta client
Possible fix for #5437.  meta.Client.RetentionPolicy acquired a read-lock and
then called Database which called data() which acquired a read-lock again.
If a write lock was taken between these two read-locks (likely by Authenticate),
the write-lock would block, and the second read-lock would also block
behind the pending write-lock, causing a deadlock.
2016-01-24 22:01:09 -07:00
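
A minimal sketch of the deadlock pattern described above, using a bare sync.RWMutex rather than the real meta.Client: the outer call holds a read lock, a writer queues up, and the nested read lock then blocks behind the queued writer, which in turn is blocked by the outer read lock.

    package main

    import (
        "fmt"
        "sync"
        "time"
    )

    type client struct {
        mu sync.RWMutex
    }

    // data is the inner call that re-acquires the read lock.
    func (c *client) data() {
        c.mu.RLock()
        defer c.mu.RUnlock()
    }

    // retentionPolicy mirrors the old shape: RLock, then call a method that
    // RLocks again. The fix is simply not to re-lock on the inner path.
    func (c *client) retentionPolicy() {
        c.mu.RLock()
        defer c.mu.RUnlock()
        time.Sleep(100 * time.Millisecond) // window for a writer to queue up
        c.data()                           // second RLock: can block forever
    }

    func main() {
        c := &client{}
        go c.retentionPolicy()

        time.Sleep(50 * time.Millisecond)
        go func() {
            c.mu.Lock() // a writer (e.g. Authenticate) queued between the RLocks
            c.mu.Unlock()
        }()

        time.Sleep(500 * time.Millisecond)
        fmt.Println("with Go's writer-preferring RWMutex, the nested RLock above can block forever")
    }
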
Jason Wilder ca06755422 Fix merge breakage 2016-01-24 22:00:51 -07:00
David Norton 58e0eed9cb Merge pull request #5403 from influxdata/meta-service2
refactor meta into separate meta client & service
2016-01-22 20:06:51 -05:00
Jason Wilder e5ac1b7464 Merge pull request #5428 from influxdata/jw-ms-config
Fixup default hostname and config
2016-01-22 17:51:10 -07:00
Jason Wilder 1696db1c40 Fixup default hostname and config 2016-01-22 17:05:25 -07:00
David Norton c0df09d544 make sure there are CQs before acquiring lease 2016-01-22 17:01:55 -05:00
Jonathan A. Sternberg 865b267eb0 Merge pull request #5368 from influxdata/js-5286-cq-every-greater-than-interval
Teach the CQ runner how to deal with a resample interval higher than the query interval
2016-01-22 10:09:24 -05:00
Jonathan A. Sternberg 1429f4b4ea Teach the CQ runner how to deal with a resample interval higher than the query interval
Previously if you issued a CQ with a resample interval higher than the
query interval, such as the following:

    CREATE CONTINUOUS QUERY cq ON db
        RESAMPLE EVERY 4m
        BEGIN
            SELECT mean(value) INTO cpu_mean FROM cpu GROUP BY time(2m)
        END

This would result in strange behavior because the FOR value defaulted to
the GROUP BY interval and the minimum time passing before a CQ ran was
also the resample interval, so it wouldn't run for all of the appropriate
intervals even if you set the resample duration to a higher value.

This tweaks the CQ runner to set the minimum interval before a bucket
becomes capable of running to the lower of the query interval or the
resample interval instead of always using the resample interval.

It also sets the default resample duration to be the higher value of the
query interval or the resample interval so the above query gets a
default of 4m instead of 2m and will execute 2 queries every 4 minutes.

If you manually set the resample duration to a lower value than the
resample interval, the old behavior will still happen and should be
considered an error.

This also makes creating a continuous query with a resample duration
below the resample interval or query interval (whichever is higher)
an error returned by the parser.

Fixes #5286.
2016-01-22 09:43:46 -05:00
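
A minimal sketch of the interval arithmetic described above, not the CQ service's actual code: the interval before a bucket becomes runnable is the lower of the GROUP BY interval and the RESAMPLE EVERY interval, while the default resample duration (FOR) is the higher of the two, so the example query runs two 2m buckets every 4 minutes.

    package main

    import (
        "fmt"
        "time"
    )

    func minDuration(a, b time.Duration) time.Duration {
        if a < b {
            return a
        }
        return b
    }

    func maxDuration(a, b time.Duration) time.Duration {
        if a > b {
            return a
        }
        return b
    }

    func main() {
        groupBy := 2 * time.Minute // GROUP BY time(2m): the query interval
        every := 4 * time.Minute   // RESAMPLE EVERY 4m: the resample interval

        runInterval := minDuration(groupBy, every) // 2m: how often a bucket becomes runnable
        defaultFor := maxDuration(groupBy, every)  // 4m: default FOR, covering two 2m buckets

        fmt.Println(runInterval, defaultFor) // prints "2m0s 4m0s"
    }
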
David Norton 914a9a1de6 fix build after rebase 2016-01-21 15:38:13 -05:00