431 Commits (0883182798c8155b066fdc7754f70e759c5b258e)

Author SHA1 Message Date
Jason Wilder 6b2b29625d Ensure the tsdb.Store is not closed before creating a shard
Fixes panic: assignment to entry in nil map

Fixes #3848
2015-09-08 11:04:00 -06:00
Ben Johnson fd9be63b4e rollback bolt tx on mapper open error
This commit fixes `SelectMapper.Open()` so that it properly rolls back
transactions. Previously, this caused transactions to stay open
indefinitely which caused mmap resizes to hang indefinitely.
2015-09-08 10:28:51 -06:00
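The rollback-on-error pattern described in this commit can be sketched roughly as follows. This is a minimal illustration, not the actual `SelectMapper` code: `fakeTx`, `beginTx`, `mapper`, and `errSetup` are hypothetical stand-ins, and a real Bolt transaction would be rolled back through its own `Rollback` method.

```go
package main

import (
	"errors"
	"fmt"
)

// errSetup is a hypothetical error used to simulate a failure during Open.
var errSetup = errors.New("setup failed")

// fakeTx is a hypothetical stand-in for a read transaction.
type fakeTx struct{ open bool }

func beginTx() *fakeTx      { return &fakeTx{open: true} }
func (t *fakeTx) Rollback() { t.open = false }

// mapper is a hypothetical, stripped-down mapper that owns a transaction.
type mapper struct{ tx *fakeTx }

// Open begins a transaction and runs setup. If setup returns an error, the
// deferred function rolls the transaction back so it does not stay open.
func (m *mapper) Open(setup func() error) (err error) {
	m.tx = beginTx()
	defer func() {
		if err != nil {
			m.tx.Rollback()
		}
	}()
	return setup()
}

func main() {
	m := &mapper{}
	err := m.Open(func() error { return errSetup })
	fmt.Println("error:", err, "tx still open:", m.tx.open)
}
```

Using a named return value lets the deferred function see the error from any return path, so later error branches added to `Open` are covered without extra cleanup code.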
Jason Wilder 13dbc8f0ba Merge pull request #3841 from influxdb/jw-file-utils
Add inspect tool
2015-09-04 14:12:05 -06:00
Cory LaNou fa4415b3a4 refactor processing top/bottom results. clarify some comments 2015-09-04 13:30:43 -05:00
Cory LaNou b62d8c0515 expand variable names for clarity 2015-09-04 13:30:43 -05:00
Cory LaNou 65e652850a btf -> tmin 2015-09-04 13:30:42 -05:00
Cory LaNou 08295c578f refactor processTopBottom 2015-09-04 13:30:42 -05:00
Cory LaNou 9ab3d89c06 bucketTime* -> tMin* 2015-09-04 13:30:42 -05:00
Cory LaNou 3ca93594c3 BucketTime -> TMin 2015-09-04 13:30:42 -05:00
Cory LaNou 347ffc70b4 wire up advanced top sorting/slicing 2015-09-04 13:30:41 -05:00
Cory LaNou 8c4595b345 top is coming together. filling out fields properly 2015-09-04 13:30:41 -05:00
Cory LaNou ba79007960 wip 2015-09-04 13:30:41 -05:00
Cory LaNou 35b9215aa9 refactor processTopBottom - wip 2015-09-04 13:30:40 -05:00
Cory LaNou 046282249a wip remapping top output 2015-09-04 13:30:40 -05:00
Cory LaNou 97f2dc830f comment/type fixes 2015-09-04 13:30:39 -05:00
Cory LaNou 72fd115dc2 exposing tags on cursors, top/bottom are valid funcs now 2015-09-04 13:30:39 -05:00
Jason Wilder 6b4926257a Add inspect tool
Start of a lower-level file inspection tool.  This currently dumps
summary statistics for the shards, index and WAL that can be used to
understand the shape of the data in the local shards.  This util
operates on the shards itself and not through the server and is intended
more for debugging/troubleshooting.
2015-09-04 10:38:59 -06:00
Jason Wilder 6f41c0fa87 Merge pull request #3986 from influxdb/jw-order-by
Support sorting by time desc
2015-09-04 09:42:58 -06:00
Jason Wilder df70a1c8ce Update tests to use Direction enum 2015-09-04 09:00:11 -06:00
Jason Wilder e767feb8d9 Fix order by desc with aggregate function not returning any values 2015-09-03 22:31:58 -06:00
Jason Wilder 7fa3d445f7 Support reverse iteration for b1 engine 2015-09-03 22:31:58 -06:00
Jason Wilder 2725757dba Simplify WAL cursor seek movement logic 2015-09-03 22:31:58 -06:00
Jason Wilder 5a6b0afc4b Replace cursor direction with a type 2015-09-03 22:31:48 -06:00
Jason Wilder 7c67e60c4f Add bz1 reverse cursor test 2015-09-03 22:28:36 -06:00
Jason Wilder 5e481181bc Add WAL reverse cursor test 2015-09-03 22:28:36 -06:00
Jason Wilder e206021e13 Add reverse multi-cursor tests 2015-09-03 22:28:36 -06:00
Jason Wilder 266bdc1c2b Support sort by time DESC in wal and bz1 engines 2015-09-03 22:28:36 -06:00
Philip O'Toole e07432c59f Implement diagnostics support
This change adds support for diagnostics by decomposing the existing
interface into two interfaces -- one for stats, and the other for
diags. It also adds some basic monitoring of the system, network, and the Go
runtime.
2015-09-03 20:50:54 -07:00
Cory LaNou 6592dcc699 EnableLogging -> LoggingEnabled 2015-09-03 16:56:07 -05:00
Ben Johnson deff06f850 add copier service
This commit adds the copier service which allows one server to
copy shards from another server. This will be used for moving
shards in the cluster.
2015-09-03 13:07:35 -06:00
David Norton 816c5f5368 fix #2555: don't normalize target names 2015-09-03 07:12:15 -04:00
David Norton 99a22c174b fix #2555: add backreference in CQs
Add new query syntax to allow the following in CQs:

INTO "1hPolicy".:MEASUREMENT
2015-09-03 07:12:15 -04:00
Ben Johnson b63ebb72a5 limit bz1 quickcheck tests to 10 iterations on CI
This commit checks the `CI` environment variable in the bz1
test suite and limits the quickcheck runs if the value is `true`.
2015-09-02 11:27:11 -06:00
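One way to cap quickcheck iterations based on the `CI` environment variable is sketched below. It assumes the standard library's `testing/quick` package; `quickConfig` is a hypothetical helper name, and the property being checked is only a placeholder.

```go
package main

import (
	"fmt"
	"os"
	"testing/quick"
)

// quickConfig returns a testing/quick configuration that caps the number of
// iterations at 10 when the CI value is "true". Otherwise MaxCount is left at
// zero, which makes testing/quick fall back to its default iteration count.
func quickConfig(ci string) *quick.Config {
	if ci == "true" {
		return &quick.Config{MaxCount: 10}
	}
	return &quick.Config{}
}

func main() {
	cfg := quickConfig(os.Getenv("CI"))

	// Placeholder property: addition of widened values is commutative.
	commutative := func(a, b uint8) bool {
		return uint16(a)+uint16(b) == uint16(b)+uint16(a)
	}
	err := quick.Check(commutative, cfg)
	fmt.Println("MaxCount:", cfg.MaxCount, "error:", err)
}
```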
Philip O'Toole e15bc6df11 Remove obsolete TSDB monitor file
This functionality will be superseded by the new monitor service.
2015-09-01 23:15:57 -07:00
Philip O'Toole 14c04eb4d6 Merge pull request #3916 from influxdb/new_stats_diags
Statistics and Diagnostics service
2015-09-01 18:30:53 -07:00
Philip O'Toole f05dc20b58 Hook new monitor service to server
u
2015-09-01 15:03:52 -07:00
Jason Wilder 898ee8c399 Fix write fails for multiple points when tag starts with quote
Fixes #3928
2015-09-01 11:20:34 -06:00
Ben Johnson d52fe89035 add WAL lock to prevent timing lock contention
This commit adds a lock to the WAL log to prevent timing how long
it takes to obtain the Bolt write lock.
2015-09-01 11:08:39 -06:00
Ben Johnson 9067664ac7 Merge pull request #3913 from benbjohnson/owner
Convert meta shard owners to objects
2015-09-01 09:50:40 -06:00
Paul Dix 040fa060df Add more detailed logging for compactions 2015-09-01 09:52:20 -04:00
Ben Johnson 767307eed6 convert meta shard owners to objects
This commit converts meta.ShardInfo.OwnerIDs from a slice of ids
to a slice of objects. This is to support adding statuses for a
shard for a given node. For example, a node may have a shard
assigned to it but it is currently copying the shard and is not
ready to serve data for it.

The old `OwnerIDs` is marked as deprecated, however, the code
still supports loading from older protobuf-encoded data.
2015-08-31 16:33:13 -06:00
Jason Wilder 027b6e36e7 Fix inconsistent results from show measurements
Running show measurements in a partially replicated cluster produces inconsistent
results due to the connection pooling.  When running remote meta-data queries,
the cluster service ends up keeping the map shard request open but still checks the connection
back into the pool. This causes inconsistent results because data from the last request
interferes with the new request.

This removes the connection pool which fixes the issue.  It also has the side effect of fixing
a node's pooled connections that have gone bad when a node restarts.  For example, in a 3 node cluster
that has been responding to queries correctly, restarting 1 node will cause all the others to fail
to query that node indefinitely.  This is now fixed as well.
2015-08-31 14:31:00 -06:00
Jason Wilder c8bf095342 Fix panic with show measurements with partially replicated shards
Some nodes may receive requests for shards that they are assigned but
do not have the shards locally yet.  Just return no results in this case
instead of panicking.
2015-08-31 14:10:43 -06:00
Jason Wilder f72fd247b5 Fix panic when querying against non-fully replicated shards
The TSDBStore was returning a nil mapper if the shard did not exist.  The caller always
assumed the mapper would not be nil causing a panic.  Instead, have the mapper skip the mapping
phase if its shard reference is nil.  This fixes queries against data-only nodes and against
shards that are not fully replicated in the cluster.

Fixes #3574
2015-08-31 10:03:07 -06:00
Jason Wilder af2531b373 Use read lock to check current memory size of partition
A write lock was being taken to read the memory size to determine if writes
should be paused.  What happens is that writers get blocked indefinitely when
trying to acquire a write lock which makes writes pause (or stop) for long periods
of time.
2015-08-28 15:11:30 -06:00
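The difference between the two lock modes can be illustrated with `sync.RWMutex`. This is a sketch under assumptions: the `partition` type, its fields, and the method names here are hypothetical and much simpler than the real WAL partition.

```go
package main

import (
	"fmt"
	"sync"
)

// partition is a hypothetical, simplified WAL partition.
type partition struct {
	mu         sync.RWMutex
	memorySize uint64
}

// shouldPause only reads memorySize, so a read lock is sufficient. Many
// callers can hold the read lock at once without blocking each other.
func (p *partition) shouldPause(limit uint64) bool {
	p.mu.RLock()
	defer p.mu.RUnlock()
	return p.memorySize > limit
}

// addToCache mutates memorySize and therefore needs the exclusive lock.
func (p *partition) addToCache(n uint64) {
	p.mu.Lock()
	p.memorySize += n
	p.mu.Unlock()
}

func main() {
	p := &partition{}
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			p.addToCache(256)
			_ = p.shouldPause(1024)
		}()
	}
	wg.Wait()
	fmt.Println("pause writes:", p.shouldPause(512))
}
```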
Jason Wilder 6ba17eca36 Reduce lock contention on Log.WritePoints
The log was deferring the release of the read lock on the WAL.  This had
the effect that a read-lock was held until after the partition finished writing
(which maintains its own locks).  The read lock is only needed around the call
to pointsToPartions so it can get a consistent copy of the points to write.  After
that call returns, a lock is not needed so free it immediately.
2015-08-28 15:11:30 -06:00
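A rough sketch of releasing the read lock as soon as the grouping step is done, instead of deferring the unlock, might look like this. The types and the grouping rule (point length modulo partition count) are assumptions made up for the example; only the locking shape is the point.

```go
package main

import (
	"fmt"
	"sync"
)

// partition is a hypothetical partition that guards its own state.
type partition struct {
	mu     sync.Mutex
	points []string
}

func (p *partition) write(pts []string) {
	p.mu.Lock()
	p.points = append(p.points, pts...)
	p.mu.Unlock()
}

// walLog is a hypothetical, simplified WAL.
type walLog struct {
	mu         sync.RWMutex
	partitions []*partition
}

// pointsToPartitions groups points by partition. The grouping rule here is
// arbitrary and exists only to make the example runnable.
func (l *walLog) pointsToPartitions(points []string) map[*partition][]string {
	grouped := make(map[*partition][]string)
	for _, pt := range points {
		p := l.partitions[len(pt)%len(l.partitions)]
		grouped[p] = append(grouped[p], pt)
	}
	return grouped
}

// WritePoints holds the read lock only while grouping the points. The lock is
// released explicitly before the partition writes, which take their own locks.
func (l *walLog) WritePoints(points []string) {
	l.mu.RLock()
	grouped := l.pointsToPartitions(points)
	l.mu.RUnlock()

	for p, pts := range grouped {
		p.write(pts)
	}
}

func main() {
	l := &walLog{partitions: []*partition{{}, {}}}
	l.WritePoints([]string{"a", "bb", "ccc"})
	for i, p := range l.partitions {
		fmt.Println("partition", i, "holds", len(p.points), "points")
	}
}
```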
Jason Wilder f5f8f04116 Fix panic in addToCache
addToCache is called in a goroutine and can panic if the server is closed while opening.  If
part of the open func errors, it returns an error and immediately calls close.  close sets
p.cache to nil which causes the goroutine trying to initialize the cache to panic as well.  The
goroutine should run under a write lock to avoid this race/panic.
2015-08-28 13:01:17 -06:00
Jason Wilder eb4a8d4f4a Fix panic when logging error in WAL
If LoadMetadataIndex() tries to log an error, it causes a panic because the
logger is not set until Open() is called, which is after LoadMetaDataIndex() returns.
Instead, just set the logger up when the WAL is created.
2015-08-28 12:59:38 -06:00
Philip O'Toole cf58c38995 go fmt fixes 2015-08-27 18:20:41 -07:00
Jason Wilder 6493cfdc45 Merge pull request #3870 from influxdb/jw-3869
Remove unused Database index names and sorting
2015-08-27 14:00:30 -06:00
Jason Wilder a4c1d9a9a7 Remove unused Database index names and sorting
Writes could time out when adding new measurement names to the
index if the sort took a long time.  The names slice was never
actually used (except in a test), so keeping it in the index wastes memory
and sorting it wastes CPU and increases lock contention.  The sorting
was happening while the shard held a write-lock.

Fixes #3869
2015-08-27 11:57:20 -06:00
Daniel Morsing ca7a806e93 Only seek the cursor if it would yield a value of interest
If we've seeked a cursor, then we can be sure that there will be no
data between it and the point that was seeked to. Take advantage of
this fact to only seek when it would yield us a value that would be
different from the last.

In addition, only init the pointsheap when doing a raw query. For
aggregate queries, it is reinitialized on every time bucket, so no
need to seek through all the cursors

For a synthetic database where there were only entries for a tiny
slice of time, it cut queries from 112 seconds to 30 seconds doing
`select mean(value) from cpu where time > now - 2h group by time(1h)`
2015-08-27 10:57:18 -06:00
Ben Johnson 3ce001929c Use 4KB default block size for bz1
This commit changes the default block size from 64KB to 4KB for
bz1. This was lowered because small blocks were being uncompressed,
merged, recompressed, and inserted for a large portion of updates.
This became slower and slower over time until it reached the 64KB
threshold. We moved to the 4KB threshold in order to lower the
impact of this recompression.
2015-08-26 11:05:01 -06:00
dgnorton 2cf6233cbc Merge pull request #3808 from influxdb/dmq-show-measurements2
convert SHOW MEASUREMENTS to a distributed query
2015-08-26 11:43:38 -04:00
Daniel Morsing 3d92f3ab0a Merge pull request #3846 from influxdb/reuseheap
reuse pointsheapItem
2015-08-25 17:27:49 -06:00
Daniel Morsing 391d8cd8d7 reuse pointsheapItem
Since we already got a pointsHeapItem, let's just reuse it instead
of allocating a new one. This cuts allocated memory of a 1 million
points aggregate query from 4881.97MB to 4139.86MB
2015-08-25 17:07:34 -06:00
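The idea of reusing a popped heap item rather than allocating a new one can be sketched with `container/heap`. This is an illustration under assumptions: `item`, `pointsHeap`, and `merge` are hypothetical names, and each "cursor" is just a sorted slice of timestamps.

```go
package main

import (
	"container/heap"
	"fmt"
)

// item is a hypothetical heap entry: the current timestamp of one cursor.
type item struct {
	timestamp int64
	cursor    int // index of the source series
	pos       int // position within that series
}

// pointsHeap is a min-heap of items ordered by timestamp.
type pointsHeap []*item

func (h pointsHeap) Len() int            { return len(h) }
func (h pointsHeap) Less(i, j int) bool  { return h[i].timestamp < h[j].timestamp }
func (h pointsHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *pointsHeap) Push(x interface{}) { *h = append(*h, x.(*item)) }
func (h *pointsHeap) Pop() interface{} {
	old := *h
	n := len(old)
	it := old[n-1]
	*h = old[:n-1]
	return it
}

// merge returns all timestamps from the sorted input series in order. After
// an item is popped, the same *item is refilled and pushed back, so only one
// item is allocated per series rather than one per point.
func merge(series [][]int64) []int64 {
	h := &pointsHeap{}
	for i, s := range series {
		if len(s) > 0 {
			heap.Push(h, &item{timestamp: s[0], cursor: i})
		}
	}
	var out []int64
	for h.Len() > 0 {
		it := heap.Pop(h).(*item)
		out = append(out, it.timestamp)
		it.pos++
		if it.pos < len(series[it.cursor]) {
			it.timestamp = series[it.cursor][it.pos]
			heap.Push(h, it) // reuse the popped item
		}
	}
	return out
}

func main() {
	fmt.Println(merge([][]int64{{1, 4, 7}, {2, 3, 9}, {}}))
}
```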
Paul Dix d903cc351e Merge pull request #3845 from influxdb/pd-fix-wal-meta-panic
Fix metafile so it doesn't get trampled by other goroutines.
2015-08-25 18:35:03 -04:00
Paul Dix 0d744dafed Fix metafile so it doesn't get trampled by other goroutines.
Fixes #3832 and fixes #3833
2015-08-25 18:23:24 -04:00
Daniel Morsing 71a83b7f9d Remove unused buffer allocation
The buffer allocation in bz1 was unused and I'm fairly certain that it
was harmful to performance if used. For queries that run through a bz1
block, needing to hold on to a 64kb block is expensive. Better to churn
on the allocator and have the blocks be released when they are unused
than to have 64kb hanging around for each series regardless of size.

Thanks to @jwilder for brainstorming this issue with me.
2015-08-25 14:51:17 -06:00
Paul Dix a4735624f8 Merge pull request #3829 from influxdb/pd-fix-missing-data-after-flush
Fix missing data in aggregates with bz1
2015-08-25 16:27:03 -04:00
David Norton d8be9b4222 test SHOW MEASUREMENTS when no rows returned 2015-08-25 16:18:28 -04:00
Paul Dix 8c6af91e93 Fix bug with bz1 where some data would get hidden.
Seeking to the middle of a compressed block wasn't working properly. Fixes #3781
2015-08-25 16:16:59 -04:00
Daniel Morsing 40dab87ac9 Merge pull request #3817 from influxdb/walmem
Walmem
2015-08-25 13:29:42 -06:00
David Norton 7b19a93459 add test for distributed SHOW MEASUREMENTS 2015-08-25 14:33:49 -04:00
Cory LaNou 6ba24e804a do not support uint64 2015-08-25 10:47:25 -05:00
David Norton 6f0ba18904 fix TestDropMeasurementStatement 2015-08-25 10:01:38 -04:00
Cory LaNou 7916cade08 support all number types when decoding a point 2015-08-25 08:49:49 -05:00
Daniel Morsing 5455851ac7 move allocation outside struct + gofmt 2015-08-24 15:28:30 -06:00
Daniel Morsing 35b6c7867d reuse memory buffers for marshaling wal entries
By using preallocated buffers for marshaling WAL entries, we can
reduce the amount of memory we allocate.

On a run of `influx_stress -series 10000 -points 1000` this cuts
total allocations from 18684.15MB to 15200.73MB
2015-08-24 14:49:25 -06:00
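Reusing a preallocated buffer for marshaling could look roughly like the sketch below. The entry layout (an 8-byte big-endian timestamp followed by the payload) and the `entryWriter` type are assumptions for illustration, not the actual WAL entry format.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// entryWriter is a hypothetical marshaler that keeps one buffer and reuses it
// across calls instead of allocating a fresh slice for every entry.
type entryWriter struct {
	buf []byte
}

// marshal encodes the entry into the reused buffer. The returned slice is
// only valid until the next call to marshal, because it shares w.buf.
func (w *entryWriter) marshal(timestamp int64, data []byte) []byte {
	var ts [8]byte
	binary.BigEndian.PutUint64(ts[:], uint64(timestamp))

	b := w.buf[:0] // keep the capacity, drop the old contents
	b = append(b, ts[:]...)
	b = append(b, data...)
	w.buf = b // remember the buffer in case append had to grow it
	return b
}

func main() {
	w := &entryWriter{buf: make([]byte, 0, 64)}
	for i, payload := range []string{"cpu value=1", "mem value=2"} {
		entry := w.marshal(int64(i), []byte(payload))
		fmt.Printf("entry %d: %d bytes, buffer capacity %d\n", i, len(entry), cap(entry))
	}
}
```

The trade-off is that callers must finish writing or copying an entry before marshaling the next one, since both share the same backing array.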
David Norton 636b4d1603 don't send empty row from ShowMeasurementsExecutor 2015-08-24 13:16:48 -04:00
Daniel Morsing b7bbe8b5e0 remove unused backoffcount field 2015-08-24 10:25:38 -06:00
David Norton 88f556af72 convert SHOW MEASUREMENTS to a distributed query 2015-08-23 23:09:51 -04:00
Paul Dix 981d7175fb Improve WAL flush log output. 2015-08-23 11:28:06 -04:00
David Norton 5d26cfa4d7 return interface{} from nextChunk* functions 2015-08-22 10:59:29 -04:00
David Norton c8f88f9a61 refactor remote mapping 2015-08-22 10:16:41 -04:00
Todd Persen a9e3b9d176 Merge pull request #3797 from influxdb/pd-fix-wal-dirty-sort
Ensure WAL cache gets sorted when needed.
2015-08-21 22:05:28 +00:00
Paul Dix 15cf803b57 Ensure WAL cache gets sorted when needed.
Fixes #3792
2015-08-21 17:48:42 -04:00
Paul Dix a52a4be94c Merge pull request #3793 from influxdb/pd-fix-unsafe-series-shard-access
Fix map concurrent race with adding a shard to a series in the index.
2015-08-21 16:37:04 -04:00
Paul Dix 1a3074ed54 Fix map concurrent race with adding a shard to a series in the index. 2015-08-21 16:24:55 -04:00
Paul Dix 0a6c8b1968 Merge pull request #3788 from influxdb/pd-add-drop-database-to-wal
Update store to properly manage WAL create/delete.
2015-08-21 15:29:02 -04:00
Jason Wilder 589f840ef9 Fix parsing NaN values without timestamps
Fixes #3539 partially.  NaN cannot be queried though and needs to be handled
by the query engine differently.
2015-08-21 12:14:17 -06:00
Jason Wilder 91313f7206 Fix regression where measurement names with equals could not be parsed 2015-08-21 12:14:17 -06:00
Daniel Morsing 27162dd904 only convert key to string once. 2015-08-21 11:01:34 -07:00
Paul Dix 73f3dc1e14 Update store to properly manage WAL create/delete.
* Update the store to remove the WAL directories associated with a shard or database when they are deleted.
* Fix the Store so that it creates separate WAL directories for databases and retention policies.
2015-08-21 11:22:04 -04:00
Jason Wilder 1f846d5edb Optimize Point.unescape
This func shows up in profiling.  It's called frequently from multiple places and
can be made more efficient.  The previous implementation looped over the input
slice 4 times, updating and returning a new slice each time.  This changes it to loop
once and create one result slice.

With influx_stress

Before:

  Wrote 10000000 points at average rate of 241750
  Average response time:  187.78968ms

After:

  Wrote 10000000 points at average rate of 254618
  Average response time:  172.235028ms
2015-08-20 17:05:18 -06:00
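A single-pass unescape of the kind described can be sketched as below. The set of escaped characters (comma, double quote, space, equals) is an assumption for this example, and `unescape` here is a standalone illustration rather than the actual `Point.unescape` implementation.

```go
package main

import (
	"bytes"
	"fmt"
)

// unescape removes the backslash from the sequences \, \" \<space> and \=
// in one pass over the input, building a single result slice. Input without
// any backslash is returned as-is, with no allocation.
func unescape(in []byte) []byte {
	if bytes.IndexByte(in, '\\') == -1 {
		return in
	}
	out := make([]byte, 0, len(in))
	for i := 0; i < len(in); i++ {
		if in[i] == '\\' && i+1 < len(in) {
			switch in[i+1] {
			case ',', '"', ' ', '=':
				out = append(out, in[i+1])
				i++ // skip the escaped character
				continue
			}
		}
		out = append(out, in[i])
	}
	return out
}

func main() {
	fmt.Printf("%s\n", unescape([]byte(`cpu\,host\=server\ a`)))
}
```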
Jason Wilder afe1f598ca Cache name and fields if requested
Through profiling of writes, point.Fields() and point.Name() were called
repeatedly in PointsWriter and the Shard.  These calls are somewhat expensive
when writing large batches so we can cache them to avoid wasting CPU cycles.

Using influx_stress with default settings

Before:
  Wrote 10000000 points at average rate of 202570
  Average response time:  235.450355ms

After:
  Wrote 10000000 points at average rate of 246120
  Average response time:  182.881008ms
2015-08-20 15:48:38 -06:00
Paul Dix 2882ef88dc Merge pull request #3766 from influxdb/pd-close-wal-before-bolt
Make bz1 close the WAL before closing bolt so it can flush
2015-08-20 15:25:51 -04:00
Paul Dix 51c565e461 Ensure partition only closes current segment if it's there 2015-08-20 14:37:02 -04:00
Ben Johnson 9e336bacf9 fix wal close deadlock 2015-08-20 11:56:50 -06:00
Paul Dix 9567b2c8a6 Fix logic with closing partitions 2015-08-20 13:53:59 -04:00
Ben Johnson 8f12cef883 Merge pull request #3735 from benbjohnson/append-threshold
Append to small bz1 blocks
2015-08-20 11:47:34 -06:00
Paul Dix 4e7631a135 Merge pull request #3765 from influxdb/pd-fix-wal-io-reads
Fix reads of metadata file in WAL
2015-08-20 13:08:29 -04:00
Ben Johnson e57d60210a Append to small bz1 blocks
This commit changes the bz1 append to check for a small
ending block first. If the block is below the threshold
for block size then it is rewritten with the new data
points instead of having a new block written.
2015-08-20 10:52:52 -06:00
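The append rule described above can be sketched over plain slices. This is a simplified model only: blocks here are uncompressed `[]int64` slices, the threshold counts points rather than bytes, and `appendPoints` is a hypothetical name.

```go
package main

import "fmt"

// appendPoints adds pts to blocks. If the last block is below the threshold,
// it is rewritten with the new points merged in. Otherwise the points are
// written as a new block.
func appendPoints(blocks [][]int64, pts []int64, threshold int) [][]int64 {
	if n := len(blocks); n > 0 && len(blocks[n-1]) < threshold {
		merged := make([]int64, 0, len(blocks[n-1])+len(pts))
		merged = append(merged, blocks[n-1]...)
		merged = append(merged, pts...)
		blocks[n-1] = merged // rewrite the small ending block
		return blocks
	}
	return append(blocks, pts) // ending block is full enough: start a new one
}

func main() {
	var blocks [][]int64
	for _, batch := range [][]int64{{1, 2}, {3}, {4, 5}, {6}} {
		blocks = appendPoints(blocks, batch, 4)
	}
	fmt.Println(blocks)
}
```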
Paul Dix e817036952 Make bz1 close the WAL before closing bolt so it can flush, fix locking on write. 2015-08-20 12:51:47 -04:00
Ben Johnson 6c4297ece5 Add bz1 size benchmarks
This commit adds benchmarks to show the size difference between
different block sizes.
2015-08-20 10:22:29 -06:00
Paul Dix 72da8d9741 Merge pull request #3750 from influxdb/pd-fix-wal-logging
Fix WAL logging enable.
2015-08-20 12:05:01 -04:00
Paul Dix 5dd97d39ca Merge pull request #3749 from influxdb/pd-fix-query-engine-no-mutex
Fix query engine not goroutine safe issue.
2015-08-20 11:32:56 -04:00
Paul Dix 370f008220 Fix reads of metadata file in WAL 2015-08-20 10:52:29 -04:00
Paul Dix 1f21d50005 Fix logging in segments and style on log messages 2015-08-20 10:43:25 -04:00
Paul Dix 13d606eaf6 Fix bug querying data from WAL while compacting.
If a flush is happening and you bring up a cursor for a series, if that series didn't have any data in the cache (after the flush started) then it would return no data. What it should have done instead is return the data that is in the flush cache, which is held in a separate area of memory until it is committed to the index.
2015-08-20 09:34:02 -04:00