Commit Graph

597 Commits (bd8dd9a29107e1cb5e7b5674779f332efa6eb3d6)

Author SHA1 Message Date
Edd Robinson 003c30989a Check for no values 2016-05-31 16:28:17 +01:00
rw dcec206f2e Dedup `.RUnlock` between two conditionals. 2016-05-29 10:20:58 -07:00
rw 1b160d1af0 Low-contention path for pre-existing cache entries.
This change appears to increase bulk ingestion throughput by 2x-3x in
multiprocessor environments.
2016-05-28 23:50:11 -07:00
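
A minimal sketch of what a low-contention path for pre-existing entries might look like, assuming a map guarded by a sync.RWMutex and entries that carry their own lock. The Cache, entry, Write and Len names are hypothetical and are not taken from the commit.

```go
package main

import (
	"fmt"
	"sync"
)

// entry holds the values for one key. It has its own lock, so appending to an
// existing entry does not require the cache-wide write lock.
type entry struct {
	mu     sync.Mutex
	values []int64
}

func (e *entry) add(v int64) {
	e.mu.Lock()
	e.values = append(e.values, v)
	e.mu.Unlock()
}

// Cache is a hypothetical stand-in for the engine cache.
type Cache struct {
	mu    sync.RWMutex
	store map[string]*entry
}

func NewCache() *Cache { return &Cache{store: make(map[string]*entry)} }

// Write uses only the read lock when the key already exists and falls back to
// the write lock just long enough to create a missing entry.
func (c *Cache) Write(key string, v int64) {
	c.mu.RLock()
	e := c.store[key]
	c.mu.RUnlock()

	if e == nil {
		c.mu.Lock()
		// Re-check: another goroutine may have created the entry meanwhile.
		if e = c.store[key]; e == nil {
			e = &entry{}
			c.store[key] = e
		}
		c.mu.Unlock()
	}
	e.add(v)
}

// Len returns the number of values stored for key.
func (c *Cache) Len(key string) int {
	c.mu.RLock()
	e := c.store[key]
	c.mu.RUnlock()
	if e == nil {
		return 0
	}
	e.mu.Lock()
	defer e.mu.Unlock()
	return len(e.values)
}

func main() {
	c := NewCache()
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func(n int64) {
			defer wg.Done()
			c.Write("cpu", n)
		}(int64(i))
	}
	wg.Wait()
	fmt.Println(c.Len("cpu"))
}
```

Writers to keys that already exist only share the read lock, so they contend with each other on the per-entry lock rather than on the whole cache.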
Jason Wilder 11959005f4 Switch backup to use shard.Snapshot
This switches the backup shard call to use the shard Snapshot, which
internally creates a snapshot by hardlinking all of the TSM and
tombstone files.  This reduces the time that the FileStore
is locked and will allow larger shards to be backed up more easily.
2016-05-27 09:30:25 -06:00
David Norton 381059a55c Merge pull request #6736 from influxdata/benchmark-write-points-allocs
Benchmarks to count allocs in WritePoints.
2016-05-27 10:13:17 -04:00
Edd Robinson 6a7f9527e3 Revert d2672a3 and 1e0a4e9 2016-05-27 10:34:14 +01:00
rw 92e7fec5cf Benchmarks to count allocs in WritePoints. 2016-05-26 17:13:14 -07:00
Edd Robinson d2672a3280 Update Go version 2016-05-26 15:26:09 +01:00
Edd Robinson 1e0a4e9119 Move fields under mutex 2016-05-26 12:00:46 +01:00
Jason Wilder d6661060a3 Merge pull request #6719 from shurcooL/fix-tombstone-open-error-check
tsdb/engine/tsm1: Check os.Open error before using file.
2016-05-25 12:11:26 -06:00
Jason Wilder a77dd4fe4c Merge pull request #6725 from influxdata/jw-tsm-query
Fix pathological TSM query case
2016-05-25 11:23:38 -06:00
Jason Wilder 7d50970631 Fix continuous compaction edge case
The level planner would keep including the same TSM files to be
recompacted even if they were already quite compacted and split
across several TSM files.

Fixes #6683
2016-05-25 10:36:24 -06:00
Jason Wilder 0b481ff627 Fix pathological TSM query case
This fixes a pathological query condition caused by a problematic
structuring of TSM files based on how points were written.  The
condition can occur when there are multiple TSM files and a large
number of points are written into the past.  The earlier existing
TSM files must also have points in the past and close to the present
causing their time range to eclipse the later files.

When this condition occurs, some queries can spend an excessive amount
of time merging all the overlapping blocks.

The fix was to constrain the window of overlapping blocks based on
the first one we ran into.  There was also a simple case in the Merge
where we could skip the binary search path and just append the two
inputs.
2016-05-25 09:14:17 -06:00
Dmitri Shuralyov c03ebf896b tsdb/engine/tsm1: Check os.Open error before using file.
os.Open is documented as:

> Open opens the named file for reading. If successful, methods on
> the returned file can be used for reading;

That suggests the file's methods should only be called if opening
was successful. The original code would defer f.Close() right after
os.Open, before ensuring that err is nil, so f.Close() would run
even if os.Open did not return successfully.

Apply https://github.com/golang/go/wiki/CodeReviewComments#indent-error-flow
suggestion to keep the normal path at minimal indentation, and indent
the error handling code instead. This improves code readability.
2016-05-24 21:08:35 -07:00
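
The pattern described in c03ebf896b can be illustrated with a short sketch: check the error from os.Open first, and only then defer the Close. The readAll helper and the file name are hypothetical and are not code from the commit.

```go
package main

import (
	"fmt"
	"io"
	"os"
)

// readAll is a hypothetical helper that reads a whole file.
func readAll(path string) ([]byte, error) {
	f, err := os.Open(path)
	if err != nil {
		// Error path is indented; f must not be used here.
		return nil, err
	}
	// Only defer Close once we know the open succeeded.
	defer f.Close()

	// Normal path stays at minimal indentation.
	return io.ReadAll(f)
}

func main() {
	if _, err := readAll("does-not-exist.tombstone"); err != nil {
		fmt.Println("open failed:", err)
	}
}
```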
Jason Wilder f48a106860 Optimized timestamp run-length decoding
Removes the up-front allocation of decoded values and return them
as needed.
2016-05-23 14:05:25 -06:00
Edd Robinson 40732a35d0 Merge pull request #6660 from influxdata/er-vet
Fix vet issues
2016-05-20 11:12:25 +01:00
Jonathan A. Sternberg 5621ccc2ce Remove limit optimization when using an aggregate
The limit optimization was put into the wrong place and caused only part
of the shard to be read when a limit was used. The optimization is
possible, but requires a bit of refactoring to the code here so the call
iterator is created per series before handed to the limit iterator.

Fixes #6661.
2016-05-19 10:29:38 -04:00
Jason Wilder 4c089a56f4 Fix read tombstones: EOF
Due to a bug in TSM tombstone files, it was possible to create
empty tombstone files.  At startup, the TSM file would error out
and not load the TSM file.

Instead, treat it as an empty v1 file so the TSM file can load
correctly.

Fixes #6641
2016-05-18 23:29:25 -06:00
Jason Wilder 7fb7faaaca Fix points already read from being returned more than once
If there were duplicate points in multiple blocks, we would correctly
dedup the points and mark the regions of the blocks we've read.
Unfortunately, we were not excluding the already-read points as the cursor
moved to points in the later blocks, which could cause points to be
returned twice incorrectly.

Fixes #6611
2016-05-18 17:21:10 -06:00
Jason Wilder f2bcf9d9ab Code review fixes 2016-05-18 15:25:56 -06:00
Jason Wilder d32ad26d27 Fix data not getting reloaded
The optimization to speed up shard loading had the side effect of
skipping adding series to the index that already exist.  The skipping
was in the wrong location and also skipped the shard's measurementFields
index which is required in order to query that series in the shard.
2016-05-18 15:25:56 -06:00
Jason Wilder e859141b75 Speed up tests
Switched the max keys test to write int64 of the same value so RLE
would kick in and the file size will be smaller (84MB vs 3.8MB).

Removed the chunking test which was skipped because the code will
not downsize a block into smaller chunks now.

Skip MaxKeys tests in various environments because it needs to
write too much data to run reliably.
2016-05-18 15:25:56 -06:00
Jason Wilder eff71cbe23 Rollover to new TSM file when max blocks exceeded
Fixes #6406
2016-05-18 15:25:55 -06:00
Jason Wilder 8fda621d8b Fix memory spike when compacting overwritten points
If a large series contains a point that is overwritten, the compactor
would load the whole series into RAM during a full compaction.  If
the series was large, it could cause very large RAM spikes and OOMs.

The change reworks the compactor to merge blocks more incrementally
similar to the fix done in #6556.

Fixes #6557
2016-05-18 15:25:55 -06:00
Edd Robinson f78e67d09c Fix concurrent map access panic 2016-05-18 17:56:50 +01:00
Edd Robinson f680ab0f0d Fix vet issues 2016-05-18 13:34:11 +01:00
Jonathan A. Sternberg 42cdaf0365 Merge pull request #6529 from influxdata/js-6519-select-tag-key-specifier
Support cast syntax for selecting a specific type
2016-05-16 12:30:14 -04:00
Jonathan A. Sternberg 23f6a706bb Support cast syntax for selecting a specific type
Casting syntax is done with the PostgreSQL syntax `field1::float` to
specify which type should be used when selecting a field. You can also
do `field1::field` or `tag1::tag` to specify that a field or tag should
be selected.

This makes it possible to select a tag when a field key and a tag key
conflict with each other in a measurement. It also means it's possible
to choose a field with a specific type if multiple shards disagree. If
no types are given, the same ordering for how a type is chosen is used
to determine which type to return.

The FieldDimensions method has been updated to return the data type for
the fields that get returned. The SeriesKeys function has also been
removed since it is no longer needed. SeriesKeys was originally used for
the fill iterator, but then expanded to be used by auxiliary iterators
for determining the channel iterator types. The fill iterator doesn't
need it anymore and the auxiliary types are better served by
FieldDimensions implementing that functionality, so SeriesKeys is no
longer needed.

Fixes #6519.
2016-05-16 12:08:29 -04:00
Jason Wilder 23fc9ff748 Revert "Fix memory spike when compacting overwritten points"
This reverts commit d99c5e26f6.
2016-05-16 09:30:34 -06:00
Jason Wilder 0dbd4893da Optimize shard index loading
On data sets with many series and potentially large series keys,
the cost of parsing the key and re-indexing can be high.

Loading the TSM keys into the index was being done repeatedly for
series that were already indexed by an earlier TSM file.  This was
wasted work and slowed down shard loading.

Parsing the key was also inefficient and allocated a new string
slice.  This was simplified to remove that allocation.
2016-05-12 14:02:42 -06:00
Ben Johnson 668bae57df parallelize query planning
This commit changes the `tsm1.Engine` to create individual series
iterators in batches so that it can be parallelized. Iterators
are combined at the end so they can be redistributed to the
parallelized merge iterator.
2016-05-11 10:38:11 -06:00
Cory LaNou c32906a366 Merge pull request #6593 from influxdata/cjl-copyshard
create shard snapshot
2016-05-10 20:01:59 -05:00
Jason Wilder d8490f1170 Merge pull request #6587 from influxdata/jw-validate-fields
Fix for merge values
2016-05-10 11:56:07 -06:00
Cory LaNou f415cf89ad wip 2016-05-10 11:01:03 -05:00
Jason Wilder 9b86bfea2a Merge pull request #6582 from eleme/fix_engine_cache_size
fix cache size of engine
2016-05-10 09:01:03 -06:00
Jason Wilder 8839cabd41 Add benchmark for Merge 2016-05-10 08:39:55 -06:00
Cory LaNou 4d30ea1eb3 minor PR feedback refactor 2016-05-10 08:14:51 -05:00
Cory LaNou a3bf3e2ef1 added baseline backup/restore plumbing 2016-05-10 08:14:51 -05:00
Jason Wilder 4f39cb2f97 Fix case where Merge return unsorted values 2016-05-09 15:40:34 -06:00
Ben Johnson 078e561820 parallelize iterators 2016-05-09 10:25:30 -06:00
thbourlove 22c2e7e1c5 fix cache memory size of engine 2016-05-09 21:29:34 +08:00
Jason Wilder d99c5e26f6 Fix memory spike when compacting overwritten points
If a large series contains a point that is overwritten, the compactor
would load the whole series into RAM during a full compaction.  If
the series was large, it could cause very large RAM spikes and OOMs.

The change reworks the compactor to merge blocks more incrementally
similar to the fix done in #6556.
2016-05-05 22:31:30 -06:00
Ben Johnson 4c45f8ec32 Merge pull request #6560 from benbjohnson/optimize-tsm1-call-iterator
Move call iterator to series level
2016-05-05 11:13:53 -06:00
Ben Johnson fdf34d4356 move call iterator to series level
This commit moves the `CallIterator` to wrap the individual series
instead of wrapping a shard. This allows individual points to be
aggregated before being merged.

This will cause a small increase in memory usage per series but
it shows a 20% decrease in query time when there are a moderate
number of points per series.
2016-05-05 09:59:03 -06:00
Jason Wilder a0ac754802 Fix loading huge series into RAM when points are overwritten
In some query scenarios, if there are a lot of points on disk spread
across many blocks in TSM files and a point is overwritten near the
beginning of the shard's timerange, the full series could be loaded
into RAM triggering OOMs and huge allocations.

The issue was that the KeyCursor code that handles overwriting points
had a simple implementation that just deduped the whole series in this
case.  This falls over when the series is quite large.

Instead, the KeyCursor has been changed to only decode blocks with
updated points.  It then keeps track of what section of the blocks
have been read so they are not re-read when the later points are
decoded.

Since the points in a block are always sorted, the code was also changed
to remove the Deduplicate calls since they end up
reallocating the slice.  Instead, we do a sorted merge and re-use
the slice as much as we can.
2016-05-05 09:34:44 -06:00
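
A sorted merge of the kind a0ac754802 describes could look roughly like the sketch below. It assumes each input is sorted by timestamp with no duplicates inside a single input, and that values from the second input win on equal timestamps. The Value type and Merge function are hypothetical stand-ins for the engine's own types, the append fast path for non-overlapping inputs is an illustrative shortcut, and unlike the commit this version allocates a new slice instead of re-using one.

```go
package main

import "fmt"

// Value is a hypothetical timestamped point.
type Value struct {
	T int64
	V float64
}

// Merge combines two slices that are each sorted by T. When both contain the
// same timestamp, the value from b replaces the value from a.
func Merge(a, b []Value) []Value {
	if len(a) == 0 {
		return b
	}
	if len(b) == 0 {
		return a
	}
	// Fast path: the ranges do not overlap, so appending keeps the order.
	if a[len(a)-1].T < b[0].T {
		return append(a, b...)
	}

	out := make([]Value, 0, len(a)+len(b))
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		switch {
		case a[i].T < b[j].T:
			out = append(out, a[i])
			i++
		case a[i].T > b[j].T:
			out = append(out, b[j])
			j++
		default: // same timestamp: b overwrites a
			out = append(out, b[j])
			i++
			j++
		}
	}
	out = append(out, a[i:]...)
	out = append(out, b[j:]...)
	return out
}

func main() {
	a := []Value{{1, 1.0}, {3, 3.0}, {5, 5.0}}
	b := []Value{{3, 30.0}, {4, 4.0}}
	fmt.Println(Merge(a, b))
}
```

Because both inputs are already sorted, a single linear pass is enough and no separate sort or deduplication step is needed.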
Jason Wilder 57cb3fdbc0 Merge pull request #6522 from influxdata/tp-tsm-dump
Dump TSM files to line protocol
2016-05-03 10:44:33 -06:00
Jason Wilder 4196554f51 Fix overwriting points returning wrong value
The cursors were returning the wrong value in the case when points
existed in both the cache and tsm files with the same timestamp. The
cache value should have been returned, but the tsm value was returned
incorrectly.

Fixes #6439
2016-05-03 09:21:31 -06:00
Edd Robinson fd77dbe648 Merge pull request #6546 from influxdata/er-build-tag
Fix invalid build tag
2016-05-03 16:00:39 +01:00
Jonathan A. Sternberg a2a5c32770 Merge pull request #6539 from influxdata/js-6495-fix-aggregates-with-empty-shards
Fix aggregate returns when data is missing from some shards
2016-05-03 10:56:21 -04:00
Jonathan A. Sternberg d6d0addcec Fix aggregate returns when data is missing from some shards
If a shard is empty for a specific field and the field type is something
other than a float, a nil iterator would get returned from one of the
empty shards and cause the combined iterators to be cast to the float
type and all other iterator types to be discarded (or for integers, to
be cast).

This is rare since most aggregates don't accept strings or booleans, but
for queries like:

    SELECT distinct(string) FROM mydata

It would result in nothing getting returned if one of the shards didn't
have a value for `string`.

This change modifies the query engine to return nil for the shards
instead of a fake iterator and then to only use the fake iterator if the
final aggregate iterator is nil (meaning that no iterators could be
constructed for the field from any shard).

Fixes #6495.
2016-05-03 10:41:22 -04:00
Edd Robinson d35fa1ec97 Remove redundant windows build tags 2016-05-03 14:22:02 +01:00
Jason Wilder e0304ae3d5 Fix shards not getting assigned to series on restart
Also, simplifies the LoadMetaDataIndex func to not require a *Shard
2016-05-02 11:36:05 -06:00
Jason Wilder 2d09937fd2 Fix removing fully deleted index blocks
If multiple tombstone entries happen to exist for the same key in a
tombstone file, it was possible to panic.  The first application
would remove all index entries and the second time around the code
still assumed entries would exist and would index into the nil slice.

Also fixes a case where the range of time would fully delete all index
entries, but it did not align with math.MinInt64 and math.MaxInt64.  This
would cause the index locations to still exist in the offset slice.  This
is inefficient because the BlockIterator would still scan and decode the block
only to discover that all the values are deleted.  We now just remove it from
the offsets slice in this case since the range of values are deleted.
2016-05-02 11:36:05 -06:00
Jason Wilder 58aa65d5a8 Optimize applyTombstones
When a large tombstone file existed on disk, this code was slow since
it would apply each tombstone to the index one at a time causing the
index to be scanned for each key.

Instead, we group all the tombstones together by timestamp and apply
in bulk so that the index is scanned once for each set of tombstones.

If we change to immutable tombstone files, it might be better to just
write a file where all the keys have the same tombstone so we can re-apply
them efficiently.
2016-05-02 11:36:05 -06:00
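
The grouping step described in 58aa65d5a8 can be sketched as follows, assuming each tombstone carries a key and a min/max time range. The Tombstone, timeRange and GroupByRange names are hypothetical; the real index deletion call is not shown.

```go
package main

import "fmt"

// Tombstone is a hypothetical record: a series key plus the deleted range.
type Tombstone struct {
	Key      string
	Min, Max int64
}

// timeRange is used as a map key so tombstones with identical ranges
// end up in the same group.
type timeRange struct {
	Min, Max int64
}

// GroupByRange collects the keys of all tombstones sharing a time range, so
// that each range can be applied to the index in one bulk pass.
func GroupByRange(ts []Tombstone) map[timeRange][]string {
	groups := make(map[timeRange][]string)
	for _, t := range ts {
		r := timeRange{t.Min, t.Max}
		groups[r] = append(groups[r], t.Key)
	}
	return groups
}

func main() {
	ts := []Tombstone{{"cpu", 0, 10}, {"mem", 0, 10}, {"disk", 5, 7}}
	for r, keys := range GroupByRange(ts) {
		// A real implementation would call one bulk delete per group here.
		fmt.Printf("delete %v in [%d, %d]\n", keys, r.Min, r.Max)
	}
}
```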
Jason Wilder c73c7cea25 Revert filtering index entries in BlockIterator
This was the wrong fix.  The real issue was the tombstones were
being read incorrectly and also applied incorrectly at times.  This
code is slower and not necessary so reverting it.
2016-05-02 11:36:04 -06:00
Jason Wilder f9ace932c0 Fix V2 tombstone reading file position
Each iteration of the loop was incrementing the position by 4 incorrectly.
The position should start at four since the header is 4 bytes.  This
caused tombstones at the end of the file to not be read because the counter
was out of sync with the actual file position, which caused the loop to exit early.

Probably better to refactor this to check for io.EOF instead of using the counter.
2016-05-02 11:36:04 -06:00
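
The refactoring suggested at the end of f9ace932c0 (stop on io.EOF rather than tracking a byte counter) might look like the sketch below. The record layout is an assumption made for illustration only: a 4-byte header followed by records of a 2-byte big-endian key length, the key, and two int64 timestamps. It is not the actual tombstone file format, and encode, decode and readTombstones are hypothetical names.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// Tombstone is a hypothetical decoded record.
type Tombstone struct {
	Key      string
	Min, Max int64
}

const headerSize = 4 // the commit states the header is 4 bytes

// encode writes the assumed layout; it exists only to feed the reader below.
func encode(ts []Tombstone) []byte {
	var buf bytes.Buffer
	buf.Write(make([]byte, headerSize)) // placeholder header bytes
	for _, t := range ts {
		binary.Write(&buf, binary.BigEndian, uint16(len(t.Key)))
		buf.WriteString(t.Key)
		binary.Write(&buf, binary.BigEndian, [2]int64{t.Min, t.Max})
	}
	return buf.Bytes()
}

// readTombstones skips the header and reads records until a clean io.EOF.
// No position counter is kept, so it cannot drift out of sync with the file.
func readTombstones(r io.Reader) ([]Tombstone, error) {
	var header [headerSize]byte
	if _, err := io.ReadFull(r, header[:]); err != nil {
		return nil, err
	}
	var out []Tombstone
	for {
		var keyLen uint16
		err := binary.Read(r, binary.BigEndian, &keyLen)
		if err == io.EOF {
			return out, nil // no more records
		}
		if err != nil {
			return nil, err
		}
		key := make([]byte, keyLen)
		if _, err := io.ReadFull(r, key); err != nil {
			return nil, err // truncated record
		}
		var rng [2]int64
		if err := binary.Read(r, binary.BigEndian, &rng); err != nil {
			return nil, err // truncated record
		}
		out = append(out, Tombstone{Key: string(key), Min: rng[0], Max: rng[1]})
	}
}

// decode is a convenience wrapper for in-memory data.
func decode(b []byte) ([]Tombstone, error) {
	return readTombstones(bytes.NewReader(b))
}

func main() {
	data := encode([]Tombstone{{"cpu", 0, 10}, {"mem", 5, 7}})
	ts, err := decode(data)
	fmt.Println(ts, err)
}
```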
Jason Wilder bd1009080e Prevent writing empty tombstone files
If you delete from a measurement with a tag that does not match
any series, we would write an empty tombstone file and fail to load
it back.
2016-05-02 11:36:04 -06:00
Jason Wilder 8082fc61ba Fix parsing keys when loading database index
The code for parsing a key out of the WAL or TSM files in the engine
was naive and didn't account for measurements with escape chars. This
uses the correct parsing code to parse and load them correctly.

Fixes #6496
2016-04-30 14:47:19 -06:00
Todd Persen 9eb4c1ec57 Fix typo in comment. 2016-04-29 16:26:27 -07:00
Jason Wilder abcb559b09 Remove index meta data when series and measurements are gone
This removes the dropMeta param from the tsdb.Store.DeleteSeries and
lets the shard determine when to remove the meta data from the index
based on what series still have data in the shard.

This uncovered a nasty bug in compactions where a fully deleted series would
prematurely end the compactions and not carry forward the rest of the data
in the TSM file.  This is now fixed as well.
2016-04-29 16:31:57 -06:00
Jason Wilder 4e353867d5 Fix first block not getting purged when deleting series 2016-04-27 17:08:00 -06:00
Ben Johnson f7af787aef add DELETE query support
This commit adds query language support for deleting series with a
`DELETE` query.
2016-04-27 15:16:23 -06:00
Jason Wilder aefd2ad08b Add DeleteSeries and DeleteSeriesRange 2016-04-27 13:09:53 -06:00
Jason Wilder c306090361 Fix tombstone rename on windows 2016-04-27 13:09:53 -06:00
Jason Wilder 86d37614e4 Remove debugging from test output 2016-04-27 13:09:53 -06:00
Jason Wilder bf3aa5857d Don't add tombstone for timerange not contained by file 2016-04-27 13:09:53 -06:00
Jason Wilder 6042e114a1 Remove tombstoned values during compaction
This will skip blocks that are fully tombstoned as well as remove
points that have been removed within a block.
2016-04-27 13:09:53 -06:00
Jason Wilder 23bbfb2192 Prevent truncated WAL entries from panicking 2016-04-27 13:09:53 -06:00
Jason Wilder 0de21ade40 Add delete range of values support to WAL and cache loader 2016-04-27 13:09:53 -06:00
Jason Wilder d13d01b516 Allow deleting series by time on a shard 2016-04-27 13:09:53 -06:00
Jason Wilder 4d71d2b01f Add support for deleting cache values using time range 2016-04-27 13:09:52 -06:00
Jason Wilder c154cd4b4a Remove TSMReaderOptions
Not used
2016-04-27 13:09:52 -06:00
Jason Wilder c8bd41c2d8 Remove TSM reader Keys func
It's very inefficient and should never be used.
2016-04-27 13:09:52 -06:00
Jason Wilder 7e06d558d5 Update ContainsValue to handle tombstones 2016-04-27 13:09:52 -06:00
Jason Wilder 97504a552c Support time range tombstones in FileStore/KeyCursor 2016-04-27 13:09:52 -06:00
Jason Wilder 27c2bc3f15 Separate IndexWriter from TSMIndex
Allows for future versioning of the TSMIndex as well as removing
a lot of unnecessary code.
2016-04-27 13:09:52 -06:00
Jason Wilder bb82331db7 Move TSMIndex defn to reader.go 2016-04-27 13:09:52 -06:00
Jason Wilder 1ac0b01c5a Remove fileAccessor
No longer used
2016-04-27 13:09:52 -06:00
Jason Wilder a789e819a3 Remove NewTSMReaderWithOptions
There are two TSMIndex implementations, the directIndex and the
indirectIndex.  Originally, we only had the directIndex and later
added the indirectIndex and NewTSMReaderWithOptions in order to
allow both indexes to be used in tests and code.  This has created
a problem since we really only use the directIndex for writing and
always use the indirectIndex for reading.

This change removes the NewTSMReaderWithOptions func so that it is
no longer possible to create a TSMReader with a directIndex.  This
will allow a lot of the block reading code used by the directIndex
to be removed and simplify maintenance.  It also gives better test
coverage of the code that is actually used by the TSM engine now.
2016-04-27 13:09:52 -06:00
Jason Wilder bc6328d196 Add time range support to tombstone files
This adds support for a time range to tombstone files to allow a subset
of points to be deleted instead of the whole series.  It changes the
tombstone file format to a binary format and maintains backwards compatibility
with the old text format tombstone files.
2016-04-27 13:09:52 -06:00
Ben Johnson 286072f65a update dep: simple8b @ b421ab40 2016-04-22 09:46:05 -06:00
Ben Johnson d204a8b683 optimize tsm1.FloatDecoder
This commit changes the `FloatDecoder.val` from a `float64` type
to a `uint64` to avoid an additional type conversion during read.
Now the type gets converted to a `float64` only on call to `Values()`.
2016-04-21 08:49:12 -06:00
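
The idea in d204a8b683 of keeping the decoder state as raw bits can be sketched as below. The floatDecoder type here is a simplified, hypothetical model, and the XOR-against-previous-value step is an assumption about how successive values are derived; only the "store a uint64, convert in Values()" part comes from the commit message.

```go
package main

import (
	"fmt"
	"math"
)

// floatDecoder keeps the current value as raw IEEE 754 bits so that each
// decoding step is pure integer work.
type floatDecoder struct {
	val uint64
}

func newFloatDecoder(first float64) *floatDecoder {
	return &floatDecoder{val: math.Float64bits(first)}
}

// next applies an XOR delta to the stored bits (assumed encoding scheme).
// No float64 conversion happens here.
func (d *floatDecoder) next(xor uint64) {
	d.val ^= xor
}

// Values converts to float64 only when the caller asks for the value.
func (d *floatDecoder) Values() float64 {
	return math.Float64frombits(d.val)
}

// xorDelta computes the delta an encoder would emit between two values.
func xorDelta(prev, cur float64) uint64 {
	return math.Float64bits(prev) ^ math.Float64bits(cur)
}

func main() {
	d := newFloatDecoder(1.5)
	d.next(xorDelta(1.5, 2.5))
	fmt.Println(d.Values())
}
```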
Jason Wilder 87ceb7426a Don't lock the cache while adding entries
Entries have their own locking so the cache doesn't need to be locked
when adding to them.
2016-04-20 16:08:58 -06:00
Jason Wilder fbaa7db54f Don't lock entry when scanning new values to add 2016-04-20 16:00:26 -06:00
Jason Wilder bfa225f149 Merge pull request #6430 from influxdata/jw-cache-load-size
Disable cache max memory size when reloading the cache
2016-04-20 14:35:23 -06:00
Stephen Gutekanst 9dc09c5257 Make logging output location more programmatically configurable (#6213)
This has various benefits:

- Users embedding InfluxDB within other Go programs can specify a different logger / prefix easily.
- More consistent with code used elsewhere in InfluxDB (e.g. services, other `run.Server.*` fields, etc).
- This is also more efficient, because it means `executeQuery` no longer allocates a single `*log.Logger` each time it is called.
2016-04-20 21:07:08 +01:00
Jason Wilder f679787080 Disable cache max memory size when reloading the cache
The cache max memory size is an approximate size and can prevent a
shard from loading at startup.  This change disables the max size
at startup to prevent this problem and sets the limit back after
reloading.

Fixes #6109
2016-04-20 10:41:30 -06:00
Jonathan A. Sternberg c8c38e15cd Merge pull request #6386 from influxdata/js-iterator-next-error
Modify all of the iterators to allow returning an error on Next()
2016-04-20 10:39:53 -04:00
Ben Johnson 54454e1e5b Merge pull request #6424 from benbjohnson/optimize-bit-reader
Optimize tsm1.BitReader
2016-04-20 08:28:24 -06:00
Seif Lotfy c6e3c87e00 Add Block checksum validation and "influx_inspect verify" tool
Fixes #5502
2016-04-19 22:33:03 +02:00
Ben Johnson 1d2238c642 optimize tsm1.BitReader
This commit rewrites the `tsm1.BitReader` to use an 8-byte buffer
instead of a 1-byte buffer and provide an inlineable fast bit read.
2016-04-19 11:34:17 -06:00
Jason Wilder f841a90d35 Use int64 instead of time.Time in timestamp encoder/decoder 2016-04-19 10:25:27 -06:00
Jason Wilder 61beeca426 Update timestamp benchmarks 2016-04-19 10:17:32 -06:00
Jonathan A. Sternberg 7ec2a991d5 Modify all of the iterators to allow returning an error on Next()
This also switches the remaining iterators to be lazy so they can return
errors properly. They needed to be converted to lazy initialization
anyway, which has the side effect of making it much easier for us to
propagate the underlying error during initialization.

Updated the Emitter to return errors when it cannot read properly from
the iterators.
2016-04-18 11:17:55 -04:00
Jonathan A. Sternberg 93745d9693 Merge pull request #6391 from influxdata/js-5553-limit-queries-slow-with-group-by
Propagate the limit option to the low level iterators
2016-04-16 09:39:25 -04:00
Jonathan A. Sternberg bd5fdd797d Propagate the limit option to the low level iterators
When a GROUP BY or multiple sources are used, the top level limit
iterator requires reading the entire iterator stream so it can find all
of the tag groups it needs to return. For large data series, this ends
up with the limit iterator discarding a lot of output.

This change adds a new lower level limit iterator on each series itself
so that there are fewer data points that have to be thrown away by the
top level iterator.

Fixes #5553.
2016-04-15 18:23:54 -04:00
Jonathan A. Sternberg 835d08591e Do not filter out empty tags from series keys 2016-04-13 09:15:57 -04:00
Jonathan A. Sternberg 60282cf52d Merge pull request #6284 from influxdata/js-3371-where-clause-compare-tags-and-fields
Enhance comparing tags and fields in the where clause
2016-04-12 11:45:54 -04:00
Pierre Fersing 29b19a2293 Fix deadlock in tsm1/file_store 2016-04-12 09:39:21 +02:00
Jonathan A. Sternberg ea6262b712 Enhance comparing tags and fields in the where clause
Now it is possible to compare tags and fields and it is also now
possible to compare tags and tags. Previously, it was only possible to
compare fields with fields and tags with a string or a regex.

Fixes #3371.
2016-04-11 18:10:08 -04:00
Ben Johnson 525e22c92b tsm1 query engine alloc reduction
This commit makes a number of performance improvements to
reduce allocations during query execution. Several objects
and buffers are now reused across the components to avoid
allocations.

Previously a simple `count(value)` query across 1M points
would require 26,000+ allocations. After the changes in
this commit that number has been reduced to 88.
2016-04-11 14:50:59 -06:00
Jonathan A. Sternberg 028fdaff81 Merge pull request #6222 from influxdata/js-6206-descending-tsm1-iterators
Handle nil values from the tsm1 cursor correctly
2016-04-06 10:05:20 -04:00
Jonathan A. Sternberg 94ec92d669 Handle nil values from the tsm1 cursor correctly
Send nil values from the tsm1 cursor at the end of the cursor. After the
cursor reached tsm1, the `nextAt()` call would always return the default
value rather than a nil value.

Descending also didn't work correctly because the seeking functionality
for tsm1 iterators would always act like they were ascending instead of
descending when choosing which value to select. This resulted in very
strange output from the emitter since it couldn't figure out if it was
ascending or descending.

Fixes #6206.
2016-04-06 09:27:02 -04:00
Jason Wilder 3f4c5a5585 Fix race on measurementFields
Both Shard and Engine had the same reference to the measurementField map,
but they each protected it with their own locks.  This causes a race when
writes and queries are occurring because writes can add new fields to the
map while queries are reading from it.

The fix moves the ownership to the Engine and provides protected accessors
that Shard now uses.  For the most part, the accesses on Shard were old
dead code.

Fixing the measurementFields map race created a new race on the internal
fields map.  This is now unexported and protected via MeasurementFields
exported funcs.

Fixes #6188
2016-04-01 18:57:01 -06:00
Jason Wilder 873ac2715d Fix panic: runtime error: slice bounds out of range
Writing a key that exceeds the max key length could cause a panic
when reading a tsm file because the 2 bytes used for the key length
would not be enough to represent the actual key length.

The writer will now return an error when trying to write a key
that is too large.
2016-03-30 23:44:17 -06:00
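
The length check described in 873ac2715d can be sketched as follows: with a 2-byte length prefix, a key longer than 65535 bytes cannot be represented, so the writer refuses it. The appendKey function, the error value and the big-endian byte order are assumptions for illustration, not the engine's actual code.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"strings"
)

// maxKeyLength is the largest length a 2-byte prefix can represent.
const maxKeyLength = 1<<16 - 1

var errKeyTooLong = errors.New("key exceeds max key length")

// appendKey appends a 2-byte length prefix and the key to dst. Oversized keys
// are rejected instead of silently truncating the length.
func appendKey(dst []byte, key string) ([]byte, error) {
	if len(key) > maxKeyLength {
		return dst, errKeyTooLong
	}
	var n [2]byte
	binary.BigEndian.PutUint16(n[:], uint16(len(key)))
	dst = append(dst, n[:]...)
	return append(dst, key...), nil
}

func main() {
	b, err := appendKey(nil, "cpu,host=a")
	fmt.Println(len(b), err)

	_, err = appendKey(nil, strings.Repeat("x", maxKeyLength+1))
	fmt.Println(err)
}
```

Without the check, uint16(len(key)) would wrap around and a reader would slice the key with the wrong length, which matches the out-of-range panic in the commit title.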
Jonathan A. Sternberg 711a6614e6 Implement the point limit monitor
Fixes #6077.
2016-03-30 16:08:56 -04:00
Joe LeGasse f10c300765 Update to conversion tool to work in current versions
After adding type-switches to the tsm1 packages, the custom
implementation found in the conversion tool broke. This change uses
tsm1.NewValue() instead of a custom implementation.

This change also ensures that the tsm1.Value interface can only be
implemented internally to allow for the optimized type-switch based
encoding
2016-03-30 13:26:46 -04:00
Jason Wilder 60c3898577 Add godoc for KeyAt func 2016-03-29 12:59:26 -06:00
Jason Wilder 1b08e2dd55 Use walk func to load all tsm keys to index
Avoids allocating a big map of all keys.
2016-03-29 12:59:26 -06:00
Jason Wilder d4757ad040 Remove sync.Pool from wal UnmarshalBinary
When loading many shards concurrently they block trying to
acquire a write lock in the sync pool adding a new source of
contention.  Since this code flow always needs to allocate a
buffer it's not really buying us much.
2016-03-29 12:59:26 -06:00
Jason Wilder 03ced4cc90 Load shards concurrently 2016-03-29 12:58:52 -06:00
Ben Johnson 45f1c28adb add tsm iterator stats buffer
This commit adds a buffer for stats to be updated without
requiring a mutex lock/unlock on every point. The tradeoff
is that stats are not exactly precise. This works for our
use case because stats are only periodically checked.
2016-03-23 12:23:22 -06:00
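
The tradeoff described in 45f1c28adb can be sketched with a per-iterator counter that is flushed to a shared, mutex-protected total only every N points. The Stats and statsBuffer types and the flush threshold are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// Stats is the shared counter that other goroutines read periodically.
type Stats struct {
	mu     sync.Mutex
	points int64
}

func (s *Stats) Add(n int64) {
	s.mu.Lock()
	s.points += n
	s.mu.Unlock()
}

func (s *Stats) Points() int64 {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.points
}

// statsBuffer is owned by a single iterator, so pending needs no lock.
type statsBuffer struct {
	shared     *Stats
	pending    int64
	flushEvery int64
}

// Inc counts one point and takes the shared lock only once per flushEvery.
func (b *statsBuffer) Inc() {
	b.pending++
	if b.pending >= b.flushEvery {
		b.Flush()
	}
}

// Flush publishes any buffered count, for example when the iterator closes.
func (b *statsBuffer) Flush() {
	if b.pending == 0 {
		return
	}
	b.shared.Add(b.pending)
	b.pending = 0
}

func main() {
	shared := &Stats{}
	buf := &statsBuffer{shared: shared, flushEvery: 100}
	for i := 0; i < 250; i++ {
		buf.Inc()
	}
	fmt.Println(shared.Points()) // lags behind the true count until flushed
	buf.Flush()
	fmt.Println(shared.Points())
}
```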
Jonathan A. Sternberg a35d9602cd Fix where filters when an OR is used and when a tag does not exist
If an OR was used, merging filters between different expressions would
not work correctly. If one of the sides had a set of series ids with a
condition and the other side had no series ids associated with the
expression, all of the series from the side with a condition would have
the condition ignored. Instead of defaulting a non-existent series
filter to true, it should just be false and the evaluation of the one
side that does exist should take care of determining if the series id
should be included or not. The AND condition used false correctly so did
not have to be changed.

If a tag did not exist and `!=` or `!~` were used, it would return false
even though neither a field nor a tag equaled those values. This has
now been modified to correctly return the correct series ids and the
correct condition.

Also fixed a panic that would occur when a tag caused a field access to
become unnecessary. The filter using the field access still got created
and used even though it was unnecessary, resulting in an attempted
access to a non-initialized map.

Fixes #5152 and a bunch of other miscellaneous issues.
2016-03-22 12:19:06 -04:00
Ben Johnson 6e1c1da25b reduce allocations in query execution
This commit removes some heap objects by converting them from
pointer references to non-pointers or by reusing buffers.
2016-03-22 09:51:39 -06:00
Jonathan A. Sternberg ad96207868 Fix ORDER BY desc so it doesn't skip values
After reading the initial buffer, ORDER BY desc would read the next
block into the buffer and only read the first element. It's because the
code that was copied from the ascending cursor wasn't modified correctly
to set the position to the last element in the buffer.

The buffer size has also been lowered from 1000 to 10 to match with the
ascending cursor for performance with limit queries.

Fixes #6055.
2016-03-22 09:40:11 -04:00
Ben Johnson 7156c1f9bd add IteratorStats
This commit adds an `IteratorStats` that holds aggregate
iterator processing information. A method is also added to
`Iterator` to return the stats:

	Stats() influxql.IteratorStats

The remote iterators will also emit their stats in the point
stream upon first connection, on a given interval, and then
finally once the last point has been sent.
2016-03-21 16:25:19 -06:00
Jason Wilder ee2f21e76f Merge pull request #6082 from influxdata/jw-tsm
Fix partially written TSM files
2016-03-21 15:42:27 -06:00
Jason Wilder 7567453c9a Ensure TSM files are fsync'd
Make sure TSM files are fsync'd when closed and also that the parent
dir is fsync'd when they are renamed.
2016-03-21 15:03:52 -06:00
Jason Wilder a4e5446ddd Return error when TSM writer close returns one
The TSM writer uses a bufio.Writer that needs to be flushed before
it's closed.  If the flush fails for some reason, the error is not
handled by the defer and the compactor continues on as if all is good.
This can create files with truncated indexes or zero-length TSM files.

Fixes #5889
2016-03-21 15:00:36 -06:00
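
The failure mode in a4e5446ddd (a flush error swallowed by a deferred close) can be avoided with a named return value, as in the sketch below. The writeAll function and the two fake file types are hypothetical and exist only to make the example self-contained.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"io"
)

// failingFile simulates a destination whose writes fail.
type failingFile struct{}

func (failingFile) Write(p []byte) (int, error) { return 0, errors.New("disk full") }
func (failingFile) Close() error                { return nil }

// memFile is an in-memory destination that records what was written.
type memFile struct {
	buf    []byte
	closed bool
}

func (m *memFile) Write(p []byte) (int, error) {
	m.buf = append(m.buf, p...)
	return len(p), nil
}

func (m *memFile) Close() error {
	m.closed = true
	return nil
}

// writeAll buffers data and reports the first error from Write, Flush or
// Close. The named result lets the deferred function surface late errors.
func writeAll(dst io.WriteCloser, data []byte) (err error) {
	w := bufio.NewWriter(dst)
	defer func() {
		if ferr := w.Flush(); err == nil {
			err = ferr
		}
		if cerr := dst.Close(); err == nil {
			err = cerr
		}
	}()
	_, err = w.Write(data)
	return err
}

func main() {
	fmt.Println(writeAll(&memFile{}, []byte("tsm")))
	fmt.Println(writeAll(failingFile{}, []byte("tsm")))
}
```

The small write succeeds into the bufio buffer, so the failure only appears at Flush time, which is exactly the error a plain `defer w.Flush()` would discard.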
Jonathan A. Sternberg 6655ca7769 Create a new interrupt iterator that will stop emitting points after an interrupt
Use of the iterator is spread out into both `IteratorCreators` and
inside of the iterators themselves. Part of the interrupt must be
handled inside of the engine so it stops trying to emit points when an
interrupt is found and another part of the interrupt has to happen when
combining the iterators so it doesn't just start reading the next shard.
2016-03-21 12:07:07 -04:00
Jason Wilder 3fd40d48a1 Merge pull request #6006 from influxdata/jw-deadlock
Fix deadlock when running backup
2016-03-14 13:36:45 -06:00
Jason Wilder 9984cd5d6d Fix skipping blocks at query time when overlaps exist
Depending on how data is written across TSM files, it was possible
to skip over some blocks at query time, making it look like data was missing.
2016-03-14 13:11:11 -06:00
Jason Wilder 000459e350 Fix deadlock when running backup
A deadlock occurs under write load if a backup is run in between the
time when a snapshot compaction has snapshotted the cache and successfully
written it to disk.  The issue is that the second snapshot call will block
on the commit lock while it is holding the engine write lock.  This causes
all writes to block and also prevents the currently running snapshot
compaction from completing because it needs to acquire a read-lock.

This PR removes the commit lock and just returns an error if a snapshot is
in progress to allow any locks being held to be released.  The caller can determine
whether to retry or give up.
2016-03-14 12:36:48 -06:00
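The approach in 000459e350, failing fast instead of blocking when a snapshot is already running, might look something like the sketch below. The `cache` type, its `snapshotting` flag, and `errSnapshotInProgress` are assumptions made for illustration and do not reproduce the real tsm1 cache.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errSnapshotInProgress is a hypothetical sentinel error.
var errSnapshotInProgress = errors.New("snapshot in progress")

// cache is a simplified stand-in for a cache that allows one snapshot at a time.
type cache struct {
	mu           sync.Mutex
	snapshotting bool
}

// Snapshot returns an error immediately if another snapshot is running,
// rather than blocking while the caller may be holding other locks.
func (c *cache) Snapshot() error {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.snapshotting {
		return errSnapshotInProgress
	}
	c.snapshotting = true
	return nil
}

// ClearSnapshot marks the running snapshot as finished.
func (c *cache) ClearSnapshot() {
	c.mu.Lock()
	c.snapshotting = false
	c.mu.Unlock()
}

func main() {
	c := &cache{}
	fmt.Println(c.Snapshot()) // first caller proceeds
	fmt.Println(c.Snapshot()) // second caller gets an error and can retry later
	c.ClearSnapshot()
	fmt.Println(c.Snapshot()) // succeeds again once the first snapshot is cleared
}
```

The mutex here is only held for the duration of the flag check, so a second caller never waits on the snapshot itself and can release whatever else it holds.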
Joe LeGasse 344e5abd41 Changed type-switch a few places to reduce allocations.
Slices of tsm1.Value interfaces are only ever used with all the same
types, and the previous code would switch on the type returned from a
call to Value(), which allocated and returned an interface{} object for
the underlying value.

This change instead type-switches on the tsm1.Value object itself,
allowing it direct access to the underlying value field, eliminating the
unnecessary allocations.
2016-03-11 15:57:05 -05:00
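The two styles of type switch contrasted in 344e5abd41 can be sketched as below. The `Value`, `FloatValue`, and `IntegerValue` types here are simplified stand-ins, not the real tsm1 definitions, and whether the boxed form actually allocates depends on the Go version and the values involved.

```go
package main

import "fmt"

// Value is a simplified stand-in for the tsm1.Value interface.
type Value interface {
	UnixNano() int64
	Value() interface{}
}

// FloatValue and IntegerValue are hypothetical concrete implementations.
type FloatValue struct {
	unixnano int64
	value    float64
}

func (v *FloatValue) UnixNano() int64    { return v.unixnano }
func (v *FloatValue) Value() interface{} { return v.value }

type IntegerValue struct {
	unixnano int64
	value    int64
}

func (v *IntegerValue) UnixNano() int64    { return v.unixnano }
func (v *IntegerValue) Value() interface{} { return v.value }

// sumBoxed switches on the result of Value(), which wraps each underlying
// value in an interface{} and may allocate per call.
func sumBoxed(values []Value) float64 {
	var sum float64
	for _, v := range values {
		switch x := v.Value().(type) {
		case float64:
			sum += x
		case int64:
			sum += float64(x)
		}
	}
	return sum
}

// sumDirect switches on the concrete type of the Value itself and reads the
// underlying field directly, so no intermediate interface{} is created.
func sumDirect(values []Value) float64 {
	var sum float64
	for _, v := range values {
		switch x := v.(type) {
		case *FloatValue:
			sum += x.value
		case *IntegerValue:
			sum += float64(x.value)
		}
	}
	return sum
}

func main() {
	values := []Value{
		&FloatValue{unixnano: 1, value: 1.5},
		&IntegerValue{unixnano: 2, value: 2},
		&FloatValue{unixnano: 3, value: 3},
	}
	fmt.Println(sumBoxed(values), sumDirect(values))
}
```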
Jason Wilder 992c78ee22 Remove period shard maintenance goroutine
This is no longer used in tsm and just periodically locks everything
for no reason now.
2016-03-09 17:31:02 -07:00
Edd Robinson 58c03448aa Merge pull request #5514 from influxdata/er-engine-panic
Ensure shards and engine are safely closed
2016-03-09 18:56:36 +00:00
Jason Wilder e3fef5593c Merge pull request #5855 from jonseymour/jss-5854-go-master-breaks-build
fix tests to cope with future changes to testing.quick.Check - see #5854
2016-03-01 19:03:21 -07:00
Mark Rushakoff cdcb079769 Tag TSM stats with database, retention policy
... by extracting the db/rp from the given path.

Now that the code has "standardized" on extracting db/rp this way, the
ShardLocation struct is no longer necessary and thus has been removed.
We're back on the previous style of passing the path and walPath to
NewShard.
2016-02-29 09:17:34 -08:00
Jon Seymour 73b3a2a056 Merge #5855 (issue: #5854).
RHS merges cleanly with 0.10.0

Signed-off-by: Jon Seymour <jon@wildducktheories.com>
2016-02-29 20:37:32 +11:00
Jon Seymour 716cdd7f41 tsm: modify encoding tests to deal with possible nil slices from testing.quick.Check in go master
The current go compiler at the tip of the go master (1d5001af) has a modified implementation of
testing.quick.Check that now generates nil slices as test data. (See: https://gophers.slack.com/archives/general/p14567053570110). The existing tests expect round tripping to work in this case
but it does not. So, in these cases we change the expectation to reflect actual behaviour.

This needs to be checked for reasonableness.
2016-02-29 20:36:19 +11:00
Jason Wilder 8d70d65a82 Convert time.Time to int64 2016-02-25 15:15:01 -07:00
Jon Seymour 11123d2694 Merge #5833 (issue: #5832).
Signed-off-by: Jon Seymour <jon@wildducktheories.com>
2016-02-26 07:59:03 +11:00
Jon Seymour 2c7cd06b99 tsm: cache: need to check that snapshot has been sorted.
Previously, the for loop at the end of the method assumed that all entries
had been deduplicated, including the entry discovered in the snapshot.

However, this wasn't actually true. With this change, we make it true.

Signed-off-by: Jon Seymour <jon@wildducktheories.com>
2016-02-26 07:56:25 +11:00
Jon Seymour 7eabae68de tsm: cache: add a test for the write sequence {6,1,snapshot,7,2}
Consider the write sequence: 6,1,snapshot,7,2.

The hot cache gets deduplicated, so is 2,7.

Now consider the test if 1 >= 2, this is false, so needSort is not set to true.

The problem is the implicit assumption that the snapshot is always sorted
by the time that merged() runs, but this may not be true.

Signed-off-by: Jon Seymour <jon@wildducktheories.com>
2016-02-26 07:43:50 +11:00
Jason Wilder 6ebc192298 Merge pull request #5678 from jonseymour/typo
doc: typographical, spelling, grammar, word-choice and phrasing improvements.
2016-02-25 09:33:41 -07:00
Jason Wilder daf68dbbd2 Merge pull request #5701 from jonseymour/js-deduplicate-safety
tsm: cache: improve thread safety of Cache.Deduplicate (see #5699)
2016-02-25 09:18:10 -07:00
Jon Seymour 4d98a1cf28 tsm: cache: remove unnecessary lock escalation.
Previously, we needed a write lock on the cache because it was the
only lock we had available to guard updates to entry.values and
entry.needSort.

However, now that we have an entry-scoped lock for this purpose, we don't
need the cache write lock for this purpose. Since merged() doesn't
modify the .store or the c.snapshot.sort, there is no need for
a write lock on the cache to protect the cache.

So, we don't need to escalate here - we simply rely on the entry lock
to protect the entries we are iterating over.

Signed-off-by: Jon Seymour <jon@wildducktheories.com>
2016-02-26 01:31:54 +11:00
Jason Wilder 452d77cbaf tsm: cache: introduce entry locks.
Based on @jwilder's alternative to the 'dirty' slice that featured
in previous iterations of this fix.

Suggested-by: Jason Wilder <jason@influxdb.com>
Signed-off-by: Jon Seymour <jon@wildducktheories.com>
2016-02-26 00:05:38 +11:00
Jon Seymour eb7eec078d tsm: cache: introduce commit lock to Cache
Currently two compactors can execute Engine.WriteSnapshot at once.

This isn't thread safe since both threads want to make modifications to
Cache.snapshot at the same time.

This commit introduces a lock which is acquired during Snapshot() and
released during ClearSnapshot(), ensuring that at most one thread
executes within Engine.WriteSnapshot() at once.

To ensure that we always release this lock, but only release the
snapshot resources on a successful commit, we modify ClearSnapshot() to
accept a boolean which indicates whether the write was successful or not
and guarantee to call this function if Snapshot() has been called.

Signed-off-by: Jon Seymour <jon@wildducktheories.com>
2016-02-25 12:10:37 +11:00
Jon Seymour 45d025db99 tsm: cache: add tests to demonstrate thread safety vulnerabilities
There are two tests that show two different vulnerabilities.

One test shows that Cache.Deduplicate modifies entries in a snapshot's
store without a lock while cache readers are deduplicating those same
entries while correctly locked.

A second test shows that two threads trying to execute the methods
that Engine.WriteSnapshot calls will cause concurrent, unsynchronized
mutating access to the snapshot's store and entries.

The tests fail at this commit and are fixed by subsequent commits.

Signed-off-by: Jon Seymour <jon@wildducktheories.com>
2016-02-25 12:10:31 +11:00
Jon Seymour d7d81f79da tsm: cache: add a test that demonstrates concurrent reads are safe
Signed-off-by: Jon Seymour <jon@wildducktheories.com>
2016-02-25 12:06:10 +11:00
Mark Rushakoff fb83374389 Track stats for number of series, measurements
Per database: track number of series and measurements
Per measurement: track number of series
2016-02-24 08:10:16 -08:00
Jon Seymour 530b86ba7d tsm: cache: restore the semantics of cachedBytes and memSize stats
Fixes #5805.

This commit undoes a regression introduced by #5789.

Signed-off-by: Jon Seymour <jon@wildducktheories.com>
2016-02-24 06:16:46 +11:00
Jon Seymour 3475356dc9 tsm: cache: fix semantics of snapshotCount statistic to make it useful.
Fix for #5804.

The commit for #5789 rendered the semantics of snapshotCount statistic
useless. This commit restores semantics that have diagnostic value to
this statistic.

Signed-off-by: Jon Seymour <jon@wildducktheories.com>
2016-02-24 06:13:54 +11:00
Jason Wilder 017c24c98e Simplify cache snapshotting
The Cache had support for taking multiple snapshots to support writing
multiple snapshots to TSM files concurrently if that happened to be
a bottleneck.  In practice, this is never a bottleneck and we only
run one snapshotting goroutine continuously per shard which has worked
well for all workloads.

The multiple snapshot support introduces some unhandled failure scenarios
where wal segments could be removed without writing them to TSM files.  If
a snapshot compaction fails to write due to transient disk errors, subsequent
snapshots will continue, but the failed one will not be retried.  When the
subsequent ones succeed, all closed wal segments are removed causing data
loss.

This change simplifies the snapshotting capability to ensure that there is only
ever one snapshot.  If one fails, the next snapshot will update the existing
snapshot and retry all of the old and new data.

Fixes #5686
2016-02-23 09:38:51 -07:00
Jonathan A. Sternberg 50753de032 Merge pull request #5782 from influxdata/js-5777-audit-panics-in-influxql
Remove the non-unreachable panics in the new query engine
2016-02-22 17:18:57 -05:00
Mark Rushakoff 191de2670c Fix non-compiling test 2016-02-22 13:49:11 -08:00
Mark Rushakoff fc5c8597ab Merge pull request #5758 from influxdata/mr-disk-stats
Track cache, WAL, filestore stats within tsm1 engine
2016-02-22 13:01:55 -08:00
Jason Wilder aa2e878019 Fix cache not deduplicating points in some cases
The cache had some incorrect logic for determining when a series needed
to be deduplicated.  The logic was checking for unsorted points and
not considering duplicate points.  This would manifest itself as many
(duplicate) points being returned from the cache and after a
snapshot compaction run, the points would disappear because snapshot
compaction always deduplicates and sorts the points.

Added a test that reproduces the issue.

Fixes #5719
2016-02-22 13:24:42 -07:00
Jonathan A. Sternberg 7a03df2af1 Remove the non-unreachable panics in the new query engine
The only panics left are ones that should be unreachable unless there is
a bug.

Fixes #5777.
2016-02-22 12:52:43 -05:00
Jon Seymour c93da21a61 tsm: cache: only use NewCache for the engine cache; snapshots use a simpler constructor
The intent of this change is to avoid writing caches created for
snapshot cache instances into the tsm1_cache measurement. We can do
this by avoiding use of the NewCache constructor. All other methods
are only intended to be called on the engine cache - never
on a snapshot.

Signed-off-by: Jon Seymour <jon@wildducktheories.com>
2016-02-22 15:17:43 +11:00
Jon Seymour 510ee2c790 tsm: cache: during writes, update the memSize statistic outside the lock
Since we are not locking but relying on atomic arithmetic,
use Add rather than Set. Will also result in slightly less garbage
being created.

Signed-off-by: Jon Seymour <jon@wildducktheories.com>
2016-02-22 08:26:35 +11:00
Jon Seymour 9c6efe99f1 tsm: cache: ensure all statistics are initialised on cache creation.
The intent of this change is to ensure that all statistic fields of the
resulting tsm1_cache measurement are initialized on initialization of
the cache. That way, any consumer of those measurements doesn't
have to deal with the null case.

Signed-off-by: Jon Seymour <jon@wildducktheories.com>
2016-02-21 15:33:50 +11:00
Jon Seymour 6697c721fb tsm: cache: add cache throughput related statistics.
Complementing and extending the changes in #5758.

Add 2 level statistics:

  * snapshotCount
  * cacheAgeMs

Add 2 counter statistics

  * cachedBytes
  * WALCompactionTimeMs

snapshotCount can be used to measure transient write errors that are causing snapshots to accumulate

cacheAgeMs can be used to gauge the level of write activity into the cache

The differences between cachedBytes stats sampled at different times can be used to calculate cache throughput rates

The ratio (cachedBytes-diskBytes)/WALCompactionTimeMs can be used to calculate WAL compaction throughput.

The ratio of difference between first and last WAL compaction time over the interval
length is an estimate of percentage of cache throughput consumed.

Signed-off-by: Jon Seymour <jon@wildducktheories.com>
2016-02-20 22:18:57 +11:00
Mark Rushakoff 602043e11b Add disk stats for FileStore 2016-02-19 16:37:34 -08:00
Mark Rushakoff d99c09cedd Add stats for current and old WAL segment sizes 2016-02-19 16:37:34 -08:00
Mark Rushakoff e76967efb6 Add stats to tsm1.Cache 2016-02-19 16:37:34 -08:00
Joe LeGasse dc8ed7953d Remove custom binary-conversion functions
Also cleaned up some excess allocations, and other cruft from the code
2016-02-18 13:56:35 -05:00
Ben Johnson f7e04abef7 remove NaN from query engine
This commit removes `math.NaN` returns from float iterators.
2016-02-17 14:11:31 -07:00
Jon Seymour ab702eb44a doc: remove the implication that the wal directory is inside the shard directory.
Signed-off-by: Jon Seymour <jon@wildducktheories.com>
2016-02-15 05:33:22 +11:00
Jon Seymour ed0a112f8e doc: Add an Errata section intended to capture clarifications prior to full revisions of the text.
Signed-off-by: Jon Seymour <jon@wildducktheories.com>
2016-02-15 00:29:02 +11:00
Jon Seymour 5e563d53c1 doc: revise discussion about cache design
The description of the cache design was out of date - reflecting an older
design based on checkpoints and evictions. This revision updates the
design to describe snapshots and also clarify that if compaction performance
falls behind the inbound write rate then writes will fail.

Updates based in part on clarifications provided by Jason Wilder. See https://goo.gl/L7AzVu

Signed-off-by: Jon Seymour <jon@wildducktheories.com>
2016-02-15 00:29:02 +11:00
Jon Seymour cdc7e28338 doc: rephrasing of how sets of SeriesIterators are generated.
Signed-off-by: Jon Seymour <jon@wildducktheories.com>
2016-02-15 00:29:02 +11:00
Jon Seymour 58d1b7223a doc: refine TSM file system layout description
Minor improvements to phrasing to use the English word 'directory' and slight improvements to grammar.
2016-02-15 00:29:02 +11:00
Jon Seymour 285e0ad17a doc: refine description of the conclusion of the compaction process.
Signed-off-by: Jon Seymour <jon@wildducktheories.com>
2016-02-15 00:29:02 +11:00
Jon Seymour 008af05f7b doc: various grammar/word-choice improvements in TSM design document
Signed-off-by: Jon Seymour <jon@wildducktheories.com>
2016-02-15 00:29:02 +11:00
Jon Seymour 88598f78dc doc: fix up some spelling errors/typos in .MD files
Signed-off-by: Jon Seymour <jon@wildducktheories.com>
2016-02-15 00:29:02 +11:00
Jason Wilder 0ce6dd1304 Fix panic: runtime error: index out of range
There was a fix in 5b1791, but it is not present in the current branch likely due to a rebase issue.
The current code panics with a query like:

select value from cpu group by host order by time desc limit 1

This fixes the panic as well as prevents #5193 from re-occurring.  The issue is that aggressively
closing the cursors clears out the seeks slice so re-seeking will fail.
2016-02-10 14:00:58 -07:00
Ben Johnson d9a6a7340f add canonical paths 2016-02-10 11:30:52 -07:00
Ben Johnson 5a0d1ab7c1 rename influxdb/influxdb to influxdata/influxdb
This commit changes all the import and URL references from:

    github.com/influxdb/influxdb

to:

    github.com/influxdata/influxdb
2016-02-10 10:26:18 -07:00
Jonathan A. Sternberg d1f7c445e7 Modify iterators to work across shards
Aux iterators now ask the iterator creator what series will be returned
and determine which aux fields to create based on the results.

The `tsdb.Shards` struct also creates a call iterator around the
iterators returned from each shard.
2016-02-10 09:40:29 -07:00
Jonathan A. Sternberg c2d1206177 Implement the fill iterator
Fill requires an additional function for IteratorCreator to retrieve the
series that will be returned from the iterator. When fill is required
for an aggregate, the IteratorCreator will be asked what series will be
returned by the created iterator.
2016-02-10 09:40:29 -07:00
Ben Johnson 6204350d65 fix math operations 2016-02-10 09:40:27 -07:00
Ben Johnson b4cb770a7f refactor aux iterators 2016-02-10 09:40:27 -07:00
Ben Johnson b8918a780c integer support 2016-02-10 09:40:25 -07:00
Jonathan A. Sternberg 583477064c Check for `tsdb.EOF` when looking for the lowest timestamp of aux fields 2016-02-10 09:40:25 -07:00
Jonathan A. Sternberg 34f14424dd Filter tags from the condition when building cursors on tsm1 2016-02-10 09:40:25 -07:00
Ben Johnson 00806de9b8 refactor query engine 2016-02-10 09:40:25 -07:00
Ben Johnson cde973f409 refactor query engine 2016-02-10 09:40:24 -07:00
Jason Wilder 2b3c640695 Fix reading too far in fileAccess.readBytes
Fixes #5566
2016-02-08 09:08:57 -07:00
Jason Wilder 28ae8b6fe0 Merge pull request #5434 from runner-mei/tsm_tombstone_windows
fix TSMReader.Delete() and all unit tests is pass in the windows
2016-02-04 16:27:26 -07:00
Jason Wilder b635e516e5 Merge pull request #5485 from runner-mei/patch-7
fix munmap bug in the windows
2016-02-04 13:47:51 -07:00
Jason Wilder 5a124e0e0b Merge pull request #5431 from runner-mei/patch-5
fix determine the file size
2016-02-04 10:24:05 -07:00
Edd Robinson 1bcb1d033f Allow Close to be called multiple times safely 2016-02-03 10:20:22 +00:00
INADA Naoki 80a637904d tsm1: Use unixnano instead of time.Time 2016-02-03 10:05:40 +09:00
INADA Naoki 771253256b FloatValue uses unixnano instead of time.Time 2016-02-03 09:57:00 +09:00
INADA Naoki 898babf616 add float bench 2016-02-03 03:12:16 +09:00
runner.mei 4ca47103b1 fix TSMReader.Delete() and all unit tests is pass in the windows 2016-01-31 11:32:08 +08:00
runner bc992fea5e fix munmap bug in the windows
2016-01-31 10:46:46 +08:00
runner 4b7fe70cd3 fix determine the file size
2016-01-30 14:16:53 +08:00
runner.mei 53f7e03f72 fix TSMReader.Delete() and all unit tests is pass in the windows 2016-01-30 14:15:46 +08:00
Jason Wilder 924275b337 Fix panic preventing wal file truncation
Fixes #5455
2016-01-28 21:50:51 -07:00
Jason Wilder 9528c3ea70 Merge pull request #5465 from influxdata/jw-remote-writes
Optimize remote writes
2016-01-27 15:47:02 -07:00
Jason Wilder 1d165d38a9 Optimize Cache entry.add
This reduces some of the lock contention when writing to the cache.
When a new entry is created, it avoids an allocation.  It also skips
a check to see if we need to sort if we already know it needs to be sorted.
2016-01-27 14:26:42 -07:00
Ben Johnson 98baf078d0 tsm1 query performance improvements 2016-01-27 13:42:32 -07:00
Jason Wilder 372302bcbd Reduce lock contention in Cache.WriteMulti
A write-lock was taken the whole time, but we only need the write
lock at the end.
2016-01-25 16:48:34 -07:00
Jason Wilder 5bee8880db Reduce lock contention in engine.WritePoints
Writing the snapshot would deduplicate the snapshot points
while still holding the engine write-lock.  This can be expensive
under high load and cause writes to back up and OOM the server.

Instead, grab the snapshot under the lock and dedup it after releasing
the lock.

Possible fix for #5442
2016-01-25 15:37:34 -07:00
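The pattern described in 5bee8880db, taking the snapshot under the lock but deduplicating after the lock is released, can be sketched as follows. The `engine` and `point` types and the last-write-wins rule in `dedup` are assumptions for illustration only; the real engine and cache are more involved.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// point is a hypothetical timestamped value.
type point struct {
	ts  int64
	val float64
}

// engine is a simplified stand-in holding an in-memory cache of points.
type engine struct {
	mu    sync.Mutex
	cache []point
}

// write appends a point while holding the lock.
func (e *engine) write(p point) {
	e.mu.Lock()
	e.cache = append(e.cache, p)
	e.mu.Unlock()
}

// snapshot swaps out the cache under the lock, then does the expensive
// sort and dedup on the detached slice after releasing the lock, so
// concurrent writers are blocked only for the swap.
func (e *engine) snapshot() []point {
	e.mu.Lock()
	snap := e.cache
	e.cache = nil
	e.mu.Unlock()

	return dedup(snap)
}

// dedup sorts by timestamp and keeps the last-written value for each
// timestamp (an assumed policy for this sketch).
func dedup(pts []point) []point {
	sort.SliceStable(pts, func(i, j int) bool { return pts[i].ts < pts[j].ts })
	out := pts[:0]
	for _, p := range pts {
		if n := len(out); n > 0 && out[n-1].ts == p.ts {
			out[n-1] = p // later write replaces the earlier one
			continue
		}
		out = append(out, p)
	}
	return out
}

func main() {
	e := &engine{}
	e.write(point{ts: 6, val: 1})
	e.write(point{ts: 1, val: 1})
	e.write(point{ts: 6, val: 2})
	fmt.Println(e.snapshot())
}
```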
Jason Wilder 24f1bcfd20 Remove Dev prefix from tsm engine/tx 2016-01-10 16:43:36 -07:00
Jason Wilder 5b179113fc Don't close tsm cursor prematurely
We were closing the cursor when we read the last block which caused
the internal state to be cleared.  In a group by query, we seeked multiple
times so depending on the group by interval and how the data was laid out
in the blocks, we would close the cursor and the last block would get skipped.

Fixes #5193
2016-01-10 15:26:01 -07:00
Jason Wilder 3c45015311 Remove MAP_POPULATE
This may be causing slow restart times for systems with many large TSM files.
What I believe is happening at startup in these cases is that multiple goroutines
are started to load each TSM file concurrently.  The kernel appears to serialize
mmap calls from the same process so all of the goroutines end up getting blocked
on the actual mmap system call.  MAP_POPULATE instructs the kernel to pre-fault the
page table for the files and triggers read-ahead of the pages.  For larger, 2GB files,
this makes the mmap call more expensive and slower.  When there are many of these files
and calls it is possible to fill all available memory with pagecache.  In this case,
the OS will end up pre-faulting pages from one file and have to remove pages that it just
loaded from another file causing slowness.  MAP_POPULATE may also cause much more data
to be pre-faulted than necessary.  To load a file, we just need to scan the index at the end
of the file.  MAP_POPULATE is likely causing the whole file to be loaded when it won't actually
be accessed for a while (or at all).

Might fix issue #5311.
2016-01-08 08:45:27 -07:00