Commit Graph

463 Commits (master)

Author SHA1 Message Date
Mark Rushakoff 41415cf2fb Update godoc for tsm1 package 2017-01-02 07:30:18 -08:00
Jason Wilder 637a67ea35 Reduce lock contention on measurementFields 2016-12-19 14:17:01 -07:00
Jonathan A. Sternberg ec57108520 Use proper uber-go/zap import path
It looks like the real import path to the project is go.uber.org/zap
instead of github.com/uber-go/zap since the example in the project
references that path.
2016-12-15 08:54:14 -06:00
Edd Robinson 05ec6ad9ad Add to index safely 2016-12-14 18:23:36 +00:00
Edd Robinson d78ca1a0f3 Fix some races 2016-12-14 18:23:36 +00:00
Edd Robinson 66edb32182 Sharded Cache using a hash ring 2016-12-14 18:23:36 +00:00
Jonathan A. Sternberg 21502a39e8 Switch logging to use structured logging everywhere
The logging library has been switched to use uber-go/zap. While the
logging has been changed to use structured logging, this commit does not
change any of the logging statements to take advantage of the new
structured log or new log levels. Those changes will come in future
commits.
2016-12-14 10:45:15 -06:00
Allen Petersen 31129ab0e9 Use slash separator for filenames in tar archives
NO-OP on platforms with unix path separator.
On Windows paths get converted to slashes before adding to archive and back to backslashes during restore.
2016-11-29 09:44:08 -08:00
Jason Wilder e8a28cfbab Expose Shard.LastModified
This returns the LastModified time of the shard.  The LastModified
time is the wall time when a change to the shards state occurred.
It uses the WAL or FileStore to determine the max mod time.
2016-11-23 10:04:07 -07:00
Jason Wilder 0ee58c208a Switch time.Sleep to time.Ticker
Avoids an allocation when calling time.Sleep
2016-11-15 16:13:55 -07:00
Jonathan A. Sternberg a515aeda39 Optimize first/last when no group by interval is present
The `first()` and `last()` functions response rate would increase linear
to the number of points even though it seems like it shouldn't. This
optimization greatly reduces the amount of time to return a response
when no `GROUP BY time(...)` clause is present in a query.
2016-10-25 09:57:31 -05:00
Jason Wilder 686d1a7ba4 Remove unused config options 2016-10-24 15:32:38 -06:00
Jonathan A. Sternberg 3681bc8a43 Filter out series within shards that do not have data for that series
Previously, we would return a full tag set for every shard and the tag
set would include all series that existed in the database index
including series that didn't physically exist within that shard. This
led to the tag sets returned being incredibly huge when we had high
cardinality but sparse data. Since the data was sparse, it was
unexpected that it would cause such a large strain on the system by most
people.

Now we filter out the series ids that are not assigned to the current
shard when computing a tag set for that shard. This lowers the memory
usage for high cardinality sparse data drastically and allows queries on
those to complete successfully.

This does not resolve issues for high cardinality data in every shard
that is also spread out over a long series of time. That situation isn't
nearly as common as the above situation though.
2016-10-20 14:15:34 -05:00
Jason Wilder b50d9558cf Merge pull request #7479 from influxdata/jw-clean-err
Skip cleanup if dir does not exist
2016-10-18 15:49:09 -06:00
Jason Wilder f30b00c24f Skip cleanup if dir does not exist 2016-10-18 15:33:39 -06:00
Mark Rushakoff 377c40f122 Add stats for active compactions
Unify logic around compaction execution to a single place.

Also report on the error stats that we track. Previously they were not
emitted in the stats output.
2016-10-18 14:12:21 -07:00
Joe LeGasse de9c743004 TSM: update comments for disabling level compactions 2016-10-18 14:14:59 -06:00
Joe LeGasse eda8f70372 TSM: Handle concurrent deletes for compaction 2016-10-18 14:14:59 -06:00
Jason Wilder ed7975874f Rename Enabled -> Enable 2016-10-18 12:22:00 -06:00
Jason Wilder f254b4f3ae Allow snapshot compactions during deletes
If a delete takes a long time to process while writes to the
shard are occuring, it was possible for the cache to fill up
and writes to be rejected.  This occurred because we disabled
all compactions while writing tombstone file to prevent deleted
data from re-appearing after a compaction completed.

Instead, we only disable the level compactions and allow snapshot
compactions to continue.  Snapshots already handle deleted data
with the cache and wal.

Fixes #7161
2016-10-18 12:14:51 -06:00
Jason Wilder 798fa0a9f8 Return error with unknown field type
This will just panic when trying to snapshot the value because EmptyValue
can't be written to TSM files.
2016-10-03 16:30:21 -06:00
Jason Wilder 125f106956 Pre-size the values map when write points 2016-10-03 16:30:21 -06:00
Joe LeGasse 743946fafb models: Add FieldIterator type
The FieldIterator is used to scan over the fields of a point, providing
information, and delaying parsing/decoding the value until it is needed.
This change uses this new type to avoid the allocation of a map for the
fields which is then thrown away as soon as the points get converted
into columns within the datastore.
2016-10-03 16:30:21 -06:00
Jason Wilder 190537a557 Fix DeleteSeries when multiple fields exists
The logic for determining whether a series key was already in the
the set of TSM series was too restrictive.  It allowed only the first
field of a series to be added leaving all the remaing fields.
2016-08-31 20:35:35 -06:00
Jonathan A. Sternberg dc2527ce86 Merge branch '1.0' 2016-08-31 14:45:57 -05:00
Jason Wilder d878d30d18 Fix shard write stats
* Rename *Fail to *Err for consistency with other metrics
* Use index Series count instead of sepaate counter
2016-08-29 09:46:11 -06:00
Edd Robinson 90ff713f21 Fix base64 encoding issue in stats
Fixes #7177.
2016-08-22 15:21:31 +01:00
Ben Johnson 8aa224b22d
reduce memory allocations in index
This commit changes the index to point to index data in the shards
instead of keeping it in-memory on the heap.
2016-08-16 14:09:00 -06:00
Jason Wilder c1a94e8861 Remove temp TSM files when disabling compactions
If they were left around, re-enabling them again could cause
future compactions to continuously fail.  A restart of the
server would clean them up correctly though.
2016-07-28 20:25:37 -06:00
Jason Wilder 5764a730d5 Prevent tombstoning series keys more than once
If there were multiple TSM files and a delete/drop was run,
we would write the delete series to the tombstone file N
times for each file.  This occurred because FileStore.WalkKeys walks
every key in every TSM file which can return duplicate keys.

This issue caused TSM files to be much larger than they should be
and also cause large memory usage during the delete.
2016-07-28 20:25:36 -06:00
Jason Wilder cab84ae279 Prevent concurrent compactions from stepping on each other
Normally, compactions do not conflict on the files they are compacting.
If the full cold threshold is set very low, it can cause conflicts where
two compactions compact the same files.  The full compaction was the
only place this could happen as it's planning is greedy.

To make this safer for concurrent execution, the compaction tracks which
files are current being compacted and prevents any new compactions from
starting if the file set overlaps.

Fixes #6595
2016-07-26 12:58:25 -06:00
Cory LaNou 968d322d6d finish tsm file exporter 2016-07-21 17:20:51 -05:00
Edd Robinson f37e726869 Add trace logging statements to tsdb 2016-07-21 11:14:29 +01:00
Edd Robinson 44231abcbd Add trace logger controlled via DataLoggingEnabled 2016-07-21 11:14:29 +01:00
Edd Robinson 83cc580ff8 Tidy up logging 2016-07-21 11:14:29 +01:00
Jason Wilder 46fdcba6e3 Remove compaction enabled logging
Too verbose
2016-07-17 23:53:12 -06:00
Jason Wilder 2fa28ba1d3 Don't log error when compactions are aborted 2016-07-17 23:53:12 -06:00
Jason Wilder b48d88ce9e Abort running compactions when series are deleted
If a delete is issued while a compaction is running, the a newly
deleted series could re-appear after the compaction completed. This
could occur the compaction had already written the blocks for series
that were just deleted.  When the compaction completes, the newly
written tombstone files would be deleted, essentially undeleting the
series.
2016-07-17 23:53:12 -06:00
Jason Wilder 0264966f5c Add index optimize planning step
For larger datasets, it's possible for shards to get into a state where
many large, dense TSM files exist.  While the shard is still hot for
writes, full compactions will skip these files since they are already
fairly optimized and full compactions are expensive.  If the write volume
is large enough, the shard can accumulate lots of these files.  When
a file is in this state, it's index can contain every series which
causes startup times to increase since each file must parse the full
set of series keys for every file.  If the number of series is high,
the index can be quite large causing large amount of disk IO at startup.

To fix this, a optmize compaction is run when a full compaction planning
step decides there is nothing to do.  The optimize compaction combines
and spreads the data and series keys across all files resulting in each
file containing the full series data for that shard and a subset of the
total set of keys in the shard.

This allows a shard to only store a series key once in the shard reducing
storage size as well allows a shard to only load each key once at startup.
2016-07-14 11:32:36 -06:00
Jonathan A. Sternberg 12a33fe0d3 Add stats and diagnostics to the TSM engine
Track the number of TSM files in the file store and keep engine
statistics related to the number of TSM compactions.
2016-07-07 19:35:55 -05:00
Jonathan A. Sternberg 837a9804cf Refactoring the monitor service to avoid expvar
Truncate the time interval output of the monitor service to be on even
time intervals rather than on every minute based on the start time. This
normalizes the output from the monitor service.
2016-07-07 11:13:58 -05:00
Jonathan A. Sternberg 497db2a6d3 Removing dead code from every package except influxql
The tsdb package had a substantial amount of dead code related to the
old query engine still in there. It is no longer used, so it was removed
since it was left unmaintained. There is likely still more code that is
the same, but wasn't found as part of this code cleanup.

influxql has dead code show up because of the code generation so it is
not included in this pruning.
2016-06-20 22:41:07 -05:00
Jonathan A. Sternberg 6e205ce135 Set the condition cursor instead of aux iterator when creating a nil condition cursor
A copy/paste error had nil cursors destined for a condition cursor get
set to the auxiliary cursor instead. When the number of conditions
exceeded the number of auxiliary fields, this would result in a stack
trace in some situations. When the number of conditions was less than or
equal to the number of auxiliary fields, it means that an auxiliary
cursor may have been overwritten with a nil cursor accidentally and a
leak might have happened since it was never closed.

Fixes #6859.
2016-06-17 14:54:48 -05:00
Jason Wilder ac6addd0b5 Ensure restore doesn't write broken files
Restore would try to open the shard if there was an error.  If there
was an error, the files written are very likely to be partially written
and they can cause the server to panic.

To prevent a shard from trying to open broken files, we now write to
a temp file and rename it to the actual name only after fully writing
and fsyncing the file.
2016-06-07 14:36:46 -06:00
Jason Wilder a74ea4cbf4 Allow creating shards in a disable state
For restoring a shard, we need to be able to have the shard open,
but disabled.  It was racy to open it and then disable it separately
since writes/queries could occur in between that time.
2016-06-01 16:17:18 -06:00
Jason Wilder 1ff8ecf4fb Add ability to disable shards
Disabling a shard causes all writes and queries to a shard to return
an error.  This also disables compactions for the shard.
2016-05-31 10:51:54 -06:00
Jason Wilder 11959005f4 Switch backup to use shard.Snapshot
This switch the backup shard call to use the shard Snapshot that
internally creates a snapshot by hardlinking all of the TSM and
tombstone files instead.  This reduces the time that the FileStore
is locked and will allow for larger shards to be backup more easily.
2016-05-27 09:30:25 -06:00
Edd Robinson 6a7f9527e3 Revert d2672a3 and 1e0a4e9 2016-05-27 10:34:14 +01:00
Edd Robinson d2672a3280 Update Go version 2016-05-26 15:26:09 +01:00
Edd Robinson 1e0a4e9119 Move fields under mutex 2016-05-26 12:00:46 +01:00
Jason Wilder 0b481ff627 Fix pathalogical TSM query case
This fixes a pathalogical query condition cause by and problematic
structuring of TSM files based on how points were written.  The
condition can occur when there are multiple TSM files and a large
number of points are written into the past.  The earlier existing
TSM files must also have points in the past and close to the present
causing their time range to eclipse the later files.

When this condition occurs, some queries can spend an excessive amount
of time merge all the overlapping blocks.

The fix was to constrain the window of overlapping blocks based on
the first one we ran into.  There was also a simple case in the Merge
where we could skip the binary search path and just append the two
inputs.
2016-05-25 09:14:17 -06:00
Jonathan A. Sternberg 5621ccc2ce Remove limit optimization when using an aggregate
The limit optimization was put into the wrong place and caused only part
of the shard to be read when a limit was used. The optimization is
possible, but requires a bit of refactoring to the code here so the call
iterator is created per series before handed to the limit iterator.

Fixes #6661.
2016-05-19 10:29:38 -04:00
Jason Wilder d32ad26d27 Fix data not getting reloaded
The optimization to speed up shard loading had the side effect of
skipping adding series to the index that already exist.  The skipping
was in the wrong location and also skipped the shards measurementFields
index which is required in order to query that series in the shard.
2016-05-18 15:25:56 -06:00
Edd Robinson f78e67d09c Fix concurrent map access panic 2016-05-18 17:56:50 +01:00
Jonathan A. Sternberg 23f6a706bb Support cast syntax for selecting a specific type
Casting syntax is done with the PostgreSQL syntax `field1::float` to
specify which type should be used when selecting a field. You can also
do `field1::field` or `tag1::tag` to specify that a field or tag should
be selected.

This makes it possible to select a tag when a field key and a tag key
conflict with each other in a measurement. It also means it's possible
to choose a field with a specific type if multiple shards disagree. If
no types are given, the same ordering for how a type is chosen is used
to determine which type to return.

The FieldDimensions method has been updated to return the data type for
the fields that get returned. The SeriesKeys function has also been
removed since it is no longer needed. SeriesKeys was originally used for
the fill iterator, but then expanded to be used by auxiliary iterators
for determining the channel iterator types. The fill iterator doesn't
need it anymore and the auxiliary types are better served by
FieldDimensions implementing that functionality, so SeriesKeys is no
longer needed.

Fixes #6519.
2016-05-16 12:08:29 -04:00
Jason Wilder 0dbd4893da Optimize shard index loading
On data sets with many series and potentially large series keys,
the cost of parsing the key and re-indexing can be high.

Loading the TSM keys into the index was being done repeatedly for
series that were already index by an earlier TSM file.  This was
wasted worked and slows down shard loading.

Parsing the key was also innefficient and allocated a new string
slice.  This was simplified to remove that allocation.
2016-05-12 14:02:42 -06:00
Ben Johnson 668bae57df
parallelize query planning
This commit changes the `tsm1.Engine` to create individual series
iterators in batches so that it can be parallelized. Iterators
are combined at the end so they can be redistributed to the
parallelized merge iterator.
2016-05-11 10:38:11 -06:00
Cory LaNou f415cf89ad wip 2016-05-10 11:01:03 -05:00
Cory LaNou 4d30ea1eb3 minor PR feedback refactor 2016-05-10 08:14:51 -05:00
Cory LaNou a3bf3e2ef1 added baseline backup/restore plumbing 2016-05-10 08:14:51 -05:00
Ben Johnson 078e561820
parallelize iterators 2016-05-09 10:25:30 -06:00
Ben Johnson fdf34d4356
move call iterator to series level
This commit moves the `CallIterator` to wrap the individual series
instead of wrapping a shard. This allows individual points to be
aggregated before being merged.

This will cause a small increase in memory usuage per series but
it shows a 20% decrease in query time when there are a moderate
number of points per series.
2016-05-05 09:59:03 -06:00
Jonathan A. Sternberg a2a5c32770 Merge pull request #6539 from influxdata/js-6495-fix-aggregates-with-empty-shards
Fix aggregate returns when data is missing from some shards
2016-05-03 10:56:21 -04:00
Jonathan A. Sternberg d6d0addcec Fix aggregate returns when data is missing from some shards
If a shard is empty for a specific field and the field type is something
other than a float, a nil iterator would get returned from one of the
empty shards and cause the combined iterators to be cast to the float
type and all other iterator types to be discarded (or for integers, to
be cast).

This is rare since most aggregates don't accept strings or booleans, but
for queries like:

    SELECT distinct(string) FROM mydata

It would result in nothing getting returned if one of the shards didn't
have a value for `string`.

This change modifies the query engine to return nil for the shards
instead of a fake iterator and then to only use the fake iterator if the
final aggregate iterator is nil (meaning that no iterators could be
constructed for the field from any shard).

Fixes #6495.
2016-05-03 10:41:22 -04:00
Jason Wilder e0304ae3d5 Fix shards not getting assigned to series on restart
Also, simplifies the LoadMetaDataIndex func to not require a *Shard
2016-05-02 11:36:05 -06:00
Jason Wilder bd1009080e Prevent writing empty tombstone files
If you delete from a measurement with a tag those does not match
any series, we would write a empty tombstone file and file to load
it back.
2016-05-02 11:36:04 -06:00
Jason Wilder 8082fc61ba Fix parsing keys when loading database index
The code for parsing a key our of the WAL or TSM files in the engine
was naive and didn't account for measurements with escape chars. This
uses the correct parsing code to parse and load them correctly.

Fixes #6496
2016-04-30 14:47:19 -06:00
Jason Wilder abcb559b09 Remove index meta data when series and measurements are gone
This remove the dropMeta param from the tsdb.Store.DeleteSeries and
lets the shard determine when to remove the meta data from the index
based on what series still have data in the shard.

This uncovered a nasty bug in compactions where a fully deleted series would
prematurely end the compactions and not carry forward the rest of the data
in the TSM file.  This is now fixed as well.
2016-04-29 16:31:57 -06:00
Jason Wilder aefd2ad08b Add DeleteSeries and DeleteSeriesRange 2016-04-27 13:09:53 -06:00
Jason Wilder 0de21ade40 Add delete range of values support to WAL and cache loader 2016-04-27 13:09:53 -06:00
Jason Wilder d13d01b516 Allow deleting series by time on a shard 2016-04-27 13:09:53 -06:00
Jason Wilder bfa225f149 Merge pull request #6430 from influxdata/jw-cache-load-size
Disable cache max memory size when reloading the cache
2016-04-20 14:35:23 -06:00
Stephen Gutekanst 9dc09c5257 Make logging output location more programmatically configurable (#6213)
This has various benefits:

- Users embedding InfluxDB within other Go programs can specify a different logger / prefix easily.
- More consistent with code used elsewhere in InfluxDB (e.g. services, other `run.Server.*` fields, etc).
- This is also more efficient, because it means `executeQuery` no longer allocates a single `*log.Logger` each time it is called.
2016-04-20 21:07:08 +01:00
Jason Wilder f679787080 Disable cache max memory size when reloading the cache
The cache max memory size is an approximate size and can prevent a
shard from loading at startup.  This change disable the max size
at startup to prevent this problem and sets the limt back after
reloading.

Fixes #6109
2016-04-20 10:41:30 -06:00
Jonathan A. Sternberg 93745d9693 Merge pull request #6391 from influxdata/js-5553-limit-queries-slow-with-group-by
Propagate the limit option to the low level iterators
2016-04-16 09:39:25 -04:00
Jonathan A. Sternberg bd5fdd797d Propagate the limit option to the low level iterators
When a GROUP BY or multiple sources are used, the top level limit
iterator requires reading the entire iterator stream so it can find all
of the tag groups it needs to return. For large data series, this ends
up with the limit iterator discarding a lot of output.

This change adds a new lower level limit iterator on each series itself
so that there are fewer data points that have to be thrown away by the
top level iterator.

Fixes #5553.
2016-04-15 18:23:54 -04:00
Jonathan A. Sternberg 835d08591e Do not filter out empty tags from series keys 2016-04-13 09:15:57 -04:00
Jonathan A. Sternberg ea6262b712 Enhance comparing tags and fields in the where clause
Now it is possible to compare tags and fields and it is also now
possible to compare tags and tags. Previously, it was only possible to
compare fields with fields and tags with a string or a regex.

Fixes #3371.
2016-04-11 18:10:08 -04:00
Jonathan A. Sternberg 028fdaff81 Merge pull request #6222 from influxdata/js-6206-descending-tsm1-iterators
Handle nil values from the tsm1 cursor correctly
2016-04-06 10:05:20 -04:00
Jonathan A. Sternberg 94ec92d669 Handle nil values from the tsm1 cursor correctly
Send nil values from the tsm1 cursor at the end of the cursor. After the
cursor reached tsm1, the `nextAt()` call would always return the default
value rather than a nil value.

Descending also didn't work correctly because the seeking functionality
for tsm1 iterators would always act like they were ascending instead of
descending when choosing which value to select. This resulted in very
strange output from the emitter since it couldn't figure out if it was
ascending or descending.

Fixes #6206.
2016-04-06 09:27:02 -04:00
Jason Wilder 3f4c5a5585 Fix race on measurementFields
Both Shard and Engine had the same reference to the measurementField map,
but they each protected it with their own locks.  This causes a race when
write and queries are occurring because writes can add new fields to the
map while queries are reading from it.

The fix moves the ownership to the Engine and provides protected accessors
to that Shard now users.  For the most parts, the access on shard were old
dead code.

Fixing the measurementFields map race created a new race on the internal
fields map.  This is now unexported and protected via MeasurementFields
exported funcs.

Fixes #6188
2016-04-01 18:57:01 -06:00
Jason Wilder 1b08e2dd55 Use walk func to load all tsm keys to index
Avoids allocating a big map or all keys.
2016-03-29 12:59:26 -06:00
Jason Wilder 03ced4cc90 Load shards concurrently 2016-03-29 12:58:52 -06:00
Jonathan A. Sternberg a35d9602cd Fix where filters when a OR is used and when a tag does not exist
If an OR was used, merging filters between different expressions would
not work correctly. If one of the sides had a set of series ids with a
condition and the other side had no series ids associated with the
expression, all of the series from the side with a condition would have
the condition ignored. Instead of defaulting a non-existant series
filter to true, it should just be false and the evaluation of the one
side that does exist should take care of determining if the series id
should be included or not. The AND condition used false correctly so did
not have to be changed.

If a tag did not exist and `!=` or `!~` were used, it would return false
even though the neither a field or a tag equaled those values. This has
now been modified to correctly return the correct series ids and the
correct condition.

Also fixed a panic that would occur when a tag caused a field access to
become unnecessary. The filter using the field access still got created
and used even though it was unnecessary, resulting in an attempted
access to a non-initialized map.

Fixes #5152 and a bunch of other miscellaneous issues.
2016-03-22 12:19:06 -04:00
Jonathan A. Sternberg 6655ca7769 Create a new interrupt iterator that will stop emitting points after an interrupt
Use of the iterator is spread out into both `IteratorCreators` and
inside of the iterators themselves. Part of the interrupt must be
handled inside of the engine so it stops trying to emit points when an
interrupt is found and another part of the interrupt has to happen when
combining the iterators so it doesn't just start reading the next shard.
2016-03-21 12:07:07 -04:00
Jason Wilder 000459e350 Fix deadlock when running backup
A deadlock occurs under write load if a backup is run in between the
time when a snapshot compactions has snapshotted the cache and successfully
written it to disk.  The issus is that the second snapshot call will block
on the commit lock while it is holding the engine write lock.  This causes
all writes to block as well as prevents the currently runnign snapshot
compaction from completing because it needs to acquire a read-lock.

This PR removes the commit lock and just returns an error if a snapshot is
in progress to all any locks being held to be released.  The caller can determine
whether to retry or giveup.
2016-03-14 12:36:48 -06:00
Jason Wilder 992c78ee22 Remove period shard maintenance goroutine
This is no longer used in tsm and just peridocially locks everything
for no reason now.
2016-03-09 17:31:02 -07:00
Edd Robinson 58c03448aa Merge pull request #5514 from influxdata/er-engine-panic
Ensure shards and engine are safely closed
2016-03-09 18:56:36 +00:00
Jason Wilder 8d70d65a82 Convert time.Time to int64 2016-02-25 15:15:01 -07:00
Jon Seymour eb7eec078d tsm: cache: introduce commit lock to Cache
Currently two compactors can execute Engine.WriteSnapshot at once.

This isn't thread safe since both threads want to make modifications to
Cache.snapshot at the same time.

This commit introduces a lock which is acquired during Snapshot() and
released during ClearSnapshot(), ensuring that at most one thread
executes within Engine.WriteSnapshot() at once.

To ensure that we always release this lock, but only release the
snapshot resources on a successful commit, we modify ClearSnapshot() to
accept a boolean which indicates whether the write was successful or not
and guarantee to call this function if Snapshot() has been called.

Signed-off-by: Jon Seymour <jon@wildducktheories.com>
2016-02-25 12:10:37 +11:00
Jason Wilder 017c24c98e Simplify cache snapshotting
The Cache had support for taking multiple snapshots to support writing
multiple snapshots to TSM files concurrently if that happened to be
a bottleneck.  In practice, this is never a bottleneck and we only
run one snappshoting goroutine continously per shard which has worked
well for all workloads.

The multiple snapshot support introduces some unhandled failure scenarios
where wal segments could be removed without writing them to TSM files.  If
a snapshot compaction fails to write due to transient disk errors, subsequent
snapshots will continue, but the failed one will not be retried.  When the
subsequent ones succeeded, all closed wal segments are removed causing data
loss.

This change simplifies the snapshotting capability to ensure that there is only
ever one snapshot.  If one fails, the next snapshot will update the existing
snapshot and retry all of old and new data.

Fixes #5686
2016-02-23 09:38:51 -07:00
Jonathan A. Sternberg 50753de032 Merge pull request #5782 from influxdata/js-5777-audit-panics-in-influxql
Remove the non-unreachable panics in the new query engine
2016-02-22 17:18:57 -05:00
Jonathan A. Sternberg 7a03df2af1 Remove the non-unreachable panics in the new query engine
The only panics left are ones that should be unreachable unless there is
a bug.

Fixes #5777.
2016-02-22 12:52:43 -05:00
Jon Seymour 6697c721fb tsm: cache: add cache throughput related statistics.
Complementing and extending the changes in #5758.

Add 2 level statistics:

  * snapshotCount
  * cacheAgeMs

Add 2 counter statistics

  * cachedBytes
  * WALCompactionTimeMs

snapshotCount can be used to measure transient write errors that are causing snapshots to accumulate

cacheAgeMs can be used to guage the level of write activity into the cache

The differences between cachedBytes stats sampled at different times can be used to calculate cache throughput rates

The ratio (cachedBytes-diskBytes)/WALCompactionTimeMs can be used calculate WAL compaction throughput.

The ratio of difference between first and last WAL compaction time over the interval
length is an estimate of percentage of cache throughput consumed.

Signed-off-by: Jon Seymour <jon@wildducktheories.com>
2016-02-20 22:18:57 +11:00
Mark Rushakoff e76967efb6 Add stats to tsm1.Cache 2016-02-19 16:37:34 -08:00
Ben Johnson d9a6a7340f add canonical paths 2016-02-10 11:30:52 -07:00
Ben Johnson 5a0d1ab7c1 rename influxdb/influxdb to influxdata/influxdb
This commit changes all the import and URL references from:

    github.com/influxdb/influxdb

to:

    github.com/influxdata/influxdb
2016-02-10 10:26:18 -07:00
Jonathan A. Sternberg d1f7c445e7 Modify iterators to work across shards
Aux iterators now ask the iterator creator what series will be returned
and determine which aux fields to create based on the results.

The `tsdb.Shards` struct also creates a call iterator around the
iterators returned from each shard.
2016-02-10 09:40:29 -07:00
Jonathan A. Sternberg c2d1206177 Implement the fill iterator
Fill requires an additional function for IteratorCreator to retrieve the
series that will be returned from the iterator. When fill is required
for an aggregate, the IteratorCreator will be asked what series will be
returned by the created iterator.
2016-02-10 09:40:29 -07:00
Ben Johnson 6204350d65 fix math operations 2016-02-10 09:40:27 -07:00
Ben Johnson b8918a780c integer support 2016-02-10 09:40:25 -07:00
Jonathan A. Sternberg 34f14424dd Filter tags from the condition when building cursors on tsm1 2016-02-10 09:40:25 -07:00
Ben Johnson 00806de9b8 refactor query engine 2016-02-10 09:40:25 -07:00
Edd Robinson 1bcb1d033f Allow Close to be called multiple times safely 2016-02-03 10:20:22 +00:00
Jason Wilder 5bee8880db Reduce lock content in engine.WritePoints
Writing the snapshot would deduplicate the snapshot points
while still holding the engine write-lock.  This can be expensive
under high load and cause writes to back up and OOM the server.

Instead, grab the snapshot under the lock and dedup it after releasing
the lock.

Possible fix for #5442
2016-01-25 15:37:34 -07:00
Jason Wilder 24f1bcfd20 Remove Dev prefix from tsm engine/tx 2016-01-10 16:43:36 -07:00
Jason Wilder 5b179113fc Don't close tsm cursor prematurely
We were closing the cursor when we read the last block which caused
the internal state to be cleared.  In a group by query, we seeked multiple
times so depending on the group by interval and how the data was laid out
in the blocks, we woudl close the cursor and the last block would get skipped.

Fixes #5193
2016-01-10 15:26:01 -07:00
Jason Wilder d2b7c03175 Re-use the series key
Avoid allocating the string twice.
2016-01-06 12:52:13 -07:00
Paul Dix 26e1c6464a Update backup to address PR comments 2015-12-30 18:06:51 -05:00
Paul Dix 59fbd371fc Implement backup/restore for TSM.
This changes backup and restore to work for TSM. It breaks it for b1 and bz1, but since those are getting removed it's ok.

The backup runs against any host that is specified and can backup either the metasstore, a database, specific retention policy, or a specific shard. It can also take incremental backups with the `since` flag, which will only backup TSM files that have been created since that timestamp.

The backup is safe to run online. However, for shards that are still hot for writes, they won't be able to create new TSM files while the backup for that single shard runs. If the backup isn't too large and the write throughput isn't too high this shouldn't be a problem since the writes will just go into the WAL cache.
2015-12-30 18:06:50 -05:00
Jason Wilder a38c95ec85 Update compactions to run concurrently
This has a few changes in it (unfortuantely).  The main change is to run compactions
concurrently.  While implementing this, a few query and performance bugs showed up that
are also fixed by this commit.
2015-12-23 18:01:11 -07:00
Jason Wilder bb2562b2ab Return CompactionGroups from planning 2015-12-23 18:01:11 -07:00
Jason Wilder 8c7e11f4cf Aggressively clean up KeyCursor resources 2015-12-17 12:51:51 -07:00
Jason Wilder 3893bc60e1 Speed up TSM compactor
Just keep the current block for each iterator in the buffers.
2015-12-16 11:16:17 -07:00
Alexandre Viau ad1044dde9 typo: unkown -> unknown 2015-12-15 18:10:47 -05:00
Philip O'Toole 01ac0b3f23 Tweak compaction log messages 2015-12-15 10:33:13 -08:00
Jason Wilder d7cff651d1 Cancel writing TSM files when engine closes
If the engine is closed while a compaction is going on, the close call
blocks until the goroutine exits.  This could be several minutes because
the control does not return back up to the channel selector while there is
still data to write.
2015-12-08 15:41:53 -07:00
Jason Wilder 9d82e24ca0 Fix performance of dropping large number of keys 2015-12-08 10:47:06 -07:00
Jason Wilder f245b44afa Set full compaction duration option on planner
Was set on engine and not planner so it was always 0.
2015-12-08 09:56:36 -07:00
Paul Dix 8096c6b845 Update TSM, address PR #5011 comments
* Moved TSM file extension to a constant
* Fixed typos
* Changed group.size() back to being a uint64 since it can have multiple files up to 4GB each.
2015-12-07 14:47:17 -05:00
Paul Dix 820b0d31d6 Update TSM to delete from the WAL/cache
* Update cache loader to delete entries from cache
* Add cache.Delete()
* Update delete to look at keys in the Cache in addition to the FileStore
* Update cache compaction to never happen if the cache is empty
2015-12-07 14:35:48 -05:00
Paul Dix 937233d988 Update TSM compaction planning logic
* Update Plan to do a full compaction if cold for writes
* Remove MaxFileSize as a config variable from Compactor. Should be a set constant
* Update Plan to keep track of if the last check was fully compacted so we can skip future planning calls
* Update compact min file count to 3 so that compactions run more frequently
2015-12-07 08:26:30 -05:00
Paul Dix 1bee7d1512 Update TSM, remove old version, add config
* remove rolloverTSMFileSize constant that is no longer used
* remove the maxGenerationFileCount since it is no longer a limitation that's necessary with the new compaction scheme. We no longer read WAL segments as part of the compaction so memory is only used as we read in each individual key
* remove minFileCount and switch to a user configurable variable
* remove the mutex from WALSegmentWriter. There's never more than one open in the WAL at one time and it's not exported through any function so the lock on the WAL should be used. This simplified keeping track of the last write time and removed a bunch of unnecessary locks.
* update WALSegmentWriter.Write to take the compressed bytes so that encoding and compression can occur before the call to write (while we don't hold the WAL lock)
* remove a bunch of unnecessary locking in WAL.writeToLog
* Add check for TSM file magic number and vesion
* Remove old tsm, log, and unused cursor code
* Remove references to tsm1dev everywhere except in the inspector
* Clean up config options for compaction and snapshotting
* Remove old TSM configuration options
* Update the config.sample.toml with TSM options
* Update WAL compact to force if it has been cold for writes for a configurable period of time (1h by default)
2015-12-06 18:50:39 -05:00
Philip O'Toole 6e88547a5e Support shutting down engine goroutines
This was causing races in the code, when the cache was being reloaded,
because back-to-back open-and-closing of the engine during testing left
goroutines running. With this change the engine is completely shutdown
when Close() is called on it.
2015-12-06 09:16:38 -08:00
Philip O'Toole 0d0b919144 Integrate CacheLoader with tsm2 engine 2015-12-05 22:13:57 -08:00
Paul Dix 40e606cb14 Merge pull request #5003 from influxdb/jw-compaction
Update compaction planning
2015-12-05 16:49:54 -05:00
Jason Wilder 6592615958 Updated compaction strategy
This changes compacting files to merge sequences of files in lower generations
up to later generations
2015-12-04 23:30:39 -07:00
Paul Dix 9637446ba9 Merge pull request #4990 from influxdb/pd-loadmetadata-wal
Update TSM engine, WAL and encoding
2015-12-04 18:21:47 -05:00
Paul Dix 33506e4d3e Update TSM cache and engine LoadMetadataIndex 2015-12-04 16:40:01 -05:00
Paul Dix b0f3dcc8cc Update TSM metadata loading and write snapshot
* Update WriteSnapshot to always call synchronously
* Update LoadMetadataIndex to load WAL metadata from the cache
2015-12-04 16:03:17 -05:00
Jason Wilder c7e37766e7 Avoid repetitive index searches when iterating over cursors
First pass at TSM cursor iteration ended up searching the file indexes
too frequently and hurt performance.  This changes that to search it once
and then have the cursor hold onto the block locations to seek
to.  Doubles the query performance from the first iteration, but still a lot
of room for improvement.
2015-12-04 10:02:59 -07:00
Jason Wilder 4b7cc6720a Merge pull request #4983 from influxdb/jw-tsm-deletes2
Implement delete series/measurement
2015-12-04 10:02:11 -07:00
Jason Wilder c54a3da0ca Implement delete series/measurement 2015-12-04 09:10:26 -07:00
Paul Dix eafb703afc Update TSM engine, WAL and encoding
* Add InfluxQLType to Values to map the TSM type to InfluxQL
* Fix bug in WAL where close wouldn't nil out the currentSegment after closing it
* Export writeSnapshot to be used in tests, add argument to run it async or not
* Update reloadCache to load temporary metadata information in the engine
* Update LoadMetadataIndex to use the temp WAL metadata information
2015-12-04 11:09:39 -05:00
Paul Dix b7bae53405 Merge pull request #4980 from influxdb/cursor_desc
Fix descending cache cursor
2015-12-04 07:02:13 -05:00
Philip O'Toole 2d79d7e35f Fix descending cache cursor 2015-12-03 14:34:29 -08:00
Jason Wilder 66c9ef862e Fix regressions
Something broke with writing to the WAL now that compactions are running
concurrently.  There was also a performance problem with Next/Prev doing
twice as many searches as necessary.
2015-12-03 14:25:03 -07:00
Jason Wilder adf5c5b223 Replace Next/Prev with Scan 2015-12-03 12:39:13 -07:00
Jason Wilder 193a36eeb6 Fix code review comments 2015-12-03 12:39:13 -07:00
Jason Wilder 2ad32af7ea Add desc quey support 2015-12-03 12:39:13 -07:00
Jason Wilder be59ba3455 Add Prev support to FileStore
Allows read the previous block of values given a timestamp and key.
2015-12-03 12:39:12 -07:00
Jason Wilder e9832d7414 Add multi-field cursor support to devtsm1 engine 2015-12-03 12:37:47 -07:00
Jason Wilder 6fba01df89 Implement single field TSM queries 2015-12-03 12:35:36 -07:00
Paul Dix be4891c40b Update TSM write snapshot, Compactor
* Ensure that writing snapshots in engine are goroutine safe
* Add Clone method to Compactor
2015-12-03 11:49:47 -05:00
Paul Dix 6722e9ff14 Update TSM engine, engine_test, and wal_test
* Address jwilder's comments in #4966
2015-12-03 10:49:47 -05:00
Paul Dix b0fb8a0a27 Update TSM cache, compact, wal, encoding
* Update cache to have a single slice of values for a key (removed checkpoints)
* Changed compact.Plan to only worry about TSM files.
* Updated Plan to not return an error since there was no case in which it would.
* Update WAL to not keep stats since they're no longer needed.
* Update engine to flush the Cache/WAL to a new TSM file when the min threshold is hit.
* Split compact logic between TSM compacts and WAL/Cache writes.
* Remove unnecessary merge iterator, wal segment iterator, and other no longer necessary stuff.
* Remove the asending bool from the Dedupe method. Values should always be in ascending order. It's up to the cursor to iterate through values based on the direction. Giving the cursor responsibility makes it so we don't need to sort, dedupe or reallocate anything for different query orders.
* Updated engine to use its locks to ensure writes and cache flushes don't cause a race.
* Update all tests with new signatures. Removed a bunch of tests around TSM rewrites and WAL segment iteration that are no longer necessary.
2015-12-03 08:11:50 -05:00
Jason Wilder 83ccaaa656 Reload cache at startup 2015-12-02 14:16:36 -07:00
Jason Wilder ba99dece0c Wire up tsm1dev engine cursor 2015-12-02 14:01:10 -07:00
Jason Wilder 3a8a19a99d Implement LoadMetaDataIndex for tsm1dev engine 2015-12-02 13:38:06 -07:00
Jason Wilder 3014d7e391 Return errors for func not implemented in tsm1dev engine 2015-12-02 11:06:01 -07:00
Jason Wilder a7e21c2975 Don't set a cache memory limit by default
100mb is easy it hit even with basic stress test config.  Don't set
a limit by default so that an operator can size it appropriately based
on their hardware.
2015-12-02 11:01:13 -07:00
Jason Wilder 708266da69 Cache related compaction fixes 2015-12-02 09:45:24 -07:00
Jason Wilder 7e249e0555 Use CacheKeyIterator instead of WALKeyIterator during compactions 2015-12-02 09:45:24 -07:00
Jason Wilder 4a03469662 Integrate TSM compaction into dev engine 2015-12-02 09:45:23 -07:00
Jason Wilder d4b1c25f8e Add CompactionPlanner type
CompactionPlanner is used to determine which files (WAL Segments, TSM
Files) to include in a given compaction run.
2015-12-02 09:45:23 -07:00
Philip O'Toole fc83968e2e Cache values supports sorting order 2015-12-01 13:24:25 -08:00
Philip O'Toole 3a72e40e3f Implement descending cursor support 2015-12-01 13:24:25 -08:00
Philip O'Toole 59674fda21 Integrate cache query with tsm1dev engine 2015-12-01 13:24:25 -08:00
Philip O'Toole 8649ce4c49 Integrate cache with tsm1dev write path 2015-11-26 06:07:19 -08:00
Philip O'Toole 8e7dc3bef9 WAL returns current segment ID on write and delete 2015-11-25 12:23:10 -08:00
Jason Wilder 34bffd5e18 Code review fixes 2015-11-24 21:24:13 -07:00
Jason Wilder 25206c729c Add compactor type 2015-11-24 08:50:07 -07:00
Jason Wilder 697cfe604b Add stubbed out dev tsm engine
Starting to integrate some of the components into a engine that is
usable for development purposes.  This allows the code to evolve while
keeping the existing TSM engine in tact for reference.

Currently, just the WAL is wired up so writes can be tested.  Other engine
functions will panic the server if called.
2015-11-23 13:55:34 -07:00