163 Commits (1.11)

Author SHA1 Message Date
Edd Robinson 2d9bd09784 Use []byte where possible in Index 2017-01-05 09:57:34 -07:00
Edd Robinson 4b1ef68dc9 Move series and measurement stats to store 2017-01-05 09:54:05 -07:00
Edd Robinson aaf85ae38d Tombstoning with series cardinality part 1 2017-01-05 09:54:04 -07:00
Edd Robinson bd8dd9a291 Sketches working 2017-01-05 09:54:04 -07:00
Edd Robinson d19fbf5ab4 Wire in HLL estimator 2017-01-05 09:54:03 -07:00
Edd Robinson 2b8efefef4 Initial index interface 2017-01-05 09:51:43 -07:00
Edd Robinson c535e3899a Remove in-memory index from Shard and Store 2017-01-05 09:47:09 -07:00
Mark Rushakoff 07b87f2630 Miscellaneous lint cleanup 2017-01-03 09:47:32 -08:00
Mark Rushakoff 4a774eb600 Update godoc for the tsdb package 2016-12-30 21:12:37 -08:00
gunnaraasen 78b1a0e771 Add stats on dropped measurements and series; Fixes #7697 2016-12-13 15:17:31 -08:00
Jason Wilder bf17074f58 Avoid allocation when counting tag keys
A new sorted slice was created by the monitor func every 10s.  The
tag keys don't need to be sorted, so this avoids the allocation of the
slice and one during sorting.
2016-11-15 16:13:55 -07:00
Jonathan A. Sternberg 3681bc8a43 Filter out series within shards that do not have data for that series
Previously, we would return a full tag set for every shard and the tag
set would include all series that existed in the database index
including series that didn't physically exist within that shard. This
led to the tag sets returned being incredibly huge when we had high
cardinality but sparse data. Since the data was sparse, most people did
not expect it to cause such a large strain on the system.

Now we filter out the series ids that are not assigned to the current
shard when computing a tag set for that shard. This lowers the memory
usage for high cardinality sparse data drastically and allows queries on
those to complete successfully.

This does not resolve issues for high cardinality data in every shard
that is also spread out over a long series of time. That situation isn't
nearly as common as the above situation though.
2016-10-20 14:15:34 -05:00
Jason Wilder 2e473e9518 Fix panic in AppendSeriesKeyByID
Calling this function with a series ID that does not exist in
the measurement causes a panic.

Fixes #7334
2016-10-19 11:07:19 -06:00
Jonathan A. Sternberg 41e4e73d4e Reduce map allocations when computing the TagSets of a measurement
Instead of assigning a boolean value of true to the filter expressions
when there was no meaningful expression, this drops a boolean expression
of true from the filter expressions so we don't have to perform a map
assignment. This allows us to reduce allocations and assignments when a
`WHERE` clause only contains tag comparisons and no field comparisons.
2016-10-17 12:13:19 -05:00
Jason Wilder a5f871d62c Rework monitoring to avoid allocations 2016-10-10 11:42:15 -06:00
Jason Wilder 8fce6bba48 Add tag value cardinality limit 2016-10-10 11:42:15 -06:00
Jason Wilder 68dd312bb1 Reduce allocations when calculating tagsets
The TagSets function was creating a lot of intermediate maps and
slices to calculate the sorted tag sets.  It first created a map
to group tag sets with their series, then created an equally
sized slice of the tag keys and sorted them.  Finally, it created
a new slice and added the tag sets in the original map by the ordering
of the sorted keys.  It was also recreating the tags map multiple times,
creating extra garbage in the loop.

This simplifies the code to create one map for grouping and then add
the distinct sets to a slice which is then sorted.  It also fixes the
multiple tag maps getting created.
2016-09-29 16:02:29 -06:00
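
The single-map grouping described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the actual TagSets code: the `tagSet` type, the `groupSeries` function, and the precomputed grouping key are hypothetical names introduced here.

```go
package main

import (
	"fmt"
	"sort"
)

// tagSet is a hypothetical simplification: a grouping key plus the series
// keys that belong to that group.
type tagSet struct {
	key        string
	seriesKeys []string
}

// groupSeries groups series by a precomputed grouping key using one map,
// then copies the distinct groups into a single slice and sorts that slice.
func groupSeries(groupKeyBySeries map[string]string) []*tagSet {
	groups := make(map[string]*tagSet)
	for series, gk := range groupKeyBySeries {
		ts, ok := groups[gk]
		if !ok {
			ts = &tagSet{key: gk}
			groups[gk] = ts
		}
		ts.seriesKeys = append(ts.seriesKeys, series)
	}

	// Capacity is known up front, so appending never reallocates.
	sets := make([]*tagSet, 0, len(groups))
	for _, ts := range groups {
		sort.Strings(ts.seriesKeys)
		sets = append(sets, ts)
	}
	sort.Slice(sets, func(a, b int) bool { return sets[a].key < sets[b].key })
	return sets
}

func main() {
	sets := groupSeries(map[string]string{
		"cpu,host=a,region=east": "region=east",
		"cpu,host=b,region=east": "region=east",
		"cpu,host=c,region=west": "region=west",
	})
	for _, ts := range sets {
		fmt.Println(ts.key, ts.seriesKeys)
	}
}
```

Sorting the final slice once replaces the separate sorted-keys slice and the second copy that the commit message says were removed.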
Jason Wilder 6671ef00f0 Reduce allocations in idsForExpr 2016-09-26 08:36:59 -06:00
Edd Robinson ed41122ade Pre-allocate map for performance 2016-09-15 18:28:46 +01:00
Jonathan A. Sternberg dc2527ce86 Merge branch '1.0' 2016-08-31 14:45:57 -05:00
Jonathan A. Sternberg 964341eb20 Optimize queries that compare a tag value to an empty string
The behavior for querying tag values with an empty string was originally
fixed in #6283, but it also added a performance problem when the
cardinality of the tag was high. Since a call to `Union()` or `Reject()`
would happen for every series key and it would be called N times for N
cardinality, the comparisons against a blank string were unnecessarily
slow with large memory allocations.

This optimizes these queries so it doesn't use those methods anymore.
Those methods are still useful and used when combining AND and OR
clauses, but they aren't useful when finding the series ids for a single
clause. These methods were unnecessary because the series ids for
the tags were already unique and didn't have to be merged as a set.
2016-08-31 14:03:23 -05:00
Ben Johnson a30f9b6c70 Merge pull request #7196 from benbjohnson/mmap-fix
Fix mmap dereferencing
2016-08-24 10:48:28 -06:00
Ben Johnson cc628a1097 Fix mmap dereferencing
Adds a missing dereference call to `Close()` as well as fixes
a tag copy issue.
2016-08-24 10:48:07 -06:00
Edd Robinson 6cafdbc604 Ensure we don't mutate provided statistics tags 2016-08-24 11:40:13 +01:00
Edd Robinson 90ff713f21 Fix base64 encoding issue in stats
Fixes #7177.
2016-08-22 15:21:31 +01:00
Ben Johnson 65536676a4 Merge pull request #7138 from benbjohnson/optimize-shard-open
Reduce memory allocations in index
2016-08-17 15:27:33 -06:00
Ben Johnson 8aa224b22d reduce memory allocations in index
This commit changes the index to point to index data in the shards
instead of keeping it in-memory on the heap.
2016-08-16 14:09:00 -06:00
Jonathan A. Sternberg 6b5b24a3e3 Decrement number of measurements only once when deleting the last series from a measurement 2016-08-15 13:57:08 -05:00
Mark Rushakoff f34a7430e3 Fix length of (*DatabaseIndex).SeriesKeys()
Previously, it would return as many empty strings in the first half of
the slice as valid values at the end of the slice.
2016-07-27 16:07:39 -07:00
Jason Wilder c31f0c25b4 Fix duplicate series getting created
There was a race where the same series would get added to the in-memory
index for a measurement more than once.  This would result in the same
series being returned more than once during queries causing duplicate
results.  The issue was that we check for the series under the read
lock, but did not check again under the write lock where there was
a small window where the series could be added by another goroutine.

We now check for the series under the write lock.

Fixes #6946
2016-07-18 16:46:36 -06:00
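
The race and its fix described in the commit above follow a common check-then-recheck locking pattern. A minimal sketch, assuming a hypothetical `seriesIndex` type (not the actual tsdb structures) guarded by a `sync.RWMutex`:

```go
package main

import (
	"fmt"
	"sync"
)

// seriesIndex is a hypothetical, simplified stand-in for a measurement's
// in-memory series index.
type seriesIndex struct {
	mu     sync.RWMutex
	byKey  map[string]uint64 // series key -> series ID
	nextID uint64
}

func newSeriesIndex() *seriesIndex {
	return &seriesIndex{byKey: make(map[string]uint64)}
}

// addSeries returns the ID for key, creating the series if needed.
func (i *seriesIndex) addSeries(key string) uint64 {
	// Fast path: most series already exist, so a read lock is enough.
	i.mu.RLock()
	id, ok := i.byKey[key]
	i.mu.RUnlock()
	if ok {
		return id
	}

	// Slow path: check again under the write lock, because another
	// goroutine may have added the series between RUnlock and Lock.
	i.mu.Lock()
	defer i.mu.Unlock()
	if id, ok := i.byKey[key]; ok {
		return id
	}
	i.nextID++
	i.byKey[key] = i.nextID
	return i.nextID
}

func main() {
	idx := newSeriesIndex()
	var wg sync.WaitGroup
	for g := 0; g < 8; g++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			idx.addSeries("cpu,host=server01")
		}()
	}
	wg.Wait()
	fmt.Println(len(idx.byKey), idx.nextID) // one series, one ID allocated
}
```

Without the second lookup under the write lock, two goroutines that both miss on the fast path would each allocate an ID for the same key.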
Jonathan A. Sternberg 837a9804cf Refactoring the monitor service to avoid expvar
Truncate the time interval output of the monitor service to be on even
time intervals rather than on every minute based on the start time. This
normalizes the output from the monitor service.
2016-07-07 11:13:58 -05:00
Jonathan A. Sternberg 497db2a6d3 Removing dead code from every package except influxql
The tsdb package had a substantial amount of dead code related to the
old query engine still in there. It is no longer used, so it was removed
since it was left unmaintained. There is likely still more code that is
the same, but wasn't found as part of this code cleanup.

influxql has dead code show up because of the code generation so it is
not included in this pruning.
2016-06-20 22:41:07 -05:00
Ben Johnson 1b94cd2686 optimize SHOW TAG VALUES
This commit optimizes `SHOW TAG VALUES` so that it avoids the
`SELECT` query engine execution and iterator creation. There
are also optimizations to reduce individual memory allocations
and to reduce in-memory heap size by only operating on one
measurement at a time.

Execution time has been reduced to approximately 900ms for
500,000 rows. This is about 2µs per row. Of this time,
approximately 1µs is spent retrieving and sorting the row
and 1µs is spent encoding into JSON and writing to the
response body.
2016-06-06 15:50:53 -06:00
Jason Wilder 579923d95f Fix sporadic write failures with influx_stress
This Unlock was moved which seems to create a deadlock situation
sometimes under high write load.  This deadlock causes writes to
fail with timeouts.
2016-06-01 17:25:47 -06:00
Jason Wilder ff1447202c Reduce lock contention in Measurement.AddSeries 2016-05-27 10:30:08 -06:00
Jason Wilder f1ab89561a Reload series count stat at startup 2016-05-18 15:21:57 -06:00
Jonathan A. Sternberg 23f6a706bb Support cast syntax for selecting a specific type
Casting syntax is done with the PostgreSQL syntax `field1::float` to
specify which type should be used when selecting a field. You can also
do `field1::field` or `tag1::tag` to specify that a field or tag should
be selected.

This makes it possible to select a tag when a field key and a tag key
conflict with each other in a measurement. It also means it's possible
to choose a field with a specific type if multiple shards disagree. If
no types are given, the same ordering for how a type is chosen is used
to determine which type to return.

The FieldDimensions method has been updated to return the data type for
the fields that get returned. The SeriesKeys function has also been
removed since it is no longer needed. SeriesKeys was originally used for
the fill iterator, but then expanded to be used by auxiliary iterators
for determining the channel iterator types. The fill iterator doesn't
need it anymore and the auxiliary types are better served by
FieldDimensions implementing that functionality, so SeriesKeys is no
longer needed.

Fixes #6519.
2016-05-16 12:08:29 -04:00
Jonathan A. Sternberg a17f3d960a SHOW TAG VALUES accepts != and !~ in WHERE clause
Fixes #6607.
2016-05-16 08:51:09 -04:00
Ben Johnson 49eb3b8d04 optimize show series iterator
This commit changes the `SeriesIterator` to process one measurement
at a time and uses a `floatFastDedupeIterator` to avoid point
encoding during deduplication.
2016-05-03 08:52:44 -06:00
Jason Wilder d82aa98951 Reduce indentation in filter func 2016-05-02 11:38:25 -06:00
Jason Wilder 3a7429886e Optimize Measurement.DropSeries 2016-05-02 11:36:04 -06:00
Jason Wilder 8082fc61ba Fix parsing keys when loading database index
The code for parsing a key out of the WAL or TSM files in the engine
was naive and didn't account for measurements with escape chars. This
uses the correct parsing code to parse and load them correctly.

Fixes #6496
2016-04-30 14:47:19 -06:00
Jason Wilder abcb559b09 Remove index meta data when series and measurements are gone
This removes the dropMeta param from the tsdb.Store.DeleteSeries and
lets the shard determine when to remove the meta data from the index
based on what series still have data in the shard.

This uncovered a nasty bug in compactions where a fully deleted series would
prematurely end the compactions and not carry forward the rest of the data
in the TSM file.  This is now fixed as well.
2016-04-29 16:31:57 -06:00
Edd Robinson 4d1cfa887c Ensure measurement dropped when no more series 2016-04-29 00:05:42 +01:00
Jason Wilder 2bd5880d7a Remove series from index when shard is closed
When a shard is closed and removed due to retention policy enforcement,
the series contained in the shard would still exist in the index causing
a memory leak.  Restarting the server would cause them not to be loaded.

Fixes #6457
2016-04-28 12:34:46 -06:00
Jonathan A. Sternberg d26e4e3650 Pass binary expressions to the underlying query
Binary math inside of a where condition was previously disallowed. Now,
these types of queries are just passed verbatim down to the underlying
query engine which can handle it.

We may want to revisit this when it comes to tags at some point as it
prevents the more efficient filtering of tags that a simple expression
allows, but it allows a query like this to be done:

    SELECT * FROM cpu WHERE value + 2 < 5

So while it can be better, this is a good initial implementation to
provide this functionality. There are very rare situations where a tag
may be used appropriately in one of these circumstances.

Fixes #3558.
2016-04-22 11:30:36 -04:00
Jonathan A. Sternberg 09c46a451a Sort the series keys inside of a tag set so the output is deterministic
The series keys within a tag set were previously not sorted which would
cause the output to be non-deterministic. This sorts the output series
by their keys so it has a consistent output especially when using
limits.

Fixes #3166.
2016-04-18 17:45:31 -04:00
Jonathan A. Sternberg ea6262b712 Enhance comparing tags and fields in the where clause
Now it is possible to compare tags and fields and it is also now
possible to compare tags and tags. Previously, it was only possible to
compare fields with fields and tags with a string or a regex.

Fixes #3371.
2016-04-11 18:10:08 -04:00
Jonathan A. Sternberg 5bdd61bde7 Support empty tags for all WHERE equality operations
A missing tag on a point was sometimes treated as `""` and sometimes
treated as a separate `null` entity. This change modifies the equality
operations to always treat a missing tag as an empty string.

Empty tags are *not* indexed and do not have the same performance as a
tag that exists.

Fixes #3773.
2016-04-11 12:01:35 -04:00
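
The "missing tag equals empty string" rule from the commit above can be illustrated with plain Go maps. This is only a sketch of the semantics; `tagEquals` is a hypothetical helper, not a function from the tsdb package.

```go
package main

import "fmt"

// tagEquals reports whether the tag `key` on a point equals `want`.
// A missing tag is treated the same as an empty string, which falls out of
// Go's map semantics: looking up a missing key yields "".
func tagEquals(tags map[string]string, key, want string) bool {
	return tags[key] == want
}

func main() {
	withRegion := map[string]string{"host": "server01", "region": "useast"}
	withoutRegion := map[string]string{"host": "server02"}

	fmt.Println(tagEquals(withRegion, "region", ""))           // false
	fmt.Println(tagEquals(withoutRegion, "region", ""))        // true
	fmt.Println(!tagEquals(withoutRegion, "region", "useast")) // true
}
```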
Edd Robinson 5327a75a6f Merge pull request #6216 from influxdata/er-scope-proto
Change protobuf package names to avoid clashes
2016-04-07 16:38:21 +01:00
Edd Robinson 184257a10d Scope all internal protobuf packages 2016-04-05 13:54:21 +01:00
Jason Wilder 3f4c5a5585 Fix race on measurementFields
Both Shard and Engine had the same reference to the measurementField map,
but they each protected it with their own locks.  This causes a race when
writes and queries are occurring because writes can add new fields to the
map while queries are reading from it.

The fix moves the ownership to the Engine and provides protected accessors
that Shard now uses.  For the most part, the accesses on Shard were old
dead code.

Fixing the measurementFields map race created a new race on the internal
fields map.  This is now unexported and protected via MeasurementFields
exported funcs.

Fixes #6188
2016-04-01 18:57:01 -06:00
Jason Wilder 07e3215d11 Remove unused Series.match func 2016-03-31 10:19:46 -06:00
Jason Wilder 40c4973423 Remove per measurement stats collection
The stats setup ends up creating a lot of lock contention which significantly
impacts write throughput when a large number of measurements are used.

Fixes #6131
2016-03-31 10:19:27 -06:00
Jason Wilder f1bb87d4f8 Convert index write lock to series lock 2016-03-31 10:19:27 -06:00
Jason Wilder 9f41acba2f Move shard mapping logic into index 2016-03-29 12:59:27 -06:00
Jason Wilder 3f0e871425 Reduce lock contention when loading database index 2016-03-29 12:59:26 -06:00
Jason Wilder 03ced4cc90 Load shards concurrently 2016-03-29 12:58:52 -06:00
Jonathan A. Sternberg a35d9602cd Fix where filters when an OR is used and when a tag does not exist
If an OR was used, merging filters between different expressions would
not work correctly. If one of the sides had a set of series ids with a
condition and the other side had no series ids associated with the
expression, all of the series from the side with a condition would have
the condition ignored. Instead of defaulting a nonexistent series
filter to true, it should just be false and the evaluation of the one
side that does exist should take care of determining if the series id
should be included or not. The AND condition used false correctly so did
not have to be changed.

If a tag did not exist and `!=` or `!~` were used, it would return false
even though neither a field nor a tag equaled those values. This has
now been modified to return the correct series ids and the correct
condition.

Also fixed a panic that would occur when a tag caused a field access to
become unnecessary. The filter using the field access still got created
and used even though it was unnecessary, resulting in an attempted
access to a non-initialized map.

Fixes #5152 and a bunch of other miscellaneous issues.
2016-03-22 12:19:06 -04:00
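
The OR-merging rule from the commit above (a series missing from one side defaults to false, not true) can be sketched like this. The `predicate` type and the `orFilters` and `never` functions are hypothetical names for illustration; the real code merges filter expressions rather than Go closures.

```go
package main

import "fmt"

// predicate is a hypothetical per-series filter on a field value.
type predicate func(v float64) bool

// never is the filter assumed for a series that one side of the OR
// does not mention.
func never(float64) bool { return false }

// orFilters merges the per-series filters of `left OR right`.
func orFilters(left, right map[uint64]predicate) map[uint64]predicate {
	out := make(map[uint64]predicate, len(left)+len(right))
	for id, l := range left {
		l := l // per-iteration copy for the closure
		r, ok := right[id]
		if !ok {
			r = never // defaulting to "always true" would discard l
		}
		out[id] = func(v float64) bool { return l(v) || r(v) }
	}
	for id, r := range right {
		if _, ok := left[id]; !ok {
			out[id] = r // false OR r is just r
		}
	}
	return out
}

func main() {
	left := map[uint64]predicate{1: func(v float64) bool { return v > 5 }}
	right := map[uint64]predicate{} // e.g. a tag that matches no series
	merged := orFilters(left, right)
	fmt.Println(merged[1](3), merged[1](7)) // false true
}
```

With a default of true, `merged[1]` would accept every value and the `v > 5` condition would be ignored, which matches the bug the commit describes.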
Jonathan A. Sternberg d75428f79f Rename the special condition "name" to "_name" to reduce conflicts
Fixes #6034.
2016-03-16 17:17:04 -04:00
Ben Johnson f692621ef5 allow querying of system-like series
Internal system series start with an underscore prefix, but
restricting queries on them breaks users who already use an underscore
prefix in their series names.

Fixes #5870
2016-03-14 13:50:52 -06:00
Jason Wilder c44195d999 Convert measurementToRegex to exported func
Make it consistent with other conventions where exported funcs
take a lock.
2016-03-09 17:45:37 -07:00
Jason Wilder ae2360df7c Use read lock to expand sources
A write-lock was taken which locks the whole store during a query
that needs to expand sources.  Under load, writes can start to fail.
2016-03-09 17:22:57 -07:00
Ben Johnson 41dde61226 SHOW SERIES 2016-03-08 11:47:57 -07:00
Jonathan A. Sternberg 2f0e246757 Implemented the tag values iterator for `SHOW TAG VALUES`
`SHOW TAG VALUES` output has been modified to print the measurement name
for every measurement and to return the output in two columns: key and
value. An example output might be:

    > SHOW TAG VALUES WITH KEY IN (host, region)
    name: cpu
    ---------
    key     value
    host    server01
    region  useast

    name: mem
    ---------
    key     value
    host    server02
    region  useast

`measurementsByExpr` has been taught how to handle reserved keys (ones
with an underscore at the beginning) to allow reusing that function and
skipping over expressions that don't matter to the call.

Fixes #5593.
2016-03-06 09:52:34 -05:00
Mark Rushakoff fb83374389 Track stats for number of series, measurements
Per database: track number of series and measurements
Per measurement: track number of series
2016-02-24 08:10:16 -08:00
Mark Rushakoff fc9ab7a46f Miscellaneous cleanup in tsdb package
* When possible, initialize maps/slices to exact length/capacity
  * See slice benchmarks at
    https://gist.github.com/mark-rushakoff/b5650bd8f06bece0b9fd
* Fixed some typos
* Removed an unnecessary loop in stringset.intersect
2016-02-10 18:00:47 -08:00
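
The "initialize maps/slices to exact length/capacity" cleanup mentioned in the commit above looks roughly like this in practice. The helper names `seriesIDs` and `indexByKey` are hypothetical and only illustrate the pattern.

```go
package main

import "fmt"

// seriesIDs copies the keys of a set into a slice whose capacity is fixed
// up front, so append never has to grow the backing array.
func seriesIDs(set map[uint64]struct{}) []uint64 {
	ids := make([]uint64, 0, len(set))
	for id := range set {
		ids = append(ids, id)
	}
	return ids
}

// indexByKey builds a lookup map with a size hint equal to its final length.
func indexByKey(keys []string) map[string]int {
	m := make(map[string]int, len(keys))
	for i, k := range keys {
		m[k] = i
	}
	return m
}

func main() {
	ids := seriesIDs(map[uint64]struct{}{1: {}, 2: {}, 3: {}})
	fmt.Println(len(ids), cap(ids))

	idx := indexByKey([]string{"host", "region"})
	fmt.Println(idx["region"])
}
```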
Justin Nuß 82c276756a Lint tsdb and tsdb/engine package 2016-02-10 21:33:46 +01:00
Ben Johnson 5a0d1ab7c1 rename influxdb/influxdb to influxdata/influxdb
This commit changes all the import and URL references from:

    github.com/influxdb/influxdb

to:

    github.com/influxdata/influxdb
2016-02-10 10:26:18 -07:00
Ben Johnson 607750ab1b add SHOW MEASUREMENTS iterator 2016-02-10 09:40:28 -07:00
Ben Johnson 00806de9b8 refactor query engine 2016-02-10 09:40:25 -07:00
Ben Johnson cde973f409 refactor query engine 2016-02-10 09:40:24 -07:00
Sean Beckett 1d83c8c427 Update meta.go 2015-10-13 16:46:59 -07:00
David Norton 512d6ac050 fix #4280: only drop points matching WHERE clause 2015-10-09 18:34:32 -04:00
Ben Johnson b213ddad78 refactor cursor 2015-09-22 13:10:12 -06:00
Ben Johnson 1b8b625787 refactor SelectMapper 2015-09-22 13:09:26 -06:00
Mark Rushakoff 85275e7d59 Sort DatabaseIndex.measurementsByTagFilters result
Fixes #4118
2015-09-20 14:37:27 -07:00
Cory LaNou d19a510ad2 refactor Points and Rows to dedicated packages 2015-09-16 15:33:08 -05:00
Jason Wilder 6b4926257a Add inspect tool
Start of a lower-level file inspection tool.  This currently dumps
summary statistics for the shards, index and WAL that can be used to
understand the shape of the data in the local shards.  This util
operates on the shards itself and not through the server and is intended
more for debugging/troubleshooting.
2015-09-04 10:38:59 -06:00
Jason Wilder a4c1d9a9a7 Remove unused Database index names and sorting
Writes could time out when adding new measurement names to the
index if the sort took a long time.  The names slice was never
actually used (except in a test), so keeping it in the index wastes memory
and sorting it wastes CPU and increases lock contention.  The sorting
was happening while the shard held a write-lock.

Fixes #3869
2015-08-27 11:57:20 -06:00
Paul Dix 1c24cbd8a7 Fix query engine not goroutine safe issue. 2015-08-19 18:43:50 -04:00
Paul Dix a509df0484 Compress metadata, add Delete to WAL.
* All metadata for each shard is now stored in a single key with compressed value
* Creation of new metadata no longer requires a synchronous write to Bolt. It is passed to the WAL and written to Bolt periodically outside the write path
* Added DeleteSeries to WAL and updated bz1 to remove series there when DeleteSeries or DropMeasurement are called
2015-08-18 08:10:51 -04:00
Paul Dix 3348dab4e0 Fix bug with new shards not getting series data persisted. 2015-08-16 15:45:09 -04:00
Jason Wilder 70aa6961c5 Remove unused in-memory index hash
The series map on Measurement was updated and deleted from but never
actually used.  Series keys can be very big since they are the
string representation of the measurement plus sorted tags.

Locally I see 20%-30% reduction in memory usage with 1M series.
2015-08-14 16:37:21 -06:00
Philip O'Toole 7b4879f0ce Fully remove a series when dropped
Fix issue #3226.
2015-08-14 10:50:35 -07:00
Jason Wilder 68b82f3030 Fix regex queries regression
ValidateGroupBy was returning an error if a tag does not exist
but it appears that function was supposed to be validating that
a field name was not used as a group by field.

Fixes #3326
2015-08-10 15:02:29 -06:00
Ben Johnson 1ada790de7 add bz1 storage engine 2015-08-03 14:32:17 -06:00
Jason Wilder 37c971bb82 Fix querying measurements with spaces
Fixes #3319
2015-07-22 14:49:54 -06:00
Ben Johnson a7f50ae03c refactor storage to engine 2015-07-22 11:08:10 -06:00
Ben Johnson de1f9a3736 refactor tsdb tests into test package 2015-07-22 11:07:06 -06:00
Philip O'Toole df3caefcf9 stringSet now takes variadic slice to add 2015-07-20 14:40:39 -07:00
Philip O'Toole 74cb96646c Refactor query engine for distributed query support
With this change, the query engine code gathers information about
shards and tagsets by working with individual shards, collating the
information, and returning that to the client. It does not assume that any
particular shard is local, and accesses all shards through abstracted
Mappers, of which there are two types -- a Mapper type for Raw queries
and a second type for Aggregate queries. There are corresponding
Executors for each type of Mapper, but both types of Executors share the
same interface.
2015-07-15 12:54:55 -07:00
Philip O'Toole dd66491f65 stringSet now returns elements in sorted order 2015-07-06 12:03:58 -04:00
Philip O'Toole cb7baa6d9e Don't group TagSets when tag values are identical
Fixes issue #3059
2015-06-22 16:04:13 -07:00
Pradeep Chhetri 37750acef6 Fixed some Typos 2015-06-11 17:33:26 +05:45
Philip O'Toole 64af1b6241 Report number of measurements and series per node 2015-06-11 00:21:15 -07:00
Philip O'Toole 344a1f4948 Don't even return value from DropSeries 2015-06-10 20:50:07 -07:00
Philip O'Toole 85fd3d0292 Series was not already dropped, return false 2015-06-09 14:25:20 -07:00
Paul Dix 9bf09ee026 Correct comments in tsdb/meta 2015-06-04 16:08:12 -04:00
Paul Dix 408bc3f81e Ensure proper locking of index structures on writes and queries. 2015-06-04 14:50:32 -04:00