Commit Graph

1602 Commits (19f331a450d7542a0b471423adac7b949baa9628)

Author SHA1 Message Date
Jason Wilder 1bc0f68490 Merge branch '1.2' into jw-merge-12 2017-02-07 12:48:36 -07:00
Jonathan A. Sternberg e1fa48d0dd Fix ORDER BY time DESC with ordering series keys
The order of series keys is in ascending alphabetical order, not
descending alphabetical order, when it is ordered by descending time.
This fixes the ordering so points are returned in descending order. The
emitter also had the conditions for choosing which iterator to use in
the wrong direction (which only affects aggregates with `FILL(none)`).
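A minimal sketch of the ordering described above, assuming a simplified point type. `point` and `sortDescending` are hypothetical names, not the query engine's iterator code. Series keys keep their ascending order, and only the time component is reversed:

```go
package main

import (
	"fmt"
	"sort"
)

// point is a hypothetical, simplified stand-in for a query result point.
type point struct {
	seriesKey string // e.g. "cpu,host=a"
	time      int64  // timestamp in nanoseconds
}

// sortDescending orders points for ORDER BY time DESC: series keys stay in
// ascending alphabetical order, and within each series the newest point
// comes first.
func sortDescending(pts []point) {
	sort.Slice(pts, func(i, j int) bool {
		if pts[i].seriesKey != pts[j].seriesKey {
			return pts[i].seriesKey < pts[j].seriesKey // ascending keys
		}
		return pts[i].time > pts[j].time // descending time
	})
}

func main() {
	pts := []point{
		{"cpu,host=b", 1}, {"cpu,host=a", 1},
		{"cpu,host=a", 2}, {"cpu,host=b", 2},
	}
	sortDescending(pts)
	for _, p := range pts {
		fmt.Println(p.seriesKey, p.time)
	}
}
```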
2017-02-06 15:49:12 -06:00
Jason Wilder 2e95b4043c Merge branch '1.2' into jw-merge-12 2017-02-02 16:40:36 -07:00
Jonathan A. Sternberg e49ba016fa Fix incorrect math when aggregates that emit different times are used
When `non_negative_derivative()` and `last()` were used together in a
math expression, their points would not be matched up, because one of
those aggregates emits one fewer point than the other. The math
iterators have been modified so they now track the name and tags of
each point and match points based on those.

This isn't necessarily ideal and may come back to bite us in the future.
We don't have a defined ordering for all iterators, so it can be
difficult to know which of two points is supposed to come first. This
uses the common ordering that usually makes sense, but the query engine
is getting complicated enough that I am not 100% certain this is correct
in all circumstances.
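A minimal sketch of matching by name, tags and time, assuming both inputs are already sorted by that common ordering. `point`, `less` and `joinAdd` are hypothetical names. This sketch simply drops a point that has no partner; the real math iterators may handle unmatched points differently:

```go
package main

import "fmt"

// point is a hypothetical, simplified aggregate output point.
type point struct {
	name  string
	tags  string // serialized tag set, e.g. "host=a"
	time  int64
	value float64
}

// less implements the common ordering: name, then tags, then time.
func less(a, b point) bool {
	if a.name != b.name {
		return a.name < b.name
	}
	if a.tags != b.tags {
		return a.tags < b.tags
	}
	return a.time < b.time
}

// joinAdd walks two sorted streams and emits lhs+rhs only when name, tags
// and time all match, so an extra point on one side (such as the one
// non_negative_derivative() does not emit) cannot shift the pairing.
func joinAdd(lhs, rhs []point) []point {
	var out []point
	i, j := 0, 0
	for i < len(lhs) && j < len(rhs) {
		a, b := lhs[i], rhs[j]
		switch {
		case less(a, b):
			i++ // lhs point has no partner
		case less(b, a):
			j++ // rhs point has no partner
		default:
			out = append(out, point{a.name, a.tags, a.time, a.value + b.value})
			i++
			j++
		}
	}
	return out
}

func main() {
	deriv := []point{{"cpu", "host=a", 10, 1}, {"cpu", "host=a", 20, 2}}
	last := []point{{"cpu", "host=a", 0, 5}, {"cpu", "host=a", 10, 6}, {"cpu", "host=a", 20, 7}}
	for _, p := range joinAdd(deriv, last) {
		fmt.Println(p.name, p.tags, p.time, p.value)
	}
}
```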
2017-02-02 14:40:41 -06:00
Joe LeGasse dd9278a098 regex: don't use exact match for case insensitive expression
Fixes #7906

In an attempt to reduce the overhead of using a regex for exact matches,
the query parser replaces `=~ /^thing$/` with `== 'thing'`. However, the
check ignored any flags set on the expression, so `=~ /(?i)^THING$/` was
replaced with `== 'THING'`, which fails unless the case already matches
exactly. This change only applies the rewrite when the expression's flags
are unchanged from the parser's defaults.
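A sketch of the guarded rewrite using Go's `regexp/syntax` package. `exactMatch` is a hypothetical helper, not the parser's actual function. It only reports an exact match when the parsed pattern is begin-of-text, a single literal and end-of-text, and the literal carries exactly the default `syntax.Perl` flags, so `(?i)` (which adds `syntax.FoldCase`) blocks the rewrite:

```go
package main

import (
	"fmt"
	"regexp/syntax"
)

// exactMatch reports whether pattern is equivalent to a plain string
// comparison and, if so, returns the literal to compare against.
func exactMatch(pattern string) (string, bool) {
	re, err := syntax.Parse(pattern, syntax.Perl)
	if err != nil {
		return "", false
	}
	// Expect exactly: begin-of-text, one literal, end-of-text.
	if re.Op != syntax.OpConcat || len(re.Sub) != 3 {
		return "", false
	}
	begin, lit, end := re.Sub[0], re.Sub[1], re.Sub[2]
	if begin.Op != syntax.OpBeginText || lit.Op != syntax.OpLiteral || end.Op != syntax.OpEndText {
		return "", false
	}
	// Flags set inside the pattern, such as FoldCase from (?i), show up on
	// the literal; any deviation from the defaults disables the rewrite.
	if lit.Flags != syntax.Perl {
		return "", false
	}
	return string(lit.Rune), true
}

func main() {
	for _, p := range []string{`^thing$`, `(?i)^THING$`, `^th.ng$`} {
		lit, ok := exactMatch(p)
		fmt.Printf("%-14s rewrite=%v literal=%q\n", p, ok, lit)
	}
}
```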
2017-02-02 10:49:12 -05:00
Joe LeGasse 93d18d42a6 regex: don't use exact match for case insensitive expression
Fixes #7906

In an attempt to reduce the overhead of using a regex for exact matches,
the query parser replaces `=~ /^thing$/` with `== 'thing'`. However, the
check ignored any flags set on the expression, so `=~ /(?i)^THING$/` was
replaced with `== 'THING'`, which fails unless the case already matches
exactly. This change only applies the rewrite when the expression's flags
are unchanged from the parser's defaults.
2017-02-02 10:25:08 -05:00
Cory LaNou e3e319a176 fix panic in query execution 2017-02-01 10:51:26 -06:00
Paul Dix a801c9dea6 Merge pull request #7889 from influxdata/js-subquery-fixes
Cherry-pick 1.2 fixes for subqueries into master
2017-01-26 10:49:37 -05:00
Edd Robinson 91ee34b111 Merge pull request #7837 from influxdata/er-tidy
General tidy up and subtle bug fixes
2017-01-26 13:43:07 +00:00
Jonathan A. Sternberg 2980f5b2b4 Fix mapping of types when the measurement uses a regex
With the new shard mapper implementation, regexes were ignored, so it
attempted to look up the field type inside a measurement with no name
(which cannot exist). It then concluded that the field didn't exist and
mapped it as the unknown type.
2017-01-25 16:32:57 -06:00
Jonathan A. Sternberg 552408c949 Fix mapping of types when the measurement uses a regex
With the new shard mapper implementation, regexes were ignored, so it
attempted to look up the field type inside a measurement with no name
(which cannot exist). It then concluded that the field didn't exist and
mapped it as the unknown type.
2017-01-25 09:49:51 -06:00
Jonathan A. Sternberg 83c6d53294 Support the WHERE clause in outer queries with subqueries 2017-01-23 15:01:32 -06:00
Jonathan A. Sternberg 3d4d9062a0 Update subqueries so groupings are propagated to inner queries
Previously, only time expressions were propagated inwards. The reason
for this was simple. If the outer query was going to filter to a
specific time range, then it would be unnecessary for the inner query to
output points outside that time frame. It started as an optimization,
but became a feature because there was no reason to make the user repeat
the same time clause in the inner query. So we allowed an aggregate
query with an interval to pass validation in the subquery if the outer
query had a time range. But `GROUP BY` clauses were not propagated
because that same logic didn't apply to them; there is no optimization
to be had there. So while grouping by a tag in the outer query without
grouping by it in the inner query was useless, there wasn't any
particular reason to care.

Then a bug was found where wildcards would propagate the dimensions
correctly, but when the outer query contained a GROUP BY that the inner
query omitted, the outer grouping was not correctly filtered out. We
could fix that filtering, but on further review, I had been seeing
people make that same mistake a lot. People seem to expect the grouping
to be propagated inwards. Instead of fighting what the user wanted and
explicitly erasing groupings that weren't propagated manually, we might
as well propagate them for the user to make their lives easier. There is
no useful situation where you would want to group into buckets that
can't physically exist, so we might as well do _something_ useful.

This will also now propagate time intervals to inner queries since the
same applies there. But, while the interval propagates, the following
query will not pass validation since it is still not possible to use a
grouping interval with a raw query (even if the inner query is an
aggregate):

    SELECT * FROM (SELECT mean(value) FROM cpu) WHERE time > now() - 5m GROUP BY time(1m)

This also means wildcards will behave a bit differently. They will
retrieve dimensions from the sources in the inner query rather than just
using the dimensions in the group by.

This also fixes top() and bottom() so they return the correct auxiliary
fields. Previously, the buffer holding the auxiliary fields was not
copied, so those values were overwritten by a later point.
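The underlying bug is the usual Go slice-aliasing pitfall. A minimal sketch, assuming an iterator that reuses one buffer for every point (`collect` is a hypothetical name); `copyAux` toggles between the buggy behavior and the fix:

```go
package main

import "fmt"

// collect keeps the aux fields of every point read from a source that
// reuses one buffer for all points. With copyAux=false every kept entry
// aliases the same buffer and ends up holding the last point's values.
func collect(values []string, copyAux bool) [][]string {
	buf := make([]string, 1) // shared buffer, overwritten for each point
	var kept [][]string
	for _, v := range values {
		buf[0] = v
		if copyAux {
			aux := make([]string, len(buf))
			copy(aux, buf) // the fix: keep a private copy
			kept = append(kept, aux)
		} else {
			kept = append(kept, buf) // the bug: aliases buf
		}
	}
	return kept
}

func main() {
	fmt.Println(collect([]string{"a", "b"}, false)) // [[b] [b]]
	fmt.Println(collect([]string{"a", "b"}, true))  // [[a] [b]]
}
```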
2017-01-23 15:01:19 -06:00
Jonathan A. Sternberg 6cd5b690d1 Support the WHERE clause in outer queries with subqueries 2017-01-23 14:49:04 -06:00
Jonathan A. Sternberg f199c50d25 Merge pull request #7854 from influxdata/js-7846-subquery-tag-propagation
Update subqueries so groupings are propagated to inner queries
2017-01-23 14:47:18 -06:00
Edd Robinson a67b5457f5 Merge pull request #7869 from influxdata/er-rp-validate-1.2
[Backport 1.2] #7866
2017-01-23 19:36:12 +00:00
Edd Robinson d30819b978 Ensure rp names validated in CREATE DATABASE WITH 2017-01-23 19:18:07 +00:00
Edd Robinson 0804cdb7b5 Ensure rp names validated in CREATE DATABASE WITH 2017-01-23 19:00:19 +00:00
Cory LaNou d54a955068 allow partial writes on field conflicts 2017-01-23 11:54:46 -07:00
Jonathan A. Sternberg f628b4a198 Update subqueries so groupings are propagated to inner queries
Previously, only time expressions were propagated inwards. The reason
for this was simple. If the outer query was going to filter to a
specific time range, then it would be unnecessary for the inner query to
output points outside that time frame. It started as an optimization,
but became a feature because there was no reason to make the user repeat
the same time clause in the inner query. So we allowed an aggregate
query with an interval to pass validation in the subquery if the outer
query had a time range. But `GROUP BY` clauses were not propagated
because that same logic didn't apply to them; there is no optimization
to be had there. So while grouping by a tag in the outer query without
grouping by it in the inner query was useless, there wasn't any
particular reason to care.

Then a bug was found where wildcards would propagate the dimensions
correctly, but when the outer query contained a GROUP BY that the inner
query omitted, the outer grouping was not correctly filtered out. We
could fix that filtering, but on further review, I had been seeing
people make that same mistake a lot. People seem to expect the grouping
to be propagated inwards. Instead of fighting what the user wanted and
explicitly erasing groupings that weren't propagated manually, we might
as well propagate them for the user to make their lives easier. There is
no useful situation where you would want to group into buckets that
can't physically exist, so we might as well do _something_ useful.

This will also now propagate time intervals to inner queries since the
same applies there. But, while the interval propagates, the following
query will not pass validation since it is still not possible to use a
grouping interval with a raw query (even if the inner query is an
aggregate):

    SELECT * FROM (SELECT mean(value) FROM cpu) WHERE time > now() - 5m GROUP BY time(1m)

This also means wildcards will behave a bit differently. They will
retrieve dimensions from the sources in the inner query rather than just
using the dimensions in the group by.

This also fixes top() and bottom() so they return the correct auxiliary
fields. Previously, the buffer holding the auxiliary fields was not
copied, so those values were overwritten by a later point.
2017-01-23 12:38:10 -06:00
Cory LaNou 0103e44896 allow partial writes on field conflicts 2017-01-23 12:25:35 -06:00
Edd Robinson fb7388cdfc Remove dead code from various pkgs 2017-01-17 09:47:34 -08:00
gunnaraasen c8e15da54d Remove token message; Fixes #7823 2017-01-11 13:43:45 -08:00
Mark Rushakoff bbb43faad2 Add more config validation 2017-01-10 10:28:49 -08:00
Jonathan A. Sternberg e7b7984c27 Merge pull request #7817 from influxdata/js-7326-verbose-output-for-ssl-connection-errors
Verbose output for SSL connection errors
2017-01-10 12:10:55 -06:00
Jonathan A. Sternberg 73b76d1227 Verbose output for SSL connection errors
When an error that appears to be an SSL error happens without SSL
enabled, the client will attempt to reconnect with SSL just to see if
that works. If it works, it exits with an error message telling the user
to add `-ssl`. It does the same if the SSL connection is unsafe,
although it then warns that this is insecure.
2017-01-10 11:53:17 -06:00
Jonathan A. Sternberg b58d1778e2 Remove improper newlines from logging statements 2017-01-10 11:20:09 -06:00
Mark Rushakoff a135906b43 Merge pull request #7747 from influxdata/mr-lint-cleanup
Miscellaneous lint cleanup
2017-01-10 08:22:00 -08:00
Mark Rushakoff 8c2cfd14af Merge pull request #7808 from influxdata/mr-fix-benchmarks
Fix broken server benchmarks
2017-01-09 15:13:01 -08:00
Jonathan A. Sternberg 4a559c4620 Merge pull request #7646 from influxdata/js-4619-subqueries
Support subquery execution in the query language
2017-01-09 14:14:01 -06:00
Mark Rushakoff c126dc5f19 Fix broken server benchmarks
These seem to have been broken in #7368.
2017-01-09 11:09:25 -08:00
Jason Wilder eb4d311c0a Add retry/backup when backing up a shard fails
The backup command can fail if a snapshot is running, which silently
closes the connection. This caused the shard backup command to continue
as if nothing had failed.
2017-01-09 11:28:48 -07:00
Jonathan A. Sternberg d7c8c7ca4f Support subquery execution in the query language
This adds query syntax support for subqueries and adds support to the
query engine to execute queries on subqueries.

Subqueries act as a source for another query. It is the equivalent of
writing the results of a query to a temporary database, executing
a query on that temporary database, and then deleting the database
(except this is all performed in-memory).

The syntax is like this:

    SELECT sum(derivative) FROM (SELECT derivative(mean(value)) FROM cpu GROUP BY *)

This will execute derivative and then sum the result of those derivatives.
Another example:

    SELECT max(min) FROM (SELECT min(value) FROM cpu GROUP BY host)

This finds the minimum value for each host and then returns the largest of those minimums.
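To make the semantics of the second example concrete, a sketch that computes the same result in plain Go over an in-memory slice. `point` and `maxOfMins` are hypothetical names, and an empty input returns negative infinity here:

```go
package main

import (
	"fmt"
	"math"
)

// point is a hypothetical raw cpu point.
type point struct {
	host  string
	value float64
}

// maxOfMins mirrors SELECT max(min) FROM (SELECT min(value) FROM cpu GROUP BY host):
// the inner step reduces each host to its minimum, and the outer step takes
// the largest of those per-host minimums.
func maxOfMins(pts []point) float64 {
	mins := make(map[string]float64)
	for _, p := range pts {
		if m, ok := mins[p.host]; !ok || p.value < m {
			mins[p.host] = p.value
		}
	}
	result := math.Inf(-1)
	for _, m := range mins {
		result = math.Max(result, m)
	}
	return result
}

func main() {
	pts := []point{{"a", 3}, {"a", 1}, {"b", 5}, {"b", 4}}
	fmt.Println(maxOfMins(pts)) // 4
}
```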

There is complete freedom to mix subqueries with auxiliary fields. The only
caveat is that the following two queries:

    SELECT mean(value) FROM cpu
    SELECT mean(value) FROM (SELECT value FROM cpu)

have different performance characteristics. The first calculates
`mean(value)` at the shard level and will be faster, especially in
clustered setups. The second calculates the mean at the top level and
does not get that optimization.
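A sketch of why the first form is cheaper, assuming each shard can return a partial (sum, count) for `mean()`. `partial`, `shardMean` and `combine` are hypothetical names, and how InfluxDB actually encodes partial means is not shown here. Only one partial per shard crosses the shard boundary, whereas the subquery form ships every raw value to the top level:

```go
package main

import "fmt"

// partial is what each shard returns for mean(): a sum and a count.
type partial struct {
	sum   float64
	count int
}

// shardMean reduces one shard's raw values to a single partial.
func shardMean(values []float64) partial {
	var p partial
	for _, v := range values {
		p.sum += v
		p.count++
	}
	return p
}

// combine merges per-shard partials into the final mean. It returns NaN
// if no values were seen at all (0/0).
func combine(parts []partial) float64 {
	var total partial
	for _, p := range parts {
		total.sum += p.sum
		total.count += p.count
	}
	return total.sum / float64(total.count)
}

func main() {
	shards := [][]float64{{1, 2, 3}, {4, 5}}
	var parts []partial
	for _, s := range shards {
		parts = append(parts, shardMean(s))
	}
	fmt.Println(combine(parts)) // 3
}
```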
2017-01-07 13:00:48 -06:00
Jason Wilder bbd9d97d73 Re-enabled TestServer_BackupAndRestore
It was failing intermittently, but it seems to fail consistently once
re-enabled. A slice pointer was incremented too early, causing a
panic.

Fixes #6590
2017-01-06 16:55:12 -07:00
Mark Rushakoff 390a16925d Merge pull request #7781 from influxdata/mr-godoc
Godoc cleanup
2017-01-04 14:11:51 -08:00
Mark Rushakoff 07b87f2630 Miscellaneous lint cleanup 2017-01-03 09:47:32 -08:00
Michael Nikitochkin 5ebd4244b1 Merge branch 'master' into env-array-config 2017-01-02 16:35:55 +01:00
Mark Rushakoff 6768c6ed3b Update godoc for the cmd package and subpackages 2016-12-30 11:58:43 -08:00
Gustav Westling 69c5354d98 Use length instead of removing it 2016-12-30 12:23:40 +01:00
Gustav Westling 56d98325da Removed ineffective assignments, and added checks for errors that previously were not checked 2016-12-29 20:26:15 +01:00
Michael Nikitochkin 65b08e56f7 [#7323]: Allow adding items to array configs via ENV
Allows creating new templates or tags configs via environment variables,
even when the default config has no records for them.

Fixes: #6943
2016-12-23 09:20:46 +01:00
Cory LaNou bc5736f59d Merge pull request #7672 from influxdata/cjl-7563-rp-duration-inf
Enforce minimum shard duration when creating retention policies
2016-12-20 12:16:07 -06:00
Cory LaNou 880c7cdcc8 Merge branch 'master' into cjl-3188-cli-rp-context 2016-12-20 09:55:40 -06:00
Cory LaNou 0cbdea531a add the ability for retention policy context in cli with use command 2016-12-20 09:15:38 -06:00
Cory LaNou fbc9e3cfcc add clear command to cli 2016-12-20 09:14:20 -06:00
Cory LaNou 572da8985c enforce minimum shard duration when creating retention policies 2016-12-20 09:11:43 -06:00
Mark Rushakoff 295a29d4ea Fix quoting on exported string fields
The previous implementation was wrong and double-escaped quotes and
backslashes.
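A sketch of single-pass escaping for string field values, assuming line protocol's rule that backslashes and double quotes inside a string field are escaped with a backslash. `quoteStringField` is a hypothetical name. `strings.Replacer` does not rescan its own output, so each character is escaped exactly once:

```go
package main

import (
	"fmt"
	"strings"
)

// fieldEscaper escapes \ and " in one pass; replacements are never
// re-escaped, which avoids the double-escaping bug.
var fieldEscaper = strings.NewReplacer(`\`, `\\`, `"`, `\"`)

// quoteStringField formats a string field value for line protocol output.
func quoteStringField(s string) string {
	return `"` + fieldEscaper.Replace(s) + `"`
}

func main() {
	fmt.Println(quoteStringField(`say "hi" from C:\tmp`))
}
```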
2016-12-18 00:37:48 -08:00
Mark Rushakoff ff78c84b0f Improve export performance
Benchmark improvements with this change:

benchmark                                   old ns/op     new ns/op     delta
BenchmarkExportTSMFloats_100s_250vps-4      23206480      10279106      -55.71%
BenchmarkExportTSMInts_100s_250vps-4        17995000      5762310       -67.98%
BenchmarkExportTSMBools_100s_250vps-4       17067605      4235467       -75.18%
BenchmarkExportTSMStrings_100s_250vps-4     54846997      34682568      -36.76%
BenchmarkExportWALFloats_100s_250vps-4      23459937      10436297      -55.51%
BenchmarkExportWALInts_100s_250vps-4        18747150      6236062       -66.74%
BenchmarkExportWALBools_100s_250vps-4       17988273      4814358       -73.24%
BenchmarkExportWALStrings_100s_250vps-4     59700802      35815739      -40.01%

benchmark                                   old allocs     new allocs     delta
BenchmarkExportTSMFloats_100s_250vps-4      201442         51738          -74.32%
BenchmarkExportTSMInts_100s_250vps-4        201442         51728          -74.32%
BenchmarkExportTSMBools_100s_250vps-4       201441         51638          -74.37%
BenchmarkExportTSMStrings_100s_250vps-4     404092         201584         -50.11%
BenchmarkExportWALFloats_100s_250vps-4      250322         75627          -69.79%
BenchmarkExportWALInts_100s_250vps-4        250323         75617          -69.79%
BenchmarkExportWALBools_100s_250vps-4       250321         75527          -69.83%
BenchmarkExportWALStrings_100s_250vps-4     452868         225291         -50.25%

benchmark                                   old bytes     new bytes     delta
BenchmarkExportTSMFloats_100s_250vps-4      5170539       2351789       -54.52%
BenchmarkExportTSMInts_100s_250vps-4        5143189       2331276       -54.67%
BenchmarkExportTSMBools_100s_250vps-4       3724951       2143780       -42.45%
BenchmarkExportTSMStrings_100s_250vps-4     17131400      10796281      -36.98%
BenchmarkExportWALFloats_100s_250vps-4      4487868       1468109       -67.29%
BenchmarkExportWALInts_100s_250vps-4        4458395       1452359       -67.42%
BenchmarkExportWALBools_100s_250vps-4       2838719       1258755       -55.66%
BenchmarkExportWALStrings_100s_250vps-4     16787201      10010700      -40.37%

Also, after improving those benchmarks, I did a time-filtered export of
a 450MB TSM file to 21GB of plain-text output, with and without a
bufio.Writer.

Without buffering, it took about 263s, and with buffering, it took about
60s, for a delta of about -77%.
2016-12-17 20:15:39 -08:00
Mark Rushakoff da45aab52c Clean up export code, add tests and benchmarks
The export code was moved around a bit, particularly to ease testing
export of a single TSM or WAL file. The functionality should not have
changed.
2016-12-17 18:17:18 -08:00
Jonathan A. Sternberg ec57108520 Use proper uber-go/zap import path
It looks like the canonical import path for the project is
go.uber.org/zap rather than github.com/uber-go/zap, since the project's
examples reference that path.
2016-12-15 08:54:14 -06:00