1883 Commits (643b2eb30cd4726e58a9052ee6f15ec9bea759cf)

Author SHA1 Message Date
Seif Lotfy 643b2eb30c Switch to LogLog-Beta Cardinality estimation
The new algorithm uses only one formula and needs no additional bias corrections for the entire range of cardinalities,
so it is more efficient and simpler to implement. Our simulations show that the accuracy provided by the new
algorithm is as good as or better than the accuracy provided by either HyperLogLog or HyperLogLog++. The sparse
representation was kept to provide better low-cardinality accuracy. However, the linear counting and range estimations
are replaced.
2017-06-20 15:25:01 +02:00
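For reference, the single formula mentioned in 643b2eb30c can be written roughly as follows. This is a sketch based on the published LogLog-Beta formulation, not on the code in this commit; the symbols are assumptions introduced here: m is the number of registers, M[i] the value of register i, z the number of registers equal to zero, \alpha_m the usual HyperLogLog constant, and \beta(m, z) a fitted correction term.

```latex
E = \frac{\alpha_m \, m \, (m - z)}{\beta(m, z) + \sum_{i=1}^{m} 2^{-M[i]}}
```

Because \beta(m, z) absorbs the bias at small cardinalities, no separate linear-counting or range-correction branch is needed.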
Stuart Carnie 932edd90b2 Merge branch 'master' into sgc-8188 2017-06-14 10:55:06 +10:00
Stuart Carnie 3657dbc256 update CHANGELOG with key names 2017-06-14 10:37:04 +10:00
Jonathan A. Sternberg e2c1da05e4 Merge pull request #8480 from influxdata/js-stats-interval
Change the default stats interval to 1 second instead of 10 seconds
2017-06-13 12:08:23 -05:00
Ben Johnson b51f604030
Fix TSI non-contiguous compaction panic.
This fixes the case where log files are compacted out of order
and cause non-contiguous sets of index files to be compacted.

Previously, the compaction planner would fetch a list of index files
for each level and compact them in order starting with the oldest
ones. This can be a problem for level 1 because level 0 files (log
files) are compacted individually, and in some cases a log file can
finish compacting before older log files have finished. This
leaves a gap in the list of level 1 files that is
ignored when fetching a list of index files.

Now, the planner reads the list of index files starting from the
oldest but stops once it hits a log file. This prevents that gap
from being ignored.
2017-06-13 10:53:26 -06:00
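The planner change described in b51f604030 can be illustrated with a minimal sketch. The `indexFile` type, the `contiguousAtLevel` function, and the file names are hypothetical and are not taken from the TSI code; the sketch only assumes that files are ordered oldest first and that level 0 marks a log file.

```go
package main

import "fmt"

// indexFile is a hypothetical stand-in for a TSI file entry.
// Level 0 marks a log file; higher levels are compacted index files.
type indexFile struct {
	Name  string
	Level int
}

// contiguousAtLevel walks files from oldest to newest and collects those at
// the requested level. It stops at the first log file, so nothing that sits
// beyond a still-compacting log file is picked up.
func contiguousAtLevel(files []indexFile, level int) []indexFile {
	var out []indexFile
	for _, f := range files {
		if f.Level == 0 {
			break // a log file marks the end of the contiguous run
		}
		if f.Level == level {
			out = append(out, f)
		}
	}
	return out
}

func main() {
	files := []indexFile{
		{"L2-0001", 2},
		{"L1-0002", 1},
		{"L1-0003", 1},
		{"L0-0004", 0}, // older log file, still compacting
		{"L1-0005", 1}, // newer log file that finished compacting first
	}
	for _, f := range contiguousAtLevel(files, 1) {
		fmt.Println(f.Name)
	}
}
```

In this example `L1-0005` is left out of the level 1 plan until `L0-0004` has been compacted, so the gap is never skipped over.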
Jonathan A. Sternberg f7382982fd Change the default stats interval to 1 second instead of 10 seconds 2017-06-12 13:15:25 -05:00
marchtea 16dfe2a0ae update CHANGELOG.md 2017-06-12 11:00:27 +08:00
Stuart Carnie 2de52834f0 CQ statistics written to monitor database, addresses #8188
* off by default, enabled by `query-stats-enabled`
* writes to cq_query measurement of configured monitor database
* see CHANGELOG for schema of individual points
2017-06-10 09:20:38 +08:00
Ben Johnson bcc6ef769b
Check file count before attempting a TSI level compaction.
This check was previously in a different section of code which
was lost during a refactor to the new compaction strategy. The
compaction planning now makes a check to ensure at least two
files are available for compaction in a level.
2017-06-06 11:08:59 -06:00
Stuart Carnie 98f2050bcb Update config sample and CHANGELOG 2017-06-05 22:05:00 +08:00
Ben Johnson 3128c6a42e
Fix SHOW TAG VALUES deduplication. 2017-06-01 15:38:35 -06:00
Jonathan A. Sternberg 6a78f1cf4a URL query parameter credentials take priority over Authentication header 2017-05-30 09:26:24 -05:00
Jason Wilder f1181cc402 Update changelog 2017-05-24 14:47:01 -06:00
Jonathan A. Sternberg 9edf236cc8 Maintain the tags of points selected by top() or bottom() when writing the results
When a `SELECT ... INTO ...` is used with `top()` or `bottom()` used
with tags, the points will be written with the tags still intact instead
of converted to fields.
2017-05-23 15:00:21 -05:00
Ryan Betts b18a7e8deb Merge pull request #8396 from influxdata/changelog-merges
Add 1.2.4 and 1.1.5 CHANGELOG updates.
2017-05-23 14:31:09 -04:00
Jason Wilder 31d2309177 Update changelog 2017-05-22 14:53:06 -06:00
Jonathan A. Sternberg 4bdce21a9a Merge pull request #8394 from influxdata/js-top-bottom-performance
Optimize top() and bottom() using an incremental aggregator
2017-05-19 14:32:55 -05:00
Jason Wilder 55f2f83e34 Merge pull request #8407 from influxdata/jw-8392
Return partial write error when points outside of retention policy ar…
2017-05-19 11:25:08 -06:00
Jonathan A. Sternberg 7b9b55bfc0 Optimize top() and bottom() using an incremental aggregator
The previous version of `top()` and `bottom()` would gather all of the
points to use in a slice, filter them (if necessary), then use a
slightly modified heap sort to retrieve the top or bottom values.

This performed horrendously from the standpoint of memory. Since it
consumed so much memory and spent so much time in allocations (along
with sorting a potentially very large slice), this affected speed too.

These calls have now been modified so they keep the top or bottom points
in a min or max heap. For `top()`, each new point is compared against the
minimum value in the heap. If the new point is greater than the minimum point,
it replaces the minimum point and the heap is fixed with the new value.
If the new point is smaller, it is discarded. For `bottom()`, the
process is the opposite.

It will then sort the final result to ensure the correct ordering of the
selected points.

When `top()` or `bottom()` contain a tag to select, they have now been
modified so this query:

    SELECT top(value, host, 2) FROM cpu

Essentially becomes this query:

    SELECT top(value, 2), host FROM (
        SELECT max(value) FROM cpu GROUP BY host
    )

This should drastically increase the performance of all `top()` and
`bottom()` queries.
2017-05-19 11:56:46 -05:00
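The incremental approach described in 7b9b55bfc0 can be sketched with Go's `container/heap`. This is an illustration only: `minHeap` and `topN` are hypothetical names, plain `float64` values stand in for points, and the descending final sort is an assumption about the output order.

```go
package main

import (
	"container/heap"
	"fmt"
	"sort"
)

// minHeap keeps the smallest retained value at index 0.
type minHeap []float64

func (h minHeap) Len() int            { return len(h) }
func (h minHeap) Less(i, j int) bool  { return h[i] < h[j] }
func (h minHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *minHeap) Push(x interface{}) { *h = append(*h, x.(float64)) }
func (h *minHeap) Pop() interface{} {
	old := *h
	n := len(old)
	v := old[n-1]
	*h = old[:n-1]
	return v
}

// topN returns the n largest values, largest first, while holding at most
// n values in memory at any time.
func topN(values []float64, n int) []float64 {
	if n <= 0 {
		return nil
	}
	h := &minHeap{}
	for _, v := range values {
		if h.Len() < n {
			heap.Push(h, v)
			continue
		}
		// Replace the current minimum only if the new value beats it.
		if v > (*h)[0] {
			(*h)[0] = v
			heap.Fix(h, 0)
		}
	}
	// Sort the survivors so the result has a stable, predictable order.
	out := append([]float64(nil), *h...)
	sort.Sort(sort.Reverse(sort.Float64Slice(out)))
	return out
}

func main() {
	fmt.Println(topN([]float64{3, 9, 1, 7, 5}, 2)) // prints [9 7]
}
```

A `bottom()` counterpart would use a max heap and replace the maximum whenever a smaller value arrives.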
Jason Wilder afb1027bed Return partial write error when points outside of retention policy are dropped
Points written outside of a retention policy range were silently dropped. They
are dropped to prevent creating a shard that will be immediately deleted. These
points were dropped silently and no error response was returned to the caller.

Fixes #8392
2017-05-19 10:50:03 -06:00
Jonathan A. Sternberg 7d043dbc61 Add nanosecond duration literal support 2017-05-19 10:44:11 -05:00
Edd Robinson a5fed3d296 Merge pull request #7862 from influxdata/er-debug-all
Adds handler for returning a profile archive
2017-05-17 17:09:39 +01:00
Ryan Betts a43856adc6 Add 1.2.4 and 1.1.5 CHANGELOG updates. 2017-05-16 16:51:29 -04:00
Edd Robinson 1cbbaa9317 Add support for shards, stats and diagnostics 2017-05-15 14:12:00 +01:00
Edd Robinson 8f8ff0ec61 Adds handler for returning a profile archive
Currently, when debugging issues with InfluxDB we often ask for the
following profiles:

  curl -o block.txt "http://localhost:8086/debug/pprof/block?debug=1"
  curl -o goroutine.txt "http://localhost:8086/debug/pprof/goroutine?debug=1"
  curl -o heap.txt "http://localhost:8086/debug/pprof/heap?debug=1"
  curl -o cpu.txt "http://localhost:8086/debug/pprof/profile"

This can be bothersome for users, or even difficult if they're
unfamiliar with cURL (or it's not on their system).

This commit adds a new endpoint: /debug/pprof/all which will return a
single compressed archive of all of the above profiles. The CPU profile
is optional, and not returned by default. To include a CPU profile the
URL to request should be: /debug/pprof/all?cpu=true. It's also possible
to vary the length of the CPU profile by adding a `seconds=x` parameter,
where x defaults to 30, if absent.

The new command for gathering profiles from users should now be:

  curl -o profiles.tar.gz "http://localhost:8086/debug/pprof/all"

Or, if we need to see a CPU profile:

  curl -o profiles.tar.gz "http://localhost:8086/debug/pprof/all?cpu=true"

It's important to remember that a CPU profile is a blocking operation
and by default it will take 30 seconds for the response to be returned
to the user.

Finally, if the user is unfamiliar with cURL, they will now be able to
visit http://localhost:8086/debug/pprof/all in a web browser, and the
archive will be downloaded to their machine.
2017-05-15 14:11:38 +01:00
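For tooling that gathers the archive described in 8f8ff0ec61 without cURL, the request URL can be assembled as in the sketch below. Only the `/debug/pprof/all` path and the `cpu` and `seconds` parameters come from the commit message; the `profileArchiveURL` helper is hypothetical.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// profileArchiveURL builds the URL for the profile archive endpoint.
// The CPU profile is only requested when cpu is true; seconds is sent only
// when positive, otherwise the server default applies.
func profileArchiveURL(base string, cpu bool, seconds int) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	u.Path = "/debug/pprof/all"
	q := url.Values{}
	if cpu {
		q.Set("cpu", "true")
		if seconds > 0 {
			q.Set("seconds", strconv.Itoa(seconds))
		}
	}
	u.RawQuery = q.Encode()
	return u.String(), nil
}

func main() {
	u, err := profileArchiveURL("http://localhost:8086", true, 10)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(u)
}
```

The resulting URL can be passed to `http.Get` and the response body written to a `.tar.gz` file. A client timeout longer than the CPU profile duration is advisable, since the request blocks while the profile is collected.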
Mark Rushakoff 6f438ea467 Update CHANGELOG 2017-05-12 17:09:09 -07:00
Jason Wilder 0b7c0b680c Update changelog 2017-05-12 14:05:24 -06:00
Jonathan A. Sternberg dea02009e0 Small edits to the etc/config.sample.toml file 2017-05-10 10:56:34 -05:00
Jonathan A. Sternberg 2780630a5f Track HTTP client requests for /write and /query with /debug/requests
When a client requests `/debug/requests`, it will wait for 30 seconds
(configurable by specifying `seconds=` in the query parameters) while the
HTTP handler tracks every incoming query and write to the system.
After that time period has passed, it will output a JSON blob that looks
very similar to `/debug/vars` that shows every IP address and user
account (if authentication is used) that connected to the host during
that time.

In the future, we can add more metrics to track. This is an initial
start to aid with debugging machines that connect too often by looking
at a sample of time (like `/debug/pprof`).
2017-05-09 10:18:33 -05:00
Jason Wilder 29c2b1958e Fix deletes triggering unnecessary compactions
Tombstone files would be written to all TSM files even if the deleted
keys or timerange did not exist in the TSM file.  This had the side
effect of causing shards to get recompacted back to the same state. If
any shards or large numbers of TSM files existed, disk usage and CPU
utilization would spike causing issues.

This prevents tombstones being written for TSM files that could not
possibly contain the series keys being deleted or if the deleted time
range is outside the range of the file.
2017-05-08 14:52:28 -06:00
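The check described in 29c2b1958e can be sketched as a simple overlap test. The `tsmFile` type and `needsTombstone` function are hypothetical, and using a min/max key range to decide whether a file could contain a key is an assumption made for this illustration.

```go
package main

import "fmt"

// tsmFile is a hypothetical summary of a TSM file: the range of series keys
// and the time range it covers.
type tsmFile struct {
	MinKey, MaxKey   string
	MinTime, MaxTime int64
}

// needsTombstone reports whether a delete of keys within [min, max] could
// affect the file. A tombstone is only worth writing when it returns true.
func needsTombstone(f tsmFile, keys []string, min, max int64) bool {
	// The deleted time range lies entirely outside the file.
	if max < f.MinTime || min > f.MaxTime {
		return false
	}
	// At least one key must fall inside the file's key range.
	for _, k := range keys {
		if k >= f.MinKey && k <= f.MaxKey {
			return true
		}
	}
	return false
}

func main() {
	f := tsmFile{MinKey: "cpu,host=a", MaxKey: "cpu,host=m", MinTime: 100, MaxTime: 200}
	fmt.Println(needsTombstone(f, []string{"cpu,host=b"}, 150, 300))
	fmt.Println(needsTombstone(f, []string{"cpu,host=z"}, 150, 300))
}
```

Skipping the write in the second case leaves the file untouched, so the shard has nothing to recompact.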
Ben Johnson 489c89bea4
Add tsi support tooling. 2017-05-08 11:00:15 -06:00
Jonathan A. Sternberg 260bdef3d4 Set the CSV output to an empty string for null values 2017-05-04 20:51:58 -05:00
Jason Wilder 684f5d884a Update changelog 2017-05-03 16:31:57 -06:00
Jonathan A. Sternberg df30a4d9c9 Refactor the subquery code and fix outer condition queries
This change refactors the subquery code into a separate builder class to
help allow for more reuse and make the functions smaller and easier to
read.

The previous function that handled most of the code was too big and
impossible to reason through.

This also replaces the complicated logic of aggregates that had
a subquery source with the simpler IteratorMapper. I think the overhead
from the IteratorMapper will be higher, but I also believe that the actual
code is simpler, more robust, and produces more accurate answers. It
might be a future project to optimize that section of code, but I don't
have any actual numbers for the efficiency of one method and I believe
accuracy and code clarity may be more important at the moment since I am
otherwise incapable of reading my own code.
2017-04-28 17:12:32 -05:00
Jonathan A. Sternberg addc12561f Fix LIMIT and OFFSET for certain aggregate queries
When LIMIT and OFFSET were used with any functions that were not handled
directly by the query engine (anything other than count, max, min, mean,
first, or last), the input to the function would be limited instead of
receiving the full stream of values it was supposed to receive.

This also fixes a bug that caused the server to hang when LIMIT and
OFFSET were used with a selector. When using a selector, the limit and
offset should be handled before the points go to the auxiliary iterator
to be split into different iterators. Limiting happened afterwards which
caused the auxiliary iterator to hang forever.
2017-04-28 15:55:06 -05:00
Ben Johnson 3a46e5dd9e
Remove default upper time bound for DELETE queries. 2017-04-28 12:26:26 -06:00
Jason Wilder a736f186f0 Merge pull request #8327 from influxdata/jw-go181
Update to go 1.8.1
2017-04-27 08:42:30 -06:00
Jonathan A. Sternberg be3bce5212 top() and bottom() now return the time for every point
`top()` and `bottom()` will now organize the points by time and also
keep each point's original time even when a time grouping is used. At the
same time, `top()` and `bottom()` will no longer honor any fill options
that are present since they don't really make sense for these specific
functions.

This also fixes the aggregates and selectors to honor the ordered
iterator option so iterators remain ordered and to also respect the
buckets that are created by the final dimensions of the query so that
two buckets don't overlap each other within the same reducer. A test has
been added for this situation. This should clarify and encourage the use
of the ordered attribute within the query engine.
2017-04-26 15:07:10 -05:00
Jonathan A. Sternberg 4776b216a4 Merge pull request #8253 from influxdata/js-8065-restrict-top-bottom-query
Restrict top() and bottom() selectors to be used with no other functions
2017-04-26 15:06:30 -05:00
Jason Wilder 4db3b69b9d Update to go1.8.1 2017-04-26 11:32:42 -06:00
Jonathan A. Sternberg 1300f4cc6c Remove the admin UI 2017-04-25 16:58:24 -05:00
Jason Wilder 71825d20c8 Update changelog 2017-04-20 12:31:06 -06:00
Jason Wilder 5c51ae7319 Merge branch '1.2' into jw-merge-123 2017-04-14 14:36:54 -06:00
Cory LaNou 8c0f5a7dbe
redact passwords before saving history in cli 2017-04-14 13:13:56 -05:00
Jonathan A. Sternberg 57a2abbc87 Restrict top() and bottom() selectors to be used with no other functions 2017-04-14 10:23:07 -05:00
Cory LaNou 775c5d243d Add changelog for 8187 2017-04-13 13:33:25 -05:00
Cory LaNou f96b59ed20 Add changelog for 8187 2017-04-13 10:31:31 -05:00
Jonathan A. Sternberg a550d323c4 Restrict fill(none) and fill(linear) to be usable only with aggregate queries 2017-04-10 15:58:05 -05:00
Jonathan A. Sternberg 0a5e4bd92b Implicitly cast null to false in binary expressions with a boolean
Also more consistently treat a binary expression with strings so it
produces the same value no matter the direction of the expression.
2017-04-06 12:26:04 -05:00
Jonathan A. Sternberg 45895862b7 Merge pull request #8058 from karlding/service-golinting
Make services/{admin, httpd, subscriber, udp} golintable
2017-04-05 12:30:11 -05:00