Previously we overwrote the task's existing latestCompleted so that it was used for both latestCompleted and latestScheduled.
This is confusing and misleading. I believe that by separating the two fields we can have a clear separation
of concerns.
Implementations of the `kv.Bucket#Cursor` API may use
the hints to adjust how they access or read from
the underlying key/value store.
The `findAllTasks` function was also fixed to ensure
that paging works as expected when using a name filter.
Tests were added to verify this behavior.
Redundant error checks were also removed.
* feat(task): Allow tasks to run more isolated from other task systems
To allow the internal task system to be used for user-created tasks as well
as checks, notifications, and other future additions, we needed to take two actions:
1 - We need to treat type as a first-class citizen, meaning that tasks have a type
and each system that creates tasks sets the task type through the API.
This is a change from the previous assumption that any user could set task types. This change
will allow other services to white-label the task service for their own purposes without
having to worry about collisions between the types.
2 - We needed to allow other systems to add data specific to the problem they are trying to solve.
For this purpose we added a `metadata` field to the internal task system, which should allow other systems to
use the task service.
In the future, these changes will allow the current check and notification implementations
to create a task with metadata instead of creating a check object and a task object in the database.
By allowing this new behavior, checks, notifications, and user tasks can all follow the same pattern:
field an API request in a system-specific HTTP endpoint, use a small translation to the `TaskService` function call,
translate the results to what the API expects for this system, and return the results.
* fix(task): undo additional check for ownerID because check is not ready
* feat(task): add limit function for task concurrency
The new task executor handles limits differently than the old executor.
Instead of front-loading limits by creating a runner for every task that might run,
the new executor has a large worker pool and queue. This allows us to have unlimited
concurrency per task and helps us avoid a backlog of task executions caused by an
arbitrary execution limit. This adds the ability to set an optional task execution limit
so a user can still have the advantages of limiting concurrency.
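As a rough illustration of the optional limit, the sketch below counts in-flight executions per task and rejects new ones once the limit is reached. The `limiter` type, `ErrLimitExceeded`, and the convention that a limit of 0 means unlimited are all assumptions made for this example, not the executor's actual code.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrLimitExceeded is a hypothetical error returned when a task already has
// as many executions in flight as its limit allows.
var ErrLimitExceeded = errors.New("task concurrency limit exceeded")

// limiter tracks in-flight executions per task ID.
// Assumption: a limit of 0 means "unlimited".
type limiter struct {
	mu      sync.Mutex
	running map[string]int
}

func newLimiter() *limiter {
	return &limiter{running: make(map[string]int)}
}

// acquire reserves an execution slot for taskID, or returns
// ErrLimitExceeded if the task is already at its limit.
func (l *limiter) acquire(taskID string, limit int) error {
	l.mu.Lock()
	defer l.mu.Unlock()
	if limit > 0 && l.running[taskID] >= limit {
		return ErrLimitExceeded
	}
	l.running[taskID]++
	return nil
}

// release frees a slot previously reserved with acquire.
func (l *limiter) release(taskID string) {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.running[taskID] > 0 {
		l.running[taskID]--
	}
}

func main() {
	l := newLimiter()
	fmt.Println(l.acquire("task-a", 1)) // first execution is admitted
	fmt.Println(l.acquire("task-a", 1)) // rejected while the first is running
	l.release("task-a")
	fmt.Println(l.acquire("task-a", 1)) // admitted again after release
}
```

Checking the limit at admission time, rather than dedicating a runner per task, keeps the shared worker pool and queue untouched for tasks that set no limit.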
We needed the coordinator to be able to execute manual runs and resume runs.
These two functions have been added, but we also needed to allow for the executor to be
mocked out. To do that we needed to return a Promise interface instead of an actual
struct. Both these changes are to facilitate coordinator work and testing.
I chose to add an execute function that allows the task executor to match the expectations of
the scheduler, but I left in the existing executor method that returns promises. This is
because I like having accountability and visibility into what's happening
with each execution, even though the promise isn't required for the scheduler. This function signature
will be used by the coordinator and potentially others that want to ensure an execution is completed.
The HTTP error schema has been changed to simplify the outward-facing
API. The `op` and `error` attributes have been dropped because they
confused people. The `error` attribute will likely be re-added in some
form in the future, but only as additional context and will not be
required or even suggested for the UI to use.
Errors are now output differently both when they are serialized to JSON
and when they are output as strings. The `op` is no longer used if it is
present. It will only appear as an optional attribute if at all. The
`message` attribute for an error is always output and it will be the
prefix for any nested error. When this is serialized to JSON, the
message is automatically flattened so a nested error such as:
    influxdb.Error{
        Msg: "something bad happened",
        Err: io.EOF,
    }
would be written to the message as:
    something bad happened: EOF
This matches a developer's expectations more closely, as most
programmers assume that wrapping an error will act as a prefix for the
inner error.
This is flattened when written out to HTTP in order to make this logic
immaterial to a frontend developer.
The code is still present and plays an important role in categorizing
the error type. On the other hand, the code will not be output as part
of the message, as it commonly plays a redundant and confusing role when
humans read it. The human-readable message usually gives more context,
and a message with the code acting as a prefix is generally not
desired. However, the code plays a very important role in helping to
identify categories of errors, so it is very important as part of the
returned response.
Implementations of the backend.Executor produce errors limited to
querying the KV store. The remainder of the errors will be processed
in the implementation of a `RunPromise`.
Fixes #15161
The current behavior is that the update is pushed into the scheduler,
and the scheduler cherry-picks what it needs. This leaves the task itself out,
meaning any logging the scheduler did would not have the new task information in it.
When a task is told to execute, it can be enqueued waiting for a worker.
This statistic will be superior to the existing delta based on scheduled for;
the current system can be affected by a user having slow queries or a long "delay" on the task.
This new way of measuring the same thing should allow us to accurately measure when it is the task system's fault.
If we are caching runs in the kv storage system, it is possible to get
the cached version from the kv store and the recently completed run
from the analytical store. When we find a duplicate, we need to show only
the analytical result.
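A minimal sketch of that de-duplication, assuming runs can be matched by ID. The `Run` struct and `mergeRuns` helper are hypothetical names for this example, and the ordering of the merged result is not something the change itself specifies.

```go
package main

import "fmt"

// Run is a minimal, hypothetical stand-in for a task run record.
type Run struct {
	ID     string
	Status string
}

// mergeRuns combines runs from the kv cache and the analytical store.
// When the same run ID appears in both, only the analytical copy is kept.
func mergeRuns(cached, analytical []Run) []Run {
	seen := make(map[string]struct{}, len(cached)+len(analytical))
	merged := make([]Run, 0, len(cached)+len(analytical))

	// Analytical results go in first so they win on duplicates.
	for _, r := range analytical {
		if _, dup := seen[r.ID]; dup {
			continue
		}
		seen[r.ID] = struct{}{}
		merged = append(merged, r)
	}
	// Cached runs are added only if the analytical store did not have them.
	for _, r := range cached {
		if _, dup := seen[r.ID]; dup {
			continue
		}
		seen[r.ID] = struct{}{}
		merged = append(merged, r)
	}
	return merged
}

func main() {
	cached := []Run{{ID: "r1", Status: "started"}, {ID: "r2", Status: "started"}}
	analytical := []Run{{ID: "r1", Status: "success"}}
	fmt.Println(mergeRuns(cached, analytical))
}
```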
fix(notification/check): include tags in check object in generated flux
Closes https://github.com/influxdata/influxdb/issues/14769
fix(notification/check): use selected field in threshold functions
Closes https://github.com/influxdata/influxdb/issues/14776
fix(testing): add selected field for check tests
fix(check): use real flux for threshold check
feat(notification/check): generate flux for deadman checks
chore(endpoint): rename webhook endpoint to http endpoint
fix(notification/rule): fetch url for flux script off of endpoint
fix(notification/rule): clean up slack and http rules
fix(notification/rule): change MessageTemp to MessageTemplate
fix(rules): pass endpoint in to rule during create
fix(ui): rename webhook to http
feat(notification/check): namespace deadman under alerts
fix(notification/check): nest tags under tags key in data object in flux
wip
feat(kv): log error if urm cannot be deleted for notification rule
fix(notification/rule): remove name from notify call in slack rule
chore(ui/cypress/e2e): skip rule create test
To have checks and notifications happen transactionally we need to be
able to alert the task system when a new task is created using the checks and notifications systems.
These two new middlewares allow us to inform the task system of an update
to a task that was created through the check or notification systems.
* feat(task): Remove tokens from task structures
We had previously removed tokens from the task API but left the token in place in several locations in the stack.
Now we can cleanly remove the extra tokens.
* feat(task): impersonate user on task execution
Passing tokens to tasks is cumbersome and we needed a way to more easily create tasks. With this change we no longer need a token on task create. We take the user that created the task and pass that in as the "owner". As far as the task is concerned the owner is the source of permissions.
This is done by adding an additional field on task create that is OwnerID. We will no longer respect the token passed in and it will be deprecated soon.
Things to do still:
Task updates need to allow for owners to be set.
The current behavior is that the first execution of a task happens based on the create time
of the task when using an 'every' schedule. If you create a task at 12:02 and want
the task to run every 15m, the first execution would happen at 12:17, and the second would happen
at 12:30.
To fix this behavior I refactored the kv task to give a single source of knowledge.
We now have one function for finding exactly what the last scheduled task was.
We also now have a single method that calculates when the next schedule is due.
By unifying the logic, it should always work the same way, whether you're asking when to run
or creating a task.
* fix(tasks): Add a log message for run transition clarity
We will on occasion see a run in chronograf with missing run data.
We need to find out if we are submitting incomplete data or if we submit full data and something else is happening.
* Report errors found when iterating over flux query in task
* Add failing test for tasks executor result iterator exhaust failure
* Ensure errors exhausting tasks query result iterator are surfaced as task failure
* Update CHANGELOG with task result iteration error surfacing fix
The controller implementation is primarily used by influxdb so it
shouldn't be part of the flux repository. This copies the code from flux
to influxdb so it can be removed from the next flux release.
Now that the run status updates are transactional actions,
we no longer have to add a timer to keep things on track.
The timer is causing a problem where some runs are showing up without a start or stop time if the system is busy.
I would rather have the scheduler hang on the update than leave a run action without required fields.
* task(fix): Tasks should no longer have inaccurate response data
Tasks should be able to pull from a table with both successful and failed results.
Co-authored-by: AlirieGray <alirie@influxdata.com>
Co-authored-by: docmerlin <emrys@influxdata.com>
BucketsAccessed doesn't work currently with a private flux.Spec.
See this issue: https://github.com/influxdata/influxdb/issues/13278
This set of changes just allows code to compile until #13278 is fixed.
Note that preauthorization is not working in the meantime.
Fixes #13275.
This replaces usages of the spec compiler with the ast compiler and it
removes the error message referencing the spec compiler as an available
input.
It does not remove any of the code using the spec compiler that is
involved for proxying requests and it does not remove it from the API.
* Update task servicetest to move dependency to the new TaskControlService
closes #12724
We will now have the capability to write new task services that don't have to implement the backend.Store, LogReader, or LogWriter.
Task updates now attempt to keep the existing runners working.
This causes the system to be slightly slower after a task update and caused a flaky test.