doc: capture device-plugin stricter workflow ordering explicitly

Based on kubelet device manager refactoring done in 1.25 release,
there is stricter ordering requirements where the device plugin
MUST start a gRPC service before registering itself to kubelet.
In case this ordering is not followed, the plugin registration
will fail.

Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
pull/36940/head
Swati Sehgal 2022-09-21 08:10:58 +01:00
parent d05d5df9ad
commit 88ecb0501c
1 changed files with 47 additions and 42 deletions

View File

@ -87,60 +87,65 @@ spec:
The general workflow of a device plugin includes the following steps:
* Initialization. During this phase, the device plugin performs vendor specific
1. Initialization. During this phase, the device plugin performs vendor-specific
initialization and setup to make sure the devices are in a ready state.
* The plugin starts a gRPC service, with a Unix socket under host path
1. The plugin starts a gRPC service, with a Unix socket under the host path
`/var/lib/kubelet/device-plugins/`, that implements the following interfaces:
```gRPC
service DevicePlugin {
// GetDevicePluginOptions returns options to be communicated with Device Manager.
rpc GetDevicePluginOptions(Empty) returns (DevicePluginOptions) {}
```gRPC
service DevicePlugin {
// GetDevicePluginOptions returns options to be communicated with Device Manager.
rpc GetDevicePluginOptions(Empty) returns (DevicePluginOptions) {}
// ListAndWatch returns a stream of List of Devices
// Whenever a Device state change or a Device disappears, ListAndWatch
// returns the new list
rpc ListAndWatch(Empty) returns (stream ListAndWatchResponse) {}
// ListAndWatch returns a stream of List of Devices
// Whenever a Device state change or a Device disappears, ListAndWatch
// returns the new list
rpc ListAndWatch(Empty) returns (stream ListAndWatchResponse) {}
// Allocate is called during container creation so that the Device
// Plugin can run device specific operations and instruct Kubelet
// of the steps to make the Device available in the container
rpc Allocate(AllocateRequest) returns (AllocateResponse) {}
// Allocate is called during container creation so that the Device
// Plugin can run device specific operations and instruct Kubelet
// of the steps to make the Device available in the container
rpc Allocate(AllocateRequest) returns (AllocateResponse) {}
// GetPreferredAllocation returns a preferred set of devices to allocate
// from a list of available ones. The resulting preferred allocation is not
// guaranteed to be the allocation ultimately performed by the
// devicemanager. It is only designed to help the devicemanager make a more
// informed allocation decision when possible.
rpc GetPreferredAllocation(PreferredAllocationRequest) returns (PreferredAllocationResponse) {}
// GetPreferredAllocation returns a preferred set of devices to allocate
// from a list of available ones. The resulting preferred allocation is not
// guaranteed to be the allocation ultimately performed by the
// devicemanager. It is only designed to help the devicemanager make a more
// informed allocation decision when possible.
rpc GetPreferredAllocation(PreferredAllocationRequest) returns (PreferredAllocationResponse) {}
// PreStartContainer is called, if indicated by Device Plugin during registeration phase,
// before each container start. Device plugin can run device specific operations
// such as resetting the device before making devices available to the container.
rpc PreStartContainer(PreStartContainerRequest) returns (PreStartContainerResponse) {}
}
```
// PreStartContainer is called, if indicated by Device Plugin during registeration phase,
// before each container start. Device plugin can run device specific operations
// such as resetting the device before making devices available to the container.
rpc PreStartContainer(PreStartContainerRequest) returns (PreStartContainerResponse) {}
}
```
{{< note >}}
Plugins are not required to provide useful implementations for
`GetPreferredAllocation()` or `PreStartContainer()`. Flags indicating which
(if any) of these calls are available should be set in the `DevicePluginOptions`
message sent back by a call to `GetDevicePluginOptions()`. The `kubelet` will
always call `GetDevicePluginOptions()` to see which optional functions are
available, before calling any of them directly.
{{< /note >}}
{{< note >}}
Plugins are not required to provide useful implementations for
`GetPreferredAllocation()` or `PreStartContainer()`. Flags indicating
the availability of these calls, if any, should be set in the `DevicePluginOptions`
message sent back by a call to `GetDevicePluginOptions()`. The `kubelet` will
always call `GetDevicePluginOptions()` to see which optional functions are
available, before calling any of them directly.
{{< /note >}}
* The plugin registers itself with the kubelet through the Unix socket at host
1. The plugin registers itself with the kubelet through the Unix socket at host
path `/var/lib/kubelet/device-plugins/kubelet.sock`.
* After successfully registering itself, the device plugin runs in serving mode, during which it keeps
monitoring device health and reports back to the kubelet upon any device state changes.
It is also responsible for serving `Allocate` gRPC requests. During `Allocate`, the device plugin may
do device-specific preparation; for example, GPU cleanup or QRNG initialization.
If the operations succeed, the device plugin returns an `AllocateResponse` that contains container
runtime configurations for accessing the allocated devices. The kubelet passes this information
to the container runtime.
{{< note >}}
The ordering of the workflow is important. A plugin MUST start serving gRPC
service before registering itself with kubelet for successful registration.
{{< /note >}}
1. After successfully registering itself, the device plugin runs in serving mode, during which it keeps
monitoring device health and reports back to the kubelet upon any device state changes.
It is also responsible for serving `Allocate` gRPC requests. During `Allocate`, the device plugin may
do device-specific preparation; for example, GPU cleanup or QRNG initialization.
If the operations succeed, the device plugin returns an `AllocateResponse` that contains container
runtime configurations for accessing the allocated devices. The kubelet passes this information
to the container runtime.
### Handling kubelet restarts