doc: capture device-plugin stricter workflow ordering explicitly

Following the kubelet device manager refactoring done in the 1.25 release,
there are stricter ordering requirements: the device plugin
MUST start its gRPC service before registering itself with the kubelet.
If this ordering is not followed, the plugin registration
will fail.

Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
pull/36940/head
Swati Sehgal 2022-09-21 08:10:58 +01:00
parent d05d5df9ad
commit 88ecb0501c
1 changed files with 47 additions and 42 deletions

@@ -87,60 +87,65 @@ spec:
 The general workflow of a device plugin includes the following steps:
-* Initialization. During this phase, the device plugin performs vendor specific
+1. Initialization. During this phase, the device plugin performs vendor-specific
 initialization and setup to make sure the devices are in a ready state.
-* The plugin starts a gRPC service, with a Unix socket under host path
+1. The plugin starts a gRPC service, with a Unix socket under the host path
 `/var/lib/kubelet/device-plugins/`, that implements the following interfaces:
 ```gRPC
 service DevicePlugin {
 // GetDevicePluginOptions returns options to be communicated with Device Manager.
 rpc GetDevicePluginOptions(Empty) returns (DevicePluginOptions) {}
 // ListAndWatch returns a stream of List of Devices
 // Whenever a Device state change or a Device disappears, ListAndWatch
 // returns the new list
 rpc ListAndWatch(Empty) returns (stream ListAndWatchResponse) {}
 // Allocate is called during container creation so that the Device
 // Plugin can run device specific operations and instruct Kubelet
 // of the steps to make the Device available in the container
 rpc Allocate(AllocateRequest) returns (AllocateResponse) {}
 // GetPreferredAllocation returns a preferred set of devices to allocate
 // from a list of available ones. The resulting preferred allocation is not
 // guaranteed to be the allocation ultimately performed by the
 // devicemanager. It is only designed to help the devicemanager make a more
 // informed allocation decision when possible.
 rpc GetPreferredAllocation(PreferredAllocationRequest) returns (PreferredAllocationResponse) {}
 // PreStartContainer is called, if indicated by Device Plugin during registeration phase,
 // before each container start. Device plugin can run device specific operations
 // such as resetting the device before making devices available to the container.
 rpc PreStartContainer(PreStartContainerRequest) returns (PreStartContainerResponse) {}
 }
 ```
 {{< note >}}
 Plugins are not required to provide useful implementations for
-`GetPreferredAllocation()` or `PreStartContainer()`. Flags indicating which
-(if any) of these calls are available should be set in the `DevicePluginOptions`
+`GetPreferredAllocation()` or `PreStartContainer()`. Flags indicating
+the availability of these calls, if any, should be set in the `DevicePluginOptions`
 message sent back by a call to `GetDevicePluginOptions()`. The `kubelet` will
 always call `GetDevicePluginOptions()` to see which optional functions are
 available, before calling any of them directly.
 {{< /note >}}
-* The plugin registers itself with the kubelet through the Unix socket at host
+1. The plugin registers itself with the kubelet through the Unix socket at host
 path `/var/lib/kubelet/device-plugins/kubelet.sock`.
+{{< note >}}
+The ordering of the workflow is important. A plugin MUST start serving gRPC
+service before registering itself with kubelet for successful registration.
+{{< /note >}}
+
-* After successfully registering itself, the device plugin runs in serving mode, during which it keeps
+1. After successfully registering itself, the device plugin runs in serving mode, during which it keeps
 monitoring device health and reports back to the kubelet upon any device state changes.
 It is also responsible for serving `Allocate` gRPC requests. During `Allocate`, the device plugin may
 do device-specific preparation; for example, GPU cleanup or QRNG initialization.
 If the operations succeed, the device plugin returns an `AllocateResponse` that contains container
 runtime configurations for accessing the allocated devices. The kubelet passes this information
 to the container runtime.
 ### Handling kubelet restarts
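
Illustrative note on the change above: the ordering this commit documents (serve first, then register) can be sketched with plain Unix sockets. This is a minimal simulation, not the kubelet's or any plugin's real code. The gRPC layer is omitted, `fake_kubelet_register` is a hypothetical stand-in that assumes the kubelet connects back to the plugin's socket while handling registration, and `example.sock` and the temporary directory are made-up placeholders for a socket under `/var/lib/kubelet/device-plugins/`.

```python
import os
import socket
import tempfile


def start_plugin_service(socket_path):
    """Workflow step 2: begin listening on the plugin's Unix socket.

    A real plugin would serve the DevicePlugin gRPC service here; this
    sketch only opens the listener.
    """
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(socket_path)
    server.listen(1)
    return server


def fake_kubelet_register(socket_path):
    """Hypothetical stand-in for the kubelet's registration handler.

    Assumption: the kubelet connects back to the plugin's socket while
    handling registration, so it fails if nothing is listening yet.
    """
    client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        client.connect(socket_path)
        return True
    except OSError:
        return False
    finally:
        client.close()


def run_demo():
    """Return (register-before-serve result, serve-before-register result)."""
    with tempfile.TemporaryDirectory() as plugin_dir:
        # Hypothetical socket name; a real one lives under
        # /var/lib/kubelet/device-plugins/.
        socket_path = os.path.join(plugin_dir, "example.sock")
        wrong_order = fake_kubelet_register(socket_path)
        server = start_plugin_service(socket_path)
        try:
            right_order = fake_kubelet_register(socket_path)
        finally:
            server.close()
    return wrong_order, right_order


if __name__ == "__main__":
    print(run_demo())
```

In this sketch the first registration attempt fails because nothing is listening on the plugin socket yet, and the second succeeds once the listener is up, which mirrors the failure mode described in the commit message.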