From 88ecb0501cd1ae2b0ef63e01a62119a17a3232a5 Mon Sep 17 00:00:00 2001
From: Swati Sehgal
Date: Wed, 21 Sep 2022 08:10:58 +0100
Subject: [PATCH] doc: capture device-plugin stricter workflow ordering explicitly

Based on the kubelet device manager refactoring done in the 1.25 release,
there are stricter ordering requirements: the device plugin MUST start
a gRPC service before registering itself with the kubelet.

If this ordering is not followed, the plugin registration will fail.

Signed-off-by: Swati Sehgal
---
 .../compute-storage-net/device-plugins.md | 89 ++++++++++---------
 1 file changed, 47 insertions(+), 42 deletions(-)

diff --git a/content/en/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins.md b/content/en/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins.md
index c8cd33fe14..b26a25af04 100644
--- a/content/en/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins.md
+++ b/content/en/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins.md
@@ -87,60 +87,65 @@ spec:

 The general workflow of a device plugin includes the following steps:

-* Initialization. During this phase, the device plugin performs vendor specific
+1. Initialization. During this phase, the device plugin performs vendor-specific
   initialization and setup to make sure the devices are in a ready state.

-* The plugin starts a gRPC service, with a Unix socket under host path
+1. The plugin starts a gRPC service, with a Unix socket under the host path
   `/var/lib/kubelet/device-plugins/`, that implements the following interfaces:

-  ```gRPC
-  service DevicePlugin {
-        // GetDevicePluginOptions returns options to be communicated with Device Manager.
-        rpc GetDevicePluginOptions(Empty) returns (DevicePluginOptions) {}
+   ```gRPC
+   service DevicePlugin {
+         // GetDevicePluginOptions returns options to be communicated with Device Manager.
+         rpc GetDevicePluginOptions(Empty) returns (DevicePluginOptions) {}

-        // ListAndWatch returns a stream of List of Devices
-        // Whenever a Device state change or a Device disappears, ListAndWatch
-        // returns the new list
-        rpc ListAndWatch(Empty) returns (stream ListAndWatchResponse) {}
+         // ListAndWatch returns a stream of List of Devices
+         // Whenever a Device state change or a Device disappears, ListAndWatch
+         // returns the new list
+         rpc ListAndWatch(Empty) returns (stream ListAndWatchResponse) {}

-        // Allocate is called during container creation so that the Device
-        // Plugin can run device specific operations and instruct Kubelet
-        // of the steps to make the Device available in the container
-        rpc Allocate(AllocateRequest) returns (AllocateResponse) {}
+         // Allocate is called during container creation so that the Device
+         // Plugin can run device specific operations and instruct Kubelet
+         // of the steps to make the Device available in the container
+         rpc Allocate(AllocateRequest) returns (AllocateResponse) {}

-        // GetPreferredAllocation returns a preferred set of devices to allocate
-        // from a list of available ones. The resulting preferred allocation is not
-        // guaranteed to be the allocation ultimately performed by the
-        // devicemanager. It is only designed to help the devicemanager make a more
-        // informed allocation decision when possible.
-        rpc GetPreferredAllocation(PreferredAllocationRequest) returns (PreferredAllocationResponse) {}
+         // GetPreferredAllocation returns a preferred set of devices to allocate
+         // from a list of available ones. The resulting preferred allocation is not
+         // guaranteed to be the allocation ultimately performed by the
+         // devicemanager. It is only designed to help the devicemanager make a more
+         // informed allocation decision when possible.
+         rpc GetPreferredAllocation(PreferredAllocationRequest) returns (PreferredAllocationResponse) {}

-        // PreStartContainer is called, if indicated by Device Plugin during registeration phase,
-        // before each container start. Device plugin can run device specific operations
-        // such as resetting the device before making devices available to the container.
-        rpc PreStartContainer(PreStartContainerRequest) returns (PreStartContainerResponse) {}
-  }
-  ```
+         // PreStartContainer is called, if indicated by Device Plugin during registeration phase,
+         // before each container start. Device plugin can run device specific operations
+         // such as resetting the device before making devices available to the container.
+         rpc PreStartContainer(PreStartContainerRequest) returns (PreStartContainerResponse) {}
+   }
+   ```

-  {{< note >}}
-  Plugins are not required to provide useful implementations for
-  `GetPreferredAllocation()` or `PreStartContainer()`. Flags indicating which
-  (if any) of these calls are available should be set in the `DevicePluginOptions`
-  message sent back by a call to `GetDevicePluginOptions()`. The `kubelet` will
-  always call `GetDevicePluginOptions()` to see which optional functions are
-  available, before calling any of them directly.
-  {{< /note >}}
+   {{< note >}}
+   Plugins are not required to provide useful implementations for
+   `GetPreferredAllocation()` or `PreStartContainer()`. Flags indicating
+   the availability of these calls, if any, should be set in the `DevicePluginOptions`
+   message sent back by a call to `GetDevicePluginOptions()`. The `kubelet` will
+   always call `GetDevicePluginOptions()` to see which optional functions are
+   available, before calling any of them directly.
+   {{< /note >}}

-* The plugin registers itself with the kubelet through the Unix socket at host
+1. The plugin registers itself with the kubelet through the Unix socket at host
   path `/var/lib/kubelet/device-plugins/kubelet.sock`.

-* After successfully registering itself, the device plugin runs in serving mode, during which it keeps
-  monitoring device health and reports back to the kubelet upon any device state changes.
-  It is also responsible for serving `Allocate` gRPC requests. During `Allocate`, the device plugin may
-  do device-specific preparation; for example, GPU cleanup or QRNG initialization.
-  If the operations succeed, the device plugin returns an `AllocateResponse` that contains container
-  runtime configurations for accessing the allocated devices. The kubelet passes this information
-  to the container runtime.
+   {{< note >}}
+   The ordering of the workflow is important. A plugin MUST start serving gRPC
+   service before registering itself with kubelet for successful registration.
+   {{< /note >}}
+
+1. After successfully registering itself, the device plugin runs in serving mode, during which it keeps
+   monitoring device health and reports back to the kubelet upon any device state changes.
+   It is also responsible for serving `Allocate` gRPC requests. During `Allocate`, the device plugin may
+   do device-specific preparation; for example, GPU cleanup or QRNG initialization.
+   If the operations succeed, the device plugin returns an `AllocateResponse` that contains container
+   runtime configurations for accessing the allocated devices. The kubelet passes this information
+   to the container runtime.

 ### Handling kubelet restarts