website/content/en/docs/tasks/administer-cluster/topology-manager.md


---
title: Control Topology Management Policies on a node
Official 1.17 Release Docs (#18011) * feat: graduate TaintNodesByCondition to GA (#17073) * Promote StartupProbe to beta (enabled by default). (#17164) * Watch bookmarks to GA (#17026) * feat: graduate ScheduleDaemonSetPods to GA (#17350) * Update Docker installation instructions (#17405) * Use exact version numbers for installing Docker in Ubuntu (#17428) * Move CSIMigration and CSIMigrationGCE to Beta in Kubernetes v1.17 (#17478) * Promote NodeLease feature to GA (#17189) * Update docs for csi topology ga (#17408) * Update RunAsUsername to beta (#17460) * doc:Update RunAsUsername to beta * doc: update samples - kubernetes.io/os is no longer beta * Updating based on review feedback * Promote Node-specific volume limits to GA (#17432) * Promote PodShareProcessNamespace to stable (#17192) * Promote PodShareProcessNamespace to stable * Add for_k8s_version to feature-state label Co-Authored-By: Tim Bannister <tim@scalefactory.com> * Readd version-check to shareProcessNamespace task * Update service load balancer finalizer doc for GA (#17438) * Update Topology Manager docs (#17451) * Added information on how device plugins can take advantage of Topology Manager * Updated the Topology Manager documentation to include additionalinformation and update some out of date sections * Fix broken Topology Manager link (#17746) Part of What's Next Device Plugin section * Update CRD defaulting docs for GA (#17450) * Add documentation for VolumeSnapshot Beta (#17233) * Updating EndpointSlice documentation for beta release in 1.17 (#17411) * (docs/dualstack): v1.17 updates (#17457) * Add placehold doc updates for dualstack in 1.17 Signed-off-by: Lachlan Evenson <lachlan.evenson@microsoft.com> * Add Downward API and /etc/hosts Pod IP validation Signed-off-by: Lachlan Evenson <lachlan.evenson@microsoft.com> * remove addressed known issue via k/k pr 85246 Signed-off-by: Lachlan Evenson <lachlan.evenson@microsoft.com> * Remove known issue and add flag as part of k/k 79993 Signed-off-by: 
Lachlan Evenson <lachlan.evenson@microsoft.com> * remove follow up placeholders Signed-off-by: Lachlan Evenson <lachlan.evenson@microsoft.com> * Update verbiage Signed-off-by: Lachlan Evenson <lachlan.evenson@microsoft.com> * Make IP addressing consistent throughout the task Signed-off-by: Lachlan Evenson <lachlan.evenson@microsoft.com> * Update to status.podIPs Signed-off-by: Lachlan Evenson <lachlan.evenson@microsoft.com> * Update content/en/docs/tasks/network/validate-dual-stack.md Use set instead of env Co-Authored-By: Khaled Henidak (Kal) <khnidk@outlook.com> * add topology.kubernetes.io/zone, topology.kubernetes.io/region and node.kubernetes.io/instance-type labels to docs (#17498) Signed-off-by: Andrew Sy Kim <kiman@vmware.com> * Service topology alpha documentation (#17459) * Update list of feature flags for in-tree plugins migrated to CSI (#17533) Signed-off-by: Deep Debroy <ddebroy@docker.com> * Update Node concept for TaintNodesByCondition going GA (#17577) * feat: graduate ResourceQuotaScopeSelectors to GA in 1.17 (#17554) * kubeadm: update the upgrade documentation for 1.17 (#17587) * doc: Simplify Windows deployments with RuntimeClass (#16697) * doc: Simplify Windows deployments with RuntimeClass * Updating on review feedback * doc: Adding windows-build label from enhancement 1301 * update doc for kubelet option --reserved-cpus (#17648) * feat: update TaintNodesByCondition in feature gates table (#17377) * Update docs for v1 resource quota configuration (#17547) * AdmissionConfiguration v1 (#17548) * Update WebhookAdmissionConfiguration examples (#17549) * Update AWS EBS Migration Feature state (#16126) * Add resource version section to api-concepts documentation (#16910) * Add Resource Version semantics section to api concepts * Clarify risks of going back in time, add details about compaction and watch cache sizes * Apply suggestions from liggitt Co-Authored-By: Jordan Liggitt <jordan@liggitt.net> * remove pesudocode, apply feedback * Fix typo * 
Clarify equality rules * Cleanup kubectl generators docs (#17609) * Write ReplicationController without a space * Drop mentioning unsupported cluster versions * Fix capitalization for “API group” * Tweak wording * Avoid using deprecated generator in example * add Antrea description in dev-1.17 (#17919) * Promote VolumeSubpathEnvExpansion to GA * Reference Documentation for the Kubernetes API for 1.17 (#18019) * Update feature-gates.md (#18033) * Reference Documentation for kubectl Commands for 1.17 (#18017) * Update for v1.17 (#18034) * Update config.toml(release-1.17) for 1.17 (#18031)
2019-12-10 00:11:29 +00:00
reviewers:
- ConnorDoyle
- klueska
- lmdaly
- nolancon
content_template: templates/task
---
{{% capture overview %}}
{{< feature-state state="alpha" >}}
An increasing number of systems leverage a combination of CPUs and hardware accelerators to support latency-critical execution and high-throughput parallel computation. These include workloads in fields such as telecommunications, scientific computing, machine learning, financial services and data analytics. Such hybrid systems comprise a high performance environment.
In order to extract the best performance, optimizations related to CPU isolation, memory and device locality are required. However, in Kubernetes, these optimizations are handled by a disjoint set of components.
_Topology Manager_ is a Kubelet component that aims to co-ordinate the set of components that are responsible for these optimizations.
{{% /capture %}}
{{% capture prerequisites %}}
{{< include "task-tutorial-prereqs.md" >}} {{< version-check >}}
{{% /capture %}}
{{% capture steps %}}
## How Topology Manager Works
Prior to the introduction of Topology Manager, the CPU Manager and Device Manager in Kubernetes made resource allocation decisions independently of each other.
This can result in undesirable allocations on multi-socket systems, and performance- or latency-sensitive applications suffer as a result.
Undesirable in this case means, for example, CPUs and devices being allocated from different NUMA Nodes, which incurs additional latency.
The Topology Manager is a Kubelet component, which acts as a source of truth so that other Kubelet components can make topology aligned resource allocation choices.
The Topology Manager provides an interface for components, called *Hint Providers*, to send and receive topology information. Topology Manager has a set of node level policies which are explained below.
The Topology Manager receives topology information from the *Hint Providers* as a bitmask denoting the NUMA Nodes available and a preferred allocation indication. The Topology Manager policies perform a set of operations on the hints provided and converge on the hint determined by the policy to give the optimal result. If an undesirable hint is stored, the preferred field for the hint will be set to false. In the current policies, the preferred hint is the narrowest preferred mask.
The selected hint is stored as part of the Topology Manager. Depending on the policy configured, the pod can be accepted or rejected from the node based on the selected hint.
The hint is then stored in the Topology Manager for use by the *Hint Providers* when making the resource allocation decisions.
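The hint handling described above can be illustrated with a small sketch. This is a simplified model written for this page, not the kubelet's actual Go implementation: the `Hint` class, the `merge_hints` function and the tie-breaking details are assumptions made for illustration. It combines one hint from each Hint Provider with a bitwise AND, marks the result as preferred only if every contributing hint was preferred, and keeps the narrowest preferred mask.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Hint:
    """Hypothetical stand-in for a topology hint."""
    mask: int        # bit i set means NUMA Node i can satisfy the request
    preferred: bool  # True if the provider considers this allocation preferred


def bit_count(mask: int) -> int:
    """Number of NUMA Nodes present in the mask."""
    return bin(mask).count("1")


def merge_hints(provider_hints, num_numa_nodes):
    """Pick one merged hint from per-provider lists of hints (illustrative only)."""
    all_nodes = (1 << num_numa_nodes) - 1
    # Fallback when no combination overlaps: any NUMA Node, not preferred.
    best = Hint(mask=all_nodes, preferred=False)
    for combo in product(*provider_hints):
        mask = all_nodes
        for hint in combo:
            mask &= hint.mask  # keep only NUMA Nodes every provider accepts
        if mask == 0:
            continue  # the providers share no NUMA Node for this combination
        candidate = Hint(mask, all(h.preferred for h in combo))
        if candidate.preferred and not best.preferred:
            best = candidate
        elif (candidate.preferred == best.preferred
              and bit_count(candidate.mask) < bit_count(best.mask)):
            best = candidate  # same preference: the narrower mask wins
    return best


# Assumed example on a two-node system: CPUs are available on either node,
# while the requested device is only attached to NUMA Node 1.
cpu_hints = [Hint(0b01, True), Hint(0b10, True), Hint(0b11, False)]
device_hints = [Hint(0b10, True)]
selected = merge_hints([cpu_hints, device_hints], num_numa_nodes=2)
print(selected)
```

In this assumed example the selected hint points at NUMA Node 1 and is preferred, because both providers can satisfy the request there.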
### Topology Manager Policies
The Topology Manager currently:
- Works on Nodes with the `static` CPU Manager Policy enabled. See [control CPU Management Policies](/docs/tasks/administer-cluster/cpu-management-policies/)
- Works on Pods making CPU requests or Device requests via extended resources
If these conditions are met, Topology Manager will align the requested resources.
You can set a policy via the Kubelet flag `--topology-manager-policy`.
There are four supported policies:
* `none` (default)
* `best-effort`
* `restricted`
* `single-numa-node`
### none policy {#policy-none}
This is the default policy and does not perform any topology alignment.
### best-effort policy {#policy-best-effort}
For each container in a Guaranteed Pod, kubelet, with `best-effort` topology
management policy, calls each Hint Provider to discover their resource availability.
Using this information, the Topology Manager stores the
preferred NUMA Node affinity for that container. If the affinity is not preferred,
Topology Manager will store this and admit the pod to the node anyway.
The *Hint Providers* can then use this information when making the
resource allocation decision.
### restricted policy {#policy-restricted}
For each container in a Guaranteed Pod, kubelet, with `restricted` topology
management policy, calls each Hint Provider to discover their resource availability.
Using this information, the Topology Manager stores the
preferred NUMA Node affinity for that container. If the affinity is not preferred,
Topology Manager will reject this pod from the node. This will result in a pod in a `Terminated` state with a pod admission failure.
Once the pod is in a `Terminated` state, the Kubernetes scheduler will **not** attempt to reschedule the pod. It is recommended to use a ReplicaSet or Deployment to trigger a redeploy of the pod.
An external control loop could also be implemented to trigger a redeployment of pods that have the `Topology Affinity` error.
If the pod is admitted, the *Hint Providers* can then use this information when making the
resource allocation decision.
### single-numa-node policy {#policy-single-numa-node}
For each container in a Guaranteed Pod, kubelet, with `single-numa-node` topology
management policy, calls each Hint Provider to discover their resource availability.
Using this information, the Topology Manager determines if a single NUMA Node affinity is possible.
If it is, Topology Manager will store this and the *Hint Providers* can then use this information when making the
resource allocation decision.
If, however, this is not possible then the Topology Manager will reject the pod from the node. This will result in a pod in a `Terminated` state with a pod admission failure.
Once the pod is in a `Terminated` state, the Kubernetes scheduler will **not** attempt to reschedule the pod. It is recommended to use a Deployment with replicas to trigger a redeploy of the pod.
An external control loop could also be implemented to trigger a redeployment of pods that have the `Topology Affinity` error.
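The admission behaviour of the four policies described above can be summarised in a few lines. The function below is a sketch for illustration only; the name `admit` and its signature are assumptions, and the kubelet's real logic is written in Go and may differ in detail. It takes the selected hint as a NUMA Node bitmask plus its preferred flag.

```python
def admit(policy: str, mask: int, preferred: bool) -> bool:
    """Return True if a pod with the selected hint would be admitted (illustrative)."""
    if policy == "none":
        return True  # no topology alignment is attempted
    if policy == "best-effort":
        return True  # a non-preferred hint is stored, the pod is admitted anyway
    if policy == "restricted":
        return preferred  # a non-preferred hint rejects the pod
    if policy == "single-numa-node":
        # Admit only when the resources can come from exactly one NUMA Node.
        return preferred and bin(mask).count("1") == 1
    raise ValueError(f"unknown policy: {policy}")


# A hint spanning both NUMA Nodes of a two-node system, not preferred.
for policy in ("none", "best-effort", "restricted", "single-numa-node"):
    print(policy, admit(policy, mask=0b11, preferred=False))
```

With this hint, the sketch admits the pod under `none` and `best-effort` and rejects it under `restricted` and `single-numa-node`.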
### Pod Interactions with Topology Manager Policies
Consider the containers in the following pod specs:
```yaml
spec:
  containers:
  - name: nginx
    image: nginx
```
This pod runs in the `BestEffort` QoS class because no resource `requests` or
`limits` are specified.
```yaml
spec:
  containers:
  - name: nginx
    image: nginx
    resources:
      limits:
        memory: "200Mi"
      requests:
        memory: "100Mi"
```
This pod runs in the `Burstable` QoS class because requests are less than limits.
If the selected policy is anything other than `none`, Topology Manager would not consider either of these Pod
specifications.
```yaml
spec:
  containers:
  - name: nginx
    image: nginx
    resources:
      limits:
        memory: "200Mi"
        cpu: "2"
        example.com/device: "1"
      requests:
        memory: "200Mi"
        cpu: "2"
        example.com/device: "1"
```
This pod runs in the `Guaranteed` QoS class because `requests` are equal to `limits`.
```yaml
spec:
  containers:
  - name: nginx
    image: nginx
    resources:
      limits:
        example.com/deviceA: "1"
        example.com/deviceB: "1"
      requests:
        example.com/deviceA: "1"
        example.com/deviceB: "1"
```

This pod runs in the `BestEffort` QoS class because it specifies no CPU or memory requests.
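
The QoS reasoning above can be illustrated with a small sketch. It is a simplified model, not the kubelet's implementation: the function name `qos_class` and the dictionary layout are hypothetical, quantities are compared as plain strings (so `"1"` and `"1000m"` would be treated as different), and only `cpu` and `memory` are inspected, which is why extended resources such as `example.com/deviceA` do not change the result.

```python
from typing import Dict, List

# Only these resources are inspected in this sketch; extended resources
# such as example.com/deviceA are ignored.
QOS_RESOURCES = ("cpu", "memory")


def qos_class(containers: List[Dict[str, Dict[str, str]]]) -> str:
    """Classify a pod from its containers' `resources` dicts (simplified).

    Hypothetical helper for illustration; quantities are compared as strings.
    """
    any_set = False         # does any container set a cpu/memory request or limit?
    all_guaranteed = True   # does every container meet the Guaranteed criteria?
    for resources in containers:
        requests = resources.get("requests", {})
        limits = resources.get("limits", {})
        for name in QOS_RESOURCES:
            req, lim = requests.get(name), limits.get(name)
            if req is not None or lim is not None:
                any_set = True
            # Guaranteed needs a limit for both resources, and any explicit
            # request must equal that limit (an omitted request is treated
            # as defaulting to the limit).
            if lim is None or (req is not None and req != lim):
                all_guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if all_guaranteed else "Burstable"


# The pod above: only extended resources are requested.
device_only = [{
    "limits": {"example.com/deviceA": "1", "example.com/deviceB": "1"},
    "requests": {"example.com/deviceA": "1", "example.com/deviceB": "1"},
}]
print(qos_class(device_only))
```

Under these assumptions, the device-only pod is classified as `BestEffort`, matching the statement above.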

The Topology Manager considers both of the above pods. It consults the Hint Providers, which are the CPU Manager and the Device Manager, to get topology hints for each pod.

In the case of the `Guaranteed` pod, the `static` CPU Manager policy returns hints relating to the CPU request, and the Device Manager returns hints for the requested device.

In the case of the `BestEffort` pod, the CPU Manager returns the default hint because there is no CPU request, and the Device Manager returns hints for each of the requested devices.

Using this information, the Topology Manager calculates the optimal hint for the pod and stores it. The Hint Providers then use the stored hint when making their resource assignments.
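
As a rough illustration of how hints from several providers might be combined into one, the sketch below merges per-provider hint lists by intersecting NUMA bitmasks. It is a simplified model under stated assumptions, not the kubelet's code: `Hint`, `default_hint`, and `merge_hints` are hypothetical names, a hint is modelled as a bitmask plus a `preferred` flag, and the "optimal" hint is assumed to be a preferred one with the fewest NUMA nodes.

```python
from itertools import product
from typing import List, NamedTuple


class Hint(NamedTuple):
    """Hypothetical stand-in for a topology hint."""
    mask: int        # bit i set means NUMA node i is acceptable
    preferred: bool  # True if the provider considers this placement ideal


def default_hint(numa_nodes: int) -> Hint:
    """Hint from a provider with no preference: every NUMA node is acceptable."""
    return Hint(mask=(1 << numa_nodes) - 1, preferred=True)


def _better(candidate: Hint, current: Hint) -> bool:
    """Prefer preferred hints, then hints that span fewer NUMA nodes."""
    if candidate.preferred != current.preferred:
        return candidate.preferred
    return bin(candidate.mask).count("1") < bin(current.mask).count("1")


def merge_hints(per_provider: List[List[Hint]], numa_nodes: int) -> Hint:
    """Try one hint from each provider and keep the best intersection."""
    all_nodes = (1 << numa_nodes) - 1
    best = Hint(mask=all_nodes, preferred=False)  # fallback: any node, not preferred
    for combo in product(*per_provider):
        mask = all_nodes
        for hint in combo:
            mask &= hint.mask  # keep only nodes every provider accepts
        if mask == 0:
            continue  # the providers cannot agree on any NUMA node
        candidate = Hint(mask, all(h.preferred for h in combo))
        if _better(candidate, best):
            best = candidate
    return best


# Two NUMA nodes. The CPU hints allow either node; the device sits on node 1.
cpu_hints = [Hint(0b01, True), Hint(0b10, True), Hint(0b11, False)]
device_hints = [Hint(0b10, True), Hint(0b11, False)]
print(merge_hints([cpu_hints, device_hints], numa_nodes=2))
```

In this model, a provider with no request contributes only `default_hint(...)`, so it never narrows the result; that mirrors the `BestEffort` case described above, where the device hints alone decide the outcome.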

### Known Limitations

1. As of Kubernetes 1.16, the Topology Manager is only guaranteed to work if a *single* container in the pod spec requires aligned resources. Hint generation is based on current resource allocations, but all containers in a pod generate hints before any resource allocation has been made, so hints are unreliable for all but the first container in a pod.
   Because of this limitation, if the kubelet considers multiple pods or containers in quick succession, they may not respect the Topology Manager policy.
2. The maximum number of NUMA nodes that the Topology Manager allows is 8. Beyond that, there is a state explosion when enumerating the possible NUMA affinities and generating their hints.
3. The scheduler is not topology-aware, so a pod can be scheduled onto a node and then fail on that node because of the Topology Manager.

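
To give a feel for the growth behind the NUMA node limit, the sketch below counts the candidate affinities for a given number of NUMA nodes. It assumes each affinity is a non-empty subset of the node IDs, which is a modelling choice of this sketch; `numa_affinities` and `count_affinities` are hypothetical helper names.

```python
from itertools import combinations
from typing import Iterator, Tuple


def numa_affinities(numa_nodes: int) -> Iterator[Tuple[int, ...]]:
    """Yield every non-empty subset of NUMA node IDs, smallest subsets first."""
    for size in range(1, numa_nodes + 1):
        yield from combinations(range(numa_nodes), size)


def count_affinities(numa_nodes: int) -> int:
    """Count the subsets by enumerating them; equals 2**numa_nodes - 1."""
    return sum(1 for _ in numa_affinities(numa_nodes))


for n in (2, 4, 8):
    print(n, count_affinities(n))
```

The count doubles (plus one) with every additional NUMA node, and each hint provider may produce hints over these subsets, so the work grows quickly as nodes are added.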
{{% /capture %}}