Workload partitioning | Scalability and performance | OpenShift Container Platform 4.17

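Workload partitioning separates compute node CPU resources into distinct CPU sets. The primary objective is to keep platform pods on the specified cores so that they do not interrupt the CPUs on which customer workloads are running.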

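Workload partitioning isolates OpenShift Container Platform services, cluster management workloads, and infrastructure pods so that they run on a reserved set of CPUs. This ensures that the remaining CPUs in the cluster deployment are untouched and available exclusively for non-platform workloads. The minimum number of reserved CPUs required for cluster management is four CPU Hyper-Threads (HTs).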

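When workload partitioning is enabled, a node admission webhook prevents nodes that are not configured correctly from joining the cluster. The machine config pools for control plane and worker nodes supply the configuration that nodes use. Adding new nodes to these pools ensures that they are correctly configured before they join the cluster.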

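Currently, nodes must have a uniform configuration per machine config pool to ensure that the correct CPU affinity is set on all nodes within that pool. After admission, nodes in the cluster identify themselves as supporting a new resource type called management.workload.openshift.io/cores and accurately report their CPU capacity. Workload partitioning can be enabled only during cluster installation, by adding the additional field cpuPartitioningMode to the install-config.yaml file.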

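When workload partitioning is enabled, the management.workload.openshift.io/cores resource allows the scheduler to correctly assign pods based on the cpushares capacity of the host, not just the default cpuset. This ensures more precise allocation of resources in workload partitioning scenarios.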

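Workload partitioning ensures that the CPU requests and limits specified in the pod configuration are respected. In OpenShift Container Platform 4.16 or later, accurate CPU usage limits are set for platform pods through CPU partitioning. Because workload partitioning uses the custom resource type management.workload.openshift.io/cores, the values for requests and limits are the same, due to a Kubernetes requirement for extended resources. However, the annotations modified by workload partitioning correctly reflect the desired limits.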

Extended resources cannot be overcommitted, so if both a request and a limit are present in a container spec, they must be equal.
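
The request-equals-limit rule for extended resources can be illustrated with a small check. The following is a minimal sketch, not part of the product: the function name `extended_resource_mismatches` and the dict layout (which mirrors the `resources` stanza of a container spec) are assumptions made for illustration, and the test for what counts as an extended resource is a simplified heuristic (a name with a domain prefix other than `kubernetes.io/`).

```python
def is_extended_resource(name):
    """Simplified heuristic (an assumption of this sketch): a resource name
    with a domain prefix that is not kubernetes.io/ is treated as extended."""
    return "/" in name and not name.startswith("kubernetes.io/")


def extended_resource_mismatches(container):
    """Return the extended resources whose request and limit differ.

    `container` is a plain dict shaped like a container spec:
    {"resources": {"requests": {...}, "limits": {...}}}
    """
    resources = container.get("resources", {})
    requests = resources.get("requests", {})
    limits = resources.get("limits", {})
    mismatched = []
    # Only resources present in both maps are subject to the equality rule.
    for name in sorted(set(requests) & set(limits)):
        if is_extended_resource(name) and str(requests[name]) != str(limits[name]):
            mismatched.append(name)
    return mismatched


# Hypothetical container: the extended resource matches, memory may differ.
container = {
    "name": "example",
    "resources": {
        "requests": {"management.workload.openshift.io/cores": "400", "memory": "100Mi"},
        "limits": {"management.workload.openshift.io/cores": "400", "memory": "200Mi"},
    },
}
print(extended_resource_mismatches(container))
```

Native resources such as memory are skipped because they can be overcommitted; only the extended resource is compared.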

Enabling workload partitioning

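With workload partitioning, cluster management pods are annotated so that they are correctly partitioned into a specified CPU affinity. These pods operate normally within the minimum CPU configuration specified by the reserved value in the performance profile. When calculating how many reserved CPU cores to set aside for the platform, take into account any additional Day 2 Operators that use workload partitioning.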

Workload partitioning isolates user workloads from platform workloads by using standard Kubernetes scheduling capabilities.

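You can enable workload partitioning only during cluster installation. You cannot disable workload partitioning after installation. However, you can change the CPU configuration for the reserved and isolated CPUs after installation.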

Use this procedure to enable workload partitioning cluster-wide.

Procedure

In the install-config.yaml file, add the additional field cpuPartitioningMode and set it to AllNodes.

apiVersion: v1
baseDomain: devcluster.openshift.com
cpuPartitioningMode: AllNodes # (1)
compute:
  - architecture: amd64
    hyperthreading: Enabled
    name: worker
    platform: {}
    replicas: 3
controlPlane:
  architecture: amd64
  hyperthreading: Enabled
  name: master
  platform: {}
  replicas: 3

(1) Sets up the cluster for CPU partitioning at install time. The default value is `None`.
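
Because the field can be set only at install time, it can be useful to check it before running the installer. The following is a minimal sketch that assumes the install-config.yaml content has already been loaded into a Python dict by a YAML parser of your choice. The helper name `partitioning_enabled` is hypothetical, and accepting only the two values named in this section (`None` and `AllNodes`) is an assumption of the sketch.

```python
# Values named in this section; treating them as the full set is an assumption.
VALID_MODES = {"None", "AllNodes"}


def partitioning_enabled(install_config):
    """Return True if the loaded install-config enables workload partitioning.

    A missing field falls back to the documented default, "None".
    """
    mode = install_config.get("cpuPartitioningMode", "None")
    if mode not in VALID_MODES:
        raise ValueError(f"unexpected cpuPartitioningMode: {mode!r}")
    return mode == "AllNodes"


# Hypothetical, already-parsed install-config fragment.
install_config = {
    "apiVersion": "v1",
    "baseDomain": "devcluster.openshift.com",
    "cpuPartitioningMode": "AllNodes",
}
print(partitioning_enabled(install_config))
```

The check rejects misspelled values such as `allnodes` instead of silently treating them as disabled, which matters for a setting that cannot be changed after installation.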

Performance profiles and workload partitioning

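Applying a performance profile allows you to use the workload partitioning feature. An appropriately configured performance profile specifies the isolated and reserved CPUs. The recommended way to create a performance profile is to use the Performance Profile Creator (PPC) tool.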

Additional resources

About the Performance Profile Creator

Sample performance profile configuration

apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  # if you change this name make sure the 'include' line in TunedPerformancePatch.yaml
  # matches this name: include=openshift-node-performance-${PerformanceProfile.metadata.name}
  # Also in file 'validatorCRs/informDuValidator.yaml': 
  # name: 50-performance-${PerformanceProfile.metadata.name}
  name: openshift-node-performance-profile
  annotations:
    ran.openshift.io/reference-configuration: "ran-du.redhat.com"
spec:
  additionalKernelArgs:
    - "rcupdate.rcu_normal_after_boot=0"
    - "efi=runtime"
    - "vfio_pci.enable_sriov=1"
    - "vfio_pci.disable_idle_d3=1"
    - "module_blacklist=irdma"
  cpu:
    isolated: $isolated
    reserved: $reserved
  hugepages:
    defaultHugepagesSize: $defaultHugepagesSize
    pages:
      - size: $size
        count: $count
        node: $node
  machineConfigPoolSelector:
    pools.operator.machineconfiguration.openshift.io/$mcp: ""
  nodeSelector:
    node-role.kubernetes.io/$mcp: ''
  numa:
    topologyPolicy: "restricted"
  # To use the standard (non-realtime) kernel, set enabled to false
  realTimeKernel:
    enabled: true
  workloadHints:
    # WorkloadHints defines the set of upper level flags for different type of workloads.
    # See https://github.com/openshift/cluster-node-tuning-operator/blob/master/docs/performanceprofile/performance_profile.md#workloadhints
    # for detailed descriptions of each item.
    # The configuration below is set for a low latency, performance mode.
    realTime: true
    highPowerConsumption: false
    perPodPowerManagement: false

PerformanceProfile CR field descriptions

metadata.name

Ensure that name matches the following fields set in the related GitOps ZTP custom resources (CRs):

include=openshift-node-performance-${PerformanceProfile.metadata.name} in TunedPerformancePatch.yaml
name: 50-performance-${PerformanceProfile.metadata.name} in validatorCRs/informDuValidator.yaml

spec.additionalKernelArgs

"efi=runtime" configures UEFI secure boot for the cluster host.

spec.cpu.isolated

Set the isolated CPUs. Ensure that all of the Hyper-Threading pairs match.

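The reserved and isolated CPU pools must not overlap and together must span all available cores. CPU cores that are not accounted for cause undefined behavior in the system.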

spec.cpu.reserved

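Set the reserved CPUs. When workload partitioning is enabled, system processes, kernel threads, and system container threads are restricted to these CPUs. All CPUs that are not isolated should be reserved.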
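
The constraints on the reserved and isolated values (no overlap, full coverage of the available CPUs, and at least four reserved hyperthreads) can be checked before a profile is applied. The following is a minimal sketch under stated assumptions: the helper names `parse_cpuset` and `check_partition` are hypothetical, CPU lists use the Linux range syntax (for example `0-1,32-33`), and logical CPUs are assumed to be numbered from 0 to `total_cpus - 1`. It does not verify that Hyper-Threading siblings are kept together, because that requires the host topology.

```python
def parse_cpuset(spec):
    """Expand a CPU list such as '0-1,32-33' into a set of CPU IDs."""
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-", 1)
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return cpus


def check_partition(reserved, isolated, total_cpus, min_reserved=4):
    """Return a list of problems; an empty list means the split looks valid."""
    reserved_set = parse_cpuset(reserved)
    isolated_set = parse_cpuset(isolated)
    problems = []
    overlap = reserved_set & isolated_set
    if overlap:
        problems.append(f"overlap: {sorted(overlap)}")
    # Every logical CPU must belong to exactly one of the two pools.
    unassigned = set(range(total_cpus)) - (reserved_set | isolated_set)
    if unassigned:
        problems.append(f"unassigned: {sorted(unassigned)}")
    if len(reserved_set) < min_reserved:
        problems.append(f"only {len(reserved_set)} reserved CPUs, need {min_reserved}")
    return problems


# Hypothetical 64-CPU host: four reserved hyperthreads, the rest isolated.
print(check_partition("0-1,32-33", "2-31,34-63", total_cpus=64))
```

Returning a list of problems rather than raising on the first one lets a caller report every issue with a candidate split in a single pass.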

spec.hugepages.pages

Set the number of huge pages (count).
Set the huge pages size (size).
Set node to the NUMA node where the huge pages are allocated (node).

spec.realTimeKernel

Set enabled to true to use the realtime kernel.

spec.workloadHints

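Use workloadHints to define the set of top-level flags for different types of workloads. The example configuration configures the cluster for low latency and high performance.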

Additional resources

Recommended single-node OpenShift cluster configuration for vDU application workloads → Workload partitioning