
在 OpenShift Container Platform 上运行低延迟应用程序

OpenShift Container Platform 通过使用多种技术和专用硬件设备,能够为在商用现货 (COTS) 硬件上运行的应用程序提供低延迟处理。

RHCOS 实时内核

确保工作负载以高度确定性的进程进行处理。

CPU 隔离

避免 CPU 调度延迟,并确保始终有足够的 CPU 容量可用。

NUMA 感知拓扑管理

将内存和大页与 CPU 和 PCI 设备对齐,以将保证的容器内存和大页固定到非一致性内存访问 (NUMA) 节点。所有服务质量 (QoS) 类别的 Pod 资源都保留在同一个 NUMA 节点上。这降低了延迟并提高了节点的性能。

大页内存管理

使用大页尺寸可通过减少访问页表所需的系统资源量来提高系统性能。

使用 PTP 进行精确计时同步

允许网络中节点之间以亚微秒精度同步。

vDU 应用程序工作负载的推荐集群主机要求

运行 vDU 应用程序工作负载需要一个具有足够资源来运行 OpenShift Container Platform 服务和生产工作负载的裸机主机。

表 1. 最小资源要求(最小值)

  • vCPU:4 到 8 个 vCPU

  • 内存:32GB RAM

  • 存储:120GB

一个 vCPU 等于一个物理核心。但是,如果启用同时多线程 (SMT) 或超线程,请使用以下公式计算表示一个物理核心的 vCPU 数量:

  • (每个核心的线程数 × 核心数) × 插槽数 = vCPU 数
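    例如,一台具有 1 个插槽、每插槽 8 个核心并启用 SMT(每核心 2 个线程)的服务器,按上述公式计算为 (2 × 8) × 1 = 16 个 vCPU,也就是说 16 个 vCPU 对应 8 个物理核心。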

使用虚拟介质启动时,服务器必须具有底板管理控制器 (BMC)。

配置主机固件以实现低延迟和高性能

在置备主机之前,需要先为裸机主机配置固件。固件配置取决于您的具体硬件和安装的特定要求。

步骤
  1. 将 **UEFI/BIOS 启动模式** 设置为 UEFI。

  2. 在主机启动顺序中,将**硬盘驱动器**设置为第一项。

  3. 应用您的硬件的特定固件配置。下表根据英特尔 FlexRAN 4G 和 5G 基带 PHY 参考设计,描述了英特尔至强 Skylake 服务器及更高版本硬件的代表性固件配置。

    确切的固件配置取决于您的特定硬件和网络要求。以下示例配置仅用于说明目的。

    表 2. 示例固件配置
    固件设置 配置

    CPU 电源和性能策略

    性能

    Uncore 频率缩放

    已禁用

    性能 P-limit

    已禁用

    增强型英特尔 SpeedStep® 技术

    已启用

    英特尔可配置 TDP

    已启用

    可配置 TDP 等级

    2 级

    英特尔® 睿频加速技术

    已启用

    节能睿频

    已禁用

    硬件 P 状态

    已禁用

    封装 C 状态

    C0/C1 状态

    C1E

    已禁用

    处理器 C6

    已禁用

为主机的固件启用全局 SR-IOV 和 VT-d 设置。这些设置与裸机环境相关。

托管集群网络的连接先决条件

在您可以使用 GitOps 零接触配置 (ZTP) 管道安装和配置托管集群之前,托管集群主机必须满足以下网络先决条件:

  • 集线器集群中的 GitOps ZTP 容器与目标裸机主机的底板管理控制器 (BMC) 之间必须存在双向连接。

  • 托管集群必须能够解析并访问集线器集群的 API 主机名和 *.apps 主机名。以下是集线器集群 API 主机名和 *.apps 主机名的示例:

    • api.hub-cluster.internal.domain.com

    • console-openshift-console.apps.hub-cluster.internal.domain.com

  • 集线器集群必须能够解析并访问托管集群的 API 主机名和 *.apps 主机名。以下是托管集群 API 主机名和 *.apps 主机名的示例:

    • api.sno-managed-cluster-1.internal.domain.com

    • console-openshift-console.apps.sno-managed-cluster-1.internal.domain.com
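下面是一个假设性的示例,演示如何检查对上述主机名的解析和访问;主机名沿用上文示例,实际值取决于您的环境:

$ nslookup api.sno-managed-cluster-1.internal.domain.com
$ nslookup console-openshift-console.apps.sno-managed-cluster-1.internal.domain.com
$ curl -kI https://console-openshift-console.apps.sno-managed-cluster-1.internal.domain.com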

使用 GitOps ZTP 的单节点 OpenShift 中的工作负载分区

工作负载分区将 OpenShift Container Platform 服务、集群管理工作负载和基础设施 Pod 配置为运行在预留数量的主机 CPU 上。

要使用 GitOps 零接触配置 (ZTP) 配置工作负载分区,您需要在用于安装集群的 SiteConfig 自定义资源 (CR) 中配置 cpuPartitioningMode 字段,并应用一个 PerformanceProfile CR 来配置主机上的 isolated 和 reserved CPU。

配置 SiteConfig CR 可在集群安装时启用工作负载分区,应用 PerformanceProfile CR 则配置 CPU 到预留集和隔离集的具体分配。这两个步骤发生在集群置备的不同阶段。

在 OpenShift Container Platform 4.13 中,使用 SiteConfig CR 中的 cpuPartitioningMode 字段配置工作负载分区是一项技术预览功能。

或者,您可以使用 SiteConfig 自定义资源 (CR) 的 cpuset 字段和组 PolicyGenerator 或 PolicyGentemplate CR 的 reserved 字段指定集群管理 CPU 资源。GitOps ZTP 管道使用这些值来填充工作负载分区 MachineConfig CR (cpuset) 和 PerformanceProfile CR (reserved) 中所需的字段,从而配置单节点 OpenShift 集群。此方法是 OpenShift Container Platform 4.14 中的正式发布 (GA) 功能。

工作负载分区配置将 OpenShift Container Platform 基础设施 Pod 固定到 reserved CPU 集。系统服务(例如 systemd、CRI-O 和 kubelet)运行在 reserved CPU 集上。isolated CPU 集专门分配给您的容器工作负载。隔离 CPU 可确保工作负载能够保证访问指定的 CPU,而不会与在同一节点上运行的其他应用程序发生冲突。所有未隔离的 CPU 都应预留。

确保 reserved 和 isolated CPU 集之间不重叠。
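下面是一个仅供说明的片段,展示在 PerformanceProfile CR 中互不重叠的 reserved 和 isolated CPU 集;此处假设主机共有 104 个逻辑 CPU,具体取值取决于您的硬件:

spec:
  cpu:
    reserved: "0-1,52-53"
    isolated: "2-51,54-103"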

其他资源
  • 有关推荐的单节点 OpenShift 工作负载分区配置,请参阅工作负载分区

关于使用 TPM 和 PCR 保护的磁盘加密

您可以使用SiteConfig自定义资源 (CR) 中的diskEncryption字段来配置使用可信平台模块 (TPM) 和平台配置寄存器 (PCR) 保护的磁盘加密。

TPM 是一个存储加密密钥并评估系统安全状态的硬件组件。TPM 中的 PCR 存储表示系统当前硬件和软件配置的哈希值。您可以使用以下 PCR 寄存器来保护磁盘加密的加密密钥:

PCR 1

表示统一可扩展固件接口 (UEFI) 状态。

PCR 7

表示安全启动状态。

TPM 通过将加密密钥链接到系统的当前状态(如 PCR 1 和 PCR 7 中记录的那样)来保护加密密钥。dmcrypt 实用程序使用这些密钥来加密磁盘。如果需要,加密密钥与预期 PCR 寄存器之间的绑定会在升级后自动更新。

在系统启动过程中,dmcrypt 实用程序使用 TPM PCR 值来解锁磁盘。如果当前 PCR 值与先前链接的值匹配,则解锁成功。如果 PCR 值不匹配,则无法释放加密密钥,磁盘将保持加密状态且无法访问。

使用SiteConfig CR 中的diskEncryption字段配置磁盘加密**仅为技术预览功能**。技术预览功能不受 Red Hat 生产服务级别协议 (SLA) 的支持,可能功能不完整。Red Hat 不建议在生产环境中使用它们。这些功能可让您抢先体验即将推出的产品功能,从而能够在开发过程中测试功能并提供反馈。

有关 Red Hat 技术预览功能的支持范围的更多信息,请参见技术预览功能支持范围

推荐的集群安装清单

ZTP 管道在集群安装期间应用以下自定义资源 (CR)。这些配置 CR 确保集群满足运行 vDU 应用程序所需的特性和性能要求。

使用 GitOps ZTP 插件和SiteConfig CR 进行集群部署时,默认情况下包含以下MachineConfig CR。

使用SiteConfig extraManifests过滤器来更改默认包含的 CR。有关更多信息,请参见使用 SiteConfig CR 进行高级托管集群配置

工作负载分区

运行 DU 工作负载的单节点 OpenShift 集群需要工作负载分区。这会限制允许运行平台服务的核心数,从而为应用程序工作负载保留尽可能多的 CPU 核心。

工作负载分区只能在集群安装期间启用。安装后无法禁用工作负载分区。但是,您可以通过PerformanceProfile CR 更改分配给隔离集和保留集的 CPU 集。更改 CPU 设置会导致节点重新启动。

从 OpenShift Container Platform 4.12 升级到 4.13+

当过渡到使用 cpuPartitioningMode 启用工作负载分区时,请从用于配置集群的 /extra-manifest 文件夹中删除工作负载分区 MachineConfig CR。
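例如,下面是一个假设性的操作,从 GitOps 仓库的 /extra-manifest 文件夹中删除旧的工作负载分区 MachineConfig;文件名仅为示例,请以您仓库中的实际文件为准:

$ git rm extra-manifest/02-master-workload-partitioning.yaml
$ git commit -m "Remove workload partitioning MachineConfig when using cpuPartitioningMode"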

推荐的工作负载分区SiteConfig CR 配置
apiVersion: ran.openshift.io/v1
kind: SiteConfig
metadata:
  name: "<site_name>"
  namespace: "<site_name>"
spec:
  baseDomain: "example.com"
  cpuPartitioningMode: AllNodes (1)
1 将 cpuPartitioningMode 字段设置为 AllNodes,以便为集群中的所有节点配置工作负载分区。
验证

检查应用程序和集群系统 CPU 固定是否正确。运行以下命令

  1. 打开到托管集群的远程 shell 提示符

    $ oc debug node/example-sno-1
  2. 检查 OpenShift 基础设施应用程序的 CPU 固定是否正确

    sh-4.4# pgrep ovn | while read i; do taskset -cp $i; done
    示例输出
    pid 8481's current affinity list: 0-1,52-53
    pid 8726's current affinity list: 0-1,52-53
    pid 9088's current affinity list: 0-1,52-53
    pid 9945's current affinity list: 0-1,52-53
    pid 10387's current affinity list: 0-1,52-53
    pid 12123's current affinity list: 0-1,52-53
    pid 13313's current affinity list: 0-1,52-53
  3. 检查系统应用程序的 CPU 固定是否正确

    sh-4.4# pgrep systemd | while read i; do taskset -cp $i; done
    示例输出
    pid 1's current affinity list: 0-1,52-53
    pid 938's current affinity list: 0-1,52-53
    pid 962's current affinity list: 0-1,52-53
    pid 1197's current affinity list: 0-1,52-53

减少的平台管理占用空间

为了减少平台的整体管理占用空间,需要一个 MachineConfig 自定义资源 (CR),它将所有 Kubernetes 特定的挂载点放置在一个与主机操作系统分开的新挂载命名空间中。以下 base64 编码的示例 MachineConfig CR 说明了此配置。

推荐的容器挂载命名空间配置 (01-container-mount-ns-and-kubelet-conf-master.yaml)
# Automatically generated by extra-manifests-builder
# Do not make changes directly.
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: container-mount-namespace-and-kubelet-conf-master
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      files:
        - contents:
            source: data:text/plain;charset=utf-8;base64,IyEvYmluL2Jhc2gKCmRlYnVnKCkgewogIGVjaG8gJEAgPiYyCn0KCnVzYWdlKCkgewogIGVjaG8gVXNhZ2U6ICQoYmFzZW5hbWUgJDApIFVOSVQgW2VudmZpbGUgW3Zhcm5hbWVdXQogIGVjaG8KICBlY2hvIEV4dHJhY3QgdGhlIGNvbnRlbnRzIG9mIHRoZSBmaXJzdCBFeGVjU3RhcnQgc3RhbnphIGZyb20gdGhlIGdpdmVuIHN5c3RlbWQgdW5pdCBhbmQgcmV0dXJuIGl0IHRvIHN0ZG91dAogIGVjaG8KICBlY2hvICJJZiAnZW52ZmlsZScgaXMgcHJvdmlkZWQsIHB1dCBpdCBpbiB0aGVyZSBpbnN0ZWFkLCBhcyBhbiBlbnZpcm9ubWVudCB2YXJpYWJsZSBuYW1lZCAndmFybmFtZSciCiAgZWNobyAiRGVmYXVsdCAndmFybmFtZScgaXMgRVhFQ1NUQVJUIGlmIG5vdCBzcGVjaWZpZWQiCiAgZXhpdCAxCn0KClVOSVQ9JDEKRU5WRklMRT0kMgpWQVJOQU1FPSQzCmlmIFtbIC16ICRVTklUIHx8ICRVTklUID09ICItLWhlbHAiIHx8ICRVTklUID09ICItaCIgXV07IHRoZW4KICB1c2FnZQpmaQpkZWJ1ZyAiRXh0cmFjdGluZyBFeGVjU3RhcnQgZnJvbSAkVU5JVCIKRklMRT0kKHN5c3RlbWN0bCBjYXQgJFVOSVQgfCBoZWFkIC1uIDEpCkZJTEU9JHtGSUxFI1wjIH0KaWYgW1sgISAtZiAkRklMRSBdXTsgdGhlbgogIGRlYnVnICJGYWlsZWQgdG8gZmluZCByb290IGZpbGUgZm9yIHVuaXQgJFVOSVQgKCRGSUxFKSIKICBleGl0CmZpCmRlYnVnICJTZXJ2aWNlIGRlZmluaXRpb24gaXMgaW4gJEZJTEUiCkVYRUNTVEFSVD0kKHNlZCAtbiAtZSAnL15FeGVjU3RhcnQ9LipcXCQvLC9bXlxcXSQvIHsgcy9eRXhlY1N0YXJ0PS8vOyBwIH0nIC1lICcvXkV4ZWNTdGFydD0uKlteXFxdJC8geyBzL15FeGVjU3RhcnQ9Ly87IHAgfScgJEZJTEUpCgppZiBbWyAkRU5WRklMRSBdXTsgdGhlbgogIFZBUk5BTUU9JHtWQVJOQU1FOi1FWEVDU1RBUlR9CiAgZWNobyAiJHtWQVJOQU1FfT0ke0VYRUNTVEFSVH0iID4gJEVOVkZJTEUKZWxzZQogIGVjaG8gJEVYRUNTVEFSVApmaQo=
          mode: 493
          path: /usr/local/bin/extractExecStart
        - contents:
            source: data:text/plain;charset=utf-8;base64,IyEvYmluL2Jhc2gKbnNlbnRlciAtLW1vdW50PS9ydW4vY29udGFpbmVyLW1vdW50LW5hbWVzcGFjZS9tbnQgIiRAIgo=
          mode: 493
          path: /usr/local/bin/nsenterCmns
    systemd:
      units:
        - contents: |
            [Unit]
            Description=Manages a mount namespace that both kubelet and crio can use to share their container-specific mounts

            [Service]
            Type=oneshot
            RemainAfterExit=yes
            RuntimeDirectory=container-mount-namespace
            Environment=RUNTIME_DIRECTORY=%t/container-mount-namespace
            Environment=BIND_POINT=%t/container-mount-namespace/mnt
            ExecStartPre=bash -c "findmnt ${RUNTIME_DIRECTORY} || mount --make-unbindable --bind ${RUNTIME_DIRECTORY} ${RUNTIME_DIRECTORY}"
            ExecStartPre=touch ${BIND_POINT}
            ExecStart=unshare --mount=${BIND_POINT} --propagation slave mount --make-rshared /
            ExecStop=umount -R ${RUNTIME_DIRECTORY}
          name: container-mount-namespace.service
        - dropins:
            - contents: |
                [Unit]
                Wants=container-mount-namespace.service
                After=container-mount-namespace.service

                [Service]
                ExecStartPre=/usr/local/bin/extractExecStart %n /%t/%N-execstart.env ORIG_EXECSTART
                EnvironmentFile=-/%t/%N-execstart.env
                ExecStart=
                ExecStart=bash -c "nsenter --mount=%t/container-mount-namespace/mnt \
                    ${ORIG_EXECSTART}"
              name: 90-container-mount-namespace.conf
          name: crio.service
        - dropins:
            - contents: |
                [Unit]
                Wants=container-mount-namespace.service
                After=container-mount-namespace.service

                [Service]
                ExecStartPre=/usr/local/bin/extractExecStart %n /%t/%N-execstart.env ORIG_EXECSTART
                EnvironmentFile=-/%t/%N-execstart.env
                ExecStart=
                ExecStart=bash -c "nsenter --mount=%t/container-mount-namespace/mnt \
                    ${ORIG_EXECSTART} --housekeeping-interval=30s"
              name: 90-container-mount-namespace.conf
            - contents: |
                [Service]
                Environment="OPENSHIFT_MAX_HOUSEKEEPING_INTERVAL_DURATION=60s"
                Environment="OPENSHIFT_EVICTION_MONITORING_PERIOD_DURATION=30s"
              name: 30-kubelet-interval-tuning.conf
          name: kubelet.service

SCTP

流控制传输协议 (SCTP) 是 RAN 应用程序中使用的关键协议。此MachineConfig对象将 SCTP 内核模块添加到节点以启用此协议。

推荐的控制平面节点 SCTP 配置 (03-sctp-machine-config-master.yaml)
# Automatically generated by extra-manifests-builder
# Do not make changes directly.
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: load-sctp-module-master
spec:
  config:
    ignition:
      version: 2.2.0
    storage:
      files:
        - contents:
            source: data:,
            verification: {}
          filesystem: root
          mode: 420
          path: /etc/modprobe.d/sctp-blacklist.conf
        - contents:
            source: data:text/plain;charset=utf-8,sctp
          filesystem: root
          mode: 420
          path: /etc/modules-load.d/sctp-load.conf
推荐的工作节点 SCTP 配置 (03-sctp-machine-config-worker.yaml)
# Automatically generated by extra-manifests-builder
# Do not make changes directly.
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: load-sctp-module-worker
spec:
  config:
    ignition:
      version: 2.2.0
    storage:
      files:
        - contents:
            source: data:,
            verification: {}
          filesystem: root
          mode: 420
          path: /etc/modprobe.d/sctp-blacklist.conf
        - contents:
            source: data:text/plain;charset=utf-8,sctp
          filesystem: root
          mode: 420
          path: /etc/modules-load.d/sctp-load.conf
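应用上述 MachineConfig 并重启节点后,可以使用类似下面的假设性检查确认 SCTP 内核模块已加载(节点名为占位符):

$ oc debug node/<node_name>
sh-4.4# chroot /host
sh-4.4# lsmod | grep sctp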

设置 rcu_normal

以下 MachineConfig CR 将系统配置为在系统完成启动后将 rcu_normal 设置为 1。这有助于改善 vDU 应用程序的内核延迟。

推荐的配置,用于在节点完成启动后禁用rcu_expedited (08-set-rcu-normal-master.yaml)
# Automatically generated by extra-manifests-builder
# Do not make changes directly.
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 08-set-rcu-normal-master
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      files:
        - contents:
            source: data:text/plain;charset=utf-8;base64,IyEvYmluL2Jhc2gKIwojIERpc2FibGUgcmN1X2V4cGVkaXRlZCBhZnRlciBub2RlIGhhcyBmaW5pc2hlZCBib290aW5nCiMKIyBUaGUgZGVmYXVsdHMgYmVsb3cgY2FuIGJlIG92ZXJyaWRkZW4gdmlhIGVudmlyb25tZW50IHZhcmlhYmxlcwojCgojIERlZmF1bHQgd2FpdCB0aW1lIGlzIDYwMHMgPSAxMG06Ck1BWElNVU1fV0FJVF9USU1FPSR7TUFYSU1VTV9XQUlUX1RJTUU6LTYwMH0KCiMgRGVmYXVsdCBzdGVhZHktc3RhdGUgdGhyZXNob2xkID0gMiUKIyBBbGxvd2VkIHZhbHVlczoKIyAgNCAgLSBhYnNvbHV0ZSBwb2QgY291bnQgKCsvLSkKIyAgNCUgLSBwZXJjZW50IGNoYW5nZSAoKy8tKQojICAtMSAtIGRpc2FibGUgdGhlIHN0ZWFkeS1zdGF0ZSBjaGVjawpTVEVBRFlfU1RBVEVfVEhSRVNIT0xEPSR7U1RFQURZX1NUQVRFX1RIUkVTSE9MRDotMiV9CgojIERlZmF1bHQgc3RlYWR5LXN0YXRlIHdpbmRvdyA9IDYwcwojIElmIHRoZSBydW5uaW5nIHBvZCBjb3VudCBzdGF5cyB3aXRoaW4gdGhlIGdpdmVuIHRocmVzaG9sZCBmb3IgdGhpcyB0aW1lCiMgcGVyaW9kLCByZXR1cm4gQ1BVIHV0aWxpemF0aW9uIHRvIG5vcm1hbCBiZWZvcmUgdGhlIG1heGltdW0gd2FpdCB0aW1lIGhhcwojIGV4cGlyZXMKU1RFQURZX1NUQVRFX1dJTkRPVz0ke1NURUFEWV9TVEFURV9XSU5ET1c6LTYwfQoKIyBEZWZhdWx0IHN0ZWFkeS1zdGF0ZSBhbGxvd3MgYW55IHBvZCBjb3VudCB0byBiZSAic3RlYWR5IHN0YXRlIgojIEluY3JlYXNpbmcgdGhpcyB3aWxsIHNraXAgYW55IHN0ZWFkeS1zdGF0ZSBjaGVja3MgdW50aWwgdGhlIGNvdW50IHJpc2VzIGFib3ZlCiMgdGhpcyBudW1iZXIgdG8gYXZvaWQgZmFsc2UgcG9zaXRpdmVzIGlmIHRoZXJlIGFyZSBzb21lIHBlcmlvZHMgd2hlcmUgdGhlCiMgY291bnQgZG9lc24ndCBpbmNyZWFzZSBidXQgd2Uga25vdyB3ZSBjYW4ndCBiZSBhdCBzdGVhZHktc3RhdGUgeWV0LgpTVEVBRFlfU1RBVEVfTUlOSU1VTT0ke1NURUFEWV9TVEFURV9NSU5JTVVNOi0wfQoKIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIwoKd2l0aGluKCkgewogIGxvY2FsIGxhc3Q9JDEgY3VycmVudD0kMiB0aHJlc2hvbGQ9JDMKICBsb2NhbCBkZWx0YT0wIHBjaGFuZ2UKICBkZWx0YT0kKCggY3VycmVudCAtIGxhc3QgKSkKICBpZiBbWyAkY3VycmVudCAtZXEgJGxhc3QgXV07IHRoZW4KICAgIHBjaGFuZ2U9MAogIGVsaWYgW1sgJGxhc3QgLWVxIDAgXV07IHRoZW4KICAgIHBjaGFuZ2U9MTAwMDAwMAogIGVsc2UKICAgIHBjaGFuZ2U9JCgoICggIiRkZWx0YSIgKiAxMDApIC8gbGFzdCApKQogIGZpCiAgZWNobyAtbiAibGFzdDokbGFzdCBjdXJyZW50OiRjdXJyZW50IGRlbHRhOiRkZWx0YSBwY2hhbmdlOiR7cGNoYW5nZX0lOiAiCiAgbG9jYWwgYWJzb2x1dGUgbGltaXQKICBjYXNlICR0aHJlc2hvbGQgaW4KICAgIColKQogICAgICBhYnNvbHV0ZT0ke3BjaGFuZ2UjIy19ICMgYWJzb2x1dGUgdmFsdWUKICAgICAgbGltaXQ9JHt0aHJlc2hvbGQlJSV9CiAgICAgIDs7CiAgICAqKQogICAgICBhYnNvbHV0ZT0ke2RlbHRhIyMtfSAjIGFic29sdXRlIHZhbHVlCiAgICAgIGxpbWl0PSR0aHJlc2hvbGQKICAgICAgOzsKICBlc2FjCiAgaWYgW1sgJGFic29sdXRlIC1sZSAkbGltaXQgXV07IHRoZW4KICAgIGVjaG8gIndpdGhpbiAoKy8tKSR0aHJlc2hvbGQiCiAgICByZXR1cm4gMAogIGVsc2UKICAgIGVjaG8gIm91dHNpZGUgKCsvLSkkdGhyZXNob2xkIgogICAgcmV0dXJuIDEKICBmaQp9CgpzdGVhZHlzdGF0ZSgpIHsKICBsb2NhbCBsYXN0PSQxIGN1cnJlbnQ9JDIKICBpZiBbWyAkbGFzdCAtbHQgJFNURUFEWV9TVEFURV9NSU5JTVVNIF1dOyB0aGVuCiAgICBlY2hvICJsYXN0OiRsYXN0IGN1cnJlbnQ6JGN1cnJlbnQgV2FpdGluZyB0byByZWFjaCAkU1RFQURZX1NUQVRFX01JTklNVU0gYmVmb3JlIGNoZWNraW5nIGZvciBzdGVhZHktc3RhdGUiCiAgICByZXR1cm4gMQogIGZpCiAgd2l0aGluICIkbGFzdCIgIiRjdXJyZW50IiAiJFNURUFEWV9TVEFURV9USFJFU0hPTEQiCn0KCndhaXRGb3JSZWFkeSgpIHsKICBsb2dnZXIgIlJlY292ZXJ5OiBXYWl0aW5nICR7TUFYSU1VTV9XQUlUX1RJTUV9cyBmb3IgdGhlIGluaXRpYWxpemF0aW9uIHRvIGNvbXBsZXRlIgogIGxvY2FsIHQ9MCBzPTEwCiAgbG9jYWwgbGFzdENjb3VudD0wIGNjb3VudD0wIHN0ZWFkeVN0YXRlVGltZT0wCiAgd2hpbGUgW1sgJHQgLWx0ICRNQVhJTVVNX1dBSVRfVElNRSBdXTsgZG8KICAgIHNsZWVwICRzCiAgICAoKHQgKz0gcykpCiAgICAjIERldGVjdCBzdGVhZHktc3RhdGUgcG9kIGNvdW50CiAgICBjY291bnQ9JChjcmljdGwgcHMgMj4vZGV2L251bGwgfCB3YyAtbCkKICAgIGlmIFtbICRjY291bnQgLWd0IDAgXV0gJiYgc3RlYWR5c3RhdGUgIiRsYXN0Q2NvdW50IiAiJGNjb3VudCI7IHRoZW4KICAgICAgKChzdGVhZHlTdGF0ZVRpbWUgKz0gcykpCiAgICAgIGVjaG8gIlN0ZWFkeS1zdGF0ZSBmb3IgJHtzdGVhZHlTdGF0ZVRpbWV9cy8ke1NURUFEWV9TVEFURV9XSU5ET1d9cyIKICAgICAgaWYgW1sgJHN0ZWFkeVN0YXRlVGltZSAtZ2UgJFNURUFEWV9TVEFURV9XSU5ET1cgXV07IHRoZW4KICAgIC
AgICBsb2dnZXIgIlJlY292ZXJ5OiBTdGVhZHktc3RhdGUgKCsvLSAkU1RFQURZX1NUQVRFX1RIUkVTSE9MRCkgZm9yICR7U1RFQURZX1NUQVRFX1dJTkRPV31zOiBEb25lIgogICAgICAgIHJldHVybiAwCiAgICAgIGZpCiAgICBlbHNlCiAgICAgIGlmIFtbICRzdGVhZHlTdGF0ZVRpbWUgLWd0IDAgXV07IHRoZW4KICAgICAgICBlY2hvICJSZXNldHRpbmcgc3RlYWR5LXN0YXRlIHRpbWVyIgogICAgICAgIHN0ZWFkeVN0YXRlVGltZT0wCiAgICAgIGZpCiAgICBmaQogICAgbGFzdENjb3VudD0kY2NvdW50CiAgZG9uZQogIGxvZ2dlciAiUmVjb3Zlcnk6IFJlY292ZXJ5IENvbXBsZXRlIFRpbWVvdXQiCn0KCnNldFJjdU5vcm1hbCgpIHsKICBlY2hvICJTZXR0aW5nIHJjdV9ub3JtYWwgdG8gMSIKICBlY2hvIDEgPiAvc3lzL2tlcm5lbC9yY3Vfbm9ybWFsCn0KCm1haW4oKSB7CiAgd2FpdEZvclJlYWR5CiAgZWNobyAiV2FpdGluZyBmb3Igc3RlYWR5IHN0YXRlIHRvb2s6ICQoYXdrICd7cHJpbnQgaW50KCQxLzM2MDApImgiLCBpbnQoKCQxJTM2MDApLzYwKSJtIiwgaW50KCQxJTYwKSJzIn0nIC9wcm9jL3VwdGltZSkiCiAgc2V0UmN1Tm9ybWFsCn0KCmlmIFtbICIke0JBU0hfU09VUkNFWzBdfSIgPSAiJHswfSIgXV07IHRoZW4KICBtYWluICIke0B9IgogIGV4aXQgJD8KZmkK
          mode: 493
          path: /usr/local/bin/set-rcu-normal.sh
    systemd:
      units:
        - contents: |
            [Unit]
            Description=Disable rcu_expedited after node has finished booting by setting rcu_normal to 1

            [Service]
            Type=simple
            ExecStart=/usr/local/bin/set-rcu-normal.sh

            # Maximum wait time is 600s = 10m:
            Environment=MAXIMUM_WAIT_TIME=600

            # Steady-state threshold = 2%
            # Allowed values:
            #  4  - absolute pod count (+/-)
            #  4% - percent change (+/-)
            #  -1 - disable the steady-state check
            # Note: '%' must be escaped as '%%' in systemd unit files
            Environment=STEADY_STATE_THRESHOLD=2%%

            # Steady-state window = 120s
            # If the running pod count stays within the given threshold for this time
            # period, return CPU utilization to normal before the maximum wait time has
            # expires
            Environment=STEADY_STATE_WINDOW=120

            # Steady-state minimum = 40
            # Increasing this will skip any steady-state checks until the count rises above
            # this number to avoid false positives if there are some periods where the
            # count doesn't increase but we know we can't be at steady-state yet.
            Environment=STEADY_STATE_MINIMUM=40

            [Install]
            WantedBy=multi-user.target
          enabled: true
          name: set-rcu-normal.service
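节点完成启动且 set-rcu-normal.service 运行后,可以用类似下面的假设性检查确认 rcu_normal 已设置为 1(节点名为占位符):

$ oc debug node/<node_name>
sh-4.4# chroot /host
sh-4.4# cat /sys/kernel/rcu_normal
1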

使用 kdump 自动进行内核崩溃转储

kdump 是 Linux 内核的一个功能,可在内核崩溃时创建内核崩溃转储。使用以下 MachineConfig CR 启用 kdump:

推荐的MachineConfig CR,用于从控制平面 kdump 日志中删除 ice 驱动程序 (05-kdump-config-master.yaml)
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 05-kdump-config-master
spec:
  config:
    ignition:
      version: 3.2.0
    systemd:
      units:
        - enabled: true
          name: kdump-remove-ice-module.service
          contents: |
            [Unit]
            Description=Remove ice module when doing kdump
            Before=kdump.service
            [Service]
            Type=oneshot
            RemainAfterExit=true
            ExecStart=/usr/local/bin/kdump-remove-ice-module.sh
            [Install]
            WantedBy=multi-user.target
    storage:
      files:
        - contents:
            source: data:text/plain;charset=utf-8;base64,IyEvdXNyL2Jpbi9lbnYgYmFzaAoKIyBUaGlzIHNjcmlwdCByZW1vdmVzIHRoZSBpY2UgbW9kdWxlIGZyb20ga2R1bXAgdG8gcHJldmVudCBrZHVtcCBmYWlsdXJlcyBvbiBjZXJ0YWluIHNlcnZlcnMuCiMgVGhpcyBpcyBhIHRlbXBvcmFyeSB3b3JrYXJvdW5kIGZvciBSSEVMUExBTi0xMzgyMzYgYW5kIGNhbiBiZSByZW1vdmVkIHdoZW4gdGhhdCBpc3N1ZSBpcwojIGZpeGVkLgoKc2V0IC14CgpTRUQ9Ii91c3IvYmluL3NlZCIKR1JFUD0iL3Vzci9iaW4vZ3JlcCIKCiMgb3ZlcnJpZGUgZm9yIHRlc3RpbmcgcHVycG9zZXMKS0RVTVBfQ09ORj0iJHsxOi0vZXRjL3N5c2NvbmZpZy9rZHVtcH0iClJFTU9WRV9JQ0VfU1RSPSJtb2R1bGVfYmxhY2tsaXN0PWljZSIKCiMgZXhpdCBpZiBmaWxlIGRvZXNuJ3QgZXhpc3QKWyAhIC1mICR7S0RVTVBfQ09ORn0gXSAmJiBleGl0IDAKCiMgZXhpdCBpZiBmaWxlIGFscmVhZHkgdXBkYXRlZAoke0dSRVB9IC1GcSAke1JFTU9WRV9JQ0VfU1RSfSAke0tEVU1QX0NPTkZ9ICYmIGV4aXQgMAoKIyBUYXJnZXQgbGluZSBsb29rcyBzb21ldGhpbmcgbGlrZSB0aGlzOgojIEtEVU1QX0NPTU1BTkRMSU5FX0FQUEVORD0iaXJxcG9sbCBucl9jcHVzPTEgLi4uIGhlc3RfZGlzYWJsZSIKIyBVc2Ugc2VkIHRvIG1hdGNoIGV2ZXJ5dGhpbmcgYmV0d2VlbiB0aGUgcXVvdGVzIGFuZCBhcHBlbmQgdGhlIFJFTU9WRV9JQ0VfU1RSIHRvIGl0CiR7U0VEfSAtaSAncy9eS0RVTVBfQ09NTUFORExJTkVfQVBQRU5EPSJbXiJdKi8mICcke1JFTU9WRV9JQ0VfU1RSfScvJyAke0tEVU1QX0NPTkZ9IHx8IGV4aXQgMAo=
          mode: 448
          path: /usr/local/bin/kdump-remove-ice-module.sh
推荐的控制平面节点 kdump 配置 (06-kdump-master.yaml)
# Automatically generated by extra-manifests-builder
# Do not make changes directly.
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 06-kdump-enable-master
spec:
  config:
    ignition:
      version: 3.2.0
    systemd:
      units:
        - enabled: true
          name: kdump.service
  kernelArguments:
    - crashkernel=512M
推荐的MachineConfig CR,用于从工作节点 kdump 日志中删除 ice 驱动程序 (05-kdump-config-worker.yaml)
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 05-kdump-config-worker
spec:
  config:
    ignition:
      version: 3.2.0
    systemd:
      units:
        - enabled: true
          name: kdump-remove-ice-module.service
          contents: |
            [Unit]
            Description=Remove ice module when doing kdump
            Before=kdump.service
            [Service]
            Type=oneshot
            RemainAfterExit=true
            ExecStart=/usr/local/bin/kdump-remove-ice-module.sh
            [Install]
            WantedBy=multi-user.target
    storage:
      files:
        - contents:
            source: data:text/plain;charset=utf-8;base64,IyEvdXNyL2Jpbi9lbnYgYmFzaAoKIyBUaGlzIHNjcmlwdCByZW1vdmVzIHRoZSBpY2UgbW9kdWxlIGZyb20ga2R1bXAgdG8gcHJldmVudCBrZHVtcCBmYWlsdXJlcyBvbiBjZXJ0YWluIHNlcnZlcnMuCiMgVGhpcyBpcyBhIHRlbXBvcmFyeSB3b3JrYXJvdW5kIGZvciBSSEVMUExBTi0xMzgyMzYgYW5kIGNhbiBiZSByZW1vdmVkIHdoZW4gdGhhdCBpc3N1ZSBpcwojIGZpeGVkLgoKc2V0IC14CgpTRUQ9Ii91c3IvYmluL3NlZCIKR1JFUD0iL3Vzci9iaW4vZ3JlcCIKCiMgb3ZlcnJpZGUgZm9yIHRlc3RpbmcgcHVycG9zZXMKS0RVTVBfQ09ORj0iJHsxOi0vZXRjL3N5c2NvbmZpZy9rZHVtcH0iClJFTU9WRV9JQ0VfU1RSPSJtb2R1bGVfYmxhY2tsaXN0PWljZSIKCiMgZXhpdCBpZiBmaWxlIGRvZXNuJ3QgZXhpc3QKWyAhIC1mICR7S0RVTVBfQ09ORn0gXSAmJiBleGl0IDAKCiMgZXhpdCBpZiBmaWxlIGFscmVhZHkgdXBkYXRlZAoke0dSRVB9IC1GcSAke1JFTU9WRV9JQ0VfU1RSfSAke0tEVU1QX0NPTkZ9ICYmIGV4aXQgMAoKIyBUYXJnZXQgbGluZSBsb29rcyBzb21ldGhpbmcgbGlrZSB0aGlzOgojIEtEVU1QX0NPTU1BTkRMSU5FX0FQUEVORD0iaXJxcG9sbCBucl9jcHVzPTEgLi4uIGhlc3RfZGlzYWJsZSIKIyBVc2Ugc2VkIHRvIG1hdGNoIGV2ZXJ5dGhpbmcgYmV0d2VlbiB0aGUgcXVvdGVzIGFuZCBhcHBlbmQgdGhlIFJFTU9WRV9JQ0VfU1RSIHRvIGl0CiR7U0VEfSAtaSAncy9eS0RVTVBfQ09NTUFORExJTkVfQVBQRU5EPSJbXiJdKi8mICcke1JFTU9WRV9JQ0VfU1RSfScvJyAke0tEVU1QX0NPTkZ9IHx8IGV4aXQgMAo=
          mode: 448
          path: /usr/local/bin/kdump-remove-ice-module.sh
推荐的 kdump 工作节点配置 (06-kdump-worker.yaml)
# Automatically generated by extra-manifests-builder
# Do not make changes directly.
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 06-kdump-enable-worker
spec:
  config:
    ignition:
      version: 3.2.0
    systemd:
      units:
        - enabled: true
          name: kdump.service
  kernelArguments:
    - crashkernel=512M
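应用上述配置并重启节点后,可以用类似下面的假设性检查确认 kdump 服务处于活动状态,并且内核命令行包含 crashkernel 预留(节点名为占位符):

$ oc debug node/<node_name>
sh-4.4# chroot /host
sh-4.4# systemctl is-active kdump
active
sh-4.4# grep -o "crashkernel=[^ ]*" /proc/cmdline
crashkernel=512M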

禁用自动 CRI-O 缓存清除

在不受控制的主机关闭或集群重新启动后,CRI-O 会自动删除整个 CRI-O 缓存,导致节点重新启动时所有镜像都从注册表中提取。这可能导致不可接受的慢速恢复时间或恢复失败。为了防止这种情况在您使用 GitOps ZTP 安装的单节点 OpenShift 集群中发生,请在集群安装期间禁用 CRI-O 删除缓存功能。

推荐的MachineConfig CR,用于在控制平面节点上禁用 CRI-O 缓存清除 (99-crio-disable-wipe-master.yaml)
# Automatically generated by extra-manifests-builder
# Do not make changes directly.
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 99-crio-disable-wipe-master
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      files:
        - contents:
            source: data:text/plain;charset=utf-8;base64,W2NyaW9dCmNsZWFuX3NodXRkb3duX2ZpbGUgPSAiIgo=
          mode: 420
          path: /etc/crio/crio.conf.d/99-crio-disable-wipe.toml
推荐的MachineConfig CR,用于在工作节点上禁用 CRI-O 缓存清除 (99-crio-disable-wipe-worker.yaml)
# Automatically generated by extra-manifests-builder
# Do not make changes directly.
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 99-crio-disable-wipe-worker
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      files:
        - contents:
            source: data:text/plain;charset=utf-8;base64,W2NyaW9dCmNsZWFuX3NodXRkb3duX2ZpbGUgPSAiIgo=
          mode: 420
          path: /etc/crio/crio.conf.d/99-crio-disable-wipe.toml
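上述 CR 中的 base64 内容解码后如下。它通过将 clean_shutdown_file 设置为空字符串来禁用 CRI-O 在异常关机后清除缓存的行为:

[crio]
clean_shutdown_file = ""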

配置 crun 为默认容器运行时

以下ContainerRuntimeConfig自定义资源 (CR) 将 crun 配置为控制平面和工作节点的默认 OCI 容器运行时。crun 容器运行时速度快、轻量级且内存占用空间小。

为了获得最佳性能,请在单节点 OpenShift、三节点 OpenShift 和标准集群中为控制平面和工作节点启用 crun。为了避免在应用该 CR 时导致集群重新启动,请将此更改作为 GitOps ZTP 的额外安装时(第 0 天)清单来应用。

推荐的控制平面节点ContainerRuntimeConfig CR (enable-crun-master.yaml)
apiVersion: machineconfiguration.openshift.io/v1
kind: ContainerRuntimeConfig
metadata:
  name: enable-crun-master
spec:
  machineConfigPoolSelector:
    matchLabels:
      pools.operator.machineconfiguration.openshift.io/master: ""
  containerRuntimeConfig:
    defaultRuntime: crun
推荐的工作节点ContainerRuntimeConfig CR (enable-crun-worker.yaml)
apiVersion: machineconfiguration.openshift.io/v1
kind: ContainerRuntimeConfig
metadata:
  name: enable-crun-worker
spec:
  machineConfigPoolSelector:
    matchLabels:
      pools.operator.machineconfiguration.openshift.io/worker: ""
  containerRuntimeConfig:
    defaultRuntime: crun
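应用这些 CR 后,可以使用类似下面的示例命令确认 ContainerRuntimeConfig 已创建并检查其内容(输出因集群而异):

$ oc get containerruntimeconfig
$ oc get containerruntimeconfig enable-crun-master -o yaml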

使用 TPM 和 PCR 保护启用磁盘加密

您可以使用SiteConfig自定义资源 (CR) 中的diskEncryption字段来配置使用可信平台模块 (TPM) 和平台配置寄存器 (PCR) 保护的磁盘加密。

配置SiteConfig CR 可在集群安装时启用磁盘加密。

先决条件
  • 您已安装 OpenShift CLI (oc)。

  • 您已以具有cluster-admin权限的用户身份登录。

  • 您已阅读“关于使用 TPM 和 PCR 保护的磁盘加密”部分。

步骤
  • 配置SiteConfig CR 中的spec.clusters.diskEncryption字段

    推荐的SiteConfig CR 配置,用于启用使用 PCR 保护的磁盘加密
    apiVersion: ran.openshift.io/v1
    kind: SiteConfig
    metadata:
      name: "encryption-tpm2"
      namespace: "encryption-tpm2"
    spec:
      clusters:
      - clusterName: "encryption-tpm2"
        clusterImageSetNameRef: "openshift-v4.13.0"
        diskEncryption:
          type: "tpm2" (1)
          tpm2:
            pcrList: "1,7" (2)
        nodes:
          - hostName: "node1"
            role: master
    1 将磁盘加密类型设置为tpm2
    2 配置要用于磁盘加密的 PCR 列表。您必须使用 PCR 寄存器 1 和 7。
验证
  • 通过运行以下命令检查是否启用了使用 TPM 和 PCR 保护的磁盘加密

    $ clevis luks list -d <disk_path> (1)
    1 将 <disk_path> 替换为磁盘的路径,例如 /dev/sda4。
    示例输出
    1: tpm2 '{"hash":"sha256","key":"ecc","pcr_bank":"sha256","pcr_ids":"1,7"}'

推荐的安装后集群配置

集群安装完成后,ZTP 管道会应用运行 DU 工作负载所需的以下自定义资源 (CR)。

在 GitOps ZTP v4.10 及更早版本中,您需要使用 MachineConfig CR 配置 UEFI 安全启动。在 GitOps ZTP v4.11 及更高版本中,这不再需要;您可以通过更新用于安装集群的 SiteConfig CR 中的 spec.clusters.nodes.bootMode 字段,为单节点 OpenShift 集群配置 UEFI 安全启动。更多信息,请参见使用 SiteConfig 和 GitOps ZTP 部署托管集群。
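下面是一个假设性的 SiteConfig 片段,演示如何通过 spec.clusters.nodes.bootMode 字段启用 UEFI 安全启动;集群名和主机名仅为示例:

spec:
  clusters:
    - clusterName: "example-sno"
      nodes:
        - hostName: "example-node1.example.com"
          bootMode: "UEFISecureBoot"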

操作符

运行 DU 工作负载的单节点 OpenShift 集群需要安装以下操作符:

  • 本地存储操作符

  • 日志操作符

  • PTP 操作符

  • SR-IOV 网络操作符

您还需要配置自定义的 CatalogSource CR,禁用默认的 OperatorHub 配置,并配置一个您安装的集群可以访问的 ImageContentSourcePolicy 镜像仓库。

推荐的存储操作符命名空间和操作符组配置 (StorageNS.yaml, StorageOperGroup.yaml)
---
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-local-storage
  annotations:
    workload.openshift.io/allowed: management
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: openshift-local-storage
  namespace: openshift-local-storage
  annotations: {}
spec:
  targetNamespaces:
    - openshift-local-storage
推荐的集群日志操作符命名空间和操作符组配置 (ClusterLogNS.yaml, ClusterLogOperGroup.yaml)
---
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-logging
  annotations:
    workload.openshift.io/allowed: management
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: cluster-logging
  namespace: openshift-logging
  annotations: {}
spec:
  targetNamespaces:
    - openshift-logging
推荐的 PTP 操作符命名空间和操作符组配置 (PtpSubscriptionNS.yaml, PtpSubscriptionOperGroup.yaml)
---
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-ptp
  annotations:
    workload.openshift.io/allowed: management
  labels:
    openshift.io/cluster-monitoring: "true"
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: ptp-operators
  namespace: openshift-ptp
  annotations: {}
spec:
  targetNamespaces:
    - openshift-ptp
推荐的 SR-IOV 操作符命名空间和操作符组配置 (SriovSubscriptionNS.yaml, SriovSubscriptionOperGroup.yaml)
---
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-sriov-network-operator
  annotations:
    workload.openshift.io/allowed: management
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: sriov-network-operators
  namespace: openshift-sriov-network-operator
  annotations: {}
spec:
  targetNamespaces:
    - openshift-sriov-network-operator
推荐的 CatalogSource 配置 (DefaultCatsrc.yaml)
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: default-cat-source
  namespace: openshift-marketplace
  annotations:
    target.workload.openshift.io/management: '{"effect": "PreferredDuringScheduling"}'
spec:
  displayName: default-cat-source
  image: $imageUrl
  publisher: Red Hat
  sourceType: grpc
  updateStrategy:
    registryPoll:
      interval: 1h
status:
  connectionState:
    lastObservedState: READY
推荐的 ImageContentSourcePolicy 配置 (DisconnectedICSP.yaml)
apiVersion: operator.openshift.io/v1alpha1
kind: ImageContentSourcePolicy
metadata:
  name: disconnected-internal-icsp
  annotations: {}
spec:
#    repositoryDigestMirrors:
#    - $mirrors
推荐的 OperatorHub 配置 (OperatorHub.yaml)
apiVersion: config.openshift.io/v1
kind: OperatorHub
metadata:
  name: cluster
  annotations: {}
spec:
  disableAllDefaultSources: true

操作符订阅

运行 DU 工作负载的单节点 OpenShift 集群需要以下 Subscription CR。订阅提供了下载以下操作符的位置:

  • 本地存储操作符

  • 日志操作符

  • PTP 操作符

  • SR-IOV 网络操作符

  • SRIOV-FEC 操作符

对于每个操作符订阅,请指定从中获取操作符的通道。推荐的通道是 stable。

您可以指定 Manual 或 Automatic 更新。在 Automatic 模式下,操作符会在注册表中有新版本可用时自动更新到通道中的最新版本。在 Manual 模式下,只有在明确批准后才会安装新的操作符版本。

请对订阅使用 Manual 模式。这允许您控制操作符更新的时间,使其适应计划的维护窗口。
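下面是一个示例,演示如何在维护窗口期间查找并批准处于待批准状态的 InstallPlan(以 openshift-logging 命名空间为例,安装计划名称为占位符):

$ oc get installplan -n openshift-logging
$ oc patch installplan <install_plan_name> -n openshift-logging --type merge --patch '{"spec":{"approved":true}}'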

推荐的本地存储操作符订阅 (StorageSubscription.yaml)
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: local-storage-operator
  namespace: openshift-local-storage
  annotations: {}
spec:
  channel: "stable"
  name: local-storage-operator
  source: redhat-operators-disconnected
  sourceNamespace: openshift-marketplace
  installPlanApproval: Manual
status:
  state: AtLatestKnown
推荐的 SR-IOV 操作符订阅 (SriovSubscription.yaml)
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: sriov-network-operator-subscription
  namespace: openshift-sriov-network-operator
  annotations: {}
spec:
  channel: "stable"
  name: sriov-network-operator
  source: redhat-operators-disconnected
  sourceNamespace: openshift-marketplace
  installPlanApproval: Manual
status:
  state: AtLatestKnown
推荐的 PTP 操作符订阅 (PtpSubscription.yaml)
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: ptp-operator-subscription
  namespace: openshift-ptp
  annotations: {}
spec:
  channel: "stable"
  name: ptp-operator
  source: redhat-operators-disconnected
  sourceNamespace: openshift-marketplace
  installPlanApproval: Manual
status:
  state: AtLatestKnown
推荐的集群日志操作符订阅 (ClusterLogSubscription.yaml)
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: cluster-logging
  namespace: openshift-logging
  annotations: {}
spec:
  channel: "stable-6.0"
  name: cluster-logging
  source: redhat-operators-disconnected
  sourceNamespace: openshift-marketplace
  installPlanApproval: Manual
status:
  state: AtLatestKnown

集群日志记录和日志转发

运行 DU 工作负载的单节点 OpenShift 集群需要日志记录和日志转发功能,以便进行调试。需要以下自定义资源 (CR):

推荐的 ClusterLogForwarder.yaml
apiVersion: "observability.openshift.io/v1"
kind: ClusterLogForwarder
metadata:
  name: instance
  namespace: openshift-logging
  annotations: {}
spec:
  # outputs: $outputs
  # pipelines: $pipelines
  serviceAccount:
    name: logcollector
#apiVersion: "observability.openshift.io/v1"
#kind: ClusterLogForwarder
#metadata:
#  name: instance
#  namespace: openshift-logging
# spec:
#   outputs:
#   - type: "kafka"
#     name: kafka-open
#     # below url is an example
#     kafka:
#       url: tcp://10.46.55.190:9092/test
#   filters:
#   - name: test-labels
#     type: openshiftLabels
#     openshiftLabels:
#       label1: test1
#       label2: test2
#       label3: test3
#       label4: test4
#   pipelines:
#   - name: all-to-default
#     inputRefs:
#     - audit
#     - infrastructure
#     filterRefs:
#     - test-labels
#     outputRefs:
#     - kafka-open
#   serviceAccount:
#     name: logcollector

将 spec.outputs.kafka.url 字段设置为要将日志转发到的 Kafka 服务器的 URL。

推荐的 ClusterLogNS.yaml
---
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-logging
  annotations:
    workload.openshift.io/allowed: management
推荐的 ClusterLogOperGroup.yaml
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: cluster-logging
  namespace: openshift-logging
  annotations: {}
spec:
  targetNamespaces:
    - openshift-logging
推荐的 ClusterLogServiceAccount.yaml
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: logcollector
  namespace: openshift-logging
  annotations: {}
推荐的 ClusterLogServiceAccountAuditBinding.yaml
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: logcollector-audit-logs-binding
  annotations: {}
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: collect-audit-logs
subjects:
  - kind: ServiceAccount
    name: logcollector
    namespace: openshift-logging
推荐的 ClusterLogServiceAccountInfrastructureBinding.yaml
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: logcollector-infrastructure-logs-binding
  annotations: {}
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: collect-infrastructure-logs
subjects:
  - kind: ServiceAccount
    name: logcollector
    namespace: openshift-logging
推荐的 ClusterLogSubscription.yaml
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: cluster-logging
  namespace: openshift-logging
  annotations: {}
spec:
  channel: "stable-6.0"
  name: cluster-logging
  source: redhat-operators-disconnected
  sourceNamespace: openshift-marketplace
  installPlanApproval: Manual
status:
  state: AtLatestKnown

性能配置

运行 DU 工作负载的单节点 OpenShift 集群需要 Node Tuning Operator 性能配置文件 (PerformanceProfile),以便使用实时主机功能和服务。

在早期版本的 OpenShift Container Platform 中,使用 Performance Addon Operator 来实现自动调整,以实现 OpenShift 应用程序的低延迟性能。在 OpenShift Container Platform 4.11 及更高版本中,此功能是 Node Tuning Operator 的一部分。

以下示例 PerformanceProfile CR 说明了所需的单节点 OpenShift 集群配置。

推荐的性能配置 (PerformanceProfile.yaml)
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  # if you change this name make sure the 'include' line in TunedPerformancePatch.yaml
  # matches this name: include=openshift-node-performance-${PerformanceProfile.metadata.name}
  # Also in file 'validatorCRs/informDuValidator.yaml': 
  # name: 50-performance-${PerformanceProfile.metadata.name}
  name: openshift-node-performance-profile
  annotations:
    ran.openshift.io/reference-configuration: "ran-du.redhat.com"
spec:
  additionalKernelArgs:
    - "rcupdate.rcu_normal_after_boot=0"
    - "efi=runtime"
    - "vfio_pci.enable_sriov=1"
    - "vfio_pci.disable_idle_d3=1"
    - "module_blacklist=irdma"
  cpu:
    isolated: $isolated
    reserved: $reserved
  hugepages:
    defaultHugepagesSize: $defaultHugepagesSize
    pages:
      - size: $size
        count: $count
        node: $node
  machineConfigPoolSelector:
    pools.operator.machineconfiguration.openshift.io/$mcp: ""
  nodeSelector:
    node-role.kubernetes.io/$mcp: ''
  numa:
    topologyPolicy: "restricted"
  # To use the standard (non-realtime) kernel, set enabled to false
  realTimeKernel:
    enabled: true
  workloadHints:
    # WorkloadHints defines the set of upper level flags for different type of workloads.
    # See https://github.com/openshift/cluster-node-tuning-operator/blob/master/docs/performanceprofile/performance_profile.md#workloadhints
    # for detailed descriptions of each item.
    # The configuration below is set for a low latency, performance mode.
    realTime: true
    highPowerConsumption: false
    perPodPowerManagement: false
表 3. 单节点 OpenShift 集群的 PerformanceProfile CR 选项
PerformanceProfile CR 字段 描述

metadata.name

确保 name 与相关 GitOps ZTP 自定义资源 (CR) 中设置的以下字段匹配:

  • TunedPerformancePatch.yaml 中的 include=openshift-node-performance-${PerformanceProfile.metadata.name}

  • validatorCRs/informDuValidator.yaml 中的 name: 50-performance-${PerformanceProfile.metadata.name}

spec.additionalKernelArgs

"efi=runtime" 为集群主机配置 UEFI 安全启动。

spec.cpu.isolated

设置隔离的 CPU。确保所有超线程对都匹配。

保留和隔离的 CPU 池不得重叠,并且必须一起涵盖所有可用的内核。未计算在内的 CPU 内核会导致系统出现未定义的行为。

spec.cpu.reserved

设置保留的 CPU。启用工作负载分区时,系统进程、内核线程和系统容器线程将限制在这些 CPU 上。所有未隔离的 CPU 都应保留。

spec.hugepages.pages

  • 设置大页数量 (count)。

  • 设置大页大小 (size)。

  • 将 node 设置为分配大页的 NUMA 节点。

spec.realTimeKernel

将 enabled 设置为 true 以使用实时内核。

spec.workloadHints

使用 workloadHints 定义不同类型工作负载的顶级标志集。此示例配置将集群配置为低延迟和高性能。
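下面是一个仅供说明的片段,展示表 3 中的占位符替换为具体值后的样子;CPU 布局、大页数量和 NUMA 节点均为假设值,需要根据您的硬件进行调整:

spec:
  cpu:
    isolated: "2-51,54-103"
    reserved: "0-1,52-53"
  hugepages:
    defaultHugepagesSize: 1G
    pages:
      - size: 1G
        count: 32
        node: 0
  machineConfigPoolSelector:
    pools.operator.machineconfiguration.openshift.io/master: ""
  nodeSelector:
    node-role.kubernetes.io/master: ""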

配置集群时间同步

为控制平面或工作节点运行一次性系统时间同步作业。

推荐的控制平面节点一次性时间同步 (99-sync-time-once-master.yaml)
# Automatically generated by extra-manifests-builder
# Do not make changes directly.
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 99-sync-time-once-master
spec:
  config:
    ignition:
      version: 3.2.0
    systemd:
      units:
        - contents: |
            [Unit]
            Description=Sync time once
            After=network-online.target
            Wants=network-online.target
            [Service]
            Type=oneshot
            TimeoutStartSec=300
            ExecCondition=/bin/bash -c 'systemctl is-enabled chronyd.service --quiet && exit 1 || exit 0'
            ExecStart=/usr/sbin/chronyd -n -f /etc/chrony.conf -q
            RemainAfterExit=yes
            [Install]
            WantedBy=multi-user.target
          enabled: true
          name: sync-time-once.service
推荐的工作节点一次性时间同步 (99-sync-time-once-worker.yaml)
# Automatically generated by extra-manifests-builder
# Do not make changes directly.
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 99-sync-time-once-worker
spec:
  config:
    ignition:
      version: 3.2.0
    systemd:
      units:
        - contents: |
            [Unit]
            Description=Sync time once
            After=network-online.target
            Wants=network-online.target
            [Service]
            Type=oneshot
            TimeoutStartSec=300
            ExecCondition=/bin/bash -c 'systemctl is-enabled chronyd.service --quiet && exit 1 || exit 0'
            ExecStart=/usr/sbin/chronyd -n -f /etc/chrony.conf -q
            RemainAfterExit=yes
            [Install]
            WantedBy=multi-user.target
          enabled: true
          name: sync-time-once.service

PTP

单节点 OpenShift 集群使用精确时间协议 (PTP) 进行网络时间同步。以下示例 PtpConfig CR 说明了普通时钟、边界时钟和主时钟所需的 PTP 配置。您应用的确切配置将取决于节点硬件和具体用例。

推荐的 PTP 普通时钟配置 (PtpConfigSlave.yaml)
apiVersion: ptp.openshift.io/v1
kind: PtpConfig
metadata:
  name: ordinary
  namespace: openshift-ptp
  annotations: {}
spec:
  profile:
    - name: "ordinary"
      # The interface name is hardware-specific
      interface: $interface
      ptp4lOpts: "-2 -s"
      phc2sysOpts: "-a -r -n 24"
      ptpSchedulingPolicy: SCHED_FIFO
      ptpSchedulingPriority: 10
      ptpSettings:
        logReduce: "true"
      ptp4lConf: |
        [global]
        #
        # Default Data Set
        #
        twoStepFlag 1
        slaveOnly 1
        priority1 128
        priority2 128
        domainNumber 24
        #utc_offset 37
        clockClass 255
        clockAccuracy 0xFE
        offsetScaledLogVariance 0xFFFF
        free_running 0
        freq_est_interval 1
        dscp_event 0
        dscp_general 0
        dataset_comparison G.8275.x
        G.8275.defaultDS.localPriority 128
        #
        # Port Data Set
        #
        logAnnounceInterval -3
        logSyncInterval -4
        logMinDelayReqInterval -4
        logMinPdelayReqInterval -4
        announceReceiptTimeout 3
        syncReceiptTimeout 0
        delayAsymmetry 0
        fault_reset_interval -4
        neighborPropDelayThresh 20000000
        masterOnly 0
        G.8275.portDS.localPriority 128
        #
        # Run time options
        #
        assume_two_step 0
        logging_level 6
        path_trace_enabled 0
        follow_up_info 0
        hybrid_e2e 0
        inhibit_multicast_service 0
        net_sync_monitor 0
        tc_spanning_tree 0
        tx_timestamp_timeout 50
        unicast_listen 0
        unicast_master_table 0
        unicast_req_duration 3600
        use_syslog 1
        verbose 0
        summary_interval 0
        kernel_leap 1
        check_fup_sync 0
        clock_class_threshold 7
        #
        # Servo Options
        #
        pi_proportional_const 0.0
        pi_integral_const 0.0
        pi_proportional_scale 0.0
        pi_proportional_exponent -0.3
        pi_proportional_norm_max 0.7
        pi_integral_scale 0.0
        pi_integral_exponent 0.4
        pi_integral_norm_max 0.3
        step_threshold 2.0
        first_step_threshold 0.00002
        max_frequency 900000000
        clock_servo pi
        sanity_freq_limit 200000000
        ntpshm_segment 0
        #
        # Transport options
        #
        transportSpecific 0x0
        ptp_dst_mac 01:1B:19:00:00:00
        p2p_dst_mac 01:80:C2:00:00:0E
        udp_ttl 1
        udp6_scope 0x0E
        uds_address /var/run/ptp4l
        #
        # Default interface options
        #
        clock_type OC
        network_transport L2
        delay_mechanism E2E
        time_stamping hardware
        tsproc_mode filter
        delay_filter moving_median
        delay_filter_length 10
        egressLatency 0
        ingressLatency 0
        boundary_clock_jbod 0
        #
        # Clock description
        #
        productDescription ;;
        revisionData ;;
        manufacturerIdentity 00:00:00
        userDescription ;
        timeSource 0xA0
  recommend:
    - profile: "ordinary"
      priority: 4
      match:
        - nodeLabel: "node-role.kubernetes.io/$mcp"
推荐的边界时钟配置 (PtpConfigBoundary.yaml)
apiVersion: ptp.openshift.io/v1
kind: PtpConfig
metadata:
  name: boundary
  namespace: openshift-ptp
  annotations: {}
spec:
  profile:
    - name: "boundary"
      ptp4lOpts: "-2"
      phc2sysOpts: "-a -r -n 24"
      ptpSchedulingPolicy: SCHED_FIFO
      ptpSchedulingPriority: 10
      ptpSettings:
        logReduce: "true"
      ptp4lConf: |
        # The interface name is hardware-specific
        [$iface_slave]
        masterOnly 0
        [$iface_master_1]
        masterOnly 1
        [$iface_master_2]
        masterOnly 1
        [$iface_master_3]
        masterOnly 1
        [global]
        #
        # Default Data Set
        #
        twoStepFlag 1
        slaveOnly 0
        priority1 128
        priority2 128
        domainNumber 24
        #utc_offset 37
        clockClass 248
        clockAccuracy 0xFE
        offsetScaledLogVariance 0xFFFF
        free_running 0
        freq_est_interval 1
        dscp_event 0
        dscp_general 0
        dataset_comparison G.8275.x
        G.8275.defaultDS.localPriority 128
        #
        # Port Data Set
        #
        logAnnounceInterval -3
        logSyncInterval -4
        logMinDelayReqInterval -4
        logMinPdelayReqInterval -4
        announceReceiptTimeout 3
        syncReceiptTimeout 0
        delayAsymmetry 0
        fault_reset_interval -4
        neighborPropDelayThresh 20000000
        masterOnly 0
        G.8275.portDS.localPriority 128
        #
        # Run time options
        #
        assume_two_step 0
        logging_level 6
        path_trace_enabled 0
        follow_up_info 0
        hybrid_e2e 0
        inhibit_multicast_service 0
        net_sync_monitor 0
        tc_spanning_tree 0
        tx_timestamp_timeout 50
        unicast_listen 0
        unicast_master_table 0
        unicast_req_duration 3600
        use_syslog 1
        verbose 0
        summary_interval 0
        kernel_leap 1
        check_fup_sync 0
        clock_class_threshold 135
        #
        # Servo Options
        #
        pi_proportional_const 0.0
        pi_integral_const 0.0
        pi_proportional_scale 0.0
        pi_proportional_exponent -0.3
        pi_proportional_norm_max 0.7
        pi_integral_scale 0.0
        pi_integral_exponent 0.4
        pi_integral_norm_max 0.3
        step_threshold 2.0
        first_step_threshold 0.00002
        max_frequency 900000000
        clock_servo pi
        sanity_freq_limit 200000000
        ntpshm_segment 0
        #
        # Transport options
        #
        transportSpecific 0x0
        ptp_dst_mac 01:1B:19:00:00:00
        p2p_dst_mac 01:80:C2:00:00:0E
        udp_ttl 1
        udp6_scope 0x0E
        uds_address /var/run/ptp4l
        #
        # Default interface options
        #
        clock_type BC
        network_transport L2
        delay_mechanism E2E
        time_stamping hardware
        tsproc_mode filter
        delay_filter moving_median
        delay_filter_length 10
        egressLatency 0
        ingressLatency 0
        boundary_clock_jbod 0
        #
        # Clock description
        #
        productDescription ;;
        revisionData ;;
        manufacturerIdentity 00:00:00
        userDescription ;
        timeSource 0xA0
  recommend:
    - profile: "boundary"
      priority: 4
      match:
        - nodeLabel: "node-role.kubernetes.io/$mcp"
推荐的 PTP Westport Channel e810 主时钟配置 (PtpConfigGmWpc.yaml)
# The grandmaster profile is provided for testing only
# It is not installed on production clusters
apiVersion: ptp.openshift.io/v1
kind: PtpConfig
metadata:
  name: grandmaster
  namespace: openshift-ptp
  annotations: {}
spec:
  profile:
    - name: "grandmaster"
      ptp4lOpts: "-2 --summary_interval -4"
      phc2sysOpts: -r -u 0 -m -w -N 8 -R 16 -s $iface_master -n 24
      ptpSchedulingPolicy: SCHED_FIFO
      ptpSchedulingPriority: 10
      ptpSettings:
        logReduce: "true"
      plugins:
        e810:
          enableDefaultConfig: false
          settings:
            LocalMaxHoldoverOffSet: 1500
            LocalHoldoverTimeout: 14400
            MaxInSpecOffset: 100
          pins: $e810_pins
          #  "$iface_master":
          #    "U.FL2": "0 2"
          #    "U.FL1": "0 1"
          #    "SMA2": "0 2"
          #    "SMA1": "0 1"
          ublxCmds:
            - args: #ubxtool -P 29.20 -z CFG-HW-ANT_CFG_VOLTCTRL,1
                - "-P"
                - "29.20"
                - "-z"
                - "CFG-HW-ANT_CFG_VOLTCTRL,1"
              reportOutput: false
            - args: #ubxtool -P 29.20 -e GPS
                - "-P"
                - "29.20"
                - "-e"
                - "GPS"
              reportOutput: false
            - args: #ubxtool -P 29.20 -d Galileo
                - "-P"
                - "29.20"
                - "-d"
                - "Galileo"
              reportOutput: false
            - args: #ubxtool -P 29.20 -d GLONASS
                - "-P"
                - "29.20"
                - "-d"
                - "GLONASS"
              reportOutput: false
            - args: #ubxtool -P 29.20 -d BeiDou
                - "-P"
                - "29.20"
                - "-d"
                - "BeiDou"
              reportOutput: false
            - args: #ubxtool -P 29.20 -d SBAS
                - "-P"
                - "29.20"
                - "-d"
                - "SBAS"
              reportOutput: false
            - args: #ubxtool -P 29.20 -t -w 5 -v 1 -e SURVEYIN,600,50000
                - "-P"
                - "29.20"
                - "-t"
                - "-w"
                - "5"
                - "-v"
                - "1"
                - "-e"
                - "SURVEYIN,600,50000"
              reportOutput: true
            - args: #ubxtool -P 29.20 -p MON-HW
                - "-P"
                - "29.20"
                - "-p"
                - "MON-HW"
              reportOutput: true
            - args: #ubxtool -P 29.20 -p CFG-MSG,1,38,248
                - "-P"
                - "29.20"
                - "-p"
                - "CFG-MSG,1,38,248"
              reportOutput: true
      ts2phcOpts: " "
      ts2phcConf: |
        [nmea]
        ts2phc.master 1
        [global]
        use_syslog  0
        verbose 1
        logging_level 7
        ts2phc.pulsewidth 100000000
        #cat /dev/GNSS to find available serial port
        #example value of gnss_serialport is /dev/ttyGNSS_1700_0
        ts2phc.nmea_serialport $gnss_serialport
        leapfile  /usr/share/zoneinfo/leap-seconds.list
        [$iface_master]
        ts2phc.extts_polarity rising
        ts2phc.extts_correction 0
      ptp4lConf: |
        [$iface_master]
        masterOnly 1
        [$iface_master_1]
        masterOnly 1
        [$iface_master_2]
        masterOnly 1
        [$iface_master_3]
        masterOnly 1
        [global]
        #
        # Default Data Set
        #
        twoStepFlag 1
        priority1 128
        priority2 128
        domainNumber 24
        #utc_offset 37
        clockClass 6
        clockAccuracy 0x27
        offsetScaledLogVariance 0xFFFF
        free_running 0
        freq_est_interval 1
        dscp_event 0
        dscp_general 0
        dataset_comparison G.8275.x
        G.8275.defaultDS.localPriority 128
        #
        # Port Data Set
        #
        logAnnounceInterval -3
        logSyncInterval -4
        logMinDelayReqInterval -4
        logMinPdelayReqInterval 0
        announceReceiptTimeout 3
        syncReceiptTimeout 0
        delayAsymmetry 0
        fault_reset_interval -4
        neighborPropDelayThresh 20000000
        masterOnly 0
        G.8275.portDS.localPriority 128
        #
        # Run time options
        #
        assume_two_step 0
        logging_level 6
        path_trace_enabled 0
        follow_up_info 0
        hybrid_e2e 0
        inhibit_multicast_service 0
        net_sync_monitor 0
        tc_spanning_tree 0
        tx_timestamp_timeout 50
        unicast_listen 0
        unicast_master_table 0
        unicast_req_duration 3600
        use_syslog 1
        verbose 0
        summary_interval -4
        kernel_leap 1
        check_fup_sync 0
        clock_class_threshold 7
        #
        # Servo Options
        #
        pi_proportional_const 0.0
        pi_integral_const 0.0
        pi_proportional_scale 0.0
        pi_proportional_exponent -0.3
        pi_proportional_norm_max 0.7
        pi_integral_scale 0.0
        pi_integral_exponent 0.4
        pi_integral_norm_max 0.3
        step_threshold 2.0
        first_step_threshold 0.00002
        clock_servo pi
        sanity_freq_limit  200000000
        ntpshm_segment 0
        #
        # Transport options
        #
        transportSpecific 0x0
        ptp_dst_mac 01:1B:19:00:00:00
        p2p_dst_mac 01:80:C2:00:00:0E
        udp_ttl 1
        udp6_scope 0x0E
        uds_address /var/run/ptp4l
        #
        # Default interface options
        #
        clock_type BC
        network_transport L2
        delay_mechanism E2E
        time_stamping hardware
        tsproc_mode filter
        delay_filter moving_median
        delay_filter_length 10
        egressLatency 0
        ingressLatency 0
        boundary_clock_jbod 0
        #
        # Clock description
        #
        productDescription ;;
        revisionData ;;
        manufacturerIdentity 00:00:00
        userDescription ;
        timeSource 0x20
  recommend:
    - profile: "grandmaster"
      priority: 4
      match:
        - nodeLabel: "node-role.kubernetes.io/$mcp"

以下可选的 PtpOperatorConfig CR 配置节点的 PTP 事件报告。

推荐的 PTP 事件配置 (PtpOperatorConfigForEvent.yaml)
apiVersion: ptp.openshift.io/v1
kind: PtpOperatorConfig
metadata:
  name: default
  namespace: openshift-ptp
  annotations: {}
spec:
  daemonNodeSelector:
    node-role.kubernetes.io/$mcp: ""
  ptpEventConfig:
    apiVersion: $event_api_version
    enableEventPublisher: true
    transportHost: "http://ptp-event-publisher-service-NODE_NAME.openshift-ptp.svc.cluster.local:9043"

扩展调整配置文件

运行 DU 工作负载的单节点 OpenShift 集群需要额外的性能调整配置才能满足高性能工作负载的要求。以下示例 Tuned CR 扩展了 Tuned 配置文件:

推荐的扩展Tuned 配置文件配置 (TunedPerformancePatch.yaml)
apiVersion: tuned.openshift.io/v1
kind: Tuned
metadata:
  name: performance-patch
  namespace: openshift-cluster-node-tuning-operator
  annotations: {}
spec:
  profile:
    - name: performance-patch
      # Please note:
      # - The 'include' line must match the associated PerformanceProfile name, following below pattern
      #   include=openshift-node-performance-${PerformanceProfile.metadata.name}
      # - When using the standard (non-realtime) kernel, remove the kernel.timer_migration override from
      #   the [sysctl] section and remove the entire section if it is empty.
      data: |
        [main]
        summary=Configuration changes profile inherited from performance created tuned
        include=openshift-node-performance-openshift-node-performance-profile
        [scheduler]
        group.ice-ptp=0:f:10:*:ice-ptp.*
        group.ice-gnss=0:f:10:*:ice-gnss.*
        group.ice-dplls=0:f:10:*:ice-dplls.*
        [service]
        service.stalld=start,enable
        service.chronyd=stop,disable
  recommend:
    - machineConfigLabels:
        machineconfiguration.openshift.io/role: "$mcp"
      priority: 19
      profile: performance-patch
表 4. 单节点 OpenShift 集群的Tuned CR 选项
Tuned CR 字段 描述

spec.profile.data

  • spec.profile.data中设置的include行必须与关联的PerformanceProfile CR 名称匹配。例如,include=openshift-node-performance-${PerformanceProfile.metadata.name}

SR-IOV

单根 I/O 虚拟化 (SR-IOV) 常用于启用前传和中传网络。以下 YAML 示例配置了单节点 OpenShift 集群的 SR-IOV。

SriovNetwork CR 的配置将根据您的特定网络和基础设施需求而有所不同。

推荐的SriovOperatorConfig CR 配置 (SriovOperatorConfig.yaml)
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovOperatorConfig
metadata:
  name: default
  namespace: openshift-sriov-network-operator
  annotations: {}
spec:
  configDaemonNodeSelector:
    "node-role.kubernetes.io/$mcp": ""
  # Injector and OperatorWebhook pods can be disabled (set to "false") below
  # to reduce the number of management pods. It is recommended to start with the 
  # webhook and injector pods enabled, and only disable them after verifying the
  # correctness of user manifests.
  #   If the injector is disabled, containers using sr-iov resources must explicitly assign
  #   them in the  "requests"/"limits" section of the container spec, for example:
  #    containers:
  #    - name: my-sriov-workload-container
  #      resources:
  #        limits:
  #          openshift.io/<resource_name>:  "1"
  #        requests:
  #          openshift.io/<resource_name>:  "1"
  enableInjector: false
  enableOperatorWebhook: false
  logLevel: 0
表 5. 单节点 OpenShift 集群的SriovOperatorConfig CR 选项
SriovOperatorConfig CR 字段 描述

spec.enableInjector

禁用 Injector Pod 以减少管理 Pod 的数量。先启用 Injector Pod,仅在验证用户清单正确后再禁用它们。如果禁用 Injector,则使用 SR-IOV 资源的容器必须在容器规范的 requests/limits 部分中显式分配这些资源。

例如

containers:
- name: my-sriov-workload-container
  resources:
    limits:
      openshift.io/<resource_name>:  "1"
    requests:
      openshift.io/<resource_name>:  "1"

spec.enableOperatorWebhook

禁用 OperatorWebhook Pod 以减少管理 Pod 的数量。先启用 OperatorWebhook Pod,并在验证用户清单后才禁用它们。

推荐的SriovNetwork 配置 (SriovNetwork.yaml)
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: ""
  namespace: openshift-sriov-network-operator
  annotations: {}
spec:
  #  resourceName: ""
  networkNamespace: openshift-sriov-network-operator
#  vlan: ""
#  spoofChk: ""
#  ipam: ""
#  linkState: ""
#  maxTxRate: ""
#  minTxRate: ""
#  vlanQoS: ""
#  trust: ""
#  capabilities: ""
表 6. 单节点 OpenShift 集群的SriovNetwork CR 选项
SriovNetwork CR 字段 描述

spec.vlan

使用中传网络的 VLAN 配置 vlan 字段。

推荐的SriovNetworkNodePolicy CR 配置 (SriovNetworkNodePolicy.yaml)
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: $name
  namespace: openshift-sriov-network-operator
  annotations: {}
spec:
  # The attributes for Mellanox/Intel based NICs as below.
  #     deviceType: netdevice/vfio-pci
  #     isRdma: true/false
  deviceType: $deviceType
  isRdma: $isRdma
  nicSelector:
    # The exact physical function name must match the hardware used    
    pfNames: [$pfNames]
  nodeSelector:
    node-role.kubernetes.io/$mcp: ""
  numVfs: $numVfs
  priority: $priority
  resourceName: $resourceName
表 7. 单节点 OpenShift 集群的SriovNetworkPolicy CR 选项
SriovNetworkNodePolicy CR 字段 描述

spec.deviceType

将 deviceType 配置为 vfio-pci 或 netdevice。对于 Mellanox 网卡,设置 deviceType: netdevice 和 isRdma: true。对于英特尔网卡,设置 deviceType: vfio-pci 和 isRdma: false。

spec.nicSelector.pfNames

指定连接到前传网络的接口。

spec.numVfs

指定前传网络的 VF 数量。

spec.nicSelector.pfNames

物理功能的确切名称必须与硬件匹配。
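下面是一个假设性的示例,展示表 7 中的占位符填入英特尔网卡前传场景的具体值后的样子;接口名、VF 数量和资源名均为示例,请根据实际硬件调整:

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: sriov-nnp-du-fh
  namespace: openshift-sriov-network-operator
spec:
  deviceType: vfio-pci
  isRdma: false
  nicSelector:
    pfNames: ["ens5f0"]
  nodeSelector:
    node-role.kubernetes.io/master: ""
  numVfs: 8
  priority: 10
  resourceName: du_fh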

推荐的 SR-IOV 内核配置 (07-sriov-related-kernel-args-master.yaml)
# Automatically generated by extra-manifests-builder
# Do not make changes directly.
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 07-sriov-related-kernel-args-master
spec:
  config:
    ignition:
      version: 3.2.0
  kernelArguments:
    - intel_iommu=on
    - iommu=pt

控制台操作符

使用集群功能 (cluster capabilities) 特性来阻止安装控制台操作符。当集群由中心集群集中管理时,不需要该操作符。移除此操作符可以为应用程序工作负载腾出额外的空间和容量。

要在托管集群的安装过程中禁用控制台操作符,请在SiteConfig自定义资源 (CR) 的spec.clusters.0.installConfigOverrides字段中设置以下内容:

installConfigOverrides:  "{\"capabilities\":{\"baselineCapabilitySet\": \"None\" }}"
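下面是一个假设性的 SiteConfig 片段,展示该字段在 spec.clusters 下的位置(集群名仅为示例):

spec:
  clusters:
    - clusterName: "example-sno"
      installConfigOverrides: "{\"capabilities\":{\"baselineCapabilitySet\": \"None\" }}"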

Alertmanager

运行 DU 工作负载的单节点 OpenShift 集群需要减少 OpenShift Container Platform 监控组件消耗的 CPU 资源。以下ConfigMap自定义资源 (CR) 禁用了 Alertmanager。

推荐的集群监控配置 (ReduceMonitoringFootprint.yaml)
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
  annotations: {}
data:
  config.yaml: |
    alertmanagerMain:
      enabled: false
    telemeterClient:
      enabled: false
    prometheusK8s:
       retention: 24h

操作符生命周期管理器

运行分布式单元工作负载的单节点 OpenShift 集群需要持续访问 CPU 资源。操作符生命周期管理器 (OLM) 定期从操作符收集性能数据,导致 CPU 利用率增加。以下ConfigMap自定义资源 (CR) 禁用了 OLM 收集操作符性能数据的功能。

推荐的集群 OLM 配置 (ReduceOLMFootprint.yaml)
apiVersion: v1
kind: ConfigMap
metadata:
  name: collect-profiles-config
  namespace: openshift-operator-lifecycle-manager
data:
  pprof-config.yaml: |
    disabled: True

LVM 存储

您可以使用逻辑卷管理器 (LVM) 存储在单节点 OpenShift 集群上动态配置本地存储。

单节点 OpenShift 的推荐存储解决方案是本地存储操作符。或者,您可以使用 LVM 存储,但这需要分配额外的 CPU 资源。

以下 YAML 示例配置了节点的存储,使其可用于 OpenShift Container Platform 应用程序。

推荐的LVMCluster 配置 (StorageLVMCluster.yaml)
apiVersion: lvm.topolvm.io/v1alpha1
kind: LVMCluster
metadata:
  name: lvmcluster
  namespace: openshift-storage
  annotations: {}
spec: {}
#example: creating a vg1 volume group leveraging all available disks on the node
#         except the installation disk.
#  storage:
#    deviceClasses:
#    - name: vg1
#      thinPoolConfig:
#        name: thin-pool-1
#        sizePercent: 90
#        overprovisionRatio: 10
表 8. 单节点 OpenShift 集群的LVMCluster CR 选项
LVMCluster CR 字段 描述

deviceSelector.paths

配置用于 LVM 存储的磁盘。如果未指定磁盘,则 LVM 存储将使用指定精简池中所有未使用的磁盘。
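下面是一个假设性的示例,演示如何通过 deviceSelector.paths 显式指定 LVM 存储使用的磁盘(磁盘路径仅为示例):

spec:
  storage:
    deviceClasses:
      - name: vg1
        deviceSelector:
          paths:
            - /dev/disk/by-path/pci-0000:11:00.0-nvme-1
        thinPoolConfig:
          name: thin-pool-1
          sizePercent: 90
          overprovisionRatio: 10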

网络诊断

运行 DU 工作负载的单节点 OpenShift 集群需要减少 Pod 间网络连接检查,以减少这些 Pod 产生的额外负载。以下自定义资源 (CR) 禁用了这些检查。

推荐的网络诊断配置 (DisableSnoNetworkDiag.yaml)
apiVersion: operator.openshift.io/v1
kind: Network
metadata:
  name: cluster
  annotations: {}
spec:
  disableNetworkDiagnostics: true