Expanding single-node OpenShift clusters with GitOps ZTP | Edge computing | OpenShift Container Platform 4.17

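Applying profiles to the worker node with PolicyGenerator or PolicyGenTemplate resources
Ensuring PTP and SR-IOV daemon selector compatibility
PTP and SR-IOV node selector compatibility
Using PolicyGenerator CRs to apply worker node policies to worker nodes
Using PolicyGenTemplate CRs to apply worker node policies to worker nodes
Adding worker nodes to single-node OpenShift clusters with GitOps ZTP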

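You can expand single-node OpenShift clusters with GitOps Zero Touch Provisioning (ZTP). When you add worker nodes to a single-node OpenShift cluster, the original single-node OpenShift cluster retains the control plane node role. Adding worker nodes does not require any downtime for the existing single-node OpenShift cluster.

Although there is no specified limit on the number of worker nodes that you can add to a single-node OpenShift cluster, you must reevaluate the reserved CPU allocation on the control plane node for the additional worker nodes.

If you require workload partitioning on the worker node, you must deploy and remediate the managed cluster policies on the hub cluster before installing the node. This way, the workload partitioning MachineConfig objects are rendered and associated with the worker machine config pool before the GitOps ZTP workflow applies the MachineConfig ignition file to the worker node.

It is recommended that you remediate the policies first, and then install the worker node. If you create the workload partitioning manifests after installing the worker node, you must drain the node manually and delete all the pods managed by daemon sets. When the managing daemon sets create the new pods, the new pods undergo the workload partitioning process.

Adding worker nodes to single-node OpenShift clusters with GitOps ZTP is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.

For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.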

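Additional resources

For more information about single-node OpenShift clusters tuned for vDU application deployments, see Reference configuration for deploying vDUs on single-node OpenShift.
For more information about worker nodes, see Adding worker nodes to single-node OpenShift clusters.
For information about removing a worker node from an expanded single-node OpenShift cluster, see Removing managed cluster nodes by using the command line interface.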

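Applying profiles to the worker node with PolicyGenerator or PolicyGenTemplate resources

You can configure the additional worker node with a DU profile.

You can apply a RAN distributed unit (DU) profile to the worker node cluster by using the GitOps Zero Touch Provisioning (ZTP) common, group, and site-specific PolicyGenerator or PolicyGenTemplate resources. The GitOps ZTP pipeline that is linked to the ArgoCD policies application includes the following CRs, which you can find in the relevant out/argocd/example folder when you extract the ztp-site-generate container.

/acmpolicygenerator resources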

acm-common-ranGen.yaml
acm-group-du-sno-ranGen.yaml
acm-example-sno-site.yaml
ns.yaml
kustomization.yaml

/policygentemplates resources

common-ranGen.yaml
group-du-sno-ranGen.yaml
example-sno-site.yaml
ns.yaml
kustomization.yaml

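Configuring the DU profile on the worker node is considered an upgrade. To initiate the upgrade flow, you must update the existing policies or create additional ones. Then, you must create a ClusterGroupUpgrade CR to reconcile the policies in the group of clusters.

Ensuring PTP and SR-IOV daemon selector compatibility

If the DU profile was deployed by using the GitOps Zero Touch Provisioning (ZTP) plugin version 4.11 or earlier, the PTP and SR-IOV Operators might be configured to place the daemons only on nodes labeled as master. This configuration prevents the PTP and SR-IOV daemons from running on the worker node. If the PTP and SR-IOV daemon node selectors are incorrectly configured on your system, you must change the daemon node selectors before proceeding with the worker DU profile configuration.

Procedure

Check the daemon node selector settings of the PTP Operator on one of the spoke clusters: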
```
$ oc get ptpoperatorconfig/default -n openshift-ptp -ojsonpath='{.spec}' | jq
```
Example output for PTP Operator
```
{"daemonNodeSelector":{"node-role.kubernetes.io/master":""}} (1)
```
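1 If the node selector is set to master, the spoke was deployed with the version of the GitOps ZTP plugin that requires changes.

Check the daemon node selector settings of the SR-IOV Operator on one of the spoke clusters:
```
$ oc get sriovoperatorconfig/default -n \
openshift-sriov-network-operator -ojsonpath='{.spec}' | jq
```
Example output for SR-IOV Operator
```
{"configDaemonNodeSelector":{"node-role.kubernetes.io/worker":""},"disableDrain":false,"enableInjector":true,"enableOperatorWebhook":true} (1)
```
1 If the node selector is set to master, the spoke was deployed with the version of the GitOps ZTP plugin that requires changes.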
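The two checks above can be wrapped in a small helper. The following is a minimal sketch, not part of the GitOps ZTP tooling: the function name `needs_selector_change` is hypothetical, and it only searches the JSON text for the `node-role.kubernetes.io/master` key rather than parsing the selector field. The sample inputs are the example outputs shown above.

```shell
#!/bin/sh
# Hypothetical helper: reads an Operator config `.spec` as JSON on stdin and
# reports whether it still references master-labeled nodes.
needs_selector_change() {
  if grep -qF '"node-role.kubernetes.io/master"'; then
    echo "change-required"
  else
    echo "ok"
  fi
}

# Sample inputs copied from the example outputs above.
ptp_spec='{"daemonNodeSelector":{"node-role.kubernetes.io/master":""}}'
sriov_spec='{"configDaemonNodeSelector":{"node-role.kubernetes.io/worker":""},"disableDrain":false,"enableInjector":true,"enableOperatorWebhook":true}'

ptp_result=$(printf '%s\n' "$ptp_spec" | needs_selector_change)
sriov_result=$(printf '%s\n' "$sriov_spec" | needs_selector_change)
echo "PTP: $ptp_result"
echo "SR-IOV: $sriov_result"
```

On a live spoke cluster, you could pipe the output of either `oc get ... -ojsonpath='{.spec}'` command into the function instead of using the sample strings.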

In the group policy, add the following complianceType and spec entries:

spec:
    - fileName: PtpOperatorConfig.yaml
      policyName: "config-policy"
      complianceType: mustonlyhave
      spec:
        daemonNodeSelector:
          node-role.kubernetes.io/worker: ""
    - fileName: SriovOperatorConfig.yaml
      policyName: "config-policy"
      complianceType: mustonlyhave
      spec:
        configDaemonNodeSelector:
          node-role.kubernetes.io/worker: ""

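Changing the daemonNodeSelector field causes temporary PTP synchronization loss and SR-IOV connectivity loss.

Commit the changes in Git, and then push to the Git repository that is monitored by the GitOps ZTP ArgoCD application.

PTP and SR-IOV node selector compatibility

The PTP configuration resources and SR-IOV network node policies use node-role.kubernetes.io/master: "" as the node selector. If the additional worker nodes have the same NIC configuration as the control plane node, you can reuse the policies that configure the control plane node for the worker nodes. However, you must change the node selector so that it selects both node types, for example by using the "node-role.kubernetes.io/worker" label. This label works for both because the control plane node of a single-node OpenShift cluster also carries the worker label.

Using PolicyGenerator CRs to apply worker node policies to worker nodes

You can create policies for worker nodes by using PolicyGenerator CRs.

Procedure

Create the following PolicyGenerator CR: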

apiVersion: policy.open-cluster-management.io/v1
kind: PolicyGenerator
metadata:
    name: example-sno-workers
placementBindingDefaults:
    name: example-sno-workers-placement-binding
policyDefaults:
    namespace: example-sno
    placement:
        labelSelector:
            matchExpressions:
                - key: sites
                  operator: In
                  values:
                    - example-sno (1)
    remediationAction: inform
    severity: low
    namespaceSelector:
        exclude:
            - kube-*
        include:
            - '*'
    evaluationInterval:
        compliant: 10m
        noncompliant: 10s
policies:
    - name: example-sno-workers-config-policy
      policyAnnotations:
        ran.openshift.io/ztp-deploy-wave: "10"
      manifests:
        - path: source-crs/MachineConfigGeneric.yaml (2)
          patches:
            - metadata:
                labels:
                    machineconfiguration.openshift.io/role: worker (3)
                name: enable-workload-partitioning
              spec:
                config:
                    storage:
                        files:
                            - contents:
                                source: data:text/plain;charset=utf-8;base64,W2NyaW8ucnVudGltZS53b3JrbG9hZHMubWFuYWdlbWVudF0KYWN0aXZhdGlvbl9hbm5vdGF0aW9uID0gInRhcmdldC53b3JrbG9hZC5vcGVuc2hpZnQuaW8vbWFuYWdlbWVudCIKYW5ub3RhdGlvbl9wcmVmaXggPSAicmVzb3VyY2VzLndvcmtsb2FkLm9wZW5zaGlmdC5pbyIKcmVzb3VyY2VzID0geyAiY3B1c2hhcmVzIiA9IDAsICJjcHVzZXQiID0gIjAtMyIgfQo=
                              mode: 420
                              overwrite: true
                              path: /etc/crio/crio.conf.d/01-workload-partitioning
                              user:
                                name: root
                            - contents:
                                source: data:text/plain;charset=utf-8;base64,ewogICJtYW5hZ2VtZW50IjogewogICAgImNwdXNldCI6ICIwLTMiCiAgfQp9Cg==
                              mode: 420
                              overwrite: true
                              path: /etc/kubernetes/openshift-workload-pinning
                              user:
                                name: root
        - path: source-crs/PerformanceProfile-MCP-worker.yaml
          patches:
            - metadata:
                name: openshift-worker-node-performance-profile
              spec:
                cpu: (4)
                    isolated: 4-47
                    reserved: 0-3
                hugepages:
                    defaultHugepagesSize: 1G
                    pages:
                        - count: 32
                          size: 1G
                realTimeKernel:
                    enabled: true
        - path: source-crs/TunedPerformancePatch-MCP-worker.yaml
          patches:
            - metadata:
                name: performance-patch-worker
              spec:
                profile:
                    - data: |
                        [main]
                        summary=Configuration changes profile inherited from performance created tuned
                        include=openshift-node-performance-openshift-worker-node-performance-profile
                        [bootloader]
                        cmdline_crash=nohz_full=4-47 (5)
                        [sysctl]
                        kernel.timer_migration=1
                        [scheduler]
                        group.ice-ptp=0:f:10:*:ice-ptp.*
                        [service]
                        service.stalld=start,enable
                        service.chronyd=stop,disable
                      name: performance-patch-worker
                recommend:
                    - profile: performance-patch-worker

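1 The policies are applied to all clusters with this label.
2 This generic `MachineConfig` CR is used to configure workload partitioning on the worker node.
3 The `MCP` field must be set to `worker`. In this CR, that is the `machineconfiguration.openshift.io/role` label.
4 The `cpu.isolated` and `cpu.reserved` fields must be configured for each particular hardware platform.
5 The `cmdline_crash` CPU set must match the `cpu.isolated` set in the `PerformanceProfile` section.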
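Callout 5 can be checked mechanically before you commit the CR. The following is a minimal sketch that assumes you copy the two values into shell variables by hand; the variable names are illustrative, and the comparison is a plain string match, so equivalent CPU sets written differently (for example `4-23,24-47`) are reported as a mismatch.

```shell
#!/bin/sh
# Values that would normally be copied from the PerformanceProfile and Tuned
# patches; these match the example CR.
isolated="4-47"
tuned_data='[bootloader]
cmdline_crash=nohz_full=4-47'

# Extract the CPU set that follows "cmdline_crash=nohz_full=".
nohz_full=$(printf '%s\n' "$tuned_data" | sed -n 's/^cmdline_crash=nohz_full=//p')

if [ "$nohz_full" = "$isolated" ]; then
  cpu_check="match"
else
  cpu_check="mismatch"
fi
echo "isolated=$isolated nohz_full=$nohz_full -> $cpu_check"
```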

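A generic MachineConfig CR is used to configure workload partitioning on the worker node. You can generate the content of the crio and kubelet configuration files.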
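One way to generate the two `source` values is to write the file contents with the reserved CPU set and base64-encode them. The following is a sketch under stated assumptions: the `cpuset` variable and the `encode` function are illustrative names, and the file bodies are the decoded form of the example values in the CR above. With `cpuset="0-3"`, the second data URL reproduces the `openshift-workload-pinning` value shown in the CR.

```shell
#!/bin/sh
# Assumed management CPU set; it matches the example values (0-3).
cpuset="0-3"

# CRI-O drop-in that defines the management workload type.
crio_conf=$(cat <<EOF
[crio.runtime.workloads.management]
activation_annotation = "target.workload.openshift.io/management"
annotation_prefix = "resources.workload.openshift.io"
resources = { "cpushares" = 0, "cpuset" = "${cpuset}" }
EOF
)

# Kubelet workload pinning file.
pinning_conf=$(cat <<EOF
{
  "management": {
    "cpuset": "${cpuset}"
  }
}
EOF
)

# Encode a file body (plus its trailing newline) as a single-line data URL.
encode() {
  printf 'data:text/plain;charset=utf-8;base64,%s\n' \
    "$(printf '%s\n' "$1" | base64 | tr -d '\n')"
}

crio_source=$(encode "$crio_conf")
pinning_source=$(encode "$pinning_conf")
echo "$crio_source"
echo "$pinning_source"
```

Piping through `tr -d '\n'` removes the line wrapping that some `base64` implementations add, so each value stays on one line in the YAML.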

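Add the created policy template to the Git repository monitored by the ArgoCD policies application.
Add the policy in the kustomization.yaml file.
Commit the changes in Git, and then push to the Git repository that is monitored by the GitOps ZTP ArgoCD application.

To remediate the new policies to your spoke cluster, create a TALM custom resource: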

$ cat <<EOF | oc apply -f -
apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
  name: example-sno-worker-policies
  namespace: default
spec:
  backup: false
  clusters:
  - example-sno
  enable: true
  managedPolicies:
  - group-du-sno-config-policy
  - example-sno-workers-config-policy
  - example-sno-config-policy
  preCaching: false
  remediationStrategy:
    maxConcurrency: 1
EOF

Using PolicyGenTemplate CRs to apply worker node policies to worker nodes

You can create policies for worker nodes by using PolicyGenTemplate CRs.

Procedure

Create the following PolicyGenTemplate CR:

apiVersion: ran.openshift.io/v1
kind: PolicyGenTemplate
metadata:
  name: "example-sno-workers"
  namespace: "example-sno"
spec:
  bindingRules:
    sites: "example-sno" (1)
  mcp: "worker" (2)
  sourceFiles:
    - fileName: MachineConfigGeneric.yaml (3)
      policyName: "config-policy"
      metadata:
        labels:
          machineconfiguration.openshift.io/role: worker
        name: enable-workload-partitioning
      spec:
        config:
          storage:
            files:
            - contents:
                source: data:text/plain;charset=utf-8;base64,W2NyaW8ucnVudGltZS53b3JrbG9hZHMubWFuYWdlbWVudF0KYWN0aXZhdGlvbl9hbm5vdGF0aW9uID0gInRhcmdldC53b3JrbG9hZC5vcGVuc2hpZnQuaW8vbWFuYWdlbWVudCIKYW5ub3RhdGlvbl9wcmVmaXggPSAicmVzb3VyY2VzLndvcmtsb2FkLm9wZW5zaGlmdC5pbyIKcmVzb3VyY2VzID0geyAiY3B1c2hhcmVzIiA9IDAsICJjcHVzZXQiID0gIjAtMyIgfQo=
              mode: 420
              overwrite: true
              path: /etc/crio/crio.conf.d/01-workload-partitioning
              user:
                name: root
            - contents:
                source: data:text/plain;charset=utf-8;base64,ewogICJtYW5hZ2VtZW50IjogewogICAgImNwdXNldCI6ICIwLTMiCiAgfQp9Cg==
              mode: 420
              overwrite: true
              path: /etc/kubernetes/openshift-workload-pinning
              user:
                name: root
    - fileName: PerformanceProfile.yaml
      policyName: "config-policy"
      metadata:
        name: openshift-worker-node-performance-profile
      spec:
        cpu: (4)
          isolated: "4-47"
          reserved: "0-3"
        hugepages:
          defaultHugepagesSize: 1G
          pages:
            - size: 1G
              count: 32
        realTimeKernel:
          enabled: true
    - fileName: TunedPerformancePatch.yaml
      policyName: "config-policy"
      metadata:
        name: performance-patch-worker
      spec:
        profile:
          - name: performance-patch-worker
            data: |
              [main]
              summary=Configuration changes profile inherited from performance created tuned
              include=openshift-node-performance-openshift-worker-node-performance-profile
              [bootloader]
              cmdline_crash=nohz_full=4-47 (5)
              [sysctl]
              kernel.timer_migration=1
              [scheduler]
              group.ice-ptp=0:f:10:*:ice-ptp.*
              [service]
              service.stalld=start,enable
              service.chronyd=stop,disable
        recommend:
        - profile: performance-patch-worker

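1 The policies are applied to all clusters with this label.
2 The `MCP` field must be set to `worker`.
3 This generic `MachineConfig` CR is used to configure workload partitioning on the worker node.
4 The `cpu.isolated` and `cpu.reserved` fields must be configured for each particular hardware platform.
5 The `cmdline_crash` CPU set must match the `cpu.isolated` set in the `PerformanceProfile` section.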

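A generic MachineConfig CR is used to configure workload partitioning on the worker node. You can generate the content of the crio and kubelet configuration files.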
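To review what an existing `source` value contains before you change it, you can decode it. The following sketch decodes the `openshift-workload-pinning` value from the CR above; it assumes a `base64` command that accepts `-d`, as GNU coreutils does.

```shell
#!/bin/sh
# `source` value of the openshift-workload-pinning file from the CR above.
source_url='data:text/plain;charset=utf-8;base64,ewogICJtYW5hZ2VtZW50IjogewogICAgImNwdXNldCI6ICIwLTMiCiAgfQp9Cg=='

# Strip everything up to and including "base64," and decode the remainder.
payload=${source_url#*base64,}
decoded=$(printf '%s' "$payload" | base64 -d)
printf '%s\n' "$decoded"
```

The decoded JSON shows the management `cpuset`, which in this example is `0-3`, the same range as `cpu.reserved`.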

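Add the created policy template to the Git repository monitored by the ArgoCD policies application.
Add the policy in the kustomization.yaml file.
Commit the changes in Git, and then push to the Git repository that is monitored by the GitOps ZTP ArgoCD application.

To remediate the new policies to your spoke cluster, create a TALM custom resource: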

$ cat <<EOF | oc apply -f -
apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
  name: example-sno-worker-policies
  namespace: default
spec:
  backup: false
  clusters:
  - example-sno
  enable: true
  managedPolicies:
  - group-du-sno-config-policy
  - example-sno-workers-config-policy
  - example-sno-config-policy
  preCaching: false
  remediationStrategy:
    maxConcurrency: 1
EOF

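Adding worker nodes to single-node OpenShift clusters with GitOps ZTP

You can add one or more worker nodes to existing single-node OpenShift clusters to increase the CPU resources available in the cluster.

Prerequisites

RHACM 2.6 or later is installed and configured in an OpenShift Container Platform 4.11 or later bare-metal hub cluster.
Topology Aware Lifecycle Manager is installed in the hub cluster.
Red Hat OpenShift GitOps is installed in the hub cluster.
You are using the GitOps ZTP ztp-site-generate container image version 4.12 or later.
A managed single-node OpenShift cluster is deployed with GitOps ZTP.
Central Infrastructure Management is configured as described in the RHACM documentation.
The DNS serving the cluster is configured to resolve the internal API endpoint api-int.<cluster_name>.<base_domain>.

Procedure

If you deployed your cluster by using the example-sno.yaml SiteConfig manifest, add your new worker node to the spec.clusters['example-sno'].nodes list: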

nodes:
- hostName: "example-node2.example.com"
  role: "worker"
  bmcAddress: "idrac-virtualmedia+https://[1111:2222:3333:4444::bbbb:1]/redfish/v1/Systems/System.Embedded.1"
  bmcCredentialsName:
    name: "example-node2-bmh-secret"
  bootMACAddress: "AA:BB:CC:DD:EE:11"
  bootMode: "UEFI"
  nodeNetwork:
    interfaces:
      - name: eno1
        macAddress: "AA:BB:CC:DD:EE:11"
    config:
      interfaces:
        - name: eno1
          type: ethernet
          state: up
          macAddress: "AA:BB:CC:DD:EE:11"
          ipv4:
            enabled: false
          ipv6:
            enabled: true
            address:
            - ip: 1111:2222:3333:4444::1
              prefix-length: 64
      dns-resolver:
        config:
          search:
          - example.com
          server:
          - 1111:2222:3333:4444::2
      routes:
        config:
        - destination: ::/0
          next-hop-interface: eno1
          next-hop-address: 1111:2222:3333:4444::1
          table-id: 254

Create a BMC authentication secret for the new host, as referenced by the bmcCredentialsName field in the spec.nodes section of your SiteConfig file:

apiVersion: v1
data:
  password: "password"
  username: "username"
kind: Secret
metadata:
  name: "example-node2-bmh-secret"
  namespace: example-sno
type: Opaque
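Kubernetes expects the values under `data` in a Secret to be base64-encoded, so the `username` and `password` strings above stand for encoded values. The following is a minimal sketch that renders the manifest from plain-text credentials; the variable names and the credential values are placeholders, not real BMC accounts.

```shell
#!/bin/sh
# Placeholder credentials; replace them with the real BMC account values.
bmc_username="username"
bmc_password="password"

# printf avoids the trailing newline that echo would add to the encoded value;
# tr removes any line wrapping added by base64 for long inputs.
username_b64=$(printf '%s' "$bmc_username" | base64 | tr -d '\n')
password_b64=$(printf '%s' "$bmc_password" | base64 | tr -d '\n')

cat <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: "example-node2-bmh-secret"
  namespace: example-sno
type: Opaque
data:
  username: ${username_b64}
  password: ${password_b64}
EOF
```

You would save the rendered manifest next to your SiteConfig file in the Git repository rather than applying it by hand, in line with the GitOps workflow described here.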

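Commit the changes in Git, and then push to the Git repository that is monitored by the GitOps ZTP ArgoCD application.

When the ArgoCD `cluster` application synchronizes, the GitOps ZTP plugin generates two new manifests on the hub cluster:
- BareMetalHost
- NMStateConfig

Do not configure the `cpuset` field for the worker node. Workload partitioning for worker nodes is added through management policies after the node installation is complete.

Verification

You can monitor the installation process in several ways.

Check whether the preprovisioning images are created by running the following command: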

$ oc get ppimg -n example-sno

Example output

NAMESPACE       NAME            READY   REASON
example-sno     example-sno     True    ImageCreated
example-sno     example-node2   True    ImageCreated

Check the state of the bare-metal hosts:

$ oc get bmh -n example-sno

Example output

NAME            STATE          CONSUMER   ONLINE   ERROR   AGE
example-sno     provisioned               true             69m
example-node2   provisioning              true             4m50s (1)

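1 The `provisioning` state indicates that the node is booting from the installation media.

Continuously monitor the installation process:

Watch the agent installation process by running the following command: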

$ oc get agent -n example-sno --watch

Example output

NAME                                   CLUSTER       APPROVED   ROLE     STAGE
671bc05d-5358-8940-ec12-d9ad22804faa   example-sno   true       master   Done
[...]
14fd821b-a35d-9cba-7978-00ddf535ff37   example-sno   true       worker   Starting installation
14fd821b-a35d-9cba-7978-00ddf535ff37   example-sno   true       worker   Installing
14fd821b-a35d-9cba-7978-00ddf535ff37   example-sno   true       worker   Writing image to disk
[...]
14fd821b-a35d-9cba-7978-00ddf535ff37   example-sno   true       worker   Waiting for control plane
[...]
14fd821b-a35d-9cba-7978-00ddf535ff37   example-sno   true       worker   Rebooting
14fd821b-a35d-9cba-7978-00ddf535ff37   example-sno   true       worker   Done
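If you prefer a script to stop watching once the worker is installed, you can filter the watch output. The following is a sketch only: the function name `wait_for_worker_done` is hypothetical, it relies on the five-column layout shown in the example output, and it returns on the first worker row that reaches `Done`, so it does not distinguish between several workers installed at the same time.

```shell
#!/bin/sh
# Hypothetical helper: reads `oc get agent` rows on stdin and stops at the
# first row whose ROLE column is "worker" and whose STAGE column is "Done".
wait_for_worker_done() {
  while read -r name cluster approved role stage; do
    if [ "$role" = "worker" ] && [ "$stage" = "Done" ]; then
      echo "$name"
      return 0
    fi
  done
  return 1
}

# Sample rows taken from the example output above.
sample='671bc05d-5358-8940-ec12-d9ad22804faa   example-sno   true       master   Done
14fd821b-a35d-9cba-7978-00ddf535ff37   example-sno   true       worker   Installing
14fd821b-a35d-9cba-7978-00ddf535ff37   example-sno   true       worker   Rebooting
14fd821b-a35d-9cba-7978-00ddf535ff37   example-sno   true       worker   Done'

finished_agent=$(printf '%s\n' "$sample" | wait_for_worker_done)
echo "worker agent finished: $finished_agent"
```

Against a hub cluster, the input would come from `oc get agent -n example-sno --watch` piped into the function.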

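After the worker node installation is finished, the worker node certificates are approved automatically. At this point, the worker appears in the `ManagedClusterInfo` status. Run the following command to see the status: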

$ oc get managedclusterinfo/example-sno -n example-sno -o \
jsonpath='{range .status.nodeList[*]}{.name}{"\t"}{.conditions}{"\t"}{.labels}{"\n"}{end}'

Example output

example-sno	[{"status":"True","type":"Ready"}]	{"node-role.kubernetes.io/master":"","node-role.kubernetes.io/worker":""}
example-node2	[{"status":"True","type":"Ready"}]	{"node-role.kubernetes.io/worker":""}
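The status rows can be filtered to confirm that the new worker is ready. The following is a minimal sketch that assumes the tab-separated layout produced by the jsonpath query above; `list_ready_workers` is a hypothetical name, and the text match on the conditions column depends on the exact form shown in the example output.

```shell
#!/bin/sh
# Hypothetical helper: reads the tab-separated rows printed by the jsonpath
# query above and prints the names of Ready nodes without the master label.
list_ready_workers() {
  grep -F '"status":"True","type":"Ready"' |
    grep -vF '"node-role.kubernetes.io/master"' |
    cut -f1
}

# Sample rows rebuilt from the example output above (fields separated by tabs).
sample=$(
  printf '%s\t%s\t%s\n' 'example-sno' '[{"status":"True","type":"Ready"}]' \
    '{"node-role.kubernetes.io/master":"","node-role.kubernetes.io/worker":""}'
  printf '%s\t%s\t%s\n' 'example-node2' '[{"status":"True","type":"Ready"}]' \
    '{"node-role.kubernetes.io/worker":""}'
)

ready_workers=$(printf '%s\n' "$sample" | list_ready_workers)
echo "$ready_workers"
```

With the sample rows, only `example-node2` is listed, because `example-sno` also carries the master label.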