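Manually installing a single-node OpenShift cluster with GitOps ZTP | Edge computing | OpenShift Container Platform 4.17

Generating GitOps ZTP installation and configuration CRs manually
Creating the managed bare-metal host secrets
Configuring Discovery ISO kernel arguments for manual installations using GitOps ZTP
Installing a single managed cluster
Monitoring the managed cluster installation status
Troubleshooting the managed cluster
RHACM generated cluster installation CRs reference

You can deploy a managed single-node OpenShift cluster by using Red Hat Advanced Cluster Management (RHACM) and the assisted service.

If you are creating multiple managed clusters, use the SiteConfig method described in "Deploying far edge sites with ZTP".

The target bare-metal host must meet the networking, firmware, and hardware requirements listed in "Recommended cluster configuration for vDU application workloads".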

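Generating GitOps ZTP installation and configuration CRs manually

Use the generator entrypoint for the ztp-site-generate container to generate the site installation and configuration custom resources (CRs) for a cluster based on SiteConfig and PolicyGenerator CRs.

Prerequisites

You have installed the OpenShift CLI (oc).
You have logged in to the hub cluster as a user with cluster-admin privileges.

Procedure

Create an output folder by running the following command: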
```
$ mkdir -p ./out
```

Export the argocd directory from the ztp-site-generate container image:

```
$ podman run --log-driver=none --rm registry.redhat.io/openshift4/ztp-site-generate-rhel8:v4.17 extract /home/ztp --tar | tar x -C ./out
```

The ./out directory contains the reference PolicyGenerator and SiteConfig CRs in the out/argocd/example/ folder.

Example output

out
 └── argocd
      └── example
           ├── acmpolicygenerator
           │     ├── {policy-prefix}common-ranGen.yaml
           │     ├── {policy-prefix}example-sno-site.yaml
           │     ├── {policy-prefix}group-du-sno-ranGen.yaml
           │     ├── {policy-prefix}group-du-sno-validator-ranGen.yaml
           │     ├── ...
           │     ├── kustomization.yaml
           │     └── ns.yaml
           └── siteconfig
                  ├── example-sno.yaml
                  ├── KlusterletAddonConfigOverride.yaml
                  └── kustomization.yaml
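Before you copy and edit anything, it can help to confirm that the extraction produced the expected tree. The following is a minimal bash sketch; the function name `check_ztp_extract` is hypothetical, and the list of paths it checks is taken from the example output above, so adjust it if your container version ships a different layout.

```shell
#!/usr/bin/env bash
# Hypothetical helper: checks that `tar x -C ./out` produced the reference
# example tree. The paths are assumptions taken from the example output,
# not a guaranteed contract of the container image.
check_ztp_extract() {
  local root=${1:-./out}
  local missing=0
  local path
  for path in \
    argocd/example/acmpolicygenerator/kustomization.yaml \
    argocd/example/acmpolicygenerator/ns.yaml \
    argocd/example/siteconfig/example-sno.yaml \
    argocd/example/siteconfig/KlusterletAddonConfigOverride.yaml \
    argocd/example/siteconfig/kustomization.yaml; do
    if [[ ! -f "$root/$path" ]]; then
      echo "missing: $root/$path" >&2
      missing=$((missing + 1))
    fi
  done
  # The exit status is the number of missing files (0 means all present).
  return "$missing"
}

# Usage, after the podman extract command:
#   check_ztp_extract ./out && echo "reference CRs extracted"
```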

Create an output folder for the site installation CRs:
```
$ mkdir -p ./site-install
```

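Modify the example SiteConfig CR for the cluster type that you want to install. Copy example-sno.yaml to site-1-sno.yaml and modify the CR to match the details of the site and the bare-metal host that you want to install, for example: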

# example-node1-bmh-secret & assisted-deployment-pull-secret need to be created under same namespace example-sno
---
apiVersion: ran.openshift.io/v1
kind: SiteConfig
metadata:
  name: "example-sno"
  namespace: "example-sno"
spec:
  baseDomain: "example.com"
  pullSecretRef:
    name: "assisted-deployment-pull-secret"
  clusterImageSetNameRef: "openshift-4.16"
  sshPublicKey: "ssh-rsa AAAA..."
  clusters:
    - clusterName: "example-sno"
      networkType: "OVNKubernetes"
      # installConfigOverrides is a generic way of passing install-config
      # parameters through the siteConfig.  The 'capabilities' field configures
      # the composable openshift feature.  In this 'capabilities' setting, we
      # remove all the optional set of components.
      # Notes:
      # - OperatorLifecycleManager is needed for 4.15 and later
      # - NodeTuning is needed for 4.13 and later, not for 4.12 and earlier
      # - Ingress is needed for 4.16 and later
      installConfigOverrides: |
        {
          "capabilities": {
            "baselineCapabilitySet": "None",
            "additionalEnabledCapabilities": [
              "NodeTuning",
              "OperatorLifecycleManager",
              "Ingress"
            ]
          }
        }
      # It is strongly recommended to include crun manifests as part of the additional install-time manifests for 4.13+.
      # The crun manifests can be obtained from source-crs/optional-extra-manifest/ and added to the git repo ie.sno-extra-manifest.
      # extraManifestPath: sno-extra-manifest
      clusterLabels:
        # These example cluster labels correspond to the bindingRules in the PolicyGenTemplate examples
        du-profile: "latest"
        # These example cluster labels correspond to the bindingRules in the PolicyGenTemplate examples in ../policygentemplates:
        # ../policygentemplates/common-ranGen.yaml will apply to all clusters with 'common: true'
        common: true
        # ../policygentemplates/group-du-sno-ranGen.yaml will apply to all clusters with 'group-du-sno: ""'
        group-du-sno: ""
        # ../policygentemplates/example-sno-site.yaml will apply to all clusters with 'sites: "example-sno"'
        # Normally this should match or contain the cluster name so it only applies to a single cluster
        sites: "example-sno"
      clusterNetwork:
        - cidr: 1001:1::/48
          hostPrefix: 64
      machineNetwork:
        - cidr: 1111:2222:3333:4444::/64
      serviceNetwork:
        - 1001:2::/112
      additionalNTPSources:
        - 1111:2222:3333:4444::2
      # Initiates the cluster for workload partitioning. Setting specific reserved/isolated CPUSets is done via PolicyTemplate
      # please see Workload Partitioning Feature for a complete guide.
      cpuPartitioningMode: AllNodes
      # Optionally; This can be used to override the KlusterletAddonConfig that is created for this cluster:
      #crTemplates:
      #  KlusterletAddonConfig: "KlusterletAddonConfigOverride.yaml"
      nodes:
        - hostName: "example-node1.example.com"
          role: "master"
          # Optionally; This can be used to configure desired BIOS setting on a host:
          #biosConfigRef:
          #  filePath: "example-hw.profile"
          bmcAddress: "idrac-virtualmedia+https://[1111:2222:3333:4444::bbbb:1]/redfish/v1/Systems/System.Embedded.1"
          bmcCredentialsName:
            name: "example-node1-bmh-secret"
          bootMACAddress: "AA:BB:CC:DD:EE:11"
          # Use UEFISecureBoot to enable secure boot.
          bootMode: "UEFISecureBoot"
          rootDeviceHints:
            deviceName: "/dev/disk/by-path/pci-0000:01:00.0-scsi-0:2:0:0"
          # disk partition at `/var/lib/containers` with ignitionConfigOverride. Some values must be updated. See DiskPartitionContainer.md for more details
          ignitionConfigOverride: |
            {
              "ignition": {
                "version": "3.2.0"
              },
              "storage": {
                "disks": [
                  {
                    "device": "/dev/disk/by-id/wwn-0x6b07b250ebb9d0002a33509f24af1f62",
                    "partitions": [
                      {
                        "label": "var-lib-containers",
                        "sizeMiB": 0,
                        "startMiB": 250000
                      }
                    ],
                    "wipeTable": false
                  }
                ],
                "filesystems": [
                  {
                    "device": "/dev/disk/by-partlabel/var-lib-containers",
                    "format": "xfs",
                    "mountOptions": [
                      "defaults",
                      "prjquota"
                    ],
                    "path": "/var/lib/containers",
                    "wipeFilesystem": true
                  }
                ]
              },
              "systemd": {
                "units": [
                  {
                    "contents": "# Generated by Butane\n[Unit]\nRequires=systemd-fsck@dev-disk-by\\x2dpartlabel-var\\x2dlib\\x2dcontainers.service\nAfter=systemd-fsck@dev-disk-by\\x2dpartlabel-var\\x2dlib\\x2dcontainers.service\n\n[Mount]\nWhere=/var/lib/containers\nWhat=/dev/disk/by-partlabel/var-lib-containers\nType=xfs\nOptions=defaults,prjquota\n\n[Install]\nRequiredBy=local-fs.target",
                    "enabled": true,
                    "name": "var-lib-containers.mount"
                  }
                ]
              }
            }
          nodeNetwork:
            interfaces:
              - name: eno1
                macAddress: "AA:BB:CC:DD:EE:11"
            config:
              interfaces:
                - name: eno1
                  type: ethernet
                  state: up
                  ipv4:
                    enabled: false
                  ipv6:
                    enabled: true
                    address:
                      # For SNO sites with static IP addresses, the node-specific,
                      # API and Ingress IPs should all be the same and configured on
                      # the interface
                      - ip: 1111:2222:3333:4444::aaaa:1
                        prefix-length: 64
              dns-resolver:
                config:
                  search:
                    - example.com
                  server:
                    - 1111:2222:3333:4444::2
              routes:
                config:
                  - destination: ::/0
                    next-hop-interface: eno1
                    next-hop-address: 1111:2222:3333:4444::1
                    table-id: 254

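After you extract the reference CR configuration files from the out/extra-manifest directory of the ztp-site-generate container, you can use extraManifests.searchPaths to include the path to a git directory that contains those files. This allows the GitOps ZTP pipeline to apply those CR files during cluster installation. If you configure a searchPaths directory, the GitOps ZTP pipeline does not fetch manifests from the ztp-site-generate container during site installation.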
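The copy-and-edit step can be scripted so that the values you must change are easy to find. The following bash sketch is a convenience helper, not part of the ztp-site-generate tooling: `list_site_fields` is a hypothetical name, and the set of keys it searches for (baseDomain, clusterName, hostName, bmcAddress, bootMACAddress, macAddress, the static `ip` entries, and sshPublicKey) is only a starting point drawn from the example above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copies the example SiteConfig and prints the
# line-numbered, site-specific fields that usually need editing.
# list_site_fields <example_file> <new_site_file>
list_site_fields() {
  local src=$1 dst=$2
  cp "$src" "$dst" || return 1
  # Keys taken from the example SiteConfig; extend the list for your site.
  grep -nE '^[[:space:]]*(- )?(baseDomain|clusterName|hostName|bmcAddress|bootMACAddress|macAddress|ip|sshPublicKey):' "$dst"
}

# Usage, from the directory that holds the extracted examples:
#   cd out/argocd/example/siteconfig
#   list_site_fields example-sno.yaml site-1-sno.yaml
```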

Generate the Day 0 installation CRs by processing the modified SiteConfig CR site-1-sno.yaml. Run the following command:

```
$ podman run -it --rm -v `pwd`/out/argocd/example/siteconfig:/resources:Z -v `pwd`/site-install:/output:Z,U registry.redhat.io/openshift4/ztp-site-generate-rhel8:v4.17 generator install site-1-sno.yaml /output
```

Example output

site-install
└── site-1-sno
    ├── site-1-sno_agentclusterinstall_example-sno.yaml
    ├── site-1-sno_baremetalhost_example-node1.example.com.yaml
    ├── site-1-sno_clusterdeployment_example-sno.yaml
    ├── site-1-sno_configmap_example-sno.yaml
    ├── site-1-sno_infraenv_example-sno.yaml
    ├── site-1-sno_klusterletaddonconfig_example-sno.yaml
    ├── site-1-sno_machineconfig_02-master-workload-partitioning.yaml
    ├── site-1-sno_machineconfig_predefined-extra-manifests-master.yaml
    ├── site-1-sno_machineconfig_predefined-extra-manifests-worker.yaml
    ├── site-1-sno_managedcluster_example-sno.yaml
    ├── site-1-sno_namespace_example-sno.yaml
    └── site-1-sno_nmstateconfig_example-node1.example.com.yaml

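Optional: Generate only the Day 0 MachineConfig installation CRs for a particular cluster type by processing the reference SiteConfig CR with the -E option. For example, run the following commands:

Create an output folder for the MachineConfig CRs:
```
$ mkdir -p ./site-machineconfig
```

Generate the MachineConfig installation CRs:

```
$ podman run -it --rm -v `pwd`/out/argocd/example/siteconfig:/resources:Z -v `pwd`/site-machineconfig:/output:Z,U registry.redhat.io/openshift4/ztp-site-generate-rhel8:v4.17 generator install -E site-1-sno.yaml /output
```

Example output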

site-machineconfig
└── site-1-sno
    ├── site-1-sno_machineconfig_02-master-workload-partitioning.yaml
    ├── site-1-sno_machineconfig_predefined-extra-manifests-master.yaml
    └── site-1-sno_machineconfig_predefined-extra-manifests-worker.yaml

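Generate and export the Day 2 configuration CRs by using the reference PolicyGenerator CRs from the previous step. Run the following commands:

Create an output folder for the Day 2 CRs:
```
$ mkdir -p ./ref
```

Generate and export the Day 2 configuration CRs:

```
$ podman run -it --rm -v `pwd`/out/argocd/example/acmpolicygenerator:/resources:Z -v `pwd`/ref:/output:Z,U registry.redhat.io/openshift4/ztp-site-generate-rhel8:v4.17 generator config -N . /output
```

The command generates example group and site-specific PolicyGenerator CRs for single-node OpenShift, three-node clusters, and standard clusters in the ./ref folder.

Example output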

ref
 └── customResource
      ├── common
      ├── example-multinode-site
      ├── example-sno
      ├── group-du-3node
      ├── group-du-3node-validator
      │    └── Multiple-validatorCRs
      ├── group-du-sno
      ├── group-du-sno-validator
      ├── group-du-standard
      └── group-du-standard-validator
           └── Multiple-validatorCRs
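The three generator invocations above differ only in the mounted input directory, the output directory, and the generator arguments. The following bash sketch wraps them in one hypothetical function, `ztp_generate`. It reuses only the image, volume options, and `generator install` / `generator config -N` arguments shown in the commands above, and assumes that you run it from the directory that contains `./out`.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the ztp-site-generate `generator` entrypoint.
# Image tag and volume options are copied from the commands above.
ZTP_IMAGE=${ZTP_IMAGE:-registry.redhat.io/openshift4/ztp-site-generate-rhel8:v4.17}

# ztp_generate <input_dir> <output_dir> <generator arguments...>
ztp_generate() {
  local input=$1 output=$2
  shift 2
  mkdir -p "$output" || return 1
  # Resolve absolute host paths, as the original commands do with `pwd`.
  # -it is dropped so the wrapper can also run from non-interactive scripts.
  podman run --rm \
    -v "$(cd "$input" && pwd)":/resources:Z \
    -v "$(cd "$output" && pwd)":/output:Z,U \
    "$ZTP_IMAGE" generator "$@" /output
}

# Usage, matching the three commands above:
#   ztp_generate out/argocd/example/siteconfig site-install install site-1-sno.yaml
#   ztp_generate out/argocd/example/siteconfig site-machineconfig install -E site-1-sno.yaml
#   ztp_generate out/argocd/example/acmpolicygenerator ref config -N .
```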

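Use the generated CRs as the basis for the CRs that you use to install the cluster. You apply the installation CRs to the hub cluster as described in "Installing a single managed cluster". You can apply the configuration CRs to the cluster after the cluster installation is complete.

Verification

Verify that the custom roles and labels are applied after the node is deployed:
```
$ oc describe node example-node.example.com
```

Example output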

Name:   example-node.example.com
Roles:  control-plane,example-label,master,worker
Labels: beta.kubernetes.io/arch=amd64
        beta.kubernetes.io/os=linux
        custom-label/parameter1=true
        kubernetes.io/arch=amd64
        kubernetes.io/hostname=cnfdf03.telco5gran.eng.rdu2.redhat.com
        kubernetes.io/os=linux
        node-role.kubernetes.io/control-plane=
        node-role.kubernetes.io/example-label= (1)
        node-role.kubernetes.io/master=
        node-role.kubernetes.io/worker=
        node.openshift.io/os_id=rhcos

1	The custom label is applied to the node.
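The same check can be scripted with a label selector instead of reading the `oc describe` output. The following bash sketch is hypothetical: the function name is illustrative, the `example-label` role comes from the example above, and it assumes that your current `oc` context (or `KUBECONFIG`) points at the managed cluster rather than the hub.

```shell
#!/usr/bin/env bash
# Hypothetical check: succeeds if <node> carries node-role.kubernetes.io/<role>.
# Assumes `oc` is logged in to the managed cluster, not the hub.
node_has_role() {
  local node=$1 role=$2
  # `-o name` prints entries such as node/example-node.example.com
  oc get nodes -l "node-role.kubernetes.io/${role}" -o name | grep -qxF "node/${node}"
}

# Usage:
#   node_has_role example-node.example.com example-label && echo "custom role applied"
```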


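Creating the managed bare-metal host secrets

Add the required Secret custom resources (CRs) for the managed bare-metal host to the hub cluster. You need one secret for the GitOps Zero Touch Provisioning (ZTP) pipeline to access the Baseboard Management Controller (BMC), and one secret for the assisted installer service to pull cluster installation images from the registry.

The secrets are referenced from the SiteConfig CR by name, so the secret names must match the bmcCredentialsName and pullSecretRef values in the SiteConfig CR. The namespace must match the SiteConfig namespace.

Procedure

Create a YAML secret file containing the credentials for the host Baseboard Management Controller (BMC) and a pull secret required for installing OpenShift and all add-on cluster Operators:

Save the following YAML as the file example-sno-secret.yaml: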

apiVersion: v1
kind: Secret
metadata:
  name: example-sno-bmc-secret
  namespace: example-sno (1)
data: (2)
  password: <base64_password>
  username: <base64_username>
type: Opaque
---
apiVersion: v1
kind: Secret
metadata:
  name: pull-secret
  namespace: example-sno  (3)
data:
  .dockerconfigjson: <pull_secret> (4)
type: kubernetes.io/dockerconfigjson

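1	Must match the namespace configured in the related `SiteConfig` CR
2	Base64-encoded values for `password` and `username`
3	Must match the namespace configured in the related `SiteConfig` CR
4	Base64-encoded pull secret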
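The `data` fields must hold base64-encoded values. The following bash sketch generates both Secrets from plain-text inputs; `make_site_secrets` is a hypothetical helper, the secret names follow the example above, and the pull-secret file path in the usage line is an assumption. Using `printf '%s'` avoids encoding a trailing newline into the credentials, and `tr -d '\n'` keeps each encoded value on one line because some `base64` implementations wrap long output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the BMC Secret and the pull Secret with
# base64-encoded data. Names follow the example above and must match the
# names that the SiteConfig CR references.
b64() {
  # printf '%s' adds no trailing newline; tr removes any line wrapping.
  printf '%s' "$1" | base64 | tr -d '\n'
}

# make_site_secrets <namespace> <bmc_username> <bmc_password> <pull_secret_file>
make_site_secrets() {
  local ns=$1 user=$2 pass=$3 pull_file=$4 pull
  if [[ ! -r $pull_file ]]; then
    echo "cannot read pull secret: $pull_file" >&2
    return 1
  fi
  pull=$(base64 < "$pull_file" | tr -d '\n')
  cat <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: ${ns}-bmc-secret
  namespace: ${ns}
data:
  password: $(b64 "$pass")
  username: $(b64 "$user")
type: Opaque
---
apiVersion: v1
kind: Secret
metadata:
  name: pull-secret
  namespace: ${ns}
data:
  .dockerconfigjson: ${pull}
type: kubernetes.io/dockerconfigjson
EOF
}

# Usage (the pull-secret path is an example):
#   make_site_secrets example-sno "$BMC_USER" "$BMC_PASS" ./pull-secret.json > example-sno-secret.yaml
```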

Add the relative path to example-sno-secret.yaml to the kustomization.yaml file that you use to install the cluster.

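Configuring Discovery ISO kernel arguments for manual installations using GitOps ZTP

The GitOps Zero Touch Provisioning (ZTP) workflow uses the Discovery ISO as part of the OpenShift Container Platform installation process on managed bare-metal hosts. You can edit the InfraEnv resource to specify kernel arguments for the Discovery ISO. This is useful for cluster installations with specific environmental requirements. For example, configure the rd.net.timeout.carrier kernel argument for the Discovery ISO to facilitate static networking for the cluster or to receive a DHCP address before downloading the root file system during installation.

In OpenShift Container Platform 4.17, you can only add kernel arguments. You cannot replace or delete kernel arguments.

Prerequisites

You have installed the OpenShift CLI (oc).
You have logged in to the hub cluster as a user with cluster-admin privileges.
You have manually generated the installation and configuration custom resources (CRs).

Procedure

Edit the spec.kernelArguments specification in the InfraEnv CR to configure kernel arguments: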

apiVersion: agent-install.openshift.io/v1beta1
kind: InfraEnv
metadata:
  name: <cluster_name>
  namespace: <cluster_name>
spec:
  kernelArguments:
    - operation: append (1)
      value: audit=0 (2)
    - operation: append
      value: trace=1
  clusterRef:
    name: <cluster_name>
    namespace: <cluster_name>
  pullSecretRef:
    name: pull-secret

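1	Specify the append operation to add a kernel argument.
2	Specify the kernel argument that you want to configure. This example configures the audit kernel argument and the trace kernel argument.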

The SiteConfig CR generates the InfraEnv resource as part of the Day 0 installation CRs.
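If the InfraEnv CR has already been applied to the hub cluster, you could set the same arguments on the live resource instead of editing the generated file. The following bash sketch builds a JSON merge patch for `spec.kernelArguments`; the helper name is hypothetical. A JSON merge patch replaces the whole `kernelArguments` list, so pass every argument that you want to keep. The sketch does not escape JSON special characters, so it assumes plain `key=value` arguments.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints a JSON merge patch that sets
# spec.kernelArguments to one "append" entry per argument.
# A merge patch replaces the entire list. Arguments are not JSON-escaped,
# so use only simple values such as audit=0 or rd.net.timeout.carrier=60.
infraenv_kargs_patch() {
  local entries="" arg
  for arg in "$@"; do
    # Prefix a comma for every entry after the first.
    entries+="${entries:+,}{\"operation\":\"append\",\"value\":\"${arg}\"}"
  done
  printf '{"spec":{"kernelArguments":[%s]}}\n' "$entries"
}

# Usage against the hub cluster:
#   oc patch infraenv <cluster_name> -n <cluster_name> --type=merge \
#     -p "$(infraenv_kargs_patch audit=0 trace=1)"
```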

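Verification

To verify that the kernel arguments are applied, you can SSH to the target host after the Discovery image verifies that OpenShift Container Platform is ready for installation and before the installation process begins. At that point, you can view the kernel arguments for the Discovery ISO in the /proc/cmdline file.

Begin an SSH session with the target host:

```
$ ssh -i /path/to/privatekey core@<host_name>
```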

View the system's kernel arguments by using the following command:
```
$ cat /proc/cmdline
```
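To check several arguments at once rather than reading the whole command line, you could use a small bash function such as the following sketch. `check_kargs` is a hypothetical name; it reads the command line from standard input, so you can pipe in `/proc/cmdline` on the host or run it over SSH from your workstation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads a kernel command line on stdin and reports
# whether each expected argument is present as a whole word.
check_kargs() {
  local cmdline="" arg missing=0
  # /proc/cmdline is a single line; `|| true` tolerates a missing final newline.
  read -r cmdline || true
  for arg in "$@"; do
    # Pad with spaces so "audit=0" does not match "audit=01".
    if [[ " $cmdline " == *" $arg "* ]]; then
      echo "present: $arg"
    else
      echo "missing: $arg"
      missing=1
    fi
  done
  return "$missing"
}

# Usage, from the workstation that holds the SSH key:
#   ssh -i /path/to/privatekey core@<host_name> cat /proc/cmdline | check_kargs audit=0 trace=1
```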

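Installing a single managed cluster

You can manually deploy a single managed cluster by using the assisted service and Red Hat Advanced Cluster Management (RHACM).

Prerequisites

You have installed the OpenShift CLI (oc).
You have logged in to the hub cluster as a user with cluster-admin privileges.
You have created the baseboard management controller (BMC) Secret and the image pull-secret Secret custom resources (CRs). See "Creating the managed bare-metal host secrets" for details.
Your target bare-metal host meets the networking and hardware requirements for managed clusters.

Procedure

Create a ClusterImageSet for each specific cluster version to be deployed, for example clusterImageSet-4.17.yaml. A ClusterImageSet has the following format: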

apiVersion: hive.openshift.io/v1
kind: ClusterImageSet
metadata:
  name: openshift-4.17.0 (1)
spec:
  releaseImage: quay.io/openshift-release-dev/ocp-release:4.17.0-x86_64 (2)

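1	The descriptive version that you want to deploy.
2	Specifies the `releaseImage` to deploy and determines the operating system image version. The Discovery ISO is based on the image version as set by `releaseImage`, or on the latest version if the exact version is unavailable.

Apply the clusterImageSet CR: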
```
$ oc apply -f clusterImageSet-4.17.yaml
```
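Because you need one ClusterImageSet per release, you might generate the CR instead of copying it. The following bash sketch is hypothetical: `make_cluster_image_set` reuses the naming pattern and the `quay.io/openshift-release-dev/ocp-release` repository from the example above, and assumes an x86_64 release by default; a disconnected hub would point `RELEASE_REPO` at its mirror registry instead. Whatever name you generate must match the `clusterImageSetNameRef` value in your SiteConfig CR.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints a ClusterImageSet for one OpenShift release.
# Naming pattern and release repository follow the example above.
# make_cluster_image_set <version> [arch]
make_cluster_image_set() {
  local version=$1 arch=${2:-x86_64}
  local repo=${RELEASE_REPO:-quay.io/openshift-release-dev/ocp-release}
  [[ -n $version ]] || { echo "usage: make_cluster_image_set <version> [arch]" >&2; return 1; }
  cat <<EOF
apiVersion: hive.openshift.io/v1
kind: ClusterImageSet
metadata:
  name: openshift-${version}
spec:
  releaseImage: ${repo}:${version}-${arch}
EOF
}

# Usage:
#   make_cluster_image_set 4.17.0 > clusterImageSet-4.17.yaml
#   oc apply -f clusterImageSet-4.17.yaml
```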

Create the Namespace CR in the cluster-namespace.yaml file:

apiVersion: v1
kind: Namespace
metadata:
     name: <cluster_name> (1)
     labels:
        name: <cluster_name> (1)

1	The name of the managed cluster to provision.

Apply the Namespace CR by running the following command:
```
$ oc apply -f cluster-namespace.yaml
```

Apply the generated Day 0 CRs that you extracted from the ztp-site-generate container and customized to meet your requirements:
```
$ oc apply -R -f ./site-install/site-1-sno
```
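The namespace and Day 0 steps can be combined in one function. The following bash sketch is hypothetical: `apply_day0` applies the same Namespace CR shown above and then applies the generated directory recursively. The directory argument assumes the `site-install/<site>` layout from the generator example output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: applies the Namespace CR, then the generated Day 0 CRs.
# apply_day0 <cluster_name> <generated_dir>
apply_day0() {
  local name=$1 dir=$2
  [[ -d $dir ]] || { echo "no such directory: $dir" >&2; return 1; }
  # Same Namespace CR as cluster-namespace.yaml above, read from stdin.
  oc apply -f - <<EOF || return 1
apiVersion: v1
kind: Namespace
metadata:
  name: ${name}
  labels:
    name: ${name}
EOF
  # -R walks the directory recursively; -f is required with -R.
  oc apply -R -f "$dir"
}

# Usage:
#   apply_day0 example-sno ./site-install/site-1-sno
```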


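Monitoring the managed cluster installation status

Ensure that cluster provisioning was successful by checking the cluster status.

Prerequisites

All of the custom resources have been configured and provisioned, and the Agent custom resource is created on the hub for the managed cluster.

Procedure

Check the status of the managed cluster: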
```
$ oc get managedcluster
```
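True indicates that the managed cluster is ready.

Check the agent status:
```
$ oc get agent -n <cluster_name>
```

Use the describe command to provide an in-depth description of the agent's condition. Statuses to be aware of include BackendError, InputError, ValidationsFailing, InstallationFailed, and AgentIsConnected. These statuses are relevant to the Agent and AgentClusterInstall custom resources.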
```
$ oc describe agent -n <cluster_name>
```

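Check the cluster provisioning status:

```
$ oc get agentclusterinstall -n <cluster_name>
```

Use the describe command to provide an in-depth description of the cluster provisioning status: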
```
$ oc describe agentclusterinstall -n <cluster_name>
```
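Rather than rerunning these commands by hand, you could poll until the installation finishes. The following bash sketch is an assumption, not a documented procedure: it reads the `Completed` and `Failed` condition types from the AgentClusterInstall status with a JSONPath filter, relies on the AgentClusterInstall name matching the cluster name and namespace, and uses an arbitrary helper name, attempt count, and delay.

```shell
#!/usr/bin/env bash
# Hypothetical poller: waits until the AgentClusterInstall reports
# Completed=True, or stops early if Failed=True.
# Assumes the AgentClusterInstall and its namespace are both named after the cluster.
aci_condition() {
  local name=$1 type=$2
  oc get agentclusterinstall -n "$name" "$name" \
    -o jsonpath="{.status.conditions[?(@.type==\"${type}\")].status}"
}

# wait_for_install <cluster_name> [attempts] [delay_seconds]
wait_for_install() {
  local name=$1 attempts=${2:-120} delay=${3:-30} i
  for ((i = 1; i <= attempts; i++)); do
    if [[ $(aci_condition "$name" Completed) == True ]]; then
      echo "installation completed"
      return 0
    fi
    if [[ $(aci_condition "$name" Failed) == True ]]; then
      echo "installation failed; run: oc describe agentclusterinstall -n $name $name" >&2
      return 1
    fi
    sleep "$delay"
  done
  echo "timed out waiting for $name" >&2
  return 2
}

# Usage (waits up to 120 x 30 s, about 60 minutes, by default):
#   wait_for_install example-sno
```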

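Check the status of the managed cluster's add-on services:

```
$ oc get managedclusteraddon -n <cluster_name>
```

Retrieve the authentication information of the kubeconfig file for the managed cluster:

```
$ oc get secret -n <cluster_name> <cluster_name>-admin-kubeconfig -o jsonpath='{.data.kubeconfig}' | base64 -d > <directory>/<cluster_name>-kubeconfig
```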

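Troubleshooting the managed cluster

Use this procedure to diagnose any installation issues that might occur with the managed cluster.

Procedure

Check the status of the managed cluster: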
```
$ oc get managedcluster
```
Example output
```
NAME            HUB ACCEPTED   MANAGED CLUSTER URLS   JOINED   AVAILABLE   AGE
SNO-cluster     true                                   True     True      2d19h
```
If the status in the AVAILABLE column is True, the managed cluster is being managed by the hub.

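If the status in the AVAILABLE column is Unknown, the managed cluster is not being managed by the hub. Use the following steps to continue checking for more information.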
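Parsing the AVAILABLE column with text tools is awkward because the MANAGED CLUSTER URLS column can be empty, as in the example output above. A sketch that reads the condition directly might look like the following. It assumes, as in the upstream Open Cluster Management API, that the AVAILABLE column is backed by the `ManagedClusterConditionAvailable` condition; the helper name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints True, False, or Unknown for a ManagedCluster
# and succeeds only when the cluster is available.
# Assumes AVAILABLE is backed by the ManagedClusterConditionAvailable condition.
managedcluster_available() {
  local name=$1 status
  status=$(oc get managedcluster "$name" \
    -o jsonpath='{.status.conditions[?(@.type=="ManagedClusterConditionAvailable")].status}')
  # Report a missing condition as Unknown.
  echo "${status:-Unknown}"
  [[ $status == True ]]
}

# Usage:
#   managedcluster_available example-sno || oc get clusterdeployment -n example-sno
```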

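Check the cluster installation status in the ClusterDeployment resource:

```
$ oc get clusterdeployment -n <cluster_name>
```

Example output

```
NAME      PLATFORM          REGION   CLUSTERTYPE   INSTALLED   INFRAID   VERSION   POWERSTATE    AGE
Sno0026   agent-baremetal                          false                           Initialized   2d14h
```

If the status in the INSTALLED column is false, the installation was unsuccessful.

If the installation failed, enter the following command to review the status of the AgentClusterInstall resource: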
```
$ oc describe agentclusterinstall -n <cluster_name> <cluster_name>
```
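Resolve the errors and reset the cluster:
1. Remove the cluster's managed cluster resource:
  $ oc delete managedcluster <cluster_name>
2. Remove the cluster's namespace:
  $ oc delete namespace <cluster_name>
  This deletes all of the namespace-scoped custom resources created for this cluster. You must wait for the ManagedCluster CR deletion to complete before proceeding.
3. Recreate the custom resources for the managed cluster.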
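Steps 1 and 2 can be run in order with a bounded wait for each deletion. The following bash sketch is hypothetical: it relies on `oc delete` waiting for the resource to be removed by default, adds `--ignore-not-found` so that a rerun is harmless, and uses arbitrary timeout values. It does not recreate the CRs, because those depend on how you fixed the error.

```shell
#!/usr/bin/env bash
# Hypothetical helper: performs reset steps 1 and 2 in order.
# `oc delete` waits for removal by default; --timeout bounds that wait.
reset_managed_cluster() {
  local name=$1
  # Step 1: delete the ManagedCluster and wait until it is gone.
  oc delete managedcluster "$name" --ignore-not-found --timeout=10m || return 1
  # Step 2: delete the namespace, which removes the namespace-scoped CRs.
  oc delete namespace "$name" --ignore-not-found --timeout=15m || return 1
  echo "reset of $name complete; recreate the custom resources next"
}

# Usage:
#   reset_managed_cluster example-sno
```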

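RHACM generated cluster installation CRs reference

Red Hat Advanced Cluster Management (RHACM) supports deploying OpenShift Container Platform on single-node clusters, three-node clusters, and standard clusters with a specific set of installation custom resources (CRs) that you generate by using a SiteConfig CR for each site.

Every managed cluster has its own namespace, and all of the installation CRs except for ManagedCluster and ClusterImageSet are under that namespace. ManagedCluster and ClusterImageSet are cluster-scoped, not namespace-scoped. The namespace and the CR names match the cluster name.

The following table lists the installation CRs that the RHACM assisted service automatically applies when it installs clusters by using the SiteConfig CRs that you configure.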

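Table 1. Cluster installation CRs generated by RHACM
CR	Description	Usage
`BareMetalHost`	Contains the connection information for the Baseboard Management Controller (BMC) of the target bare-metal host.	Provides access to the BMC to load and start the discovery image on the target server by using the Redfish protocol.
`InfraEnv`	Contains information for installing OpenShift Container Platform on the target bare-metal host.	Used with `ClusterDeployment` to generate the discovery ISO for the managed cluster.
`AgentClusterInstall`	Specifies details of the managed cluster configuration, such as networking and the number of control plane nodes. Displays the cluster `kubeconfig` and credentials when the installation is complete.	Specifies the managed cluster configuration information and provides status during the installation of the cluster.
`ClusterDeployment`	References the `AgentClusterInstall` CR to use.	Used with `InfraEnv` to generate the discovery ISO for the managed cluster.
`NMStateConfig`	Provides network configuration information such as `MAC` address to `IP` mapping, DNS server, default route, and other network settings.	Sets up a static IP address for the managed cluster's Kube API server.
`Agent`	Contains hardware information about the target bare-metal host.	Created automatically on the hub when the target machine's discovery image boots.
`ManagedCluster`	When a cluster is managed by the hub, it must be imported and known. This Kubernetes object provides that interface.	The hub uses this resource to manage and show the status of managed clusters.
`KlusterletAddonConfig`	Contains the list of services provided by the hub to be deployed to the `ManagedCluster` resource.	Tells the hub which add-on services to deploy to the `ManagedCluster` resource.
`Namespace`	Logical space for `ManagedCluster` resources existing on the hub. Unique per site.	Propagates resources to the `ManagedCluster`.
`Secret`	Two CRs are created: `BMC Secret` and `Image Pull Secret`.	`BMC Secret` authenticates into the target bare-metal host by using its username and password. `Image Pull Secret` contains authentication information for the OpenShift Container Platform image installed on the target bare-metal host.
`ClusterImageSet`	Contains OpenShift Container Platform image information such as the repository and image name.	Passed into resources to provide OpenShift Container Platform images.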