×

此内容由 Red Hat 专家撰写,但尚未在所有受支持的配置上进行测试。

先决条件
环境
  • 准备环境变量

    将集群名称更改为与您的 ROSA 集群匹配,并确保您已以管理员身份登录到集群。在继续之前,请确保所有字段都正确输出。

    $ export CLUSTER_NAME=$(oc get infrastructure cluster -o=jsonpath="{.status.infrastructureName}"  | sed 's/-[a-z0-9]\{5\}$//')
    $ export ROSA_CLUSTER_ID=$(rosa describe cluster -c ${CLUSTER_NAME} --output json | jq -r .id)
    $ export REGION=$(rosa describe cluster -c ${CLUSTER_NAME} --output json | jq -r .region.id)
    $ export OIDC_ENDPOINT=$(oc get authentication.config.openshift.io cluster -o jsonpath='{.spec.serviceAccountIssuer}' | sed  's|^https://||')
    $ export AWS_ACCOUNT_ID=`aws sts get-caller-identity --query Account --output text`
    $ export CLUSTER_VERSION=`rosa describe cluster -c ${CLUSTER_NAME} -o json | jq -r .version.raw_id | cut -f -2 -d '.'`
    $ export ROLE_NAME="${CLUSTER_NAME}-openshift-oadp-aws-cloud-credentials"
    $ export AWS_PAGER=""
    $ export SCRATCH="/tmp/${CLUSTER_NAME}/oadp"
    $ mkdir -p ${SCRATCH}
    $ echo "Cluster ID: ${ROSA_CLUSTER_ID}, Region: ${REGION}, OIDC Endpoint: ${OIDC_ENDPOINT}, AWS Account ID: ${AWS_ACCOUNT_ID}"

准备 AWS 账户

  1. 创建允许 S3 访问的 IAM 策略

    $ POLICY_ARN=$(aws iam list-policies --query "Policies[?PolicyName=='RosaOadpVer1'].{ARN:Arn}" --output text)
    if [[ -z "${POLICY_ARN}" ]]; then
    $ cat << EOF > ${SCRATCH}/policy.json
    {
    "Version": "2012-10-17",
    "Statement": [
     {
       "Effect": "Allow",
       "Action": [
         "s3:CreateBucket",
         "s3:DeleteBucket",
         "s3:PutBucketTagging",
         "s3:GetBucketTagging",
         "s3:PutEncryptionConfiguration",
         "s3:GetEncryptionConfiguration",
         "s3:PutLifecycleConfiguration",
         "s3:GetLifecycleConfiguration",
         "s3:GetBucketLocation",
         "s3:ListBucket",
         "s3:GetObject",
         "s3:PutObject",
         "s3:DeleteObject",
         "s3:ListBucketMultipartUploads",
         "s3:AbortMultipartUpload",
         "s3:ListMultipartUploadParts",
         "ec2:DescribeSnapshots",
         "ec2:DescribeVolumes",
         "ec2:DescribeVolumeAttribute",
         "ec2:DescribeVolumesModifications",
         "ec2:DescribeVolumeStatus",
         "ec2:CreateTags",
         "ec2:CreateVolume",
         "ec2:CreateSnapshot",
         "ec2:DeleteSnapshot"
       ],
       "Resource": "*"
     }
    ]}
    EOF
    $ POLICY_ARN=$(aws iam create-policy --policy-name "RosaOadpVer1" \
    --policy-document file:///${SCRATCH}/policy.json --query Policy.Arn \
    --tags Key=rosa_openshift_version,Value=${CLUSTER_VERSION} Key=rosa_role_prefix,Value=ManagedOpenShift Key=operator_namespace,Value=openshift-oadp Key=operator_name,Value=openshift-oadp \
    --output text)
    fi
    $ echo ${POLICY_ARN}
  2. 为集群创建 IAM 角色信任策略

    $ cat <<EOF > ${SCRATCH}/trust-policy.json
    {
      "Version": "2012-10-17",
      "Statement": [{
        "Effect": "Allow",
        "Principal": {
          "Federated": "arn:aws:iam::${AWS_ACCOUNT_ID}:oidc-provider/${OIDC_ENDPOINT}"
        },
        "Action": "sts:AssumeRoleWithWebIdentity",
        "Condition": {
          "StringEquals": {
             "${OIDC_ENDPOINT}:sub": [
               "system:serviceaccount:openshift-adp:openshift-adp-controller-manager",
               "system:serviceaccount:openshift-adp:velero"]
          }
        }
      }]
    }
    EOF
    $ ROLE_ARN=$(aws iam create-role --role-name \
     "${ROLE_NAME}" \
      --assume-role-policy-document file://${SCRATCH}/trust-policy.json \
      --tags Key=rosa_cluster_id,Value=${ROSA_CLUSTER_ID} Key=rosa_openshift_version,Value=${CLUSTER_VERSION} Key=rosa_role_prefix,Value=ManagedOpenShift Key=operator_namespace,Value=openshift-adp Key=operator_name,Value=openshift-oadp \
      --query Role.Arn --output text)
    
    $ echo ${ROLE_ARN}
  3. 将 IAM 策略附加到 IAM 角色

    $ aws iam attach-role-policy --role-name "${ROLE_NAME}" \
     --policy-arn ${POLICY_ARN}

在集群上部署 OADP

  1. 为 OADP 创建命名空间

    $ oc create namespace openshift-adp
  2. 创建凭据密钥

    $ cat <<EOF > ${SCRATCH}/credentials
    [default]
    role_arn = ${ROLE_ARN}
    web_identity_token_file = /var/run/secrets/openshift/serviceaccount/token
    EOF
    $ oc -n openshift-adp create secret generic cloud-credentials \
     --from-file=${SCRATCH}/credentials
  3. 部署 OADP 运算符

    运算符 1.1 版目前存在一个问题,该问题与具有 `PartiallyFailed` 状态的备份有关。这似乎不会影响备份和恢复过程,但应注意,因为它存在问题。

    $ cat << EOF | oc create -f -
    apiVersion: operators.coreos.com/v1
    kind: OperatorGroup
    metadata:
     generateName: openshift-adp-
     namespace: openshift-adp
     name: oadp
    spec:
     targetNamespaces:
     - openshift-adp
    ---
    apiVersion: operators.coreos.com/v1alpha1
    kind: Subscription
    metadata:
     name: redhat-oadp-operator
     namespace: openshift-adp
    spec:
     channel: stable-1.2
     installPlanApproval: Automatic
     name: redhat-oadp-operator
     source: redhat-operators
     sourceNamespace: openshift-marketplace
    EOF
  4. 等待运算符准备就绪

    $ watch oc -n openshift-adp get pods
    示例输出
    NAME                                                READY   STATUS    RESTARTS   AGE
    openshift-adp-controller-manager-546684844f-qqjhn   1/1     Running   0          22s
  5. 创建云存储

    $ cat << EOF | oc create -f -
    apiVersion: oadp.openshift.io/v1alpha1
    kind: CloudStorage
    metadata:
     name: ${CLUSTER_NAME}-oadp
     namespace: openshift-adp
    spec:
     creationSecret:
       key: credentials
       name: cloud-credentials
     enableSharedConfig: true
     name: ${CLUSTER_NAME}-oadp
     provider: aws
     region: $REGION
    EOF
  6. 检查应用程序的存储默认存储类

    $ oc get pvc -n <namespace> (1)
    1 输入应用程序的命名空间。
    示例输出
    NAME     STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
    applog   Bound    pvc-351791ae-b6ab-4e8b-88a4-30f73caf5ef8   1Gi        RWO            gp3-csi        4d19h
    mysql    Bound    pvc-16b8e009-a20a-4379-accc-bc81fedd0621   1Gi        RWO            gp3-csi        4d19h
    $ oc get storageclass
    示例输出
    NAME                PROVISIONER             RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
    gp2                 kubernetes.io/aws-ebs   Delete          WaitForFirstConsumer   true                   4d21h
    gp2-csi             ebs.csi.aws.com         Delete          WaitForFirstConsumer   true                   4d21h
    gp3                 ebs.csi.aws.com         Delete          WaitForFirstConsumer   true                   4d21h
    gp3-csi (default)   ebs.csi.aws.com         Delete          WaitForFirstConsumer   true                   4d21h

    使用 gp3-csi、gp2-csi、gp3 或 gp2 都可以。如果正在备份的所有应用程序都使用带有 CSI 的 PV,请在 OADP DPA 配置中包含 CSI 插件。

  7. 仅限 CSI:部署数据保护应用程序

    $ cat << EOF | oc create -f -
    apiVersion: oadp.openshift.io/v1alpha1
    kind: DataProtectionApplication
    metadata:
     name: ${CLUSTER_NAME}-dpa
     namespace: openshift-adp
    spec:
     backupImages: true
     features:
       dataMover:
         enable: false
     backupLocations:
     - bucket:
         cloudStorageRef:
           name: ${CLUSTER_NAME}-oadp
         credential:
           key: credentials
           name: cloud-credentials
         prefix: velero
         default: true
         config:
           region: ${REGION}
     configuration:
       velero:
         defaultPlugins:
         - openshift
         - aws
         - csi
       restic:
         enable: false
    EOF

    如果对 CSI 卷运行此命令,则可以跳过下一步。

  8. 非 CSI 卷:部署数据保护应用程序

    $ cat << EOF | oc create -f -
    apiVersion: oadp.openshift.io/v1alpha1
    kind: DataProtectionApplication
    metadata:
     name: ${CLUSTER_NAME}-dpa
     namespace: openshift-adp
    spec:
     backupImages: true
     features:
       dataMover:
         enable: false
     backupLocations:
     - bucket:
         cloudStorageRef:
           name: ${CLUSTER_NAME}-oadp
         credential:
           key: credentials
           name: cloud-credentials
         prefix: velero
         default: true
         config:
           region: ${REGION}
     configuration:
       velero:
         defaultPlugins:
         - openshift
         - aws
       restic:
         enable: false
     snapshotLocations:
       - velero:
           config:
             credentialsFile: /tmp/credentials/openshift-adp/cloud-credentials-credentials
             enableSharedConfig: 'true'
             profile: default
             region: ${REGION}
           provider: aws
    EOF
  • 在 OADP 1.1.x ROSA STS 环境中,必须将容器镜像备份和恢复(`spec.backupImages`)值设置为 `false`,因为它不受支持。

  • Restic 功能(`restic.enable=false`)已禁用,在 ROSA STS 环境中不受支持。

  • DataMover 功能(`dataMover.enable=false`)已禁用,在 ROSA STS 环境中不受支持。

执行备份

以下示例 hello-world 应用程序没有附加持久卷。任一 DPA 配置都可以使用。

  1. 创建要备份的工作负载

    $ oc create namespace hello-world
    $ oc new-app -n hello-world --image=docker.io/openshift/hello-openshift
  2. 公开路由

    $ oc expose service/hello-openshift -n hello-world
  3. 检查应用程序是否正在运行

    $ curl `oc get route/hello-openshift -n hello-world -o jsonpath='{.spec.host}'`
    示例输出
    Hello OpenShift!
  4. 备份工作负载

    $ cat << EOF | oc create -f -
    apiVersion: velero.io/v1
    kind: Backup
    metadata:
     name: hello-world
     namespace: openshift-adp
    spec:
     includedNamespaces:
     - hello-world
     storageLocation: ${CLUSTER_NAME}-dpa-1
     ttl: 720h0m0s
    EOF
  5. 等待备份完成

    $ watch "oc -n openshift-adp get backup hello-world -o json | jq .status"
    示例输出
    {
     "completionTimestamp": "2022-09-07T22:20:44Z",
     "expiration": "2022-10-07T22:20:22Z",
     "formatVersion": "1.1.0",
     "phase": "Completed",
     "progress": {
       "itemsBackedUp": 58,
       "totalItems": 58
     },
     "startTimestamp": "2022-09-07T22:20:22Z",
     "version": 1
    }
  6. 删除演示工作负载

    $ oc delete ns hello-world
  7. 从备份恢复

    $ cat << EOF | oc create -f -
    apiVersion: velero.io/v1
    kind: Restore
    metadata:
     name: hello-world
     namespace: openshift-adp
    spec:
     backupName: hello-world
    EOF
  8. 等待恢复完成

    $ watch "oc -n openshift-adp get restore hello-world -o json | jq .status"
    示例输出
    {
     "completionTimestamp": "2022-09-07T22:25:47Z",
     "phase": "Completed",
     "progress": {
       "itemsRestored": 38,
       "totalItems": 38
     },
     "startTimestamp": "2022-09-07T22:25:28Z",
     "warnings": 9
    }
  9. 检查工作负载是否已恢复

    $ oc -n hello-world get pods
    示例输出
    NAME                              READY   STATUS    RESTARTS   AGE
    hello-openshift-9f885f7c6-kdjpj   1/1     Running   0          90s
    $ curl `oc get route/hello-openshift -n hello-world -o jsonpath='{.spec.host}'`
    示例输出
    Hello OpenShift!
  10. 有关故障排除技巧,请参阅 OADP 团队的故障排除文档

  11. 可以在 OADP 团队的示例应用程序目录中找到其他示例应用程序

清理

  1. 删除工作负载

    $ oc delete ns hello-world
  2. 如果不再需要,请从集群中删除备份和恢复资源

    $ oc delete backups.velero.io hello-world
    $ oc delete restores.velero.io hello-world
  3. 要删除 s3 中的备份/恢复和远程对象

    $ velero backup delete hello-world
    $ velero restore delete hello-world
  4. 删除数据保护应用程序

    $ oc -n openshift-adp delete dpa ${CLUSTER_NAME}-dpa
  5. 删除云存储

    $ oc -n openshift-adp delete cloudstorage ${CLUSTER_NAME}-oadp

    如果此命令挂起,您可能需要删除最终确定器

    $ oc -n openshift-adp patch cloudstorage ${CLUSTER_NAME}-oadp -p '{"metadata":{"finalizers":null}}' --type=merge
  6. 如果不再需要,请删除运算符

    $ oc -n openshift-adp delete subscription oadp-operator
  7. 删除运算符的命名空间

    $ oc delete ns redhat-openshift-adp
  8. 如果您不再希望拥有自定义资源定义,请从集群中删除它们

    $ for CRD in `oc get crds | grep velero | awk '{print $1}'`; do oc delete crd $CRD; done
    $ for CRD in `oc get crds | grep -i oadp | awk '{print $1}'`; do oc delete crd $CRD; done
  9. 删除 AWS S3 存储桶

    $ aws s3 rm s3://${CLUSTER_NAME}-oadp --recursive
    $ aws s3api delete-bucket --bucket ${CLUSTER_NAME}-oadp
  10. 将策略与角色分离

    $ aws iam detach-role-policy --role-name "${ROLE_NAME}" \
     --policy-arn "${POLICY_ARN}"
  11. 删除角色

    $ aws iam delete-role --role-name "${ROLE_NAME}"