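The File Integrity Operator is an OpenShift Container Platform Operator that continually runs file integrity checks on the cluster nodes. It deploys a daemon set that initializes and runs privileged Advanced Intrusion Detection Environment (AIDE) containers on each node, providing a status object with a log of the files that have been modified since the initial run of the daemon set pods.

Currently, only Red Hat Enterprise Linux CoreOS (RHCOS) nodes are supported.

Creating the FileIntegrity custom resource

An instance of a FileIntegrity custom resource (CR) represents a set of continuous file integrity scans for one or more nodes.

Each FileIntegrity CR is backed by a daemon set that runs AIDE on the nodes matching the FileIntegrity CR specification.

Procedure
  1. Create the following example FileIntegrity CR named worker-fileintegrity.yaml to enable scans on worker nodes:

    Example FileIntegrity CR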
    apiVersion: fileintegrity.openshift.io/v1alpha1
    kind: FileIntegrity
    metadata:
      name: worker-fileintegrity
      namespace: openshift-file-integrity
    spec:
      nodeSelector: (1)
        node-role.kubernetes.io/worker: ""
      tolerations: (2)
      - key: "myNode"
        operator: "Exists"
        effect: "NoSchedule"
      config: (3)
        name: "myconfig"
        namespace: "openshift-file-integrity"
        key: "config"
        gracePeriod: 20 (4)
        maxBackups: 5 (5)
        initialDelay: 60 (6)
      debug: false
    status:
      phase: Active (7)
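    1 Defines the selector for scheduling node scans.
    2 Specify tolerations to schedule on nodes with custom taints. When not specified, a default toleration is applied, which allows running on control plane (master) nodes and infrastructure nodes.
    3 Defines a ConfigMap containing the AIDE configuration to use.
    4 The number of seconds to pause between AIDE integrity checks. Frequent AIDE checks on a node can be resource intensive, so it can be useful to specify a longer interval. Defaults to 900 seconds (15 minutes).
    5 The maximum number of AIDE database and log backups (leftover files from the re-initialization process) to keep on a node. Older backups beyond this number are automatically pruned by the daemon. Defaults to 5.
    6 The number of seconds to wait before starting the first AIDE integrity check. Defaults to 0.
    7 The running status of the FileIntegrity instance. Statuses are Initializing, Pending, or Active.

    Initializing

    The FileIntegrity object is currently initializing or re-initializing the AIDE database.

    Pending

    The FileIntegrity deployment is still being created.

    Active

    The scans are active and ongoing.

  2. Apply the YAML file to the openshift-file-integrity namespace: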

    $ oc apply -f worker-fileintegrity.yaml -n openshift-file-integrity
Verification
  • Confirm that the FileIntegrity object was created successfully by running the following command:

    $ oc get fileintegrities -n openshift-file-integrity
    Example output
    NAME                   AGE
    worker-fileintegrity   14s
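
If the manifest is produced by a script rather than written by hand, the same CR can be assembled as a plain data structure. The following is a minimal Python sketch under that assumption. The helper name build_file_integrity and its parameters are hypothetical, the sketch assumes that no custom AIDE ConfigMap is referenced, and the output is JSON, which oc apply -f accepts as well as YAML.

```python
import json


def build_file_integrity(name, node_role="worker", grace_period=900,
                         max_backups=5, initial_delay=0):
    """Return a FileIntegrity CR as a dict (hypothetical helper).

    The defaults mirror the documented defaults for gracePeriod,
    maxBackups, and initialDelay.
    """
    return {
        "apiVersion": "fileintegrity.openshift.io/v1alpha1",
        "kind": "FileIntegrity",
        "metadata": {
            "name": name,
            "namespace": "openshift-file-integrity",
        },
        "spec": {
            # Selects the nodes to scan by their role label.
            "nodeSelector": {f"node-role.kubernetes.io/{node_role}": ""},
            "config": {
                "gracePeriod": grace_period,
                "maxBackups": max_backups,
                "initialDelay": initial_delay,
            },
            "debug": False,
        },
    }


if __name__ == "__main__":
    # Save the output to a file and apply it with `oc apply -f <file>`.
    print(json.dumps(build_file_integrity("worker-fileintegrity"), indent=2))
```

Keeping the scan interval as a parameter makes it easy to create one CR per node role with different gracePeriod values.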

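Checking the FileIntegrity custom resource status

The FileIntegrity custom resource (CR) reports its status through the .status.phase subresource.

Procedure
  • To query the FileIntegrity CR status, run:

    $ oc get fileintegrities/worker-fileintegrity -o jsonpath="{ .status.phase }"
    Example output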
    Active

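FileIntegrity custom resource phases

  • Pending - The phase after the custom resource (CR) is created.

  • Active - The phase when the backing daemon set is up and running.

  • Initializing - The phase when the AIDE database is being initialized or reinitialized.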
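
A script that depends on scan results usually has to wait until the CR reaches the Active phase. The following Python sketch polls the phase for that purpose. The function names, the timeout, and the polling interval are illustrative assumptions; the oc invocation is the status query from the procedure, with the namespace passed explicitly.

```python
import subprocess
import time


def get_phase(name="worker-fileintegrity",
              namespace="openshift-file-integrity"):
    """Read .status.phase of a FileIntegrity CR by calling oc."""
    out = subprocess.run(
        ["oc", "get", f"fileintegrities/{name}", "-n", namespace,
         "-o", "jsonpath={.status.phase}"],
        check=True, capture_output=True, text=True,
    )
    return out.stdout.strip()


def wait_for_active(read_phase=get_phase, timeout=300, interval=10,
                    sleep=time.sleep):
    """Poll until the phase is Active and return the phases observed.

    Consecutive duplicates are collapsed. Raises TimeoutError if Active
    is not reached after roughly `timeout` seconds of waiting.
    """
    seen = []
    waited = 0
    while True:
        phase = read_phase()
        if not seen or seen[-1] != phase:
            seen.append(phase)
        if phase == "Active":
            return seen
        if waited >= timeout:
            raise TimeoutError(f"phase is still {phase!r} after {waited}s")
        sleep(interval)
        waited += interval
```

The read_phase and sleep parameters exist so that the loop can be exercised without a cluster, for example by passing a function that returns canned phases.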

Understanding the FileIntegrityNodeStatuses object

The scan results of the FileIntegrity CR are reported in another object called FileIntegrityNodeStatuses.

$ oc get fileintegritynodestatuses
Example output
NAME                                                AGE
worker-fileintegrity-ip-10-0-130-192.ec2.internal   101s
worker-fileintegrity-ip-10-0-147-133.ec2.internal   109s
worker-fileintegrity-ip-10-0-165-160.ec2.internal   102s

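It might take some time for the FileIntegrityNodeStatus object results to become available.

There is one result object per node. The nodeName attribute of each FileIntegrityNodeStatus object corresponds to the node being scanned. The status of the file integrity scan is represented in the results array, which holds the scan conditions.

$ oc get fileintegritynodestatuses.fileintegrity.openshift.io -ojsonpath='{.items[*].results}' | jq

The fileintegritynodestatus object reports the latest status of an AIDE run and exposes the status as Failed, Succeeded, or Errored in a status field.

$ oc get fileintegritynodestatuses -w
Example output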
NAME                                                               NODE                                         STATUS
example-fileintegrity-ip-10-0-134-186.us-east-2.compute.internal   ip-10-0-134-186.us-east-2.compute.internal   Succeeded
example-fileintegrity-ip-10-0-150-230.us-east-2.compute.internal   ip-10-0-150-230.us-east-2.compute.internal   Succeeded
example-fileintegrity-ip-10-0-169-137.us-east-2.compute.internal   ip-10-0-169-137.us-east-2.compute.internal   Succeeded
example-fileintegrity-ip-10-0-180-200.us-east-2.compute.internal   ip-10-0-180-200.us-east-2.compute.internal   Succeeded
example-fileintegrity-ip-10-0-194-66.us-east-2.compute.internal    ip-10-0-194-66.us-east-2.compute.internal    Failed
example-fileintegrity-ip-10-0-222-188.us-east-2.compute.internal   ip-10-0-222-188.us-east-2.compute.internal   Succeeded
example-fileintegrity-ip-10-0-134-186.us-east-2.compute.internal   ip-10-0-134-186.us-east-2.compute.internal   Succeeded
example-fileintegrity-ip-10-0-222-188.us-east-2.compute.internal   ip-10-0-222-188.us-east-2.compute.internal   Succeeded
example-fileintegrity-ip-10-0-194-66.us-east-2.compute.internal    ip-10-0-194-66.us-east-2.compute.internal    Failed
example-fileintegrity-ip-10-0-150-230.us-east-2.compute.internal   ip-10-0-150-230.us-east-2.compute.internal   Succeeded
example-fileintegrity-ip-10-0-180-200.us-east-2.compute.internal   ip-10-0-180-200.us-east-2.compute.internal   Succeeded

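FileIntegrityNodeStatus CR status types

These conditions are reported in the results array of the corresponding FileIntegrityNodeStatus CR status:

  • Succeeded - The integrity check passed; the files and directories covered by the AIDE check have not been modified since the database was last initialized.

  • Failed - The integrity check failed; some files or directories covered by the AIDE check have been modified since the database was last initialized.

  • Errored - The AIDE scanner encountered an internal error.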
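
To get a per-node overview, the results arrays can be reduced to the most recent condition for each node. The following Python sketch does this for JSON shaped like the output of oc get fileintegritynodestatuses -o json. The function name and the SAMPLE data are illustrative assumptions; only the nodeName, results, condition, and lastProbeTime fields described in this section are used.

```python
import json

# Illustrative sample, not captured from a real cluster.
SAMPLE = json.loads("""
{"items": [
  {"nodeName": "ip-10-0-130-192.ec2.internal",
   "results": [
     {"condition": "Succeeded", "lastProbeTime": "2020-09-15T12:54:14Z"},
     {"condition": "Failed", "filesChanged": 1,
      "lastProbeTime": "2020-09-15T12:57:20Z"}]},
  {"nodeName": "ip-10-0-147-133.ec2.internal",
   "results": [
     {"condition": "Succeeded", "lastProbeTime": "2020-09-15T12:46:03Z"}]}
]}
""")


def latest_conditions(status_list):
    """Map each nodeName to the condition with the newest lastProbeTime."""
    summary = {}
    for item in status_list.get("items", []):
        results = item.get("results", [])
        if not results:
            continue  # results can take some time to become available
        # UTC timestamps in this fixed format sort correctly as strings.
        newest = max(results, key=lambda r: r["lastProbeTime"])
        summary[item["nodeName"]] = newest["condition"]
    return summary


if __name__ == "__main__":
    for node, condition in sorted(latest_conditions(SAMPLE).items()):
        print(f"{node}\t{condition}")
```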

FileIntegrityNodeStatus CR success example

Example output of a condition with a success status
[
  {
    "condition": "Succeeded",
    "lastProbeTime": "2020-09-15T12:45:57Z"
  }
]
[
  {
    "condition": "Succeeded",
    "lastProbeTime": "2020-09-15T12:46:03Z"
  }
]
[
  {
    "condition": "Succeeded",
    "lastProbeTime": "2020-09-15T12:45:48Z"
  }
]

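In this case, all three scans succeeded and so far there are no other conditions.

FileIntegrityNodeStatus CR failure status example

To simulate a failure condition, modify one of the files that AIDE tracks. For example, modify /etc/resolv.conf on one of the worker nodes:

$ oc debug node/ip-10-0-130-192.ec2.internal
Example output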
Creating debug namespace/openshift-debug-node-ldfbj ...
Starting pod/ip-10-0-130-192ec2internal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.130.192
If you don't see a command prompt, try pressing enter.
sh-4.2# echo "# integrity test" >> /host/etc/resolv.conf
sh-4.2# exit

Removing debug pod ...
Removing debug namespace/openshift-debug-node-ldfbj ...

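After some time, the Failed condition is reported in the results array of the corresponding FileIntegrityNodeStatus object. The previous Succeeded condition is retained, which allows you to pinpoint the time when the check failed.

$ oc get fileintegritynodestatuses.fileintegrity.openshift.io/worker-fileintegrity-ip-10-0-130-192.ec2.internal -ojsonpath='{.results}' | jq -r

Alternatively, if you do not specify the object name, run:

$ oc get fileintegritynodestatuses.fileintegrity.openshift.io -ojsonpath='{.items[*].results}' | jq
Example output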
[
  {
    "condition": "Succeeded",
    "lastProbeTime": "2020-09-15T12:54:14Z"
  },
  {
    "condition": "Failed",
    "filesChanged": 1,
    "lastProbeTime": "2020-09-15T12:57:20Z",
    "resultConfigMapName": "aide-ds-worker-fileintegrity-ip-10-0-130-192.ec2.internal-failed",
    "resultConfigMapNamespace": "openshift-file-integrity"
  }
]
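
Because the earlier Succeeded condition is kept next to the Failed one, the time window in which the modification happened can be computed from a results array. The following Python sketch illustrates this with the values from the example output. The function names are hypothetical, and the sketch only looks at the first Failed condition in chronological order.

```python
from datetime import datetime

RESULTS = [
    {"condition": "Succeeded", "lastProbeTime": "2020-09-15T12:54:14Z"},
    {"condition": "Failed", "filesChanged": 1,
     "lastProbeTime": "2020-09-15T12:57:20Z",
     "resultConfigMapName":
         "aide-ds-worker-fileintegrity-ip-10-0-130-192.ec2.internal-failed",
     "resultConfigMapNamespace": "openshift-file-integrity"},
]


def parse_time(value):
    """Parse the UTC timestamp format used by lastProbeTime."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")


def failure_window(results):
    """Return (last_ok, first_failed, config_map), or None if nothing failed.

    last_ok is None when no Succeeded condition precedes the failure.
    config_map is formatted as "<namespace>/<name>".
    """
    ordered = sorted(results, key=lambda r: parse_time(r["lastProbeTime"]))
    last_ok = None
    for entry in ordered:
        when = parse_time(entry["lastProbeTime"])
        if entry["condition"] == "Succeeded":
            last_ok = when
        elif entry["condition"] == "Failed":
            config_map = "{}/{}".format(entry.get("resultConfigMapNamespace"),
                                        entry.get("resultConfigMapName"))
            return last_ok, when, config_map
    return None


if __name__ == "__main__":
    last_ok, failed_at, config_map = failure_window(RESULTS)
    print(f"modified between {last_ok} and {failed_at}; details in {config_map}")
```

For the example output, the modification falls between the probes at 12:54:14 and 12:57:20 UTC.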

The Failed condition points to a config map that gives more details about what exactly failed and why:

$ oc describe cm aide-ds-worker-fileintegrity-ip-10-0-130-192.ec2.internal-failed
Example output
Name:         aide-ds-worker-fileintegrity-ip-10-0-130-192.ec2.internal-failed
Namespace:    openshift-file-integrity
Labels:       file-integrity.openshift.io/node=ip-10-0-130-192.ec2.internal
              file-integrity.openshift.io/owner=worker-fileintegrity
              file-integrity.openshift.io/result-log=
Annotations:  file-integrity.openshift.io/files-added: 0
              file-integrity.openshift.io/files-changed: 1
              file-integrity.openshift.io/files-removed: 0

Data

integritylog:
------
AIDE 0.15.1 found differences between database and filesystem!!
Start timestamp: 2020-09-15 12:58:15

Summary:
  Total number of files:  31553
  Added files:                0
  Removed files:            0
  Changed files:            1


---------------------------------------------------
Changed files:
---------------------------------------------------

changed: /hostroot/etc/resolv.conf

---------------------------------------------------
Detailed information about changes:
---------------------------------------------------


File: /hostroot/etc/resolv.conf
 SHA512   : sTQYpB/AL7FeoGtu/1g7opv6C+KT1CBJ , qAeM+a8yTgHPnIHMaRlS+so61EN8VOpg

Events:  <none>
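
When many files are reported, it can help to pull the affected paths out of the integritylog text. The following Python sketch assumes the line format visible in the example output (a line such as "changed: <path>") and assumes that added and removed files are reported with analogous "added:" and "removed:" lines. It also assumes that /hostroot is the prefix under which the host file system appears in the log, as the example suggests. The function name is hypothetical.

```python
import re

# Shortened excerpt of the example log.
LOG = """\
Summary:
  Total number of files:  31553
  Added files:                0
  Removed files:            0
  Changed files:            1

changed: /hostroot/etc/resolv.conf
"""

# Matches only lowercase entries at the start of a line, so the
# "Changed files:" summary and section header lines are ignored.
ENTRY = re.compile(r"^(added|removed|changed): (\S.*)$", re.MULTILINE)


def changed_paths(log_text, host_prefix="/hostroot"):
    """Group reported paths by kind, with the host mount prefix removed."""
    found = {"added": [], "removed": [], "changed": []}
    for kind, path in ENTRY.findall(log_text):
        if path.startswith(host_prefix + "/"):
            path = path[len(host_prefix):]
        found[kind].append(path)
    return found


if __name__ == "__main__":
    print(changed_paths(LOG))
```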

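Due to the config map data size limit, AIDE logs over 1 MB are added to the failure config map as a base64-encoded gzip archive. Use the following command to extract the log:

$ oc get cm <failure-cm-name> -o json | jq -r '.data.integritylog' | base64 -d | gunzip

Compressed logs are indicated by the presence of a file-integrity.openshift.io/compressed annotation key in the config map.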
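
A script that reads failure config maps has to handle both the plain and the compressed form. The following Python sketch takes a config map parsed from oc get cm <failure-cm-name> -o json and decodes the log only when the annotation key is present. The function name is hypothetical, and the annotation value is deliberately ignored because only the presence of the key is documented.

```python
import base64
import gzip

COMPRESSED_KEY = "file-integrity.openshift.io/compressed"


def read_integrity_log(config_map):
    """Return the AIDE log text from a parsed failure config map."""
    annotations = config_map.get("metadata", {}).get("annotations") or {}
    raw = config_map["data"]["integritylog"]
    if COMPRESSED_KEY in annotations:
        # Same steps as `base64 -d | gunzip` on the command line.
        return gzip.decompress(base64.b64decode(raw)).decode("utf-8", "replace")
    return raw
```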

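Understanding events

Transitions in the status of the FileIntegrity and FileIntegrityNodeStatus objects are logged by events. The creation time of an event reflects the latest transition, such as from Initializing to Active, and not necessarily the latest scan result. However, the newest event always reflects the most recent status.

$ oc get events --field-selector reason=FileIntegrityStatus
Example output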
LAST SEEN   TYPE     REASON                OBJECT                                MESSAGE
97s         Normal   FileIntegrityStatus   fileintegrity/example-fileintegrity   Pending
67s         Normal   FileIntegrityStatus   fileintegrity/example-fileintegrity   Initializing
37s         Normal   FileIntegrityStatus   fileintegrity/example-fileintegrity   Active

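When a node scan fails, an event is created that contains the number of added, changed, and removed files and the config map information.

$ oc get events --field-selector reason=NodeIntegrityStatus
Example output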
LAST SEEN   TYPE      REASON                OBJECT                                MESSAGE
114m        Normal    NodeIntegrityStatus   fileintegrity/example-fileintegrity   no changes to node ip-10-0-134-173.ec2.internal
114m        Normal    NodeIntegrityStatus   fileintegrity/example-fileintegrity   no changes to node ip-10-0-168-238.ec2.internal
114m        Normal    NodeIntegrityStatus   fileintegrity/example-fileintegrity   no changes to node ip-10-0-169-175.ec2.internal
114m        Normal    NodeIntegrityStatus   fileintegrity/example-fileintegrity   no changes to node ip-10-0-152-92.ec2.internal
114m        Normal    NodeIntegrityStatus   fileintegrity/example-fileintegrity   no changes to node ip-10-0-158-144.ec2.internal
114m        Normal    NodeIntegrityStatus   fileintegrity/example-fileintegrity   no changes to node ip-10-0-131-30.ec2.internal
87m         Warning   NodeIntegrityStatus   fileintegrity/example-fileintegrity   node ip-10-0-152-92.ec2.internal has changed! a:1,c:1,r:0 \ log:openshift-file-integrity/aide-ds-example-fileintegrity-ip-10-0-152-92.ec2.internal-failed

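A change to the number of added, changed, or removed files results in a new event, even if the status of the node has not transitioned.

$ oc get events --field-selector reason=NodeIntegrityStatus
Example output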
LAST SEEN   TYPE      REASON                OBJECT                                MESSAGE
114m        Normal    NodeIntegrityStatus   fileintegrity/example-fileintegrity   no changes to node ip-10-0-134-173.ec2.internal
114m        Normal    NodeIntegrityStatus   fileintegrity/example-fileintegrity   no changes to node ip-10-0-168-238.ec2.internal
114m        Normal    NodeIntegrityStatus   fileintegrity/example-fileintegrity   no changes to node ip-10-0-169-175.ec2.internal
114m        Normal    NodeIntegrityStatus   fileintegrity/example-fileintegrity   no changes to node ip-10-0-152-92.ec2.internal
114m        Normal    NodeIntegrityStatus   fileintegrity/example-fileintegrity   no changes to node ip-10-0-158-144.ec2.internal
114m        Normal    NodeIntegrityStatus   fileintegrity/example-fileintegrity   no changes to node ip-10-0-131-30.ec2.internal
87m         Warning   NodeIntegrityStatus   fileintegrity/example-fileintegrity   node ip-10-0-152-92.ec2.internal has changed! a:1,c:1,r:0 \ log:openshift-file-integrity/aide-ds-example-fileintegrity-ip-10-0-152-92.ec2.internal-failed
40m         Warning   NodeIntegrityStatus   fileintegrity/example-fileintegrity   node ip-10-0-152-92.ec2.internal has changed! a:3,c:1,r:0 \ log:openshift-file-integrity/aide-ds-example-fileintegrity-ip-10-0-152-92.ec2.internal-failed
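
The a:, c:, and r: counters in the Warning messages can be extracted for alerting or reporting. The following Python sketch parses the message format shown in the example output. The function name is hypothetical, and the pattern treats the backslash before log: as optional, on the assumption that it might be a line-continuation marker rather than part of the actual message.

```python
import re

CHANGED = re.compile(
    r"node (?P<node>\S+) has changed! "
    r"a:(?P<added>\d+),c:(?P<changed>\d+),r:(?P<removed>\d+)"
    r"\s*\\?\s*log:(?P<namespace>[^/\s]+)/(?P<config_map>\S+)"
)


def parse_change_event(message):
    """Return counts and config map from a change message, else None."""
    match = CHANGED.search(message)
    if match is None:
        return None
    info = match.groupdict()
    for key in ("added", "changed", "removed"):
        info[key] = int(info[key])
    return info


if __name__ == "__main__":
    example = (r"node ip-10-0-152-92.ec2.internal has changed! a:3,c:1,r:0 \ "
               "log:openshift-file-integrity/"
               "aide-ds-example-fileintegrity-ip-10-0-152-92.ec2.internal-failed")
    print(parse_change_event(example))
```

Messages of the form "no changes to node ..." do not match the pattern, so the function returns None for them.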