Most errors occur during Collector startup, when Collector configures itself and finds or downloads a kernel driver for the system.
The following diagram describes the main parts of the Collector startup process:
If any part of the startup procedure fails, the logs display a diagnostic summary detailing which steps succeeded or failed.
The following log file example shows a successful startup:
[INFO 2022/11/28 13:21:55] == Collector Startup Diagnostics: ==
[INFO 2022/11/28 13:21:55] Connected to Sensor? true
[INFO 2022/11/28 13:21:55] Kernel driver available? true
[INFO 2022/11/28 13:21:55] Driver loaded into kernel? true
[INFO 2022/11/28 13:21:55] ====================================
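The log output confirms that Collector connected to Sensor and that it found and loaded the kernel driver. You can use this log to check whether Collector started successfully.
During startup, first check whether Collector can connect to Sensor. Sensor is responsible for downloading kernel drivers and CIDR blocks for processing network events, which makes it an essential part of the startup process. The following logs indicate that Collector cannot connect to Sensor: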
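As an illustration of checking the summary programmatically, the following minimal Python sketch reads the three "? true/false" lines from saved log text. The function names and the `SAMPLE_LOG` constant are hypothetical and are not part of Collector or RHACS; the sketch assumes the summary lines keep the format shown in the example above.

```python
import re

# Matches summary lines such as
# "[INFO 2022/11/28 13:21:55] Connected to Sensor? true".
CHECK_LINE = re.compile(r"\]\s*(?P<check>[^\]]+?)\?\s*(?P<result>true|false)\s*$")

# The three checks shown in the successful startup example.
EXPECTED_CHECKS = (
    "Connected to Sensor",
    "Kernel driver available",
    "Driver loaded into kernel",
)


def parse_startup_diagnostics(log_text):
    """Return {check name: bool} for every "<check>? true|false" log line."""
    results = {}
    for line in log_text.splitlines():
        match = CHECK_LINE.search(line)
        if match:
            results[match.group("check")] = match.group("result") == "true"
    return results


def startup_succeeded(log_text):
    """Report success only if all three expected checks are present and true."""
    results = parse_startup_diagnostics(log_text)
    return all(results.get(name, False) for name in EXPECTED_CHECKS)


# Hypothetical sample: the successful summary from the example above.
SAMPLE_LOG = """\
[INFO 2022/11/28 13:21:55] == Collector Startup Diagnostics: ==
[INFO 2022/11/28 13:21:55] Connected to Sensor? true
[INFO 2022/11/28 13:21:55] Kernel driver available? true
[INFO 2022/11/28 13:21:55] Driver loaded into kernel? true
[INFO 2022/11/28 13:21:55] ====================================
"""

if __name__ == "__main__":
    print(startup_succeeded(SAMPLE_LOG))
```

A missing check counts as a failure in this sketch, because failed startups in the examples that follow omit some summary lines entirely.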
Collector Version: 3.15.0
OS: Ubuntu 20.04.4 LTS
Kernel Version: 5.4.0-126-generic
Starting StackRox Collector...
[INFO 2023/05/13 12:20:43] Hostname: 'hostname'
[INFO 2023/05/13 12:20:43] Sensor configured at address: sensor.stackrox.svc:9998
[INFO 2023/05/13 12:20:43] Attempting to connect to Sensor
[INFO 2023/05/13 12:21:13]
[INFO 2023/05/13 12:21:13] == Collector Startup Diagnostics: ==
[INFO 2023/05/13 12:21:13] Connected to Sensor? false
[INFO 2023/05/13 12:21:13] Kernel driver candidates:
[INFO 2023/05/13 12:21:13] ====================================
[INFO 2023/05/13 12:21:13]
[FATAL 2023/05/13 12:21:13] Unable to connect to Sensor.
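This error can mean that Sensor did not start correctly or that Collector is misconfigured. To fix this issue, verify the Collector configuration to ensure that the Sensor address is correct and that the Sensor pod is running properly.
View the Collector logs to check the configured Sensor address. Alternatively, you can run the following command: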
$ kubectl -n stackrox get pod <collector_pod_name> -o jsonpath='{.spec.containers[0].env[?(@.name=="GRPC_SERVER")].value}' (1)
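1 | For <collector_pod_name>, specify the name of your Collector pod, for example, collector-vclg5. |
Collector determines whether it has a kernel driver for the node's kernel version. Collector first searches local storage for a driver with the correct version and type, and then attempts to download the driver from Sensor. The following logs indicate that neither a local kernel driver nor a driver from Sensor is present: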
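If you prefer to read the address from saved Collector logs, a minimal Python sketch such as the following extracts it from the "Sensor configured at address" line. The function names are hypothetical. The optional `can_reach` helper only attempts a plain TCP connection, and it assumes you run it from a place where the Sensor service name resolves, such as a pod inside the cluster; it does not verify TLS or the gRPC service.

```python
import re
import socket

# Matches lines such as
# "[INFO ...] Sensor configured at address: sensor.stackrox.svc:9998".
ADDRESS_LINE = re.compile(
    r"Sensor configured at address:\s*(?P<host>\S+):(?P<port>\d+)\s*$"
)


def configured_sensor_address(log_text):
    """Return (host, port) from the first matching log line, or None."""
    for line in log_text.splitlines():
        match = ADDRESS_LINE.search(line)
        if match:
            return match.group("host"), int(match.group("port"))
    return None


def can_reach(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For the failing log shown earlier, `configured_sensor_address` returns the host `sensor.stackrox.svc` and the port `9998`, which you can compare against the address of your Sensor service.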
Collector Version: 3.15.0
OS: Alpine Linux v3.16
Kernel Version: 5.15.82-0-virt
Starting StackRox Collector...
[INFO 2023/05/30 12:00:33] Hostname: 'alpine'
[INFO 2023/05/30 12:00:33] User configured collection-method=ebpf
[INFO 2023/05/30 12:00:33] Afterglow is enabled
[INFO 2023/05/30 12:00:33] Sensor configured at address: sensor.stackrox.svc:443
[INFO 2023/05/30 12:00:33] Attempting to connect to Sensor
[INFO 2023/05/30 12:00:33] Successfully connected to Sensor.
[INFO 2023/05/30 12:00:33] Module version: 2.5.0-rc1
[INFO 2023/05/30 12:00:33] Config: collection_method:0, useChiselCache:1, scrape_interval:30, turn_off_scrape:0, hostname:alpine, processesListeningOnPorts:1, logLevel:INFO
[INFO 2023/05/30 12:00:33] Attempting to find eBPF probe - Candidate versions:
[INFO 2023/05/30 12:00:33] collector-ebpf-5.15.82-0-virt.o
[INFO 2023/05/30 12:00:33] Attempting to download collector-ebpf-5.15.82-0-virt.o
[INFO 2023/05/30 12:00:33] Attempting to download kernel object from https://sensor.stackrox.svc:443/kernel-objects/2.5.0/collector-ebpf-5.15.82-0-virt.o.gz (1)
[INFO 2023/05/30 12:00:33] HTTP Request failed with error code 404 (2)
[WARNING 2023/05/30 12:02:03] Attempted to download collector-ebpf-5.15.82-0-virt.o.gz 90 time(s)
[WARNING 2023/05/30 12:02:03] Failed to download from collector-ebpf-5.15.82-0-virt.o.gz
[WARNING 2023/05/30 12:02:03] Unable to download kernel object collector-ebpf-5.15.82-0-virt.o to /module/collector-ebpf.o.gz
[WARNING 2023/05/30 12:02:03] No suitable kernel object downloaded for collector-ebpf-5.15.82-0-virt.o
[ERROR 2023/05/30 12:02:03] Failed to initialize collector kernel components.
[INFO 2023/05/30 12:02:03]
[INFO 2023/05/30 12:02:03] == Collector Startup Diagnostics: ==
[INFO 2023/05/30 12:02:03] Connected to Sensor? true
[INFO 2023/05/30 12:02:03] Kernel driver candidates:
[INFO 2023/05/30 12:02:03] collector-ebpf-5.15.82-0-virt.o (unavailable)
[INFO 2023/05/30 12:02:03] ====================================
[INFO 2023/05/30 12:02:03]
[FATAL 2023/05/30 12:02:03] Failed to initialize collector kernel components. (3)
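1 | The logs show that Collector first tries to locate the module and then tries to download the driver from Sensor. |
2 | The 404 error indicates that no kernel driver is available for the node's kernel. |
3 | Because the driver is missing, Collector enters the CrashLoopBackOff state. |
Before Collector starts, it loads the kernel driver. In rare cases, however, Collector cannot load the kernel driver, which results in various error messages or exceptions. In such cases, check the logs to identify why loading the kernel driver failed.
Consider the following Collector logs: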
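To spot this failure in saved logs without reading every line, the following minimal Python sketch lists the driver candidates from the diagnostic summary together with any HTTP error codes logged during the download attempts. The function names are hypothetical, and the sketch assumes the "(available)" and "(unavailable)" markers and the "HTTP Request failed with error code" message keep the format shown in these examples.

```python
import re

# Matches summary lines such as
# "[INFO ...] collector-ebpf-5.15.82-0-virt.o (unavailable)".
CANDIDATE_LINE = re.compile(
    r"\]\s*(?P<name>collector-\S+\.o)\s+\((?P<status>available|unavailable)\)\s*$"
)

# Matches download failures such as
# "[INFO ...] HTTP Request failed with error code 404".
HTTP_ERROR_LINE = re.compile(r"HTTP Request failed with error code (?P<code>\d+)")


def driver_candidates(log_text):
    """Return {driver file name: True if the summary marks it available}."""
    candidates = {}
    for line in log_text.splitlines():
        match = CANDIDATE_LINE.search(line)
        if match:
            candidates[match.group("name")] = match.group("status") == "available"
    return candidates


def http_error_codes(log_text):
    """Return every HTTP error code logged during download attempts, in order."""
    return [int(m.group("code")) for m in HTTP_ERROR_LINE.finditer(log_text)]
```

A candidate marked unavailable combined with a 404 code corresponds to the case described in the callouts above: neither local storage nor Sensor has a driver for the node's kernel.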
[INFO 2023/05/13 14:25:13] Hostname: 'hostname'
[INFO 2023/05/13 14:25:13] Successfully downloaded and decompressed /module/collector.o
[INFO 2023/05/13 14:25:13]
[INFO 2023/05/13 14:25:13] This product uses ebpf subcomponents licensed under the GNU
[INFO 2023/05/13 14:25:13] GENERAL PURPOSE LICENSE Version 2 outlined in the /kernel-modules/LICENSE file.
[INFO 2023/05/13 14:25:13] Source code for the ebpf subcomponents is available at
[INFO 2023/05/13 14:25:13] https://github.com/stackrox/falcosecurity-libs/
[INFO 2023/05/13 14:25:13]
[WARNING 2023/05/13 14:25:13] libscap: bpf_load_program() event=tracepoint/syscalls/sys_enter_chdir: Operation not permitted
[ERROR 2023/05/13 14:25:13] Failed to setup collector-ebpf-6.2.0-20-generic.o
[ERROR 2023/05/13 14:25:13] Failed to initialize collector kernel components.
[INFO 2023/05/13 14:25:13]
[INFO 2023/05/13 14:25:13] == Collector Startup Diagnostics: ==
[INFO 2023/05/13 14:25:13] Connected to Sensor? true
[INFO 2023/05/13 14:25:13] Kernel driver candidates:
[INFO 2023/05/13 14:25:13] collector-ebpf-6.2.0-20-generic.o (available)
[INFO 2023/05/13 14:25:13] ====================================
[INFO 2023/05/13 14:25:13]
[FATAL 2023/05/13 14:25:13] Failed to initialize collector kernel components.
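If you encounter this type of error, you are unlikely to be able to fix it yourself. Instead, report the issue to the Red Hat Advanced Cluster Security for Kubernetes (RHACS) support team or create a GitHub issue.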
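When you prepare such a report, it can help to include the warning, error, and fatal lines from the Collector log. The following minimal Python sketch selects them from saved log text. The function name and the level ordering are assumptions made for illustration, based only on the four levels that appear in the examples above (INFO, WARNING, ERROR, FATAL).

```python
import re

# Severity order assumed from the levels seen in the example logs.
LEVELS = {"INFO": 0, "WARNING": 1, "ERROR": 2, "FATAL": 3}

# Matches lines such as "[ERROR 2023/05/13 14:25:13] Failed to setup ...".
LOG_LINE = re.compile(
    r"^\[(?P<level>[A-Z]+)\s+\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\]\s?(?P<message>.*)$"
)


def problem_lines(log_text, min_level="WARNING"):
    """Return (level, message) pairs for lines at or above min_level."""
    threshold = LEVELS[min_level]
    selected = []
    for line in log_text.splitlines():
        match = LOG_LINE.match(line.strip())
        # Lines without a recognized level (for example the version banner)
        # are skipped.
        if match and LEVELS.get(match.group("level"), -1) >= threshold:
            selected.append((match.group("level"), match.group("message")))
    return selected
```

Applied to the log above, this keeps the `libscap: bpf_load_program()` warning and the lines that follow it at ERROR and FATAL level, which are the entries most relevant to a driver load failure.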