kube-proxy Pods Stuck in a CrashLoopBackOff Cycle
UPDATE MAY 5, 2025:
Kernel version 6.8.0-59-generic fixes the issue.
The other day, I realized that the kube-proxy pods in my homelab were constantly crashing and restarting. I'm running an RKE2 cluster, version v1.32.3+rke2r1.
The kube-proxy logs looked like this:
```
E0408 13:28:27.883039 1 proxier.go:1511] "Failed to execute iptables-restore" err=<
	exit status 2: Warning: Extension MARK revision 0 not supported, missing kernel module?
	ip6tables-restore v1.8.9 (nf_tables): unknown option "--xor-mark"
	Error occurred at line: 17
	Try `ip6tables-restore -h' or 'ip6tables-restore --help' for more information.
 >
```
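For reference, I was pulling those logs straight from the pod (on RKE2, kube-proxy runs as a static pod named after the node, so adjust the pod name for your setup):

```bash
# Tail the kube-proxy logs on one node's pod
kubectl logs -n kube-system kube-proxy-<node-name> --tail=50
```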
I decided to dive deep down the rabbit hole.
Trying Configuration Changes
I attempted to modify kube-proxy to use IPv4-only mode by adding this to the RKE2 config (/etc/rancher/rke2/config.yaml):
```yaml
kube-proxy-arg:
  - "ipfamily=ipv4"
```
Result: The RKE2 agent service failed to start with this configuration.
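The failure itself showed up in the agent's journal (assuming the standard rke2-agent systemd unit; it's rke2-server on server nodes):

```bash
# Watch the RKE2 agent logs while it tries to start
sudo journalctl -u rke2-agent -f
```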
Testing Other Configuration Options
I tried more specific kube-proxy arguments:
```yaml
kube-proxy-arg:
  - "proxy-mode=iptables"
  - "v6-cluster-cidr=none"
```
Result: This also caused the RKE2 agent service to fail.
System-Level IPv6 Disabling
I tried disabling IPv6 at the system level:
```bash
sudo sysctl -w net.ipv6.conf.all.disable_ipv6=1
sudo sysctl -w net.ipv6.conf.default.disable_ipv6=1
```
And I made it persistent in /etc/sysctl.conf.
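Roughly like this (a sketch of the persistence step; dropping a file under /etc/sysctl.d/ would work just as well):

```bash
# Persist the IPv6 toggles across reboots, then reload
echo "net.ipv6.conf.all.disable_ipv6=1"     | sudo tee -a /etc/sysctl.conf
echo "net.ipv6.conf.default.disable_ipv6=1" | sudo tee -a /etc/sysctl.conf
sudo sysctl -p
```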
Result: IPv6 was disabled, but kube-proxy still tried to configure IPv6 rules and failed.
Attempting to Load Missing Kernel Modules
I tried loading the required modules directly:
```bash
sudo modprobe xt_mark
sudo modprobe xt_MARK
```
Result: modprobe reported that the module xt_MARK doesn't exist in the 6.8.0-57-generic kernel.
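Out of curiosity, you can also check what actually ships with the running kernel (a quick sketch; the -iname glob catches both casings):

```bash
# List mark-related xtables modules bundled with the running kernel
find /lib/modules/$(uname -r) -iname 'xt_mark*'
```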
Kernel Version Comparison
I discovered that an older kernel (6.8.0-49-generic) on the same Ubuntu version didn't exhibit this issue, suggesting a regression in the newer kernel's nftables/iptables compatibility layer.
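If you want to run the same comparison on your own nodes, this is roughly what I used (plain Ubuntu tooling, nothing cluster-specific):

```bash
uname -r                     # kernel the node is currently running
dpkg --list 'linux-image-*'  # kernel packages installed on the node
```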
And now, the root cause and the fix.
If you have a node running Ubuntu kernel version 6.8.0-56-generic or later (as of Apr. 9, 2025), all kube-proxy pods will enter a CrashLoopBackOff state due to a missing kernel module.
This issue persists in the current version, 6.8.0-57-generic (which is what I was running in my homelab). The last confirmed working kernel version is 6.8.0-55-generic; I pinned this down by upgrading and downgrading kernels on my nodes until I found where the bug was introduced.
This narrows down the exact kernel versions affected and gives a clear path for remediation. To fix it, you have two options: either downgrade affected nodes to kernel version 6.8.0-55-generic or earlier, or wait for a fix in a future kernel release.
I chose to downgrade. After I rolled back the kernel on my nodes to version 6.8.0-55-generic and rebooted them, the kube-proxy pods magically started working.
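For the curious, the downgrade itself looked roughly like this (package names follow Ubuntu's usual kernel packaging; double-check with apt-cache search before running):

```bash
# Install the known-good kernel and its module packages
sudo apt install linux-image-6.8.0-55-generic \
                 linux-modules-6.8.0-55-generic \
                 linux-modules-extra-6.8.0-55-generic
# Select 6.8.0-55-generic from GRUB's "Advanced options" on the next boot,
# or remove the broken kernel packages so it becomes the default
sudo reboot
```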
Look at that! 20 hours running and no restarts!
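If you want to check your own cluster, something like this does the trick (RKE2 runs kube-proxy as a per-node static pod, so there is one per node):

```bash
# Age and restart count for every kube-proxy pod
kubectl get pods -n kube-system | grep kube-proxy
```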
Side note: Upgrading to kernel version 6.11.0-21-generic also works. I prefer to stick with the kernel series that ships natively with Ubuntu 24.04 (Noble).
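If you'd rather go up than down, the 6.11 series comes from the HWE stack (assuming Ubuntu 24.04; the meta-package name below is the usual HWE convention, so verify it on your release):

```bash
# Pull in the 24.04 HWE kernel (6.11 series at the time of writing)
sudo apt install linux-generic-hwe-24.04
sudo reboot
```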
Cheers,
Joe