Use Cilium to fully replace kube-proxy
May 12, 2022
This guide explains how to provision a Kubernetes cluster without kube-proxy and use Cilium to fully replace it. For simplicity, we will use kubeadm to bootstrap the cluster.
- Cilium’s kube-proxy replacement depends on the Host-Reachable Services feature; therefore, a v4.19.57, v5.1.16, v5.2.0 or more recent Linux kernel is required. Linux kernels v5.3 and v5.8 add additional features that Cilium can use to further optimize the kube-proxy replacement implementation.
- Note that v5.0.y kernels do not have the fix required to run the kube-proxy replacement, since the v5.0.y stable kernel is end-of-life (EOL) and no longer maintained on kernel.org. Distribution-maintained kernels may differ, so please check with your distribution.
Update the kernel
# Update CentOS Repositories
yum -y update
# Enable the ELRepo Repository
rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
# Install the ELRepo repository
rpm -Uvh https://www.elrepo.org/elrepo-release-7.0-3.el7.elrepo.noarch.rpm
# List Available Kernels
yum list available --disablerepo='*' --enablerepo=elrepo-kernel
>Loaded plugins: fastestmirror
>Loading mirror speeds from cached hostfile
> * elrepo-kernel: mirror.rackspace.com
>Available Packages
>kernel-lt.x86_64 5.4.192-1.el7.elrepo elrepo-kernel
>kernel-lt-devel.x86_64 5.4.192-1.el7.elrepo elrepo-kernel
>kernel-lt-doc.noarch 5.4.192-1.el7.elrepo elrepo-kernel
>kernel-lt-headers.x86_64 5.4.192-1.el7.elrepo elrepo-kernel
>kernel-lt-tools.x86_64 5.4.192-1.el7.elrepo elrepo-kernel
>kernel-lt-tools-libs.x86_64 5.4.192-1.el7.elrepo elrepo-kernel
>kernel-lt-tools-libs-devel.x86_64 5.4.192-1.el7.elrepo elrepo-kernel
>kernel-ml-doc.noarch 5.17.6-1.el7.elrepo elrepo-kernel
>kernel-ml-tools.x86_64 5.17.6-1.el7.elrepo elrepo-kernel
>kernel-ml-tools-libs.x86_64 5.17.6-1.el7.elrepo elrepo-kernel
>kernel-ml-tools-libs-devel.x86_64 5.17.6-1.el7.elrepo elrepo-kernel
>perf.x86_64 5.17.6-1.el7.elrepo elrepo-kernel
>python-perf.x86_64 5.17.6-1.el7.elrepo elrepo-kernel
# Install New CentOS Kernel Version
yum --enablerepo=elrepo-kernel install kernel-ml kernel-ml-devel kernel-ml-headers
# Set Default Kernel Version
# In the file, find the line GRUB_DEFAULT=X and change it to GRUB_DEFAULT=0 (zero); this tells the
# boot loader to default to the first kernel in the list, which is the latest one.
vim /etc/default/grub
grub2-mkconfig -o /boot/grub2/grub.cfg
reboot
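After the reboot, confirm the node actually came up on the new kernel (the version string below is just an example and will match whichever kernel you installed):
# verify the running kernel
uname -r
>5.17.6-1.el7.elrepo.x86_64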
Install containerd and crictl
Forwarding IPv4 and letting iptables see bridged traffic
cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF
sudo modprobe overlay
sudo modprobe br_netfilter
# sysctl params required by setup, params persist across reboots
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
EOF
# Apply sysctl params without reboot
sudo sysctl --system
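As an optional sanity check, confirm that the modules are loaded and the sysctl values actually took effect:
# both modules should be listed
lsmod | grep -E 'overlay|br_netfilter'
# all three values should print as 1
sysctl net.bridge.bridge-nf-call-iptables net.bridge.bridge-nf-call-ip6tables net.ipv4.ip_forward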
Install containerd by following the official documentation.
Configure containerd and crictl
# generate the default containerd config
mkdir -p /etc/containerd
containerd config default | tee /etc/containerd/config.toml
# switch containerd to the systemd cgroup driver
sed -i "s/SystemdCgroup = false/SystemdCgroup = true/g" /etc/containerd/config.toml
# restart containerd
systemctl restart containerd
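Before moving on, it does not hurt to check that the sed above really flipped the cgroup driver:
# should print: SystemdCgroup = true
grep SystemdCgroup /etc/containerd/config.toml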
# point crictl at the containerd socket
cat > /etc/crictl.yaml <<EOF
runtime-endpoint: unix:///run/containerd/containerd.sock
image-endpoint: unix:///run/containerd/containerd.sock
EOF
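With this in place, crictl should be able to reach containerd; a quick smoke test (versions will differ on your system):
# prints the crictl client version and the containerd runtime version
crictl version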
Install kubeadm
Follow the official kubeadm documentation to install kubeadm, kubelet, and kubectl.
Install helm
Follow the official helm documentation to install helm.
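After installation, a quick version check confirms the binaries are on the PATH (the exact versions depend on what you installed):
kubeadm version -o short
kubelet --version
helm version --short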
Quick-Start
Initialize the control-plane node via kubeadm init and skip the installation of the kube-proxy add-on:
kubeadm init --pod-network-cidr=10.244.0.0/16 --service-cidr=10.96.0.0/16 --skip-phases=addon/kube-proxy
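kubeadm init prints its follow-up steps at the end of its output; on the control-plane node they usually boil down to pointing kubectl at the generated admin kubeconfig:
# let kubectl use the cluster-admin credentials created by kubeadm
mkdir -p $HOME/.kube
cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
chown $(id -u):$(id -g) $HOME/.kube/config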
# optional: remove the control-plane taints so regular workloads can be scheduled on this node (single-node clusters)
kubectl taint nodes --all node-role.kubernetes.io/master-
kubectl taint nodes --all node-role.kubernetes.io/control-plane-
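If you removed the taints, you can confirm they are gone:
# should report Taints: <none>
kubectl describe nodes | grep -i taints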
Set up the Helm repository
helm repo add cilium https://helm.cilium.io/
Next, generate the required YAML files and deploy them. Important: replace REPLACE_WITH_API_SERVER_IP and REPLACE_WITH_API_SERVER_PORT below with the concrete control-plane node IP address and the kube-apiserver port number reported by kubeadm init (usually port 6443).
Specifying this is necessary because kubeadm init is run explicitly without setting up kube-proxy; as a consequence, although it exports KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT with a ClusterIP of the kube-apiserver service to the environment, there is no kube-proxy in our setup provisioning that service. The Cilium agent therefore needs to be made aware of this information through the configuration below (here, k8sServiceHost and k8sServicePort).
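If you are unsure which values to use, the kubeconfig generated by kubeadm already contains them:
# prints something like https://172.16.127.45:6443 -> host and port for k8sServiceHost/k8sServicePort
kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}'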
helm install cilium cilium/cilium --version 1.11.4 \
--namespace kube-system \
--set operator.replicas=1 \
--set nodeinit.enabled=true \
--set nodeinit.restartPods=true \
--set externalIPs.enabled=true \
--set nodePort.enabled=true \
--set hostPort.enabled=true \
--set tunnel=disabled \
--set bpf.masquerade=true \
--set bpf.clockProbe=true \
--set bpf.waitForMount=true \
--set bpf.preallocateMaps=true \
--set bpf.tproxy=true \
--set bpf.hostRouting=true \
--set autoDirectNodeRoutes=true \
--set localRedirectPolicy=true \
--set enableCiliumEndpointSlice=true \
--set enableK8sEventHandover=true \
--set enableK8sEndpointSlice=true \
--set wellKnownIdentities.enabled=true \
--set sockops.enabled=true \
--set ipam.operator.clusterPoolIPv4PodCIDRList=10.244.0.0/16 \
--set ipv4NativeRoutingCIDR=10.244.0.0/16 \
--set nodePort.directRoutingDevice=eth0 \
--set devices=eth0 \
--set bandwidthManager=true \
--set hubble.enabled=true \
--set hubble.relay.enabled=true \
--set hubble.ui.enabled=true \
--set installNoConntrackIptablesRules=true \
--set egressGateway.enabled=true \
--set endpointRoutes.enabled=true \
--set pullPolicy=IfNotPresent \
--set kubeProxyReplacement=strict \
--set loadBalancer.algorithm=maglev \
--set loadBalancer.mode=dsr \
--set hostServices.enabled=true \
--set k8sServiceHost=172.16.127.45 \
--set k8sServicePort=6443
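Pulling the images can take a minute or two; waiting on the DaemonSet rollout is a convenient way to block until the Cilium agent is up:
# returns once all cilium agent pods report ready
kubectl -n kube-system rollout status daemonset/cilium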
Check the Cilium state
root@test ~ 10:31:38 # kubectl -n kube-system get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
cilium-8wvvp 1/1 Running 0 32m 172.16.127.45 test <none> <none>
cilium-node-init-f5rmz 1/1 Running 0 32m 172.16.127.45 test <none> <none>
cilium-operator-7469d54548-4pr9c 1/1 Running 0 32m 172.16.127.45 test <none> <none>
coredns-6d4b75cb6d-956sh 1/1 Running 0 33m 10.244.0.146 test <none> <none>
coredns-6d4b75cb6d-wjk9p 1/1 Running 0 33m 10.244.0.105 test <none> <none>
etcd-test 1/1 Running 1 34m 172.16.127.45 test <none> <none>
kube-apiserver-test 1/1 Running 0 34m 172.16.127.45 test <none> <none>
kube-controller-manager-test 1/1 Running 0 34m 172.16.127.45 test <none> <none>
kube-scheduler-test 1/1 Running 1 34m 172.16.127.45 test <none> <none>
Validate the setup, using --verbose for full details:
root@test ~ 10:40:14 # kubectl exec -it -n kube-system cilium-8wvvp -- cilium status --verbose
Defaulted container "cilium-agent" out of: cilium-agent, mount-cgroup (init), wait-for-node-init (init), clean-cilium-state (init)
KVStore: Ok Disabled
Kubernetes: Ok 1.24 (v1.24.0) [linux/amd64]
Kubernetes APIs: ["cilium/v2::CiliumClusterwideNetworkPolicy", "cilium/v2::CiliumEgressNATPolicy", "cilium/v2::CiliumLocalRedirectPolicy", "cilium/v2::CiliumNetworkPolicy", "cilium/v2::CiliumNode", "cilium/v2alpha1::CiliumEndpointSlice", "core/v1::Namespace", "core/v1::Node", "core/v1::Pods", "core/v1::Service", "discovery/v1::EndpointSlice", "networking.k8s.io/v1::NetworkPolicy"]
KubeProxyReplacement: Strict [eth0 172.16.127.45 (Direct Routing)]
Host firewall: Disabled
Cilium: Ok 1.11.4 (v1.11.4-9d25463)
NodeMonitor: Disabled
Cilium health daemon: Ok
IPAM: IPv4: 4/254 allocated from 10.244.0.0/24,
Allocated addresses:
10.244.0.105 (kube-system/coredns-6d4b75cb6d-wjk9p)
10.244.0.146 (kube-system/coredns-6d4b75cb6d-956sh)
10.244.0.225 (router)
10.244.0.239 (health)
BandwidthManager: EDT with BPF [eth0]
Host Routing: Legacy
Masquerading: BPF [eth0] 10.244.0.0/16 [IPv4: Enabled, IPv6: Disabled]
Clock Source for BPF: ktime
Controller Status: 30/30 healthy
Name Last success Last error Count Message
bpf-map-sync-cilium_ipcache 1s ago 34m13s ago 0 no error
bpf-map-sync-cilium_throttle 5s ago never 0 no error
cilium-health-ep 54s ago never 0 no error
dns-garbage-collector-job 12s ago never 0 no error
endpoint-1338-regeneration-recovery never never 0 no error
endpoint-428-regeneration-recovery never never 0 no error
endpoint-723-regeneration-recovery never never 0 no error
endpoint-815-regeneration-recovery never never 0 no error
endpoint-gc 4m13s ago never 0 no error
ipcache-inject-labels 34m9s ago 34m11s ago 0 no error
k8s-heartbeat 13s ago never 0 no error
mark-k8s-node-as-available 33m55s ago never 0 no error
metricsmap-bpf-prom-sync 2s ago never 0 no error
resolve-identity-1338 3m55s ago never 0 no error
resolve-identity-428 3m56s ago never 0 no error
resolve-identity-723 3m55s ago never 0 no error
resolve-identity-815 3m55s ago never 0 no error
sync-endpoints-and-host-ips 56s ago never 0 no error
sync-lb-maps-with-k8s-services 33m56s ago never 0 no error
sync-node-with-ciliumnode (test) 34m10s ago 34m11s ago 0 no error
sync-policymap-1338 43s ago never 0 no error
sync-policymap-428 42s ago never 0 no error
sync-policymap-723 43s ago never 0 no error
sync-policymap-815 43s ago never 0 no error
sync-to-k8s-ciliumendpoint (1338) 5s ago never 0 no error
sync-to-k8s-ciliumendpoint (428) 5s ago never 0 no error
sync-to-k8s-ciliumendpoint (723) 5s ago never 0 no error
sync-to-k8s-ciliumendpoint (815) 4s ago never 0 no error
template-dir-watcher never never 0 no error
update-k8s-node-annotations 34m11s ago never 0 no error
Proxy Status: OK, ip 10.244.0.225, 0 redirects active on ports 10000-20000
Hubble: Disabled
KubeProxyReplacement Details:
Status: Strict
Socket LB Protocols: TCP, UDP
Devices: eth0 172.16.127.45 (Direct Routing)
Mode: DSR
Backend Selection: Maglev (Table Size: 16381)
Session Affinity: Enabled
Graceful Termination: Enabled
XDP Acceleration: Disabled
Services:
- ClusterIP: Enabled
- NodePort: Enabled (Range: 30000-32767)
- LoadBalancer: Enabled
- externalIPs: Enabled
- HostPort: Enabled
BPF Maps: dynamic sizing: on (ratio: 0.002500)
Name Size
Non-TCP connection tracking 65536
TCP connection tracking 131072
Endpoint policy 65535
Events 64
IP cache 512000
IP masquerading agent 16384
IPv4 fragmentation 8192
IPv4 service 65536
IPv6 service 65536
IPv4 service backend 65536
IPv6 service backend 65536
IPv4 service reverse NAT 65536
IPv6 service reverse NAT 65536
Metrics 1024
NAT 131072
Neighbor table 131072
Global policy 16384
Per endpoint policy 65536
Session affinity 65536
Signal 64
Sockmap 65535
Sock reverse NAT 65536
Tunnel 65536
Encryption: Disabled
Cluster health: 1/1 reachable (2022-05-12T02:41:03Z)
Name IP Node Endpoints
test (localhost) 172.16.127.45 reachable reachable
Verify that services are handled by Cilium:
root@test ~ 10:43:41 # kubectl exec -it -n kube-system cilium-8wvvp -- cilium service list
Defaulted container "cilium-agent" out of: cilium-agent, mount-cgroup (init), wait-for-node-init (init), clean-cilium-state (init)
ID Frontend Service Type Backend
1 10.96.0.1:443 ClusterIP 1 => 172.16.127.45:6443
2 10.96.0.10:53 ClusterIP 1 => 10.244.0.146:53
2 => 10.244.0.105:53
3 10.96.0.10:9153 ClusterIP 1 => 10.244.0.146:9153
2 => 10.244.0.105:9153
At the same time we can verify, using iptables in the host namespace, that no iptables rule for the service is present:
root@test ~ 10:43:17 # iptables-save | grep KUBE-SVC
(no output)
Similarly, using ipvsadm in the host namespace, we can verify that no IPVS rule for the service is present:
root@test ~ 10:43:12 # ipvsadm -ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
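As a final end-to-end check (not part of the original Cilium validation; the NodePort value is a placeholder you need to look up), you can expose a test deployment through a NodePort service and curl it via the node IP. The request is load-balanced entirely by Cilium's eBPF datapath, with no iptables or IPVS rules involved:
kubectl create deployment nginx --image=nginx
kubectl expose deployment nginx --port=80 --type=NodePort
# look up the allocated NodePort (30000-32767) ...
kubectl get svc nginx
# ... then hit it on the node IP; replace <node-port> with the port shown above
curl -s http://172.16.127.45:<node-port> | head -n 4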
References
- cilium (Kubernetes without kube-proxy): https://docs.cilium.io/en/v1.11/gettingstarted/kubeproxy-free/
- install containerd: https://github.com/containerd/containerd/blob/main/docs/getting-started.md
- create-cluster-kubeadm: https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/
- container-runtimes: https://kubernetes.io/docs/setup/production-environment/container-runtimes/
- helm install: https://helm.sh/docs/intro/install/