Use Cilium to fully replace kube-proxy

May 12, 2022


This guide explains how to provision a Kubernetes cluster without kube-proxy, and to use Cilium to fully replace it. For simplicity, we will use kubeadm to bootstrap the cluster.

  • Cilium’s kube-proxy replacement depends on the Host-Reachable Services feature, therefore a v4.19.57, v5.1.16, v5.2.0 or more recent Linux kernel is required. Linux kernels v5.3 and v5.8 add additional features that Cilium can use to further optimize the kube-proxy replacement implementation.
  • Note that v5.0.y kernels do not have the fix required to run the kube-proxy replacement since at this point in time the v5.0.y stable kernel is end-of-life (EOL) and not maintained anymore on kernel.org. For individual distribution maintained kernels, the situation could differ. Therefore, please check with your distribution.

Update kernel

# Update CentOS Repositories
yum -y update
# Enable the ELRepo Repository
rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
# Install the ELRepo repository
rpm -Uvh https://www.elrepo.org/elrepo-release-7.0-3.el7.elrepo.noarch.rpm
# List Available Kernels
yum list available --disablerepo='*' --enablerepo=elrepo-kernel
>Loaded plugins: fastestmirror
>Loading mirror speeds from cached hostfile
> * elrepo-kernel: mirror.rackspace.com
>Available Packages
>kernel-lt.x86_64                     5.4.192-1.el7.elrepo            elrepo-kernel
>kernel-lt-devel.x86_64               5.4.192-1.el7.elrepo            elrepo-kernel
>kernel-lt-doc.noarch                 5.4.192-1.el7.elrepo            elrepo-kernel
>kernel-lt-headers.x86_64             5.4.192-1.el7.elrepo            elrepo-kernel
>kernel-lt-tools.x86_64               5.4.192-1.el7.elrepo            elrepo-kernel
>kernel-lt-tools-libs.x86_64          5.4.192-1.el7.elrepo            elrepo-kernel
>kernel-lt-tools-libs-devel.x86_64    5.4.192-1.el7.elrepo            elrepo-kernel
>kernel-ml-doc.noarch                 5.17.6-1.el7.elrepo             elrepo-kernel
>kernel-ml-tools.x86_64               5.17.6-1.el7.elrepo             elrepo-kernel
>kernel-ml-tools-libs.x86_64          5.17.6-1.el7.elrepo             elrepo-kernel
>kernel-ml-tools-libs-devel.x86_64    5.17.6-1.el7.elrepo             elrepo-kernel
>perf.x86_64                          5.17.6-1.el7.elrepo             elrepo-kernel
>python-perf.x86_64                   5.17.6-1.el7.elrepo             elrepo-kernel
# Install New CentOS Kernel Version
yum --enablerepo=elrepo-kernel install kernel-ml kernel-ml-devel kernel-ml-headers
# Set Default Kernel Version
vim /etc/default/grub  # Once the file opens, look for the line that says GRUB_DEFAULT=X, and change it to GRUB_DEFAULT=0 (zero). This line will instruct the boot loader to default to the first kernel on the list, which is the latest.
grub2-mkconfig -o /boot/grub2/grub.cfg
reboot
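
After the reboot, confirm that the node actually booted into the new kernel:

# should report the kernel-ml version installed above (5.17.x-1.el7.elrepo.x86_64)
uname -r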

Install containerd and crictl

Forwarding IPv4 and letting iptables see bridged traffic

cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF

sudo modprobe overlay
sudo modprobe br_netfilter

# sysctl params required by setup, params persist across reboots
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables  = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward                 = 1
EOF

# Apply sysctl params without reboot
sudo sysctl --system
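
Optionally, verify that the modules are loaded and that the sysctl values took effect (all three should print 1):

# confirm the kernel modules are loaded
lsmod | grep -e overlay -e br_netfilter
# confirm the sysctl settings are applied
sysctl net.bridge.bridge-nf-call-iptables net.bridge.bridge-nf-call-ip6tables net.ipv4.ip_forward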

Install containerd and crictl by following the official documentation.
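
For reference, the following is a minimal sketch of one common CentOS 7 install path, using the Docker CE yum repository for the containerd.io package and the kubernetes-sigs/cri-tools GitHub releases for crictl; the crictl version is only an example, so pick one matching your Kubernetes minor version.

# add the Docker CE repository, which ships the containerd.io package
yum install -y yum-utils
yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
yum install -y containerd.io
systemctl enable --now containerd
# install crictl from the cri-tools releases
VERSION="v1.24.0"
curl -LO https://github.com/kubernetes-sigs/cri-tools/releases/download/${VERSION}/crictl-${VERSION}-linux-amd64.tar.gz
tar -xzf crictl-${VERSION}-linux-amd64.tar.gz -C /usr/local/bin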
Configure

# generate a default containerd config
containerd config default | tee /etc/containerd/config.toml
# switch containerd to the systemd cgroup driver
sed -i "s/SystemdCgroup = false/SystemdCgroup = true/g" /etc/containerd/config.toml
# restart containerd to pick up the new config
systemctl restart containerd
# point crictl at the containerd socket
cat > /etc/crictl.yaml <<EOF
runtime-endpoint: unix:///run/containerd/containerd.sock
image-endpoint: unix:///run/containerd/containerd.sock
EOF
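
To confirm that crictl can reach containerd over the configured socket, query the runtime version; both a client version and the containerd runtime version should be reported.

# sanity check: crictl should report its own version and containerd's
crictl version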

Install kubeadm

Install kubeadm, kubelet, and kubectl by following the official kubeadm documentation.
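
As a rough sketch for CentOS 7 (the yum repository location and package versions shown here are the ones documented at the time and may have changed, so follow the current official instructions):

# add the Kubernetes yum repository
cat <<EOF | tee /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-\$basearch
enabled=1
gpgcheck=1
gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
exclude=kubelet kubeadm kubectl
EOF
# kubelet requires swap to be off
swapoff -a
sed -i '/ swap / s/^/#/' /etc/fstab
# set SELinux to permissive mode, as recommended by the kubeadm install guide
setenforce 0
sed -i 's/^SELINUX=enforcing$/SELINUX=permissive/' /etc/selinux/config
# install and enable the components
yum install -y kubelet kubeadm kubectl --disableexcludes=kubernetes
systemctl enable --now kubelet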

Install helm

Install Helm by following the official Helm documentation.
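
One way to do this, shown here as a sketch, is the official installer script from the Helm project:

# download and run the official Helm 3 installer script
curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3
chmod 700 get_helm.sh
./get_helm.sh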

Quick-Start

Initialize the control-plane node via kubeadm init and skip the installation of the kube-proxy add-on:

kubeadm init --pod-network-cidr=10.244.0.0/16 --service-cidr=10.96.0.0/16 --skip-phases=addon/kube-proxy
# optional: allow workloads to be scheduled on the control-plane node (useful for a single-node cluster)
kubectl taint nodes --all node-role.kubernetes.io/master-
kubectl taint nodes --all node-role.kubernetes.io/control-plane-
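
kubeadm init prints the exact follow-up commands for configuring kubectl; to run the kubectl commands in the rest of this guide, point kubectl at the admin kubeconfig it generated:

# as root, use the generated admin kubeconfig directly
export KUBECONFIG=/etc/kubernetes/admin.conf
# or, for a regular user
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config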

Setup Helm repository

helm repo add cilium https://helm.cilium.io/
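
Then refresh the local chart index so that the chart version requested below can be resolved:

helm repo update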

Next, deploy Cilium via Helm. Important: set k8sServiceHost and k8sServicePort to the concrete control-plane node IP address and the kube-apiserver port number reported by kubeadm init (usually port 6443). In the command below these are 172.16.127.45 and 6443 for this example cluster; replace them with your own values.

Specifying this is necessary as kubeadm init is run explicitly without setting up kube-proxy and as a consequence, although it exports KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT with a ClusterIP of the kube-apiserver service to the environment, there is no kube-proxy in our setup provisioning that service. The Cilium agent therefore needs to be made aware of this information through below configuration.

See the Cilium Helm chart values reference for the full list of available options.

helm install cilium cilium/cilium --version 1.11.4 \
--namespace kube-system \
--set operator.replicas=1 \
--set nodeinit.enabled=true \
--set nodeinit.restartPods=true \
--set externalIPs.enabled=true \
--set nodePort.enabled=true \
--set hostPort.enabled=true \
--set tunnel=disabled \
--set bpf.masquerade=true \
--set bpf.clockProbe=true \
--set bpf.waitForMount=true \
--set bpf.preallocateMaps=true \
--set bpf.tproxy=true \
--set bpf.hostRouting=true \
--set autoDirectNodeRoutes=true \
--set localRedirectPolicy=true \
--set enableCiliumEndpointSlice=true \
--set enableK8sEventHandover=true \
--set enableK8sEndpointSlice=true \
--set wellKnownIdentities.enabled=true \
--set sockops.enabled=true \
--set ipam.operator.clusterPoolIPv4PodCIDRList=10.244.0.0/16 \
--set ipv4NativeRoutingCIDR=10.244.0.0/16 \
--set nodePort.directRoutingDevice=eth0 \
--set devices=eth0 \
--set bandwidthManager=true \
--set hubble.enabled=true \
--set hubble.relay.enabled=true \
--set hubble.ui.enabled=true \
--set installNoConntrackIptablesRules=true \
--set egressGateway.enabled=true \
--set endpointRoutes.enabled=true \
--set pullPolicy=IfNotPresent \
--set kubeProxyReplacement=strict \
--set loadBalancer.algorithm=maglev \
--set loadBalancer.mode=dsr \
--set hostServices.enabled=true \
--set k8sServiceHost=172.16.127.45 \
--set k8sServicePort=6443

Check Cilium state

root@test ~ 10:31:38 # kubectl -n kube-system get pod -o wide
NAME                               READY   STATUS    RESTARTS   AGE   IP              NODE   NOMINATED NODE   READINESS GATES
cilium-8wvvp                       1/1     Running   0          32m   172.16.127.45   test   <none>           <none>
cilium-node-init-f5rmz             1/1     Running   0          32m   172.16.127.45   test   <none>           <none>
cilium-operator-7469d54548-4pr9c   1/1     Running   0          32m   172.16.127.45   test   <none>           <none>
coredns-6d4b75cb6d-956sh           1/1     Running   0          33m   10.244.0.146    test   <none>           <none>
coredns-6d4b75cb6d-wjk9p           1/1     Running   0          33m   10.244.0.105    test   <none>           <none>
etcd-test                          1/1     Running   1          34m   172.16.127.45   test   <none>           <none>
kube-apiserver-test                1/1     Running   0          34m   172.16.127.45   test   <none>           <none>
kube-controller-manager-test       1/1     Running   0          34m   172.16.127.45   test   <none>           <none>
kube-scheduler-test                1/1     Running   1          34m   172.16.127.45   test   <none>           <none>

Validate the setup (use --verbose for full details):

root@test ~ 10:40:14 # kubectl exec -it -n kube-system cilium-8wvvp -- cilium status --verbose
Defaulted container "cilium-agent" out of: cilium-agent, mount-cgroup (init), wait-for-node-init (init), clean-cilium-state (init)
KVStore:                Ok   Disabled
Kubernetes:             Ok   1.24 (v1.24.0) [linux/amd64]
Kubernetes APIs:        ["cilium/v2::CiliumClusterwideNetworkPolicy", "cilium/v2::CiliumEgressNATPolicy", "cilium/v2::CiliumLocalRedirectPolicy", "cilium/v2::CiliumNetworkPolicy", "cilium/v2::CiliumNode", "cilium/v2alpha1::CiliumEndpointSlice", "core/v1::Namespace", "core/v1::Node", "core/v1::Pods", "core/v1::Service", "discovery/v1::EndpointSlice", "networking.k8s.io/v1::NetworkPolicy"]
KubeProxyReplacement:   Strict   [eth0 172.16.127.45 (Direct Routing)]
Host firewall:          Disabled
Cilium:                 Ok   1.11.4 (v1.11.4-9d25463)
NodeMonitor:            Disabled
Cilium health daemon:   Ok   
IPAM:                   IPv4: 4/254 allocated from 10.244.0.0/24, 
Allocated addresses:
  10.244.0.105 (kube-system/coredns-6d4b75cb6d-wjk9p)
  10.244.0.146 (kube-system/coredns-6d4b75cb6d-956sh)
  10.244.0.225 (router)
  10.244.0.239 (health)
BandwidthManager:       EDT with BPF   [eth0]
Host Routing:           Legacy
Masquerading:           BPF   [eth0]   10.244.0.0/16 [IPv4: Enabled, IPv6: Disabled]
Clock Source for BPF:   ktime
Controller Status:      30/30 healthy
  Name                                  Last success   Last error   Count   Message
  bpf-map-sync-cilium_ipcache           1s ago         34m13s ago   0       no error   
  bpf-map-sync-cilium_throttle          5s ago         never        0       no error   
  cilium-health-ep                      54s ago        never        0       no error   
  dns-garbage-collector-job             12s ago        never        0       no error   
  endpoint-1338-regeneration-recovery   never          never        0       no error   
  endpoint-428-regeneration-recovery    never          never        0       no error   
  endpoint-723-regeneration-recovery    never          never        0       no error   
  endpoint-815-regeneration-recovery    never          never        0       no error   
  endpoint-gc                           4m13s ago      never        0       no error   
  ipcache-inject-labels                 34m9s ago      34m11s ago   0       no error   
  k8s-heartbeat                         13s ago        never        0       no error   
  mark-k8s-node-as-available            33m55s ago     never        0       no error   
  metricsmap-bpf-prom-sync              2s ago         never        0       no error   
  resolve-identity-1338                 3m55s ago      never        0       no error   
  resolve-identity-428                  3m56s ago      never        0       no error   
  resolve-identity-723                  3m55s ago      never        0       no error   
  resolve-identity-815                  3m55s ago      never        0       no error   
  sync-endpoints-and-host-ips           56s ago        never        0       no error   
  sync-lb-maps-with-k8s-services        33m56s ago     never        0       no error   
  sync-node-with-ciliumnode (test)      34m10s ago     34m11s ago   0       no error   
  sync-policymap-1338                   43s ago        never        0       no error   
  sync-policymap-428                    42s ago        never        0       no error   
  sync-policymap-723                    43s ago        never        0       no error   
  sync-policymap-815                    43s ago        never        0       no error   
  sync-to-k8s-ciliumendpoint (1338)     5s ago         never        0       no error   
  sync-to-k8s-ciliumendpoint (428)      5s ago         never        0       no error   
  sync-to-k8s-ciliumendpoint (723)      5s ago         never        0       no error   
  sync-to-k8s-ciliumendpoint (815)      4s ago         never        0       no error   
  template-dir-watcher                  never          never        0       no error   
  update-k8s-node-annotations           34m11s ago     never        0       no error   
Proxy Status:   OK, ip 10.244.0.225, 0 redirects active on ports 10000-20000
Hubble:         Disabled
KubeProxyReplacement Details:
  Status:                 Strict
  Socket LB Protocols:    TCP, UDP
  Devices:                eth0 172.16.127.45 (Direct Routing)
  Mode:                   DSR
  Backend Selection:      Maglev (Table Size: 16381)
  Session Affinity:       Enabled
  Graceful Termination:   Enabled
  XDP Acceleration:       Disabled
  Services:
  - ClusterIP:      Enabled
  - NodePort:       Enabled (Range: 30000-32767) 
  - LoadBalancer:   Enabled 
  - externalIPs:    Enabled 
  - HostPort:       Enabled
BPF Maps:   dynamic sizing: on (ratio: 0.002500)
  Name                          Size
  Non-TCP connection tracking   65536
  TCP connection tracking       131072
  Endpoint policy               65535
  Events                        64
  IP cache                      512000
  IP masquerading agent         16384
  IPv4 fragmentation            8192
  IPv4 service                  65536
  IPv6 service                  65536
  IPv4 service backend          65536
  IPv6 service backend          65536
  IPv4 service reverse NAT      65536
  IPv6 service reverse NAT      65536
  Metrics                       1024
  NAT                           131072
  Neighbor table                131072
  Global policy                 16384
  Per endpoint policy           65536
  Session affinity              65536
  Signal                        64
  Sockmap                       65535
  Sock reverse NAT              65536
  Tunnel                        65536
Encryption:          Disabled
Cluster health:      1/1 reachable   (2022-05-12T02:41:03Z)
  Name               IP              Node        Endpoints
  test (localhost)   172.16.127.45   reachable   reachable

Verify services

root@test ~ 10:43:41 # kubectl exec -it -n kube-system cilium-8wvvp -- cilium service list
Defaulted container "cilium-agent" out of: cilium-agent, mount-cgroup (init), wait-for-node-init (init), clean-cilium-state (init)
ID   Frontend          Service Type   Backend                   
1    10.96.0.1:443     ClusterIP      1 => 172.16.127.45:6443   
2    10.96.0.10:53     ClusterIP      1 => 10.244.0.146:53      
                                      2 => 10.244.0.105:53      
3    10.96.0.10:9153   ClusterIP      1 => 10.244.0.146:9153    
                                      2 => 10.244.0.105:9153

At the same time we can verify, using iptables in the host namespace, that no iptables rule for the service is present:

root@test ~ 10:43:17 # iptables-save | grep KUBE-SVC
[ empty line ]

Similarly, using ipvsadm in the host namespace, we can verify that no IPVS rules for the service are present:

root@test ~ 10:43:12 # ipvsadm -ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
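
As an optional end-to-end check that goes beyond the original guide, expose a test deployment through a NodePort service and confirm that it is reachable and served by Cilium's eBPF datapath; the nginx image, the service name, and the placeholder node port below are just examples.

# create a test deployment and expose it via a NodePort service
kubectl create deployment nginx --image=nginx
kubectl expose deployment nginx --type=NodePort --port=80
# look up the allocated node port (somewhere in the 30000-32767 range)
kubectl get svc nginx
# request the service through the node IP; the nginx welcome page should come back
curl -s http://172.16.127.45:<node-port> | head -n 4
# the new service should also appear in Cilium's service table
kubectl exec -it -n kube-system cilium-8wvvp -- cilium service list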

REF

cilium
install containerd
create-cluster-kubeadm
container-runtimes
helm install
