I'm struggling to understand how to correctly configure kube-dns with flannel on kubernetes 1.10 and containerd as the CRI.

kube-dns fails to run, with the following error:

kubectl -n kube-system logs kube-dns-595fdb6c46-9tvn9 -c kubedns
I0424 14:56:34.944476       1 dns.go:219] Waiting for [endpoints services] to be initialized from apiserver...
I0424 14:56:35.444469       1 dns.go:219] Waiting for [endpoints services] to be initialized from apiserver...
E0424 14:56:35.815863       1 reflector.go:201] k8s.io/dns/pkg/dns/dns.go:192: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?resourceVersion=0: dial tcp 10.96.0.1:443: getsockopt: no route to host
E0424 14:56:35.815863       1 reflector.go:201] k8s.io/dns/pkg/dns/dns.go:189: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?resourceVersion=0: dial tcp 10.96.0.1:443: getsockopt: no route to host
I0424 14:56:35.944444       1 dns.go:219] Waiting for [endpoints services] to be initialized from apiserver...
I0424 14:56:36.444462       1 dns.go:219] Waiting for [endpoints services] to be initialized from apiserver...
I0424 14:56:36.944507       1 dns.go:219] Waiting for [endpoints services] to be initialized from apiserver...
F0424 14:56:37.444434       1 dns.go:209] Timeout waiting for initialization
kubectl -n kube-system describe pod kube-dns-595fdb6c46-9tvn9
  Type     Reason     Age                 From              Message
  ----     ------     ----                ----              -------
  Warning  Unhealthy  47m (x181 over 3h)  kubelet, worker1  Readiness probe failed: Get http://10.244.0.2:8081/readiness: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
  Warning  BackOff    27m (x519 over 3h)  kubelet, worker1  Back-off restarting failed container
  Normal   Killing    17m (x44 over 3h)   kubelet, worker1  Killing container with id containerd://dnsmasq:Container failed liveness probe.. Container will be killed and recreated.
  Warning  Unhealthy  12m (x178 over 3h)  kubelet, worker1  Liveness probe failed: Get http://10.244.0.2:10054/metrics: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
  Warning  BackOff    2m (x855 over 3h)   kubelet, worker1  Back-off restarting failed container
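
Those probes are plain HTTP GETs against the pod IP, so they can be reproduced from the node to separate "the pod is unhealthy" from "the node cannot reach the pod network". A minimal check, using the pod IP and ports taken from the events above:

# run on worker1; endpoints are the ones the failed probes were hitting
curl -m 5 http://10.244.0.2:8081/readiness
curl -m 5 http://10.244.0.2:10054/metrics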

There is indeed no route to the 10.96.0.1 endpoint:

ip route
default via 10.240.0.254 dev ens160 
10.240.0.0/24 dev ens160  proto kernel  scope link  src 10.240.0.21 
10.244.0.0/24 via 10.244.0.0 dev flannel.1 onlink 
10.244.0.0/16 dev cni0  proto kernel  scope link  src 10.244.0.1 
10.244.1.0/24 via 10.244.1.0 dev flannel.1 onlink 
10.244.2.0/24 via 10.244.2.0 dev flannel.1 onlink 
10.244.4.0/24 via 10.244.4.0 dev flannel.1 onlink 
10.244.5.0/24 via 10.244.5.0 dev flannel.1 onlink
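
For reference, Service IPs like 10.96.0.1 are normally realized by kube-proxy (iptables or IPVS rules) rather than by entries in the routing table, so a couple of hedged checks on where they should show up instead:

# IPVS mode: kube-proxy binds Service IPs to a dummy kube-ipvs0 interface
ip addr show kube-ipvs0
# and creates a virtual server for each Service (requires ipvsadm to be installed)
ipvsadm -Ln | grep -A2 10.96.0.1
# iptables mode instead programs a KUBE-SERVICES chain in the nat table
iptables -t nat -L KUBE-SERVICES -n | grep 10.96.0.1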

What is responsible for configuring the cluster service address range and associated routes? Is it the container runtime, the overlay network (flannel in this case), or something else? Where should it be configured?

The 10-containerd-net.conflist configures the bridge between the host and my pod network. Can the service network be configured here too?

cat /etc/cni/net.d/10-containerd-net.conflist 
  "cniVersion": "0.3.1",
  "name": "containerd-net",
  "plugins": [
      "type": "bridge",
      "bridge": "cni0",
      "isGateway": true,
      "ipMasq": true,
      "promiscMode": true,
      "ipam": {
        "type": "host-local",
        "subnet": "10.244.0.0/16",
        "routes": [
          { "dst": "0.0.0.0/0" }
      "type": "portmap",
      "capabilities": {"portMappings": true}

Edit:

Just came across this from 2016:

As of a few weeks ago (I forget the release but it was a 1.2.x where x != 0) (#24429) we fixed the routing such that any traffic that arrives at a node destined for a service IP will be handled as if it came to a node port. This means you should be able to set up static routes for your service cluster IP range to one or more nodes and the nodes will act as bridges. This is the same trick most people do with flannel to bridge the overlay.

It's imperfect but it works. In the future we will need to get more precise with the routing if you want optimal behavior (i.e. not losing the client IP), or we will see more non-kube-proxy implementations of services.

Is that still relevant? Do I need to set up a static route for the service CIDR? Or is the issue actually with kube-proxy rather than flannel or containerd?
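
For concreteness, the trick described in that quote amounts to something like the following on an external machine or upstream router, using worker1's address from the routing table above as the example next hop and assuming a 10.96.0.0/12 Service CIDR (the range itself is an assumption):

# point the whole Service CIDR at one node; kube-proxy on that node then
# handles the traffic as if it had arrived on a node port
ip route add 10.96.0.0/12 via 10.240.0.21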

My flannel configuration:

cat /etc/cni/net.d/10-flannel.conflist 
  "name": "cbr0",
  "plugins": [
      "type": "flannel",
      "delegate": {
        "hairpinMode": true,
        "isDefaultGateway": true
      "type": "portmap",
      "capabilities": {
        "portMappings": true

And kube-proxy:

[Unit]
Description=Kubernetes Kube Proxy
Documentation=https://github.com/kubernetes/kubernetes
[Service]
ExecStart=/usr/local/bin/kube-proxy \
  --cluster-cidr=10.244.0.0/16 \
  --feature-gates=SupportIPVSProxyMode=true \
  --ipvs-min-sync-period=5s \
  --ipvs-sync-period=5s \
  --ipvs-scheduler=rr \
  --kubeconfig=/etc/kubernetes/kube-proxy.conf \
  --logtostderr=true \
  --master=https://192.168.160.1:6443 \
  --proxy-mode=ipvs \
  --v=2
Restart=on-failure
RestartSec=5
[Install]
WantedBy=multi-user.target
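
However kube-proxy is launched, it is worth verifying that IPVS mode actually came up, since --proxy-mode=ipvs needs the ip_vs kernel modules and may otherwise fall back to iptables. Some hedged checks (the exact log wording varies by version):

# IPVS kernel modules must be available for --proxy-mode=ipvs
lsmod | grep ip_vs
# kube-proxy logs which proxier it actually selected
journalctl -u kube-proxy | grep -i proxier
# once IPVS is programmed, each Service should have a virtual server entry
ipvsadm -Ln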

Edit:

Having looked at the kube-proxy debugging steps, it appears that kube-proxy cannot contact the master. I suspect this is a large part of the problem. I have 3 controller/master nodes behind an HAProxy load balancer bound to 192.168.160.1:6443, which forwards round-robin to each of the masters on 10.240.0.1[1|2|3]:6443. This can be seen in the output/configs above.

In kube-proxy.service I have specified --master=https://192.168.160.1:6443. Why are connections being attempted to port 443? Can I change this - there doesn't seem to be a port flag? Does it need to be port 443 for some reason?
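
One way to rule the HAProxy front end in or out is to probe the apiserver health endpoint through it from the worker. A hedged check - depending on anonymous-auth/RBAC settings this may return ok or a 401/403, but either result proves TCP connectivity, whereas a timeout or "no route to host" points at the network path:

# from worker1: through the load balancer, then against one master directly
curl -k https://192.168.160.1:6443/healthz
curl -k https://10.240.0.11:6443/healthz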

There are two components to this answer, one about running kube-proxy and one about where those :443 URLs are coming from.

First, about kube-proxy: please don't run kube-proxy as a systemd service like that. It is designed to be launched by kubelet in the cluster so that the SDN addresses behave rationally, since they are effectively "fake" addresses. By running kube-proxy outside the control of kubelet, all kinds of weird things are going to happen unless you expend a huge amount of energy to replicate the way that kubelet configures its subordinate docker containers.

Now, about that :443 URL:

E0424 14:56:35.815863 1 reflector.go:201] k8s.io/dns/pkg/dns/dns.go:192: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?resourceVersion=0: dial tcp 10.96.0.1:443: getsockopt: no route to host

Why are connections being attempted to port 443? Can I change this - there doesn't seem to be a port flag? Does it need to be port 443 for some reason?

That 10.96.0.1 is from the Service CIDR of your cluster, which is (and should be) separate from the Pod CIDR, which in turn should be separate from the nodes' subnets, etc. The .1 of the cluster's Service CIDR is reserved for (or at least traditionally allocated to) the kubernetes.default.svc.cluster.local Service, whose single Service.port is 443.
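
You can see that mapping directly (the range itself is whatever the API server's --service-cluster-ip-range flag was set to); the exact output will differ per cluster, but it should show the first address of the Service CIDR on port 443:

kubectl get svc kubernetes -n default
NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
kubernetes   ClusterIP   10.96.0.1    <none>        443/TCP   3h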

I'm not super sure why the --master flag doesn't supersede the value in /etc/kubernetes/kube-proxy.conf, but since that file is very clearly only supposed to be used by kube-proxy, why not just update the value in the file to remove all doubt?
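
Concretely, the server URL lives in the cluster entry of that kubeconfig. A minimal sketch of the relevant fields, assuming the HAProxy front end from the question (the field names are standard kubeconfig; the file paths are illustrative assumptions):

# /etc/kubernetes/kube-proxy.conf (kubeconfig format)
apiVersion: v1
kind: Config
clusters:
- name: kubernetes
  cluster:
    certificate-authority: /etc/kubernetes/pki/ca.pem        # assumed path
    server: https://192.168.160.1:6443                       # point this at the load balancer, port included
contexts:
- name: kube-proxy@kubernetes
  context:
    cluster: kubernetes
    user: kube-proxy
current-context: kube-proxy@kubernetes
users:
- name: kube-proxy
  user:
    client-certificate: /etc/kubernetes/pki/kube-proxy.pem   # assumed path
    client-key: /etc/kubernetes/pki/kube-proxy-key.pem       # assumed path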

  • "You will run docker, kubelet, and kube-proxy outside of a container, the same way you would run any system daemon, so you just need the bare binaries." I assume that is no longer correct? Which services should be run as a systemd service and which as pods? Hmm, fair one. I'll have to work out how to add the master information to the kubeconfig - everything has been CLI flags (perhaps encouraged by being a systemd service). – amb85 Apr 26 '18 at 22:09
  • I wasn't aware of that paragraph, and cannot possibly fathom why they would write such a thing -- unless one is running a cluster without an overlay network (SDN), in which case, yes, I would suspect it matters a lot less. At this point, I would suggest trying it the kubelet way, and then circling back after you have more experience if running kube-proxy extra-cluster is important to you. – Matthew L Daniel Apr 27 '18 at 5:00
  • I have got this working but now have a busybox error of "kubelet does not have ClusterDNS IP configured and cannot create Pod using 'ClusterFirst' policy. Falling back to 'Default' policy." Is this likely related to your comments re kube-proxy? If so, I'll give the pod approach a try. – amb85 Apr 27 '18 at 6:03
  • I mean this in a genuine "have you considered" manner: have you thought about getting a cluster working using something like kubespray, so you can examine the steps it takes and the resulting working cluster, and then try to build one from scratch? Because starting with a hammer and a dream is a bad way to build a house from scratch. – Matthew L Daniel Apr 28 '18 at 4:15
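
The ClusterDNS warning mentioned in those comments is a kubelet-side setting rather than a kube-proxy one: kubelet has to be told the DNS Service IP and cluster domain so it can populate pods' resolv.conf. A minimal sketch, assuming kube-dns keeps the conventional 10.96.0.10 address inside the 10.96.0.0/12 Service CIDR (both assumptions):

# kubelet flags (or the equivalent fields in a kubelet config file)
kubelet ... --cluster-dns=10.96.0.10 --cluster-domain=cluster.local ...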
