I have been working on installing a three-node Kubernetes cluster with flannel on CentOS 7 for some time now, but the CoreDNS pods cannot connect to the API server and are constantly restarting.

The reference HowTo document I followed is here.

What Have I Done so Far?

  • Disabled SELinux,
  • Disabled firewalld,
  • Enabled br_netfilter and bridge-nf-call-iptables (the exact commands are sketched below),
  • Installed Kubernetes on three nodes and set up the master's pod network with flannel's default network (10.244.0.0/16),
  • Installed the other two nodes and joined them to the master,
  • Deployed flannel,
  • Configured Docker's BIP to use flannel's default per-node subnet and network.
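  • For reference, the preparation steps above were roughly the following (a sketch of the usual CentOS 7 procedure, reconstructed from memory rather than copied from my shell history):

    # SELinux off (permissive now, disabled after reboot)
    setenforce 0
    sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config
    # firewalld off
    systemctl disable --now firewalld
    # bridged traffic must pass through iptables
    modprobe br_netfilter
    echo 'net.bridge.bridge-nf-call-iptables = 1' >> /etc/sysctl.d/k8s.conf
    echo 'net.bridge.bridge-nf-call-ip6tables = 1' >> /etc/sysctl.d/k8s.conf
    sysctl --system
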
Current State

  • The kubelet works and the cluster reports the nodes as ready.
  • The cluster can schedule and migrate pods, so CoreDNS pods are spawned on the nodes.
  • The flannel network is connected; there are no errors in the container logs, and I can ping the 10.244.0.0/24 networks from node to node.
  • Kubernetes can deploy and run arbitrary pods (I tried the shell demo and can access its shell via kubectl even if the container is on a different node).
  • However, since DNS is not working, the pods cannot resolve any names to IP addresses (a quick check is sketched below).
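  • The resolution failure can be reproduced with a throwaway pod (busybox:1.28 is the image the Kubernetes DNS debugging docs use; any image that ships nslookup should do):

    kubectl run -it --rm dnstest --image=busybox:1.28 --restart=Never -- nslookup kubernetes.default
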
What is the Problem?

  • CoreDNS pods report that they cannot connect to the API server with the error:

    Failed to list *v1.Namespace: Get https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: connect: no route to host
    
  • I cannot see any 10.96.0.0 route in the routing tables (a way to inspect how the service IP is actually handled is sketched below):

    default via 172.16.0.1 dev eth0 proto static metric 100 
    10.1.0.0/24 dev eth1 proto kernel scope link src 10.1.0.202 metric 101 
    10.244.0.0/24 via 10.244.0.0 dev flannel.1 onlink 
    10.244.1.0/24 dev docker0 proto kernel scope link src 10.244.1.1 
    10.244.1.0/24 dev cni0 proto kernel scope link src 10.244.1.1 
    10.244.2.0/24 via 10.244.2.0 dev flannel.1 onlink 
    172.16.0.0/16 dev eth0 proto kernel scope link src 172.16.0.202 metric 100
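
  • For what it's worth, a ClusterIP such as 10.96.0.1 is normally realized by kube-proxy NAT rules rather than by a route, so a more telling check than the routing table is probably (assuming kube-proxy runs in its default iptables mode):

    # the ClusterIP should appear in kube-proxy's NAT rules, not in the routing table
    iptables -t nat -L KUBE-SERVICES -n | grep 10.96.0.1
    # iptables-save shows the full DNAT chain behind it
    iptables-save -t nat | grep 10.96.0.1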
    

Additional Info

  • Cluster init is done with the command kubeadm init --apiserver-advertise-address=172.16.0.201 --pod-network-cidr=10.244.0.0/16.
  • I have torn down the cluster and rebuilt it with 1.12.0. The problem still persists.
  • The workaround in the Kubernetes documentation doesn't work.
  • The problem is present and identical with both the 1.11.3-0 and 1.12.0-0 CentOS 7 packages.
Progress so Far

  • Downgraded Kubernetes to 1.11.3-0.
  • Re-initialized Kubernetes with kubeadm init --apiserver-advertise-address=172.16.0.201 --pod-network-cidr=10.244.0.0/16. The advertise address is needed because the server has another external IP which cannot be reached from the other hosts, and Kubernetes tends to select that IP as the API server IP; --pod-network-cidr is mandated by flannel.
  • Resulting iptables -L output after initialization, with no nodes joined yet:

    Chain INPUT (policy ACCEPT)
    target     prot opt source               destination         
    KUBE-EXTERNAL-SERVICES  all  --  anywhere             anywhere             ctstate NEW /* kubernetes externally-visible service portals */
    KUBE-FIREWALL  all  --  anywhere             anywhere            
    Chain FORWARD (policy ACCEPT)
    target     prot opt source               destination         
    KUBE-FORWARD  all  --  anywhere             anywhere             /* kubernetes forwarding rules */
    DOCKER-USER  all  --  anywhere             anywhere            
    Chain OUTPUT (policy ACCEPT)
    target     prot opt source               destination         
    KUBE-SERVICES  all  --  anywhere             anywhere             ctstate NEW /* kubernetes service portals */
    KUBE-FIREWALL  all  --  anywhere             anywhere            
    Chain DOCKER-USER (1 references)
    target     prot opt source               destination         
    RETURN     all  --  anywhere             anywhere            
    Chain KUBE-EXTERNAL-SERVICES (1 references)
    target     prot opt source               destination         
    Chain KUBE-FIREWALL (2 references)
    target     prot opt source               destination         
    DROP       all  --  anywhere             anywhere             /* kubernetes firewall for dropping marked packets */ mark match 0x8000/0x8000
    Chain KUBE-FORWARD (1 references)
    target     prot opt source               destination         
    ACCEPT     all  --  anywhere             anywhere             /* kubernetes forwarding rules */ mark match 0x4000/0x4000
    Chain KUBE-SERVICES (1 references)
    target     prot opt source               destination         
    REJECT     udp  --  anywhere             10.96.0.10           /* kube-system/kube-dns:dns has no endpoints */ udp dpt:domain reject-with icmp-port-unreachable
    REJECT     tcp  --  anywhere             10.96.0.10           /* kube-system/kube-dns:dns-tcp has no endpoints */ tcp dpt:domain reject-with icmp-port-unreachable
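
  • Those two REJECT rules only reflect that the kube-dns service has no ready endpoints yet; they should disappear once CoreDNS becomes ready, which can be watched with (CoreDNS pods keep the k8s-app=kube-dns label):

    kubectl -n kube-system get endpoints kube-dns
    kubectl -n kube-system get pods -l k8s-app=kube-dns -o wide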
    
  • It looks like the API server is deployed as it should be:

    $ kubectl get svc kubernetes -o=yaml
    apiVersion: v1
    kind: Service
    metadata:
      creationTimestamp: 2018-10-25T06:58:46Z
      labels:
        component: apiserver
        provider: kubernetes
      name: kubernetes
      namespace: default
      resourceVersion: "6"
      selfLink: /api/v1/namespaces/default/services/kubernetes
      uid: 6b3e4099-d823-11e8-8264-a6f3f1f622f3
    spec:
      clusterIP: 10.96.0.1
      ports:
      - name: https
        port: 443
        protocol: TCP
        targetPort: 6443
      sessionAffinity: None
      type: ClusterIP
    status:
      loadBalancer: {}
    
  • Then I applied the flannel network with:

    kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
    
  • As soon as I apply the flannel network, the CoreDNS pods start and immediately begin giving the same error:

    Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: connect: no route to host
    
  • I've found out that flanneld was using the wrong network interface, and changed it in the kube-flannel.yml file before deployment (see the snippet below). However, the outcome is still the same.
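  • For completeness, the change was the usual one of appending an --iface argument to the flanneld container in kube-flannel.yml (paraphrased from the stock manifest; eth1 stands in for this cluster's internal interface):

        command:
        - /opt/bin/flanneld
        args:
        - --ip-masq
        - --kube-subnet-mgr
        - --iface=eth1    # added: force flanneld onto the internal interface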

  • Any help is greatly appreciated.

    AFAIK Flannel's network addresses are hard-coded, so you have to configure Flannel to fit Kubernetes. I'll try that next. Currently someone else is also working on it, so I cannot rebuild the cluster. – bayindirh Oct 24 '18 at 13:35

    I don't know if your problem is related to this, but there is a problem between the k8s 1.12 version and flannel. You can read this: github.com/coreos/flannel/issues/1044 – Yavuz Sert Oct 24 '18 at 14:16

    I've solved the problem. The cause is a mixture of inexperience, lack of documentation and some old, no-longer-correct information.

    The guy who will be using the installation told me that Docker's bridge needs to be in the same subnet as the Flannel network, hence I edited Docker's bridge network.

    However, when Kubernetes started to use CNI, this requirement became not only unnecessary but plain wrong. Having both cni0 and docker0 on the same network with the same IP address always felt wrong, but since I'm a complete beginner in Kubernetes, I ignored my hunch.
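
    The clash is actually visible in the routing table posted in the question (docker0 and cni0 both claim 10.244.1.0/24 with src 10.244.1.1); on an affected node you can also see it directly with, for example:

    ip -4 addr show docker0
    ip -4 addr show cni0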

    As a result, I reset Docker's network to its default, tore down the cluster and rebuilt it. Now everything is working as it should.
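
    In practice the reset amounted to something like this (a sketch; /etc/docker/daemon.json and the "bip" key are the usual places a BIP override lives, adjust to wherever you configured it):

    # remove the "bip" entry from /etc/docker/daemon.json (or wherever it was set),
    # then restart Docker so docker0 returns to its default subnet
    systemctl restart docker
    # tear down and rebuild the cluster
    kubeadm reset
    kubeadm init --apiserver-advertise-address=172.16.0.201 --pod-network-cidr=10.244.0.0/16
    kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml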

    TL;DR: Never, ever touch Docker's network parameters if you are setting up a recent Kubernetes release. Just install Docker, initialize Kubernetes and deploy Flannel. Kubernetes and CNI will take care of the container-to-Flannel transport.

    This is basically saying that your CoreDNS pod cannot talk to the kube-apiserver. The kube-apiserver is exposed in the pod through these environment variables: KUBERNETES_SERVICE_HOST=10.96.0.1 and KUBERNETES_SERVICE_PORT_HTTPS=443.
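
    You can verify this from inside any running pod (the pod name is a placeholder and the image has to ship an env binary):

    kubectl exec <any-pod> -- env | grep KUBERNETES_SERVICE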

    I believe that the routes you posted are routes on the host, since this is what you get when you run ip route in a pod container:

    root@xxxx-xxxxxxxxxx-xxxxx:/# ip route
    default via 169.254.1.1 dev eth0
    169.254.1.1 dev eth0  scope link
    root@xxxx-xxxxxxxxxx-xxxxx:/#
    

    In any case, you wouldn't see 10.96.0.1 there, since that address is exposed inside the cluster using iptables. So what is that address? It happens to be a service in the default namespace called kubernetes. That service's ClusterIP is 10.96.0.1 and it listens on port 443; it also maps to targetPort 6443, which is where your kube-apiserver is running.

    Since you can deploy pods, the kube-apiserver does not seem to be down, so that's not your problem. Most likely you are missing that service (or there is some iptables rule not allowing you to connect to it). You can check it like this, for example:

    $ kubectl get svc kubernetes
    NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
    kubernetes   ClusterIP   10.96.0.1    <none>        443/TCP   92d
    

    The full output is something like this:

    $ kubectl get svc kubernetes -o=yaml
    apiVersion: v1
    kind: Service
    metadata:
      creationTimestamp: 2018-07-23T21:10:22Z
      labels:
        component: apiserver
        provider: kubernetes
      name: kubernetes
      namespace: default
      resourceVersion: "24"
      selfLink: /api/v1/namespaces/default/services/kubernetes
      uid: xxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxx
    spec:
      clusterIP: 10.96.0.1
      ports:
      - name: https
        port: 443
        protocol: TCP
        targetPort: 6443
      sessionAffinity: None
      type: ClusterIP
    status:
      loadBalancer: {} 
    

    So if you are missing it, you can create it like this:

    cat <<EOF | kubectl apply -f -
    apiVersion: v1
    kind: Service
    metadata:
      labels:
        component: apiserver
        provider: kubernetes
      name: kubernetes
      namespace: default
    spec:
      clusterIP: 10.96.0.1
      ports:
      - name: https
        port: 443
        protocol: TCP
        targetPort: 6443
      sessionAffinity: None
      type: ClusterIP
    EOF
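
    After (re)creating it, one way to double-check that the ClusterIP really fronts your kube-apiserver is to look at the service's endpoints, which should list the advertise address and port 6443 (in your case something like 172.16.0.201:6443):

    kubectl get endpoints kubernetes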
    

    I have met this before. Firewalld had opened port 6443 to my real LAN IPs, but it still blocked other traffic, so I tried shutting the firewall down with:

    systemctl stop firewalld
    

    It worked, and all the exceptions coming from kubectl logs were gone, so the root cause is the firewall rules of your Linux servers.
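
    If you would rather keep firewalld running than disable it, an alternative sketch is to open the ports kubeadm and flannel need (the list below follows the kubeadm install docs plus flannel's VXLAN port; adjust for your setup):

    firewall-cmd --permanent --add-port=6443/tcp    # Kubernetes API server
    firewall-cmd --permanent --add-port=10250/tcp   # kubelet API
    firewall-cmd --permanent --add-port=8472/udp    # flannel VXLAN overlay
    firewall-cmd --reload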
