As with any program, you might run into an error installing or running kubeadm.
This page lists some common failure scenarios and provides steps that can help you understand and fix the problem.
If your problem is not listed below, please follow these steps:
If you think your problem is a bug with kubeadm, search the existing issues in the kubeadm issue tracker. If no issue exists, please open one and follow the issue template.
If you are unsure about how kubeadm works, you can ask on Slack in #kubeadm, or open a question on StackOverflow. Please include relevant tags like #kubernetes and #kubeadm so folks can help you.
ebtables or some similar executable not found during installation
If you see the following warnings while running kubeadm init:
[preflight] WARNING: ebtables not found in system path
[preflight] WARNING: ethtool not found in system path
Then you may be missing ebtables, ethtool or a similar executable on your node. You can install them with the following commands:
For Ubuntu/Debian users, run apt install ebtables ethtool.
For CentOS/Fedora users, run yum install ebtables ethtool.
kubeadm blocks waiting for control plane during installation
If you notice that kubeadm init hangs after printing out the following line:
[apiclient] Created API client, waiting for the control plane to become ready
This may be caused by a number of problems. The most common are:
network connection problems. Check that your machine has full network connectivity before continuing.
the default cgroup driver configuration for the kubelet differs from that used by Docker.
Check the system log file (e.g. /var/log/messages) or examine the output from journalctl -u kubelet. If you see something like the following:
error: failed to run Kubelet: failed to create kubelet:
misconfiguration: kubelet cgroup driver: "systemd"
is different from docker cgroup driver: "cgroupfs"
There are two common ways to fix the cgroup driver problem:
Install Docker again following instructions here.
Change the kubelet config to match the Docker cgroup driver manually (see the sketch after this list).
control plane Docker containers are crashlooping or hanging. You can check this by running docker ps and investigating each container by running docker logs.
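A minimal check of the cgroup driver mismatch, assuming Docker is the container runtime and a kubeadm-provisioned kubelet (the kubelet file paths below are assumptions; they may differ on your distribution):
docker info 2>/dev/null | grep -i 'cgroup driver'
grep -i cgroup /var/lib/kubelet/config.yaml /var/lib/kubelet/kubeadm-flags.env 2>/dev/null
The two drivers reported must match; otherwise adjust the kubelet configuration (or reinstall Docker) so that they do.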
kubeadm blocks when removing managed containers
The following could happen if Docker halts and does not remove any Kubernetes-managed containers:
sudo kubeadm reset
[preflight] Running pre-flight checks
[reset] Stopping the kubelet service
[reset] Unmounting mounted directories in "/var/lib/kubelet"
[reset] Removing kubernetes-managed containers
(block)
A possible solution is to restart the Docker service and then re-run kubeadm reset:
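For example, assuming Docker is managed by systemd:
sudo systemctl restart docker.service
sudo kubeadm reset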
Inspecting the logs for docker may also be useful:
journalctl -u docker
Pods in RunContainerError, CrashLoopBackOff or Error state
Right after kubeadm init there should not be any pods in these states.
If there are pods in one of these states right after kubeadm init, please open an issue in the kubeadm repo. coredns (or kube-dns) should be in the Pending state until you have deployed the network solution.
If you see Pods in the RunContainerError, CrashLoopBackOff or Error state after deploying the network solution and nothing happens to coredns (or kube-dns), it’s very likely that the Pod Network solution that you installed is somehow broken. You might have to grant it more RBAC privileges or use a newer version. Please file an issue in the Pod Network providers’ issue tracker and get the issue triaged there.
If you install a version of Docker older than 1.12.1, remove the MountFlags=slave option when booting dockerd with systemd and restart docker. You can see the MountFlags in /usr/lib/systemd/system/docker.service.
MountFlags can interfere with volumes mounted by Kubernetes, and put the Pods in CrashLoopBackOff state. The error happens when Kubernetes does not find /var/run/secrets/kubernetes.io/serviceaccount files.
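A quick check (the unit file path may differ by distribution):
grep MountFlags /usr/lib/systemd/system/docker.service
After removing or commenting out a MountFlags=slave line, reload systemd and restart Docker:
sudo systemctl daemon-reload
sudo systemctl restart docker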
coredns (or kube-dns) is stuck in the Pending state
This is expected and part of the design. kubeadm is network provider-agnostic, so the admin should install the pod network solution of choice. You have to install a Pod Network before CoreDNS may be deployed fully. Hence the Pending state before the network is set up.
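For illustration, installation typically amounts to applying the add-on's manifest and waiting for its pods to come up; the manifest location depends on the provider you choose, so the file name below is a placeholder:
kubectl apply -f <your-pod-network-addon>.yaml
kubectl get pods --all-namespaces
Once the network add-on pods are Running, coredns should leave the Pending state.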
HostPort services do not work
The HostPort and HostIP functionality is available depending on your Pod Network provider. Please contact the author of the Pod Network solution to find out whether HostPort and HostIP functionality is available.
Calico, Canal, and Flannel CNI providers are verified to support HostPort.
If your network provider does not support the portmap CNI plugin, you may need to use the NodePort feature of services or use HostNetwork=true.
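A minimal sketch with placeholder names, exposing an existing Deployment through a NodePort Service instead of a HostPort:
kubectl expose deployment my-app --port=8080 --type=NodePort
kubectl get svc my-app
The second command shows the node port that was allocated for the Service.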
Pods are not accessible via their Service IP
Many network add-ons do not yet enable hairpin mode which allows pods to access themselves via their Service IP. This is an issue related to CNI. Please contact the network add-on provider to get the latest status of their support for hairpin mode.
If you are using VirtualBox (directly or via Vagrant), you will need to ensure that hostname -i returns a routable IP address. By default the first interface is connected to a non-routable host-only network. A workaround is to modify /etc/hosts; see this Vagrantfile for an example.
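A quick way to verify this on the guest:
hostname -i
The command should print a routable address, not 127.0.x.x or the NATed 10.0.2.15; if it does not, edit /etc/hosts so the VM's hostname maps to a routable address.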
TLS certificate errors
The following error indicates a possible certificate mismatch.
# kubectl get pods
Unable to connect to the server: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes")
Verify that the $HOME/.kube/config file contains a valid certificate, and regenerate a certificate if necessary. The certificates in a kubeconfig file are base64 encoded. The base64 -d command can be used to decode the certificate and openssl x509 -text -noout can be used for viewing the certificate information.
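For example, a minimal sketch that assumes the certificate is embedded inline (client-certificate-data) in $HOME/.kube/config rather than referenced as a file path:
grep 'client-certificate-data' $HOME/.kube/config | awk '{print $2}' | base64 -d | openssl x509 -text -noout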
Unset the
KUBECONFIG
environment variable using:
unset KUBECONFIG
Or set it to the default
KUBECONFIG
location:
export KUBECONFIG=/etc/kubernetes/admin.conf
Another workaround is to overwrite the existing kubeconfig for the “admin” user:
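A minimal sketch that backs up the current $HOME/.kube directory first:
mv $HOME/.kube $HOME/.kube.bak
mkdir $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config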
Default NIC when using flannel as the pod network in Vagrant
The following error might indicate that something was wrong in the pod network:
Error from server (NotFound): the server could not find the requested resource
If you’re using flannel as the pod network inside Vagrant, then you will have to specify the default interface name for flannel.
Vagrant typically assigns two interfaces to all VMs. The first, for which all hosts are assigned the IP address 10.0.2.15, is for external traffic that gets NATed.
This may lead to problems with flannel, which defaults to the first interface on a host. This leads to all hosts thinking they have the same public IP address. To prevent this, pass the --iface eth1 flag to flannel so that the second interface is chosen.
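A hedged sketch of one way to pass the flag, by editing the flannel DaemonSet; the DaemonSet name and the manifest layout differ between flannel versions, so adjust accordingly:
kubectl -n kube-system edit daemonset kube-flannel-ds
# then add the flag to the flanneld container's args, for example:
#   args:
#   - --ip-masq
#   - --kube-subnet-mgr
#   - --iface=eth1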
Non-public IP used for containers
In some situations kubectl logs and kubectl run commands may return with the following errors in an otherwise functional cluster:
Error from server: Get https://10.19.0.41:10250/containerLogs/default/mysql-ddc65b868-glc5m/mysql: dial tcp 10.19.0.41:10250: getsockopt: no route to host
This may be due to Kubernetes using an IP that cannot communicate with other IPs on the seemingly same subnet, possibly by policy of the machine provider.
Digital Ocean assigns a public IP to eth0 as well as a private one to be used internally as an anchor for their floating IP feature, yet kubelet will pick the latter as the node’s InternalIP instead of the public one.
Use ip addr show to check for this scenario instead of ifconfig because ifconfig will not display the offending alias IP address. Alternatively an API endpoint specific to Digital Ocean allows you to query for the anchor IP from the droplet:
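For example, Digital Ocean's droplet metadata service exposes the anchor IP at:
curl http://169.254.169.254/metadata/v1/interfaces/public/0/anchor_ipv4/address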
The workaround is to tell kubelet which IP to use using --node-ip. When using Digital Ocean, it can be the public one (assigned to eth0) or the private one (assigned to eth1) should you want to use the optional private network. The KubeletExtraArgs section of the kubeadm NodeRegistrationOptions structure can be used for this.
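A hedged sketch, assuming the v1beta1 kubeadm configuration API; the file name and the address are placeholders, and the file would be passed to kubeadm init --config:
cat <<'EOF' > kubeadm-node-ip.yaml
apiVersion: kubeadm.k8s.io/v1beta1
kind: InitConfiguration
nodeRegistration:
  kubeletExtraArgs:
    node-ip: "203.0.113.10"
EOF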
Then restart
kubelet
:
systemctl daemon-reload
systemctl restart kubelet
coredns pods have CrashLoopBackOff or Error state
If you have nodes that are running SELinux with an older version of Docker you might experience a scenario where the coredns pods are not starting. To solve that you can try one of the following options:
Upgrade to a newer version of Docker.
Disable SELinux.
Modify the coredns deployment to set allowPrivilegeEscalation to true:
kubectl -n kube-system get deployment coredns -o yaml | \
sed 's/allowPrivilegeEscalation: false/allowPrivilegeEscalation: true/g' | \
kubectl apply -f -
Another cause for CoreDNS to have CrashLoopBackOff is when a CoreDNS Pod deployed in Kubernetes detects a loop. A number of workarounds are available to avoid Kubernetes trying to restart the CoreDNS Pod every time CoreDNS detects the loop and exits.
Warning: Disabling SELinux or setting allowPrivilegeEscalation to true can compromise the security of your cluster.
etcd pods restart continually
If you encounter the following error:
rpc error: code = 2 desc = oci runtime error: exec failed: container_linux.go:247: starting container process caused "process_linux.go:110: decoding init error from pipe caused \"read parent: connection reset by peer\""
This issue appears if you run CentOS 7 with Docker 1.13.1.84. This version of Docker can prevent the kubelet from executing into the etcd container.
To work around the issue, choose one of these options:
Roll back to an earlier version of Docker, such as 1.13.1-75.
Install one of the more recent recommended versions of Docker.
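A hedged sketch of the rollback on CentOS 7; the exact package version strings depend on your configured repositories, so list the available versions first and substitute the placeholders:
yum --showduplicates list docker
sudo yum downgrade docker-<version> docker-client-<version> docker-common-<version>
# alternatively, install one of the more recent recommended Docker versions from Docker's repositories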
Not possible to pass a comma separated list of values to arguments inside a --component-extra-args flag
kubeadm init flags such as --component-extra-args allow you to pass custom arguments to a control-plane component like the kube-apiserver. However, this mechanism is limited due to the underlying type used for parsing the values (mapStringString).
If you decide to pass an argument that supports multiple, comma-separated values such as --apiserver-extra-args "enable-admission-plugins=LimitRanger,NamespaceExists" this flag will fail with flag: malformed pair, expect string=string. This happens because the list of arguments for --apiserver-extra-args expects key=value pairs and in this case NamespaceExists is considered as a key that is missing a value.
Alternatively, you can try separating the key=value pairs like so: --apiserver-extra-args "enable-admission-plugins=LimitRanger,enable-admission-plugins=NamespaceExists" but this will result in the key enable-admission-plugins only having the value of NamespaceExists.
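To make the limitation concrete, the two invocations behave as follows (these commands only illustrate flag parsing; do not run them against an existing cluster):
# fails with "malformed pair, expect string=string":
kubeadm init --apiserver-extra-args "enable-admission-plugins=LimitRanger,NamespaceExists"
# parses, but the second occurrence of the key overrides the first, leaving only NamespaceExists:
kubeadm init --apiserver-extra-args "enable-admission-plugins=LimitRanger,enable-admission-plugins=NamespaceExists"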
kube-proxy scheduled before node is initialized by cloud-controller-manager
In cloud provider scenarios, kube-proxy can end up being scheduled on new worker nodes before
the cloud-controller-manager has initialized the node addresses. This causes kube-proxy to fail
to pick up the node’s IP address properly and has knock-on effects to the proxy function managing
load balancers.
The following error can be seen in kube-proxy Pods:
server.go:610] Failed to retrieve node IP: host IP unknown; known addresses: []
proxier.go:340] invalid nodeIP, initializing kube-proxy with 127.0.0.1 as nodeIP
A known solution is to patch the kube-proxy DaemonSet to allow scheduling it on control-plane
nodes regardless of their conditions, keeping it off of other nodes until their initial guarding
conditions abate:
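A hedged sketch of such a patch, assuming control-plane nodes carry the node-role.kubernetes.io/master label:
kubectl -n kube-system patch ds kube-proxy -p='{ "spec": { "template": { "spec": { "nodeSelector": { "node-role.kubernetes.io/master": "" } } } } }'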
The NodeRegistration.Taints field is omitted when marshalling kubeadm configuration
Note: This
issue
only applies to tools that marshal kubeadm types (e.g. to a YAML configuration file). It will be fixed in kubeadm API v1beta2.
By default, kubeadm applies the node-role.kubernetes.io/master:NoSchedule taint to control-plane nodes. If you prefer kubeadm to not taint the control-plane node, and set InitConfiguration.NodeRegistration.Taints to an empty slice, the field will be omitted when marshalling. When the field is omitted, kubeadm applies the default taint.
There are at least two workarounds:
Use the node-role.kubernetes.io/master:PreferNoSchedule taint instead of an empty slice. Pods will get scheduled on masters, unless other nodes have capacity.
Remove the node-role.kubernetes.io/master:NoSchedule taint after kubeadm init exits; a hedged sketch follows.
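A minimal sketch of the second workaround; replace <node-name> with the name of your control-plane node (the trailing "-" removes the taint):
kubectl taint nodes <node-name> node-role.kubernetes.io/master:NoSchedule-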