
kubeadm 1.8.0 init fails with "/var/lib/kubelet is not empty" #53356

Closed
wjrogers opened this issue Oct 3, 2017 · 43 comments · Fixed by #53317
Labels
area/kubeadm kind/bug Categorizes issue or PR as related to a bug. milestone/needs-approval priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle.

Comments

@wjrogers commented Oct 3, 2017

Is this a BUG REPORT or FEATURE REQUEST?:

/kind bug

What happened:
On a fresh Ubuntu 16.04.3 system booted from the official cloud image, kubeadm init fails because /var/lib/kubelet exists.

root@kubemaster:~# kubeadm init
[kubeadm] WARNING: kubeadm is in beta, please do not use it for production clusters.
[init] Using Kubernetes version: v1.8.0
[init] Using Authorization modes: [Node RBAC]
[preflight] Running pre-flight checks
[preflight] Some fatal errors occurred:
        /var/lib/kubelet is not empty
[preflight] If you know what you are doing, you can skip pre-flight checks with `--skip-preflight-checks`

What you expected to happen:
kubeadm successfully initializes the cluster

How to reproduce it (as minimally and precisely as possible):

  1. Boot a new VM from the latest Ubuntu Cloud image
  2. apt-get install -y apt-transport-https docker.io
  3. Follow the kubeadm installation instructions
  4. kubeadm init
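
For reference, on a fresh image the reproduction roughly amounts to the following; the repository lines follow the kubeadm installation docs of the time, so adjust them if they have moved:

```bash
# Fresh Ubuntu 16.04 cloud image
apt-get update
apt-get install -y apt-transport-https docker.io

# Kubernetes apt repository, per the kubeadm installation instructions
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
echo "deb http://apt.kubernetes.io/ kubernetes-xenial main" \
  > /etc/apt/sources.list.d/kubernetes.list
apt-get update
apt-get install -y kubelet kubeadm kubectl

# Fails at preflight with "/var/lib/kubelet is not empty"
kubeadm init
```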

Anything else we need to know?:
Contents of /var/lib/kubelet:

/var/lib/kubelet
/var/lib/kubelet/pki
/var/lib/kubelet/pki/kubelet.crt
/var/lib/kubelet/pki/kubelet.key

Environment:

  • Kubernetes version (use kubectl version):
    root@kubemaster:~# kubectl version
    Client Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.0", GitCommit:"6e937839ac04a38cac63e6a7a306c5d035fe7b0a", GitTreeState:"clean", BuildDate:"2017-09-28T22:57:57Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
    The connection to the server localhost:8080 was refused - did you specify the right host or port?
    
    root@kubemaster:~# apt search kube
    Sorting... Done
    Full Text Search... Done
    kubeadm/kubernetes-xenial,now 1.8.0-01 amd64 [installed]
      Kubernetes Cluster Bootstrapping Tool
    
    kubectl/kubernetes-xenial,now 1.8.0-00 amd64 [installed,automatic]
      Kubernetes Command Line Tool
    
    kubelet/kubernetes-xenial,now 1.8.0-00 amd64 [installed,automatic]
      Kubernetes Node Agent
    
    kubernetes-cni/kubernetes-xenial,now 0.5.1-00 amd64 [installed,automatic]
      Kubernetes CNI
    
  • Cloud provider or hardware configuration: Hyper-V generation 1 virtual machine
  • OS (e.g. from /etc/os-release):
    NAME="Ubuntu"
    VERSION="16.04.3 LTS (Xenial Xerus)"
    ID=ubuntu
    ID_LIKE=debian
    PRETTY_NAME="Ubuntu 16.04.3 LTS"
    VERSION_ID="16.04"
    HOME_URL="http://www.ubuntu.com/"
    SUPPORT_URL="http://help.ubuntu.com/"
    BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
    VERSION_CODENAME=xenial
    UBUNTU_CODENAME=xenial
    
  • Kernel (e.g. uname -a): Linux kubemaster 4.4.0-96-generic #119-Ubuntu SMP Tue Sep 12 14:59:54 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
  • Install tools: none
  • Others: none
@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Oct 3, 2017
@k8s-github-robot k8s-github-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Oct 3, 2017
@liggitt (Member) commented Oct 3, 2017

Cause

This is related to the location where the kubelet persists its certificates while running in the background, waiting for config:

Since kubeadm expects there to be a running kubelet prior to kubeadm init being called, it shouldn't expect the kubelet's --root-dir folder to be empty.

Workaround

If you are scripting the bootstrap of a known-clean machine, there are a few possible workarounds until #53317 is released in 1.8.1 (any of the following works around this issue):

  • verify this is the only preflight check failure, then run the init or join command with --skip-preflight-checks=true
  • stop the kubelet service and remove /var/lib/kubelet/pki prior to running the init or join command (sketched below)
  • run kubeadm reset prior to running init or join
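
For scripted bootstraps, here is a minimal shell sketch of the second workaround; it assumes a systemd-managed kubelet and the default --root-dir of /var/lib/kubelet, so adjust the service name and paths if yours differ:

```bash
#!/usr/bin/env bash
# Minimal sketch of the "stop kubelet, remove pki, then init/join" workaround.
# Assumptions: systemd-managed kubelet, default --root-dir of /var/lib/kubelet.
set -euo pipefail

# Stop the crash-looping kubelet so it cannot recreate the pki directory.
systemctl stop kubelet

# Remove the certificates the kubelet wrote while waiting for config.
rm -rf /var/lib/kubelet/pki

# The "/var/lib/kubelet is not empty" preflight check now passes.
kubeadm init   # or: kubeadm join <args>
```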

Resolution

addressed as part of #53317

@liggitt (Member) commented Oct 3, 2017

cc @kubernetes/sig-cluster-lifecycle-bugs @luxas

@k8s-ci-robot k8s-ci-robot added sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. kind/bug Categorizes issue or PR as related to a bug. labels Oct 3, 2017
@k8s-github-robot k8s-github-robot removed the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Oct 3, 2017
@surajssd (Member) commented Oct 3, 2017

@wjrogers I ran kubeadm reset && kubeadm init and it worked for me!

@liggitt (Member) commented Oct 3, 2017

@wjrogers I ran kubeadm reset && kubeadm init and it worked for me!

On a known clean environment, that's an acceptable workaround for the moment, but the reason kubeadm init checks for existing files/folders is to avoid stomping on an existing installation, and running kubeadm reset would negate those checks.

@jpetazzo (Contributor) commented Oct 3, 2017

(If you wonder why the deployment scripts that you wrote yesterday stopped working this morning – this is why! :-))

For the time being, before running kubeadm init or kubeadm join, I'll:

  • check if /etc/kubernetes/kubelet.conf or /etc/kubernetes/admin.conf exists
  • if they don't exist, stop kubelet, and wipe out /var/lib/kubelet/pki

I'd love to know if there are foreseeable pitfalls in this approach.

Thanks! <3

@liggitt (Member) commented Oct 3, 2017

(If you wonder why the deployment scripts that you wrote yesterday stopped working this morning – this is why! :-))

Apologies for the churn. We traded a data-loss issue on reboot for this false-positive fail-safe issue. We are working with @kubernetes/sig-cluster-lifecycle-bugs to determine the best way to resolve this.

@jpetazzo (Contributor) commented Oct 3, 2017

No worries, and thanks for the super-quick turnaround time. Much appreciated!
(I didn't mean to sound snarky or anything; sorry if that's how it came out!)

@liggitt liggitt added this to the v1.8 milestone Oct 3, 2017
@liggitt liggitt added the priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. label Oct 3, 2017
@vglisin commented Oct 3, 2017

These guys are posting new versions without properly testing them. Can you imagine a brand-new CentOS 7 installation and then getting: "kubeadm 1.8.0 init fails with "/var/lib/kubelet is not empty""?
How disgusting that was, and what was I supposed to think about responsibility and professionalism here?

@vglisin commented Oct 3, 2017

Everything was done per https://kubernetes.io/docs/setup/independent/create-cluster-kubeadm/, but now it gets even better: The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp [::1]:10255: getsockopt: connection refused
You should remove 1.8 and send it "back to the drawing board".
It is useless even to test this 1.8.

@liggitt (Member) commented Oct 3, 2017

I apologize for the issue. Our top priority was to prevent data loss and issues upgrading existing installations that would disrupt already-running workloads. We are working now to resolve this issue with fresh installations.

@vglisin commented Oct 3, 2017

Wish you luck.

@naseemkullah

Is there an ETA for a fix for this? Should I hard-code the deletion of the contents of /var/lib/kubelet prior to running kubeadm init in my automation scripts, or will simply adding the ignore-preflight-checks argument do the trick?

@liggitt (Member) commented Oct 3, 2017

There are plenty of other pre-flight checks that are valuable and that you don't want to skip; it's just the check for an empty /var/lib/kubelet that is incorrect.

If you are scripting the bootstrap of a known-clean machine, there are a couple of possible workarounds until #53317 is released in 1.8.1:

  • run kubeadm init/kubeadm join skipping preflight checks
  • stop the kubelet service, remove /var/lib/kubelet/pki, then run kubeadm init/kubeadm join

@naseemkullah

OK, just to confirm: moving forward, ideally kubeadm init should not be checking for contents in /var/lib/kubelet (i.e. having contents is totally fine), or should installing the kubelet not populate the contents of /var/lib/kubelet?

@liggitt (Member) commented Oct 3, 2017

OK, just to confirm: moving forward, ideally kubeadm init should not be checking for contents in /var/lib/kubelet (i.e. having contents is totally fine)

Correct. The kubeadm instructions start the kubelet and let it run in a crash loop in the background, waiting for config. In that state, the kubelet is free to write to its state directory (/var/lib/kubelet), so kubeadm should not require that directory to be empty in order to run kubeadm init.

@jpetazzo (Contributor) commented Oct 3, 2017

(Reposting my earlier tip since it worked like a charm for my automated deployment scripts.)

For the time being, before running kubeadm init or kubeadm join, I'll:

  • check if /etc/kubernetes/kubelet.conf or /etc/kubernetes/admin.conf exists
  • if they don't exist, stop kubelet (systemctl stop kubelet), and wipe out /var/lib/kubelet/pki (rm -rf /var/lib/kubelet/pki)

This might or might not be appropriate for your use cases, but for mine it works great (I'm automatically deploying hundreds of k8s clusters for training purposes). Stopping the kubelet is necessary to avoid race conditions where it would recreate the pki directory before you run kubeadm.
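
A rough shell rendering of that approach (a sketch under the same assumptions as the comment above: a systemd-managed kubelet and kubeadm-generated kubeconfigs in /etc/kubernetes; not an official kubeadm recipe):

```bash
#!/usr/bin/env bash
# Only wipe the kubelet's pki directory on machines that have not yet been
# initialized or joined (i.e. no kubeadm-generated kubeconfigs are present).
set -euo pipefail

if [ ! -f /etc/kubernetes/kubelet.conf ] && [ ! -f /etc/kubernetes/admin.conf ]; then
  # Stop the kubelet first to avoid the race where it recreates pki/.
  systemctl stop kubelet
  rm -rf /var/lib/kubelet/pki
fi

# Then run "kubeadm init" on the master, or "kubeadm join ..." on a node.
```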

@liggitt (Member) commented Oct 4, 2017

After that I tried weave; it failed with an error
kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"
The connection to the server localhost:8080 was refused

@hemaprasad, be sure to follow the instructions to add the admin kubeconfig file so kubectl can communicate with the initialized API server:

To start using your cluster, you need to run (as a regular user):
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

@liggitt (Member) commented Oct 4, 2017

For anyone following this thread, if you encounter issues other than the /var/lib/kubelet is not empty message, please report them in a separate issue (after searching for existing reports), to ensure they are triaged and routed appropriately and resolved as quickly as possible. Thanks.

@hemaprasad

@liggitt, thank you for your quick response.
Now I'm getting:

kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"
Unable to connect to the server: Forbidden
W1004 20:34:18.925025 20485 factory_object_mapping.go:423] Failed to download OpenAPI (Get https://192.168.1.3:6443/swagger-2.0.0.pb-v1: Forbidden), falling back to swagger
Unable to connect to the server: Forbidden

@jpetazzo (Contributor) commented Oct 4, 2017

@hemaprasad: you seem to be hitting a totally different problem than the one related to /var/lib/kubelet/pki; can you please open a different issue? Thank you so much!

@hemaprasad

@jpetazzo, I also faced the same '/var/lib/kubelet is not empty' issue. After I used the command "kubeadm reset && kubeadm init" it went into a hung state (if you want, you can see my threads above). To move on to the next step I tried to install Weave, but it failed with:
kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"
The connection to the server localhost:8080 was refused - did you specify the right host or port?
W1004 18:53:33.930337 16144 factory_object_mapping.go:423] Failed to download OpenAPI (Get http://localhost:8080/swagger-2.0.0.pb-v1: dial tcp [::1]:8080: getsockopt: connection refused), falling back to swagger
The connection to the server localhost:8080 was refused - did you specify the right host or port?

Can you please help me solve this issue? Thanks in advance.

k8s-github-robot pushed a commit that referenced this issue Oct 4, 2017
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md

Add /var/lib/kubelet error to known issues

Document #53356 in known issues for 1.8.0
k8s-github-robot pushed a commit that referenced this issue Oct 4, 2017
Automatic merge from submit-queue (batch tested with PRs 53317, 52186). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md

Change default --cert-dir for kubelet to a non-transient location

The default kubelet `--cert-dir` location is `/var/run/kubernetes`, which is automatically erased on reboot on many platforms. As of 1.8.0, kubelet TLS bootstrapping and cert rotation persist files in `--cert-dir`, so this should default to a non-transient location. Default it to the `pki` subfolder of the default `--root-dir`. Fixes #53288

Additionally, since `kubeadm` expects a running (albeit crashlooping) kubelet prior to running `kubeadm init` or `kubeadm join`, and was using the default `--root-dir` of `/var/lib/kubelet`, it should not expect that folder to be empty as a pre-init check. Fixes #53356

```release-note
kubelet: `--cert-dir` now defaults to `/var/lib/kubelet/pki`, in order to ensure bootstrapped and rotated certificates persist beyond a reboot.
```
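
For anyone who wants to confirm the new behaviour after upgrading, a small illustrative check (not part of the PR itself; it assumes the default --root-dir):

```bash
# With the fix, bootstrapped and rotated kubelet certificates persist under
# the kubelet's --root-dir instead of the transient /var/run/kubernetes.
ls -l /var/lib/kubelet/pki/

# The old location is wiped on reboot on many platforms.
ls -l /var/run/kubernetes/ 2>/dev/null || echo "nothing in /var/run/kubernetes"
```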
@liggitt (Member) commented Oct 4, 2017

The pick to the 1.8 branch is #53448.
Reopening to track this issue until it is resolved in 1.8.x.

@liggitt liggitt reopened this Oct 4, 2017
@jpetazzo (Contributor) commented Oct 4, 2017

@hemaprasad if you get "The connection to the server localhost:8080 was refused", it means that your kubeconfig file is missing. By default, kubectl tries to connect to localhost:8080. But when you deploy a cluster with kubeadm, the API will be listening on port 6443. So you should see an error relative to port 6443, not port 8080. If you see an error related to port 8080, it means that you haven't copied the configuration file (admin.conf) generated by kubeadm. I hope that makes sense.
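
For what it's worth, a quick way to check which endpoint kubectl will use, based on the admin.conf steps quoted earlier in the thread:

```bash
# With no kubeconfig, kubectl falls back to localhost:8080.
# After "kubeadm init", point it at the generated admin.conf instead:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

# The server line should now show the master's address on port 6443:
grep server $HOME/.kube/config
kubectl get nodes
```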

My piece of advice would be to destroy the machines that you are currently using and restart from scratch (assuming you are using VMs), making sure that you follow each step carefully. Very often that has solved the problem for me, because I had forgotten a step or deviated from the instructions. Especially if something fails, it is often easier to restart from scratch until you fully understand the technology and each underlying command. Good luck!

@vglisin commented Oct 5, 2017

Hello hemaprasad,
The problem is that when people are experts, it is hard for them to put themselves in the shoes of people who are new to the same area. The explanations given by jpetazzo and liggitt are OK, but they step over some steps which they assume and which we as newbies don't know.
Now, to get Kubernetes operational we should all:
- stop the kubelet service, remove /var/lib/kubelet/pki, then run kubeadm init (without starting the kubelet service first)
The same goes for nodes:
- stop the kubelet service, remove /var/lib/kubelet/pki, then run kubeadm join (with a token saved from kubeadm init)
Now... how do I make the dashboard accessible from my workstation, which is outside the Kubernetes cluster?
I suppose for this I should go to some other thread?

@jpetazzo (Contributor) commented Oct 5, 2017

@vglisin your explanations are correct! As for your dashboard question, I have no idea, but I would recommend opening another GitHub issue for that (since the title of this issue is "kubeadm 1.8.0 init fails with "/var/lib/kubelet is not empty""). Thank you!

@vglisin commented Oct 5, 2017

Thank you for the fast reply. I am also trying these innocent commands from the documentation:
scp root@:/etc/kubernetes/admin.conf .
./kubectl --kubeconfig ./admin.conf get nodes
but I am getting:
Unable to connect to the server: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes")

My guess is that this is also the reason why the dashboard over the proxy is not working.
Any help?

@jpetazzo (Contributor) commented Oct 5, 2017

@vglisin It looks like you are copying admin.conf from a remote host. If that is the case, you also need to edit admin.conf to point to that remote host (instead of localhost). The tutorials and documentation assume that people will run kubectl directly on the master.
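
One way to do that edit, sketched with kubectl's own config subcommands ("kubernetes" is the default cluster name kubeadm uses, and <master-ip> is a placeholder for your master's address; verify the name with config get-clusters first):

```bash
# Copy the admin kubeconfig from the master (fill in the master's address).
scp root@<master-ip>:/etc/kubernetes/admin.conf .

# Check the cluster name, then point its server entry at the master, not localhost.
kubectl --kubeconfig ./admin.conf config get-clusters
kubectl --kubeconfig ./admin.conf config set-cluster kubernetes \
  --server=https://<master-ip>:6443

kubectl --kubeconfig ./admin.conf get nodes
```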

@vglisin commented Oct 5, 2017

Nice... so why does the same documentation contain this:
"Controlling your cluster from machines other than the master
In order to get kubectl on some other computer (e.g. laptop) to talk to your cluster, you need to copy the administrator kubeconfig file from your master to your workstation like this..."

Never mind, it is obviously a long way to go until this works even in a test environment :(

@vglisin commented Oct 5, 2017

Hello all.
Finally the dashboard is working from the master node. Please do NOT use 1-2 GB of RAM on the master node; it will behave erratically and services which you expect to start normally... won't. Use at least 4 GB and also be patient. http://localhost:8001/ui/ will respond with a strangely formed URL, so please copy-paste this one:
http://localhost:8001/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/

That's for today. Hope this will help somebody.
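
For reference, a small sketch of the proxy-based access described above (it assumes the dashboard is deployed in kube-system under the service name used in that URL):

```bash
# On the master: start a local proxy to the API server (localhost:8001 by default).
kubectl proxy &

# Then open this full path in a browser rather than the short /ui/ redirect:
echo "http://localhost:8001/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/"
```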

@k8s-github-robot

[MILESTONENOTIFIER] Milestone Issue Needs Approval

@liggitt @luxas @mikedanese @pipejakob @wjrogers @kubernetes/sig-cluster-lifecycle-bugs

Action required: This issue must have the status/approved-for-milestone label applied by a SIG maintainer.

Issue Labels
  • sig/cluster-lifecycle: Issue will be escalated to these SIGs if needed.
  • priority/critical-urgent: Never automatically move out of a release milestone; continually escalate to contributor and SIG through all available channels.
  • kind/bug: Fixes a bug discovered during the current release.
Help

@dkirrane

@jpetazzo I've tried your workaround

systemctl stop kubelet
rm -Rf /var/lib/kubelet/pki
kubeadm init

But then on kubeadm init I'm hitting this:

[kubelet-check] It seems like the kubelet isn't running or healthy.

@liggitt (Member) commented Oct 12, 2017

This is fixed in v1.8.1.

@liggitt liggitt closed this as completed Oct 12, 2017
@dkirrane

After installing 1.8.1 kubelet won't start:

kubelet: error: unable to load client CA file /etc/kubernetes/pki/ca.crt: open /etc/kubernetes/pki/ca.crt: no such file or directory

@vglisin commented Oct 17, 2017

Not only what you mentioned, dkirrane; there is also a problem with:
https://kubernetes.io/docs/admin/kubeadm/#manage-tokens
They never wrote ANY detail on how to add a new node to a cluster when your initial token (which is valid for ... who knows how long) has expired! Not a word. You can only test and try it yourself. Lovely beta troubleshooting job this is.
The documentation should be much, much better, especially for somebody not working on the Kubernetes dev team.

@vglisin commented Oct 17, 2017

If you try this, you will see that there is no way to use the IP address mentioned:
(Text is from: https://kubernetes.io/docs/admin/kubeadm/#manage-tokens)
"To implement this automation, you must know the IP address that the master will have after it is started.....
kubeadm can generate a token for you:
kubeadm token generate
Start both the master node and the worker nodes concurrently with this token."

Where is the mentioned IP address supposed to be used?
How do you join a node to the cluster in a useful way after the token has expired?
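
For anyone who lands here with the expired-token question, a hedged sketch of what generally works on kubeadm 1.8-era clusters; the exact flags can differ by version, so check kubeadm token --help and kubeadm join --help on your install (<token>, <master-ip> and <hash> are placeholders):

```bash
# On the master: the token created by "kubeadm init" has a TTL, so mint a new one.
kubeadm token create
kubeadm token list

# On the node being added: join with the fresh token and the master's API endpoint.
kubeadm join --token <token> <master-ip>:6443 \
  --discovery-token-ca-cert-hash sha256:<hash>
```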

@vglisin commented Oct 17, 2017

Hello, I solved the token problem.
The documentation really needs to be updated with a newbie's mindset in mind; otherwise it is a source of frustration. Currently the nodes are OK and operational.
If the above helps you... good.

@rajendragosavi

This issue is fixed in the latest version of Kubernetes. Just run kubeadm reset and remove /var/lib/etcd, as kubeadm expects it to be empty.

Then run kubeadm init.
It should work fine.
