Set up an init process for your container in Kubernetes
If you run a container in Kubernetes, you may see a process tree like this inside it:
$ kubectl exec <YOUR_POD> -n <YOUR_NAMESPACE> -c <YOUR_CONTAINER> -it -- bash
...
...
$ ps -aef --forest
UID PID PPID C STIME TTY TIME CMD
app 59 0 0 20:59 pts/1 00:00:00 bash
app 78 59 0 20:59 pts/1 00:00:00 \_ ps -aef --forest
app 1 0 0 20:57 ? 00:00:00 /bin/bash
app 7 1 7 20:57 ? 00:02:29 /java -Dcom........
You can see that bash (not the one attached to pts/1, which is the kubectl exec session) is PID 1 and is the parent of your Java program. If you don’t use bash, you may see your program as PID 1.
In the Linux world, PID 1 is the init process. When you run a Docker container, PID 1 is whatever you set as ENTRYPOINT in your Dockerfile.
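To check what is running as PID 1 from inside a container, you can read its command name. The following is a minimal sketch, assuming a Linux /proc filesystem; the helper name pid1_name is made up for this example.

```shell
#!/bin/bash
# pid1_name: print the command name of PID 1 (hypothetical helper).
pid1_name() {
  if [ -r /proc/1/comm ]; then
    cat /proc/1/comm
  else
    # Fallback for systems without /proc/1/comm; assumes a ps that supports -o.
    ps -p 1 -o comm=
  fi
}

pid1_name
```

In a container like the one above this would print bash or the name of your own program, depending on how the container was started.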
What does PID 1 do?
As an init process, PID 1 is supposed to do 2 things:
- Signal forwarding: PID 1 should catch signals such as SIGTERM and forward them to its child processes.
- Reaping zombies: when a process loses its parent, it is re-parented to PID 1. Once such a process exits, it stays a zombie until PID 1 reaps it, so PID 1 should reap zombie processes.
Signal forwarding
Let’s say you are running a bash script as the init process in your container. By default, a bash script does not catch signals and therefore does not forward them to its child processes. So if you run docker stop, your container is only stopped once the grace period has expired.
When you run docker stop, it does 2 things:
- It sends SIGTERM to your init process.
- It waits for the process to terminate, or until the grace period (10 seconds by default) is over, and then sends SIGKILL.
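The two steps can be imitated in plain bash to make the behavior easier to see. This is only a sketch of the idea, not how Docker is implemented; the helper name stop_with_grace and the variable stop_result are made up for this example.

```shell
#!/bin/bash
# stop_with_grace PID [GRACE_SECONDS]
# Hypothetical helper that mimics the two steps described above.
# Sets stop_result to "terminated" or "killed".
stop_with_grace() {
  local pid=$1 grace=${2:-10} waited=0
  stop_result=terminated
  # Step 1: ask the process to terminate.
  kill -TERM "$pid" 2>/dev/null || return 0
  # Step 2: poll until the process is gone or the grace period is over.
  while kill -0 "$pid" 2>/dev/null; do
    if [ "$waited" -ge "$grace" ]; then
      # Grace period is over: force the process to die.
      kill -KILL "$pid" 2>/dev/null
      stop_result=killed
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
}
```

A process that reacts to SIGTERM ends with stop_result set to terminated almost immediately, while one that ignores SIGTERM is only removed after the full grace period, which is the slow shutdown you see with an init script that does not handle signals.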
Ideally, your init program should handle SIGTERM and do some cleanup work. For example, it should forward the signal to its child processes to make sure they are terminated properly. You can use the trap command to catch and handle signals, for example:
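#!/bin/bash
# Forward SIGTERM to the child process.
handler() {
  kill -TERM "$child" 2>/dev/null
}
trap handler SIGTERM
# Start the real program in the background and remember its PID.
/my_command &
child=$!
# wait returns early when the trap fires, so keep waiting until the child is gone.
while kill -0 "$child" 2>/dev/null; do
  wait "$child"
done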
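A slightly more general variant of the script above forwards several signals and keeps the child's exit status. This is a sketch under assumptions: the function name run_and_forward and the choice of TERM, INT and HUP are made up for this example, and your program may need a different set of signals.

```shell
#!/bin/bash
# run_and_forward CMD [ARGS...]
# Hypothetical helper: run CMD as a child, forward TERM/INT/HUP to it,
# and return the child's exit status.
run_and_forward() {
  "$@" &
  child=$!
  for sig in TERM INT HUP; do
    # Each trap re-sends the same signal to the child.
    trap "kill -$sig \"\$child\" 2>/dev/null" "$sig"
  done
  # wait returns early (status > 128) when a trapped signal arrives,
  # so keep waiting until the child has really exited.
  wait "$child"
  status=$?
  while kill -0 "$child" 2>/dev/null; do
    wait "$child"
    status=$?
  done
  trap - TERM INT HUP
  return "$status"
}
```

Used as the last line of an entrypoint script, for example run_and_forward /my_command, it lets the script exit with the status of the program it started. A child that is terminated by SIGTERM is reported with status 143 (128 + 15).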
Reap zombies
A process becomes a zombie when both of the following are true:
- it has finished
- its parent is gone or its parent does not reap it
You can easily produce a zombie process. For example,
docker run -d --rm --name my_app centos bash -c "sleep 10 & exec sleep 100"
exec replaces the current shell with the given command, so bash, which would otherwise be PID 1, is replaced by sleep 100. After 10 seconds, the process list looks like this:
UID PID PPID C STIME TTY TIME CMD
app 1 0 0 12:14 ? 00:00:00 sleep 100
app 7 1 0 12:14 ? 00:00:00 [sleep] <defunct>
Because bash (which was the parent of sleep 10) has been replaced, sleep 10 ends up parented to PID 1, and when it finishes it becomes defunct. It is now a zombie process, and because sleep 100 (PID 1) does not reap zombies, it stays there until your container dies.
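To spot zombies without reading the whole ps output, you can scan /proc for processes in state Z. The following is a minimal sketch, assuming a Linux /proc filesystem and an available awk; the helper name list_zombies is made up for this example.

```shell
#!/bin/bash
# list_zombies: print "PID PPID NAME" for every zombie process (hypothetical helper).
list_zombies() {
  local f
  for f in /proc/[0-9]*/status; do
    # A process can vanish between the glob expansion and the read,
    # so errors from awk are ignored.
    awk '/^Name:/  {name = $2}
         /^State:/ {state = $2}
         /^Pid:/   {pid = $2}
         /^PPid:/  {ppid = $2}
         END {if (state == "Z") print pid, ppid, name}' "$f" 2>/dev/null
  done
}

# Usage: list_zombies
```

In the container from the example above, this would list the defunct sleep with 1 as its parent PID.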
Use tini as the init process
If you run your program as PID 1, it should handle signals properly and reap zombies, which is cumbersome if you need to do it for every program. Fortunately, there is a small program called tini which does both for you. There are two ways to use it:
- Start your container with docker run --init. tini is included in Docker now, under the name docker-init.
$ ps -ef
UID PID PPID C STIME TTY TIME CMD
app 1 0 0 07:48 pts/0 00:00:00 /sbin/docker-init -- /bin/bash
app 7 1 0 07:48 pts/0 00:00:00 /bin/bash
app 16 7 0 07:50 pts/0 00:00:00 ps -ef
You can see that docker-init is PID 1 now.
- If you are using another container runtime such as containerd, you can add tini to your pod spec, provided it is installed in your image. Simply use it to start your program:
command:
- /bin/tini
- --
- /bin/bash
- .....
With tini in place, the process tree in your container looks like this:
$ ps -aef --forest
UID PID PPID C STIME TTY TIME CMD
app 253 0 0 21:10 pts/0 00:00:00 bash
app 334 253 0 21:10 pts/0 00:00:00 \_ ps -aef --forest
app 1 0 0 21:09 ? 00:00:00 tini
app 6 1 0 21:09 ? 00:00:00 /bin/bash
app 8 6 0 21:09 ? 00:00:00 \_ sleep 365d
app 100 1 99 21:09 ? 00:01:22 /bin/java -Dcom....
Now tini is PID 1 and your application's processes are parented to it.
Tini has other use cases too. If you are interested, see the official website for a detailed explanation.
Conclusions
- A common pitfall of containerization is not handling signals and zombie processes properly.
- Using tini can help you set up the init process properly in your containers. If you don’t know whether you need it, simply use it; it will not do any harm.