Overview
When using the Liberty image, we've run into a rare scenario that causes `CrashLoopBackOff` behaviour in our workloads and requires the affected pod to be killed and re-created.
This appears to be caused by the Liberty image's `docker-server.sh` script calling `mkdir` without the `-p` flag, which breaks pod/container restart scenarios.
Container configuration
emptyDir volume, mounted at /tmp
Scenario
Our K8s node temporarily became unavailable. During this time, our Liberty-based container was restarted twice.
The container logs here were:
| Timestamp | Message |
| --- | --- |
| 2024-10-24 02:38:30.574 | Found mounted TLS certificates, generating keystore |
| 2024-10-24 02:38:44.892 | Found mounted TLS certificates, generating keystore |
| 2024-10-24 02:38:44.952 | mkdir: cannot create directory ‘/tmp/certs’: File exists |
| 2024-10-24 02:39:39.972 | Found mounted TLS certificates, generating keystore |
| 2024-10-24 02:39:40.233 | mkdir: cannot create directory ‘/tmp/certs’: File exists |
On the first restart, the `/tmp/certs` directory would have been created. However, due to complications with node unavailability, it seems that the line that cleans up the `/tmp/certs` directory was never executed.
Upon the second restart, the `/tmp/certs` directory already existed, because we mount an `emptyDir` volume at `/tmp`. The pod containing the workload stayed on the same node and was not moved to another one, so the `emptyDir` was not cleared between executions.
From the K8s docs, on emptyDir:
When a Pod is removed from a node for any reason, the data in the emptyDir is deleted permanently.
This led to CrashLoopBackOff behaviour until the pod was manually killed, and a new pod was created.
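The failure can be reproduced outside the cluster. The following is a minimal sketch, not taken from `docker-server.sh`: it assumes a temporary directory (the hypothetical `workdir`) as a stand-in for the `emptyDir`-backed `/tmp`, and runs plain `mkdir` twice to mimic a first start followed by a restart on the same node.

```shell
#!/bin/sh
# Hypothetical stand-in for the emptyDir-backed /tmp from the report.
workdir=$(mktemp -d)

# First container start: the directory does not exist yet, so mkdir succeeds.
first_status=0
mkdir "$workdir/certs" || first_status=$?

# Restart on the same node: the directory survived, so plain mkdir fails
# with "File exists" and returns a non-zero exit status.
second_status=0
mkdir "$workdir/certs" 2>/dev/null || second_status=$?

echo "first=$first_status second=$second_status"

# Clean up the stand-in directory.
rm -rf "$workdir"
```

If the startup script treats that non-zero status as fatal, every subsequent restart would fail the same way for as long as the directory persists, which would match the behaviour described above.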
Suggestion
To prevent the above scenario from occurring in workloads with similar configurations, would it make sense to update `docker-server.sh` to replace `mkdir /tmp/certs` with `mkdir -p /tmp/certs`?
It looks as though this may be a bug, as `mkdir -p` is already used in other areas of the `docker-server.sh` script.
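To illustrate why `-p` should be enough here, this is a small sketch of the proposed behaviour. It again assumes a temporary directory rather than the real `/tmp/certs`; the variable names are hypothetical and the snippet is not the actual script.

```shell
#!/bin/sh
# Hypothetical stand-in for the emptyDir-backed /tmp from the report.
base_dir=$(mktemp -d)
certs_dir="$base_dir/certs"

# First start: -p creates the directory (and any missing parents).
first_status=0
mkdir -p "$certs_dir" || first_status=$?

# Restart with the directory still present: -p treats an existing
# directory as success instead of failing with "File exists".
second_status=0
mkdir -p "$certs_dir" || second_status=$?

echo "first=$first_status second=$second_status"

# Clean up the stand-in directory.
rm -rf "$base_dir"
```

Note that `mkdir -p` only makes the directory creation idempotent. Any stale files left in `/tmp/certs` from the interrupted run would still be there, so the script may also need to tolerate or overwrite them.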