Linux Emergency Reset / Shutdown
I was always convinced that a Linux server would not easily break down or get stuck. However, after running an on-premise Kubernetes cluster for a couple of years, on non-optimal kernels or with non-optimal OS-packages on it, I have seen some quite severe issues.
In some cases, a machine started logging lots of kernel panics, then became slow and only partially functional.
In that state, typing sudo reboot often did not work. The command would just time out and be ignored.
In this situation we had to ask the team owning the VMware systems to hit the reset button, to hard-reset and reboot the server. We could not do that ourselves.
But… after a bit of googling, I found a workaround. A couple of simple commands, which have (almost) exactly the same effect as hitting the reset button!
WARNING: Do not use this if this is not the absolute last thing you can try! Hitting the reset button does not SYNC the disk buffers, and does not do any graceful shutdown actions. This might leave your system in a state in which it either can not boot any more, or it could start a full disk check (fsck) on start. So keep in mind that your server could be offline for a bit before you can reach it again 😉
Type this as root user in a shell on the broken server:
# hard reset
sync
echo 1 > /proc/sys/kernel/sysrq
echo b > /proc/sysrq-trigger
Or you can also shut down the server (which means it will NOT start up again until someone hits the power button!), so beware of using this on remote servers for which you have no access to the buttons (or virtual buttons in case of VMware):
# shutdown - turning the server OFF (make sure you can reach its power button!)
sync
echo 1 > /proc/sys/kernel/sysrq
echo o > /proc/sysrq-trigger
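Apparently the first “echo” command enables all functions of the keyboard’s “SysRq” key, and the second one simulates hitting ALT+SysRq followed by an extra key (while keeping ALT pressed), which tells the kernel to perform a special low-level action: “b” reboots immediately and “o” powers the machine off. According to the kernel documentation, the value in /proc/sys/kernel/sysrq only restricts the keyboard combination, and root can always write to /proc/sysrq-trigger, so the first echo is mostly a harmless precaution. See https://en.wikipedia.org/wiki/Magic_SysRq_key for more info on this, and which keys/actions are possible.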
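To make the "enable" step a bit more concrete: as far as I understand the kernel documentation, /proc/sys/kernel/sysrq holds either 0 (disabled), 1 (everything enabled), or a bitmask in which, for example, 16 allows sync, 32 allows remount read-only and 128 allows reboot/poweroff. The helper below is only a small sketch for checking such a value; sysrq_allows is a made-up name, not an existing tool.

```bash
# Hypothetical helper: does a kernel.sysrq value allow a given function bit?
# $1 = value read from /proc/sys/kernel/sysrq, $2 = bit to test (e.g. 128)
sysrq_allows() {
    value="$1"
    bit="$2"
    if [ "$value" -eq 1 ]; then
        echo yes    # 1 means: every SysRq function is enabled
    elif [ $(( value & bit )) -ne 0 ]; then
        echo yes    # the requested bit is set in the bitmask
    else
        echo no     # 0 (disabled) or the bit is not set
    fi
}
```

On a real machine you could call it as sysrq_allows "$(cat /proc/sys/kernel/sysrq)" 128 to see whether the keyboard combination for reboot is allowed.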
If the sync command hangs in the scenarios above, just log in to another shell session, skip the sync, and run the echo commands directly.
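If you do not want to find out the hard way whether sync hangs, one option is to start it in the background and only wait for a limited time. This is a rough sketch, not something I have used on a broken server; run_with_deadline and its arguments are names I made up for illustration.

```bash
# Hypothetical helper: run a command in the background and wait at most
# $1 seconds for it. Returns 0 if it finished in time, 1 if it is still running.
run_with_deadline() {
    limit="$1"
    shift
    "$@" &                      # start the command without blocking this shell
    pid=$!
    waited=0
    while kill -0 "$pid" 2>/dev/null; do   # still alive?
        if [ "$waited" -ge "$limit" ]; then
            echo "still running after ${limit}s, giving up" >&2
            return 1
        fi
        sleep 1
        waited=$(( waited + 1 ))
    done
    return 0
}
```

Usage would be something like run_with_deadline 10 sync || echo "sync hangs, skipping it", after which you continue with the echo commands. A sync that is really stuck keeps running in the background, but it no longer blocks your shell.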
If you look at the wiki page, the magic command “s” also syncs the disks, so it is probably better to execute it like this (I will try this the next time I need it; I have not needed it since we moved to the cloud):
# hard reset
echo 1 > /proc/sys/kernel/sysrq
echo s > /proc/sysrq-trigger
echo b > /proc/sysrq-trigger
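The same wiki page also lists “u”, which remounts all filesystems read-only. A slightly more careful variant of the hard reset could therefore send s, u and b with a short pause in between, so the sync gets a moment to finish. This is an untested sketch under that assumption; sysrq_reboot and its two arguments are my own names. The first argument only exists so you can point it at a scratch file for a dry run.

```bash
# Hypothetical helper: sync, remount read-only, then reboot via magic SysRq.
# $1 = trigger file (default: the real /proc/sysrq-trigger)
# $2 = seconds to pause after each key (default: 2)
sysrq_reboot() {
    trigger="${1:-/proc/sysrq-trigger}"
    delay="${2:-2}"
    for key in s u b; do
        echo "sending sysrq '$key'"
        echo "$key" > "$trigger" || return 1   # stop if the write fails
        sleep "$delay"
    done
}
```

A dry run like sysrq_reboot /tmp/fake-trigger 0 only writes to that file. Calling sysrq_reboot without arguments as root reboots the machine for real, so the same warning as above applies.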