Rescuing Linux when it won't start
Published: 01 Aug 2007 15:41 BST
No matter how much you adore your Linux machine, there will come a time when you will have to rescue your installation.
Yes, even a Linux machine can suffer a disaster. Whether the cause is a corrupt video configuration, a kernel update gone wrong, or a misconfigured init script, it will happen eventually. I've seen it on a number of occasions, even on my own machines (mostly from corrupt X configurations), and it's frustrating.
The best rescue plan, in my opinion, doesn't have to involve reinstalling. Sometimes the best rescue plan doesn't even involve booting up a rescue disk. This article is going to offer up some tips and tricks on how to avoid failure and help you create the tools you need to recover a dead Linux machine.
Start with the right runlevel
After installing a new Linux system, I immediately take steps to ensure disaster won't strike easily. One of the first steps is to edit the system's default runlevel. The runlevel tells the system how far to take the boot process. There are seven runlevels, numbered 0 through 6:
- 0: Halt (do not set initdefault to this)
- 1: Single user mode
- 2: Multi-user, without NFS (the same as 3, if you do not have networking)
- 3: Full multi-user mode
- 4: Unused
- 5: X11
- 6: Reboot (do not set initdefault to this)
Newer Linux distributions almost always default to runlevel 5 (X11), which means that your system will stop at the graphical log-in screen when boot is complete. This is fine until something (or someone) hoses your X configuration; you will then have to find another way to log in. You could press Ctrl+Alt+F1 to get a text-based virtual console (X itself usually sits on Ctrl+Alt+F7), but why go through that hassle? Instead, I always change my default runlevel to 3 in the file /etc/inittab. The line you change is:
id:5:initdefault:
Change it to:
id:3:initdefault:
This is a very simple method of saving yourself when X doesn't work properly.
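If you would rather script the change than edit the file by hand, the substitution can be sketched in a few lines of Python. This is only an illustration: the function name set_default_runlevel is made up for this example, and it works on the text of the file rather than on /etc/inittab itself, so you can try it safely before touching a real system.

```python
import re

# Matches the active initdefault entry, for example "id:5:initdefault:".
# Commented-out copies (lines starting with "#") are deliberately not matched.
INITDEFAULT = re.compile(r"^id:[0-6]:initdefault:[ \t]*$", re.MULTILINE)


def set_default_runlevel(inittab_text, level):
    """Return inittab_text with its initdefault entry set to `level`.

    Hypothetical helper for illustration; it never touches the disk.
    """
    if level not in range(1, 6):
        # 0 (halt) and 6 (reboot) must never be the default runlevel.
        raise ValueError("refusing to set default runlevel to %r" % (level,))
    new_text, count = INITDEFAULT.subn("id:%d:initdefault:" % level, inittab_text)
    if count != 1:
        raise ValueError("expected exactly one initdefault line, found %d" % count)
    return new_text


if __name__ == "__main__":
    sample = "# Default runlevel\nid:5:initdefault:\nsi::sysinit:/etc/rc.d/rc.sysinit\n"
    print(set_default_runlevel(sample, 3))
```

To use it for real, you would read /etc/inittab as root, keep a backup copy, and write the returned text back. The change takes effect at the next boot.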
Multiple kernels
The next obvious rescue aid is to always have a working kernel installed. I usually run a kernel updated via yum, and kernels have occasionally been released with flaws that left one or more of my machines unable to boot. For that reason, I always make sure I have at least one known-good kernel on a machine. A great way to handle this is to first add plugins=1 to your /etc/yum.conf file. The next step is to take this script (written by Jeremy Katz from Red Hat) and save it as n-installonly.py in /usr/lib/yum-plugins. You can change the number of kernels to retain on the system by changing the tokeep variable (default = 2).
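The yum.conf change is a one-line edit, but if you manage several machines you may want to script it. Here is a minimal sketch using Python's standard configparser module. The function name enable_yum_plugins is invented for this example, and it assumes your yum.conf is a plain INI file with a [main] section. Be aware that configparser drops comments when it writes the file back out, so treat this as a starting point rather than a finished tool.

```python
import configparser
import io


def enable_yum_plugins(conf_text):
    """Return yum.conf text with plugins=1 set in the [main] section.

    Hypothetical helper: works on a string, not on /etc/yum.conf itself.
    Comments in the input are not preserved.
    """
    parser = configparser.ConfigParser(interpolation=None, delimiters=("=",))
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(conf_text)
    if not parser.has_section("main"):
        parser.add_section("main")
    parser.set("main", "plugins", "1")
    out = io.StringIO()
    parser.write(out, space_around_delimiters=False)
    return out.getvalue()


if __name__ == "__main__":
    print(enable_yum_plugins("[main]\ncachedir=/var/cache/yum\nplugins=0\n"))
```

As with any configuration file, keep a backup of the original before writing the result back.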
With a known working kernel on your system, you can upgrade safely. If the new kernel is hosed, simply boot the old kernel and deal with the new one from there, whether that means removing it, recompiling it, or updating it.
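To make the retention idea concrete, here is a rough Python sketch of the rule "keep the newest few kernels, remove the rest". It is not the plugin's code. The names version_key and kernels_to_remove, the keep parameter, and the sample version strings are all invented for illustration, and the version comparison is a simplification of what rpm really does.

```python
import re


def version_key(version):
    """Split a version such as '2.6.18-8.1.8.el5' into comparable parts.

    Simplified ordering (an assumption, not rpm's full algorithm):
    numeric parts compare as numbers and rank above text parts like 'el5'.
    """
    parts = re.split(r"[.\-]", version)
    return [(1, int(p), "") if p.isdigit() else (0, 0, p) for p in parts]


def kernels_to_remove(installed, keep=2):
    """Return the versions to drop so that only the `keep` newest remain."""
    if keep < 1:
        raise ValueError("keep at least one kernel")
    ordered = sorted(installed, key=version_key)  # oldest first
    return ordered[:-keep]


if __name__ == "__main__":
    installed = ["2.6.18-8.1.8.el5", "2.6.18-8.el5", "2.6.18-8.1.10.el5"]
    print(kernels_to_remove(installed, keep=2))  # -> ['2.6.18-8.el5']
```

A real cleanup tool should also refuse to remove the kernel that is currently running, whatever its age.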
Rescue mode
If you are using Red Hat and the Lilo boot loader, you can boot into rescue mode by inserting disk 1 of your installation and entering linux rescue at the boot prompt. Once the machine has booted, you will land on the bash# prompt. From this mode, you have a number of tools to use.
These include tools to check the integrity of a hard disk, repair hard disks, check kernel modules, mount devices, and create file systems. This is a very good place to start your rescue attempt if you're using a Red Hat or Red Hat-based system.
The next rescue method is booting into single-user mode, where your computer boots to runlevel 1. Your local file systems will be mounted, but your network will not be activated. You get a usable system maintenance shell. To boot into single-user mode, enter either:
linux single
or
linux emergency
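at the Lilo prompt. The emergency option goes a step further than single: it gives you an even more minimal environment, with the root file system mounted read-only and the init scripts skipped.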
Creating a rescue CD
If you're using the Lilo boot loader, there's a great tool called mkrescue. This tool is typically used to create boot floppies, but…