time to bleed by Joe Damato

technical ramblings from a wanna-be unix dinosaur


PXE booting: easily getting what you want on to remote servers


Scaling out your service to multiple servers can be a painful process once you consider package management, configuration, OS installation, and provisioning. The pain can be exacerbated if you want to use a version of linux that your hosting provider does not provide. In this next set of blog posts I’m going to talk about a few of the different ways to deal with these issues.

This post will address PXE booting (pronounced pixie, like the candy our friends up top are eating). PXE booting can help you easily install an OS and provision a server. All your provider has to do is turn your new system on and PXE can handle the rest!

Before talking about provisioning, let’s talk about booting a custom kernel image for a linux network installer.

But why would I not use the provider’s linux?

Plenty of reasons. Maybe you started out with some distro of linux, had to switch providers, and now your new provider doesn’t have the distro you want. Perhaps you want to use some less popular distro, don’t like the way the system was installed, want to install on to software raid, or whatever. An easy solution is to use PXE booting to launch an installer image on multiple machines as they boot.

What is PXE booting?

PXE booting is a process that can occur during boot time on a computer system. PXE was designed as an option ROM for the x86 BIOS and you can get this functionality by having a NIC that supports PXE. Many NICs these days support PXE; do some googling or contact your provider to see if yours does.

After the system BIOS comes up, the PXE option ROM code is executed. The NIC on the system broadcasts a DHCPDISCOVER packet with some extra information that lets anyone listening know that it supports PXE. If a DHCP server hears this packet (and also supports PXE) it can transmit data back to the NIC including a file name (the image you will be booting) and an IP address (the server that has the image). The image is transferred via TFTP, loaded into RAM, and booted.
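
To make the "extra information" a bit more concrete: PXE clients announce themselves in DHCP option 60 (the vendor class identifier) with a string that starts with PXEClient. The following is only a rough sketch of how a listener could spot that. The function names are made up for illustration, and it assumes you have already stripped the fixed DHCP header and magic cookie so that only the raw options bytes remain.

```python
from typing import Dict


def parse_dhcp_options(raw: bytes) -> Dict[int, bytes]:
    """Parse a DHCP options field (the bytes after the magic cookie).

    Options are encoded as: 1 byte code, 1 byte length, then the value.
    Code 0 is a single pad byte and code 255 marks the end.
    """
    options: Dict[int, bytes] = {}
    i = 0
    while i < len(raw):
        code = raw[i]
        if code == 255:  # End option
            break
        if code == 0:  # Pad option, no length byte
            i += 1
            continue
        if i + 1 >= len(raw):  # truncated option, stop parsing
            break
        length = raw[i + 1]
        options[code] = raw[i + 2 : i + 2 + length]
        i += 2 + length
    return options


def is_pxe_client(raw_options: bytes) -> bool:
    """True if option 60 (vendor class identifier) starts with PXEClient."""
    vendor_class = parse_dhcp_options(raw_options).get(60, b"")
    return vendor_class.startswith(b"PXEClient")


if __name__ == "__main__":
    # Hand-built sample: option 53 (message type 1 = DHCPDISCOVER),
    # option 60 with an illustrative PXE vendor class string, then End.
    vendor = b"PXEClient:Arch:00000:UNDI:002001"
    sample = bytes([53, 1, 1]) + bytes([60, len(vendor)]) + vendor + bytes([255])
    print(is_pxe_client(sample))
```

A real DHCP server does this check for you; the sketch is just meant to show what the server is keying on.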

What do I need to get the party started?

You will need to do some initial bootstrapping. You may need to have your provider install whatever generic distro they have, then PXE boot your other machines. As painful as this may sound, it is easier IMHO than installing linux on top of linux. Hopefully, your hosting provider includes a NIC with an IP address on an internal vlan and that NIC is listed early in the boot order in the BIOS.

You will also need some software that should be available via your package management system:

  • dhcpd – dhcp server
  • pxelinux (sometimes bundled with syslinux, check your package manager) – pxe boot files
  • tftp-hpa – tftp server
  • os kernel image – kernel you want to boot (if you don’t have one, you can use memtest86 to test)

Configure your DHCP server

You’ll need to set up your DHCP server by editing your dhcpd.conf (or the equivalent file for your DHCP server). I’ve included the config file I’m using in production, and I’ll go over the important parts after the file.

ddns-update-style interim;
subnet netmask {
        default-lease-time 3600;
        max-lease-time 4800;
        option routers;
        option domain-name-servers;
        option subnet-mask;
        option domain-name "cool-domain.com";
        option time-offset -7;
        option ntp-servers;
}

host host1 {
        hardware ethernet aa:bb:cc:dd:ee:ff;
        option host-name "host1";
        filename "pxelinux.0";
        next-server;
}

There are a few important things to note about the above config file:

  • I’ve decided to specify host1 explicitly by listing its MAC address. You don’t have to do this; you can specify an entire subnet (see dhcpd.conf(5) for more information).
  • The filename line – this line specifies the pxelinux file to download and execute (we’ll get to this soon).
  • The next-server line – this line specifies the server where the pxelinux file can be downloaded from. This can be the same server that is running dhcpd or a different one – it doesn’t matter.

For more info on other config options, check dhcpd.conf(5).
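
If you have more than a handful of machines, writing host blocks by hand gets old fast. Here is a small sketch of generating one from a script. The function name is hypothetical, and the address 192.0.2.10 is a documentation-only placeholder standing in for your TFTP server; neither comes from the config above.

```python
def render_host_stanza(name: str, mac: str, tftp_server: str,
                       boot_file: str = "pxelinux.0") -> str:
    """Render a dhcpd.conf host block that points one machine at a PXE boot file.

    filename    -> the file the NIC asks for over TFTP
    next-server -> the TFTP server that has that file
    """
    return (
        f"host {name} {{\n"
        f"        hardware ethernet {mac};\n"
        f'        option host-name "{name}";\n'
        f'        filename "{boot_file}";\n'
        f"        next-server {tftp_server};\n"
        f"}}\n"
    )


if __name__ == "__main__":
    # 192.0.2.10 is a placeholder address, not a value from the original config.
    print(render_host_stanza("host1", "aa:bb:cc:dd:ee:ff", "192.0.2.10"))
```

You would still paste or include the output in dhcpd.conf and restart dhcpd yourself; this only builds the text.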

Configure the TFTP server

The tftp server configuration is pretty simple. A couple things to remember when setting up the tftp server:

  • It sounds obvious, but make sure the TFTP server can read from the directory it is pointed at.
  • Make sure hosts.allow and hosts.deny are set up properly to allow only your servers to access the tftp server.

Be sure to test your tftp server setup with a tftp client before moving on to the next step.
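
Any tftp client will do for this test. If you don’t have one handy, the read side of the protocol is simple enough to sketch in a few lines. This is a bare-bones illustration only: no retransmissions, no option negotiation, and it assumes the default 512-byte block size. The function names and the address in the example are placeholders.

```python
import socket
import struct

OP_RRQ, OP_DATA, OP_ACK, OP_ERROR = 1, 3, 4, 5
BLOCK_SIZE = 512  # default TFTP block size


def build_rrq(filename: str, mode: str = "octet") -> bytes:
    """Read request: opcode 1, filename, NUL, transfer mode, NUL."""
    return (struct.pack("!H", OP_RRQ) + filename.encode("ascii") + b"\0"
            + mode.encode("ascii") + b"\0")


def build_ack(block: int) -> bytes:
    """Acknowledgement: opcode 4 followed by the block number."""
    return struct.pack("!HH", OP_ACK, block)


def tftp_get(server: str, filename: str, port: int = 69,
             timeout: float = 5.0) -> bytes:
    """Fetch a file over TFTP and return its contents.

    Raises socket.timeout if the server goes quiet and RuntimeError if the
    server answers with an error packet.
    """
    data = bytearray()
    expected = 1
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_rrq(filename), (server, port))
        while True:
            # The server replies from a new port, so keep track of the peer.
            packet, peer = sock.recvfrom(4 + BLOCK_SIZE)
            if len(packet) < 4:
                raise RuntimeError("short TFTP packet")
            opcode, number = struct.unpack("!HH", packet[:4])
            if opcode == OP_ERROR:
                message = packet[4:].split(b"\0", 1)[0].decode("ascii", "replace")
                raise RuntimeError(f"TFTP error {number}: {message}")
            if opcode != OP_DATA:
                raise RuntimeError(f"unexpected TFTP opcode {opcode}")
            payload = packet[4:]
            sock.sendto(build_ack(number), peer)
            if number != expected:
                continue  # not the block we are waiting for (e.g. a duplicate)
            data += payload
            if len(payload) < BLOCK_SIZE:
                return bytes(data)  # a short block ends the transfer
            expected = (expected + 1) % 65536


if __name__ == "__main__":
    # Placeholder address; substitute your own TFTP server.
    image = tftp_get("192.0.2.10", "pxelinux.0")
    print(f"fetched {len(image)} bytes")
```

If this (or a regular tftp client) can’t pull pxelinux.0 from another machine on the internal network, the NIC won’t be able to either.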

Set up pxelinux

Once you’ve used your package management system to install the pxelinux package (if that doesn’t exist, try syslinux; sometimes they are packaged together), copy the pxelinux.0 file included in the package to the directory your tftp server is serving files from.

Create a directory called pxelinux.cfg under the directory your tftp server is serving from. If your tftp server is serving from /tftpboot, pxelinux.0 would be under /tftpboot and you’d want to create /tftpboot/pxelinux.cfg/. Under this directory you will create configuration files for the different hosts.

Configure pxelinux

Under the pxelinux.cfg directory you can create configuration files for pxelinux to use. pxelinux decides which file to use based on the filename, trying the following names in order:

  1. The first filename searched is the MAC address of the client in lowercase hexadecimal, separated by dashes, with “01-” prepended to it. For example: 01-aa-bb-cc-11-22-33.
  2. If that file is not found, the next filename searched is the client’s IP address in uppercase hexadecimal. For example, 192.168.2.91 becomes C0A8025B.
  3. If that file is not found, the last hex digit is removed and the search is tried again (C0A8025).
  4. This repeats, dropping one digit each time, until there is only one digit left (C).
  5. If none of those files are found, the last resort is to search for a file called default.
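
When a machine isn’t picking up the config you expect, it helps to list exactly which filenames it will ask for. The following is a small sketch of the search order described above. The function name is made up, and it only covers the steps in this list, not anything else a particular pxelinux version might try.

```python
import ipaddress
from typing import List


def pxelinux_config_candidates(mac: str, ip: str) -> List[str]:
    """Return the pxelinux.cfg filenames in the order they are searched.

    mac -- client MAC address, e.g. "88:99:AA:BB:CC:DD"
    ip  -- client IPv4 address, e.g. "192.168.2.91"
    """
    # "01-" is the hardware type for Ethernet; the MAC is lowercase with dashes.
    names = ["01-" + mac.lower().replace(":", "-")]
    # The IP address as 8 uppercase hex digits, then shorter and shorter prefixes.
    ip_hex = f"{int(ipaddress.IPv4Address(ip)):08X}"
    names += [ip_hex[:n] for n in range(8, 0, -1)]
    names.append("default")
    return names


if __name__ == "__main__":
    for name in pxelinux_config_candidates("88:99:AA:BB:CC:DD", "192.168.2.91"):
        print(name)
```

One practical use: a file named C0A802 would match every host in 192.168.2.0/24, which is a handy way to give a whole subnet the same boot config.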

If your server has an IPMI interface, this is a perfect opportunity to use it. The NIC will output debugging information as it searches for files and will let you know what files it finds, if any.

The configuration file itself will look something like this (this is from our actual production config):

prompt 1
timeout 300
display boot.msg
F1 boot.msg
F2 options.msg
default arch
label arch
kernel vmlinuz
append initrd=initrd.img rootdelay=5

The configuration file is pretty straightforward.

  • timeout – how long to wait at the boot prompt, in tenths of a second, before the default label is executed (so 300 here means 30 seconds)
  • display – prints the ascii data in the specified file to the screen before doing anything else
  • default – the default label to execute if the timeout is reached
  • label – name for a specific configuration
  • kernel – the kernel to boot
  • append – additional data to pass to the kernel, in this case I’ve specified the initrd to use

That should be all you need to get PXE boot working. You should now be able to boot into the image of your choice, whether it be a network install image or a bootstrapping image. If it is an install image, you can use IPMI to give you remote KVM to guide the install.

Cool, what else can I do?

Pretty much anything you want, including provisioning. Since you can specify an initrd for the kernel to use, you can roll your own initrd. initrds are just gzipped cpio archives. You can create your own initrd which could:

  • Run scripts to provision the system as a database or app server.
  • Download a disk image from an NFS/CIFS/FTP/whatever server and dd it to disk.
  • Script the installation of your favorite linux.
  • Anything else you can think of.
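
Before pointing pxelinux at a home-rolled initrd, it’s worth a quick sanity check that the file really is a gzipped cpio archive. Here is a rough sketch of that check. The function name is hypothetical, it only looks at the first few bytes, and it only recognizes the "newc" cpio format (magic 070701, or 070702 for the variant with checksums).

```python
import gzip

# ASCII magic numbers at the start of a "newc" format cpio archive.
CPIO_NEWC_MAGIC = (b"070701", b"070702")


def looks_like_gzipped_cpio(path: str) -> bool:
    """Return True if path is gzip-compressed data that starts with a cpio header.

    Returns False for files that are missing, not gzip, or truncated.
    """
    try:
        with gzip.open(path, "rb") as fh:
            return fh.read(6) in CPIO_NEWC_MAGIC
    except (OSError, EOFError):
        return False


if __name__ == "__main__":
    # initrd.img is the file name used in the pxelinux config above.
    print(looks_like_gzipped_cpio("initrd.img"))
```

This won’t tell you whether the contents are any good, only that the kernel has a chance of unpacking the archive.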


PXE boot is a fast, effective, and easy system that can be set up with minimal effort and provides plenty of flexibility. It gives you a simple way to bring up new systems when scaling out and can also be used for provisioning and deployment. Of course, it is just one of many possible solutions to custom OS installation and provisioning.

I have used PXE booting for installing a lesser-known linux distro in a remote datacenter and also for kernel development. I hope this blog post inspires you to give it a shot.

Written by Joe Damato

November 3rd, 2008 at 8:59 am

Posted in scaling,systems
