Debian Cluster Components
- on Intel's ia64 architecture -

Table of Contents

  1. Motivation
  2. Setup of the master node
    1. Basic system installation
    2. Compile the queueing system Torque
    3. Recompile tftpd-dcc and configure the network boot ftp daemon
  3. Create the client image
    1. Image creation configuration files
    2. dcc_buildimage
    3. Image post package installation and configuration
    4. Image boot loader configuration
  4. LDAP configuration
  5. Torque configuration
  6. Install clients
  7. To do

1. Motivation
Debian GNU/Linux contains all packages needed to run and manage a compute cluster, so there is no reason why Debian should not be the Linux distribution for such a system. In principle, you can configure all required cluster parts manually, e.g., the software imaging or the queueing system.
This is where the Debian Cluster Components (DCC) help. DCC is a collection of scripts that depends on all required cluster software packages and configures a complete working compute cluster based on Debian.
However, if you want to use Debian and DCC on Intel's Itanium2 architecture (ia64), you will run into some problems: there are no Torque binary packages for ia64, the syslinux package required by tftpd-dcc is not available, and the OpenLDAP server does not work under Debian Sarge on ia64.

This tutorial describes a workaround to set up Debian with DCC on ia64 machines.

2. Setup of the master node
This section describes how to configure the basic Debian system and how to build the packages that are not available for ia64. It also explains all preparations for the client installation.

2.1. Basic system installation
The Debian installation procedure is the same as on other architectures, except that you have to create an EFI boot partition during the partitioning step. It is recommended that this partition is the first one on your first hard disk.
As mentioned in the DCC installation instructions, the partition containing /var/ should be big enough to store the client image(s).

After the basic installation you have to configure your network in /etc/network/interfaces

# external network interface
auto eth0
iface eth0 inet static
        address x.x.x.x
        netmask y.y.y.y
        network z.z.z.z
        broadcast u.u.u.u
        gateway v.v.v.v
# internal network interface
auto eth1
iface eth1 inet static
        address 192.168.0.1
        netmask 255.255.255.0
        network 192.168.0.0
        broadcast 192.168.0.255
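
The network and broadcast values in the stanzas above follow from the address and the netmask. The following is a minimal sketch in POSIX shell; the helper name net_and_bcast is an assumption made for this example and is not part of Debian or DCC.

```shell
# net_and_bcast ADDRESS NETMASK
# Hypothetical helper (not part of Debian or DCC): prints the "network" and
# "broadcast" lines that belong to an address/netmask pair.
net_and_bcast() {
    # Split both dotted quads into their four octets.
    IFS=. read -r a1 a2 a3 a4 <<EOF
$1
EOF
    IFS=. read -r m1 m2 m3 m4 <<EOF
$2
EOF
    # network = address AND netmask; broadcast = address OR inverted netmask
    echo "network $((a1 & m1)).$((a2 & m2)).$((a3 & m3)).$((a4 & m4))"
    echo "broadcast $((a1 | (255 - m1))).$((a2 | (255 - m2))).$((a3 | (255 - m3))).$((a4 | (255 - m4)))"
}
```

For the internal interface, net_and_bcast 192.168.0.1 255.255.255.0 prints "network 192.168.0.0" and "broadcast 192.168.0.255", the values used for eth1 above.
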
Furthermore, you have to modify your host table /etc/hosts for the internal cluster network
127.0.0.1 localhost.localdomain localhost
# internal definition
192.168.0.1 master.localdomain master
# optional external definition
xxx.xxx.xxx.xxx cluster.your-external.domain cluster

On i386 and amd64 you can install the Debian Cluster Components without any problems because all binary packages are available and working. On ia64, however, there are no Torque binaries. You have to build them yourself and install them before you install the DCC packages. In addition, some packages have to be installed manually. This tutorial tells you which packages have to be installed at which point.

A final word on the basic installation concerns filesystem sharing between the nodes. It is recommended that users log in on the master node only and start their jobs there. In the configuration presented here, the master node has a directory /master/ that is shared with all working nodes and mounted there as /master/ as well. For performance reasons each working node should have a local directory /scratch/ that serves as the working directory for the users' jobs. A job script is executed on a working node and can copy the binary file into /scratch/, execute it there, and then copy the resulting data files back to /master/.
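
The staging pattern described above can be sketched as a small shell function for use inside a job script. This is only an illustration: the function name run_in_scratch and the directory layout below /master/ and /scratch/ are assumptions, not something DCC or Torque provides.

```shell
# run_in_scratch SHARED_DIR SCRATCH_DIR PROGRAM
# Hypothetical helper: copies PROGRAM from the shared directory into a local
# scratch directory, runs it there, and copies the result files back.
run_in_scratch() {
    shared=$1      # directory below /master/ that holds the program
    scratch=$2     # local working directory below /scratch/
    program=$3     # file name of the executable inside $shared

    mkdir -p "$scratch" || return 1
    cp "$shared/$program" "$scratch/" || return 1

    # Run the program with the scratch directory as working directory.
    ( cd "$scratch" && "./$program" ) || return 1

    # Copy back everything except the program itself (hidden files are skipped).
    for f in "$scratch"/*; do
        [ "$f" = "$scratch/$program" ] || cp -r "$f" "$shared/"
    done
    rm -rf "$scratch"
}
```

Inside a Torque job script this could be called, for example, as run_in_scratch /master/$USER/job1 /scratch/$USER/$PBS_JOBID a.out, where PBS_JOBID is set by Torque and the other names are placeholders.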

2.2. Compile the queueing system Torque
It is possible to use the Debian source packages of Torque served by the DCC project. Add the following lines to your /etc/apt/sources.list

deb http://ftp.irb.hr/pub/irb/dcc/ ./
deb-src http://ftp.irb.hr/pub/irb/dcc/ ./
Change into root's home directory or into /tmp/ and call
apt-get build-dep torque
apt-get source torque
cd torque*
dpkg-buildpackage
All required Torque packages for the master and the client nodes have now been compiled for ia64; dpkg-buildpackage writes the resulting Debian packages to the parent directory of the source tree. On the master node, change to that directory and install the following packages:
aptitude install libcurses-perl
dpkg -i torque-common_1.0.1p6-4_all.deb \
	torque-server_1.0.1p6-4_ia64.deb \
	torque-sched_1.0.1p6-4_ia64.deb \
	torque-utils_1.0.1p6-4_ia64.deb
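
Before calling dpkg -i it can help to check that the build really produced every package file. The sketch below lists the files this tutorial installs on the master and on the clients; the helper names torque_debs and missing_debs are assumptions made for this example.

```shell
# torque_debs ROLE VERSION ARCH
# Hypothetical helper: prints the Torque package files this tutorial installs
# for ROLE, which is "master" or "client".
torque_debs() {
    role=$1 version=$2 arch=$3
    case "$role" in
        master) daemons="server sched" ;;
        client) daemons="mom" ;;
        *) echo "unknown role: $role" >&2; return 1 ;;
    esac
    echo "torque-common_${version}_all.deb"
    for d in $daemons utils; do
        echo "torque-${d}_${version}_${arch}.deb"
    done
}

# missing_debs ROLE VERSION ARCH
# Prints every expected package file that is not in the current directory.
missing_debs() {
    for deb in $(torque_debs "$@"); do
        [ -f "$deb" ] || echo "$deb"
    done
}
```

Run in the directory that holds the freshly built packages, missing_debs master 1.0.1p6-4 ia64 should print nothing if the build was complete.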

2.3. Recompile tftpd-dcc and configure the network boot ftp daemon
The tftpd-dcc package requires a TFTP server for network booting. It is recommended to install atftpd instead of tftpd because the two daemons treat the directory /tftpboot/ differently.

Install atftp and some other required packages with

aptitude install atftpd systemimager-boot-ia64-standard	\
		systemimager-server elilo
You probably have to configure inetd to start the atftpd daemon by adding the following line to /etc/inetd.conf
tftp dgram udp wait nobody /usr/sbin/tcpd /usr/sbin/in.tftpd \
	--tftpd-timeout 300 --retry-timeout 5 --mcast-port 1758 \
	--mcast-addr 239.255.0.0-255 --mcast-ttl 1 --maxthread 100 \
	--verbose=5 /tftpboot
Now, restart the inetd daemon or send a HUP signal to it.

Debian uses the elilo boot loader to start on ia64 systems. The boot loader has to be prepared now using the tftpd-dcc package. However, tftpd-dcc depends on the package syslinux, which is not available on the ia64 architecture. There are two ways to deal with this.

Here, the second way is described: rebuilding tftpd-dcc with adapted dependencies.
aptitude install debconf-dcc
apt-get build-dep tftpd-dcc
apt-get source tftpd-dcc
cd tftpd-dcc*
Now, modify the Depends line in the file debian/control as follows:
Depends: atftpd, systemimager-boot-ia64-standard, systemimager-server, elilo
Additionally, the files postinst and prerm have to be modified. These new scripts can be downloaded here.
Now, this package can be compiled with
dpkg-buildpackage
and installed from the parent directory, where dpkg-buildpackage places the package file, with
dpkg -i ../tftpd-dcc_*.deb

After the installation of tftpd-dcc the directory /tftpboot/ is prepared for the network boot installation of the clients. More details can be found in the SystemImager FAQs. The master node is now ready to become the "real" master by installing the package dcc-front

aptitude install dcc-front

3. Create the client image
On ia64 systems the client image needs some adaptations, which are explained in this section.

3.1. Image creation configuration files
The required configuration files are located in /etc/dcc/. The file config contains the name of the image, the hostname prefix for the clients, and the start IP address for the clients. The file disktable defines the partitioning of the clients. With the gpt disk label instead of the msdos label it should be possible to define more than four partitions, but in practice this does not work.
The first partition on the first hard drive has to be the EFI boot partition. It must have the filesystem type vfat and be mounted on /boot/efi/. Its mount flags are defaults and bootable. A usable disktable file could look like

label_type=gpt
/dev/sda1 200 vfat /boot/efi defaults bootable
/dev/sda2 6144 swap
/dev/sda3 6144 xfs /tmp defaults
/dev/sda4 * xfs / defaults
192.168.0.1:/master - nfs /master   rw
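
The requirement on the first partition can be checked before the image is built. The following sketch assumes the column layout shown in the example above (device, size, filesystem type, mount point, flags); check_disktable is a hypothetical helper name, not a DCC command.

```shell
# check_disktable FILE
# Hypothetical helper: succeeds only if the first partition entry of a DCC
# disktable is a bootable vfat partition mounted on /boot/efi.
check_disktable() {
    awk '
        NF == 0 || /^label_type=/ { next }   # skip blank lines and the label line
        {
            found = 1
            for (i = 5; i <= NF; i++)
                if ($i == "bootable") boot = 1
            ok = ($3 == "vfat" && $4 == "/boot/efi" && boot)
            exit                              # only the first entry matters
        }
        END { exit !(found && ok) }
    ' "$1"
}
```

A call such as check_disktable /etc/dcc/disktable || echo "first partition is not the EFI boot partition" reports a problem before the image build is started.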

Now, you have to modify the package file packages.list. The client kernel image should be defined in this file, but this does not work, so you have to install the kernel image manually after the image creation. The package file has to contain the following lines

elilo
systemconfigurator
libcurses-perl
tk8.3
discover
You can add any further packages that you would like to have available on the clients.

Finally, the source list in sources.list can be adapted to the Debian mirrors you prefer. This file could look like

deboot http://ftp.debian.org/debian sarge
deb ftp://ftp.fu-berlin.de/pub/unix/linux/mirrors/debian/ stable main
deb-src ftp://ftp.fu-berlin.de/pub/unix/linux/mirrors/debian/ stable main
deb http://security.debian.org/ stable/updates main
deb http://ftp.irb.hr/pub/irb/dcc/ ./
deb-src http://ftp.irb.hr/pub/irb/dcc/ ./

3.2. dcc_buildimage
After the preparations in the previous section, the DCC image build script can be called

dcc_buildimage

3.3. Image post package installation and configuration
If the image was created successfully, you have to edit it and install the remaining packages that are not available for ia64. In particular, the boot loader has to be configured.

Copy the Torque binaries into the image's /tmp/ directory

cp torque*.deb /var/lib/systemimager/images/IMAGE_NAME/tmp/
Now, change into the image with
dcc_editimage IMAGE_NAME

Inside the image call the following commands

cd /tmp/
dpkg -i torque-common_1.0.1p6-4_all.deb \
	torque-mom_1.0.1p6-4_ia64.deb \
	torque-utils_1.0.1p6-4_ia64.deb
rm torque*
aptitude upgrade
aptitude install dcc-node
aptitude install kernel-image-VERSION-mckinley-smp

It is recommended to configure the mail system of the clients properly. For example, you could configure exim so that no local mail is delivered and all system mail to root is sent to a smarthost.

Some client hardware, e.g., a USB keyboard, may not be supported automatically after the installation. In principle this does not matter because you log in to the clients remotely only. If you nevertheless want a working USB keyboard, you can add the required modules to the file /etc/modules.

You can leave the image chroot environment with "exit".

3.4. Image boot loader configuration
The boot loader requires the file elilo.efi and the kernel to be in a special directory. Call the following commands inside the image

cd /boot/
mkdir efi
cd efi/
cp /usr/lib/elilo/elilo.efi .
cp ../vmlinuz* vmlinuz
Now, a very important modification of the boot loader configuration perl module of the systemconfig package has to be done. Edit the file /usr/lib/systemconfig/Boot.pm inside the image and comment out all boot modules except Boot::EFI.

Leave the image and call

mksidisk -A --name node --file /etc/dcc/disktable
mkautoinstallscript --image node --force --ip-assignment dhcp --post-install reboot
If the internal network interface of the clients is not eth0, you additionally have to adapt the image install script in /var/lib/systemimager/scripts/. Change eth0 to ethX in the [INTERFACE] section.

After this, you have to re-enter the image and edit the file /etc/systemconfig/systemconfig.conf. Normally, you only have to modify the PATH line in the kernel section so that it points to vmlinuz in the root (/vmlinuz).

# systemconfig.conf written by systeminstaller.
CONFIGBOOT = YES
CONFIGRD = YES
[BOOT]
        ROOTDEV = /dev/sda4
        BOOTDEV = /dev/sda
        DEFAULTBOOT = vmlinuz
[KERNEL0]
        PATH = /vmlinuz
        LABEL = vmlinuz

4. LDAP configuration
Because the OpenLDAP server does not work under Debian Sarge on ia64, the automatic LDAP configuration of the DCC project does not work either. One simple solution is to use an existing LDAP server in your network. You then need to adapt the LDAP client configuration on the master node and inside the image.

/etc/ldap/ldap.conf

BASE dc=your,dc=domain
URI ldaps://your.ldap.server
TLS_CACERT /path/to/your/cacert.pem

/etc/libnss-ldap.conf

host your.ldap.server
base dc=your,dc=domain
uri ldaps://your.ldap.server
ldap_version 3
timelimit 30
bind_timelimit 30
pam_filter objectclass=posixAccount
pam_password md5
nss_base_passwd ou=tree_in_ldap_where_user_accounts_are,dc=your,dc=domain
nss_base_group ou=tree_in_ldap_where_groups_are,dc=your,dc=domain

/etc/pam_ldap.conf

host your.ldap.server
base dc=your,dc=domain
uri ldaps://your.ldap.server
ldap_version 3
timelimit 30
bind_timelimit 30
pam_filter objectclass=posixAccount
pam_password md5

The PAM configuration and nsswitch.conf are already set up properly by the DCC installation.

5. Torque configuration
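Last but not least, some small adaptations of the Torque system are needed.
The configuration file of the Torque mom daemon, /etc/torque/mom_config inside the image, requires the $usecp parameter so that files below the shared directory /master are delivered with a local cp instead of a remote copy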

$clienthost  master.localdomain
$restricted  master.localdomain
$logevent 0x1ff
$usecp cluster.your.domain:/master /master
$usecp cluster:/master /master
$usecp node1.localdomain:/master /master
$usecp node1:/master /master
$usecp node2.localdomain:/master /master
$usecp node2:/master /master
# two entries for each work node
# (one with domain and one without)
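
With more than a few work nodes the $usecp entries are tedious to write by hand. The sketch below generates them; it assumes that the nodes are named node1, node2, and so on, as in the listing above, and usecp_lines is a hypothetical helper name.

```shell
# usecp_lines COUNT DOMAIN SHARE
# Hypothetical helper: prints the two $usecp entries per work node, one with
# the domain and one without, for nodes named node1 ... nodeCOUNT.
usecp_lines() {
    i=1
    while [ "$i" -le "$1" ]; do
        printf '$usecp node%s.%s:%s %s\n' "$i" "$2" "$3" "$3"
        printf '$usecp node%s:%s %s\n' "$i" "$3" "$3"
        i=$((i + 1))
    done
}
```

Calling usecp_lines 2 localdomain /master reproduces the four node entries of the listing above; the entries for the master itself still have to be added by hand.
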
On the master node (outside the image) you may have to modify the node properties (e.g., the number of CPUs of each node) in the file /var/spool/torque/server_priv/nodes (see the Torque documentation).
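
In the Torque nodes file each line names one node, and the np attribute gives its number of processors. A minimal sketch that prints such lines, again assuming nodes named node1, node2, and so on (nodes_file is a hypothetical helper name):

```shell
# nodes_file COUNT NP
# Hypothetical helper: prints one "name np=N" line per work node for nodes
# named node1 ... nodeCOUNT, each with NP processors.
nodes_file() {
    n=1
    while [ "$n" -le "$1" ]; do
        printf 'node%s np=%s\n' "$n" "$2"
        n=$((n + 1))
    done
}
```

For two dual-processor nodes, nodes_file 2 2 prints "node1 np=2" and "node2 np=2". Review the output before writing it to the nodes file, because the file may already contain entries created by DCC.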

6. Install clients
Now, you can proceed as explained in the DCC installation documentation. Call

dcc_discovernode
and boot the clients over the internal network device. Discovery is required only for the first start of a client. Once the client is stored in the SIS database, it is sufficient to boot it over the network to install it, without any action on the master node.

7. To do
Currently, pushing image modifications to all clients does not work reliably. If you call the image pushing command, the boot loader configuration of the clients is destroyed. First attempts to prevent this behavior led to a damaged network configuration on the clients.
At the moment I am working on a solution to this problem.
Current workaround: reinstall the client if you have changed the image.

Acknowledgment
Thanks a lot to Valentin Vidic, one of the authors of the Debian Cluster Components Project. He spent a lot of time answering my questions and gave many helpful hints for finding a working configuration for the DCC on an ia64 cluster.

Gordon Grubert, August, 2006