Clone disks with multicasting using Clonezilla

Copyright (c) 2007 David Corbacho.

1. Introduction

The purpose of this project is to clone a system to many computers. This document explains a way to copy the entire contents of one computer's hard disk to other computers on the LAN, distributing them by multicast transfer and using only Free Software.

This project was carried out in the classrooms of the HAAGA-HELIA University of Applied Sciences in Helsinki (Finland) in 2007.

1.2. Scope

The requirements for this project are:

1.3. Concepts

Clone disk software
The basic operation of disk cloning software is to create an image of one computer's hard disk and restore it on another, partitioning the target disk automatically. This kind of software is commonly used in schools and companies that need to install an operating system, settings and other software on many computers, because it provides an easy and very fast way to do the task. [More on Wikipedia: Disk Cloning]
Multicasting
Multicasting is a way to deliver a single message to a select group of recipients. In this case, we want to deliver the image of one hard disk to a group of computers on the LAN. It is not the same as broadcasting, which sends the data to all the computers within a network. The main benefit of multicasting for an installation is that a single image can be sent simultaneously to many machines, so transferring it to 100 computers is about as fast as transferring it to one.
MAC Address
In computer networking a Media Access Control address (MAC address) is a unique identifier attached to most network adapters (NICs). It is a number that acts like a name for a particular network adapter, so, for example, the network cards (or built-in network adapters) in two different computers will have different names, or MAC addresses, as would an Ethernet adapter and a wireless adapter in the same computer, and as would multiple network cards in a router. [More on Wikipedia: MAC address]
NIC
A network card, network adapter or NIC (network interface controller) is a piece of computer hardware designed to allow computers to communicate over a computer network. It is both an OSI layer 1 (physical layer) and layer 2 (data link layer) device, as it provides physical access to a networking medium and provides a low-level addressing system through the use of MAC addresses. It allows users to connect to each other either by using cables or wirelessly. Although other network technologies exist, Ethernet has achieved near-ubiquity since the mid-1990s. Every Ethernet network card has a unique 48-bit serial number called a MAC address, which is stored in ROM carried on the card. Every computer on an Ethernet network must have a card with a unique MAC address. No two cards ever manufactured share the same address. This is accomplished by the Institute of Electrical and Electronics Engineers (IEEE), which is responsible for assigning unique MAC addresses to the vendors of network interface controllers. [More on Wikipedia: Network card]
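As a small illustration of the 48-bit address format described above, the following sketch checks whether a string is written as six colon-separated pairs of hexadecimal digits. The function name is_mac is only an example name, and the colon-separated notation is assumed because it is the one used in the messages shown later in this document.

```shell
# is_mac ADDRESS: succeed if ADDRESS looks like a 48-bit MAC address
# written as six colon-separated pairs of hexadecimal digits.
# (is_mac is a hypothetical helper name, not part of DRBL or Clonezilla.)
is_mac() {
    printf '%s\n' "$1" | grep -Eq '^([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$'
}
```

For example, "is_mac 00:0d:56:2c:17:33" succeeds, while "is_mac 192.168.0.8" fails.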

1.4. Software Solutions

Norton Ghost
The most famous solution, and the one that launched the market for this kind of software, is Ghost, commercial software owned by Symantec that covers almost all needs related to disk backup and disk cloning. It is proprietary software, so it doesn’t meet our requirements.
dd
It's a Unix program/command that can be used to clone partitions, but it doesn’t support multicasting and it copies the whole partition, including the unused space, so it doesn’t meet our requirements either.
Partimage
It’s an open source Linux/Unix program under the GPL 2 license that can create and restore images of hard disks without copying the empty blocks. It can even compress the image to save disk space, but it still doesn’t support multicast, so it’s not the tool we are looking for. The last version was released in April 2006 (almost one year ago), so the project seems to have stalled.
Clonezilla
Clonezilla is open source software based on DRBL, Partition Image, ntfsclone, and udpcast. It allows you to clone hard disks to many computers simultaneously by multicast transfer. It saves and restores only the used blocks of the hard disk and supports the ext3 file system. This is the best software solution I have found for this project, and it meets all the requirements.

2. Clonezilla

2.1. About

Clonezilla is a partition or disk cloning tool similar to Symantec Ghost.

The creator(s) of Clonezilla classify it as an OpenSource clone system (OCS). It has been tested at the National Center for High-performance Computing (NCHC), Taiwan, where it was used to clone 41 computers simultaneously. It took about 50 minutes to clone a 5.6 GB system image to all 41 computers via unicasting and only about 10 minutes via multicasting.
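A rough way to see where this difference comes from is to compare how much data the server has to send in each mode. The sketch below uses the NCHC figures quoted above and an idealized model (an assumption, not a measurement): with unicast the server sends one full copy per client, with multicast it sends the image once, and retransmissions and protocol overhead are ignored.

```shell
# Figures from the NCHC test quoted above.
image_gb=5.6
clients=41

# Unicast: the server sends one full copy of the image per client.
unicast_gb=$(awk -v s="$image_gb" -v n="$clients" 'BEGIN { printf "%.1f", s * n }')

# Multicast (idealized): the server sends the image once for the whole
# group; retransmissions and protocol overhead are ignored here.
multicast_gb=$image_gb

echo "unicast: ${unicast_gb} GB sent, multicast: ${multicast_gb} GB sent"
```

The measured speed-up (50 minutes against 10) is much smaller than this ratio, so the network is clearly not the only limiting factor; the model only shows why multicast scales better as clients are added.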

Clonezilla is based on DRBL, Partition Image, ntfsclone, and udpcast. Unlike G4U (Ghost for Unix) or G4L (Ghost for Linux), Clonezilla saves and restores only used blocks in the hard disk. This increases the clone efficiency. With DRBL and network boot enabled client computers, the only thing you have to prepare is a Clonezilla server. You do not even have to prepare a bootable CD or floppy with Partition Image for every client computer.

This is not the case in this project, but if you do not want to install DRBL and just need Clonezilla to clone individual machines one by one, rather than many at once, you can try Clonezilla Live, which lets you boot and run Clonezilla from a CD/DVD or USB flash drive.

Useful links:

2.2. Clonezilla Features

3. Installation

3.1. Overview

Almost all of the work is done on one computer (the server). We only have to make sure that the computers we are going to install the system on (the clients) are set up for network boot and support PXE.

On the server I installed Ubuntu 6.06, but it could be any of the GNU/Linux distributions listed in the DRBL installation guide.

After that, I configured the network settings. It is recommended to have 2 network cards in the server, so we can have 2 IP addresses, but it's not mandatory. Later in this document it is explained how to configure the network settings having only 1 NIC.

Then we install DRBL, which also installs Clonezilla, and follow the steps of the wizard. One of these steps collects the MAC addresses of the clients automatically.

After the installation, as a test, I ran the Clonezilla graphical interface, saved a disk image of another computer to the server (through the network), and cloned it to the rest of the computers via multicasting.

Saving the image of the disk (5 GB) took 1 minute. Restoring that same image to 8 computers simultaneously by multicasting took only 1 minute 40 seconds.

3.2. Installation of DRBL

Clonezilla is based on DRBL, so first of all we need to install it and make sure that our hardware fulfills the basic requirements. Ignore the stated requirement that the server needs two or more NICs; you can do it with only one Ethernet card.

Basically, I followed the installation guide on the DRBL homepage, using the instructions for a server with:

  1. Debian/Ubuntu installed.
  2. Only one Ethernet network card.
  3. An Internet connection.

If you have 2 or more network cards, you can use eth0 for WAN (connection to Internet for example) and others for the LAN: eth1, eth2, eth3...

But in our case, with only one network card, we can set two IP addresses: one on eth0 and an alias IP address on eth0:1. The first one is used by the DRBL server to connect to the public Internet, while eth0:1 is used for the DRBL environment.

These are the steps I followed:

Edit the file where the network configuration is stored. Execute:

    sudo gedit /etc/network/interfaces

In that file, after the configuration of eth0, add the following lines, which define the alias IP address eth0:1. Change the gateway to your own gateway and save the file:

    auto eth0:1
    iface eth0:1 inet static
        address 192.168.0.1
        netmask 255.255.0.0
        gateway 172.28.1.254

Important: after the changes, execute this command for them to take effect:

    sudo /etc/init.d/networking restart

If you don't, the alias IP address will not appear in the wizard described in section 3.3 of this document.
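Before running the wizard it can be worth confirming that the alias really is defined. The following is a minimal sketch, assuming the stanza was written as shown above; has_iface_stanza is a hypothetical helper name, not a DRBL tool.

```shell
# has_iface_stanza FILE IFACE: succeed if FILE contains a line of the form
# "iface IFACE inet ...", as in the eth0:1 stanza shown above.
# (has_iface_stanza is a hypothetical helper name.)
has_iface_stanza() {
    grep -Eq "^[[:space:]]*iface[[:space:]]+$2[[:space:]]+inet" "$1"
}
```

A possible use, after restarting networking, is:

    has_iface_stanza /etc/network/interfaces eth0:1 && ifconfig eth0:1

If the alias is active, ifconfig should report the address 192.168.0.1 for eth0:1.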

After this, edit the following file:

    sudo gedit /etc/apt/sources.list

Now add these 2 lines to the file; they tell the system where to download DRBL from. In my case they are the following 2 lines, because the server runs Ubuntu 6.06 LTS (Dapper Drake). For another GNU/Linux distribution, see the installation guide.

    deb http://free.nchc.org.tw/ubuntu dapper main restricted universe multiverse
    deb http://free.nchc.org.tw/drbl-core drbl stable

Note: these 2 other drbl-core mirrors can also be used: (1) http://diskless.nchc.org.tw/drbl-core and (2) http://drbl.sourceforge.net/drbl-core
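If you prefer to add the repository lines from a script instead of an editor, the sketch below appends a line only when it is not already present, so running it twice does not create duplicates. The helper name add_line_once is an assumption made for this example.

```shell
# add_line_once FILE LINE: append LINE to FILE unless an identical line
# is already there. (add_line_once is a hypothetical helper name.)
add_line_once() {
    grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}
```

Run from a root shell, it could be used like this:

    add_line_once /etc/apt/sources.list "deb http://free.nchc.org.tw/drbl-core drbl stable"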

We update the system and install DRBL executing these 2 commands:

    sudo apt-get update
    sudo apt-get install drbl

If it warns that "the packages cannot be authenticated", ignore the warning and install anyway.

Now we execute the following command to install DRBL:

    sudo /opt/drbl/sbin/drblsrv -i

A multilanguage wizard will guide us through the following steps:

Note: the default value is shown in uppercase. If you agree with the default value, just press "Enter".

        1. [0] English.
        2. [N] Let clients install Linux via network?
        3. [N] Use the serial console output for clients?
        4. [1] Which CPU architecture kernel do you want to assign for DRBL clients?
                                            [0. i386, 1. i586, 2. same as server]
        5. [N] Do you want to upgrade operating system?

Now it will download the needed packages and install them. In my case it took around 7 minutes with the university's high-speed connection (some packages downloaded at 1024 kB/s).

3.3. Setting up the environment of DRBL

Now we have to set up the DRBL server by executing this command:

    sudo /opt/drbl/sbin/drblpush -i

Again, a wizard will guide us through 27 steps:

        1. [0]. English.
        2. [tielab.helia.fi] DNS domain
           (It may be different in your case; I left the default value and pressed Enter)
        3. [penguinzilla] NIS/YP domain (Default value)
        4. [ubuntu] client hostname prefix (Default value)
        5. [eth0] Which ethernet port in this server is for public Internet access?
           (Don't choose the alias IP that was created before)
        6. [y] Collect the MAC address of clients. 

This 6th step is important ("y" is not the default value), especially in our case, where we use an alias IP address on the server. If we don't provide the MAC addresses of the clients, there could be conflicts later when IP addresses are assigned to them. So press "y", and then boot the client computers in order. The server will automatically start to recognize their MAC addresses. Remember that the clients have to boot from Etherboot or PXE!

While it is collecting, you can press "Enter" on the server to view the collection status and check whether it has captured the MAC addresses of the NICs you want. If all the MAC addresses you need are in that list, press "2"; it will stop collecting and continue with the wizard.

When you press "2", it displays a message with the path of the file where the MAC addresses are collected, in case you need to edit it. I edited it, for example, to delete the MAC addresses of some computers that I didn't want.

The wizard continues:

        7. [y]. Let the DHCP offer same IP address to the client every time when client
        boots? This is not the default value, but the Clonezilla installation guide
        recommends it: it makes clear which machines are the clients and avoids
        cloning systems to incorrect or unknown clients by mistake.
        8. [macadr-eth0:1.txt] File name which contains the MAC address. (Default value).
        9. [2] Initial number for "d" in IP address (a.b.c.d) for DRBL clients. (Default
        value).
        10. [y] Now it will show a summary of the settings you have chosen in
        the wizard, and ask if you want to accept them. Then it will show a very clear layout
        of your DRBL environment. Press Enter if it is correct.
        11. [0]. Full DRBL mode.
        12. [0]. Full Clonezilla mode.
        13. [/home/partimag] Which directory in the server you want to store the saved image?
        (Default value)
        14. [Y]. Use the swap partition in the harddrive of the client, if it exists?
        15. [128] Maximum size (in Mb) for the swap space. (Default value)
        16. [1] Which mode do you want the clients to use after they boot? (1. Graphic mode,
        2. Text Mode).
        17. [0] Which mode do you want when client boots in graphic mode? (0. normal login)
        18. [N]. Set a root's password for clients different from server one?
        19. [N] Set a pxelinux password for clients?
        20. [Y] Set the boot prompt for clients?
        21. [70] How many 1/10 sec is the boot prompt timeout for clients? (Default value)
        22. [Y] Use graphical PXELinux menu?
        23. [Y] Let the audio, cdrom, floppy, video and USB devices opened to all users
        in the clients?
        24. [N] Do you want to setup public IP for clients?
        25. [N] Let clients have an option to run terminal mode?
        26. [Y] Let DRBL server as a NAT server?
        
        
        "The running kernel in the server supports NFS over TCP!" [...]
        Press Enter to continue..
        The calculated NETWORK for eth0:1 is 192.168.0.0.
      
        27. [Y]. Deploy the files to system? (Firewall rule will be overwritten during the
        setup)

This takes about 4 minutes, depending on how many clients you have.

I ran into a problem: when the wizard collected the MAC addresses, it also included some IP addresses that shouldn't be there. If this happens to you too, after the wizard has completed just delete what you don't need from this file:

    sudo gedit /etc/drbl/macadr-eth0:1.txt
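The same clean-up can be sketched as a filter that keeps only well-formed MAC addresses and drops everything else, such as stray IP addresses. This assumes the file holds one entry per line with no extra text on the line, which is an assumption about the file layout, and keep_macs is a hypothetical helper name.

```shell
# keep_macs FILE: print only the lines of FILE that consist of a MAC
# address (six colon-separated pairs of hexadecimal digits).
# Assumes one entry per line; keep_macs is a hypothetical helper name.
keep_macs() {
    grep -E '^([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$' "$1"
}
```

Writing the result to a temporary file and reviewing it before replacing the original is safer than editing in place:

    keep_macs /etc/drbl/macadr-eth0:1.txt > /tmp/macadr-clean.txt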

Also, delete each corresponding host entry, like this one:

        host ubuntu0-108{
            hardware ethernet 00:0d:56:2c:17:33;
            fixed-address 192.168.0.8;
            # option host-name "ubuntu0-108";
        }

from the "dhcpd.conf" file:

    sudo gedit /etc/dhcp3/dhcpd.conf
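Removing a host entry can also be sketched with awk. The function below prints the file without the block for one host name. It assumes the block looks like the example above: it starts on a line beginning with "host NAME" and ends at the first following line that contains "}". The name remove_host is hypothetical.

```shell
# remove_host FILE NAME: print FILE without the "host NAME{ ... }" block.
# Assumption: the block starts on a line whose first word is "host" and
# whose second word is NAME (with or without a trailing "{"), and ends
# at the first line containing "}". remove_host is a hypothetical name.
remove_host() {
    awk -v name="$2" '
        !skip && $1 == "host" && ($2 == name || $2 == name "{") { skip = 1 }
        skip { if (index($0, "}")) skip = 0; next }
        { print }
    ' "$1"
}
```

As with the MAC address file, it is prudent to write the output to a new file, compare it with the original, and only then replace /etc/dhcp3/dhcpd.conf.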

3.4. Setting up and executing Clonezilla

Now comes the easiest part. Basically, I followed the Clonezilla guide.

Check that the boot options of the clients are set up to boot from PXE or Etherboot. (If it is Etherboot, version 5.4.0 or newer is required.)
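If you note down the Etherboot version of each client, the "5.4.0 or newer" requirement can be checked with a small comparison like the one sketched here. It assumes purely numeric, dot-separated version strings, and version_ge is a hypothetical helper name; how you obtain the version number from a client is not covered.

```shell
# version_ge A B: succeed if dotted version A is greater than or equal
# to B. Components are compared numerically from left to right; missing
# components count as 0. (version_ge is a hypothetical helper name.)
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        n = split(a, x, "."); m = split(b, y, ".")
        max = (n > m) ? n : m
        for (i = 1; i <= max; i++) {
            if (x[i] + 0 > y[i] + 0) exit 0
            if (x[i] + 0 < y[i] + 0) exit 1
        }
        exit 0
    }'
}
```

For example, "version_ge 5.4.2 5.4.0" succeeds and "version_ge 5.3.14 5.4.0" fails.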

Execute in the server:

    sudo /opt/drbl/sbin/dcs
        1. [0] English
           Then, it should display a graphic interface with these questions:
        2. [YES] Do you want to set all clients mode?
        3. Now you can choose the mode of Clonezilla. For example, we are going to save
           an image from the hard disk of one of our clients. So we choose "clonezilla-start".
        4. [save-disk] (Save client entire disk)
        5. [No] Do you want to input the image and device name now? It's better not, so
           later you can choose which hard disk you want to use from the client.
        6. [-q] Use ntfsclone to save NTFS partition instead of partimage
        7. [-p] Choose in client as clone finishes.
        8. [-z3] Use lzo compression (similar to gzip, but faster)

Now Clonezilla has started to work. You don't need to do anything else on the server. Just turn on the client you have chosen to save the image from; it should boot from the network and display a screen like this:

screenshot

Choose "save image", enter a name for the image, and select the hard disk.

Then it will start to save the image to the server; you can see the progress on the screen:

screenshot

When it ends, a message similar to this one will appear on the server:

    Client 192.168.0.8 (00:0d:56:2c:17:33) finished cloning.
    Stats: Saved /home/partimag/2007-05-08-22-img, /dev/hdc1, success, .071 mins; /dev/hdc3,
    success, 2.054 mins;

Now you can run Clonezilla again and choose "clonezilla-stop" to end the application.

After this, we are going to send the image saved in /home/partimag to 7 clients by multicasting. We run Clonezilla again and now choose:

        1. "clonezilla-start"
        2. "restore-disk"
        3. OK (I left the default values)
        4. Skip this option.
        5. -p reboot (reboot client when clone finishes)
        6. Choose the image file to restore (we choose the previous we've created)
        7. Choose the disk(s) to restore (In my case, only hdc is displayed)
        8. multicast (faster)
        9. clients + time to wait. 
        10. How many clients to restore? 7 (I've noticed that this number has no
            effect at all. You can restore 8 clients if you want, as long as those
            computers' MAC addresses are in the DRBL settings)
        11. Time to wait: 300 (Default value)

Check the notices it displays, press "Enter", and then boot the clients.

Each client will load a screen where you can choose "restore image"; if you don't choose anything, the restore will start automatically after 6 seconds.

screenshot

After that it will show a progress screen (like the image below), and in a few minutes you'll have the disk image restored.

screenshot

In my case, I started the process simultaneously on 7 computers, restoring an image of 1.2 gigabytes. One minute and 30 seconds later, the image was restored on all clients. When the restore is complete, the server shows a message like this for each client:

    Client 192.168.0.9 (00:0d:56:2b:d3:5f) finished cloning. 
    Status: Multicast restored 2007-05-08-22-img, /dev/hdc1, success, .104 mins;
    /dev/hdc3, success, 1.361 mins;
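To get the total time per client from such a message, the per-partition figures can be added up. The sketch below assumes the message format shown in the example above (fields separated by commas and semicolons, durations written as "<number> mins"); other Clonezilla versions may print something different, and total_mins is a hypothetical helper name.

```shell
# total_mins: read a status message on standard input and print the sum
# of all "<number> mins" figures in it, with three decimals.
# Assumes the comma/semicolon-separated format of the example message.
# (total_mins is a hypothetical helper name.)
total_mins() {
    tr ',;' '\n\n' |
        awk '$NF == "mins" { sum += $(NF - 1) } END { printf "%.3f\n", sum }'
}
```

Fed the example message for client 192.168.0.9, it adds .104 and 1.361 and prints 1.465, which is consistent with the minute and a half observed.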

3.5. Troubleshooting

The client doesn't load the Clonezilla screen.

If there is another DHCP server on the network, there may be conflicts and the client may not get the right IP address. To make sure your Clonezilla DHCP server works, write the statement "authoritative;" at the top of the DHCP configuration file:

        sudo gedit /etc/dhcp3/dhcpd.conf
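The same change can be sketched as a filter that prints the configuration with "authoritative;" as its first line, unless the statement is already present somewhere in the file. The helper name ensure_authoritative is an assumption made for this example.

```shell
# ensure_authoritative FILE: print FILE with "authoritative;" as the
# first line, unless a line starting with that statement already exists.
# (ensure_authoritative is a hypothetical helper name.)
ensure_authoritative() {
    if grep -Eq '^[[:space:]]*authoritative[[:space:]]*;' "$1"; then
        cat "$1"
    else
        printf 'authoritative;\n'
        cat "$1"
    fi
}
```

Write the output to a new file, review it, copy it over /etc/dhcp3/dhcpd.conf, and restart the DHCP server for the change to take effect.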

4. Copyright

Copyright (c) 2007 David Corbacho.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled "GNU Free Documentation License".