
Clone It

Reproducing a running Linux system


rsync

rsync is a program that behaves in much the same way that rcp (remote copy) does, but has many more options and uses the rsync remote-update protocol to greatly speed up file transfers when the destination file already exists.

The rsync remote-update protocol allows rsync to transfer just the differences between two sets of files across the network link, using an efficient checksum-search algorithm described in the technical report that accompanies this package.

Some of the additional features of rsync are:

from: manpage of rsync

When you want to reproduce systems (automatic, unattended installations) or keep systems in a pre-defined state (e.g. kiosk systems and unattended public workstations), this is a godsend.

Synchronise: what, how and when?

rsync is an intelligent tool in that it only transfers the differences between the machines, so it does not really matter what you synchronise and how often : if there are no differences, nothing is done.

Depending on your plans, you may rsync selected directories, or just / (the entire filesystem). In the latter case, you should at least exclude /proc, which is not a real directory but a virtual filesystem through which the kernel exposes running processes and system state. Linux is apparently capable of having (close to) its entire system replaced while it is running, so you can rsync at boot (init scripts), at logon (logon scripts), at any given time (at) or at regular intervals (cron). However, some synchronisation operations may leave the filesystem marked 'unclean' and force an fsck at boot time. Smaller synchronisations (without devices, processes, ...) can easily be done online.

In the following examples, we rsync with a running model system, but you can also keep a collection of online files (data, config files, ...) for clones to replicate.

Simply clone an entire system?

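To clone an entire system, we will execute rsync with options to:

- recurse into directories (-r) and report what is being done (-v, --progress)
- preserve permissions (-p), owner (-o), group (-g) and modification times (-t)
- copy symlinks as symlinks (-l), preserve hard links (-H) and copy device files (-D)
- update every file, even when size and timestamp already match (-I)
- leave out /proc (--exclude=/proc)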

On a clean, freshly installed system, run

rsync -vrpoglHDIt --exclude=/proc --progress remotemodel:/ /

which will make every file on the new system identical to the one on the remote system. This could be tricky with regard to hostnames and IP configuration, and you have to be pretty sure the hardware of both machines is identical (because you're also cloning the kernel image, hardware-specific modules, and so on). What you are actually doing is replacing a running system by copying another running system. Ain't that cool.
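
Because the clone overwrites the running system, it can be worth previewing the transfer first. A minimal sketch, assuming rsync's --dry-run option: the function name preview_clone and the variables MODEL and RSYNC are made up for this example, and -I is left out so that only files that differ are listed.

```shell
#!/bin/sh
# Sketch only: MODEL, RSYNC and preview_clone are illustrative names.
MODEL=${MODEL:-remotemodel}   # host name of the model system
RSYNC=${RSYNC:-rsync}         # lets you substitute another rsync binary

# List what a clone from $MODEL would change, without writing anything.
preview_clone() {
    # --dry-run reports the transfer instead of performing it.
    # -I is omitted on purpose: with it, every file would be listed.
    "$RSYNC" -vrpoglHDt --dry-run --exclude=/proc "$MODEL":/ /
}
```

Running preview_clone | less on the fresh system shows the list of files before anything is replaced.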

So imagine you create one model configuration. Then, install Linux on as many PCs as you require. The configuration does not matter much, so this can easily be automated, although it helps if the configuration is already close to the model: rsync only transfers the differences between the machines. Then, rsync the PCs with the model, using the rsync options shown earlier. The PCs will become exact copies of the model. You may want to change some things (everything that needs to be unique, such as the hostname and IP configuration), then reboot. Finished.
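
What needs to be unique depends on the setup; on a Debian clone the host name is usually the first candidate. A rough sketch, assuming the name lives in /etc/hostname and /etc/hosts: the function rename_clone and its arguments are hypothetical, and the ROOT argument only exists so the function can be tried on a scratch directory first.

```shell
#!/bin/sh
# Sketch only: rename_clone is a hypothetical helper, not part of rsync.
# usage: rename_clone ROOT OLDNAME NEWNAME   (ROOT is "" for the live system)
rename_clone() {
    root=$1 old=$2 new=$3

    # Debian reads the host name from /etc/hostname at boot.
    printf '%s\n' "$new" > "$root/etc/hostname"

    # Replace the model's name in /etc/hosts, matching whole fields only.
    awk -v old="$old" -v new="$new" '
        { for (i = 1; i <= NF; i++) if ($i == old) $i = new; print }
    ' "$root/etc/hosts" > "$root/etc/hosts.new" &&
        mv "$root/etc/hosts.new" "$root/etc/hosts"
}
```

On a freshly cloned machine this could be called as rename_clone "" model pc07 (both names are placeholders) just before the reboot.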

Synchronising running systems

When you add an rsync statement to a cron job or startup script, the 'clones' will synchronise at regular intervals and/or every time they reboot (or when a user logs on). By choosing the directories that need to be synchronised, you can avoid trouble with duplication of values that need to remain unique.

As rsync only transfers differences, this is a nice way to transfer modified configuration files : just edit the config on the model machine, and wait for the others to synchronise. Also works for software distribution : install packages on the model, and configure them. When synchronising, all added files and modifications to existing files will be replicated to the clones. And it can be done while they're online ...

As only changes will be processed, you can use the same rsync statement (without -v and --progress, and with -q for quiet): you've started with a synchronised system, and only subsequent changes will be transferred. Or you can fine-tune it a bit, e.g. like so:

rsync 	-rpoglHtuq  \
				--exclude=/proc --exclude=/tmp --exclude=/sys  --exclude=/home --exclude=/root \
				remotemodel:/ /
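
When this runs from cron, a slow synchronisation can still be busy when the next one starts. A small wrapper, as a sketch: the lock directory path, the name sync_from_model and the crontab line are assumptions for illustration, not something this setup prescribes.

```shell
#!/bin/sh
# Sketch only: names and paths below are illustrative.
MODEL=${MODEL:-remotemodel}
RSYNC=${RSYNC:-rsync}
LOCKDIR=${LOCKDIR:-/var/lock/clone-sync.lock}

# Run the quiet update, but skip it if a previous run is still busy.
sync_from_model() {
    # mkdir is atomic: it fails if the lock directory already exists.
    mkdir "$LOCKDIR" 2>/dev/null || return 0
    "$RSYNC" -rpoglHtuq \
        --exclude=/proc --exclude=/tmp --exclude=/sys \
        --exclude=/home --exclude=/root \
        "$MODEL":/ /
    status=$?
    rmdir "$LOCKDIR"
    return "$status"
}

# Example crontab entry (hypothetical script path), every night at 03:00:
#   0 3 * * * /usr/local/sbin/clone-sync
```

Saved as a script that ends with a call to sync_from_model, this can be started from cron or from an init script alike.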
	

Deepfreeze systems

In a similar way, you can keep one computer as 'the ideal workstation' and let rsync undo any changes made on the clones. When synchronising, the clones will become copies of the model again, no matter what your users have done to them. Ideal for unattended public workstations and kiosk systems. You can rsync the entire system (as in the previous approach, allowing you to distribute modifications to the config easily), and/or use rsync to reset any given directory (eg /home/kioskuser) to a predefined state.

To accomplish this, we use roughly the same rsync 'update' statement as before, but add --delete to remove files from the destination (the kiosk system) that do not exist on the model system. We use compression to speed things up so we can rsync each time a user logs on.

rsync 	-rpoglHtqz					\
				--exclude=/proc --exclude=/tmp --exclude=/sys  	\
				--delete --force				\
				remotemodel:/ /
shutdown -r +1
	

Note that we let the system reboot afterwards. Obviously this only makes sense for a well-scheduled synchronisation, not when replicating at boot time or during a log-in :-)

Assuming this is a locked-down system where the user only has access to $HOME, we can also synchronise only /home at logon, and reserve the lengthier complete synchronisation for execution at given times (e.g. nightly) or at boot, to implement configuration changes. A limited synchronisation could look like this:

# no --exclude here: patterns are anchored at the transfer root, which is /home/ in this case
rsync 	-rpoglHtqz					\
				--delete --force				\
				remotemodel:/home/ /home
	

In reverse

If we switch the destination and source paths, we can do a 'push' synchronisation, i.e. make changes to the master and push them to the workstations. Comparisons (size, timestamp, checksums) are made relative to 'sending side' and 'receiving side' so options for update and synchronisation don't need to change whether you pull or push the replication.

	# push replication of user home directories on multiple kiosk systems from a 'master kiosk'
	
	NAME=kiosk
	NUM_KIOSKS=10

	for i in $(seq 1 $NUM_KIOSKS); do
		# trailing slash on the source: copy the contents of /home, not /home itself
		# no -q here, as it would silence -v and --progress
		rsync 	-vrpoglHtz					\
			--delete --force				\
			--progress					\
			/home/  "$NAME$i":/home/
	done
	

Note that this requires your workstation names to be standardised, and that they will be rsync'ed one by one as the loop is iterated, whereas a pull replication initiated by the workstations would allow multiple synchronisations at once (provided the network and the master machine can deal with the load). To run multiple replications simultaneously, each rsync job could be put in the background, provided it needs no input (password to access the remote system, ...).
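
A minimal sketch of the background variant, assuming password-less access to every kiosk is already set up. The function name push_home is hypothetical; NAME and NUM_KIOSKS are the variables from the loop above.

```shell
#!/bin/sh
# Sketch only: assumes no rsync job has to prompt for a password.
NAME=${NAME:-kiosk}
NUM_KIOSKS=${NUM_KIOSKS:-10}
RSYNC=${RSYNC:-rsync}

# Push /home to every kiosk at once; return non-zero if any push failed.
push_home() {
    pids=""
    for i in $(seq 1 "$NUM_KIOSKS"); do
        # '&' puts each transfer in the background.
        "$RSYNC" -rpoglHtqz --delete --force /home/ "$NAME$i":/home/ &
        pids="$pids $!"
    done

    failed=0
    for pid in $pids; do
        # 'wait PID' returns the exit status of that job.
        wait "$pid" || failed=1
    done
    return "$failed"
}
```

With many kiosks, all transfers start at the same moment, so the load on the master and the network grows with NUM_KIOSKS.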

Fast Replace

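In the previous examples, files are processed one by one, potentially creating an unstable situation until the source system has completely replaced the destination system. This can be avoided by temporarily keeping changes in a designated directory, and implementing all changes 'at once'. The rsync options to look at are --compare-dest=DIR, which transfers only the files that differ from those in DIR, and --delay-updates, which holds updated files back and moves them into place at the end of the transfer. (Does --compare-dest require an additional mv? Probably; need to test that.)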
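
One way to read this idea, as a sketch rather than a tested recipe: let rsync collect only the changed files in a staging directory with --compare-dest (the rsync option for comparing against an additional directory), then apply them locally in a second, much shorter pass. The staging path /var/tmp/clone-stage and the function name staged_update are assumptions, and files deleted on the model are not handled this way.

```shell
#!/bin/sh
# Sketch only: STAGE and staged_update are illustrative names.
MODEL=${MODEL:-remotemodel}
RSYNC=${RSYNC:-rsync}
STAGE=${STAGE:-/var/tmp/clone-stage}

staged_update() {
    # Step 1: fetch into $STAGE only the files that differ from those in /.
    # --compare-dest=/ makes rsync skip files already identical under /.
    # /var/tmp is excluded because the staging directory itself lives there.
    "$RSYNC" -rpoglHtq --compare-dest=/ \
        --exclude=/proc --exclude=/tmp --exclude=/sys --exclude=/var/tmp \
        "$MODEL":/ "$STAGE"/ || return 1

    # Step 2: copy the staged files into place; no network involved.
    "$RSYNC" -rpoglHtq "$STAGE"/ /
}
```

The window in which the system is half-updated shrinks to the duration of the local copy in step 2.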

TODO

add host-based or passwordless (key-based) authentication and/or use password files/env vars so users don't have to accept host keys and type passwords when rsync starts


Koen Noens
July 2006