
Go forth and multiply

Documenting a Debian Linux system in order to reproduce it


Use cases

System Documentation

Every system administrator knows it's a good idea to keep documentation about the servers they maintain: you need it to prepare migrations, in recovery scenarios, to plan software roll-outs or configuration changes, or to look up configuration details when troubleshooting. Most of this information is present in the system itself, but that doesn't help if the server you need to know about has just crashed and you're trying to rebuild another one to replace it. Better to collect that information in advance.

Most system administrators find this boring. So they put it off, and then forget to do it. Or they start with some documentation, then forget to keep it up to date, so the documentation of the system doesn't reflect the actual configuration of the system (and when the time comes that you need to reproduce that tweaked daemon config, you don't remember that final detail that makes it all work).

So, let's see if we can document a server automatically, and keep the documentation up to date as well.

Automatic Installations

When you're experimenting with Linux, or when you're maintaining a network with lots of Linux hosts, sooner or later you'll be thinking that there should be a way to automate the setup process, because you're going through the same installation process over and over again, repeating identical setups on multiple machines. Or you've gone to a lot of trouble to create an ideal server, a perfect baseline, or a highly customized workstation in a test environment, and you need to repeat that exact setup (just once, or several times) in a production environment.

These are exactly the kind of repetitive actions that computers are supposed to be so good at. You need an automated installation procedure, but this requires predefined input on what the system is supposed to look like. Here are some commands that, combined in a script, can be used to document your master setup and create input files for your unattended installation.


debian installer preseed file

Install debconf-utils, then run debconf-get-selections. This will collect all possible questions the debian-installer may ask, and provide the answers as relevant to the current system. The output can be saved to a file, which can then be used directly to set up another, identical system, or as a starting point for a customized preseed file.

		TARGET="/root/inventory"
		mkdir -p "$TARGET"	# make sure the inventory directory exists

		debconf-get-selections --installer > "$TARGET/list_packages_installer"
		debconf-get-selections > "$TARGET/list_packages"
	

list_packages_installer will contain the options relevant to the debian-installer: the base system configuration. list_packages will be more elaborate and also contains the choices you've made during package installation. This can be used to re-install packages with the same options as on your model. It is not recommended to use these files directly as preseed files, but you can use the data in them to populate a preseed file with sensible questions and answers.

a list of installed software packages

To reproduce a baseline or model installation, what you need is to install a pre-defined list of packages. From an existing model system, you can collect a list of installed packages with the following commands:

 dpkg -l 

or extract just the package names like this (and later let sources.list and pinning decide which versions get installed):

	# keep only installed packages ('ii' lines); this also skips the dpkg -l header
	dpkg -l | awk '/^ii/ {print $2}' > installed_packages.list

	## or
	dpkg --get-selections | awk '$2 == "install" {print $1}' > installed_packages.list
	

Because these lists are plain text files, you can easily modify them should you want to change your baseline configuration or create variations of it. You might want to review the list: do you really want to re-install kernel images, ... ? One approach could be to use the list as a starting point, and delete any unwanted packages, or those which you know are only there to satisfy dependencies. You could even narrow it down to a list of just the packages you want: apt-get will still resolve dependencies if these are not yet covered by packages on the list. So, things you might want to kick out of the installed_packages.list are :

So what we'll do is keep installed_packages.list as documentation of installed packages, and create a new file (install_packages) from it with some entries left out. Alternatively, you can just write a list of the packages that you know you want to have, knowing that apt will automatically add any other required package to resolve dependencies. If, on your model configuration, you've installed packages during the initial setup (e.g. with task selection), this will also be documented in the output of debconf-get-selections.
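
As a rough sketch of that filtering step (not taken from the sysdoc script): the function below copies the full list to a reduced one while leaving out kernel images. The function name make_install_list and the exclusion pattern are assumptions made for illustration; adjust the pattern to whatever you want to keep off the new system.

```shell
# Sketch: derive install_packages from installed_packages.list.
# make_install_list and EXCLUDE are illustrative names, not part of sysdoc.
EXCLUDE='^(linux-image|kernel-image)'	# assumption: package names to leave out

make_install_list() {
	# $1 = full package list, $2 = reduced list to write
	grep -v -E "$EXCLUDE" "$1" | sort -u > "$2"
}

# Usage (on the model system):
#   make_install_list installed_packages.list install_packages
# and later, on the new system (as root):
#   xargs apt-get install -y < install_packages
```

Keeping the exclusion pattern in one variable makes it easy to maintain several variations of the same baseline.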

configuration files

Debian keeps all configuration files neatly under /etc, except the boot menu (/boot/grub/menu.lst). The GRUB menu.lst will be recreated by the grub-installer, so we don't worry about it too much (unless you have a specially crafted boot menu you want to keep).

You can just copy the entire /etc and be done with it, but there are some pitfalls. /etc contains /etc/fstab, the file system table, and you don't want to copy that to a new machine unless the target machines have the same disk and filesystem configuration. If you're going to roll out multiple identical machines, that should be the case; if you're planning to migrate a system to different hardware, you'd better leave out /etc/fstab: you'll create a new one during the Debian base install anyway. You should also pay attention to the passwd and shadow files: do you want them copied? And you may want to look at the ownership and permissions of the files you copy into the new system. So blindly copying /etc may not be such a hot idea.

So, after copying /etc, you might want to remove a number of files so they won't be copied back to the new system(s). Alternatively, you can take the opposite, and probably better, approach of copying to $TARGET/etc_copy only those configuration files that have been modified. Only those will be copied over the default config files on the new machine(s). In this case, it helps if $TARGET/etc_copy has the same directory tree as /etc, so the files will fall nicely into place when you copy them back to another system's /etc.

	TARGET="/root/inventory"

	# scenario 1 - copy /etc and remove certain files (listed in 'remove_files',
	# one path per line, relative to /etc)

		cp -Rp /etc "$TARGET/etc_copy"

		while read -r i; do
			[ -n "$i" ] || continue		# skip empty lines: never remove the whole copy
			rm -r -- "$TARGET/etc_copy/$i"
		done < remove_files

	# scenario 2 - only copy selected files (listed in 'selected_files',
	# one full path per line, e.g. /etc/ntp.conf)
		echo "building a tree in $TARGET/etc_copy"
		find /etc -type d | while read -r d; do
			mkdir -p "$TARGET/etc_copy${d#/etc}"
		done

		echo "copying selected files"
		while read -r i; do
			[ -n "$i" ] || continue
			cp -p "$i" "$TARGET/etc_copy${i#/etc}"	# keep the path below /etc
		done < "$TARGET/selected_files"
	

Obviously, remove_files and selected_files are lists of files to be removed from or copied to $TARGET/etc_copy, as per scenario 1 or 2 respectively.

Which files have been modified since ...

To create a selected_files list for the above config file inventory (scenario 2), it would be useful to have a list of (config) files that have been changed recently (e.g. since system setup). Although we could use tar or some other backup procedure, we will just copy them, so that it is easy to modify them (e.g. delete any unwanted file, make small modifications to sources.list or preferences, ...).

To find files modified since a given date, you need a point of reference. You could take the time stamp of a file created during the Debian install (e.g. /var/log/debian-installer/syslog or /var/log/base-config.timings.*), but then all files created during software setup would be newer as well. Instead, it makes sense to create a file as the last step of your initial setup, i.e. after you've installed the operating system and all the software, but before you make any changes to the configuration. With that file's time stamp as reference, every later change to a config file is 'newer', which gets the customized config file into the copy procedure. With an automatic, scripted setup, this can easily be done by adding something like

	date > /etc/timestamp

as the final statement of your installation procedure. Any configuration file edited or created later than the creation time of /etc/timestamp will be considered 'customized after initial setup'. Both the file's time stamp and its content can be used to read the relevant date and time.

So what we have to do now is recurse /etc, and for every file, if it's newer than the reference point, add its full path and filename to a list. That list will be 'selected_files', the list of files we will want to copy.


	TARGET="/root/inventory"
	REFFILE=/var/log/debian-installer/syslog	## or: /etc/timestamp

	rm -f "$TARGET/selected_files"
	echo "looking for files modified since $(stat --format=%y "$REFFILE")"
	# find -newer compares each file's modification time with that of $REFFILE
	find /etc -type f -newer "$REFFILE" | while read -r F; do
		echo "found $F"
		echo "$F" >> "$TARGET/selected_files"
	done
	

Collect configuration stripped of unnecessary comments

Instead of a copy of (a selection of) /etc, you can also read the config files and collect the configuration data, leaving out (sometimes large) comment blocks. The result is a collection of short, easy to read files that describe the configuration for each daemon, service or package on the system, if that configuration is different from the package defaults (i.e. modified after installation). These files can be used to look up information, but can also be copied to a new host to replace the default configuration file there.

This approach is probably more interesting than just copying entire config files or large parts of /etc.
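
A minimal sketch of the stripping step, assuming comments start with '#' or ';' at the beginning of a line (true for many, but not all, files in /etc). The function name strip_comments is made up for this example; comments that follow a setting on the same line are left alone, because removing those safely depends on the file format.

```shell
# Sketch: print a config file without comment lines and blank lines.
# strip_comments is an illustrative name, not part of sysdoc.
# Assumption: a comment line starts with '#' or ';', optionally indented.
strip_comments() {
	grep -v -E '^[[:space:]]*([#;]|$)' "$1"
}

# Usage: strip_comments /etc/samba/smb.conf > smb.conf.stripped
```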

Collecting hardware info

As the operating system is usually well aware of the hardware it runs on, it can be interrogated to provide this information. This can be useful when you need to fix hardware and driver related trouble. Hardware info is collected by looking in the /proc filesystem and from the output of certain commands.
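
As an illustration only (the sysdoc script may go about it differently), the sketch below gathers CPU and memory details from the kernel's process filesystem and adds the output of lspci and lsusb when those tools are installed. The function name collect_hardware_info and the choice of sources are assumptions.

```shell
# Sketch: collect basic hardware facts into one text file.
# collect_hardware_info is an illustrative name; lspci and lsusb are only
# used when installed (packages pciutils and usbutils).
collect_hardware_info() {
	# $1 = output file
	{
		echo "### hardware info for $(uname -n), collected $(date)"
		for f in /proc/cpuinfo /proc/meminfo; do
			[ -r "$f" ] && { echo "### $f"; cat "$f"; }
		done
		for cmd in lspci lsusb; do
			if command -v "$cmd" >/dev/null 2>&1; then
				echo "### $cmd"
				"$cmd" 2>&1 || true
			fi
		done
	} > "$1"
}

# Usage: collect_hardware_info /root/inventory/hardware.txt
```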

Disk subsystem

During normal operations, the disk subsystem is transparent to the operating system: you approach storage via the filesystem. But in some cases you'll need to know about the partition layout, logical volume configuration, software RAID setup, etc., and you'd better know this in advance: collecting this information after a partition got lost or a disk has failed might be quite a challenge.

Disk information can be collected from /etc/fstab and the output of fdisk, lvm commands, and mdadm. See script.
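
One possible way to gather this, sketched under the assumption that missing tools should be noted rather than stop the run (a system without LVM or software RAID simply lacks those commands). The helper names section and collect_disk_info are invented for this example and are not taken from sysdoc.

```shell
# Sketch: collect disk layout information.
# section and collect_disk_info are illustrative names.
section() {
	# print a header line, then the output of the command if it is installed
	echo "### $*"
	if command -v "$1" >/dev/null 2>&1; then
		"$@" 2>&1 || true
	else
		echo "($1 not found)"
	fi
}

collect_disk_info() {
	# $1 = output file; run as root so fdisk, lvm and mdadm can read the devices
	{
		section cat /etc/fstab
		section fdisk -l
		section pvdisplay
		section vgdisplay
		section lvdisplay
		section mdadm --detail --scan
	} > "$1"
}

# Usage (as root): collect_disk_info /root/inventory/disks.txt
```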

Users and Groups

When managing a system, you probably create users and add them to groups. You might want to track which user accounts exist, and which groups these users are members of. You can do this by reading /etc/passwd, looking up users with a uid of at least 500 or 1000 (depending on where your distribution starts numbering regular accounts; Debian starts at 1000), and getting the group membership for each account. See script.
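
A small sketch of that lookup, with assumed function names (list_regular_users, list_users_groups) and an assumed default threshold of 1000. The account nobody (uid 65534 on Debian) is excluded explicitly, because its uid is above the threshold even though it is a system account.

```shell
# Sketch: list regular user accounts and their groups.
# Function names are illustrative, not part of sysdoc.

# Print login names with a uid at or above $2 (default 1000), except 'nobody'.
list_regular_users() {
	# $1 = passwd file (default /etc/passwd), $2 = lowest regular uid
	awk -F: -v min="${2:-1000}" '$3 >= min && $3 != 65534 { print $1 }' "${1:-/etc/passwd}"
}

# For each regular user, print "user : group1 group2 ..."
list_users_groups() {
	list_regular_users "$@" | while read -r u; do
		printf '%s : %s\n' "$u" "$(id -nG "$u" 2>/dev/null)"
	done
}

# Usage: list_users_groups > /root/inventory/users_groups.txt
```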

Changes to filesystem permissions

Usually, you'll manage access to files through group membership. However, you might want to keep track of changes to filesystem permissions as well. This isn't easy. What you can do is find recently changed files (compare their timestamps to a reference file, eg /etc/timestamp), and list their owner, group, and permissions. This tells you that these files have changed (eg change in permissions, even without modifying the content of the file), and what the permissions are now, which is useful if you need to reproduce it, but it doesn't tell you what the situation was before the change, so it doesn't let you roll back easily. Unless, of course, you document all changes since the default setup.

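	# search the whole system, but skip the virtual filesystems /proc and /sys
	find / \( -path /proc -o -path /sys \) -prune -o -cnewer /etc/timestamp -print |
	grep -v '^/home' | while read -r FILENAME; do
		stat -c '%z %a %U:%G %n' "$FILENAME"; echo "";
	done >> "$OUTPUT"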
	

The resulting OUTPUT file will list all files changed since /etc/timestamp's last change time stamp, and reports date and time of the most recent change, the current permissions, and the current owner:group. You can use 'grep' to find relevant info as the file grows longer (eg grep for all changes to a specific file).

	2008-05-17 13:06:38.799694847 +0200 777 root:root /etc/test/somefile
	

Note that we exclude files in /home, because any file created by or modified by a user would show up. We're not interested in user files (they should be included in a backup), we're dealing with system documentation.


sysdoc - automatic system documentation script

All of the above can be done in a script that you just run (as root) - within seconds, you'll have collected valuable information about your system in an organized manner.

sysdoc - a script to create system documentation automatically.

The result in $TARGET should be :

... all of which can be used as input to commands that will help reproduce the system. You will want to copy these files to a server or removable media so they can be used in an automatic reproduction of a customized system ...

example : list of system documentation

	nix:/srv/doc/sysdoc# ls -1
	cp_dnsmasq.conf
	cp_exports
	cp_ntp.conf
	cp_smb.conf
	...
	lst_packages
	lst_packages-full
	lst_systeminfo
	lst_systeminfo-disks
	lst_systeminfo-hardware
	lst_users_groups
	lst_users_groups_systemaccounts
	

example of dns / dhcp server documentation (dnsmasq)
example of system information : disks

There's a helper script, sysdoc.index, that creates an HTML index page, so the information is more or less structured and easily accessible (through hyperlinks to the sysdoc output files).


System documentation : just once or constantly updated ?

If this is a 'one off' to document a (model) server, you just run the sysdoc script, and you're good to go. If you are applying these scripts to keep up-to-date system documentation, you may want to run one or more cron jobs to regularly update the lists, possibly with some history, i.e. don't overwrite older copies, but add some versioning (e.g. dates).

In some cases, it's useful to run the entire script at regular intervals and overwrite previous versions of the documentation: this keeps your documentation up to date with recent changes. You do run the risk, however, of overwriting useful information: if an LVM configuration has gone bad since the last run, your documentation will be overwritten with newer, incomplete or bad information, while in fact you need the old information to help you recover.

So you'll probably need to think of a versioning system (it could be as simple as appending $(date +%Y%m%d-%H%M%S) to a directory name), or split the script into two parts: one that is run just once (e.g. hardware info, initial config), and one that's run at intervals to keep track of recent changes.
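
The simple form of versioning could look like the sketch below: every run writes into its own dated directory, so an earlier, good copy is never overwritten. The function name new_snapshot_dir is an assumption made for this example.

```shell
# Sketch: give every run its own dated directory instead of overwriting.
# new_snapshot_dir is an illustrative name, not part of sysdoc.
new_snapshot_dir() {
	# $1 = base directory; prints the directory created for this run
	snap="$1/$(date +%Y%m%d-%H%M%S)"
	mkdir -p "$snap" && echo "$snap"
}

# Usage: TARGET=$(new_snapshot_dir /root/inventory)
```

Because the date comes first in the name, a plain directory listing sorts the runs chronologically.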

You can also update the timestamp of your reference file every time you run the sysdoc script, so that at the next run, only changes since the previous run get documented. This only affects the statements that use a "find files newer than" construction.
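
Sketched in shell, under the assumption that the reference file is /etc/timestamp and with the made-up function name changed_since: list what changed since the reference, then move the reference forward with touch.

```shell
# Sketch: list files changed since the reference file, then move the
# reference point forward. changed_since is an illustrative name.
changed_since() {
	# $1 = reference file, $2 = directory to search
	find "$2" -type f -newer "$1"
}

# Usage (as root):
#   changed_since /etc/timestamp /etc > "$TARGET/selected_files"
#   touch /etc/timestamp		# the next run only reports later changes
```

Note that a file modified between the find and the touch would be missed by the next run, so on a busy system you may prefer to keep a fixed reference point.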

Where should I keep those files ?

Obviously, the files need to be accessible from the new system you're setting up. During the inventory, we will store them in a user home directory (/root), to be copied to a network server later. During the automated installation procedure, the new systems will need access to these files. That can be accomplished by making them available via HTTP (web server), SSH or NFS, or by including them in Debian packages or a home-made Debian repository. You can of course also use removable media.

For disaster recovery, it's best that your documentation does not reside on the server you need to recover.

Recovery or Reproduction

TO DO : create a script that uses the automatically created system documentation as input for a preseeded installation and/or a post-installation script.

Or: pack the documentation in a debian package so you can apply it to other systems as part of their initial setup


Extras

Build a Debian package that contains this script (sysdoc), so it can be installed easily using standard tools.

When maintaining or running multiple debian systems, apt-proxy, a local Debian mirror, may also be a good idea.

The above has been tested on Debian and Ubuntu, but will probably also work on other distributions as far as they use debian-installer, apt, the debian package system, etc. But you may have to modify a path or a filename here and there.

Something similar for a Microsoft Windows environment


Koen Noens
June 2006 - Lots of additions in May 2008