This article is part of the BackupYourSystem series. More introductory information can be found there.
Introduction
From the man page:
- Rsync is a fast and extraordinarily versatile file copying tool. It can copy locally, to/from another host over any remote shell, or to/from a remote rsync daemon. It offers a large number of options that control every aspect of its behavior and permit very flexible specification of the set of files to be copied. It is famous for its delta-transfer algorithm, which reduces the amount of data sent over the network by sending only the differences between the source files and the existing files in the destination. Rsync is widely used for backups and mirroring and as an improved copy command for everyday use.
In other words, rsync is a tool for efficiently copying and backing up data from one location (the source) to another (the destination). It is efficient because it transfers only the files, and the parts of files, that differ between the source and destination directories.
Rsync
Rsync is a command line utility. Users attempting to use it should be familiar with the command line (see Using the Terminal). If you prefer a graphical interface, see the Grsync section of this page.
Installation
Rsync is installed in Ubuntu by default. Be sure to check whether the following packages are installed before starting (see Installing a Package): rsync, xinetd, ssh.
Perform a Simple Backup
The simplest method for backing up over a network is to use rsync via SSH (using the -e ssh option). Alternatively, you can use the rsync daemon (see Rsync Daemon), which requires much more configuration. A local backup only requires rsync and read/write access to the folders being synchronized. Below you will find examples of commands that can be used to back up in either case. Note that a network sync can also be performed as a local one, so long as the remote folder is shared (say, by Samba) and then mounted on the machine that holds folder1. This approach avoids SSH but is less secure and should only be used on trusted private networks, such as your home network.
Local Backup
sudo rsync -azvv /home/path/folder1/ /home/path/folder2
Backup Over Network
sudo rsync --dry-run --delete -azvv -e ssh /home/path/folder1/ remoteuser@remotehost.remotedomain:/home/path/folder2
An explanation of the options used in the commands above:
--dry-run This tells rsync to not actually do anything. It will just write a log of what it would do to the screen. Once you've made sure everything will work as you expect, you have to remove this option, and run the command again to perform the actual backup.
--delete deletes files on the destination that don't exist on the system being backed up. (Optional)
-a preserves the dates, times, and permissions of the files (same as -rlptgoD).
- With this option rsync will:
Descend recursively into all directories (-r),
copy symlinks as symlinks (-l),
preserve file permissions (-p),
preserve modification times (-t),
preserve groups (-g),
preserve file ownership (-o), and
preserve devices as devices (-D).
-z compresses the data
-vv increases the verbosity of the reporting process
-e specifies remote shell to use
folder1 and folder2 In the examples above, folder1 and folder2 are placeholders for the directories to be synchronized. folder1 is the original folder, and folder2 is the new folder, or an existing one to be brought in sync with the first. Replace them with the folders you'd like. A / was added after folder1 so that only its contents, rather than the whole folder, would be copied into the second.
A complete synopsis of all the options with the rsync command can be found in the man pages under "Options Summary". The man page for rsync can also be found on linux.die.net
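The effect of the trailing slash can be checked safely on throwaway directories before touching real data. The following is only a sketch: the scratch directory and the file name example.txt are made up for the demonstration, and nothing outside the scratch directory is read or written.

```shell
#!/bin/sh
# Illustration only: compare "folder1/" with "folder1" on scratch directories.
# The scratch paths and example.txt are hypothetical names for this demo.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT   # remove the scratch directory when the script ends
mkdir -p "$work/folder1" "$work/with_slash" "$work/without_slash"
echo "hello" > "$work/folder1/example.txt"

if command -v rsync >/dev/null 2>&1; then
    # Trailing slash: the contents of folder1 land directly in the destination.
    rsync -a "$work/folder1/" "$work/with_slash"
    # No trailing slash: folder1 itself is created inside the destination.
    rsync -a "$work/folder1" "$work/without_slash"
    # List what each destination now contains.
    (cd "$work" && find with_slash without_slash | sort)
else
    echo "rsync is not installed; nothing was copied" >&2
fi
```

With the trailing slash, example.txt should appear directly under with_slash; without it, the file should sit one level deeper, under without_slash/folder1.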
Grsync
Grsync is a GUI frontend for the rsync utility. The simple interface of the GUI exposes many of the basic options available with rsync. It is useful for those who prefer not to use the command line.
Installation
Grsync is not installed by default on Ubuntu, but it is easily available from the repositories. To get grsync, ensure that the Universe section of the Ubuntu repositories is enabled in your Software Sources. Then install the following package: grsync.
Configuration
To start up grsync go through the following menus: Applications --> System Tools --> grsync. Upon start up you'll be presented with the main window, where all the configuration takes place.
This window holds all of the options most users will ever need. Each option is listed below along with its effect.
Sessions - This function is the same as profiles in other programs. Each session stores a different set of source and destination directories, as well as the configuration options associated with that folder pair. This allows different sets of folders to be synchronized with different options.
Managing sessions is simple: push the Add button to add a new one. To delete one, select the session you no longer want from the drop down and push Delete.
Source and Destination - These two boxes list the two folders (technically referred to as directories) that will be synchronized. The top one is the Source and the bottom the Destination. So when you Execute the synchronization, the files from Source will be copied over to the Destination according to the options a user selects.
To specify the directories either Browse for them from the GUI or type them in according to the standard path conventions.
Switch - The universal reload sign located to the right of the Browse buttons is a handy button. It will instantly switch the Source with the Destination.
Import and Export - After having configured sessions, a user may want to back them up for storage. To do so, simply go to the Sessions Menu at the top and select either Import or Export. The former will restore a session from a backup previously made, the latter will make a backup of the current session.
Note: This backup function works on a per session basis. This means, each session you want to back up must be selected from the drop down and then backed up. If you have 3 different sessions, select each in turn and Export them. Same when importing sessions.
Basic Options - Most users will find most of the options they will ever need here. The first four will preserve the properties of the files transferred. The others will modify how the files are copied. For more information on what each does specifically, hover your stationary cursor over the option and it will display a small explanation. The options checked are of course the ones that will be applied during the session.
Advanced Options - This tab holds more options; many are useful and self-explanatory. For those that are not, a tooltip will be displayed when the mouse remains over an option long enough.
Additional Options - This entry box allows the input of additional options not presented in the GUI but known to the user. It is suggested only for experienced users, because malformed options may have unexpected consequences.
Simulation and Execution
The last two buttons on the window are Simulation and Execute. The button for simulation is very useful when uncertain what will happen based on the options selected. The normal transfer dialog screen will pop up and in the main pane, a list of files that would have been copied over is listed. The user can then verify if this is as desired or make changes. Once the session is initiated with the Execute button, the dialog will appear again but this time it will actually process the folders accordingly. Ensure before pushing Execute that you are happy with the simulation.
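For readers working in the terminal, the closest counterpart to Simulation followed by Execute is a --dry-run pass followed by the real run. The sketch below assumes that workflow; the function name preview_then_sync, the prompt text, and the RSYNC_BIN override are inventions for this illustration and are not part of rsync or grsync.

```shell
#!/bin/sh
# Sketch: preview a sync with --dry-run, then ask before doing it for real.
# preview_then_sync and RSYNC_BIN are hypothetical names for this example.
RSYNC_BIN=${RSYNC_BIN:-rsync}

preview_then_sync() {
    src=$1
    dest=$2
    # Pass 1: list what would change without touching the destination.
    "$RSYNC_BIN" -av --dry-run "$src" "$dest" || return 1
    printf 'Proceed with the real transfer? [y/N] '
    read -r answer
    case $answer in
        y|Y) "$RSYNC_BIN" -av "$src" "$dest" ;;
        *)   echo "Nothing was copied." ;;
    esac
}

# Example call (commented out; replace the placeholder paths first):
# preview_then_sync /home/path/folder1/ /home/path/folder2
```

Any answer other than y leaves the destination untouched, which mirrors closing the simulation dialog without pushing Execute.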
Remote Backup
Backup over a network is possible. Preferably, the user should mount the network share that will receive the backup before launching the program. The share will then be listed in the Browse GUI and can easily be added. There is no separate section for network backups; if more advanced features are required, the user is encouraged to look at alternatives, of which there are many.
Alternatives
There are many alternatives, in various stages of development. For an incomplete list, see here.
Rsync Daemon
The rsync daemon is an alternative to SSH for remote backups. Although more difficult to configure, it does provide some benefits. For example, using SSH to make a remote backup of an entire system requires that the SSH daemon allow root login, which is considered a security risk. Using the rsync daemon allows for root login via SSH to be disabled.
Configuration of the rsync Daemon
1. Edit the file /etc/default/rsync to start rsync as a daemon using xinetd. The entry listed below should be changed from false to inetd.
RSYNC_ENABLE=inetd
2. Install xinetd because it's not installed by default.
$ sudo apt-get -y install xinetd
3. Create the file /etc/xinetd.d/rsync to launch rsync via xinetd. It should contain the following lines of text.
service rsync
{
    disable = no
    socket_type = stream
    wait = no
    user = root
    server = /usr/bin/rsync
    server_args = --daemon
    log_on_failure += USERID
    flags = IPv6
}
4. Create the configuration file /etc/rsyncd.conf for rsync in daemon mode. The file should contain the following. In the file, replace user with the name of the user who will log in to the remote machine.
max connections = 2
log file = /var/log/rsync.log
timeout = 300

[share]
comment = Public Share
path = /home/share
read only = no
list = yes
uid = nobody
gid = nogroup
auth users = user
secrets file = /etc/rsyncd.secrets
5. Create /etc/rsyncd.secrets to hold the user's password. The user should be the same as above, and password is the one used to log in to the remote machine as that user. The file contains one user:password entry per line.
$ sudo vim /etc/rsyncd.secrets
user:password
6. This step sets the file permissions for rsyncd.secrets.
$ sudo chmod 600 /etc/rsyncd.secrets
7. Start/Restart xinetd
$ sudo /etc/init.d/xinetd restart
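Steps 5 and 6 above can also be combined so that the secrets file is private from the moment it is created. The sketch below is only an illustration: it writes to a scratch file rather than to /etc/rsyncd.secrets, and user and password are the same placeholders used in step 5.

```shell
#!/bin/sh
# Sketch: create a secrets file that is never readable by other users.
# A scratch path is used here; on a real server the target would be
# /etc/rsyncd.secrets and the commands would need root (sudo).
secrets=$(mktemp)
trap 'rm -f "$secrets"' EXIT   # remove the scratch file when the script ends

# umask 077 makes newly created files accessible to their owner only.
umask 077
# "user" and "password" are the placeholders from step 5.
printf '%s:%s\n' "user" "password" > "$secrets"
chmod 600 "$secrets"

# Show the resulting permissions; the mode column should read -rw-------
ls -l "$secrets"
```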
Testing
Run the following command to check that everything is OK. The output listed is just a sample; yours should show what is in the share on your remote machine. hostname can be replaced by the IP address of the machine.
$ sudo rsync user@hostname::share
Password:
drwxr-xr-x 4096 2006/12/13 09:41:59 .
drwxr-xr-x 4096 2006/11/23 18:00:03 folders
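Once the listing works, the same double-colon syntax can be used for an actual backup. The following is a minimal sketch, assuming the [share] module configured above; the function name daemon_backup, the RSYNC_BIN override, and the password file location are hypothetical. The --password-file option is a standard rsync option for daemon connections, and the file it points to should contain only the password and should not be readable by other users.

```shell
#!/bin/sh
# Sketch: push a folder to the [share] module of an rsync daemon.
# daemon_backup, RSYNC_BIN and the password file path are made-up names;
# user, hostname and share are the placeholders used in the steps above.
RSYNC_BIN=${RSYNC_BIN:-rsync}

daemon_backup() {
    src=$1      # local folder; a trailing slash copies only its contents
    dest=$2     # for example user@hostname::share/folder2
    pwfile=$3   # file holding only the password, mode 600
    # The double colon selects the daemon protocol instead of a remote shell.
    "$RSYNC_BIN" -az --password-file="$pwfile" "$src" "$dest"
}

# Example call (commented out so that nothing runs by accident):
# daemon_backup /home/path/folder1/ user@hostname::share/folder2 "$HOME/.rsync-pass"
```

Because the module above sets uid = nobody, the destination path on the server has to be writable by that account for uploads to succeed.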
Backup With Rsync and Ssh
(scroll to bottom if you want a much less informative synopsis of what will be covered)
- When I first began tinkering with this idea, the whole SSH thing kind of confused me, mostly because I didn't think SSH would be easy for an end user to utilize. While SSH is very complex in design, they've made it super easy for the end user to set up an authentication key set. Essentially, SSH is a 1 to 1 authenticated connection that can be obtained without a password. Once this is in place, you can utilize rsync to run automatically. Before we begin, please ensure you have openssh-server installed on your file server in question.
sudo apt-get install openssh-server
Next, we need to set up a key pair. You will receive a public key and private key.
ssh-keygen
You will be asked some questions, such as whether or not you want a password to the key pair, etc. I chose no and basically left everything else default. I went with no password because SSH keys are pretty secure, and plus I wanted this to be automated. I was not sure how I could automate this process while still having a password on it.
The public key needs to get copied to the authorized_keys file on the server. Thanks to a handy command, this is painless. Replace jason@192.168.1.150 with what your setup would be.
ssh-copy-id jason@192.168.1.150
It'll ask you for your password. Put in your password to the user account you're authenticating against on the file server. Once done, you should be able to run:
ssh jason@192.168.1.150
If it did not ask for a password and your prompt changed, you're good to go. If it asked you for a password, something is likely off. Please note, if you mess around with the SSH keys (by deleting them, adding new ones, etc.) it'll require a reboot (some people have told me log out + log in works fine too) to reset. I don't know enough about that to explain what's happening besides taking the educated guess that the SSH key is getting locked to your session. Unless you plan to tinker around like I did, where I would delete the SSH keys and re-generate them over and over for learning purposes, you won't run into this issue. But if you do, I wanted to throw this out there.
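The "did it ask for a password?" check can also be done without watching the prompt. The sketch below is one possible way, not the original author's method: the function name check_key_login and the SSH_BIN override are made up, while BatchMode and ConnectTimeout are standard ssh options. With BatchMode=yes, ssh fails instead of prompting, so a successful exit means the key was accepted.

```shell
#!/bin/sh
# Sketch: confirm that key-based login works without any prompt.
# check_key_login and SSH_BIN are hypothetical names for this example;
# jason@192.168.1.150 is the example account from the text.
SSH_BIN=${SSH_BIN:-ssh}

check_key_login() {
    target=$1
    # BatchMode=yes makes ssh fail rather than ask for a password,
    # so a zero exit status means the key was accepted.
    if "$SSH_BIN" -o BatchMode=yes -o ConnectTimeout=5 "$target" true; then
        echo "key login to $target works"
    else
        echo "key login to $target failed" >&2
        return 1
    fi
}

# Example call (commented out because it needs a reachable server):
# check_key_login jason@192.168.1.150
```

A check like this is also handy at the top of an automated backup script, since rsync over SSH would otherwise stall waiting for a password nobody is there to type.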
So, SSH is set up and you're good to go. Now what? It's rsync's turn. You have opened the door with SSH, now you need to put it in gear with rsync. Rsync is a remote synchronization tool. For my uses, it's pretty much awesome. I suggest you folks read the rsync man page for more information. Just a side note, anybody reading this who uses Linux, please keep man pages in mind. They're quicker than Google. Honestly. You can read them by going to a terminal and typing "man rsync". Of course, you can substitute any other command for rsync to read more about it as well, e.g. "man cp". The man page will go over the functionality of a bunch of flags. There are a few I personally use and I'll cover them in my own words below.
-a Archive mode. This keeps the time, permissions, owner, group, and other various settings the same as the source. I like using -a because it ensures that my data on the file server matches my data on the desktop, even down to who owns what and the time stamps.
-z Compression mode. I haven't really used this until recently. I'm not sure if I notice a difference because rsync is pretty fast to begin with, but I tack it in there, mostly because, why not?
--exclude= Exclude mode. This is if you want to exclude a specific directory, trash, videos, etc. For example, let's say you want to exclude ALL hidden files/folders... you would do --exclude=.* Notice that after the equals sign there is a period and a *? That ensures you're doing the wild card, meaning EVERYTHING, but only after the period. Since hidden files/folders begin with a period, you can see how it would match .folder1 .folder2 .folder3, etc.
Note - Personally, I would definitely recommend excluding .gvfs. .gvfs is the gnome virtual file system. It essentially acts as a mount point for network resources. Let's say your file server is accessible through .gvfs. If you rsync everything and don't exclude .gvfs, you're in essence duplicating the data on your file server that already exists, because it'll exist in its primary folder, as well as through .gvfs thanks to your file server.
/home/jason/Documents
/home/jason/Music
/home/jason/Pictures
/home/jason/.gvfs/Documents
/home/jason/.gvfs/Music
/home/jason/.gvfs/Pictures
By excluding .gvfs, you avoid this altogether. If you're backing up a home directory, I'd suggest doing it. Using simply --exclude=.gvfs works for me. If you want to pin the pattern to one spot, keep in mind that rsync anchors a leading / to the top of the transfer rather than to the filesystem root, so with /home/jason as the source it would be --exclude=/jason/.gvfs
--delete This will delete files on the destination that don't exist on the source. Let's say you have a folder that contains 100 GB of data and it's simply named "data". If you rename it to "data2", your server would contain a copy of data and data2 @ a grand total of 200 GB. If you want the data on your server to be identical, use --delete. If you want to have some sort of "older file redundancy" (I know some people prefer this), don't use --delete.
--progress If you run rsync manually, you'll be able to see the progress of what's going on instead of just a flashing cursor. I only use this flag if I want to run the command manually and see what it's doing. I don't bother using this when it's "showtime" and I want it automated in the background.
Other than that, it's just about setting up the source and destination. Let's start with the destination, since after all, we're tinkering with SSH here so it's a tad bit different. For the destination, you'll need the user, server, and folder path. As I said, my name is Jason, and my file server is 192.168.1.150. My folder path on my server in particular is /media/NAS/jason. In my case, NAS is a network drive I shared out, so it's pretty specific to my situation. Yours is likely to differ. Tailor the destination to your own situation. If your "backup drive" is /media/storage and you have a folder on storage named frank, then use /media/storage/frank, etc. In my case:
jason@192.168.1.150:/media/NAS/jason
That is my destination. Now, about the sources. They're simple enough, as it's the same as above except it doesn't include user@server. If you want your entire home directory to be synchronized, you can do so with just:
rsync -az /home/jason jason@192.168.1.150:/media/NAS/jason
If you want your entire home directory synchronized but with the exclusion of .gvfs and the --delete flag, use:
rsync -az --exclude=.gvfs --delete /home/jason jason@192.168.1.150:/media/NAS/jason
Getting the gist of it now?
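If you'd rather watch --exclude and --delete do their thing before pointing them at a real home directory, they can be tried on scratch folders first. This is only a sketch: every name in it (data, data2, notes.txt, mounted.txt) is made up for the demonstration, and unlike the commands above it uses a trailing slash on the source so that the contents land directly in the destination.

```shell
#!/bin/sh
# Illustration only: watch --exclude and --delete act on scratch directories.
# All file and folder names below are hypothetical and exist only in the demo.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT   # remove the scratch directory when the script ends
mkdir -p "$work/src/data" "$work/src/.gvfs" "$work/dst"
echo "notes" > "$work/src/data/notes.txt"
echo "remote" > "$work/src/.gvfs/mounted.txt"

if command -v rsync >/dev/null 2>&1; then
    # First run: .gvfs is skipped, data is copied.
    rsync -a --exclude=.gvfs "$work/src/" "$work/dst"

    # Rename the folder on the source, as in the data/data2 example.
    mv "$work/src/data" "$work/src/data2"

    # Second run with --delete: data2 is copied and the stale data is removed.
    rsync -a --exclude=.gvfs --delete "$work/src/" "$work/dst"

    # Show what the destination holds now.
    ls -A "$work/dst"
else
    echo "rsync is not installed; nothing was copied" >&2
fi
```

Dropping --delete from the second run would leave both data and data2 on the destination, which is the doubled-up situation described earlier.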
- Note, you can have multiple sources as well, which makes it handy if you only want to back up a few specific folders to your file server. In my case, I had limited file server space, so I only wanted to back up the most important data to my file server, which to me is Documents and Pictures. Example:
rsync -az --exclude=.gvfs --delete /home/jason/Pictures /home/jason/Documents jason@192.168.1.150:/media/NAS/jason
You can then set up a Cron job for this to run at specific times. I never run rsync as root, so when I set it up in Cron I set it to launch as jason and just tagged the above rsync command in.
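For anyone who has not written a Cron entry before, the sketch below prints one for the command above. The schedule (every night at 02:00), the log file name, and the helper name cron_line are assumptions chosen for illustration; adjust them to your own setup.

```shell
#!/bin/sh
# Sketch: print a crontab entry that runs the backup every night at 02:00.
# The schedule, the log file path and cron_line are hypothetical choices.
backup_cmd='rsync -az --exclude=.gvfs --delete /home/jason/Pictures /home/jason/Documents jason@192.168.1.150:/media/NAS/jason'

cron_line() {
    # Fields: minute hour day-of-month month day-of-week command
    printf '0 2 * * * %s >> "$HOME/rsync-backup.log" 2>&1\n' "$backup_cmd"
}

cron_line
# To install it for the current user (not root), one option is:
#   (crontab -l 2>/dev/null; cron_line) | crontab -
```

Running "crontab -e" as the regular user and pasting the printed line works just as well. Since Cron runs without a terminal, the passwordless SSH key set up earlier is what lets the job connect.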
I've since moved away from the Cron route. I shut down my computer at night, but my file server stays up all the time, so I added an entry in "Startup Applications" to do the backup for me, which is handy because it runs at system startup. I named it NAS Backup and put the above command in the command field. Everything works like a charm with zero input needed from me.
Quick tip: if you'd like to check out a decent rsync GUI, fire up grsync. It's easy to use and will help you structure the rsync command if you're not entirely sure just yet. Just note, there is no --exclude= flag in the GUI, so you'll have to add it manually under Additional Options, but that's pretty easy to do. Grsync also doesn't use -a, but instead breaks -a up into -t -o -p -g etc. Read the rsync man page under the -a section to see why this makes little/no difference. Once you have it formulated the way you want, you can also do a test run, which is one of the features of grsync, to make sure it works properly prior to giving it the green light.
Assuming all is well and you're done, you can schedule this grsync job with, you guessed it, either Startup Applications or Cron. Keep in mind, the syntax for it is "grsync -e jobname". So if you named the job "backup", you'd run grsync -e backup. This would be the same for Cron or Startup Applications. I tested running it in Startup Applications. It comes up with a GUI window when I log in, showing me the status of the data transfer. If I go the Startup Applications route and just throw the full rsync command in, it does it completely in the background. How much of a visual status you want may dictate which route you go. At any rate, serious kudos to the SSH, Rsync, and Grsync teams, as they've brewed up some very impressive technologies here.
Summary
The above was meant to be super informative. I hope some users can set up a backup system that works for them. Keep in mind, you never know when Mr. HardDrive is going to tank on you, so plan ahead. Below is a rough summary of what you're doing for the users who don't want to read through a mountain of text. Note: Change the below settings to match your setup, unless your name happens to be Jason and your file server happens to be 192.168.1.150.
Server
sudo apt-get install openssh-server
Client
ssh-keygen
Client
ssh-copy-id jason@192.168.1.150
Client
- "Startup Applications" - Select New - Name it backup or whatever you please, and add desired rsync line in the command box, such as:
rsync -az --exclude=.gvfs --delete /home/jason jason@192.168.1.150:/media/NAS/jason
Originally posted on the Ubuntu Forums (ubuntuforums.org)