LVM2 snapshot performance problems

Last update: Mar 17, 2014

Dennis van Dok

Update: In February 2012 I received the following explanation from Andras Korn about why I was seeing the performance issues raised in the original write-up. With his permission, I'm including it up front.

In order to maintain snapshot consistency, the copy-on-write update to the snapshot has to be committed to disk before the write to the origin volume is committed to disk. Otherwise the snapshot might not contain a copy of the original data in case power is lost. Most storage hardware doesn't support write barriers, which means the only way to ensure this kind of consistency is to sync the disk buffers of the snapshot before issuing the write to the origin volume. And sync operations are slow on spinning media.

Essentially, having a snapshot turns async writes into sync writes (or, more precisely, every async write can potentially imply an additional sync write). If the snapshot is stored on the same disk as the origin volume, you're also causing a lot of seeking, which slows things down even further.

The good news is that this only happens if you overwrite parts of the origin volume that haven't been copied to the snapshot yet; if you keep overwriting the same bits, things'll be fast again after the first slow write.

Snapshots can also be mounted read/write (and they can be merged back into the origin volume, even while they're mounted, IIRC). Thus, if you can unmount the origin volume and mount the snapshot in its place, you can avoid much of the performance penalty (there'll still be more seeking, but no need to perform an additional sync write for each async write).

That said, it's still preferable from a snapshot performance perspective to store snapshots on SSDs, or to use a filesystem such as zfs (or btrfs?) that doesn't overwrite data in place anyway, making snapshotting cheap.
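
To make the swap-and-merge idea concrete: with the test volumes created later in this article, it might look roughly like the following sketch. (lvconvert --merge needs a reasonably recent LVM2 and kernel, so check lvconvert(8) on your system first.)

umount /scratch
mount /dev/vgtest/snapshot /scratch

and later, to fold the accumulated changes back into the origin volume:

lvconvert --merge vgtest/snapshot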

Nov 1, 2007

After running some experiments, it turns out that using the snapshot feature of Logical Volume Management (LVM) under Linux causes abysmal disk write performance. This undermines the case for using snapshots on highly utilized volumes. I've witnessed performance degradation by a factor of 20 to 30.

There are many uses for LVM, scaling from home computers to large disk arrays. The performance problems that I ran into happened when using LVM snapshots on single disk systems. That is not to say that these problems wouldn't crop up in other situations, so I strongly urge you to do some kind of performance testing if you plan to use snapshots on any sort of system.

Below, I will present a few easy-to-follow steps to reproduce the tests, even if you don't have LVM configured. All you need is a sizeable blob of free disk space.

LVM has the ability to create a snapshot of a logical volume, which is like an instant copy of the original. Changes to the snapshot are not visible in the original and vice versa. This is done by using a technique called copy-on-write (COW). At the beginning the two volumes are identical, and reading data from the snapshot will refer to the corresponding block in the original volume. But when a write action takes place, first a copy is made of a chunk of the original data. Subsequent reads will either refer to the original data, if the block is still untouched, or to the copied chunk. As more and more changes happen, more chunks are allocated and LVM has to keep track of which chunks are used for which parts of the volume.

Note that it doesn't matter if the write action takes place on the original or on the snapshot: in both cases the copy action has to be done.
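
Once a snapshot exists (the walkthrough below creates one called snapshot in a volume group called vgtest), you can watch this chunk bookkeeping happen: a plain lvs reports how much of the COW area is in use (the column is labelled Snap% or Data%, depending on the LVM2 version), and lvdisplay shows the same figure as 'Allocated to snapshot'.

lvs vgtest
lvdisplay /dev/vgtest/snapshot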

How to test the LVM snapshot performance

There are two ways to proceed: the best way is to have a physical device, such as a spare disk or disk partition. If you have no spare disk, you can make do with a large empty file as a loop device. This will taint the performance tests with more overhead, but as we're really interested in the relative performance of snapshots versus no snapshots, this shouldn't matter much.

Setting up a disk or partition

Let's say your disk device is /dev/hdx. First, turn it into a physical volume (PV, in LVM terminology).

pvcreate /dev/hdx

Using this PV, generate a volume group (VG) named vgtest.

vgcreate vgtest /dev/hdx

Now proceed to 'Setting up the logical volume'.

Setting up a loop device

Create a large empty file in your free space somewhere.

dd if=/dev/zero of=/tmp/bigfile bs=1M count=5000

This creates a 5 GB file full of zeros. Now associate it with a loop device.

losetup /dev/loop0 /tmp/bigfile
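
If /dev/loop0 is already taken, a sufficiently recent losetup can pick a free loop device by itself and print its name; adjust the later commands accordingly.

losetup -f --show /tmp/bigfile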

Turn the loop device into a physical volume (PV, in LVM terminology).

pvcreate /dev/loop0

Using this PV, generate a volume group (VG) named vgtest.

vgcreate vgtest /dev/loop0

Setting up the logical volume

Create the 'original' logical volume on the volume group, but be careful not to use all the available space; we need to leave room for the snapshot's COW chunks. If the VG has 5 GB of space, 3 GB is enough.

lvcreate -L 3G -n original vgtest
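
At this point a quick sanity check with the LVM reporting tools should show one physical volume, the vgtest volume group and the original logical volume (the exact output varies between LVM2 versions).

pvs
vgs
lvs vgtest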

For easier testing, let's put a filesystem on it…

mkfs.ext2 /dev/vgtest/original

…and mount it.

mkdir /scratch
mount /dev/vgtest/original /scratch

Testing the performance without snapshots

Generate a 1 GB file in the newly mounted filesystem.

sync; time sh -c "dd if=/dev/zero of=/scratch/moo bs=1M count=1000; sync"

Note that flushing dirty buffers to disk with sync is necessary to measure the real write performance; without it, dd mostly measures how fast the page cache absorbs the data.
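
If your dd supports it, conv=fdatasync is a convenient variation: dd then forces the file's data to disk itself before reporting its timing, so the surrounding sync calls no longer do the interesting work. The numbers below were produced with the sync variant, though.

time dd if=/dev/zero of=/scratch/moo bs=1M count=1000 conv=fdatasync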

Testing the performance with a snapshot

Create a snapshot of the LV, reserving enough space to allow sufficient changes to be written.

lvcreate -L 2G -s -n snapshot /dev/vgtest/original
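
If the lvcreate command complains that the snapshot target is missing, the dm-snapshot kernel module probably isn't loaded yet; on a stock kernel, loading it should be as simple as this (lsmod lists it as dm_snapshot).

modprobe dm-snapshot
lsmod | grep dm_snapshot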

Now simply repeat the above test and observe the difference.

sync; time sh -c "dd if=/dev/zero of=/scratch/moo bs=1M count=1000; sync"

It is particularly illustrative to let vmstat 1 run along in a separate window while the test is going on.
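
Something along these lines, started before kicking off the dd; the columns to watch are bo (blocks written to block devices per second) and wa (CPU time spent waiting for I/O).

vmstat 1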

Test Results

Update 30 July, 2009

Results may depend on the hardware and operating system used. I have tested this on several types of systems, and a performance hit was seen in every case.

30 July, 2009: CentOS 5, 64 bit

Dell PE1950, hardware RAID 1 with two ST3750640NS drives
CentOS 5.3, Linux 2.6.18-128.1.6.el5xen x86_64

without snapshot:

sync; time sh -c "dd if=/dev/zero of=/scratch/moo bs=1M count=1000; sync"
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 10.3395 seconds, 101 MB/s

real    0m20.285s
user    0m0.000s
sys     0m1.084s

with snapshot:

sync; time sh -c "dd if=/dev/zero of=/scratch/moo bs=1M count=1000; sync"
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 796.627 seconds, 1.3 MB/s

real    16m44.222s
user    0m0.000s
sys     0m1.132s

30 July, 2009: Debian "squeeze", 64 bit

Dell PE1950, SEAGATE ST373455SS
Debian squeeze (testing), Linux 2.6.26-2-xen-amd64

without snapshot:

# sync; time sh -c "dd if=/dev/zero of=/mnt/moo bs=1M count=1000; sync"
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 4.89106 s, 214 MB/s

real    0m11.801s
user    0m0.000s
sys     0m2.660s 

with snapshot:

# sync; time sh -c "dd if=/dev/zero of=/mnt/moo bs=1M count=1000; sync"
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 54.0916 s, 19.4 MB/s

real    2m3.906s
user    0m0.004s
sys     0m3.340s 

Closing remarks

I don't want to dismiss snapshots as a useless feature – obviously they are used today by many people for various purposes. But it is a feature that needs to be used with care, because it has definite performance issues. I was confronted with these problems and surprised by them. More surprising was that I could not find web pages that shout in your face that snapshots spell trouble. All I could find was how convenient they are for doing live backups of your database, which I would like to see in real life on a database that grinds through dozens of transactions per second.

This is not a widely studied subject (by me), so I may be entirely wrong. I would be very happy if other people tried this out and sent me their findings.

Update Mar 13, 2009

Several people have sent me mail to comment on my findings. Most of them mentioned they experienced similar problems, so it is good to know I'm not alone! So far I've not had word from the LVM community about this issue.

John Newbigin actually repeated the tests on similar hardware. I include his findings here. Thanks, John!

After some reading and testing on similar hardware, I found that setting the snapshot chunksize to the largest possible value of 512k gave the best results in your dd test.

The default value of 64k seems too low, at least for my hardware. This was on an HP DL380 G5 with 6 × 72 GB 10k RPM SAS drives in hardware RAID 5 (Smart Array P400i, 512 MB cache, 25% read / 75% write).

These are the results I got using a slightly modified version of your test script:

sync ; time sh -c "dd if=/dev/zero of=asdf bs=1M count=1000 ; sync"
Test on un-snapshotted disk:
5s
Test on the snapshotted disk:
512k   55s
256k   49s
128k   49s
 64k   83s
 32k   63s
 16k  200s
  8k  304s
  4k  625s
On the snapshot:
512k   49s
256k   53s
128k   58s
 64k   58s
 32k   66s
 16k  105s
  8k  169s
  4k  179s

Footnote

When I did a test on the live server with the 512k chunk size, the server froze solid and was reset by the hardware watchdog.
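
For anyone who wants to repeat John's chunk size experiment (keeping his footnote above in mind): the chunk size is fixed when the snapshot is created, so with the volumes from the walkthrough above a 512k chunk size would be set roughly like this. See lvcreate(8) for the values your version accepts.

lvcreate -L 2G -s -c 512k -n snapshot /dev/vgtest/original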
