Monday, March 17, 2014

Snapshot support in virt-manager


The biggest feature we added in virt-manager 1.0 is VM snapshot support. Users have been asking us to expose this in the UI for quite a long time. In this post I'll walk you through the new UI.

Let's start with some use cases for VM snapshots:
  1. I want to test some changes to my VM, and either throw them away, or use them permanently.
  2. I want to have a single Fedora 20 VM, but multiple snapshots with mutually exclusive OS changes in each. One snapshot might have F20 libvirt installed, but another snapshot will have libvirt.git installed. I want to switch between them for development, bug triage, etc.
  3. I encountered a bug in the VM. I want to save the running state of the VM in case developers want further debugging information, but I also want to restart the VM and continue to use it in the meantime.
The libvirt APIs support two different types of snapshots with qemu/kvm.

Internal snapshots


Internal snapshots are the snapshots that QEMU has supported for a long time. Libvirt refers to them as 'internal' because all the data is stored as part of the qcow2 disk image: if you have a VM with a single qcow2 disk image and take 10 snapshots, you still have only one file to manage. This is the default snapshot mode if using the 'virsh snapshot-*' commands.

These snapshots can combine disk and VM memory state for 'checkpointing', so you can jump back and forth between saved running VM states. A snapshot can also be taken of an offline VM, in which case only the disk contents are saved.
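
To illustrate with the virsh commands (the VM name 'f20' below is just a placeholder):

$ virsh snapshot-create-as f20 snap1 'clean state'   (take a snapshot)
$ virsh snapshot-list f20                            (list existing snapshots)
$ virsh snapshot-revert f20 snap1                    (jump back to the saved state)
$ virsh snapshot-delete f20 snap1                    (remove it when no longer needed)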

Cons:
  • Only works with qcow2 disk images. Since virt-manager has historically used raw images, pre-existing VMs may not be able to use this type of snapshot.
  • They are non-live, meaning the VM is paused while all the state is saved. For end users this likely isn't a problem, but if you are managing a public server, minimizing downtime is essential.
  • Historically they were quite slow, but this has improved quite a bit with QEMU 1.6+.

External snapshots


External snapshots are about safely creating copy-on-write overlay files for a running VM's disk images. QEMU has supported copy-on-write overlay files for a long time, but the ability to create them for a running VM is only a couple years old. They are called 'external' because every snapshot creates a new overlay file.

While the overlay files have to be qcow2, these snapshots will work with any base disk image format. They can also be performed with very little VM downtime, typically under a second. The external nature also enables use cases like live backup: create a snapshot, back up the original backing image, and when the backup completes, merge the overlay file changes back into the backing image.
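
A rough sketch of that backup flow with virsh, where the VM name 'f20', its disk target 'vda', and the paths are all hypothetical, and the final merge step needs a new enough libvirt and QEMU for live block commit:

$ virsh snapshot-create-as f20 backup-snap --disk-only --atomic --no-metadata
$ cp /var/lib/libvirt/images/f20.qcow2 /backup/f20.qcow2   (the base image is no longer being written to)
$ virsh blockcommit f20 vda --active --pivot               (merge the overlay back into the base image)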

However, that's mostly where the benefits end. Compared to internal snapshots, which are an end-to-end solution with all the complexity handled in QEMU, external snapshots are just a building block for handling the use cases I described above... and many of the pieces haven't been filled in yet. Libvirt still needs a lot of work to reach feature parity with what internal snapshots already provide. This is understandable, as the main driver for external snapshot support was features like live backup that internal snapshots weren't suited for. Once that point was reached, there wasn't much of a good reason to do the difficult work of filling in the remaining support when internal snapshots already fit the bill.

virt-manager support


Unsurprisingly, we decided to go with internal snapshots in virt-manager's UI. To facilitate this, we've changed the default disk image format for new qemu/kvm VMs to qcow2.
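
If you have a pre-existing VM with a raw image, you can convert it while the VM is shut off, along these lines (the paths here are made up):

$ qemu-img info /var/lib/libvirt/images/myvm.img   (check the current format)
$ qemu-img convert -f raw -O qcow2 /var/lib/libvirt/images/myvm.img /var/lib/libvirt/images/myvm.qcow2

and then point the VM's disk at the new qcow2 file.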

The snapshot UI is reachable via the VM details toolbar and menu:


That button will be disabled with an informative tooltip if snapshots aren't supported, such as when the disk image isn't qcow2, or when using a libvirt driver like Xen that doesn't have snapshot support wired up.

Here's what the main screen looks like:


It's pretty straightforward. The column on the left lists all the snapshots. 'VM State' is the state the VM was in when the snapshot was taken: running/reverting to a 'Running' snapshot means the VM will end up in a running state, a 'Shutoff' snapshot will leave the VM shut off, etc.

The check mark indicates the last applied snapshot, which could be the most recently created snapshot or the most recently run/reverted snapshot. The VM disk contents are basically 'the snapshot contents' + 'whatever changes I've made since then'. It's just an indicator of where you are.
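
This should match what libvirt reports as the VM's 'current' snapshot, which you can also query from virsh (the VM name is again a placeholder):

$ virsh snapshot-current f20 --name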

Internal snapshots are all independent of one another. You can take 5 successive snapshots, delete snapshots 2 through 4, and snapshots 1 and 5 will still be completely independent. Any notion of a snapshot hierarchy is really just metadata, and we decided not to expose it in the UI. That may change in the future.
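
That metadata is still visible from virsh if you're curious (VM name is a placeholder):

$ virsh snapshot-list f20 --tree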

Run/revert to a snapshot with the play button along the bottom. Create a new snapshot by hitting the + button. The wizard is pretty simple:


That's about it. Give it a whirl in virt-manager 1.0 and file a bug if you hit any issues.

14 comments:

  1. Any plans to support LVM-based snapshots, where an LVM volume is used for the VM?

    Replies
    1. Libvirt supports creating lvm snapshots through its storage APIs, and in virt-manager 1.0 you can create one via Edit->Host Details->Storage->New Volume if you have your volume group configured as a libvirt logical pool.

      But as far as I know they aren't wired up as part of these snapshot APIs, and even if they were it would be subject to the same current limitations of the 'external' snapshots mentioned above.
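
      For reference, the command-line equivalent is something like the following, where 'vgpool' is a made-up name for an already-configured logical pool:

      $ virsh vol-create-as vgpool newvol 10G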

  2. I tried this out, creating and restoring a couple of snapshots. Creating a snapshot took a few seconds. Upon trying a third snapshot, the "Processing..." dialog remained on screen for 9 minutes, with "sudo virsh list" showing the VM as paused.

    There's no cancel button; should there be? Alternatively, a progress indicator would reassure the user that snapshot creation hasn't stalled completely.

    Creating a fourth snapshot took only a few seconds. Could the "snapshot hierarchy" (not shown by virt-manager) be affecting performance? If so, that could be a reason for exposing the hierarchy to the user, so they can understand why some snapshots are cheap to create and others expensive.

    Replies
    1. Good questions. Strange that a third snapshot took so long... is this with a VM created on virt-manager-1.0? We use some specific qcow2 features that should make snapshots faster. virsh vol-dumpxml should show 'lazy_refcounts' for the qcow2 disk (new enough 'qemu-img info' will show it as well but it's not in F20).

      As far as cancelling and progress reporting, unfortunately qemu doesn't provide that info for internal snapshots, so there isn't much we can do at the moment.

      The hierarchy in this case is actually entirely a libvirt concept, as reported by qemu internal snapshots are just a flat list. There might be some concept of hierarchy internally but I don't know if it relates to snapshot performance.

    2. The VM image file is qcow2; however, it was created on a previous version of Fedora (F19, perhaps even F18). I've upgraded the host as new Fedora releases have become available.

      Here's the output of "virsh vol-dumpxml"; note there's no sign of any "lazy_refcounts".

      $ sudo virsh version
      Compiled against library: libvirt 1.1.3
      Using library: libvirt 1.1.3
      Using API: QEMU 1.1.3
      Running hypervisor: QEMU 1.6.1

      $ sudo virsh vol-dumpxml --pool SATA OpenSUSE_13.1.img
      <volume>
        <name>OpenSUSE_13.1.img</name>
        <key>/media/sata/VirtualMachines/OpenSUSE_13.1.img</key>
        <source>
        </source>
        <capacity unit='bytes'>52428800000</capacity>
        <allocation unit='bytes'>27343843328</allocation>
        <target>
          <path>/media/sata/VirtualMachines/OpenSUSE_13.1.img</path>
          <format type='qcow2'/>
          <permissions>
            <mode>0600</mode>
            <owner>0</owner>
            <group>0</group>
          </permissions>
          <timestamps>
            <atime>1395227630.180188816</atime>
            <mtime>1395227325.422097932</mtime>
            <ctime>1395227325.422097932</ctime>
          </timestamps>
        </target>
      </volume>

      $ sudo qemu-img snapshot -l /media/sata/VirtualMachines/OpenSUSE_13.1.img
      Snapshot list:
      ID        TAG              VM SIZE                DATE     VM CLOCK
      1         Running VM          900M 2014-03-18 18:11:03  04:15:05.825
      2         off1                   0 2014-03-19 11:08:44  00:00:00.000

      $ qemu-img
      qemu-img version 1.6.1, Copyright (c) 2004-2008 Fabrice Bellard

      The tests yesterday were of a running VM, i.e. memory and disk snapshots.

      I can recreate the VM with a fresh disk image file and repeat some snapshot tests if that would help. That would eliminate the effects of images created with older tools.

      [BTW, Blogger mangled the XML output from vol-dumpxml, I modified it to compensate]

    3. You can convert your qcow2 image to use the performance improvements with:

      qemu-img convert -f qcow2 -O qcow2 -o lazy_refcounts=on,compat=1.1,preallocation=metadata $ORIGDISK $NEWDISK

      I assume that preserves snapshot data, but I'm not positive. You may want to try deleting the snapshots first if they aren't important to you. Hopefully that speeds things up. If not, please file a bug at bugzilla.redhat.com against Fedora 20 qemu.
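
      You can check whether the snapshots survived by listing them on the new image:

      $ qemu-img snapshot -l $NEWDISK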

  3. Yes, it seems to preserve the snapshots. (Just tested.)

  4. What version of virt-manager has this? I use Fedora 22 with Virtual Machine Manager 1.2.1, and this version doesn't have the snapshot button...

    Replies
    1. It's definitely there in f22... maybe the icon is different for you if you aren't using gnome

  5. Hello Cole,

    I've recently created a snapshot of a Windows VM in a paused state. Its physical disk size exceeded its virtual size, which is OK and expected.
    However, I've just deleted the snapshot; its physical size has not changed, and there is a discrepancy between the latter and what is returned by "qemu-img info":

    qemu-img info ./KVM-Windows-10.qcow2
    image: ./KVM-Windows-10.qcow2
    file format: qcow2
    virtual size: 50G (53687091200 bytes)
    disk size: 54G
    cluster_size: 65536
    Format specific information:
    compat: 1.1
    lazy refcounts: true
    refcount bits: 16
    corrupt: false

    ls -al KVM-Windows-10.qcow2

    -rwxrwx--- 1 root root 76581896192 Dec 18 10:45 KVM-Windows-10.qcow2

    Should I file a bug report on the official site?

    Replies
    1. I don't know if that's expected or not, but if it _is_ a bug it's likely at the qemu level. You could mail the qemu-devel list if you're curious.

  6. Do you have any best practices for creating snapshots of multiple disks from the same guest, from the command line?

  7. Is it possible to create 2 different overlays from one base image using the libvirt API?

    [FedoraBase.img] ----- <- [Fedora-guest-1.qcow2]
                    \
                     \--- <- [Fedora-guest-2.qcow2]

    # qemu-img create -b /export/vmimages/RootBase.img -f qcow2 \
    /export/vmimages/Fedora-guest-1.qcow2

    # qemu-img create -b /export/vmimages/RootBase.img -f qcow2 \
    /export/vmimages/Fedora-guest-2.qcow2
