Monday, December 15, 2014

Setting up a minimal rbd/ceph server for libvirt testing

In my last post I talked about setting up a minimal gluster server. Similarly, this post describes how I set up a minimal single-node rbd/ceph server in a VM for libvirt network storage testing.

I pulled info from a few different places and did a lot of other reading, but things just weren't working on F21: systemctl start ceph produced no output, and all the ceph CLI commands hung. I had better success with F20.

The main difficulty was figuring out a working ceph.conf. My VM's IP address is 192.168.124.101, and its hostname is 'localceph', so here's what I ended up with:

[global]
auth cluster required = none
auth service required = none
auth client required = none
osd crush chooseleaf type = 0
osd pool default size = 1

[mon]
mon data = /data/$name
[mon.0]
mon addr = 192.168.124.101
host = localceph

[mds]
keyring = /data/keyring.$name
[mds.0]
host = localceph

[osd]
osd data = /data/$name
osd journal = /data/$name/journal
osd journal size = 1000
[osd.0]
host = localceph

Ceph setup steps:
  • Cloned an existing F20 VM I had kicking around, using virt-manager's clone wizard. I called it f20-ceph.
  • In the VM, disable firewalld and set selinux to permissive. Not strictly required but I wanted to make this as simple as possible.
  • Set up the ceph server:
    • yum install ceph
    • I needed to set a hostname for my VM since ceph won't accept 'localhost': hostnamectl set-hostname localceph
    • mkdir -p /data/mon.0 /data/osd.0
    • Overwrite /etc/ceph/ceph.conf with the content listed above.
    • mkcephfs -a -c /etc/ceph/ceph.conf
    • service ceph start
    • Prove it works from my host with: sudo mount -t ceph $VM_IPADDRESS:/ /mnt
  • Add some storage for testing:
    • Libvirt only connects to Ceph's block device interface, RBD. The above mount example is _not_ what libvirt will see; it just proves we can talk to the server.
    • Import files within the VM like: rbd import $filename
    • List files with: rbd list
Notable here is that no ceph auth is used. Libvirt supports ceph auth but at this stage I didn't want to deal with it for testing. This setup doesn't match what a real deployment would ever look like.
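
To exercise the rbd commands mentioned above, here's a rough sketch of what that looks like inside the VM; the image names and size are just examples, and this assumes the default 'rbd' pool:

# create an empty 1GB image (size is in MB)
rbd create testimg --size 1024
# or import an existing file as an image
rbd import /var/lib/libvirt/images/test.img testimg2
# both should show up here
rbd list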

Here's the pool definition I passed to virsh pool-define on my host:

<pool type='rbd'>
  <name>f20-ceph</name>
  <source>
    <host name='$VM_IPADDRESS'/>
    <name>rbd</name>
  </source>
</pool>
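
With that XML saved to a file (ceph-pool.xml here, the name is arbitrary), the usual virsh commands should bring the pool online and list any rbd images as volumes:

virsh pool-define ceph-pool.xml
virsh pool-start f20-ceph
virsh vol-list f20-ceph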

Thursday, December 11, 2014

Setting up a minimal gluster server for libvirt testing

Recently I've been working on virt-install/virt-manager support for libvirt network storage pools like gluster and rbd/ceph. For testing I set up a single node minimal gluster server in an F21 VM. I mostly followed the gluster quickstart and hit only a few minor hiccups.

Steps for the gluster setup:
  • Cloned an existing F21 VM I had kicking around, using virt-manager's clone wizard. I called it f21-gluster.
  • In the VM, disable firewalld and set selinux to permissive. Not strictly required but I wanted to make this as simple as possible.
  • Set up the gluster server:
    • yum install glusterfs-server
    • Edit /etc/glusterfs/glusterd.vol, add: option rpc-auth-allow-insecure on
    • systemctl start glusterd; systemctl enable glusterd
  • Create the volume:
    • mkdir -p /data/brick1/gv0
    • gluster volume create gv0 $VM_IPADDRESS:/data/brick1/gv0
    • gluster volume start gv0
    • gluster volume set gv0 allow-insecure on
  • From my host machine, I verified things were working: sudo mount -t glusterfs $VM_IPADDRESS:/gv0 /mnt
  • I added a couple example files to the directory: a stub qcow2 file, and a boot.iso.
  • Verified that qemu can access the ISO: qemu-system-x86_64 -cdrom gluster://$VM_IPADDRESS/gv0/boot.iso
  • Once I had a working setup, I used virt-manager to create a snapshot of the running VM config. So anytime I want to test gluster, I just start the VM snapshot and I know things are all nicely set up.
The bits about 'allow-insecure' are there so that an unprivileged client can access the gluster share; see this bug for more info. The gluster docs also have a section about it, but the steps don't appear to be complete.
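
If your host qemu-img build has gluster support, you can also poke at an image over gluster:// without mounting anything. Assuming the stub qcow2 mentioned above was named test.qcow2 (the name is just an example):

qemu-img info gluster://$VM_IPADDRESS/gv0/test.qcow2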

The final bit is setting up a storage pool with libvirt. The XML I passed to virsh pool-define on my host looks like:

<pool type='gluster'>
  <name>f21-gluster</name>
  <source>
    <host name='$VM_IPADDRESS'/>
    <dir path='/'/>
    <name>gv0</name>
  </source>
</pool>
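
Besides using the pool via virt-manager/virt-install, a guest can also point at a gluster-hosted image directly with a network disk. Roughly, the disk XML looks like the sketch below; the image name is just an example:

<disk type='network' device='disk'>
  <driver name='qemu' type='qcow2'/>
  <source protocol='gluster' name='gv0/test.qcow2'>
    <host name='$VM_IPADDRESS'/>
  </source>
  <target dev='vdb' bus='virtio'/>
</disk>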

Wednesday, December 10, 2014

qemu-2.2.0 in rawhide, virt-preview disabled for F20

qemu-2.2.0 was released yesterday; check the release announcement and fine-grained changelog. Packages are available in rawhide and fedora-virt-preview for Fedora 21.

But now that Fedora 21 is out, there won't be any new builds for F20 virt-preview. If you want to play with the latest and greatest virt bits, you'll need to update to F21.

Friday, December 5, 2014

virt-manager 1.0 creates qcow2 images that don't work on RHEL6

One of the big features we added in virt-manager 1.0 was snapshot support. As part of this change, we switched to using the QCOW2 disk image format for new VMs. We also enable the QCOW2 lazy_refcounts feature that improves performance of some snapshot operations.

However, not all versions of QEMU in the wild can handle lazy_refcounts; RHEL6 QEMU in particular will refuse to use such a disk image. So by default, a disk image from a VM created with Fedora 20 virt-manager will not run on RHEL6 QEMU, throwing an error like:

... uses a qcow2 feature which is not supported by this qemu version: QCOW version 3 

This has generated some user confusion.

The 'QCOW version 3' bit is a little misleading here: enabling lazy_refcounts does set a bit in the QCOW2 file header marking it as QCOW3, but qemu-img and libvirt still call the format QCOW2, just with a different 'compat' setting.

Kevin Wolf, one of the QEMU block maintainers, explains it in this mail:
qemu versions starting with 1.1 can use [lazy_refcounts] which require an incompatible on-disk format. Between version 1.1 and 1.6, they needed to be specified explicitly during image creation, like this:

qemu-img create -f qcow2 -o compat=1.1 test.qcow2 8G

Starting with qemu 1.7, compat=1.1 became the default, so that newly created images can't be read by older qemu versions by default. If you need to read them in older version, you now need to be explicit about using the old format:

qemu-img create -f qcow2 -o compat=0.10 test.qcow2 8G

With the same release, qemu 1.7, a new qemu-img subcommand was introduced that allows converting between both versions, so you can downgrade your existing v3 image to the format known by RHEL 6 like this:

qemu-img amend -f qcow2 -o compat=0.10 test.qcow2

As explained, qemu-img from QEMU 1.7+ defaults to lazy_refcounts/compat=1.1, but it also provides the 'qemu-img amend' subcommand to easily convert between the two formats.

Unfortunately that subcommand is not available on Fedora 20 and older; however, you can use the pre-existing 'qemu-img convert' command:

qemu-img convert -f qcow2 -O qcow2 -o compat=0.10 $ORIGPATH $NEWPATH

Beware though: converting between two qcow2 images will drop all internal snapshots in the new image, so only use that option if you don't need to preserve any snapshot data. 'qemu-img amend' will preserve snapshot data.
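
Either way, you can double check which format an image ended up with via 'qemu-img info'; newer versions report the compat level under 'Format specific information':

qemu-img info test.qcow2
# look for 'compat: 0.10' vs 'compat: 1.1' (and the lazy refcounts flag) in the output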

(Unfortunately at this time virt-manager doesn't provide any way in the UI to _not_ use lazy_refcounts, but you could always use qemu-img/virsh to create a disk image outside of virt-manager, and select it when creating a new VM.)
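
For example, something like this (path and size picked arbitrarily) creates an image that RHEL6 QEMU can use, which you can then select when creating a new VM:

qemu-img create -f qcow2 -o compat=0.10 /var/lib/libvirt/images/rhel6-friendly.qcow2 8G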

Friday, November 28, 2014

x2apic on by default with qemu 2.0+, and some history

x2apic is a performance and scalability feature available in many modern Intel CPUs. But regardless of whether your host CPU supports it, KVM can emulate x2apic for x86 guests, giving an easy performance win with no downside. The feature has existed since 2009 and has been a regular recommendation for tuning a KVM VM.

As of qemu 2.0.0, x2apic is enabled automatically (more details at the end). Prior to that, actually benefiting from x2apic required a tool like virt-manager to explicitly enable the flag, which had a long, bumpy road.

x2apic is exposed on the qemu command line as a CPU feature, like:

qemu -cpu $MODEL,+x2apic ...

And there isn't any way to specify a feature flag without specifying the CPU model. So enabling x2apic required hardcoding a CPU model where traditionally tools (and libvirt) deferred to qemu's default.
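
For reference, the kind of libvirt <cpu> XML a tool ends up generating for this looks roughly like the snippet below; the model name is just a placeholder, the point being that some concrete model has to be spelled out alongside the feature flag:

<cpu match='exact'>
  <model>SandyBridge</model>
  <feature policy='require' name='x2apic'/>
</cpu>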

A Fedora 13 feature page was created to track the change, and we enabled it in python-virtinst for F13/rawhide. The implementation attempted to hardcode the CPU model name that libvirt detected for the host machine, which unfortunately has some problems, as I explained in a previous post. This led to some issues installing 64-bit guests, and after trying to hack around it, I gave up and reverted the change.

(In retrospect, we likely could have made it work by just duplicating the default CPU model logic that qemu uses; however, that might have hit issues if the CPU default ever changed, as on RHEL for example.)

Later on virt-manager and virt-install gained UI for enabling x2apic, but a user had to know what they were doing and hunt it down.

As mentioned above, as of qemu 2.0.0 any x86 KVM VM will have x2apic automatically enabled, so there's no explicit need to opt in. From qemu.git:
commit ef02ef5f4536dba090b12360a6c862ef0e57e3bc
Author: Eduardo Habkost
Date:   Wed Feb 19 11:58:12 2014 -0300

    target-i386: Enable x2apic by default on KVM