Thursday, April 09, 2009

KVM update by Chris Wright

At a glance - KVM is a Linux kernel module that turns Linux into a hypervisor.
Requires hardware virtualization extensions - uses paravirtualization where it makes sense
Supports x86 (32 and 64 bit), s390, PowerPC, ia64.
Competitive performance and feature set
Advanced memory management
Tightly integrated into Linux
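
As a quick illustration of the "requires hardware virtualization extensions" point: on x86 Linux hosts the CPU advertises Intel VT-x as the "vmx" flag and AMD-V as the "svm" flag in /proc/cpuinfo. The sketch below is not from the talk; the function names are made up for illustration, and it only covers x86 (other architectures report this differently).

```python
def virtualization_extensions(cpuinfo_text):
    """Return the x86 hardware virtualization flags found in cpuinfo text.

    "vmx" means Intel VT-x, "svm" means AMD-V. Function name is illustrative.
    """
    found = set()
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            # Lines look like "flags\t\t: fpu vme ... vmx ..."
            _, _, value = line.partition(":")
            flags = value.split()
            found.update(f for f in ("vmx", "svm") if f in flags)
    return found


def read_host_cpuinfo(path="/proc/cpuinfo"):
    """Read the host's cpuinfo, or return an empty string if unavailable."""
    try:
        with open(path) as f:
            return f.read()
    except OSError:
        return ""


if __name__ == "__main__":
    exts = virtualization_extensions(read_host_cpuinfo())
    print("hardware virtualization:", ", ".join(sorted(exts)) or "not detected")
```

An empty result does not always mean the CPU lacks the feature; the extensions can also be disabled in firmware.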

A hypervisor needs:
- A scheduler and memory management
- An I/O stack
- Device Drivers
- A management stack
- Networking
- Platform Support code

Linux has world-class support for this so why reinvent the wheel?
Reuse Linux code as much as possible.
Focus on virtualization, leave other things to respective developers.
Benefit from semi-related advances in Linux.

KVM features
- buzzword compliant - VT-x/AMD-V, EPT/NPT, VT-d/IOMMU
- CPU and memory overcommit
- Higher performance paravirtual I/O
- Hotplug (cpu, block, nic)
- SMP guests
- Live Migration
- Power Management
- PCI device assignment and SR-IOV
- Page Sharing
- KVM autotest

One of the key points is that many of these capabilities come directly from the underlying Linux kernel.

Libvirt Features
- Hypervisor agnostic:  Xen, KVM, QEMU, LXC, UML, OpenVZ
- Provisioning, lifecycle management
- Storage: IDE/SCSI/LVM/FC/Multipath/NPIV/NFS
- Networking:  Bridging, bonding, VLANs, etc.
- Secure remote management: TLS, Kerberos
- Many common language bindings: Python, Perl, Ruby, OCaml, C#, Java
- CIM provider
- AMQP agent - High-bandwidth, bus-based messaging protocol for managing very large numbers of machines.  Common with Wall Street Linux customers.

oVirt features
- Scalable data center virtualization management for server and desktop
- Small footprint virtualization hosting platform
- Web UI for centralized remote management
- Directory integration
- Hierarchical resource pools
- Statistics gathering
- Provisioning, SLA, load balancing
- Currently built on top of KVM (not a hard requirement)
- Currently directly built on top of Fedora, but again not a hard requirement
- oVirt is about managing the hardware resources as well as the guests; it can also place agents on the guests and monitor them that way.

Question:  can you use oVirt to manage guests on Amazon's EC2?  In principle, yes, but there would be a bit of work to enable that, mostly because oVirt includes hardware provisioning access.  In some sense, oVirt becomes the "owner" of the physical machine in order to enable virtual machine deployment.  It would depend on libvirt running on the physical nodes within the "cloud", e.g. Amazon's EC2.

Newbie question (yes, KVM is also Keyboard/Mouse/Monitor multiplexor, but not in our session today ;-): How does KVM compare to OpenVZ and Xen?

OpenVZ is focused on containers running on a single Linux instance.  The guests are not as isolated and do not get a complete, unique kernel and OS instance as they do with Xen or KVM.  OpenVZ is more like chroot on steroids; Xen and KVM, again, are more like VMware's ESX.

Xen is basically a microkernel approach to a hypervisor, whereas KVM is a "macro kernel" approach.  Xen allows a virtual machine to be modified to run as a paravirtualized system for performance reasons.  KVM can run a .vmdk image, but it is probably more useful to convert it to a KVM-friendly format.

Paravirtualizing I/O reduces the enormous number of traps that otherwise occur in a fully virtualized hardware environment where the hardware does not have full I/O virtualization or an IOMMU.

<at this point we hit the break but Chris will go into his next 40 slides for those that want to stay ;) >

KVM Execution Model
- Three modes for thread execution instead of the traditional two
  . User mode
  . Kernel mode
  . Guest mode
- A virtual CPU is implemented using a Linux thread
- The Linux scheduler is responsible for scheduling a virtual CPU, as it is a normal thread.

Guest code executes natively apart from trap'n'emulate instructions
- Performance-critical or security-critical operations are handled in the kernel, such as mode transitions or the shadow MMU.
- I/O emulation and management are handled in user space, currently by a QEMU-derived code base; other users are welcome.
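
The split described above can be pictured with a toy model. This is only a sketch under my own assumptions, not KVM code: the event names are hypothetical, and Python threads stand in for the Linux threads that back each virtual CPU. In real KVM the loop is driven by the KVM_RUN ioctl on a vCPU file descriptor, which returns to user space only for exits the kernel does not handle itself.

```python
import threading

# Hypothetical exit names, chosen for illustration only.
KERNEL_HANDLED = {"mode_transition", "shadow_mmu_fault"}
USER_HANDLED = {"io_port", "mmio"}


def run_vcpu(guest_events):
    """Return the sequence of modes a vCPU thread passes through.

    "native" events run in guest mode. Every trap first lands in kernel
    mode; only I/O emulation is passed on to user mode.
    """
    trace = []
    for event in guest_events:
        if event == "native":
            trace.append("guest")
        elif event in KERNEL_HANDLED:
            trace.append("kernel")
        elif event in USER_HANDLED:
            trace.extend(["kernel", "user"])
        else:
            raise ValueError(f"unknown guest event: {event}")
    return trace


def run_guest(vcpu_programs):
    """Run one ordinary thread per vCPU and collect each vCPU's trace."""
    results = {}

    def worker(vcpu_id, events):
        results[vcpu_id] = run_vcpu(events)

    threads = [
        threading.Thread(target=worker, args=(i, events))
        for i, events in enumerate(vcpu_programs)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run_guest([["native", "shadow_mmu_fault", "native"], ["native", "io_port"]]))
```

Because each vCPU is just a thread, nothing special is needed to schedule it; the host scheduler treats it like any other work.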

Large page allocations are currently pinned and never swapped out.  This is a current downside to using large pages within KVM.

KVM Memory Model
- Guest physical memory is just a chunk of host virtual memory so it can be
  - swapped, shared, backed by large pages, backed by a disk file, COW'ed, NUMA aware
- The rest of host virtual memory is free for use by the VMM, low bandwidth device utilization
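
To make "guest physical memory is just a chunk of host virtual memory" concrete, here is a minimal sketch, assuming a single contiguous guest RAM region that starts at guest physical address 0. The GuestMemory class is invented for illustration. In real KVM, user space allocates memory in a similar way and then registers it with the kernel through the KVM_SET_USER_MEMORY_REGION ioctl; that step is not shown here.

```python
import mmap


class GuestMemory:
    """Toy guest RAM: an anonymous mapping inside the host process.

    Because it is ordinary host virtual memory, the host kernel is free to
    page it in on demand, swap it out, or share it like any other mapping.
    """

    def __init__(self, size):
        self.size = size
        # fileno -1 requests an anonymous mapping, zero-filled on first touch.
        self.mem = mmap.mmap(-1, size)

    def _check(self, gpa, length):
        if gpa < 0 or length < 0 or gpa + length > self.size:
            raise ValueError("guest physical address out of range")

    def write(self, gpa, data):
        """Copy bytes into guest RAM at guest physical address gpa."""
        self._check(gpa, len(data))
        self.mem[gpa:gpa + len(data)] = data

    def read(self, gpa, length):
        """Read length bytes of guest RAM starting at gpa."""
        self._check(gpa, length)
        return bytes(self.mem[gpa:gpa + length])


if __name__ == "__main__":
    ram = GuestMemory(1 << 20)  # 1 MiB of guest RAM
    ram.write(0x1000, b"boot")
    print(ram.read(0x1000, 4))
```

Swapping the anonymous mapping for a file-backed one would give the "backed by a disk file" case from the list above with no change to the read and write paths.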

Linux integration
- Preemption (and voluntary sleep) hooks:  preempt notifiers
- Swapping and other virtual memory management:  mmu notifiers
- Also uses the normal Linux development model, including small code fragments, community review, fully open

MMU Notifiers
- Linux doesn't know about the KVM MMU
- So it can't:  flush shadow page table entries when it swaps out a page (or migrates it, etc.); Query the pte accessed bit when it determines the recency of a page
- Solution:  add a notifier for 1) tlb flushes 2) access/dirty bit checks
- With MMU notifiers, the KVM shadow MMU follows changes to the Linux view of the process memory map.
- Without this, a guest would be able to touch all user pages and the base Linux wouldn't know that those pages could be swapped out.
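
The notifier idea can be sketched as a plain callback registry. This is a toy model, not kernel code: the class and method names are hypothetical and only loosely modelled on the two jobs listed above (invalidating shadow entries on swap-out, and answering accessed-bit queries).

```python
class HostMM:
    """Toy stand-in for the Linux memory manager of one process."""

    def __init__(self):
        self.notifiers = []
        self.resident = set()  # host virtual addresses currently in RAM

    def register_notifier(self, notifier):
        self.notifiers.append(notifier)

    def map_page(self, hva):
        self.resident.add(hva)

    def swap_out(self, hva):
        # Tell every secondary MMU to drop its mappings before the page goes.
        for n in self.notifiers:
            n.invalidate_page(hva)
        self.resident.discard(hva)

    def page_recently_used(self, hva):
        # Ask every secondary MMU; each one clears its own accessed state.
        young = [n.clear_young(hva) for n in self.notifiers]
        return any(young)


class ShadowMMU:
    """Toy KVM shadow MMU that follows the host's view via notifiers."""

    def __init__(self, host_mm):
        self.sptes = {}        # guest frame number -> host virtual address
        self.accessed = set()  # host pages the guest touched recently
        host_mm.register_notifier(self)

    def guest_access(self, gfn, hva):
        self.sptes[gfn] = hva
        self.accessed.add(hva)

    def invalidate_page(self, hva):
        for gfn in [g for g, h in self.sptes.items() if h == hva]:
            del self.sptes[gfn]
        self.accessed.discard(hva)

    def clear_young(self, hva):
        was_accessed = hva in self.accessed
        self.accessed.discard(hva)
        return was_accessed


if __name__ == "__main__":
    mm = HostMM()
    mmu = ShadowMMU(mm)
    mm.map_page(0x7000)
    mmu.guest_access(5, 0x7000)
    print(mm.page_recently_used(0x7000), mm.page_recently_used(0x7000))
    mm.swap_out(0x7000)
    print(mmu.sptes)
```

The point of the design is that the host memory manager stays in charge: the shadow MMU never holds a mapping the host does not know how to revoke.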

Paravirtualization
- Not nearly as critical for CPU/MMU now with hardware assistance; highly intrusive
- KVM has modular paravirt support: turn on and off as needed by hardware
- Supported areas:  1) hypercall-based, batched MMU operations 2) Clock 3) I/O path (virtio) [The last is the most critical currently]
Native shadow page table operation is now generally more efficient than paravirtualization, so paravirt is rarely used with KVM today.

Virtio is cool
- Most devices emulated in userspace with fairly low performance
- paravirtualized IO is the traditional way to accelerate I/O
- Virtio is a framework and set of drivers:
  - A hypervisor-independent, domain-independent, bus-independent protocol for transferring buffers
  - A binding layer for attaching virtio to a bus (e.g. pci)
  - Domain specific guest drivers (networking, storage, etc.)
  - Hypervisor specific host support
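
A rough way to picture the buffer-transfer protocol is a queue that the guest fills and the host drains, with a single notification for a whole batch. The sketch below is a toy under my own assumptions, not the virtio ring layout; the class is invented, and the method names are only loosely borrowed from the guest-side virtqueue operations.

```python
from collections import deque


class ToyVirtqueue:
    """Toy buffer queue shared between a guest driver and a host backend."""

    def __init__(self):
        self.avail = deque()  # buffers the guest has made available
        self.used = deque()   # buffers the host has finished with
        self.kicks = 0        # notifications, i.e. traps to the host

    def add_buf(self, data):
        """Guest side: queue a buffer without notifying the host."""
        self.avail.append(data)

    def kick(self, host_handler):
        """Guest side: one notification; the host drains the whole batch."""
        self.kicks += 1
        while self.avail:
            buf = self.avail.popleft()
            self.used.append(host_handler(buf))

    def get_buf(self):
        """Guest side: collect a completed buffer, or None if there is none."""
        return self.used.popleft() if self.used else None


if __name__ == "__main__":
    vq = ToyVirtqueue()
    for packet in (b"a", b"b", b"c"):
        vq.add_buf(packet)
    vq.kick(bytes.upper)  # stand-in for the host-side device backend
    print(vq.kicks, [vq.get_buf() for _ in range(3)])
```

Three buffers cost one notification here, whereas an emulated device would typically trap several times per buffer; that batching is where the speedup comes from.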

There is a tradeoff between moving driver support into the kernel and keeping it at user level.  User level provides better security isolation, often with negligible performance degradation.  Communication over dbus typically takes about 60 milliseconds (did he really say ms, or microseconds?); either way, it introduces some latency.  The plan is to move the virtio drivers back into the kernel to measure whether there is any noticeable difference.

Question:  do InfiniBand and OFED (?) work?  IB just works and the driver functions, but would you have RDMA support right through to the target?  Answer:  it depends.  It is certainly possible, but it gets pretty complicated.  Without assigning an adapter to the guest, it becomes difficult to register a set of pages with the driver for direct DMA.

Xenner is a mode you can run QEMU in: an independent application that uses KVM.
- Emulates the Xen hypervisor ABI
- Much, much smaller than Xen
- Used to run unmodified Xen guests on KVM
It has been in development for quite a while and will soon be a part of QEMU directly.

- QEMU improvements and integration:  libmonitor, machine description
- qxl/SPICE integration
- scalability work:  qemu & kvm
- performance work
  - block:  i/o using linux aio
  - Network:  GRO, multiqueue virtio, latency reduction, zero copy
- Enlightenment - the ability to receive calls from Windows guests.  Hyper-V requires VT.

Main contributors:  AMD, IBM, Intel, Red Hat
Typical open source project:  mailing lists, IRC
More contributions welcome.


At 10:43 AM, Anonymous said...

Is there an official feature list or roadmap of implemented versus planned features for KVM? A time-indexed list would be nice. It is hard to compare to Hyper-V or Xen or even vSphere without this. Did the official KVM website remove the only link to a roadmap that was there?


