Wednesday, September 05, 2007

Kernel Summit: Mini Summit Readouts

The first mini-summit readout was on Linux power management, from Len Brown. Suspend to RAM is the primary poster child. The most visible problem area is often video support. The video restore process differs from that of, say, Windows, which adds some interesting challenges. Keith Packard is helping with the Intel drivers in this space. ATA might be functional in 2.6.23, although it is not enabled by default. The distros are currently enabling it, however. People are now concerned about how fast suspend to RAM is, with OLPC seeing +90ms for USB resume (Greg KH just pointed out that that was fixed; Len hopes to hear confirmation from Marcelo). There is also a video sync issue on resume, as well as an audio "pop" issue. Device power management is also currently "joined at the hip" with the hibernate implementation, which is not necessarily a good thing. Andrew Morton asked who maintains suspend to RAM, and Arjan asked whether there was a design for it. Len replied that "the community" maintains suspend to RAM, and that there is a document describing parts of it. However, Len pointed out that suspend to RAM support is very under-resourced today. Andi Kleen pointed out that this is in part a driver problem, since it requires support from every driver in use on a given piece of hardware. A maintainer with a clear vision for how suspend to RAM should work would be a great thing.

Suspend2 is now officially TuxOnIce and is out of tree. There are no plans for it to come back into the tree, but it is very popular with some people. It reportedly has a friendlier community for support, which might be a reason for its popularity. Rafael is actually doing a fair job of supporting suspend to RAM today, according to Linus. However, getting Radeon drivers to suspend and resume correctly is a crap shoot because of the complexities of the firmware, drivers, etc. Arjan is arguing strongly for some sort of HOWTO - there is some push back, but in general Arjan's point is to document what is working today and let the

cpufreq: P-states break "idle accounting"; fewer governors is better today. Dynamic Power Management (DPM) has been out of tree, although there is no longer general disagreement about the approaches within the DPM community.

Ted gave an update on the Filesystems BOF that was held at FAST in February. Most of the session was a readout of the work that people were doing. About 50% of the attendees were research focused as opposed to Linux developers. One result of that mix was some direction for the research folks on the key issues that need to be addressed. In this space, there was some progress on the unionfs capabilities. On the downside, there wasn't enough time to really dive into relevant Linux topics. A full write-up is available at the Usenix site.

James Bottomley represented the remainder of the IO summit, which included some discussion of problems in Fibre Channel (I missed the details), and a discussion of upcoming new technologies such as disk drives fronted by solid state/flash devices. There was some concern about performance and how quickly the flash devices might fail. The disk guys were saying that the disk driver sector interface was preventing the underlying firmware from doing content motion based on hot spots, bad blocks, etc. They were hoping to redesign disk drives to support objects instead of being sector based. This has large impacts on things like RAID, for instance. It would also obviously break all elevator algorithms in use today, since it would be impossible for the OS to determine where any two blocks are on disk. The technology is still about 5 years out from a disk-only point of view, although it is in RAID drives today. NFSv3 was implemented on a disk drive as a project once, per Alan Cox. There is some expectation that NFSv4 or pNFS will be implemented first on drives, before a pure object model.
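To illustrate why elevator algorithms depend on knowing where blocks sit on disk, here is a minimal Python sketch of a one-way elevator ordering. It is a toy model and not the Linux I/O scheduler: the function names, the sector numbers, and the assumption that seek cost is proportional to the distance between sector numbers are all made up for illustration.

```python
def elevator_order(pending_sectors, head):
    """Order requests as a one-way elevator would (hypothetical helper).

    Requests at or beyond the head position are served in ascending
    order first; the remaining ones are served on the next sweep.
    """
    ahead = sorted(s for s in pending_sectors if s >= head)
    behind = sorted(s for s in pending_sectors if s < head)
    return ahead + behind


def total_seek(order, head):
    """Sum of head movements, assuming cost grows with sector distance."""
    distance = 0
    for sector in order:
        distance += abs(sector - head)
        head = sector
    return distance


# Illustrative request queue and head position (not measured data).
requests = [95, 10, 50, 70, 20]
start = 40

fifo_cost = total_seek(requests, start)
sorted_order = elevator_order(requests, start)
elevator_cost = total_seek(sorted_order, start)
```

The sort key here is the sector number, which only helps if sector numbers reflect physical position. With an object-based drive that moves content around internally, the OS would have no such key to sort on.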

Martin Bligh provided an update on the VM summit held on Monday. The first observation was deja vu - many topics were covered a year ago. Realistic benchmarks are still difficult to find, and it is hard to get repeatable tests if swapping is involved. Page replication was discussed, as was slab cache defragmentation. One continuing proposal was to split dentry inodes from files in the dentry cache (okay, I think I missed something here), but this is always harder in practice than it is in theory. The anti-fragmentation code is now likely to be merged. Larger order page allocations were discussed for several reasons, though primarily for large filesystem blocks on disk. Containers was another large topic, including how applications interact with each other on a single machine. Google (Paul Menage) has done one solution; Balbir has another, which is a bit more complex but is probably a better long-term approach. Another topic was the complete removal of ZONE_DMA and adding a similar capability more closely tied to actual hardware requirements.

Avi Kivity next provided an update on the virtualization mini-summit. Lguest, LinuxonLinux, KVM, UML, VMware, Xen, x86 vendors, s390, ppc, and ia64 were all represented. The running joke was that long explanations of x86 functionality requested by the s390 people usually ended with the comment "oh, I understand now, we have an instruction that does that" ;)

The first topic was performance, and how the hypervisor needs to present NUMA characteristics to the guest, realizing that those characteristics can change at run time. Also, cooperative paging/hinting, e.g. the CMMS patch set, would be useful. And the group noted that hardware is advancing, including NPT/EPT, which solve the shadow page problem, vmexit time reductions, and several targeted optimizations.

Another topic was the interfaces for guest/hypervisor communication. The most important point here is that guest/hypervisor communication must be done via physical pages (the guest and hypervisor typically don't share virtual page mappings which could be used for communication).
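As a rough illustration of that point, the following Python sketch models two address spaces that map the same physical frame at different virtual addresses. It is a simplification under stated assumptions: the `AddressSpace` class, the page table contents, and the frame numbers are hypothetical, and real hypervisors also distinguish guest-physical from host-physical addresses, which is left out here.

```python
PAGE_SIZE = 4096


class AddressSpace:
    """Toy address space: maps virtual page numbers to physical frames."""

    def __init__(self, frames, page_table):
        self.frames = frames                # physical memory, shared by all
        self.page_table = dict(page_table)  # virtual page -> frame number

    def _locate(self, vaddr, length):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if offset + length > PAGE_SIZE:
            raise ValueError("access crosses a page boundary")
        return self.page_table[vpn], offset

    def write(self, vaddr, data):
        frame, offset = self._locate(vaddr, len(data))
        self.frames[frame][offset:offset + len(data)] = data

    def read(self, vaddr, length):
        frame, offset = self._locate(vaddr, length)
        return bytes(self.frames[frame][offset:offset + length])

    def frame_of(self, vaddr):
        return self.page_table[vaddr // PAGE_SIZE]

    def vaddr_of_frame(self, frame):
        for vpn, mapped in self.page_table.items():
            if mapped == frame:
                return vpn * PAGE_SIZE
        raise KeyError(frame)


# Four physical frames; both sides map frame 2, at different virtual pages.
frames = [bytearray(PAGE_SIZE) for _ in range(4)]
guest = AddressSpace(frames, {5: 2})
hypervisor = AddressSpace(frames, {9: 2})

# The guest writes through its own mapping, then names the *frame*.
guest_vaddr = 5 * PAGE_SIZE
guest.write(guest_vaddr, b"hello")
shared_frame = guest.frame_of(guest_vaddr)

# The hypervisor resolves the frame through its own mapping and reads it.
hyp_vaddr = hypervisor.vaddr_of_frame(shared_frame)
message = hypervisor.read(hyp_vaddr, 5)
```

The guest's virtual address would be meaningless to the hypervisor in this model; only the frame number identifies the shared page on both sides.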

Paravirt_ops allows more or less of the guest to be paravirtualized. All solutions find time to be a common issue, and hardware is unable to help with this.

A major thread was that the virtualization solutions need to MERGE - being out of tree makes use of the capabilities and validation of the features nearly impossible.

And then on to lunch! ;)


