Wednesday, September 05, 2007

Linux Kernel Summit 2007, Day 1, Cambridge, UK

After a pretty uneventful flight (always the best kind) and an easy 2.5 hour bus ride from Heathrow, I'm here and rested. Jonathan will likely have some good coverage of the mini-summits, linux.conf.eu, and UKUUG, which were held here just prior to this year's kernel summit.

Ted introduced the summit as the first one outside of North America, and described the schedule as a bit more upleveled than usual. As usual, Ted pointed out that the schedule, location, and content are always open for discussion as the program committee is always trying to make the event as useful as possible.

The first panel was moderated by Greg KH, discussing how the distributors are working with the kernel community, and how they are satisfying end user needs. The first long thread of discussion debated the long cycles for delivering enterprise kernels to end users (2-3 years from a feature landing in mainline to a kernel/distribution with that feature in the hands of end users, coupled with the fact that most customers test internally for 3-12 months before deploying a kernel). Greg KH proposed updating the Linux kernel version more frequently (yearly? twice a year?) within an enterprise distribution. Dave Jones pointed out that just updating the kernel often requires updating a number of user-level packages as well, which increases the time and effort for updating the distribution. The downside of frequent updates is that the amount of kernel regression testing needs to be increased pretty significantly. This might mean more testing on Linus' tree, Andrew Morton's -mm tree, etc.

The major theme is how we get better testing, better review of new code, and ultimately features to end users faster, with a higher level of quality and fewer regressions. In other words, more, better, faster. ;) While the enterprise kernels are claimed to be very close to mainline (some key differences from some Enterprise kernels are: Xen, SystemTap, AppArmor, Real Time, utrace, module signing, Novell debugger, some NFS code, etc.), they are unfortunately typically close to an *old* release of mainline, usually 6-18 months behind current mainline. And there are pressures to include capabilities in distributions ahead of mainline for additional vendor and distributor value. These competing pressures - primarily stability and additional features (quickly!) - are fundamentally at odds with each other.

Ingo Molnar pointed out that customer upgrades are a very emotional event - the fear and uncertainty of an upgrade balanced against the gratification of new features and capabilities. kABI has some validity as an emotional balance for perception of stability to existing customers.

A sub-thread addressed hardware platform availability as a distinct problem from some of the additional capabilities offered by distributions. In other words, splitting up new features from new hardware might be an option. But then Greg KH quickly shot that down, suggesting that the two problems are too similar to break up and the solution is likely in the same space.

Generally speaking, the problem was recognized, but the only potential solution discussed was to update kernels a bit more frequently in the enterprise version of the distributions, e.g. every 6-12 months or so.

Dave Jones has a list: Myth: Moving to an upstream kernel magically fixes everything. Each Fedora release has about 500 open bugs, and there are about 1500 open bugs for released kernels; those bugs do not match the 1500 bugs in the kernel bugzilla. Some bugs are isolated to bizarre hardware that is hard for most people to pick up. However, many just need a good developer to look into those defects. Problems are very seldom analyzed for root cause, and reporters are often asked to "retry with the latest release" just to see if the problem magically went away. For instance, 2.6.22-rc5 doesn't even boot on Dave's laptop, i.e. a regression. SATA tends to be especially bad right now. Suspend/resume works/fails on an almost alternating basis. Laptops are sufficiently distinct that a fix for one laptop often breaks another laptop.

General comments centered around the fact that there is not enough focus on defects in general in the kernel community.

Debian - drivers are the biggest problem. There is a much wider variety of ethernet, wireless, and video cam drivers that are not accepted in mainline, along with unionfs and squashfs (maintainer was scared off).

Linus strongly advocates getting even crappy drivers into the mainline tree as early as possible. Alan Cox pointed out how buggy that can make the distribution. Linus' point centers on the fact that the code is much more public if it is in the tree; Alan's point is that the code is never going to get cleaned up and will harm everyone else in the process. Linus is focused on those drivers that are going into distributions *anyway*, so why not get them into mainline? The community then has a better chance of fixing them. Wireless drivers should be in mainline; they are sitting in Dave Miller's tree at the moment.

Deepak, representing the embedded side, doesn't want to put drivers right into kernel.org, because in many cases the code is written for multiple OSes and is so far from kernel coding standards that it is almost completely undebuggable. While his point was recognized, Linus reiterated that getting things into mainline is the best way to get the code cleaned up, and Greg KH reiterated that there are 85 kernel developers just waiting to help maintain drivers in mainline. One of the key problems is that many drivers go with either very, very old hardware or very new hardware, most of which is not available to most people. Greg KH has offered to help with any drivers that are out of tree and need to make it into the kernel.

1 Comments:

At 4:38 AM, Blogger Frank Ch. Eigler said...

> While the enterprise kernels are claimed to be very close to mainline (some key differences from some Enterprise kernels are: [...] SystemTap [...]

Please note that this is a mistake: the RHEL kernel contains no systemtap-specific patches.

 
