Friday, September 21, 2007

Linux Kernel Summit 2008

Well, it isn't official yet, but it looks good that the Linux Kernel Summit will be in Portland in 2008. The program committee for the kernel summit is pretty much unanimous that co-locating the conference with the Linux Plumbers Conference (covered in an earlier entry) is a good idea. This should help out the Plumbers Conference as well as encourage collaboration across the various projects.

Growing Linux Desktop market share

I have been reading this article over at Information Week over and over for the past few days because it bugs me. It bugs me for several reasons, the most important being that part of it is right. Now, I think a few of the "7 reasons" are bogus and irrelevant, but a few are close to correct. And the premise that this is not the Year of the Linux Desktop, and that next year is probably not the Year of the Linux Desktop either, is valid. It won't be the Year of the Linux Desktop until a few things change. The good news is that some of the things that need to change have started or are slowly inching forward. But there are still a few issues touched upon in that article that have not yet been addressed at all.

First, let me do a quick (very quick) commentary on the key points from that article, followed by a few thoughts on what is missing. Alex Wolfe's 7 points are (slightly simplified):

1. Prohibitive application porting costs
2. Fanboy alienation
3. You can't make money on the operating system
4. Resistance from average users
5. Linux is "simple"; Windows "just works"
6. There are way too many Linux Distros
7. No Powerful Evangelist

I think of these, the three that are most important are Applications (and Porting), the "simple" vs. "just works" distinction, and, a bit, the comment about too many Linux Distros. I'll come back to those three in just a moment.

With respect to Fanboy alienation - most end users don't deal with Fanboys, or, if they do, those Fanboys are just as prevalent for any OS, any software, any hardware. And face it, most of these fanboys are die-hard geeks who learn their social skills in Second Life. I don't think that is helping or hindering Linux in any material way. As far as making money on the OS - OSes haven't been a profit center for anyone *except* Microsoft for over 10 years now, maybe closer to 15. In the days of the Unix wars, the investment in the OS was high but the derived value was always in the enablement of hardware or applications. That surely hasn't changed. As for resistance from average users, there may be a bit of that, because people resist paradigm changes that don't add value to their lives. And, in many cases, Linux provides "equivalents" for Windows applications, not clearly superior applications. And, of course, there are many applications missing or simply hard to find/identify - especially when free software doesn't wind up in boxes that people can pick up at Costco or Walmart saying "here's how to do your taxes". And, as far as evangelism goes, Linux grass-roots evangelism seems to be as powerful as any force out there. And, of course, the Linux Foundation is working to provide a focal point for some of that evangelism as well.

However, the three points that I believe are valid have slightly different core issues. Yes, the absence of common, key applications on Linux is a challenge. I still struggle with my Linux x86-64 platform because Sun's Java plugins for Firefox are not available, Apple's QuickTime isn't available, Macromedia's Flash player isn't available, etc. Yes, there are workarounds, but they are not bundled in a single "yum" package with dependencies that allows me to "just make it work". But there are other applications that don't provide native Linux ports today - TurboTax, Photoshop (sure, the gimp is more powerful, perhaps, but as a photo editing novice, I was able to do more with Photoshop in a few hours than I've been able to do with the gimp in several days), and other desktop software. On the other hand, nearly every major application that runs on Windows servers has a strong equivalent or a direct port to Linux: databases, web servers, PHP, perl, python, backup/restore software, etc.

I think the real problem here is that interactive applications may be a bit more difficult to write for Linux, in part because things like sound, graphics, panels for print, print preview, document layout, etc. have so many options for the developer to choose from. The Portland Project is helping to standardize many of these with the goal of simplifying the writing/porting of desktop applications. This is one excellent activity that came out of OSDL/The Linux Foundation pulling together the various Linux Desktop Architects and a number of desktop ISVs. Of course, reaping the full benefits from this activity is still a year or two out. I could drone on more about the challenges here and some of the other activities (like the Linux Standard Base) but I'll leave those for another time. The summary, though, is that there are some challenges in writing/porting desktop applications to Linux, although usually it isn't porting cost that stops companies from having Linux applications, but the ongoing maintenance and support costs driven by diverging platforms, changes in the underlying environment, and the QA that needs to go into each release.
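
To make the Portland Project's goal a bit more concrete, here is a minimal sketch (my own illustration, not something from the article) of the desktop-neutral style it encourages: hand a URL to xdg-open from the xdg-utils tools and let whatever desktop the user runs pick the right application, rather than coding against GNOME or KDE directly. It assumes xdg-utils is installed.

    /*
     * Minimal sketch: open a URL with the user's preferred application
     * via xdg-open (xdg-utils), instead of calling desktop-specific APIs.
     * Assumes xdg-utils is installed; error handling kept to a minimum.
     */
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        int rc = system("xdg-open http://www.example.com/");

        if (rc != 0)
            fprintf(stderr, "xdg-open failed or is not installed (rc=%d)\n", rc);
        return rc == 0 ? 0 : 1;
    }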

I see a pattern between Linux and IBM internally that parallels Alex's "simple" vs. "just works" argument, although I'd phrase it a bit differently. Linux (and IBM) have *great* technology that can solve nearly any problem and address nearly any workload. However, translating that technology into desktop value or business value requires an expert in those technologies. IBM has started delivering "Express" solutions which provide a reduced but more accessible set of functionality, and in some sense, that is what the various distros try to do for a given target end user. But that leads to the last point that I wanted to address, and what may actually be the most important overall.

I believe there *are* too many Linux distros. Well, maybe not "too many" exactly, but too many that are too different in all of the minutiae that matter to application writers, end users, systems administrators, etc. And most of those differences do not add any value to any of the distros - pulling in a slightly newer version of one library here that is incompatible with some other application or library that has several new features, or changing the location of repositories or the directory structure to match one distro's view of "the best" location. Many of those types of changes wind up creating fragmented support forums on a per-distro basis (how do I get my TV tuner card to work? Well, what distro and what version of that distro are you running?). And this proliferation of distros makes it especially hard for the ISVs writing software to know how to adapt to a given distro. Sure, the LSB helps to address some of that, but the LSB is like a color-by-number book, and to be honest, most cool applications are closer to Van Gogh (or maybe Picasso) when it comes to creativity, and that creativity often requires capabilities beyond the LSB.

My radical answer to this problem, which might also solve some of the desktop adoption issues: start with a single, common distro as the core for those hundreds of other distros. Something like Fedora or Ubuntu, which are already pretty common. What if Fedora were the basis for OpenSuSE and Ubuntu, for instance, and then those distros added their value *above* the basic capabilities that most ISVs port to? As pointed out above, most of the desktop distros are given away for free anyway - typically there is no value returned for all that investment in the desktop. The real value is still being added above some basic/core level. Why not consolidate on a single base building block, give up the rat race of adding negative value at great development cost across many independent distros, and focus on the value add above that line?

[John Cherry had some excellent comments (below) and also posted an article on his blog at the Linux Foundation which is worth reading!]

Labels:

Thursday, September 13, 2007

New Linux Conference: Linux Plumbers

The word for a while has been that Portland is a hotbed of Linux and Open Source developers. It looks like they have decided to put that concentrated open source expertise to a new use: a conference dedicated to developers, with the goal of collaborating around some of the remaining hard problems. Oh, and, no, the conference is not exclusive to Portlanders or web-toed Pacific Northwest denizens. In fact, the hope is that the conference will bring developers into the Northwest to draw on the rich expertise available and also provide access to a wider cross section of developers than is present at some of the other conferences.

One might ask, though, is one more conference one conference too many?

My answer: I don't think so. There *are* a lot of conferences and events out there today: LinuxWorld Expo (for C*-level execs, possibly dying as Linux reaches mainstream corporate adoption), the Ottawa Linux Symposium (hitting its 10th anniversary, going strong as a place to present current work as a user, a developer, or a systems administrator), the Linux Kernel Summit (invitation only, Linux kernel centric), Linux.conf.au (a great conference, but hey, it is way over there down under and all!), the new Linux.conf.eu (great for the Europeans, if still a little small), and Linux Kongress (supplanted by the Kernel Summit and Linux.conf.eu this year, back again in 2008).

Whew, that's a lot of Linux boondoggle travelling for the well-funded Linux (or Open Source) geek. However, note that the only one of those in the US is for people that probably never even install Linux, but ultimately decide if it has business benefits for them. The only other one in North America is targeted rather broadly at presentations of current activities in the kernel and nearby user level, often by the experts in those fields. While OLS provides BoF (Birds of a Feather) sessions for people to chat about common interests, there is no drive or mission to actually converge on solutions at OLS (although that does happen on occasion).

What the Plumbers conference seems to provide is a forum for what I refer to as the "mini-summits" - places where developers can get together to work out key issues. This seems to be a new need and a new phenomenon in Open Source, where email, IRC, and mailing lists are king. However, not all issues can actually be resolved in forums where the loudest or most prolific people can set the opinion for a group, or where some issues cycle without ending because the collaboration mechanism doesn't seem to drive consensus on tougher issues.

The Plumbers conference should allow developers to talk face to face and work through some of those harder issues, much like often happens inside of companies doing proprietary development today. If you've worked in a proprietary environment, especially one where the development team is local, you've certainly seen the value of pulling the whole team together to hash out issues. I think setting up a scenario where the community can do the equivalent of the "team meeting" - especially for those cross-project issues - can only help with the overall integration of Linux and increase the value to end users.

Great blog on the Legal mechanisms in place to protect Linux

Andy Updegrove posted a good blog with the reasons behind the upcoming Linux Foundation sponsored legal summits. I highly recommend reading it. It demonstrates one of many methods in progress designed to better develop the legal ecosystem around Linux and to clarify the general legal status of Linux and Open Source. I happen to work at IBM, and as many know, IBM seems to hire more lawyers than just about anyone else. And it tends to hire some very skilled people into those roles. I think many of these lawyers have been in the vanguard of understanding the implications of various open source licenses and their potential impacts on IBM's business. Yet I occasionally encounter lawyers who haven't had the pleasure of working with open source and are at least initially confounded by the differences in the various licenses which surround open source code.

And, having seen how these new lawyers come to terms with those issues, often with input from seasoned lawyers and experienced technical and managerial folks, I occasionally wonder how a lawyer without that support network would come up to speed on this complex new world that open source provides. From what I've seen thus far, new doctors of jurisprudence rarely get in-depth training in school about a ground-breaking license like the GPL. And, with the variety of opinions and interpretations often held by all of the armchair lawyers that the GPL has created, as well as the political and business pressures around open source, I'm not surprised when lawyers take the "safe" route of simply recommending "no" when open source comes up.

However, for those businesses that take that simple "no" response and choose to stay out of the open source world, I believe that in many cases they will artificially limit the success of their business operations. Picture, for instance, a world where IBM in 1999 said "no" to Linux because we were unable to understand the impacts of this upstart GPL license. Sure, someone else would have sooner or later endorsed Linux, in part because the allure to business is so strong, but also because an open-minded lawyer actually can come to terms with the GPL. And, to be honest, the legal environment really isn't any more complex than the normal legal wranglings that any large company goes through - it is just that the rules are a bit different, and that in the end, all of the sources released are visible to anyone that cares to look. The extra scrutiny scares lawyers (and sometimes developers who are worried about code quality ;-) but in the end, that extra scrutiny also helps ensure that everyone stays within the terms of the respective licenses.

The Linux Foundation's Legal Summits will hopefully blow away some of that fear, uncertainty, and doubt - that smoky haze of confusion surrounding open source for those that are less experienced - and allow the legal teams of companies using open source to make informed decisions. Beyond that, it will help companies set up appropriate isolation to prevent cross-contamination of source bases when using incompatible licenses (most companies do this today even for non-Open Source licenses) and to understand their rights and obligations when using open source as consumers.

All in all, I see these Legal Summits as a great value add to the Linux ecosystem!

Linux Kernel Summit: Customer Panel slides

The slides for the three presentations made at the Linux kernel summit are available here:

Sean Kamath, Dreamworks

Head Bubba, from Credit Suisse
Markus Rex, The Linux Foundation.

They provide a bit more detail than what I included in my blog. Sean also added comments in the powerpoint/openoffice version which I'll try to put on the site as well.

Amanda at the Linux Foundation also put up a short blog entry about the event.

Labels:

Tuesday, September 11, 2007

Microsoft disses Google on code quality

Okay, I just ran into this article on Google which gave me a couple of laughs. The first was this quote: "...companies should question the actual number of users Google has 'within the enterprise'". Well, I *think* Microsoft was suggesting that there might not be many Google users within an enterprise, and if so, someone must be seriously deluded! I don't know numbers or anything (although I'm sure they are out there) but I bet there are close to as many "Google users" in an enterprise as there are Microsoft users. Of course, many are probably using primarily the search capabilities, but this question could backfire on them if they aren't careful. The other funny point was to beware of 'Google's history of releasing "incomplete products, calling them beta software"'. Well, Duh, wake up and smell the coffee! There is a new trend of people who, for non-business critical apps, are more than willing to use some classes of software before they are "done" - and, in most of those cases, people have the benefit of requesting features and bug fixes which actually get applied in a timely fashion, rather than in a product to be released in three years.

Now, I'm not saying evolutionary development of software with rapid deployment, a la an Agile or Lean development method, is right for all software or all business uses, but hey - the waterfall model is dead, and getting user input during the design and development phase of a product, with rapid responses to that input, leads to better products, sooner! Being offended because the world is moving on to something more dynamic is not going to win the hearts and souls of businesses that are driven to quickly provide the best services to their employees and customers.

The real issue here is that an old product development method is being challenged by a new development model. And, while I expect that both models will coexist for a long time, anyone burying their head in the sand and dissing the new model because the old is "better" needs to wake up. Now.

Labels:

Monday, September 10, 2007

Linux Kernel Summit: Future Directions

The last formal session of the kernel summit looked at the logistics for the next summit. There was quite a bit of discussion about the incestuous nature of the program committee, the elite attitude that the summit appears to create, and some question of the pedigree of some of the vendors and committee people on the kernel summit discussion list. It makes for amusing reading at times, but some of the concerns needed to be raised and discussed in person. To help improve the neutrality of the discussion, Ted Ts'o introduced the topic but then asked Dave Miller to moderate. Dave happens to be among the most highly respected people in the core community; he was not on the committee the past couple of years, but he has served on it before and understands the challenges that such a committee has. As such, he was able to moderate with an eye towards representing the community point of view. I captured a number of the questions/comments raised during that session. Not all questions raised had answers, but many did, and the net summary is that the program committee is actually doing a good job at pleasing most people most of the time (the highest achievable goal, in many people's eyes ;).

Here are some of those comments, captured mostly via stream of consciousness discussion.

Process Issues vs Technical Discussion
- How was this year? Too much on process issues?

- Holding BOFs in the evening was a bit much
- Is three days a good idea: NO
- BOFs in mid day would be good
- Process was good to have here.
- This has been far more productive than it has been in several years
- Linus still likes the customer/user feedback
- Linus might like to have more (two) different customer sessions?
- Linus really likes the feedback on the process stuff here/face to face
- Hugh strongly agreed on the customer panel.
- Dave Miller pointed out that the customer panels are often
so negative; hearing both good and bad is good
- Benh found it to be amazing

Number of attendees: too large? Too small?
- No larger
- 12 person committee seems large
- Ensures less bias in voting
- Provides shepherds for some of the topic areas
- Most would be invited anyway - Dave Jones doesn't think the
auto-invites are bad
- the invite-only nature causes issues
- Limiting the invitations seems critical; otherwise too many people,
lots of vocal fans
- Elitism versus openness issue
- Co-locating with another conference adds value in conjunction with
invite-only
- Important discussions seem to still happen in the short breaks
- Proposal for some smaller discussion groups/parallel panels
- Counterproposal - no small groups (hch)
- Mini-summits before the conference
- HCH strongly believes this should not be panels and lots of
people but should foster cross-pollination.
- linux.conf.eu - several attendees who weren't burned out by KS
- KS + OLS usually leads to burnout by end of OLS
- Some went to a pre-minisummit + KS
- Alan Cox - much smaller and you have just "the usual suspects"
and then nothing useful gets done.
- Quite a few new faces - first timers, which is viewed as good
- 5 slot lottery of Maintainers is viewed as a good thing
- Should there be more people here representing lower levels of
user level, e.g. glibc? (they were invited and didn't show)
- After the first round, consider an appeal process for round #2,
per Steve Hemminger?
- Andrew Morton - it is easy for people to fall through the cracks
- Matt Mackall - 2,000 people on the committer list - did we invite
enough of them?
- Andrew - should we consider moving to every other year? Or Dave
Howells - maybe every 18 months.
- Holding this every other year has a corporate downside - it might
fall out of budget.
- Russell King - how well is embedded represented? About 10 people.
Would also like to see some other well-known ARM/embedded
people present (Cambridge is the home of ARM, hence would have
expected more people here).
- Should take into account, a bit, whether someone is local, including
perhaps local funding.
- David Howells - going to Japan might bring in local embedded developers
then, for instance.
- At LCA there will also be a kernel mini-summit.
- Generally people seem to be pretty happy with the program committee.

Organizing committee:
- Picked to have adequate knowledge/coverage of kernel subsystems,
corporate/distro funded work, etc. to pick invitees and topics

- How is the PC chosen? Ted has been chairing and has picked the
program committee. Ted is trying to pick people who have good
coverage of all of the different parts of the kernel, as well as
providing some corporate insight. People who contribute actual
work to the program committee are more likely to get invited
back. People who are interested in being on the program committee
need to let Ted know.

Location next year?

(Here Ted showed some statistics from a survey he did around the time of this year's OLS; the numbers below are from those statistics):

Will people attend KS in the US? 12% object, 88% do not object
Will people attend KS in Canada? 3% object, 97% do not object
Should we co-locate KS with another conference? 10% very important, 46% somewhat important, 44% does not matter
Location is much more important for sponsors who care about travel budget.
Approximately 87/188 responses (46% response rate)
How many are likely to attend OLS? 18% will, 27% likely, 21% attend if co-located with KS, 31% not likely to attend, 4% will not attend

Possible future locations, ranked by interest:
Vancouver
Australia
San Fran
Portland
Seattle
San Diego
Boston
Ottawa

No European location was listed (there was a request to ping-pong between Europe and the US).
We talked about India/Asia/South America as options; very few attendees come from there, so it would be long-distance travel for everyone. Dave Miller strongly objects to ruling out China, for instance.

Possible locations considered (including co-location with another conference):

Ottawa w/ OLS (3rd week in July, 2008)
Portland w/ Plumbers Conf (September 2008) - Kristin
Someplace else? LCA Early 2009

Strong votes for going with FOSS in Bangalore
Also recommended China + some conference.

During all of this discussion, Dave Miller was a very strong advocate of taking the kernel summit and/or developers to Asia (China, Japan, Korea, etc.) as well as India, South America, and other developing areas to help build a strong global contingent. The general leadership consensus seems to be that there are huge pools of talent to be tapped in those areas, as well as significant Linux usage, but that development and use of Linux there are not well represented by the current activities on linux-kernel. Building face-to-face relationships usually helps expedite and encourage this sort of contribution and should be highly encouraged through the kernel summit, related conferences, individual developers participating, and the Linux Foundation's outreach program.

And that took us to the warm beer (did you know you could find Budweiser in Cambridge, UK? I thought England was more or less famous for either its beers or its beer drinking, but come on! Budweiser?!) and then the photo session.

I also recommend checking out the coverage on lwn.net by Jonathan Corbet - he brings insights to his reporting that are well worth the subscription fees for early access, and, if the trivial cost is too great, all of his articles become public after a week. (Shameless plug for someone whose editorial and reporting skills I greatly respect!)

Labels:

Linux Kernel Summit: Andrew Morton on Kernel Development

Andrew used a session at the kernel summit to provide general advice to Linux kernel developers. Most of the advice was targeted at the current process and focused on the representative leadership at the summit, although it should be heeded by anyone submitting code to the Linux kernel. Linus frequently jumped in with his own advice and support for Andrew. Here are a few of the excerpts that I captured.

Quilt/Git tree maintainers - please re-sync your tree as soon as you merge with Linus' tree.

At last year's kernel summit, maintainers agreed to send out a one-page summary of what they expected to submit in the next release. Kudos to Roland for actually doing this. The rest of you, please step up to the challenge.

Large patches like code cleanups, white space fixes, lindent runs, code motion, etc. are proposed to go in right after -rc1. When those are applied and cleaned up, that will probably lead to -rc2. The x86 transmogrification would not be a good example for this, though.

Akpm sends out patches to subsystem maintainers, of which only 30% or so are applied or rejected by the maintainers. Roland requested more clarity in the directions with those emails. Andrew clarified that a cc: means review it, while a To: line to the maintainer means merge it or approve/reject it.

Andrew prefers to merge his 1000 patches *after* submitting the subsystem maintainer trees. Everyone thus usually gets two weeks, and Andrew gets 24 hours to merge 1000 patches. Linus is tired of subsystem maintainers asking for merges in the last week (last day?) of the merge window. Subsystem maintainers who claim that there are conflicts in the merges are usually wrong, and the merge should not take them so long. Linus wants subsystem trees (especially git trees) in the first DAY of the merge window.

Linus also really hates having people say "please pull my git tree" during -rc2 or -rc3. He tends to flame extensively for such travesties, primarily because the alternative is to just not pull the tree in silent frustration. In other cases, Linus will silently stop pulling trees if the requests come outside the merge window. Subsystem maintainers should be enforcing this just like Linus does. Linus was *quite* blunt in his assertion of this policy and gently encouraged (with a large hammer and four-letter words) that people help enforce it.

ACPI maintainers received kudos for hitting the merge window perfectly in terms of the previously described policy. The networking maintainer (Dave Miller) has been, um, a little more iffy lately, Linus said, somewhat tongue in cheek. All rules, of course, should be interpreted within the guidelines of common sense. Linus suggested that white space cleanups and such should then go into the end of the merge window.

Andrew pointed out that the -mm trees are now once every two weeks or so, as opposed to two a week. He plans to step that rate up again. Mel finds that the -mm tree is usually stable within 2-3 days and can be run on his machines by then. Andrew had promised to try to get daily snapshots out originally and gave up on it because the patchsets were actually seldom stable until he was ready to publish. Ben H. was asking for access to the patches that were going into -mm more frequently, even if the full tree was not functional. The -mm-commits list contains all of these patches as they are applied to the -mm tree.

Andrew believes that there should be some way for the kernel community to deliver user-level applications to kernel users. There are about 15 .c files in the Documentation directory. It is not yet clear what the boundaries for inclusion in such a code directory would be. The util-linux package used to be the place for all of this, per Christoph Hellwig. Christoph will help move all the .c files from Documentation into util-linux. Balbir pointed out that there was some example code in Documentation; Christoph pointed out that that was the wrong place for it to be, and felt that util-linux would be the right place even for basic examples. The maintainer is now Karel Zak (from Red Hat), who forked the previously unmaintained util-linux; the fork is called util-linux-ng.

Andrew was wondering if we were doing a good job of getting things back into the stable tree. He thinks that he is sending about 90% of the right stuff to the stable tree. Greg KH recommends adding stable@kernel.org to the cc: list of any bug fixes that are sent to Linus/Andrew so that they will see when these bugs are committed. There are 80-90 subsystem maintainers - Andrew encourages all of them to be considering which patches should be going to the stable tree.

Semantic checkers such as sparse and Coverity are good at finding problems; however, some people are doing code motion rather than intelligent evaluation of the code. His example: tests for NULL at the end of a function often get moved up in the function, which may be correct and which shuts off the sparse/Coverity warning, but in reality the ideal patch may be to remove that test entirely if that case really cannot happen. In other words, use intelligence when fixing problems found by automated tools!
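
A hypothetical illustration of that point (my own sketch, not code from the discussion):

    /*
     * Hypothetical example of Andrew's point.  A checker flags the late
     * NULL test below because "w" is dereferenced first.  The mechanical
     * "fix" is to hoist the test above the dereference; the thoughtful
     * fix, if callers guarantee w is never NULL, is to delete the test.
     */
    struct widget {
        int count;
    };

    int widget_count_buggy(struct widget *w)
    {
        int n = w->count;       /* dereference... */

        if (w == NULL)          /* ...then the test the checker flags */
            return -1;
        return n;
    }

    /* Thoughtful fix: callers never pass NULL, so drop the test entirely. */
    int widget_count(struct widget *w)
    {
        return w->count;
    }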

Labels: ,

Linux Kernel Summit: Containers Update

Containers are a significant new development activity in the Linux kernel today, and the work has drawn together a wide variety of independent implementations (out of kernel, today) working towards a common goal. Representing the broader community for the Kernel Summit update were Paul Menage of Google and Eric Biederman, known previously for his work on kexec and kdump, amongst other things. The discussion started with some highlights about containers, including the rather well-agreed-upon concept of resource namespaces. Namespaces provide a means of grouping various resources, such as process (task) IDs, IPCs, network connections, etc. into discrete "containers" which can be isolated from other namespaces. Ideally, applications running in a container will only see the tasks and activities related to that application in their view of the world. Also, these individual namespaces can have their resources controlled independently, and one container's resource usage should not impact the resource usage of another container. Today, the UTS (utsname) and System V IPC namespace isolation mechanisms are in mainline. The pid namespace is being tested in the -mm tree and is one of the more complex namespaces to isolate. The networking namespace work has complete prototypes, but there are still discussions in progress on how this might go forward in the Linux networking space. Some have suggested that the current level 2 and level 3 isolation mechanisms are overkill and that netlink offers a mechanism for doing much of this today. The jury is still out on the details of how this will make it to mainline, but hopefully these will be addressed over the next couple of months. Other resources which still need work include time (virtual time, primarily, I believe) and devices, such as ptys. There is also an open question as to whether there are enough CLONE_ bits for the clone() system call, given all these new inheritance properties for use during the creation of a new process or task.
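
For readers who haven't played with the namespace flags, here is a minimal sketch (my own illustration, assuming a 2.6.19 or later kernel with CLONE_NEWUTS support, reasonably current glibc headers, and root privileges) of how a new namespace is requested through clone():

    /*
     * Minimal sketch: start a child in its own UTS (hostname) namespace.
     * The hostname change is visible only inside the child's namespace.
     * Error handling is trimmed for brevity.
     */
    #define _GNU_SOURCE
    #include <sched.h>
    #include <signal.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/wait.h>
    #include <unistd.h>

    static char child_stack[1024 * 1024];

    static int child(void *arg)
    {
        const char *name = "container-demo";

        sethostname(name, strlen(name));    /* affects only this namespace */
        execlp("hostname", "hostname", (char *)NULL);
        return 1;                           /* only reached if exec fails */
    }

    int main(void)
    {
        pid_t pid = clone(child, child_stack + sizeof(child_stack),
                          CLONE_NEWUTS | SIGCHLD, NULL);

        if (pid < 0) {
            perror("clone");
            exit(1);
        }
        waitpid(pid, NULL, 0);
        return 0;
    }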

Paul talked a bit about the renaming of his current work to Control Groups - primarily because his use of the term "containers" overlapped with many of the terms used by the other containers groups and both teams were getting confused. Paul talked about CFS and the ability to apply CPU weights to arbitrary groups of processes, about cpusets and some of the rework he has had to do (Andrew Morton volunteered Paul to be the cpusets maintainer since it appears Paul Jackson has taken a sabbatical from Linux kernel development), about the memory controller work that Balbir Singh is doing, and a bit about the problems with the task freezer for freezing and unfreezing tasks. He also mentioned an NSProxy as a way to tie namespaces to control groups, and talked about how aggregated limits and controls for swap, disk IO, dirty pages, and network restrictions could be done.

Paul also addressed the common question: why not just use existing interfaces? setrlimit, for example, can only restrict tasks to simple numerical limits, has no generic support for aggregate limits, and is only settable on the current process (see the sketch below); the uid/gid/pgrp/session concepts have historical semantics which cannot be co-opted, can only (generally) be set on the current process, and can't be set to arbitrary values. In other words, they don't have much value in allowing system administrators to group and manage processes and their resource consumption.
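
As a concrete reminder of what that per-process limitation looks like, here is a minimal setrlimit() sketch (my own illustration, not from the session): the cap applies to the calling process and anything it execs, but not to an arbitrary group of existing tasks the way a control group would.

    /*
     * Sketch of the setrlimit() model contrasted with control groups:
     * a simple numeric cap on the calling process (inherited by its
     * children), with no way to aggregate usage across an arbitrary
     * set of existing tasks.
     */
    #include <stdio.h>
    #include <sys/resource.h>

    int main(void)
    {
        struct rlimit rl = {
            .rlim_cur = 256UL << 20,    /* soft limit: 256 MB of address space */
            .rlim_max = 256UL << 20,    /* hard limit: 256 MB */
        };

        if (setrlimit(RLIMIT_AS, &rl) != 0)
            perror("setrlimit");

        /* Processes exec'd from here inherit the cap; unrelated
         * processes elsewhere on the system are untouched. */
        return 0;
    }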

Additional benefits include the fact that control groups can be nested; they form a strict hierarchy. Their semantics when nested will depend on the specific resource controller used. Paul pointed out that while the framework has no real measurable performance impact, various resource controllers may trade off throughput for quality of service.

When people asked about the size of the overall namespace code, the answer was that mostly existing lines of code are modified, with a minor addition of code on top of the modifications.

When people asked how close containers are to being able to replicate the functionality of Solaris Zones, the answer was that the most significant missing component today is the networking code. Once that is worked out, the differences in capability between Zones and Containers are relatively small. And chances are that Containers will provide some flexibility beyond what Zones provides in the foreseeable future.

Labels: ,

Thursday, September 06, 2007

Linux Kernel Summit: Memory Management

Next, Peter Zijlstra and Mel Gorman led a memory management session. There was some discussion of variable page sizes, with Linus expressing a strong preference for continuing the hugetlbfs strategy, including allowing the creation of separate hugetlbfs segments of distinct, native hardware page sizes. In other words, an administrator would potentially create one large page region of 1 GB pages and another of 16 GB pages on a platform that supported those two page sizes. The applications would then be responsible for explicitly accessing the page regions of the appropriate size. Several people argued strenuously for a more automated usage of the various large page sizes without requiring application awareness, such as is done in Solaris. Linus would have none of that, arguing that the complexity and maintainability of the MM subsystem would be severely compromised and that there would be new and unpleasant application side effects when large pages were not available. The impacts on MM locking in general would probably ripple widely throughout the kernel. The largest concession that Linus made was that extracting common library functions from the mainline path and the hugetlbfs code to simplify maintenance would not bother him at all - a very pragmatic assessment.
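
For readers unfamiliar with the explicit model Linus favored, here is a minimal sketch (my own illustration; the mount point and the 2 MB page size are assumptions about a typical x86 setup, not anything said in the session) of how an application gets huge-page-backed memory from a hugetlbfs mount:

    /*
     * Minimal hugetlbfs sketch: the administrator mounts the filesystem
     * (e.g. "mount -t hugetlbfs none /mnt/huge"), and the application
     * creates a file there and mmap()s it to get huge-page-backed memory.
     * The mapping length must be a multiple of the huge page size.
     */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define HUGE_FILE "/mnt/huge/example"   /* assumed mount point */
    #define LENGTH    (2UL * 1024 * 1024)   /* one 2 MB huge page on x86 */

    int main(void)
    {
        int fd = open(HUGE_FILE, O_CREAT | O_RDWR, 0600);
        char *p;

        if (fd < 0) {
            perror("open");
            return 1;
        }
        p = mmap(NULL, LENGTH, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) {
            perror("mmap");
            close(fd);
            return 1;
        }
        p[0] = 1;                       /* touch the page to fault it in */
        munmap(p, LENGTH);
        close(fd);
        unlink(HUGE_FILE);
        return 0;
    }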

The other issue that Mel confronted was that memory management workloads tend to vary widely and to be generally complex. Plus, the various MM contributors seem to be very inconsistent when requiring testing to validate a new patch or change in that area. Specifically, Mel tried to get a clear answer to what type of workload or testing would be acceptable to all in validating performance related patches against the MM subsystem. There was never a clear answer to the question, mostly with the answer "it depends" or "it depends on the patch". But no workload was validated as a "good" workload and microbenchmarks were pretty clearly not the right answer for most of the MM patches. There was an expectation that people working in this space would submit performance comparison numbers for some valid workload showing the value of the patch, but Mel responded that most workloads have been rebuffed with the response that they are not reasonable or valid workloads.

I don't think that conflict was ever resolved, and was left as an open "do the right thing" sort of response. It will be interesting to see how the next few conflicts like that are resolved on LKML or linux-mm.

Labels:

Linux Kernel Summit: Scalability (and Embedded)

Nick Piggin and Matt Mackall led the discussion on scalability - both to larger AND smaller machines. Nick didn't have a lot of detail to discuss on scaling up. In general, Linux today addresses many of the key scalability issues, although there were a couple of issues mentioned during an earlier session related to filesystem recovery speeds that still need to be addressed. Specifically, ext3 (or ext4) filesystems over 16 TB or so are taking much longer to recover, and as filesystems grow, the time to repair after a critical failure can be unreasonably long. However, Nick mostly focused on memory and processor scalability, and there was some mention of a bottleneck in block IO scalability that I didn't catch, although it sounded like it was possibly already being addressed.

Matt Mackall mentioned a few things about the embedded world and the kernel footprint which seems to be growing at a reasonable pace. Apparently the pressure to keep the cache footprint for large systems small has some synergy with the embedded needs for a small memory footprint.

Generally, though, this session didn't have a lot of material and finished early.

Labels: ,

Linux Kernel Summit: Real Time and Scheduler

So, I missed part of the real time and scheduler discussion since I was out in the hallway discussing SystemTap's challenges with Christoph Hellwig. SystemTap specifically depends on utrace, which is not yet in the kernel and which has some challenges to be addressed. I got some insight into what the problems are and will see if there are things that the SystemTap team can do to help address those. Anyway... back to the topic at hand, as I re-joined it in progress...

Anyway, Thomas Gleixner, Ingo Molnar, and Zach Brown led the session (Thomas and Ingo being primarily focused on real time kernel support, and Ingo and Zach working on syslets). Thomas and Ingo requested additional testing and usage of the RT kernel by more people, especially those that have additional workloads which might benefit from real time kernel support. Real time has made a lot of progress lately and is running some very intensive real time workloads, including being the basis for the next generation US Navy Destroyer class ship and having been deployed on laser-wielding robots (I think Thomas mentioned that they just shipped the 250th laser-wielding robot with Real Time Linux) -- OH - and those are the GOOD laser-wielding robots - not the bad ones! Mostly these are used in manufacturing for welding and such, not for sci-fi horror movies. :)

The other topic, somewhat related, was the syslet and general asynchronous system call operation, such as is required for good operation of asynchronous IO, e.g. AIO. Suparna pointed out that the corner cases for AIO were tough originally, and there is probably more work to do in that space, although the original AIO work addressed many of the original complexities.

Labels:

Kernel Summit: Customer Panel

Thursday's first kernel summit session was a customer panel, organized by James Bottomley and yours truly. The presenters were Sean Kamath from Dreamworks, Head Bubba from Credit Suisse, and Markus Rex, the new CTO for The Linux Foundation. Sean covered the basics of their configuration and workloads, then pointed out some of the key problems that they see in their IT environment around Linux. These problems centered around memory management and a recent shift in semantics regarding RO/RW mounts of local and NFS filesystems. Some of the memory management problems have reportedly been fixed (or at least possibly fixed) since the latest enterprise distribution that most customers are using. However, a number of the questions centered around information that is very difficult for a corporate IT department to determine definitively and, with some reticence, the group present recognized that as a possible problem. Specifically, understanding what capabilities are available for handling out of memory conditions, how to limit the resident set of a process (or group of processes), and how to determine how much memory is being used by a process (or group of processes) were considered difficult for end users to work out. Further, some of these behaviors differ between versions of mainline and the various distro releases.
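
As a small illustration of that last point (my own sketch, not something shown on the panel), even the "simple" question of a process's memory usage usually comes down to picking fields like VmRSS out of /proc/<pid>/status, and the available fields and their meanings have shifted across kernel versions:

    /*
     * Rough sketch: print the VmSize and VmRSS lines from
     * /proc/self/status, one kernel-provided view of how much memory
     * the current process is using.  Available fields vary somewhat
     * across kernel versions.
     */
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        FILE *f = fopen("/proc/self/status", "r");
        char line[256];

        if (!f) {
            perror("fopen");
            return 1;
        }
        while (fgets(line, sizeof(line), f)) {
            if (strncmp(line, "VmSize:", 7) == 0 ||
                strncmp(line, "VmRSS:", 6) == 0)
                fputs(line, stdout);    /* e.g. "VmRSS:   1234 kB" */
        }
        fclose(f);
        return 0;
    }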

Head Bubba began with an intro to Credit Suisse's business and IT environment. He then followed with an explanation of some of the key problems that he sees in his environment. The first observation was that nearly every change to the CPU scheduler has an impact on the behavior of many applications. Those impacts can be easily contained when an application writer knows they are coming, but often those changes are made either silently or only with notification to LKML, which isn't really sufficient for end users (this complements lots of discussion about how the signal-to-noise ratio on LKML is leading to many fewer readers of LKML lately).

Head Bubba next mentioned the value of Linux Real Time, which is currently in development, and some measurements based on his workload and the benefits seen by using RT. One large area of pain is general access to Linux diagnostic tools; SystemTap was mentioned in particular, as were utrace and PAPI. Bubba then reviewed some problems with jitter in TCP/IP. The last area where Bubba focused was iWARP and OpenFabrics and the performance, and, more importantly, the latency of packets received with these high bandwidth interconnect technologies. There are some interesting charts in the slides that describe these problems.

The last speaker was Markus Rex, covering the consolidated list of priorities from the Linux Foundation's User and Vendor Advisory Councils. The list started with a high level set of areas that are of interest to these groups, including such obvious things as virtualization, power management, scalability, IPv6 readiness, etc. Markus then drilled into some of the specifics in each area, although the attendees would have preferred to see more specifics in some areas, such as scalability. Within virtualization, for instance, it was most important to be able to run any OS as a guest on any major distribution. Right now paravirt_ops is generally moving in that direction, but the major distros are still not quite there. The hope is that a server or workstation could support running a plurality of distributions as guests simultaneously. Today we aren't quite there with out-of-the-box distributions, although that will likely be present in the next versions of the enterprise distros and most current distros in that time frame. And, with all that flexibility, end users are going to want a single, standardized management interface to all that capability, e.g. CIM interfaces and a unified management application.

Other requested features include power management capabilities for servers, and increased support for device drivers, in conjunction with the community plans for enabling open source device drivers.

The other significant request was for full IPv6 compliance. The US DoD has recently increased the requirements in the IPv6 arena for RFPs and RFQs (Requests for Proposals/Quotes) to include as mandatory many more IPv6 related standards, many of which are not yet implemented by any operating system, including Linux. As a result, there is a full-court press to identify the gaps, distribute the work, and achieve compliance in time for the DoD's required dates.

The remainder of the requirements can be found in the slides from the three presenters, which I'll post a pointer to as soon as they are uploaded, probably onto the Linux Foundation site.

Labels:

Wednesday, September 05, 2007

Lightning Talks at Kernel Summit

Ben Herrenschmidt started the lightning talks with the topic of drivers and firmware loading. The idea might be to have two suspend calls, one to enable the preloading of firmware if necessary. A GFP allocation might block during a suspend, possibly while holding a semaphore, which is a race that could prevent a resume; Ben has had this happen to him once before. Len believes that this should be using the system state, which would potentially avoid the problem. Ben also proposed pointing out a couple of drivers as good examples. The response was to have a set of example drivers or skeleton drivers. Dave Miller pointed out that beauty is in the eye of the beholder, and others pointed out that hardware is more complex than anyone would like, which makes simple drivers hard to create. Ben redirected the context back to suspend/resume in the context of driver examples.

Dave Miller next took the podium to bring up an issue with the 10 Gb ethernet driver for Sun's new Neptune chipset, which supports MSI-X on the PCI Express bus. One issue is that IRQs could be rebalanced, or all the IRQs rerouted round-robin from CPU to CPU. Arjan pointed out that the configuration for IRQ balancing doesn't move ethernet IRQs around today. Arjan and Dave will talk more about the details to make sure the infrastructure reasonably supports 10 Gb ethernet capabilities appropriately.

Len Brown brought up hardware sensors and AML as a potential issue. Updating a sensor via AML might affect ACPI, such as converting a sensor from degrees Celsius to Kelvin, leading to ACPI shutting down the machine on the assumption that it is too hot or too cold. On Windows this doesn't seem to be an issue, but no one knows if it is because they coordinate on conventions, if they have an arbitrator, or if everyone is "just lucky". This was tabled to the mailing list.

Labels:

Merging i386 and x86_64 architectures

Andi Kleen and Thomas Gleixner had a healthy debate on the potential merging of the i386 and x86_64 architectures. Linus will accept the patches that merge i386 and x86_64 - he is very much in favor of seeing them combined. Comments from the PPC and s390 folks stated that the merge of their architectures was very worthwhile. Andi was still vocally opposed to the merger. There was a long and heated debate both before and after Linus's pronouncement. However, the pronouncement was not changed during any of the resulting debate.

Labels:

How to encourage Vendors to participate in the kernel community

Dirk Hohndel started a session to talk about hardware interactions with the kernel. Jon Linville started with the Atheros driver, the current open source implementations, and the recent kerfuffle about that driver. It looks like that is all resolved now. There are also issues, though, with concern about government regulations regarding wireless broadcast ranges. While the regulatory requirements are a bit subjective, most vendors are being quite conservative and mostly avoiding working with the community as well. Intel, for instance, originally used user-level code and now uses firmware to help match its interpretation of the regulatory requirements. However, the Atheros cards are completely controlled by the host CPU, so the open source drivers can't use the binary user-level or firmware implementations. This is why Atheros currently uses a binary core with their existing driver.

On the nVidia side, the nouveau driver has made some progress, but it is very slow going. It is very difficult to figure out the existing registers and semantics for the hardware. There has been great progress with the 2D acceleration model, and work is now going on for dual head operation and laptop operation. There is some hope that Fedora 8 will ship a pre-release of the driver, and it is probably a year from being fully working. There is also the ATI R500 driver; it is slowly coming along one card at a time, each PCI ID being enabled individually. No work has been done on 3D, although the R500 has the same 3D engine as the R400, which should help.

The other recent player is Via - which asked for a Linux driver. Not sure yet how we are going to work with them.

Chris Schlager, director of the AMD research center, spoke next. AMD has been successful so far primarily because of Linux. They have recently acquired ATI, which of course happens to make graphics cards. A recently announced project is Fusion, which tries to combine a CPU with a GPU. Of course, the ATI GPU is currently enabled by a binary blob, which of course is at odds with the kernel community. The current driver cannot be open sourced for various reasons. However, AMD will be developing a driver for 2D and 3D for all R500 and R600 devices. All of this will be done in the X.org environment, probably under the MIT license (not yet sorted out, but that is the most likely outcome). ATI is also willing to answer questions, although they may not be able to provide specifications.

At this point, Dirk switched to talking about how the community actually makes it harder for vendors (IHVs in this case) to work with it, which is effectively encouraging IHVs to create and distribute binary drivers. Open source participants are effectively getting punished in comparison when working with the development community. Binary driver writers can simply release drivers on their own schedules, with complete control of their own environment. Open source participants, on the other hand, work through the unpredictable interactions with the community, and delivering drivers is much more difficult and much less predictable.

The request is twofold - make binary modules substantially more difficult AND make it much easier to get drivers into mainline. Linus still supports binary modules primarily because they are not derived works of Linux. But we also need to be better about accepting support for new hardware. The discussion wandered a bit, with more emphasis on making it easier to add support for new hardware than on making it harder to run binary modules, but both resonated well with the group.

Labels:

Andrew Morton on Kernel Quality

Andrew spent a little time clarifying the existing flow - primarily addressing defects first via email, but then capturing defects which are not quickly resolved in bugzilla. Andrew pointed out that many bugs get "lost"; some people don't respond to Andrew's email asking them to put their bug into bugzilla. If a bug is sent to some smaller mailing list, it may also get lost. Also, bugs that aren't resolved within a couple of days are typically lost, and there is no way to find and track those defects afterwards. Andrew went through about 3500 lkml messages and found ~50 real-looking bug reports which were not responded to adequately or at all. And Andrew spends a lot of time nagging people and is getting fairly fed up with it.

Linus believes that most people do not read the kernel mailing list any more, which could be part of why such defects are hard to find. The proposal is to consider creating a list just for reporting kernel bugs.

Andrew's wrap-up questions were: Should he nag more? Do we need a professional nagger? Why are people ignoring so many bug reports? Someone needs to monitor other mailing lists for unloved bug reports.

Code review is good - an hour of review of someone else's 100-person-hour patch can help get that code accepted, increase its overall quality, and improve the quality of the submitter's future patches. But we are not doing enough - many patches in Andrew's inbox get no review or just trivial comments. Also, when Andrew commits a patch, he'll cc a few potential reviewers, which doesn't work too well. And this policy is only used for patches going through the -mm tree when headed for mainline. Lots of stuff goes from a developer into the subsystem tree and then into mainline. The problem stems from git trees being pulled directly without sufficient review of the patches internally. The suggestion is to send patches out for review when you merge them, not when you send them to Linus. That way it is fresher in people's minds and you don't have to do patch rework during the merge window.

Developers generally don't have incentive to review others' work. For instance, they have other stuff to work on, reviewing is hard, a bit dull, and rather thankless, and for many developers, it is not part of their job description.

Andrew pointed out that last year we agreed that subsystem maintainers would send out a summary of what they were putting into the next merge window during each rc period, but no one has actually been doing that. That would be useful.

Andrew proposed a strawman which would require a Reviewed-by: for merge. The merger decides whether the review was adequate based on its quality, the availability of review notes (a link to an email archive in the changelog would be great), and the identity of the reviewer. A reviewer's reputation will accumulate credibility over time. Andrew's goal is to produce an exchange economy in which coders have incentive to review others' work. The consensus is to start using this process gradually and gently to see how it works, potentially making it mandatory if it works out well. Reviewed-by: does not mean "reviewed for whitespace" - it means "reviewed for correctness and content." Linus believes that there are many small patches and this process may be a bit cumbersome for them. In that case, Christoph suggests, the subsystem maintainer might be the default reviewer. There is still some debate about whether this will work for small patches, but the proposal is to try it and see how it works.

Labels:

The Greater Kernel Ecosystem and Documentation

The next session covered the greater kernel ecosystem, including things like glibc, udev, the HAL layer, etc. Greg KH moderated this panel as well.

An initial assertion is that sysfs is just another system call ABI - sysfs exposes the internals of the kernel, and the kernel's internals change constantly. There is some substantial expertise going into disassociating the kernel internals from sysfs, but the general feeling is that this disassociation needs to be more complete so that the sysfs interfaces can become a bit more stable. The primary rule is not to assume structure but to assume hierarchy. User applications can then traverse the hierarchy to determine where the relevant information resides based on an understanding of that hierarchy. libsysfs was a library that no longer exists; it was written by non-library experts and assumed inappropriate things about sysfs. The recommendation is to use HAL to access sysfs information.
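
A rough sketch of the "assume hierarchy, not structure" advice (my own illustration; the panel's stronger recommendation was to go through HAL rather than reading sysfs directly): enumerate what is actually present under a class directory instead of hard-coding device paths.

    /*
     * Sketch: list the entries under /sys/class/net by walking the
     * directory, rather than assuming any particular device names or
     * internal layout.
     */
    #include <dirent.h>
    #include <stdio.h>

    int main(void)
    {
        DIR *d = opendir("/sys/class/net");
        struct dirent *de;

        if (!d) {
            perror("opendir");
            return 1;
        }
        while ((de = readdir(d)) != NULL) {
            if (de->d_name[0] == '.')
                continue;               /* skip "." and ".." */
            printf("network interface: %s\n", de->d_name);
        }
        closedir(d);
        return 0;
    }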

Michael Kerrisk is the man page maintainer for sections 2, 3, 4, and 8, usually spending about a day a week on documentation. In the process he is finding many buggy interfaces. Michael points out that it is hard to design good interfaces, and getting it wrong is painful. User interfaces are difficult and bug-prone, but APIs are forever. There is insufficient review of the interfaces, and the reviews are usually done by implementers/designers rather than by the end users (userland programmers). Michael proposed some formal kernel-userland interface development mechanism. This would include a formal signoff and a suite of userland test programs. Documentation should be written by or in collaboration with a kernel developer, and some of the test code should be written by someone other than the developer.

There was a proposal to include the man pages in the kernel source tree. This was debated, with the only challenge being the 50 or so system calls which have some wrapper in glibc to make the system call user accessible. There is also a desire to include some API testing in the tree for system calls and then enforce API consistency with those tests. Christoph Hellwig has volunteered to work with the copious IBM resources available that are working with LTP to pull just the right tests into the mainline git tree for the kernel. There is a strong desire to have better tests, API tests, and system call tests in general, although the community realizes that testing is hard.
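
For flavor, here is the sort of thing a userland API test in the tree might look like - a purely illustrative sketch of my own, not taken from LTP or any proposal at the summit: exercise one system call and check its documented behavior.

    /*
     * Illustrative userland API test: call gettimeofday() and verify
     * the documented success behavior.  A real suite would cover error
     * paths and many more calls.
     */
    #include <errno.h>
    #include <stdio.h>
    #include <sys/time.h>

    int main(void)
    {
        struct timeval tv;

        if (gettimeofday(&tv, NULL) != 0) {
            printf("FAIL: gettimeofday returned errno %d\n", errno);
            return 1;
        }
        printf("PASS: gettimeofday -> %ld.%06ld\n",
               (long)tv.tv_sec, (long)tv.tv_usec);
        return 0;
    }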


Kernel Summit: Mini Summit Readouts

The first mini-summit readout was on Linux power management, from Len Brown. Suspend-to-RAM is the primary poster child, and the most visible problem area is often video support. Video restore works a bit differently than on, say, Windows, which adds some interesting challenges; Keith Packard is helping with the Intel drivers in this space. ATA suspend might be functional in 2.6.23, although it is not enabled by default - the distros are currently enabling it, however. People are now concerned about how fast suspend-to-RAM is, with OLPC seeing +90ms for USB resume (Greg KH pointed out that that was fixed; Len hopes to hear confirmation from Marcelo). There is also a video sync issue on resume, as well as an audio "pop" issue. And device power management is currently "joined at the hip" with the hibernate implementation, which is not necessarily a good thing. Andrew Morton asked who maintains suspend-to-RAM, and Arjan asked whether there was a design for it. Len replied that "the community" maintains suspend-to-RAM and that there is a document describing parts of it; however, he points out that suspend-to-RAM support is very under-resourced today. Andi Kleen pointed out that this is in part a driver problem, since it requires support from every driver in use on a given machine. A maintainer with a clear vision for how suspend-to-RAM should work would be a great thing.
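
To make the per-driver burden concrete, here is a skeletal sketch of the suspend/resume callbacks a PCI driver of that era needed to implement for suspend-to-RAM to work (my illustration with a hypothetical device name; probe, remove, and the ID table are omitted):

    #include <linux/pci.h>

    static int mydev_suspend(struct pci_dev *pdev, pm_message_t state)
    {
        /* Quiesce the device, save PCI config space, and power down. */
        pci_save_state(pdev);
        pci_disable_device(pdev);
        pci_set_power_state(pdev, pci_choose_state(pdev, state));
        return 0;
    }

    static int mydev_resume(struct pci_dev *pdev)
    {
        /* Power up, restore config space, and re-enable the device. */
        pci_set_power_state(pdev, PCI_D0);
        pci_restore_state(pdev);
        return pci_enable_device(pdev);
    }

    static struct pci_driver mydev_driver = {
        .name    = "mydev",      /* hypothetical */
        .suspend = mydev_suspend,
        .resume  = mydev_resume,
        /* .id_table, .probe, .remove omitted for brevity */
    };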

Suspend2 is now officially TuxOnIce and is out of tree. There are no plans for it to come back into the tree, but it is very popular with some people; its reportedly friendlier support community may be part of the reason. According to Linus, Rafael is actually doing a fair job of supporting suspend-to-RAM today. However, getting Radeon drivers to suspend and resume correctly is a crap shoot because of the complexities of the firmware, drivers, etc. Arjan argued strongly for some sort of a HOWTO - there was some push back, but in general Arjan's point is to document what is working today.

On cpufreq: P-states break "idle accounting", and fewer governors is considered better today. Dynamic Power Management (DPM) has been out of tree, although there is no longer general disagreement within the DPM community about the approaches.

Ted gave an update on the filesystems BOF that was held at FAST in February. Most of the session was a read-out of the work people were doing; about 50% of the attendees were research-focused as opposed to Linux developers. Part of the result was giving the research folks some direction on the key issues that need to be addressed. In this space, there was some progress on unionfs capabilities. On the downside, there wasn't enough time to really dive into relevant Linux topics. A full write-up is available at the Usenix site.

James Bottomley presented the remainder of the IO summit, which included some discussion of problems in fibre channel (I missed the details) and a discussion of upcoming technologies such as disk drives fronted by solid state/flash devices. There was some concern about performance and about how quickly the flash devices might fail. The disk vendors were saying that the sector-based drive interface prevents the underlying firmware from doing content motion based on hot spots, bad blocks, etc., so they are hoping to redesign disk drives to support objects instead of sectors. This has large impacts on things like RAID, and it would obviously break all of today's elevator algorithms, since the OS could no longer determine where any two blocks physically reside on disk. The technology is still about 5 years out from a disk-only point of view, although it is in RAID arrays today. Per Alan Cox, NFSv3 was once implemented on a disk drive as a project, and there is some expectation that NFSv4 or pNFS will be implemented on drives before a pure object model.

Martin Bligh provided an update on the VM summit held on Monday. The first observation was deja vu - many topics were covered a year ago. Realistic benchmarks are still difficult to find, and it is hard to get repeatable tests if swapping is involved. Page replication was discussed, as was slab cache de-fragmentation. One continuing proposal was to split dentry inodes from files in the dentry cache (okay, I think I missed something here), but this is always harder in practice than in theory. The anti-fragmentation code is now likely to be merged. Larger-order page allocations were discussed for several reasons, primarily for large filesystem blocks on disk. Containers were another large topic, including how applications interact with each other on a single machine: Google (Paul Menage) has one solution, and Balbir has another, which is a bit more complex but is probably the better long-term approach. Another topic was the complete removal of ZONE_DMA, adding a similar capability more closely tied to actual hardware requirements.

Avi Kivity next provided an update on the virtualization mini-summit. Lguest, Linux-on-Linux, KVM, UML, VMware, Xen, the x86 vendors, s390, ppc, and ia64 were represented. The running joke was that long explanations of x86 functionality requested by the s390 people usually ended with the comment "oh, I understand now, we have an instruction that does that" ;)

The first topic was performance: the hypervisor needs to present NUMA characteristics to guests, realizing that those characteristics can change at run time. Cooperative paging/hinting, e.g. the CMMS patch set, would also be useful. And the group noted that hardware is advancing, including NPT/EPT (which solve the shadow page table problem), vmexit time reductions, and several targeted optimizations.

Another topic was the interfaces for guest/hypervisor communication. The most important point here is that such communication must be done via physical pages, since the guest and hypervisor typically do not share virtual page mappings that could be used for communication.
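
A tiny sketch of what this looks like from the guest kernel's side (my illustration; the hypercall below is hypothetical): allocate a page, translate it to a physical address, and hand that address to the hypervisor.

    #include <linux/gfp.h>
    #include <linux/errno.h>
    #include <asm/io.h>

    /* Hypothetical hypercall that registers a shared page with the hypervisor. */
    extern void hypervisor_register_page(unsigned long phys_addr);

    static int share_page_with_hypervisor(void)
    {
        unsigned long vaddr = __get_free_page(GFP_KERNEL);

        if (!vaddr)
            return -ENOMEM;

        /* The hypervisor only understands guest-physical addresses. */
        hypervisor_register_page(virt_to_phys((void *)vaddr));
        return 0;
    }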

Paravirt_ops allows paravirtualizing more or less of the guest. All of the solutions find timekeeping to be a common problem, and hardware is unable to help with this.

A major thread was that the virtualization solutions need to MERGE - being out of tree makes using the capabilities and validating the features nearly impossible.

And then on to lunch! ;)


Linux Kernel Summit 2007, Day 1, Cambridge, UK

After a pretty uneventful flight (always the best kind) and an easy 2.5 hour bus ride from Heathrow, I'm here and rested. Jonathan will likely have some good coverage of the mini-summits, linux.conf.eu, and UKUUG, which were held here just prior to this year's kernel summit.

Ted introduced the summit as the first one outside of North America, and described the schedule as a bit more upleveled than usual. As usual, Ted pointed out that the schedule, location, and content are always open for discussion as the program committee is always trying to make the event as useful as possible.

The first panel, moderated by Greg KH, discussed how the distributors are working with the kernel community and how they are satisfying end user needs. The first long thread of discussion debated the long cycle for delivering enterprise kernels to end users: 2-3 years from a feature landing in mainline to a kernel/distribution with that feature in the hands of end users, coupled with the fact that most customers test internally for 3-12 months before deploying a kernel. Greg KH proposed updating the kernel version more frequently (yearly? twice a year?) within an enterprise distribution. Dave Jones pointed out that updating the kernel often requires updating a number of user-level packages as well, which increases the time and effort for updating the distribution. The downside of frequent updates is that the amount of kernel regression testing would need to increase pretty significantly - which might mean more testing on Linus' tree, Andrew Morton's -mm tree, etc.

The major theme is how we get better testing, better review of new code, and ultimately features delivered faster to end users with higher quality and fewer regressions. In other words: more, better, faster. ;) While the enterprise kernels are claimed to be very close to mainline (some key additions in various enterprise kernels are: Xen, SystemTap, AppArmor, real time, utrace, module signing, the Novell debugger, some NFS code, etc.), they are unfortunately typically close to an *old* release of mainline, usually 6-18 months behind. And there are pressures to include capabilities in distributions ahead of mainline for additional vendor and distributor value. These competing pressures - stability on one hand, additional features (quickly!) on the other - are fundamentally at odds with each other.

Ingo Molnar pointed out that customer upgrades are a very emotional event - the fear and uncertainty of an upgrade balanced against the gratification of new features and capabilities. kABI has some validity as an emotional counterweight, giving existing customers a perception of stability.

A sub-thread raised hardware platform enablement as a problem distinct from the additional capabilities offered by distributions; in other words, splitting new features from new hardware support might be an option. Greg KH quickly shot that down, suggesting that the two problems are too similar to break apart and that the solution is likely in the same space.

Generally speaking, the problem was recognized, but the only potential solution discussed was to update kernels a bit more frequently in the enterprise version of the distributions, e.g. every 6-12 months or so.

Dave Jones has a list. Myth: moving to an upstream kernel magically fixes everything. Each Fedora release has about 500 open bugs, and about 1500 bugs are open against released kernels; those bugs do not match the 1500 bugs in the kernel bugzilla. Some bugs are isolated to bizarre hardware that is hard for most people to get hold of, but many just need a good developer to look into them. Problems are very seldom analyzed for root cause, and reporters are often asked to "retry with the latest release" just to see if the problem magically went away. For instance, 2.6.22-rc5 doesn't even boot on Dave's laptop - a regression. SATA tends to be especially bad right now, and suspend/resume works or fails on an almost alternating basis. Laptops are sufficiently distinct that a fix for one laptop often breaks another.

General comments centered around the fact that there is not enough focus on defects in general in the kernel community.

For Debian, drivers are the biggest problem. Debian users depend on a much wider variety of ethernet, wireless, and video cam drivers that are not accepted in mainline, along with unionfs and squashfs (whose maintainer was scared off).

Linus strongly advocates getting even crappy drivers into the mainline tree as early as possible, while Alan Cox pointed out how buggy that can make the distribution. Linus' point centers on the fact that the code is much more public if it is in the tree; Alan's point is that the code is never going to get cleaned up and will harm everyone else in the process. Linus is focused on drivers that are going into distributions *anyway* - so why not get them into mainline, where the community has a better chance of fixing them. Wireless drivers should be in mainline; they are sitting in Dave Miller's tree at the moment.

Deepak, representing the embedded side, doesn't want to put drivers straight into kernel.org, because in many cases the code is written for multiple OSes and is so far from kernel coding standards that it is almost completely undebuggable. While his point was recognized, Linus reiterated that getting things into mainline is the best way to get the code cleaned up, and Greg KH reiterated that there are 85 kernel developers just waiting to help maintain drivers in mainline. One key problem is that many of these drivers go with either very, very old hardware or very new hardware, most of which is not available to most people. Greg KH has offered to help with any drivers that are out of tree and need to make it into the kernel.
