Friday, March 14, 2008

Is Cloud Computing nothing but Utility Computing?

So there's this, ahem, interesting blog entry over at RedMonk by James Governor. His post, 15 Ways to Tell Its Not Cloud Computing, really seems to describe utility computing rather than cloud computing. Utility computing has been around for a while and matches pretty much all of his assertions. Yet for some reason, providers of utility computing and, worse, consumers of utility computing are still hard to find. Oh, they exist: Amazon EC2 effectively provides a utility computing model, and other major vendors have various offerings in this space. But that is not what Cloud Computing is.

More importantly, Cloud Computing provides utility computing, either for a business or for an internet community. Utility computing is an output, but somewhere underneath that utility there is a lot of coal mining, a lot of power plants, and a lot of substations that provide that utility computing. And, despite the long-standing goals of the Grid community, the performance and security challenges (and a few others) haven't allowed for effective sharing of computing power between companies. And, to be honest, even the paradigm shift to Cloud Computing is not going to bring us to that Nirvana overnight. Even with some of the greatest computing shifts, change occurs over several years, in several steps that allow the industry to keep up and adapt.

As an example, I have been a part of IBM's Linux strategy since about 1999. I've also been heavily involved with the Linux community and several open source communities. One expectation there was that Linux was going to transform the desktop first, then the data center, and then the world. Well, we had the order wrong, because the place where change happened on this scale was actually the data center. And that is exactly where the change for Cloud Computing will come from first. If you look at the 15 tests to see which ones matter to enterprises, you'll see that most of them just don't. That is not because enterprises don't want utility computing today - it is because they still want a high level of control and management over their data.

I would assert that of the Fortune 500 or Global 1000, there are very few companies putting their trust in an Amazon EC2 offering, nor would they do so for a very long time. And, for small mom & pop shops, they may start with EC2 as a cheap entry point, but there is a point where they need more control, more flexibility, more capability than they can get from today's utility computing providers. And that is the point where, having tasted the services that a Cloud can offer, they will want to build their own, with their own requirements built into it. (Yes, there are more of the Fortune 500 that outsource compute resources, but I would assert that for the most part, those are not clouds for various reasons too lengthy to get into here).

So, if you've read James' blog, here are a few brief comments: I'd agree with James on OGSA and on provisioning/deprovisioning (although Cloud Computing is *sooo* much more than that; see one of my earlier blog entries on that). Well, I thought there'd be more that I agreed with. I don't disagree with all the rest, but I don't think any of those points have any bearing on whether or not something is a cloud.

Maybe this is part of the core: Amazon (and Google, and others) have provided a very visible, very interesting model which demonstrates that underneath the utility computing, they have built a cloud. They maintain that cloud, and they provide the services, the manpower (human IT resources), the intelligence to reduce the cost of managing that cloud, the data centers, the power management, and the service, hardware, and software maintenance of that cloud. The end users benefit from (and pay for) the utility provided by that cloud.

But what about those companies that want all of the best practices, the simplification of management, the ability to create their own virtual appliances matching their workload, and the ability to manage Sarbanes-Oxley requirements, and want to do it all in house? Those are the companies that today are interested in buying into the Cloud Computing vision, not just as consumers of some of its services, but as companies who push the envelope on innovation as a way to fuel their own business.

Monday, March 10, 2008

Virtualization and consolidation: Less Availability and Less Predictable Performance?

Oh No - say it isn't so! Will virtualization and consolidation put more workloads on the same hardware? When that hardware fails, won't it take out more of your workloads? Remind me, how did consolidation help me? Oh, and with more workloads running on single machines, what happens if all my workloads are busy and growing busier all the time? Won't my performance and throughput go down?

Well, in a nutshell, yes. Yes, that is, if all you do is squeeze workloads onto a smaller number of machines with no thought to performance, availability, and throughput. Even with sites such as Amazon's EC2 or AWS, there is currently no published availability plan as of this post on the O'Reilly radar. I think Bill Hammon's post here also gives a concise view of the problem statement. In his post, Bill addresses two of the key issues that tend to accompany consolidation: availability and performance. I think there are several others as well, but these make a good start.

The problem is that, as with any new technology, consolidation and virtualization introduce some new challenges. In this case, base provisioning, management of an image catalog, and simple management of virtual images make consolidation look easy. Basically, you transition your current tools to a virtualized environment, and presto! You are well on your way towards cost savings, ease of management, and additional easy-to-access compute cycles for test and development activities. But most people still manage each of those virtual machines as they would a physical machine, and we run more virtual machines on each physical machine - potentially, say, 10 virtual machines per physical machine. That means that for N physical hosts we have to manage N + 10 * N machines: the hosts themselves plus the ten virtual machines on each. Oh, and after consolidation, most people don't throw away the previous machines; they plan to re-use them as their workload grows.

Add to that the fact that most workloads being consolidated today are infrastructure or web services workloads, where the availability story is based on running multiple servers of the same type, with failover handled by something as simple as DNS or a first-to-respond policy. Consolidating all of your scale-out capacity onto a few machines means that your scale-out policy is no longer going to work as you expect.
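To put rough numbers on that, here is a minimal Python sketch. The function names, the host names, and the example placement are all made up for illustration; only the N + 10 * N count and the idea of redundant servers landing on the same hardware come from the discussion above.

```python
from typing import List


def managed_machine_count(physical_hosts: int, vms_per_host: int = 10) -> int:
    """Machines to administer: every physical host plus every VM running on it."""
    return physical_hosts + vms_per_host * physical_hosts


def replicas_surviving(placement: List[str], failed_host: str) -> int:
    """Count replicas of one service still running after a single host fails.

    `placement` lists, for each replica, the (hypothetical) host it runs on.
    """
    return sum(1 for host in placement if host != failed_host)


# 20 consolidated hosts at 10 VMs each: 20 + 200 machines to manage.
print(managed_machine_count(20))

# Three "redundant" web servers, two of which ended up on the same host.
web_servers = ["host-a", "host-a", "host-b"]
print(replicas_surviving(web_servers, "host-a"))
```

The second function is the scale-out problem in miniature: three servers look redundant on paper, but losing host-a leaves only one of them answering requests.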

On the performance side of things, you have something of the opposite problem. Where previously you would have capacity for anywhere from twice to ten times (or more) the typical average workload, and sometimes as much as twice the capacity needed for peak performance, you now have the case where the aggregate of your ten virtual machines may run at 50% utilization on average, with peak capacity still twice your average. But now when two or three or four virtual machines are using something well in excess of their average, the performance of the entire set of virtual machines may be impacted. Today, most workloads which are being consolidated don't find that limit to be too onerous, but as more critical workloads move into the consolidated environment, the risk of oversubscribed physical machines will grow.
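A back-of-the-envelope sketch of that scenario, in Python. It assumes ten virtual machines that each average 5% of the host (50% in aggregate, as above) and, purely as an illustrative assumption, that a busy virtual machine bursts to five times its average. The function name and the burst factor are mine, not measurements from any real system.

```python
def host_demand_percent(vm_count: int, avg_share_percent: int,
                        bursting_vms: int, burst_factor: int) -> int:
    """Total demand on one host, as a percentage of its capacity."""
    if not 0 <= bursting_vms <= vm_count:
        raise ValueError("bursting_vms must be between 0 and vm_count")
    steady = (vm_count - bursting_vms) * avg_share_percent
    bursting = bursting_vms * avg_share_percent * burst_factor
    return steady + bursting


# Ten VMs at 5% each; see how many can burst to 5x before the host is full.
for bursting_vms in range(5):
    demand = host_demand_percent(10, 5, bursting_vms, 5)
    status = "oversubscribed" if demand > 100 else "ok"
    print(f"{bursting_vms} bursting VMs: {demand}% of host capacity ({status})")
```

Under these assumptions the host copes with two bursting virtual machines (90%) but is oversubscribed with three (110%), at which point all ten virtual machines share the slowdown.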

So how do you combat these trends? Consolidation, despite these challenges, is still a huge potential money saver. One of the comments in the O'Reilly radar link above suggested one solution: pair your internal infrastructure with a hosting service like Amazon's, and provide availability that way. Clever: although your degraded mode of operation may not be as predictable in terms of responsiveness and performance, at least you have a degraded mode of operation. Another solution is to build an HA solution into your virtual machines and manage your HA yourself. Of course, if all of Amazon's EC2 (or any other provider) is down, that doesn't help you much.

A better solution might be to look at the problem in terms of Cloud Computing (yeah, you saw this coming, I bet! :-). Within a cloud, you should conceptually be able to say "instantiate my virtual machine, and I'd like some of these properties." "These properties" might include some level of HA, some level of performance/throughput guarantee (an SLA), some level of backup, maybe even some concept of energy management/efficiency. So, if you have your own in-house cloud and a catalog of virtual machine images that you typically deploy, you would be able to specify your desired level of HA - how highly available and at what cost; your desired performance - what cap on machine resources you want, if any, or some level of guarantee of end-to-end throughput or latency; whether you could contribute to your data center's "green" goals by using energy efficient hardware resources, etc.
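As a thought experiment, such a request might look something like the Python sketch below. To be clear, this is not any real cloud's API: the class, every field name, and the "web-frontend" image are invented here just to show the properties listed above expressed as parameters.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DeploymentRequest:
    """Hypothetical request: instantiate an image with some desired properties."""
    image: str                                # name in the in-house image catalog
    availability_target: float = 0.99        # desired level of HA
    cpu_cap_percent: Optional[int] = None    # cap on machine resources, if any
    max_latency_ms: Optional[float] = None   # end-to-end latency guarantee, if any
    backup: bool = False                      # some level of backup
    prefer_energy_efficient: bool = False     # contribute to "green" goals

    def __post_init__(self) -> None:
        # Reject requests that no cloud could honor.
        if not 0.0 < self.availability_target < 1.0:
            raise ValueError("availability_target must be between 0 and 1")
        if self.cpu_cap_percent is not None and not 1 <= self.cpu_cap_percent <= 100:
            raise ValueError("cpu_cap_percent must be between 1 and 100")
        if self.max_latency_ms is not None and self.max_latency_ms <= 0:
            raise ValueError("max_latency_ms must be positive")


request = DeploymentRequest(
    image="web-frontend",
    availability_target=0.9999,
    max_latency_ms=200.0,
    prefer_energy_efficient=True,
)
print(request)
```

The point of the sketch is only that HA, performance, backup, and energy preferences become fields on the request rather than things someone configures by hand afterwards.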

And, part of the point of a good "Cloud" implementation would be that it would handle these details for you. The best practices of HA would be patterns that could be deployed based in part on a high level specification such as "4 nines" or "5 nines" of availability, or performance could be managed by a workload manager actively migrating tasks within your pool of resources as needed. And no, that isn't technology that is 5 years out... Cloud Computing is a potentially disruptive technology in development right now...
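For anyone who doesn't carry the "nines" around in their head, here is a small Python sketch that turns a number of nines into the downtime it allows per year. The function name is just illustrative, and it assumes a 365-day year.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(nines: int) -> float:
    """Yearly downtime budget for an availability of `nines` nines.

    For example, 4 nines means 99.99% available, i.e. 0.01% unavailable.
    """
    if nines < 1:
        raise ValueError("nines must be at least 1")
    unavailability = 10.0 ** -nines
    return MINUTES_PER_YEAR * unavailability


for nines in (3, 4, 5):
    minutes = allowed_downtime_minutes(nines)
    print(f"{nines} nines: about {minutes:.1f} minutes of downtime per year")
```

So "4 nines" leaves a bit under an hour of downtime per year (roughly 52.6 minutes), while "5 nines" leaves only a little over five minutes, which is why the two call for quite different HA patterns.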


Wednesday, March 05, 2008

Open Grid Forum: Grids and Clouds

I recently heard about Irving Wladawsky-Berger's keynote speech at the Open Grid Forum and was pleased to see that the slides from his keynote were made available there as well. While I'd love to have the transcript from that talk available (Irving is a very engaging and dynamic speaker), the slides alone are also quite interesting. In particular, Slides 19 and 20 provide a nice little visual on something we are referring to as Ensembles, which I'll talk about more in the future. But the graphics give an interesting preview of the thinking regarding how we can simplify the data center. Yeah, they need the transcript or some discussion to enlighten the reader, but I think they visually plant a useful concept: grouping like resources as a means of simplifying the management of those resources. There are some other resources on grids and cloud computing which I think are worth reading, including some comments from Steve Crumb, the Executive Director of the Open Grid Forum, and from Ian Foster, who is a visionary and leader in the Grid computing space.

The short summary is: Clouds are more than Grids. But there are similarities and, in some ways, Clouds build upon some of the thinking and concepts inherent in Grid Computing. Of course, Clouds build on a number of concepts, including autonomic computing, on-demand computing, virtualization, self-healing, and many of the other trends of the past several years. I think the biggest difference from many of the past views is that previously these technologies were focused on improving individual aspects of the computing environment, whereas cloud computing is really focused on bringing those strategies together to provide value to the end user and to reduce the cost and effort of managing computing resources.

Of course, not all of the work is done: cloud computing is not something you can buy off the shelf today, and some of the work requires a paradigm shift across the industry. On the other hand, most of the work of cloud computing will be done by standing on the shoulders of other giant technology leaps in the industry, so much of the work will be to integrate and reshape those technologies into a new and more accessible form. The journey promises to be challenging but the reward appears to be great!


Feet in the Data Center but Head in the Clouds

Okay, you get it, I'm interested in Cloud Computing... And I have a few things to share on that in general. But the last few entries have really focused on using cloud computing as a solution to what is growing into a mounting crisis, or at least a danged annoying trend, of spending more time managing our computers than using them. But there's also another reason that clouds are very interesting, and I think this summary on MSNBC of the work that Google and IBM are doing to start training the next generation of thinkers and problem solvers captures it well.

In part, the core of the article is really about thinking about how to use our compute resources to more effectively solve problems. I'm reading over and over about the time it takes for someone to start a new project inside a business. Usually the first step is to find hardware - and enough hardware - to do something interesting. Then there is the location, the configuration, the network access, the account management, oh, what software did you need installed, by the way is there enough power there? Is your new project mission critical? Did you think about backups? Do you need something more reliable? And soon, the mere thought of creating a new project gets plowed under by the gnarly logistics of just getting prepared for a project.

The cool thinking in this Google-driven model is that it transforms compute power the same way that the Internet transformed our connectedness. Keep in mind that networks are this complex mass of ethernet and switches and hubs and WANs and LANs and firewalls and wireless and cable and goodness knows what else. In some ways, the complexity of configuration is high - not as high as servers perhaps, but still not trivial. And errors like having the power cord kicked out from a switch in a lab may only take down one of my IRC servers for a while, while the rest of the internet remains connected (yes, that happened today). But we don't yet think of compute power as a utility, nor do we think of it as a plentiful utility, which it truly is. If we added up all of the compute power currently in operation on the planet, well, the overall ability is staggering. Perhaps as staggering as the sheer volumes of data running around the Internet today, every day. And the Internet carries email and pictures and provides services, all without us really noticing the underlying utilities. We all have access to that staggering amount of network bandwidth, but we don't generally have access to that level of compute power, even though the vast majority of compute power in the world is seriously underutilized.

Google's thinking - and that of IBM, Yahoo (er, Microsoft), Amazon, etc. - is that it is time to make these large data farms more accessible. And to do so requires several large shifts along the way. One shift is to provide some of the fundamentals for managing the servers in a way that reduces the impact on the environment (or corporate pocket book) as well as the human cost of administering these systems. Another, and the focus of the joint education program that Google and IBM are embarking on, is to start educating people on ways that they can more effectively make use of the compute power - not just as a single machine, but as a utility which can be harnessed for increasingly larger challenges.

My day job is focused on building up the infrastructure to simplify the server environment management and ease the access to that server capacity. But in the long run, the shifts in education, the shifts in programming model, will arrive hopefully just as we have mastered the ability to deploy large cloud computing environments. It is definitely something to look forward to!


Monday, March 03, 2008

Is Cloud Computing just Provisioning?

There is more and more buzz on Cloud Computing (and, it just happens, that is why I've been too busy to blog lately ;-). However, one of the common assumptions that seems to be floating around is that Cloud Computing is nothing but a provisioning exercise. As an example (btw, an example of some pretty cool technology), take this reference to Cohesive Flexible Technologies, which does some cool stuff with provisioning. Or, 3tera has a great demo for provisioning.

But provisioning is only a part of the story, and, in some ways, only the beginning of the story. There are several other key elements that make cloud computing ultimately more valuable in a business sense. For instance, virtualization is almost a requirement for great cloud computing. Several solutions are already based heavily on virtualization, but most provisioning solutions to date have been focused on deployments on physical hardware, with a few looking at deploying on virtual hardware. Why is virtualization important? Hmm, the short answer is that it allows consolidation, migration, isolation, security, and several serviceability features that increase the overall value of the cloud. I may go into more depth on that in a future article because the benefits may not all be as visible on the surface.

Another part of the story is image management, which today is less well evolved than it ultimately needs to be. I've mentioned rPath & rBuilder previously, and Amazon provides views of images, but we are basically in that early stage of image management where the number of solutions and repositories tends to boggle the mind. This portion of cloud computing will need some additional standardization, improved tooling, better life cycle management, etc., and will be key to deploying solutions within clouds.

Management of the life cycle of applications or images (or virtual appliances, as some call them) within the cloud is also a challenging area in which there is very little product available today. For instance, people tend to deploy singletons or redundant sets of servers or images. But HA hasn't done a large merge yet with cloud computing. For instance, it should be possible to deploy a virtual appliance which "rarely fails" or "never fails" from a customer point of view. HA solutions can be crafted that way today, but usually they are hand tuned, hand configured. In a true cloud computing environment, that should be merely a parameter to the deployment.
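To give a feel for what "merely a parameter" could mean, here is a small Python sketch that turns an availability target into a number of redundant instances. It rests on a big simplifying assumption that I want to flag loudly: instances fail independently of each other, and the service is up as long as any one instance is up. The function name and the 95% per-instance figure are made up for the example.

```python
def replicas_needed(instance_availability: float, target_availability: float) -> int:
    """Smallest number of redundant instances that meets the target.

    Assumes independent failures: the service is down only when every
    instance is down at the same time.
    """
    if not 0.0 < instance_availability < 1.0:
        raise ValueError("instance_availability must be between 0 and 1")
    if not 0.0 < target_availability < 1.0:
        raise ValueError("target_availability must be between 0 and 1")
    allowed_unavailability = 1.0 - target_availability
    per_instance_unavailability = 1.0 - instance_availability
    replicas = 1
    combined_unavailability = per_instance_unavailability
    while combined_unavailability > allowed_unavailability:
        replicas += 1
        combined_unavailability *= per_instance_unavailability
    return replicas


# With instances that are each up 95% of the time (an illustrative figure):
print(replicas_needed(0.95, 0.99))    # "rarely fails"
print(replicas_needed(0.95, 0.9999))  # closer to "never fails"
```

With these numbers, two instances cover a 99% target and four cover 99.99%. The independence assumption is exactly what a real cloud would have to enforce for you, for instance by never placing all of the instances on the same physical machine.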

Other aspects include handling things like preventive maintenance - does your service go down whenever someone needs to install a new service pack or update the hardware? In a cloud environment, there should be other resources your application could use - so why not just migrate your application to those resources without disruption? Oh, but how do you handle migrating network connections, and how do you handle access to the license server? Is someone collecting usage information for charge back? How is that usage information integrated across the cloud for charge back? How do you measure and manage your response times for multi-tiered applications?

I think all of these are aspects that will eventually be included in the expectations of things that a Cloud just handles. And, some of the services that a Cloud will soon offer will cover those bases and potentially many more. Clouds are much more than just provisioning.