The Amazon Elastic Compute Cloud (Amazon EC2) allows people to rent virtual servers by the hour, charging $0.10 per hour and $0.20 per gigabyte transferred.
As Amazon says in their description of EC2:
Amazon EC2 enables you to increase or decrease capacity within minutes, not hours or days. You can commission one, hundreds or even thousands of server instances simultaneously.
Amazon EC2 passes on to you the financial benefits of Amazon's scale. You pay a very low rate for the compute capacity you actually consume.
This frees you from many of the complexities of capacity planning, transforms what are commonly large fixed costs into much smaller variable costs, and removes the need to over-buy "safety net" capacity to handle periodic traffic spikes.
In order for this to work out for Amazon, I would think Amazon also needs to avoid coordinated "periodic traffic spikes" in usage of EC2. Otherwise, they also will need to "over-buy safety net capacity" and will see low utilization rates on their cluster.
With the current pricing structure, there is no incentive to avoid peak load times. In fact, if I were using Amazon EC2 for a batch job, I probably would request my servers during US work hours, the same time EC2 is under heavy load from web servers and other real-time tasks. There is no reason to do otherwise.
I think EC2 should offer a lower rate for low priority requests for servers. Servers at this rate could be pulled from the client at any time for higher priority jobs.
Pricing could be very low because idle servers are worthless to Amazon. If the price point is near the marginal cost of the server time, this service would be attractive to many.
The benefits for Amazon are also apparent. There would be less need to over-buy capacity since capacity could be regained from low priority requests. Utilization would increase, Amazon would get paid for what would otherwise be idle time, and the economics of EC2 would improve.
I have a lot of big data processing tasks -- both for Findory and for side interests -- that fit a batch profile. I am sure other potential EC2 users do as well.
Amazon itself has many batch jobs that fit this profile, including web server log processing, personalization builds, search indexing, and data mining. All of these could be done on borrowed EC2 servers rather than using more expensive dedicated hardware.
Going a step further, I suspect many of these low priority batch jobs could benefit from a different API to Amazon EC2.
Rather than requesting servers and then manually configuring them myself, what I really want is to be able to request a MapReduce job and kick off hundreds or thousands of servers at low priority. Processing on servers that go down or are pulled away during the job should be restarted elsewhere. At the end, I should get back the completed data file.
It should be something like: "Here's my data file, some MapReduce code, and $10. Let me know when you're done."
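To make the idea concrete, here is a toy, local stand-in for that kind of job. The `map_fn` and `reduce_fn` are the user's "MapReduce code" (the canonical word-count example), and `run_job` plays the role of the hypothetical hosted service; none of this is a real Amazon API, just a sketch of the programming model.

```python
from collections import defaultdict

def map_fn(line):
    # Emit (word, 1) for every word in a line of the input file.
    for word in line.split():
        yield word.lower(), 1

def reduce_fn(word, counts):
    # Sum the counts emitted for each word.
    return word, sum(counts)

def run_job(lines, map_fn, reduce_fn):
    # Stand-in for the hypothetical hosted service: run the map phase,
    # shuffle intermediate pairs by key, then run the reduce phase.
    groups = defaultdict(list)
    for line in lines:
        for key, value in map_fn(line):
            groups[key].append(value)
    return dict(reduce_fn(k, v) for k, v in groups.items())

result = run_job(["the quick fox", "the lazy dog"], map_fn, reduce_fn)
# result["the"] == 2
```

In the hosted version, the shuffle and reduce phases would run across many cheap, preemptible servers, and the service, not the user, would handle restarting work from servers that get pulled away mid-job.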
Powerset is already running a MapReduce clone on Amazon EC2, which shows both that this service is possible and that there is demand for it.
It would be MapReduce for the masses. No longer would you have to be at Google to do easy data processing on a massive cluster. You could borrow Amazon's EC2 cluster any time you want.
Update: Nearly three years later, Amazon launches differential pricing for EC2.