We describe a new language called Pig Latin that we have designed to fit in a sweet spot between the declarative style of SQL, and the low-level, procedural style of map-reduce. The paper provides a number of examples of what Pig code looks like and how it executes across a cluster. The related work section of the paper is excellent and should not be missed; it compares Pig, Bigtable, MapReduce, map-reduce-merge, Dryad, and Sawzall.
At a growing number of organizations, innovation revolves around the collection and analysis of enormous data sets such as web crawls, search logs, and click streams ... For example, the engineers who develop search engine ranking algorithms spend much of their time analyzing search logs looking for exploitable trends.
A Pig Latin program is a sequence of steps ... each of which carries out a single data transformation ... Writing a Pig Latin program is similar to specifying a query execution plan ... This method is much more appealing than encoding [a] task as an SQL query, and then coercing the system to choose the desired plan through optimizer hints.
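To make the "sequence of steps" idea concrete, here is a minimal sketch in plain Python (not Pig Latin, and not how Pig executes anything) in which each statement performs one data transformation and hands a named intermediate result to the next. The sample data, the function names, and the thresholds are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample data: (url, category, pagerank) tuples.
urls = [
    ("a.example", "news", 0.9),
    ("b.example", "news", 0.5),
    ("c.example", "news", 0.1),
    ("d.example", "sports", 0.8),
    ("e.example", "blogs", 0.1),
]

def filter_by_pagerank(rows, threshold):
    """Step 1: keep only rows whose pagerank exceeds the threshold."""
    return [row for row in rows if row[2] > threshold]

def group_by_category(rows):
    """Step 2: collect rows into a dict keyed by category."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[1]].append(row)
    return dict(groups)

def keep_big_groups(groups, min_size):
    """Step 3: drop categories with fewer than min_size rows."""
    return {cat: rows for cat, rows in groups.items() if len(rows) >= min_size}

def average_pagerank(groups):
    """Step 4: compute the mean pagerank of each remaining category."""
    return {cat: mean(row[2] for row in rows) for cat, rows in groups.items()}

# The "program": one transformation per line, in the order it should run.
good_urls = filter_by_pagerank(urls, 0.2)
groups = group_by_category(good_urls)
big_groups = keep_big_groups(groups, 2)
result = average_pagerank(big_groups)
print(result)
```

With this sample data only the "news" category survives both filters. The point of the style is that the order of the steps is stated by the programmer, much like a query execution plan, rather than being left to an optimizer to choose.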
Pig ... is fully implemented and available as ... open-source. [Pig is executed] on Hadoop, an open-source, scalable map-reduce implementation. Pig has an active and growing user base inside Yahoo! and ... [is] beginning to attract users in the broader community.
Please see also my previous posts on Pig and Yahoo's Hadoop clusters, "Yahoo Pig and Google Sawzall", "Hadoop Summit notes", and "Yahoo deploys large scale Hadoop cluster".