The Bigtable paper says that 250 terabytes (TB) of Google Analytics data are stored in Bigtable. That is more than all the images for Google Earth (71 TB). It is the second-largest data set in Bigtable, behind only the 850 TB of the Google crawl.
Why is it so big? I had assumed that Google Analytics kept only summary data for each website. That would be a very small amount of data, nowhere near 250 TB.
Instead, it appears that Google Analytics keeps all of the information about user behavior on every site that uses Google Analytics, permanently and online, where it remains available for various analyses. That would explain 250 TB of data.
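A quick back-of-envelope calculation shows why keeping raw behavior data is so much bigger than keeping summaries. This is only a sketch: every input below (number of sites, metrics per day, sessions per day, bytes per session) is a hypothetical assumption picked for illustration, not a figure from the Bigtable paper or from Google.

```python
# Back-of-envelope comparison of summary vs. raw storage.
# Every number passed in below is a HYPOTHETICAL assumption for illustration,
# not a figure from the Bigtable paper or from Google.

TB = 10**12  # bytes per terabyte (decimal)


def summary_bytes(num_sites: int, days: int, metrics_per_day: int, bytes_per_metric: int) -> int:
    """Storage needed if only daily aggregate counters are kept for each site."""
    return num_sites * days * metrics_per_day * bytes_per_metric


def raw_bytes(sessions_per_day: int, days: int, bytes_per_session: int) -> int:
    """Storage needed if every visitor session is kept as a raw record."""
    return sessions_per_day * days * bytes_per_session


if __name__ == "__main__":
    days = 365
    # Hypothetical: one million sites, 100 eight-byte counters per site per day.
    s = summary_bytes(num_sites=1_000_000, days=days, metrics_per_day=100, bytes_per_metric=8)
    # Hypothetical: 500 million sessions per day, about 1 KB each.
    r = raw_bytes(sessions_per_day=500_000_000, days=days, bytes_per_session=1_000)
    print(f"summary: {s / TB:.2f} TB")
    print(f"raw:     {r / TB:.2f} TB")
```

With these made-up inputs, a year of summaries comes to about 0.29 TB while a year of raw sessions comes to 182.50 TB. The exact values mean nothing; the point is that raw data grows with the number of visits rather than the number of sites, so it easily ends up hundreds of times larger.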
What data does Google Analytics collect? From the Google Analytics help page:
Google Analytics anonymously tracks how visitors interact with a website, including where they came from, what they did on a site, and whether they completed any of the site's conversion goals.
Analytics also keeps track of your e-commerce data, and combines this with campaign and conversion information to provide insight into the performance of your advertising campaigns.
Google Analytics data would tell Google what people are doing on other websites: how often they visit a site, where they came from, and what they do when they get there. It could be quite useful for determining the relevance of sites and how people transition between sites.
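To make the "how people transition between sites" idea concrete, here is a minimal sketch, assuming each tracked visit records the site visited and the site the visitor came from. The `Visit` record and `transition_counts` helper are hypothetical illustrations, not Google's schema or Google's method; they just count site-to-site moves from referrer data.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Visit:
    """One hypothetical tracked visit (field names are assumptions)."""
    site: str                     # site the visitor landed on
    referrer_site: Optional[str]  # site they came from, or None for a direct visit


def transition_counts(visits: Iterable[Visit]) -> Counter:
    """Count (from_site, to_site) pairs, skipping direct and same-site visits."""
    counts: Counter = Counter()
    for v in visits:
        if v.referrer_site is not None and v.referrer_site != v.site:
            counts[(v.referrer_site, v.site)] += 1
    return counts


if __name__ == "__main__":
    log = [
        Visit("example.org", "news.example.com"),
        Visit("example.org", "news.example.com"),
        Visit("shop.example.net", "example.org"),
        Visit("example.org", None),           # direct visit, ignored
        Visit("example.org", "example.org"),  # internal navigation, ignored
    ]
    for (src, dst), n in transition_counts(log).most_common():
        print(f"{src} -> {dst}: {n}")
```

Aggregated over many sites, counts like these form a weighted graph of how traffic flows across the web, which is the kind of signal that could feed into judging a site's relevance.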
See also my previous posts, "Google Personalized Search and Bigtable" and "Google Bigtable paper".