The promise of the API looks attractive at first. Google says:
"The University Research Program for Google Search is designed to give university faculty and their research teams high-volume programmatic access to Google Search."
"Our aim is to help bootstrap web research ... [Using] Google's search technology, you'll no longer have to operate your own crawl and indexing systems."
However, the limits on how the API can be used may be a problem. From the documentation:
"Requests to the service MUST be throttled ... A time period of at least one second must be allowed between requests."
This makes some interesting types of research impossible with this API, namely anything that needs to fire off many queries quickly.
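To make the one-second rule concrete, here is a minimal sketch of what a compliant client has to do: before each request, wait until at least the minimum interval has passed since the previous one. The `Throttle` class, `run_queries`, and the `send` callable are hypothetical names for illustration, not part of Google's API.

```python
import time


class Throttle:
    """Enforce a minimum gap between successive requests.

    Hypothetical helper: the one-second default mirrors the documented rule,
    but nothing here is part of Google's API.
    """

    def __init__(self, min_interval=1.0):
        self.min_interval = min_interval
        self._last = None  # time.monotonic() of the previous request, if any

    def wait(self):
        """Block until min_interval seconds have passed since the last request."""
        now = time.monotonic()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()


def run_queries(queries, send, throttle):
    """Send queries one at a time through a hypothetical `send` callable."""
    results = []
    for q in queries:
        throttle.wait()  # honor the minimum interval before every request
        results.append(send(q))
    return results
```

Because every request passes through `wait()`, n queries can never complete in less than about (n - 1) times the interval, no matter how fast the search engine itself responds.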
For example, let's say I am working on a technique for query expansion, so I want results not only for the search given, but also for tens of other related searches, which I will then combine. With a one-second delay between queries, my research prototype will take tens of seconds to respond, far too slow to be interactive.
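The "tens of seconds" figure follows from simple arithmetic. The sketch below computes a lower bound on wall-clock time for n throttled queries sent one after another; the 0.2-second per-query response time is an assumed placeholder, not a measured number.

```python
def throttled_latency(n_queries, min_interval=1.0, per_query=0.2):
    """Lower bound on wall-clock seconds for n sequential, throttled queries.

    Assumptions (illustrative only): each response takes `per_query` seconds,
    we wait for each response before sending the next request, and request
    starts must be at least `min_interval` seconds apart.
    """
    if n_queries <= 0:
        return 0.0
    gap = max(min_interval, per_query)  # time between consecutive request starts
    return (n_queries - 1) * gap + per_query  # last request's start plus its response time


print(f"30 throttled queries: {throttled_latency(30):.1f} s")
print(f"30 unthrottled queries: {throttled_latency(30, min_interval=0.0):.1f} s")
```

Under these assumptions, thirty expansion queries take about 29 seconds when throttled, versus about 6 seconds sequentially without the throttle (and roughly the time of a single query if they could be sent in parallel).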
Nor can I experiment with natural language analysis for question answering, where I first get the results for the search given, analyze them, and then fire off dozens of additional queries to learn more about what I found in those results.
I cannot even use, as part of the analysis, the conditional probability of finding two words together on the Web versus finding them apart, since each such estimate requires two queries to the search engine and an analysis might need many of them.
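One common way to estimate such a conditional probability is from page hit counts: P(y | x) is approximately hits("x y") / hits("x"), which needs one query for x alone and one for the pair. The sketch below uses hypothetical function names and made-up counts; it also counts how many distinct queries are needed to score every pair in a small word list, assuming results are cached so no query repeats.

```python
def conditional_prob(hits_xy, hits_x):
    """Estimate P(y | x) as (pages containing x and y) / (pages containing x)."""
    if hits_x == 0:
        return 0.0
    return hits_xy / hits_x


def queries_for_all_pairs(words):
    """Distinct queries needed for hit counts over every unordered word pair.

    One query per distinct word, plus one per unordered pair, assuming
    every result is cached and reused.
    """
    n = len(set(words))
    return n + n * (n - 1) // 2


def min_throttled_seconds(n_queries, min_interval=1.0):
    """Minimum time to issue n queries when requests must be spaced apart."""
    return max(0, n_queries - 1) * min_interval
```

With 20 candidate words, for example, `queries_for_all_pairs` gives 20 + 190 = 210 queries, so the one-second throttle alone adds at least 209 seconds before any analysis can begin.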
It is good that Google is making tools available to researchers, but they may have to go further than a throttled search API. As is, many researchers trying to work at large scale will still have to build their own crawls and indexes.
By the way, it is not entirely fair to pick on Google's search API here. Other search APIs -- including the Yahoo Search API, the Microsoft Live Search API, and the Alexa Web Search Platform -- either have a fee or similar throttling.