I attended the talk. I was hoping for details on Powerset's technology and a live demo, but, unfortunately, the talk was much higher level than that. It mostly covered motivation for natural language search and why the market timing was right. I have to say, it had the feel of an investor pitch.
The most compelling part of the talk for me was when Barney was talking about the value of NLP for extracting additional information from a small data set. For example, Barney compared the performance of Powerset's alpha product running over Wikipedia with Google limited to searching over Wikipedia on several questions (e.g. "Who did IBM acquire in 2003?" and "When did Katrina strike Biloxi?").
On the one hand, these examples might not be fair to Google, since Google gains its power from its massive index; restricting it to Wikipedia cripples it by not allowing it to reach far and wide to answer questions. On the other hand, there are many applications where all that is available is a small data set (e.g. newspapers, health, product catalogs), and in those problems there is considerable value in maximizing your understanding of that data.
The least compelling part for me was the hyping of the technology Powerset licensed from Xerox PARC, especially when Barney appeared to suggest that this technology means NLP is largely a solved problem:
The fundamental problems we were really worried about -- you know, problems like how do you deal with ambiguity, how do you deal with open vocabulary, how can you be robust in the face of noise and erroneous things, how can you be applied to multiple languages and these kind of things, how can you be computationally efficient at all -- took a really long time and, while they are not just all completely done, the fundamental challenges that they had seen for all that time were basically resolved.

It would be nice if the fundamental challenges in NLP were basically resolved, but I do not believe that is the case.
I do agree with the motivation behind Powerset. Especially for verticals, better understanding of smaller data sets would be useful.
I also agree that bloating indexes with data summarizing NLP extractions is a promising approach, despite the 100x longer index build times and 10x larger index sizes that Barney said may be required. Computers are more powerful and massive clusters are becoming cheaper to acquire. The computational power to do these tasks is at hand.
I am not sure I agree with Barney when he said a linguistics approach to NLP is more likely to bear fruit than a statistical approach. More thoughts on that in my previous post, "Better understanding through big data".
I also have to say I was confused at several points in Barney's talk about whether Powerset was seeking better question answering or attempting something bigger. Some of his examples seemed like they would require not only understanding query intent and the information on a single web page, but also understanding, synthesizing, and combining noisy and possibly conflicting data from multiple sources. The latter is a much harder problem, but Barney seemed to be suggesting that Powerset was taking it on.
In the end, the talk did not address my concern that Powerset is overpromising in the press and is likely to underdeliver. What I would really like to do is play with a live Powerset demo, perhaps Powerset powering Wikipedia search or the search for a major newspaper, and see more details behind the technology. For now, I remain worried that the pitch is running far ahead of the product.
Update: Six months later, Powerset has a management shakeup, losing its COO and having its CEO, Barney Pell, step down to CTO due to a "slip in the company's delivery date of its product."