I have been following the Cyc project, an attempt to build and use a massive database of common sense knowledge, off and on for a decade or so.
So, I was excited to see a video of Douglas Lenat's recent talk on Cyc at Google, "Computers versus Common Sense". It is long, but it is full of interesting examples and definitely worthwhile if you have any love for geeky AI stuff.
If you are already familiar with the Cyc project, you still might want to check out the segments of the talk at 31:30 and 41:00.
At 31:30, Douglas talks about how Cyc handles partially true statements and conflicting assertions in the database. Rather than using probabilities, the system allows statements to be consistent within local contexts even when they are inconsistent globally. Cyc also almost never uses a formal theorem prover, preferring a large set of much faster reasoning heuristics.
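To make the context idea concrete, here is a minimal sketch of context-scoped assertions. The names (`KnowledgeBase`, `assert_in`, `holds`) are my own illustrative inventions, not Cyc's actual API; the point is only that contradictions are rejected within a context but tolerated across contexts.

```python
class KnowledgeBase:
    """Toy store of assertions, each scoped to a named context."""

    def __init__(self):
        # context name -> set of (proposition, truth_value) pairs
        self.contexts = {}

    def assert_in(self, context, proposition, truth=True):
        facts = self.contexts.setdefault(context, set())
        # Within a single context, reject a direct contradiction.
        if (proposition, not truth) in facts:
            raise ValueError(f"contradiction in context {context!r}: {proposition}")
        facts.add((proposition, truth))

    def holds(self, context, proposition):
        # Truth is only ever evaluated relative to a context.
        facts = self.contexts.get(context, set())
        if (proposition, True) in facts:
            return True
        if (proposition, False) in facts:
            return False
        return None  # unknown in this context


kb = KnowledgeBase()
kb.assert_in("PhysicsContext", "vampires exist", truth=False)
kb.assert_in("DraculaContext", "vampires exist", truth=True)  # fine: different context

print(kb.holds("PhysicsContext", "vampires exist"))  # False
print(kb.holds("DraculaContext", "vampires exist"))  # True
```

Globally the two contexts contradict each other, but no query ever sees both at once, which is the trick that lets the database stay useful without being globally consistent.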
At 41:00, Douglas turns to automatically learning knowledge from the Web. He argues that understanding the natural language text on the Web requires starting with a large hand-coded database of common sense knowledge. Once that seed database is built manually, it can be used to automatically extract additional knowledge from the Web.
On the need for a manually constructed seed database, I suspect fans of statistical NLP might be quick to disagree. But Douglas did offer some compelling cases that would trip up statistical techniques. For example, no one on the Web ever writes that water runs downhill, but people do write that water runs uphill (as a metaphor).
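The bootstrapping idea can be sketched in a few lines: hand-coded seed facts supply textual patterns, which then harvest new candidate facts from raw text. Everything here is illustrative and hypothetical, not Cyc's actual extraction pipeline, and a real system would still need the seed knowledge to filter metaphorical matches like "water runs uphill".

```python
import re

# Hand-coded seed knowledge: (subject, predicate) pairs.
seed_facts = {("water", "runs downhill")}


def extract_candidates(text, facts):
    """For each known predicate, find new subjects that share it."""
    candidates = set()
    for _, predicate in facts:
        # Build a pattern that captures a new subject for a known predicate.
        pattern = re.compile(r"(\w+)\s+" + re.escape(predicate), re.IGNORECASE)
        for match in pattern.finditer(text):
            candidates.add((match.group(1).lower(), predicate))
    return candidates


corpus = "Lava runs downhill from the crater, and rainwater runs downhill too."
print(extract_candidates(corpus, seed_facts))
# includes ("lava", "runs downhill") and ("rainwater", "runs downhill")
```

This is the optimistic half of the loop; the hard part Douglas emphasizes is that deciding which candidates are literal common sense (rather than metaphor or error) is exactly where the seed knowledge earns its keep.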
If you are interested in more, you might check out OpenCyc and the Cyc publications.
See also the Verbosity project (PDF) that I mentioned in an earlier post.
See also my previous post, "AI and the future of search".