When I worked at Amazon, a lot of effort went into recognizing when two items in the catalog were actually the same item. That was called item authority.
I was recently browsing around on YouTube and noticed how badly the site deals with multiple copies of the same content. For example, look at the related videos for Weird Al's video "White & Nerdy".
The first four are all copies of the same video. They are not "related"; they are the same video.
Of the first ten videos in that list, only three are unique. The others are all duplicates.
This problem is not unique to YouTube. On Google Video, eight of the top ten "related" videos for "White & Nerdy" are identical copies of the Weird Al music video.
The point of showing me related content is to help me discover new and interesting content. Showing identical copies of the same video I just watched is not useful to me.
What would be useful is helping me find other interesting videos. At a minimum, you could screen out duplicates and then show other Weird Al videos; that would be useful, if a bit obvious. Alternatively, you could show videos that interest people who liked "White & Nerdy", using other customers' actions to help me find interesting content.
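As a rough sketch of those two ideas, here is how the filtering and the "people who liked this also liked" ranking might look in Python. Everything here is hypothetical: the `content_id` field assumes some upstream step (something like item authority) has already recognized which uploads are copies of the same video, and `likes_by_user` is a made-up stand-in for whatever customer behavior a site actually records. It is an illustration, not a description of how YouTube, Google Video, or Amazon do it.

```python
from collections import Counter


def unique_related(watched_content, candidates):
    """Drop copies of the watched video and repeated copies of anything else.

    `candidates` is a list of dicts with a hypothetical "content_id" key that
    is assumed to be identical for all uploads of the same underlying video.
    """
    seen = {watched_content}
    result = []
    for video in candidates:
        content = video["content_id"]
        if content not in seen:
            seen.add(content)
            result.append(video)
    return result


def also_liked(watched_content, likes_by_user, top_n=5):
    """Rank content by how many fans of the watched video also liked it.

    `likes_by_user` maps a user id to the set of content ids that user liked
    (an assumed data shape, chosen only to keep the example small).
    """
    counts = Counter()
    for liked in likes_by_user.values():
        if watched_content in liked:
            counts.update(liked - {watched_content})
    return [content for content, _ in counts.most_common(top_n)]
```

With a related list that contains two copies of "White & Nerdy" and two copies of another video, `unique_related` would keep one entry for the other video and none for the one I just watched. `also_liked` counts only the likes of people who liked the watched video, which is the simplest form of using other customers' actions; a real system would need to handle popularity bias and sparse data.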
Crawling the world's information is not enough. You need to make that information useful. You must help people find relevant information, the information they need.