The Wayback Machine is humongous, and getting humongouser. You can’t search it the way you can search the Web, because it’s too big and what’s in there isn’t sorted, or indexed, or catalogued in any of the many ways in which a paper archive is organized; it’s not ordered in any way at all, except by URL and by date. To use it, all you can do is type in a URL, and choose the date for it that you’d like to look at. It’s more like a phone book than like an archive. Also, it’s riddled with errors. One kind is created when the dead Web grabs content from the live Web, sometimes because Web archives often crawl different parts of the same page at different times: text in one year, photographs in another. In October, 2012, if you asked the Wayback Machine to show you what cnn.com looked like on September 3, 2008, it would have shown you a page featuring stories about the 2008 McCain-Obama Presidential race, but the advertisement alongside it would have been for the 2012 Romney-Obama debate. Another problem is that there is no equivalent to what, in a physical archive, is a perfect provenance. Last July, when the computer scientist Michael Nelson tweeted the archived screenshots of Strelkov’s page, a man in St. Petersburg tweeted back, “Yep. Perfect tool to produce ‘evidence’ of any kind.” Kahle is careful on this point. When asked to authenticate a screenshot, he says, “We can say, ‘This is what we know. This is what our records say. This is how we received this information, from which apparent Web site, at this IP address.’ But to actually say that this happened in the past is something that we can’t say, in an ontological way.” Nevertheless, screenshots from Web archives have held up in court, repeatedly. And, as Kahle points out, “They turn out to be much more trustworthy than most of what people try to base court decisions on.”
You can do something more like keyword searching in smaller subject collections, but nothing like Google searching (there is no relevance ranking, for instance), because the tools for doing anything meaningful with Web archives are years behind the tools for creating those archives. Doing research in a paper archive is to doing research in a Web archive as going to a fish market is to being thrown into the middle of an ocean; the only thing they have in common is that both involve fish.
The Web archivists at the British Library had the brilliant idea of bringing in a team of historians to see what they could do with the U.K. Web Archive; it wasn’t all that much, but it was helpful to see what they tried to do, and why it didn’t work.