Dyspeptic Mutterings: The Internet is NOT an archive.

Friday, September 17, 2021

Thanks to Amy Welborn for a link to this:

The internet does not preserve knowledge--it was not designed to, so it does not.

Enterprising students designed web crawlers to automatically follow and record every single link they could find, and then follow every link at the end of that link, and then build a concordance that would allow people to search across a seamless whole, creating search engines returning the top 10 hits for a word or phrase among, today, more than 100 trillion possible pages. As Google puts it, “The web is like an ever-growing library with billions of books and no central filing system.”

Now, I just quoted from Google’s corporate website, and I used a hyperlink so you can see my source. Sourcing is the glue that holds humanity’s knowledge together. It’s what allows you to learn more about what’s only briefly mentioned in an article like this one, and for others to double-check the facts as I represent them to be. The link I used points to https://www.google.com/search/howsearchworks/crawling-indexing/. Suppose Google were to change what’s on that page, or reorganize its website anytime between when I’m writing this article and when you’re reading it, eliminating it entirely. Changing what’s there would be an example of content drift; eliminating it entirely is known as link rot.

It turns out that link rot and content drift are endemic to the web, which is both unsurprising and shockingly risky for a library that has “billions of books and no central filing system.” Imagine if libraries didn’t exist and there was only a “sharing economy” for physical books: People could register what books they happened to have at home, and then others who wanted them could visit and peruse them. It’s no surprise that such a system could fall out of date, with books no longer where they were advertised to be—especially if someone reported a book being in someone else’s home in 2015, and then an interested reader saw that 2015 report in 2021 and tried to visit the original home mentioned as holding it. That’s what we have right now on the web.

. . .

The first study, with Kendra Albert and Larry Lessig, focused on documents meant to endure indefinitely: links within scholarly papers, as found in the Harvard Law Review, and judicial opinions of the Supreme Court. We found that 50 percent of the links embedded in Court opinions since 1996, when the first hyperlink was used, no longer worked. And 75 percent of the links in the Harvard Law Review no longer worked.

People tend to overlook the decay of the modern web, when in fact these numbers are extraordinary—they represent a comprehensive breakdown in the chain of custody for facts. Libraries exist, and they still have books in them, but they aren’t stewarding a huge percentage of the information that people are linking to, including within formal, legal documents. No one is. The flexibility of the web—the very feature that makes it work, that had it eclipse CompuServe and other centrally organized networks—diffuses responsibility for this core societal function.

I have seen it happen here: I have block-quoted from news stories and other links that no longer exist. But for the block quotes, those pages might as well never have existed.

If you want it to last, print it. There is no other option.

Otherwise, it could inadvertently (?) go down the memory hole.

Speaking of which, your daily reminder that Big Tech is only the friend of shareholders and congressbeings:

Similarly, books are now often purchased on Kindles, which are the Hotel Californias of digital devices: They enter but can’t be extracted, except by Amazon. Purchased books can be involuntarily zapped by Amazon, which has been known to do so, refunding the original purchase price. For example, 10 years ago, a third-party bookseller offered a well-known book in Kindle format on Amazon for 99 cents a copy, mistakenly thinking it was no longer under copyright. Once the error was noted, Amazon—in something of a panic—reached into every Kindle that had downloaded the book and deleted it. The book was, fittingly enough, George Orwell’s 1984. (You don’t have 1984. In fact, you never had 1984. There is no such book as 1984.)

At the time, the incident was seen as evocative but not truly worrisome; after all, plenty of physical copies of 1984 were available. Today, as both individual and library book buying shifts from physical to digital, a de-platforming of a Kindle book—including a retroactive one—can carry much more weight.

Physical copies of media, people. There are no substitutes.

Anyway, read the whole thing--it's the unpleasant reminder you need.

