
Maintaining a Personal Internet Archive

Robin Rendle points out that he can’t leave his website as a legacy when he dies, because the web is inherently ephemeral. Without continued payment for the domain name, a DNS provider, and hosting, websites are lost to time. Only active stewardship allows pages on the web to outlast their creators.

Paper lasts longer, because every copy of a book is canonical. Redundancy beats durability. Of course, every time a webpage is read, many copies are made as it goes from source to destination, but those all exist either in memory buffers or as local cache.[1]

What would it be like if I could self-host a library of web articles I’ve enjoyed? Rather than hyperlinking directly to an article I want to reference, I could link to a reference page on my site with my own copy and from there link back to the original. If the original site goes down, none of the links in my own writing break and readers could still follow my bunny trail.
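
As a rough sketch of what such a reference page could look like, here is a small Python example. Everything in it is an assumption made for illustration: the `/library/` path, the `slugify` and `reference_page` helper names, and the idea that the saved article body is already sanitized HTML. It is not how any particular tool does this.

```python
import html
import re


def slugify(title: str) -> str:
    """Turn an article title into a URL-safe slug (hypothetical helper)."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return slug or "untitled"


def reference_page(title: str, original_url: str, archived_body: str):
    """Return (path, html) for a reference page holding my own copy.

    Assumes archived_body is already-sanitized HTML of the saved article.
    The /library/ prefix is an arbitrary choice for this sketch.
    """
    path = f"/library/{slugify(title)}/"
    safe_url = html.escape(original_url, quote=True)
    page = (
        "<article>\n"
        f"<h1>{html.escape(title)}</h1>\n"
        # Link back to the original so readers can still try the source.
        f'<p>Original: <a href="{safe_url}">{safe_url}</a></p>\n'
        f"{archived_body}\n"
        "</article>\n"
    )
    return path, page
```

My own posts would then link to the returned path (something like `/library/some-article-title/`) instead of the original URL, so the link keeps working even if the source disappears.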

Of course, the whole idea is a non-starter in our current copyright regime. But I’ve been reading good stuff on the web for long enough now to have seen articles I’ve loved disappear and bookmarks wither away thanks to bitrot. I’ve got Pinboard archival covering recent years, but that doesn’t help with stuff that was originally behind a paywall.

For the web to have similar longevity, we need to be librarians of what we read on the web and maintain our own personal archive of sources that matter to us. My strategy (for now) is to use DEVONthink’s Safari extension to save a simplified web archive of every article I bookmark, so at least I have my own redundant library.


  1. Yes, I am aware of The Internet Archive. They’re great! But they don’t index everything and they also let people purge old content. I don’t care if you are embarrassed by your old articles. Writers can’t come take their old books off my shelves. ↩︎