Digital Archiving

11/7/2024 11:42 am

Even before this election, I had been thinking about digital archiving. Decades ago I had an idea for a tool that would download my web history and maintain a local archive, so I could easily find and recover things I had come across. I am very prone to the "I know I saw something about this recently..." feeling, followed by googling and searching my history to figure out where I had seen it.

I never followed through on that project myself, for a few reasons. The primary reason was that the search tools we had were strong enough the majority of the time.

Now I'm thinking about digital archiving again, for the same reasons that have been on my mind recently: the threat of content going away, and wanting an ongoing resource in case I don't have Internet access. I started playing around with Archivebox yesterday. Archivebox is very close to what I envisioned with Datacomb, short of the automation process. It seems very interesting and robust; I just need to figure out how I would use it, and whether it would build off of my self-hosted Wallabag (a Pocket-like reader app that grabs articles for offline reading).

I also have a homebrew Python app I created that I called 'Wikindle.' It downloads articles from Wikipedia and converts them into Markdown, though it doesn't download any images. The idea is to eventually get an e-reader that can store everything it downloads (which isn't the entirety of Wikipedia). As of last night's run, that was roughly 300 MB of text, though there are still a lot of articles I want to filter out.
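
For anyone curious what the "convert to Markdown, skip the images" step might look like, here is a rough sketch. It is not Wikindle's actual code: it assumes the article HTML has already been downloaded, it only handles a handful of tags using Python's standard `html.parser`, and the names `MarkdownSketch` and `html_to_markdown` are made up for illustration.

```python
from html.parser import HTMLParser


class MarkdownSketch(HTMLParser):
    """Hypothetical converter: a small subset of HTML to Markdown, no images."""

    HEADINGS = {"h1": "# ", "h2": "## ", "h3": "### "}

    def __init__(self):
        super().__init__()
        self.parts = []   # pieces of Markdown output, joined at the end
        self.href = None  # target of the link currently being read, if any

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self.parts.append(self.HEADINGS[tag])
        elif tag == "li":
            self.parts.append("- ")
        elif tag in ("b", "strong"):
            self.parts.append("**")
        elif tag in ("i", "em"):
            self.parts.append("*")
        elif tag == "a":
            self.href = dict(attrs).get("href")
            self.parts.append("[")
        # <img> and every other tag fall through and emit nothing.

    def handle_endtag(self, tag):
        if tag in self.HEADINGS or tag == "p":
            self.parts.append("\n\n")
        elif tag == "li":
            self.parts.append("\n")
        elif tag in ("b", "strong"):
            self.parts.append("**")
        elif tag in ("i", "em"):
            self.parts.append("*")
        elif tag == "a":
            self.parts.append(f"]({self.href or ''})")
            self.href = None

    def handle_data(self, data):
        # Plain text between tags is copied through unchanged.
        self.parts.append(data)


def html_to_markdown(html):
    """Return Markdown text for an HTML fragment (assumed already fetched)."""
    parser = MarkdownSketch()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()


sample = (
    "<h1>Title</h1>"
    '<p>Some <b>bold</b> text with a <a href="https://example.org">link</a>.'
    '<img src="x.png"></p>'
)
markdown = html_to_markdown(sample)
```

Real Wikipedia pages have tables, infoboxes, and references that a sketch like this ignores, so a proper conversion library would probably be a better fit for anything beyond experimenting.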