Update on last night's 'Wikindle' code
Code ran without issue, though the formatting was slightly off.
This morning I hammered out the code to grab yesterday's top 100 most popular entries and add them to the archive. I'm not sure how useful that will actually be, since a lot of those entries are pop culture (am I really going to need an entry about the new Kraven Marvel series?), but we'll see. It isn't as if this is a major space hog.
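For anyone curious how a "top 100 from yesterday" step can work: Wikimedia publishes daily most-viewed lists through its public pageviews REST API. This is a minimal sketch, not the actual Wikindle code. It assumes the `top` endpoint for en.wikipedia, and the skip list of non-article pages and the placeholder User-Agent string are my own illustrative choices. A given day's list may not be published until some hours after that day ends.

```python
import json
from datetime import date, timedelta
from urllib.request import Request, urlopen

# Wikimedia pageviews API: most-viewed articles for one day on one project.
TOP_URL = ("https://wikimedia.org/api/rest_v1/metrics/pageviews/top/"
           "en.wikipedia/all-access/{y}/{m:02d}/{d:02d}")

# Illustrative skip list: non-article pages that often appear in the rankings.
SKIP_TITLES = {"Main_Page"}
SKIP_PREFIXES = ("Special:", "Wikipedia:", "Portal:", "File:", "Help:")


def top_articles_url(day: date) -> str:
    return TOP_URL.format(y=day.year, m=day.month, d=day.day)


def parse_top_articles(payload: dict, limit: int = 100) -> list:
    """Return up to `limit` titles in rank order, skipping non-article pages."""
    ranked = sorted(payload["items"][0]["articles"], key=lambda a: a["rank"])
    titles = [a["article"] for a in ranked
              if a["article"] not in SKIP_TITLES
              and not a["article"].startswith(SKIP_PREFIXES)]
    return titles[:limit]


def fetch_yesterdays_top(limit: int = 100) -> list:
    day = date.today() - timedelta(days=1)
    # Wikimedia asks API clients to identify themselves with a User-Agent.
    req = Request(top_articles_url(day),
                  headers={"User-Agent": "wikindle-sketch/0.1 (placeholder)"})
    with urlopen(req) as resp:
        return parse_top_articles(json.load(resp), limit)
```

Keeping the parsing separate from the HTTP call makes it easy to test against a saved response.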
Last night's run snagged roughly 8,000 entries, which took up 250 megs. Plenty of space.
One thing lost in this process is cross-linking. I'd love to go back and add it in, or better yet, figure out how to keep the links from being stripped out in the first place. We'll see. In any case, it's a fun diversion.
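One possible approach, assuming the raw article HTML still contains its links before the Markdown conversion: rewrite each internal `/wiki/` link as a Markdown link when the target is already in the archive, and unwrap it to plain text otherwise. The `relink` function, the one-`.md`-file-per-title layout, and the regex are all hypothetical. A regex is a shortcut here; a real HTML parser would be sturdier.

```python
import re
from urllib.parse import unquote

# Internal article links in rendered Wikipedia HTML, e.g.
#   <a href="/wiki/Alan_Turing" title="Alan Turing">Turing</a>
# Parsoid-style HTML uses href="./Alan_Turing" instead, so both are accepted.
WIKI_LINK = re.compile(
    r'<a\s[^>]*?href="(?:/wiki/|\./)([^"#?]+)[^"]*"[^>]*>(.*?)</a>', re.S)


def relink(html: str, corpus: set) -> str:
    """Link to articles that exist in the archive; unwrap all other links."""
    def replace(m: re.Match) -> str:
        target = unquote(m.group(1))   # decode %xx escapes, e.g. Caf%C3%A9 -> Café
        text = m.group(2)
        if target in corpus:
            # Hypothetical layout: one Markdown file per title, same folder.
            return f"[{text}]({target}.md)"
        return text                    # not archived: keep the words, drop the link
    return WIKI_LINK.sub(replace, html)
```

Because the check runs against the current corpus, re-running it after each new pull would pick up links to newly added articles.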
James Cameron on Oceangate
An excellent breakdown and explanation of it all, and of the response to it. I'm sorry for those who lost their lives, but it also feels pretty dumb to do something so dangerous without properly researching the company running it.
The soundbite that deserves to be repeated over and over (hubris is a killer):
Now there's one wreck lying next to the other wreck, for the same damn reason.
And so the weekend begins
After a good, productive workday, the wife and I headed to our local plant nursery and bought some new plants for the yard, both flowering and fruiting. From there we came home and did some gardening before turning to other chores.
I gave the car a light cleanout and moved some stuff around, and am now taking a breather before starting dinner soon.
I've also been fiddling more with Wikindle, and I've solved the problem of finding new articles to download. It can now take in a list of page categories and pull every article in each of them. The goal is not to recreate Wikipedia on my local machine, but I do want my corpus to be large enough to cover the "normal" things people look up. I also don't want bad articles, so for now I'm limiting it to categories that Wikipedia maintains for quality.
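Pulling every article in a category maps onto the MediaWiki API's `list=categorymembers` query, which returns results in batches and hands back a `continue` object for requesting the next batch. A minimal sketch, assuming English Wikipedia and only the standard library; the `fetch` hook exists purely so the loop can be exercised without a network, and the User-Agent string is a placeholder.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API = "https://en.wikipedia.org/w/api.php"


def category_members(category, fetch=None, page_size=500, member_type="page"):
    """Yield titles in `category`, following the API's continuation tokens."""
    fetch = fetch or _http_get_json
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,       # e.g. "Category:Featured_articles"
        "cmtype": member_type,     # "page" skips subcategories; "subcat" lists them
        "cmlimit": page_size,      # 500 is the cap for ordinary (non-bot) clients
        "format": "json",
    }
    while True:
        data = fetch(params)
        for member in data["query"]["categorymembers"]:
            yield member["title"]
        if "continue" not in data:
            break
        # The API returns the exact extra parameters to send for the next batch.
        params = {**params, **data["continue"]}


def _http_get_json(params):
    req = Request(f"{API}?{urlencode(params)}",
                  headers={"User-Agent": "wikindle-sketch/0.1 (placeholder)"})
    with urlopen(req) as resp:
        return json.load(resp)
```

Usage would look like `titles = list(category_members("Category:Good_articles"))`.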
As I write this, the pull is in progress. We've ballooned from 8,000 articles this morning to almost 55,000.
It's pulling from four categories to get that number (well, aside from the extra 100 it pulls for being popular):
- Category:Wikipedia level-4 vital articles
- Category:Good_articles
- Category:Main_topic_classifications
- Category:Featured_articles
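A rough storage check using the numbers from earlier: 250 MB for about 8,000 articles works out to roughly 31 KB per article. Assuming the category pulls average about the same size (they may not), 55,000 articles would land around 1.7 GB. The helper below is just that linear extrapolation, using decimal units.

```python
def projected_size_gb(sample_mb: float, sample_articles: int, target_articles: int) -> float:
    """Linear estimate; assumes new articles average the same size as the sample."""
    per_article_mb = sample_mb / sample_articles
    return per_article_mb * target_articles / 1000   # decimal units: 1 GB = 1000 MB


per_article_kb = 250 / 8000 * 1000                  # 31.25 KB per article
estimate_gb = projected_size_gb(250, 8000, 55000)   # 1.71875 GB
```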
The download process still needs work. I'm still not getting images from articles, and I know some things aren't translating smoothly, especially in the math sections.
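On the math problem: Wikipedia's rendered HTML generally ships each formula as MathML with the original TeX stored in an `<annotation encoding="application/x-tex">` element, alongside a fallback image. Assuming the pages arrive in that shape, one option is a hypothetical pre-processing pass that swaps each `<math>` element for its TeX wrapped in `$...$` before the HTML-to-Markdown step. Depending on the converter, characters like `_` and `*` inside the TeX may still need protecting from Markdown escaping.

```python
import re

# Assumed shape of a formula in Wikipedia's rendered HTML (simplified):
#   <span class="mwe-math-element"><math ...><semantics>...
#     <annotation encoding="application/x-tex">{\displaystyle ...}</annotation>
#   </semantics></math><img class="mwe-math-fallback-image-inline" ...></span>
MATH = re.compile(r"<math\b.*?</math>", re.S)
TEX = re.compile(r'<annotation encoding="application/x-tex">(.*?)</annotation>', re.S)
FALLBACK_IMG = re.compile(r'<img\b[^>]*class="[^"]*mwe-math-fallback-image[^"]*"[^>]*>')


def math_to_tex(fragment: str) -> str:
    """Replace MathML with $TeX$ and drop the fallback images."""
    def replace(m: re.Match) -> str:
        found = TEX.search(m.group(0))
        if found is None:
            return m.group(0)   # unexpected markup: leave it alone
        # Left HTML-escaped on purpose (e.g. &lt;), assuming the
        # HTML-to-Markdown converter that runs next decodes entities.
        return f"${found.group(1).strip()}$"
    return FALLBACK_IMG.sub("", MATH.sub(replace, fragment))
```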
The next action items as I see them:
First, figure out images. I'm not sure where they're being filtered out of the text; once I find that, I need to pull the images down and convert each tag to standard Markdown image syntax.
Second, I need to dig into the other HTML-to-Markdown conversions and look for other problem articles or import issues.
Third, I want to identify categories of articles I don't want. For example, I'm not going to turn to this archive for information about state roads in New Jersey (which are currently in the corpus). So I'll need to add document filtering and a blacklist of articles so they don't get re-added.
Fourth, re-add cross-linking (via Markdown or wiki-style links) for articles that exist in my Wikindle.
And lastly, once this is all figured out, I'll need to tackle the whole "putting it on the Kindle" part, or on some other similar long-lasting device. The truly nerdy move would be building my own e-ink device or something. We'll see.
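Items one and three above could both start as small passes over each article. This sketch assumes images survive in the HTML as `<img>` tags with protocol-relative `//upload.wikimedia.org/...` sources; it rewrites each one to Markdown's `![alt](path)` syntax and collects the URLs to download separately. `should_keep` checks a title against a saved blacklist and against the article's categories (which the API can return via `prop=categories`). The `media/` folder, the function names, and the New Jersey pattern are illustrative only.

```python
import re
from pathlib import PurePosixPath
from urllib.parse import unquote, urljoin, urlparse

IMG = re.compile(r"<img\b[^>]*>")
ATTR = re.compile(r'([\w-]+)="([^"]*)"')


def images_to_markdown(fragment: str, media_dir: str = "media"):
    """Swap <img> tags for ![alt](path); return (new_fragment, urls_to_download)."""
    downloads = []

    def replace(m: re.Match) -> str:
        attrs = dict(ATTR.findall(m.group(0)))
        src = attrs.get("src")
        if not src:
            return ""
        url = urljoin("https:", src)   # turns //upload... into https://upload...
        name = unquote(PurePosixPath(urlparse(url).path).name)
        downloads.append(url)
        return f"![{attrs.get('alt', '')}]({media_dir}/{name})"

    return IMG.sub(replace, fragment), downloads


# Illustrative filters; real lists would be curated by hand and saved between runs.
BLOCKED_TITLES = set()
BLOCKED_CATEGORY_PATTERNS = [re.compile(r"highways in New Jersey", re.I)]


def should_keep(title: str, categories: list,
                blocked_titles: set = BLOCKED_TITLES,
                patterns: list = BLOCKED_CATEGORY_PATTERNS) -> bool:
    """False if the title is blacklisted or any category matches a blocked pattern."""
    if title in blocked_titles:
        return False
    return not any(p.search(c) for p in patterns for c in categories)
```

Checking the blacklist before downloading, rather than after, would also keep rejected articles from coming back on the next category pull.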
Bejeweled Prayer Book Identified as Belonging to Thomas Cromwell
Normally, I don't care about this sort of thing. But I think it's cool that there's a painting showing it, and that the claim is that it's the only thing in the painting, Cromwell himself included, that still survives.
New Book Review Function
The coding bug continues. I've been checking out bookmarks.reviews, a book review website from LitHub.
I'm not sure yet how it will translate to RSS and email. As of now, the answer is: poorly. But I'm going to work on it.
Here's what it currently looks like. I'm working on increasing the thumbnail size.
For those of you visiting in the browser, at least currently, you'll see the actual embedded code implementation:
An analysis of the near future of Russia and whether Putin will retain control
What is happening in Russia?
— Kamil Galeev (@kamilkazani) June 24, 2023
The mutiny is real. It is also unlikely to succeed. Most probable outcome is:
1. The mutiny fails
2. The regime stands (for a few months)
3. Upon its suppression, regime becomes increasingly dysfunctional -> falls
In other words, Kornilov putsch???? pic.twitter.com/ahczgDqBOW
(Spoilers: He likely will. But it won't be good for him or the country.)