TrickJarrett.com

Friday, June 23rd, 2023


Update on last night's 'Wikindle' code

6/23/2023 6:48 am | 1 min. read

Code ran without issue, though the formatting was slightly off.

This morning I hammered out the code to grab yesterday's 100 most popular entries and add them to the archive. I'm not sure how useful that will actually be, since a lot of those entries are pop culture (am I really going to need an entry about the new Kraven Marvel series?), but we'll see. It isn't as if this is a major space hog.
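For the top-100 step, here's a minimal sketch of how yesterday's most-viewed pages could be pulled. It assumes the public Wikimedia pageviews "top" REST endpoint, which the post doesn't name, so this isn't necessarily what Wikindle does. The skip list of non-article pages is a hypothetical filter, and the network call is kept separate so the parsing can be tried on a saved response.

```python
import datetime
import json
import urllib.request

# Assumed source: Wikimedia REST pageviews "top" endpoint (one day's most-viewed pages).
TOP_URL = ("https://wikimedia.org/api/rest_v1/metrics/pageviews/top/"
           "en.wikipedia/all-access/{y}/{m:02d}/{d:02d}")

# Hypothetical filter: non-article pages that tend to rank highly.
SKIP_PREFIXES = ("Special:", "Wikipedia:", "Portal:", "File:", "Help:")
SKIP_TITLES = {"Main_Page"}


def top_url(day: datetime.date) -> str:
    """Build the request URL for one day's most-viewed pages."""
    return TOP_URL.format(y=day.year, m=day.month, d=day.day)


def top_articles(payload: dict, limit: int = 100) -> list[str]:
    """Return up to `limit` article titles from a 'top' response, in rank order."""
    ranked = payload["items"][0]["articles"]
    titles = []
    for entry in sorted(ranked, key=lambda e: e["rank"]):
        title = entry["article"]
        if title in SKIP_TITLES or title.startswith(SKIP_PREFIXES):
            continue  # not a real article; don't archive it
        titles.append(title)
        if len(titles) == limit:
            break
    return titles


def fetch_yesterdays_top(limit: int = 100) -> list[str]:
    """Network call: download yesterday's ranking and extract article titles."""
    yesterday = datetime.date.today() - datetime.timedelta(days=1)
    req = urllib.request.Request(top_url(yesterday),
                                 headers={"User-Agent": "wikindle-sketch/0.1"})
    with urllib.request.urlopen(req) as resp:
        return top_articles(json.load(resp), limit)
```

The ranking includes pages like the Main Page, which is why the filter exists; a given day's data may also not be published immediately after the day ends.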

Last night's run snagged roughly 8,000 entries, which took up 250 megs. Plenty of space.
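A quick back-of-the-envelope check of "plenty of space", treating 1 MB as 1,000 KB:

$$
\frac{250\ \text{MB}}{8{,}000\ \text{articles}} = \frac{250{,}000\ \text{KB}}{8{,}000} \approx 31\ \text{KB per article}
$$

At that average, ten times as many articles would still come to only about 2.5 GB, though longer articles would push the average up.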

One thing lost in this process is any cross-linking. I'd love to go back and add that in, or better yet, figure out how to keep it from being stripped out in the first place. We'll see. In any case, it's a fun diversion.

Tags: programming, project, wikipedia, python

James Cameron on OceanGate


An excellent breakdown and explanation of it all, and of the response to it. I'm sorry for those who lost their lives, but it also seems pretty dumb to do something so dangerous without properly researching the company running the dives.

The soundbite that deserves to be repeated over and over, because hubris is a killer:

Now there's one wreck lying next to the other wreck, for the same damn reason.

Tags: james cameron, titanic

And so the weekend begins

6/23/2023 5:55 pm | 5 min. read

After a good, productive workday, the wife and I headed to our local plant nursery and bought some new plants for the yard, both flowering and fruiting. From there we came home and did some gardening before turning to other chores.

I gave the car a light cleanout and moved some stuff around, and am now taking a breather before starting dinner soon.

I've also been fiddling more with Wikindle, and I've solved the problem of finding new articles to download. It can now take a list of page categories and pull every article in each of them. The goal is not to recreate Wikipedia on my local machine, but I do want my corpus to be large enough to cover the "normal" things people look up. I also don't want bad articles, so for now I'm limiting it to categories that Wikipedia maintains for quality.
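Here's a minimal sketch of the category step, assuming the MediaWiki Action API's `list=categorymembers` query (the post doesn't say which interface Wikindle uses). The helper names are hypothetical, and `fetch` is injectable so the continuation handling can be exercised offline. Collecting titles into a set also means an article that sits in two of the categories is only downloaded once.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterable

API = "https://en.wikipedia.org/w/api.php"


def http_get_json(params: dict) -> dict:
    """Default fetcher: GET the MediaWiki Action API and decode the JSON reply."""
    url = API + "?" + urllib.parse.urlencode(params)
    req = urllib.request.Request(url, headers={"User-Agent": "wikindle-sketch/0.1"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def category_articles(category: str,
                      fetch: Callable[[dict], dict] = http_get_json) -> list[str]:
    """List article titles (namespace 0) in one category, following continuation."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": f"Category:{category}",
        "cmnamespace": 0,  # main/article namespace only
        "cmlimit": 500,    # page size per request
        "format": "json",
    }
    titles = []
    while True:
        data = fetch(params)
        titles.extend(m["title"] for m in data["query"]["categorymembers"])
        if "continue" not in data:
            return titles
        # Merge the whole "continue" object into the next request's parameters.
        params = {**params, **data["continue"]}


def corpus_titles(categories: Iterable[str], fetch=http_get_json) -> set[str]:
    """Union of several categories; an article in two categories is kept once."""
    found = set()
    for cat in categories:
        found.update(category_articles(cat, fetch))
    return found
```

Usage would look like `corpus_titles(["Featured articles", "Good articles"])`, where those two category names are only illustrations of quality-maintained categories, not necessarily the four Wikindle pulls from.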

As I write this, it's in the middle of the pull. We've ballooned from this morning's 8,000 to almost 55,000.

It's currently pulling from four categories to get that number (well, aside from the extra 100 it pulls for being popular).

The download process still needs work. I'm still not getting images from articles, and I know some things are not translating smoothly, especially the math sections.
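For the math problem, one possible approach is to pull the TeX source out of the rendered HTML instead of converting the MathML itself. This sketch assumes the downloaded HTML carries formulas as MathML with an `<annotation encoding="application/x-tex">` element, as Wikipedia's rendered math generally does; the class and function names are hypothetical, and wrapping in `$...$` assumes a Markdown renderer with math support.

```python
from html.parser import HTMLParser


class TexAnnotations(HTMLParser):
    """Collect LaTeX source from <annotation encoding="application/x-tex"> elements."""

    def __init__(self):
        super().__init__()
        self.formulas: list[str] = []
        self._in_tex = False

    def handle_starttag(self, tag, attrs):
        if tag == "annotation" and dict(attrs).get("encoding") == "application/x-tex":
            self._in_tex = True
            self.formulas.append("")  # start a new formula

    def handle_endtag(self, tag):
        if tag == "annotation":
            self._in_tex = False

    def handle_data(self, data):
        if self._in_tex:
            self.formulas[-1] += data  # entities like &lt; arrive already decoded


def tex_formulas(html: str) -> list[str]:
    """Return each formula's TeX in document order, wrapped as inline Markdown math."""
    parser = TexAnnotations()
    parser.feed(html)
    parser.close()
    return ["$" + tex.strip() + "$" for tex in parser.formulas]
```

This only extracts the formulas; splicing them back in place of the MathML (and optionally stripping the `{\displaystyle ...}` wrapper) would be the converter's job.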

The next action items as I see them:

First, figure out images. I'm not sure where they are being filtered out of the text, and once I find that, I need to download them and convert each image tag to Markdown's image syntax.
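The tag-conversion half of that step might look something like this minimal sketch. It only rewrites `<img>` tags into Markdown's `![alt](src)` form; downloading the files is left out. Turning protocol-relative `src` values (`//upload...`) into `https:` URLs is an assumption about the HTML being converted, and matching tags with a regex won't cope with a `>` inside an attribute value.

```python
import re
from html.parser import HTMLParser

IMG_TAG = re.compile(r"<img\b[^>]*>", re.IGNORECASE)


class _ImgAttrs(HTMLParser):
    """Collect the attributes of the first <img> tag fed to it."""

    def __init__(self):
        super().__init__()
        self.attrs = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser also routes self-closing <img ... /> tags here by default.
        if tag == "img" and not self.attrs:
            self.attrs = {k: (v or "") for k, v in attrs}


def _img_to_markdown(match: re.Match) -> str:
    parser = _ImgAttrs()
    parser.feed(match.group(0))
    src = parser.attrs.get("src", "")
    if not src:
        return ""  # nothing usable; drop the tag
    if src.startswith("//"):
        src = "https:" + src  # protocol-relative URL -> absolute
    alt = parser.attrs.get("alt", "").replace("]", "\\]")
    return f"![{alt}]({src})"


def images_to_markdown(html: str) -> str:
    """Replace every <img> tag in an HTML fragment with Markdown image syntax."""
    return IMG_TAG.sub(_img_to_markdown, html)
```

Once images are downloaded, the `src` could be swapped for a local path in the same function.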

Second, I need to dig into the other HTML-to-Markdown conversions and look for other articles with import issues.

Third, I also want to identify categories of articles I don't want. For example, I'm not going to turn to this archive for information about state roads in New Jersey (which are currently in the corpus). So I'll need to add document filtering and a blacklist of articles so they don't get re-added.
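A minimal sketch of the filtering and blacklist idea, assuming titles are checked before download and the blacklist lives in a JSON file between runs. The file name and the New Jersey route pattern are hypothetical examples, not part of Wikindle.

```python
import json
import re
from pathlib import Path

# Hypothetical file holding titles that must never be (re-)downloaded.
BLACKLIST_FILE = Path("wikindle_blacklist.json")

# Hypothetical title patterns for whole families of unwanted articles.
UNWANTED_PATTERNS = [re.compile(r"^New Jersey Route \d+")]


def load_blacklist(path: Path = BLACKLIST_FILE) -> set[str]:
    """Read the saved blacklist, or start empty if none exists yet."""
    return set(json.loads(path.read_text())) if path.exists() else set()


def save_blacklist(titles: set[str], path: Path = BLACKLIST_FILE) -> None:
    """Persist the blacklist (sorted, so the file diffs cleanly)."""
    path.write_text(json.dumps(sorted(titles), indent=2))


def filter_titles(titles, blacklist: set[str], patterns=UNWANTED_PATTERNS):
    """Split titles into (keep, rejected); rejected titles are added to the blacklist."""
    keep, rejected = [], []
    for t in titles:
        if t in blacklist or any(p.search(t) for p in patterns):
            rejected.append(t)
        else:
            keep.append(t)
    blacklist.update(rejected)
    return keep, rejected
```

The idea is to run the category list through `filter_titles` before downloading and call `save_blacklist` at the end of each run, so a rejected article stays out even if a category pull finds it again.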

Fourth, re-add cross-linking via Markdown/wiki text for articles that exist in my Wikindle.
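One way to keep cross-links from being stripped in the first place is to rewrite them during conversion, once the full list of corpus titles is known. This sketch assumes the HTML uses `<a href="/wiki/Title">` internal links and that each article is saved as its own Markdown file named after the title; both are assumptions, and `relink` is a hypothetical helper. Links to articles outside the corpus fall back to plain text.

```python
import re
from urllib.parse import quote, unquote

# Assumed link shape: <a href="/wiki/Some_Title#Section" ...>link text</a>
WIKI_LINK = re.compile(r'<a\b[^>]*?href="/wiki/([^"#]+)(?:#[^"]*)?"[^>]*>(.*?)</a>',
                       re.IGNORECASE | re.DOTALL)


def relink(html: str, corpus: set[str]) -> str:
    """Turn /wiki/ links into local Markdown links when the target is in the corpus.

    Links to articles outside the corpus are reduced to their link text.
    """
    def replace(m: re.Match) -> str:
        title = unquote(m.group(1)).replace("_", " ")
        text = m.group(2)
        if title in corpus:
            # Hypothetical naming scheme: one percent-encoded .md file per article.
            filename = quote(title.replace(" ", "_")) + ".md"
            return f"[{text}]({filename})"
        return text

    return WIKI_LINK.sub(replace, html)
```

Because this needs the corpus up front, it fits naturally after the category pass and before the HTML-to-Markdown conversion. Nested markup inside the link text (such as `<i>`) is passed through untouched here.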

And lastly, once all of this is figured out, I'll need to figure out the whole "putting it on the Kindle" part, or on some other similar long-lasting device. The really nerdy thing would be building my own e-ink device or something. We'll see.

Tags: life, programming

Bejeweled Prayer Book Identified as Belonging to Thomas Cromwell

Normally, I don't care about this sort of thing. But I think it's cool that there's a painting showing it, and that it's said to be the only item in the painting that still survives (Cromwell himself included).

Tags: thomas cromwell

New Book Review Function

6/23/2023 6:22 pm | 1 min. read

The coding bug continues. I've been checking out bookmarks.reviews, a book review website from LitHub.

I'm not sure yet how it will translate to RSS and email. As of now, the answer is: poorly. But I'm going to work on it.

Here's what it currently looks like. I'm working on increasing the thumbnail size.

For those of you visiting in the browser, you'll (at least for now) see the actual embedded implementation:

Monsters by Claire Dederer
Tags: programming, glowbug

An analysis of the near future of Russia and whether Putin will retain control


(Spoilers: He likely will. But it won't be good for him or the country.)

Tags: coup, wagner group, vladimir putin, russia

End of night update: the script is still running, and it's almost done with articles starting with O.

It's been running for probably 15 hours now.

We are up to 927 megs of text data.

I am estimating it will be in U/V when I wake up. We'll see.

Tags: programming, wikindle