It's absurd how much I enjoy statistics
Mar. 27th, 2024 05:52 pm
One of the nice things about having a local copy of your entire journal is, you can process all the content in absurd ways that your hosting service would frown upon.
For example:
LapLoaf:ljdump garote$ python ljdumptohtml.py
Starting conversion for: garote
Opening local database: garote/journal.db
Fetching all entries from database
Fetching all comments from database
Fetching all icons from database
Fetching all moods from database
Your entries have 2531 image links to 2415 unique destinations.
That's a lot of images. Most of them appear to be thumbnails for photos I took while bicycling. Then there are the images I've re-used the most:
Top 20 most referenced images:
43 uses: http://stat.livejournal.com/img/userinfo.gif
4 uses: http://garote.bdmonkeys.net/livejournal/u5_window.gif
4 uses: http://garote.bdmonkeys.net/livejournal/duck_clock.gif
4 uses: http://garote.bdmonkeys.net/livejournal/bards_tale-pc-guild_indoors.gif
4 uses: http://garote.bdmonkeys.net/livejournal/Screenshot0135.gif
4 uses: http://garote.bdmonkeys.net/livejournal/viewp_gear.gif
3 uses: http://garote.bdmonkeys.net/livejournal/gold_statue.gif
3 uses: http://garote.bdmonkeys.net/livejournal/hacked_maze_ultima.gif
3 uses: http://garote.bdmonkeys.net/livejournal/2006-12-18_22-27-53-PICT0001.jpg
3 uses: http://garote.bdmonkeys.net/livejournal/al-tech-torque_resist.gif
3 uses: http://garote.bdmonkeys.net/livejournal/bb-fact-14.gif
3 uses: http://garote.bdmonkeys.net/livejournal/tje_nerdherd.gif
3 uses: http://garote.bdmonkeys.net/livejournal/sp_ua70.png
3 uses: http://garote.bdmonkeys.net/livejournal/ssi/PIC.DAX_19_7.png
2 uses: http://garote.bdmonkeys.net/livejournal/u5_hut.gif
2 uses: http://garote.bdmonkeys.net/livejournal/you_win.gif
2 uses: http://garote.bdmonkeys.net/livejournal/u5_stones.gif
2 uses: http://garote.bdmonkeys.net/livejournal/tmrpg_ryoohkistat1.gif
2 uses: http://garote.bdmonkeys.net/livejournal/wizardry-creepy_chef.gif
2 uses: http://garote.bdmonkeys.net/livejournal/snes-creepy_book.gif
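For the curious, a tally like the one above can be put together with nothing but the Python standard library. This is a rough sketch, not the actual code in ljdumptohtml.py: the names `ImageLinkCollector` and `count_image_links` are made up for illustration, and it assumes each entry body is available as an HTML string.

```python
from collections import Counter
from html.parser import HTMLParser


class ImageLinkCollector(HTMLParser):
    """Collects the src attribute of every <img> tag in one entry body."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags like <img ... />.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.links.append(src)


def count_image_links(entry_bodies):
    """Return a Counter mapping image URL -> number of references."""
    counts = Counter()
    for body in entry_bodies:
        # A fresh parser per entry, so a broken tag in one entry
        # cannot leak into the next.
        collector = ImageLinkCollector()
        collector.feed(body)
        collector.close()
        counts.update(collector.links)
    return counts


# Hypothetical sample entries, standing in for rows from the journal database.
entries = [
    '<p><img src="http://example.com/a.gif"> some text</p>',
    '<img src="http://example.com/a.gif"/><img src="http://example.com/b.png">',
]
counts = count_image_links(entries)
print(f"Your entries have {sum(counts.values())} image links "
      f"to {len(counts)} unique destinations.")
for url, uses in counts.most_common(20):
    print(f"{uses} uses: {url}")
```

`Counter.most_common(20)` does the "top 20" part, and the two numbers in the summary line are just the sum of the counts and the number of distinct keys.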
This makes sense. All these have a general theme.
I'm exploring the idea of adding a feature that compiles all the image references in a journal, then attempts to fetch images to a local folder, and rewrites the link for all the ones it gets successfully.
I like to decorate my journal with bits of ancient abandoned game artwork I've extracted from emulated machines, and I often link to my photos hosted elsewhere. Many of the entries would look pretty shabby or even be incomprehensible without the images, which kinda wrecks the idea of making an archive. Hence this feature. If it actually works (and doesn't make my machine explode) I'll add it to the repo.
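The link-rewriting half of that idea might look something like the sketch below. It assumes a dictionary mapping each original image URL to the local path it was saved at, and only touches links that appear in that dictionary, so anything that failed to fetch keeps pointing at the original address. The function name `rewrite_image_links` and the regular expression are illustrative guesses, and a regex is a blunt tool for HTML, so treat it as a starting point.

```python
import re

# Matches the src attribute inside an <img> tag, with either quote style.
# Group 1: everything up to the opening quote, group 2: the quote character,
# group 3: the URL itself.
IMG_SRC = re.compile(
    r'(<img\b[^>]*?\bsrc\s*=\s*)(["\'])(.*?)\2',
    re.IGNORECASE | re.DOTALL,
)


def rewrite_image_links(body, local_paths):
    """Point <img> tags at local copies where one exists.

    local_paths maps original URL -> local file path (hypothetical layout).
    Links with no local copy are left exactly as they were.
    """
    def swap(match):
        prefix, quote, url = match.groups()
        local = local_paths.get(url)
        if local is None:
            return match.group(0)
        return f"{prefix}{quote}{local}{quote}"

    return IMG_SRC.sub(swap, body)


body = ('<img src="http://example.com/a.gif"> and '
        '<img src="http://example.com/gone.gif">')
print(rewrite_image_links(body, {"http://example.com/a.gif": "images/a.gif"}))
```

Only the first tag in the example is rewritten, since the second URL has no entry in the mapping.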
The tricky bits are:
- Making sure each image is stored once even if it's referenced many times.
- Keeping track of images that failed to fetch, so we don't retry them forever.
- Picking up where we left off with image fetching.
- Processing new entries so they can find images already fetched.
- Skipping images that are insane sizes like 15MB.
- And more stuff I haven't thought of yet...
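Most of those bullets come down to bookkeeping, and one way to handle it is a small tracking table keyed by URL. The sketch below is a guess at an approach, not what ended up in the repo: the table name `image_cache`, the 5MB cap, the three-strikes retry limit, and every function name are assumptions. The fetcher is passed in as a parameter so the bookkeeping can be tried out without touching the network.

```python
import hashlib
import os
import sqlite3
import urllib.parse
import urllib.request

MAX_BYTES = 5 * 1024 * 1024  # assumed size cap; pick whatever seems sane


def open_image_cache(db_path):
    """Open (or create) the tracking table. One row per unique URL."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS image_cache ("
        " url TEXT PRIMARY KEY,"
        " local_path TEXT,"  # NULL until the image is saved
        " failures INTEGER NOT NULL DEFAULT 0)"
    )
    return conn


def local_name(url):
    """Stable file name for a URL: a hash plus the original extension."""
    ext = os.path.splitext(urllib.parse.urlparse(url).path)[1]
    return hashlib.sha1(url.encode("utf-8")).hexdigest() + ext


def fetch_with_limit(url, max_bytes, timeout=10):
    """Download url; return its bytes, or None if it exceeds max_bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        data = response.read(max_bytes + 1)
    return None if len(data) > max_bytes else data


def fetch_missing(conn, urls, fetch, folder, max_failures=3):
    """Fetch each URL at most once; skip done and repeatedly failed ones."""
    os.makedirs(folder, exist_ok=True)
    for url in set(urls):  # many references, one download
        row = conn.execute(
            "SELECT local_path, failures FROM image_cache WHERE url = ?",
            (url,),
        ).fetchone()
        if row and (row[0] is not None or row[1] >= max_failures):
            continue
        try:
            data = fetch(url, MAX_BYTES)
        except Exception:
            data = None  # network errors and oversize both count as failures
        conn.execute("INSERT OR IGNORE INTO image_cache (url) VALUES (?)", (url,))
        if data is None:
            conn.execute(
                "UPDATE image_cache SET failures = failures + 1 WHERE url = ?",
                (url,),
            )
        else:
            path = os.path.join(folder, local_name(url))
            with open(path, "wb") as handle:
                handle.write(data)
            conn.execute(
                "UPDATE image_cache SET local_path = ? WHERE url = ?",
                (path, url),
            )
        conn.commit()  # commit per image, so an interrupted run can resume


def fetched_paths(conn):
    """Map URL -> local path for every image saved so far."""
    rows = conn.execute(
        "SELECT url, local_path FROM image_cache WHERE local_path IS NOT NULL"
    )
    return dict(rows.fetchall())
```

A real run would be something like `fetch_missing(conn, all_urls, fetch_with_limit, "images")`. Because the table persists between runs, new entries that reference an already-fetched image simply find it in `fetched_paths(conn)`, and anything that has failed `max_failures` times is left alone.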
(Edit: Two days later, I sat down and implemented it! Whooo!)