Updated Dreamwidth backup script
Mar. 24th, 2024 09:55 pm

I found a Python script that backs up a journal and had been patched to work with Dreamwidth, but the backup took the form of a huge pile of XML files. Thousands of them. I wanted something more flexible, so I forked the script and added an optional flag that writes everything (entries, comments, userpic info) to a single SQLite database.
https://github.com/GBirkel/ljdump
Folks on macOS can just grab the contents of the repo and run the script. All the supporting modules should already be present in the OS. Windows users will need to install some version of Python.
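For the curious, writing entries into a single SQLite database looks roughly like this. This is just a sketch; the table and column names here are illustrative, not the script's actual schema:

```python
import sqlite3

# Illustrative schema only; the real script's tables and columns may differ.
# Use a filename like "journal.db" instead of ":memory:" to persist to disk.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE IF NOT EXISTS entries (
        itemid INTEGER PRIMARY KEY,
        date TEXT,
        subject TEXT,
        body TEXT,
        tags TEXT
    )
""")
conn.execute(
    "INSERT OR REPLACE INTO entries (itemid, date, subject, body, tags) "
    "VALUES (?, ?, ?, ?, ?)",
    (1, "2024-03-24 21:55", "Updated Dreamwidth backup script",
     "Entry body goes here...", "programming"),
)
conn.commit()
row = conn.execute("SELECT subject FROM entries WHERE itemid = 1").fetchone()
print(row[0])  # Updated Dreamwidth backup script
```

The nice thing about one database file over thousands of XML files is that re-running the sync is an `INSERT OR REPLACE` rather than a directory full of overwrites.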
For what it's worth, here's the old discussion forum for the first version of the script, released way back around 2009.
Update, 2024-03-25:
The script now also downloads and stores tag and mood information.
Update, 2024-03-26:
After synchronizing, the script now generates browseable HTML files of the journal, including entries for individual pages with comment threads, and linked history pages showing 20 entries at a time.
Moods, music, tags, and custom icons are shown for the entries where applicable.
Currently the script uses the stylesheet for my personal journal (this one), but you can drop in the styles for yours and it should accept them. The structure of the HTML is rendered as closely as possible to what Dreamwidth generates.
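The paging for the history views is essentially just chunking the entry list twenty at a time, something like this sketch (the function name is mine, not the script's):

```python
def paginate(entries, per_page=20):
    """Split a chronologically ordered list of entries into history pages."""
    return [entries[i:i + per_page] for i in range(0, len(entries), per_page)]

pages = paginate(list(range(45)))  # 45 stand-in entries
print(len(pages))      # 3 pages
print(len(pages[0]))   # 20 entries on the first page
print(len(pages[-1]))  # 5 on the last
```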
Update, 2024-03-28:
The script can also attempt to store local copies of the images embedded in journal entries. It organizes them by month in an images folder next to all the HTML. This feature is enabled with a "--cache_images" argument.
Each time you run it, it will attempt to cache 200 more images, going from oldest to newest. It skips images it has already tried and failed to fetch until 24 hours have passed, then tries them again.
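The retry policy can be sketched as a small pure function. The names and internals here are mine; the script's actual bookkeeping may differ:

```python
import time

RETRY_AFTER = 24 * 60 * 60  # seconds; failed fetches are retried after 24 hours
MAX_PER_RUN = 200           # images attempted per run, oldest first

def should_attempt(url, failed_at, now=None):
    """Decide whether to (re)try fetching an image on this run.

    failed_at maps url -> unix time of the most recent failed attempt.
    """
    now = now if now is not None else time.time()
    last_failure = failed_at.get(url)
    if last_failure is None:
        return True  # never tried before (or previously succeeded)
    return now - last_failure >= RETRY_AFTER

failures = {"http://example.com/a.jpg": 1_000_000}
print(should_attempt("http://example.com/a.jpg", failures, now=1_000_000 + 3600))   # False: only 1 hour ago
print(should_attempt("http://example.com/a.jpg", failures, now=1_000_000 + 90000))  # True: past 24 hours
```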
The image links in your entries are left unchanged in the database. They're swapped for local links only in the generated HTML pages.
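Swapping the links only at render time might look like this sketch (a simplified substitution; the script's actual code may differ):

```python
import re

def localize_links(html, cached):
    """Replace remote image URLs with local paths in generated HTML only.

    cached maps a remote URL to its local path, e.g. "images/2024-03/a.jpg".
    The original HTML stored in the database is never modified.
    """
    def swap(match):
        url = match.group(1)
        return 'src="%s"' % cached.get(url, url)
    return re.sub(r'src="([^"]+)"', swap, html)

cached = {"http://example.com/cat.jpg": "images/2024-03/cat.jpg"}
html = '<img src="http://example.com/cat.jpg"> <img src="http://example.com/dog.jpg">'
print(localize_links(html, cached))
# <img src="images/2024-03/cat.jpg"> <img src="http://example.com/dog.jpg">
```

Images that were never cached fall through unchanged, so broken remote links stay visible rather than pointing at local files that don't exist.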
Update, 2024-04-02:
The script is now ported to Python 3, and tested on both Windows and MacOS. I've added new setup instructions for both that are a little easier to follow.
Update, 2024-04-30:
Added an option to stop the script from trying to cache images that failed to cache once already.
2024-06-26: Version 1.7.6
Attempt to fix music field parsing for some entries.
Fix for crash on missing security properties for some entries.
Image fetch timeout reduced from 5 seconds to 4 seconds.
2024-08-14: Version 1.7.7
Slightly improves Unicode handling in tags and the music field.
2024-09-07: Version 1.7.8
Changes "stop at fifty" command line flag to a "max n" argument, with a default of 400, and applies it to comments as well as entries. This may help people who have thousands of comments complete their initial download. I recommend using the default at least once, then using a value of 1500 afterward until you're caught up.
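The effect of the cap is just to bound how much each run downloads, roughly like this (names are illustrative, not the script's):

```python
def plan_batch(pending, cap=400):
    """Return the slice of pending items to download this run, oldest first."""
    return pending[:cap]

# A journal with 1000 items still to fetch:
pending = list(range(1000))
first = plan_batch(pending)                       # first run, default cap of 400
rest = plan_batch(pending[len(first):], cap=1500) # catch-up run with a higher cap
print(len(first), len(rest))  # 400 600
```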
2024-09-18: Version 1.7.9
Table of contents for the table of contents!
First version of an "uncached images" report to help people find broken image links in their journal.
no subject
Date: 2025-01-04 10:21 pm (UTC)
Sounds like an adorable cat. Of the three currently in my life, one is too old and distinguished to push over, one would wander sullenly away, and the other would gently begin murdering my foot. Viva variety!
no subject
Date: 2025-01-08 02:16 am (UTC)
Question: If I edit an old post that I have already downloaded, does the program go back and make that update? I never finished tagging old posts and it's a work in progress.
no subject
Date: 2025-01-08 02:33 am (UTC)
I post a lot of backdated stuff so it's a pretty important feature for me. :D
no subject
Date: 2025-01-10 02:51 am (UTC)
Until then, don't worry about it. :)