garote ([personal profile] garote) wrote 2024-09-08 01:05 am (UTC)

Re: <3

Unfortunately I'm not sure how to fix this issue. It's possible that LJ sends the encoding information as part of the XML when one fetches an entry, e.g.:

<?xml version="1.0" encoding="WINDOWS-1251"?> ..... </xml>

and if so, that can be used to decide what encoding to use when converting it to Unicode. But right now, unless there's some magic happening in the Python XML parser I don't know about, it always assumes UTF-8 so stuff in e.g. WINDOWS-1251 will get mangled.
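If the declaration does turn out to be present, reading it could look roughly like the sketch below. This is only an illustration under that assumption: `decode_xml_bytes` and the sample payload are hypothetical names made up here, not part of any existing script, and it only works on the raw bytes of the response, before anything has decoded them to a string.

```python
import codecs
import re

# Matches the encoding="..." part of a leading XML declaration, if any.
_DECL = re.compile(rb'^\s*<\?xml[^>]*\bencoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']')


def decode_xml_bytes(raw, default="utf-8"):
    """Decode raw XML bytes using the declared encoding, else `default`.

    Hypothetical helper: assumes the server really does send a declaration.
    """
    match = _DECL.match(raw)
    name = match.group(1).decode("ascii") if match else default
    try:
        codecs.lookup(name)  # check that Python knows this codec name
    except LookupError:
        name = default
    # errors="replace" keeps going on bad bytes instead of raising.
    return raw.decode(name, errors="replace")


# Made-up sample payload, encoded the way a WINDOWS-1251 entry might arrive.
sample = '<?xml version="1.0" encoding="WINDOWS-1251"?><entry>Привет</entry>'.encode("cp1251")
print(decode_xml_bytes(sample))
```

Python accepts `WINDOWS-1251` as an alias for its `cp1251` codec, so the name from the declaration can be passed straight through.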

LJ renders it just fine when presenting its own web interface, so either LJ preserves the encoding information internally, or it follows some kind of guessing procedure to convert it to UTF-8. One could theoretically answer that question by crawling through the LJ source code.
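For what it's worth, a guessing procedure could be as simple as the sketch below. This is not LJ's actual method, which I haven't looked at. It's just one common heuristic: try strict UTF-8 first, and only fall back to a legacy codec if that fails. `guess_decode` and the fallback list are assumptions for illustration.

```python
def guess_decode(raw, fallbacks=("cp1251",)):
    """Return (text, codec_name) using a try-UTF-8-first heuristic.

    Hypothetical helper; the fallback list is an assumption, not LJ's.
    """
    try:
        # Text in a legacy 8-bit codec is rarely valid UTF-8 by accident.
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        pass
    for name in fallbacks:
        try:
            return raw.decode(name), name
        except UnicodeDecodeError:
            continue
    # Last resort: keep what can be kept and mark the rest.
    return raw.decode("utf-8", errors="replace"), "utf-8"


text, codec = guess_decode("Привет".encode("cp1251"))
print(codec, text)
```

The weak point is that almost any byte sequence decodes "successfully" as a single-byte codec like cp1251, so the order of the fallback list effectively decides the answer for non-UTF-8 input.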
