Re: <3

Unfortunately, I'm not sure how to fix this issue. It's possible that LJ sends the encoding information as part of the XML when one fetches an entry, e.g.:
<?xml version="1.0" encoding="WINDOWS-1251"?> ...
and if so, that declaration could be used to decide which encoding to use when converting the entry to Unicode. But right now, unless there's some magic happening in the Python XML parser that I don't know about, it always assumes UTF-8, so text in, e.g., WINDOWS-1251 gets mangled.
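If LJ does send such a declaration, the decoding step could look roughly like the Python sketch below. It assumes we have the raw bytes of the fetched entry, before anything has turned them into text. `decode_entry` and its regex are hypothetical helpers, not part of any LJ client library. The regex only handles a simple declaration like the one above, not every form the XML spec allows.

```python
import re

# Hypothetical helper: pulls the encoding name out of an XML declaration such as
# <?xml version="1.0" encoding="WINDOWS-1251"?> at the start of the raw bytes.
_DECL_ENCODING = re.compile(
    rb"""^\s*<\?xml[^>]*?\bencoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def decode_entry(raw: bytes, default: str = "utf-8") -> str:
    """Decode raw XML bytes using the encoding named in its declaration.

    Falls back to `default` when there is no declaration or when Python has
    no codec for the declared name. Bytes that are invalid in the chosen
    encoding still raise UnicodeDecodeError.
    """
    match = _DECL_ENCODING.match(raw)
    encoding = match.group(1).decode("ascii") if match else default
    try:
        return raw.decode(encoding)
    except LookupError:  # declared name is not a codec Python knows
        return raw.decode(default)
```

Without a declaration, this behaves like the current UTF-8-only behaviour. It should therefore not break entries that already come through correctly.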
LJ renders it just fine in its own web interface. So either LJ preserves the encoding information internally, or it uses some kind of guessing procedure to convert the text to UTF-8. In principle, one could answer that question by digging through the LJ source code.
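The second possibility, a guessing procedure, might look something like the sketch below. It is purely illustrative; nothing here says this is what LJ actually does. It tries strict UTF-8 first and falls back to a legacy encoding only when UTF-8 decoding fails. `guess_decode` is hypothetical, and so is the choice of `cp1251` (Python's codec for WINDOWS-1251) as the default fallback.

```python
from typing import Sequence, Tuple


def guess_decode(raw: bytes, fallbacks: Sequence[str] = ("cp1251",)) -> Tuple[str, str]:
    """Hypothetical guesser: return (text, encoding_used).

    Strict UTF-8 is tried first, because text in a single-byte encoding such
    as WINDOWS-1251 is unlikely to be valid UTF-8 by accident.
    """
    for encoding in ("utf-8", *fallbacks):
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # not valid in this encoding; try the next candidate
    # Nothing decoded cleanly: use UTF-8 and replace bad bytes with U+FFFD.
    return raw.decode("utf-8", errors="replace"), "utf-8"
```

Order matters here. A single-byte codec like cp1251 will decode almost any byte sequence, so it only makes sense as a late fallback. Listing several single-byte encodings gains little, because the first one will nearly always "succeed".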