Python Helpers for String/Unicode Encoding, Decoding and Printing
String encoding, decoding, and encoding detection can be a headache, more so in Python 2 than in Python 3. Here are two small helpers that are used in PDFx, the PDF metadata and reference extractor:
make_compat_str - decode any kind of bytes/str into a unicode object
print_to_console - print (unicode) strings to any kind of console (even Windows with cp437, etc.)
print_to_console detects the output locale and tries to encode the given (unicode) string correctly. With it you can safely print to any kind of terminal, whether it supports UTF-8 or some other encoding (e.g. Windows with cp437). As a fallback, it encodes to ASCII with backslash-replace.
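To make the idea concrete, here is a minimal Python 3 sketch of such a helper. It is not the PDFx source: the `encoding` and `stream` parameters are hypothetical additions so the behaviour can be tried without changing the system locale, and using `locale.getpreferredencoding` for detection is an assumption.

```python
import locale
import sys


def print_to_console(text, encoding=None, stream=None):
    """Print `text`, escaping characters the console encoding cannot represent.

    `encoding` and `stream` are hypothetical parameters for this sketch.
    """
    stream = sys.stdout if stream is None else stream
    # Detect the output encoding from the locale unless one is given explicitly.
    enc = encoding or locale.getpreferredencoding(False) or "utf-8"
    try:
        data = text.encode(enc, errors="backslashreplace")
    except LookupError:
        # Unknown codec name: fall back to plain ASCII.
        enc = "ascii"
        data = text.encode(enc, errors="backslashreplace")
    # Decode again so the result can be written to a text stream. Every
    # character is now representable in `enc`; the rest became escapes.
    stream.write(data.decode(enc) + "\n")
```

With `encoding="cp437"`, a character that cp437 contains (such as é) is printed as-is, while one it lacks (such as ✓) is written as a backslash escape instead of raising an error.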
make_compat_str detects the encoding of a string or bytes object using chardet, and returns a standard unicode object. Just throw any kind of bytes / string at it!
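A rough Python 3 sketch of this approach might look as follows. Again, this is an illustration and not the PDFx code: chardet is a third-party package, and both the UTF-8 default when chardet is missing or undecided and the latin-1 last resort are assumptions of this sketch.

```python
try:
    import chardet  # third-party: pip install chardet
except ImportError:
    # Assumption of this sketch: degrade gracefully if chardet is missing.
    chardet = None


def make_compat_str(in_str):
    """Return `in_str` as a unicode `str`, guessing the encoding of bytes."""
    if isinstance(in_str, str):
        return in_str  # already unicode, nothing to detect
    if not in_str:
        return ""
    data = bytes(in_str)

    encoding = None
    if chardet is not None:
        # detect() returns a dict; its "encoding" value may be None.
        encoding = chardet.detect(data)["encoding"]
    try:
        return data.decode(encoding or "utf-8")
    except (LookupError, UnicodeDecodeError):
        # latin-1 maps every byte to a code point, so this cannot fail.
        return data.decode("latin-1")
```

Detection is a statistical guess, so very short or ambiguous byte strings may decode to something other than what was intended; the function only guarantees that a `str` comes back.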
If you have suggestions, feedback or ideas, please reach out to @metachris.