Re: Reporting UnicodeEncodeError info on arbitrary data sent to PG with psycopg3
From | Karsten Hilbert |
---|---|
Subject | Re: Reporting UnicodeEncodeError info on arbitrary data sent to PG with psycopg3 |
Date | |
Msg-id | Zc89QYhCp0fM6GQB@hermes.hilbert.loc |
In reply to | Re: Reporting UnicodeEncodeError info on arbitrary data sent to PG with psycopg3 ("Karl O. Pinc" <kop@karlpinc.com>) |
List | psycopg |
On Thu, Feb 15, 2024 at 11:45:15PM -0600, Karl O. Pinc wrote:

> Today there is no substitute for knowing the encoding of the
> text your application obtains from the outside world.

> This can be highly system dependent because when reading
> files open()-ed as text, Python decodes (into UTF-8) the bytes read.

Not quite. Python assumes the bytes in the file *are* encoded
with whatever encoding is passed to open() (including UTF-8,
if that is what is passed). It then decodes said bytes into
*unicode code points*. If we want them back as UTF-8 we need
to encode them as such.

> By default decoding from the system locale's character encoding.
> And when writing files open()-ed as text Python encodes (from UTF-8)

again, from unicode, that is:

	https://docs.python.org/3/howto/unicode.html

> No matter how you get your data, to put your data into
> the database as text, its bytes must first have their external
> encoding decoded to UTF-8. Because Python strings are
> UTF-8.

unicode code points, but, yeah

> Once in Python, psycopg converts the UTF-8 text to the database

unicode

> It's important to get the encoding right so I think it'd be
> good to talk about it.

+1

Karsten
--
GPG 40BE 5B0E C98E 1713 AFA6 5BC0 3BEA AC80 7D4F C89B
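
A minimal sketch of the decode/encode distinction made in the reply above. The sample string, the choice of Latin-1 as the "external" encoding, and all variable names are illustrative assumptions, not taken from the thread:

```python
# Bytes as they might arrive from the outside world, here assumed to be
# Latin-1 encoded ("M\u00fcller" is "Mueller" spelled with u-umlaut).
raw = "M\u00fcller".encode("latin-1")

# Decoding turns bytes into a str: a sequence of Unicode code points,
# not UTF-8 bytes.
text = raw.decode("latin-1")
code_points = [hex(ord(ch)) for ch in text]

# UTF-8 only comes into play when the str is encoded again.
utf8 = text.encode("utf-8")

# Decoding with the wrong encoding fails here, because the byte 0xfc is
# not valid UTF-8. (With other inputs it can silently give wrong text.)
try:
    raw.decode("utf-8")
    wrong_decode_failed = False
except UnicodeDecodeError:
    wrong_decode_failed = True

# Encoding to a charset that lacks the character raises UnicodeEncodeError,
# the error class named in this thread's subject.
try:
    text.encode("ascii")
    ascii_encode_failed = False
except UnicodeEncodeError:
    ascii_encode_failed = True
```

The same reasoning applies to open(): the encoding argument only tells Python how to interpret the file's bytes, and the resulting str holds code points either way.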