Multibyte still broken
From | Michael Robinson |
---|---|
Subject | Multibyte still broken |
Date | |
Msg-id | 200005101408.WAA07324@netrinsics.com |
Responses |
Re: Multibyte still broken
Re: Multibyte still broken |
List | pgsql-hackers |
These are excerpts from a message from Tatsuo Ishii dated January 26, on the
subject of fragile code in the multibyte routines:

---- begin ----
Defensive programming saves the system but does not user. Once corrupted
data is stored in the system, it's totally useless for the user anyway.
What about validating data *before* inserting it into a table?
---- end ----

---- begin ----
> >Here it is. With this patch, copy out should be happy even with the
> >wrong data. I'm not sure if it could be displayed correctly, though.
>
> Thank you very much. However, I think even this is too optimistic:
>
> >! if (*s & 0x80)
>
> Shouldn't it be something like:
>
> if ((*s & 0x80) && (*(s+1) & 0x80))
>
> Even though "\242\242\242\0" is an invalid EUC sequence, it still shouldn't be
> allowed to break the software.

Thanks for the suggestion. More robust code is always good.
---- end ----

More robust code may always be good, but "good" apparently doesn't always go
into the tree.

Imagine my surprise, while upgrading a production server from 6.5.3 to 7.0,
when the data dumped from the old database failed to load into the new
database (well, crashed the backend, to be specific).

Apparently the "validate your own damn data" sentiment of the first excerpt
above has prevailed, because, on inspection, the MB code is just as fragile
as it was five months ago.

I was forced to perform emergency repairs to my database dump file to fool a
non-multibyte 7.0 into accepting it. Since EUC_CN is compatible with Latin-1,
and since the benefits of multibyte are small compared to the risks, I intend
to stick with unibyte Postgres henceforth.

I would, though, recommend a warning in the "INSTALL" file along the lines of:

"WARNING: Use of improperly-encoded text with multi-byte support enabled
WILL lead to data corruption and/or loss. Do not enable multi-byte support
unless you intend to fully validate your own damn data."

-Michael Robinson
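
For readers following the quoted exchange, the two-byte check suggested above can be sketched as a small standalone helper. This is only an illustration of the idea, not the code in the PostgreSQL tree: the names safe_char_len and safe_char_count are hypothetical, and the sketch assumes a NUL-terminated string in which every multibyte character is exactly two bytes with the high bit set on both.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not taken from the PostgreSQL source.
 * Returns how many bytes to consume for the character starting at s:
 * 2 only when both the current byte and the next byte have the high
 * bit set, otherwise 1. Because of short-circuit evaluation, s[1] is
 * read only when s[0] is non-zero, so the read never goes past the
 * terminating NUL.
 */
static size_t safe_char_len(const unsigned char *s)
{
    if ((s[0] & 0x80) && (s[1] & 0x80))
        return 2;
    return 1;
}

/*
 * Hypothetical helper: counts characters in a NUL-terminated string.
 * A dangling lead byte at the end of the input is counted as a single
 * one-byte character instead of causing the terminator to be skipped.
 */
static size_t safe_char_count(const unsigned char *s)
{
    size_t n = 0;

    while (*s != '\0') {
        s += safe_char_len(s);
        n++;
    }
    return n;
}
```

With the invalid sequence from the quoted message, "\242\242\242", the loop consumes the first two bytes as one character and the trailing byte as another, then stops at the terminator. A check that tests only the first byte would step two bytes from the dangling third byte and skip over the NUL. The sketch does not decide whether the data is valid EUC; it only keeps the scan inside the buffer.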