Re: EOL characters and multibyte encodings
From | Joe Conway |
---|---|
Subject | Re: EOL characters and multibyte encodings |
Date | |
Msg-id | 467B00E1.7070400@joeconway.com |
In response to | Re: EOL characters and multibyte encodings (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-hackers |
Tom Lane wrote:
> Joe Conway <mail@joeconway.com> writes:
>> My first thought on fixing this issue was to simply replace all
>> instances of '\r' in pg_proc.prosrc with '\n' prior to sending it to the
>> R parser. As far as I know, any instances of '\r' embedded in a
>> syntactically valid R statement must be escaped (i.e. literally the
>> characters "\" and "r"), so that should not be a problem. But I am
>> concerned about how this potentially plays against multibyte characters.
>> Is it safe to do this, or do I need to use a mb-aware replace algorithm?
>
> It's safe, because you'll be dealing with prosrc inside the backend,
> therefore using a backend-legal encoding, and those don't have any ASCII
> aliasing problems (all bytes of an MB character must have high bit set).

Great -- I wasn't sure about that.

> However I dislike doing it exactly that way because line numbers in the
> R script will all get doubled. Unless R never reports errors in terms
> of line numbers, you'd be better off to either delete the \r characters
> or replace them with spaces.

Good point. But I need to be able to deal with Apple EOLs too -- IIRC
those can be *only* '\r'. So I guess I need to do a look-ahead whenever
I run into '\r', see if it is followed by '\n', and then munge the
string accordingly.

Joe
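
For illustration only, the look-ahead described above could be sketched as follows. This is not PL/R's actual code: the function name `normalize_eols` is hypothetical, and the sketch assumes the source arrives as bytes in a backend-legal encoding such as UTF-8, so every byte of a multibyte character has its high bit set and a plain byte-wise scan cannot mistake part of a character for '\r' or '\n'.

```python
CR = 0x0D  # '\r'
LF = 0x0A  # '\n'


def normalize_eols(src: bytes) -> bytes:
    """Turn CRLF and bare CR line endings into a single LF each.

    Hypothetical helper: scans byte by byte, which is safe only under the
    assumption that multibyte characters never contain bytes below 0x80.
    """
    out = bytearray()
    i = 0
    n = len(src)
    while i < n:
        b = src[i]
        if b == CR:
            # Every CR ends a line, so always emit exactly one LF.
            out.append(LF)
            # Look ahead: if this CR starts a CRLF pair, consume the LF too,
            # so the pair yields one line ending and line numbers stay put.
            if i + 1 < n and src[i + 1] == LF:
                i += 1
        else:
            out.append(b)
        i += 1
    return bytes(out)
```

With this approach a CRLF pair and a lone CR each become one '\n', so the number of lines seen by the R parser matches the original source. Simply replacing '\r' with a space would also keep line numbers intact for CRLF input, but it would join all lines of a CR-only script into one.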