Обсуждение: Re: [BUGS] Nasty tsvector can make dumps unrestorable

Поиск
Список
Период
Сортировка

Re: [BUGS] Nasty tsvector can make dumps unrestorable

От
Bruce Momjian
Дата:
Tom Lane wrote:
> Stuart Bishop <stuart@stuartbishop.net> writes:
> > The attached script creates a tsvector with a value that can be dumped using
> > pg_dump, but not loaded again using pg_restore. This causes restores of a
> > dump containing this value to fail.
>
> Hmm, sorta looks like tsvectorout should be doubling backslashes?

I think the larger question is why tsvectorin() requires
double-backslashes?  It seems it is for marking of single-quotes in
phrases, from what I can tell from the code and regression test usage:

    SELECT E'''1 \\''2'' 3'::tsvector;
      tsvector
    -------------
     '3' '1 ''2'
    (1 row)

My guess is that the '' is used to start/stop phrases, and \\'' puts a
literal '' in the phrase.

I have developed the attached patch which doubles backslashes on output:

    test=> INSERT INTO Foo(bar) VALUES (E'\\\\x');
    INSERT 0 1
    test=> select * from foo;
      bar
    -------
     '\\x'
    (1 row)

However, I am still unclear if the dump code is correct because I don't
see the backslash preserved in \\'' cases, just \\\\ cases:

    test=> CREATE TABLE Foo(bar tsvector);
    CREATE
    test=> INSERT INTO Foo(bar) VALUES (E'\\''x');
    INSERT 0 1
    test=> select * from foo;
      bar
    -------
     '''x'
    (1 row)

and pg_dump outputs:

    COPY foo (bar) FROM stdin;
    '''x'
    \.


While the COPY will load into the table, this doesn't:

    test=> INSERT INTO Foo(bar) VALUES (E'''''x');
    ERROR:  syntax error in tsvector: "''x"

I am confused.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
Index: src/backend/utils/adt/tsvector.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/utils/adt/tsvector.c,v
retrieving revision 1.6
diff -c -c -r1.6 tsvector.c
*** src/backend/utils/adt/tsvector.c    23 Oct 2007 00:51:23 -0000    1.6
--- src/backend/utils/adt/tsvector.c    9 Nov 2007 23:59:06 -0000
***************
*** 345,350 ****
--- 345,352 ----

              if (t_iseq(curin, '\''))
                  *curout++ = '\'';
+             else if (t_iseq(curin, '\\'))
+                 *curout++ = '\\';

              while (len--)
                  *curout++ = *curin++;

Re: [BUGS] Nasty tsvector can make dumps unrestorable

От
Andrew Dunstan
Дата:

Bruce Momjian wrote:
> However, I am still unclear if the dump code is correct because I don't
> see the backslash preserved in \\'' cases, just \\\\ cases:
>
>     test=> CREATE TABLE Foo(bar tsvector);
>     CREATE
>     test=> INSERT INTO Foo(bar) VALUES (E'\\''x');
>     INSERT 0 1
>     test=> select * from foo;
>       bar
>     -------
>      '''x'
>     (1 row)
>
> and pg_dump outputs:
>
>     COPY foo (bar) FROM stdin;
>     '''x'
>     \.
>
>
> While the COPY will load into the table, this doesn't:
>
>     test=> INSERT INTO Foo(bar) VALUES (E'''''x');
>     ERROR:  syntax error in tsvector: "''x"
>
> I am confused.
>
>   


These two are not equivalent. What happens if you try this?
  INSERT INTO Foo(bar) VALUES (E'''''''x''');


cheers

andrew




Re: [BUGS] Nasty tsvector can make dumps unrestorable

От
Bruce Momjian
Дата:
Andrew Dunstan wrote:
> > While the COPY will load into the table, this doesn't:
> >
> >     test=> INSERT INTO Foo(bar) VALUES (E'''''x');
> >     ERROR:  syntax error in tsvector: "''x"
> >
> > I am confused.
> >
> >   
> 
> 
> These two are not equivalent. What happens if you try this?
> 
>    INSERT INTO Foo(bar) VALUES (E'''''''x''');
test=>    INSERT INTO Foo(bar) VALUES (E'''''''x''');INSERT 0 1test=> select * from foo;  bar------- '''x'(1 row)

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://postgres.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: [BUGS] Nasty tsvector can make dumps unrestorable

От
Tom Lane
Дата:
Bruce Momjian <bruce@momjian.us> writes:
> However, I am still unclear if the dump code is correct because I don't
> see the backslash preserved in \\'' cases, just \\\\ cases:

>     test=> INSERT INTO Foo(bar) VALUES (E'\\''x');

You're just confused.  That produces a word whose contents are the
two characters 'x, so either '\'x' or '''x' would be legitimate
output.

However, I'd prefer to see Teodor fix this, because it needs to be
back-patched too, and I'm not entirely sure if there are other
consequences.
        regards, tom lane