Re: Duplicate primary key when primary key is varchar
От | Oliver Jowett |
---|---|
Тема | Re: Duplicate primary key when primary key is varchar |
Дата | |
Msg-id | 4CAC62D4.2020800@opencloud.com обсуждение исходный текст |
Ответ на | Duplicate primary key when primary key is varchar (Tính Trương Xuân <tinh.truong@evolus.vn>) |
Список | pgsql-jdbc |
Tính Trương Xuân wrote: > In the State table, you will notice that there are two rows with the > primary key is AL (it's capital A and capital L). But when you select: > select * from state where code='AL' > There is only one row returned. You may guest that there are some spaces > in the primary key? Nope! > You can verify by this query: > select 'a'||code||'b' from state > You will see that there are two rows with the data as aALb. No space in > between! You do have a space in one of the rows - it's a ZERO WIDTH NO-BREAK SPACE (U+FEFF), also used as a byte order mark in UTF-16. Here's the relevant bit courtesy of hexdump -C: > 00014da0 3a 20 70 6f 73 74 67 72 65 73 0a 2d 2d 0a 0a 43 |: postgres.--..C| > 00014db0 4f 50 59 20 73 74 61 74 65 20 28 63 6f 64 65 2c |OPY state (code,| > 00014dc0 20 6e 61 6d 65 2c 20 73 74 61 74 75 73 29 20 46 | name, status) F| > 00014dd0 52 4f 4d 20 73 74 64 69 6e 3b 0a ef bb bf 41 4c |ROM stdin;....AL| > 00014de0 09 41 4c 41 42 41 4d 41 09 41 43 54 49 56 45 0a |.ALABAMA.ACTIVE.| > 00014df0 41 4b 09 41 4c 41 53 4b 41 09 41 43 54 49 56 45 |AK.ALASKA.ACTIVE| > 00014e00 0a 41 5a 09 41 52 49 5a 4f 4e 41 09 41 43 54 49 |.AZ.ARIZONA.ACTI| > 00014e10 56 45 0a 41 52 09 41 52 4b 41 4e 53 41 53 09 41 |VE.AR.ARKANSAS.A| Note the "0a ef bb bf 41 4c" - that's a linefeed, followed by the UTF-8 encoding of U+FEFF, followed by "AL" "less -U" renders it like this: > COPY state (code, name, status) FROM stdin; > <U+FEFF>AL^IALABAMA^IACTIVE > AK^IALASKA^IACTIVE I guess that your UTF-16 input starts with a BOM, and you're just passing it through unchanged. Perhaps you are explicitly claiming to your parser that the input is UTF-16LE, which shouldn't start with a BOM, so your parser is interpreting it as a no-break space rather than filtering it out? Wikipedia says "For the IANA registered charsets UTF-16BE and UTF-16LE, a byte order mark must not be used because the names of these character sets already determine the byte order. If encountered, an initial U+FEFF must be interpreted as a (deprecated) "zero width no-break space".' Oliver
В списке pgsql-jdbc по дате отправления: