Обсуждение: Lc_collate & lc_type? whitespace and nbsp unique index...

Поиск
Список
Период
Сортировка

Lc_collate & lc_type? whitespace and nbsp unique index...

От
Ilmir Mulyukov
Дата:
Hello! 
I have lc_collate and lc_ctype ru_RU.UTF-8.
And have two similar value column in different rows.
Different is that in one string have whitespace symbol, but another non breakable whitespace symbol.
For example:
Something(whitespace)wrong
Something(non breakable space)wrong

And then when I try to create unique index I catch the error - can not insert duplicate key in row object.
That means that the two rows are the same.

Are there any settings in postgresql that can differentiate this two symbols?

Re: Lc_collate & lc_type? whitespace and nbsp unique index...

От
Ilmir Mulyukov
Дата:

Laurenz, there is no index for this table.
ka_im_pg=#select * from pg_indexes where tablename='_inforg31440';
 schemaname | tablename | indexname | tablespace | indexdef
------------+-----------+-----------+------------+----------
(0 rows)

ka_im_pg=# \d _inforg31440
                  Table "public._inforg31440"
    Column     |     Type      | Collation | Nullable | Default
---------------+---------------+-----------+----------+---------
 _fld31441     | mvarchar(400) |           | not null |
 _fld31442     | mvarchar(10)  |           | not null |
 _fld31443rref | bytea         |           | not null |
 _fld2102      | numeric(7,0)  |           | not null |



ka_im_pg=# INSERT INTO cyrillic VALUES (2, convert_from('\x57\x31\x39\x30\x33\x42\x50\x20\x57\x49\x4e\x4b\x4f\x44\x20\xd0\x9a\xd0\xbe\xd0\xbb\xd0\xbe\xd0\xb4\xd0\xba\xd0\xb8\x20\xd0\xb4\xd0\xb8\xd1\x81\xd0\xba\xd0\xbe\xd0\xb2\xd1\x8b\xd0\xb5\x2c\x20\xd0\xbf\xd0\xb5\xd1\x80\xd0\xb5\xd0\xb4\xd0\xbd\xd0\xb8\xd0\xb5', 'UTF-8'));
INSERT 0 1
ka_im_pg=# INSERT INTO cyrillic VALUES (3, convert_from('\x57\x31\x39\x30\x33\x42\x50\x20\x57\x49\x4e\x4b\x4f\x44\xc2\xa0\xd0\x9a\xd0\xbe\xd0\xbb\xd0\xbe\xd0\xb4\xd0\xba\xd0\xb8\xc2\xa0\xd0\xb4\xd0\xb8\xd1\x81\xd0\xba\xd0\xbe\xd0\xb2\xd1\x8b\xd0\xb5\x2c\x20\xd0\xbf\xd0\xb5\xd1\x80\xd0\xb5\xd0\xb4\xd0\xbd\xd0\xb8\xd0\xb5', 'UTF-8'));
INSERT 0 1

ka_im_pg=# SELECT * FROM cyrillic WHERE t = E'W1903BP WINKOD\u00A0Колодки\u00A0дисковые, передние';
 id |                     t
----+-------------------------------------------
  3 | W1903BP WINKOD Колодки дисковые, передние
(1 row)

ka_im_pg=# SELECT * FROM cyrillic WHERE t = E'W1903BP WINKOD Колодки дисковые, передние';
 id |                     t
----+-------------------------------------------
  2 | W1903BP WINKOD Колодки дисковые, передние
(1 row)

ka_im_pg=# select * from cyrillic;
 id |                     t
----+-------------------------------------------
  2 | W1903BP WINKOD Колодки дисковые, передние
  3 | W1903BP WINKOD Колодки дисковые, передние
(2 rows)

Your variant is working. But in my case the software which inserts the data into the database didn't use the convert_from function.
Try to find a workaround.
Thank you!

пн, 12 июл. 2021 г. в 17:26, Laurenz Albe <laurenz.albe@cybertec.at>:
Please include the list in your replies.

On Mon, 2021-07-12 at 13:07 +0600, Ilmir Mulyukov wrote:
> Laurenz, thank you for you help.
>
> ka_im_pg=# select * from _inforg31440 where _Fld31441='W1903BP WINKOD Колодки дисковые, передние';
> _fld31441 | _fld31442 | _fld31443rref | _fld2102
> -------------------------------------------+-----------+------------------------------------+----------
> W1903BP WINKOD Колодки дисковые, передние | | \x80e9a4bf010099b511ebd3d00c89f5f2 | 0
> W1903BP WINKOD Колодки дисковые, передние | 796 | \x80e9a4bf010099b511ebc98cf68225d2 | 0
> W1903BP WINKOD Колодки дисковые, передние | | \x80e9a4bf010099b511ebc5056610dedf | 0
> W1903BP WINKOD Колодки дисковые, передние | 796 | \x80e8a4bf010099b511ebae02a81729c0 | 0
> (4 rows)
>
>  Utf encode for 1 and 3 row:

> \x57\x31\x39\x30\x33\x42\x50\x20\x57\x49\x4e\x4b\x4f\x44\x20\xd0\x9a\xd0\xbe\xd0\xbb\xd0\xbe\xd0\xb4\xd0\xba\xd0\xb8\x20\xd0\xb4\xd0\xb8\xd1\x81\xd0\xba\xd0\xbe\xd0\xb2\xd1\x8b\xd0\xb5\x2c\x20\xd0\x
> bf\xd0\xb5\xd1\x80\xd0\xb5\xd0\xb4\xd0\xbd\xd0\xb8\xd0\xb5
>
> \x57\x31\x39\x30\x33\x42\x50\x20\x57\x49\x4e\x4b\x4f\x44\xc2\xa0\xd0\x9a\xd0\xbe\xd0\xbb\xd0\xbe\xd0\xb4\xd0\xba\xd0\xb8\xc2\xa0\xd0\xb4\xd0\xb8\xd1\x81\xd0\xba\xd0\xbe\xd0\xb2\xd1\x8b\xd0\xb5\x2c\x
> 20\xd0\xbf\xd0\xb5\xd1\x80\xd0\xb5\xd0\xb4\xd0\xbd\xd0\xb8\xd0\xb5

I doubt that:

CREATE TABLE cyrillic(id integer PRIMARY KEY, t text);

INSERT INTO cyrillic VALUES (1, convert_from('\x573139303342502057494e4b4f4420d09ad0bed0bbd0bed0b4d0bad0b820d0b4d0b8d181d0bad0bed0b2d18bd0b52c20d0bfd0b5d180d0b5d0b4d0bdd0b8d0b5', 'UTF8'));
INSERT INTO cyrillic VALUES (2, convert_from('\x573139303342502057494e4b4f44c2a0d09ad0bed0bbd0bed0b4d0bad0b8c2a0d0b4d0b8d181d0bad0bed0b2d18bd0b52c20d0bfd0b5d180d0b5d0b4d0bdd0b8d0b5', 'UTF8'));

SELECT * FROM cyrillic WHERE t = E'W1903BP WINKOD\u00A0Колодки\u00A0дисковые, передние';

 id │                     t                     
════╪═══════════════════════════════════════════
  2 │ W1903BP WINKOD Колодки дисковые, передние
(1 row)

SELECT * FROM cyrillic WHERE t = E'W1903BP WINKOD Колодки дисковые, передние';

 id │                     t                     
════╪═══════════════════════════════════════════
  1 │ W1903BP WINKOD Колодки дисковые, передние
(1 row)

Perhaps you got a corrupted index on the "_fld31441" column.
Does reindexing that index change things?

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com

Re: Lc_collate & lc_type? whitespace and nbsp unique index...

От
Tom Lane
Дата:
Ilmir Mulyukov <ilmir.mulyukov@gmail.com> writes:
> Laurenz, there is no index for this table.
> ka_im_pg=# \d _inforg31440
>                   Table "public._inforg31440"
>     Column     |     Type      | Collation | Nullable | Default
> ---------------+---------------+-----------+----------+---------
>  _fld31441     | mvarchar(400) |           | not null |
>  _fld31442     | mvarchar(10)  |           | not null |
>  _fld31443rref | bytea         |           | not null |
>  _fld2102      | numeric(7,0)  |           | not null |

What in the world is type "mvarchar"?  It's certainly not
anything defined by core Postgres.  I think you need to
address this question to whoever wrote that datatype.
It seems somewhat likely that this is intentional behavior
for the type.

            regards, tom lane



Re: Lc_collate & lc_type? whitespace and nbsp unique index...

От
Ilmir Mulyukov
Дата:
I assumed that the issue with the module which implements the mchar and mvarchar types...
Thank you, Laurenz and Tom for your help.


вт, 13 июл. 2021 г., 23:43 Tom Lane <tgl@sss.pgh.pa.us>:
Ilmir Mulyukov <ilmir.mulyukov@gmail.com> writes:
> Laurenz, there is no index for this table.
> ka_im_pg=# \d _inforg31440
>                   Table "public._inforg31440"
>     Column     |     Type      | Collation | Nullable | Default
> ---------------+---------------+-----------+----------+---------
>  _fld31441     | mvarchar(400) |           | not null |
>  _fld31442     | mvarchar(10)  |           | not null |
>  _fld31443rref | bytea         |           | not null |
>  _fld2102      | numeric(7,0)  |           | not null |

What in the world is type "mvarchar"?  It's certainly not
anything defined by core Postgres.  I think you need to
address this question to whoever wrote that datatype.
It seems somewhat likely that this is intentional behavior
for the type.

                        regards, tom lane