Обсуждение: Lc_collate & lc_type? whitespace and nbsp unique index...
Hello!
I have lc_collate and lc_ctype ru_RU.UTF-8.
And have two similar value column in different rows.
Different is that in one string have whitespace symbol, but another non breakable whitespace symbol.
For example:
Something(whitespace)wrong
Something(non breakable space)wrong
And then when I try to create unique index I catch the error - can not insert duplicate key in row object.
That means that the two rows are the same.
Are there any settings in postgresql that can differentiate this two symbols?
Laurenz, there is no index for this table.
ka_im_pg=#select * from pg_indexes where tablename='_inforg31440';
schemaname | tablename | indexname | tablespace | indexdef
------------+-----------+-----------+------------+----------
(0 rows)
schemaname | tablename | indexname | tablespace | indexdef
------------+-----------+-----------+------------+----------
(0 rows)
ka_im_pg=# \d _inforg31440
Table "public._inforg31440"
Column | Type | Collation | Nullable | Default
---------------+---------------+-----------+----------+---------
_fld31441 | mvarchar(400) | | not null |
_fld31442 | mvarchar(10) | | not null |
_fld31443rref | bytea | | not null |
_fld2102 | numeric(7,0) | | not null |
Table "public._inforg31440"
Column | Type | Collation | Nullable | Default
---------------+---------------+-----------+----------+---------
_fld31441 | mvarchar(400) | | not null |
_fld31442 | mvarchar(10) | | not null |
_fld31443rref | bytea | | not null |
_fld2102 | numeric(7,0) | | not null |
ka_im_pg=# INSERT INTO cyrillic VALUES (2, convert_from('\x57\x31\x39\x30\x33\x42\x50\x20\x57\x49\x4e\x4b\x4f\x44\x20\xd0\x9a\xd0\xbe\xd0\xbb\xd0\xbe\xd0\xb4\xd0\xba\xd0\xb8\x20\xd0\xb4\xd0\xb8\xd1\x81\xd0\xba\xd0\xbe\xd0\xb2\xd1\x8b\xd0\xb5\x2c\x20\xd0\xbf\xd0\xb5\xd1\x80\xd0\xb5\xd0\xb4\xd0\xbd\xd0\xb8\xd0\xb5', 'UTF-8'));
INSERT 0 1
INSERT 0 1
ka_im_pg=# INSERT INTO cyrillic VALUES (3, convert_from('\x57\x31\x39\x30\x33\x42\x50\x20\x57\x49\x4e\x4b\x4f\x44\xc2\xa0\xd0\x9a\xd0\xbe\xd0\xbb\xd0\xbe\xd0\xb4\xd0\xba\xd0\xb8\xc2\xa0\xd0\xb4\xd0\xb8\xd1\x81\xd0\xba\xd0\xbe\xd0\xb2\xd1\x8b\xd0\xb5\x2c\x20\xd0\xbf\xd0\xb5\xd1\x80\xd0\xb5\xd0\xb4\xd0\xbd\xd0\xb8\xd0\xb5', 'UTF-8'));
INSERT 0 1
INSERT 0 1
ka_im_pg=# SELECT * FROM cyrillic WHERE t = E'W1903BP WINKOD\u00A0Колодки\u00A0дисковые, передние';
id | t
----+-------------------------------------------
3 | W1903BP WINKOD Колодки дисковые, передние
(1 row)
id | t
----+-------------------------------------------
3 | W1903BP WINKOD Колодки дисковые, передние
(1 row)
ka_im_pg=# SELECT * FROM cyrillic WHERE t = E'W1903BP WINKOD Колодки дисковые, передние';
id | t
----+-------------------------------------------
2 | W1903BP WINKOD Колодки дисковые, передние
(1 row)
id | t
----+-------------------------------------------
2 | W1903BP WINKOD Колодки дисковые, передние
(1 row)
ka_im_pg=# select * from cyrillic;
id | t
----+-------------------------------------------
2 | W1903BP WINKOD Колодки дисковые, передние
3 | W1903BP WINKOD Колодки дисковые, передние
(2 rows)
id | t
----+-------------------------------------------
2 | W1903BP WINKOD Колодки дисковые, передние
3 | W1903BP WINKOD Колодки дисковые, передние
(2 rows)
Your variant is working. But in my case the software which inserts the data into the database didn't use the convert_from function.
Try to find a workaround.
Thank you!
пн, 12 июл. 2021 г. в 17:26, Laurenz Albe <laurenz.albe@cybertec.at>:
Please include the list in your replies.
On Mon, 2021-07-12 at 13:07 +0600, Ilmir Mulyukov wrote:
> Laurenz, thank you for you help.
>
> ka_im_pg=# select * from _inforg31440 where _Fld31441='W1903BP WINKOD Колодки дисковые, передние';
> _fld31441 | _fld31442 | _fld31443rref | _fld2102
> -------------------------------------------+-----------+------------------------------------+----------
> W1903BP WINKOD Колодки дисковые, передние | | \x80e9a4bf010099b511ebd3d00c89f5f2 | 0
> W1903BP WINKOD Колодки дисковые, передние | 796 | \x80e9a4bf010099b511ebc98cf68225d2 | 0
> W1903BP WINKOD Колодки дисковые, передние | | \x80e9a4bf010099b511ebc5056610dedf | 0
> W1903BP WINKOD Колодки дисковые, передние | 796 | \x80e8a4bf010099b511ebae02a81729c0 | 0
> (4 rows)
>
> Utf encode for 1 and 3 row:
>
> \x57\x31\x39\x30\x33\x42\x50\x20\x57\x49\x4e\x4b\x4f\x44\x20\xd0\x9a\xd0\xbe\xd0\xbb\xd0\xbe\xd0\xb4\xd0\xba\xd0\xb8\x20\xd0\xb4\xd0\xb8\xd1\x81\xd0\xba\xd0\xbe\xd0\xb2\xd1\x8b\xd0\xb5\x2c\x20\xd0\x
> bf\xd0\xb5\xd1\x80\xd0\xb5\xd0\xb4\xd0\xbd\xd0\xb8\xd0\xb5
>
> \x57\x31\x39\x30\x33\x42\x50\x20\x57\x49\x4e\x4b\x4f\x44\xc2\xa0\xd0\x9a\xd0\xbe\xd0\xbb\xd0\xbe\xd0\xb4\xd0\xba\xd0\xb8\xc2\xa0\xd0\xb4\xd0\xb8\xd1\x81\xd0\xba\xd0\xbe\xd0\xb2\xd1\x8b\xd0\xb5\x2c\x
> 20\xd0\xbf\xd0\xb5\xd1\x80\xd0\xb5\xd0\xb4\xd0\xbd\xd0\xb8\xd0\xb5
I doubt that:
CREATE TABLE cyrillic(id integer PRIMARY KEY, t text);
INSERT INTO cyrillic VALUES (1, convert_from('\x573139303342502057494e4b4f4420d09ad0bed0bbd0bed0b4d0bad0b820d0b4d0b8d181d0bad0bed0b2d18bd0b52c20d0bfd0b5d180d0b5d0b4d0bdd0b8d0b5', 'UTF8'));
INSERT INTO cyrillic VALUES (2, convert_from('\x573139303342502057494e4b4f44c2a0d09ad0bed0bbd0bed0b4d0bad0b8c2a0d0b4d0b8d181d0bad0bed0b2d18bd0b52c20d0bfd0b5d180d0b5d0b4d0bdd0b8d0b5', 'UTF8'));
SELECT * FROM cyrillic WHERE t = E'W1903BP WINKOD\u00A0Колодки\u00A0дисковые, передние';
id │ t
════╪═══════════════════════════════════════════
2 │ W1903BP WINKOD Колодки дисковые, передние
(1 row)
SELECT * FROM cyrillic WHERE t = E'W1903BP WINKOD Колодки дисковые, передние';
id │ t
════╪═══════════════════════════════════════════
1 │ W1903BP WINKOD Колодки дисковые, передние
(1 row)
Perhaps you got a corrupted index on the "_fld31441" column.
Does reindexing that index change things?
Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com
Ilmir Mulyukov <ilmir.mulyukov@gmail.com> writes: > Laurenz, there is no index for this table. > ka_im_pg=# \d _inforg31440 > Table "public._inforg31440" > Column | Type | Collation | Nullable | Default > ---------------+---------------+-----------+----------+--------- > _fld31441 | mvarchar(400) | | not null | > _fld31442 | mvarchar(10) | | not null | > _fld31443rref | bytea | | not null | > _fld2102 | numeric(7,0) | | not null | What in the world is type "mvarchar"? It's certainly not anything defined by core Postgres. I think you need to address this question to whoever wrote that datatype. It seems somewhat likely that this is intentional behavior for the type. regards, tom lane
I assumed that the issue with the module which implements the mchar and mvarchar types...
Thank you, Laurenz and Tom for your help.
вт, 13 июл. 2021 г., 23:43 Tom Lane <tgl@sss.pgh.pa.us>:
Ilmir Mulyukov <ilmir.mulyukov@gmail.com> writes:
> Laurenz, there is no index for this table.
> ka_im_pg=# \d _inforg31440
> Table "public._inforg31440"
> Column | Type | Collation | Nullable | Default
> ---------------+---------------+-----------+----------+---------
> _fld31441 | mvarchar(400) | | not null |
> _fld31442 | mvarchar(10) | | not null |
> _fld31443rref | bytea | | not null |
> _fld2102 | numeric(7,0) | | not null |
What in the world is type "mvarchar"? It's certainly not
anything defined by core Postgres. I think you need to
address this question to whoever wrote that datatype.
It seems somewhat likely that this is intentional behavior
for the type.
regards, tom lane