Discussion: [BUG] PostgreSQL 14.5: Data corruption in table tb_workstation_message (possible fstrim-related on eMMC)

To: pgsql-bugs@postgresql.org
Subject: [BUG] PostgreSQL 14.5: Data corruption in table tb_workstation_message (possible fstrim-related on eMMC)

Environment Information

  • PostgreSQL Version: 14.5
  • OS: Debian GNU/Linux 11 (bullseye)
  • Architecture: aarch64 (8-core CPU)
  • Memory: 8GB
  • Storage Medium: eMMC

Problem Description

Data corruption occurred in the tb_workstation_message table (record guid: ce04bd3f-232d-4c7f-9a91-540d0e581649). Queries targeting this record fail with:
```plaintext
ERROR: invalid memory alloc request size 18446744073709551613
```
Corruption details of the record:
  • Only the guid field remains valid;
  • All int-type fields (e.g., upload_status) are set to 0;
  • All timestamp-type fields (e.g., create_time) are reset to 2000-01-01 00:00:00;
  • String-type fields (e.g., operator, content) are inaccessible; the row becomes queryable only after those fields are forcibly set to NULL via UPDATE (see the page-level sketch right after this list).
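For reference, 18446744073709551613 is 2^64 - 3, i.e. a garbage (effectively negative) length word handed to the allocator, which is consistent with a variable-length field header having been overwritten. A minimal page-level inspection sketch, assuming the pageinspect extension can be installed and the damaged tuple's block number is known; the block number 1234 below is a placeholder:

```sql
-- pageinspect ships in contrib; superuser required.
CREATE EXTENSION IF NOT EXISTS pageinspect;

-- Item pointers and tuple headers of the suspect heap page
-- (replace 1234 with the real block number).
SELECT lp, lp_off, lp_len, t_xmin, t_xmax, t_ctid, t_infomask2, t_infomask
  FROM heap_page_items(get_raw_page('tb_workstation_message', 1234));

-- Raw tuple bytes: shows whether everything after the guid really is zeroed.
SELECT lp, t_data
  FROM heap_page_items(get_raw_page('tb_workstation_message', 1234));
```

If the page header itself is damaged, get_raw_page may fail as well; in that case the dd-based extraction described further below remains the fallback.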

Timeline

  • Record insertion: 2025-11-25 21:23:51
  • Last modification: 2025-11-27 09:41:10
  • Anomaly window: 2025-12-01 03:00:00 ~ 09:02:31 (record was intact before 03:00:00)
  • First error detected: 2025-12-01 09:02:31 (triggered by a 30s-interval business cron job querying upload_status = 0)

Error Output (Query Example)

```sql
postgres=# select * from tb_workstation_message where guid = 'ce04bd3f-232d-4c7f-9a91-540d0e581649';
ERROR: invalid memory alloc request size 18446744073709551613
```
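One low-risk way to narrow down which attribute triggers the failure (a sketch; it assumes guid is an early column in the table definition, since attributes are deformed left to right) is to select columns one group at a time:

```sql
-- Early columns may still be readable even when SELECT * is not.
SELECT guid
  FROM tb_workstation_message
 WHERE guid = 'ce04bd3f-232d-4c7f-9a91-540d0e581649';

SELECT guid, upload_status, create_time
  FROM tb_workstation_message
 WHERE guid = 'ce04bd3f-232d-4c7f-9a91-540d0e581649';

-- Adding the text columns is expected to reproduce the
-- "invalid memory alloc request size" error if their length words are garbage.
SELECT guid, content
  FROM tb_workstation_message
 WHERE guid = 'ce04bd3f-232d-4c7f-9a91-540d0e581649';
```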

Investigation Details

  1. Database Logs: No log output from 2025-12-01 05:37:05.932 to 09:02:31.533; no abnormalities were found.
  2. Dynamic Library Check: An environment variable pointed libpq.so resolution at a location outside the PostgreSQL install directory; however, the business application does not link against this library, so it is an unlikely cause of the corruption.
  3. eMMC Storage Analysis:
    • Used dd to extract the corrupted block; a hex dump shows that only the guid bytes are intact and the rest of the tuple is zeroed, confirming corruption at the storage level (an SQL sketch for locating the block and file follows this list).
  4. fstrim Check:
    • fstrim ran at 2025-12-01 00:27 (it runs asynchronously, when the disk is idle); the anomaly appeared after this run.
    • Reproduction attempt: simulated data insertion followed by more than 10 fstrim runs did not reproduce the corruption.
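To cross-check the dd extraction in item 3, the physical file and block offset can also be derived from inside the database (a sketch; the ctid lookup assumes the tuple header is still readable, which per the triage above may require selecting only the guid column):

```sql
-- Path of the table's main fork, relative to the data directory.
SELECT pg_relation_filepath('tb_workstation_message');

-- ctid is (block_number, line_pointer); the block number times the page size
-- gives the byte offset to feed to dd.
SELECT ctid
  FROM tb_workstation_message
 WHERE guid = 'ce04bd3f-232d-4c7f-9a91-540d0e581649';

-- Page size in bytes (normally 8192).
SHOW block_size;
```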

Impact

  • Business failure: Query errors trigger business alerts.
  • Backup failure: Database backups fail when they read the corrupted data.

Additional Notes

  • No system/app/PG restarts during the anomaly window.
  • No writes to tb_workstation_message after 2025-12-01 05:34:00.
  • No online operations/terminals between 05:34:00 ~ 09:02:31.

Request:
We suspect this may relate to fstrim on eMMC (with PG 14.5) but cannot reproduce it. Could the community help analyze the root cause, or guide further troubleshooting?


Attachment: postgresql_2025-12-01_000000(1).csv (739.5 KB)
On Thu, 2025-12-18 at 14:42 +0800, 第108次明天 wrote:
> [...]
> Impact
>  * Business failure: Query errors trigger business alerts.
>  * Backup failure: Database backups are affected by corrupted data.
> [...]
> Request:
> We suspect this may relate to fstrim on eMMC (with PG 14.5) but cannot reproduce it. Could the community
> help analyze the root cause, or guide further troubleshooting?

"fstrim" should not destroy data...

I am wondering how this problem can lead to backup failure - unless you are using "pg_dump" for backups.

You probably figured out that you will have to delete the row to get rid of the problem.
In addition, you should dump and restore the database to a new cluster - who knows what
other data corruption is lurking there.
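A minimal sketch of that cleanup, assuming the damaged row can still be located by its guid (if not, it can be targeted by ctid instead); the database names in the comments are placeholders:

```sql
-- Remove the damaged row; run inside a transaction so the effect can be
-- checked before committing.
BEGIN;
DELETE FROM tb_workstation_message
 WHERE guid = 'ce04bd3f-232d-4c7f-9a91-540d0e581649';
-- Alternatively, if only the ctid is known:
--   DELETE FROM tb_workstation_message WHERE ctid = '(1234,5)';
COMMIT;

-- A logical dump/restore into a fresh cluster then rewrites every page
-- (shell commands shown as comments; "mydb"/"mydb_new" are placeholders):
--   pg_dump -Fc -d mydb -f mydb.dump
--   pg_restore -d mydb_new mydb.dump
```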

I don't know what caused your problem, but I would suspect hardware failure.
Perhaps you should check or replace the disk.

Yours,
Laurenz Albe



> * PostgreSQL Version: 14.5

That is way too old. Upgrade to 14.20.
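A quick check of the exact minor version actually running (a trivial sketch):

```sql
-- Full version string, e.g. "PostgreSQL 14.5 on aarch64-unknown-linux-gnu ...".
SELECT version();
-- Numeric form only:
SHOW server_version;
```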

> or guide further troubleshooting?

Run the pg_checksums program to get a handle on the extent of the corruption:

https://www.postgresql.org/docs/14/app-pgchecksums.html
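Note that pg_checksums can only verify pages on a cluster that has data checksums enabled; a quick way to check from SQL (a small sketch):

```sql
-- "on" means pg_checksums --check can verify every page offline;
-- "off" means checksums would first have to be enabled
-- (pg_checksums --enable, with the server cleanly shut down).
SHOW data_checksums;
```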

> Database backups are affected by corrupted data.

I assume you are not using a good backup system then. pgBackRest, for example, will not only check for corrupted pages during the backup, but also report them and allow the backup to proceed.