[MDEV-26276] Significant data corruption after dropping/adding database Created: 2021-07-29  Updated: 2021-08-03

Status: Open
Project: MariaDB Server
Component/s: Server
Affects Version/s: 10.5.11
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Nathan Jensen Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: regression
Environment:

lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 18.04.1 LTS
Release: 18.04
Codename: bionic

dpkg -l | grep maria
ii libmariadb-dev 1:10.5.11+maria~bionic amd64 MariaDB database development files
ii libmariadb3:amd64 1:10.5.11+maria~bionic amd64 MariaDB database client library
ii libmariadbclient18 1:10.5.11+maria~bionic amd64 Virtual package to satisfy external libmariadbclient18 depends
ii mariadb-client 1:10.5.11+maria~bionic all MariaDB database client (metapackage depending on the latest version)
rc mariadb-client-10.1 1:10.1.48-0ubuntu0.18.04.1 amd64 MariaDB database client binaries
ii mariadb-client-10.5 1:10.5.11+maria~bionic amd64 MariaDB database client binaries
ii mariadb-client-core-10.5 1:10.5.11+maria~bionic amd64 MariaDB database core client binaries
ii mariadb-common 1:10.5.11+maria~bionic all MariaDB common configuration files
ii mariadb-server 1:10.5.11+maria~bionic all MariaDB database server (metapackage depending on the latest version)
ii mariadb-server-10.5 1:10.5.11+maria~bionic amd64 MariaDB database server binaries
ii mariadb-server-core-10.5 1:10.5.11+maria~bionic amd64 MariaDB database core server files



 Description   

I'm observing significant data corruption after dropping and loading a schema. This is so significant that I thought others would have reported it as well. However, I have searched and found nothing. I can reproduce the error using only SQL commands, so it seems legit.

  1. Starting state: The database is up and running well. I have used the C/C++ API to add several entries to the database.
  2. The process writing to the DB is stopped (I tried both killing it and stopping it cleanly).
  3. Here is one entry in the DB:

MariaDB [sigmaDB]> select * from path;
+-----+-------+------------------+--------------------+----------------------+-----------------+--------------+--------------+-------------------+--------------+----------+-----------+---------+------------+--------------+----------+
| idx | name  | uuid             | primaryCollectUuid | componentId          | sigmaId         | reference_id | wptspacing_m | distanceMarginMet | hdgMarginDeg | approved | completed | aborted | restricted | callbacksSet | priority |
+-----+-------+------------------+--------------------+----------------------+-----------------+--------------+--------------+-------------------+--------------+----------+-----------+---------+------------+--------------+----------+
|   1 | 1:Lot | t��~��N������a▒�            | ���4B#�S�����?            | 14481131123844579333 | 220964525205148 |            0 |          100 |                10 |          360 |        1 |         0 |       0 |          0 |            1 |        0 |
+-----+-------+------------------+--------------------+----------------------+-----------------+--------------+--------------+-------------------+--------------+----------+-----------+---------+------------+--------------+----------+
1 row in set (0.000 sec)
 
MariaDB [sigmaDB]> select count(*) from path;
+----------+
| count(*) |
+----------+
|        1 |
+----------+
1 row in set (0.001 sec)

  4. Then I drop the DB and it appears to be really gone

MariaDB [sigmaDB]> drop database sigmaDB;
Query OK, 37 rows affected (0.101 sec)
 
MariaDB [(none)]> use sigmaDB;
ERROR 1049 (42000): Unknown database 'sigmaDB'
MariaDB [(none)]> select * from path;
ERROR 1046 (3D000): No database selected
MariaDB [(none)]>

  5. Then I reload the schema (without any inserted data)

root@c01045:/sigma# grep -i insert sigmaDB.sql
root@c01045:/sigma# cat sigmaDB.sql | mysql -u sigma -p 
Enter password: 
root@c01045:/sigma#

  6. Now I look at the path table and find data that existed before the previous drop database command

MariaDB [sigmaDB]> select * from path;
+-----+-------+------------------+--------------------+----------------------+-----------------+--------------+--------------+-------------------+--------------+----------+-----------+---------+------------+--------------+----------+
| idx | name  | uuid             | primaryCollectUuid | componentId          | sigmaId         | reference_id | wptspacing_m | distanceMarginMet | hdgMarginDeg | approved | completed | aborted | restricted | callbacksSet | priority |
+-----+-------+------------------+--------------------+----------------------+-----------------+--------------+--------------+-------------------+--------------+----------+-----------+---------+------------+--------------+----------+
|   1 | 1:Lot | t��~��N������a▒�            | ���4B#�S�����?            | 14481131123844579333 | 220964525205148 |            0 |          100 |                10 |          360 |        1 |         0 |       0 |          0 |            1 |        0 |
+-----+-------+------------------+--------------------+----------------------+-----------------+--------------+--------------+-------------------+--------------+----------+-----------+---------+------------+--------------+----------+
1 row in set (0.000 sec)

  7. Then I restart the mariadb service

root@c01045:/sigma# systemctl restart mysql
root@c01045:/sigma#

  8. After the restart of the service, the "phantom row" is now gone from the database

MariaDB [sigmaDB]> select * from path;
Empty set (0.002 sec)
MariaDB [sigmaDB]>

This really makes me think that the drop/reload isn't happening as it should. It seems that some data is held in memory and is not discarded by the drop and the subsequent schema load. Once the service is brought down and restarted, this cached data is gone and the queries work as expected. Note that rolling back to 10.3 removes this behavior.
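
For anyone who wants to script the drop/reload check, here is a minimal sketch of steps 4 to 6. It is not taken from the report: the function names and the `MYSQL` variable are hypothetical, and it assumes the schema file creates and selects sigmaDB itself, as the `cat sigmaDB.sql | mysql` call in the description suggests.

```shell
#!/bin/sh
# Client command (assumption: mysql is on PATH). Override MYSQL to add
# credentials, e.g. MYSQL="mysql -u sigma -pSECRET".
MYSQL=${MYSQL:-mysql}

# Print the number of rows currently visible in sigmaDB.path.
count_path_rows() {
    # -N suppresses column headers, -B selects plain batch output.
    # $MYSQL is left unquoted on purpose so extra options are split into words.
    $MYSQL -N -B -e 'SELECT COUNT(*) FROM path' sigmaDB
}

# Drop sigmaDB, reload the schema file, then print the row count of path.
# With a schema file that contains no INSERT statements, the expected count is 0.
drop_and_reload() {
    schema_file=$1
    $MYSQL -e 'DROP DATABASE sigmaDB' || return 1
    $MYSQL < "$schema_file" || return 1
    count_path_rows
}

# Example (not run here): drop_and_reload sigmaDB.sql
```

A non-zero count printed right after the reload would correspond to the "phantom row" described above.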



 Comments   
Comment by Daniel Black [ 2021-07-30 ]

Do you have a datadir preserved that you can upload for the private use by the mariadb developers to understand/resolve this issue?

Comment by Nathan Jensen [ 2021-08-02 ]

Daniel,

I think I can provide what you need, but can you be more specific about what you mean by datadir? I'd hate to give you only half of what you need and have to rinse/repeat this process. Do you mean all the files in /var/lib/mysql? Any other log files? At what state do you want the data dir? I could snapshot it before or after the observed corruption. Thanks for the help!

-Nate

Comment by Daniel Black [ 2021-08-03 ]

A snapshot before would be most useful if possible. Everything else can be derived. Yes all of /var/lib/mysql. If you have log files of mariadb that would be good too (from file or journalctl -u mariadb.service from recent restarts). Thanks njensen. Is there an indication that this is a filesystem out of space error?

Comment by Nathan Jensen [ 2021-08-03 ]

Daniel,

I think I have added a useful debug data set for you. Since the problem is trivial to reproduce, I took data dir snapshots at all stages. All of these snaps are contained in the uploaded archive: MDEV-26276_debug_data.tar. When you break these sections out, here is how to align them with the steps I show in the above ticket description:

  1. beforeSchemaLoad.tar.gz – Totally clean/dropped DB. The DB we will eventually work with is named sigmaDB
  2. whileRunning.tar.gz – This is while my C/C++ code, which loads the schema and populates the table, is running; note the single entry in the path table
  3. afterKill.tar.gz – This is after I killed my C/C++ code. Note that the database seems sane at this point; queries return results that are expected
  4. afterDbDrop.tar.gz – This is after I dropped the database via the command line client (i.e. "drop database sigmaDB"). After this, the DB appears to be gone in the command line client as expected
  5. afterReloadCorrupted.tar.gz – This is after I reload the schema via command line client; note that there should be no data inserted in any table. However, when I query the path table I see a single result
  6. afterRestartGood.tar.gz – This is after I restart the mysql service. At this point, the record in the path table is gone (as it should have been all along) and my DB appears to be back in a sane state
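
A per-stage snapshot like the ones listed above could be taken with a small helper along these lines. This is only a sketch: `snapshot_datadir` is a hypothetical name, the default path /var/lib/mysql is the one mentioned earlier in this thread, and it is not necessarily how the uploaded archives were made. An archive taken while the server is running may not be fully consistent on disk.

```shell
#!/bin/sh
# Hypothetical helper: archive a MariaDB data directory under a stage name.
# Usage: snapshot_datadir STAGE [DATADIR] [OUTDIR]
snapshot_datadir() {
    stage=$1
    datadir=${2:-/var/lib/mysql}
    outdir=${3:-.}
    # -C makes the paths inside the archive relative to the datadir's parent,
    # so the archive unpacks into a single top-level directory.
    tar -czf "$outdir/$stage.tar.gz" \
        -C "$(dirname "$datadir")" "$(basename "$datadir")"
}

# Example (not run here; reading /var/lib/mysql normally requires root):
#   snapshot_datadir afterDbDrop
#   snapshot_datadir afterReloadCorrupted
```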

Also note that I uploaded the output of the journalctl command in file: MDEV-26276jounalctl

If there is anything else I can provide, please don't hesitate to ask! I appreciate the help.

-Nate
