  1. MariaDB Server
  2. MDEV-26276

Significant data corruption after dropping/adding database

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 10.5.11
    • Fix Version/s: None
    • Component/s: Server

    Description

      I'm observing significant data corruption after dropping and loading a schema. This is so significant that I thought others would have reported it as well. However, I have searched and found nothing. I can reproduce the error using only SQL commands, so it seems legit.

      1. Starting state: The database is up and running well. I have used the C/C++ API to add several entries to the database.
      2. The process writing to the DB is stopped (I tried both killing it and stopping it cleanly).
      3. Here is one entry in the DB:

      MariaDB [sigmaDB]> select * from path;
      +-----+-------+------------------+--------------------+----------------------+-----------------+--------------+--------------+-------------------+--------------+----------+-----------+---------+------------+--------------+----------+
      | idx | name  | uuid             | primaryCollectUuid | componentId          | sigmaId         | reference_id | wptspacing_m | distanceMarginMet | hdgMarginDeg | approved | completed | aborted | restricted | callbacksSet | priority |
      +-----+-------+------------------+--------------------+----------------------+-----------------+--------------+--------------+-------------------+--------------+----------+-----------+---------+------------+--------------+----------+
      |   1 | 1:Lot | t��~��N������a▒�            | ���4B#�S�����?            | 14481131123844579333 | 220964525205148 |            0 |          100 |                10 |          360 |        1 |         0 |       0 |          0 |            1 |        0 |
      +-----+-------+------------------+--------------------+----------------------+-----------------+--------------+--------------+-------------------+--------------+----------+-----------+---------+------------+--------------+----------+
      1 row in set (0.000 sec)
       
      MariaDB [sigmaDB]> select count(*) from path;
      +----------+
      | count(*) |
      +----------+
      |        1 |
      +----------+
      1 row in set (0.001 sec)
      

      4. Then I drop the DB, and it appears to be really gone:

      MariaDB [sigmaDB]> drop database sigmaDB;
      Query OK, 37 rows affected (0.101 sec)
       
      MariaDB [(none)]> use sigmaDB;
      ERROR 1049 (42000): Unknown database 'sigmaDB'
      MariaDB [(none)]> select * from path;
      ERROR 1046 (3D000): No database selected
      MariaDB [(none)]>
      

      5. Then I reload the schema (without any inserted data):

      root@c01045:/sigma# grep -i insert sigmaDB.sql
      root@c01045:/sigma# cat sigmaDB.sql | mysql -u sigma -p 
      Enter password: 
      root@c01045:/sigma#
      

      6. Now I look in the path table, and magically I have data that existed before the previous drop database command:

      MariaDB [sigmaDB]> select * from path;
      +-----+-------+------------------+--------------------+----------------------+-----------------+--------------+--------------+-------------------+--------------+----------+-----------+---------+------------+--------------+----------+
      | idx | name  | uuid             | primaryCollectUuid | componentId          | sigmaId         | reference_id | wptspacing_m | distanceMarginMet | hdgMarginDeg | approved | completed | aborted | restricted | callbacksSet | priority |
      +-----+-------+------------------+--------------------+----------------------+-----------------+--------------+--------------+-------------------+--------------+----------+-----------+---------+------------+--------------+----------+
      |   1 | 1:Lot | t��~��N������a▒�            | ���4B#�S�����?            | 14481131123844579333 | 220964525205148 |            0 |          100 |                10 |          360 |        1 |         0 |       0 |          0 |            1 |        0 |
      +-----+-------+------------------+--------------------+----------------------+-----------------+--------------+--------------+-------------------+--------------+----------+-----------+---------+------------+--------------+----------+
      1 row in set (0.000 sec)
      

      7. Then I restart the mariadb service:

      root@c01045:/sigma# systemctl restart mysql
      root@c01045:/sigma#
      

      8. After the restart of the service, the "phantom row" is gone from the database:

      MariaDB [sigmaDB]> select * from path;
      Empty set (0.002 sec)
      MariaDB [sigmaDB]>
      

      This really makes me think that the drop/reload isn't happening as it should. It seems like some amount of data is being kept in memory and not destroyed by the drop and the subsequent schema load. Once the service is brought down and restarted, this cached data is no longer there and the queries work as expected. Note that rolling back to 10.3 removes this problem/behavior.
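      The steps above could be collected into a small script so the check is repeatable. This is only a sketch under assumptions not confirmed by the report: that credentials come from a client option file rather than an interactive `-p` prompt, that `sigmaDB.sql` creates and selects `sigmaDB` itself, and that the helper names `count_path_rows` and `repro` are made up for illustration.

```shell
#!/bin/sh
# Hypothetical repro helper for the drop/reload steps described above.
# Assumptions (not from the report): credentials are supplied by a client
# option file, and the schema file creates the sigmaDB database itself.
MYSQL="${MYSQL:-mysql}"
SCHEMA_FILE="${SCHEMA_FILE:-sigmaDB.sql}"

count_path_rows() {
    # -N drops the column header, -B gives plain batch output,
    # so only the number is printed.
    $MYSQL -N -B -e 'SELECT COUNT(*) FROM path' sigmaDB
}

repro() {
    echo "rows before drop: $(count_path_rows)"
    # Drop the database, then reload the schema (which contains no INSERTs).
    $MYSQL -e 'DROP DATABASE sigmaDB' || return 1
    $MYSQL < "$SCHEMA_FILE" || return 1
    # A freshly loaded schema should report 0 rows here.
    echo "rows after reload: $(count_path_rows)"
}
```

      Calling `repro` on a populated database prints the row count before the drop and after the reload. Going by the behavior described in this ticket, an affected 10.5.11 server would show a non-zero count on the second line until the service is restarted.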

      Attachments

        Activity

          danblack Daniel Black added a comment -

          Do you have a datadir preserved that you can upload for the private use by the mariadb developers to understand/resolve this issue?

          njensen Nathan Jensen added a comment -

          Daniel,

          I think I can provide what you need, but can you be more specific about what you mean by datadir? I'd hate to give you only half of what you need and have to rinse/repeat this process. Do you mean all the files in /var/lib/mysql? Any other log files? At what state do you want the data dir? I could snapshot it before or after the observed corruption. Thanks for the help!

          -Nate

          danblack Daniel Black added a comment -

          A snapshot before would be most useful if possible. Everything else can be derived. Yes all of /var/lib/mysql. If you have log files of mariadb that would be good too (from file or journalctl -u mariadb.service from recent restarts). Thanks njensen. Is there an indication that this is a filesystem out of space error?

          njensen Nathan Jensen added a comment -

          Daniel,

          I think I have added a useful debug data set for you. Since the problem is trivial to reproduce, I took data dir snapshots at all stages. All of these snaps are contained in the uploaded archive: MDEV-26276_debug_data.tar. When you break these sections out, here is how to align them with the steps I show in the above ticket description:

          1. beforeSchemaLoad.tar.gz – Totally clean/dropped DB. The name of the DB we will eventually work with is called sigmaDB
          2. whileRunning.tar.gz – This is my C/C++ code which loads the schema and populates the table; note the single entry in the path table
          3. afterKill.tar.gz – This is after I killed my C/C++ code. Note that the database seems sane at this point; queries return results that are expected
          4. afterDbDrop.tar.gz – This is after I dropped the database via the command line client (i.e. "drop database sigmaDB"). After this, the DB appears to be gone in the command line client as expected
          5. afterReloadCorrupted.tar.gz – This is after I reload the schema via command line client; note that there should be no data inserted in any table. However, when I query the path table I see a single result
          6. afterRestartGood.tar.gz – This is after I restart the mysql service. At this point, the record in the path table is gone (as it should have been all along) and my DB appears to be back in a sane state
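          For anyone wanting to collect the same kind of per-stage snapshots, something like the following could work. It is a sketch, not the procedure actually used for the uploaded archives: the helper name `snapshot_datadir` and the `DATADIR`/`OUTDIR` variables are hypothetical, and the data directory is assumed to be /var/lib/mysql as mentioned earlier in this thread. Copying files from a running server may capture an inconsistent on-disk state, and the comments here do not say whether the server was running when the snapshots were taken.

```shell
#!/bin/sh
# Hypothetical helper: archive the data directory under a stage name.
# Assumption: the data directory is /var/lib/mysql unless DATADIR is set.
DATADIR="${DATADIR:-/var/lib/mysql}"
OUTDIR="${OUTDIR:-.}"

snapshot_datadir() {
    stage="$1"   # e.g. afterDbDrop
    # -C changes into the parent directory so the archive holds
    # relative paths (mysql/...) instead of absolute ones.
    tar -czf "$OUTDIR/$stage.tar.gz" \
        -C "$(dirname "$DATADIR")" "$(basename "$DATADIR")"
}
```

          For example, `snapshot_datadir afterDbDrop` run as root right after the drop would produce afterDbDrop.tar.gz in the current directory.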

          Also note that I uploaded the output of the journalctl command in file: MDEV-26276jounalctl

          If there is anything else I can provide, please don't hesitate to ask! I appreciate the help.

          -Nate


          People

            Unassigned Unassigned
            njensen Nathan Jensen