[MDEV-21230] Corrupt database with MariaDB 10.4.10 (see: MDEV-20987) Created: 2019-12-05  Updated: 2020-04-20  Resolved: 2020-04-20

Status: Closed
Project: MariaDB Server
Component/s: Server
Affects Version/s: 10.4.10
Fix Version/s: 10.3.21, 10.4.11

Type: Bug Priority: Critical
Reporter: Markus Assignee: Thirunarayanan Balathandayuthapani
Resolution: Duplicate Votes: 0
Labels: bug, centos7, need_feedback
Environment:

CentOS 7.7.1908, MariaDB 10.4.10, Relution 4.56, (VirtualBox 6.0.14)


Attachments: File docker-compose.yml     File mariadb.log     File relution.cnf    
Issue Links:
Duplicate
is duplicated by MDEV-21088 Table cannot be loaded after instant ... Closed
Relates
relates to MDEV-21088 Table cannot be loaded after instant ... Closed
relates to MDEV-21148 mysqld: storage/innobase/btr/btr0cur.... Closed
relates to MDEV-20987 InnoDB fails to start when fts table ... Closed

 Description   

We are still experiencing the issue first reported with MDEV-20987, even with MariaDB 10.4.10. After MariaDB and/or the server is rebooted, we see errors like "Table 'relution.scheduler_task' doesn't exist in engine".

Steps for reproduction
Base installation

  • Create new virtual machine
    • "CentOS 7.x / MariaDB 10.4.x"
    • Linux
    • Red Hat (64-bit)
    • 4096 MB base memory
    • 20 GB SSD
  • Install CentOS
    • CentOS-7-x86_64-Minimal-1804.iso
    • Minimal Install
    • Automatic partitioning
    • Wired network
    • Create user during installation ("relution")
  • Reboot

Post-Installation steps

  • yum -y update
  • lsb_release -a
    • LSB Version: :core-4.1-amd64:core-4.1-noarch:cxx-4.1-amd64:cxx-4.1-noarch:desktop-4.1-amd64:desktop-4.1-noarch:languages-4.1-amd64:languages-4.1-noarch:printing-4.1-amd64:printing-4.1-noarch
    • Distributor ID: CentOS
    • Description: CentOS Linux release 7.7.1908 (Core)
    • Release: 7.7.1908
    • Codename: Core
  • Reboot

Install MariaDB

  • Create /etc/yum.repos.d/MariaDB.repo

    # MariaDB 10.4 CentOS repository list - created 2019-12-05 10:15 UTC
    # http://downloads.mariadb.org/mariadb/repositories/
    [mariadb]
    name = MariaDB
    baseurl = http://yum.mariadb.org/10.4/centos7-amd64
    gpgkey=https://yum.mariadb.org/RPM-GPG-KEY-MariaDB
    gpgcheck=1
    

  • yum -y install mariadb-server mariadb
  • mariadb --version

    mariadb  Ver 15.1 Distrib 10.4.10-MariaDB, for Linux (x86_64) using readline 5.1
    

Configure MariaDB for application

  • Create /etc/my.cnf.d/relution.cnf

    [mysqld]
    collation-server=utf8mb4_general_ci
    character-set-server=utf8mb4
    max_allowed_packet=1G
    innodb_file_per_table=1
    

  • systemctl start mariadb.service
  • systemctl enable mariadb.service

Install application

Create initial database

  • Create /opt/create_database.sh =>

    #!/bin/bash
    if [[ $# -ne 2 ]]; then
        echo "Usage: $0 <database> <password>"
        exit 1
    fi
     
    DATABASE=$1
    PASSWORD=$2
     
    mysql -u root <<EOF
    CREATE DATABASE $DATABASE CHARACTER SET 'utf8mb4' COLLATE 'utf8mb4_general_ci';
    CREATE USER 'relution'@'localhost' IDENTIFIED BY '$PASSWORD';
    GRANT ALL PRIVILEGES ON $DATABASE.* TO 'relution'@'localhost' IDENTIFIED BY '$PASSWORD';
    CREATE USER 'relution'@'%' IDENTIFIED BY '$PASSWORD';
    GRANT ALL PRIVILEGES ON $DATABASE.* TO 'relution'@'%' IDENTIFIED BY '$PASSWORD';
    FLUSH PRIVILEGES;
    EOF
    

  • chmod +x create_database.sh
  • ./create_database.sh relution password

Minimal application configuration

  • Create /opt/relution/application.yml =>

    relution:
      server:
        externalURL: https://www.example.org
      database:
        type: mysql
        url: jdbc:mariadb://127.0.0.1:3306/relution?useServerPrepStmts=true
        username: relution
        password: password
    

Start application

  • sudo -u relution ./bin/relution-foreground.sh
  • Wait until database has been initialized
  • "INFO 10033 [ main] vice.StandardLockService: Successfully released change log lock"
  • Stop application (Ctrl+C)

You can now start and stop the application, the database is working as expected.

Restart MariaDB

  • systemctl stop mariadb.service
  • systemctl start mariadb.service

Attempt to start application again

  • sudo -u relution ./bin/relution-foreground.sh
    ==> The database is now broken

2019-12-05 11:37:36.146 ERROR 10335 [        main] s.boot.SpringApplication: Application run failed []
java.sql.SQLException: Table 'relution.scheduler_task' doesn't exist in engine
Query is: SELECT t0.`uuid`, t1.`scheduler_task_uuid`, t0.`active`, t0.`clustered`, t0.`data`, t0.`description`, t0.`last_success_date`, t0.`name`, t0.`run_as_user`, t0.`schedulable_type`, t0.`schedulable_uuid`, t0.`version`, t1.`expression`, t1.`period_number`, t1.`period_type`, t1.`start` FROM `scheduler_task` t0, `scheduler_task_period` t1 WHERE (t1.`scheduler_task_uuid` = t0.`uuid`)
<…>
Wrapped by: org.eclipse.persistence.exceptions.DatabaseException: 
Internal Exception: java.sql.SQLSyntaxErrorException: (conn=18) Table 'relution.scheduler_task' doesn't exist in engine
Error Code: 1932



 Comments   
Comment by Markus [ 2019-12-19 ]

Here's an easier reproduction, using docker:

MariaDB 10.4.10

  • docker-compose up
  • (wait until db initialized)
  • (can connect to localhost:8081)
  • docker-compose stop
  • docker-compose up
    ==> Internal Exception: java.sql.SQLSyntaxErrorException: (conn=19) Table 'relution.scheduler_task' doesn't exist in engine

MariaDB 10.4.11

  • docker-compose up
  • (wait until db initialized)
  • (can connect to localhost:8081)
  • docker-compose stop
  • docker-compose up
    ==> No more issues

We've just tested MariaDB 10.4.11 and the issue can no longer be observed.

Comment by Thirunarayanan Balathandayuthapani [ 2020-03-16 ]

m.pfeiffer can you please attach the server log file too ? The error messages in the log file could help
us to find the root cause of the issue. Thanks

Comment by Markus [ 2020-03-16 ]

Log file attached, this should be the relevant line:

2020-03-16 10:12:38 19 [ERROR] InnoDB: Table `relution`.`scheduler_task` contains unrecognizable instant ALTER metadata

Comment by Marko Mäkelä [ 2020-03-16 ]

m.pfeiffer, I see, the server denied access to that table due to corruption. If you used a 10.3 or 10.4 version before MDEV-19916 was fixed, and an ADD COLUMN (or in 10.4, DROP COLUMN or changing the order of columns) had been executed as an instantaneous operation, then the table could be corrupted as a result of a DML operation.

If the cause is something else, then we would need a complete test case that would repeat the corruption. We made extensive efforts to chase down the bug around September and October last year, before I finally realized that MDEV-19916 had fixed that problem.

I think that something else should have changed between 10.4.10 and 10.4.11. Was it with exactly the same database contents?

Comment by Marko Mäkelä [ 2020-03-16 ]

This could be related to MDEV-21088 and MDEV-21148.

Comment by Markus [ 2020-03-16 ]

@marko, I used the Docker setup from #comment-140997 just now to produce the server log file I attached in my previous comment. So this was a "fresh install" of MariaDB 10.4.10 with a brand new database. The issue can be easily reproduced with the steps in the initial description or by using Docker for both MariaDB and our product.

I know the initial description is pretty lengthy, but basically it boils down to:

  1. Install OS
  2. Install MariaDB
  3. Install our service
  4. Start our service for the first time
  5. Let it initialize its database (create tables, fill in some default content)
  6. Stop our service
  7. Restart database
    ==> Can no longer start our service because database is now corrupt

The Docker setup does pretty much the same thing, only a lot faster, since there is no need to install anything. The issue occurs with both MariaDB 10.4.10 and 10.4.9, but does not occur with 10.4.11. Once corrupt, updating the MariaDB version does not fix the issue. So this is a "fresh install" each time, by using "docker-compose down --volumes" between tests to remove the containers and associated volumes.

Comment by Markus [ 2020-03-16 ]

Alright, I've used our service's logs to get the SQL statements issued by Liquibase to set up the database. Using the statements for one of the affected tables, I'm now able to reproduce the issue manually.

  1. Start MariaDB (docker-compose up)
  2. Connect with a client
  3. Execute the SQL commands below
  4. Restart MariaDB (Ctrl+C, docker-compose stop, docker-compose up)
  5. The table is now inaccessible

Repeating the same steps with MariaDB 10.4.11 does no longer see the same issue.

docker-compose.yml

Show all

version: '2.1'
services:
  mariadb:
    image: mariadb:10.4.10
    container_name: testbox_mariadb
    environment:
      - MYSQL_DATABASE=relution
      - MYSQL_ROOT_PASSWORD=S3creTM4r14
      - MYSQL_USER=relution
      - MYSQL_PASSWORD=relution
    expose:
      - '3306'
    ports:
      - '3307:3306'
    volumes:
      - './mariadb.cnf:/etc/mysql/conf.d/mariadb.cnf'

mariadb.cnf

Show all

[mysqld]
collation-server=utf8mb4_general_ci
character-set-server=utf8mb4
max_allowed_packet=1G
innodb_file_per_table=1

SQL

CREATE DATABASE `corruption`;
 
CREATE TABLE `corruption`.`scheduler_task` (`uuid` CHAR(36) NOT NULL, `name` VARCHAR(255) NOT NULL, `description` TEXT NULL, `run_as_user` VARCHAR(255) NULL, `active` BIT(1) NULL, `clustered` BIT(1) NULL, `schedulable_type` VARCHAR(255) NOT NULL, `schedulable_uuid` VARCHAR(255) NOT NULL, `last_success_date` timestamp NULL, `data` BLOB NULL, `bundle` VARCHAR(255) NULL, `version` INT DEFAULT 1 NOT NULL, `application_uuid` CHAR(36) NULL, CONSTRAINT `PK_SCHEDULER_TASK` PRIMARY KEY (`uuid`)) ENGINE=InnoDB ROW_FORMAT=DYNAMIC;
ALTER TABLE `corruption`.`scheduler_task` MODIFY `last_success_date` timestamp NULL;
ALTER TABLE `corruption`.`scheduler_task` ROW_FORMAT = DYNAMIC;
ALTER TABLE `corruption`.`scheduler_task` DROP COLUMN `application_uuid`;
ALTER TABLE `corruption`.`scheduler_task` DROP COLUMN `bundle`;

Comment by Marko Mäkelä [ 2020-03-16 ]

m.pfeiffer, it looks like your report could be a duplicate of MDEV-21088. The PRIMARY KEY is CHAR(36). If the default character set is UTF-8 or some other variable-length character set, then ROW_FORMAT=DYNAMIC would internally store that as variable length. In the case of utf8mb3, the internally stored length would be 36 to 3*36 bytes.

Can you confirm the character set of the column uuid, for example by SHOW CREATE TABLE corruption.scheduler_task?

Free support advice:

  1. Try to use as short keys as possible, especially for InnoDB PRIMARY KEY. Using a text string type for storing hexadecimal digits is an overkill.
  2. Be aware that when instant DROP COLUMN is being used (MDEV-15562), some storage overhead will remain. Today I was working on MDEV-20590 to help control that.
Comment by Markus [ 2020-03-16 ]

It does indeed look like a duplicate of that issue. We use utf8mb4 for our database since we need to support emojis.

We rarely drop columns for backwards compatibility reasons, which would explain why <2% of our tables were affected. We use strings for legacy/compatibility reasons, not sure how feasible it is to change that unfortunately. I suppose it might be possible to change the character set of our key columns though. I'll open an internal ticket to investigate that.

Generated at Thu Feb 08 09:05:34 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.