Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-34422

InnoDB writes corrupted log on macOS and AIX due to uninitialized log_sys.lsn_lock

Details

    Description

      I have encountered a recurring issue with MariaDB 11.5.1 where the InnoDB storage engine fails to initialize due to a "Missing FILE_CHECKPOINT" error. This issue arises consistently after importing data into a freshly created database and then attempting to restart the MariaDB server.

      *Steps to Reproduce:*

      1. Install MariaDB 11.5.1.
      2. Initialize a new, empty database.
      3. Import data into the database.
      4. Restart the MariaDB server.

      *Expected Behavior:*

      MariaDB should restart successfully without any errors.

      *Actual Behavior:*

      The server fails to restart, and the following error messages appear in the log files:

      240619 11:35:26 mysqld_safe Starting mariadbd daemon with databases from /Applications/ServBay/db/mariadb/11.5
      2024-06-19 11:35:26 0 [Note] Starting MariaDB 11.5.1-MariaDB-log source revision 65079ece42f383bdeb9b3f913a833a6f0123919d as process 32024
      2024-06-19 11:35:26 0 [Warning] Setting lower_case_table_names=2 because file system for /Applications/ServBay/db/mariadb/11.5/ is case insensitive
      2024-06-19 11:35:26 0 [Warning] --innodb-file-per-table is deprecated and will be removed in a future release
      2024-06-19 11:35:26 0 [Note] InnoDB: Compressed tables use zlib 1.3.1
      2024-06-19 11:35:26 0 [Note] InnoDB: Number of transaction pools: 1
      2024-06-19 11:35:26 0 [Note] InnoDB: Using generic crc32 instructions
      2024-06-19 11:35:26 0 [Note] InnoDB: Initializing buffer pool, total size = 4.000GiB, chunk size = 64.000MiB
      2024-06-19 11:35:26 0 [Note] InnoDB: Completed initialization of buffer pool
      2024-06-19 11:35:26 0 [Note] InnoDB: End of log at LSN=47733
      2024-06-19 11:35:26 0 [Note] InnoDB: Opened 3 undo tablespaces
      2024-06-19 11:35:26 0 [Note] InnoDB: 128 rollback segments in 3 undo tablespaces are active.
      2024-06-19 11:35:26 0 [Note] InnoDB: Setting file '/Applications/ServBay/db/mariadb/11.5/ibtmp1' size to 12.000MiB. Physically writing the file full; Please wait ...
      2024-06-19 11:35:26 0 [Note] InnoDB: File '/Applications/ServBay/db/mariadb/11.5/ibtmp1' size is now 12.000MiB.
      2024-06-19 11:35:26 0 [Note] InnoDB: log sequence number 47733; transaction id 14
      2024-06-19 11:35:26 0 [Note] InnoDB: Loading buffer pool(s) from /Applications/ServBay/db/mariadb/11.5/ib_buffer_pool
      2024-06-19 11:35:26 0 [Note] Plugin 'FEEDBACK' is disabled.
      2024-06-19 11:35:26 0 [Note] Plugin 'wsrep-provider' is disabled.
      2024-06-19 11:35:26 0 [Note] InnoDB: Buffer pool(s) load completed at 240619 11:35:26
      2024-06-19 11:35:27 0 [Note] Server socket created on IP: '::1'.
      2024-06-19 11:35:27 0 [Note] Server socket created on IP: '127.0.0.1'.
      2024-06-19 11:35:27 0 [Note] Server socket created on IP: '::'.
      2024-06-19 11:35:27 0 [Note] Server socket created on IP: '0.0.0.0'.
      2024-06-19 11:35:27 0 [Note] mariadbd: Event Scheduler: Loaded 0 events
      2024-06-19 11:35:27 0 [Note] /Applications/ServBay/package/mariadb/11.5/current/bin/mariadbd: ready for connections.
      Version: '11.5.1-MariaDB-log'  socket: '/Applications/ServBay/tmp/mariadb-11.5.sock'  port: 3306  ServBay.dev
      2024-06-19 11:37:01 0 [Note] /Applications/ServBay/package/mariadb/11.5/current/bin/mariadbd (initiated by: unknown): Normal shutdown
      2024-06-19 11:37:01 0 [Note] InnoDB: FTS optimize thread exiting.
      2024-06-19 11:37:01 0 [Note] InnoDB: Starting shutdown...
      2024-06-19 11:37:01 0 [Note] InnoDB: Dumping buffer pool(s) to /Applications/ServBay/db/mariadb/11.5/ib_buffer_pool
      2024-06-19 11:37:01 0 [Note] InnoDB: Buffer pool(s) dump completed at 240619 11:37:01
      2024-06-19 11:37:01 0 [Note] InnoDB: Removed temporary tablespace data file: "/Applications/ServBay/db/mariadb/11.5/ibtmp1"
      2024-06-19 11:37:01 0 [Note] InnoDB: Shutdown completed; log sequence number 10225990; transaction id 623
      2024-06-19 11:37:01 0 [Note] /Applications/ServBay/package/mariadb/11.5/current/bin/mariadbd: Shutdown complete
       
      240619 11:37:01 mysqld_safe mysqld from pid file /Applications/ServBay/tmp/mariadb-11.5.pid ended
      240619 11:37:02 mysqld_safe Starting mariadbd daemon with databases from /Applications/ServBay/db/mariadb/11.5
      2024-06-19 11:37:02 0 [Warning] Setting lower_case_table_names=2 because file system for /Applications/ServBay/db/mariadb/11.5/ is case insensitive
      2024-06-19 11:37:02 0 [Note] Starting MariaDB 11.5.1-MariaDB-log source revision 65079ece42f383bdeb9b3f913a833a6f0123919d as process 32351
      2024-06-19 11:37:02 0 [Warning] --innodb-file-per-table is deprecated and will be removed in a future release
      2024-06-19 11:37:02 0 [Note] InnoDB: Compressed tables use zlib 1.3.1
      2024-06-19 11:37:02 0 [Note] InnoDB: Number of transaction pools: 1
      2024-06-19 11:37:02 0 [Note] InnoDB: Using generic crc32 instructions
      2024-06-19 11:37:02 0 [Note] InnoDB: Initializing buffer pool, total size = 4.000GiB, chunk size = 64.000MiB
      2024-06-19 11:37:02 0 [Note] InnoDB: Completed initialization of buffer pool
      2024-06-19 11:37:02 0 [ERROR] InnoDB: Missing FILE_CHECKPOINT(10225974) at 10225974
      2024-06-19 11:37:02 0 [ERROR] InnoDB: Log scan aborted at LSN 10225974
      2024-06-19 11:37:02 0 [ERROR] InnoDB: Plugin initialization aborted with error Generic error
      2024-06-19 11:37:02 0 [Note] InnoDB: Starting shutdown...
      2024-06-19 11:37:03 0 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
      2024-06-19 11:37:03 0 [Note] Plugin 'FEEDBACK' is disabled.
      2024-06-19 11:37:03 0 [Note] Plugin 'wsrep-provider' is disabled.
      2024-06-19 11:37:03 0 [ERROR] Unknown/unsupported storage engine: InnoDB
      2024-06-19 11:37:03 0 [ERROR] Aborting
      240619 11:37:03 mysqld_safe mysqld from pid file /Applications/ServBay/tmp/mariadb-11.5.pid ended
      

      This issue has been observed consistently across multiple users and environments.

      *Environment Details:*

      • MariaDB version: 11.5.1
      • Operating System: macOS Sonoma

      *Additional Information:*

      The issue does not occur when the database is empty. It only arises after data has been imported and the server is restarted. Sometimes it will also cause ib_logfile0 to be lost.

      11.5.0 works fine.

      *Reproducibility:*

      This issue is reproducible every time following the steps outlined above.

      Attachments

        Issue Links

          Activity

            I recovered my data using innodb_force_recovery as well (dumped and then restored to a fresh DB) and later found that all my foreign key contraints were gone.

            rec Richard Eckart de Castilho added a comment - I recovered my data using innodb_force_recovery as well (dumped and then restored to a fresh DB) and later found that all my foreign key contraints were gone.

            @ooloo @Richard Eckart de Castilho Yes, I think I mentioned this in the original issue - level 6 make it run, but the damage varies. First time I got away with just damaged indexes, but the next two times there was actual data loss. So relying on level 6 is a Russian roulette.

            joro_abv Georgi Georgiev added a comment - @ooloo @Richard Eckart de Castilho Yes, I think I mentioned this in the original issue - level 6 make it run, but the damage varies. First time I got away with just damaged indexes, but the next two times there was actual data loss. So relying on level 6 is a Russian roulette.

            As far as I can tell, this was caused by MDEV-33515, where I had removed the member function init_lsn_lock(). I should have remembered that on some platforms, calling pthread_mutex_init() is mandatory. Unfortunately, the AIX and macOS builders on our CI farm are experimental. At that point of time, the AIX builds consistently failed due to a configuration issue, and there were massive amounts of test failures on macOS (there still are many), so this regression was not caught by then.

            A possible work-around could be to disable the use of log_sys.lsn_lock and replace it with the pure spinlock, by setting the parameter innodb_log_spin_wait_delay to a nonzero value. (As noted in MDEV-33515, depending on the system, this will improve or degrade the performance; more CPU will be burned in user space but there could be much fewer context switches.)

            Yesterday, I experimented with unconditionally enabling SUX_LOCK_GENERIC to disable the futex-based mutex and rw-lock implementations on all platforms. I really should have done this during the development of MDEV-33515. I only tested that combination locally on my Debian GNU/Linux system.

            On Linux, in my experiment I also disabled the use of memory-mapped log on /dev/shm. This did not cause any additional failures on Linux, but on Microsoft Windows I got a nice crash when an uninitialized mutex was being acquired during a log write. Gosselin independently found that the issue is a missing initialization of the mutex log_sys.lsn_lock.

            marko Marko Mäkelä added a comment - As far as I can tell, this was caused by MDEV-33515 , where I had removed the member function init_lsn_lock() . I should have remembered that on some platforms, calling pthread_mutex_init() is mandatory. Unfortunately, the AIX and macOS builders on our CI farm are experimental. At that point of time, the AIX builds consistently failed due to a configuration issue, and there were massive amounts of test failures on macOS (there still are many), so this regression was not caught by then. A possible work-around could be to disable the use of log_sys.lsn_lock and replace it with the pure spinlock, by setting the parameter innodb_log_spin_wait_delay to a nonzero value. (As noted in MDEV-33515 , depending on the system, this will improve or degrade the performance; more CPU will be burned in user space but there could be much fewer context switches.) Yesterday, I experimented with unconditionally enabling SUX_LOCK_GENERIC to disable the futex-based mutex and rw-lock implementations on all platforms. I really should have done this during the development of MDEV-33515 . I only tested that combination locally on my Debian GNU/Linux system. On Linux, in my experiment I also disabled the use of memory-mapped log on /dev/shm . This did not cause any additional failures on Linux, but on Microsoft Windows I got a nice crash when an uninitialized mutex was being acquired during a log write. Gosselin independently found that the issue is a missing initialization of the mutex log_sys.lsn_lock .

            Thanks for the patch. I have provided my comments in https://github.com/MariaDB/server/pull/3433.

            debarun Debarun Banerjee added a comment - Thanks for the patch. I have provided my comments in https://github.com/MariaDB/server/pull/3433 .

            While analyzing this issue I found another problem with similar redo corruption symptom which can happen in general case on any platform with high concurrency. I have now filed MDEV-34689 to address the issue.

            debarun Debarun Banerjee added a comment - While analyzing this issue I found another problem with similar redo corruption symptom which can happen in general case on any platform with high concurrency. I have now filed MDEV-34689 to address the issue.

            People

              marko Marko Mäkelä
              naizhao Sam
              Votes:
              5 Vote for this issue
              Watchers:
              18 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.