Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-9931

InnoDB reads first page of every .ibd file at startup

Details

    • 10.1.15, 10.1.18

    Description

      A user noticed that InnoDB in MariaDB 10.1 reads the first page of every .ibd file at startup, even if the server is not going through crash recovery. I understand that this would be normal behavior when the server is performing crash recovery, as explained here:

      https://dev.mysql.com/doc/refman/5.7/en/innodb-recovery-tablespace-discovery.html

      However, it seems unusual that this also happens during a normal startup.

      Example strace output:

      open("./db1/innodb_tab1.isl", O_RDWR)   = -1 ENOENT (No such file or directory)
      open("./db1/innodb_tab1.ibd", O_RDONLY) = 14
      pread(14, "l\202\2631\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\32\36\217\0\10\0\0\0\0\0\0"..., 16384, 0) = 16384
      close(14)                               = 0
      open("./db1/innodb_tab2.isl", O_RDWR)   = -1 ENOENT (No such file or directory)
      open("./db1/innodb_tab2.ibd", O_RDONLY) = 14
      pread(14, "\360\5\25\270\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0321\310\0\10\0\0\0\0\0\0"..., 16384, 0) = 16384
      close(14)                               = 0
      open("./mysql/gtid_slave_pos.isl", O_RDWR) = -1 ENOENT (No such file or directory)
      open("./mysql/gtid_slave_pos.ibd", O_RDONLY) = 14
      pread(14, "\327U\344\200\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\30\242\35\0\10\0\0\0\0\0\0"..., 16384, 0) = 16384
      close(14)                               = 0
      open("./mysql/innodb_index_stats.isl", O_RDWR) = -1 ENOENT (No such file or directory)
      open("./mysql/innodb_index_stats.ibd", O_RDONLY) = 14
      pread(14, "\3246$\253\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\30\221h\0\10\0\0\0\0\0\0"..., 16384, 0) = 16384
      close(14)                               = 0
      open("./mysql/innodb_table_stats.isl", O_RDWR) = -1 ENOENT (No such file or directory)
      open("./mysql/innodb_table_stats.ibd", O_RDONLY) = 14
      pread(14, "v\315D\320\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\30~\253\0\10\0\0\0\0\0\0"..., 16384, 0) = 16384
      close(14)
      

      The full strace output is attached as strace.out.

      The content of the error log confirms that crash recovery is not taking place:

      Transparent huge pages are enabled, according to /sys/kernel/mm/transparent_hugepage/enabled
      2016-04-15 16:49:49 140726057044096 [ERROR] TokuDB: Huge pages are enabled, disable them before continuing
       
      2016-04-15 16:49:49 140726057044096 [ERROR] ************************************************************
      2016-04-15 16:49:49 140726057044096 [ERROR]                                                             
      2016-04-15 16:49:49 140726057044096 [ERROR]                         @@@@@@@@@@@                         
      2016-04-15 16:49:49 140726057044096 [ERROR]                       @@'         '@@                       
      2016-04-15 16:49:49 140726057044096 [ERROR]                      @@    _     _  @@                      
      2016-04-15 16:49:49 140726057044096 [ERROR]                      |    (.)   (.)  |                      
      2016-04-15 16:49:49 140726057044096 [ERROR]                      |             ` |                      
      2016-04-15 16:49:49 140726057044096 [ERROR]                      |        >    ' |                      
      2016-04-15 16:49:49 140726057044096 [ERROR]                      |     .----.    |                      
      2016-04-15 16:49:49 140726057044096 [ERROR]                      ..   |.----.|  ..                      
      2016-04-15 16:49:49 140726057044096 [ERROR]                       ..  '      ' ..                       
      2016-04-15 16:49:49 140726057044096 [ERROR]                         .._______,.                         
      2016-04-15 16:49:49 140726057044096 [ERROR]                                                             
      2016-04-15 16:49:49 140726057044096 [ERROR] TokuDB will not run with transparent huge pages enabled.        
      2016-04-15 16:49:49 140726057044096 [ERROR] Please disable them to continue.                            
      2016-04-15 16:49:49 140726057044096 [ERROR] (echo never > /sys/kernel/mm/transparent_hugepage/enabled)  
      2016-04-15 16:49:49 140726057044096 [ERROR]                                                             
      2016-04-15 16:49:49 140726057044096 [ERROR] ************************************************************
      2016-04-15 16:49:49 140726057044096 [ERROR] Plugin 'TokuDB' init function returned error.
      2016-04-15 16:49:49 140726057044096 [ERROR] Plugin 'TokuDB' registration as a STORAGE ENGINE failed.
      2016-04-15 16:49:49 140726057044096 [Note] InnoDB: Using mutexes to ref count buffer pool pages
      2016-04-15 16:49:49 140726057044096 [Note] InnoDB: The InnoDB memory heap is disabled
      2016-04-15 16:49:49 140726057044096 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
      2016-04-15 16:49:49 140726057044096 [Note] InnoDB: Memory barrier is not used
      2016-04-15 16:49:49 140726057044096 [Note] InnoDB: Compressed tables use zlib 1.2.7
      2016-04-15 16:49:49 140726057044096 [Note] InnoDB: Using Linux native AIO
      2016-04-15 16:49:49 140726057044096 [Note] InnoDB: Using SSE crc32 instructions
      2016-04-15 16:49:49 140726057044096 [Note] InnoDB: Initializing buffer pool, size = 128.0M
      2016-04-15 16:49:49 140726057044096 [Note] InnoDB: Completed initialization of buffer pool
      2016-04-15 16:49:49 140726057044096 [Note] InnoDB: Highest supported file format is Barracuda.
      2016-04-15 16:49:49 140726057044096 [Warning] InnoDB: Resizing redo log from 2*8192 to 2*3072 pages, LSN=1718467
      2016-04-15 16:49:49 140726057044096 [Warning] InnoDB: Starting to delete and rewrite log files.
      2016-04-15 16:49:49 140726057044096 [Note] InnoDB: Setting log file ./ib_logfile101 size to 48 MB
      2016-04-15 16:49:50 140726057044096 [Note] InnoDB: Setting log file ./ib_logfile1 size to 48 MB
      2016-04-15 16:49:51 140726057044096 [Note] InnoDB: Renaming log file ./ib_logfile101 to ./ib_logfile0
      2016-04-15 16:49:51 140726057044096 [Warning] InnoDB: New log files created, LSN=1718796
      2016-04-15 16:49:51 140726057044096 [Note] InnoDB: 128 rollback segment(s) are active.
      2016-04-15 16:49:51 140726057044096 [Note] InnoDB: Waiting for purge to start
      2016-04-15 16:49:51 140726057044096 [Note] InnoDB:  Percona XtraDB (http://www.percona.com) 5.6.28-76.1 started; log sequence number 1718467
      2016-04-15 16:49:51 140726057044096 [Note] Plugin 'FEEDBACK' is disabled.
      2016-04-15 16:49:51 140725341619968 [Note] InnoDB: Dumping buffer pool(s) not yet started
      2016-04-15 16:49:51 140726057044096 [Note] Server socket created on IP: '::'.
      2016-04-15 16:49:51 140726057044096 [Note] /usr/sbin/mysqld: ready for connections.
      Version: '10.1.13-MariaDB'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  MariaDB Server
      2016-04-15 16:50:47 140726056315648 [Note] /usr/sbin/mysqld: Normal shutdown
       
      2016-04-15 16:50:47 140726056315648 [Note] Event Scheduler: Purging the queue. 0 events
      2016-04-15 16:50:47 140725324834560 [Note] InnoDB: FTS optimize thread exiting.
      2016-04-15 16:50:47 140726056315648 [Note] InnoDB: Starting shutdown...
      2016-04-15 16:50:49 140726056315648 [Note] InnoDB: Shutdown completed; log sequence number 1718806
      2016-04-15 16:50:49 140726056315648 [Note] /usr/sbin/mysqld: Shutdown complete
      

      Attachments

        Issue Links

          Activity

            I expect that it will turn out to be an intentional change (it's for jplindst to confirm), but does it cause any real problems?

            elenst Elena Stepanova added a comment - I expect that it will turn out to be an intentional change (it's for jplindst to confirm), but does it cause any real problems?

            It caused problems for one particular user who had over 40,000 InnoDB tables. Reading the first page of each .ibd file could take several minutes to complete at times, and this caused the user's startup time to exceed MariaDB's default systemd startup timeout.

            GeoffMontee Geoff Montee (Inactive) added a comment - It caused problems for one particular user who had over 40,000 InnoDB tables. Reading the first page of each .ibd file could take several minutes to complete at times, and this caused the user's startup time to exceed MariaDB's default systemd startup timeout.

            This is expected behavior because encryption information (if present) is on the first page of each .ibd file, thus when table is first accessed this information is read. Currently, this is done on system startup. Has customer all 40,000 tables on the same database ? I will research if reading the first page can be delayed to point when database is changed.

            jplindst Jan Lindström (Inactive) added a comment - This is expected behavior because encryption information (if present) is on the first page of each .ibd file, thus when table is first accessed this information is read. Currently, this is done on system startup. Has customer all 40,000 tables on the same database ? I will research if reading the first page can be delayed to point when database is changed.
            danblack Daniel Black added a comment -

            In the meantime see https://mariadb.com/kb/en/mariadb/systemd/ for increasing the TimeoutStartSec as you may have done already.

            danblack Daniel Black added a comment - In the meantime see https://mariadb.com/kb/en/mariadb/systemd/ for increasing the TimeoutStartSec as you may have done already.

            Has customer all 40,000 tables on the same database ? I will research if reading the first page can be delayed to point when database is changed

            I would be very careful about it, it will make things much worse for instances where, indeed, most tables are accumulated in a few databases.
            Now the lag happens on startup, and once it's become a known nuisance, basically it's up to the DBA to make sure that the startup is initiated in advance. If the lag happens when database is changed, it will affect users' active operation, which is usually a much bigger problem. A user triggered (directly or implicitly, from a front-end or from an application) USE ..., and suddenly everything freezes for minutes.

            Of course, there are also restarts due to crashes and other force majeur, when the DBA cannot make sure that the startup is initiated in advance, but then a recovery would happen, and as it was said before, the reading would happen anyway.

            Maybe a compromise worth considering (I don't know whether it's actually doable, and whether there are pitfalls in it) would be to start without the read, but immediately after that start reading first pages in the background. If the database change happens before first pages for all tables in the database have been read, then alas, the user will have to wait, but chances for that will be much less this way.

            Also, it can be made controllable via an option, somewhat like buffer pool pre-loading and such.

            elenst Elena Stepanova added a comment - Has customer all 40,000 tables on the same database ? I will research if reading the first page can be delayed to point when database is changed I would be very careful about it, it will make things much worse for instances where, indeed, most tables are accumulated in a few databases. Now the lag happens on startup, and once it's become a known nuisance, basically it's up to the DBA to make sure that the startup is initiated in advance. If the lag happens when database is changed, it will affect users' active operation, which is usually a much bigger problem. A user triggered (directly or implicitly, from a front-end or from an application) USE ... , and suddenly everything freezes for minutes. Of course, there are also restarts due to crashes and other force majeur, when the DBA cannot make sure that the startup is initiated in advance, but then a recovery would happen, and as it was said before, the reading would happen anyway. Maybe a compromise worth considering (I don't know whether it's actually doable, and whether there are pitfalls in it) would be to start without the read, but immediately after that start reading first pages in the background. If the database change happens before first pages for all tables in the database have been read, then alas, the user will have to wait, but chances for that will be much less this way. Also, it can be made controllable via an option, somewhat like buffer pool pre-loading and such.

            Hello,

            I am the customer in question.

            #1 - No this is not 1 DB, it is 400 DB with varying table counts.. most under 400 tables. This is semi typical in the shared hosting market.

            #2 - If encryption is causing this problem, perhaps an option to completely disable the encryption engine would resolve the issue? Instead of delaying reads to after start, what about a disabled-encrypted = 1 in the my.cnf to remove this?

            #3 - Obvious here, but your daemon searches for .isl files on start (innodb symlinks?) - Why wouldn't we have another extention to reference/mark a table as encrypted? That way a simple stat of the file determines if you need to read, and a my.cnf config would be not required. Probably a bit too late to be changing the game.

            I realize this situation wasn't tested in development, probably because the team assumes this is not common... but I assure you in the web hosting space it is very common.. luckily, those people usually stick to the OS prebuilt mysql/mariadb, which is still on 5.5 so they are not experiencing the issues.

            IMO #2 solves the problem and is safe.. I understand you can't default it to off at this point, but perhaps having it in

            joe@sprynex.com Joseph Marcelletti added a comment - Hello, I am the customer in question. #1 - No this is not 1 DB, it is 400 DB with varying table counts.. most under 400 tables. This is semi typical in the shared hosting market. #2 - If encryption is causing this problem, perhaps an option to completely disable the encryption engine would resolve the issue? Instead of delaying reads to after start, what about a disabled-encrypted = 1 in the my.cnf to remove this? #3 - Obvious here, but your daemon searches for .isl files on start (innodb symlinks?) - Why wouldn't we have another extention to reference/mark a table as encrypted? That way a simple stat of the file determines if you need to read, and a my.cnf config would be not required. Probably a bit too late to be changing the game. I realize this situation wasn't tested in development, probably because the team assumes this is not common... but I assure you in the web hosting space it is very common.. luckily, those people usually stick to the OS prebuilt mysql/mariadb, which is still on 5.5 so they are not experiencing the issues. IMO #2 solves the problem and is safe.. I understand you can't default it to off at this point, but perhaps having it in

            More complaints from real-life world: MDEV-10105, MDEV-10198.

            elenst Elena Stepanova added a comment - More complaints from real-life world: MDEV-10105 , MDEV-10198 .

            commit 2bedc3978b90bf5abe1029df393c63ced1849bed
            Author: Jan Lindström <jan.lindstrom@mariadb.com>
            Date: Thu Sep 22 16:32:26 2016 +0300

            MDEV-9931: InnoDB reads first page of every .ibd file at startup

            Analysis: By design InnoDB was reading first page of every .ibd file
            at startup to find out is tablespace encrypted or not. This is
            because tablespace could have been encrypted always, not
            encrypted newer or encrypted based on configuration and this
            information can be find realible only from first page of .ibd file.

            Fix: Do not read first page of every .ibd file at startup. Instead
            whenever tablespace is first time accedded we will read the first
            page to find necessary information about tablespace encryption
            status.

            TODO: Add support for SYS_TABLEOPTIONS where all table options
            encryption information included will be stored.

            jplindst Jan Lindström (Inactive) added a comment - commit 2bedc3978b90bf5abe1029df393c63ced1849bed Author: Jan Lindström <jan.lindstrom@mariadb.com> Date: Thu Sep 22 16:32:26 2016 +0300 MDEV-9931 : InnoDB reads first page of every .ibd file at startup Analysis: By design InnoDB was reading first page of every .ibd file at startup to find out is tablespace encrypted or not. This is because tablespace could have been encrypted always, not encrypted newer or encrypted based on configuration and this information can be find realible only from first page of .ibd file. Fix: Do not read first page of every .ibd file at startup. Instead whenever tablespace is first time accedded we will read the first page to find necessary information about tablespace encryption status. TODO: Add support for SYS_TABLEOPTIONS where all table options encryption information included will be stored.

            People

              jplindst Jan Lindström (Inactive)
              GeoffMontee Geoff Montee (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.