[MDEV-9931] InnoDB reads first page of every .ibd file at startup Created: 2016-04-15  Updated: 2020-12-08  Resolved: 2016-09-27

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB, Storage Engine - XtraDB
Affects Version/s: 10.1.13
Fix Version/s: 10.1.18

Type: Bug Priority: Major
Reporter: Geoff Montee (Inactive) Assignee: Jan Lindström (Inactive)
Resolution: Fixed Votes: 0
Labels: innodb

Attachments: File strace.out    
Issue Links:
Blocks
blocks MDEV-10105 Mariadb initialize a shut down just a... Closed
blocks MDEV-10198 MariaDB restart is too slow Closed
Problem/Incident
causes MDEV-12349 Remove useless (encryption) fields fr... Closed
causes MDEV-15705 Remove global status counter Innodb_p... Closed
causes MDEV-20307 Server crashes with thread_stack=1310... Closed
Relates
relates to MDEV-12610 MariaDB start is slow Closed
Sprint: 10.1.15, 10.1.18

 Description   

A user noticed that InnoDB in MariaDB 10.1 reads the first page of every .ibd file at startup, even if the server is not going through crash recovery. I understand that this would be normal behavior when the server is performing crash recovery, as explained here:

https://dev.mysql.com/doc/refman/5.7/en/innodb-recovery-tablespace-discovery.html

However, it seems unusual that this also happens during a normal startup.

Example strace output:

open("./db1/innodb_tab1.isl", O_RDWR)   = -1 ENOENT (No such file or directory)
open("./db1/innodb_tab1.ibd", O_RDONLY) = 14
pread(14, "l\202\2631\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\32\36\217\0\10\0\0\0\0\0\0"..., 16384, 0) = 16384
close(14)                               = 0
open("./db1/innodb_tab2.isl", O_RDWR)   = -1 ENOENT (No such file or directory)
open("./db1/innodb_tab2.ibd", O_RDONLY) = 14
pread(14, "\360\5\25\270\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0321\310\0\10\0\0\0\0\0\0"..., 16384, 0) = 16384
close(14)                               = 0
open("./mysql/gtid_slave_pos.isl", O_RDWR) = -1 ENOENT (No such file or directory)
open("./mysql/gtid_slave_pos.ibd", O_RDONLY) = 14
pread(14, "\327U\344\200\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\30\242\35\0\10\0\0\0\0\0\0"..., 16384, 0) = 16384
close(14)                               = 0
open("./mysql/innodb_index_stats.isl", O_RDWR) = -1 ENOENT (No such file or directory)
open("./mysql/innodb_index_stats.ibd", O_RDONLY) = 14
pread(14, "\3246$\253\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\30\221h\0\10\0\0\0\0\0\0"..., 16384, 0) = 16384
close(14)                               = 0
open("./mysql/innodb_table_stats.isl", O_RDWR) = -1 ENOENT (No such file or directory)
open("./mysql/innodb_table_stats.ibd", O_RDONLY) = 14
pread(14, "v\315D\320\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\30~\253\0\10\0\0\0\0\0\0"..., 16384, 0) = 16384
close(14)

The full strace output is attached as strace.out.

The content of the error log confirms that crash recovery is not taking place:

Transparent huge pages are enabled, according to /sys/kernel/mm/transparent_hugepage/enabled
2016-04-15 16:49:49 140726057044096 [ERROR] TokuDB: Huge pages are enabled, disable them before continuing
 
2016-04-15 16:49:49 140726057044096 [ERROR] ************************************************************
2016-04-15 16:49:49 140726057044096 [ERROR]                                                             
2016-04-15 16:49:49 140726057044096 [ERROR]                         @@@@@@@@@@@                         
2016-04-15 16:49:49 140726057044096 [ERROR]                       @@'         '@@                       
2016-04-15 16:49:49 140726057044096 [ERROR]                      @@    _     _  @@                      
2016-04-15 16:49:49 140726057044096 [ERROR]                      |    (.)   (.)  |                      
2016-04-15 16:49:49 140726057044096 [ERROR]                      |             ` |                      
2016-04-15 16:49:49 140726057044096 [ERROR]                      |        >    ' |                      
2016-04-15 16:49:49 140726057044096 [ERROR]                      |     .----.    |                      
2016-04-15 16:49:49 140726057044096 [ERROR]                      ..   |.----.|  ..                      
2016-04-15 16:49:49 140726057044096 [ERROR]                       ..  '      ' ..                       
2016-04-15 16:49:49 140726057044096 [ERROR]                         .._______,.                         
2016-04-15 16:49:49 140726057044096 [ERROR]                                                             
2016-04-15 16:49:49 140726057044096 [ERROR] TokuDB will not run with transparent huge pages enabled.        
2016-04-15 16:49:49 140726057044096 [ERROR] Please disable them to continue.                            
2016-04-15 16:49:49 140726057044096 [ERROR] (echo never > /sys/kernel/mm/transparent_hugepage/enabled)  
2016-04-15 16:49:49 140726057044096 [ERROR]                                                             
2016-04-15 16:49:49 140726057044096 [ERROR] ************************************************************
2016-04-15 16:49:49 140726057044096 [ERROR] Plugin 'TokuDB' init function returned error.
2016-04-15 16:49:49 140726057044096 [ERROR] Plugin 'TokuDB' registration as a STORAGE ENGINE failed.
2016-04-15 16:49:49 140726057044096 [Note] InnoDB: Using mutexes to ref count buffer pool pages
2016-04-15 16:49:49 140726057044096 [Note] InnoDB: The InnoDB memory heap is disabled
2016-04-15 16:49:49 140726057044096 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2016-04-15 16:49:49 140726057044096 [Note] InnoDB: Memory barrier is not used
2016-04-15 16:49:49 140726057044096 [Note] InnoDB: Compressed tables use zlib 1.2.7
2016-04-15 16:49:49 140726057044096 [Note] InnoDB: Using Linux native AIO
2016-04-15 16:49:49 140726057044096 [Note] InnoDB: Using SSE crc32 instructions
2016-04-15 16:49:49 140726057044096 [Note] InnoDB: Initializing buffer pool, size = 128.0M
2016-04-15 16:49:49 140726057044096 [Note] InnoDB: Completed initialization of buffer pool
2016-04-15 16:49:49 140726057044096 [Note] InnoDB: Highest supported file format is Barracuda.
2016-04-15 16:49:49 140726057044096 [Warning] InnoDB: Resizing redo log from 2*8192 to 2*3072 pages, LSN=1718467
2016-04-15 16:49:49 140726057044096 [Warning] InnoDB: Starting to delete and rewrite log files.
2016-04-15 16:49:49 140726057044096 [Note] InnoDB: Setting log file ./ib_logfile101 size to 48 MB
2016-04-15 16:49:50 140726057044096 [Note] InnoDB: Setting log file ./ib_logfile1 size to 48 MB
2016-04-15 16:49:51 140726057044096 [Note] InnoDB: Renaming log file ./ib_logfile101 to ./ib_logfile0
2016-04-15 16:49:51 140726057044096 [Warning] InnoDB: New log files created, LSN=1718796
2016-04-15 16:49:51 140726057044096 [Note] InnoDB: 128 rollback segment(s) are active.
2016-04-15 16:49:51 140726057044096 [Note] InnoDB: Waiting for purge to start
2016-04-15 16:49:51 140726057044096 [Note] InnoDB:  Percona XtraDB (http://www.percona.com) 5.6.28-76.1 started; log sequence number 1718467
2016-04-15 16:49:51 140726057044096 [Note] Plugin 'FEEDBACK' is disabled.
2016-04-15 16:49:51 140725341619968 [Note] InnoDB: Dumping buffer pool(s) not yet started
2016-04-15 16:49:51 140726057044096 [Note] Server socket created on IP: '::'.
2016-04-15 16:49:51 140726057044096 [Note] /usr/sbin/mysqld: ready for connections.
Version: '10.1.13-MariaDB'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  MariaDB Server
2016-04-15 16:50:47 140726056315648 [Note] /usr/sbin/mysqld: Normal shutdown
 
2016-04-15 16:50:47 140726056315648 [Note] Event Scheduler: Purging the queue. 0 events
2016-04-15 16:50:47 140725324834560 [Note] InnoDB: FTS optimize thread exiting.
2016-04-15 16:50:47 140726056315648 [Note] InnoDB: Starting shutdown...
2016-04-15 16:50:49 140726056315648 [Note] InnoDB: Shutdown completed; log sequence number 1718806
2016-04-15 16:50:49 140726056315648 [Note] /usr/sbin/mysqld: Shutdown complete



 Comments   
Comment by Elena Stepanova [ 2016-04-15 ]

I expect that it will turn out to be an intentional change (it's for jplindst to confirm), but does it cause any real problems?

Comment by Geoff Montee (Inactive) [ 2016-04-15 ]

It caused problems for one particular user who had over 40,000 InnoDB tables. Reading the first page of each .ibd file could take several minutes to complete at times, and this caused the user's startup time to exceed MariaDB's default systemd startup timeout.

Comment by Jan Lindström (Inactive) [ 2016-04-21 ]

This is expected behavior because encryption information (if present) is on the first page of each .ibd file, thus when table is first accessed this information is read. Currently, this is done on system startup. Has customer all 40,000 tables on the same database ? I will research if reading the first page can be delayed to point when database is changed.

Comment by Daniel Black [ 2016-04-21 ]

In the meantime see https://mariadb.com/kb/en/mariadb/systemd/ for increasing the TimeoutStartSec as you may have done already.

Comment by Elena Stepanova [ 2016-04-21 ]

Has customer all 40,000 tables on the same database ? I will research if reading the first page can be delayed to point when database is changed

I would be very careful about it, it will make things much worse for instances where, indeed, most tables are accumulated in a few databases.
Now the lag happens on startup, and once it's become a known nuisance, basically it's up to the DBA to make sure that the startup is initiated in advance. If the lag happens when database is changed, it will affect users' active operation, which is usually a much bigger problem. A user triggered (directly or implicitly, from a front-end or from an application) USE ..., and suddenly everything freezes for minutes.

Of course, there are also restarts due to crashes and other force majeur, when the DBA cannot make sure that the startup is initiated in advance, but then a recovery would happen, and as it was said before, the reading would happen anyway.

Maybe a compromise worth considering (I don't know whether it's actually doable, and whether there are pitfalls in it) would be to start without the read, but immediately after that start reading first pages in the background. If the database change happens before first pages for all tables in the database have been read, then alas, the user will have to wait, but chances for that will be much less this way.

Also, it can be made controllable via an option, somewhat like buffer pool pre-loading and such.

Comment by Joseph Marcelletti [ 2016-04-21 ]

Hello,

I am the customer in question.

#1 - No this is not 1 DB, it is 400 DB with varying table counts.. most under 400 tables. This is semi typical in the shared hosting market.

#2 - If encryption is causing this problem, perhaps an option to completely disable the encryption engine would resolve the issue? Instead of delaying reads to after start, what about a disabled-encrypted = 1 in the my.cnf to remove this?

#3 - Obvious here, but your daemon searches for .isl files on start (innodb symlinks?) - Why wouldn't we have another extention to reference/mark a table as encrypted? That way a simple stat of the file determines if you need to read, and a my.cnf config would be not required. Probably a bit too late to be changing the game.

I realize this situation wasn't tested in development, probably because the team assumes this is not common... but I assure you in the web hosting space it is very common.. luckily, those people usually stick to the OS prebuilt mysql/mariadb, which is still on 5.5 so they are not experiencing the issues.

IMO #2 solves the problem and is safe.. I understand you can't default it to off at this point, but perhaps having it in

Comment by Elena Stepanova [ 2016-06-10 ]

More complaints from real-life world: MDEV-10105, MDEV-10198.

Comment by Jan Lindström (Inactive) [ 2016-09-22 ]

commit 2bedc3978b90bf5abe1029df393c63ced1849bed
Author: Jan Lindström <jan.lindstrom@mariadb.com>
Date: Thu Sep 22 16:32:26 2016 +0300

MDEV-9931: InnoDB reads first page of every .ibd file at startup

Analysis: By design InnoDB was reading first page of every .ibd file
at startup to find out is tablespace encrypted or not. This is
because tablespace could have been encrypted always, not
encrypted newer or encrypted based on configuration and this
information can be find realible only from first page of .ibd file.

Fix: Do not read first page of every .ibd file at startup. Instead
whenever tablespace is first time accedded we will read the first
page to find necessary information about tablespace encryption
status.

TODO: Add support for SYS_TABLEOPTIONS where all table options
encryption information included will be stored.

Generated at Thu Feb 08 07:38:24 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.