[MDEV-9931] InnoDB reads first page of every .ibd file at startup - Jira

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Major
Resolution: Fixed
Affects Version/s: 10.1.13
Fix Version/s: 10.1.18
Component/s: Storage Engine - InnoDB, Storage Engine - XtraDB
Labels:
- innodb

Sprint:
10.1.15, 10.1.18

Description

A user noticed that InnoDB in MariaDB 10.1 reads the first page of every .ibd file at startup, even if the server is not going through crash recovery. I understand that this would be normal behavior when the server is performing crash recovery, as explained here:

https://dev.mysql.com/doc/refman/5.7/en/innodb-recovery-tablespace-discovery.html

However, it seems unusual that this also happens during a normal startup.

Example strace output:

open("./db1/innodb_tab1.isl", O_RDWR)   = -1 ENOENT (No such file or directory)

open("./db1/innodb_tab1.ibd", O_RDONLY) = 14

pread(14, "l\202\2631\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\32\36\217\0\10\0\0\0\0\0\0"..., 16384, 0) = 16384

close(14)                               = 0

open("./db1/innodb_tab2.isl", O_RDWR)   = -1 ENOENT (No such file or directory)

open("./db1/innodb_tab2.ibd", O_RDONLY) = 14

pread(14, "\360\5\25\270\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0321\310\0\10\0\0\0\0\0\0"..., 16384, 0) = 16384

close(14)                               = 0

open("./mysql/gtid_slave_pos.isl", O_RDWR) = -1 ENOENT (No such file or directory)

open("./mysql/gtid_slave_pos.ibd", O_RDONLY) = 14

pread(14, "\327U\344\200\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\30\242\35\0\10\0\0\0\0\0\0"..., 16384, 0) = 16384

close(14)                               = 0

open("./mysql/innodb_index_stats.isl", O_RDWR) = -1 ENOENT (No such file or directory)

open("./mysql/innodb_index_stats.ibd", O_RDONLY) = 14

pread(14, "\3246$\253\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\30\221h\0\10\0\0\0\0\0\0"..., 16384, 0) = 16384

close(14)                               = 0

open("./mysql/innodb_table_stats.isl", O_RDWR) = -1 ENOENT (No such file or directory)

open("./mysql/innodb_table_stats.ibd", O_RDONLY) = 14

pread(14, "v\315D\320\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\30~\253\0\10\0\0\0\0\0\0"..., 16384, 0) = 16384

close(14)

The full strace output is attached as strace.out.

The content of the error log confirms that crash recovery is not taking place:

Transparent huge pages are enabled, according to /sys/kernel/mm/transparent_hugepage/enabled

2016-04-15 16:49:49 140726057044096 [ERROR] TokuDB: Huge pages are enabled, disable them before continuing

2016-04-15 16:49:49 140726057044096 [ERROR] ************************************************************

2016-04-15 16:49:49 140726057044096 [ERROR]

2016-04-15 16:49:49 140726057044096 [ERROR]                         @@@@@@@@@@@

2016-04-15 16:49:49 140726057044096 [ERROR]                       @@'         '@@

2016-04-15 16:49:49 140726057044096 [ERROR]                      @@    _     _  @@

2016-04-15 16:49:49 140726057044096 [ERROR]                      |    (.)   (.)  |

2016-04-15 16:49:49 140726057044096 [ERROR]                      |             ` |

2016-04-15 16:49:49 140726057044096 [ERROR]                      |        >    ' |

2016-04-15 16:49:49 140726057044096 [ERROR]                      |     .----.    |

2016-04-15 16:49:49 140726057044096 [ERROR]                      ..   |.----.|  ..

2016-04-15 16:49:49 140726057044096 [ERROR]                       ..  '      ' ..

2016-04-15 16:49:49 140726057044096 [ERROR]                         .._______,.

2016-04-15 16:49:49 140726057044096 [ERROR]

2016-04-15 16:49:49 140726057044096 [ERROR] TokuDB will not run with transparent huge pages enabled.

2016-04-15 16:49:49 140726057044096 [ERROR] Please disable them to continue.

2016-04-15 16:49:49 140726057044096 [ERROR] (echo never > /sys/kernel/mm/transparent_hugepage/enabled)

2016-04-15 16:49:49 140726057044096 [ERROR]

2016-04-15 16:49:49 140726057044096 [ERROR] ************************************************************

2016-04-15 16:49:49 140726057044096 [ERROR] Plugin 'TokuDB' init function returned error.

2016-04-15 16:49:49 140726057044096 [ERROR] Plugin 'TokuDB' registration as a STORAGE ENGINE failed.

2016-04-15 16:49:49 140726057044096 [Note] InnoDB: Using mutexes to ref count buffer pool pages

2016-04-15 16:49:49 140726057044096 [Note] InnoDB: The InnoDB memory heap is disabled

2016-04-15 16:49:49 140726057044096 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins

2016-04-15 16:49:49 140726057044096 [Note] InnoDB: Memory barrier is not used

2016-04-15 16:49:49 140726057044096 [Note] InnoDB: Compressed tables use zlib 1.2.7

2016-04-15 16:49:49 140726057044096 [Note] InnoDB: Using Linux native AIO

2016-04-15 16:49:49 140726057044096 [Note] InnoDB: Using SSE crc32 instructions

2016-04-15 16:49:49 140726057044096 [Note] InnoDB: Initializing buffer pool, size = 128.0M

2016-04-15 16:49:49 140726057044096 [Note] InnoDB: Completed initialization of buffer pool

2016-04-15 16:49:49 140726057044096 [Note] InnoDB: Highest supported file format is Barracuda.

2016-04-15 16:49:49 140726057044096 [Warning] InnoDB: Resizing redo log from 2*8192 to 2*3072 pages, LSN=1718467

2016-04-15 16:49:49 140726057044096 [Warning] InnoDB: Starting to delete and rewrite log files.

2016-04-15 16:49:49 140726057044096 [Note] InnoDB: Setting log file ./ib_logfile101 size to 48 MB

2016-04-15 16:49:50 140726057044096 [Note] InnoDB: Setting log file ./ib_logfile1 size to 48 MB

2016-04-15 16:49:51 140726057044096 [Note] InnoDB: Renaming log file ./ib_logfile101 to ./ib_logfile0

2016-04-15 16:49:51 140726057044096 [Warning] InnoDB: New log files created, LSN=1718796

2016-04-15 16:49:51 140726057044096 [Note] InnoDB: 128 rollback segment(s) are active.

2016-04-15 16:49:51 140726057044096 [Note] InnoDB: Waiting for purge to start

2016-04-15 16:49:51 140726057044096 [Note] InnoDB:  Percona XtraDB (http://www.percona.com) 5.6.28-76.1 started; log sequence number 1718467

2016-04-15 16:49:51 140726057044096 [Note] Plugin 'FEEDBACK' is disabled.

2016-04-15 16:49:51 140725341619968 [Note] InnoDB: Dumping buffer pool(s) not yet started

2016-04-15 16:49:51 140726057044096 [Note] Server socket created on IP: '::'.

2016-04-15 16:49:51 140726057044096 [Note] /usr/sbin/mysqld: ready for connections.

Version: '10.1.13-MariaDB'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  MariaDB Server

2016-04-15 16:50:47 140726056315648 [Note] /usr/sbin/mysqld: Normal shutdown

2016-04-15 16:50:47 140726056315648 [Note] Event Scheduler: Purging the queue. 0 events

2016-04-15 16:50:47 140725324834560 [Note] InnoDB: FTS optimize thread exiting.

2016-04-15 16:50:47 140726056315648 [Note] InnoDB: Starting shutdown...

2016-04-15 16:50:49 140726056315648 [Note] InnoDB: Shutdown completed; log sequence number 1718806

2016-04-15 16:50:49 140726056315648 [Note] /usr/sbin/mysqld: Shutdown complete

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending
- Thumbnails
- List
- Download All

strace.out
1.06 MB
2016-04-15 21:02

Issue Links

blocks

MDEV-10105 Mariadb initialize a shut down just after starting

Closed

MDEV-10198 MariaDB restart is too slow

Closed

causes

MDEV-12349 Remove useless (encryption) fields from dict_table_t

Closed

MDEV-15705 Remove global status counter Innodb_pages0_read

Closed

MDEV-20307 Server crashes with thread_stack=131072, sometimes without anything in error log

Closed

relates to

MDEV-12610 MariaDB start is slow

Closed

(1 relates to)

Activity

Ascending order - Click to sort in descending order

Elena Stepanova added a comment - 2016-04-15 23:21

I expect that it will turn out to be an intentional change (it's for jplindst to confirm), but does it cause any real problems?

Elena Stepanova added a comment - 2016-04-15 23:21 I expect that it will turn out to be an intentional change (it's for jplindst to confirm), but does it cause any real problems?

Geoff Montee (Inactive) added a comment - 2016-04-15 23:27

It caused problems for one particular user who had over 40,000 InnoDB tables. Reading the first page of each .ibd file could take several minutes to complete at times, and this caused the user's startup time to exceed MariaDB's default systemd startup timeout.

Geoff Montee (Inactive) added a comment - 2016-04-15 23:27 It caused problems for one particular user who had over 40,000 InnoDB tables. Reading the first page of each .ibd file could take several minutes to complete at times, and this caused the user's startup time to exceed MariaDB's default systemd startup timeout.

Jan Lindström (Inactive) added a comment - 2016-04-21 05:08

This is expected behavior because encryption information (if present) is on the first page of each .ibd file, thus when table is first accessed this information is read. Currently, this is done on system startup. Has customer all 40,000 tables on the same database ? I will research if reading the first page can be delayed to point when database is changed.

Jan Lindström (Inactive) added a comment - 2016-04-21 05:08 This is expected behavior because encryption information (if present) is on the first page of each .ibd file, thus when table is first accessed this information is read. Currently, this is done on system startup. Has customer all 40,000 tables on the same database ? I will research if reading the first page can be delayed to point when database is changed.

Daniel Black added a comment - 2016-04-21 06:14

In the meantime see https://mariadb.com/kb/en/mariadb/systemd/ for increasing the TimeoutStartSec as you may have done already.

Daniel Black added a comment - 2016-04-21 06:14 In the meantime see https://mariadb.com/kb/en/mariadb/systemd/ for increasing the TimeoutStartSec as you may have done already.

Elena Stepanova added a comment - 2016-04-21 08:16

Has customer all 40,000 tables on the same database ? I will research if reading the first page can be delayed to point when database is changed

I would be very careful about it, it will make things much worse for instances where, indeed, most tables are accumulated in a few databases.
Now the lag happens on startup, and once it's become a known nuisance, basically it's up to the DBA to make sure that the startup is initiated in advance. If the lag happens when database is changed, it will affect users' active operation, which is usually a much bigger problem. A user triggered (directly or implicitly, from a front-end or from an application) USE ..., and suddenly everything freezes for minutes.

Of course, there are also restarts due to crashes and other force majeur, when the DBA cannot make sure that the startup is initiated in advance, but then a recovery would happen, and as it was said before, the reading would happen anyway.

Maybe a compromise worth considering (I don't know whether it's actually doable, and whether there are pitfalls in it) would be to start without the read, but immediately after that start reading first pages in the background. If the database change happens before first pages for all tables in the database have been read, then alas, the user will have to wait, but chances for that will be much less this way.

Also, it can be made controllable via an option, somewhat like buffer pool pre-loading and such.

Elena Stepanova added a comment - 2016-04-21 08:16 Has customer all 40,000 tables on the same database ? I will research if reading the first page can be delayed to point when database is changed I would be very careful about it, it will make things much worse for instances where, indeed, most tables are accumulated in a few databases. Now the lag happens on startup, and once it's become a known nuisance, basically it's up to the DBA to make sure that the startup is initiated in advance. If the lag happens when database is changed, it will affect users' active operation, which is usually a much bigger problem. A user triggered (directly or implicitly, from a front-end or from an application) USE ... , and suddenly everything freezes for minutes. Of course, there are also restarts due to crashes and other force majeur, when the DBA cannot make sure that the startup is initiated in advance, but then a recovery would happen, and as it was said before, the reading would happen anyway. Maybe a compromise worth considering (I don't know whether it's actually doable, and whether there are pitfalls in it) would be to start without the read, but immediately after that start reading first pages in the background. If the database change happens before first pages for all tables in the database have been read, then alas, the user will have to wait, but chances for that will be much less this way. Also, it can be made controllable via an option, somewhat like buffer pool pre-loading and such.

Joseph Marcelletti added a comment - 2016-04-21 19:53

Hello,

I am the customer in question.

#1 - No this is not 1 DB, it is 400 DB with varying table counts.. most under 400 tables. This is semi typical in the shared hosting market.

#2 - If encryption is causing this problem, perhaps an option to completely disable the encryption engine would resolve the issue? Instead of delaying reads to after start, what about a disabled-encrypted = 1 in the my.cnf to remove this?

#3 - Obvious here, but your daemon searches for .isl files on start (innodb symlinks?) - Why wouldn't we have another extention to reference/mark a table as encrypted? That way a simple stat of the file determines if you need to read, and a my.cnf config would be not required. Probably a bit too late to be changing the game.

I realize this situation wasn't tested in development, probably because the team assumes this is not common... but I assure you in the web hosting space it is very common.. luckily, those people usually stick to the OS prebuilt mysql/mariadb, which is still on 5.5 so they are not experiencing the issues.

IMO #2 solves the problem and is safe.. I understand you can't default it to off at this point, but perhaps having it in

Joseph Marcelletti added a comment - 2016-04-21 19:53 Hello, I am the customer in question. #1 - No this is not 1 DB, it is 400 DB with varying table counts.. most under 400 tables. This is semi typical in the shared hosting market. #2 - If encryption is causing this problem, perhaps an option to completely disable the encryption engine would resolve the issue? Instead of delaying reads to after start, what about a disabled-encrypted = 1 in the my.cnf to remove this? #3 - Obvious here, but your daemon searches for .isl files on start (innodb symlinks?) - Why wouldn't we have another extention to reference/mark a table as encrypted? That way a simple stat of the file determines if you need to read, and a my.cnf config would be not required. Probably a bit too late to be changing the game. I realize this situation wasn't tested in development, probably because the team assumes this is not common... but I assure you in the web hosting space it is very common.. luckily, those people usually stick to the OS prebuilt mysql/mariadb, which is still on 5.5 so they are not experiencing the issues. IMO #2 solves the problem and is safe.. I understand you can't default it to off at this point, but perhaps having it in

Elena Stepanova added a comment - 2016-06-10 13:52

More complaints from real-life world: ~~MDEV-10105~~, ~~MDEV-10198~~.

Elena Stepanova added a comment - 2016-06-10 13:52 More complaints from real-life world: MDEV-10105 , MDEV-10198 .

Jan Lindström (Inactive) added a comment - 2016-09-22 13:39

commit 2bedc3978b90bf5abe1029df393c63ced1849bed
Author: Jan Lindström <jan.lindstrom@mariadb.com>
Date: Thu Sep 22 16:32:26 2016 +0300

~~MDEV-9931~~: InnoDB reads first page of every .ibd file at startup

Analysis: By design InnoDB was reading first page of every .ibd file
at startup to find out is tablespace encrypted or not. This is
because tablespace could have been encrypted always, not
encrypted newer or encrypted based on configuration and this
information can be find realible only from first page of .ibd file.

Fix: Do not read first page of every .ibd file at startup. Instead
whenever tablespace is first time accedded we will read the first
page to find necessary information about tablespace encryption
status.

TODO: Add support for SYS_TABLEOPTIONS where all table options
encryption information included will be stored.

Jan Lindström (Inactive) added a comment - 2016-09-22 13:39 commit 2bedc3978b90bf5abe1029df393c63ced1849bed Author: Jan Lindström <jan.lindstrom@mariadb.com> Date: Thu Sep 22 16:32:26 2016 +0300 MDEV-9931 : InnoDB reads first page of every .ibd file at startup Analysis: By design InnoDB was reading first page of every .ibd file at startup to find out is tablespace encrypted or not. This is because tablespace could have been encrypted always, not encrypted newer or encrypted based on configuration and this information can be find realible only from first page of .ibd file. Fix: Do not read first page of every .ibd file at startup. Instead whenever tablespace is first time accedded we will read the first page to find necessary information about tablespace encryption status. TODO: Add support for SYS_TABLEOPTIONS where all table options encryption information included will be stored.

People

Assignee:: Jan Lindström (Inactive)

Reporter:: Geoff Montee (Inactive)

Votes:: 0 Vote for this issue

Watchers:: 7 Start watching this issue

Dates

Created:: 2016-04-15 21:03

Updated:: 2020-12-08 15:38

Resolved:: 2016-09-27 05:51

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.