[MDEV-13080] [ERROR] InnoDB: Missing MLOG_CHECKPOINT between the checkpoint x and the end y - Jira

Elena Stepanova added a comment - 2017-06-13 14:59

Can you paste the body of the trigger?
We have a confirmed report about a leak in a trigger which uses a subquery: ~~MDEV-12992~~ . Can yours be similar?

Elena Stepanova added a comment - 2017-06-13 14:59 Can you paste the body of the trigger? We have a confirmed report about a leak in a trigger which uses a subquery: MDEV-12992 . Can yours be similar?

Farid UYAR added a comment - 2017-06-13 16:40

Dont really understand russian so cant comment on the previous report. But yeah on my triggers I'm storing some variables like this :

SET @var = (SELECT IFNULL(total,0) from table_name where ...).

Then using those variables to insert them (simple aggregations). Could it be that those var are never freed from memory ?

Farid UYAR added a comment - 2017-06-13 16:40 Dont really understand russian so cant comment on the previous report. But yeah on my triggers I'm storing some variables like this : SET @var = (SELECT IFNULL(total,0) from table_name where ...). Then using those variables to insert them (simple aggregations). Could it be that those var are never freed from memory ?

Elena Stepanova added a comment - 2017-06-13 16:50

Sorry I forgot to mention that, you didn't need to read the whole report, there were a couple of comments at the end with the scenario and all.

That particular scenario involved a subquery in the trigger, although possibly there were other variations. Or, does your WHERE clause have a subquery?

Elena Stepanova added a comment - 2017-06-13 16:50 Sorry I forgot to mention that, you didn't need to read the whole report, there were a couple of comments at the end with the scenario and all. That particular scenario involved a subquery in the trigger, although possibly there were other variations. Or, does your WHERE clause have a subquery?

Farid UYAR added a comment - 2017-06-14 10:14

Okay I show your comment on the other issue. I dont really have SELECT subqueries like in your example.
Most of my triggers are like this :

IF NEW.var1 = X AND NEW.var2 = Y THEN

     INSERT INTO or DELETE FROM

END IF;

Seems pretty simple to me.

The most complex ones are like this :

IF NEW.var1 = X THEN

   SET @x = (SELECT count(*) FROM other_table WHERE column1 = NEW.xx AND column2 = yy)

   IF @x > 0 THEN

          INSERT INTO agg_table VALUES (NEW.id, @x)

   ELSE

          DELETE FROM agg_table

   END IF;

END IF;

So no subqueries, only @var usage to store count aggregate.

Farid UYAR added a comment - 2017-06-14 10:14 Okay I show your comment on the other issue. I dont really have SELECT subqueries like in your example. Most of my triggers are like this : IF NEW.var1 = X AND NEW.var2 = Y THEN INSERT INTO or DELETE FROM END IF; Seems pretty simple to me. The most complex ones are like this : IF NEW.var1 = X THEN SET @x = (SELECT count(*) FROM other_table WHERE column1 = NEW.xx AND column2 = yy) IF @x > 0 THEN INSERT INTO agg_table VALUES (NEW.id, @x) ELSE DELETE FROM agg_table END IF; END IF; So no subqueries, only @var usage to store count aggregate.

Elena Stepanova added a comment - 2017-06-25 22:30

Are other_table and agg_table here actual tables, or is any of them a view?
And in the first trigger, what does DELETE have in WHERE clause, and what kind of INSERT is it – INSERT .. VALUES or INSERT ... SELECT?

Elena Stepanova added a comment - 2017-06-25 22:30 Are other_table and agg_table here actual tables, or is any of them a view? And in the first trigger, what does DELETE have in WHERE clause, and what kind of INSERT is it – INSERT .. VALUES or INSERT ... SELECT ?

Elena Stepanova added a comment - 2017-07-31 13:13

Fuyar,

Did you have a chance to upgrade to 10.2.7 and if you did, did it resolve the problem?

Elena Stepanova added a comment - 2017-07-31 13:13 Fuyar , Did you have a chance to upgrade to 10.2.7 and if you did, did it resolve the problem?

Andrew Ramsay added a comment - 2017-07-31 15:06

I nearly logged a new fault, but the missing checkpoint message in the log says the problem is the same, even if the cause is not.
I believe 10.2.7 also has [the] problem, although possibly not to the same degree:

2017-07-29 11:58:59 4208 [Note] InnoDB: Mutexes and rw_locks use Windows interlocked functions
2017-07-29 11:58:59 4208 [Note] InnoDB: Uses event mutexes
2017-07-29 11:58:59 4208 [Note] InnoDB: Compressed tables use zlib 1.2.3
2017-07-29 11:58:59 4208 [Note] InnoDB: Number of pools: 1
2017-07-29 11:58:59 4208 [Note] InnoDB: Using generic crc32 instructions
2017-07-29 11:58:59 4208 [Note] InnoDB: Initializing buffer pool, total size = 16G, instances = 8, chunk size = 128M
2017-07-29 11:59:00 4208 [Note] InnoDB: Completed initialization of buffer pool
2017-07-29 11:59:00 4208 [Note] InnoDB: Highest supported file format is Barracuda.
2017-07-29 11:59:00 4208 [Note] InnoDB: Starting crash recovery from checkpoint LSN=1070287781678
2017-07-29 11:59:00 4208 [ERROR] InnoDB: Missing MLOG_CHECKPOINT at 1070287781678 between the checkpoint 1070287781678 and the end 1070287807736.
2017-07-29 11:59:00 4208 [ERROR] InnoDB: Plugin initialization aborted with error Generic error
2017-07-29 11:59:00 4208 [Note] InnoDB: Starting shutdown...
2017-07-29 11:59:01 4208 [ERROR] Plugin 'InnoDB' init function returned error.
2017-07-29 11:59:01 4208 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
2017-07-29 11:59:01 4208 [Note] Plugin 'FEEDBACK' is disabled.
2017-07-29 11:59:01 4208 [ERROR] Unknown/unsupported storage engine: InnoDB
2017-07-29 11:59:01 4208 [ERROR] Aborting

happened for the first time two days ago, although seems more stable than 10.2.6. This happened after I had taken the server up and down a few times in fairly quick succession due to rebooting win7x64 after multiple software upgrades (this could be a red herring though!). server refused to start until innodb_force_recovery=6.

Andrew Ramsay added a comment - 2017-07-31 15:06 I nearly logged a new fault, but the missing checkpoint message in the log says the problem is the same, even if the cause is not. I believe 10.2.7 also has [the] problem, although possibly not to the same degree: 2017-07-29 11:58:59 4208 [Note] InnoDB: Mutexes and rw_locks use Windows interlocked functions 2017-07-29 11:58:59 4208 [Note] InnoDB: Uses event mutexes 2017-07-29 11:58:59 4208 [Note] InnoDB: Compressed tables use zlib 1.2.3 2017-07-29 11:58:59 4208 [Note] InnoDB: Number of pools: 1 2017-07-29 11:58:59 4208 [Note] InnoDB: Using generic crc32 instructions 2017-07-29 11:58:59 4208 [Note] InnoDB: Initializing buffer pool, total size = 16G, instances = 8, chunk size = 128M 2017-07-29 11:59:00 4208 [Note] InnoDB: Completed initialization of buffer pool 2017-07-29 11:59:00 4208 [Note] InnoDB: Highest supported file format is Barracuda. 2017-07-29 11:59:00 4208 [Note] InnoDB: Starting crash recovery from checkpoint LSN=1070287781678 2017-07-29 11:59:00 4208 [ERROR] InnoDB: Missing MLOG_CHECKPOINT at 1070287781678 between the checkpoint 1070287781678 and the end 1070287807736. 2017-07-29 11:59:00 4208 [ERROR] InnoDB: Plugin initialization aborted with error Generic error 2017-07-29 11:59:00 4208 [Note] InnoDB: Starting shutdown... 2017-07-29 11:59:01 4208 [ERROR] Plugin 'InnoDB' init function returned error. 2017-07-29 11:59:01 4208 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed. 2017-07-29 11:59:01 4208 [Note] Plugin 'FEEDBACK' is disabled. 2017-07-29 11:59:01 4208 [ERROR] Unknown/unsupported storage engine: InnoDB 2017-07-29 11:59:01 4208 [ERROR] Aborting happened for the first time two days ago, although seems more stable than 10.2.6. This happened after I had taken the server up and down a few times in fairly quick succession due to rebooting win7x64 after multiple software upgrades (this could be a red herring though!). server refused to start until innodb_force_recovery=6.

Elena Stepanova added a comment - 2017-07-31 16:45

It appears that there are three problems (either related to each other or not) reported in this issue:

there's a big memory leak

the server regularly hangs

innodb crash recovery could not start because MLOG_CHECKPOINT was not found

amramsay, as I understand, you have had the third kind. Have you experienced the other two?

Elena Stepanova added a comment - 2017-07-31 16:45 It appears that there are three problems (either related to each other or not) reported in this issue: there's a big memory leak the server regularly hangs innodb crash recovery could not start because MLOG_CHECKPOINT was not found amramsay , as I understand, you have had the third kind. Have you experienced the other two?

Andrew Ramsay added a comment - 2017-07-31 17:36

1. To my knowledge I have not experienced the memory leak, I have been watching memory usage quite closely, during my current processing - it has remained static at 18.5GB.
2.I am not aware of the server hanging, although on occasion it has run a bit slow, but I put that down to the amount of data being processed/imported - I have been moving fairly substantial amounts of data (for me anyway) over the last few weeks (~50GB binlogs per 24hour period) - repeated db imports, all has run smoothly.
3. as you said - yes, the checkpoint issue, once.

Andrew Ramsay added a comment - 2017-07-31 17:36 1. To my knowledge I have not experienced the memory leak, I have been watching memory usage quite closely, during my current processing - it has remained static at 18.5GB. 2.I am not aware of the server hanging, although on occasion it has run a bit slow, but I put that down to the amount of data being processed/imported - I have been moving fairly substantial amounts of data (for me anyway) over the last few weeks (~50GB binlogs per 24hour period) - repeated db imports, all has run smoothly. 3. as you said - yes, the checkpoint issue, once.

Elena Stepanova added a comment - 2017-07-31 20:00

amramsay, thanks.

In this case, let's treat this issue as being about the missing MLOG_CHECKPOINT (as the subject suggests, anyway), and consider other parts of it possibly irrelevant.

jplindst, any thoughts about the checkpoint problem?

Elena Stepanova added a comment - 2017-07-31 20:00 amramsay , thanks. In this case, let's treat this issue as being about the missing MLOG_CHECKPOINT (as the subject suggests, anyway), and consider other parts of it possibly irrelevant. jplindst , any thoughts about the checkpoint problem?

Andrew Ramsay added a comment - 2017-08-24 09:13

For information - I note that MLOG_CHECKPOINT issue is still present in 10.2.8. I installed 10.2.8 on 18-Aug, crash occurred on 20-Aug, minimal db activity between upgrade and crash recovery:

– restart after crash, but no encryption key present –
2017-08-20 20:43:38 6472 [Note] InnoDB: Mutexes and rw_locks use Windows interlocked functions
2017-08-20 20:43:38 6472 [Note] InnoDB: Uses event mutexes
2017-08-20 20:43:38 6472 [Note] InnoDB: Compressed tables use zlib 1.2.3
2017-08-20 20:43:38 6472 [Note] InnoDB: Number of pools: 1
2017-08-20 20:43:38 6472 [Note] InnoDB: Using generic crc32 instructions
2017-08-20 20:43:38 6472 [Note] InnoDB: Initializing buffer pool, total size = 16G, instances = 8, chunk size = 128M
2017-08-20 20:43:39 6472 [Note] InnoDB: Completed initialization of buffer pool
2017-08-20 20:43:39 6472 [Note] InnoDB: Highest supported file format is Barracuda.
2017-08-20 20:43:39 6472 [Note] InnoDB: Starting crash recovery from checkpoint LSN=656734850618
2017-08-20 20:43:39 6472 [ERROR] InnoDB: Incorrect log record type:90
2017-08-20 20:43:39 6472 [Warning] InnoDB: Log scan aborted at LSN 656734850560
2017-08-20 20:43:39 6472 [ERROR] InnoDB: Plugin initialization aborted with error Generic error
2017-08-20 20:43:39 6472 [Note] InnoDB: Starting shutdown...
2017-08-20 20:43:40 6472 [ERROR] Plugin 'InnoDB' init function returned error.
2017-08-20 20:43:40 6472 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
2017-08-20 20:43:40 6472 [Note] Plugin 'FEEDBACK' is disabled.
2017-08-20 20:43:40 6472 [ERROR] Unknown/unsupported storage engine: InnoDB
2017-08-20 20:43:40 6472 [ERROR] Aborting

– restart, now with encryption key present –
2017-08-20 21:15:29 4800 [Note] InnoDB: Mutexes and rw_locks use Windows interlocked functions
2017-08-20 21:15:29 4800 [Note] InnoDB: Uses event mutexes
2017-08-20 21:15:29 4800 [Note] InnoDB: Compressed tables use zlib 1.2.3
2017-08-20 21:15:29 4800 [Note] InnoDB: Number of pools: 1
2017-08-20 21:15:29 4800 [Note] InnoDB: Using generic crc32 instructions
2017-08-20 21:15:29 4800 [Note] InnoDB: Initializing buffer pool, total size = 16G, instances = 8, chunk size = 128M
2017-08-20 21:15:30 4800 [Note] InnoDB: Completed initialization of buffer pool
2017-08-20 21:15:30 4800 [Note] InnoDB: Highest supported file format is Barracuda.
2017-08-20 21:15:30 4800 [Note] InnoDB: Starting crash recovery from checkpoint LSN=656734850618
2017-08-20 21:15:30 4800 [ERROR] InnoDB: Incorrect log record type:90
2017-08-20 21:15:30 4800 [ERROR] InnoDB: Missing MLOG_CHECKPOINT at 656734850618 between the checkpoint 656734850618 and the end 656734861665.
2017-08-20 21:15:30 4800 [ERROR] InnoDB: Plugin initialization aborted with error Generic error
2017-08-20 21:15:30 4800 [Note] InnoDB: Starting shutdown...
2017-08-20 21:15:31 4800 [ERROR] Plugin 'InnoDB' init function returned error.
2017-08-20 21:15:31 4800 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
2017-08-20 21:15:31 4800 [Note] Plugin 'FEEDBACK' is disabled.
2017-08-20 21:15:31 4800 [ERROR] Unknown/unsupported storage engine: InnoDB
2017-08-20 21:15:31 4800 [ERROR] Aborting

...I had to go to recovery level 6 to get database online for dumping.
I noted that the log message was slightly different on the first pass, presumably because data could not be decrypted for interpretation, due to lack of key (my fault)?

Andrew Ramsay added a comment - 2017-08-24 09:13 For information - I note that MLOG_CHECKPOINT issue is still present in 10.2.8. I installed 10.2.8 on 18-Aug, crash occurred on 20-Aug, minimal db activity between upgrade and crash recovery: – restart after crash, but no encryption key present – 2017-08-20 20:43:38 6472 [Note] InnoDB: Mutexes and rw_locks use Windows interlocked functions 2017-08-20 20:43:38 6472 [Note] InnoDB: Uses event mutexes 2017-08-20 20:43:38 6472 [Note] InnoDB: Compressed tables use zlib 1.2.3 2017-08-20 20:43:38 6472 [Note] InnoDB: Number of pools: 1 2017-08-20 20:43:38 6472 [Note] InnoDB: Using generic crc32 instructions 2017-08-20 20:43:38 6472 [Note] InnoDB: Initializing buffer pool, total size = 16G, instances = 8, chunk size = 128M 2017-08-20 20:43:39 6472 [Note] InnoDB: Completed initialization of buffer pool 2017-08-20 20:43:39 6472 [Note] InnoDB: Highest supported file format is Barracuda. 2017-08-20 20:43:39 6472 [Note] InnoDB: Starting crash recovery from checkpoint LSN=656734850618 2017-08-20 20:43:39 6472 [ERROR] InnoDB: Incorrect log record type:90 2017-08-20 20:43:39 6472 [Warning] InnoDB: Log scan aborted at LSN 656734850560 2017-08-20 20:43:39 6472 [ERROR] InnoDB: Plugin initialization aborted with error Generic error 2017-08-20 20:43:39 6472 [Note] InnoDB: Starting shutdown... 2017-08-20 20:43:40 6472 [ERROR] Plugin 'InnoDB' init function returned error. 2017-08-20 20:43:40 6472 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed. 2017-08-20 20:43:40 6472 [Note] Plugin 'FEEDBACK' is disabled. 2017-08-20 20:43:40 6472 [ERROR] Unknown/unsupported storage engine: InnoDB 2017-08-20 20:43:40 6472 [ERROR] Aborting – restart, now with encryption key present – 2017-08-20 21:15:29 4800 [Note] InnoDB: Mutexes and rw_locks use Windows interlocked functions 2017-08-20 21:15:29 4800 [Note] InnoDB: Uses event mutexes 2017-08-20 21:15:29 4800 [Note] InnoDB: Compressed tables use zlib 1.2.3 2017-08-20 21:15:29 4800 [Note] InnoDB: Number of pools: 1 2017-08-20 21:15:29 4800 [Note] InnoDB: Using generic crc32 instructions 2017-08-20 21:15:29 4800 [Note] InnoDB: Initializing buffer pool, total size = 16G, instances = 8, chunk size = 128M 2017-08-20 21:15:30 4800 [Note] InnoDB: Completed initialization of buffer pool 2017-08-20 21:15:30 4800 [Note] InnoDB: Highest supported file format is Barracuda. 2017-08-20 21:15:30 4800 [Note] InnoDB: Starting crash recovery from checkpoint LSN=656734850618 2017-08-20 21:15:30 4800 [ERROR] InnoDB: Incorrect log record type:90 2017-08-20 21:15:30 4800 [ERROR] InnoDB: Missing MLOG_CHECKPOINT at 656734850618 between the checkpoint 656734850618 and the end 656734861665. 2017-08-20 21:15:30 4800 [ERROR] InnoDB: Plugin initialization aborted with error Generic error 2017-08-20 21:15:30 4800 [Note] InnoDB: Starting shutdown... 2017-08-20 21:15:31 4800 [ERROR] Plugin 'InnoDB' init function returned error. 2017-08-20 21:15:31 4800 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed. 2017-08-20 21:15:31 4800 [Note] Plugin 'FEEDBACK' is disabled. 2017-08-20 21:15:31 4800 [ERROR] Unknown/unsupported storage engine: InnoDB 2017-08-20 21:15:31 4800 [ERROR] Aborting ...I had to go to recovery level 6 to get database online for dumping. I noted that the log message was slightly different on the first pass, presumably because data could not be decrypted for interpretation, due to lack of key (my fault)?

Elena Stepanova added a comment - 2018-10-01 18:52

http://buildbot.askmonty.org/buildbot/builders/kvm-fulltest2/builds/14336

innodb.undo_truncate_recover '16k,2,innodb' w1 [ fail ]  Found warnings/errors in server log file!

        Test ended at 2018-09-21 04:45:22

line

2018-09-21  4:45:20 0 [ERROR] InnoDB: Missing MLOG_CHECKPOINT at 24653138 between the checkpoint 23916606 and the end 24653138.

2018-09-21  4:45:20 0 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[2012] with error Generic error

2018-09-21  4:45:21 0 [ERROR] Plugin 'InnoDB' init function returned error.

2018-09-21  4:45:21 0 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.

^ Found warnings in /mnt/buildbot/build/mariadb-10.3.10/mysql-test/var/1/log/mysqld.1.err

ok

Elena Stepanova added a comment - 2018-10-01 18:52 http://buildbot.askmonty.org/buildbot/builders/kvm-fulltest2/builds/14336 innodb.undo_truncate_recover '16k,2,innodb' w1 [ fail ] Found warnings/errors in server log file! Test ended at 2018-09-21 04:45:22 line 2018-09-21 4:45:20 0 [ERROR] InnoDB: Missing MLOG_CHECKPOINT at 24653138 between the checkpoint 23916606 and the end 24653138. 2018-09-21 4:45:20 0 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[2012] with error Generic error 2018-09-21 4:45:21 0 [ERROR] Plugin 'InnoDB' init function returned error. 2018-09-21 4:45:21 0 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed. ^ Found warnings in /mnt/buildbot/build/mariadb-10.3.10/mysql-test/var/1/log/mysqld.1.err ok

Daniel Black added a comment - 2018-10-16 22:56 - edited

 docker logs a80641d42796

MySQL init password for existing database...

MySQL init process in progress...

2018-10-15 13:12:54 70366838301360 [Note] mysqld (mysqld 10.2.17-MariaDB-1:10.2.17+maria~bionic) starting as process 43 ...

2018-10-15 13:12:54 70366838301360 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins

2018-10-15 13:12:54 70366838301360 [Note] InnoDB: Uses event mutexes

2018-10-15 13:12:54 70366838301360 [Note] InnoDB: Compressed tables use zlib 1.2.11

2018-10-15 13:12:54 70366838301360 [Note] InnoDB: Using Linux native AIO

2018-10-15 13:12:54 70366838301360 [Note] InnoDB: Number of pools: 1

2018-10-15 13:12:54 70366838301360 [Note] InnoDB: Using POWER8 crc32 instructions

2018-10-15 13:12:54 70366838301360 [Note] InnoDB: Initializing buffer pool, total size = 256M, instances = 1, chunk size = 128M

2018-10-15 13:12:54 70366838301360 [Note] InnoDB: Completed initialization of buffer pool

2018-10-15 13:12:54 70366127255728 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().

2018-10-15 13:12:54 70366838301360 [Note] InnoDB: Highest supported file format is Barracuda.

2018-10-15 13:12:54 70366838301360 [Note] InnoDB: Starting crash recovery from checkpoint LSN=4101625

2018-10-15 13:12:54 70366838301360 [ERROR] InnoDB: Missing MLOG_CHECKPOINT at 4101625 between the checkpoint 4101305 and the end 4101632.

2018-10-15 13:12:54 70366838301360 [ERROR] InnoDB: Plugin initialization aborted with error Generic error

2018-10-15 13:12:55 70366838301360 [Note] InnoDB: Starting shutdown...

2018-10-15 13:12:55 70366838301360 [ERROR] Plugin 'InnoDB' init function returned error.

2018-10-15 13:12:55 70366838301360 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.

2018-10-15 13:12:55 70366838301360 [Note] Plugin 'FEEDBACK' is disabled.

2018-10-15 13:12:55 70366838301360 [ERROR] Unknown/unsupported storage engine: InnoDB

2018-10-15 13:12:55 70366838301360 [ERROR] Aborting

update:

uploaded files to ftp.mariadb.org:

~~MDEV-13080~~-docker-export-ppc64le.tar.xz - the result of the docker export of the container. Will be exceptionally close to the docker mariadb official mariadb:10.2.18-bionic - basic configuration.
~~MDEV-13080~~-datadir.tar.xz - datadir (unrecovered)

Daniel Black added a comment - 2018-10-16 22:56 - edited docker logs a80641d42796 MySQL init password for existing database... MySQL init process in progress... 2018-10-15 13:12:54 70366838301360 [Note] mysqld (mysqld 10.2.17-MariaDB-1:10.2.17+maria~bionic) starting as process 43 ... 2018-10-15 13:12:54 70366838301360 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins 2018-10-15 13:12:54 70366838301360 [Note] InnoDB: Uses event mutexes 2018-10-15 13:12:54 70366838301360 [Note] InnoDB: Compressed tables use zlib 1.2.11 2018-10-15 13:12:54 70366838301360 [Note] InnoDB: Using Linux native AIO 2018-10-15 13:12:54 70366838301360 [Note] InnoDB: Number of pools: 1 2018-10-15 13:12:54 70366838301360 [Note] InnoDB: Using POWER8 crc32 instructions 2018-10-15 13:12:54 70366838301360 [Note] InnoDB: Initializing buffer pool, total size = 256M, instances = 1, chunk size = 128M 2018-10-15 13:12:54 70366838301360 [Note] InnoDB: Completed initialization of buffer pool 2018-10-15 13:12:54 70366127255728 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority(). 2018-10-15 13:12:54 70366838301360 [Note] InnoDB: Highest supported file format is Barracuda. 2018-10-15 13:12:54 70366838301360 [Note] InnoDB: Starting crash recovery from checkpoint LSN=4101625 2018-10-15 13:12:54 70366838301360 [ERROR] InnoDB: Missing MLOG_CHECKPOINT at 4101625 between the checkpoint 4101305 and the end 4101632. 2018-10-15 13:12:54 70366838301360 [ERROR] InnoDB: Plugin initialization aborted with error Generic error 2018-10-15 13:12:55 70366838301360 [Note] InnoDB: Starting shutdown... 2018-10-15 13:12:55 70366838301360 [ERROR] Plugin 'InnoDB' init function returned error. 2018-10-15 13:12:55 70366838301360 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed. 2018-10-15 13:12:55 70366838301360 [Note] Plugin 'FEEDBACK' is disabled. 2018-10-15 13:12:55 70366838301360 [ERROR] Unknown/unsupported storage engine: InnoDB 2018-10-15 13:12:55 70366838301360 [ERROR] Aborting update: uploaded files to ftp.mariadb.org: MDEV-13080 -docker-export-ppc64le.tar.xz - the result of the docker export of the container. Will be exceptionally close to the docker mariadb official mariadb:10.2.18-bionic - basic configuration. MDEV-13080 -datadir.tar.xz - datadir (unrecovered)

Marko Mäkelä added a comment - 2018-11-20 13:51

~~MDEV-13830~~ and ~~MDEV-15282~~ could share the root cause with this.

Marko Mäkelä added a comment - 2018-11-20 13:51 MDEV-13830 and MDEV-15282 could share the root cause with this.

Thirunarayanan Balathandayuthapani added a comment - 2018-11-21 11:38

MLOG_CHECKPOINT is present in the redo log partially. MLOG_CHECKPOINT record is 9 bytes. 1 byte for MLOG_CHECKPOINT marker and 8 bytes for CHECKPOINT LSN.

MLOG_CHECKPOINT record is present partially (only 3 bytes) at the end of log block and next block is empty.

Scenario could be that checkpoint info is written before MLOG_CHECKPOINT record is written in the next block and server could be killed
before writing the block.

Yet to write mtr test case for it and read the code too

Thirunarayanan Balathandayuthapani added a comment - 2018-11-21 11:38 MLOG_CHECKPOINT is present in the redo log partially. MLOG_CHECKPOINT record is 9 bytes. 1 byte for MLOG_CHECKPOINT marker and 8 bytes for CHECKPOINT LSN. MLOG_CHECKPOINT record is present partially (only 3 bytes) at the end of log block and next block is empty. Scenario could be that checkpoint info is written before MLOG_CHECKPOINT record is written in the next block and server could be killed before writing the block. Yet to write mtr test case for it and read the code too

Marko Mäkelä added a comment - 2018-11-27 13:27

I tried to review the code, and I did not find anything obviously wrong, except potential race conditions in log_write_up_to(), for example here:

	if (flush_to_disk

	    && (log_sys->n_pending_flushes > 0

		|| !os_event_is_set(log_sys->flush_event))) {

		/* Figure out if the current flush will do the job

		for us. */

		bool work_done = log_sys->current_flush_lsn >= lsn;

		log_write_mutex_exit();

		os_event_wait(log_sys->flush_event);

		if (work_done) {

			return;

How can we be sure that the desired work was actually completed here? Could there have been multiple pending writes, and could the event have been signaled for an earlier write, for something older than log_sys->current_flush_lsn? Perhaps we should check log_sys->flushed_to_disk_lsn before returning.

I tried to reproduce the failure, but in the end I only reproduced (and subsequently developed a fix for) ~~MDEV-17849~~.

Marko Mäkelä added a comment - 2018-11-27 13:27 I tried to review the code, and I did not find anything obviously wrong, except potential race conditions in log_write_up_to() , for example here: if (flush_to_disk && (log_sys->n_pending_flushes > 0 || !os_event_is_set(log_sys->flush_event))) { /* Figure out if the current flush will do the job for us. */ bool work_done = log_sys->current_flush_lsn >= lsn; log_write_mutex_exit(); os_event_wait(log_sys->flush_event); if (work_done) { return ; How can we be sure that the desired work was actually completed here? Could there have been multiple pending writes, and could the event have been signaled for an earlier write, for something older than log_sys->current_flush_lsn ? Perhaps we should check log_sys->flushed_to_disk_lsn before returning. I tried to reproduce the failure, but in the end I only reproduced (and subsequently developed a fix for) MDEV-17849 .

Eugene Kosov (Inactive) added a comment - 2018-12-21 17:10

To reproduce this bug I use mtr -mem -force -max-test-fail=9999 -suite=innodb -par=5 innodb.undo_truncate_recover{,,,} -repeat=100 --. Sometimes also https://jira.mariadb.org/browse/MDEV-18038 happens.

Both failures are rare.

log_buffer_extend() is called in this test. When I disable it with --innodb-log-buffer-size=100000000 I see no errors at all. But I'm unsure because it's not easy to reproduce. Anyway this is my current guess that it's data races during redo log extending.

Eugene Kosov (Inactive) added a comment - 2018-12-21 17:10 To reproduce this bug I use mtr -mem -force -max-test-fail=9999 -suite=innodb -par=5 innodb.undo_truncate_recover{,,,} -repeat=100 -- . Sometimes also https://jira.mariadb.org/browse/MDEV-18038 happens. Both failures are rare. log_buffer_extend() is called in this test. When I disable it with --innodb-log-buffer-size=100000000 I see no errors at all. But I'm unsure because it's not easy to reproduce. Anyway this is my current guess that it's data races during redo log extending.

Eugene Kosov (Inactive) added a comment - 2018-12-28 18:58

I put write_always = true; at the very beginning of log_checkpoint(). This makes checkpoints flushed more frequent. And both failures again disappear. Maybe the last checkpoint sometimes is not committed and this is the source of problems?

Eugene Kosov (Inactive) added a comment - 2018-12-28 18:58 I put write_always = true; at the very beginning of log_checkpoint() . This makes checkpoints flushed more frequent. And both failures again disappear. Maybe the last checkpoint sometimes is not committed and this is the source of problems?

Marko Mäkelä added a comment - 2019-03-08 13:20

kevg, that is interesting. Could it be that we sometimes update last_checkpoint_lsn before the corresponding write to the checkpoint header becomes persistent? Could it be that fil_aio_wait() is invoked on the redo log pseudo-tablespace for something else than the latest pending checkpoint write? After all, we do allow multiple log_checkpoint() executions in parallel, interleaved with each other.

Could you add an assertion to log_complete_checkpoint() or its call path, to verify that the completed write was for the expected checkpoint?

Note: The redo log is not being extended while the server is running. Only the redo log write buffer is extended during this in 10.2 and 10.3. In 10.4, less redo log is being written thanks to the ~~MDEV-17138~~ MLOG_MEMSET record.

Marko Mäkelä added a comment - 2019-03-08 13:20 kevg , that is interesting. Could it be that we sometimes update last_checkpoint_lsn before the corresponding write to the checkpoint header becomes persistent? Could it be that fil_aio_wait() is invoked on the redo log pseudo-tablespace for something else than the latest pending checkpoint write? After all, we do allow multiple log_checkpoint() executions in parallel, interleaved with each other. Could you add an assertion to log_complete_checkpoint() or its call path, to verify that the completed write was for the expected checkpoint? Note: The redo log is not being extended while the server is running. Only the redo log write buffer is extended during this in 10.2 and 10.3. In 10.4, less redo log is being written thanks to the MDEV-17138 MLOG_MEMSET record.

Eugene Kosov (Inactive) added a comment - 2019-03-26 14:56

My assumption about `write_always = true` was incorrect at least for ~~MDEV-18038~~ which sometimes happen.

Eugene Kosov (Inactive) added a comment - 2019-03-26 14:56 My assumption about `write_always = true` was incorrect at least for MDEV-18038 which sometimes happen.

Marko Mäkelä added a comment - 2019-04-23 13:10 - edited

I implemented debug instrumentation that should make bugs in crash recovery and mariabackup --prepare more repeatable.

Usage:

./mtr --mysqld=--debug=d,ib_log_checkpoint_avoid …

Marko Mäkelä added a comment - 2019-04-23 13:10 - edited I implemented debug instrumentation that should make bugs in crash recovery and mariabackup --prepare more repeatable. Usage: ./mtr --mysqld=--debug=d,ib_log_checkpoint_avoid …

Eugene Kosov (Inactive) added a comment - 2019-05-13 07:33 - edited

The bug is inside log_buffer_extend(). ~~MDEV-18038~~ too.

This patch makes log_buffer_extend() run on every mtr.commit().

diff --git a/storage/innobase/include/log0log.h b/storage/innobase/include/log0log.h

index 16399edad50..dd549baddce 100644

--- a/storage/innobase/include/log0log.h

+++ b/storage/innobase/include/log0log.h

@@ -81,7 +81,7 @@ log_free_check(void);

 /** Extends the log buffer.

 @param[in]     len     requested minimum size in bytes */

-void log_buffer_extend(ulong len);

+void log_buffer_extend();

 /** Check margin not to overwrite transaction log from the last checkpoint.

 If would estimate the log write to exceed the log_group_capacity,

diff --git a/storage/innobase/log/log0log.cc b/storage/innobase/log/log0log.cc

index 409b42c452a..4ffa906f86b 100644

--- a/storage/innobase/log/log0log.cc

+++ b/storage/innobase/log/log0log.cc

@@ -159,8 +159,10 @@ log_buf_pool_get_oldest_modification(void)

 /** Extends the log buffer.

 @param[in]     len     requested minimum size in bytes */

-void log_buffer_extend(ulong len)

+void log_buffer_extend()

+       size_t len = srv_log_buffer_size + 1;

        byte    tmp_buf[OS_FILE_LOG_BLOCK_SIZE];

        log_mutex_enter_all();

diff --git a/storage/innobase/mtr/mtr0mtr.cc b/storage/innobase/mtr/mtr0mtr.cc

index 92b1aa38a81..d4950ffba05 100644

--- a/storage/innobase/mtr/mtr0mtr.cc

+++ b/storage/innobase/mtr/mtr0mtr.cc

@@ -764,9 +764,7 @@ mtr_t::Command::prepare_write()

        ut_ad(len > 0);

        ut_ad(n_recs > 0);

-       if (len > srv_log_buffer_size / 2) {

-               log_buffer_extend(ulong((len + 1) * 2));

-       }

+       log_buffer_extend();

        ut_ad(m_impl->m_n_log_recs == n_recs);

And with that patch I always see ~~MDEV-13080~~ or ~~MDEV-18038~~ in 10.3 with this simple test:

create or replace table t (a int) engine=innodb;

insert into t values (1);

drop table t;

Eugene Kosov (Inactive) added a comment - 2019-05-13 07:33 - edited The bug is inside log_buffer_extend() . MDEV-18038 too. This patch makes log_buffer_extend() run on every mtr.commit() . diff --git a/storage/innobase/include/log0log.h b/storage/innobase/include/log0log.h index 16399edad50..dd549baddce 100644 --- a/storage/innobase/include/log0log.h +++ b/storage/innobase/include/log0log.h @@ -81,7 +81,7 @@ log_free_check(void); /** Extends the log buffer. @param[in] len requested minimum size in bytes */ -void log_buffer_extend(ulong len); +void log_buffer_extend(); /** Check margin not to overwrite transaction log from the last checkpoint. If would estimate the log write to exceed the log_group_capacity, diff --git a/storage/innobase/log/log0log.cc b/storage/innobase/log/log0log.cc index 409b42c452a..4ffa906f86b 100644 --- a/storage/innobase/log/log0log.cc +++ b/storage/innobase/log/log0log.cc @@ -159,8 +159,10 @@ log_buf_pool_get_oldest_modification(void) /** Extends the log buffer. @param[in] len requested minimum size in bytes */ -void log_buffer_extend(ulong len) +void log_buffer_extend() { + size_t len = srv_log_buffer_size + 1; + byte tmp_buf[OS_FILE_LOG_BLOCK_SIZE]; log_mutex_enter_all(); diff --git a/storage/innobase/mtr/mtr0mtr.cc b/storage/innobase/mtr/mtr0mtr.cc index 92b1aa38a81..d4950ffba05 100644 --- a/storage/innobase/mtr/mtr0mtr.cc +++ b/storage/innobase/mtr/mtr0mtr.cc @@ -764,9 +764,7 @@ mtr_t::Command::prepare_write() ut_ad(len > 0); ut_ad(n_recs > 0); - if (len > srv_log_buffer_size / 2) { - log_buffer_extend(ulong((len + 1) * 2)); - } + log_buffer_extend(); ut_ad(m_impl->m_n_log_recs == n_recs); And with that patch I always see MDEV-13080 or MDEV-18038 in 10.3 with this simple test: create or replace table t (a int ) engine=innodb; insert into t values (1); drop table t;

Marko Mäkelä added a comment - 2019-05-15 12:34

We will keep the log_buffer_extend() but remove the log file write from it.

Marko Mäkelä added a comment - 2019-05-15 12:34 We will keep the log_buffer_extend() but remove the log file write from it.

MariaDB Server

[ERROR] InnoDB: Missing MLOG_CHECKPOINT between the checkpoint x and the end y

Details

Description

Attachments

Issue Links

Activity

People

Dates

Git Integration