[MDEV-26110] mtr failure innodb.innodb_scrub (assertion) aix RelWithDebug build Created: 2021-07-08 Updated: 2022-05-04 Resolved: 2021-07-22 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Storage Engine - InnoDB |
| Affects Version/s: | 10.5, 10.6 |
| Fix Version/s: | 10.5.12, 10.6.4 |
| Type: | Bug | Priority: | Major |
| Reporter: | Daniel Black | Assignee: | Marko Mäkelä |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
AIX |
||
| Description |
|
EGuesnet, can I get a hand with this please? It's constantly failing, and with the standard kind of debugging I get a corrupted stack trace. As it's an assertion, it's not just a test failure. Also reproducible with CMAKE_BUILD_TYPE=Debug.
notes on other aix errors:
|
| Comments |
| Comment by Etienne Guesnet [ 2021-07-08 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
This failure has occurred since version 10.5.9 according to my backlogs. I can reproduce it. I have the following backtrace from the core:
2 threads. No backtrace for the second, only
If I understand the error correctly, it fails at storage/innobase/os/os0file.cc:4141
with OS_FILE_LOG_BLOCK_SIZE = 512U. The test mysql-test/suite/innodb/t/innodb_scrub.test repeats an insert 500 times, then commits and does some further work. I do not understand the reason for the error, as I see in the log debug/mysql-test/var/log/innodb.innodb_scrub-innodb/mysqld.1/mysqld.log:
500 times. The last command is
which seems OK. This is the first time I have investigated this error, and I do not understand where it comes from. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
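For readers unfamiliar with this kind of check, here is a minimal sketch of a 512-byte alignment assertion on an I/O buffer. The constant name mirrors OS_FILE_LOG_BLOCK_SIZE as quoted in the comment above; the helpers `is_block_aligned` and `check_io_request` are hypothetical names for illustration, and this is not the actual code at os0file.cc:4141.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Value quoted in the comment above; the real definition lives in InnoDB.
constexpr std::size_t OS_FILE_LOG_BLOCK_SIZE = 512U;

// Hypothetical helper: true when both the buffer address and the length
// are multiples of the block size, as unbuffered I/O typically requires.
inline bool is_block_aligned(const void *buf, std::size_t len)
{
  const auto addr = reinterpret_cast<std::uintptr_t>(buf);
  return addr % OS_FILE_LOG_BLOCK_SIZE == 0 &&
         len % OS_FILE_LOG_BLOCK_SIZE == 0;
}

// Hypothetical stand-in for an I/O entry point that aborts on misalignment.
inline void check_io_request(const void *buf, std::size_t len)
{
  assert(is_block_aligned(buf, len));
}
```

A check like this fails whenever the buffer address is not a multiple of 512, even if the buffer contents and the SQL statements in the log look fine, which may be why the server log gives no hint.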
| Comment by Daniel Black [ 2021-07-09 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I've been trying to bisect where it came in today, without much luck. A p *cb at the point of assertion may provide some hints. It's the buf arg at the breakpoint fil_space_t::io. Does block_size (global) get a size that is a multiple of 512? | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Daniel Black [ 2021-07-09 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Tips are most welcome as to what I need to get a decent backtrace on AIX. Cmake-3.16 is problematic and the LIBPATH solution is hackish (which is what I've done in buildbot). I'm investigating a cmake upgrade in case this is related. Failing that, I will look at a downgrade. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Etienne Guesnet [ 2021-07-12 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
The same. I proceeded from 10.6.1 to avoid rebuilding everything, and I have the following:
The value 295 in continue was found by manual search. The buffer value seems to be some kind of field_ref_zero (see storage/innobase/buf/buf0buf.cc, function buf_is_zeroes). I have no idea where this bug comes from.
LIBPATH is a known bug. Conflicts between the pthread and non-pthread stdlib are a common problem on AIX. I will speak with our GCC expert (he is currently on vacation). As far as I know, it is not fixed in higher versions. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Daniel Black [ 2021-07-15 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Thanks EGuesnet. Sounds like the cmake version isn't the cause of my lack of symbols in backtraces, so if I'm missing essential build options on https://buildbot.mariadb.org/#/builders/121 I'm keen to know. Thanks for the debug info here. Yes, I'm trying to be careful with these packages (saved by environment paths so far) but will clean them up. Thanks for clearing up my knowledge of AIX terms. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Etienne Guesnet [ 2021-07-16 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Your options seem good. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2021-07-21 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
In the file log0log.h, I remember that we had to replace alignas with MY_ALIGNED because the 128-byte cache line size on s390x would be too large for alignas according to GCC 5. In the GDB output above, I see that the address of field_ref_zero is only aligned to 32 bytes, even though the code requested 1024-byte alignment. (The assertion expects 512-byte alignment.) Would the following work?
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
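The observation that an object is "only aligned to 32 bytes" can be verified from its address alone. The sketch below computes the largest power of two that divides an address; `effective_alignment` is a hypothetical helper name, not an InnoDB function.

```cpp
#include <cstddef>
#include <cstdint>

// Largest power of two that divides the address, i.e. the alignment the
// object actually received, regardless of what the declaration requested.
inline std::size_t effective_alignment(const void *p)
{
  const auto addr = reinterpret_cast<std::uintptr_t>(p);
  if (addr == 0)
    return 0;  // a null pointer has no meaningful alignment
  // addr & -addr isolates the lowest set bit (unsigned arithmetic).
  return static_cast<std::size_t>(addr & (~addr + 1));
}
```

Calling this on the address of a statically allocated buffer and comparing the result with the requested alignment (1024 in the comment above, 512 for the assertion) shows whether the compiler and linker honored the request on a given platform.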
| Comment by Etienne Guesnet [ 2021-07-21 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
The test fails again with this change. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Daniel Black [ 2021-07-21 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Thanks Etienne, I found that too, just didn't update this quickly enough. Marko notes: Come to think of it, that ut_a on alignment can be disabled on any platform that does not support O_DIRECT. If s390x silently ignored the MY_ALIGNED, then scrubbing should be broken there as well. But Linux on s390x would actually support O_DIRECT. But if we now know that MY_ALIGNED does not work, then maybe we should use aligned_malloc for these as well:
I'll look at this tomorrow | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2021-07-22 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I will take over this. I suspect that s390x on Linux may also be affected, and the problems should not be limited to scrubbing. Possibly the difference between alignas and MY_ALIGNED is that the latter may be silently ignored by the compiler when the requested alignment cannot be guaranteed. (On Debian GNU/Linux on s390x, GCC 5 complained about alignas, but MY_ALIGNED was fine. We did not check the actual alignment.) 10.5 might appear to work, because it did not yet default to innodb_flush_method=O_DIRECT ( | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2021-07-22 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Apart from the special 64KiB all-zero buffer field_ref_zero, which should now be 4KiB-aligned, there were a few 512-byte aligned log buffers related to ib_logfile0, which were previously allocated statically or from the stack. Now we use guaranteed aligned heap memory allocation for those as well. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
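As an illustration of moving from static or stack buffers to guaranteed aligned heap memory, here is a minimal sketch. The thread mentions aligned_malloc, whose signature is not shown here, so this sketch substitutes POSIX `posix_memalign`; the helper name `alloc_zeroed_aligned` is hypothetical and this is not the code from the actual fix.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdlib.h>  // posix_memalign, free (POSIX)

// Hypothetical helper: allocate `size` zero-filled bytes whose address is a
// multiple of `alignment`. posix_memalign requires `alignment` to be a power
// of two and a multiple of sizeof(void*). Returns nullptr on failure.
inline unsigned char *alloc_zeroed_aligned(std::size_t alignment,
                                           std::size_t size)
{
  void *p = nullptr;
  if (posix_memalign(&p, alignment, size) != 0)
    return nullptr;
  // Unlike an alignment attribute on a static object, the allocator either
  // honors the request or reports failure.
  assert(reinterpret_cast<std::uintptr_t>(p) % alignment == 0);
  std::memset(p, 0, size);
  return static_cast<unsigned char*>(p);
}
```

For example, `alloc_zeroed_aligned(4096, 65536)` would yield a 64KiB all-zero buffer with 4KiB alignment, matching the sizes mentioned for field_ref_zero; the memory is released with `free()`.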
| Comment by Etienne Guesnet [ 2021-07-26 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Hi @marko, | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Daniel Black [ 2021-07-26 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
10.6.4 is coming out very soon. commit is - https://github.com/MariaDB/server/commit/82d5994520c239da1b6edf1b24e08138ae0c753d 10.5 - test with mtr - | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Etienne Guesnet [ 2021-07-26 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I confirm all is OK with the commit. |