[MDEV-11044] SIGSEGV on server startup Created: 2016-10-13  Updated: 2017-07-05  Resolved: 2017-07-05

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.0.22, 10.0.25, 10.1.14
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Matthias Fehring Assignee: Marko Mäkelä
Resolution: Cannot Reproduce Votes: 0
Labels: None
Environment:

openSUSE Tumbleweed, Linux 4.7.6, glibc 2.24, Akonadi 5.3.1, CPU: AMD FX 6300, RAM 16GB


Attachments: Text File complete-config.txt     Text File gdb.txt     Text File mysql.conf     File mysql.err     File undo_page.bin    

 Description   

When trying to start up the MariaDB server for KDE Akonadi, the server crashes with a segmentation fault.



 Comments   
Comment by Elena Stepanova [ 2016-10-13 ]

Could you please also attach the server error log? Thanks.

Comment by Elena Stepanova [ 2016-10-19 ]

Copying the stack trace from the attached gdb.txt to make it searchable in JIRA:

#0  __memcpy_ssse3 () at ../sysdeps/x86_64/multiarch/memcpy-ssse3.S:2955
#1  0x0000555555ce16f5 in memcpy (__len=<optimized out>, __src=0x7fffe7f6d008, __dest=<optimized out>) at /usr/include/bits/string3.h:53
#2  mem_heap_dup (heap=heap@entry=0x555557a58490, data=data@entry=0x7fffe7f6d008, len=<optimized out>) at /usr/src/debug/mariadb-10.0.22/storage/xtradb/mem/mem0mem.cc:125
#3  0x0000555555d582f0 in trx_undo_rec_copy (heap=0x555557a58490, undo_rec=0x7fffe7f6d008 "") at /usr/src/debug/mariadb-10.0.22/storage/xtradb/include/trx0rec.ic:111
#4  trx_purge_get_next_rec (n_pages_handled=n_pages_handled@entry=0x7fffdcc0dde0, heap=0x555557a58490) at /usr/src/debug/mariadb-10.0.22/storage/xtradb/trx/trx0purge.cc:908
#5  0x0000555555d58de4 in trx_purge_fetch_next_rec (heap=<optimized out>, n_pages_handled=0x7fffdcc0dde0, roll_ptr=0x7fffc8003260) at /usr/src/debug/mariadb-10.0.22/storage/xtradb/trx/trx0purge.cc:959
#6  trx_purge_attach_undo_recs (batch_size=300, limit=0x555557a2bec0, purge_sys=0x555557a2bdb8, n_purge_threads=<optimized out>) at /usr/src/debug/mariadb-10.0.22/storage/xtradb/trx/trx0purge.cc:1037
#7  trx_purge (n_purge_threads=<optimized out>, batch_size=300, truncate=truncate@entry=false) at /usr/src/debug/mariadb-10.0.22/storage/xtradb/trx/trx0purge.cc:1219
#8  0x0000555555d4bc3e in srv_do_purge (n_total_purged=<synthetic pointer>, n_threads=1) at /usr/src/debug/mariadb-10.0.22/storage/xtradb/srv/srv0srv.cc:3281
#9  srv_purge_coordinator_thread (arg=<optimized out>) at /usr/src/debug/mariadb-10.0.22/storage/xtradb/srv/srv0srv.cc:3467
#10 0x00007ffff70f3454 in start_thread (arg=0x7fffdcc0e700) at pthread_create.c:333
#11 0x00007ffff5e9b3ff in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:105

Comment by Jan Lindström (Inactive) [ 2016-10-28 ]

Is it possible that you are running out of memory?

Comment by Matthias Fehring [ 2016-10-29 ]

No, I don't think so. There is enough free memory, and it crashes immediately.

$> free
              total        used        free      shared  buff/cache   available
Mem:       16441664     5260672     7630164       50600     3550828    10792216
Swap:       2096124           0     2096124
 
$> /usr/sbin/mysqld --defaults-file=~/.local/share/akonadi/mysql.conf --datadir=~/.local/share/akonadi/db_data/ --socket=/tmp/akonadi-user.gDf8qg/mysql.socket
161029 13:53:20 [Note] /usr/sbin/mysqld (mysqld 10.0.22-MariaDB) starting as process 32102 ...
Speicherzugriffsfehler (Speicherabzug geschrieben) (German: "Segmentation fault (core dumped)")

Comment by Marko Mäkelä [ 2016-11-23 ]

Could the undo log page be corrupted, causing the "len" parameter to be very big?
If you still have the core dump or can reproduce the problem, I would like to see the entire undo log page from here:

(gdb) frame 3
#3  0x0000555555d582f0 in trx_undo_rec_copy (heap=0x555557a58490, undo_rec=0x7fffe7f6d008 "") at /usr/src/debug/mariadb-10.0.22/storage/xtradb/include/trx0rec.ic:111
(gdb) print srv_page_size
(gdb) dump binary memory undo_page.bin undo_rec&~(srv_page_size-1) undo_rec&~(srv_page_size-1)+srv_page_size
(gdb) dump binary memory undo_page.bin 0x7fffe7f6c000 0x7fffe7f70000

The last statement assumes innodb_page_size=16k (asking to dump 0x4000 bytes from undo_rec-0x1008). Use the last statement only if the previous one does not work.
I would like to see the file undo_page.bin, whose size in bytes should correspond to innodb_page_size (srv_page_size).
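If gdb rejects the bitwise arithmetic on the char pointer ("Argument to arithmetic operation not a number or boolean"), casting the pointer to an integer type first should work; a sketch under the same assumptions, using a gdb convenience variable:

```gdb
(gdb) set $page = (unsigned long) undo_rec & ~((unsigned long) srv_page_size - 1)
(gdb) dump binary memory undo_page.bin $page ($page + srv_page_size)
```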

Comment by Matthias Fehring [ 2016-11-23 ]

(gdb) frame 3
#3  0x0000555555d582f0 in trx_undo_rec_copy (heap=0x555557a4e910, undo_rec=0x7fffe7f6d008 "") at /usr/src/debug/mariadb-10.0.22/storage/xtradb/include/trx0rec.ic:111
111     /usr/src/debug/mariadb-10.0.22/storage/xtradb/include/trx0rec.ic: No such file or directory.
(gdb) print srv_page_size
$1 = 16384
(gdb) dump binary memory undo_page.bin undo_rec&~(srv_page_size-1) undo_rec&~(srv_page_size-1)+srv_page_size
Argument to arithmetic operation not a number or boolean.
(gdb) dump binary memory undo_page.bin 0x7fffe7f6c000 0x7fffe7f70000

Comment by Marko Mäkelä [ 2016-11-28 ]

Thanks for the update. At byte 0x1008 in undo_page.bin there are 16 zero bytes before what looks like the start of an undo log record. Let me try to decode the page dump.
At TRX_UNDO_PAGE_FREE we have the address 0x1378, which looks correct (it is immediately after the last undo log record in the page). At 0x1376 (the last 2 bytes of the last record) we have a pointer to the previous undo log record at 0x132e, whose preceding 2 bytes point backwards to 0x12f0. Going further backwards, we have 0x12b2, 0x1274, 0x1236, 0x1217, 0x11ba, 0x1184.
Between 0x10f6 and 0x1183, all bytes are zero. I would have expected the 2 bytes at 0x1182 to point to the preceding undo log record, not to 0x0000.
Looking a bit around, at 0x1018 we have a forward-pointer to 0x104e, whose preceding 2 bytes correctly point backwards to 0x1018. This forward-chain looks correct: 0x1018, 0x104e, 0x10ca. But then, at 0x10ca, we have 4 zero bytes.
Similarly, from 0xeaf there is a contiguous doubly-linked list of undo log records: 0xeaf, 0xee4, 0xf40, 0xf5e (again, 4 zero bytes).
Similarly, 0xd43, 0xd79, 0xdd6, 0xdf5 (again, 4 zero bytes).
Similarly, 0xc64, 0xc89.
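The chain-following procedure above can be sketched as follows (a hypothetical helper, not server code; it assumes each undo record begins with a 2-byte big-endian offset of the next record within the same page, and that 0x0000 terminates the chain, as in the analysis above):

```python
import struct

def walk_undo_records(page: bytes, start: int) -> list[int]:
    """Follow the forward chain of undo log records within one page dump.

    Hypothetical helper: assumes the first 2 bytes of each record hold
    the big-endian in-page offset of the next record (0 ends the chain).
    """
    offsets = []
    seen = set()
    off = start
    while off and off not in seen and off + 2 <= len(page):
        seen.add(off)  # guard against cycles in a corrupted page
        offsets.append(off)
        off = struct.unpack_from(">H", page, off)[0]
    return offsets
```

Under those assumptions, running this on the attached undo_page.bin with start=0x1018 should reproduce the 0x1018, 0x104e, 0x10ca chain described above.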
I am not that familiar with this part of the InnoDB undo log page. I might need more data from the same core dump to be able to see where the execution went wrong.

(gdb) frame 4
(gdb) info locals
(gdb) p *n_pages_handled

It could be easiest for me to get the data files (at least the InnoDB system tablespace files and the ib_logfile*) to repeat the problem myself. I understand if you do not want to disclose confidential data.
With rarely occurring crashes like this, it is difficult to tell whether we have a software bug (and how to reproduce it in easy steps) or some kind of hardware failure. For example, page checksums only protect data while it resides in files, not while it is in the InnoDB buffer pool (RAM).

Comment by Matthias Fehring [ 2016-12-06 ]

All that happened after a system crash caused by a game (X:Rebirth). As the storage contains information about private and business e-mails, I am not willing, and legally not allowed, to disclose the data.

__memcpy_ssse3 () at ../sysdeps/x86_64/multiarch/memcpy-ssse3.S:2848
2848    ../sysdeps/x86_64/multiarch/memcpy-ssse3.S: No such file or directory.
(gdb) frame 4
#4  trx_purge_get_next_rec (n_pages_handled=n_pages_handled@entry=0x7fffdcc0dde0, heap=0x555557a4e8c0) at /usr/src/debug/mariadb-10.0.22/storage/xtradb/trx/trx0purge.cc:908
908     /usr/src/debug/mariadb-10.0.22/storage/xtradb/trx/trx0purge.cc: No such file or directory.
(gdb) info locals
rec = 0x7fffe7f6d008 ""
rec2 = <optimized out>
offset = 4104
page_no = 633
space = 0
zip_size = 0
mtr = {memo = {heap = 0x0, used = 32, 
    data = "\001\000\000\000\000\000\000\000\200\310\377\342\377\177\000\000\001\000\000\000\000\000\000\000\200\321\377\342\377\177", '\000' <repeats 58 times>, "\t\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000\260\000\000\000\000\000\000\000\002\000\000\000\000\000\000\000\t\000\000\000\062", '\000' <repeats 19 times>, "[\000\000\000n", '\000' <repeats 19 times>, "w\000\000\000|", '\000' <repeats 11 times>, " \000\000\330\377\177\000\000\200", '\000' <repeats 23 times>..., base = {count = 93825018611448, start = 0x68, end = 0x7fffd80266c8}, list = {prev = 0x555555dc3214 <dict_table_stats_lock(dict_table_t*, unsigned long)+596>, next = 0x11}}, log = {heap = 0x0, used = 0, 
    data = "\000\000\000\000\000\000\000\000]\000\000\000n\000\000\000\000\252\220y%\rg\220\000\000\000\000\000\000\000\000\000\t\000\330\377\177\000\000\300\350\244WUU\000\000\000\000\000\000\000\000\000\000\300\066\343VUU\000\000\000\000\000\000\000\000\000\000\350\343\244WUU\000\000_\031\316UUU\000\000\000\000\000\000\000\000\000\000\025", '\000' <repeats 16 times>, "\004", '\000' <repeats 15 times>, "\252\220y%\rg\220h\004\000\000\000\000\000\000\000\252\220y%\rg\220\200p\345VUU\000\000\300\350\244WUU\000\000(\351\244WUU\000\000\300\350\244WUU\000\000\300\066\343VUU\000\000<6\323UUU\000\000"..., base = {count = 0, start = 0x0, end = 0x555557a4e3e8}, list = {prev = 0x555557a22158, 
      next = 0x7ffff5e2e254 <__GI___libc_malloc+84>}}, inside_ibuf = 0, modifications = 0, made_dirty = 0, n_log_recs = 0, n_freed_pages = 0, log_mode = 21, start_lsn = 8104, end_lsn = 10405299918667295232, trx = 0x0}
(gdb) p *n_pages_handled
$1 = 166

Comment by Marko Mäkelä [ 2016-12-20 ]

Thanks, Matthias!

If there was a system crash that was resolved by forcibly switching the power off, it is possible that we are witnessing the results of another part of the hardware/firmware/software stack failing. Maybe fsync() is not working on the hardware as expected, and some writes are being reordered, causing a situation where some of the recovered InnoDB data pages correspond to a wrong logical time (too old or too new LSN).

Admittedly, it is unlikely that a Linux kernel hang triggered by a GPU intensive game would cause the hard disk firmware to hang. Perhaps the system was not completely frozen, and it was just in the middle of performing a write when the power was switched off?

How were the configuration parameters innodb_doublewrite, innodb_flush_method, innodb_checksum_algorithm, innodb_checksums set? Were any non-default InnoDB or XtraDB configuration parameters specified?

I did some more code review. It seems to me that purge should sequentially read the undo logs of each purgeable transaction. The wrong purge_sys->offset (4104 == 0x1008) should actually have been read from the same page, at some earlier address than 0x1008. The next-record pointer is in the first two bytes of the undo log record, and it must not be past the currently marked end of the page; see trx_undo_page_get_next_rec().
I think that the undo log pages are being written in an append-only fashion, so the next-record pointer should be in the immediately preceding record.

It seems that trx_undo_page_get_next_rec() can start reading an undo page from an arbitrary offset (purge_sys->hdr_offset) that we could have read as corrupted from some other page. There are quite a few bytes "0x10 0x08" in the undo log page, and of course it is also possible that purge_sys->hdr_offset was directly initialized as 0x1008 from some other page.
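The remark about stray "0x10 0x08" byte pairs can be checked mechanically against the attached page dump; a minimal sketch (hypothetical helper, not server code):

```python
def find_byte_pairs(page: bytes, pattern: bytes = b"\x10\x08") -> list[int]:
    """Return every offset where the given byte sequence occurs in the page."""
    return [i for i in range(len(page) - len(pattern) + 1)
            if page[i:i + len(pattern)] == pattern]
```

For example, `find_byte_pairs(open("undo_page.bin", "rb").read())` would list every place a corrupted read could have picked up the value 0x1008.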

With the available data, it is not possible for me to debug this further. If you still have the data and are willing to investigate, it would be interesting to see which undo pages were accessed by trx0purge.cc before the corruption was noticed. That could then allow us to determine the cause of the corruption. Because I do not remember seeing anything like this while working on MySQL and InnoDB since 2003, I would tend to believe that the problem lies outside InnoDB. That said, I am not fully familiar with the changes made in XtraDB or MariaDB.

Comment by Matthias Fehring [ 2016-12-20 ]

Hello Marko,

I attached the complete config of the server instance as it is returned by the --verbose --help option. I still have the data available. What further information can I provide?

complete-config.txt

Comment by Marko Mäkelä [ 2017-01-10 ]

Hello Matthias,
Sorry, I missed your update.
I see that the InnoDB doublewrite buffer and the page checksums were enabled. So, crash recovery should have performed as expected. It is theoretically possible that a corrupted page accidentally got a valid-looking checksum, though.

The page checksum of the attached file undo_page.bin (which was extracted from gdb) is valid, which suggests that there was no recent redo log activity for the page, or a redo log checkpoint was written by one of the startup attempts before the code crashed when accessing the seemingly corrupted undo log.
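One cheap consistency check, distinct from the checksum itself, that could be applied to such a dump: InnoDB stores the 8-byte FIL_PAGE_LSN at byte offset 16 of each page and repeats its low 32 bits in the last 4 bytes of the page trailer; a mismatch indicates a torn or inconsistently recovered page (though an all-zero page trivially passes). A sketch, assuming a raw single-page file like undo_page.bin:

```python
def page_lsn_consistent(page: bytes) -> bool:
    """Check that the header FIL_PAGE_LSN (offset 16, 8 bytes, big-endian)
    matches the low 32 bits stored in the last 4 bytes of the trailer."""
    header_lsn = int.from_bytes(page[16:24], "big")
    trailer_lsn_low = int.from_bytes(page[-4:], "big")
    return (header_lsn & 0xFFFFFFFF) == trailer_lsn_low
```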

As far as I can tell, given that you (quite understandably) do not want to share the data files for deeper analysis, the only way to move forward with this is that someone starts up the server on the data files under a debugger, setting breakpoints in InnoDB code to figure out where the corruption originally occurs.

Less likely options are that someone else runs into this bug, or that you are able to create a reduced test case for repeating the problem.

One more thing that you could perhaps try is to check the diagnostics of the storage medium. Does smartctl -a report any serious errors, such as sector reallocation? I am thinking of a possible chain of events like this: The server crashed. On the first crash recovery attempt (applying the redo log), reading the undo log page failed, and the hard drive substituted an empty page (filled with zeros). This would pass the InnoDB page checksum. Then, the redo log records would be applied and the page would be rewritten, except for some ‘holes’ that were not covered by the redo log records.

Also, I wonder if there is some tool like memtest86 which would test the reliability of file systems and the block device. Running memtest86 or similar could also be a good idea, because InnoDB page checksums are only calculated immediately before writing a block back to disk. A low-probability RAM corruption caused by unreliable hardware could go unnoticed for a long time.

Comment by Matthias Fehring [ 2017-06-28 ]

I have been using Thunderbird since this issue happened to my Kontact MariaDB database, and I totally forgot about this issue report.

I checked the SMART values of my storage device and everything is fine with it. I also did an overnight memtest86 run to test my system memory, likewise without any issues.

To be able to use KDE's Kontact again, I deleted the old data (including the corrupted database) and redownloaded the emails from the IMAP server.

So I am sorry that I cannot help to resolve this any further. Maybe it was a weird combination of different circumstances that led to this situation.

I think this can be closed as WONTFIX or something similar.

Thanks to all for your efforts.

Generated at Thu Feb 08 07:46:52 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.