[MDEV-15275] innodb_gis.rtree_purge failed in buildbot with timeout Created: 2018-02-11  Updated: 2023-11-28

Status: Open
Project: MariaDB Server
Component/s: GIS, Storage Engine - InnoDB, Tests
Affects Version/s: 10.2, 10.3, 10.4, 10.5, 10.6, 10.7, 10.8, 10.9, 10.10
Fix Version/s: 10.4, 10.5, 10.6

Type: Bug Priority: Major
Reporter: Elena Stepanova Assignee: Marko Mäkelä
Resolution: Unresolved Votes: 0
Labels: hang

Issue Links:
Relates
relates to MDEV-14059 InnoDB assertion failure offset >= ((... Closed
relates to MDEV-15284 innodb_gis.rtree_concurrent_srch fail... Confirmed
relates to MDEV-24257 innodb_gis.rtree_purge failed in bb,... Open
relates to MDEV-29835 Partial server freeze Closed

 Description   

http://buildbot.askmonty.org/buildbot/builders/kvm-zyp-sles12-amd64/builds/4263/steps/mtr/logs/stdio

innodb_gis.rtree_purge '64k,innodb'      w3 [ fail ]  timeout after 900 seconds
        Test ended at 2018-01-27 10:13:22
 
Test case timeout after 900 seconds
 
== /dev/shm/var/3/log/rtree_purge.log == 
SET @saved_frequency = @@GLOBAL.innodb_purge_rseg_truncate_frequency;
SET GLOBAL innodb_purge_rseg_truncate_frequency = 1;
create table t (
b point not null,d point not null, spatial key (d),spatial key (b)
) engine=innodb;
 
 == /dev/shm/var/3/tmp/analyze-timeout-mysqld.1.err ==
mysqltest: Could not open connection 'default' after 500 attempts: 2002 Can't connect to local MySQL server through socket '/dev/shm/var/tmp/3/mysqld.1.sock' (111)
 
 
 - saving '/dev/shm/var/3/log/innodb_gis.rtree_purge-64k,innodb/' to '/dev/shm/var/log/innodb_gis.rtree_purge-64k,innodb/'
 
Retrying test innodb_gis.rtree_purge, attempt(2/3)...
 
worker[3] > Restart  - not started
sys_vars.innodb_buffer_pool_load_abort_basic 'innodb' w2 [ pass ]       
***Warnings generated in error logs during shutdown after running tests: innodb_gis.rtree_purge
 
2018-01-27 10:01:31 140308897511168 [Warning] InnoDB: A long semaphore wait:
2018-01-27 10:01:31 140308897511168 [Warning] InnoDB: A long semaphore wait:
2018-01-27 10:01:31 140308897511168 [Warning] InnoDB: A long semaphore wait:
2018-01-27 10:02:02 140308897511168 [Warning] InnoDB: A long semaphore wait:
2018-01-27 10:02:02 140308897511168 [Warning] InnoDB: A long semaphore wait:
2018-01-27 10:02:02 140308897511168 [Warning] InnoDB: A long semaphore wait:
2018-01-27 10:02:02 140308897511168 [Warning] InnoDB: A long semaphore wait:
2018-01-27 10:02:33 140308897511168 [Warning] InnoDB: A long semaphore wait:
2018-01-27 10:02:33 140308897511168 [Warning] InnoDB: A long semaphore wait:
2018-01-27 10:02:33 140308897511168 [Warning] InnoDB: A long semaphore wait:
2018-01-27 10:02:33 140308897511168 [Warning] InnoDB: A long semaphore wait:
2018-01-27 10:03:04 140308897511168 [Warning] InnoDB: A long semaphore wait:
2018-01-27 10:03:04 140308897511168 [Warning] InnoDB: A long semaphore wait:
2018-01-27 10:03:04 140308897511168 [Warning] InnoDB: A long semaphore wait:
2018-01-27 10:03:04 140308897511168 [Warning] InnoDB: A long semaphore wait:
2018-01-27 10:03:35 140308897511168 [Warning] InnoDB: A long semaphore wait:
2018-01-27 10:03:35 140308897511168 [Warning] InnoDB: A long semaphore wait:
2018-01-27 10:03:35 140308897511168 [Warning] InnoDB: A long semaphore wait:
2018-01-27 10:03:35 140308897511168 [Warning] InnoDB: A long semaphore wait:
2018-01-27 10:04:06 140308897511168 [Warning] InnoDB: A long semaphore wait:
2018-01-27 10:04:06 140308897511168 [Warning] InnoDB: A long semaphore wait:
2018-01-27 10:04:06 140308897511168 [Warning] InnoDB: A long semaphore wait:
2018-01-27 10:04:06 140308897511168 [Warning] InnoDB: A long semaphore wait:
2018-01-27 10:04:37 140308897511168 [Warning] InnoDB: A long semaphore wait:
2018-01-27 10:04:37 140308897511168 [Warning] InnoDB: A long semaphore wait:
2018-01-27 10:04:37 140308897511168 [Warning] InnoDB: A long semaphore wait:
2018-01-27 10:04:37 140308897511168 [Warning] InnoDB: A long semaphore wait:
2018-01-27 10:05:08 140308897511168 [Warning] InnoDB: A long semaphore wait:
2018-01-27 10:05:08 140308897511168 [Warning] InnoDB: A long semaphore wait:
2018-01-27 10:05:08 140308897511168 [Warning] InnoDB: A long semaphore wait:
2018-01-27 10:05:08 140308897511168 [Warning] InnoDB: A long semaphore wait:
2018-01-27 10:05:39 140308897511168 [Warning] InnoDB: A long semaphore wait:
2018-01-27 10:05:39 140308897511168 [Warning] InnoDB: A long semaphore wait:
2018-01-27 10:05:39 140308897511168 [Warning] InnoDB: A long semaphore wait:
2018-01-27 10:05:39 140308897511168 [Warning] InnoDB: A long semaphore wait:
2018-01-27 10:06:10 140308897511168 [Warning] InnoDB: A long semaphore wait:
2018-01-27 10:06:10 140308897511168 [Warning] InnoDB: A long semaphore wait:
2018-01-27 10:06:10 140308897511168 [Warning] InnoDB: A long semaphore wait:
2018-01-27 10:06:10 140308897511168 [Warning] InnoDB: A long semaphore wait:
2018-01-27 10:06:41 140308897511168 [Warning] InnoDB: A long semaphore wait:
2018-01-27 10:06:41 140308897511168 [Warning] InnoDB: A long semaphore wait:
2018-01-27 10:06:41 140308897511168 [Warning] InnoDB: A long semaphore wait:
2018-01-27 10:06:41 140308897511168 [Warning] InnoDB: A long semaphore wait:
2018-01-27 10:07:12 140308897511168 [Warning] InnoDB: A long semaphore wait:
2018-01-27 10:07:12 140308897511168 [Warning] InnoDB: A long semaphore wait:
2018-01-27 10:07:12 140308897511168 [Warning] InnoDB: A long semaphore wait:
2018-01-27 10:07:12 140308897511168 [Warning] InnoDB: A long semaphore wait:
2018-01-27 10:07:43 140308897511168 [Warning] InnoDB: A long semaphore wait:
2018-01-27 10:07:43 140308897511168 [Warning] InnoDB: A long semaphore wait:
2018-01-27 10:07:43 140308897511168 [Warning] InnoDB: A long semaphore wait:
2018-01-27 10:07:43 140308897511168 [Warning] InnoDB: A long semaphore wait:
2018-01-27 10:08:14 140308897511168 [Warning] InnoDB: A long semaphore wait:
2018-01-27 10:08:14 140308897511168 [Warning] InnoDB: A long semaphore wait:
2018-01-27 10:08:14 140308897511168 [Warning] InnoDB: A long semaphore wait:
2018-01-27 10:08:14 140308897511168 [Warning] InnoDB: A long semaphore wait:
2018-01-27 10:08:45 140308897511168 [Warning] InnoDB: A long semaphore wait:
2018-01-27 10:08:45 140308897511168 [Warning] InnoDB: A long semaphore wait:
2018-01-27 10:08:45 140308897511168 [Warning] InnoDB: A long semaphore wait:
2018-01-27 10:08:45 140308897511168 [Warning] InnoDB: A long semaphore wait:
2018-01-27 10:09:16 140308897511168 [Warning] InnoDB: A long semaphore wait:
2018-01-27 10:09:16 140308897511168 [Warning] InnoDB: A long semaphore wait:
2018-01-27 10:09:16 140308897511168 [Warning] InnoDB: A long semaphore wait:
2018-01-27 10:09:16 140308897511168 [Warning] InnoDB: A long semaphore wait:
2018-01-27 10:09:47 140308897511168 [Warning] InnoDB: A long semaphore wait:
2018-01-27 10:09:47 140308897511168 [Warning] InnoDB: A long semaphore wait:
2018-01-27 10:09:47 140308897511168 [Warning] InnoDB: A long semaphore wait:
2018-01-27 10:09:47 140308897511168 [Warning] InnoDB: A long semaphore wait:
2018-01-27 10:10:18 140308897511168 [Warning] InnoDB: A long semaphore wait:
2018-01-27 10:10:18 140308897511168 [Warning] InnoDB: A long semaphore wait:
2018-01-27 10:10:18 140308897511168 [Warning] InnoDB: A long semaphore wait:
2018-01-27 10:10:18 140308897511168 [Warning] InnoDB: A long semaphore wait:
2018-01-27 10:10:49 140308897511168 [Warning] InnoDB: A long semaphore wait:
2018-01-27 10:10:49 140308897511168 [Warning] InnoDB: A long semaphore wait:
2018-01-27 10:10:49 140308897511168 [Warning] InnoDB: A long semaphore wait:
2018-01-27 10:10:49 140308897511168 [Warning] InnoDB: A long semaphore wait:
2018-01-27 10:11:20 140308897511168 [Warning] InnoDB: A long semaphore wait:
2018-01-27 10:11:20 140308897511168 [Warning] InnoDB: A long semaphore wait:
2018-01-27 10:11:20 140308897511168 [Warning] InnoDB: A long semaphore wait:
2018-01-27 10:11:20 140308897511168 [Warning] InnoDB: A long semaphore wait:
2018-01-27 10:11:51 140308897511168 [Warning] InnoDB: A long semaphore wait:
2018-01-27 10:11:51 140308897511168 [Warning] InnoDB: A long semaphore wait:
2018-01-27 10:11:51 140308897511168 [Warning] InnoDB: A long semaphore wait:
2018-01-27 10:11:51 140308897511168 [Warning] InnoDB: A long semaphore wait:
2018-01-27 10:12:22 140308897511168 [Warning] InnoDB: A long semaphore wait:
2018-01-27 10:12:22 140308897511168 [Warning] InnoDB: A long semaphore wait:
2018-01-27 10:12:22 140308897511168 [Warning] InnoDB: A long semaphore wait:
2018-01-27 10:12:22 140308897511168 [Warning] InnoDB: A long semaphore wait:
Attempting backtrace. You can use the following information to find out



 Comments   
Comment by Daniel Black [ 2020-10-03 ]

ref: https://buildbot.mariadb.org/#/builders/33/builds/4843/steps/8/logs/stdio

Comment by Marko Mäkelä [ 2022-10-24 ]

The hang looks like a latching order violation similar to the one reported in MDEV-29835. It used to mostly occur on the 32-bit debug builder kvm-fulltest2. Locally, I can reliably repeat it with an ASAN debug build of 10.6.

It seems that code paths that involve rtr_pcur_open() are not properly holding an exclusive index latch. I diagnosed a deadlock between two threads: one performing an INSERT and another executing rtr_get_father_node() inside btr_compress() as part of a purge of transaction history. The latter thread should either hold an exclusive index latch, or it should never have released the parent page latch in the first place.

Comment by Marko Mäkelä [ 2022-10-24 ]

I debugged this as follows:

./mtr --rr --parallel=auto innodb_gis.rtree_purge
rr replay var/1/log/mysqld.1.rr/latest-trace

I set a breakpoint to the rtr_page_get_father_block() call in btr_compress() and a watchpoint on father_cursor.page_cur.block.page.lock.writer. Then, reverse-continue and backtrace would show where the latch was prematurely released:

#1  0x000055ae4f00926e in mtr_t::release_block_at_savepoint (this=<optimized out>, savepoint=<optimized out>, block=<optimized out>) at /mariadb/10.6/storage/innobase/include/mtr0mtr.inl:180
#2  0x000055ae4f0b0df9 in rtr_pcur_getnext_from_path (tuple=tuple@entry=0x7f6bac0050b0, mode=mode@entry=PAGE_CUR_RTREE_LOCATE, btr_cur=btr_cur@entry=0x7f6bb77ecd40, target_level=target_level@entry=0, 
    latch_mode=latch_mode@entry=522, index_locked=<optimized out>, mtr=<optimized out>) at /mariadb/10.6/storage/innobase/gis/gis0sea.cc:232
#3  0x000055ae4f0a96e8 in rtr_pcur_open (index=index@entry=0x7f6b900ab2a0, tuple=<optimized out>, tuple@entry=0x7f6bac0050b0, latch_mode=latch_mode@entry=522, cursor=cursor@entry=0x7f6bb77ecd40, 
    mtr=mtr@entry=0x7f6bb77ecfe0) at /mariadb/10.6/storage/innobase/gis/gis0sea.cc:593
#4  0x000055ae4f25233e in row_search_index_entry (index=index@entry=0x7f6b900ab2a0, entry=entry@entry=0x7f6bac0050b0, mode=mode@entry=522, pcur=pcur@entry=0x7f6bb77ecd40, mtr=mtr@entry=0x7f6bb77ecfe0)
    at /mariadb/10.6/storage/innobase/row/row0row.cc:1300

There is a confusing comment in rtr_pcur_getnext_from_path():

		/* Once we have pages in "path", these pages are
		predicate page locked, so they can't be shrunk away.
		They also have SSN (split sequence number) to detect
		splits, so we can directly latch single page while
		getting them. They can be unlatched if not qualified.
		One reason for pre-latch is that we might need to position
		some parent position (requires latch) during search */

After this comment, we release all non-leaf page latches, which would seem to be a prerequisite for this hang.

There is a FIXME comment referring to MDEV-14059, noting that for some reason, the SPATIAL INDEX code is making copies of block descriptors.

The dodgy latching and locking rules for SPATIAL INDEX that we adopted from MySQL 5.7 could easily explain intermittent-wrong-result bugs like MDEV-15284.

Generated at Thu Feb 08 08:20:02 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.