[MDEV-23727] mariadb read stuck forever with primary key Created: 2020-09-13  Updated: 2021-02-08  Resolved: 2021-02-08

Status: Closed
Project: MariaDB Server
Component/s: Optimizer, Storage Engine - InnoDB
Affects Version/s: 10.3.11
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Arvinder Singh Assignee: Sergei Petrunia
Resolution: Incomplete Votes: 1
Labels: need_feedback

Attachments: Text File mysqld_full_bt_all_threads_20200914_1.txt     Text File mysqld_full_bt_all_threads_20200914_2.txt     File show_engine_innodb_status_10.3.11_20200914    

 Description   

Our database is unresponsive and we are unable to read anything from a specific table (size: 450 GB, rows: 5 million).
We have a primary key on the first column; even queries using it get stuck for hours and never respond:
SELECT * FROM table_name where id > 1 and message_id < 3 limit 1; #hangs forever
SELECT * FROM table_name limit 1; #hangs forever
(where id is the primary key).

The exact same problem persists on the slave DB, so it is not a storage or hardware issue.

However, selects on other indexed columns do work:
SELECT * FROM table_name where a_id = '000-000-113-00002' limit 1; #returns in 0.03 sec

Mariadb version: 10.3.11-MariaDB-log

Would really appreciate any help with this. Our production app is stuck because of it. Please help.



 Comments   
Comment by Arvinder Singh [ 2020-09-13 ]

[^db_read_stuck] Attached the InnoDB status output.

Comment by Daniel Black [ 2020-09-13 ]

Can you grab two gdb full backtraces, taken about a minute apart, so we can see whether any threads are making progress?
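A capture sketch along those lines (assuming gdb is installed, mysqld runs as a single process, and you have permission to attach; file names are illustrative):

```shell
# Sketch: capture two full backtraces of all mysqld threads, ~1 minute apart.
# Run as root or the mysqld user so gdb may attach to the process.
PID=$(pidof mysqld)
gdb --batch -p "$PID" -ex "thread apply all bt full" > mysqld_full_bt_all_threads_1.txt
sleep 60   # wait ~1 minute so the two snapshots can show progress
gdb --batch -p "$PID" -ex "thread apply all bt full" > mysqld_full_bt_all_threads_2.txt
```

Comparing the two files shows whether the stuck thread's stack changes between snapshots, i.e. whether it is hung or merely slow.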

Comment by Arvinder Singh [ 2020-09-13 ]

[^mysqld_full_bt_all_threads.txt] [^mysqld_full_bt_all_threads_new.txt]
Attached Backtraces For All Threads

Comment by Marko Mäkelä [ 2020-09-14 ]

[^db_read_stuck] mentions a wait in the adaptive hash index (btr0sea.cc), but neither [^mysqld_full_bt_all_threads_new.txt] nor [^mysqld_full_bt_all_threads.txt] show any threads executing in that file. Were the stack traces really produced from the same execution as the status output?

Comment by Arvinder Singh [ 2020-09-14 ]

The stack traces were produced at a different time than the SHOW ENGINE status output.
Sorry, I was busy setting up a new server. I have now captured `engine innodb status` and the stack traces at the same time and attached them (files named *_20200914).
The only query running (and stuck) while doing so was: select * from <table> limit 1;

Thanks for looking into it.

Comment by Arvinder Singh [ 2020-09-17 ]

Our production DB has been stuck for days on one specific query; we would really appreciate help finding the root cause.
Let me know if I need to provide anything else.
Thanks in advance.

Comment by Vincent Milum Jr [ 2020-09-17 ]

Hello MariaDB team!

I'm one of Arvinder's co-workers, also working on this issue, and wanted to bring in some additional information along with some more questions.

I've cloned the database data in question by SCPing the data and log directories to a set of test machines while MariaDB was stopped.

I've confirmed the issue described above is still present on the latest 10.3, 10.4, and 10.5 builds.

On the test machines, my main two tests are SELECT * FROM tbl LIMIT 13 and SELECT * FROM tbl LIMIT 14

LIMIT 13 works on all three versions as expected, returning in a few milliseconds.
LIMIT 14 takes ~50 minutes to return the results on 10.3 and 10.4, and ~40 seconds on 10.5.
Each time one of these queries returned, I re-ran it immediately; the results were consistent across multiple repetitions.

I was also monitoring disk I/O on the host OS during all of this, and noticed that MariaDB was writing ~20-40 MB/sec to disk the entire time it was running on the test system, over a span of several hours, even with zero active connections. I enabled performance_schema on the test machines and saw that virtually all of the writes were going to the same table with the locking issue, despite there being no connections to this test server other than my interactive console.
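One way to attribute write volume like that (a sketch, assuming performance_schema is enabled at startup; the ordering and LIMIT are arbitrary choices) is to query the per-file I/O summary:

```shell
# Sketch: list the files that have received the most write bytes since server start.
# Assumes performance_schema=ON; these are standard P_S summary columns.
mysql -e "SELECT file_name, count_write, sum_number_of_bytes_write
          FROM performance_schema.file_summary_by_instance
          ORDER BY sum_number_of_bytes_write DESC
          LIMIT 10;"
```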

Later I created a second test system and ran TRUNCATE TABLE, and the large write volume continued even though the truncated table was now only ~256 KB on disk. These writes, however, were going to two other files:
1) the shared InnoDB tablespace file ibdata1, which was ~60 GB in size despite file-per-table being enabled
2) mysql.gtid_slave_pos, which at the time was ~1.6 GB in size with only one record

Inside information_schema, the gtid table was reported as using 1.6 GB of data with little to no "free" space, despite containing only a single record.

I let the server continue running in the background and came back a few hours later, at which point the heavy write activity had stopped. The gtid table stats in information_schema now show 1.6 GB for data and 1.6 GB for free space.

So, this is a long and roundabout way of asking whether the InnoDB page cleaner could be the culprit: could it be locking specific pages, causing mutex/spinlock contention that prevents SELECT statements from reading those pages? And if so, is that contention something that was improved in 10.5, which would explain why it returns rows faster there (though still far too slowly)?

Comment by Marko Mäkelä [ 2021-01-04 ]

In both mysqld_full_bt_all_threads_20200914_1.txt and mysqld_full_bt_all_threads_20200914_2.txt, I see only one thread executing something with ha_innobase:: in the stack trace.

In mysqld_full_bt_all_threads_20200914_1.txt, that thread (Thread 6) does not appear to be stuck at all: it is reading a record in an apparent index or table scan.

In mysqld_full_bt_all_threads_20200914_2.txt, it is again Thread 6 that is executing the SQL statement, and that thread is executing a pread system call.

Slow query execution is a different matter from a server hang. Hangs are usually triggered by some highly nondeterministic event. I would need stack traces of all threads, captured while the server genuinely appears to be hung.

As far as I understand, some attachments have been deleted. The no-longer-available attachment db_read_stuck would still be useful for checking whether stack traces of genuine hangs match the original report.

Comment by Marko Mäkelä [ 2021-01-04 ]

darkain, your comment seems to suggest that the server is not actually hung, but that query execution is taking a long time. That would hint that the problem is a bad query plan. Could you provide more information for verifying the bug? I am only a storage engine developer and do not know the upper layer very well, but I believe we would need something like the following:

  1. EXPLAIN SELECT of the query on both the ‘good’ and ‘bad’ system
  2. SHOW CREATE TABLE of the table(s) involved
  3. In 10.4 or later: MDEV-6111 optimizer trace output
  4. Possibly, some data for repeating the observed EXPLAIN SELECT

You can obfuscate the table, column and index names and the data, but ideally we would want valid SQL statements that can be executed to reproduce the observed problem.
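A sketch of gathering those items from the command line (the table name `tbl` and output file names are placeholders, not from this report):

```shell
# Sketch: collect the diagnostics requested above for a placeholder table 'tbl'.
mysql -e "EXPLAIN SELECT * FROM tbl LIMIT 14;" > explain.txt
mysql -e "SHOW CREATE TABLE tbl\G" > show_create.txt
# Optimizer trace (MariaDB 10.4+, MDEV-6111); all three statements run in one
# session because they share a single mysql invocation.
mysql -e "SET optimizer_trace='enabled=on';
          SELECT * FROM tbl LIMIT 14;
          SELECT * FROM information_schema.OPTIMIZER_TRACE\G" > optimizer_trace.txt
```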

Comment by Marko Mäkelä [ 2021-01-04 ]

While the bug database is not supposed to be a forum to obtain free support, I can inform you that the InnoDB system tablespace (which I hope to eliminate some day, see MDEV-11633) by default contains both undo log records and a change buffer for secondary index leaf pages. TRUNCATE TABLE was rewritten in MDEV-13564.

The problem with mysql.gtid_slave_pos is related to replication and should be filed as a separate issue.

I cannot imagine which change in InnoDB could possibly explain the huge difference in query execution time. My only guess would be that there is a huge purge backlog (a large History list length in SHOW ENGINE INNODB STATUS) and you are at the same time using a very small buffer pool. I think that a bad query execution plan (for whatever reason) is a more likely explanation for the observed slowness. But, I am really guessing because I do not have enough data.
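A quick way to check that guess (a sketch; the grep pattern matches the standard line in the STATUS output):

```shell
# Sketch: check for a purge backlog and confirm the buffer pool size.
mysql -e "SHOW ENGINE INNODB STATUS\G" | grep "History list length"
mysql -e "SHOW GLOBAL VARIABLES LIKE 'innodb_buffer_pool_size';"
```

A history list length in the millions together with a buffer pool far smaller than the working set would support the purge-backlog hypothesis.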

Generated at Thu Feb 08 09:24:35 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.