[MDEV-9222] unacceptable slow SQL query Created: 2015-12-01  Updated: 2019-03-15  Resolved: 2019-03-15

Status: Closed
Project: MariaDB Server
Component/s: Optimizer
Affects Version/s: 10.1.9
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Moshe L Assignee: Unassigned
Resolution: Incomplete Votes: 0
Labels: None
Environment:

Linux debian, Windows 2008 R2



 Description   

SELECT 144 as f_id, M_ID, m_video,m_audio,m_album, m_topic, M_Linear, M_NUM, M_TITLE, M_Content, M_User, M_DATE FROM forum_messages WHERE m_visible=1 and m_forum =144 and m_id>8292603 and m_topic=759799 order by m_id asc

Runs fast on 10.0.22.
Runs fast most of the time on 10.1.9.

Roughly 1 in 10,000 executions is unacceptably slow (about 2-5 minutes or more). The slowest state is `sending data`.

table:

CREATE TABLE `forum_messages` (
  `m_forum` smallint(6) NOT NULL DEFAULT '0',
  `m_id` int(11) NOT NULL AUTO_INCREMENT,
  `m_topic` mediumint(9) NOT NULL DEFAULT '0',
  `m_linear` varchar(1200) CHARACTER SET utf8 NOT NULL,
  `m_title` varchar(60) CHARACTER SET utf8mb4 NOT NULL DEFAULT '',
  `m_Content` mediumtext CHARACTER SET utf8mb4 NOT NULL,
  `m_user` mediumint(9) NOT NULL DEFAULT '0',
  `m_num` smallint(6) NOT NULL DEFAULT '0',
  `m_date` datetime NOT NULL DEFAULT '1987-11-13 00:00:00',
  `m_ip` varchar(16) CHARACTER SET utf8 NOT NULL DEFAULT '',
  `m_visible` tinyint(4) NOT NULL DEFAULT '1',
  `m_video` tinyint(4) NOT NULL,
  `m_audio` tinyint(4) NOT NULL,
  `m_album` tinyint(4) NOT NULL,
  `m_anon_user` int(11) NOT NULL DEFAULT '0',
  PRIMARY KEY (`m_id`),
  KEY `topic num` (`m_topic`,`m_num`),
  KEY `forum-id` (`m_forum`,`m_id`),
  KEY `user` (`m_user`,`m_date`,`m_forum`) USING BTREE,
  KEY `Normal Queries` (`m_topic`,`m_linear`(200),`m_visible`),
  KEY `stat` (`m_date`,`m_forum`,`m_user`)
) ENGINE=InnoDB AUTO_INCREMENT=8256287 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=4;

explain:

1	SIMPLE	forum_messages	ref	PRIMARY,topic num,forum-id,Normal Queries	topic num	3	const	48	Using index condition; Using where; Using filesort

Optimizer switch:

index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on

What do I need to upload to help solve this issue? I can see it on three servers with the same data.



 Comments   
Comment by Elena Stepanova [ 2015-12-02 ]

Moshe L,

Is this query plan from when the query is slow, or from when the query is fast? Could you please provide the other explain and specify which is which?
Please also attach your cnf file(s) and the output of

show index in forum_messages; 
show table status like 'forum_messages';

Comment by Moshe L [ 2015-12-02 ]

Hi Elena!

1. The query itself is slow from time to time (1-5 minutes sometimes); other times the same query runs fast.

show index:

forum_messages	0	PRIMARY	1	m_id	A	5555004				BTREE		
forum_messages	1	topic num	1	m_topic	A	2777502				BTREE		
forum_messages	1	topic num	2	m_num	A	5555004				BTREE		
forum_messages	1	forum-id	1	m_forum	A	18				BTREE		
forum_messages	1	forum-id	2	m_id	A	5555004				BTREE		
forum_messages	1	user	1	m_user	A	82910				BTREE		
forum_messages	1	user	2	m_date	A	5555004				BTREE		
forum_messages	1	user	3	m_forum	A	5555004				BTREE		
forum_messages	1	Normal Queries	1	m_topic	A	925834				BTREE		
forum_messages	1	Normal Queries	2	m_linear	A	5555004	200			BTREE		
forum_messages	1	Normal Queries	3	m_visible	A	5555004				BTREE		
forum_messages	1	stat	1	m_date	A	5555004				BTREE		
forum_messages	1	stat	2	m_forum	A	5555004				BTREE		
forum_messages	1	stat	3	m_user	A	5555004				BTREE		

forum_messages	InnoDB	10	Compact	5555004	593	3294625792	0	917520384	29360128	8256287	2015-12-02 09:49:07			utf8mb4_bin		row_format=COMPRESSED key_block_size=4	

The workaround (for now) is to add a new index on (m_topic, m_id), but I think the existing indexes should be enough for this situation, and they used to be.
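That workaround could be sketched as follows (the index name is hypothetical; the thread only says "an index for (m_topic, m_id)"). The column order matters: the equality column (m_topic) first, then m_id, so the range scan also satisfies the ORDER BY:

```sql
-- Hypothetical index name. With m_topic first (equality) and m_id second
-- (range + ORDER BY), the optimizer can read rows already sorted by m_id
-- and avoid the filesort seen in the fast plan.
ALTER TABLE forum_messages ADD KEY `topic_id` (`m_topic`, `m_id`);
```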

Comment by Elena Stepanova [ 2015-12-02 ]

Moshe L,
Thanks.

Yes, I understand it's the same query. However, to cause such a big difference, either the query has to use a different execution plan, or the tables must be locked. We can rule out the second option because you would see lock waits in the processlist.
I assume you are not using persistent statistics, so it can actually happen that the optimizer produces different plans for the same query involving an InnoDB table, and yes, sometimes the plan can be unfortunate. To understand why it happens in this case, we need to see the bad plan.
Since you cannot predict when it happens, I recommend enabling the slow query log and configuring it to record query plans for slow queries. It can be done like this:

set global log_slow_verbosity='query_plan,explain';
set log_slow_verbosity='query_plan,explain';
set global slow_query_log=1;

Optionally, you can also configure long_query_time (in seconds) to define how long a query must run to be considered slow. Since your slow query takes really long, you can set it to, let's say, 60 seconds.
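For example (a sketch; the 60-second threshold is the value suggested above, adjust as needed):

```sql
SET GLOBAL long_query_time = 60;  -- log only queries running longer than 60 seconds
SET long_query_time = 60;         -- also apply it to the current session
```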
When you encounter the problem again, find the query in the slow log and paste a part of the log surrounding the query, or just attach the whole slow log.

Please also attach your cnf file(s) as requested before.

Comment by Elena Stepanova [ 2015-12-02 ]

enabled global_log and long_query_time as you requested. waiting for logs.

I hope you mean slow_query_log. Also make sure you've set log_slow_verbosity as above, otherwise there will be no query plan in the log!

Comment by Moshe L [ 2015-12-03 ]

use a7;
SET timestamp=1449150315;
SELECT 144 as f_id, M_ID, m_video,m_audio,m_album, m_topic, M_Linear, M_NUM, M_TITLE, M_Content, M_User,  M_DATE FROM forum_messages WHERE m_visible=1 and m_forum =144 and  m_id>8298167 and m_topic=760496 order by m_id asc;
# Time: 151203 16:54:18
# User@Host:
# Thread_id: 157491  Schema: a7  QC_hit: No
# Query_time: 120.916178  Lock_time: 0.000072  Rows_sent: 1  Rows_examined: 5893569
# Rows_affected: 0
#
# explain: id	select_type	table	type	possible_keys	key	key_len	ref	rows	r_rows	filtered	r_filtered	Extra
# explain: 1	SIMPLE	forum_messages	index	PRIMARY,topic num,forum-id,Normal Queries	PRIMARY	4	const	3591450	5893569.00	0.00	0.00	Using where
#
SET timestamp=1449154458;
SELECT 144 as f_id, M_ID, m_video,m_audio,m_album, m_topic, M_Linear, M_NUM, M_TITLE, M_Content, M_User,  M_DATE FROM forum_messages WHERE m_visible=1 and m_forum =144 and  m_id>8298283 and m_topic=760535 order by m_id asc;
# Time: 151203 17:01:29
# User@Host: 
# Thread_id: 153499  Schema: a7  QC_hit: No
# Query_time: 139.815013  Lock_time: 0.000077  Rows_sent: 0  Rows_examined: 5893591
# Rows_affected: 0
#
# explain: id	select_type	table	type	possible_keys	key	key_len	ref	rows	r_rows	filtered	r_filtered	Extra
# explain: 1	SIMPLE	forum_messages	index	PRIMARY,topic num,forum-id,Normal Queries	PRIMARY	4	const	3591472	5893591.00	0.00	0.00	Using where
#
SET timestamp=1449154889;
SELECT 144 as f_id, M_ID, m_video,m_audio,m_album, m_topic, M_Linear, M_NUM, M_TITLE, M_Content, M_User,  M_DATE FROM forum_messages WHERE m_visible=1 and m_forum =144 and  m_id>8298216 and m_topic=760279 order by m_id asc;
# Time: 151203 17:28:11
# User@Host:
# Thread_id: 153484  Schema: a7  QC_hit: No
# Query_time: 61.011938  Lock_time: 0.000060  Rows_sent: 0  Rows_examined: 5893643
# Rows_affected: 0
#
# explain: id	select_type	table	type	possible_keys	key	key_len	ref	rows	r_rows	filtered	r_filtered	Extra
# explain: 1	SIMPLE	forum_messages	index	PRIMARY,topic num,forum-id,Normal Queries	PRIMARY	4	const	3591524	5893643.00	0.00	0.00	Using where
#
SET timestamp=1449156491;
SELECT 22 as f_id, M_ID, m_video,m_audio,m_album, m_topic, M_Linear, M_NUM, M_TITLE, M_Content, M_User,  M_DATE FROM forum_messages WHERE m_visible=1 and m_forum =22 and  m_id>8298263 and m_topic=759843 order by m_id asc;
 

Comment by Elena Stepanova [ 2016-01-07 ]

So, as I understand, this is the difference in query plans:

Fast plan

1	SIMPLE	forum_messages	ref	PRIMARY,topic num,forum-id,Normal Queries	topic num	3	const	48	Using index condition; Using where; Using filesort

Slow plan

1	SIMPLE	forum_messages	index	PRIMARY,topic num,forum-id,Normal Queries	PRIMARY	4	const	3591524	5893643.00	0.00	0.00	Using where

Have you ever used engine-independent statistics before? It would be interesting to see whether they help (or, on the contrary, whether they make the slow plan persistent).

If you want to try, you need to do the following:

set use_stat_tables = PREFERABLY;
set global use_stat_tables = PREFERABLY;
analyze table forum_messages;

After that, try to run EXPLAIN for your query, and then the query itself. If it returns a good plan and the query is fast, wait to see whether you still get those delays. The idea of using statistics is that the query should always be executed the same way until the data changes and you run ANALYZE again. Whether that produces a good or a bad plan is another story (and if it produces a bad plan, that needs to be fixed); but it should be deterministic, unlike the current default behavior of InnoDB, which uses random data samples.

Please let us know about the result, or if you cannot do this experiment.

Comment by Moshe L [ 2016-01-08 ]

I enabled stat_tables as per your comments and ran ANALYZE TABLE. It seems to work right now. I will wait a few days to see whether the problem recurs.

I also checked the tables mysql.table_stats, mysql.column_stats and mysql.index_stats, and they are empty. The innodb_* versions of them are updated as of the ANALYZE TABLE time, but for other tables I can see dates from the past (2-5 months ago).
Is this OK?

Thanks

Comment by Elena Stepanova [ 2016-01-08 ]

No, it's not normal.
When you run ANALYZE TABLE after enabling statistics, you should get an extra line in the output saying that the data was collected, like this:

MariaDB [test]> analyze table forum_messages;
+---------------------+---------+----------+-----------------------------------------+
| Table               | Op      | Msg_type | Msg_text                                |
+---------------------+---------+----------+-----------------------------------------+
| test.forum_messages | analyze | status   | Engine-independent statistics collected |
| test.forum_messages | analyze | status   | OK                                      |
+---------------------+---------+----------+-----------------------------------------+
2 rows in set (0.64 sec)

And after that you should have data related to your table in all of mysql.index_stats, mysql.table_stats and mysql.column_stats.
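As a quick sanity check (a sketch, assuming statistics collection succeeded), you could query one of the statistics tables directly:

```sql
-- Should return one row per analyzed table once statistics were collected;
-- an empty result means collection did not happen.
SELECT db_name, table_name, cardinality
FROM mysql.table_stats
WHERE table_name = 'forum_messages';
```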
Are you getting any errors in the error log related to the statistics?

Ironically, it might be that the empty statistical tables, by pure luck, help you trigger the good execution plan; but that is of course not a solution, as it's not reliable.

Comment by Moshe L [ 2016-01-08 ]

Hi !

I am getting "Error 2013: Lost connection to MySQL server during query" when I try to run ANALYZE TABLE.

Received crash:

Jan  8 13:09:09 db1 mysqld: stack_bottom = 0x7fa5306ecdf8 thread_stack 0x80000
Jan  8 13:09:09 db1 mysqld: (my_addr_resolve failure: fork)
Jan  8 13:09:10 db1 mysqld: /usr/sbin/mysqld(my_print_stacktrace+0x2e) [0x7fa839b9fb6e]
Jan  8 13:09:10 db1 mysqld: /usr/sbin/mysqld(handle_fatal_signal+0x34d) [0x7fa8396e4d6d]
Jan  8 13:09:10 db1 mysqld: /lib/x86_64-linux-gnu/libpthread.so.0(+0xf8d0) [0x7fa838d198d0]
Jan  8 13:09:10 db1 mysqld: /usr/sbin/mysqld(update_statistics_for_table(THD*, TABLE*)+0x5fe) [0x7fa8395d5b9e]
Jan  8 13:09:10 db1 mysqld: /usr/sbin/mysqld(+0x4f24aa) [0x7fa83963c4aa]
Jan  8 13:09:10 db1 mysqld: /usr/sbin/mysqld(Sql_cmd_analyze_table::execute(THD*)+0xbc) [0x7fa83963deec]
Jan  8 13:09:10 db1 mysqld: /usr/sbin/mysqld(mysql_execute_command(THD*)+0x14dc) [0x7fa83955ce0c]
Jan  8 13:09:10 db1 mysqld: /usr/sbin/mysqld(mysql_parse(THD*, char*, unsigned int, Parser_state*)+0x20e) [0x7fa839564dbe]
Jan  8 13:09:10 db1 mysqld: /usr/sbin/mysqld(dispatch_command(enum_server_command, THD*, char*, unsigned int)+0x1f4a) [0x7fa839567b3a]
Jan  8 13:09:10 db1 mysqld: /usr/sbin/mysqld(do_command(THD*)+0x16e) [0x7fa83956838e]
Jan  8 13:09:10 db1 mysqld: /usr/sbin/mysqld(do_handle_one_connection(THD*)+0x182) [0x7fa839630b72]
Jan  8 13:09:10 db1 mysqld: /usr/sbin/mysqld(handle_one_connection+0x40) [0x7fa839630d30]
Jan  8 13:09:10 db1 mysqld: /lib/x86_64-linux-gnu/libpthread.so.0(+0x80a4) [0x7fa838d120a4]
Jan  8 13:09:10 db1 mysqld: /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7fa836ebe04d]
Jan  8 13:09:10 db1 mysqld: 
Jan  8 13:09:10 db1 mysqld: Trying to get some variables.
Jan  8 13:09:10 db1 mysqld: Some pointers may be invalid and cause the dump to abort.
Jan  8 13:09:10 db1 mysqld: Query (0x7fa66c313020): is an invalid pointer
Jan  8 13:09:10 db1 mysqld: Connection ID (thread ID): 1979669
Jan  8 13:09:10 db1 mysqld: Status: NOT_KILLED
Jan  8 13:09:10 db1 mysqld: 
Jan  8 13:09:10 db1 mysqld: Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on
Jan  8 13:09:10 db1 mysqld: 

Comment by Elena Stepanova [ 2016-01-08 ]

Are there any errors in the log before the crash?

Could you please run
check table mysql.index_stats, mysql.column_stats, mysql.table_stats
to see if there is anything wrong with the tables?

Comment by Moshe L [ 2016-01-08 ]

mysql.index_stats check status OK
mysql.column_stats check status OK
mysql.table_stats check warning Size of indexfile is: 4096 Should be: 2048
mysql.table_stats check warning Size of datafile is: 28 Should be: 0
mysql.table_stats check error Record-count is not ok; is 1 Should be: 0
mysql.table_stats check warning Found 1 key parts. Should be: 0
mysql.table_stats check error Corrupt

fixed (repair table).

now trying again.

Comment by Moshe L [ 2016-01-08 ]

I am trying to analyze a small table, not the large (~5 GB) one.

Crashed again:

set use_stat_tables = PREFERABLY;
set global use_stat_tables = PREFERABLY;

analyze table forum_rec; -- small table

Comment by Elena Stepanova [ 2016-01-08 ]

Would you be able to provide the data dump from this small table?
Also, when you run repair on the statistical tables, please instead try to run
repair table mysql.index_stats, mysql.column_stats, mysql.table_stats EXTENDED
– in some cases, regular repair table might fail to do the work.

Comment by Moshe L [ 2016-01-08 ]

mysql.index_stats repair status OK
mysql.column_stats repair status OK
mysql.table_stats repair status OK

still crashing (upgraded to 10.1.10).

Comment by Elena Stepanova [ 2016-01-08 ]

So, would you be able to provide the datadump of the table?
You can upload it to ftp.askmonty.org/private, this way only MariaDB developers will have access to it.

Comment by Moshe L [ 2016-01-08 ]

The crashed table from the example?

Dump20160108.sql

But I think the crash happens on every table.

Comment by Elena Stepanova [ 2016-01-08 ]

Indeed, I am not getting the crash on your data, so maybe something is wrong with the statistical tables themselves – something that CHECK/REPAIR does not recognize. It's not unheard of.
So, we are now facing two separate problems: the initial one, slow query, and the new one, crash on updating statistics.
To try to solve the first one, as suggested earlier, you need working statistical tables.
Could you please store your current mysql/index_stats.* , mysql/column_stats.* , mysql/table_stats.* files (9 files total, .frm, .MYI and .MYD for each table), and upload them to the same FTP server?
After that, you can try to truncate or even re-create the tables from scratch, to see if it helps to get rid of the crash. Please let us know about the results.

Comment by Moshe L [ 2016-01-09 ]

uploaded.

Comment by Moshe L [ 2016-01-09 ]

do you need something more from me? uploaded myi/myd files and also the table.

Comment by Elena Stepanova [ 2016-01-12 ]

I've only found 5 files – 3 files that belong to table_stats, and 2 files that belong to index_stats (frm and MYI). There is no index_stats.MYD, and no column_stats.* files. Did you upload them? Maybe something went wrong.

With the provided table_stats files only, I am still not getting the crash. There is still hope that one of the two other tables causes it... If you still have the other 4 files (those with which your instance crashes), please upload them as well.

Did you try to truncate or re-create the tables? Did it help?

Comment by Moshe L [ 2016-01-12 ]

I suspect these files are not the problem.

I tried a dump > restore on a new server, and the second server crashes as well.

Comment by Elena Stepanova [ 2016-01-13 ]

Sorry, I don't quite understand.
Are you saying that ANALYZE TABLE <table_name> crashes for you on any server with any <table_name> as long as use_stat_tables is set to PREFERABLY?

Comment by Moshe L [ 2016-01-13 ]

yes

Comment by Elena Stepanova [ 2016-01-13 ]

Can you try a really simple new table, like

SET use_stat_tables=PREFERABLY;
CREATE TABLE t1 (i int); 
INSERT INTO t1 VALUES (1),(2);
ANALYZE TABLE t1;

Something is obviously different about your servers; we need to find out what it is.

Comment by Moshe L [ 2016-01-13 ]

crashed

Comment by Elena Stepanova [ 2016-01-13 ]

Okay, thanks... Let's step back for a moment.

First, your Environment field says "Linux debian, Windows 2008 R2". Which are we talking about now: Linux, Windows, or both? And if it's Linux, which Debian version exactly, and on which platform?
Secondly, which packages are you using? MariaDB packages, or do you build from source?
And thirdly, you quoted your cnf files earlier; are they still relevant, and are they from the same instances where the crashes occur?

Oh, and please paste or attach the error log from this last crash.

Comment by Moshe L [ 2016-01-13 ]

The two servers in this test are Debian Linux 3.2.0-4-amd64 #1 SMP Debian 3.2.65-1+deb7u2 x86_64.

MariaDB 10.1.9 / 10.1.10 from apt-get. The servers use the same settings.

I also tried on the third one, Windows Server 2008 R2: same problem.

Comment by Moshe L [ 2016-01-13 ]

On 2008 R2: the MSI from mariadb.org, 10.1.9.

Comment by Elena Stepanova [ 2016-01-22 ]

Frankly, I am at a loss at the moment.
So, you started a new server, without any old data, ran the basic test above, and still got the crash?
Obviously, it does not happen to others, otherwise we would be drowning in reports; and our tests don't reveal it either. Something is different about your servers or configuration.
Do you think you could grant us temporary access to one of the servers, one that does not have any confidential data?
Comment by Elena Stepanova [ 2016-02-19 ]

Moshe L, are you still experiencing the issue? If so, could you please consider the request above?

Comment by Moshe L [ 2016-02-19 ]

Hi !

I can give you access to a duplicate of the server, via TeamViewer or similar.

On production, for now, I am using a workaround: adding another key that solves the problem for this specific query, so the code works.

Thanks

Comment by Elena Stepanova [ 2016-02-19 ]

If it's a linux server, it might be easier just to grant us a temporary SSH access. If you want to do so, my public ssh key is here: https://launchpad.net/~elenst/+sshkeys

Comment by Ian Gilfillan [ 2019-03-15 ]

Can this be closed as incomplete/cannot reproduce?

Comment by Elena Stepanova [ 2019-03-15 ]

Closing for now, as investigation apparently stalled at some point.

Generated at Thu Feb 08 07:33:03 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.