[MDEV-7256] SEGV in JOIN::exec() during normal shutdown Created: 2014-12-03  Updated: 2015-01-28  Due: 2014-12-17  Resolved: 2014-12-04

Status: Closed
Project: MariaDB Server
Component/s: Optimizer
Affects Version/s: None
Fix Version/s: 5.5.36

Type: Bug Priority: Major
Reporter: Daniel Black Assignee: Unassigned
Resolution: Fixed Votes: 0
Labels: None
Environment:

centos 5
uname -a
Linux xxx 2.6.18-308.24.1.el5 #1 SMP Tue Dec 4 17:43:34 EST 2012 x86_64 x86_64 x86_64 GNU/Linux
mysql Ver 15.1 Distrib 5.5.29-MariaDB, for Linux (x86_64) using readline 5.1


Issue Links:
Relates
relates to MDEV-714 LP:1020645 - crash (sig 11) with unio... Closed

 Description   

141203 11:46:38 [Note] /usr/sbin/mysqld: Normal shutdown
 
141203 11:46:45 [Note] Slave I/O thread exiting, read up to log 'mysql-bin.046421', position 104858337
141203 11:46:46 [ERROR] mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see http://kb.askmonty.org/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
 
Server version: 5.5.29-MariaDB-log
key_buffer_size=262144
read_buffer_size=8388608
max_used_connections=201
max_threads=202
thread_count=21206
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1761298 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x0x2aac9e836cc0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x2aaec74040b0 thread_stack 0x48000
(my_addr_resolve failure: fork)
/usr/sbin/mysqld(my_print_stacktrace+0x2e) [0xa975ee]
/usr/sbin/mysqld(handle_fatal_signal+0x44c) [0x6d4fcc]
/lib64/libpthread.so.0 [0x30ed80ebe0]
/usr/sbin/mysqld [0x5b6d88]
/usr/sbin/mysqld(JOIN::exec()+0x888) [0x5ce198]
/usr/sbin/mysqld(st_select_lex_unit::exec()+0x1d2) [0x601e92]
/usr/sbin/mysqld(mysql_union(THD*, LEX*, select_result*, st_select_lex_unit*, unsigned long)+0x2d) [0x60467d]
/usr/sbin/mysqld(handle_select(THD*, LEX*, select_result*, unsigned long)+0x8a) [0x5d12fa]
/usr/sbin/mysqld [0x57cc40]
/usr/sbin/mysqld(mysql_execute_command(THD*)+0x793) [0x57f4d3]
/usr/sbin/mysqld(mysql_parse(THD*, char*, unsigned int, Parser_state*)+0x1b1) [0x585f61]
/usr/sbin/mysqld(dispatch_command(enum_server_command, THD*, char*, unsigned int)+0x153a) [0x5874aa]
/usr/sbin/mysqld(do_command(THD*)+0xd2) [0x587842]
/usr/sbin/mysqld(do_handle_one_connection(THD*)+0x143) [0x63ddb3]
/usr/sbin/mysqld(handle_one_connection+0x4c) [0x63defc]
/lib64/libpthread.so.0 [0x30ed80677d]
/lib64/libc.so.6(clone+0x6d) [0x30ed0d3c1d]
 
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x358bf338): SELECT ...
FROM locations loc
LEFT OUTER JOIN fc
ON fc.sLocType = loc.sType
AND fc.sLocCode = loc.sCode
AND fc.sCustomCode = 'TWC'
AND fc.sPeriod1FcastShort IS NOT NULL
WHERE loc.sType = 'DIST'
AND loc.sCode = 'S00'
....
UNION
...
Connection ID (thread ID): 82256764
Status: KILL_SERVER
  
Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=on,mrr_cost_based=off,mrr_sort_keys=on,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=off
 
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
141203 11:47:06 mysqld_safe Number of processes running now: 0
141203 11:47:06 mysqld_safe mysqld restarted  
141203 11:47:06 [Warning] The syntax '--log-slow-queries' is deprecated and will be removed in a future release. Please use '--slow-query-log'/'--slow-query-log-file' instead. 
141203 11:47:06 [Warning] An old style --language or -lc-message-dir value with language specific part detected: /usr/share/mysql/english/
141203 11:47:06 [Warning] Use --lc-messages-dir without language specific part instead.
141203 11:47:06 InnoDB: The InnoDB memory heap is disabled
141203 11:47:06 InnoDB: Mutexes and rw_locks use GCC atomic builtins
141203 11:47:06 InnoDB: Compressed tables use zlib 1.2.3
141203 11:47:06 InnoDB: Using Linux native AIO
141203 11:47:06 InnoDB: Error: Linux Native AIO is not supported on tmpdir.
InnoDB: You can either move tmpdir to a file system that supports native AIO
InnoDB: or you can set innodb_use_native_aio to FALSE to avoid this message.
141203 11:47:06 InnoDB: Error: Linux Native AIO check on tmpdir returned error[22]
141203 11:47:06 InnoDB: Warning: Linux Native AIO disabled.
141203 11:47:06 InnoDB: Initializing buffer pool, size = 6.0G
141203 11:47:07 InnoDB: Completed initialization of buffer pool
141203 11:47:07 InnoDB: highest supported file format is Barracuda.
InnoDB: The log sequence number in ibdata files does not match
InnoDB: the log sequence number in the ib_logfiles!
141203 11:47:07  InnoDB: Database was not shut down normally!
InnoDB: Starting crash recovery.
InnoDB: Reading tablespace information from the .ibd files...
InnoDB: Restoring possible half-written data pages from the doublewrite
InnoDB: buffer...
InnoDB: In a MySQL replication slave the last master binlog file
InnoDB: position 18251098, file name mysql-bin.046419
InnoDB: and relay log file
InnoDB: position 18251382, file name ./mysqld-relay-bin.062653
InnoDB: Last MySQL binlog file position 0 15525751, file name /database/tlog/mysql-bin.070307
141203 11:47:10  InnoDB: Waiting for the background threads to start
141203 11:47:11 Percona XtraDB (http://www.percona.com) 1.1.8-29.3 started; log sequence number 4858350400666
141203 11:47:11 [Note] Plugin 'FEEDBACK' is disabled.
141203 11:47:11 [Note] Recovering after a crash using /database/tlog/mysql-bin
141203 11:47:11 [Note] Starting crash recovery...
141203 11:47:11 [Note] Crash recovery finished.  

This occurred after a replication stopped on all slaves due to error

Last_SQL_Error: Error 'Incorrect key file for table 'l_sample'; 
 try to repair it' on query. Default database: ''. 
 Query: 'CREATE UNIQUE INDEX `idx_l_sample_1` ON l_sample (`source`,`timestamp_utc`,`latitude`,`longitude`)'

Most slaves responded after

mysql> stop slave;
mysql> set global sql_slave_skip_counter = 1;
mysql> start slave;

Except for the one server.

"it didn't respond to the STOP SLAVE command. That is, the MySQL prompt didn't reappear. This was replicated in a second SSH session, ie I could connect and run SHOW SLAVE STATUS but after running STOP SLAVE it didn't respond.

Also, the load average was ridiculously high. The index appears to have been created in some slaves (in particular twcddbl01, aptwcpdb011 and aptwcpdb021), but not on others (aptwcpdb051)."

After that attempted restart of this production server which generated the backtrace above

# /etc/init.d/mysql restart
Shutting down MySQL............The server quit without updating PID file (/var/lib/mysql/mysqld.pid).[FAILED]
Failed to stop running server, so refusing to try to start.[FAILED]

| l_sample | CREATE TABLE `l_sample` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `source` varchar(10) NOT NULL,
  `timestamp_utc` datetime NOT NULL,
  `latitude` decimal(10,7) NOT NULL,
  `longitude` decimal(10,7) NOT NULL,
  `amperage` decimal(5,1) DEFAULT NULL,
  `event_type` varchar(10) DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `idx_l_sample_1` (`source`,`timestamp_utc`,`latitude`,`longitude`)
) ENGINE=InnoDB AUTO_INCREMENT=24011281 DEFAULT CHARSET=latin1 ROW_FORMAT=DYNAMIC |

The l_sample wasn't part of the UNION - it was a new table of relatively small size, plenty of space (147G+) and no io errors.

This is probably a oneoff that I won't be able to repeat.



 Comments   
Comment by Elena Stepanova [ 2014-12-03 ]

Have you seen it happen on newer versions?
We had one that looked very much alike, MDEV-714 , fixed in 5.5.36

Upd: sorry, your text says that it's a oneoff so obviously you couldn't see it on newer versions.

Comment by Daniel Black [ 2014-12-03 ]

looks like MDEV-714 to me too.

Comment by Elena Stepanova [ 2014-12-04 ]

As discussed above, and without any realistic chance to reproduce it, optimistically closing as fixed in 5.5.36 along with MDEV-714.

Generated at Thu Feb 08 07:18:13 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.