[MDEV-12989] Crash with "an invalid pointer" error Created: 2017-06-04  Updated: 2018-01-27  Resolved: 2018-01-27

Status: Closed
Project: MariaDB Server
Component/s: Galera
Affects Version/s: 10.1.23
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Pavel Kaidalov Assignee: Unassigned
Resolution: Incomplete Votes: 0
Labels: need_feedback
Environment:

CentOS 6.9, kernel-2.6.32-696.1.1.el6.x86_64, Amazon AWS EC2 r3.l instance, Galera 3-nodes cluster.


Issue Links:
Relates
relates to MDEV-13340 mysqld (or it's thread) crashed durin... Closed

 Description   

I just have got one of Galera nodes crashed, probably on "drop database" statement. Also this crash caused to Galera SST recovery, not IST, here is part of recovery log:

2017-06-03 11:11:01 139999898085120 [Note] WSREP: Service thread queue flushed.
2017-06-03 11:11:01 139999827401472 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-00
00-000000000000) does not match group state UUID (feaee2fa-f621-11e5-9316-3259ad165a1b): 1 (Operation not permitted)
         at galera/src/replicator_str.cpp:prepare_for_IST():482. IST will be unavailable.

Here is error report:

2017-06-03 09:00:37 7faa107f7700 InnoDB: FTS Optimize Removing table dev_c/question_detail
2017-06-03 09:01:57 7faa107f7700 InnoDB: FTS Optimize Removing table dev_c/question_detail_answer
2017-06-03 09:03:53 7faa107f7700 InnoDB: FTS Optimize Removing table dev_c/standard
170603  9:03:54 [ERROR] mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
 
Server version: 10.1.23-MariaDB
key_buffer_size=134217728
read_buffer_size=1048576
max_used_connections=38
max_threads=2049
thread_count=26
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 6487237 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x7faa2288a008
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7facafff7058 thread_stack 0x48400
/usr/sbin/mysqld(my_print_stacktrace+0x2b)[0x7facb0b73c8b]
/usr/sbin/mysqld(handle_fatal_signal+0x4d5)[0x7facb06cebf5]
/lib64/libpthread.so.0(+0x3d4f40f7e0)[0x7facafcce7e0]
/usr/sbin/mysqld(_Z18plugin_unlock_listP3THDPP13st_plugin_intj+0x6d)[0x7facb056b01d]
/usr/sbin/mysqld(_Z14lex_end_stage1P3LEX+0x5e)[0x7facb0549fee]
/usr/sbin/mysqld(_Z7lex_endP3LEX+0x11)[0x7facb054a151]
/usr/sbin/mysqld(+0x6cbae4)[0x7facb07caae4]
/usr/sbin/mysqld(_ZN20sp_instr_jump_if_notD0Ev+0x38)[0x7facb07d2228]
/usr/sbin/mysqld(_ZN7sp_headD1Ev+0x166)[0x7facb07cbad6]
/usr/sbin/mysqld(_ZN7sp_headD0Ev+0x11)[0x7facb07cca61]
/usr/sbin/mysqld(_ZN19Table_triggers_listD1Ev+0x47)[0x7facb05e6627]
/usr/sbin/mysqld(_Z18intern_close_tableP5TABLE+0x22)[0x7facb0514132]
/usr/sbin/mysqld(_Z16tdc_remove_tableP3THD26enum_tdc_remove_table_typePKcS3_b+0x1dd)[0x7facb0680fdd]
/usr/sbin/mysqld(_Z23mysql_rm_table_no_locksP3THDP10TABLE_LISTbbbbb+0xb85)[0x7facb05d8dd5]
/usr/sbin/mysqld(_Z14mysql_rm_tableP3THDP10TABLE_LISTcc+0x154)[0x7facb05d96a4]
/usr/sbin/mysqld(_Z21mysql_execute_commandP3THD+0x17a8)[0x7facb0559d48]
/usr/sbin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_state+0x324)[0x7facb05613c4]
/usr/sbin/mysqld(+0x4624e4)[0x7facb05614e4]
/usr/sbin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcj+0x1c6e)[0x7facb056380e]
/usr/sbin/mysqld(_Z10do_commandP3THD+0x151)[0x7facb05643c1]
/usr/sbin/mysqld(_Z26threadpool_process_requestP3THD+0xc7)[0x7facb065cef7]
/usr/sbin/mysqld(+0x5a9435)[0x7facb06a8435]
/lib64/libpthread.so.0(+0x3d4f407aa1)[0x7facafcc6aa1]
/lib64/libc.so.6(clone+0x6d)[0x7facae1aabcd]
 
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x7fa8d7576020): is an invalid pointer
Connection ID (thread ID): 23089077
Status: NOT_KILLED
 
Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=off
 
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
 
We think the query pointer is invalid, but we will try to print it anyway.
Query: DROP TABLE IF EXISTS `student`
 
170603 09:03:55 mysqld_safe Number of processes running now: 0
170603 09:03:55 mysqld_safe WSREP: not restarting wsrep node automatically
170603 09:03:55 mysqld_safe mysqld from pid file /var/lib/mysql/dev.pid ended}}

And here is configuration:

# mysqld --print-defaults
mysqld would have been started with the following arguments:
--sql_mode=NO_ENGINE_SUBSTITUTION,TRADITIONAL --handlersocket_address=0.0.0.0 --handlersocket_port=9998 --handlersocket_port_wr=9999 --datadir=/var/lib/mysql --socket=/var/lib/mysql/mysql.sock --user=mysql --character-set-server=utf8 --performance_schema=0 --optimizer_search_depth=0 --general_log=0 --general_log_file=/var/log/mysql/ram/mysql.log --slow_query_log=1 --slow_query_log_file=/var/log/mysql/slow-queries.log --long_query_time=1 --log-error=/var/log/mysql/mysqld.log --innodb_io_capacity=1500 --innodb_io_capacity_max=3000 --query_cache_limit=1M --query_cache_size=0 --query_cache_type=0 --max_allowed_packet=512M --max_sort_length=250 --transaction_isolation=READ-COMMITTED --innodb_buffer_pool_size=4G --innodb_buffer_pool_instances=4 --innodb_log_file_size=256M --innodb_log_buffer_size=8M --innodb_flush_method=O_DIRECT_NO_FSYNC --innodb_open_files=2K --innodb_file_format=Barracuda --innodb-file-per-table=1 --innodb_thread_concurrency=0 --join_buffer_size=1M --read_buffer_size=1M --sort_buffer_size=2M --read_rnd_buffer_size=1M --thread_cache_size=128 --tmp_table_size=2G --max_heap_table_size=2G --max_user_connections=2990 --max_connections=3000 --log_bin_trust_function_creators=1 --group_concat_max_len=10240 --innodb_support_xa=0 --innodb_autoinc_lock_mode=2 --innodb_flush_log_at_trx_commit=2 --default_storage_engine=InnoDB --connect_timeout=600000 --max_connect_errors=10000 --net_read_timeout=600000 --connect_timeout=600000 --net_write_timeout=600000 --binlog_format=ROW --server_id=222 --sync-binlog=0 --expire-logs_days=3 --max-binlog-size=1G --log-slave-updates=1 --slave-skip-errors=1062 --slave_type_conversions=ALL_NON_LOSSY --skip-slave-start --log-bin=mysql-bin --relay-log=mysql-relay-bin --extra_port=3307 --thread_handling=pool-of-threads --thread_pool_max_threads=2k --innodb_buffer_pool_dump_at_shutdown=1 --innodb_buffer_pool_load_at_startup=1 --wsrep_on=ON --wsrep_provider=/usr/lib64/galera/libgalera_smm.so --wsrep_cluster_address=gcomm://10.10.2.249,10.10.2.22,10.10.2.213 --wsrep_node_address=10.10.2.22 --wsrep_cluster_name=dev --wsrep_node_name=dev_03 --wsrep_sst_method=xtrabackup-v2 --wsrep_sst_auth=user:********* --wsrep_retry_autocommit=5 --wsrep_provider_options=gcache.size=5G; repl.commit_order=1; gmcast.segment=2 --wsrep_replicate_myisam=0 --wsrep_slave_threads=4 --wsrep_max_ws_size=2G



 Comments   
Comment by Elena Stepanova [ 2017-06-09 ]

Is the crash repeatable, or was it a one-time thing?
Was DROP TABLE executed on the same node which crashed, or did it come via Galera replication?

Comment by Pavel Kaidalov [ 2017-06-13 ]

No, this crash was one-time thing, I have weekly databases re-import and only once I faced this crash.
I can't tell if DROP TABLE was executed really on the same node, unfortunately, I restored this node using Galera SST already. However I can tell that this DROP was not executed on other nodes, at least other node's log show that previous table was dropped and then lost connection to crashed node:

2017-06-03 09:03:53 7f3ec0bf7700 InnoDB: FTS Optimize Removing table dev_c/standard
2017-06-03  9:03:55 139913269864192 [Note] WSREP: (d5790a1f, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://10.10.2.213:4567
2017-06-03  9:03:56 139913269864192 [Note] WSREP: (d5790a1f, 'tcp://0.0.0.0:4567') reconnecting to b16b26c2 (tcp://10.10.2.213:4567), attempt 0

Comment by Elena Stepanova [ 2017-07-15 ]

Could you please provide the output of

SHOW CREATE TABLE `student`;
SHOW TRIGGERS LIKE 'student';

I don't really expect that we'll be able to reproduce the problem yet, more likely we'll have to wait till somebody else complains about a similar issue and we can compare the circumstances; but better to have all the information at hand by the time when it happens.

Comment by Elena Stepanova [ 2017-07-18 ]

anikitin,

Please look at it from the perspective of the cluster, taking into account other known problems, cluster locking limitations and such. Previously there wasn't good enough reason to suspect it is Galera-specific, but given recently filed MDEV-13340, I am inclined to reconsider. Both have something to do with locking and dropping triggers, and if MDEV-13340 also locked the entire cluster, it is likely to be related to Galera after all.

Comment by Andrii Nikitin (Inactive) [ 2017-07-19 ]

kpvadm , could you share:
1. exact commands executed during mentioned 'weekly database re-import'
2. my.cnf file from one of the nodes
3. list of all files in data directory for two or all three nodes, e.g.:
ls -lRa /var/lib/mysql

Generated at Thu Feb 08 08:02:03 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.