[MDEV-9486] Crash in metadata lock plugin Created: 2016-01-28  Updated: 2017-04-19  Resolved: 2017-04-19

Status: Closed
Project: MariaDB Server
Component/s: Locking, Plugins
Affects Version/s: 10.0.22
Fix Version/s: 10.0.26, 10.1.15

Type: Bug Priority: Major
Reporter: Guillaume Lefranc Assignee: Unassigned
Resolution: Fixed Votes: 1
Labels: None
Environment:

RHEL 7


Issue Links:
Relates
relates to MDEV-9728 Hard crash in metadata_lock_info Closed

 Description   

While doing this query:

MariaDB [information_schema]> select * from METADATA_LOCK_INFO where table_name='session';
 
ERROR 2013 (HY000): Lost connection to MySQL server during query

Server crashed and got the following stack trace:

160128 16:02:34 [ERROR] mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see http://kb.askmonty.org/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed, 
something is definitely wrong and this may fail.
 
Server version: 10.0.22-MariaDB-enterprise-wsrep-log
key_buffer_size=67108864
read_buffer_size=131072
max_used_connections=127
max_threads=501
thread_count=162
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1236419 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x0x7f36738ae008
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7f36043c6cb0 thread_stack 0x48000
160128 16:02:36 [Warning] WSREP: last inactive check more than PT1.5S ago (PT2.365S), skipping check
/usr/sbin/mysqld(my_print_stacktrace+0x2e)[0xbf632e]
/usr/sbin/mysqld(handle_fatal_signal+0x390)[0x758be0]
/lib64/libpthread.so.0(+0xf100)[0x7f599c853100]
/usr/sbin/mysqld[0xc1d389]
/usr/sbin/mysqld(_Z23well_formed_copy_ncharsPK15charset_info_stPcjS1_PKcjjPS4_S5_S5_+0x1a2)[0x6598d2]
/usr/sbin/mysqld(_ZN15Field_varstring5storeEPKcjPK15charset_info_st+0x67)[0x7495f7]
/usr/lib64/mysql/plugin/metadata_lock_info.so(_Z31i_s_metadata_lock_info_fill_rowP10MDL_ticketPv+0x130)[0x7f51971fd020]
/usr/sbin/mysqld[0x6b659e]
/usr/sbin/mysqld(_Z11mdl_iteratePFiP10MDL_ticketPvES1_+0x29)[0x6b68c9]
/usr/lib64/mysql/plugin/metadata_lock_info.so(_Z33i_s_metadata_lock_info_fill_tableP3THDP10TABLE_LISTP4Item+0x27)[0x7f51971fd137]
/usr/sbin/mysqld(_Z24get_schema_tables_resultP4JOIN23enum_schema_table_state+0x26e)[0x64cd0e]
/usr/sbin/mysqld(_ZN4JOIN10exec_innerEv+0x705)[0x633a75]
/usr/sbin/mysqld(_ZN4JOIN4execEv+0x11)[0x635bf1]
/usr/sbin/mysqld(_Z12mysql_selectP3THDPPP4ItemP10TABLE_LISTjR4ListIS1_ES2_jP8st_orderSB_S2_SB_yP13select_resultP18st_select_lex_unitP13st_select_lex+0xc7)[0x632567]
/usr/sbin/mysqld(_Z13handle_selectP3THDP3LEXP13select_resultm+0x27d)[0x6330ad]
/usr/sbin/mysqld[0x5d118e]
/usr/sbin/mysqld(_Z21mysql_execute_commandP3THD+0x566f)[0x5e063f]
/usr/sbin/mysqld[0x5e4af1]
/usr/sbin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcj+0x1c8f)[0x5e6cdf]
/usr/sbin/mysqld(_Z10do_commandP3THD+0x242)[0x5e7a32]
/usr/sbin/mysqld(_Z26threadpool_process_requestP3THD+0xc7)[0x6e9e37]
/usr/sbin/mysqld[0x72bc8d]
/lib64/libpthread.so.0(+0x7dc5)[0x7f599c84bdc5]
/lib64/libc.so.6(clone+0x6d)[0x7f599b0c721d]
 
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x7f35a5841020): is an invalid pointer
Connection ID (thread ID): 87123
Status: NOT_KILLED
 
Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=on,mrr_cost_based=on,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=on,table_elimination=on,extended_keys=on,exists_to_in=on



 Comments   
Comment by Elena Stepanova [ 2016-01-28 ]

tanj,

Does the `session` table exist? If it does, please provide the output of SHOW CREATE TABLE and SHOW INDEX IN.
Was the table in use at the moment of the crash? Do you have any kind of monitoring on the machine where it happened? If you have a processlist output, status output, innodb status, general logging, or any other kind of periodic monitors around the moment of crash, please provide the info.
Please attach your cnf file(s).
Please specify which kind of binaries you are using – packages, binary tarball, source build.

If you have the coredump, please run

gdb --batch --eval-command="thread apply all bt" <mysqld binary> <coredump>
gdb --batch --eval-command="thread apply all bt full" <mysqld binary> <coredump>

and attach the output.

Comment by Elena Stepanova [ 2016-01-28 ]

nirbhay_c, do you think it can be related to Galera, could you please try it on a cluster? AFAIR table locking does not work very well with Galera, maybe metadata_lock_info does not either?

Comment by Nirbhay Choubey (Inactive) [ 2016-01-28 ]

I tried the following query on a locked innodb table on 10.0-galera (latest), and it worked ok.

MariaDB [test]> select * from information_schema.METADATA_LOCK_INFO where table_name='t1';
+-----------+-----------------+---------------+---------------------+--------------+------------+
| THREAD_ID | LOCK_MODE       | LOCK_DURATION | LOCK_TYPE           | TABLE_SCHEMA | TABLE_NAME |
+-----------+-----------------+---------------+---------------------+--------------+------------+
|         8 | MDL_SHARED_READ | MDL_EXPLICIT  | Table metadata lock | test         | t1         |
+-----------+-----------------+---------------+---------------------+--------------+------------+
1 row in set (0.00 sec)
 

Comment by Guillaume Lefranc [ 2016-02-09 ]

CREATE TABLE `session` (
  `id` bigint(20) unsigned DEFAULT NULL,
  `session_id` varchar(48) NOT NULL DEFAULT '',
  `guid` varchar(36) DEFAULT NULL,
  `browser_id` varchar(32) DEFAULT NULL,
  `brand_id` varchar(16) DEFAULT NULL,
  `channel_traffic_source` varchar(32) DEFAULT NULL,
  `channel_custom_data` text CHARACTER SET utf8,
  `referer` varchar(2000) DEFAULT NULL,
  `created_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
  `ip` varchar(150) DEFAULT NULL,
  `country` varchar(3) DEFAULT NULL,
  `continent` varchar(3) DEFAULT NULL,
  `host` varchar(45) DEFAULT NULL,
  `proxy_ip` varchar(16) DEFAULT NULL,
  PRIMARY KEY (`created_at`,`session_id`),
  KEY `session_session_id` (`session_id`),
  KEY `session_id` (`id`),
  KEY `session_guid` (`guid`),
  KEY `session_brand_id` (`brand_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
/*!50100 PARTITION BY RANGE ( UNIX_TIMESTAMP(`created_at`))
(PARTITION p2015_11 VALUES LESS THAN (1448928000) ENGINE = InnoDB,
 PARTITION p2015_12 VALUES LESS THAN (1451606400) ENGINE = InnoDB,
 PARTITION p2016_01 VALUES LESS THAN (1454284800) ENGINE = InnoDB,
 PARTITION p2016_02 VALUES LESS THAN (1456790400) ENGINE = InnoDB,
 PARTITION p2016_03 VALUES LESS THAN (1459468800) ENGINE = InnoDB,
 PARTITION p2016_04 VALUES LESS THAN (1462060800) ENGINE = InnoDB) /*
 
+---------+------------+--------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table   | Non_unique | Key_name           | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+---------+------------+--------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| session |          0 | PRIMARY            |            1 | created_at  | A         |    14589489 |     NULL | NULL   |      | BTREE      |         |               |
| session |          0 | PRIMARY            |            2 | session_id  | A         |   175073875 |     NULL | NULL   |      | BTREE      |         |               |
| session |          1 | session_session_id |            1 | session_id  | A         |   175073875 |     NULL | NULL   |      | BTREE      |         |               |
| session |          1 | session_id         |            1 | id          | A         |   175073875 |     NULL | NULL   | YES  | BTREE      |         |               |
| session |          1 | session_guid       |            1 | guid        | A         |   175073875 |     NULL | NULL   | YES  | BTREE      |         |               |
| session |          1 | session_brand_id   |            1 | brand_id    | A         |        3679 |     NULL | NULL   | YES  | BTREE      |         |               |
+---------+------------+--------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
 

Table was in use (read and written to concurrently) at the moment of the crash.

Exact MariaDB package used is MariaDB-Galera-server-10.0.22-1.el7.centos.x86_64

Let me know if you need any additional information.

Comment by Guillaume Lefranc [ 2016-02-09 ]

Configuration
 
[mariadb]
plugin-maturity=gamma
 
feedback-user-info='enterprise'
 
## -- Remove the following line to enable feedback reporting to mariadb.org
feedback-url=''
 
[mysqld]
# -- Disable for applications using old auth protocol
enable-secure-auth
# -- Auditing - pre-load Plugin
plugin-load=server_audit
 
###########################################
[mysqld]
log-error=/var/log/mysqld.log
# General
# bind_address = 127.0.0.1
skip-name-resolve
max-allowed-packet                       = 64M
open-files-limit                         = 65535
max-connect-errors                       = 100000
max-connections                          = 4000
thread-cache-size                        = 200
back-log                                 = 128
#net_buffer_length                       = 16K
#connect_timeout                         = 10
#net_read_timeout                        = 30
#net_write_timeout                       = 60
#interactive_timeout                     = 28800
interactive_timeout                     = 600
#wait_timeout                            = 28800
wait_timeout                            = 600
#net_retry_count                         = 10
character-set-server                     = 'utf8'
collation-server                         = 'utf8_general_ci'
# -- Do not change these defaults, unless you really know.
#read_rnd_buffer_size                    = 256K
#sort_buffer_size                        = 2M
#read_buffer_size                        = 128K
#max_tmp_tables                          = 32
tmp-table-size                           = 256M
max-heap-table-size                      = 256M
tmpdir                                   = /dev/shm
# -- This will enable a stricter SQL mode to avoid bad data
sql-mode                                 = 'ALLOW_INVALID_DATES,STRICT_TRANS_TABLES'
 
optimizer_search_depth                   = 0
optimizer_switch                         = 'index_merge=on'
optimizer_switch                         = 'index_merge_union=on'
optimizer_switch                         = 'index_merge_sort_union=on'
optimizer_switch                         = 'index_merge_intersection=on'
optimizer_switch                         = 'index_merge_sort_intersection=off'
optimizer_switch                         = 'index_condition_pushdown=on'
optimizer_switch                         = 'derived_merge=on'
optimizer_switch                         = 'derived_with_keys=on'
optimizer_switch                         = 'firstmatch=on'
optimizer_switch                         = 'loosescan=on'
optimizer_switch                         = 'materialization=on'
optimizer_switch                         = 'in_to_exists=on'
optimizer_switch                         = 'semijoin=on'
optimizer_switch                         = 'partial_match_rowid_merge=on'
optimizer_switch                         = 'partial_match_table_scan=on'
optimizer_switch                         = 'subquery_cache=on,mrr=on'
optimizer_switch                         = 'mrr_cost_based=on'
optimizer_switch                         = 'mrr_sort_keys=off'
optimizer_switch                         = 'outer_join_with_cache=on'
optimizer_switch                         = 'semijoin_with_cache=on'
optimizer_switch                         = 'join_cache_incremental=on'
optimizer_switch                         = 'join_cache_hashed=on'
optimizer_switch                         = 'join_cache_bka=on'
optimizer_switch                         = 'optimize_join_buffer_size=on'
optimizer_switch                         = 'table_elimination=on'
optimizer_switch                         = 'extended_keys=on'
 
join-buffer-space-limit                  = 4M
join-cache-level                         = 6
join-buffer-size                         = 4M
 
# Security
symbolic-links                           = 0
local-infile                             = 0
 
# Logging
log-warnings                             = 2
slow-query-log                           = 1
long-query-time                          = 5
log-slow-verbosity                       = 'query_plan,innodb'
#log_slow_rate_limit                     = 1000
 
# Replication / Binary logs
server-id                                 = 101
log_bin_trust_function_creators         = 1
#slave_net_timeout                         = 60
#max_prepared_stmt_count                   = 16382
binlog-annotate-row-events               = ON
log-bin=mariadb-bin
binlog-format                            = ROW
expire-logs-days                         = 7
max-binlog-size                          = 1024M
sync-binlog                              = 1
binlog-stmt-cache-size                   = 128K
binlog-cache-size                        = 256K
#slave_compressed_protocol                = ON
slave-transaction-retries                = 10
#relay_log_recovery                       = ON
#sync_master_info                         = 1
#sync_relay_log                           = 0
#sync_relay_log_info                      = 1
log_slave_updates=1
 
# InnoDB
default-storage-engine                   = 'InnoDB'
innodb-stats-on-metadata                 = 0
innodb-stats-sample-pages                = 32
table-definition-cache                   = 2048
table-open-cache                         = 2048
transaction-isolation                    = READ-COMMITTED
innodb-support-xa                        = ON
 
# Query Cache
query-cache-size                         = 0
query-cache-type                         = 0
innodb-buffer-pool-instances             = 8
#innodb-buffer-pool-size                  = 2G
innodb-max-dirty-pages-pct               = 50
innodb-file-per-table                    = 1
innodb-file-format                       = Barracuda
innodb-flush-log-at-trx-commit           = 1
#innodb_flush_method                      = O_DIRECT
innodb-log-buffer-size                   = 64M
innodb-log-files-in-group                = 2
#innodb-log-file-size                     = 1024M
#innodb_purge_threads                     = 1
#innodb_io_capacity                      = 200
#innodb_flush_neighbors                   = 0
#innodb-read-io-threads                   = 8
#innodb-write-io-threads                  = 8
#innodb-thread-concurrency                = 64
 
#innodb-open-files                       = 2048
 
innodb_io_capacity                      = 50000
innodb_lru_scan_depth                   = 4000
innodb-read-io-threads                  = 64
innodb-write-io-threads                 = 64
innodb-buffer-pool-size                 = 100G
innodb_log_file_size                    = 4G
innodb_flush_neighbors                  = 0
innodb-open-files                       = 4000
innodb_checksum_algorithm               = CRC32
thread_handling                         = pool-of-threads
 
# MyISAM
key-buffer-size                          = 64M
flush                                    = OFF
myisam-recover-options                   = BACKUP,FORCE
myisam-sort-buffer-size                  = 64M
 
# Other
# -- Enable INFORMATION_SCHEMA.%_STATISTICS tables
userstat                                 = ON
archive                                  = OFF
blackhole                                = OFF
#federated                               = OFF
#innodb                                  = FORCE
 
[mysql]
default-character-set                    = 'utf8'
auto-rehash                              = FALSE
local-infile                             = 1
max-allowed-packet                       = 64M
secure-auth                              = TRUE
 
[mysqldump]
max-allowed-packet                       = 1G
default-character-set                    = 'utf8'
 
[myisamchk]
key-buffer-size                          = 1G
sort-buffer-size                         = 1G
read-buffer-size                         = 8M
write-buffer-size                        = 8M
 
[galera]
wsrep_provider="/usr/lib64/galera/libgalera_smm.so"
binlog_format=row
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2
bind-address=0.0.0.0
wsrep_slave_threads=64
innodb_flush_log_at_trx_commit=0
innodb_force_primary_key=1
wsrep_provider_options="gcache.size=32G; gcs.fc_master_slave=yes"
wsrep_node_address="10.0.7.45"
wsrep_max_ws_size=2147483648
wsrep_sst_method=xtrabackup-v2
 
[xtrabackup]
parallel=16
 
[sst]
compressor='pigz -p16'
decompressor='pigz -p16 -dc'
 

Comment by Elena Stepanova [ 2016-02-16 ]

tanj,
was it a one-time thing, or is it happening from time to time?

I tried to run some concurrent tests on a standalone server, could not hit the failure so far. I used the community server, one can only hope it's not that different.

nirbhay_c, could you please also try to run the same on a 10.0.22 Galera cluster with a similar config? Some concurrent DML load on a couple nodes, and a frequent query from metadata_lock_info on one of them? I still think that given the differences in locking between MGC and vanilla server, Galera might be relevant here.

Comment by Nirbhay Choubey (Inactive) [ 2016-03-24 ]

Can't repeat with a combination of SELECT, INSERT, FTWRL and SELECT .. I_S.METADATA_LOCK_INFO executing concurrently over a 2-node cluster.

Comment by Elena Stepanova [ 2017-04-19 ]

The crash was fixed in the scope of MDEV-9728.

Generated at Thu Feb 08 07:35:01 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.