[MDEV-10347] mysqld got signal 11 Created: 2016-07-08  Updated: 2016-11-02  Resolved: 2016-11-02

Status: Closed
Project: MariaDB Server
Component/s: OTHER
Affects Version/s: 10.1.13
Fix Version/s: 10.1.19

Type: Bug Priority: Critical
Reporter: sungwon.han Assignee: Alexander Barkov
Resolution: Fixed Votes: 1
Labels: None
Environment:

RHEL 6.7, pacemaker, replication
2.6.32-573.18.1.el6.x86_64


Attachments: Text File create_table_20160712.sql     Text File master_last.sql     File mdev10347.test     Text File my_cnf.txt     Text File mysql_err.txt     Text File query_result_20160710.txt     Text File slave_last.sql    
Sprint: 10.1.19

 Description   

The database had many transaction(update) from batch server. Suddenly master server had gone, slave server promote to new maser in replication with pacemker. The master server get the error [ERROR] mysqld got signal 11. About 10 minutes later, slave also shutdown abnormally. It also same error([ERROR] mysqld got signal 11).
And I used mysqbinlog to fine last SQL. I found similary transaction(same table, same update) in master and slave server.

I attached detail log(Last SQL using mysqlbinlog in master and slave, mysqld.err)



 Comments   
Comment by Elena Stepanova [ 2016-07-08 ]

Unfortunately, the query in the binlog is not the one that caused the crash. Your server crashed on SELECT executed via a prepared statement, not on UPDATE; and SELECT wouldn't make it to the binary log anyway.

Do you have many various SELECT queries on the server? Maybe you happen to know what was running around the time of the crash, it could help to narrow down the problem.
Did it happen only once on each server? Have they been working all right after you restarted them?
If the crash happens from time to time, would it be possible to enable the general log until the next occurrence?

Please also attach your cnf file(s).

Comment by sungwon.han [ 2016-07-08 ]

@Elena Stepanova
Thank you for your answer. I attached my.cnf. It happen abnormal shutdown after signal 11 error. I heard that this server's memory parameter set little small. Could you check that memory parameter is correct?
Hardware Spec. (64GB. 20core(physical core is 10core. It is hyperviser)
my_cnf.txt

Comment by Elena Stepanova [ 2016-07-08 ]

At the first glance, I don't see anything particularly criminal in the cnf file, but server tuning depends most of all on the workflow the server deals with, which we know nothing about. For accurate tuning, you would need a consultant onsite or a remote DBA.

Back to the crash, I didn't quite understand from your answer – did it only happen twice, or does it keep happening? And will you be able to enable the general log?

Comment by sungwon.han [ 2016-07-09 ]

@Elena Stepanova
It keep happening after first signal 11 error. It can be starting server, but insert transaction makes server signal 11 error. (not select)
Actually, I think INSERT statements very simple.
I start 100(or >100) transaction just below. and server is okay. (100 transaction means little transaction.)
INSERT INTO TEMP_TABLE SELECT * FROM BIGTABLE LIMIT 100;

But I start several same transaction just below. and server got signal 11 error.
INSERT INTO TEMP_TABLE SELECT * FROM BIGTABLE LIMIT 10000;(or > 100000)

I enable the general log to fine last transaction. SELECT statements is okay. but INSERT or UPDATE statements affects mariadb server.
What can I do solve the problem?

Comment by Elena Stepanova [ 2016-07-09 ]

After you enabled the general log and got the crash, please do the following:

  • open the error log from this last crash;
  • at the end of the crash report, there are lines like

    Trying to get some variables.
    Some pointers may be invalid and cause the dump to abort.
    Query (0x7fe0ea835020): is an invalid pointer
    Connection ID (thread ID): xxxxxxx
    Status: NOT_KILLED
    

    where xxxxxxx is a decimal number (can be big, can be small, does not matter); remember this number;

  • open your general log, go to the end of it, and find the last query that belongs to this connection. It will look like

          xxxxxxx Execute <query>

or

          xxxxxxx Query <query>

It might be not the very last query in the whole log, but it should be the last one that has xxxxxxx as the connection number.

When you found it, please paste the query and the following information for each table involved in the query:

SHOW CREATE TABLE <table_name>;
SHOW INDEX IN <table_name>;
SHOW TABLE STATUS LIKE '<table_name>';
SHOW TRIGGERS LIKE '<table_name>';

As soon as we identify the problem, we might be able to come up with a workaround.
Thanks.

Comment by sungwon.han [ 2016-07-10 ]

In mariadb err.log

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x7fe0ea835020): is an invalid pointer
Connection ID (thread ID): 271665
Status: NOT_KILLED

I found the SQL that thread id is 271665 in slow-query log. (I didin't enabled general-log).

There are two SQL that same thread id(271665). 
# Time: 160708 14:54:18
# User@Host: glap_test[glap_test] @  [10.185.217.95]
# Thread_id: 271665  Schema: glaptest  QC_hit: No
# Query_time: 3.034989  Lock_time: 0.000032  Rows_sent: 0  Rows_examined: 507934
# Rows_affected: 147924
#
# explain: id   select_type     table   type    possible_keys   key     key_len ref     rows    r_rows  filtered        
 
r_filtered      Extra
# explain: 1    SIMPLE  T       range   
 
TB_LI_GERP_SEW_SHIP_N04,TB_LI_GERP_SEW_SHIP_N05,TB_LI_GERP_SEW_SHIP_N02,TB_LI_GERP_SEW_SHIP_N07,TB_LI_GERP_SEW_SHIP_N01,TB_LI_GER
 
P_SEW_SHIP_N06 TB_LI_GERP_SEW_SHIP_N02 18      NULL    697056  147924.00       100.00  100.00  Using where; Using buffer
#
SET timestamp=1467957258;
UPDATE /* ShipmentCommonMapper.updateInterfaceEDNOForLCL */
                                       TB_LI_GERP_SEW_SHIP T
                                         SET T.CONTAINER_NO = T.ATTRIBUTE5
                   , T.ATTRIBUTE17 = T.CONTAINER_NO
                                 WHERE T.TRANSFER_FLAG = 'N'
                                   AND T.corp_type = 'LGEKR'
                                   AND T.LCL_FLAG = 'Y';
 
 
# Time: 160708 14:54:57
# User@Host: glap_test[glap_test] @  [10.185.217.95]
# Thread_id: 271665  Schema: glaptest  QC_hit: No
# Query_time: 39.851348  Lock_time: 0.000024  Rows_sent: 0  Rows_examined: 576283
# Rows_affected: 216273
#
# explain: id   select_type     table   type    possible_keys   key     key_len ref     rows    r_rows  filtered        
 
r_filtered      Extra
# explain: 1    SIMPLE  T       range   
 
TB_LI_GERP_SEW_SHIP_N04,TB_LI_GERP_SEW_SHIP_N05,TB_LI_GERP_SEW_SHIP_N02,TB_LI_GERP_SEW_SHIP_N07,TB_LI_GERP_SEW_SHIP_N01,TB_LI_GER
 
P_SEW_SHIP_N06 TB_LI_GERP_SEW_SHIP_N02 18      NULL    697056  216273.00       100.00  100.00  Using where; Using buffer
#
SET timestamp=1467957297;
UPDATE /* ShipmentCommonMapper.updateInterfaceDcCd */
                                       TB_LI_GERP_SEW_SHIP T
                                         SET T.ATTRIBUTE18 = T.DC_CD
                                           , T.DC_CD = T.ATTRIBUTE3
                                 WHERE T.TRANSFER_FLAG = 'N'
                                   AND T.corp_type = 'LGEKR'
                                   AND T.SOURCE_SYSTEM_CODE = 'SEW'
                                   AND T.SERVICE_FLAG = 'N'
                                   AND T.ATTRIBUTE3 IS NOT NULL;

The query result that your required is just attached.(The file name is query_result_20160710.txt)

Thanks.

Comment by Elena Stepanova [ 2016-07-10 ]

gks3117,
The slow query log is not suitable for this purpose. It only records queries which take longer than a certain limit, and these queries most certainly don't have anything to do with the crash – first, because these UPDATEs don't match the stack trace you provided, and secondly because the crashing query cannot be in the slow log, because it crashes before it finishes executing, hence the server does not have any information about its duration and cannot mark it as slow.

Comment by sungwon.han [ 2016-07-11 ]

I got it, we'll be testing for making about signal 11 error. I found that singnal 11 error be related about memory and segmentation. (ex. null point exception)
Actually, we try to be crashed server forcely to make signal 11. I set my.cnf to make coredump and general-log. Could you tell me how to be crashed server about signal 11 error?

Comment by sungwon.han [ 2016-07-11 ]

I find the sql that causing crash server in below.

Prepared STEP :
 SELECT a.SOURCE_SYSTEM_CODE
              , IFNULL(NULLIF(a.CONTAINER_tXN_ID, ''), 0) AS CONTAINER_tXN_ID
              , 0 as  EXPENSE_ID
              , ifnull(nullif(a.loading_type_code,''),'X') as loading_type_code
              , a.container_no
              , DATE_FORMAT(a.pg_ymd, '%Y%m%d') AS pg_ymd
              , A.PG_YMD AS PG_YMD_DATE
              , a.interface_id
              , MAX(a.allocation_no)          allocation_no
              , MAX(a.info_change_flag)       info_change_flag
              , b.gdls_cd                     dc_cd
              , a.dc_cd                       org_code
              , MAX(account_unit)             au_code
              , lcl_flag                 lcl_flag
              , MAX(container_type)           container_type
              , IFNULL(MAX(ship_method),'X')              ship_method
              , MAX(transport_type)           transport_type
              , MAX(IFNULL(NULLIF(A.f_dest,''),'*'))    destination_name
              , MAX(A.TOTAL_SHP_QTY)          total_shp_qty
              , SUM(A.ORDER_QTY)              sum_order_qty
              , SUM(A.CBM_SUM)                cbm_sum 
              , MAX(A.WEIGHT)                 weight
              , MAX(a.l_port)                 l_port
              , MAX(a.d_port)                 dis_port
              , MAX(a.f_dest)                 f_dest
              , IFNULL(MAX(a.price_terms),'X')            price_terms
              , MAX(a.attribute3)             cw
              , MAX(A.ORIGIN_ALLOCATION_NO)   origin_allocation_no
              , MAX(a.invoice_no)             commercial_inv_no
              , max(IFNULL(a.attribute6, '*'))   departure_place
              , max(ifnull(a.ATTRIBUTE16,0))     drop_qty
              , max(ifnull(a.SHUTTLE_QTY,0))     SHUTTLE_QTY
              , count(distinct a.ATTRIBUTE5) as export_permits_qty
              , a.ATTRIBUTE5  as export_permits
              ,MAX( A.ORDER_TYPE) AS ORDER_TYPE
              ,max(a.attribute8) as attribute8
              ,CASE WHEN lcl_flag = 'Y' THEN
                       IFNULL((SELECT E.GDLS_CD
                         FROM TB_LM_CODEMAPPING E
                        WHERE E.CORP_TYPE = a.corp_type
                          AND E.ORD_SYS_CD =  IFNULL(MAX(nullif(carrier_code1,'')),MAX(SHIPPING_LINE2))
                          AND E.USE_TYPE =  'Y'
                          AND E.CD_TYPE = 'LCL'
                          )
                         ,IFNULL(MAX(nullif(carrier_code1,'')),MAX(SHIPPING_LINE2))
                         )
                ELSE
                    IFNULL(MAX(nullif(carrier_code1,'')),MAX(SHIPPING_LINE2))
                END AS carrier_code
                , (SELECT  COUNT(*)
                  FROM TB_LM_MODEL  MD
                 WHERE A.CORP_tYPE = MD.CORP_TYPE
                   AND A.MODEL_CD = MD.MODEL_CD
                   AND MD.CBM = 0
                ) CBM_ZERO_CNT
           FROM tb_li_gerp_sew_ship a
              , tb_lm_codemapping   b
              , tb_lm_dcconfig      d
          WHERE a.corp_type = ?
             AND transfer_flag IN ('N','E')
            AND a.corp_type = b.corp_type
            AND a.dc_cd    = b.ord_sys_cd
            AND b.cd_type  = 'CI'
            AND b.use_yn   = 'Y'
            AND d.corp_type = b.corp_type
            AND d.dc_cd     = b.gdls_cd
            AND d.EXPORT_IF_YN = 'Y'
            AND a.SOURCE_SYSTEM_CODE = 'SEW'
            AND A.SERVICE_FLAG = 'N'
            AND ( a.LCL_FLAG = 'Y' OR ifnull(a.loading_type_code,'X') NOT IN ('OAL','ODL', 'OME'))
          GROUP BY
                a.SOURCE_SYSTEM_CODE
              , IFNULL(NULLIF(a.CONTAINER_tXN_ID, ''), 0)
              , a.loading_type_code
              , a.container_no
              , DATE_FORMAT(a.pg_ymd, '%Y%m%d')
                          , a.interface_id, b.gdls_cd, a.dc_cd
                          , A.LCL_FLAG
                          , a.ATTRIBUTE5
          ORDER BY a.interface_id

On last saturday, I got java error.
(Could not read resultset: unexpected end of stream,
Could not read result set unexpected end of stream read 0 bytes from 4)
so, I do remove sql including max function in where clause. then it's solved.

I think that the problem line is "AND E.ORD_SYS_CD = IFNULL(MAX(nullif(carrier_code1,'')),MAX(SHIPPING_LINE2))". I think that the server crashed on SELECT executed via a prepared statement. Is it bug?

Comment by Elena Stepanova [ 2016-07-12 ]

Server crashing on a query, no matter whether the query is good or bad, is most definitely a bug.
That's great that you were able to locate the query. Now, if you could provide the database dump (at least for the 4 tables below), it would have been perfect.

TB_LM_MODEL
tb_li_gerp_sew_ship
tb_lm_codemapping
tb_lm_dcconfig

If you can't provide the dump, could you please run

SHOW CREATE TABLE TB_LM_MODEL;
SHOW INDEX IN TB_LM_MODEL;
SHOW TABLE STATUS LIKE 'TB_LM_MODEL';
SHOW TRIGGERS LIKE 'TB_LM_MODEL';
 
SHOW CREATE TABLE tb_li_gerp_sew_ship;
SHOW INDEX IN tb_li_gerp_sew_ship;
SHOW TABLE STATUS LIKE 'tb_li_gerp_sew_ship';
SHOW TRIGGERS LIKE 'tb_li_gerp_sew_ship';
 
SHOW CREATE TABLE tb_lm_codemapping;
SHOW INDEX IN tb_lm_codemapping;
SHOW TABLE STATUS LIKE 'tb_lm_codemapping';
SHOW TRIGGERS LIKE 'tb_lm_codemapping';
 
SHOW CREATE TABLE tb_lm_dcconfig;
SHOW INDEX IN tb_lm_dcconfig;
SHOW TABLE STATUS LIKE 'tb_lm_dcconfig';
SHOW TRIGGERS LIKE 'tb_lm_dcconfig';

and paste or attach the output?

Thanks.

Comment by sungwon.han [ 2016-07-12 ]

I confirm that PreparedStatement SELECT using max fucntion(group function) in where cluase make singal 11.
My team use mariadb-jdbc-1.4.6 that useServerPrepStmts's default values is true. But maraidb-jdbc-1.2.6(<1.3.0) and mysql-5.1.39 jdbc connector did not make signal 11 error. I find that those connector' parameter(useServerPrepStmts)'s values is false. So it did not make signal error.
My team applied the parameter is false. Then it it did not make signal error!
Probably, it is bug. I provie the create table sql to determine my case. Thanks.

Comment by sungwon.han [ 2016-07-15 ]

Dear, Elena Stepanova.
We wanna solve this problem to patch mariadb until next week. How it is going this problem?

Comment by Elena Stepanova [ 2016-07-15 ]

gks3117,
Thanks for the data, I was able to reproduce the problem.
As a workaround, if you are able to modify the application, you can replace nullif in the fragment of the query that you identified with IF:

- AND E.ORD_SYS_CD =  IFNULL(MAX(nullif(carrier_code1,'')),MAX(SHIPPING_LINE2))
+ AND E.ORD_SYS_CD =  IFNULL(MAX(if(carrier_code1='',null,carrier_code1)),MAX(SHIPPING_LINE2))

It should allow to avoid the crash until the bug is fixed.

Comment by Elena Stepanova [ 2016-07-15 ]

Smaller test case

CREATE TABLE t1 (f1 VARCHAR(10), f2 VARCHAR(40));
CREATE TABLE t2 (f3 VARCHAR(20));
PREPARE stmt FROM "
  SELECT (
    SELECT IFNULL(f3,4) FROM t2 
    WHERE IFNULL(NULLIF(f1,''),1)
  ) AS sq
  FROM t1
  GROUP BY f2
";
EXECUTE stmt;

Stack trace from 10.1 commit 12ac3ee11e8bc433432a2b3ba1bb79ea2e2d9965

#2  0x00007fb90c5530a9 in handle_fatal_signal (sig=11) at /data/src/10.1/sql/signal_handler.cc:294
#3  <signal handler called>
#4  0x00007fb90c5b7849 in Item_func::fix_fields (this=0x7fb900856bc8, thd=0x7fb902bf9070, ref=0x7fb900856ee0) at /data/src/10.1/sql/item_func.cc:208
#5  0x00007fb90c5b7869 in Item_func::fix_fields (this=0x7fb900856e58, thd=0x7fb902bf9070, ref=0x7fb9008e2ce0) at /data/src/10.1/sql/item_func.cc:208
#6  0x00007fb90c2c22ac in setup_conds (thd=0x7fb902bf9070, tables=0x7fb9008563f0, leaves=..., conds=0x7fb9008e2ce0) at /data/src/10.1/sql/sql_base.cc:8629
#7  0x00007fb90c3a03bd in setup_without_group (thd=0x7fb902bf9070, ref_pointer_array=0x7fb9008579f8, tables=0x7fb9008563f0, leaves=..., fields=..., all_fields=..., conds=0x7fb9008e2ce0, order=0x0, group=0x0, hidden_group_fields=0x7fb9008e2bc0, reserved=0x7fb90085591c) at /data/src/10.1/sql/sql_select.cc:645
#8  0x00007fb90c357e40 in JOIN::prepare (this=0x7fb9008e2898, rref_pointer_array=0x7fb9008558f8, tables_init=0x7fb9008563f0, wild_num=0, conds_init=0x7fb900856e58, og_num=0, order_init=0x0, skip_order_by=false, group_init=0x0, having_init=0x0, proc_param_init=0x0, select_lex_arg=0x7fb900855680, unit_arg=0x7fb9008559e0) at /data/src/10.1/sql/sql_select.cc:799
#9  0x00007fb90c600571 in subselect_single_select_engine::prepare (this=0x7fb900857060, thd=0x7fb902bf9070) at /data/src/10.1/sql/item_subselect.cc:3561
#10 0x00007fb90c5f6540 in Item_subselect::fix_fields (this=0x7fb900856f28, thd_param=0x7fb902bf9070, ref=0x7fb9008570b0) at /data/src/10.1/sql/item_subselect.cc:262
#11 0x00007fb90c2c03ad in setup_fields (thd=0x7fb902bf9070, ref_pointer_array=0x7fb900857868, fields=..., mark_used_columns=MARK_COLUMNS_READ, sum_func_list=0x7fb9008e2678, allow_sum_func=true) at /data/src/10.1/sql/sql_base.cc:7901
#12 0x00007fb90c357d7b in JOIN::prepare (this=0x7fb9008e2318, rref_pointer_array=0x7fb900854ad0, tables_init=0x7fb9008570f8, wild_num=0, conds_init=0x0, og_num=1, order_init=0x0, skip_order_by=false, group_init=0x7fb9008577f8, having_init=0x0, proc_param_init=0x0, select_lex_arg=0x7fb900854858, unit_arg=0x7fb900854158) at /data/src/10.1/sql/sql_select.cc:795
#13 0x00007fb90c3614a7 in mysql_select (thd=0x7fb902bf9070, rref_pointer_array=0x7fb900854ad0, tables=0x7fb9008570f8, wild_num=0, fields=..., conds=0x0, og_num=1, order=0x0, group=0x7fb9008577f8, having=0x0, proc_param=0x0, select_options=2416216832, result=0x7fb900857848, unit=0x7fb900854158, select_lex=0x7fb900854858) at /data/src/10.1/sql/sql_select.cc:3429
#14 0x00007fb90c35702c in handle_select (thd=0x7fb902bf9070, lex=0x7fb900854090, result=0x7fb900857848, setup_tables_done_option=0) at /data/src/10.1/sql/sql_select.cc:384
#15 0x00007fb90c327382 in execute_sqlcom_select (thd=0x7fb902bf9070, all_tables=0x7fb9008570f8) at /data/src/10.1/sql/sql_parse.cc:5894
#16 0x00007fb90c31d28f in mysql_execute_command (thd=0x7fb902bf9070) at /data/src/10.1/sql/sql_parse.cc:2960
#17 0x00007fb90c344ec6 in Prepared_statement::execute (this=0x7fb900b44470, expanded_query=0x7fb90d275a10, open_cursor=false) at /data/src/10.1/sql/sql_prepare.cc:4269
#18 0x00007fb90c343d37 in Prepared_statement::execute_loop (this=0x7fb900b44470, expanded_query=0x7fb90d275a10, open_cursor=false, packet=0x0, packet_end=0x0) at /data/src/10.1/sql/sql_prepare.cc:3901
#19 0x00007fb90c341dcf in mysql_sql_stmt_execute (thd=0x7fb902bf9070) at /data/src/10.1/sql/sql_prepare.cc:3033
#20 0x00007fb90c31d2c0 in mysql_execute_command (thd=0x7fb902bf9070) at /data/src/10.1/sql/sql_parse.cc:2971
#21 0x00007fb90c32aaba in mysql_parse (thd=0x7fb902bf9070, rawbuf=0x7fb9008e2088 "EXECUTE stmt", length=12, parser_state=0x7fb90d2765e0) at /data/src/10.1/sql/sql_parse.cc:7314
#22 0x00007fb90c3194e4 in dispatch_command (command=COM_QUERY, thd=0x7fb902bf9070, packet=0x7fb90453e071 "EXECUTE stmt", packet_length=12) at /data/src/10.1/sql/sql_parse.cc:1486
#23 0x00007fb90c31821b in do_command (thd=0x7fb902bf9070) at /data/src/10.1/sql/sql_parse.cc:1107
#24 0x00007fb90c44de63 in do_handle_one_connection (thd_arg=0x7fb902bf9070) at /data/src/10.1/sql/sql_connect.cc:1350
#25 0x00007fb90c44dbc7 in handle_one_connection (arg=0x7fb902bf9070) at /data/src/10.1/sql/sql_connect.cc:1262
#26 0x00007fb90c72f090 in pfs_spawn_thread (arg=0x7fb909027ef0) at /data/src/10.1/storage/perfschema/pfs.cc:1860
#27 0x00007fb90ba190a4 in start_thread (arg=0x7fb90d277b00) at pthread_create.c:309
#28 0x00007fb909bd187d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

The problem appeared in 10.1 tree with the following revision:

commit ce40ccaf24af2fe395f541cb1079256de8727ccd
Author: Alexander Barkov <bar@mariadb.org>
Date:   Thu Jan 28 13:58:39 2016 +0400 
 
    MDEV-9181 (NULLIF(count(table.col)), 0) gives wrong result on 10.1.x
    Wrapping args[0] and args[2] into an Item_cache for aggregate functions.

The original test case is attached.
mdev10347.test

Comment by Alexander Barkov [ 2016-11-02 ]

Even small test

DROP TABLE IF EXISTS t1,t2;
CREATE TABLE t1 (f1 VARCHAR(10), f2 VARCHAR(40));
CREATE TABLE t2 (f3 VARCHAR(20));
PREPARE stmt FROM "
  SELECT (
    SELECT NULL FROM t2
    WHERE NULLIF(f1,'')
  ) AS sq
  FROM t1
  GROUP BY f2
";
EXECUTE stmt;

Generated at Thu Feb 08 07:41:31 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.