[MDEV-8325] Server crash in pagecache_fwrite after killing query consuming a lot of tmp disk space

Details

Type: Bug
Status: Confirmed
Priority: Major
Resolution: Unresolved
Affects Version/s: 10.0.14, 10.0(EOL)
Fix Version/s: 10.4(EOL)
Component/s: Admin statements, Optimizer, Storage Engine - Aria
Labels:
None
Environment:
PROD

Description

Killing a long-running query caused MySQL to crash:

| 7552649 | ttahon          | 10.10.0.55:49241  | PRODUCTION | Killed  |   56220 | Copying to tmp table on disk | SELECT

           PROD_PRODUITS.LIBELLE

          AS libelle,

       PROD_COMMANDES.D |    0.000 |

The above query created a very large temporary file and filled the disk:

150616 20:13:41 [Warning] mysqld: Disk is full writing '/tmp/#sql_2741_0.MAD' (Errcode: 28 "No space left on device"). Waiting for someone to free space... (Expect up to 60 secs delay for server to continue after freeing disk space)

150616 20:13:41 [Warning] mysqld: Retry in 60 secs. Message reprinted in 600 secs

150616 20:14:41 [Warning] mysqld: Disk is full writing '/tmp/#sql_2741_0.MAD' (Errcode: 28 "No space left on device"). Waiting for someone to free space... (Expect up to 60 secs delay for server to continue after freeing disk space)

150616 20:14:41 [Warning] mysqld: Retry in 60 secs. Message reprinted in 600 secs

150616 20:15:41 [Warning] mysqld: Disk is full writing '/tmp/#sql_2741_0.MAD' (Errcode: 28 "No space left on device"). Waiting for someone to free space... (Expect up to 60 secs delay for server to continue after freeing disk space)

150616 20:15:41 [Warning] mysqld: Retry in 60 secs. Message reprinted in 600 secs

150616 20:16:26 [Warning] mysqld: Disk is full writing '/tmp/#sql_2741_1.MAD' (Errcode: 28 "No space left on device"). Waiting for someone to free space... (Expect up to 60 secs delay for server to continue after freeing disk space)

150616 20:16:26 [Warning] mysqld: Retry in 60 secs. Message reprinted in 600 secs

150616 20:25:41 [Warning] mysqld: Retry in 60 secs. Message reprinted in 600 secs

150616 20:26:26 [Warning] mysqld: Retry in 60 secs. Message reprinted in 600 secs

150616 20:35:41 [Warning] mysqld: Retry in 60 secs. Message reprinted in 600 secs

150616 20:36:26 [Warning] mysqld: Retry in 60 secs. Message reprinted in 600 secs

150616 20:45:41 [Warning] mysqld: Retry in 60 secs. Message reprinted in 600 secs
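
For anyone triaging a similar error log, the disk-full warnings above can be pulled out with a small script. This is only a sketch: the regular expression is inferred from the warning lines quoted in this report, the function name disk_full_events is made up for illustration, and the sample lines are shortened copies of the log above.

```python
import re
from datetime import datetime

# Pattern inferred from the warnings quoted in this report (an assumption,
# not an official description of the error-log format).
DISK_FULL = re.compile(
    r"^(?P<ts>\d{6} +\d{1,2}:\d{2}:\d{2}) \[Warning\] mysqld: "
    r"Disk is full writing '(?P<path>[^']+)' \(Errcode: (?P<errcode>\d+)"
)


def disk_full_events(lines):
    """Return (timestamp, file path, errcode) for each disk-full warning."""
    events = []
    for line in lines:
        match = DISK_FULL.match(line)
        if match is None:
            continue  # e.g. the "Retry in 60 secs" follow-up lines
        # Normalise whitespace, then parse the YYMMDD HH:MM:SS prefix.
        stamp = " ".join(match.group("ts").split())
        when = datetime.strptime(stamp, "%y%m%d %H:%M:%S")
        events.append((when, match.group("path"), int(match.group("errcode"))))
    return events


# Shortened copies of lines from the log excerpt above.
sample = [
    "150616 20:13:41 [Warning] mysqld: Disk is full writing "
    "'/tmp/#sql_2741_0.MAD' (Errcode: 28 \"No space left on device\").",
    "150616 20:13:41 [Warning] mysqld: Retry in 60 secs. "
    "Message reprinted in 600 secs",
    "150616 20:16:26 [Warning] mysqld: Disk is full writing "
    "'/tmp/#sql_2741_1.MAD' (Errcode: 28 \"No space left on device\").",
]

events = disk_full_events(sample)
for when, path, errcode in events:
    print(when.isoformat(), path, errcode)
```

Grouping the result by path shows which temporary Aria data files (.MAD) were blocked and when, here #sql_2741_0 and #sql_2741_1.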

We then killed the long-running query:

MariaDB [PRODUCTION]> kill 7552649;

Query OK, 0 rows affected (0.00 sec)

MariaDB [PRODUCTION]> show processlist;

+---------+-----------------+-------------------+------------+---------+---------+------------------------------+------------------------------------------------------------------------------------------------------+----------+

| Id      | User            | Host              | db         | Command | Time    | State                        | Info                                                                                                 | Progress |

+---------+-----------------+-------------------+------------+---------+---------+------------------------------+------------------------------------------------------------------------------------------------------+----------+

|       1 | system user     |                   | NULL       | Sleep   | 3698081 | wsrep aborter idle           | NULL                                                                                                 |    0.000 |

|       2 | system user     |                   | NULL       | Sleep   |       0 | closing tables               | NULL                                                                                                 |    0.000 |

|       4 | event_scheduler | localhost         | NULL       | Daemon  | 3698078 | Waiting on empty queue       | NULL                                                                                                 |    0.000 |

| 7552649 | ttahon          | 10.10.0.55:49241  | PRODUCTION | Killed  |   56220 | Copying to tmp table on disk | SELECT

           PROD_PRODUITS.LIBELLE

          AS libelle,

       PROD_COMMANDES.D |    0.000 |

| 7562009 | user4qlickview  | 10.10.0.55:42983  | PRODUCTION | Query   |   50335 | converting HEAP to Aria      | SELECT distinct

        ID_PROD_ITEM as 'ID PROD ITEM',

        PAGES

A few minutes later, the MySQL server crashed:

150617 10:15:42 [ERROR] mysqld got signal 11 ;

This could be because you hit a bug. It is also possible that this binary

or one of the libraries it was linked against is corrupt, improperly built,

or misconfigured. This error can also be caused by malfunctioning hardware.

To report this bug, see http://kb.askmonty.org/en/reporting-bugs

We will try our best to scrape up some info that will hopefully help

diagnose the problem, but since we have already crashed,

something is definitely wrong and this may fail.

Server version: 10.0.14-MariaDB-1~precise-wsrep-log

key_buffer_size=134217728

read_buffer_size=131072

max_used_connections=15

max_threads=502

thread_count=10

It is possible that mysqld could use up to

key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 5346411 K  bytes of memory

Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0x7f0adfa93008

Attempting backtrace. You can use the following information to find out

where mysqld died. If you see no messages after this, something went

terribly wrong...

stack_bottom = 0x7f11c1557da0 thread_stack 0x48000

/usr/sbin/mysqld(my_print_stacktrace+0x2b)[0xb9025b]

/usr/sbin/mysqld(handle_fatal_signal+0x398)[0x741088]

/lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0)[0x7f11c12dfcb0]

Trying to get some variables.

Some pointers may be invalid and cause the dump to abort.

Query (0x7f0ac2c8b020): is an invalid pointer

Connection ID (thread ID): 7552649

Status: KILL_CONNECTION

Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on

The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains

information that should help you find out what is causing the crash.

150617 10:15:43 mysqld_safe Number of processes running now: 0

150617 10:15:43 mysqld_safe WSREP: not restarting wsrep node automatically

150617 10:15:43 mysqld_safe mysqld from pid file /data/mysql/data/ifactory-sart-db-node-03.pid ended

150617 10:20:54 mysqld_safe Starting mysqld daemon with databases from /data/mysql/data

150617 10:20:54 mysqld_safe WSREP: Running position recovery with --log_error='/data/mysql/data/wsrep_recovery.l0XA1P' --pid-file='/data/mysql/data/ifactory-sart-db-node-03-recover.pid'

150617 10:21:14 mysqld_safe WSREP: Recovered position 10f20163-668b-11e4-bfda-7ef6da2a3a6b:408209687

150617 10:21:14 [Note] WSREP: wsrep_start_position var submitted: '10f20163-668b-11e4-bfda-7ef6da2a3a6b:408209687'

150617 10:21:14 [Note] WSREP: Read nil XID from storage engines, skipping position init

150617 10:21:14 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/galera/libgalera_smm.so'

150617 10:21:14 [Note] WSREP: wsrep_load(): Galera 25.3.5-wheezy(rXXXX) by Codership Oy <info@codership.com> loaded successfully.

150617 10:21:14 [Note] WSREP: CRC-32C: using hardware acceleration.

150617 10:21:14 [Note] WSREP: Found saved state: 10f20163-668b-11e4-bfda-7ef6da2a3a6b:-1

150617 10:21:14 [Note] WSREP: Passing config to GCS: base_host = 10.10.19.3; base_port = 4567; cert.log_conflicts = no; debug = no; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 1; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /data/mysql/data/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /data/mysql/data//galera.cache; gcache.page_size = 128M; gcache.size = 10G; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false; pc.ignore_quorum = false; pc.ignore_sb = false; pc.npvo = false; pc.version = 0; pc.wait_prim = true; pc.wait_prim_timeout = P30S; pc.weight = 1; proton

150617 10:21:14 [Note] WSREP: Service thread queue flushed.

Issue Links

duplicates

MDEV-8385 Roll back a transaction that fills up the tmp directory and don't crash the mysqld (Closed)

relates to

MDEV-8004 key_buffer related crashes in MyISAM table check, stacktrace in error log truncated (Closed)
