[MDEV-14493] Mariadb nodes freeze with 1927 internal error Created: 2017-11-24  Updated: 2018-02-28  Resolved: 2018-02-28

Status: Closed
Project: MariaDB Server
Component/s: Galera, Server
Affects Version/s: 10.2.10
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Alon O Assignee: Unassigned
Resolution: Incomplete Votes: 0
Labels: need_feedback
Environment:

Ubuntu 16, 10.2.10+maria~xenial amd64



 Description   

When the system is loaded (only happens in production), I get the following log with a database freeze, upon restarting the mariadb process, I sometimes need to restore a backup due to partial data corruption that occurs (as far as the application - drupal is concerned).

Nov 23 21:42:55 websrv4 mysqld[1725]: 2017-11-23 21:42:55 140081879742208 [Warning] WSREP: BF applier failed to open_and_lock_tables: 1927, fatal: 0 wsrep = (exec_mode: 1 conflict_state: 5 seqno: 903792)
Nov 23 21:42:55 websrv4 mysqld[1725]: 2017-11-23 21:42:55 140081879742208 [ERROR] Slave SQL: Error executing row event: 'Connection was killed', Internal MariaDB error code: 1927
Nov 23 21:42:55 websrv4 mysqld[1725]: 2017-11-23 21:42:55 140081879742208 [Warning] WSREP: RBR event 3 Write_rows_v1 apply warning: 1927, 903792
Nov 23 21:42:55 websrv4 mysqld[1725]: 2017-11-23 21:42:55 140081879742208 [Note] WSREP: applier aborted: 903792
Nov 23 21:42:55 websrv4 mysqld[1725]: 2017-11-23 21:42:55 140081879742208 [Warning] WSREP: Failed to apply app buffer: seqno: 903792, status: 1
Nov 23 21:42:55 websrv4 mysqld[1725]: #011 at galera/src/trx_handle.cpp:apply():351
Nov 23 21:42:55 websrv4 mysqld[1725]: Retrying 2th time
Nov 23 21:42:55 websrv4 mysqld[1725]: 2017-11-23 21:42:55 140081879742208 [Warning] WSREP: BF applier failed to open_and_lock_tables: 1927, fatal: 0 wsrep = (exec_mode: 1 conflict_state: 5 seqno: 903792)
Nov 23 21:42:55 websrv4 mysqld[1725]: 2017-11-23 21:42:55 140081879742208 [ERROR] Slave SQL: Error executing row event: 'Connection was killed', Internal MariaDB error code: 1927
Nov 23 21:42:55 websrv4 mysqld[1725]: 2017-11-23 21:42:55 140081879742208 [Warning] WSREP: RBR event 3 Write_rows_v1 apply warning: 1927, 903792
Nov 23 21:42:55 websrv4 mysqld[1725]: 2017-11-23 21:42:55 140081879742208 [Note] WSREP: applier aborted: 903792
Nov 23 21:42:55 websrv4 mysqld[1725]: 2017-11-23 21:42:55 140081879742208 [Warning] WSREP: Failed to apply app buffer: seqno: 903792, status: 1
Nov 23 21:42:55 websrv4 mysqld[1725]: #011 at galera/src/trx_handle.cpp:apply():351
Nov 23 21:42:55 websrv4 mysqld[1725]: Retrying 3th time
Nov 23 21:42:55 websrv4 mysqld[1725]: 2017-11-23 21:42:55 140081879742208 [Warning] WSREP: BF applier failed to open_and_lock_tables: 1927, fatal: 0 wsrep = (exec_mode: 1 conflict_state: 5 seqno: 903792)
Nov 23 21:42:55 websrv4 mysqld[1725]: 2017-11-23 21:42:55 140081879742208 [ERROR] Slave SQL: Error executing row event: 'Connection was killed', Internal MariaDB error code: 1927
Nov 23 21:42:55 websrv4 mysqld[1725]: 2017-11-23 21:42:55 140081879742208 [Warning] WSREP: RBR event 3 Write_rows_v1 apply warning: 1927, 903792
Nov 23 21:42:55 websrv4 mysqld[1725]: 2017-11-23 21:42:55 140081879742208 [Note] WSREP: applier aborted: 903792
Nov 23 21:42:55 websrv4 mysqld[1725]: 2017-11-23 21:42:55 140081879742208 [Warning] WSREP: Failed to apply app buffer: seqno: 903792, status: 1
Nov 23 21:42:55 websrv4 mysqld[1725]: #011 at galera/src/trx_handle.cpp:apply():351
Nov 23 21:42:55 websrv4 mysqld[1725]: Retrying 4th time
Nov 23 21:42:55 websrv4 mysqld[1725]: 2017-11-23 21:42:55 140081879742208 [Warning] WSREP: BF applier failed to open_and_lock_tables: 1927, fatal: 0 wsrep = (exec_mode: 1 conflict_state: 5 seqno: 903792)
Nov 23 21:42:55 websrv4 mysqld[1725]: 2017-11-23 21:42:55 140081879742208 [ERROR] Slave SQL: Error executing row event: 'Connection was killed', Internal MariaDB error code: 1927
Nov 23 21:42:55 websrv4 mysqld[1725]: 2017-11-23 21:42:55 140081879742208 [Warning] WSREP: RBR event 3 Write_rows_v1 apply warning: 1927, 903792
Nov 23 21:42:55 websrv4 mysqld[1725]: 2017-11-23 21:42:55 140081879742208 [Note] WSREP: applier aborted: 903792
Nov 23 21:42:55 websrv4 mysqld[1725]: 2017-11-23 21:42:55 140081879742208 [Warning] WSREP: failed to replay trx: source: 6ba51620-d084-11e7-9d2d-d7d0542d8744 version: 3 local: 1 state: REPLAYING flags: 1 conn_id: 475 trx_id: 1929982 seqnos (l: 3092, g: 903792, s: 903752, d: 903791, ts: 851004290800)
Nov 23 21:42:55 websrv4 mysqld[1725]: 2017-11-23 21:42:55 140081879742208 [Warning] WSREP: Failed to apply trx 903792 4 times
Nov 23 21:42:55 websrv4 mysqld[1725]: 2017-11-23 21:42:55 140081879742208 [ERROR] WSREP: trx_replay failed for: 6, schema: xxdbnamexx, query: `<94>B<B4>#035V
Nov 23 21:42:55 websrv4 mysqld[1725]: 2017-11-23 21:42:55 140081879742208 [ERROR] Aborting
Nov 23 21:42:55 websrv4 mysqld[1725]: 2017-11-23 21:42:55 140082909173504 [Warning] Aborted connection 473 to db: 'xxxx' user: 'xxx' host: 'xxx' (Got an error reading communication packets)
Nov 23 21:42:55 websrv4 mysqld[1725]: 2017-11-23 21:42:55 140081878832896 [Warning] Aborted connection 478 to db: 'xxxx' user: 'xxx' host: 'xxx' (Got an error reading communication packets)
Nov 23 21:42:55 websrv4 mysqld[1725]: 2017-11-23 21:42:55 140081773995776 [Warning] Aborted connection 507 to db: 'xxxx' user: 'xxxx' host: 'xxxx' (Deadlock found when trying to get lock; try restarting transacti)
Nov 23 21:42:57 websrv4 mysqld[1725]: 2017-11-23 21:42:57 140082910992128 [Warning] Aborted connection 462 to db: 'xxxx' user: 'xxxx' host: 'xxxx' (Deadlock found when trying to get lock; try restarting transacti)
Nov 23 21:42:57 websrv4 mysqld[1725]: 2017-11-23 21:42:57 140082910082816 [Warning] Aborted connection 461 to db: 'xxxx' user: 'xxxx' host: 'xxxx' (Deadlock found when trying to get lock; try restarting transacti)
Nov 23 21:43:13 websrv4 mysqld[1725]: =====================================



 Comments   
Comment by Alon O [ 2017-11-26 ]

DB crashed today again, this is the error log:

Nov 26 12:50:25 websrv4 mysqld[22619]: 2017-11-26 12:50:25 140012249949952 [Warning] Aborted connection 3873 to db: 'proddb' user: 'CLIENT2' host: 'web3' (Got an error reading communication packets)
Nov 26 12:50:25 websrv4 mysqld[22619]: 2017-11-26 12:50:25 140012646905600 [Warning] Aborted connection 3874 to db: 'proddb' user: 'CLIENT2' host: 'web2' (Got an error reading communication packets)
Nov 26 12:50:26 websrv4 mysqld[22619]: 2017-11-26 12:50:26 140012646299392 [Warning] Aborted connection 3878 to db: 'proddb' user: 'CLIENT2' host: 'web2' (Got an error reading communication packets)
Nov 26 12:50:26 websrv4 mysqld[22619]: 171126 12:50:26 [ERROR] mysqld got signal 11 ;
Nov 26 12:50:26 websrv4 mysqld[22619]: This could be because you hit a bug. It is also possible that this binary
Nov 26 12:50:26 websrv4 mysqld[22619]: or one of the libraries it was linked against is corrupt, improperly built,
Nov 26 12:50:26 websrv4 mysqld[22619]: or misconfigured. This error can also be caused by malfunctioning hardware.
Nov 26 12:50:26 websrv4 mysqld[22619]: To report this bug, see https://mariadb.com/kb/en/reporting-bugs
Nov 26 12:50:26 websrv4 mysqld[22619]: We will try our best to scrape up some info that will hopefully help
Nov 26 12:50:26 websrv4 mysqld[22619]: diagnose the problem, but since we have already crashed,
Nov 26 12:50:26 websrv4 mysqld[22619]: something is definitely wrong and this may fail.
Nov 26 12:50:26 websrv4 mysqld[22619]: Server version: 10.2.10-MariaDB-10.2.10+maria~xenial-log
Nov 26 12:50:26 websrv4 mysqld[22619]: key_buffer_size=134217728
Nov 26 12:50:26 websrv4 mysqld[22619]: read_buffer_size=2097152
Nov 26 12:50:26 websrv4 mysqld[22619]: max_used_connections=10
Nov 26 12:50:26 websrv4 mysqld[22619]: max_threads=302
Nov 26 12:50:26 websrv4 mysqld[22619]: thread_count=19
Nov 26 12:50:26 websrv4 mysqld[22619]: It is possible that mysqld could use up to
Nov 26 12:50:26 websrv4 mysqld[22619]: key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1992937 K bytes of memory
Nov 26 12:50:26 websrv4 mysqld[22619]: Hope that's ok; if not, decrease some variables in the equation.
Nov 26 12:50:26 websrv4 mysqld[22619]: Thread pointer: 0x7f56bc0009a8
Nov 26 12:50:26 websrv4 mysqld[22619]: Attempting backtrace. You can use the following information to find out
Nov 26 12:50:26 websrv4 mysqld[22619]: where mysqld died. If you see no messages after this, something went
Nov 26 12:50:26 websrv4 mysqld[22619]: terribly wrong...
Nov 26 12:50:26 websrv4 mysqld[22619]: stack_bottom = 0x7f574c0ded18 thread_stack 0x49000
Nov 26 12:50:26 websrv4 mysqld[22619]: /usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x560b09dfd0ae]
Nov 26 12:50:26 websrv4 mysqld[22619]: /usr/sbin/mysqld(handle_fatal_signal+0x305)[0x560b09897085]
Nov 26 12:50:26 websrv4 mysqld[22619]: /lib/x86_64-linux-gnu/libpthread.so.0(+0x11390)[0x7f5752ce4390]
Nov 26 12:50:26 websrv4 mysqld[22619]: /usr/sbin/mysqld(_ZN13MYSQL_BIN_LOG13mark_xid_doneEmb+0x8b)[0x560b09963fdb]
Nov 26 12:50:26 websrv4 mysqld[22619]: /usr/sbin/mysqld(binlog_background_thread+0x32b)[0x560b0996464b]
Nov 26 12:50:26 websrv4 mysqld[22619]: /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba)[0x7f5752cda6ba]
Nov 26 12:50:26 websrv4 mysqld[22619]: /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f575238582d]
Nov 26 12:50:26 websrv4 mysqld[22619]: Trying to get some variables.
Nov 26 12:50:26 websrv4 mysqld[22619]: Some pointers may be invalid and cause the dump to abort.
Nov 26 12:50:26 websrv4 mysqld[22619]: Query (0x0):
Nov 26 12:50:26 websrv4 mysqld[22619]: Connection ID (thread ID): 8
Nov 26 12:50:26 websrv4 mysqld[22619]: Status: NOT_KILLED
Nov 26 12:50:26 websrv4 mysqld[22619]: Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on
Nov 26 12:50:26 websrv4 mysqld[22619]: The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
Nov 26 12:50:26 websrv4 mysqld[22619]: information that should help you find out what is causing the crash.
Nov 26 12:50:27 websrv4 systemd[1]: mariadb.service: Main process exited, code=dumped, status=11/SEGV
Nov 26 12:50:27 websrv4 systemd[1]: mariadb.service: Unit entered failed state.
Nov 26 12:50:27 websrv4 systemd[1]: mariadb.service: Failed with result 'core-dump'.
Nov 26 12:50:32 websrv4 systemd[1]: mariadb.service: Service hold-off time over, scheduling restart.
Nov 26 12:50:32 websrv4 systemd[1]: Stopped MariaDB database server.
Nov 26 12:50:32 websrv4 systemd[1]: Starting MariaDB database server...
Nov 26 12:50:37 websrv4 mysqld[7717]: WSREP: Recovered position d7d777bf-c235-11e7-ae1b-735d5455e9c8:1019175
Nov 26 12:50:37 websrv4 mysqld[7964]: 2017-11-26 12:50:37 139873024727232 [Note] /usr/sbin/mysqld (mysqld 10.2.10-MariaDB-10.2.10+maria~xenial-log) starting as process 7964 ...

Comment by Andrii Nikitin (Inactive) [ 2017-11-28 ]

This should be duplicate of MDEV-9510, which is marked fixed in 10.2.11 , which should be released soon.
Could you upgrade to 10.2.11 once it is available and confirm if the crash containing mark_xid_done still happening?

Alternatively, below are possible workarounds which worth to consider:

  • disable binary logging
  • investigate if some application(s) do `set sql_log_bin=0` . Usually monitoring or similar software may want to use it.
Comment by Alon O [ 2017-11-28 ]

Yes ofcourse, waiting for update regarding 10.2.11.
Thank you for your work on this.

Comment by Elena Stepanova [ 2018-01-27 ]

alon, have you experienced the problem after upgrading to 10.2.11+?

Generated at Thu Feb 08 08:14:01 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.