[MDEV-18308] Crash when starting with huge error_log file - Jira

XML

Word

Printable

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Major
Resolution: Incomplete
Affects Version/s: 10.3.12
Fix Version/s: N/A
Component/s: wsrep
Labels:
None
Environment:
Ubuntu 18.04.1 LTS

Description

Just ran into a very weird issue when starting a mariadb within a galera cluster that had a huge error_log file.
Basically two of the nodes had some issues and filled thir error_log file with a lot of data within just a couple of hours, it was full of

Record lock, heap no 322 PHYSICAL RECORD: n_fields 5; compact format; info bits 0

 0: len 6; hex 00000001e753; asc      S;;

 1: len 6; hex 000000000000; asc       ;;

 2: len 7; hex 80000000000000; asc        ;;

 3: len 4; hex 000004a0; asc     ;;

 4: len 4; hex 00002dda; asc   - ;;

Record lock, heap no 323 PHYSICAL RECORD: n_fields 5; compact format; info bits 0

 0: len 6; hex 00000001e754; asc      T;;

 1: len 6; hex 000000000000; asc       ;;

 2: len 7; hex 80000000000000; asc        ;;

 3: len 4; hex 0000048e; asc     ;;

 4: len 4; hex 00002ddd; asc   - ;;

Which itself is not the problem as the one of the galera nodes was still running without a problem. But the error_log file was 7GB big (logrotate would have rotated it, but only after 24h).

We then started mariadb and it crashed with:

2019-01-18 23:14:31 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.

2019-01-18 23:14:31 2 [Note] WSREP: REPL Protocols: 9 (4, 2)

2019-01-18 23:14:31 2 [Note] WSREP: Assign initial position for certification: 6718718, protocol version: 4

2019-01-18 23:14:31 0 [Note] WSREP: Service thread queue flushed.

2019-01-18 23:14:31 2 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (c10bbe42-14dd-11e9-965c-ba31e4d39b18): 1 (Operation not permitted)

	 at galera/src/replicator_str.cpp:prepare_for_IST():482. IST will be unavailable.

2019-01-18 23:14:31 0 [Note] WSREP: Member 1.0 (galera2) requested state transfer from '*any*'. Selected 0.0 (galera1)(SYNCED) as donor.

2019-01-18 23:14:31 0 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 6718721)

2019-01-18 23:14:31 2 [Note] WSREP: Requesting state transfer: success, donor: 0

2019-01-18 23:14:31 2 [Note] WSREP: GCache history reset: 00000000-0000-0000-0000-000000000000:0 -> c10bbe42-14dd-11e9-965c-ba31e4d39b18:6718718

2019-01-18 23:14:33 0 [Note] WSREP: (6d41a12c, 'tcp://0.0.0.0:4567') connection to peer 6d41a12c with addr tcp://172.21.42.131:4567 timed out, no messages seen in PT3S

2019-01-18 23:14:33 0 [Note] WSREP: (6d41a12c, 'tcp://0.0.0.0:4567') turning message relay requesting off

2019-01-18 23:16:18 0 [Note] WSREP: 0.0 (galera1): State transfer to 1.0 (galera2) complete.

2019-01-18 23:16:18 0 [Note] WSREP: Member 0.0 (galera1) synced with group.

WSREP_SST: [INFO] Joiner cleanup. rsync PID: 25756 (20190118 23:16:18.814)

WSREP_SST: [INFO] Joiner cleanup done. (20190118 23:16:19.323)

2019-01-18 23:16:19 0 [Note] WSREP: SST complete, seqno: 6718724

2019-01-18 23:16:19 0 [Note] InnoDB: Using Linux native AIO

2019-01-18 23:16:19 0 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins

2019-01-18 23:16:19 0 [Note] InnoDB: Uses event mutexes

2019-01-18 23:16:19 0 [Note] InnoDB: Compressed tables use zlib 1.2.11

2019-01-18 23:16:19 0 [Note] InnoDB: Number of pools: 1

2019-01-18 23:16:19 0 [Note] InnoDB: Using SSE2 crc32 instructions

2019-01-18 23:16:19 0 [Note] InnoDB: Initializing buffer pool, total size = 4G, instances = 4, chunk size = 128M

2019-01-18 23:16:19 0 [Note] InnoDB: Completed initialization of buffer pool

2019-01-18 23:16:19 0 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().

2019-01-18 23:16:20 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=57817822986

2019-01-18 23:16:20 0 [Note] InnoDB: Starting final batch to recover 581 pages from redo log.

2019-01-18 23:16:45 0 [Note] InnoDB: 128 out of 128 rollback segments are active.

2019-01-18 23:16:45 0 [Note] InnoDB: Creating shared tablespace for temporary tables

2019-01-18 23:16:45 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...

2019-01-18 23:16:45 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.

2019-01-18 23:16:45 0 [Note] InnoDB: Waiting for purge to start

2019-01-18 23:16:45 0 [Note] InnoDB: 10.3.12 started; log sequence number 57819083632; transaction id 22266838

2019-01-18 23:16:45 0 [Note] InnoDB: Loading buffer pool(s) from /var/lib/mysql/ib_buffer_pool

2019-01-18 23:16:45 0 [Note] InnoDB: Cannot open '/var/lib/mysql/ib_buffer_pool' for reading: No such file or directory

2019-01-18 23:16:45 0 [Note] Plugin 'FEEDBACK' is disabled.

2019-01-18 23:16:45 0 [Warning] Failed to setup SSL

2019-01-18 23:16:45 0 [Warning] SSL error: SSL_CTX_set_default_verify_paths failed

2019-01-18 23:16:45 0 [Warning] SSL error: error:02001002:system library:fopen:No such file or directory

2019-01-18 23:16:45 0 [Warning] SSL error: error:2006D080:BIO routines:BIO_new_file:no such file

2019-01-18 23:16:45 0 [Warning] SSL error: error:0B084002:x509 certificate routines:X509_load_cert_crl_file:system lib

2019-01-18 23:16:45 0 [Note] Server socket created on IP: '::'.

2019-01-18 23:16:45 0 [Note] /usr/sbin/mysqld (initiated by: unknown): Normal shutdown

2019-01-18 23:16:45 0 [Note] WSREP: Stop replication

2019-01-18 23:16:45 0 [Note] WSREP: Closing send monitor...

2019-01-18 23:16:45 0 [Note] WSREP: Closed send monitor.

2019-01-18 23:16:45 0 [Note] WSREP: gcomm: terminating thread

2019-01-18 23:16:45 0 [Note] WSREP: gcomm: joining thread

2019-01-18 23:16:45 0 [Note] WSREP: gcomm: closing backend

2019-01-18 23:16:45 0 [Note] WSREP: view(view_id(NON_PRIM,3f8dc445,97) memb {

	6d41a12c,0

} joined {

} left {

} partitioned {

	3f8dc445,0

	dc9c1631,0

})

2019-01-18 23:16:45 0 [Note] WSREP: view((empty))

2019-01-18 23:16:45 0 [Note] WSREP: gcomm: closed

2019-01-18 23:16:45 0 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1

2019-01-18 23:16:45 0 [Note] WSREP: Flow-control interval: [4096, 4096]

2019-01-18 23:16:45 0 [Note] WSREP: Trying to continue unpaused monitor

2019-01-18 23:16:45 0 [Note] WSREP: Received NON-PRIMARY.

2019-01-18 23:16:45 0 [Note] WSREP: Shifting JOINER -> OPEN (TO: 6720610)

2019-01-18 23:16:45 0 [Note] WSREP: Received self-leave message.

2019-01-18 23:16:45 0 [Note] WSREP: Flow-control interval: [4096, 4096]

2019-01-18 23:16:45 0 [Note] WSREP: Trying to continue unpaused monitor

2019-01-18 23:16:45 0 [Note] WSREP: Received SELF-LEAVE. Closing connection.

2019-01-18 23:16:45 0 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 6720610)

2019-01-18 23:16:45 0 [Note] WSREP: RECV thread exiting 0: Success

2019-01-18 23:16:45 0 [Note] WSREP: recv_thread() joined.

2019-01-18 23:16:45 0 [Note] WSREP: Closing replication queue.

2019-01-18 23:16:45 0 [Note] WSREP: Closing slave action queue.

2019-01-18 23:16:45 0 [Note] WSREP: Signalling provider to continue.

2019-01-18 23:16:45 0 [Note] WSREP: SST received: c10bbe42-14dd-11e9-965c-ba31e4d39b18:6718724

2019-01-18 23:16:45 2 [ERROR] WSREP: gcs/src/gcs.cpp:_join():892: Sending JOIN failed: -103 (Software caused connection abort).

2019-01-18 23:16:45 0 [ERROR] WSREP: Trying to launch slave threads before creating connection at 'gcomm://172.21.42.110,172.21.42.131,172.21.42.113'

mysqld: /home/buildbot/buildbot/build/mariadb-10.3.12/sql/wsrep_thd.cc:446: void wsrep_create_appliers(long int): Assertion `0' failed.

190118 23:16:45 [ERROR] mysqld got signal 6 ;

This could be because you hit a bug. It is also possible that this binary

or one of the libraries it was linked against is corrupt, improperly built,

or misconfigured. This error can also be caused by malfunctioning hardware.

To report this bug, see https://mariadb.com/kb/en/reporting-bugs

We will try our best to scrape up some info that will hopefully help

diagnose the problem, but since we have already crashed,

something is definitely wrong and this may fail.

Server version: 10.3.12-MariaDB-1:10.3.12+maria~bionic-log

key_buffer_size=16777216

read_buffer_size=131072

max_used_connections=0

max_threads=1002

thread_count=7

It is possible that mysqld could use up to

key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 4271151 K  bytes of memory

Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0

Attempting backtrace. You can use the following information to find out

where mysqld died. If you see no messages after this, something went

terribly wrong...

2019-01-18 23:16:45 2 [ERROR] WSREP: Failed to JOIN the cluster after SST

stack_bottom = 0x0 thread_stack 0x40000

*** buffer overflow detected ***: /usr/sbin/mysqld terminated

After deleting the log file mariadb started without any issues.
This happened on two mariadb servers, both always crashed if the error_log file was bigger than 7GB.

Attachments

Activity

People

Assignee:: Jan Lindström (Inactive)

Reporter:: Michael Schmid

Votes:: 0 Vote for this issue

Watchers:: 2 Start watching this issue

Dates

Created:: 2019-01-18 22:36

Updated:: 2022-01-24 01:02

Resolved:: 2022-01-24 01:02

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.