Details
-
Bug
-
Status: Closed (View Workflow)
-
Major
-
Resolution: Incomplete
-
10.3.12
-
None
-
Ubuntu 18.04.1 LTS
Description
Just ran into a very weird issue when starting a mariadb within a galera cluster that had a huge error_log file.
Basically two of the nodes had some issues and filled thir error_log file with a lot of data within just a couple of hours, it was full of
Record lock, heap no 322 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
|
0: len 6; hex 00000001e753; asc S;;
|
1: len 6; hex 000000000000; asc ;;
|
2: len 7; hex 80000000000000; asc ;;
|
3: len 4; hex 000004a0; asc ;;
|
4: len 4; hex 00002dda; asc - ;;
|
|
Record lock, heap no 323 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
|
0: len 6; hex 00000001e754; asc T;;
|
1: len 6; hex 000000000000; asc ;;
|
2: len 7; hex 80000000000000; asc ;;
|
3: len 4; hex 0000048e; asc ;;
|
4: len 4; hex 00002ddd; asc - ;;
|
Which itself is not the problem as the one of the galera nodes was still running without a problem. But the error_log file was 7GB big (logrotate would have rotated it, but only after 24h).
We then started mariadb and it crashed with:
2019-01-18 23:14:31 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
|
2019-01-18 23:14:31 2 [Note] WSREP: REPL Protocols: 9 (4, 2)
|
2019-01-18 23:14:31 2 [Note] WSREP: Assign initial position for certification: 6718718, protocol version: 4
|
2019-01-18 23:14:31 0 [Note] WSREP: Service thread queue flushed.
|
2019-01-18 23:14:31 2 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (c10bbe42-14dd-11e9-965c-ba31e4d39b18): 1 (Operation not permitted)
|
at galera/src/replicator_str.cpp:prepare_for_IST():482. IST will be unavailable.
|
2019-01-18 23:14:31 0 [Note] WSREP: Member 1.0 (galera2) requested state transfer from '*any*'. Selected 0.0 (galera1)(SYNCED) as donor.
|
2019-01-18 23:14:31 0 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 6718721)
|
2019-01-18 23:14:31 2 [Note] WSREP: Requesting state transfer: success, donor: 0
|
2019-01-18 23:14:31 2 [Note] WSREP: GCache history reset: 00000000-0000-0000-0000-000000000000:0 -> c10bbe42-14dd-11e9-965c-ba31e4d39b18:6718718
|
2019-01-18 23:14:33 0 [Note] WSREP: (6d41a12c, 'tcp://0.0.0.0:4567') connection to peer 6d41a12c with addr tcp://172.21.42.131:4567 timed out, no messages seen in PT3S
|
2019-01-18 23:14:33 0 [Note] WSREP: (6d41a12c, 'tcp://0.0.0.0:4567') turning message relay requesting off
|
2019-01-18 23:16:18 0 [Note] WSREP: 0.0 (galera1): State transfer to 1.0 (galera2) complete.
|
2019-01-18 23:16:18 0 [Note] WSREP: Member 0.0 (galera1) synced with group.
|
WSREP_SST: [INFO] Joiner cleanup. rsync PID: 25756 (20190118 23:16:18.814)
|
WSREP_SST: [INFO] Joiner cleanup done. (20190118 23:16:19.323)
|
2019-01-18 23:16:19 0 [Note] WSREP: SST complete, seqno: 6718724
|
2019-01-18 23:16:19 0 [Note] InnoDB: Using Linux native AIO
|
2019-01-18 23:16:19 0 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
|
2019-01-18 23:16:19 0 [Note] InnoDB: Uses event mutexes
|
2019-01-18 23:16:19 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
|
2019-01-18 23:16:19 0 [Note] InnoDB: Number of pools: 1
|
2019-01-18 23:16:19 0 [Note] InnoDB: Using SSE2 crc32 instructions
|
2019-01-18 23:16:19 0 [Note] InnoDB: Initializing buffer pool, total size = 4G, instances = 4, chunk size = 128M
|
2019-01-18 23:16:19 0 [Note] InnoDB: Completed initialization of buffer pool
|
2019-01-18 23:16:19 0 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
|
2019-01-18 23:16:20 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=57817822986
|
2019-01-18 23:16:20 0 [Note] InnoDB: Starting final batch to recover 581 pages from redo log.
|
2019-01-18 23:16:45 0 [Note] InnoDB: 128 out of 128 rollback segments are active.
|
2019-01-18 23:16:45 0 [Note] InnoDB: Creating shared tablespace for temporary tables
|
2019-01-18 23:16:45 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
|
2019-01-18 23:16:45 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
|
2019-01-18 23:16:45 0 [Note] InnoDB: Waiting for purge to start
|
2019-01-18 23:16:45 0 [Note] InnoDB: 10.3.12 started; log sequence number 57819083632; transaction id 22266838
|
2019-01-18 23:16:45 0 [Note] InnoDB: Loading buffer pool(s) from /var/lib/mysql/ib_buffer_pool
|
2019-01-18 23:16:45 0 [Note] InnoDB: Cannot open '/var/lib/mysql/ib_buffer_pool' for reading: No such file or directory
|
2019-01-18 23:16:45 0 [Note] Plugin 'FEEDBACK' is disabled.
|
2019-01-18 23:16:45 0 [Warning] Failed to setup SSL
|
2019-01-18 23:16:45 0 [Warning] SSL error: SSL_CTX_set_default_verify_paths failed
|
2019-01-18 23:16:45 0 [Warning] SSL error: error:02001002:system library:fopen:No such file or directory
|
2019-01-18 23:16:45 0 [Warning] SSL error: error:2006D080:BIO routines:BIO_new_file:no such file
|
2019-01-18 23:16:45 0 [Warning] SSL error: error:0B084002:x509 certificate routines:X509_load_cert_crl_file:system lib
|
2019-01-18 23:16:45 0 [Note] Server socket created on IP: '::'.
|
2019-01-18 23:16:45 0 [Note] /usr/sbin/mysqld (initiated by: unknown): Normal shutdown
|
2019-01-18 23:16:45 0 [Note] WSREP: Stop replication
|
2019-01-18 23:16:45 0 [Note] WSREP: Closing send monitor...
|
2019-01-18 23:16:45 0 [Note] WSREP: Closed send monitor.
|
2019-01-18 23:16:45 0 [Note] WSREP: gcomm: terminating thread
|
2019-01-18 23:16:45 0 [Note] WSREP: gcomm: joining thread
|
2019-01-18 23:16:45 0 [Note] WSREP: gcomm: closing backend
|
2019-01-18 23:16:45 0 [Note] WSREP: view(view_id(NON_PRIM,3f8dc445,97) memb {
|
6d41a12c,0
|
} joined {
|
} left {
|
} partitioned {
|
3f8dc445,0
|
dc9c1631,0
|
})
|
2019-01-18 23:16:45 0 [Note] WSREP: view((empty))
|
2019-01-18 23:16:45 0 [Note] WSREP: gcomm: closed
|
2019-01-18 23:16:45 0 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
|
2019-01-18 23:16:45 0 [Note] WSREP: Flow-control interval: [4096, 4096]
|
2019-01-18 23:16:45 0 [Note] WSREP: Trying to continue unpaused monitor
|
2019-01-18 23:16:45 0 [Note] WSREP: Received NON-PRIMARY.
|
2019-01-18 23:16:45 0 [Note] WSREP: Shifting JOINER -> OPEN (TO: 6720610)
|
2019-01-18 23:16:45 0 [Note] WSREP: Received self-leave message.
|
2019-01-18 23:16:45 0 [Note] WSREP: Flow-control interval: [4096, 4096]
|
2019-01-18 23:16:45 0 [Note] WSREP: Trying to continue unpaused monitor
|
2019-01-18 23:16:45 0 [Note] WSREP: Received SELF-LEAVE. Closing connection.
|
2019-01-18 23:16:45 0 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 6720610)
|
2019-01-18 23:16:45 0 [Note] WSREP: RECV thread exiting 0: Success
|
2019-01-18 23:16:45 0 [Note] WSREP: recv_thread() joined.
|
2019-01-18 23:16:45 0 [Note] WSREP: Closing replication queue.
|
2019-01-18 23:16:45 0 [Note] WSREP: Closing slave action queue.
|
2019-01-18 23:16:45 0 [Note] WSREP: Signalling provider to continue.
|
2019-01-18 23:16:45 0 [Note] WSREP: SST received: c10bbe42-14dd-11e9-965c-ba31e4d39b18:6718724
|
2019-01-18 23:16:45 2 [ERROR] WSREP: gcs/src/gcs.cpp:_join():892: Sending JOIN failed: -103 (Software caused connection abort).
|
2019-01-18 23:16:45 0 [ERROR] WSREP: Trying to launch slave threads before creating connection at 'gcomm://172.21.42.110,172.21.42.131,172.21.42.113'
|
mysqld: /home/buildbot/buildbot/build/mariadb-10.3.12/sql/wsrep_thd.cc:446: void wsrep_create_appliers(long int): Assertion `0' failed.
|
190118 23:16:45 [ERROR] mysqld got signal 6 ;
|
This could be because you hit a bug. It is also possible that this binary
|
or one of the libraries it was linked against is corrupt, improperly built,
|
or misconfigured. This error can also be caused by malfunctioning hardware.
|
|
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
|
|
We will try our best to scrape up some info that will hopefully help
|
diagnose the problem, but since we have already crashed,
|
something is definitely wrong and this may fail.
|
|
Server version: 10.3.12-MariaDB-1:10.3.12+maria~bionic-log
|
key_buffer_size=16777216
|
read_buffer_size=131072
|
max_used_connections=0
|
max_threads=1002
|
thread_count=7
|
It is possible that mysqld could use up to
|
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 4271151 K bytes of memory
|
Hope that's ok; if not, decrease some variables in the equation.
|
|
Thread pointer: 0x0
|
Attempting backtrace. You can use the following information to find out
|
where mysqld died. If you see no messages after this, something went
|
terribly wrong...
|
2019-01-18 23:16:45 2 [ERROR] WSREP: Failed to JOIN the cluster after SST
|
stack_bottom = 0x0 thread_stack 0x40000
|
*** buffer overflow detected ***: /usr/sbin/mysqld terminated
|
After deleting the log file mariadb started without any issues.
This happened on two mariadb servers, both always crashed if the error_log file was bigger than 7GB.