Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-18308

Crash when starting with huge error_log file

    XMLWordPrintable

    Details

    • Type: Bug
    • Status: Open (View Workflow)
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 10.3.12
    • Fix Version/s: 10.3, 10.4
    • Component/s: wsrep
    • Labels:
      None
    • Environment:
      Ubuntu 18.04.1 LTS

      Description

      Just ran into a very weird issue when starting a mariadb within a galera cluster that had a huge error_log file.
      Basically two of the nodes had some issues and filled thir error_log file with a lot of data within just a couple of hours, it was full of

      Record lock, heap no 322 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
       0: len 6; hex 00000001e753; asc      S;;
       1: len 6; hex 000000000000; asc       ;;
       2: len 7; hex 80000000000000; asc        ;;
       3: len 4; hex 000004a0; asc     ;;
       4: len 4; hex 00002dda; asc   - ;;
       
      Record lock, heap no 323 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
       0: len 6; hex 00000001e754; asc      T;;
       1: len 6; hex 000000000000; asc       ;;
       2: len 7; hex 80000000000000; asc        ;;
       3: len 4; hex 0000048e; asc     ;;
       4: len 4; hex 00002ddd; asc   - ;;
      

      Which itself is not the problem as the one of the galera nodes was still running without a problem. But the error_log file was 7GB big (logrotate would have rotated it, but only after 24h).

      We then started mariadb and it crashed with:

      2019-01-18 23:14:31 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
      2019-01-18 23:14:31 2 [Note] WSREP: REPL Protocols: 9 (4, 2)
      2019-01-18 23:14:31 2 [Note] WSREP: Assign initial position for certification: 6718718, protocol version: 4
      2019-01-18 23:14:31 0 [Note] WSREP: Service thread queue flushed.
      2019-01-18 23:14:31 2 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (c10bbe42-14dd-11e9-965c-ba31e4d39b18): 1 (Operation not permitted)
      	 at galera/src/replicator_str.cpp:prepare_for_IST():482. IST will be unavailable.
      2019-01-18 23:14:31 0 [Note] WSREP: Member 1.0 (galera2) requested state transfer from '*any*'. Selected 0.0 (galera1)(SYNCED) as donor.
      2019-01-18 23:14:31 0 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 6718721)
      2019-01-18 23:14:31 2 [Note] WSREP: Requesting state transfer: success, donor: 0
      2019-01-18 23:14:31 2 [Note] WSREP: GCache history reset: 00000000-0000-0000-0000-000000000000:0 -> c10bbe42-14dd-11e9-965c-ba31e4d39b18:6718718
      2019-01-18 23:14:33 0 [Note] WSREP: (6d41a12c, 'tcp://0.0.0.0:4567') connection to peer 6d41a12c with addr tcp://172.21.42.131:4567 timed out, no messages seen in PT3S
      2019-01-18 23:14:33 0 [Note] WSREP: (6d41a12c, 'tcp://0.0.0.0:4567') turning message relay requesting off
      2019-01-18 23:16:18 0 [Note] WSREP: 0.0 (galera1): State transfer to 1.0 (galera2) complete.
      2019-01-18 23:16:18 0 [Note] WSREP: Member 0.0 (galera1) synced with group.
      WSREP_SST: [INFO] Joiner cleanup. rsync PID: 25756 (20190118 23:16:18.814)
      WSREP_SST: [INFO] Joiner cleanup done. (20190118 23:16:19.323)
      2019-01-18 23:16:19 0 [Note] WSREP: SST complete, seqno: 6718724
      2019-01-18 23:16:19 0 [Note] InnoDB: Using Linux native AIO
      2019-01-18 23:16:19 0 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
      2019-01-18 23:16:19 0 [Note] InnoDB: Uses event mutexes
      2019-01-18 23:16:19 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
      2019-01-18 23:16:19 0 [Note] InnoDB: Number of pools: 1
      2019-01-18 23:16:19 0 [Note] InnoDB: Using SSE2 crc32 instructions
      2019-01-18 23:16:19 0 [Note] InnoDB: Initializing buffer pool, total size = 4G, instances = 4, chunk size = 128M
      2019-01-18 23:16:19 0 [Note] InnoDB: Completed initialization of buffer pool
      2019-01-18 23:16:19 0 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
      2019-01-18 23:16:20 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=57817822986
      2019-01-18 23:16:20 0 [Note] InnoDB: Starting final batch to recover 581 pages from redo log.
      2019-01-18 23:16:45 0 [Note] InnoDB: 128 out of 128 rollback segments are active.
      2019-01-18 23:16:45 0 [Note] InnoDB: Creating shared tablespace for temporary tables
      2019-01-18 23:16:45 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
      2019-01-18 23:16:45 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
      2019-01-18 23:16:45 0 [Note] InnoDB: Waiting for purge to start
      2019-01-18 23:16:45 0 [Note] InnoDB: 10.3.12 started; log sequence number 57819083632; transaction id 22266838
      2019-01-18 23:16:45 0 [Note] InnoDB: Loading buffer pool(s) from /var/lib/mysql/ib_buffer_pool
      2019-01-18 23:16:45 0 [Note] InnoDB: Cannot open '/var/lib/mysql/ib_buffer_pool' for reading: No such file or directory
      2019-01-18 23:16:45 0 [Note] Plugin 'FEEDBACK' is disabled.
      2019-01-18 23:16:45 0 [Warning] Failed to setup SSL
      2019-01-18 23:16:45 0 [Warning] SSL error: SSL_CTX_set_default_verify_paths failed
      2019-01-18 23:16:45 0 [Warning] SSL error: error:02001002:system library:fopen:No such file or directory
      2019-01-18 23:16:45 0 [Warning] SSL error: error:2006D080:BIO routines:BIO_new_file:no such file
      2019-01-18 23:16:45 0 [Warning] SSL error: error:0B084002:x509 certificate routines:X509_load_cert_crl_file:system lib
      2019-01-18 23:16:45 0 [Note] Server socket created on IP: '::'.
      2019-01-18 23:16:45 0 [Note] /usr/sbin/mysqld (initiated by: unknown): Normal shutdown
      2019-01-18 23:16:45 0 [Note] WSREP: Stop replication
      2019-01-18 23:16:45 0 [Note] WSREP: Closing send monitor...
      2019-01-18 23:16:45 0 [Note] WSREP: Closed send monitor.
      2019-01-18 23:16:45 0 [Note] WSREP: gcomm: terminating thread
      2019-01-18 23:16:45 0 [Note] WSREP: gcomm: joining thread
      2019-01-18 23:16:45 0 [Note] WSREP: gcomm: closing backend
      2019-01-18 23:16:45 0 [Note] WSREP: view(view_id(NON_PRIM,3f8dc445,97) memb {
      	6d41a12c,0
      } joined {
      } left {
      } partitioned {
      	3f8dc445,0
      	dc9c1631,0
      })
      2019-01-18 23:16:45 0 [Note] WSREP: view((empty))
      2019-01-18 23:16:45 0 [Note] WSREP: gcomm: closed
      2019-01-18 23:16:45 0 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
      2019-01-18 23:16:45 0 [Note] WSREP: Flow-control interval: [4096, 4096]
      2019-01-18 23:16:45 0 [Note] WSREP: Trying to continue unpaused monitor
      2019-01-18 23:16:45 0 [Note] WSREP: Received NON-PRIMARY.
      2019-01-18 23:16:45 0 [Note] WSREP: Shifting JOINER -> OPEN (TO: 6720610)
      2019-01-18 23:16:45 0 [Note] WSREP: Received self-leave message.
      2019-01-18 23:16:45 0 [Note] WSREP: Flow-control interval: [4096, 4096]
      2019-01-18 23:16:45 0 [Note] WSREP: Trying to continue unpaused monitor
      2019-01-18 23:16:45 0 [Note] WSREP: Received SELF-LEAVE. Closing connection.
      2019-01-18 23:16:45 0 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 6720610)
      2019-01-18 23:16:45 0 [Note] WSREP: RECV thread exiting 0: Success
      2019-01-18 23:16:45 0 [Note] WSREP: recv_thread() joined.
      2019-01-18 23:16:45 0 [Note] WSREP: Closing replication queue.
      2019-01-18 23:16:45 0 [Note] WSREP: Closing slave action queue.
      2019-01-18 23:16:45 0 [Note] WSREP: Signalling provider to continue.
      2019-01-18 23:16:45 0 [Note] WSREP: SST received: c10bbe42-14dd-11e9-965c-ba31e4d39b18:6718724
      2019-01-18 23:16:45 2 [ERROR] WSREP: gcs/src/gcs.cpp:_join():892: Sending JOIN failed: -103 (Software caused connection abort).
      2019-01-18 23:16:45 0 [ERROR] WSREP: Trying to launch slave threads before creating connection at 'gcomm://172.21.42.110,172.21.42.131,172.21.42.113'
      mysqld: /home/buildbot/buildbot/build/mariadb-10.3.12/sql/wsrep_thd.cc:446: void wsrep_create_appliers(long int): Assertion `0' failed.
      190118 23:16:45 [ERROR] mysqld got signal 6 ;
      This could be because you hit a bug. It is also possible that this binary
      or one of the libraries it was linked against is corrupt, improperly built,
      or misconfigured. This error can also be caused by malfunctioning hardware.
       
      To report this bug, see https://mariadb.com/kb/en/reporting-bugs
       
      We will try our best to scrape up some info that will hopefully help
      diagnose the problem, but since we have already crashed,
      something is definitely wrong and this may fail.
       
      Server version: 10.3.12-MariaDB-1:10.3.12+maria~bionic-log
      key_buffer_size=16777216
      read_buffer_size=131072
      max_used_connections=0
      max_threads=1002
      thread_count=7
      It is possible that mysqld could use up to
      key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 4271151 K  bytes of memory
      Hope that's ok; if not, decrease some variables in the equation.
       
      Thread pointer: 0x0
      Attempting backtrace. You can use the following information to find out
      where mysqld died. If you see no messages after this, something went
      terribly wrong...
      2019-01-18 23:16:45 2 [ERROR] WSREP: Failed to JOIN the cluster after SST
      stack_bottom = 0x0 thread_stack 0x40000
      *** buffer overflow detected ***: /usr/sbin/mysqld terminated
      

      After deleting the log file mariadb started without any issues.
      This happened on two mariadb servers, both always crashed if the error_log file was bigger than 7GB.

        Attachments

          Activity

            People

            Assignee:
            jplindst Jan Lindström
            Reporter:
            Schnitzel Michael Schmid
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

              Dates

              Created:
              Updated: