Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-10854

MariaDB 10.1.17 crashes when SST fails

    XMLWordPrintable

Details

    • Bug
    • Status: Closed (View Workflow)
    • Minor
    • Resolution: Won't Fix
    • 10.1.16, 10.1.17
    • N/A
    • Galera SST, wsrep
    • None
    • CENTOS 7.2 docker container

    Description

      1.) Start MariaDB to join existing cluster of SYNCED nodes
      2.) SST from selected donor DBPROD12 fails (due to misconfiguration: no write permission to /var/lib/mysql on DBPROD12)
      3.) mysqld should terminate, but it crashes with SIG11

      Log output:

      exec /usr/bin/mysqld --user=mysql --console --wsrep-node-name PROD_DB_CPY3 --wsrep-cluster-address=gcomm://192.168.5.0 --wsrep-sst-donor=DBPROD12
      2016-09-21  9:47:43 140115759253632 [Note] mysqld (mysqld 10.1.17-MariaDB) starting as process 1 ...
      2016-09-21  9:47:43 140115759253632 [Note] WSREP: Read nil XID from storage engines, skipping position init
      2016-09-21  9:47:43 140115759253632 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib64/galera/libgalera_smm.so'
      2016-09-21  9:47:43 140115759253632 [Note] WSREP: wsrep_load(): Galera 25.3.17(r3619) by Codership Oy <info@codership.com> loaded successfully.
      2016-09-21  9:47:43 140115759253632 [Note] WSREP: CRC-32C: using "slicing-by-8" algorithm.
      2016-09-21  9:47:43 140115759253632 [Warning] WSREP: Could not open state file for reading: '/var/lib/mysql//grastate.dat'
      2016-09-21  9:47:43 140115759253632 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1
      2016-09-21  9:47:43 140115759253632 [Note] WSREP: Passing config to GCS: base_dir = /var/lib/mysql/; base_host = 192.168.3.5; base_port = 4567; cert.log_conflicts = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 1G; gcache.size = 300M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false; pc.ignore_quorum = false; pc
      2016-09-21  9:47:43 140115317462784 [Note] WSREP: Service thread queue flushed.
      2016-09-21  9:47:43 140115759253632 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1
      2016-09-21  9:47:43 140115759253632 [Note] WSREP: wsrep_sst_grab()
      2016-09-21  9:47:43 140115759253632 [Note] WSREP: Start replication
      2016-09-21  9:47:43 140115759253632 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
      2016-09-21  9:47:43 140115759253632 [Note] WSREP: protonet asio version 0
      2016-09-21  9:47:43 140115759253632 [Note] WSREP: Using CRC-32C for message checksums.
      2016-09-21  9:47:43 140115759253632 [Note] WSREP: backend: asio
      2016-09-21  9:47:43 140115759253632 [Note] WSREP: gcomm thread scheduling priority set to other:0 
      2016-09-21  9:47:43 140115759253632 [Warning] WSREP: access file(/var/lib/mysql//gvwstate.dat) failed(No such file or directory)
      2016-09-21  9:47:43 140115759253632 [Note] WSREP: restore pc from disk failed
      2016-09-21  9:47:43 140115759253632 [Note] WSREP: GMCast version 0
      2016-09-21  9:47:43 140115759253632 [Note] WSREP: (719a33a6, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
      2016-09-21  9:47:43 140115759253632 [Note] WSREP: (719a33a6, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
      2016-09-21  9:47:43 140115759253632 [Note] WSREP: EVS version 0
      2016-09-21  9:47:43 140115759253632 [Note] WSREP: gcomm: connecting to group 'db_prod', peer '192.168.5.0:'
      2016-09-21  9:47:43 140115759253632 [Note] WSREP: (719a33a6, 'tcp://0.0.0.0:4567') connection established to bab149a0 tcp://192.168.5.0:4567
      2016-09-21  9:47:43 140115759253632 [Note] WSREP: (719a33a6, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://192.168.1.0:4567 tcp://192.168.3.0:4567 tcp://192.168.3.4:4567 
      2016-09-21  9:47:43 140115759253632 [Note] WSREP: (719a33a6, 'tcp://0.0.0.0:4567') connection established to e20d6c1d tcp://192.168.1.0:4567
      2016-09-21  9:47:43 140115759253632 [Note] WSREP: (719a33a6, 'tcp://0.0.0.0:4567') connection established to 34c496e7 tcp://192.168.3.0:4567
      2016-09-21  9:47:43 140115759253632 [Note] WSREP: (719a33a6, 'tcp://0.0.0.0:4567') connection established to 379ead20 tcp://192.168.3.4:4567
      2016-09-21  9:47:43 140115759253632 [Note] WSREP: declaring 34c496e7 at tcp://192.168.3.0:4567 stable
      2016-09-21  9:47:43 140115759253632 [Note] WSREP: declaring 379ead20 at tcp://192.168.3.4:4567 stable
      2016-09-21  9:47:43 140115759253632 [Note] WSREP: declaring bab149a0 at tcp://192.168.5.0:4567 stable
      2016-09-21  9:47:43 140115759253632 [Note] WSREP: declaring e20d6c1d at tcp://192.168.1.0:4567 stable
      2016-09-21  9:47:44 140115759253632 [Note] WSREP: Node 34c496e7 state prim
      2016-09-21  9:47:44 140115759253632 [Note] WSREP: view(view_id(PRIM,34c496e7,19) memb {
      	34c496e7,0
      	379ead20,0
      	719a33a6,0
      	bab149a0,0
      	e20d6c1d,0
      } joined {
      } left {
      } partitioned {
      })
      2016-09-21  9:47:44 140115759253632 [Note] WSREP: save pc into disk
      2016-09-21  9:47:45 140115759253632 [Note] WSREP: gcomm: connected
      2016-09-21  9:47:45 140115759253632 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
      2016-09-21  9:47:45 140115759253632 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
      2016-09-21  9:47:45 140115759253632 [Note] WSREP: Opened channel 'db_prod'
      2016-09-21  9:47:45 140115759253632 [Note] WSREP: Waiting for SST to complete.
      2016-09-21  9:47:45 140115246577408 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 2, memb_num = 5
      2016-09-21  9:47:45 140115246577408 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
      2016-09-21  9:47:45 140115246577408 [Note] WSREP: STATE EXCHANGE: sent state msg: 723eb4c1-7fe0-11e6-8c54-fe7027df9419
      2016-09-21  9:47:45 140115246577408 [Note] WSREP: STATE EXCHANGE: got state msg: 723eb4c1-7fe0-11e6-8c54-fe7027df9419 from 0 (DBPROD12)
      2016-09-21  9:47:45 140115246577408 [Note] WSREP: STATE EXCHANGE: got state msg: 723eb4c1-7fe0-11e6-8c54-fe7027df9419 from 1 (PROD_DB_CPY2)
      2016-09-21  9:47:45 140115246577408 [Note] WSREP: STATE EXCHANGE: got state msg: 723eb4c1-7fe0-11e6-8c54-fe7027df9419 from 3 (DBPROD13)
      2016-09-21  9:47:45 140115246577408 [Note] WSREP: STATE EXCHANGE: got state msg: 723eb4c1-7fe0-11e6-8c54-fe7027df9419 from 4 (DBPROD11)
      2016-09-21  9:47:45 140115246577408 [Note] WSREP: STATE EXCHANGE: got state msg: 723eb4c1-7fe0-11e6-8c54-fe7027df9419 from 2 (PROD_DB_CPY3)
      2016-09-21  9:47:45 140115246577408 [Note] WSREP: Quorum results:
      	version    = 3,
      	component  = PRIMARY,
      	conf_id    = 18,
      	members    = 4/5 (joined/total),
      	act_id     = 124760,
      	last_appl. = -1,
      	protocols  = 0/7/3 (gcs/repl/appl),
      	group UUID = bab21960-6ea4-11e6-ae86-86fdc580dcae
      2016-09-21  9:47:45 140115246577408 [Note] WSREP: Flow-control interval: [36, 36]
      2016-09-21  9:47:45 140115246577408 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 124760)
      2016-09-21  9:47:45 140115758938880 [Note] WSREP: State transfer required: 
      	Group state: bab21960-6ea4-11e6-ae86-86fdc580dcae:124760
      	Local state: 00000000-0000-0000-0000-000000000000:-1
      2016-09-21  9:47:45 140115758938880 [Note] WSREP: New cluster view: global state: bab21960-6ea4-11e6-ae86-86fdc580dcae:124760, view# 19: Primary, number of nodes: 5, my index: 2, protocol version 3
      2016-09-21  9:47:45 140115758938880 [Warning] WSREP: Gap in state sequence. Need state transfer.
      2016-09-21  9:47:45 140115217217280 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'joiner' --address '192.168.3.5' --datadir '/var/lib/mysql/'   --parent '1' --binlog 'proddbcpy-bin' '
      2016-09-21  9:47:45 140115758938880 [Note] WSREP: Prepared SST request: rsync|192.168.3.5:4444/rsync_sst
      2016-09-21  9:47:45 140115758938880 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
      2016-09-21  9:47:45 140115758938880 [Note] WSREP: REPL Protocols: 7 (3, 2)
      2016-09-21  9:47:45 140115317462784 [Note] WSREP: Service thread queue flushed.
      2016-09-21  9:47:45 140115758938880 [Note] WSREP: Assign initial position for certification: 124760, protocol version: 3
      2016-09-21  9:47:45 140115317462784 [Note] WSREP: Service thread queue flushed.
      2016-09-21  9:47:45 140115758938880 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (bab21960-6ea4-11e6-ae86-86fdc580dcae): 1 (Operation not permitted)
      	 at galera/src/replicator_str.cpp:prepare_for_IST():482. IST will be unavailable.
      2016-09-21  9:47:45 140115246577408 [Note] WSREP: Member 2.0 (PROD_DB_CPY3) requested state transfer from 'DBPROD12'. Selected 0.0 (DBPROD12)(SYNCED) as donor.
      2016-09-21  9:47:45 140115246577408 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 124760)
      2016-09-21  9:47:45 140115758938880 [Note] WSREP: Requesting state transfer: success, donor: 0
      2016-09-21  9:47:47 140115246577408 [Warning] WSREP: 0.0 (DBPROD12): State transfer to 2.0 (PROD_DB_CPY3) failed: -255 (Unknown error 255)
      

      2016-09-21  9:47:47 140115246577408 [ERROR] WSREP: gcs/src/gcs_group.cpp:gcs_group_handle_join_msg():736: Will never receive state. Need to abort.
      2016-09-21  9:47:47 140115246577408 [Note] WSREP: gcomm: terminating thread
      2016-09-21  9:47:47 140115246577408 [Note] WSREP: gcomm: joining thread
      2016-09-21  9:47:47 140115246577408 [Note] WSREP: gcomm: closing backend
      2016-09-21  9:47:47 140115246577408 [Note] WSREP: view(view_id(NON_PRIM,34c496e7,19) memb {
      	719a33a6,0
      } joined {
      } left {
      } partitioned {
      	34c496e7,0
      	379ead20,0
      	bab149a0,0
      	e20d6c1d,0
      })
      2016-09-21  9:47:47 140115246577408 [Note] WSREP: (719a33a6, 'tcp://0.0.0.0:4567') turning message relay requesting off
      2016-09-21  9:47:47 140115246577408 [Note] WSREP: view((empty))
      2016-09-21  9:47:47 140115246577408 [Note] WSREP: gcomm: closed
      2016-09-21  9:47:47 140115246577408 [Note] WSREP: mysqld: Terminated.
      160921  9:47:47 [ERROR] mysqld got signal 11 ;
      This could be because you hit a bug. It is also possible that this binary
      or one of the libraries it was linked against is corrupt, improperly built,
      or misconfigured. This error can also be caused by malfunctioning hardware.
      To report this bug, see https://mariadb.com/kb/en/reporting-bugs
      We will try our best to scrape up some info that will hopefully help
      diagnose the problem, but since we have already crashed, 
      something is definitely wrong and this may fail.
      Server version: 10.1.17-MariaDB
      key_buffer_size=0
      read_buffer_size=131072
      max_used_connections=0
      max_threads=153
      thread_count=2
      It is possible that mysqld could use up to 
      key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 336049 K  bytes of memory
      Hope that's ok; if not, decrease some variables in the equation.
      Thread pointer: 0x0x0
      Attempting backtrace. You can use the following information to find out
      where mysqld died. If you see no messages after this, something went
      terribly wrong...
      stack_bottom = 0x0 thread_stack 0x48400
      mysqld(my_print_stacktrace+0x2e)[0x7f6f3eb95ebe]
      mysqld(handle_fatal_signal+0x2d5)[0x7f6f3e6bc915]
      /lib64/libpthread.so.0(+0xf100)[0x7f6f3dcda100]
      /lib64/libc.so.6(abort+0x297)[0x7f6f3c034e37]
      /usr/lib64/galera/libgalera_smm.so(+0x803c7)[0x7f6f3681b3c7]
      addr2line: 'mysqld': No such file
      /usr/lib64/galera/libgalera_smm.so(_Z13gcs_core_recvP8gcs_coreP12gcs_act_rcvdx+0x1065)[0x7f6f36929855]
      /usr/lib64/galera/libgalera_smm.so(+0x191a24)[0x7f6f3692ca24]
      /lib64/libpthread.so.0(+0x7dc5)[0x7f6f3dcd2dc5]
      /lib64/libc.so.6(clone+0x6d)[0x7f6f3c0f4ced]
      The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
      information that should help you find out what is causing the crash.
      

      Attachments

        Activity

          People

            jplindst Jan Lindström (Inactive)
            marcriaDB Marc A. Mueller
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.