[MDEV-15770] Three-node Galera cluster with MariaDB: the bootstrapped primary node runs, but the other two nodes cannot recover data from it and crash partway through recovery (after about 15 GB)

Details

Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 10.1.31
Fix Version/s: 10.1(EOL)
Component/s: None
Labels: None
Environment: CentOS 7.4

Description

We have a three-node Galera cluster running MariaDB. The bootstrapped primary node is running, but the other two nodes are unable to recover the data from the first node: each one crashes partway through recovery, for example after about 15 GB of data has been recovered.
------------------------------------------

Service Status:
-------------------
[root@AZABIR-ID01 data]# systemctl status mariadb.service
● mariadb.service - MariaDB 10.1.31 database server
Loaded: loaded (/usr/lib/systemd/system/mariadb.service; enabled; vendor preset: disabled)
Drop-In: /etc/systemd/system/mariadb.service.d
└─migrated-from-my.cnf-settings.conf
Active: failed (Result: exit-code) since Wed 2018-04-04 06:09:06 CEST; 56s ago
Docs: man:mysqld(8)
https://mariadb.com/kb/en/library/systemd/
Process: 25879 ExecStartPre=/bin/sh -c [ ! -e /usr/bin/galera_recovery ] && VAR= || VAR=`/usr/bin/galera_recovery`; [ $? -eq 0 ] && systemctl set-environment _WSREP_START_POSITION=$VAR || exit 1 (code=exited, status=1/FAILURE)
Process: 25875 ExecStartPre=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)

Apr 04 06:09:06 AZABIR-ID01.azure.cloud.corp.local sh[25879]: 2018-04-04 6:09:05 139810169485568 [Note] Loaded 'file_key_management.so' with offset 0x7f28081fb000
Apr 04 06:09:06 AZABIR-ID01.azure.cloud.corp.local sh[25879]: 2018-04-04 6:09:05 139810169485568 [Note] InnoDB: Started in read only mode
Apr 04 06:09:06 AZABIR-ID01.azure.cloud.corp.local sh[25879]: 2018-04-04 6:09:05 139810169485568 [Note] InnoDB: Using mutexes to ref count buffer pool pages
Apr 04 06:09:06 AZABIR-ID01.azure.cloud.corp.local sh[25879]: 2018-04-04 6:09:05 139810169485568 [Note] InnoDB: The InnoDB memory heap is disabled
Apr 04 06:09:06 AZABIR-ID01.azure.cloud.corp.local sh[25879]: 2018-04-04 6:09:05 139810169485568 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
Apr 04 06:09:06 AZABIR-ID01.azure.cloud.corp.local sh[25879]: 2018-04-04 6:09:05 139810169485568 [Note] InnoDB: GCC builtin __atomic_thread_fence() is used for m...y barrier
Apr 04 06:09:06 AZABIR-ID01.azure.cloud.corp.local systemd[1]: mariadb.service: control process exited, code=exited status=1
Apr 04 06:09:06 AZABIR-ID01.azure.cloud.corp.local systemd[1]: Failed to start MariaDB 10.1.31 database server.
Apr 04 06:09:06 AZABIR-ID01.azure.cloud.corp.local systemd[1]: Unit mariadb.service entered failed state.
Apr 04 06:09:06 AZABIR-ID01.azure.cloud.corp.local systemd[1]: mariadb.service failed.
Hint: Some lines were ellipsized, use -l to show in full.
[root@AZABIR-ID01 data]# /etc/init.d/mysql start
Starting mysql (via systemctl): Job for mariadb.service failed because the control process exited with error code. See "systemctl status mariadb.service" and "journalctl -xe" for details.
[FAILED]
=======================================
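
Note on the status output above: the two "Process:" lines show which ExecStartPre step failed. The following is a minimal Python sketch, not part of the original report, that parses those lines and lists the steps with a non-zero exit status. The regular expression and the function names are assumptions made for illustration; the input lines are copied from the output above.

```python
import re

# "Process:" lines copied verbatim from the systemctl status output above.
STATUS_LINES = [
    "Process: 25879 ExecStartPre=/bin/sh -c [ ! -e /usr/bin/galera_recovery ] "
    "&& VAR= || VAR=`/usr/bin/galera_recovery`; [ $? -eq 0 ] "
    "&& systemctl set-environment _WSREP_START_POSITION=$VAR || exit 1 "
    "(code=exited, status=1/FAILURE)",
    "Process: 25875 ExecStartPre=/bin/sh -c systemctl unset-environment "
    "_WSREP_START_POSITION (code=exited, status=0/SUCCESS)",
]

# Illustrative pattern (an assumption, not an official systemd format spec).
PROCESS_RE = re.compile(
    r"Process:\s+(?P<pid>\d+)\s+(?P<stage>\w+)=(?P<cmd>.*)\s+"
    r"\(code=(?P<code>\w+),\s+status=(?P<status>\d+)/(?P<result>\w+)\)$"
)


def parse_process_lines(lines):
    """Return one dict per line that matches the 'Process:' pattern."""
    parsed = []
    for line in lines:
        match = PROCESS_RE.search(line.strip())
        if match:
            entry = match.groupdict()
            entry["pid"] = int(entry["pid"])
            entry["status"] = int(entry["status"])
            parsed.append(entry)
    return parsed


def failed_steps(lines):
    """Keep only the steps whose exit status is non-zero."""
    return [p for p in parse_process_lines(lines) if p["status"] != 0]


for step in failed_steps(STATUS_LINES):
    print(step["pid"], step["stage"], f"status={step['status']}/{step['result']}")
```

On the lines above, the only failing step is the one that calls /usr/bin/galera_recovery, which suggests the service stops in the pre-start recovery step before the main server start is attempted.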

Log errors:

------------------------------------------------------------------------

2018-04-04 6:01:38 140400691869952 [Note] InnoDB: Restoring possible half-written data pages from the doublewrite buffer...
2018-04-04 6:01:38 140400691869952 [Note] InnoDB: Starting final batch to recover 272 pages from redo log
2018-04-04 6:01:38 140400691869952 [ERROR] InnoDB: Trying to access page number 254 in space 30854 space name DB_GRPUK_P/cache_form, which is outside the tablespace bounds. Byte offset 0, len 16384 i/o type 10.
2018-04-04 06:01:38 7fb1955d7900 InnoDB: Assertion failure in thread 140400691869952 in file ha_innodb.cc line 22015
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to https://jira.mariadb.org/
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
180404 6:01:38 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.

To report this bug, see https://mariadb.com/kb/en/reporting-bugs

We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.

Server version: 10.1.31-MariaDB
key_buffer_size=134217728
read_buffer_size=131072
max_used_connections=0
max_threads=2002
thread_count=0
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 4529069 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...

stack_bottom = 0x0 thread_stack 0x48400
mysys/stacktrace.c:268(my_print_stacktrace)[0x55df5319a1ce]
sql/signal_handler.cc:168(handle_fatal_signal)[0x55df52cbdfb5]
sigaction.c:0(__restore_rt)[0x7fb1951ed5e0]
:0(__GI_raise)[0x7fb1934c61f7]
:0(__GI_abort)[0x7fb1934c78e8]
handler/ha_innodb.cc:22015(ib_logf(ib_log_level_t, char const*, ...))[0x55df52f0b13f]
fil/fil0fil.cc:5982(fil_io(unsigned long, bool, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, void*, void*, unsigned long*, trx_t*, bool))[0x55df530bb323]
buf/buf0rea.cc:263(buf_read_page_low(dberr_t*, bool, unsigned long, unsigned long, unsigned long, unsigned long, long, unsigned long, trx_t*, bool))[0x55df5307b0a0]
buf/buf0rea.cc:1119(buf_read_recv_pages(unsigned long, unsigned long, unsigned long, unsigned long const*, unsigned long))[0x55df5307eb66]
log/log0recv.cc:1857(recv_apply_hashed_log_recs(bool))[0x55df52f5d44f]
srv/srv0start.cc:2663(innobase_start_or_create_for_mysql())[0x55df52ff076e]
handler/ha_innodb.cc:4479(innobase_init(void*))[0x55df52f0f67d]
sql/handler.cc:521(ha_initialize_handlerton(st_plugin_int*))[0x55df52cc0264]
sql/sql_plugin.cc:1409(plugin_initialize(st_mem_root*, st_plugin_int*, int*, char**, bool))[0x55df52b47e70]
sql/sql_plugin.cc:1686(plugin_init(int*, char**, int))[0x55df52b4975a]
sql/mysqld.cc:5133(init_server_components())[0x55df52a9eb88]
sql/mysqld.cc:5722(mysqld_main(int, char**))[0x55df52aa2630]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7fb1934b2c05]
//sbin/mysqld(+0x39910d)[0x55df52a9610d]
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
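
Note on the "outside the tablespace bounds" error above: one possible reading is that the tablespace file on the crashing node is shorter than the redo log expects. The Python sketch below, which is not part of the original report, extracts the page number from the error line and checks whether that page fits in a file of a given size. It assumes a 16384-byte page size (taken from "len 16384" in the error), a file-per-table layout, and a hypothetical data directory of /var/lib/mysql; none of these is confirmed by the report.

```python
import os
import re

PAGE_SIZE = 16384  # assumption: taken from "len 16384" in the error line

# Error line copied verbatim from the log above.
ERROR_LINE = (
    "2018-04-04 6:01:38 140400691869952 [ERROR] InnoDB: Trying to access "
    "page number 254 in space 30854 space name DB_GRPUK_P/cache_form, which "
    "is outside the tablespace bounds. Byte offset 0, len 16384 i/o type 10."
)

BOUNDS_RE = re.compile(
    r"page number (?P<page>\d+) in space (?P<space>\d+) "
    r"space name (?P<name>[^,]+),.*len (?P<len>\d+)"
)


def parse_bounds_error(line):
    """Extract page number, space id, space name and I/O length from the line."""
    match = BOUNDS_RE.search(line)
    if match is None:
        return None
    return {
        "page": int(match["page"]),
        "space": int(match["space"]),
        "name": match["name"],
        "len": int(match["len"]),
    }


def pages_in_file(size_bytes, page_size=PAGE_SIZE):
    """Number of whole pages that fit in a file of size_bytes."""
    return size_bytes // page_size


def page_is_within_file(page_no, size_bytes, page_size=PAGE_SIZE):
    """Pages are numbered from 0, so page N needs (N + 1) * page_size bytes."""
    return (page_no + 1) * page_size <= size_bytes


if __name__ == "__main__":
    info = parse_bounds_error(ERROR_LINE)
    # Hypothetical path: the data directory is an assumption, not from the report.
    ibd_path = os.path.join("/var/lib/mysql", info["name"] + ".ibd")
    if os.path.exists(ibd_path):
        size = os.path.getsize(ibd_path)
        print(ibd_path, "holds", pages_in_file(size), "pages;",
              "page", info["page"], "in bounds:",
              page_is_within_file(info["page"], size))
    else:
        print("file not found:", ibd_path)
```

Under these assumptions, page 254 requires the file to be at least 255 × 16384 = 4177920 bytes. Comparing the size of the file on a crashing node with the size on the primary node may help show whether the file was truncated or only partly transferred.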

Attachments

mysql.error.log, 16 kB, 2018-04-04 04:34
server.cnf, 2 kB, 2018-04-04 11:19
azabnl-id05_errors.txt, 98 kB, 2018-04-10 14:48
errors_azabnl-id01_chk1.txt, 76 kB, 2018-04-10 14:49

Issue Links

relates to:
MDEV-11035 Restore removed disallow-writes for Galera (Closed)
MDEV-10949 innodb_disallow_writes does not work as expected (Closed)
