[MDEV-18411] MariaDB 10.2.21 unexpected crashes

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Cannot Reproduce
Affects Version/s: 10.2.21, 10.3.25
Fix Version/s: N/A
Component/s: Galera, wsrep
Labels:
- docker
- galera
- need_feedback
- wsrep
Environment:
Docker version 18.09.1, build 4c52b90
Container version: 10.2.21-MariaDB-1:10.2.21

Description

We recently updated a MariaDB cluster from 10.2.17 to 10.2.21.
The update went smoothly, without any errors. However, since this minor version update we have had a crash at least once a day. The crash happens mostly on our main write node; the other two are read nodes.

The only difference I see in the logs with the new version is this warning, which appears multiple times throughout the day:

{"log":"2019-01-29  0:14:00 139950395401984 [Warning] WSREP: SQL statement was ineffective  thd: 2929856  buf: 226\n","stream":"stderr","time":"2019-01-28T22:14:00.682735534Z"}

{"log":"schema: xxxxxx \n","stream":"stderr","time":"2019-01-28T22:14:00.682766101Z"}

{"log":"QUERY: commit\n","stream":"stderr","time":"2019-01-28T22:14:00.682771205Z"}

{"log":" =\u003e Skipping replication\n","stream":"stderr","time":"2019-01-28T22:14:00.68277568Z"}
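These lines are in the format written by Docker's json-file logging driver: one JSON object per line with log, stream and time fields, so a single server message (the warning plus its schema, QUERY and Skipping replication lines) is spread over several records. Below is a minimal Python sketch for reassembling those messages so the ineffective-commit warnings can be counted or lined up against a crash. It assumes, based only on the excerpts in this report, that a new message starts with a timestamp in either the 2019-01-29 or the 190129 form and that every other line continues the previous message; reassemble and ineffective_commits are hypothetical helper names.

```python
import json
import re

# Assumption (from the excerpts only): a new server message starts with a
# "YYYY-MM-DD" or "YYMMDD" timestamp; any other line continues the previous one.
ENTRY_START = re.compile(r"^(?:\d{4}-\d{2}-\d{2}|\d{6})\s")


def reassemble(json_lines):
    """Join Docker json-file log records into complete server messages."""
    entries = []
    for raw in json_lines:
        raw = raw.strip()
        if not raw:
            continue  # skip blank separator lines
        record = json.loads(raw)
        text = record["log"].rstrip("\n")
        if ENTRY_START.match(text) or not entries:
            entries.append(text)
        else:
            entries[-1] += "\n" + text
    return entries


def ineffective_commits(entries):
    """Return only the 'SQL statement was ineffective' warnings."""
    return [e for e in entries if "WSREP: SQL statement was ineffective" in e]


if __name__ == "__main__":
    sample = [
        '{"log":"2019-01-29  0:14:00 139950395401984 [Warning] WSREP: SQL statement was ineffective  thd: 2929856  buf: 226\\n","stream":"stderr","time":"2019-01-28T22:14:00.682735534Z"}',
        '{"log":"schema: xxxxxx \\n","stream":"stderr","time":"2019-01-28T22:14:00.682766101Z"}',
        '{"log":"QUERY: commit\\n","stream":"stderr","time":"2019-01-28T22:14:00.682771205Z"}',
        '{"log":" =\\u003e Skipping replication\\n","stream":"stderr","time":"2019-01-28T22:14:00.68277568Z"}',
    ]
    for entry in ineffective_commits(reassemble(sample)):
        print(entry)
        print("---")
```

Under this rule, untimestamped lines such as "Fatal signal 11 while backtracing" are attached to the preceding message.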

The crash then happens after several of these warnings, like this:

{"log":"2019-01-29  0:14:00 139950395401984 [Warning] WSREP: SQL statement was ineffective  thd: 2929856  buf: 226\n","stream":"stderr","time":"2019-01-28T22:14:00.682735534Z"}

{"log":"schema: XXXXXX \n","stream":"stderr","time":"2019-01-28T22:14:00.682766101Z"}

{"log":"QUERY: commit\n","stream":"stderr","time":"2019-01-28T22:14:00.682771205Z"}

{"log":" =\u003e Skipping replication\n","stream":"stderr","time":"2019-01-28T22:14:00.68277568Z"}

{"log":"2019-01-29  0:14:00 139950400808704 [Warning] WSREP: SQL statement was ineffective  thd: 2929858  buf: 226\n","stream":"stderr","time":"2019-01-28T22:14:00.692498864Z"}

{"log":"schema: XXXXXX \n","stream":"stderr","time":"2019-01-28T22:14:00.692525167Z"}

{"log":"QUERY: commit\n","stream":"stderr","time":"2019-01-28T22:14:00.692534055Z"}

{"log":" =\u003e Skipping replication\n","stream":"stderr","time":"2019-01-28T22:14:00.692541387Z"}

{"log":"2019-01-29  0:15:03 139950402430720 [Warning] WSREP: SQL statement was ineffective  thd: 2931591  buf: 214\n","stream":"stderr","time":"2019-01-28T22:15:03.76604226Z"}

{"log":"schema: XXXXXX \n","stream":"stderr","time":"2019-01-28T22:15:03.766070584Z"}

{"log":"QUERY: commit\n","stream":"stderr","time":"2019-01-28T22:15:03.766076151Z"}

{"log":" =\u003e Skipping replication\n","stream":"stderr","time":"2019-01-28T22:15:03.76608061Z"}

{"log":"2019-01-29  0:15:03 139950402430720 [ERROR] WSREP: FSM: no such a transition ROLLED_BACK -\u003e ROLLED_BACK\n","stream":"stderr","time":"2019-01-28T22:15:03.766085945Z"}

{"log":"190129  0:15:03 [ERROR] mysqld got signal 6 ;\n","stream":"stderr","time":"2019-01-28T22:15:03.766242364Z"}

After these errors, the last line before the crash is:

{"log":"Fatal signal 11 while backtracing\n","stream":"stderr","time":"2019-01-28T22:15:14.47965349Z"}

The node later recovers and reconnects to the cluster.

I looked up this error:

 WSREP: FSM: no such a transition ROLLED_BACK

and found it in this issue: https://mariadb.atlassian.net/browse/MDEV-7217
That issue seems pretty old, so I would not expect to hit it in this up-to-date version of MariaDB. Any suggestions are welcome, and I would be glad to provide more information.
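The FSM error and the signal 6 that follows it fit a table-driven state machine that aborts on an undefined transition: the 10.3.25 backtrace posted later in this issue shows the abort raised from galera::FSM::shift_to (src/fsm.hpp) underneath wsrep_commit. The minimal Python sketch below illustrates that mechanism under stated assumptions: the states other than ROLLED_BACK and the transition table are hypothetical simplifications, not Galera's actual ones, and the sketch raises an exception where Galera aborts the whole process (SIGABRT, which is signal 6). It shows why asking to roll back a transaction that is already in ROLLED_BACK is rejected.

```python
from enum import Enum, auto


class TrxState(Enum):
    # Hypothetical, simplified subset of transaction states;
    # Galera's real state set and transition table are larger.
    EXECUTING = auto()
    REPLICATING = auto()
    COMMITTED = auto()
    ROLLED_BACK = auto()


class NoSuchTransition(RuntimeError):
    pass


class FSM:
    """Table-driven state machine: only listed (from, to) pairs are legal."""

    def __init__(self, initial, transitions):
        self.state = initial
        self.transitions = frozenset(transitions)

    def shift_to(self, new_state):
        if (self.state, new_state) not in self.transitions:
            # Galera logs the error and aborts the process here;
            # this sketch raises an exception instead.
            raise NoSuchTransition(
                f"FSM: no such a transition {self.state.name} -> {new_state.name}"
            )
        self.state = new_state


# Hypothetical transition table: note there is no ROLLED_BACK -> ROLLED_BACK entry.
ALLOWED = {
    (TrxState.EXECUTING, TrxState.REPLICATING),
    (TrxState.EXECUTING, TrxState.ROLLED_BACK),
    (TrxState.REPLICATING, TrxState.COMMITTED),
    (TrxState.REPLICATING, TrxState.ROLLED_BACK),
}

if __name__ == "__main__":
    trx = FSM(TrxState.EXECUTING, ALLOWED)
    trx.shift_to(TrxState.ROLLED_BACK)  # first rollback: legal
    try:
        trx.shift_to(TrxState.ROLLED_BACK)  # second rollback of the same trx
    except NoSuchTransition as err:
        print(err)  # FSM: no such a transition ROLLED_BACK -> ROLLED_BACK
```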

Attachments


mariadb.10.2.25.log (8 kB, 2021-04-14 09:50)
mariadb102121_logs_jan29_2019.txt (20 kB, 2019-01-29 14:48)

Activity


Elena Stepanova added a comment - 2019-01-29 11:56

Please provide the complete error log.


george koumantaris added a comment - 2019-01-29 14:49

Please see the attachment.


george koumantaris added a comment - 2019-02-04 06:16

Hello,

I just want to mention that this happens regularly, and the errors are exactly the same each time:

[ERROR] WSREP: FSM: no such a transition ROLLED_BACK

followed by:

[ERROR] mysqld got signal 6

Other than that, there are only the repeated warnings:

[Warning] WSREP: SQL statement was ineffective


Sergey Sokolov added a comment - 2021-04-14 09:51

We have a similar error on 10.2.25.

There are several [Warning] WSREP: SQL statement was ineffective warnings with QUERY: commit, followed by [ERROR] WSREP: FSM: no such a transition ROLLED_BACK -> ROLLED_BACK.

Log attached: mariadb.10.2.25.log


Jan Lindström (Inactive) added a comment - 2021-04-14 09:56

I suggest you upgrade to a more recent version of MariaDB 10.2 and of the Galera library, and retry. If the problem still occurs, I would need instructions on how to reproduce it.


Sergey Sokolov added a comment - 2021-04-14 10:01

About a month ago we tried upgrading our production environment to 10.2.36, but other problems forced us to roll back to 10.2.25. OK, I'll try to reproduce the problem in a test environment with the latest version of 10.2.


Lars Mikkelsen added a comment - 2021-05-03 12:49

Seeing something similar on 10.3.25:

2021-05-03 14:04:39 1669694 [ERROR] WSREP: FSM: no such a transition ROLLED_BACK -> ROLLED_BACK
210503 14:04:39 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.

To report this bug, see https://mariadb.com/kb/en/reporting-bugs

We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.

Server version: 10.3.25-MariaDB-log
key_buffer_size=134217728
read_buffer_size=131072
max_used_connections=701
max_threads=602
thread_count=382
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1461324 K  bytes of memory

Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x7f310845b2a8
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7f31a29bad00 thread_stack 0x49000
/usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x55562a271cde]
/usr/sbin/mysqld(handle_fatal_signal+0x30f)[0x555629d0720f]
sigaction.c:0(__restore_rt)[0x7f497ab0d5d0]
:0(__GI_raise)[0x7f4978de02c7]
:0(__GI_abort)[0x7f4978de19b8]
/usr/lib64/galera/libgalera_smm.so(+0x1a281c)[0x7f4954b2d81c]
src/fsm.hpp:104(galera::FSM<galera::TrxHandle::State, galera::TrxHandle::Transition, galera::EmptyGuard, galera::EmptyAction>::shift_to(galera::TrxHandle::State))[0x7f4954b23cc6]
src/gu_atomic.hpp:59(gu::Atomic<long long>::operator++())[0x7f4954b34975]
/usr/sbin/mysqld(_Z12wsrep_commitP10handlertonP3THDb+0xd2)[0x555629c72162]
/usr/sbin/mysqld(+0x7b4475)[0x555629d08475]
/usr/sbin/mysqld(_Z15ha_commit_transP3THDb+0x4e6)[0x555629d0a906]
/usr/sbin/mysqld(_Z12trans_commitP3THD+0x4a)[0x555629c116da]
/usr/sbin/mysqld(_Z21mysql_execute_commandP3THD+0x2e76)[0x555629b24396]
/usr/sbin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_statebb+0x36d)[0x555629b2a64d]
/usr/sbin/mysqld(+0x4db110)[0x555629a2f110]
/usr/sbin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcjbb+0x2eac)[0x555629b2ddac]
/usr/sbin/mysqld(_Z10do_commandP3THD+0x11b)[0x555629b2e17b]
/usr/sbin/mysqld(_Z24do_handle_one_connectionP7CONNECT+0x1d6)[0x555629c04a96]
/usr/sbin/mysqld(handle_one_connection+0x3d)[0x555629c04bad]
/usr/sbin/mysqld(+0xcce97d)[0x55562a22297d]
pthread_create.c:0(start_thread)[0x7f497ab05dd5]
2021-05-03 14:04:40 1669201 [Warning] Aborted connection 1669201 to db: 'catalogservice' user: 'catalogservice' host: 'maxscale01.java.jysk.netic.dk' (CLOSE_CONNECTION)
/lib64/libc.so.6(clone+0x6d)[0x7f4978ea7f6d]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x7f310859a660): COMMIT
Connection ID (thread ID): 1669694
Status: NOT_KILLED

Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on

The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
information that should help you find out what is causing the crash.
Writing a core file...
Working directory at /data/mysql/data
Resource Limits:
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        0                    unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             127405               127405               processes
Max open files            16384                16384                files
Max locked memory         65536                65536                bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       127405               127405               signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us
Core pattern: core
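The log ends with "Writing a core file...", but the resource limits show a soft "Max core file size" of 0 together with "Core pattern: core", so most likely no core file was actually written. A core file is one of the most useful artifacts for diagnosing a crash that cannot be reproduced on demand. A minimal Python sketch, using the standard resource module on Linux, that checks the soft RLIMIT_CORE of the current process and raises it to the hard limit; core_dumps_possible and raise_core_limit are hypothetical helper names. For mysqld itself the limit has to be raised by whatever starts the server (service manager or container runtime), and the manual page referenced in the log above describes the server-side settings; this sketch only demonstrates the limit check.

```python
import resource


def core_dumps_possible(soft_limit):
    """With a plain file core pattern such as 'core', a soft RLIMIT_CORE
    of 0 means the kernel writes no core file at all."""
    return soft_limit != 0


def raise_core_limit():
    """Raise this process's soft core-size limit to its hard limit.

    An unprivileged process may raise its soft limit up to the hard
    limit, but not beyond it. Returns the new (soft, hard) pair.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    if soft != hard:
        resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


if __name__ == "__main__":
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    print("soft:", soft, "hard:", hard, "core dumps possible:", core_dumps_possible(soft))
    new_soft, _ = raise_core_limit()
    print("after raising:", new_soft, "core dumps possible:", core_dumps_possible(new_soft))
```

In the crash above the hard limit is already unlimited, so raising only the soft limit would be enough.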


george koumantaris added a comment - 2021-10-05 14:28

Hello,
Since I reported this bug, I have upgraded first to 10.4 and later to 10.5. I have not experienced any of the errors I was getting on 10.2, even though the dataset is the same, only much larger now.
Thank you.


People

Assignee: Ramesh Sivaraman

Reporter: george koumantaris

Votes: 3

Watchers: 9

Dates

Created: 2019-01-29 06:57

Updated: 2024-07-07 23:23

Resolved: 2021-11-22 10:11


Project: MariaDB Server
