[MDEV-13478] Full SST sync fails because of the error in the cleaning part - Jira

XML

Word

Printable

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Major
Resolution: Fixed
Affects Version/s: 10.1.20, 5.5.51-galera, 10.2.7
Fix Version/s: 10.1.30, 10.2.12, 5.5.59-galera, 10.0.34-galera
Component/s: Configuration
Labels:
- galera
- wsrep
Environment:

Hide
[root@eap-db01 ~]# rpm -qa | grep -i maria
MariaDB-compat-10.2.7-1.el7.centos.x86_64
MariaDB-shared-10.2.7-1.el7.centos.x86_64
MariaDB-common-10.2.7-1.el7.centos.x86_64
MariaDB-client-10.2.7-1.el7.centos.x86_64
MariaDB-server-10.2.7-1.el7.centos.x86_64

Show
[ root@eap-db01 ~]# rpm -qa | grep -i maria MariaDB-compat-10.2.7-1.el7.centos.x86_64 MariaDB-shared-10.2.7-1.el7.centos.x86_64 MariaDB-common-10.2.7-1.el7.centos.x86_64 MariaDB-client-10.2.7-1.el7.centos.x86_64 MariaDB-server-10.2.7-1.el7.centos.x86_64

Sprint:
10.1.30

Description

Hi MariaDB Folks,

We had the same problem with MariaDB verion 5.5 and also now after upgrading to 10.2.7

Problem desc:

When node needs a full state snapshot (SST) first sync attempt always fails in our case. It does not fail at the very beginning of SST phase but only after data transfer is completed. We are waiting 45 min to transfer 180GB and eventually it always fails.

Error message is seen in the logs when data are being streamed:

2017-08-08 23:14:56 140013350135552 [Note] WSREP: Running: 'wsrep_sst_xtrabackup-v2 --role 'joiner' --address '10.28.253.22:5444' --datadir '/data/'   --parent '14018'  '' '

WSREP_SST: [INFO] Streaming with xbstream (20170808 23:14:56.343)

WSREP_SST: [INFO] Using socat as streamer (20170808 23:14:56.346)

WSREP_SST: [INFO] Stale sst_in_progress file: /data//sst_in_progress (20170808 23:14:56.354)

2017-08-08 23:14:56 140014532581120 [Note] WSREP: Prepared SST request: xtrabackup-v2|10.28.253.22:5444/xtrabackup_sst//1

2017-08-08 23:14:56 140014532581120 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.

2017-08-08 23:14:56 140014532581120 [Note] WSREP: REPL Protocols: 7 (3, 2)

2017-08-08 23:14:56 140014532581120 [Note] WSREP: Assign initial position for certification: 436077364, protocol version: 3

2017-08-08 23:14:56 140014549366528 [Note] WSREP: Service thread queue flushed.

2017-08-08 23:14:56 140014532581120 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (d01e5332-2656-11e4-ae7b-56db42595f62): 1 (Operation not permitted)

         at galera/src/replicator_str.cpp:prepare_for_IST():482. IST will be unavailable.

2017-08-08 23:14:56 140013358528256 [Note] WSREP: Member 0.0 (eap-db01) requested state transfer from 'eap-db02'. Selected 2.0 (eap-db02)(SYNCED) as donor.

2017-08-08 23:14:56 140013358528256 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 436077364)

2017-08-08 23:14:56 140014532581120 [Note] WSREP: Requesting state transfer: success, donor: 2

2017-08-08 23:14:56 140014532581120 [Note] WSREP: GCache history reset: old(00000000-0000-0000-0000-000000000000:0) -> new(d01e5332-2656-11e4-ae7b-56db42595f62:436077364)

WSREP_SST: [INFO] Evaluating timeout -k 110 100 socat -u TCP-LISTEN:5444,reuseaddr stdio | xbstream -x; RC=( ${PIPESTATUS[@]} ) (20170808 23:14:56.402)

WSREP_SST: [INFO] Proceeding with SST (20170808 23:14:56.684)

WSREP_SST: [INFO] Evaluating socat -u TCP-LISTEN:5444,reuseaddr stdio | xbstream -x; RC=( ${PIPESTATUS[@]} ) (20170808 23:14:56.685)

WSREP_SST: [INFO] Cleaning the existing datadir and innodb-data/log directories (20170808 23:14:56.687)

## lot's of files removed here, output cut ###

removed ‘/data/ethangroup/bill_payments.ibd’

removed ‘/data/ethangroup/account_tokens.ibd’

removed ‘/data/ethangroup/identity.ibd’

removed ‘/data/ethangroup/sp_trans_xml.ibd’

removed ‘/data/phoneclub/bill_trans_to_usage.frm’

removed directory: ‘/data/phoneclub’

find: ‘/data/phoneclub/bill_trans_to_usage.frm’: No such file or directory

removed ‘/data/testing_logging/login_logs.ibd’

removed ‘/data/testing_logging/ec_dump2.frm’

removed ‘/data/testing_logging/ec_errorlog.frm’

removed ‘/data/testing_logging/queue_consumer.frm’

removed ‘/data/testing_logging/queue_function.frm’

removed directory: ‘/data/testing_logging’

removed ‘/data/aria_log.00000001’

removed ‘/data/multi-master.info’

removed ‘/data/mysql_upgrade_info’

removed ‘/data/ib_buffer_pool’

WSREP_SST: [ERROR] Cleanup after exit with status:1 (20170808 23:14:57.373)

So key thing in above snapshot is a second attempt to remove the file which already had been removed earlier:

find: ‘/data/phoneclub/bill_trans_to_usage.frm’: No such file or directory

two lines before:

removed ‘/data/phoneclub/bill_trans_to_usage.frm’

It's a random file, not always same.

As a result whole cleanup phase fails with error:

WSREP_SST: [ERROR] Cleanup after exit with status:1 (20170808 23:14:57.373)

Code responsible for this part:
/bin/wsrep_sst_xtrabackup-v2

    890         wsrep_log_info "Cleaning the existing datadir and innodb-data/log directories"

    891         find $ib_home_dir $ib_log_dir $ib_undo_dir $DATA -mindepth 1  -regex $cpat  -prune  -o -exec rm -rfv {} 1>&2 \+

Second SST attempt is successful as all files and directories have already been removed. I wasn't able to find the root cause why find is trying to remove same files again.

We have quite a lot databases and tables in mysql data directory, but I don't think it may cause this kind of issues.

[root@eap-db03 data]# ls -lR /data/| wc -l

To make the first SST workable we need to change line 891 to:

find $ib_home_dir $ib_log_dir $ib_undo_dir $DATA -mindepth 1  -regex $cpat  -prune -o -print0 | xargs -0 /bin/rm -rf

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending
- Thumbnails
- List
- Download All

db_structure.txt
2017-08-21 13:47
359 kB
Kamil

Issue Links

relates to

MDEV-13789 mariabackup galera SST fail

Closed

Activity

People

Assignee:: Sachin Setiya (Inactive)

Reporter:: Kamil

Votes:: 2 Vote for this issue

Watchers:: 8 Start watching this issue

Dates

Created:: 2017-08-09 00:56

Updated:: 2024-07-08 00:38

Resolved:: 2017-12-19 18:56

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.