[MDEV-29893] SST fails when having datadir set to a symlink to actual data directory Created: 2022-10-27  Updated: 2023-12-07  Resolved: 2023-10-13

Status: Closed
Project: MariaDB Server
Component/s: Galera, Galera SST
Affects Version/s: 10.3.36, 10.4.26, 10.5.17, 10.10.1, 10.6.10, 10.7.6, 10.8.5, 10.9.3
Fix Version/s: 10.4.32, 10.5.23, 10.6.16, 10.10.7, 10.11.6, 11.0.4, 11.1.3, 11.2.2, 11.3.1

Type: Bug Priority: Critical
Reporter: Hartmut Holzgraefe Assignee: Julius Goryavsky
Resolution: Fixed Votes: 1
Labels: None

Issue Links:
Duplicate
is duplicated by MDEV-31074 Galera SST fails when datadir / logdi... Closed

 Description   

When datadir points at a symbolic link to the actual data directory rather than at the data directory itself, an SST attempt using the mariabackup method fails at the --move-back step because the data directory still contains files such as "ibdata1".

It looks as if the data directory purging step early in the SST workflow fails to detect that files are still present in the data directory, wrongly treating the symbolic link as an empty directory.
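
The suspected behaviour can be reproduced outside of the SST script. The following is a minimal sketch using throwaway paths under a `mktemp` directory (the names `real` and `link` are made up for illustration): `find` with `-mindepth 1`, as used by the purge step, visits nothing when its starting point is a symlink, although the target directory is not empty.

```shell
#!/bin/sh
# Throwaway layout: "real" holds a file, "link" is a symlink to "real".
# Both names are illustrative and not taken from the SST script.
tmp=$(mktemp -d)
mkdir "$tmp/real"
touch "$tmp/real/ibdata1"
ln -s "$tmp/real" "$tmp/link"

# find does not follow symlinks by default, so starting at the link
# it sees only the link itself (depth 0), which -mindepth 1 skips.
via_link=$(find "$tmp/link" -mindepth 1 | wc -l | tr -d ' ')

# Starting at the real directory, the file is visited as expected.
via_real=$(find "$tmp/real" -mindepth 1 | wc -l | tr -d ' ')

echo "entries via symlink: $via_link, via real directory: $via_real"
rm -rf "$tmp"
```

A purge that relies on this traversal therefore has nothing to delete, which would leave "ibdata1" in place for the later --move-back step.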

How to reproduce:

  • Install a node without wsrep settings at first
  • shut down the node
  • move its datadir from /var/lib/mysql to e.g. /data/mysql
  • make /var/lib/mysql a symlink to /data/mysql
  • do not change datadir=... yet
  • add galera specific wsrep_* settings, using mariabackup as SST method
  • start up the node
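
The relocation part of these steps can be sketched as a small shell function. `relocate_datadir` is a hypothetical helper written for this report, not something shipped with MariaDB, and it assumes the server has already been shut down.

```shell
#!/bin/sh
# Hypothetical helper (not part of MariaDB): move a data directory to a
# new location and leave a symlink at the old path.
# Run it only while the server is stopped.
relocate_datadir() {
    old_dir=$1
    new_dir=$2
    # Refuse to continue if the target already exists, since mv would
    # otherwise move the old directory *into* it.
    [ -e "$new_dir" ] && return 1
    mkdir -p "$(dirname "$new_dir")" || return 1
    mv "$old_dir" "$new_dir" || return 1
    ln -s "$new_dir" "$old_dir"
}
```

For the paths used in the steps above, the call would be `relocate_datadir /var/lib/mysql /data/mysql`; afterwards `ls -ld /var/lib/mysql` should show a symlink pointing at /data/mysql.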

The node will start to perform an SST, but will fail at the mariabackup --move-back step with e.g.:

    node-2: Oct 26 10:45:31 node-2 -innobackupex-move[2476]: [00] 2022-10-26 10:45:31 Error: Move file ibdata1 to /var/lib/mysql/ibdata1 failed: Destination file exists
    node-2: Oct 26 10:45:31 node-2 mariadbd[1922]: 2022-10-26 10:45:31 0 [ERROR] WSREP: Failed to read uuid:seqno and wsrep_gtid_domain_id from joiner script.

This seems to have started with MariaDB 10.3; with 10.2 I can see the datadir being purged and the SST succeeding even when datadir is a symlink.



 Comments   
Comment by Claudio Nanni [ 2023-09-11 ]

`find` does not follow the symlink when deleting the contents of the datadir.

wsrep_sst_mariabackup:1429

        wsrep_log_info \
            "Cleaning the existing datadir and innodb-data/log directories"
        if [ "$OS" = 'FreeBSD' ]; then
            find -E ${ib_home_dir:+"$ib_home_dir"} \
                    ${ib_undo_dir:+"$ib_undo_dir"} \
                    ${ib_log_dir:+"$ib_log_dir"} \
                    ${ar_log_dir:+"$ar_log_dir"} \
                    "$DATA" -mindepth 1 -prune -regex "$cpat" \
                    -o -exec rm -rf {} >&2 \+
        else
            find ${ib_home_dir:+"$ib_home_dir"} \
                 ${ib_undo_dir:+"$ib_undo_dir"} \
                 ${ib_log_dir:+"$ib_log_dir"} \
                 ${ar_log_dir:+"$ar_log_dir"} \
                 "$DATA" -mindepth 1 -prune -regex "$cpat" \
                 -o -exec rm -rf {} >&2 \+
        fi

The -L option should solve this, since it makes `find` follow symbolic links:

[root@fedora 20614]# find /dbdata | head -5
/dbdata
[root@fedora 20614]# find -L /dbdata | head -5
/dbdata
/dbdata/mariabackup.move.log
/dbdata/performance_schema
/dbdata/performance_schema/db.opt
/dbdata/test
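
To see the effect of `-L` on the cleanup expression itself, here is a reduced sketch on a scratch directory. The `keep_pat` pattern is a stand-in chosen for this example and not the script's real `$cpat`, and the file names are likewise only illustrative. Whether `-L` is the change that was eventually applied is not stated in this report.

```shell
#!/bin/sh
# Scratch layout: a "real" data directory reached through a symlink "link".
tmp=$(mktemp -d)
mkdir -p "$tmp/real/test"
touch "$tmp/real/ibdata1" "$tmp/real/test/db.opt" "$tmp/real/grastate.dat"
ln -s "$tmp/real" "$tmp/link"

# Stand-in for $cpat (assumption): entries matching this pattern are kept.
keep_pat='.*/grastate\.dat$'

# Same shape as the script's expression, plus -L so the link is followed.
# Each depth-1 entry is pruned (not descended into); it is kept if it
# matches the pattern and removed with rm -rf otherwise.
find -L "$tmp/link" -mindepth 1 -prune -regex "$keep_pat" \
     -o -exec rm -rf {} +

remaining=$(ls -A "$tmp/real")
echo "left in data directory: $remaining"
rm -rf "$tmp"
```

Without `-L` the same command removes nothing, because the traversal never gets past the symlink.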
