[MCOL-4376] In multi-node 5.4.1 cluster, WriteEngine on replica tries to create wrong path Created: 2020-11-03  Updated: 2020-11-03  Resolved: 2020-11-03

Status: Closed
Project: MariaDB ColumnStore
Component/s: writeengine
Affects Version/s: 5.4.1
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Geoff Montee (Inactive) Assignee: Todd Stoffel (Inactive)
Resolution: Cannot Reproduce Votes: 0
Labels: None


 Description   

I am encountering a problem where ColumnStore tries to create a directory at the wrong path: in the system's root directory rather than in one of the dbroot directories.

This problem occurs on a 3-node ColumnStore 5.4.1 cluster running CentOS 8. The cluster uses GlusterFS; it does not use S3 storage.

The problem occurs when I create a table on the primary node (mcs1). On mcs1, I see the following error:

MariaDB [(none)]> CREATE TABLE inventory.products (
    ->   product_name varchar(11) NOT NULL DEFAULT '',
    ->   supplier varchar(128) NOT NULL DEFAULT '',
    ->   quantity varchar(128) NOT NULL DEFAULT '',
    ->   unit_cost varchar(128) NOT NULL DEFAULT ''
    -> ) ENGINE=Columnstore DEFAULT CHARSET=utf8;
ERROR 1815 (HY000): Internal error: CAL0009: (6)Create table failed due to  WE: Error creating column file for oid 3010;  Error in creating a directory.

The syslog on one of the replica nodes (mcs2) has more detail:

Nov  3 01:25:37 mcs2 IDBFile[29632]: 37.564020 |0|0|0| E 35 CAL0002: Failed to create directories: "/000.dir", exception: boost::filesystem::create_directory: Permission denied: "/000.dir"

This message indicates that ColumnStore is trying to create the directory /000.dir directly under the system's root directory instead of under one of ColumnStore's dbroot directories.
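Whether a path falls inside a dbroot is a plain prefix test. A minimal bash sketch of that test; `under_dbroot` is a hypothetical helper (not part of ColumnStore), and the dbroot paths are the ones configured in Columnstore.xml later in this report:

```shell
# Hypothetical helper: succeed if PATH lies inside one of the given dbroot
# directories, fail otherwise.
# Usage: under_dbroot PATH DBROOT...
under_dbroot() {
  local path="$1"
  shift
  local root
  for root in "$@"; do
    # Quoting "$root" makes it a literal prefix; the trailing /* requires a
    # path separator, so .../data1 does not match .../data10.
    case "$path" in
      "$root"/*) return 0 ;;
    esac
  done
  return 1
}
```

For example, in bash, `under_dbroot /000.dir /var/lib/columnstore/data{1,2,3} || echo "outside every dbroot"` prints the message, because /000.dir has no dbroot prefix.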

To get more detail about this error, I attached strace to the WriteEngine process on mcs2.

First, I looked up the PID of the WriteEngine process on mcs2:

  "mcs2": {
    "timestamp": "2020-11-03 01:17:44.584777",
    "uptime": 5857,
    "dbrm_mode": "slave",
    "cluster_mode": "readonly",
    "dbroots": [
      "2"
    ],
    "module_id": 1,
    "services": [
      {
        "name": "workernode",
        "pid": 29576
      },
      {
        "name": "PrimProc",
        "pid": 29587
      },
      {
        "name": "ExeMgr",
        "pid": 29622
      },
      {
        "name": "WriteEngine",
        "pid": 29632
      }
    ]
  },
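The PID can also be pulled out of that status output mechanically. A rough sketch, assuming the JSON is pretty-printed exactly as shown (one key per line, each "name" line followed by its "pid" line); `service_pid` is a hypothetical name, and a real JSON parser such as jq would be more robust:

```shell
# Hypothetical helper: read status JSON on stdin and print the pid of the
# named service. Relies on the pretty-printed layout shown above; this is
# not a general JSON parser.
service_pid() {
  awk -v svc="$1" '
    # index() does a literal substring match, so no regex escaping is needed
    index($0, "\"name\": \"" svc "\"") { found = 1; next }
    # first "pid" line after the matching name: keep only its digits
    found && /"pid":/ { gsub(/[^0-9]/, ""); print; exit }
  '
}
```

Fed the excerpt above on stdin, `service_pid WriteEngine` prints 29632.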

And then I attached strace to the process:

$ mkdir writeengine_strace
$ sudo strace -s 256 -o ./writeengine_strace/strace_out -p 29632 -ff &

After reproducing the problem again, I looked at the strace output:

stat("/etc/columnstore/Columnstore.xml", {st_mode=S_IFREG|0644, st_size=19929, ...}) = 0
stat("/000.dir/000.dir/011.dir/194.dir/000.dir/FILE000.cdf", 0x7f466a1967d0) = -1 ENOENT (No such file or directory)
stat("/etc/columnstore/Columnstore.xml", {st_mode=S_IFREG|0644, st_size=19929, ...}) = 0
stat("/000.dir", 0x7f466a1967b0)        = -1 ENOENT (No such file or directory)
stat("/000.dir", 0x7f466a1963d0)        = -1 ENOENT (No such file or directory)
stat("/", {st_mode=S_IFDIR|0555, st_size=237, ...}) = 0
mkdir("/000.dir", 0777)                 = -1 EACCES (Permission denied)
stat("/000.dir", 0x7f466a196350)        = -1 ENOENT (No such file or directory)
getpid()                                = 29632
socket(AF_UNIX, SOCK_DGRAM|SOCK_CLOEXEC, 0) = 18
connect(18, {sa_family=AF_UNIX, sun_path="/dev/log"}, 110) = 0
sendto(18, "<139>Nov  3 01:25:37 IDBFile[29632]: 37.564020 |0|0|0| E 35 CAL0002: Failed to create directories: \"/000.dir\", exception: boost::filesystem::create_directory: Permission denied: \"/000.dir\"\n ", 190, MSG_NOSIGNAL, NULL, 0) = 190
close(18)                               = 0
write(10, "7\301\373\24[\0\0\0\4\0\0\0\0\0\0\0%N\0\0\0WE: Error creating column file for oid 3010;  Error in creating a directory. \n", 99) = 99
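Because strace was run with `-o` and `-ff`, it writes one file per traced thread, named `strace_out.<pid>`, so the failing calls can be in any of them. A small sketch that searches all of them for `stat`/`mkdir` calls returning -1; `failed_fs_calls` is a hypothetical name, and the pattern assumes the default strace line format shown above:

```shell
# Hypothetical helper: print every failed stat/mkdir call (return value -1)
# from the per-thread files FILE.<pid> that `strace -o FILE -ff` produces.
# -H prefixes each match with the file (thread) it came from.
failed_fs_calls() {
  local prefix="$1"   # e.g. ./writeengine_strace/strace_out
  grep -H -E '^(stat|mkdir)\(.*\) += -1 ' "$prefix".*
}
```

Running `failed_fs_calls ./writeengine_strace/strace_out` after reproducing the error narrows the trace to lines such as the `mkdir("/000.dir", 0777) = -1 EACCES` call.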

The strace output confirms this. WriteEngine looks for the column file at /000.dir/000.dir/011.dir/194.dir/000.dir/FILE000.cdf, with no dbroot prefix, and then calls mkdir("/000.dir") in the system's root directory, which fails with EACCES.
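The stat'ed path encodes the OID from the error message: 3010 = 11 × 256 + 194, which matches the 011.dir/194.dir components. A bash sketch of that layout, inferred only from this strace line, shows that an empty dbroot prefix yields exactly the path WriteEngine tried. The helper name `oid_to_path` is hypothetical, and treating the fifth directory as the partition and the FILE number as the segment is an assumption:

```shell
# Hypothetical helper reproducing the column-file path layout seen in strace:
# the four OID bytes (most significant first), then a directory assumed to be
# the partition, then FILE<segment>.cdf (segment meaning also assumed).
# Usage: oid_to_path DBROOT OID [PARTITION] [SEGMENT]
oid_to_path() {
  local dbroot="$1" oid="$2" part="${3:-0}" seg="${4:-0}"
  printf '%s/%03d.dir/%03d.dir/%03d.dir/%03d.dir/%03d.dir/FILE%03d.cdf\n' \
    "$dbroot" \
    $(( (oid >> 24) & 255 )) $(( (oid >> 16) & 255 )) \
    $(( (oid >> 8) & 255 )) $(( oid & 255 )) \
    "$part" "$seg"
}
```

`oid_to_path "" 3010` prints /000.dir/000.dir/011.dir/194.dir/000.dir/FILE000.cdf, the path in the strace output, while `oid_to_path /var/lib/columnstore/data2 3010` prints the same path under mcs2's dbroot.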

I also confirmed that the WriteEngine process is not running in a chroot by checking /proc/PID/root:

$ sudo ls -l /proc/29632/root
lrwxrwxrwx. 1 mysql mysql 0 Nov  3 01:41 /proc/29632/root -> /
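The same check as a reusable test. In this sketch `has_real_root` is a hypothetical helper, and the optional second argument (an alternative /proc location) exists only to make it testable:

```shell
# Hypothetical helper: succeed if PID's root directory link points at "/",
# i.e. the process is not chrooted. Reading another user's /proc/PID/root
# usually requires root, hence sudo in the check above.
has_real_root() {
  local pid="$1" proc="${2:-/proc}"
  [ "$(readlink "$proc/$pid/root")" = "/" ]
}
```

For example: `sudo bash -c "$(declare -f has_real_root); has_real_root 29632" && echo "not chrooted"`.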

I also checked Columnstore.xml to confirm that all 3 dbroots are listed properly:

...
        <DBRootCount>3</DBRootCount>
        <DBRoot1>/var/lib/columnstore/data1</DBRoot1>
...
        <DBRoot2>/var/lib/columnstore/data2</DBRoot2>
        <DBRoot3>/var/lib/columnstore/data3</DBRoot3>
...
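A quick way to extract those entries and cross-check them against DBRootCount. `list_dbroots` is a hypothetical helper, and its line-oriented sed assumes one element per line as in this file; an XML-aware tool would be safer:

```shell
# Hypothetical helper: print "N PATH" for each <DBRootN> element and fail
# (with a warning on stderr) if their number differs from <DBRootCount>.
list_dbroots() {
  local xml="$1" count found
  count=$(sed -n 's:.*<DBRootCount>\([0-9]*\)</DBRootCount>.*:\1:p' "$xml")
  # <DBRoot followed by a digit, so <DBRootCount> itself is not matched
  sed -n 's:.*<DBRoot\([0-9][0-9]*\)>\(.*\)</DBRoot[0-9][0-9]*>.*:\1 \2:p' "$xml"
  found=$(grep -c '<DBRoot[0-9][0-9]*>' "$xml")
  if [ "$found" != "$count" ]; then
    echo "warning: DBRootCount=$count but $found DBRoot entries" >&2
    return 1
  fi
}
```

`list_dbroots /etc/columnstore/Columnstore.xml` should print one `N path` line per configured dbroot and return success when the count matches.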

The mysql user can create directories in all three dbroot directories without any permission issues:

$ sudo -u mysql bash
$ whoami
mysql
$ mkdir -p /var/lib/columnstore/data1/000.dir
$ ls -ld /var/lib/columnstore/data1/000.dir
drwx------. 3 mysql mysql 21 Nov  2 23:51 /var/lib/columnstore/data1/000.dir
$ mkdir -p /var/lib/columnstore/data2/000.dir
$ ls -ld /var/lib/columnstore/data2/000.dir
drwxr-xr-x. 2 mysql mysql 6 Nov  3 01:50 /var/lib/columnstore/data2/000.dir
$ mkdir -p /var/lib/columnstore/data3/000.dir
$ ls -ld /var/lib/columnstore/data3/000.dir
drwxr-xr-x. 2 mysql mysql 6 Nov  3 01:50 /var/lib/columnstore/data3/000.dir
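The same check can be run over every dbroot in one go. In this sketch `check_writable` is a hypothetical helper that actually creates and removes a probe directory rather than only inspecting permission bits; run it as the mysql user, for example from the `sudo -u mysql bash` shell above:

```shell
# Hypothetical helper: for each directory, try to create and remove a probe
# subdirectory as the invoking user; print ok/FAIL and fail if any dir fails.
check_writable() {
  local dir probe rc=0
  for dir in "$@"; do
    probe="$dir/.we_probe.$$"
    if mkdir "$probe" 2>/dev/null; then
      rmdir "$probe"
      echo "ok: $dir"
    else
      echo "FAIL: $dir"
      rc=1
    fi
  done
  return "$rc"
}
```

Running `check_writable /var/lib/columnstore/data1 /var/lib/columnstore/data2 /var/lib/columnstore/data3 /` as mysql should report ok for the dbroots and FAIL for `/`, matching the EACCES seen in strace.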

Why is ColumnStore trying to create a directory in the system's root directory, rather than in one of the dbroot directories?



 Comments   
Comment by Todd Stoffel (Inactive) [ 2020-11-03 ]

MariaDB [(none)]> create database inventory;
Query OK, 1 row affected (0.000 sec)
 
MariaDB [(none)]> CREATE TABLE inventory.products (
    -> product_name VARCHAR(11) NOT NULL DEFAULT '',
    -> supplier VARCHAR(128) NOT NULL DEFAULT '',
    -> quantity VARCHAR(128) NOT NULL DEFAULT '',
    -> unit_cost VARCHAR(128) NOT NULL DEFAULT ''
    -> ) ENGINE=Columnstore DEFAULT CHARSET=utf8;
Query OK, 0 rows affected (0.712 sec)
 
MariaDB [(none)]> select * from inventory.products;
Empty set (0.068 sec)

[root@ip-172-31-36-175 columnstore]# ls -la
total 600
drwxrwxr-x.  8 mysql mysql    116 Nov  2 23:01 .
drwxr-xr-x. 34 root  root    4096 Nov  2 23:00 ..
drwxr-xr-x.  3 mysql mysql     18 Nov  2 23:01 data
drwxr-xr-t.  5 mysql mysql    103 Nov  2 23:04 data1
drwxr-xr-x.  5 mysql mysql     84 Nov  2 23:04 data2
drwxr-xr-x.  4 mysql mysql     39 Nov  3 02:54 data3
-rw-r--r--.  1 mysql mysql 608096 Oct 16 07:09 libjemalloc.so.2
drwxrwxr-x.  2 mysql mysql     20 Nov  2 23:01 local
drwxr-xr-x.  3 mysql mysql     51 Nov  2 23:01 storagemanager

[root@ip-172-31-36-175 columnstore]# cat /etc/fstab
UUID=935ebc4e-90b8-432b-a8f6-8a9934a94892	/		xfs	defaults,noatime,nodiratime	0 0
127.0.0.1:data1 /var/lib/columnstore/data1 glusterfs defaults 0 0
127.0.0.1:data2 /var/lib/columnstore/data2 glusterfs defaults 0 0
127.0.0.1:data3 /var/lib/columnstore/data3 glusterfs defaults 0 0
127.0.0.1:storagemanager /var/lib/columnstore/storagemanager glusterfs defaults 0 0
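Since the dbroots and storagemanager are GlusterFS mounts, it is worth confirming on each node that they are actually mounted, not just listed in fstab. A sketch; `gluster_mounts` is a hypothetical helper, and mountpoint(1) comes from util-linux:

```shell
# Hypothetical helper: list the mount points of glusterfs entries in an
# fstab-format file (default /etc/fstab), skipping comment lines, and say
# whether each one is currently mounted.
gluster_mounts() {
  local fstab="${1:-/etc/fstab}"
  awk '$1 !~ /^#/ && $3 == "glusterfs" { print $2 }' "$fstab" |
  while read -r mnt; do
    if mountpoint -q "$mnt"; then
      echo "mounted: $mnt"
    else
      echo "NOT mounted: $mnt"
    fi
  done
}
```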

[root@ip-172-31-36-175 columnstore]# cat /etc/os-release
NAME="CentOS Linux"
VERSION="8 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="8"
PLATFORM_ID="platform:el8"
PRETTY_NAME="CentOS Linux 8 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:8"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
 
CENTOS_MANTISBT_PROJECT="CentOS-8"
CENTOS_MANTISBT_PROJECT_VERSION="8"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="8"

Generated at Thu Feb 08 02:49:53 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.