MariaDB ColumnStore / MCOL-4376

In multi-node 5.4.1 cluster, WriteEngine on replica tries to create wrong path

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Cannot Reproduce
    • Affects Version/s: 5.4.1
    • Fix Version/s: N/A
    • Component/s: writeengine
    • Labels: None

    Description

      I am encountering a problem where ColumnStore is trying to create a directory at the wrong path. It is trying to create a directory in the system's root directory, rather than in one of the dbroot directories.

      This problem was seen with a multi-node ColumnStore 5.4.1 cluster that has 3 nodes. The cluster is using GlusterFS, but it is not using S3 storage. This cluster runs on CentOS 8.

      The problem happens when I try to create a table on the primary node (mcs1). On mcs1, I see the following error:

      MariaDB [(none)]> CREATE TABLE inventory.products (
          ->   product_name varchar(11) NOT NULL DEFAULT '',
          ->   supplier varchar(128) NOT NULL DEFAULT '',
          ->   quantity varchar(128) NOT NULL DEFAULT '',
          ->   unit_cost varchar(128) NOT NULL DEFAULT ''
          -> ) ENGINE=Columnstore DEFAULT CHARSET=utf8;
      ERROR 1815 (HY000): Internal error: CAL0009: (6)Create table failed due to  WE: Error creating column file for oid 3010;  Error in creating a directory.
      

      The syslog on one of the replica nodes (mcs2) has a few more details:

      Nov  3 01:25:37 mcs2 IDBFile[29632]: 37.564020 |0|0|0| E 35 CAL0002: Failed to create directories: "/000.dir", exception: boost::filesystem::create_directory: Permission denied: "/000.dir"
      

      This error message seems to indicate that ColumnStore is trying to create a directory at the path /000.dir, which is in the system's root directory, rather than in one of ColumnStore's dbroot directories.

      I wanted to find out more details about this error, so I decided to attach strace to the WriteEngine process on mcs2.

      First, I got the WriteEngine's PID on mcs2 from the node's entry in the cluster status output:

        "mcs2": {
          "timestamp": "2020-11-03 01:17:44.584777",
          "uptime": 5857,
          "dbrm_mode": "slave",
          "cluster_mode": "readonly",
          "dbroots": [
            "2"
          ],
          "module_id": 1,
          "services": [
            {
              "name": "workernode",
              "pid": 29576
            },
            {
              "name": "PrimProc",
              "pid": 29587
            },
            {
              "name": "ExeMgr",
              "pid": 29622
            },
            {
              "name": "WriteEngine",
              "pid": 29632
            }
          ]
        },
      

      And then I attached strace to the process:

      $ mkdir writeengine_strace
      $ sudo strace -s 256 -o ./writeengine_strace/strace_out -p 29632 -ff &
      

      After reproducing the problem again, I looked at the strace output:

      stat("/etc/columnstore/Columnstore.xml", {st_mode=S_IFREG|0644, st_size=19929, ...}) = 0
      stat("/000.dir/000.dir/011.dir/194.dir/000.dir/FILE000.cdf", 0x7f466a1967d0) = -1 ENOENT (No such file or directory)
      stat("/etc/columnstore/Columnstore.xml", {st_mode=S_IFREG|0644, st_size=19929, ...}) = 0
      stat("/000.dir", 0x7f466a1967b0)        = -1 ENOENT (No such file or directory)
      stat("/000.dir", 0x7f466a1963d0)        = -1 ENOENT (No such file or directory)
      stat("/", {st_mode=S_IFDIR|0555, st_size=237, ...}) = 0
      mkdir("/000.dir", 0777)                 = -1 EACCES (Permission denied)
      stat("/000.dir", 0x7f466a196350)        = -1 ENOENT (No such file or directory)
      getpid()                                = 29632
      socket(AF_UNIX, SOCK_DGRAM|SOCK_CLOEXEC, 0) = 18
      connect(18, {sa_family=AF_UNIX, sun_path="/dev/log"}, 110) = 0
      sendto(18, "<139>Nov  3 01:25:37 IDBFile[29632]: 37.564020 |0|0|0| E 35 CAL0002: Failed to create directories: \"/000.dir\", exception: boost::filesystem::create_directory: Permission denied: \"/000.dir\"\n ", 190, MSG_NOSIGNAL, NULL, 0) = 190
      close(18)                               = 0
      write(10, "7\301\373\24[\0\0\0\4\0\0\0\0\0\0\0%N\0\0\0WE: Error creating column file for oid 3010;  Error in creating a directory. \n", 99) = 99
      

      The strace output confirms that ColumnStore is trying to create a directory at the path /000.dir, which is in the system's root directory, rather than in one of ColumnStore's dbroot directories.
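
      The full path in the strace output is consistent with a relative column-file path being appended to an empty dbroot prefix. The following is a minimal sketch of that reading, not the actual WriteEngine code. It assumes that each of the four bytes of the 32-bit OID becomes one NNN.dir level, followed by a partition directory and a FILENNN.cdf segment file. The function name oid_to_relpath and the partition/segment defaults are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: map a column OID to a relative segment-file path.
# Assumption: the four bytes of the OID (most significant first) each become
# one NNN.dir level, then a partition directory, then FILENNN.cdf.
oid_to_relpath() {
    oid=$1
    partition=${2:-0}
    segment=${3:-0}
    printf '%03d.dir/%03d.dir/%03d.dir/%03d.dir/%03d.dir/FILE%03d.cdf\n' \
        $(( (oid >> 24) & 255 )) $(( (oid >> 16) & 255 )) \
        $(( (oid >> 8) & 255 )) $(( oid & 255 )) \
        "$partition" "$segment"
}

# OID 3010 is 0x00000BC2, so the byte levels are 000, 000, 011, 194.
rel=$(oid_to_relpath 3010)
echo "$rel"

# With a populated dbroot prefix, the file lands under the data directory.
dbroot=/var/lib/columnstore/data2
echo "$dbroot/$rel"

# With an empty prefix (an assumption about what happened on mcs2), the
# result is the same path that appears in the stat() call in the strace.
dbroot=
echo "$dbroot/$rel"
```

      For OID 3010 the last line printed is /000.dir/000.dir/011.dir/194.dir/000.dir/FILE000.cdf, which matches the stat() call above. That suggests the dbroot path was resolved to an empty string on mcs2, although the sketch alone does not show why.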

      I also confirmed that the WriteEngine process does not do a chroot by looking at /proc/PID/root:

      $ sudo ls -l /proc/29632/root
      lrwxrwxrwx. 1 mysql mysql 0 Nov  3 01:41 /proc/29632/root -> /
      

      I also checked Columnstore.xml to confirm that all 3 dbroots are listed properly:

      ...
              <DBRootCount>3</DBRootCount>
              <DBRoot1>/var/lib/columnstore/data1</DBRoot1>
      ...
              <DBRoot2>/var/lib/columnstore/data2</DBRoot2>
              <DBRoot3>/var/lib/columnstore/data3</DBRoot3>
      ...
      

      Of course, the mysql user can create directories in the dbroot directories without any permissions issues:

      $ sudo -u mysql bash
      $ whoami
      mysql
      $ mkdir -p /var/lib/columnstore/data1/000.dir
      $ ls -ld /var/lib/columnstore/data1/000.dir
      drwx------. 3 mysql mysql 21 Nov  2 23:51 /var/lib/columnstore/data1/000.dir
      $ mkdir -p /var/lib/columnstore/data2/000.dir
      $ ls -ld /var/lib/columnstore/data2/000.dir
      drwxr-xr-x. 2 mysql mysql 6 Nov  3 01:50 /var/lib/columnstore/data2/000.dir
      $ mkdir -p /var/lib/columnstore/data3/000.dir
      $ ls -ld /var/lib/columnstore/data3/000.dir
      drwxr-xr-x. 2 mysql mysql 6 Nov  3 01:50 /var/lib/columnstore/data3/000.dir
      

      Why is ColumnStore trying to create a directory in the system's root directory, rather than in one of the dbroot directories?

          People

            toddstoffel Todd Stoffel (Inactive)
            GeoffMontee Geoff Montee (Inactive)