[MCOL-4129] controller node SEGV faults when /var/lib/columnstore/data1* doesn't exist Created: 2020-06-30  Updated: 2023-10-25  Resolved: 2023-10-25

Status: Closed
Project: MariaDB ColumnStore
Component/s: writeengine
Affects Version/s: 1.4, 1.5.3
Fix Version/s: Icebox

Type: Bug Priority: Minor
Reporter: Daniel Black Assignee: Leonid Fedorov
Resolution: Won't Fix Votes: 0
Labels: None

Issue Links:
PartOf
is part of MCOL-4134 Clean and fix remaining columnstore c... Closed

 Description   

strace /usr/bin/controllernode fg

openat(AT_FDCWD, "/dev/shm/InfiniDB-shm-00020001", O_RDWR|O_CREAT|O_EXCL|O_NOFOLLOW|O_CLOEXEC, 0666) = -1 EEXIST (File exists)
openat(AT_FDCWD, "/dev/shm/InfiniDB-shm-00020001", O_RDWR|O_NOFOLLOW|O_CLOEXEC) = 9
fstat(9, {st_mode=S_IFREG|0666, st_size=64000, ...}) = 0
fstat(9, {st_mode=S_IFREG|0666, st_size=64000, ...}) = 0
mmap(NULL, 64000, PROT_READ|PROT_WRITE, MAP_SHARED, 9, 0) = 0x7fffb3650000
fstat(9, {st_mode=S_IFREG|0666, st_size=64000, ...}) = 0
mmap(NULL, 64000, PROT_READ, MAP_SHARED, 9, 0) = 0x7fffb3640000
munmap(0x7fffb3650000, 64000)           = 0
openat(AT_FDCWD, "/var/lib/columnstore/data1/systemFiles/dbrm/oidbitmap", O_RDWR|O_CREAT|O_TRUNC, 0666) = -1 ENOENT (No such file or directory)
futex(0x7fffb75b120c, FUTEX_WAKE_PRIVATE, 2147483647) = 0
chmod("/var/lib/columnstore/data1/systemFiles/dbrm/oidbitmap", 0664) = -1 ENOENT (No such file or directory)
stat("/etc/columnstore/Columnstore.xml", {st_mode=S_IFREG|0644, st_size=19465, ...}) = 0
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_BNDERR, si_addr=0x474e5543432b2b28} ---
getpid()                                = 12007

stack trace

(gdb) bt
#0  0x00007ffff6cae1bc in BRM::OIDServer::writeData(unsigned char*, long, int) const () from /usr/lib/powerpc64le-linux-gnu/libbrm.so
#1  0x00007ffff6cb0154 in BRM::OIDServer::initializeBitmap() const () from /usr/lib/powerpc64le-linux-gnu/libbrm.so
#2  0x00007ffff6cb0888 in BRM::OIDServer::OIDServer() () from /usr/lib/powerpc64le-linux-gnu/libbrm.so
#3  0x000000010002be38 in ?? ()
#4  0x0000000100011e68 in ?? ()
#5  0x00007ffff5c2441c in generic_start_main (main=0x100011c40, argc=<optimised out>, argv=0x7ffffffff458, auxvec=0x7ffffffff550, init=<optimised out>, 
    rtld_fini=<optimised out>, stack_end=<optimised out>, fini=<optimised out>) at ../csu/libc-start.c:310
#6  0x00007ffff5c24618 in __libc_start_main (argc=<optimised out>, argv=<optimised out>, ev=<optimised out>, auxvec=<optimised out>, rtld_fini=<optimised out>, 
    stinfo=<optimised out>, stack_on_entry=<optimised out>) at ../sysdeps/unix/sysv/linux/powerpc/libc-start.c:116
#7  0x0000000000000000 in ?? ()
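The strace shows the openat() of .../dbrm/oidbitmap failing with ENOENT immediately before the crash in BRM::OIDServer::writeData(), which suggests the write path does not check the result of the open. A minimal sketch of a checked create-and-write, assuming plain POSIX I/O; `writeFileChecked` is a hypothetical helper, not the actual OIDServer code.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstring>
#include <fcntl.h>
#include <stdexcept>
#include <string>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper (not ColumnStore code): create or truncate `path`,
// write `len` bytes from `buf`, and throw a descriptive error on any failure
// instead of carrying on with an invalid file descriptor.
void writeFileChecked(const std::string& path, const unsigned char* buf, std::size_t len)
{
    assert(buf != nullptr || len == 0);

    // Same flags and mode as the failing openat() in the strace above.
    int fd = ::open(path.c_str(), O_RDWR | O_CREAT | O_TRUNC, 0666);
    if (fd < 0)
    {
        int err = errno;  // capture before anything else can overwrite errno
        throw std::runtime_error("cannot create " + path + ": " + std::strerror(err));
    }

    std::size_t done = 0;
    while (done < len)
    {
        ssize_t n = ::write(fd, buf + done, len - done);
        if (n < 0)
        {
            if (errno == EINTR)
                continue;  // interrupted by a signal: retry the write
            int err = errno;
            ::close(fd);
            throw std::runtime_error("write to " + path + " failed: " + std::strerror(err));
        }
        done += static_cast<std::size_t>(n);
    }

    if (::close(fd) != 0)
    {
        int err = errno;
        throw std::runtime_error("close of " + path + " failed: " + std::strerror(err));
    }

    // The strace also shows a chmod(..., 0664) on the same path.
    if (::chmod(path.c_str(), 0664) != 0)
    {
        int err = errno;
        throw std::runtime_error("chmod of " + path + " failed: " + std::strerror(err));
    }
}
```

With a missing systemFiles/dbrm directory, the caller gets an exception whose message reads like "cannot create /var/lib/columnstore/data1/systemFiles/dbrm/oidbitmap: No such file or directory" and can log it and exit, rather than the process dying with SIGSEGV.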



 Comments   
Comment by Daniel Black [ 2020-06-30 ]

It also segfaults when dbrm/tablelocks doesn't exist:

openat(AT_FDCWD, "/var/lib/columnstore/data1/systemFiles/dbrm/oidbitmap", O_RDWR) = 3
fstat(3, {st_mode=S_IFREG|0664, st_size=2099202, ...}) = 0
_llseek(3, 2097152, [2097152], SEEK_SET) = 0
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 2048) = 2048
read(3, "\0\0", 4096)                   = 2
stat("/etc/columnstore/Columnstore.xml", {st_mode=S_IFREG|0644, st_size=19465, ...}) = 0
openat(AT_FDCWD, "/var/lib/columnstore/data1/systemFiles/dbrm/tablelocks", O_RDONLY) = -1 ENOENT (No such file or directory)
futex(0x7fff8ff20218, FUTEX_WAKE_PRIVATE, 2147483647) = 0
futex(0x7fff9127120c, FUTEX_WAKE_PRIVATE, 2147483647) = 0
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_BNDERR, si_addr=0x474e5543432b2b18} ---

(gdb) bt full
#0  0x00007ffff6cd3478 in BRM::TableLockServer::load (this=0x100122d50) at ./storage/columnstore/columnstore/versioning/BRM/tablelockserver.cpp:108
        size = 1
        i = 0
        tli = {<messageqcpp::Serializeable> = {_vptr.Serializeable = 0x7ffff6d2e068 <vtable for BRM::TableLockInfo+16>}, id = 140737488349536, tableOID = 69, 
          ownerName = "", ownerPID = 1267969, ownerSessionID = 1, ownerTxnID = 1275056, state = BRM::CLEANUP, creationTime = 140737322703104, 
          dbrootList = std::vector of length 0, capacity 0}
        filename_p = <optimised out>
        in = {px = 0x100139cf0}
#1  0x00007ffff6cd42f8 in BRM::TableLockServer::TableLockServer (this=0x100122d50, sm=<optimised out>)
    at ./storage/columnstore/columnstore/versioning/BRM/tablelockserver.cpp:51
        lk = <optimised out>
        config = 0x100113730
        lk = <optimised out>
        config = <optimised out>
#2  0x000000010002bf20 in BRM::MasterDBRMNode::MasterDBRMNode (this=0x100113d00) at ./storage/columnstore/columnstore/versioning/BRM/masterdbrmnode.cpp:111
        config = 0x100113730
        retStr = ""
        secondsToWait = <optimised out>
        config = <optimised out>
        retStr = <optimised out>
        secondsToWait = <optimised out>
#3  0x0000000100011e68 in main (argc=<optimised out>, argv=<optimised out>) at ./storage/columnstore/columnstore/versioning/BRM/masternode.cpp:161
        retries = 0
        err = <optimised out>
        arg = "fg"
        ign = {__sigaction_handler = {sa_handler = 0x100035f80 <fatalHandler(int)>, sa_sigaction = 0x100035f80 <fatalHandler(int)>}, sa_mask = {__val = {
              0 <repeats 16 times>}}, sa_flags = 0, sa_restorer = 0x0}
(gdb) p *this
$1 = {_vptr.TableLockServer = 0x7ffff6d2e608 <vtable for BRM::TableLockServer+16>, mutex = {m = pthread_mutex_t = {Type = Normal, 
      Status = Acquired, possibly with no waiters, Owner ID = 58439, Robust = No, Shared = No, Protocol = None}}, locks = std::map with 0 elements, 
  filename = "/var/lib/columnstore/data1/systemFiles/dbrm/tablelocks", sms = 0x100113d00}
(gdb) list
103	        return;
104	    }
105	
106	    try
107	    {
108	        in->read((char*) &size, 4);
109	
110	        for (i = 0; i < size; i++)
111	        {
112	            tli.deserialize(in.get());
(gdb) p size
$2 = 1
(gdb) p *in
Attempt to take address of value not located in memory.
(gdb) p in
$3 = {px = 0x100139cf0}
(gdb) p in->px
Attempt to take address of value not located in memory.
(gdb) p *in->px
Attempt to take address of value not located in memory.
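In the tablelocks case the crash is reported at tablelockserver.cpp:108, the in->read() of the entry count, and gdb could not dereference `in`. A defensive loader handles a missing file explicitly and checks every read before trusting the value it produced. The sketch below assumes a simplified, hypothetical layout (a 4-byte entry count followed by 8-byte lock IDs) and uses std::ifstream; `loadLocksChecked` is not the real TableLockServer::load or the TableLockInfo serialization.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical loader for a simplified lock file (NOT the real format):
// a 4-byte entry count followed by that many 8-byte lock IDs.
std::vector<std::uint64_t> loadLocksChecked(const std::string& filename)
{
    std::vector<std::uint64_t> ids;
    std::ifstream in(filename, std::ios::binary);
    if (!in)
        return ids;  // missing file: start with an empty lock table

    std::uint32_t size = 0;
    if (!in.read(reinterpret_cast<char*>(&size), sizeof size))
        throw std::runtime_error(filename + ": truncated header");

    for (std::uint32_t i = 0; i < size; i++)
    {
        std::uint64_t id = 0;
        // A count that promises more entries than the file holds is an error,
        // not something to deserialize blindly.
        if (!in.read(reinterpret_cast<char*>(&id), sizeof id))
            throw std::runtime_error(filename + ": truncated at entry " + std::to_string(i));
        ids.push_back(id);
    }

    assert(ids.size() == size);
    return ids;
}
```

Treating a missing file as an empty lock table is one choice; a caller could just as well report the missing file and exit, which matches the request later in this thread.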

Comment by Roman [ 2020-07-13 ]

controllernode is expected to fail if the directory doesn't exist: in the default single-node installation, the dbroot1 directory /var/lib/columnstore/data1/systemFiles must already be in place.

It seems that the dbbuilder command called in /usr/bin/columnstore-post-install failed in the first place, so you will have a lot of unexpected artifacts.
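Since the dbroot1 systemFiles directory is a hard requirement, controllernode could verify it once at startup and exit with a readable message instead of crashing later. A minimal C++17 sketch using std::filesystem; `checkDbrmDir` is a hypothetical name, and the systemFiles/dbrm layout is taken from the paths in the traces above.

```cpp
#include <cassert>
#include <filesystem>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical startup check: returns an empty string when the DBRM
// directory under `dbroot` exists, otherwise an error message that the
// caller can print before exiting with a non-zero status.
std::string checkDbrmDir(const fs::path& dbroot)
{
    assert(!dbroot.empty());
    const fs::path dbrm = dbroot / "systemFiles" / "dbrm";

    std::error_code ec;  // non-throwing overload: a missing path just returns false
    if (!fs::is_directory(dbrm, ec))
        return "missing directory " + dbrm.string() +
               " (did dbbuilder run during post-install?)";
    return "";
}
```

For example, a startup routine could call `checkDbrmDir("/var/lib/columnstore/data1")`, print any non-empty result to stderr, and return a non-zero exit status.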

Comment by Daniel Black [ 2020-07-17 ]

Agreed, it seems I had missing artifacts in the /var/lib/columnstore/data1/systemFiles/dbrm directory.

This is mainly a request that the controllernode display errors on missing files rather than segfaulting.

Comment by Roman [ 2020-07-20 ]

Thanks for letting me know about this. We'll make it less crash-prone.

Generated at Thu Feb 08 02:48:01 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.