[MCOL-1077] Two applications using Bulk Insert API Created: 2017-12-04  Updated: 2023-10-26  Resolved: 2018-07-20

Status: Closed
Project: MariaDB ColumnStore
Component/s: None
Affects Version/s: 1.1.2, 1.1.6
Fix Version/s: 1.1.3

Type: New Feature Priority: Minor
Reporter: Sasha V Assignee: Andrew Hutchings (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Environment:

Multi-server ColumnStore instance


Issue Links:
Relates
relates to MCOL-1079 mcsapi getTableLock not failing Closed
relates to MCOL-1094 mcsapi should have view/clear table l... Closed
relates to MCOL-1108 After rollback() an active transactio... Closed
relates to MCOL-1362 Add a export function that utilizes (... Closed
relates to MCOL-1726 mcsapi stale transactions Closed

 Description   

Thank you for releasing Bulk Insert API that enable applications to stream data remotely to the ColumnStore nodes.

By their nature, data streaming applications run continuously. Redundant applications could increase data streaming uptime, since if one application fails, a second application would still be running.

For such redundancy, I would like to run two applications that write to the same table from remote hosts. Applications alternate holding the table write lock in a following sequence:

The first application:

  • polls the database to check that the table is not locked;
  • when the table is not locked, calls createBulkInsert() and locks the table;
  • buffers data by calling writeRow();
  • calls commit() successfully and releases the lock.

The second application:

  • polls the database to check that the table is not locked;
  • when the table is no longer locked, the second application calls createBulkInsert() and locks the table;
  • buffers data by calling writeRow();
  • calls commit() successfully and releases the lock.

Then first application gets the table lock, and so on.

Could the Bulk Insert API be extended with a call to check if the table is currently locked?



 Comments   
Comment by Andrew Hutchings (Inactive) [ 2017-12-05 ]

This can be done, but in your scenario there could be a race which would probably need to be accounted for:

1. app 1 checks the lock, sees nothing
2. app 2 checks the lock, sees nothing
3. app 1 locks the table
4. app 2 attempts to lock the table and fails

In this scenario you could catch the failure at 4 for the same effect. I think there is a bug where 4 isn't happening though but I've hit another issue too. I'll file some bugs around this.

Comment by Sasha V [ 2017-12-10 ]

Hi linuxjedi,

Thank you for fixing the MCOL-1079 issue quickly. With that fix in place, I was able to test a different scenario:

  • Application 1 streams the data remotely to a ColumnStore table and, thus, holds the table lock.
  • Application 2 runs (e.g. on a different node) as a standby/failover process. The application calls createBulkInsert() periodically to check if it can acquire the table lock (held by Application 1).

Hopefully, in case of an unexpected failure of the Application 1, the table lock will be released. In that case, the Application 2 gets the table lock and continues data streaming that was interrupted due to the Application 1 failure.

How can I assure that the table lock will be released upon the failure of the Application 1?

Comment by Andrew Hutchings (Inactive) [ 2017-12-10 ]

Hi,

The table lock will be automatically released if the ColumnStoreBulkInsert is object is deleted (or on commit/rollback). If your application crashes or ends before that happens the lock will remain and will need to be removed with viewTableLock/clearTableLock.

Comment by Sasha V [ 2017-12-11 ]

Hi linuxjedi

Could the Bulk Insert API be extended with the (remote) viewTableLock/clearTableLock functionality then?

Comment by Andrew Hutchings (Inactive) [ 2017-12-11 ]

It is possible. I have created MCOL-1094 to track those functions.

Comment by Sasha V [ 2017-12-12 ]

Should the Bulk Insert API be extended with the (remote) print transactions functionality as well?

After several tests that kill my Bulk Write SDK applications with various signals, I ended up with the MariaDB ColumnStore system that has no table locks, but has active transactions reported by the shutdown command:

mcsadmin shutdown y 
shutdownsystem   Tue Dec 12 15:42:24 2017
 
This command stops the processing of applications on all Modules within the MariaDB ColumnStore System
 
   Checking for active transactions
 
   There are active transactions being processed
Your options are:
    Cancel    -- Cancel the shutdown request
    Wait      -- Wait for write operations to end and then shutdown
    Force     -- Force a shutdown
What would you like to do: [Cancel]: Cancel

while the rollback -p command does not print any transactions and there is no file /usr/local/mariadb/columnstore/data1/systemFiles/dataTransaction

Comment by Andrew Hutchings (Inactive) [ 2017-12-12 ]

This message also appears if the DBRM is locked for some reason. I think it very unlikely that a transaction could exist without a lock. It could be that killing the API caused some other behaviour problem we haven't seen before. This would likely be observable in ColumnStore's err.log.

Comment by Sasha V [ 2017-12-12 ]

Upon handling kill signals, my Bulk Insert API application calls the rollback(). This call logs the following in /var/log/mariadb/columnstore/info.log

Dec 11 19:25:30 s_columnstore@mcs112 writeengine[796]: 30.254893 |0|0|0| I 19 CAL0084: ClearTableLock: Starting bulk rollback for table test_data.test (OID-3000); lock-123; initiated by BatchInsert.
Dec 11 19:25:30 s_columnstore@mcs112 writeengine[796]: 30.255409 |0|0|0| I 19 CAL0072: ClearTableLock: Rollback extents   for table test_data.test (OID-3000).
Dec 11 19:25:30 s_columnstore@mcs112 writeengine[796]: 30.255775 |0|0|0| I 19 CAL0090: ClearTableLock: Nothing to rollback for table test_data.test (OID-3000) on DBRoot1
Dec 11 19:25:30 s_columnstore@mcs112 writeengine[796]: 30.255927 |0|0|0| I 19 CAL0085: ClearTableLock: Ending bulk rollback for table test_data.test (OID-3000); lock-123; initiated by BatchInsert. (Nothing to rollback).

Similar lines are logged upon executing the clearTableLock command:

Dec 11 19:15:43 s_columnstore@mcs112 writeengine[14253]: 43.020218 |0|0|0| I 19 CAL0088: ClearTableLock: Initiating cleartablelock command for table test_data.test; lock-113
Dec 11 19:15:43 s_columnstore@mcs112 writeengine[796]: 43.023183 |0|0|0| I 19 CAL0084: ClearTableLock: Starting bulk rollback for table test_data.test (OID-3000); lock-113; initiated by cleartablelock command.
Dec 11 19:15:43 s_columnstore@mcs112 writeengine[796]: 43.023502 |0|0|0| I 19 CAL0072: ClearTableLock: Rollback extents   for table test_data.test (OID-3000).
Dec 11 19:15:43 s_columnstore@mcs112 writeengine[796]: 43.023565 |0|0|0| I 19 CAL0090: ClearTableLock: Nothing to rollback for table test_data.test (OID-3000) on DBRoot1
Dec 11 19:15:43 s_columnstore@mcs112 writeengine[796]: 43.023603 |0|0|0| I 19 CAL0085: ClearTableLock: Ending bulk rollback for table test_data.test (OID-3000); lock-113; initiated by cleartablelock command. (Nothing to rollback).
Dec 11 19:15:43 s_columnstore@mcs112 writeengine[14253]: 43.060225 |0|0|0| I 19 CAL0089: ClearTableLock: cleartablelock command for table test_data.test; lock-113.  Completed successfully

However these are preceded and followed by extra lines, which as absent in case of the BatchInsert.

Comment by Sasha V [ 2017-12-12 ]

For some reasons that are not clear to me, my test MariaDB ColumnStore instance has no err.log file:

ls -ltrFa /var/log/mariadb/columnstore/*log
-rwxr-xr-x 1 root root    127 Dec  8 15:36 /var/log/mariadb/columnstore/warning.log*
-rwxr-xr-x 1 root root    956 Dec  8 21:03 /var/log/mariadb/columnstore/crit.log*
-rw-r--r-- 1 root root  13437 Dec 11 14:50 /var/log/mariadb/columnstore/alarm.log
-rw------- 1 root root 272508 Dec 11 19:25 /var/log/mariadb/columnstore/info.log
-rw------- 1 root root 332995 Dec 11 21:18 /var/log/mariadb/columnstore/debug.log
-rw-r--r-- 1 root root   2990 Dec 12 15:42 /var/log/mariadb/columnstore/uiCommands.log

Comment by Andrew Hutchings (Inactive) [ 2017-12-12 ]

The two extra lines are actually logged by the "cleartablelock" command (hence the different process ID in the square braces). The API uses the same calls as cleartablelock when executing a rollback.

Comment by Sasha V [ 2017-12-13 ]

I reproduced the issue with basic_bulk_insert example as described in MCOL-1108.

Comment by David Thompson (Inactive) [ 2018-07-20 ]

Closing this as i think the fix to MCOL-1079 addressed this problem, can you confirm sasha ?

Comment by Sasha V [ 2018-09-17 ]

Alas, the problem is not fully resolved. Here is my test sequence:

1) On one Ubuntu 18.04 docker node I installed ColumnStore server 1.1.6 postConfigured as a multi-server (with a single node).
2) On another Ubuntu 18.04 docker node I installed ColumnStore C++ BulkWrite SDK source from columnstore-1.1.6.tar.gz
I also copied the Columnstore.xml from the server node to the api node.
The unmodified basic_bulk_insert example worked as expected, while inserting remotely.

I modified the basic_bulk_insert.cpp adding the sleep(10) equivalent by adding three lines as shown below:

...
#include <iostream>
#include <chrono>
#include <thread>
 
int main(void)
...
	std::this_thread::sleep_for(std::chrono::seconds(10));
        bulk->commit();
...

Then launched the "sleeping" basic_bulk_insert and terminated this process with kill -9.
As expected, the ColumnStore server left with the table lock:

/usr/local/mariadb/columnstore/bin/viewtablelock
 There is 1 table lock
 
  Table    LockID  Process  PID  Session  Txn  CreationTime              State    DBRoots  
  test.t1  26      mcsapi   410  127693   10   Mon Sep 17 01:23:22 2018  LOADING  1        

Also the second (original) basic_bulk_insert attempt resulted in

/usr/local/share/doc/mcsapi/example/basic_bulk_insert
Error caught: Table already locked by PID: 410 'mcsapi' session ID: 127693 txn ID: 10

As Andrew directed above, I cleared this lock on the server side with:

/usr/local/mariadb/columnstore/bin/cleartablelock 26
Rolling back and clearing table lock for table test.t1; table lock 26
 
Sending rollback request to PM1...
Successful rollback response from PM1
Sending cleanup request to PM1...
Successful cleanup response from PM1
 
Table lock 26 for table test.t1 is cleared.
 
/usr/local/mariadb/columnstore/bin/viewtablelock
 No tables are locked in the database.

After clearing the lock, the (original) basic_bulk_insert attempt was successful.

Unfortunately, the problem reported above, remains.

mcsadmin shutdown y
shutdownsystem   Mon Sep 17 01:32:28 2018
 
This command stops the processing of applications on all Modules within the MariaDB ColumnStore System
 
   Checking for active transactions
 
   There are active transactions being processed
Your options are:
    Cancel    -- Cancel the shutdown request
    Wait      -- Wait for write operations to end and then shutdown
    Force     -- Force a shutdown
What would you like to do: [Cancel]:

As earlier, the rollback -p command does not print any transactions and there is no file /usr/local/mariadb/columnstore/data1/systemFiles/dataTransaction

Comment by Andrew Hutchings (Inactive) [ 2018-09-17 ]

Actually what you are describing here is more like MCOL-1195 which should be fixed in 1.1.6. With this bug a stale transaction ID was being held by the system after a an abort due to a table already locked. I've filed MCOL-1726 to investigate this.

Generated at Thu Feb 08 02:26:01 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.