[MCOL-455] redistribute data's "START REMOVE" option did not move data from the requested dbroot Created: 2016-12-08  Updated: 2023-10-26  Resolved: 2017-01-23

Status: Closed
Project: MariaDB ColumnStore
Component/s: None
Affects Version/s: 1.0.6
Fix Version/s: 1.0.7

Type: Bug Priority: Minor
Reporter: Daniel Lee (Inactive) Assignee: Ben Thompson (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Environment:

Issue Links:
Relates
relates to MCOL-788 redistributeData does nothing for par... Closed
Sprint: 2016-24, 2016-25, 2017-01, 2017-2

 Description   

Build tested: 1.0.6-1

mcsadmin> getsoft
getsoftwareinfo Thu Dec 8 15:32:44 2016

Name : mariadb-columnstore-platform
Version : 1.0.6
Release : 1
Architecture: x86_64
Install Date: Thu 08 Dec 2016 03:21:18 PM UTC
Group : Applications/Databases
Size : 10017001
License : Copyright (c) 2016 MariaDB Corporation Ab., all rights reserved; redistributable under the terms of the GPL, see the file COPYING for details.
Signature : (none)
Source RPM : mariadb-columnstore-platform-1.0.6-1.src.rpm
Build Date : Wed 07 Dec 2016 07:08:09 PM UTC

This issue was identified when testing MCOL-307.

"redistributedata START REMOVE" does not move any data. It finishes immediately.

mcsadmin> redistributedata start remove 3
redistributedata Wed Dec 7 19:07:42 2016
redistributeData START Removing dbroots: 3
Source dbroots: 1 2 3 4
Destination dbroots: 1 2 4

WriteEngineServer returned status 1: Cleared.
WriteEngineServer returned status 2: Redistribute is started.
mcsadmin> redistributedata status
redistributedata Wed Dec 7 19:07:47 2016
WriteEngineServer returned status 3: Redistribute is finished.
0 success, 0 skipped, 0 failed.
Total time: 0 seconds.

[6:10]
+-----------+-----------------------+
| count(*)  | idbdbroot(l_orderkey) |
+-----------+-----------------------+
| 332115454 |                     2 |
| 319282160 |                     4 |
| 336056627 |                     1 |
| 338592834 |                     3 |
+-----------+-----------------------+
4 rows in set (8.80 sec)



 Comments   
Comment by David Hall (Inactive) [ 2017-01-06 ]

There are two issues involved here. The first is a logic error that caused the algorithm to stop too soon in certain circumstances. The second is built-in logic that prevents moving segments of a partition to a dbroot that already contains segments from that partition. This logic is designed to keep segments distributed across the dbroots. However, when removing a dbroot, it may be necessary to move segments to a dbroot that already holds segments of that partition. This is especially true when a partition is spread across all dbroots.

Logic was added to allow segments to be moved to a dbroot that already holds segments of that partition, but only when a dbroot is being removed.

This is not an ideal solution. If one were to remove one or more dbroots, say because the hardware was being replaced, and then move the data back later, our current algorithm would not redistribute the segments properly. It would move data such that the new hardware would have the correct amount of data, but those segments piled on top of each other in a dbroot would forever remain fused together.

The algorithm will eventually be changed to account for segment distribution, attempting to keep them as distributed as feasible.
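
A minimal sketch of the destination choice described above, written in Python and not taken from the WriteEngineServer code. The function name pick_target, its parameters, and the tie-break on segment count are assumptions made for illustration only; the actual algorithm is not shown in this issue.

```python
def pick_target(partition_roots, segment_counts, targets, removing):
    """Pick a destination dbroot for one segment of a partition.

    partition_roots: set of dbroots already holding segments of the partition
    segment_counts:  dict of dbroot -> number of segments stored there
    targets:         list of candidate destination dbroots
    removing:        True when the run is a START REMOVE
    """
    # Normal rule: only consider dbroots that do not yet hold this partition.
    free = [r for r in targets if r not in partition_roots]
    if free:
        return min(free, key=lambda r: segment_counts[r])
    # Relaxed rule (assumed reading of the fix): when a dbroot is being
    # removed, fall back to a dbroot that already holds the partition.
    if removing:
        return min(targets, key=lambda r: segment_counts[r])
    # No legal destination, so the segment stays where it is.
    return None


# Hypothetical segment counts for the remaining dbroots.
counts = {1: 5, 2: 5, 4: 5}
# The partition is spread across all dbroots and dbroot 3 is being removed.
print(pick_target({1, 2, 3, 4}, counts, [1, 2, 4], removing=True))
print(pick_target({1, 2, 3, 4}, counts, [1, 2, 4], removing=False))
```

With the relaxed rule the segment gets a destination; without it there is no legal destination at all, which matches the case of a partition spread across all dbroots.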

Comment by Daniel Lee (Inactive) [ 2017-01-20 ]

Build tested: 1.0.7-1

mcsadmin> getsoft
getsoftwareinfo Fri Jan 20 14:33:41 2017

Name : mariadb-columnstore-platform
Version : 1.0.7
Release : 1
Architecture: x86_64
Install Date: Fri 20 Jan 2017 12:37:05 AM UTC
Group : Applications/Databases
Size : 10001266
License : Copyright (c) 2016 MariaDB Corporation Ab., all rights reserved; redistributable under the terms of the GPL, see the file COPYING for details.
Signature : (none)

Found a couple of issues:

1) WriteEngineServer crashed and redistributedata returned a failed status

mcsadmin> redistributedata start
redistributedata Thu Jan 19 19:19:07 2017
redistributeData START
Source dbroots: 1 2
Destination dbroots: 1 2

WriteEngineServer returned status 1: Cleared.
WriteEngineServer returned status 2: Redistribute is started.
mcsadmin> redistributedata status
redistributedata Thu Jan 19 19:19:13 2017
WriteEngineServer returned status 2: Redistribute is in progress: total 5 logical partitions are planned to move.
0 success, 0 skipped, 0 failed, 0%.
mcsadmin> redistributedata status
redistributedata Thu Jan 19 19:19:20 2017
WriteEngineServer returned status 2: Redistribute is in progress: total 5 logical partitions are planned to move.
0 success, 0 skipped, 0 failed, 0%.
mcsadmin> redistributedata status
redistributedata Thu Jan 19 19:26:53 2017
WriteEngineServer returned status 3: Redistribute is finished.
5 success, 0 skipped, 0 failed.
Total time: 202 seconds.

[1:36]
On a 1um2pm system, it finished with no error. It was run on the UM. I exited and went to both PMs to check the dbroot data size using the du command.

[1:37]
On the UM, I then ran redistributedata again and it failed.

[1:37]
mcsadmin> redistributedata start
redistributedata Thu Jan 19 19:29:36 2017
redistributeData START
Source dbroots: 1 2
Destination dbroots: 1 2

WriteEngineServer returned status 1: Cleared.
WriteEngineServer returned status 2: Redistribute is started.
mcsadmin> redistributedata status
redistributedata Thu Jan 19 19:29:41 2017
WriteEngineServer returned status 5: Redistribute is failed.

mcsadmin> redistributedata status
redistributedata Thu Jan 19 19:30:03 2017
WriteEngineServer returned status 5: Redistribute is failed.

mcsadmin> redistributedata status
redistributedata Thu Jan 19 19:34:14 2017
WriteEngineServer returned status 5: Redistribute is failed.

[1:38]
PM1 indicated that WriteEngineServer was restarted.

[1:38]
In the crit.log file:
[root@localhost columnstore]# cat crit.log
Jan 19 19:29:37 localhost ProcessMonitor[12450]: 37.568064 |0|0|0| C 18 CAL0000: *****Calpont Process Restarting: WriteEngineServer, old PID = 14778

2) The "start remove n" option does not distribute data evenly among the remaining dbroots

MariaDB [mytest]> select idbdbroot(o_orderkey) d, count(*) from orders group by d;
+------+----------+
| d    | count(*) |
+------+----------+
|    2 | 75380000 |
|    4 | 75370000 |
|    1 | 75370000 |
|    3 | 75380000 |
+------+----------+
4 rows in set (25.24 sec)

[8:29]
MariaDB [mytest]> select idbdbroot(o_orderkey) d, count(*) from orders group by d;
+------+-----------+
| d    | count(*)  |
+------+-----------+
|    2 |  75380000 |
|    4 | 150750000 |
|    1 |  75370000 |
+------+-----------+
3 rows in set (10.47 sec)
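
The imbalance in the two query results above can be checked with a little arithmetic. This is only a sketch in Python using the row counts reported in this comment; the variable names are made up for illustration.

```python
# Row counts per dbroot, copied from the two query results in this comment.
before = {1: 75370000, 2: 75380000, 3: 75380000, 4: 75370000}
after = {1: 75370000, 2: 75380000, 4: 150750000}

# Rows gained by each remaining dbroot after removing dbroot 3.
gained = {root: after[root] - before[root] for root in after}

# Rows each remaining dbroot would hold under a perfectly even split.
even_share = sum(before.values()) // len(after)

print(gained)
print(even_share)
```

Under these numbers dbroot 4 received all 75380000 rows from dbroot 3, while dbroots 1 and 2 received none. An even split would leave about 100500000 rows on each remaining dbroot.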

Comment by Daniel Lee (Inactive) [ 2017-01-20 ]

MariaDB [mytest]> select idbdbroot(o_orderkey) d, idbpartition(o_orderkey) p, count(*) from orders group by d, p;
+------+-------+----------+
| d    | p     | count(*) |
+------+-------+----------+
|    2 | 0.0.2 | 16764000 |
|    2 | 4.0.2 |  8287328 |
|    4 | 0.3.4 | 16764000 |
|    2 | 1.0.2 | 16777216 |
|    4 | 2.2.4 | 16774240 |
|    1 | 3.2.1 | 16777216 |
|    1 | 2.0.1 | 16774240 |
|    4 | 2.3.4 | 16774240 |
|    4 | 4.1.4 |  8287328 |
|    4 | 4.3.4 |  8279136 |
|    4 | 1.1.4 | 16777216 |
|    2 | 2.1.2 | 16774240 |
|    4 | 3.1.4 | 16777216 |
|    4 | 3.3.4 | 16777216 |
|    4 | 1.2.4 | 16777216 |
|    1 | 0.1.1 | 16762192 |
|    4 | 0.2.4 | 16762192 |
|    2 | 3.0.2 | 16777216 |
|    1 | 4.2.1 |  8279136 |
|    1 | 1.3.1 | 16777216 |
+------+-------+----------+
20 rows in set (1 min 0.64 sec)

Comment by Daniel Lee (Inactive) [ 2017-01-20 ]

I forgot to mention issue #3

3) The help text for redistributedata does not have the "START REMOVE n" option.

Comment by David Hall (Inactive) [ 2017-01-20 ]

The crash is caused by a rewind of an unopened plan file when no data is scheduled to be moved. This happens in the displayPlan() function.
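
The failure mode can be sketched as follows. This is an illustration in Python, not the C++ code from the fix; display_plan and its plan_file parameter are hypothetical stand-ins for displayPlan() and the plan file handle. The point is the guard: skip the rewind when no plan file was opened because nothing is scheduled to move.

```python
import io


def display_plan(plan_file):
    """Return the plan entries, or an empty list when no plan file exists."""
    # Guard: with nothing scheduled to move, no plan file was opened,
    # so there is nothing to rewind or read.
    if plan_file is None or plan_file.closed:
        return []
    plan_file.seek(0)  # rewind to the start of the plan
    return [line.strip() for line in plan_file if line.strip()]


# No plan file at all: the guarded version returns an empty plan.
print(display_plan(None))
# An in-memory stand-in for a plan file with made-up entries.
print(display_plan(io.StringIO("entry-1\nentry-2\n")))
```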

Comment by Andrew Hutchings (Inactive) [ 2017-01-20 ]

The fix for the problem Daniel found has been merged.

Comment by Daniel Lee (Inactive) [ 2017-01-21 ]

Build tested: 1.0.7-1

mcsadmin> getsoft
getsoftwareinfo Fri Jan 20 23:58:59 2017

Name : mariadb-columnstore-platform
Version : 1.0.7
Release : 1
Architecture: x86_64
Install Date: Fri 20 Jan 2017 10:46:12 PM UTC
Group : Applications/Databases
Size : 10001348
License : Copyright (c) 2016 MariaDB Corporation Ab., all rights reserved; redistributable under the terms of the GPL, see the file COPYING for details.
Signature : (none)
Source RPM : mariadb-columnstore-platform-1.0.7-1.src.rpm
Build Date : Fri 20 Jan 2017 05:03:05 PM UTC

Retested and not seeing the write engine crashing issue anymore. Still waiting for the help text to be fixed.

Comment by David Thompson (Inactive) [ 2017-01-21 ]

Update the help text to the following (the only change is the new START REMOVE line in the arguments):

Command: redistributeData

Description: Redistribute table data accross all dbroots to balance disk usage

Arguments: START to begin a redistribution
START REMOVE n to being a redistribution where data is removed from dbroot 'n'
STOP to stop redistribution before completion
STATUS to to view statistics and progress

Comment by David Hill (Inactive) [ 2017-01-21 ]

review pull request

Comment by David Hill (Inactive) [ 2017-01-21 ]

development test with new build

mcsadmin> help redistributeData
help Fri Jan 20 22:50:28 2017

Command: redistributeData

Description: Redistribute table data accross all dbroots to balance disk usage

Arguments: START to begin a redistribution
START REMOVE n to being a redistribution where data is removed from dbroot 'n'
STOP to stop redistribution before completion
STATUS to to view statistics and progress

Comment by Daniel Lee (Inactive) [ 2017-01-23 ]

Build verified: 1.0.7-1

Name : mariadb-columnstore-platform
Version : 1.0.7
Release : 1
Architecture: x86_64
Install Date: Mon 23 Jan 2017 04:14:28 PM UTC
Group : Applications/Databases
Size : 10013744
License : Copyright (c) 2016 MariaDB Corporation Ab., all rights reserved; redistributable under the terms of the GPL, see the file COPYING for details.
Signature : (none)
Source RPM : mariadb-columnstore-platform-1.0.7-1.src.rpm
Build Date : Sat 21 Jan 2017 10:09:56 PM UTC

mcsadmin> help redistributedata
help Mon Jan 23 18:32:19 2017

Command: redistributeData

Description: Redistribute table data accross all dbroots to balance disk usage

Arguments: START to begin a redistribution
START REMOVE n to being a redistribution where data is removed from dbroot 'n'
STOP to stop redistribution before completion
STATUS to to view statistics and progress

Made a suggestion for improving the help text, which is now tracked in MCOL-496:

For redistributedata, we may want to consider changing this line of help text. We can do it later, and we don't need to redo the packages, though. "START REMOVE n to being a redistribution where data is removed from dbroot 'n'". The word "removed" is just a bit too alarming for me. Something like "...where data is moved off dbroot n" may be a bit better.

Generated at Thu Feb 08 02:21:12 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.