[MCOL-3360] Make the CS-side and SM-side safe vs connection failures Created: 2019-06-04 Updated: 2019-09-26 Resolved: 2019-09-26 |
|
| Status: | Closed |
| Project: | MariaDB ColumnStore |
| Component/s: | ? |
| Affects Version/s: | None |
| Fix Version/s: | 1.4.0 |
| Type: | Task | Priority: | Major |
| Reporter: | Patrick LeBlanc (Inactive) | Assignee: | Ben Thompson (Inactive) |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Epic Link: | ObjectStore support |
| Description |
|
When stopping and starting SM (but not CS), I've noticed that the CS side stops instead of reconnecting. We need to make sure CS keeps trying to reconnect after a connection failure, and that SM can handle connection failures as well. We also have to spell out exactly what each side should do when it gets a connection error at each point of processing. Common sense dictates the behavior; we just need to codify it. |
| Comments |
| Comment by Patrick LeBlanc (Inactive) [ 2019-08-14 ] |
|
Made the CS side retry the connection for up to 10s when it notices the connection went down. CS can now resume as if nothing happened if SM goes away and comes back. I suspect there's more to do for this ticket. |
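The retry behavior described in this comment can be sketched roughly as follows. This is a minimal illustration in Python, not the ColumnStore code: the function name `connect_with_retry`, the 0.25s retry interval, and the injectable `connect`, `clock`, and `sleep` parameters are all assumptions made for the example. Only the 10s limit comes from the comment.

```python
import socket
import time


def connect_with_retry(address, timeout_s=10.0, retry_interval_s=0.25,
                       connect=socket.create_connection,
                       clock=time.monotonic, sleep=time.sleep):
    """Keep trying to connect until it succeeds or timeout_s has elapsed.

    Hypothetical helper: connect/clock/sleep are injectable so the loop
    can be exercised without a real peer or real waiting.
    """
    deadline = clock() + timeout_s
    while True:
        try:
            return connect(address)
        except OSError as err:
            # Give up only once the deadline has passed; otherwise wait
            # briefly and try again.
            if clock() >= deadline:
                raise TimeoutError(
                    f"could not reconnect within {timeout_s}s") from err
            sleep(retry_interval_s)
```

A caller would invoke this when it detects the connection went down, then replay or resume the interrupted request once a socket is returned. What to do when the deadline expires is exactly the per-step policy the description asks to codify.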
| Comment by Patrick LeBlanc (Inactive) [ 2019-08-26 ] |
|
Noticed a potentially significant optimization opportunity. Writetask (and, I assume, appendtask) breaks the data to write into chunks of at most 1MB before calling IOC::write/append(). That made sense when IOC was a pass-through for syscalls (M1), but doesn't make sense now. Going to make a ticket for that and investigate. |
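The chunking pattern described in this comment looks roughly like the sketch below. `FakeIOC`, `write_chunked`, and `write_whole` are hypothetical stand-ins written for illustration, not ColumnStore code, and the 1MB cap is assumed here to mean 1 MiB. The sketch only shows that the same payload reaches the coordinator in several calls rather than one; whether that costs anything significant is what the follow-up ticket is meant to investigate.

```python
CHUNK_SIZE = 1024 * 1024  # 1MB cap from the comment, assumed to be 1 MiB


class FakeIOC:
    """Hypothetical stand-in for the IO coordinator; records each call."""

    def __init__(self):
        self.calls = []          # one (offset, length) entry per write()
        self.data = bytearray()  # contents written so far

    def write(self, offset, buf):
        self.calls.append((offset, len(buf)))
        end = offset + len(buf)
        if len(self.data) < end:
            # Zero-fill up to the end of this write.
            self.data.extend(bytes(end - len(self.data)))
        self.data[offset:end] = buf
        return len(buf)


def write_chunked(ioc, offset, payload, chunk_size=CHUNK_SIZE):
    """Split payload into chunks of at most chunk_size, one call each."""
    written = 0
    while written < len(payload):
        chunk = payload[written:written + chunk_size]
        written += ioc.write(offset + written, chunk)
    return written


def write_whole(ioc, offset, payload):
    """Hand the whole payload to the coordinator in a single call."""
    return ioc.write(offset, payload)
```

With `chunk_size=4`, a 10-byte payload goes through `write_chunked` as three calls (4, 4, and 2 bytes), while `write_whole` issues one call and leaves the same bytes in `FakeIOC.data`.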
| Comment by Ben Thompson (Inactive) [ 2019-09-16 ] |
|
While writing unit_tests, found an issue where SM would "read" more data from the buffer than was actually sent over the socket. Adding more tests and fixing. |
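The class of bug described in this comment usually comes from treating the whole receive buffer as valid instead of only the bytes the socket call reported. The sketch below shows one way to guard against that. It is a Python illustration rather than the SM code, and `recv_exact` is a hypothetical helper name.

```python
import socket


def recv_exact(sock, length):
    """Read exactly `length` bytes, trusting only what recv_into() reports."""
    buf = bytearray(length)
    view = memoryview(buf)
    received = 0
    while received < length:
        # recv_into() may return fewer bytes than requested; only the
        # first n bytes of the slice are valid after each call.
        n = sock.recv_into(view[received:])
        if n == 0:
            # Peer closed the connection before the full message arrived.
            raise ConnectionError(
                f"peer closed after {received} of {length} bytes")
        received += n
    return bytes(buf)
```

A unit test for this can use `socket.socketpair()`, send a message in several pieces, and check that the reader returns exactly the bytes sent and raises when the peer closes early.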