Details
Type: New Feature
Status: Open
Priority: Major
Resolution: Unresolved
Description
Currently, a string cell can be stored in one of three ways:
1. Strings shorter than the configurable InlineLimit (20 bytes by default) are stored directly in rgdata
2. Strings between InlineLimit and 65532 bytes are stored in chunks of 64 KiB
3. All other strings are stored in dynamic memory
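The choice between the three cases can be sketched as a small classifier. This is only an illustration: the names StringStorage, classify, kDefaultInlineLimit and kMaxChunkString are hypothetical, and whether each boundary is strict or inclusive is an assumption, since the description does not say.

```cpp
#include <cstddef>

// Hypothetical names; only the thresholds come from the description.
enum class StringStorage { Inline, Chunk, Heap };

constexpr std::size_t kDefaultInlineLimit = 20;  // default of the configurable InlineLimit
constexpr std::size_t kMaxChunkString = 65532;   // longest string kept in a 64 KiB chunk

// Pick the storage case for a string of `len` bytes.
// Assumption: "less than InlineLimit" is strict, the 65532 bound is inclusive.
inline StringStorage classify(std::size_t len,
                              std::size_t inlineLimit = kDefaultInlineLimit) {
    if (len < inlineLimit) return StringStorage::Inline;   // case 1
    if (len <= kMaxChunkString) return StringStorage::Chunk;  // case 2
    return StringStorage::Heap;                            // case 3
}
```

With the default limit, a 5-byte string would be classified as inline, a 1000-byte string as chunked, and a 70000-byte string as heap-allocated.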
In cases 2 and 3, there is a logical memory leak when a new value is written to a cell: the pointer to the data is simply overwritten without releasing the previous data. For example, when processing a query like select min(text_field) from table, in the worst case every value of text_field ends up in the rgdata string storage, although only one of them is still referenced.
What can be improved:
- In cases 2 and 3, if the new string is not longer than the old one, we could reuse the already allocated memory
- In case 3, we could free the memory used by the old cell value
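Both improvements can be sketched for a dynamically allocated cell (case 3). This is a minimal sketch under assumptions, not the actual rgdata code: HeapStringCell, assign and release are hypothetical names, and the real cell layout is not described in this ticket.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>

// Hypothetical stand-in for a case 3 cell: an owned heap buffer plus sizes.
struct HeapStringCell {
    char* data = nullptr;
    std::size_t length = 0;    // bytes currently in use
    std::size_t capacity = 0;  // bytes allocated
};

// Overwrite the cell value. Returns true if the existing buffer was reused.
inline bool assign(HeapStringCell& cell, const char* src, std::size_t len) {
    if (cell.data != nullptr && len <= cell.capacity) {
        std::memcpy(cell.data, src, len);  // new value fits: reuse the allocation
        cell.length = len;
        return true;
    }
    char* fresh = static_cast<char*>(std::malloc(len ? len : 1));
    if (fresh == nullptr) throw std::bad_alloc();
    std::memcpy(fresh, src, len);
    std::free(cell.data);  // release the old value instead of dropping the pointer
    cell.data = fresh;
    cell.length = len;
    cell.capacity = len;
    return false;
}

// Free the buffer and reset the cell to the empty state.
inline void release(HeapStringCell& cell) {
    std::free(cell.data);
    cell = HeapStringCell{};
}
```

In a min() aggregation, each new candidate that is not longer than the stored one would overwrite the buffer in place, and a longer candidate would replace it while freeing the old buffer, so at most one allocation per cell stays live.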
Case 2 makes it difficult to get rid of the leaks, since the values of different cells, including live ones, can be stored in one chunk. It might be better to abandon chunk storage and always store strings separately, as in case 3. The disadvantages of this approach include memory fragmentation and an increased number of allocations.
Alternatively, garbage collection could be performed at certain points, for example before dumping rgdata to disk during disk aggregation, or when marshaling RGData from one SQL operator to another.
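One possible shape for such a garbage collection pass is a copying compaction: keep only the values that cells still reference and drop everything else. The sketch below is a simplified model, not the rgdata layout: StringArena, CellRef and compact are hypothetical names, a single byte buffer stands in for the 64 KiB chunks, and each cell is assumed to own its value exclusively.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model: a cell references its current value by offset and length.
struct CellRef {
    std::size_t offset;
    std::size_t length;
};

// One append-only byte buffer standing in for the chunk storage.
struct StringArena {
    std::string bytes;  // every value ever written, live or dead

    CellRef append(const std::string& value) {
        CellRef ref = {bytes.size(), value.size()};
        bytes += value;
        return ref;
    }
};

// Copy only the values still referenced by `cells` into a fresh arena and
// repoint each cell. Returns the number of bytes reclaimed.
// Assumption: no two cells share the same stored value.
inline std::size_t compact(StringArena& arena, std::vector<CellRef>& cells) {
    StringArena fresh;
    for (CellRef& cell : cells) {
        cell = fresh.append(arena.bytes.substr(cell.offset, cell.length));
    }
    const std::size_t reclaimed = arena.bytes.size() - fresh.bytes.size();
    arena = std::move(fresh);
    return reclaimed;
}
```

Running such a pass just before a dump or a marshaling step would mean that only live strings are written out or transferred, at the cost of one extra copy of the live data.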