The current implementation works as follows:
When a query is executed in the notebook, the kernel stores in memory the result set returned by the MariaDB server.
Only the result of the last executed query (SELECT, SHOW, or anything else that returns a result set) is kept in the kernel.
The user can then write `%df name_of_output_file.csv` in the notebook, and the kernel will write the result set returned by the last query to a CSV file.
The user can open a Python 3 notebook and import the result set with something like `dataframe = pandas.read_csv('./name_of_output_file.csv')`, which returns a DataFrame containing the data described above.
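The export step described above could be sketched roughly as below. The in-memory shape of the stored result set (`last_result` with `columns` and `rows`) and the function name are assumptions for illustration, not the kernel's actual internals:

```python
import csv

# Hypothetical in-memory result set, as the kernel might store it:
# the column names plus the rows returned by the last query.
last_result = {
    "columns": ["id", "name"],
    "rows": [(1, "alice"), (2, "bob")],
}

def write_result_csv(result, path):
    """Write the stored result set to a CSV file (illustrative sketch)."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(result["columns"])   # header row
        writer.writerows(result["rows"])     # data rows

write_result_csv(last_result, "name_of_output_file.csv")
```

The file produced this way is exactly what `pandas.read_csv` expects.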
The obvious flaw of this solution is that the result set of the last query is stored in memory. A large result set can easily crash the kernel. But at this stage of the project, I don't see any urgency to implement something more efficient.
Better / more efficient solutions:
1. Only store the text of the last query, e.g. `select * from mydatabase.huge_table`
When the user executes `%df file.csv`, the kernel takes the stored query and wraps it with `SELECT .. INTO OUTFILE`, so the result set is written out in CSV format by the server without the kernel loading the entire set into memory.
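The wrapping step could look roughly like this. The function name is hypothetical; note that `INTO OUTFILE` writes the file on the MariaDB *server's* filesystem, not the client's, which is a caveat any real implementation would have to handle:

```python
def wrap_into_outfile(query, outfile):
    """Wrap a stored SELECT so the server writes CSV directly (sketch).

    Caveat: INTO OUTFILE creates the file on the server host, so this
    only matches the current behaviour when server and kernel share a
    filesystem.
    """
    # Strip a trailing semicolon so the clause can be appended.
    query = query.rstrip().rstrip(";")
    return (
        f"{query} INTO OUTFILE '{outfile}' "
        "FIELDS TERMINATED BY ',' ENCLOSED BY '\"' "
        "LINES TERMINATED BY '\\n'"
    )

wrapped = wrap_into_outfile("select * from mydatabase.huge_table;", "file.csv")
```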
2. Change the `%df` magic command to accept as argument the query that the user wants to export as CSV.
`%df outputfile.csv "select * from mydatabase.huge_table"`
The kernel would simply wrap the query as in option 1.
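Argument parsing for this extended form could be sketched with `shlex`, which handles the quoted query cleanly. The function name and the two-argument convention are assumptions for illustration:

```python
import shlex

def parse_df_magic(line):
    """Parse the arguments of a hypothetical extended %df magic:

        %df outputfile.csv "select * from mydatabase.huge_table"

    Returns (filename, query); query is None in the one-argument form,
    in which case the kernel would fall back to the stored last query.
    """
    parts = shlex.split(line)
    filename = parts[0]
    query = parts[1] if len(parts) > 1 else None
    return filename, query

parsed = parse_df_magic('outputfile.csv "select * from mydatabase.huge_table"')
```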
A new issue should be opened when we decide we want a better solution for this magic command.