[MDBF-88] Provide Jupyter level integration of MariaDB and Python Created: 2020-05-25  Updated: 2022-02-01  Due: 2020-06-22

Status: Closed
Project: MariaDB Foundation Development
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major
Reporter: Kaj Arnö Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: jupyter
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
PartOf
is part of MDBF-55 Development of Jupyter maria kernel, ... Closed

 Description   

Provide Jupyter-level integration of MariaDB and Python through %magic commands that make SELECT results easily available in a separate Python Jupyter kernel as a pandas DataFrame. Going via a .csv file (using SELECT .. INTO OUTFILE and LOAD DATA INFILE) is enough initially; more efficient alternatives can be explored over time.
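As a rough sketch of the CSV route the description mentions, the two SQL statements involved could look like the following (table and file names are purely illustrative, not part of the actual implementation):

```python
# Hedged sketch of the CSV round-trip suggested in the description.
# Both files live on the MariaDB server host, and writing/reading them
# is subject to the server's secure_file_priv setting.

# Export a result set from MariaDB to a CSV file:
export_sql = (
    "SELECT * FROM mydb.mytable "          # illustrative table
    "INTO OUTFILE '/tmp/result.csv' "      # illustrative path
    "FIELDS TERMINATED BY ',' LINES TERMINATED BY '\\n'"
)

# Load a CSV file back into a MariaDB table:
import_sql = (
    "LOAD DATA INFILE '/tmp/result.csv' "
    "INTO TABLE mydb.mytable "
    "FIELDS TERMINATED BY ',' LINES TERMINATED BY '\\n'"
)
```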



 Comments   
Comment by Robert Bindar [ 2020-10-30 ]

The current implementation works as follows:

When a query is executed in the notebook, the kernel stores the result returned by the MariaDB server in memory.
Only the result of the last executed query (be it SELECT, SHOW, or anything else that returns a result set) is kept in the kernel.

The user can then write `%df name_of_output_file.csv` in the notebook, and the kernel will write the result set returned by the last query to a CSV file.
The user can open a Python 3 notebook and import the result set with something like `dataframe = pandas.read_csv('./name_of_output_file.csv')`, which returns a DataFrame object containing the data described above.
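The Python-notebook side of this workflow can be sketched as below. The sample CSV written at the top merely stands in for the file the `%df` magic would produce; its contents and the file name are illustrative:

```python
import pandas as pd

# Stand-in for the file the `%df` magic would write from the MariaDB
# notebook (illustrative data; in practice the kernel produces it).
with open("name_of_output_file.csv", "w") as f:
    f.write("id,name\n1,alpha\n2,beta\n")

# In the separate Python 3 notebook, import the result set:
dataframe = pd.read_csv("./name_of_output_file.csv")

# The DataFrame now holds the exported result set, ready for
# ordinary pandas analysis.
print(dataframe.shape)  # (2, 2): two rows, two columns
```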

The obvious flaw of this solution is that the result set of the last query is stored in memory; a big result set can easily crash the kernel. But at this stage in the project, I don't see any urgency to implement something more efficient.

Better / more efficient solutions:
1. Only store the text of the last query, e.g. `select * from mydatabase.huge_table`
When the user executes `%df file.csv`, the kernel takes the stored query and wraps it with `SELECT .. INTO OUTFILE`, so the result set is written directly to a CSV file without loading the entire set into memory (which could crash the kernel).
2. Change the `%df` magic command to accept as argument the query that the user wants to export as CSV.
`%df outputfile.csv "select * from mydatabase.huge_table"`
The kernel would just wrap the query as in option 1.
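The query-wrapping step common to both options could be sketched as follows. This is a hypothetical helper, not the actual kernel code, and it only handles simple queries where appending the clause at the end is valid SQL:

```python
# Hypothetical sketch of wrapping a stored query with INTO OUTFILE
# (function name and CSV options are illustrative assumptions).
def wrap_into_outfile(query: str, outfile: str) -> str:
    # Strip a trailing semicolon so the INTO OUTFILE clause attaches cleanly.
    query = query.strip().rstrip(";")
    return (
        f"{query} "
        f"INTO OUTFILE '{outfile}' "
        "FIELDS TERMINATED BY ',' ENCLOSED BY '\"' "
        "LINES TERMINATED BY '\\n'"
    )

print(wrap_into_outfile("select * from mydatabase.huge_table;", "/tmp/file.csv"))
```

With this approach the server streams rows straight to the file, so the kernel never materializes the result set in its own memory.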

A new issue should be opened when we decide we want a better solution for this magic command.

Generated at Thu Feb 08 03:35:22 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.