[MDBF-79] Decide on basic output formatting Created: 2020-05-25  Updated: 2022-02-01  Due: 2020-06-03

Status: Closed
Project: MariaDB Foundation Development
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major
Reporter: Kaj Arnö Assignee: Robert Bindar
Resolution: Unresolved Votes: 0
Labels: jupyter
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
PartOf
is part of MDBF-55 Development of Jupyter maria kernel, ... Closed

 Description   

Decide how to counter Robert’s observation that “the mariadb client does some weird stuff to make the output look better”



 Comments   
Comment by Vicențiu Ciorbaru [ 2020-06-01 ]

This will be problematic! Parsing output from mariadb client is tricky as the data itself can look like column separators.

For example, what if the table contains values like |? The output then looks rather messy, and there is no clear way to fix it. Looking at the mariadb client docs, passing the --raw and --batch flags to the client might be the key to distinguishing column data from column separators.
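To make the ambiguity concrete, here is a minimal sketch of a naive parser for the client's boxed table output. The function name `naive_split` and the sample row are made up for illustration; they are not taken from the kernel code.

```python
def naive_split(line):
    """Split one line of boxed table output on the '|' separator."""
    # Drop the outer borders, then split on every remaining pipe.
    inner = line.strip().strip("|")
    return [cell.strip() for cell in inner.split("|")]


# Hypothetical two-column row (id, note) whose note value is "a|b".
row = "| 1 | a|b |"
cells = naive_split(row)

# The pipe inside the data is indistinguishable from a column separator,
# so the parser reports three cells for a two-column row.
print(len(cells), cells)
```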

Comment by Kaj Arnö [ 2020-06-03 ]

That is why my preference would be to avoid the need for parsing, and instead write the contents via a CSV file using a pandas DataFrame. Then, no need to parse.

Robert?

Comment by Robert Bindar [ 2020-06-03 ]

Indeed it is problematic. I can't see all the edge cases now, and I guess we'll encounter more as we move forward with the project. For the case cvicentiu mentioned above, --batch is probably not strictly needed, and --raw only disables escaping of special characters. With --silent, you get rid of the special bytes used for beautifying text in the terminal and also get escaping enabled for characters such as \n and \t. But indeed, the arbitrary data that can reside in some of our fields might give us headaches.
One solution is to do what Kaj said, with something like this (https://mariadb.com/kb/en/select-into-outfile/#example): wrap selects in a SELECT ... INTO OUTFILE statement, get the output in a CSV file, and then stream that to jupyterlab to display it as a nice dataframe-like table in the notebook.
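A rough sketch of the reading side of the CSV route, using only Python's standard `csv` module. It assumes the file was written with FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' and the default backslash escape character; the sample content stands in for a real outfile and is invented for illustration.

```python
import csv
import io

# Stand-in for the contents of a file produced by SELECT ... INTO OUTFILE
# (assumed options: comma-terminated, optionally quote-enclosed fields,
# backslash as the escape character).
sample = '1,"a|b"\n2,"c,d"\n3,"say \\"hi\\""\n'

reader = csv.reader(
    io.StringIO(sample),
    delimiter=",",
    quotechar='"',
    escapechar="\\",
    doublequote=False,  # quotes inside fields are backslash-escaped, not doubled
)
rows = list(reader)

# Pipes, commas and quotes inside the data stay inside their field.
for row in rows:
    print(row)
```

If pandas is available, `pandas.read_csv` accepts equivalent `quotechar` and `escapechar` arguments and would return a DataFrame directly.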
Another one is to get the results in XML format directly from the client (./client/mariadb -X), or even in HTML format (./client/mariadb -H) if that helps us more when delivering the data to the jupyterlab interface. Going from XML/HTML to a dataframe format, for the case where we want to make the results available to other jupyterlab tabs running ipython, should be doable with just a few Python library calls.
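As an illustration of the XML route, the sketch below turns a result set into a list of dictionaries with the standard library. The sample document is hand-written to follow the resultset/row/field layout the client emits with -X; the statement and values are invented.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the layout produced by the client's -X option.
sample = """<?xml version="1.0"?>
<resultset statement="SELECT id, note FROM t" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <row>
    <field name="id">1</field>
    <field name="note">a|b</field>
  </row>
  <row>
    <field name="id">2</field>
    <field name="note" xsi:nil="true" />
  </row>
</resultset>"""

root = ET.fromstring(sample)

# One dict per row, keyed by column name; an empty (NULL) field has text None.
records = [
    {field.get("name"): field.text for field in row.findall("field")}
    for row in root.findall("row")
]
print(records)
```

A list of dictionaries like `records` is a shape that `pandas.DataFrame` accepts directly, should a DataFrame be needed on the ipython side.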

Comment by Kaj Arnö [ 2020-06-03 ]

My suggestion: Let's avoid even trying to parse --raw output (due to the issues mentioned by Vicențiu).

I think all the three other options you list – CSV, XML, HTML – sound safer and better. Feel free to experiment!

Comment by Robert Bindar [ 2020-10-28 ]

`--silent --html` seemed like the best choice.
`--silent` helps us get rid of information we don't need, such as the description text and version info printed when the client first starts, as well as some characters used for terminal formatting of the text.
`--html` makes our life much easier: query results can easily be wrapped in a jupyter message, and the Notebook displays them nicely, as if they were pandas DataFrames. We also escape the dangers of parsing and avoid collisions with arbitrary data that could be returned from MariaDB tables.
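A minimal sketch of how this choice could be wired up, under several assumptions: the helper names `client_command` and `display_data_content` are hypothetical, the client is invoked once per query (the kernel itself may instead keep a client session open), and the sample table is hand-written rather than captured from a real run. The content dictionary follows the `display_data` layout of the Jupyter messaging protocol.

```python
def client_command(query, client="mariadb"):
    """Build an argument list that runs one query with the chosen flags."""
    return [client, "--silent", "--html", "--execute", query]


def display_data_content(html_table):
    """Wrap the client's HTML output as display_data message content."""
    return {
        "data": {"text/html": html_table},
        "metadata": {},
    }


# Hand-written stand-in for the client's HTML output.
sample_html = (
    "<TABLE BORDER=1><TR><TH>id</TH><TH>note</TH></TR>"
    "<TR><TD>1</TD><TD>a|b</TD></TR></TABLE>"
)

command = client_command("SELECT id, note FROM t")
content = display_data_content(sample_html)
print(command)
print(content["data"]["text/html"])
```

In a kernel built on ipykernel, a dictionary like `content` is what would be passed to `send_response` on the iopub socket with message type `display_data`; the HTML is handed over untouched, so no parsing of cell values takes place.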

Issue to be reopened if at some point we realize there are better options available.

Generated at Thu Feb 08 03:35:18 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.