[MDEV-6125] Connect engine - cannot read XML file with default XML namespace defined Created: 2014-04-17 Updated: 2014-05-26 Resolved: 2014-05-23 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | None |
| Affects Version/s: | 10.0.10 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Rasmus Johansson (Inactive) | Assignee: | Olivier Bertrand |
| Resolution: | Not a Bug | Votes: | 0 |
| Labels: | Connect-Engine | ||
| Attachments: |
|
| Description |
|
It's not possible to read an XML file through the Connect engine if the XML file specifies a default XML namespace, i.e. the root node includes the attribute xmlns="[namespace URI]". Note that namespaces with a prefix work well, i.e. xmlns:[prefix]="[namespace URI]". I'm including an example to reproduce the problem. Attached is the file 20140401_1846_Running.gpx, which has the following root node:
In the root node the default namespace is given in the xmlns attribute:
Let's then create a Connect engine table over the XML file:
Do a SELECT over the created table:
Instead of two rows full of zeros and nulls, the result should have included many rows with non-zero, non-null values. Let's edit the XML file 20140401_1846_Running.gpx and remove the xmlns="http://www.topografix.com/GPX/1/1" attribute, then run the same SELECT over the table again. This time we get what we expected:
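The behaviour described above can be reproduced outside CONNECT. The following is a minimal sketch, assuming Python's standard `xml.etree.ElementTree` rather than CONNECT or libxml2, and using two hypothetical miniature documents in place of the attached GPX file. It shows that an unqualified lookup for `trkseg` finds nothing once a default namespace is declared, and finds the node again when the `xmlns` attribute is removed.

```python
import xml.etree.ElementTree as ET

GPX_NS = "http://www.topografix.com/GPX/1/1"

# Hypothetical miniature documents; they are NOT the attached GPX file.
WITH_DEFAULT_NS = (
    f'<gpx xmlns="{GPX_NS}"><trk><trkseg>'
    '<trkpt lat="1.0" lon="2.0"/></trkseg></trk></gpx>'
)
WITHOUT_NS = (
    '<gpx><trk><trkseg>'
    '<trkpt lat="1.0" lon="2.0"/></trkseg></trk></gpx>'
)


def count_unqualified(xml_text, tag):
    """Count descendants matched by an unqualified (no-namespace) tag name."""
    root = ET.fromstring(xml_text)
    return len(root.findall(f".//{tag}"))


# With xmlns="..." on the root, every element belongs to that namespace,
# so a lookup by the bare name "trkseg" matches nothing.
found_with_ns = count_unqualified(WITH_DEFAULT_NS, "trkseg")

# Without the xmlns attribute, the same lookup succeeds.
found_without_ns = count_unqualified(WITHOUT_NS, "trkseg")

print(found_with_ns, found_without_ns)
```

This only illustrates the general namespace matching rule; it makes no claim about how CONNECT or libxml2 are implemented internally.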
|
| Comments |
| Comment by Olivier Bertrand [ 2014-04-18 ] | ||||||||
|
CONNECT does not process XML files itself but delegates this to specialized libraries: MS DOMDOC or libxml2 on Windows, and libxml2 on Linux. On Windows, with the default DOMDOC library, this table is handled without error. However, when libxml2 is specified, the above error occurs. Therefore, this seems to be a libxml2 error, not a CONNECT one. | ||||||||
| Comment by Olivier Bertrand [ 2014-04-18 ] | ||||||||
|
Looking more closely at what happens with this example, I found that the problem is that a general namespace is defined and all XPATH expressions are looking in this namespace. When trying to locate the main table node, CONNECT constructs an XPATH of '//trkseg' that fails to find the corresponding node. Currently, when the table node is not found, CONNECT tries to use the ROOT node instead, which in this case produces the wrong answer. Providing a general fix seems difficult, but meanwhile you can bypass this issue by specifying the TABNAME option as an XPATH that will ignore the currently defined namespace, in this example:
For me, this worked. | ||||||||
| Comment by Olivier Bertrand [ 2014-04-20 ] | ||||||||
|
As a matter of fact, it did not work completely: although the row node attributes were retrieved normally, the column nodes were not found, for the same reason (not in the default namespace). To get a complete result, this table must be created as:
With the field format xpath specified as above, the complete result is returned. | ||||||||
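The point that both the row nodes and the column nodes need a namespace-agnostic path can be illustrated outside CONNECT. This is a sketch only, assuming Python 3.8+ `xml.etree.ElementTree`, whose `{*}` wildcard matches a tag in any namespace, and a hypothetical miniature document rather than the attached file. In XPath 1.0 the comparable technique is a `local-name()` predicate such as `//*[local-name()='trkseg']`; that is stated as a general XPath fact, not as the exact expression used in this ticket.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature document with a default namespace;
# it is NOT the attached GPX file.
DOC = (
    '<gpx xmlns="http://www.topografix.com/GPX/1/1"><trk><trkseg>'
    '<trkpt lat="1.0" lon="2.0"><ele>10.5</ele></trkpt>'
    '<trkpt lat="1.1" lon="2.1"><ele>11.0</ele></trkpt>'
    '</trkseg></trk></gpx>'
)

root = ET.fromstring(DOC)

# Row nodes located with a namespace wildcard (Python 3.8+).
rows = root.findall(".//{*}trkpt")

# Attributes are not placed in the default namespace, so they are readable,
# but a child looked up by its bare name is still not found.
partial = [(row.get("lat"), row.findtext("ele")) for row in rows]

# Applying the wildcard to the child lookup as well gives the full result.
complete = [(row.get("lat"), row.findtext("{*}ele")) for row in rows]

print(partial)
print(complete)
```

The `partial` list mirrors the intermediate state described above: attribute values are present while element values are missing until the child path is made namespace-agnostic too.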
| Comment by Rasmus Johansson (Inactive) [ 2014-05-26 ] | ||||||||
|
Olivier, the xpath tweaks worked well. I already published a blog post a while ago in which I used them: https://blog.mariadb.org/crunching-xml-files-with-mariadb/ |