Parsing Hierarchical Data
The provider offers three basic configurations to model nested data in XML as tables.
Flattened Documents Model
For users who simply need access to the entirety of their XML data, flattening the data into a single table is the best option. The provider will use streaming and only parses the XML data once per query in this mode.
Joining Object Arrays into a Single Table
With DataModel set to "FlattenedDocuments", the provider returns a separate table for each object array, but implicitly JOINed to the parent table. Any nested sibling XPath values (child paths at the same height) will be treated as a SQL CROSS JOIN.
Example
Below is a sample query and the results, based on the sample document in Raw Data and parsing based on the XPaths /root/people, /root/people/vehicles, and /root/people/vehicles/maintenance. This implicitly JOINs the people element with the vehicles element and implicitly JOINs the vehicles element with the maintenance element.
Connection String
Use the following connection string to query the Raw Data in this example.
|
Query
The following query drills into the nested elements in each people element. Since the XPath property included the vehicles node as an XML path, you can query an element of a vehicle explicitly.
|
Results
With horizontal and vertical flattening based on the described paths, each vehicle element is implicitly JOINed to its parent people element and each maintenance element is implicitly JOINed to its parent vehicle element.
age | gender | first_name | last_name | source | type | model | ins_company | ins_policy_num | maint_date | maint_desc | |
20 | M | John | Doe | internet | car | Honda Civic | ABC Insurance | 12345 | 2017-07-17 | oil change | |
20 | M | John | Doe | internet | car | Honda Civic | ABC Insurance | 12345 | 2018-01-03 | new tires | |
20 | M | John | Doe | internet | truck | Dodge Ram | ABC Insurance | 12345 | 2017-08-27 | new tires | |
20 | M | John | Doe | internet | truck | Dodge Ram | ABC Insurance | 12345 | 2018-01-08 | oil change | |
24 | F | Jane | Roberts | phone | car | Toyota Camry | Car Insurance | 98765 | 2017-05-11 | tires rotated | |
24 | F | Jane | Roberts | phone | car | Toyota Camry | Car Insurance | 98765 | 2017-11-03 | oil change | |
24 | F | Jane | Roberts | phone | car | Honda Accord | Car Insurance | 98765 | 2017-10-07 | new air filter | |
24 | F | Jane | Roberts | phone | car | Honda Accord | Car Insurance | 98765 | 2018-01-13 | new brakes |
Relational Model:
The provider for XML can be configured to create a relational model of the data in the XML file or source, treating each XPath as an individual table containing a primary key and a foreign key that links to the parent document. This is particularly useful if you need to work with your XML data in existing BI, reporting, and ETL tools that expect a relational data model.
Joining Nested Arrays as Tables
With DataModel set to "Relational", any JOINs are controlled by the query. Any time you perform a JOIN query, the XML file or source will be queried once for each table included in the query.
Example
Below is a sample query against the sample document in Raw Data, using a relational model based on the XML paths "/root/people", "/root/people/vehicles", and "/root/people/vehicles/maintenance".
Connecting String
Set the DataModel connection property to "Relational" and set the XPath connection property to "/root/people;/root/people/vehicles;/root/people/vehicles/maintenance;" to perform the following query and see the example result set.
|
Query
The following query explicitly JOINs the people, vehicles, and maintenance tables.
|
Results
In the example query, each maintenance element is JOINed to its parent vehicle element, which is JOINed to its parent people element to produce a table with 8 rows (2 maintenance entries for each of 2 vehicles each for 2 people).
age | gender | first_name | last_name | source | type | model | ins_company | ins_policy_num | maint_date | maint_desc | ||
20 | M | John | Doe | internet | car | Honda Civic | ABC Insurance | 12345 | 2017-07-17 | oil change | ||
20 | M | John | Doe | internet | car | Honda Civic | ABC Insurance | 12345 | 2018-01-03 | new tires | ||
20 | M | John | Doe | internet | truck | Dodge Ram | ABC Insurance | 12345 | 2017-08-27 | new tires | ||
20 | M | John | Doe | internet | truck | Dodge Ram | ABC Insurance | 12345 | 2018-01-08 | oil change | ||
24 | F | Jane | Roberts | phone | car | Toyota Camry | Car Insurance | 98765 | 2017-05-11 | tires rotated | ||
24 | F | Jane | Roberts | phone | car | Toyota Camry | Car Insurance | 98765 | 2017-11-03 | oil change | ||
24 | F | Jane | Roberts | phone | car | Honda Accord | Car Insurance | 98765 | 2017-10-07 | new air filter | ||
24 | F | Jane | Roberts | phone | car | Honda Accord | Car Insurance | 98765 | 2018-01-13 | new brakes |
Top-Level Document Model
Using a top-level document view of the data provides ready access to top-level elements. The provider returns nested elements in aggregate, as single columns.
One aspect to consider is performance. You forego the time and resources to process and parse nested elements -- the provider parses the returned data once, using streaming to read the data. Another consideration is your need to access any data stored in nested parent elements, and the ability of your tool or application to process XML.
Modeling a Top-Level Document View
With DataModel set to "Document" (the default), the provider scans only a single element, the top-level element by default. The top-level elements are available as columns due to default object flattening. Nested elements are returned as aggregated XML.
You can set XPath to specify an element other than the top-level one.
Example
Below is a sample query and the results, based on the sample document in Raw Data. The query results in a single "people" table based on the XPath "/root/people".
Connection String
Set the DataModel connection property to "Document" to perform the following query and see the example result set. The provider will scan only the XPath value below:
|
Query
The following query pulls the top-level elements and the subelements of the vehicles element into the results.
|
Results
With a document view of the data, the personal element is flattened into 4 columns and the source and vehicles elements are returned as individual columns, resulting in a table with 6 columns.
age | gender | name_first | name_last | source | vehicles | |
20 | M | John | Doe | internet | ||
24 | F | Jane | Roberts | phone |
Last updated