Parsing Hierarchical Data

The provider offers three basic configurations to model nested data in XML as tables.

Flattened Documents Model

For users who simply need access to the entirety of their XML data, flattening the data into a single table is the best option. The provider will use streaming and only parses the XML data once per query in this mode.

Joining Object Arrays into a Single Table

With DataModel set to "FlattenedDocuments", the provider returns a separate table for each object array, but implicitly JOINed to the parent table. Any nested sibling XPath values (child paths at the same height) will be treated as a SQL CROSS JOIN.

Example

Below is a sample query and the results, based on the sample document in Raw Data and parsing based on the XPaths /root/people, /root/people/vehicles, and /root/people/vehicles/maintenance. This implicitly JOINs the people element with the vehicles element and implicitly JOINs the vehicles element with the maintenance element.

Connection String

Use the following connection string to query the Raw Data in this example.

URI=C:\people.txt;DataModel=FlattenedDocuments;XPath='/root/people;/root/people/vehicles;/root/people/vehicles/maintenance;'

Query

The following query drills into the nested elements in each people element. Since the XPath property included the vehicles node as an XML path, you can query an element of a vehicle explicitly.

SELECT [personal.age] AS age, [personal.gender] AS gender, [personal.name.first] AS name_first, [personal.name.last] AS name_last, [source], [type], [model], [insurance.company] AS ins_company, [insurance.policy_num] AS ins_policy_num, [date] AS maint_date, [desc] AS maint_descFROM [people]

Results

With horizontal and vertical flattening based on the described paths, each vehicle element is implicitly JOINed to its parent people element and each maintenance element is implicitly JOINed to its parent vehicle element.

age

gender

first_name

last_name

source

type

model

ins_company

ins_policy_num

maint_date

maint_desc

20

M

John

Doe

internet

car

Honda Civic

ABC Insurance

12345

2017-07-17

oil change

20

M

John

Doe

internet

car

Honda Civic

ABC Insurance

12345

2018-01-03

new tires

20

M

John

Doe

internet

truck

Dodge Ram

ABC Insurance

12345

2017-08-27

new tires

20

M

John

Doe

internet

truck

Dodge Ram

ABC Insurance

12345

2018-01-08

oil change

24

F

Jane

Roberts

phone

car

Toyota Camry

Car Insurance

98765

2017-05-11

tires rotated

24

F

Jane

Roberts

phone

car

Toyota Camry

Car Insurance

98765

2017-11-03

oil change

24

F

Jane

Roberts

phone

car

Honda Accord

Car Insurance

98765

2017-10-07

new air filter

24

F

Jane

Roberts

phone

car

Honda Accord

Car Insurance

98765

2018-01-13

new brakes

Relational Model:

The provider for XML can be configured to create a relational model of the data in the XML file or source, treating each XPath as an individual table containing a primary key and a foreign key that links to the parent document. This is particularly useful if you need to work with your XML data in existing BI, reporting, and ETL tools that expect a relational data model.

Joining Nested Arrays as Tables

With DataModel set to "Relational", any JOINs are controlled by the query. Any time you perform a JOIN query, the XML file or source will be queried once for each table included in the query.

Example

Below is a sample query against the sample document in Raw Data, using a relational model based on the XML paths "/root/people", "/root/people/vehicles", and "/root/people/vehicles/maintenance".

Connecting String

Set the DataModel connection property to "Relational" and set the XPath connection property to "/root/people;/root/people/vehicles;/root/people/vehicles/maintenance;" to perform the following query and see the example result set.

URI=C:\people.txt;DataModel=Relational;XPath='/root/people;/root/people/vehicles;/root/people/vehicles/maintenance;'

Query

The following query explicitly JOINs the people, vehicles, and maintenance tables.

SELECT [people].[personal.age] AS age, [people].[personal.gender] AS gender, [people].[personal.name.first] AS first_name, [people].[personal.name.last] AS last_name, [people].[source], [vehicles].[type], [vehicles].[model], [vehicles].[insurance.company] AS ins_company, [vehicles].[insurance.policy_num] AS ins_policy_num, [maintenance].[date] AS maint_date, [maintenance].[desc] AS maint_descFROM [people]JOIN [vehicles]ON [people].[_id] = [vehicles].[people_id]JOIN [maintenance]ON [vehicles].[_id] = [maintenance].[vehicles_id]

Results

In the example query, each maintenance element is JOINed to its parent vehicle element, which is JOINed to its parent people element to produce a table with 8 rows (2 maintenance entries for each of 2 vehicles each for 2 people).

age

gender

first_name

last_name

source

type

model

ins_company

ins_policy_num

maint_date

maint_desc

20

M

John

Doe

internet

car

Honda Civic

ABC Insurance

12345

2017-07-17

oil change

20

M

John

Doe

internet

car

Honda Civic

ABC Insurance

12345

2018-01-03

new tires

20

M

John

Doe

internet

truck

Dodge Ram

ABC Insurance

12345

2017-08-27

new tires

20

M

John

Doe

internet

truck

Dodge Ram

ABC Insurance

12345

2018-01-08

oil change

24

F

Jane

Roberts

phone

car

Toyota Camry

Car Insurance

98765

2017-05-11

tires rotated

24

F

Jane

Roberts

phone

car

Toyota Camry

Car Insurance

98765

2017-11-03

oil change

24

F

Jane

Roberts

phone

car

Honda Accord

Car Insurance

98765

2017-10-07

new air filter

24

F

Jane

Roberts

phone

car

Honda Accord

Car Insurance

98765

2018-01-13

new brakes

Top-Level Document Model

Using a top-level document view of the data provides ready access to top-level elements. The provider returns nested elements in aggregate, as single columns.

One aspect to consider is performance. You forego the time and resources to process and parse nested elements -- the provider parses the returned data once, using streaming to read the data. Another consideration is your need to access any data stored in nested parent elements, and the ability of your tool or application to process XML.

Modeling a Top-Level Document View

With DataModel set to "Document" (the default), the provider scans only a single element, the top-level element by default. The top-level elements are available as columns due to default object flattening. Nested elements are returned as aggregated XML.

You can set XPath to specify an element other than the top-level one.

Example

Below is a sample query and the results, based on the sample document in Raw Data. The query results in a single "people" table based on the XPath "/root/people".

Connection String

Set the DataModel connection property to "Document" to perform the following query and see the example result set. The provider will scan only the XPath value below:

URI=C:\people.txt;DataModel=Document;XPath='/root/people';

Query

The following query pulls the top-level elements and the subelements of the vehicles element into the results.

SELECT [personal.age] AS age, [personal.gender] AS gender, [personal.name.first] AS name_first, [personal.name.last] AS name_last, [source], [vehicles]FROM [people]

Results

With a document view of the data, the personal element is flattened into 4 columns and the source and vehicles elements are returned as individual columns, resulting in a table with 6 columns.

age

gender

name_first

name_last

source

vehicles

20

M

John

Doe

internet

24

F

Jane

Roberts

phone

Last updated