Azure Data Lake Storage Gen1

SELECT Statements

A SELECT statement can consist of the following basic clauses.

  • SELECT

  • INTO

  • FROM

  • JOIN

  • WHERE

  • GROUP BY

  • HAVING

  • UNION

  • ORDER BY

  • LIMIT

SELECT Syntax

The following syntax diagram outlines the syntax supported by the SQL engine of the provider:

SELECT {

[ TOP <numeric_literal> | DISTINCT ]

{

*

| {

<expression> [ [ AS ] <column_reference> ]

| { <table_name> | <correlation_name> } .*

} [ , ... ]

}

[ INTO csv:// [ filename= ] <file_path> [ ;delimiter=tab ] ]

{

FROM <table_reference> [ [ AS ] <identifier> ]

} [ , ... ]

[ [

INNER | { { LEFT | RIGHT | FULL } [ OUTER ] }

] JOIN <table_reference> [ ON <search_condition> ] [ [ AS ] <identifier> ]

] [ ... ]

[ WHERE <search_condition> ]

[ GROUP BY <column_reference> [ , ... ]

[ HAVING <search_condition> ]

[ UNION [ ALL ] <select_statement> ]

[

ORDER BY

<column_reference> [ ASC | DESC ] [ NULLS FIRST | NULLS LAST ]

]

[

LIMIT <expression>

[

{ OFFSET | , }

<expression>

]

]

}

<expression> ::=

| <column_reference>

| @ <parameter>

| ?

| COUNT( * | { [ DISTINCT ] <expression> } )

| { AVG | MAX | MIN | SUM | COUNT } ( <expression> )

| NULLIF ( <expression> , <expression> )

| COALESCE ( <expression> , ... )

| CASE <expression>

WHEN { <expression> | <search_condition> } THEN { <expression> | NULL } [ ... ]

[ ELSE { <expression> | NULL } ]

END

| <literal>

| <sql_function>

<search_condition> ::=

{

<expression> { = | > | < | >= | <= | <> | != | LIKE | NOT LIKE | IN | NOT IN | IS NULL | IS NOT NULL | AND | OR | CONTAINS | BETWEEN } [ <expression> ]

} [ { AND | OR } ... ]

Examples

  1. Return all columns:

    SELECT * FROM Resources

  2. Rename a column:

    SELECT [Permission] AS MY_Permission FROM Resources

  3. Cast a column's data as a different data type:

    SELECT CAST(AnnualRevenue AS VARCHAR) AS Str_AnnualRevenue FROM Resources

  4. Search data:

    SELECT * FROM Resources WHERE Type = 'FILE'

  5. Return the number of items matching the query criteria:

    SELECT COUNT(*) AS MyCount FROM Resources

  6. Return the number of unique items matching the query criteria:

    SELECT COUNT(DISTINCT Permission) FROM Resources

  7. Return the unique items matching the query criteria:

    SELECT DISTINCT Permission FROM Resources

  8. Summarize data:

    SELECT Permission, MAX(AnnualRevenue) FROM Resources GROUP BY Permission

    See Aggregate Functions below for details.

  9. Retrieve data from multiple tables.

    SELECT Customers.ContactName, Orders.OrderDate FROM Customers, Orders WHERE Customers.CustomerId=Orders.CustomerId

    See JOIN Queries below for details.

  10. Sort a result set in ascending order:

    SELECT FullPath, Permission FROM Resources ORDER BY Permission ASC

  11. Restrict a result set to the specified number of rows:

    SELECT FullPath, Permission FROM Resources LIMIT 10

  12. Parameterize a query to pass in inputs at execution time. This enables you to create prepared statements and mitigate SQL injection attacks.

    SELECT * FROM Resources WHERE Type = @param

Aggregate Functions

COUNT

Returns the number of rows matching the query criteria.

SELECT COUNT(*) FROM Resources WHERE Type = 'FILE'

COUNT(DISTINCT)

Returns the number of distinct, non-null field values matching the query criteria.

SELECT COUNT(DISTINCT FullPath) AS DistinctValues FROM Resources WHERE Type = 'FILE'

AVG

Returns the average of the column values.

SELECT Permission, AVG(AnnualRevenue) FROM Resources WHERE Type = 'FILE' GROUP BY Permission

MIN

Returns the minimum column value.

SELECT MIN(AnnualRevenue), Permission FROM Resources WHERE Type = 'FILE' GROUP BY Permission

MAX

Returns the maximum column value.

SELECT Permission, MAX(AnnualRevenue) FROM Resources WHERE Type = 'FILE' GROUP BY Permission

SUM

Returns the total sum of the column values.

SELECT SUM(AnnualRevenue) FROM Resources WHERE Type = 'FILE'

JOIN Queries

The Provider for Azure Data Lake Storage supports standard SQL joins like the following examples.

Inner Join

An inner join selects only rows from both tables that match the join condition:

SELECT Customers.ContactName, Orders.OrderDate FROM Customers, Orders WHERE Customers.CustomerId=Orders.CustomerId

Left Join

A left join selects all rows in the FROM table and only matching rows in the JOIN table:

SELECT Customers.ContactName, Orders.OrderDate FROM Customers LEFT OUTER JOIN Orders ON Customers.CustomerId=Orders.CustomerId

Date Literal Functions

The following date literal functions can be used to filter date fields using relative intervals. Note that while the <, >, and = operators are supported for these functions, <= and >= are not.

L_TODAY()

The current day.

SELECT * FROM MyTable WHERE MyDateField = L_TODAY()

L_YESTERDAY()

The previous day.

SELECT * FROM MyTable WHERE MyDateField = L_YESTERDAY()

L_TOMORROW()

The following day.

SELECT * FROM MyTable WHERE MyDateField = L_TOMORROW()

L_LAST_WEEK()

Every day in the preceding week.

SELECT * FROM MyTable WHERE MyDateField = L_LAST_WEEK()

L_THIS_WEEK()

Every day in the current week.

SELECT * FROM MyTable WHERE MyDateField = L_THIS_WEEK()

L_NEXT_WEEK()

Every day in the following week.

SELECT * FROM MyTable WHERE MyDateField = L_NEXT_WEEK()

Also available:

  • L_LAST/L_THIS/L_NEXT MONTH

  • L_LAST/L_THIS/L_NEXT QUARTER

  • L_LAST/L_THIS/L_NEXT YEAR

L_LAST_N_DAYS(n)

The previous n days, excluding the current day.

SELECT * FROM MyTable WHERE MyDateField = L_LAST_N_DAYS(3)

L_NEXT_N_DAYS(n)

The following n days, including the current day.

SELECT * FROM MyTable WHERE MyDateField = L_NEXT_N_DAYS(3)

Also available:

  • L_LAST/L_NEXT_90_DAYS

L_LAST_N_WEEKS(n)

Every day in every week, starting n weeks before current week, and ending in the previous week.

SELECT * FROM MyTable WHERE MyDateField = L_LAST_N_WEEKS(3)

L_NEXT_N_WEEKS(n)

Every day in every week, starting the following week, and ending n weeks in the future.

SELECT * FROM MyTable WHERE MyDateField = L_NEXT_N_WEEKS(3)

Also available:

  • L_LAST/L_NEXT_N_MONTHS(n)

  • L_LAST/L_NEXT_N_QUARTERS(n)

  • L_LAST/L_NEXT_N_YEARS(n)

Last updated