Apache HDFS
SELECT Statements
A SELECT statement can consist of the following basic clauses.
SELECT
INTO
FROM
JOIN
WHERE
GROUP BY
HAVING
UNION
ORDER BY
LIMIT
SELECT Syntax
The following syntax diagram outlines the syntax supported by the SQL engine of the provider:
SELECT
{
[ TOP
<numeric_literal> | DISTINCT
]
{
*
| {
<expression> [ [ AS
] <column_reference> ]
| { <table_name> | <correlation_name> } .*
} [ , ... ]
}
[ INTO
csv:// [ filename= ] <file_path> [ ;delimiter=tab ] ]
{
FROM
<table_reference> [ [ AS
] <identifier> ]
} [ , ... ]
[ [
INNER
| { { LEFT
| RIGHT
| FULL
} [ OUTER
] }
] JOIN
<table_reference> [ ON
<search_condition> ] [ [ AS
] <identifier> ]
] [ ... ]
[ WHERE
<search_condition> ]
[ GROUP
BY
<column_reference> [ , ... ]
[ HAVING
<search_condition> ]
[ UNION
[ ALL
] <select_statement> ]
[
ORDER
BY
<column_reference> [ ASC
| DESC
] [ NULLS FIRST
| NULLS LAST
]
]
[
LIMIT <expression>
[
{ OFFSET | , }
<expression>
]
]
}
<expression> ::=
| <column_reference>
| @ <parameter>
| ?
| COUNT( * | { [ DISTINCT
] <expression> } )
| { AVG
| MAX
| MIN
| SUM
| COUNT
} ( <expression> )
| NULLIF
( <expression> , <expression> )
| COALESCE
( <expression> , ... )
| CASE
<expression>
WHEN
{ <expression> | <search_condition> } THEN
{ <expression> | NULL
} [ ... ]
[ ELSE
{ <expression> | NULL
} ]
END
| <literal>
| <sql_function>
<search_condition> ::=
{
<expression> { = | > | < | >= | <= | <> | != | LIKE
| NOT
LIKE
| IN
| NOT
IN
| IS
NULL
| IS
NOT
NULL
| AND
| OR
| CONTAINS
| BETWEEN
} [ <expression> ]
} [ { AND
| OR
} ... ]
Examples
Return all columns:
SELECT * FROM Files
Rename a column:
SELECT [ChildrenNum] AS MY_ChildrenNum FROM Files
Cast a column's data as a different data type:
SELECT CAST(Length AS VARCHAR) AS Str_Length FROM Files
Search data:
SELECT * FROM Files WHERE FileId = '119116'
Return the number of items matching the query criteria:
SELECT COUNT(*) AS MyCount FROM Files
Return the number of unique items matching the query criteria:
SELECT COUNT(DISTINCT ChildrenNum) FROM Files
Return the unique items matching the query criteria:
SELECT DISTINCT ChildrenNum FROM Files
Summarize data:
SELECT ChildrenNum, MAX(Length) FROM Files GROUP BY ChildrenNum
See Aggregate Functions below for details.
Retrieve data from multiple tables.
SELECT c.Owner, o.OwnerRead, o.OwnerWrite, o.OwnerExecute FROM Files c INNER JOIN Permissions o ON c.FullPath = o.FullPath
See JOIN Queries below for details.
Sort a result set in ascending order:
SELECT FileId, ChildrenNum FROM Files ORDER BY ChildrenNum ASC
Restrict a result set to the specified number of rows:
SELECT FileId, ChildrenNum FROM Files LIMIT 10
Parameterize a query to pass in inputs at execution time. This enables you to create prepared statements and mitigate SQL injection attacks.
SELECT * FROM Files WHERE FileId = @param
Aggregate Functions
COUNT
Returns the number of rows matching the query criteria.
SELECT COUNT(*) FROM Files WHERE FileId = '119116'
COUNT(DISTINCT)
Returns the number of distinct, non-null field values matching the query criteria.
SELECT COUNT(DISTINCT FileId) AS DistinctValues FROM Files WHERE FileId = '119116'
AVG
Returns the average of the column values.
SELECT ChildrenNum, AVG(Length) FROM Files WHERE FileId = '119116'
GROUP BY ChildrenNum
MIN
Returns the minimum column value.
SELECT MIN(Length), ChildrenNum FROM Files WHERE FileId = '119116'
GROUP BY ChildrenNum
MAX
Returns the maximum column value.
SELECT ChildrenNum, MAX(Length) FROM Files WHERE FileId = '119116'
GROUP BY ChildrenNum
SUM
Returns the total sum of the column values.
SELECT SUM(Length) FROM Files WHERE FileId = '119116'
JOIN Queries
The Provider for HDFS supports standard SQL joins like the following examples.
Inner Join
An inner join selects only rows from both tables that match the join condition:
SELECT c.Owner, o.OwnerRead, o.OwnerWrite, o.OwnerExecute FROM Files c INNER JOIN Permissions o ON c.FullPath = o.FullPath
Left Join
A left join selects all rows in the FROM table and only matching rows in the JOIN table:
SELECT c.Group, o.GroupRead, o.GroupWrite, o.GroupExecute FROM Files c LEFT JOIN Permissions o ON c.FullPath = o.FullPath
Date Literal Functions
The following date literal functions can be used to filter date fields using relative intervals. Note that while the <, >, and = operators are supported for these functions, <= and >= are not.
L_TODAY()
The current day.
SELECT * FROM MyTable WHERE MyDateField = L_TODAY()
L_YESTERDAY()
The previous day.
SELECT * FROM MyTable WHERE MyDateField = L_YESTERDAY()
L_TOMORROW()
The following day.
SELECT * FROM MyTable WHERE MyDateField = L_TOMORROW()
L_LAST_WEEK()
Every day in the preceding week.
SELECT * FROM MyTable WHERE MyDateField = L_LAST_WEEK()
L_THIS_WEEK()
Every day in the current week.
SELECT * FROM MyTable WHERE MyDateField = L_THIS_WEEK()
L_NEXT_WEEK()
Every day in the following week.
SELECT * FROM MyTable WHERE MyDateField = L_NEXT_WEEK()
Also available:
L_LAST/L_THIS/L_NEXT MONTH
L_LAST/L_THIS/L_NEXT QUARTER
L_LAST/L_THIS/L_NEXT YEAR
L_LAST_N_DAYS(n)
The previous n days, excluding the current day.
SELECT * FROM MyTable WHERE MyDateField = L_LAST_N_DAYS(3)
L_NEXT_N_DAYS(n)
The following n days, including the current day.
SELECT * FROM MyTable WHERE MyDateField = L_NEXT_N_DAYS(3)
Also available:
L_LAST/L_NEXT_90_DAYS
L_LAST_N_WEEKS(n)
Every day in every week, starting n weeks before current week, and ending in the previous week.
SELECT * FROM MyTable WHERE MyDateField = L_LAST_N_WEEKS(3)
L_NEXT_N_WEEKS(n)
Every day in every week, starting the following week, and ending n weeks in the future.
SELECT * FROM MyTable WHERE MyDateField = L_NEXT_N_WEEKS(3)
Also available:
L_LAST/L_NEXT_N_MONTHS(n)
L_LAST/L_NEXT_N_QUARTERS(n)
L_LAST/L_NEXT_N_YEARS(n)
Last updated