ElasticSearch

SELECT Statements

A SELECT statement can consist of the following basic clauses.

  • SELECT

  • INTO

  • FROM

  • JOIN

  • WHERE

  • GROUP BY

  • HAVING

  • UNION

  • ORDER BY

  • LIMIT

SELECT Syntax

The following syntax diagram outlines the syntax supported by the SQL engine of the provider:

SELECT {

[ TOP <numeric_literal> | DISTINCT ]

{

*

| {

<expression> [ [ AS ] <column_reference> ]

| { <table_name> | <correlation_name> } .*

} [ , ... ]

}

[ INTO csv:// [ filename= ] <file_path> [ ;delimiter=tab ] ]

{

FROM <table_reference> [ [ AS ] <identifier> ]

} [ , ... ]

[ [

INNER | { { LEFT | RIGHT | FULL } [ OUTER ] }

] JOIN <table_reference> [ ON <search_condition> ] [ [ AS ] <identifier> ]

] [ ... ]

[ WHERE <search_condition> ]

[ GROUP BY <column_reference> [ , ... ]

[ HAVING <search_condition> ]

[ UNION [ ALL ] <select_statement> ]

[

ORDER BY

<column_reference> [ ASC | DESC ] [ NULLS FIRST | NULLS LAST ]

]

[

LIMIT <expression>

[

{ OFFSET | , }

<expression>

]

]

} | SCOPE_IDENTITY()

<expression> ::=

| <column_reference>

| @ <parameter>

| ?

| COUNT( * | { [ DISTINCT ] <expression> } )

| { AVG | MAX | MIN | SUM | COUNT } ( <expression> )

| NULLIF ( <expression> , <expression> )

| COALESCE ( <expression> , ... )

| CASE <expression>

WHEN { <expression> | <search_condition> } THEN { <expression> | NULL } [ ... ]

[ ELSE { <expression> | NULL } ]

END

| <literal>

| <sql_function>

<search_condition> ::=

{

<expression> { = | > | < | >= | <= | <> | != | LIKE | NOT LIKE | IN | NOT IN | IS NULL | IS NOT NULL | AND | OR | CONTAINS | BETWEEN } [ <expression> ]

} [ { AND | OR } ... ]

Examples

  1. Return all columns:

    SELECT * FROM [CData].[Elasticsearch].Employee

  2. Rename a column:

    SELECT [Name] AS MY_Name FROM [CData].[Elasticsearch].Employee

  3. Cast a column's data as a different data type:

    SELECT CAST(AnnualRevenue AS VARCHAR) AS Str_AnnualRevenue FROM [CData].[Elasticsearch].Employee

  4. Search data:

    SELECT * FROM [CData].[Elasticsearch].Employee WHERE Industry = 'Floppy Disks'

  5. Return the number of items matching the query criteria:

    SELECT COUNT(*) AS MyCount FROM [CData].[Elasticsearch].Employee

  6. Return the number of unique items matching the query criteria:

    SELECT COUNT(DISTINCT Name) FROM [CData].[Elasticsearch].Employee

  7. Return the unique items matching the query criteria:

    SELECT DISTINCT Name FROM [CData].[Elasticsearch].Employee

  8. Summarize data:

    SELECT Name, MAX(AnnualRevenue) FROM [CData].[Elasticsearch].Employee GROUP BY Name

    See Aggregate Functions below for details.

  9. Retrieve data from multiple tables.

    SELECT Customer.FirstName, Customer.LastName, Purchases.ItemName FROM Purchases INNER JOIN Customer ON Purchases.CustomerId = Customer.Id

    See JOIN Queries below for details.

  10. Sort a result set in ascending order:

    SELECT Id, Name FROM [CData].[Elasticsearch].Employee ORDER BY Name ASC

  11. Restrict a result set to the specified number of rows:

    SELECT Id, Name FROM [CData].[Elasticsearch].Employee LIMIT 10

  12. Parameterize a query to pass in inputs at execution time. This enables you to create prepared statements and mitigate SQL injection attacks.

    SELECT * FROM [CData].[Elasticsearch].Employee WHERE Industry = @param

Aggregate Functions

COUNT

Returns the number of rows matching the query criteria.

SELECT COUNT(*) FROM [CData].[Elasticsearch].Employee WHERE Industry = 'Floppy Disks'

COUNT(DISTINCT)

Returns the number of distinct, non-null field values matching the query criteria.

SELECT COUNT(DISTINCT Id) AS DistinctValues FROM [CData].[Elasticsearch].Employee WHERE Industry = 'Floppy Disks'

AVG

Returns the average of the column values.

SELECT Name, AVG(AnnualRevenue) FROM [CData].[Elasticsearch].Employee WHERE Industry = 'Floppy Disks' GROUP BY Name

MIN

Returns the minimum column value.

SELECT MIN(AnnualRevenue), Name FROM [CData].[Elasticsearch].Employee WHERE Industry = 'Floppy Disks' GROUP BY Name

MAX

Returns the maximum column value.

SELECT Name, MAX(AnnualRevenue) FROM [CData].[Elasticsearch].Employee WHERE Industry = 'Floppy Disks' GROUP BY Name

SUM

Returns the total sum of the column values.

SELECT SUM(AnnualRevenue) FROM [CData].[Elasticsearch].Employee WHERE Industry = 'Floppy Disks'

JOIN Queries

The Provider for Elasticsearch supports standard SQL joins like the following examples.

Inner Join

An inner join selects only rows from both tables that match the join condition:

SELECT Customer.FirstName, Customer.LastName, Purchases.ItemName FROM Purchases INNER JOIN Customer ON Purchases.CustomerId = Customer.Id

Left Join

A left join selects all rows in the FROM table and only matching rows in the JOIN table:

SELECT Customer.FirstName, Customer.LastName, Purchases.ItemName FROM Purchases LEFT JOIN Customer ON Purchases.CustomerId = Customer.Id

Date Literal Functions

The following date literal functions can be used to filter date fields using relative intervals. Note that while the <, >, and = operators are supported for these functions, <= and >= are not.

L_TODAY()

The current day.

SELECT * FROM MyTable WHERE MyDateField = L_TODAY()

L_YESTERDAY()

The previous day.

SELECT * FROM MyTable WHERE MyDateField = L_YESTERDAY()

L_TOMORROW()

The following day.

SELECT * FROM MyTable WHERE MyDateField = L_TOMORROW()

L_LAST_WEEK()

Every day in the preceding week.

SELECT * FROM MyTable WHERE MyDateField = L_LAST_WEEK()

L_THIS_WEEK()

Every day in the current week.

SELECT * FROM MyTable WHERE MyDateField = L_THIS_WEEK()

L_NEXT_WEEK()

Every day in the following week.

SELECT * FROM MyTable WHERE MyDateField = L_NEXT_WEEK()

Also available:

  • L_LAST/L_THIS/L_NEXT MONTH

  • L_LAST/L_THIS/L_NEXT QUARTER

  • L_LAST/L_THIS/L_NEXT YEAR

L_LAST_N_DAYS(n)

The previous n days, excluding the current day.

SELECT * FROM MyTable WHERE MyDateField = L_LAST_N_DAYS(3)

L_NEXT_N_DAYS(n)

The following n days, including the current day.

SELECT * FROM MyTable WHERE MyDateField = L_NEXT_N_DAYS(3)

Also available:

  • L_LAST/L_NEXT_90_DAYS

L_LAST_N_WEEKS(n)

Every day in every week, starting n weeks before current week, and ending in the previous week.

SELECT * FROM MyTable WHERE MyDateField = L_LAST_N_WEEKS(3)

L_NEXT_N_WEEKS(n)

Every day in every week, starting the following week, and ending n weeks in the future.

SELECT * FROM MyTable WHERE MyDateField = L_NEXT_N_WEEKS(3)

Also available:

  • L_LAST/L_NEXT_N_MONTHS(n)

  • L_LAST/L_NEXT_N_QUARTERS(n)

  • L_LAST/L_NEXT_N_YEARS(n)

Predicate Functions

COMMON(expression, cutoff_frequency)

Used to explicitly specify the query type to send and thus will send 'expression' in a common terms query.

Example SQL Query:

SELECT * FROM employee WHERE COMMON(about) = 'like to build'

Elasticsearch Query:

{"common":{"about":{"query":"like to build"}}}

  • expression: The expression to search for.

  • cutoff_frequency: The cutoff frequency value used to allocate terms to the high or low frequency group. Can be an absolute frequency (>=1) or a relative frequency (0.0 .. 1.0).

FILTER(expression)

Used to explicitly specify the filter context and thus will send 'expression' in a filter context, rather than a query context. A filter context does not affect the calculated scores. This is useful when performing queries where you want part of the filter to be used to calculate scores but filter the results returned (without affecting the score) using additional criteria.

Example SQL Query:

SELECT * FROM employee WHERE FILTER(TERM(first_name)) = 'john'

Elasticsearch Query:

{"filter":{"bool":{"must":{"term":{"first_name":"john"}}}}}

  • expression: Either a column or another function.

GEO_BOUNDING_BOX(column, top_left, bottom_right)

Used to specify a query to filter hits based on a point location using a bounding box.

Example SQL Query:

SELECT * FROM cities WHERE GEO_BOUNDING_BOX(location, '[-74.1,40.73]', '[-71.12,40.01]')

Elasticsearch Query:

{"bool":{"filter":{"geo_bounding_box":{"location":{"top_left":[-74.1,40.73],"bottom_right":[-71.12,40.01]}}},"must":[{"match_all":{}}]}}

  • column: A Geo-point column to perform the GEO_BOUNDING_BOX filter on.

  • top_left: The top-left coordinates of the bounding box. This value can be an array [shown in example], object of lat and lon values, comma-separated list, or a geohash of a latitude and longitude value.

  • bottom_right: The bottom-right coordinates of the bounding box. This value can be an array [shown in example], object of lat and lon values, comma-separated list, or a geohash of a latitude and longitude value.

GEO_BOUNDING_BOX(column, top, left, bottom, right)

Used to specify a query to filter hits based on a point location using a bounding box.

Example SQL Query:

SELECT * FROM cities WHERE GEO_BOUNDING_BOX(location, -74.1, 40.73, -71.12, 40.01)

Elasticsearch Query:

{"bool":{"filter":{"geo_bounding_box":{"location":{"top":-74.1,"left":40.73,"bottom":-71.12,"right":40.01}}},"must":[{"match_all":{}}]}}

  • column: A Geo-point column to perform the GEO_BOUNDING_BOX filter on.

  • top: The top coordinate of the bounding box.

  • left: The left coordinate of the bounding box.

  • bottom: The bottom coordinate of the bounding box.

  • right: The right coordinate of the bounding box.

GEO_DISTANCE(column, point_lat_lon, distance)

Used to specify a query to filter documents that include only the hits that exist within a specific distance from a geo point.

Example SQL Query:

SELECT * FROM cities WHERE GEO_DISTANCE(location, '40,-70', '12mi')

Elasticsearch Query:

{"bool":{"filter":{"geo_distance":{"location":"40,-70","distance":"12mi"}},"must":[{"match_all":{}}]}}

  • column: A Geo-point column to perform the GEO_DISTANCE filter on.

  • point_lat_lon: The coordinates of a geo point that will be used to measure the distance from. This value can be an array, object of lat and lon values, comma-separated list [shown in example], or a geohash of a latitude and longitude value.

  • distance: The distance to search within from the specified geo point. This value takes an numeric value along with a distance unit. Common distance units are: mi (miles), yd (yards), ft (feet), in (inch), km (kilometers), m (meters). Please see Elastic documentation for complete list of distance units.

GEO_DISTANCE_RANGE(column, point_lat_lon, from_distance, to_distance)

Used to specify a query to filter documents that include only the hits that exist within a range from a specific geo point.

Example SQL Query:

SELECT * FROM cities WHERE GEO_DISTANCE_RANGE(location, 'drn5x1g8cu2y', '10mi', '20mi')

Elasticsearch Query:

{"bool":{"filter":{"geo_distance_range":{"location":"drn5x1g8cu2y","from":"10mi","to":"20mi"}},"must":[{"match_all":{}}]}}

  • column: A Geo-point column to perform the GEO_DISTANCE_RANGE filter on.

  • point_lat_lon: The coordinates of a geo point that will be used to measure the range from. This value can be an array, object of lat and lon values, comma-separated list, or a geohash [shown in example] of a latitude and longitude value.

  • from_distance: The starting distance to calculate the range from the specified geo point. This value takes an numeric value along with a distance unit. Common distance units are: mi (miles), yd (yards), ft (feet), in (inch), km (kilometers), m (meters). Please see Elastic documentation for complete list of distance units.

  • to_distance: The end distance to calculate the range from the specified geo point. This value takes an numeric value along with a distance unit. Common distance units are: mi (miles), yd (yards), ft (feet), in (inch), km (kilometers), m (meters). Please see Elastic documentation for complete list of distance units.

GEO_POLYGON(column, points)

Used to specify a query to filter hits that only fall within a polygon of points.

Example SQL Query:

SELECT * FROM cities WHERE GEO_POLYGON(location, '[{"lat":40,"lon":-70},{"lat":30,"lon":-80},{"lat":20,"lon":-90}]')

Elasticsearch Query:

{"bool":{"filter":{"geo_polygon":{"location":{"points":[{"lat":40,"lon":-70},{"lat":30,"lon":-80},{"lat":20,"lon":-90}]}}},"must":[{"match_all":{}}]}}

  • column: A Geo-point column to perform the GEO_POLYGON filter on.

  • points: A JSON array of points that make up a polygon. This value can be an array of arrays, object of lat and lon values [shown in example], comma-separated lists, or geohashes of a latitude and longitude value.

GEO_SHAPE(column, type, points [, relation])

Used to specify an inline shape query to filter documents using the geo_shape type to find documents that have a shape that intersects with the query shape.

Example SQL Query:

SELECT * FROM shapes WHERE GEO_SHAPE(my_shape, 'envelope', '[[13.0, 53.0], [14.0, 52.0]]

Elasticsearch Query:

{"bool":{"filter":{"geo_shape":{"my_shape":{"shape":{"type":"envelope","coordinates":[[13.0, 53.0], [14.0, 52.0]]}}}},"must":[{"match_all":{}}]}}

  • column: A Geo-shape column to perform the GEO_SHAPE filter on.

  • type: The type of shape to search for. Valid values: point, linestring, polygon, multipoint, multilinestring, multipolygon, geometrycollection, envelope, and circle. Please see Elastic documentation for further information regarding these shapes.

  • points: The coordinates for the shape type specified. These coordinates and their structure will vary depending upon the shape type desired. Please see Elastic search documentation for further details.

  • relation: The name of the spatial relation operator to use at search time. Valid values: intersects (default), disjoint, within, and contains. Please see Elastic documentation for further information regarding spatial relations.

INARRAY(column)

Used to search for values contained within a primitive array. Supports comparison operators based on the data type contained within the array, including the LIKE operator.

Example SQL Query:

SELECT * FROM employee WHERE INARRAY(skills) = 'coding'

  • column: A primitive array column to filter on.

MATCH(column)

Used to explicitly specify the query type to send and thus will send 'column' in a match query.

Example SQL Query:

SELECT * FROM employee WHERE MATCH(last_name) = 'SMITH'

Elasticsearch Query:

{"match":{"last_name":"SMITH"}}

  • column: A column to perform the match query on.

MATCH_PHRASE(column)

Used to explicitly specify the query type to send and thus will send 'column' in a match phrase query.

Example SQL Query:

SELECT * FROM employee WHERE MATCH_PHRASE(about) = 'rides motorbikes'

Elasticsearch Query:

{"match_phrase":{"about":"rides motorbikes"}}

  • column: A column to perform the match phrase query on.

MATCH_PHRASE_PREFIX(column)

Used to explicitly specify the query type to send and thus will send 'column' in a match phrase prefix query. The match phrase prefix query is the same as a match query except that it allows for prefix matches on the last term in the text.

Example SQL Query:

SELECT * FROM employee WHERE MATCH_PHRASE_PREFIX(about) = 'quick brown f'

Elasticsearch Query:

{"match_phrase_prefix":{"about":"quick brown f"}}

  • expression: A column to perform the match phrase prefix query on.

TERM(column)

Used to explicitly specify the query type to send and thus will send 'column' in a term query.

Example SQL Query:

SELECT * FROM employee WHERE TERM(last_name) = 'jacobs'

Elasticsearch Query:

{"term":{"last_name":"jacobs"}}

  • column: A column to perform the term query on.

DSLQuery([table,] dsl_json)

Used to explicitly specify the Elasticsearch DSL query to send in the request. Can be used along with other filters and the AND and OR operators.

DSL query JSON can contain a full 'bool' query object, a 'must', 'should', 'must_not', or 'filter' occurrence type, or just a clause object (which will append to a 'must' (default) or 'should' occurrence type depending on whether an AND or OR operator is used).

Example SQL Query (These examples generate the same query using a 'bool' object, 'must' occurrence type, and query object):

SELECT * FROM employee WHERE DSLQuery('{"bool":{"must":[{"query_string":{"default_field":"last_name","query":"\\"Smith\\""}}]}}')SELECT * FROM employee WHERE DSLQuery('{"must":[{"query_string":{"default_field":"last_name","query":"\\"Smith\\""}}]}')SELECT * FROM employee WHERE DSLQuery('{"query_string":{"default_field":"last_name","query":"\\"Smith\\""}}')

Elasticsearch Query:

{"bool":{"must":[{"query_string":{"default_field":"last_name","query":"\\"Smith\\""}}]}}

Example SQL Query (with OR operator):

SELECT * FROM employee WHERE Age < 10 OR DSLQuery('{"should":[{"query_string":{"default_field":"last_name","query":"\"Smith\""}}]}')

Elasticsearch Query:

{"bool":{"should":[{"range":{"age":{"lt":10}}},{"query_string":{"default_field":"last_name","query":"\"Smith\""}}]}}

Additionally you can specify the table that you want the DSLQuery to be issued on, this is useful when executing queries against multiple tables such as JOIN queries.

Example SQL Query:

SELECT * FROM employee JOIN job ON employee.jobid = job.id WHERE DSLQuery(employee, '{"bool":{"must":[{"query_string":{"default_field":"last_name","query":"\\"Smith\\""}}]}}')

  • column: A column to perform the term query on.

ORDER BY Functions

MAPFIELD(column, data_type)

Used to explicitly specify a mapping (by sending the 'unmapped_type' sort option) for a column that does not have a mapping associated with it, which will enable sorting on the column. By default, if a column does not have a mapping, an exception will be thrown containing an error message similar to: "No mapping found for [column] in order to sort on".

Example SQL Query:

SELECT * FROM employee ORDER BY MAPFIELD(start_date, 'long') DESC

Elasticsearch Sort:

{"start_date":{"order":"desc", "unmapped_type": "long"}}

  • column: The column to perform the order by on.

  • data_type: The Elasticsearch data type to map the column to.

Last updated