Monday, October 21, 2019

Building SQL Expressions in ArcGIS and QGIS

In the modern GIS industry, programming skill is an essential asset and one of the programming languages the is very popular within the industry is Structure Query Language (SQL) as you will later find out how it is been used in few moments.

Most of the query expressions used in ArcGIS or QGIS desktop software are derived from SQL. SQL is a standard language for storing, manipulating and retrieving data in databases.

Both ArcGIS and QGIS support the following common database engines: SQLite, MySQL, SQL Server, MS Access, Oracle, Sybase, Informix, PostgreSQL, and other database systems. When connected to any of them, you can take full advantage of SQL in GIS.

For small GIS projects where database isn't required, we make use of traditional GIS files type such as "Shapefile" which allow us to use query expressions that adhere to standard SQL expressions.


What is an expressions?
An expressions is a combination of "Constants, Variables, Operators and Functions" organized in an ordered statement that returns an output value. Expressions are unique to the computer language they are created in. An example of SQL expressions is: "SELECT * FROM <Layer_name> WHERE <Field_name> <Operator> <Value or String>".

If you data resides in a shapefiles or any of these (coverages, INFO tables, and dBASE tables), this part of the SQL expression (SELECT * FROM <Layer_name> WHERE) is automatically supplied for you, so you only provide this part (<Field_name> <Operator> <Value or String>) to query your data.

Since GIS data is made up of Spatial and Attribute, it is worth noting that 'Attribute Query' is similar to the standard SQL queries found in traditional database systems and this will be our focus in this article. On the other hand, 'Spatial Query' which allow operations such as "Contains, Equals, Intersects, Is Disjoint, Overlaps, Touches, Within and Crosses" requires some extension installed on traditional database systems to make them work.


Building SQL Expressions for Attribute Query

There are many places you can build expressions within both ArcGIS and QGIS software, some of the common places are listed below:-

ArcGIS Tools:
~ Select By Attribute
~ Definition Query
~ Field Calculator
~ Label Expression Dialog Box
~ Add Query Layer



QGIS Tools:
~ Select By Expression
~ Filter Query Builder
~ Field Calculator
~ Label Expression Dialog Box
~ Layer Property Display
~ Database Manager



Now that we know where to build and run SQL expressions, let's learn how to write the SQL commands. There are several tutorials on the internet, but the one I will recommend and use in this article is that on w3schools. So, to keep things simple, I will convert the customers table used on the SQL Tutorial into a shapefile for demonstrations in ArcGIS and QGIS.

How I converted the table into shapefile was simply by Geocoding the addresses into lat/long and subsequently importing the file into the GIS software, simple!

Note that shapefiles and other file based layers uses a limited version of SQL that doesn't support functions on many features of SQL (shapefile have other more limitations a listed on the is web page SwitchFromShapefile). This means with a shapefile layer based database, we can only use SQL to SELECT (extracts data from a shapefile database) and we can not perform some of the most important SQL commands such as:-
~ UPDATE – updates data in a database
~ DELETE – deletes data from a database
~ INSERT INTO – inserts new data into a database
~ CREATE DATABASE – creates a new database
~ ALTER DATABASE – modifies a database
~ CREATE TABLE – creates a new table
~ ALTER TABLE – modifies a table
~ DROP TABLE – deletes a table
~ CREATE INDEX – creates an index (search key)
~ DROP INDEX – deletes an index

Shapefile supports limited field/column types as listed below:-

~ Whole Number/Integer (Short/Long/32bit/64bit)
~ Decimal/Float/Double Number
~ Text/String/Variable Character
~ Date and Time

  
QGIS Add field types                                       ArcGIS Add field types


As you can see from above, a typical database will support more additional field types such as BINARY, BLOB, money, Currency, Hyperlink, etc as listed on this 'SQL Data Types' page.

In general, there are three main data types: string, numeric, and date/time. The frequently use field types are string and number types, so I will focus on those two in this tutorial.

String Attribute Query:

Works on attribute column/field with type as string.
General form for string attribute query expressions:- SELECT * FROM <Layer_name> WHERE
~ <Field_name> <Operator> <String>
~ <Field_name> <Operator> <String> <Connector> <Field_name> <Operator> <String>

The second line above is for compound queries. Optionally, parentheses () can be used for defining the order of operations in compound queries. The part before the WHERE clause, that is "SELECT * FROM <Layer_name> WHERE" is hard-coded and you can only build the expression part after the WHERE clause.

For file based layers, field/column names are enclosed in double quotes ("") or without anything. Enclosing it with single quote or square brackets only works with database layer. Values or Strings are always enclosed in single quote, except for numerical values.


Database: SELECT * FROM <Table_name> WHERE <Field_name> <Operator> <String>

Shapefile: <Field_name> <Operator> <String>


Examples 1: ---- String search ----
Database: SELECT * FROM Customers WHERE "Country" = 'Germany'
Database: SELECT * FROM Customers WHERE "Country" = 'USA' OR "Country" = 'usa'
Database: SELECT * FROM Customers WHERE "Country" IN ('USA', 'usa')

Shapefile: "Country" = 'Germany'
Shapefile: "Country" = 'USA' OR "Country" = 'usa'
Shapefile: "Country" IN ('USA', 'usa')





Examples 2: ---- String search for specific columns ----
Database: SELECT CustomerName, ContactName FROM Customers WHERE "Country" = 'USA'

Shapefile: NOT POSSIBLE



Examples 3: ---- Partial string search ----
Database: SELECT * FROM Customers WHERE CustomerName LIKE 'L%'
Database: SELECT * FROM Customers WHERE CustomerName LIKE '%t'
Database: SELECT * FROM Customers WHERE CustomerName LIKE '%t%'

(ArcGIS doesn't auto convert cases)
Shapefile: "CustomerNa" LIKE 'L%'
Shapefile: "CustomerNa" LIKE '%t'
Shapefile: "CustomerNa" LIKE '%t%'




Examples 4: ---- Compound partial string search ----
Database: SELECT * FROM Customers WHERE (CustomerName LIKE 'L%' OR CustomerName LIKE 'R%' OR CustomerName LIKE 'W%') AND Country='USA'

Shapefile: ("CustomerNa" LIKE 'L%' OR "CustomerNa" LIKE 'R%' OR "CustomerNa" LIKE 'W%') AND "Country"='USA'





Number Attribute Query:

Works on attribute column/field with type as Number.
This is just like with 'String Attribute Query' above, the only difference here is that the numerical value is never enclosed in single quotes.

 SELECT * FROM <Layer_name> WHERE
~ <Field_name> <Operator> <Value>
~ <Field_name> <Operator> <Value> <Connector> <Field_name> <Operator> <Value>


Database: SELECT * FROM <Table_name> WHERE <Field_name> <Operator> <Number_Value>

Shapefile: <Field_name> <Operator> <Number_Value>


Examples 1:---- Numeric field search as string (Works in W3S and QGIS) ----
Database: SELECT * FROM Customers WHERE "CustomerID" = 44
Database: SELECT * FROM Customers WHERE "CustomerID" = '44'

Shapefile: "CustomerID" = 44
Shapefile: "CustomerID" = '44'


Examples 2:---- Numeric search to compare value (Works in W3S and QGIS) ----
Database: SELECT * FROM Customers WHERE "CustomerID" >= 44
Database: SELECT * FROM Customers WHERE "CustomerID" >= '44'
Database: SELECT * FROM Customers WHERE "CustomerID" >= 44 AND "Country" IN ('USA', 'Italy', 'France', 'Brazil' )

Shapefile: "CustomerID" >= 44
Shapefile: "CustomerID" >= '44'
Shapefile: "CustomerID" >= 44 AND "Country" IN ('USA', 'Italy', 'France', 'Brazil' )


Examples 3:---- Calculation (Works in W3S and QGIS) ----
Select all customers with even ID number... Calculate the modulus, if the remainder is zero then it is an even ID.

Database: SELECT * FROM Customers WHERE CustomerID % 2 = 0

Shapefile: "CustomerID" % 2 = 0



Date/Time Attribute Query:

The current database has no date/time fields. But the approach is just like the two above.

PostgreSQL Database: SELECT * FROM <Table_name> WHERE Datefield = TIMESTAMP 'YYYY-MM-DD'

Shapefile: "Datefield" = date 'yyyy-mm-dd'


Spatial SQL

This is the query that works with the geometrical or map data in GIS. This works best with spatial enabled databases such PostGIS/PostgreSQL.

The prefix 'ST' in spatial SQL function like: ST_funtionName.... ST_Polygon, ST_IsEmpty, ST_Buffer, ST_Difference, etc stands for Spatial Type (originally, it was for Spatial and Temporal data).

The following list contains commonly used functions built into PostGIS, a free geodatabase which is a PostgreSQL extension (the term 'geometry' refers to a point, line, box or other two or three dimensional shape):-
functionName(parameter(s)) : return type
~ ST_Distance(geometry, geometry) : number
~ ST_Equals(geometry, geometry) : boolean
~ ST_Disjoint(geometry, geometry) : boolean
~ ST_Intersects(geometry, geometry) : boolean
~ ST_Touches(geometry, geometry) : boolean
~ ST_Crosses(geometry, geometry) : boolean
~ ST_Overlaps(geometry, geometry) : boolean
~ ST_Contains(geometry, geometry) : boolean
~ ST_Length(geometry) : number
~ ST_Area(geometry) : number
~ ST_Centroid(geometry) : geometry


Exercise:

Try to write out the SQL expression for the following in database and shapefile based GIS environment? The expected result is provided below each question.

1) Select all customers from the USA or from Brazil?



2) Select all customers from the UK but not based in London?



3) Select all customers in London with odd ID number?




Summary in four Videos:















References:

1) SQL Expressions in ArcMap
2) Building a query expression
3) SQL reference for query expressions used in ArcGIS



Thank you reading and hope it serves as a quick start road map for you to get started with SQL in GIS.

No comments:

Post a Comment