Monday, August 8, 2016

GIS Programming with Python and QGIS - Part 2

Hello,
Now that I have introduced you to GIS programming with Python and the QGIS Python Console in part 1 (read part 1 here), let's dive deeper in part 2.

In this article, I am going to run you through some important basics of Python GIS programming in the QGIS Python Console. This tutorial serves as a quick crash course on geoprocessing with the Python programming language. It focuses on what geospatial scientists need, and it is also a quick introduction to Python.

Since this tutorial is about Geographic Information Systems (GIS), I am going to keep this basic Python introduction closely tied to GIS concepts and terminology. For example, to demonstrate the print statement (the print() function in Python 3), instead of typing {print 'Hello World'}, I will type {print 'Hello GIS'}. I hope you get the gist.

The primary difference between conventional programming and the GIS programming we are learning is "the ability to relate spatial or positional elements (usually in the form of latitude and longitude) to the program". So the goal of GIS programming is for you to be able to write scripts that automate and manipulate a series of spatially related tasks.

This article gives you an overview of what is available in the Python programming universe to help you with GIS programming and integrating with QGIS tools.

Now fire up your QGIS and open the Python Console.

Using Python as a Calculator


You can easily use the Python console as a calculator to perform simple and complex arithmetic. For example, let's carry out some arithmetic on coordinates and areas.
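
As a minimal sketch of such calculator-style arithmetic, the lines below convert a latitude from degrees, minutes and seconds to decimal degrees and work out the area of a rectangular plot. The numbers are made up for illustration, and print(...) with a single argument is used because it runs on both Python 2 and Python 3.

```python
# Latitude 9 degrees, 4 minutes, 30 seconds as decimal degrees
# (dividing by 60.0 and 3600.0 forces floating point division)
lat = 9 + 4 / 60.0 + 30 / 3600.0
print(lat)

# Area of a 20 m by 30 m plot in square metres, then in hectares
area_m2 = 20 * 30
area_ha = area_m2 / 10000.0
print(area_m2)   # 600
print(area_ha)
```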


Python 2 integer division, like C or Fortran integer division, discards the remainder and returns an integer (for positive operands the results are identical; for negative operands Python rounds down rather than toward zero). In Python 3, the / operator returns a floating point number instead. You can get a sneak preview of this behaviour in Python 2 by importing division from the __future__ module:

from __future__ import division
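
If you would rather not depend on the import, a small sketch like the one below behaves the same way on Python 2 and Python 3: the // operator always performs floor division, and converting one operand to float always gives true division. The variable names are only illustrative.

```python
plots = 7
owners = 2

print(plots // owners)          # 3, floor division
print(plots / float(owners))    # 3.5, true division
print(plots % owners)           # 1, the remainder
print(-7 // 2)                  # -4, floor division rounds down
```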


Import math module


To perform complex arithmetic you will need to use libraries such as math, numpy, etc.

Python has a huge number of libraries included with the distribution. To keep things simple, most of these variables and functions are not accessible from a normal Python interactive session. Instead, you have to import the name. For example, there is a math module containing many useful functions. To access, say, the square root function, you can import math using any of the following styles:

a) import math
math.sqrt(81)

b) import math as m
m.sqrt(81)

c) from math import sqrt
sqrt(81)

d) from math import *
sqrt(81)

Whichever style you use above will give back the same result, 9.0, for the square root of 81.



There are other functions in the math module; use dir(math) to see the full list.
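
To show several math functions working together on a GIS problem, here is a rough sketch of the haversine formula for the great-circle distance between two points. The function name haversine_km, the mean Earth radius of 6371 km, and the sample coordinates (approximately Abuja and Lagos) are assumptions made for this illustration.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between two points in decimal degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))

# Approximate coordinates, chosen only for illustration
d = haversine_km(9.0765, 7.3986, 6.5244, 3.3792)
print(round(d, 1))
```

The result is a spherical approximation; for survey-grade distances an ellipsoidal method would be needed.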



Variables in Python


You can define variables using the equals (=) sign.
You can name a variable almost anything you want. The name needs to start with an alphabetical character or "_", and can contain alphanumeric characters plus underscores ("_"). Certain words, however, are reserved for the language (the list below is for Python 2):

and, as, assert, break, class, continue, def, del, elif, else, except, 
exec, finally, for, from, global, if, import, in, is, lambda, not, or,
pass, print, raise, return, try, while, with, yield

Trying to define a variable using one of these will result in a syntax error.
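
The exact list of reserved words depends on the Python version you are running (print and exec, for instance, are only reserved in Python 2). As a small sketch, the standard keyword module lets you check a name before using it:

```python
import keyword

print(keyword.iskeyword('lambda'))   # True
print(keyword.iskeyword('area'))     # False

# The full list for the interpreter you are running
print(keyword.kwlist)
```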

width = 20
length = 30
area = length*width
area

depth = 10
volume = area*depth
volume


Python Data types and Data Structures


The following are some of the data types and structures used in Python. You should be familiar with them in order to use them appropriately.


Strings


Strings are sequences of characters, and can be defined using single quotes, double quotes or triple quotes. The opening and closing quotes must match, but the other quote symbol can appear inside the string as an ordinary character.

print 'Hello Quantum GIS'

print "Hello Quantum GIS"

print "Hello Quantum GIS, it's me Umar Yusuf"

Strings can simply be defined by use of single ( ' ), double ( " ) or triple ( ''' ) inverted commas.
Strings enclosed in triple quotes ( ''' ) can span multiple lines and are frequently used in docstrings (Python's way of documenting functions). The backslash ( \ ) is used as an escape character. Please note that Python strings are immutable, so you cannot change part of a string in place.
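
The points about quoting, escaping and immutability can be sketched as follows. The layer name, WKT text and file path are invented for illustration only.

```python
layer_name = 'Roads "Primary"'        # double quotes inside single quotes
note = "it's a line layer"            # apostrophe inside double quotes
wkt = """POLYGON ((0 0, 10 0,
10 10, 0 10, 0 0))"""                 # triple quotes span several lines
path = 'C:\\data\\roads.shp'          # each backslash is escaped

print(len(wkt.splitlines()))          # 2
print(path)

# Strings are immutable: item assignment raises TypeError,
# so a new string is built instead.
try:
    layer_name[0] = 'r'
except TypeError:
    layer_name = 'r' + layer_name[1:]
print(layer_name)
```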


Lists


Lists are one of the most versatile data structures in Python. A list can simply be defined by writing a list of comma separated values in square brackets. Lists might contain items of different types, but usually the items all have the same type. Python lists are mutable and individual elements of a list can be changed.
Here is a quick example to define a list and then access it:


You can access the elements of a list by calling the list name and the index number in square brackets. Note that in Python, index counting starts from zero (0).
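
A short sketch of defining a list, indexing it and changing it, using made-up plot names:

```python
plots = ['plot1', 'plot2', 'plot3']

print(plots[0])       # plot1, the first element has index 0
print(plots[-1])      # plot3, negative indices count from the end

plots[1] = 'plot2b'   # lists are mutable
plots.append('plot4')

print(len(plots))     # 4
print(plots[1:3])     # ['plot2b', 'plot3']
```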


Tuples


A tuple is represented by a number of values separated by commas. Tuples are immutable and the output is surrounded by parentheses so that nested tuples are processed correctly. Additionally, even though tuples are immutable, they can hold mutable data if needed.
Since tuples are immutable and cannot change, they can be somewhat faster to process than lists. Hence, if your list is unlikely to change, you should use a tuple instead of a list.


Just as with lists, you can access the elements of a tuple by index.
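
As an illustration, a coordinate pair is a natural fit for a tuple. The sketch below uses invented values and shows indexing, unpacking, the error raised on modification, and a tuple that holds a mutable list.

```python
point = (7.3986, 9.0765)      # (longitude, latitude)
lon, lat = point              # tuple unpacking
print(point[0])

try:
    point[0] = 8.0            # tuples do not support item assignment
except TypeError:
    print('tuples cannot be modified in place')

# A tuple can still hold a mutable object such as a list
feature = ('plot1', [23.45, 20.09])
feature[1].append(25.89)
print(len(feature[1]))        # 3
```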


Dictionary


A dictionary is an unordered set of key: value pairs, with the requirement that the keys are unique (within one dictionary). A pair of braces creates an empty dictionary: {}.
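
A dictionary works well for the attributes of a single feature. The keys and values below are hypothetical and only meant as a sketch.

```python
plot = {'id': 'plot1', 'area': 23.45}

plot['landuse'] = 'residential'        # add a new key
plot['area'] = 24.0                    # overwrite an existing key

print(plot['id'])                      # plot1
print('landuse' in plot)               # True
print(plot.get('owner', 'unknown'))    # unknown, a default for a missing key

# Loop over the keys in sorted order
for key in sorted(plot):
    print('%s = %s' % (key, plot[key]))
```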




Flow control statements


Flow control statements are "if", "for", and "while". There is no "switch" in Python; instead, use "if" with "elif".
Iteration, Indentation, and Blocks: One of the most useful things you can do with lists is to iterate through them, i.e. to go through each element one at a time. To do this in Python, we use the "for" statement:

plots = ['plot1', 'plot2', 'plot3', 'plot4', 'plot5']

for p in plots:
    print p

Above is a "for" loop (iteration): you can loop over the elements of a list like that. The leading spaces on the line under "for" are the indentation, and the indented lines together form the block that runs once for each element. Below is another example:

areas = [23.45, 20.09, 25.89, 24.76]
for a in areas:
    print a

If you want access to the index of each element within the body of a loop, use the built-in enumerate function:

areas = [23.45, 20.09, 25.89, 24.76]
for idx, area in enumerate(areas):
    print "Plot", '#%d: %s' % (idx + 1, area)
# Prints "Plot #1: 23.45", "Plot #2: 20.09", "Plot #3: 25.89", "Plot #4: 24.76" each on its own line
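
The "if" and "while" statements follow the same indentation rules. Here is a small sketch that classifies the same areas and then accumulates them until a limit is reached; the thresholds 22, 25 and 60 are arbitrary values picked for the example.

```python
areas = [23.45, 20.09, 25.89, 24.76]

# if / elif / else takes the place of a switch statement
labels = []
for a in areas:
    if a >= 25:
        labels.append('large')
    elif a >= 22:
        labels.append('medium')
    else:
        labels.append('small')
print(labels)    # ['medium', 'small', 'large', 'medium']

# while repeats its block as long as the condition holds
total = 0.0
count = 0
while count < len(areas) and total < 60:
    total += areas[count]
    count += 1
print(count)     # 3 plots were needed to reach a total of 60
```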


Functions


Functions are declared with the "def" keyword. Optional arguments are set in the function declaration after the mandatory arguments by being assigned a default value. When calling a function with named arguments, the name of the argument is assigned a value. Functions can return a tuple (and using tuple unpacking you can effectively return multiple values). Lambda functions are ad hoc functions that consist of a single expression. Arguments are passed by object reference: a mutable object such as a list can be modified inside the function, but immutable types (tuples, ints, strings, etc.) cannot be changed. Only a reference to the object is passed, so binding another object to a parameter name simply replaces the local reference and leaves the caller's object untouched. For example:

# defining an area function
def area(a, b):
    print a * b
    
area(30, 30)
area(15, 30)
area(50, 20)

# defining an area function using Lambda function
areaFunc = lambda a, b: a * b

print areaFunc(50, 100)
print areaFunc(100, 100)
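
The other points from the paragraph above (default values, named arguments, returning several values, and how mutable and immutable arguments behave) can be sketched like this. The function names plot_stats, add_vertex and rename are hypothetical helpers written for this example.

```python
def plot_stats(length, width, depth=1):
    """Return (area, volume); depth is optional and defaults to 1."""
    plot_area = length * width
    return plot_area, plot_area * depth

a1, v1 = plot_stats(30, 20)              # tuple unpacking, depth=1
a2, v2 = plot_stats(30, 20, depth=10)    # named argument
print(a1)    # 600
print(v2)    # 6000

def add_vertex(vertices, pt):
    vertices.append(pt)      # changes the caller's list in place

def rename(name):
    name = name.upper()      # only rebinds the local name
    return name

ring = [(0, 0), (10, 0)]
add_vertex(ring, (10, 10))
label = 'plot1'
rename(label)

print(len(ring))   # 3, the list was modified
print(label)       # plot1, the string was not
```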



Exceptions


Exceptions in Python are handled with try-except [ExceptionName] blocks. For example:

    try:
        # Division by zero raises an exception
        10 / 0
    except ZeroDivisionError:
        print "Oops, invalid."
    else:
        # Exception didn't occur, we're good.
        pass
    finally:
        # This is executed after the code block is run
        # and all exceptions have been handled, even
        # if a new exception is raised while handling.
        print "We're done with that."
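
A more GIS-flavoured sketch of the same pattern is validating user input. The helper parse_latitude below is a made-up function that returns None instead of crashing when the text is not a valid latitude.

```python
def parse_latitude(text):
    """Return the latitude as a float, or None if the text is not valid."""
    try:
        value = float(text)          # raises ValueError for non-numeric text
        if not -90 <= value <= 90:
            raise ValueError('latitude out of range: %s' % value)
    except ValueError:
        return None
    else:
        return value

print(parse_latitude('9.0765'))
print(parse_latitude('north'))    # None
print(parse_latitude('123.4'))    # None
```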


Python Object Oriented Programming (OOP)


Python supports OOP. You can read more about OOP in python here on this blog.



Python Libraries


I have already mentioned libraries above.
Let's take one step further in our journey to learn Python by getting acquainted with some useful libraries. The first step is obviously to learn to import them into our environment. There are several ways of doing so in Python, as stated above. Let's talk about two here: (import math as m) or (from math import *).

In the first style, we have defined an alias m for the math library. We can now use various functions from the math library (e.g. factorial) by referencing them through the alias: m.factorial().

In the second style, you have imported the entire namespace of math, i.e. you can use factorial() directly without referring to math.

Tip: Google recommends that you use the first style of importing libraries, as you will know where the functions have come from.
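
One practical reason behind that advice is name clashes. In the sketch below, a hypothetical helper called degrees is silently replaced when the same name is imported from math, which is what a star import would also do for every name in the module.

```python
import math as m

def degrees(d, minutes, seconds):
    """Hypothetical helper: degrees, minutes, seconds to decimal degrees."""
    return d + minutes / 60.0 + seconds / 3600.0

print(m.factorial(5))        # 120, clearly comes from math
dd = degrees(9, 4, 30)       # uses the helper defined above
print(dd)

# Importing the name directly (as "from math import *" also would)
# replaces the helper with math.degrees, which converts radians.
from math import degrees
print(degrees(m.pi))         # 180.0
```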

The following is a list of libraries you will need for scientific computation and geospatial data analysis:

SciPy, NumPy, PySAL, shapely, Fiona, GeoPandas, Pandas, GDAL, enum, cligj, affine, pyQGIS, pyshp, pyproj, matplotlib, prettyplotlib, descartes, cartopy, Rasterio, scikit-learn, scikit-image, Statsmodels, Seaborn, Sympy, Bokeh, BeautifulSoup, Scrapy, Blaze, regular expressions (re), networkx and igraph

There are additional libraries you might need that are not listed above.

Breakdown of the libraries


Let me be more specific and classify the libraries into Data Science and GeoData Science. Below you will find a list of essential Python geospatial libraries.

Data Science Libraries


NumPy stands for Numerical Python. The most powerful feature of NumPy is the n-dimensional array. This library also contains basic linear algebra functions, Fourier transforms, advanced random number capabilities and tools for integration with other low level languages like Fortran, C and C++.

SciPy stands for Scientific Python. SciPy is built on NumPy. It is one of the most useful libraries for a variety of high level science and engineering modules like discrete Fourier transform, linear algebra, optimization and sparse matrices.

Matplotlib for plotting a vast variety of graphs, from histograms to line plots to heat plots. You can use the Pylab feature in the IPython notebook (ipython notebook --pylab=inline) to use these plotting features inline. If you omit the inline option, pylab turns the IPython environment into one very similar to Matlab. You can also use LaTeX commands to add math to your plot.

Pandas for structured data operations and manipulations. It is extensively used for data munging and preparation. Pandas was added relatively recently to the Python ecosystem and has been instrumental in boosting Python's usage in the data science community.

Scikit Learn for machine learning. Built on NumPy, SciPy and matplotlib, this library contains a lot of efficient tools for machine learning and statistical modeling including classification, regression, clustering and dimensionality reduction.

Statsmodels for statistical modeling. Statsmodels is a Python module that allows users to explore data, estimate statistical models, and perform statistical tests. An extensive list of descriptive statistics, statistical tests, plotting functions, and result statistics are available for different types of data and each estimator.

Seaborn for statistical data visualization. Seaborn is a library for making attractive and informative statistical graphics in Python. It is based on matplotlib. Seaborn aims to make visualization a central part of exploring and understanding data.

Bokeh for creating interactive plots, dashboards and data applications on modern web-browsers. It empowers the user to generate elegant and concise graphics in the style of D3.js. Moreover, it has the capability of high-performance interactivity over very large or streaming datasets.

Blaze for extending the capability of NumPy and Pandas to distributed and streaming datasets. It can be used to access data from a multitude of sources including Bcolz, MongoDB, SQLAlchemy, Apache Spark, PyTables, etc. Together with Bokeh, Blaze can act as a very powerful tool for creating effective visualizations and dashboards on huge chunks of data.

Scrapy for web crawling. It is a very useful framework for getting specific patterns of data. It has the capability to start at a website's home URL and then dig through web pages within the website to gather information.

SymPy for symbolic computation. It has wide-ranging capabilities from basic symbolic arithmetic to calculus, algebra, discrete mathematics and quantum physics. Another useful feature is the capability of formatting the result of the computations as LaTeX code.

Requests for accessing the web. It works similarly to the standard Python library urllib2 but is much easier to code with. You will find subtle differences from urllib2, but for beginners Requests might be more convenient.


GeoData Science Libraries


PySAL: PySAL is an open source library of spatial analysis functions written in Python intended to support the development of high level applications. PySAL is built upon the Python scientific stack including numpy and scipy.

Shapely: Shapely is a BSD-licensed Python package for manipulation and analysis of planar geometric objects. It is based on the widely deployed GEOS (the engine of PostGIS) and JTS (from which GEOS is ported) libraries. Shapely is not concerned with data formats or coordinate systems, but can be readily integrated with packages that are.

Fiona: Fiona is OGR's neat, nimble, no-nonsense API for Python programmers. Fiona is designed to be simple and dependable. It focuses on reading and writing data in standard Python IO style and relies upon familiar Python types and protocols such as files, dictionaries, mappings, and iterators instead of classes specific to OGR. Fiona can read and write real-world data using multi-layered GIS formats and zipped virtual file systems, and it integrates readily with other Python GIS packages such as pyproj, Rtree, and Shapely. It provides a minimal, uncomplicated Python interface to the open source GIS community's most trusted geodata access library.

GeoPandas: GeoPandas is an open source project that adds support for geographic data to pandas objects, with the goal of making it easier to work with geospatial data in Python. GeoPandas extends the datatypes used by pandas to allow spatial operations on geometric types. Geometric operations are performed by shapely. GeoPandas further depends on fiona for file access and on descartes and matplotlib for plotting.

GDAL/OGR: GDAL stands for Geospatial Data Abstraction Library. This Python package and its extensions provide a number of tools for programming with and manipulating GDAL. It is actually two libraries: GDAL for manipulating geospatial raster data and OGR for manipulating geospatial vector data.

Cligj: cligj is a small library which can be used to standardise processing of GeoJSON in Python command line programs. It is for Python developers who create command line interfaces for geospatial data, and it allows you to quickly build consistent, well-tested and interoperable CLIs for handling GeoJSON.

PyQGIS: PyQGIS is a blending of Python and Quantum GIS to extend and enhance your open source GIS toolbox. With PyQGIS you can write scripts and plugins to implement new features and perform automated tasks.

Pyshp: PyShp provides read and write support for the Esri Shapefile format. The Shapefile format is a popular Geographic Information System vector data format created by Esri.

Pyproj: pyproj is a Python interface to the PROJ.4 library for cartographic transformations. The Proj class can convert from geographic (longitude, latitude) to native map projection (x, y) coordinates and vice versa, or from one map projection coordinate system directly to another.

Rasterio: Rasterio employs GDAL to read and write files in GeoTIFF and many other formats. Its API uses familiar Python and SciPy interfaces and idioms like context managers, iterators, and ndarrays. It offers fast and direct raster I/O for Python programmers who use NumPy. Rasterio is a GDAL and NumPy-based Python library designed to make your work with geospatial raster data more productive, more fun, more Zen. It's a new open source project from the satellite team at Mapbox.

Cartopy: Cartopy is a cartographic python library with matplotlib support.

Geographiclib: a library for solving geodesic problems. It provides a Geodesic class for Python.

GeoDjango: GeoDjango is a Django application that is now included in the Django trunk with a lot of excellent stuff for developing GIS web applications. GeoDjango installation is based on Python, Django and two kinds of components: a spatial database and geospatial libraries.

Simplekml: The python package simplekml was created to generate kml (or kmz). It was designed to alleviate the burden of having to study KML in order to achieve anything worthwhile with it. If you have a simple understanding of the structure of KML, then simplekml is easy to run with and create usable KML.

Kartograph: Kartograph is a simple and lightweight framework for building interactive map applications without Google Maps or any other mapping service. It was created with the needs of designers and data journalists in mind.


You can get more Python geospatial libraries at PyPI: GIS


Conclusion


Now that we are familiar with Python fundamentals and additional libraries, let's take a deep dive into problem solving through Python. In the process, we will make use of some powerful libraries and also come across the next level of data structures.

Note: This is by no means a complete tutorial on Python programming. You can find more thorough and complete tutorials on the official Python website, or check here for more than 70 free tutorials.

Thank you for reading.
