Wednesday, October 9, 2019

Calculating the total size of zip code maps


The US printable zip code maps page lists a zip code map for every US state, with each file's size in parentheses, like this: "Alabama ZIP Code Map (3.59MB)", as seen below...



Let's calculate the total size of all the maps using Python scripting!

Of course, there are several other (possibly better) ways to get this done. But since we want to test our Python skills here, let us stick to Python 😏.

Some other reasons Python is a good choice are that we can easily use it to:-
1) make HTTP requests to scrape/download the map data
2) generate the download links on the fly
3) create a bot to monitor changes in map size (which could indicate that a map has been updated)
4) visualize the data, including map/geographic visualization.

The list can go on and on, but I will keep it simple here and just calculate the total of the map sizes.

Step 1:
The first thing is to get the text off the web page and into our Python environment. There are several ways to do this, as mentioned above, but I will simply select, copy and paste it into a CSV file as seen below.



Step 2:
Read the CSV file in Python. Here I will use the pandas module to read the CSV file; the built-in csv module would work as well.
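
The remaining work is to pull the number out of each string and add the numbers up. Below is a minimal sketch of that idea using only the built-in re module instead of pandas. Note the assumptions: only the Alabama entry comes from the page quoted above, the other two entries and their sizes are made up for illustration, and the names SIZE_PATTERN and total_size_mb are hypothetical.

```python
import re

# Sample lines in the format shown on the page.
# Only the Alabama entry is real; the other two sizes are made up for illustration.
lines = [
    "Alabama ZIP Code Map (3.59MB)",
    "Alaska ZIP Code Map (2.00MB)",
    "Arizona ZIP Code Map (1.50MB)",
]

# Capture the number that sits between "(" and "MB)"
SIZE_PATTERN = re.compile(r"\((\d+(?:\.\d+)?)\s*MB\)")

def total_size_mb(entries):
    """Sum the sizes (in MB) found in the given text entries."""
    total = 0.0
    for entry in entries:
        match = SIZE_PATTERN.search(entry)
        if match:  # skip lines that carry no size
            total += float(match.group(1))
    return total

print(round(total_size_mb(lines), 2))
```

If the text is already in a pandas dataframe, the same regular expression could be applied to the column that holds the strings.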



Monday, October 7, 2019

Filtering Missing Zip codes out of master Zip codes list

Here I have a list of zip codes, and I want to know which of them are missing from a master list (these are postal codes in Texas, USA).




The list 'available_zipcodes' contains the master zip codes and the list 'given_zipcodes' contains the provided or working zip codes. Now I want to check for and filter out those zip codes that are NOT in the master zip codes.

The three lines of Python code below will do it. They use a 'for' loop with an 'if' statement. Basically, we loop through the list of 'given_zipcodes' and, if a zip code is not in 'available_zipcodes', we print it out.




If you care to run the script and don't want to type all that out, here is the code...

available_zipcodes = [77389, 77086, 77346, 77018, 77040, 77388, 77065, 77080, 77041, 77396, 77385, 77354, 77382, 77067, 77066, 77090, 77345, 77355, 77373, 77339, 77043, 77302, 77304, 77070, 77375, 77095, 77433, 77069, 77038, 77091, 77380, 77092, 77316, 77429, 77377, 77379, 77064, 77088, 77338, 77449, 77386, 77381, 77493, 77356, 77068, 77014, 77084, 77055, 77301, 77303, 77384]

given_zipcodes = [77325, 77339, 77345, 77346, 77380, 77381, 77382, 77383, 77384, 77385, 77386, 77301, 77302, 77303, 77304, 77316, 77354, 77356, 77389, 77014, 77018, 77038, 77040, 77041, 77043, 77055, 77064, 77065, 77066, 77067, 77068, 77069, 77070, 77080, 77084, 77086, 77088, 77090, 77091, 77092, 77095, 77375, 77377, 77379, 77388, 77429, 77433, 77449, 77493, 77373, 77338, 77347, 77391, 77396, 77355]


for zipcode in given_zipcodes:
    if zipcode not in available_zipcodes:
        print(zipcode)

In the case above, the missing zip codes are: 77325, 77383, 77347, 77391

Note: In a production job, these zip codes will probably come in a text file; just read the file into Python lists and loop through as seen above.
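
Here is a rough sketch of that file-based version. It assumes one zip code per line in each text file; the file names and the helper functions read_zipcodes and missing_zipcodes are hypothetical, and the demo writes two small temporary files so it can run on its own. The master list is turned into a set, which keeps the lookup fast when the lists get long.

```python
import os
import tempfile

def read_zipcodes(path):
    """Read one zip code per line, skipping blank lines (assumed file layout)."""
    with open(path) as f:
        return [int(line) for line in f if line.strip()]

def missing_zipcodes(given, available):
    """Return the given zip codes that are not in the master list, in order."""
    available_set = set(available)  # set lookup is faster than scanning a list
    return [z for z in given if z not in available_set]

# Demo with two small temporary files (hypothetical file names)
with tempfile.TemporaryDirectory() as folder:
    master_path = os.path.join(folder, "available_zipcodes.txt")
    given_path = os.path.join(folder, "given_zipcodes.txt")
    with open(master_path, "w") as f:
        f.write("77389\n77339\n77345\n")
    with open(given_path, "w") as f:
        f.write("77325\n77339\n77345\n77383\n")

    missing = missing_zipcodes(read_zipcodes(given_path),
                               read_zipcodes(master_path))

print(missing)
```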

That is it!

Tuesday, September 24, 2019

Limitations of a Shapefile

For a long time, the shapefile has been my primary GIS file format for working with vector data. I never had any reason to look beyond it for handling my GIS vector datasets until recently, when I needed to store a large quantity of text in the attribute table.

Before I share my story, let me explain what a shapefile is, just in case you don't know.

The shapefile is a file format developed by ESRI to handle vector map data in the form of points, polylines and polygons. More details can be found on the Wikipedia page, as summarized in the picture below, and in the 'Shapefile Technical Description' document.



Limitations of a Shapefile
Specifically, I was trying to convert a KML file to a shapefile, and one of the columns that had a lot of text content got truncated in the conversion. I couldn't figure out what caused that until I found this website (Switch from Shapefile) that lists some of the format's limitations. The one that affected my situation directly was that a text field holds a maximum of 254 characters.


No way! Some values in my attribute table have way more than 254 characters, so I had to look beyond the shapefile. I eventually settled on the GeoJSON file format.

Once again, as listed on the Switch from Shapefile website, other limitations include:-
~ No coordinate reference system definition.
~ It's a multifile format.
~ Attribute names are limited to 10 characters.
~ Only 255 attributes. The DBF file does not allow you to store more than 255 attribute fields.
~ Limited data types. Data types are limited to float, integer, date and text with a maximum 254 characters.
~ Unknown character set. There is no way to specify the character set used in the database.
~ It's limited to 2GB of file size. Although some tools are able to surpass this limit, they can never exceed 4GB of data.
~ No topology in the data. There is no way to describe topological relations in the format.
~ Single geometry type per file. There is no way to save mixed geometry features.
~ More complicated data structures are impossible to save. It's a "flat table" format.
~ There is no way to store 3D data with textures or appearances such as material definitions. There is also no way to store solids or parametric objects.
~ Projections definition. They are incompatible or missing.
~ Line and polygon geometry type, single or multipart, cannot be reliably determined at the layer level, it must be determined at the individual feature level.
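
Two of the limits above (10-character attribute names and 254-character text values) are easy to check before converting. The sketch below is only an illustration under assumptions: it expects the data as a GeoJSON-style dictionary already loaded in Python, and the function name shapefile_issues and the sample feature are hypothetical.

```python
MAX_TEXT_LENGTH = 254  # text field limit quoted in the list above
MAX_NAME_LENGTH = 10   # attribute name limit quoted in the list above

def shapefile_issues(feature_collection):
    """List (feature index, attribute name, problem) for values a shapefile would cut."""
    issues = []
    for index, feature in enumerate(feature_collection.get("features", [])):
        properties = feature.get("properties") or {}
        for name, value in properties.items():
            if len(name) > MAX_NAME_LENGTH:
                issues.append((index, name, "name too long"))
            if isinstance(value, str) and len(value) > MAX_TEXT_LENGTH:
                issues.append((index, name, "text too long"))
    return issues

# Hypothetical sample: one feature with a long attribute name and a 300-character value
sample = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "geometry": None,
            "properties": {"name": "A", "long_description": "x" * 300},
        },
    ],
}

print(shapefile_issues(sample))
```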


Now you know that some of the troubles you may encounter with your shapefile data are due to these limitations, so there is no need to pull your hair out; just switch to a more advanced GIS file format.

Thursday, September 19, 2019

QGIS Calculate the Mid Coordinates of Polygons

In the QGIS field calculator, you can calculate the center point of every polygon in a polygon layer.

Formula 1:
x($geometry), y($geometry)

Formula 2:
xmin(centroid($geometry)), ymin(centroid($geometry))

Formula 3:
x(centroid($geometry)), y(centroid($geometry))


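Note that 'x' stands for Longitude while 'y' stands for Latitude (when the layer uses a geographic coordinate system). $geometry represents the geometry of the current polygon feature.


As you can see, the preview results for the three formulas are the same. This is because x() and y() fall back to the centroid when given a non-point geometry, and the centroid is a single point, so its xmin and ymin are simply its x and y.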
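
For the curious, here is a small sketch of what a polygon centroid is, written in plain Python. It uses the standard area-weighted (shoelace) formula for a simple, non-self-intersecting ring; it is not QGIS code and is not claimed to be how QGIS computes it internally. The function name polygon_centroid and the sample rectangle are hypothetical.

```python
def polygon_centroid(ring):
    """Area-weighted centroid of a simple polygon given as a list of (x, y) vertices."""
    area_sum = 0.0
    cx_sum = 0.0
    cy_sum = 0.0
    count = len(ring)
    for i in range(count):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % count]  # wrap around to close the ring
        cross = x0 * y1 - x1 * y0
        area_sum += cross
        cx_sum += (x0 + x1) * cross
        cy_sum += (y0 + y1) * cross
    if area_sum == 0:
        raise ValueError("polygon has zero area")
    # signed area is area_sum / 2, so the divisor 6 * area becomes 3 * area_sum
    return cx_sum / (3.0 * area_sum), cy_sum / (3.0 * area_sum)

# A 4 x 2 rectangle: its centre is at (2, 1)
print(polygon_centroid([(0, 0), (4, 0), (4, 2), (0, 2)]))
```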

Sunday, September 1, 2019

Map from GIS to CAD

Introduction

On the desktop, ESRI ArcGIS is arguably the top GIS software while Autodesk AutoCAD is arguably the top CAD software.

Both are capable of making maps, and in this article I will demo how to convert an existing map in ArcGIS to AutoCAD. But before I go into that, let's get to know what GIS and CAD mean.



What are GIS and CAD?

GIS = Geographic Information System
CAD = Computer Aided Design



What is the Difference between GIS and CAD?

Both GIS and CAD can be used for making maps; however, they are very different technologies with different applications.

GIS: analyzing/visualizing map data
CAD: creating/editing accurate map data

GIS allows data to be attached to the points, lines, and polygons used in the map. This makes GIS the best tool for analyzing and visualizing data through the use of a map.

CAD easily allows a user to create a very accurate drawing, whether it is a map, site plan, profile, etc. CAD allows maps to be drawn from coordinates or from distances/bearings in different types of units.


Map displayed in ArcGIS



Map displayed in AutoCAD




How to convert map data from GIS to CAD and vice versa

GIS to CAD:
In ArcGIS, use the tools at ArcToolbox >> Conversion Tools >> To CAD to convert a map layer to CAD.





CAD to GIS:
In AutoCAD, you simply save the map as a .dxf or .dwg file to make it usable in GIS.




That is it!

Wednesday, August 28, 2019

QGIS Remove Black Background Border from Raster Image


Oftentimes, you are left with a black border around an image you manipulated in QGIS, as seen below. This usually happens because there is no data to display around the data part of the image.



Here is how to get rid of the black background in QGIS 3.

Open the raster layer's properties window and select the 'Transparency' tab. Then enter '0' under No data value >> Additional no data value.



Click 'OK' to apply the changes. Your raster image should now have no black background surrounding it, as seen below.




That is it.

Monday, August 26, 2019

Get the row count of multiple Excel spreadsheet files

Here I have many Excel spreadsheet files within a folder, as seen below...


The task is to return the number of rows in each of the Excel files. I could do it manually: open each file, scroll to the bottom and note down the row number. That would be cumbersome and time-consuming given the number of files I have to cover.

So, I wrote a simple Python script to handle this boring task, as follows:-

Step 1: First things first, let's find a way to list all the .xlsx files. Here I used the glob module to handle this.

import glob

folder_xlsx = r"C:\Users\Yusuf_08039508010\Desktop\my-xlsx-folder"

# read all the individual order xlsx files
xlsx_files = glob.glob(folder_xlsx + '/*.xlsx')
What I have above is a list that contains the paths to all the Excel files in the folder. Let's move on...


Step 2: The next step is to read each Excel file into a pandas dataframe and count the number of rows in it. There are several ways to count the rows, as seen below, but I will use 'len(df.index)'.


Here is the solution for the first dataframe.

import pandas as pd

df = pd.read_excel(xlsx_files[0])

row_count = len(df.index)

To do this for all the Excel files, we just write a for loop and save the results into a list as seen below. Notice that I used the rsplit() function to get each file name so it can be printed along with its corresponding row count.

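import pandas as pd

row_count_list = []
for xls_file in xlsx_files:
    # number of data rows (the header row is not counted)
    df = pd.read_excel(xls_file)
    row_count = len(df.index)

    # keep only the file name: the part after the last backslash (Windows paths)
    file_name = xls_file.rsplit('\\', 1)[1]

    file_details = file_name, row_count
    row_count_list.append(file_details)

print(row_count_list)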
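
A side note on the file name step: rsplit('\\', 1) works for the Windows-style paths used here, but the standard library's pathlib module can do the same job without hard-coding the separator. The short comparison below is only a sketch; the path is a made-up example, and PureWindowsPath is used so the snippet gives the same result on any operating system.

```python
from pathlib import PureWindowsPath

# Hypothetical Windows path, similar in shape to the ones glob returns above
path = r"C:\Users\example\Desktop\my-xlsx-folder\orders.xlsx"

name_from_rsplit = path.rsplit('\\', 1)[1]     # approach used in the script above
name_from_pathlib = PureWindowsPath(path).name  # same result via pathlib

print(name_from_rsplit, name_from_pathlib)
```

When the script runs on the same machine that holds the files, os.path.basename(xls_file) is another option.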



That is it!


P.S.: You could easily extend the script above to do many other things with the files. An example would be to merge all the files into one using the pandas concat() function. So, instead of appending the file names and row counts, we simply append each dataframe as seen below.

df_list = []
for xls_file in xlsx_files:
    df = pd.read_excel(xls_file)
    
    df_list.append(df)
    
merge_df = pd.concat(df_list)

Thursday, August 8, 2019

Split string at the last occurrence of a string


I have a list of strings of varying length. However, each string always ends with the same kind of information (a country in this case), as seen below.


data_list = ['Adams Smith, white, UK', 
             'Samuel Tom, Black, 29 leen st. NY, USA', 
             'Yaks Ramson, New Student, Yet to register, Romania']
    

As you can see, there are three items in the list and each item ends with a country name after a comma (,) sign.

When you loop through the items, you can split each item by comma like this: item.split(','). However, this isn't what I want; I want to split just at the last comma. In other words, I want to split each string at the last occurrence of the comma (,) sign.

So, the solution here is to use the string method rsplit(',', 1), which splits from the right and accepts a second argument that tells it the maximum number of splits to make. Here I want to split the string just once, so my script will look like this...

data_list = ['Adams Smith, white, UK', 
             'Samuel Tom, Black, 29 leen st. NY, USA', 
             'Yaks Ramson, New Student, Yet to register, Romania']

item_list = []
for item in data_list:
    item_1 = item.rsplit(',', 1)  # Not item.split(',')
    
    item_list.append(item_1)

Now each item is split into two parts, and the country is the second element of each (item_1[1]). Note that it keeps the space that followed the comma, so call strip() on it if needed.
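
As a small runnable illustration of pulling out the countries, the sketch below unpacks the two parts directly and strips the leading space. The variable names rest and countries are my own and not from the original script.

```python
data_list = ['Adams Smith, white, UK',
             'Samuel Tom, Black, 29 leen st. NY, USA',
             'Yaks Ramson, New Student, Yet to register, Romania']

countries = []
for item in data_list:
    # split once, starting from the right: [everything before, country]
    rest, country = item.rsplit(',', 1)
    countries.append(country.strip())  # drop the space after the comma

print(countries)
```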


Sunday, July 28, 2019

Ways to create and add SVG maps to a web page

SVG stands for 'Scalable Vector Graphics', which defines vector-based graphics in XML format. It is just another format for displaying images on the web, and every element and attribute in it can be animated. It has several advantages over raster image file types such as PNG, JPG, GIF, BMP, etc.

Some of its notable advantages are:-
~ It is scalable. That is, it doesn't lose quality when stretched or compressed.
~ It can be made interactive with CSS and JS
~ It can be created and edited with any text editor
~ It can be searched, indexed and scripted
~ It can be printed with high quality at any resolution

Ways to create SVG image maps

SVG maps can be created in either of two ways: with a text editor or with a drawing program. Examples of each are given below:-
1) Text editor (code): any text editor such as Notepad++, Sublime, Atom, etc. can be used.
2) Drawing tools/programs: Inkscape, Adobe Illustrator, etc. While most of these tools can save SVG files directly, it is worth noting that some, such as ArcGIS and QGIS, edit maps in other formats such as shapefile; online tools such as Mapshaper, Mapstarter, Geoconverter, etc. are then used to convert the shapefile to an SVG file.

Creating SVGs with code allows you to understand the different SVG elements and attributes that make up the file. This is very important when you want to manipulate and interact with the SVG using CSS and JS scripting.

If your SVG is a complex one, such as a map of a state's administrative boundaries, you are better off creating it with drawing tools.
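
To give a feel for the "code" route, here is a minimal sketch that builds a one-polygon SVG with Python's built-in xml.etree.ElementTree module. Everything specific in it is an assumption for illustration: the function name polygon_svg, the made-up corner points, the canvas size and the fill colour. A real map outline would have far more points.

```python
import xml.etree.ElementTree as ET

def polygon_svg(points, width=200, height=200, fill="#88c0d0"):
    """Return SVG markup containing a single <polygon> element."""
    svg = ET.Element("svg", {
        "xmlns": "http://www.w3.org/2000/svg",
        "viewBox": f"0 0 {width} {height}",
        "width": str(width),
        "height": str(height),
    })
    ET.SubElement(svg, "polygon", {
        # the points attribute is a space-separated list of "x,y" pairs
        "points": " ".join(f"{x},{y}" for x, y in points),
        "fill": fill,
        "stroke": "black",
    })
    return ET.tostring(svg, encoding="unicode")

# Made-up outline with four corners
svg_text = polygon_svg([(20, 20), (180, 40), (150, 170), (40, 150)])
print(svg_text)
```

The printed text can be saved as a .svg file or pasted into a web page.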

Ways to add SVG image maps into HTML web page

1) Using inline SVG tag (<svg>...</svg>)

2) Using image tag (<img src='filename.svg' >)

3) Using CSS background-image property
body{
background: url(filename.svg);
}

4) Using HTML object, iframe or embed tags

Using the inline SVG tag to add SVG images is the most powerful and flexible way, as it allows certain CSS and JS operations on the SVG that the other ways don't. It also saves an extra HTTP request, which can help the page load faster.

On the other hand, a major drawback of the inline SVG tag is that it is very poor for "search engine indexing".


That is it, hope it was useful.
Thank you for reading.

Saturday, July 6, 2019

Toggle cell line numbers in Jupyter notebook

When you switch from a text editor to the Jupyter notebook environment for writing Python code, you will definitely wish to see line numbers in the notebook cells. One obvious reason is that it makes it easier to trace errors, as seen below.



In the script pictured above, there is an 'ElementNotVisibleException' error on line 86. If the cell line numbers were off, it would be very difficult to count and locate the line where the error occurred. But with line numbers enabled, we just scroll to the line number, as seen below.



How to enable cell line numbers in Jupyter notebook

To toggle cell line numbers in Jupyter notebook, you can use two keyboard shortcuts in "Command Mode" as follows:-
1) L - to toggle line numbers in the current cell
2) Shift-L - to toggle line numbers in all cells and persist the setting



Note that the Jupyter Notebook has two different keyboard input modes.
Edit mode allows you to type code or text into a cell and is indicated by a green cell border.



Command mode binds the keyboard to notebook level commands and is indicated by a grey cell border with a blue left margin.




That is it!