Saturday, July 31, 2021

Convert bearings in "Decimal Degree" to "Degree Minute Second" in QGIS

  Here I got attribute column of bearings for some lines in "Decimal Degree" and I want to convert them to "Degree Minute Second" with each unit in a separate attribute column.


The conversion factor is as follow:-

  • 1° = 60'
  • 1' = 60"
  • 1° = 3600"

Note that the bearings where generated using the azimuth expression listed below:-

degrees( azimuth( start_point($geometry), end_point($geometry) ) )


The conversion

As an example, lets convert this 104.580420 (which is in Decimal Degree) to Degree Minute Second.

  • Degree = INT(104.580420) = 104°
  • Minute = INT((104.580420 - 104) * 60) = 34'
  • Seconds = (34.82520000000022 - 34) * 60 = 49.512"

Therefore, the result of 104.580420 is: 104° 34' 49.512"


brg = 104.580420

try:
    # Making sure the brg is a float number...
    brg = float(brg)
    
    # Degree...
    brg_degree = int(brg)

    # Minute
    brg_minute = int((brg - brg_degree) * 60)

    # Second
    brg_second = round( (((brg - brg_degree) * 60) - brg_minute) * 60, 2)

    # Display or return the result...
    print(str(brg_degree) +'° '+ str(brg_minute) +"' "+ str(brg_second) +'" ')
    
except ValueError:
    
    print('Error: Invalid Bearing')



The code above can now be added as a function expression in QGIS field calculator to create a new column like this...



from qgis.core import *
from qgis.gui import *

@qgsfunction(args='auto', group='Custom')
def bearing_func(brg, feature, parent):
    try:
        # Making sure the brg is a float number...
        brg = float(brg)
        
        # Degree...
        brg_degree = int(brg)

        # Minute
        brg_minute = int((brg - brg_degree) * 60)

        # Second
        brg_second = round( (((brg - brg_degree) * 60) - brg_minute) * 60, 2)

        # Display or return the result...
        return str(brg_degree) +'° '+ str(brg_minute) +"' "+ str(brg_second) +'"'
        
    except ValueError:
        
        return 'Error: Invalid Bearing'

Note that the output field type is set to 'Text (String)' because of the symbols used in the result.


That is it!

Wednesday, July 28, 2021

Heatmap with python Folium

 To create a heatmap, we need dataset in the form of XYZ. Where X and Y are the usual long/lat locations and Z can be any value/parameter you want the heatmap to represent (in this can our Z is temperature values).

Here I got this dataset as seen below:-


First we have to read the data and prepare it into a format that the folium heatmap plugin wanted.

The expected format is a list of list... that is: [ [x1, y1, z1], [x2, y2, z2], [x3, y3, z3] ... ]. This is easy to prepare with the pandas module.

import pandas as pd

import folium
from folium import plugins
from folium.plugins import HeatMap


# Load the data in df...
heatmap_df = pd.read_excel(r"Folium Heatmap Data.xlsx")
# heatmap_df.head()

# Convert the df individual rows into list 
df_list = heatmap_df.values.tolist()
df_list

Now, that the data is ready we need to plot the heatmap as follow;-

lon, lat = -105.028306, 39.725343

m = folium.Map([lat, lon], zoom_start=10)

HeatMap(df_list, min_opacity=0.1).add_to(folium.FeatureGroup(name='Heat Map').add_to(m))
folium.LayerControl().add_to(m)

display(m)


The example below use random arrays generated from numpy library and the tile basemap is set to 'stamentoner' as seen below.

import numpy as np
import folium
from folium import plugins
from folium.plugins import HeatMap

lon, lat = -86.276, 30.935 
zoom_start = 5


data = (
    np.random.normal(size=(100, 3)) *
    np.array([[1, 1, 1]]) +
    np.array([[48, 5, 1]])
).tolist()


m = folium.Map([48, 5], tiles='stamentoner', zoom_start=6)

HeatMap(data, min_opacity=0.1).add_to(folium.FeatureGroup(name='Heat Map').add_to(m))
folium.LayerControl().add_to(m)

display(m)


That is it!


Reference resources

  • https://github.com/python-visualization/folium
  • https://nbviewer.jupyter.org/github/python-visualization/folium/blob/master/examples/TilesExample.ipynb
  • https://leaflet-extras.github.io/leaflet-providers/preview/
  • https://nbviewer.jupyter.org/github/python-visualization/folium_contrib/tree/master/notebooks/

Friday, July 23, 2021

Parsing XML data in python

 Extensible Markup Language (XML) is one of the data exchange formats. In the post, we will work through using python's 'xml.etree.ElementTree' module to parse its data content.

An XML data can be available in one of the three forms, namely:-

  • XML file
  • XML URL
  • XML String

Depending on how your XML in made available to you, there is slightly different approaches of reading it use the 'xml.etree.ElementTree' module. Lets see how below.

The sample XML I will use is the CD Catalog by W3Schools. I have copied the data into the three formats listed above as seen below;-


# XML file...
xml_file = r"C:\Users\Yusuf_08039508010\Documents\w3s_sample.xml"

# XML URL...
xml_url = 'https://www.w3schools.com/xml/cd_catalog.xml'

# XML String...
xml_string = '''<?xml version="1.0" encoding="UTF-8"?>
<CATALOG>
  <CD>
    <TITLE>Empire Burlesque</TITLE>
    <ARTIST>Bob Dylan</ARTIST>
    <COUNTRY>USA</COUNTRY>
    <COMPANY>Columbia</COMPANY>
    <PRICE>10.90</PRICE>
    <YEAR>1985</YEAR>
  </CD>
  <CD>
    <TITLE>Hide your heart</TITLE>
    <ARTIST>Bonnie Tyler</ARTIST>
    <COUNTRY>UK</COUNTRY>
    <COMPANY>CBS Records</COMPANY>
    <PRICE>9.90</PRICE>
    <YEAR>1988</YEAR>
  </CD>
  <CD>
    <TITLE>Greatest Hits</TITLE>
    <ARTIST>Dolly Parton</ARTIST>
    <COUNTRY>USA</COUNTRY>
    <COMPANY>RCA</COMPANY>
    <PRICE>9.90</PRICE>
    <YEAR>1982</YEAR>
  </CD>
  <CD>
    <TITLE>Still got the blues</TITLE>
    <ARTIST>Gary Moore</ARTIST>
    <COUNTRY>UK</COUNTRY>
    <COMPANY>Virgin records</COMPANY>
    <PRICE>10.20</PRICE>
    <YEAR>1990</YEAR>
  </CD>
  <CD>
    <TITLE>Eros</TITLE>
    <ARTIST>Eros Ramazzotti</ARTIST>
    <COUNTRY>EU</COUNTRY>
    <COMPANY>BMG</COMPANY>
    <PRICE>9.90</PRICE>
    <YEAR>1997</YEAR>
  </CD>
  <CD>
    <TITLE>One night only</TITLE>
    <ARTIST>Bee Gees</ARTIST>
    <COUNTRY>UK</COUNTRY>
    <COMPANY>Polydor</COMPANY>
    <PRICE>10.90</PRICE>
    <YEAR>1998</YEAR>
  </CD>
  <CD>
    <TITLE>Sylvias Mother</TITLE>
    <ARTIST>Dr.Hook</ARTIST>
    <COUNTRY>UK</COUNTRY>
    <COMPANY>CBS</COMPANY>
    <PRICE>8.10</PRICE>
    <YEAR>1973</YEAR>
  </CD>
  <CD>
    <TITLE>Maggie May</TITLE>
    <ARTIST>Rod Stewart</ARTIST>
    <COUNTRY>UK</COUNTRY>
    <COMPANY>Pickwick</COMPANY>
    <PRICE>8.50</PRICE>
    <YEAR>1990</YEAR>
  </CD>
  <CD>
    <TITLE>Romanza</TITLE>
    <ARTIST>Andrea Bocelli</ARTIST>
    <COUNTRY>EU</COUNTRY>
    <COMPANY>Polydor</COMPANY>
    <PRICE>10.80</PRICE>
    <YEAR>1996</YEAR>
  </CD>
  <CD>
    <TITLE>When a man loves a woman</TITLE>
    <ARTIST>Percy Sledge</ARTIST>
    <COUNTRY>USA</COUNTRY>
    <COMPANY>Atlantic</COMPANY>
    <PRICE>8.70</PRICE>
    <YEAR>1987</YEAR>
  </CD>
  <CD>
    <TITLE>Black angel</TITLE>
    <ARTIST>Savage Rose</ARTIST>
    <COUNTRY>EU</COUNTRY>
    <COMPANY>Mega</COMPANY>
    <PRICE>10.90</PRICE>
    <YEAR>1995</YEAR>
  </CD>
  <CD>
    <TITLE>1999 Grammy Nominees</TITLE>
    <ARTIST>Many</ARTIST>
    <COUNTRY>USA</COUNTRY>
    <COMPANY>Grammy</COMPANY>
    <PRICE>10.20</PRICE>
    <YEAR>1999</YEAR>
  </CD>
  <CD>
    <TITLE>For the good times</TITLE>
    <ARTIST>Kenny Rogers</ARTIST>
    <COUNTRY>UK</COUNTRY>
    <COMPANY>Mucik Master</COMPANY>
    <PRICE>8.70</PRICE>
    <YEAR>1995</YEAR>
  </CD>
  <CD>
    <TITLE>Big Willie style</TITLE>
    <ARTIST>Will Smith</ARTIST>
    <COUNTRY>USA</COUNTRY>
    <COMPANY>Columbia</COMPANY>
    <PRICE>9.90</PRICE>
    <YEAR>1997</YEAR>
  </CD>
  <CD>
    <TITLE>Tupelo Honey</TITLE>
    <ARTIST>Van Morrison</ARTIST>
    <COUNTRY>UK</COUNTRY>
    <COMPANY>Polydor</COMPANY>
    <PRICE>8.20</PRICE>
    <YEAR>1971</YEAR>
  </CD>
  <CD>
    <TITLE>Soulsville</TITLE>
    <ARTIST>Jorn Hoel</ARTIST>
    <COUNTRY>Norway</COUNTRY>
    <COMPANY>WEA</COMPANY>
    <PRICE>7.90</PRICE>
    <YEAR>1996</YEAR>
  </CD>
  <CD>
    <TITLE>The very best of</TITLE>
    <ARTIST>Cat Stevens</ARTIST>
    <COUNTRY>UK</COUNTRY>
    <COMPANY>Island</COMPANY>
    <PRICE>8.90</PRICE>
    <YEAR>1990</YEAR>
  </CD>
  <CD>
    <TITLE>Stop</TITLE>
    <ARTIST>Sam Brown</ARTIST>
    <COUNTRY>UK</COUNTRY>
    <COMPANY>A and M</COMPANY>
    <PRICE>8.90</PRICE>
    <YEAR>1988</YEAR>
  </CD>
  <CD>
    <TITLE>Bridge of Spies</TITLE>
    <ARTIST>T'Pau</ARTIST>
    <COUNTRY>UK</COUNTRY>
    <COMPANY>Siren</COMPANY>
    <PRICE>7.90</PRICE>
    <YEAR>1987</YEAR>
  </CD>
  <CD>
    <TITLE>Private Dancer</TITLE>
    <ARTIST>Tina Turner</ARTIST>
    <COUNTRY>UK</COUNTRY>
    <COMPANY>Capitol</COMPANY>
    <PRICE>8.90</PRICE>
    <YEAR>1983</YEAR>
  </CD>
  <CD>
    <TITLE>Midt om natten</TITLE>
    <ARTIST>Kim Larsen</ARTIST>
    <COUNTRY>EU</COUNTRY>
    <COMPANY>Medley</COMPANY>
    <PRICE>7.80</PRICE>
    <YEAR>1983</YEAR>
  </CD>
  <CD>
    <TITLE>Pavarotti Gala Concert</TITLE>
    <ARTIST>Luciano Pavarotti</ARTIST>
    <COUNTRY>UK</COUNTRY>
    <COMPANY>DECCA</COMPANY>
    <PRICE>9.90</PRICE>
    <YEAR>1991</YEAR>
  </CD>
  <CD>
    <TITLE>The dock of the bay</TITLE>
    <ARTIST>Otis Redding</ARTIST>
    <COUNTRY>USA</COUNTRY>
    <COMPANY>Stax Records</COMPANY>
    <PRICE>7.90</PRICE>
    <YEAR>1968</YEAR>
  </CD>
  <CD>
    <TITLE>Picture book</TITLE>
    <ARTIST>Simply Red</ARTIST>
    <COUNTRY>EU</COUNTRY>
    <COMPANY>Elektra</COMPANY>
    <PRICE>7.20</PRICE>
    <YEAR>1985</YEAR>
  </CD>
  <CD>
    <TITLE>Red</TITLE>
    <ARTIST>The Communards</ARTIST>
    <COUNTRY>UK</COUNTRY>
    <COMPANY>London</COMPANY>
    <PRICE>7.80</PRICE>
    <YEAR>1987</YEAR>
  </CD>
  <CD>
    <TITLE>Unchain my heart</TITLE>
    <ARTIST>Joe Cocker</ARTIST>
    <COUNTRY>USA</COUNTRY>
    <COMPANY>EMI</COMPANY>
    <PRICE>8.20</PRICE>
    <YEAR>1987</YEAR>
  </CD>
</CATALOG>
'''

Tuesday, July 20, 2021

Reversing the order (rows and columns) of dataframe in R and Python

 Lets see how to reverse the order of dataframe rows and columns in both R and Python scripting.


Reversing the order of rows and columns in Python

import pandas as pd

# Define the DataFrame...
df = pd.DataFrame({'Col1':['Africa', 'Asia', 'Europe', 'North America', 'South America'], 
                     'Col2':[1000, 3000, 6000, 9000, 12000]})

df


# Reversing the order of ROWS
df[::-1].reset_index(drop=True)

# Or

df.iloc[::-1]


# Reversing the order of COLUMNS
df[df.columns[::-1]]

# Or

df.iloc[:, ::-1]



Reversing the order of  rows and columns in R

# Define the DataFrame...
df <- data.frame('Col1'=c('Africa', 'Asia', 'Europe', 'North America', 'South America'),
                'Col2'=c(1000, 3000, 6000, 9000, 12000))


# Reversing the order of ROWS
rev_dfr <- apply(df, 2, rev)

# Reversing the order of COLUMNS
rev_dfc <- rev(df)


That is it!

Saturday, July 17, 2021

Conditional coloring of pandas dataframe cells

In this post, I will go over some pandas function that allow the styling of datafame table in jupyter notebook.

We will learn to style our dataframe to look beautiful which is useful for presentation and highlighting important cells. Our styled dataframe can be exported to excel (.xlsx) spreadsheet without loosing its style decoration.

Knowledge of HTML/CSS table is highly recommended as most of the properties syntaxes are applicable here.

I will make use of the CSV table that contains names of Nigerian Distinguished Senators as seen below. Feel free to make use of your own table of choice, the concept is what matters not the table data per se.
df_table = pd.read_csv(r"C:\Users\Yusuf_08039508010\Desktop\Distinguished Senators.csv")

df_table



Let see how we can change the default table above to our liking...

Styling the entire table using: set_table_styles() function

Using the function "set_table_styles()", we can style the entire table (i.e: header, columns, rows, border etc.) by applying CSS styling tags.

# Template...
# { '...':'...', '...':[(...), (...), (...)] }
# { 'selector':'th/td/tr:nth-of-type(odd/even)', 'props':[('pro1', 'value'), ('pro2', 'value'), ('pro3', 'value')] }


# Styling the entire table...
df_table.style.set_table_styles([
# Style table head...
    {'selector':'th',
    'props':[('background', '#123000'), ('color', 'white'), ('font-size', '15px')]
    },
    
# Style table row/column... border option: solid, dashed, dotted
    { 'selector':'td', 'props':[('border', '2px dotted green'), ('', '')] },
    
    # Style nth odd/even rows....
    { 'selector': 'tr:nth-of-type(odd)', 
     'props':[('background', 'white'), ('border', '2px dashed #D0D0D0')] },

])


The code above output this table styled below...


We can add hover style like so...
... {'selector': 'td:hover',
     'props': [('background-color', 'gray'), ('font-size', '2em')] },
    
    {'selector': 'tr:hover',
     'props': [('background-color', 'yellow')] }, ...





set_properties() function

df_table.style.set_properties(**{
    'background-color':'cyan',
    'color':'brown',
    'border':'2px solid black',
})
The set_properties() function could also be used to style the table as seen.


With set_properties() option, you can export/save the styled dataframe to .xlsx excel file.

Saturday, July 10, 2021

Python web map with Flask/Django and Folium

 By the end of this tutorial, we will build a web app in both Flask and Django that will display at least four maps with different tile styles as seen below.


These map styles are the default styles available in the Folium package. There names are:-
  • OpenStreetMap
  • Stamen Terrain
  • Stamen Watercolor
  • Stamen Toner

You can find more tiles available in leafletjs on this web page address.




Main logic
The primary logic behind this maps is the use of folium package to generate web based leafletjs map like this:-

start_coords = (9.069, 7.472)
folium_map = folium.Map(location=start_coords, 
    zoom_start=13,
    tiles="OpenStreetMap",
    # attr='CartoDB attribution'
    max_zoom=16, 
    min_zoom=10, 
    )

folium_map.save('map.html')



Flask App

After you setup your flask app, the routes that power the maps should look like this:-

@app.route('/map1')
def index_1():
    start_coords = (9.069, 7.472)
    folium_map = folium.Map(location=start_coords, 
        zoom_start=13,
        tiles="OpenStreetMap",
        # attr='CartoDB attribution'
        max_zoom=16, 
        min_zoom=10, 
        )

    folium.Marker(
            location = start_coords,
            popup = '<h1>FATIMAH</h1>',
            tooltip = 'HEHEHE,,,,'
        ).add_to(folium_map)

    return folium_map._repr_html_()



Django Project

In the case of django, our app/views.py and html template will look like below:-

app/views.py

def map1(request):
	start_coords = (9.069, 7.472)
	folium_map = folium.Map(location=start_coords, 
        zoom_start=13,
        tiles="OpenStreetMap",
        # attr='CartoDB attribution'
        max_zoom=16, 
        min_zoom=10, 
        )

	mymap = folium_map._repr_html_() # HTML representation of the map

	context = {
		'map': mymap,
		'map_title': 'Map 1 - OpenStreetMap'
	}

	# return HttpResponse('<h1>This is map page</h1>')
	return render(request, 'map1.html', context)

html template

<div class="container">
    <div>
    	<!-- <p>This the map canvas... </p> -->
    	{{map | safe}}
    </div>


</div>

Note the part {{map | safe}} in the html, that is where the magic takes place. It displays the context folium map variable from the views.py using the safe flag.





Source Code

Download the complete code from the github page.


That is it!

Monday, July 5, 2021

Separating a String column into individual columns/cells

 In this post, I will be separating a 'String' or 'List' column into individual columns/cells.


The problem

I got a table where each cell is a bulky string separated by newline character '\n' as seen below. 


It is required that each separate be on its own cell/column.


The Solution

  • Step 1: Read the table into dataframe
  • Step 2: Convert the string column to 'list' where '\n' serves as the separator
  • Step 3: Convert the list element into independent columns/cells
# Step 1: Read the table into dataframe
df = pd.read_excel('Sep_Str_List.xlsx')

# Step 2: Convert the string column to 'list' where '\n' serves as the separator
# Since the string is separated by '\n', let split on it to convert to list
df['List_Col'] = df['Str_Col'].apply(lambda x: x.split('\n'))

# Step 3: Convert the list element into independent columns/cells
# Separating a 'List' column into individual columns/cells
df_individual_col = df['List_Col'].apply(pd.Series)

# Concatenate df to df_individual_col
df_final = pd.concat([df, df_individual_col], axis=1)
df_final



That is it!

Thursday, July 1, 2021

Mapping Poor And Vulnerable Nigerians by state

In this post, I will visualize the table below on a map. The types of map we will produce will include Choropleth map, Proportional Symbol map etc.


The table is on Nigeria Poor & Vulnerable Households provided by "The National Social Register of Poor & Vulnerable Households (PVHHs)" as of 31st March, 2020.
Note that some few states (Ebonyi and Ogun states in this case) are missing from the record.

Preparing the Choropleth maps

I will use QGIS software to map this data by following these steps listed below:-
  1. Convert the table above to a CSV file and load it into QGIS
  2. Get Shapefile map of Nigeria states from GADM or Diva-GIS and load it into QGIS
  3. Join the two layers above based on the state name column
  4. Make sure the joined attribute columns (Households and Individuals) are numeric/integer columns. You can use the 'Refactor fields' algorithm to edit the structure of the attributes table.
  5. Next use the fields (Households and Individuals) to symbolize the map using graduated symbols.

Poor & Vulnerable Households



Poor & Vulnerable Individuals


Obviously, for this kind of map, you will edit the key/legend number range to the nearest thousands. I decided to leave the default since this is just for demo purpose.

Tuesday, June 29, 2021

GIS Programming with Python and QGIS - Part 3

GIS Programming with Python and QGIS - Part 3 >> The PyQGIS module

If you missed Part 1 and Part 2 read them before you continue reading...

In this part I will cover the following topics:-
1~ Introduction to PyQGIS
2~ Basic GIS Operations using pyqgis

Let's get started...


1~ Introduction to PyQGIS
PyQGIS is the official python module for the QGIS software. It allows you to perform spatial operations and automate spatial processes in the QGIS software.

The official python documentation is available on this website url (the C++ documentation is available here. Some notable resources for learning to use the module includes:-
i) PyQGIS developer cookbook - part of QGIS Documentation
ii) PyQGIS Programmer's Guide by Gary Sherman
iii) PyQGIS 101 by Anita Graser
iv) Open Source Option YouTube Channel on PyQGIS

The resources above are good enough to get you started.



2~ Basic GIS Operations using PyQGIS
Lets take a look at some common operations we usually do in QGIS software and see how we can do the same in PyQGIS.

i) Loading and rendering shapefile in PyQGIS
The two important methods needed to load a vector layer onto the map canvas are: QgsVectorLayer(vector_file, 'DISPLAYNAME', 'ogr') and QgsProject.instance().addMapLayer(layer)

See proper usage below:-

# Path to the shapefile...
vector_file = r"C:\Users\Yusuf_08039508010\Desktop\Working_Files\GIS Data\NGR\Nig Adm\NGA_adm2.shp"

# Create vector layer object...
layer_1 = QgsVectorLayer(vector_file, 'DISPLAYNAME', 'ogr')

# Add shapefile to map canvas...
QgsProject.instance().addMapLayer(layer_1)
If you want to see a more useful scenario of adding/loading vector layers into qgis layer panel using pyqgis see this page.


We can also interact with the project file (.qgs or .qgz) as follow:-
# Create project instance object
project = QgsProject.instance()

# Get project file path name
project.fileName()

# Get project file base name
project.baseName()

# Set project file to new title
project.setTitle('New Title Name...')

# Get lenght/number of layers in a project layer panel
len(project.mapLayers())

# Find more attribute and methods for QgsProject
dir(QgsProject.instance())




ii) Accessing Shapefile Attributes Table
To access the attribute table fields/column use the layer_1.fields() method like this in a forloop expression...

# get attribute columns names
for f in layer_1.fields():
    print(f.name())
This will printout all the field/column names. You can manipulate individual column from the temporary variable f.


Saturday, June 26, 2021

Function in Python, Javascript and R

A function is what make a code reusable, you write a piece of code once and you use it as many times as you want. These function can either be built-in or custom constructed (user-defined) functions.

Programming languages (Python, Javascript and R in this case) have built in functions. These are functions that are already available by default within the language engine for you to use. There are many of these built-in functions, just to give an example of one lets look at a function that calculates the square root of a number.


Built-in square root function in Python

In python, this function is available in the built-in math module.
import math 
print(math.sqrt(25))

User-defined/Constructed square root function in Python

def sqrt_num(x):
    num_sqrt = x ** 0.5
    print(num_sqrt)

sqrt_num(25)


Built-in square root function in JavaScript

In JavaScript, this function is available in the built-in math library.
Math.sqrt(25);

User-defined/Constructed square root function in JavaScript

function sqrt_num(x){
    return num_sqrt = x ** 0.5;
}


Built-in square root function in R

In R, this function is directly available as seen below.
sqrt(25)

User-defined/Constructed square root function in R

sqrt_num <- function (x){
    num_sqrt = x ** 0.5
    print(num_sqrt)
}



That is it!

Monday, June 21, 2021

Python data wrangling examples (Practical Exercises with Solutions)

Data wrangling (aka data munging or data preprocessing) is the process of transforms “raw” data into a more suitable form ready for analysis. It is a vital skill in completing many data science project.

Some data wrangling methods we will use in this tutorial are: filtering, gathering/collecting, merging, grouping, concatenating, transforming, manipulating, extracting, formatting, chaining, subset, handling missing data, summarizing, combining, splitting etc. Depending on your data wrangling project objective, you will only utilize a handful of these methods per project.

Here below, you will find some data wrangling Exercises and Solutions. Lets get our hands dirty....


The dataset: All the data used are available for download from the github page.


Exercise 1:

Extract the "positions" of the following list of lawyers table into a new column.


Solution 1:-

We will read the file into dataframe and then replace the names from column 'name' by empty string in column 'position'.

import pandas as pd

# Read in the file...
df = pd.read_excel(r"C:\Users\Yusuf_08039508010\Desktop\DATA WRANGLING\Blog post Data\Exercise 1\lawyers_RAW.xlsx")

# Iterate over the rows and replace unwanted string...
role_list = []
for index, row in df.iterrows():
    role = row['position'].replace(row['name'], '').strip()
    role_list.append(role)
    # print(role)

df['ROLE'] = role_list

df

Thursday, June 17, 2021

PyQGIS - Convert from one vector format to another

 If you got a valid QgsVectorLayer, then the following class "QgsVectorFileWriter.writeAsVectorFormat(...)" is used to convert any gdal/ogr supported vector driver format to another.

There are many vector drivers supported by GDAL/OGR as listed on this page.

For example, to convert GeoJSON to Shapefile, we will read the GeoJSON file using QgsVectorLayer and convert it to ESRI Shapefile driver using QgsVectorFileWriter.writeAsVectorFormat(...) as demonstrated below.



# Convert GeoJSON to SHP
input_shp = QgsVectorLayer(r"C:\Users\Yusuf_08039508010\Desktop\US Zip Codes\ak_alaska_zip_codes_geo.min.json","polygon","ogr")
input_shp.isValid() 

QgsVectorFileWriter.writeAsVectorFormat(input_shp, r"C:\Users\Yusuf_08039508010\Desktop\US ZipCode\poly.shp" , "UTF-8", input_shp.crs(), "ESRI Shapefile")

# ----------------------------------------- for Bulk Conversion
import glob

input_files = glob.glob(r'C:\Users\Yusuf_08039508010\Desktop\Working_Files\GIS Data\US Zip Codes\*.json')
for f in input_files:
    out_filename = f.split('\\')[-1].split('.')[0]
    input_file = QgsVectorLayer(f, "polygon", "ogr")
    
    if input_file.isValid() == True:
        QgsVectorFileWriter.writeAsVectorFormat(input_file, rf"C:\Users\Yusuf_08039508010\Desktop\Working_Files\SHP\US ZipCode\{out_filename}.shp", "UTF-8", input_file.crs(), "ESRI Shapefile")
    else:
        print(f, 'is not a valid input file')
        
print('Done Processing..., ', f)


Note that "QgsVectorFileWriter.writeAsVectorFormat(...)" takes in the following; the input file, the output file, the encoding, the coordinate system, the output file driver.

That is it!

Tuesday, June 15, 2021

Python variables in HTML String Literals

Using Python Variables in HTML multi-line Python string

Running the script below will generate an HTML page like this...



# ------------- HTML Template ------------------------
html_head = '''

<!doctype html>
<html lang="en">
  <head>
    <!-- Required meta tags -->
    <meta charset="utf-8">
    <meta name="viewport" content="width=device-width, initial-scale=1">

    <!-- Bootstrap CSS -->
    <link href="https://cdn.jsdelivr.net/npm/bootstrap@5.0.1/dist/css/bootstrap.min.css" rel="stylesheet" integrity="sha384-+0n0xVW2eSR5OomGNYDnhzAbDsOXxcvSN1TPprVMTNDbiYZCxYbOOl7+AMvyTG2x" crossorigin="anonymous">

    <title>{main_title}</title>
  </head>
  <body>

'''


html_tail = '''

    <!-- Option 1: Bootstrap Bundle with Popper -->
    <script src="https://cdn.jsdelivr.net/npm/bootstrap@5.0.1/dist/js/bootstrap.bundle.min.js" integrity="sha384-gtEjrD/SeCtmISkJkNUaaKMoLD0//ElJ19smozuHV6z3Iehds+3Ulb9Bn9Plx0x4" crossorigin="anonymous"></script>

  </body>
</html>

'''


html_body1 = '''
<div class="container">

    <div class="row">
      <div class="col-3">
        <h3>{main_title}</h3>
        <img width= '250px' src="{main_image}">
        <p>{main_description}</p>

      </div>
      
      <div class="col-9">

'''


html_body2 = '''
      
        <b>{item_date}</b>
        <h3>{item_title}</h3>
        <p>{item_description}</p>
        <audio controls> <source src="{item_mp3_file}" /> </audio>
        <hr>

'''

html_body3 = '''
      </div> \n
    </div>
</div>
'''
# -------------------------------------------------------

html_filename = 'index'
main_title = 'The HTML title' 
p_description = 'Placeholder.com is a free image placeholder service for web designers, serving billions and billions of images each year. You can specify image size & format (.GIF, .JPG, .PNG, .WEBP), background color & text color, as well as the text.'
main_image = 'https://via.placeholder.com/150'
p_date = '12-06-2021'
p_title = 'How To Use Our Placeholders'
main_description = 'Placeholder.com is a free image placeholder service for web designers, serving billions of images per year.'
p_mp3 = 'https://file-examples-com.github.io/uploads/2017/11/file_example_MP3_1MG.mp3'


with open(f'{html_filename}.html', 'w', encoding='utf-8') as html5:
    html5.write(html_head.format(main_title=main_title))

    html5.write(html_body1.format(main_title=main_title, main_description=main_description, main_image=main_image))
    html5.write(html_body2.format(item_date=p_date, item_title=p_title, item_description=p_description, item_mp3_file=p_mp3))
    html5.write(html_body3)

    html5.write(html_tail)

print('Finished...')


That is it.

Thursday, June 10, 2021

PyQGIS - Attribute Group Counter

 If you have a vector shapefile layer that has an attribute which is a categorized group for other attributes and you want to get the count/number of items in each group, then here is a quick script to get you the result.

A typical scenario where you will find this useful is when you count counties in a state or similar scenario. Assuming we have a vector layer with a category attribute column named "group" as seen below, then we can count the occurrences of each group using the following script.


from collections import Counter

# Get active vector layer...
layer = iface.activeLayer()

# Lets count the number of features...
print(layer.featureCount())

# Lets check the field names...
for f in layer.fields():
    print(f.name())

# Get attributes into a list...
attr_list = []
for feature in layer.getFeatures():
    attr_list.append(feature['group'])

# Count item occurrences using 'collections.Counter'
attr_count = dict( Counter(attr_list) )
print(attr_count)

The primary logic here is in counting item occurrences within a list using collections.Counter module as seen below.




That is it!

Monday, June 7, 2021

PyQGIS for fun - lets write fun little scripts in pyqgis

 There are already many pyqgis scripts that allow you interact with various spatial datasets. In this post, I want to take a look at something different just for fun and hopefully that will help you understanding pygis ecosystem!

So we won't be interacting with core GIS data per se, instead we will write some fun little projects in PyQGIS that has little or nothing to do with GIS data directly.

These project will help us understand pyqgis and the QGIS interface as a whole. Lets get started...


Note: When you are scripting in PyQGIS, you are directly or indirectly using three technologies namely: Qt, PyQt and PyQGIS

Qt is a widget toolkit for creating Graphical User Interfaces (GUI) in C++.

PyQt is the Python interface to Qt. PyQt provides classes and functions to interact with Qt widgets.

PyQGIS uses PyQt instead of the default python GUI library (TkInter) that comes with most python installations. TkInter was removed from the python that comes with QGIS, the last time I check on QGIS version 3.16 it throughs error like so...


I think the QGIS developers don't want you to use another GUI tool kit within QGIS (which I think they are right, as doing this will only introduce another layer of complexity), sorry TkInter lovers 😊.

Sunday, May 30, 2021

PyQGIS - Investigating Attribute field type for bulk shapefiles

 In this scenario, I wanted to merge some shapefiles into one file using the "Merge Vector Layer" algorithm. However, this doesn't end successfully because of varying attribute field type in some of the fields that oaths to have the same field name and field type.


The error is the case is as seen below. A field (ALAND10) in one of the layers (al_alabama_zip_codes_geohas different data type than in other layers.

ALAND10 field in layer al_alabama_zip_codes_geo has different data type than in other layers (Integer64 instead of Real) Execution failed after 0.97 seconds


Obviously, when all the data type in all the fields are set to match the process should work fine. But there are many layers to edit and many attribute fields to lookup. We definitely need an automated way to investigate and fix this.

So, lets lookup what was happening and possibly group those files with similar field type so we can merge those once together. Then we can edit the few once before merging them.


# Get all layers on layer panel...
layersDict = QgsProject.instance().mapLayers()

# extract the keys to list...
keys = list(layersDict.keys())

# extract the values to list...
values = list(layersDict.values())

data_list = []
for k, v in zip(keys, values):
    data_dict = {}
    
    data_dict['geom_name'] = v.name()
    
    # get attribute columns names and dataType
    attr_name_list = [ field.name() for field in layersDict[k].fields() ]
    attr_type_list = [ field.typeName() for field in layersDict[k].fields() ]
    
    data_dict['attr_name'] = attr_name_list
    data_dict['attr_type'] = attr_type_list
    
    data_list.append(data_dict)

print('Done...')

After running the script, we have a list containing dictionaries where keys are 'geometry name', 'attribute name' and 'attribute type'.

For convenience, I will copy this list into variable named ATTR_DETAILS and countinue the data wrangling in a jupyter notebook.

ATTR_DETAILS =[{'geom_name': 'ak_alaska_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Real', 'Real', 'String', 'String', 'String']}, {'geom_name': 'al_alabama_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'ar_arkansas_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'az_arizona_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Real', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'ca_california_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Real', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'co_colorado_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Real', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'ct_connecticut_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'dc_district_of_columbia_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'de_delaware_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'fl_florida_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Real', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'ga_georgia_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'hi_hawaii_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'ia_iowa_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'id_idaho_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Real', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'il_illinois_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'in_indiana_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'ks_kansas_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Real', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'ky_kentucky_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'la_louisiana_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'ma_massachusetts_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'md_maryland_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'me_maine_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Real', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'mi_michigan_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'mn_minnesota_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Real', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'mo_missouri_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'ms_mississippi_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'mt_montana_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Real', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'nc_north_carolina_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'nd_north_dakota_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Real', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'ne_nebraska_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Real', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'nh_new_hampshire_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'nj_new_jersey_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'nm_new_mexico_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Real', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'nv_nevada_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Real', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'ny_new_york_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'oh_ohio_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'ok_oklahoma_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Real', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'or_oregon_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Real', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'pa_pennsylvania_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'ri_rhode_island_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'sc_south_carolina_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'sd_south_dakota_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Real', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'tn_tennessee_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'tx_texas_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Real', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'ut_utah_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Real', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'va_virginia_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'vt_vermont_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'wa_washington_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Real', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'wi_wisconsin_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'wv_west_virginia_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'wy_wyoming_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Real', 'Integer64', 'String', 'String', 'String']}]

Basically, I will use pandas to categorize/group the attribute type column. Then use it to get corresponding file names for each group, then merge each group together. We now have fewer files to edit their attribute type then subsequently merge them together.


import pandas as pd


ATTR_DETAILS = [{'geom_name': 'ak_alaska_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Real', 'Real', 'String', 'String', 'String']}, {'geom_name': 'al_alabama_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'ar_arkansas_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'az_arizona_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Real', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'ca_california_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Real', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'co_colorado_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Real', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'ct_connecticut_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'dc_district_of_columbia_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'de_delaware_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'fl_florida_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Real', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'ga_georgia_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'hi_hawaii_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'ia_iowa_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'id_idaho_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Real', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'il_illinois_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'in_indiana_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'ks_kansas_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Real', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'ky_kentucky_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'la_louisiana_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'ma_massachusetts_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'md_maryland_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'me_maine_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Real', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'mi_michigan_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'mn_minnesota_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Real', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'mo_missouri_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'ms_mississippi_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'mt_montana_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Real', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'nc_north_carolina_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'nd_north_dakota_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Real', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'ne_nebraska_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Real', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'nh_new_hampshire_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'nj_new_jersey_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'nm_new_mexico_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Real', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'nv_nevada_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Real', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'ny_new_york_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'oh_ohio_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'ok_oklahoma_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Real', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'or_oregon_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Real', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'pa_pennsylvania_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'ri_rhode_island_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'sc_south_carolina_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'sd_south_dakota_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Real', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'tn_tennessee_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'tx_texas_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Real', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'ut_utah_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Real', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'va_virginia_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'vt_vermont_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'wa_washington_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Real', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'wi_wisconsin_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'wv_west_virginia_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'wy_wyoming_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Real', 'Integer64', 'String', 'String', 'String']}]

attr_df = pd.DataFrame(ATTR_DETAILS)
# attr_df
# --------------------------------------

# Convert attr_type column to string...
def list_to_str(a_list):
    return ",".join(a_list)

attr_df['attr_type_STR'] = attr_df['attr_type'].apply(lambda x:list_to_str(x))

# attr_df
# -----------------------------------------


# Check uniques item in the string attribute type column...
print(attr_df.attr_type_STR.unique())
# -----------------------------------------


# Categorise/Group the string attribute type column...
def category_func(cell):
    if cell == 'String,String,String,String,String,String,Real,Real,String,String,String':
        return 'X'
    elif cell == 'String,String,String,String,String,String,Integer64,Integer64,String,String,String':
        return 'Y'
    else:
        return 'Z'
    
attr_df['Category'] = attr_df['attr_type_STR'].apply(lambda x:category_func(x))

attr_df


Now we can get the file names from each group like this...

attr_df[attr_df['Category'] == 'X'].reset_index(drop=True)['geom_name'].to_list()


# X group - 1 file
'ak_alaska_zip_codes_geo'


# Y group - 30 files
'al_alabama_zip_codes_geo', 'ar_arkansas_zip_codes_geo', 'ct_connecticut_zip_codes_geo', 'dc_district_of_columbia_zip_codes_geo', 'de_delaware_zip_codes_geo', 'ga_georgia_zip_codes_geo', 'hi_hawaii_zip_codes_geo', 'ia_iowa_zip_codes_geo', 'il_illinois_zip_codes_geo', 'in_indiana_zip_codes_geo', 'ky_kentucky_zip_codes_geo', 'la_louisiana_zip_codes_geo', 'ma_massachusetts_zip_codes_geo', 'md_maryland_zip_codes_geo', 'mi_michigan_zip_codes_geo', 'mo_missouri_zip_codes_geo', 'ms_mississippi_zip_codes_geo', 'nc_north_carolina_zip_codes_geo', 'nh_new_hampshire_zip_codes_geo', 'nj_new_jersey_zip_codes_geo', 'ny_new_york_zip_codes_geo', 'oh_ohio_zip_codes_geo', 'pa_pennsylvania_zip_codes_geo', 'ri_rhode_island_zip_codes_geo', 'sc_south_carolina_zip_codes_geo', 'tn_tennessee_zip_codes_geo', 'va_virginia_zip_codes_geo', 'vt_vermont_zip_codes_geo', 'wi_wisconsin_zip_codes_geo', 'wv_west_virginia_zip_codes_geo'


# Z group - 20 files
'az_arizona_zip_codes_geo', 'ca_california_zip_codes_geo', 'co_colorado_zip_codes_geo', 'fl_florida_zip_codes_geo', 'id_idaho_zip_codes_geo', 'ks_kansas_zip_codes_geo', 'me_maine_zip_codes_geo', 'mn_minnesota_zip_codes_geo', 'mt_montana_zip_codes_geo', 'nd_north_dakota_zip_codes_geo', 'ne_nebraska_zip_codes_geo', 'nm_new_mexico_zip_codes_geo', 'nv_nevada_zip_codes_geo', 'ok_oklahoma_zip_codes_geo', 'or_oregon_zip_codes_geo', 'sd_south_dakota_zip_codes_geo', 'tx_texas_zip_codes_geo', 'ut_utah_zip_codes_geo', 'wa_washington_zip_codes_geo', 'wy_wyoming_zip_codes_geo'
So, instead of editing 51 layers we can now merge the groups and edit just two layers.

That is it!