Sunday, May 30, 2021

PyQGIS - Investigating Attribute field type for bulk shapefiles

 In this scenario, I wanted to merge some shapefiles into one file using the "Merge Vector Layer" algorithm. However, this doesn't end successfully because of varying attribute field type in some of the fields that oaths to have the same field name and field type.


The error is the case is as seen below. A field (ALAND10) in one of the layers (al_alabama_zip_codes_geohas different data type than in other layers.

ALAND10 field in layer al_alabama_zip_codes_geo has different data type than in other layers (Integer64 instead of Real) Execution failed after 0.97 seconds


Obviously, when all the data type in all the fields are set to match the process should work fine. But there are many layers to edit and many attribute fields to lookup. We definitely need an automated way to investigate and fix this.

So, lets lookup what was happening and possibly group those files with similar field type so we can merge those once together. Then we can edit the few once before merging them.


# Get all layers on layer panel...
layersDict = QgsProject.instance().mapLayers()

# extract the keys to list...
keys = list(layersDict.keys())

# extract the values to list...
values = list(layersDict.values())

data_list = []
for k, v in zip(keys, values):
    data_dict = {}
    
    data_dict['geom_name'] = v.name()
    
    # get attribute columns names and dataType
    attr_name_list = [ field.name() for field in layersDict[k].fields() ]
    attr_type_list = [ field.typeName() for field in layersDict[k].fields() ]
    
    data_dict['attr_name'] = attr_name_list
    data_dict['attr_type'] = attr_type_list
    
    data_list.append(data_dict)

print('Done...')

After running the script, we have a list containing dictionaries where keys are 'geometry name', 'attribute name' and 'attribute type'.

For convenience, I will copy this list into variable named ATTR_DETAILS and countinue the data wrangling in a jupyter notebook.

ATTR_DETAILS =[{'geom_name': 'ak_alaska_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Real', 'Real', 'String', 'String', 'String']}, {'geom_name': 'al_alabama_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'ar_arkansas_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'az_arizona_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Real', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'ca_california_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Real', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'co_colorado_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Real', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'ct_connecticut_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'dc_district_of_columbia_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'de_delaware_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'fl_florida_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Real', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'ga_georgia_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'hi_hawaii_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'ia_iowa_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'id_idaho_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Real', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'il_illinois_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'in_indiana_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'ks_kansas_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Real', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'ky_kentucky_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'la_louisiana_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'ma_massachusetts_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'md_maryland_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'me_maine_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Real', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'mi_michigan_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'mn_minnesota_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Real', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'mo_missouri_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'ms_mississippi_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'mt_montana_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Real', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'nc_north_carolina_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'nd_north_dakota_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Real', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'ne_nebraska_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Real', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'nh_new_hampshire_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'nj_new_jersey_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'nm_new_mexico_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Real', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'nv_nevada_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Real', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'ny_new_york_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'oh_ohio_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'ok_oklahoma_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Real', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'or_oregon_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Real', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'pa_pennsylvania_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'ri_rhode_island_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'sc_south_carolina_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'sd_south_dakota_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Real', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'tn_tennessee_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'tx_texas_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Real', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'ut_utah_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Real', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'va_virginia_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'vt_vermont_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'wa_washington_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Real', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'wi_wisconsin_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'wv_west_virginia_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'wy_wyoming_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Real', 'Integer64', 'String', 'String', 'String']}]

Basically, I will use pandas to categorize/group the attribute type column. Then use it to get corresponding file names for each group, then merge each group together. We now have fewer files to edit their attribute type then subsequently merge them together.


import pandas as pd


ATTR_DETAILS = [{'geom_name': 'ak_alaska_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Real', 'Real', 'String', 'String', 'String']}, {'geom_name': 'al_alabama_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'ar_arkansas_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'az_arizona_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Real', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'ca_california_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Real', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'co_colorado_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Real', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'ct_connecticut_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'dc_district_of_columbia_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'de_delaware_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'fl_florida_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Real', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'ga_georgia_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'hi_hawaii_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'ia_iowa_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'id_idaho_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Real', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'il_illinois_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'in_indiana_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'ks_kansas_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Real', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'ky_kentucky_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'la_louisiana_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'ma_massachusetts_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'md_maryland_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'me_maine_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Real', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'mi_michigan_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'mn_minnesota_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Real', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'mo_missouri_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'ms_mississippi_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'mt_montana_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Real', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'nc_north_carolina_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'nd_north_dakota_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Real', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'ne_nebraska_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Real', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'nh_new_hampshire_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'nj_new_jersey_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'nm_new_mexico_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Real', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'nv_nevada_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Real', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'ny_new_york_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'oh_ohio_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'ok_oklahoma_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Real', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'or_oregon_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Real', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'pa_pennsylvania_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'ri_rhode_island_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'sc_south_carolina_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'sd_south_dakota_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Real', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'tn_tennessee_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'tx_texas_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Real', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'ut_utah_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Real', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'va_virginia_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'vt_vermont_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'wa_washington_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Real', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'wi_wisconsin_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'wv_west_virginia_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Integer64', 'Integer64', 'String', 'String', 'String']}, {'geom_name': 'wy_wyoming_zip_codes_geo', 'attr_name': ['STATEFP10', 'ZCTA5CE10', 'GEOID10', 'CLASSFP10', 'MTFCC10', 'FUNCSTAT10', 'ALAND10', 'AWATER10', 'INTPTLAT10', 'INTPTLON10', 'PARTFLG10'], 'attr_type': ['String', 'String', 'String', 'String', 'String', 'String', 'Real', 'Integer64', 'String', 'String', 'String']}]

attr_df = pd.DataFrame(ATTR_DETAILS)
# attr_df
# --------------------------------------

# Convert attr_type column to string...
def list_to_str(a_list):
    return ",".join(a_list)

attr_df['attr_type_STR'] = attr_df['attr_type'].apply(lambda x:list_to_str(x))

# attr_df
# -----------------------------------------


# Check uniques item in the string attribute type column...
print(attr_df.attr_type_STR.unique())
# -----------------------------------------


# Categorise/Group the string attribute type column...
def category_func(cell):
    if cell == 'String,String,String,String,String,String,Real,Real,String,String,String':
        return 'X'
    elif cell == 'String,String,String,String,String,String,Integer64,Integer64,String,String,String':
        return 'Y'
    else:
        return 'Z'
    
attr_df['Category'] = attr_df['attr_type_STR'].apply(lambda x:category_func(x))

attr_df


Now we can get the file names from each group like this...

attr_df[attr_df['Category'] == 'X'].reset_index(drop=True)['geom_name'].to_list()


# X group - 1 file
'ak_alaska_zip_codes_geo'


# Y group - 30 files
'al_alabama_zip_codes_geo', 'ar_arkansas_zip_codes_geo', 'ct_connecticut_zip_codes_geo', 'dc_district_of_columbia_zip_codes_geo', 'de_delaware_zip_codes_geo', 'ga_georgia_zip_codes_geo', 'hi_hawaii_zip_codes_geo', 'ia_iowa_zip_codes_geo', 'il_illinois_zip_codes_geo', 'in_indiana_zip_codes_geo', 'ky_kentucky_zip_codes_geo', 'la_louisiana_zip_codes_geo', 'ma_massachusetts_zip_codes_geo', 'md_maryland_zip_codes_geo', 'mi_michigan_zip_codes_geo', 'mo_missouri_zip_codes_geo', 'ms_mississippi_zip_codes_geo', 'nc_north_carolina_zip_codes_geo', 'nh_new_hampshire_zip_codes_geo', 'nj_new_jersey_zip_codes_geo', 'ny_new_york_zip_codes_geo', 'oh_ohio_zip_codes_geo', 'pa_pennsylvania_zip_codes_geo', 'ri_rhode_island_zip_codes_geo', 'sc_south_carolina_zip_codes_geo', 'tn_tennessee_zip_codes_geo', 'va_virginia_zip_codes_geo', 'vt_vermont_zip_codes_geo', 'wi_wisconsin_zip_codes_geo', 'wv_west_virginia_zip_codes_geo'


# Z group - 20 files
'az_arizona_zip_codes_geo', 'ca_california_zip_codes_geo', 'co_colorado_zip_codes_geo', 'fl_florida_zip_codes_geo', 'id_idaho_zip_codes_geo', 'ks_kansas_zip_codes_geo', 'me_maine_zip_codes_geo', 'mn_minnesota_zip_codes_geo', 'mt_montana_zip_codes_geo', 'nd_north_dakota_zip_codes_geo', 'ne_nebraska_zip_codes_geo', 'nm_new_mexico_zip_codes_geo', 'nv_nevada_zip_codes_geo', 'ok_oklahoma_zip_codes_geo', 'or_oregon_zip_codes_geo', 'sd_south_dakota_zip_codes_geo', 'tx_texas_zip_codes_geo', 'ut_utah_zip_codes_geo', 'wa_washington_zip_codes_geo', 'wy_wyoming_zip_codes_geo'
So, instead of editing 51 layers we can now merge the groups and edit just two layers.

That is it!

Thursday, May 20, 2021

PyQGIS Shapefile attribute lookup

 This pyqgis script will read a shapefile attribute column and compare the entries with a given list.

This script is useful when you want to join a tabular data to a shapefile and you want check if the all records on the table exits on the shapefile attribute table. It ensure you are safely joining all records on your new table to the shapefile.

To keep it simple, we will create list of records from a table we want to join to the shapefile. Then read the shapefile attribute column that serves as the join into a list. We then compare/lookup the list making sure there are no difference between them. If there differences, then count them and print out the values so we know what value would have a blank record after the join operation. 

Let's get our hands dirty... 

First, let me say the primary logic for our script is similar to what I did here "Filtering Missing Zip codes out of master Zip codes list". 

available_zipcodes = [77389, 77086, 77346, 77018, 77040, 77388, 77065, 77080, 77041, 77396, 77385, 77354, 77382, 77067, 77066, 77090, 77345, 77355, 77373, 77339, 77043, 77302, 77304, 77070, 77375, 77095, 77433, 77069, 77038, 77091, 77380, 77092, 77316, 77429, 77377, 77379, 77064, 77088, 77338, 77449, 77386, 77381, 77493, 77356, 77068, 77014, 77084, 77055, 77301, 77303, 77384]

given_zipcodes = [77325, 77339, 77345, 77346, 77380, 77381, 77382, 77383, 77384, 77385, 77386, 77301, 77302, 77303, 77304, 77316, 77354, 77356, 77389, 77014, 77018, 77038, 77040, 77041, 77043, 77055, 77064, 77065, 77066, 77067, 77068, 77069, 77070, 77080, 77084, 77086, 77088, 77090, 77091, 77092, 77095, 77375, 77377, 77379, 77388, 77429, 77433, 77449, 77493, 77373, 77338, 77347, 77391, 77396, 77355]


for zipcode in given_zipcodes:
    if zipcode not in available_zipcodes:
        print(zipcode)

The first list (available_...) will come from the shapefile while the second list (given_...) will come from the join table.


Here I got a polygon shapefile with 774 LGA in Nigeria. I then got a spreadsheet file from a different source which I want to join to the polygon shapefile. Due to incomplete data, variation names etc, there may be some polygon that won have attribute from the joined spreadsheet.


The object of the script is to lookup the difference between the two data sources before we perform the join operation.

Saturday, May 15, 2021

Python Django Setup Template

Here listed below are code snippets for creating a "PROJECT_NAME" Django Project with "APP_NAME" App.

1) Initial Set Up

  • mkdir My_Django_Folder
  • cd My_Django_Folder
  • pipenv install django
  • pipenv shell
  • django-admin startproject PROJECT_FOLDER_NAME .
  • python manage.py migrate
  • python manage.py createsuperuser
  • python manage.py startapp APP_FOLDER_NAME
  • python manage.py runserver

mkdir My_Django_Folder
cd My_Django_Folder
pipenv install django
pipenv shell
django-admin startproject PROJECT_FOLDER_NAME .

python manage.py migrate
python manage.py createsuperuser

python manage.py startapp APP_FOLDER_NAME
python manage.py runserver

Your project folder structure should look like this when you type 'tree' on the command prompt:-


It should also contain two files ("Pipfile" and "Pipfile.lock") that makes the virtual environment work properly.

If everything went successfully, you should get the welcome screen that congratulates you when you access the localhost with the given port number.





2) Add HTML/CSS/JS templates

At this point, you want to display a custom front-end the user instead of the default seen above. A front-end framework such as Bootstrap can be use, however, I will keep it simple by using a simple html/css/js files.

Recall from the step above, we create a django PROJECT and APP. Now, we need to edit some files within both folders like this:-
  • PROJECT folder: here we will edit these files - settings.py and urls.py
  • APP folder: here we will edit/create these files - urls.pyviews.py, and models.py. Note we could also add our templates folder here for the index.html but I prefer to create a root templates folder and point to it by updating the TEMPLATES list in PROJECT settings.py.

The following code snippets will add HTML/CSS/JS templates
  • mkdir templates
  • echo > templates/index.html (touch templates/index.html)
  • add your html code to the templates\index.html file
  • Update INSTALLED_APPS and TEMPLATES from PROJECT's settings.py
  • Update urls.py file from PROJECT's folder
  • Create a new urls.py file in app folder 'APP_FOLDER_NAME' and add view path
  • Update views.py file from the app folder 'APP_FOLDER_NAME' by defining the view added above. Since the name was called 'home', it most be the same in the views.py file.


If you do python manage.py runserver, this should load our html page from the templates folder as seen below:-


Notice that this is a very basic web page that didn't make use of a database, hence we didn't talk about the models.py file.

To add more pages for example, lets add contact page. We will create the contact.html file in the templates folder and update urls.py and views.py file in the app folder.



Note: If you don't want the .html extension to appear in the url like this "http://localhost:8000/contact.html", the edit urls.py file to remove the extension so that the path is like so... "http://localhost:8000/contact"



3) Adding Static file

In django, to use external CSS, JS, or Image files within HTML document you need to tell it how to get them by creating a root folder named "static" to house all the static files. The steps involved are as follow:-
  1. First thing is to update PROJECT's settings.py file by adding STATICFILES_DIRS


  2. Create a root folder named "static" and create another sub-folder with name of the app 'APP_FOLDER_NAME' 
  3. Add your static files there, for example I added an image 'buhari.jpg'
  4. Update the html file by first typing "{% load static %}" on the very first line. Then point to the file using this "{% static 'APP_FOLDER_NAME/buhari.jpg' %}"

The new home page should look like this...



This is the django way of doing things! In normal html, the inserted image tag will just look like this: <img src="APP_FOLDER_NAME/buhari.jpg " />

Likewise inserting a css or js file like this: <link rel="stylesheet" type="text/css" href="style.css"> in normal html will look like this: <link rel="stylesheet" type="text/css" href="{% static 'APP_FOLDER_NAME/style.css' %}"> in django way.

There are various delimeters, but the commonly used once are:-
  • {% %} - statements
  • {{ }} - expressions to print to the template output
  • {# #} - comments which are not included in the template output
  • # ## - line statements

Here is a nice tutorial on jinja template  by by zetcode.com which works similar to the django template.

Monday, May 10, 2021

Extracting Information from PDF with Python

There are several python libraries available that allows the extraction of pdf data. Some notable once are listed below:-

1) PyPDF2
2) Tabula-py (requires JAVA installed)
3) Zamzar.com API
4) Camelot
5) TIKA
6) PDFMiner

All the above are great have there pros and cons which depends on the pdf file in question. You will need to try each one out to find out the one that works best for your specific case.

In my own case, TIKA has been the best so far for all the pdf I have worked with. So in this blog post, I will demonstrate how the 'tika' module is used to extract information from a pdf file.

Lets get started....

To install this module, you most has java 7 or above installed on your machine. Then using it is just as easy as this:-

from tika import parser # pip install tika (need Java 7+ installed)

raw = parser.from_file('C:/Desktop/pdf_file_name.pdf')
print(raw['content'])

The content can then be stored in a variable for further manipulation.



Example

Lets extract the content of this PDF file:-



The result will look like this below. Note that I added .strip() function that will remove extra white spaces at the beginning and end of the content result.







That is it!

Thursday, May 6, 2021

Python script to separate shapefile based on type (Point, Line and Polygon)

 Here I got a folder that contains several shapefiles of mixed types (i.e Point, Line and Polygon shapefiles all in the same folder), now I want separate each type of shapefile into one separate folder. That is all Point shapefiles into same folder, all Line shapefiles into same folder, and all Polygon shapefiles into same folder.


Here we got twelve shapefiles that look like this:-


To complete this manually, we have to open all the twelve files and save each to its respective folder. This is not we wanted and it is time consuming, so I will use python scripting to complete this.

Let's get our hands dirty...

Monday, May 3, 2021

Geo-calibrating an SVG map for use in mapsvg.com wordpress plugin

 Scalable Vector Graphic (SVG) file isn't the best for creating geospatial referenced maps. However, it is possible to maintain map data that have geo-like coordinates that are close to the geolocation they represent. SVGs are not commonly used for geo-enable maps on the web, instead they are commonly used for drawing maps just for visual purpose on web pages.

If you really want to work with geo-enabled map on a website page where spatial reference system is of paramount importance, then you should use file format such as GeoJSON. Or store your map in a spatial database.

There are many reasons why you would want to make your SVG file have spatial capability. A common reason is when you want to overlay your SVG drawing on a geographical base map as used in MapSVG wordpress plugin. The mapsvg plugin allows you to add custom maps in svg format to wordpress web pages. If you want the map overlay the svg map on Google maps background, then you must geo-calibrate the map as we will see in a moment.

The plugin isn't free, at the time of writing it is about $49. However, there is a demo admin panel (with the following login details: demo / demo123) where you can try it out for free. We will use this to upload a custom map that is geo-enable.