Thursday, May 20, 2021

PyQGIS Shapefile attribute lookup

 This pyqgis script will read a shapefile attribute column and compare the entries with a given list.

This script is useful when you want to join a tabular data to a shapefile and you want check if the all records on the table exits on the shapefile attribute table. It ensure you are safely joining all records on your new table to the shapefile.

To keep it simple, we will create list of records from a table we want to join to the shapefile. Then read the shapefile attribute column that serves as the join into a list. We then compare/lookup the list making sure there are no difference between them. If there differences, then count them and print out the values so we know what value would have a blank record after the join operation. 

Let's get our hands dirty... 

First, let me say the primary logic for our script is similar to what I did here "Filtering Missing Zip codes out of master Zip codes list". 

available_zipcodes = [77389, 77086, 77346, 77018, 77040, 77388, 77065, 77080, 77041, 77396, 77385, 77354, 77382, 77067, 77066, 77090, 77345, 77355, 77373, 77339, 77043, 77302, 77304, 77070, 77375, 77095, 77433, 77069, 77038, 77091, 77380, 77092, 77316, 77429, 77377, 77379, 77064, 77088, 77338, 77449, 77386, 77381, 77493, 77356, 77068, 77014, 77084, 77055, 77301, 77303, 77384]

given_zipcodes = [77325, 77339, 77345, 77346, 77380, 77381, 77382, 77383, 77384, 77385, 77386, 77301, 77302, 77303, 77304, 77316, 77354, 77356, 77389, 77014, 77018, 77038, 77040, 77041, 77043, 77055, 77064, 77065, 77066, 77067, 77068, 77069, 77070, 77080, 77084, 77086, 77088, 77090, 77091, 77092, 77095, 77375, 77377, 77379, 77388, 77429, 77433, 77449, 77493, 77373, 77338, 77347, 77391, 77396, 77355]


for zipcode in given_zipcodes:
    if zipcode not in available_zipcodes:
        print(zipcode)

The first list (available_...) will come from the shapefile while the second list (given_...) will come from the join table.


Here I got a polygon shapefile with 774 LGA in Nigeria. I then got a spreadsheet file from a different source which I want to join to the polygon shapefile. Due to incomplete data, variation names etc, there may be some polygon that won have attribute from the joined spreadsheet.


The object of the script is to lookup the difference between the two data sources before we perform the join operation.

spreadsheet_name = ['UMUNNEOCHI', 'ISUKWUATO', 'OHAFIA', 'AROCHUKWU', 'BENDE', 'UMUAHIA NORTH', 'UMUAHIA SOUTH', 'IKWUANO', 'ISIALA NGWA NORTH', 'ISIALA NGWA SOUTH', 'ABA NORTH', 'ABA SOUTH', 'UGWUNAGBO', 'OBINGWA', 'UKWA EAST', 'UKWA WEST & OSISIOMA ', 'ETINAN', 'IBESIKPO ASUTAN', 'IBIONO IBOM', 'ITU', 'NSIT ATAI', 'NSIT IBOM', 'NSIT UBIUM', 'URUAN', 'UYO', 'ABAK', 'ESSIEN UDIM', 'ETIM EKPO', 'IKA', 'IKONO', 'IKOT EKPENE', 'INI', 'OBOT AKARA', 'ORUK ANAM', 'UKANAFUN', 'EASTERN OBOLO', 'EKET', 'ESIT EKET', 'IBENO', 'IKOT ABASI', 'MBO', 'MKPAT ENIN', 'OKOBO', 'ONNA', 'ORON', 'UDUNG UKO,URUE OFFONG/ORUKO', 'ADAMAWA', 'MADAGALI', 'MAIHA', 'MICHIKA', 'MUBI NORTH', 'MUBI SOUTH ', 'DEMSA', 'GANYE', 'GUYUK', 'JADA', 'MAYO-BELWA', 'NUMAN', 'SHELLENG', 'TOUNGO', 'LAMURDE', 'HONG', 'FUFORE', 'SONG', 'YOLA NORTH', 'YOLA SOUTH', 'GERFI', 'ONITSHA NORTH', 'ONITSHA SOUTH', 'OYI', 'OGBARU', 'ANAMBRA EAST', 'ANAMBRA WEST', 'AYAMELUM', 'AWKA NORTH', 'AWKA SOUTH', 'NJIKOKA', 'ANAOCHA', 'IDEMILI NORTH', 'IDEMILI SOUTH', 'DUNUKOFIA', 'IHIALA', 'NNEWI NORTH', 'NNEWI SOUTH', 'ORUMBA SOUTH', 'ORUMBA NORTH', 'AGUATA', 'EKWUSIGO', 'ALKALERI', 'KIRFI', 'BAUCHI', 'TAFAWA BALEWA', 'BOGORO', 'DASS', 'TORO', 'NINGI', 'WARJI', 'DARAZO', 'GANJUWA', 'MISAU', 'DAMBAN', 'ZAKI', 'GAMAWA', 'JAMA’ARE', 'ITAS-GADAU', 'SHIRA', 'GIADE', 'KATAGUM', 'BRASS', 'NEMBE', 'OGBIA', 'KOLOKUMA/OPOKUMA', 'SOUTHERN IJAW', 'YENAGOA', 'EKEREMOR', 'SAGBAMA', 'BENUE', 'KATSINA-ALA', 'KONSHISHA', 'KWANDE', 'LOGO', 'UKUM', 'USHONGO', 'VANDEIKYA', 'BURUKU', 'GBOKO', 'GUMA', 'GWER-EAST', 'GWER-WEST', 'MARKURDI', 'TARKA', 'ADO', 'AGATU', 'APA', 'OBI', 'OGBADIBO', 'OHIMINI', 'OJU,OKPOKWU', 'OTUKPO', 'ABADAM', 'GUBIO', 'GUZAMALA', 'KUKAWA', 'MAGUMERI', 'MARTE', 'MOBBAR', 'NGANZA', 'MOGUNO', 'DIKWA', 'JERE', 'KALA BALGE', 'MAFA', 'METROPOLITAN', 'NGALA', 'KONDUGA', 'BAMA', 'KAGA', 'ASKIRA/UBA', 'BAYO', 'BIU', 'CHIBOK', 'DAMABOA', 'GWOZA', 'HAWUL', 'KWAYA', 'SHANI', 'BEKWARRA', 'OBANLIKU', 'OBUDU', 'OGOJA', 'YALA', 'BIASE', 'ABI', 'BOKI', 'ETUNG', 'IKOM', 'OBUBRA', 'YAKURR', 'CALABAR MUNICIPAL', 'CALABAR SOUTH', 'AKAMKPA', 'AKPABUYO', 'BAKASSI', 'ODUKPANI', 'ETHIOPE EAST', 'ETHIOPE WEST', 'OKPE', 'SAPELE,UDU', 'UGHELLI NORTH', 'UGHELI SOUTH', 'UVWIE', 'ANIOCHA NORTH', 'ANIOCHA SOUTH', 'IKA NORTH-EAST', 'IKA SOUTH', 'NDOKWA EAST', 'NDOKWA WEST', 'OSHIMILI NORTH', 'OSHIMILI SOUTH', 'UKWUANI', 'BOMADI', 'BURUTU', 'ISOKO NORTH', 'ISOKO SOUTH', 'PATANI', 'WARRI NORTH', 'WARRI SOUTH', 'WARRI SOUTH WEST', 'ABAKALIKI', 'EBONYI', 'IZZI', 'OHAUKWU', 'EZZA NORTH', 'EZZA SOUTH', 'IKWO', 'ISHIELU', 'AFIKPO NORTH', 'AFIKPO SOUTH', 'IVO', 'OHAOZARA', 'ONICHA', 'ESAN CENTRAL', 'ESAN NORTH EAST', 'ESAN SOUTH EAST', 'ESAN WEST', 'IGUEBEN', 'AKOKO EDO', 'ETSAKO EAST', 'ETSAKO CENTRAL', 'ETSAKO WEST', 'OWAN EAST', 'OWAN WEST', 'OREDO', 'ORHIONMWON', 'OVIA NORTH EAST', 'OVIA SOUTH WEST', 'EGOR', 'UHUNMWODE', 'IKPOBA OKHA', 'IKOLE', 'OYE', 'IDO/OSI', 'MOBA', 'ILEJEMEJE', 'ADO EKITI', 'IREPODUN/IFELODUN', 'IJERO', 'EFON', 'EKITI WEST', 'EKITI SOUTH WEST', 'IKERE', 'EMURE', 'ISE/ORUN', 'GBONYIN', 'EKITI EAST', 'ENUGU EAST', 'ENUGU NORTH', 'ENUGU SOUTH', 'ISI-UZO', 'NKANU EAST', 'NKANU WEST', 'ANINIRI', 'AWGU', 'EZEAGU', 'OJI RIVER', 'UDI ', 'IGBO-ETITI', 'IGBO-EZE NORTH', 'IGO-EZE SOUTH', 'UZO UWANI', 'UDENU', 'NSUKKA', 'AKKO', 'YAMALTU/DEBA', 'BALANGA', 'BILLIRI', 'KALTUNGO', 'SHONGOM', 'FUNAKAYE', 'NAFADA', 'GOMBE', 'KWAMI,DUKKU', 'ABOH MBAISE', 'AHIAZU MBAISE', 'EZINIHITTE', 'IKEDURU', 'MBAITOLI', 'NGOR OKPALA', 'OWERRI MUNICIPAL', 'OWERRI NORTH', 'OWERRI WEST', 'IDEATO NORTH', 'IDEATO SOUTH', 'ISU', 'NJABA', 'NKWERE', 'NWANGELE', 'OGUTA', 'OHAJI/EGBEMA', 'ORLU', 'ORSU', 'ORU WEST', 'ORU EAST', 'EHIME MBANO', 'IHITE/UBOMA', 'ISIALA MBANO', 'OBOWO', 'OKIGWE', 'ONUIMO', 'BIRNIN KUDU', 'BUJI', 'DUTSE', 'GWARAM', 'KIYAWA', 'JAUN', 'MIGA', 'AUYO', 'BIRNIWA', 'GURI', 'HADEJIA', 'KAUGAMA', 'K/HAUSA', 'K/KASAMMA', 'M/MADORI', 'BABURA', 'GAGARA', 'GARKI', 'GWIWA', 'KAZAURE', 'MAIGATAR', 'RONI', 'RINGIN', 'S/TANKAR', 'TAURA', 'YANKWASHI', 'GUME', 'KUBAU', 'IKARA', 'MAKARFI', 'SOBA', 'SABON GARI', 'ZARIA', 'LERE', 'KUDAN', 'BIRNIN GWARI', 'GIWA', 'IGABI', 'KADUNA NORTH', 'KADUNA SOUTH', 'CHIKUN', 'KAJURU', 'JEMA’A', 'JABA', 'KAURA', 'ZANGON', 'KAURA', 'KACHIA', 'KAGARKO', 'SANGA', 'DALA', 'GWALE', 'DAWAKIN', 'GEZAWA', 'TARAUNI', 'FAGGE', 'GARUM MALLAM', 'KANO MUNICIPAL', 'KUMBOTSO', 'KURA', 'MADOBI', 'MINJIBIR', 'NASSARAWA', 'UNGOGO', 'WARAWA', 'BICHI', 'SHANONO', 'BAGWAI', 'DANBATTA', 'MAKODA', 'DAWAKIN', 'GABASAWA', 'GWARZO', 'KABO', 'RIMI GADO', 'TOFA', 'TSANYAWA', 'KUNCHI', 'KARAYE', 'ALBASU', 'BEBEJI', 'BUNKURE', 'DOGUWA', 'GAYA', 'KIRU', 'RANO', 'TAKAI', 'AJINGI', 'ROGO', 'KIBIYA', 'TUDUN WADA', 'GARKO', 'WUDIL', 'SUMAILA', 'DAURA', 'ZANGO', 'MAIADUA', 'MASHI', 'MANI', 'INGAWA', 'BINDAWA', 'SANDAMU', 'DUTSI', 'KANKIA', 'KUSADA', 'BAURE', 'FUNTUA', 'FASKARI', 'DANJA', 'DANDUME', 'BAKORI', 'KANKARA', 'MALUMEFASHI', 'KAFUR', 'MUSAWA', 'MATAZU', 'SABUWA', 'KATSINA', 'KAITA', 'JIBIA', 'BATSARI', 'SAFANA', 'DUTSIN-MA', 'KURFI', 'BATAGARAWA', 'RIMI', 'CHARANCHI', 'DANMUSA', 'AREWA', 'ARGUNGU', 'AUGIE', 'BAGUDO', 'DANDI', 'SURU', 'JEGA', 'ALEIRO', 'BIRNIN KEBBI', 'BUNZA', 'GWANDU', 'KALGO', 'KOKO BESSE', 'MAIYAMA', 'FAKAI', 'NGASKI', 'SAKABA', 'SHANGA', 'WASAGU/DANKO', 'YAURI', 'ZURU', 'ADAVI', 'AJAOKUTA', 'OGORI/MAGONGO', 'OKEHI', 'OKENE', 'ANKPA', 'BASSA', 'DEKINA', 'IBAJI', 'IDAH', 'IGALAMELA-ODOLU', 'OFU', 'OLAMABORO', 'OMALA', 'IJUMU', 'KABBA', 'KOGI', 'LOKOJA', 'MOPAMURO', 'YAGBA EAST', 'YAGBA WEST', 'BARUTEN', 'EDU', 'PATIGI', 'KAIAMA', 'MORO', 'ASA', 'ILORIN EAST', 'ILORIN SOUTH', 'ILORIN WEST', 'OFFA', 'EKITI', 'OKE-ERO', 'IFELODUN', 'IREPODUN', 'ISIN', 'OYUN', 'LAGOS ISLAND', 'LAGOS MAINLAND', 'SURULERE', 'APAPA', 'ETI-OSA', 'SHOMOLU', 'KOSOFE', 'EPE', 'IBEJU-LEKKI', 'IKORODU', 'AGEGE,IFAKO-IJAYE', 'ALIMOSHO', 'BADAGRY', 'OJO', 'AJEROMI/IFELODUN', 'AMUWO-ODOFIN', 'OSHODI/ISOLO', 'IKEJA', 'MUSHIN', 'NASSARAWA', 'AKAWANGA', 'NASSARAWA EGGON', 'WAMBA', 'NASSARAWA', 'KEFFI', 'KOKONA', 'KARU', 'TOTO', 'LAFIA', 'AWE', 'DOMA', 'KEANA', 'OBI', 'BOSSO', 'CHACHANGA', 'GURARA', 'PAIKORO', 'RAFI', 'SHIRORO', 'MUYA', 'SULEJA', 'TAFA', 'AGWARA', 'BORGU', 'KONTOGORA', 'MARIGA', 'RIJAU', 'WUSHISHI', 'MASHEGU MAGAMA', 'AGAIE', 'BIDA', 'KATCHA', 'BATAGI', 'LAPAI', 'LAVUN', 'EDATI-IDATI', 'MOKWA', 'IFO', 'EWEKORO', 'OBAFEMI/OWODE', 'ABEOKUTA NORTH', 'ABEOKUTA SOUTH', 'ODEDA', 'SAGAMU', 'IKENNE', 'REMO NORTH', 'IJEBU-ODE', 'ODOGBOLU', 'IJEBU NORTH EAST', 'IJEBU NORTH', 'IJEBU EAST', 'OGUN WATERSIDE', 'IMEKO AFON', 'EGBADO NORTH', 'EGBADO SOUTH', 'IPOKIA', 'ADO-ODO/OTA', 'AKOKO NORTH EAST', 'AKOKO NORTH WEST', 'AKOKO SOUTH WEST,AKOKO SOUTH EAST', 'OSE', 'OWO', 'AKURE NORTH', 'AKURE SOUTH', 'IFEDORE', 'IDANRE', 'ONDO EAST', 'ONDO WEST', 'ILEOLUJI/OKEIGBO', 'ODIGBO', 'IRELE', 'OKITIPUPA', 'ESE-ODO', 'ILAJE', 'BORIPE', 'BOLOWADURO', 'IFELODUN', 'ILA', 'IFEDAYO', 'IREPODUN', 'OROLU', 'ODO-OTIN', 'OLORUNDA', 'OSOGBO', 'ATAKUNMOSA EAST', 'ATAKUNMOSA WEST', 'IFE CENTRAL', 'IFE EAST', 'IFE NORTH', 'IFE SOUTH', 'ILESA EAST', 'ILESA WEST', 'OBOKUN', 'ORIADE', 'AYADAADE', 'AYEDIRE', 'EDE NORTH', 'EDE SOUTH', 'EGBEDORE', 'EJIGBO', 'IREWOLE', 'ISOKAN', 'IWO', 'OLA-OLUWA', 'AFIJIO', 'AKINYELE', 'EGBEDA', 'OGO OLUWA', 'SURULERE', 'LAGELU', 'OLUYOLE', 'ONA-ARA', 'OYO EAST', 'OYO WEST', 'ATIBA', 'SAKI WEST', 'SAKI EAST', 'ATIGBO', 'IREPO', 'OLORUNSOGO', 'KAJOLA', 'IWAJOWA', 'OGBOMOSHO NORTH', 'OGBOMOSHO SOUTH', 'ISEYIN', 'OORELOPE', 'ORIRE', 'ITESIWAJU', 'IBADAN NORTH', 'IBADAN NORTH EAST', 'IBADAN NORTH WEST', 'IBADAN SOUTH EAST', 'IBADAN SOUTH WEST', 'IBARAPA CENTRAL', 'IBARAPA NORTH', 'IBARAPA EAST', 'IDO', 'LANTANG NORTH', 'LANTANG SOUTH', 'MIKANG', 'QUA’AN PAN', 'SHENDAM', 'WASE', 'BOKKOS', 'KANKE', 'MANGU', 'PANKSHIN', 'KANAM', 'BARKIN LADI', 'BASSA', 'JOS EAST', 'JOS NORTH', 'JOS SOUTH', 'RIYOM', 'ETCHE', 'OMUMA', 'IKWERRE', 'OBIO AKPOR', 'PORT HARCOURT', 'OKIRIKA', 'OGU BOLO', 'EMOHUA', 'OPOBO/NKORO', 'ANDONI', 'OYIGBO', 'TAI', 'ELEME', 'GOKANA', 'KHANA', 'ASARI-TORU', 'AKUKU-TORU', 'DEGEMA', 'OGBA/EGBEMA/NDONI', 'ABUA-ODUAL', 'AHOADA EAST', 'AHOADA WEST', 'BONNY', 'ISA', 'SABON BIRNI', 'WURNO', 'GORONYO', 'RABAH', 'GADA', 'ILLELA', 'GWADABAWA', 'TANGAZA', 'BINJI', 'SILAME', 'GUDU', 'KWARE', 'WAMAKKO,SOKOTO NORTH', 'SOKOTO SOUTH', 'DANGE SHUNI', 'TURETA', 'BODINGA', 'SHAGARI', 'YABO', 'TAMBUWA', 'KEBBE', 'WUKARI', 'IBI', 'DONGA', 'USSA', 'TAKUM', 'SARDAUNA', 'KURMI', 'BALI', 'GASHAKA', 'GASSOL', 'JALINGO', 'YORRO', 'ZING', 'LAU', 'ARDO-KOLA', 'KARIM-LAMIDO', 'BURSARI', 'GEIDAM,GUJBA', 'GULANI', 'TARMUWA', 'YUNUSARI', 'BADE', 'JAKUSKO', 'MACHINA', 'KARASUWA', 'NGURU', 'YUSUFARI', 'FIKA', 'FUNE', 'NANGERE', 'POTISKUM', 'DAMATURU', 'K/NAMODA', 'SHINKAFI', 'ZURMI', 'BIRNIN MAGAJI', 'T/MAFARA', 'GUSAU', 'TSAFE', 'BUNGUDU', 'MARU', 'MARADUN', 'BAKURA', 'ANKA', 'BUKKUYUM', 'GUMMI', 'ABAJI AREA COUNCIL', 'GWAGWA AREA COUNCIL', 'KUJE AREA COUNCIL', 'MUNICIPAL AREA COUNCIL', 'KWALI AREA COUNCIL', 'BWARI AREA COUNCIL']

A quick feature/record count on both the polygon shapefile and the spreadsheet data shows variable number of records.



The polygon shapefile has 774 records while the spreadsheet data has number of 766 records. This means that about 8 features won't be joined to the shapefile.

The problem doesn't stop there, because some of the names may vary between the spreadsheet data and the shapefile attribute join column. This is the real thing we most check and add it to the 8 missing records above to get the total numbers of record that won't succeed after the join operation.


Using the script below, lets read the shapefile attribute column into a list variable named attr_list and use it to lookup the spreadsheet source list (spreadsheet_name).


# Get column attribute into list
attr_list = []
for feature in layer.getFeatures():
    # print(feature['LGA_CAPS'])
    attr_list.append(feature['LGA_CAPS'])


for lga in attr_list:
    if lga not in spreadsheet_name:
        print(lga)

for lga in spreadsheet_name:
    if lga not in attr_list:
        print(lga)

This revealed over 100+ record that are not common between the two list due to different factors. Hence, this now gives us a clear number of expected records that won't be joined.

The most common reason mismatch are typos or misspelled (however, there are few that were written slightly different due to locale). For example, the word "police" could be written/misspelled as "polis, poliza, polizia, policĂ­a, polizei, polisie, polici, polizĂ®, policija etc" in different locals.

When you are expecting this kind of scenario in attribute data sources, then it becomes important to lookup the attribute as we have discussed above before performing the join operation.


Thank you for following.

No comments:

Post a Comment