Wednesday, September 15, 2021

Working with Regular Expression in QGIS

Regular Expression (RegEx or RegExp) is a tool for handling strings/text: data validation, searching, search & replace, string splitting, etc. RegEx has become a standard feature in a wide range of languages and popular tools, including GIS tools, text editors, word processors, system tools, database engines, etc.

In this article, we will specifically look at RegExp in the QGIS tool.

RegEx is available in many parts of QGIS; here we will only look at using it to manipulate values in the attribute table.

From the 'Select Feature using an Expression' dialog window, you should find the 'String Functions' group. This group contains functions that operate on strings (e.g., replace, convert to upper case).



As you can see, there are three main functions with direct support for regular expressions, namely regexp_match(), regexp_replace() and regexp_substr().

regexp_match
Returns the position of the first match of a regular expression within a string, or 0 if no match is found.

regexp_replace
Returns a string in which the text matching the supplied regular expression has been replaced.

regexp_substr
Returns the portion of a string which matches a supplied regular expression.
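
To get a feel for what the three functions do, here is a rough Python sketch of the same ideas using the standard `re` module. This is not QGIS code: the helper names (match_position, replace_all, first_substring) are made up for illustration, and the behaviour when nothing matches is my own choice for the sketch.

```python
import re

def match_position(text, pattern):
    """1-based position of the first match, or 0 if there is none."""
    m = re.search(pattern, text)
    return m.start() + 1 if m else 0

def replace_all(text, pattern, replacement):
    """Replace every part of the text that matches the pattern."""
    return re.sub(pattern, replacement, text)

def first_substring(text, pattern):
    """The first part of the text that matches the pattern, or None (a choice for this sketch)."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

print(match_position("QGIS 3.16", r"[0-9]+"))    # position of the first digit
print(replace_all("Plot 12", r"[0-9]+", "#"))    # digits replaced by '#'
print(first_substring("abc123def", r"[0-9]+"))   # the digits only
```

In QGIS the equivalent calls are typed straight into the expression dialog against a field of the attribute table.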

Friday, September 10, 2021

R packages for working with shapefile

A shapefile (points, lines, or polygons) can be read into an R object using any of the following packages: sf, rgdal, maptools and PBSmapping.

First, install them as follows: install.packages(c('sf', 'rgdal', 'maptools', 'PBSmapping'))


The code below shows how to use each package to read a shapefile into an object for further processing.

library(sf)
library(rgdal)
library(maptools)
library(PBSmapping)


# read in shapefiles using 'sf'
my_map <- st_read("C:/Users/Yusuf_08039508010/Desktop/Working_Files/2021/08-August/R Poor and Vulnerable/SHP/NIG_ADM.shp")



# read in shapefiles using 'rgdal'
my_map <- readOGR("C:/Users/Yusuf_08039508010/Desktop/Working_Files/2021/08-August/R Poor and Vulnerable/SHP", "NIG_ADM")



# read in shapefiles using 'maptools'
# my_map1 <- readShapePoints("...")
# my_map2 <- readShapeLines("...")
my_map3 <- readShapePoly("C:/Users/Yusuf_08039508010/Desktop/Working_Files/2021/08-August/R Poor and Vulnerable/SHP/NIG_ADM")



# read in shapefiles using 'PBSmapping'
my_map <- importShapefile("C:/Users/Yusuf_08039508010/Desktop/Working_Files/2021/08-August/R Poor and Vulnerable/SHP/NIG_ADM")

Note that the maptools reader is deprecated, and you will get a warning message that reads: readShapePoly is deprecated; use rgdal::readOGR or sf::st_read

For more, read the web archive on Read and write ESRI Shapefiles with R.


That is it!

Wednesday, September 8, 2021

Generate Emails given owner and domain names

This Python script generates four email variants from a company officer's name and the company's domain name.



If the officer's name is "John Smith" and his company's domain name is "example.com", then the script should return four emails in the following formats:

Email 1: john.smith@example.com

Email 2: jsmith@example.com

Email 3: info@example.com

Email 4: sales@example.com

I think the requirement is fairly clear and requires no further explanations.


import pandas as pd

df = pd.read_excel(r"C:\Users\Yusuf_08039508010\App_Data.xlsx")

df

Create functions to do the work.

# Defining the email functions...

# Email 1 - john.smith@example.com
def email_1(name, website):
    return name.replace(' ', '.').lower() +"@"+ website.lower()


# Email 2 - jsmith@example.com
def email_2(name, website):
    # First initial + second word of the name (expects at least two space-separated words)
    parts = name.split(' ')
    return parts[0][0].lower() + parts[1].lower() + "@" + website.lower()


# Email 3 - info@example.com
def email_3(website):
    return 'info' +"@"+ website.lower()


# Email 4 - sales@example.com
def email_4(website):
    return 'sales' +"@"+ website.lower()

Apply the functions to the columns.
# df['col_3'] = df.apply(lambda x: f(x.col_1, x.col_2), axis=1)
# df['col_3'] = df.apply(lambda x: f(x['col_1'], x['col_2']), axis=1)

df['Email 1'] = df.apply( lambda x : email_1(x['Company Staff Name'], x['Company Domain Name']), axis=1 )
df['Email 2'] = df.apply( lambda x : email_2(x['Company Staff Name'], x['Company Domain Name']), axis=1 )
df['Email 3'] = df.apply( lambda x : email_3(x['Company Domain Name']), axis=1 )
df['Email 4'] = df.apply( lambda x : email_4(x['Company Domain Name']), axis=1 )

df
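
If you don't have the Excel file at hand, the four formats can be checked with a small stand-alone sketch. The make_emails helper and the sample row below are made up for illustration; it assumes names of the form "First Last" and joins the words with '.' rather than replacing spaces, so it is a variation on the functions above, not a copy of them.

```python
def make_emails(name, domain):
    """Return the four email variants for one officer name and domain."""
    parts = name.lower().split()   # e.g. ["john", "smith"]
    domain = domain.lower()
    return [
        ".".join(parts) + "@" + domain,          # first.last@domain
        parts[0][0] + parts[1] + "@" + domain,   # first initial + second word
        "info@" + domain,
        "sales@" + domain,
    ]

# Hypothetical sample row using the same column names as the spreadsheet
rows = [{"Company Staff Name": "John Smith", "Company Domain Name": "Example.com"}]

for row in rows:
    for email in make_emails(row["Company Staff Name"], row["Company Domain Name"]):
        print(email)
```

Keeping all four formats in one function means each row is handled in a single call instead of four.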



That is it!

Friday, September 3, 2021

RegEx - I used recently

1) RegEx to match a string starting with 2 or 3 digits followed by the character 'd'.

12d

123d

^[0-9]{2,3}(d)

[] - Character class (range)

{} - Repetition count

() - Group



2) Starts with 1, 2, 3 or 4 digits followed by a space and 'pts'

12 pts

234 pts

^[0-9]{1,4}( pts)



3) Starts with 2 lowercase letters followed by a period (.)

dr.

sd.

^[a-z]{2}[.]



4) Starts with any of these strings

^(support|info|Hello|contact|inquiries|sales|orders|Customerservice)



5) Ends with any of these strings

(bad-elf.com|balancedbites.com|bellamihair.com|bellybandit.com|betseyjohnson.com|betterwayhealth.com|blenderseyewear.com|capbeauty.com|child1st.com|chilitechnology.com|concealmentexpress.com|ctl.net|kognito.com|koparibeauty.com|nrsworld.com|nzxt.com|olloclip.com)$
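
A quick way to try patterns like these is Python's `re` module. The sketch below is only an illustration: the dictionary keys and the matches helper are made-up names, the patterns are written without a `+` after the `{}` count so they also run on older Python versions, the domain list is shortened to three entries, and the dots in the domains are escaped so that '.' only matches a literal period.

```python
import re

# Illustrative names; the patterns follow items 1) to 5) above
PATTERNS = {
    "digits_then_d": r"^[0-9]{2,3}(d)",
    "digits_then_pts": r"^[0-9]{1,4}( pts)",
    "two_letters_dot": r"^[a-z]{2}[.]",
    "starts_with": r"^(support|info|Hello|contact|inquiries|sales|orders|Customerservice)",
    "ends_with": r"(bad-elf\.com|ctl\.net|nzxt\.com)$",  # shortened list
}

def matches(name, text):
    """True if the named pattern is found in the text."""
    return re.search(PATTERNS[name], text) is not None

for name, text in [
    ("digits_then_d", "123d"),
    ("digits_then_pts", "12 pts"),
    ("two_letters_dot", "dr."),
    ("starts_with", "info@example.com"),
    ("ends_with", "sales@nzxt.com"),
]:
    print(name, text, matches(name, text))
```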



6) Extract the country name and phone code from a 'select' HTML tag

Examples are: Afghanistan +93 and AF +93. The 'select' HTML tag is shown below:

<span class="a-dropdown-container"><select name="countryCode" autocomplete="off" data-a-touch-header="Country/Region Code" id="auth-country-picker" tabindex="-1" class="a-native-dropdown">
<option data-calling-code="93" data-country-code="AF" value="AF" data-a-html-content="Afghanistan +93">
AF +93
</option>
<option data-calling-code="355" data-country-code="AL" value="AL" data-a-html-content="Albania +355">
AL +355
</option>
<option data-calling-code="213" data-country-code="DZ" value="DZ" data-a-html-content="Algeria +213">
DZ +213
</option>
<option data-calling-code="1" data-country-code="AS" value="AS" data-a-html-content="American Samoa +1">
AS +1
</option>
<option data-calling-code="376" data-country-code="AD" value="AD" data-a-html-content="Andorra +376">
AD +376
</option>
<option data-calling-code="244" data-country-code="AO" value="AO" data-a-html-content="Angola +244">
AO +244
</option>
<option data-calling-code="1" data-country-code="AG" value="AG" data-a-html-content="Antigua &amp; Barbuda +1">
AG +1
</option>
<option data-calling-code="54" data-country-code="AR" value="AR" data-a-html-content="Argentina +54">
AR +54
</option>
<option data-calling-code="374" data-country-code="AM" value="AM" data-a-html-content="Armenia +374">
AM +374
</option>
<option data-calling-code="297" data-country-code="AW" value="AW" data-a-html-content="Aruba +297">
AW +297
</option>
<option data-calling-code="61" data-country-code="AU" value="AU" data-a-html-content="Australia +61">
AU +61
</option>
<option data-calling-code="43" data-country-code="AT" value="AT" data-a-html-content="Austria +43">
AT +43
</option>
<option data-calling-code="994" data-country-code="AZ" value="AZ" data-a-html-content="Azerbaijan +994">
AZ +994
</option>
<option data-calling-code="1" data-country-code="BS" value="BS" data-a-html-content="Bahamas +1">
BS +1
</option>
<option data-calling-code="973" data-country-code="BH" value="BH" data-a-html-content="Bahrain +973">
BH +973
</option>
<option data-calling-code="880" data-country-code="BD" value="BD" data-a-html-content="Bangladesh +880">
BD +880
</option>
<option data-calling-code="1" data-country-code="BB" value="BB" data-a-html-content="Barbados +1">
BB +1
</option>
<option data-calling-code="375" data-country-code="BY" value="BY" data-a-html-content="Belarus +375">
BY +375
</option>
<option data-calling-code="32" data-country-code="BE" value="BE" data-a-html-content="Belgium +32">
BE +32
</option>
<option data-calling-code="501" data-country-code="BZ" value="BZ" data-a-html-content="Belize +501">
BZ +501
</option>
<option data-calling-code="229" data-country-code="BJ" value="BJ" data-a-html-content="Benin +229">
BJ +229
</option>
<option data-calling-code="1" data-country-code="BM" value="BM" data-a-html-content="Bermuda +1">
BM +1
</option>
<option data-calling-code="975" data-country-code="BT" value="BT" data-a-html-content="Bhutan +975">
BT +975
</option>
<option data-calling-code="591" data-country-code="BO" value="BO" data-a-html-content="Bolivia +591">
BO +591
</option>
<option data-calling-code="387" data-country-code="BA" value="BA" data-a-html-content="Bosnia &amp; Herzegovina +387">
BA +387
</option>
<option data-calling-code="267" data-country-code="BW" value="BW" data-a-html-content="Botswana +267">
BW +267
</option>
<option data-calling-code="55" data-country-code="BR" value="BR" data-a-html-content="Brazil +55">
BR +55
</option>
<option data-calling-code="1" data-country-code="VG" value="VG" data-a-html-content="British Virgin Islands +1">
VG +1
</option>
<option data-calling-code="673" data-country-code="BN" value="BN" data-a-html-content="Brunei +673">
BN +673
</option>
<option data-calling-code="359" data-country-code="BG" value="BG" data-a-html-content="Bulgaria +359">
BG +359
</option>
<option data-calling-code="226" data-country-code="BF" value="BF" data-a-html-content="Burkina Faso +226">
BF +226
</option>
<option data-calling-code="257" data-country-code="BI" value="BI" data-a-html-content="Burundi +257">
BI +257
</option>
<option data-calling-code="855" data-country-code="KH" value="KH" data-a-html-content="Cambodia +855">
KH +855
</option>
<option data-calling-code="237" data-country-code="CM" value="CM" data-a-html-content="Cameroon +237">
CM +237
</option>
<option data-calling-code="1" data-country-code="CA" value="CA" data-a-html-content="Canada +1">
CA +1
</option>
<option data-calling-code="238" data-country-code="CV" value="CV" data-a-html-content="Cape Verde +238">
CV +238
</option>
<option data-calling-code="1" data-country-code="KY" value="KY" data-a-html-content="Cayman Islands +1">
KY +1
</option>
<option data-calling-code="236" data-country-code="CF" value="CF" data-a-html-content="Central African Republic +236">
CF +236
</option>
<option data-calling-code="235" data-country-code="TD" value="TD" data-a-html-content="Chad +235">
TD +235
</option>
<option data-calling-code="56" data-country-code="CL" value="CL" data-a-html-content="Chile +56">
CL +56
</option>
<option data-calling-code="86" data-country-code="CN" value="CN" data-a-html-content="China +86">
CN +86
</option>
<option data-calling-code="57" data-country-code="CO" value="CO" data-a-html-content="Colombia +57">
CO +57
</option>
<option data-calling-code="269" data-country-code="KM" value="KM" data-a-html-content="Comoros +269">
KM +269
</option>
<option data-calling-code="242" data-country-code="CG" value="CG" data-a-html-content="Congo - Brazzaville +242">
CG +242
</option>
<option data-calling-code="243" data-country-code="CD" value="CD" data-a-html-content="Congo - Kinshasa +243">
CD +243
</option>
<option data-calling-code="682" data-country-code="CK" value="CK" data-a-html-content="Cook Islands +682">
CK +682
</option>
<option data-calling-code="506" data-country-code="CR" value="CR" data-a-html-content="Costa Rica +506">
CR +506
</option>
<option data-calling-code="385" data-country-code="HR" value="HR" data-a-html-content="Croatia +385">
HR +385
</option>
<option data-calling-code="53" data-country-code="CU" value="CU" data-a-html-content="Cuba +53">
CU +53
</option>
<option data-calling-code="357" data-country-code="CY" value="CY" data-a-html-content="Cyprus +357">
CY +357
</option>
<option data-calling-code="420" data-country-code="CZ" value="CZ" data-a-html-content="Czech Republic +420">
CZ +420
</option>
<option data-calling-code="225" data-country-code="CI" value="CI" data-a-html-content="C&ocirc;te d&rsquo;Ivoire +225">
CI +225
</option>
<option data-calling-code="45" data-country-code="DK" value="DK" data-a-html-content="Denmark +45">
DK +45
</option>
<option data-calling-code="253" data-country-code="DJ" value="DJ" data-a-html-content="Djibouti +253">
DJ +253
</option>
<option data-calling-code="1" data-country-code="DM" value="DM" data-a-html-content="Dominica +1">
DM +1
</option>
<option data-calling-code="1" data-country-code="DO" value="DO" data-a-html-content="Dominican Republic +1">
DO +1
</option>
<option data-calling-code="593" data-country-code="EC" value="EC" data-a-html-content="Ecuador +593">
EC +593
</option>
<option data-calling-code="20" data-country-code="EG" value="EG" data-a-html-content="Egypt +20">
EG +20
</option>
<option data-calling-code="503" data-country-code="SV" value="SV" data-a-html-content="El Salvador +503">
SV +503
</option>
<option data-calling-code="240" data-country-code="GQ" value="GQ" data-a-html-content="Equatorial Guinea +240">
GQ +240
</option>
<option data-calling-code="291" data-country-code="ER" value="ER" data-a-html-content="Eritrea +291">
ER +291
</option>
<option data-calling-code="372" data-country-code="EE" value="EE" data-a-html-content="Estonia +372">
EE +372
</option>
<option data-calling-code="251" data-country-code="ET" value="ET" data-a-html-content="Ethiopia +251">
ET +251
</option>
<option data-calling-code="500" data-country-code="FK" value="FK" data-a-html-content="Falkland Islands +500">
FK +500
</option>
<option data-calling-code="298" data-country-code="FO" value="FO" data-a-html-content="Faroe Islands +298">
FO +298
</option>
<option data-calling-code="679" data-country-code="FJ" value="FJ" data-a-html-content="Fiji +679">
FJ +679
</option>
<option data-calling-code="358" data-country-code="FI" value="FI" data-a-html-content="Finland +358">
FI +358
</option>
<option data-calling-code="33" data-country-code="FR" value="FR" data-a-html-content="France +33">
FR +33
</option>
<option data-calling-code="594" data-country-code="GF" value="GF" data-a-html-content="French Guiana +594">
GF +594
</option>
<option data-calling-code="689" data-country-code="PF" value="PF" data-a-html-content="French Polynesia +689">
PF +689
</option>
<option data-calling-code="241" data-country-code="GA" value="GA" data-a-html-content="Gabon +241">
GA +241
</option>
<option data-calling-code="220" data-country-code="GM" value="GM" data-a-html-content="Gambia +220">
GM +220
</option>
<option data-calling-code="995" data-country-code="GE" value="GE" data-a-html-content="Georgia +995">
GE +995
</option>
<option data-calling-code="49" data-country-code="DE" value="DE" data-a-html-content="Germany +49">
DE +49
</option>
<option data-calling-code="233" data-country-code="GH" value="GH" data-a-html-content="Ghana +233">
GH +233
</option>
<option data-calling-code="350" data-country-code="GI" value="GI" data-a-html-content="Gibraltar +350">
GI +350
</option>
<option data-calling-code="30" data-country-code="GR" value="GR" data-a-html-content="Greece +30">
GR +30
</option>
<option data-calling-code="299" data-country-code="GL" value="GL" data-a-html-content="Greenland +299">
GL +299
</option>
<option data-calling-code="1" data-country-code="GD" value="GD" data-a-html-content="Grenada +1">
GD +1
</option>
<option data-calling-code="590" data-country-code="GP" value="GP" data-a-html-content="Guadeloupe +590">
GP +590
</option>
<option data-calling-code="1" data-country-code="GU" value="GU" data-a-html-content="Guam +1">
GU +1
</option>
<option data-calling-code="502" data-country-code="GT" value="GT" data-a-html-content="Guatemala +502">
GT +502
</option>
<option data-calling-code="224" data-country-code="GN" value="GN" data-a-html-content="Guinea +224">
GN +224
</option>
<option data-calling-code="245" data-country-code="GW" value="GW" data-a-html-content="Guinea-Bissau +245">
GW +245
</option>
<option data-calling-code="592" data-country-code="GY" value="GY" data-a-html-content="Guyana +592">
GY +592
</option>
<option data-calling-code="509" data-country-code="HT" value="HT" data-a-html-content="Haiti +509">
HT +509
</option>
<option data-calling-code="504" data-country-code="HN" value="HN" data-a-html-content="Honduras +504">
HN +504
</option>
<option data-calling-code="852" data-country-code="HK" value="HK" data-a-html-content="Hong Kong +852">
HK +852
</option>
<option data-calling-code="36" data-country-code="HU" value="HU" data-a-html-content="Hungary +36">
HU +36
</option>
<option data-calling-code="354" data-country-code="IS" value="IS" data-a-html-content="Iceland +354">
IS +354
</option>
<option data-calling-code="91" data-country-code="IN" value="IN" data-a-html-content="India +91">
IN +91
</option>
<option data-calling-code="62" data-country-code="ID" value="ID" data-a-html-content="Indonesia +62">
ID +62
</option>
<option data-calling-code="98" data-country-code="IR" value="IR" data-a-html-content="Iran +98">
IR +98
</option>
<option data-calling-code="964" data-country-code="IQ" value="IQ" data-a-html-content="Iraq +964">
IQ +964
</option>
<option data-calling-code="353" data-country-code="IE" value="IE" data-a-html-content="Ireland +353">
IE +353
</option>
<option data-calling-code="972" data-country-code="IL" value="IL" data-a-html-content="Israel +972">
IL +972
</option>
<option data-calling-code="39" data-country-code="IT" value="IT" data-a-html-content="Italy +39">
IT +39
</option>
<option data-calling-code="1" data-country-code="JM" value="JM" data-a-html-content="Jamaica +1">
JM +1
</option>
<option data-calling-code="81" data-country-code="JP" value="JP" data-a-html-content="Japan +81">
JP +81
</option>
<option data-calling-code="962" data-country-code="JO" value="JO" data-a-html-content="Jordan +962">
JO +962
</option>
<option data-calling-code="7" data-country-code="KZ" value="KZ" data-a-html-content="Kazakhstan +7">
KZ +7
</option>
<option data-calling-code="254" data-country-code="KE" value="KE" data-a-html-content="Kenya +254">
KE +254
</option>
<option data-calling-code="686" data-country-code="KI" value="KI" data-a-html-content="Kiribati +686">
KI +686
</option>
<option data-calling-code="965" data-country-code="KW" value="KW" data-a-html-content="Kuwait +965">
KW +965
</option>
<option data-calling-code="996" data-country-code="KG" value="KG" data-a-html-content="Kyrgyzstan +996">
KG +996
</option>
<option data-calling-code="856" data-country-code="LA" value="LA" data-a-html-content="Laos +856">
LA +856
</option>
<option data-calling-code="371" data-country-code="LV" value="LV" data-a-html-content="Latvia +371">
LV +371
</option>
<option data-calling-code="961" data-country-code="LB" value="LB" data-a-html-content="Lebanon +961">
LB +961
</option>
<option data-calling-code="266" data-country-code="LS" value="LS" data-a-html-content="Lesotho +266">
LS +266
</option>
<option data-calling-code="231" data-country-code="LR" value="LR" data-a-html-content="Liberia +231">
LR +231
</option>
<option data-calling-code="218" data-country-code="LY" value="LY" data-a-html-content="Libya +218">
LY +218
</option>
<option data-calling-code="423" data-country-code="LI" value="LI" data-a-html-content="Liechtenstein +423">
LI +423
</option>
<option data-calling-code="370" data-country-code="LT" value="LT" data-a-html-content="Lithuania +370">
LT +370
</option>
<option data-calling-code="352" data-country-code="LU" value="LU" data-a-html-content="Luxembourg +352">
LU +352
</option>
<option data-calling-code="853" data-country-code="MO" value="MO" data-a-html-content="Macau +853">
MO +853
</option>
<option data-calling-code="389" data-country-code="MK" value="MK" data-a-html-content="Macedonia +389">
MK +389
</option>
<option data-calling-code="261" data-country-code="MG" value="MG" data-a-html-content="Madagascar +261">
MG +261
</option>
<option data-calling-code="265" data-country-code="MW" value="MW" data-a-html-content="Malawi +265">
MW +265
</option>
<option data-calling-code="60" data-country-code="MY" value="MY" data-a-html-content="Malaysia +60">
MY +60
</option>
<option data-calling-code="960" data-country-code="MV" value="MV" data-a-html-content="Maldives +960">
MV +960
</option>
<option data-calling-code="223" data-country-code="ML" value="ML" data-a-html-content="Mali +223">
ML +223
</option>
<option data-calling-code="356" data-country-code="MT" value="MT" data-a-html-content="Malta +356">
MT +356
</option>
<option data-calling-code="692" data-country-code="MH" value="MH" data-a-html-content="Marshall Islands +692">
MH +692
</option>
<option data-calling-code="596" data-country-code="MQ" value="MQ" data-a-html-content="Martinique +596">
MQ +596
</option>
<option data-calling-code="222" data-country-code="MR" value="MR" data-a-html-content="Mauritania +222">
MR +222
</option>
<option data-calling-code="230" data-country-code="MU" value="MU" data-a-html-content="Mauritius +230">
MU +230
</option>
<option data-calling-code="52" data-country-code="MX" value="MX" data-a-html-content="Mexico +52">
MX +52
</option>
<option data-calling-code="691" data-country-code="FM" value="FM" data-a-html-content="Micronesia +691">
FM +691
</option>
<option data-calling-code="373" data-country-code="MD" value="MD" data-a-html-content="Moldova +373">
MD +373
</option>
<option data-calling-code="377" data-country-code="MC" value="MC" data-a-html-content="Monaco +377">
MC +377
</option>
<option data-calling-code="976" data-country-code="MN" value="MN" data-a-html-content="Mongolia +976">
MN +976
</option>
<option data-calling-code="382" data-country-code="ME" value="ME" data-a-html-content="Montenegro +382">
ME +382
</option>
<option data-calling-code="1" data-country-code="MS" value="MS" data-a-html-content="Montserrat +1">
MS +1
</option>
<option data-calling-code="212" data-country-code="MA" value="MA" data-a-html-content="Morocco +212">
MA +212
</option>
<option data-calling-code="258" data-country-code="MZ" value="MZ" data-a-html-content="Mozambique +258">
MZ +258
</option>
<option data-calling-code="95" data-country-code="MM" value="MM" data-a-html-content="Myanmar (Burma) +95">
MM +95
</option>
<option data-calling-code="264" data-country-code="NA" value="NA" data-a-html-content="Namibia +264">
NA +264
</option>
<option data-calling-code="674" data-country-code="NR" value="NR" data-a-html-content="Nauru +674">
NR +674
</option>
<option data-calling-code="977" data-country-code="NP" value="NP" data-a-html-content="Nepal +977">
NP +977
</option>
<option data-calling-code="31" data-country-code="NL" value="NL" data-a-html-content="Netherlands +31">
NL +31
</option>
<option data-calling-code="687" data-country-code="NC" value="NC" data-a-html-content="New Caledonia +687">
NC +687
</option>
<option data-calling-code="64" data-country-code="NZ" value="NZ" data-a-html-content="New Zealand +64">
NZ +64
</option>
<option data-calling-code="505" data-country-code="NI" value="NI" data-a-html-content="Nicaragua +505">
NI +505
</option>
<option data-calling-code="227" data-country-code="NE" value="NE" data-a-html-content="Niger +227">
NE +227
</option>
<option data-calling-code="234" data-country-code="NG" value="NG" data-a-html-content="Nigeria +234">
NG +234
</option>
<option data-calling-code="683" data-country-code="NU" value="NU" data-a-html-content="Niue +683">
NU +683
</option>
<option data-calling-code="672" data-country-code="NF" value="NF" data-a-html-content="Norfolk Island +672">
NF +672
</option>
<option data-calling-code="850" data-country-code="KP" value="KP" data-a-html-content="North Korea +850">
KP +850
</option>
<option data-calling-code="47" data-country-code="NO" value="NO" data-a-html-content="Norway +47">
NO +47
</option>
<option data-calling-code="968" data-country-code="OM" value="OM" data-a-html-content="Oman +968">
OM +968
</option>
<option data-calling-code="92" data-country-code="PK" value="PK" data-a-html-content="Pakistan +92">
PK +92
</option>
<option data-calling-code="680" data-country-code="PW" value="PW" data-a-html-content="Palau +680">
PW +680
</option>
<option data-calling-code="970" data-country-code="PS" value="PS" data-a-html-content="Palestinian Territories +970">
PS +970
</option>
<option data-calling-code="507" data-country-code="PA" value="PA" data-a-html-content="Panama +507">
PA +507
</option>
<option data-calling-code="675" data-country-code="PG" value="PG" data-a-html-content="Papua New Guinea +675">
PG +675
</option>
<option data-calling-code="595" data-country-code="PY" value="PY" data-a-html-content="Paraguay +595">
PY +595
</option>
<option data-calling-code="51" data-country-code="PE" value="PE" data-a-html-content="Peru +51">
PE +51
</option>
<option data-calling-code="63" data-country-code="PH" value="PH" data-a-html-content="Philippines +63">
PH +63
</option>
<option data-calling-code="48" data-country-code="PL" value="PL" data-a-html-content="Poland +48">
PL +48
</option>
<option data-calling-code="351" data-country-code="PT" value="PT" data-a-html-content="Portugal +351">
PT +351
</option>
<option data-calling-code="1" data-country-code="PR" value="PR" data-a-html-content="Puerto Rico +1">
PR +1
</option>
<option data-calling-code="974" data-country-code="QA" value="QA" data-a-html-content="Qatar +974">
QA +974
</option>
<option data-calling-code="40" data-country-code="RO" value="RO" data-a-html-content="Romania +40">
RO +40
</option>
<option data-calling-code="7" data-country-code="RU" value="RU" data-a-html-content="Russia +7">
RU +7
</option>
<option data-calling-code="250" data-country-code="RW" value="RW" data-a-html-content="Rwanda +250">
RW +250
</option>
<option data-calling-code="262" data-country-code="RE" value="RE" data-a-html-content="R&eacute;union +262">
RE +262
</option>
<option data-calling-code="685" data-country-code="WS" value="WS" data-a-html-content="Samoa +685">
WS +685
</option>
<option data-calling-code="378" data-country-code="SM" value="SM" data-a-html-content="San Marino +378">
SM +378
</option>
<option data-calling-code="966" data-country-code="SA" value="SA" data-a-html-content="Saudi Arabia +966">
SA +966
</option>
<option data-calling-code="221" data-country-code="SN" value="SN" data-a-html-content="Senegal +221">
SN +221
</option>
<option data-calling-code="381" data-country-code="RS" value="RS" data-a-html-content="Serbia +381">
RS +381
</option>
<option data-calling-code="248" data-country-code="SC" value="SC" data-a-html-content="Seychelles +248">
SC +248
</option>
<option data-calling-code="232" data-country-code="SL" value="SL" data-a-html-content="Sierra Leone +232">
SL +232
</option>
<option data-calling-code="65" data-country-code="SG" value="SG" data-a-html-content="Singapore +65">
SG +65
</option>
<option data-calling-code="421" data-country-code="SK" value="SK" data-a-html-content="Slovakia +421">
SK +421
</option>
<option data-calling-code="386" data-country-code="SI" value="SI" data-a-html-content="Slovenia +386">
SI +386
</option>
<option data-calling-code="677" data-country-code="SB" value="SB" data-a-html-content="Solomon Islands +677">
SB +677
</option>
<option data-calling-code="252" data-country-code="SO" value="SO" data-a-html-content="Somalia +252">
SO +252
</option>
<option data-calling-code="27" data-country-code="ZA" value="ZA" data-a-html-content="South Africa +27">
ZA +27
</option>
<option data-calling-code="82" data-country-code="KR" value="KR" data-a-html-content="South Korea +82">
KR +82
</option>
<option data-calling-code="211" data-country-code="SS" value="SS" data-a-html-content="South Sudan +211">
SS +211
</option>
<option data-calling-code="34" data-country-code="ES" value="ES" data-a-html-content="Spain +34">
ES +34
</option>
<option data-calling-code="94" data-country-code="LK" value="LK" data-a-html-content="Sri Lanka +94">
LK +94
</option>
<option data-calling-code="1" data-country-code="KN" value="KN" data-a-html-content="St. Kitts &amp; Nevis +1">
KN +1
</option>
<option data-calling-code="1" data-country-code="LC" value="LC" data-a-html-content="St. Lucia +1">
LC +1
</option>
<option data-calling-code="508" data-country-code="PM" value="PM" data-a-html-content="St. Pierre &amp; Miquelon +508">
PM +508
</option>
<option data-calling-code="1" data-country-code="VC" value="VC" data-a-html-content="St. Vincent &amp; Grenadines +1">
VC +1
</option>
<option data-calling-code="249" data-country-code="SD" value="SD" data-a-html-content="Sudan +249">
SD +249
</option>
<option data-calling-code="597" data-country-code="SR" value="SR" data-a-html-content="Suriname +597">
SR +597
</option>
<option data-calling-code="268" data-country-code="SZ" value="SZ" data-a-html-content="Swaziland +268">
SZ +268
</option>
<option data-calling-code="46" data-country-code="SE" value="SE" data-a-html-content="Sweden +46">
SE +46
</option>
<option data-calling-code="41" data-country-code="CH" value="CH" data-a-html-content="Switzerland +41">
CH +41
</option>
<option data-calling-code="963" data-country-code="SY" value="SY" data-a-html-content="Syria +963">
SY +963
</option>
<option data-calling-code="239" data-country-code="ST" value="ST" data-a-html-content="S&atilde;o Tom&eacute; &amp; Pr&iacute;ncipe +239">
ST +239
</option>
<option data-calling-code="886" data-country-code="TW" value="TW" data-a-html-content="Taiwan +886">
TW +886
</option>
<option data-calling-code="992" data-country-code="TJ" value="TJ" data-a-html-content="Tajikistan +992">
TJ +992
</option>
<option data-calling-code="255" data-country-code="TZ" value="TZ" data-a-html-content="Tanzania +255">
TZ +255
</option>
<option data-calling-code="66" data-country-code="TH" value="TH" data-a-html-content="Thailand +66">
TH +66
</option>
<option data-calling-code="670" data-country-code="TL" value="TL" data-a-html-content="Timor-Leste +670">
TL +670
</option>
<option data-calling-code="228" data-country-code="TG" value="TG" data-a-html-content="Togo +228">
TG +228
</option>
<option data-calling-code="676" data-country-code="TO" value="TO" data-a-html-content="Tonga +676">
TO +676
</option>
<option data-calling-code="1" data-country-code="TT" value="TT" data-a-html-content="Trinidad &amp; Tobago +1">
TT +1
</option>
<option data-calling-code="216" data-country-code="TN" value="TN" data-a-html-content="Tunisia +216">
TN +216
</option>
<option data-calling-code="90" data-country-code="TR" value="TR" data-a-html-content="Turkey +90">
TR +90
</option>
<option data-calling-code="993" data-country-code="TM" value="TM" data-a-html-content="Turkmenistan +993">
TM +993
</option>
<option data-calling-code="1" data-country-code="TC" value="TC" data-a-html-content="Turks &amp; Caicos Islands +1">
TC +1
</option>
<option data-calling-code="688" data-country-code="TV" value="TV" data-a-html-content="Tuvalu +688">
TV +688
</option>
<option data-calling-code="1" data-country-code="VI" value="VI" data-a-html-content="U.S. Virgin Islands +1">
VI +1
</option>
<option data-calling-code="256" data-country-code="UG" value="UG" data-a-html-content="Uganda +256">
UG +256
</option>
<option data-calling-code="380" data-country-code="UA" value="UA" data-a-html-content="Ukraine +380">
UA +380
</option>
<option data-calling-code="971" data-country-code="AE" value="AE" data-a-html-content="United Arab Emirates +971">
AE +971
</option>
<option data-calling-code="44" data-country-code="GB" value="GB" data-a-html-content="United Kingdom +44">
GB +44
</option>
<option data-calling-code="1" data-country-code="US" value="US" data-a-html-content="United States +1" selected>
US +1
</option>
<option data-calling-code="598" data-country-code="UY" value="UY" data-a-html-content="Uruguay +598">
UY +598
</option>
<option data-calling-code="998" data-country-code="UZ" value="UZ" data-a-html-content="Uzbekistan +998">
UZ +998
</option>
<option data-calling-code="678" data-country-code="VU" value="VU" data-a-html-content="Vanuatu +678">
VU +678
</option>
<option data-calling-code="58" data-country-code="VE" value="VE" data-a-html-content="Venezuela +58">
VE +58
</option>
<option data-calling-code="84" data-country-code="VN" value="VN" data-a-html-content="Vietnam +84">
VN +84
</option>
<option data-calling-code="967" data-country-code="YE" value="YE" data-a-html-content="Yemen +967">
YE +967
</option>
<option data-calling-code="260" data-country-code="ZM" value="ZM" data-a-html-content="Zambia +260">
ZM +260
</option>
<option data-calling-code="263" data-country-code="ZW" value="ZW" data-a-html-content="Zimbabwe +263">
ZW +263
</option>
<option data-calling-code="358" data-country-code="AX" value="AX" data-a-html-content="&Aring;land Islands +358">
AX +358
</option>
</select><span tabindex="-1" class="a-button a-button-dropdown"><span class="a-button-inner"><span class="a-button-text a-declarative" data-action="a-dropdown-button" role="button" tabindex="0" aria-hidden="true"><span class="a-dropdown-prompt">US +1</span></span><i class="a-icon a-icon-dropdown"></i></span></span></span>

Grab text, space, + and number between double quotes: "[A-Z a-z]+ \+[0-9]{1,3}"



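The expression above matched only 204 records, but I was expecting 222. Names such as "Turks &amp; Caicos Islands" and "U.S. Virgin Islands" contain characters outside [A-Z a-z], so they are skipped. The expression that worked was to select everything from content= to the end: content=".*

Then from there we can filter out the unwanted parts.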
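To see why the first expression falls short, here is a minimal sketch in Python, whose re module accepts the same pattern syntax for this case. The LOOSE pattern and the four sample strings are illustrative assumptions; they are not the expression that was finally used in QGIS.

```python
import re

# Pattern from the article: letters/spaces, a space, "+", 1-3 digits, all in double quotes.
STRICT = re.compile(r'"[A-Z a-z]+ \+[0-9]{1,3}"')
# Looser variant (an assumption): accept any character except a double quote.
LOOSE = re.compile(r'"([^"]+ \+[0-9]{1,3})"')

samples = [
    'data-a-html-content="Tunisia +216"',
    'data-a-html-content="Turks &amp; Caicos Islands +1"',
    'data-a-html-content="U.S. Virgin Islands +1"',
    'data-a-html-content="&Aring;land Islands +358"',
]

# Records the strict pattern skips because of "&", ";" or "." in the name.
missed = [s for s in samples if not STRICT.search(s)]
# The looser pattern captures the quoted value for every sample.
recovered = [LOOSE.search(s).group(1) for s in samples]

for value in recovered:
    print(value)
```

Counting the entries in missed against the full list is a quick way to explain a gap like 204 versus 222 before changing the expression.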



To select country code and phone code the expression is: ^[A-Z]* \+[0-9]+
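The same expression can be tried outside QGIS. The sketch below uses Python and assumes the option texts sit on separate lines of one string, which is why the re.MULTILINE flag is added so that ^ matches at the start of every line. The sample text is made up from a few of the values above.

```python
import re

# Expression from the article; MULTILINE lets ^ anchor at each line start.
PATTERN = re.compile(r'^[A-Z]* \+[0-9]+', re.MULTILINE)

text = "TN +216\nTR +90\n</option>\nUS +1\n"

matches = PATTERN.findall(text)
# Split each match into (country code, calling code).
pairs = [tuple(m.split(" +")) for m in matches]

for country, calling in pairs:
    print(country, calling)
```

Note that [A-Z]* also accepts an empty country code; [A-Z]{2} would be stricter if every code is known to have two letters.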



Have a nice day.

Thursday, September 2, 2021

Nairaland website Data Scraper

₦airaLand.com Data Scraper Bot

In this information age, the need and importance of extracting data from the web is becoming increasingly obvious.

Over the years, developers have made attempts to duplicate the Nairaland website structure using different programming languages such as PHP, Python, .NET, Perl, Ruby, C#, Java, etc. Unfortunately, little or no attempt has been made to scrape or extract useful data from the forum for legitimate purposes.

If you know nairaland.com, then I don't need to tell you that it is the equivalent of Facebook or Twitter for Nigerians that houses abundant information related to Nigeria and environs. So, as a data person, you know what that means!

If you want to measure the opinion of Nigerians online, don't use data from sources like Facebook or Twitter; instead, use the data from Nairaland. At the moment, nairaland.com has about 1.5 million active user accounts (90% of them are Nigerians residing in the country), and more than 3 million topics on different subjects have been created.

The problem now is how to extract this data legally and freely without breaking the site or your pocket.

Of course, you can always copy, paste and edit content from any section of the forum. But in situations where you have to do this repeatedly, you need a way to automate the process to ease your task.

Imagine you have to copy the titles of the topics that make the front page every day: you would select all the content, then copy and paste it into a text editor to edit it into a friendly format. That is how you would do it every single day! Wouldn't it be nice to have a script/program that does that for you with just a mouse click?


Legal Warning before Scraping a website

There are a few points that we need to go over before we start scraping.

~ Always check the website’s terms and conditions before you scrape it. They usually have terms that limit how often you can scrape or what you can scrape.
~ Because your script will run much faster than a human can browse, make sure you don’t hammer their website with lots of requests. This may even be covered in the terms and conditions of the website.
~ You can get into legal trouble if you overload a website with your requests or you attempt to use it in a way that violates the terms and conditions you agreed to.
~ Websites change all the time, so your scraper will break some day. Know this: You will have to maintain your scraper if you want it to keep working.
~ Unfortunately the data you get from websites can be a mess. As with any data parsing activity, you will need to clean it up to make it useful to you.
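One way to avoid hammering a website is to enforce a minimum pause between requests. The sketch below is only an illustration in Python: the Throttle class, the fetch_all helper and the fake fetch function are hypothetical names, and the right delay is whatever the site's terms allow.

```python
import time


class Throttle:
    """Make sure consecutive calls to wait() are at least min_interval seconds apart."""

    def __init__(self, min_interval):
        self.min_interval = min_interval
        self._last = None  # time of the previous request, if any

    def wait(self):
        now = time.monotonic()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()


def fetch_all(urls, fetch, throttle):
    """Call fetch(url) for each URL, pausing between calls."""
    results = []
    for url in urls:
        throttle.wait()
        results.append(fetch(url))
    return results


# Demo with a stand-in fetch function so nothing is actually downloaded.
pages = fetch_all(
    ["page-1", "page-2", "page-3"],
    fetch=lambda url: f"<html>{url}</html>",
    throttle=Throttle(0.01),
)
print(len(pages))
```

In a real scraper the fetch argument would be the function that downloads a page, and the interval would be much longer than in this demo.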

With that out of the way, let’s start scraping!

Let's get started scraping data from Nairaland.

Here I present to you a solution that allows you to scrape or extract the following datasets from Nairaland:-
1) Front Page Topics
2) Members and Guests Online
3) Section Topics and poster usernames
4) First Post content (Original Post) from thread
5) Images from thread
6) Email Addresses from thread


How to use the data

What can the scraped data be used for?
Data from Nairaland can be used for Data Science, Machine Learning, Computer Vision etc as follows:-

~ Text mining
~ Sentiment Analysis
~ Natural Language Processing
~ Polls and Opinions Study
~ Trends Analysis
~ Market Research
~ Automatic summarization
~ Machine translation
~ Named entity recognition
~ Relationship extraction
~ Speech recognition
~ Word embedding
~ Topic segmentation
~ etc.


Understand Nairaland Structure

The structure of the website has pretty much remained the same for some years now. See the look of the forum in 2005, 2011, 2014, and 2017.

Year 2005


Year 2011



Year 2014



Year 2017


This means a web crawler script written for nairaland will remain functional for a long time, until the structure is changed.

Also, the HTML structure uses a lot of tables. Most of the data we will be scraping is nested in HTML table structures.
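Because the data lives in table cells, a scraper mostly needs to walk rows and cells. The following is a minimal sketch using only Python's standard html.parser module. The sample markup is invented for illustration and is not Nairaland's real HTML, and the class does not handle tables nested inside other tables.

```python
from html.parser import HTMLParser


class TableCellParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read
        self._cell = None  # text fragments of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Hypothetical markup: one topic title and one poster username per row.
sample = (
    "<table>"
    "<tr><td><a href='/topic-1'>First topic</a></td><td>user_a</td></tr>"
    "<tr><td><a href='/topic-2'>Second topic</a></td><td>user_b</td></tr>"
    "</table>"
)

parser = TableCellParser()
parser.feed(sample)
rows = parser.rows
for title, poster in rows:
    print(title, "by", poster)
```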

Monday, August 30, 2021

Choropleth and Bubble Maps in R - A case study of mapping Nigeria Poor & Vulnerable Households

According to Wikipedia, a choropleth map is a type of thematic map in which a set of pre-defined areas is colored or patterned in proportion to a statistical variable.

Here our statistical variable is going to be the table on Nigeria Poor & Vulnerable Households provided by "The National Social Register of Poor & Vulnerable Households (PVHHs)" as of 31st March, 2020.



There are many R packages for making and working with maps and GIS/spatial data in general. Some of them are: ggplot2, ggmap, maps, mapdata, tmap, sp, sf, rgdal, rgeos, mapproj, etc. More related packages can be reviewed on this page.

For the purpose of making the Choropleth map, I will make use of the following packages:
~ tmap (to plot the map),
~ sf or rgdal (to read shapefile as spatial data.frame) and
~ readr (to read .csv file).

Install them with: install.packages(c('tmap', 'sf', 'rgdal', 'readr'))


Choropleth map
The script below will plot a basic map based on the default attribute columns.
# For reading CSV files
library(readr)
# For reading, writing and working with spatial objects
library(sf)
library(rgdal)
# For creating map
library(tmap)



# Read the CSV data...
# Note missing data for: Ebonyi and Ogun states
Poor_and_Vulnerable <- read_csv("C:/Users/Yusuf_08039508010/Desktop/Working_Files/Fiverr/2021/08-August/R Poor and Vulnerable/Poor and Vulnerable.csv")

# Read the NIG Admin shapefile... using sf
ng_map1 <- st_read('C:/Users/Yusuf_08039508010/Desktop/Working_Files/Fiverr/2021/08-August/R Poor and Vulnerable/SHP/NIG_ADM.shp')


# Read the NIG Admin shapefile... using rgdal
# ng_map2 <- readOGR("C:/Users/Yusuf_08039508010/Desktop/Working_Files/Fiverr/2021/08-August/R Poor and Vulnerable/SHP", "NIG_ADM")


colnames(ng_map1)

# Quick base R plot...
plot(ng_map1) # by all columns
plot(ng_map1['geographic']) # by column name
plot(ng_map1['geometry']) # by column name


# Simple plot using tmap...
tm_shape(ng_map1) + 
  tm_polygons(col='geographic')

Note that I was facing this error: https://github.com/mtennekes/tmap/issues/571, so I downgraded sf from version 1.0.0 to 0.9.8.
# Installing a specific version (0.9.8) of the sf package... www.support.rstudio.com/hc/en-us/articles/219949047-Installing-older-versions-of-packages
# Search for a specific package at: https://cran.r-project.org/src/contrib/Archive/

packageurl <- 'https://cran.r-project.org/src/contrib/Archive/sf/sf_0.9-8.tar.gz'
install.packages(packageurl, repos=NULL, type="source")





However, the map we wanted is based on a dataset which is in a CSV file we read into a variable named "Poor_and_Vulnerable". So we have to find a way of combining the CSV data to the map to be able to plot the choropleth map showing the Poor & Vulnerable Households/Individuals in Nigeria.

The process is very simple using the merge() function as follows:-

# Merge the CSV data to the Shp data...

# Check the col names for both the CSV and shp data...
names(Poor_and_Vulnerable)
names(ng_map1)

m <- merge(ng_map1, Poor_and_Vulnerable, by.x='state_name', by.y='State')

names(m)


# Plot choropleth map by Households using tmap...
tm_shape(m) + 
  tm_polygons(col='Households')

 
# Plot choropleth map by Individuals using tmap...
tm_shape(m) + 
  tm_polygons(col='Individuals')

We just need to look up the common column names and provide them as parameters (by.x and by.y) in the merge() function. We then plot the new merged object (m), as it is called above.


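Note that after the merge some states were missing. By default, merge() keeps only the rows that have a match in both tables, so a state disappears if its record is missing from the CSV (as noted above for Ebonyi and Ogun) or if its name is spelled differently in the two columns.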
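Finding the culprits comes down to comparing the two name columns. The idea is language-neutral; below it is sketched in Python with made-up name lists (the spellings are hypothetical, not taken from the actual shapefile or CSV). Whitespace and letter case are ignored so that only real mismatches are reported.

```python
def normalise(name):
    """Collapse whitespace and ignore case so 'abia ' equals 'Abia'."""
    return " ".join(name.split()).casefold()


def unmatched_names(map_names, csv_names):
    """Return (names only in the map, names only in the CSV)."""
    map_keys = {normalise(n): n for n in map_names}
    csv_keys = {normalise(n): n for n in csv_names}
    only_in_map = sorted(map_keys[k] for k in map_keys.keys() - csv_keys.keys())
    only_in_csv = sorted(csv_keys[k] for k in csv_keys.keys() - map_keys.keys())
    return only_in_map, only_in_csv


# Hypothetical values standing in for the state_name and State columns.
map_states = ["Abia", "Ebonyi", "Nasarawa", "Ogun", "Federal Capital Territory"]
csv_states = ["abia ", "Nassarawa", "FCT"]

only_in_map, only_in_csv = unmatched_names(map_states, csv_states)
print("Only in map:", only_in_map)
print("Only in CSV:", only_in_csv)
```

Names that appear in both output lists under different spellings need to be harmonised before merging; names that appear only on the map side are simply missing from the table.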


Bubble map

A bubble map uses circles of different sizes to represent a numeric value on a territory. It displays one bubble per geographic coordinate, or one bubble per region (in this case the bubble is usually displayed at the barycentre of the region).

It takes only a few lines of code to make a bubble map using tmap, as seen below...

# Bubble Map....
tm_shape(ng_map1) + 
  tm_polygons(col='black') + 
  
  tm_shape(m) + 
  tm_bubbles("Households", col='red')



If you are interested in the traditional GIS graphical approach to producing similar maps, check this post on 'Mapping Poor And Vulnerable Nigerians by state', where I used QGIS to produce them.


Happy mapping!

Tuesday, August 24, 2021

Extrapolating Nigeria population by states, LGAs and wards from WorldPop database

On WorldPop.org you will find open and high-resolution geospatial data on population distributions, demographics and dynamics, with a focus on low and middle income countries.

In this article, I will explain how to get population counts for states, LGAs and wards in Nigeria. More specifically, I will use the "Constrained Individual countries 2020 UN adjusted (100m resolution) for Nigeria".

Download the data, which is in raster GeoTIFF format and is about 59 MB at the time of writing. Then load the population raster together with the states, LGAs and wards polygon shapefiles into a QGIS project.



As seen above, the darker dots are where we have higher concentration of people (high population). Now we need to extract the pixel population values for each state, LGA and ward.


Zonal statistics algorithm 

Zonal statistics is an algorithm that calculates statistics of a raster layer for each feature of an overlapping polygon vector layer. This algorithm can be accessed from: Processing toolbox >> Raster analysis >> Zonal statistics.


We will use this algorithm to calculate the sum of the population values within each state, LGA and ward.


So, we just need to run the algorithm for the three polygon layers (state, LGA and ward).
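Conceptually, the "sum" statistic adds up every population pixel that falls inside a zone. The toy sketch below shows that idea in plain Python. It is a simplification: QGIS works from polygon geometries, whereas here each pixel is simply labelled with a zone name, and all values are invented.

```python
from collections import defaultdict

# Toy population raster; None stands for a NoData pixel.
population = [
    [0.0, 1.5, 2.0],
    [3.0, 0.5, None],
    [4.0, 2.5, 1.0],
]
# Zone label of each pixel (a stand-in for "which polygon covers this pixel").
zones = [
    ["A", "A", "B"],
    ["A", "B", "B"],
    ["C", "C", "B"],
]


def zonal_sum(values, zone_ids):
    """Sum the pixel values per zone, skipping NoData pixels."""
    totals = defaultdict(float)
    for value_row, zone_row in zip(values, zone_ids):
        for value, zone in zip(value_row, zone_row):
            if value is not None:
                totals[zone] += value
    return dict(totals)


totals = zonal_sum(population, zones)
for zone, total in sorted(totals.items()):
    print(zone, total)
```

Running the real algorithm once per polygon layer does the same thing at three levels of detail: states, LGAs and wards.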

Thursday, August 19, 2021

Work with Raster data in R/RStudio

There are several packages in R for reading and writing raster data (e.g., a GeoTIFF file). In this post, we will see a few of them, notably tmap, ggplot2, rgdal and raster.

The raster data I will use for this demonstration is this (n08_e007_3arc_v2.tif) Landsat8 image in .tif format. You can get such rasters from the USGS EarthExplorer platform or many other sources out there.

First, install and load the packages if you don't have them already.

# packages <- c('raster', 'rgdal', 'ggplot2', 'tmap')
# install.packages(packages)

library(raster)
library(rgdal)
library(ggplot2)
library(tmap)


Reading and exploring the raster attributes

To read the raster file, we use the raster package.

raster_img <- raster("C:/Users/Yusuf_08039508010/Desktop/Working_Files/IMG/n08_e007_3arc_v2.tif")

# Explore the loaded raster to get familiar with it...
class(raster_img) # 'RasterLayer' [package "raster"]
View(raster_img)
str(raster_img) # here we see that it is a single-band image
print(raster_img) # here we can find the CRS, min/max values, extent, dimensions etc
names(raster_img)
summary(raster_img)

By exploring the raster using the base R functions above, we have already seen a lot of useful properties and information about our raster data. More metadata can be accessed from the raster object's slots using the @ operator like this:-
crs(raster_img)
raster_img@crs
raster_img@extent
raster_img@file
raster_img@data
raster_img@rotated
raster_img@legend
raster_img@ncols
raster_img@nrows
raster_img@history
nlayers(raster_img)# Tells if the raster is "Single Layer (or Band) vs Multi-Layer (Band Geotiffs)"

The lines above are self-explanatory; feel free to explore and use them.


Visualizing the raster

Let's use the base R plot function to quickly visualize the raster image.

plot(raster_img)

The line above should yield this figure below:-


Monday, August 16, 2021

Using Easting and Northing coordinates to plot survey site plan in AutoCAD

Assume you conducted a field survey with a GPS or similar instrument and generated the following list of coordinates.

It is expected that we plot the coordinates to scale using AutoCAD software.

The site reconnaissance diagram is provided below:-


Recce (reconnaissance) diagram - source: author site inspection


Now, we want to plot this data to scale using AutoCAD. Here are the steps you need to follow:-


Step 1: Launch AutoCAD software and Setup the drawing units.

Open the units dialog box by typing "UNITS" and set the values as per your data as seen below.



Remember to check the "Clockwise" button and set "Direction" to North. This is because survey bearings are measured clockwise from north.


Step 2: Next is to prepare and plot the data using an expression as seen in the sample output box above.

The expression is like so: Easting,Northing that is: 355295.411,942748.140 for the first point P1.

The complete expression will look like this:-

355295.411,942748.140
355265.635,942751.796
355295.983,942774.665
355301.938,942773.934
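Before typing the coordinates into AutoCAD, it can help to check them numerically. The sketch below, in Python, computes the length and whole-circle bearing (clockwise from north) of each leg, including the closing leg back to the first point, and rebuilds the Easting,Northing strings. Treating the four points as a closed figure is an assumption based on the site plan; the results should come out close to the distances and bearings used in the bearing-and-distance version of this exercise.

```python
import math

# (Easting, Northing) pairs from the list above, first point = P1.
points = [
    (355295.411, 942748.140),
    (355265.635, 942751.796),
    (355295.983, 942774.665),
    (355301.938, 942773.934),
]


def leg(p, q):
    """Return (distance, bearing in degrees clockwise from north) from p to q."""
    d_e = q[0] - p[0]
    d_n = q[1] - p[1]
    distance = math.hypot(d_e, d_n)
    # atan2(dE, dN) measures the angle from the north axis, clockwise positive.
    bearing = math.degrees(math.atan2(d_e, d_n)) % 360.0
    return distance, bearing


# Each point to the next one, wrapping around to close the figure.
legs = [leg(points[i], points[(i + 1) % len(points)]) for i in range(len(points))]
for number, (distance, bearing) in enumerate(legs, start=1):
    print(f"Leg {number}: {distance:.2f} at {bearing:.1f} degrees")

# Strings in the Easting,Northing form expected by the LINE command.
lines = [f"{e:.3f},{n:.3f}" for e, n in points]
print("\n".join(lines))
```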


Step 3: Pick the LINE command and type the expressions above one after the other.

Alternatively, you could copy and paste all the expressions at once after picking the LINE command. This will plot all the lines on the fly.

Note that you can use circle or point command instead of using the line command.



That is it!

Saturday, August 14, 2021

Using Bearing and Distance to plot survey site plan in AutoCAD

Assume you conducted a field survey with a compass or similar instrument that reads bearings, and generated the record listed below.

The table above is assumed to be the final corrected observations for our plotting. If you made a mistake in your field measurements, there are several ways to apply corrections to the measurements, but those are outside the scope of this article.

The site reconnaissance diagram is provided below:-


Recce (reconnaissance) diagram - source: author site inspection


Now, we want to plot this data to scale using AutoCAD. Here are the steps you need to follow:-


Step 1: Launch AutoCAD software and Setup the drawing units.

Open the units dialog box by typing "UNITS" and set the values as per your data as seen below.



Remember to check the "Clockwise" button and set "Direction" to North. This is because survey bearings are measured clockwise from north.


Step 2: Next is to prepare and plot the data using an expression as seen in the sample output box above.

The expression is like so: @Distance<Bearing that is: @30.00<277d00'00" for the first line P1-P2.

The complete expression will look like this:-

@30.00<277d00'00"
@38.00<53d00'00"
@6.00<97d00'00"
@26.60<194d12'00"
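A quick sanity check before plotting is to add up the legs: if the figure is meant to close, the easting and northing components should sum to roughly zero. The Python sketch below parses the four expressions above and reports the misclosure. That the traverse is a closed loop is an assumption based on the site plan, and the function names are only illustrative.

```python
import math
import re

# The four @Distance<Bearing expressions from the list above.
EXPRESSIONS = """
@30.00<277d00'00"
@38.00<53d00'00"
@6.00<97d00'00"
@26.60<194d12'00"
""".split()

# @distance<degrees d minutes ' seconds "
PATTERN = re.compile(r'@([0-9.]+)<([0-9]+)d([0-9]+)\'([0-9.]+)"')


def parse(expression):
    """Return (distance, bearing in decimal degrees) for one expression."""
    match = PATTERN.fullmatch(expression)
    if match is None:
        raise ValueError(f"Unrecognised expression: {expression}")
    distance = float(match.group(1))
    bearing = int(match.group(2)) + int(match.group(3)) / 60 + float(match.group(4)) / 3600
    return distance, bearing


sum_e = 0.0
sum_n = 0.0
for expression in EXPRESSIONS:
    distance, bearing = parse(expression)
    # Bearings are clockwise from north: dE uses sine, dN uses cosine.
    sum_e += distance * math.sin(math.radians(bearing))
    sum_n += distance * math.cos(math.radians(bearing))

misclosure = math.hypot(sum_e, sum_n)
print(f"Misclosure: {misclosure:.3f}")
```

A misclosure that is large compared with the leg lengths usually points to a typing error in one of the expressions.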


Step 3: Pick the LINE command, click anywhere on the drawing area and type the expressions above one after the other.


Alternatively, you could copy and paste all the expressions at once after picking the LINE command and clicking on the drawing canvas. This will plot all the lines on the fly.


That is it!