Friday, April 9, 2021

Extracting data from .HAR file

An HTTP Archive file (shortened to 'HAR file') is a JSON format used for recording the traffic between a web browser and a website. The common extension for these files is '.har'.

In Python, there is a third-party module called "Haralyzer" developed for getting useful data out of HAR files.

Since HAR files are in JSON format, I will not use the "Haralyzer" module; instead, I will read the .har file and extract data from the text directly. Another reason I don't want to use the library is that I don't want to install a new third-party library on my machine, especially since the haralyzer module depends on another third-party library, "six".

Other than that, there is nothing wrong with using a library that reads the .har file directly.

Let's get our hands dirty...


How to get a HAR file

Practically, any website that uses JSON as its data communication format will generate traffic on the client's browser that can be exported as a .har file from the browser's developer tools.

Let's use this website on earthquake data by USGS. Open the website and go to your browser's developer tools, then select the 'Network' tab >> XHR >> Export HAR...


This will download a HAR file that contains a JSON representation of the earthquake data, as seen below...


You can save the file with any name in a location you can remember; we will use it in the next section. Note that the file is GeoJSON with padding (JSONP).
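Since a HAR file is plain JSON, Python's built-in json module is enough to read it. Below is a minimal sketch, assuming the exported file was saved as 'earthquakes.har' (a hypothetical name):

import json

# Load the HAR file as ordinary JSON...
with open('earthquakes.har', encoding='utf-8') as f:
    har_data = json.load(f)

# A HAR file has a top-level 'log' object whose 'entries' list holds
# one record per request/response pair...
for entry in har_data['log']['entries']:
    print(entry['request']['url'], entry['response']['status'])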

Thursday, April 1, 2021

How to make black and white Road network map in Mapbox Studio

Mapbox Studio is a platform used by developers to prepare maps for mobile, desktop and web applications.

In this post, we are going to prepare a black and white road network map similar to what you see below.


There are many startup web applications that allow you to create this kind of map and download it for a fee! In a few moments, you will learn how to make your own maps, so you don't have to spend money buying them.

See an example below that costs $59.


Step-by-step instructions



  • Create a new map by clicking on "New Style" button.


  • Select from the map style templates (here we will use the Blank Template)


  • Rename the map to a suitable name and add map components and layers.


To achieve this type of map, we need to add the following map components and layers:-

  1. Road Network
  2. Land, Water and Sky
  3. Administrative Boundaries

On each component/layer, adjust the settings to fit what you want. For example, I set the administrative boundaries base to white, etc.


After that, you can publish and share the map as a WMTS layer for use in desktop software like QGIS for further map processing, as you will see in a moment.


Wednesday, March 24, 2021

Looping over an iterable (array/list) in JavaScript Vs Python

Let's see what it takes to loop over an iterable using a for-loop in both JavaScript and Python. By the way, an iterable is an object capable of returning its members one at a time, permitting it to be iterated over in a for-loop.

Assuming we have this iterable: m = [3.23, 4.56, 5.3, 2.44, 6.7, 12.4, 566] and we want to perform some math operation on each element (in this case: 2 to the power of the element, divided by 2).

The math formula is as follows:-

For JavaScript

Math.pow(2, element) / 2


For Python

(2**element)/2


The solutions

JavaScript Solution

const m = [3.23, 4.56, 5.3, 2.44, 6.7, 12.4, 566];

for (let i = 0; i < m.length; ++i) {
    console.log(Math.pow(2, m[i]) / 2);
}


const m = [3.23, 4.56, 5.3, 2.44, 6.7, 12.4, 566];

// Note: for...in iterates over the array indices, not the values...
for (let i in m) {
    console.log(Math.pow(2, m[i]) / 2);
}

Python Solution

m = [3.23, 4.56, 5.3, 2.44, 6.7, 12.4, 566]

for i in m:
    print((2**i)/2)
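The same Python computation can also be written in one line as a list comprehension, collecting the results in a list instead of printing them one by one:

m = [3.23, 4.56, 5.3, 2.44, 6.7, 12.4, 566]

# 2 to the power of each element, divided by 2...
results = [(2 ** i) / 2 for i in m]
print(results)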




That is it!

Thursday, March 18, 2021

Python script - Merge PDF files into a single file

This script makes use of the PyPDF2 library to merge a list of PDF files into one big file.


from PyPDF2 import PdfFileMerger
from os import listdir

# Your input directory path...
input_dir = r"C:\Users\Yusuf_08039508010\ND 2 WAEC Result Check\Result"


merge_list = []

# Collect only the PDF files in the input folder...
for x in listdir(input_dir):
    if not x.endswith('.pdf'):
        continue
    merge_list.append(input_dir + '\\' + x)

merger = PdfFileMerger()

for pdf in merge_list:
    merger.append(pdf)

merger.write(input_dir + '\\pdf_file_name.pdf') # your output directory and PDF file name
merger.close()

print('Finished...')


Enjoy!

Tuesday, March 9, 2021

Spread column-wise data to row-wise

On the web, it is very common to find datasets displayed in a column-wise manner, as seen below.


As you will have noticed, each new record is marked by the company name in bold capital letters. So, let's add a sign to separate one record from the other. I used this "------------------------" sign, but you can use anything, as long as it is unique and not part of the records themselves.

So, the working data copied from the website is like this:-

ACCESS CREDIT MANAGEMENT, INC.
Tim Cullen, Attorney
11225 Huron Ln Ste 222
Little Rock, AR 72211-1861
United States
Phone: (501) 664-2922
Fax: (501) 664-3207
MAP Attorney
------------------------
CREDIT CONTROL CO., INC.
Bill Caldwell, President
Bill Caldwell, Ethics Contact
10201 W Markham St Ste 104
Little Rock, AR 72205-2180
United States
Phone: (501) 225-2050
Fax: (501) 225-2135
ACA Member since 1982
Line of Business: Third Party Collections
------------------------
THE MCHUGHES LAW FIRM, PLLC
Becky A. McHughes Esq., Attorney at Law
10810 Executive Center Dr
Danville Bldg Ste 312
Little Rock, AR 72211
United States
Phone: (501) 376-9131
Fax: (501) 374-9332
http://www.mchugheslaw.com
ACA Member since 2013
Line of Business: Law Firm
Line of Business: Third Party Collections
------------------------
THE MCHUGHES LAW FIRM, PLLC
Becky A. McHughes Esq., Attorney at Law
10809 Executive Center Dr
Danville Bldg Ste 312
Little Rock, AR 72204
United States
Phone: (501) 376-9131
Fax: (501) 374-9332
MAP Attorney
Lowell
------------------------
CENTRAL RESEARCH, INC.
Karena Holt, Vice President of Operations
Karena Holt, Ethics Contact
122 N. Bloominton Ste 1
Lowell, AR 72745
United States
Phone: (479) 419-5456
Fax: (479) 419-5460
http://www.central-research.com
ACA Member since 2016
Line of Business: Third Party Collections
------------------------
CENTRAL RESEARCH, INC.
Shane Taylor
106 N Bloomington
Ste S
Lowell, AR 72745-8988
United States
Phone: (479) 419-5456
MAP Attorney
Mabelvale
------------------------
FIRST COLLECTION SERVICES
Chris Dunkum, President
Chris Dunkum, Ethics Contact
10925 Otter Creek East Blvd
Mabelvale, AR 72103-1661
United States
Phone: (501) 455-1658
http://www.FCScollects.com
ACA Member since 1983
Line of Business: Outsourced First Party or Billing Company
Line of Business: Third Party Collections


What we really want is something like this:-

ACCESS CREDIT MANAGEMENT, INC. :: Tim Cullen, Attorney :: 11225 Huron Ln Ste 222 :: Little Rock, AR 72211-1861 :: United States :: Phone: (501) 664-2922 :: Fax: (501) 664-3207 :: MAP Attorney
------------------------
CREDIT CONTROL CO., INC. :: Bill Caldwell, President :: Bill Caldwell, Ethics Contact :: 10201 W Markham St Ste 104 :: Little Rock, AR 72205-2180 :: United States :: Phone: (501) 225-2050 :: Fax: (501) 225-2135 :: ACA Member since 1982 :: Line of Business: Third Party Collections
------------------------
THE MCHUGHES LAW FIRM, PLLC :: Becky A. McHughes Esq., Attorney at Law :: 10810 Executive Center Dr :: Danville Bldg Ste 312 :: Little Rock, AR 72211 :: United States :: Phone: (501) 376-9131 :: Fax: (501) 374-9332 :: http://www.mchugheslaw.com :: ACA Member since 2013 :: Line of Business: Law Firm :: Line of Business: Third Party Collections
------------------------
THE MCHUGHES LAW FIRM, PLLC :: Becky A. McHughes Esq., Attorney at Law :: 10809 Executive Center Dr :: Danville Bldg Ste 312 :: Little Rock, AR 72204 :: United States :: Phone: (501) 376-9131 :: Fax: (501) 374-9332 :: MAP Attorney :: Lowell
------------------------
CENTRAL RESEARCH, INC. :: Karena Holt, Vice President of Operations :: Karena Holt, Ethics Contact :: 122 N. Bloominton Ste 1 :: Lowell, AR 72745 :: United States :: Phone: (479) 419-5456 :: Fax: (479) 419-5460 :: http://www.central-research.com :: ACA Member since 2016 :: Line of Business: Third Party Collections

From a vertical arrangement to a horizontal arrangement: the horizontal (row-wise) arrangement works best in a spreadsheet, where we will have a common column for the same fields across records.

What we have (vertical/column-wise arrangement)


What we want (horizontal/row-wise arrangement)
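Here is a minimal Python sketch of the conversion, assuming the copied text is saved in a file named 'records.txt' (a hypothetical name), with the "------------------------" sign between records:

# Read the column-wise text copied from the website...
with open('records.txt', encoding='utf-8') as f:
    text = f.read()

rows = []
for record in text.split('------------------------'):
    # Keep only the non-empty lines of each record...
    lines = [line.strip() for line in record.splitlines() if line.strip()]
    if lines:
        # Join the record's lines into a single row-wise line...
        rows.append(' :: '.join(lines))

for row in rows:
    print(row)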

Monday, March 8, 2021

RegexOne.com alternative solution

RegexOne.com has interactive lessons on Regular Expressions, and in this post, I want to solve all the lessons with a solution different from the one they provided.

For example, \w matches any word character (equal to [a-zA-Z0-9_]), so if the solution on RegexOne.com is \w, then I have to look for another way, like [a-zA-Z0-9_], to solve the lesson.
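You can verify in Python that such alternative patterns behave the same (for ASCII text, at least):

import re

sample = 'abc123_XYZ'

# Both patterns should match exactly the same characters...
print(re.findall(r'\w', sample) == re.findall(r'[a-zA-Z0-9_]', sample))  # True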

Let's get started...

Exercise 1: Matching Characters



Exercise 1½: Matching Digits



Exercise 2: Matching With Wildcards




Exercise 3: Matching Characters



Exercise 4: Excluding Characters



Exercise 5: Matching Character Ranges


Tuesday, March 2, 2021

PyQGIS - Add multiple shapefile vector layers to the QGIS project instance

Sometimes, I need to load many shapefiles located in various folders into the QGIS project. A handy way to overcome this repetitive, boring task is to use the PyQGIS script below.


import glob

# Use glob to recursively search all folders for .shp files...
shp_files = glob.glob(r'C:\Users\Yusuf_08039508010\Desktop\GIS Data\NGR\**\*.shp', recursive=True)
# print(shp_files)

layer_count = 0
for shp in shp_files:
    print("Loading...", shp)
    layer_name = shp.split('\\')[-1].split('.')[0]
    vlayer = QgsVectorLayer(shp, layer_name, "ogr")
    
    if not vlayer.isValid():
        print("Error: Layer Failed to Load!")
    else:
        QgsProject.instance().addMapLayer(vlayer)
        layer_count += 1

print(f'Finished Loading total of: {layer_count} shapefiles.')


As seen below, I would have to open 11 folders and subfolders to load all the shapefiles into the QGIS project manually. But with the script above, I just run it once and all the shapefiles in both parent and child folders are loaded in a few seconds.


Here, the script loaded 66 shapefiles from all the directories, as seen below.




Enjoy!

Sunday, February 28, 2021

Rename multiple files with new names in excel spreadsheet

In the past, I have written similar script titled "Python script to rename multiple files/folders".

The only difference here is that the new file names will come from a column in an Excel spreadsheet instead of being generated within the script.

Below is the spreadsheet file that contains the current file names and their corresponding new names.




For example, image '4.jpg' would be renamed to 'Barack Obama.jpg', '9.jpg' to 'Donald Trump.jpg', '30.jpg' to 'Joseph Robinette Biden Jr.jpg'... and so on.

Note that all the images are of the same extension (.jpg), so we will maintain the extension.


The script

First, we will read the Excel file using pandas (into a dataframe) and create a dictionary from the two columns, where the keys are the 'old names' and the values are the 'new names'.

import os
import pandas as pd


names_df = pd.read_excel(r"C:\Users\Yusuf_08039508010\Desktop\rename.xlsx")
names_df


names_df_dict = dict(zip(names_df['Old Name'], names_df['New Name']))
names_df_dict

Now, we can access the values of the dictionary by their keys like so: names_df_dict['1.jpg']. With this, we will loop over the keys dynamically and rename the images accordingly.

images_folder = r'C:\Users\Yusuf_08039508010\Documents\US Presidents'

for file in os.listdir(images_folder):
    print ('Renaming...', names_df_dict[file])
    
    # Use os.path.join() to construct the absolute path to the images...
    # Alternatively, we could change directory (os.chdir()) to the images folder
    old_img_name = os.path.join(images_folder, file)
    new_img_name = os.path.join(images_folder, names_df_dict[file] + '.jpg')
    
    os.rename(old_img_name, new_img_name)
    
print('Finished....')

To be sure our renaming script did a perfect job, let's verify the last three presidents, that is:-

  • '4.jpg' would be renamed to 'Barack Obama.jpg', 
  • '9.jpg' to 'Donald Trump.jpg', 
  • '30.jpg' to 'Joseph Robinette Biden Jr.jpg'


That is it!

Tuesday, February 16, 2021

Get Emails from Google search given company Name/Domain

Given a list of company names, search Google to retrieve their email addresses:-

import re

import requests
from bs4 import BeautifulSoup


list_of_url = ['http://umaryusuf.com', 'another website']

# REGEX to search for emails...
EMAIL_REGEX = r"""(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9]))\.){3}(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9])|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])"""

unique_emails_list = []

for name in list_of_url:    
    search_query = name + " email"
    print('Processing...', name)

    # -------------- FOR BULK GOOGLE SEARCH USE A PROXY -----------------
    params = (
        ('api_key', 'XXXXXXXXXXXXXXXXXXXXXXXXXXX'),
        ('url', 'https://www.google.com/search?q=' + search_query),
    )
    response = requests.get('http://api.scraperapi.com/', params=params)
    # -------------------------------------------------------------------


    print(response.status_code)

    soup = BeautifulSoup(response.content, 'html.parser')
    text = soup.get_text()

    emails_1 = [re_match.group() for re_match in re.finditer(EMAIL_REGEX, text)]

    emails_2 = re.findall(r"[A-Za-z0-9._%+-]+"
                         r"@[A-Za-z0-9.-]+"
                         r"\.[A-Za-z]{2,4}", text)

    unique_emails = list(set(emails_1 + emails_2))
    data = name, unique_emails

    unique_emails_list.append(data)
    print(data)


Given a list of company domain names, access each domain's web page and get all the emails from the page:-

import re
import urllib.request


list_of_url = ['http://umaryusuf.com']



site_list = []

for domain in list_of_url:

    print('Processing...', domain)
    
    try:
        f = urllib.request.urlopen(domain)
        s = f.read().decode('ISO-8859-1')
        emails = re.findall(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}", s)
        newemails = list(set(emails))
        d = domain, newemails

        site_list.append(d)
        print (d)
    except Exception:
        d = domain, 'Error Occurred'
        site_list.append(d)
        print (d)

print("Finished...")


Enjoy!

Monday, February 15, 2021

PyQGIS - Write vector layers field name and type to text file

The PyQGIS script below will write the field name and field type of a given vector layer to a text file.

This is the same information you will find when you access the 'Fields' tab from the layer's property window as seen below.


Currently, there is no way to copy/save this table to a text or similar file for use outside the QGIS interface. So, it would be great if we wrote a little script to do the hard work for us.


The code:-

# Read active layer from the QGIS layer panel or read the shapefile from its path
layer = qgis.utils.iface.activeLayer()

# vector_file = r"C:\path_to_shapefile.shp"
# layer = QgsVectorLayer(vector_file, 'DISPLAYNAME', 'ogr')

# Count the number of features (rows) and number of fields (columns)
featureCount = layer.featureCount()
fieldCount = layer.fields().count()

# Loop through the layer fields to get each field name and type
data_list = []
for field in layer.fields():
    field_name = field.name()
    field_type = field.typeName()

    data = field_name, field_type
    data_list.append(data)


# Write the data_list to text file...
txtFileName = layer.name() # from layer name
with open(txtFileName +'.txt', 'w', encoding="utf-8") as f:
    print(data_list, end='\n', file = f)

# Print location of the text file...    
import os
print('The text file is saved at: ', os.getcwd(), ' and its file name is: ', txtFileName)

The comments in the code are self-explanatory; also remember to import the necessary modules.



You can extend the script by writing the data to a spreadsheet file using the csv or pandas module.
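For example, here is a minimal sketch of the csv extension (data_list and txtFileName come from the script above):

import csv

# Write one field per row instead of a single printed list...
with open(txtFileName + '.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(['field_name', 'field_type'])
    writer.writerows(data_list)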

That is it!

Tuesday, February 9, 2021

Scrape world university data from Webometrics

In this post, I will work through the process of extracting data from the "Webometrics Ranking of World Universities" website, as requested by my client.

The website has a table with thousands of records representing the ranking system for the world's universities as seen above.

Note that the following three columns ('University', 'Det.' and 'Country') contain hyperlinks. We would like to get those hyperlinks as well, instead of just the icons.

For this reason, we cannot use the pandas.read_html(html_page) method, because it won't return the hyperlinks from those columns. So, we have to use the BeautifulSoup library to look up the hyperlinks in the HTML source after sending a GET request using the requests or selenium library.

At the end we will save the data into a spreadsheet using the pandas library.

Summary
1) Send a GET request to the web pages - Requests or Selenium
2) Extract data from the response html content - BeautifulSoup
3) Format and Save the data to file - Pandas

A quick lookup shows that to get to the next page, a page query string is added to the URL like so: http://webometrics.info/en/world?page=1, http://webometrics.info/en/world?page=2, http://webometrics.info/en/world?page=3, http://webometrics.info/en/world?page=4, etc. The last page is http://webometrics.info/en/world?page=120 at the time of writing.

Now, let's extract data from the first page (http://webometrics.info/en/world?page=0), then use a for loop to extract from all the other pages.
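Below is a minimal sketch of that first-page extraction; the exact table markup and column layout on webometrics.info may differ from what this assumes:

import requests
import pandas as pd
from bs4 import BeautifulSoup

url = 'http://webometrics.info/en/world?page=0'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

rows = []
table = soup.find('table')
for tr in table.find_all('tr')[1:]:  # skip the header row
    row = []
    for td in tr.find_all('td'):
        # Keep the cell text plus any hyperlink inside the cell...
        link = td.find('a')
        row.append(td.get_text(strip=True))
        row.append(link['href'] if link else '')
    if row:
        rows.append(row)

# Format and save the data to a spreadsheet...
pd.DataFrame(rows).to_excel('webometrics_page0.xlsx', index=False)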

Saturday, February 6, 2021

QGIS 'Spreadsheet Layers' Plugin

Don't want to use CSV files? Load layers from spreadsheet files (*.ods, *.xls, *.xlsx).

As at the time of writing, QGIS has no built-in support for Microsoft Excel spreadsheet files (.xls or .xlsx). Fortunately, CampToCamp developed a plugin named "Spreadsheet Layers" to fill this gap.



Search for and install the plugin as usual. It will then be available under the Layer >> Add Layer menu.



Enjoy!

Tuesday, February 2, 2021

Writing multiple spreadsheet files to worksheets

Here we have 37 spreadsheet files within a folder, and I want all the files to be in a single spreadsheet file, with each file on a separate worksheet.

This requirement is different from merging the files into a single Excel worksheet. What is required here is to have each file as a worksheet within one Excel file, as seen below.


The code is as follow:-

It makes use of the pandas ExcelWriter method. The parameter "options={'strings_to_urls': False}" is set to allow writing cell values that look like URLs and are longer than 255 characters (xlsxwriter would otherwise try to convert them to hyperlinks).


import glob
import pandas as pd
folder = r"C:\Users\Yusuf_08039508010\Documents\Distinguished Senators"

senators_files = glob.glob(folder + '/*.xlsx')
len(senators_files)


# Writing multiple dataframes to worksheets...
writer = pd.ExcelWriter('DistinguishedSenators.xlsx', engine='xlsxwriter', options={'strings_to_urls': False})

for sheet in senators_files:
    print("Writting sheet...", sheet)
    
    sheetname = sheet.split('\\')[-1].split('.')[0]
    
    sheet_df = pd.read_excel(sheet)
    sheet_df = sheet_df.head(-1)
    
    print(sheet_df.shape)
    
    sheet_df.to_excel(writer, sheet_name=sheetname, index=None) # Save each df to excel

writer.save()

Related Materials

1) How to Write Pandas DataFrames to Multiple Excel Sheets

2) Example: Pandas Excel with multiple dataframes

Wednesday, January 27, 2021

Running a custom python function in QGIS

In this post, I will explain how custom Python functions are made in QGIS.

Functions in QGIS are listed with the following categories:-

Aggregates Functions
Array Functions
Color Functions
Conditional Functions
Conversions Functions
Custom Functions
Date and Time Functions
Fields and Values
Files and Paths Functions
Form Functions
Fuzzy Matching Functions
General Functions
Geometry Functions
Layout Functions
Map Layers
Maps Functions
Mathematical Functions
Operators
Processing Functions
Rasters Functions
Record and Attributes Functions
Relations
String Functions
User Expressions
Variables
Recent Functions


So, if QGIS has all these functions, why would one ever need a custom function?

The answer is simple: these are not all the functions in the world. Most likely, there is a function that hasn't been implemented yet, and in that case you can write your own custom function.

Let's look at a simple scenario.

Assuming we have this little Python script that generates hex color codes like these: #40B994, #13E7BC, #3F50EB, #E28326, etc., and we want to use it to generate a new attribute column with random hex color codes.

from random import choice

def color_func():
    # The sixteen hex digits to choose from...
    hex_digits = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, "A", "B", "C", "D", "E", "F"]

    # Pick six random digits and join them into a color code...
    hex_list = [str(choice(hex_digits)) for x in range(6)]

    hex_string = '#' + ''.join(hex_list)
    return hex_string
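Note that to actually register the function with the field calculator, it must be entered in the Function Editor tab of the expression dialog, wrapped with the @qgsfunction decorator; QGIS passes the extra feature and parent arguments automatically. A minimal sketch of that wrapper:

from random import choice

from qgis.core import qgsfunction

@qgsfunction(args='auto', group='Custom')
def color_func(feature, parent):
    """Return a random hex color code, e.g. #40B994."""
    hex_digits = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, "A", "B", "C", "D", "E", "F"]
    return '#' + ''.join(str(choice(hex_digits)) for x in range(6))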

The function will then be called like this in the field calculator. Note that the function's name is: color_func()


Sunday, January 24, 2021

Convert cURL command to Python Request

cURL stands for Client URL (Uniform Resource Locator). It is a tool to transfer data from or to a server, using any of the supported protocols (HTTP, FTP, IMAP, POP3, SCP, SFTP, SMTP, TFTP, TELNET, LDAP or FILE).

A simple usage call is: curl http://umaryusuf.com

This will return the HTML content of the given web page, as seen below.



cURL is commonly used by service providers to demonstrate access to their APIs without depending on any particular programming language.

As an example, let's convert this sample API code from ScraperAPI in cURL format to Python requests format:-

curl "http://api.scraperapi.com?api_key=we709a6dkbask80kjbaskjoie2nsaqa7&url=http://httpbin.org/ip"


There are several ways to convert cURL to Python; the one I use most often is a tool by Nick Carneiro (https://curl.trillworks.com). So, copy and paste the cURL code to generate a Python version.


import requests

params = (
    ('api_key', 'we709a6dkbask80kjbaskjoie2nsaqa7'),
    ('url', 'http://httpbin.org/ip'),
)

response = requests.get('http://api.scraperapi.com/', params=params)

#NB. Original query string below. It seems impossible to parse and
#reproduce query strings 100% accurately so the one below is given
#in case the reproduced version is not "correct".
# response = requests.get('http://api.scraperapi.com?api_key=we709a6dkbask80kjbaskjoie2nsaqa7&url=http://httpbin.org/ip')

That is it!

Thursday, January 21, 2021

Assign color to vector layer based on HTML Notation (HEX color) codes in attribute table

Here, I have a polygon layer with a 'color' attribute column which contains HEX color codes, as seen below.


We want to assign each polygon its color value from the attribute table; for example: Delta = #6808C9.

This is close to 'Categorized Symbology', but the difference is that the colors come directly from the attribute column. To do this, we have to edit the 'Fill Color' expression to read the color column.
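In the GUI, this means opening the data-defined override for Fill Color and setting the expression to the field name, "color". If you prefer scripting it, here is a minimal PyQGIS sketch, assuming the layer uses a single symbol whose first symbol layer is a simple fill:

from qgis.core import QgsProperty, QgsSymbolLayer
from qgis.utils import iface

# Read each feature's fill color from its 'color' attribute...
layer = iface.activeLayer()
symbol = layer.renderer().symbol()
symbol.symbolLayer(0).setDataDefinedProperty(
    QgsSymbolLayer.PropertyFillColor,
    QgsProperty.fromField('color')
)
layer.triggerRepaint()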




This way, each polygon is given the HEX color code that corresponds to the value in its attribute column.




Happy Mapping!