Wednesday, December 21, 2016

Copying tabular data to Excel/CSV file in Python

I know there are several ways of doing this task. But if you have very large number of tables to be copied into several spreadsheets, then the best approach is to automate the process using a scripting language like python.

One of the best python modules for doing this is called: pandas (an easy-to-use data structures and data analysis tools for the Python programming language)

If you don't already have pandas install, run "pip install pandas" from your command window.


Copying tabular data from PDF, HTML, Words etc to Excel/CSV spreadsheet file


With few lines of python pandas script, you can get this done. Then if you have multiple tables to copy into spreadsheets, you can use loop statement to go over all the tables.

I will give a very simple example here. Lets assume you have web page that contains a table and you want to save that table in an excel/csv file, here is how to go about doing it with python pandas module.

Problem


To copy the "List of Latin American countries by population" from this wikipedia web page into a spreadsheet.




Solution


1- First open the web page and highlight/select the table and copy it to clipboard.
2- Enter the following script into a .py file

# IMPORT THE PANDAS MODULE
import pandas as pd

# CREATE A PANDAS DATAFRAME FROM CLIPBOARD (MAKESURE YOU HAVE SELECTED AND COPIED THE TABLE)
latin_america_df = pd.read_clipboard()

# SAVES THE TABLE IN A SPREADSHEET
latin_america_df.to_csv('lat_ame_cont.csv')

Script explanation: it just imported the pandas module and use the read_clipboard() method  to read the copied table into a data frame and then used the to_csv() method to write the data frame into a CSV spreadsheet.
3- Now, check you working directory for a copy of the table saved in a csv spreadsheet file.



That is all. Thank you for following the post.

No comments:

Post a Comment