Tuesday, October 13, 2020

Extracting Hyperlink 'href' attribute from excel

On this page I will show you how to extract the link/url from the text in each cell of the excel sheet. The excel document (.xlsx file) is as seen below, contains hyperlink text. We are interested in the actual url embedded in the href link anchor.


We can access each link be right clicking on each cell and edit hyperlink as seen below;-


Doing this manually is tedious especially if we have to do it for many cells. So, here I will use a python module for manipulating excel files. This module is called: openpyxl.

The pseudocode is as follow:-

  • Read or Load the excel file
  • Read the active sheet or get the sheet by name
  • Access the sheet's column or cells

import openpyxl

# Read or Load the excel file...
work_book = openpyxl.load_workbook(r"C:\Users\Yusuf_08039508010\Desktop\Book1_gethref.xlsx")

# Read the sheet to use by name or active
work_sheet = work_book['Sheet1']
# work_sheet = work_book.active

# Access sheet cells...
print(work_sheet.cell(row=2, column=1).hyperlink.target)
print(work_sheet['A1'].value)
print(work_sheet['A2'].value)
print(work_sheet['A3'].value)


# Access sheet column...
print(work_sheet['A'])

# Loop over column and get the text and hyperlink...
for cell in work_sheet['A']:
    print(cell.value)
    try:
        print(cell.hyperlink.target)
    except AttributeError:
        pass

That is it!

No comments:

Post a Comment