Friday, February 9, 2024

Extract images from PDF file

The Python snippet below uses Spire.PDF for Python to read a PDF file and save the images from every page into a folder.

Each file name follows the pattern Image-{x}_{index}.png. Here x is the zero-based page index, and index is a counter that starts at 0 on each page and increments for every image on that page. Together, the two numbers make every file name unique.
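To illustrate the naming scheme, the sketch below uses a hypothetical helper, image_file_name(), which is not part of Spire.PDF, together with made-up image counts. It shows that pairing the page number with the per-page counter gives each image its own name.

```python
def image_file_name(page: int, index: int) -> str:
    """Build a file name following the Image-{x}_{index}.png scheme."""
    return f"Image-{page}_{index}.png"


# Hypothetical example: page 0 holds 2 images, page 1 holds 3 images.
images_per_page = {0: 2, 1: 3}
names = [
    image_file_name(page, index)
    for page, count in images_per_page.items()
    for index in range(count)  # index restarts at 0 on every page
]
print(names)
# ['Image-0_0.png', 'Image-0_1.png', 'Image-1_0.png', 'Image-1_1.png', 'Image-1_2.png']
assert len(names) == len(set(names))  # no two images share a name
```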

import os

from spire.pdf.common import *
from spire.pdf import *

# Output folder for the extracted images; create it if it does not exist yet
output_dir = 'image_for_PDF'
os.makedirs(output_dir, exist_ok=True)

# Create a PdfDocument object
doc = PdfDocument()

# Load a PDF document
doc.LoadFromFile(r"dermatology-atlas-for-skin-color_compress.pdf")

# Loop over every page instead of hard-coding the page count (305 for this file)
for x in range(doc.Pages.Count):
    print('Processing page', x)
    # Get a specific page
    page = doc.Pages[x]

    # Extract the images on this page and save each one as a PNG.
    # index restarts at 0 on every page, so {x}_{index} keeps names unique.
    for index, image in enumerate(page.ExtractImages()):
        imageFileName = os.path.join(output_dir, f'Image-{x}_{index}.png')
        image.Save(imageFileName, ImageFormat.get_Png())

doc.Close()
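If you want to double-check what the script produced, the names can be parsed back into page numbers. The sketch below is a standard-library-only helper. The names NAME_PATTERN, images_per_page() and summarize_folder() are hypothetical, and the sketch assumes the image_for_PDF folder and naming scheme used above. Pages without any images simply do not appear in the counts.

```python
import re
from collections import Counter
from pathlib import Path

# Matches names written by the script, e.g. "Image-12_3.png" -> page 12, index 3
NAME_PATTERN = re.compile(r"^Image-(\d+)_(\d+)\.png$")


def images_per_page(file_names):
    """Count extracted images per page from Image-{x}_{index}.png names."""
    counts = Counter()
    for name in file_names:
        match = NAME_PATTERN.match(name)
        if match:  # skip files that do not follow the naming scheme
            counts[int(match.group(1))] += 1
    return counts


def summarize_folder(folder="image_for_PDF"):
    """Count images per page for the PNG files in the output folder."""
    return images_per_page(p.name for p in Path(folder).glob("*.png"))


# Example with a hypothetical list of file names
print(images_per_page(["Image-0_0.png", "Image-0_1.png", "Image-4_0.png", "notes.txt"]))
```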


The output for the PDF file dermatology-atlas-for-skin-color_compress.pdf is shown below:

That is it!
