Monday, September 2, 2024

Extracting Image Geographic Metadata in Python

I have hundreds of pictures captured during a field trip. Each picture has its GPS (geographic) coordinates embedded in its metadata. The metadata is stored in the EXIF standard (read about other standards from this link). EXIF stands for Exchangeable Image File Format. It stores technical information about an image and how it was captured, such as exposure settings, capture time, GPS location and camera model.



The EXIF metadata can contain many attributes (approximately 270), as listed below. Which attributes are actually present depends on the camera's capabilities and the purpose of the picture.

InteropIndex, ProcessingSoftware, NewSubfileType, SubfileType, ImageWidth, ImageLength, BitsPerSample, Compression, PhotometricInterpretation, Thresholding, CellWidth, CellLength, FillOrder, DocumentName, ImageDescription, Make, Model, StripOffsets, Orientation, SamplesPerPixel, RowsPerStrip, StripByteCounts, MinSampleValue, MaxSampleValue, XResolution, YResolution, PlanarConfiguration, PageName, FreeOffsets, FreeByteCounts, GrayResponseUnit, GrayResponseCurve, T4Options, T6Options, ResolutionUnit, PageNumber, TransferFunction, Software, DateTime, Artist, HostComputer, Predictor, WhitePoint, PrimaryChromaticities, ColorMap, HalftoneHints, TileWidth, TileLength, TileOffsets, TileByteCounts, SubIFDs, InkSet, InkNames, NumberOfInks, DotRange, TargetPrinter, ExtraSamples, SampleFormat, SMinSampleValue, SMaxSampleValue, TransferRange, ClipPath, XClipPathUnits, YClipPathUnits, Indexed, JPEGTables, OPIProxy, JPEGProc, JpegIFOffset, JpegIFByteCount, JpegRestartInterval, JpegLosslessPredictors, JpegPointTransforms, JpegQTables, JpegDCTables, JpegACTables, YCbCrCoefficients, YCbCrSubSampling, YCbCrPositioning, ReferenceBlackWhite, XMLPacket, RelatedImageFileFormat, RelatedImageWidth, RelatedImageLength, Rating, RatingPercent, ImageID, CFARepeatPatternDim, BatteryLevel, Copyright, ExposureTime, FNumber, IPTCNAA, ImageResources, ExifOffset, InterColorProfile, ExposureProgram, SpectralSensitivity, GPSInfo, ISOSpeedRatings, OECF, Interlace, TimeZoneOffset, SelfTimerMode, SensitivityType, StandardOutputSensitivity, RecommendedExposureIndex, ISOSpeed, ISOSpeedLatitudeyyy, ISOSpeedLatitudezzz, ExifVersion, DateTimeOriginal, DateTimeDigitized, OffsetTime, OffsetTimeOriginal, OffsetTimeDigitized, ComponentsConfiguration, CompressedBitsPerPixel, ShutterSpeedValue, ApertureValue, BrightnessValue, ExposureBiasValue, MaxApertureValue, SubjectDistance, MeteringMode, LightSource, Flash, FocalLength, Noise, ImageNumber, SecurityClassification, ImageHistory, TIFF/EPStandardID, MakerNote, 
UserComment, SubsecTime, SubsecTimeOriginal, SubsecTimeDigitized, AmbientTemperature, Humidity, Pressure, WaterDepth, Acceleration, CameraElevationAngle, XPTitle, XPComment, XPAuthor, XPKeywords, XPSubject, FlashPixVersion, ColorSpace, ExifImageWidth, ExifImageHeight, RelatedSoundFile, ExifInteroperabilityOffset, FlashEnergy, SpatialFrequencyResponse, FocalPlaneXResolution, FocalPlaneYResolution, FocalPlaneResolutionUnit, SubjectLocation, ExposureIndex, SensingMethod, FileSource, SceneType, CFAPattern, CustomRendered, ExposureMode, WhiteBalance, DigitalZoomRatio, FocalLengthIn35mmFilm, SceneCaptureType, GainControl, Contrast, Saturation, Sharpness, DeviceSettingDescription, SubjectDistanceRange, ImageUniqueID, CameraOwnerName, BodySerialNumber, LensSpecification, LensMake, LensModel, LensSerialNumber, CompositeImage, CompositeImageCount, CompositeImageExposureTimes, Gamma, PrintImageMatching, DNGVersion, DNGBackwardVersion, UniqueCameraModel, LocalizedCameraModel, CFAPlaneColor, CFALayout, LinearizationTable, BlackLevelRepeatDim, BlackLevel, BlackLevelDeltaH, BlackLevelDeltaV, WhiteLevel, DefaultScale, DefaultCropOrigin, DefaultCropSize, ColorMatrix1, ColorMatrix2, CameraCalibration1, CameraCalibration2, ReductionMatrix1, ReductionMatrix2, AnalogBalance, AsShotNeutral, AsShotWhiteXY, BaselineExposure, BaselineNoise, BaselineSharpness, BayerGreenSplit, LinearResponseLimit, CameraSerialNumber, LensInfo, ChromaBlurRadius, AntiAliasStrength, ShadowScale, DNGPrivateData, MakerNoteSafety, CalibrationIlluminant1, CalibrationIlluminant2, BestQualityScale, RawDataUniqueID, OriginalRawFileName, OriginalRawFileData, ActiveArea, MaskedAreas, AsShotICCProfile, AsShotPreProfileMatrix, CurrentICCProfile, CurrentPreProfileMatrix, ColorimetricReference, CameraCalibrationSignature, ProfileCalibrationSignature, AsShotProfileName, NoiseReductionApplied, ProfileName, ProfileHueSatMapDims, ProfileHueSatMapData1, ProfileHueSatMapData2, ProfileToneCurve, ProfileEmbedPolicy, 
ProfileCopyright, ForwardMatrix1, ForwardMatrix2, PreviewApplicationName, PreviewApplicationVersion, PreviewSettingsName, PreviewSettingsDigest, PreviewColorSpace, PreviewDateTime, RawImageDigest, OriginalRawFileDigest, SubTileBlockSize, RowInterleaveFactor, ProfileLookTableDims, ProfileLookTableData, OpcodeList1, OpcodeList2, OpcodeList3, NoiseProfile, SpatialFrequencyResponse, SubjectLocation, ExposureIndex, CFAPattern, FlashEnergy

In our case, the EXIF attribute we are interested in is GPSInfo, which holds the latitude, longitude and altitude at which the picture was taken.

Let's extract this attribute using Python.

We are going to use the PIL library (installed as the Pillow package) to read the EXIF metadata into a pandas dataframe. Let's import the required modules.

import glob
import pandas as pd
from PIL import Image, ExifTags
from PIL.ExifTags import TAGS


The dictionary of all available EXIF metadata attributes can be accessed using ExifTags.TAGS, as follows:

{1: 'InteropIndex', 11: 'ProcessingSoftware', 254: 'NewSubfileType', 255: 'SubfileType', 256: 'ImageWidth', 257: 'ImageLength', 258: 'BitsPerSample', 259: 'Compression', 262: 'PhotometricInterpretation', 263: 'Thresholding', 264: 'CellWidth', 265: 'CellLength', 266: 'FillOrder', 269: 'DocumentName', 270: 'ImageDescription', 271: 'Make', 272: 'Model', 273: 'StripOffsets', 274: 'Orientation', 277: 'SamplesPerPixel', 278: 'RowsPerStrip', 279: 'StripByteCounts', 280: 'MinSampleValue', 281: 'MaxSampleValue', 282: 'XResolution', 283: 'YResolution', 284: 'PlanarConfiguration', 285: 'PageName', 288: 'FreeOffsets', 289: 'FreeByteCounts', 290: 'GrayResponseUnit', 291: 'GrayResponseCurve', 292: 'T4Options', 293: 'T6Options', 296: 'ResolutionUnit', 297: 'PageNumber', 301: 'TransferFunction', 305: 'Software', 306: 'DateTime', 315: 'Artist', 316: 'HostComputer', 317: 'Predictor', 318: 'WhitePoint', 319: 'PrimaryChromaticities', 320: 'ColorMap', 321: 'HalftoneHints', 322: 'TileWidth', 323: 'TileLength', 324: 'TileOffsets', 325: 'TileByteCounts', 330: 'SubIFDs', 332: 'InkSet', 333: 'InkNames', 334: 'NumberOfInks', 336: 'DotRange', 337: 'TargetPrinter', 338: 'ExtraSamples', 339: 'SampleFormat', 340: 'SMinSampleValue', 341: 'SMaxSampleValue', 342: 'TransferRange', 343: 'ClipPath', 344: 'XClipPathUnits', 345: 'YClipPathUnits', 346: 'Indexed', 347: 'JPEGTables', 351: 'OPIProxy', 512: 'JPEGProc', 513: 'JpegIFOffset', 514: 'JpegIFByteCount', 515: 'JpegRestartInterval', 517: 'JpegLosslessPredictors', 518: 'JpegPointTransforms', 519: 'JpegQTables', 520: 'JpegDCTables', 521: 'JpegACTables', 529: 'YCbCrCoefficients', 530: 'YCbCrSubSampling', 531: 'YCbCrPositioning', 532: 'ReferenceBlackWhite', 700: 'XMLPacket', 4096: 'RelatedImageFileFormat', 4097: 'RelatedImageWidth', 4098: 'RelatedImageLength', 18246: 'Rating', 18249: 'RatingPercent', 32781: 'ImageID', 33421: 'CFARepeatPatternDim', 33423: 'BatteryLevel', 33432: 'Copyright', 33434: 'ExposureTime', 33437: 'FNumber', 33723: 'IPTCNAA', 34377: 
'ImageResources', 34665: 'ExifOffset', 34675: 'InterColorProfile', 34850: 'ExposureProgram', 34852: 'SpectralSensitivity', 34853: 'GPSInfo', 34855: 'ISOSpeedRatings', 34856: 'OECF', 34857: 'Interlace', 34858: 'TimeZoneOffset', 34859: 'SelfTimerMode', 34864: 'SensitivityType', 34865: 'StandardOutputSensitivity', 34866: 'RecommendedExposureIndex', 34867: 'ISOSpeed', 34868: 'ISOSpeedLatitudeyyy', 34869: 'ISOSpeedLatitudezzz', 36864: 'ExifVersion', 36867: 'DateTimeOriginal', 36868: 'DateTimeDigitized', 36880: 'OffsetTime', 36881: 'OffsetTimeOriginal', 36882: 'OffsetTimeDigitized', 37121: 'ComponentsConfiguration', 37122: 'CompressedBitsPerPixel', 37377: 'ShutterSpeedValue', 37378: 'ApertureValue', 37379: 'BrightnessValue', 37380: 'ExposureBiasValue', 37381: 'MaxApertureValue', 37382: 'SubjectDistance', 37383: 'MeteringMode', 37384: 'LightSource', 37385: 'Flash', 37386: 'FocalLength', 37389: 'Noise', 37393: 'ImageNumber', 37394: 'SecurityClassification', 37395: 'ImageHistory', 37398: 'TIFF/EPStandardID', 37500: 'MakerNote', 37510: 'UserComment', 37520: 'SubsecTime', 37521: 'SubsecTimeOriginal', 37522: 'SubsecTimeDigitized', 37888: 'AmbientTemperature', 37889: 'Humidity', 37890: 'Pressure', 37891: 'WaterDepth', 37892: 'Acceleration', 37893: 'CameraElevationAngle', 40091: 'XPTitle', 40092: 'XPComment', 40093: 'XPAuthor', 40094: 'XPKeywords', 40095: 'XPSubject', 40960: 'FlashPixVersion', 40961: 'ColorSpace', 40962: 'ExifImageWidth', 40963: 'ExifImageHeight', 40964: 'RelatedSoundFile', 40965: 'ExifInteroperabilityOffset', 41483: 'FlashEnergy', 41484: 'SpatialFrequencyResponse', 41486: 'FocalPlaneXResolution', 41487: 'FocalPlaneYResolution', 41488: 'FocalPlaneResolutionUnit', 41492: 'SubjectLocation', 41493: 'ExposureIndex', 41495: 'SensingMethod', 41728: 'FileSource', 41729: 'SceneType', 41730: 'CFAPattern', 41985: 'CustomRendered', 41986: 'ExposureMode', 41987: 'WhiteBalance', 41988: 'DigitalZoomRatio', 41989: 'FocalLengthIn35mmFilm', 41990: 'SceneCaptureType', 41991: 
'GainControl', 41992: 'Contrast', 41993: 'Saturation', 41994: 'Sharpness', 41995: 'DeviceSettingDescription', 41996: 'SubjectDistanceRange', 42016: 'ImageUniqueID', 42032: 'CameraOwnerName', 42033: 'BodySerialNumber', 42034: 'LensSpecification', 42035: 'LensMake', 42036: 'LensModel', 42037: 'LensSerialNumber', 42080: 'CompositeImage', 42081: 'CompositeImageCount', 42082: 'CompositeImageExposureTimes', 42240: 'Gamma', 50341: 'PrintImageMatching', 50706: 'DNGVersion', 50707: 'DNGBackwardVersion', 50708: 'UniqueCameraModel', 50709: 'LocalizedCameraModel', 50710: 'CFAPlaneColor', 50711: 'CFALayout', 50712: 'LinearizationTable', 50713: 'BlackLevelRepeatDim', 50714: 'BlackLevel', 50715: 'BlackLevelDeltaH', 50716: 'BlackLevelDeltaV', 50717: 'WhiteLevel', 50718: 'DefaultScale', 50719: 'DefaultCropOrigin', 50720: 'DefaultCropSize', 50721: 'ColorMatrix1', 50722: 'ColorMatrix2', 50723: 'CameraCalibration1', 50724: 'CameraCalibration2', 50725: 'ReductionMatrix1', 50726: 'ReductionMatrix2', 50727: 'AnalogBalance', 50728: 'AsShotNeutral', 50729: 'AsShotWhiteXY', 50730: 'BaselineExposure', 50731: 'BaselineNoise', 50732: 'BaselineSharpness', 50733: 'BayerGreenSplit', 50734: 'LinearResponseLimit', 50735: 'CameraSerialNumber', 50736: 'LensInfo', 50737: 'ChromaBlurRadius', 50738: 'AntiAliasStrength', 50739: 'ShadowScale', 50740: 'DNGPrivateData', 50741: 'MakerNoteSafety', 50778: 'CalibrationIlluminant1', 50779: 'CalibrationIlluminant2', 50780: 'BestQualityScale', 50781: 'RawDataUniqueID', 50827: 'OriginalRawFileName', 50828: 'OriginalRawFileData', 50829: 'ActiveArea', 50830: 'MaskedAreas', 50831: 'AsShotICCProfile', 50832: 'AsShotPreProfileMatrix', 50833: 'CurrentICCProfile', 50834: 'CurrentPreProfileMatrix', 50879: 'ColorimetricReference', 50931: 'CameraCalibrationSignature', 50932: 'ProfileCalibrationSignature', 50934: 'AsShotProfileName', 50935: 'NoiseReductionApplied', 50936: 'ProfileName', 50937: 'ProfileHueSatMapDims', 50938: 'ProfileHueSatMapData1', 50939: 
'ProfileHueSatMapData2', 50940: 'ProfileToneCurve', 50941: 'ProfileEmbedPolicy', 50942: 'ProfileCopyright', 50964: 'ForwardMatrix1', 50965: 'ForwardMatrix2', 50966: 'PreviewApplicationName', 50967: 'PreviewApplicationVersion', 50968: 'PreviewSettingsName', 50969: 'PreviewSettingsDigest', 50970: 'PreviewColorSpace', 50971: 'PreviewDateTime', 50972: 'RawImageDigest', 50973: 'OriginalRawFileDigest', 50974: 'SubTileBlockSize', 50975: 'RowInterleaveFactor', 50981: 'ProfileLookTableDims', 50982: 'ProfileLookTableData', 51008: 'OpcodeList1', 51009: 'OpcodeList2', 51022: 'OpcodeList3', 51041: 'NoiseProfile', 37388: 'SpatialFrequencyResponse', 37396: 'SubjectLocation', 37397: 'ExposureIndex', 33422: 'CFAPattern', 37387: 'FlashEnergy'}

You can export this into a user-friendly spreadsheet format using the following lines of code:

df_exif_tags = pd.DataFrame([ExifTags.TAGS]).T
df_exif_tags.to_excel('EXIF.xlsx', index=True)

After opening a picture, we can call img._getexif().items() to access the EXIF metadata attributes actually present in it, like so:

images = glob.glob(r'Field Trip Map\30-08-2024_05-00-34_7579\*.jpg')

img = Image.open(images[0])
print(img._getexif().items())

Now putting it all together, we can use a dictionary comprehension to replace each numeric tag ID with its name and read all the available attributes of the picture.

images = glob.glob(r'Field Trip Map\30-08-2024_05-00-34_7579\*.jpg')

img = Image.open(images[0])
exifdata = { ExifTags.TAGS[k]: v for k, v in img._getexif().items() if k in ExifTags.TAGS }

print(exifdata)

From the result above, the items we need are these: exifdata['GPSInfo'][2], exifdata['GPSInfo'][4], exifdata['GPSInfo'][6]
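The numeric keys inside GPSInfo are tag IDs from the GPS section of the EXIF specification: 2 is GPSLatitude, 4 is GPSLongitude and 6 is GPSAltitude, while 1, 3 and 5 hold the matching reference values (N/S, E/W, and above/below sea level). As a minimal sketch, the helper below relabels those keys so the dictionary is easier to read. The small GPS_TAG_NAMES table and the sample values are written out by hand for illustration (Pillow ships a fuller table as ExifTags.GPSTAGS), and name_gps_keys is a hypothetical helper name.

```python
# Hand-written subset of the EXIF GPS tag IDs (illustrative only;
# Pillow's ExifTags.GPSTAGS holds the complete table).
GPS_TAG_NAMES = {
    1: 'GPSLatitudeRef',
    2: 'GPSLatitude',
    3: 'GPSLongitudeRef',
    4: 'GPSLongitude',
    5: 'GPSAltitudeRef',
    6: 'GPSAltitude',
}

def name_gps_keys(gps_info):
    """Return a copy of a GPSInfo dict keyed by tag name instead of tag ID."""
    # Unknown IDs are kept as-is rather than dropped.
    return {GPS_TAG_NAMES.get(k, k): v for k, v in gps_info.items()}

# Made-up sample shaped like exifdata['GPSInfo'] (not real field data).
sample_gps = {1: 'N', 2: (8.0, 29.0, 33.5), 3: 'E', 4: (8.0, 31.0, 12.25), 5: b'\x00', 6: 345.0}
named = name_gps_keys(sample_gps)
print(named['GPSLatitude'], named['GPSLatitudeRef'])
```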

So, let's clean them up into a pandas dataframe.

lat, long, alt = exifdata['GPSInfo'][2], exifdata['GPSInfo'][4], exifdata['GPSInfo'][6]
data = lat, long, alt

df_exifdata = pd.DataFrame([data], columns=['Latitude', 'Longitude', 'Altitude'])


Running the above script for all the images, the code will be as follows:

images = glob.glob(r'C:\Users\`HYJ7\Desktop\MSc NSUK\First Semester Classes\Field Trip Map\30-08-2024_05-00-34_7579\*.jpg')

df_exifdata_list = []
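for im in images:

    # Open image...
    img = Image.open(im)
    # _getexif() returns None when the file has no EXIF block...
    raw_exif = img._getexif() or {}
    exifdata = { ExifTags.TAGS[k]: v for k, v in raw_exif.items() if k in ExifTags.TAGS }

    # Skip images without a complete GPS position...
    gps = exifdata.get('GPSInfo', {})
    if not all(k in gps for k in (2, 4, 6)):
        print(f'Skipping {im}: no GPS coordinates found')
        continue

    # Extract needed attributes (2 = latitude, 4 = longitude, 6 = altitude)...
    lat, long, alt = gps[2], gps[4], gps[6]
    data = im, lat, long, alt

    # Construct df...
    df_exifdata = pd.DataFrame([data], columns=['Image', 'Latitude', 'Longitude', 'Altitude'])
    # Append df to list...
    df_exifdata_list.append(df_exifdata)

print('DONE...')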

# Save to file...
merge_df = pd.concat(df_exifdata_list).reset_index(drop=True)
merge_df.to_excel('Image_Geographic_Coordinates.xlsx', index=False)

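The code snippet above produces a table with one row per image, holding the file path, latitude, longitude and altitude. As you can tell, further cleaning is still needed on the latitude and longitude columns, which are stored as (degrees, minutes, seconds) values, before they are friendly to use in a GIS environment. You can either do it manually or write a script to handle it.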


The data-cleaning script is as follows:

# Convert to Degrees Minutes Seconds (DMS)
def helperFunc1(x):
    deg = x[0]
    minu = x[1]
    sec = x[2]
    return f'{deg}° {minu}\' {sec}"'

merge_df['Lat (DMS)'] = merge_df['Latitude'].apply( lambda x: helperFunc1(x) )
merge_df['Long (DMS)'] = merge_df['Longitude'].apply( lambda x: helperFunc1(x) )
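
The raw values are rationals, so the DMS string can carry long fractional seconds, and it has no hemisphere letter. A possible variant, sketched here on the assumption that the reference letter (GPSInfo keys 1 and 3, which the loop above does not collect) is also available, rounds the seconds and appends the letter. The name format_dms, its sec_digits parameter and the sample values are hypothetical.

```python
def format_dms(dms, ref, sec_digits=2):
    """Format a (degrees, minutes, seconds) triple with its hemisphere letter."""
    deg, minu, sec = (float(v) for v in dms)
    # ':g' drops a trailing '.0' but keeps fractional minutes if a camera stores them.
    return f"{deg:g}° {minu:g}' {sec:.{sec_digits}f}\" {ref}"

# Made-up sample coordinate (not real field data).
print(format_dms((8.0, 29.0, 33.5), 'N'))
```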


# Convert to Decimal Degree (DD)
def helperFunc2(x):
    # Use float(), not int(), so fractional minutes and seconds are not truncated
    deg = float(x[0])
    minu = float(x[1])/60
    sec = float(x[2])/3600
    result = deg + minu + sec
    return result

merge_df['Lat (DD)'] = merge_df['Latitude'].apply( lambda x: helperFunc2(x) )
merge_df['Long (DD)'] = merge_df['Longitude'].apply( lambda x: helperFunc2(x) )
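
The decimal degrees computed this way are always positive. GIS tools usually expect southern latitudes and western longitudes as negative numbers, and the hemisphere is stored separately in GPSInfo (key 1 for latitude, key 3 for longitude). A minimal sketch of a signed conversion follows; dms_to_signed_dd is a hypothetical helper name and the sample coordinates are made up.

```python
def dms_to_signed_dd(dms, ref):
    """Convert (degrees, minutes, seconds) plus a hemisphere letter to signed decimal degrees."""
    deg, minu, sec = (float(v) for v in dms)
    dd = deg + minu / 60 + sec / 3600
    # South and West are negative by convention.
    return -dd if ref in ('S', 'W') else dd

print(dms_to_signed_dd((8, 30, 0), 'N'))
print(dms_to_signed_dd((8, 30, 0), 'W'))
```

To use it on the table, the loop would also need to store gps[1] and gps[3] alongside the latitude and longitude.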


That is it!
