The provided data is an excel file as seen below. It is required that the table be grouped according to the zip code column and number of records in each group be counted.
# Read data table and remove empty rows...
f1 = r"C:\Users\path_to_file\fileName.xlsx"
# zipcode_df = pd.read_excel(f2)
zipcode_df = pd.read_csv(f1, encoding='latin1', on_bad_lines='skip')
zipcode_df.dropna(subset=['ZIP5'], inplace=True)
zipcode_df['ZIP5'] = zipcode_df['ZIP5'].astype(int)
zipcode_df.head(2)
Above lines will remove rows where the zip code column in empty. This will reduce the chances of getting misplaced groups.
import pandas as pd
f = r"C:\Users\path_to_file\fileName.xlsx"
zipcode_df = pd.read_excel(f)
# zipcode_df.head(2)
print(f'Number of unique zip codes: {len(zipcode_df['ZIP5'].unique())}')
# Group the dataframe by state column using groupby() method
group_df = zipcode_df.groupby('ZIP5')
# Generate list of the group zip codes (groupby keys)
group_keys = list(group_df.groups.keys())
# Loop over the keys and save each group to excel file
for s in group_keys:
# save_df = group_df.get_group('Abia')
save_df = group_df.get_group(s)
print(s, save_df.shape[0])
# make the file name, e.g: "Abia state.xlsx"
s_name = 'Colorado Zipcodes\\' + str(s) + '.xlsx'
save_df.to_excel(s_name, index=None)
# break
print('Done')
Further analysis can then continue from here.
Thank is it!
No comments:
Post a Comment