Tuesday, March 9, 2021

Spread column-wise data to row-wise

 On the web, it is very common to find dataset displayed in column-wise manner as seen below.


As you will already noticed, each new record is separated by the company name in bold capital letters. So, lets add a sign to separate one record from the other. I used this "------------------------" sign, but you can use anything as long as it is unique and not part of the records itself.

So, the working data copied from the website is like this:-

ACCESS CREDIT MANAGEMENT, INC.
Tim Cullen, Attorney
11225 Huron Ln Ste 222
Little Rock, AR 72211-1861
United States
Phone: (501) 664-2922
Fax: (501) 664-3207
MAP Attorney
------------------------
CREDIT CONTROL CO., INC.
Bill Caldwell, President
Bill Caldwell, Ethics Contact
10201 W Markham St Ste 104
Little Rock, AR 72205-2180
United States
Phone: (501) 225-2050
Fax: (501) 225-2135
ACA Member since 1982
Line of Business: Third Party Collections
------------------------
THE MCHUGHES LAW FIRM, PLLC
Becky A. McHughes Esq., Attorney at Law
10810 Executive Center Dr
Danville Bldg Ste 312
Little Rock, AR 72211
United States
Phone: (501) 376-9131
Fax: (501) 374-9332
http://www.mchugheslaw.com
ACA Member since 2013
Line of Business: Law Firm
Line of Business: Third Party Collections
------------------------
THE MCHUGHES LAW FIRM, PLLC
Becky A. McHughes Esq., Attorney at Law
10809 Executive Center Dr
Danville Bldg Ste 312
Little Rock, AR 72204
United States
Phone: (501) 376-9131
Fax: (501) 374-9332
MAP Attorney
Lowell
------------------------
CENTRAL RESEARCH, INC.
Karena Holt, Vice President of Operations
Karena Holt, Ethics Contact
122 N. Bloominton Ste 1
Lowell, AR 72745
United States
Phone: (479) 419-5456
Fax: (479) 419-5460
http://www.central-research.com
ACA Member since 2016
Line of Business: Third Party Collections
------------------------
CENTRAL RESEARCH, INC.
Shane Taylor
106 N Bloomington
Ste S
Lowell, AR 72745-8988
United States
Phone: (479) 419-5456
MAP Attorney
Mabelvale
------------------------
FIRST COLLECTION SERVICES
Chris Dunkum, President
Chris Dunkum, Ethics Contact
10925 Otter Creek East Blvd
Mabelvale, AR 72103-1661
United States
Phone: (501) 455-1658
http://www.FCScollects.com
ACA Member since 1983
Line of Business: Outsourced First Party or Billing Company
Line of Business: Third Party Collections


What we really want is something like this:-

ACCESS CREDIT MANAGEMENT, INC. :: Tim Cullen, Attorney :: 11225 Huron Ln Ste 222 :: Little Rock, AR 72211-1861 :: United States :: Phone: (501) 664-2922 :: Fax: (501) 664-3207 :: MAP Attorney
------------------------
CREDIT CONTROL CO., INC. :: Bill Caldwell, President :: Bill Caldwell, Ethics Contact :: 10201 W Markham St Ste 104 :: Little Rock, AR 72205-2180 :: United States :: Phone: (501) 225-2050 :: Fax: (501) 225-2135 :: ACA Member since 1982 :: Line of Business: Third Party Collections
------------------------
THE MCHUGHES LAW FIRM, PLLC
Becky A. McHughes Esq., Attorney at Law :: 10810 Executive Center Dr :: Danville Bldg Ste 312 :: Little Rock, AR 72211 :: United States :: Phone: (501) 376-9131 :: Fax: (501) 374-9332 :: http://www.mchugheslaw.com :: ACA Member since 2013 :: Line of Business: Law Firm :: Line of Business: Third Party Collections
------------------------
THE MCHUGHES LAW FIRM, PLLC :: Becky A. McHughes Esq., Attorney at Law :: 10809 Executive Center Dr :: Danville Bldg Ste 312 :: Little Rock, AR 72204 :: United States :: Phone: (501) 376-9131 :: Fax: (501) 374-9332 :: MAP Attorney :: Lowell
------------------------
CENTRAL RESEARCH, INC. :: Karena Holt, Vice President of Operations :: Karena Holt, Ethics Contact :: 122 N. Bloominton Ste 1 :: Lowell, AR 72745 :: United States :: Phone: (479) 419-5456 :: Fax: (479) 419-5460 :: http://www.central-research.com :: ACA Member since 2016 :: Line of Business: Third Party Collections
From vertical arrangement to horizontal arrangement. The horizontal (row-wise) arrangement, works best in spreadsheet. We will have common column for the same records.

What we have (vertical/column-wise arrangement)


What we want (horizontal/row-wise arrangement)



The Code
There are several ways to do this. Here, we will see two approaches, the first will split the data into chucks of list based on the separator string. While the second approach will make use of the groupby method from itertools package.


Read the data into a list
f = r"C:\Users\Yusuf_08039508010\Desktop\All in One.xlsx"
df = pd.read_excel(f, sheet_name='Sheet3')

# Convert the column-wise data to list... asign the separator string to a variable
mylist = df['Data'].to_list() #.split('=======')
mysepstring = '------------------------'

df




Approach 1:
# The separator string "mysepstring" most be repeated at the end...
def list_splitz(baseList, sepString):    
    group = []
    for x in baseList:
        if x != sepString:
            group.append(x)
        elif group:
            yield group
            group = []
            
# print(list(list_splitz(mylist, mysepstring)))

df_list1 = list(list_splitz(mylist, mysepstring))

display(len(df_list1))
display(df_list1)


[['ACCESS CREDIT MANAGEMENT, INC.',
  'Tim Cullen, Attorney',
  '11225 Huron Ln Ste 222',
  'Little Rock, AR 72211-1861',
  'United States',
  'Phone: (501) 664-2922',
  'Fax: (501) 664-3207',
  'MAP Attorney'],
 ['CREDIT CONTROL CO., INC.',
  'Bill Caldwell, President',
  'Bill Caldwell, Ethics Contact',
  '10201 W Markham St Ste 104',
  'Little Rock, AR 72205-2180',
  'United States',
  'Phone: (501) 225-2050',
  'Fax: (501) 225-2135',
  'ACA Member since 1982',
  'Line of Business: Third Party Collections'],
 ['THE MCHUGHES LAW FIRM, PLLC',
  'Becky A. McHughes Esq., Attorney at Law',
  '10810 Executive Center Dr',
  'Danville Bldg Ste 312',
  'Little Rock, AR 72211',
  'United States',
  'Phone: (501) 376-9131',
  'Fax: (501) 374-9332',
  'http://www.mchugheslaw.com',
  'ACA Member since 2013',
  'Line of Business: Law Firm',
  'Line of Business: Third Party Collections'],
 ['THE MCHUGHES LAW FIRM, PLLC',
  'Becky A. McHughes Esq., Attorney at Law',
  '10809 Executive Center Dr',
  'Danville Bldg Ste 312',
  'Little Rock, AR 72204',
  'United States',
  'Phone: (501) 376-9131',
  'Fax: (501) 374-9332',
  'MAP Attorney',
  'Lowell'],
 ['CENTRAL RESEARCH, INC.',
  'Karena Holt, Vice President of Operations',
  'Karena Holt, Ethics Contact',
  '122 N. Bloominton Ste 1',
  'Lowell, AR 72745',
  'United States',
  'Phone: (479) 419-5456',
  'Fax: (479) 419-5460',
  'http://www.central-research.com',
  'ACA Member since 2016',
  'Line of Business: Third Party Collections'],
 ['CENTRAL RESEARCH, INC.',
  'Shane Taylor',
  '106 N Bloomington',
  'Ste S',
  'Lowell, AR 72745-8988',
  'United States',
  'Phone: (479) 419-5456',
  'MAP Attorney',
  'Mabelvale'],
 ['FIRST COLLECTION SERVICES',
  'Chris Dunkum, President',
  'Chris Dunkum, Ethics Contact',
  '10925 Otter Creek East Blvd',
  'Mabelvale, AR 72103-1661',
  'United States',
  'Phone: (501) 455-1658',
  'http://www.FCScollects.com',
  'ACA Member since 1983',
  'Line of Business: Outsourced First Party or Billing Company',
  'Line of Business: Third Party Collections']]

Now, we can convert the list to a dataframe and subsequently to spreadsheet file.
df1 = pd.DataFrame(df_list1)
df1.to_excel('CleanData.xlsx', index=None)



Approach 2:

# No need for the separator string "mysepstring" to be repeated at the end...
from itertools import groupby


df_list2 = [list(g) for k, g in groupby(mylist, lambda x: x != mysepstring) if k]
len(df_list2)

df2 = pd.DataFrame(df_list2)

df2.to_excel('CleanData2.xlsx', index=None)


df2




That is it!

No comments:

Post a Comment