DataFrame – Split Excel Files

Last Updated on February 11, 2022 – 3 min read

Python allows you to load Excel files data into a DataFrame.

However, the DataFrame is stored in the machine memory. It is not visible in a workspace library with multiple tables as in a SQL IDE or in a SAS IDE.

How do you split a DataFrame into multiple Excel files?

Island – Photo taken by me

We will illustrate the DataFrame split in Excel workbooks through the following data table of supermarket products.

We have a database with all products in a store. Each product belongs to a category. These categories are beverages, dairy, nonfood, personal care.

ProductCategoryStock
WaterBeverages200
EggDairy75
TicketsNonfood200
CosmeticsPersonal care100
SoapPersonal care50
ShampooPersonal care70
TomatoProduce150
OrangeProduce250
MilkDairy300
CokeBeverages30

The purpose is to load the DataFrame, get the list of all categories, split data by product category, and store each DataFrame separately in an Excel workbook.

Store each DataFrame in a dictionary using the product category for the key. Use a for loop to extract each DataFrame in a separate Excel workbook.

The code below shows you how to slice your Dataframe into multiple DataFrames stored in a dictionary and extract them into an excel workbook.

The code contains two functions, create_workbook_sheets and fill_data_workbook, as seen in the DataFrame – Input & Output article to create an Excel workbook and fill the workbook with data. 

To run the code you need to import the following packages, Pandas for the DataFrame manipulation and Openpyxl for the Excel process.

The main procedure if __name__ == ‘__main__‘ run the code. 

import pandas as pd
import openpyxl as op

def create_workbook_sheets(name_file_to_save, path_file_to_save):
    """
    Procedure to create and save the final Excel workbook and sheets
    --------------
    : param name_file_to_save: name of the file to save for the workbook
    : param path_file_to_save: the path to save the file
    """

    # create a new excel workbook
    wb = op.Workbook()
    wb.title = name_file_to_save
    sh_data = wb.active
    sh_data.title = "Results"

    # zoom & gridlines
    sh_data.sheet_view.zoomScale = 80
    sh_data.sheet_view.showGridLines = True

    # save the workbook
    wb.save(filename=path_file_to_save)


def fill_data_workbook(path_file_to_save, df_data):
    """
    Procedure to fill workbook sheet with DataFrame.
    ----------
    :param path_file_to_save : name of the file to save
    :param df_data : the DataFrame to write
    """

    # write DataFrame on an Excel file
    wb_result = op.load_workbook(path_file_to_save)
    with pd.ExcelWriter(path_file_to_save, engine="openpyxl", mode="a") as writer:
        writer.book = wb_result
        # ExcelWriter uses writer.sheets to access sheets of the workbook
        writer.sheets = dict((ws.title, ws) for ws in writer.book.worksheets)
        df_data.to_excel(writer, "Results", startrow=0, startcol=0, index=False)
        writer.save()

    # initialisation sheet
    sh_result = wb_result["Results"]
    sh_result.row_dimensions[1].height = 90

    # save workbook
    wb_result.save(filename=path_file_to_save)


if __name__ == '__main__':

    # initialisation input file
    path_file_database = "/Users/Name/Documents/Python/Slicing DataFrame/List_products.xlsx"

    # initialisation dataframe
    print("--> Step : Loading DataFrame")
    df_data = pd.read_excel(path_file_database, sheet_name="DATA", index_col=None)

    # list distinct categories
    list_categ = df_data["Category"].unique()

    # slicing DataFrame in multiple DataFrame stored in a dictionary
    dict_df = {elem: pd.DataFrame for elem in list_categ}

    for key in dict_df.keys():
        dict_df[key] = df_data[:][df_data.Category == key]

    # export each dataframe to a new workbook
    print("--> Step : Export Result to Excel")
    path_folder_to_save = "/Users/Name/Documents/Python/Slicing DataFrame/"

    for key in dict_df.keys():
        name_file_to_save = key
        path_file_to_save = path_folder_to_save + name_file_to_save + ".xlsx"
        create_workbook_sheets(name_file_to_save, path_file_to_save)
        fill_data_workbook(path_file_to_save, dict_df[key])

Congrats! You finally get the following files in your folder.