DataFrame – Split Excel Files

Last Updated on February 11, 2022 – 3 min read

Python allows you to load Excel files data into a DataFrame.

However, the DataFrame is stored in the machine memory. It is not visible in a workspace library with multiple tables as in a SQL IDE or in a SAS IDE.

How do you split a DataFrame into multiple Excel files?

We will illustrate the DataFrame split in Excel workbooks through the following data table of supermarket products.

We have a database with all products in a store. Each product belongs to a category. These categories are beverages, dairy, nonfood, personal care.

Product	Category	Stock
Water	Beverages	200
Egg	Dairy	75
Tickets	Nonfood	200
Cosmetics	Personal care	100
Soap	Personal care	50
Shampoo	Personal care	70
Tomato	Produce	150
Orange	Produce	250
Milk	Dairy	300
Coke	Beverages	30

The purpose is to load the DataFrame, get the list of all categories, split data by product category, and store each DataFrame separately in an Excel workbook.

Store each DataFrame in a dictionary using the product category for the key. Use a for loop to extract each DataFrame in a separate Excel workbook.

The code below shows you how to slice your Dataframe into multiple DataFrames stored in a dictionary and extract them into an excel workbook.

The code contains two functions, create_workbook_sheets and fill_data_workbook, as seen in the DataFrame – Input & Output article to create an Excel workbook and fill the workbook with data.

To run the code you need to import the following packages, Pandas for the DataFrame manipulation and Openpyxl for the Excel process.

The main procedure if __name__ == ‘__main__‘ run the code.

import pandas as pd
import openpyxl as op

def create_workbook_sheets(name_file_to_save, path_file_to_save):
    """
    Procedure to create and save the final Excel workbook and sheets
    --------------
    : param name_file_to_save: name of the file to save for the workbook
    : param path_file_to_save: the path to save the file
    """

    # create a new excel workbook
    wb = op.Workbook()
    wb.title = name_file_to_save
    sh_data = wb.active
    sh_data.title = "Results"

    # zoom & gridlines
    sh_data.sheet_view.zoomScale = 80
    sh_data.sheet_view.showGridLines = True

    # save the workbook
    wb.save(filename=path_file_to_save)


def fill_data_workbook(path_file_to_save, df_data):
    """
    Procedure to fill workbook sheet with DataFrame.
    ----------
    :param path_file_to_save : name of the file to save
    :param df_data : the DataFrame to write
    """

    # write DataFrame on an Excel file
    wb_result = op.load_workbook(path_file_to_save)
    with pd.ExcelWriter(path_file_to_save, engine="openpyxl", mode="a") as writer:
        writer.book = wb_result
        # ExcelWriter uses writer.sheets to access sheets of the workbook
        writer.sheets = dict((ws.title, ws) for ws in writer.book.worksheets)
        df_data.to_excel(writer, "Results", startrow=0, startcol=0, index=False)
        writer.save()

    # initialisation sheet
    sh_result = wb_result["Results"]
    sh_result.row_dimensions[1].height = 90

    # save workbook
    wb_result.save(filename=path_file_to_save)


if __name__ == '__main__':

    # initialisation input file
    path_file_database = "/Users/Name/Documents/Python/Slicing DataFrame/List_products.xlsx"

    # initialisation dataframe
    print("--> Step : Loading DataFrame")
    df_data = pd.read_excel(path_file_database, sheet_name="DATA", index_col=None)

    # list distinct categories
    list_categ = df_data["Category"].unique()

    # slicing DataFrame in multiple DataFrame stored in a dictionary
    dict_df = {elem: pd.DataFrame for elem in list_categ}

    for key in dict_df.keys():
        dict_df[key] = df_data[:][df_data.Category == key]

    # export each dataframe to a new workbook
    print("--> Step : Export Result to Excel")
    path_folder_to_save = "/Users/Name/Documents/Python/Slicing DataFrame/"

    for key in dict_df.keys():
        name_file_to_save = key
        path_file_to_save = path_folder_to_save + name_file_to_save + ".xlsx"
        create_workbook_sheets(name_file_to_save, path_file_to_save)
        fill_data_workbook(path_file_to_save, dict_df[key])

Congrats! You finally get the following files in your folder.

Share this: