DataFrame – Input & Output

Last Updated on February 9, 2022 – 5 min read

In this article, you will discover how to load data as input from an Excel workbook and create a new Excel workbook as output to save your work with Python.

There are multiple ways to load an Excel workbook with Python depending on your file format.

The best way with Python to manipulate data is to use the DataFrame object. 

What is DataFrame in Python? A DataFrame is a data structure in two dimensions, a table of rows and named columns like an Excel spreadsheet or a SQL table. This object is present in multiple languages.

To manipulate DataFrame with Python, you need to use the Pandas library

To install pandas, you can use the pip install pandas command in your terminal, if the pip tool is installed on your machine. You can install the package in your conda environment if you use anaconda. Anaconda is a distribution for Python to simplify package management and deployment.

How do you import DataFrame into Excel? How do you read a pandas DataFrame in Excel?

Republic of Malta

Input – Load an Excel File ‘.xlsx’

To load an Excel file with data in a worksheet, you can use the read_excel method by specifying several arguments. 

The read_excel function supports the following extensions : xls, xlsx, xls, xlsb, odf, ods, odt. Excel workbooks are not opened. Data are automatically stored in the pandas DataFrame object. Be careful, do not have empty columns in your data table.

The example below demonstrates how you can use the read_excel function.

import pandas as pd

# input files - initialisation
path_file_database = "C:/Users/Name/INPUT_DATA.xlsx"

# loading dataframe
df_data = pd.read_excel(path_file_database, sheet_name="Data", index_col = None)

The df_data variable corresponds to the loading DataFrame containing your data from your excel sheet.

You can use the usecols argument to store one specified column in your DataFrame from your excel sheet.

# loading DataFrame
df_data = pd.read_excel(path_file_database, sheet_name="Data", index_col=None, uselcols="A:E")

You can consult the pandas documentation for more details on the read_excel function and his arguments.

Input – Load an Excel File ‘.csv’

To load an Excel file with a csv format, you can use the read_csv.

This function supports only the csv extension. Be careful to specify the separator of your data and the decimal separator.

import pandas as pd

# input files - initialisation
path_file_database = "C:/Users/Name/INPUT_DATA.xlsx"

# loading dataframe
df_data = pd.read_csv(path_file_database, sep=";", decimal=".", header=0)

Input – Load an Excel File ‘.csv’

Using pandas specific read functions you can also load many types of data such as : a SAS database with pd.read_sas, a json database with pd.read_json, a html database with pd.read_html, a xml database with pd.read_xml, a SQL database with pd.read_sql.


Output – Load an Excel File ‘.csv’

How do you convert a DataFrame to a worksheet?

As seen in the Run Script Python article, by using openpyxl library you can create an excel workbook. 

Below, is a function with two arguments: the name of the file to save and the path of the file to save to create a workbook with a sheet named “Data – Results”.

def create_workbook_sheets(name_file_to_save, path_file_to_save):
    """
    Procedure to create and save the final Excel workbook and sheets
    --------------
    : param name_file_to_save: name of the file to save for the workbook
    : param path_file_to_save: the path to save the file
    """

    # create a new excel workbook with one sheet
    wb = op.Workbook()
    wb.title = name_file_to_save
    sh_data = wb.active
    sh_data.title = "Data - Results"

   # zoom & gridlines
   sh_data.sheet_view.zoomScale = 90
   sh_data.sheet_view.showGridLines = True

    # save the workbook
    wb.save(filename = path_file_to_save)

The function can be called in your python script :

path_folder = "C:/Users/Name/"
name_file = "Data_Results"
create_workbook_sheets(name_file, path_folder)

Output – Write DataFrame in Excel workbook

Now, you have a new workbook. You need to write the result of your data analysis process in different Excel sheets.

The example below demonstrates how you can display a DataFrame object in a workbook

This function uses two arguments: the path of the excel workbook and the DataFrame. Use the ExcelWriter function of pandas. It allows you to write the DataFrame in an Excel sheet.

import openpyxl as op
import pandas as pd

def fill_data_workbook(path_file_to_save, df_data):
    """
    Procedure to fill workbook sheet of the final workbook with DataFrame.
    ----------
    :param path_file_to_save : name of the file to save
    :param df_data : the DataFrame to write
    """

    # write DataFrame on an Excel file
    wb_result = op.load_workbook(path_file_to_save)
    with pd.ExcelWriter(path_file_to_save, engine="openpyxl", mode="a") as writer:
        writer.book = wb_result
        # ExcelWriter uses writer.sheets to access sheets of the workbook
        writer.sheets = dict((ws.title, ws) for ws in writer.book.worksheets)
        df_data.to_excel(writer, "Data - Results)
        writer.save()
    
    # save workbook
    wb_result.save(filename = path_file_to_save)

Output – Layout sheet in Excel workbook

The example below demonstrates how you can layout your excel sheet using openpyxl

Load your workbook first and apply available functions on cells, ranges, rows, columns objects.

wb_result = op.load_workbook(path_file_to_save)

sh_data = wb_results["Data"]

# layout rows on sheet "Data"
last_row = sh_data.max_row
for row_range in range(1, last_row+1):
    sh_data.cell(row=row_range, column=1).fill = PatternFill(start_color="7030A0", end_color="7030A0", fill_type = "solid")
    sh_data.cell(row=row_range, column=1).font = Font(size=11, color="FFFFF", bold = True)

# layout named columns on sheet "Data"
for col_range in range(1, 4):
    sh_data.cell(row=1, column=col_range).fill = PatternFill(start_color="7030A0", end_color="7030A0", fill_type = "solid")
    sh_data.cell(row=1, column= col_range).font = Font(size=11, color="FFFFF", bold=True)
    sh_data.cell(row=1, column= col_range).alignment = Alignment(horizontal="center", vertical="center", wrap_text=True)

sh_data.row_dimensions[1].height=27
sh_data.column_dimensions["B"].width=10
sh_data.column_dimensions["B"].width=10

# save workbook
wb_result.save(filename=path_file_to_save)

Advertisement