Rebuild an XLSX file in Python

I ran into a compatibility issue with the formatting in an Excel file. When I tried to open the file in a Python program using Pandas, as one does, I got the following error: ValueError: Colors must be aRGB hex values.

Very old versions of xlrd (version 1.2.0) can get around the color problem and open the file anyway, but Pandas requires version 2.0.1 or newer. I got around the problem by using an old xlrd to open the file and rebuild the file with a current version of xlsxwriter so Pandas could open the resulting file.

It turns out xlrd has companion utilties for editing and writing files, but they can only write in Excel 2003 format. If you need more than 65,535 rows, which I often do, that won’t work. Making xlrd and xlsxwriter work together isn’t as straightforward and fast, but it’s doable.

Here’s the code.

# pip install xlrd, or otherwise install xlrd. specify v1.2.0
from xlrd import open_workbook
import xlsxwriter

print("Opening xlsx file")
rb = open_workbook('zz.xlsx')
wb = xlsxwriter.Workbook('zz1.xlsx')

for i in rb.sheet_names():
    print(i)
    sheet = rb.sheet_by_name(i)
    out_sheet = wb.add_worksheet(i)
    col = 0
    for row in range(sheet.nrows):
        for j in sheet.row_values(row):
            out_sheet.write(row, col, j)
            col += 1
        col = 0
wb.close()
rb.release_resources()

And that’s it. Now you can make pivot tables and stuff with Pandas.

I have no idea who else will run into this problem, but here’s a solution if you do. Substitute your own filenames in lines 6 and 7, or implement something cleaner to take command line arguments if you’d like.

If you found this post informative or helpful, please share it!