Tag Archives: xlsx

Processing .docx and .xlsx files with Python

MS Office documents are probably one of the most inconvenient and poorly formalized data sources. It’s much better to keep all the data in specialized databases or at least in wiki. But in real life, MS Office documents, especially Excel and Word, are in active use in nearly every organization. Simply because it is a flexible and easy tool that anyone can use. That’s why it’s important to know an automated ways for processing such files.

Processing MS office files with python

You can easily edit .docx files without any libraries. Technically it’s just a zip archive. So, you can unzip it, make a replacement in the document.xml file and make a zip it again. It’s much better than dealing with old binary .doc files. But there are even more elegant ways.

Let’s says, we need to read data from .xlsx document and generate .docx files based on some existing template. To work with .xlsx files I will use openpyxl python library.

Continue reading