How to extract XML from .doc or .docx?
Topic closed:
Please note this is an old forum thread. Information in this post may be out of date and/or erroneous.
Every phpdocx version includes new features and improvements. Previously unsupported features may have been added to newer releases, or past issues may have been corrected.
We encourage you to download the current phpdocx version and check the Documentation available.

Posted by admin  · 13-03-2024 - 09:35

Hello,

Please note that a DOCX document contains XML files as well as optional files such as images, XLSX files, and other binary files. A DOCX is a ZIP file, so if you extract it you can view the included files. A DOCX is not a single XML file but a set of XML files that follow the OOXML standard (https://en.wikipedia.org/wiki/Office_Open_XML).
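To illustrate the ZIP structure described above, here is a minimal sketch that uses only the Python standard library, not phpdocx. The helper names (build_sample_docx, list_xml_parts, read_part) are hypothetical and exist only for this example, and the sample package is a stripped-down stand-in rather than a valid Word document. It applies to .docx only; a legacy binary .doc file is not a ZIP archive and would not open this way.

```python
import io
import zipfile


def build_sample_docx():
    """Build a tiny in-memory ZIP laid out like a DOCX (illustration only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("[Content_Types].xml", "<Types/>")
        package.writestr("word/document.xml", "<w:document>Hello</w:document>")
        package.writestr("word/media/image1.png", b"\x89PNG")
    return buffer


def list_xml_parts(docx_source):
    """Return the names of the entries ending in .xml inside the package."""
    with zipfile.ZipFile(docx_source) as package:
        return [name for name in package.namelist() if name.endswith(".xml")]


def read_part(docx_source, part_name="word/document.xml"):
    """Return the raw bytes of one entry; the main body is usually word/document.xml."""
    with zipfile.ZipFile(docx_source) as package:
        return package.read(part_name)


if __name__ == "__main__":
    sample = build_sample_docx()  # a path to a real .docx file also works here
    for name in list_xml_parts(sample):
        print(name)
    print(read_part(sample).decode("utf-8"))
```

With a real document, passing its path instead of the in-memory sample should list entries such as word/document.xml and word/styles.xml, although the exact set depends on what the document contains.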

phpdocx includes methods to extract information and XML contents from DOCX documents, for example getDocxPathQueryInfo, getWordStyles and indexer, among others. You can also transform DOCX to HTML and other document formats.

Regards.