

How to extract xml from .doc or .docx ?
Topic closed:
Please note this is an old forum thread. Information in this post may be out of date and/or erroneous.
Every phpdocx version includes new features and improvements. Previously unsupported features may have been added to newer releases, or past issues may have been corrected.
We encourage you to download the current phpdocx version and check the Documentation available.

Posted by kupa369  · 13-03-2024 - 07:32


I need a way to convert DOC and DOCX files to XML. Can I use phpdocx to convert DOC or DOCX to XML without any loss?

Posted by admin  · 13-03-2024 - 09:35

Hello,

Please note that a DOCX document contains XML files plus other optional files such as images, embedded XLSX files and binary files. A DOCX is a ZIP archive, so if you extract it you can view all the files it includes. It is not a single XML file but a package of many XML parts that follow the OOXML standard (https://en.wikipedia.org/wiki/Office_Open_XML).
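Because a DOCX is an ordinary ZIP package, its XML parts can also be inspected with any ZIP library, independently of phpdocx. The sketch below uses only Python's standard `zipfile` module; the helper names `list_xml_parts`, `read_part` and `extract_package` are hypothetical, and it assumes the input is a valid DOCX whose XML parts are UTF-8 encoded. It will not work on legacy binary .doc files, which are not ZIP packages.

```python
import zipfile


def list_xml_parts(docx_path):
    """Return the names of the XML parts (including .rels files) in a DOCX package."""
    with zipfile.ZipFile(docx_path) as package:
        return [
            name
            for name in package.namelist()
            if name.endswith(".xml") or name.endswith(".rels")
        ]


def read_part(docx_path, part_name="word/document.xml"):
    """Return one part of the package decoded as text.

    Assumes UTF-8, the usual encoding of OOXML parts.
    In a typical DOCX, word/document.xml holds the main document body.
    """
    with zipfile.ZipFile(docx_path) as package:
        return package.read(part_name).decode("utf-8")


def extract_package(docx_path, dest_dir):
    """Unzip every part (XML, images, embedded files...) into dest_dir."""
    with zipfile.ZipFile(docx_path) as package:
        package.extractall(dest_dir)
```

For a hypothetical `report.docx`, `list_xml_parts("report.docx")` would typically list parts such as `[Content_Types].xml`, `_rels/.rels` and `word/document.xml`, while `extract_package` also writes out non-XML parts such as images under `word/media/`.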

phpdocx includes methods to extract information and XML contents from DOCX documents, for example getDocxPathQueryInfo, getWordStyles and indexer, among others. You can also transform DOCX to HTML and other document formats.

Regards.

Posted by kupa369  · 14-03-2024 - 00:51


I asked the wrong question. I am dealing with Word 2003 documents (DOC). I want to convert DOC documents to XML without any loss. Is there no way to map DOC directly to XML, instead of converting DOC to DOCX and then extracting the XML?

Posted by admin  · 14-03-2024 - 08:38

Hello,

phpdocx doesn't include a direct conversion from DOC to XML. You first need to transform the DOC file to DOCX using transformDocument; then you can use the phpdocx methods to get its XML contents and information.

Regards.