indexer

ADVANCED / PREMIUM

TRIAL

Parses Word documents and returns their contents.

Description

Indexer ( mixed $source [, array $options = array() ] )

This class parses the content of DOCX documents and returns an array with these contents:

body
- charts
- images
- fields
- links
- text
comments
- images
- links
- text
endnotes
- images
- links
- text
fonts
footers
- images
- links
- text
footnotes
- images
- links
- text
headers
- images
- links
- text
information
people
properties
- core
- custom
sections
signatures
sources
styles
- docDefaults
- heading
- numbering
- style
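
The returned array can be walked like any nested PHP array. As a rough sketch, the helper below counts the entries under each sub-key of one section. The name summarizeSection is hypothetical and not part of phpdocx, and the sample data is invented: it only mimics the key layout listed above, so the real shape of each entry may differ.

```php
<?php
// Hypothetical helper, not part of phpdocx: counts the entries found under
// each sub-key (images, links, text...) of one top-level section.
function summarizeSection(array $output, string $section): array
{
    $summary = array();
    if (!isset($output[$section]) || !is_array($output[$section])) {
        // Unknown or non-array section: nothing to summarize.
        return $summary;
    }
    foreach ($output[$section] as $key => $value) {
        // Sub-keys may hold arrays or scalar values; count accordingly.
        $summary[$key] = is_array($value) ? count($value) : 1;
    }
    return $summary;
}

// Invented sample data that only mimics the documented key layout.
$sample = array(
    'body' => array(
        'charts' => array(),
        'images' => array(),
        'fields' => array(),
        'links' => array('https://example.com'),
        'text' => array('First paragraph', 'Second paragraph'),
    ),
    'fonts' => array(),
);

print_r(summarizeSection($sample, 'body'));
```

With a real document, the array returned by getOutput() would take the place of $sample.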

Parameters

source

The path to the Word document to be parsed (Advanced and Premium licenses) or a DOCXStructure object (Premium licenses).

options

The possible keys and values of this array are:

Key	Type	Description
getExternalResourceContents	bool	Whether to get the contents of external resources. Defaults to true.
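
A minimal sketch of passing the options array, assuming the getExternalResourceContents key behaves as described in the table above. The document name is a placeholder, and the snippet needs the phpdocx classes to be available.

```php
<?php
require_once 'classes/Indexer.php';

// Assumption: setting the key to false skips fetching the contents of
// external resources (the default is true).
$options = array('getExternalResourceContents' => false);

// 'document.docx' is a placeholder path.
$indexer = new Indexer('document.docx', $options);
$output = $indexer->getOutput();

print_r($output['body']['images']);
```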

Return values

array or json

Exceptions

Error while trying to open the (base) template as a zip file.
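
Since the documented failure is about opening the file as a zip, one option is to check the file before handing it to Indexer. The sketch below uses PHP's ZipArchive class (zip extension) for that check. The function name isZipContainer is hypothetical, and the check only confirms that the file opens as a zip archive, not that it is a valid DOCX.

```php
<?php
// Hypothetical pre-check, not part of phpdocx: returns true only when the
// file at $path can be opened as a zip archive.
function isZipContainer(string $path): bool
{
    $zip = new ZipArchive();
    // ZipArchive::open() returns true on success or an integer error code.
    if ($zip->open($path) !== true) {
        return false;
    }
    $zip->close();
    return true;
}

// Demonstration with a temporary file that is clearly not a zip archive.
$tmp = tempnam(sys_get_temp_dir(), 'idx');
file_put_contents($tmp, 'plain text, not a zip');

var_dump(isZipContainer($tmp));

unlink($tmp);
```

A check like this does not replace error handling: wrapping the Indexer call in a try/catch block for \Exception remains a reasonable fallback for files that pass the check but still fail to parse.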

Code samples

Example #1:

require_once 'classes/Indexer.php';

// parse the document and get its contents as an array
$indexer = new Indexer('document.docx');
$output = $indexer->getOutput();

// text contents of the document body
print_r('body: ');
print_r($output['body']['text']);

print_r('comments: ');
print_r($output['comments']);

print_r('endnotes: ');
print_r($output['endnotes']);

print_r('fonts: ');
print_r($output['fonts']);

print_r('footers: ');
print_r($output['footers']);

print_r('footnotes: ');
print_r($output['footnotes']);

print_r('headers: ');
print_r($output['headers']['text']);

print_r('body links: ');
print_r($output['body']['links']);

print_r('body charts: ');
print_r($output['body']['charts']);

print_r('headers images: ');
print_r($output['headers']['images']);

print_r('core properties: ');
print_r($output['properties']['core']);

print_r('custom properties: ');
print_r($output['properties']['custom']);

Release notes

phpdocx 16.0:
- Linked image contents are only returned if the external linked resource is http or https.
- getExternalResources option.

phpdocx 15.5:
- support for DOCM, DOTM and DOTX files.

phpdocx 14.5:
- heading style names, input fields and merge fields.

phpdocx 13.5:
- extra information (variant, macros...).
- gets paragraph tags to avoid extra blank spaces when returning text contents.

phpdocx 12.0:
- signatures and sources.

phpdocx 11.0:
- online images.

phpdocx 10.0:
- link contents from rels files.
- alt text titles and description contents from images.

phpdocx 9.0:
- people, sections and styles.
- in-memory DOCX documents.

phpdocx 8.5:
- fonts.

phpdocx 7.0:
- image sizes.
- charts.
- document properties.

phpdocx 6.5:
- new method.