Indexer returns data with random spacing
Topic closed:
Please note this is an old forum thread. Information in this post may be out of date and/or erroneous.
Every phpdocx version includes new features and improvements. Previously unsupported features may have been added to newer releases, or past issues may have been corrected.
We encourage you to download the current phpdocx version and check the Documentation available.

Posted by ordersvrc  · 14-07-2022 - 14:20

While using the Indexer class, my output occasionally has random spacing somewhere in the returned body. I can't seem to figure out why. There are no hidden characters in the original document, and I am not doing anything with the data after it is returned. Example:

 

docx:    The quick brown fox jumps over the lazy dog
parsed: The quick bro wn fox jumps over the lazy dog

 

Hoping someone has had similar issues or knows how I can debug the original document to figure out the cause.

Posted by admin  · 14-07-2022 - 16:09

Hello,

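The Indexer class returns text contents by querying the text (w:t) tags. A single word can be stored in one or more w:t tags. Indexer separates the contents of each w:t tag with a blank space, so a word that is split across two tags is returned with a space inside it.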
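
As a minimal sketch of this effect, the snippet below uses only PHP's built-in DOM classes, not phpdocx. The XML fragment is a hypothetical example in which "brown" is stored in two w:t tags; the real document may be split differently. Joining the tag contents with a blank space reproduces the stray space, while concatenating them restores the original sentence.

```php
<?php
// Hypothetical WordprocessingML fragment (an assumption for illustration):
// the word "brown" is split across two w:t tags.
$xml = <<<XML
<w:p xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:r><w:t xml:space="preserve">The quick bro</w:t></w:r>
  <w:r><w:t xml:space="preserve">wn fox jumps over the lazy dog</w:t></w:r>
</w:p>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);

$xpath = new DOMXPath($dom);
$xpath->registerNamespace('w', 'http://schemas.openxmlformats.org/wordprocessingml/2006/main');

// Collect the text of every w:t tag in document order.
$pieces = array();
foreach ($xpath->query('//w:t') as $node) {
    $pieces[] = $node->nodeValue;
}

// Separating each w:t with a blank space introduces the extra space.
$joinedWithSpace = implode(' ', $pieces);

// Concatenating the w:t contents of the paragraph keeps the word intact.
$joinedPlain = implode('', $pieces);

echo $joinedWithSpace, PHP_EOL;
echo $joinedPlain, PHP_EOL;
```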

If you want to get words from a DOCX, we recommend using the getWordContents method. With this method you can query by paragraph (or by other content types if needed) and get back all text contents. For example:

$docx = new CreateDocxFromTemplate('document.docx');

// get the reference of the nodes to be returned
$referenceNode = array(
    'type' => 'paragraph',
);

$contents = $docx->getWordContents($referenceNode);

print_r($contents);

You can also use a wildcard:

// get the reference of the nodes to be returned
$referenceNode = array(
    'type' => '*',
);

This method returns an array with the contents.
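
To check whether a given document stores a word in several w:t tags, the raw XML can be inspected directly. The following is only a sketch under stated assumptions: listTextRuns is a hypothetical helper, not part of phpdocx, and it relies on PHP's zip and DOM extensions being installed. It reads word/document.xml from the DOCX and returns the w:t contents grouped by paragraph.

```php
<?php
// Hypothetical helper (not a phpdocx function): returns, for each paragraph
// that contains text, the list of its w:t contents.
function listTextRuns($docxPath)
{
    $zip = new ZipArchive();
    if ($zip->open($docxPath) !== true) {
        throw new RuntimeException('Cannot open ' . $docxPath);
    }

    // The main document body is stored in word/document.xml.
    $xml = $zip->getFromName('word/document.xml');
    $zip->close();
    if ($xml === false) {
        throw new RuntimeException('word/document.xml not found');
    }

    $dom = new DOMDocument();
    $dom->loadXML($xml);

    $xpath = new DOMXPath($dom);
    $xpath->registerNamespace('w', 'http://schemas.openxmlformats.org/wordprocessingml/2006/main');

    $paragraphs = array();
    foreach ($xpath->query('//w:p') as $paragraph) {
        $runs = array();
        // Query the w:t tags relative to the current paragraph.
        foreach ($xpath->query('.//w:t', $paragraph) as $text) {
            $runs[] = $text->nodeValue;
        }
        if (count($runs) > 0) {
            $paragraphs[] = $runs;
        }
    }

    return $paragraphs;
}
```

Calling print_r(listTextRuns('document.docx')) would then show a split word as two neighbouring entries in the same paragraph, such as "The quick bro" followed by "wn fox jumps over the lazy dog". Paragraphs nested inside other paragraphs (for example in text boxes) are listed twice by this simple query.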

Regards.