Indexer: check or exclude external resources

Posted by Bouillou  · 28-11-2024 - 08:54

Hello,

Sometimes my customers upload templates containing external resources. When Indexer processes the template, it generates a warning because the external resources are not accessible.

I never want to use external resources.

Is it possible to check whether external resources are defined in the Word document?

Is it possible to exclude the use of external resources?

Best regards

Posted by admin  · 28-11-2024 - 09:38

Hello,

The OOXML standard handles all external (linked) images with the same tag and attributes, whether the target is an external URL or an external image file.

If you open a DOCX with inaccessible external resources in MS Word or other DOCX readers, they warn about the missing resources. The current stable version of Indexer works the same way: file_get_contents emits a PHP warning when an external resource is not accessible.

Please note that the information returned by Indexer (arrays) can be unset to filter sensitive information from the document if you need to display some information to external users.
You can use getDocxPathQueryInfo to check if a DOCX contains external image resources and removeWordContent to remove them:

$docx = new CreateDocxFromTemplate('template.docx');

// remove external (linked) images
$referenceNode = array(
    'customQuery' => '//w:drawing[.//@r:link]',
);
$docx->removeWordContent($referenceNode);
$docx->createDocx('template_new');
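To illustrate what the custom query above selects, here is a minimal standalone sketch that runs the same XPath expression with PHP's DOMXPath. The XML fragment is hand-written for illustration and is not a complete document.xml, and countLinkedDrawings is a hypothetical helper, not part of phpdocx. It assumes the usual OOXML convention that embedded pictures carry r:embed and linked pictures carry r:link.

```php
<?php
// Hand-written fragment for illustration only: one embedded and one linked picture.
$xml = <<<'XML'
<w:document
    xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
    xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main"
    xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">
  <w:body>
    <w:p><w:r><w:drawing><a:blip r:embed="rId7"/></w:drawing></w:r></w:p>
    <w:p><w:r><w:drawing><a:blip r:link="rId8"/></w:drawing></w:r></w:p>
  </w:body>
</w:document>
XML;

// Hypothetical helper: counts w:drawing elements that contain an r:link attribute.
function countLinkedDrawings(string $documentXml): int
{
    $dom = new DOMDocument();
    $dom->loadXML($documentXml);

    $xpath = new DOMXPath($dom);
    $xpath->registerNamespace('w', 'http://schemas.openxmlformats.org/wordprocessingml/2006/main');
    $xpath->registerNamespace('r', 'http://schemas.openxmlformats.org/officeDocument/2006/relationships');

    // Same expression as the customQuery in the reference node.
    return $xpath->query('//w:drawing[.//@r:link]')->length;
}

echo countLinkedDrawings($xml), PHP_EOL; // only the second drawing is linked
```

Only the drawing with r:link is matched; the embedded picture is left alone.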

We have added a task to the dev team to improve Indexer for this specific case.

To limit the files in the file system that PHP can access, you can set open_basedir (https://www.php.net/manual/en/ini.core.php#ini.open-basedir) in the PHP configuration.

Regards.

Posted by Bouillou  · 28-11-2024 - 10:53

Can you help me understand how to use getDocxPathQueryInfo to check whether external links exist?

$referenceNode = array(
    'type' => '*',
);
$queryInfo = $this->docx->getDOCXPathQueryInfo($referenceNode);

queryInfo returns only 

array:3 [
  "elements" => DOMNodeList {#2300
    +length: 35
  }
  "length" => 35
  "query" => "//w:body/*[1=1]"
]

 

Posted by admin  · 28-11-2024 - 10:55

Hello,

Please check our previous reply; we have updated it. You can use the following DOCXPath reference node to get the linked image nodes:

$referenceNode = array(
    'customQuery' => '//w:drawing[.//@r:link]',
);

For example, to remove these nodes from the template:

$docx = new CreateDocxFromTemplate('template.docx');

// remove external (linked) images
$referenceNode = array(
    'customQuery' => '//w:drawing[.//@r:link]',
);
$docx->removeWordContent($referenceNode);
$docx->createDocx('template_new');

or to check if any exists:

$referenceNode = array(
    'customQuery' => '//w:drawing[.//@r:link]',
);
$queryInfo = $docx->getDOCXPathQueryInfo($referenceNode);
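As a small sketch of how the result could be turned into a yes/no check: the array dumped earlier in this thread has a "length" key, so testing that value should be enough. hasLinkedImages is a hypothetical helper name, and the arrays below are stand-ins for the real return value so the snippet runs without phpdocx.

```php
<?php
// Hypothetical helper: $queryInfo is the array returned by getDOCXPathQueryInfo().
// It relies only on the "length" key shown in the dumps in this thread.
function hasLinkedImages(array $queryInfo): bool
{
    return ($queryInfo['length'] ?? 0) > 0;
}

// Stand-in arrays with the same "length" and "query" keys as the real result.
$withLink    = ['length' => 1, 'query' => '//w:drawing[.//@r:link]'];
$withoutLink = ['length' => 0, 'query' => '//w:drawing[.//@r:link]'];

var_dump(hasLinkedImages($withLink));    // bool(true)
var_dump(hasLinkedImages($withoutLink)); // bool(false)
```

In real code you would pass the $queryInfo variable from the call above, for example to reject an uploaded template before it reaches Indexer.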

Regards.

Posted by Bouillou  · 28-11-2024 - 12:33

Using your code, I can see that the external link is removed:

Before removeWordContent

array:3 [
  "elements" => DOMNodeList {#2377}
  "length" => 1
  "query" => "//w:drawing[.//@r:link]"
]

After removeWordContent

array:3 [
  "elements" => DOMNodeList {#2535}
  "length" => 0
  "query" => "//w:drawing[.//@r:link]"
]

But after saving, the final document no longer has the image, as expected, yet it still contains the external link in "word\_rels\document.xml.rels", and Indexer still throws a warning.

<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships"><Relationship Id="rId8" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image" Target="file:///\\SVFHVA\Commun\.......

Posted by admin  · 28-11-2024 - 13:02

Hello,

removeWordContent removes the nodes from document.xml, not from the document.xml.rels file.
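Until Indexer handles this case, one possible workaround is to clean the leftover relationship entries yourself. The following is a minimal sketch using plain PHP DOM, not a phpdocx API: stripExternalImageRelationships is a hypothetical helper, the sample relationships and targets are made up, and it assumes the external entries carry TargetMode="External", which is how the Open Packaging Conventions mark external targets. Reading word/_rels/document.xml.rels from the DOCX and writing it back (for example with ZipArchive) is left out.

```php
<?php
const REL_NS   = 'http://schemas.openxmlformats.org/package/2006/relationships';
const IMG_TYPE = 'http://schemas.openxmlformats.org/officeDocument/2006/relationships/image';

// Hypothetical helper: removes image relationships flagged as external and
// returns the cleaned XML plus the Ids that were removed.
function stripExternalImageRelationships(string $relsXml): array
{
    $dom = new DOMDocument();
    $dom->loadXML($relsXml);

    $xpath = new DOMXPath($dom);
    $xpath->registerNamespace('rel', REL_NS);

    $query = sprintf('//rel:Relationship[@TargetMode="External" and @Type="%s"]', IMG_TYPE);

    $removedIds = [];
    foreach (iterator_to_array($xpath->query($query)) as $node) {
        $removedIds[] = $node->getAttribute('Id');
        $node->parentNode->removeChild($node);
    }

    return ['xml' => $dom->saveXML(), 'removedIds' => $removedIds];
}

// Made-up sample: rId7 is embedded, rId8 points outside the package.
$relsXml = <<<'XML'
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
  <Relationship Id="rId7" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image" Target="media/image1.png"/>
  <Relationship Id="rId8" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image" Target="file:///C:/example/logo.png" TargetMode="External"/>
</Relationships>
XML;

$result = stripExternalImageRelationships($relsXml);
print_r($result['removedIds']);
```

Only remove a relationship once no node in document.xml references its Id any more (that is, after removeWordContent has run); otherwise the document may no longer open correctly.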

For further support, please send a DOCX to contact[at]phpdocx.com and we'll test the same code with it.

Regards.

Posted by Bouillou  · 28-11-2024 - 14:19

OK, I sent the file and a script to reproduce the problem.