Encoding problems on Heroku transforming HTML
Topic closed:
Please note this is an old forum thread. Information in this post may be out of date and/or erroneous.
Every phpdocx version includes new features and improvements. Previously unsupported features may have been added to newer releases, or past issues may have been corrected.
We encourage you to download the current phpdocx version and check the available documentation.

Posted by admin  · 27-06-2024 - 09:40

Hello,

We recommend using PHP Tidy when transforming HTML to DOCX.

If the server doesn't allow installing PHP Tidy, setting forceNotTidy to true avoids the "PHP Tidy is not available" exception. Please note that if you set forceNotTidy to false and PHP Tidy is available, it's used internally.
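
A minimal sketch of that decision, assuming forceNotTidy is passed in the embedHTML() options array: the hypothetical helper embedHtmlOptions() adds forceNotTidy only when the tidy extension is not loaded, so servers that do have PHP Tidy keep the default behaviour.

```php
<?php
// Sketch only: embedHtmlOptions() is a hypothetical helper, not part of phpdocx.
// It assumes forceNotTidy is read from the embedHTML() options array.

function embedHtmlOptions(array $options = []): array
{
    if (!extension_loaded('tidy')) {
        // Without PHP Tidy, forceNotTidy avoids the "PHP Tidy is not available"
        // exception; phpdocx then parses the HTML with DOMDocument::loadHTML.
        $options['forceNotTidy'] = true;
    }
    return $options;
}

// Usage with phpdocx (assumes $docx is a CreateDocx instance):
// $docx->embedHTML($html, embedHtmlOptions());

var_dump(extension_loaded('tidy'), embedHtmlOptions());
```

Any other embedHTML() options can be passed in and are returned unchanged.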

If forceNotTidy is enabled and PHP Tidy is not available, phpdocx parses the HTML with DOMDocument (using loadHTML). In this case, if your HTML contains UTF-8 content, you need to declare UTF-8 as the content charset. You can set the charset in the imported HTML, for example:

$docx->embedHTML('<?xml encoding="utf-8" ?><p>åäö testing chars ÅÄÖ</p>');

or convert the non-ASCII characters to numeric entities before the HTML is added:

$docx->embedHTML(mb_encode_numericentity('<p>åäö testing chars ÅÄÖ</p>', [0x80, 0x10FFFF, 0, ~0], 'UTF-8'));

In both cases, please check that PHP mbstring is installed and enabled.
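
To see the effect of both approaches outside phpdocx, the following sketch feeds the same fragment to DOMDocument::loadHTML directly. firstParagraphText() is a hypothetical helper, and the dom and mbstring extensions are assumed to be enabled. With no charset hint, libxml2's HTML parser typically falls back to ISO-8859-1 and the accented characters come out garbled; the XML declaration and the numeric entities both keep them intact.

```php
<?php
// Sketch only: reproduces the DOMDocument parsing step without phpdocx.
// Requires the dom and mbstring extensions.

/**
 * Hypothetical helper: parse $markup with DOMDocument::loadHTML and
 * return the text of the first <p> element (DOM text is always UTF-8).
 */
function firstParagraphText(string $markup): string
{
    $dom = new DOMDocument();
    // Collect libxml warnings instead of printing them, then restore the setting.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($markup);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $p = $dom->getElementsByTagName('p')->item(0);
    return $p === null ? '' : $p->textContent;
}

$html = '<p>åäö testing chars ÅÄÖ</p>';

// No charset hint: the result depends on libxml2's default encoding guess.
$noHint = firstParagraphText($html);

// Approach 1: prepend an XML encoding declaration.
$withDeclaration = firstParagraphText('<?xml encoding="utf-8" ?>' . $html);

// Approach 2: turn every code point from 0x80 upwards into a numeric entity,
// so the markup passed to loadHTML is plain ASCII.
$asEntities = mb_encode_numericentity($html, [0x80, 0x10FFFF, 0, ~0], 'UTF-8');
$withEntities = firstParagraphText($asEntities);

echo $asEntities, PHP_EOL; // <p>&#229;&#228;&#246; testing chars &#197;&#196;&#214;</p>
echo $noHint, PHP_EOL;
echo $withDeclaration, PHP_EOL;
echo $withEntities, PHP_EOL; // åäö testing chars ÅÄÖ
```

The entity approach does not rely on any encoding guess, because the string handed to loadHTML contains only ASCII bytes.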

Regards.