

Encoding problems on heroku transforming html
Please note this is an old forum thread. Information in this post may be out of date and/or erroneous.
Every phpdocx version includes new features and improvements. Previously unsupported features may have been added to newer releases, or past issues may have been corrected.
We encourage you to download the current phpdocx version and check the Documentation available.

Posted by aventyret  · 27-06-2024 - 09:07

When I use embedHTML, it works great locally, but when deployed to Heroku the special characters come out garbled. We're using the same PHP version (8.1.11) on both.

$docx->embedHTML('<p>åäö testing chars ÅÄÖ</p>');

This works fine locally, but on Heroku it comes out as: Ã¥Ã¤Ã¶ testing chars Ã…ÄÖ
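That output matches the typical pattern of UTF-8 bytes being decoded as a single-byte charset such as ISO-8859-1: å is stored as the two bytes C3 A5, which, read as Latin-1, become the two characters Ã and ¥. A minimal reproduction outside phpdocx, assuming the mbstring extension is loaded (the helper name misreadUtf8AsLatin1 is made up for illustration):

```php
<?php
// Hypothetical helper: reinterpret the UTF-8 bytes of a string as ISO-8859-1,
// which is what a parser does when it assumes a single-byte charset.
function misreadUtf8AsLatin1(string $utf8): string
{
    // Every input byte is treated as one Latin-1 character and re-encoded
    // as UTF-8, so each two-byte character turns into two characters.
    return mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');
}

echo misreadUtf8AsLatin1('åäö'), PHP_EOL; // prints "Ã¥Ã¤Ã¶"
```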

Also, the document is damaged and needs to be repaired when I download and open it.

Unfortunately, Heroku doesn't allow ext-tidy, so I have added:

'forceNotTidy' => true

Since I'm adding well-formed, hard-coded HTML, I'm guessing this should be fine. It also works locally without Tidy.

Any ideas what this could be?

Posted by admin  · 27-06-2024 - 09:40

Hello,

We recommend using PHP Tidy when transforming HTML to DOCX.

If the server doesn't allow installing PHP Tidy, setting forceNotTidy to true avoids the "PHP Tidy is not available" exception. Please note that if you set forceNotTidy to true but PHP Tidy is available, it's still used internally.
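The selection logic described in this reply can be modelled as a small decision function. This is only a sketch of the described behavior, not phpdocx's internal code; the function name choosePhpdocxHtmlParser is hypothetical, and it assumes, as the reply indicates, that Tidy is used whenever the extension is available, regardless of forceNotTidy:

```php
<?php
// Hypothetical model of the behavior described in the reply; NOT phpdocx code.
function choosePhpdocxHtmlParser(bool $tidyAvailable, bool $forceNotTidy): string
{
    if ($tidyAvailable) {
        // Assumption from the reply: Tidy is used whenever it is installed.
        return 'tidy';
    }
    if (!$forceNotTidy) {
        // No Tidy and forceNotTidy not set: the reply says an exception is thrown.
        throw new RuntimeException('PHP Tidy is not available');
    }
    // forceNotTidy => true and no Tidy: the HTML goes through DOMDocument::loadHTML.
    return 'DOMDocument';
}

echo choosePhpdocxHtmlParser(extension_loaded('tidy'), true), PHP_EOL;
```

Under this model, a local machine with ext-tidy installed takes the Tidy path even with forceNotTidy set, while Heroku falls through to DOMDocument, which would explain why the same code behaves differently in the two environments.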

If forceNotTidy is enabled and PHP Tidy is not available, phpdocx uses DOMDocument to parse the HTML (via loadHTML). DOMDocument::loadHTML treats input that doesn't declare a charset as ISO-8859-1, so if your HTML contains UTF-8 content you need to declare UTF-8 as the content charset. You can set the charset in the imported HTML, for example:
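The effect of the charset hint can be checked with plain DOMDocument, independent of phpdocx. A minimal sketch (the helper name pTextViaLoadHtml is hypothetical); note that the exact fallback behavior without a hint depends on the libxml2 version PHP is built against, so the first result is described only as typically garbled:

```php
<?php
// Hypothetical helper: parse an HTML fragment with DOMDocument::loadHTML
// and return the text content of its first <p> element.
function pTextViaLoadHtml(string $html): string
{
    // Collect libxml warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $doc->getElementsByTagName('p')->item(0)->textContent;
}

$fragment = '<p>åäö testing chars ÅÄÖ</p>';

// No charset declared: libxml2 typically assumes ISO-8859-1 and mangles the text.
echo pTextViaLoadHtml($fragment), PHP_EOL;

// UTF-8 declared through the XML declaration prefix: the text survives intact.
echo pTextViaLoadHtml('<?xml encoding="utf-8" ?>' . $fragment), PHP_EOL;
```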

$docx->embedHTML('<?xml encoding="utf-8" ?><p>åäö testing chars ÅÄÖ</p>');

or convert the non-ASCII characters to numeric entities before the HTML is added:

$docx->embedHTML(mb_encode_numericentity('<p>åäö testing chars ÅÄÖ</p>', [0x80, 0x10FFFF, 0, ~0], 'UTF-8'));
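To see what the second option actually passes to embedHTML, the conversion can be run on its own. The convmap is the one from the reply ([start, end, offset, mask]), covering every code point from U+0080 up; the wrapper name toNumericEntities is hypothetical. Because the result is pure ASCII, DOMDocument decodes the entities correctly whatever charset it assumes:

```php
<?php
// Hypothetical wrapper around the call from the reply: every code point
// from U+0080 to U+10FFFF becomes a decimal numeric character reference.
function toNumericEntities(string $html): string
{
    return mb_encode_numericentity($html, [0x80, 0x10FFFF, 0, ~0], 'UTF-8');
}

$encoded = toNumericEntities('<p>åäö testing chars ÅÄÖ</p>');
echo $encoded, PHP_EOL;
// prints "<p>&#229;&#228;&#246; testing chars &#197;&#196;&#214;</p>"

// The ASCII-only string round-trips through loadHTML without a charset hint.
$doc = new DOMDocument();
$doc->loadHTML($encoded);
echo $doc->getElementsByTagName('p')->item(0)->textContent, PHP_EOL;
// prints "åäö testing chars ÅÄÖ"
```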

In both cases, please check that PHP mbstring is installed and enabled.
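Since the local and Heroku environments seem to differ mainly in which extensions are loaded (Tidy is unavailable on Heroku, and both fixes rely on mbstring), a small script run in both places can confirm the setup. A sketch with a hypothetical helper name, extensionReport; running php -m on the command line lists loaded modules as well:

```php
<?php
// Hypothetical helper: map each extension name to whether it is loaded.
function extensionReport(array $extensions): array
{
    $report = [];
    foreach ($extensions as $ext) {
        $report[$ext] = extension_loaded($ext);
    }
    return $report;
}

// Run locally and on the dyno, then compare the two outputs.
foreach (extensionReport(['mbstring', 'tidy', 'dom']) as $ext => $loaded) {
    printf("%-8s %s\n", $ext, $loaded ? 'loaded' : 'missing');
}
```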

Regards.

Posted by aventyret  · 28-06-2024 - 05:15

This did the trick, thank you. Adding the XML declaration to the HTML fixed it:

<?xml encoding="utf-8" ?>