Utf-8 issues
Topic closed:
Please note this is an old forum thread. Information in this post may be out of date and/or erroneous.
Every phpdocx version includes new features and improvements. Previously unsupported features may have been added to newer releases, or past issues may have been corrected.
We encourage you to download the current phpdocx version and check the Documentation available.

Posted by Squad internet  · 12-10-2012 - 14:08

Hi guys,

I'm having a serious issue here and I don't know how to fix it.
I narrowed it down quite a bit and it comes to the following:

I have an HTML entity (written with a space between & and uuml; because I can't get it to display properly here):
[code]
& uuml; => ü
[/code]

I want to write that to the document. I tried the following options, and both failed:



[code]
$docx->addText(html_entity_decode('& uuml;')); // fails
$docx->addText(utf8_encode('& uuml;')); // fails
[/code]

However, when I do the following:
[code]
echo utf8_encode('& uuml;');
// Copy & paste the result from my browser into the following code:
$docx->addText('ü');
[/code]

it works.

What is going on, what am I doing wrong?
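
For readers comparing the two failed calls: utf8_encode() only converts ISO-8859-1 bytes to UTF-8 and leaves an entity such as the one above untouched, while html_entity_decode() does decode it but, before PHP 5.4, returned ISO-8859-1 by default. A minimal sketch, assuming the entity is written without the forum's workaround space and that UTF-8 output is wanted:

```php
<?php
// The entity, written without the space used in the post above.
$entity = '&uuml;';

// Pass the target charset explicitly so the result does not depend on
// the PHP version or on the default_charset setting.
$decoded = html_entity_decode($entity, ENT_QUOTES | ENT_HTML5, 'UTF-8');

// Show the raw bytes: U+00FC ("ü") is the two bytes C3 BC in UTF-8.
echo bin2hex($decoded), "\n"; // c3bc
```

With the charset given explicitly, no extra utf8_encode() call should be needed on the decoded string.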

Posted by Squad internet  · 11-04-2013 - 12:13

Update:

I changed the default char-encoding in my php config to utf-8.
Now when I use:

[code]
$docx->addText(utf8_encode(html_entity_decode('& uuml;')));
[/code]

I get it working for this single char.
However, when I call the same encode/decode chain with the actual string I need to use, it doesn't work anymore ;(

[code]
$docx->addText(utf8_encode(html_entity_decode($db['textField'])));
[/code]

When I echo the $docx object and view the source, the encoding looks correct, but when I write it to a file and download it, it tells me there are "incorrect characters" in the document.

What is going on here? Does anyone know how to fix this?

Posted by Squad internet  · 11-04-2013 - 12:13

Yet another update:

It breaks on the ampersand sign (&).

[code]
$docx->addText('&'); // fails
[/code]
anyone?

Posted by Squad internet  · 11-04-2013 - 12:13

[code]
$docx->addText(str_replace('&', '& amp;', utf8_encode(html_entity_decode($db['textField']))));
[/code]

;( FML ;(
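
A possible alternative to the str_replace() workaround above is to let PHP escape all XML special characters at once. This is only a sketch: xmlEscape() is a hypothetical helper, not part of phpdocx, and it assumes the installed phpdocx version does not escape text itself (if it does, the text would end up escaped twice).

```php
<?php
// Hypothetical helper: escape &, <, > and quotes so the text is safe
// inside an XML document. ENT_XML1 requires PHP 5.4 or later.
function xmlEscape(string $text): string
{
    return htmlspecialchars($text, ENT_XML1 | ENT_QUOTES, 'UTF-8');
}

// Decode HTML entities first, then escape what XML cannot hold as-is.
$raw     = 'K&uuml;che &amp; Bad';
$decoded = html_entity_decode($raw, ENT_QUOTES | ENT_HTML5, 'UTF-8');
$safe    = xmlEscape($decoded);

echo xmlEscape('Fish & Chips <fresh>'), "\n"; // Fish &amp; Chips &lt;fresh&gt;
```

The order matters: decoding after escaping would turn the escaped ampersand straight back into a bare one.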

Posted by admin  · 11-04-2013 - 12:13

Hello,

If you're using a UTF-8 editor there shouldn't be any error. Please remember that Eclipse's default charset isn't UTF-8.
Another way to solve this error is to use the setEncodeUTF8 method included in the CreateDocx.inc class.

Regards.

Posted by Squad internet  · 11-04-2013 - 12:13

Hi,

Thanks for your reply.
My editor really is set to UTF-8 and my database columns are set to utf8, but some of the contents might not actually be UTF-8. In theory this shouldn't be a serious problem, as I should be able to encode it to UTF-8.

I think the issue here is PHP's utf8_encode() function. I believe it is rather buggy: I also had to change PHP's internal encoding to UTF-8 so that strings are stored in memory as UTF-8, because the encoding seemed to take place only while outputting to the browser and the encoded string did not reside in memory as a multi-byte character. I think utf8_encode() only encodes the characters that need it (e.g. HTML entities) or something like that; otherwise my issue wouldn't exist. I might try iconv() for the encoding at some point, but the issue is fixed for now.

It might be a nice addition to the tutorial section to explain how to change the encoding in PHP properly, as most scripts using this lib will work with database data that was collected through some kind of web form.
Encoding in PHP is a really dark corner of the language. I hope this will change soon, but for now...


Thanks for your help in this issue, if I have a better solution to the issue (better than a str_replace()), I will reply to this post.
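
On the mixed database content mentioned above: calling utf8_encode() on a string that is already UTF-8 encodes it a second time, which could explain why a blanket call fails on real data. A rough sketch of a more defensive conversion with iconv(), where toUtf8() is a hypothetical helper and ISO-8859-1 as the fallback charset is purely an assumption about the legacy data:

```php
<?php
/**
 * Hypothetical helper: return $text as UTF-8.
 * Valid UTF-8 passes through unchanged; anything else is assumed to be
 * in $fallback (a guess, adjust it to the real source charset).
 */
function toUtf8(string $text, string $fallback = 'ISO-8859-1'): string
{
    // With the /u modifier, preg_match() returns 1 only for valid UTF-8.
    if (preg_match('//u', $text) === 1) {
        return $text;
    }

    $converted = iconv($fallback, 'UTF-8', $text);
    if ($converted === false) {
        throw new RuntimeException('Could not convert text to UTF-8');
    }

    return $converted;
}

echo bin2hex(toUtf8("\xFC")), "\n";     // c3bc (Latin-1 "ü", converted)
echo bin2hex(toUtf8("\xC3\xBC")), "\n"; // c3bc (already UTF-8, unchanged)
```

The validity check is a heuristic: a Latin-1 string that happens to form valid UTF-8 byte sequences would be passed through unconverted.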

Posted by Squad internet  · 11-04-2013 - 12:13

Hi (again)

I figured out what the problem was ;)
I left out a single header before sending the file; that seems to have been the problem.

I forgot:
[code]
header('Content-Transfer-Encoding: binary');
[/code]

So all together, you'll need:

[code]
header('Content-type: application/vnd.openxmlformats-officedocument.wordprocessingml.document');
header('Content-Disposition: attachment; filename="report.docx"');
header('Content-Transfer-Encoding: binary');
[/code]

Problem solved.
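
The three headers listed in this post could be collected in one place, for example as below. This is a sketch only: docxDownloadHeaders() is a hypothetical helper, and it assumes the .docx file has already been written to disk under the given name.

```php
<?php
// Hypothetical helper: build the download headers from the post above
// for a given file name.
function docxDownloadHeaders(string $filename): array
{
    // Keep only the base name and drop double quotes so the quoted
    // filename parameter stays well-formed.
    $safeName = str_replace('"', '', basename($filename));

    return [
        'Content-Type: application/vnd.openxmlformats-officedocument.wordprocessingml.document',
        'Content-Disposition: attachment; filename="' . $safeName . '"',
        'Content-Transfer-Encoding: binary',
    ];
}

// Usage inside a web request (commented out so the sketch also runs on the CLI):
// foreach (docxDownloadHeaders('report.docx') as $line) { header($line); }
// readfile('report.docx');

print_r(docxDownloadHeaders('report.docx'));
```

The header() calls have to run before any other output is sent, otherwise PHP cannot add them to the response.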

Posted by admin  · 11-04-2013 - 12:13

Hello,

Thanks for posting your solution.

Regards.