Opened 4 years ago

Last modified 3 years ago

#2104 new bug

Impossible to post message in some less used languages

Reported by: leoalone
Owned by:
Priority: major
Milestone: unassigned
Component: BW Forum
Keywords: Forum messages utf8 translation groups
Cc:

Description

It is not possible to post, either in the forum or in personal messages, any character whose code point needs more than 16 bits. This affects some less used languages (those outside the Basic Multilingual Plane) and also some Chinese characters.
Any such character (represented in UTF-8 by a four-byte sequence starting with 11110xxx) cuts the message off at that point.

For example, the Chinese character 𠀫 would block the message.
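
To make the byte pattern concrete, here is a minimal Python sketch (not part of the rox code base) that encodes a string as UTF-8 and reports the offsets of lead bytes matching 11110xxx. The function name `four_byte_lead_offsets` is made up for illustration.

```python
def four_byte_lead_offsets(text):
    """Return byte offsets of UTF-8 lead bytes of the form 11110xxx.

    Each such lead byte starts a four-byte sequence, i.e. a character
    outside the Basic Multilingual Plane.
    """
    data = text.encode("utf-8")
    # Mask off the low three bits and compare with 11110000.
    return [i for i, b in enumerate(data) if b & 0xF8 == 0xF0]


# U+2002B is the character quoted in this report.
encoded = "\U0002002B".encode("utf-8")
print(encoded)                        # b'\xf0\xa0\x80\xab'
print(format(encoded[0], "08b"))      # 11110000
print(four_byte_lead_offsets("ab\U0002002Bc"))  # [2]
```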

Change History (5)

comment:1 Changed 4 years ago by lantti

This seems to be how the message and forum post text fields are set up in the database. If I pick an existing forum post or private message and edit the database directly, replacing the text with the last line of leo's report (the one with the strange character), the strange character and the rest of the text are dropped. I also receive a warning: "Warning: #1366 Incorrect string value: '\xF0\xA0\x80\xAB L...' for column 'message' at row 1"

Unfortunately I don't understand MySQL configuration, so I cannot say what needs to be changed or how big a change that would be.
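
As a cross-check, the bytes quoted in the warning can be decoded to confirm they are the reported character. The sketch below also models the observed truncation; `simulate_truncation` is a hypothetical helper that only imitates the symptom and is not how MySQL implements it.

```python
# Bytes copied from the MySQL warning text.
bad_bytes = b"\xF0\xA0\x80\xAB"
char = bad_bytes.decode("utf-8")
print(f"U+{ord(char):04X}")  # U+2002B


def simulate_truncation(text):
    """Imitate the symptom: keep only the text before the first
    character above U+FFFF (assumed model of the observed behaviour)."""
    for i, ch in enumerate(text):
        if ord(ch) > 0xFFFF:
            return text[:i]
    return text


print(repr(simulate_truncation("before \U0002002B after")))  # 'before '
```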

comment:2 Changed 4 years ago by lantti

  • Priority changed from critical to major

As far as I can see, all the three-byte UTF-8 characters work, and there are not really any useful characters in the 4-byte range (the character Leo supplied doesn't display correctly in Firefox or Chrome either; only IE seems to render it correctly by default). Unless somebody knows how to configure MySQL to handle this, would it be a good enough fix to just filter out all the 4-byte characters, since those are the only ones MySQL does not accept? In any case, I am downgrading this ticket to non-critical status, as I cannot see how this bug could be abused to cause harm or how it could hinder normal use.
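
The filtering idea could look roughly like the following Python sketch. It is an illustration only: rox itself is not written in Python, and the name `strip_four_byte` and the choice of U+FFFD as the default replacement are assumptions, not anything decided in this ticket.

```python
import re

# Matches every code point above U+FFFF, i.e. every character that
# needs four bytes in UTF-8.
_NON_BMP = re.compile("[\U00010000-\U0010FFFF]")


def strip_four_byte(text, replacement="\uFFFD"):
    """Replace each four-byte character so the rest of the message survives."""
    return _NON_BMP.sub(replacement, text)


print(strip_four_byte("a\U0002002Bb") == "a\uFFFDb")  # True
print(strip_four_byte("x\U0001F607", "") == "x")      # True
```

Replacing rather than deleting keeps a visible marker where a character was lost; passing an empty `replacement` removes the characters silently.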


comment:3 Changed 4 years ago by steinwinde

The character set used in MySQL is utf8 (database, table default, and column). If the column character set is changed to utf8mb4 (e.g. "alter table messages modify Message text character set utf8mb4"), text entered in form fields is no longer truncated. Instead, every byte of the special character is replaced by a question mark ("????").
To get rox to process and display these characters correctly, adjusting the column is not enough. The server probably also needs "character_set_server = utf8mb4". Even this wasn't enough in my testing, though.
Note that phpMyAdmin has its own problems with 4-byte UTF-8, and the mysql command line needs "set names utf8mb4;" before experimenting.

comment:4 Changed 3 years ago by leoalone

  • Component changed from unknown to BW Forum
  • Keywords Forum messages utf8 translation groups added

comment:5 Changed 3 years ago by leoalone

I found this solution, and it works on a test installation at home.
See:

http://planet.mysql.com/entry/?id=673860

It would be worth making this work at least on alpha 😇!
