Opened 6 years ago

Closed 6 years ago

Last modified 6 years ago

#1809 closed improve feature (fixed)

Language codes should be iso639-1 throughout the system

Reported by: shevek Owned by: shevek
Priority: major Milestone: 1.6
Component: BW Internationalization Keywords:
Cc: jsfan

Description (last modified by shevek)

Some languages, such as Spanish and Catalan, use three-letter language codes instead of two-letter ones (esp instead of es, cat instead of ca). That wouldn't matter if the codes weren't also used to fetch alternate names from GeoNames, which fails when the language code isn't correct.
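A minimal sketch of the normalization this ticket asks for, using only the legacy codes named in this ticket (esp, cat, and basq from comment 1). The helper name `to_iso639_1` and the mapping table are hypothetical and are not part of the rox code base.

```python
# Legacy short codes mentioned in this ticket and their ISO 639-1 equivalents.
# The table is illustrative; the real list lives in the languages table.
LEGACY_TO_ISO639_1 = {
    "esp": "es",   # Spanish
    "cat": "ca",   # Catalan
    "basq": "eu",  # Basque (Euskera)
}


def to_iso639_1(short_code):
    """Return the ISO 639-1 code for a known legacy code, else the input unchanged."""
    code = short_code.strip()
    return LEGACY_TO_ISO639_1.get(code.lower(), code)


normalized = [to_iso639_1(code) for code in ("esp", "cat", "basq", "fr")]
```

Codes that are already correct pass through untouched, so the helper can be applied to every row without a prior check.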

Related:

Attachments (3)

fixrightsvolunteers.sql (1.6 KB) - added by shevek 6 years ago.
SQL script to update rightsvolunteers to match the updated language table
iso639fix.sql (31.7 KB) - added by shevek 6 years ago.
Patch languages and words table to use ietf tags.
signlangtrans.sql (21.6 KB) - added by shevek 6 years ago.
SQL to add sign languages to the words table

Change History (69)

comment:1 Changed 6 years ago by pablobd

I just discovered that the language code for Euskera (Basque) is "basq"; it should be "eu".

comment:2 Changed 6 years ago by pablobd

This is related to http://trac.bewelcome.org/ticket/1748 and is mostly a bug, since wrong ISO codes cause problems not only for the translators but also when showing profile translations (I will try to find or create a ticket for that).

comment:3 Changed 6 years ago by shevek

I don't think we need another bug report, as the fix for the GeoNames lookup would fix the translation problem as well.

comment:4 Changed 6 years ago by planetcruiser

  • Description modified (diff)

comment:5 Changed 6 years ago by shevek

  • Description modified (diff)

#1828 is also related, as the language recognition fails because basq isn't something that a web browser would send.
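To illustrate why recognition fails: a browser announces Basque as "eu" in its Accept-Language header, so a table keyed on "basq" never matches. The sketch below is a simplified matcher, not the rox implementation; `parse_accept_language` and `pick_language` are hypothetical names, and the English default mirrors the fallback behaviour described later in this ticket.

```python
def parse_accept_language(header):
    """Return the language tags of an Accept-Language header, best first."""
    weighted = []
    for position, part in enumerate(header.split(",")):
        tag, _, params = part.strip().partition(";")
        tag = tag.strip().lower()
        if not tag:
            continue
        quality = 1.0  # a tag without q= counts as q=1
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        if quality <= 0:
            continue  # q=0 means "not acceptable"
        weighted.append((-quality, position, tag))
    return [tag for _, _, tag in sorted(weighted)]


def pick_language(header, supported, default="en"):
    """Pick the first requested tag (or its primary subtag) that is supported."""
    for tag in parse_accept_language(header):
        if tag in supported:
            return tag
        primary = tag.split("-")[0]
        if primary in supported:
            return primary
    return default


header = "eu,es;q=0.8,en;q=0.5"
with_legacy_codes = pick_language(header, {"basq", "esp", "en"})
with_iso_codes = pick_language(header, {"eu", "es", "en"})
```

With the legacy codes the visitor ends up with English, the lowest-ranked of the three languages they asked for; with ISO 639-1 codes they get Basque.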

comment:6 Changed 6 years ago by shevek

I reviewed the whole language table and compared it to the iso 639-1 list in wikipedia.

The result is the attached SQL.

Three languages used a ShortCode that is reserved for another language. These languages were all missing.

comment:7 Changed 6 years ago by planetcruiser

as i already noted in #1884, we need a thorough review of the impact of this database update, before we can run it on the live db. suggestion: look at all occurrences of "ShortCode" in the rox code. i counted 132.

a list of templates or pages where ShortCodes are exposed to users is also needed, so we can decide if we need url redirects from legacy code to new code or similar sanitation.

comment:8 follow-up: Changed 6 years ago by shevek

  • Cc planetcruiser added
  • Owner set to shevek
  • Status changed from new to assigned

I looked at all occurrences of ShortCode in the code. That amounted to 158.

All translations (words table) are based on the ShortCode, therefore the script needs to update that table as well to match the new codes (already done).

All translations of profile items, forums and group descriptions are based on the ID of the language in the languages table and not the short code so these tables don't need to be touched.
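The coupling described above can be sketched as follows: because the words table stores the ShortCode itself, a rename has to touch both tables, while rows that reference the language by ID stay valid. This is an illustration against an in-memory SQLite database, not the attached iso639fix.sql; the column sets are reduced to the ones named in this ticket and the helper name is hypothetical.

```python
import sqlite3


def rename_short_code(conn, old, new):
    """Rename a short code in languages and words inside one transaction."""
    with conn:  # commits on success, rolls back if either UPDATE fails
        conn.execute(
            "UPDATE languages SET ShortCode = ? WHERE ShortCode = ?", (new, old)
        )
        conn.execute(
            "UPDATE words SET ShortCode = ? WHERE ShortCode = ?", (new, old)
        )


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE languages (id INTEGER PRIMARY KEY, ShortCode TEXT UNIQUE, Name TEXT);
    CREATE TABLE words (id INTEGER PRIMARY KEY, Code TEXT, ShortCode TEXT, IdLanguage INTEGER);
    INSERT INTO languages VALUES (1, 'esp', 'Español');
    INSERT INTO words VALUES (1, 'WelcomeToSignup', 'esp', 1);
""")
rename_short_code(conn, "esp", "es")
```

The IdLanguage column is not modified, which is why the ID-based tables do not need to be touched.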

Links I found that are presented to the user and that use the shortcode are:

  • Languages list at the bottom
  • Forum post translations
  • Profile languages
  • Admin words

All these are autogenerated from the content of the languages table so would stay intact after an update.

While I still believe a redirect isn't necessary:

If we decide to ignore the languages 'Ewe', 'Sami', and 'Sinhala' we could probably forward from their ShortCode to the corresponding one (Estonian, Swedish and Slovak). Question would be where do we do that?

For the other wrong language codes the redirect wouldn't be problematic.
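A rough sketch of the distinction drawn here: a redirect from a legacy code is only unambiguous when that legacy code is not itself a valid code after the update. The ticket does not list the clashing codes, so the assignments below for ee, se and si are an assumption inferred from ISO 639-1 (where ee is Ewe, se is Northern Sami and si is Sinhala); `split_redirects` is a hypothetical helper.

```python
def split_redirects(legacy_to_new, new_codes):
    """Separate redirects that are unambiguous from those that clash with a new code."""
    safe, clashing = {}, {}
    for legacy, new in legacy_to_new.items():
        # A legacy code that is also a valid new code now means a different language.
        bucket = clashing if legacy in new_codes else safe
        bucket[legacy] = new
    return safe, clashing


# Assumed legacy -> new assignments; only esp, cat and basq are stated in the ticket.
LEGACY_TO_NEW = {
    "esp": "es", "cat": "ca", "basq": "eu",
    "ee": "et",  # assumed: Estonian stored under Ewe's code
    "se": "sv",  # assumed: Swedish stored under Northern Sami's code
    "si": "sk",  # assumed: Slovak stored under Sinhala's code
}
NEW_CODES = {"es", "ca", "eu", "et", "sv", "sk", "ee", "se", "si"}

safe, clashing = split_redirects(LEGACY_TO_NEW, NEW_CODES)
```

Under these assumptions esp, cat and basq could be forwarded without side effects, while forwarding ee, se or si would hide Ewe, Sami and Sinhala.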

comment:9 in reply to: ↑ 8 ; follow-up: Changed 6 years ago by planetcruiser

Replying to shevek:

I looked at all occurrences of ShortCode in the code. That amounted to 158.

wow, you are serious about this! impressive. :)

Links I found that are presented to the user and that use the shortcode are:

oh, cool, that list is much smaller than i thought. in detail:

  • Languages list at the bottom

if non-existent code is used (e.g. /rox/in/blabla/main), english is loaded -> not critical

  • Forum post translations

if non-existent code is used (e.g. /rox/in/blabla/forums/s3189), english is loaded -> not critical

  • Profile languages

if non-existent code is used (e.g. /members/jsfan/bla), english is loaded -> not critical

  • Admin words

only exposed to translators -> not critical

While I still believe a redirect isn't necessary:

If we decide to ignore the languages 'Ewe', 'Sami', and 'Sinhala' we could probably forward from their ShortCode to the corresponding one (Estonian, Swedish and Slovak). Question would be where do we do that?

since english and no 404 is shown as i anticipated, this is not critical.

For the other wrong language codes the redirect wouldn't be problematic.

same here. loading english instead of another language can be neglected. no redirects needed.

while your review revealed what you probably anticipated, i think it was needed to be sure we don't break anything critical. thanks! i bet it took quite a while..

i think the db update can be scheduled for a milestone now. to be on the safe side we must create a dump of the affected tables just before we upgrade (in addition to last night's backup). and *maybe* only allow access to the website from a couple of ip addresses while doing smoke tests. just to ensure that if data corruption occurs, we can roll back without losing live user data.

comment:10 in reply to: ↑ 9 Changed 6 years ago by shevek

Replying to shevek:

I looked at all occurrences of ShortCode in the code. That amounted to 158.

wow, you are serious about this! impressive. :)

I even bought a bigger screen with a high resolution to ease the thing ;-)

while your review revealed what you probably anticipated, i think it was needed to be sure we don't break anything critical. thanks! i bet it took quite a while..

Luckily it wasn't that bad.

i think the db update can be scheduled for a milestone now. to be on the safe side we must create a dump of the affected tables just before we upgrade (in addition to last night's backup). and *maybe* only allow access to the website from a couple of ip addresses while doing smoke tests. just to ensure that if data corruption occurs, we can roll back without losing live user data.

How would we do that?

comment:11 Changed 6 years ago by pablobd

I'm trying to add a word in "bahasa malaysia" (code: zlm) but I get the error "System error, please report the following timestamp along the error: [1358098415]".

comment:12 Changed 6 years ago by pablobd

the "rights" table stores the language code that translation volunteers are allowed to translate in the "scope" field, please correct this too when updating iso codes

comment:13 Changed 6 years ago by shevek

@pablobd: Thanks for the pointer. Do you give rights for each language or do you add them into the scope field with a delimiter?

Bahasa Melayu: The code in the DB is incorrect and I missed that while reviewing. Burmese used 'myb' and I couldn't update that to 'my', so I thought Burmese was in the table twice; it turns out 'my' was used for Malaysian, which should be 'ms'. I have therefore updated the iso639fix.sql file.

Which page did you use and what did you enter?

comment:14 follow-up: Changed 6 years ago by pablobd

Scope for translation rights can be "All" or a separate code for each language, separated by ";" and enclosed in double quotes, for example: "fr";"de";"en". Sometimes I see old rights using commas instead of semicolons, and I don't know if those work.

I tried to add the new language to the footer using the following procedure: http://www.bewelcome.org/wiki/Group_BeWelcomeTranslators#line92 on the page http://www.bewelcome.org/bw/admin/adminwords.php. I entered the wordcode "WelcomeToSignup", the translation in Malaysian and the language code. I tried many language codes from the Malay family :D and they all give the same error.

comment:15 Changed 6 years ago by shevek

I could reproduce the problem. If you try to use an unknown shortcode like 'zlm', the system tries to get the language from the languages table, which fails. Instead of showing a nice error message it just breaks.
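A minimal sketch of the friendlier behaviour, assuming the languages table has been loaded into a dictionary: the lookup raises a dedicated error with a readable message instead of failing somewhere deeper in the code. `UnknownLanguageError`, `find_language` and the sample data are hypothetical.

```python
class UnknownLanguageError(ValueError):
    """Raised when a short code is not present in the languages table."""


def find_language(languages, short_code):
    """Return the language name for short_code or raise a readable error."""
    try:
        return languages[short_code]
    except KeyError:
        raise UnknownLanguageError(
            f"Unknown language code '{short_code}'. "
            "Please pick a code that exists in the languages table."
        ) from None


LANGUAGES = {"en": "English", "de": "Deutsch"}  # sample rows only

try:
    find_language(LANGUAGES, "zlm")
    message = None
except UnknownLanguageError as error:
    message = str(error)  # this text can be shown to the translator
```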

Please try again with 'my'. That should add 'Bahasa Melayu, بهاس ملاي.' to the footer.

comment:16 Changed 6 years ago by shevek

Well, that obviously works in a way, but the link is split.

The reason is that Hebrew is directly next to it and the second part of the language name is also written right to left.

There are two ways to fix that: delete the right-to-left part, or change the FlagSortCriteria for Hebrew so that it ends up before Malaysian. I'd suggest doing that as a hotfix.

comment:17 Changed 6 years ago by shevek

From #1852: Don't use Norsk but translate to bokmål and nynorsk.

Someone who knows bokmål and nynorsk needs to figure out whether our translation labelled 'Norsk' is bokmål or nynorsk. The words table should then be updated accordingly, from 'no' to 'nb' or 'nn'.

comment:18 Changed 6 years ago by pablobd

Do we already have Brazilian Portuguese in the language list? Some member argues there was a version, but I can't remember. Are those translations still lying around?

comment:19 Changed 6 years ago by shevek

There is a 'Portuguese (br)' language in the language table but that used the ShortCode 'br' which is reserved for Breton.

ISO 639-1 doesn't differentiate between Portuguese and Brazilian Portuguese.

comment:20 Changed 6 years ago by shevek

Just checked the current words table. There are 1147 words translated to Brazilian Portuguese already in the words table.

Someone removed the translation for 'WelcomeToSignup', I suppose to hide it.

comment:21 Changed 6 years ago by pablobd

Can we add pt-br or do we just wait for welen?

Personally I don't see any rush to include every possible regionalization or we will end up with 20 Spanish variations :D

If we do it there must be a way to show the parent language in case the regionalization is not translated, e.g. show word in pt if pt-br has no translation
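The parent-language fallback asked for here could look roughly like this: strip subtags from the right until a translation is found, then fall back to English, which matches the default described earlier in this ticket. The function names and the in-memory `TRANSLATIONS` dictionary are hypothetical stand-ins for the words table.

```python
def fallback_chain(tag, default="en"):
    """Return tag, then each parent of tag, then the default language."""
    parts = tag.split("-")
    chain = ["-".join(parts[:length]) for length in range(len(parts), 0, -1)]
    if default not in chain:
        chain.append(default)
    return chain


def translate(word_code, tag, translations):
    """Return the first translation found along the fallback chain."""
    for candidate in fallback_chain(tag):
        text = translations.get((candidate, word_code))
        if text is not None:
            return text
    return word_code  # last resort: show the word code itself


# Sample data: no pt-BR entry exists, so pt is used for Brazilian visitors.
TRANSLATIONS = {
    ("pt", "Welcome"): "Bem-vindo",
    ("en", "Welcome"): "Welcome",
}
shown_to_brazilian_visitor = translate("Welcome", "pt-BR", TRANSLATIONS)
```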

comment:22 Changed 6 years ago by crumbking

Personally I suggest to simply follow: iso639-1. If there is no pt-br we shouldn't introduce it (again)

We should simply have a guideline we follow -> means iso639-1.

comment:23 Changed 6 years ago by shevek

I just read the article about Portuguese in the Wikipedia. Seems to me like someone from the US asking that we present the website in American English as well.

There is no specialized language code for the Brazilian variant of Portuguese in ISO 639-1 through 639-3.

comment:24 Changed 6 years ago by pablobd

Languages and regional variations are two different things.

For example, Spanish speakers don't talk the same in Mexico, Argentina or Spain.

Maybe we can say that for the moment we support "languages" as in the iso639 but we don't have support for regionalization, as in for every single variation

I think in the future we should have support for both

comment:25 in reply to: ↑ 14 Changed 6 years ago by shevek

Scope for translation rights can be "All" or a different code for each language ";" separated and between double quote example: "fr";"de";"en"

I don't know an automated way to fix the rights with a SQL script. Does anyone else? Otherwise we just have to get a SQL dump, fix it with search and replace, and reimport it when we migrate.
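One possible way to script the search-and-replace step outside SQL is sketched below. It assumes the scope format described in comment 14 (double-quoted codes separated by semicolons, with some legacy rows using commas) and rewrites the output with semicolons throughout, which is itself an assumption since it is unclear whether comma-separated scopes work. `migrate_scope` and the mapping are hypothetical and unrelated to the attached fixrightsvolunteers.sql.

```python
import re


def migrate_scope(scope, legacy_to_new):
    """Rewrite the language codes inside a translation-rights scope string."""
    if scope.strip().strip('"').lower() == "all":
        return scope  # "All" grants every language and needs no change
    migrated = []
    for raw in re.split(r"[;,]", scope):
        code = raw.strip().strip('"').strip()
        if not code:
            continue
        new_code = legacy_to_new.get(code, code)
        if new_code not in migrated:  # avoid duplicates after renaming
            migrated.append(new_code)
    return ";".join(f'"{code}"' for code in migrated)


LEGACY_TO_NEW = {"esp": "es", "cat": "ca", "basq": "eu"}
example = migrate_scope('"esp";"cat";"en"', LEGACY_TO_NEW)
```

The function would be applied to each scope value of a dump before reimporting it.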

comment:26 Changed 6 years ago by planetcruiser

what is the status here? we didn't run anything on the live db yet, right?

http://www.bewelcome.org/forums/s4604-Brazilian_portuguese_

however, i can select pt-BR in my prefs. no idea why it doesn't show up at the bottom of the page.

pt-BR not being in iso639 is a bit of a problem for a quick fix here. looks like we can't just change the short code then, but need another column to state pt-BR. are there other languages in our current languages table that don't have an iso639 code?

i vote for a new column, containing the standard codes used everywhere else, consisting of ISO 639-1 and ISO 3166-1 alpha-2 (IETF tag, http://en.wikipedia.org/wiki/IETF_language_tag) or maybe just ISO 3166-1 alpha-2, so we can build our own IETF tag using ShortCode and ISO 3166-1 alpha-2.
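A small sketch of the second option, composing the tag from the existing ShortCode plus an ISO 3166-1 alpha-2 region. It only checks the shape of the two parts (two letters each) and applies the usual casing convention of lowercase language and uppercase region; it does not verify that the codes are actually registered. `build_ietf_tag` is a hypothetical helper.

```python
import re

TWO_LETTERS = re.compile(r"[A-Za-z]{2}")


def build_ietf_tag(language, region=None):
    """Combine an ISO 639-1 code and an optional ISO 3166-1 alpha-2 code."""
    if not TWO_LETTERS.fullmatch(language):
        raise ValueError(f"not a two-letter language code: {language!r}")
    tag = language.lower()
    if region:
        if not TWO_LETTERS.fullmatch(region):
            raise ValueError(f"not a two-letter region code: {region!r}")
        tag += "-" + region.upper()
    return tag


brazilian_portuguese = build_ietf_tag("pt", "br")
plain_german = build_ietf_tag("de")
```

Legacy values such as "esp" are rejected by the shape check, which makes leftovers easy to spot.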

comment:27 Changed 6 years ago by shevek

@planetcruiser: Not done yet.

We have a ticket (#1852) that the list of preference languages should match the translated languages. Currently the list shows all entries from the languages table which leads to the faulty display of pt-BR.

@all: Could someone please review the attached SQL script and execute it and check the resulting language table against the list in the Wikipedia?

comment:28 Changed 6 years ago by shevek

  • Status changed from assigned to local_testing

Updated the SQL file once again. In light of the thread planetcruiser mentioned above I altered the languages and words tables to allow 5 characters for ShortCode and moved all translations for Brazilian Portuguese to pt-BR.

pt-BR gets directly enabled by a hack as well.

Please test this locally so that we can schedule the update soon.

comment:29 follow-up: Changed 6 years ago by mikael

Is this an OK ticket to also discuss language names?

It was pointed out in the forum that some people might not be able to read in the language they speak, so the language list dropdown should also show the language name in English.

comment:30 in reply to: ↑ 29 Changed 6 years ago by shevek

  • Description modified (diff)

Is this an OK ticket to also discuss language names?

No ;)

It was pointed out in the forum that some people might not be able to read in the language they speak, so the language list dropdown should also show the language name in English.

That's a good comment for #1748. I added it there.

comment:31 Changed 6 years ago by mikael

@Shevek thanks. I'm a bit lost with all these tickets, it's so many! :-)

comment:32 Changed 6 years ago by pablobd

related: please rename language as requested in http://www.bewelcome.org/forums/s4967

comment:33 Changed 6 years ago by sitatara

From a support ticket: "Please add language "Schwizerdütsch" (for Swiss German) to language list: http://en.wikipedia.org/wiki/Swiss_German. Some 4.5 Million people speak it as their mother tongue."

I checked the ISO 639-1 list and it's not in there but I definitely support the introduction of Swiss German as a separate language on BeWelcome, especially for the list of spoken languages, since Swiss German is completely unintelligible for other German speakers. By the way, I find the ISO 639-1 codes extremely limited (especially for an international network!). Is this really what we want to use in the long run?

comment:34 Changed 6 years ago by shevek

Schwyzerdütsch (de-CH) could be enabled while enabling Brazilian Portuguese. The only problem I see there is which version we support? According to the wikipedia article we have 45 dialects for this.

comment:35 Changed 6 years ago by shevek

Checking the article again, Schwyzerdütsch is not written at all, only spoken. So we would support Schriftdeutsch (Swiss Standard German) only. Don't know how to put that into the languages list, though :)

comment:36 Changed 6 years ago by shevek

Updated the SQL again to change Chinese to '中文'.

comment:37 Changed 6 years ago by jsfan

I believe de-CH is Swiss Standard German, which only differs from standard German in terms of vocabulary and the absence of ß. Otherwise there should be little to no difference in spelling.

While de-CH is not always intelligible to standard German speakers, Swiss German speakers do not usually have any problems understanding Standard German due to their common exposure to it.

comment:38 Changed 6 years ago by sitatara

My point was not to add a new version of the website in Swiss German but to have it in the list of spoken languages that a member can select (because the spoken language is very different). Maybe I chose the wrong ticket to report this? Not sure if this is covered by any existing ticket yet - there are so many concerning the languages that I often don't know which one covers what exactly.

comment:39 Changed 6 years ago by jsfan

  • Status changed from local_testing to to_alpha

Is there anything to deploy here?

comment:40 Changed 6 years ago by jsfan

  • Status changed from to_alpha to testing

comment:41 Changed 6 years ago by jsfan

  • Status changed from testing to needs_work

To be honest, I'm not sure if the ticket comes with any code changes? Really we only have to fix the db, right?

comment:42 Changed 6 years ago by shevek

The idea planetcruiser laid out was that we do the DB update out of sync with the releases and ensure we have backups of all affected tables. Especially the rights tables would need an update that isn't yet done.

So this should be done after 1.5. While at the beginning I thought we need a code update as well, that isn't the case.

comment:43 follow-up: Changed 6 years ago by pablobd

2 questions:

  1. Can we please add sign languages? http://www.evertype.com/standards/iso639/sgn.html . Not needed for translations but useful for user profiles.
  2. What about URLs referring to old language codes? Do we get redirects to new language codes, like */esp/ -> */es/?

comment:44 in reply to: ↑ 43 Changed 6 years ago by shevek

Replying to pablobd:

  1. Can we please add sign languages? http://www.evertype.com/standards/iso639/sgn.html . Not needed for translations but useful for user profiles.

Obviously that is for the profile languages and therefore another ticket. But shouldn't be a problem.

  2. What about URLs referring to old language codes? Do we get redirects to new language codes, like */esp/ -> */es/?

Do we need that? Inside of BW there aren't such links as far as I'm aware.

comment:46 Changed 6 years ago by shevek

Sorry, I wasn't clear there. What I meant is that all links to language specific settings are auto generated.

So if you reload your profile and switch to Spanish the link will be */es instead of */esp. Same for the forums.

I already discussed that with Meinhard in one of the earlier comments. We agreed that we don't need to redirect (and it would fail anyway as there are clashes in the language list).

comment:47 Changed 6 years ago by shevek

  • Cc jsfan added; planetcruiser removed

@jsfan: I'd need the rights table to be able to prepare the migration. Could you send me a DB dump please?

@pablo: How do we schedule this? While I work on the rights table no new translators can be added.

comment:48 Changed 6 years ago by pablobd

Just let me know when to stop and when to resume

comment:49 Changed 6 years ago by pablobd

regarding chinese language, can we change the name to 中文 instead of 中文, 汉语, 漢語 ?

or better: can we create two chinese instances, one for traditional alphabet and one for simplified?

zh-cn for simplified and zh-tw for traditional

简体中文 = simplified Chinese

繁體中文 = traditional Chinese

is it possible to clone the existing one so we don't have to start from zero?

you can refer to http://www.bewelcome.org/groups/60/forum/s4967 (group only)

comment:50 Changed 6 years ago by shevek

I add the two variants to the list.

For Brazilian Portuguese I already copied some translations around so it shouldn't be a problem to start with a copy of the current Chinese for both.
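Starting both variants from a copy of the current Chinese rows can be done with a single INSERT ... SELECT per target language. The sketch below runs against an in-memory SQLite database with a reduced schema; the Sentence column, the placeholder texts and the helper name are assumptions, and the tags are written as zh-CN and zh-TW to match the casing used for pt-BR.

```python
import sqlite3


def clone_translations(conn, source_code, target_code):
    """Copy every words row of source_code to target_code in one transaction."""
    with conn:  # commit on success, roll back on error
        row = conn.execute(
            "SELECT id FROM languages WHERE ShortCode = ?", (target_code,)
        ).fetchone()
        if row is None:
            raise ValueError(f"{target_code!r} is not in the languages table")
        conn.execute(
            "INSERT INTO words (Code, ShortCode, IdLanguage, Sentence) "
            "SELECT Code, ?, ?, Sentence FROM words WHERE ShortCode = ?",
            (target_code, row[0], source_code),
        )


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE languages (id INTEGER PRIMARY KEY, ShortCode TEXT UNIQUE, Name TEXT);
    CREATE TABLE words (id INTEGER PRIMARY KEY, Code TEXT, ShortCode TEXT,
                        IdLanguage INTEGER, Sentence TEXT);
    INSERT INTO languages VALUES (1, 'zh', '中文'), (2, 'zh-CN', '简体中文'), (3, 'zh-TW', '繁體中文');
    INSERT INTO words (Code, ShortCode, IdLanguage, Sentence) VALUES
        ('WelcomeToSignup', 'zh', 1, 'placeholder text A'),
        ('Languages', 'zh', 1, 'placeholder text B');
""")
for target in ("zh-CN", "zh-TW"):
    clone_translations(conn, "zh", target)
```

Translators can then correct each copy independently instead of starting from zero.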

comment:51 Changed 6 years ago by shevek

  • Milestone set to 1.6-proposed

comment:52 Changed 6 years ago by shevek

  • Status changed from needs_work to local_testing

Please test locally.

(one problem known is the wrong translation of brezhoneg).

comment:53 Changed 6 years ago by crumbking

Imported patch and worked locally. Quite some stuff changed so someone with a good eye should look over this sql patch before doing some magic on the live server. (And backup all tables)

Is there anything to do other than testing the SQL? The code is probably in another ticket?

comment:54 Changed 6 years ago by shevek

@crumbking: This ticket is only about the language update. So no, only check the result for this.

The full beauty of it is only visible through #1748, #1828 and #1852.

Changed 6 years ago by shevek

SQL script to update rightsvolunteers to match the updated language table

comment:55 Changed 6 years ago by toub

I have a bug locally, not sure if it is a problem with my local install or not.

To reproduce, simply try to add a language to your profile: the translation links of existing profiles translations disappear.

comment:56 Changed 6 years ago by shevek

Couldn't reproduce that behaviour.

Any additional hints about what might have gone wrong? A console log or something?

Changed 6 years ago by shevek

Patch languages and words table to use ietf tags.

comment:57 Changed 6 years ago by toub

Strange: I don't reproduce the bug anymore.

comment:58 Changed 6 years ago by shevek

  • Status changed from local_testing to to_alpha

comment:59 Changed 6 years ago by shevek

  • Status changed from to_alpha to testing

comment:60 Changed 6 years ago by jsfan

Breton appears labeled as Brazilian Portuguese for me.

comment:61 Changed 6 years ago by shevek

A member reported that 'Esperanto' should be in the list as 'Esperanto' and not as it is now as 'esperanton'. The following SQL query fixes this:

UPDATE languages SET Name = 'Esperanto' WHERE ShortCode = 'eo';

Changed 6 years ago by shevek

SQL to add sign languages to the words table

comment:62 Changed 6 years ago by shevek

Sign languages weren't translatable: https://www.gitorious.org/bewelcome/rox/commit/1524a198c2138a03d0c38cbf801d4109a0a04e38

Please execute attached SQL file signlangtrans.sql to add all sign languages to the words table.

comment:63 Changed 6 years ago by shevek

SQL query was executed and code is deployed on alpha. Please check.

comment:64 Changed 6 years ago by shevek

Some duplicates were created on the live DB that will be cleaned up by jsfan using this query:

DELETE words FROM words LEFT OUTER JOIN (SELECT MIN(Id) AS lid, IdLanguage, Code FROM words GROUP BY IdLanguage, Code) AS keeprows ON words.id = keeprows.lid WHERE keeprows.lid IS NULL;
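The query keeps, for every (IdLanguage, Code) pair, only the row with the lowest Id and deletes the rest. The same rule can be tried out safely on an in-memory SQLite database, as sketched below. SQLite has no multi-table DELETE, so the sketch uses an equivalent NOT IN formulation rather than the MySQL statement itself, and the sample rows are made up.

```python
import sqlite3

# Same rule as the live query: keep MIN(id) per (IdLanguage, Code), delete the rest.
DEDUPE_SQL = """
    DELETE FROM words
    WHERE id NOT IN (SELECT MIN(id) FROM words GROUP BY IdLanguage, Code)
"""

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE words (id INTEGER PRIMARY KEY, IdLanguage INTEGER, Code TEXT);
    INSERT INTO words VALUES
        (1, 1, 'WelcomeToSignup'),
        (2, 1, 'WelcomeToSignup'),
        (3, 2, 'WelcomeToSignup'),
        (4, 1, 'Languages'),
        (5, 1, 'WelcomeToSignup');
""")
with conn:
    conn.execute(DEDUPE_SQL)

remaining = [row[0] for row in conn.execute("SELECT id FROM words ORDER BY id")]
```

Rows 2 and 5 duplicate row 1 and are removed; row 3 stays because it belongs to a different language.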

Everything else works as expected. If we find a language code that isn't correct we need to deal with that later on. Closing as fixed.

comment:65 Changed 6 years ago by shevek

  • Resolution set to fixed
  • Status changed from testing to closed

comment:66 Changed 6 years ago by jsfan

Cleaned up live database.
