Opened 6 years ago

Closed 6 years ago

#1884 closed improve feature (wontfix)

Add ISO 639-1 codes to languages

Reported by: planetcruiser Owned by:
Priority: major Milestone: unassigned
Component: BW Database Keywords:
Cc: shevek

Description (last modified by shevek)

Issue:

  • Our languages table uses non-standard abbreviations for codes (sometimes 2, sometimes 3 letters) in column "ShortCode"
  • When expressing the languages spoken by a member via API, so also when importing to Welen or elsewhere we need to do a mapping to a standard code

Suggestion:

  • Add a new column for ISO 639-1/T (2-letter codes, some languages missing)
  • If not all current languages are included in ISO 639-1: add a new column for ISO 639-2/T (3-letter codes) as well

Notes:

Related:

  • #1809 - might be a duplicate
  • #1828 - language selection based on browser language fails as language codes are wrong

Attachments (2)

languages.php (27.5 KB) - added by planetcruiser 6 years ago.
languages.ods (9.0 KB) - added by planetcruiser 6 years ago.

Download all attachments as: .zip

Change History (23)

Changed 6 years ago by planetcruiser

Changed 6 years ago by planetcruiser

comment:1 Changed 6 years ago by planetcruiser

  • Description modified (diff)

adding related #1809 - sorry, i didn't search for "language code" before creating this resourceful ticket ;)

comment:2 Changed 6 years ago by planetcruiser

  • Description modified (diff)

added condition for ISO 639-2 column

comment:3 Changed 6 years ago by shevek

  • Cc shevek added

comment:4 Changed 6 years ago by pablobd

Please also consider support for: Regionalization (example: pt-br brazilian portuguese) Non standard languages (example: en-pirate) these are real life examples based on requests made by the community

comment:5 Changed 6 years ago by shevek

  • Description modified (diff)

Added related ticket #1828

Last edited 6 years ago by shevek (previous) (diff)

comment:6 follow-up: Changed 6 years ago by shevek

I suggest to close this ticket. We should fix the content of the table instead of adding a second column to provide correct output. Especially as we rely on the correctness of the data when we do language matching with the browser's preferred language.

@pablobd: Please be aware that pt-BR is not only a language setting but also a locale setting. So the expectation might be different than only switching the language but also to change the date and number format.

comment:7 in reply to: ↑ 6 ; follow-up: Changed 6 years ago by planetcruiser

Replying to shevek:

I suggest to close this ticket. We should fix the content of the table instead of adding a second column to provide correct output. Especially as we rely on the correctness of the data when we do language matching with the browser's preferred language.

i expect a lot of surprises if we touch the existing codes, i would rather not do this, as rox is in maintenance mode and i don't want to cause too much extra work refactoring code that we will throw away soon, just to save a few kb database space.

the way of least resistance from api point of view and just add that column to offer standard codes. no big deal and it will produce 0 new bugs. what's the harm?

the browser language matching could be fixed by using the new column.

comment:8 follow-up: Changed 6 years ago by pablobd

why renaming a code will generate a bug? i'm for updating current data to match iso codes

as for brazilian language, it is not clear for me if it qualifies as a diferent language than portuguese, is just the same diference between español rioplatense and español from spain, so not so urgent. We might consider it for welen, the choice to have local dialects inside every language.

comment:9 in reply to: ↑ 8 Changed 6 years ago by planetcruiser

Replying to pablobd:

why renaming a code will generate a bug?

knowing rox, quite possibly in a lot of places. i simply don't know for sure and researching this will take more time than just adding correct codes.

just one example: wrong language codes are used in profile urls for translations. in theory someone could have linked to their profile in a certain language (e.g. http://www.bewelcome.org/members/pablobd/esp ). this link would break.

sorry, we used wrong codes for too long just to change them now. this will be straightened in welen. in order to export data to welen we need standard codes in rox. hence this ticket.

i'm for updating current data to match iso codes

sure, this could be done in other tickets, if deemed important. this ticket would serve as a foundation for possible other fixes.

comment:10 in reply to: ↑ 7 Changed 6 years ago by shevek

Replying to planetcruiser:

i expect a lot of surprises if we touch the existing codes, i would rather not do this, as rox is in maintenance mode and i don't want to cause too much extra work refactoring code that we will throw away soon, just to save a few kb database space.

I don't I already had a look at the code.

Rox will be in maintenance mode for the next three years if my gut feeling proves true. Not to fix this will lead to a number of disappointed members. That is exactly what maintenance is for.

the browser language matching could be fixed by using the new column.

Which would lead to a maintenance nightmare.

Regarding the link example: As long as no link rox generates fails it doesn't matter.

comment:11 follow-up: Changed 6 years ago by planetcruiser

what shall i say, both solutions (fixing the current column and deal with the fallout or adding 1 or 2 new columns with iso codes) are valid options. since adding new columns doesn't break anything, i think it's the best thing to do at this point.

my opinion in summary: i don't think we should close this ticket, but implement it to start solving the codes mismatch. i will probably look into this soon in order to move forward with the api.

others will be able to base further fixes on work done for this ticket. it is an incremental way of fixing the code mismatch issue.

possible process:

step 1: add new columns with valid iso codes

step 2: fix old bugs using new columns instead of old column one by one

step 3: decommission old column when all bugs are fixed

why would it be a maintenance nightmare?

the links were generated in the past, search engines contain them and people may have bookmarked them. fight link rot!

comment:12 in reply to: ↑ 11 ; follow-up: Changed 6 years ago by shevek

my opinion in summary: i don't think we should close this ticket, but implement it to start solving the codes mismatch. i will probably look into this soon in order to move forward with the api.

If we want to move to geonames as a table and not as a service we need to fix that code thing anyway. Why would I use the session language as a pointer into the languages table all the time.

I already have a SQL script to adapt the wrong codes and the words table.

All places where languages are used can easily be spotted. No need to add something new and maintain that instead. Adding new things opens a whole new field for bugs...

the links were generated in the past, search engines contain them and people may have bookmarked them. fight link rot!

But they break anyway in the future...

comment:13 in reply to: ↑ 12 Changed 6 years ago by planetcruiser

Replying to shevek:

If we want to move to geonames as a table and not as a service we need to fix that code thing anyway. Why would I use the session language as a pointer into the languages table all the time.

i don't understand the connection to geonames and session data. isn't geonames just about geographical data? or do they also map language codes to regions?

I already have a SQL script to adapt the wrong codes and the words table.

ok, if this goes live soon, good. the api can then just use the fixed codes. i just didn't expect that anyone has a solution for this near ready.

All places where languages are used can easily be spotted. No need to add something new and maintain that instead.

well, as of now we have a need: there is currently no way the api can return iso codes. but this just gives me an idea. i will simply do the language code mapping/fixing in the api model until the languages table in rox is fixed.

Adding new things opens a whole new field for bugs...

adding a new column in a database for a new feature does not add any bugs.

the links were generated in the past, search engines contain them and people may have bookmarked them. fight link rot!

But they break anyway in the future...

will they? when? we will need to redirect all decommissioned rox routes to new welen routes of course. possibly on webserver level. we can't just break the internet by knowingly cutting off links. but that's for another time. :)

comment:14 follow-up: Changed 6 years ago by shevek

Geonames alternate names uses iso 639-1 codes. So if we want to present the user with the right alternate names for his/her/I don't tell language we need the iso 639-1 codes.

Adding the column itself obviously doesn't add bugs using it does ;-)

You suggested to introduce a column fix everything using that one and than get rid of the faulty one. That would surely break the links.

comment:15 in reply to: ↑ 14 Changed 6 years ago by planetcruiser

Replying to shevek:

You suggested to introduce a column fix everything using that one and than get rid of the faulty one. That would surely break the links.

right, i missed that. so yeah, we need to keep the historic codes anyway then. which leads to what conclusion?? :)

comment:16 follow-up: Changed 6 years ago by shevek

That we should use our brains before posting? ;-)

No, really. We should not keep stupid data in tables to avoid future trouble. Someone might have used a direct deep link but that is rather unlikely.

Additionally I don't see any good reason to add a 'where language.shortcode = '$lang' and languages.realisocode = alternatenames.isocode'. That will surely lead to more bugs...

comment:17 in reply to: ↑ 16 ; follow-up: Changed 6 years ago by planetcruiser

Replying to shevek:

No, really. We should not keep stupid data in tables to avoid future trouble. Someone might have used a direct deep link but that is rather unlikely.

well, i suppose it's a price we pay for legacy support. we have a *lot* of stupid things in the database, the ShortCode is just the tip of the iceberg. that's why we rewrite. that's why i push the api forward.

i just counted 132 occurrences of "ShortCode" in rox, 92 of them outside ./htdocs/bw/ - i really, really don't feel like looking at them all, just to export standardised data via api. and without unit tests it would be lightheaded to simply change data underneath the application's feet (by updating codes in the ShortCode column).

the deep link issue is just one example. i am sure there will be more things where we need to support legacy codes.

it's just the general approach we agreed on, to let sleeping dogs lie and don't mess with rox code unless we really have to.

i personally don't want to "clean up rox", but "leave rox behind". but judging by your earlier comment we might have diverging goals here.

Additionally I don't see any good reason to add a 'where language.shortcode = '$lang' and languages.realisocode = alternatenames.isocode'. That will surely lead to more bugs...

where would we want to do this? we must of course find solutions, where this is not needed. but that's not what this ticket is about.

so, new column(s) now and everything else later. when we are at a point where we don't need any column any longer, we can remove them. but for a ShortCode -> iso639-1/2 migration different ways of looking up a language will be really helpful and needed.

in the time discussing this i would have done the matching and adding of column 2 times already. this is helping neither productivity nor fun. it's frustrating, let's move on.

if you have already prepared it, could you attach a matching table for ShortCode to iso639-1 please?

comment:18 in reply to: ↑ 17 ; follow-up: Changed 6 years ago by shevek

Replying to planetcruiser:

i just counted 132 occurrences of "ShortCode" in rox, 92 of them outside ./htdocs/bw/ - i really, really don't feel like looking at them all, just to export standardised data via api. and without unit tests it would be lightheaded to simply change data underneath the application's feet (by updating codes in the ShortCode column).

Did you also check where this ShortCode? is coming from. If it only comes from the DB I can't see a reason not to change it.

in the time discussing this i would have done the matching and adding of column 2 times already. this is helping neither productivity nor fun. it's frustrating, let's move on.

That's a bogus argument. Adding something just to get rid of it later on as discussion shows we didn't need to is frustrating as well.

if you have already prepared it, could you attach a matching table for ShortCode to iso639-1 please?

I thoroughly reviewed the SQL I had and updated it. You can find it attached to #1809.

comment:19 in reply to: ↑ 18 ; follow-up: Changed 6 years ago by planetcruiser

Replying to shevek:

Replying to planetcruiser:

i just counted 132 occurrences of "ShortCode" in rox, 92 of them outside ./htdocs/bw/ - i really, really don't feel like looking at them all, just to export standardised data via api. and without unit tests it would be lightheaded to simply change data underneath the application's feet (by updating codes in the ShortCode column).

Did you also check where this ShortCode? is coming from. If it only comes from the DB I can't see a reason not to change it.

such assumptions in rox are dangerous. since we don't have unit tests, please make sure to look at all 132 occurrences and review what the effects of your language code update could be. without such a review i can not approve such a complex update of the live db.

what will happen with links to profile translations with old codes? they just break? that would not be acceptable. we need redirects. or at least review how many such links are in use.

in the time discussing this i would have done the matching and adding of column 2 times already. this is helping neither productivity nor fun. it's frustrating, let's move on.

That's a bogus argument. Adding something just to get rid of it later on as discussion shows we didn't need to is frustrating as well.

please refrain from calling my personal opinion on the process "bogus".

if you have already prepared it, could you attach a matching table for ShortCode to iso639-1 please?

I thoroughly reviewed the SQL I had and updated it. You can find it attached to #1809.

i looked at it. it's a pretty complex fix in the live database. since we can't really test this beforehand, it will be quite stressful and cause extra admin work for last minute dumps, lots of testing etc. i really don't see why so much time is wasted with such uncritical work. an incremental fix as i suggested would have saved a lot of secondary work related to this.

comment:20 in reply to: ↑ 19 Changed 6 years ago by shevek

Did you also check where this ShortCode? is coming from. If it only comes from the DB I can't see a reason not to change it.

such assumptions in rox are dangerous. since we don't have unit tests, please make sure to look at all 132 occurrences and review what the effects of your language code update could be. without such a review i can not approve such a complex update of the live db.

That was my point. If the review shows that ShortCode? always relies on the DB to be filled it should be safe.

what will happen with links to profile translations with old codes? they just break? that would not be acceptable. we need redirects. or at least review how many such links are in use.

Outside or inside links? Inside links are build using the ShortCode? from the language table. Outside links to Swedisch would break now as the shortcode for Swedish was wrong and a new language exists for that.

I thoroughly reviewed the SQL I had and updated it. You can find it attached to #1809.

i looked at it. it's a pretty complex fix in the live database. since we can't really test this beforehand, it will be quite stressful and cause extra admin work for last minute dumps, lots of testing etc. i really don't see why so much time is wasted with such uncritical work. an incremental fix as i suggested would have saved a lot of secondary work related to this.

To get the API results right I wouldn't have cared.

But I did this for #1809 and #1828. We get complaints about this often enough to look into it.

If we ever switch to geonames with alternatenames Greek and Swedish people probably will be pretty annoyed when they realize that all other people can see country names etc. in their native language while they can't.

comment:21 Changed 6 years ago by planetcruiser

  • Resolution set to wontfix
  • Status changed from new to closed

will be fixed via #1809

Note: See TracTickets for help on using tickets.