Opened 9 years ago

Closed 9 years ago

#262 closed improve feature (fixed)

SlowQuery with region.country_code

Reported by: jeanyves Owned by: jaliste
Priority: major Milestone: 0.1.3
Component: BW General Keywords: slowness indexes regions database
Cc: jaliste

Description

I just detected that the following query is extremely slow (average 25-30 second)

SELECT regions.name AS region, regions.country_code AS country
FROM regions, cities, countries
WHERE cities.idregion = regions.id AND cities.IdCountry=countries.Id AND regions.country_code='MD' GROUP BY regions.id ORDER BY regions.name;

the reason is that country_code is not indexed (to temporary solve the problem I created an index with on the real database).
May be this is one of the causes of the #261 ticket

this query should be write in another way i suggess :

SELECT regions.name AS region, regions.country_code AS country
FROM regions, cities, countries
WHERE cities.idregion = regions.id AND regions.IdCountry=countries.Id GROUP BY regions.id ORDER BY regions.name;

In general, in the BW database model, I have make indexes for almost all Id* things (when it make sense), the country_code is some redudancy because of geonames I think, it could have been used as an index, but I strongly suggess to keep the initial logic (use Id* to join tables). This make the code more readable (I think)and this is a good way to detect at first glance if a query use index or not.

remember that we must avoid to have a lot of index, it cost load for updates and load for memory. In general we must be thrifty with needed ressource

Could the programmer who made this request contact me just to make a survey for similar problems (in case the current indexes are not sufficient).

Please also look to the following report, and understand why ticket like the #164 (adding comment in all query to help to figure where they are used) is something very needed.

# Time: 071219 10:46:40
# User@Host: bewelcome[bewelcome] @ localhost [127.0.0.1]
# Query_time: 28 Lock_time: 0 Rows_sent: 12 Rows_examined: 4126900
SELECT regions.name AS region, regions.country_code AS country
FROM regions, cities, countries
WHERE cities.idregion = regions.id AND cities.IdCountry=countries.Id AND regions.country_code='BE' GROUP BY regions.id ORDER BY regions.name;
# Time: 071219 10:46:46
# User@Host: bewelcome[bewelcome] @ localhost [127.0.0.1]
# Query_time: 12 Lock_time: 0 Rows_sent: 2095 Rows_examined: 12741
select members.Username as Username,countries.id as id,countries.Name as CountryName,regions.Name as RegionName,cities.Name as CityName from (countries,members,cities) left join regions on (cities.IdRegion=regions.id) where members.IdCity=cities.id and members.Status='Active' and countries.id=cities.IdCountry order by countries.id,regions.id,cities.id;
# Time: 071219 10:48:22
# User@Host: bewelcome[bewelcome] @ localhost [127.0.0.1]
# Query_time: 27 Lock_time: 0 Rows_sent: 12 Rows_examined: 4126900
SELECT regions.name AS region, regions.country_code AS country
FROM regions, cities, countries
WHERE cities.idregion = regions.id AND cities.IdCountry=countries.Id AND regions.country_code='BE' GROUP BY regions.id ORDER BY regions.name;
# Time: 071219 12:28:36
# User@Host: bewelcome[bewelcome] @ localhost [127.0.0.1]
# Query_time: 26 Lock_time: 0 Rows_sent: 20 Rows_examined: 4126916
SELECT regions.name AS region, regions.country_code AS country
FROM regions, cities, countries
WHERE cities.idregion = regions.id AND cities.IdCountry=countries.Id AND regions.country_code='IT' GROUP BY regions.id ORDER BY regions.name;
# Time: 071219 12:29:04
# User@Host: bewelcome[bewelcome] @ localhost [127.0.0.1]
# Query_time: 27 Lock_time: 0 Rows_sent: 20 Rows_examined: 4126916
SELECT regions.name AS region, regions.country_code AS country
FROM regions, cities, countries
WHERE cities.idregion = regions.id AND cities.IdCountry=countries.Id AND regions.country_code='IT' GROUP BY regions.id ORDER BY regions.name;
# Time: 071219 12:29:33
# User@Host: bewelcome[bewelcome] @ localhost [127.0.0.1]
# Query_time: 27 Lock_time: 0 Rows_sent: 8 Rows_examined: 4126892
SELECT regions.name AS region, regions.country_code AS country
FROM regions, cities, countries
WHERE cities.idregion = regions.id AND cities.IdCountry=countries.Id AND regions.country_code='SK' GROUP BY regions.id ORDER BY regions.name;
# Time: 071219 12:30:11
# User@Host: bewelcome[bewelcome] @ localhost [127.0.0.1]
# Query_time: 36 Lock_time: 0 Rows_sent: 20 Rows_examined: 4126916
SELECT regions.name AS region, regions.country_code AS country
FROM regions, cities, countries
WHERE cities.idregion = regions.id AND cities.IdCountry=countries.Id AND regions.country_code='IT' GROUP BY regions.id ORDER BY regions.name;
# Time: 071219 13:49:57
# User@Host: bewelcome[bewelcome] @ localhost [127.0.0.1]
# Query_time: 27 Lock_time: 0 Rows_sent: 37 Rows_examined: 4126950
SELECT regions.name AS region, regions.country_code AS country
FROM regions, cities, countries
WHERE cities.idregion = regions.id AND cities.IdCountry=countries.Id AND regions.country_code='MD' GROUP BY regions.id ORDER BY regions.name;
# Time: 071219 14:26:23
# User@Host: bewelcome[bewelcome] @ localhost [127.0.0.1]
# Query_time: 27 Lock_time: 0 Rows_sent: 50 Rows_examined: 4126976
SELECT regions.name AS region, regions.country_code AS country
FROM regions, cities, countries
WHERE cities.idregion = regions.id AND cities.IdCountry=countries.Id AND regions.country_code='US' GROUP BY regions.id ORDER BY regions.name;
# Time: 071219 15:40:18
# User@Host: bewelcome[bewelcome] @ localhost [127.0.0.1]
# Query_time: 33 Lock_time: 0 Rows_sent: 20 Rows_examined: 4126916
SELECT regions.name AS region, regions.country_code AS country
FROM regions, cities, countries
WHERE cities.idregion = regions.id AND cities.IdCountry=countries.Id AND regions.country_code='IT' GROUP BY regions.id ORDER BY regions.name;
# Time: 071219 15:40:55
# User@Host: bewelcome[bewelcome] @ localhost [127.0.0.1]
# Query_time: 30 Lock_time: 0 Rows_sent: 20 Rows_examined: 4126916
SELECT regions.name AS region, regions.country_code AS country
FROM regions, cities, countries
WHERE cities.idregion = regions.id AND cities.IdCountry=countries.Id AND regions.country_code='IT' GROUP BY regions.id ORDER BY regions.name;
# Time: 071219 15:44:33
# User@Host: bewelcome[bewelcome] @ localhost [127.0.0.1]
# Query_time: 27 Lock_time: 0 Rows_sent: 20 Rows_examined: 4126916
SELECT regions.name AS region, regions.country_code AS country
FROM regions, cities, countries
WHERE cities.idregion = regions.id AND cities.IdCountry=countries.Id AND regions.country_code='IT' GROUP BY regions.id ORDER BY regions.name;
# Time: 071219 15:45:22
# User@Host: bewelcome[bewelcome] @ localhost [127.0.0.1]
# Query_time: 40 Lock_time: 0 Rows_sent: 20 Rows_examined: 4126916
SELECT regions.name AS region, regions.country_code AS country
FROM regions, cities, countries
WHERE cities.idregion = regions.id AND cities.IdCountry=countries.Id AND regions.country_code='IT' GROUP BY regions.id ORDER BY regions.name;
# Time: 071219 16:37:19
# User@Host: bewelcome[bewelcome] @ localhost [127.0.0.1]
# Query_time: 27 Lock_time: 0 Rows_sent: 33 Rows_examined: 4126942
SELECT regions.name AS region, regions.country_code AS country
FROM regions, cities, countries
WHERE cities.idregion = regions.id AND cities.IdCountry=countries.Id AND regions.country_code='IN' GROUP BY regions.id ORDER BY regions.name;
# Time: 071219 16:45:53
# User@Host: bewelcome[bewelcome] @ localhost [127.0.0.1]
# Query_time: 48 Lock_time: 0 Rows_sent: 48 Rows_examined: 4126972
SELECT regions.name AS region, regions.country_code AS country
FROM regions, cities, countries
WHERE cities.idregion = regions.id AND cities.IdCountry=countries.Id AND regions.country_code='DZ' GROUP BY regions.id ORDER BY regions.name;
# Time: 071219 16:45:58
# User@Host: bewelcome[bewelcome] @ localhost []
# Query_time: 45 Lock_time: 0 Rows_sent: 48 Rows_examined: 4126972
SELECT regions.name AS region, regions.country_code AS country
FROM regions, cities, countries
WHERE cities.idregion = regions.id AND cities.IdCountry=countries.Id AND regions.country_code='DZ' GROUP BY regions.id ORDER BY regions.name;

Attachments (1)

patch (7.8 KB) - added by jaliste 9 years ago.
Work in progress patch

Download all attachments as: .zip

Change History (21)

comment:1 Changed 9 years ago by jeanyves

  • Component changed from unknown to BW General

comment:2 Changed 9 years ago by micha

Sorry, that I didn't reply earlier. This is definitely my fault. I created the query and still am not that familiar with the logic of Mysql-queries. Maybe we can go through the code together these days. Hope we find the time.

comment:3 Changed 9 years ago by jaliste

HI, I don't quite follow the difference between the first and the second query, becquse they are not doing the same thing. Does Mysql do any optimisations where the inner join is written like above? As we are using mysql maybe the join should be written

SELECT r.name AS region, r.country_code AS country
FROM regions as r INNER JOIN cities as c1 ON c1.idregion=r.id 
                  INNER JOIN countries as c2 ON c1.IdCountry=c2.Id 
WHERE regions.country_code='MD' GROUP BY regions.id ORDER BY regions.name;

Of course that you are not avoiding the full scan that is forced by the WHERE clause!(country_code being a string) As an alternative to the index solution( I agree with JY that should be kept to a minimum) maybe we could also look in another query for the idCountry for some specific countrycode and then do the query like

SELECT r.name AS region, r.country_code AS country
FROM regions as r INNER JOIN cities as c1 ON c1.idregion=r.id 
                  INNER JOIN countries as c2 ON c1.IdCountry=c2.Id 
WHERE c2.Id=1213 GROUP BY regions.id ORDER BY regions.name;

where 1213 is the id of the country corresponding to countrycode 'MD' and obtained by

SELECT CACHE Id from countries where countrycode = 'MD'

So this could be getted by PHP and then used to constructed the first query.
Does this make sense?

comment:4 Changed 9 years ago by jeanyves

They are several things which need to be update in this model (on sql point of view)

@Jaliste since no fields from the cities table are selected and regions country_code is used in the where this query don't need to use the cities table, if fact it browsed the whole cities table (fortunately only via indexes in fact).

comment:5 Changed 9 years ago by micha

  • Milestone changed from 0.1.1-outreach-bugfixing to 0.1.2 - more improvements & bugfixing
  • Priority changed from blocker to major
  • Type changed from bug to improve feature

mostly fixed. moving to new 0.1.2 to keep track and improve.

comment:6 Changed 9 years ago by jaliste

@JY, you are totally right. However, in the getRegions functions we could use:

SELECT regions.name  AS region, regions.country_code AS country 
FROM regions WHERE  regions.country_code='".$countrycode."' ORDER BY regions.name

instead of:

SELECT regions.name  AS region, regions.country_code AS country 
FROM regions, countries 
WHERE  regions.country_code='".$countrycode."' GROUP BY regions.id ORDER BY regions.name

??
My first comment is still valid I think. As IdCountry? and country_code are "foreign keys" either we keep the index on country_code either
we first get the id from the country and then
use

SELECT regions.name  AS region, regions.country_code AS country 
FROM regions WHERE  regions.IdCountry='".$idcountry."' ORDER BY regions.name

That should be IMO the better solution.

comment:7 Changed 9 years ago by jeanyves

@Jaliste yes I agree

I prefer if we use the IdCountry? / IdRegion? /IdCity? foreign keys, since they are links with primary keys

in the future geo system, I believe we should use the same IdXXXX rather than code

nota in terms of perfomance this will be better, but this is mainly theoric at human scale this will probably not be something we will notice. I just think that the country code is to be considerated as a data. If we don't use it as in the "where" I will remove the index on it.

comment:8 Changed 9 years ago by jaliste

@JY, please comment on the first part of my last comment, so I can change the corresponding query in the code.

comment:9 Changed 9 years ago by micha

  • Cc jaliste added

Jose, can't you just try it and then upload it? I would be happy to test it.

comment:10 Changed 9 years ago by jaliste

  • Owner set to jaliste
  • Status changed from new to assigned

@Micha,

I have a patch for country.model.php. However, it needs changes in the controller that I don't have time to do know so I can't upload it to the trunk!
I should have more time soon, but in the meantime I will upload the patch to this ticket. Please take a look at it. I simplified all the queries...

Changed 9 years ago by jaliste

Work in progress patch

comment:11 Changed 9 years ago by jaliste

  • follow_up changed from review code to test

Ok, I took care in [3804].

comment:12 Changed 9 years ago by jaliste

  • follow_up changed from test to review code

comment:13 Changed 9 years ago by jeanyves

  • follow_up changed from review code to test on alpha

reviewed and move to alpha

comment:14 Changed 9 years ago by jaliste

I'm not sure, but I think this fix is killing alpha!!!! When I click on browse countries alpha stops responding. Could someone revert to see if this is the problem?

comment:15 Changed 9 years ago by jaliste

Ok, the good one should be in [3819]!!! Actually I reverted the getAllcities to before my changes... So We should move this right to alpha please

comment:16 Changed 9 years ago by jeanyves

New problem :
In the slow queries we have :

     80 # Time: 080126  7:41:23
     81 # User@Host: bewelcome[bewelcome] @ localhost [127.0.0.1]
     82 # Query_time: 268  Lock_time: 0  Rows_sent: 38355  Rows_examined: 76932
     83 SELECT regions.name  AS region, regions.country_code AS country
     84 FROM regions, countries
     85 WHERE  regions.country_code='US' GROUP BY regions.id ORDER BY regions.name;
     86 # Time: 080126  7:41:38
     87 # User@Host: bewelcome[bewelcome] @ localhost [127.0.0.1]
     88 # Query_time: 253  Lock_time: 0  Rows_sent: 38355  Rows_examined: 76932
     89 SELECT regions.name  AS region, regions.country_code AS country
     90 FROM regions, countries
     91 WHERE  regions.country_code='US' GROUP BY regions.id ORDER BY regions.name;
     92 # Time: 080126  7:41:41
     93 # User@Host: bewelcome[bewelcome] @ localhost [127.0.0.1]
     94 # Query_time: 256  Lock_time: 0  Rows_sent: 38355  Rows_examined: 76932
     95 SELECT regions.name  AS region, regions.country_code AS country
     96 FROM regions, countries
     97 WHERE  regions.country_code='US' GROUP BY regions.id ORDER BY regions.name

This is because they are too much regions inside the USA, I suggess to keep only the states for now, and I think it is needed to do the same for other countries. Of course new geo will need to deal with this.

comment:17 Changed 9 years ago by jeanyves

I add a and regions.feature_code='ADM1' to reduce the number of region to the first level of regions

This as an emergency because, in fact they are 90000 regions in the database 35000 for US (with bout 50 with level ADM1), for belgium it is 2600 (belgian people are crazy), france 493 and Germany something like 1300. Since these area are populated with BW members, better to avoid to retrieve a regions page with so many useless records ...

comment:18 Changed 9 years ago by jaliste

Good move,
The difference between the version of GetAllCountries? in alpha and in test right now is that the version in test
shows all the countries where the version in alpha doesn't(it only shows countries with active users). The version in test is the same version that is in production right now(I really don't know why I wanted to change the query, big mistake as doing the outer join force a full scan of cities table!!!)

So, Which version do we keep? If it's version from test, please move changeset [3819] to alpha.

comment:19 Changed 9 years ago by matrixpoint

Using the DB Queries link at the bottom of the Algeria page, the regions query is

16: (7.4 ms) SELECT name AS region FROM regions WHERE regions.country_code='DZ' and regions.feature_code='ADM1' ORDER BY regions.name

Very fast now.

Ready for release.

comment:20 Changed 9 years ago by matrixpoint

  • follow_up changed from test on alpha to none
  • Resolution set to fixed
  • Status changed from assigned to closed

Released to production in r3913.

Note: See TracTickets for help on using tickets.