3. Analysis

3.1. OSM Data extraction

The objective of this step is to extract the needed data into a specific format such as CSV, JSON or SQLite. For my analysis, I use the SQLite database, but I implemented export to both CSV and SQLite.

For each export, some OSM elements are rejected for various reasons. The export program reports the number of rejected elements (node or way XML elements), the reason for each rejection and the resulting ratio. The goal is to perform the analysis on a representative amount of data.
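The export step can be pictured with Python's standard csv and sqlite3 modules. The following is a minimal sketch, not the code of OSMToCSV.py or OSMToSQL.py: the nodes table layout and the helper names write_nodes_csv and write_nodes_sqlite are hypothetical, and only the node fields printed in the export log below are used.

```python
import csv
import io
import sqlite3

# Hypothetical node layout: the fields printed for a node in the export log.
NODE_FIELDS = ["id", "lat", "lon", "user", "uid", "version", "changeset", "timestamp"]


def write_nodes_csv(rows, stream):
    """Write node dictionaries to a CSV stream, header line first."""
    writer = csv.DictWriter(stream, fieldnames=NODE_FIELDS)
    writer.writeheader()
    writer.writerows(rows)


def write_nodes_sqlite(rows, conn):
    """Create a hypothetical `nodes` table and insert the node dictionaries."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS nodes (id INTEGER, lat REAL, lon REAL, user TEXT, "
        "uid INTEGER, version INTEGER, changeset INTEGER, timestamp TEXT)"
    )
    # Named placeholders (:id, :lat, ...) are filled from the dictionary keys.
    conn.executemany(
        "INSERT INTO nodes VALUES "
        "(:id, :lat, :lon, :user, :uid, :version, :changeset, :timestamp)",
        rows,
    )
    conn.commit()


# Example with the first node reported in the export log.
node = {
    "id": "26761400", "lat": "43.2961743", "lon": "5.3699525",
    "user": "bjankuloski", "uid": "158784", "version": "56",
    "changeset": "74709450", "timestamp": "2019-09-20T08:44:28Z",
}
buffer = io.StringIO()
write_nodes_csv([node], buffer)
conn = sqlite3.connect(":memory:")
write_nodes_sqlite([node], conn)
print(buffer.getvalue())
print(conn.execute("SELECT id, lat, user FROM nodes").fetchall())
```

Writing both formats from the same list of dictionaries keeps the two exports consistent, which matches the identical counts reported for the CSV and SQLite exports.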

3.1.1. OSM database extract to CSV

To export OSM data to CSV files, I use the main program OSMToCSV.py with the OSM file and the output folder as arguments. Exporting all the data takes a long time. 677248 elements are extracted and 459 are rejected, which gives a ratio of 99.93%.

We can therefore consider that the amount of data is representative.
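The ratio printed by the export tools can be recomputed from its two counters. In this sketch I assume that read counts the accepted elements and that the ratio is read / (read + reject); this is consistent with the log below, where read + reject is always a multiple of 1000 (999 + 1, 1998 + 2, ...). The function names are hypothetical.

```python
def acceptance_ratio(read, reject):
    """Share of accepted elements, assuming ratio = read / (read + reject)."""
    total = read + reject
    return read / total if total else 1.0


def progress_line(read, reject):
    """Format a progress line in the style of the export log."""
    return f"number of items {acceptance_ratio(read, reject):.2%} (read: {read} | reject {reject})"


print(progress_line(677248, 459))  # number of items 99.93% (read: 677248 | reject 459)
```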

.\OSMToCSV.py
usage: OSMToCSV.py [-h] osm folder
OSMToCSV.py: error: the following arguments are required: osm, folder
.\OSMToCSV.py .\Database\marseille.osm .\Database\
Wrong element: Element ({'node': {'id': '26761400', 'lat': '43.2961743', 'lon': '5.3699525', 'user': 'bjankuloski', 'uid': '158784', 'version': '56', 'changeset': '74709450', 'timestamp': '2019-09-20T08:44:28Z'}, 'node_tags': [{'id': '26761400', 'value': '13000;13001;13002;13003;13004;13005;13006;13007;13008;13009;13010;13011;13012;13013;13014;13015;13016', 'key': 'postcode', 'type': 'addr'}, {'id': '26761400', 'value': 'Mác-xây', 'key': 'vi', 'type': 'alt_name'}, {'id': '26761400', 'value': '4', 'key': 'capital', 'type': 'regular'}, {'id': '26761400', 'value': 'France', 'key': 'country', 'type': 'is_in'}, {'id': '26761400', 'value': 'FR', 'key': 'country_code', 'type': 'is_in'}, {'id': '26761400', 'value': 'FR-U', 'key': 'iso_3166_2', 'type': 'is_in'}, {'id': '26761400', 'value': 'Marseille', 'key': 'name', 'type': 'regular'}, {'id': '26761400', 'value': 'مارسيليا', 'key': 'ar', 'type': 'name'}, {'id': '26761400', 'value': 'Marsella', 'key': 'ast', 'type': 'name'}, {'id': '26761400', 'value': 'Марсилия', 'key': 'bg', 'type': 'name'}, {'id': '26761400', 'value': 'Marsilha', 'key': 'br', 'type': 'name'}, {'id': '26761400', 'value': 'Marsella', 'key': 'ca', 'type': 'name'}, {'id': '26761400', 'value': 'Marseille', 'key': 'de', 'type': 'name'}, {'id': '26761400', 'value': 'Μασσαλία', 'key': 'el', 'type': 'name'}, {'id': '26761400', 'value': 'Marseille', 'key': 'en', 'type': 'name'}, {'id': '26761400', 'value': 'Marsejlo', 'key': 'eo', 'type': 'name'}, {'id': '26761400', 'value': 'Marsella', 'key': 'es', 'type': 'name'}, {'id': '26761400', 'value': 'Marseille', 'key': 'fr', 'type': 'name'}, {'id': '26761400', 'value': 'מרסיי', 'key': 'he', 'type': 'name'}, {'id': '26761400', 'value': 'Marsiglia', 'key': 'it', 'type': 'name'}, {'id': '26761400', 'value': 'マルセイユ', 'key': 'ja', 'type': 'name'}, {'id': '26761400', 'value': 'ಮಾರ್ಸೇಯ', 'key': 'kn', 'type': 'name'}, {'id': '26761400', 'value': 'Marsêy', 'key': 'ku', 'type': 'name'}, {'id': '26761400', 'value': 'Massalia', 
'key': 'la', 'type': 'name'}, {'id': '26761400', 'value': 'Marselis', 'key': 'lt', 'type': 'name'}, {'id': '26761400', 'value': 'Марсеј', 'key': 'mk', 'type': 'name'}, {'id': '26761400', 'value': 'Marselha', 'key': 'oc', 'type': 'name'}, {'id': '26761400', 'value': 'Marsylia', 'key': 'pl', 'type': 'name'}, {'id': '26761400', 'value': 'Marselha', 'key': 'pt', 'type': 'name'}, {'id': '26761400', 'value': 'Марсель', 'key': 'ru', 'type': 'name'}, {'id': '26761400', 'value': 'Марсељ', 'key': 'sr', 'type': 'name'}, {'id': '26761400', 'value': 'Marsilya', 'key': 'tr', 'type': 'name'}, {'id': '26761400', 'value': 'Марсель', 'key': 'tt', 'type': 'name'}, {'id': '26761400', 'value': 'Марсель', 'key': 'uk', 'type': 'name'}, {'id': '26761400', 'value': 'Mạc Xây', 'key': 'vi', 'type': 'name'}, {'id': '26761400', 'value': '馬賽', 'key': 'zh', 'type': 'name'}, {'id': '26761400', 'value': 'city', 'key': 'place', 'type': 'regular'}, {'id': '26761400', 'value': '855393', 'key': 'population', 'type': 'regular'}, {'id': '26761400', 'value': '211300553', 'key': 'FR:SIREN', 'type': 'ref'}, {'id': '26761400', 'value': '13055', 'key': 'INSEE', 'type': 'ref'}, {'id': '26761400', 'value': 'FRMRS', 'key': 'LOCODE', 'type': 'ref'}, {'id': '26761400', 'value': 'ofis publik ar brezhoneg', 'key': 'name:br', 'type': 'source'}, {'id': '26761400', 'value': 'INSEE 2013', 'key': 'population', 'type': 'source'}, {'id': '26761400', 'value': 'Q23482', 'key': 'wikidata', 'type': 'regular'}, {'id': '26761400', 'value': 'fr:Marseille', 'key': 'wikipedia', 'type': 'regular'}]} ) has the following errors:{'node_tags': [{1: [{'value': ["value does not match regex '[\x00-\x7féèçàêâôÉòîö ]*'"]}], 7: [{'value': ["value does not match regex '[\x00-\x7féèçàêâôÉòîö ]*'"]}], 9: [{'value': ["value does not match regex '[\x00-\x7féèçàêâôÉòîö ]*'"]}], 13: [{'value': ["value does not match regex '[\x00-\x7féèçàêâôÉòîö ]*'"]}], 18: [{'value': ["value does not match regex '[\x00-\x7féèçàêâôÉòîö ]*'"]}], 20: [{'value': 
["value does not match regex '[\x00-\x7féèçàêâôÉòîö ]*'"]}], 21: [{'value': ["value does not match regex '[\x00-\x7féèçàêâôÉòîö ]*'"]}], 25: [{'value': ["value does not match regex '[\x00-\x7féèçàêâôÉòîö ]*'"]}], 29: [{'value': ["value does not match regex '[\x00-\x7féèçàêâôÉòîö ]*'"]}], 30: [{'value': ["value does not match regex '[\x00-\x7féèçàêâôÉòîö ]*'"]}], 32: [{'value': ["value does not match regex '[\x00-\x7féèçàêâôÉòîö ]*'"]}], 33: [{'value': ["value does not match regex '[\x00-\x7féèçàêâôÉòîö ]*'"]}], 34: [{'value': ["value does not match regex '[\x00-\x7féèçàêâôÉòîö ]*'"]}], 35: [{'value': ["value does not match regex '[\x00-\x7féèçàêâôÉòîö ]*'"]}]}]}

number of items 99.90% (read: 999 | reject 1)
Wrong element: Element ({'node': {'id': '66891736', 'lat': '36.8967845', 'lon': '7.7646249', 'user': 'ملسبكو', 'uid': '9535554', 'version': '8', 'changeset': '75123155', 'timestamp': '2019-10-01T00:00:21Z'}, 'node_tags': [{'id': '66891736', 'value': 'عنابة', 'key': 'city', 'type': 'addr'}, {'id': '66891736', 'value': 'ferry_terminal', 'key': 'amenity', 'type': 'regular'}, {'id': '66891736', 'value': 'yes', 'key': 'ferry', 'type': 'regular'}, {'id': '66891736', 'value': 'Annaba عنابة', 'key': 'name', 'type': 'regular'}, {'id': '66891736', 'value': 'عنابة', 'key': 'ar', 'type': 'name'}, {'id': '66891736', 'value': 'Annaba', 'key': 'de', 'type': 'name'}, {'id': '66891736', 'value': 'Annaba', 'key': 'en', 'type': 'name'}, {'id': '66891736', 'value': 'station', 'key': 'public_transport', 'type': 'regular'}]} ) has the following errors:{'node': [{'user': ["value does not match regex '[\x00-\x7féèçàêâôÉòîö ]*'"]}], 'node_tags': [{0: [{'value': ["value does not match regex '[\x00-\x7féèçàêâôÉòîö ]*'"]}], 3: [{'value': ["value does not match regex '[\x00-\x7féèçàêâôÉòîö ]*'"]}], 4: [{'value': ["value does not match regex '[\x00-\x7féèçàêâôÉòîö ]*'"]}]}]}

number of items 99.90% (read: 1998 | reject 2)
...
...
number of items 99.93% (read: 677248 | reject 459)

Note

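Improvement: the number of rejected elements could be reduced by analysing the reason for each rejection. In the two rejected elements shown above, every error is a regex mismatch: some tag values (and, for the second element, the user name) contain characters outside the allowed set, such as Arabic, Cyrillic or Japanese names.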
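To see why these values are rejected, the character class can be tested directly with Python's re module. This sketch copies the pattern from the error messages and assumes that the validator requires the whole value to match; is_valid_relaxed is a hypothetical alternative that accepts any printable Unicode text, not the behaviour of the export programs.

```python
import re

# Allowed characters, copied from the error messages of the export log:
# ASCII (\x00-\x7f) plus a few French accented letters.
ALLOWED = r"[\x00-\x7féèçàêâôÉòîö ]*"


def is_valid(value):
    """Strict check, assuming the whole value must match ALLOWED."""
    return re.fullmatch(ALLOWED, value) is not None


def invalid_chars(value):
    """Characters of value that fall outside the allowed set."""
    return [c for c in value if re.fullmatch(ALLOWED, c) is None]


def is_valid_relaxed(value):
    """Hypothetical alternative: accept any printable Unicode text."""
    return value.isprintable()


for value in ["Marseille", "Marsêy", "Mác-xây", "Марсель", "Annaba عنابة"]:
    print(value, is_valid(value), invalid_chars(value), is_valid_relaxed(value))
```

For example, "Marsêy" passes because "ê" is in the class, while "Mác-xây" fails only because of "á", which explains why index 22 is absent from the error list of the Marseille node while index 1 is present.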

3.1.2. OSM database extract to SQLite

To export OSM data to an SQLite database, I use the main program OSMToSQL.py with the OSM file and the database file as arguments. Exporting all the data takes a long time. The counts are the same as for the CSV export: 677248 elements are extracted and 459 are rejected, which gives a ratio of 99.93%.

We can therefore consider that the amount of data is representative.

.\OSMToSQL.py
usage: OSMToSQL.py [-h] osm sql
OSMToSQL.py: error: the following arguments are required: osm, sql
.\OSMToSQL.py .\Database\marseille.osm .\Database\marseille.db
Wrong element: Element ({'node': {'id': '26761400', 'lat': '43.2961743', 'lon': '5.3699525', 'user': 'bjankuloski', 'uid': '158784', 'version': '56', 'changeset': '74709450', 'timestamp': '2019-09-20T08:44:28Z'}, 'node_tags': [{'id': '26761400', 'value': '13000;13001;13002;13003;13004;13005;13006;13007;13008;13009;13010;13011;13012;13013;13014;13015;13016', 'key': 'postcode', 'type': 'addr'}, {'id': '26761400', 'value': 'Mác-xây', 'key': 'vi', 'type': 'alt_name'}, {'id': '26761400', 'value': '4', 'key': 'capital', 'type': 'regular'}, {'id': '26761400', 'value': 'France', 'key': 'country', 'type': 'is_in'}, {'id': '26761400', 'value': 'FR', 'key': 'country_code', 'type': 'is_in'}, {'id': '26761400', 'value': 'FR-U', 'key': 'iso_3166_2', 'type': 'is_in'}, {'id': '26761400', 'value': 'Marseille', 'key': 'name', 'type': 'regular'}, {'id': '26761400', 'value': 'مارسيليا', 'key': 'ar', 'type': 'name'}, {'id': '26761400', 'value': 'Marsella', 'key': 'ast', 'type': 'name'}, {'id': '26761400', 'value': 'Марсилия', 'key': 'bg', 'type': 'name'}, {'id': '26761400', 'value': 'Marsilha', 'key': 'br', 'type': 'name'}, {'id': '26761400', 'value': 'Marsella', 'key': 'ca', 'type': 'name'}, {'id': '26761400', 'value': 'Marseille', 'key': 'de', 'type': 'name'}, {'id': '26761400', 'value': 'Μασσαλία', 'key': 'el', 'type': 'name'}, {'id': '26761400', 'value': 'Marseille', 'key': 'en', 'type': 'name'}, {'id': '26761400', 'value': 'Marsejlo', 'key': 'eo', 'type': 'name'}, {'id': '26761400', 'value': 'Marsella', 'key': 'es', 'type': 'name'}, {'id': '26761400', 'value': 'Marseille', 'key': 'fr', 'type': 'name'}, {'id': '26761400', 'value': 'מרסיי', 'key': 'he', 'type': 'name'}, {'id': '26761400', 'value': 'Marsiglia', 'key': 'it', 'type': 'name'}, {'id': '26761400', 'value': 'マルセイユ', 'key': 'ja', 'type': 'name'}, {'id': '26761400', 'value': 'ಮಾರ್ಸೇಯ', 'key': 'kn', 'type': 'name'}, {'id': '26761400', 'value': 'Marsêy', 'key': 'ku', 'type': 'name'}, {'id': '26761400', 'value': 'Massalia', 
'key': 'la', 'type': 'name'}, {'id': '26761400', 'value': 'Marselis', 'key': 'lt', 'type': 'name'}, {'id': '26761400', 'value': 'Марсеј', 'key': 'mk', 'type': 'name'}, {'id': '26761400', 'value': 'Marselha', 'key': 'oc', 'type': 'name'}, {'id': '26761400', 'value': 'Marsylia', 'key': 'pl', 'type': 'name'}, {'id': '26761400', 'value': 'Marselha', 'key': 'pt', 'type': 'name'}, {'id': '26761400', 'value': 'Марсель', 'key': 'ru', 'type': 'name'}, {'id': '26761400', 'value': 'Марсељ', 'key': 'sr', 'type': 'name'}, {'id': '26761400', 'value': 'Marsilya', 'key': 'tr', 'type': 'name'}, {'id': '26761400', 'value': 'Марсель', 'key': 'tt', 'type': 'name'}, {'id': '26761400', 'value': 'Марсель', 'key': 'uk', 'type': 'name'}, {'id': '26761400', 'value': 'Mạc Xây', 'key': 'vi', 'type': 'name'}, {'id': '26761400', 'value': '馬賽', 'key': 'zh', 'type': 'name'}, {'id': '26761400', 'value': 'city', 'key': 'place', 'type': 'regular'}, {'id': '26761400', 'value': '855393', 'key': 'population', 'type': 'regular'}, {'id': '26761400', 'value': '211300553', 'key': 'FR:SIREN', 'type': 'ref'}, {'id': '26761400', 'value': '13055', 'key': 'INSEE', 'type': 'ref'}, {'id': '26761400', 'value': 'FRMRS', 'key': 'LOCODE', 'type': 'ref'}, {'id': '26761400', 'value': 'ofis publik ar brezhoneg', 'key': 'name:br', 'type': 'source'}, {'id': '26761400', 'value': 'INSEE 2013', 'key': 'population', 'type': 'source'}, {'id': '26761400', 'value': 'Q23482', 'key': 'wikidata', 'type': 'regular'}, {'id': '26761400', 'value': 'fr:Marseille', 'key': 'wikipedia', 'type': 'regular'}]} ) has the following errors:{'node_tags': [{1: [{'value': ["value does not match regex '[\x00-\x7féèçàêâôÉòîö ]*'"]}], 7: [{'value': ["value does not match regex '[\x00-\x7féèçàêâôÉòîö ]*'"]}], 9: [{'value': ["value does not match regex '[\x00-\x7féèçàêâôÉòîö ]*'"]}], 13: [{'value': ["value does not match regex '[\x00-\x7féèçàêâôÉòîö ]*'"]}], 18: [{'value': ["value does not match regex '[\x00-\x7féèçàêâôÉòîö ]*'"]}], 20: [{'value': 
["value does not match regex '[\x00-\x7féèçàêâôÉòîö ]*'"]}], 21: [{'value': ["value does not match regex '[\x00-\x7féèçàêâôÉòîö ]*'"]}], 25: [{'value': ["value does not match regex '[\x00-\x7féèçàêâôÉòîö ]*'"]}], 29: [{'value': ["value does not match regex '[\x00-\x7féèçàêâôÉòîö ]*'"]}], 30: [{'value': ["value does not match regex '[\x00-\x7féèçàêâôÉòîö ]*'"]}], 32: [{'value': ["value does not match regex '[\x00-\x7féèçàêâôÉòîö ]*'"]}], 33: [{'value': ["value does not match regex '[\x00-\x7féèçàêâôÉòîö ]*'"]}], 34: [{'value': ["value does not match regex '[\x00-\x7féèçàêâôÉòîö ]*'"]}], 35: [{'value': ["value does not match regex '[\x00-\x7féèçàêâôÉòîö ]*'"]}]}]}

number of items 99.90% (read: 999 | reject 1)
Wrong element: Element ({'node': {'id': '66891736', 'lat': '36.8967845', 'lon': '7.7646249', 'user': 'ملسبكو', 'uid': '9535554', 'version': '8', 'changeset': '75123155', 'timestamp': '2019-10-01T00:00:21Z'}, 'node_tags': [{'id': '66891736', 'value': 'عنابة', 'key': 'city', 'type': 'addr'}, {'id': '66891736', 'value': 'ferry_terminal', 'key': 'amenity', 'type': 'regular'}, {'id': '66891736', 'value': 'yes', 'key': 'ferry', 'type': 'regular'}, {'id': '66891736', 'value': 'Annaba عنابة', 'key': 'name', 'type': 'regular'}, {'id': '66891736', 'value': 'عنابة', 'key': 'ar', 'type': 'name'}, {'id': '66891736', 'value': 'Annaba', 'key': 'de', 'type': 'name'}, {'id': '66891736', 'value': 'Annaba', 'key': 'en', 'type': 'name'}, {'id': '66891736', 'value': 'station', 'key': 'public_transport', 'type': 'regular'}]} ) has the following errors:{'node': [{'user': ["value does not match regex '[\x00-\x7féèçàêâôÉòîö ]*'"]}], 'node_tags': [{0: [{'value': ["value does not match regex '[\x00-\x7féèçàêâôÉòîö ]*'"]}], 3: [{'value': ["value does not match regex '[\x00-\x7féèçàêâôÉòîö ]*'"]}], 4: [{'value': ["value does not match regex '[\x00-\x7féèçàêâôÉòîö ]*'"]}]}]}

number of items 99.90% (read: 1998 | reject 2)
number of items 99.93% (read: 2998 | reject 2)
number of items 99.95% (read: 3998 | reject 2)
number of items 99.96% (read: 4998 | reject 2)
number of items 99.97% (read: 5998 | reject 2)
...
number of items 99.93% (read: 677248 | reject 459)

Note

Improvement: as for the CSV export, the number of rejected elements could be reduced by analysing the reason for each rejection.
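A first step towards that analysis is to tally the rejection reasons over all wrong elements. The sketch below walks an error report with the nested shape printed in the log (dictionaries mapping a field name or a list index to a list of messages or nested dictionaries) and counts reasons per section and field, ignoring list indices. The function names are hypothetical, and collecting the reports from the export program is left out.

```python
from collections import Counter


def iter_errors(errors, path=()):
    """Yield (path, message) pairs from a nested error report."""
    for key, items in errors.items():
        for item in items:
            if isinstance(item, dict):
                # Nested report: descend and remember the field or index.
                yield from iter_errors(item, path + (key,))
            else:
                yield path + (key,), item


def tally_reasons(reports):
    """Count rejections by (section, field, message), ignoring list indices."""
    counts = Counter()
    for report in reports:
        for path, message in iter_errors(report):
            section, field = path[0], path[-1]
            counts[(section, field, message)] += 1
    return counts


# Error report of the second rejected element in the log (Annaba ferry terminal).
msg = "value does not match regex '[\x00-\x7féèçàêâôÉòîö ]*'"
annaba = {
    "node": [{"user": [msg]}],
    "node_tags": [{0: [{"value": [msg]}], 3: [{"value": [msg]}], 4: [{"value": [msg]}]}],
}
for (section, field, message), count in tally_reasons([annaba]).most_common():
    print(count, section, field)
```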

3.2. Database Analysis

The following command line analyses the Marseille database. The optional arguments specify the map centre position (latitude and longitude) and the name of the generated HTML map file.

.\Analysis.py --latitude 43.3 --longitude 5.4 --map_file my_map.html \Database\marseille.db
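The usage and error messages printed by OSMToCSV.py and OSMToSQL.py match the format produced by Python's argparse module. Assuming Analysis.py parses its options the same way, its command line could be declared as follows. This is a hypothetical reconstruction: only the option names come from the command above, while the help texts and the None defaults are placeholders.

```python
import argparse


def build_parser():
    """Hypothetical reconstruction of the Analysis.py command line."""
    parser = argparse.ArgumentParser(prog="Analysis.py", description="Analyse an OSM SQLite database.")
    # Positional argument: the SQLite database produced by OSMToSQL.py.
    parser.add_argument("database", help="SQLite database file")
    # Optional arguments: map centre and output HTML file.
    # The defaults actually used by Analysis.py are unknown; None is a placeholder.
    parser.add_argument("--latitude", type=float, default=None, help="latitude of the map centre")
    parser.add_argument("--longitude", type=float, default=None, help="longitude of the map centre")
    parser.add_argument("--map_file", default=None, help="name of the generated HTML map")
    return parser


args = build_parser().parse_args(
    ["--latitude", "43.3", "--longitude", "5.4", "--map_file", "my_map.html", r"\Database\marseille.db"]
)
print(args.latitude, args.longitude, args.map_file, args.database)
```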

3.2.1. Data quantity analysis

The first analysis prints the quantity of data on stdout:

  • Number of unique users (nodes and ways): 720

  • Number of nodes: 557650

  • Number of ways: 120057
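These three figures can be obtained with simple SQL aggregates. The sketch assumes a hypothetical schema in which the nodes and ways tables both have a user column; a tiny in-memory database stands in for marseille.db. UNION (without ALL) removes duplicates, so a user who edited both nodes and ways is counted once.

```python
import sqlite3

QUERIES = {
    "unique users (nodes and ways)": "SELECT COUNT(*) FROM (SELECT user FROM nodes UNION SELECT user FROM ways)",
    "nodes": "SELECT COUNT(*) FROM nodes",
    "ways": "SELECT COUNT(*) FROM ways",
}


def data_quantity(conn):
    """Run each counting query and return {label: count}."""
    return {label: conn.execute(sql).fetchone()[0] for label, sql in QUERIES.items()}


# Tiny in-memory database standing in for marseille.db (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nodes (id INTEGER, user TEXT);
CREATE TABLE ways (id INTEGER, user TEXT);
INSERT INTO nodes VALUES (1, 'alice'), (2, 'bob'), (3, 'alice');
INSERT INTO ways VALUES (10, 'carol'), (11, 'bob');
""")
for label, value in data_quantity(conn).items():
    print(f"Number of {label}: {value}")
```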

3.2.2. User activity distribution

The second analysis displays the distribution of activity across users (the 10 most active users).

../_images/List_on_users_sort_by_activities__10_firsts_.png

Note

RoHroHH is probably the moderator :)
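A ranking like the one in this chart can be computed with a GROUP BY query. In this sketch, activity is assumed to be the number of nodes and ways attributed to each user, and the schema (nodes and ways tables with a user column) is hypothetical; ties are broken alphabetically so the order is deterministic.

```python
import sqlite3

# Hypothetical schema: nodes and ways both store the user of the element.
TOP_USERS_SQL = """
SELECT user, COUNT(*) AS n
FROM (SELECT user FROM nodes UNION ALL SELECT user FROM ways)
GROUP BY user
ORDER BY n DESC, user
LIMIT ?
"""


def top_users(conn, limit=10):
    """Return (user, number of nodes and ways) for the most active users."""
    return conn.execute(TOP_USERS_SQL, (limit,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nodes (id INTEGER, user TEXT);
CREATE TABLE ways (id INTEGER, user TEXT);
INSERT INTO nodes VALUES (1, 'alice'), (2, 'bob'), (3, 'alice');
INSERT INTO ways VALUES (10, 'alice'), (11, 'bob'), (12, 'carol');
""")
print(top_users(conn))  # [('alice', 3), ('bob', 2), ('carol', 1)]
```

The resulting (user, count) pairs can then be passed to any bar-chart library.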

3.2.3. Node type distribution

The third analysis shows the distribution of node types (the 10 most frequent).

../_images/Nodes_types_repartition__10_firsts_.png
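Which column Analysis.py uses as the "node type" is not stated here, so this sketch stays generic: it counts the most frequent values of one column of a hypothetical node_tags table (with the key, value and type fields printed in the export log) and returns each count with its percentage. Because a column name cannot be a bound SQL parameter, it is checked against a whitelist before being placed in the query.

```python
import sqlite3

# Columns of the hypothetical node_tags table, as printed in the export log.
ALLOWED_COLUMNS = {"key", "value", "type"}


def tag_distribution(conn, column="key", limit=10):
    """Return (label, count, percentage) for the most frequent values of a column."""
    if column not in ALLOWED_COLUMNS:
        raise ValueError(f"unsupported column: {column}")
    total = conn.execute("SELECT COUNT(*) FROM node_tags").fetchone()[0]
    rows = conn.execute(
        f"SELECT {column}, COUNT(*) AS n FROM node_tags "
        f"GROUP BY {column} ORDER BY n DESC, {column} LIMIT ?",
        (limit,),
    ).fetchall()
    return [(label, n, 100.0 * n / total) for label, n in rows]


# Tags of the Annaba node from the export log.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node_tags (id INTEGER, key TEXT, value TEXT, type TEXT)")
conn.executemany(
    "INSERT INTO node_tags VALUES (66891736, ?, ?, ?)",
    [
        ("city", "عنابة", "addr"),
        ("amenity", "ferry_terminal", "regular"),
        ("ferry", "yes", "regular"),
        ("name", "Annaba عنابة", "regular"),
        ("ar", "عنابة", "name"),
        ("de", "Annaba", "name"),
        ("en", "Annaba", "name"),
        ("public_transport", "station", "regular"),
    ],
)
for label, n, pct in tag_distribution(conn, column="type"):
    print(f"{label}: {n} ({pct:.1f}%)")
```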

3.2.4. Points of interest distribution on the map

Next, I display some points of interest on a map, as a GPS device would. Customers mainly want to find:

  • parking

  • bus stations

  • cafes

  • shops

  • cash machines (ATMs)
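Selecting these points of interest amounts to matching node tags. The sketch maps each need to a standard OpenStreetMap tag (amenity=parking, amenity=bus_station, amenity=cafe, shop=* and amenity=atm); the mapping itself, the nodes and node_tags schema and the find_pois helper are assumptions, not the code of Analysis.py. It keeps only tags whose type is "regular", since the export log stores prefixed tags such as addr:city with another type.

```python
import sqlite3

# Hypothetical mapping from customer needs to OSM tags (key, value);
# None means "any value".
POI_TAGS = {
    "parking": ("amenity", "parking"),
    "bus station": ("amenity", "bus_station"),
    "cafe": ("amenity", "cafe"),
    "shop": ("shop", None),
    "cash machine": ("amenity", "atm"),
}


def find_pois(conn):
    """Return (category, node id, lat, lon) for every node matching a POI tag."""
    results = []
    for category, (key, value) in POI_TAGS.items():
        # Only 'regular' tags: prefixed tags such as addr:* have another type.
        sql = (
            "SELECT n.id, n.lat, n.lon FROM nodes AS n "
            "JOIN node_tags AS t ON t.id = n.id "
            "WHERE t.type = 'regular' AND t.key = ?"
        )
        params = [key]
        if value is not None:
            sql += " AND t.value = ?"
            params.append(value)
        for node_id, lat, lon in conn.execute(sql, params):
            results.append((category, node_id, lat, lon))
    return results


# Tiny in-memory stand-in for marseille.db.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nodes (id INTEGER, lat REAL, lon REAL);
CREATE TABLE node_tags (id INTEGER, key TEXT, value TEXT, type TEXT);
INSERT INTO nodes VALUES (1, 43.295, 5.374), (2, 43.297, 5.380), (3, 43.300, 5.390);
INSERT INTO node_tags VALUES
    (1, 'amenity', 'cafe', 'regular'),
    (2, 'shop', 'bakery', 'regular'),
    (3, 'city', 'Marseille', 'addr');
""")
for poi in find_pois(conn):
    print(poi)
```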

This map uses the MarkerCluster extension to group nearby markers. Without it, the map would be unusable because of the number of markers displayed.

Note

To display the map without marker clustering, set CONST_WITH_MARKERCLUSTER to False and run Analysis.py again.
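Marker clustering replaces a group of nearby markers with a single marker showing their count. To illustrate the idea only, here is a simplified grid-based grouping in plain Python; it is not the algorithm of the MarkerCluster extension, which adapts the grouping to the zoom level. The grid_cluster function and its cell_deg cell size (in degrees) are hypothetical.

```python
import math
from collections import defaultdict


def grid_cluster(points, cell_deg):
    """Group (lat, lon) points into square grid cells of cell_deg degrees.

    Returns one (centroid_lat, centroid_lon, count) tuple per non-empty cell.
    """
    cells = defaultdict(list)
    for lat, lon in points:
        # Points falling in the same cell share the same integer cell index.
        cells[(math.floor(lat / cell_deg), math.floor(lon / cell_deg))].append((lat, lon))
    clusters = []
    for members in cells.values():
        n = len(members)
        clusters.append((sum(p[0] for p in members) / n, sum(p[1] for p in members) / n, n))
    return clusters


points = [(43.293, 5.373), (43.294, 5.374), (43.296, 5.375), (43.352, 5.453)]
for lat, lon, count in grid_cluster(points, cell_deg=0.01):
    print(f"{count} marker(s) around ({lat:.4f}, {lon:.4f})")
```

A larger cell_deg merges more markers, which roughly corresponds to a more zoomed-out view.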