libosmscout 1.1.1
Loading...
Searching...
No Matches
Resources

Import of germany.osm (~11G) from http://download.geofabrik.de/osm/.

As one can see the following steps are the relevant, most time-consuming steps:

Calculation the objects in areas index. "object in areas" calculation must be improved by improved the "point is area" algorithm. Generation of ways.dat file. This is time consuming because the nodes of a way are resolved. Since data is too big to keep everything in memory and on-disk algorithm must be used, repeately scanning data files. Instead of this the node index file could be used, but we need some performance tests first.

egrep -h "(^\+ Step|^ =>)" <logfile>

  • Step #1 - Preprocess... => 128.146 second(s)
  • Step #2 - Generating 'rawnode.idx'... => 32.002 second(s)
  • Step #3 - Generating 'rawway.idx'... => 8.849 second(s)
  • Step #4 - Generate 'relations.dat'... => 67.382 second(s)
  • Step #5 - Generating 'relation.idx'... => 1.530 second(s)
  • Step #6 - Generate 'nodes.dat'... => 25.975 second(s)
  • Step #7 - Generating 'node.idx'... => 1.052 second(s)
  • Step #8 - Generate 'ways.dat'... => 752.376 second(s)
  • Step #9 - Generating 'way.idx'... => 17.741 second(s)
  • Step #10 - Generate 'area.idx'... => 252.107 second(s)
  • Step #11 - Generate 'areanode.idx'... => 5.262 second(s)
  • Step #12 - Generate 'region.dat' and 'nameregion.idx'... => 774.346 second(s)
  • Step #13 - Generate 'nodeuse.idx'... => 229.044 second(s)
  • Step #14 - Generate 'water.idx'... => 29.784 second(s) => 2325.608 second(s)

The following files have been generated (du –total -h *.idx *.dat): 67M area.idx 8,0M areanode.idx 1,5M nameregion.idx 3,2M node.idx 88M nodeuse.idx 98M rawnode.idx 15M rawway.idx 204K relation.idx 20K water.idx 15M way.idx 4,0K bounding.dat 31M nodes.dat 730M rawnodes.dat 22M rawrels.dat 331M rawways.dat 21M region.dat 110M relations.dat 1,2M wayblack.dat 635M ways.dat 2,2G insgesamt

Of these the following are necessary at application runtime: 67M area.idx 8,0M areanode.idx 1,5M nameregion.idx 3,2M node.idx 88M nodeuse.idx 204K relation.idx 20K water.idx 15M way.idx 4,0K bounding.dat 31M nodes.dat 21M region.dat 110M relations.dat 635M ways.dat

Where the nodeuse.idx currently is only interesting for the internal routing solution. However some code change is required to do not make the database class require it. The water.idx is not finished and will likely increase in future (not not in a way that it changes the overall calculation drastically).

So the overall size required on disk for a gemrany map without routing would be:

du –total -h area.idx areanode.idx bounding.dat nameregion.idx node.idx \ nodes.dat region.dat relation.idx relations.dat water.idx way.idx ways.dat

68M area.idx 8,5M areanode.idx 4,0K bounding.dat 1,5M nameregion.idx 3,2M node.idx 31M nodes.dat 21M region.dat 216K relation.idx 114M relations.dat 20K water.idx 16M way.idx 646M ways.dat 907M insgesamt

Memory usage: Open the germany map, search for Bonn, search for Dortmund (top -b | grep <processname>):

PID USER PR NI VIRT RES SHR S CPU MEM TIME+ COMMAND 11244 tim 20 0 1031m 81m 34m S 0 2.7 0:05.38 lt-TravelJinni 11415 tim 20 0 975m 79m 36m S 0 2.6 0:09.19 lt-OSMScout

Most of the memory useage however comes from opening the data files using memory maped files.

The same, if no memory maped files are used:

9276 tim 20 0 219m 127m 13m S 0 4.2 0:02.97 lt-TravelJinni

This is the result of the internal statistics dump. Memory usage is rather high (but was much higher in the past). There is not yet any real measurement regarding optimal/minimal cache sizes, so possibly by just reducing the cache sizes some memory can be saved.

Problem is not the in memory cache for nodes, ways/areas and relations but the various indexes. The currently biggest index is the area node index and should be next target for optimizations. After that most memory saving can be gained by improving the NumericIndex that is used for accessing all date files. Possibly using caching to hold only used index pages in memory is possible here, too.

nodes.dat entries: 330, memory 28000 Index node.idx: 113 entries, memory 4469 ways.dat entries: 1000, memory 121600 Index way.idx: 192 entries, memory 1800 relations.dat entries: 884, memory 79392 Index relation.idx: 268 entries, memory 1868 area.idx entries: 1000, memory 316888 areanode.idx entries: 385, memory 27560 AdminRegion size 77942, locations size 0, memory 2182376 WaterIndex size 18795, memory 18795