Performance: Multi-threaded operations

From gvSIG CE Wiki

This page contains a description of current performance bottlenecks in gvSIG CE and collects ideas for their removal, with a focus on multithreading and multicore/parallel processing. In this context "performance" mostly relates to the speed of operations in typical interactive GIS use, such as loading and rendering large layers.

There are already some areas in which gvSIG CE makes good use of multiple cores/CPUs:

  • The SEXTANTE Java algorithms for geoprocessing tasks run on individual cores. With an n-core CPU, n algorithms can run in parallel, without them "stealing" CPU time from each other.
  • The same is true for the native gvSIG CE geoprocessing tools ("Basic Vector Tools"). The JVM does a good job of spreading the load over multiple CPUs/cores.
  • SEXTANTE algorithms of external providers (e.g. GRASS/SAGA GIS) are run in their own process spaces. It is up to the operating system to run each algorithm on a separate core/CPU if available. This should normally work well, too.

Apart from geoprocessing, however, gvSIG CE makes very limited use of parallel processing to improve performance and GUI responsiveness. We hope to remove many performance bottlenecks using memory optimizations and multi-threaded refactoring.

Performance bottlenecks

Currently, the following significant performance bottlenecks have been identified:

  • Vector layer geometries are styled and rendered using a single thread.
  • Vector layer geometries are read sequentially from their data sources.
  • Conversion of vector layer geometries read from OGR data sources (e.g. SQLite, MapInfo, ArcInfo, etc.) into gvSIG CE's internal geometry model is very slow (multiple path conversions).
  • Sized-down layer rendering in the Overview widget is done by rendering the original layer again. It would be much faster to take the rendered image of the original layer and simply shrink it. Even simple image resizing (nearest neighbour) would yield good enough quality for this purpose.
  • The same is true for Views that are embedded in Map layouts: every layer of an embedded View is rendered twice, once in its View and once in the Map layout. Note, however, that a complete re-rendering at the highest quality level is still needed for producing PDF/PS output.

Work plan and analyses

Multi-threaded rendering of vector layer features

Waiting for layers to render is probably the no. 1 waste of time for GIS users.

Aparapi

[https://aparapi.github.io/ Aparapi] provides transparent OpenCL support for Java.

Progress

We have some initial patches from Laurent Bourgès, author of the Marlin renderer. His patch set adds support for multi-threaded rendering of vector layers. This works well and has given substantial speed-ups (on top of what Marlin already provides).

Open Problems

How to decide the number of threads to use? In QGIS, the user can set the number of CPUs/cores to use explicitly. Similarly, there are some property settings in "gvSIGCE.sh", but it is very hard for the user to decide on a good setting, so the choice should be automatic. One idea would be to simply set "threads = number of cores".
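The "threads = number of cores" idea could be sketched as follows. This is only an illustration, assuming a hypothetical helper class; gvSIG CE currently has no such API:

```java
// Sketch: derive a default render-thread count automatically instead of
// asking the user. Class and method names are illustrative only.
public class RenderThreads {

    public static int defaultThreadCount() {
        int cores = Runtime.getRuntime().availableProcessors();
        // Leave one core free for the GUI/event-dispatch thread.
        return Math.max(1, cores - 1);
    }

    public static void main(String[] args) {
        System.out.println("Render threads: " + defaultThreadCount());
    }
}
```

Leaving one core to the event-dispatch thread is a design choice to keep the GUI responsive while rendering; it may or may not be the right trade-off for gvSIG CE and would need benchmarking.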

Best strategy for partitioning data? Parallelized algorithms need to work on partitioned data. If we want more than one render thread, then ideally each thread should render a separate "chunk" of geometries. The question is how to determine a good chunk size. This is especially true in the case of complex line and polygon geometries. E.g. a single complex polygon may have many thousands of vertices and may thus in itself be more complex than a chunk of hundreds of very simple polygons (asymmetric workload). An efficient estimator is required to find a good chunk size based on the current layer's visible data.
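One possible estimator, sketched below, partitions by a vertex budget rather than a fixed feature count, so that a single huge polygon does not unbalance the threads. All names are illustrative, and `vertexCounts[i]` stands in for the vertex count of feature i as reported by the data source:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: split features into chunks by a total vertex budget. Each
// returned int[] holds the {first, last} feature index of one chunk.
public class ChunkPartitioner {

    public static List<int[]> partition(int[] vertexCounts, int vertexBudget) {
        List<int[]> chunks = new ArrayList<>();
        int start = 0;
        int load = 0;
        for (int i = 0; i < vertexCounts.length; i++) {
            load += vertexCounts[i];
            // Close the chunk once the budget is reached. A single feature
            // larger than the budget simply becomes a chunk of its own.
            if (load >= vertexBudget) {
                chunks.add(new int[] { start, i });
                start = i + 1;
                load = 0;
            }
        }
        if (start < vertexCounts.length) {
            chunks.add(new int[] { start, vertexCounts.length - 1 });
        }
        return chunks;
    }
}
```

With a budget of 100 vertices, four features with counts {10, 10, 5000, 10} yield two chunks: the third (complex) feature closes the first chunk, and the last feature forms a chunk by itself.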

Possible improvements for panning across vector maps? Currently vector geometries are redrawn after a panning action completes. Navigation in complex maps would be easier if features in the off-screen margins of a vector layer could be rendered in the background and immediately displayed after a panning action.

How to avoid rendering the same layer in the View and Overview? Maybe this could be done by using a rescaled image of the layer in the Overview map.
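A minimal sketch of such a rescaling step using plain Java2D (the class name is hypothetical; `rendered` stands in for the View's already-rendered image):

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

// Sketch: reuse the View's rendered image for the Overview by rescaling
// it, instead of re-rendering every layer a second time.
public class OverviewScaler {

    public static BufferedImage shrink(BufferedImage rendered, int w, int h) {
        BufferedImage small = new BufferedImage(w, h, BufferedImage.TYPE_INT_ARGB);
        Graphics2D g = small.createGraphics();
        // Nearest-neighbour is the fastest interpolation and good enough
        // for a small overview map.
        g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                RenderingHints.VALUE_INTERPOLATION_NEAREST_NEIGHBOR);
        g.drawImage(rendered, 0, 0, w, h, null);
        g.dispose();
        return small;
    }
}
```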

How to avoid rendering the same layer in the View and Map Layout? Embedded Views in Map Layouts could be treated just like Overview preview images (see above). However, for PDF/PS output we need full-quality re-rendering.

Is it possible to benefit from non-sequential access to data sources? File-level performance is probably very limited with shapefiles (unless they are read completely into memory), but other backends, such as PostGIS and SQLite (see also here), should allow non-sequential, non-blocking access to the data for multiple readers. There are some generic options for improving non-sequential file access: http://libprefetch.cs.ucla.edu/.

Background

All layers (raster and vector) are drawn by com.iver.cit.gvsig.fmap.MapContext.draw(BufferedImage, Graphics2D, Cancellable, double).

First, the method redraws its own contents:

 this.getMapContextDrawer().draw(this.layers, image, g, cancel, scale);

Then it calls the individual "draw()" methods for the different layer types. E.g. raster layers use org.gvsig.fmap.raster.layers.FLyrRasterSE.draw(BufferedImage, Graphics2D, ViewPort, Cancellable, double).

We only need to be concerned with the "draw()" method for vector layers, because their content can be subdivided into individual feature sets that can be rendered in different threads (for raster layers, we already have the option of using pyramids to get very fast display speed).

The rendering of features in a vector layer is done by com.iver.cit.gvsig.fmap.layers.FLyrVect.draw(BufferedImage, Graphics2D, ViewPort, Cancellable, double).

Normally, the method "_draw()" is invoked here to do the actual rendering. It is currently unclear whether any type of vector layer uses a different strategy. There is probably a lot of dead code!

If the layer uses Z ordering (for symbol levels), then its data has to be drawn into an array of stacked images, each of which is drawn to by a corresponding Graphics2D object. Therefore, there is a lot of extra code for the Z ordering case ("useZSort"):

 _draw(BufferedImage, Graphics2D, ViewPort, Cancellable, double)
 [..]
  
 // render temporary map each screenRefreshRate milliseconds;
 int screenRefreshDelay = (int) ((1D / MapControl.getDrawFrameRate()) * 3 * 1000);
 BufferedImage[] imageLevels = null;
 Graphics2D[] graphics = null;
 if (useZSort) {
   imageLevels = new BufferedImage[zSort.getLevelCount()];
   graphics = new Graphics2D[imageLevels.length];
   for (int i = 0; !cancel.isCanceled()
        && i < imageLevels.length; i++) {
     imageLevels[i] = new BufferedImage(image.getWidth(),
       image.getHeight(), image.getType());
     graphics[i] = imageLevels[i].createGraphics();
     graphics[i].setTransform(g.getTransform());
     graphics[i].setRenderingHints(g.getRenderingHints());
   }
 }
 // -- end visual FX stuff
   
 [..]

Same in com.iver.cit.gvsig.fmap.layers.FLyrVectDrawThread.draw:

 // -- visual FX stuff
 // When offset != 0 we are drawing onto the Layout,
 // so the following code must not be executed.
 if (offset.getX() == 0 && offset.getY() == 0) {
   if ((System.currentTimeMillis() - time) > screenRefreshDelay) {
     virtualBim = new BufferedImage(
       image.getWidth(), image.getHeight(),
       BufferedImage.TYPE_INT_ARGB);
     virtualGraphics = virtualBim.createGraphics();
     virtualGraphics.drawImage(image, 0, 0, null);
     for (int i = 0; !cancel.isCanceled()
       && i < imageLevels.length; i++) {
       virtualGraphics.drawImage(imageLevels[i],
         0, 0, null);
     }
     g.clearRect(0, 0, image.getWidth(),
       image.getHeight());
     g.drawImage(virtualBim, 0, 0, null); // <-- THIS IS WHERE THE FINAL IMAGE GETS DRAWN
     time = System.currentTimeMillis();
   }
 }
 // -- end visual FX stuff

The code in the above loop could be optimized by parallelizing the per-level work. Note, however, that a single Graphics2D instance must not be used from multiple threads, so only work on separate images can safely run in parallel.
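A rough sketch of that idea: fill each Z-level image in its own thread, then composite the levels sequentially on one thread. This is an illustration only; `fillLevel()` is a placeholder for the real per-level feature rendering, and the class name is hypothetical:

```java
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Sketch: parallel per-level rendering, sequential composite. Each thread
// draws only into its own BufferedImage, so no Graphics2D is shared.
public class LevelCompositor {

    public static BufferedImage compose(int levels, int w, int h) {
        BufferedImage[] imageLevels = new BufferedImage[levels];
        ExecutorService pool = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());
        for (int i = 0; i < levels; i++) {
            final int level = i;
            imageLevels[level] = new BufferedImage(w, h, BufferedImage.TYPE_INT_ARGB);
            pool.submit(() -> fillLevel(imageLevels[level], level));
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        // Sequential composite: cheap compared to rendering the levels.
        BufferedImage result = new BufferedImage(w, h, BufferedImage.TYPE_INT_ARGB);
        Graphics2D g = result.createGraphics();
        for (BufferedImage level : imageLevels) {
            g.drawImage(level, 0, 0, null);
        }
        g.dispose();
        return result;
    }

    private static void fillLevel(BufferedImage img, int level) {
        // Placeholder for the real symbol-level rendering.
        Graphics2D g = img.createGraphics();
        g.fillRect(level * 10, level * 10, 10, 10);
        g.dispose();
    }
}
```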

If the layer does not use Z ordering, then the rendering is much simpler:

 } else {
   // no ZSort, so there is only one map level; symbols are
   // just drawn.
   if (onePoint) {
     if (x < 0 || y < 0 || x >= image.getWidth()
       || y >= image.getHeight()) {
       return;
     }
     image.setRGB(x, y, sym.getOnePointRgb());
   } else {
     if (!bDrawCartographicSupport) {
       // we are drawing onto a view
       geom.drawInts(g, viewPort, sym, cancel); // <-- THIS IS WHERE THE FEATURE IS DRAWN (1)
     } else {
       // we are drawing onto a map layout
       geom.drawInts(g, viewPort, dpi, csSym, cancel);
     }
   }
 }

Multi-threaded geometry conversion

Currently, converting GIS geometries from OGR-supported data sources (basically, everything that is not a shapefile or a PostGIS table) to gvSIG CE's own geometry model incurs a large overhead and makes access to these data sources much slower than e.g. to shapefiles read via our native shapefile driver.

Open Problems

It is unclear why large point clouds stored in an SQLite database render so much more slowly than point clouds stored in a simple shapefile. After all, no complex path conversions are needed here, as in the case of complex polygons. It is possible that there is excessive garbage-collector activity due to a failure to reuse existing objects.

Background

The conversion happens in [https://svn.code.sf.net/p/gvsigce/code/trunk/extensions/extOGR/src/main/java/org/gvsig/ogr/drivers/AbstractOGRDriver.java AbstractOGRDriver.java].

Multi-threaded geometry conversion from OGR is, however, difficult, because conversion runs from within the geometry rendering loop. So a major refactoring effort (conversion in one thread, rendering in another) would be required.
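A common pattern for such a refactoring is a bounded producer/consumer queue: one thread converts OGR features while another renders the already-converted geometries. A minimal sketch under that assumption (the class is hypothetical, and strings stand in for real geometry objects):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Sketch: decouple OGR geometry conversion (producer) from rendering
// (consumer) with a bounded queue, so both run in parallel and the
// queue's capacity limits memory use.
public class ConversionPipeline {

    private static final Object POISON = new Object(); // end-of-stream marker

    public static int run(int featureCount) {
        BlockingQueue<Object> queue = new ArrayBlockingQueue<>(1024);
        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < featureCount; i++) {
                    // Real code: read the next OGR feature and convert it
                    // to gvSIG CE's internal geometry model.
                    queue.put("geometry-" + i);
                }
                queue.put(POISON);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();

        int drawn = 0;
        try {
            Object geom;
            while ((geom = queue.take()) != POISON) {
                drawn++; // real code: render the converted geometry
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return drawn;
    }
}
```

The bounded queue is the key design choice: if conversion outpaces rendering, the producer blocks instead of flooding the heap with converted geometries.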

Sample Data

Some rich datasets are available from the U.S. Fish & Wildlife Service. E.g. the "Alabama" dataset contains some extremely complex polygons that represent wetland areas. All datasets are available in both File Geodatabase (FGDB) and Shapefile formats. This makes it possible to directly compare the performance of the OGR FGDB driver and the native gvSIG CE Shapefile driver.

Handling Very Large Point Clouds

Given current advances in spatial sensor technology, we expect an increasing demand for the processing of very large point clouds (hundreds of millions of points, up to billions) in gvSIG CE.

Current Situation

The only fully working file format (note: not talking about client/server DBMS storage) with both acceptable performance and memory efficiency that we currently have is Shapefile.

Shapefiles have a theoretical size limit of 8GB. However, the OGR driver imposes a limit of 4GB and some other GIS might even be limited to 2GB (notably ArcGIS). This limit is on the shapefile itself, not the DBF attribute table (see the "Size issues" section on the OGR shapefile driver page).
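A back-of-the-envelope calculation, assuming the record layouts from the ESRI Shapefile specification (8-byte record header plus 20 bytes of content for a 2D point, or 36 bytes for a PointZ with an M value), illustrates what these limits mean in point counts:

```java
// Sketch: how many point records fit under a given shapefile size limit.
// Record sizes assumed from the ESRI Shapefile spec: a 2D point record is
// 28 bytes (8-byte header + 4-byte shape type + two 8-byte doubles); a
// PointZ record including the M value is 44 bytes.
public class ShapefileLimits {

    public static long maxPoints(long fileLimitBytes, int recordBytes) {
        return (fileLimitBytes - 100) / recordBytes; // 100-byte file header
    }

    public static void main(String[] args) {
        long fourGB = 4L * 1024 * 1024 * 1024;
        System.out.println("2D points under 4 GB:     " + maxPoints(fourGB, 28));
        System.out.println("PointZ points under 4 GB: " + maxPoints(fourGB, 44));
    }
}
```

Under these assumptions, roughly 153 million 2D points or 97 million PointZ records fit under the 4 GB limit, which is consistent with the practical ceiling of about 100 million 3D points observed in testing.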

Thus:

  • All GIS that use the GDAL/OGR Shapefile driver are limited to 4GB point clouds: QGIS, GRASS GIS, many others.
  • SAGA GIS uses its own Shapefile driver and can probably (untested) handle up to 8GB.
  • gvSIG CE uses its own Shapefile driver and can handle up to 8GB. However, problems appear before the 8GB limit (see below).

The following problems were found when trying to handle Shapefiles with > 100 million points in gvSIG CE:

  • Spatial index generation consumes too much RAM -> kills JVM.
  • Attribute table widget shows only empty fields.
  • Interval computation (symbology) creates extreme CPU and RAM load -> kills JVM.
  • There are probably many more that need to be identified: need to test field calculator, field and group statistics, etc.

In practice, this means that we can handle ca. 100 million points in a Shapefile, assuming that the JVM has at least 8 to 10 GB of RAM available. This includes Z data, fully working attribute data in gvSIG CE, and the ability to pass such data on to OGR-driven processing back-ends such as GRASS.

If we want to move beyond that, we need another storage backend...

Options

Regarding the storage of such datasets in gvSIG CE, here are some options that need to be discussed:

  • LiDAR: This is a special kind of data with a specific attribute schema and its own formats. It would be useful to implement LiDAR support (some time in the future), but not as a generic way of supporting large point clouds. There used to be a LiDAR extension for gvSIG, but it is deprecated (stuck on the 1.1.2 API). The best open-source library and command-line tools for LiDAR are provided by libLAS; these could be wrapped with a Processing interface to make use of the CLI tools.
  • An SQLite container seems more appropriate, but the current SQLite interface in gvSIG CE uses the OGR Java bindings and has poor performance. One option would be to implement a native SQLite driver in gvSIG CE, but this requires a lot of resources. Also, using SQLite means that users have to deal with complex containers and multiple layers within the SQLite container. This offers added flexibility, but adds more complexity to the data handling. On the plus side, GUI tools for dealing with SQLite data are already implemented in gvSIG CE.
  • Another option would be to use NetCDF to store point clouds. NetCDF has excellent Java support and is optimized for good performance with large and complex datasets. It also supports the usual attribute field types. Regularly gridded (i.e. raster) data in NetCDF is already supported via GDAL. However, we would need to add native support for irregular grids with user-defined attribute schemas.
  • And finally, we already have a driver that can read very large datasets from tabular text files and read them directly into memory. This is very fast, but currently it consumes a lot of RAM. It should be possible to reduce RAM consumption by optimizing object allocations. Another option would be to cache the data using a temporary, proprietary (i.e. gvSIG CE only) binary file, and read the required chunks from the cache as needed.
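If a custom cache or in-memory driver is pursued, the in-memory representation matters as much as the file format. Below is a sketch of a GC-friendly store that keeps coordinates in primitive arrays instead of allocating one object per point (the class name is hypothetical):

```java
// Sketch: structure-of-arrays point storage. 100 million points then need
// roughly 2.4 GB for X/Y/Z (three 8-byte doubles each), with no per-point
// object headers and essentially no garbage-collector pressure.
public class PointCloudStore {

    private final double[] x;
    private final double[] y;
    private final double[] z;
    private int size = 0;

    public PointCloudStore(int capacity) {
        x = new double[capacity];
        y = new double[capacity];
        z = new double[capacity];
    }

    public void add(double px, double py, double pz) {
        x[size] = px;
        y[size] = py;
        z[size] = pz;
        size++;
    }

    public int size() { return size; }

    public double getX(int i) { return x[i]; }
    public double getY(int i) { return y[i]; }
    public double getZ(int i) { return z[i]; }
}
```

The same layout also maps naturally onto a flat binary cache file: each coordinate array can be written and memory-mapped as one contiguous block.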

Workplan

Whatever format is chosen, there are some general aspects, as well:

  • We need to implement spatial indexing support for the chosen format.
  • The Processing interface (aka SEXTANTE) needs to be able to access data sources in the chosen format.
  • The format should be simple and compact; so if we do decide to use our own format it won't be a big problem to implement support in the GRASS, SAGA, etc. processing back-ends.

Most attractive option (WIP); in order of implementation:

  1. For now, use simple Shapefiles to store point clouds imported from tabular text files. Up to the 4GB limit, it will be possible to process these clouds with the GRASS and SAGA interfaces.
  2. Create a 200-million-point Shapefile (use the Processing tool "Generate random vector layer") and test/fix the following problems with the current Shapefile handling:
    1. Empty attribute table widget. This may be related to this: http://gvsigce.sourceforge.net/mantis/view.php?id=578
    2. Interval computation runs out of RAM.
    3. Spatial index generation takes too long and eventually runs out of RAM.
  3. Work with the SAGA GIS devs to extend the SAGA Point Cloud (.spc) format:
    1. Int64 pointers to allow > 2,147,483,647 points per file.
    2. String type attribute table fields (at least short/fixed length).
    3. Spatial indexing file (even if SAGA does not use it).
  4. Implement support for the SAGA Point Cloud (.spc) format. This is more compact than Shapefiles and optimized for size/performance with point geometry. Allow loading/saving layers in SPC format (editing support not required).
  5. Switch storage backend for point clouds imported from text files from Shapefiles to SPC.
  6. Implement support for SAGA point cloud processing modules in our Processing backends. This will also supply all required (non-interactive) editing functions.