VTK/InSituDataStructures

From KitwarePublic
< VTK
Revision as of 14:31, 14 March 2016 by Dlonie (talk | contribs) (→‎Mapped vtkDataArrays (Deprecated))
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to navigationJump to search

Motivation

This document describes modifications to the Visualization ToolKit that allow data structures with arbitrary memory layouts to be used with a subset of the VTK pipeline. Supporting such data objects enables in-situ analysis of large simulations without the need to copy and repack the data into standard VTK arrays and data sets. For large simulations, this saves time and memory.

The greatest obstacle to using a non-standard memory layout in a VTK array or dataset is that the implementations of VTK data structures are only weakly abstracted. For example, vtkAbstractArray, the root of all data arrays in VTK, defines a GetVoidPointer method that returns the memory address of the first element in the array. Many algorithms in VTK rely on obtaining this pointer and assuming that the 2-dimensional data is contiguous in memory and follows a specific ordering. Similarly, vtkUnstructuredGrid, the most flexible dataset type, exposes many details of its implementation in its API, making it impossible to store arbitrary representations of data topologies. Such issues prevent simulation data with alternate memory layouts from being properly handled by the VTK pipeline.

The following describes techniques that allow arbitrary data layouts to be used with the vtkDataArray and vtkUnstructuredGrid APIs with the goal of enabling in-situ visualization of active simulations.

Mapped vtkUnstructuredGrids

A vtkUnstructuredGrid object represents a dataset consisting of points and cells. Cells hold the topology of the dataset, and unlike the other VTK datasets, vtkUnstructuredGrid makes no assumptions about the types of cells in the data. This flexibility makes it an ideal choice for a generic in-situ dataset, as it can losslessly store all other types of datasets.

Like vtkDataArray, the unstructured grid interface exposes much of its implementation. The point information is not a problem, as it is ultimately stored in a vtkDataArray and thus the in-situ dataset’s points can be easily handled by creating a subclass of vtkMappedDataArray. The cell information, however, is distributed across four arrays:

  1. Connectivity (vtkCellArray): Stores the connections between points cells as an array: [ n0, ptId1, ptId2, ..., ptId(n0), n1, ptId1, ptId2, ..., ptId(n1), ... ] where nX is the number of points in cell X, and the point ids that follow are the indices of the points in that cell.
  2. Links (vtkCellLinks): Stores the cells to which each point belongs.
  3. Types (vtkUnsignedCharArray): Type of cell (quad, triangle, line, etc).
  4. Locations (vtkIdTypeArray): Maps cell ID into Connectivity offset for random-access cell lookup.

None of these arrays are fully encapsulated by the vtkUnstructuredGrid API, meaning that filters and algorithms can (and do) retrieve and operate on them directly. Thus, many filters depend on the implementation details of the dataset. A mapped version of a vtkUnstructuredGrid dataset could be created by using vtkMappedDataArrays to emulate the connectivity, types, and locations arrays, but this would not be a straightforward process. In particular, the connectivity array is difficult to map data onto, since the location of each cell is dependent on the size of every cell before it. A higher-level solution is needed.

Approach

Rather than reuse the vtkDataArray solution of creating a mapped subclass that emulates the API of the VTK object, we introduce a superclass to vtkUnstructuredGrid that removes the implementation dependant methods. The vtkUnstructuredGridBase class defines a set of pure virtual functions, taken from the vtkUnstructuredGrid API, that allow interaction with the cell information in a generic way. Combined with the cell-related methods in vtkDataSet, this provides a suitable level of abstraction for implementing a mapped dataset.

Creating a Mapped vtkUnstructuredGridBase Subclass

In-situ datasets should make use of the vtkMappedUnstructuredGrid wrapper. This is a class template that fulfills the vtkUnstructuredGridBase API by delegating to an implementation class. This wrapper is used by writing an implementation class that meets the requirements listed in the vtkMappedUnstructuredGrid class documentation, and then calling a macro from the vtkMakeMappedUnstructuredGrid family to create a new concrete class. See the examples section for a sample implementation.

The template approach allows cheap shallow copying of the dataset. Unlike the mapped arrays, calling NewInstance() on a mapped dataset will return a new mapped dataset of the same type. Calling ShallowCopy(...) will only copy a reference-counted pointer to the implementation class, quickly making a low-overhead copy of the dataset.

Making Existing Code Compatible With vtkUnstructuredGridBase

Updating existing code for compatibility with mapped datasets is more intrusive than the changes required for mapped data arrays. A common pattern in VTK is to obtain a raw pointer to the connectivity array data and iterate through the cells sequentially. For example, older implementations of the vtkContourGrid filter determined the range of scalar attribute values for each cell using the something similar to the following:

<source lang="cpp"> // Finding the range of a cell’s scalar attributes with vtkUnstructuredGrid:

// Pointer to the input vtkUnstructuredGrid vtkUnstructuredGrid *inputGrid = ...; // Pointer or vtkTypedDataArrayIterator to point scalar attributes. Iterator scalarIterator = ...; // Pointer to the start of the connectivity array. vtkIdType *cellArrayPtr = inputGrid->GetCells()->GetPointer();

for (vtkIdType cellId = 0; cellId < numCells; ++cellId)

 {
 vtkIdType numCellPoints = *cellArrayPtr; // First value is number of points in cell
 ++cellArrayPtr; // Increment to the first pointId of the cell
 // Find min and max values in scalar data. Initialize to the first point’s
 // scalar value:
 range[0] = range[1] = scalarIterator[cellArrayPtr[0]];
 for (vtkIdType cellPointId = 1; cellPointId < numCellPoints; ++cellPointId)
   {
   range[0] = std::min(range[0], scalarIterator[cellArrayPtr[cellPointId]);
   range[1] = std::max(range[1], scalarIterator[cellArrayPtr[cellPointId]);
   }
 ... // (do work with range)
 cellArrayPtr += numCellPoints; // advance to next cell

} </source>

The above bit of code is very dependent on vtkUnstructuredGrid’s connectivity array implementation. Adapting this to use with the vtkUnstructuredGridBase interface is best accomplished by applying the new vtkCellIterator class:

<source lang="cpp"> // Finding the range of a cell’s scalar attributes with // vtkUnstructuredGridBase and vtkCellIterator:

// Pointer to the input vtkUnstructuredGridBase vtkUnstructuredGridBase *inputGrid = ...; // Point scalar data for inputGrid vtkDataArray *inputScalars = ...; // Temporary array used to hold cell scalars: vtkSmartPointer<vtkDataArray> cellScalars =

 vtkSmartPointer<vtkDataArray>::Take(inputScalars->NewInstance());

cellScalars->SetNumberOfComponents(inputScalars->GetNumberOfComponents()); // The cell iterator: vtkSmartPointer<vtkCellIterator> cellIter =

 vtkSmartPointer<vtkCellIterator>::Take(inputGrid->NewCellIterator());

for (cellIter->InitTraversal(); !cellIter->IsDoneWithTraversal(); cellIter->GoToNextCell())

 {
 // Batch lookup of cell scalars:
 vtkIdType numPoints = cellIter->GetNumberOfPoints();
 cellScalars->SetNumberOfTuples(numPoints);
 inScalars->GetTuples(cellIter->GetPointIds(), cellScalars);
 // We created the cellScalars above as a standard vtkDataArray,
 // so we know GetVoidPointer is valid and efficient. ‘Scalar’ is
 // template parameter of the function that this code lives in:
 Scalar *cellScalarPtr = static_cast<Scalar*>(cellScalars->GetVoidPointer(0));
 Scalar *cellScalarEnd = cellScalarPtr + numPoints);
 range[0] = range[1] = *cellScalarPtr++;
 while (cellScalarPtr != cellScalarEnd)
   {
   range[0] = std::min(range[0], *cellScalarPtr);
   range[1] = std::max(range[1], *cellScalarPtr);
   ++cellScalarPtr;
   } 
 ... // (do work with range)
 }

</source>

See the vtkCellIterator documentation for more information. By default, the vtkMappedUnstructuredGrid wrapper will use a vtkCellIterator implementation that performs a indexed random-access lookup on the cell data using the vtkUnstructuredGridBase API. For datasets that are better suited for sequential iteration, a custom vtkCellIterator implementation can be passed to the appropriate vtkMakeMappedUnstructuredGrid macro variant.

Integrating with ParaView/Catalyst

If the data structures are to be used in a Catalyst/ParaView co-processing pipeline, they must be registered with ParaView’s client-server interpreter. This is done by writing a static function that creates a new instance of the class and returns it as a vtkObjectBase pointer. For a vtkMappedUnstructuredGrid subclass named Grid, this is done as follows:

<source lang="cpp">

  1. include <vtkClientServerInterpreter.h>
  2. include <vtkClientServerInterpreterInitializer.h>
  3. include "Grid.h"

// Used by Catalyst to create a dummy grid object static vtkObjectBase* MakeGrid() { return Grid::New(); }

// In the catalyst initialization code: // // Register our Grid class with Catalyst so that it can // be used in a pipeline. if (vtkClientServerInterpreterInitializer *intInit =

   vtkClientServerInterpreterInitializer::GetInitializer())
 {
 if (vtkClientServerInterpreter *interp =
     intInit->GetGlobalInterpreter())
   {
   interp->AddNewInstanceFunction("Grid", MakeGrid);
   }
 }

</source>

Example code

An example implementation of a mapped dataset with mapped point/attribute arrays has been added to the IO/Exodus module. It reads an Exodus II file from disk and uses the raw arrays returned by the Exodus II library in mapped data structures. The example classes are:

  1. vtkCPExodusIIElementBlock: vtkMappedUnstructuredGrid subclass representing a single Exodus II element block dataset.
  2. vtkCPExodusIIElementBlockCellIterator: vtkCellIterator subclass specialized for vtkCPExodusIIElementBlock.
  3. vtkCPExodusIIElementBlockImpl: Implementation class for vtkCPExodusIIElementBlock.
  4. vtkCPExodusIINodalCoordinatesTemplate: vtkMappedDataArray subclass containing the x, y, and z node (point) coordinate arrays.
  5. vtkCPExodusIIResultsArrayTemplate: templated vtkMappedDataArray subclass containing point or cell result (attribute) arrays.
  6. vtkCPExodusIIInSituReader: File reader for producing the in-situ data structures.

The TestInSituExodus.cxx source file exercises, validates, and benchmarks the mapped structures against standard VTK structures with the same data.

A example Catalyst adapter implementation is available as part of the Albany code.

Mapped vtkDataArrays (Deprecated)

The interfaces described here will be deprecated and removed in the near future. This page details the next iteration of this functionality, which will focus on performance improvements.

Note that only the vtkMappedDataArray approach is being replaced. The vtkMappedUnstructuredGrid implementation is still preferred for mapping entire datasets.

Approach

Since removing methods like GetVoidPointer from vtkDataArray’s API is not feasible for legacy reasons, a work-around is used to allows existing algorithms that use raw pointers to work with non-standard (“mapped”) memory layouts. This is accomplished by adding an STL-style iterator with pointer-like semantics that can safely traverse data arrays (vtkTypedDataArrayIterator). The iterator walks through the components and tuples in any data array as though they were in the standard memory layout. With small modifications and a bit of template metaprogramming, complex algorithms can be adapted to operate on iterators for mapped data arrays while continuing to use efficient raw memory access for standard VTK data arrays.

VTKDataArrayHierarchy-comp.png

To achieve this functionality, the vtkDataArray inheritance hierarchy has been modified to introduce a new type-aware abstract interface, vtkTypedDataArray, which is templated on the element type. This augments the existing data array abstractions: vtkAbstractArray, which only passes values in and out using other arrays, and vtkDataArray, which allows elements to be set and retrieved using the double data type. vtkTypedDataArray uses its template argument to provide a specialized type-specific API, which is used by the new vtkTypedDataArrayIterator to interact with the data. The hierarchy now branches into the concrete vtkDataArrayTemplate subclass (which provides the implementation of the standard VTK data arrays), and the abstract vtkMappedDataArray interface. The latter is the superclass of all in-situ and other non-standard data arrays. The figure illustrates the difference between both the old and new hierarchies, with new classes highlighted in yellow.

Creating a Mapped vtkDataArray Subclass

To create a subclass of vtkDataArray with a non-standard memory layout, simply write a new class that derives from vtkMappedDataArray. This interface has features that simplify the mapping process:

  1. Pure virtuals: Several methods implemented in the vtkAbstractArray and vtkDataArray classes assume that their subclasses will use the standard memory ordering and rely on raw pointer access. These methods are made pure virtual in the vtkMappedDataArray interface to ensure that these methods are overridden with appropriate implementations.
  2. Safe void pointers: vtkMappedDataArray provides default implementations of GetVoidPointer, ExportToVoidPointer, and DataChanged that print a warning and copy the data to/from a persistent temporary array with the VTK memory layout. Thus, derived classes will continue to work with code that needs void pointers, although there is a large performance and memory penalty in doing so. Monitoring the warning stream from VTK will help identify sections of code that needs to be adapted as filters call these methods.
  3. Standard VTK array clones: When a VTK filter propagates a data object through the pipeline, it calls the NewInstance method of the source object to create an output object of the same type. Since mapped data arrays are currently incompatible with most of VTK, this is problematic. The vtkMappedDataArrayNewInstanceMacro and vtkMappedDataArrayTypeMacro behave similarly to vtkTypeMacro, but use a custom implementation of NewInstance that returns a standard vtkDataArray subclass of the appropriate type.

It is worth noting that the implementation of the virtual method GetValueReference should be as efficient as possible, as this is the entry point used by vtkTypedDataArrayIterator.

Adapting Filters to Use vtkDataArrayIteratorMacro

A common approach to using vtkDataArrays without knowing the element type is to use vtkTemplateMacro. This macro is often paired with a templated worker function as follows:

<source lang="cpp"> // vtkTemplateMacro usage:

template<typename Scalar> void Worker(Scalar *data, ...) { /* work on data */ } ... vtkDataArray *array = ...; switch (array->GetDataType())

 {
 vtkTemplateMacro(Worker(static_cast<VTK_TT*>(array->GetVoidPointer()), ...))
 }

</source>

This macro expands to a series of case statements for the various numeric data types supported by VTK. Each case statement typedefs VTK_TT to the corresponding C++ type and then calls the macro argument, which will execute the appropriate variant of the Worker function template (see vtkSetGet.h for more details).

In order to adapt this frequently-used pattern to work with mapped data structures, STL-esque Iterator and ValueType typedefs, along with Begin() and End() methods, have been added to vtkTypedDataArray and vtkDataArrayTemplate. The vtkTypedDataArray iterators are vtkTypedDataArrayIterators, which will only access the array data through API calls on the vtkTypedDataArray interface. vtkDataArrayTemplate simply uses raw pointers as iterators.

This API is used in the new vtkDataArrayIteratorMacro, which determines the proper iterator to use with a given instance of vtkDataArray, and defines the following typedefs and variables to reflect the array and element type:

  • vtkDAValueType: Typedef for the array’s element type.
  • vtkDAContainerType: Typedef to the most derived subclass of vtkDataArray for which a suitable iterator exists.
  • vtkDAIteratorType: Typedef for the most suitable iterator type for the input array.
  • vtkDABegin: Instance of vtkDAIteratorType that points to the first element of the array data.
  • vtkDAEnd: Instance of vtkDAIteratorType that points to the first element past the end of the array data.

Using this macro, the earlier example would be rewritten as:

<source lang="cpp"> // vtkDataArrayIteratorMacro usage:

template<typename Iterator> void Worker(Iterator data, ...) { /* work on data */ } ... vtkDataArray *array = ...; switch (array->GetDataType())

 {
 vtkDataArrayIteratorMacro(array, Worker(vtkDABegin, ...))
 }

</source>

The code will now work with both standard VTK data arrays and non-standard vtkMappedDataArray subclasses. As shown above, simply replacing the typecasted void pointer with vtkDABegin is sufficient -- often times the worker function’s body will not need to be modified, only the template parameters and signature need to be updated. More advanced algorithms can take vtkDAValueType as a template parameter to create temporary variables, or vtkDAEnd for STL-style range checking and iteration.

Since a raw pointer is used for the standard data arrays, the compiler is free to optimize the worker function and maintain the performance of the old implementation. Safe implementations will be generated for cases where vtkMappedDataArray is used, ensuring that only the proper subset of the array API is called from the worker function. The pointer-like semantics and element traversal of vtkTypedDataArrayIterator allow the same template function to be used in both cases. This macro is defined in vtkDataArrayIteratorMacro.h.