VTK/Multicore Vision


The goal of this document is to provide an overview of the vision that the working team (SCI, Sandia, Kitware) has for the shared memory parallelism improvements to VTK.

As a first step, we will work on collecting use cases that we want to address. This will help us to scope what we will focus on.

Some high-level use cases are listed below.

1. Direct acceleration of algorithms

This is the classical approach VTK uses to take advantage of shared-memory parallel architectures. A few of the scientific visualization algorithms and many of the imaging algorithms can run multi-threaded. The input data structures are usually not pre-partitioned; instead, all threads share them. For algorithms that cannot stream, this is the only way to make use of multiple cores. Examples:

  • Streamlines
  • Some graph algorithms

We currently use vtkMultiThreader to manage multiple threads. Are we interested in investigating OpenMP, Intel TBB, or GPU parallelism?

  • Pros:
    • Does not require streaming
  • Cons:
    • Filters need to change
    • Needs thread safe data structures
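
As an illustration of this approach, the following is a minimal sketch that uses only the C++ standard library rather than vtkMultiThreader. The helper name ParallelMap is hypothetical and not part of VTK. It shows the pattern described above: the input is not pre-partitioned, every thread reads the same shared array, and each thread writes to its own disjoint slice of the output so that no locking is needed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper (not a VTK API): apply `op` to every element of a
// shared, read-only input by splitting the index range across threads.
template <typename Op>
std::vector<double> ParallelMap(const std::vector<double>& input,
                                unsigned numThreads, Op op)
{
  assert(numThreads > 0);
  const std::size_t n = input.size();
  // Ceiling division so that every element is covered.
  const std::size_t chunk = (n + numThreads - 1) / numThreads;

  std::vector<double> output(n);
  std::vector<std::thread> workers;
  for (unsigned t = 0; t < numThreads; ++t)
  {
    const std::size_t begin = std::min(n, t * chunk);
    const std::size_t end = std::min(n, begin + chunk);
    if (begin == end)
    {
      break; // fewer elements than threads; nothing left to assign
    }
    // Each thread reads the shared input and writes a disjoint output
    // range, so the output needs no synchronization.
    workers.emplace_back([&input, &output, begin, end, op]() {
      for (std::size_t i = begin; i < end; ++i)
      {
        output[i] = op(input[i]);
      }
    });
  }
  for (std::thread& w : workers)
  {
    w.join();
  }
  return output;
}
```

A call such as ParallelMap(values, 4, [](double x) { return 2.0 * x; }) would double every value using up to four threads. The sketch sidesteps the "thread safe data structures" concern only because the output ranges never overlap; an algorithm that appends to a shared structure would need more care.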

2. Pipeline parallelism

We informally refer to pipeline parallelism as the execution model where the executive is responsible for distributing tasks to filters that are (possibly) running on multiple threads. This requires that the executive analyze the pipeline to figure out the task distribution and assign tasks in a way that makes use of multiple cores. This sort of parallelism is closely tied to streaming. Without streaming, only branching pipelines can be parallelized, and almost no scalability is achieved.

  • Pros:
    • Automatic performance improvement for all filters
  • Cons:
    • It is only useful when streaming
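
To make the idea concrete, here is a minimal sketch of an executive that streams pieces through an unchanged chain of filters, one task per piece. It is written against the C++ standard library only; the names Piece, Filter, and ExecuteStreamed are hypothetical and do not correspond to the VTK executive classes.

```cpp
#include <cassert>
#include <functional>
#include <future>
#include <vector>

// Hypothetical stand-ins: a piece of data and a serial, unmodified filter.
using Piece = std::vector<double>;
using Filter = std::function<Piece(const Piece&)>;

// Hypothetical executive: runs the whole filter chain on each piece as an
// independent asynchronous task. The filters themselves stay serial.
std::vector<Piece> ExecuteStreamed(const std::vector<Piece>& pieces,
                                   const std::vector<Filter>& pipeline)
{
  std::vector<std::future<Piece>> tasks;
  for (const Piece& piece : pieces)
  {
    // The piece is copied into the task; the pipeline is only read.
    tasks.push_back(std::async(std::launch::async, [&pipeline, piece]() {
      Piece current = piece;
      for (const Filter& filter : pipeline)
      {
        current = filter(current);
      }
      return current;
    }));
  }

  // Collect results in the original piece order.
  std::vector<Piece> results;
  for (std::future<Piece>& task : tasks)
  {
    results.push_back(task.get());
  }
  return results;
}
```

The parallelism here comes entirely from the number of pieces, which is why the approach depends on streaming: with a single piece, the sketch degenerates to serial execution. A real executive would also bound the number of concurrent tasks instead of launching one per piece.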

3. Streaming

  • All types of data:
    • Structured data
    • Unstructured data
    • Text (documents)
    • ...
  • How do we break the data?
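
For structured data, one possible answer is to split an index range along one axis into nearly equal contiguous pieces. The sketch below assumes inclusive index ranges; Range and SplitRange are hypothetical names, not VTK APIs. It does not address ghost or boundary data shared between neighbouring pieces, and unstructured data or text would need a different strategy.

```cpp
#include <cassert>
#include <vector>

// Hypothetical inclusive index range along one axis of a structured dataset.
struct Range
{
  int first;
  int last;
};

// Split `whole` into at most `numPieces` contiguous, non-overlapping ranges
// whose sizes differ by at most one. Fewer pieces are returned when the
// range holds fewer indices than `numPieces`.
std::vector<Range> SplitRange(Range whole, int numPieces)
{
  assert(numPieces > 0);
  assert(whole.first <= whole.last);

  const int count = whole.last - whole.first + 1;
  const int base = count / numPieces;  // minimum size of each piece
  const int extra = count % numPieces; // the first `extra` pieces get one more

  std::vector<Range> pieces;
  int start = whole.first;
  for (int i = 0; i < numPieces && start <= whole.last; ++i)
  {
    const int size = base + (i < extra ? 1 : 0);
    pieces.push_back(Range{ start, start + size - 1 });
    start += size;
  }
  return pieces;
}
```

For example, splitting the range 0 to 9 into three pieces yields 0 to 3, 4 to 6, and 7 to 9.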

4. How do we bring (1), (2) and streaming together?

  • Do algorithms need to provide more meta-data?
    • I can work with pieces
    • I need the whole thing
    • I scale up to N processors
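
One way to picture such meta-data is as a small set of traits that each algorithm reports and that the executive uses to choose between streaming pieces (2) and threading inside the filter (1). The sketch below is purely illustrative: AlgorithmTraits, Strategy, Plan, and ChoosePlan are hypothetical names, and the decision rule is an assumption, not a design the team has agreed on.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical meta-data an algorithm could report to the executive.
struct AlgorithmTraits
{
  bool canProcessPieces; // "I can work with pieces" (false: needs the whole thing)
  int maxUsefulThreads;  // "I scale up to N processors"
};

enum class Strategy
{
  StreamPieces,       // pipeline parallelism over pieces
  ThreadInsideFilter, // direct acceleration on the whole input
  Serial
};

struct Plan
{
  Strategy strategy;
  int threads;
};

// Assumed decision rule: prefer streaming when the algorithm accepts
// pieces; otherwise thread inside the filter up to its stated limit.
Plan ChoosePlan(const AlgorithmTraits& traits, int availableCores)
{
  assert(availableCores > 0);
  if (traits.canProcessPieces)
  {
    return Plan{ Strategy::StreamPieces, availableCores };
  }
  const int threads = std::min(traits.maxUsefulThreads, availableCores);
  if (threads > 1)
  {
    return Plan{ Strategy::ThreadInsideFilter, threads };
  }
  return Plan{ Strategy::Serial, 1 };
}
```

Under this rule, an algorithm that needs the whole input and reports a limit of four threads would be given four threads on an eight-core machine, while one that accepts pieces would be streamed across all eight cores.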