vtkExecutionScheduler

From KitwarePublic


vtkExecutionScheduler is responsible for scheduling and distributing computing resources (threads) to pipeline modules. When an executive is asked to execute its algorithm, instead of performing the computation right away, it queues the execution with the scheduler. Depending on the number of available threads, the scheduler then executes the algorithm at the appropriate time and in dependency order.

Here is a map of how vtkExecutionScheduler interacts with the other classes:


Each algorithm can define the set of configurations it can or would like to execute on by specifying its computing resources (the number of threads) in a vtkComputingResource and setting it as the value of the COMPUTING_RESOURCES() key in the algorithm's meta-information. The value can be either a range or a list of configurations. At run time, the scheduler uses this information to perform load balancing and assign resources to the module.

[Image: ThreadedStreamingSchedulerPriority.png]

Flow Dependency and Execution Order

When executing modules of a network simultaneously, a simple FIFO queue of modules does not guarantee the execution order and data dependencies of the pipeline. To avoid this, we use a priority queue with the module's topological order encoded into the keys. This also ensures that there are no repeated updates for a single request in the Push() model. In the case of the pipeline above, a regular FIFO queue would push modules (1), (2), (3), (4), (5) and (8), so (8) might be executed twice; with the priority queue, when (6) is pushed in, it is executed before (8), and the execution of (8) is postponed until after (7).

However, this still causes problems with streaming, where upstream modules may process a later piece of data yet get executed before downstream modules holding an earlier piece, because of their topological order. For example, suppose (1) is a streaming data reader that has processed its first data piece and passed it down to the contour filters (2) and (5), which are now on the scheduling queue. The reader may then want to read the second piece, and thus puts itself on the queue again. Because (1) has a lower topological order than (2) and (5), it gets executed again first. This is not correct.

Our solution is to use not only the topological order as the priority key but also the data block number, e.g. the streaming piece. We use a 64-bit key on the priority queue, with the high 32 bits corresponding to the data block number and the low 32 bits to the topological order. Internally, if modules do not specify a data block number when they push an execution onto the queue, a global counter is used. With this approach, the scheduler executes as far down the pipeline on a single data block as possible before processing the next piece.

Data Dependency

When a module executes and pushes its data downstream, how soon it can execute again is determined by whether its next execution could overwrite its output before the downstream modules are done processing it. This is the concurrent read-write problem in pipeline parallelism. To tackle it, we allow locking of module output connections. For example, whenever a module performs a Push() operation, it can set a lock on its outputs and wait until all of its downstream modules signal that the outputs have been processed. A locked module will not be made active by the scheduler, which may waste resources if the number of modules is not much larger than the number of threads. The implementation is simply a vtkMutexLock with a count of the connections to be locked.

For the default VTK behavior, Push() locks with its number of output connections. Whenever a module is done executing (with a Push request), it releases its input (i.e. the upstream output) connections. This implicitly causes the modules to be updated in an order such that no two modules immediately connected to each other are updated at the same time.

This behavior can be completely overridden for custom pipelines. Any module that wishes to unlock its upstream modules can call ReleaseInputs(). Below is an example in which a module unlocks its upstream modules as soon as it is done reading the data (instead of when it is done processing the data):

void RequestUpdateData(...) {
  // copying input data
  ...
 
  this->GetExecutive()->ReleaseInputs();
 
  // processing the copied data
  ...
}

Similarly, if a module does not wish to lock its outputs (e.g. it just writes to a dedicated memory area that depends only on the input), it can call vtkThreadedStreamingPipeline::PushNoLock(), which has the same syntax as Push() but does not lock any port.

Resource Distributing Strategy

Right now the strategy has not received much attention and is rather simple. On the first run, every module receives the minimum number of threads, usually 1, unless its COMPUTING_RESOURCES() specifies otherwise. As each module runs, vtkThreadedStreamingPipeline measures the time its algorithm spends in REQUEST_UPDATE_DATA(). Then, once a complete data block has been processed, the scheduler collects this information and splits the number of threads on each upstream branch by the ratio of its running time.