ID: 0015632
Project: ParaView
Category: (No Category)
View Status: public
Date Submitted: 2015-08-10 15:38
Last Update: 2015-11-04 20:57
Reporter: Alan Scott
Assigned To: Utkarsh Ayachit
Priority: normal
Severity: minor
Reproducibility: have not tried
Status: closed
Resolution: fixed
Platform:
OS:
OS Version:
Product Version: git-master
Target Version: 5.0
Fixed in Version: 5.0
Summary: 0015632: Exodus reader is crazy slow reading time data
Description: The Exodus reader is crazy slow reading datasets with many timesteps, many files, and many variables. The reason is that we are reading redundant data. Within the Exodus reader, we are basically doing the following:

for every file()
    for all timesteps
        read the floating point time for each timestep as a double.

Now, if we have a small number of cells (for instance, a million) and a lot of variables, the time value for each timestep is spread out quite far within each file. This forces the disk to read a multi-million byte disk block for each time value read. This is crazy expensive.
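To make the "spread out quite far" point concrete, here is a minimal C++ sketch of where the time values land under a simplified layout in which each timestep is stored as one record: the time value first, then that step's variable data. The function name, the layout, and the sizes in the usage note are assumptions for illustration, not taken from the Exodus II spec or the reader code.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: byte offset of each time value when every timestep
// is one fixed-size record (simplified layout, assumed for illustration).
std::vector<std::uint64_t> timeValueOffsets(std::uint64_t firstRecordOffset,
                                            std::uint64_t recordBytes,
                                            std::uint64_t numSteps)
{
    std::vector<std::uint64_t> offsets;
    offsets.reserve(numSteps);
    for (std::uint64_t step = 0; step < numSteps; ++step)
    {
        // Consecutive time values sit one whole record apart, so each
        // 8-byte read lands in a different disk block.
        offsets.push_back(firstRecordOffset + step * recordBytes);
    }
    return offsets;
}
```

With an assumed record size of 160,000,000 bytes (a million cells times 20 variables times 8 bytes), consecutive time values would sit 160 MB apart, so no two of them share a disk block.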

ParaView reads an example dataset in 11 minutes 10 seconds. EnSight reads the same dataset (reading in the last time step) in 14 seconds.

From one of the architects of the Exodus II spec, Greg Sja...,

If there is a spatially decomposed (file-per-processor) set of files, then all files should have the same number of time steps and the same times for those time steps. If the set of files is written by a sierra application or one that uses the IOSS library, then in addition, there will be an attribute in the file called “last_written_time” which should be the time of the timestep which was written and flushed to disk. If, for example, the code crashed while writing a timestep, then the last_written_time would be less than the maximum time on the database.

So, what I would propose is that we read in the time data from one file, and pass this time data around to the data structures from the other files. Reads are amazingly expensive compared to just passing data between objects/processes.
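The proposal above can be sketched roughly as follows. This is a stand-alone C++ illustration, not the vtkExodusIIReader API: FileReader, loadTimesFromDisk, adoptTimes and loadTimesShared are hypothetical names, and the disk read is simulated with made-up time values.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a per-file reader (not the VTK class).
struct FileReader
{
    std::string path;
    std::vector<double> times;
    bool timesLoaded = false;

    // Simulates the expensive path: one scattered read per timestep.
    // Returns the number of simulated disk reads.
    std::size_t loadTimesFromDisk(std::size_t numSteps)
    {
        times.resize(numSteps);
        for (std::size_t i = 0; i < numSteps; ++i)
        {
            times[i] = 0.1 * static_cast<double>(i); // placeholder values
        }
        timesLoaded = true;
        return numSteps;
    }

    // Cheap path: copy time values that another reader already read.
    void adoptTimes(const std::vector<double>& shared)
    {
        times = shared;
        timesLoaded = true;
    }
};

// Read the time values from the first file only, then pass them on to the
// readers for the remaining files. Returns the simulated disk read count.
std::size_t loadTimesShared(std::vector<FileReader>& readers, std::size_t numSteps)
{
    if (readers.empty())
    {
        return 0;
    }
    const std::size_t reads = readers.front().loadTimesFromDisk(numSteps);
    for (std::size_t i = 1; i < readers.size(); ++i)
    {
        readers[i].adoptTimes(readers.front().times);
    }
    return reads;
}
```

For 256 readers and 500 timesteps, this sketch does 500 simulated reads in total instead of 256 x 500. It relies on the statement quoted above that all files in a spatially decomposed set have the same time steps.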


I am giving the latest Exodus II spec to Utkarsh. It may be passed into the public domain.
Tags: No tags attached.
Project: Sandia
Topic Name:
Type: incorrect functionality
Attached Files:

 Relationships

  Notes
(0035253)
Alan Scott (manager)
2015-10-01 17:16

Utkarsh asked for more detail. Here are my notes. You should be able to see this by putting a break in vtkExodusIIReader::RequestInformation. Use 32 file WhippleShield, or 8 file zpinch.


For a 256 file, 256 time step, 20 variable, .5 million cell dataset,

For (all files (this process))
{
  vtkExodusIIReader::RequestInformation
    this->Metadata->RequestInformation();
      this->UpdateTimeInformation()   (vtkExodusIIReader::UpdateTimeInformation())
        ex_get_all_times(Exoid, this->Times[])
          nc_get_var_double(time_values[])
            NC_get_var(value[], DOUBLE)
              NC_get_vara(value[], type)
                get_vara(value[],type)
                  for(all timesteps)
                  {
                    readNCv(*valueThisTimestepDouble)
                      getNCvx_double_double(*valueThisTimestepOneDouble)
                        get(*valueThisTimestepOneDouble)
                          px_pgin(*valueThisTimestepOneDouble)
                          read(*valueThisTimestepOneDouble)
                  }
}

Load time local server, Linux – 11 minutes 10 seconds.
(0035282)
Alan Scott (manager)
2015-10-13 20:39

I have a dataset that I will give you. It is exactly the same as my user's, but has fewer time steps. It has under a million cells, 256 files, 20-ish variables, and 16 timesteps. The original actually has 500 timesteps.

File name: slicex.e.256.xxx

The primary issue is that we are finding all of the time information in every single file, necessitating reading in all files start to finish.

In file vtkExodusIIReader.cxx, vtkExodusIIReader::RequestInformation(), put a break at line number 5318, which looks like this: this->Metadata->RequestInformation(). Now, run to this point, and continue a half dozen times. We make this call for every file (256 of them in this case).

Now, step in. You are now in vtkExodusIIReaderPrivate::RequestInformation() (also in vtkExodusIIReader.cxx). Step to line 3920, which looks like this: VTK_EXO_FUNC( this->UpdateTimeInformation(), "" ); We now update time information for every single file, although by definition, all files will be the same.

Open file putget.c. Go to line 5232. This is in NC3_get_vara(). This code looks like this:

    while(*coord < *upper)
     {
<<snip>>

         odo1(start, upper, coord, &upper[index], &coord[index]);
     }
 

This while loop is now stepping completely through this file, searching for and reading every time stamp in the file, stepping over the variable data. So every time stamp costs a seek in the file. Further, very high performance disk systems will read not just the double of the time, but rather the whole cache line (ballpark a million bytes). Thus, you read the whole file every time.
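As a rough back-of-the-envelope check on the cost described above, the following C++ sketch compares a worst-case estimate of the bytes pulled from disk with the bytes actually needed. The function names are hypothetical, the one-million-byte block size is the ballpark figure from this note, and the estimate is only an upper bound: it assumes every time value costs one full block read, and in practice it is capped by the size of the files.

```cpp
#include <cstdint>

// Upper-bound estimate: every time value in every file costs one full
// block read (assumption for illustration, not a measurement).
std::uint64_t worstCaseBytesRead(std::uint64_t numFiles,
                                 std::uint64_t numSteps,
                                 std::uint64_t blockBytes)
{
    return numFiles * numSteps * blockBytes;
}

// Bytes actually needed: one 8-byte double per timestep, from one file.
std::uint64_t neededBytes(std::uint64_t numSteps)
{
    return numSteps * 8;
}
```

For 256 files, 500 timesteps and a 1,000,000-byte block, the upper bound is 128,000,000,000 bytes, against 4,000 bytes of time values that are actually needed.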

If you start the debugger going (no breaks), and keep hitting halt, you always end up with this odo1 call, inside of the while loop, as the hit point.

Ping me if I was not clear.
(0035284)
Utkarsh Ayachit (administrator)
2015-10-14 10:06

I think I am able to reproduce this. While I can hack together a solution that avoids re-reading time information, there's a bigger problem with the design of the Exodus reader. Currently, for datasets with multiple files for spatial partitions, we use the vtkPExodusIIReader, which internally creates a vtkExodusIIReader for each of the files. In vtkPExodusIIReader::RequestData(), it calls vtkExodusIIReader::RequestInformation() and vtkExodusIIReader::RequestData() for each of the files. This is what you're seeing in your debugging.

While I can make vtkPExodusIIReader pass the time information from the first file's reader to the others, what about the other information that it reads, e.g. the variables available? Isn't that going to cause a slowdown too? Maybe not as much, since it may not need to skip through the entire file to determine that information. As a first pass, I am going to just try to share time information. Let me know if things are not fast enough and we can see how we can share other metadata too.
(0035288)
Alan Scott (manager)
2015-10-14 13:02

YESSSSSSS!!!!! Yes, that did it. Going to the original dataset (500 time steps), Linux, serial, debug build, we just went from a read time of about 11 minutes 10 seconds to about 15 seconds. The second time through (data still in cache), it read the whole dataset in 8 seconds.

I tried the example I gave you (16 time steps); it read in 15 seconds.

OK, I had to check. Production 4.4.0, release build,
16 timesteps, 60 seconds
500 time steps, 4 minutes.
(0035289)
Utkarsh Ayachit (administrator)
2015-10-14 13:51

https://gitlab.kitware.com/vtk/vtk/merge_requests/769
(0035290)
Utkarsh Ayachit (administrator)
2015-10-14 16:09

https://gitlab.kitware.com/paraview/paraview/merge_requests/428
(0035418)
Alan Scott (manager)
2015-11-04 20:57

Great!!

Tested master, Linux, local server, debug build, OGL2. Took about 25 seconds the first time, 10 seconds the second time. 500 timesteps, 256 files, 625,000 cells.

 Issue History
Date Modified Username Field Change
2015-08-10 15:38 Alan Scott New Issue
2015-08-10 15:39 Alan Scott Target Version => 4.5
2015-09-11 16:43 Utkarsh Ayachit Target Version 4.5 => 5.0
2015-10-01 17:16 Alan Scott Note Added: 0035253
2015-10-13 20:39 Alan Scott Note Added: 0035282
2015-10-14 10:06 Utkarsh Ayachit Note Added: 0035284
2015-10-14 13:02 Alan Scott Note Added: 0035288
2015-10-14 13:51 Utkarsh Ayachit Note Added: 0035289
2015-10-14 13:51 Utkarsh Ayachit Status backlog => gatekeeper review
2015-10-14 13:51 Utkarsh Ayachit Resolution open => fixed
2015-10-14 13:51 Utkarsh Ayachit Assigned To => Utkarsh Ayachit
2015-10-14 16:09 Utkarsh Ayachit Note Added: 0035290
2015-10-15 08:07 Utkarsh Ayachit Status gatekeeper review => customer review
2015-10-15 08:07 Utkarsh Ayachit Fixed in Version => git-master
2015-10-28 09:29 Utkarsh Ayachit Fixed in Version git-master => 5.0
2015-11-04 20:57 Alan Scott Note Added: 0035418
2015-11-04 20:57 Alan Scott Status customer review => closed