Performance Report for 2004: HDF Support for the ESDIS Project and the EOSDIS Standard Data Format




3.3 Evolve HDF5 library and tools


The following utility and workstation tool development tasks were identified as important in the past year. They were prioritized and implemented as resources allowed.

Subtask

Status

3.3.1 Format features


With HDF5, we believe we have developed a basic format structure that can stand the test of time, but extensions to the format will almost certainly be needed, and software that accesses HDF5 data will also change. New features that will likely be added include new forms of storage (e.g., new data compression schemes) and new data models, such as indexing schemes to support better search and retrieval.

In 2004, NCSA will also



  • Add Szip compression support in new releases of HDF4 and HDF5 with encoder optionally disabled.

  • Add support for dimension scales in HDF5.

  • Investigate and prototype new compression methods that show promise for the EOS community.

Szip. This year we resolved almost all of the issues with SZIP, and released updated SZIP libraries at the same time as HDF4 and HDF5 were updated.

  • The SZIP library was modified to make it possible to create a version of the library with the encoder disabled. This version has no license restrictions.

  • The HDF5 library was modified to dynamically detect whether the SZIP encoder is available. This makes it easier for tools and libraries to serve both users who have the encoder and users who do not.
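As an illustration of how a tool might act on this run-time detection, the following Python sketch interprets a filter configuration bit mask of the kind the HDF5 C library reports through H5Zget_filter_info. The two bit values match the HDF5 public header; the helper names and the choice of gzip as a fallback are assumptions made for this example and are not part of the library.

```python
# Bit values of HDF5's filter configuration flags (H5Zpublic.h).
ENCODE_ENABLED = 0x0001  # H5Z_FILTER_CONFIG_ENCODE_ENABLED
DECODE_ENABLED = 0x0002  # H5Z_FILTER_CONFIG_DECODE_ENABLED


def szip_capabilities(filter_config_flags):
    """Interpret a filter configuration bit mask (hypothetical helper)."""
    return {
        "can_read": bool(filter_config_flags & DECODE_ENABLED),
        "can_write": bool(filter_config_flags & ENCODE_ENABLED),
    }


def choose_compression(filter_config_flags):
    """Use SZIP for writing only when the encoder is present.

    The gzip fallback is an assumption for illustration.
    """
    caps = szip_capabilities(filter_config_flags)
    return "szip" if caps["can_write"] else "gzip"
```

With a decoder-only SZIP build, such a tool could still read SZIP-compressed data while writing new data with a different filter.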

Details on Szip and HDF are available at http://hdf.ncsa.uiuc.edu/doc_resource/SZIP/.

We continued to study the requirements to support dimension scales in HDF5. This work is still in progress, but is expected to see a phase 1 implementation in 2005.

A Fortran API was added to the HDF5 High Level API, and will be released by the end of 2004.

In collaboration with Unidata, we will implement a new compression method in HDF5 analogous to the “scale+offset” compression used in GRIB.
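The general idea behind a scale+offset scheme can be sketched as follows. This Python example is not the HDF5 or GRIB implementation; it is a minimal illustration, assuming a non-empty list of floating-point values and a caller-chosen number of decimal digits to preserve. The function names are hypothetical.

```python
def scale_offset_encode(values, decimal_digits):
    """Quantize floats to small non-negative integers.

    Each value is scaled by 10**decimal_digits and rounded, then the
    minimum is subtracted so that only the range of the data must be stored.
    Assumes `values` is non-empty.
    """
    scale = 10 ** decimal_digits
    scaled = [round(v * scale) for v in values]
    offset = min(scaled)
    packed = [s - offset for s in scaled]
    bits_needed = max(packed).bit_length()  # bits per value after packing
    return packed, offset, scale, bits_needed


def scale_offset_decode(packed, offset, scale):
    """Recover the values to the precision chosen at encode time."""
    return [(p + offset) / scale for p in packed]


temperatures = [271.35, 271.40, 272.10, 270.95]
packed, offset, scale, bits_needed = scale_offset_encode(temperatures, 2)
restored = scale_offset_decode(packed, offset, scale)
```

The saving comes from storing each value in `bits_needed` bits rather than a full 32-bit or 64-bit float, at the cost of precision beyond the chosen number of digits.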


3.3.2 High performance computing


New high performance computing architectures and other HPC developments are also certain to require changes in the HDF5 library, and perhaps also changes in the format. The likely transition to Linux cluster computing will place demands on HDF5 that will need to be addressed. Thread safety has been identified as an important feature of EOS software. We will need to determine what this means for HDF5 and what can be done with HDF5 to support multithreaded applications. Also, performance testing is a valuable way to discover ways to improve the performance of the HDF5 library, and also to identify strategies that applications can use to improve their I/O performance.

In 2003, NCSA will investigate the requirement that one application be able to write data while another application reads the data. This is a feature that several people have requested. It may be possible to create a special file system driver to support this kind of operation. This would be a software development project, so an implementation would require extra resources.



The new “Flexible Parallel HDF5” API will be completed in early 2004. NCSA will carry out testing with this interface and report results as they might benefit the EOS community.

NCSA has invested considerable resources into the achievement of high I/O performance in both serial and parallel computing environments. Most of this work has been supported by NCSA, NSF, and DOE sponsorship, but its benefits will be very valuable to the EOS community as it embraces new high performance computing technologies. This included the following activities.

  • The WRF (Weather Research and Forecasting) model was adapted to use HDF5, including parallel HDF5. Performance studies revealed valuable information on how to run such simulations, and demonstrated that substantial performance improvements could be achieved on parallel platforms. (http://hdf.ncsa.uiuc.edu/apps/WRF-ROMS/).

  • The same project also adapted the ROMS model to use parallel I/O with the experimental parallel version of netCDF from Argonne National Laboratory. Results demonstrated the value of being able to do parallel I/O with netCDF and foreshadow improvements in the next version of netCDF, which will be built on top of HDF5.

  • The HDF5 team plays a central role in the NSF TeraGrid Project, HDF5 being one of the key technologies used by applications on the computational grid.

  • An initial implementation of a feature called “Flexible Parallel HDF5” (FPHDF5) was completed and released in June. FPHDF5 simplifies the programming model for situations where many processors access a single file or dataset. Subsequent performance testing revealed performance bottlenecks, and additional research is planned to investigate new models for doing parallel I/O in HDF5. (http://hdf.ncsa.uiuc.edu/Parallel_HDF/PHDF5/FPH5/)



3.3.3 Tools development


This work includes a continuing emphasis on supporting HDFView and the HDF4 to HDF5 transition. Specific work in 2004 includes:

  • Maintenance and periodic releases

  • Investigate the possibility of creating an HDF web browser plug-in, based on the modular version of HDFView.

  • Continue to work with ECS to create an HDF-EOS module for HDFView, making it possible to meaningfully inspect HDF-EOS files.

  • Add dimension scales to all HDF5 tools when they become available.

  • Complete first release of the hrepack utility

  • Address the need for tools to be able to work with datasets that cannot fit in core.

  • Conduct a major review and revision of the DDL for h5dump. As resources permit, update and upgrade the h5dump utility to implement many user requests, recent features, etc.

  • Add requested enhancements to h5diff and h4diff utilities.




There were releases of the Java tools in March and October. A number of new features were added to HDFView:

  1. Modular HDFView. Modular HDFView is an improved HDFView where I/O and GUI components are replaceable modules. The current replaceable modules include: File I/O, Image view, Table view (a spreadsheet-like layout), Text view, Metadata (metadata and attributes) view, Tree view, and Palette view.

  2. User’s Guide on How to Implement HDFView Modules, http://hdf.ncsa.uiuc.edu/hdf-java-html/hdfview/ModularGuide

  3. Seamless plug-in for user modules. HDFView will automatically detect and load user's GUI and I/O modules. There is no need to register new modules.

  4. Ability to open remote files with a URL such as http://hdf.ncsa.uiuc.edu/hdf-java-html/hdf5_test.h5, available at ftp://ftp.ncsa.uiuc.edu/HDF/files/hdf5/hdf-java/hdf5_test.h5.

  5. "HelpView" interface. Users can implement their own help view.

  6. Quick view of metadata.

  7. Ability to create an empty HDF5 compound dataset.

  8. Ability to add and modify user blocks.

  9. Support for netCDF and FITS file formats (read-only).

  10. Animated display of 3D images.

  11. Display the HDF5 user block in text, octal, hex and other formats.

  12. Option to show char datasets as text.

  13. Display of nested compound datasets.

  14. Display of multi-dimension compound datasets.
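The user block features in the list above rely on the fact that the HDF5 format signature may appear at byte offset 0 or at 512, 1024, 2048, and so on, with everything before it being the user block. The Python sketch below locates the user block in a byte string and renders it in several formats. It is an illustration only and not HDFView's code; the function names are hypothetical.

```python
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"  # 8-byte HDF5 format signature


def find_user_block_size(data):
    """Return the user block size in bytes, or None if no signature is found.

    Candidate offsets are 0, 512, 1024, 2048, ... (doubling each time).
    """
    offset = 0
    while offset + len(HDF5_SIGNATURE) <= len(data):
        if data[offset:offset + len(HDF5_SIGNATURE)] == HDF5_SIGNATURE:
            return offset
        offset = 512 if offset == 0 else offset * 2
    return None


def format_user_block(block, mode="hex"):
    """Render user block bytes as text, octal, or hex (illustrative only)."""
    if mode == "text":
        return block.decode("ascii", errors="replace")
    if mode == "octal":
        return " ".join(f"{b:03o}" for b in block)
    return " ".join(f"{b:02x}" for b in block)
```

For a real file, `data` could be the first few kilobytes read in binary mode; a user block larger than the bytes read would simply not be found.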

HDFView was ported to two additional platforms: AIX & OSF1.

We collaborated closely with the ECS team to develop plug-ins to support HDF-EOS2, HDF-EOS5, and to display HDF-EOS objects. We are working with ECS to develop a distribution plan.

The hdiff and hrepack utilities were high-priority requests from DAAC developers. Enhancements to h5diff and h5dump were a high-priority request from the HIRDLS SIP. We have received both bug reports and positive feedback from NASA users.

A number of bugs were fixed and a few features were added to the HDF tools:



  • h5diff: added support for all types of data, including compound types, and introduced new display options:

  • Normal mode: print the number of differences found and where they occurred.

  • Report mode: print the above plus the differences.

  • Verbose mode: print the above plus a list of objects and warnings.

  • Quiet mode: do not print output.



  • h5dump: included new features of the format and library, and added new options for displaying information:

  • Print dataset filters, storage layout and fill value information.

  • Print a list of the file contents.

  • Escape non-printing characters.

  • Print the contents of the super block.

  • Print array indices with the data (the default).

  • A new tool, h5jam, was created to add a user block to (or remove one from) an HDF5 file.
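The relationship among the h5diff display modes listed above can be illustrated with a small sketch. The Python code below compares two in-memory mappings of dataset names to value lists; it is not h5diff itself, its output format is invented for the example, and it ignores length mismatches for brevity.

```python
def diff_datasets(a, b, mode="normal"):
    """Compare two {name: list_of_values} mappings.

    Returns (number_of_differences, report_lines). The modes mirror the
    ones described in the text: normal, report, verbose, quiet.
    """
    lines = []
    total = 0
    for name in sorted(set(a) | set(b)):
        if mode == "verbose":
            lines.append(f"object: {name}")
        if name not in a or name not in b:
            if mode == "verbose":
                lines.append(f"warning: {name} is present in only one file")
            continue
        diffs = [(i, x, y)
                 for i, (x, y) in enumerate(zip(a[name], b[name]))
                 if x != y]
        total += len(diffs)
        if diffs and mode != "quiet":
            # Normal mode: how many differences and in which object.
            lines.append(f"{name}: {len(diffs)} differences")
            if mode in ("report", "verbose"):
                # Report mode adds the differing values themselves.
                lines.extend(f"  [{i}] {x} != {y}" for i, x, y in diffs)
    return total, lines
```

Each mode adds output to the one before it, while quiet mode returns only the count, which suits scripted use where just the result matters.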

We investigated the web-browser plug-in technology and published a report at:

http://hdf.ncsa.uiuc.edu/RFC/Web-Plugin/hdf_plugin_report.pdf

A demo version of the HDF web-browser plug-in for Windows (based on Microsoft ActiveX) was completed. The first release is scheduled for mid-December 2004.


3.3.4 Investigating new data management technologies including XML


In 2004, the following activities will be explored as time and resources permit

  • Investigate and implement ‘modular’ HDF5 XML Schema. This has been requested by NASA contractors.

  • Investigate and implement a Web browser plug-in for HDF5 (and possibly HDF4). Ideally, this will reuse the HDFView Java classes to provide a subset of HDFView features in a plug-in, and be extensible by others, e.g., with HDF-EOS plug-ins. This task will require investigation to determine what can be done and what features the plug-in should have. There is a risk that it may not be feasible, or that we may not be able to reuse existing code.

  • Implement an XML reader for the Java HDFView. This can be used to create a separate ‘h5gen’ tool, as well.

  • If resources permit, develop full XML support for HDF4, similar to the support for HDF5. This work includes:

  1. Create an XML Schema for HDF4.

  2. Create a tool to convert HDF4 to XML, e.g., an h4toxml dumper

  3. Create a tool to convert XML to HDF4. This might be implemented as a Java module to be used in the HDFView tool.

Because of other priorities, this task received less attention in 2004 than others.

Investigation of the HDF5 XML schema indicated that no change is required to make it ‘modular’. However, several EOS tools that use DTDs will need to be updated to use XML schema.

As noted above, a major effort was made to develop a web-browser plug-in for MS Internet Explorer. As part of this work, we investigated a number of technologies, as described in the report at:

http://hdf.ncsa.uiuc.edu/RFC/Web-Plugin/hdf_plugin_report.pdf.

3.3.5 General performance enhancements


HDF5 performance is expected to become increasingly important as EOS users migrate to this new format. Most HDF5 performance work is sponsored by the DOE HDF5 community, but the following needs have been identified by the EOS user community:

  • Add a suite of routines for benchmarking sequential access performance.

  • Improve the benchmarking capabilities of the PIO (parallel I/O) benchmark suite.

  • Address issues that arise from benchmarking studies.




HDF5 performance enhancement was a major focus in 2004. A number of modifications were made to the library to improve I/O performance for very large datasets and for files with large numbers of datasets or groups.

In working with DAACs and the ECS on HDF-EOS performance concerns, we discovered some ways that the HDF-EOS library could be changed to improve performance. This work ultimately led to a proposal for NCSA to work with the ECS and certain DAACs to study performance profiling. This work will take place in 2005.

Development of the h5repack and h5diff tools also proved valuable in the performance area. Locally, and among certain DAACs, these tools revealed I/O bottlenecks in the HDF5 library when accessing many objects in a large, complex HDF5 file. Subsequent tuning of the HDF5 library resulted in major performance gains.

3.3.6 Investigate “HDF-GEO” standard and API


At the HDF-EOS Workshop VII, a number of participants recommended that NCSA investigate the possibility of defining simple geospatial data types in HDF5. As resources permit, we will investigate this possibility in 2004.

NCSA continued its work in support of handling geospatial data in HDF5. This included a project funded by the National Archives and Records Administration (NARA) that involved converting raster and vector data to HDF5. In turn, this work has led to discussions with ESRI and others about ways to support HDF5-based geospatial data in GIS applications.

In a special session at the October HDF-EOS Workshop VIII, possible directions for HDF-GEO were discussed, and this is expected to lead to further investigation of this very promising approach.


3.3.7 Address issues of sustainability (New)


It is important that EOS data be available and usable for many decades into the future. One likely scenario for addressing this requirement is to sustain the HDF4 and HDF5 software and support for many more years. NASA’s continued support of the project has been, and will continue to be, critical to the sustainability of HDF, but it is also important that the organization and institutional context of the project be sustainable. During the coming year, the NCSA team will address the need to continue the HDF5 project over time by identifying those aspects of the project that must continue and seeking mechanisms by which sustainability can be ensured.

We will seek involvement from NASA representatives in this process. We will also engage other agencies and projects that similarly rely on continuing future support for HDF, such as the NPOESS project. Other agencies and projects that could be important contributing users of HDF include the National Weather Service and the meteorological modeling community, and we will seek to reach out to them.



The HDF Sustainability Working Group completed a mission statement and began work on a business plan for creating sustainable institutional support of HDF5. The approach that was arrived at is to create a non-profit institution whose mission is to sustain and support HDF technologies. This approach was discussed with NASA, NPOESS and others that rely on HDF, and received approval from all quarters.




