1.0 Strategic Information


Information infrastructure: Earth System Sciences Digital Repositories




1.1.2 Information infrastructure: Earth System Sciences Digital Repositories

The explosion in the types and volumes of observational data and model simulations associated with Earth System Science presents many challenges in the analysis, synthesis and interpretation of these data. Furthermore, the data are increasingly distributed nationally and internationally and interrogated by a broad spectrum of scientific communities. The scale and size of these data sets (individual self-describing files of 1 MB to tens of GB, with data sets containing millions of files and repositories in the range of tens to hundreds of terabytes) present computational, storage and workflow challenges.


Increased understanding of the climate system requires the synthesis of homogenised repositories from disparate data sources to facilitate study of the complex feedback mechanisms in operation. The synthesis of these repositories will increasingly incorporate machine-readable data sets (including model simulations of the Earth System) using a range of emerging standards and interfaces on the internet (or grid, see Figure 3).

The development of standard interfaces is key to the uptake of Earth Systems Science distributed data and simulation repositories by the wider scientific communities. Standard interfaces provide a uniform mechanism for data distribution across the scientific communities and, perhaps more importantly, the ability to interface with the analysis tools and access protocols that these communities are already familiar with. The community access protocols that will be supported include OPeNDAP for the Earth Systems community and Web Services for the Climate Impacts community.
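As a rough illustration of what a uniform interface means in practice, the sketch below builds request URLs for the two protocol families named above: an OPeNDAP (DAP2) subset request and an OGC GetCapabilities request. The host `example.org`, the dataset path and the variable name `sst` are hypothetical placeholders, not actual TPAC endpoints, and the helper function names are our own.

```python
from urllib.parse import urlencode


def opendap_subset_url(dataset_url, variable, slices):
    """Build a DAP2 ASCII request for a hyperslab of one variable.

    slices holds one (start, stride, stop) index triple per dimension,
    following the DAP2 constraint-expression form var[start:stride:stop].
    """
    hyperslab = "".join(f"[{a}:{s}:{b}]" for a, s, b in slices)
    return f"{dataset_url}.ascii?{variable}{hyperslab}"


def ogc_capabilities_url(endpoint, service):
    """Build a GetCapabilities request for an OGC service (WMS, WCS or WFS)."""
    query = urlencode({"service": service, "request": "GetCapabilities"})
    return f"{endpoint}?{query}"


# Hypothetical dataset and endpoint, for illustration only.
BASE = "http://example.org/opendap/ocean/sst.nc"
subset = opendap_subset_url(BASE, "sst", [(0, 1, 0), (10, 1, 20), (30, 1, 40)])
caps = ogc_capabilities_url("http://example.org/wms", "WMS")
print(subset)
print(caps)
```

Because both requests are plain URLs, any analysis tool that already speaks the protocol can consume the data without repository-specific code. Some HTTP clients may require the square brackets in the constraint expression to be percent-encoded.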

Standards have been developed within these communities for the vocabularies and for both OPeNDAP and web services (WCS, WMS, and WFS), making it possible to present to the wider scientific community distributed data sets that are homogeneous in their vocabulary (e.g. “ocean temperature” is uniformly understood, whereas the bare term “temperature” is ambiguous), along with underlying descriptive information and metadata. The responsibility for each data holding remains with the original source, typically the organisation where the expertise and institutional mandate for curation of the data are located. Legacy datasets are left unaltered to ensure compatibility with existing applications; vocabulary standardisation is achieved by translating names on demand.
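Name translation on demand can be sketched as a read-only view over a legacy dataset. This is a minimal illustration only: the mapping table, the legacy names (`temp`, `sal`) and the standard names are hypothetical, and a plain dictionary stands in for a real self-describing file.

```python
# Hypothetical mapping from legacy variable names to a shared vocabulary.
LEGACY_TO_STANDARD = {
    "temp": "ocean_temperature",
    "sal": "ocean_salinity",
}


class TranslatedView:
    """Expose legacy variables under standard names without altering them."""

    def __init__(self, legacy_variables, mapping):
        self._legacy = legacy_variables
        # Index only the legacy names that have a known standard equivalent.
        self._by_standard = {
            mapping[name]: name for name in legacy_variables if name in mapping
        }

    def names(self):
        """Return the standard names available through this view."""
        return sorted(self._by_standard)

    def get(self, standard_name):
        """Look up a variable by standard name at request time."""
        return self._legacy[self._by_standard[standard_name]]


# A stand-in for a legacy dataset; its own names are never rewritten.
legacy = {"temp": [12.1, 12.4], "sal": [35.0, 35.1], "depth": [0, 10]}
view = TranslatedView(legacy, LEGACY_TO_STANDARD)
print(view.names())
print(view.get("ocean_temperature"))
```

Existing applications keep reading `temp` from the original holding, while federated clients see only the standard vocabulary. Variables with no mapping (here `depth`) are simply not exposed through the view.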


The development of self-describing data with a homogeneous vocabulary allows the federation of data sets across institutional and national boundaries. This is the basis of the collaboration between TPAC and the UK NERC Grid. More importantly, this homogenisation allows for the automatic creation of catalogues that are rich in information about the data and model experiments, and that can be used for information discovery and analysis through web portals (see Figure 3).
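Automatic catalogue creation follows fairly directly once every holding describes itself with the same vocabulary. The sketch below assumes each repository can emit a small metadata record; the record layout, the URLs and the variable names are hypothetical and do not describe the actual TPAC or NERC catalogue formats.

```python
from collections import defaultdict


def build_catalogue(records):
    """Index metadata records by the standard variable names they declare."""
    catalogue = defaultdict(list)
    for record in records:
        for name in record["variables"]:
            catalogue[name].append((record["institution"], record["url"]))
    # Sort for stable, reproducible output.
    return {name: sorted(entries) for name, entries in sorted(catalogue.items())}


# Hypothetical records harvested from two federated repositories.
records = [
    {
        "institution": "TPAC",
        "url": "http://example.org/tpac/run1.nc",
        "variables": ["ocean_temperature", "ocean_salinity"],
    },
    {
        "institution": "NERC",
        "url": "http://example.org/nerc/obs.nc",
        "variables": ["ocean_temperature"],
    },
]

catalogue = build_catalogue(records)
for name, holdings in catalogue.items():
    print(name, holdings)
```

A single query for `ocean_temperature` then returns holdings from both institutions, which is only possible because the two sources agree on the name.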
From a technology perspective, the convergence of web services (and related standards for geospatial data and models) and OPeNDAP on the grid (under Globus Toolkit 4) means that the TPAC and partner repositories can be combined with the emerging APAC information grid. This will allow the remaining challenges for these repositories to be addressed more easily and in a logically consistent way, in particular:

  • authorization, authentication, and access control

  • superior network capacity of GridFTP (g-ftp) compared with other protocols

  • integration of compute grid (Earth Systems Science analysis tools) with repositories

  • development of a nationally distributed global file system for hosting ESS data

  • development of Earth Systems Science products

  • portals that integrate with Globus Toolkit




Figure 3: The TPAC Earth Systems Science digital repository consists of more than 2 million files and 26 Terabytes of climate and oceanographic data and model simulations distributed across four states and territories (left panel). The vision (right panel) is for the discovery, visualization and analysis of Earth Systems Science data on the APAC grid, using grid-based tools.
