Note: This document primarily consists of snippets of text copied from various online resources. This “cheat sheet” is only for informational purposes.
Categories appearing in this document:

  • Data Description and Formatting

  • Middleware/GRID Technologies

  • Model Description and Archive

  • Portal Technologies

  • Data Providers

  • Modeling Infrastructure

Data Description and Formatting

Climate and Forecast (CF) metadata

“The CF conventions for climate and forecast metadata are designed to promote the processing and sharing of files created with the netCDF API. The conventions define metadata that provide a definitive description of what the data in each variable represents, and of the spatial and temporal properties of the data. This enables users of data from different sources to decide which quantities are comparable, and facilitates building applications with powerful extraction, regridding, and display capabilities.”
CF is backwards compatible with COARDS. (home page?) (article)
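As an illustrative sketch of what CF-style variable metadata looks like in practice, the example below describes a variable with the attributes CF uses to make quantities comparable. Only the attribute names (`units`, `standard_name`, `long_name`) come from the CF conventions; the small checking function is hypothetical, not part of any CF library.

```python
# Illustrative sketch: a minimal CF-style variable description and a
# check for the attributes CF relies on.  The validator is hypothetical;
# only the attribute names come from the CF conventions.

REQUIRED = ("units",)
RECOMMENDED = ("standard_name", "long_name")

def check_cf_attributes(var_attrs):
    """Return lists of missing required and recommended CF attributes."""
    missing_req = [a for a in REQUIRED if a not in var_attrs]
    missing_rec = [a for a in RECOMMENDED if a not in var_attrs]
    return missing_req, missing_rec

temperature = {
    "standard_name": "air_temperature",  # from the CF standard-name table
    "units": "K",                        # udunits-compatible unit string
    "long_name": "near-surface air temperature",
}

print(check_cf_attributes(temperature))  # -> ([], [])
```

A `standard_name` drawn from the shared CF table is what lets users decide that two variables from different models are the same physical quantity.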

Cooperative Ocean-Atmosphere Research Data Service (COARDS)

“This standard is a set of conventions adopted in order to promote the interchange and sharing of files created with the netCDF Application Programmer Interface (API).” (home page?)


Earth Science Markup Language (ESML)

“Earth science data is archived and distributed in many different formats varying from character format, packed binary, "standard" scientific formats to self-describing formats. This heterogeneity results in data-application interoperability problems for scientific tools. The Earth Science Markup Language (ESML) is an elegant solution to this problem. ESML is an interchange technology that enables data (both structural and semantic) interoperability with applications without enforcing a standard format within the Earth science community. Users can write external files using ESML schema to describe the structure of the data file. Applications can utilize the ESML Library to parse this description file and decode the data format. As a result, software developers can now build data format independent scientific applications utilizing the ESML technology. Furthermore, semantic tags can be added to the ESML files by linking different domain ontologies to provide a complete machine understandable data description. This ESML description file allows the development of intelligent applications that can now understand and "use" the data.”
“ESML provides syntactic (or structural) metadata that describe the data in terms of bits and bytes. These metadata are used by the ESML parser to give structure to the bit stream which is the data file. For example, the syntactic metadata tell the parser that the next 32 bits of data are to be interpreted as a big-endian 32-bit two’s complement integer value.” (home page) (overview) (schema description)
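The parsing step in that example can be sketched with Python's standard `struct` module. This is a stand-in for the ESML parser itself; the "format description" here is just a format string rather than a real ESML file.

```python
import struct

# Stand-in for an ESML structural description: the next field is a
# big-endian ("network order") 32-bit two's-complement integer.
field_format = ">i"   # '>' = big-endian, 'i' = 32-bit signed int

raw = b"\x00\x00\x01\x00"          # 4 bytes read from the data file
(value,) = struct.unpack(field_format, raw)
print(value)  # -> 256

# A negative value confirms the two's-complement interpretation:
(neg,) = struct.unpack(">i", b"\xff\xff\xff\xff")
print(neg)    # -> -1
```

An ESML description file plays the role of `field_format` above, but externally and for whole files, so the same application code can decode many formats.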

FGDC - Content Standard for Digital Geospatial Metadata

Federal Geographic Data Committee (also see Data Providers)
“The objectives of the standard are to provide a common set of terminology and definitions for the documentation of digital geospatial data. The standard establishes the names of data elements and compound elements (groups of data elements) to be used for these purposes, the definitions of these compound elements and data elements, and information about the values that are to be provided for the data elements.”
“This program is a compiler to parse formal metadata, checking the syntax against the FGDC Content Standard for Digital Geospatial Metadata and generating output suitable for viewing with a web browser or text editor.” (mp metadata tool)

Geography Markup Language (GML)

“Geography Markup Language is an XML grammar written in XML Schema for the modeling, transport, and storage of geographic information.” (article) (GML specification)

GRIB Format

Timeline: released 1985
“The World Meteorological Organization (WMO) Commission for Basic Systems (CBS) Extraordinary Meeting Number VIII (1985) approved a general purpose, bit-oriented data exchange format, designated FM 92-VIII Ext. GRIB (GRIdded Binary). It is an efficient vehicle for transmitting large volumes of gridded data to automated centers over high-speed telecommunication lines using modern protocols. By packing information into the GRIB code, messages (or records - the terms are synonymous in this context) can be made more compact than character oriented bulletins, which will produce faster computer-to-computer transmissions. GRIB can equally well serve as a data storage format, generating the same efficiencies relative to information storage and retrieval devices.” (explanation of GRIB format)
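The compactness described above comes largely from packing: GRIB's "simple packing" stores each value as a small non-negative integer relative to a reference value and a binary scale factor. The sketch below follows that scheme (value = R + X * 2**E) but the helper functions and the sample grid are illustrative, not actual GRIB encoding.

```python
# Hedged sketch of GRIB-style "simple packing".  The formula
# (value = reference + packed * 2**scale_exp) follows the GRIB
# simple-packing scheme; the functions themselves are illustrative.

def pack(values, scale_exp):
    reference = min(values)           # smallest value becomes the reference
    packed = [round((v - reference) / 2 ** scale_exp) for v in values]
    return reference, packed          # small non-negative ints, cheap to send

def unpack(reference, packed, scale_exp):
    return [reference + x * 2 ** scale_exp for x in packed]

grid = [1001.2, 1003.6, 999.8, 1002.4]   # e.g. surface pressure values, hPa
ref, packed = pack(grid, scale_exp=-2)    # quantize to quarter-unit steps
restored = unpack(ref, packed, scale_exp=-2)
print(packed)      # -> [6, 15, 0, 10]
print(restored)    # each within 0.125 of the original
```

Choosing the scale exponent trades precision for message size, which is why GRIB suits both transmission and storage.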

Hierarchical Data Format (HDF5)

Timeline: latest release, 03/2005
“HDF5 is a general purpose library and file format for storing scientific data.”

“HDF5 can store two primary objects: datasets and groups. A dataset is essentially a multidimensional array of data elements, and a group is a structure for organizing objects in an HDF5 file. Using these two basic objects, one can create and store almost any kind of scientific data structure, such as images, arrays of vectors, and structured and unstructured grids. You can also mix and match them in HDF5 files according to your needs.” (home page)
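A toy in-memory analogue of that two-object model is sketched below. This is not the HDF5 API (real code would use a binding such as h5py); it only illustrates how groups-of-groups plus multidimensional datasets can represent arbitrary scientific data structures.

```python
# Toy analogue of HDF5's two primary objects: groups (containers that
# can nest) and datasets (multidimensional arrays).  Illustrative only,
# not the HDF5 library.

class Dataset:
    def __init__(self, data, shape):
        self.data, self.shape = data, shape

class Group(dict):
    def create_group(self, name):
        self[name] = Group()
        return self[name]
    def create_dataset(self, name, data, shape):
        self[name] = Dataset(data, shape)
        return self[name]

root = Group()                        # plays the role of "/" in a file
model = root.create_group("model_output")
model.create_dataset("temperature",
                     data=[280.1, 281.4, 279.9, 280.7],
                     shape=(2, 2))    # a 2x2 grid, stored row-major
print(root["model_output"]["temperature"].shape)  # -> (2, 2)
```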


Hierarchical Data Format – Earth Observing System
“In 1993 NASA chose NCSA's HDF format to be the standard file format for storing data from the Earth Observing System (EOS), which is the data gathering system of sensors (mainly satellites) supporting the Global Change Research Program.” (home page) (user’s guide)


Network Common Data Form (netCDF)

“NetCDF (network Common Data Form) is an interface for array-oriented data access and a library that provides an implementation of the interface. The netCDF library also defines a machine-independent format for representing scientific data. Together, the interface, library, and format support the creation, access, and sharing of scientific data.” (home page) (user’s guide)

PCMDI - Climate Model Output Rewriter (CMOR)

The "Climate Model Output Rewriter" (CMOR, pronounced "Seymour") comprises a set of FORTRAN 90 functions that can be used to produce CF-compliant netCDF files that fulfill the requirements of many of the climate community's standard model experiments. These experiments are collectively referred to as MIPs and include, for example, AMIP, CMIP, CFMIP, PMIP, APE, and IPCC scenario runs. The output resulting from CMOR is "self-describing" and facilitates analysis of results across models. (CMOR user’s guide)

Middleware/GRID Technologies

Common Component Architecture

“The Common Component Architecture (CCA) Forum is a group of researchers from national labs and academic institutions committed to defining a standard component architecture for high performance computing. “

“The objective of the CCA Forum is to define a minimal set of standard interfaces that a high-performance component framework has to provide to components, and can expect from them, in order to allow disparate components to be composed together to build a running application. Such a standard will promote interoperability between components developed by different teams across different institutions.” (CCA forum)


Condor-G

“Grid computing, the ability for communities to share resources, has emerged as an important facet of computing. Condor-G is the marriage of technologies from the Condor project and the Globus project.”
“The Condor-G system leverages recent advances in two distinct areas: (1) security and resource access in multi-domain environments, as supported within the Globus Toolkit, and (2) management of computation and harnessing of resources within a single administrative domain, embodied within the Condor system. Condor-G combines the inter-domain resource management protocols of the Globus Toolkit and the intra-domain resource and job management methods of Condor to allow the user to harness multi-domain resources as if they all belong to one personal domain.” (home page)

Globus Toolkit / 4 (GT/4)

Timeline: v1.0 in 1998, v4.0 April 2005
“The open source Globus Toolkit is a fundamental enabling technology for the "Grid," letting people share computing power, databases, and other tools securely online across corporate, institutional, and geographic boundaries without sacrificing local autonomy. The toolkit includes software services and libraries for resource monitoring, discovery, and management, plus security and file management. In addition to being a central part of science and engineering projects that total nearly a half-billion dollars internationally, the Globus Toolkit is a substrate on which leading IT companies are building significant commercial Grid products.”
Used by Earth System Grid for moving large amounts of climate data. (home page) (GT4 primer)


Open Grid Services Architecture - Data Access and Integration (OGSA-DAI)

Timeline: Release 1, 1/2003; Release 7 scheduled 9/2005
“The OGSA-DAI project is concerned with constructing middleware to assist with access and integration of data from separate data sources via the grid. It is engaged in identifying the requirements, designing solutions and delivering software that will meet this purpose. The project was conceived by the UK Database Task Force and is working closely with the Global Grid Forum DAIS-WG and the Globus team.”
Built on Globus Toolkit. (home page)


Open-source Project for a Network Data Access Protocol (OPeNDAP)

“The OPeNDAP provides a way for ocean researchers to access oceanographic data anywhere on the Internet from a wide variety of new and existing programs. By developing network versions of commonly used data access Application Program Interface (API) libraries, such as NetCDF, HDF, JGOFS, and others, the OPeNDAP project can capitalize on years of development of data analysis and display packages that use those APIs, allowing users to continue to use programs with which they are already familiar.” (home page) (user’s guide)

THREDDS data catalog {Cinquini}

“The THREDDS (Thematic Realtime Environmental Distributed Data Services) project is developing middleware to bridge the gap between data providers and data users. The goal is to simplify the discovery and use of scientific data and to allow scientific publications and educational materials to reference scientific data.”
“The mission of THREDDS is for students, educators and researchers to publish, contribute, find, and interact with data relating to the Earth system in a convenient, effective, and integrated fashion. Just as the World Wide Web and digital-library technologies have simplified the process of publishing and accessing multimedia documents, THREDDS is building infrastructure needed for publishing and accessing scientific data in a similarly convenient fashion.” (home page)

Web Feature Service

Web Feature Service… “takes the next logical step and proposes interfaces for describing data manipulation operations on geographic features using HTTP as the distributed computing platform. Data manipulation operations include the ability to:

  1. Create a new feature instance

  2. Delete a feature instance

  3. Update a feature instance

  4. Get or Query features based on spatial and non-spatial constraints

A Web Feature Service (WFS) request consists of a description of query or data transformation operations that are to be applied to one or more features. The request is generated on the client and is posted to a web feature server using HTTP. The web feature server then reads and (in a sense) executes the request.”
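A request of the kind described can be sketched with Python's `xml.etree`. The element names follow the WFS 1.0 convention of a `wfs:GetFeature` containing a `wfs:Query` with a `typeName`; the feature type `myns:Roads` is hypothetical, and a real client would POST this document to a web feature server over HTTP.

```python
import xml.etree.ElementTree as ET

# Sketch of a WFS GetFeature request body.  Element and attribute names
# follow WFS 1.0 usage; the feature type "myns:Roads" is made up.

WFS = "http://www.opengis.net/wfs"
ET.register_namespace("wfs", WFS)

request = ET.Element(f"{{{WFS}}}GetFeature",
                     {"service": "WFS", "version": "1.0.0"})
ET.SubElement(request, f"{{{WFS}}}Query", {"typeName": "myns:Roads"})

body = ET.tostring(request, encoding="unicode")
print(body)   # the XML a client would POST to the server
```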

“That is to say that the state of a geographic feature is described by a set of properties where each property can be thought of as a {name, type, value} tuple. The name and type of each feature property is determined by its type definition. Geographic features are those that may have at least one property that is geometry-valued. This, of course, implies that features can be defined with no geometric properties at all. The geometries of geographic features are restricted to what OGC calls simple geometries. A simple geometry is one for which coordinates are defined in two dimensions and the delineation of a curve is subject to linear interpolation.” (home page) (WFS specification)

Web Services Resource Framework (WSRF)

“We have pointed out that even when a Web service implementation itself can be described as a stateless message processor, the message exchanges that it implements (as defined by its interface) are frequently intended to enable access to, and/or update of, state maintained by other system components, whether database, file systems, or other entities.”

“Given the vital role that access to state plays in many Web service interfaces, it is important to identify and standardize the patterns by which state is represented and manipulated, so as to facilitate the construction and use of interoperable services. To this end, we introduce an approach to modeling stateful resources in a Web services framework based on a construct that we call a WS-Resource.” (home page)
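The pattern can be sketched as a stateless front-end that resolves each incoming message to a separately held piece of state. In WSRF the resource key travels in the endpoint reference; all class and method names below are illustrative, not WSRF API.

```python
# Illustrative sketch of the WS-Resource pattern: the service is a
# stateless message processor, while per-resource state lives in a
# store keyed by a resource identifier.  Names are invented.

class StatelessService:
    def __init__(self):
        self._resources = {}            # resource-id -> state dict

    def create_resource(self, resource_id):
        self._resources[resource_id] = {"invocations": 0}

    def handle(self, resource_id, message):
        """Process one message against the identified resource's state."""
        state = self._resources[resource_id]
        state["invocations"] += 1
        return f"{message} handled for {resource_id} (call #{state['invocations']})"

service = StatelessService()
service.create_resource("job-42")
print(service.handle("job-42", "GetStatus"))
print(service.handle("job-42", "GetStatus"))  # state persists between calls
```

The point of standardizing this pattern is that any client can address "the state behind the service" the same way, whatever the service actually is.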


XCAT

“XCAT allows scientists to compose applications with a set of distributed components. In recent times the Web Services model has been adopted as the underlying architecture for Grid Systems. One of the goals of the XCAT project is to develop a distributed framework that is consistent with this model. The distributed framework itself is designed to be compliant with the Common Component Architecture (CCA) specification. An XCAT component can serve as both a CCA and a Grid (OGSI based) service. XCAT has been developed in both C++ and Java.” (home page)

Model Description and Archive

Biogeochemical Model Archive - ORNL-DAAC

“Archiving environmental data products has become recognized as a vital research practice: it improves our ability to reproduce results and perform additional analyses while saving the cost of redundant data collection activities.”

“The same rationale applies to archiving numerical models. Archived models will provide the methodological detail of numerical modeling studies to recreate published modeling results, enabling the synthesis of results across modeling studies and the investigation of new hypotheses. In addition, archived models will allow determination of uncertainties for comparison with results from other models in assessment / policy studies. The model source code will also allow others to see how models treat individual processes.”
“The model archive contains comprehensive model documentation, input files, source code, code version, output files, and output analysis approaches or software used to produce tables and figures for a particular publication.”
“We are creating a two-tiered archive for numerical models. The first tier supports the storage and retrieval of benchmark model versions, and the second tier supports the association of published research results with specific model implementations. For both tiers, we present a set of recommended best practices aimed at raising the standards for reproducibility in numerical modeling studies through the use of a dedicated archive for numerical models and modeling studies. The model archive is intended as a resource for experienced modelers.” (home page)

Numerical Model Metadata XML

Timeline: latest release, 08/2005
“The Numerical Model Metadata XML - NMM XML, previously known as the EarleySuite, is an evolving metadata standard intended for the exchange of information about numerical models or codebases, and the simulations done using them. Providing a metadata standard to describe the numerical codebase and its associated simulations greatly extends and refines the researcher's ability to understand how resulting output data were produced.”

“The Numerical Model Metadata XML is built and based on XML and associated technologies. The goal in the design of the suite of Numerical Model Metadata XML is to provide the clear, well-defined and flexible metadata needed for climate and forecast numerical models and the simulations which produce numerical model output data. At this point the Numerical Model Metadata XML is primarily concentrating on describing numerical climate models and the simulations done using them.” (home page)
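A hypothetical sketch of the kind of model/simulation metadata being standardized is shown below. Every element and attribute name here is invented for illustration; the real NMM XML schema defines its own vocabulary.

```python
import xml.etree.ElementTree as ET

# Hypothetical model-run metadata of the kind NMM XML standardizes:
# which codebase (and version) produced which simulation, with what
# parameters.  All names below are invented for illustration.

doc = ET.Element("modelRun")
ET.SubElement(doc, "codebase", {"name": "ExampleGCM", "version": "2.1"})
sim = ET.SubElement(doc, "simulation", {"id": "control-run-01"})
ET.SubElement(sim, "parameter",
              {"name": "timestep", "value": "1800", "units": "s"})

xml_text = ET.tostring(doc, encoding="unicode")
print(xml_text)
```

Recording the codebase version alongside each simulation is what lets a later reader trace output data back to the exact code that produced it.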


Program for Climate Model Diagnosis and Intercomparison (PCMDI)

Timeline: est. 1989
“The PCMDI mission is to develop improved methods and tools for the diagnosis and intercomparison of general circulation models (GCMs) that simulate the global climate. The need for innovative analysis of GCM climate simulations is apparent, as increasingly more complex models are developed, while the disagreements among these simulations and relative to climate observations remain significant and poorly understood. The nature and causes of these disagreements must be accounted for in a systematic fashion in order to confidently use GCMs for simulation of putative global climate change.” (home page) (software tools) (listing of numerous MIPs) (another listing of MIPs)


Register of Ecological Models
The Register of Ecological Models (REM) is a meta-database for existing mathematical models in ecology. (home page)

WRF Metadata Registry

Timeline: first non-beta release, 05/2004
“The Weather Research and Forecasting (WRF) Model is a next-generation mesoscale numerical weather prediction system designed to serve both operational forecasting and atmospheric research needs. It features multiple dynamical cores, a 3-dimensional variational (3DVAR) data assimilation system, and a software architecture allowing for computational parallelism and system extensibility. WRF is suitable for a broad spectrum of applications across scales ranging from meters to thousands of kilometers.” (home page) (paper software architecture) (software tools and documentation)


“The Registry is a concise database of information about WRF data structures and a mechanism for automatically generating large sections of WRF code from the notations in the database. The Registry database is a collection of tables that lists and describes the WRF state variables and arrays with their attributes such as dimensionality, number of time levels, association with a particular dynamical core, association with a particular physics package, membership in an input, output, or restart dataset, communication operations on the data, and some descriptive meta-data such as what the variable or array represents and its units. From this database, the Registry generates code for interfaces between layers of the infrastructure, packing and unpacking code for communication and nesting, and field-by-field calls to routines for model I/O -- code that would otherwise be extremely time-consuming and error-prone to write and manage manually. Adding or modifying a state variable or array in WRF is a matter of modifying a line or two in the Registry. Currently, the Registry automatically generates 60-thousand of the total 250-thousand lines of WRF code.” (description of registry)
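The table-driven generation idea can be sketched as follows. The table columns and the generated Fortran-style declarations are simplified illustrations of the approach, not the actual WRF Registry format.

```python
# Sketch of the Registry idea: a small table describing state variables,
# from which declaration code is generated mechanically.  Simplified
# illustration, not the real WRF Registry syntax.

REGISTRY = [
    # (name, type, dims, description, units)
    ("u",  "real", ("i", "k", "j"), "x-wind component", "m s-1"),
    ("t",  "real", ("i", "k", "j"), "perturbation potential temperature", "K"),
    ("mu", "real", ("i", "j"),      "dry air mass in column", "Pa"),
]

def generate_declarations(registry):
    """Emit one Fortran-style declaration line per registry entry."""
    lines = []
    for name, ftype, dims, desc, units in registry:
        dim_spec = ",".join(":" for _ in dims)
        lines.append(f"{ftype}, dimension({dim_spec}) :: {name}  ! {desc} [{units}]")
    return "\n".join(lines)

print(generate_declarations(REGISTRY))
```

Adding a state variable then really is "a line or two": one new table row regenerates every declaration, I/O call, and communication stub that mentions it.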

Portal Technologies

JSR 168

Timeline: API Draft 10/2002, Final Release 10/2003
“JSR 168 establishes a standard API for creating portlets, the integration component between applications and portals that enables delivery of an application through a portal. Without this standard, each version of an application has needed its own portlet API, and each of the various portals required that these portlets be specifically tailored for implementation through that portal. This has raised portlet developer time, effort, and costs with the net effect that fewer applications have been made available through fewer portals to the detriment of the end-users, ISVs, developers, and portal vendors.” (home page)

Open Grid Computing Environments (OGCE)

Timeline: RC5 8/2005
“The OGCE is developing standard compliant portlet components that can be reused by multiple container providers. The JSR 168 Portlet Specification defines interoperability standards for portal containers. JSR 168 compliant portlets may be deployed in multiple vendor containers.” (home page)

Data Providers

Earth Science Information Partners (ESIP)

“The Federation of Earth Science Information Partners ("Federation") is a network of researchers and associated groups that collects, interprets and develops applications for satellite-generated Earth observation information. Founded in 1998 under a grant from NASA, the consortium includes more than 80 member organizations, spanning NASA and NOAA's data centers, government research laboratories, research universities, education resource providers, technology developers, and nonprofit and commercial enterprises.” (home page)

Earth System Grid (ESG)

“The primary goal of ESG is to address the formidable challenges associated with enabling analysis of and knowledge development from global Earth System models. Through a combination of Grid technologies and emerging community technology, distributed federations of supercomputers and large-scale data & analysis servers will provide a seamless and powerful environment that enables the next generation of climate research.” (home page) (An Ontology for Scientific Information in a Grid Environment: the Earth System Grid.) (ESG 1.1 Ontology)

Federal Geographic Data Committee (FGDC)

“The FGDC is developing the National Spatial Data Infrastructure (NSDI) in cooperation with organizations from State, local and tribal governments, the academic community, and the private sector. The NSDI encompasses policies, standards, and procedures for organizations to cooperatively produce and share geographic data.” (home page) (geospatial metadata standards)


FGDC Clearinghouse

“Using the data elements defined in the Content Standards for Digital Geospatial Metadata, governmental, non-profit, and commercial participants worldwide can make their collections of spatial information searchable and accessible on the Internet using free reference implementation software developed by the FGDC.” (Clearinghouse)


Global Organization for Earth System Science Portal (GO-ESSP)

Timeline: 4th workshop held June 2005

“The Global Organization for Earth System Science Portal (GO-ESSP) is a collaboration designed to develop a new generation of software infrastructure that will provide distributed access to observed and simulated data from the climate and weather communities. GO-ESSP will achieve this goal by developing individual software components and by building a federation of frameworks that can work together using agreed-upon standards. The GO-ESSP portal frameworks will provide efficient mechanisms for data discovery, access, and analysis of the data.” (home page)

Intergovernmental Panel on Climate Change (IPCC)

“The role of the IPCC is to assess on a comprehensive, objective, open and transparent basis the scientific, technical and socio-economic information relevant to understanding the scientific basis of risk of human-induced climate change, its potential impacts and options for adaptation and mitigation.” (home page) (IPCC Data Distribution Center)

Linked Environments for Atmospheric Discovery (LEAD)

“A multi-disciplinary effort involving 9 institutions and more than 100 scientists, students and technical staff, LEAD is addressing the fundamental IT research challenges, and associated development, needed to create an integrated, scalable framework for identifying, accessing, preparing, assimilating, predicting, managing, analyzing, mining, and visualizing a broad array of meteorological data and model output independent of format and physical location.” (home page) (latest report)


“A major underpinning of LEAD is dynamic workflow orchestration and data management in a web services framework – a concept we frame more generally as Workflow Orchestration for On-Demand, Real-Time, Dynamically-Adaptive Systems (WOORDS). WOORDS provides for the use of analysis tools, forecast models, and data repositories not in fixed configurations or as static recipients of data, as is now the case for most meteorological research and operational forecasting technologies, but rather as dynamically adaptive, on-demand, grid-enabled systems that can a) change configuration rapidly and automatically in response to weather; b) continually be steered by new data; c) respond to decision-driven inputs from users; d) initiate other processes automatically; and e) steer remote observing technologies to optimize data collection for the problem at hand. Although mesoscale meteorology is the particular problem to which the WOORDS concept is being applied, the methodologies and infrastructures being developed are extensible to other domains such as medicine, ecology, oceanography and biology.” (WOORDS description)

National Weather Service data format for RSS feed

“RSS is an XML based document format for syndicating news and other timely news-like information. It provides headlines, URLs to the source document and brief description information in an easy to understand and use format. RSS based "News Readers" and "News Aggregators" allow the display of RSS headlines on workstation desktops. Software libraries exist to read the RSS format and present RSS headlines on web pages and other online applications.” (RSS alerts) (RSS 2.0 Specification)
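The three fields the description mentions (headline, URL, brief description) map directly onto the `title`, `link`, and `description` elements of an RSS 2.0 `item`. The minimal reader below uses Python's standard `xml.etree`; the feed content itself is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Minimal RSS 2.0 item plus a reader for the fields the text mentions.
# The feed content is invented for illustration.

FEED = """<rss version="2.0"><channel>
  <title>Example Alerts</title>
  <item>
    <title>High Wind Warning</title>
    <link>http://example.invalid/alerts/1</link>
    <description>Gusts to 60 mph expected this afternoon.</description>
  </item>
</channel></rss>"""

def read_headlines(feed_xml):
    """Return (title, link, description) for each item in the feed."""
    channel = ET.fromstring(feed_xml).find("channel")
    return [(i.findtext("title"), i.findtext("link"), i.findtext("description"))
            for i in channel.findall("item")]

for title, link, desc in read_headlines(FEED):
    print(title, "->", link)
```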

Modeling Infrastructure

Earth System Modeling Framework (ESMF)

GFDL Flexible Modeling System (FMS)

“FMS is a software framework for supporting the efficient development, construction, execution, and scientific interpretation of atmospheric, oceanic, and climate system models.”

  1. A software infrastructure for constructing and running atmospheric, oceanic, and climate system models. This infrastructure includes software to handle parallelization, input and output, data exchange between various model grids, orchestration of the time stepping, makefiles, and simple sample run scripts. This infrastructure should largely insulate FMS users from machine-specific details.

  2. A standardization of the interfaces between various component models.

  3. Software for standardizing, coordinating, and improving diagnostic calculations of FMS-based models, and input data preparation for such models. Common preprocessing and post-processing software are included to the extent that the needed functionality cannot be adequately provided by available third-party software.

  4. Contributed component models that are subjected to a rigorous software quality review and improvement process. The development and initial testing of these component models is largely a scientific question, and would not fall under FMS. The quality review and improvement process includes consideration of (A) compliance with FMS interface and documentation standards to ensure portability and inter-operability, (B) understandability (clarity and consistency of documentation, comments, interfaces, and code), and (C) general computational efficiency without algorithmic changes.

  5. A standardized technique for version control and dissemination of the software and documentation. (home page) (description)


This European analog to ESMF has created a software infrastructure for coupling arbitrary climate models. (home page)

Modeling Environment for Atmospheric Discovery (MEAD)

“The goal of the MEAD expedition is the development/adaptation of cyberinfrastructure that will enable simulation, datamining/machine learning and visualization of hurricanes and storms utilizing the TeraGrid. The focus is on retrospective computation and analysis (not real-time prediction). Portal grid and web infrastructure will enable launching of hundreds of individual WRF (Weather Research and Forecasting), Regional Ocean Modeling System (ROMS), or WRF/ROMS simulations on the grid in either ensemble or parameter mode. Metadata and the resulting large volumes of data will then be made available through the MEAD portal for further study and for educational purposes.” (home page) (overview)

Catalog Services for the Web virtual data set (CSW) {Liping}


Configuration Attributes for Textual I/O (CATTR) {Smith}


Geos Generic

