Supplementary Material Ecoinformatics: Supporting Ecology as a Data-Intensive Science



William K. Michener¹ and Matthew B. Jones²

¹University Libraries, University of New Mexico, Albuquerque, NM 87131, USA

²National Center for Ecological Analysis and Synthesis, University of California Santa Barbara, Santa Barbara, CA 93101, USA

Corresponding author: Michener, W.K. (william.michener@gmail.com)


Table S1. Metadata standards (A) and tools (B) that are commonly used in the ecological sciences




A. Metadata Standard

Description

Reference

Content Standard for Digital Geospatial Metadata (CSDGM)


CSDGM was created by the US Federal Geographic Data Committee and includes the Biological Data Profile (BDP). The CSDGM focuses on geospatial data and the BDP adds categories relevant to biological data.

http://www.fgdc.gov/metadata/csdgm/



Darwin Core


Darwin Core metadata include descriptors necessary for documenting museum specimens and facilitating the sharing of information pertaining to organisms and biological diversity (e.g., taxonomic classification, geographic location).

http://www.tdwg.org/activities/darwincore/

Dublin Core Element Set

Dublin Core metadata encompass a small set of 15 core elements that are widely used to describe physical resources such as books and digital materials such as video, text files, images, and web pages.

http://dublincore.org/

Ecological Metadata Language (EML)

EML includes a comprehensive set of descriptors for documenting a wide range of ecological and environmental data, as well as non-digital resources such as maps.

http://knb.ecoinformatics.org/software/eml/

ISO 19115

ISO 19115, a standard of the International Organization for Standardization (ISO), includes a comprehensive set of more than 400 elements that describe geospatial data and services.

http://www.iso.org/iso/

B. Metadata Tool

Description

Reference

MERMAid (Metadata Enterprise Resource Management Aid)

MERMAid is an online metadata entry and management tool that supports FGDC-compliant metadata and the Biological Data Profile. It was developed by the National Coastal Data Development Center (NCDDC) of the US National Oceanic and Atmospheric Administration.

http://www.ncddc.noaa.gov/activities/mermaid/

Metavist

Metavist is a software tool for metadata archivists that is used to create FGDC-compliant metadata. Metavist is a product of the US Forest Service and provides support for the Biological Data Profile.

http://metavist.djames.net/; [S1]



Morpho

Morpho is a comprehensive metadata management system that supports the creation and management of metadata that conform to EML, FGDC, and BDP standards. Morpho also interfaces with the Knowledge Network for Biocomplexity (KNB) Metacat server, which allows scientists to upload, download, store, query and view public metadata and data.

http://knb.ecoinformatics.org/morphoportal.jsp; [S2]
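
To make the idea of a metadata record in Table S1 concrete, the following is a minimal sketch of how a few Dublin Core elements might be serialized as XML using only the Python standard library. The wrapper element name (metadata), the function name build_dc_record, and the dataset values are hypothetical and chosen only for illustration; the namespace URI is the published one for the Dublin Core element set.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1
DC_NS = "http://purl.org/dc/elements/1.1/"


def build_dc_record(fields):
    """Serialize a dict of Dublin Core element names and values as XML."""
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")  # hypothetical wrapper element
    for name, value in fields.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical dataset description, for illustration only
record = build_dc_record({
    "title": "Example vegetation survey",
    "creator": "A. Researcher",
    "date": "2011-06-01",
    "type": "Dataset",
    "coverage": "New Mexico, USA",
})
print(record)
```

Richer standards such as EML or ISO 19115 define many more elements and a required document structure, so in practice records are usually produced with tools such as those listed in part B of the table rather than written by hand.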




Table S2. Examples of programming languages, scripting languages and statistical software, and scientific workflow systems used in ecology and related sciences, grouped by category




Software Type and Tool

Description

Reference

Programming Languages







C, C++

C is a widely used, general-purpose computer programming language created at Bell Telephone Laboratories that has been used extensively for developing system software and portable application software.

C++ began as an extension to C and evolved into an intermediate-level, general-purpose programming language that combines high-level and low-level language features. C++ is a popular programming language with application domains that include systems software and hardware design.



[S3, S4]



Fortran

Fortran is a general-purpose, procedural, imperative programming language that is especially suited to numeric computation and to engineering and scientific computing. It remains very popular in areas such as climate modeling and high-performance computing.

[S5]

Perl

Perl is a high-level, general-purpose, interpreted, flexible, dynamic programming language. Perl was originally developed as a general-purpose Unix scripting language but has undergone many revisions and become extremely popular among programmers. The language provides powerful text processing facilities and is used extensively for CGI scripting, graphics programming, system and network administration, bioinformatics and ecoinformatics, and other applications.

www.perl.org; [S6]

Python

Python is a general-purpose, high-level programming language (often used as a scripting language) that emphasizes code readability and comes with a large and comprehensive standard library.

www.python.org; [S7]










Statistics and Analysis







Excel

Microsoft Excel is a software package included in the Microsoft Office Suite that enables the creation of spreadsheets or forms, provides simple data comparison, QA/QC, and analysis and visualization tools, and creates graphs. Built-in or user-defined formulas can be used for calculations or transformations.

http://office.microsoft.com/en-us/excel/

MATLAB

MATLAB is an interactive data analysis and visualization environment that can be used to perform computationally intensive operations on large data sets efficiently. MATLAB also provides a high-level programming language that supports rapid development of workflow scripts and graphical user interface (GUI) applications to automate repetitive tasks. A wide variety of discipline-specific software libraries, called toolboxes, are available from the publisher or user communities to extend the capabilities of the base program (e.g., statistics, curve fitting, image analysis and mapping). MATLAB programs can also leverage existing code written in Fortran, Java or other languages, and source code is provided for most functions, allowing end-users to extend or customize routines for specialized analyses.

http://www.mathworks.com/; [S8]

R

R is a free software tool for statistical computing and graphics. R provides a wide variety of statistical and graphical techniques. One of R's strengths is the ease with which well-designed publication-quality plots can be produced, including mathematical symbols and formulae where needed. R is highly extensible and has many user-submitted packages for specific functions or specific areas of study such as bioinformatics, ecological models, population dynamics, and analysis of spatial data.

www.r-project.org; [S9]

SAS

SAS is an integrated system of software that supports procedures ranging from data access across multiple sources, to complex manipulations of data files, to sophisticated statistical analyses and data visualizations. Three of the most popular SAS products commonly used by ecologists are Base SAS, SAS/STAT, and SAS/GRAPH.

http://www.sas.com/; [S10]









Scientific Workflows







Kepler

Kepler is a scientific workflow application that enables scientists, engineers, analysts, and computer programmers to create, execute, and share models and analyses, together with the associated provenance information. Kepler is a Java-based application that can operate on data stored in a variety of formats, locally and over the internet, and is an effective environment for integrating disparate software components, such as merging "R" scripts with compiled "C" code, or facilitating remote, distributed execution of models.

https://kepler-project.org/users/documentation; [S11]


myExperiment

myExperiment is a collaborative environment where scientists can publish their workflows and experiment plans, share them with groups and find those of others.

http://www.myexperiment.org/; [S12]

Pegasus

Pegasus encompasses a set of technologies that help workflow-based applications execute in a number of different environments including desktops, campus clusters, grids, and clouds. Scientific workflows allow users to easily express multi-step computations, for example, retrieving data from a database, reformatting the data, and running an analysis. Once an application is formalized as a workflow, the Pegasus Workflow Management Service can map it onto available compute resources and execute the steps in the appropriate order.

http://pegasus.isi.edu/; [S13]

Taverna

Taverna is an open source family of tools for designing and executing workflows, created by the myGrid project. Written in Java, the family consists of the Taverna Engine (the workhorse) and, built on top of the Engine, the Taverna Workbench (desktop client) and Taverna Server (remote workflow execution server). Taverna allows for the automation of experimental methods through the use of a number of different services (such as Web services) from a very diverse set of domains, ranging from biology, chemistry and medicine to music, meteorology and social sciences.

http://www.taverna.org.uk; [S14]

VisTrails

VisTrails is an open-source scientific workflow management system that provides support for data exploration and visualization. A key distinguishing feature of VisTrails is its comprehensive provenance infrastructure, which maintains detailed history information about the steps followed in the course of an exploratory task. VisTrails leverages this information to provide novel operations and user interfaces that streamline the exploratory process.

www.vistrails.org; [S15]
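
The multi-step computation mentioned for Pegasus (retrieve data, reformat it, run an analysis) and the provenance tracking emphasized by Kepler and VisTrails can be illustrated with a minimal sketch in plain Python. It does not use the API of any system listed in Table S2; the step functions, the sample values, and the run_workflow helper are all hypothetical. Steps are declared with their dependencies, executed in dependency order, and the order of execution is recorded as a very simple provenance log.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


# Hypothetical steps; each function stands in for a real task.
def retrieve():
    # Stand-in for a database query returning raw text values
    return [" 3.2", "4.1 ", "5.0"]


def reformat(raw):
    # Clean the raw strings and convert them to numbers
    return [float(x.strip()) for x in raw]


def analyze(values):
    # A trivial "analysis": the arithmetic mean
    return sum(values) / len(values)


# Each step maps to the set of steps whose output it needs.
dependencies = {
    "retrieve": set(),
    "reformat": {"retrieve"},
    "analyze": {"reformat"},
}
steps = {"retrieve": retrieve, "reformat": reformat, "analyze": analyze}


def run_workflow(dependencies, steps):
    """Run every step after its dependencies; return results and run order."""
    results, provenance = {}, []
    for name in TopologicalSorter(dependencies).static_order():
        inputs = [results[dep] for dep in sorted(dependencies[name])]
        results[name] = steps[name](*inputs)
        provenance.append(name)  # record which step ran, in order
    return results, provenance


results, provenance = run_workflow(dependencies, steps)
print(provenance)
print(results["analyze"])
```

Workflow systems add much more on top of this idea, such as mapping steps onto remote compute resources and storing detailed provenance, but the underlying model of steps connected by data dependencies is the same.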



Supplementary references

S3 Kernighan, B.W. and Ritchie, D.M. (1988) The C Programming Language (2nd ed.). Prentice Hall.

S4 Stroustrup, B. (1997) The C++ Programming Language. Addison-Wesley Professional.

S5 Adams, J.C. et al. (2009) The Fortran 2003 Handbook (1st ed.). Springer.

S6 Schwartz, R.L. et al. (2011) Learning Perl. O'Reilly Media.

S7 Lutz, M. (2009) Learning Python. O'Reilly Media.

S8 Hanselman, D. and Littlefield, B. (2004) Mastering MATLAB 7. Prentice Hall.

S9 Crawley, M.J. (2007) The R Book. Wiley.

S10 Elliott, A.C. and Woodward, W.A. (2010) SAS Essentials: A Guide to Mastering SAS for Research. John Wiley and Sons, Inc.

S11 Ludäscher, B. et al. (2006) Scientific Workflow Management and the Kepler System. Special Issue: Workflow in Grid Systems. Concurrency and Computation: Practice & Experience 18, 1039-1065.

S12 De Roure, D. et al. (2009) The design and realisation of the myExperiment virtual research environment for social sharing of workflows. Future Generation Computer Systems 25, 561-567.

S13 Deelman, E. et al. (2005) Pegasus: a framework for mapping complex scientific workflows onto distributed systems. Scientific Programming 13, 219-237.

S14 Hull, D. et al. (2006) Taverna: a tool for building and running workflows of services. Nucleic Acids Research 34, W729-W732.

S15 Silva, C.T. et al. (2007) Provenance for Visualizations: Reproducibility and Beyond. Computing in Science & Engineering 9, 82-90.

