
2.8 MINOS


Over the last three years, computing for MINOS data analysis has expanded greatly to use more of the OSG resources available at Fermilab. The scale of computing has grown from about 50 traditional batch slots to typical user jobs running on over 1,000 cores, with a strong desire to expand to about 5,000 cores; over the past 12 months MINOS used 3.1M hours on OSG from 1.16M submitted jobs. This computing resource, combined with 90 TB of dedicated BlueArc (NFS-mounted) file storage, has allowed MINOS to move ahead with traditional and advanced analysis techniques, such as Neural Network, Nearest Neighbor, and Event Library methods. These computing resources are critical as the experiment moves beyond the early, somewhat simpler Charged Current physics to the more challenging Neutral Current, νe, and other analyses that push the limits of the detector. MINOS also uses a few hundred cores of offsite computing at collaborating universities for occasional Monte Carlo generation, and is starting to use TeraGrid resources at TACC in the hope of greatly speeding up its latest processing pass.

2.9 Astrophysics


The Dark Energy Survey (DES) used approximately 40,000 hours of OSG resources in 2009, with DES simulation activities ramping up in the latter part of the year. The most recent DES simulation run produced 3.5 Terabytes of simulated imaging data, which were used to test the DES data management system's data processing pipelines as part of DES Data Challenge 5. These simulations consisted of 2,600 mock science images covering 150 square degrees of sky, along with another 900 calibration images. Each 1 GB DES image is produced by a single job on OSG and simulates the stars and galaxies on the sky covered by a single 3-square-degree pointing of the DES camera. The processed simulated data are also being actively used by the DES science working groups to develop and test their science analysis codes. DES expects to roughly triple its usage of OSG resources over the next 12 months as it works to produce a larger simulation data set for DES Data Challenge 6 in 2010.

2.10 Structural Biology


Biomedical computing

In 2009 we established the NEBioGrid VO to support biomedical research on OSG (nebiogrid.org) and to integrate the regional biomedical resources in the New England area. NEBioGrid has deployed the OSG MatchMaker service provided by RENCI, which performs automatic resource selection and scheduling. When combined with Condor's DAGMan, job submission, remote assignment, and resubmission in the event of failure can be automated with minimal user intervention. This will allow NEBioGrid to streamline utilization of a wide range of OSG resources for its researchers, as sketched below.
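As a rough illustration of this kind of automation, the following minimal Python sketch generates a DAGMan workflow in which every node is resubmitted automatically if it fails. The executable name (analyze.sh), file names, job count, and retry limit are hypothetical placeholders chosen for the example, not details of NEBioGrid's actual configuration.

    # Illustrative sketch only: write a Condor DAGMan workflow whose nodes
    # are retried automatically on failure. "analyze.sh", the file names,
    # and the counts below are assumptions made for this example.

    N_JOBS = 100          # hypothetical number of independent analysis tasks
    MAX_RETRIES = 3       # resubmission attempts DAGMan makes per failed node

    # A generic vanilla-universe submit description; in practice MatchMaker's
    # ClassAd-based matchmaking would steer each job to a suitable OSG site.
    submit = """universe   = vanilla
    executable = analyze.sh
    arguments  = $(task_id)
    output     = logs/task_$(task_id).out
    error      = logs/task_$(task_id).err
    log        = logs/workflow.log
    queue
    """
    with open("task.sub", "w") as f:
        f.write(submit)

    # One DAG node per task; the RETRY line tells DAGMan to resubmit failed
    # nodes without user intervention.
    with open("workflow.dag", "w") as f:
        for i in range(N_JOBS):
            f.write(f"JOB task{i} task.sub\n")
            f.write(f'VARS task{i} task_id="{i}"\n')
            f.write(f"RETRY task{i} {MAX_RETRIES}\n")

The resulting workflow would then be submitted with condor_submit_dag, which handles the remote assignment and bookkeeping described above.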

A researcher at Harvard Medical School expressed interest in utilizing OSG resources for his project; within a week NEBioGrid had an initial batch of jobs running on OSG sites. NEBioGrid has continued to refine the submission scripts and job workflows to improve the success rate and efficiency of the researcher's continuing computations. A framework has also been established for a researcher at Massachusetts General Hospital, which will allow him to use OSG to access computational resources far surpassing what was previously available to him. Work is also in progress with researchers at Northeastern University and Children's Hospital who have expressed a need and desire to utilize OSG.

NEBioGrid collaborated with the Harvard Medical School West Quad Computing Group to set up a new cluster configured as an OSG CE; within a couple of weeks the CE was brought into operational status and began receiving remote jobs. A cluster using Infiniband for MPI calculations is currently being installed in collaboration with Children's Hospital and the Immune Disease Institute. This cluster will be used for long-running molecular dynamics simulations and will be made available as an OSG CE with MPI support in 2010. Work is also under way with another HMS IT group to integrate their cluster into OSG.

NEBioGrid holds regular seminars and workshops on HPC and grid computing topics, some organized in collaboration with the Harvard University Faculty of Arts and Sciences. In April the topics were the LSF batch management system and Harvard's Odyssey cluster, which contributes to the ATLAS experiment. In July Ian Stokes-Rees spoke about collaborative web portal interfaces to HPC resources. In August Johan Montagnat spoke about large-scale medical image analysis using the European grid infrastructure. In October a full-day workshop was held in collaboration with Northeastern University, with over a dozen speakers presenting on biomedical computing on GPUs. In December Theresa Kaltz spoke about Infiniband and other high-speed interconnects.

Structural biology computing

There were two major developments in 2009. First, SBGrid deployed a range of applications onto over a dozen OSG sites. The primary applications have been two molecular replacement programs, Molrep and Phaser, which are widely used by structural biologists to identify the 3-D structure of proteins by comparing imaging data from an unknown structure to known protein fragments. Typically a data set for an unknown structure is compared to a single set of protein coordinates; SBGrid has developed a technique to perform this analysis with 100,000 fragments, requiring between 2,000 and 15,000 CPU hours for a single structure study, depending on the application and configuration parameters. Our early analysis with Molrep indicated that the signal produced by models with very weak sequence identity is, in many cases, too weak to produce a meaningful ranking. The study was repeated with Phaser, a maximum-likelihood application that requires significant computing resources: searches with individual coordinates take between 2 and 10 minutes for crystals with a single molecule in the asymmetric unit, and longer for crystals with many copies of the same molecule. With Phaser we achieved a significant improvement in the sensitivity of global molecular replacement and have recently identified several very encouraging cases. For example, the structure of NADH:FMN oxidoreductase (PDB code 2VZF; Figure B, grey), originally solved by experimental phasing methods, could be phased with the coordinates of SCOP model 1zwkb1 (Figure B, teal). The SCOP model was one of four structures that formed a distinct cluster apart from the rest of the SCOP models, yet has very weak sequence identity (<15%). In the testing phase of the project, between September and November 2009, SBGrid ramped up its utilization of OSG, running 42,000 jobs that consumed over 200,000 CPU hours. The portal and our new method have recently been tested by users from Yale University and UCSF, and will soon become available to all members of the structural biology community.
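The workflow just described amounts to fanning one molecular replacement search out over the fragment library and then ranking the returned scores to spot a cluster of strong candidates. The Python sketch below shows one plausible way to organize that; the batch size, file names, and score field are assumptions made for illustration and are not SBGrid's actual portal code.

    # Minimal sketch (not SBGrid's portal code) of fanning a global
    # molecular-replacement screen over a fragment library and ranking results.
    # BATCH_SIZE, the CSV layout, and the score field are assumptions.

    import csv

    BATCH_SIZE = 100   # fragments handled per grid job (hypothetical)

    def make_batches(fragment_ids, batch_size=BATCH_SIZE):
        """Split the ~100,000 SCOP fragment identifiers into independent
        work units, one per OSG job; each job would run Phaser or Molrep
        against the experimental data for its slice of fragments."""
        return [fragment_ids[i:i + batch_size]
                for i in range(0, len(fragment_ids), batch_size)]

    def rank_results(score_csv, top_n=10):
        """Merge the per-fragment scores returned by the jobs and rank them;
        a small cluster of outlying scores (as in panel A of the figure)
        flags candidate search models."""
        with open(score_csv) as f:
            rows = [(r["fragment"], float(r["score"])) for r in csv.DictReader(f)]
        return sorted(rows, key=lambda r: r[1], reverse=True)[:top_n]

    if __name__ == "__main__":
        # Hypothetical input: a list of SCOP domain identifiers.
        fragments = [f"scop_{i:06d}" for i in range(100_000)]
        print(f"{len(make_batches(fragments))} jobs of {BATCH_SIZE} fragments each")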



[Figure image: sbgrid-doe-1.png]

Figure: Global molecular replacement. A - after searching with 100,000 SCOP domains, four models form a distinct cluster (highlighted). B - one of the SCOP domains in the cluster (teal) superimposes well with the 2VZF coordinates (grey), although the sequence identity between the two structures is minimal. C - the SBGrid molecular replacement portal deploys computations to OSG resources; typical runtime for an individual search with a single SCOP domain is 10 minutes.

In the second major development, in collaboration with the Harrison laboratory at Harvard Medical School, we deployed on the grid a method to build and refine a three-dimensional (3D) protein structure primarily from orientational constraints obtained from nuclear magnetic resonance (NMR) data. We measured three sets of orthogonal residual dipolar coupling (rdc) constraints corresponding to protein backbone atom pairs (N-NH, N-CO and CO-CA vectors). In our protocol, the experimental data is first divided into small, sequential rdc sets, and each set is used to scan a database of known backbone fragments. We generated databases of 5-, 7-, 9-, 12-, 15-, 20- and 25-residue fragments by processing a selected group of high-quality x-ray and NMR structures deposited in the Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB). For each protein fragment in the database, we score our experimental data using the Prediction of Alignment from Structure procedure (PALES). The top-scoring fragments define the local protein conformation; the global protein structure is then refined in a second step using Xplor. Each PALES comparison requires the prediction of an alignment tensor from a known 3D coordinate file (each fragment) and a statistical analysis of the fit to the experimental data set. Every PALES analysis takes about 50 seconds to complete on a current state-of-the-art single-core processor. A typical fragment database consists of ~300,000 fragments, and our experimental data (obtained on a 32 kDa membrane protein) is divided into ~300 sequential rdc sets. This amounts to ~10⁹ comparisons, or ~52 days for completion on a single-core computer. The scoring step can easily be divided into ~1,000 independent jobs, reducing the execution time to only a few hours per database. We have used the SBGrid computing infrastructure to develop our protocol and to efficiently process our data, and we are currently refining the 3D protein structure.
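The parallelization argument above is simple enough to check with a short Python sketch. It takes only the figures quoted in the text (about 52 days of serial compute per fragment database, split into roughly 1,000 independent jobs over a ~300,000-fragment database); the even chunking scheme and variable names are illustrative assumptions, not the group's actual scripts.

    # Back-of-the-envelope check of the PALES scoring decomposition.
    # Headline figures come from the text; the slicing scheme is assumed.

    SERIAL_DAYS_PER_DATABASE = 52      # quoted single-core estimate
    N_JOBS = 1000                      # independent scoring jobs per database
    FRAGMENTS_PER_DATABASE = 300_000   # typical fragment count

    # Wall time per job if the scoring step splits evenly across jobs.
    hours_per_job = SERIAL_DAYS_PER_DATABASE * 24 / N_JOBS
    print(f"~{hours_per_job:.1f} hours of PALES scoring per job")   # ~1.2 h

    # Each job scores a contiguous slice of the fragment database against
    # every rdc set, so the jobs are fully independent of one another.
    def job_slices(n_fragments=FRAGMENTS_PER_DATABASE, n_jobs=N_JOBS):
        step = -(-n_fragments // n_jobs)          # ceiling division
        return [(start, min(start + step, n_fragments))
                for start in range(0, n_fragments, step)]

    slices = job_slices()
    print(f"{len(slices)} jobs; first slice covers fragments {slices[0]}")

With roughly an hour of scoring per job, a database finishes in a few hours of wall time once the jobs are dispatched across OSG, which is consistent with the behavior reported above.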



