This guide is designed to aid service programmers as well as physics data processing application designers.
Chapter 2 describes the technical and implementation details of the ClaRA framework.
Chapters 2, 3 and 4 are intended for developers who are already familiar with the basic concepts of object-oriented programming and have experience programming in a high-level language such as Java, C++ or Python.
Chapters 3 and 4 provide service programmers with concrete ClaRA service examples.
If you are an experienced programmer familiar with SOA, Chapter 3 and the section titled “Clas12 PDP engine deployment and testing” in Chapter 5 will be enough to start developing ClaRA services.
Physics data processing application designers who are not interested in service programming and framework implementation details can skip Chapters 2, 3 and 4. However, the description of the interface definition in Chapter 2 will be very helpful for designing physics data processing applications.
I would like to express my gratitude to everyone who helped me through this project: to all those who provided support, offered comments, and assisted in the editing, proofreading and design of this manual.
I would like to extend special thanks to Carl Timmer for his invaluable support during the development of this project, as well as for proofreading and editing this manual.
I would like to thank students Sebouh Paul and Sebastian Mancille, as well as research scientist Gagik Gavalian (ODU), for their implementation efforts. Their comments and suggestions helped shape the ClaRA framework. Without their efforts, ClaRA would have remained only a theoretical concept.
Finally, I would like to thank the entire Hall-B offline group for its continuing support of the ClaRA project. It has been a pleasure collaborating with Dave Heddle (CNU), Jerry Gilfoyle (URich), Dennis Weygand, Veronique Ziegler, Mac Mestayer, Johann Goetz, Yelena Prok, Maurizio Ungaro, and others.
Data Processing Environment (DPE): A Java or C++ process that provides a complete run-time environment for the deployment, execution and management of ClaRA services.
Deploying a Service: The process of dynamically loading the shared object of a service engine and presenting it as a ClaRA service inside a data processing cloud.
Linking Services: Each ClaRA service has an input and an output. Typically the services are "linked" together into a chain to design a specific application starting from the "event source" (e.g. reading from a file) and ending at some "event sink" (e.g. writing to a file).
Orchestrator: A ClaRA-based program designed to coordinate the execution of one or more services. This process usually runs outside of the DPE.
Service Container: A ClaRA naming convention used to logically group services.
Service Engine: A compiled class (a shared object, i.e. a .class file in Java or a .so file in C++) that implements the ClaRA interface and is designed to provide a specific data processing functionality.
SOA: Service-Oriented Architecture.
Transient data envelope (or communication envelope): ClaRA message structure that contains data (required for service execution), as well as communication and service operational details.
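To make the glossary terms concrete, the following is a minimal, language-neutral sketch of how linked services pass a transient data envelope from an event source to an event sink. It does NOT use the ClaRA API: the `Envelope` class, the `link` helper and the `event_source`, `reconstruct` and `event_sink` functions are hypothetical names chosen for illustration only, assuming each service simply takes an envelope and returns an envelope.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


# Hypothetical stand-in for a transient data envelope:
# the payload plus communication/operational metadata (not the ClaRA class).
@dataclass
class Envelope:
    data: Any
    metadata: Dict[str, Any] = field(default_factory=dict)


# Hypothetical engine signature: an envelope in, an envelope out.
Engine = Callable[[Envelope], Envelope]


def link(*engines: Engine) -> Engine:
    """Chain engines so that each engine's output is the next engine's input."""
    def chain(env: Envelope) -> Envelope:
        for engine in engines:
            env = engine(env)
            # Record the order in which services handled the envelope.
            env.metadata.setdefault("trace", []).append(engine.__name__)
        return env
    return chain


def event_source(env: Envelope) -> Envelope:
    # Stands in for reading events from a file.
    return Envelope(data=[3, 1, 2], metadata=env.metadata)


def reconstruct(env: Envelope) -> Envelope:
    # Placeholder processing step: doubles each raw value.
    return Envelope(data=[x * 2 for x in env.data], metadata=env.metadata)


collected: List[Any] = []


def event_sink(env: Envelope) -> Envelope:
    # Stands in for writing results to a file; here it just stores them.
    collected.append(env.data)
    return env


app = link(event_source, reconstruct, event_sink)
result = app(Envelope(data=None))
print(result.data)               # [6, 2, 4]
print(result.metadata["trace"])  # ['event_source', 'reconstruct', 'event_sink']
```

Because every service shares the same envelope-in, envelope-out contract, a step can be replaced or a new one inserted without changing its neighbors; in a real deployment, an orchestrator rather than a local function call would coordinate this chain across DPEs.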
Physics Data Processing: A Contemporary Approach
Data processing requirements
Modern high energy and nuclear physics experiments require significant computing power to keep up with large data volumes. To achieve quality data analysis, intellectual input from diverse groups within a large collaboration must be brought together. Such analysis in a collaborative environment has historically relied on a computing model based on self-contained, monolithic software applications running in batch-processing mode. If not organized properly, this model can be inefficient in terms of deployment, maintenance, response to errors, update propagation, scalability and fault tolerance. The CLAS offline group experienced such problems during fifteen years of operating the CLAS online and offline software. Even though these challenges are common to all Physics Data Processing (PDP) applications, the small size of the CLAS offline group magnified their effect.
Experimental configurations have become more complex, and compute capacity has expanded at a rate consistent with Moore’s Law. As a consequence, computing applications have also become much more complex, with significant interaction between diverse program components. This has led to software systems so complex and intertwined that the programs have become difficult to maintain and extend.
In large collaborations it is difficult to enforce policies on computer hardware. For example, groups use whatever computing resources they have at their home institutions, which evolve as new hardware is added. Additional software and organizational effort must be put in place to update and rebuild software applications for new hardware or OS environments.
In order to improve productivity, it is essential to provide location-independent access to data, as well as flexibility in the design, operation, maintenance and extension of physics data processing applications. These applications have a very long lifetime, so the ability to upgrade technologies is essential. They must be organized in a way that easily permits discarding aged components and including new ones without redesigning entire software packages at each change. The addition of new modules and the removal of unsatisfactory ones is a natural part of the evolution of applications over time. Experience shows that software evolution and diversification are important and result in more efficient and robust applications.
New generations of young physicists doing data analysis may or may not have the programming skills required to extend or modify applications written using older technologies. For example, Java is the main educational programming language in many universities today, but most data production software applications are written in C++, and some even in FORTRAN.
The offline software of the CLAS12 project aims to provide the collaboration with tools that allow design, simulation, and data analysis to proceed in an efficient, repeatable, and understandable way. The process should be designed to minimize errors and to allow cross-checks of results. As much as possible, software engineering details should be hidden from collaborators, allowing them to concentrate on the physics.