ACDC Tools Specifications and Selection (ACDC-WP2.1), ITEA2 #09008 Deliverable




1.4.2 Cloud Standardization


Many organizations play a role in the definition of Cloud standards; the most significant are:

  • IETF (Internet Engineering Task Force), which works on Internet-related concerns, including virtual networks and security.

  • DMTF (Distributed Management Task Force), which has already worked on virtual machine standardization and is authoritative on management aspects.

  • SNIA (Storage Networking Industry Association), which is responsible for all matters relating to the representation of storage in the Cloud.

  • IEEE (Institute of Electrical and Electronics Engineers), which has an established role in the standardization of SOA (Service-Oriented Architecture), an important building block for the Cloud.

  • OMG (Object Management Group), which is involved in the standardization of Web Services.

  • W3C (World Wide Web Consortium), which works on Web-related standards.

  • ETSI (European Telecommunications Standards Institute), which is involved in the field of IaaS and the access to these IaaS resources.

  • Open Grid Forum, which is working on the provisioning and monitoring of distributed resources and infrastructure services.

  • NIST (National Institute of Standards and Technology), which has developed the most commonly used definition of Cloud Computing.

In addition to these organizations, several initiatives promote the emergence of open standards:

  • Open Cloud Manifesto, which promotes interoperability between Cloud Computing solutions.

  • Cloud Security Alliance, which contributes to a better consideration of security issues in the Cloud.

  • Free Cloud Alliance, which aims to promote open solutions.

The ACDC project will keep its work synchronized with the progress made by these organizations.

1.4.3 General Cloud Tools

1.4.3.1 Apache Tomcat (web server)


Apache Tomcat (or Jakarta Tomcat or simply Tomcat) is an open source servlet container developed by the Apache Software Foundation (ASF). Tomcat implements the Java Servlet and the JavaServer Pages (JSP) specifications from Sun Microsystems, and provides a "pure Java" HTTP web server environment in which Java code can run.
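
As a brief illustration of the Servlet API that Tomcat implements, the minimal servlet below answers HTTP GET requests with a plain-text message. It is only a sketch: the class name and URL pattern are illustrative assumptions, not part of the ACDC code base, and it assumes a Servlet 3.0 container (Tomcat 7 or later) so that the @WebServlet annotation replaces web.xml configuration.

    // Minimal servlet sketch; class name and URL pattern are illustrative.
    import java.io.IOException;
    import java.io.PrintWriter;

    import javax.servlet.ServletException;
    import javax.servlet.annotation.WebServlet;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    @WebServlet("/hello")
    public class HelloServlet extends HttpServlet {
        @Override
        protected void doGet(HttpServletRequest request, HttpServletResponse response)
                throws ServletException, IOException {
            // Tomcat dispatches GET requests for /hello to this method.
            response.setContentType("text/plain");
            PrintWriter out = response.getWriter();
            out.println("Hello from a servlet running in Tomcat");
        }
    }

Packaged in a WAR file and dropped into Tomcat's webapps directory, the servlet becomes reachable under the application's context path followed by /hello.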

1.4.3.2 Apache CXF (web services)


Apache CXF is an open-source, fully featured Web Services framework. It originated as the combination of two open-source projects: Celtix developed by IONA Technologies (acquired by Progress Software in 2008) and XFire developed by a team hosted at Codehaus. These two projects were combined by people working together at the Apache Software Foundation. The name CXF derives from combining the "Celtix" and "XFire" project names.
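
The code-first JAX-WS sketch below shows the style of service CXF can publish; the service class, its method and the address are illustrative assumptions. With CXF on the classpath, the standard Endpoint.publish call is handled by CXF's JAX-WS implementation, which also generates and serves the WSDL.

    // Code-first web service sketch; class name and address are illustrative.
    import javax.jws.WebService;
    import javax.xml.ws.Endpoint;

    @WebService
    public class TimeService {

        // Exposed as a web service operation.
        public String currentTime() {
            return new java.util.Date().toString();
        }

        public static void main(String[] args) {
            // CXF's JAX-WS runtime publishes the endpoint and its WSDL
            // (available at the address below with ?wsdl appended).
            Endpoint.publish("http://localhost:9000/time", new TimeService());
            System.out.println("Service published at http://localhost:9000/time");
        }
    }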

1.4.3.3 Apache Whirr


Apache Whirr is a set of libraries for running cloud services. Whirr is currently in the Apache Incubator. Whirr provides the following (a short usage sketch appears after the list):

  1. A cloud-neutral way to run services. You don't have to worry about the idiosyncrasies of each provider.

  2. A common service API. The details of provisioning are particular to the service.

  3. Smart defaults for services. You can get a properly configured system running quickly, while still being able to override settings as needed.
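
The Java sketch below illustrates the kind of cluster launch Whirr supports. It is a sketch only: the exact class and method names vary between Whirr releases, and the properties file name is an illustrative assumption.

    // Cluster launch sketch; API details vary between Whirr releases,
    // and hadoop-cluster.properties is an assumed example file.
    import org.apache.commons.configuration.PropertiesConfiguration;
    import org.apache.whirr.Cluster;
    import org.apache.whirr.ClusterController;
    import org.apache.whirr.ClusterSpec;

    public class LaunchCluster {
        public static void main(String[] args) throws Exception {
            // The properties file names the cloud provider, the credentials
            // and the instance templates to start (e.g. master and workers).
            ClusterSpec spec =
                new ClusterSpec(new PropertiesConfiguration("hadoop-cluster.properties"));

            ClusterController controller = new ClusterController();
            Cluster cluster = controller.launchCluster(spec);  // provision the nodes
            System.out.println("Running instances: " + cluster.getInstances());

            controller.destroyCluster(spec);                   // tear everything down
        }
    }

The same properties file can be reused against a different cloud provider, which is what makes the launch cloud-neutral.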


1.4.3.4 Apache Pig


Apache Pig is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs. The salient property of Pig programs is that their structure is amenable to substantial parallelization, which in turn enables them to handle very large data sets.

At the present time, Pig's infrastructure layer consists of a compiler that produces sequences of Map-Reduce programs, for which large-scale parallel implementations already exist (e.g., the Hadoop subproject). Pig's language layer currently consists of a textual language called Pig Latin, which has the following key properties (a short example of embedding Pig Latin in Java appears after the list):



  • Ease of programming. It is trivial to achieve parallel execution of simple, "embarrassingly parallel" data analysis tasks. Complex tasks composed of multiple interrelated data transformations are explicitly encoded as data flow sequences, making them easy to write, understand, and maintain.

  • Optimization opportunities. The way in which tasks are encoded permits the system to optimize their execution automatically, allowing the user to focus on semantics rather than efficiency.

  • Extensibility. Users can create their own functions to do special-purpose processing.
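
The sketch below embeds a small Pig Latin word-count script in Java through Pig's PigServer class; the input file, field layout and output path are illustrative assumptions. When the result is stored, Pig compiles the registered statements into a sequence of MapReduce jobs.

    // Word count in Pig Latin, driven from Java; paths are illustrative.
    import org.apache.pig.ExecType;
    import org.apache.pig.PigServer;

    public class PigWordCount {
        public static void main(String[] args) throws Exception {
            // MAPREDUCE mode compiles the script into Hadoop MapReduce jobs;
            // ExecType.LOCAL would run the same statements on one machine.
            PigServer pig = new PigServer(ExecType.MAPREDUCE);

            // Each registerQuery call adds one Pig Latin statement to the plan.
            pig.registerQuery("lines = LOAD 'input.txt' AS (line:chararray);");
            pig.registerQuery("words = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;");
            pig.registerQuery("grouped = GROUP words BY word;");
            pig.registerQuery("counts = FOREACH grouped GENERATE group, COUNT(words);");

            // store() triggers compilation and execution of the MapReduce jobs.
            pig.store("counts", "wordcount_output");
        }
    }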

1.4.3.5 Apache Web Server


The Apache HTTP Server Project is a collaborative software development effort aimed at creating a robust, commercial-grade, featureful, and freely-available source code implementation of an HTTP (Web) server. The project is jointly managed by a group of volunteers located around the world, using the Internet and the Web to communicate, plan, and develop the server and its related documentation. This project is part of the Apache Software Foundation. In addition, hundreds of users have contributed ideas, code, and documentation to the project.

The Apache HTTP Server Project is an effort to develop and maintain an open-source HTTP server for modern operating systems including UNIX and Windows NT. The goal of this project is to provide a secure, efficient and extensible server that provides HTTP services in sync with the current HTTP standards.


1.4.4 Distributed Processing

1.4.4.1 Apache™ Hadoop™


The Apache™ Hadoop™ software library is a framework that allows for the distributed processing of large data sets across clusters of computers using a simple programming model. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than relying on hardware to deliver high availability, the library itself is designed to detect and handle failures at the application layer, thus delivering a highly available service on top of a cluster of computers, each of which may be prone to failures.

The project includes these subprojects:

* Hadoop Common is a set of utilities including FileSystem, RPC, and serialization libraries.

* Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware. HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. HDFS provides high throughput access to application data and is suitable for applications that have large data sets.

* Hadoop MapReduce is a software framework for easily writing MapReduce applications which process vast amounts of data (multi-terabyte data sets) in parallel on large clusters (thousands of nodes) of commodity hardware in a reliable, fault-tolerant manner.

A MapReduce job usually splits the input data set into independent chunks, which are processed by the map tasks in a completely parallel manner. The framework sorts the outputs of the maps, which are then input to the reduce tasks. Typically both the input and the output of the job are stored in a file system. The framework takes care of scheduling tasks, monitoring them, and re-executing failed tasks.
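
The word-count example below, close to the one distributed with Hadoop, shows how map and reduce tasks are expressed against the org.apache.hadoop.mapreduce API; input and output paths are taken from the command line, and minor API details differ between Hadoop releases.

    // Classic Hadoop MapReduce word count (API details vary slightly by release).
    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

        // The map task reads one line at a time and emits (word, 1) pairs.
        public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();

            public void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                StringTokenizer itr = new StringTokenizer(value.toString());
                while (itr.hasMoreTokens()) {
                    word.set(itr.nextToken());
                    context.write(word, ONE);
                }
            }
        }

        // The reduce task receives all counts emitted for one word and sums them.
        public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            private final IntWritable result = new IntWritable();

            public void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable val : values) {
                    sum += val.get();
                }
                result.set(sum);
                context.write(key, result);
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = new Job(new Configuration(), "word count");
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setCombinerClass(IntSumReducer.class);  // local pre-aggregation on map output
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));    // input directory
            FileOutputFormat.setOutputPath(job, new Path(args[1]));  // output directory
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }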

For more information visit http://hadoop.apache.org

1.4.4.2 Apache Mahout


Mahout's goal is to build scalable machine learning libraries. By scalable, the project means:

Scalable to reasonably large data sets. The core algorithms for clustering, classification and batch-based collaborative filtering are implemented on top of Apache Hadoop using the map/reduce paradigm. However, contributions are not restricted to Hadoop-based implementations: contributions that run on a single node or on a non-Hadoop cluster are welcome too. The core libraries are highly optimized to give good performance for non-distributed algorithms as well.

Scalable to support various business cases. Mahout is distributed under a commercially friendly Apache Software license.

Scalable community. The goal of Mahout is to build a vibrant, responsive, diverse community to facilitate discussions not only on the project itself but also on potential use cases. Come to the mailing lists to find out more.

Currently Mahout mainly supports four use cases: recommendation mining takes users' behavior and tries to find items those users might like; clustering takes, for example, text documents and groups them into sets of topically related documents; classification learns from existing categorized documents what documents of a specific category look like and assigns unlabelled documents to the (hopefully) correct category; frequent itemset mining takes a set of item groups (terms in a query session, shopping cart contents) and identifies which individual items usually appear together.
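
As an illustration of the recommendation-mining use case, the sketch below uses Mahout's Taste recommender API; the ratings file, the neighbourhood size and the user ID are illustrative assumptions.

    // User-based recommender sketch; data file and parameters are illustrative.
    import java.io.File;
    import java.util.List;

    import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
    import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
    import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
    import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
    import org.apache.mahout.cf.taste.model.DataModel;
    import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
    import org.apache.mahout.cf.taste.recommender.RecommendedItem;
    import org.apache.mahout.cf.taste.recommender.Recommender;
    import org.apache.mahout.cf.taste.similarity.UserSimilarity;

    public class RecommenderExample {
        public static void main(String[] args) throws Exception {
            // ratings.csv holds lines of the form userID,itemID,preference.
            DataModel model = new FileDataModel(new File("ratings.csv"));

            // Users are compared by Pearson correlation over their ratings,
            // and each user's 10 most similar users form the neighbourhood.
            UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
            UserNeighborhood neighborhood = new NearestNUserNeighborhood(10, similarity, model);
            Recommender recommender = new GenericUserBasedRecommender(model, neighborhood, similarity);

            // Ask for the three items user 42 is most likely to appreciate.
            List<RecommendedItem> items = recommender.recommend(42, 3);
            for (RecommendedItem item : items) {
                System.out.println(item.getItemID() + " : " + item.getValue());
            }
        }
    }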

http://mahout.apache.org/



