Business Data Lake Conceptual Framework





Date: 09.06.2018

Introduction

  1. Objective


This document provides Key Concepts for the Business Data Lake, as a first step towards a Reference Architecture. By describing a set of architectural patterns, principles and other reusable artifacts and guidance, it is intended to help organizations leverage new, disruptive “Big Data” solutions and set up an associated “data-centric” strategy for increased performance and competitiveness.
    1. Overview


The Business Data Lake is a particularly relevant solution for the Big Data Analytics services attached to the Open Platform 3.0.

The Business Data Lake Conceptual Framework is described at the Enterprise Level. This means that it provides technological content as well as business and organizational content. The new, disruptive technology that has emerged from the digital transformation of the Internet Giants can benefit almost every enterprise (and ecosystem), but it also comes with a new, specific mindset that has to be addressed at the Enterprise level.

The content of the Business Data Lake Conceptual Framework has been selected to be relevant for any industry. It intentionally does not integrate sector-specific constraints or principles, and therefore does not address specific digital strategy objectives. The Business Data Lake will help you get Insights from all kinds of data. It will not tell you which new digital services you have to build.

    1. Linkage to Other Open Group Standards

      1. Linkage to TOGAF


The TOGAF Information Architecture Guide is a generic way of developing an Enterprise Information Architecture. The Business Data Lake is an instantiation of parts of an Enterprise Information Architecture designed to handle Big Data in real time and provide analytics for enterprise use.

The Business Data Lake standard implements parts of the generic Information Sharing Environment (ISE) concept elaborated on in the TOGAF Information Architecture.


      1. Linkage to ArchiMate


The Business Data Lake concepts can be represented using the ArchiMate modeling conventions and meta-model.
      1. Linkage to IT4IT


The Business Data Lake standard can be used as part of a solution to create an IT4IT implementation.
      1. Linkage to O-DEF


The Open Data Element Framework semantic interoperability concepts can be used in a Business Data Lake implementation.
    1. Conformance


For the purposes of this standard, no conformance requirements apply.
    1. Terminology


For the purposes of the Business Data Lake preliminary standard, the following terminology definitions apply:

Can: Describes a possible feature or behavior available to the user or application.

May: Describes a feature or behavior that is optional. To avoid ambiguity, the opposite of “may” is expressed as “need not”, instead of “may not”.

Shall: Describes a feature or behavior that is a requirement. To avoid ambiguity, do not use “must” as an alternative to “shall”.

Shall not: Describes a feature or behavior that is an absolute prohibition.

Should: Describes a feature or behavior that is recommended but not required.

Will: Same meaning as “shall”; “shall” is the preferred term.

    1. Future Directions


The Technology that powers the Business Data Lake is evolving very rapidly (over months, not years). For instance, the Hadoop platform went through a major shift when it introduced the capability to integrate multiple processing engines, not only MapReduce.

The Business Data Lake Conceptual Framework may therefore be updated to reflect this kind of major change or opportunity. At the time of its first release (November 2015), the plan is to complete the conceptual framework described in this document (notably with Architecture Principles) in the near future.

Nevertheless, as an Enterprise-level Platform concept, the Business Data Lake is designed to be stable enough for organizations to embrace change. In a volatile world, highly robust architectures are required. The Business Data Lake is one of them.

  1. Definitions


This Chapter gathers definitions that are connected and/or relevant to the Business Data Lake.

Definitions for the core Business Data Lake concepts are provided in Chapter 4.

For the purposes of this standard, the following terms and definitions apply. Merriam-Webster's Collegiate Dictionary should be referenced for terms not defined in this section.

    1. Analytics


Analytics facilitates the realization of business objectives through reporting of data to analyze trends, creating predictive models for forecasting, and optimizing business processes for enhanced performance [1].

Analytics could be defined as "The extensive use of data, statistical and quantitative analysis, explanatory and predictive models, and fact-based management to drive decisions and actions." [2]


    1. Batch, Micro Batch


Batch processing takes as input large datasets or a large group of events that arrive as a package, usually hourly, daily or monthly.

Micro Batch processing takes as input a group of events that arrive frequently in a compact package, usually every few seconds or every few minutes.
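
The difference between the two modes can be illustrated with a minimal sketch, in which the same grouping logic is applied with a large window (batch) and a small window (micro batch). The function name `group_into_batches`, the event format and the window sizes are assumptions made for illustration only; they are not part of the framework.

```python
from datetime import datetime, timedelta


def group_into_batches(events, window):
    """Group (timestamp, payload) events into fixed, non-overlapping windows.

    `window` is a timedelta; each batch is keyed by the start of its window.
    """
    epoch = datetime(1970, 1, 1)
    batches = {}
    for timestamp, payload in events:
        # Round the timestamp down to the start of the window it falls in.
        index = (timestamp - epoch) // window
        batches.setdefault(epoch + index * window, []).append(payload)
    return dict(sorted(batches.items()))


# Hypothetical events: one event every 20 seconds, six events in total.
start = datetime(2015, 11, 1, 12, 0, 0)
events = [(start + timedelta(seconds=20 * i), f"event-{i}") for i in range(6)]

# Batch: a single large package covering the whole hour.
hourly = group_into_batches(events, timedelta(hours=1))

# Micro batch: small packages, here one per minute.
micro = group_into_batches(events, timedelta(seconds=60))
```

In this sketch the hourly window yields one package holding all six events, while the one-minute window yields two packages of three events each. Only the window size changes between the two modes.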


    1. Big Data [3]


The term "Big Data" refers to the large amounts of data available to enterprises for use, in particular for analytics. The data is characterized [4] by:

  • Volume - the sheer amount of data available (e.g. in 2012, Walmart collected 2.5 petabytes of data from its customers every hour [5]);

  • Velocity - the accelerating speed of data creation (e.g. in 2012, 2.5 exabytes of information were created every day, with the amount doubling every 40 months [6]);

  • Variety - the varied sources and types of data (unstructured, semi-structured and structured), whether images, digital phone messages or the like.
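
To give a sense of what the Velocity figures imply, the following sketch projects the daily volume forward. It assumes that the doubling period stays constant at 40 months, which is an extrapolation for illustration and not a claim made by the cited source. The function name and parameters are hypothetical.

```python
def projected_daily_volume(months_elapsed, initial_exabytes=2.5, doubling_months=40):
    """Project the daily data volume in exabytes.

    Assumes simple exponential growth with a constant doubling period,
    starting from the 2012 figure of 2.5 exabytes per day.
    """
    return initial_exabytes * 2 ** (months_elapsed / doubling_months)


after_one_doubling = projected_daily_volume(40)    # 40 months after 2012
after_two_doublings = projected_daily_volume(80)   # 80 months after 2012
```

Under this assumption the daily volume reaches 5 exabytes after 40 months and 10 exabytes after 80 months.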

From a more technical perspective [7], the Big Data phenomenon is a function of the integrated availability of information from traditional information systems, supervisory control and data acquisition (SCADA) systems (e.g. the electrical grid), the World Wide Web, log data and social media. This capability has been made possible by:

    • advances in technology over the past 5-10 years, where improvements in computing power and storage have made the processing and integration of the data feasible [8];

    • a shift in mindset about how data could be used [9];

    • the conversion of analog systems (e.g. telephones) into digital ones (e.g. Voice over Internet Protocol [VoIP]), creating new information assets that have to be managed;

    • the use of digital platforms (such as mobile phones) for an increasing range of activities;

    • the use of Internet and cloud-based services for collaboration and sharing of all types of information; and

    • open data movements in many countries.



Figure 1 - Big Data - An Architecture Perspective

From a business perspective, Big Data means getting used to handling large amounts of data that is "messy" (i.e. of varying quality) and "giving up our quest to discover the cause of things, in return for accepting correlations" [10] (i.e. focusing on correlations rather than causation).

This integration opens up critical infrastructure protection challenges derived from the ability to access SCADA systems through new channels such as social media [11].

Organizations in certain industry and government verticals, notably power generation and defense, have long coped with many aspects of Big Data. The implication, however, is that organizations are moving from an emphasis on transaction processing towards decision support and analytics.

In TOGAF, the question of Big Data will be addressed implicitly through its consideration in all of the IM Functions to be implemented. Big Data will be considered a normal way of doing business.



    1. Ecosystem


An Ecosystem is a set of Enterprises that collaborate in an open, agile way in pursuit of business goals that are consistent across all of those Enterprises.

At a certain level, one of the objectives of the Open Platform 3.0, the Business Data Lake and, more generally, Platforms is to foster the creation and development of such ecosystems.


    1. Enterprise Data Warehouse (EDW)


An enterprise data warehouse is a storage architecture designed to hold data extracted from transaction systems, operational data stores and external sources. The warehouse then combines that data into a consistent data representation that can be further aggregated and summarized into different formats suitable for enterprise-wide data analysis and reporting for predefined business needs.

The six components of a data warehouse are:



  • production data sources

  • data extraction and conversion

  • the data warehouse database management system

  • metadata management and governance (standards, quality, lifecycle, protection)

  • data warehouse administration

  • business intelligence (BI) tools

An enterprise data warehouse contains data arranged into abstracted subject areas, with time-variant versions of the same records and an appropriate level of data grain or detail to make it useful across two or more different types of analyses. It is most often deployed with a tendency towards third normal form. A data mart contains similarly time-variant and subject-oriented data, but with relationships implying dimensional use of the data, wherein facts are distinctly separate from dimension data, making it more appropriate for single categories of analysis. [12] The data mart can be thought of as a materialized (stored) view of a subset of the data warehouse.
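
The idea of a data mart as a stored view over a subset of the warehouse can be sketched with Python's built-in `sqlite3` module. The table names, columns and sample rows below are hypothetical. Because SQLite has no materialized views, the sketch stores the mart with `CREATE TABLE ... AS SELECT`, which is one possible way to persist the result of a query.

```python
import sqlite3

# In-memory database standing in for the warehouse (illustrative only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Detailed warehouse data, kept at a fine grain.
    CREATE TABLE warehouse_sales (
        sale_date TEXT, region TEXT, product TEXT, amount REAL
    );
    INSERT INTO warehouse_sales VALUES
        ('2015-11-01', 'EMEA', 'widget', 100.0),
        ('2015-11-01', 'EMEA', 'gadget', 50.0),
        ('2015-11-02', 'APAC', 'widget', 75.0);

    -- Data mart: a stored, aggregated subset for one category of analysis.
    CREATE TABLE mart_sales_by_region AS
        SELECT region, SUM(amount) AS total_amount
        FROM warehouse_sales
        GROUP BY region;
""")

rows = conn.execute(
    "SELECT region, total_amount FROM mart_sales_by_region ORDER BY region"
).fetchall()
```

Here `rows` holds one total per region. Unlike an ordinary view, the stored table does not change when the warehouse is updated, so a real deployment would need a refresh step.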



