Business Data Lake Conceptual Framework




Knowledge13


Definitions of knowledge vary widely across domains, but knowledge acquisition, knowledge representation, and knowledge-based systems have been in use for many decades.14

Knowledge can be characterized as being either:



  • Tacit – held in the head of the knower, representing a combination of formal, informal, and experiential learning. It is often acquired over time and most often not documented. Given current workforce demographics, capturing as much of this knowledge as possible is paramount to maintaining existing levels of service in industry and government with fewer human resources.

  • Explicit – tacit knowledge that has been acquired and documented in the form of information that may be human- and/or machine-understandable.

Explicit knowledge is represented in an encoding system. The following definitions are drawn from international standards:

  • Information – knowledge concerning objects, such as facts, events, things, processes, or ideas, including concepts, that within a certain context has a particular meaning.15 Information can also be conceived of as a Viewpoint that integrates data relating to a stakeholder concern and presents it in a way that is meaningful to that enterprise stakeholder.

  • Data – a reinterpretable representation of information in a formalized manner suitable for communication, interpretation, or processing.16 Alternatively, data is the representation of facts as text, numbers, graphics, images, sound, or video. Technically, data is the plural form of the Latin word "datum", meaning "a fact".17
    Master Data Management (MDM)18


Organizations often have hundreds of applications, each with its own database containing large amounts of conflicting data, particularly relating to key entities such as people, organizations, locations, and assets. Master Data Management (MDM) systems are designed to harmonize and synchronize this type of data across the applications. They typically maintain a read-only copy of the harmonized data while the applications retain system-of-record responsibility. However, in advanced deployments, the master data management system is the system of record for this data and all updates are made in the MDM system before being synchronized with the applications.
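
A minimal sketch of this harmonization idea, in Python; the application record formats, field names, and the match-on-email rule are all invented for illustration and do not come from any particular MDM product:

    # Records for the same customer arrive from two application databases with
    # conflicting or missing values; the MDM layer consolidates them into a
    # single "golden" master record that the applications synchronize against.
    from collections import defaultdict

    crm_records = [
        {"email": "a.smith@example.com", "name": "A. Smith", "phone": None},
    ]
    billing_records = [
        {"email": "a.smith@example.com", "name": "Alice Smith", "phone": "+1-555-0100"},
    ]

    def harmonize(*sources):
        """Group records by a match key and keep the first non-empty value per field."""
        golden = defaultdict(dict)
        for source in sources:
            for record in source:
                master = golden[record["email"]]          # match rule: same email address
                for field, value in record.items():
                    if value and not master.get(field):   # prefer the first non-empty value
                        master[field] = value
        return dict(golden)

    master_data = harmonize(crm_records, billing_records)
    print(master_data["a.smith@example.com"])
    # {'email': 'a.smith@example.com', 'name': 'A. Smith', 'phone': '+1-555-0100'}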

In either case, the MDM system provides an authoritative set of values (e.g., client details) that supplies key reference data to the business data lake for enriching, validating, and correlating data from multiple sources. The following definitions are used in this document:



  • Master data is the data about people, organizations, assets, locations, and contracts that provides the context for transaction (or operational) data. It includes the details (definitions and identifiers) of internal and external objects involved in business transactions. Examples of master data include data about customers, products, employees, locations, vendors, and controlled domains (code values).19

  • Master Data Management is the end-to-end control over master data values to enable consistent, shared, contextual use across systems, of the most accurate, timely, and relevant version of truth about essential business entities. (DMBOK).

  • A master data management system is a specialized data management server that provides services to consolidate, harmonize, and synchronize master data.
    Metadata


Metadata is defined in many ways, but the following are the most useful:

  • “Information pertaining to the information for purposes of description, administration, legal requirements, technical functionality, use and usage, and preservation”20

  • “Structured data about data used to aid the identification, description, location, or use of information resources” 21

Metadata is highly relevant to the business data lake because it describes the big data: where it came from, its terms and conditions of use, its structure, its currency and frequency of update, and the other characteristics users of the business data lake need in order to locate and select the data required for their work. This becomes critical as the business data lake grows in importance to the organization and comes to hold thousands of data sets containing similar data.

Metadata also becomes key to automating the management of the BDL, since it classifies the data into groups that require particular types of management actions.
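
As an illustration only, a data set's descriptive metadata and its use for discovery might look like the following sketch; the attribute names and values are invented, not taken from any metadata standard:

    # Illustrative metadata records for data sets in the business data lake.
    # The attributes mirror characteristics listed above (source, terms of use,
    # structure, update frequency, classification); all values are invented.
    catalog = [
        {
            "dataset": "retail_transactions",
            "source": "point-of-sale system",
            "terms_of_use": "internal analytics only",
            "structure": "structured (CSV, 14 columns)",
            "update_frequency": "daily",
            "classification": "transactional",
        },
        {
            "dataset": "call_centre_audio",
            "source": "telephony platform",
            "terms_of_use": "consent-restricted",
            "structure": "unstructured (WAV)",
            "update_frequency": "hourly",
            "classification": "customer-interaction",
        },
    ]

    def find(catalog, **criteria):
        """Locate data sets whose metadata matches every supplied criterion."""
        return [m["dataset"] for m in catalog
                if all(m.get(k) == v for k, v in criteria.items())]

    print(find(catalog, update_frequency="daily"))   # ['retail_transactions']

    # The classification attribute can likewise drive automated management,
    # e.g. applying a retention rule to every "customer-interaction" data set.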


    Open Platform 3.0


The objective of Open Platform 3.0 is to enable agile, secure, reliable, interoperable, and manageable multiple-technology solutions within and across enterprises.

It assembles a set of common platform services that support the integration and interoperability of cloud computing, mobile computing, social computing, big data analytics, and the Internet of Things (IoT) computing paradigms, technologies, infrastructures, and applications across enterprises.


    Platform


As defined by The Open Group in TOGAF: “A combination of technology infrastructure products and components that provides the prerequisites to host application software.”

The Business Data Lake is a Platform, for which the hosted applications are Analytics capabilities that generate Insights.

In the context of the Open Platform 3.0, the Business Data Lake can be extended with applications that integrate Analytics (together with mobile, social and other technologies) to address more “vertical” needs.

    Real-Time, Near Real-Time, and Interactive Response Time


A Real-Time response time is characterized by very low latency (usually a couple of seconds) between the event occurrence and the generation of the insight.

A Near Real-Time response time is characterized by slightly higher latency than Real-Time, usually within a few minutes of the event occurrence.

An Interactive response time is one that an end user finds acceptable to wait for. Depending on the context, it ranges from a few seconds (web browsing) to a few minutes (for “heavy” processing). If the user needs to do something else after submitting a request (even take a coffee break), the processing is batch.
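
A small sketch that treats these bands as a single latency scale, purely to illustrate the distinction; the thresholds are the indicative values from the text above, not normative limits:

    # Illustrative classification of response times into the bands described above.
    def response_class(latency_seconds: float) -> str:
        if latency_seconds <= 2:         # "a couple of seconds"
            return "real-time"
        if latency_seconds <= 3 * 60:    # "within a few minutes" of the event
            return "near real-time"
        if latency_seconds <= 10 * 60:   # still worth waiting at the screen for
            return "interactive"
        return "batch"                   # the user goes off and does something else

    print(response_class(0.5))     # real-time
    print(response_class(60))      # near real-time
    print(response_class(480))     # interactive
    print(response_class(7200))    # batch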

    Structured Data, Semi-Structured Data, Unstructured Data22


Structured Data describes data that can be both searched and processed by machine. This data typically resides in databases, including spreadsheets, and is used for both transaction and analytical processing. The latter normally conforms to enterprise-wide semantics and syntax that transcend specific business processes and functions, and it includes the representation of knowledge (e.g., decision trees and rule bases). This “data model” or structure is defined prior to the ingestion of the associated data.

Semi-Structured Data refers to data that is both human-readable and machine-readable (e.g., able to be accessed by a search engine). Most electronic documents (e.g., automated office files, web pages) fall into this category.

Unstructured Data refers to data that is only human-intelligible and can be neither searched nor processed by IT. Often this information is stored in non-electronic media such as a book or microfiche, or electronically as an imaged file that cannot be searched or processed (although the latter case is becoming rare as image processing becomes increasingly sophisticated).
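
A tiny illustration of the three categories, with invented example values:

    # Illustrative examples of the three categories; all values are invented.
    structured = {"order_id": 1042, "amount": 19.99}    # conforms to a data model defined
                                                        # before ingestion; machine-processable
    semi_structured = "<html><body><h1>Q2 results</h1></body></html>"
                                                        # human-readable and indexable by a
                                                        # search engine
    unstructured = bytes.fromhex("ffd8ffe0")            # start of an un-OCRed scanned image:
                                                        # only human-intelligible as-is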

  Overview of the BDL


This Chapter provides a short description of how the Business Data Lake works and what it can be used for.
    Business Data Lake Definition


The Business Data Lake (BDL) represents a new approach to the creation of analytical insights for the business, from the acceleration of traditional enterprise reporting through to new analytics driven by data science. It works with high volumes of all kinds of data (structured and unstructured), storing them at low cost and making insights rapidly available throughout the enterprise. It can coexist with earlier investments, accelerating the evolution of the information landscape.
    How does the BDL work?


Figure 2 presents the key concepts and capabilities of the Business Data Lake that describe how it works:

The BDL ingests data (batch as well as real-time) from other applications and data sources. All data items (structured and unstructured) go straight away into the main distributed – thus scalable – data store. The BDL stores data that can be “at-rest” or “in-motion”.

From all the ingested data, the BDL can generate Insights. These Insights are leveraged into Actions in multiple ways that make sense from a business point of view. A service layer takes care of delivering Insights at different Points of Action into business processes, applications, etc.

To turn Data into Insights, the BDL integrates two data processing capabilities. On the one hand, it provides a Real-Time Processing capability that creates Real-Time Insights. On the other hand, it relies on orchestrating iteratively designed Distillation Steps that progressively enrich, combine, or apply Analytics to existing data to create new, “more valuable” data until the result is considered a business-relevant Insight.
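
A minimal sketch of the distillation idea in Python; the step names, fields, and the toy analytic rule are all invented for illustration:

    # Illustrative distillation pipeline: each step enriches or combines the data
    # produced so far, and the orchestration simply chains the steps until the
    # result is a business-relevant insight. All names and values are invented.
    raw_events = [
        {"customer": "C1", "amount": 120.0},
        {"customer": "C1", "amount": 80.0},
        {"customer": "C2", "amount": 15.0},
    ]

    def enrich_with_segment(events):
        """Step 1: attach reference (master) data, here a customer segment, to each event."""
        segments = {"C1": "premium", "C2": "standard"}   # would come from MDM in practice
        return [dict(e, segment=segments[e["customer"]]) for e in events]

    def aggregate_spend(events):
        """Step 2: combine the enriched events into total spend per customer."""
        totals = {}
        for e in events:
            totals.setdefault(e["customer"], {"segment": e["segment"], "spend": 0.0})
            totals[e["customer"]]["spend"] += e["amount"]
        return totals

    def retention_insight(totals):
        """Step 3: apply a toy analytic to produce a business-relevant insight."""
        return [c for c, t in totals.items()
                if t["segment"] == "premium" and t["spend"] < 250]

    # Orchestration: each output is "more valuable" than its input.
    insight = retention_insight(aggregate_spend(enrich_with_segment(raw_events)))
    print(insight)   # ['C1'] - premium customers whose low spend suggests a retention action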



The key roles involved are:

  • The Business Data Lake Platform Owner (and operator) provides (and operates) the BDL Platform Services.

  • The Business Use Case Owners are responsible for the added business value of Insights and Actions.

  • The Data Owners are responsible for defining and applying the proper Data Policy, through the Unified Data Management services of the BDL.

  • The Business Use Case Contributors (e.g., “Data Scientists”) are responsible for discovering, experimenting with, and validating new processing capabilities (Analytics) that are:

      • Relevant for the Business Use Case

      • Innovative, smart, and efficient, like a positive “hack”

      • Consistent with the mathematical and statistical state of the art

Figure 2 - Overview of How the BDL works



