Business Data Lake Conceptual Framework

Date	09.06.2018

Data

Data is one of the main concepts populating the BDL. It can be any kind of data: structured, semi-structured or non-structured²³. In particular, it does not have to be structured data from an existing database. It can be, for instance, raw binary data from sensors, images and videos from cameras, tweets, or documents and files that can be stored on digital media.

Metadata

Metadata, as previously defined, is data describing “actual” data items. It is a key input for Information Governance, especially for data quality, data confidentiality, data classification and discovery. The source, target, date of ingestion, file size and tags attached to a photo or video are examples of metadata. Metadata can also be thought of as the "container" that describes the data it encapsulates.

Often the BDL can only process the metadata, in accordance with the privacy and security legislation of the country where the data originates or where it is processed. More detailed analysis of the actual data may require consent and/or a judicial warrant.

Event

An Event is a specific kind of structured data that has a date and time of occurrence. An Event can contain additional data pieces, especially semi-structured or non-structured data. Logs are an example of semi-structured data.

Following the principle of monitoring the usage of the data lake with the data lake's own capabilities, User Actions are also a specific sub-type of Event.

Stream

A Stream represents a flow or succession of ordered Events.
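
The relationship between Event, User Action and Stream can be illustrated with a minimal Python sketch. All names here (Event, UserAction, as_stream and their fields) are hypothetical and chosen for illustration only; the framework itself does not prescribe any data model.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Iterable, Iterator


@dataclass(frozen=True)
class Event:
    """Structured record with a mandatory date and time of occurrence."""
    occurred_at: datetime
    kind: str
    payload: Any = None  # optional semi-structured or non-structured data


@dataclass(frozen=True)
class UserAction(Event):
    """Sub-type of Event recording how a user used the data lake."""
    user: str = "unknown"


def as_stream(events: Iterable[Event]) -> Iterator[Event]:
    """Yield the events as a Stream, ordered by time of occurrence."""
    yield from sorted(events, key=lambda e: e.occurred_at)


# Illustrative input: a user action and a log line, given out of order.
events = [
    UserAction(datetime(2018, 6, 9, 10, 5, tzinfo=timezone.utc), "query", user="alice"),
    Event(datetime(2018, 6, 9, 10, 0, tzinfo=timezone.utc), "log", "disk usage 81%"),
]
ordered = list(as_stream(events))
```

Sorting an in-memory collection is only a stand-in for the ordering guarantee; an actual Stream would typically be unbounded and ordered by the transport that carries it.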

Insight

Insights are data pieces that typically represent the added value of the business data lake. They are produced by successive Distillation Steps executing Analytics.

Real-Time insights are particular insights produced with very low latency by real-time analyses that consume Events or Streams of data, augmented by data stored in the BDL.

Executing complex, heavy analyses in real time clearly raises issues, or at least requires trade-offs. The BDL favors two analysis streams:

batch analyses can consume very large datasets but can take a long time (hours); and
real-time analyses can deliver insights very quickly (sub-second latency) but cannot leverage all kinds of Analytics.

The use of real-time analysis places additional availability requirements on the BDL, since it must be operational whenever Events or Streams need to be processed. Batch-based processing can catch up on its processing after short periods of downtime. This is an important consideration for an environment that acts both as a discovery “playground” and as a production environment for business-critical functions.
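
As a rough illustration of a Real-Time insight, the sketch below consumes a Stream of Events one at a time and augments each Event with reference data assumed to be already stored in the BDL. The threshold table, the dictionary-shaped events and the function name are all assumptions made for this example, not part of the framework.

```python
from typing import Iterable, Iterator, Mapping

# Hypothetical reference data already stored in the BDL,
# for example produced earlier by a batch analysis.
STORED_THRESHOLDS = {"sensor-a": 80.0, "sensor-b": 60.0}


def realtime_insights(
    stream: Iterable[dict], thresholds: Mapping[str, float]
) -> Iterator[dict]:
    """Emit one insight per event whose value exceeds its stored threshold."""
    for event in stream:
        limit = thresholds.get(event["source"])
        # Events from unknown sources cannot be augmented and are skipped.
        if limit is not None and event["value"] > limit:
            yield {
                "source": event["source"],
                "at": event["at"],
                "excess": event["value"] - limit,
            }
```

Each event is handled with a single dictionary lookup, which is the kind of lightweight processing that fits a low-latency path; heavier Analytics would be left to batch analyses.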

Ingestion-related concepts

Batch Ingestion

Batch Ingestion is the most common way of acquiring data within the BDL, that is, of creating new Data sets. It consists of acquiring a large number of data items that already existed elsewhere in the IT landscape. Loading 30 years of customer orders in a few hours is an example of Batch Ingestion. The BDL is designed to execute multiple sustained batch ingestions at the same time.

Real-Time Ingestion

Real-Time Ingestion is dedicated to processing Streams of Events, which are structured and generally small data. The BDL is designed to execute multiple sustained Real-Time ingestions at high velocity (thousands of values/events per second).

Micro-Batch Ingestion

Micro-Batch Ingestion implements a “bridge” between Real-Time and Batch analyses. It turns Streams of Events into Data sets that can be analyzed as historical data over very long timeframes.
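
One common way to turn a Stream into Data sets is to group Events into fixed time windows. The Python sketch below assumes that approach; the window size, the tuple-shaped events and the function name micro_batches are illustrative assumptions rather than anything the framework specifies.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone
from typing import Dict, Iterable, List, Tuple

# Hypothetical event shape: (time of occurrence, value).
TimedValue = Tuple[datetime, str]


def micro_batches(
    stream: Iterable[TimedValue], window: timedelta
) -> Dict[datetime, List[TimedValue]]:
    """Group events into data sets keyed by the start of their time window."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    batches: Dict[datetime, List[TimedValue]] = defaultdict(list)
    for occurred_at, value in stream:
        # Floor the (timezone-aware) timestamp to the start of its window.
        index = (occurred_at - epoch) // window
        batches[epoch + index * window].append((occurred_at, value))
    return dict(batches)


# Illustrative input: three events spread over two one-minute windows.
t0 = datetime(2018, 6, 9, 10, 0, tzinfo=timezone.utc)
batches = micro_batches(
    [
        (t0 + timedelta(seconds=10), "a"),
        (t0 + timedelta(seconds=50), "b"),
        (t0 + timedelta(seconds=65), "c"),
    ],
    window=timedelta(minutes=1),
)
```

Each resulting list can then be persisted as a Data set and analyzed later by batch analyses alongside other historical data.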

Metadata generation

Metadata can be ingested in multiple ways, depending on the nature of the data and on whether the process is automated. The simplest way is to extract metadata automatically from the data, creating the metadata at the same time as the data is ingested. In some cases, metadata extraction consists of several processing steps, some of which are performed asynchronously to the ingestion of the data itself, following a metadata enrichment process.

Metadata enrichment can be considered as a Distillation Step.
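
The two stages described above, automatic extraction at ingestion time and later enrichment, might look like the following Python sketch. The function names, the dictionary layout and the choice of fields are assumptions for illustration; they mirror the metadata examples given earlier (source, target, date of ingestion, file size, tags).

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Iterable


def extract_metadata(path: Path, source: str, target: str) -> dict:
    """Build the metadata that can be derived while the file is ingested."""
    return {
        "source": source,
        "target": target,
        "ingested_at": datetime.now(timezone.utc).isoformat(),
        "file_size": path.stat().st_size,  # size in bytes
        "tags": [],
    }


def enrich_metadata(metadata: dict, tags: Iterable[str]) -> dict:
    """Later enrichment step: return a copy with additional tags attached."""
    merged = sorted(set(metadata["tags"]) | set(tags))
    return {**metadata, "tags": merged}
```

Returning a new dictionary from the enrichment step, rather than modifying the original, keeps the metadata captured at ingestion time intact, which fits the idea of treating enrichment as a separate Distillation Step.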

Processing-related concepts