ACDC (ITEA2 #09008) Deliverable D2.1 Abstract: Tools Specifications and Selection (WP2 Cloud Infrastructure)




Executive summary


This document is an abstract of deliverable D2.1 "Tools Specifications and Selection", associated with Work Package 2 (Cloud Infrastructure) of the ACDC project, labelled ITEA2 #09008. This abstract contains an in-depth state of the art.

As stated in the FPP:



  • The objective of WP2 is to research and develop a new content delivery infrastructure based on new processing and storage technologies, also known as "cloud computing".




  • The objective of this WP2 deliverable is to specify a selection of tools enabling a design independent of the hardware platform (compilers, …) and allowing the application to be mapped onto multi-CPU / multi-core / multi-GPU architectures.


  1. State of the art

1.2 Video Processing

1.2.1 Supported Formats


The ACDC Cloud infrastructure will provide a Video Transcoder / Streamer software prototype that performs on-the-fly and off-line video adaptation and conversion in a highly efficient way.
It will support a wide range of input video streams and deliver multiple output streams over a variety of network protocols.


  • Transport Protocols

    • UDP

      • MPEG2-TS,…

    • TCP

      • MPEG2-TS,…

    • RTMP (Adobe specific process)

    • HTTP/HTTPS Progressive download

    • HTTP/HTTPS Streaming

      • Apple HLS,

      • Microsoft Smooth streaming,

      • Adobe HDS,…

    • File Transfer

      • FTP,

      • SCP,

      • SFTP,…

  • Video format

    • Container

All major containers will be supported as input and output formats.

      • avi

      • mp4

      • flv

      • wmv, asf

      • mov

      • mpg

      • ogg/ogv

      • 3gp

    • Video codec

All major formats will be supported as input formats; there may be restrictions on output formats due to encoding issues (for example, VP6 and VP7 might not be useful as output formats).

      • H263

      • mpeg4

      • h264 (AVC with its amendments: SVC and MVC, …)

      • mpeg1video, mpeg2video

      • mjpeg

      • Microsoft specific codecs (wmv1, wmv2, wmv3/VC1)

      • Flash specific codecs (Sorenson, vp6, vp7, vp8)

    • Audio codec

All major formats will be supported as input formats; there may be restrictions on output formats due to encoding issues (for example, Nellymoser might not be useful as an output format).

      • aac (mp4), mp3, mp2, mp1

      • amr

      • pcm

      • Flash specific codecs (nellymoser,…)

      • Microsoft specific codecs (wma1, wma2,…)

The major innovations are:



  • on-the-fly analysis of the input video content, allowing media adaptation and conversion in real time with minimal latency

  • distribution of the video conversion processing over multiple processing nodes

  • management of a distributed cache system that avoids converting the same content multiple times; this cache will be integrated with the distributed Cloud storage (a minimal cache-key sketch follows below)
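
As an illustration of the cache idea, the following minimal Python sketch derives a deterministic cache key from the source content and the requested output profile, so that a second request for the same conversion can be served from the distributed cache instead of being transcoded again. The key layout and field names are assumptions made for this example, not part of the deliverable.

```python
import hashlib

def conversion_cache_key(source_digest: str, container: str, video_codec: str,
                         audio_codec: str, width: int, height: int,
                         bitrate_kbps: int) -> str:
    """Build a deterministic key identifying one (content, output profile) pair.

    source_digest is assumed to be a hash of the original content (for example
    the SHA-256 of the input file); the other fields describe the target
    profile. Two identical requests map to the same key, so the converted
    output can be looked up in the distributed cache instead of re-encoded.
    """
    profile = f"{container}/{video_codec}/{audio_codec}/{width}x{height}/{bitrate_kbps}k"
    return hashlib.sha256(f"{source_digest}|{profile}".encode()).hexdigest()

# The same content requested twice with the same profile yields the same key.
key1 = conversion_cache_key("ab12cd", "mp4", "h264", "aac", 1280, 720, 2500)
key2 = conversion_cache_key("ab12cd", "mp4", "h264", "aac", 1280, 720, 2500)
assert key1 == key2
```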



1.2.2 Emerging Standards


The ACDC Cloud Infrastructure will provide additional transcoding functionalities that match the forthcoming specifications produced by emerging standardization activities. The following is a brief summary of the major standardization activities that will be addressed in this project.

Among the recent activities related to 3D video, MPEG has been active mainly on two subjects: Multiview Frame Compatible coding (MFC) and 3D Video Coding (3DVC). We detail the recent activities in each, knowing that these will lead to 3D video coding standards that the partners will want to support in the future.



1.2.2.1 Multiview Frame Compatible





    1. Description of the technology

Multiview Frame Compatible, or MFC, is a method for delivering stereo 3D video content using more conventional coding tools. It is a stereo coding technique that packs the two views either in a side-by-side or in an over-under (top-bottom) configuration, as shown in Figure 1.








Figure 1: MFC Configurations, Side-by-Side (left) and Top-Bottom (right)

The two views are subsampled before packing, which results in a loss of spatial resolution. Hence, the base layer formed in this way is enhanced by sending one or more enhancement layers, in order to get back to the full resolution per view at the decoder.
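
To make the packing step concrete, the following Python/NumPy sketch builds a side-by-side (and top-bottom) frame-compatible picture by decimating each view by a factor of two and concatenating the halves. It is an illustration only; in a real encoder an anti-aliasing low-pass filter would be applied before decimation, as required by the draft CfP.

```python
import numpy as np

def pack_side_by_side(left: np.ndarray, right: np.ndarray) -> np.ndarray:
    """Pack two full-resolution views into one side-by-side frame.

    left, right: arrays of shape (H, W) or (H, W, C). Each view is decimated
    horizontally by 2 (every other column) and the halves are concatenated,
    so the packed frame has the same dimensions as a single original view.
    A real system would low-pass filter before decimation to avoid aliasing.
    """
    return np.concatenate([left[:, ::2], right[:, ::2]], axis=1)

def pack_top_bottom(left: np.ndarray, right: np.ndarray) -> np.ndarray:
    """Same idea with vertical decimation and vertical stacking."""
    return np.concatenate([left[::2, :], right[::2, :]], axis=0)

# Example: two 1080p views packed into one 1080p frame-compatible picture.
L = np.zeros((1080, 1920), dtype=np.uint8)
R = np.zeros((1080, 1920), dtype=np.uint8)
assert pack_side_by_side(L, R).shape == (1080, 1920)
assert pack_top_bottom(L, R).shape == (1080, 1920)
```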




    2. Draft Call for Proposals

Following the 97th MPEG meeting in Torino in July 2011, a draft Call for Proposals (CfP) was issued, inviting proponents to submit an MFC-based coding scheme taking as input the two full original left and right views, which should be packed into a frame-compatible signal after some pre-filtering to avoid aliasing. The proposal should output the reconstructed frame-compatible base layer, the reconstructed full-resolution left and right views, and of course the generated bitstreams.



Figure 2: I/O for CfP proposals
A set of anchors has been defined to evaluate the proposals against [1]:


Anchor ID | Name      | Description
1         | MVC       | MVC applied directly to the left and right views. This serves as an upper anchor.
2         | AVC-UP    | Up-sampled frame-compatible base layer encoded with AVC. No enhancement layers.
3         | SVC       | SVC with a frame-compatible base layer and spatial scalability.
4         | FC-MVC    | MVC with a frame-compatible base layer, where the enhancement layer contains the complementary samples.
5         | Simulcast | Simulcast of AVC frame-compatible and 2D-compatible MVC for the left/right views. This serves as a lower anchor.





Figure 3: MFC CfP anchors. Top-left: SVC, top-right: FC-MVC, bottom-left: Simulcast
In addition, the following conditions were established:


  • The bit rate required to code the enhancement layer should not be greater than 25% of the one used to code the base layer.

  • An open GOP structure shall be used with 3 intermediate pictures: IbBbP…

  • IDR (random access points) shall be inserted every 4 seconds, and the intra period shall be equal to 2 seconds (a short sketch mapping these settings to frame types is given after this list).

  • Fast motion estimation is allowed.

  • Multiple pass encoding and weighted prediction are allowed.

  • Fixed QP (per slice type with layer asymmetry) shall be used for the base layer (no rate control). [3]
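
To make these encoding conditions concrete, the short Python sketch below maps a frame index to a coding type under one possible reading of the settings above (open GOP with 3 intermediate pictures, intra period of 2 s, IDR period of 4 s). The frame rate and the exact placement of reference pictures are assumptions made for illustration, not part of the CfP.

```python
def frame_coding_type(frame_idx: int, fps: float = 30.0) -> str:
    """Coding type of a frame under the anchor settings assumed here:
    an open GOP with 3 intermediate pictures between references (IbBbP...),
    an intra picture every 2 s and an IDR random-access point every 4 s.
    This is an illustrative interpretation of the CfP conditions, not code
    taken from any reference software."""
    idr_period = int(round(4 * fps))    # IDR every 4 seconds
    intra_period = int(round(2 * fps))  # intra picture every 2 seconds
    if frame_idx % idr_period == 0:
        return "IDR"
    if frame_idx % intra_period == 0:
        return "I"
    if frame_idx % 4 == 0:
        return "P"                      # reference picture closing the mini-GOP
    return "B"                          # one of the 3 intermediate (b/B) pictures
```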

The evaluation process will be based on both objective measures and subjective viewing. Objective quality evaluation of the proposed solutions will be based on BD measures against the anchor encodings. The base layer should be of broadcast quality and should not contain appreciable aliasing artifacts. The PSNR of the upscaled base layer and of the full-resolution encodings shall be measured against the full-resolution input source. [3]
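
As a reminder of how the objective part of this evaluation is computed, the short Python sketch below implements the standard PSNR definition used to compare, for instance, the upscaled base layer against the full-resolution source. It is illustrative code, not taken from the CfP or its reference software.

```python
import numpy as np

def psnr(reference: np.ndarray, reconstructed: np.ndarray, peak: float = 255.0) -> float:
    """Peak Signal-to-Noise Ratio in dB between a reference frame and a
    reconstruction of the same size (e.g. the upscaled base layer measured
    against the full-resolution source)."""
    ref = reference.astype(np.float64)
    rec = reconstructed.astype(np.float64)
    mse = np.mean((ref - rec) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / mse)
```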


A final Call for Proposals should be issued at the end of the 98th MPEG meeting in Geneva in November 2011, and the subsequent evaluation of proposals should be carried out at the 99th MPEG meeting in February 2012.


    3. Discussions




  • Is the 25% enhancement layer (EL) rate constraint necessary? And why is it not required for the Simulcast anchors?

  • Are the base layer bit rates appropriate for the content?

  • What is the optimal temporal structure?

1.2.2.2 3D Video Coding

After the 96th meeting in Geneva in March 2011, MPEG issued a Call for Proposals for 3D Video Coding technologies. The aim is to find a solution, with an associated data format and compression technology, that enables the rendering of high-quality views from an arbitrary number of dense views. The following bullet points go over the general contents of the CfP [4]:




  • Proponents shall use texture sequences defined in the CfP with their associated depth maps. No depth maps from other depth estimation algorithms shall be used.




  • Proposals will be divided into two categories: AVC-compatible (forward compatibility with AVC) and HEVC-compatible (forward compatibility with HEVC, or unconstrained). Each proponent may choose to contribute to either of these categories.




  • Two test scenarios will be considered: the two-view test scenario, where the input consists of 2 views (left-right), and the three-view scenario, where the input consists of 3 views (left-center-right).




  • Specific rate points (not to be exceeded) are defined for each sequence (4 per sequence).




  • Rendering may be accomplished using a reference view synthesis tool, or a proposed synthesis algorithm that is more appropriate for the proposed data format.




  • For the two-view test scenario, a center view will be synthesized. For the three-view test scenario, all 1/16 positions between the left and right views will be synthesized.





Figure 4: The two-view and three-view test scenarios and the corresponding views to synthesize.


  • Anchors include MVC coding of the texture and depth data independently, and HEVC coding of each texture view and each depth view independently. Anchors follow the same bit rate constraints imposed on proposals.




  • Objective results consist of BD-rate and BD-PSNR measurements compared to the anchors (a minimal BD-rate sketch is given after this list).




  • Subjective viewing is essential, as no course of action will be taken before the subjective assessment of the quality of the reconstructed and synthesized views. Stereoscopic and autostereoscopic viewing will be considered.




  • For the stereoscopic viewing, a stereo pair will be selected. In the 2-view scenario, it corresponds to one of the original views and the synthesized center view. In the 3-view scenario, two stereo pairs with equal baseline will be considered: the first is specified in the CfP and is centered around the middle view, and the second is randomly selected. Unlike the 2-view scenario, in the 3-view scenario a stereo pair can be entirely composed of synthesized views.




  • For the autostereoscopic viewing, 28 views will be selected from the range of available views of the 3-view scenario. They are selected to be comfortable to view, and they should provide a sufficient depth range, i.e. a sufficiently wide baseline distance between the 1st and the 28th view.




  • Finally, submissions should contain the bitstream files that satisfy the given rate constraints, the reconstructed input views and the synthesized views, the software used to synthesize the views, an Excel sheet containing the BD measurements, and information about complexity (encoding/decoding time, expected memory usage, specific dependencies, …).
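
As referenced in the bullet on objective results, the following Python sketch implements the usual Bjøntegaard delta-rate (BD-rate) computation: a cubic fit of log bit rate versus PSNR for the test and anchor curves, integrated over their common PSNR range. It is a generic illustration of the metric, not the exact tool mandated by the CfP.

```python
import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Bjoentegaard delta rate (in %) of a test codec against an anchor.

    rate_*: bit rates at the (typically 4) rate points; psnr_*: corresponding
    PSNR values. Log-rate is fitted as a cubic function of PSNR for each
    curve and integrated over the common PSNR interval; the result is the
    average relative bit-rate difference (negative means bit-rate savings).
    """
    lr_a = np.log(np.asarray(rate_anchor, dtype=float))
    lr_t = np.log(np.asarray(rate_test, dtype=float))
    p_a = np.asarray(psnr_anchor, dtype=float)
    p_t = np.asarray(psnr_test, dtype=float)

    # Fit log-rate as a cubic function of PSNR for both curves.
    poly_a = np.polyfit(p_a, lr_a, 3)
    poly_t = np.polyfit(p_t, lr_t, 3)

    # Integrate both fits over the overlapping PSNR range.
    low, high = max(p_a.min(), p_t.min()), min(p_a.max(), p_t.max())
    int_a = np.polyval(np.polyint(poly_a), high) - np.polyval(np.polyint(poly_a), low)
    int_t = np.polyval(np.polyint(poly_t), high) - np.polyval(np.polyint(poly_t), low)

    # Average log-rate difference, converted to a percentage.
    avg_diff = (int_t - int_a) / (high - low)
    return (np.exp(avg_diff) - 1.0) * 100.0
```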



Conclusion
In this section, we discussed the main standardization activities undertaken by MPEG towards the development of new 3D services. The 98th MPEG meeting in Geneva in November 2011 should bring interesting updates, since it is an important milestone in 3D video technology development, at which proposals for 3DVC are to be evaluated and a course plan and a final CfP for MFC are to be issued.

1.2.3 Video Encoding / Transcoding Algorithms Optimization


It is well known that the most time- and resource-consuming part of video encoding algorithms is the motion estimation step. For multi-view sequences, this is compounded by the additional burden of disparity estimation, which uses similar tools and therefore suffers from the same drawbacks. The development of efficient video encoding algorithms in the context of a multimedia-aware cloud infrastructure represents a key issue of this project, in order both to take advantage of the huge and varying processing capabilities of a cloud to reduce computation times, and to improve the accuracy of the encoding processes by using more sophisticated algorithms.

In state-of-the-art codecs such as H.264/AVC, both motion and disparity estimation (the latter covered by the MVC extension) are accomplished through a recursive estimation that scans the macroblocks in raster order and processes them predictively, which makes them highly inefficient for parallel implementations. For this reason, following current research trends, several original motion and disparity vector field estimation algorithms will be proposed, which have two main additional advantages:

- they provide dense fields, meaning one vector per pixel (instead of one vector per block), which gives increased flexibility for representation and transmission and thus improves the overall quality of the encoding,

- the estimation algorithms are amenable to parallelization at multi-core and GPU level, thanks also to the use of recent convex optimization methods (in a set-theoretic framework).

As a further advantage, these algorithms will naturally fit the framework that will soon be introduced by the future HEVC (High Efficiency Video Coding) video coding standard.

1.2.3.1 Motion Estimation Algorithm




Figure 5: Motion Estimation

Motion estimation is the foundation of almost all video coding techniques currently available. Its main idea consists in removing redundancy in the video data by describing a sequence of video frames using only a key frame (namely the intra frame) and some more or less compact information about the motion of the different parts of the scene.

In the great majority of past and current video codecs (from MPEG-1 up to MPEG-4 and H.264/AVC), the motion estimation step is, often by far, the most time-consuming, calling for the design of an optimized algorithm for highly data-parallel computational devices and platforms.


Leveraging cloud computing to provide multimedia applications and services over the Internet is a major current challenge. Moreover, as stated in [6], the deployment of a multimedia-aware cloud infrastructure (MEC, Media-Edge Cloud) is necessary to provide disruptive QoS provisioning. In this project, we first intend to study and develop parallel implementations of a state-of-the-art motion estimation algorithm, with the constraint of developing a portable and scalable solution able to adapt to both the heterogeneity and the evolution of a cloud. In particular, we focus on data-parallel implementations of the full search block matching (FSBM) algorithm, since this approach is naturally suited to the problem, and we provide a solution that fits both a general-purpose GPU and a multi-core CPU. A few recent works have already dealt with this problem, with an exclusive focus on the GPU. In [7], for example, an FSBM algorithm for H.264/AVC motion estimation has been introduced that fits into the Compute Unified Device Architecture (CUDA) [8], while in [9] the scalability of the FSBM algorithm with respect to the number of cores of a GPU is discussed. The GPU-oriented parallelization of a motion estimation algorithm based on diamond search has also been proposed [Schwalb 2009]. Several works, such as [10, 11], discuss the GPU parallelization of the RD-optimized motion estimation in H.264/AVC video coding. As an element of novelty, we intend to develop implementations that rely on the framework described in [12], as it provides a common API for the execution of programs on systems equipped with different types of computational devices such as multi-core CPUs, GPUs, or other accelerators.
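
To illustrate why FSBM lends itself to data-parallel execution, the following NumPy sketch computes, for one macroblock, the SAD over every candidate displacement in a search window. It is a sequential reference version written for this abstract, not the optimized GPU / multi-core implementation targeted by the project; since each block and each candidate displacement is independent, the loops map directly onto GPU threads or multi-core work items.

```python
import numpy as np

def fsbm_motion_vector(current: np.ndarray, reference: np.ndarray,
                       top: int, left: int, block: int = 16,
                       search: int = 16):
    """Full-search block matching for one macroblock of the current frame.

    Returns the (dy, dx) displacement within a +/- `search` window that
    minimizes the sum of absolute differences (SAD) against the reference
    frame. Every candidate displacement (and every block) is evaluated
    independently, which is what makes FSBM naturally data-parallel: one
    GPU thread or work item can handle one candidate or one block.
    """
    h, w = current.shape
    cur = current[top:top + block, left:left + block].astype(np.int32)
    best_sad, best_mv = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + block > h or x + block > w:
                continue  # candidate block falls outside the reference frame
            cand = reference[y:y + block, x:x + block].astype(np.int32)
            sad = int(np.abs(cur - cand).sum())
            if best_sad is None or sad < best_sad:
                best_sad, best_mv = sad, (dy, dx)
    return best_mv
```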

1.2.3.2 Depth/Disparity Estimation Algorithm





Figure 6: Depth / Disparity Estimation
The recovery of the depth information of a scene from stereo/multi-view images is of fundamental importance in many applications in the video processing domain, such as autonomous navigation, 3-D reconstruction and 3-D television. It represents a basic step for the efficient encoding/compression of multi-view/3D videos. For the elementary stereo case, given two images from different viewpoints, a stereo matching method attempts to find corresponding pixels in both images. The disparity map computed from this matching process can then be used to recover the 3-D positions of the scene elements for known camera configurations. Being computationally demanding, disparity estimation can largely take advantage of a suitable algorithm designed for high-performance computing infrastructures.
The disparity estimation problem has been extensively studied in computer vision [13]. Traditionally, disparity estimation algorithms are classified into two categories: local methods and global ones. Algorithms in the first category, where the disparity at each pixel depends only on intensity values within a local window, perform well in highly textured regions. However, they often produce noisy disparities in textureless regions and fail at occluded areas. These problems can be reduced by using global methods, which aim at finding the disparity field that minimizes a global energy function over the entire image. For this purpose, several energy minimization algorithms have been proposed. The most common approaches are dynamic programming [14], graph cuts [15] and variational methods [16]. While dynamic programming and graph cut methods operate in a discrete manner, variational techniques work in a continuous space; they therefore have the advantage of producing a disparity field with ideally infinite precision. Among these global approaches, it has been shown that variational disparity estimation methods are among the most competitive techniques, owing to their preservation of depth discontinuities. Focusing mainly on dense disparity estimation, the problem will be formulated as a convex optimization problem within a global variational approach. As a major contribution to this project, IT intends to develop an optimized parallel implementation of the resulting disparity estimation algorithm. This will be achieved, first of all, by exploiting the intrinsic parallel nature of the convex optimization framework, which allows us to efficiently solve the estimation problem over feasibility sets determined by multiple parallel constraints that model prior information. This naturally fits the resulting algorithm into the task-parallel programming paradigm, which is particularly suitable for multi-core devices. Moreover, leveraging the GPU will allow an optimized data-parallel implementation of all the low-level processing involved.
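
As a point of comparison with the variational approach pursued in the project, the following NumPy/SciPy sketch implements a basic local method: winner-takes-all SAD matching over a small window for a rectified stereo pair. It is an illustration written for this abstract (the window size, disparity range and border handling are arbitrary choices), showing both the per-pixel independence that makes the problem parallelizable and the weaknesses (noisy disparities in textureless regions, failures at occlusions) that motivate the global variational formulation described above.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def local_disparity(left: np.ndarray, right: np.ndarray,
                    max_disp: int = 64, half_win: int = 3) -> np.ndarray:
    """Dense disparity map for a rectified stereo pair, winner-takes-all SAD.

    For every candidate disparity d the right image is shifted by d pixels,
    the absolute difference with the left image is aggregated over a
    (2*half_win+1)^2 window, and each pixel keeps the disparity with the
    lowest cost. Each candidate and each pixel are independent, so the
    method parallelizes trivially, but it produces noisy disparities in
    textureless regions and fails near occlusions.
    """
    h, w = left.shape
    left_f = left.astype(np.float32)
    right_f = right.astype(np.float32)
    win = 2 * half_win + 1
    big = np.float32(1e6)  # cost where no correspondence exists (left border)
    cost = np.empty((max_disp, h, w), dtype=np.float32)
    for d in range(max_disp):
        diff = np.full((h, w), big, dtype=np.float32)
        diff[:, d:] = np.abs(left_f[:, d:] - right_f[:, :w - d])
        cost[d] = uniform_filter(diff, size=win)  # window aggregation (box filter)
    return cost.argmin(axis=0).astype(np.int32)
```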


