INTERNATIONAL ORGANISATION FOR STANDARDISATION
ORGANISATION INTERNATIONALE DE NORMALISATION



Task group activities

  1. Joint Meetings

    1. With 3DG on ARAF and 3D Audio Tue 14:00-16:00


At the 104th MPEG meeting the 3DG Chair summarized the needs of an Audio Augmented Reality system as follows:

  • Analyze scene and capture appropriate metadata (e.g. acoustic reflection properties) and attach it to a scene

  • Attach directional metadata to audio object

  • HRTF information for the user

  • The audio augmented reality system should be of sufficiently low complexity (or we have sufficiently high computational capability) that there is very low latency between the real audio and the augmented reality audio.

The 3DG Chair clarified that less than 50 ms is a realistic requirement for 3D audio latency.
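To make the 50 ms figure concrete, a rough latency budget for a block-based renderer can be sketched; all component figures below (block sizes, filter length) are illustrative assumptions, not values from the meeting.

```python
FS = 48_000  # sampling rate in Hz

def block_latency_ms(n_samples: int, fs: int = FS) -> float:
    """Delay contributed by buffering n_samples at rate fs, in ms."""
    return 1000.0 * n_samples / fs

# One capture block + one render block + an assumed 256-tap HRTF filter.
budget_ms = (
    block_latency_ms(1024)    # input buffering (assumed block size)
    + block_latency_ms(1024)  # output buffering (assumed block size)
    + block_latency_ms(256)   # assumed FIR (HRTF) length
)

print(f"total = {budget_ms:.1f} ms")  # 48.0 ms, just inside the 50 ms target
```

The sketch illustrates why the complexity requirement matters: even modest buffering consumes most of the 50 ms budget before any processing time is counted.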

The 3DG Chair gave an example of Augmented Reality: a person has a mobile device with a transparent visual display. The person sees and hears the real world around him, and can look through the transparent visual display (i.e. “see-through” mode) to see the augmented reality (e.g. with an avatar rendered in the scene).

The avatar has a pre-coded audio stream (e.g. MPEG-4 AAC bitstream) that it can play out, and ARAF knows its spatial location and orientation (i.e. which way it is facing). The required audio metadata is:


  • Radiation properties of avatar (i.e. radiation power as a function of angle and distance)

The avatar audio could be presented via headphones or ear buds. The Audio Chair noted that ARAF may want to incorporate a head tracker so that the avatar can remain fixed within the physical world coordinate system. The 3DG Chair noted that if “see through” mode is used then the orientation of the mobile device screen would be sufficient. In that case, audio could be presented via the mobile device speakers.
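The head-tracking idea amounts to re-expressing the avatar's world-fixed direction in head-relative coordinates on each update. A minimal sketch with hypothetical names (the ARAF interface defines none of this):

```python
def relative_azimuth_deg(avatar_az_world: float, head_yaw: float) -> float:
    """Azimuth of a world-fixed avatar relative to the listener's head.

    avatar_az_world: avatar direction in world coordinates (degrees).
    head_yaw: head orientation from the tracker (degrees, same convention).
    Both names are illustrative, not part of any ARAF specification.
    """
    # Wrap the difference into (-180, 180] so the renderer gets a
    # head-relative direction.
    return (avatar_az_world - head_yaw + 180.0) % 360.0 - 180.0

# If the avatar sits at 30 degrees in the world and the listener turns the
# head to 30 degrees, the avatar must now be rendered straight ahead (0).
print(relative_azimuth_deg(30.0, 30.0))  # 0.0
```

In “see-through” mode the same computation applies, but with the device orientation in place of the head-tracker yaw.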

The Audio Chair noted that the avatar could function as a “virtual loudspeaker” in MPEG-H 3D Audio such that the 3D Audio renderer could be used for presentation via headphones. However, 3D Audio is not able to add environmental effects (e.g. room reverberation) that are separate from the avatar sound. Furthermore, 3D Audio cannot add such environmental effects based on listener location or orientation (e.g. reverberation changes when the listener moves closer to a reflecting surface).

Yeshwant Muthusamy, Samsung, noted that the Khronos OpenSL ES 3D Audio API does support e.g. reverberation based on user location and might offer a solution. However, the Audio Chair noted that ARAF does not know the acoustic properties (i.e. surface reflective properties) of the real-world environment (unless it can deduce them from the visual scene), and thus audio effects based on the real-world environment are not possible, whether using MPEG-H 3D Audio or OpenSL ES. 3DG experts will consider this problem and report back at a future joint meeting.

      1. With Systems on MP4FF Enhanced Audio Support Wed 14:00-16:00


The following documents were reviewed:

  • m30100, Addition of Sample aspect ratio and further audio code-points

  • m30101, Editor’s draft of 14496-12 PDAM 3 – Enhanced audio and other improvements

It was decided to remove audio-related technology from these documents and put it in a working draft that can be better managed by Audio and can progress on an independent timeline.
      1. With Requirements on DRC Fri 11:00 – 11:30


Audio experts are well aware that there is DRC capability integrated into the MPEG-4 Advanced Audio Coding profile coder, as in the AAC MDCT-based multi-band DRC. Audio experts do not want to needlessly confuse the marketplace, but are nevertheless interested in considering new technology whose performance serves the diversity of the marketplace (e.g. listening in a living-room home theatre or on a mobile phone on the street).

While many actions concerning Dynamic Range Control were discussed during the week, the Audio subgroup decided to issue a Call for Proposals on Program Level and Dynamic Range Control. Submissions are asked to compare the performance of proposed technology to that of MPEG-4 AAC-based DRC technology. The Call will be issued at this meeting, with submissions due at the next meeting.
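As a point of reference for what such proposals address, the basic static compressor curve underlying any DRC scheme can be sketched as follows; the threshold and ratio values are illustrative only and do not describe the AAC DRC or any submission.

```python
def drc_gain_db(level_db: float, threshold_db: float = -20.0,
                ratio: float = 3.0) -> float:
    """Gain (dB) applied by a simple static compressor curve.

    Above the threshold the output level rises 1 dB for every `ratio` dB
    of input; below the threshold the signal passes unchanged. All
    parameter values here are illustrative assumptions.
    """
    if level_db <= threshold_db:
        return 0.0
    excess = level_db - threshold_db
    # Attenuate so that only excess/ratio dB of the excess remains.
    return -(excess - excess / ratio)

print(drc_gain_db(-30.0))  # 0.0  (below threshold, no change)
print(drc_gain_db(-8.0))   # -8.0 (12 dB over threshold at 3:1 -> 8 dB cut)
```

A real broadcast DRC additionally needs attack/release smoothing and, as discussed elsewhere in this report, per-band operation and transmitted metadata.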



The text of the Call was reviewed in this joint session and approved by Requirements.
    1. Task Group discussions

      1. 3D Audio


The proponent binaurization complexity figures were reviewed and agreed to be as follows:

  Proponent   Cmod
  ETRI        90.93
  IDMT        68.00
  IIS         84.43
  SONY        79.04
  ORL         88.80
  QUAL        87.85
  TECH        88.80


Calculation of Figure of Merit

Using the binauralization complexity figures shown above, the Figure of Merit was calculated for CfP responses using the CO and HOA data sets. Audio experts checked (and where necessary revised and corrected) the FoM, and the results are shown here:

Channel/Object (CO) Signal Set

  Sys    High     Low      Mean
  IIS    89.607   87.279   88.443
  ETRI   84.080   81.138   82.609
  SONY   80.013   77.283   78.648
  IDMT   83.273   80.617   81.945



Higher Order Ambisonics (HOA) signal set

  Sys    High     Low      Mean
  QUAL   87.924   84.901   86.413
  TECH   89.904   87.425   88.665
  ORL    89.707   87.196   88.451
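The Mean columns in both tables can be cross-checked directly as (High + Low) / 2; the listed values agree to within the three-decimal rounding used above:

```python
# (High, Low, Mean-as-listed) for each system, taken from the tables above.
fom = {
    # Channel/Object (CO) set
    "IIS":  (89.607, 87.279, 88.443),
    "ETRI": (84.080, 81.138, 82.609),
    "SONY": (80.013, 77.283, 78.648),
    "IDMT": (83.273, 80.617, 81.945),
    # Higher Order Ambisonics (HOA) set
    "QUAL": (87.924, 84.901, 86.413),
    "TECH": (89.904, 87.425, 88.665),
    "ORL":  (89.707, 87.196, 88.451),
}

for sys, (hi, lo, listed) in fom.items():
    # Each listed mean matches (hi + lo) / 2 to within rounding.
    assert abs((hi + lo) / 2 - listed) < 0.001, sys

print("all means consistent")
```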



Discussion

The FoM data suggests a “best” system for each signal set. For these “best” systems, Audio experts looked at overall system performance and also at performance for individual items in individual tests, and found no evidence of inconsistent performance (i.e. exceptionally poor performance for one or more individual items).



For the Channel/Object (CO) signal set, it was the consensus of the Audio subgroup to select Sys1, which is the FhG-IIS submission, as the CO RM0.

For the Higher Order Ambisonics (HOA) signal set, it was the consensus of the Audio subgroup to select Sys2, which is the Technicolor submission, as the HOA RM0. The Chair noted that the Technicolor submission is joint work between Technicolor and Orange. It was agreed to define a first Core Experiment, with a workplan, at this meeting to investigate merging HOA Sys1 technology, submitted by Qualcomm, into the HOA RM0.

As specified in the Call for Proposals, proponents are expected to submit RM0, as a decoder technical specification and decoder source code implementation, prior to the beginning of the next MPEG meeting (see the Call for details).



3D Audio Requirements

Akio Ando, NHK, presented



m29614

Proposed Requirements for the 22.2 channel sound broadcasting system

Akio Ando, Takehiro Sugimoto, Yasushige Nakayama, Kaoru Watanabe

The contribution responds to a WG11 request at the 104th MPEG meeting. In May 2013 the Next Generation TV Forum (NexTV-F) was established to promote 8K-resolution television broadcast. It has committed to 8K broadcasting in 2016.

The requirements from NHK are: (4K broadcasting?)

Service:


  • 2, 5.1 and 22.2 channels

  • Ancillary data shall be supported

  • DRC and Program Level shall be possible using ancillary data

Performance:

  • Audio quality should be transparent for typical audio programs

  • Three-dimensional immersive sound should be significantly better compared to 5.1

  • Bandwidth: 16-bit PCM, 48 kHz sampling rate, 20–20,000 Hz audio bandwidth

  • Delay: not more than delay of video decoder

Functional:

  • The delivered 22.2 channel signal shall be able to be down-mixed using transmitted downmix coefficients.

  • Bit rate: (22/2) × (144 kb/s) = 1.584 Mb/s is the upper-limit bit rate

  • Backward compatible to MPEG-4: MPEG-4 AAC shall be used as a core coder
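The upper-limit figure in the list above follows directly from the stated product:

```python
# (22/2) * 144 kb/s, as given in the NHK requirement above.
upper_limit_kbps = (22 / 2) * 144
print(upper_limit_kbps / 1000)  # 1.584 Mb/s
```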

Year 2016 MPEG-4 based model

  • MPEG-4 part:

    • The broadcasted bitstream must be decodable by MPEG-4 AAC decoder.

  • MPEG-H part (if appropriate MPEG-H outcome will be available):

    • Down-mix from 22.2 to 5.1 channels: default downmix coefficients; downloadable artistic coefficients.

    • Dynamic range control and Program Level control

SHV broadcast schedule (to be promoted by NexTV-F)

  • 2015 experimental broadcast (e.g. 1 hour a day)

  • 2016 examination broadcast of Rio Summer Olympics (full-service broadcast with consumer receivers)

The Chair stated that MPEG Audio could commit to delivering an MPEG-H 3D Audio Version 1 that can be used for both experimental and examination broadcast. However, he asked NHK experts if it will be easy to move to MPEG-H 3D Audio Version 2 for 2020 full-service broadcast. NHK experts were not able to answer the question at this time, but indicated that Japanese broadcasters might want to use the newest technology.

Thomas Sporer, FhG-IDMT, noted that ITU-R Recommendation BS.1548 defines “broadcast quality”.

The Chair encouraged Audio delegates to reflect on the issues raised and additionally to think of the content of these possible output documents for MPEG-H Version 1:

Public:


  • Requirements and functionality (there may be new ones, e.g. DRC)

  • Timeline to completion of Version 1

Private:

  • Architecture. This may encompass support for Channels, Objects and HOA, a functional description of bitstream (e.g. header, synchronization, break-in, “hooks” for Version 2).

Jan Plogsties, FhG-IIS, presented

m30327

Time line and Requirements for Next Generation Broadcast System based on current state of MPEG-H 3D Audio

Jan Plogsties, Max Neuendorf, Bernhard Grill

The contribution presented timeline requirements for ARIB and ATSC 3.0, both of which appear to be well served by an MPEG-H 3D Audio Version 1 International Standard in January 2015. A timeline for the European marketplace is less clear.

Candidate Requirements



  • Configuration – mono, stereo, 5.1 and 22.2, but flexible extensions are also possible.

  • Sampling Rate – certainly 48 kHz.

  • Quality and efficiency – broadcast quality at 1.2 Mb/s or 1.4 Mb/s.

  • Random access

  • Synchronization to other streams, e.g. MPEG-2 transport and Over-the-top IP delivery.

  • DRC and Program Level control

  • Downmix

In summary, FhG-IIS feels that it is important to adhere to an aggressive timeline for Version 1 to address the needs of the worldwide marketplace. In this respect, a clear set of requirements (of what is in and what is not) for Version 1 facilitates meeting this aggressive timeline.

Thomas Sporer, FhG-IDMT, presented



m30307

Requirements on an initial 22.2 channel based standard

Markus Mehnert, Robert Steffens, Thomas Sporer

The document is a joint contribution from FhG-IDMT and Iosono. The presenter noted that many requirements in this contribution have already been covered in the previous two contributions.

The proposed requirements are:



  • Highest audio quality

  • DRC and Program Loudness

  • Objects for dialog enhancement and visually impaired descriptions

  • Very efficient coding to achieve low bitrates for e.g. portable devices.

The contribution urges MPEG to make a careful technology selection, particularly for Version 1, so that it is not locked into old technology. In addition, there must be a clear plan (e.g. requirements, bitstream architecture) for moving from Version 1 to Version 2.

Andreas Silzle, FhG-IIS, presented



m30325

Downmix Comparison Test

Andreas Silzle, Hanne Stenzel, Achim Kuntz, Arne Borsum, Jan Plogsties

The presenter noted that downmix is a very important functionality in MPEG-H 3D Audio. The fact that one might downmix from 22 channels to 5 or even 2 channels gives ample opportunity for phase alignment of correlated signal peaks and hence large values that lead to signal clipping.

Downmix may cause



  • Clipping (as already mentioned)

  • Spatial masking (or un-masking)

  • Phasing artifacts

And downmix may result in

  • Too high an ambience level

To avoid these problems, downmix may need to be dynamic, rather than static (over the entire audio program). Dynamic downmix might be both signal-dependent and also frequency-variant.
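A minimal sketch of the signal-dependent part of this idea — a per-block protection gain applied on top of static downmix coefficients — assuming nothing about the actual FhG design (a real system would also smooth the gain over time and could make it frequency-variant):

```python
import numpy as np

def adaptive_downmix(channels: np.ndarray, coefs: np.ndarray,
                     limit: float = 1.0) -> np.ndarray:
    """Downmix one block with a gain that prevents clipping.

    channels: (n_ch, n_samples) input block.
    coefs:    (n_ch,) static downmix coefficients.
    """
    mix = coefs @ channels                       # static weighted sum
    peak = float(np.max(np.abs(mix)))
    gain = min(1.0, limit / peak) if peak > 0 else 1.0
    return gain * mix                            # attenuate only when needed

# Two fully correlated channels whose peaks align: the static sum alone
# would reach 1.4 and clip, so the block gain pulls it back to the limit.
x = np.ones((2, 4))
y = adaptive_downmix(x, np.array([0.7, 0.7]))
print(np.max(np.abs(y)))  # held at 1.0 instead of 1.4
```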

The presentation gave an example spectrogram of a static downmix as compared to an adaptive downmix, in which the static downmix demonstrated deep comb-filtering at 1 kHz intervals. Such comb-filtering is expected to be prevalent in 22.2 channel programs, because a mix of high, mid and front channels carrying a strong discrete source would have very small relative time delays, leading to comb-filtering.
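The notch spacing pins down the delay involved: summing a source with a copy of itself delayed by tau places notches every 1/tau Hz, so 1 kHz spacing corresponds to a 1 ms relative delay, roughly 34 cm of path-length difference in air.

```python
def notch_spacing_hz(delay_s: float) -> float:
    """Comb-filter notch spacing when mixing a signal with a delayed copy."""
    return 1.0 / delay_s

tau = 1e-3                    # 1 ms relative delay between mixed channels
print(notch_spacing_hz(tau))  # 1000.0 Hz, the spacing reported above
print(343.0 * tau)            # 0.343 m path difference at 343 m/s
```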

Gregory Pallone, Orange Labs, asked if the FhG active downmix used delay compensation (i.e. some aspect of phase in addition to gain) and the presenter stated that the mixer was signal-adaptive, but worked in real-time. He further noted that HOA signals do not have such a downmix problem.

BRIR Interchange Formats

Marc Emerit, Orange, presented



m30288

Proposal for a data exchange format

Markus Noisternig, Piotr Majdak, Thibaut Carpentier, Gregory Pallone, Marc Emerit, Olivier Warusfel

The contribution notes that a format for interfacing head-related transfer functions is currently being developed as an AES standard under project AES-X212 in AESSC working group AES SC-08-02 (www.aes.org/standards).

The contribution brings information on the Spatially Oriented Format for Acoustics (SOFA) format, which aims to store space-related audio data in a very general way. It supports a very rich description of the measurement conditions for an HRTF/BRIR, including listener position, emitter position, and IP rights comments. It also supports multiple representations for the HRTF/BRIR data.

The contribution recommends that the AES-X212 work be considered for possible adoption by MPEG-H 3D Audio as an interchange format. It is expected that the AES standard will issue in 2014.

Thomas Sporer, FhG-IDMT, noted that AES work sometimes progresses slowly and so the 2014 date could be later.

The presenter noted that the first format for AES-X212 would likely be PCM BRIR waveforms.

The Chair endorsed the idea of a simple BRIR interchange format, with a “second step” of converting the BRIR to possible application-specific formats.

Werner Oomen, Philips, supported having multiple representations within a single interchange format. He noted that there are two formats for binauralization information in MPEG Surround.

The Chair noted that the group should consider the impact of high-resolution BRIR on a possible interchange format.

Werner de Bruijn, Philips, presented

m30249

BRIR interface format: update and implementation

Werner de Bruijn, Aki Harma, Werner Oomen

The contribution presented a revision to a previous document (m29145) on BRIR format. It notes that there is a wide variation in the ways that BRIR are measured, represented and possibly parameterized. At the most basic level, this includes e.g. representation of angles associated with individual BRIR within a set.

The contribution describes a software tool that can convert from BRIR time-domain representations to other formats.
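The end use of time-domain BRIR data is direct convolution, one response per ear. A minimal sketch of that rendering step (a practical renderer would use partitioned fast convolution rather than direct convolution):

```python
import numpy as np

def binauralize(mono: np.ndarray, brir_left: np.ndarray,
                brir_right: np.ndarray) -> np.ndarray:
    """Render a mono signal to two ears by convolving with a BRIR pair.

    Returns a (2, n) array: row 0 is the left-ear signal, row 1 the right.
    """
    left = np.convolve(mono, brir_left)
    right = np.convolve(mono, brir_right)
    return np.stack([left, right])

# A unit impulse through a toy two-tap 'BRIR' simply reproduces each
# response at the corresponding ear.
out = binauralize(np.array([1.0]),
                  np.array([0.5, 0.25]),   # toy left-ear response
                  np.array([0.3, 0.1]))    # toy right-ear response
print(out.shape)  # (2, 2)
```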

Philips experts have contacted the AES SC-08-02 working group and shared with them this format proposal.

The contribution asks that the format be included in 3D Audio RM0.

Clemens Par, Swissaudec, noted that he believes that AES envisions a “scalable” standard, which could result in a simple format aimed at industrial applications and a more complex format aimed at research applications.

The Chair noted that the AES X212 work is just beginning, and MPEG is in a position to influence the AES work. What that influence might be needs more discussion.



BRIR Study

Thomas Sporer, FhG-IDMT, presented



m30222

Study on effects of BRIR data set on Quality Assessment

Carsten Boensel, Martina Böhme, David Goecke, Thomas Mayenfels, Marcel Schnemilich, Thomas Sporer, Stephan Werner, Maximilian Wolf, Albrecht Würsig

This contribution describes a study comparing codecs in headphone listening tests using different sets of BRIRs. The results show that the influence of the BRIR set is sometimes an important factor.

NHK Presentation on 2016 Broadcast Model

Akio Ando, NHK, made a presentation on the NHK view of the 2016 Broadcast Model for 22.2 channel broadcast in Japan. The timeline is:



  • October 2013: 22.2 channel core coder will be developed

  • July 2014: Possible MPEG-H technology (assuming that some MPEG-H Version 1 is available as DIS in July 2015).

The presenter highlighted what is missing in MPEG-4:

  • Clipping may occur when downmixing from 22.2 to 5.1. Pre-compression will not be the best solution for 22.2 (non-downmixed) presentation.

  • MPEG-H flexible rendering with downmix can solve this problem

The presenter described the 2016 Broadcast Model

MPEG-4 AAC decoder



  • With 22.2 channel codepoint

MPEG-H 3D Audio Renderer/Downmix

  • Default downmix coefficients

  • Artistic downmix coefficients sent from encoder

  • In both cases, data clipping must be avoided while maintaining loudness level

The Chair asked what the 2015 experimental model is. The presenter stated that the near-term work would focus on a 22.2 channel encoder/decoder, and downmix may not be considered.

The Chair asked about DRC: is it required that MPEG produce standardized DRC technology on a certain timeline? The presenter stated that DRC is not essential in the 2016 broadcast model; if it is available it may be used.

Thomas Sporer, FhG-IDMT, noted that NHK asks for default and artistic downmix coefficients. The Chair noted that this could be re-expressed as:


  • MPEG-H renderer has a meta-data guided operation for “better” 22.2 -> 5.1 mode

There are two possibilities moving forward:

Model 1


  • MPEG-4 based

    • AAC core coder

    • MPEG-H technology for downmix, DRC and program level

Model 2

  • ARIB based

    • AAC like ARIB coder

    • ARIB technology for downmix, DRC and program level

Model 1 is possible if MPEG-H 3D Audio is DIS (i.e. frozen technology) in July 2015.

Thursday Discussion

The Chair noted that the IIS submission is based on a USAC core coder with extensions, and asked for comments on whether Audio experts want:



  • Information on the benefit of the extensions relative to MPEG-D USAC

  • To incorporate these extensions back into MPEG-D

After discussion it was the consensus of the Audio subgroup to:

  • Ask FhG-IIS to bring to the next MPEG meeting information on the benefit of the extensions relative to MPEG-D USAC.

  • That there would be no extensions to MPEG-D USAC based on the MPEG-H 3D Audio RM0-CO USAC configuration. However, when MPEG-H Version 1 is DIS (or IS), a “Version 2” of USAC technology could be considered.

  • That 3D Audio can support multiple core coders.

  • That Core Experiments on 3D Audio core coders should be allowed, but that any possible changes should be restricted to the MPEG-H standard.

ETRI requested to conduct a CE using only the CfP decoded waveforms. It would compare the subjective quality of:

  • RM0-CO Test 1.3 decoded waveforms (binauralization to headphones) to

  • RM0-CO Test 1.1 at 1.2 Mb/s decoded waveforms, as further processed by CE proponent technology for binauralization to headphones

Several additional companies requested to participate in the CE. The Audio subgroup decided that the following process would be used to manage access to the RM0-CO decoded waveforms in the AhG prior to the next meeting (at which time the full RM0 deliverables will be provided by the RM proponents, i.e. bitstreams, decoded waveforms and decoder source code and WD text).

Process for access to RM0 decoded waveforms:

  1. CE proponents send request to the Audio Chair

  2. Audio Chair checks that CE proponent company has signed all agreements required to gain access to the test item content. If not, then Chair sends proponent information package on access to the test item content and waits until all agreements are executed.

  3. Audio Chair sends CE proponent password information for access.


      1. MPEG Surround


Daniel Fischer, FhG-IIS, presented

m30212

Proposed corrections for MPEG Surround Reference Software

Michael Fischer, Andreas Hoelzer, Johannes Hilpert

The presentation describes an error in the software when operating in 7-2-5 mode. A fix was described.

It was the consensus of the Audio Subgroup to add this correction to the existing Defect Report on MPEG Surround text.

Daniel Fischer, FhG-IIS, presented



m30214

Proposed corrections for MPEG SAOC Reference Software

Maria Luis Valero, Michael Fischer

The presentation describes an error in the SAOC software, which contains low delay MPEG Surround. The fix aligns the software with the standard text.

It was the consensus of the Audio Subgroup to put this correction into a new Defect Report on SAOC Reference Software.
      1. SAOC


Oliver Hellmuth, FhG-IIS, presented

m30273

Report on corrections for MPEG SAOC

Oliver Hellmuth, Harald Fuchs, Jouni Paulus, Leon Terentiv, Falko Ridderbusch, Adrian Murtaza

The contribution identifies errors in the text specification and proposes corrections.

It was the consensus of the Audio Subgroup to add these corrections to the existing Defect Report on SAOC and to issue a DCOR 2.

Oliver Hellmuth, FhG-IIS, presented



m30271

Information on Dialog Enhancement profile for SAOC

Oliver Hellmuth, Harald Fuchs, Jürgen Herre, Sascha Disch, Jouni Paulus, Leon Terentiv, Falko Ridderbusch, Adrian Murtaza

The contribution progresses the WD on SAOC-DE to:

  • Add explicit statements on deactivated tools and modes

  • Define new terms

  • Permit asymmetric modification range control (MRC) values

  • Add an output interface for the gain values that modify “unprocessed channels”

Henney Oh, WILUS STRC, requested time to study the contribution and discuss it with the presenter. The topic will be brought up later in the week.

After additional discussion, it was the consensus of the Audio Subgroup to progress this work to PDAM. The Chair noted that the presenter should prepare a Request for Amendment document.

