Joint Meetings With 3DG on ARAF and 3D Audio Tue 14:00-16:00
At the 104th MPEG meeting the 3DG Chair summarized the needs of an Audio Augmented Reality system as follows:
- Analyze scene and capture appropriate metadata (e.g. acoustic reflection properties) and attach it to a scene
- Attach directional metadata to audio object
- HRTF information for the user
- The audio augmented reality system should be of sufficiently low complexity (or we have sufficiently high computational capability) that there is very low latency between the real audio and the augmented reality audio.
The 3DG Chair clarified that less than 50 ms is a realistic requirement for 3D audio latency.
The 3DG Chair gave an example of Augmented Reality: a person has a mobile device with a transparent visual display. The person sees and hears the real world around him, and can look through the transparent visual display (i.e. “see-through” mode) to see the augmented reality (e.g. with an avatar rendered in the scene).
The avatar has a pre-coded audio stream (e.g. an MPEG-4 AAC bitstream) that it can play out, and ARAF knows its spatial location and orientation (i.e. which way it is facing). The required audio metadata is:
- Radiation properties of the avatar (i.e. radiation power as a function of angle and distance)
The avatar audio could be presented via headphones or ear buds. The Audio Chair noted that ARAF may want to incorporate a head tracker so that the avatar can remain fixed within the physical world coordinate system. The 3DG Chair noted that if “see through” mode is used then the orientation of the mobile device screen would be sufficient. In that case, audio could be presented via the mobile device speakers.
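To make the head-tracking point concrete: keeping the avatar fixed in the physical world coordinate system means the rendering azimuth is the avatar's world azimuth minus the tracked head (or device) yaw. A minimal sketch; the function name and degree conventions are illustrative, not from any ARAF specification:

```python
def render_azimuth(avatar_world_azimuth_deg: float, head_yaw_deg: float) -> float:
    """Azimuth at which to render the avatar, relative to the listener's head.

    Keeping the avatar fixed in world coordinates means subtracting the
    tracked head yaw from the avatar's world azimuth, then wrapping the
    result into the (-180, 180] degree range.
    """
    rel = (avatar_world_azimuth_deg - head_yaw_deg) % 360.0
    if rel > 180.0:
        rel -= 360.0
    return rel
```

If only the mobile-device orientation is available (the “see-through” case the 3DG Chair mentions), the device yaw would stand in for the head yaw.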
The Audio Chair noted that the avatar could function as a “virtual loudspeaker” in MPEG-H 3D Audio such that the 3D Audio renderer could be used for presentation via headphones. However, 3D Audio is not able to add environmental effects (e.g. room reverberation) that are separate from the avatar sound. Furthermore, 3D Audio cannot add such environmental effects based on listener location or orientation (e.g. reverberation changes when the listener moves closer to a reflecting surface).
Yeshwant Muthusamy, Samsung, noted that the Khronos OpenSL ES 3D Audio API does support e.g. reverberation based on user location and might offer a solution. However, the Audio Chair noted that ARAF does not know the acoustic properties (i.e. surface reflection properties) of the real-world environment (unless it can deduce them from the visual scene), and thus audio effects based on the real-world environment are not possible, whether using MPEG-H 3D Audio or OpenSL ES. 3DG experts will consider this problem and report back at a future joint meeting.
With Systems on MP4FF Enhanced Audio Support Wed 14:00-16:00
The following documents were reviewed:
- m30100, Addition of Sample aspect ratio and further audio code-points
- m30101, Editor’s draft of 14496-12 PDAM 3 – Enhanced audio and other improvements
It was decided to remove the audio-related technology from these documents and put it in a working draft that can be better managed by Audio and can progress on an independent timeline.
With Requirements on DRC Fri 11:00 – 11:30
Audio experts are well aware that there is DRC capability integrated with the MPEG-4 Advanced Audio Coding profile coder, namely the AAC MDCT-based multi-band DRC. Audio experts do not want to needlessly confuse the marketplace, but are nevertheless interested in considering new technology whose performance addresses the diversity of the marketplace (e.g. listening in a living-room home theatre or on a mobile phone in the street).
While many actions concerning Dynamic Range Control were discussed during the week, the Audio subgroup decided to issue a Call for Proposals on Program Level and Dynamic Range Control. Submissions are asked to compare the performance of proposed technology to that of MPEG-4 AAC-based DRC technology. The Call will be issued at this meeting and submissions due at the next meeting.
The text of the Call was reviewed in this joint session and approved by Requirements.
Task Group Discussions
3D Audio
The proponent binauralization complexity figures were reviewed and agreed to be as follows:
Proponent | Cmod
ETRI      | 90.93
IDMT      | 68.00
IIS       | 84.43
SONY      | 79.04
ORL       | 88.80
QUAL      | 87.85
TECH      | 88.80
Calculation of Figure of Merit
Using the binauralization complexity figures shown above, the Figure of Merit was calculated for CfP responses using the CO and HOA data sets. Audio experts checked (and where necessary revised and corrected) the FoM, and the results are shown here:
Channel/Object (CO) Signal Set
Sys  | High   | Low    | Mean
IIS  | 89.607 | 87.279 | 88.443
ETRI | 84.080 | 81.138 | 82.609
SONY | 80.013 | 77.283 | 78.648
IDMT | 83.273 | 80.617 | 81.945
Higher Order Ambisonics (HOA) Signal Set
Sys  | High   | Low    | Mean
QUAL | 87.924 | 84.901 | 86.413
TECH | 89.904 | 87.425 | 88.665
ORL  | 89.707 | 87.196 | 88.451
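In both tables the Mean column appears to be the arithmetic average of the High and Low scores, rounded to three decimals — an inference from the numbers, not a definition taken from the FoM document. A minimal consistency check, with the rows transcribed from the tables above:

```python
def fom_mean(high: float, low: float) -> float:
    """Mean Figure of Merit, assuming it is simply the average of the
    High and Low scores (inferred from the tables, not normative)."""
    return (high + low) / 2.0

# (system, high, low, reported mean) rows transcribed from the tables
co_rows = [
    ("IIS",  89.607, 87.279, 88.443),
    ("ETRI", 84.080, 81.138, 82.609),
    ("SONY", 80.013, 77.283, 78.648),
    ("IDMT", 83.273, 80.617, 81.945),
]
hoa_rows = [
    ("QUAL", 87.924, 84.901, 86.413),
    ("TECH", 89.904, 87.425, 88.665),
    ("ORL",  89.707, 87.196, 88.451),
]

for name, high, low, mean in co_rows + hoa_rows:
    # every reported mean matches (high + low) / 2 to within rounding
    assert abs(fom_mean(high, low) - mean) < 6e-4, name
```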
Discussion
The FoM data suggests a “best” system for each signal set. For these “best” systems, Audio experts looked at overall system performance as well as performance on individual items in individual tests, and found no evidence of inconsistent performance (i.e. exceptionally poor performance for one or more individual items).
For the Channel/Object (CO) signal set, it was the consensus of the Audio subgroup to select Sys1, which is the FhG-IIS submission, as the CO RM0.
For the Higher Order Ambisonics (HOA) signal set, it was the consensus of the Audio subgroup to select Sys2, which is the Technicolor submission, as the HOA RM0. The Chair noted that the Technicolor submission is joint work between Technicolor and Orange. It was agreed to define a first Core Experiment with a workplan at this meeting to investigate merging HOA Sys1 technology, submitted by Qualcomm, into the HOA RM0.
As specified in the Call for Proposals, proponents are expected to submit RM0, as a decoder technical specification and a decoder source-code implementation, prior to the beginning of the next MPEG meeting (see the Call for details).
3D Audio Requirements
Akio Ando, NHK, presented
m29614 | Proposed Requirements for the 22.2 channel sound broadcasting system | Akio Ando, Takehiro Sugimoto, Yasushige Nakayama, Kaoru Watanabe
The contribution responds to a WG11 request at the 104th MPEG meeting. In May 2013 the Next Generation TV Forum (NexTV-F) was established to promote 8K-resolution television broadcasting. They have committed to 8K broadcasting in 2016.
The requirements from NHK are: (4K broadcasting?)
Service:
- 2, 5.1 and 22.2 channels
- Ancillary data shall be supported
- DRC and Program Level shall be possible using ancillary data
Performance:
- Audio quality should be transparent for typical audio programs
- Three-dimensional immersive sound should be significantly better compared to 5.1
- Bandwidth: 16-bit PCM, 48 kHz sampling rate, 20–20,000 Hz audio bandwidth
- Delay: not more than the delay of the video decoder
Functional:
- Delivered 22.2 channel signal shall be able to be down-mixed using transmitted downmix coefficients
- Bit rate: (22/2)(144 kb/s) = 1.584 Mb/s is the upper limit bit rate
- Backward compatible with MPEG-4: MPEG-4 AAC shall be used as a core coder
Year 2016 MPEG-4 based model
- MPEG-4 part:
  - The broadcast bitstream must be decodable by an MPEG-4 AAC decoder.
- MPEG-H part (if an appropriate MPEG-H outcome is available):
  - Down-mix from 22.2 to 5.1 channels: default downmix coefficients; downloadable artistic coefficients.
  - Dynamic range control and Program Level control
SHV broadcast schedule (to be promoted by NexTV-F)
- 2015: experimental broadcast (e.g. 1 hour a day)
- 2016: examination broadcast of the Rio Summer Olympics (full-service broadcast with consumer receivers)
The Chair stated that MPEG Audio could commit to delivering an MPEG-H 3D Audio Version 1 that can be used for both experimental and examination broadcast. However, he asked NHK experts whether it will be easy to move to MPEG-H 3D Audio Version 2 for 2020 full-service broadcast. NHK experts were not able to answer the question at this time, but indicated that Japanese broadcasters might want to use the newest technology.
Thomas Sporer, FhG-IDMT, noted that ITU-R Recommendation BS.1548 defines “broadcast quality”.
The Chair encouraged Audio delegates to reflect on the issues raised and additionally to think of the content of these possible output documents for MPEG-H Version 1:
Public:
- Requirements and functionality (there may be new ones, e.g. DRC)
- Timeline to completion of Version 1
Private:
- Architecture. This may encompass support for Channels, Objects and HOA, and a functional description of the bitstream (e.g. header, synchronization, break-in, “hooks” for Version 2).
Jan Plogsties, FhG-IIS, presented
m30327 | Time line and Requirements for Next Generation Broadcast System based on current state of MPEG-H 3D Audio | Jan Plogsties, Max Neuendorf, Bernhard Grill
The contribution presented timeline requirements for ARIB and ATSC 3.0, both of which appear to be well served by an MPEG-H 3D Audio Version 1 International Standard in January 2015. A timeline for the European marketplace is less clear.
Candidate Requirements
- Configuration – mono, stereo, 5.1 and 22.2, but flexible extensions are also possible.
- Sampling Rate – certainly 48 kHz.
- Quality and efficiency – broadcast quality at 1.2 Mb/s or 1.4 Mb/s.
- Random access
- Synchronization to other streams, e.g. MPEG-2 transport and over-the-top IP delivery.
- DRC and Program Level control
- Downmix
In summary, FhG-IIS feels that it is important to adhere to an aggressive timeline for Version 1 to address the needs of the worldwide marketplace. In this respect, a clear set of requirements (of what is in and what is not) for Version 1 facilitates meeting this aggressive timeline.
Thomas Sporer, FhG-IDMT, presented
m30307 | Requirements on an initial 22.2 channel based standard | Markus Mehnert, Robert Steffens, Thomas Sporer
The document is a joint contribution from FhG-IDMT and Iosono. The presenter noted that many requirements in this contribution have already been covered in the previous two contributions.
The proposed requirements are:
- Highest audio quality
- DRC and Program Loudness
- Objects for dialog enhancement and descriptions for the visually impaired
- Very efficient coding to achieve low bit rates, e.g. for portable devices.
The contribution urges MPEG to make a careful technology selection, particularly for Version 1, so that it is not locked into old technology. In addition, there must be a clear plan (e.g. requirements, bitstream architecture) for moving from Version 1 to Version 2.
Andreas Silzle, FhG-IIS, presented
m30325 | Downmix Comparison Test | Andreas Silzle, Hanne Stenzel, Achim Kuntz, Arne Borsum, Jan Plogsties
The presenter noted that downmix is a very important functionality in MPEG-H 3D Audio. The fact that one might downmix from 22 channels to 5 or even 2 channels gives ample opportunity for phase alignment of correlated signal peaks and hence large values that lead to signal clipping.
Downmix may cause:
- Clipping (as already mentioned)
- Spatial masking (or un-masking)
- Phasing artifacts
And downmix may result in:
- Too high an ambience level
To avoid these problems, downmix may need to be dynamic, rather than static (over the entire audio program). Dynamic downmix might be both signal-dependent and also frequency-variant.
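The difference between static and dynamic gain can be pictured in a few lines: a static downmix applies one attenuation everywhere, while a signal-dependent gain only reduces level in the blocks whose summed peak would actually clip. This is an illustration of the concept only, not the adaptive downmix evaluated in the contribution:

```python
def adaptive_downmix_gains(block_peaks, full_scale=1.0):
    """Per-block gains for a downmixed signal.

    block_peaks: peak absolute sample value of each downmixed block.
    Returns unity gain where the block's peak fits within full scale,
    and a gain reduced just enough to avoid clipping where it does not.
    """
    return [min(1.0, full_scale / p) if p > 0 else 1.0 for p in block_peaks]
```

A frequency-variant version, as the contribution suggests, would compute such gains per analysis band rather than per broadband block.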
The presentation gave an example spectrogram of a static downmix as compared to an adaptive downmix, in which the static downmix demonstrated deep comb-filtering at 1 kHz intervals. Such comb-filtering is expected to be prevalent in 22.2 channel programs because a mix of high, mid and front channels carrying a strong discrete source would have very small inter-channel time delays, leading to comb-filtering.
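The 1 kHz notch spacing is consistent with an inter-channel delay of about 1 ms: summing a signal with a copy of itself delayed by τ gives the magnitude response |1 + e^(−j2πfτ)|, which has nulls every 1/τ Hz. A quick numeric check, assuming τ = 1 ms:

```python
import cmath
import math

def comb_magnitude(f_hz: float, delay_s: float) -> float:
    """Magnitude response of summing a signal with a copy of itself
    delayed by delay_s seconds: |1 + exp(-j*2*pi*f*delay_s)|."""
    return abs(1 + cmath.exp(-2j * math.pi * f_hz * delay_s))

# With a 1 ms delay the nulls sit at 500 Hz, 1500 Hz, 2500 Hz, ...
# i.e. spaced 1 kHz apart, matching the spectrogram observation.
```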
Gregory Pallone, Orange Labs, asked whether the FhG active downmix used delay compensation (i.e. some aspect of phase in addition to gain); the presenter stated that the mixer was signal-adaptive but worked in real time. He further noted that HOA signals do not have such a downmix problem.
BRIR Interchange Formats
Marc Emerit, Orange, presented
m30288 | Proposal for a data exchange format | Markus Noisternig, Piotr Majdak, Thibaut Carpentier, Gregory Pallone, Marc Emerit, Olivier Warusfel
The contribution notes that a format for interfacing head-related transfer functions is currently being developed as an AES standard under project AES-X212 in AESSC working group AES SC-08-02 (www.aes.org/standards).
The contribution brings information on the Spatially Oriented Format for Acoustics (SOFA) format, which aims to store space-related audio data in a very general way. It supports a very rich description of the measurement conditions for a HRTF/BRIR, including listener position, emitter position, and IP rights comments. It also supports multiple representations for the HRTF/BRIR data.
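As a rough picture of what such a record carries, the measurement-condition metadata described above could be modeled as below. The field names are hypothetical placeholders, not the actual SOFA/AES-X212 definitions:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class BRIRMeasurement:
    """One HRTF/BRIR measurement with the kinds of metadata the SOFA
    proposal is said to support; names are illustrative, not normative."""
    listener_position: List[float]        # e.g. [x, y, z] in metres
    emitter_position: List[float]         # source position, same coordinate frame
    sampling_rate_hz: float
    impulse_responses: List[List[float]]  # one IR per ear: [left, right]
    rights_comment: str = ""              # IP-rights note, which the format allows
```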
The contribution recommends that the AES-X212 work be considered for possible adoption by MPEG-H 3D Audio as an interchange format. It is expected that the AES standard will issue in 2014.
Thomas Sporer, FhG-IDMT, noted that AES work sometimes progresses slowly and so the 2014 date could be later.
The presenter noted that the first format for AES-X212 would likely be PCM BRIR waveforms.
The Chair endorsed the idea of a simple BRIR interchange format, with a “second step” of converting the BRIR to possible application-specific formats.
Werner Oomen, Philips, supported carrying multiple representations within a single interchange format. He noted that there are two formats for binauralization information in MPEG Surround.
The Chair noted that the group should consider the impact of high-resolution BRIR on a possible interchange format.
Werner de Bruijn, Philips, presented
m30249 | BRIR interface format: update and implementation | Werner de Bruijn, Aki Harma, Werner Oomen
The contribution presented a revision to a previous document (m29145) on BRIR format. It notes that there is a wide variation in the ways that BRIR are measured, represented and possibly parameterized. At the most basic level, this includes e.g. representation of angles associated with individual BRIR within a set.
The contribution describes a software tool that can convert from BRIR time-domain representations to other formats.
Philips experts have contacted the AES SC-08-02 working group and shared with them this format proposal.
The contribution asks that the format be included in 3D Audio RM0.
Clemens Par, Swissaudec, noted that he believes that AES envisions a “scalable” standard, which could result in a simple format aimed at industrial applications and a more complex format aimed at research applications.
The Chair noted that the AES-X212 work is just beginning, and MPEG is in a position to influence the AES work. What that influence might be needs more discussion.
BRIR Study
Thomas Sporer, FhG-IDMT, presented
m30222 | Study on effects of BRIR data set on Quality Assessment | Carsten Boensel, Martina Böhme, David Goecke, Thomas Mayenfels, Marcel Schnemilich, Thomas Sporer, Stephan Werner, Maximilian Wolf, Albrecht Würsig
This contribution describes a study comparing codecs in headphone listening tests using different sets of BRIRs. The results show that the influence of the BRIR set is sometimes an important factor.
NHK Presentation on 2016 Broadcast Model
Akio Ando, NHK, made a presentation on the NHK view of Japan’s 2016 broadcast model for 22.2 channel broadcasting. The timeline is:
- October 2013: 22.2 channel core coder will be developed
- July 2014: possible MPEG-H technology (assuming that some MPEG-H Version 1 is available as DIS in July 2015).
The presenter highlighted what is missing in MPEG-4:
- Clipping may occur when downmixing from 22.2 to 5.1. Pre-compression will not be the best solution for 22.2 (non-downmixed) presentation.
- MPEG-H flexible rendering with downmix can solve this problem
The presenter described the 2016 Broadcast Model:
MPEG-4 AAC decoder
- With 22.2 channel codepoint
MPEG-H 3D Audio Renderer/Downmix
- Default downmix coefficients
- Artistic downmix coefficients sent from the encoder
- In both cases, data clipping must be avoided while maintaining the loudness level
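The renderer/downmix stage described above amounts to applying a coefficient matrix — default or transmitted artistic — to each multichannel sample frame. A toy sketch of that matrix application, not the MPEG-H renderer itself:

```python
def apply_downmix(frame, coefs):
    """Downmix one multichannel sample frame with a coefficient matrix.

    frame: one sample per input channel (24 entries for 22.2).
    coefs: one row per output channel (6 rows for 5.1); either the
           default or the transmitted artistic coefficients can be passed.
    """
    return [sum(c * x for c, x in zip(row, frame)) for row in coefs]
```

Clipping avoidance while maintaining loudness would then be a separate stage applied to the matrix output.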
The Chair asked what the 2015 experimental model is. The presenter stated that the near-term work would focus on a 22.2 channel encoder/decoder, and downmix may not be considered.
The Chair asked about DRC: is it required that MPEG produce standardized DRC technology on a certain timeline? The presenter stated that DRC is not essential in the 2016 broadcast model. If it is available it may be used.
Thomas Sporer, FhG-IDMT, noted that NHK asks for default and artistic downmix coefficients. The Chair noted that this could be re-expressed as:
- MPEG-H renderer has a metadata-guided operation for a “better” 22.2 -> 5.1 mode
There are two possibilities moving forward:
Model 1
- MPEG-4 based
- AAC core coder
- MPEG-H technology for downmix, DRC and program level
Model 2
- ARIB based
- AAC-like ARIB coder
- ARIB technology for downmix, DRC and program level
Model 1 is possible if MPEG-H 3D Audio is DIS (i.e. frozen technology) in July 2015.
Thursday Discussion
The Chair noted that the IIS submission is based on a USAC core coder with extensions, and asked for comments on whether Audio experts want:
- Information on the benefit of the extensions relative to MPEG-D USAC
- To incorporate these extensions back into MPEG-D
After discussion it was the consensus of the Audio subgroup:
- To ask FhG-IIS to bring to the next MPEG meeting information on the benefit of the extensions relative to MPEG-D USAC.
- That there would be no extensions to MPEG-D USAC based on the MPEG-H 3D Audio RM0-CO USAC configuration. However, when MPEG-H Version 1 is DIS (or IS), a “Version 2” of USAC technology could be considered.
- That 3D Audio can support multiple core coders.
- That Core Experiments on 3D Audio core coders should be allowed, but that any possible changes should be restricted to the MPEG-H standard.
ETRI requested to conduct a CE using only the CfP decoded waveforms. It would compare the subjective quality of:
- RM0-CO Test 1.3 decoded waveforms (binauralization to headphones) to
- RM0-CO Test 1.1 at 1.2 Mb/s decoded waveforms as further processed by CE proponent technology for binauralization to headphones
Several additional companies requested to participate in the CE. The Audio subgroup decided that the following process would be used to manage access to the RM0-CO decoded waveforms in the AhG prior to the next meeting (at which time the full RM0 deliverables will be provided by the RM proponents, i.e. bitstreams, decoded waveforms and decoder source code and WD text).
Process for access to RM0 decoded waveforms:
- CE proponents send a request to the Audio Chair
- The Audio Chair checks that the CE proponent company has signed all agreements required to gain access to the test item content. If not, then the Chair sends the proponent an information package on access to the test item content and waits until all agreements are executed.
- The Audio Chair sends the CE proponent password information for access.
MPEG Surround
Daniel Fischer, FhG-IIS, presented
m30212 | Proposed corrections for MPEG Surround Reference Software | Michael Fischer, Andreas Hoelzer, Johannes Hilpert
The presentation describes an error in the software when operating in 7-2-5 mode. A fix was described.
It was the consensus of the Audio Subgroup to add this correction to the existing Defect Report on the MPEG Surround text.
Daniel Fischer, FhG-IIS, presented
m30214 | Proposed corrections for MPEG SAOC Reference Software | Maria Luis Valero, Michael Fischer
The presentation describes an error in the SAOC software which contains low delay MPEG Surround. The fix aligns the software with the standard text.
It was the consensus of the Audio Subgroup to put this correction into a new Defect Report on SAOC Reference Software.
SAOC
Oliver Hellmuth, FhG-IIS, presented
m30273 | Report on corrections for MPEG SAOC | Oliver Hellmuth, Harald Fuchs, Jouni Paulus, Leon Terentiv, Falko Ridderbusch, Adrian Murtaza
The contribution identifies errors in the text specification and proposes corrections.
It was the consensus of the Audio Subgroup to add these corrections to the existing Defect Report on SAOC and to issue a DCOR 2.
Oliver Hellmuth, FhG-IIS, presented
m30271 | Information on Dialog Enhancement profile for SAOC | Oliver Hellmuth, Harald Fuchs, Jürgen Herre, Sascha Disch, Jouni Paulus, Leon Terentiv, Falko Ridderbusch, Adrian Murtaza
The contribution progresses the WD on SAOC-DE to:
- Add explicit statements on deactivated tools and modes
- Define new terms
- Permit asymmetric modification range control (MRC) values
- Add an output interface for the gain values that modify “unprocessed channels”
Henney Oh, WILUS STRC, requested time to study the contribution and discuss it with the presenter. The topic will be brought up later in the week.
After additional discussion, it was the consensus of the Audio Subgroup to progress this work to PDAM. The Chair noted that the presenter should prepare a Request for Amendment document.