
Plenary Discussions


There were none.
  1. Record of AhG meetings

    1. AhG on 3D Audio


The AhG on 3D Audio and Audio Maintenance met Saturday, July 27, 1300-1800 hrs and Sunday, July 28, 1000-1800 hrs at the MPEG meeting venue.

Saturday

The meeting began with remarks from the AhG Chair, Schuyler Quackenbush, on his view of the AhG meeting goals:



  • Review Listening Lab reports

  • Review the 3D Audio CfP subjective test data

    • Are there any errors?

    • Is post-screening correct?

    • Is it statistically appropriate to pool listening labs?

  • Discuss what the subjective results tell us

    • Do the results differentiate the systems under test?

  • Present the proponent technical descriptions

  • Compute the figure-of-merit from the data

    • Does the figure-of-merit appropriately discriminate between the systems under test?

Listening Lab Reports

Representatives from each of the listening labs presented



m30252

ETRI Listening test environments for MPEG-H 3D Audio

Taejin Lee, Jeongil Seo, Kyeongok Kang, Hochong Park

m30223

Listening Test Site Documentation 3DP IDMT

Christina Mittag, Thomas Sporer

m30323

Listening Test Documentation

Andreas Silzle, Hanne Stenzel

m30199

Test setup of NHK for MPEG-H 3D Audio

Takehiro Sugimoto, Kensuke Irie, Akio Ando

m30245

Report on Sony listening room for MPEG-H 3D Audio

Minoru Tsuji, Toru Chinen

m30265

Samsung Listening Test Environments for MPEG-H 3D Audio

Sunmin Kim, Namsuk Lee, Sang Bae Chon, Hangil Moon

m30236

Huawei test environment for MPEG-H 3D Audio

Panji Setiawan, Du Zhengzhong, Peter Grosche

m30287

Orange listening test report for 3D Audio

Gregory Pallone, Paul Urvoas

m29985

Test 1.3-HOA Listening Lab – Conditions & Methodology

Pei Xiang, Deep Sen, Nils Peters, Martin Morrell

m30270

Test setup for MPEG-H 3D Audio CfP evaluation

Johannes Boehm, Florian Keiler, Sven Kordon, Alexander Krueger, Oliver Wuebbolt

ETRI conducted Test1-3 (headphone listening) in a small listening room using Stax Electrostatic headphones.

FhG-IIS conducted Test1-3 in a small listening room using Stax Electrostatic headphones.

Orange conducted Test1-3 in a small studio listening room using Sennheiser HD-600 headphones.

Qualcomm conducted Test1-3 in a sound booth (NR 20) using Beyerdynamic DT880 Pro headphones.

Samsung conducted Test1-3 in a typical meeting room (37 dBA SPL noise level) using Sennheiser HD-650 headphones.

Technicolor conducted Test1-3 in a sound booth using Stax Electrostatic headphones.

Andreas Silzle, FhG-IIS, kindly volunteered to collect the listening room properties for all of the Listening Labs into a single table (or spreadsheet) for inclusion in the output document report on the CfP subjective tests.

Schuyler Quackenbush, ARL, presented



m30322

AHG on 3D Audio and Audio Maintenance

Schuyler Quackenbush

The presenter reviewed work done by the AhG prior to today’s meeting.

Post-Screening

The AhG agreed to these post-screening rules.

The following post-screening rule was used for Test1-1 (all bitrates), Test1-3, and Test1-4 (all speaker positions):

All data associated with subjects satisfying one or both of the following criteria are removed from the corresponding test.


  • The subject’s score for the hidden reference stimulus was below 90.

  • The subject’s score for the 3.5 kHz anchor was greater than the hidden reference score.

The following post-screening rule was used for Test1-2:

All data associated with subjects satisfying the following criterion are removed from the corresponding test.



  • The subject’s score for the 3.5 kHz anchor was greater than the hidden reference score.
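As a rough illustration only, the post-screening rules above could be applied to a flattened score table as in the following Python sketch. The column names ("test", "subject", "item", "system", "score") and the condition labels "HR" (hidden reference) and "LP35" (3.5 kHz anchor) are assumptions for the example, not the labels used in the actual spreadsheets.

import pandas as pd

def post_screen(df: pd.DataFrame, check_hidden_ref: bool = True) -> pd.DataFrame:
    """Remove all data for subjects that satisfy one or both screening criteria.

    Assumes exactly one "HR" and one "LP35" score per (test, subject, item) group.
    Pass check_hidden_ref=False for Test1-2, which uses only the anchor criterion.
    """
    rejected = set()  # (test, subject) pairs whose data will be dropped
    for (test, subject, item), grp in df.groupby(["test", "subject", "item"]):
        hr = float(grp.loc[grp["system"] == "HR", "score"].iloc[0])
        lp = float(grp.loc[grp["system"] == "LP35", "score"].iloc[0])
        # Criterion 1 (Test1-1, Test1-3, Test1-4): hidden reference scored below 90.
        if check_hidden_ref and hr < 90:
            rejected.add((test, subject))
        # Criterion 2 (all tests): 3.5 kHz anchor scored above the hidden reference.
        if lp > hr:
            rejected.add((test, subject))
    keep = [(t, s) not in rejected for t, s in zip(df["test"], df["subject"])]
    return df[keep]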

Thomas Sporer, FhG-IDMT, noted that, with the post-screening rules applied above, there are two additional listeners in the CO set that fail to score at least one system at the value 100 (they scored 99). Gregory Pallone, Orange, noted that there is one listener in this category in the HOA signal set (they scored 98).

The Chair summarized the two alternatives: either leave these scores in, since the listeners clearly intended to score 100, or remove them because they did not actually score 100.



The AhG agreed to allow the “nearly 100” scores and to keep these listeners.

The AhG agreed on the Excel spreadsheet data and the post-screening.

The Audio Chair presented



m30321

Report on 3D Audio Call for Proposals Subjective Tests

Schuyler Quackenbush

The Chair made the following available to the AhG members.

  • Excel spreadsheets containing CO and HOA subjective test data and data analysis

  • Subjective test data from the Excel spreadsheets as a tab-separated ASCII data set

Sunday

Proponent Technology

Jeongil Seo, ETRI, presented



m30251

Description of the ETRI proposal for the MPEG-H 3D Audio

Jeongil Seo, Taejin Lee, Kyeongok Kang, Hochong Park

The technology proposal is a collaboration between ETRI and Kwangwoon University. The architecture is based on a modified MPEG Surround encoder/decoder and a modified USAC encoder/decoder. MPEG Surround was modified to be able to process the required MPEG-H loudspeaker layouts. USAC was modified to employ only the LPD mode, so that no signal classifier was needed, and also incorporated longer window options.

Binaural rendering uses block-wise fast frequency-domain convolution. In addition, the BRIRs were shortened to only 2048 taps.

Rendering to 5.1 or 8.1 speakers uses a low-complexity static matrix downmix.

For rendering to arbitrary loudspeaker configurations, the Virtual Loudspeaker Mode was used, in which sound field principles are used to convert encoder speaker configuration signals to signals for an arbitrary decoder loudspeaker configuration. This is realized as a filtering process.

Binauralization complexity is 91.68, but this is subject to cross-check.

The presenter noted that objects are treated either as another “channel” signal or objects are rendered to channels and then processed.

Johannes Hilpert, FhG-IIS, presented

m30324

Description of the Fraunhofer IIS Submission for the 3D-Audio CfP

Johannes Hilpert, Jan Plogsties, Achim Kuntz

The proposal uses MPEG-D USAC as a core coder, MPEG-D SAOC for object coding and new technology for rendering. It has several options for processing objects.

The encoder has a pre-renderer in which objects can be:



  • Pre-rendered and added to existing channel signals

  • Coded as object signals with meta-data, where object meta-data is compressed for transmission. The meta-data encoder was joint work with FhG-IDMT.

  • Coded using SAOC

The resulting channel signals are coded using a USAC core coder.

Existing tools in the USAC coder were modified:



  • Unified quad-channel coding mode

  • Noise filling for higher frequencies (so that the signal bandwidth could be extended to 18 kHz).

  • Transform splitting (new block length of 512, which fits within a 1024 block length)

SAOC

The object renderer was VBAP.

The binaural renderer used fast frequency-domain convolution in the QMF domain (i.e. the SBR filterbank). The BRIR was separated into a direct signal/early reflections part and a reverberant part, which were processed separately. The reverberant part was processed as a stereo signal that was downmixed from the 22.2 channel signal.

Binaural complexity Cmod is 76.29.

Thomas Sporer, IDMT, presented



m30221

Description of the Fraunhofer IDMT Submission for the 3D-Audio CfP

Thomas Sporer, Christina Mittag, Andreas Franck, Christoph Sladeczek, Albert Zhykhar

This system used the same core coder as was used in the IIS proposal. The decoder block diagram is the same as with IIS, but with a unique IDMT renderer. The renderer can render objects to target output speakers and also render virtual loudspeakers (e.g. 22.2) to target loudspeakers (e.g. 5.1).

The renderer employs aspects of wave-field synthesis to map from virtual loudspeakers to actual output loudspeakers.

The binaural renderer used a fully decoded 22.2 channel signal (or other original signal channel configuration) and applied the BRIR using partitioned frequency-domain convolution with a block size of 4800.

Binaural complexity Cmod is 72.85.

Toru Chinen, Sony, presented

m30231

Technical description of Sony proposal for MPEG-H 3D Audio

Toru Chinen, Runyu Shi, Yuki Yamamoto, Mitsuyuki Hatanaka, Masayuki Nishiguchi

The Sony technology uses an MPEG-4 HE-AAC core coder and a VBAP renderer. The compressed representation consists of an AAC bitstream and a “*.s3d” bitstream (which is static information, akin to “header” information).

An AAC data stream element (DSE) contains compressed object meta-data information. Objects are coded as single channel elements (SCE).

The renderer first “quantizes” possible object positions to be on the arcs of a VBAP mesh, then does VBAP rendering, and finally does gain interpolation for the rendered object signals.
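For illustration, the core VBAP gain computation (in the standard Pulkki formulation) can be sketched as follows; the quantization of object positions onto the mesh arcs, the selection of the active loudspeaker triplet, and the gain interpolation described above are not reproduced, and the loudspeaker directions are hypothetical.

import numpy as np

def vbap_gains(source_dir: np.ndarray, speaker_dirs: np.ndarray) -> np.ndarray:
    """Gains for one source over one loudspeaker triplet.

    source_dir   -- unit vector pointing at the (quantized) object position
    speaker_dirs -- 3x3 matrix whose rows are unit vectors of the triplet
    """
    # Solve g @ L = p, i.e. g = p @ inv(L); negative gains mean the source lies
    # outside this triplet and another triplet should be used instead.
    g = source_dir @ np.linalg.inv(speaker_dirs)
    g = np.maximum(g, 0.0)
    return g / np.linalg.norm(g)  # power-normalize the gains

# Example: a source straight ahead and slightly elevated, rendered on a
# hypothetical front triplet (left 30 deg, right 30 deg, top).
L = np.array([[np.cos(np.radians(30)), np.sin(np.radians(30)), 0.0],
              [np.cos(np.radians(-30)), np.sin(np.radians(-30)), 0.0],
              [0.0, 0.0, 1.0]])
p = np.array([0.95, 0.0, 0.31])
print(vbap_gains(p / np.linalg.norm(p), L))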

The binaural renderer used a fully decoded 22.2 channel signal (or other original signal channel configuration) and applied the BRIR using partitioned frequency-domain convolution with a block size of 4800.

Binaural complexity Cmod is 66.67.

Gregory Pallone, Orange, and Oliver Wuebbolt, Technicolor, gave a single presentation for the following two documents



m30291

Technical Description of the Orange proposal for MPEG-H 3D Audio

Gregory Pallone, Marc Emerit

m30269

Technical Description of the Technicolor Submission for the CfP for 3D Audio

Johannes Boehm, Peter Jax, Florian Keiler, Sven Kordon, Alexander Krueger, Oliver Wuebbolt

The Technicolor and Orange proposals share the same architecture. The encoder has a spatial encoder block and a multi-channel perceptual channel coder that uses a normative HE-AAC core codec.

The spatial encoder extracts the “pre-dominant sounds” from the HOA signals and codes them as “M” mono channels using the core coder. The HOA signals are spatially reduced in resolution, optionally with decorrelation, and coded as “N-M” mono channels using the core coder. The number “M” is signal-dependent and is dynamic within a signal. The number “N” depends on bitrate and is not signal adaptive. In this respect, the “pre-dominant sounds” take channels from the pool of N that could otherwise be allocated for the HOA signals.

The rendering to output loudspeakers is simply a matrix multiply. The rendering matrix is unique for each HOA order and output loudspeaker configuration, and is computed only once as an initialization step.
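As a sketch of this structure only (not the actual rendering matrices of the proposals), a simple mode-matching rendering matrix for first-order HOA in ACN/SN3D can be computed once at initialization and then applied to every block of HOA coefficients as a single matrix multiply; the loudspeaker layout below is illustrative.

import numpy as np

def first_order_sh(azimuth: float, elevation: float) -> np.ndarray:
    """Real first-order spherical harmonics, ACN order (W, Y, Z, X), SN3D."""
    x = np.cos(elevation) * np.cos(azimuth)
    y = np.cos(elevation) * np.sin(azimuth)
    z = np.sin(elevation)
    return np.array([1.0, y, z, x])

def make_render_matrix(speaker_az_el: np.ndarray) -> np.ndarray:
    """Initialization step: one decode matrix per output loudspeaker configuration."""
    Y = np.stack([first_order_sh(az, el) for az, el in speaker_az_el])  # (L, 4)
    return np.linalg.pinv(Y).T  # mode-matching decoder, shape (L, 4)

speakers = np.radians([[30, 0], [-30, 0], [0, 0], [110, 0], [-110, 0]])  # illustrative 5.0 layout
D = make_render_matrix(speakers)       # computed only once
hoa_block = np.random.randn(4, 1024)   # 4 HOA coefficient signals, 1024 samples
speaker_block = D @ hoa_block          # rendering is a single matrix multiply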

The binaural renderer separates the BRIR into a direct and early reflections part and a diffuse part. A frequency-domain fast convolution is applied to the direct portion. The diffuse part is rendered as a mono signal that is added to the left and right output signals. The total system latency for the binaural rendering mode is approximately 900 ms, which is below the required limit of 1 sec.

Binaural complexity Cmod is 95.8.

The differences between the Technicolor and Orange proposals were:



  • Different “pre-dominant” sound coding strategies (e.g. instantaneous “M”) in the encoder

  • Different rendering matrix values in the decoder

Nils Peters, Qualcomm, presented

m29986

Description of Qualcomm’s HoA coding technology

Nils Peters, Deep Sen, Pei Xiang, Martin Morrell

The encoder has a front-end analysis that determines, e.g., whether the signal is a true microphone recording or a synthetic signal set. The signal is decomposed into distinct/independent components and a “background” HOA signal set. This HOA signal set may additionally be reduced in order. All signals are coded using a mono AAC core encoder.

The decoder uses an AAC decoder to obtain the mono signals. The distinct/independent signals will typically be upmixed to obtain the desired HOA order, and then mixed with the decoded “background” HOA signal.

The binaural renderer separates the BRIR into a direct and early reflections part and a diffuse part, in a manner very similar to the Orange/Technicolor proposal.

Binaural complexity Cmod is 94.4 (which is revised from the contribution).



BRIR complexity discussion

Johannes Boehm, Technicolor, presented a block diagram of block-wise fast frequency-domain convolution.
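As an illustration of the general technique (not a reproduction of the presented block diagram), a uniformly partitioned overlap-add convolution of one input channel with one BRIR might be sketched as follows.

import numpy as np

def partitioned_fft_convolve(x: np.ndarray, brir: np.ndarray, n_blocks: int = 10) -> np.ndarray:
    """Block-wise fast convolution: the BRIR is split into n_blocks partitions,
    each input block is multiplied with every partition spectrum, and the
    partial results are overlap-added at the proper delays."""
    n_samp = int(np.ceil(len(brir) / n_blocks))  # samples per partition
    n_fft = 2 * n_samp                           # zero-padded FFT size
    # Initialization: transform the BRIR partitions once.
    padded = np.zeros(n_blocks * n_samp)
    padded[:len(brir)] = brir
    H = np.fft.rfft(padded.reshape(n_blocks, n_samp), n=n_fft, axis=1)
    y = np.zeros(len(x) + len(brir) - 1)
    for i in range(0, len(x), n_samp):           # process the input block by block
        X = np.fft.rfft(x[i:i + n_samp], n=n_fft)
        for k in range(n_blocks):                # one partial result per BRIR partition
            seg = np.fft.irfft(X * H[k], n=n_fft)
            start = i + k * n_samp
            stop = min(start + n_fft, len(y))
            y[start:stop] += seg[:stop - start]
    return y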



Gregory Pallone, Orange, presented his analysis of BRIR convolution complexity, which accounted for all steps in the block-wise fast convolution:

Lbrir = 48000

Nblocks = 10

Ninputs = 22

Noutputs = 2

Nsamp = Lbrir/Nblocks

Nfft = 2*Nsamp


Real operations for the forward and inverse transforms:

A = (Ninputs + Noutputs) * 3 * Nfft * log2(Nfft)

Real operations for the complex multiplications that implement the frequency-domain convolution:

B = 4 * (Ninputs * Noutputs) * Nblocks * Nfft

Real operations for the complex additions that sum the block-wise components:

C = 4 * (Ninputs * Noutputs) * (Nblocks - 1) * Nfft

where the “3” is a ½ weighting of the 6 ops/butterfly, because two real inputs can be processed per complex FFT.

The complexity can be expressed per sample as:

(A+B+C)/Nsamp
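For reference, the following short Python sketch evaluates these expressions with the parameters given above; the variable names mirror the analysis.

import math

L_brir   = 48000                # BRIR length in samples
N_blocks = 10                   # number of BRIR partitions
N_in     = 22                   # input channels
N_out    = 2                    # output channels (binaural)

N_samp = L_brir // N_blocks     # samples per partition
N_fft  = 2 * N_samp             # FFT size per block

# Forward and inverse transforms ("3" = half of 6 ops/butterfly, since two real
# inputs can share one complex FFT).
A = (N_in + N_out) * 3 * N_fft * math.log2(N_fft)

# Complex multiplications implementing the frequency-domain convolution.
B = 4 * (N_in * N_out) * N_blocks * N_fft

# Complex additions summing the block-wise components.
C = 4 * (N_in * N_out) * (N_blocks - 1) * N_fft

ops_per_sample = (A + B + C) / N_samp
print(f"A = {A:.3e}, B = {B:.3e}, C = {C:.3e}, ops/sample = {ops_per_sample:.0f}")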

Recommendations and review of AhG Report

The AhG members reviewed the AhG report and agreed on the recommendations made to the Audio subgroup.


