5th ETSI Speech Quality Test Event: Anonymous Test Report


IP Gateway - Listening Speech Quality



Date: 06.08.2017


The estimation of one-way speech quality is based on the evaluation of real speech samples by PESQ and TOSQA2001. Both methods were also applied during the previous SQTEs. To facilitate the interpretation of these results and to avoid misinterpretation when comparing the results of different implementations, the following additional information is provided:

  • Both methods, TOSQA2001 and PESQ, simulate a subjective listening test according to ITU-T P.800 [27]. Such a listening test is limited to one-way transmission scenarios; conversational aspects such as double talk, echo disturbances or background noise are not considered.

  • Consequently, these TMOS and MOS-LQO results [28], [11] do not reflect the overall quality of a device under test.

  • TOSQA2001 and PESQ are two independent algorithms and therefore lead to different results; TMOS and MOS-LQO scores cannot be compared directly. A valid quality comparison of different devices is only possible by comparing TMOS scores with TMOS scores or MOS-LQO scores with MOS-LQO scores, never by cross-comparing TMOS and MOS-LQO.

  • The known restrictions of these methods are listed in ITU-T P.862 [10] (see e.g. tables 1, 2 and 3, page 2 ff.). These factors should be considered carefully when applying the methods and interpreting the results.

  • The algorithms use real speech as the stimulus, so the results depend directly on the choice of speech sample. Exchanging the speech sample, e.g. German speech for English speech, will lead to a different analysis result; even different sentences in the same language will lead to slightly different results. As in the previous SQTEs, German speech samples were used.

  • Neither TOSQA2001 nor PESQ can provide 100% accuracy. Both methods have been tuned and validated on the basis of subjective test results; consequently, their accuracy can never exceed the accuracy of a subjective test.

  • The accuracy of these methods increases when they are used within a single test scenario, e.g. testing different gateways with the same speech codec while varying only single parameters such as packet loss. Nevertheless, TMOS or MOS-LQO differences of 0.1 or 0.2 are not regarded as significant in this report: a difference of 0.2 indicates a tendency, and differences of 0.3 or more are considered relevant.

  • One aim of all ETSI SQTEs is to provide data and feedback to standardization bodies. For this reason, both analysis methods, TOSQA2001 and PESQ, are used in parallel.
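The significance rule and the no-cross-comparison rule stated in the bullets above can be expressed as a small helper. This is a minimal sketch, not part of the SQTE procedure: the function name, the method labels and the treatment of exactly 0.2 as a tendency that is still not significant are assumptions based on the wording of this report.

```python
def classify_difference(a: tuple[str, float], b: tuple[str, float]) -> str:
    """Classify the difference between two quality scores (hypothetical helper).

    Each argument is a (method, score) pair, e.g. ("TMOS", 4.1) or ("MOS-LQO", 4.4).
    Assumed thresholds from this report: below 0.2 is not significant, 0.2 indicates
    a tendency (still not significant), 0.3 or more is relevant.
    """
    method_a, score_a = a
    method_b, score_b = b
    # TMOS and MOS-LQO come from different algorithms and must not be cross-compared
    if method_a != method_b:
        raise ValueError(f"cannot compare {method_a} with {method_b}")
    # Scores are reported to one decimal; rounding removes float noise like 0.30000000000000027
    delta = round(abs(score_a - score_b), 1)
    if delta >= 0.3:
        return "relevant"
    if delta >= 0.2:
        return "tendency"
    return "not significant"


print(classify_difference(("TMOS", 3.7), ("TMOS", 3.4)))  # relevant
```

Raising an exception for mixed methods, rather than returning a label, makes an accidental TMOS vs. MOS-LQO comparison fail loudly.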

Packet loss and jitter were monitored during the tests to ensure that the chosen test conditions were reproduced accurately. Packet monitoring and jitter analysis are carried out over the complete 32 s speech sequence (4 sentence pairs; see also chapter 5.2).
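The report does not state how the packet monitor computed loss and jitter. As a hypothetical sketch, the loss rate can be derived from the RTP sequence numbers of the received packets, and jitter can be estimated with the interarrival jitter estimator defined in RFC 3550; the `Packet` record, its fields and the choice of the RFC 3550 estimator are assumptions, not the tool used during the SQTE.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    """One received packet as seen by a (hypothetical) packet monitor."""
    seq: int           # RTP sequence number; wrap-around is ignored in this sketch
    sent_ms: float     # sender timestamp, converted to milliseconds
    arrived_ms: float  # arrival time at the receiving side in milliseconds


def loss_rate(received: list[Packet], expected: int) -> float:
    """Fraction of the expected packets that never arrived (duplicates counted once)."""
    return 1.0 - len({p.seq for p in received}) / expected


def interarrival_jitter(received: list[Packet]) -> float:
    """Running interarrival jitter estimate in ms, following RFC 3550, section 6.4.1."""
    jitter = 0.0
    ordered = sorted(received, key=lambda p: p.arrived_ms)  # RFC 3550 uses arrival order
    for prev, cur in zip(ordered, ordered[1:]):
        # D(i, j) = (R_j - R_i) - (S_j - S_i): change in transit time between packets
        d = (cur.arrived_ms - prev.arrived_ms) - (cur.sent_ms - prev.sent_ms)
        # Exponential smoothing with gain 1/16
        jitter += (abs(d) - jitter) / 16.0
    return jitter


# 32 s of 20 ms packets = 1600 packets; drop every 100th, constant 40 ms transit time
packets = [Packet(i, i * 20.0, i * 20.0 + 40.0) for i in range(1600) if i % 100 != 0]
print(round(loss_rate(packets, 1600), 4))  # 0.01
print(interarrival_jitter(packets))        # 0.0
```

A constant transit time yields zero jitter regardless of loss, which is why loss and jitter have to be monitored separately.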

The IP impairments are introduced by the simulation tool NISTnet. In the network jitter tests, NISTnet introduces an additional delay. In the following MOS-LQO and TMOS analyses, the measured delay values, which represent the end-to-end delay, are corrected for this NISTnet delay.


3.3 G.711 Speech Coder


The G.711 speech codec was tested with a 20 ms packet length; this setting was agreed between HEAD acoustics and the manufacturers prior to the event. Table 4.1 shows the test conditions, combining fixed delay in the IP network, packet loss and jitter, together with the minimum, average and maximum TMOS and MOS-LQO results measured for 7 different implementations. The one-way delay values given in table 4.1 are also determined by the two algorithms TOSQA2001 and PESQ.
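To put the 20 ms packet length into numbers, the sketch below derives the G.711 packetization (8 kHz sampling, 8 bits per sample, i.e. 64 kbit/s) and the number of packets in the 32 s analysis sequence. It assumes one G.711 frame per packet and continuous transmission (VAD off); the function names are hypothetical and RTP/UDP/IP header overhead is ignored.

```python
SAMPLE_RATE_HZ = 8000   # G.711 sampling rate (narrowband telephony)
BITS_PER_SAMPLE = 8     # A-law / mu-law: one byte per sample, i.e. 64 kbit/s
SEQUENCE_S = 32         # length of the analysed speech sequence in this report


def g711_packetization(packet_ms: int = 20) -> dict[str, int]:
    """Samples, payload size and packet counts for one G.711 frame per packet."""
    samples = SAMPLE_RATE_HZ * packet_ms // 1000
    return {
        "samples_per_packet": samples,
        "payload_bytes": samples * BITS_PER_SAMPLE // 8,
        "packets_per_second": 1000 // packet_ms,
        "packets_per_sequence": SEQUENCE_S * 1000 // packet_ms,
    }


def expected_lost_packets(loss_percent: float, packets: int) -> float:
    """Mean number of lost packets for a given loss rate (actual counts vary per run)."""
    return packets * loss_percent / 100.0


info = g711_packetization()
print(info)
for pct in (1, 2, 3, 5):  # packet loss rates used in table 4.1
    print(f"{pct}% -> {expected_lost_packets(pct, info['packets_per_sequence']):.0f} packets")
```

With these assumptions each packet carries 160 bytes of speech, and even the 1% condition removes about 16 packets (320 ms of speech) from the 32 s sequence.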

G.711 – TMOS (TOSQA2001) and MOS-LQO (P.862.1), 20 ms, VAD off (except condition 0a), PLC on

                                      TMOS                MOS-LQO             One-way delay [ms]
Condition      Delay / loss / jitter  min   av.   max     min   av.   max     min    av.    max
0a (VAD on)    0 / 0% / 0             4.1   4.1   4.2     4.3   4.4   4.4     40.9   66.7   98.4
1a             0 / 0% / 0             4.1   4.2   4.2     4.3   4.4   4.5     40.4   67.3   106.8
2a             0 / 1% / 0             3.9   4.0   4.1     4.0   4.1   4.2     40.4   68.3   111.7
3a             0 / 2% / 0             3.5   3.8   4.0     3.8   3.9   4.0     40.4   70.3   110.7
4a             0 / 3% / 0             3.3   3.7   3.9     3.4   3.8   3.9     40.4   69.7   110.3
5a             0 / 5% / 0             3.2   3.4   3.7     3.3   3.5   3.7     40.4   68.8   110.8
6a             50 / 1% / 20           2.5   3.6   4.0     2.3   3.8   4.3     80.4   110.6  130.4
G.711 ref.     ---                    ---   4.2   ---     ---   4.4   ---     ---    ---    ---
Ref. connect.  ---                    ---   4.2   ---     ---   4.5   ---     ---    2.5    ---

Delay and jitter in ms; loss = packet loss rate.

Table 4.1: TMOS (TOSQA2001) and MOS-LQO results calculated using PESQ (according to ITU-T Recommendations P.800.1 and P.862.1), G.711 speech coder, 20 ms packet length (PL)

Graphical representations are shown in figures 4.1 (TMOS) and 4.2 (MOS-LQO). Each implementation tested during the 5th SQTE is indicated by a different color. Two implementations yield identical scores and cannot be distinguished in this representation.





Fig. 4.1: TMOS results calculated using TOSQA2001; G.711, 20 ms PL



Fig. 4.2: MOS-LQO results calculated using PESQ (P.862.1); G.711, 20 ms PL

The results can be summarized as follows:



  • The TMOS and MOS-LQO differences between the tested implementations under clean network conditions reflect the typical accuracy of both analysis methods and are not significant. The delay varies between about 40 ms and 107 ms; these values are also influenced by the individual test setups.

  • Enabling VAD does not degrade the quality scores; it slightly increases the delay for one implementation.

  • At 5% packet loss, quality differences of around 0.5 TMOS and MOS-LQO between the implementations reflect the performance of their different packet loss concealment strategies. The delay is typically not influenced by the occurrence of packet loss.

  • The jitter test condition leads to extreme quality differences (TMOS 2.5 to 4.0, MOS-LQO 2.3 to 4.3). Two of the six tested implementations do not compensate for the network jitter; although the delay increases for both of them, it does not increase as much as for most of the other implementations. The device represented by the green curve in both figures combines a low delay with high TMOS and MOS-LQO results.
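The spreads quoted in these bullets can be checked against table 4.1. A minimal sketch: the min/av./max values below are copied from the table, the helper name is hypothetical, and differences are rounded to one decimal, the resolution of the published table.

```python
# (TMOS min, av., max) and (MOS-LQO min, av., max) per condition, copied from table 4.1
TABLE_4_1 = {
    "0a": ((4.1, 4.1, 4.2), (4.3, 4.4, 4.4)),
    "1a": ((4.1, 4.2, 4.2), (4.3, 4.4, 4.5)),
    "2a": ((3.9, 4.0, 4.1), (4.0, 4.1, 4.2)),
    "3a": ((3.5, 3.8, 4.0), (3.8, 3.9, 4.0)),
    "4a": ((3.3, 3.7, 3.9), (3.4, 3.8, 3.9)),
    "5a": ((3.2, 3.4, 3.7), (3.3, 3.5, 3.7)),
    "6a": ((2.5, 3.6, 4.0), (2.3, 3.8, 4.3)),
}


def score_spreads(table=TABLE_4_1) -> dict[str, tuple[float, float]]:
    """Max minus min across implementations, for TMOS and MOS-LQO separately."""
    result = {}
    for condition, (tmos, lqo) in table.items():
        # Round to one decimal so float noise does not hide the table's resolution
        result[condition] = (round(tmos[2] - tmos[0], 1), round(lqo[2] - lqo[0], 1))
    return result


for condition, (tmos, lqo) in score_spreads().items():
    print(f"{condition}: TMOS spread {tmos}, MOS-LQO spread {lqo}")
```

The clean conditions stay at 0.1 to 0.2 (within the "not significant" band used in this report), the 5% loss condition reaches 0.5 (TMOS) and 0.4 (MOS-LQO), and the jitter condition 6a reaches 1.5 and 2.0.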

The average TMOS and MOS-LQO results are compared in figure 4.3.

Both analysis methods provide similar results; the TOSQA2001 assessment yields slightly lower quality scores than the more optimistic MOS-LQO assessment. This again shows that the scores derived from the two methods cannot be compared directly.
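The offset between the two methods can be read directly from the average values in table 4.1. The sketch below, with hypothetical names, computes the average MOS-LQO minus the average TMOS per condition; the positive values illustrate the more optimistic PESQ-based scores, but they are observations for this data set only and should not be used as a conversion factor between the two scales.

```python
# Average scores per condition from table 4.1: (TMOS av., MOS-LQO av.)
AVERAGES = {
    "0a": (4.1, 4.4), "1a": (4.2, 4.4), "2a": (4.0, 4.1), "3a": (3.8, 3.9),
    "4a": (3.7, 3.8), "5a": (3.4, 3.5), "6a": (3.6, 3.8),
}


def lqo_minus_tmos(averages: dict[str, tuple[float, float]]) -> dict[str, float]:
    """Per-condition offset of the average MOS-LQO over the average TMOS."""
    # Round to one decimal, matching the resolution of the published averages
    return {c: round(lqo - tmos, 1) for c, (tmos, lqo) in averages.items()}


offsets = lqo_minus_tmos(AVERAGES)
print(offsets)  # {'0a': 0.3, '1a': 0.2, '2a': 0.1, '3a': 0.1, '4a': 0.1, '5a': 0.1, '6a': 0.2}
print(all(v > 0 for v in offsets.values()))  # True
```

The offset is not constant across conditions, which is one more reason not to translate TMOS into MOS-LQO or vice versa.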








Fig. 4.3: TMOS vs. MOS-LQO (G.711, average)
