The estimation of one-way listening speech quality is based on the evaluation of real speech samples. For gateway implementations, the two algorithms PESQ and TOSQA2001 are used. Because PESQ has not been validated for measurements at the acoustical interface, only TOSQA2001 is used for IP phones. TOSQA2001 was also applied to IP phones in the previous SQTEs.
The precautions and limitations for listening speech quality analysis and result interpretation listed for gateways in chapter 4 (page 22) apply to IP phones as well:
- TOSQA2001 is limited to one-way transmission scenarios for IP phones. Conversational aspects such as double talk, echo disturbances or background noise are not considered.
- The TMOS results do not cover the overall quality of the IP phone under test.
- The algorithm uses real speech as stimulus, so the results depend directly on the choice of speech sample. Exchanging the speech sample, e.g. replacing German speech with English speech, leads to a different analysis result. Even different sentences of the same language lead to slightly different results. For continuity with the previous SQTEs, German speech samples were used.
- TOSQA2001 cannot provide 100% accuracy; its accuracy can never exceed that of a subjective test.
- TMOS differences of 0.1 or 0.2 are not regarded as significant in the analysis. A difference of 0.2 may indicate a tendency, while differences of 0.3 or more are regarded as relevant.
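The significance thresholds above can be expressed as a small helper. This is a minimal sketch; the function name `classify_tmos_difference` and the rounding to one decimal (the resolution at which TMOS scores are reported) are assumptions for illustration, not part of the documented procedure.

```python
def classify_tmos_difference(tmos_a: float, tmos_b: float) -> str:
    """Classify the difference between two TMOS scores.

    Thresholds as stated in the text: 0.1 or 0.2 is not significant,
    0.2 may indicate a tendency, 0.3 or more is relevant.
    """
    # Round to one decimal (assumed resolution of the reported scores) so that
    # floating-point noise such as 0.29999... is treated as 0.3.
    diff = round(abs(tmos_a - tmos_b), 1)
    if diff >= 0.3:
        return "relevant"
    if diff >= 0.2:
        return "tendency (not significant)"
    return "not significant"


# Example: best vs. worst device in sending direction (Table 6.1: 4.0 vs. 3.4).
print(classify_tmos_difference(4.0, 3.4))
```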
Packet loss and jitter were monitored during the tests to ensure high accuracy of the chosen test conditions. Packet monitoring and jitter analysis are carried out over the complete 32 s speech sequence (each sequence consisting of 4 sentence pairs, see also chapter 5.2).
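The text does not state which jitter metric or monitoring tool was used. Purely as an illustration, the sketch below assumes the RTP interarrival jitter estimator defined in RFC 3550 (section 6.4.1) and computes packet loss and jitter over one monitored sequence. `Packet`, `packet_loss_ratio` and `interarrival_jitter_ms` are hypothetical names; with the 20 ms packets used in the G.711 tests, a 32 s sequence corresponds to 1600 expected packets.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    seq: int          # RTP sequence number (wrap-around not handled in this sketch)
    rtp_ts: int       # RTP timestamp in samples (8 kHz clock for G.711)
    arrival_s: float  # arrival time at the monitoring point, in seconds


def packet_loss_ratio(packets: list[Packet], expected: int) -> float:
    """Fraction of the expected packets that never arrived (duplicates ignored)."""
    received = len({p.seq for p in packets})
    return 1.0 - received / expected


def interarrival_jitter_ms(packets: list[Packet], clock_hz: int = 8000) -> float:
    """Interarrival jitter estimate as defined in RFC 3550, section 6.4.1, in ms."""
    jitter = 0.0          # running estimate, in RTP timestamp units
    prev_transit = None
    for p in sorted(packets, key=lambda pkt: pkt.arrival_s):  # process in arrival order
        transit = p.arrival_s * clock_hz - p.rtp_ts
        if prev_transit is not None:
            d = abs(transit - prev_transit)
            jitter += (d - jitter) / 16.0  # smoothing gain of 1/16 per RFC 3550
        prev_transit = transit
    return jitter * 1000.0 / clock_hz


# 32 s of 20 ms packets = 1600 expected packets per speech sequence.
EXPECTED_PACKETS = 32_000 // 20
```

Note that the RFC 3550 estimate is a smoothed running value; a monitor could equally report per-packet delay variation, so results from different tools are not directly comparable.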
The IP impairments are introduced by NISTnet. In the network jitter tests, NISTnet introduces an additional delay. In the TMOS analyses, the measured delay values, which represent the end-to-end delay, are corrected by this NISTnet delay.
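Reading "corrected by the NISTnet delay" as subtracting the delay configured in the emulator from each measured end-to-end value, the correction can be sketched as follows. The function names and the idea of one measured value per analyzed speech sample are assumptions for illustration only.

```python
from statistics import mean


def corrected_delays_ms(measured_ms: list[float], nistnet_delay_ms: float) -> list[float]:
    """Subtract the delay inserted by the network emulator from each measured
    end-to-end delay value (hypothetical: one value per analyzed speech sample)."""
    if nistnet_delay_ms < 0:
        raise ValueError("emulator delay must not be negative")
    corrected = [m - nistnet_delay_ms for m in measured_ms]
    if any(c < 0 for c in corrected):
        # A negative result means the configured emulator delay does not match
        # the measurement, so the correction would be meaningless.
        raise ValueError("measured delay is smaller than the emulator delay")
    return corrected


def average_delay_ms(measured_ms: list[float], nistnet_delay_ms: float) -> float:
    """Average one-way delay of a device after removing the emulator delay."""
    return mean(corrected_delays_ms(measured_ms, nistnet_delay_ms))
```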
3.8 G.711
The IP phones were tested with the G.711 speech coder and a 20 ms packet length, as agreed between HEAD acoustics and the manufacturers prior to the event. Table 6.1 shows the results measured in handset and hands-free mode in sending and receiving direction.
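The chosen packet length fixes the packet rate and size of the G.711 stream. The sketch below works through that arithmetic using the standard G.711 parameters (8 kHz sampling, 8 bits per sample, i.e. 64 kbit/s). The header overhead is an assumption: IPv4 without options, UDP, and RTP without CSRC list or extensions (20 + 8 + 12 bytes), excluding link-layer framing; the text does not state the actual header configuration. `g711_packet_stats` is a hypothetical helper name.

```python
G711_SAMPLE_RATE_HZ = 8000
G711_BITS_PER_SAMPLE = 8
# Assumed: IPv4 (20) + UDP (8) + RTP (12) bytes, no options or extensions.
RTP_UDP_IPV4_HEADER_BYTES = 20 + 8 + 12


def g711_packet_stats(packet_ms: int, sequence_s: float) -> dict:
    """Payload size, packet rate and IP-layer bit rate for a G.711 RTP stream."""
    samples = G711_SAMPLE_RATE_HZ * packet_ms // 1000
    payload_bytes = samples * G711_BITS_PER_SAMPLE // 8
    packets_per_s = 1000 / packet_ms
    return {
        "payload_bytes": payload_bytes,
        "packets_per_s": packets_per_s,
        "packets_per_sequence": round(sequence_s * packets_per_s),
        "ip_bitrate_kbps": (payload_bytes + RTP_UDP_IPV4_HEADER_BYTES) * 8 * packets_per_s / 1000,
    }


# 20 ms packets over one 32 s speech sequence.
print(g711_packet_stats(20, 32))
```

With 20 ms packets this gives 160 bytes of payload, 50 packets per second and 1600 packets per 32 s sequence; a 1% packet loss therefore removes about 16 packets from one sequence.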
Figure 6.1 shows the results of Table 6.1 graphically. Each color represents one of the three IP phones; the dashed lines indicate the maximum and minimum scores, and the solid black line indicates the average. The results can be summarized as follows:
- The quality differences between the three devices are very large.
- In sending direction, listening speech quality differs significantly, by 0.6 TMOS.
- In receiving direction, two phones perform similarly at all three application forces of 2 N, 8 N and 13 N. The third phone (blue curve) performs significantly better; the TMOS difference is approximately 1.5 points.
- Hands-free mode was tested on only two devices. For both phones, the quality differs significantly depending on the transmission direction.
- The delay is nearly constant for the three devices under test.
G.711 – TMOS (TOSQA2001), VAD off, 20 ms, 8 N (except where otherwise stated)
| | Delay [ms] / packet loss / jitter [ms] | TMOS min | TMOS av. | TMOS max | Av. one-way delay min (ms) | Av. one-way delay av. (ms) | Av. one-way delay max (ms) |
| Sending | 0 / 0% / 0 | 3.4 | 3.8 | 4.0 | 76.1 | 88.2 | 107.6 |
| 2 N | 100 / 0% / 0 | 2.2 | 2.7 | 3.7 | 66.5 | 72.3 | 83.7 |
| 8 N | 100 / 0% / 0 | 2.4 | 3.0 | 4.0 | 66.3 | 72.4 | 83.4 |
| 13 N | 100 / 0% / 0 | 2.7 | 3.2 | 4.0 | 68.5 | 74.7 | 86.5 |
| HFT, sending | 0 / 0% / 0 | 2.4 | 3.0 | 3.7 | 90.7 | 92.3 | 93.9 |
| HFT, receiving | 0 / 0% / 0 | 1.8 | 2.3 | 2.8 | 88.2 | 89.7 | 91.1 |
| | G.711 ref. | --- | 4.2 | --- | --- | --- | --- |
| Sending | ISDN Ref. | --- | 4.2 | --- | --- | 3.9 | --- |
| 2 N | ISDN Ref. | --- | 3.0 | --- | --- | 5.4 | --- |
| 8 N | ISDN Ref. | --- | 3.7 | --- | --- | 4.5 | --- |
| 13 N | ISDN Ref. | --- | 4.0 | --- | --- | 4.6 | --- |
Table 6.1: TMOS (TOSQA2001) results, G.711 speech coder, 20 ms packet length (PL)
Fig. 6.1: TMOS results calculated using TOSQA2001; G.711, 20 ms PL
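The spreads quoted in the summary above can be checked against the min and max columns of Table 6.1. The sketch below copies those values and computes the difference between the best and the worst score per condition. Two caveats: the min and max in a row may come from different phones, so the spread is the best-to-worst gap rather than the gap between one specific phone and the others, and the hands-free rows cover only two devices. `tmos_spread` is a hypothetical helper name.

```python
# (min, av., max) TMOS per condition, copied from Table 6.1.
TABLE_6_1_TMOS = {
    "Sending": (3.4, 3.8, 4.0),
    "2 N": (2.2, 2.7, 3.7),
    "8 N": (2.4, 3.0, 4.0),
    "13 N": (2.7, 3.2, 4.0),
    "HFT, sending": (2.4, 3.0, 3.7),
    "HFT, receiving": (1.8, 2.3, 2.8),
}


def tmos_spread(scores: tuple[float, float, float]) -> float:
    """Difference between the best and the worst device, rounded to 0.1 TMOS."""
    low, _average, high = scores
    return round(high - low, 1)


for condition, scores in TABLE_6_1_TMOS.items():
    print(f"{condition:15s} spread = {tmos_spread(scores):.1f} TMOS")
```

This yields 0.6 TMOS in sending direction and 1.3 to 1.6 TMOS across the three receiving application forces, in line with the approximately 1.5 points stated above.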
Table 6.2 summarizes the results measured under the different IP network conditions, and figure 6.2 again compares the scores graphically. These tests were carried out in receiving direction, so the quality scores are also influenced by the receiving quality already analyzed in table 6.1 and figure 6.1:
- Listening speech quality degrades slightly for the three IP phones under the influence of packet loss.
- The implementation represented by the blue curve compensates jitter completely.
- Without IP jitter, the delay of the three implementations is in a comparable range. The different jitter buffer lengths lead to delay differences of approximately 40 ms.
G.711 – TMOS (TOSQA2001), VAD off, 20 ms, 8 N
| | Delay [ms] / packet loss / jitter [ms] | TMOS min | TMOS av. | TMOS max | Av. one-way delay min (ms) | Av. one-way delay av. (ms) | Av. one-way delay max (ms) |
| 1c – VAD off | 100 / 0% / 0 | 2.4 | 3.0 | 4.0 | 66.3 | 73.3 | 86.0 |
| 3c | 100 / 1% / 0 | 2.4 | 2.9 | 4.0 | 53.7 | 58.3 | 60.9 |
| 5c | 100 / 3% / 0 | 2.3 | 2.8 | 3.7 | 51.3 | 62.1 | 75.6 |
| 4c | 100 / 0% / 20 | 2.1 | 2.8 | 4.0 | 84.8 | 110.5 | 125.2 |
| 2c | 100 / 1% / 20 | 2.1 | 2.6 | 3.7 | 86.3 | 99.5 | 110.4 |
Table 6.2: TMOS (TOSQA2001) results under the influence of IP network impairments, G.711 speech coder, 20 ms packet length (PL)
Fig. 6.2: TMOS results under the influence of IP network impairments, G.711, 20 ms PL
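The observations on Table 6.2 can be reproduced from the tabulated values. The sketch below copies the table, computes the drop of the average TMOS relative to the unimpaired condition 1c, and the spread between the shortest and longest delay per condition. The condition comments are derived from the delay / packet loss / jitter column; `av_tmos_drop` and `delay_spread_ms` are hypothetical helper names.

```python
# Condition: (TMOS min, av., max, delay min, av., max in ms), copied from Table 6.2.
TABLE_6_2 = {
    "1c": (2.4, 3.0, 4.0, 66.3, 73.3, 86.0),    # 0% loss, no jitter (reference)
    "3c": (2.4, 2.9, 4.0, 53.7, 58.3, 60.9),    # 1% loss
    "5c": (2.3, 2.8, 3.7, 51.3, 62.1, 75.6),    # 3% loss
    "4c": (2.1, 2.8, 4.0, 84.8, 110.5, 125.2),  # 20 ms jitter
    "2c": (2.1, 2.6, 3.7, 86.3, 99.5, 110.4),   # 1% loss and 20 ms jitter
}


def av_tmos_drop(condition: str, reference: str = "1c") -> float:
    """Decrease of the average TMOS relative to the reference condition."""
    return round(TABLE_6_2[reference][1] - TABLE_6_2[condition][1], 1)


def delay_spread_ms(condition: str) -> float:
    """Difference between the longest and the shortest one-way delay."""
    row = TABLE_6_2[condition]
    return round(row[5] - row[3], 1)


for cond in TABLE_6_2:
    print(f"{cond}: av. TMOS drop {av_tmos_drop(cond):.1f}, "
          f"delay spread {delay_spread_ms(cond):.1f} ms")
```

Packet loss lowers the average TMOS by 0.1 (1%) and 0.2 (3%), i.e. a slight degradation. Under 20 ms jitter (4c) the maximum TMOS stays at 4.0 as in 1c, consistent with one implementation compensating jitter completely, while the delay spread grows from 19.7 ms to 40.4 ms.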