The VoIP User Experience
November 22, 2005
The good news is that service providers are beginning to embrace and understand the scope of the issues that can have a negative impact on the quality of IP-based services as perceived by the end user. They also are coming to the realization that what consumers expect from a telecommunications service in terms of quality implies more than just voice.
The emergence of equipment that incorporates voice signal quality monitoring is a step in the right direction towards providing operators with the information they need to better understand the consumer experience with VoIP. In fact, systems now are coming to market that are able to calculate the Mean Opinion Score (MOS) for each and every VoIP call on the fly. This paper explores several additional areas that are often overlooked when considering the consumer VoIP experience.
More than VoIP
As communications providers migrate from legacy circuit-switched networks to IP-based networks, consumers will expect their VoIP phones and “phone lines” to enable them to do, as a minimum, the same things they do today with their telephony lines and devices. Of course they will use their VoIP connections to talk, but they also will use them to send and receive faxes, to connect to home security systems that use embedded modems and much more. Even their TiVo may need a telephone line connection for program guide updates. In short, end users quite justifiably will expect a VoIP voice service to be a full-featured telecommunications service capable of handling any service and any device they have plugged into an RJ-11 jack in the past. Equipment manufacturers and service providers alike need to anticipate these requirements when creating their products.
Today, much is being discussed and written about VoIP quality and the concept of Quality of Service (QoS) in VoIP services, networks and devices. Typically, these discussions are centered on the network performance characteristics of bandwidth allocation and reservation and the priority queuing of voice packets. While these characteristics are important to IP service quality, they are by no means the only factors that impact IP performance.
The VoIP User Experience October 2005
Network-based QoS cannot overcome a poor implementation of a system to manage network impairments like jitter, packet loss and delay/latency in the end equipment. Some impairments may be overcome by using networking techniques, and some are best addressed within the voice processing domain, but often a solution that makes sense to the voice engineer may exacerbate the problem due to its effect on the network. (The reverse is also true.) For example, a common solution to overcoming lost packets is to retransmit redundant packets. But, if this is all that is done, the resulting network loading may actually have a negative impact on the end user’s experience due to increasing congestion and delay. The challenge is to compensate for network impairments with a balance of voice and network processing for maximum effectiveness.
Overcoming these impairments is the “black art” of VoIP. The best systems are based on products created by designers who have learned through years of experience what it takes to reliably deliver voice over a hostile packet network. For this reason, equipment manufacturers and service providers need to understand the pedigree of their core technology providers.
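The loading effect of redundancy can be made concrete with a little arithmetic. The sketch below is illustrative only: it assumes G.711 voice carried over RTP/UDP/IPv4 with no header extensions, a 20 ms packet interval, and the crudest form of redundancy (every packet simply sent twice). The function names are hypothetical, and the loss estimate assumes that copies are lost independently, which is optimistic on a congested link where losses tend to arrive in bursts.

```python
# Illustrative figures only: G.711 over RTP/UDP/IPv4, no header extensions.
HEADER_BYTES = 20 + 8 + 12      # IPv4 + UDP + RTP headers
G711_BYTES_PER_MS = 8           # 64 kbit/s = 8 bytes per millisecond


def stream_kbps(ptime_ms: int = 20, copies: int = 1) -> float:
    """IP-layer bandwidth of one voice direction, in kbit/s.

    copies=1 is a plain stream; copies=2 sends every packet twice.
    """
    payload = G711_BYTES_PER_MS * ptime_ms
    packets_per_second = 1000 / ptime_ms
    return (payload + HEADER_BYTES) * 8 * packets_per_second * copies / 1000


def loss_after_duplication(loss: float, copies: int = 2) -> float:
    """Residual loss if each copy is lost independently (an assumption)."""
    return loss ** copies


if __name__ == "__main__":
    print("plain stream  :", stream_kbps(), "kbit/s")
    print("sent twice    :", stream_kbps(copies=2), "kbit/s")
    print("residual loss :", loss_after_duplication(0.05))
```

Under these assumptions, duplication doubles the load of every call in exchange for a lower residual loss. If the original loss was caused by congestion, the added load can raise the loss rate itself, which is the trap the paragraph above describes.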
Today, service providers and enterprise managers have very few tools to help them determine the causes of poor IP-based service quality. This ultimately leads to sub-par end-user experiences with VoIP, because troubles often reside a layer deeper than can be revealed by measuring the level of jitter, packet loss or delay.
It will be many years, if ever, before all calls and communications originate on IP devices and travel exclusively across pure IP networks. In the meantime, issues impacting the end user experience will occur at points in the network where the legacy network and the IP network interconnect. This includes media gateways, line cards, and voice gateway devices used in home networks.
It is important to remember that the legacy network of devices that facilitate telephony calls will be with us for many years to come. Voice and tone activated PBXs and virtual operator systems are everywhere, and they are unlikely to disappear any time soon. Our banks and credit card companies, our voice mail systems and nearly all customer service systems use the familiar “press 1 for English, press 2 for Spanish” Dual Tone Multi-Frequency (DTMF) signaling for navigation, authentication, etc.
Consumers have become accustomed to communicating with automated attendants and voice mail interfaces using touch tones. Legacy telephones and cellular phones generate standard tones that are transmitted across the network and are received by a wide variety of tone-driven interfaces.
VoIP phones mimic legacy phones by sending IP-based “tones” to the media gateway. At the media gateway, the tones are detected and repackaged in real time, and then transmitted to their destination. Many things can go astray in this process. Working together, the VoIP CPE device and the media gateway must detect and recreate tones, including dialed digits, fax detection, modem detection and call progress tones, that have the same amplitude, frequency and timing as those generated by the legacy telephony equipment to which they are connecting. For example, if tones sent by an IP phone to access voice mail are not accurately detected, translated, transmitted or received, access will be denied and the end user will not be able to check their voice mail. A good VoIP implementation will exceed ITU Q.24 tone detection standards.
Today, end users, service providers or enterprise managers would be hard pressed to pinpoint the cause of a problem in any particular instance of trouble. This is because this type of quality issue cannot be diagnosed and traced to its source by measuring the elements that contribute to an MOS. Rather, this type of trouble relates to in-band DTMF transmission using the G.711 encoding scheme (G.711 is used to encode voice), or it relates to DTMF relay and the transmission of tones, in packets or separately, to the receiving end of a call. These problems could also be exacerbated by, or simply be due to, poor user equipment such as a cordless handset that is not transmitting the tones properly.
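To make the detection step concrete, here is a minimal sketch of in-band DTMF detection using the Goertzel algorithm, a common textbook approach. It is not taken from any particular product, and it is far simpler than a Q.24-grade detector. The row and column frequencies are the standard DTMF values; the function names and the two thresholds (`min_share`, `min_total`) are assumptions chosen for illustration.

```python
import math

ROWS = (697, 770, 852, 941)          # low-group frequencies, Hz
COLS = (1209, 1336, 1477, 1633)      # high-group frequencies, Hz
KEYS = ("123A", "456B", "789C", "*0#D")
RATE = 8000                          # samples per second, as in G.711


def goertzel_power(samples, freq, rate=RATE):
    """Squared spectral magnitude at `freq`, via the Goertzel recurrence."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / rate)
    s1 = s2 = 0.0
    for x in samples:
        s1, s2 = x + coeff * s1 - s2, s1
    return s1 * s1 + s2 * s2 - coeff * s1 * s2


def make_tone(key, ms=40, rate=RATE):
    """Synthesize the two-frequency tone for `key` (equal amplitudes)."""
    row = next(i for i, keys in enumerate(KEYS) if key in keys)
    col = KEYS[row].index(key)
    n = rate * ms // 1000
    return [0.5 * math.sin(2 * math.pi * ROWS[row] * t / rate)
            + 0.5 * math.sin(2 * math.pi * COLS[col] * t / rate)
            for t in range(n)]


def detect_key(samples, min_share=0.1, min_total=0.8):
    """Return the detected key, or None if the block does not look like DTMF.

    Each group's strongest tone must carry at least `min_share` of the
    block's energy, and the two together at least `min_total`. Both
    thresholds are illustrative, not taken from any standard.
    """
    n = len(samples)
    energy = sum(x * x for x in samples)
    if n == 0 or energy == 0.0:
        return None
    # Goertzel power of a single pure tone that carried all of the energy.
    scale = n * energy / 2.0

    def strongest(freqs):
        shares = [goertzel_power(samples, f) / scale for f in freqs]
        best = max(range(len(freqs)), key=shares.__getitem__)
        return best, shares[best]

    row, row_share = strongest(ROWS)
    col, col_share = strongest(COLS)
    if min(row_share, col_share) < min_share:
        return None
    if row_share + col_share < min_total:
        return None
    return KEYS[row][col]


if __name__ == "__main__":
    print(detect_key(make_tone("5")))     # a clean synthetic tone
    print(detect_key([0.0] * 320))        # silence
```

A production detector would add checks this sketch omits, such as absolute signal level, minimum tone duration, and limits on the level difference between the two tones. The energy-share test above is only a crude stand-in for those safeguards.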
Furthermore, there are so many different tone types generated and tone standards worldwide, that if a service provider chooses to offer global services, the VoIP equipment must be robust enough in design to operate in such diverse environments. There are many opportunities for call failures that relate to DTMF-related signaling issues. Ultimately, how well IP-based solutions are implemented and how robustly they are executed with this in mind will determine the end user experience. When switching to IP phones and devices, the end user experience should be equal to, or better than, the end user experience with legacy telephones and devices.
[Figure: an IP network and the PSTN connected through gateways. DTMF tones must be accurately detected and reproduced during calls to voice mail, auto attendant, IVR and on-line banking systems.]
More DTMF Possibilities
Another important consideration that IP device designers and carriers must take into account is the fact that media and telephony gateways used in different parts of the world have different tone parameters. Making matters even more complex, the DTMF issue can extend outside the network. There have actually been cases in which VoIP devices have mistaken end users' voices for actual DTMF tones. When this happens, the equipment abruptly halts the voice transmission and transmits an erroneous tone.
In addition, how well the wide variety of IP network interfaces adhere to existing DTMF standards will factor into a network element or device’s ability to deliver a quality experience to end users. The challenge for IP equipment makers is to design robust systems that can reliably reproduce true tones; quickly, and with 100% accuracy, detect a true tone; and meet 0% “false detects,” meaning that the device will never mistakenly characterize an end user's voice as a tone. The challenge for service providers is being able to determine if, and when, DTMF-related issues are to blame for poor user experience.
Fortunately, there are metrics in VoIP gateway and end devices that can be collected and sorted through to help troubleshoot and resolve DTMF-related issues. Equipment is emerging that will take full advantage of these metrics. Eventually, all communications will be IP-based, but this may take several decades. When this finally occurs, user interfaces will be voice- and message-based and the need for tone-based signaling will disappear. Until then, accurate detection and replication of DTMF-based signals will be a key parameter affecting the customer experience.
Echo cancellers are used throughout the legacy public switched telephone network (PSTN). Line echo occurs any time a four-wire to two-wire interface is encountered in the network. In the PSTN, the points where interoffice trunks meet local loops are four-wire to two-wire junctions called hybrids. When signals travel from the four-wire network to the two-wire network, part of their energy is reflected back onto the four-wire network. In legacy TDM networks, the reflections typically occur quickly, and PSTN-based echo cancellation equipment is calibrated for these quick echoes. Packet networks, with their longer delays, produce reflections that arrive beyond the time window within which existing echo cancellers work today. Therefore, VoIP end equipment has to be equipped with embedded echo cancellers capable of handling long echo tails in order to eradicate echo caused by packet-based traffic flows.
The process of locating echo in speech is called convergence. Poor echo canceller designs can take a “long time” to converge. When echo is heard early in a VoIP call, it is because the echo cancellers have not yet converged on the echo and so cannot pinpoint and nullify it. In normal circumstances, once the echo canceller does converge on the echo, it performs adequately for the duration of the call. There are situations where convergence is lost during a call and the echo canceller must restart.
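Convergence can be illustrated with a normalized least-mean-squares (NLMS) adaptive filter, one common textbook basis for echo cancellers. The sketch below is a simplified illustration, not a description of any vendor's design. The echo path, step size, tap count and function names are all assumptions. At an 8 kHz sampling rate each filter tap covers 0.125 ms, so the 64 taps used here span only 8 ms of echo tail, while a 128 ms tail would need 1024 taps.

```python
import random


def nlms_echo_residual(far_end, echo, taps=64, mu=0.5, eps=1e-6):
    """Run an NLMS adaptive filter and return the residual after cancellation.

    far_end: samples sent toward the hybrid (the reference signal).
    echo:    samples coming back, i.e. far_end filtered by the echo path.
    """
    weights = [0.0] * taps
    history = [0.0] * taps           # most recent far-end sample first
    residual = []
    for x, d in zip(far_end, echo):
        history.insert(0, x)
        history.pop()
        estimate = sum(w * h for w, h in zip(weights, history))
        error = d - estimate         # what the talker would still hear
        norm = sum(h * h for h in history) + eps
        step = mu * error / norm
        weights = [w + step * h for w, h in zip(weights, history)]
        residual.append(error)
    return residual


def simulate(n=4000, delay=20, seed=1):
    """Hypothetical test case: white-noise far end, three-tap echo path."""
    rng = random.Random(seed)
    far_end = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    path = [0.0] * delay + [0.5, -0.3, 0.1]      # made-up hybrid response
    echo = [sum(path[k] * far_end[i - k] for k in range(len(path)) if i >= k)
            for i in range(n)]
    return nlms_echo_residual(far_end, echo)


def mean_power(samples):
    """Average squared amplitude of a block of samples."""
    return sum(s * s for s in samples) / len(samples)


if __name__ == "__main__":
    res = simulate()
    print("early residual power:", mean_power(res[20:120]))
    print("late residual power :", mean_power(res[-500:]))
```

In this idealized run the residual is audible at first and then falls away as the filter weights settle, which is the behaviour described above. The sketch has no near-end speech, no background noise and no double-talk detector. Without those protections a simple adaptive filter can lose convergence in the middle of a call.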
It is important to test VoIP equipment’s echo cancellation ability in challenging conditions to determine if its designers have made the right decisions regarding cost vs. quality. With the ubiquity of mobile phones, it is even more likely that one party to a call will be in a noisy environment such as a car or airport. High background noise, particularly when the background noise changes suddenly, is especially challenging to echo cancellation.
While not common in polite face-to-face company, double talk, which occurs when both speakers talk at the same time, occurs more frequently in phone conversations due to network latency. Like background noise, double talk challenges echo cancellers.
Line echo induced by hybrids is not a problem on pure VoIP calls in which both parties are using IP phones. This is because the IP phones are connected directly to the packet network from the start and packets never touch any legacy telephony interfaces.
Service providers need to be aware that if they opt for the cheapest VoIP phones or media gateways, echo cancellation is the one thing they are most likely to sacrifice in the process. This cost/performance tradeoff could end up being more costly, especially when one considers that echo is the problem most often cited by end users as the cause of a poor quality experience.
Another form of echo, acoustic echo, is also of concern. Speakerphones experience acoustic echo from the reflections of the end users’ voices as their voices bounce off the walls, desks, windows, etc. in the rooms in which the speakerphones are used. The design of the speakerphone itself may also induce acoustic echo, which is caused by the placement of the speaker and microphone. A quality echo cancellation implementation can, to a certain extent, be used to “tune” a phone’s acoustic attributes. Video IP Phones will likely feature speakerphone functionality, so acoustic echo cancellation will be a critical element influencing end users' experiences with them.
Consumers are likely to pay a premium for video phones and they also are likely to be the provider’s premier customers. Poor VoIP implementation in a video phone, especially in the tricky area of acoustic echo cancellation, could result in the consumer blaming the service provider for quality issues caused by poor acoustic echo cancellation on their Video IP Phone.
[Figure: an IP network and the PSTN connected through gateways, showing echo reflected by the hybrids along the call path.]
Not all echo cancellers are created equal, and adherence to the G.168 standard for cancellation is not a “seal of approval” for echo cancellation. Simply put, it is not easy to do good echo cancellation. It takes a lot of DSP processing power and memory to do it well. Quality echo cancellation will have a marginal cost impact associated with the additional MIPS and memory resources when compared with bare bones solutions. This cost can be minimized when the developers have control over all system resources. Thus, an integrated echo canceller that was conceived as part of the whole voice subsystem rather than a third party add-on is likely to be more resource efficient.
As mentioned earlier, voice is not the only application for which telephony lines are used. Modem-based devices are still quite common, with FAX being the most common of these. There are a tremendous number of issues to contend with when sending a fax, but the first is simply detecting that the call is a FAX call, and not a voice call or some other kind of modem call. False detection of FAX tones can result in frustrating failure of the call. Once detected, FAX tones can be transmitted using G.711 encoding, but the scheme is not as robust as T.38 fax relay, which breaks a fax call down and sends the FAX data across in a packet format. The result is greatly improved reliability and call completion rates.
FAX is designed to operate between two machines directly connected via the PSTN. This is a nearly optimal connection from the perspective of delay, and therefore FAX machines are very intolerant of “unusual” delays. Connecting FAX machines via a packet network virtually assures that the delays will be outside the fax machines’ operating parameters. FAX transmissions implement a fax protocol between two machines involved in the transmission of a FAX. This protocol can be “spoofed” in order to compensate for the delays. If packets are lost during negotiation between the machines, or even in the middle of the fax call, problems will occur.
There are ways to mitigate many of the issues described above, but service providers need to do more than adjust their networks for jitter, packet loss and delay. As legacy networks and IP networks unite, knowledge is power for service providers.
The better the tools, the better the chance of resolving issues that many end users will have difficulty describing simply because they will be new to them. Having access to calls in real time in order to resolve problems, being able to measure a call while speaking with the consumer, or being able (with consumer approval) to play back a call will help service providers troubleshoot problems. At a higher level, it will be useful to correlate similar complaints, or troubles that take place at the same time or on the same day. In addition, correlating troubles on calls running through the same pieces or types of equipment will help service providers establish trends and identify problem areas in the highly distributed, multi-network environment in which IP-based calls will be placed.
Just as there were systems and tools created to manage TDM networks effectively, tools are emerging that will examine the full spectrum of characteristics that can have a negative impact on voice QoS. They will do the same for the telephony applications that originate on packet networks and in places where packet networks merge with TDM networks. Until then, tools that examine the end points and places of demarcation between packet networks and telephony networks will be of great value to service providers. VoIP phones and IP PBXs at the enterprise can provide metrics and statistics that help service providers and enterprise managers better understand the true causes of end users’ complaints.
Service providers that are spending more than $3000 to acquire new customers will see the wisdom of investing in technology that will enable them to provide those customers, in the IP environment, with the high-quality experience they have come to expect in the telephony environment. Based on their strategic plans, even ILECs, like next generation voice service providers, are moving towards complete IP-based networks as a foundation for their IP communications. A large reason for this is that service providers have been convinced that an IP-based infrastructure will reduce their overall operating expenditures, offering a more competitive business model.
However, service providers must remember to consider the operational costs of managing subscriber issues. The cost of poor management and the resulting subscriber churn can offset the OPEX benefits of a common infrastructure. Conversely, if consumer satisfaction is met early on in new services by employing accurate and proactive quality management, the OPEX model is improved further.
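The correlation of trouble reports by equipment and time, described earlier, is the kind of proactive quality management that lends itself to simple tooling. The following sketch is purely illustrative: the record layout, field names and gateway identifiers are hypothetical, and a real system would draw on whatever metrics the deployed gateways and end devices actually expose.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class TroubleReport:
    """Hypothetical trouble-ticket record; field names are illustrative."""
    reported_at: datetime
    gateway: str
    symptom: str


def hot_spots(reports, min_count=2):
    """Count reports per (gateway, symptom, hour); keep repeated combinations."""
    counts = Counter(
        (r.gateway,
         r.symptom,
         r.reported_at.replace(minute=0, second=0, microsecond=0))
        for r in reports
    )
    return [(key, n) for key, n in counts.most_common() if n >= min_count]


if __name__ == "__main__":
    sample = [
        TroubleReport(datetime(2005, 10, 3, 9, 5), "gw-east-1", "dtmf not recognized"),
        TroubleReport(datetime(2005, 10, 3, 9, 40), "gw-east-1", "dtmf not recognized"),
        TroubleReport(datetime(2005, 10, 3, 14, 0), "gw-west-2", "echo"),
    ]
    for key, n in hot_spots(sample):
        print(n, key)
```

Grouping by the hour is an arbitrary choice here. The same pattern works for grouping by day, by equipment type, or by any other attribute that the complaints have in common.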
As service providers roll out IPTV, the challenges of providing quality IP video will be even more critical to delivering a quality experience. This is because the eye is even less forgiving than the ear. Fortunately, techniques and technology used to troubleshoot IP-based telephony QoS issues will be ramped up to meet the coming IPTV QoS challenges.