Our recent tests showed that the latest enterprise-grade 802.11ac access points can deliver a cumulative throughput of up to about 300 Mbps with 100 concurrent users in real-life situations. So what would you expect from a concurrent VoIP call test that needs a cumulative throughput of just 150 Mbps? That should be a cakewalk for the access point, right?

The reality may be far from that expectation! Throughput may not be a good indicator of access point performance when the application is VoIP.

Let’s see what’s so special with VoIP!

Voice over Internet Protocol (VoIP) is a technology that allows you to make voice/telephone calls using a broadband Internet connection instead of a regular phone line, which has been the method used for the last hundred years or so.

Early VoIP deployments started in 2004. VoIP picked up primarily because it is low cost to the end user, as it rides on existing multi-purpose IP networks. Businesses are migrating from traditional telephone systems to VoIP systems to reduce their monthly phone costs. In 2008, 80% of all new PBX lines installed internationally were VoIP!

At a high level, implementing VoIP needs much the same steps as traditional telephony: signaling, channel setup, digitization of the analog voice signals, and encoding. VoIP systems transport audio streams using special media delivery protocols and encode the audio with audio codecs. Various codecs exist that optimize the media stream based on application requirements and network bandwidth.

VoIP has multiple protocol implementations. In the early 2000s, while I was working on implementing VoIP on 3G phones, the H.323 standard was used. Then the technology moved on, and players like Skype implemented their own methods. Among the standard protocols today, SIP is the most popular method of implementing VoIP.

Session Initiation Protocol (SIP) is a text-based, application-layer control protocol that can be used to establish, maintain, and terminate calls between two or more endpoints. It does the following:

  1. Does necessary signaling to locate and invite the target endpoint based on address
  2. Determines the lowest level of common services between the endpoints through Session Description Protocol (SDP) and then establishes a two-way voice path via Real-time Transport Protocol (RTP).
  3. Handles the termination of the call.

(Source: http://www.en.voipforo.com/SIP/SIP_example.php)
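To make the first two steps a little more concrete, here is a rough sketch of what a SIP INVITE carrying an SDP offer can look like as plain text. It is an illustration only, not a working SIP stack: the function name build_invite, the addresses, the port and the tag/branch values are all made up for the example, and a real client would also handle responses, retransmissions and authentication.

```python
def build_invite(caller, callee, local_ip, rtp_port, call_id):
    """Build an illustrative SIP INVITE with an SDP offer (hypothetical helper)."""
    # SDP body: offers G.711 u-law (payload type 0) and A-law (payload type 8)
    # on the RTP port where the caller wants to receive audio.
    sdp = "\r\n".join([
        "v=0",
        f"o=- 0 0 IN IP4 {local_ip}",
        "s=call",
        f"c=IN IP4 {local_ip}",
        "t=0 0",
        f"m=audio {rtp_port} RTP/AVP 0 8",
        "a=rtpmap:0 PCMU/8000",
        "a=rtpmap:8 PCMA/8000",
    ]) + "\r\n"

    # SIP request line and headers: this is the text-based signaling part.
    headers = [
        f"INVITE sip:{callee} SIP/2.0",
        f"Via: SIP/2.0/UDP {local_ip}:5060;branch=z9hG4bK-{call_id}",
        "Max-Forwards: 70",
        f"From: <sip:{caller}>;tag=1",
        f"To: <sip:{callee}>",
        f"Call-ID: {call_id}@{local_ip}",
        "CSeq: 1 INVITE",
        f"Contact: <sip:{caller.split('@')[0]}@{local_ip}>",
        "Content-Type: application/sdp",
        f"Content-Length: {len(sdp.encode())}",
    ]
    # A blank line separates the headers from the SDP body.
    return "\r\n".join(headers) + "\r\n\r\n" + sdp


def common_codecs(offered, supported):
    """Payload types both endpoints can use, in the offerer's preference order."""
    return [pt for pt in offered if pt in supported]


invite = build_invite("alice@example.com", "bob@example.com",
                      "192.0.2.10", 40000, "abc123")
print(invite)
```

The callee answers with its own SDP, and the codecs common to both sides (the idea behind common_codecs above) decide what the RTP stream will carry.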

So it all looks awesome. We engineers have figured out everything, and at low cost! Is there a catch?

Yes, there is a catch and it is in the reliability!

Communication over an IP network is less reliable than over a circuit-switched telephone network, because IP provides no network-based mechanism to ensure that data packets are not lost and are delivered in sequential order. It is a best-effort network without fundamental Quality of Service (QoS) guarantees. Voice, like all other data, travels in packets over IP networks with fixed maximum capacity. A circuit-switched system with insufficient capacity refuses new connections while carrying the existing ones without impairment, whereas on a packet-switched network the quality of real-time data such as telephone conversations degrades dramatically. Network routers on high-volume traffic links may introduce latency that exceeds the permissible thresholds for VoIP. This latency also varies over time, which leads to jitter in the voice stream. And when the load on a link grows so quickly that the queues in the network routers overflow, data packets are lost.

Therefore, VoIP implementations may face problems, which are measured as follows:

    1. Packet Loss percentage – This is the percentage of packets that never made it from the sender to the target server (or intermediate hop).
    2. Average Latency – The average latency is the average (mean) time it takes a packet to get from your computer to the target server, and then back again.
    3. Jitter – Jitter is a measurement of how much latency changes, from sample to sample. A low jitter number indicates a solid, good connection.
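As a minimal sketch of these three measurements, assume we have a list of round-trip times in milliseconds from a ping-style probe, with None marking a packet that never came back. The function name voip_metrics and the sample values are made up for illustration, and jitter is taken here as the mean absolute change between consecutive received samples, which is one common way to compute it.

```python
from statistics import mean


def voip_metrics(rtt_samples_ms):
    """Return (loss %, average latency in ms, jitter in ms) for RTT samples.

    rtt_samples_ms holds round-trip times in ms; None means the packet was lost.
    """
    received = [s for s in rtt_samples_ms if s is not None]
    sent = len(rtt_samples_ms)

    # Packet loss: share of probes that never returned.
    loss_pct = 100.0 * (sent - len(received)) / sent if sent else 0.0

    # Average latency: mean round-trip time of the packets that did return.
    avg_latency = mean(received) if received else None

    # Jitter: mean absolute change in latency from one received sample to the next.
    deltas = [abs(b - a) for a, b in zip(received, received[1:])]
    jitter = mean(deltas) if deltas else 0.0

    return loss_pct, avg_latency, jitter


loss, avg, jit = voip_metrics([20.0, 22.0, None, 21.0, 25.0])
print(f"{loss:.1f}% loss, {avg:.1f} ms latency, {jit:.2f} ms jitter")
# prints: 20.0% loss, 22.0 ms latency, 2.33 ms jitter
```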

All these problems are aggravated when the network is wireless. In wireless, the effective bandwidth is lower. Further, it is a shared medium based on carrier sensing, so all VoIP calls in the vicinity compete for the same resources, causing further contention. One cannot solve that problem simply by adding more access points: deployment experience has shown that this can worsen the situation due to interference.

Nothing you run on your wireless is more demanding than real-time applications.

Now that we have established that reliability is an issue and that we want to improve it, several questions arise:

  • What is the target for this reliability?
  • How do we measure the performance?
  • How do we quantitatively benchmark?
  • How do we ensure/improve it?

Let’s shift our focus for a moment from the networking folks to the end user. End users do not want to be bothered with these details. For them, a call is good if it is as good as a standard PSTN call. So I guess we have our reliability target!

How do we measure it?

MOS (Mean Opinion Score) is a voice call quality metric. It is used by the VoIP industry to put a number on voice quality. In reality, this number is a subjective ‘opinion’ rating of call quality given by someone who was just on a voice call.

The user makes a call and rates it from 1 to 5. When many such calls are rated by different people and the ratings are averaged, we arrive at a score somewhere between 1 and 5. This is called the Mean Opinion Score.

5 – Excellent

4 – Good

3 – Fair

2 – Poor

1 – Bad

However, this method is cumbersome when we need to test a large number of concurrent calls, because it takes a lot of trained people to score the calls subjectively. To handle this issue, the standards bodies have come up with quantitative MOS measures.

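The ITU-T P.861 (PSQM) and P.862 (PESQ) standards define objective, signal-based ways of estimating MOS, and the ITU-T G.107 E-model estimates it from network impairments through a rating factor R. A commonly used simplification based on latency, jitter and packet loss involves the following steps: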

effectiveLatency = latency + jitter * latencyImpact + compTime

Jitter has more impact than latency, so the jitter is scaled up by latencyImpact (e.g. doubled) and then added to the latency. A defined value for computation time, compTime, is added (e.g. 10 ms). The resulting number is called the “effective latency”.

R = 100 – (effectiveLatency / factorLatencyBased)

This gives an overall quality measure R, which in theory ranges from 0 to 100.

There is some loss at the codec level itself, so in practice the starting number is not 100 but smaller (e.g. 93 for the G.711 audio codec).

The quality loss due to latency is then subtracted from that starting number. factorLatencyBased may be tiered: R gets a much more aggressive deduction once the effective latency exceeds a certain threshold.

R = R – (lostPackets * packetLossImpact)

Depending on how much impact packet loss should have, the packet loss percentage (lostPackets) is multiplied by a certain factor (e.g. 2.5) and subtracted from R.

MOS = 1 + 0.035 * R + 0.000007 * R * (R – 60) * (100 – R)

Finally, the reduced R is converted into an MOS value by applying a widely used formula for this purpose.

The scaling factors used in the above formulas, i.e. latencyImpact, compTime, factorLatencyBased and packetLossImpact, need to be arrived at by training the above model against subjective scores for the given system under measurement.
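Putting the steps together, a rough sketch of the calculation could look like the following. The defaults for latency_impact, comp_time_ms and packet_loss_impact are the example values mentioned above. The starting value of 93.2 and the two latency tiers (a divisor of 40 below 160 ms, a steeper deduction above it) are assumptions taken from commonly circulated versions of this approximation, not trained values, so treat the output as indicative only.

```python
def estimate_mos(latency_ms, jitter_ms, loss_pct,
                 latency_impact=2.0, comp_time_ms=10.0,
                 packet_loss_impact=2.5, base_r=93.2):
    """Estimate MOS from latency, jitter and packet loss (illustrative sketch)."""
    # Step 1: effective latency, with jitter weighted more heavily than latency.
    effective_latency = latency_ms + jitter_ms * latency_impact + comp_time_ms

    # Step 2: deduct for latency, starting from the codec's best-case R.
    # Assumed tiers: gentle deduction below 160 ms, aggressive above it.
    if effective_latency < 160.0:
        r = base_r - effective_latency / 40.0
    else:
        r = base_r - (effective_latency - 120.0) / 10.0

    # Step 3: deduct for packet loss (loss_pct is a percentage, e.g. 1.0 for 1%).
    r -= loss_pct * packet_loss_impact

    # Keep R in its theoretical 0..100 range before converting.
    r = max(0.0, min(100.0, r))

    # Step 4: convert R to MOS and keep the result within 1.0..4.5.
    mos = 1.0 + 0.035 * r + 0.000007 * r * (r - 60.0) * (100.0 - r)
    return max(1.0, min(4.5, mos))


good = estimate_mos(latency_ms=50, jitter_ms=5, loss_pct=1.0)
bad = estimate_mos(latency_ms=300, jitter_ms=40, loss_pct=5.0)
print(f"good link: {good:.2f}, congested link: {bad:.2f}")
# prints: good link: 4.31, congested link: 2.77
```

With these assumed factors, a clean link lands in the "Good" band while a congested one drops below "Fair", which is the kind of spread a throughput number alone would not reveal.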

Now we know that we need to chase MOS and how to measure it objectively.

Now the question remains: how do we get a better MOS for VoIP?

  • Better deployment Planning
  • Usage of Survey tools
  • Usage of monitoring tools

And there is a strong need for thorough testing: standard throughput and radio signal tests are not sufficient.

We are in the process of conducting such tests and will share our test framework, rating method and test results in our next blog post.