Measuring wi-fi calling latency

TL;DR: 300-450 ms latency, of which I think ~150 ms comes from Republic. Send comments to my G+ post.


I’ve been a customer of Republic Wireless for about 2 weeks now. I have terrible cell reception at home, so one of the attractions of their service was transparent wi-fi vs. cell calling. I expected that the voice quality would be much better over wi-fi than cell. Indeed it is, but what I hadn’t anticipated was high latency. From my very first call, the lag was very noticeable. Like a bad overseas call, the other party and I talked over each other and had awkward pauses. When I switched to cell (which happily Republic lets you toggle manually mid-call), the lag nearly vanished, but audio quality plummeted.

But how bad is the lag? I needed to quantify it. To do so, I made a recording of me calling my other line.

Experiment steps:

  • Wait until night when nobody else in the house is using my LAN (verify low traffic)
  • Call from my landline to my cell phone, with landline on speakerphone
  • Check the cell to be sure the call is using wi-fi and not cell
  • Start recording audio on desktop computer
  • Put landline handset speaker against computer microphone, speak into cell phone
  • Put cell phone speaker against computer microphone, speak into landline
  • Hang up, stop recording
That resulted in a 30-second audio clip. Listen to the raw data yourself: m4a, mp3

Then I dragged the m4a recording into Sony Sound Forge Pro (yes, I’m shilling for my employer just a tiny bit). I isolated several segments of audio where I could hear myself talking and then hear the result from a handset a fraction of a second later. To measure the latency, I simply highlighted from start to start of the same syllable and looked at the selection duration.

Here’s a screenshot showing Sound Forge with a span selected from the beginning of me saying “Test” to the beginning of that “Test” coming out of the handset’s speaker. Note the “00:00:00.453” duration of the selection.

[Image: Sound Forge screenshot analyzing the phone recording]

The final result is that the lag from cell to landline is about 300 ms. The lag from landline to cell is about 450 ms. So a conversational round trip totals a minimum of 750 ms.

Republic’s wiki recommends using a ping/jitter test to validate your WAN connection. I ran this one: speed test, which requires Java. That report says I have 3.8 Mbps down, 545 kbps up, 96 ms RTT, 66 ms max delay, 0.1 ms jitter, 0% packet loss, and MOS of 4.2 (nearly “toll quality”). That is to say, my network appears to be capable of much better results than I’m achieving.

If you subtract 96 ms packet RTT for my network from the 300 or 450 ms voice delay that I measured, you still have about 200-350 ms of delay. There are certainly contributions from multiple sources: the audio compressor, my wi-fi, my DSL uplink, the WAN, Republic’s servers, the PSTN/POTS network to my landline, and the hop to my cordless landline handset.
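
The subtraction above can be written out as a quick Python sketch. The variable names are made up for illustration, and it keeps the same rough assumption as the text: the full 96 ms packet RTT is charged against each one-way voice delay.

```python
# Measured voice delays from the recording, in milliseconds.
measured_ms = {
    "cell_to_landline": 300,
    "landline_to_cell": 450,
}

# Packet round-trip time reported by the speed test, in milliseconds.
network_rtt_ms = 96


def residual_delay(measured, network):
    """Delay left over once the network's share is subtracted."""
    return measured - network


# Unexplained delay per direction.
residuals = {
    direction: residual_delay(delay, network_rtt_ms)
    for direction, delay in measured_ms.items()
}

# A conversational round trip is the sum of both directions.
round_trip_ms = sum(measured_ms.values())

for direction, leftover in residuals.items():
    print(f"{direction}: {leftover} ms unexplained")
print(f"conversational round trip: {round_trip_ms} ms")
```

Using the one-way figure from the speed test instead of the RTT would leave an even larger unexplained remainder, so this is the generous version of the estimate.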

Let’s make some guesses about the latency budget.

  1. the best audio encoders can get down to 5-20 ms of latency (see the Opus codec latency graph)
  2. wi-fi latency is tiny on these scales (Quora says ~3ms) as long as bufferbloat doesn’t get you down
  3. my DSL uplink at ~500 kbps is slow enough that it might matter. If the VOIP audio stream plus overhead is, say, 80 kbps (a reasonable guess vs. the PSTN rate of 64 kbps), time-divided into 20 ms packets, then each packet is about 1.6 kilobits (200 bytes), which takes about 3 ms to transmit. Again, pretty negligible.
  4. the speed test measured my WAN delay as 66 ms worst case (96 ms round-trip, but VOIP is almost certainly UDP so we don’t need to wait for ACKs)
  5. Republic’s servers are a black box.
  6. Wikipedia suggests that PSTN/POTS voice networks strive to keep latency under 100 ms. So let’s pick that number.
  7. No idea about my cordless 2.4 GHz landline. I’ll guess it’s negligible because I don’t hear delay in normal local calls.
Without #5, that adds up to ~200 ms. So that means that Republic is likely adding 100-250ms to the path. I’d love to hear from Republic to learn about their infrastructure. Is that an expected number? Is there hope for improvement?
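
For anyone who wants to tweak the guesses, here is a minimal Python sketch of the latency budget. Every number is one of the guesses from the list above (I take the 20 ms upper end for the codec and 0 ms for the cordless handset), and the names `serialization_ms`, `budget_ms` and `republic_estimate_ms` are just labels I picked for illustration.

```python
def serialization_ms(stream_kbps, packet_ms, uplink_kbps):
    """Time to push one voice packet onto the uplink, in milliseconds."""
    # Bits in one packet: kbit/s times ms gives bits.
    packet_bits = stream_kbps * packet_ms
    # An uplink of N kbps moves N bits per millisecond.
    return packet_bits / uplink_kbps


# Guessed one-way contributions in milliseconds (item numbers match the list).
budget_ms = {
    "1. audio codec": 20,                             # upper end of 5-20 ms
    "2. wi-fi": 3,
    "3. DSL uplink": serialization_ms(80, 20, 500),   # 80 kbps stream, 20 ms packets
    "4. WAN": 66,                                     # worst case from the speed test
    "6. PSTN/POTS": 100,
    "7. cordless handset": 0,                         # assumed negligible
}

# Item 5 (Republic's servers) is the unknown, so it is left out of the sum.
known_total_ms = sum(budget_ms.values())

# Whatever the known parts do not explain gets attributed to Republic.
measured_ms = {"cell_to_landline": 300, "landline_to_cell": 450}
republic_estimate_ms = {
    direction: delay - known_total_ms
    for direction, delay in measured_ms.items()
}

print(f"known contributions: {known_total_ms:.1f} ms")
for direction, estimate in republic_estimate_ms.items():
    print(f"{direction}: ~{estimate:.0f} ms unaccounted for")
```

The dominant terms are the WAN and PSTN guesses, so an error in either one moves the estimate for Republic by the same amount.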