Polycom and Adtran AOS Geo-Redundancy Support

Background

Polycom SoundPoint IP SIP phones and Adtran IADs are used as Hosted IP PBX access devices, managed by the BroadWorks platform. In a network without geographic redundancy, the devices use SIP to register to a single SBC IP address.

To support geographic redundancy of SBCs, the devices must support registration to multiple IP addresses. Each device must select the proper IP address in each case to maintain its service and operation on the platform.

In a conventional non-redundant design, each SIP access device registers with only a single SBC. In a geo-redundant environment, the SIP access device has to decide when, and whether, to use each of the two sites.

The behavior of a SIP access device controls the effectiveness of geo-redundant failover. No firm standard defines the proper behavior, which leaves questions such as these open:

  • How does the access device determine that the primary site is unavailable?
  • After determining the primary site is unavailable, what should happen to calls that had already been started through that site?
  • How long should the access device wait before attempting to register with the secondary site?
  • After successfully registering with the secondary site, when should the access device check the status of the primary site?
  • Should the access device check the status of the primary site with a SIP registration, or some other SIP method?
  • What happens to SIP subscriptions set up on the access device during a failover to a secondary site?

This TR documents the best results obtained in supporting this service with Polycom SIP phones, Adtran IADs, and geo-redundant SBCs on a BroadWorks platform.

Failover Retransmission Timing

Polycom -

In both Polycom 3.x and 4.x software, failover is triggered by a lack of response. Retransmissions use an exponential backoff (doubling each time) that starts at 500 ms and is capped at 2000 ms.

Assuming a SIP REGISTER 200 OK Contact expires=30 value causing the phone to re-register every 30 seconds, the worst-case timeline for failover is as follows, for both Adtran TA900 and Polycom phones.

  • Time <0: Device Under Test (DUT) receives 200 OK for SIP REGISTER
  • Time 0: DUT is successfully registered.
  • Time 15: DUT transmits REGISTER to lab-sd1
  • Time 15.5: DUT retransmits REGISTER to lab-sd1
  • Time 16.5: DUT retransmits REGISTER to lab-sd1
  • Time 18.5: DUT transmits REGISTER to lab-sd2
  • Time >18.5: DUT successfully registers via lab-sd2

Times are given in seconds.
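
The timeline above can be reproduced with a short calculation. The following is a minimal sketch, assuming the delay doubles after each unanswered transmission and that the device gives up on the primary after three unanswered transmissions, as seen in the timeline. The function name and its parameters are illustrative only and are not part of any Polycom or Adtran interface.

```python
def failover_timeline(first_send, initial_delay=0.5, max_delay=2.0, attempts=3):
    """Return (send_times_to_primary, failover_time), all in seconds.

    Assumption: the wait doubles after each unanswered transmission, capped
    at max_delay, and the device fails over after `attempts` transmissions.
    """
    send_times = []
    t = first_send
    delay = initial_delay
    for _ in range(attempts):
        send_times.append(t)
        t += delay                         # wait for a reply before the next send
        delay = min(delay * 2, max_delay)  # double the wait, up to the cap
    return send_times, t                   # t is when the secondary is first tried


primary_sends, failover_at = failover_timeline(first_send=15.0)
print(primary_sends)  # [15.0, 15.5, 16.5]
print(failover_at)    # 18.5
```

With a first REGISTER at 15 seconds, the sketch yields the same 15, 15.5, 16.5, and 18.5 second marks listed in the timeline.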

Adtran -

The IAD uses an exponential backoff (doubling each time) that starts at 500 ms, with a default maximum delay of 2000 ms.

Assuming default settings, the worst-case timeline for failover is as follows.

  • Time <0: Device Under Test (DUT) receives 200 OK for SIP REGISTER
  • Time 0: DUT is successfully registered.
  • Time 15: DUT transmits REGISTER to lab-sd1
  • Time 15.5: DUT retransmits REGISTER to lab-sd1
  • Time 16.5: DUT retransmits REGISTER to lab-sd1
  • Time 18.5: DUT transmits REGISTER to lab-sd2
  • Time >18.5: DUT successfully registers via lab-sd2

Times are given in seconds.

Subscriptions

Polycom -

SIP subscriptions are lost when the DUT re-registers with the secondary SBC, and they are re-established only after an hour. The DUT does not re-SUBSCRIBE when it switches to a new SBC, and BroadWorks does not couple the subscriptions to the registration Contact, which would route them through the user's current SBC.

Because subscriptions are not maintained between SBCs, features such as these will not be fully functional during that hour:

  • Busy Lamp Field
  • Shared Call Appearance
  • Message Waiting Indicator

Adtran -

SIP Subscriptions are not applicable when dealing with Adtran IADs.

Testing

Polycom SIP Version 3

The Polycom 3.x software is the newest software available for many popular phones, including the SoundPoint IP 330 and 500. Therefore, for networks that include these and related generations of phones, the geo-redundancy behavior of these devices affects the core network operation significantly.

In the tests below, the DUT is a Polycom SoundPoint IP 330 running 3.1.2c software.

Configuration and DNS Lookups

The DUT was configured to perform DNS lookups for "lab2.e-c-group.com", and configured for transport="DNSnaptr".

$ORIGIN lab2.e-c-group.com.
_sip._udp 600        IN        SRV 20 10 5060 lab-sd2
_sip._udp 600        IN        SRV 10 10 5060 lab-sd1
lab-sd1   600        IN        A              216.128.128.11
lab-sd2   600        IN        A              216.128.128.40
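
The SRV priorities in the zone above determine which SBC is primary: under DNS SRV rules, the record with the lowest priority value is tried first. The following sketch copies the records from the zone into plain Python tuples to show the resulting order. It does not perform a real DNS query, and the helper name is hypothetical.

```python
# SRV records copied from the zone above: (priority, weight, port, target)
srv_records = [
    (20, 10, 5060, "lab-sd2"),
    (10, 10, 5060, "lab-sd1"),
]

# A records copied from the zone above
a_records = {
    "lab-sd1": "216.128.128.11",
    "lab-sd2": "216.128.128.40",
}


def order_targets(records):
    """Order SRV targets by ascending priority (lowest value is tried first).

    Weight only matters among records that share a priority, which does not
    occur in this zone, so it is ignored here.
    """
    return [target for _prio, _weight, _port, target in
            sorted(records, key=lambda r: r[0])]


ordered = order_targets(srv_records)
print(ordered)                          # ['lab-sd1', 'lab-sd2']
print([a_records[t] for t in ordered])  # ['216.128.128.11', '216.128.128.40']
```

This matches the roles used throughout the tests: lab-sd1 is the primary SBC and lab-sd2 is the secondary.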

Fault Detection

DUT detects the fault only when it fails to receive a reply to a SIP REGISTER.

If the SBC returns a SIP 400 or SIP 503 response to DUT, DUT does not attempt lab-sd2.

Affinity for Active SBC

Every new registration request restarts on the primary SBC.

Even after registering with the secondary SBC, lab-sd2, the DUT attempts every new INVITE on the primary SBC first.

Revert

Because every new request is attempted on the primary SBC first, each new REGISTER or INVITE causes an attempt to revert to the primary SBC.

Key Findings

Geographic Failover Should Work

The Polycom 3.x software should provide a functional failover option.

Overload-After-Recovery Risk

Polycoms running 3.x will attempt to register with a recovered SBC during the registration expiration interval. For example, if all devices are configured to re-register every 30 seconds, and the Polycoms re-attempt registration at half of the registration expiration time, then every Polycom 3.x device will attempt to register with the primary SBC, after its recovery, in the space of 15 seconds. This is likely to cause an overload on the newly-recovered SBC.
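
To put a number on the overload risk, the arithmetic in the example above can be sketched as follows. The device count is a hypothetical figure chosen for illustration and does not come from the lab tests. The sketch also assumes the re-registration attempts are spread evenly across the half-interval window and ignores retransmissions.

```python
def recovery_register_rate(device_count, register_expires_s=30):
    """Average REGISTER rate (per second) arriving at the recovered primary SBC.

    Assumption: every device re-registers at half the expiration interval,
    and the attempts are spread evenly across that window.
    """
    window_s = register_expires_s / 2  # 15 s for a 30 s expiration
    return device_count / window_s


# 10,000 devices is a hypothetical population, not a measured value.
rate = recovery_register_rate(10_000)
print(round(rate, 1))  # 666.7
```

Under these assumptions, a population of 10,000 phones would present roughly 667 REGISTER requests per second to the newly recovered SBC, on top of any call traffic that also reverts.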

Polycom SIP Version 4

The Polycom 4.x software is available for many of Polycom's newer SoundPoint IP Phones, such as the 550 and 670. This software provides several additional configuration options for proper support of failover.

Parameter: authOptimizedInFailover
Default: 0
Recommendation: 1
Explanation: If set to 1, when failover occurs, the first new SIP request is sent to the server that sent the proxy authentication request. If set to 0, when failover occurs, the first new SIP request is sent to the server with the highest priority in the server list.

Parameter: onlySignalWithRegistered
Default: 1
Recommendation: 1
Explanation: If 1, the phone determines if the user is registered (voIpProt.SIP.outboundProxy.failOver.RegisterOn must be enabled).

Parameter: failRegistrationOn
Default: 1
Recommendation: 1
Explanation: If 1, the phone will silently invalidate an existing registration at the point of failing over.

Parameter: failOver.failBack.mode
Default: newRequests
Recommendation: DNSTTL
Explanation: The mode for failover failback.

  • newRequests: all new requests are forwarded first to the primary server regardless of the last used server.
  • DNSTTL: the phone tries the primary server again after a timeout equal to the DNS TTL configured for the server that the phone is registered to.
  • registration: the phone tries the primary server again when the registration renewal signaling begins.
  • duration: the phone tries the primary server again after the time specified by ...failOver.failBack.timeout expires.

Parameter: reRegisterOn
Default: 0
Recommendation: 1
Explanation: If 1, the phone will first attempt to register with (or via) the server to which the signaling is to be diverted, and only if the registration succeeds (200 OK with valid expires) will the signaling diversion proceed with that server.
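
The four failBack.mode options described above differ only in when the phone goes back to the primary server. The following sketch paraphrases those descriptions as a decision function. It is an illustration, not Polycom's implementation. The function name, its arguments, and the 3600-second failback timeout are assumptions, while the 600-second DNS TTL matches the lab zone used in these tests.

```python
def should_retry_primary(mode, seconds_on_secondary, dns_ttl_s=600,
                         failback_timeout_s=3600, renewing_registration=False):
    """Decide whether the next request should go back to the primary server.

    Paraphrases the failBack.mode descriptions; the timeout default is a
    placeholder, not a documented Polycom value.
    """
    if mode == "newRequests":
        return True                                # every new request tries the primary first
    if mode == "DNSTTL":
        return seconds_on_secondary >= dns_ttl_s   # wait out the DNS TTL
    if mode == "registration":
        return renewing_registration               # only when renewal signaling begins
    if mode == "duration":
        return seconds_on_secondary >= failback_timeout_s
    raise ValueError(f"unknown failBack mode: {mode}")


print(should_retry_primary("newRequests", 5))  # True
print(should_retry_primary("DNSTTL", 5))       # False
print(should_retry_primary("DNSTTL", 600))     # True
```

The recommended DNSTTL mode keeps the phone on the secondary until the TTL has elapsed, whereas the default newRequests mode sends every new request to the primary first.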

Configuration and DNS Lookups

The DUT was configured to perform DNS lookups for "lab2.e-c-group.com", and configured for transport="DNSnaptr".

We tested with both TCP and UDP on the Polycom 4.x software.

TCP Configured

$ORIGIN lab2.e-c-group.com.
@         600        IN        NAPTR 50 10 "S" "SIP+D2T" "" _sip._tcp
_sip._udp 600        IN        SRV   20 10 5060             lab-sd2
_sip._udp 600        IN        SRV   10 10 5060             lab-sd1
lab-sd1   600        IN        A                            216.128.128.11
lab-sd2   600        IN        A                            216.128.128.40

UDP Configured

$ORIGIN lab2.e-c-group.com.
_sip._udp 600        IN        SRV 20 10 5060 lab-sd2
_sip._udp 600        IN        SRV 10 10 5060 lab-sd1
lab-sd1   600        IN        A              216.128.128.11
lab-sd2   600        IN        A              216.128.128.40

Fault Detection

DUT detects the fault only when it fails to receive a reply to a SIP REGISTER.

If the SBC returns a SIP 400 response to DUT, DUT does not attempt lab-sd2.

If the SBC returns a SIP 503 response, the DUT attempts to re-register with lab-sd2. But it does not stay registered properly with lab-sd2; it allows the registration to expire. In effect, when the primary SBC returns a SIP 503, the DUT stays registered only part of the time.
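
The fault-detection results for the two Polycom software generations can be summarized as a lookup table. The sketch below only restates the behavior recorded in the Fault Detection sections of this report. The dictionary and function names are illustrative.

```python
# Observed reaction of the DUT to the primary SBC's answer to a REGISTER,
# keyed by (Polycom software generation, result from the primary SBC).
OBSERVED = {
    ("3.x", "no response"): "fails over to lab-sd2",
    ("3.x", "400"): "does not attempt lab-sd2",
    ("3.x", "503"): "does not attempt lab-sd2",
    ("4.x", "no response"): "fails over to lab-sd2",
    ("4.x", "400"): "does not attempt lab-sd2",
    ("4.x", "503"): "attempts lab-sd2 but lets the registration expire",
}


def observed_behavior(software, primary_result):
    """Look up the behavior recorded for a software generation and result."""
    return OBSERVED[(software, primary_result)]


print(observed_behavior("4.x", "503"))
```

The only difference between the two generations in this table is the SIP 503 case, where 4.x tries the secondary SBC but does not remain registered there.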

Affinity for Active SBC

With the settings recommended in this document, the DUT consistently uses a specific SBC until that SBC fails or the DNS TTL timer expires.

Revert

With the settings recommended in this document, the DUT continues to use the secondary SBC for the duration specified by the DNS TTL value. After the TTL expires, the DUT re-attempts the primary SBC.

Key Findings

Geographic Failover Should Work

The Polycom 4.x software should provide a functional failover option. Because the affinity/revert function is less aggressive than the 3.x software, the 4.x software should provide better functionality for a geographically-redundant system.

Adtran TA900E IAD

Testing began with firmware version A4.07.00E. However, an issue in this version causes the failover SIP REGISTER message to be malformed, so a firmware upgrade is required for the Adtran DUT to support failover. The firmware was upgraded to the most recent version, R10.5.0E.

Configuration and DNS Lookups

The DUT was configured to perform DNS lookups for "lab2.e-c-group.com". However, the Adtran IADs only support “SRV” lookup and have no support for “DNSnaptr”. The DUT does correctly prioritize the SBCs based on the DNS lookup.

Fault Detection

DUT detects the fault primarily when it fails to receive a reply to a SIP REGISTER. If the SIP trunk setting “sip-server rollover service-unavailable-or-timeout” is set, failover on a SIP 503 response can also occur for requests other than SIP REGISTER.

If the SBC returns a SIP 400 response to DUT, DUT does not attempt lab-sd2.

Affinity for Active SBC

Every new registration request restarts on the primary SBC.

Even after registering with the secondary SBC, lab-sd2, the DUT attempts every new INVITE on the primary SBC first.

Revert

Because every new request is attempted on the primary SBC first, each new REGISTER or INVITE causes an attempt to revert to the primary SBC.

Key Findings

Geographic Failover Should Work With Updated Firmware

The A4.07.00E firmware sends malformed failover SIP messages. However, the R10.5.0E firmware for the Adtran IAD provides a functional failover option.

Overload-After-Recovery Risk

Default Adtran IAD configurations will attempt to register with a recovered SBC during the registration expiration interval. For example, if all devices are configured to re-register every 30 seconds, and the IAD re-attempts registration at half of the registration expiration time, then every IAD will attempt to register with the primary SBC, after its recovery, in the space of 15 seconds. This is likely to cause an overload on the newly-recovered SBC.

Lab Testing

Test-By-Test Lab Testing records are available here. In these tests, we used the following equipment:

  • Polycom SoundPoint IP 330 0a1e
  • Polycom SoundPoint IP 601 35b2
  • Polycom SoundPoint IP 550 28ec8e
  • Adtran TA 908e
  • Acme Packet NN4250, 6.2 software, lab-sd1
  • Acme Packet NN4250, 6.2 software, lab-sd2
  • BroadWorks R18 Lab, lab2.e-c-group.com
  • Cisco Small Business SG300-10P PoE Ethernet Switch

 

This article is based on ECG Tech Report TR-ECG15273.
Lab Testing: Matt Keathley