The Polycom SoundPoint IP SIP phones and Adtran IADs are used as Hosted IP PBX access devices, managed by the BroadWorks platform. In a network without geographic redundancy, the devices use SIP to register to a single SIP SBC IP address.
To support geographic redundancy of SBCs, the devices must support registration to multiple IP addresses. Each device must select the proper IP address in each case to maintain its service and operation on the platform.
The behavior of a SIP Access Device controls the effectiveness of geo-redundant failover. There are no hard and fast standards on the proper behavior. For example:
This TR documents the best results determined for supporting this service on the Polycom SIP phones and geo-redundant SBCs, supported by BroadWorks.
Polycom -
Failover is based on a lack of response for both Polycom 3.x and 4.x software. It uses a 2-exponential backoff starting at 500 ms with a maximum delay of 2000 ms.
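The backoff described above can be sketched in a few lines of Python. This is only an illustration of a doubling timer that starts at 500 ms and is capped at 2000 ms. The number of retransmissions (`retries`) is a hypothetical parameter, since the count the phone actually uses is not stated here.

```python
def backoff_delays(initial_ms=500, max_ms=2000, retries=5):
    """Return the wait before each retransmission, doubling up to max_ms.

    retries is an assumed value for illustration only.
    """
    delays = []
    delay = initial_ms
    for _ in range(retries):
        delays.append(delay)
        delay = min(delay * 2, max_ms)  # double, but never exceed the cap
    return delays


def cumulative_seconds(delays_ms):
    """Return the elapsed time, in seconds, after each wait has run out."""
    elapsed = []
    total_ms = 0
    for d in delays_ms:
        total_ms += d
        elapsed.append(total_ms / 1000)
    return elapsed


delays = backoff_delays()
print(delays)                      # [500, 1000, 2000, 2000, 2000]
print(cumulative_seconds(delays))  # [0.5, 1.5, 3.5, 5.5, 7.5]
```

With these assumed five retransmissions, the device would spend 7.5 seconds waiting on the unresponsive SBC before giving up on it.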
Assuming a SIP REGISTER 200 OK Contact expires=30 value causing the phone to re-register every 30 seconds, the worst-case timeline for failover is as follows, for both Adtran TA900 and Polycom phones.
Times are given in seconds.
Adtran -
The IAD uses a 2-exponential backoff starting at 500 ms with a default maximum delay of 2000 ms.
Assuming default settings, the worst-case timeline for failover is as follows.
Times are given in seconds.
Polycom -
SIP Subscriptions are lost when the DUT re-registers with the secondary SBC. The DUT does not re-SUBSCRIBE when it switches to a new SBC, and BroadWorks does not keep the subscriptions coupled to the registration Contact, which would route them through the user's current SBC. The subscriptions are re-established after an hour.
Because subscriptions are not maintained between SBCs, features such as these will not be fully functional during that hour:
Adtran -
SIP Subscriptions are not applicable when dealing with Adtran IADs.
The Polycom 3.x software is the newest software available for many popular phones, including the SoundPoint IP 330 and 500. Therefore, for networks that include these and related generations of phones, the geo-redundancy behavior of these devices affects the core network operation significantly.
In the tests below, the DUT is a Polycom SoundPoint IP 330 running 3.1.2c software.
The DUT was configured to perform DNS lookups for "lab2.e-c-group.com", and configured for transport="DNSnaptr".
@ORIGIN lab2.e-c-group.com.
_sip._udp 600 IN SRV 20 10 5060 lab-sd2
_sip._udp 600 IN SRV 10 10 5060 lab-sd1
lab-sd1 600 IN A 216.128.128.11
lab-sd2 600 IN A 216.128.128.40
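The SRV records above make lab-sd1 the primary SBC, because a lower SRV priority value is tried first. A minimal Python sketch of that ordering follows. The `SrvRecord` type and `order_by_priority` helper are illustrative names, and weight-based selection among records of equal priority is left out because the two records here have different priorities.

```python
from typing import NamedTuple


class SrvRecord(NamedTuple):
    priority: int
    weight: int
    port: int
    target: str


# The two SRV records from the lab zone, in the order they appear in the file.
RECORDS = [
    SrvRecord(20, 10, 5060, "lab-sd2"),
    SrvRecord(10, 10, 5060, "lab-sd1"),
]


def order_by_priority(records):
    """Sort SRV records so the lowest priority value (most preferred) is first."""
    return sorted(records, key=lambda r: r.priority)


ordered = order_by_priority(RECORDS)
print([r.target for r in ordered])  # ['lab-sd1', 'lab-sd2']
```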
DUT detects the fault only when it fails to receive a reply to a SIP REGISTER.
If the SBC returns a SIP 400 or SIP 503 response to DUT, DUT does not attempt lab-sd2.
Every new registration request restarts on the primary SBC.
Even after the DUT has registered with the secondary SBC, lab-sd2, every new INVITE is first attempted on the primary SBC.
Because every new request is attempted on the primary SBC, each new REGISTER or INVITE will cause an attempt to revert to the primary SBC.
The Polycom 3.x software should provide a functional failover option.
Polycoms running 3.x will attempt to register with a recovered SBC during the registration expiration interval. For example, if all devices are configured to re-register every 30 seconds, and the Polycoms re-attempt registration at half of the registration expiration time, then every Polycom 3.x device will attempt to register with the primary SBC, after its recovery, in the space of 15 seconds. This is likely to cause an overload on the newly-recovered SBC.
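The arithmetic behind this concern can be made concrete. The sketch below estimates the average REGISTER rate a recovered SBC would see if every device re-registers within half of the expiration interval. The device count is a hypothetical figure chosen for illustration, and the estimate counts REGISTER requests only.

```python
def recovery_register_rate(device_count, expires_s=30, renew_fraction=0.5):
    """Average REGISTERs per second arriving at a recovered SBC.

    Assumes every device re-registers once within expires_s * renew_fraction
    seconds of the recovery, spread evenly across that window.
    """
    window_s = expires_s * renew_fraction
    return device_count / window_s


# Hypothetical population of 15,000 phones, 30-second expiry, renew at half.
print(recovery_register_rate(15000))  # 1000.0
```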
The Polycom 4.x software is available for many of Polycom's newer SoundPoint IP Phones, such as the 550 and 670. This software provides several additional configuration options for proper support of failover.
Parameter | Explanation | Default | Recommendation |
authOptimizedInFailover | If set to 1, when failover occurs, the first new SIP request is sent to the server that sent the proxy authentication request. If set to 0, when failover occurs, the first new SIP request is sent to the server with the highest priority in the server list. | 0 | 1 |
onlySignalWithRegistered | If 1, the phone determines if the user is registered (voIpProt.SIP.outboundProxy.failOver.RegisterOn must be enabled). | 1 | 1 |
failRegistrationOn | If 1, the phone will silently invalidate an existing registration at the point of failing over. | 1 | 1 |
failOver.failBack.mode | The mode for failover failback. newRequests – all new requests are forwarded first to the primary server regardless of the last used server. registration – the phone tries the primary server again when the registration renewal signaling begins. duration – the phone tries the primary server again after the time specified by ...failOver.failBack.timeout expires. | newRequests | DNSTTL |
reRegisterOn | If 1, the phone will first attempt to register with (or via) the server to which the signaling is to be diverted, and only if the registration succeeds (200 OK with valid expires) will the signaling diversion proceed with that server. | 0 | 1 |
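As a quick way to audit a phone configuration against the table above, the recommended and default values can be captured in a small lookup. This is a sketch only: the keys are the short parameter names as they appear in the table, not full configuration-file parameter paths, and the `deviations` helper is a hypothetical name.

```python
# Recommended and default values, copied from the table above.
RECOMMENDED = {
    "authOptimizedInFailover": "1",
    "onlySignalWithRegistered": "1",
    "failRegistrationOn": "1",
    "failOver.failBack.mode": "DNSTTL",
    "reRegisterOn": "1",
}

DEFAULTS = {
    "authOptimizedInFailover": "0",
    "onlySignalWithRegistered": "1",
    "failRegistrationOn": "1",
    "failOver.failBack.mode": "newRequests",
    "reRegisterOn": "0",
}


def deviations(config):
    """Return {parameter: (actual, recommended)} for every mismatch.

    A parameter missing from config is assumed to take its default value.
    """
    result = {}
    for name, wanted in RECOMMENDED.items():
        actual = config.get(name, DEFAULTS[name])
        if actual != wanted:
            result[name] = (actual, wanted)
    return result


# An unconfigured phone differs from the recommendation in three parameters.
for name, (actual, wanted) in deviations({}).items():
    print(f"{name}: {actual} -> {wanted}")
```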
The DUT was configured to perform DNS lookups for "lab2.e-c-group.com", and configured for transport="DNSnaptr".
We tested with both TCP and UDP on the Polycom 4.x software.
@ORIGIN lab2.e-c-group.com.
@ 600 IN NAPTR 50 10 "S" "SIP+D2T" "" _sip._tcp
_sip._udp 600 IN SRV 20 10 5060 lab-sd2
_sip._udp 600 IN SRV 10 10 5060 lab-sd1
lab-sd1 600 IN A 216.128.128.11
lab-sd2 600 IN A 216.128.128.40
@ORIGIN lab2.e-c-group.com.
_sip._udp 600 IN SRV 20 10 5060 lab-sd2
_sip._udp 600 IN SRV 10 10 5060 lab-sd1
lab-sd1 600 IN A 216.128.128.11
lab-sd2 600 IN A 216.128.128.40
DUT detects the fault only when it fails to receive a reply to a SIP REGISTER.
If the SBC returns a SIP 400 response to DUT, DUT does not attempt lab-sd2.
If the SBC returns a SIP 503 response, the DUT attempts to re-register with lab-sd2. But it does not stay registered properly with lab-sd2; it allows the registration to expire. In effect, when the primary SBC returns a SIP 503, the DUT stays registered only part of the time.
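The fault-detection results for the two Polycom software generations can be condensed into a lookup table. The sketch below only restates the lab observations reported in this article. The keys ("3.x", "4.x", "timeout", "400", "503") are labels chosen for illustration.

```python
# Observed DUT behavior when a REGISTER to the primary SBC (lab-sd1) fails.
OBSERVED = {
    ("3.x", "timeout"): "fails over to lab-sd2",
    ("3.x", "400"): "does not attempt lab-sd2",
    ("3.x", "503"): "does not attempt lab-sd2",
    ("4.x", "timeout"): "fails over to lab-sd2",
    ("4.x", "400"): "does not attempt lab-sd2",
    ("4.x", "503"): "attempts lab-sd2 but lets the registration expire",
}


def observed_behavior(software, event):
    """Look up the behavior recorded in the lab for a software/event pair."""
    return OBSERVED[(software, event)]


print(observed_behavior("4.x", "503"))
```

The only difference between the two generations in this table is the handling of a SIP 503 response.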
Using the settings that we recommend in this document, DUT continues to consistently use a specific SBC until it fails, or the DNS TTL timer expires.
Using the recommendations of this document, DUT will continue to use the secondary SBC for the duration specified by DNS TTL value. After this expires, DUT will re-attempt the primary SBC.
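The failback behavior under the recommended settings can be modeled roughly as follows. This is a simplified sketch, not the phone's implementation: it assumes the DNS TTL timer (600 seconds in the lab zone) is measured from the moment of failover, and the `choose_sbc` function is a hypothetical name.

```python
def choose_sbc(now_s, failed_over_at_s, dns_ttl_s=600):
    """Return the SBC a new request is sent to under DNSTTL failback.

    failed_over_at_s is None while the phone has never failed over.
    Assumption: the TTL window starts at the time of failover.
    """
    if failed_over_at_s is None:
        return "lab-sd1"  # normal operation on the primary SBC
    if now_s - failed_over_at_s < dns_ttl_s:
        return "lab-sd2"  # stay on the secondary until the TTL expires
    return "lab-sd1"      # TTL expired: re-attempt the primary SBC


print(choose_sbc(100, None))  # lab-sd1
print(choose_sbc(100, 50))    # lab-sd2
print(choose_sbc(650, 50))    # lab-sd1
```

Compared with the newRequests mode, this keeps the phone on one SBC for the whole TTL window instead of probing the primary with every new request.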
The Polycom 4.x software should provide a functional failover option. Because the affinity/revert function is less aggressive than the 3.x software, the 4.x software should provide better functionality for a geographically-redundant system.
Testing began with firmware version A4.07.00E. However, an issue in this version causes the SIP REGISTER sent on failover to be malformed, so a firmware upgrade is required for the Adtran DUT to support failover. The firmware was upgraded to the most recent version, R10.5.0E.
The DUT was configured to perform DNS lookups for "lab2.e-c-group.com". However, the Adtran IADs only support “SRV” lookup and have no support for “DNSnaptr”. The DUT does correctly prioritize the SBCs based on the DNS lookup.
DUT detects the fault primarily when it fails to receive a reply to a SIP REGISTER. If the SIP Trunk setting “sip-server rollover service-unavailable-or-timeout” is set, then failover on a SIP 503 response can occur for requests other than SIP REGISTER.
If the SBC returns a SIP 400 response to DUT, DUT does not attempt lab-sd2.
Every new registration request restarts on the primary SBC.
Even after the DUT has registered with the secondary SBC, lab-sd2, every new INVITE is first attempted on the primary SBC.
Because every new request is attempted on the primary SBC, each new REGISTER or INVITE will cause an attempt to revert to the primary SBC.
The A4.07.00E software sends malformed failover SIP messages. However, the R10.5.0E firmware for the Adtran IAD will provide a functional failover option.
Default Adtran IAD configurations will attempt to register with a recovered SBC during the registration expiration interval. For example, if all devices are configured to re-register every 30 seconds, and the IAD re-attempts registration at half of the registration expiration time, then every IAD will attempt to register with the primary SBC, after its recovery, in the space of 15 seconds. This is likely to cause an overload on the newly-recovered SBC.
Test-By-Test Lab Testing records are available here. In these tests, we used the following equipment:
This article is based on ECG Tech Report TR-ECG15273.
Lab Testing: Matt Keathley