#609 Prosody won't try other SRV records if the first one fails
Reporter
Jack L. Frost
Owner
Zash
Created
Updated
Tags
Priority-Medium
Type-Defect
Status-Invalid
Jack L. Frost
on
I have a somewhat unusual use case with the SRV records for my domain:
_xmpp-server._tcp.fleshless.org. 300 IN SRV 15 0 5269 malganis.fleshless.org.
_xmpp-server._tcp.fleshless.org. 300 IN SRV 5 0 5269 h.fleshless.org.
The higher priority one is an internal IP for the hyperboria darknet:
h.fleshless.org has IPv6 address fcf2:ff45:a10:2ca1:fb5c:426d:3505:4f18
The lower priority one has two address records (A and AAAA), both public, both listening for connections:
malganis.fleshless.org has address 176.9.22.146
malganis.fleshless.org has IPv6 address 2a01:4f8:150:2085::2
Prosody, as tested on two different machines, won't even try connecting to the lower priority SRV record if the machine it's on has an IPv6 address. As a result, every single IPv6-enabled jabber server out there is probably unable to connect to me.
Steps to reproduce: try to establish an s2s connection from an IPv6-enabled box to fleshless.org.
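For readers unfamiliar with SRV ordering, here is a minimal Python sketch of how a client would order the two records quoted above. It assumes the usual RFC 2782 rule that the lowest priority number is tried first. The `SrvRecord` type and `by_priority` helper are hypothetical names for illustration, not Prosody code, and weight-based selection among equal priorities is left out.

```python
from typing import NamedTuple


class SrvRecord(NamedTuple):
    priority: int
    weight: int
    port: int
    target: str


# The two records quoted in this report.
records = [
    SrvRecord(15, 0, 5269, "malganis.fleshless.org."),
    SrvRecord(5, 0, 5269, "h.fleshless.org."),
]


def by_priority(recs):
    """Order targets so the lowest priority number comes first (RFC 2782)."""
    # Weighted random selection among equal priorities is omitted here.
    return sorted(recs, key=lambda r: r.priority)


ordered = by_priority(records)
for r in ordered:
    print(r.priority, r.target)
```

Under that rule h.fleshless.org (priority 5) is the first target, and malganis.fleshless.org (priority 15) is the fallback.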
Jack L. Frost
on
every single IPv6-enabled jabber server using prosody*, of course.
Jack L. Frost
on
Oh. Forgot the details.
The tested prosody versions are 0.9.9 (Debian wheezy) and 0.9.10 (Debian jessie).
Jack L. Frost
on
Scratch that, the only one misbehaving is the jessie one now:
<pre>Jan 28 19:09:13 s2soutfb4b40 info Beginning new connection attempt to fleshless.org ([fcf2:ff45:a10:2ca1:fb5c:426d:3505:4f18]:5269)
Jan 28 19:10:43 s2soutfb4b40 info outgoing s2s stream vdrandom.org->fleshless.org closed: connection-timeout
Jan 28 19:10:43 s2soutfb4b40 info Sending error replies for 1 queued stanzas because of failed outgoing connection to fleshless.org
Jan 28 23:12:23 s2soutff8050 info Beginning new connection attempt to fleshless.org ([fcf2:ff45:a10:2ca1:fb5c:426d:3505:4f18]:5269)
Jan 28 23:13:53 s2soutff8050 info outgoing s2s stream vdrandom.org->fleshless.org closed: connection-timeout
Jan 28 23:13:53 s2soutff8050 info Sending error replies for 2 queued stanzas because of failed outgoing connection to fleshless.org
Jan 29 10:36:18 s2sout14a7730 info Beginning new connection attempt to fleshless.org ([fcf2:ff45:a10:2ca1:fb5c:426d:3505:4f18]:5269)
Jan 29 10:37:48 s2sout14a7730 info outgoing s2s stream vdrandom.org->fleshless.org closed: connection-timeout
Jan 29 10:37:48 s2sout14a7730 info Sending error replies for 1 queued stanzas because of failed outgoing connection to fleshless.org
Jan 29 11:52:38 s2sout131d010 info Beginning new connection attempt to fleshless.org ([fcf2:ff45:a10:2ca1:fb5c:426d:3505:4f18]:5269)
Jan 29 11:54:08 s2sout131d010 info outgoing s2s stream vdrandom.org->fleshless.org closed: connection-timeout
Jan 29 11:54:08 s2sout131d010 info Sending error replies for 2 queued stanzas because of failed outgoing connection to fleshless.org
Jan 29 12:11:41 s2sout10854d0 info Beginning new connection attempt to fleshless.org ([fcf2:ff45:a10:2ca1:fb5c:426d:3505:4f18]:5269)
Jan 29 12:13:11 s2sout10854d0 info outgoing s2s stream vdrandom.org->fleshless.org closed: connection-timeout
Jan 29 12:13:11 s2sout10854d0 info Sending error replies for 6 queued stanzas because of failed outgoing connection to fleshless.org
Jan 29 12:13:17 s2soute635e0 info Beginning new connection attempt to fleshless.org ([fcf2:ff45:a10:2ca1:fb5c:426d:3505:4f18]:5269)
Jan 29 12:14:47 s2soute635e0 info outgoing s2s stream vdrandom.org->fleshless.org closed: connection-timeout
Jan 29 12:14:47 s2soute635e0 info Sending error replies for 1 queued stanzas because of failed outgoing connection to fleshless.org
Jan 29 12:25:55 s2souteeac40 info Beginning new connection attempt to fleshless.org ([fcf2:ff45:a10:2ca1:fb5c:426d:3505:4f18]:5269)
Jan 29 12:27:25 s2souteeac40 info outgoing s2s stream vdrandom.org->fleshless.org closed: connection-timeout</pre>
It only ever tries the highest priority one.
Jack L. Frost
on
It seems to work perfectly for everyone else. The issue is thus invalid; we'll try to figure out what that specific box's problem is.
Zash
on
Prosody has a connection timeout for the entire s2s connection attempt that defaults to 90 seconds. It can be adjusted by setting 's2s_timeout' in the config.
If the individual TCP connection attempts add up to more than that, the whole connection attempt is aborted. On Linux, look up the kernel parameter 'net.ipv4.tcp_syn_retries'. On a server mainly used for XMPP it may make sense to turn it down a notch.
Implementing the 'Happy Eyeballs' algorithm (https://tools.ietf.org/html/rfc6555) is something we would want to look at some day.
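To make the arithmetic concrete, the sketch below estimates how long a single TCP connect to an unresponsive address takes for a given 'net.ipv4.tcp_syn_retries' value, and compares it with the 90-second default mentioned above. It assumes an initial retransmission timeout of 1 second that doubles after each SYN, as on recent Linux kernels. The affected machine's actual settings are not given in this thread, and `syn_connect_timeout` is a hypothetical helper name.

```python
def syn_connect_timeout(syn_retries, initial_rto=1.0):
    """Approximate seconds until connect() gives up on a silent peer.

    Counts the initial SYN plus `syn_retries` retransmissions, waiting
    initial_rto * 2**i seconds after the i-th send (assumed backoff).
    """
    return sum(initial_rto * 2 ** i for i in range(syn_retries + 1))


S2S_TIMEOUT = 90  # default s2s_timeout stated in the comment above

for retries in range(3, 7):
    t = syn_connect_timeout(retries)
    verdict = "fits within" if t < S2S_TIMEOUT else "exceeds"
    print(f"tcp_syn_retries={retries}: ~{t:.0f}s, {verdict} s2s_timeout")
```

Under these assumptions 6 retries take about 127 seconds, which is longer than the 90-second s2s timeout, while 5 retries take about 63 seconds and leave time for a second target.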
Jack L. Frost
on
Ah. I misunderstood you in the chat.
Thanks for the reply, it's at least clear that reconfiguring all the servers in the world is not feasible :)
Zash
on
Caused by the TCP connection timeout being longer than the s2s connection timeout.
This can be compensated for by adjusting the configuration as mentioned in the earlier comment. There's also a 'network_settings.connect_timeout' option for net.server.
Closing this issue.
(Maybe open a new issue for happy eyeballs.)
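The effect described in this comment can be illustrated with a simplified model of strictly sequential connection attempts under one overall deadline. This is a sketch, not Prosody's actual logic: `targets_attempted` is a hypothetical function, it pessimistically assumes every attempt runs to its full timeout, and the 127-second and 20-second per-attempt values are illustrative assumptions.

```python
def targets_attempted(targets, per_attempt_timeout, overall_timeout):
    """Return the targets a sequential client starts trying before the
    overall deadline, assuming each attempt runs to its full timeout."""
    attempted, elapsed = [], 0.0
    for target in targets:
        if elapsed >= overall_timeout:
            break  # overall deadline reached; later targets are never tried
        attempted.append(target)
        elapsed += per_attempt_timeout
    return attempted


targets = ["h.fleshless.org", "malganis.fleshless.org"]

# Per-attempt timeout longer than the overall one: only the first target is tried.
slow = targets_attempted(targets, per_attempt_timeout=127, overall_timeout=90)
print(slow)

# A shorter per-attempt timeout leaves room for the fallback target.
fast = targets_attempted(targets, per_attempt_timeout=20, overall_timeout=90)
print(fast)
```

In this model the fallback record is only reached when the per-attempt timeout is shorter than the overall timeout, which matches the explanation given for closing the issue.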