#871 Dead s2sout connection hinders connectivity to affected domain for hours

Reporter Jonas Wielicki
Owner Nobody
Created
Updated
Stars ★ (1)  
Tags
  • Milestone-0.9
  • Priority-Medium
  • Type-Defect
  • Status-NeedInfo
  1. Jonas Wielicki on

    What steps will reproduce the problem? 1. Have an s2sout TCP connection which is "unauthed"(?) and which doesn’t appear to be received by the peer anymore and which doesn’t get ReSeTted by the peer. (sorry, I don’t have logs from when this situation occurred. It should be reproducable by simply dropping packets on the TCP connection, for example with https://github.com/horazont/aioxmpp/blob/devel/utils/mutetcp.py ) What is the expected output? What do you see instead? Following scenario: my domain (conference.zombofant.net), peer domain (foo.example). s2sout from conference.zombofant.net to foo.example is in the described dead condition. I expect prosody to close that s2sout at some point (e.g. after 5 minutes or so). However, the following happens: 1. foo.example opens connection to conference.zombofant.net; prosody sees new s2sin 2. prosody attempts dialback verification over dead s2sout 3. s2sin times out Some logs: Mar 28 04:30:23 s2sin11681a0 debug Incoming s2s connection Mar 28 04:30:23 s2sin11681a0 debug Incoming s2s received <stream:stream xmlns='http://etherx.jabber.org/streams' to='conference.zombofant.net' version='1.0' from='foo.example'> Mar 28 04:30:23 mod_s2s debug sending: <?xml version='1.0'?> Mar 28 04:30:23 mod_s2s debug sending: <stream:stream xmlns:db='jabber:server:dialback' xmlns:stream='http://etherx.jabber.org/streams' version='1.0' from='conference.zombofant.net' id='1980de8a-01ee-420d-b5d8-045707bea6b5' to='foo.example' xml:lang='en' xmlns='jabber:server'> Mar 28 04:30:23 mod_s2s debug Sending stream features: <stream:features><dialback xmlns='urn:xmpp:features:dialback'/><starttls xmlns='urn:ietf:params:xml:ns:xmpp-tls'/></stream:features> Mar 28 04:30:23 mod_s2s debug sending: <stream:features> Mar 28 04:30:23 s2sin11681a0 debug Received[s2sin_unauthed]: <starttls xmlns='urn:ietf:params:xml:ns:xmpp-tls'> Mar 28 04:30:23 mod_s2s debug sending: <proceed xmlns='urn:ietf:params:xml:ns:xmpp-tls'> Mar 28 04:30:23 socket debug server.lua: we need to do tls, but delaying until send buffer empty Mar 28 04:30:23 s2sin11681a0 debug TLS negotiation started for s2sin_unauthed... Mar 28 04:30:23 socket debug server.lua: attempting to start tls on tcp{client}: 0x2f3f718 Mar 28 04:30:23 socket debug server.lua: ssl handshake done Mar 28 04:30:23 s2sin11681a0 debug Incoming s2s received <stream:stream xmlns='http://etherx.jabber.org/streams' to='conference.zombofant.net' version='1.0' from='foo.example'> Mar 28 04:30:23 s2sin11681a0 debug certificate chain validation result: invalid Mar 28 04:30:23 s2sin11681a0 debug certificate error(s) at depth 1: self signed certificate in certificate chain Mar 28 04:30:23 mod_s2s debug sending: <?xml version='1.0'?> Mar 28 04:30:23 mod_s2s debug sending: <stream:stream xmlns:db='jabber:server:dialback' xmlns:stream='http://etherx.jabber.org/streams' version='1.0' from='conference.zombofant.net' id='b0c5e727-7349-4ca9-8cad-8ffea75f4355' to='foo.example' xml:lang='en' xmlns='jabber:server'> Mar 28 04:30:23 mod_s2s debug Sending stream features: <stream:features><dialback xmlns='urn:xmpp:features:dialback'/></stream:features> Mar 28 04:30:23 mod_s2s debug sending: <stream:features> Mar 28 04:30:23 s2sin11681a0 debug Received[s2sin_unauthed]: <result xmlns='jabber:server:dialback' to='conference.zombofant.net' from='foo.example'> Mar 28 04:30:23 s2sin11681a0 debug asking foo.example if key 0a7a03e73269dd1b64e673c3fb4647750c7855b538da511f80e563d57e0bc8ba belongs to them Mar 28 04:30:23 s2sout28ea980 debug trying to send over unauthed s2sout to foo.example Mar 28 04:30:23 s2sout28ea980 debug stanza [db:verify] queued Mar 28 04:31:53 s2sin11681a0 debug Destroying incomplete session foo.example->conference.zombofant.net due to inactivity Mar 28 04:31:53 s2sin11681a0 debug Disconnecting (unknown host)[s2sin_unauthed], <stream:error> is: connection-timeout Mar 28 04:31:53 mod_s2s debug sending: <stream:error> Mar 28 04:31:53 mod_s2s debug sending: </stream:stream> Mar 28 04:31:53 s2sin11681a0 info incoming s2s stream foo.example->conference.zombofant.net closed: connection-timeout Mar 28 04:31:53 s2sin11681a0 debug Destroying incoming session foo.example->conference.zombofant.net: connection-timeout Mar 28 04:31:53 s2sin11681a0 debug s2s disconnected: nil->nil (connection closed) What version of the product are you using? On what operating system? root@xmpp-public:~# apt show prosody Package: prosody Version: 0.9.7-2+deb8u3 Please provide any additional information below. I think that the following mitigations are not sufficient: * mod_pinger: c2s only * mod_s2s_idle_timeout: may not work depending on the rate connections from the other side are attempted, as the s2sout timeout would be based on when the last db:verify was sent * mod_s2s_keepalive may not work because it doesn’t care whether ping replies are received and I’m not sure it would even try to send pings because according to the logs the s2sout is "unauthed" ("trying to send over unauthed s2sout to foo.example").

  2. Jonas Wielicki on

    FWIW, I resolved the situation using s2s:close on admin_telnet. So I unfortunately cannot provide information such as "was the remote side sending ACKs on the TCP" or "how full was the TCP buffer". I should’ve thought about that before killing the s2s connection.

  3. Zash on

    More information would indeed have been useful, especially regarding the early life of s2sout28ea980 I've seen something similar on a different server, possibly related to DNS failures.

    Changes
    • tags Milestone-0.9 Status-NeedInfo

New comment