#871 Dead s2sout connection hinders connectivity to affected domain for hours
Reporter
Jonas Wielicki
Owner
MattJ
Created
Updated
Stars
★★★ (4)
Tags
Priority-Medium
Type-Defect
Milestone-0.9
Status-Fixed
Jonas Wielicki
on
What steps will reproduce the problem?
1. Have an s2sout TCP connection which is "unauthed"(?) and which doesn’t appear to be received by the peer anymore and which doesn’t get ReSeTted by the peer.
(sorry, I don’t have logs from when this situation occurred. It should be reproducable by simply dropping packets on the TCP connection, for example with https://github.com/horazont/aioxmpp/blob/devel/utils/mutetcp.py )
What is the expected output? What do you see instead?
Following scenario: my domain (conference.zombofant.net), peer domain (foo.example). s2sout from conference.zombofant.net to foo.example is in the described dead condition. I expect prosody to close that s2sout at some point (e.g. after 5 minutes or so).
However, the following happens:
1. foo.example opens connection to conference.zombofant.net; prosody sees new s2sin
2. prosody attempts dialback verification over dead s2sout
3. s2sin times out
Some logs:
Mar 28 04:30:23 s2sin11681a0 debug Incoming s2s connection
Mar 28 04:30:23 s2sin11681a0 debug Incoming s2s received <stream:stream xmlns='http://etherx.jabber.org/streams' to='conference.zombofant.net' version='1.0' from='foo.example'>
Mar 28 04:30:23 mod_s2s debug sending: <?xml version='1.0'?>
Mar 28 04:30:23 mod_s2s debug sending: <stream:stream xmlns:db='jabber:server:dialback' xmlns:stream='http://etherx.jabber.org/streams' version='1.0' from='conference.zombofant.net' id='1980de8a-01ee-420d-b5d8-045707bea6b5' to='foo.example' xml:lang='en' xmlns='jabber:server'>
Mar 28 04:30:23 mod_s2s debug Sending stream features: <stream:features><dialback xmlns='urn:xmpp:features:dialback'/><starttls xmlns='urn:ietf:params:xml:ns:xmpp-tls'/></stream:features>
Mar 28 04:30:23 mod_s2s debug sending: <stream:features>
Mar 28 04:30:23 s2sin11681a0 debug Received[s2sin_unauthed]: <starttls xmlns='urn:ietf:params:xml:ns:xmpp-tls'>
Mar 28 04:30:23 mod_s2s debug sending: <proceed xmlns='urn:ietf:params:xml:ns:xmpp-tls'>
Mar 28 04:30:23 socket debug server.lua: we need to do tls, but delaying until send buffer empty
Mar 28 04:30:23 s2sin11681a0 debug TLS negotiation started for s2sin_unauthed...
Mar 28 04:30:23 socket debug server.lua: attempting to start tls on tcp{client}: 0x2f3f718
Mar 28 04:30:23 socket debug server.lua: ssl handshake done
Mar 28 04:30:23 s2sin11681a0 debug Incoming s2s received <stream:stream xmlns='http://etherx.jabber.org/streams' to='conference.zombofant.net' version='1.0' from='foo.example'>
Mar 28 04:30:23 s2sin11681a0 debug certificate chain validation result: invalid
Mar 28 04:30:23 s2sin11681a0 debug certificate error(s) at depth 1: self signed certificate in certificate chain
Mar 28 04:30:23 mod_s2s debug sending: <?xml version='1.0'?>
Mar 28 04:30:23 mod_s2s debug sending: <stream:stream xmlns:db='jabber:server:dialback' xmlns:stream='http://etherx.jabber.org/streams' version='1.0' from='conference.zombofant.net' id='b0c5e727-7349-4ca9-8cad-8ffea75f4355' to='foo.example' xml:lang='en' xmlns='jabber:server'>
Mar 28 04:30:23 mod_s2s debug Sending stream features: <stream:features><dialback xmlns='urn:xmpp:features:dialback'/></stream:features>
Mar 28 04:30:23 mod_s2s debug sending: <stream:features>
Mar 28 04:30:23 s2sin11681a0 debug Received[s2sin_unauthed]: <result xmlns='jabber:server:dialback' to='conference.zombofant.net' from='foo.example'>
Mar 28 04:30:23 s2sin11681a0 debug asking foo.example if key 0a7a03e73269dd1b64e673c3fb4647750c7855b538da511f80e563d57e0bc8ba belongs to them
Mar 28 04:30:23 s2sout28ea980 debug trying to send over unauthed s2sout to foo.example
Mar 28 04:30:23 s2sout28ea980 debug stanza [db:verify] queued
Mar 28 04:31:53 s2sin11681a0 debug Destroying incomplete session foo.example->conference.zombofant.net due to inactivity
Mar 28 04:31:53 s2sin11681a0 debug Disconnecting (unknown host)[s2sin_unauthed], <stream:error> is: connection-timeout
Mar 28 04:31:53 mod_s2s debug sending: <stream:error>
Mar 28 04:31:53 mod_s2s debug sending: </stream:stream>
Mar 28 04:31:53 s2sin11681a0 info incoming s2s stream foo.example->conference.zombofant.net closed: connection-timeout
Mar 28 04:31:53 s2sin11681a0 debug Destroying incoming session foo.example->conference.zombofant.net: connection-timeout
Mar 28 04:31:53 s2sin11681a0 debug s2s disconnected: nil->nil (connection closed)
What version of the product are you using? On what operating system?
root@xmpp-public:~# apt show prosody
Package: prosody
Version: 0.9.7-2+deb8u3
Please provide any additional information below.
I think that the following mitigations are not sufficient:
* mod_pinger: c2s only
* mod_s2s_idle_timeout: may not work depending on the rate connections from the other side are attempted, as the s2sout timeout would be based on when the last db:verify was sent
* mod_s2s_keepalive may not work because it doesn’t care whether ping replies are received and I’m not sure it would even try to send pings because according to the logs the s2sout is "unauthed" ("trying to send over unauthed s2sout to foo.example").
Jonas Wielicki
on
FWIW, I resolved the situation using s2s:close on admin_telnet. So I unfortunately cannot provide information such as "was the remote side sending ACKs on the TCP" or "how full was the TCP buffer". I should’ve thought about that before killing the s2s connection.
Zash
on
More information would indeed have been useful, especially regarding the early life of s2sout28ea980
I've seen something similar on a different server, possibly related to DNS failures.
Changes
tags Status-NeedInfo Milestone-0.9
Martin
on
I'm not sure but I might experience the same problem:
- I am running prosody on a shared hoster (uberspace.de, toast not involved as I used `./configure --prefix=${HOME} && make && make install`
- I was in conversations@conference.siacs.eu
- I updated prosody 0.10 to hg:8ea0484871e8 I restarted my server (19:25)
- prosody.log tells me s2s to conference.siacs.eu was successful but I can't connect to conversations@conference.siacs.eu
- At 21:21 I restarted again to no avail
```
prosodyctl about
Prosody hg:8ea0484871e8
# Prosody directories
Data directory: /home/md85/var/lib/prosody
Config directory: /home/md85/etc/prosody
Source directory: /home/md85/lib/prosody
Plugin directories:
/home/md85/opt/prosody-modules/ - prosody-modules rev: a6d19aea9b73
/home/md85/lib/prosody/modules/
# Lua environment
Lua version: Lua 5.1
Lua module search paths:
/home/md85/lib/prosody/?.lua
/home/md85/.luarocks/share/lua/5.1/?.lua
/home/md85/.luarocks/share/lua/5.1/?/init.lua
/usr/share/lua/5.1/?.lua
/usr/share/lua/5.1/?/init.lua
/home/md85/.luarocks/share/lua/5.1/?.lua
/home/md85/.luarocks/share/lua/5.1/?/init.lua
/usr/lib64/lua/5.1/?.lua
/usr/lib64/lua/5.1/?/init.lua
/home/md85/.luarocks/share/lua/5.1/?.lua
/home/md85/.luarocks/share/lua/5.1/?/init.lua
Lua C module search paths:
/home/md85/lib/prosody/?.so
/home/md85/.luarocks/lib/lua/5.1/?.so
/usr/lib/lua/5.1/?.so
/home/md85/.luarocks/lib/lua/5.1/?.so
/usr/lib64/lua/5.1/?.so
/usr/lib64/lua/5.1/loadall.so
/home/md85/.luarocks/lib/lua/5.1/?.so
LuaRocks: Installed (2.1.2)
# Lua module versions
lfs: LuaFileSystem 1.6.3
libevent: 2.1.8-stable
luaevent: 0.4.4
lxp: LuaExpat 1.3.0
socket: LuaSocket 3.0-rc1
ssl: 0.6
```
```
grep siacs.eu ~/var/lib/prosody/prosody.log
Aug 28 17:38:08 s2sout5c4f760 info Outgoing s2s connection mdosch.de->conference.siacs.eu complete
Aug 28 17:38:17 mdosch.de:saslauth info Accepting SASL EXTERNAL identity from conference.siacs.eu
Aug 28 17:38:17 s2sin12124230 info Incoming s2s connection conference.siacs.eu->mdosch.de complete
Aug 30 00:02:39 s2sout5c4f760 info Outgoing s2s stream mdosch.de->conference.siacs.eu closed: Received SIGTERM
Aug 30 00:02:39 s2sin12124230 info Incoming s2s stream conference.siacs.eu->mdosch.de closed: Received SIGTERM
Aug 30 00:03:01 s2sout766cf40 info Outgoing s2s connection mdosch.de->conference.siacs.eu complete
Aug 30 00:03:02 mdosch.de:saslauth info Accepting SASL EXTERNAL identity from conference.siacs.eu
Aug 30 00:03:02 s2sin7a41090 info Incoming s2s connection conference.siacs.eu->mdosch.de complete
Sep 04 19:25:21 s2sout766cf40 info Outgoing s2s stream mdosch.de->conference.siacs.eu closed: Received SIGTERM
Sep 04 19:25:21 s2sin7a41090 info Incoming s2s stream conference.siacs.eu->mdosch.de closed: Received SIGTERM
Sep 04 19:25:36 s2sout869e2d0 info Outgoing s2s connection mdosch.de->conference.siacs.eu complete
Sep 04 21:21:38 s2sout869e2d0 info Outgoing s2s stream mdosch.de->conference.siacs.eu closed: Received SIGTERM
Sep 04 21:21:59 s2sout7969bc0 info Outgoing s2s connection mdosch.de->conference.siacs.eu complete
```
As you mentioned a possible DNS issue I remembered I had some DNS stuff in my error log (but without ever experiencing problems) I had a look but there was no recent activity:
```
grep DNS ~/var/lib/prosody/prosody.err
Jul 29 17:12:15 adns error Exhausted all 1 configured DNS servers, next lookup will try 127.0.0.1 again
Jul 31 17:45:52 adns error Exhausted all 1 configured DNS servers, next lookup will try 127.0.0.1 again
Aug 07 19:23:33 adns error Exhausted all 1 configured DNS servers, next lookup will try 127.0.0.1 again
Aug 11 23:00:51 adns error Exhausted all 1 configured DNS servers, next lookup will try 127.0.0.1 again
Aug 13 15:29:09 adns error Exhausted all 1 configured DNS servers, next lookup will try 127.0.0.1 again
Aug 15 14:57:09 adns error Exhausted all 1 configured DNS servers, next lookup will try 127.0.0.1 again
Aug 18 16:41:14 adns error Exhausted all 1 configured DNS servers, next lookup will try 127.0.0.1 again
Aug 19 14:30:18 adns error Exhausted all 1 configured DNS servers, next lookup will try 127.0.0.1 again
```
Also there were no errors logged today:
```
tail -n1 ~/var/lib/prosody/prosody.err
Aug 19 14:30:18 adns error Exhausted all 1 configured DNS servers, next lookup will try 127.0.0.1 again
```
I haven’t noticed the issue for quite a while now. Then again, I also hadn’t noticed it for quite a while before initially reporting it, unfortunately.
MattJ
on
Closing this. Rationale:
This was originally reported against 0.9.7 (released in 2014). That branch is now beyond obsolete.
There have been no recent reports of this issue that I'm aware of.
Other bugs that may have been responsible for this, by not firing timers that would clean up the session, are #460 and #1572 (both fixed) (from the original report here, it looks like it was quite possibly using lua-event affected by #460).
Finally, the connection logic has been replaced by net.connect in trunk.
What steps will reproduce the problem? 1. Have an s2sout TCP connection which is "unauthed"(?) and which doesn’t appear to be received by the peer anymore and which doesn’t get ReSeTted by the peer. (sorry, I don’t have logs from when this situation occurred. It should be reproducable by simply dropping packets on the TCP connection, for example with https://github.com/horazont/aioxmpp/blob/devel/utils/mutetcp.py ) What is the expected output? What do you see instead? Following scenario: my domain (conference.zombofant.net), peer domain (foo.example). s2sout from conference.zombofant.net to foo.example is in the described dead condition. I expect prosody to close that s2sout at some point (e.g. after 5 minutes or so). However, the following happens: 1. foo.example opens connection to conference.zombofant.net; prosody sees new s2sin 2. prosody attempts dialback verification over dead s2sout 3. s2sin times out Some logs: Mar 28 04:30:23 s2sin11681a0 debug Incoming s2s connection Mar 28 04:30:23 s2sin11681a0 debug Incoming s2s received <stream:stream xmlns='http://etherx.jabber.org/streams' to='conference.zombofant.net' version='1.0' from='foo.example'> Mar 28 04:30:23 mod_s2s debug sending: <?xml version='1.0'?> Mar 28 04:30:23 mod_s2s debug sending: <stream:stream xmlns:db='jabber:server:dialback' xmlns:stream='http://etherx.jabber.org/streams' version='1.0' from='conference.zombofant.net' id='1980de8a-01ee-420d-b5d8-045707bea6b5' to='foo.example' xml:lang='en' xmlns='jabber:server'> Mar 28 04:30:23 mod_s2s debug Sending stream features: <stream:features><dialback xmlns='urn:xmpp:features:dialback'/><starttls xmlns='urn:ietf:params:xml:ns:xmpp-tls'/></stream:features> Mar 28 04:30:23 mod_s2s debug sending: <stream:features> Mar 28 04:30:23 s2sin11681a0 debug Received[s2sin_unauthed]: <starttls xmlns='urn:ietf:params:xml:ns:xmpp-tls'> Mar 28 04:30:23 mod_s2s debug sending: <proceed xmlns='urn:ietf:params:xml:ns:xmpp-tls'> Mar 28 04:30:23 socket debug server.lua: we need to do tls, but delaying until send buffer empty Mar 28 04:30:23 s2sin11681a0 debug TLS negotiation started for s2sin_unauthed... Mar 28 04:30:23 socket debug server.lua: attempting to start tls on tcp{client}: 0x2f3f718 Mar 28 04:30:23 socket debug server.lua: ssl handshake done Mar 28 04:30:23 s2sin11681a0 debug Incoming s2s received <stream:stream xmlns='http://etherx.jabber.org/streams' to='conference.zombofant.net' version='1.0' from='foo.example'> Mar 28 04:30:23 s2sin11681a0 debug certificate chain validation result: invalid Mar 28 04:30:23 s2sin11681a0 debug certificate error(s) at depth 1: self signed certificate in certificate chain Mar 28 04:30:23 mod_s2s debug sending: <?xml version='1.0'?> Mar 28 04:30:23 mod_s2s debug sending: <stream:stream xmlns:db='jabber:server:dialback' xmlns:stream='http://etherx.jabber.org/streams' version='1.0' from='conference.zombofant.net' id='b0c5e727-7349-4ca9-8cad-8ffea75f4355' to='foo.example' xml:lang='en' xmlns='jabber:server'> Mar 28 04:30:23 mod_s2s debug Sending stream features: <stream:features><dialback xmlns='urn:xmpp:features:dialback'/></stream:features> Mar 28 04:30:23 mod_s2s debug sending: <stream:features> Mar 28 04:30:23 s2sin11681a0 debug Received[s2sin_unauthed]: <result xmlns='jabber:server:dialback' to='conference.zombofant.net' from='foo.example'> Mar 28 04:30:23 s2sin11681a0 debug asking foo.example if key 0a7a03e73269dd1b64e673c3fb4647750c7855b538da511f80e563d57e0bc8ba belongs to them Mar 28 04:30:23 s2sout28ea980 debug trying to send over unauthed s2sout to foo.example Mar 28 04:30:23 s2sout28ea980 debug stanza [db:verify] queued Mar 28 04:31:53 s2sin11681a0 debug Destroying incomplete session foo.example->conference.zombofant.net due to inactivity Mar 28 04:31:53 s2sin11681a0 debug Disconnecting (unknown host)[s2sin_unauthed], <stream:error> is: connection-timeout Mar 28 04:31:53 mod_s2s debug sending: <stream:error> Mar 28 04:31:53 mod_s2s debug sending: </stream:stream> Mar 28 04:31:53 s2sin11681a0 info incoming s2s stream foo.example->conference.zombofant.net closed: connection-timeout Mar 28 04:31:53 s2sin11681a0 debug Destroying incoming session foo.example->conference.zombofant.net: connection-timeout Mar 28 04:31:53 s2sin11681a0 debug s2s disconnected: nil->nil (connection closed) What version of the product are you using? On what operating system? root@xmpp-public:~# apt show prosody Package: prosody Version: 0.9.7-2+deb8u3 Please provide any additional information below. I think that the following mitigations are not sufficient: * mod_pinger: c2s only * mod_s2s_idle_timeout: may not work depending on the rate connections from the other side are attempted, as the s2sout timeout would be based on when the last db:verify was sent * mod_s2s_keepalive may not work because it doesn’t care whether ping replies are received and I’m not sure it would even try to send pings because according to the logs the s2sout is "unauthed" ("trying to send over unauthed s2sout to foo.example").
FWIW, I resolved the situation using s2s:close on admin_telnet. So I unfortunately cannot provide information such as "was the remote side sending ACKs on the TCP" or "how full was the TCP buffer". I should’ve thought about that before killing the s2s connection.
More information would indeed have been useful, especially regarding the early life of s2sout28ea980 I've seen something similar on a different server, possibly related to DNS failures.
ChangesI'm not sure but I might experience the same problem: - I am running prosody on a shared hoster (uberspace.de, toast not involved as I used `./configure --prefix=${HOME} && make && make install` - I was in conversations@conference.siacs.eu - I updated prosody 0.10 to hg:8ea0484871e8 I restarted my server (19:25) - prosody.log tells me s2s to conference.siacs.eu was successful but I can't connect to conversations@conference.siacs.eu - At 21:21 I restarted again to no avail ``` prosodyctl about Prosody hg:8ea0484871e8 # Prosody directories Data directory: /home/md85/var/lib/prosody Config directory: /home/md85/etc/prosody Source directory: /home/md85/lib/prosody Plugin directories: /home/md85/opt/prosody-modules/ - prosody-modules rev: a6d19aea9b73 /home/md85/lib/prosody/modules/ # Lua environment Lua version: Lua 5.1 Lua module search paths: /home/md85/lib/prosody/?.lua /home/md85/.luarocks/share/lua/5.1/?.lua /home/md85/.luarocks/share/lua/5.1/?/init.lua /usr/share/lua/5.1/?.lua /usr/share/lua/5.1/?/init.lua /home/md85/.luarocks/share/lua/5.1/?.lua /home/md85/.luarocks/share/lua/5.1/?/init.lua /usr/lib64/lua/5.1/?.lua /usr/lib64/lua/5.1/?/init.lua /home/md85/.luarocks/share/lua/5.1/?.lua /home/md85/.luarocks/share/lua/5.1/?/init.lua Lua C module search paths: /home/md85/lib/prosody/?.so /home/md85/.luarocks/lib/lua/5.1/?.so /usr/lib/lua/5.1/?.so /home/md85/.luarocks/lib/lua/5.1/?.so /usr/lib64/lua/5.1/?.so /usr/lib64/lua/5.1/loadall.so /home/md85/.luarocks/lib/lua/5.1/?.so LuaRocks: Installed (2.1.2) # Lua module versions lfs: LuaFileSystem 1.6.3 libevent: 2.1.8-stable luaevent: 0.4.4 lxp: LuaExpat 1.3.0 socket: LuaSocket 3.0-rc1 ssl: 0.6 ``` ``` grep siacs.eu ~/var/lib/prosody/prosody.log Aug 28 17:38:08 s2sout5c4f760 info Outgoing s2s connection mdosch.de->conference.siacs.eu complete Aug 28 17:38:17 mdosch.de:saslauth info Accepting SASL EXTERNAL identity from conference.siacs.eu Aug 28 17:38:17 s2sin12124230 info Incoming s2s connection conference.siacs.eu->mdosch.de complete Aug 30 00:02:39 s2sout5c4f760 info Outgoing s2s stream mdosch.de->conference.siacs.eu closed: Received SIGTERM Aug 30 00:02:39 s2sin12124230 info Incoming s2s stream conference.siacs.eu->mdosch.de closed: Received SIGTERM Aug 30 00:03:01 s2sout766cf40 info Outgoing s2s connection mdosch.de->conference.siacs.eu complete Aug 30 00:03:02 mdosch.de:saslauth info Accepting SASL EXTERNAL identity from conference.siacs.eu Aug 30 00:03:02 s2sin7a41090 info Incoming s2s connection conference.siacs.eu->mdosch.de complete Sep 04 19:25:21 s2sout766cf40 info Outgoing s2s stream mdosch.de->conference.siacs.eu closed: Received SIGTERM Sep 04 19:25:21 s2sin7a41090 info Incoming s2s stream conference.siacs.eu->mdosch.de closed: Received SIGTERM Sep 04 19:25:36 s2sout869e2d0 info Outgoing s2s connection mdosch.de->conference.siacs.eu complete Sep 04 21:21:38 s2sout869e2d0 info Outgoing s2s stream mdosch.de->conference.siacs.eu closed: Received SIGTERM Sep 04 21:21:59 s2sout7969bc0 info Outgoing s2s connection mdosch.de->conference.siacs.eu complete ``` As you mentioned a possible DNS issue I remembered I had some DNS stuff in my error log (but without ever experiencing problems) I had a look but there was no recent activity: ``` grep DNS ~/var/lib/prosody/prosody.err Jul 29 17:12:15 adns error Exhausted all 1 configured DNS servers, next lookup will try 127.0.0.1 again Jul 31 17:45:52 adns error Exhausted all 1 configured DNS servers, next lookup will try 127.0.0.1 again Aug 07 19:23:33 adns error Exhausted all 1 configured DNS servers, next lookup will try 127.0.0.1 again Aug 11 23:00:51 adns error Exhausted all 1 configured DNS servers, next lookup will try 127.0.0.1 again Aug 13 15:29:09 adns error Exhausted all 1 configured DNS servers, next lookup will try 127.0.0.1 again Aug 15 14:57:09 adns error Exhausted all 1 configured DNS servers, next lookup will try 127.0.0.1 again Aug 18 16:41:14 adns error Exhausted all 1 configured DNS servers, next lookup will try 127.0.0.1 again Aug 19 14:30:18 adns error Exhausted all 1 configured DNS servers, next lookup will try 127.0.0.1 again ``` Also there were no errors logged today: ``` tail -n1 ~/var/lib/prosody/prosody.err Aug 19 14:30:18 adns error Exhausted all 1 configured DNS servers, next lookup will try 127.0.0.1 again ```
Could someone try to reproduce this after https://hg.prosody.im/trunk/rev/756b8821007a to see if it helped?
I haven’t noticed the issue for quite a while now. Then again, I also hadn’t noticed it for quite a while before initially reporting it, unfortunately.
Closing this. Rationale: This was originally reported against 0.9.7 (released in 2014). That branch is now beyond obsolete. There have been no recent reports of this issue that I'm aware of. Other bugs that may have been responsible for this, by not firing timers that would clean up the session, are #460 and #1572 (both fixed) (from the original report here, it looks like it was quite possibly using lua-event affected by #460). Finally, the connection logic has been replaced by net.connect in trunk.
Changes