#649 rfc6724.source() performs badly on systems with many IP addresses

Reporter Adoa Coturnix
Owner MattJ
Stars ★ (1)  
Tags
  • Priority-Medium
  • Type-Defect
  • Status-Accepted
  1. Adoa Coturnix on

    1. Install prosody 0.9.10 via toast
    2. Start with the same configuration as 0.9.9
    3. Connect with client

    With 0.9.9 I see an increase of CPU load in top to about 50% for a few seconds. When the client finishes connecting, CPU load goes down to effectively 0.

    With 0.9.10 I see an increase of CPU load up to 100% upon connecting with a client (tried gajim, pidgin, conversations) and this does not go down. Clients get timeouts, because the server does not react sufficiently fast.

    I do not use libevent – which (I think) does not work with 0.9.x. TLS is mandatory (c2s & s2s) and works fine with 0.9.9 according to IM Observatory. No specific ssl.options set. The server is running on a shared webspace (uberspace.de) with CentOS 6.7.

    I tried deactivating muc, proxy65, bosh, mam, smacks (& some others) – did not help. Neither did making TLS optional or deactivating it altogether. When deactivating s2s the problem seems to go away, but s2s is crucial for me – I am (basically) the only user on my server.

    Downgrading to 0.9.9 restores the desired behavior but can only be seen as a workaround. I manually installed prosody-trunk from the hg repo -> same problem with CPU load.

  2. Adoa Coturnix on

    By the way, my prosody.cfg.lua looks very similar to this one: https://geeklabor.de/index.php?url=archives/194-Eigener-XMPP-Server-mit-Prosody-auf-Uberspace.html However, I do not use http_upload. The author of that article is using the same shared hoster (but a different host) and was complaining about high CPU load as well.

  3. MattJ on

    Hi, thanks for the report! What storage do you use? And do you have mod_privacy or mod_compression enabled? Also, what size is your certificate's key (e.g. 2048, 4096)?

    Changes
    • owner MattJ
  4. MattJ on

    Additionally, since you are apparently comfortable building from source... please start with 0.9.9 (and confirm it works ok), and then apply this patch: https://hg.prosody.im/0.9/raw-diff/5c6e78dc1864/plugins/mod_dialback.lua And test to see if it causes the high CPU usage. If it does not, please try this patch: https://hg.prosody.im/0.9/raw-diff/0386ccf20ac7/core/portmanager.lua Let us know how it goes :)

  5. Adoa Coturnix on

    The certificate's private key is 4096 RSA, I use 2048 bit Diffie-Hellman parameters. The IM Observatory gave me this report: https://xmpp.net/result.php?domain=ctrnx.de&type=server Looks OK for all that I can say. And actually, I get these results with either 0.9.9 (at any time I ask for it) or with 0.9.10 *before* any of my clients attempts to connect.

    Enabling or disabling privacy / compression did not change anything about the CPU load, as far as I remember.

    I did not specify any specific storage, so I guess it uses the file based internal storage. I was wondering whether I should try SQL (which I could) but somewhere it said that SQL is only supported with 0.10, so I did not try yet. About the last bit, the documentation was actually confusing: I don't know whether SQL for the storage should work with 0.9 and/or whether it was only the MAM module that cannot connect to SQL in 0.9 … any tip about this is welcome but maybe off topic.

    About building from source: I am definitely able to run wget or hg clone, ./configure, make and ./prosody. Since it just works, I cannot call it difficult. However, I have no clue about how to "apply this patch". I am not a software developer, but willing to learn ;-)

  6. Adoa Coturnix on

    Oh and one more thing: I recently told Gajim to "synchronize contacts" with another account on a different server. Now I have about 55 contacts, some of which are supposed to be located on servers that do not exist or are not reachable anymore (gmail anyone?). So any attempt of prosody to contact those will fail. I am aware of this and I wanted to remove those contacts, but first I wanted to understand this CPU issue. I hope they are unrelated. In any case: this must not be a problem for prosody. If this is it: There is a serious issue.

  7. MattJ on

    To apply the patches, in the source directory you can simply run: wget PATCH_URL -O- | patch -p1
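
A slightly more defensive variant of the one-liner above, as a minimal sketch: it downloads the patch once, uses GNU patch's --dry-run to check that it applies cleanly, and only then touches the source tree. The function name apply_patch_url is made up for illustration, and PATCH_URL stays a placeholder exactly as in the comment.

```shell
# Hypothetical helper: download a patch and apply it only if it applies cleanly.
# Assumes wget and GNU patch are installed and that the current directory is
# the root of the source tree.
apply_patch_url() {
    url=$1
    tmp=$(mktemp) || return 1

    # Fetch the patch into a temporary file so it is downloaded only once.
    if ! wget -q "$url" -O "$tmp"; then
        rm -f "$tmp"
        return 1
    fi

    # First pass changes nothing; it only reports whether the patch would apply.
    if patch -p1 --dry-run < "$tmp" > /dev/null; then
        patch -p1 < "$tmp"
        status=$?
    else
        echo "patch does not apply cleanly; nothing was changed" >&2
        status=1
    fi

    rm -f "$tmp"
    return "$status"
}
```

Run from the source directory it would be called as apply_patch_url PATCH_URL. The -p1 strips the leading a/ or b/ path component that diffs exported from a repository usually carry.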

  8. Zash on

    The output of `prosodyctl about` could also be useful, as would debug logs and strace logs (e.g. strace -o prosody.trace -r lua prosody). And how many s2s connections are established?

    And, since you already have mercurial set up, do you think you could try bisect? Something like:

        hg bisect --good 0.9.9
        hg bisect --bad 0.9.10

    Then run make; ./prosody and check if the issue appears. If it does, run hg bisect --bad, if not, hg bisect --good.
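
The bisect procedure described above can be sketched as two small shell helpers. This is only an illustration: the function names start_bisect and mark are invented here, and the added hg bisect --reset (which clears any earlier bisect state) is a precaution that the comment itself does not mention.

```shell
# Hypothetical helpers wrapping the hg bisect workflow from the comment above.

# Begin a bisection between a known-good and a known-bad revision or tag.
start_bisect() {
    good=$1
    bad=$2
    hg bisect --reset &&          # drop state left over from an earlier bisect
    hg bisect --good "$good" &&
    hg bisect --bad "$bad"        # hg now checks out a revision in between
}

# Record the verdict for the currently checked-out revision.
# Usage: mark good   or   mark bad
mark() {
    case "$1" in
        good|bad) hg bisect "--$1" ;;
        *) echo "usage: mark good|bad" >&2; return 2 ;;
    esac
}

# Typical session, run by hand inside the repository:
#   start_bisect 0.9.9 0.9.10
#   make && ./prosody      # connect a client, watch the CPU load, stop prosody
#   mark bad               # or: mark good
#   ...repeat the last two steps until hg names the first bad revision
```

Each verdict makes Mercurial check out the next candidate, so the number of builds grows only with the logarithm of the number of changesets between the two tags.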

  9. Adoa Coturnix on

    You can have the prosodyctl about immediately – for either 0.9.9 or 0.9.10 installed by toast, they look identical. For the rest I will have to find some time, maybe tonight or tomorrow. Will keep you updated.

    $ prosodyctl about
    Prosody unknown

    # Prosody directories
    Data directory: /home/adoa/.toast/armed/var/lib/prosody
    Plugin directory: /home/adoa/var/prosody/community-modules/;/home/adoa/.toast/armed/lib/prosody/modules/
    Config directory: /home/adoa/.toast/armed/etc/prosody
    Source directory: /home/adoa/.toast/armed/lib/prosody

    # Lua environment
    Lua version: Lua 5.1

    Lua module search paths:
      /home/adoa/.toast/armed/lib/prosody/?.lua
      /home/adoa/.luarocks/share/lua/5.1/?.lua
      /home/adoa/.luarocks/share/lua/5.1/?/init.lua
      /usr/share/lua/5.1/?.lua
      /usr/share/lua/5.1/?/init.lua
      /home/adoa/.luarocks/share/lua/5.1/?.lua
      /home/adoa/.luarocks/share/lua/5.1/?/init.lua
      /usr/lib64/lua/5.1/?.lua
      /usr/lib64/lua/5.1/?/init.lua
      /home/adoa/.luarocks/share/lua/5.1/?.lua
      /home/adoa/.luarocks/share/lua/5.1/?/init.lua

    Lua C module search paths:
      /home/adoa/.toast/armed/lib/prosody/?.so
      /home/adoa/.luarocks/lib/lua/5.1/?.so
      /usr/lib/lua/5.1/?.so
      /home/adoa/.luarocks/lib/lua/5.1/?.so
      /usr/lib64/lua/5.1/?.so
      /usr/lib64/lua/5.1/loadall.so
      /home/adoa/.luarocks/lib/lua/5.1/?.so

    LuaRocks: Installed (2.1.2)

    # Lua module versions
    lfs: LuaFileSystem 1.6.3
    lxp: LuaExpat 1.3.0
    pposix: 0.3.6
    socket: LuaSocket 3.0-rc1
    ssl: 0.5.1

  10. Adoa Coturnix on

    I decided to go for the hg bisect method. The bisection result is this:

    The first bad revision is:
    changeset: 7092:bee63de49663
    parent: 7077:0386ccf20ac7
    user: Kim Alvefur <zash@zash.se>
    date: Thu Jan 21 22:21:19 2016 +0100
    summary: Backout 63f5870f9afe, no longer needed since Windows is currently unsupported

    What should I do next? I still cannot see anything suspicious in the logs, neither info nor debug. Error logs only mention that it cannot load "mod_blacklist". Is it not in the prosody-modules repo? Anyhow, I guess that is unrelated, as it appears both when the CPU problem occurs and when it doesn't.

    Maybe the logs would tell you something. But I will not post the debug logs publicly. Do you clearly need them or is the hint to the first bad revision enough for the moment? I will only post info logs if you say they could help you. I could probably send debug logs via PGP email.

    Once again the symptoms: The CPU load is going up only after my client reconnects (and opens several connections to contacts’ servers), irrespective of whether I try it with pidgin or gajim. Before the first bad commit, the CPU load goes down to basically 0% after the client successfully connects and shows my populated roster. After the first bad commit, the CPU load is not going down again, but stays at 100% until I kill (SIGKILL) the process. I think that a SIGTERM also works in principle, but prosody reacts with a very heavy delay. The client by the way does not even get the info about the contacts in my roster, so everybody appears to be offline.

  11. Adoa Coturnix on

    Due to a hint from Zash in the chatroom: On the shared host, there are a lot of network interfaces:

    $ ip a | grep -c inet
    581
    $ ip a | grep -c "inet "
    2
    $ ip a | grep -c inet6
    579

    Explicitly setting the s2s_interfaces and c2s_interfaces to only those that my prosody can use to talk to the outside solves the issue. Now the log says "Hello and welcome to Prosody version hg:0386ccf20ac7" and the CPU load is pretty much nothing.
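
The three grep counts above can also be produced in one pass. A minimal sketch, assuming the one-line-per-address format of ip -o addr show, where the third field is the address family; the function name count_address_families is made up for illustration.

```shell
# Hypothetical helper: count IPv4 and IPv6 addresses from `ip -o addr show`
# output read on stdin. In that format each line looks like
#   2: eth0    inet6 2001:db8::1/64 scope global ...
# so field 3 is the address family (inet or inet6).
count_address_families() {
    awk '
        $3 == "inet"  { v4++ }
        $3 == "inet6" { v6++ }
        END { printf "ipv4=%d ipv6=%d\n", v4, v6 }
    '
}

# Usage on a real host:
#   ip -o addr show | count_address_families
```

On a host like the one described, a large ipv6 count is the hint that wildcard interfaces may make address selection expensive.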

  12. Zash on

    Looks like a performance issue with util.rfc6724 (decides which source address to use for outgoing connections). (Actually the kernel does this, but we need to pick IPv6 or IPv4 correctly first.) Running rfc6724.source() with a list of 581 IPv6 addresses 55 times took 18m24.206s on my machine. Specifying s2s_interfaces to non-"wildcard" addresses (0.0.0.0 and ::) bypasses retrieval of the full list of local network interfaces, so it only needs to check two addresses instead of over 500. The commit that bisect identifies removes a hack where Prosody would trick the kernel into doing most of this work, but it was fragile and didn't work on some systems.
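
For a rough sense of scale, the timing above can be turned into a per-call figure. This is only back-of-the-envelope arithmetic, assuming the 18m24.206s covers all 55 calls together; the function name seconds_per_call is made up for illustration.

```shell
# Hypothetical helper: average seconds per call from a "XmY.YYYs" style timing.
# Arguments: minutes, seconds, number of calls.
seconds_per_call() {
    awk -v m="$1" -v s="$2" -v n="$3" \
        'BEGIN { printf "%.1f\n", (m * 60 + s) / n }'
}

seconds_per_call 18 24.206 55   # prints 20.1
```

Roughly 20 seconds per call with 581 addresses would be consistent with clients timing out while the server is busy.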

  13. Adoa Coturnix on

    For the record: Since my prosody is reachable only from exactly two IP addresses (v4 + v6), I entered them into the global interfaces = {} setting. Now every module will use only these interfaces, not just s2s and c2s. So in the future e.g. http and https (for bosh) will not start going crazy because of this.

  14. MattJ on

    Changes
    • title "Prosody 0.9.10 using up to 100% of CPU with s2s connections" → "rfc6724.source() performs badly on systems with many IP addresses"
    • tags Status-Accepted
