#460 Prosody trunk fails to execute module timers after running some time (a.k.a. THE timer problem)

Reporter Ge0rG
Owner MattJ
Created
Updated
Stars ★★ (3)
Tags
  • Priority-Medium
  • Type-Defect
  • Status-Fixed
  1. Ge0rG on

    How to reproduce: Run prosody-trunk from the prosody debian repository (tested with 1nightly541-1~wheezy and some earlier versions), and have either mod_bosh or mod_smacks (or both) enabled. After some time, prosody will fail to execute timer code (mod_pinger, mod_smacks timeouts, ...). waqas suggested to run the following from the admin console: >local h,now=timer.add_task.h,timer.reschedule.get_time();for i,v in ipairs(h.priorities) do local cb,param=h.items[h.ids[i]],timer.add_task.params[h.ids[i]]; if(tostring(cb)=="function(moduleapi.lua:366)") then cb="api:"..tostring(param.callback); elseif(tostring(cb)=="function(watchdog.lua:12)") then cb="watchdog:"..tostring(cb.watchdog.callback) end print(i,v,v - now, cb); end Currently it reports >1500 events in the queue with negative wait times: ... | 1514 1422293771.4827 -8874.7103559971 function(mod_s2s.lua:557) | 1515 1422294227.0253 -8419.1677281857 function(mod_smacks.lua:262) | 1516 1422295940.906 -6705.2870650291 function(mod_smacks.lua:262) | 1517 1422297073.3799 -5572.8131351471 function(mod_smacks.lua:262) | 1518 1422298475.1318 -4171.0612580776 function(mod_smacks.lua:262) | 1519 1422298191.1517 -4455.0413451195 function(dns.lua:775) | 1520 1422298191.1523 -4455.0407102108 function(dns.lua:775) | 1521 1422298191.1561 -4455.0369620323 function(mod_manifesto.lua:70) | 1522 1422298777.1561 -3869.036935091 watchdog:function(mod_pinger.lua:14) | 1523 1422298266.1904 -4380.0026640892 function(mod_s2s.lua:557) | 1524 1422298176.9323 -4469.2607090473 api:function(mod_smacks.lua:155) | 1525 1422298269.9654 -4376.2276530266 function(mod_s2s.lua:557) | 1526 1422299816.5679 -2829.6250941753 function(mod_s2s.lua:557) | Result: nil

  2. Zash on

    My attempt at fixing this: (includes some overly verbose debug logging that would be stripped before commiting) diff --git a/util/timer.lua b/util/timer.lua --- a/util/timer.lua +++ b/util/timer.lua @@ -14,11 +14,21 @@ local debug_traceback = debug.traceback; local tostring = tostring; local xpcall = xpcall; +local os_date = os.date; + +local function datefmt(t) + if os_date("%F") == os_date("%F", t) then + return os_date("%T.%%03d", t):format(t%1*1000); + end + return os_date("%F %T", t); +end local _ENV = nil; local _add_task = server.add_task; +local _server_timer; +local _active_timers = 0; local h = indexedbheap.create(); local params = {}; local next_time = nil; @@ -29,23 +39,44 @@ local peek; while true do peek = h:peek(); - if peek == nil or peek > now then break; end + if peek == nil or peek > now then + log("debug", "No more pending timers"); + break; + end local _; _, _callback, _id = h:pop(); _now = now; _param = params[_id]; params[_id] = nil; - --item(now, id, _param); -- FIXME pcall + log("debug", "_callback: %s", tostring(_param and _param.callback or _callback)); local success, err = xpcall(_call, _traceback_handler); if success and type(err) == "number" then h:insert(_callback, err + now, _id); -- re-add params[_id] = _param; end end + + if peek ~= nil and _active_timers > 1 and peek == next_time then + log("debug", "Another timer is scheduled for %s already", datefmt(next_time)); + peek = nil; + else next_time = peek; - if peek ~= nil then + end + + if peek then + log("debug", "Wait until %s", datefmt(peek)); return peek - now; end + if not next_time and _active_timers == 0 then + log("debug", "No more timers"); + if _server_timer then + _server_timer:close(); + _server_timer = nil; + end + else + _active_timers = _active_timers - 1; + log("debug", "Not renewing, %d active timers left", _active_timers); + end end local function add_task(delay, callback, param) local current_time = get_time(); @@ -55,7 +86,14 @@ params[id] = param; if next_time == nil or event_time < next_time then next_time = event_time; - _add_task(next_time - current_time, _on_timer); + log("debug", "server.add_task(%s, _on_timer)", datefmt(next_time)); + if _server_timer then + _server_timer:close(); + _server_timer = nil; + else + _active_timers = _active_timers + 1; + end + _server_timer = _add_task(next_time - current_time, _on_timer); end return id; end

  3. Zash on

    This might be fixed with http://hg.prosody.im/trunk/rev/f5fd2c5cdf28 (and the couple of commits just before it), yell if the issue persists.

    Changes
    • owner Zash
    • tag Status-Fixed
  4. Ge0rG on

    *yell* sorry, the issue still persists. After only three minutes of uptime: timer:info() | -- util.timer | 2016-06-29 12:06:51 function(mod_statistics.lua:104) | 2016-06-29 12:07:06 function(mod_s2s.lua:584) | 2016-06-29 12:07:07 function(mod_s2s.lua:584) | 2016-06-29 12:07:40 function(mod_s2s.lua:584) | 2016-06-29 12:16:11 function(watchdog.lua:12) | 2016-06-29 12:07:42 function(mod_s2s.lua:584) | 2016-06-29 12:07:42 function(mod_s2s.lua:584) | 2016-06-29 12:15:09 function(watchdog.lua:12) | 2016-06-29 12:07:40 function(mod_s2s.lua:584) | 1 libevent callbacks | OK: Next event at 2016-06-29 12:06:51 (in -18.169477s)

  5. Zash on

    Changes
    • tags Status-Accepted
  6. Ge0rG on

    Just a small reminder that this is probably related to how lua-event (0.4.3 tested) handles the difference between NULL events and events with time=0. The following patch fixed THE timer bug for me: https://op-co.de/tmp/lua-event-callback.patch

  7. Zash on

    Can I close this then?

  8. Ge0rG on

    I'd like to keep this open until there is a new lua-event version and prosody's dependency has been updated accordingly.

  9. Zash on

    https://github.com/harningt/luaevent/commit/e9efea74d2eb8f3b1cb7ee2d3d0c83a88524ac2e Looks like this should be in the 0.4.4 release.

  10. Zash on

    So this is now a package repo issue?

    Changes
    • owner Zash MattJ
  11. MattJ on

    This is a major packaging issue. Debian and Ubuntu are currently providing a 0.4.3 package (without the fix), but that package reports itself as 0.4.4 (which is what shows up in 'prosodyctl about'). I too would like to keep this issue open until we can resolve this sensibly, possibly by offering lua-event from our repo.

  12. Zash on

    Relevant bug report in Debian: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=889743

  13. MattJ on

    The fixed Debian package for lua-event was included in buster and stretch-backports. A related issue with similar symptoms (but less likely to trigger) was #1572 (0.11, fixed). I've not seen any other recent reports of timers not firing, so I'm going to close this now. Thanks to everyone who helped with debugging!

    Changes
    • tags Status-Fixed

New comment

Not published. Used for spam prevention and optional update notifications.