#734 Non-ASCII characters dumped to debug log

Reporter ge0rg
Owner Nobody
Stars ★ (1)  
  • Status-New
  • Priority-Medium
  • Type-Defect
  1. ge0rg on

    When an invalid input is detected on a connection, prosody dumps the raw data (with control characters stripped) to the logging facility: Aug 31 09:57:28 mod_c2s debug Received invalid XML (not well-formed (invalid token)) 77 bytes: ____H___D__Wƍ�2_M��x>_��a__����#ȱN�_4c��________ ___d_b_________c____�____ As the character encoding of syslog can not (and should not) be assumed, the dump should only contain the printable ASCII characters verbatim (codes 32-127). Otherwise, it might be possible to craft a character sequence that results in special or control characters, especially on UTF-8 systems (i.e. an RTL text-flow marker or even ASCII control characters composed via overlong UTF-8 sequences). Ideally, the invalid characters should be replaced by a safe encoding (urlencoded %1C or some other encoding that preserves the whole content and is easy to reconstruct). However, stripping of all non-printable non-ASCII characters would provide for a sufficient first approximation. Thanks, Georg

New comment