When invalid input is detected on a connection, Prosody dumps the raw data (with control characters stripped) to the logging facility:
Aug 31 09:57:28 mod_c2s debug Received invalid XML (not well-formed (invalid token)) 77 bytes: ____H___D__Wƍ�2_M��x>_��a__����#ȱN�_4c��________ ___d_b_________c____�____
As the character encoding of syslog cannot (and should not) be assumed, the dump should only contain the printable ASCII characters verbatim (codes 32-126; code 127 is DEL, a control character).
Otherwise, it might be possible to craft a byte sequence that results in special or control characters, especially on UTF-8 systems (e.g. an RTL text-flow marker, or even ASCII control characters composed via overlong UTF-8 sequences).
Ideally, the invalid characters should be replaced by a safe encoding (URL-style percent-encoding such as %1C, or some other encoding that preserves the whole content and is easy to reconstruct). However, stripping all non-printable and non-ASCII characters would be a sufficient first approximation.
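To illustrate the kind of escaping meant here, a minimal sketch follows. It is written in Python purely for illustration (Prosody itself is written in Lua), and the function name `escape_for_log` is hypothetical, not an existing Prosody API. It assumes that `%` itself is also escaped so that the output can be decoded unambiguously.

```python
from urllib.parse import unquote_to_bytes


def escape_for_log(data: bytes) -> str:
    """Percent-encode every byte that is not printable ASCII.

    Bytes 0x20-0x7E are kept verbatim, except '%' (0x25), which is
    escaped as well so that the result stays reversible.
    """
    out = []
    for b in data:
        if 0x20 <= b <= 0x7E and b != 0x25:
            out.append(chr(b))
        else:
            out.append(f"%{b:02X}")
    return "".join(out)


# Hypothetical sample: a control byte, an RTL override (U+202E in UTF-8),
# and an overlong encoding of a line feed.
raw = b"<a\x1c>" + "\u202e".encode("utf-8") + b"\xc0\x8a"
escaped = escape_for_log(raw)

# The original bytes can be reconstructed from the log line.
restored = unquote_to_bytes(escaped)
```

Because the escaping works on raw bytes rather than decoded characters, it does not depend on the input being valid UTF-8, and the log line contains only codes 32-126 regardless of the encoding the log viewer assumes.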
Thanks,
Georg
FWIW it did escape ASCII codes 0-31 as underscores already. Fixed in https://hg.prosody.im/trunk/rev/7fa273f8869e
Related: https://hg.prosody.im/trunk/rev/5f4a657136bc Unsure if this is too disruptive, though.