When invalid input is detected on a connection, Prosody dumps the raw data (with control characters stripped) to the logging facility, for example:
Aug 31 09:57:28 mod_c2s debug Received invalid XML (not well-formed (invalid token)) 77 bytes: ____H___D__Wƍ�2_M��x>_��a__����#ȱN�_4c��________ ___d_b_________c____�____
As the character encoding of syslog cannot (and should not) be assumed, the dump should contain only printable ASCII characters verbatim (codes 32-126; code 127 is DEL, a control character).
Otherwise, it may be possible to craft a byte sequence that produces special or control characters, especially on UTF-8 systems (e.g. an RTL text-flow marker such as U+202E, or even ASCII control characters encoded as overlong UTF-8 sequences).
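To illustrate the risk, here is a minimal Python sketch. The function `strip_ascii_controls` is hypothetical: it only approximates a filter that drops ASCII control bytes and DEL, and is not Prosody's actual code. Because every byte of a multi-byte UTF-8 sequence is 0x80 or above, such a filter passes the RTL override U+202E (bytes E2 80 AE) straight through to the log.

```python
def strip_ascii_controls(data: bytes) -> str:
    # Hypothetical approximation of a log filter that removes only
    # ASCII control bytes (0x00-0x1F) and DEL (0x7F).
    kept = bytes(b for b in data if b >= 0x20 and b != 0x7F)
    return kept.decode("utf-8", errors="replace")


if __name__ == "__main__":
    # "report" + U+202E RIGHT-TO-LEFT OVERRIDE (UTF-8: E2 80 AE) + "txt.exe"
    payload = b"report\xe2\x80\xaetxt.exe"
    line = strip_ascii_controls(payload)
    print("\u202e" in line)  # True: the RTL override survives the filter
```

Python's strict UTF-8 decoder rejects overlong forms, but a more lenient decoder on the log-reading side might not, so the safe choice does not depend on how the consumer decodes the log.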
Ideally, the invalid characters should be replaced with a safe encoding (e.g. URL encoding such as %1C, or another encoding that preserves the whole content and is easy to reverse). However, stripping all non-printable and non-ASCII characters would be a sufficient first approximation.
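A minimal Python sketch of both options, under the assumption that the log line is built from the raw bytes. `escape_for_log`, `unescape_log` and `strip_for_log` are hypothetical helper names, not part of Prosody. Every byte outside 32-126, plus `%` itself, is written as `%XX`, so `urllib.parse.unquote_to_bytes` restores the original data exactly; the stripping variant simply keeps printable ASCII.

```python
from urllib.parse import unquote_to_bytes


def escape_for_log(data: bytes) -> str:
    """Percent-encode every byte that is not printable ASCII (32-126).

    '%' (0x25) is escaped too, so the output is unambiguous and reversible.
    """
    out = []
    for b in data:
        if 0x20 <= b <= 0x7E and b != 0x25:
            out.append(chr(b))       # printable ASCII, copied verbatim
        else:
            out.append(f"%{b:02X}")  # control, DEL, non-ASCII or '%'
    return "".join(out)


def unescape_log(text: str) -> bytes:
    """Reconstruct the original bytes from an escape_for_log() string."""
    return unquote_to_bytes(text)


def strip_for_log(data: bytes) -> str:
    """First approximation: drop everything that is not printable ASCII."""
    return bytes(b for b in data if 0x20 <= b <= 0x7E).decode("ascii")


if __name__ == "__main__":
    raw = b"<a>\x1c\xc3\xa9%"
    print(escape_for_log(raw))  # <a>%1C%C3%A9%25
    print(strip_for_log(raw))   # <a>%
```

Escaping `%` is what makes the encoding lossless: without it, a literal `%1C` in the input would be indistinguishable from an escaped 0x1C byte.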