Wreschnig, Alexander Scott
2018-10-29 19:32:31 UTC
I have what is hopefully a quick question regarding clamd. What's a good method for determining ideal chunk sizes when streaming data to the daemon over a socket connection? Or should I ignore chunking altogether and just stream one big contiguous file?
The background: I've developed a very simple plugin for an unrelated application that sends user-uploaded files of varying formats to clamd over a socket for some basic virus scanning. At the moment, based on some of the clamd documentation, it loops over each file, grabbing small chunks at a time, and streams each of those chunks to clamd. It's working fine, so in theory I can leave it exactly as-is. But I used an arbitrary value for the chunk size, and as I look more closely I'm having a hard time finding documentation on how this works or what my chunk size should be (beyond the maximum chunk size, which I can see is StreamMaxLength). For reference, from man clamd:
"The stream is sent to clamd in chunks, after INSTREAM, on the same socket on which the command was sent. This avoids the overhead of establishing new TCP connections and problems with NAT. The format of the chunk is: '<length><data>' where <length> is the size of the following data in bytes expressed as a 4 byte unsigned integer in network byte order and <data> is the actual chunk. Streaming is terminated by sending a zero-length chunk. Note: do not exceed StreamMaxLength as defined in clamd.conf [...]"
StreamMaxLength, on the other hand, is documented as:
"[...] This option allows you to specify the upper limit for data size that will be transfered to remote daemon when scanning a single file. It should match your MTA's limit for a maximum attachment size."
Looking at this combination I'm wondering if, since I'm only worrying about attachments (which by definition shouldn't be larger than maximum attachment size), there's another good reason to chunk things up or if I should just stream everything in one go.
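For comparison, streaming everything in one go would just be a single <length><data> frame instead of a loop. Here's a sketch of that variant (the 25 MB value stands in for whatever StreamMaxLength is set to in clamd.conf, and the client-side size check is purely illustrative, since clamd enforces its own limit):

import os
import socket
import struct

def instream_scan_whole(path, host="localhost", port=3310, stream_max_length=25 * 1024 * 1024):
    """Send the whole file as one length-prefixed chunk (reads it fully into memory)."""
    size = os.path.getsize(path)
    if size > stream_max_length:
        # mirror clamd's StreamMaxLength locally so we fail before streaming
        raise ValueError("%s is larger than the configured StreamMaxLength" % path)
    with socket.create_connection((host, port)) as sock, open(path, "rb") as fh:
        sock.sendall(b"zINSTREAM\0")
        sock.sendall(struct.pack("!I", size) + fh.read())  # one <length><data> frame
        sock.sendall(struct.pack("!I", 0))                  # zero-length terminator
        return sock.recv(4096).decode(errors="replace").rstrip("\0\n")

The wire format is the same either way; the main practical difference I can see is that this version holds the whole file in memory at once.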
Sorry if there's an obvious answer staring at me and I'm not seeing it; I swear I looked! And thanks for any advice.
-
Alex Wreschnig