Discussion:
[clamav-users] Structuring instream calls to clamd
Wreschnig, Alexander Scott
2018-10-29 19:32:31 UTC
I have what is hopefully a quick question regarding clamd. What's a good method for determining ideal chunk sizes when streaming data to the daemon over a socket connection? Or should I ignore chunking altogether and just stream one big contiguous file?

The background: I've developed a very simple plugin for an unrelated application that sends user-uploaded files of varying formats to clamd over a socket for some basic virus scanning. At the moment, and based on some of the clamd documentation, it loops over each file grabbing small chunks at a time and streams each of those chunks to clamd. It's working fine, so I can in theory leave it exactly as-is. But I used an arbitrary value for chunk size and as I'm looking more closely I'm having a hard time finding documentation on how this works or what my chunk size should be (beyond the maximum chunk size, which I can see is StreamMaxLength). For reference, from man clamd:

"The stream is sent to clamd in chunks, after INSTREAM, on the same socket on which the command was sent. This avoids the overhead of establishing new TCP connections and problems with NAT. The format of the chunk is: '<length><data>' where <length> is the size of the following data in bytes expressed as a 4 byte unsigned integer in network byte order and <data> is the actual chunk. Streaming is terminated by sending a zero-length chunk. Note: do not exceed StreamMaxLength as defined in clamd.conf [...]"

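Here is a minimal Python sketch of the framing that passage describes. It builds the bytes for one INSTREAM session in memory. The function name `frame_instream` is hypothetical. The `z` command prefix, which tells clamd the command ends at a NUL byte, comes from the command conventions elsewhere in man clamd rather than from the excerpt above.

```python
import struct


def frame_instream(data: bytes, chunk_size: int) -> bytes:
    """Hypothetical helper: build the INSTREAM byte sequence for `data`."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # 'z' prefix: the command itself is terminated by a NUL byte
    parts = [b"zINSTREAM\0"]
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        # <length>: 4-byte unsigned integer in network (big-endian) byte order
        parts.append(struct.pack("!I", len(chunk)))
        # <data>: the chunk itself
        parts.append(chunk)
    # A zero-length chunk terminates the stream
    parts.append(struct.pack("!I", 0))
    return b"".join(parts)
```

Splitting the data changes only how many 4-byte length headers are sent. The payload clamd receives is identical whether it goes out as one chunk or many.
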
StreamMaxLength, on the other hand, is documented as

"[...] This option allows you to specify the upper limit for data size that will be transfered to remote daemon when scanning a single file. It should match your MTA's limit for a maximum attachment size."

Looking at this combination, I'm wondering whether there's any other good reason to chunk things up, or whether I should just stream everything in one go. After all, I'm only dealing with attachments, which by definition shouldn't be larger than the maximum attachment size.
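Below is one way to structure the chunked loop described above. It is a sketch that assumes an already-connected socket (TCP or Unix) and a file opened in binary mode. `instream_scan` and its default values are hypothetical. In particular, `stream_max_length` is a placeholder. You would have to keep it in sync with StreamMaxLength in clamd.conf by hand, because the client cannot read the daemon's configuration.

```python
import socket
import struct
from typing import BinaryIO


def instream_scan(sock: socket.socket, fileobj: BinaryIO,
                  chunk_size: int = 64 * 1024,
                  stream_max_length: int = 25 * 1024 * 1024) -> str:
    """Hypothetical client: stream fileobj to clamd and return its reply.

    stream_max_length is a placeholder mirror of clamd.conf's
    StreamMaxLength; set it to match your daemon's configuration.
    """
    sock.sendall(b"zINSTREAM\0")
    sent = 0
    while True:
        # Only one chunk is held in memory at a time
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        sent += len(chunk)
        if sent > stream_max_length:
            # The connection is now mid-stream; the caller should close it
            raise ValueError("data exceeds the configured StreamMaxLength")
        # One write per framed chunk: 4-byte big-endian length, then data
        sock.sendall(struct.pack("!I", len(chunk)) + chunk)
    # Zero-length chunk ends the stream
    sock.sendall(struct.pack("!I", 0))
    # With the 'z' command prefix, the reply is NUL-terminated
    reply = bytearray()
    while not reply.endswith(b"\0"):
        data = sock.recv(4096)
        if not data:
            break
        reply += data
    return reply.rstrip(b"\0").decode()
```

Reading a fixed-size chunk at a time keeps the client's memory use bounded by `chunk_size`. It also avoids needing to know the file's length in advance. Sending the whole attachment as a single chunk should also work as long as it stays under StreamMaxLength, but you would have to hold the entire file in memory first. That trade-off on the client side, rather than anything clamd requires, may be the most concrete argument for chunking.
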

Sorry if there's an obvious answer staring at me and I'm not seeing it; I swear I looked! And thanks for any advice.

-
Alex Wreschnig
Micah Snyder (micasnyd)
2018-10-30 17:08:08 UTC
Hi Alex,

I don't like seeing a well-researched question go unanswered, though I don't have a very good answer for you. We have no documentation from previous work that says whether there is an optimum chunk size for TCP sockets or Unix sockets.

Intuitively, if you're using a TCP socket, particularly when sending over the network (hopefully through an encrypted SSH tunnel), the TCP stack will probably break the data into packets for you anyway. If you do chunk, keep each chunk plus its 4-byte length prefix smaller than the MTU of the TCP/IP stack. Strictly, the limit that matters is the MSS, which is the MTU minus the IP and TCP headers. Staying under it may keep you from sending an itty-bitty leftover packet after every full one.
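The arithmetic behind that suggestion can be sketched in a few lines of Python. The 1500-byte MTU and 20-byte IP and TCP headers are assumptions, matching typical Ethernet and IPv4 without TCP options. TCP options such as timestamps, or IPv6's 40-byte header, shrink the result. Both function names are hypothetical.

```python
def max_chunk_for_one_segment(mtu: int = 1500, ip_header: int = 20,
                              tcp_header: int = 20, length_prefix: int = 4) -> int:
    """Largest INSTREAM chunk whose framed size fits in one TCP segment.

    Assumes fixed-size headers with no TCP options.
    """
    mss = mtu - ip_header - tcp_header  # payload bytes per segment
    return mss - length_prefix          # leave room for the 4-byte <length>


def segments_for_chunk(chunk_size: int, mss: int = 1460,
                       length_prefix: int = 4) -> int:
    """Segments needed if each framed chunk is pushed out on its own."""
    framed = chunk_size + length_prefix
    return -(-framed // mss)  # ceiling division
```

With those defaults, `max_chunk_for_one_segment()` gives 1456. A 1460-byte chunk, framed to 1464 bytes, would need two segments, and the second would carry only 4 bytes. TCP is a byte stream, though, and the kernel may coalesce consecutive writes (for example under Nagle's algorithm). Treat this as the worst case, where each framed chunk leaves as its own segment, not as a prediction of what goes on the wire.
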

If you're using a local Unix socket, I really don't know whether chunking buys you anything. If you do end up doing some testing, it would be interesting to hear what you learn.
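If you do test this, a harness along these lines could compare chunk sizes. It is a sketch under several assumptions. `time_instream` and `connect_local` are hypothetical. `connect` is any callable that returns a freshly connected socket. The socket path is only an illustration; use whatever LocalSocket is set to in your clamd.conf. The harness times the full round trip, including clamd's scan, and keeps the best of several runs to damp noise.

```python
import socket
import struct
import time
from typing import Callable, Dict, Iterable


def time_instream(connect: Callable[[], socket.socket], data: bytes,
                  chunk_sizes: Iterable[int], repeats: int = 5) -> Dict[int, float]:
    """Best wall-clock seconds to INSTREAM `data` for each chunk size."""
    results: Dict[int, float] = {}
    for size in chunk_sizes:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            # A fresh connection per run
            with connect() as sock:
                sock.sendall(b"zINSTREAM\0")
                for off in range(0, len(data), size):
                    chunk = data[off:off + size]
                    sock.sendall(struct.pack("!I", len(chunk)) + chunk)
                sock.sendall(struct.pack("!I", 0))
                # Wait for the NUL-terminated reply so the scan is included
                reply = b""
                while not reply.endswith(b"\0"):
                    part = sock.recv(4096)
                    if not part:
                        break
                    reply += part
            best = min(best, time.perf_counter() - start)
        results[size] = best
    return results


def connect_local() -> socket.socket:
    # Example path only: use the LocalSocket value from your clamd.conf
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.connect("/var/run/clamav/clamd.ctl")
    return sock
```

Calling `time_instream(connect_local, data, [4096, 65536, 1 << 20])` with a representative attachment in `data` would give one timing per chunk size. Small differences are likely dominated by scan time, so several file types and sizes would be needed before drawing conclusions.
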

Micah Snyder
ClamAV Development
Talos
Cisco Systems, Inc.


_______________________________________________
clamav-users mailing list
clamav-***@lists.clamav.net
http://lists.clamav.net/cgi-bin/mailman/listinfo/clamav-users


Help us build a comprehensive ClamAV guide:
https://github.com/vrtadmin/clamav-faq

http://www.clamav.net/contact.html#ml
Wreschnig, Alexander Scott
2018-10-30 17:48:51 UTC
Thanks for the response, Micah. If the benefits are indeed unclear, I probably won't be futzing much with a perfectly functional implementation in the near future. But if I do run any experiments, I'll be sure to share the results.

- Alex