Discussion:
Calling Clamd INSTREAM on blocks of data, can a virus sneak by the edge of a block?
John-Charles D. Sokolow
2011-12-25 05:48:47 UTC
I am experimenting with a Python script that uses
http://xael.org/norman/python/pyclamd/ to scan blocks of data.

Here is my scenario: I read one block (4096 bytes in my case) from a
socket and call pyclamd.scan_stream(block), which I assume in turn calls
either INSTREAM or STREAM (the pyclamd docs don't say which clamd
command scan_stream actually issues). I then check the return value from
clamd: if it is None (NULL), I know the block is safe and pass it along;
otherwise I raise an exception and close the connection.

My question is this: since I'm breaking the stream up into blocks and
scanning each block separately, am I running the risk of a virus
sneaking past the edge of a block without matching a pattern? For
example, take the blocks 'Hello Vir' and 'us World', and assume the
substring 'Virus' is the actual virus. Since neither 'Vir' (the last 3
bytes of the first block) nor 'us' (the first two bytes of the second
block) is 'Virus', it would seem that clamd would miss "Virus" and not
return a match, letting the virus sneak through the seams, as it were.
Is this true? If so, is there a workaround? Or do I need to save the
complete stream to disk and then call clamd.scan_file("/tmp/tfile.bin")
before re-transmitting the file?
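A toy sketch of the boundary effect described above. The `toy_scan` helper and its plain substring check are hypothetical stand-ins, not ClamAV's matcher; the point is only that a pattern split across two independently scanned blocks matches in neither block, while the reassembled stream does match.

```python
# Toy illustration only: a naive substring "scanner", NOT how ClamAV matches.
SIGNATURE = b"Virus"


def toy_scan(data: bytes) -> bool:
    """Return True if the toy signature occurs anywhere in data."""
    return SIGNATURE in data


blocks = [b"Hello Vir", b"us World"]

# Scanning each block on its own: neither contains the whole signature.
per_block_hits = [toy_scan(block) for block in blocks]
print(per_block_hits)               # [False, False]

# Scanning the reassembled stream: the signature is found.
print(toy_scan(b"".join(blocks)))   # True
```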
_______________________________________________
Help us build a comprehensive ClamAV guide: visit http://wiki.clamav.net
http://www.clamav.net/support/ml
Török Edwin
2011-12-25 09:12:14 UTC
Clamd needs the entire file; without it you won't get the results you are expecting.
Scanning 4k blocks at a time as separate streams is not a good idea: a signature that spans a block boundary can be missed.

It appears to be a limitation of the Python wrapper you are using: clamd itself doesn't require you to send all your data at once.
You can send the STREAM/INSTREAM command and then stream your data as you receive it.

You don't necessarily have to save the file to disk before scanning, though; you can just stream
all your blocks using INSTREAM (which creates the temp file on clamd's end).
The format for INSTREAM on the socket is:
1. send the INSTREAM command: zINSTREAM\0 or nINSTREAM\n
2. send the chunk <length> as a 4-byte unsigned integer in big-endian (network) byte order
3. send the chunk of data corresponding to that length
4. repeat from step 2 as long as you have more blocks to send
5. send a 0-length chunk to mark the end of the stream
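The steps above can be sketched in Python as follows. This is a minimal sketch, not pyclamd's API: `instream_frames` and `scan_blocks` are hypothetical names, and the defaults assume clamd listens on TCP 127.0.0.1:3310 (adjust to your clamd.conf). Because `instream_frames` is a generator, each block read from the incoming socket can be sent to clamd as soon as it arrives, without buffering the whole stream first.

```python
import socket
import struct
from typing import Iterable, Iterator


def instream_frames(blocks: Iterable[bytes]) -> Iterator[bytes]:
    """Yield, in order, every piece an INSTREAM session writes to clamd."""
    # Step 1: the command, in its NUL-terminated 'z' form.
    yield b"zINSTREAM\0"
    for block in blocks:
        if not block:
            # Skip empty reads: a zero length would end the stream early.
            continue
        # Step 2: chunk length as a 4-byte unsigned big-endian integer.
        yield struct.pack("!I", len(block))
        # Step 3: the chunk itself.
        yield block
    # Step 5: a zero-length chunk marks the end of the stream.
    yield struct.pack("!I", 0)


def scan_blocks(blocks: Iterable[bytes], host: str = "127.0.0.1",
                port: int = 3310, timeout: float = 30.0) -> bytes:
    """Send blocks to clamd over INSTREAM and return its reply (hypothetical helper)."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # Step 4 happens implicitly: frames are sent as the iterator yields them.
        for frame in instream_frames(blocks):
            sock.sendall(frame)
        # With the 'z' prefix, clamd's reply is NUL-terminated.
        reply = b""
        while not reply.endswith(b"\0"):
            part = sock.recv(4096)
            if not part:
                break  # connection closed before a terminator arrived
            reply += part
    return reply.rstrip(b"\0")
```

The reply is a single line such as `b"stream: OK"` or one ending in `FOUND`. Since the verdict only arrives after the zero-length terminator, blocks forwarded before that point have not been checked yet. clamd also caps the stream size through the `StreamMaxLength` setting in clamd.conf.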

STREAM works like passive-mode FTP: clamd replies with a port number, and you connect to that port and send the entire data there.
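A rough sketch of that STREAM handshake, under stated assumptions: `parse_stream_port`, `_read_line` and `stream_scan` are hypothetical helper names, the control reply is assumed to look like `PORT <number>`, the newline-terminated `n` prefix is assumed to work for STREAM as it does for INSTREAM, and clamd is assumed to listen on 127.0.0.1:3310. Closing the data connection is what signals the end of the data; the verdict is assumed to come back on the control connection.

```python
import socket


def parse_stream_port(reply: bytes) -> int:
    """Extract the port from a STREAM reply assumed to look like b'PORT 1234'."""
    text = reply.rstrip(b"\0\r\n").decode("ascii", errors="replace")
    keyword, _, number = text.partition(" ")
    if keyword != "PORT" or not number.isdigit():
        raise ValueError(f"unexpected STREAM reply: {reply!r}")
    return int(number)


def _read_line(sock: socket.socket) -> bytes:
    """Read one newline-terminated reply byte by byte, so nothing extra is consumed."""
    buf = b""
    while not buf.endswith(b"\n"):
        part = sock.recv(1)
        if not part:
            break  # peer closed the connection
        buf += part
    return buf.rstrip(b"\n")


def stream_scan(data: bytes, host: str = "127.0.0.1",
                port: int = 3310, timeout: float = 30.0) -> bytes:
    """Scan data with the STREAM command (hypothetical helper)."""
    with socket.create_connection((host, port), timeout=timeout) as ctrl:
        # Control connection: ask clamd for a data port.
        ctrl.sendall(b"nSTREAM\n")
        data_port = parse_stream_port(_read_line(ctrl))
        # Data connection: send everything, then close it to mark the end.
        with socket.create_connection((host, data_port), timeout=timeout) as data_sock:
            data_sock.sendall(data)
        # The verdict is assumed to arrive on the control connection.
        return _read_line(ctrl)
```

Compared with INSTREAM, this needs a second TCP connection per scan, which is awkward through firewalls; INSTREAM keeps everything on one socket.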

Best regards,
--Edwin