Discussion:
[clamav-users] ClamAV 0.100.1 - clamd signal 11, leaves unix domain socket behind?
Karl Pielorz
2018-09-19 10:13:13 UTC
Permalink
Hi,

I'm running ClamAV 0.100.1 (from pkg) under FreeBSD 11.2 amd64 -
occasionally it's being passed something it doesn't like, and 'clamd' is
dying with a signal 11, i.e.

Sep 19 10:34:50 host clamd[855]: SelfCheck: Database status OK.
...
Sep 19 10:35:27 host kernel: pid 855 (clamd), uid 106: exited on signal 11

I'm trying to track down what content is causing this.

Software connects to clamd via a local unix domain socket, e.g.
'/var/run/clamav/clamd.sock'. If clamd dies - this socket is left behind,
and software still blindly connects to it.

Is there any way to have the socket removed when clamd dies? (i.e. even due
to a signal/failure?)

Having it still there (e.g. 'telnet -u /var/run/clamav/clamd.sock' still
connects) causes processes to back up - as they patiently wait for a daemon
that's sadly long gone.

Thanks,

-Kp
_______________________________________________
clamav-users mailing list
clamav-***@lists.clamav.net
http://lists.clamav.net/cgi-bin/mailman/listinfo/clamav-users


Help us build a comprehensive ClamAV guide:
https://github.com/vrtadmin/clamav-faq

http://www.clamav.net/contact.html#ml
G.W. Haywood
2018-09-19 16:28:02 UTC
Permalink
Hi there,
Post by Karl Pielorz
Is there any way to have the socket removed when clamd dies?
(i.e. even due to a signal/failure?)
I do things like this with ad-hoc watchdog scripts running from cron.

You could write a shell script, called from cron every few minutes or
so, which sends 'PING' to the clamd socket and if it doesn't get the
'PONG' reply back within a reasonable time, then:

/usr/bin/killall clamd; sleep n; rm -f /path/to/socket; /etc/init.d/clamd start

or something like that. Even if a utility offers an ability to remove
its own socket, if it's unreliable it probably won't do that reliably
so I'd use a script like that anyway. Think carefully about what the
'reasonable time' might be. I've seen scans take minutes; you won't
want, for example, to interrupt the scanning of a large document.
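
The PING/PONG check described above could be sketched roughly as follows,
in Python rather than shell. This is only an illustration: the function
name and the default timeout are made up for the example, and the
'nPING' form (clamd's newline-delimited command style) is assumed here.
The restart step is deliberately left to the one-line command above.

```python
import socket


def clamd_is_alive(sock_path, timeout=5.0):
    """Return True if a PONG comes back from the socket within `timeout` seconds."""
    try:
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
            s.settimeout(timeout)      # applies to connect, send and recv
            s.connect(sock_path)
            s.sendall(b"nPING\n")      # assumed: newline-delimited PING command
            reply = s.recv(16)
    except OSError:                    # refused, missing file, timed out, ...
        return False
    return reply.startswith(b"PONG")
```

A cron job could call clamd_is_alive() with the socket path from
clamd.conf and only run the kill/rm/start sequence when it returns False.
The timeout needs the same care as the 'reasonable time' mentioned above.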

Having said that, I've never had to do anything like that for clamd,
which has been one of the more reliable daemons I've used (and I've
used it for well over a decade).

So something seems to be wrong, and I'll suggest you need to find out
what that is rather than just fix the symptom - although you obviously
need to do something in the interim to keep mail flowing.
--
73,
Ged.

Micah Snyder (micasnyd)
2018-09-19 16:43:02 UTC
Permalink
Alternatively, you could switch your clamd.conf to use a TCP Socket. Just make sure it isn't internet accessible.
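
For illustration, a clamd.conf fragment along these lines would make clamd
listen on TCP on the loopback interface only. The port shown is the
conventional clamd port and is an assumption for this example, not
something stated in the thread; the client software would also need to be
pointed at the TCP address instead of the socket path.

```
# Listen on TCP instead of a local socket, loopback only.
TCPSocket 3310
TCPAddr 127.0.0.1
```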


Micah Snyder
ClamAV Development
Talos
Cisco Systems, Inc.


Karl Pielorz
2018-09-20 07:47:52 UTC
Permalink
--On 19 September 2018 16:43 +0000 "Micah Snyder (micasnyd)"
Post by Micah Snyder (micasnyd)
Alternatively, you could switch your clamd.conf to use a TCP Socket.
Just make sure it isn't internet accessible.
Hi,

I'd have to look into this as the software talking to ClamAV expects a
local unix domain socket, I don't know if it supports a TCP socket.

Do you know if clamd uses the "usual" Unix trick of unlinking the file before it
binds it? I.e. is there any way of avoiding a stale / lingering socket on
death (any death)?

-Kp
Micah Snyder (micasnyd)
2018-09-20 15:44:20 UTC
Permalink
Clamd has a FixStaleSocket option that is on by default.
If clamd fails to bind the socket when it is restarted because a stale socket file is still there, FixStaleSocket unlinks that file and binds again.

# Remove stale socket after unclean shutdown.
# Default: yes
#FixStaleSocket yes
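
The general idea of fixing a stale socket at startup can be sketched like
this. It is a Python illustration of the technique, not clamd's actual
code; the function name and the fix_stale parameter are hypothetical. The
probe connect is there so that a socket still owned by a live server is
never removed.

```python
import errno
import os
import socket


def bind_unix_socket(path, fix_stale=True):
    """Bind a listening Unix socket at `path`, clearing a stale file if allowed."""
    srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        srv.bind(path)
    except OSError as exc:
        if exc.errno != errno.EADDRINUSE or not fix_stale:
            srv.close()
            raise
        # The path already exists. Probe it: a successful connect means a
        # live server owns it, so it must be left alone.
        probe = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            probe.connect(path)
        except OSError:
            pass                       # nobody listening: the file is stale
        else:
            srv.close()
            raise RuntimeError(f"{path} is in use by another process")
        finally:
            probe.close()
        os.unlink(path)                # remove the stale file, then retry
        srv.bind(path)
    srv.listen(16)
    return srv
```

Note that this only helps at the next startup. It does nothing for clients
that connect between the crash and the restart.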

I'm all ears if anyone knows of a better way to remove the stale socket on death instead of on startup.
As Ged Haywood suggested, your best option may be to have an ad-hoc watchdog script monitor clamd and kill the socket if clamd becomes unresponsive for too long.

That said, if you figure out which file was killing clamd, I'd love to have a sample so I can try to fix the bug. It would be very helpful.

Regards,
Micah


Micah Snyder
ClamAV Development
Talos
Cisco Systems, Inc.


Karl Pielorz
2018-09-21 08:49:23 UTC
Permalink
--On 20 September 2018 15:44 +0000 "Micah Snyder (micasnyd)"
Post by Micah Snyder (micasnyd)
Clamd has a FixStaleSocket option that is default on.
FixStaleSocket will unlink the lingering stale socket and bind again if
it failed to bind when restarting clamd.
Hi, yeah - I saw that option.
Post by Micah Snyder (micasnyd)
I'm all ears if anyone knows of a better way to remove the stale socket on
death instead of on startup. As Ged Haywood suggested, your best option
may be to have an ad-hoc watchdog script monitor clamd and kill the
socket if clamd becomes unresponsive for too long.
Being simplistic, a SIGSEGV handler? :) [simplistic as it just fixes my
case]
Post by Micah Snyder (micasnyd)
That said, if you figure out which file was killing clamd, I'd love to
have a sample so I can try to fix the bug. It would be very helpful.
I'd love to be able to do that - but it's the usual 'needle in a haystack', and
the fact that it's very intermittent isn't helping us much (nor the fact it
gets delivered if it fails during the scan) - if I find it, you'll be the
2nd person to know :) - I am still looking. I guess turning on coredumps
might provide some info captured to disk - I'll post anything I find.

Thanks,

-Kp
Mark Fortescue
2018-09-24 10:31:11 UTC
Permalink
Hi Micah,

Can you not have a two-part daemon process? Part one forks the real
daemon and then waits for it to die (with 'wait()').
On death of the child, it cleans up and exits. Yes, I know it is not
quite as simple as that. It will have to have signal handlers etc. to
kill the child etc. and should also have logging.

It would have to be built into 'clamd', as 'clamd' should already be
doing things to become a daemon process and this additional 'fork' would
need to be after all that has been done.
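
As a rough illustration of the two-part idea (in Python, not clamd's C,
and with hypothetical names), the parent below forks, waits for the child
to die for any reason, and then removes the socket file. Signal forwarding
and logging, which a real version would need as noted above, are left out
to keep the sketch short.

```python
import os


def supervise(run_daemon, sock_path):
    """Fork `run_daemon`, wait for it to die, then remove `sock_path`.

    Returns the child's raw wait status.
    """
    pid = os.fork()
    if pid == 0:
        # Child: this is where the real daemon work would happen.
        # os._exit() skips the parent's cleanup code in the child.
        try:
            run_daemon()
        finally:
            os._exit(0)

    # Parent: block until the child is gone, however it died.
    _, status = os.waitpid(pid, 0)
    try:
        os.unlink(sock_path)           # clean up whatever the child left
    except FileNotFoundError:
        pass                           # child already removed it
    return status
```

os.WIFSIGNALED(status) and os.WTERMSIG(status) can then be used to tell a
crash (e.g. signal 11) from a normal exit when logging.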

Regards
Mark.
Micah Snyder (micasnyd)
2018-09-24 16:14:53 UTC
Permalink
That seems like a good approach to me.

First, we would want to change the order of operations a bit so we fork (both times) before loading the database. I'm pretty sure that as it is written now, it loads over 500MB worth of signature database content into RAM and then forks, temporarily resulting in over 1000MB of RAM consumed until the parent process exits. That all seems doable though.


Micah Snyder
ClamAV Development
Talos
Cisco Systems, Inc.


Karl Pielorz
2018-09-25 11:03:56 UTC
Permalink
--On 24 September 2018 11:31 +0100 Mark Fortescue
Post by Mark Fortescue
Hi Micah,
Can you not have a two-part daemon process? Part one forks the real daemon
and then waits for it to die (with 'wait()').
On death of the child, it cleans up and exits. Yes, I know it is not quite
as simple as that. It will have to have signal handlers etc. to kill the
child etc. and should also have logging.
Anything which fixes the issue (and this sounds like it would) gets my
vote. I think it's compounded by the fact that clamd doesn't offer up any
connection 'banner' or anything - i.e. for the local unix domain socket you
connect, and push data - that's it. It's not like you connect, wait for
'greeting' - then send data.

This also makes it hard to implement timeouts. I'm currently looking at
something that will check clamd is running, before the connect (i.e. PID
wise) - but that, or "running it from another script" are all kind of
band-aids - compared to something like the above...
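
A PID-wise check of the kind mentioned above might look like the sketch
below. It assumes clamd has been configured to write a PID file (the
PidFile option in clamd.conf); the function names are made up for the
example. It shares the usual weaknesses of such checks: a hung process
still counts as running, and a recycled PID can give a false positive.

```python
import os


def pid_is_running(pid):
    """Return True if a process with this PID currently exists."""
    try:
        os.kill(pid, 0)                # signal 0: existence check only
    except ProcessLookupError:
        return False                   # no such process
    except PermissionError:
        return True                    # exists, but owned by another user
    return True


def daemon_is_running(pidfile):
    """Read a PID from `pidfile` and check whether that process exists."""
    try:
        with open(pidfile) as fh:
            pid = int(fh.read().strip())
    except (OSError, ValueError):
        return False                   # missing or unreadable PID file
    return pid > 0 and pid_is_running(pid)
```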

FWIW - hacking a SIGSEGV handler into it to remove the file seemed to work
- but that's a very specific "hack", again, compared to the above (and
means we have to build from source + patch, not pkg on FreeBSD)

-Kp
G.W. Haywood
2018-09-21 16:14:02 UTC
Permalink
Hi there,
... it gets delivered if it fails during the scan ...
It doesn't have to be that way, and if someone knows a way to stop
clamd then maybe they could use it to get past your defences.
--
73,
Ged.
Karl Pielorz
2018-09-25 10:47:51 UTC
Permalink
--On 21 September 2018 17:14 +0100 "G.W. Haywood"
Post by G.W. Haywood
Hi there,
... it gets delivered if it fails during the scan ...
It doesn't have to be that way, and if someone knows a way to stop
clamd then maybe they could use it to get past your defences.
This is true, but not always avoidable in high volume situations... Arguably,
if they can get clamd to crash anyway, delivering unscanned mail may
not be your largest concern :)

-Kp

G.W. Haywood
2018-09-25 18:16:10 UTC
Permalink
Hi there,
... as it is written now, it loads over 500MB worth of signature
database content into RAM and then forks, temporarily resulting in
over 1000MB of ram consumed until the parent process exits. ...
Won't they share the memory (on a sane OS)?
--
73,
Ged.