Ideas of CMDB, cfengine and nagios integration

For a while we’ve been discussing how we can become as lazy as possible when it comes to system administration, and this time we’ve made quite a neat integration between our homemade CMDB, cfengine and nagios.

Here’s the idea:

First of all, nobody likes to update a CMDB manually. Also, it’s never really possible to maintain one by hand in a way that keeps its info from becoming obsolete over time. This is why we made a script, cfcmdb, that cfengine triggers on every host. The script fills the CMDB with all sorts of info gathered from tools like dmidecode and from standard command-line tools (memory, network cards, OS version, CPU, vendor, etc.). So now our CMDB pretty much keeps itself up to date.
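The post doesn’t show how cfcmdb parses its inputs. As a minimal sketch, assuming the script reads the tab-indented “Key: value” lines that dmidecode prints, the parsing step could look like this in Python. The function name `parse_dmidecode` and the sample text are made up for illustration and are not taken from cfcmdb.

```python
def parse_dmidecode(text):
    """Collect 'Key: value' pairs from dmidecode-style output.

    Only tab-indented lines are considered; handle lines, section titles
    and blank lines are not indented. Keys with empty values (list
    headers such as 'Characteristics:') and lines without a colon are
    skipped. Later duplicates overwrite earlier ones.
    """
    info = {}
    for line in text.splitlines():
        if not line.startswith("\t"):
            continue
        key, sep, value = line.strip().partition(":")
        if sep and value.strip():
            info[key.strip()] = value.strip()
    return info


# Made-up sample in the shape of `dmidecode -t system` output.
sample = """Handle 0x0100, DMI type 1, 27 bytes
System Information
\tManufacturer: Example Vendor
\tProduct Name: Example Server 1000
\tSerial Number: ABC1234
"""

print(parse_dmidecode(sample))
```

Each resulting key/value pair would then become a column or row in the CMDB, bound to the host’s hostid.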

Lately we came up with the idea of filling our CMDB with cfengine class information too, so we added this to the cfcmdb script mentioned above:

cfagent --no-splay -p -v | grep Defined

...and with a little Perl split and join, we now have all the classes in our CMDB, bound to each host’s hostid.
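The split-and-join step could be sketched in Python as follows. This assumes the line kept by `grep Defined` looks like `Defined Classes = ( any linux debian ... )`; the exact verbose format depends on the cfengine version, so check your own cfagent output. The function names and the hostid value are hypothetical; the `classes(hostid, name)` columns are the ones used in the query further down.

```python
import re

# Assumed shape of the line that `grep Defined` keeps from `cfagent -v`.
DEFINED_RE = re.compile(r"Defined Classes\s*=\s*\((.*)\)")


def classes_from_verbose(output):
    """Return the sorted, de-duplicated class names found in cfagent -v output."""
    classes = set()
    for line in output.splitlines():
        match = DEFINED_RE.search(line)
        if match:
            classes.update(match.group(1).split())
    return sorted(classes)


def class_rows(hostid, classes):
    """Pair each class with the host's CMDB id, ready to insert into `classes`."""
    return [(hostid, name) for name in classes]


verbose = "Defined Classes = ( any linux debian class_apache )"
print(class_rows(42, classes_from_verbose(verbose)))
# [(42, 'any'), (42, 'class_apache'), (42, 'debian'), (42, 'linux')]
```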

Cool. On our nagios server we made another script, cmdb2nagios, which takes one of the parameters “hosts”, “hostgroups” or “services”:

cmdb2nagios hosts : creates the nagios host-config file

cmdb2nagios hostgroups : creates the nagios hostgroup-config file

cmdb2nagios services : creates the nagios service-config file
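The post only shows the services mode. As a rough Python sketch of the hosts mode, assuming each mode simply turns CMDB rows into nagios object definitions: the `render_object` helper, the `MODES` table, the `generic-host` template name and an address column in the hosts table are all assumptions for illustration.

```python
def render_object(kind, directives):
    """Render one nagios object definition such as define host{ ... }."""
    lines = ["define %s{" % kind]
    for key, value in directives:
        lines.append("\t%s\t%s" % (key, value))
    lines.append("\t}")
    return "\n".join(lines) + "\n"


def hosts_config(rows):
    """rows: (name, address) pairs, e.g. selected from the CMDB hosts table."""
    return "\n".join(
        render_object("host", [("use", "generic-host"),
                               ("host_name", name),
                               ("alias", name),
                               ("address", address)])
        for name, address in rows)


# Hypothetical dispatch table; "hostgroups" and "services" would be added the same way.
MODES = {"hosts": hosts_config}

print(MODES["hosts"]([("webserver.somedomain.com", "192.0.2.10")]))
```

Keeping one renderer for all object types means the three modes only differ in which rows they select and which directives they emit.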

The services parsing is the nice part, because we can automatically monitor any service set up with cfengine. Let’s say we have a bunch of hosts managed by cfengine, and cfengine tells them to run apache2. That setup lives in a cfengine class (here class_apache), and since cfcmdb records every host’s classes, that class shows up in our CMDB.

Here’s an excerpt of the service parsing in cmdb2nagios:

[snip]

my $sql = "select hosts.name from hosts,classes where classes.name = 'class_apache' and hosts.hostid = classes.hostid";
my $execute = $connect->query($sql) or die "wtf? it didn't work... check syntax.";
my @servicehosts;
while (my @results = $execute->fetchrow()) {
    push(@servicehosts, $results[0]);
}

my $hosts = join(",", @servicehosts);
print "define service{\n";
print "\tuse\t\t\tgeneric-service\n";
print "\thost_name\t\t" . $hosts . "\n";
print "\tservice_description\tcfg_CHECK_APACHE\n";
print "\tis_volatile\t\t0\n";
print "\tmax_check_attempts\t1\n";
print "\tnormal_check_interval\t5\n";
print "\tretry_check_interval\t1\n";
print "\tcontact_groups\t\tlinux-admins\n";
print "\tnotification_period\t24x7\n";
print "\tnotification_options\tc,w,r\n";
print "\tprocess_perf_data\t1\n";
print "\tcheck_command\t\tcheck_apache\n";
print "\t}\n\n";
[snip]

As you can see, the apache check is applied to every host with the class_apache class in the CMDB, i.e. every host cfengine has set up to run apache.
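For comparison, here is a Python sketch of the same services step generalized to several classes. It assumes the `hosts(hostid, name)` and `classes(hostid, name)` tables implied by the query above, and uses sqlite3 only so the example is self-contained (the post’s CMDB is reached through Perl instead). It adds two things the Perl excerpt doesn’t show: a bind parameter instead of a hard-coded class name, and skipping a class that no host has, since a service with an empty host_name is something nagios will complain about. The `SERVICE_MAP` table and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical mapping: cfengine class -> (service_description, check_command).
SERVICE_MAP = {
    "class_apache": ("cfg_CHECK_APACHE", "check_apache"),
}

SERVICE_TEMPLATE = """define service{
\tuse\t\t\tgeneric-service
\thost_name\t\t%(hosts)s
\tservice_description\t%(desc)s
\tis_volatile\t\t0
\tmax_check_attempts\t1
\tnormal_check_interval\t5
\tretry_check_interval\t1
\tcontact_groups\t\tlinux-admins
\tnotification_period\t24x7
\tnotification_options\tc,w,r
\tprocess_perf_data\t1
\tcheck_command\t\t%(command)s
\t}

"""


def hosts_with_class(conn, class_name):
    """Names of all hosts that have class_name recorded in the CMDB."""
    cur = conn.execute(
        "SELECT hosts.name FROM hosts JOIN classes ON hosts.hostid = classes.hostid"
        " WHERE classes.name = ? ORDER BY hosts.name",
        (class_name,))
    return [row[0] for row in cur]


def services_config(conn):
    """One service definition per mapped class that at least one host has."""
    out = []
    for class_name, (desc, command) in sorted(SERVICE_MAP.items()):
        hosts = hosts_with_class(conn, class_name)
        if not hosts:
            continue  # no host has this class, so emit no definition
        out.append(SERVICE_TEMPLATE % {"hosts": ",".join(hosts),
                                       "desc": desc, "command": command})
    return "".join(out)


# In-memory stand-in for the CMDB, with made-up rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hosts (hostid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE classes (hostid INTEGER, name TEXT);
INSERT INTO hosts VALUES (1, 'web1'), (2, 'web2'), (3, 'db1');
INSERT INTO classes VALUES (1, 'class_apache'), (2, 'class_apache'), (3, 'debian');
""")
print(services_config(conn))
```

Adding a new monitored service then means adding one entry to the mapping, as long as cfengine already sets the corresponding class.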

This leaves us with only our cfengine configuration to maintain, while the CMDB updates itself and the nagios config is generated automatically.

Our event handlers in nagios also tell cfengine to do this and that, so now we can sit back, enjoy a coffee and watch the show.

(See the previous post about event handlers and cfengine: http://www.sladder.org/?p=261)


Monitor it with nagios, fix it with cfengine.

This is something we started doing towards the end of last year. We wanted nagios and cfengine to cooperate. We didn’t want cfengine to monitor whether processes were running, because that’s nagios’ job, and we didn’t want nagios to fix the problems it finds, because that’s cfengine’s job. So we found that cfrun could help us with this and make for a simple integration. Here’s how we did it.

In this particular scenario, we had segfaults in our apache logs caused by PHP errors we couldn’t fix. These left apache spawning lots of child processes and piling up defunct processes, so we had to restart apache every now and then.

So first we configured a check in the nagios services config file, something like this:

services.cfg

define service{
    use                     generic-service
    host_name               webserver.somedomain.com
    service_description     CHECK_LOG_SEGFAULT
    is_volatile             0
    max_check_attempts      1
    normal_check_interval   5
    retry_check_interval    1
    contact_groups          admins
    notification_period     24x7
    notification_options    c,w,r
    process_perf_data       1
    check_command           check_log_segfault
    event_handler           restart-apache
}

Now configure the commands: one for the event handler, named “restart-apache” to match the “event_handler” option in the example above, and one for the log check, named “check_log_segfault” to match “check_command check_log_segfault”.

commands.cfg:

define command{
    command_name    restart-apache
    command_line    /usr/bin/sudo /usr/sbin/cfrun $HOSTNAME$ -T -- -q -D restart_apache2_now
}
define command{
    command_name    check_log_segfault
    command_line    $USER1$/check_by_ssh -l root -t 30 -H $HOSTADDRESS$ -C "/usr/lib/nagios/plugins/check_log -F /var/log/apache2/error.log -O /var/log/apache2/check_log_oldlog -q Segmentation"
}
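One thing to keep in mind: nagios runs event handlers on recoveries as well as on problem states, so as configured above restart-apache is also called when the log check goes back to OK, which triggers an unneeded restart. A common pattern is to pass the state macros to a small wrapper and only call cfrun for hard problem states. The Python sketch below assumes a hypothetical wrapper script, invoked from the command definition as `restart-apache-handler $SERVICESTATE$ $SERVICESTATETYPE$ $HOSTNAME$`; the cfrun arguments are copied from the restart-apache command above.

```python
import subprocess
import sys

# sudo + cfrun, as in the restart-apache command definition above.
CFRUN = ["/usr/bin/sudo", "/usr/sbin/cfrun"]


def cfrun_command(state, state_type, host):
    """Return the cfrun argv to run, or None if no restart is wanted.

    state is $SERVICESTATE$ (OK, WARNING, CRITICAL, UNKNOWN) and
    state_type is $SERVICESTATETYPE$ (SOFT or HARD).
    """
    if state not in ("WARNING", "CRITICAL") or state_type != "HARD":
        return None  # recoveries and soft problems are ignored
    return CFRUN + [host, "-T", "--", "-q", "-D", "restart_apache2_now"]


def main(argv):
    """Entry point for the hypothetical wrapper; returns the exit status."""
    if len(argv) != 4:
        sys.stderr.write("usage: %s STATE STATETYPE HOSTNAME\n" % argv[0])
        return 3
    cmd = cfrun_command(argv[1], argv[2], argv[3])
    if cmd is None:
        return 0
    return subprocess.call(cmd)

# Installed as the wrapper script, finish with: sys.exit(main(sys.argv))
```

With max_check_attempts set to 1, the first non-OK result is already a HARD state, so the restart still happens on the first check that finds a new segfault.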

(The check_log plugin runs on every host that needs it and is called here over ssh with check_by_ssh. If you don’t want to use ssh, you could for instance call it via net-snmp’s EXEC function, or use NRPE.)
Be sure to enable event handlers in nagios.cfg for this to work.

nagios.cfg:

enable_event_handlers=1

That’s all nagios needs. Now let’s configure cfengine.

In the nagios config you can see that cfrun defines the cfengine class “restart_apache2_now” for that run (through cfagent’s -D option), so let’s make cfengine restart apache whenever a class with that name is defined:

cf.apache2:

###############################################################
control:
    actionsequence = ( packages shellcommands )
    AddInstallable = ( has_apache2 )
    IfElapsed = ( 0 )
###############################################################
classes:
###############################################################
packages:
    debian::
        apache2
            pkgmgr=dpkg
            define=has_apache2
################################################################
shellcommands:
    # apache2 initscript
    # Usage: /etc/init.d/apache2 {start|stop|restart|reload|force-reload}
    debian.has_apache2.restart_apache2_now::
        "/etc/init.d/apache2 restart"

Be sure to import this file in your cfengine config (for example in the import: section of cfagent.conf) so that cfengine knows about it.

So now nagios monitors the logfile, checks for segfault messages and tells cfengine to restart apache when a new one shows up. (The check_log plugin compares the current log with the copy it saved on the previous run, the -O file, so only new segfault messages trip the check. That’s nothing to worry about.) Everyone is happy and we (the sysadmins) don’t have to do shit. Just the way we want it.
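To make the new-versus-old comparison concrete, here is a simplified Python sketch of the idea behind check_log: keep a copy of the log from the previous run, look only at what was appended since then, and go CRITICAL if any new line matches the query. It is an illustration, not the plugin’s actual code. The function name, the message format and the rotation fallback are assumptions; only the standard nagios exit codes (0 for OK, 2 for CRITICAL) are fixed.

```python
OK, CRITICAL = 0, 2  # standard nagios plugin exit codes


def check_new_lines(old_text, new_text, query):
    """Return (exit_code, message) for lines appended since the last check.

    If the log no longer starts with the saved copy (e.g. it was rotated),
    the whole current log is treated as new.
    """
    if new_text.startswith(old_text):
        fresh = new_text[len(old_text):]
    else:
        fresh = new_text
    hits = [line for line in fresh.splitlines() if query in line]
    if hits:
        return CRITICAL, "(%d) %s" % (len(hits), hits[-1])
    return OK, "No matches found"


# Made-up log contents.
old = "[notice] Apache configured -- resuming normal operations\n"
new = old + "[notice] child pid 1234 exit signal Segmentation fault (11)\n"
print(check_new_lines(old, new, "Segmentation"))
```

After each run the caller would save the current log as the new old copy (the role of check_log’s -O file), so the same segfault is only reported once.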
