gzip support in check_http

If you need gzip support in your Nagios check_http plugin, here’s what you need to do.
First, fetch version 1.4.15 of the nagios-plugins (the latest at the time of writing):

http://sourceforge.net/projects/nagiosplug/files/nagiosplug/1.4.15/

Run tar xzfv on the downloaded file somewhere and enter the nagios-plugins-1.4.15/plugins directory.
This is where you’ll find check_http.c, the source file that needs to be patched.
You can find the patch here:

http://sourceforge.net/tracker/index.php?func=detail&aid=3294169&group_id=29880&atid=397599

Patch the source file with the patch command: patch check_http.c checkhttpgzipdeflate.patch
Go up one directory (cd ..) and run ./configure && make
You’ll now have a freshly compiled check_http plugin with gzip support in the plugins directory.
Copy it to your Nagios plugins directory, or wherever you keep maintained versions.
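For context, gzip/deflate support in an HTTP check essentially means decoding the response body according to its Content-Encoding header before searching it for the expected string. The sketch below shows that idea in Python. It is not the patch itself (which is C), and the function name decode_body is hypothetical.

```python
import gzip
import zlib
from typing import Optional


def decode_body(body: bytes, content_encoding: Optional[str]) -> bytes:
    """Return the decoded HTTP body for a given Content-Encoding header value."""
    encoding = (content_encoding or "identity").strip().lower()
    if encoding == "gzip":
        return gzip.decompress(body)
    if encoding == "deflate":
        # "deflate" is supposed to be zlib-wrapped, but some servers send a
        # raw deflate stream, so fall back to raw decoding (negative wbits).
        try:
            return zlib.decompress(body)
        except zlib.error:
            return zlib.decompress(body, -zlib.MAX_WBITS)
    if encoding == "identity":
        return body
    raise ValueError(f"unsupported Content-Encoding: {content_encoding}")


if __name__ == "__main__":
    page = b"<html><body>OK</body></html>"
    # A check would match its expected string against the decoded body.
    print(b"OK" in decode_body(gzip.compress(page), "gzip"))  # prints True
```

Without decoding, a string match against a compressed body would simply fail, which is why an unpatched check_http can't verify content on servers that always compress.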

Fun with sudo

Wanna have some fun with sudo?

A couple of neat tricks:

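1. Insults when users type the wrong password:
echo "Defaults insults" >> /etc/sudoers
(Run this as root. Editing with visudo is safer, since it checks the syntax before installing the file.)
When your users type an incorrect password, they get insulted: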
$ sudo su -
Password:
Are you on drugs?

2. Set a custom password prompt for sudo
Add this line to /etc/sudoers: Defaults passprompt="YOU BREAK IT, YOU FIX IT!:"
When users log in and run sudo, they get the modified password prompt:
user@server:~$ sudo su -
YOU BREAK IT, YOU FIX IT!:
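One catch with echo ... >> /etc/sudoers: run it twice and the line is there twice. Here is a minimal Python sketch of an idempotent alternative that only works on the file’s text. The function name add_defaults is hypothetical, and on a real system you would still validate the result with visudo -c before relying on it.

```python
def add_defaults(sudoers_text: str, entry: str) -> str:
    """Append a 'Defaults <entry>' line unless an identical line already exists."""
    line = f"Defaults {entry}"
    existing = {existing_line.strip() for existing_line in sudoers_text.splitlines()}
    if line in existing:
        return sudoers_text
    # Make sure the new setting starts on its own line.
    if sudoers_text and not sudoers_text.endswith("\n"):
        sudoers_text += "\n"
    return sudoers_text + line + "\n"


if __name__ == "__main__":
    text = "root ALL=(ALL) ALL\n"
    text = add_defaults(text, "insults")
    text = add_defaults(text, 'passprompt="YOU BREAK IT, YOU FIX IT!:"')
    text = add_defaults(text, "insults")  # no-op: already present
    print(text, end="")
```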

Know any more tricks? Leave a comment :>

Monitor Dell servers on Debian Squeeze with Nagios

I’m writing up this post because the dellomsa packages aren’t working with the new Debian Squeeze 6.0.

The omreport command wasn’t giving me any information about, for example, memory, PSUs or CPUs (omreport chassis info said "No sensors found", etc.).

I spent some hours trying to get it working with a newer dellomsa, but that didn’t work either.
Then I found some official Dell Ubuntu packages, which work excellently on Debian Squeeze as well:
dpkg -P dellomsa # Make sure dellomsa isn't installed.
echo 'deb http://linux.dell.com/repo/community/deb/latest /' | sudo tee -a /etc/apt/sources.list.d/linux.dell.com.sources.list
apt-get update
apt-get install srvadmin-base smbios-utils

You will also need libsmbios2_2.2.13-0ubuntu4_amd64.deb from Ubuntu Lucid to get the smbios tools working:
dpkg -i libsmbios2_2.2.13-0ubuntu4_amd64.deb
/etc/init.d/dataeng start #if this starts, omreport works!

Now you have the newer Dell tools working on Debian Squeeze.

We monitor the hardware of our Dell servers with check_openmanage and Nagios.
Read more about check_openmanage on its site (it’s a great plugin, by the way!).
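Like any Nagios plugin, check_openmanage reports its result through its exit status (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) plus a line of output. The minimal Python sketch below shows how a caller interprets that. The run_plugin helper is hypothetical, and the demo uses a stand-in command instead of the real plugin, whose install path depends on your system.

```python
import subprocess
import sys

# Standard Nagios plugin exit codes.
STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def run_plugin(argv: list[str], timeout: float = 60.0) -> tuple[str, str]:
    """Run a Nagios-style plugin and return (state, first line of output)."""
    try:
        result = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return "UNKNOWN", f"could not run plugin: {exc}"
    lines = result.stdout.splitlines()
    first_line = lines[0] if lines else ""
    # Treat any exit code outside 0-3 as UNKNOWN.
    return STATES.get(result.returncode, "UNKNOWN"), first_line


if __name__ == "__main__":
    # Stand-in plugin that exits CRITICAL; replace argv with the real
    # check_openmanage path on your system.
    demo = [sys.executable, "-c", "print('CRITICAL - demo'); raise SystemExit(2)"]
    print(run_plugin(demo))  # prints ('CRITICAL', 'CRITICAL - demo')
```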

Resources:
http://folk.uio.no/trondham/software/check_openmanage.html
http://linux.dell.com/repo/community/deb/latest/

Use screen instead of !”¤#¤”&”# minicom

I didn’t know this until the other day, but how awesome is this: you can use screen to connect to your serial console :)

screen /dev/ttyS0 9600

VOILA – you’re in
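Under the hood, screen opens the device and sets the line speed. Here is a rough Python equivalent using the standard termios module (POSIX only). It sets 9600 baud and raw 8N1 mode. 8N1 (8 data bits, no parity, 1 stop bit) is an assumption about your console, though it is the usual default. The function names are hypothetical.

```python
import os
import termios


def configure_serial(fd: int, baud: int = termios.B9600) -> None:
    """Put an open tty into raw 8N1 mode at the given termios baud constant."""
    iflag, oflag, cflag, lflag, ispeed, ospeed, cc = termios.tcgetattr(fd)
    # Raw input: no software flow control or CR/NL translation.
    iflag &= ~(termios.IXON | termios.IXOFF | termios.ICRNL | termios.INLCR)
    oflag &= ~termios.OPOST  # no output post-processing
    # 8 data bits, no parity, 1 stop bit; enable receiver, ignore modem lines.
    cflag &= ~(termios.PARENB | termios.CSTOPB | termios.CSIZE)
    cflag |= termios.CS8 | termios.CREAD | termios.CLOCAL
    lflag &= ~(termios.ICANON | termios.ECHO | termios.ISIG)  # no line editing or echo
    cc[termios.VMIN] = 1   # a read blocks until at least one byte arrives
    cc[termios.VTIME] = 0
    termios.tcsetattr(fd, termios.TCSANOW, [iflag, oflag, cflag, lflag, baud, baud, cc])


def open_serial(path: str = "/dev/ttyS0") -> int:
    """Open a serial device without making it our controlling terminal."""
    fd = os.open(path, os.O_RDWR | os.O_NOCTTY)
    configure_serial(fd)
    return fd
```

Usage would be fd = open_serial(), then os.write(fd, b"\r") to wake the console and os.read(fd, 1024) to read the reply. screen does all of this, plus the interactive terminal handling, for free.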

Test your jumbo frame enabled network with ping

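ping -Mdo -s <size> <host>
-M do sets the Don’t Fragment bit, so a packet that is too big for the path fails instead of being fragmented, and -s sets the ICMP payload size in bytes.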

If it works:
$ ping -Mdo -s 8001 10.0.20.26
PING 10.0.20.26 (10.0.20.26) 8001(8029) bytes of data.
8009 bytes from 10.0.20.26: icmp_req=1 ttl=64 time=0.450 ms
8009 bytes from 10.0.20.26: icmp_req=2 ttl=64 time=0.468 ms (DUP!)
8009 bytes from 10.0.20.26: icmp_req=3 ttl=64 time=0.447 ms

If it doesn’t:
$ ping -Mdo -s 2001 195.10.34.51 -c3
PING 195.10.34.51 (195.10.34.51) 2001(2029) bytes of data.
From XX.XX.XX.XX icmp_seq=1 Frag needed and DF set (mtu = 1500)
From XX.XX.XX.XX icmp_seq=1 Frag needed and DF set (mtu = 1500)
From XX.XX.XX.XX icmp_seq=1 Frag needed and DF set (mtu = 1500)
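The sizes in the output add up as follows. The ICMP echo header adds 8 bytes to the -s payload (8001 becomes 8009 in the replies), and the IPv4 header adds 20 more (8029 on the wire). Here is a small Python helper for picking -s for a given MTU, assuming IPv4 without IP options:

```python
IPV4_HEADER = 20  # bytes, IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo header


def max_ping_payload(mtu: int) -> int:
    """Largest `ping -s` value that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def packet_size(payload: int) -> int:
    """Total IPv4 packet size for a given `ping -s` payload."""
    return payload + ICMP_HEADER + IPV4_HEADER


if __name__ == "__main__":
    for mtu in (1500, 9000):
        print(f"MTU {mtu}: ping -Mdo -s {max_ping_payload(mtu)} <host>")
```

For a 9000-byte jumbo MTU this gives -s 8972. If that succeeds end to end and 8973 fails with "Frag needed", the path MTU is exactly 9000.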

307 Temporary Redirect, to myself. Lets hope the image is done next time around

A funny FAIL bug in the Drupal imagecache module. Last night we had some serious trouble with our sites and saw the requests on our load balancers go from 1k to 10k per second. From our graphs we found out which site was being hammered, and then checked varnishtop to see what was going on. Two missing images were causing a 307 Temporary Redirect. We fixed it quickly by touching the missing files, and the traffic went away.

Today we did some research into what was going on, and with Firebug we found that the page kept redirecting to itself and fetching the same missing image over and over. Here’s the rather naughty code in the imagecache module:


if (file_exists($lockfile)) {
  watchdog('imagecache', 'ImageCache already generating: %dst, Lock file: %tmp.', array('%dst' => $dst, '%tmp' => $lockfile), WATCHDOG_NOTICE);
  // 307 Temporary Redirect, to myself. Lets hope the image is done next time around.
  header('Location: '. request_uri(), TRUE, 307);
  exit;
}

Now imagine the image still doesn’t exist the next time around.
FAIL!
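One way to break the loop is to treat a lock file older than some maximum age as stale, then regenerate the image instead of redirecting forever. The sketch below is in Python rather than the module’s PHP. It is not what imagecache actually does, and the function names and the 30-second limit are assumptions for illustration.

```python
import os
import tempfile
import time
from typing import Optional

MAX_LOCK_AGE = 30.0  # seconds; hypothetical limit for generating one image


def lock_is_stale(lockfile: str, max_age: float = MAX_LOCK_AGE,
                  now: Optional[float] = None) -> bool:
    """True if the lock file is older than max_age seconds."""
    now = time.time() if now is None else now
    return now - os.path.getmtime(lockfile) > max_age


def handle_request(lockfile: str, now: Optional[float] = None) -> str:
    """Decide what to do with a request for an image that is still missing."""
    try:
        stale = lock_is_stale(lockfile, now=now)
    except FileNotFoundError:
        return "generate"  # no lock: nobody is building the image, do it now
    if stale:
        try:
            os.remove(lockfile)  # the previous attempt died; don't wait on it
        except FileNotFoundError:
            pass
        return "generate"
    return "redirect"  # fresh lock: a retry is reasonable, but only for a while


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        lock = os.path.join(tmp, "image.lock")
        print(handle_request(lock))  # prints generate (no lock yet)
        open(lock, "w").close()
        print(handle_request(lock))  # prints redirect (fresh lock)
        old = time.time() - 2 * MAX_LOCK_AGE
        os.utime(lock, (old, old))
        print(handle_request(lock))  # prints generate (stale lock removed)
```

The redirect is still there, but it can only happen for as long as the lock is fresh. A crashed generator can no longer turn it into an endless loop.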

Nvidia and invalid checksum for EDID (Xorg-issues)

While installing a DVI splitter through an HDMI converter on our Jira dashboard, I found that the splitter fucked up the fancy automagic EDID (Extended Display Identification Data) validation shit, and the screen fell back to 640×480.
This made me shat brix.

After a lot of googling I found a fabulous option called IgnoreEDIDChecksum, which I put in the Screen section of xorg.conf.

Hooray for new fancy automagic-probe-validation fuckups!
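For the curious, the checksum itself is simple. Each 128-byte EDID block ends with a byte chosen so that all 128 bytes sum to 0 modulo 256. A splitter that mangles the data breaks that sum, and the driver rejects the EDID. Here is a minimal Python check. Reading the raw EDID from somewhere like /sys/class/drm/*/edid is an assumption about your driver and setup.

```python
EDID_BLOCK_SIZE = 128


def edid_block_checksum_ok(block: bytes) -> bool:
    """True if a 128-byte EDID block's bytes sum to 0 modulo 256."""
    if len(block) != EDID_BLOCK_SIZE:
        raise ValueError(f"EDID blocks are {EDID_BLOCK_SIZE} bytes, got {len(block)}")
    return sum(block) % 256 == 0


def check_edid(data: bytes) -> list[bool]:
    """Checksum status for every block (base block plus any extensions)."""
    if len(data) == 0 or len(data) % EDID_BLOCK_SIZE:
        raise ValueError("EDID data must be a non-empty multiple of 128 bytes")
    return [edid_block_checksum_ok(data[i:i + EDID_BLOCK_SIZE])
            for i in range(0, len(data), EDID_BLOCK_SIZE)]
```

If check_edid returns False for the base block behind the splitter but True when the monitor is connected directly, the splitter is corrupting the EDID and IgnoreEDIDChecksum is the pragmatic way out.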

Our new dashboard

The old dashboard we used had a couple of issues: it showed all SOFT Nagios states, and it also listed every failing service on hosts that were down. Since it’s pretty obvious that the services on a host that is down are down too, we wanted to change that. Instead of continuing the rather hard work of reworking the dirty status.dat parsing, we dropped that project and checked out Merlin. Once installed and configured correctly, Merlin adds an event broker module to the Nagios config, and that module keeps Merlin’s MySQL database up to date. The database contains all hosts, their statuses, state changes and so on, so this is what we ended up with (pic of our current dashboard in our office):

In its upper left corner, the dashboard lists only hosts that are down and not acknowledged in Nagios. The upper right corner has a little tactical overview (it will get more info shortly), and all unhandled service problems are listed below. Exactly what we want.

The transparent “toolbar” at the bottom has a countdown timer for the page refresh and shows the current time.
Thanks again to Jonas, for the design!
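The filtering rules described above boil down to something like the following Python sketch: no SOFT states, nothing acknowledged, and no separate service rows for hosts that are already down. The record fields are hypothetical stand-ins, not Merlin’s actual database schema. The real dashboard does this in PHP against Merlin’s MySQL tables.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    state: str          # "UP", "DOWN" or "UNREACHABLE"
    hard: bool          # True for a HARD state, False for SOFT
    acknowledged: bool


@dataclass
class Service:
    host: str
    description: str
    state: str          # "OK", "WARNING", "CRITICAL" or "UNKNOWN"
    hard: bool
    acknowledged: bool


def down_hosts(hosts: list[Host]) -> list[Host]:
    """Hosts in a HARD DOWN state that nobody has acknowledged yet."""
    return [h for h in hosts if h.state == "DOWN" and h.hard and not h.acknowledged]


def unhandled_services(services: list[Service], hosts: list[Host]) -> list[Service]:
    """HARD, unacknowledged service problems, skipping hosts that are not UP."""
    not_up = {h.name for h in hosts if h.state != "UP"}
    return [s for s in services
            if s.state != "OK" and s.hard and not s.acknowledged
            and s.host not in not_up]
```

Dropping services on hosts that aren’t UP is what keeps a single dead host from filling the screen with its services.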

You can download the two PHP files here: dashv2
Just change the login info in merlin.php to match your Merlin database and it should run smoothly.

Note: the dashboard requires Firefox 3.6.
Enjoy.

EDIT: Now available on GitHub for those interested in contributing: http://github.com/mortis1337/nagios-dashboard

Mobile redirects using Varnish

The TMM load on our load balancers has gone up quite a bit lately, so I started looking into how to move some of the work to Varnish instead. I came across this config example and pulled out the mobile redirects part. It’s a rather dirty hack, but it works. Varnish has no built-in redirect action, so you trigger an error with a custom status code and then pick it up later in the vcl_error subroutine.
This is what the redirect config looks like on my test system:

sub vcl_recv {
    # Send mobile browsers that hit the front page to the m. site.
    # Each regex must stay on one line: a VCL "string" can't contain newlines.
    # (".*Foo.*" is the same as "Foo" in an unanchored match, so the padding is dropped.)
    if ((req.http.user-agent ~ "(Blackberry|BlackBerry|Blazer|Ericsson|htc|Huawei|iPhone|iPod|MobilePhone|Motorola|nokia|Novarra|O2|Palm|Samsung|Sanyo|Smartphone|SonyEricsson|Symbian|Toshiba|Treo|vodafone|Xda|Opera.Mobi|Windows.CE)"
         || req.http.user-agent ~ "^(Alcatel|Amoi|ASUS|Audiovox|AU-MIC|BenQ|Bird|CDM|DoCoMo|dopod|Fly|Haier|HP.*iPAQ|imobile|KDDI|KONKA|KWC|Lenovo|LG|NEWGEN|Panasonic|PANTECH|PG|Philips|portalmmm|PPC|PT|Qtek|Sagem|SCH|SEC|Sendo|SGH|Sharp|SIE|SoftBank|SPH|UTS|Vertu|ZTE)")
        && req.http.host ~ "(www.somehost.com)"
        && req.url == "/") {
        # www.somehost.com -> http://m.somehost.com
        set req.http.newhost = regsub(req.http.host, "(www)?\.(.*)", "http://m.\2");
        # Raise a custom error; its reason string carries the target URL.
        error 750 req.http.newhost;
    }
}

sub vcl_error {
    if (obj.status == 750) {
        # Turn the custom error into a 302 redirect to the URL passed as the reason.
        set obj.http.Location = obj.response;
        set obj.status = 302;
        deliver;
    }
}
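To sanity-check the user-agent patterns and the regsub() outside Varnish, here is the same decision logic as a Python sketch. It uses the same alternatives as the VCL, minus the redundant .* padding. Python’s re module and Varnish’s PCRE matching should agree for simple patterns like these, and the hostnames are the post’s placeholders.

```python
import re
from typing import Optional

# User-agent substrings that can appear anywhere.
MOBILE_ANYWHERE = re.compile(
    r"Blackberry|BlackBerry|Blazer|Ericsson|htc|Huawei|iPhone|iPod|MobilePhone|"
    r"Motorola|nokia|Novarra|O2|Palm|Samsung|Sanyo|Smartphone|SonyEricsson|"
    r"Symbian|Toshiba|Treo|vodafone|Xda|Opera.Mobi|Windows.CE")
# User-agent prefixes that must appear at the start.
MOBILE_PREFIX = re.compile(
    r"^(Alcatel|Amoi|ASUS|Audiovox|AU-MIC|BenQ|Bird|CDM|DoCoMo|dopod|Fly|Haier|"
    r"HP.*iPAQ|imobile|KDDI|KONKA|KWC|Lenovo|LG|NEWGEN|Panasonic|PANTECH|PG|"
    r"Philips|portalmmm|PPC|PT|Qtek|Sagem|SCH|SEC|Sendo|SGH|Sharp|SIE|SoftBank|"
    r"SPH|UTS|Vertu|ZTE)")


def mobile_redirect(user_agent: str, host: str, url: str) -> Optional[str]:
    """Return the Location for a 302, or None to serve the page normally."""
    is_mobile = bool(MOBILE_ANYWHERE.search(user_agent) or MOBILE_PREFIX.search(user_agent))
    if is_mobile and re.search(r"www.somehost.com", host) and url == "/":
        # Like Varnish's regsub(): replace the first match only.
        return re.sub(r"(www)?\.(.*)", r"http://m.\2", host, count=1)
    return None
```

Feeding it a handful of real user-agent strings from your logs is a cheap way to catch overly broad patterns before they go live.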

Automatically create bugs in Jira with a Nagios eventhandler

The most important thing about this: don’t use it too often. It CAN make sense for really critical events, like warnings/criticals on partition space. For instance, if your MySQL server is running out of space on /var/lib/mysql and your operations team didn’t see the WARNING/CRITICAL notification from Nagios, having a bug created in Jira makes the problem even more visible.

Here’s how you do it.
First of all, make sure event handlers are enabled in Nagios (enable_event_handlers=1 in nagios.cfg).

Add something similar to this to your commands.cfg:

define command{
        command_name    jira_eventhandler
        command_line    $USER1$/jira_eventhandler -a morten -s $SERVICESTATE$ -t $SERVICESTATETYPE$ -A $SERVICEATTEMPT$ -H $HOSTNAME$ -S $SERVICEDESC$
        }

Add something similar to this to your services.cfg:
define service{
        use                     generic-service
        host_name               myhost
        service_description     CHECK_DISK_ROOT
        is_volatile             0
        max_check_attempts      3
        normal_check_interval   10
        retry_check_interval    1
        contact_groups          linux-admins
        notification_period     24x7
        notification_options    c,w,r
        check_command           check_remote_disk_nagios!10%!5%!/
        process_perf_data       1
        event_handler           jira_eventhandler
        flap_detection_enabled  0
        }

Finally, make sure the jira_eventhandler script is in place. You can download mine here: jira_eventhandler
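Nagios runs an event handler on every state change, SOFT ones included, which is why the command passes $SERVICESTATETYPE$ and $SERVICEATTEMPT$. Here is a minimal Python sketch of the decision part of such a handler. It takes the same flags as the command definition and only files an issue for HARD WARNING/CRITICAL states. This is not the downloadable jira_eventhandler script, and create_jira_issue is a hypothetical stub, since the actual Jira call depends on your Jira setup.

```python
import argparse
from typing import Optional


def parse_args(argv: Optional[list[str]] = None) -> argparse.Namespace:
    # Flags mirror the command_line in commands.cfg.
    p = argparse.ArgumentParser(description="Nagios event handler -> Jira (sketch)")
    p.add_argument("-a", dest="assignee", required=True)
    p.add_argument("-s", dest="state", required=True)               # $SERVICESTATE$
    p.add_argument("-t", dest="state_type", required=True)          # $SERVICESTATETYPE$
    p.add_argument("-A", dest="attempt", type=int, required=True)   # $SERVICEATTEMPT$
    p.add_argument("-H", dest="host", required=True)                # $HOSTNAME$
    p.add_argument("-S", dest="service", required=True)             # $SERVICEDESC$
    return p.parse_args(argv)


def should_create_issue(state: str, state_type: str) -> bool:
    """Only HARD WARNING/CRITICAL states get a bug; SOFT states may still recover."""
    return state_type == "HARD" and state in ("WARNING", "CRITICAL")


def create_jira_issue(args: argparse.Namespace) -> None:
    """Hypothetical stub: replace with your own Jira client or REST call."""
    print(f"Would create issue for {args.assignee}: "
          f"{args.host}/{args.service} is {args.state}")


def main(argv: Optional[list[str]] = None) -> int:
    args = parse_args(argv)
    if should_create_issue(args.state, args.state_type):
        create_jira_issue(args)
    return 0
```

Call main() from the script’s entry point. A WARNING that escalates to CRITICAL is two state changes and triggers the handler twice, so a real script may want to check for an existing open issue before creating another one.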
