Posted May 30, 2011 by morten
If you need gzip support in your nagios check_http plugin, here’s what you need to do.
First of all, fetch the latest version (1.4.15) of the nagios-plugins :
http://sourceforge.net/projects/nagiosplug/files/nagiosplug/1.4.15/
Extract the downloaded file somewhere (tar xzfv) and enter the nagios-plugins-1.4.15/plugins directory.
There you’ll find the check_http.c source file, which needs to be patched.
You can find the patch here :
http://sourceforge.net/tracker/index.php?func=detail&aid=3294169&group_id=29880&atid=397599
Patch the source file with the patch command : patch check_http.c checkhttpgzipdeflate.patch
Go up one directory (to nagios-plugins-1.4.15) and run ./configure && make
You’ll have a freshly compiled check_http plugin with gzip support in the plugins directory.
Copy it to your nagios-plugins directory or wherever you keep your maintained versions.
Tags: 1.4.15, check_http, gzip, nagios
Posted April 19, 2011 by MrBerry
Wanna have some fun with sudo?
A couple of neat tricks:
1. Insults when you type the wrong password:
echo "Defaults insults" >> /etc/sudoers
When your users type an incorrect password, they are insulted:
$ sudo su -
Password:
Are you on drugs?
2. Set a custom password prompt for when your users run sudo
Add this line to /etc/sudoers: Defaults passprompt="YOU BREAK IT, YOU FIX IT!:"
When people log in and try to use sudo, they get the modified password prompt:
user@server:~$ sudo su -
YOU BREAK IT, YOU FIX IT!:
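Appending to /etc/sudoers with echo skips the syntax check, and a broken sudoers file can lock you out of sudo. Here is a more careful sketch. It assumes your sudoers includes drop-ins from /etc/sudoers.d and that your sudo was built with insult support; the helper name write_sudo_fun and the file name 90-fun are made up for illustration.

```shell
#!/bin/sh
# Sketch only: write both Defaults lines to a scratch file so they can be
# syntax-checked before they go anywhere near the live sudoers config.
# write_sudo_fun is a made-up helper name, not part of sudo.
write_sudo_fun() {
    target=$1   # file to write
    prompt=$2   # text for the password prompt
    {
        printf 'Defaults insults\n'
        printf 'Defaults passprompt="%s"\n' "$prompt"
    } > "$target"
}

# Typical use, as root (assumes /etc/sudoers.d is included by your sudoers):
#   tmp=$(mktemp)
#   write_sudo_fun "$tmp" 'YOU BREAK IT, YOU FIX IT!:'
#   visudo -cf "$tmp" && install -m 0440 "$tmp" /etc/sudoers.d/90-fun
```

visudo -c only checks the syntax, so the file is installed only if the check passes.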
Any more tricks? Use comments :>
Tags: funny, insults, sudo
Posted February 10, 2011 by MrBerry
I’m just writing up this post because the dellomsa packages aren’t working with the new Debian Squeeze 6.0.
I had problems with the omreport command not giving me info about e.g. memory/PSU/CPU (omreport chassis info said "No sensors found" etc).
I spent some hours trying to get it working with a newer dellomsa, but that didn’t work either.
Then I found some official Dell Ubuntu packages, which I found to work excellently on Debian Squeeze as well:
dpkg -P dellomsa # Make sure dellomsa isn't installed.
echo 'deb http://linux.dell.com/repo/community/deb/latest /' | sudo tee -a /etc/apt/sources.list.d/linux.dell.com.sources.list
apt-get update
apt-get install srvadmin-base smbios-utils
You will also need the libsmbios2_2.2.13-0ubuntu4_amd64.deb from Ubuntu Lucid to get smbios stuff working.
dpkg -i libsmbios2_2.2.13-0ubuntu4_amd64.deb
/etc/init.d/dataeng start #if this starts, omreport works!
Now you have the newer Dell stuff working on Debian Squeeze.
We have deployed hardware monitoring of our Dell servers with check_openmanage and Nagios.
Read more about check_openmanage on the check_openmanage site (this is a great plugin btw!)
Resources:
http://folk.uio.no/trondham/software/check_openmanage.html
http://linux.dell.com/repo/community/deb/latest/
Tags: debian, dell, dellomsa, monitoring, nagios, squeeze
Posted February 3, 2011 by MrDingle
I didn’t know this until the other day, but how awesome is this – You can use screen to connect to your serial console
screen /dev/ttyS0 9600
VOILA – you’re in
Tags: minicom, screen, serial
Posted February 3, 2011 by MrDingle
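To test whether jumbo frames work all the way to a host, ping it with fragmentation prohibited (-M do) and a large payload size (-s):
ping -Mdo -s <size> <host>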
If it works:
$ ping -Mdo -s 8001 10.0.20.26
PING 10.0.20.26 (10.0.20.26) 8001(8029) bytes of data.
8009 bytes from 10.0.20.26: icmp_req=1 ttl=64 time=0.450 ms
8009 bytes from 10.0.20.26: icmp_req=2 ttl=64 time=0.468 ms (DUP!)
8009 bytes from 10.0.20.26: icmp_req=3 ttl=64 time=0.447 ms
If it doesn’t:
$ ping -Mdo -s 2001 195.10.34.51 -c3
PING 195.10.34.51 (195.10.34.51) 2001(2029) bytes of data.
From XX.XX.XX.XX icmp_seq=1 Frag needed and DF set (mtu = 1500)
From XX.XX.XX.XX icmp_seq=1 Frag needed and DF set (mtu = 1500)
From XX.XX.XX.XX icmp_seq=1 Frag needed and DF set (mtu = 1500)
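The sizes in the output follow from header overhead: ping adds an 8-byte ICMP header and a 20-byte IPv4 header to the payload given with -s, which is why 8001 shows up as 8029. A small sketch of that arithmetic follows. The function names are mine, it assumes IPv4 without IP options, and the 9000-byte MTU in the example is just a common jumbo frame size, not something taken from the setup above.

```shell
#!/bin/sh
# Largest ICMP payload (-s value) that fits in one unfragmented IPv4 packet:
# MTU minus 20 bytes of IPv4 header minus 8 bytes of ICMP header.
max_ping_payload() {
    echo $(( $1 - 28 ))
}

# Size of the resulting IP packet for a given -s value.
ping_packet_size() {
    echo $(( $1 + 28 ))
}

# Example (not run here): test a 9000-byte path to a host
#   ping -M do -c 3 -s "$(max_ping_payload 9000)" 10.0.20.26
```

For a standard 1500-byte MTU this gives 1472, so any -s value above that should produce the "Frag needed and DF set" error shown above.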
Tags: frame, jumbo, jumboframe, network, ping, test
Posted November 2, 2010 by morten
Funny FAIL bug in the Drupal imagecache module. Last night we had some serious trouble with our sites and watched the requests on our load balancers go from 1k to 10k per second. From our graphs we found out which site was being hammered, then checked varnishtop to see what was going on: two missing images were causing 307 temporary redirects. We fixed it quickly by touching the missing files, and the traffic went away. Today we did some research into what had happened, and with Firebug we found that the page was redirecting to itself and fetching the same missing image over and over. Here’s the rather naughty code in the imagecache module :
if (file_exists($lockfile)) {
  watchdog('imagecache', 'ImageCache already generating: %dst, Lock file: %tmp.', array('%dst' => $dst, '%tmp' => $lockfile), WATCHDOG_NOTICE);
  // 307 Temporary Redirect, to myself. Lets hope the image is done next time around.
  header('Location: '. request_uri(), TRUE, 307);
  exit;
}
Now imagine the image still isn’t there “the next time around”: the browser gets redirected to the same URL again, and again, and again.
FAIL!
Tags: 307, drupal, imagecache, loop, redirect
Posted June 14, 2010 by MrBerry
While installing a DVI splitter through an HDMI converter on our Jira dashboard, I found that the splitter fucked up the fancy automagic EDID (Extended Display Identification Data) validation and made the screen fall back to 640×480.
This made me shat brix.
After a lot of googling I found a fabulous option called IgnoreEDIDChecksum, which I put under the Screen section in xorg.conf.
Hurray for new fancy automagic-probe-validation-fuckups
Tags: edid, linux, nvidia, xorg
Posted April 28, 2010 by morten
The old dashboard we used earlier had a couple of issues. It showed all SOFT Nagios states, and it also listed every service per host that was down. Since it’s pretty obvious that a service is down on a host that is down, we wanted to change that. Instead of continuing the rather hard work of changing the dirty status.dat parsing, we dropped that project and checked out Merlin. Once installed and configured correctly, Merlin enables an event broker module in the Nagios config and updates Merlin’s MySQL database via the event broker. The database contains all hosts, statuses, state changes and so on, so this is what we ended up with : (Pic of our current dashboard in our office)

This dashboard lists only hosts that are down and not acknowledged in Nagios in its upper left corner. Then there’s a little tactical overview in the upper right corner (this will have more info shortly), and finally all unhandled service problems are listed below. Exactly what we want.
The bottom “toolbar” is transparent and has a countdown timer for page refresh and shows the current time.
Thanks again to Jonas, for the design!
You can download the 2 php files here : dashv2
Just change the login info in merlin.php to match your merlin database and it should run smoothly.
Note: The dashboard needs firefox 3.6.
Enjoy.
EDIT: Now available on github for those interested in contributing: http://github.com/mortis1337/nagios-dashboard
Tags: dashboard, merlin, nagios
Posted March 17, 2010 by morten
Our load balancers’ TMM load has gone up quite a bit lately, so I started looking into how to move some of the workload to Varnish instead. I came across this config example and pulled out the mobile redirects part. It’s a rather dirty hack, but it works. Varnish has no built-in way to issue an HTTP redirect, so you have to trigger an error and then pick it up in the vcl_error subroutine later.
This is what the redirect-config looks like on my test-system :
sub vcl_recv {
if ( req.http.user-agent ~ "(.*Blackberry.*|.*BlackBerry.*|.*Blazer.*|.*Ericsson.*|.*htc.*
|.*Huawei.*|.*iPhone.*|.*iPod.*|.*MobilePhone.*|.*Motorola.*|.*nokia.*
|.*Novarra.*|.*O2.*|.*Palm.*|.*Samsung.*|.*Sanyo.*|.*Smartphone.*
|.*SonyEricsson.*|.*Symbian.*|.*Toshiba.*|.*Treo.*|.*vodafone.*
|.*Xda.*|^Alcatel.*|^Amoi.*|^ASUS.*
|^Audiovox.*|^AU-MIC.*|^BenQ.*|^Bird.*|^CDM.*|^DoCoMo.*|^dopod.*
|^Fly.*|^Haier.*|^HP.*iPAQ.*|^imobile.*|^KDDI.*|^KONKA.*|^KWC.*
|^Lenovo.*|^LG.*|^NEWGEN.*|^Panasonic.*|^PANTECH.*|^PG.*|^Philips.*
|^portalmmm.*|^PPC.*|^PT.*|^Qtek.*|^Sagem.*|^SCH.*|^SEC.*|^Sendo.*
|^SGH.*|^Sharp.*|^SIE.*|^SoftBank.*|^SPH.*|^UTS.*|^Vertu.*
|.*Opera.Mobi.*|.*Windows.CE.*|^ZTE.*)"
&& req.http.host ~ "(www.somehost.com)"
&& req.url == "/") {
set req.http.newhost = regsub(req.http.host, "(www)?\.(.*)", "http://m.\2");
error 750 req.http.newhost;
}
}
sub vcl_error {
if (obj.status == 750) {
set obj.http.Location = obj.response;
set obj.status = 302;
deliver;
}
}
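To see what the regsub() call produces, you can try the same substitution outside Varnish. This is a rough sketch using sed -E as a stand-in; its extended regex syntax is close to what Varnish uses, but it is not the same engine. rewrite_host is a made-up name.

```shell
#!/bin/sh
# Mimics regsub(req.http.host, "(www)?\.(.*)", "http://m.\2") with sed.
# Group 2 captures everything after the first dot, so a leading "www" is dropped.
rewrite_host() {
    printf '%s\n' "$1" | sed -E 's#(www)?\.(.*)#http://m.\2#'
}

rewrite_host www.somehost.com   # prints http://m.somehost.com
```

In this sed stand-in, a host without the www prefix only matches from its first dot onward, so the part before the dot stays in front of the new URL. The host check in vcl_recv keeps that case out.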
Tags: mobile, redirect, varnish
Posted March 9, 2010 by morten
The most important part about creating Jira issues from Nagios event handlers is: don’t use it too often. It CAN make sense on really critical events, though, like warnings/criticals on partition space. For instance, if your MySQL server is running out of space on /var/lib/mysql and your operations team didn’t see the WARNING/CRITICAL notification from Nagios, it might be a good idea to have the bug created in Jira to make it even more visible.
Here’s how you do it.
First of all, be sure to have eventhandlers enabled in Nagios.
Configure your commands.cfg file to have something similar to this :
define command{
    command_name    jira_eventhandler
    command_line    $USER1$/jira_eventhandler -a morten -s $SERVICESTATE$ -t $SERVICESTATETYPE$ -A $SERVICEATTEMPT$ -H $HOSTNAME$ -S $SERVICEDESC$
}
Configure your services.cfg to have something similar to this :
define service{
    use                     generic-service
    host_name               myhost
    service_description     CHECK_DISK_ROOT
    is_volatile             0
    max_check_attempts      3
    normal_check_interval   10
    retry_check_interval    1
    contact_groups          linux-admins
    notification_period     24x7
    notification_options    c,w,r
    check_command           check_remote_disk_nagios!10%!5%!/
    process_perf_data       1
    event_handler           jira_eventhandler
    flap_detection_enabled  0
}
And be sure to have the jira_eventhandler script in place. You can download mine here : jira_eventhandler
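The script itself is not reproduced here. To give a rough idea of the shape such a handler can take, here is a minimal sketch that parses the same options as the command_line above and only acts on HARD WARNING/CRITICAL states. It is not the author's script: the function names are invented, limiting it to HARD states is my assumption, and the actual Jira call is left as a placeholder echo because it depends on your Jira setup.

```shell
#!/bin/sh
# Sketch of an event handler, NOT the original jira_eventhandler script.

# Decide whether a state change deserves an issue: only HARD problem states.
should_create_issue() {
    state=$1        # $SERVICESTATE$: OK, WARNING, CRITICAL, UNKNOWN
    state_type=$2   # $SERVICESTATETYPE$: SOFT or HARD
    [ "$state_type" = "HARD" ] || return 1
    case "$state" in
        WARNING|CRITICAL) return 0 ;;
        *) return 1 ;;
    esac
}

# Parse the options used in the command_line definition and act on them.
handle_event() {
    OPTIND=1
    assignee= state= state_type= attempt= host= service=
    while getopts "a:s:t:A:H:S:" opt; do
        case "$opt" in
            a) assignee=$OPTARG ;;
            s) state=$OPTARG ;;
            t) state_type=$OPTARG ;;
            A) attempt=$OPTARG ;;
            H) host=$OPTARG ;;
            S) service=$OPTARG ;;
            *) return 2 ;;
        esac
    done
    if should_create_issue "$state" "$state_type"; then
        # Placeholder: a real handler would talk to Jira here.
        echo "would create issue for $assignee: $service on $host is $state (attempt $attempt)"
    fi
}

handle_event "$@"
```

Nagios runs the event handler on every state change, including SOFT ones, so filtering on the state type inside the script keeps a single flapping check from opening a pile of issues.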
Tags: eventhandler, jira, monitoring, nagios