Playing with SaltStack and external classifiers

We have been discussing this a lot lately: how do we structure our SaltStack config so that we can make changes without risking breaking absolutely everything? Finding a good hierarchy is not always easy. How do we build it so that we can open it up later? How can we mostly leave the top.sls file alone? And how do we include the right config for the right minions without maintaining a list of minions in the config and letting it grow so large it becomes unreadable?

It turns out the solution was pretty easy for us, once we came up with the idea.

We already have an internally written CMDB solution, and we wanted to use it as an external classifier.
First we had to write a simple module that creates pillars from the data in our CMDB. More about that some other time; this post is about the structure we went for.
Our CMDB module creates pillars for all hosts containing the hostname, the status (dev, qa, prod) and the product (or role, if you prefer).
For a dev server this typically looks like this:

ourservername:
  status: development
  product: ourwebsite
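
The CMDB module itself is a topic for another post, but to give an idea of its shape, here is a minimal sketch of a Salt external pillar module. It assumes Salt's ext_pillar convention (a module-level ext_pillar(minion_id, pillar, *args, **kwargs) function whose returned dict is merged into that minion's pillar); FAKE_CMDB and lookup_host() are hypothetical stand-ins for a real CMDB query. The sketch nests the data under a cmdb key, since that is the key the cmdb:status:... targets and the pillar.get('cmdb', {}) lookup in this post rely on.

```python
"""Minimal sketch of a Salt ext_pillar module backed by a CMDB.

FAKE_CMDB and lookup_host() are hypothetical stand-ins for a real CMDB query.
"""
import logging

log = logging.getLogger(__name__)

# Hypothetical CMDB contents, keyed by minion ID.
FAKE_CMDB = {
    "ourservername": {"status": "development", "product": "ourwebsite"},
}


def lookup_host(minion_id):
    """Hypothetical CMDB lookup; returns a dict, or None if the host is unknown."""
    return FAKE_CMDB.get(minion_id)


def ext_pillar(minion_id, pillar, *args, **kwargs):
    """Return pillar data for one minion, nested under the 'cmdb' key.

    Salt merges the returned dict into the minion's pillar, so both the
    top.sls target 'cmdb:status:dev*' and pillar.get('cmdb', {}) can see it.
    """
    record = lookup_host(minion_id)
    if record is None:
        log.warning("Minion %s not found in CMDB", minion_id)
        return {}
    return {
        "cmdb": {
            "hostname": minion_id,
            "status": record["status"],
            "product": record["product"],
        }
    }
```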

What we ended up with was using file_roots in Salt, matching each of our environments like this:
(top.sls)
base:
  "*":
    - somebasestuff
  "cmdb:status:prod*":
    - match: pillar
    - role
  "cmdb:status:qa":
    - match: pillar
    - role
  "cmdb:status:dev*":
    - match: pillar
    - role

This matches the role.sls file in all three environments.
In each of the three environments we have two subfolders: “products” and “services”. The “products” folders contain the state of the finished product, built from services in the “services” folder. For instance, say you have a product called “yourwebsite”. It will probably contain installation and configuration of web, cache and db. Those three are reusable services under the services folder and don't change much.
In our role.sls we now match on the “product” pillar from our CMDB like this:

include:
  - products/{{ pillar.get('cmdb', {}).get('product') }}

This looks up the CMDB value for “product” and includes the matching item from the products folder. That means we don't need to maintain top.sls OR any hostnames in the Salt config. So far we think it's a good idea, but we'll see in a few weeks whether it actually lives up to our expectations.
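
To make the matching concrete, here is a simplified Python model of what happens for one minion: the three pillar targets from top.sls are checked with shell-style globbing against the nested cmdb pillar, and the include path from role.sls is rendered. This is an approximation written for illustration, not Salt's own matcher code; Salt's real pillar matcher handles more edge cases.

```python
"""Simplified model of the top.sls pillar targets and the role.sls include.

Not Salt's own matcher code; written only to illustrate the idea.
"""
from fnmatch import fnmatch

# Pillar targets as written in top.sls: "<key>:<key>:<glob>".
TARGETS = ["cmdb:status:prod*", "cmdb:status:qa", "cmdb:status:dev*"]


def pillar_glob_match(pillar, target, delimiter=":"):
    """Follow the key path through the nested pillar and glob-match the last part."""
    *path, pattern = target.split(delimiter)
    value = pillar
    for key in path:
        if not isinstance(value, dict) or key not in value:
            return False
        value = value[key]
    return fnmatch(str(value), pattern)


def product_include(pillar):
    """Mimic role.sls: products/{{ pillar.get('cmdb', {}).get('product') }}."""
    product = pillar.get("cmdb", {}).get("product")
    # Jinja renders None as "None", and so does str.format.
    return "products/{}".format(product)


if __name__ == "__main__":
    pillar = {"cmdb": {"status": "development", "product": "ourwebsite"}}
    print([t for t in TARGETS if pillar_glob_match(pillar, t)])  # ['cmdb:status:dev*']
    print(product_include(pillar))  # products/ourwebsite
    print(product_include({}))      # products/None
```

One thing the model makes visible: if a host has no product in the CMDB, Jinja renders the missing value as the string "None", so role.sls would try to include products/None and fail. Wrapping the include in a Jinja if-check is one way to guard against that.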

Anyway, I figured I should share our thoughts.

Retrieving files from a broken hard drive on Linux

We recently had some trouble with a software RAID (RAID1 in this case) that was set up with mdadm. dmesg showed that one drive had errors, so we replaced it. After the disk was replaced, the other disk also started reporting errors, and fsck told us that the superblock was fucked up and couldn't be read.

The first thing we did was to clone the broken disk onto the new one using ddrescue.

We tried rebuilding it by finding the backup superblocks:
mke2fs -n /dev/sdb1 # -n: don't create anything, just show what would be done (including backup superblock locations)
e2fsck -b number_from_output_above /dev/sdb1
While this works in many cases, it didn't help us. All the superblocks were gone.
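
If you need to try several backup superblocks, it can help to pull the numbers out of the mke2fs -n output programmatically. The sketch below assumes the usual "Superblock backups stored on blocks:" line followed by a comma-separated list, and it only builds the e2fsck command lines rather than running them. Keep in mind that mke2fs -n only reports where a new filesystem with the given parameters would put its backups, so the numbers are only meaningful if the parameters (block size in particular) match those used for the original filesystem.

```python
"""Sketch: extract backup superblock numbers from `mke2fs -n` output."""
import re

MARKER = "Superblock backups stored on blocks:"


def parse_backup_superblocks(mke2fs_output):
    """Return the backup superblock numbers listed in `mke2fs -n` output."""
    start = mke2fs_output.find(MARKER)
    if start == -1:
        return []
    rest = mke2fs_output[start + len(MARKER):]
    # The list may wrap over several lines; it ends at the first blank line.
    listing = re.split(r"\n\s*\n", rest, maxsplit=1)[0]
    return [int(n) for n in re.findall(r"\d+", listing)]


def e2fsck_commands(device, mke2fs_output):
    """Build (but do not run) one `e2fsck -b` attempt per backup superblock."""
    return [["e2fsck", "-b", str(block), device]
            for block in parse_backup_superblocks(mke2fs_output)]


if __name__ == "__main__":
    sample = (
        "Superblock backups stored on blocks: \n"
        "\t32768, 98304, 163840, 229376, 294912\n"
        "\n"
        "Allocating group tables: done\n"
    )
    for cmd in e2fsck_commands("/dev/sdb1", sample):
        print(" ".join(cmd))
```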

Thankfully we came across a tool named TestDisk that let us view and copy the data from the disk. Check it out, it’s awesome :)

Running Nagios plugins via SaltStack's peer communication system

So... my previous post was similar to this, but you most likely don't want to run the salt-master and Nagios on the same server, so I had to find a way to let the Nagios server execute its plugins on hosts via the salt-master. This can be done using Salt's Python client API and SaltStack's own peer communication system.

First of all, read this: http://docs.saltstack.com/ref/peer.html

Then check out my wrapper here: https://github.com/mortis1337/nagios-plugins/blob/master/check_by_saltpeer.py
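
The linked wrapper isn't reproduced here, but the core of the peer approach can be sketched roughly as follows. This assumes the Nagios host is itself a Salt minion, that the master's peer setting allows that minion to publish cmd.run_all, and that publish.publish returns a dict keyed by minion ID. The cmd argument is any callable shaped like salt.client.Caller().cmd, which keeps the logic testable without Salt installed.

```python
"""Rough sketch of running a Nagios plugin through Salt's peer system.

Assumptions: this host is a Salt minion, the master's `peer` setting lets it
publish cmd.run_all, and publish.publish returns {minion_id: run_all result}.
"""
NAGIOS_UNKNOWN = 3


def check_via_peer(cmd, target, plugin, timeout=10):
    """Run `plugin` on `target` via publish.publish; return (output, exit code).

    `cmd` is assumed to behave like salt.client.Caller().cmd.
    """
    returns = cmd("publish.publish", target, "cmd.run_all", [plugin],
                  timeout=timeout)
    if not isinstance(returns, dict):
        returns = {}
    result = returns.get(target)
    if not isinstance(result, dict):
        # No answer within the timeout, or an error string instead of a dict.
        return "UNKNOWN: no valid return from {}".format(target), NAGIOS_UNKNOWN
    output = result.get("stdout") or result.get("stderr", "")
    return output, result.get("retcode", NAGIOS_UNKNOWN)
```

On the Nagios host this would be called as check_via_peer(salt.client.Caller().cmd, "web01", "/usr/lib/nagios/plugins/check_load -w 5,4,3 -c 10,8,6"), where the minion ID and plugin path are hypothetical examples.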

Yay! Now you can throw away NRPE forever, and stop using SSH keys for the nagios user if you're already doing that.

Nagios plugins over ZeroMQ? I like it :)

Running Nagios plugins via SaltStack

I’m so sick of maintaining NRPE config on my servers, and I don't really want root SSH keys all over the place. Recently I discovered SaltStack and started to play with it a bit. I came up with the idea of running Nagios (or Icinga) on the same server as my salt-master, so I created a little wrapper that lets me run Nagios checks via SaltStack.

Here’s how it works.

This is my little wrapper-script written in python: https://github.com/mortis1337/nagios-plugins/blob/master/check_by_salt.py

The wrapper takes a hostname, a plugin and a timeout value as arguments:

$ python check_by_salt.py -H examplehost -p "/path/to/existing/nagiosplugin arg1 arg2" -t 10

The wrapper imports salt, runs the plugin on the minion with cmd.run_all, and returns the output and the exit code.
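
As a rough illustration of that flow (not the actual check_by_salt.py), here is a sketch that mirrors the -H/-p/-t arguments above, calls cmd.run_all through salt.client.LocalClient, and maps the result onto Nagios exit codes. It assumes LocalClient.cmd returns a dict keyed by minion ID, with minions that don't answer in time simply missing; treating those as UNKNOWN is a design choice of this sketch.

```python
"""Sketch of the check_by_salt idea; not the actual check_by_salt.py."""
import argparse

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def parse_args(argv):
    """Parse the -H/-p/-t arguments used in the example command line."""
    parser = argparse.ArgumentParser(description="Run a Nagios plugin via Salt")
    parser.add_argument("-H", "--host", required=True, help="target minion ID")
    parser.add_argument("-p", "--plugin", required=True,
                        help="plugin path and its arguments, as one string")
    parser.add_argument("-t", "--timeout", type=int, default=10,
                        help="seconds to wait for the minion")
    return parser.parse_args(argv)


def to_nagios(host, returns):
    """Map a {minion_id: cmd.run_all result} dict onto (output, Nagios exit code)."""
    result = returns.get(host) if isinstance(returns, dict) else None
    if not isinstance(result, dict):
        # The minion didn't answer in time, or returned an error string.
        return "UNKNOWN: no valid cmd.run_all return from {}".format(host), UNKNOWN
    output = result.get("stdout") or result.get("stderr", "")
    retcode = result.get("retcode", UNKNOWN)
    # Nagios only understands 0-3, so e.g. 127 (command not found) becomes UNKNOWN.
    if retcode not in (OK, WARNING, CRITICAL, UNKNOWN):
        retcode = UNKNOWN
    return output, retcode


def main(argv):
    """Run the plugin on one minion; needs access to the salt-master."""
    import salt.client  # imported here so to_nagios() works without Salt installed

    args = parse_args(argv)
    client = salt.client.LocalClient()
    returns = client.cmd(args.host, "cmd.run_all", [args.plugin],
                         timeout=args.timeout)
    output, code = to_nagios(args.host, returns)
    print(output)
    return code
```

A script built around this would end with sys.exit(main(sys.argv[1:])). The salt import sits inside main() so that to_nagios() can be tested on a machine without Salt; main() itself has to run somewhere that can talk to the master, as a user the master's ACL allows.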

For this to work as the nagios/icinga user, you have to configure client_acl for that user in the salt-master config, so go ahead and edit the master config file (default: /etc/salt/master). Newer Salt releases call this option publisher_acl.

Search for “client_acl” in the file and add this:

client_acl:
  icinga:
    - cmd.*

Yeeaaaap, that's quite the security risk right there, but read up on how to limit what can be done with the cmd module in Salt, and at least it will be safer than using SSH keys :)

check_by_salt, in combination with https://github.com/mortis1337/nagios-plugins/blob/master/check_disk_generic.py, will instantly give you monitoring of all your disks with no client-side configuration.
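
The post doesn't describe how check_disk_generic.py works. Purely as a hypothetical illustration of the idea (check every mounted filesystem without any per-host configuration), a plugin run on the minion could parse df -P output and alert on the fullest filesystem. The 80%/90% thresholds and the message format here are assumptions.

```python
"""Hypothetical sketch: check every mounted filesystem from `df -P` output.

Not check_disk_generic.py; thresholds and messages are assumptions.
"""
import subprocess

OK, WARNING, CRITICAL = 0, 1, 2


def parse_df(df_output):
    """Return {mount point: percent used} from `df -P` output."""
    usage = {}
    for line in df_output.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) < 6:
            continue
        # POSIX columns: Filesystem, size, Used, Available, Capacity, Mounted on
        capacity = fields[4].rstrip("%")
        if not capacity.isdigit():  # e.g. "-" for some pseudo filesystems
            continue
        usage[" ".join(fields[5:])] = int(capacity)
    return usage


def evaluate(usage, warn=80, crit=90):
    """Return (exit code, message) based on the fullest filesystem."""
    details = ", ".join("{}={}%".format(m, usage[m]) for m in sorted(usage))
    worst = max(usage.values(), default=0)
    if worst >= crit:
        return CRITICAL, "DISK CRITICAL: " + details
    if worst >= warn:
        return WARNING, "DISK WARNING: " + details
    return OK, "DISK OK: " + details


def run_check():
    """Run df -P locally (i.e. on the minion) and evaluate the result."""
    out = subprocess.run(["df", "-P"], capture_output=True, text=True,
                         check=True).stdout
    return evaluate(parse_df(out))
```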

Use it if you like it and feel free to improve it.

How a nerd monitors his wife’s weight

So I got myself a new body scale recently. Of course it had to be something of a gadget, so I went for the Withings BodyScale. Withings already has a nice web page with graphs and stuff, and also a couple of really nice iPhone/iPad apps for it. The fact that it is integrated with other services like Runkeeper made me wonder whether they had an API I could query. They did. A quick search for “python withings api” also turned up some examples of how to use it.

I came across this: https://github.com/mote/python-withings ...and then it was pretty much just a matter of writing a bit of Nagios logic around it to turn it into a plugin.

The first result is here: https://github.com/mortis1337/check_wife

The script takes a user ID, an API key and a name as arguments.

$ ./check_wife.py  -u 1111111 -k xxxxxxxxxxxxx -n Your(or your wife’s;)name
WARNING: <yourname>’s overweight. Size: <yoursize> – Weight: <yourweight> BMI: <yourbmi>

The script gives a WARNING whenever the BMI is above 25 or below 18.5.
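
The threshold logic is simple enough to sketch on its own. BMI is weight in kilograms divided by the square of height in metres, and the check below warns above 25 or below 18.5, as described. The Withings API calls and check_wife's exact output format are left out; the "underweight" wording is an assumption of this sketch.

```python
"""Sketch of the BMI thresholds described above (not the check_wife source)."""

OK, WARNING = 0, 1


def bmi(weight_kg, height_m):
    """Body mass index: weight in kilograms divided by height in metres squared."""
    return weight_kg / height_m ** 2


def check_bmi(name, weight_kg, height_m):
    """Return (Nagios exit code, message): WARNING above 25 or below 18.5."""
    value = bmi(weight_kg, height_m)
    if value > 25:
        return WARNING, "WARNING: {}'s overweight. BMI: {:.1f}".format(name, value)
    if value < 18.5:
        return WARNING, "WARNING: {}'s underweight. BMI: {:.1f}".format(name, value)
    return OK, "OK: {}'s BMI is {:.1f}".format(name, value)
```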

Add this to your Nagios config, and your operators can come point and laugh at you whenever a WARNING occurs :)

(Yes, the “wife” part is a joke... go monitor your own weight ;) )

gzip support in check_http

If you need gzip support in your nagios check_http plugin, here’s what you need to do.
First of all, fetch the latest version (1.4.15) of the nagios-plugins:

http://sourceforge.net/projects/nagiosplug/files/nagiosplug/1.4.15/

Extract the downloaded file somewhere with tar xzvf and enter the nagios-plugins-1.4.15/plugins directory.
Here you'll find the check_http.c source file, which needs to be patched.
You can find the patch here:

http://sourceforge.net/tracker/index.php?func=detail&aid=3294169&group_id=29880&atid=397599

Patch the source file with the patch command: patch check_http.c checkhttpgzipdeflate.patch
Go up one directory and run ./configure && make
You'll have a freshly compiled check_http plugin with gzip support in the plugins directory.
Copy it to your nagios-plugins directory, or wherever you keep maintained versions.
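
For context on why the patch matters: when a server answers with a compressed body, a content check such as check_http's -s (expected string) has to decompress the body before searching it. This Python sketch illustrates the idea and is not the patch itself; it decodes gzip and deflate bodies, and the fallback to raw deflate is there because some servers send deflate data without the zlib wrapper.

```python
"""Illustration of gzip/deflate handling for a content check (not the C patch)."""
import gzip
import zlib


def decode_body(body, content_encoding):
    """Decode an HTTP body according to its Content-Encoding header value."""
    encoding = (content_encoding or "").strip().lower()
    if encoding == "gzip":
        return gzip.decompress(body)
    if encoding == "deflate":
        # 'deflate' should be zlib-wrapped, but some servers send raw deflate.
        try:
            return zlib.decompress(body)
        except zlib.error:
            return zlib.decompress(body, -zlib.MAX_WBITS)
    return body


def body_contains(body, content_encoding, expected):
    """Search the decoded body for the expected string, as a -s style check would."""
    return expected.encode() in decode_body(body, content_encoding)
```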

Fun with sudo

Wanna have some fun with sudo?

A couple of neat tricks:

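1. Insults when users type the wrong password:
echo "Defaults insults" >> /etc/sudoers
(This has to run as root. Adding the line with visudo instead is safer, since visudo checks the syntax before saving and a broken sudoers file can lock you out of sudo.)
When your users type an incorrect password, they get insulted: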
$ sudo su -
Password:
Are you on drugs?

2. Make a custom password prompt for when your users sudo.
Add this line to /etc/sudoers: Defaults passprompt="YOU BREAK IT, YOU FIX IT!:"
When people log in and try to use sudo, they get a modified password prompt:
user@server:~$ sudo su -
YOU BREAK IT, YOU FIX IT!:

Any more tricks? Use comments :>

Monitor Dell servers on Debian Squeeze with Nagios

I'm writing up this post because the dellomsa packages aren't working on the new Debian Squeeze 6.0.

I had problems with the omreport command not giving me information about, for example, memory, PSUs or CPUs (omreport chassis info said "No sensors found", etc.).

I spent some hours trying to get it working with a newer dellomsa, but that didn't work either.
Then I found some official Dell Ubuntu packages, which I found to work excellently on Debian Squeeze as well:
dpkg -P dellomsa #Make sure dellomsa isnt installed.
echo 'deb http://linux.dell.com/repo/community/deb/latest /' | sudo tee -a /etc/apt/sources.list.d/linux.dell.com.sources.list
apt-get update
apt-get install srvadmin-base smbios-utils

You will also need the libsmbios2_2.2.13-0ubuntu4_amd64.deb from Ubuntu Lucid to get smbios stuff working.
dpkg -i libsmbios2_2.2.13-0ubuntu4_amd64.deb
/etc/init.d/dataeng start #if this starts, omreport works!

Now you have the newer Dell tools working on Debian Squeeze.

We have deployed hardware monitoring of our Dell servers with check_openmanage and Nagios.
Read more about check_openmanage on the check_openmanage site (this is a great plugin, btw!)

Resources:
http://folk.uio.no/trondham/software/check_openmanage.html
http://linux.dell.com/repo/community/deb/latest/

Use screen instead of !”¤#¤”&”# minicom

I didn’t know this until the other day, but how awesome is this – You can use screen to connect to your serial console :)

screen /dev/ttyS0 9600

VOILA – you’re in

Test your jumbo frame enabled network with ping

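ping -Mdo -s <packetsize> <host>
(-M do sets the Don't Fragment bit, so an oversized packet fails instead of being fragmented; -s sets the ICMP payload size in bytes.)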

If it works:
$ ping -Mdo -s 8001 10.0.20.26
PING 10.0.20.26 (10.0.20.26) 8001(8029) bytes of data.
8009 bytes from 10.0.20.26: icmp_req=1 ttl=64 time=0.450 ms
8009 bytes from 10.0.20.26: icmp_req=2 ttl=64 time=0.468 ms (DUP!)
8009 bytes from 10.0.20.26: icmp_req=3 ttl=64 time=0.447 ms

If it doesn't:
$ ping -Mdo -s 2001 195.10.34.51 -c3
PING 195.10.34.51 (195.10.34.51) 2001(2029) bytes of data.
From XX.XX.XX.XX icmp_seq=1 Frag needed and DF set (mtu = 1500)
From XX.XX.XX.XX icmp_seq=1 Frag needed and DF set (mtu = 1500)
From XX.XX.XX.XX icmp_seq=1 Frag needed and DF set (mtu = 1500)
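
The sizes in the output follow from IPv4 and ICMP header overhead: ping's -s sets the ICMP payload, the IPv4 header (20 bytes without options) and the ICMP header (8 bytes) come on top, and each reply line counts the ICMP header plus payload. That's why 8001 shows up as 8001(8029) and as 8009 bytes per reply. The same arithmetic gives the largest payload that fits a given MTU, for example 8972 for a 9000-byte MTU and 1472 for the standard 1500, which is why the 2001-byte ping above fails on a path with mtu = 1500.

```python
"""Header arithmetic behind ping's sizes (IPv4 without options, ICMP echo)."""

IPV4_HEADER = 20
ICMP_HEADER = 8


def max_payload(mtu):
    """Largest `ping -s` value that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def packet_size(payload):
    """Total IP packet size ping shows in parentheses, e.g. 8001(8029)."""
    return payload + IPV4_HEADER + ICMP_HEADER


def reply_bytes(payload):
    """Bytes ping reports on each reply line: ICMP header plus payload."""
    return payload + ICMP_HEADER


if __name__ == "__main__":
    print(max_payload(9000))  # 8972
    print(max_payload(1500))  # 1472
```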
