I’ve been using Nagios for the better part of a decade. It’s an incredibly powerful monitoring platform that’s highly extensible. About 4 or so years ago I discovered an excellent replacement for Nagios’s (outdated) interface, NPCD, and many other Nagios plugins: Check_MK. Check_MK is very light-weight and scales excellently. It can also be made to run entirely over SSH, which makes dealing with corporate firewalls a piece of cake.
Setting up a basic installation is fairly straightforward and covered thoroughly in Check_MK’s online documentation, so I’m not going to bore you with it. What I found less intuitive was configuring the distributed single-pane-of-glass interface known as Multisite. Multisite lets you consolidate several Nagios+Check_MK installations into a single view. Unfortunately, by default all of its traffic goes over cleartext and requires you to expose xinetd to the world - gross. Getting Multisite to tunnel over SSH is not difficult, but it is also not documented. I found several threads on the check_mk mailing list where users asked how to do it, but no one ever had a solution. Here’s mine.
Prerequisites
Start with two CentOS systems, built using the Minimal package group. One system, which we’ll call master.example.com, is going to be your single-pane-of-glass. The second system, slave.example.com, will be the remote site you want to view.
As of CentOS 6, the mod_python package is no longer included in base, which means EPEL is required. If you’re installing this on RHEL 6, don’t forget to subscribe to the rhel-x86_64-server-optional-6 repo.
# curl -O http://[mirror]/fedora/epel/6/i386/epel-release-6-8.noarch.rpm
# yum localinstall epel-release-6-8.noarch.rpm
Next, let’s install the required packages.
# yum install -y gcc gcc-c++ man make httpd gd-devel perl wget \
samba-client postgresql-devel openssh-clients \
openldap-devel net-snmp net-snmp-utils \
bind-utils mysql mysql-devel rpcbind mod_python \
mod_ssl php rrdtool-perl perl-Time-HiRes php-gd
If you want Perl Net::SNMP checks, RADIUS checks, and fping, you’ll also need to install these optional packages:
# yum install -y perl-Net-SNMP radiusclient-ng-devel fping
Monitoring Software Installation
Once you’ve got all the required software, go ahead and download the source packages for Nagios, nagios-plugins, check_mk, and pnp4nagios:
nagios-3.5.1.tar.gz - Skip to download
nagios-plugins-2.0.3.tar.gz - Handy to have
check_mk-1.2.5i5p2.tar.gz - Live dangerously: get the innovation release
pnp4nagios-0.6.24.tar.gz - Pretty graphs
I configure each in the above order, and I install them all into /usr/local/, which is probably a holdover from my long-time romance with FreeBSD. We’ll start with Nagios:
First create the Nagios user. For some reason this has been broken in the source package for several years, and no one has bothered to fix it.
# useradd -r -d /var/log/nagios -s /bin/sh -G apache -c "nagios" nagios
Then the standard unpack, configure, and make commands, plus a bunch of extra makes:
# tar -zxvf nagios-3.5.1.tar.gz
# cd nagios
# ./configure
# make all
# make install
# make install-init
# make install-commandmode
# make install-config
# make install-webconf
# make install-exfoliation
Nagios-plugins is pretty simple. Just watch the configure output for any plugins that get skipped due to missing dependencies, in case you actually want them.
# tar -zxvf nagios-plugins-2.0.3.tar.gz
# cd nagios-plugins-2.0.3
# ./configure
# make
# make install
Check_MK is a little bit different. It comes with a setup.sh script that walks you through the configuration directories, then compiles itself and performs the install. Here are the answers I use to get everything installed under /usr/local/check_mk/:
# tar -zxvf check_mk-1.2.5i5p2.tar.gz
# cd check_mk-1.2.5i5p2
# ./setup.sh
Executable programs /usr/local/bin
Check_MK configuration /usr/local/check_mk/etc
Check_MK software /usr/local/check_mk
documentation /usr/local/check_mk/doc
check manuals /usr/local/check_mk/doc/checks
working directory of Check_MK /usr/local/check_mk/var/lib
extensions for agents /usr/local/check_mk
configuration dir for agents /usr/local/check_mk/etc
Name of Nagios user nagios
User of Apache process apache
Common group of Nagios+Apache nagios
Nagios binary /usr/local/nagios/bin/nagios
Nagios main configuration file /usr/local/nagios/etc/nagios.cfg
Nagios object directory /usr/local/nagios/etc/check_mk.d
Nagios startskript /etc/init.d/nagios
Nagios command pipe /usr/local/nagios/var/rw/nagios.cmd
Check results directory /usr/local/nagios/var/spool/checkresults
Nagios status file /usr/local/nagios/var/status.dat
Path to check_icmp /usr/local/nagios/libexec/check_icmp
URL Prefix for Web addons /[SITE NAME]/ !! CHANGE THIS TO YOUR SITE NAME !!
Apache config dir /etc/httpd/conf.d
HTTP authentication file /usr/local/nagios/etc/htpasswd.users
HTTP AuthName Nagios Access
PNP4Nagios templates /usr/local/pnp4nagios/share/templates
RRD files /usr/local/check_mk/pnp-rraconf
rrdcached socket /tmp/rrdcached.sock
compile livestatus module yes
Nagios / Icinga version 3.5.1
check_mk's binary modules /usr/local/check_mk/lib
Unix socket for Livestatus /usr/local/nagios/var/rw/live
Backends for other systems /usr/local/check_mk/share/livestatus
Install Event Console no
Pay very close attention to the URL Prefix for Web addons configuration value. This is going to be the key value for your site name in later parts of the Multisite configuration. To keep with my example, my two installations will use master and slave as the values here.
Lastly, we set up PNP4Nagios, which gives us the awesome RRD graphs for all our services. It also uses the same site prefix, so be mindful:
# tar -zxvf pnp4nagios-0.6.24.tar.gz
# cd pnp4nagios-0.6.24
# ./configure --with-base-url=/[SITE NAME]/pnp4nagios
# make all
# make fullinstall
Configuration Files
Now, let’s get configuring! For brevity, I’ve summarized my changes below:
/usr/local/nagios/etc/nagios.cfg:
# Comment out the localhost config:
#cfg_file=/usr/local/nagios/etc/objects/localhost.cfg
cfg_dir=/usr/local/nagios/etc/check_mk.d
broker_module=/usr/local/check_mk/lib/livestatus.o /usr/local/nagios/var/rw/live
broker_module=/usr/local/pnp4nagios/lib/npcdmod.o config_file=/usr/local/pnp4nagios/etc/npcd.cfg
use_syslog=0
check_for_updates=0
process_performance_data=1
admin_email=root@localhost
admin_pager=root@localhost
You’ll have to read Check_MK’s online documentation to understand what each option is doing, but this is a pretty good starter config. Each system will monitor only the hosts it can reach. For example, if your master site was local to you in San Francisco, it might monitor all your California resources and not have access to resources in your remote branch office in New York (thus the slave server).
Check_MK supports all kinds of configuration options, most of which rely on the tags defined in the all_hosts variable. You can set a different data-acquisition method for different kinds of systems (in my example, executing the local Check_MK agent directly, or remotely via SSH). You could also use SNMP or UNIX sockets.
/usr/local/check_mk/etc/main.mk:
all_hosts = [
'master.example.com|local',
'host-monitored-by-master.example.com|ssh',
]
datasource_programs = [
( "/usr/bin/sudo /usr/local/check_mk/agents/check_mk_agent.linux", [ 'local' ], ALL_HOSTS ),
( "ssh -i ~nagios/.ssh/id_rsa nagios@<IP> sudo /usr/local/bin/check_mk_agent", [ 'ssh' ], ALL_HOSTS ),
]
ipaddresses = {
"master.example.com" : "127.0.0.1",
}
extra_service_conf["normal_check_interval"] = [
( '5', ALL_HOSTS, [ "" ] ),
]
extra_host_conf["max_check_attempts"] = [
( '3', ALL_HOSTS ),
]
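To make the tag mechanism a little more concrete, here is a rough Python sketch of how a host's tags could be matched against the datasource_programs rules above. This is only an illustration of the idea and not Check_MK's actual code: the ALL_HOSTS stand-in and the split_host_entry and pick_datasource helpers are made up for this example, and the <IP> substitution is modeled as a plain string replace.

```python
# Illustrative sketch only: NOT Check_MK's own implementation.
# ALL_HOSTS here is a hypothetical stand-in meaning "rule applies to every host".
ALL_HOSTS = ["@all"]

all_hosts = [
    "master.example.com|local",
    "host-monitored-by-master.example.com|ssh",
]

datasource_programs = [
    ("/usr/bin/sudo /usr/local/check_mk/agents/check_mk_agent.linux", ["local"], ALL_HOSTS),
    ("ssh -i ~nagios/.ssh/id_rsa nagios@<IP> sudo /usr/local/bin/check_mk_agent", ["ssh"], ALL_HOSTS),
]


def split_host_entry(entry):
    """Split 'name|tag1|tag2' into the host name and a set of tags."""
    name, *tags = entry.split("|")
    return name, set(tags)


def pick_datasource(hostname, address):
    """Return the first command whose tags all match the host, or None."""
    for entry in all_hosts:
        name, tags = split_host_entry(entry)
        if name != hostname:
            continue
        for command, required_tags, hostlist in datasource_programs:
            # A rule applies if it targets all hosts or names this host explicitly.
            if hostlist is not ALL_HOSTS and hostname not in hostlist:
                continue
            if set(required_tags) <= tags:
                return command.replace("<IP>", address)
    return None
```

With the example configuration, pick_datasource("master.example.com", "127.0.0.1") selects the sudo rule, while the host tagged ssh gets the SSH command with its address filled in.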
System Configuration and Cleanup
That’s basically it for the configuration files needed to get started. Now let’s configure the systems themselves.
Enable services to start at boot:
# chkconfig httpd on
# chkconfig nagios on
# chkconfig npcd on
Fix permissions and deal with SELinux (read: switch it to permissive mode):
# mkdir /usr/local/check_mk/var/lib/web/admin
# chown apache:nagios /usr/local/check_mk/var/lib/web/admin
# chmod 770 /usr/local/check_mk/var/lib/web/admin
# setenforce 0
# echo 'SELINUX=permissive' >> /etc/sysconfig/selinux
Add some sudo permissions for the nagios user:
# echo 'Defaults:nagios !requiretty' >> /etc/sudoers
# echo 'nagios ALL = (root) NOPASSWD: /usr/local/check_mk/agents/check_mk_agent.linux' >> /etc/sudoers
Create an inventory of services to monitor on our systems:
# check_mk -I master.example.com host-monitored-by-master.example.com
Rebuild and restart Nagios / Check_MK:
# check_mk -R
Start the other services:
# service httpd start
# service npcd start
Create an HTTPD user and password (use the same htpasswd file on both systems):
# htpasswd -c -s /usr/local/nagios/etc/htpasswd.users nagiosadmin
Now to complete the installation of PNP4Nagios, open http://master.example.com/[SITE NAME]/pnp4nagios/ in your browser, then remove or rename the installation script:
# mv /usr/local/pnp4nagios/share/install.php{,.orig}
Multisite Configuration
Once you repeat the above steps for the slave.example.com server, you’re ready to configure Multisite. You’ll want to create an SSH public/private keypair for the nagios user. This keypair will be used to establish the SSH tunnel, but nothing else. We’ll lock it down to keep things safe. The keypair should only reside on master.example.com, and we’ll copy the public key into slave.example.com’s authorized_keys file.
Master:
# su - nagios
$ mkdir .ssh
$ chmod 700 .ssh
$ cd .ssh
$ ssh-keygen -t rsa -b 4096 -f ./id_rsa
$ chmod 400 id_rsa
$ chmod 444 id_rsa.pub
Slave:
# su - nagios
$ mkdir .ssh
$ chmod 700 .ssh
$ cd .ssh
$ vi authorized_keys
command="exit",no-pty,permitopen="localhost:80",permitopen="localhost:6557",permitopen="localhost:2000",permitopen="localhost:2001" ssh-rsa AAAAAAA..long-key..ZZZZZ
The options at the start of the authorized_keys entry keep the nagios user from being able to do pretty much anything except get our host data and forward a few ports. The only command it can run is “exit”, it can’t open a pseudo-terminal, and it can only forward the ports we’ve specified: 80 (HTTP), 6557 (Livestatus), and 2000 & 2001 (SSH tunnel status checks).
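If you would rather generate that entry than type it by hand, something like the following Python sketch could do it. The restricted_key_entry helper is hypothetical (it is not part of OpenSSH or Check_MK); it simply joins the same options shown above in front of the public key.

```python
def restricted_key_entry(public_key, ports, host="localhost"):
    """Build an authorized_keys line that only allows forwarding to host:port pairs.

    Hypothetical helper for illustration: it emits the same option list used
    in this walkthrough (command="exit", no-pty, one permitopen per port).
    """
    options = ['command="exit"', "no-pty"]
    options += ['permitopen="%s:%d"' % (host, port) for port in ports]
    return ",".join(options) + " " + public_key.strip()


# Placeholder key text, exactly as abbreviated in the example above.
entry = restricted_key_entry("ssh-rsa AAAAAAA..long-key..ZZZZZ", [80, 6557, 2000, 2001])
print(entry)
```

Keeping the port list in one place makes it harder to forget a permitopen when you add another forwarded port later.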
We’re going to start by setting up xinetd on slave.example.com. Don’t worry, it’ll only listen on localhost.
# yum install -y xinetd
# cat << 'END' > /etc/xinetd.d/livestatus
service livestatus {
bind = 127.0.0.1
type = UNLISTED
port = 6557
socket_type = stream
protocol = tcp
wait = no
cps = 100 3
instances = 500
per_source = 250
flags = NODELAY
user = nagios
server = /usr/local/bin/unixcat
server_args = /usr/local/nagios/var/rw/live
only_from = 127.0.0.1 ::1
disable = no
log_type = SYSLOG daemon info
}
END
# chkconfig xinetd on
# service xinetd start
Now on master.example.com we’re going to set up autossh to establish and maintain the SSH tunnel.
# yum install -y autossh
# su - nagios
$ cat << 'END' > ~/autossh.bash
#!/bin/bash
autossh -f -M 2000 -i ~nagios/.ssh/id_rsa -L 8081:localhost:80 -L 6558:localhost:6557 -N nagios@slave.example.com
END
$ chmod a+x autossh.bash
$ crontab -e
@reboot bash ~/autossh.bash
$ ./autossh.bash
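Before moving on, it can be worth checking that Livestatus is actually reachable through the tunnel. The Python sketch below is one way to do that, assuming the tunnel forwards local port 6558 as configured above. The livestatus_query helper is my own illustration rather than anything shipped with Check_MK; it sends a plain Livestatus query, closes the write side of the socket to mark the end of the request, and splits the semicolon-separated reply into rows.

```python
import socket


def livestatus_query(query, host="localhost", port=6558, timeout=5.0):
    """Send one Livestatus query over TCP and return rows as lists of strings.

    Hypothetical helper: host and port default to the local end of the
    SSH tunnel used in this walkthrough.
    """
    if not query.endswith("\n"):
        query += "\n"
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(query.encode("utf-8"))
        # Closing the write side tells the server the query is complete.
        sock.shutdown(socket.SHUT_WR)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    text = b"".join(chunks).decode("utf-8")
    # Default Livestatus output is one line per row, columns separated by ';'.
    return [line.split(";") for line in text.splitlines() if line]
```

Run as the nagios user on master.example.com, a call such as livestatus_query("GET hosts\nColumns: name state") should return one row per host known to the slave site. A connection error usually means the tunnel or xinetd is not up.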
Lastly we need to tell Multisite about our other site and set up HTTPD to proxy the requests through our tunnel.
# vi /usr/local/check_mk/etc/multisite.mk
sites = {
"master" : {
"alias" : "Master Site"
},
"slave": {
"alias" : "Slave Site",
"socket": "tcp:localhost:6558",
"url_prefix": "/slave/",
},
}
# vi /etc/httpd/conf/httpd.conf
<Location /slave>
RewriteEngine On
RewriteRule ^/.+/slave/(.*) http://localhost:8081/slave/$1 [P]
</Location>
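Since the site name, the socket port, and the tunnel's forwarded ports all have to line up, a quick sanity check can save some head-scratching. The following Python sketch is purely illustrative: check_sites is a hypothetical helper, and the rules it enforces (a tcp:host:port socket whose port is one of the tunnel's local ports, and a url_prefix of /<site name>/) are just the conventions used in this walkthrough, not requirements documented by Check_MK.

```python
def check_sites(sites, forwarded_ports):
    """Return a list of human-readable problems found in a Multisite sites dict.

    Hypothetical helper: forwarded_ports is the set of local ports opened by
    the SSH tunnel (the left-hand side of each -L option).
    """
    problems = []
    for site_id, site in sites.items():
        socket_spec = site.get("socket")
        if socket_spec is None:
            continue  # no socket given: treated here as the local site
        parts = socket_spec.split(":")
        if len(parts) != 3 or parts[0] != "tcp" or not parts[2].isdigit():
            problems.append("%s: unexpected socket %r" % (site_id, socket_spec))
        elif int(parts[2]) not in forwarded_ports:
            problems.append("%s: port %s is not forwarded by the tunnel" % (site_id, parts[2]))
        if site.get("url_prefix") != "/%s/" % site_id:
            problems.append("%s: url_prefix should be /%s/" % (site_id, site_id))
    return problems


sites = {
    "master": {"alias": "Master Site"},
    "slave": {
        "alias": "Slave Site",
        "socket": "tcp:localhost:6558",
        "url_prefix": "/slave/",
    },
}
print(check_sites(sites, {8081, 6558}))
```

An empty list means the example configuration is consistent with the two -L forwards in autossh.bash.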
Go ahead and restart Nagios/Check_MK and HTTPD, and you should be all set.
# check_mk -R
# service httpd restart
If all went according to plan, you should be able to go to http://master.example.com/master/check_mk/ and see the systems monitored by both the master and slave servers! You should probably go ahead and configure HTTPD to use SSL, as well as configure iptables or another suitable software/hardware firewall to limit traffic appropriately.