ADUFRAY

I’ve been using Nagios for the better part of a decade. It’s an incredibly powerful monitoring platform that’s highly extensible. About four years ago I discovered an excellent replacement for Nagios’s (outdated) interface, NPCD, and many other Nagios plugins: Check_MK. Check_MK is very lightweight and scales excellently. It can also be made to run entirely over SSH, which makes dealing with corporate firewalls a piece of cake.

Setting up a basic installation is fairly straightforward and covered thoroughly in Check_MK’s online documentation. I’m not going to bore you with it. What I found less intuitive was configuring the distributed single-pane-of-glass interface known as Multisite. Multisite lets you consolidate several Nagios+Check_MK installations into a single view. Unfortunately, by default it all runs over cleartext and requires you to expose xinetd to the world - gross. Getting Multisite to tunnel over SSH is not difficult, but it is also not documented. I found several threads in the check_mk mailing list where users asked how to do it, but no one ever had a solution. Here’s mine.

Prerequisites

Start with two CentOS systems, built using the Minimal package group. One system, we’ll call it master.example.com, is going to be your single-pane-of-glass. The second system, slave.example.com, will be your remote site you want to view.

As of CentOS 6, the mod_python package is no longer included in base, which means EPEL is required. If you’re installing this on RHEL 6, don’t forget to subscribe to the rhel-x86_64-server-optional-6 repo.

# curl -O http://[mirror]/fedora/epel/6/i386/epel-release-6-8.noarch.rpm
# yum localinstall epel-release-6-8.noarch.rpm

Next, let’s install the required packages.

# yum install -y gcc gcc-c++ man make httpd gd-devel perl wget   \
                 samba-client postgresql-devel openssh-clients   \
                 openldap-devel net-snmp net-snmp-utils          \
                 bind-utils mysql mysql-devel rpcbind mod_python \
                 mod_ssl php rrdtool-perl perl-Time-HiRes php-gd 

If you want Perl Net::SNMP checks, RADIUS checks, and fping, you’ll also need to install these optional packages:

# yum install -y perl-Net-SNMP radiusclient-ng-devel fping 

Monitoring Software Installation

Once you’ve got all the required software, go ahead and download the source packages for Nagios, nagios-plugins, check_mk, and pnp4nagios:

nagios-3.5.1.tar.gz - Skip to download

nagios-plugins-2.0.3.tar.gz - Handy to have

check_mk-1.2.5i5p2.tar.gz - Live dangerously: get the innovation release

pnp4nagios-0.6.24.tar.gz - Pretty graphs

I configure each in the above order, and I install them all into /usr/local/, which is probably a holdover from my long-time romance with FreeBSD. We’ll start with Nagios:

First create the Nagios user. For some reason this has been broken in the source package for several years, and no one has bothered to fix it.

# useradd -r -d /var/log/nagios -s /bin/sh -G apache -c "nagios" nagios

Then the standard unpack, configure, and make commands, plus a bunch of extra makes:

# tar -zxvf nagios-3.5.1.tar.gz
# cd nagios
# ./configure
# make all
# make install
# make install-init
# make install-commandmode
# make install-config
# make install-webconf
# make install-exfoliation

Nagios-plugins is pretty simple. Just watch out for any plugins that get skipped due to missing dependencies, in case you actually want them.

# tar -zxvf nagios-plugins-2.0.3.tar.gz
# cd nagios-plugins-2.0.3
# ./configure
# make
# make install

Check_MK is a little bit different. It comes with a setup.sh script that walks you through the configuration directories, then compiles itself and performs the install. Here are the answers I use to get everything installed under /usr/local/check_mk/:

# tar -zxvf check_mk-1.2.5i5p2.tar.gz
# cd check_mk-1.2.5i5p2
# ./setup.sh

Executable programs             /usr/local/bin
Check_MK configuration          /usr/local/check_mk/etc
Check_MK software               /usr/local/check_mk
documentation                   /usr/local/check_mk/doc
check manuals                   /usr/local/check_mk/doc/checks
working directory of Check_MK   /usr/local/check_mk/var/lib
extensions for agents           /usr/local/check_mk
configuration dir for agents    /usr/local/check_mk/etc
Name of Nagios user             nagios
User of Apache process          apache
Common group of Nagios+Apache   nagios
Nagios binary                   /usr/local/nagios/bin/nagios
Nagios main configuration file  /usr/local/nagios/etc/nagios.cfg
Nagios object directory         /usr/local/nagios/etc/check_mk.d
Nagios startskript              /etc/init.d/nagios
Nagios command pipe             /usr/local/nagios/var/rw/nagios.cmd
Check results directory         /usr/local/nagios/var/spool/checkresults
Nagios status file              /usr/local/nagios/var/status.dat
Path to check_icmp              /usr/local/nagios/libexec/check_icmp
URL Prefix for Web addons       /[SITE NAME]/    !! CHANGE THIS TO YOUR SITE NAME !!
Apache config dir               /etc/httpd/conf.d
HTTP authentication file        /usr/local/nagios/etc/htpasswd.users
HTTP AuthName                   Nagios Access
PNP4Nagios templates            /usr/local/pnp4nagios/share/templates
RRD files                       /usr/local/check_mk/pnp-rraconf
rrdcached socket                /tmp/rrdcached.sock
compile livestatus module       yes
Nagios / Icinga version         3.5.1
check_mk's binary modules       /usr/local/check_mk/lib
Unix socket for Livestatus      /usr/local/nagios/var/rw/live
Backends for other systems      /usr/local/check_mk/share/livestatus
Install Event Console           no

Pay very close attention to the URL Prefix for Web addons configuration value. This is going to be the key value for your site name in later parts of the Multisite configuration. To keep with my example, my two installations will use master and slave as the values here.

Lastly, we set up PNP4Nagios, which gives us the awesome RRD graphs for all our services. It also uses the same site prefix, so be mindful:

# tar -zxvf pnp4nagios-0.6.24.tar.gz
# cd pnp4nagios-0.6.24
# ./configure --with-base-url=/[SITE NAME]/pnp4nagios
# make all
# make fullinstall    

Configuration Files

Now, let’s get configuring! For brevity, I’ve summarized my changes below:

/usr/local/nagios/etc/nagios.cfg:
    # Comment out the localhost config:
    #cfg_file=/usr/local/nagios/etc/objects/localhost.cfg

    cfg_dir=/usr/local/nagios/etc/check_mk.d
    broker_module=/usr/local/check_mk/lib/livestatus.o /usr/local/nagios/var/rw/live
    broker_module=/usr/local/pnp4nagios/lib/npcdmod.o config_file=/usr/local/pnp4nagios/etc/npcd.cfg
    use_syslog=0
    check_for_updates=0
    process_performance_data=1
    admin_email=root@localhost
    admin_pager=root@localhost

You’ll have to read Check_MK’s online documentation to understand what each option is doing, but this is a pretty good starter config. Each system will monitor hosts only accessible by it. For example, if your master site was local to you in San Francisco, it might monitor all your California resources and not have access to resources in your remote office branch in New York (thus the slave server).

Check_MK supports all kinds of configuration options, most of which rely on the tags defined in the all_hosts variable. You can set a different data-acquisition method for each kind of system (in my example, executing the Check_MK agent directly on the local host and remotely via SSH). You could also use SNMP or UNIX sockets.

/usr/local/check_mk/etc/main.mk:
    all_hosts = [ 
      'master.example.com|local',
      'host-monitored-by-master.example.com|ssh',
    ]

    datasource_programs = [
      ( "/usr/bin/sudo /usr/local/check_mk/agents/check_mk_agent.linux", [ 'local' ], ALL_HOSTS ),
      ( "ssh -i ~nagios/.ssh/id_rsa nagios@<IP> sudo /usr/local/bin/check_mk_agent", [ 'ssh' ], ALL_HOSTS ),
    ]

    ipaddresses = {
      "master.example.com" : "127.0.0.1",
    }

    extra_service_conf["normal_check_interval"] = [
      ( '5', ALL_HOSTS, [ "" ] ),
    ]

    extra_host_conf["max_check_attempts"] = [
      ( '3', ALL_HOSTS ),
    ]
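
To make the tag mechanism a little more concrete, here is a rough Python sketch of how the tags in all_hosts pick a datasource program. This is only my simplified illustration, not Check_MK's actual matching code: the helper names split_entry and datasource_for are made up, and the ALL_HOSTS value is just a stand-in for Check_MK's own marker.

```python
# Simplified illustration only: this is NOT Check_MK's real matching code.
ALL_HOSTS = ["@all"]  # stand-in for Check_MK's ALL_HOSTS marker (assumption)

all_hosts = [
    "master.example.com|local",
    "host-monitored-by-master.example.com|ssh",
]

datasource_programs = [
    ("/usr/bin/sudo /usr/local/check_mk/agents/check_mk_agent.linux",
     ["local"], ALL_HOSTS),
    ("ssh -i ~nagios/.ssh/id_rsa nagios@<IP> sudo /usr/local/bin/check_mk_agent",
     ["ssh"], ALL_HOSTS),
]


def split_entry(entry):
    """Split 'hostname|tag1|tag2' into (hostname, set_of_tags)."""
    name, *tags = entry.split("|")
    return name, set(tags)


def datasource_for(hostname, ipaddress):
    """Return the first command whose tags all apply to the host, or None."""
    for entry in all_hosts:
        name, tags = split_entry(entry)
        if name != hostname:
            continue
        for command, required_tags, hostlist in datasource_programs:
            in_list = hostlist == ALL_HOSTS or name in hostlist
            if in_list and set(required_tags) <= tags:
                # <IP> is replaced with the address of the host being checked.
                return command.replace("<IP>", ipaddress)
    return None
```

In this sketch, a host tagged ssh gets the SSH command with its own address substituted for <IP>, and a host that matches no rule gets None, which roughly corresponds to Check_MK falling back to its default way of contacting the agent.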

System Configuration and Cleanup

That’s basically it for the configuration files. Now let’s configure the systems themselves.

Enable services to start at boot:

# chkconfig httpd on
# chkconfig nagios on
# chkconfig npcd on

Fix SELinux (read: switch it to permissive mode) and various permissions:

# mkdir -p /usr/local/check_mk/var/lib/web/admin
# chown apache:nagios /usr/local/check_mk/var/lib/web/admin
# chmod 770 /usr/local/check_mk/var/lib/web/admin
# setenforce 0
# sed -i 's/^SELINUX=.*/SELINUX=permissive/' /etc/selinux/config

Add some sudo permissions for the nagios user:

# echo 'Defaults:nagios !requiretty' >> /etc/sudoers
# echo 'nagios ALL = (root) NOPASSWD: /usr/local/check_mk/agents/check_mk_agent.linux' >> /etc/sudoers

Create an inventory of services to monitor on our systems:

# check_mk -I master.example.com host-monitored-by-master.example.com

Rebuild and restart Nagios / Check_MK:

# check_mk -R

Start the other services:

# service httpd start
# service npcd start

Create an HTTPD user and password (use the same htpasswd file on both systems):

# htpasswd -c -s /usr/local/nagios/etc/htpasswd.users nagiosadmin

Now to complete the installation of PNP4nagios, open http://master.example.com/[SITE NAME]/pnp4nagios/ in your browser, then remove or rename the installation script:

# mv /usr/local/pnp4nagios/share/install.php{,.orig}

Multisite Configuration

Once you repeat the above steps for the slave.example.com server, you’re ready to configure Multisite. You’ll want to create an SSH public/private keypair for the nagios user. This keypair will be used to establish the SSH tunnel, but nothing else. We’ll lock it down to keep things safe. The keypair should only reside on master.example.com, and we’ll copy the public key into slave.example.com’s authorized_keys file.

Master:

# su - nagios
$ mkdir .ssh
$ chmod 700 .ssh
$ cd .ssh
$ ssh-keygen -t rsa -b 4096 -f ./id_rsa
$ chmod 400 id_rsa
$ chmod 444 id_rsa.pub

Slave:

# su - nagios
$ mkdir .ssh
$ chmod 700 .ssh
$ cd .ssh
$ vi authorized_keys
    command="exit",no-pty,permitopen="localhost:80",permitopen="localhost:6557",permitopen="localhost:2000",permitopen="localhost:2001" ssh-rsa AAAAAAA..long-key..ZZZZZ

The options at the start of the authorized_keys entry keep the nagios user from being able to do pretty much anything except get our host data and forward a few ports. The only command it can run is “exit”, it can’t open a pseudo-terminal, and it can only forward the ports we’ve specified: 80 (HTTP), 6557 (Livestatus), and 2000 & 2001 (autossh’s tunnel status checks).
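
If you end up with more than one slave, or a different set of ports, typing that options string by hand gets error-prone. Here is a small Python sketch that builds the same kind of line. The function name restricted_key_line is my own invention, and it only covers the three options used in this post (command, no-pty, permitopen).

```python
def restricted_key_line(public_key, ports, host="localhost"):
    """Build an authorized_keys entry that only allows forwarding to `ports`.

    public_key is the full contents of id_rsa.pub, e.g. "ssh-rsa AAAA... comment".
    """
    options = ['command="exit"', "no-pty"]
    # One permitopen option per port the tunnel is allowed to reach.
    options += ['permitopen="%s:%d"' % (host, port) for port in ports]
    return ",".join(options) + " " + public_key.strip()
```

Calling restricted_key_line(key, [80, 6557, 2000, 2001]) with the contents of the master's id_rsa.pub produces a line in the same shape as the one shown above, ready to paste into the slave's authorized_keys.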

We’re going to start by setting up xinetd on slave.example.com — don’t worry, it’ll only listen on localhost.

# yum install -y xinetd
# cat << 'END' > /etc/xinetd.d/livestatus
service livestatus
{
    bind            = 127.0.0.1
    type            = UNLISTED
    port            = 6557
    socket_type     = stream
    protocol        = tcp
    wait            = no
    cps             = 100 3
    instances       = 500
    per_source      = 250
    flags           = NODELAY
    user            = nagios
    server          = /usr/local/bin/unixcat
    server_args     = /usr/local/nagios/var/rw/live
    only_from       = 127.0.0.1 ::1
    disable         = no
    log_type        = SYSLOG daemon info
}
END
# chkconfig xinetd on
# service xinetd start
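
Before moving on, it is worth checking that Livestatus actually answers on the slave's localhost port. Below is a minimal Python sketch of a Livestatus query over TCP, assuming the xinetd service above is listening on 127.0.0.1:6557. The function names build_query and query_livestatus are mine, and there is no error handling beyond what the socket and json modules raise.

```python
import json
import socket


def build_query(table, columns):
    """Build a Livestatus query; the blank line at the end terminates it."""
    return "GET %s\nColumns: %s\nOutputFormat: json\n\n" % (
        table, " ".join(columns))


def query_livestatus(query, host="127.0.0.1", port=6557, timeout=5.0):
    """Send one query and return the decoded JSON rows."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(query.encode("utf-8"))
        # Tell the server we are done sending so it replies and closes.
        conn.shutdown(socket.SHUT_WR)
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    return json.loads(b"".join(chunks).decode("utf-8"))
```

Running query_livestatus(build_query("hosts", ["name", "state"])) on the slave should return one row per monitored host. Once the SSH tunnel is up, the same call with port=6558 on the master should return the slave's hosts.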

Now on master.example.com we’re going to set up autossh to establish and maintain the SSH tunnel.

# yum install -y autossh
# su - nagios
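$ cat << 'END' > ~/autossh.bash
#!/bin/bash

# Keep a persistent tunnel to the slave:
#   local 8081 -> slave's HTTP, local 6558 -> slave's Livestatus
autossh -f -M 2000 -i ~nagios/.ssh/id_rsa -L 8081:localhost:80 -L 6558:localhost:6557 -N nagios@slave.example.com
END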
$ chmod a+x autossh.bash
$ crontab -e
    @reboot bash ~/autossh.bash
$ ./autossh.bash
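
A quick way to confirm the tunnel came up is to check that the two forwarded ports accept connections on the master. This is a small Python sketch with made-up helper names (port_is_open, tunnel_status). Keep in mind that an open local port only proves that ssh is listening on the master, not that the service on the slave is answering.

```python
import socket


def port_is_open(port, host="127.0.0.1", timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def tunnel_status(ports=(8081, 6558)):
    """Map each forwarded local port to whether it accepts connections."""
    return {port: port_is_open(port) for port in ports}
```

If tunnel_status() reports False for either port, check that the public key made it into the slave's authorized_keys and that the permitopen ports match the -L options in autossh.bash.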

Lastly we need to tell Multisite about our other site and set up HTTPD to proxy the requests through our tunnel.

# vi /usr/local/check_mk/etc/multisite.mk
   sites = {
     "master" : {
       "alias" : "Master Site"
     },
     "slave": {
       "alias" : "Slave Site",
       "socket": "tcp:localhost:6558",
       "url_prefix": "/slave/",
     },
   }
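
The socket value is what points Multisite at the tunnel. As far as I understand it, a site without a socket entry uses the local Livestatus UNIX socket, while "tcp:host:port" makes Multisite connect over TCP. The Python sketch below illustrates that reading of the config. The helpers parse_socket_spec and site_endpoint are hypothetical, and the local socket path is assumed from the setup.sh answers earlier.

```python
# Mirrors the sites dictionary from multisite.mk.
sites = {
    "master": {"alias": "Master Site"},
    "slave": {
        "alias": "Slave Site",
        "socket": "tcp:localhost:6558",
        "url_prefix": "/slave/",
    },
}

# Assumption: the local Livestatus socket path chosen during setup.sh.
LOCAL_SOCKET = "unix:/usr/local/nagios/var/rw/live"


def parse_socket_spec(spec):
    """Turn 'tcp:host:port' or 'unix:/path' into (kind, address)."""
    kind, _, rest = spec.partition(":")
    if kind == "unix":
        return ("unix", rest)
    if kind == "tcp":
        host, _, port = rest.rpartition(":")
        return ("tcp", (host, int(port)))
    raise ValueError("unsupported socket spec: %r" % spec)


def site_endpoint(site_id):
    """Return where Livestatus for this site is reached from the master."""
    spec = sites[site_id].get("socket", LOCAL_SOCKET)
    return parse_socket_spec(spec)
```

Here site_endpoint("slave") resolves to localhost port 6558, which is the master's end of the SSH tunnel, so the slave's Livestatus data never crosses the network in cleartext.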

# vi /etc/httpd/conf/httpd.conf
   <Location /slave>
       RewriteEngine On
       RewriteRule ^/.+/slave/(.*) http://localhost:8081/slave/$1 [P]
   </Location>

Go ahead and restart Nagios/Check_MK and HTTPD, and you should be all set.

# check_mk -R
# service httpd restart

If all went according to plan, you should be able to go to http://master.example.com/master/check_mk/ and see the systems monitored by both the master and slave servers! You should probably go ahead and configure HTTPD to use SSL, as well as configure iptables or another suitable software/hardware firewall to limit traffic appropriately.