I’ve detailed the work I’ve done with nagios at the various organizations I’ve
worked at. If you have any questions, just ping me.
Rackspace (April 2013 - October 2014)
Ongoing design and implementation of nagios infrastructure to manage 50,000+ machines and thousands of service checks
Development of custom nagios checks in bash and python to monitor in-house applications
Developed custom generator scripts to allow co-workers to re-generate the nagios infrastructure on the fly for all new and existing cells
Involved in the development of an in-house application that aggregates nagios checks using the nagios API and automatically fixes problems
Involved in the constant monitoring and remediation of alerts to support the organization’s standards for uptime
FNAL (August 2006 - April 2013)
Complete design and implementation of nagios infrastructure to manage 145+ servers and network attached devices covering 750+ service and health checks.
Development of custom nagios checks to monitor in-house applications
Single-handedly took control of all system administration tasks for a group that managed almost 100 servers.
Self-starter on the project. Required no guidance from management, relieving them of that burden.
Proved to management that a single individual with the right tools is capable of managing a large number of systems. This saves the organization money on headcount, as those resources can instead be focused on delivering great products.