[AusNOG] Network Management and Tools

Mon Jul 5 11:35:45 EST 2010

On 05/07/10 09:56, David Hughes wrote:
> On 05/07/2010, at 8:38 AM, Julien Goodwin wrote:
>>> What do you use for
>>>
>>>   * Alarm/event management (SNMP traps, syslog)
>> Nagios, although we poll and don't use traps.
>>
>>>   * Performance management (SNMP Polls)
>> Cacti, have been trying ObserverNMS
> 
> This is one thing that amazes me about FOSS monitoring systems.  Everyone thinks it's normal and totally acceptable to have two different systems for realtime monitoring and historical reporting.  So, for example, I have nagios poll a device to ensure CPU is within desired bounds.  I then have cacti poll the same device for the same OID so that it can store the data in an RRD file and I can look at a graph.
> 
> I know we all do it, but why do we all think it's perfectly fine to collect almost every single monitoring metric twice?  What an amazingly inefficient way of doing this.  Problem is I can't find anything that I'd be happy to use as a replacement for nagios and cacti.  Surely it would make total sense to extend nagios so it actually remembered the data it collected rather than just doing a real-time evaluation and then throwing the data away.  There are some nagios rrd type hacks but, really, how hard could this be?

In my case it's because alerting and stats are two different sets of
data that only overlap by about 10%.

For example, we alert on things like bad RAM or half-duplex links on our
servers and alarms on our Juniper gear, none of which make sense to
graph. We graph things like interface utilisation, which doesn't make
sense (for us) to alert on.
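
For the small part that does overlap (CPU load, say), a single poll could
in principle feed both the alert check and the stored history. Below is a
rough sketch in Python of that idea only. The names poll_once and
read_metric, the thresholds, and the in-memory history list are all
hypothetical stand-ins; this is not how Nagios or Cacti work internally,
and a real setup would replace read_metric with an SNMP GET and the list
with something like an RRD file.

```python
import time
from typing import Callable, List, Optional, Tuple

# Hypothetical in-memory store of (timestamp, value) samples.
History = List[Tuple[float, float]]


def poll_once(
    read_metric: Callable[[], float],
    warn: float,
    crit: float,
    history: History,
    now: Optional[float] = None,
) -> str:
    """Read a metric once, keep the sample, and return an alert state."""
    value = read_metric()  # the only read of the device in this cycle
    timestamp = time.time() if now is None else now
    history.append((timestamp, value))  # kept for later graphing

    # Threshold check on the same value that was just stored.
    if value >= crit:
        return "CRITICAL"
    if value >= warn:
        return "WARNING"
    return "OK"


if __name__ == "__main__":
    samples: History = []
    # A constant lambda stands in for a real SNMP read of CPU load.
    state = poll_once(lambda: 85.0, warn=80.0, crit=90.0, history=samples)
    print(state, len(samples))
```

Metrics that are alert-only or graph-only would simply skip one half of
this, which is most of them in our case.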