My netflow resolver hit a slight problem on its first go-around.

To sum up the problem: we have a server that sends us two netflow feeds. We receive the netflow and pipe it through some commands until it is plain ASCII text, which we then dump into a text file.

Then, when we start resolving things, the resolutions go to our site DNS servers, which in turn talk to the outside world. Those external queries from our DNS servers generate more netflow, which we log, and that means more stuff to resolve via DNS. You can see the vicious loop that starts.

The end result was that node would run at 100% CPU and, eventually, kill itself because it ran out of memory.

So I made some changes. First, I added the ability to blacklist certain flows from processing. So I just skip over any flows that are related to the DNS servers. This takes a HUGE load off of the whole system.
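A minimal sketch of what that flow blacklist could look like. The DNS server addresses and the `{ src, dst }` record shape are assumptions for illustration; the real resolver works from the ASCII text dump, whose format isn't shown here.

```javascript
// Hypothetical DNS server addresses; the real ones are site-specific.
const DNS_SERVERS = new Set(['192.0.2.53', '192.0.2.54']);

// Assumes each flow has already been parsed into an object with
// `src` and `dst` IP address strings.
function isBlacklistedFlow(flow) {
  return DNS_SERVERS.has(flow.src) || DNS_SERVERS.has(flow.dst);
}

// Drop every flow that touches one of the DNS servers before it
// ever reaches the resolver.
function filterFlows(flows) {
  return flows.filter((flow) => !isBlacklistedFlow(flow));
}
```

Filtering before resolution is what breaks the feedback loop: the DNS servers' own lookups never get queued up for more lookups.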

Next, I added the ability to blacklist certain IP addresses from resolution. So I now exclude addresses in each flow record that are "on site" addresses. This takes another huge load off the resolver.
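Roughly, the address exclusion could be sketched like this. The CIDR ranges and function names are made up for illustration, and the sketch only handles IPv4; the real list of "on site" ranges is whatever your network uses.

```javascript
// Hypothetical on-site ranges in CIDR notation (IPv4 only in this sketch).
const ON_SITE_RANGES = ['10.0.0.0/8', '192.168.0.0/16'];

// Convert a dotted-quad IPv4 string to an unsigned 32-bit integer.
function ipToInt(ip) {
  return ip
    .split('.')
    .reduce((acc, octet) => ((acc << 8) | Number(octet)) >>> 0, 0);
}

// True when the address falls inside any of the on-site ranges.
function isOnSite(ip) {
  const addr = ipToInt(ip);
  return ON_SITE_RANGES.some((cidr) => {
    const [base, bits] = cidr.split('/');
    const prefix = Number(bits);
    const mask = prefix === 0 ? 0 : (~0 << (32 - prefix)) >>> 0;
    return ((addr & mask) >>> 0) === ((ipToInt(base) & mask) >>> 0);
  });
}

// Only the off-site addresses in a flow get sent to DNS.
function addressesToResolve(flow) {
  return [flow.src, flow.dst].filter((ip) => !isOnSite(ip));
}
```

For a typical flow between an internal host and an external one, this halves the number of lookups.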

Where did these changes bring me?

node now runs consistently at ~10% CPU. Total RAM used is ~60 MB. The resolver stack tends to hold a steady 1500 - 2000 jobs waiting to return from the DNS servers; in other words, it keeps up.
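One way to watch that "jobs waiting" number is to count lookups that have been issued but have not called back yet. This is only a sketch: `createResolver` is a hypothetical helper, and using `dns.reverse` as the lookup is an assumption about which DNS call the resolver makes.

```javascript
const dns = require('dns');

// Build a resolver that counts how many lookups are still in flight.
// `lookup` defaults to dns.reverse; passing it in makes it easy to swap.
function createResolver(lookup = dns.reverse) {
  let pending = 0;
  return {
    // Number of lookups issued that have not returned yet.
    pending: () => pending,
    resolve(ip, callback) {
      pending += 1;
      lookup(ip, (err, hostnames) => {
        pending -= 1;
        const failed = err || !hostnames || hostnames.length === 0;
        // Fall back to the raw address when nothing resolves.
        callback(failed ? ip : hostnames[0]);
      });
    },
  };
}
```

If `pending()` hovers in a stable band, the resolver is keeping up; if it climbs without bound, it is falling behind.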

This is really sweet. node ftw

Part of this node.js twiddling has been me learning the nuances of JavaScript, callbacks and variable scope among them. I've included the Stack Overflow links that REALLY helped me learn this stuff.