[AusNOG] Data Retention and CGNAT - educational exercise

Scott O'Brien scott at scottyob.com
Thu Mar 26 00:16:08 EST 2015


G’Day Noggers,

Just a little light-hearted post to get the minds ticking.  There has been a bit of talk about the data retention requirements of late: what’s required to be kept, and what CGNAT might mean for the complexity and volume of information to keep.

I’m not sure whether providers are asked just to keep records of “who had what address” at a given time, or whether full records of the sessions made (including source and destination address) are required, such that you need to keep logs of flows somehow, be it NetFlow or some other way.  It has been mentioned that if it were the former, keeping just RADIUS logs would not be enough in an environment where CGNAT is used.  To get a successful match of “who had what address and port” at a given time, you’re going to have to be logging every translation through the NAT appliance, an exercise which can be expensive given the amount of data you need to keep.  Even with the deployment of IPv6, where more traffic becomes native and you’ll have fewer translations to look after, this is something that might still be required for a time yet! ;)
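To make the matching problem concrete, here’s a rough sketch in Python (the record fields and names are purely my own illustration, not any vendor’s log format) of why a RADIUS log on its own can’t answer “who had this address and port at time T” once the public address is shared:

from dataclasses import dataclass

@dataclass
class RadiusRecord:          # what you already keep today
    username: str
    inside_ip: str           # address handed to the subscriber
    start: int               # session start (epoch seconds)
    stop: int                # session stop

@dataclass
class NatTranslationRecord:  # what CGNAT forces you to keep as well
    inside_ip: str
    inside_port: int
    outside_ip: str          # shared public address
    outside_port: int
    dst_ip: str
    dst_port: int
    start: int
    stop: int

def who_was(outside_ip, outside_port, t, nat_log, radius_log):
    """Resolve an outside ip:port seen at time t back to a username."""
    for n in nat_log:
        if (n.outside_ip, n.outside_port) == (outside_ip, outside_port) \
                and n.start <= t <= n.stop:
            for r in radius_log:
                if r.inside_ip == n.inside_ip and r.start <= t <= r.stop:
                    return r.username
    return None

The RADIUS record only ties a user to their inside address; without the per-translation records in the middle, the shared outside address tells you nothing about who made the connection.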

I’m curious about another way to do NAT to reduce the amount of data you need to keep, and have an idea.  A little background: a single IP address can be translated a lot of times.  For TCP or UDP, one IP address has 65,535 usable ports.  The translation table is keyed on a tuple of source address, source port, destination port AND destination address.  That last part means you can use the same address AND PORT for multiple translations to a bunch of different hosts out there on the internet.  So if the NAT appliance is capable of it, a huge number of translations can hang off just a single IP and port.
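As a toy illustration of that reuse (my own sketch, not how any real NAT box stores its table), the destination being part of the key is what lets one outside IP/port back many flows at once:

# Key: (inside ip, inside port, dst ip, dst port) -> (outside ip, outside port)
translations = {}

def add_translation(inside_ip, inside_port, dst_ip, dst_port):
    # Every flow here maps to the same outside ip:port; return traffic is
    # still unambiguous because the remote endpoint differs per entry.
    key = (inside_ip, inside_port, dst_ip, dst_port)
    translations[key] = ("203.0.113.5", 40000)

add_translation("100.64.0.10", 51000, "198.51.100.1", 443)
add_translation("100.64.0.10", 51001, "198.51.100.2", 443)
add_translation("100.64.0.10", 51002, "198.51.100.3", 80)

print(len(translations), "flows sharing", set(translations.values()))
# -> 3 flows sharing {('203.0.113.5', 40000)}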

The idea to reduce the amount of data that needs to be collected in a CGNAT environment is to allocate each user behind the CGNAT a static range of ports (let’s say 100 for conversation’s sake, perhaps IP 1.1.1.1, ports 400-500).  That user could potentially make hundreds of thousands of different connections using even a single IP/port combination from that range.  With a static IP/port allocation, instead of logging every single translation/session, you can just log which user “owns” that port range.

Because the destination address and port are part of the tuple identifying the translation, the only time a customer realistically needs more than one public address/port combo is when they make concurrent connections to the same destination address and port.  Allocating say 100 ports means they could make 100 concurrent connections to a single destination.  The magic CGNAT appliance in question could hopefully be smart enough to notice when, say, 50 of those ports are in use and allocate (and log the allocation of) another chunk of 100 ports on some address for that customer, with some arbitrary upper limit.
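Something like the following is what I have in mind, purely as a sketch of the allocation logic.  The block size, the 50-port expansion threshold, the cap and the addresses are all arbitrary numbers I’ve picked for illustration:

import time

BLOCK_SIZE = 100
EXPAND_AT = 50          # grow the allocation once this many ports are busy
MAX_BLOCKS = 4          # arbitrary upper limit per subscriber

allocation_log = []     # the only thing that needs to be retained
# Carve one public address into 100-port blocks: (400-499), (500-599), ...
free_blocks = [(400 + i * BLOCK_SIZE, 399 + (i + 1) * BLOCK_SIZE)
               for i in range(640)]

class Subscriber:
    def __init__(self, name):
        self.name = name
        self.blocks = []       # (low, high) port ranges owned on 1.1.1.1
        self.in_use = set()    # ports currently holding a translation
        self._grow()

    def _grow(self):
        if len(self.blocks) >= MAX_BLOCKS or not free_blocks:
            return
        block = free_blocks.pop(0)
        self.blocks.append(block)
        # This one log line replaces per-session logging entirely.
        allocation_log.append((time.time(), self.name, "1.1.1.1", block))

    def open_translation(self):
        """Pick a free port for a new flow, expanding the allocation if needed."""
        if len(self.in_use) >= EXPAND_AT * len(self.blocks):
            self._grow()
        for low, high in self.blocks:
            for port in range(low, high + 1):
                if port not in self.in_use:
                    self.in_use.add(port)
                    return port
        raise RuntimeError("subscriber hit the arbitrary upper limit")

sub = Subscriber("customer-42")
ports = [sub.open_translation() for _ in range(60)]   # 60 concurrent flows
print(allocation_log)   # two block allocations logged, not 60 sessions

Sixty concurrent flows end up as two retained records (the initial block and the expansion), rather than sixty per-translation log entries.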

I guess the question I’m curious to ask the masses of network geeks is: do you see this as a viable way of performing CGNAT for the average consumer base?  Would it work?  I think it’d be useful for more quickly resolving those “who” questions and reducing the amount of data kept in a CGNAT environment.

Thanks!
~ Scott O'Brien
