[AusNOG] NIC Packets of death

Mon Feb 11 10:51:22 EST 2013

Here is some information that might help.

Testing http://www.kriskinc.com/intel-pod

Intel Packet of Death Testing:
As described in my blog post
<http://blog.krisk.org/2013/02/packets-of-death.html>, I experienced an
issue with certain Intel ethernet controllers. Here's how to see if your
controllers are affected.

For this simplified test you'll need two machines (one to replay the
packet and one to receive it) on the same ethernet segment. No routers
or VLAN-aware switches should be in the path (but dumb switches/hubs
should be fine).

 1. On the replay machine install tcpreplay <http://tcpreplay.synfin.net/>.
 2. Connect the receiving machine to the network and bring the interface
    up (IP address doesn't matter).
 3. Replay one (or all) of the packets attached to this post from the
    replay machine:

    sudo tcpreplay -v -i [transmitting interface] [pcap name]

Example:

    sudo tcpreplay -v -i eth1 pod-icmp-ping.pcap

If your controllers are affected the ethernet interface will lose link.  
In many circumstances the only way to get the controller to work again 
is to physically power off the machine and power it back on.
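On the receiving machine you can watch for the link drop instead of
eyeballing it. Below is a minimal Python sketch, assuming a Linux host where the
kernel exposes link state in /sys/class/net/<interface>/carrier ("1" for
link up, "0" for down). The function names and polling interval are my
own choices, not part of the test procedure:

```python
import time
from pathlib import Path
from typing import Optional


def carrier_state(iface: str, sysfs_root: str = "/sys/class/net") -> Optional[bool]:
    """Return True if link is up, False if down, None if unreadable.

    Reads the Linux sysfs 'carrier' attribute. Reading it on an
    administratively down (or missing) interface raises OSError,
    which is reported here as None.
    """
    path = Path(sysfs_root) / iface / "carrier"
    try:
        value = path.read_text().strip()
    except OSError:
        return None
    return value == "1"


def watch_link(iface: str, interval: float = 0.5,
               sysfs_root: str = "/sys/class/net",
               max_polls: Optional[int] = None) -> bool:
    """Poll the carrier state; return True as soon as the link drops.

    Returns False if max_polls is reached without seeing link down.
    An unreadable state (None) is not treated as a drop.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        if carrier_state(iface, sysfs_root) is False:
            print(f"{iface}: link lost")
            return True
        polls += 1
        time.sleep(interval)
    return False
```

Start watch_link("eth0") (substituting your receiving interface) before
replaying the pcap from the other machine. If the interface has been
taken administratively down, the carrier file can't be read; the sketch
reports that as None and keeps polling.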

NOTE: These packets will be sent to the ethernet broadcast address (to
simplify testing). If you are affected by this issue, it will take down
every affected ethernet interface on the connected network. If that is
a concern, use tcpreplay-edit to set a specific destination ethernet
address:

    sudo tcpreplay-edit --enet-dmac=00:11:22:33:44:55 -v -i eth1 pod-icmp-ping.pcap

Where "00:11:22:33:44:55" is the MAC address of the machine you'd like 
to test.
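For anyone curious what --enet-dmac changes: in a classic libpcap file,
each frame's ethernet destination MAC is the first six bytes of the
packet data. Here is a rough Python illustration of that rewrite, assuming a
classic (not pcapng) capture with the Ethernet link type. rewrite_dmac
and parse_mac are hypothetical helpers; tcpreplay-edit is still the
tool to actually use:

```python
import struct

PCAP_MAGICS = (0xA1B2C3D4, 0xA1B23C4D)  # microsecond and nanosecond variants
LINKTYPE_ETHERNET = 1


def parse_mac(text: str) -> bytes:
    """'00:11:22:33:44:55' -> 6 raw bytes."""
    parts = text.split(":")
    if len(parts) != 6:
        raise ValueError(f"bad MAC address: {text!r}")
    return bytes(int(p, 16) for p in parts)


def rewrite_dmac(pcap: bytes, dmac: str) -> bytes:
    """Return a copy of a classic pcap with every frame's destination MAC set to dmac."""
    new_dst = parse_mac(dmac)
    if len(pcap) < 24:
        raise ValueError("file too short for a pcap global header")
    # The magic number only reads correctly in the writer's byte order.
    for endian in ("<", ">"):
        if struct.unpack(endian + "I", pcap[:4])[0] in PCAP_MAGICS:
            break
    else:
        raise ValueError("not a classic pcap file (pcapng is not handled)")
    # The low 16 bits of the last header field hold the link type.
    linktype = struct.unpack(endian + "I", pcap[20:24])[0] & 0xFFFF
    if linktype != LINKTYPE_ETHERNET:
        raise ValueError(f"link type {linktype} is not Ethernet")

    out = bytearray(pcap)
    pos = 24  # first per-packet record follows the 24-byte global header
    while pos + 16 <= len(out):
        # Record header: ts_sec, ts_frac, incl_len, orig_len.
        incl_len = struct.unpack(endian + "I", out[pos + 8:pos + 12])[0]
        frame = pos + 16
        if frame + incl_len > len(out):
            raise ValueError("truncated packet record")
        if incl_len >= 6:
            out[frame:frame + 6] = new_dst  # destination MAC comes first
        pos = frame + incl_len
    return bytes(out)
```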

Fixing:

As news of this issue spreads, it's becoming clear that some controllers
are affected and some aren't. That's more or less what I expected.
Here's what I know about fixing this.

It is my understanding that Intel provides at least two EEPROM versions
for this chip: one with BMC (baseboard management controller) enabled
and one without. My controllers do not have BMC enabled, so my fix only
applies to non-BMC controllers. This is unfortunate because the
BMC-enabled controllers seem to be much more widely used. Even then,
other than the very basics (MAC address and checksum), I don't know the
meaning of these values. That's another reason not to reprogram the
EEPROM on your NIC based on what some guy on the internet told you.
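For the two fields I do understand: for Intel's e1000/e1000e family
(which covers the 82574L), the Linux driver takes the MAC address from
words 0x00-0x02 and validates the EEPROM by checking that the 16-bit
little-endian words 0x00 through 0x3F sum to 0xBABA. Below is a read-only Python
sketch of that check, assuming you already have the raw EEPROM bytes;
the helper names are made up for illustration:

```python
import struct

NVM_CHECKSUM_WORD = 0x3F  # checksum lives in word 0x3F (bytes 0x7E-0x7F)
NVM_SUM = 0xBABA          # words 0x00..0x3F must add up to this (mod 2**16)


def nvm_words(image: bytes, count: int = NVM_CHECKSUM_WORD + 1) -> list:
    """Decode the first `count` 16-bit little-endian words of the image."""
    if len(image) < count * 2:
        raise ValueError(f"need at least {count * 2} bytes, got {len(image)}")
    return list(struct.unpack(f"<{count}H", image[:count * 2]))


def checksum_ok(image: bytes) -> bool:
    """True if words 0x00..0x3F sum to 0xBABA, as the e1000e driver expects."""
    return (sum(nvm_words(image)) & 0xFFFF) == NVM_SUM


def mac_address(image: bytes) -> str:
    """Words 0x00-0x02 hold the MAC address, first octet in the low byte."""
    return ":".join(f"{b:02x}" for b in image[:6])
```

A failed check means the image is corrupt or was edited without updating
word 0x3F. Passing it says nothing about whether the other values are
right for your board.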

With that said, here is a diff between an affected EEPROM and a good
one:

 Offset   Values

-0x0010:  ff ff ff ff 6b 02 00 00 86 80 d3 10 ff ff 5a c0
+0x0010:  01 01 ff ff 6b 02 d3 10 d9 15 d3 10 ff ff 58 85

-0x0030:  c9 6c 50 31 3e 07 0b 46 84 2d 40 01 00 f0 06 07
+0x0030:  c9 6c 50 21 3e 07 0b 46 84 2d 40 01 00 f0 06 07

-0x0060:  ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
+0x0060:  20 01 00 40 16 13 ff ff ff ff ff ff ff ff ff ff

Where the "-" lines were the bad EEPROM and the "+" lines were the good 
EEPROM.
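To make the diff easier to read byte by byte, here is a small Python
sketch that lists exactly which offsets change. The three rows are
copied from the diff above. Only those rows are compared, since the rest
of the two EEPROMs isn't shown:

```python
# Rows copied from the diff: "-" lines (bad) and "+" lines (good).
BAD = {
    0x10: "ff ff ff ff 6b 02 00 00 86 80 d3 10 ff ff 5a c0",
    0x30: "c9 6c 50 31 3e 07 0b 46 84 2d 40 01 00 f0 06 07",
    0x60: "ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff",
}
GOOD = {
    0x10: "01 01 ff ff 6b 02 d3 10 d9 15 d3 10 ff ff 58 85",
    0x30: "c9 6c 50 21 3e 07 0b 46 84 2d 40 01 00 f0 06 07",
    0x60: "20 01 00 40 16 13 ff ff ff ff ff ff ff ff ff ff",
}


def changed_bytes(bad, good):
    """Yield (offset, bad_byte, good_byte) for every byte that differs."""
    for base in sorted(bad):
        old = bytes.fromhex(bad[base])
        new = bytes.fromhex(good[base])
        for i, (a, b) in enumerate(zip(old, new)):
            if a != b:
                yield base + i, a, b


for off, a, b in changed_bytes(BAD, GOOD):
    print(f"0x{off:04x}: {a:02x} -> {b:02x}")
```

It prints 15 changed bytes: eight in the 0x0010 row (0x0010, 0x0011,
0x0016 to 0x0019, 0x001e, 0x001f), one at 0x0033, and six at 0x0060 to
0x0065.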

Under Linux you can view these values with ethtool:

    # ethtool -e [interface]
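If you want to compare your own NIC against those rows programmatically,
here is a rough Python sketch that parses the dump, assuming ethtool's plain
"0xOFFSET: byte byte ..." hex layout. The function names are
hypothetical:

```python
import re
import subprocess

# One dump row: "0x0010:" followed by whitespace-separated hex bytes.
ROW = re.compile(r"^\s*0x([0-9a-fA-F]+):\s*((?:[0-9a-fA-F]{2}\s*)+)$")


def parse_ethtool_eeprom(text: str) -> bytes:
    """Turn `ethtool -e` hex-dump text into one contiguous bytes object."""
    data = bytearray()
    for line in text.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # skip the "Offset / Values" header and separator lines
        offset = int(m.group(1), 16)
        if offset != len(data):
            raise ValueError(f"unexpected offset 0x{offset:04x}")
        data += bytes.fromhex(m.group(2))
    return bytes(data)


def read_eeprom(iface: str) -> bytes:
    """Run ethtool -e (usually needs root) and parse its output."""
    out = subprocess.run(["ethtool", "-e", iface], capture_output=True,
                         text=True, check=True).stdout
    return parse_ethtool_eeprom(out)
```

read_eeprom("eth1")[0x10:0x20] can then be compared with the 0x0010
rows above. Reading is harmless; writing (ethtool -E) is the part to be
careful with.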

And here is Intel's official statement:

Recently there were a few stories published, based on a blog post by an 
end-user, suggesting specific network packets may cause the Intel® 
82574L Gigabit Ethernet Controller to become unresponsive until 
corrected by a full platform power cycle.

Intel was made aware of this issue in September 2012 by the blog's 
author. Intel worked with the author as well as the original motherboard 
manufacturer to investigate and determine root cause. Intel root caused 
the issue to the specific vendor's mother board design where an 
incorrect EEPROM image was programmed during manufacturing. We 
communicated the findings and recommended corrections to the motherboard 
manufacturer.

*It is Intel's belief that this is an implementation issue isolated to a 
specific manufacturer, not a design problem with the Intel 82574L 
Gigabit Ethernet controller.* Intel has not observed this issue with any 
implementations which follow Intel's published design guidelines. Intel 
recommends contacting your motherboard manufacturer if you have 
continued concerns or questions whether your products are impacted.

Mark

On 10/02/2013 2:42 PM, Daniel O'Connor wrote:
> On 09/02/2013, at 20:56, Edwin Groothuis <edwin at mavetju.org> wrote:
>
>> On 7/02/13 10:37 , Heinz N wrote:
>>> Seems that certain packets can completely bring down certain Intel chipset network controllers.
>>>
>>> http://blog.krisk.org/2013/02/packets-of-death.html
>>>
>> http://communities.intel.com/community/wired/blog/2013/02/07/intel-82574l-gigabit-ethernet-controller-statement
>>
>> Intel blames it on a faulty EEPROM, but they don't say which mother board manufacturer.
>
> They don't provide any tools or instructions for testing the problem either.
>
> Hopeless :-/
>
> --
> Daniel O'Connor software and network engineer
> for Genesis Software - http://www.gsoft.com.au
> "The nice thing about standards is that there
> are so many of them to choose from."
>    -- Andrew Tanenbaum
> GPG Fingerprint - 5596 B766 97C0 0E94 4347 295E E593 DC20 7B3F CE8C
>
> _______________________________________________
> AusNOG mailing list
> AusNOG at lists.ausnog.net
> http://lists.ausnog.net/mailman/listinfo/ausnog
>
