Discussing a lesser-known field in sk_buff: nohdr

Today I encountered an interesting question, "What is the use of the 'nohdr' field in sk_buff?" I will write a brief article here to document it.

Main Content#

Background#

First of all, however obscure the field is, it lives in sk_buff, so let's start with a brief introduction to sk_buff.

In short, sk_buff is the core data structure of the Linux networking subsystem. From the link layer to our final operations on packets, sk_buff is always involved.

To fully explain sk_buff would be equivalent to explaining the entire Linux networking system, which is impossible to do completely in one go, or even in a lifetime!

Let's briefly discuss a few key points that may help everyone understand the lesser-known field mentioned in this article, nohdr.

Firstly, let's talk about the three most important fields: data, mac, and nh, which respectively point to the start of the data area, the L2 header, and the L3 header of the current sk_buff. (These are the field names from older kernels; newer kernels call the header pointers mac_header and network_header.) Here's a diagram to help everyone understand:

[Figure: sk_buff trio]

After looking at the diagram, you may have some understanding. In the kernel, new headers are added layer by layer through pointer offsets to process network requests. This aligns with our intuition. Some of you may ask, since I know the starting address of the L3 header, can't I calculate the offset of L4 and handle it manually?

Bingo! The kernel has a structure called struct tcphdr (the L4 counterpart of struct iphdr for IP). By casting the address at the right offset to that type, you can handle the L4 header manually. The detailed method is a topic for later, though.
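To make the "cast at an offset" idea concrete, here is a minimal user-space sketch. It is not kernel code: demo_iphdr and demo_tcphdr are hypothetical, trimmed-down stand-ins for struct iphdr and struct tcphdr, built from single bytes so that casting a raw buffer to them is safe, and both helper functions are invented for this illustration. It assumes an IPv4 packet carrying TCP.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct iphdr: same 20-byte layout,
 * but every field is spelled as raw bytes. */
struct demo_iphdr {
    uint8_t ver_ihl;     /* version (high 4 bits), header length in 32-bit words (low 4 bits) */
    uint8_t tos;
    uint8_t tot_len[2];
    uint8_t id[2];
    uint8_t frag_off[2];
    uint8_t ttl;
    uint8_t protocol;
    uint8_t check[2];
    uint8_t saddr[4];
    uint8_t daddr[4];
};

/* Hypothetical stand-in for the first two fields of struct tcphdr. */
struct demo_tcphdr {
    uint8_t source[2];   /* source port, network byte order */
    uint8_t dest[2];     /* destination port, network byte order */
};

/* Length of the IP header in bytes, read from the start of the L3 header. */
static size_t demo_ip_hdrlen(const uint8_t *nh)
{
    const struct demo_iphdr *iph = (const struct demo_iphdr *)nh;

    return (size_t)(iph->ver_ihl & 0x0f) * 4;
}

/* The L4 header starts right after the IP header: offset, cast, read. */
static uint16_t demo_tcp_dport(const uint8_t *nh)
{
    const struct demo_tcphdr *th =
        (const struct demo_tcphdr *)(nh + demo_ip_hdrlen(nh));

    return (uint16_t)((th->dest[0] << 8) | th->dest[1]);
}
```

Given a buffer whose first byte is 0x45 (IPv4, 5 words of header), demo_ip_hdrlen() yields 20, and demo_tcp_dport() reads the destination port from bytes 22 and 23.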

Next, let's talk about two important fields, len and data_len. Both describe the length of the data, but they measure different things: len is the total number of data bytes in the current sk_buff, counting both the linear area that data points to and any fragments attached to it (at a given layer, that covers the current protocol's headers and payload), while data_len counts only the bytes held in those fragments, outside the linear area. For a fully linear sk_buff data_len is 0, and len - data_len is always the length of the linear part.
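The relationship between the two lengths can be sketched in a few lines of user-space C. demo_skb is a hypothetical struct that carries only these two fields; the helpers mirror what the kernel's skb_headlen() and skb_is_nonlinear() compute, but they are written here purely as an illustration.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two length fields of struct sk_buff. */
struct demo_skb {
    unsigned int len;      /* all data bytes: linear area + fragments */
    unsigned int data_len; /* bytes that live in fragments only */
};

/* Bytes in the linear area, i.e. the part reachable through data. */
static unsigned int demo_headlen(const struct demo_skb *skb)
{
    return skb->len - skb->data_len;
}

/* True when part of the data sits outside the linear area. */
static bool demo_is_nonlinear(const struct demo_skb *skb)
{
    return skb->data_len != 0;
}
```

For example, an sk_buff with len = 1500 and data_len = 1000 keeps 500 bytes in the linear area and the remaining 1000 in fragments.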

OK, that concludes the background.

About nohdr#

Now for the other thread of the story. After discussing some background knowledge about sk_buff, let's talk about the field nohdr. To be honest, this field is really obscure.

Firstly, there is an official description for it:

The 'nohdr' field is used in the support of TCP Segmentation Offload ('TSO' for short). Most devices supporting this feature need to make some minor modifications to the TCP and IP headers of an outgoing packet to get it in the right form for the hardware to process. We do not want these modifications to be seen by packet sniffers and the like. So we use this 'nohdr' field and a special bit in the data area reference count to keep track of whether the device needs to replace the data area before making the packet header modifications.

Hmm, this paragraph is a bit convoluted. I'm sure most people are familiar with TSO: it lets the network card segment large packets (the implementation of GSO/TSO under Linux can be discussed in another article). To get a packet into the form the hardware expects, the TCP and IP headers may need some minor modifications.

However, some holders of the packet data don't care about the header at all, modified or not; they only need the payload. So how do we tell them apart from holders that still need the header untouched? This is where nohdr comes into play.

nohdr only makes sense in combination with another field, dataref. dataref is a counter, stored in the skb_shared_info that sits at the end of the data area, which tracks how many sk_buffs reference the data area pointed to by the data field. It is always split into two halves, and nohdr decides how the current sk_buff is counted in them:

When nohdr is 0, the sk_buff holds an ordinary reference to the whole data area, counted once in the lower 16 bits.
When nohdr is 1, the sk_buff only cares about the payload: it is still counted once in the lower 16 bits, and additionally once in the higher 16 bits, which count the payload-only references.

The comment in the kernel source (include/linux/skbuff.h) describes it as follows:

/* We divide dataref into two halves.  The higher 16 bits hold references
 * to the payload part of skb->data.  The lower 16 bits hold references to
 * the entire skb->data.  It is up to the users of the skb to agree on
 * where the payload starts.
 *
 * All users must obey the rule that the skb->data reference count must be
 * greater than or equal to the payload reference count.
 *
 * Holding a reference to the payload part means that the user does not
 * care about modifications to the header part of skb->data.
 */
#define SKB_DATAREF_SHIFT 16
#define SKB_DATAREF_MASK ((1 << SKB_DATAREF_SHIFT) - 1)
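As a rough illustration of how the two halves are read, the sketch below decodes a dataref value in user space. The two macros are copied from the comment above; all demo_* helpers are hypothetical names invented for this example, and dataref is a plain int here rather than the kernel's atomic counter.

```c
#define SKB_DATAREF_SHIFT 16
#define SKB_DATAREF_MASK ((1 << SKB_DATAREF_SHIFT) - 1)

/* References to the entire data area (lower 16 bits). */
static int demo_total_refs(int dataref)
{
    return dataref & SKB_DATAREF_MASK;
}

/* Payload-only references (higher 16 bits). */
static int demo_payload_refs(int dataref)
{
    return dataref >> SKB_DATAREF_SHIFT;
}

/* References whose holders still care about the header part. */
static int demo_header_refs(int dataref)
{
    return demo_total_refs(dataref) - demo_payload_refs(dataref);
}

/* An skb that already holds a normal reference declares itself
 * payload-only: the higher half goes up by one, the lower half stays. */
static int demo_header_release(int dataref)
{
    return dataref + (1 << SKB_DATAREF_SHIFT);
}
```

Starting from dataref = 1 (a single owner), demo_header_release() gives (1 << 16) + 1: one reference in total, one of them payload-only, so nobody depends on the header. If a clone then adds a normal reference, the value becomes (1 << 16) + 2 and demo_header_refs() reports 1.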

Actually, it's not too difficult to understand why it is designed this way. When we handle packets in the kernel, sometimes we don't care about the specific headers, only the payload. Counting those payload-only references separately lets the kernel tell whether anyone still depends on the header, while the overall count still ensures that the data won't be released before we finish processing it. Of course, when working with this, you need to ensure that the reference count of the data area is greater than or equal to the reference count of the payload (it feels like a "convention over configuration" approach here). (Of course, not following this convention may well end in a kernel crash, haha.)

Finally, the kernel also uses dataref to release the memory of the data area at the appropriate time. The data area is released when either of the following conditions holds:

!skb->cloned: the skb was never cloned, so nobody else shares the data area and it can be released right away.
!atomic_sub_return(skb->nohdr ? (1 << SKB_DATAREF_SHIFT) + 1 : 1, &skb_shinfo(skb)->dataref): the skb drops its own references and dataref reaches 0. When nohdr is 1, it subtracts one from each half of dataref, i.e. (1 << 16) + 1; when nohdr is 0, it subtracts just 1.
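The two release conditions can be played through with a small user-space model. This is only a sketch under simplifying assumptions: demo_skb, demo_shinfo and demo_release_data() are hypothetical names, and dataref is a plain int instead of an atomic counter, so the model ignores concurrency entirely.

```c
#include <stdbool.h>

#define SKB_DATAREF_SHIFT 16

/* Hypothetical stand-in for skb_shared_info: only the counter. */
struct demo_shinfo {
    int dataref; /* plain int here; the kernel uses an atomic counter */
};

/* Hypothetical stand-in for the sk_buff fields involved in the release. */
struct demo_skb {
    bool cloned;
    bool nohdr;
    struct demo_shinfo *shinfo;
};

/* Returns true when the caller should free the data area. */
static bool demo_release_data(struct demo_skb *skb)
{
    /* A payload-only holder gives back one count in each half. */
    int drop = skb->nohdr ? (1 << SKB_DATAREF_SHIFT) + 1 : 1;

    if (!skb->cloned)
        return true; /* never shared, so free right away */

    skb->shinfo->dataref -= drop;
    return skb->shinfo->dataref == 0;
}
```

Take an original skb with nohdr = 1 plus one ordinary clone, so dataref starts at (1 << 16) + 2. Releasing the clone subtracts 1 and leaves (1 << 16) + 1, so the data area survives; releasing the original then subtracts (1 << 16) + 1, dataref hits 0, and the data area can be freed.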

Conclusion#

That's about it for this article... nohdr is truly a very obscure field. Well, because some of the references for this article were looked up on the subway... I'm too lazy to list them in the article... That's about it... I'm off to solve some problems now...