# Injecting TCP segments

Packet capture tools like tcpdump and Wireshark are very useful to observe the segments that transport protocols exchange. They are also very useful to understand and debug network problems, as we’ll discuss in subsequent labs. TCP is a complex protocol that has evolved a lot since its first specification, RFC 793. It includes a large number of heuristics that influence how an implementation reacts to various types of events, and it interacts with the application through the socket API.

Several researchers from Google proposed packetdrill [CCB+2013], a TCP test suite designed to develop unit tests that verify the correct operation of a TCP implementation. A detailed description of packetdrill may be found in [CCB+2013]. packetdrill uses a syntax that is a mix between the C language and the tcpdump syntax. To understand the operation of packetdrill, it is useful to study several examples in detail. The TCP implementation in the Linux kernel supports all the recent TCP extensions that improve its performance. For pedagogical reasons, we disable [1] most of these extensions to use a simple TCP stack.

Let us start with a very simple example that uses packetdrill to open a TCP connection on a server running on the Linux kernel. A packetdrill script is a sequence of lines that are executed one after the other. Each line specifies one of the following events:

• packetdrill executes a system call and verifies its return value
• packetdrill injects [2] a packet in the instrumented Linux kernel as if it were received from the network
• packetdrill compares a packet transmitted by the instrumented Linux kernel with the packet that the script expects
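
The three kinds of lines can be told apart by the token that follows the timing field. The Python sketch below illustrates this under simple assumptions: `classify_line` is a hypothetical helper, not part of packetdrill, and it ignores any other construct that a real script may contain.

```python
def classify_line(line: str) -> str:
    """Classify one script line by the token that follows its timing field."""
    stripped = line.strip()
    if not stripped or stripped.startswith("//"):
        return "comment"
    fields = stripped.split(None, 1)  # [timing, rest of the line]
    if len(fields) < 2:
        return "unknown"
    event = fields[1].lstrip()
    if event.startswith("<"):
        return "inject"   # packet injected into the kernel
    if event.startswith(">"):
        return "expect"   # packet that the kernel is expected to send
    return "syscall"      # system call and its expected return value


script = [
    "0   socket(..., SOCK_STREAM, IPPROTO_TCP) = 3",
    "+0  < S 0:0(0) win 1000 <mss 1000>",
    "+0  > S. 0:0(0) ack 1 <...>",
]
kinds = [classify_line(line) for line in script]
print(kinds)  # ['syscall', 'inject', 'expect']
```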

Each line starts with a timing parameter that indicates when the event specified on this line should happen. packetdrill supports absolute and relative timings. An absolute timing is simply a number that indicates the delay in seconds between the start of the script and the event. A relative timing is written as + followed by a number, which is the delay in seconds between the previous event and the current line. Additional information may be found in [CCB+2013].
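
To make the two timing forms concrete, the Python sketch below converts a sequence of timing fields into seconds since the start of the script. It only models the two forms described above, and `absolute_times` is a hypothetical helper name, not a packetdrill function.

```python
def absolute_times(timings):
    """Convert timing fields to seconds elapsed since the start of the script."""
    times = []
    previous = 0.0
    for field in timings:
        if field.startswith("+"):
            current = previous + float(field[1:])  # relative to the previous event
        else:
            current = float(field)                 # absolute, counted from script start
        times.append(current)
        previous = current
    return times


# Timing fields of a short script: one absolute timing, then relative ones.
schedule = absolute_times(["0", "+0", "+.1", "+0", "+.1"])
print(schedule)
```

The last event of this example is scheduled roughly 0.2 seconds after the start of the script, since the two `+.1` delays add up.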

For this first example, we will program packetdrill to behave as a client that attempts to create a connection. The first step is thus to prepare, in the instrumented kernel, a socket that can accept this connection. This socket is created with the four system calls below.

// create a TCP socket. Since stdin, stdout and stderr are already defined,
// the kernel will assign file descriptor 3 to this socket
// 0 is the absolute time at which the socket is created
0   socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0  setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
// binds the created socket to the available addresses
+0  bind(3, ..., ...) = 0
// configure the socket to accept incoming connections
+0  listen(3, 1) = 0


At this point, the socket is ready to accept incoming TCP connections. packetdrill needs to inject a TCP segment in the instrumented Linux stack. This can be done with the line below.

+0  < S 0:0(0) win 1000 <mss 1000>


packetdrill uses a syntax that is very close to the tcpdump syntax. The +0 timing indicates that the line is executed immediately after the previous event. The < sign indicates that packetdrill injects a TCP segment and the S character indicates that the SYN flag must be set. Like tcpdump, packetdrill uses sequence numbers that are relative to the initial sequence number. The three numbers that follow are the sequence number of the first byte of the payload of the segment (0), the sequence number that follows the last byte of the payload (0 after the colon) and the length of the payload (0 between parentheses) of the SYN segment. This segment does not contain a valid acknowledgement but advertises a window of 1000 bytes. SYN segments normally also carry the MSS option. In this case, we set the MSS to 1000 bytes. The next line of the packetdrill script verifies the reply sent by the instrumented Linux kernel.

+0  > S. 0:0(0) ack 1 <...>


This TCP segment is sent immediately by the stack. The SYN flag is set and the dot next to the S character indicates that the ACK flag is also set. The SYN+ACK segment does not contain any data but its acknowledgement number is set to 1 (relative to the initial sequence number). The packetdrill script does not match the window size advertised in the TCP segment nor the TCP options (<...>).
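
The start:end(length) notation used in these segments can be checked mechanically: in the scripts of this section, the second number always equals the first number plus the payload length. The Python sketch below parses such a range and maps a relative sequence number to the 32-bit value carried on the wire. The helper names and the initial sequence number used in the example are hypothetical.

```python
import re

RANGE = re.compile(r"(\d+):(\d+)\((\d+)\)")


def parse_range(text):
    """Parse 'start:end(length)' and check that the three numbers agree."""
    match = RANGE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a sequence range: {text!r}")
    start, end, length = (int(group) for group in match.groups())
    if end - start != length:
        raise ValueError(f"inconsistent range: {text!r}")
    return start, end, length


def to_absolute(relative_seq, isn):
    """Map a relative sequence number to the 32-bit value sent on the wire."""
    return (isn + relative_seq) % 2**32


print(parse_range("0:0(0)"))        # (0, 0, 0)
print(parse_range("1:11(10)"))      # (1, 11, 10)
print(to_absolute(1, 4294967295))   # 0, the sequence space wraps around
```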

The third segment of the three-way handshake is sent by packetdrill after a delay of 0.1 seconds. The connection is now established and the accept system call will succeed.

+.1 < . 1:1(0) ack 1 win 1000
+0  accept(3, ..., ...) = 4


The accept system call returns a new file descriptor, in this case value 4. At this point, packetdrill can write data on the socket or inject packets.

+0 write(4, ..., 10)=10
+0 > P. 1:11(10) ack 1
+.1 < . 1:1(0) ack 11 win 1000


packetdrill writes 10 bytes of data through the write system call. The stack immediately sends these 10 bytes inside a segment whose Push flag is set [3]. The payload starts at sequence number 1 and ends at sequence number 10. packetdrill replies by injecting an acknowledgement for the entire data after 100 milliseconds.

packetdrill can also inject data that will be read by the stack as shown by the lines below.

+.1 < P. 1:3(2) ack 11 win 4000
+0 > . 11:11(0) ack 3
+0 read(4, ..., 1000) = 2


In the example above, packetdrill injects a segment containing two bytes. This segment is acknowledged and after that the read system call succeeds and reads the available data with a buffer of 1000 bytes. It returns the amount of read bytes, i.e. 2.

We can now close the connection gracefully. Let us first inject a segment with the FIN flag set.

//Packetdrill closes connection gracefully
+0 < F. 3:3(0) ack 11 win 4000
+0 > . 11:11(0) ack 4


packetdrill injects the FIN segment and the instrumented kernel returns an acknowledgement. If packetdrill issues the close system call, the kernel will send a FIN segment to terminate the connection. packetdrill injects an acknowledgement to confirm the end of the connection.

+0 close(4) = 0
+0 > F. 11:11(0) ack 4
+0 < . 4:4(0) ack 12 win 4000


The complete packetdrill script is available from /exercises/packetdrill_scripts/connect.pkt
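
The acknowledgement numbers that appear in this script follow a single rule: the SYN and FIN flags each consume one sequence number, exactly like one byte of payload. The Python sketch below recomputes the acknowledgement numbers of the script from that rule. `ack_for` is a hypothetical helper written for this illustration.

```python
def ack_for(seq, payload_len, syn=False, fin=False):
    """Acknowledgement number that covers a segment.

    SYN and FIN each consume one sequence number, like one byte of data.
    """
    return seq + payload_len + int(syn) + int(fin)


# (segment, seq, payload length, SYN, FIN, ack shown in the script)
observed = [
    ("S 0:0(0)", 0, 0, True, False, 1),
    ("P. 1:11(10)", 1, 10, False, False, 11),
    ("P. 1:3(2)", 1, 2, False, False, 3),
    ("F. 3:3(0)", 3, 0, False, True, 4),
    ("F. 11:11(0)", 11, 0, False, True, 12),
]
for name, seq, length, syn, fin, seen in observed:
    print(name, "-> ack", ack_for(seq, length, syn, fin), "(script shows", seen, ")")
```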

packetdrill can be used to explore in detail the operation of the Linux TCP implementation and to understand how it reacts to system calls and to the reception of packets.

1. A first interesting point to explore is how TCP reacts to out-of-order segments. Consider the packetdrill script shown below:
0   socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0  setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0  bind(3, ..., ...) = 0
+0  listen(3, 1) = 0

//TCP three-way handshake
+0  < S 0:0(0) win 4000 <mss 1000>
+0  > S. 0:0(0) ack 1 <...>
+.1 < . 1:1(0) ack 1 win 1000
+0  accept(3, ..., ...) = 4

+0 < P. 1:201(200) win 4000
+0 > . 1:1(0) ack 201

+0 < P. 231:251(20) win 4000
+0 > . 1:1(0) ack 201

1. packetdrill now issues a FIN segment to indicate that all data has been transmitted.

+0 < F. 251:251(0) win 257
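
The acknowledgements in the script above are cumulative: the receiver always acknowledges the next in-sequence byte that it expects, whatever it has buffered beyond a gap. The Python sketch below is a simplified model of that behaviour, not the Linux implementation. It ignores the SYN and FIN flags and selective acknowledgements, and `cumulative_acks` is a hypothetical helper.

```python
def cumulative_acks(segments, next_expected=1):
    """Return the ack number sent after each (start, end) data segment arrives.

    `end` is the sequence number that follows the last byte of the payload.
    """
    buffered = []  # ranges received so far and not yet fully acknowledged
    acks = []
    for start, end in segments:
        buffered.append((start, end))
        # Visiting ranges in increasing start order lets one pass fill every gap.
        for s, e in sorted(buffered):
            if s <= next_expected < e:
                next_expected = e
        buffered = [(s, e) for s, e in buffered if e > next_expected]
        acks.append(next_expected)
    return acks


# The two data segments of the script: bytes 201 to 230 are missing.
print(cumulative_acks([(1, 201), (231, 251)]))              # [201, 201]
# If the missing bytes arrive later, the ack jumps over the buffered data.
print(cumulative_acks([(1, 201), (231, 251), (201, 231)]))  # [201, 201, 251]
```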

1. A second topic that we can explore with packetdrill is retransmission after packet losses. TCP uses a mix of go-back-n and selective repeat to retransmit the missing segments. When the retransmission timer expires, TCP retransmits a single segment because of its congestion control scheme, as shown below:
0   socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0  setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0  bind(3, ..., ...) = 0
+0  listen(3, 1) = 0

+0  < S 0:0(0) win 4000 <mss 1000>
+0  > S. 0:0(0) ack 1 <...>
+.1 < . 1:1(0) ack 1 win 4000
+0  accept(3, ..., ...) = 4

+0  write(4, ..., 1000) = 1000
+0  > P. 1:1001(1000) ack 1
+.1 < . 1:1(0) ack 1001 win 4000

+0  write(4, ..., 2000) = 2000
+0  > . 1001:2001(1000) ack 1
+0  > P. 2001:3001(1000) ack 1

// timeout

+0.3  > . 1001:2001(1000) ack 1
+0.6  > . 1001:2001(1000) ack 1
+1.2  > . 1001:2001(1000) ack 1


Note that TCP applies an exponential backoff to the retransmission timer: the value of the timer doubles after each expiration.
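
The delays between the successive retransmissions in the script can be reproduced with a few lines of Python. This is a minimal sketch: the initial value of 0.3 seconds is simply read from the script above, whereas a real stack derives it from its round-trip-time measurements and also caps the timer at a maximum value, which the hypothetical `backoff_schedule` helper does not model.

```python
def backoff_schedule(initial_rto, retransmissions):
    """Successive retransmission timer values, doubling after each expiration."""
    timeouts = []
    rto = initial_rto
    for _ in range(retransmissions):
        timeouts.append(rto)
        rto *= 2  # exponential backoff
    return timeouts


print(backoff_schedule(0.3, 3))  # [0.3, 0.6, 1.2]
```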

1. The TCP state machine allows two hosts to simultaneously open a TCP connection. In this case, both the client and the server start the connection by sending a SYN segment. The following packetdrill script demonstrates this simultaneous establishment of a connection.
+0   socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 fcntl(3, F_GETFL) = 0x2 (flags O_RDWR)
+0 fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0

// Establish connection
+0 connect(3, ..., ...) = -1 EINPROGRESS (Operation now in progress)
+0 > S 0:0(0) <...>
+0 < S 0:0(0) win 5792 <mss 1000>
+0 > S. 0:0(0) ack 1 <...>
+0 < . 1:1(0) ack 1 win 5792

+0 < F. 1:1(0) ack 1 win 5792
+0 > . 1:1(0) ack 2

//Kernel closes connection gracefully
+0 close(3) = 0
+0 > F. 1:1(0) ack 2
+0 < . 2:2(0) ack 2 win 5792
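
The establishment phase of this script follows the simultaneous-open path of the TCP state machine defined in RFC 793. The Python sketch below encodes only the few transitions involved in opening a connection from the client side. The event names are hypothetical labels chosen for this illustration.

```python
# Subset of the RFC 793 state machine: client-side connection establishment.
TRANSITIONS = {
    ("CLOSED", "connect"): "SYN_SENT",            # the stack sends a SYN
    ("SYN_SENT", "recv SYN"): "SYN_RCVD",         # simultaneous open, reply with SYN+ACK
    ("SYN_SENT", "recv SYN+ACK"): "ESTABLISHED",  # usual client path
    ("SYN_RCVD", "recv ACK"): "ESTABLISHED",      # our SYN has been acknowledged
}


def run(events, state="CLOSED"):
    """Return the list of states visited while processing the events in order."""
    trace = [state]
    for event in events:
        state = TRANSITIONS[(state, event)]
        trace.append(state)
    return trace


simultaneous = run(["connect", "recv SYN", "recv ACK"])
usual = run(["connect", "recv SYN+ACK"])
print(simultaneous)  # ['CLOSED', 'SYN_SENT', 'SYN_RCVD', 'ESTABLISHED']
print(usual)         # ['CLOSED', 'SYN_SENT', 'ESTABLISHED']
```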

1. A TCP connection can be terminated gracefully by exchanging FIN segments. In practice, since these segments can be exchanged at any time, there are multiple ways to express a graceful connection release in packetdrill.

Consider a TCP connection where no data has been exchanged and that needs to be gracefully closed. The connection starts as follows:

0   socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0  setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0  bind(3, ..., ...) = 0
+0  listen(3, 1) = 0
+0  < S 0:0(0) win 1000
+0  > S. 0:0(0) ack 1 <...>
+.1 < . 1:1(0) ack 1 win 1000
+0  accept(3, ..., ...) = 4


Footnotes

 [1] On Linux, most of the parameters to tune the TCP stack are accessible via sysctl. The /exercises/packetdrill_scripts/sysctl-cnp3.conf file contains all the sysctl variables that we change to disable these various TCP extensions.
 [2] By default, packetdrill uses port 8080 when creating TCP segments. You can thus capture the packets injected by packetdrill and the responses from the stack with the following command: tcpdump -i any -n port 8080
 [3] The Push flag is one of the TCP flags defined in RFC 793. TCP stacks usually set this flag when transmitting a segment that empties the send buffer. This is the reason why we observe this push flag in our example.