Writing simple networked applications

Networked applications are usually implemented by using the socket API. This API was designed when TCP/IP was first implemented in the Unix BSD operating system [Sechrest] [LFJLMT], and has served as the model for many APIs between applications and the networking stack in an operating system. Although the socket API is very popular, other APIs have also been developed. For example, the STREAMS API has been added to several Unix System V variants [Rago1993]. The socket API is supported by most programming languages and several textbooks have been devoted to it. Users of the C language can consult [DC2009], [Stevens1998], [SFR2004] or [Kerrisk2010]. The Java implementation of the socket API is described in [CD2008] and in the Java tutorial. In this section, we will use the Python implementation of the socket API to illustrate the key concepts. Additional information about this API may be found in the socket section of the Python documentation.

The socket API is quite low-level and should be used only when you need complete control over network access. If your application simply needs, for instance, to retrieve data with HTTP, there are much simpler and higher-level APIs.

A detailed discussion of the socket API is outside the scope of this section; the references cited above cover it in depth. As a starting point, it is interesting to compare the socket API with the service primitives that we have discussed in the previous chapter. Let us first consider the connectionless service that consists of the following two primitives :

DATA.request(destination,message) is used to send a message to a specified destination. In the socket API, this corresponds to the sendto method, which takes both the message and its destination.

DATA.indication(message) is issued by the transport service to deliver a message to the application. In the socket API, this corresponds to the return of the recv method that is called by the application.

The DATA primitives are exchanged through a service access point. In the socket API, the equivalent to the service access point is the socket. A socket is a data structure which is maintained by the networking stack and is used by the application every time it needs to send or receive data through the networking stack. The socket.socket method in the Python API takes two main arguments :

an address family that specifies the underlying networking stack that will be used with the socket. This parameter can be either socket.AF_INET or socket.AF_INET6. socket.AF_INET, which corresponds to the TCP/IPv4 protocol stack, is the default. socket.AF_INET6 corresponds to the TCP/IPv6 protocol stack.

a type that indicates the type of service which is expected from the networking stack. socket.SOCK_STREAM (the default) corresponds to the reliable bytestream connection-oriented service. socket.SOCK_DGRAM corresponds to the connectionless service.
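The two arguments can be illustrated with a minimal sketch. It only creates sockets and inspects them; nothing is sent. The variable names are ours, and the code is written so that it runs under both Python 2 and Python 3.

```python
import socket

# Connectionless (datagram) socket above the TCP/IPv4 stack.
udp_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

# Reliable bytestream socket above the TCP/IPv4 stack.
tcp_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Without arguments, the defaults described above are used.
default_sock = socket.socket()
default_family = default_sock.family
default_type = default_sock.type
udp_type = udp_sock.type

print("default family is AF_INET: %s" % (default_family == socket.AF_INET))
print("default type is SOCK_STREAM: %s" % (default_type == socket.SOCK_STREAM))

# Sockets are operating system resources and should be closed after use.
for s in (udp_sock, tcp_sock, default_sock):
    s.close()
```

A socket for the TCP/IPv6 stack would be created in the same way with socket.AF_INET6, provided that the host supports IPv6.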

A simple client that sends a request to a server is often written as follows in descriptions of the socket API.

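# A simple client of the connectionless service
import socket
import sys

HOSTIP = sys.argv[1]      # IPv4 address of the server
PORT = int(sys.argv[2])   # port number of the server
MSG = b"Hello, World!"

# create a datagram socket above the TCP/IPv4 stack
s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# send the message to the (address, port) pair of the server
s.sendto(MSG, (HOSTIP, PORT))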

A typical usage of this application would be

python client.py 127.0.0.1 12345

where 127.0.0.1 is the IPv4 address of the host (in this case the localhost) where the server is running and 12345 the port of the server.

The first operation is the creation of the socket. Two parameters must be specified while creating a socket. The first parameter indicates the address family and the second the socket type. The second operation is the transmission of the message to the server by using sendto. It should be noted that sendto takes as arguments the message to be transmitted and a tuple that contains the IPv4 address of the server and its port number.

The code shown above supports only the TCP/IPv4 protocol stack. To use the TCP/IPv6 protocol stack the socket must be created by using the socket.AF_INET6 address family. Forcing the application developer to select TCP/IPv4 or TCP/IPv6 when creating a socket is a major hurdle for the deployment and usage of TCP/IPv6 in the global Internet [Cheshire2010]. While most operating systems support both TCP/IPv4 and TCP/IPv6, many applications still only use TCP/IPv4 by default. In the long term, the socket API should be able to handle TCP/IPv4 and TCP/IPv6 transparently and should not force the application developer to always specify whether it uses TCP/IPv4 or TCP/IPv6.

Another important issue with the socket API as supported by Python is that it forces the application to deal with IP addresses instead of dealing directly with domain names. This limitation dates from the early days of the socket API in Unix 4.2BSD. At that time, the DNS was not widely available and only IP addresses could be used. Most applications rely on DNS names to interact with servers and this utilisation of the DNS plays a very important role in scaling web servers and content distribution networks. To use domain names, the application needs to perform the DNS resolution by using the getaddrinfo method. This method queries the DNS and builds the sockaddr data structure which is used by other methods of the socket API. In Python, getaddrinfo takes several arguments :

a name that is the domain name for which the DNS will be queried

an optional port number which is the port number of the remote server

an optional address family which indicates the address family used for the DNS request. socket.AF_INET (resp. socket.AF_INET6) indicates that an IPv4 (resp. IPv6) address is expected. Furthermore, the Python socket API allows an application to use socket.AF_UNSPEC to indicate that it is able to use either IPv4 or IPv6 addresses.

an optional socket type which can be either socket.SOCK_DGRAM or socket.SOCK_STREAM

Since today’s Internet hosts are capable of supporting both IPv4 and IPv6, all applications should be able to handle both IPv4 and IPv6 addresses. When used with the socket.AF_UNSPEC parameter, the socket.getaddrinfo method returns a list of tuples containing all the information needed to create a socket.

>>> import socket
>>> socket.getaddrinfo('www.example.net',80,socket.AF_UNSPEC,socket.SOCK_STREAM)
[ (30, 1, 6, '', ('2001:db8:3080:3::2', 80, 0, 0)),
  (2, 1, 6, '', ('203.0.113.225', 80))]

In the example above, socket.getaddrinfo returns two tuples. The first one corresponds to the sockaddr containing the IPv6 address of the remote server and the second corresponds to the IPv4 information. Due to some peculiarities of IPv6 and IPv4, the format of the two tuples is not exactly the same, but the key information in both cases is the network layer address (2001:db8:3080:3::2 and 203.0.113.225) and the port number (80). The other parameters are seldom used.
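The structure of these tuples can be explored with a small sketch. The helper name describe is ours. Since the numeric value of an address family differs between platforms (30 in the output above), the sketch compares against the socket.AF_INET and socket.AF_INET6 constants instead of using the numbers. A numeric address is passed so that the example runs without any DNS query.

```python
import socket

def describe(host, port):
    """Return a (family name, address, port) triple per getaddrinfo result."""
    names = {socket.AF_INET: "IPv4", socket.AF_INET6: "IPv6"}
    summary = []
    for family, socktype, proto, canonname, sockaddr in socket.getaddrinfo(
            host, port, socket.AF_UNSPEC, socket.SOCK_STREAM):
        # IPv4 sockaddr: (address, port)
        # IPv6 sockaddr: (address, port, flowinfo, scope_id)
        # In both cases the first two fields are the address and the port.
        address, port_number = sockaddr[0], sockaddr[1]
        summary.append((names.get(family, "other"), address, port_number))
    return summary

for entry in describe("127.0.0.1", 80):
    print("%s address %s, port %d" % entry)
```

Calling describe with a domain name instead of a numeric address would trigger the DNS resolution and could return several entries.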

socket.getaddrinfo can be used to build a simple client that queries the DNS and contacts the server by using either IPv4 or IPv6 depending on the addresses returned by the socket.getaddrinfo method. The client below iterates over the list of addresses returned by the DNS and sends its request to the first destination address for which it can create a socket. Other strategies are of course possible. For example, a host running in an IPv6 network might prefer to always use IPv6 when IPv6 is available [1]. Another example is the happy eyeballs approach which is being discussed within the IETF [WY2011]. For example, [WY2011] mentions that some web browsers try to use the first address returned by socket.getaddrinfo. If there is no answer within some small delay (e.g. 300 milliseconds), the second address is tried.

import socket
import sys

HOSTNAME = sys.argv[1]
PORT = int(sys.argv[2])
MSG = b"Hello, World!"

# try each address returned by the DNS until a socket can be created
for a in socket.getaddrinfo(HOSTNAME, PORT, socket.AF_UNSPEC, socket.SOCK_DGRAM):
    address_family, sock_type, protocol, canonicalname, sockaddr = a
    try:
        s = socket.socket(address_family, sock_type)
    except socket.error:
        print("Could not create socket")
        continue
    # sockaddr already has the format expected by this address family
    s.sendto(MSG, sockaddr)
    break
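The strategy of preferring IPv6 when it is available, mentioned above, could be sketched as a reordering of the list returned by socket.getaddrinfo before the loop is run. The function name prefer_ipv6 and the sample list are ours; the sample uses documentation addresses and is not the output of a real DNS query.

```python
import socket

def prefer_ipv6(results):
    """Return getaddrinfo results reordered so that IPv6 entries come first."""
    # sorted() is stable: the resolver's order is kept within each family.
    return sorted(results, key=lambda r: 0 if r[0] == socket.AF_INET6 else 1)

# Hypothetical resolver output, IPv4 entry first.
sample = [
    (socket.AF_INET, socket.SOCK_DGRAM, 17, '', ('203.0.113.225', 80)),
    (socket.AF_INET6, socket.SOCK_DGRAM, 17, '', ('2001:db8:3080:3::2', 80, 0, 0)),
]

ordered = prefer_ipv6(sample)
print("first destination tried: %s" % ordered[0][4][0])
```

This is only one possible policy. It does not implement the happy eyeballs approach, which also involves timers and parallel attempts.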

Now that we have described the utilisation of the socket API to write a simple client using the connectionless transport service, let us have a closer look at the reliable byte stream transport service. As explained above, this service is invoked by creating a socket of type socket.SOCK_STREAM. Once a socket has been created, a client will typically connect to the remote server, send some data, wait for an answer and eventually close the connection. These operations are performed by calling the following methods :

socket.connect : this method takes a sockaddr data structure, typically returned by socket.getaddrinfo, as argument. It may fail and raise an exception if the remote server cannot be reached.

socket.send : this method takes a string as argument and returns the number of bytes that were actually sent. The string will be transmitted as a sequence of consecutive bytes to the remote server. Applications are expected to check the value returned by this method and should resend the bytes that were not sent.

socket.recv : this method takes an integer as argument that indicates the size of the buffer that has been allocated to receive the data. An important point to note about the utilisation of the socket.recv method is that as it runs above a bytestream service, it may return any number of bytes (up to the size of the buffer provided by the application). The application needs to collect all the received data and there is no guarantee that some data sent by the remote host by using a single call to the socket.send method will be received by the destination with a single call to the socket.recv method.

socket.shutdown : this method is used to release the underlying connection. On some platforms, it is possible to specify the direction of transfer to be released (e.g. socket.SHUT_WR to release the outgoing direction or socket.SHUT_RDWR to release both directions).

socket.close: this method is used to close the socket. It calls socket.shutdown if the underlying connection is still open.
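The remarks about socket.send and socket.recv can be made concrete with a short sketch. The helper names send_all and recv_until_closed are ours. To keep the example self-contained, it uses socket.socketpair, which returns two already connected local stream sockets on Unix-like systems, instead of a connection to a remote server. The small receive buffer is chosen on purpose to force several socket.recv calls.

```python
import socket

def send_all(sock, data):
    """Call send() repeatedly until every byte of data has been accepted."""
    total = 0
    while total < len(data):
        sent = sock.send(data[total:])
        if sent == 0:
            raise RuntimeError("connection broken")
        total += sent
    return total

def recv_until_closed(sock, bufsize):
    """Collect data until recv() returns an empty string (peer has shut down)."""
    chunks = []
    while True:
        chunk = sock.recv(bufsize)
        if not chunk:
            break
        chunks.append(chunk)
    return b"".join(chunks), len(chunks)

a, b = socket.socketpair()
message = b"x" * 3000

send_all(a, message)
# Release the outgoing direction so that the receiver sees the end of the stream.
a.shutdown(socket.SHUT_WR)

received, calls = recv_until_closed(b, 1024)
a.close()
b.close()

print("%d bytes received in %d non-empty recv calls" % (len(received), calls))
```

A single send of 3000 bytes is read here through several recv calls of at most 1024 bytes each, which illustrates that the bytestream service does not preserve message boundaries. The Python socket API also provides socket.sendall, which performs a loop similar to send_all.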

With these methods, it is now possible to write a simple HTTP client. This client operates over both IPv6 and IPv4 and writes the homepage of the remote server on the standard output. It also reports the number of socket.recv calls that were used to retrieve the homepage [2].

#!/usr/bin/python
# A simple http client that retrieves the first page of a web site

import socket, sys

if len(sys.argv)!=3 and len(sys.argv)!=2:
    print("Usage : " + sys.argv[0] + " hostname [port]")
    sys.exit(1)

hostname = sys.argv[1]
if len(sys.argv)==3 :
    port=int(sys.argv[2])
else:
    port = 80

READBUF=16384   # size of data read from web server
s=None

for res in socket.getaddrinfo(hostname, port, socket.AF_UNSPEC, socket.SOCK_STREAM):
    af, socktype, proto, canonname, sa = res
    # create socket
    try:
        s = socket.socket(af, socktype, proto)
    except socket.error:
        s = None
        continue
    # connect to remote host
    try:
        print("Trying "+sa[0])
        s.connect(sa)
    except socket.error:
        # connection failed, try the next address
        s.close()
        s = None
        continue
    print("Connected to "+sa[0])
    # "Connection: close" asks the server to release the connection
    # after its response, which terminates the recv loop below
    request = 'GET / HTTP/1.1\r\nHost: '+hostname+'\r\nConnection: close\r\n\r\n'
    s.send(request.encode('ascii'))
    finished=False
    count=0
    while not finished:
        data=s.recv(READBUF)
        count=count+1
        if len(data)!=0:
            print(repr(data))
        else:
            finished=True
    s.shutdown(socket.SHUT_WR)
    s.close()
    print("Data was received in %d recv calls" % count)
    break

As mentioned above, the socket API is very low-level: it is the interface to the transport service. For a common and simple task, like retrieving a document from the Web, there are much simpler solutions. The Python standard library includes several high-level APIs to implementations of various application layer protocols, including HTTP. For example, the httplib module can be used to easily access documents via HTTP.

#!/usr/bin/python
# A simple http client that retrieves the first page of a web site, using
# the standard httplib library

import httplib, sys

if len(sys.argv)!=3 and len(sys.argv)!=2:
    print("Usage : " + sys.argv[0] + " hostname [port]")
    sys.exit(1)

path = '/'
hostname = sys.argv[1]
if len(sys.argv)==3 :
    port = int(sys.argv[2])
else:
    port = 80

conn = httplib.HTTPConnection(hostname, port)
conn.request("GET", path)
r = conn.getresponse()
print("Response is %i (%s)" % (r.status, r.reason))
print(r.read())

Another module, urllib2, allows the programmer to directly use URLs. This is much simpler than directly using sockets.
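A minimal sketch of this URL-based approach is shown below. The function name fetch is ours. The import fallback is an addition so that the sketch also runs under Python 3, where the urlopen function of urllib2 is found in urllib.request.

```python
try:
    from urllib2 import urlopen            # Python 2
except ImportError:
    from urllib.request import urlopen     # Python 3 location of urlopen

def fetch(url):
    """Return the content pointed to by url as a sequence of bytes."""
    response = urlopen(url)
    try:
        return response.read()
    finally:
        response.close()
```

With this function, retrieving the homepage used in the earlier examples reduces to a single call such as fetch('http://www.example.net/'). The DNS resolution, the choice between IPv4 and IPv6 and the socket.recv loop are all handled by the library.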

But simplicity is not the only advantage of using high-level libraries. They allow the programmer to manipulate higher-level concepts (e.g. I want the content pointed to by this URL) but also include many features such as transparent support for the utilisation of TLS or IPv6.

The second type of applications that can be written by using the socket API are the servers. A server typically runs forever, waiting to process requests coming from remote clients. A server using the connectionless service will typically start with the creation of a socket with socket.socket. This socket can be created above the TCP/IPv4 networking stack (socket.AF_INET) or the TCP/IPv6 networking stack (socket.AF_INET6). If a server is willing to use the two networking stacks, the portable solution is to create two sockets, for example with one thread handling the TCP/IPv4 socket and another handling the TCP/IPv6 socket. Some platforms allow a socket.AF_INET6 socket to also receive IPv4 traffic, presented as IPv4-mapped IPv6 addresses, when the IPV6_V6ONLY socket option is disabled, but this behaviour is platform-dependent and should not be relied upon.
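Threads are one way to serve two sockets. Another possibility, sketched below, is to let a single thread wait on both sockets with select.select from the Python standard library. The function names open_udp_sockets and wait_for_datagram are ours. The sketch assumes that a stack which is not available on the host simply causes the corresponding socket to be skipped, and it sets the IPV6_V6ONLY option so that the IPv6 socket does not also claim the IPv4 port.

```python
import select
import socket

def open_udp_sockets(port):
    """Try to bind one UDP socket per networking stack; return those that work."""
    candidates = (
        (socket.AF_INET, ('', port)),
        (socket.AF_INET6, ('', port, 0, 0)),
    )
    sockets = []
    for family, address in candidates:
        try:
            s = socket.socket(family, socket.SOCK_DGRAM)
        except socket.error:
            continue  # this networking stack is not available on the host
        try:
            if family == socket.AF_INET6:
                # restrict this socket to IPv6 traffic only
                s.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_V6ONLY, 1)
            s.bind(address)
        except socket.error:
            s.close()
            continue
        sockets.append(s)
    return sockets

def wait_for_datagram(sockets, timeout=1.0, bufsize=8192):
    """Return (data, sender) from the first readable socket, or None on timeout."""
    readable, _, _ = select.select(sockets, [], [], timeout)
    if not readable:
        return None
    return readable[0].recvfrom(bufsize)
```

A server would call open_udp_sockets once and then call wait_for_datagram in a loop, processing each datagram regardless of the stack through which it arrived.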

A server using the connectionless service will typically use two methods from the socket API in addition to those that we have already discussed.

socket.bind is used to bind a socket to a port number and optionally an IP address. Most servers will bind their socket to all available interfaces on the host, but there are some situations where the server may prefer to be bound only to specific IP addresses. For example, a server running on a smartphone might want to be bound to the IP address of the WiFi interface but not to the 3G interface that is more expensive.

socket.recvfrom is used to receive data from the underlying networking stack. This method returns both the sender’s address and the received data.
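These two methods can be tried on a single host with the short sketch below, in which a client and a server socket exchange datagrams over the loopback address. Binding to 127.0.0.1 is an example of restricting a server to one specific address, while port 0 asks the networking stack to pick any free port. The timeout is our addition to prevent the sketch from blocking forever if a datagram is lost.

```python
import socket

# Server side: bind to the loopback address only.
# An empty address ('') would bind to all available interfaces.
server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(('127.0.0.1', 0))
server.settimeout(2.0)
server_addr = server.getsockname()   # (address, port) chosen by the stack

# Client side: no bind is needed, the stack picks a source port.
client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.settimeout(2.0)
client.sendto(b"Hello, World!", server_addr)

# recvfrom returns the received data first, then the sender's address.
data, sender = server.recvfrom(8192)
print("received %r from %s" % (data, sender))

# The sender's address is what the server needs to send a reply.
server.sendto(b"ack", sender)
reply, _ = client.recvfrom(8192)

client.close()
server.close()
```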

The code below illustrates a very simple server running above the connectionless transport service that simply prints on the standard output all the received messages. This server uses the TCP/IPv6 networking stack.

import socket, sys

PORT=int(sys.argv[1])
BUFF_LEN=8192

# datagram socket above the TCP/IPv6 stack, bound to all interfaces
s=socket.socket(socket.AF_INET6, socket.SOCK_DGRAM)
s.bind(('',PORT,0,0))
while True:
    # data is the received message, addr the address of its sender
    data, addr = s.recvfrom( BUFF_LEN )
    if data==b"STOP" :
        print("Stopping server")
        sys.exit(0)
    print("received from %s message: %r" % (addr, data))

A server that uses the reliable byte stream service can also be built above the socket API. Such a server starts by creating a socket that is bound to the port that has been chosen for the server. Then the server calls the socket.listen method. This informs the underlying networking stack of the number of transport connection attempts that can be queued in the underlying networking stack waiting to be accepted and processed by the server. The server typically has a thread waiting on the socket.accept method. This method returns as soon as a connection attempt is received by the underlying stack. It returns a socket that is bound to the established connection and the address of the remote host. With these methods, it is possible to write a very simple web server that always returns a 404 error to all GET requests and a 501 error to all other requests.

# An extremely simple HTTP server

import socket, sys, time

# Server runs on all IP addresses by default
HOST=''
# 8080 can be used without root privileges
PORT=8080
BUFLEN=8192 # buffer size

s = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
try:
    print("Starting HTTP server on port %d" % PORT)
    s.bind((HOST,PORT,0,0))
except socket.error :
    print("Cannot bind to port : %d" % PORT)
    sys.exit(-1)

s.listen(10) # maximum 10 queued connections

while True:
    # a real server would be multithreaded and would catch exceptions
    conn, addr = s.accept()
    print("Connection from %s" % (addr,))
    data=b''
    while not b'\n' in data :  # wait until first line has been received
        chunk = conn.recv(BUFLEN)
        if not chunk:  # connection closed by the client
            break
        data = data+chunk
    if data.startswith(b'GET'):
        # GET request
        conn.send(b'HTTP/1.0 404 Not Found\r\n')
        # a real server should serve files
    else:
        # other type of HTTP request
        conn.send(b'HTTP/1.0 501 Not Implemented\r\n')

    # HTTP dates are expressed in GMT
    now = time.strftime("%a, %d %b %Y %H:%M:%S GMT", time.gmtime())
    conn.send(('Date: ' + now +'\r\n').encode('ascii'))
    conn.send(b'Server: Dummy-HTTP-Server\r\n')
    conn.send(b'\r\n')
    conn.shutdown(socket.SHUT_RDWR)
    conn.close()

This server is far from a production-quality web server. A real web server would use multiple threads and/or non-blocking IO to process a large number of concurrent requests [3]. Furthermore, it would also need to handle all the errors that could happen while receiving data over a transport connection. These are outside the scope of this section and additional information on more complex networked applications may be found elsewhere. For example, [RG2010] provides an in-depth discussion of the utilisation of the socket API with python while [SFR2004] remains an excellent source of information on the socket API in C.
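As a rough illustration of the multithreaded design mentioned above, the sketch below hands each accepted connection to its own thread. It is a simplified variant under several assumptions of ours: the function names handle and serve are hypothetical, every request is answered with a 501 status line, the listening socket is bound to the loopback address on a port chosen by the stack, and only a fixed number of connections is accepted so that the example terminates.

```python
import socket
import threading

def handle(conn):
    """Serve one connection: read the first line, answer 501, then close."""
    try:
        data = b''
        while b'\n' not in data:
            chunk = conn.recv(8192)
            if not chunk:          # client closed the connection early
                break
            data += chunk
        conn.sendall(b'HTTP/1.0 501 Not Implemented\r\n\r\n')
    except socket.error:
        pass                       # a real server would log the error
    finally:
        conn.close()

def serve(listener, max_connections):
    """Accept max_connections connections, each handled by its own thread."""
    for _ in range(max_connections):
        conn, addr = listener.accept()
        worker = threading.Thread(target=handle, args=(conn,))
        worker.daemon = True
        worker.start()

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(('127.0.0.1', 0))    # port 0: let the stack choose a free port
listener.listen(10)
port = listener.getsockname()[1]

acceptor = threading.Thread(target=serve, args=(listener, 1))
acceptor.daemon = True
acceptor.start()

# A client on the same host exercises the server once.
client = socket.create_connection(('127.0.0.1', port), timeout=2)
client.sendall(b'PUT / HTTP/1.0\r\n\r\n')
response = b''
while True:
    chunk = client.recv(8192)
    if not chunk:
        break
    response += chunk
client.close()
acceptor.join(2)
listener.close()

print(repr(response))
```

Because the accepting thread returns to socket.accept immediately after starting a worker, a slow client no longer delays the other clients. A production server would also bound the number of threads.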

Footnotes

[1]	Most operating systems today by default prefer to use IPv6 when the DNS returns both an IPv4 and an IPv6 address for a name. See http://ipv6int.net/systems/ for more detailed information.

[2]	Experiments with the client indicate that the number of socket.recv calls can vary at each run. There are various factors that influence the number of such calls that are required to retrieve some information from a server. We’ll discuss some of them after having explained the operation of the underlying transport protocol.

[3]	There are many production-quality web servers available. apache is a very complex but widely used one. thttpd and lighttpd are less complex and their source code is probably easier to understand.

Computer Networking : Principles, Protocols and Practice