CAIDA CoralReef Exercises

Protocol Stacks and Encapsulation




CoralReef is a comprehensive software suite developed by CAIDA to collect and analyze data from passive Internet traffic monitors, in real time or from trace files. Full details are provided on the CoralReef web site, http://www.caida.org/tools/measurement/coralreef/. This exercise is one of a set designed to introduce you to CoralReef, providing a 'hands-on' experience of analysing network data.

Before you can use CoralReef it must be installed on your Unix or Linux system. A copy of the CoralReef distribution package is provided on the IEC CD, and these exercises have all been tested using it. Alternatively, the current version is available from the CoralReef web site.

Installing the software involves running its autoconfigure script, running make depend and make to build the libraries and applications, then running make install to put the CoralReef material into appropriate directories. Again, full details of the install process are given on the web site.

By default, CoralReef is installed in the /usr/local/Coral directory. To simplify running CoralReef application programs, the /usr/local/Coral/bin directory should be included in your PATH environment variable. Once this is done the applications can be run by entering their name, followed by the name of the trace file(s) they are to work on, e.g.
       crl_info ODU-962010098.crl.enc

Level

Introductory to middle level.

Prerequisites

To get the most from this exercise a student should have:

System Resources

  1. CoralReef or WWW access

Background

This section contains a brief review of the notions of a protocol stack and encapsulation and how they relate to one another. Students who are familiar with these concepts, including a typical IP over ATM stack using LLC/SNAP, may safely skip this section.

Protocol Stacks

Like many computer systems, computer communications and networking are complex. To manage this complexity, the task of communicating is broken down into a number of hierarchical layers, where each layer builds on top of the underlying layer. Layering can also occur when one protocol is carried by another because of management requirements, or where there is a need to mix older and newer protocols. It can also be used where only one protocol is available between two parts of the network but another protocol is desired.

Encapsulation

Figure 1: Encapsulation of Internet protocols over ATM

Conceptually we think of protocols as stacked on top of one another. In practice the PDU [1] of each layer is carried in the payload field of the layer below it, as shown in figure 1. The overall effect is that when a packet is transmitted it contains a list of headers, starting from the lowest layer header and followed by those for higher layers. This is followed by the application layer data and by any trailers, starting from the highest layer trailer and finishing with the lowest layer trailer.
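The ordering of headers and trailers described above can be illustrated with a short sketch. The layer names H1..H3 and T1..T3 are placeholders invented for the illustration (layer 1 is the lowest); they do not correspond to any real header format.

```python
def encapsulate(app_data, layers):
    """Wrap app_data in each layer's header and trailer.

    layers is a list of (header, trailer) pairs ordered from the
    highest layer down to the lowest, so the lowest layer ends up
    on the outside of the finished PDU.
    """
    pdu = app_data
    for header, trailer in layers:
        # The PDU built so far becomes the payload of the layer below.
        pdu = header + pdu + trailer
    return pdu


# Placeholder byte strings stand in for real headers and trailers.
layers_top_down = [(b"H3", b"T3"), (b"H2", b"T2"), (b"H1", b"T1")]
frame = encapsulate(b"DATA", layers_top_down)
print(frame)   # b'H1H2H3DATAT3T2T1'
```

Reading the result from left to right gives the lowest layer header first and the lowest layer trailer last, with the application data in the middle.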

The IP over ATM Stack

Figure 2: Internet over ATM Stack

In the case of the Internet protocols being carried across an ATM link (which is common for high speed WAN links) the protocol stack is shown in figure 2.

The lowest layer, Synchronous Optical NETwork, or SONET, is responsible for carrying bits over an optical fiber. SONET was originally designed to carry telephone traffic. In addition to defining the physical requirements for transmission of bits SONET is arranged so that several slower rate connections can be multiplexed into one higher rate connection. The most common SONET multiplexing levels to be used for data are OC3 (155Mbps), OC12 (622Mbps) and OC48 (2.4Gbps).

Figure 3: The ATM Cell

ATM uses SONET to carry 53-byte ATM cells. These cells are carried in the SONET payload. Small cells are used because they are easy to switch in hardware and allow traffic of different types (data, voice and video) to be mixed without unduly affecting one another. Although ATM is the most widely deployed high speed technology, there is a trend in high speed data networks that do not share their links between IP and other traffic types to sidestep the ATM layer and carry IP packets directly in the SONET payload. This is known as Packet over SONET or PoS. In these exercises all the traces are IP over ATM. The format of ATM cells is shown in figure 3. The UNI (User Network Interface) format is used between an ATM network and an external device, whereas the NNI (Network Network Interface) format is used within an ATM network, between ATM switches. Before data can be carried over an ATM link an ATM connection must be established.

Figure 4: ATM Adaptation Layer 5 (AAL5) Format

The 48 bytes of payload available in an ATM cell are insufficient for most applications. ATM uses Adaptation Layers to take packets from a higher level protocol and carry them in ATM cells. In almost all cases AAL5 is used to carry Internet traffic. AAL5 is very simple. It has no header but has a trailer with a CRC to provide error detection on the content of the packet (remember that the ATM cell has no check on the payload part of the cell). The format of the AAL5 frame is shown in figure 4.
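As a rough illustration of how a packet is spread over cells, the sketch below counts the cells an AAL5 frame needs. It assumes the usual 8-byte AAL5 trailer (two single-byte fields, a 2-byte length and a 4-byte CRC) and that padding is added so the trailer ends exactly at the end of the last cell; the function name aal5_cells is made up for this example.

```python
CELL_PAYLOAD = 48   # payload bytes carried by one ATM cell
AAL5_TRAILER = 8    # assumed trailer size: UU, CPI, 2-byte length, 4-byte CRC


def aal5_cells(packet_len):
    """Return (cells, pad_bytes) needed to carry packet_len bytes in AAL5."""
    total = packet_len + AAL5_TRAILER
    cells = -(-total // CELL_PAYLOAD)        # ceiling division
    pad = cells * CELL_PAYLOAD - total       # filler between data and trailer
    return cells, pad


# A 40-byte IP packet behind an 8-byte LLC/SNAP header is 48 bytes:
# it fills one cell exactly, so the trailer needs a second cell.
print(aal5_cells(48))     # (2, 40)
print(aal5_cells(1500))   # (32, 28)
```

This is one reason why a packet header trace that records only the first cell of each packet can still contain the whole of a small packet.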

Figure 5: Logical Link Control and Subnetwork Access Protocol header formats

At the time the connection is established the type and format of data that will be transferred, including the adaptation layer, is agreed on. Each connection is uniquely identified by its virtual path and virtual channel number.

Within a single virtual channel it may be useful to carry packets from different higher level protocols, e.g. IP and ATMARP. If this is done, some way of telling which protocol a particular packet belongs to is needed, so that the packet can be passed to the correct software of the next-higher layer. A widely used protocol [2] for this is IEEE 802.2 LLC/SNAP. LLC/SNAP is used on many LANs, including the IEEE-standardized version of Ethernet (802.3).

The Logical Link Control (LLC) / SubNetwork Access Protocol (SNAP) is a simple protocol whose header identifies the higher layer protocol. LLC/SNAP contains two sub-layers, LLC and SNAP. The (older) LLC sub-layer is based on the widely used HDLC protocol and has a 3-byte header with a single byte to identify the upper layer (see figure 5). A single byte is not always enough, and some special combinations of the 3 LLC bytes have been defined to indicate that there is a second sub-layer with a larger header. The value 0xAA AA 03 indicates that this next sub-layer is the SNAP protocol. The SNAP protocol has a 5-byte header consisting of a 3-byte organisation identifier [3] and a 2-byte protocol identifier. The organisation identifier allows a number of different sets of protocol identifiers to be defined and managed by different organisations. The value 0x00 00 00 indicates that the protocol identifier contains an Ethernet protocol number (EtherType), e.g. 0x08 00 for IP. So the complete value for LLC carrying SNAP indicating the EtherType for IP is AA AA 03 00 00 00 08 00. The formats of the LLC and SNAP headers are shown in figure 5.
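The 8-byte LLC/SNAP header described above can be picked apart with a few lines of Python. This is only a sketch: the function name parse_llc_snap is invented here, and it recognises just the AA AA 03 form of LLC discussed in this section.

```python
LLC_FOR_SNAP = bytes.fromhex("aaaa03")   # LLC value announcing a SNAP header


def parse_llc_snap(payload):
    """Return (organisation_id, protocol_id), or None if not LLC/SNAP."""
    if len(payload) < 8 or payload[:3] != LLC_FOR_SNAP:
        return None
    organisation_id = payload[3:6]                        # 3 bytes
    protocol_id = int.from_bytes(payload[6:8], "big")     # 2 bytes, MSB first
    return organisation_id, protocol_id


header = bytes.fromhex("aa aa 03 00 00 00 08 00")
org, proto = parse_llc_snap(header)
print(org.hex(), hex(proto))   # 000000 0x800
```

An organisation identifier of 00 00 00 together with a protocol identifier of 0x0800 means that an IP packet follows.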

Figure 6: The IP Header

Figure 7: The TCP Header

The standard IP and TCP or UDP protocols can then be carried inside the SNAP protocol. A detailed description of these protocols is beyond the scope of this tutorial. The formats for IP and TCP are shown in figures 6 and 7 respectively.

Preparation

Obtain the following trace:

A large number of "packet header traces", containing the first 48 bytes [4] of each IP packet, are available at http://moat.nlanr.net/Traces/

In addition there are some traces that have been selected specifically for these exercises. Your instructor may have made these available locally. They are also available at:

The traces normally include data from two interfaces, one collecting data from each direction.

The names of these traces consist of a three letter code, a Unix timestamp, and an extension indicating the format of the trace. The three letter code identifies the location of the monitor. For example ODU-947926964.crl.enc refers to a trace from the Old Dominion University vBNS link, collected at 01:02 on Saturday January 15, 2000.

If you need to convert the Unix timestamp to a date and time try:
       perl -e 'print scalar(localtime(time_value));'
or
       date -r time_value
The date -r form works on BSD systems; with GNU date use date -d @time_value instead.

Note that these commands will give date and time in your local time zone.
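The same conversion can be done in Python. The sketch below assumes trace names of the form code-timestamp.extension, as described above; the function name trace_time is invented for the example. It reports the time in UTC so that the result does not depend on where it is run.

```python
from datetime import datetime, timezone


def trace_time(name):
    """Split a name like ODU-947926964.crl.enc into (site, UTC datetime)."""
    site, rest = name.split("-", 1)
    stamp = int(rest.split(".", 1)[0])     # digits before the first dot
    return site, datetime.fromtimestamp(stamp, tz=timezone.utc)


site, when = trace_time("ODU-947926964.crl.enc")
print(site, when.isoformat())   # ODU 2000-01-15T09:02:44+00:00
```

The 01:02 quoted earlier for this trace is the same instant expressed in a time zone eight hours behind UTC.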

The University of Waikato also collects traces using its DAG hardware. These are named differently from the NLANR traces. The DAG traces start with a three letter code identifying the trace, followed by -dag- identifying them as DAG traces, followed by the date and time of the trace, followed by the interface.

For example ACK-dag-19990708-121553-0-160000-161000.crl was collected at The University of Auckland at 12:15 on July 8, 1999 on interface 0 of the monitor.

Dumping ATM Cell Contents

cell 0, interface 0
        c8 03 10 23  99 04 00 00  30 03 00 00  aa aa 03 00  |...#....0.......|
        00 00 08 00  45 00 00 28  f1 cb 40 00  76 06 65 e5  |....E..(..@.v.e.|
        00 01 00 01  00 02 00 01  05 dd 00 50  00 5b fc ab  |...........P.[..|
        0c da f5 16  50 10 21 80  dc 10 00 00               |....P.!.....|

Figure 8: crl_print -r output

The CoralReef application crl_print prints each cell of the trace. When crl_print is run with the -r option the cell is printed in a raw format and no interpretation of the cell content is done.

An example of the way a cell is formatted is shown in figure 8. The left side of each line of the output lists the cell content [5] in hexadecimal. Each pair of digits represents a single byte in the cell. The right side of the line shows the same bytes of the cell in ASCII form. Between the two vertical lines (|) there are 16 characters, one for each byte shown on the left. If the byte corresponds to an ASCII character that has a printable representation, such as a letter, digit or punctuation mark, that character is printed. If the byte corresponds to a non-printable (control) ASCII character a . is printed instead. Because we will be looking at cells that contain control information the ASCII form will not be useful.

The exact format of the data presented by crl_print is based on the Coral trace file format and requires some explanation. The first 8 bytes of the output are the 8-byte timestamp that is recorded by the monitor when it captures the cell. This is followed by the ATM cell. An ATM cell has a 5-byte header and 48 bytes of data. The last byte of the header (the 5th byte of the cell) is a checksum that checks for, and in some cases corrects, errors in the rest of the header. Because only cells with correct headers are recorded by the monitor this field is redundant in recorded traces and is omitted.

In summary the cell dump consists of 3 parts: an 8-byte timestamp, the first 4 bytes of the ATM cell header and the 48 bytes of cell content. Check that you can identify the first and last byte of each component in figure 8 [6]. The bytes that were transmitted in the SONET frame would have been those shown in the dump as bytes 9, 10, 11 and 12, followed by the ATM Header Error Checksum (which is not included in the trace), then bytes 13 to 60 from the dump.
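The three-part layout can be checked mechanically. The sketch below takes the hexadecimal column of figure 8 (typed in by hand, without the ASCII column) and cuts it at the byte positions given above; the function name split_record is invented for the example.

```python
def split_record(record):
    """Split one 60-byte raw record into (timestamp, atm_header, payload)."""
    if len(record) != 60:
        raise ValueError("expected 8 + 4 + 48 = 60 bytes, got %d" % len(record))
    return record[:8], record[8:12], record[12:]


# Hexadecimal column of figure 8.
dump = """
c8 03 10 23  99 04 00 00  30 03 00 00  aa aa 03 00
00 00 08 00  45 00 00 28  f1 cb 40 00  76 06 65 e5
00 01 00 01  00 02 00 01  05 dd 00 50  00 5b fc ab
0c da f5 16  50 10 21 80  dc 10 00 00
"""
record = bytes.fromhex("".join(dump.split()))
timestamp, atm_header, payload = split_record(record)

print(timestamp.hex(" "))    # c8 03 10 23 99 04 00 00
print(atm_header.hex(" "))   # 30 03 00 00
print(len(payload))          # 48
```

The separator argument to hex() needs Python 3.8 or later; on older versions call hex() without it.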

cell 0, interface 0 
        time: 3.085793920 
        gfc: 3 (0x0003) 
        vp:vc: 0:12288 (0x00:0x3000) 
        pti: 0 0 0 (user data, no congestion, not last cell)
        payload: 
        aa aa 03 00 00 00 08 00 45 00 00 28 f1 cb 40 00   |........E..(..@.| 
        76 06 65 e5 00 01 00 01 00 02 00 01 05 dd 00 50   |v.e............P| 
        00 5b fc ab 0c da f5 16 50 10 21 80 dc 10 00 00   |.[......P.!.....|

Figure 9: crl_print output

Figure 9 shows the output of crl_print (without the -r option). The output is similar to that with the -r option except that the timestamp and ATM header are interpreted rather than printed as raw data. Timestamps are normally relative to the start of the trace but may also be relative to real time.
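The interpretation of the ATM header can be reproduced by hand or with a short sketch such as the one below. It assumes the UNI layout of the first four header bytes, most significant bit first: 4 bits of GFC, 8 bits of virtual path, 16 bits of virtual channel, 3 bits of PTI and 1 bit of CLP. The function name decode_uni_header is invented for the example.

```python
def decode_uni_header(header4):
    """Decode the first 4 bytes of a UNI-format ATM cell header."""
    if len(header4) != 4:
        raise ValueError("expected exactly 4 header bytes")
    word = int.from_bytes(header4, "big")     # network byte order
    return {
        "gfc": word >> 28,                    # top 4 bits
        "vpi": (word >> 20) & 0xFF,           # next 8 bits
        "vci": (word >> 4) & 0xFFFF,          # next 16 bits
        "pti": (word >> 1) & 0x7,             # next 3 bits
        "clp": word & 0x1,                    # last bit
    }


fields = decode_uni_header(bytes.fromhex("30 03 00 00"))
print(fields)
# {'gfc': 3, 'vpi': 0, 'vci': 12288, 'pti': 0, 'clp': 0}
```

Applied to the header bytes 30 03 00 00 from figure 8, this gives the gfc, vp:vc and pti values that crl_print reports in figure 9.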

Exercises

Manual Decode

  1. Look at cell number 10 in the ODU-962010098.crl.enc trace with
            crl_print -r.
    Break the cell down into PDUs for the different protocols. Expand the header and/or trailer of each layer into their component fields. Label each field and, as far as you are able, describe the meaning of the value given. You may need to employ knowledge you have learned outside this tutorial to do so.

    Check the ATM part of your answer with crl_print (no -r option).

Manual Encode

  1. Artificially construct the output of crl_print -r for a cell with the following parameters:

    Notes:

    1. There is an IPv4 header error checksum calculator at:
      http://byerley.cs.waikato.ac.nz/~tonym/hec.html
    2. You may insert all zeroes for the TCP checksum.
    3. If you need to know the well-known numbers for ports and protocols look in /etc/services on a Unix host or .
    4. Details of the IP and TCP headers can be found in most technically oriented data communications text books or in RFC791 and RFC793 respectively. RFCs may be found at: http://www.faqs.org/rfcs/
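If the checksum calculator in note 1 is not reachable, the IPv4 header checksum can be computed with a few lines of Python. This is a sketch of the standard one's-complement sum described in RFC791; the function name ipv4_checksum and the example header (a made-up UDP packet from 192.168.0.1 to 192.168.0.199) are illustrative only and are not taken from any trace.

```python
def ipv4_checksum(header):
    """One's-complement checksum over an IPv4 header.

    Set the checksum field (bytes 10 and 11) to zero before calling.
    """
    if len(header) % 2:
        raise ValueError("header length must be an even number of bytes")
    total = 0
    for i in range(0, len(header), 2):
        total += int.from_bytes(header[i:i + 2], "big")   # 16-bit words
    while total >> 16:                                    # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# Hypothetical 20-byte header with the checksum field set to zero.
hdr = bytearray.fromhex("4500 0073 0000 4000 4011 0000 c0a8 0001 c0a8 00c7")
checksum = ipv4_checksum(hdr)
print(hex(checksum))          # 0xb861

hdr[10:12] = checksum.to_bytes(2, "big")
print(ipv4_checksum(hdr))     # 0, which is how a receiver verifies the header
```

Be aware that a checksum recorded in a trace will only agree with one you compute if no header field, for example an address, has been rewritten since the packet was captured.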

Selection

  1. Write a small program to search the output of crl_print and count the number of UDP DNS packets in the ODU-962010098.crl.enc trace.

    Notes:

    1. Look up the IP protocol number for UDP and the UDP port number for DNS (Domain Name Server) in .
    2. Don't forget to check that the cells are carrying UDP/IP (in LLC/SNAP).
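As a possible starting point, the sketch below turns the text printed by crl_print into one byte string per cell and leaves the UDP and DNS tests to you. It assumes the output looks exactly like figure 9: a "cell N, interface M" line followed by payload lines of hexadecimal bytes ending in a | column. The names parse_cells and SAMPLE are invented for the example.

```python
import re

CELL_RE = re.compile(r"^\s*cell (\d+), interface (\d+)")
HEX_RE = re.compile(r"^\s*((?:[0-9a-f]{2}\s+)+)\|")   # hex bytes, then '|'


def parse_cells(lines):
    """Yield (cell_number, interface, payload_bytes) for each cell."""
    current = None
    payload = bytearray()
    for line in lines:
        start = CELL_RE.match(line)
        if start:
            if current is not None:
                yield current + (bytes(payload),)
            current = (int(start.group(1)), int(start.group(2)))
            payload = bytearray()
            continue
        hex_part = HEX_RE.match(line)
        if hex_part and current is not None:
            payload += bytes.fromhex("".join(hex_part.group(1).split()))
    if current is not None:
        yield current + (bytes(payload),)


# The cell from figure 9, used here in place of real crl_print output.
SAMPLE = """cell 0, interface 0
        time: 3.085793920
        gfc: 3 (0x0003)
        vp:vc: 0:12288 (0x00:0x3000)
        pti: 0 0 0 (user data, no congestion, not last cell)
        payload:
        aa aa 03 00 00 00 08 00 45 00 00 28 f1 cb 40 00   |........E..(..@.|
        76 06 65 e5 00 01 00 01 00 02 00 01 05 dd 00 50   |v.e............P|
        00 5b fc ab 0c da f5 16 50 10 21 80 dc 10 00 00   |.[......P.!.....|
"""
cells = list(parse_cells(SAMPLE.splitlines()))
print(len(cells), len(cells[0][2]))   # 1 48
```

To work on a whole trace, pass the lines of crl_print's output (for example read from standard input) to parse_cells instead of SAMPLE.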

Conclusion

By looking at the raw output of crl_print and by manually coding and decoding packets you have seen how packets are carried inside other packets and the way this appears at the lower layers of the protocol stack.

If you needed to do these operations in practice (rather than for the purpose of learning about protocol stacks) other prewritten CoralReef applications may be helpful. Alternatively you could write your own application using the CoralReef C or Perl library. Other exercises in this series demonstrate these tools.


Footnotes:

[1] Protocol Data Unit (PDU) is a generic term for packets, cells, frames etc.

[2] Although LLC/SNAP is described as a protocol with two sub-layers it is (like AAL5) very simple, consisting only of a simple header with no special procedures for its use.

[3] We have simplified the terminology a little here rather than using the standard names for the fields.

[4] Why 48 bytes?

[5] As we shall see later, the output of crl_print also includes the timestamp for the cell, which is added by the measurement card.

[6] They are c8 and 00 (timestamp), 30 and 00 (ATM header), aa and 00 (cell content).

[7] You need to be aware that integers are stored in network byte order, that is, most significant byte first.

[8] This means that the flags field will be 0.


File translated from TEX by TTH, version 2.92.
On 21 Nov 2001, 14:00.