CAIDA CoralReef Exercises
Timestamps - more than meets the eye
CoralReef is a comprehensive software suite developed by CAIDA to collect and analyze data from passive Internet traffic monitors, in real time or from trace files. Full details are provided on the CoralReef web site, http://www.caida.org/tools/measurement/coralreef/. This exercise is one of a set designed to introduce you to CoralReef, providing 'hands-on' experience of analysing network data.
Before you can use CoralReef it must be installed on your Unix or Linux system. A copy of the CoralReef distribution package is provided on the IEC CD, and these exercises have all been tested using it. Alternatively, the current version is available from the CoralReef web site.
Installing the software involves running its autoconfigure script,
running make depend and make to build the libraries and
applications, then
make install to put the CoralReef material
into appropriate directories. Again, full details of the install
process are given on the web site.
By default, CoralReef is installed in the
/usr/local/Coral
directory. To simplify running CoralReef application programs,
the /usr/local/Coral/bin directory should be included in your
PATH environment variable. Once this is done the applications
can be run by entering their name, followed by the name of the
trace file(s) they are to work on, e.g.
crl_info ODU-962010098.crl.enc
Introductory.
Conceptually, a CoralReef trace is a series of records. Each record describes a cell or a packet recorded from a communications link, and contains a timestamp giving the time at which that cell or packet was captured by the monitor. This exercise examines these timestamps.
Obtain or locate the following traces:
A large number of "packet header traces", containing the first 48 bytes (see footnote 1) of each IP packet, are available at http://moat.nlanr.net/Traces/
In addition there are some traces that have been selected specifically for these exercises. Your instructor may have made these available locally. They are also available at:
The traces normally include data from two interfaces, one collecting data from each direction.
The names of these traces consist of a three-letter code, a Unix timestamp, and an extension indicating the format of the trace. The three-letter code identifies the location of the monitor. For example, ODU-947926964.crl.enc refers to a trace of the Old Dominion University vBNS link collected at 01:02 on Saturday January 15, 2000.
If you need to convert the Unix timestamp to a date and time try:
perl -e 'print scalar(localtime(time_value));'
or
date -r time_value
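Note that these commands will give the date and time in your local time zone. The date -r form is the BSD one; with GNU date (as found on most Linux systems) use date -d @time_value instead.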
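If you would rather see the capture time in UTC, independent of your local time zone, the conversion can be sketched in Python. This is only an illustration: the function name parse_nlanr_name is made up for this example and is not part of CoralReef, and it assumes the name follows the code-timestamp.extension pattern described above.

```python
from datetime import datetime, timezone

def parse_nlanr_name(name):
    """Split an NLANR-style trace name into (site code, capture time in UTC).

    Hypothetical helper for illustration only; not a CoralReef function.
    """
    stem = name.split(".", 1)[0]          # e.g. "ODU-947926964"
    site, seconds = stem.split("-", 1)    # e.g. "ODU", "947926964"
    when = datetime.fromtimestamp(int(seconds), tz=timezone.utc)
    return site, when

site, when = parse_nlanr_name("ODU-947926964.crl.enc")
print(site, when.isoformat())   # ODU 2000-01-15T09:02:44+00:00
```

The result, 09:02:44 UTC, matches the 01:02 quoted above when displayed in a time zone eight hours behind UTC.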
The University of Waikato also collects traces using their DAG hardware. These are named differently to the NLANR traces. The DAG trace names start with a three-letter code identifying the trace, followed by -dag- identifying them as DAG traces, followed by the date and time of the trace, followed by the interface.
For example, ACK-dag-19990708-121553-0-160000-161000.crl was collected at the University of Auckland at 12:15 on July 8, 1999 on interface 0 of the monitor.
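The DAG naming scheme can be unpacked in the same way. The sketch below is an illustration only: parse_dag_name is a hypothetical helper, not a CoralReef tool, and the two trailing numeric fields in the example name are not explained in this exercise, so the code simply returns them untouched. The time zone of the name is not stated either, so the datetime is left without one.

```python
from datetime import datetime

def parse_dag_name(name):
    """Split a DAG trace name into (code, start time, interface, extra fields).

    Hypothetical helper for illustration; the meaning of the extra fields
    is not documented in this exercise, so they are returned as-is.
    """
    stem = name.rsplit(".", 1)[0]              # drop the ".crl" extension
    fields = stem.split("-")
    code, tag, date, time, interface = fields[:5]
    if tag != "dag":
        raise ValueError("not a DAG trace name: " + name)
    start = datetime.strptime(date + time, "%Y%m%d%H%M%S")
    return code, start, int(interface), fields[5:]

code, start, interface, extra = parse_dag_name(
    "ACK-dag-19990708-121553-0-160000-161000.crl")
print(code, start, interface, extra)
```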
The coral application crl_time prints timestamp information about each cell. The seven columns of its output are:
Some difficulties arise when working with traces because of their size. Some of these difficulties are related to the time it takes to process the trace. This is especially true if interpreted languages like Perl are used (which is the case with some of the CoralReef applications).
A more significant problem is the difficulty of verifying that the data collected is reasonable. It is valuable to develop strategies for making some simple validity checks on the data. In the case of timestamps we know they should always increase in value and - under normal circumstances - should increase at a roughly constant rate, without any sudden changes.
One way to verify this is to plot the timestamp value against the cell number. To do this you need to:
You will notice that there is a discontinuity near the start of the trace. At about cell 1650 the timestamps return to zero and start again. Most of the older NLANR traces have this discontinuity near their start times.
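A discontinuity of this kind can also be located without a plot, by scanning for any timestamp that is lower than the one before it. The sketch below assumes the timestamps have already been extracted into a Python list of numbers; the function name and the sample values are invented for illustration and do not come from a real trace.

```python
def find_backward_steps(timestamps):
    """Return (index, previous, current) for every timestamp that is
    lower than its predecessor."""
    steps = []
    for i in range(1, len(timestamps)):
        if timestamps[i] < timestamps[i - 1]:
            steps.append((i, timestamps[i - 1], timestamps[i]))
    return steps

# Made-up sample: the clock restarts at index 3.
sample = [0.10, 0.20, 0.30, 0.00, 0.05, 0.15]
print(find_backward_steps(sample))   # [(3, 0.3, 0.0)]
```

An empty result means the timestamps never decrease, which is the first validity check suggested above.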
To understand this behaviour you need to know a little about the way the monitor is constructed. Most OCx monitors have two ATM or POS interface cards, one to monitor each direction of the link. The cards are operated by firmware which is downloaded to the card at the beginning of a measurement. Once this firmware is loaded the card starts recording data. It is desirable to have the timestamps as close as possible on the two cards, so once both are operational the monitor system resets both cards' timestamps at the same time, effectively synchronising them.
Because of this behaviour any analysis that uses timestamps should discard the early part of the trace. The average cell rate you calculated earlier will be wrong.
While the difference between the correct and incorrect calculations is not large in this case, it should be clear that similar hidden behaviour in cell traces could have a significant effect on the analysis.
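To see how the reset distorts a rate calculation, compare a naive average over the whole trace with one computed only over the cells after the reset. This is a sketch under assumptions: it takes the average rate to be the number of intervals divided by the elapsed time, the helper names are hypothetical, and the sample timestamps are invented rather than taken from a real trace.

```python
def cell_rate(timestamps):
    """Average cells per second: intervals divided by elapsed time."""
    span = timestamps[-1] - timestamps[0]
    if span <= 0:
        raise ValueError("timestamps do not span a positive interval")
    return (len(timestamps) - 1) / span

def after_last_reset(timestamps):
    """Return the part of the list that follows the last backward step."""
    start = 0
    for i in range(1, len(timestamps)):
        if timestamps[i] < timestamps[i - 1]:
            start = i
    return timestamps[start:]

# Made-up sample: three cells, then the clock resets and six more follow.
sample = [0.0, 1.0, 2.0, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
naive = cell_rate(sample)                        # counts the pre-reset cells
corrected = cell_rate(after_last_reset(sample))  # discards them
print(naive, corrected)
```

In this invented sample the naive figure is too high because the cells before the reset are counted but the time they occupied is not.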
Other traces can show other kinds of unexpected timestamp behaviour. For example, you may care to look at ODU-962010098.crl.enc.
The timestamps in the ACK traces are derived from an external real time reference (the global positioning system, GPS, is used to provide an accurate time reference), and the two interface cards are synchronized electronically so that their timestamps are both derived from the same GPS time signals.
There may be a number of zero timestamps at the end of the trace because the raw file format used by CoralReef is based around blocks of cells, whereas the DAG monitor's output is not. A partly filled block of cells is padded with zeros by the conversion software.
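If the padding gets in the way of an analysis, the trailing zeros can be dropped before any calculation. The following is a minimal sketch, assuming the timestamps are already in a Python list; strip_trailing_zeros is a hypothetical helper, not a CoralReef function, and the sample values are invented.

```python
def strip_trailing_zeros(timestamps):
    """Return the list without any run of zero timestamps at its end."""
    end = len(timestamps)
    while end > 0 and timestamps[end - 1] == 0:
        end -= 1
    return timestamps[:end]

# Made-up sample: three real timestamps followed by two padding entries.
print(strip_trailing_zeros([10.1, 10.2, 10.3, 0, 0]))   # [10.1, 10.2, 10.3]
```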
Researchers are often interested in the typical time between packets, or in the distribution of packet inter-arrival times. For example, the designers of a high-speed switch might want to know the shortest and most common inter-packet times, and the percentages of packets at those times, so that they can design a switch that can handle the appropriate number of packets per second.
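An inter-arrival distribution is simply a histogram of the differences between consecutive timestamps. The sketch below shows one way to build it, assuming the timestamps are available as a list of integer microseconds; the function name, the bin width and the sample values are all chosen for illustration only.

```python
from collections import Counter

def interarrival_histogram(timestamps, bin_width):
    """Count inter-arrival gaps per bin.

    Keys are bin indices: a gap g falls in bin g // bin_width.
    """
    counts = Counter()
    for prev, cur in zip(timestamps, timestamps[1:]):
        counts[(cur - prev) // bin_width] += 1
    return counts

# Made-up arrival times in microseconds.
sample_us = [0, 3, 6, 9, 20, 23, 40]
width = 5
hist = interarrival_histogram(sample_us, width)
total = sum(hist.values())
for bin_index in sorted(hist):
    low = bin_index * width
    share = 100.0 * hist[bin_index] / total
    print(f"{low}-{low + width} us: {hist[bin_index]} gaps ({share:.1f}%)")
```

Writing the bin and count pairs to a file, one pair per line, gives data in the form gnuplot expects.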
If you are using gnuplot you might like to experiment with different plot styles including with impulses.
Note: The smallest inter-cell time is not zero.
You will need to zoom in on parts of the graph to discover
the full structure. If you are using gnuplot try
set xrange [min:max] e.g.
set xrange [0:1.0]
and/or set logscale.
Because of the external time source, which synchronises both interface cards in a DAG monitor, the cells in the traces collected for the two directions can be combined and, if sorted on timestamp, their cells will be correctly ordered between the two traces. This is not the case for traces collected where the two cards are independent because the separate clocks in the cards will drift slightly over time and cells that are reported in one order might have actually occurred in a different order.
This ordering problem makes some kinds of analysis impossible. For example, with correctly ordered cells, retransmission can be detected when data is seen after it has been acknowledged. If the ordering of cells cannot be relied on, this analysis will not be meaningful.
If two or more traces are listed on the command line of crl_time (or of most other CoralReef applications), CoralReef merges the data from the files as it is processed, so that the cells or packets are handled in timestamp order.
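The idea of merging by timestamp can be illustrated with a small Python sketch. This is not how CoralReef itself is implemented; it only shows the principle, assuming each interface's records are already sorted by timestamp and are represented as (timestamp, interface, payload) tuples, a layout invented for this example.

```python
import heapq

def merge_by_timestamp(*streams):
    """Merge already-sorted record streams into one stream ordered by
    timestamp (the first element of each record)."""
    return heapq.merge(*streams, key=lambda record: record[0])

# Made-up records from two interfaces of one monitor.
if0 = [(0.10, 0, "a"), (0.40, 0, "b"), (0.70, 0, "c")]
if1 = [(0.20, 1, "x"), (0.30, 1, "y"), (0.90, 1, "z")]

for record in merge_by_timestamp(if0, if1):
    print(record)
```

The merged order is only meaningful when both interfaces share a clock, as in the DAG traces; with independent clocks the same code runs, but the resulting order may not reflect what happened on the link.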
The concepts of timestamps and multiple interfaces are simple, but practical considerations make timestamps more difficult to deal with. For example the large size of traces requires care to ensure that the methodology accounts for all the events in the trace, not just the most common events.
1. Why 48 bytes?
2. Most monitors that record data are connected passively to the communications channel that they are recording. For example, a monitor recording cells from an OC12 ATM link may be connected to the link using a fiber splitter, which takes a small percentage of the light energy and redirects it to an ATM interface on the monitor. In this way the monitor can record the cells it sees. Each interface is able to record one stream of data. Most connections are full duplex, so two interfaces are needed to record the data for both directions.
3. The raw timestamp is reported by the card to CoralReef, and its format depends on the type of card used. It is generally only useful to CoralReef and driver developers.
4. The option -i0 tells CoralReef applications to process only data from interface 0.
5. Gnuplot is a simple-to-use plotting program. Grace (also known as xmgr) is an alternative which some people prefer.
Gnuplot plots data in a file. In the simplest case the data should be lines with pairs of xvalue yvalue separated by a space or a comma. If there is only one value per line it is assumed that the x values start at 0 and increase by 1 each line.
To plot data start gnuplot and use a command like:
plot "filename" with lines
Don't forget the quotes around the
file name. If you prefer you can specify with points or
with linespoints. Extra lines can be added by including more
filenames (with or without a with clause) separated by a
comma. For example:
plot "foo", "bar" with linespoints
To print a gnuplot graph, set the output type to match your printer with the set terminal command and send the output to a file with the set output command, then repeat the plot command (or enter replot) so that the graph is written to the file. For example:
set terminal postscript color
set output "ts.ps"
There is a lot more gnuplot can do for you if you are adventurous.
Try the online help, e.g. help plot