Cooperative Association for Internet Data Analysis

Analysis using Cflowd on a connection between a university and a commercial link.



Preview

A preview of analysis performed on cflowd data collected on an outgoing connection between a large research university and a commercial link.


Characteristics of traffic traversing a router between a university and commercial Internet providers (instead of research networks).

General graph features:

This is a "stacking bar chart" of various Internet applications and the traffic they generate. Applications are identified by the port numbers they use. These numbers have been accumulated from both "official" sources (e.g. IANA, developer specifications) and unofficial sources (e.g. examination of tcpdump data from network games). This graph shows data taken from a "typical" 24-hour period, with time given in Universal Time (GMT). This particular day was taken from the end of a quarter and was chosen to give the flavor of traffic during and between sessions. Unfortunately, the present cflowd aggregation method does not permit protocol and port information to be accumulated independently. This graph therefore aggregates over all protocols, including TCP and UDP.
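The port-based classification and hourly accumulation described above can be sketched roughly as follows. This is a minimal illustration, not the cflowd implementation: the flow tuple layout, the function names, and the small port table are assumptions made for the example, and the table lists only a few well-known IANA port assignments. As in the graph, the sketch ignores the transport protocol, so TCP and UDP are counted together.

```python
from collections import defaultdict

# Illustrative subset of a port-to-application table (well-known IANA ports).
# A real table would be much larger and include unofficial assignments.
PORT_TO_APP = {
    20: "FTP",
    21: "FTP",
    25: "SMTP",
    53: "DNS",
    80: "HTTP",
    119: "NNTP",
}


def classify(src_port, dst_port):
    """Return the application name for a flow, or "unknown".

    The source port is checked first, then the destination port.
    """
    for port in (src_port, dst_port):
        if port in PORT_TO_APP:
            return PORT_TO_APP[port]
    return "unknown"


def bytes_per_hour(flows):
    """Sum bytes by (UTC hour of day, application).

    Each flow is a hypothetical (unix_time, src_port, dst_port, byte_count)
    tuple; actual cflowd output has a different layout.
    """
    totals = defaultdict(int)
    for unix_time, src_port, dst_port, byte_count in flows:
        hour = (int(unix_time) // 3600) % 24
        totals[(hour, classify(src_port, dst_port))] += byte_count
    return dict(totals)
```

Each (hour, application) total corresponds to one segment of one bar in a stacked chart of this kind. Flows whose ports appear in neither position of the table fall into the "unknown" bucket.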

Specific graph features:

The traffic observed through this outgoing link displays the typical diurnal cycle seen in network traffic. The minimum in traffic occurs in the early morning hours.

The traffic maximum occurs during the peak of business hours. However, the appearance of the typical diurnal cycle is somewhat misleading. This is traffic leaving the university network, not traffic generated by users within the university. This observation suggests two plausible hypotheses. The first is that a sizable amount of the traffic generated from this site is "local" traffic (at least, local to the time zone). The second is that the variations in traffic through this link should be less pronounced than on links which service local users. While this does appear plausible, further work is underway to test the validity of this claim.

As is true generally on the Internet at the present time, web traffic (using HTTP and associated protocols) is the largest source of traffic on this link. However, once again, the fact that this is an outgoing link is important to the analysis. A large amount of web traffic leaving the university can only come from outside users accessing the web servers at this university. This suggests what should be pleasing news to the researchers, faculty, and staff of this university: that this university is an Internet destination that outsiders are willing to visit. Certainly a large fraction of this traffic is generated directly from university activities. Even so, it suggests that the web is playing a significant role in information access regarding this campus.

Historically, network news (NNTP) has accounted for a large amount of traffic at universities. This graph does not bear this out, but that is not particularly surprising given that this is once more an outgoing link. Only students, faculty, and staff connecting from off-campus would provide NNTP traffic on this link. Since this campus provides extensive dorm facilities, there is not as large a student population off-campus as might be found at a "commuter school." In addition, this campus provides local networking to dorms, while students off-campus have more limited access to high-speed Internet connections (although both ADSL and cable modems are now available in this area).

Perhaps unexpected is the amount of traffic that is not accounted for by the "usual" Internet application suspects. Some possible candidates for new sources of traffic include IRC (Internet Relay Chat), RealAudio players, and the Hotline collaborative environment. As it turns out, none of these sources amounts to more than a few percent of traffic. The RealAudio result is to be expected since, once more, this is outbound traffic. The only way this university could generate significant RealAudio traffic would be to start its own Internet broadcasting. The Hotline traffic is more of a surprise. It has been observed exceeding 5% of traffic during off-peak hours. Analysts at CAIDA have learned that the Hotline architecture has become a favorite system for students seeking to put up servers for video clips. No confirmation is available, but this may explain why this service generates as much traffic as it does.

Even with the three new applications introduced above (IRC, RealAudio, and Hotline), there is still a great deal of traffic unaccounted for. The top five applications account for over 86% of the total traffic. Yet the next 15 top applications account for only 10% of the remaining traffic. With a cutoff of 0.10% of traffic, this leaves about 5% of traffic still unaccounted for. Preliminary analysis suggests that this remaining traffic consists of a large number of very small transmissions.
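The kind of accounting described above (share of the top applications, plus a residual below a cutoff) can be sketched as follows. This is only one possible reading of how the cutoff is applied: the function names, the input dictionary, and the numbers in the usage note are hypothetical and are not taken from the measured data.

```python
def traffic_shares(bytes_by_app):
    """Return (application, share) pairs sorted by descending share.

    bytes_by_app maps an application name to its total byte count.
    """
    total = sum(bytes_by_app.values())
    if total == 0:
        return []
    shares = {app: count / total for app, count in bytes_by_app.items()}
    return sorted(shares.items(), key=lambda item: item[1], reverse=True)


def summarize(bytes_by_app, top_n=5, cutoff=0.001):
    """Return (share of the top_n applications, share below the cutoff).

    cutoff is a fraction of total traffic; 0.001 corresponds to 0.10%.
    Applications whose individual share is below the cutoff are lumped
    together as the residual.
    """
    ranked = traffic_shares(bytes_by_app)
    top_share = sum(share for _, share in ranked[:top_n])
    below_cutoff = sum(share for _, share in ranked if share < cutoff)
    return top_share, below_cutoff
```

For example, with made-up byte counts of 600, 300, 99, and 1 for four applications, `summarize(counts, top_n=2, cutoff=0.05)` reports that the top two carry 90% of the bytes and that 0.1% falls below the cutoff. A large residual made up of many tiny shares would be consistent with the "large number of very small transmissions" suggested by the preliminary analysis.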

Also unresolved are the unexpected peaks in traffic in the late evening. Since these peaks appear in the known applications as well as in the unresolved category, it is clear that they are not caused by one application. Yet they do not correspond to any obvious user-driven activity. Further investigation is ongoing.

