When your packets travel through the Internet, where exactly do they go? How many of your web site's users live in Europe? How will your service be affected by a new trans-Pacific cable? To answer these questions you need geographic information. Researchers studying the Internet frequently need to map their observed data, such as IP addresses, to geographic locations. But IP addresses are values in a logical hierarchy; they contain no geographic information. There is no authoritative database for mapping IP addresses to locations, so several sources of network information must be combined, and these sources may be conflicting or incomplete. The large size of the typical data set used in Internet research creates a further problem. Manually mapping many thousands of IP addresses to locations is impractical and error-prone, so an automated solution is required. In this paper we describe NetGeo, a tool that solves these problems efficiently.
NetGeo is a tool that maps IP addresses, domain names, and Autonomous System (AS) numbers to geographic locations. Ideally, the latitude and longitude corresponding to any Internet address would be available via a DNS LOC record (RFC 1876). Unfortunately, few network administrators publish LOC records. NetGeo uses DNS LOC records when they are available; otherwise, it uses heuristics to choose among several methods for determining locations. Locations may be inferred from host names or domain names, or found by parsing the postal addresses in whois records. The heuristics and the addresses from whois records can occasionally be misleading, but NetGeo obtains acceptable results in most cases.
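The sketch below illustrates the two ideas in the preceding paragraph under stated assumptions. It parses latitude and longitude from the RFC 1876 text (presentation) form of a LOC record, and it tries a list of location sources in order, returning the first answer. The function names `parse_loc_rdata` and `locate` and the stub resolvers are hypothetical illustrations, not NetGeo's actual code or heuristics.

```python
from typing import Callable, List, Optional, Tuple

LatLon = Tuple[float, float]
Resolver = Callable[[str], Optional[LatLon]]


def parse_loc_rdata(text: str) -> LatLon:
    """Parse lat/long from RFC 1876 presentation format, e.g.
    "42 21 54 N 71 06 18 W -24m 30m". Altitude and precision are ignored."""
    tokens = text.split()

    def take_coord(i: int, hemispheres: Tuple[str, str]) -> Tuple[float, int]:
        # Degrees are required; minutes and seconds are optional and
        # precede the hemisphere letter (N/S or E/W).
        parts: List[float] = []
        while i < len(tokens) and tokens[i].upper() not in hemispheres:
            parts.append(float(tokens[i]))
            i += 1
        if i == len(tokens) or not 1 <= len(parts) <= 3:
            raise ValueError(f"malformed LOC coordinate in {text!r}")
        degrees = parts[0]
        if len(parts) > 1:
            degrees += parts[1] / 60.0
        if len(parts) > 2:
            degrees += parts[2] / 3600.0
        # South latitudes and west longitudes are negative.
        sign = -1.0 if tokens[i].upper() in ("S", "W") else 1.0
        return sign * degrees, i + 1

    lat, i = take_coord(0, ("N", "S"))
    lon, _ = take_coord(i, ("E", "W"))
    return lat, lon


def locate(target: str, resolvers: List[Tuple[str, Resolver]]) -> Optional[Tuple[str, LatLon]]:
    """Try each (name, resolver) in priority order; return the first hit."""
    for name, resolve in resolvers:
        result = resolve(target)
        if result is not None:
            return name, result
    return None


if __name__ == "__main__":
    # Hypothetical stand-ins: a LOC record table and a whois-derived location.
    fake_loc = {"www.example.edu": "42 21 54 N 71 06 18 W -24m 30m"}

    def loc_lookup(host: str) -> Optional[LatLon]:
        rdata = fake_loc.get(host)
        return parse_loc_rdata(rdata) if rdata else None

    def whois_lookup(host: str) -> Optional[LatLon]:
        return (32.88, -117.24)  # hypothetical address parsed from whois

    chain = [("dns-loc", loc_lookup), ("whois", whois_lookup)]
    print(locate("www.example.edu", chain))   # answered by the LOC record
    print(locate("host.example.net", chain))  # falls back to whois
```

Ordering the resolvers reflects the preference described above: an administrator-published LOC record is tried first, and less reliable sources serve only as fallbacks.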
There are many possible uses for NetGeo. For example, NetGeo could be used to choose the geographically closest mirror site for a download automatically. ISPs could use geographic information from NetGeo when choosing locations to deploy new infrastructure, governments when setting policy, and businesses when designing regional advertising. NetGeo is currently being used in a graphical traceroute tool and in studies of connectivity and traffic flow between countries.
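As a sketch of the mirror-selection use case, the code below picks the mirror with the smallest great-circle (haversine) distance to the client. It assumes the client's latitude and longitude have already been obtained, for instance from a NetGeo lookup on the client's IP address. The mirror names and coordinates are hypothetical. Geographic distance is only a rough proxy for network distance, since packets rarely follow the great-circle path.

```python
import math
from typing import Dict, Tuple

LatLon = Tuple[float, float]
EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a: LatLon, b: LatLon) -> float:
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    # Clamp to guard against rounding pushing h slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))


def closest_mirror(client: LatLon, mirrors: Dict[str, LatLon]) -> str:
    """Return the name of the mirror geographically nearest to the client."""
    return min(mirrors, key=lambda name: haversine_km(client, mirrors[name]))


if __name__ == "__main__":
    # Hypothetical mirror sites (approximate city coordinates).
    mirrors = {
        "mirror-us-west": (37.77, -122.42),  # San Francisco
        "mirror-eu": (52.37, 4.90),          # Amsterdam
        "mirror-asia": (35.68, 139.69),      # Tokyo
    }
    client = (32.72, -117.16)  # e.g. a location returned for a San Diego client
    print(closest_mirror(client, mirrors))
```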
NetGeo can be accessed through Java and Perl APIs, and interactively through a web interface. The NetGeo back-end consists of a database and a collection of Perl scripts used for heuristic analysis and for parsing addresses from whois records. The NetGeo database caches the geographic information parsed from previous queries, which reduces the load on whois servers and improves performance.
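The caching idea in the preceding paragraph can be sketched as a read-through cache: a query is answered from the database when a fresh entry exists, and the expensive whois-based lookup runs only on a miss. This is a minimal illustration under assumptions, not NetGeo's schema. The `GeoCache` class, the SQLite table layout, the 30-day expiry, and the choice to cache "not found" answers are all hypothetical.

```python
import sqlite3
import time
from typing import Callable, Optional, Tuple

LatLon = Tuple[float, float]


class GeoCache:
    """Read-through cache of location lookups stored in SQLite.
    Schema and TTL are illustrative assumptions, not NetGeo's design."""

    def __init__(self, lookup: Callable[[str], Optional[LatLon]],
                 ttl_seconds: float = 30 * 86400, path: str = ":memory:"):
        self.lookup = lookup  # hypothetical expensive lookup (e.g. whois parsing)
        self.ttl = ttl_seconds
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS geo ("
            " query TEXT PRIMARY KEY, lat REAL, lon REAL, fetched REAL NOT NULL)"
        )

    def get(self, query: str) -> Optional[LatLon]:
        row = self.db.execute(
            "SELECT lat, lon, fetched FROM geo WHERE query = ?", (query,)
        ).fetchone()
        now = time.time()
        if row is not None and now - row[2] < self.ttl:
            # Fresh hit; a NULL latitude records that an earlier lookup found nothing.
            return None if row[0] is None else (row[0], row[1])
        # Miss or stale entry: perform the lookup and store the result.
        result = self.lookup(query)
        lat, lon = result if result is not None else (None, None)
        self.db.execute(
            "INSERT OR REPLACE INTO geo (query, lat, lon, fetched) VALUES (?, ?, ?, ?)",
            (query, lat, lon, now),
        )
        self.db.commit()
        return result
```

Caching negative answers as well as positive ones keeps repeated queries for unlocatable addresses from reaching the whois servers. The expiry lets entries be refreshed as registration data changes.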
Prior to the development of NetGeo, Internet geographic information was not easily available. We expect there will be many uses for this information once researchers become aware of its availability.