Access to Trace Files

As of 1-April-2001, IRCache has changed its policy for giving access to the sanitized log files. Commercial users of the data are expected to pay for access in order to help support the project. Academic users will be given access at no cost.

All users are required to receive a username and password so that we can enforce this policy.

Research/academic users must answer the following questions when requesting access to the trace files:

  1. Please provide a 1-2 paragraph description of your research.
  2. For how long do you need access to the log files? When do you anticipate completing your work?
  3. If you have a web page that describes your work, please provide a URL.
  4. Do you plan to publish your results? If so, please provide publication information if possible.

Research/academic users who have published papers that use the IRCache log files should notify us and provide citation details. This helps us justify continuation of the project.

Please write to wessels at ircache.net regarding access to trace files.

Terms Of Use

  1. IRCache provides sanitized cache access logs in the hope they may be useful for researchers and developers. If circumstances require, IRCache may cease making data available.
  2. Presently, log files are kept for seven (7) days and then discarded. Older log files are not available.
  3. IRCache takes special steps to protect the privacy of those participating in our cache mesh. Client IP addresses are randomized from day-to-day, but consistent within a single log file. Query arguments of CGI requests are not logged.
  4. IRCache provides this data as-is, with no guarantees regarding accuracy or applicability to specific requirements.
  5. If you use this data for research or commercial purposes, you must give a credit reference to the National Science Foundation (grants NCR-9616602 and NCR-9521745), and the National Laboratory for Applied Network Research.

Data Format

This documentation is taken from the Squid FAQ. You may want to check the FAQ, which may have more up-to-date information than this file.

Each line of the log file contains the following ten (10) fields:

Timestamp

The time when the client socket is closed. The format is "Unix time" (seconds since Jan 1, 1970) with millisecond resolution.
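
As a minimal sketch of reading this field, the snippet below converts a timestamp string to a UTC datetime. The function name parse_timestamp is hypothetical, and the code assumes the field always has the form "seconds.milliseconds" as in the samples on this page. The two parts are parsed as integers to avoid floating-point rounding of the milliseconds.

```python
from datetime import datetime, timedelta, timezone


def parse_timestamp(field: str) -> datetime:
    """Convert a 'seconds.milliseconds' field to a timezone-aware UTC datetime."""
    seconds, _, fraction = field.partition(".")
    # Pad or trim the fractional part to exactly three digits (milliseconds).
    millis = int((fraction + "000")[:3])
    return datetime.fromtimestamp(int(seconds), tz=timezone.utc) + timedelta(
        milliseconds=millis
    )


# Timestamp taken from the first sample line on this page.
print(parse_timestamp("1020816229.231").isoformat())
# prints 2002-05-08T00:03:49.231000+00:00
```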

Elapsed Time

The elapsed time of the request, in milliseconds. This is the time between the accept() and close() of the client socket. For persistent HTTP connections, this is the time between reading the first byte of the request and writing the last byte of the reply.
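
Because the timestamp marks the end of a request and the elapsed time gives its duration, an approximate start time can be derived by subtraction. The sketch below is only an illustration under that assumption; the helper names to_millis and request_start_millis are hypothetical, and for persistent connections the result should be treated as a rough estimate.

```python
def to_millis(timestamp_field: str) -> int:
    """Return a 'seconds.milliseconds' timestamp as integer milliseconds."""
    seconds, _, fraction = timestamp_field.partition(".")
    return int(seconds) * 1000 + int((fraction + "000")[:3])


def request_start_millis(timestamp_field: str, elapsed_field: str) -> int:
    """Estimate when the request began: logged end time minus elapsed time."""
    return to_millis(timestamp_field) - int(elapsed_field)


# Values from the first sample line: ended at 1020816229.231 after 516 ms.
print(request_start_millis("1020816229.231", "516"))
# prints 1020816228715
```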

Client Address

A random IP address identifying the client. The client-to-address mapping stays the same for all requests in a single log file. The mapping is not the same between log files.

Log Tag and HTTP Code

The Log Tag describes how the request was treated locally (hit, miss, etc). All the tags are described in the Squid FAQ. The HTTP status code is the reply code taken from the first line of the HTTP reply header. Non-HTTP requests may have zero reply codes.
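
The combined field can be split at the slash, as in the sketch below. The function names are hypothetical. The looks_like_hit check is a simple heuristic (any tag containing "HIT"), not a definition taken from the Squid FAQ, so consult the FAQ's tag descriptions before relying on it.

```python
from typing import Tuple


def split_result(field: str) -> Tuple[str, int]:
    """Split a 'LOG_TAG/code' field into the tag and the numeric status code."""
    tag, _, code = field.partition("/")
    return tag, int(code)


def looks_like_hit(tag: str) -> bool:
    """Heuristic assumption: treat any tag containing 'HIT' as a cache hit."""
    return "HIT" in tag


tag, code = split_result("TCP_REFRESH_HIT/304")  # taken from a sample line
print(tag, code, looks_like_hit(tag))
# prints TCP_REFRESH_HIT 304 True
```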

Size

The number of bytes written to the client.

Request Method

The HTTP request method.

URL

The requested URL. CGI query arguments (anything following a '?') are not logged.

User Ident

Always '-' for the IRCache logs.

Hierarchy Data and Hostname

A description of how and where the requested object was fetched, in the form code/hostname. See hierarchy codes in the Squid FAQ.
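
A small sketch of separating the two parts of this field follows. The function name split_hierarchy is hypothetical, and treating a '-' hostname as "no host" is an assumption based on the NONE/- entries in the samples on this page.

```python
from typing import Optional, Tuple


def split_hierarchy(field: str) -> Tuple[str, Optional[str]]:
    """Split 'CODE/host' into the hierarchy code and the host (None if absent)."""
    code, _, host = field.partition("/")
    # Assumption: '-' (or nothing after the slash) means no host was contacted.
    return code, (None if host in ("", "-") else host)


print(split_hierarchy("DIRECT/216.122.237.6"))
# prints ('DIRECT', '216.122.237.6')
print(split_hierarchy("NONE/-"))
# prints ('NONE', None)
```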

Content Type

The Content-type field from the HTTP reply.
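
Putting the ten fields together, one possible way to parse a log line is sketched below. The LogEntry type and parse_line function are hypothetical names, not part of Squid or IRCache. The code assumes fields are separated by whitespace and that URLs contain no unescaped spaces, which holds for the samples on this page but is not guaranteed here.

```python
from typing import NamedTuple


class LogEntry(NamedTuple):
    timestamp: str      # kept as text: "seconds.milliseconds"
    elapsed_ms: int
    client: str         # randomized client address
    log_tag: str
    http_code: int
    size: int           # bytes written to the client
    method: str
    url: str
    ident: str          # always '-' in the IRCache logs
    hierarchy: str
    peer_host: str      # '-' when no host is recorded
    content_type: str   # '-' when absent


def parse_line(line: str) -> LogEntry:
    """Parse one log line into a LogEntry, assuming ten whitespace-separated fields."""
    fields = line.split()
    if len(fields) != 10:
        raise ValueError(f"expected 10 fields, got {len(fields)}")
    ts, elapsed, client, result, size, method, url, ident, hier, ctype = fields
    tag, _, code = result.partition("/")
    hier_code, _, host = hier.partition("/")
    return LogEntry(ts, int(elapsed), client, tag, int(code), int(size),
                    method, url, ident, hier_code, host, ctype)


entry = parse_line(
    "1020816668.215 6 61.87.2.67 TCP_IMS_HIT/304 267 GET "
    "http://www.honda.com/images/1.gif - NONE/- image/gif"
)
print(entry.log_tag, entry.http_code, entry.size, entry.content_type)
# prints TCP_IMS_HIT 304 267 image/gif
```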

Number of Requests in Trace Files

Samples

1020816229.231 516 61.87.2.67 TCP_MISS/304 333 GET http://www.creationent.com/pics/home_icons/new_sidebar.jpg - DIRECT/216.122.237.6 -
1020816267.836 193 61.87.2.67 TCP_MISS/302 644 GET http://home-l3.tiscali.nl/%7Eti017329/honeyz01.jpg - DIRECT/195.241.76.80 text/html
1020816304.598 55 226.90.141.125 TCP_REFRESH_HIT/304 203 GET http://ar.atwola.com/content/B0/0/H7pTL2Luf0_kw3xmlj8W1sns8a9RRNke8_SAqLzKBa609jmULHVa8jgFKtiL69KXCWvLTQ4eKHG6BVFfpwz9J2_nwVlARAAN-pkCJqF1Tww$/aol - DIRECT/152.163.226.185 -
1020816320.249 130 134.202.51.180 TCP_REFRESH_MISS/200 2226 GET http://disney.go.com/globalmedia/pardonourdust/background.gif - DIRECT/63.70.47.83 image/gif
1020816488.105 36 191.212.159.184 TCP_CLIENT_REFRESH_MISS/304 297 GET http://www.dailyjolt.com/images/usericons/usericon_pippi.gif - DIRECT/66.70.39.30 -
1020816531.633 293 134.202.51.180 TCP_MISS/304 261 GET http://www.traveldocs.com/images/nav_over-1x5.gif - DIRECT/63.148.100.225 -
1020816532.277 75 61.87.2.67 TCP_REFRESH_HIT/304 254 GET http://dl.www.juno.com/images/online_registration/arrow.gif - DIRECT/64.136.25.24 -
1020816537.022 239539 134.202.51.180 TCP_MISS/504 1130 GET http://www.hot.ee/avznpwzx/link.html - NONE/- -
1020816610.488 10 134.202.51.180 TCP_MEM_HIT/200 6166 GET http://carnetmadridista.realmadrid.com/rmadridEs/web/img/top/cabecero/logo_madrid.gif - NONE/- image/gif
1020816668.215 6 61.87.2.67 TCP_IMS_HIT/304 267 GET http://www.honda.com/images/1.gif - NONE/- image/gif
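
As an illustration of working with lines like these, the sketch below tallies log tags and bytes per client for four of the samples above, copied verbatim. The summarize function is a hypothetical helper. Since client addresses are only consistent within a single log file, per-client totals such as these are only meaningful within one file.

```python
from collections import Counter

SAMPLE_LINES = [
    "1020816531.633 293 134.202.51.180 TCP_MISS/304 261 GET http://www.traveldocs.com/images/nav_over-1x5.gif - DIRECT/63.148.100.225 -",
    "1020816532.277 75 61.87.2.67 TCP_REFRESH_HIT/304 254 GET http://dl.www.juno.com/images/online_registration/arrow.gif - DIRECT/64.136.25.24 -",
    "1020816537.022 239539 134.202.51.180 TCP_MISS/504 1130 GET http://www.hot.ee/avznpwzx/link.html - NONE/- -",
    "1020816668.215 6 61.87.2.67 TCP_IMS_HIT/304 267 GET http://www.honda.com/images/1.gif - NONE/- image/gif",
]


def summarize(lines):
    """Count requests per log tag and bytes per client address."""
    tags = Counter()
    bytes_by_client = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) != 10:
            continue  # skip lines that do not have the ten expected fields
        tag = fields[3].partition("/")[0]
        tags[tag] += 1
        bytes_by_client[fields[2]] += int(fields[4])
    return tags, bytes_by_client


tags, bytes_by_client = summarize(SAMPLE_LINES)
print(tags["TCP_MISS"], bytes_by_client["134.202.51.180"])
# prints 2 1391
```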

Other Sources for HTTP trace/log files