On the Use And Abuse of Referrer Information

July 11, 2003
by keli

I like browsing the stats for my website. I even take an occasional look at the raw access log.

What I enjoy the most is the list of referring sites. But, especially after migrating my site to a weblog system, I've begun seeing a lot of web clients that abuse the Referer HTTP header by sending URLs of pages that do not link to my site.

OK web client programmers, here's the relevant section about the Referer header taken from RFC-2616:


The Referer[sic] request-header field allows the client to specify, for the server's benefit, the address (URI) of the resource from which the Request-URI was obtained (the "referrer", although the header field is misspelled.)

Note that it does not say "just put any old URL promoting your software..." It also explicitly forbids sending the Referer in requests where the requested URI was not obtained from a source that has a URI of its own:


The Referer field MUST NOT be sent if the Request-URI was obtained from a source that does not have its own URI, such as input from the user keyboard.

I interpret that to include URLs taken from configuration files and bookmarks.
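For client programmers, the rule above can be sketched in a few lines of Python. This is only an illustration of my reading of the RFC, not code from any of the clients mentioned here; the function name `build_headers`, the parameter `source_uri`, and the example User-Agent strings are all made up for the example.

```python
from typing import Dict, Optional


def build_headers(user_agent: str, source_uri: Optional[str] = None) -> Dict[str, str]:
    """Build request headers for fetching a URL.

    source_uri is the URI of the page on which the link was found.
    Pass None when the URL came from somewhere without a URI of its own,
    e.g. typed by the user, a bookmark, or a configuration file.
    """
    # Self-promotion belongs here, in the User-Agent header.
    headers = {"User-Agent": user_agent}

    # Only send Referer when the URL really was obtained from a resource
    # that has its own URI.
    if source_uri is not None:
        headers["Referer"] = source_uri

    return headers
```

A feed reader polling a subscribed feed would call `build_headers("ExampleReader/1.0 (+http://example.com/reader/)")` and send no Referer at all, while a browser following a link found on `http://example.org/links.html` would pass that page as `source_uri`.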

Now for the hall of shame. Here's a list of fake referrals I found in my access log:

  • http://www.technorati.com/
    Technorati is a sort of search engine for weblogs, and their webbot downloads my front page and my RDF feed with their front page as the referrer. (And it looks as if the bot even fakes the User-Agent as Galeon/1.2.6 on Debian Linux when fetching the RDF feed... I can't imagine why that's necessary.)
  • http://ranchero.com/software/netnewswire/
    This is what a program called NetNewsWire Lite puts into the Referer header. NetNewsWire Lite is an otherwise very fine RSS browser for MacOS X. If only they would refrain from advertising the client anywhere other than the User-Agent header.

    This is especially irritating as the client fetches the RSS feed quite regularly, and it quickly fills up a spot on my top 10 referrers list with a bogus URL.

  • http://blo.gs/ping.php
    People who ping blo.gs must be familiar with this one... and no, http://blo.gs/ping.php does not refer to my site... It's a utility to tell blo.gs that a site has been updated... Put it into the User-Agent field!!!

  • http://www.blogosphere.us/about.php
    Blogosphere.us is another weblog search engine... Again, please play by the rules and advertise yourself through the User-Agent header.
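Until the clients behave, the bogus entries can at least be kept out of a top-ten list on the server side. Here is a rough Python sketch of that idea. It assumes the access log is in the common Apache "combined" format (with the referrer and user agent as the last two quoted fields), which may not match your server; the names `BOGUS_REFERRERS`, `LOG_LINE` and `top_referrers` are invented for the example.

```python
import re
from collections import Counter

# The fake referrers listed above.
BOGUS_REFERRERS = {
    "http://www.technorati.com/",
    "http://ranchero.com/software/netnewswire/",
    "http://blo.gs/ping.php",
    "http://www.blogosphere.us/about.php",
}

# Assumed "combined" log format: ... "request" status size "referer" "user-agent"
LOG_LINE = re.compile(r'"[^"]*" \d{3} \S+ "(?P<referer>[^"]*)" "[^"]*"\s*$')


def top_referrers(lines, limit=10):
    """Count referrers in log lines, skipping empty and known-bogus ones."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match is None:
            continue  # not in the assumed format; ignore the line
        referer = match.group("referer")
        if referer in ("", "-") or referer in BOGUS_REFERRERS:
            continue
        counts[referer] += 1
    return counts.most_common(limit)
```

Calling `top_referrers(open("access.log"))` (the file name is just a placeholder) would give a list of `(referrer, hits)` pairs. Of course, a hard-coded blacklist only hides the symptom; it has to be extended by hand every time a new client invents its own referrer.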

Please don't think I'm totally obsessed with standards compliance. But I do think that developers should strive to comply with standards whenever possible within reason, and that knowingly abusing them is almost unforgivable.

... or maybe I'm just bothered because almost no one links to me, allowing these bogus links to get into my top-ten list. Please post your comments below.