Tor simply relays requests like an anonymous transparent HTTP proxy, meaning it does not attach typical proxy headers (such as Via or X-Forwarded-For), or in any other way modify HTTP requests or responses (besides their being "onion routed, encrypted and decrypted" through the Tor network).
As for identifying clients connecting through the Tor network, the easiest way to detect such clients on the web server end is to query the public TorDNSEL service, which publishes a list of Tor exit nodes:
TorDNSEL is an active testing, DNS-based list of Tor exit nodes. Since
Tor supports exit policies, a network service's Tor exit list is a
function of its IP address and port. Unlike with traditional DNSxLs,
services need to provide that information in their queries.
Previous DNSELs scraped Tor's network directory for exit node IP
addresses, but this method fails to list nodes that don't advertise
their exit address in the directory. TorDNSEL actively tests through
these nodes to provide a more accurate list.
This TorDNSEL querying can be automated, e.g. in your web application, and example code in many programming languages can be found on the Internet, including sample code demonstrating how to do it in PHP.
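To give a rough idea of what such a check involves, here is a minimal Python sketch. It rests on assumptions that are not stated above and that you should verify against the current Tor Project documentation: that the query name is built from the reversed client IP, your server's port, and your reversed server IP under the ip-port.exitlist.torproject.org zone, and that a listed exit node resolves to 127.0.0.2. The function names are my own, and the service may have changed since this was written.

```python
import ipaddress
import socket

# Assumptions (verify against current TorDNSEL documentation):
# the DNS zone to query and the address returned for a listed exit node.
TORDNSEL_ZONE = "ip-port.exitlist.torproject.org"
POSITIVE_ANSWER = "127.0.0.2"


def reverse_ipv4(ip):
    """Return the octets of an IPv4 address in reverse order, e.g. 1.2.3.4 -> 4.3.2.1."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    return ".".join(reversed(str(addr).split(".")))


def build_query_name(client_ip, server_ip, server_port, zone=TORDNSEL_ZONE):
    """Build the DNS name to look up for one client/server/port combination."""
    return f"{reverse_ipv4(client_ip)}.{server_port}.{reverse_ipv4(server_ip)}.{zone}"


def is_tor_exit(client_ip, server_ip, server_port, resolve=socket.gethostbyname):
    """Return True if the lookup yields the assumed positive answer."""
    name = build_query_name(client_ip, server_ip, server_port)
    try:
        return resolve(name) == POSITIVE_ANSWER
    except OSError:
        # No such name (or the lookup failed): treat the client as not listed.
        return False
```

The resolve parameter exists only so the lookup can be swapped out, for instance in tests or when you want to use a different resolver library.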
If you're going to implement this Tor checking in your web application, then I recommend you cache query results locally for as long as it's reasonable to expect that the exit nodes haven't changed. That way you don't constantly repeat the same queries and add extra lag to your responses.
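A minimal sketch of such a cache in Python could look like the following. The class name TTLCache, the 30-minute default lifetime, and the in-memory dictionary are all illustrative assumptions; in a real web application you would more likely keep the entries in whatever shared cache or database you already use.

```python
import time


class TTLCache:
    """Remember lookup results per key for a limited time."""

    def __init__(self, lookup, ttl_seconds=1800.0, clock=time.monotonic):
        self._lookup = lookup      # function called on a cache miss
        self._ttl = ttl_seconds    # how long a result stays valid (assumed value)
        self._clock = clock        # injectable clock, handy for testing
        self._entries = {}         # key -> (expires_at, value)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]        # still fresh: skip the lookup
        value = self._lookup(key)  # missing or expired: query again
        self._entries[key] = (now + self._ttl, value)
        return value
```

You would wrap whatever function performs the TorDNSEL query in this cache and call get() with the client IP, so repeated requests from the same address within the lifetime cost no extra DNS round trip.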
Edit to add: One more way to optimize this Tor exit node querying and avoid using TorDNSEL all the time is to do a reverse DNS lookup beforehand and try to match the result against a list of major known Tor exit node hosts. This can actually be quite effective, as a lot of major exit node hosts never change, and they can operate a large number of exit nodes, all using the same or similar rDNS names. For example, you could try matching rDNS names against your list using regular expressions, the SQL LIKE operator, or similar. Some of the known Tor exit node hosts (real examples) will match these names:
tor[0-9].*
tor-exit*
*.torservers.*
*.torland.is
This is the list that I'm using. As you can see, it's far from complete, but it is a start, and you can always add more entries as you detect hosts whose names follow an easily matched pattern. As it is meant merely to optimize querying, it doesn't really need to be complete, but each match will most certainly speed things up. Hope this helps!
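For illustration, the rDNS pre-check could be sketched in Python like this. The regular expressions are my own reading of the four name patterns listed above (which mix wildcard and regex notation), so adjust them if you interpret the patterns differently; the function names are hypothetical as well.

```python
import re
import socket

# The patterns from the list above, rewritten as case-insensitive regular
# expressions. This translation is an assumption about what they mean.
KNOWN_EXIT_HOST_PATTERNS = [
    re.compile(pattern, re.IGNORECASE)
    for pattern in (
        r"^tor[0-9].*",           # tor[0-9].*
        r"^tor-exit.*",           # tor-exit*
        r"^.*\.torservers\..*",   # *.torservers.*
        r"^.*\.torland\.is$",     # *.torland.is
    )
]


def matches_known_exit_host(hostname):
    """Return True if the rDNS name matches one of the known patterns."""
    name = hostname.rstrip(".")  # drop a trailing root dot, if present
    return any(p.match(name) for p in KNOWN_EXIT_HOST_PATTERNS)


def reverse_dns(ip):
    """Return the rDNS name for an IP address, or None if there is none."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None
```

A request handler would call reverse_dns() first, and only fall back to the TorDNSEL query when there is no rDNS name or matches_known_exit_host() returns False.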