TL;DR An attacker cannot see anything past the domain.
Structure of an HTTP request
HTTP works by sending two things to a website: the method, and the headers. The most common methods are GET, POST, and HEAD, which retrieve a page, transfer data, or request only the response headers, respectively. TLS encrypts the entirety of HTTP traffic, including the headers and method. In HTTP, the path in the URL is sent on the first line of the request, next to the method and just before the headers. Take this example, with wget loading the page foo.example.com/some/page.html. This text, as ASCII, is sent to the server:
GET /some/page.html HTTP/1.1
User-Agent: Wget/1.19.1 (linux-gnu)
Accept: */*
Accept-Encoding: identity
Host: foo.example.com
The server will then respond with an HTTP status code, some headers of its own, and optionally some data (such as HTML). An example, giving a 301 redirect and some plain text as a response, may be:
HTTP/1.1 301 Moved Permanently
Date: Wed, 27 Dec 2017 04:42:54 GMT
Server: Apache
Location: https://bar.example.com/new/location.html
Content-Length: 56
Content-Type: text/plain
Thank you Mario, but our princess is in another castle!
This tells the client that the correct location is elsewhere.
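As a rough illustration of the exchange above, here is a minimal Python sketch that builds the same request text from a URL and picks apart the head of a response. The function names build_get_request and parse_response_head are made up for this example, and the parsing is deliberately simplified (no chunked encoding, no repeated headers), so treat it as a sketch rather than a real HTTP client.

```python
from urllib.parse import urlsplit


def build_get_request(url: str, user_agent: str = "Wget/1.19.1 (linux-gnu)") -> bytes:
    """Return the ASCII bytes a simple client would send for a GET of `url`."""
    parts = urlsplit(url)
    # Only the path (and query, if any) goes on the request line.
    # The fragment is split off by urlsplit and never used here.
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    lines = [
        f"GET {target} HTTP/1.1",
        f"User-Agent: {user_agent}",
        "Accept: */*",
        "Accept-Encoding: identity",
        f"Host: {parts.hostname}",
        "",  # the two empty strings produce the blank line
        "",  # that terminates the header section
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_response_head(raw: bytes):
    """Split a raw response into (status code, reason, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    _version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), reason, headers, body


if __name__ == "__main__":
    print(build_get_request("https://foo.example.com/some/page.html").decode("ascii"))
```

Over HTTPS, every byte this sketch produces would travel inside the TLS session, which is why the request line and headers are not visible on the wire.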
Without TLS, this text is sent as-is to the site over TCP. TLS works on a different layer, between TCP and HTTP, so all of it is encrypted. This includes the page you are accessing with the GET method. Note that, although the Host header is also part of the request and thus encrypted, the hostname can often still be obtained through a reverse DNS (rDNS) lookup on the IP address, or by checking SNI (Server Name Indication), which transmits the domain in plaintext during the TLS handshake.
Structure of a URL
https://foo.example.com/some/page.html#some-fragment
| proto | domain | path | fragment |
- proto - There are only two protocols in common use, HTTP and HTTPS.
- domain - The domain, here foo.example.com (a subdomain of example.com), is detectable with rDNS or SNI.
- path - The path is completely encrypted and can only be read by the target server.
- fragment - The fragment is visible only to the web browser and is not transmitted.
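The same breakdown can be reproduced with Python's standard urllib.parse module. This is only a sketch: the helper name describe_url and the dictionary keys (chosen to match the labels above) are assumptions made for this example.

```python
from urllib.parse import urlsplit


def describe_url(url: str) -> dict:
    """Split a URL into the four parts discussed above."""
    parts = urlsplit(url)
    return {
        "proto": parts.scheme,       # visible: implied by the destination port
        "domain": parts.hostname,    # visible: leaked through SNI or rDNS
        "path": parts.path,          # encrypted by TLS
        "fragment": parts.fragment,  # never leaves the browser
    }


if __name__ == "__main__":
    for name, value in describe_url(
        "https://foo.example.com/some/page.html#some-fragment"
    ).items():
        print(f"{name}: {value}")
```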
What an attacker can see
So what can an attacker see if you make a request over HTTPS? Let's take the previous hypothetical request from the perspective of a passive eavesdropper on the network. If I wanted to know what you are accessing, I have only limited options:
- I see you making a web request encrypted with TLS going to 203.0.113.98.
- I see that the destination port is 443, which I know is used for HTTPS.
- I do an rDNS lookup and see that IP is used for example.com and example.org.
- I look at the SNI record and see you are connecting to foo.example.com.
This is all I could do. I would not be able to see the path you are requesting, or even what method you are using, short of a traffic analysis attack, that is, heuristic analysis based on the sizes of the data being sent and received.
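To make the SNI point concrete, the following Python sketch generates the first TLS handshake message (the ClientHello) in memory, without opening any network connection, and checks whether the hostname appears in it as readable bytes. It uses the standard ssl module's MemoryBIO interface. The helper name client_hello_bytes is made up for this example, and it assumes a stock Python build, which does not encrypt the ClientHello.

```python
import ssl


def client_hello_bytes(hostname: str) -> bytes:
    """Return the bytes a TLS client would send first when connecting to `hostname`."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    incoming = ssl.MemoryBIO()  # data "from the server" (stays empty here)
    outgoing = ssl.MemoryBIO()  # data the client wants to send
    tls = context.wrap_bio(incoming, outgoing, server_hostname=hostname)
    try:
        tls.do_handshake()
    except ssl.SSLWantReadError:
        # Expected: the client has written its ClientHello and is now
        # waiting for a server reply that never arrives in this sketch.
        pass
    return outgoing.read()


if __name__ == "__main__":
    hello = client_hello_bytes("foo.example.com")
    print("hostname readable in ClientHello:", b"foo.example.com" in hello)
```

No HTTP request has been written at this point, so the path and method are not part of these bytes at all. They are only sent once the encrypted session is established.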
An important note about referers on older browsers
Even though HTTPS encrypts the path you are accessing, if you click a hyperlink within that site which goes to an unencrypted page, the full path may be leaked in the Referer header. This is no longer the case for many newer browsers, but older or non-compliant browsers may still behave this way, as will websites which set the HTML5 referrer meta tag to always send the information. An example request sent unencrypted by a client going from https://example.com/private/details.html to http://example.org/public/page.html in such a case would be:
GET /public/page.html HTTP/1.1
Referer: https://example.com/private/details.html
User-Agent: Wget/1.19.1 (linux-gnu)
Accept: */*
Accept-Encoding: identity
Host: example.org
As such, navigating from an HTTPS page to an HTTP page may leak the full URL (excluding the fragment) of the previous page, so keep that in mind.
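The difference between the two behaviors can be sketched in a few lines of Python. This is a simplified model, not how any particular browser implements it: the function name referer_header is invented for this example, and only two of the standard referrer policy values are handled, "no-referrer-when-downgrade" (omit the header when going from HTTPS to HTTP) and "unsafe-url" (always send the full URL).

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def referer_header(from_url: str, to_url: str,
                   policy: str = "no-referrer-when-downgrade") -> Optional[str]:
    """Return the Referer value a client would send, or None if it is omitted."""
    src = urlsplit(from_url)
    dst = urlsplit(to_url)
    # The fragment and any user:password@ prefix are never included.
    host = src.netloc.rpartition("@")[2]
    stripped = urlunsplit((src.scheme, host, src.path, src.query, ""))
    if policy == "unsafe-url":
        return stripped
    if policy == "no-referrer-when-downgrade":
        downgrade = src.scheme == "https" and dst.scheme != "https"
        return None if downgrade else stripped
    raise ValueError(f"policy not modelled in this sketch: {policy}")


if __name__ == "__main__":
    private = "https://example.com/private/details.html#section"
    public = "http://example.org/public/page.html"
    print(referer_header(private, public))                # header omitted
    print(referer_header(private, public, "unsafe-url"))  # full path leaked
```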