
Let’s say I have never connected to the site example.com.

If this site is https and I write https://example.com/supersecretpage will the URL be sent in clear text since it's the first time I connect to the site and therefore the crypto keys were not yet exchanged? If not when does this take place? Could anyone explain the steps when I type that URL?

unor
user104545
  • Worth mentioning that if you're trying to hide a URI or a GET variable, and especially for the sake of security, you're doing security wrong. Just saying. –  Mar 16 '16 at 03:54
  • @TechnikEmpire: that depends upon whether you are trying to prevent others from using the url, or from knowing that **you** used it. Obviously the recipient can know that you used it, but you may not want eavesdroppers to know as well. – jmoreno Mar 16 '16 at 03:58
  • @jmoreno True, makes sense. –  Mar 16 '16 at 04:08
  • @TildalWave: sure they might discover it another way, guess everything should be in clear-text. – jmoreno Mar 16 '16 at 14:23
  • Closer (more specific) dupe of http://security.stackexchange.com/questions/7705/does-ssl-tls-https-hide-the-urls-being-accessed and http://security.stackexchange.com/questions/34794/if-ssl-encrypts-urls-then-how-are-https-messages-routed (which rugk linked) – dave_thompson_085 Mar 17 '16 at 09:46

3 Answers


Short answer: No, the URL is encrypted, but the (sub)domain is sent in plain text. In your case a (passive) attacker knows that you are connecting to example.com, but does not know which specific page you are accessing.

There are three points at which an attacker can learn which site you are accessing (in chronological order):

  1. (Sub)domain in the DNS query
  2. (Sub)domain in the Client Hello (SNI)
  3. (Sub)domain in server certificate

However, the URL itself is sent encrypted via TLS.

For more details read the explanation below.

Explanation

Note: When I write "(sub)domain" I mean both the domain (example.com) and the subdomain (mydomain.example.com). When I write only "domain" I really mean just the domain (example.com) without the subdomain.

Basically what happens is:

  1. You type in https://example.com/supersecretpage.
  2. The browser submits the (sub)domain in a special TLS extension (called SNI)
  3. At some point the browser gets the SSL certificate from the server.
    Depending on the certificate* this also includes the subdomain you're connecting to, but it may also include more than one subdomain and even more than one domain. So in fact the cert of example.com may also include www.example.com, devserver.example.com and how-to-develop-tls.example. These entries are called Subject Alternative Names and the cert is valid for all of them.
  4. After the client has verified the certificate and the client and server have agreed on a cipher, the traffic is encrypted.
  5. After all this has happened, a "usual" HTTP request is sent (over the secured TLS channel). This is the first time in the whole exchange that the full URL appears. The request looks e.g. like this:

    GET /supersecretpage HTTP/1.1
    Host: example.com
    [...]

* If the certificate is a wildcard certificate it does not include the subdomains, but just *.example.com.
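
The split described in the steps above can be sketched in a few lines of Python. This is only an illustration, not how a browser is implemented: the function name `split_visibility` and the dictionary keys are made up for this example, and it simply separates the host (visible via DNS, SNI and the certificate) from the request target (sent only inside the TLS channel).

```python
from urllib.parse import urlsplit


def split_visibility(url: str) -> dict:
    """Separate the parts of an HTTPS URL by who can observe them.

    Hypothetical helper for illustration; names are not from any library.
    """
    parts = urlsplit(url)
    if parts.scheme != "https":
        raise ValueError("this sketch only models https:// URLs")
    # Path plus optional query string: this is what ends up on the
    # HTTP request line ("GET <target> HTTP/1.1").
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return {
        # Observable by a passive attacker: DNS query, SNI, certificate.
        "cleartext_host": parts.hostname,
        # Only sent after the TLS handshake, inside the encrypted channel.
        "encrypted_target": target,
    }


info = split_visibility("https://example.com/supersecretpage")
print(info)
```

For the URL from the question, the host `example.com` lands in the cleartext bucket and `/supersecretpage` in the encrypted one.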

Another thing is worth mentioning: before the connection can be established at all, the client needs to resolve the DNS name. To do this it sends unencrypted DNS queries to a DNS server. These queries also contain the (sub)domain, so an attacker can use them to see which domains are visited.
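
To see why the (sub)domain is readable in a classic DNS query, here is a rough sketch that builds the bytes of a minimal query by hand. It assumes plain DNS over UDP (no DNS-over-TLS or DNS-over-HTTPS); the function name `build_dns_query` and the fixed query ID `0x1234` are made up for the example, and nothing is actually sent over the network.

```python
import struct


def build_dns_query(hostname: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query for an A record in the classic wire format.

    Illustrative only: real resolvers pick a random query ID.
    """
    # Header: ID, flags (recursion desired), 1 question,
    # 0 answer / authority / additional records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: each label is prefixed with its length and the
    # whole name is terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.split(".")
    ) + b"\x00"
    # QTYPE 1 = A record, QCLASS 1 = IN (Internet).
    question = qname + struct.pack("!HH", 1, 1)
    return header + question


packet = build_dns_query("example.com")
print(packet)
```

The labels `example` and `com` appear as plain ASCII in the packet, which is all an eavesdropper needs. The path `/supersecretpage` is never part of a DNS query.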


However, it does not always happen this way, because you assume the user manually types https://example.com/supersecretpage into the URL bar. This is very rare, as most users would rather type example.com/supersecretpage. Another issue is visitors clicking on an insecure link that uses HTTP (in contrast to a secure HTTPS link). Such links could e.g. be old links created when the site did not support HTTPS or did not redirect to HTTPS by default. You ask why this matters?

In such a (usual) case there is no https:// in the URL. When no protocol is entered into the URL bar, all browsers internally "convert" this URL to http://example.com/supersecretpage (note the http:// there), as they cannot expect the server to support HTTPS. This means the browser first tries to connect to the website using insecure HTTP, and only after the website sends a (301) redirect to the HTTPS URL does it use the secure mode.
In this case the attacker can see the full URL in the unencrypted HTTP request.

You can easily test this yourself by looking at the "network panel" of your browser. There you should see this "HTTPS upgrade":
[Screenshot: HTTPS connection upgrade in Firefox]

However, note that there are techniques to prevent this insecure first HTTP request. The most notable are HSTS and, to protect even the first connection you make to the site, HSTS preloading. HTTPS Everywhere also helps against such attacks. FYI: According to Netcraft, 95% of all HTTPS servers were vulnerable to such (SSL stripping) attacks as of March 2016.
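
The difference between the plain-HTTP first request and the HSTS case can be expressed as a toy model. This is a deliberately simplified sketch of the behaviour described above, not real browser logic: `first_request_url`, `url_exposed` and the `hsts_hosts` set (standing in for the browser's HSTS or preload list) are all invented for the example.

```python
from urllib.parse import urlsplit


def first_request_url(typed: str, hsts_hosts: frozenset = frozenset()) -> str:
    """Toy model: which URL is requested first for an address-bar entry."""
    # No scheme typed: assume http://, as described in the answer.
    if "://" not in typed:
        typed = "http://" + typed
    parts = urlsplit(typed)
    if parts.scheme == "http" and parts.hostname in hsts_hosts:
        # Known HSTS host: rewritten to HTTPS before anything is sent.
        return "https://" + typed[len("http://"):]
    return typed


def url_exposed(typed: str, hsts_hosts: frozenset = frozenset()) -> bool:
    """True if the first request would carry the full URL in cleartext."""
    return first_request_url(typed, hsts_hosts).startswith("http://")


print(first_request_url("example.com/supersecretpage"))
print(first_request_url("example.com/supersecretpage", frozenset({"example.com"})))
```

Without an HSTS entry the first request goes to http://example.com/supersecretpage and exposes the path. With one, the request is upgraded before it leaves the browser.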

rugk
  • It is also worth noticing that if you are using a proxy script, the URL will be passed to the proxy script which could leak information about the URL through DNS lookups. If your browser is using proxy auto detection the proxy script could be provided by the network you are connected to. – kasperd Mar 16 '16 at 10:18
  • One nitpick: the way you worded the last section kind of implies that links _always_ use HTTP. – David Z Mar 16 '16 at 17:46
  • http://www.securityweek.com/hackers-can-intercept-https-urls-proxy-attacks – Tom Jul 29 '16 at 14:40

No, the URL will not be sent in clear text.

Immediately after the TCP three-way handshake completes, your client initiates TLS negotiation with the server. Only after that negotiation is complete and encryption is in place is the HTTP request sent.
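
The ordering (TCP first, then TLS, then HTTP) is visible in a minimal client built on Python's standard `socket` and `ssl` modules. Treat this as a sketch under assumptions: the helper names `build_request` and `fetch_status_line` are made up, error handling is omitted, and the second function needs network access, so it is only defined here and not called.

```python
import socket
import ssl


def build_request(host: str, path: str) -> bytes:
    """The plain HTTP request that is later written into the TLS channel."""
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n\r\n"
    ).encode("ascii")


def fetch_status_line(host: str, path: str, port: int = 443) -> str:
    """Connect, negotiate TLS, then send the request (requires network)."""
    context = ssl.create_default_context()
    # 1. TCP three-way handshake.
    with socket.create_connection((host, port), timeout=10) as tcp:
        # 2. TLS handshake; server_hostname is what gets sent as SNI
        #    and is also used to verify the certificate.
        with context.wrap_socket(tcp, server_hostname=host) as tls:
            # 3. Only now is the HTTP request, including the path, sent.
            tls.sendall(build_request(host, path))
            return tls.makefile("rb").readline().decode("latin-1").strip()


print(build_request("example.com", "/supersecretpage"))
```

Calling `fetch_status_line("example.com", "/supersecretpage")` would run all three steps against the real server. The path only appears in the bytes passed to `tls.sendall`, i.e. after encryption is in place.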

gowenfawr

Since you have never visited the site www.example.com, your browser needs to resolve the domain name to an IP address. Hence, it sends out a DNS query containing the domain name of the site you are trying to connect to. So a MITM can find out which domain you are trying to connect to, unless you use secure DNS.

  • The DNS query however does only reveal the domain & subdomain, but not the whole URL. And for the DNS query it also does not matter whether you've visited the site before - it is always sent again unless it was cached. – rugk Mar 15 '16 at 21:53
  • Isn't that an integral part of his question whether the domain or sub-domain is revealed or not? He mentions that he has never connected to the site before so I believe that is of more concern to him than the relative path. I don't really mind being downvoted but the upvoted answers do not clarify the security of the DNS query resolution. –  Mar 15 '16 at 21:59
  • At first: I did not downvote you. :) And I e.g. also included the DNS query part in my answer. The issue you're missing is the OP asks whether "**the URL** be sent in clear text". The URL is more than just the (sub)domain. ;) – rugk Mar 15 '16 at 22:01
  • I agree that the URL is more than just the domain/sub-domain but it also includes them. Saying that the URL will not be sent in plaintext is wrong because it is possible for the domain/sub-domain to be sent in clear. –  Mar 15 '16 at 22:06
  • Question title includes "**first connection**", question body includes "**the first time**", and this answer says "**since you never visited**". Altogether this gives a very strong and **false** impression that at subsequent visits the situation is different. – techraf Mar 16 '16 at 00:17
  • Are you serious? The question clearly says the user has never visited the website before which means he wants to know what happens on the first visit to website www.example.com. That includes DNS resolution and that means the correct answer should include securing the domain name as well and I am surprised that it doesn't. –  Mar 16 '16 at 00:24
  • I have given my reasons why this answer is badly phrased and misleading. What made you think I was not serious? – techraf Mar 16 '16 at 00:33
  • Did you bother reading my answer? I have mentioned by using "secure dns" the user can prevent the domain name from being sent in cleartext which makes me wonder if you are not really serious about reading answers. –  Mar 16 '16 at 00:44
  • @Rahul The point is, this doesn't answer the real question. –  Mar 16 '16 at 03:51