This is a follow-up to the original question "Are URLs viewed during HTTPS transactions to one or more websites from a single IP distinguishable?"

D.W. provided an extensive summary of attack vectors against HTTPS/TLS connections.

My question: What are the attack vectors against HTTP/2? Are they still the same as for HTTP/1.x, or are some of them mitigated by HTTP/2 features? And are there new ones?

HTTP/2 allows serving multiple requests over a single connection, so potentially (?) an eavesdropper would no longer be able to extract size and timing information as with HTTPS over HTTP/1.x. Furthermore, since a single connection is used, the number and size of the individual resources (other than the total) should be obscured too?

Existing vectors from HTTPS:

TLS reveals to an eavesdropper the following information:

  • the site that you are contacting
  • the (possibly approximate) length of the rest of the URL
  • the (possibly approximate) length of the HTML of the page you visited (assuming it is not cached)
  • the (possibly approximate) number of other resources (e.g., images, iframes, CSS stylesheets, etc.) on the page that you visited (assuming they are not cached)
  • the time at which each packet is sent and each connection is initiated. (@nealmcb points out that the eavesdropper learns a lot about timing: the exact time each connection was initiated, the duration of the connection, the time each packet was sent and the time the response was sent, the time for the server to respond to each packet, etc.)

All of these leaks seem (?) to rely on the 1:1 relation between an HTTP request and a TLS connection. With that 1:1 relation gone, it should (or might) be harder to extract this information, though perhaps not impossible?

stwissel
  • It's not quite clear what you're asking and the question needs to make sense without referring to another question as pre-requisite reading. Is your question what are potential attack vectors for http2? – iainpb May 30 '17 at 15:09
  • OK. Let me rephrase it then – stwissel May 30 '17 at 15:11
  • Your question is actually fully answered in the question you cite once you understand that HTTPS is HTTP over TLS and HTTPS/2 is HTTP/2 over TLS. All the protection comes from TLS as described in this question and not from HTTP vs. HTTP/2. – Steffen Ullrich May 30 '17 at 16:39
  • @SteffenUllrich My question was (and is): what has changed? HTTP/2 has a few properties that might mitigate the existing attack vectors, e.g. multiple requests are sent over a single connection. Thus possibly (?) the length and timing information might no longer be available, since an attacker wouldn't be able to distinguish whether one or multiple requests are served on one connection – stwissel May 31 '17 at 14:32
  • @stwissel: with the edit it now gets clearer that you ask about using heuristics based on meta data to get the details of the URL even though it is encrypted. – Steffen Ullrich May 31 '17 at 18:55

1 Answer

Your question acknowledges that TLS itself protects the real content of the HTTP request (and thus the full URL) and of the response using encryption. But you also correctly assume that metadata, such as the traffic flow, can be analyzed to heuristically reveal the accessed URL, provided the attacker can access the same URL and get the same content. Thus the question is only relevant for URLs which are used to access publicly available (or at least attacker-available) content.

In this case HTTP/1.x allows flow analysis to narrow down the accessed URL. This means an attacker can crawl a site and build a model which includes, for each URL, the size of the request, the size of the response, and the inter-packet arrival times. Based on this information the attacker can in many cases heuristically detect which of the (known) URLs on a site the victim is accessing.
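To make the idea of such a model concrete, here is a minimal sketch of size-based matching. Everything in it is an assumption for illustration: the `SITE_MODEL` table, its byte counts, the `candidate_urls` helper, and the 2% tolerance are all made up, and a real attack would also have to account for TLS record overhead, compression, and timing.

```python
# Hypothetical fingerprint table an attacker might build by crawling a site:
# URL -> (request size in bytes, response size in bytes). Values are invented.
SITE_MODEL = {
    "/index.html": (420, 15300),
    "/about.html": (421, 8200),
    "/pricing.html": (423, 15900),
}


def candidate_urls(observed_request, observed_response, model, tolerance=0.02):
    """Return the known URLs whose request and response sizes both lie
    within a relative `tolerance` of the observed (encrypted) sizes."""
    matches = []
    for url, (req, resp) in model.items():
        req_ok = abs(observed_request - req) <= tolerance * req
        resp_ok = abs(observed_response - resp) <= tolerance * resp
        if req_ok and resp_ok:
            matches.append(url)
    return sorted(matches)


if __name__ == "__main__":
    # A distinctive response size narrows the guess to a single URL.
    print(candidate_urls(421, 8230, SITE_MODEL))
    # Similar sizes leave more than one candidate.
    print(candidate_urls(422, 15600, SITE_MODEL))
```

The second call shows why this is only a heuristic: pages of similar size stay ambiguous unless further features such as timing are added to the model.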

With HTTP/2 this gets harder because, contrary to HTTP/1.x, the requests are not handled sequentially inside the TCP/TLS connection but in parallel. I think it is still possible to build a model of the flow pattern which occurs when a specific URL is accessed. But building such a model might be harder, since it is no longer enough to get the pattern for a single URL: one needs the pattern that results when a specific URL is accessed and all of its resources are loaded, and this for the various combinations of which resources were already cached and which were not. Still, I think this is only more work and not a qualitatively harder problem.
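The extra work can be sketched as follows, under the simplifying assumption that the attacker only sees the total number of bytes transferred on the multiplexed connection. The `PAGES` table, its sizes, the helper names, and the `slack` value are hypothetical; HTTP/2 framing, header compression, and padding are ignored.

```python
from itertools import combinations

# Hypothetical site model: main document size plus sub-resource sizes in bytes.
PAGES = {
    "/index.html": {"html": 15300,
                    "resources": {"style.css": 4000, "logo.png": 12000}},
    "/about.html": {"html": 8200,
                    "resources": {"style.css": 4000, "team.jpg": 50000}},
}


def possible_totals(page):
    """Map each achievable total transfer size to the resource subsets that
    produce it. A subset is the set of resources fetched (not cached)."""
    names = sorted(page["resources"])
    totals = {}
    for k in range(len(names) + 1):
        for fetched in combinations(names, k):
            total = page["html"] + sum(page["resources"][n] for n in fetched)
            totals.setdefault(total, []).append(fetched)
    return totals


def explain_total(observed_total, pages, slack=200):
    """Return (url, fetched_resources) pairs whose modelled total is within
    `slack` bytes of the total observed on the connection."""
    hits = []
    for url in sorted(pages):
        for total, subsets in sorted(possible_totals(pages[url]).items()):
            if abs(total - observed_total) <= slack:
                hits.extend((url, fetched) for fetched in subsets)
    return hits


if __name__ == "__main__":
    print(explain_total(12250, PAGES))
```

The number of cache states grows as 2^n in the number of sub-resources per page, which is the "more work" mentioned above, but the matching step itself stays the same as in the HTTP/1.x case.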

Steffen Ullrich