58

While downloading a file via a torrent, what will happen if some of the peers send me fake chunks?

Also, can any of the peers send me a whole fake file? For example, if I download a .torrent file which should download a file with hash sum A, and a peer sends me a file with hash sum B, will the torrent client notice that and block it?

psmears
  • 1
    The idea that this is possible was a part of Ross Ulbricht's defence in the silk road trial. They argued that material was planted on his computer while he was torrenting the Colbert Report - https://motherboard.vice.com/en_us/article/78xdva/defense-in-silk-road-trial-lays-out-its-full-alternative-perpetrator-theory – JMK Aug 14 '17 at 12:04
  • 1
    I think it's worth noting that there were organizations that were believed to be polluting trackers by intentionally sharing bad chunks. The idea wasn't to get you to download a malicious file, but simply to make torrenting unpleasant and slow. – Jesse K Aug 14 '17 at 17:20
  • 8
    Most practical attacks will aim at the .torrent file in the first place. It's so much easier to make the user download a torrent distributing malware and such than to corrupt a clean torrent by finding SHA1 collisions. – Dmitry Grigoryev Aug 15 '17 at 12:59

3 Answers

70

Yes, a client will notice and block.

A torrent is divided into pieces, and each piece is divided into chunks.
Every piece has its SHA1 hash included in the .torrent file's metadata.
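
As a rough illustration of the per-piece check, here is a minimal Python sketch. It assumes the metadata stores the piece hashes as one string of concatenated 20-byte SHA1 digests (the `pieces` field). The function names, the sample data and the tiny piece size are made up for the example and are not taken from any real client.

```python
import hashlib

PIECE_LENGTH = 4  # unrealistically small, for illustration; real torrents use far larger pieces


def piece_hashes(data: bytes, piece_length: int) -> bytes:
    """Concatenate the 20-byte SHA1 digest of every piece of `data`."""
    return b"".join(
        hashlib.sha1(data[i:i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    )


def verify_piece(index: int, piece: bytes, pieces_field: bytes) -> bool:
    """Compare a received piece against the digest stored for that piece index."""
    expected = pieces_field[index * 20:(index + 1) * 20]
    return hashlib.sha1(piece).digest() == expected


original = b"hello torrent world"  # hypothetical file content
pieces_field = piece_hashes(original, PIECE_LENGTH)

print(verify_piece(0, original[:PIECE_LENGTH], pieces_field))  # genuine piece
print(verify_piece(0, b"fake", pieces_field))                  # forged piece
```

The genuine piece passes and the forged one fails, which is the signal a client uses to discard the piece and request it again.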

If a peer sends fake or corrupted chunks, this is detected once the whole piece has been received and its hash check fails.
A peer that repeatedly sends bad data will be blocked, but there is some leeway because corruption can occasionally occur naturally in a data transfer.
A good client has heuristics to find out exactly which peer(s) sent bad data: it compares the chunks received in the piece that failed the hash check with the chunks of the same piece after it has been re-downloaded and has passed the hash check.
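
The comparison heuristic could look roughly like the Python sketch below. This is only an assumption about how such bookkeeping might be done, not how any particular client implements it; `blame_peers`, the peer ids and the chunk contents are all hypothetical.

```python
import hashlib
from collections import Counter
from typing import List, Tuple


def digest(chunk: bytes) -> bytes:
    return hashlib.sha1(chunk).digest()


def blame_peers(failed_attempt: List[Tuple[str, bytes]],
                verified_chunks: List[bytes]) -> Counter:
    """Count, per peer, the chunks that differ from the verified re-download.

    failed_attempt: (peer_id, chunk_digest) for each chunk of the piece that
                    failed its hash check, in chunk order.
    verified_chunks: the same chunks after a re-download that passed the check.
    """
    blame = Counter()
    for (peer_id, bad_digest), good_chunk in zip(failed_attempt, verified_chunks):
        if bad_digest != digest(good_chunk):
            blame[peer_id] += 1
    return blame


# Hypothetical piece of three chunks; peer-b sent garbage for the middle one.
failed_attempt = [
    ("peer-a", digest(b"AAAA")),
    ("peer-b", digest(b"XXXX")),
    ("peer-a", digest(b"CCCC")),
]
verified_chunks = [b"AAAA", b"BBBB", b"CCCC"]

print(blame_peers(failed_attempt, verified_chunks))
```

Keeping only digests of the failed attempt, rather than the data itself, is one way to make the comparison cheap. A client would then ban a peer only after its count crosses some threshold, which is where the leeway comes from.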

Encombe
  • 7
    A torrent is built on TCP (or uTP), right? Why would corruption naturally occur in a data transfer? – Oddthinking Aug 14 '17 at 05:12
  • 6
    May want to note that SHA-1 is kinda broken and explain collision vs second preimage etc. – Kevin Aug 14 '17 at 05:17
  • 2
    @Oddthinking TCP has poor checksumming. – user253751 Aug 14 '17 at 06:02
  • 11
    @Oddthinking I simplify a bit in my answer to keep it short. Things that can cause corruption; TCP/UDP weak checksums, client bugs, bad disk, bad memory and bad drivers. – Encombe Aug 14 '17 at 07:14
  • 4
    Unless the SHA1 is broken: https://shattered.io/ – Jakuje Aug 14 '17 at 07:25
  • @Oddthinking Furthermore, TCP has only two, 16 bit packet serial number, thus it is not hopeless to predict and fake it. – peterh Aug 14 '17 at 07:42
  • 1
    Okay. I would find it surprising if a client gave leeway against immediately blocking because it thought the client or the source might have a bad disk, bad memory or bad drivers, or the client might have a bug, or the TCP checksum happened to pass corrupt data or SHA1 was cracked and the source was being attacked. – Oddthinking Aug 14 '17 at 07:51
  • @Oddthinking Your first comment asked a good question. You can read a bit about how uTorrent does it here: http://www.netcheif.com/Articles/uTorrent/html/AppendixA_02_12.html **bt.ban_*** – Encombe Aug 14 '17 at 08:00
  • 3
    Regarding SHA1 being broken: The shattered.io project had over 100 GPUs calculate for a year to generate their SHA-1 collision. That makes injecting fake data into a torrent download infeasible, unless one has access to a huge amount of processing power or it is a very long-living torrent. – Philipp Aug 14 '17 at 10:35
  • 1
    @Encombe There is some really valuable information in some of the comments here. You might want to edit it into your answer. – Philipp Aug 14 '17 at 10:39
  • While a SHA1 collision is unlikely it is very much possible. So this answer is - in the context of a security request - simply wrong. – TwoThe Aug 14 '17 at 10:51
  • 25
    @TwoThe: The attacker needs more than a SHA1 collision, he needs a second pre-image to attack a specific torrent. SHA-1 is still quite strong against that (although with so many chunks to choose from it's almost like a collision scenario). However, protocols should switch to a better hash function as soon as practical, and no new work should use SHA1. – President James K. Polk Aug 14 '17 at 14:36
  • Comments are not for extended discussion; this conversation has been [moved to chat](http://chat.stackexchange.com/rooms/63915/discussion-on-answer-by-encombe-is-receiving-fake-torrent-data-possible). – Rory Alsop Aug 16 '17 at 10:56
12

Adding onto Encombe's answer as to the likelihood of a fake, or "forgery", happening: it's overwhelmingly unlikely that such a situation could or would occur, though it is possible.

SHA-1 is an older method for file hashing and isn't recommended for new use, but it's arguably fine as it is now.

Scenario 1: A mistake occurs: If another user sends you a bad piece or chunk (perhaps a one-in-a-million error occurs and the data is corrupted in transit), the hashes will not match and the chunk will be rejected. It's almost impossibly unlikely that a chunk would just happen to have the same hash after being corrupted.
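
To make Scenario 1 concrete, here is a small Python sketch. The sample data, the 16 KiB size and the flipped byte position are arbitrary choices for the example. A single flipped bit produces a completely different SHA-1 digest, and for accidental corruption to go unnoticed it would have to land on the same 160-bit value, roughly a 1 in 2**160 chance if the digest behaves like a random function.

```python
import hashlib

piece = bytes(range(256)) * 64   # 16 KiB of sample data standing in for a piece
corrupted = bytearray(piece)
corrupted[1000] ^= 0x01          # flip one bit, as a transmission or disk error might

original_digest = hashlib.sha1(piece).hexdigest()
corrupted_digest = hashlib.sha1(bytes(corrupted)).hexdigest()

print(original_digest)
print(corrupted_digest)
print("match:", original_digest == corrupted_digest)

# Number of possible SHA-1 digests, i.e. the odds against an accidental match
print(f"1 in {2**160:.3e}")
```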

Scenario 2: A malicious user attempts to send you a fake file: Assuming that our attacker(s) know what they're doing, it would still be very unlikely. Imagine the difficulty of even computing a fake file that has the same SHA1 hash, and the hashes of your fake file's chunks would also need to match the hashes of the original file's chunks. I'd argue the amount of time it'd take to author such a file would render it almost impossible, even if you used a weaker hashing method such as MD5, because of the sheer difficulty of producing so many conflicting and overlapping hashes.

You're much more likely to be at risk of someone putting up a "trap" torrent online that's designed to contain a virus or malicious content, but that would have been created by attackers and seeded by them as well (plus anyone who fell for the trap).

Having SHA-256 or greater hashes would be great, but the odds of such an attack ever succeeding are astronomically low.

Jeremy Kato
  • "and having chunks that also match the original chunks' hashes." Why the and? If one can send fake chunks undetectably, that might be enough to perform some sort of DOS attack. – NPSF3000 Aug 15 '17 at 17:40
  • 1
    1) DOS attacks are unrelated - that's the difference between someone sending you a forged letter in the mail and sending you 100,000 letters in the mail. 2) The reason it's so hard is because generating even a single SHA-1 collision takes a ton of time. The .torrent describes how large the file is, what the SHA-1 hash is, and other metadata. Imagine, in order to create a colliding file, you'd need to create one of the exact same length and have the same SHA-1 hash as the original. Now imagine that the forged file's chunks now each have to match the original's? It's basically impossible. – Jeremy Kato Aug 15 '17 at 18:50
  • 1
    That's not even to mention that such an engineered file, at best, would probably be harmless junk data. Creating a file that collides with the original, meets the other matching requirements, and still has a virus in it would be absurdly hard. Attackers could never dream up something so complicated actually working. – Jeremy Kato Aug 15 '17 at 18:51
  • I suggest you read what I said again. – NPSF3000 Aug 15 '17 at 18:57
  • Apologies, did you mean the wording on the sentence that you quoted? I've edited it to make more sense, sorry if it sounded funny. – Jeremy Kato Aug 15 '17 at 19:05
  • *if* a malicious actor could in fact substitute one chunk for another, then a malicious actor could use that to hinder the spread of the torrent. There is no need for that chunk to be valid data (let alone a virus) nor for the checksum of the file to be correct - that's simply overcomplicating the picture. – NPSF3000 Aug 15 '17 at 19:46
  • 1
    That's true, but after sending too many incorrect chunks the other peers will block you. And this would only slow down the torrent's spread; it wouldn't allow you to spread a completely different file. – Jeremy Kato Aug 15 '17 at 20:00
  • How will they know it's an incorrect chunk if the hash matches? – NPSF3000 Aug 15 '17 at 20:57
  • 1
    Because when you put the file back together, the entire file's hash will mismatch and your torrent program will realize it needs to manually verify the contents. And again, all any of this does is slow down some people's torrents, so it's not only unlikely that someone would have the computational ability to do this, but it's unlikely because they would have essentially no motivation to do this. – Jeremy Kato Aug 16 '17 at 13:21
  • Again, you confuse 'is this a valid attack' with 'is this practical'. Unfortunately, you lack the information and context to answer the second question (as we all do). It's worth noting, though, that it's becoming increasingly viable should someone decide it's to their advantage: https://arstechnica.com/information-technology/2017/02/at-deaths-door-for-years-widely-used-sha1-function-is-now-dead/ – NPSF3000 Aug 16 '17 at 15:04
  • 1
    You want to talk practical? Your article claims the cost to make a single collision happen is $110,000. The attack took over 9 quintillion SHA-1 hashes of the file. You've made all these points to show that you can spend hundreds of thousands or millions of dollars to inconvenience one chunk of someone's torrent. It's theoretically possible but right now you'd be better off pointing a microwave at someone's computer and hoping you cause corrupting interference. – Jeremy Kato Aug 16 '17 at 16:20
  • "You want to talk practical?" No, I explicitly state I feel that's a pointless discussion. "to inconvenience one chunk of someone's torrent" You're only limited by your imagination. – NPSF3000 Aug 16 '17 at 16:48
  • 1
    To quote the best answers in the Security SE: security is meaningless without a threat model. If there's a threat, it's still not apparent what it would be beyond annoyance. ....And you said we don't have the information to answer whether it's practical or not - I pointed out we very much do. It's pretty clear, in fact, if you look at the evidence that you provided. Theoretical security doesn't necessarily mean a lot or else the whole world would communicate on one-time pads. – Jeremy Kato Aug 16 '17 at 16:53
  • " pointed out we very much do. It's pretty clear, in fact, if you look at the evidence that you provided." So you're aware that $100,000 is basically chump change (and drastically decreasing year on year)? You're also aware that this isn't limited to a single download, but potentially could be used to poison any download for a torrent (or similar)? – NPSF3000 Aug 16 '17 at 17:15
  • Cont. Furthermore, you realize that torrents (and other download mechanisms that use SHA1 as a hash) are used in legitimate software, where even a single day of downtime could potentially cause a lot of damage? Imagine if you could take an OS update (esp. a security fix for, say, ransomware) or a Battle.net day-one launch offline? Might that be worth $100,000? What about $10,000 as costs come down? $1,000? How about compromising Facebook's internal update system? – NPSF3000 Aug 16 '17 at 17:16
2

Torrents rely heavily on the SHA-1 hash function: torrents are split into equally sized pieces and each piece's SHA-1 hash is kept in the info section of the torrent, which itself is identified and hence protected by its SHA-1 hash.
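
The two layers of hashing can be sketched in Python as follows. This is a simplified illustration, assuming the info section is a bencoded dictionary whose SHA-1 digest identifies the torrent. The tiny `bencode` helper, the file name and the sample data are written for this example only and are not taken from any library.

```python
import hashlib


def bencode(value) -> bytes:
    """Minimal bencode encoder covering ints, byte/text strings, lists and dicts."""
    if isinstance(value, int):
        return b"i" + str(value).encode() + b"e"
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return str(len(value)).encode() + b":" + value
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings and must appear in sorted order.
        items = sorted(
            ((k.encode("utf-8") if isinstance(k, str) else k, v) for k, v in value.items()),
            key=lambda kv: kv[0],
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def make_info(data: bytes, piece_length: int) -> dict:
    """Build a single-file info dictionary with per-piece SHA-1 digests."""
    pieces = b"".join(
        hashlib.sha1(data[i:i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    )
    return {"name": "example.bin", "length": len(data),
            "piece length": piece_length, "pieces": pieces}


info = make_info(b"hello torrent world", 4)
info_hash = hashlib.sha1(bencode(info)).hexdigest()

# Swapping in hashes for different content changes the identifier of the torrent.
tampered = make_info(b"hello torrent w0rld", 4)
tampered_hash = hashlib.sha1(bencode(tampered)).hexdigest()

print(info_hash)
print(tampered_hash)
```

Because the piece hashes sit inside the hashed info section, someone who alters them ends up with a different torrent identifier rather than a modified copy of the original torrent.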

SHA-1 is showing signs of weakening. For example, early this year, an attack was published that allows a large organisation like a state actor or corporation, or a wealthy individual, to produce two different blocks of data with the same SHA-1 hash. This is not quite the same thing as finding a second block of data that matches the hash of an existing block (a second-preimage attack), though. Mainly for this reason, the attack could for now only be practically used against the hashes of the pieces, and even then it wouldn't be possible to take over a third-party torrent; you'd have to produce your own. For now. It could be that a full attack of that kind isn't far off. Every year the chance that someone secretly has a full attack on SHA-1 increases.
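
The gap between the two kinds of attack can be put in rough numbers. The Python sketch below uses about 2**63 SHA-1 computations for the published collision (the "over 9 quintillion" figure quoted in the comments on another answer) and assumes, hypothetically, that matching an existing block has no shortcut better than brute force over the 160-bit output.

```python
import math

# Work reported for the published collision: roughly 9.2 quintillion SHA-1 computations.
collision_work = 2**63

# Assumption: matching the hash of an existing block (a second preimage) still costs
# about as much as brute force over all 160-bit digests.
second_preimage_work = 2**160

ratio = second_preimage_work // collision_work

print(f"collision:       about 2^{math.log2(collision_work):.0f} hash computations")
print(f"second preimage: about 2^{math.log2(second_preimage_work):.0f} hash computations")
print(f"gap:             about 2^{math.log2(ratio):.0f} times more work")
```

Under that assumption, hijacking an existing piece is about 2^97 times more work than the published collision. Any future cryptanalytic shortcut would shrink this gap.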

To leverage the current attack, a torrent could be produced that serves benevolent content to most people, getting flagged as ‘trusted’ on torrent indexing sites, but a targeted individual might be tricked into downloading a malicious piece of data, like a virus.

The authors of the attack also published a way to harden SHA-1 though. The idea is to detect likely SHA-1 collisions and modify them in such a way that the different blocks of data get a different hash again. Technically this hardened SHA-1 is different from SHA-1, but since as far as we know no full attack of SHA-1 exists yet, for now non-malicious files will in all likelihood have the same hardened and regular SHA-1 hash so software relying on SHA-1 will keep working and only malicious files will have different hashes. But this fix will only help if software uses it and so far the most popular software library for torrenting hasn't fixed its implementation.

Currently, work is being done on a second version of the BitTorrent specification, which will use the more secure SHA-256 hash function. This version is currently in its drafting stage though and has been for a decade. Until it's done and version 1 gets deprecated, this doesn't help anyone.

It's hard to overemphasize how weak SHA-1 is getting. And the published attacks against SHA-1 don't tell the whole story. Despite years of intensive study of SHA-1's predecessor, an attack against it completely unknown to the public did surface in the wild. So the answer is probably YES.

Anonymous