I'm working on a photo sharing app that stores files in the cloud, encrypted, and lets you share those files with others. To do that, as suggested in answers to this previous question:
- files are encrypted locally with a (random) symmetric key, then uploaded
- this first key is then encrypted for each recipient using a shared key, derived from the recipient's public key via a key exchange protocol (Diffie-Hellman). Those encrypted per-recipient keys are stored in the cloud alongside the file.
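For concreteness, the two steps above could be sketched roughly as follows. This is a toy, standard-library-only illustration and not the app's actual code: the tiny Diffie-Hellman group (`P`, `G`), the SHAKE-256 XOR keystream standing in for a real authenticated cipher, the static uploader key pair, and all function names are assumptions made for the example, and none of it is secure as written.

```python
import hashlib
import secrets

# Toy Diffie-Hellman parameters (assumption): far too small for real use.
P = 2**127 - 1
G = 3

def keypair():
    """Return (private, public) for the toy DH group."""
    priv = secrets.randbelow(P - 2) + 1
    return priv, pow(G, priv, P)

def shared_key(my_priv, their_pub):
    """Derive a 32-byte symmetric key from the DH shared secret."""
    secret = pow(their_pub, my_priv, P)
    return hashlib.sha256(secret.to_bytes(16, "big")).digest()

def xor_stream(key, nonce, data):
    """Toy stream cipher (stand-in for a real AEAD): XOR with a SHAKE-256 keystream."""
    keystream = hashlib.shake_256(key + nonce).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, keystream))

def encrypt_for_recipients(plaintext, uploader_priv, recipient_pubs):
    """Encrypt with a random file key, then wrap that key once per recipient."""
    file_key = secrets.token_bytes(32)
    file_nonce = secrets.token_bytes(16)
    ciphertext = xor_stream(file_key, file_nonce, plaintext)
    wrapped = {}
    for name, pub in recipient_pubs.items():
        wrap_nonce = secrets.token_bytes(16)
        wrap_key = shared_key(uploader_priv, pub)
        wrapped[name] = (wrap_nonce, xor_stream(wrap_key, wrap_nonce, file_key))
    return file_nonce, ciphertext, wrapped

def decrypt_as_recipient(file_nonce, ciphertext, wrapped_entry, recipient_priv, uploader_pub):
    """Unwrap the file key with the shared key, then decrypt the file."""
    wrap_nonce, wrapped_key = wrapped_entry
    wrap_key = shared_key(recipient_priv, uploader_pub)
    file_key = xor_stream(wrap_key, wrap_nonce, wrapped_key)
    return xor_stream(file_key, file_nonce, ciphertext)
```

The `wrapped` dictionary plays the role of the encrypted recipient keys stored next to the file.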
Now, so far, I've named stored files after their hash (the plaintext's hash). That lets anyone quickly check whether they've already uploaded/downloaded a file. But it also lets an attacker quickly check whether you've uploaded a specific forbidden/sensitive file.
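A minimal sketch of that naming scheme, assuming SHA-256 as the hash and a plain set of names standing in for the cloud listing (both are assumptions for illustration). The same lookup serves the uploader, the recipient, and an attacker holding a candidate file:

```python
import hashlib

def plaintext_name(file_bytes: bytes) -> str:
    """Storage name derived from the plaintext only (SHA-256 assumed)."""
    return hashlib.sha256(file_bytes).hexdigest()

def already_stored(file_bytes: bytes, stored_names: set) -> bool:
    """Anyone holding the file bytes can run this check, including an attacker."""
    return plaintext_name(file_bytes) in stored_names
```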
Is there a way to name files / choose their location that would solve this problem?
And still satisfy these properties:
- given an uploader's public key, two identical files will upload to the same name/location (deduplication)
- the uploader can quickly find a previously uploaded file (given the bytes of the file) and avoid re-uploading
- any recipient can also quickly find a file they have access to, and quickly check whether they've already downloaded it (to avoid re-downloading)
- given the bytes of a file to check, an attacker cannot quickly check if an uploader has uploaded this specific file
p.s.
I'm working with existing cloud storage, without any centrally managed user database or server hosted by me, if possible. Essentially, I'm treating this cloud storage as completely public storage, accessible by anyone.
Here's what I attempted:
- using the ciphertext's hash as filename (without keeping records of previous uploads)
In that case, since encryption keys are random, the ciphertext hash changes when the same file is encrypted twice. So I lose the ability to quickly check whether a file has already been uploaded (unless I kept track of previous uploads).
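The effect can be reproduced with a small sketch. The cipher here is a toy SHAKE-256 XOR keystream standing in for whatever real cipher the app uses, and `upload_name` is a hypothetical helper; only the standard library is assumed.

```python
import hashlib
import secrets

def toy_encrypt(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher (assumption, not for real use): XOR with a SHAKE-256 keystream."""
    keystream = hashlib.shake_256(key).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, keystream))

def upload_name(file_bytes: bytes) -> str:
    """Encrypt under a fresh random key and name the upload after the ciphertext hash."""
    key = secrets.token_bytes(32)
    ciphertext = toy_encrypt(key, file_bytes)
    return hashlib.sha256(ciphertext).hexdigest()
```

Calling `upload_name` twice on the same non-empty bytes gives two unrelated names, so the name alone cannot tell the uploader that the file is already there.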
Here's what I haven't attempted yet:
- using the ciphertext's hash as filename (and keeping records of previous uploads). Could work, but it looks like the recipients would have to keep records of the downloaded files too (re: "The recipient can quickly check if they've already downloaded a file"). It seems this gets a bit trickier when a user loses these records and needs to rebuild the index: they'd have to re-download every file and match it against local files.
- using the ciphertext's hash as filename, and using deterministic keys instead of random ones (keys could be derived from the file bytes and the uploader's pubkey, for instance). Would this be viable, given that the keys are not random anymore? Also, the recipient would have to keep records of downloaded files, right?
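A minimal sketch of the deterministic-key idea, under stated assumptions: the key is derived with HMAC-SHA-256 keyed by the uploader's public key over the file's hash, the cipher is a toy SHAKE-256 XOR keystream, and the function names are hypothetical. It is one possible reading of "derived from the file bytes and uploader's pubkey", not a vetted construction.

```python
import hashlib
import hmac

def derive_file_key(uploader_pubkey: bytes, file_bytes: bytes) -> bytes:
    """Deterministic key: HMAC-SHA-256 over the file hash, keyed by the public key (assumption)."""
    file_hash = hashlib.sha256(file_bytes).digest()
    return hmac.new(uploader_pubkey, file_hash, hashlib.sha256).digest()

def toy_encrypt(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher (assumption, not for real use): XOR with a SHAKE-256 keystream."""
    keystream = hashlib.shake_256(key).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, keystream))

def deterministic_name(uploader_pubkey: bytes, file_bytes: bytes) -> str:
    """Same uploader key and same file bytes always give the same ciphertext hash."""
    key = derive_file_key(uploader_pubkey, file_bytes)
    return hashlib.sha256(toy_encrypt(key, file_bytes)).hexdigest()
```

Note that every input to `derive_file_key` is either public or the file itself, so this only demonstrates the deduplication side; whether it also hides uploads from someone who already holds the file bytes is part of what I'm asking.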