Please forgive the vagueness of the question title.
I am currently designing an opaque storage service for immutable files. The purpose of the service is simple: storing files and being able to retrieve them by their identifier.
As the service is explicitly designed for storing immutable data, I had the idea of making each file's identifier dependent on the contents of the file itself.
The formula is as follows:
file_id = file-size + SHA-256(file-content)
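With that design, each file gets a deterministic id that is unique for all practical purposes: two different files could only share an id if they had the same size and also collided under SHA-256.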
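For illustration, here is a minimal sketch of how such an identifier could be computed. The function name `compute_file_id`, the `-` separator, and the hex encoding of the digest are assumptions made for this example; the formula above only says that the size and the hash are combined.

```python
import hashlib


def compute_file_id(content: bytes) -> str:
    """Build an identifier from the file size and the SHA-256 of its content.

    Assumption: "+" in the formula means concatenation; the "-" separator
    and hex encoding are hypothetical choices for this sketch.
    """
    digest = hashlib.sha256(content).hexdigest()
    return f"{len(content)}-{digest}"


if __name__ == "__main__":
    # The same bytes always yield the same identifier.
    print(compute_file_id(b"hello world"))
```

Anyone holding the exact bytes of a file can run the same computation, which is what the first attack scenario below relies on.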
I thought about how to protect the data with conventional access control (the usual HTTP Authorization strategies come to mind), but the more I think about it, the more it feels unnecessary.
My rationale is as follows: the valid identifier space is extremely large. So large that the only ways an attacker could fetch a specific file are:
- Knowing the exact file content beforehand (allowing to compute the identifier).
- Gaining access to an existing identifier in some other system.
But...
- The first case is... well, idiotic: if you already have the content of the file, you don't need to access my storage service to fetch it.
- The second case is a security design problem for the other systems: regardless of whether they store the real data or a reference to it, they need to be secure by themselves anyway.
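To put a rough number on "extremely large", here is a back-of-the-envelope sketch of the odds of blindly guessing a valid identifier. It assumes the attacker guesses uniformly random digests, that SHA-256 outputs behave as uniformly distributed, and it ignores the file-size part of the id (which only makes guessing harder). The function name and the example figures are hypothetical, and this says nothing about identifiers leaking through logs, URLs, or other systems.

```python
from fractions import Fraction

DIGEST_BITS = 256  # SHA-256 output size in bits


def guess_success_probability(stored_files: int, guesses: int) -> Fraction:
    """Upper bound (union bound) on the chance that at least one of
    `guesses` random digests matches one of `stored_files` stored ids."""
    space = 2 ** DIGEST_BITS
    return min(Fraction(1), Fraction(stored_files * guesses, space))


if __name__ == "__main__":
    # Hypothetical figures: 2**40 stored files, 2**80 guesses in total.
    p = guess_success_probability(2 ** 40, 2 ** 80)
    print(float(p))
```

Even with those generous figures the bound is 2^-136, so blind enumeration is not the realistic threat here.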
Should I still bother with an additional layer of security on top of my API, or is that design secure enough in and of itself?