Does a hash-based storage cache design require additional access-control?

Question

Please forgive the vagueness of the question title.

I am currently designing an opaque storage service for immutable files. The purpose of the service is simple: storing files and being able to retrieve them by their identifier.

As the service is explicitly designed for storing immutable data, I had the idea of making each file's identifier dependent on the contents of the file itself.

The formula is as follows:

file_id = file-size + SHA-256(file-content)

With that design, it is guaranteed that each file will have a deterministic and unique id.
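As a minimal sketch of that formula in Python: the function name `compute_file_id` and the `<size>-<hex digest>` layout are assumptions for illustration, since the formula above does not say how the two parts are joined.

```python
import hashlib


def compute_file_id(content: bytes) -> str:
    """Build an identifier from the file size and the SHA-256 of the content.

    The '<size>-<hex digest>' layout is an assumed encoding, not a fixed format.
    """
    digest = hashlib.sha256(content).hexdigest()  # 64 hexadecimal characters
    return f"{len(content)}-{digest}"


# The same bytes always map to the same identifier.
file_id = compute_file_id(b"hello")
same_id = compute_file_id(b"hello")
other_id = compute_file_id(b"hello!")
```

Here `file_id` and `same_id` are equal because the identifier depends only on the content, while `other_id` differs in both the size prefix and the digest.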

I thought about how to protect the data with typical access control (typical HTTP Authorization strategies come to mind) but the more I think of it, the more it feels unnecessary.

My rationale is as follows: the valid identifier space is extremely large. So large that the only way an attacker could fetch a specific file is either:

1. Knowing the exact file content beforehand (which allows computing the identifier).
2. Gaining access to an existing identifier in some other system.

But...

1. Is... well, idiotic: if you already have the content of the file, you don't need to access my storage service to fetch it.
2. Is a security design problem for the other systems: regardless of whether they store the real data or a reference to it, they need to be secure by themselves anyway.

Should I still bother with an additional layer of security on top of my API, or is that design secure enough in and of itself?

score 1 · Accepted Answer · answered Feb 11 '22 at 19:57

This depends on the features of your API and on how your storage is intended to be used.

If your API provides a way to figure out the IDs of existing resources (e.g. something like resource listings), then relying only on knowledge of the ID is surely not sufficient.
If you fear that users might just save their IDs somewhere accessible to others and want some additional protection, then relying only on the ID is also not sufficient.
If you want to implement a scheme where users might share resources with others but then revoke access later, then relying only on the ID is also not sufficient.
Otherwise this might just be comparable to the commonly used unique URLs. For more information, see "Is a long, random string in a URL considered adequate protection from unauthorised access?" or "Is random URL token secure enough for file attachments and other user content?".
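To illustrate the revocation case, here is a rough sketch of what an extra access-control layer on top of content-derived IDs might look like. Everything in it is hypothetical: the `FileStore` class, its method names, the in-memory dictionaries, and the use of plain user names in place of real authentication.

```python
import hashlib


class FileStore:
    """Toy in-memory store: knowing the ID alone is not enough to read a file."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}      # file_id -> content
        self._grants: dict[str, set[str]] = {}  # file_id -> users allowed to read

    def put(self, owner: str, content: bytes) -> str:
        # Assumed ID layout: '<size>-<sha256 hex>'.
        file_id = f"{len(content)}-{hashlib.sha256(content).hexdigest()}"
        self._blobs[file_id] = content
        self._grants.setdefault(file_id, set()).add(owner)
        return file_id

    def share(self, file_id: str, user: str) -> None:
        self._grants[file_id].add(user)

    def revoke(self, file_id: str, user: str) -> None:
        self._grants[file_id].discard(user)

    def get(self, user: str, file_id: str) -> bytes:
        # Check the grant list in addition to the ID itself.
        if user not in self._grants.get(file_id, set()):
            raise PermissionError("not authorized for this file")
        return self._blobs[file_id]


store = FileStore()
fid = store.put("alice", b"quarterly report")

store.share(fid, "bob")
shared_copy = store.get("bob", fid)  # allowed while the grant exists

store.revoke(fid, "bob")
try:
    store.get("bob", fid)            # bob still knows the ID
    bob_blocked = False
except PermissionError:
    bob_blocked = True               # but can no longer fetch the file
```

With an ID-only design, the last call would succeed, because an identifier that has been handed out cannot be taken back.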
