First, I have some conceptual questions, and then some more specific questions with regards to implementing HTTPS.
Consider an extremely simple system: two hosts (A and B) talking on a LAN, plus an active MITM (Z) on the same LAN who is present from the very beginning and can intercept and fully modify the first connections between A and B. Is it true that, fundamentally, some piece of data must be shared out-of-band to defeat this attack?
In other words, there is no algorithm, protocol, or mathemagical trick that would allow host A to connect to B for the first time ever, with absolutely no prior knowledge of B, and somehow verify that B is not a MITM, correct? I know this question might sound obvious: if A knows nothing at all about B, then from A's point of view nothing distinguishes B from Z. But I need to hear it from the experts.
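To make the scenario concrete, here is a toy sketch of unauthenticated Diffie-Hellman, a key exchange that can look like exactly that kind of trick. Z simply runs one exchange with A and a separate one with B. The tiny parameters (P = 23, G = 5) and the private exponents are assumptions chosen purely for illustration and are not secure.

```python
# Toy, deliberately insecure Diffie-Hellman parameters (illustration only).
P = 23  # prime modulus
G = 5   # generator


def public(secret: int) -> int:
    """Public value a host sends over the LAN: G^secret mod P."""
    return pow(G, secret, P)


def shared(their_public: int, my_secret: int) -> int:
    """Key derived from whatever public value was actually received."""
    return pow(their_public, my_secret, P)


a, b, z = 6, 15, 13  # private exponents of A, B and the attacker Z

a_pub, b_pub, z_pub = public(a), public(b), public(z)

# Z intercepts both first messages and substitutes its own public value.
key_a = shared(z_pub, a)   # A believes this key is shared with B
key_b = shared(z_pub, b)   # B believes this key is shared with A
key_za = shared(a_pub, z)  # Z's key for the A side
key_zb = shared(b_pub, z)  # Z's key for the B side

print(key_a == key_za, key_b == key_zb, key_a == key_b)  # True True False
```

Nothing in the messages A receives lets it tell Z's public value apart from B's, which is the point the question makes: the math only helps once something about B is known in advance.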
This is exactly how PKI on the internet works, right? A server's certificate is signed with a CA's private key, but the CA's public certificate was effectively shared beforehand, out-of-band: it was already in the browser or OS when you bought the computer or installed from the OS disc.
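As a rough illustration of where those pre-shared CA certificates live in practice, the sketch below uses Python's standard `ssl` module. The function names are hypothetical; the point is only that the default client context trusts whatever CA certificates were installed on the machine before any connection was made.

```python
import ssl


def describe_trust_anchors():
    """Paths where this Python/OpenSSL build looks for pre-installed CA certificates."""
    return ssl.get_default_verify_paths()


def make_client_context() -> ssl.SSLContext:
    # create_default_context() loads the platform's default CA certificates,
    # i.e. trust anchors shipped with the OS / Python / OpenSSL ahead of time.
    # Server certificates must chain to one of them and match the hostname.
    return ssl.create_default_context()


if __name__ == "__main__":
    print(describe_trust_anchors())
    ctx = make_client_context()
    print(ctx.verify_mode, ctx.check_hostname)
```

Passing `cafile=` to `ssl.create_default_context()` instead restricts trust to a specific CA file, which is the same idea with a private, hand-distributed anchor.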
Is it also true that a CA-issued cert, compared with a self-signed cert, only provides a logistical benefit in the form of scalability (compared with a hypothetical world where all parties shared certs securely, out-of-band)? In other words, if every site on the internet used self-signed certs, it could technically still be secure: for example, Amazon could share its certificate with me out-of-band (in a way that let me trust that the cert really was Amazon's), and I could then validate future connections to amazon.com, right? The problem would be having to perform this out-of-band step for every website out there.
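The "validate future connections" step for a self-signed cert usually amounts to pinning its fingerprint. A minimal sketch, assuming the fingerprint was obtained out-of-band; the function names are hypothetical, and `check_server` is only an illustration of fetching a presented certificate with the standard library.

```python
import hashlib
import ssl


def sha256_fingerprint(der_cert: bytes) -> str:
    """Colon-separated uppercase SHA-256 fingerprint of a DER-encoded certificate."""
    digest = hashlib.sha256(der_cert).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def matches_pin(der_cert: bytes, pinned_fingerprint: str) -> bool:
    """True only if the presented cert is exactly the one shared out-of-band."""
    return sha256_fingerprint(der_cert) == pinned_fingerprint.upper()


def check_server(host: str, port: int, pinned_fingerprint: str) -> bool:
    # get_server_certificate() does no CA validation by default, so trust here
    # comes solely from comparing the presented cert with the out-of-band pin.
    pem = ssl.get_server_certificate((host, port))
    return matches_pin(ssl.PEM_cert_to_DER_cert(pem), pinned_fingerprint)
```

In a real client the comparison should be made on the connection actually used for data (for example via `SSLSocket.getpeercert(binary_form=True)`), not on a separate probe connection.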
Now onto a related implementation problem.
Picture a hypothetical self-contained computer system with roughly 100 hosts on a private LAN. One host is the "master" and is installed manually from a disc, drive, or other trusted physical media. The remaining 99 hosts are installed over the LAN via HTTPS. If those 99 hosts have no persistent storage, and install directly into RAM and begin executing, then it must be impossible to design the system so that it avoids the initial MITM possibility described above, right? This is especially so because each of the 99 hosts does a full re-install on every reboot, and so effectively connects for the first time every time.

My reasoning: there is no place or means to store a pre-shared certificate or other validation data that would let the 99 clients verify the SSL certificate presented by the master server. And without an out-of-band channel, they can only accept the certificate over the same connection, which creates the MITM possibility. Additionally, the 99 hosts are automated, so no user can intervene and validate a fingerprint the way SSH lets you. But that would really just be another form of out-of-band communication anyway.
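The situation of the 99 diskless hosts can be modelled with a toy trust-on-first-use (TOFU) store, the SSH-style approach, kept only in RAM. The class name and the byte strings standing in for certificates are hypothetical; this sketches the reasoning above, not any real installer.

```python
import hashlib


class InMemoryTofuStore:
    """Hypothetical trust-on-first-use pin store held only in RAM (lost on reboot)."""

    def __init__(self) -> None:
        self._pins: dict[str, str] = {}

    def check(self, host: str, der_cert: bytes) -> bool:
        fingerprint = hashlib.sha256(der_cert).hexdigest()
        pinned = self._pins.get(host)
        if pinned is None:
            # First contact: nothing to compare against, so accept and remember.
            self._pins[host] = fingerprint
            return True
        return pinned == fingerprint


# One boot: the genuine master answers first, so a later MITM cert is rejected.
boot1 = InMemoryTofuStore()
print(boot1.check("master", b"genuine-master-cert"))  # True
print(boot1.check("master", b"attacker-cert"))        # False

# Reboot: RAM is wiped, so if Z answers first, its certificate is accepted.
boot2 = InMemoryTofuStore()
print(boot2.check("master", b"attacker-cert"))        # True
```

Because the store never survives a reboot, every boot is a "first contact", which is exactly the window the question describes.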