Firstly, some important backstory. I'm producing a corporate version of an IM/VoIP/screen-sharing client based on the open-source Jitsi. We are going to be using a central server and have already confirmed with Wireshark that TLS encryption is working and that the XMPP test messages are not being transmitted in plaintext. However, I have realised that the P2P streams (calls, screen sharing, video) are not protected by this.
We are keen to enhance the security of this system without compromising on its usability, so we looked into Jitsi's support for the ZRTP protocol. However, the SAS (Short Authentication String) check is simply too unwieldy for day-to-day users, who would never actually use it and would find its inclusion in the UI confusing compared to Skype/Google Talk/etc. For this reason we have decided that the SAS would have to be removed from the UI.
I am aware that, without the SAS comparison, this opens us up to a potential relay MITM attack during the initial ZRTP exchange, but realistically I don't think I can do better than this in a user-friendly way that is acceptable to the business.
I can't just restrict access to this tool to a single site: our traffic will be crossing the pond between the UK and US. For this reason we have to assume that some of those intermediate hops are outside of our control.
What I'm asking is: does the ZRTP protocol offer other benefits (closing attack vectors not otherwise discussed)? Or does the TLS handle that? I have confirmed that the Jitsi client encrypts text traffic as per the XMPP spec, but I haven't tested any of the Jingle-based peer-to-peer stuff: is this TLS'd by default? I am reading the specs and I can't tell whether the Jingle P2P connection is wrapped by an external TLS handshake, given that the payload is peer-to-peer RTP that isn't being piped through the server. EDIT: The peer-to-peer Jingle media does not and cannot use the server's TLS; I was rather silly to think it could.
Thanks in advance, and apologies for the acronym-heavy post (as you can tell, I've been knee-deep in protocol specs for a while now and it's not really my area of expertise!).
EDIT2: Jitsi also appears to support SDES, which exchanges the SRTP keys over the existing signalling connection with the server (the one already protected by TLS). I am going to investigate the viability of this technique.
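For anyone following along, here is a rough Python sketch of why SDES stands or falls with the signalling channel. It assumes the SDP form of the key exchange from RFC 4568, where the master key and salt travel base64-encoded in an `a=crypto` line; as far as I can tell Jingle carries the same fields in XML rather than as an SDP line, so treat this as an illustration and not as what Jitsi actually does internally. The function name and the sample key are made up for the example.

```python
import base64

# Master key and salt lengths (in bytes) for two suites defined in RFC 4568.
SUITE_KEY_SALT_BYTES = {
    "AES_CM_128_HMAC_SHA1_80": (16, 14),
    "AES_CM_128_HMAC_SHA1_32": (16, 14),
}


def parse_crypto_attribute(line):
    """Split an SDES a=crypto line into tag, suite, master key and salt.

    Hypothetical helper for illustration; not part of Jitsi or any library.
    """
    prefix = "a=crypto:"
    if not line.startswith(prefix):
        raise ValueError("not an SDES crypto attribute")
    fields = line[len(prefix):].split()
    if len(fields) < 3:
        raise ValueError("expected tag, crypto suite and key parameters")
    tag, suite, key_params = fields[0], fields[1], fields[2]
    if suite not in SUITE_KEY_SALT_BYTES:
        raise ValueError("crypto suite not handled by this sketch")
    method, _, key_info = key_params.partition(":")
    if method != "inline":
        raise ValueError("only the 'inline' key method is handled here")
    # Optional lifetime and MKI fields follow the key, separated by '|'.
    blob = base64.b64decode(key_info.split("|")[0])
    key_len, salt_len = SUITE_KEY_SALT_BYTES[suite]
    if len(blob) != key_len + salt_len:
        raise ValueError("unexpected key material length")
    return {
        "tag": int(tag),
        "suite": suite,
        "master_key": blob[:key_len],
        "master_salt": blob[key_len:],
    }


if __name__ == "__main__":
    # Made-up key material: the bytes 0..29, purely for demonstration.
    sample = "a=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:" + \
        base64.b64encode(bytes(range(30))).decode()
    print(parse_crypto_attribute(sample))
```

The point is that the key is sitting right there in the signalling message: anyone who can read the signalling (including the server itself) can decrypt the media, so SDES is only as strong as the TLS link to the server and our trust in that server.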
EDIT3: The SDES system appears to work. I'd like to see some Wireshark captures to be sure that the clients are doing what they should before this gets deployed, but everything appears to be solid. I've solved my problem, but I'll leave the question open as it's still valid and I'm still interested from an academic standpoint.