Putting findings in an answer instead of comments seems to be the best approach.
As usual, it turns out this has been discussed before. A quick search on CiteSeerX returned 50 papers, though they are not quite up to date:
We present a structured security analysis of the VoIP protocol stack, which consists of signaling (SIP), session description (SDP), key establishment (SDES, MIKEY, and ZRTP) and secure media transport (SRTP) protocols.
Using a combination of manual and tool-supported formal analysis, we uncover several design flaws and attacks, most of which are caused by subtle inconsistencies between the assumptions that protocols at different layers of the VoIP stack make about each other.
The most serious attack is a replay attack on SDES, which causes SRTP to repeat the keystream used for media encryption, thus completely breaking transport-layer security. We also demonstrate a man-in-the-middle attack on ZRTP which disables authentication and allows the attacker to impersonate a ZRTP user and establish a shared key with another user. Finally, we show that the key derivation process used in MIKEY cannot be used to prove security of the derived key in the standard cryptographic model for secure key exchange.
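To illustrate why a repeated keystream is so damaging, here is a minimal sketch. It does not model SDES or SRTP: the keystream generator is a toy stand-in built from SHA-256 (not the AES counter mode SRTP uses by default), and the key, the frames, and all function names are hypothetical. It only shows the general property that when two messages are encrypted with the same keystream, XORing the two ciphertexts cancels the keystream completely.

```python
import hashlib


def toy_keystream(key: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 over key + counter (a stand-in, NOT SRTP's cipher)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings, truncating to the shorter one."""
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Stream-cipher style encryption: plaintext XOR keystream."""
    return xor_bytes(plaintext, toy_keystream(key, len(plaintext)))


# Assumption: a replay has caused two sessions to end up with the same key,
# and therefore the same keystream.
replayed_key = b"same key material in both sessions"
frame_a = b"audio frame from session one"
frame_b = b"AUDIO FRAME FROM SESSION TWO"

cipher_a = encrypt(replayed_key, frame_a)
cipher_b = encrypt(replayed_key, frame_b)

# The keystream cancels out: the attacker learns frame_a XOR frame_b
# without ever knowing the key.
leak = xor_bytes(cipher_a, cipher_b)

# If one plaintext is known or guessable, the other follows directly.
recovered_b = xor_bytes(leak, frame_a)
```

In this toy setting, `leak` equals the XOR of the two plaintexts, and knowing or guessing one frame recovers the other in full. How much an attacker can get out of real media streams depends on how predictable the encoded audio is, which this sketch does not attempt to model.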
We have come to notice that the three key generation protocols ZRTP, SDES and MIKEY are vulnerable to the man-in-the-middle attack. Our analysis suggests that the key management protocols that operate in the media layer are indeed suitable media keying protocols despite their operational differences.
As pointed out by Philippe Lhardy, audio streams in their compressed form give an attacker an opportunity to infer the identity of the speakers, the language being spoken, and a few other details.
Two modes of compression have been analysed in the literature:
I would be grateful for any other ideas and suggestions, especially those related to videoconferencing.
EDIT: A related question: Can voice chat be spied?
EDIT #2: Cryptocat is an implementation of Off-the-record messaging: Flaws in Crypto Cat
Any discussion of VoIP should include this possible requirement.
Following a suggestion from landroni, here's a link to vulnerabilities found in the ZRTPCPP library: http://blog.azimuthsecurity.com/2013/06/attacking-crypto-phones-weaknesses-in.html