So how does this save time? If your interview has some structure (again, structured and semi-structured interviews are the most compatible with this method), you can simply glance at the waveforms and see when you asked a question. If you followed your script, you should have a good idea of where in the timeline each question falls. You should also be able to tell an improvised prompt (e.g., "Tell me more about that.") from a scripted question by the length of the burst in the waveform: prompts show up as short blips, while full questions run longer. Jotting down a few cue points during the interview will help with this too. If you decide in your analysis to focus on specific questions or topics, you can speed things up by transcribing only the responses to those questions.
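If you would rather let the computer do the glancing, here is a rough sketch of the same idea in Python. It assumes the interviewer's track is already loaded as a list of floats between -1 and 1, and it runs on a synthetic track here so it works without any audio file. The function names, the loudness threshold, and the 3-second cutoff between "prompt" and "question" are all hypothetical values picked for illustration, so expect to tune them for real recordings.

```python
import math


def find_speech_segments(samples, sample_rate, frame_s=0.05,
                         threshold=0.02, max_gap_s=0.5):
    """Return (start_s, end_s) spans where the track is louder than `threshold`.

    Loud frames separated by no more than `max_gap_s` seconds of quiet are
    merged, so a brief pause mid-question does not split it in two.
    All default values are assumptions, not recommendations.
    """
    frame_len = max(1, int(sample_rate * frame_s))
    segments = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        # Root-mean-square level of this frame (a simple loudness measure).
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        if rms < threshold:
            continue
        t0 = start / sample_rate
        t1 = (start + len(frame)) / sample_rate
        if segments and t0 - segments[-1][1] <= max_gap_s:
            segments[-1] = (segments[-1][0], t1)  # extend the current span
        else:
            segments.append((t0, t1))             # start a new span
    return segments


def label_segments(segments, prompt_max_s=3.0):
    """Call short bursts 'prompt' and longer ones 'question' (assumed cutoff)."""
    return [(s, e, "prompt" if e - s <= prompt_max_s else "question")
            for s, e in segments]


def tone(seconds, sample_rate):
    """Stand-in for speech: a steady 100 Hz tone at half amplitude."""
    return [0.5 * math.sin(2 * math.pi * 100 * n / sample_rate)
            for n in range(seconds * sample_rate)]


def silence(seconds, sample_rate):
    return [0.0] * (seconds * sample_rate)


# Synthetic interviewer track: a 5 s "question", a long pause while the
# participant answers, then a 1 s "tell me more" style prompt.
sample_rate = 1000
track = (silence(2, sample_rate) + tone(5, sample_rate) +
         silence(10, sample_rate) + tone(1, sample_rate) +
         silence(2, sample_rate))

labelled = label_segments(find_speech_segments(track, sample_rate))
for start_s, end_s, kind in labelled:
    print(f"{start_s:6.1f}s to {end_s:6.1f}s  {kind}")
```

The timestamps this produces are only candidate cue points. Real speech is much messier than a steady tone, so treat the output as a starting list to check against the waveform and your notes, not as a replacement for them.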
The interview doesn't have to be conducted online to get the two separate audio files, but for an in-person interview you need two microphones recording to separate tracks in a DAW. And, to state the obvious, you don't get the video data (unless you go all out and wear a GoPro).