Electric Politics
 
INTERMITTENT NOTES

Podcasting Protocols, Explained

[Podcast screenshot]

It seems easy enough: record a phone conversation, reduce it to .mp3 format, post on the internet. Voilà, a podcast. If only! Several of you have asked what, exactly, goes into making the EP podcast. Everything starts, of course, with trying to figure out who would be a good guest to talk with, and the final podcast product eventually gets posted to EP, which is itself a complex process that includes everything from graphic design to coding. Here, however, I want to explain the basics of the technical side of recording that results in the .mp3 file, from my perspective as EP's audio engineer.

Phone lines make recording difficult, particularly if you want to record each speaker's voice on its own channel, and you generally do want to do that because it makes possible the maximum range of editing options. A simple recorder, however, can't tell the difference between two people's voices; for that you need a specialized bit of gear called a digital hybrid phone tap. The digital hybrid detects slight variations in the line current between "the studio" voice and the "away" voice and separates them without (significant) overlap.

Complicating the process further, "the studio" will sound better on a proper microphone rather than the phone handset. Without going into the complexities of "mix-minus" inputs, basically each person's voice gets sent separately to a recorder, producing a stereo recording with one voice on the left and one on the right, with very, very slight overlap (all digital hybrids make minor mistakes and there's often a bit of bleed-through from the guest's handset, when its microphone picks up faint sounds from its speaker).

So the cabling looks like this: I talk into a mic which is plugged into a pre-amp. The pre-amp goes into the digital hybrid. A phone line also goes into the digital hybrid. Two cables go from the digital hybrid to a mixer (channels A and B), and a pair of headphones goes into the mixer so that I can hear the guest and myself (one in each ear). The mixer goes via firewire into my Macintosh, into a program called Sound Track Pro, which records everything. Identical outputs for channels A and B go via two cables into a small stand-alone digital recorder (if the computer should fail while recording an interview I'll have a more-or-less identical copy with slightly lower fidelity).

I'll have Sound Track Pro open, the pre-amp and mixer turned on, and I'll initiate a call through a regular handset. Once the guest is on the line I switch to digital on the hybrid (it sends a tone pulse down the line to check relative currents so the guest must hold the phone away from their ear for a moment), replace the phone handset in its cradle and put on the headphones. Good to go.

At that point I must check levels so I ask the guest about the weather or whatever and make any necessary adjustment to their channel through a software interface with the mixer. My mic level is well calibrated and I don't need to change it with each interview, though when I record my intro and exit comments on Friday mornings I like to boost the levels a bit as that seems to make things easier.

Before starting the interview I explain to the guest about digital recording on two channels, how that makes editing much simpler, and why, with a digital recording, there's no need to worry about coughs, other sounds, or interruptions, all of which can be edited out. Also that it's a good idea to have a glass of water or beverage of some sort. I have a regular patter for this which, I think, helps put guests at ease. Before we start I also record about 15 seconds of "silence" on the line, which becomes useful later during editing.

One thing I don't do, which I haven't figured out a diplomatic way to do, is to ask the guest to be sure to speak normally and consistently into their phone handset. Every once in a while a guest will think they're just having a regular conversation and start waving the phone around while they talk, which produces wildly varying levels and makes editing a real chore. Most people know better, but some don't. I'd hate to insult anyone's intelligence by mentioning it.

Once a conversation has been recorded, it must be edited in order to sound normal. You might think it would sound just as it did when speaking on the phone, but it doesn't. Digital recordings at high fidelity tend to pick up all kinds of noise from phone conversations that either the interlocutors don't hear or that the ear automatically tunes out (I think mainly the former).

I record at 48 kHz/24 bit, a moderately high fidelity. A one hour recording produces approximately one gigabyte of data. I found early on that if I record at even higher fidelity levels and produce files in excess of two gigabytes, Sound Track Pro will crash or scramble the file. It's easier to keep fidelity within certain bounds than to try to troubleshoot the issue.
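
As a rough check on the one-gigabyte figure, here is a small sketch of the arithmetic. It assumes uncompressed stereo PCM (two channels, as described above) and ignores file headers; the function name is just for illustration.

```python
def pcm_size_bytes(sample_rate_hz, bit_depth, channels, seconds):
    """Size of raw, uncompressed PCM audio in bytes (file headers ignored)."""
    bytes_per_sample = bit_depth // 8
    return sample_rate_hz * bytes_per_sample * channels * seconds

# One hour of stereo audio at 48 kHz / 24 bit.
one_hour = pcm_size_bytes(48_000, 24, 2, 3600)
print(one_hour)  # 1036800000, i.e. roughly one gigabyte
```

Doubling the sample rate doubles the result, which is how an hour-long recording at a higher setting would pass the two-gigabyte mark.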

OK, so we've got a one gigabyte file that needs to sound natural. And here's where the real backbreaking work happens. I do, depending how one counts, eight passes through the file.

(Until I got my new Mac Pro this would take roughly 7-10 hours of editing per hour of recording; based on a couple hours of editing I did yesterday for Friday's show that may now be cut to 2-5 hours — anyhow, I was so surprised at the difference I spent some time thinking about how to account for it. My joy at this discovery explains in part my writing this post!)

The first pass is to go through the guest's channel and drop levels to zero whenever they aren't speaking. Not between words, but between when they talk and when I talk. This removes any possibility of residual line static overlap onto my channel. Each section must be identified, clicked to highlight, and dropped. Lots of clicking.
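
The pass described above is done by hand in the editor. Purely as an illustration, a scripted version might look like the sketch below. It assumes the channel is a list of float samples between -1.0 and 1.0, and the function names, threshold, and minimum length are hypothetical choices, not settings from the actual workflow.

```python
def find_quiet_regions(samples, threshold, min_len):
    """Return [start, end) index pairs where the signal stays below
    `threshold` for at least `min_len` consecutive samples.
    A large min_len skips the short pauses between words."""
    regions = []
    start = None
    for i, s in enumerate(samples):
        if abs(s) < threshold:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                regions.append((start, i))
            start = None
    # Handle a quiet stretch that runs to the end of the recording.
    if start is not None and len(samples) - start >= min_len:
        regions.append((start, len(samples)))
    return regions

def silence_regions(samples, regions):
    """Return a copy of `samples` with every listed region set to zero."""
    out = list(samples)
    for start, end in regions:
        for i in range(start, min(end, len(out))):
            out[i] = 0.0
    return out

guest = [0.5, 0.01, 0.0, 0.01, 0.6, 0.0, 0.7]
quiet = find_quiet_regions(guest, threshold=0.05, min_len=3)
cleaned = silence_regions(guest, quiet)
```

In this toy example only the three-sample stretch is long enough to be zeroed; the single quiet sample near the end is treated as a pause between words and left alone.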

The second pass is the same on my channel. In the graphic for this post I made a screenshot of my having highlighted a section of my channel to reduce to silence in the recording of my conversation with Jim Lobe, which will be this Friday's show. Note that the screenshot is of a very small section of the Sound Track Pro interface, which extends all the way across my 23" monitor. Note also how Jim's channel is grayed out while I perform operations only on my channel.

In the third pass I take the aforementioned recorded "silence" as an ambient noise print. Going through the file again from beginning to end I splice the ambient noise onto the tips of the waveforms between where the guest talks and I talk. The reason for this is that if there weren't any continuity of sound there would be noticeable breaks between speakers. Interestingly, once the ambient noise is added it can't be heard; it's only noticeable through its absence. I discovered this technique on my own, but about a year ago, when talking with an experienced professional sound engineer, I learned that it used to be a routine procedure in broadcast, though it's now rarely done as it's so labor intensive.
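
The idea of filling gaps with recorded ambience can be sketched in a few lines. This is an illustration only, assuming samples are floats in a list and the gaps are already known as index ranges; the function name is hypothetical, and a real editor would also crossfade at the edges.

```python
def fill_with_ambience(samples, regions, ambience):
    """Return a copy of `samples` where each [start, end) region is
    overwritten with the ambience clip, looped if the gap is longer."""
    out = list(samples)
    if not ambience:
        return out
    for start, end in regions:
        for i in range(start, min(end, len(out))):
            out[i] = ambience[(i - start) % len(ambience)]
    return out

channel = [0.5, 0.0, 0.0, 0.0, 0.6]      # a zeroed gap between two speakers
room_tone = [0.01, -0.01]                # stand-in for the recorded "silence"
patched = fill_with_ambience(channel, [(1, 4)], room_tone)
```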

In my fourth pass, and this is vanity on my part, I drop the levels on my channel for breathing. Otherwise I sound like I have asthma (I don't). Probably some combination of cutoff filter and mic technique would take care of this, but I haven't figured it out yet. It's fairly easy to see the offending breathing patterns and I'm quite familiar with what they look like so I rarely even listen anymore to the breathing I'm editing, I just do it by sight.

The fifth pass is to reduce line static on the guest's channel. This is tricky and something of an art form. I must find a passage where they're not breathing but there's static on the line (anything over about half a second is enough), copy the static as a noise print, then experiment with noise reduction levels. Taking all the static out makes a voice sound very unnatural; taking none out makes it difficult to listen to. If for no other reason, this is why recording on two channels is so important. If both voices were on a single channel, I'd have to apply the noise reduction appropriate for the guest to my voice as well, and my voice would sound terrible, since the noise reduction suited to one is not appropriate for the other.
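
To show the trade-off between too much and too little reduction, here is a deliberately crude sketch. It is not the noise reduction an audio editor performs (that works on the frequency content of the noise print); it is a simple amplitude gate with partial attenuation, and the function names and the `amount` parameter are assumptions made for illustration.

```python
import math

def rms(samples):
    """Root-mean-square level of a list of float samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def reduce_noise(samples, noise_print, amount):
    """Attenuate samples at or below the noise floor measured from
    `noise_print`. amount=0.0 changes nothing; amount=1.0 mutes them."""
    floor = rms(noise_print)
    keep = 1.0 - amount
    return [s * keep if abs(s) <= floor else s for s in samples]

static = [0.01, -0.01, 0.01, -0.01]          # stand-in for the copied noise print
guest = [0.5, 0.005, -0.005, -0.4]
softened = reduce_noise(guest, static, amount=0.5)
```

Applying the same `static` print to a different channel with a different noise floor would attenuate the wrong material, which is the point made above about keeping the two voices separate.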

(In-person interviews pose a whole other set of issues, as the channels can't be separated despite being recorded separately, because each mic picks up, at lower levels, the same input as the other.)

In the sixth pass I check how levels A and B compare and will reduce B as necessary. (This is something I'm still working on improving.)

In the seventh pass I listen through the conversation, making cuts as necessary for interruptions or side conversations or whatever. 99% of the time these can be done seamlessly. Very little in general, by the way, gets cut. Probably something less than 30 seconds, on average, per hour of recording.

The eighth pass is mastering. I use a plug-in to Sound Track Pro, called iZotope, which uses a root-mean-square algorithm to normalize levels. To be honest, I haven't quite figured out this software. If I knew what I was doing I'd be able to set it so that the final levels after normalization were what I want. Not being able to do that, I take the normalized file and re-set levels where I think they should be.
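
For readers curious what root-mean-square normalization means in general, here is a generic sketch. It is not the plug-in's actual algorithm, which isn't documented here; the function name and the target level are assumptions, and samples are taken to be floats between -1.0 and 1.0.

```python
import math

def normalize_rms(samples, target_rms):
    """Scale `samples` so their RMS level equals `target_rms`,
    clipping anything that would exceed full scale."""
    current = math.sqrt(sum(s * s for s in samples) / len(samples))
    if current == 0:
        return list(samples)          # pure silence: nothing to scale
    gain = target_rms / current
    return [max(-1.0, min(1.0, s * gain)) for s in samples]

quiet_take = [0.1, -0.1, 0.1, -0.1]
louder = normalize_rms(quiet_take, target_rms=0.2)
```

Because the gain is computed from the average energy rather than the loudest peak, a single loud cough doesn't dictate the level of the whole file, though peaks may clip if the target is set too high.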

Thus the basic file.

Friday morning, early, I record an already prepared script for intro and exit comments. Then, in a project in Sound Track Pro, I put together the EP jingle, my intro and exit comments, and the conversation itself. Lots of tracks, like a layer cake, eventually merged. I save to AIFF and then must save again in .wav format, reducing fidelity to 44.1 kHz so that it's compatible with later reduction to .mp3. At this point I close Sound Track Pro. In iTunes I convert the .wav file to .mp3 and add information that may (or may not) be useful for listeners who file the shows.
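
The step down from 48 kHz to 44.1 kHz is handled by the software, but the arithmetic behind it can be sketched. The resampler below uses plain linear interpolation, which is an assumption made for brevity; real converters use proper filtering, and the function name is hypothetical.

```python
from fractions import Fraction

# Every 160 input samples become 147 output samples.
ratio = Fraction(44_100, 48_000)

def resample_linear(samples, src_rate, dst_rate):
    """Resample by linear interpolation between neighboring samples."""
    if not samples:
        return []
    n_out = len(samples) * dst_rate // src_rate
    out = []
    for j in range(n_out):
        pos = j * src_rate / dst_rate        # position in the source signal
        i = int(pos)
        frac = pos - i
        nxt = samples[min(i + 1, len(samples) - 1)]
        out.append(samples[i] * (1 - frac) + nxt * frac)
    return out

ramp = [n / 160 for n in range(160)]         # 160 samples at 48 kHz
converted = resample_linear(ramp, 48_000, 44_100)
print(ratio, len(converted))                 # 147/160 147
```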

I double-check that I can listen to the .mp3 without any problems (every now and then the conversion process has gotten screwed up, not sure why, but this hasn't happened recently), then upload the file to EP's server in Utah.

If all has gone well I'll publish the podcast at 7:00 a.m.

In a way the goal is to make it seem easy, almost effortless, but of course it isn't!

I've had a couple questions from other podcasters about "how-to's". If anybody has any questions about what I've described above I'd be happy to answer them.




Comments



Sadly all George's hard work is then reduced to glorious lowest of the lo-fi by the two 50 cent (size and cost) speakers in my laptop.



Very interesting explanation. Thanks for sharing it. You really put a lot into these podcasts!
