New app called Descript will allow audio editing directly from text

Andrew Mason, the co-founder and former CEO of Groupon whose most recent company produced audio-guided city walking tours, has unveiled his next startup: Descript.

The app is designed to let audio editors change an audio file by editing its text transcription.

The idea, he says, is to give anyone who edits audio, whether podcasters, journalists, or musicians, the ability to edit single-track audio clips as easily as they would edit words in a word-processing document.

Descript was initially built as an in-house production tool for Detour, Mason’s walking-tour app, and is now being spun out as its own company.

“We’ve just crossed the threshold of where automatic speech recognition is accurate enough that automated transcription services are viable,” Mason said in a recent interview with The Verge, adding that editing audio more easily is the next obvious step after that.

The app relies on text-audio alignment. A text transcription is generated from the audio file, and the app then uses machine learning to match each transcribed word to the corresponding stretch of audio. A time code is assigned to each word, so that when a word is deleted in the text editor, the change is immediately synced with the audio file.
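To make the word-to-time-code idea concrete, here is a minimal sketch in Python. It is not Descript's implementation, which has not been published. It assumes the alignment step has already produced a start and end time for every word, and it treats the audio as a plain list of samples. The names `Word` and `delete_word` are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One transcribed word with its time codes (hypothetical structure)."""
    text: str
    start: float  # seconds from the beginning of the clip
    end: float    # seconds from the beginning of the clip


def delete_word(words, samples, index, sample_rate):
    """Delete words[index] from the transcript and cut the matching audio.

    Returns a new word list and a new sample list; the inputs are not modified.
    """
    target = words[index]

    # Convert the word's time codes into sample positions.
    first = int(round(target.start * sample_rate))
    last = int(round(target.end * sample_rate))

    # Cut the audio span that corresponds to the deleted word.
    new_samples = samples[:first] + samples[last:]

    # Words after the cut now occur earlier, so shift their time codes.
    removed = target.end - target.start
    new_words = [Word(w.text, w.start, w.end) for w in words[:index]]
    new_words += [
        Word(w.text, w.start - removed, w.end - removed)
        for w in words[index + 1:]
    ]
    return new_words, new_samples


# Toy data: a 3-second clip at 10 samples per second (assumed values).
sample_rate = 10
samples = list(range(30))
words = [
    Word("hello", 0.0, 1.0),
    Word("um", 1.0, 1.5),
    Word("world", 1.5, 3.0),
]

# Deleting "um" in the text removes samples 10 through 14 from the audio.
edited_words, edited_samples = delete_word(words, samples, 1, sample_rate)
```

A real editor would also have to smooth the cut points, for example with short crossfades, which this sketch leaves out.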

Mason claimed that Descript gets a “surprising number of edits right the first time” but added that there’s also a waveform editor in the app so that users can continue to tweak the audio file or add light effects as needed. There will be two versions of the app: a standard one that costs $20 a month (with an initial $10-per-month deal), and another that’s free to download but doesn’t offer the text-to-audio edit tools. In the paid version, transcription services will cost $0.07 per minute, while in the free app, transcriptions are $0.15 a minute.

In response to questions about the ethical implications of advanced audio-editing tools, Mason emphasised that the product is mostly for simple audio pick-ups, but said that he and his team are “thinking about it for down the road, to make sure we’re on the right side of this stuff.”

“Even though we don’t intend to be on the vanguard of the fakery, it’s coming one way or the other. But we’ve been through this before,” Mason added. “Basically what’s happened to photos and print before will happen to audio and video, and society adjusts. The credibility of a piece of content comes down to the credibility of the source.”

Via: The Verge

(Image: Descript)