Audio: The next generation

In an object-based world, television sound will not only be more immersive, it could also become personalised and interactive, as Will Strauss discovers.

While I am sure that I am preaching to the choir when I say this, great pictures are nothing without great sound. And never has that been more apparent than with the current move towards Ultra High Definition.

With picture resolution quadrupling, viewers will benefit from greater image detail than they have ever had before, while producers will potentially be offered further creative choices (plus the odd logistical headache). But unless the quality of the audio experience keeps pace, they might as well not bother.

“It is very challenging to unpick the visual experience from the audio,” says Tony Churnside, media technologist at BBC Research and Development (R&D). “One thing depends on the other. If you make the sound worse, it has a negative impact on people’s perception of the picture. In terms of the audience experience, the two things should be tied up.”

So, in this next audio step change, what will sound, well, sound like? In an effort to be more immersive, will it be an extension of the familiar channel-based approach, with complicated 5.1, 7.2, or even 22.2 speaker configurations that make the listener’s living room look like the bridge of the Starship Enterprise? With TV now increasingly being consumed across multiple platforms and devices, that seems unlikely.

“Realistically, not all of the audience can, or wants to, experience TV sound that way,” says Churnside. “It’s no longer right to see this as one-size-fits-all. Now we are looking at the development of a system-agnostic environment or format that is object based.”

How does that work, then? Rather than broadcasting pre-mixed loudspeaker signals, a fixed combination of dialogue, narration, sound effects, music, and background atmospheres, each of those sounds is sent as a separate audio object with associated metadata. The viewing device or system at the other end then reassembles the objects into an output that can differ slightly for each listener, because the metadata can be changed locally.
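To make that concrete, here is a minimal sketch of the idea in Python. The AudioObject fields and the render() helper are invented for illustration, not drawn from any broadcast specification: each object carries its sound plus metadata, and the playback device mixes the objects, honouring any local metadata overrides.

```python
from dataclasses import dataclass

@dataclass
class AudioObject:
    # Illustrative fields only; real systems define far richer metadata.
    name: str        # e.g. "dialogue", "music", "atmosphere"
    samples: list    # the raw audio for this object
    gain: float      # a level hint carried as metadata
    position: tuple  # (x, y, z) placement hint for the renderer

def render(objects, adjust=None):
    """Mix objects into one output, letting local metadata overrides
    (per-object gains) make the result differ from listener to listener."""
    adjust = adjust or {}
    length = max(len(o.samples) for o in objects)
    out = [0.0] * length
    for obj in objects:
        gain = adjust.get(obj.name, obj.gain)  # a local override wins
        for i, s in enumerate(obj.samples):
            out[i] += gain * s
    return out
```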

Building blocks

It works in a similar way to responsive website design, where a set of associated style parameters controls how the content should look depending on the size, shape, and type of browser it is being viewed on. Churnside has a better analogy though: “When you buy a Lego set it comes with a load of bricks and instructions for how you can assemble those bricks,” he explains. “Sometimes those instructions can provide for the creation of more than one thing. That is what we’re doing with TV or radio programmes.”

Whether you prefer the responsive design comparison or the Lego one, the key is that this agnostic approach means the listener gets the best possible audio experience for the situation they are in, whether that is sitting at home in front of a big plasma TV or watching on the move on a tablet computer.

BBC R&D carried out a test to this effect last year. A radio drama, Pinocchio, was rendered in stereo for Radio 4 listeners but in surround sound for those listening online, and both versions came from a single production process.
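As a rough illustration of how a single production can feed two outputs, the sketch below reuses the hypothetical AudioObject above and renders the same objects to either a stereo or a 5.1 layout. The channel mapping is deliberately naive; real renderers apply proper panning laws.

```python
LAYOUTS = {"stereo": 2, "5.1": 6}  # layout name -> channel count

silence = [0.0] * 48000  # one second of placeholder audio at 48 kHz
objects = [
    AudioObject("narration", silence, 1.0, (0.0, 0.0, 0.0)),
    AudioObject("atmosphere", silence, 0.6, (-0.8, 0.0, 0.0)),
]

def render_to_layout(objects, layout):
    channels = LAYOUTS[layout]
    length = max(len(o.samples) for o in objects)
    out = [[0.0] * length for _ in range(channels)]
    for obj in objects:
        # Crudely map the object's left-right position (-1..1) to a channel.
        ch = min(int((obj.position[0] + 1) / 2 * channels), channels - 1)
        for i, s in enumerate(obj.samples):
            out[ch][i] += obj.gain * s
    return out

radio_mix = render_to_layout(objects, "stereo")  # for broadcast listeners
online_mix = render_to_layout(objects, "5.1")    # same objects, richer output
```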

Object-based audio can go further than simply adapting to the end-user device. Using what is termed ‘perceptive media’, where the programme knows something about its audience, content can be tailored to, say, a geographical location. A TV drama, for example, could automatically serve different dialogue feeds depending on the city in which it is being viewed.
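A hypothetical sketch of that step, reusing the objects from the examples above, might have the player simply swap in a dialogue object recorded for the viewer’s city when one exists (pick_dialogue is an invented helper, not a BBC API):

```python
def pick_dialogue(variants, viewer_city, default="generic"):
    """Choose the dialogue object matching the viewer's city, if any."""
    return variants.get(viewer_city, variants[default])

dialogue_variants = {
    "generic": AudioObject("dialogue", silence, 1.0, (0.0, 0.0, 0.0)),
    "Manchester": AudioObject("dialogue", silence, 1.0, (0.0, 0.0, 0.0)),
}
dialogue = pick_dialogue(dialogue_variants, viewer_city="Manchester")
```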

It can afford elements of personalisation and interactivity too. Because an object can be any piece of audio in a programme, listeners could be given the choice to reduce or increase the levels of the commentary or the crowd noise during coverage of a football match, or even choose which set of supporters they hear.
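In the sketches above, that kind of personalisation is nothing more than a set of local metadata overrides handed to the renderer; the object names here are, again, invented for illustration:

```python
match_objects = [
    AudioObject("commentary", silence, 1.0, (0.0, 0.0, 0.0)),
    AudioObject("home_crowd", silence, 0.8, (-1.0, 0.0, 0.0)),
    AudioObject("away_crowd", silence, 0.8, (1.0, 0.0, 0.0)),
]
# Quieter commentary, and only the home supporters are heard:
personal_mix = render(match_objects, adjust={"commentary": 0.5, "away_crowd": 0.0})
```

The Wembley trial described next amounts to the same mechanism applied to the position metadata rather than the gain.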

A BBC R&D test with BBC Radio 5 Live at Wembley Stadium last year, in which the audio was streamed as a set of objects, allowed listeners to effectively ‘change where they sat in the ground’, an experience that Churnside’s research suggested was ‘more like being there’.

It’s certainly an exciting development, but is it realistic, affordable and practical?

“We’re not there yet; other industries are moving in this direction,” says Churnside, citing the emergence of Dolby Atmos in the feature film market, but “there are challenges to be solved in design, production, and distribution, particularly what sits on your set-top box at home”.

There is already support for object-based audio, though. At this year’s NAB, Dolby demonstrated a prototype of its object-based multichannel-mixing approach, while DTS, Fraunhofer, Fairlight, Calrec, and others all unveiled or discussed developments in this field. The EBU is also looking at incorporating object-based representations into the burgeoning BWAV format.

Clearly, further research is required before we can fully understand the impact that object-based sound would have on production, post-production, and broadcast, and what benefit it would provide for the audience. But if BBC R&D has anything to do with it, it will be mere child’s play.

“We don’t want to double the cost of production; the aim is to be able to create the audio bricks once,” says Churnside. “You don’t have to buy a new Lego set if you want to build a new thing. You just re-use the existing bricks to do it. That is what we’re trying to emulate.”