Finding the right balance

New audio technology has the potential to make listening easier and more intelligible, as well as adding to the excitement of the coverage. Kevin Hilton looks at how the market is developing.

Sound can be a strange thing. Something that is too loud or just incessant is annoying, but absolute silence can be very disconcerting. And when one is watching something like sport on TV, the lack of information about what is happening can add to the sense of bewilderment.

As much as some commentators and, even more so, their ‘expert’ sidekicks can make us want to throw the radio or TV set out of the window, they still tell us what we need to know about the action and are a familiar presence. The sound of the fans also helps to create an atmosphere for what could otherwise be a sterile, solitary experience (unless you’re at the pub watching with mates).

Digital broadcasting technology has given viewers and listeners more options in these situations. Red button interactive services have been used to offer a choice of commentary or to have just the crowd effects. German research institute Fraunhofer IIS has taken this further with its Dialogue Enhancement system, which allows people at home to choose whether they have more commentary or more background noise in the mix.

The fundamental aim of Dialogue Enhancement is to help hearing impaired viewers and listeners understand what is being said in programmes more clearly. It works by adding ‘side information’ relating to the individual dialogue and effects/music elements to a main mono, stereo, or 5.1 mix, which is transmitted as a single feed with the parametric additional data. The side information is then decoded by specially equipped TV sets or radio receivers to produce two individual channels: one carrying the dialogue or commentary, the other the crowd atmosphere or sound effects and music.

Sporting arena

Because sport is dominant in broadcasting right now it has been the most frequent test bed for Dialogue Enhancement. The BBC held a trial of the technology during Radio 5 live’s coverage of the 2011 Wimbledon tennis tournament, with Dialogue Enhancement added to the station’s internet transmission. This year, France Télévisions demonstrated the technology during its coverage of the French Open, and Fraunhofer, working with Thomson Video Networks, made a world first at IBC 2013 with a live DVB (digital video broadcasting) chain featuring Dialogue Enhancement.

The BBC has now developed its own approach to the balance between speech and effects as part of an overall research project into new audio formats. This is based on object-based audio, which formed the basis of what is commonly referred to as “the BBC Radio 5 live football experiment”. This took place on 27 May for coverage of the Championship play-off final between Crystal Palace and Watford at Wembley Stadium. The internet stream of the broadcast allowed a set number of listeners – initially 1,000 but this figure was increased to 2,800 as the match went on due to demand – to select not only the balance between commentary and crowd effects but also which end of the stadium the noise of the fans came from. Essentially they could choose to have the Palace supporters louder than those of Watford or the other way round.

All this is described, along with results of a survey conducted after the broadcast, in BBC White Paper 272, Object-based Audio Applied to Football Broadcasts, which was published during November.

To capture the crowd, two spaced pairs of Sennheiser 416 shotgun mics were placed at either end of the pitch. Technologist Anthony Churnside explains that the object-based approach would allow independent control of the level and panning of each mic, giving listeners “2D navigation around the area”, although this capability was not used in this case.

The mic feeds were added to the mono commentary signal; all three live IP streams were controlled through an HTML5 Web Audio JavaScript API, enabling listeners to create their own balance of effects and commentary and choose which end of the stadium was louder than the other.
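To illustrate the idea, here is a minimal sketch of how two listener controls could be turned into gains for three streams. It is not the BBC's code: the trial ran in the browser, while this example uses Python, and the function names, stream names, 0-to-1 control ranges and simple linear gain law are all assumptions made for illustration.

```python
from typing import Dict


def mix_gains(commentary_balance: float, end_balance: float) -> Dict[str, float]:
    """Return a linear gain for each of the three streams.

    Hypothetical controls (not taken from the BBC trial):
      commentary_balance: 0.0 = crowd only, 1.0 = commentary only
      end_balance:        0.0 = end A only, 1.0 = end B only
    """
    for value in (commentary_balance, end_balance):
        if not 0.0 <= value <= 1.0:
            raise ValueError("controls must lie between 0.0 and 1.0")

    # Whatever is not given to the commentary goes to the crowd...
    crowd = 1.0 - commentary_balance
    # ...and the crowd share is then split between the two ends.
    return {
        "commentary": commentary_balance,
        "crowd_end_a": crowd * (1.0 - end_balance),
        "crowd_end_b": crowd * end_balance,
    }


def mix_sample(samples: Dict[str, float], gains: Dict[str, float]) -> float:
    """Mix one sample from each stream into a single output sample."""
    return sum(gains[name] * samples[name] for name in gains)


# Example: an even commentary/crowd balance with both ends equally loud.
gains = mix_gains(0.5, 0.5)
sample = mix_sample(
    {"commentary": 1.0, "crowd_end_a": 0.5, "crowd_end_b": -0.5}, gains
)
```

In a browser the same gains would more likely be applied to one gain node per stream rather than sample by sample, but the arithmetic behind the two sliders is the same.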

At the stadium the mono signals were passed through an A-D converter into a PC running a multichannel mapper and three custom 128kbps AAC codecs, which distributed the streams to BBC R&D’s facility. These were transcoded as both MP3 and Ogg Vorbis streaming formats so people with computers running any of the main operating systems would be able to join in the experiment. Churnside says there are still “some challenges that need to be solved” before the commentary/crowd balance and end selection system can move on but that work continues.

The White Paper claims the trial broadcast was an improvement in “audio quality and clarity” compared to 5 live’s “typical” online output, based on 56kbps G.722 mono coding. The test was not without its problems; the Icecast server had to be restarted after 20 minutes due to the growing number of people connecting to the stream. The Ogg Vorbis system commentary feed also had to be rebooted at the start of the second half, making listeners change their settings.

The survey shows that 57% of the listeners who took part thought the option to alter the commentary/crowd balance gave a “much better” listening experience; 22% said it was “slightly better” and 7% declared it “much worse”. Being able to select which end of the stadium to listen to was deemed “much better” by only 19%; 53% found it “slightly better”, while the “much worse” figure was 7%.

But there could be an unexpected downside of new audio technology for those in the commentary box, as highlighted by one listener response: “Brilliant idea! Now I can reduce the crowd volume and actually hear the mumbled comments of the expert summariser.”