Post masters: Mixing sound for the screen

It used to be so easy, adding music, atmosphere and sound effects to accompany visual media. The sound engineers usually had access to the production relatively late in the process, got a few musicians together, pointed a couple of microphones at them and banged some coconut shells together for the effects. You only had to worry about mono or three-speaker mixes for the cinema, and television’s tiny elliptical speakers were a lost cause anyway.

Today, sound mixers for film need to supply high-quality audio that works on playback systems ranging from the cardboard speakers that seem to be a feature of most flat-screen televisions through to Dolby Atmos-equipped cinema complexes. Those working in television, meanwhile, must meet the demands of an increasingly sonically sophisticated home audience that is buying ‘high-fidelity surround receivers’ and ‘sound bars’ in droves. And that’s before we start to think about the challenges and opportunities brought about by the explosion in new media formats, such as the non-linear nature of computer games and the immersive audio required for virtual reality (VR), 360-degree and augmented reality (AR) applications. So how are those at the coalface approaching the final sound production that’s required to meet the needs of the various media?

Rich Aitken has been working in the music industry for twenty-three years, most of them as a sound mixer whose work straddles film, television and computer games, and he won an Ivor Novello award for Guerrilla Games’ Killzone 2 with the composer Joris de Man. “For film, I mostly deliver 5.1 surround stems to dub stages,” he says. “There is a very definite handover date and generally, I have a near-finished set of film reels to put to the music.” For television, Aitken says that most music mixes he does are in stereo as there seems to be less demand for stems. “Although I do them as a matter of course anyway,” he adds.

Sam Girardin is CEO of GameOn, a Montreal-based game audio production company offering games designers everything from Foley and dialogue to sound effects and music. Its credits include work for Ubisoft and a 61st MPSE Golden Reel Awards nomination for WB Games Montréal’s Batman: Arkham Origins.

Girardin’s background is in music production, but the move to game sound was almost inevitable given the lack of resources in this field when the company was founded in 2002. Even though games are, by their very nature, non-linear, Girardin approaches the creation of the sonic landscape in quite a traditional fashion.

“In an interactive game the flow of each part is linear, so you need to have this kind of mind-set to create convincing dialogue,” he says. “Most games now have a story which branches a lot, so our job as an audio provider is to still think in a linear fashion and mix the dialogue in a similar way to a film.”

Paul Zimmer is a games composer and sound designer at ZAP! (Zimmer Audio Production), whose clients include Electronic Arts (EA) and Disney. “I’m interested in continually developing fresh approaches to composition and sound design,” he says. “Sometimes this can be achieved by pushing technical boundaries, but often it’s as effective to try different compositional approaches and experiment.”

The technology, hardware, and mixing and mastering environments used by games sound designers to create their music scores, Foley and sound effects are similar to those used in most areas of audio production. For example, Girardin uses Avid’s Pro Tools and Aitken works mostly ‘in the box’ for reproducibility (alongside a few choice hardware units), whilst Zimmer is a great fan of virtual instruments, although he believes that more traditional elements are also important when mixing for games. “Over the years I’ve become aware that the most important technology is ensuring that you can trust the sound that you are hearing when mixing and mastering,” he says. “The key to that is good monitor speakers, a well-treated room and acoustic optimisation. How good the final mix ends up sounding all flows from that.”

Girardin mixes in what he calls “a (very nice) living room-type environment” in 5.1 surround, and makes sure that he auditions his work on the same kind of playback systems as will be used by the gamers themselves.

Aitken says that mixing audio for the various media has become remarkably similar. “I mix in a cinema-style calibrated room so I tend to approach those kind of levels,” he says. “This has the wonderful side effect of being pretty much in line with television levels and today’s requirements for Spotify and so on. I send any artist-derived music off for mastering, and I do master soundtracks but they are mostly ones I haven’t mixed.”

Zimmer believes that technology can only take you so far when producing quality audio for games. “It’s important for me to find an interesting angle for every piece I work on – even if it’s not my favourite genre,” he says.

While television and film mixes are usually done to a complete (or near-complete) visual edit, that may not be the case with other media. “With games, I’m rarely given anything to mix to; it’s not a linear medium,” says Aitken. “Sometimes I get some video which helps in the pacing and allows me to see and hear how much space there is for music. It used to be the case that I’d be providing mixable stems or loops as well but this seems to be less of a requirement than it used to be; the interactive element of games has evolved to a much more creative level than simply adding or subtracting layers and it’s focussed much more on how the music is written.”

When it comes to the placement of the audio assets within the game itself, the methodologies and technology differ somewhat from the dubbing of audio in film and television.

Games engines such as Unity Technologies’ Unity enable the sound designer to place the mixed and mastered audio within the game, and help the audio engineer and games designers define how it reacts to the characters and their environments. Girardin uses Audiokinetic Wwise, which he calls the ‘Pro Tools of game audio’. “It’s not an editing system as such, but a huge mixing matrix which allows you to synchronise your animation with your audio and help make sure your ambiences and atmospheres are correctly rendered,” he says.

A new area for composers and sound designers is the emergence of VR, AR and 360-degree technologies that require suitable accompanying ‘three-dimensional sound’. One of the issues faced by audio mixers in this field is that the audio must not only be placed in the soundfield to line up with the audience’s direction of sight, but must also reflect the simulated – and ever-changing – environment being presented to the consumer.
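As a rough illustration of what lining up the soundfield with the listener’s direction of sight involves, the Python sketch below rotates a horizontal first-order ambisonic signal to compensate for head yaw. It is a minimal sketch under stated assumptions, not how Unity, Wwise or any headset SDK implements head tracking: the function names are hypothetical, azimuth and yaw are taken as counter-clockwise-positive degrees with 0 meaning straight ahead, and height (the Z channel) and any simulation of the environment are ignored.

```python
import math


def encode_first_order(sample, azimuth_deg):
    """Encode one mono sample as horizontal first-order ambisonics (W, X, Y).

    Assumption: X points forward, Y points left, and W carries the
    traditional 1/sqrt(2) B-format weighting.
    """
    az = math.radians(azimuth_deg)
    w = sample / math.sqrt(2)
    x = sample * math.cos(az)
    y = sample * math.sin(az)
    return (w, x, y)


def rotate_for_head_yaw(wxy, head_yaw_deg):
    """Counter-rotate the soundfield so sources stay fixed in the virtual world.

    If the head turns left by head_yaw_deg, every source must appear to
    move right by the same angle relative to the listener.
    """
    w, x, y = wxy
    yaw = math.radians(head_yaw_deg)
    x_rot = x * math.cos(yaw) + y * math.sin(yaw)
    y_rot = -x * math.sin(yaw) + y * math.cos(yaw)
    return (w, x_rot, y_rot)  # W is omnidirectional, so it is unchanged


def apparent_azimuth_deg(wxy):
    """Estimate where the source appears to be, from the X and Y channels."""
    _, x, y = wxy
    return math.degrees(math.atan2(y, x))


# Hypothetical example: a source hard left, listener turns to face it.
source_left = encode_first_order(1.0, 90.0)
facing_source = rotate_for_head_yaw(source_left, 90.0)
```

With a source encoded 90 degrees to the listener’s left and the head turned 90 degrees to the left, the rotated signal decodes as straight ahead, which is the behaviour a head-tracked mix needs.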

The ability to produce immersive audio has been with us for some time, whether using binaural or multi-microphone array techniques, but it is the advent of powerful computer control that makes it possible to align these recordings with the head movements of a gamer wearing a VR headset.

Norwich-based Immersive VR, founded by managing director Matthew Martin and technical manager James Burrows, has been producing VR-based material for the likes of Ikea and Yamaha for some time, but the pair soon realised that the technology and expertise could be put to good use in the production of VR-based games.

“Our game, Primal Reign, started off as a fun internal project that quickly escalated into a project for public release,” says Martin. To date, the company has created several VR gaming titles as well as a number of mobile gaming products for various brands. Burrows oversees the audio side of the business, using typical digital audio workstations (DAWs) such as Cockos Reaper and Steinberg’s Nuendo for mixing and mastering.

“My background is in music recording,” he says. “I make sure the audio works in iPhone earbuds, DT150 headphones and NS10 speakers first and then I’ll perform the spatial positioning.” Burrows says that because VR headsets are expensive, it’s likely that the consumer also has a decent set of headphones. “We use binaural and spatial audio design inside the Unity game engine,” he adds. “For a recent project we used rifle mics in an array which we then mixed using Facebook’s 360 spatial audio tools to get a tracked point source audio. It takes your audio assets and puts them out as multi-channel, binaural or ambisonic formats.”

However, those working on mobile-gaming platforms could end up being frustrated in all their efforts as, according to Girardin, the majority of these gamers play with the sound switched off.

In story-driven narratives, the dialogue and performance are all-important. However, the quality of the dialogue in the final mix is only ever going to be as good as that captured on set or in the studio. If you need to do further dialogue replacement (ADR), something that many directors abhor, the performance, sonic consistency and narrative continuity may suffer.

BAFTA- and Oscar-winning production sound mixer Simon Hayes has a long track record in film sound production, and his recent work includes the upcoming Guy Ritchie film Aladdin and JK Rowling’s Fantastic Beasts and Where to Find Them, directed by David Yates.

Hayes works on what is arguably the most important element in creating a compelling narrative in film and television production, on-set dialogue, so he’s understandably concerned as to how the actors’ performances emerge from the mixing process. Hayes says that the actual sound technology used on set hasn’t changed that much over the last five years, but there have been important changes that have helped improve the audio quality that we have come to expect, especially in television productions. “High-end television now has production values as good as any feature film,” he says. “In high-end television production, we now have adequate crew to make sure that the audio quality is as good as any feature film.” What this means in practice is something that Hayes is keen to stress as essential: multi-camera productions generally now have two first assistant ‘sounds’ – formerly boom operators – and a dedicated second assistant covering radio mics and acoustic integrity on set. “That’s the same crew we’d have on a $150m feature film,” he adds.

Hayes believes that a boom-placed microphone will always provide the best sound quality on set, but that for wide shots it has been usual to use radio microphones, as booms would be visible. While these systems have improved in quality over the years, they may sound quite different from the boom microphones and can also be subject to costume-related noise that’s hard to remove. Interestingly, this issue is being tackled not by improvements in audio technology, but by the use of visual effects. The television series House of Cards was created by, amongst others, the director David Fincher who, Hayes says, is notoriously obsessive about the quality of the audio.

“Fincher asked the sound mixer if it would help if booms could be used in the wide shots—where they would usually be visible—as well as in the close up, so that you could get almost perfect, and consistent sound,” says Hayes.

“Fifteen years ago that would mean painting out the booms frame by frame, but now they can use what we call a ‘plate shot’.” Hayes explains that the booms are taken out of the shot after the clapperboard has, well, clapped, to give the visual effects houses a boom-free frame to work with. “The booms are then swung into the edge of the close up within the top half of the wide shot. These can then be matted out of the wide shots simply, cheaply and effectively,” he adds. This method is now in common use across the industry and, in fact, Hayes himself has used the same method, notably on Tom Hooper’s 2012 film Les Misérables.

It appears that the tools and methodologies that sound mixers use to produce the sonic assets for the various media are quite similar. It is in how those assets are then placed in the sound field, whether it be stereo, 5.1, Dolby Atmos or VR-based immersive audio, that differences begin to manifest themselves, with sophisticated new tools being developed to assist engineers in this increasingly complex process. Otherwise, the ‘traditional’ methods of capturing the performance, using the correct recording and mixing spaces, applying care and attention to the final mixes and masters and being aware of the ways consumers will be listening to the work are all important for audio engineers, whatever field they work in. But perhaps the most important change over the years is an acknowledgement amongst media commissioning companies, directors and producers that high-quality productions require commensurately high-quality audio.