Friday, February 3, 2012

Battle of the audio APIs

TL;DR: Native Web Audio API is better if you're making games.

Which is better? Google's Web Audio API [1] or Mozilla's Audio Data API [2][3]?

Well, that is not a question which can be answered easily, since the two, though they share some use-cases, are quite different. To underline the difference: as of 15 December 2011, Mozilla's Audio Data API specification officially falls under a more general "MediaStream Processing" API specification [3]. Even a quick browse of the two specifications will show you that they are quite different animals, and that the Web Audio API specification is currently the more mature of the two (less likely to change).

To really get at the core difference between the two APIs, know that Google's Web Audio API can be (and has been [4]) implemented in JavaScript using Mozilla's Audio Data API. So it is immediately apparent that the Mozilla API is a lot more "low-level". That is, it gives you core functionality which allows access to the internal data of HTML "audio" elements. The specification, now generalized to "MediaStream Processing", also allows access to the internal data of HTML "video" elements.
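To make that "low-level" flavour concrete, here is a minimal sketch of reading and writing raw samples with the Audio Data API as implemented in Firefox 4 and later (the element id, buffer size and sine-wave generator are purely illustrative):

    // Read raw samples from an <audio> element as it plays.
    var audio = document.getElementById("music");   // assumed: an <audio> element on the page
    audio.addEventListener("MozAudioAvailable", function (event) {
      var samples = event.frameBuffer;              // Float32Array of raw sample data
      // ... analyse, visualise or copy the raw data here ...
    }, false);

    // Write generated samples to a second, script-created element.
    var output = new Audio();
    output.mozSetup(1, 44100);                      // 1 channel at 44.1 kHz
    var buffer = new Float32Array(4096);
    for (var i = 0; i < buffer.length; i++) {
      buffer[i] = Math.sin(2 * Math.PI * 440 * i / 44100);   // a 440 Hz tone
    }
    output.mozWriteAudio(buffer);                   // returns how many samples were accepted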

So, the specification championed by Robert O'Callahan and Mozilla is great for managing media assets and for streaming, extracting, injecting or mixing raw data from audio and video elements. Microphone input would also be considered a media element under this specification. The big use-case will be in-browser video-conferencing and web-casting applications.

What is the Web Audio API, then? It lets you build complex audio graphs, place sound sources spatially (including features like the Doppler effect and HRTF panning), trigger sound sources with precise timing, and it comes with a set of pre-defined filters and effect generators. In other words, everything you need to make a soft-synth or a computer game's audio.
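As a rough sketch of what a simple graph looks like in code (written against the draft as implemented in Chrome at the time, hence the webkit prefix, createGainNode() and noteOn(); the decoded buffer, positions and gain values are assumptions for illustration):

    var context = new webkitAudioContext();

    var source = context.createBufferSource();   // plays a decoded AudioBuffer
    source.buffer = explosionBuffer;             // assumed: decoded elsewhere via decodeAudioData

    var panner = context.createPanner();         // spatialisation (HRTF panning)
    panner.setPosition(10, 0, -5);               // place the sound in 3D space

    var gain = context.createGainNode();         // per-source volume
    gain.gain.value = 0.8;

    source.connect(panner);
    panner.connect(gain);
    gain.connect(context.destination);

    source.noteOn(context.currentTime + 0.5);    // schedule playback half a second from now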

Now, since you can implement Chris Rogers's Web Audio API specification on top of the Mozilla specification in JavaScript, why have native support for it at all? Answer: performance.

A JavaScript implementation of the Web Audio API, though functional, will run much slower than a native one. When making a game, you need processing time for game logic. You don't want CPU cycles tied up performing a set of computationally expensive convolution operations in JavaScript. Also, since the API is only a specification, a good implementation can hand processing off to hardware-accelerated back-ends (e.g. a Creative card with EAX) or to a browser-maintained audio thread to boost performance. For a complex scene with many filtered and spatialised sound sources playing simultaneously, as in a game, you'll need all the performance you can get to keep the user experience responsive.
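A reverb, for instance, can be expressed as a ConvolverNode in the graph, so the convolution runs inside the browser's audio engine rather than inside your game loop. A tiny sketch, extending the graph above (the impulse-response buffer is assumed to have been loaded and decoded already):

    var convolver = context.createConvolver();
    convolver.buffer = impulseResponseBuffer;    // assumed: a decoded impulse response

    source.connect(convolver);                   // the source from the earlier sketch
    convolver.connect(context.destination);      // the heavy DSP never touches your JavaScript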

Also, since audio sources in the Web Audio API are easy to fill with custom buffers, there is little reason why both specifications can't live side by side. AudioContext's createMediaElementSource() method, which wraps a media element in a MediaElementAudioSourceNode, already seems like a move in the direction of interoperability between the two specifications. The MediaStream API can manage audio streams between peer-connected clients while the Web Audio API "renders" the received data in interesting ways.
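A hedged sketch of that kind of routing, pulling an ordinary <audio> element (which could just as well be fed by a stream) into a Web Audio graph; the element id and filter settings are illustrative:

    var context = new webkitAudioContext();
    var mediaElement = document.getElementById("incoming");   // assumed: an <audio> element

    var mediaSource = context.createMediaElementSource(mediaElement);

    var filter = context.createBiquadFilter();
    filter.type = 0;                          // LOWPASS in the draft's numeric constants
    filter.frequency.value = 800;             // e.g. muffle a remote speaker behind a "wall"

    mediaSource.connect(filter);
    filter.connect(context.destination);
    mediaElement.play();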

So, in conclusion, for me there is no battle, though I personally prefer the Web Audio API since it is more geared toward the game use-case. I hope that both specifications are refined to be more complementary and get accepted and endorsed by the W3C [5].

 Links:
