Blocking Audio Ads

This article describes an approach to blocking ads in audio files and introduces software that implements it.

Motivation

The modern web is unbearable without a good ad blocker such as uBlock Origin. Unfortunately, the online marketing industry, while ruining the web and providing creative distribution networks for malware and disinformation campaigns along the way, has also discovered podcasts as a medium for spreading obnoxious advertisements.

The Challenge

Podcasts (also known as audiocasts) usually integrate advertisement clips inline, as part of the audio stream in the audio file. Such ads therefore can't simply be filtered by blocking certain URLs or manipulating scripts, as is done with ads on websites or even YouTube. (NB: as of 2024, YouTube is experimenting with rendering ads server-side into the video stream.)

However, many podcasts, especially German ones, clearly mark the beginning and the end of an advertisement with a sound bite or jingle. Such marking isn't entirely voluntary, because a few legal norms regarding broadcasting and unfair business practices apply.

A Solution

One approach to blocking audio ads in podcasts is thus to automatically search for these characteristic sound bites (templates) and cut out everything in between.
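To make the cutting step concrete, here is a minimal sketch of it in Python. It assumes the marker positions have already been found and are given as sample offsets, with each end position pointing just past the closing jingle. The function names pair_markers and cut_intervals are made up for this illustration and are not taken from any existing tool.

```python
def pair_markers(starts, ends):
    """Pair each ad-start offset with the next ad-end offset after it."""
    intervals = []
    ends = sorted(ends)
    j = 0
    for s in sorted(starts):
        if intervals and s < intervals[-1][1]:
            continue  # this start lies inside an interval we already cut
        while j < len(ends) and ends[j] <= s:
            j += 1  # skip end markers that come before this start
        if j == len(ends):
            break  # unmatched start marker: keep the audio rather than guess
        intervals.append((s, ends[j]))
        j += 1
    return intervals


def cut_intervals(samples, intervals):
    """Return the samples with all [begin, end) intervals removed."""
    kept = []
    pos = 0
    for begin, end in intervals:
        kept.extend(samples[pos:begin])
        pos = end
    kept.extend(samples[pos:])
    return kept


if __name__ == "__main__":
    audio = list(range(100))  # stand-in for decoded audio samples
    ads = pair_markers(starts=[10, 50], ends=[20, 70])
    print(ads, len(cut_intervals(audio, ads)))
```

An unmatched start marker is deliberately left alone here: losing a few minutes of actual content seems worse than sitting through one ad.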

I searched for software that implements this approach, but couldn't find any.

Thus, I created cutbynoise - a small Python program that cuts audio files by characteristic templates. See also its README for a detailed technical description of how it works.

Of course, nobody wants to manually call cutbynoise on each new episode of a podcast. I thus also created castproxy - a small Python program that aggregates podcast feeds, caches embedded audio files, and is able to invoke audio filters on them, such as cutbynoise.
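The feed side of such a proxy can be sketched with the Python standard library alone. The following is a hypothetical illustration, not castproxy's actual code: it assumes a plain RSS 2.0 feed in which each episode references its audio file through an enclosure element, and it rewrites those URLs so that a podcast client fetches the filtered copy instead. The helper names and the localhost URL are assumptions made up for this example.

```python
import xml.etree.ElementTree as ET


def enclosure_urls(feed_xml):
    """Return the URLs of all <enclosure> elements in an RSS feed."""
    root = ET.fromstring(feed_xml)
    return [e.get("url") for e in root.iter("enclosure") if e.get("url")]


def rewrite_enclosures(feed_xml, rewrite):
    """Return the feed with every enclosure URL passed through rewrite()."""
    root = ET.fromstring(feed_xml)
    for enc in root.iter("enclosure"):
        url = enc.get("url")
        if url:
            enc.set("url", rewrite(url))
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    feed = (
        '<rss version="2.0"><channel><item><title>Ep 1</title>'
        '<enclosure url="https://example.org/ep1.mp3" type="audio/mpeg" length="123"/>'
        "</item></channel></rss>"
    )
    # Hypothetical local cache address; a real proxy would also download
    # the file and run the audio filter on it before serving it.
    local = rewrite_enclosures(
        feed, lambda u: "http://localhost:8080/cache/" + u.rsplit("/", 1)[-1]
    )
    print(enclosure_urls(local))
```

Real feeds often carry namespaced extension elements, which this sketch does not handle specially.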

Results

Using cross-correlation to search for ad start/stop markers works surprisingly well on real-world audio data, especially given that podcasts are usually heavily post-processed and, of course, lossily compressed.
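For readers unfamiliar with the technique, the following is a minimal pure-Python sketch of a normalized cross-correlation search. It is an illustration under simplifying assumptions, not the implementation used by cutbynoise: the function name find_template and the threshold of 0.9 are made up for this example, and the naive loop is far too slow for real audio, where an FFT-based correlation from a numerical library would be the usual choice.

```python
import math


def find_template(signal, template, threshold=0.9):
    """Return offsets where the normalized cross-correlation of the
    template with the signal reaches the threshold."""
    n = len(template)
    t_mean = sum(template) / n
    t_centered = [t - t_mean for t in template]
    t_norm = math.sqrt(sum(t * t for t in t_centered))
    hits = []
    for offset in range(len(signal) - n + 1):
        window = signal[offset:offset + n]
        w_mean = sum(window) / n
        w_centered = [w - w_mean for w in window]
        w_norm = math.sqrt(sum(w * w for w in w_centered))
        if t_norm == 0 or w_norm == 0:
            continue  # flat template or window: correlation is undefined
        dot = sum(w * t for w, t in zip(w_centered, t_centered))
        if dot / (w_norm * t_norm) >= threshold:
            hits.append(offset)
    return hits


if __name__ == "__main__":
    jingle = [0.0, 1.0, 0.0, -1.0, 0.5]
    # The same jingle at half the volume, surrounded by silence.
    episode = [0.0] * 20 + [0.5 * x for x in jingle] + [0.0] * 20
    print(find_template(episode, jingle))
```

Because the score is normalized, a jingle that was made louder or quieter in post-processing still matches. Lossy compression distorts the waveform slightly, which is why the match is compared against a threshold below 1 instead of being tested for equality.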

I have been running castproxy and cutbynoise for two months now on a dozen or so podcasts.

So far, this setup has successfully filtered out 5301 seconds of super annoying ads! That's almost one and a half hours!