<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Get Info: #machinelearning</title>
    <description>Posts tagged “machinelearning” — Blog of independent game and app developer Matt Sephton. Featuring vintage Macintosh, game development, digital artwork, Japanese esoterica, video game reviews, hacks and tips, and much more.</description>
    <link>https://blog.gingerbeardman.com/tag/machinelearning/</link>
    <atom:link href="https://blog.gingerbeardman.com/tag/machinelearning/index.xml" rel="self" type="application/rss+xml"/>
    <pubDate>Tue, 30 Jun 2026 01:15:23 +0000</pubDate>
    <lastBuildDate>Tue, 30 Jun 2026 01:15:23 +0000</lastBuildDate>
    <generator>Jekyll v4.4.1</generator>

    
      
        <item>
          <title>Automatically classifying the content of sound files using ML</title>
          <description>&lt;p&gt;Following on from yesterday’s &lt;a href=&quot;/2023/08/12/extracting-sounds-from-macromedia-director-files/&quot;&gt;extraction of old sound effects&lt;/a&gt;, I quickly realised I needed an easier way to search them as they came out of Director as unlabelled, numbered files. I can use QuickLook or a media player to quickly audition them, but how could I easily find the sample that contains the sound of running water or a horse trotting?&lt;/p&gt;

&lt;p&gt;I wondered if there was a way of using Machine Learning (ML) to automatically categorise sounds. It seemed like something that should be possible, especially given the recent explosion in “AI” (really: ML) tools. I quickly found Google’s AudioSet, which sounded like the perfect dataset:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;AudioSet consists of an expanding ontology of 632 audio event classes and a collection of 2,084,320 human-labeled 10-second sound clips drawn from YouTube videos. The ontology is specified as a hierarchical graph of event categories, covering a wide range of human and animal sounds, musical instruments and genres, and common everyday environmental sounds.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;But the dataset is only half of the solution. You need a model trained on that dataset, which you then run against your own data to get the required results. Thankfully, I found YAMNet:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;YAMNet is a deep net that predicts ~521 audio event classes from the AudioSet-YouTube corpus it was trained on.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I guess YAMNet is tracking behind AudioSet in terms of total categories, but it is good enough for me. Here is a &lt;a href=&quot;https://github.com/tensorflow/models/blob/master/research/audioset/yamnet/yamnet_class_map.csv&quot;&gt;list of all the classes&lt;/a&gt; of sounds it can recognise.&lt;/p&gt;
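
&lt;p&gt;As a rough sketch of how that class list can be consumed, the snippet below reads class names from CSV text with Python’s &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;csv&lt;/code&gt; module. It assumes the file keeps its &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;index,mid,display_name&lt;/code&gt; header. The sample rows are placeholders for illustration, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;load_class_names&lt;/code&gt; is a hypothetical helper name.&lt;/p&gt;

```python
import csv
import io


def load_class_names(csv_text):
    """Return the display names, in row order, from class-map CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["display_name"] for row in reader]


# Placeholder rows for illustration only; the real file has one row per class.
SAMPLE = '''index,mid,display_name
0,/m/example0,Speech
1,/m/example1,"Water tap, faucet"
2,/m/example2,Horse
'''

if __name__ == "__main__":
    for position, name in enumerate(load_class_names(SAMPLE)):
        print(position, name)
```

&lt;p&gt;The position of each name in the returned list lines up with the index of the score the model reports for that class, which is why row order is preserved.&lt;/p&gt;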

&lt;h2 id=&quot;lets-go&quot;&gt;Let’s go&lt;/h2&gt;

&lt;p&gt;I used the script described in &lt;a href=&quot;https://www.tensorflow.org/hub/tutorials/yamnet&quot;&gt;this tutorial&lt;/a&gt; as a starting point. I’m not a regular Python user, but I used &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pip&lt;/code&gt; to &lt;a href=&quot;https://www.tensorflow.org/install&quot;&gt;install TensorFlow&lt;/a&gt;, along with any other missing imports, and after that it …just worked.&lt;/p&gt;

&lt;h2 id=&quot;getting-your-files-in-order&quot;&gt;Getting your files in order&lt;/h2&gt;

&lt;p&gt;According to the documentation, all sound files need to be at a sample rate of 16000 Hz. After getting some classification results of “Silence”, I realised they also need to be 16-bit resolution. So I ran a quick &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sox&lt;/code&gt; command to create compliant, mono copies of all my sounds. I’ll delete these when I’m done. Notice how I decided to trim sounds to a maximum length of 3 seconds. This helps speed things up, and most sounds can still be recognised from such a short starting section.&lt;/p&gt;

&lt;div class=&quot;language-sh highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;find &lt;span class=&quot;nb&quot;&gt;.&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-iname&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;*.wav&quot;&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-exec&lt;/span&gt; sox &lt;span class=&quot;o&quot;&gt;{}&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-c&lt;/span&gt; 1 &lt;span class=&quot;nt&quot;&gt;-r&lt;/span&gt; 16000 &lt;span class=&quot;nt&quot;&gt;-b&lt;/span&gt; 16 &lt;span class=&quot;o&quot;&gt;{}&lt;/span&gt;_16k.wav trim 0 00:03 &lt;span class=&quot;se&quot;&gt;\;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
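
&lt;p&gt;To confirm that a converted copy really matches those requirements, a quick check with Python’s standard &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;wave&lt;/code&gt; module might look like this. It is a minimal sketch that assumes plain PCM WAV files; &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;is_yamnet_ready&lt;/code&gt; is a hypothetical name, not something from the tutorial.&lt;/p&gt;

```python
import sys
import wave


def is_yamnet_ready(path, rate=16000, sample_width_bytes=2, channels=1):
    """Return True if the WAV file is 16 kHz, 16-bit (2 bytes), mono PCM."""
    with wave.open(path, "rb") as wav:
        return (
            wav.getframerate() == rate
            and wav.getsampwidth() == sample_width_bytes
            and wav.getnchannels() == channels
        )


if __name__ == "__main__":
    # Usage: python3 check_wav.py one.wav two.wav ...
    for name in sys.argv[1:]:
        print(name, "ok" if is_yamnet_ready(name) else "needs converting")
```

&lt;p&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;wave&lt;/code&gt; module raises &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;wave.Error&lt;/code&gt; for files it cannot parse, so anything unusual shows up as an exception and not as a silent pass.&lt;/p&gt;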

&lt;h2 id=&quot;optimisation&quot;&gt;Optimisation&lt;/h2&gt;

&lt;p&gt;The classifier runs at about real time, a few seconds per sound, but I noticed that it was leaving a lot of my CPU unused. This struck me as a prime candidate for parallelisation, which is pretty easy on the command line. I used the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;parallel&lt;/code&gt; command to scale up the classification to use all 10 cores of the M1 Pro CPU in my 2021 MacBook Pro.&lt;/p&gt;

&lt;div class=&quot;language-sh highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;find &lt;span class=&quot;nb&quot;&gt;.&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-iname&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;*.wav&quot;&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-exec&lt;/span&gt; parallel python3 classify.py &lt;span class=&quot;o&quot;&gt;{}&lt;/span&gt; ::: &lt;span class=&quot;o&quot;&gt;{}&lt;/span&gt; &lt;span class=&quot;se&quot;&gt;\+&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;As I type, my computer is making short work of the task, whilst remaining perfectly responsive, if a little warm. Final speed for me is one sound every ~0.85 seconds.&lt;/p&gt;
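
&lt;p&gt;The same fan-out can be approximated from Python itself. The following is a minimal sketch, assuming &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;classify.py&lt;/code&gt; takes one file path as its only argument, as in the command above. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;classify_all&lt;/code&gt; is a hypothetical helper and is not part of my script.&lt;/p&gt;

```python
import os
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def classify_all(paths, script="classify.py", workers=None):
    """Run `python script path` for every path, several at a time.

    Threads are enough here: each one only waits on a child process,
    and the heavy lifting happens inside those processes.
    Returns (path, exit_code) pairs in the same order as `paths`.
    """
    workers = workers or os.cpu_count() or 1

    def run_one(path):
        result = subprocess.run([sys.executable, script, path])
        return path, result.returncode

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_one, paths))


if __name__ == "__main__":
    # Usage: python3 run_all.py one.wav two.wav ...
    for path, code in classify_all(sys.argv[1:]):
        if code != 0:
            print("failed:", path, file=sys.stderr)
```

&lt;p&gt;Like the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;parallel&lt;/code&gt; approach, this starts a fresh Python process per file, so each one pays the cost of loading the model again. It trades some efficiency for simplicity.&lt;/p&gt;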

&lt;h2 id=&quot;python-script&quot;&gt;Python Script&lt;/h2&gt;

&lt;noscript&gt;&lt;p&gt;&lt;a href=&quot;https://gist.github.com/gingerbeardman/9e9bde623673ed2f50aeb15e97aae4a3&quot;&gt;View the source code as a Gist&lt;/a&gt;&lt;/p&gt;&lt;/noscript&gt;
&lt;script src=&quot;https://gist.github.com/gingerbeardman/9e9bde623673ed2f50aeb15e97aae4a3.js&quot;&gt;&lt;/script&gt;
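
&lt;p&gt;For readers who cannot load the Gist, here is a rough illustration of one step the linked tutorial performs: YAMNet reports a score per class for each short frame of audio, and those scores are averaged over the frames before picking the best classes. This is a plain-Python sketch of that averaging, not the script from the Gist. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;top_classes&lt;/code&gt; and the example numbers are made up for illustration.&lt;/p&gt;

```python
def top_classes(frame_scores, class_names, n=3):
    """Average per-frame scores and return the n best (name, score) pairs.

    frame_scores: one list of scores per frame, one score per class.
    class_names: class names in the same order as the scores.
    """
    frames = len(frame_scores)
    # zip(*rows) turns rows of per-frame scores into per-class columns.
    means = [sum(column) / frames for column in zip(*frame_scores)]
    ranked = sorted(zip(class_names, means), key=lambda pair: pair[1], reverse=True)
    return ranked[:n]


if __name__ == "__main__":
    # Made-up scores: two frames, three classes.
    scores = [[0.1, 0.8, 0.1], [0.3, 0.6, 0.1]]
    names = ["Speech", "Water", "Horse"]
    for name, score in top_classes(scores, names, n=2):
        print(name, round(score, 2))
```

&lt;p&gt;Keeping more than one class per sound makes the results more forgiving to search later, since a sample may reasonably match several labels.&lt;/p&gt;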

&lt;h2 id=&quot;creating-my-sfx-library&quot;&gt;Creating my SFX Library&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://getsoundly.com&quot;&gt;Soundly&lt;/a&gt; is a sort of iTunes for sound effects. It’s an app that enables easy, automatic organisation of files, quick searching of metadata, painless playback/auditioning, non-destructive edits, and simple exporting of the final sounds. The free version allows a local library of 10,000 files which is more than enough for my usage. I’m not affiliated with them in any way, but they offer a free version and a &lt;a href=&quot;https://getsoundly.com/news/soundly-promo-code-free/&quot;&gt;1-month free trial of their paid version&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;As you add your local folder of files, it allows you to import a (semicolon-separated) .csv file containing additional metadata. It’s here that I point it to the file that was generated by the classifier. The categories are imported as the description of the sound and can be searched. Perfect!&lt;/p&gt;
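
&lt;p&gt;Writing a semicolon-separated file is straightforward with Python’s &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;csv&lt;/code&gt; module. The sketch below shows the general idea only: the column names are assumptions chosen for illustration, not Soundly’s documented import format, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;write_metadata&lt;/code&gt; is a hypothetical helper.&lt;/p&gt;

```python
import csv
import io


def write_metadata(rows, handle):
    """Write (filename, description) rows as semicolon-separated CSV."""
    writer = csv.writer(handle, delimiter=";", lineterminator="\n")
    writer.writerow(["Filename", "Description"])  # assumed header names
    writer.writerows(rows)


if __name__ == "__main__":
    buffer = io.StringIO()
    # Made-up example rows.
    write_metadata([("0001.wav", "Water, Stream"), ("0002.wav", "Horse")], buffer)
    print(buffer.getvalue(), end="")
```

&lt;p&gt;Using the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;csv&lt;/code&gt; module instead of joining strings by hand means any description that happens to contain a semicolon is quoted automatically.&lt;/p&gt;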
</description>
          <author>by Matt Sephton</author>
          <pubDate>Sun, 13 Aug 2023 17:01:00 +0000</pubDate>
          <link>https://blog.gingerbeardman.com/2023/08/13/automatically-classifying-the-content-of-sound-files-using-ml/</link>
          <guid isPermaLink="true">https://blog.gingerbeardman.com/2023/08/13/automatically-classifying-the-content-of-sound-files-using-ml/</guid>
        </item>
      
    

  </channel>
</rss>
