Deepgram, a YC-backed startup that uses machine learning to analyze audio data for businesses, is open sourcing an internal deep learning tool called Kur. The release should make it easier for those interested in the space to get their ideas off the ground. The startup is also including 10 hours of transcribed audio, split into 10-second clips, to speed up the training process.
Much like Keras, Kur further abstracts away the process of building and training deep learning models. By making deep learning easier, Kur also makes image recognition and speech analysis more accessible.
Scott Stephenson, CEO of Deepgram, explained to me that when the company was first getting off the ground, the team trained its early machine learning models on LibriSpeech, an online dataset of public-domain audiobooks that have been split up and labeled.
Deepgram isn’t reinventing the wheel with its release. Frameworks like TensorFlow, Caffe and Torch have already become quite usable, helped along by data dumps and open source projects from startups, universities and big tech companies alike. The ImageNet database has worked wonders for image recognition, and many developers use VoxForge for speech, but more open source data is never a bad thing.
“You can start with classifying images and end up with self-driving cars,” added Stephenson. “The point is giving someone that first little piece and then people can change the model and make it do something different.”
Getting Kur into the hands of developers will also help Deepgram recruit talent. The strategy has proved useful for large tech companies looking to hire machine learning engineers and data scientists.
Via Kurhub.com, developers will soon be able to share models, datasets and weights to spur more innovation in the space. Deepgram eventually wants to release weights for the dataset being released today so DIY-ers can skip processor-intensive training altogether. Even with a relatively modest 10 hours of audio, models still take about a day to train on a GPU and considerably longer on an off-the-shelf computer.
If you outgrow the Deepgram dataset, you can easily expand it with your own data. All you have to do is create WAV files with embedded transcriptions in 10-second increments. To improve accuracy, you can also feed these data-hungry deep learning models additional audio from the public domain.
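The article doesn't spell out how Kur expects transcriptions to be attached to each clip, so the sketch below covers only the slicing step: cutting a longer recording into consecutive 10-second WAV clips with Python's standard-library `wave` module. The function name `split_wav` and the `<name>_0000.wav` naming scheme are hypothetical, and pairing each clip with its transcript is left to whatever format Kur actually requires.

```python
import os
import wave


def split_wav(path, out_dir, clip_seconds=10):
    """Hypothetical helper: split a WAV file into consecutive clips.

    Each clip is at most `clip_seconds` long; the final clip may be shorter.
    Transcripts are NOT handled here (an assumption: they are paired separately).
    Returns the list of clip paths in order.
    """
    os.makedirs(out_dir, exist_ok=True)
    base = os.path.splitext(os.path.basename(path))[0]
    out_paths = []
    with wave.open(path, "rb") as src:
        params = src.getparams()
        frames_per_clip = src.getframerate() * clip_seconds
        index = 0
        while True:
            frames = src.readframes(frames_per_clip)
            if not frames:  # no audio left
                break
            out_path = os.path.join(out_dir, f"{base}_{index:04d}.wav")
            with wave.open(out_path, "wb") as dst:
                dst.setparams(params)  # copy channels, sample width, frame rate
                dst.writeframes(frames)  # header frame count is patched to the clip's size
            out_paths.append(out_path)
            index += 1
    return out_paths
```

At 10 seconds per clip, a 10-hour corpus like the one Deepgram is releasing works out to roughly 3,600 clips (36,000 seconds / 10).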