Introduction to using automated transcription software for qualitative research (part 1)

 

If you’ve conducted interviews and meetings as part of your research (PhD or otherwise), you’ll likely have experienced the arduous task of transcribing hours of pre-recorded audio. There are many reasons why you might want to transcribe audio recordings: to generate interview transcripts, to produce written and annotated meeting notes, or to transcribe videos and other recordings to make them more accessible to users.

This blog post is the first in a two-part series on using transcription apps for qualitative research. It aims to provide a brief introduction to these technologies and their potential uses and benefits (see part 2 for a tutorial using a practical example). I've also written a third post, which focuses on the ethical, privacy, and security aspects of auto-transcription software.

Unless you outsource this task to a costly service, researchers often spend hours typing up interview data. While manual transcription (and listening to and reading through your transcripts) is arguably important, not least for accurate interpretation and for getting a ‘feel’ for your interviews, it is very time-consuming. If you have the time to explore and test different options for making the transcription process more efficient, it can be rewarding and free up time for other tasks. At the end of this blog post, I’ve put together a list of 10 useful features of one speech-to-text app, Otter.ai. Transcription apps like Otter.ai have huge potential to transform this essential research task; for example, this blog post considers what the development of these technologies might mean for analysing and interpreting qualitative data.

There's no ‘right’ way to transcribe your data, and there are certainly a variety of approaches and tools you can use to make the process more efficient. In this blog post, I reflect on my personal experience of using (free) automated transcription applications to keep written records of interviews, meetings, and so forth, during my PhD research. While I focus on using free, automated transcription tools, there are a lot of great human and computer transcription services and applications out there. It’s worth having a look and weighing up the options yourself, to see what suits your needs and preferences.

What is automated speech-to-text transcription?

This is when a computer transcribes your interviews or meetings for you. Quite simply, all you need to do is have a clear audio recording with minimal background noise, and a state-of-the-art machine transcription service will convert your audio to text, almost instantly or in a matter of minutes.

Sound too good to be true? Speech recognition software has certainly come a long way in recent years, and it is now widely available at our fingertips. For example, smartphones and smart speakers easily recognise, process, and respond to our voice commands. We can also easily convert speech to text on most smartphones (however, these tools are often limited in how long you can record for, and they lack the additional built-in features of other apps).

 

Photo by Nick Morrison on Unsplash

 

How does it work?

If you search the internet for ‘speech-to-text transcription tools’, you’ll likely see a lot of references to artificial intelligence (AI) and machine learning, two related areas of computer and data science. AI refers to intelligence demonstrated by machines, as opposed to humans or animals. Machine learning is an application of AI that enables computers to learn and improve automatically from experience, in a similar way to how a human or animal might learn a new skill.

The field of interest here is called Natural Language Processing (NLP). NLP is a branch of AI that gives computers the ability to read, understand, and derive meaning from human languages. It is through NLP that machines can process what humans are saying and make sense of language in a way that is both meaningful and valuable to us. It is used in a variety of familiar applications, for example smart speakers and personal assistants (e.g. Google Assistant, Alexa, Siri, and Cortana), language translation (e.g. Google Translate), and spell checking (e.g. in Microsoft Word documents or in your emails).


Photo by Hitesh Choudhary on Unsplash

 

What are the best transcription tools?  

There are lots of helpful websites and articles that summarise the top free (and paid) transcription software of 2020; see, for example, this report and this article. It’s worth exploring the different options to find the services and applications that best suit your research.

During my PhD I’ve been using an application called Otter.ai (see this Forbes article for more information), a web and mobile application that provides speech-to-text transcription. It’s free to use for all its basic functions (you can pay a subscription fee for more transcription storage space and some extra features, which I haven’t tried yet). Otter was trained with machine learning on millions of hours of audio recordings, so that it can automatically transform audio to text with a pretty high degree of accuracy. Recently, Otter.ai has also launched a new feature in partnership with Zoom, which allows you to record meetings via the popular conference, meeting, and webinar platform.
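If you want to put a number on how accurate an automatic transcript is for your own recordings, one widely used measure is word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn the machine transcript into a human-checked reference, divided by the number of words in the reference. The Python sketch below is a minimal illustration under simplifying assumptions (it lower-cases the text and ignores punctuation). It is not how Otter.ai measures its own accuracy, and the function names are hypothetical.

```python
import re


def normalise(text: str) -> list[str]:
    """Lower-case and split into words, dropping punctuation (a simplifying assumption)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = normalise(reference)
    hyp = normalise(hypothesis)
    if not ref:
        raise ValueError("reference transcript is empty")
    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words (word-level Levenshtein distance).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution (or exact match if cost == 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


# Hand-corrected reference vs. a hypothetical machine transcript (one substitution in seven words).
print(round(word_error_rate("So how long have you farmed here?",
                            "So how long have you farm here"), 3))  # 0.143
```

A lower WER is better, and 0.0 means the two transcripts match word for word after normalisation. WER weights every error equally, so a misheard participant name counts the same as a dropped filler word, which is one reason reading through your transcripts still matters.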

There’s no ‘perfect’ transcription application, but I’ve generally been quite impressed with Otter.ai: it is certainly useful and saves a lot of time. In part 2 of this series, I highlight some of these useful features and provide a tutorial, alongside some reflections and important considerations when using these apps for research. I answer some FAQs and discuss some challenges regarding ethics, privacy/security, and safe storage here. The intention of these posts is to demonstrate the usefulness of this sort of software for researchers using qualitative research methods, with Otter.ai as an example.

What can I do with it?

Here are 10 things I like about Otter.ai:

 

Source: @CaitlinHafferty


 

I’m interested - can you provide any examples of how I can use it?

See part 2 of this series for annotated examples of some key features of Otter.ai that could help researchers transcribe interviews, meeting notes, and so on. I also highlight some issues that require careful consideration when using the app, for example accuracy, ethics, and privacy.


Links to some useful resources:  

The word cloud in the blog title image was created in R from the text of this article. R Core Team (2019). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL: https://www.r-project.org/. The tm (v0.7-7; Feinerer & Hornik, 2019), readtext (v0.76; Benoit & Obeng, 2020), wordcloud2 (v0.2.2; Lang, 2020), RColorBrewer (v1.1-2; Neuwirth, 2014), and wordcloud (v2.6; Fellows, 2018) packages were used.
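The core step behind a word cloud like the title image is counting how often each word appears once very common ‘stopwords’ are removed. The Python sketch below illustrates that idea. It is an analogue, not the R code used for the image: the stopword list is a small hypothetical one (tm provides a much longer English list), and dropping words of two letters or fewer is an assumption made for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately short stopword list for illustration only.
STOPWORDS = {
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "can", "for",
    "from", "have", "how", "if", "in", "is", "it", "of", "on", "or", "so",
    "that", "the", "these", "this", "to", "was", "what", "which", "with",
    "you", "your",
}


def term_frequencies(text: str, top_n: int = 10) -> list[tuple[str, int]]:
    """Return the top_n most frequent non-stopword terms in text."""
    # Lower-case and keep runs of letters (drops punctuation and digits).
    words = re.findall(r"[a-z]+", text.lower())
    # Remove stopwords and very short tokens (the length cut-off is an assumption).
    kept = [w for w in words if w not in STOPWORDS and len(w) > 2]
    # most_common sorts by count; equal counts keep first-seen order.
    return Counter(kept).most_common(top_n)


sample = (
    "Automated transcription saves time. Transcription apps like Otter.ai "
    "turn audio into text, and checking the transcription still matters."
)
print(term_frequencies(sample, top_n=3))
# [('transcription', 3), ('automated', 1), ('saves', 1)]
```

The resulting word and count pairs are the kind of input a word cloud package scales font sizes from; in R, wordcloud2 takes a data frame of words and frequencies.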