Pros: Free (for now). Good accuracy. Slick, modern apps. Cross-conversation speaker identification.
Cons: Some features are still rough. Lacks advanced account settings.
Bottom Line
Otter offers a glimpse into the future of real-time transcriptions, but its underlying technology isn't quite there yet.
We all know that the recording and transcription process requires a lot of hard work. That's why many people turn to transcription services for their needs. Otter takes an innovative approach to the task, offering real-time transcripts of conversations and meetings as they occur. It also integrates other features such as cross-conversation speaker identification, good search tools, and excellent mobile apps, though some features are a bit rough around the edges. Although Otter's more accurate than most automatic services, it's still not a viable alternative to human-based services at this point. Until Otter has more time to improve its technologies, we recommend Editors' Choice Rev for your important transcript jobs. For impromptu meetings or personal notes though, Otter may work well for you.
Otter is free for now, though the company says it will charge a subscription fee in the future. It remains unknown when the pricing change will take effect and whether it will be a flat subscription fee or based on the number of minutes of audio you record. We hope that Otter continues to let you make an unlimited number of recordings and does not charge additional fees like some human-based services do. On a positive note, this subscription fee means that Otter will not put ads into its app or monetize your recordings in any way.
For comparison, Scribie's automatic transcription service is completely free, though it does not have a mobile app. Trint is the only service that currently offers subscription pricing. Its cheapest plan charges $15 per month for up to three hours' worth of uploaded audio. Temi, the other automatic service, costs $0.10 per minute.
Freelance-based services cost more, given the human capital they require and their higher accuracy. Most charge on a per-minute basis, usually in the $1 to $3 range. Rev, for example, charges a base fee of $1 per minute of audio. It, along with others, also bumps up the price if you add extra options to the transcript job, such as speaker identification and timestamps, or even if the audio recording is just of poor quality.
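To put those prices side by side, here is a rough sketch that works out what a single recording would cost at each quoted rate. The rates come from the figures above; the function and constant names are illustrative only, and the sketch makes no guess about what Trint charges beyond the three-hour cap, since that isn't stated here.

```python
from typing import Optional

# Rates quoted in this review, stored in whole cents to avoid rounding surprises.
TEMI_CENTS_PER_MINUTE = 10        # $0.10 per minute
REV_CENTS_PER_MINUTE = 100        # $1 per minute base fee, before any extras
TRINT_PLAN_CENTS = 1500           # $15 per month, cheapest plan
TRINT_PLAN_CAP_MINUTES = 3 * 60   # up to three hours of uploaded audio


def per_minute_cost_cents(minutes: int, cents_per_minute: int) -> int:
    """Cost of a job billed purely by the minute."""
    return minutes * cents_per_minute


def trint_plan_cost_cents(minutes_this_month: int) -> Optional[int]:
    """Flat monthly fee if usage fits the cheapest plan, otherwise None.

    None means "not covered by the cheapest plan"; overage pricing is
    not given in the review, so it is deliberately not modeled.
    """
    if minutes_this_month > TRINT_PLAN_CAP_MINUTES:
        return None
    return TRINT_PLAN_CENTS


def format_dollars(cents: int) -> str:
    """Render a cent amount as a dollar string."""
    return f"${cents / 100:.2f}"


if __name__ == "__main__":
    minutes = 16  # length of the test recording used later in this review
    print("Temi: ", format_dollars(per_minute_cost_cents(minutes, TEMI_CENTS_PER_MINUTE)))
    print("Rev:  ", format_dollars(per_minute_cost_cents(minutes, REV_CENTS_PER_MINUTE)))
    print("Trint:", format_dollars(trint_plan_cost_cents(minutes)), "(flat monthly fee)")
```

By this arithmetic, Temi's per-minute rate reaches $15 at 150 minutes, so Trint's cheapest plan only comes out ahead for people who upload between two and a half and three hours of audio a month.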
How Otter Works
At its core, Otter is an ambient transcription service that relies on automatic speech recognition (ASR) to process your recordings in real time. All you need to do is hit the record button, start speaking, and watch your words appear in the app. It adds in proper punctuation and separates individual speakers (with mixed results) and even tracks playback to the correct line. In our experience, it takes a couple of minutes for the transcript to show up in the app, but after that, it updates in near real-time. We test Otter's accuracy in a later section of the review.
Otter's ASR technology is similar to that of other services we reviewed, including Scribie, Temi, and Trint. The underlying technology is developed by a group called AI Sense; it notably integrates with the video-conferencing service, Zoom. In case you're curious, there's nothing special about the name, Otter. Yes, it sounds a bit like "utter." However, my contact offered a better explanation: otters are cute animals. In any case, I agree that otters are cool creatures and do appreciate the logo.
Truly Mobile Transcripts
Otter offers apps for both Android and iOS. Setup is easy; just download the app from the respective app store and create an account. We installed the app on both a Google Pixel running Android 8.1 and an iPhone 8 with iOS 11, though the majority of our review refers to our experience with the Android version.
The app uses a primarily white interface with the occasional blue accent for emphasis. It looks clean and modern, but I would appreciate a dark mode as well. Navigation is controlled by tapping one of the five icons in the bottom menu: Dashboard, Conversations, Record, Groups, and Settings. You can't swipe to navigate between screens, however, which is annoying. Dashboard shows all your latest account activity, along with associated quick actions for most items. The Conversations section lists all your recordings in reverse chronological order. You can search for specific terms within the transcripts or for the name of the conversation itself. An option to arrange recordings into folders, which Trint offers, would be helpful.
If you click on a particular conversation, you can view the transcript in its entirety along with some basic information up top about the recording date, time, and its total length. In the upper right corner, you can share the recording and set editing permissions. Alternatively, you can share the link Otter generates directly. To edit a particular part, simply hold down on it and hit the Edit button. You need to tap on each section individually to make edits; you can't, for example, scroll to another section and make changes in one smooth session. This is a bit of a nuisance given that Otter isn't very accurate with how it identifies new speakers or breaks in the conversation.
Otter also lets you edit the title of a transcript, but you can't change the date. This option would be useful if you uploaded a file on the web interface (more on this later) after the original recording took place. You can scrub through a recording via the playback controls at the bottom and it highlights the words as the audio plays. A top menu lets you control the playback speed (0.5x-2x) or delete the recording entirely.
The next tab over, Groups, allows you to organize contacts for easily sharing conversations. Otter's approach to sharing is one of the better implementations we've seen. Other services like Rev and Trint let you set up collaborators or teams, but neither lets you do so seamlessly on mobile. The Settings section is basic; there's a toggle to restrict audio uploading and streaming to a Wi-Fi connection, along with the standard feedback and sharing links. The only other option is to train Otter to recognize your voice. This process entails recording yourself reading back text so it can get a good grasp on your voice model. I wish there were additional security and privacy settings here.
Otter's iPhone app looks almost identical to its Android counterpart, albeit with a few minor improvements. For example, in the Settings section of the app, it lets you add a profile photo and import contacts from your phone or Google account. Instead of including the search icon in the top corner of the Dashboard and Conversations sections, it moves it down below the header. I hope that Otter makes a concerted effort to keep these two platforms consistent with each other going forward.
Otter's web interface features the same clean and modern design style as its mobile apps. It would benefit from a dark mode as well. Along the left rail, there are only a few menu items: Conversations, Groups, Feedback, and Log Out. There's notably no account section or a top-level search bar, which is disappointing. It's worth noting that the web interface changed several times throughout my testing, and I hope to see the search bar and help sections return soon. There are also performance issues. For example, when I tried to open an older transcript, Otter got stuck on the Loading Data screen. The same transcript worked fine on mobile.
The Conversations section works the same way as it does on mobile; it lists all of your recordings in reverse chronological order. You can also share, export, or delete a transcript from this view. Selecting one opens up the playback controls and a copy of the transcript itself. The top section displays play/pause and 15-second rewind/forward buttons. To edit the text, simply click on the pencil icon to the right of each section. Instead of letting you edit everything all at once, it requires you (as with the mobile app) to click into each individual section before you can make any changes. Previous iterations let you search for individual terms in a transcript, but that feature seems to have disappeared for now. Otter does generate a series of static keywords under the title, but you can't click on them to find instances within the transcript. Until Otter finalizes all these features, I recommend sticking with its mobile apps.
You can also change the speaker IDs for individual blocks of text. Unfortunately, this isn't very accurate and it often mislabels paragraphs as either having too many or too few speakers. That said, it does have some clever features. For example, once you add an ID to a section, it automatically goes through the rest of the transcript and adds it to whatever paragraphs it detects as having the same speaker. It uses the same information to detect the same speaker in any other conversations or recordings. This only partially worked in testing, but it's a promising feature.
To test out the accuracy of the transcription services, I uploaded the same 16-minute recording to each one. The original recording of a three-person conference call came from an Olympus VN-722PC dedicated voice recorder. It's not an easy recording, but all the voices are clearly audible. Although this is not Otter's primary purpose, it's the best way to compare its ASR engine directly to other services.
Otter finished the transcript process in about six minutes. The other automated transcription services completed the task in the range of three to four minutes. The quickest human-based transcription, Rev, only required around an hour for the same task.
Instead of comparing the entirety of each transcript, I chose three paragraphs, one from each speaker on the call. For each snippet of the transcription, I marked an error wherever there was a missing or an extra word. I calculated the overall error rate by dividing the total number of mistakes by the total number of words across the combined sections (in this case, 201 words). The sample for section A is a short introductory section. Section B is slightly longer and uses more complex vocabulary. Section C is even lengthier and contains some technical language.
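For readers who want to run a similar check on their own recordings, the scoring described above can be approximated in a few lines of Python. This is only a sketch of the idea, not the exact procedure used for this review, which was marked up by hand. It assumes you have a trusted reference transcript to compare against; it aligns the two word lists with the standard library's difflib, and it counts a swapped word as an error too, which is an assumption on top of the missing-or-extra rule. The function names and sample sentences are made up for illustration.

```python
from difflib import SequenceMatcher


def count_word_errors(reference: str, transcript: str) -> int:
    """Count words that are missing from, or extra in, the transcript."""
    ref_words = reference.lower().split()
    out_words = transcript.lower().split()
    matcher = SequenceMatcher(None, ref_words, out_words, autojunk=False)

    errors = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "delete":        # words the service dropped
            errors += i2 - i1
        elif tag == "insert":      # words the service added
            errors += j2 - j1
        elif tag == "replace":     # swapped words (assumption: counted as errors)
            errors += max(i2 - i1, j2 - j1)
    return errors


def error_rate(total_errors: int, total_words: int) -> float:
    """Total mistakes divided by total words across the scored sections."""
    return total_errors / total_words


if __name__ == "__main__":
    # Made-up sample text, not taken from the test recording.
    reference = "the quick brown fox jumps"
    transcript = "the quick fox jumps over"

    errors = count_word_errors(reference, transcript)
    words = len(reference.split())
    print(f"{errors} errors in {words} words: {error_rate(errors, words):.0%}")
```

To score several sections at once, add up the error counts for each section and divide by the combined word count (201 words in this review's first test).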
Otter produced excellent results for an automatic service (it only had an error rate of 17 percent), but it still fell short of the human-based services I tested. For comparison, Rev only had an error rate of three percent and Scribie turned in a final copy with six percent. Take a look at the full chart below for the complete breakdown.
I retested all the automatic services, including Otter, with a simpler recording (two people, in person) and calculated the error rate in the same manner, using two samples instead of three. The automatic services fared better with this task as a whole, but they still weren't perfect. Otter actually fell to the middle of the pack with an error rate of 21 percent, though this was not too far off from Trint's 14 percent or Temi's 20 percent. The full results of the second test appear below.
Talk and Transcribe
Wherever and whenever you have an important conversation, you should have some way to record it and turn it into usable data. Otter's focus on real-time transcriptions and sharing makes it an innovative option in the space. It's more consistently accurate than any other automatic transcription service we tested, even if it can't compete with human-based services. After a few iterations, Otter could very well be a front-runner in the category, but for now, we recommend using Editors' Choice Rev for affordable and accurate transcripts.
About the Author
Ben Moore is a Junior Analyst for PCMag's software team. He has previously written for Laptop Mag, Neowin.net, and Tom's Guide on everything from hardware to business acquisitions across the tech industry. Ben holds a degree in New Media and Digital Design from Fordham University at Lincoln Center, where he served as the Editor-in-Chief of The Obse…