I made an app that records audio and writes down what people are saying near your microphone. It started as an audio lifelogging app on Android, where the idea was to transcribe nearby audio for a full day and search through it (possibly much later) on a timeline. A memory bank of sorts. I’ve since made it cross-platform and steadily added features as I’ve needed them or people have requested them.
It’s powered by OpenAI’s largest Whisper model, plus some other preprocessing models. It works well in lots of scenarios: in-person meetings, Zoom calls, podcasts, and lectures are a few I’ve used it for. The app settings expose Whisper’s translation-to-English feature, as well as multilingual settings. There are data export features for audio files, text, and CSVs of speech and timestamps.
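To give a rough idea of what a CSV of speech and timestamps could look like, here is a minimal sketch. It is not the app's actual export code. It assumes each transcribed segment is a dict with `start`, `end` (in seconds), and `text` keys, which is the shape the open-source Whisper package returns; the function name `segments_to_csv` and the column names are made up for illustration.

```python
import csv
import io


def segments_to_csv(segments):
    """Serialize transcript segments to CSV text.

    Assumes each segment is a dict with 'start', 'end' (seconds)
    and 'text' keys. Column names are illustrative only.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["start_sec", "end_sec", "text"])
    for seg in segments:
        writer.writerow([
            f"{seg['start']:.2f}",
            f"{seg['end']:.2f}",
            seg["text"].strip(),  # Whisper text often has a leading space
        ])
    return buf.getvalue()


if __name__ == "__main__":
    example = [
        {"start": 0.0, "end": 2.5, "text": " Hello, world."},
        {"start": 2.5, "end": 4.0, "text": " Let's get started."},
    ]
    print(segments_to_csv(example))
```

Using the `csv` module rather than joining strings by hand means commas and quotes inside the spoken text are escaped correctly.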
It has some problems with hallucinations. Silence is sometimes transcribed as meaningless phrases like “Thanks for watching”, so I added a bug reporting feature to help train some better filtering models.