Using AI to streamline video localisation

The advent of generative AI tools such as ChatGPT has thrust AI into the spotlight in recent times, but AI has been utilised in the captioning and localisation industry for a number of years now. For instance, CaptionHub’s customers have been using its AI capabilities since 2015.

The various AI tools you have at your disposal can save you time, streamline your localisation efforts, and drive ROI, but how can we apply these to localising video content?

How do I localise my video?

First, let’s break down the key steps in video localisation, namely:

  • Transcribing, or turning the audio and moving image into a written text transcription, timed to the audio itself
  • Creating audio descriptions
  • Translating that original language transcription into additional languages, and deploying domain- and context-specific translation
  • Identifying on-screen text and ensuring translations are provided here, too
  • Producing human-readable captions, ensuring they are timed exactly to the source video
  • Providing voiceover in translated languages
  • Developing and translating associated video metadata

The process for a single video can be time-consuming, particularly if a wide range of languages is required. Now imagine that this process has to be repeated for hundreds or thousands of videos. These are some of the challenges that our customers encounter on a daily basis, and AI tools not only help overcome them but also save huge sums of money and time in the process.

What AI tools are available for localisation and captioning?

Taking the steps outlined above, let’s look at a typical workflow in CaptionHub and how we can speed this up with AI:


There are a number of AI-powered transcription platforms and tools that use speech recognition algorithms to turn the audio in a video file into written text. Here at CaptionHub, we’ve partnered with both Speechmatics and Amazon Transcribe, which are market leaders in the speech-to-text field. The Custom Dictionaries feature allows a list of custom words to be added for each transcription job. This helps when a specific word is not recognised during transcription, for example because it is a company or person's name that is not in the vocabulary for that language. Adding custom words improves the likelihood that they will appear in the output.
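To make the idea of per-job custom words more concrete, here is a minimal sketch in Python. It assumes a Speechmatics-style job configuration, where custom words are passed in an `additional_vocab` list with optional `sounds_like` hints. The function name `build_transcription_config` and the sample words are hypothetical, and this is not a description of how CaptionHub builds its jobs internally; check the field names against your speech-to-text provider's current documentation.

```python
import json


def build_transcription_config(language, custom_words):
    """Build a job config dict carrying a per-job custom word list.

    `custom_words` maps each word to a list of optional pronunciation
    hints. The field names ("additional_vocab", "content", "sounds_like")
    are an assumption based on a Speechmatics-style configuration.
    """
    additional_vocab = []
    for word, sounds_like in custom_words.items():
        entry = {"content": word}
        if sounds_like:
            # Only include hints when some were supplied.
            entry["sounds_like"] = list(sounds_like)
        additional_vocab.append(entry)

    return {
        "type": "transcription",
        "transcription_config": {
            "language": language,
            "additional_vocab": additional_vocab,
        },
    }


# Hypothetical custom words: a brand name and a person's name.
config = build_transcription_config(
    "en",
    {"CaptionHub": [], "Siobhan": ["shiv-awn"]},
)
print(json.dumps(config, indent=2))
```

Keeping the word list as plain data means the same list can be reused across every video in a batch, which matters when the same brand and product names recur in hundreds of files.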


Captioning presents an additional challenge: some languages feature much longer words, or have comfortable reading speeds that differ from those of the source language. With CaptionHub’s Natural Captions®️ technology, we instantly create perfectly aligned captions, potentially saving hours of linguists’ time.
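The reading-speed problem can be illustrated with a short Python sketch. It measures each caption in characters per second and flags any that exceed a limit. The `Caption` class, the `too_fast` helper, the sample captions, and the limit of 15 characters per second are all assumptions made for illustration; this is a simple check, not the Natural Captions algorithm.

```python
from dataclasses import dataclass


@dataclass
class Caption:
    start: float  # seconds from the start of the video
    end: float    # seconds from the start of the video
    text: str


def chars_per_second(caption):
    """Reading speed of one caption, in characters per second."""
    duration = caption.end - caption.start
    if duration <= 0:
        raise ValueError("caption must have a positive duration")
    return len(caption.text) / duration


def too_fast(captions, max_cps=15.0):
    """Return the captions whose reading speed exceeds max_cps.

    The default limit is a hypothetical value; real limits vary by
    language, audience, and style guide.
    """
    return [c for c in captions if chars_per_second(c) > max_cps]


# The same two-second slot, in English and in a longer German translation.
english = Caption(start=0.0, end=2.0, text="Welcome to the product tour.")
german = Caption(start=0.0, end=2.0, text="Willkommen zur Produktvorstellung.")

for caption in too_fast([english, german]):
    print(f"Too fast: {caption.text!r} at {chars_per_second(caption):.1f} cps")
```

A translated caption that inherits the source timing can fail a check like this even when the original passes, which is why translated captions often need to be re-timed or re-split rather than simply swapped in.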


Next is the process of translating the original language transcription into as many additional languages as required. Again, there are a number of AI-powered machine translation tools and platforms available, offering near-instant translations in hundreds of languages. CaptionHub offers a neural machine translation service that delivers fast, high-quality, and affordable translation using Amazon Translate, Google, Systran, and Language Weaver, as well as direct access to over 30 other machine translation engines through our connected TMS platforms such as Phrase.

On-screen text identification

CaptionHub detects and extracts on-screen text from the video and parses it into formatted text. This text can then be translated using the same AI translation tools, and placed back onto the video in the same position.


Voiceover is another area where AI can save both time and resources. Compared to the traditional process of booking out a recording studio and systematically recording each translated voiceover one by one, synthetic text-to-speech AI tools can complete this process within minutes, and at a fraction of the cost. Using advancements in Neural Text to Speech (NTTS) systems, synthetic voiceovers are now even more natural and human-like, further increasing the quality of output.

How much time and cost will AI tools save me?

The answer depends entirely on your existing workflow but, as an example, one of our customers recently used the CaptionHub platform, with a fully AI-enabled workflow, to transcribe, translate, caption, and publish over 2,000 videos, totalling over 200,000 minutes of captioning, in less than a single weekend. The original time estimate for this process was in the region of weeks, if not months, with an associated cost that was prohibitive.

If you’re looking to see how AI can help you localise your content, and streamline and improve your time to market while saving significant costs, get in touch with CaptionHub today. As experts in the captioning and localisation field, and having been early adopters of AI from our inception, we’ll help you get set up and running in no time at all.