SubtitleNEXT allows you to convert your voice to text through real-time dictation. This feature recognizes and transcribes all popular languages, including English, Español, Français, Italiano, Português, and more. To enable the dictation feature, open the View menu and go to Toolbars > Enable dictation.
A microphone icon will appear in the toolbar at the top of the window.
When you are ready to speak, click the microphone, and the dictation engine will start transcribing your speech to text in the chosen language. To choose a different language, go to File > Properties.
A properties window will open. Select your desired language from the dropdown. The dictation engine will use the selected language to understand and transcribe the dictation.
SubtitleNEXT allows you to correct the dictation while it is being transcribed. You can add punctuation, change spellings, and/or add more text to the subtitles as needed.
Dictation Preview
You can open the dictation preview to check the overall transcription of the video. To do so, go to View > Dictation preview.
A dictation preview window will open and display the overall content of your video.
Dictation Configuration
The transcription is configured in the preferences. To open the preferences, go to Options > Preferences.
The Preferences window will be displayed. Select ‘Dictation’ from the list on the left. Here, the main thing to consider is the ‘Active Provider’, where you select the provider for your dictation.
There are 6 configuration options in total in the ‘Active Provider’ dropdown.
A. None or use external compatible dictation
With this configuration, you cannot use the microphone directly within SubtitleNEXT for dictation. Instead, you need external support from a third-party program. One such program is ‘Dragon NaturallySpeaking’, for which SubtitleNEXT provides special support. With the help of such software, you can use dictation to add subtitles to your videos.
B. UDP Dictation
If you want a third-party application, or another instance of SubtitleNEXT, to provide text or dictation directly to this instance of SubtitleNEXT, select the ‘UDP Dictation’ option from the ‘Active Provider’ dropdown. This option uses the UDP network protocol to stream text for dictation. Any third-party application that streams text through UDP can be your dictation source.
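As a rough illustration of the kind of sender this option can work with, the following Python sketch streams lines of text as UDP datagrams. The host, port, and UTF-8 encoding are assumptions made for the example. This guide does not state the port or message format that SubtitleNEXT listens for, so check the actual values in your UDP Dictation settings.

```python
import socket

# Hypothetical values: the address, port, and text encoding expected by
# SubtitleNEXT are not given in this guide, so adjust these to your setup.
DICTATION_HOST = "127.0.0.1"
DICTATION_PORT = 5005
ENCODING = "utf-8"


def send_dictation(lines, host=DICTATION_HOST, port=DICTATION_PORT):
    """Send each non-empty text line as one UDP datagram; return the count sent."""
    sent = 0
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for line in lines:
            text = line.strip()
            if not text:
                continue  # skip blank lines rather than sending empty datagrams
            sock.sendto(text.encode(ENCODING), (host, port))
            sent += 1
    return sent
```

Calling `send_dictation(["Hello and welcome."])` sends one datagram to the configured address. UDP gives no delivery confirmation, so the sender cannot tell whether the receiving instance actually got the text.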
C. Microsoft SPA Dictation
Microsoft SPA Dictation is a cloud dictation service offered by Microsoft for public use. If you have privacy concerns, you can buy a premium package from Microsoft, which provides dedicated storage on a Microsoft server, or you can request that Microsoft install the server at your company premises. To enable Microsoft SPA Dictation in SubtitleNEXT, select the respective option from the Active Provider dropdown in Preferences.
To configure Microsoft SPA Dictation, SubtitleNEXT will ask you for the following details, which will be provided by either Microsoft or your own administrator, depending on the server package you selected with Microsoft:
· API key, which is a code that configures SubtitleNEXT to use Microsoft for the transcription. To get the API key, log in to https://portal.azure.com.
· Click on “Create a resource” and search for “Speech”.
· Click on “Create” for Speech.
· Enter the data for the fields as in the picture below.
· The confirmation page will also display the Location. Click on “Go to resource”.
· Click on “Keys and Endpoint” and select one of the API keys to enter in the SubtitleNEXT configuration window.
· Paste the API key in the respective field.
· Enter the parameter for Region, End Point, or Host. Select any one of these parameters and add the respective details. Region is where your server is located, if you are using the public server for the dictation stream; End Point is the connection point of your server, if you have subscribed to a premium server from Microsoft; Host is the location of your local server, if the server is at your company premises.
· Custom trained models – If you have custom models, add the respective details in this option. Each custom model has its own ID. Note that a custom trained model works only for the single language it was trained for, so your dictation language must always match the model's language.
· Dictation mode – Enable the dictation mode if you are dictating yourself or expect a single person to dictate. If you want to live stream an event or show, disable the dictation mode.
· Accuracy – Set the accuracy of the text as per the buffer rate. Select a lower delay if the text needs to appear soon after the speech. Increasing the accuracy means maximum delay, which allows the engine to provide its best results.
Click OK. Now you can use this option for dictation to add subtitles in your video.
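The rules described for these settings can be summarized in a small consistency check. The Python sketch below is only an illustration. The `SpeechSettings` class, its field names, and the reading that exactly one of Region, End Point, or Host should be filled in are assumptions for this example, not part of SubtitleNEXT or of any Microsoft API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SpeechSettings:
    """Hypothetical container mirroring the fields described above."""
    api_key: str
    region: Optional[str] = None      # location of the public server
    endpoint: Optional[str] = None    # connection point of a premium server
    host: Optional[str] = None        # location of an on-premises server
    custom_model_language: Optional[str] = None  # language the custom model was trained for
    dictation_language: str = "en-US"


def validate(settings: SpeechSettings) -> List[str]:
    """Return a list of problems; an empty list means the settings look consistent."""
    problems = []
    if not settings.api_key.strip():
        problems.append("API key is missing")
    # Assumption: exactly one of the three connection parameters is filled in.
    chosen = [name for name in ("region", "endpoint", "host") if getattr(settings, name)]
    if len(chosen) != 1:
        problems.append("set exactly one of Region, End Point, or Host")
    # A custom model only works for the language it was trained for.
    if settings.custom_model_language and settings.custom_model_language != settings.dictation_language:
        problems.append("dictation language must match the custom model's language")
    return problems
```

For example, `validate(SpeechSettings(api_key="my-key", region="westeurope"))` returns an empty list, while leaving all three connection parameters empty reports a problem.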
D. Google Speech to Text
Google Speech to Text allows you to accurately convert speech into text using an API powered by Google’s AI technologies. To enable the option, select the respective option from the Active Provider dropdown.
To configure Google Speech to Text, SubtitleNEXT will ask you for the account key. The account key is a text file whose content has to be copied into this field. The following steps describe how to obtain this account key and configure Google Cloud for use with SubtitleNEXT.
· Create a project with a name such as SubtitleNEXT-Project, as shown below.
· Select SubtitleNEXT-Project as the current project.
· Create credentials for a new service account in the API Manager.
· Enter your service account details.
· When the service account is ready, select it from the screen.
· Create the JSON key.
· After the previous step, the JSON file with the credentials for the new service account will be downloaded to your computer. Import the JSON file into the SubtitleNEXT window.
· Further, set the accuracy for the text as per the buffer rate. Select a lower delay if the text needs to appear soon after the speech. Increasing the accuracy means maximum delay, which allows the engine to provide its best results.
· Select Automatic punctuation if required. Its availability depends on the particular language; not all languages support it.
· Select Profanity filter to enable the option. A profanity filter uses a set blacklist/whitelist to allow or deny certain words. It works well for typical four-letter words, especially when they are spelled correctly.
· Select Multi channel recognition if the audio is coming from more than one source or channel. Audio data often includes a channel for each speaker present on the recording. For example, audio of two people talking over the phone may contain two channels, where each line is recorded separately.
· Click on Audio Input at the bottom right corner of the window. The Audio-In window will open. Here you can select your input source and set the number of channels to be used during dictation. This way, Google will be able to recognize speech from separate channels.
Click OK. Now you can use Google Speech to Text for dictation to add subtitles in your video.
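Before copying the account key into SubtitleNEXT, it can help to confirm that the downloaded file is a complete service account key. The Python sketch below checks the key text for a few fields that Google Cloud service account key files normally contain. The function name and the choice of which fields to check are assumptions for this example, and passing the check does not prove that Google will accept the key.

```python
import json

# Fields that a Google Cloud service account key file normally contains.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_account_key(text):
    """Return a list of problems found in the key text (empty if none)."""
    try:
        key = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(key, dict):
        return ["expected a JSON object"]
    # Report every required field that is absent or empty.
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if not key.get(name)]
    if key.get("type") and key["type"] != "service_account":
        problems.append("type is not 'service_account'")
    return problems
```

A typical use is to read the downloaded file and pass its content to `check_account_key`; an empty list means the basic structure looks right.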
E. Scriptix Speech to Text
Scriptix Speech to Text allows you to accurately convert speech into text using an API powered by Scriptix. To enable the option, select the respective option from the Active Provider dropdown.
To configure SubtitleNEXT to use the Scriptix cloud service as the transcription service, enter the API key for Scriptix Live (Realtime Transcription). The API key can be generated at https://api.zoommedia.ai/. Follow the steps below to create a new account and acquire the API key.
· Enter your billing profile (in the “Profile & Settings”) and select your preferred subscription (“Subscription & Bundles”)
· Go to “API Tokens”
· Select “Realtime” token type:
· Copy the realtime API token from the API Tokens page into the SubtitleNEXT window.
· Further, set the accuracy for the text as per the buffer rate. Select a lower delay if the text needs to appear soon after the speech. Increasing the accuracy means maximum delay, which allows the engine to provide its best results.
· Click OK. Now you can use Scriptix Speech to Text for dictation to add subtitles in your video.
F. SAPI Dictation
SAPI Dictation allows you to accurately convert speech into text using the speech recognition built into Windows. You can configure the speech recognition options in the settings of the Windows version that you are using. To enable the option, select the respective option from the Active Provider dropdown.
Windows Speech Recognition is installed by default. If need be, you can install it manually through the Control Panel: go to Control Panel > Voice Recognition. Third-party engines can also be installed to serve the purpose. Once the speech recognition software is in place, set the accuracy for the text as per the buffer rate. You are all set to proceed.