Subtitles - Am I doing it right?

daedal
I started working on subtitles for my latest project and realized it was going to be more work than previously anticipated.
I wonder if I'm using Subtitle Edit correctly, because it's producing weird results. To explain:
Under 'OCR correction/Spell checking' I select English, but in the final text I see letters with accents ('öiö' instead of 'did', for example).
Other mistakes are frequent too, like 'g' becoming 'q'.
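
For an all-English subtitle file, the accented-letter symptom described above can be spotted automatically before proofreading by hand. The following is only a minimal sketch: the function name `suspicious_cues` and the sample text are made up for illustration, and it assumes a standard .srt layout (cue number, timing line, then text, with blank lines between cues). It will also flag legitimate words such as "café", so treat the result as a list of cues to check rather than a list of errors.

```python
import re

def suspicious_cues(srt_text):
    """Return (cue_number, text) pairs whose text contains non-ASCII letters."""
    hits = []
    # Cues in an .srt file are separated by blank lines.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        if len(lines) < 3:
            continue  # not a complete cue (number, timing, text)
        number, text = lines[0].strip(), " ".join(lines[2:])
        # Flag letters outside the ASCII range, e.g. 'ö' in an English line.
        if any(ord(ch) > 127 and ch.isalpha() for ch in text):
            hits.append((number, text))
    return hits

# Hypothetical sample, modelled on the 'öiö' example from the post.
sample = """1
00:00:01,000 --> 00:00:02,500
What öiö you say?

2
00:00:03,000 --> 00:00:04,000
Nothing at all.
"""
print(suspicious_cues(sample))
```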

My understanding of the craft right now is as follows:
  • Extract or demux PGS (*.sup file) from the source movie. (I'm using tsMuxer to get the sub from an mkv)
  • Convert the *.sup file to a *.srt file (since Premiere Pro, among others, doesn't like graphic subs)
    • I'm using Subtitle Edit OCR functionality
  • Import *.srt file into Premiere Pro (which slows it down considerably when trying to move subtitle clips around)
  • Realign the clips to match my new edit
  • Create new subtitle clips for additions in my edit
  • Export file
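
When a whole stretch of subtitles has to move by the same amount, the realignment step in the list above can in principle be done on the .srt text itself rather than by dragging clips. This is a rough sketch, not a feature of Premiere Pro or Subtitle Edit: `shift_srt` is a made-up helper name, and it assumes timing lines in the usual `HH:MM:SS,mmm --> HH:MM:SS,mmm` form. A real fanedit usually needs different offsets for different sections, which this does not handle.

```python
import re

# Matches one SRT timestamp such as 00:01:02,345
TIMESTAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")

def _shift_one(match, offset_ms):
    hours, minutes, seconds, millis = (int(part) for part in match.groups())
    total = (hours * 3600 + minutes * 60 + seconds) * 1000 + millis + offset_ms
    total = max(total, 0)  # clamp so a cue never starts before 00:00:00,000
    hours, rest = divmod(total, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"

def shift_srt(srt_text, offset_ms):
    """Shift every timing line by offset_ms (negative values move cues earlier)."""
    shifted = []
    for line in srt_text.splitlines():
        # Only touch timing lines, so dialogue text is never altered.
        if "-->" in line:
            line = TIMESTAMP.sub(lambda m: _shift_one(m, offset_ms), line)
        shifted.append(line)
    return "\n".join(shifted)

cue = "1\n00:00:58,900 --> 00:01:01,250\nHello there."
print(shift_srt(cue, 1500))
```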
My biggest disappointment is that I have to go through almost all the dialogue in the movie to make corrections. It is really time-consuming (and boring). I wonder if there is a better way?
Since I started, I realized it was easier to go back to Subtitle Edit and make my corrections there instead of inside Premiere Pro, but it is still a tedious process.

Am I doing it right?
 
I use much the same method, doing all the work in Subtitle Edit. It certainly is tedious, but I see it as a necessary evil.
 
Do you get a lot of mistakes like the ones I mentioned? 'g' becoming 'q' and so on?
 
When opening the .sup file, I use the Tesseract 3.02 model as the OCR method, as all the other models tend to mangle letters and words in the way you describe. Even it isn't perfect, though, and it rarely gets accented letters correct. On average I probably have to correct about 10% of the resulting .srt file.
 
Oh ok, thanks, I'm going to try this Tesseract model. For the accented letters, it's the other way around: it creates accents in an all-English subtitle, which is weird in my opinion. If it takes the time to ask me what language I am using, I expect it not to guess an accented letter when English is selected... But anyway, not arguing here, just happy you could suggest a possible solution. Thanks!
 
The Tesseract 3.02 model works much better than the default 'Binary image compare' one. I also downloaded Tesseract 5.3.3 but haven't seen much difference between the two Tesseract versions.
Overall it's a great improvement for subtitle conversions. Thanks a lot!

On another note, I have also tried the Premiere Pro feature that automatically creates subtitles from the audio in the timeline. It is not doing well with my file. There are a lot of corrections to be made, even more than with my first attempt using Subtitle Edit. I'll have to try with other movies to see if there is a difference, but for now, I'm not impressed.
 
To correct the letters with accents on, you can just open the file in Notepad and click Edit > Replace to replace ö with d through your entire document.
Then after that, run a spellcheck in whatever program you have that supports it.
You might have instances where the wrong letter has produced an actual word that is incorrect, but I don't see what you can do about that... unless you can ask Subtitle Edit to change what letters it is detecting with specific images. I haven't used it in a little while, but I do recall having a mode where it would advise me if its prediction wasn't producing a known word and I could tell it what letters to recognise that shape as.
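
The find-and-replace step described above can also be scripted if there are several letters to swap at once. This is a minimal sketch under assumptions: the mapping only contains the 'ö' to 'd' case mentioned in this thread, and `fix_ocr_letters` is a made-up name. Any other pairs would have to be added after checking what the OCR actually got wrong in a given file.

```python
# Mapping of misread characters to the intended ones.
# Only 'ö' -> 'd' comes from this thread; extend it for your own file.
OCR_FIXES = str.maketrans({"ö": "d"})

def fix_ocr_letters(text):
    """Replace every misread character listed in OCR_FIXES."""
    return text.translate(OCR_FIXES)

print(fix_ocr_letters("What öiö you say?"))
```

To apply it to a whole file, read the .srt as text with `encoding="utf-8"`, pass the contents through `fix_ocr_letters`, and write the result to a new file so the original stays untouched.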

edit: yes here it is!!
I thought I remembered doing this.
[Screenshot attachment: image.png]
 
To clarify, you'll need to play with the settings here. For example, the comparison image is wrong, so I pressed delete and asked for it to add a better match. I also adjusted the 'number of pixels is space' setting, and that creates problems when text is italic. I'm still trying to find a balance.
 
I see you are using the binary image compare method. Changing this method to Tesseract 3.02 solved the accented letters problem entirely.
As for replacing letters or words, no need to use Notepad. Subtitle Edit has a function that can do it as well (discovered after spending time going through all the menus late last night).
For my g's being q's, I did exactly that and corrected only a handful of words with the letter q instead of correcting every third word ... There are so many g's in English!
Thanks for your input. It is appreciated!
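
For the 'g' read as 'q' problem specifically, one possible shortcut is to lean on the fact that in English a 'q' is almost always followed by 'u'. The sketch below is only a heuristic under that assumption, and `fix_q_for_g` is a made-up name: it replaces a lowercase 'q' with 'g' whenever the next letter is not 'u'. It will miss cases like "quess" for "guess", it ignores capital letters, and it will wrongly change rare words and names such as "Iraq", so a quick review afterwards is still needed.

```python
import re

# A lowercase 'q' that is NOT followed by 'u' is probably a misread 'g'.
Q_NOT_BEFORE_U = re.compile(r"q(?!u)")

def fix_q_for_g(text):
    """Replace likely-misread 'q' characters with 'g'."""
    return Q_NOT_BEFORE_U.sub("g", text)

print(fix_q_for_g("I'm qoinq to be quick."))
```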
 
Thanks for that tip too!
I did just try different compare methods myself. I'm not sure which ones I had checked, but it's fun to play with. I know I didn't try Tesseract, so I'll try it next time :)
 