The Blue Origin webcast has speed and altitude data vs. time, so I tried to download it and automatically parse the data with Tesseract.
1. Take a screenshot of the webcast once per second.
2. Crop the time, altitude, and speed data. This is easy because the text regions stay in the same part of the frame for the entire webcast.
3. Convert the RGB images to black and white. Compute a grayscale value between 0 and 255 as 0.30*R + 0.59*G + 0.11*B, then apply a tuned cutoff to get a crisp black and white image, e.g. if the grayscale value is greater than 190, make the pixel pure black (this inverts the bright on-screen text into dark glyphs on a light background).
4. Save the resulting black and white image as .png.
5. Run Tesseract on each .png and save the Tesseract output to a .txt file.
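The grayscale-and-threshold step (step 3) can be sketched in pure Python on a list of RGB tuples; a real pipeline would do this with an image library like Pillow, but the per-pixel math is the same. The threshold value of 190 is the cutoff from the text; everything else here is illustrative.

```python
def to_black_and_white(pixels, threshold=190):
    """Map RGB pixels to pure black (0) or pure white (255).

    Pixels whose luma 0.30*R + 0.59*G + 0.11*B exceeds the threshold
    become black, so the bright on-screen telemetry text ends up as
    dark glyphs on a light background for Tesseract.
    """
    out = []
    for r, g, b in pixels:
        gray = 0.30 * r + 0.59 * g + 0.11 * b
        out.append(0 if gray > threshold else 255)
    return out

# A white text pixel maps to black; a dark background pixel maps to white.
print(to_black_and_white([(255, 255, 255), (20, 20, 40)]))  # [0, 255]
```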
It works okay (67% of altitude readings and 94% of speed readings correct out of 478 points), but it struggles to distinguish 3 from 5 in the altitude font. I'll try training Tesseract on each font (altitude uses one font, time/speed another) and see if that helps.