There’s a good reason why this blog post’s featured image says something very negative. I need it to be unquestionably unique because it’s a test.
Almost a year ago, I came across a Chrome extension called Project Naptha, which allows you to highlight and extract text from an image. I was floored by the possibility. Then I saw that SnagIt from TechSmith – my go-to screen capture software – could now also grab text from a screenshot.
Surely if a Chrome extension and a screen capture program could read and extract text from an image, Google should be able to as well. Right?
History and Context
Knowing what has come before gives a clue about what to expect in the future. Below are some technologies that have been within consumers’ reach over the last few decades.
Optical Character Recognition (OCR)
First, we already know there is some precedent for this. Optical character recognition (OCR) is a technology found in many consumer-level scanners and software. It allows you to scan a document and have the text extracted to a word processing program. Early on, however, the source needed to be printed text, and the fonts could not vary widely. Additionally, accuracy wasn’t high, so the output required review and cleanup.
You might be familiar with Google’s reCAPTCHA tests, which validate human input and deter spam and automated form submissions. Every time a user successfully passed validation, it helped Google’s own OCR software become smarter. Having a human recognize words or phrases that the OCR software couldn’t read increased the software’s accuracy. This helped Google digitize the entire New York Times archive.
More recently, reCAPTCHA has us picking out buses, cars, chimneys, storefronts, bridges, and so on. This version, called No CAPTCHA reCAPTCHA, reduces the friction and user frustration of entering the wrong text. Instead, the user simply clicks on one or more images that match the request. Here, Google is improving its image recognition software, which in turn improves its image search capabilities and Street View.
Mobile Check Cashing
Cashing your Grandma’s birthday check has never been easier thanks to mobile check cashing. Take a picture of the back and front of your check and boom, money in your account. This is just another instance of OCR benefiting consumers.
Image to Chart
Microsoft Excel’s mobile app now has the capability of capturing data from a picture of a chart and importing it into a spreadsheet. This feature is in beta, and the review on Lifehacker states it’s not perfect.
So we know that there are precedents for such software, but can Google actually read, understand, and leverage text within images in its search results?
Google has stated that its bots and index cannot read images, but SEO Roundtable explains that Google does have a patent for reading text in images. So Google says it has the technology; it just isn’t using it in that way.
Hold on there for a second.
Some quick Googling can provide more insight. Many articles present evidence suggesting that Google can and does read text in images, and might have for some time.
Can I replicate these findings? I’m gonna try. That is how you discover new SEO research. Therefore, the featured image in this post says something that I would not normally write about myself, so that the text is unique.
To be clear, the text in the image will not appear anywhere else. This includes the filename, ALT text, embedded data, etc. It will only appear as text in the image. My WordPress installation, like many, will generate a few variant images for specific use cases (e.g., thumbnails), so there may be a few different versions that are found and indexed. I’m uncertain which version will rank, although I believe the primary featured image will.
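For anyone who wants to run a similar test, here is a minimal sketch of how you might double-check that the phrase really doesn’t leak into the page markup. It is not what I actually ran; the names `LeakFinder` and `find_leaks` and the sample phrase are made up for illustration. It only looks at the HTML you hand it (visible text plus attribute values such as ALT text and image filenames) and does not inspect metadata embedded in the image file itself.

```python
from html.parser import HTMLParser


def normalize(text):
    # Lowercase and treat hyphens/underscores as spaces, so a filename like
    # "my-test-phrase.png" is compared the same way as "my test phrase".
    return text.lower().replace("-", " ").replace("_", " ")


class LeakFinder(HTMLParser):
    """Hypothetical helper: records where a phrase shows up in page markup."""

    def __init__(self, phrase):
        super().__init__()
        self.phrase = normalize(phrase)
        self.leaks = []

    def handle_starttag(self, tag, attrs):
        # Check every attribute value (alt, title, src filename, and so on).
        for name, value in attrs:
            if value and self.phrase in normalize(value):
                self.leaks.append(f"<{tag}> attribute '{name}'")

    def handle_data(self, data):
        # Check the text between tags.
        if self.phrase in normalize(data):
            self.leaks.append("text content")


def find_leaks(html, phrase):
    """Return a list of places in the HTML where the phrase appears."""
    finder = LeakFinder(phrase)
    finder.feed(html)
    finder.close()
    return finder.leaks


if __name__ == "__main__":
    # Made-up markup: the phrase leaks through the thumbnail's filename.
    page = '<img src="/uploads/example-secret-phrase-150x150.png" alt="a sign">'
    print(find_leaks(page, "example secret phrase"))
```

An empty list means the phrase was not found in the markup you checked. Generated thumbnails are worth including in the check, since they usually inherit the original filename.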
I’m going to give it about 3 months before I report back my findings. Since this website is still pretty new in the eyes of Google, I suspect the algorithm is still sorting out where it belongs.