This week Microsoft introduced Seeing AI, a research project that uses a smartphone or Pivothead smart glasses app to describe what’s happening in the world, helping a blind person read signs or menus, identify the emotional states of people in a room, and more.
While Seeing AI isn’t yet available to the public, Microsoft has released a tool that uses some of the same technology. CaptionBot is a service that attempts to determine the contents of a photo and create text-based descriptions.

CaptionBot is still very much a work in progress, but it uses Microsoft’s Computer Vision, Emotion, and Bing Image APIs to determine what’s going on in a photograph and describe it in natural language.
It also uses machine learning, which means that the more photos it analyzes, the better it should get, particularly if people provide ratings for its captions (and don’t abuse the system).
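To make the API side of this concrete, here is a minimal sketch of how a client might call the Computer Vision "describe" endpoint that CaptionBot draws on. The host, API version path, placeholder key, and helper names here are assumptions for illustration, not Microsoft's published CaptionBot code; the service returns candidate captions with confidence scores.

```python
import json
import urllib.request

# Assumed Cognitive Services host and v1.0 "describe" path; your region
# and subscription key would differ.
API_HOST = "https://westus.api.cognitive.microsoft.com"
SUBSCRIPTION_KEY = "YOUR_KEY_HERE"  # placeholder, not a real key


def build_describe_request(image_url, max_candidates=1):
    """Build (but do not send) a POST request asking the API to
    describe the image at image_url in natural language."""
    body = json.dumps({"url": image_url}).encode("utf-8")
    return urllib.request.Request(
        f"{API_HOST}/vision/v1.0/describe?maxCandidates={max_candidates}",
        data=body,  # presence of data makes this a POST
        headers={
            "Ocp-Apim-Subscription-Key": SUBSCRIPTION_KEY,
            "Content-Type": "application/json",
        },
    )


def caption_image(image_url):
    """Send the request and return the top natural-language caption."""
    req = build_describe_request(image_url)
    with urllib.request.urlopen(req) as resp:
        result = json.load(resp)
    # Captions are nested under description.captions, each with a
    # "text" string and a "confidence" score.
    return result["description"]["captions"][0]["text"]
```

In this sketch the request construction is separated from the network call so the payload can be inspected or tested without a subscription key.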
Over the past few years, we’ve seen Microsoft introduce similar tools that attempt to guess your age from photographs, decide if you look like a celebrity, and more. While these things might seem gimmicky on their own, they’re helping Microsoft’s software get better at identifying and describing visual imagery, with applications ranging from sharpening Bing’s image search to advancing the Seeing AI technology that helps blind people navigate the world in new ways.
via Microsoft Blogs