Why would I, or anyone, want this?
My wife, when she’s not wearing her prosthetic corneal lenses.
Multimodal models have a lot of potential in terms of accessibility.
But fuck Microsoft. You're not fooling anyone.
An image is worth a thousand words. How is reading a text description of what is on the screen going to be better than just looking at the screen yourself, something you'll need to do to read the description anyway? Aside from accessibility for the blind, the practicality of such a technology is questionable.
The motivation behind this is obviously to facilitate the collection and reporting of user profiling data. Accessibility for the blind is only a side effect. Tech companies have been doing it with automated audio transcription for years already; now they're after what you look at on your screen.
This is 100% right: you don't need an AI to describe something you're already looking at. This is an absurd feature (again, aside from the accessibility portion, but that's not what this is).
A small conversation with Copilot makes my laptop choke on RAM, and that stuff is processed in the cloud. There's no way I will allow Microsoft to run an AI locally.
Go and fuck yourselves.
Trust us bro.
Why.