Image Captioning Model

Microsoft’s computer vision model will generate alt text for Reddit images

Two years ago, Microsoft announced Florence, an AI system that it pitched as a “complete rethinking” of modern computer vision models. Unlike most vision models at the time, Florence was both “unified ...

TechCrunch

Google Gemini’s AI image model gets a ‘bananas’ upgrade

Google is upgrading its Gemini chatbot with a new AI image model that gives users finer control over editing photos, a step meant to catch up with OpenAI’s popular image tools and draw users from ...

SiliconANGLE

OpenAI launches new GPT Image 1.5 model optimized for image editing

OpenAI Group PBC today launched GPT Image 1.5, a new artificial intelligence model optimized for image generation tasks. The algorithm is rolling out a few weeks after Google LLC introduced a new ...

Ars Technica

Microsoft unveils AI model that understands image content, solves visual puzzles

On Monday, researchers from Microsoft introduced Kosmos-1, a multimodal model that can reportedly analyze images for content, solve visual puzzles, perform visual text recognition, pass visual IQ ...

VentureBeat

Ideogram bolsters AI image generator with description-based referencing

A little over a month after launching its most capable text-to-image ...

SiliconANGLE

Google updates Gemini, adding powerful new AI image model with photo editing capabilities

Google LLC said today it’s updating its Gemini app and chatbot with a powerful new artificial intelligence image model that will give users fine-grained photo editing capabilities. The new model, ...

9to5Mac

New Apple model combines vision understanding and image generation with impressive results

In the study titled MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer, a team of nearly 30 Apple researchers details a novel unified approach that enables both ...

VentureBeat

Google's upgraded Nano Banana Pro AI image model hailed as 'absolutely bonkers' for ...

Infographics rendered without a single spelling error. Complex diagrams one-shotted from paragraph prompts. Logos restored from fragments. And visual outputs so sharp ...

9to5Mac

You can try Apple’s lightning-fast video captioning model right from your browser

A few months ago, Apple released FastVLM, a Visual Language Model (VLM) that offered near-instant high-resolution image processing. Now, you can take it for a spin, provided you have an Apple ...
