Question 1

What is AI screen vision in Questie?

Accepted Answer

AI screen vision in Questie is a feature that lets your companion see and interpret what's on your monitor in real time using a Vision Language Model (VLM). Questie periodically captures your screen, the VLM analyzes the visual content to understand the game state, and your companion uses that understanding to generate contextually relevant voice responses. It reacts to boss fights, inventory changes, cutscenes, and other on-screen events as they happen, without you needing to describe them.

Question 2

How does the Vision Language Model work?

Accepted Answer

A Vision Language Model combines computer vision (understanding images) with language generation (producing text). It takes screen captures, identifies objects, UI elements, and scene context within the image, and generates a semantic description of what's happening. That description gets added to the context fed to your AI companion, so their response is informed by visual reality and not just the conversation history. Questie's implementation is optimized for gaming content — UI elements, health bars, inventory screens, and common game environment types.
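As a rough illustration of the flow described above (capture, describe, add to context), here is a minimal Python sketch. It is not Questie's implementation: `describe_frame` is a hypothetical stub standing in for a real VLM call, and `CompanionContext`, `max_visual`, and the prompt layout are assumptions made for the example.

```python
from collections import deque
from dataclasses import dataclass, field


def describe_frame(frame_pixels: bytes) -> str:
    """Hypothetical stub for a VLM call. A real system would send the image
    to a vision-language model and return its text description."""
    return f"frame of {len(frame_pixels)} bytes: boss health bar visible"


@dataclass
class CompanionContext:
    max_visual: int = 3                      # assumed cap on recent screen descriptions
    conversation: list = field(default_factory=list)

    def __post_init__(self) -> None:
        # A bounded deque drops the oldest description automatically.
        self.visual = deque(maxlen=self.max_visual)

    def add_frame(self, frame_pixels: bytes) -> None:
        # Only the text description is kept, not the pixels.
        self.visual.append(describe_frame(frame_pixels))

    def build_prompt(self, player_line: str) -> str:
        # Screen descriptions and conversation history are merged into one
        # context so the reply reflects what is on screen, not just the chat.
        parts = ["[Screen]", *self.visual, "[Conversation]", *self.conversation]
        parts.append(f"Player: {player_line}")
        return "\n".join(parts)


if __name__ == "__main__":
    ctx = CompanionContext(max_visual=2)
    for frame in (b"a", b"bb", b"ccc"):
        ctx.add_frame(frame)
    print(ctx.build_prompt("Any tips for this fight?"))
```

A bounded window is one reasonable design choice here: it keeps the prompt short and biases the companion toward what is on screen now rather than several scenes ago.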

Question 3

Does screen vision affect game performance?

Accepted Answer

Not noticeably. Screen vision runs as a background process that captures and analyzes screenshots; it doesn't hook into the game process, modify rendering, or add to your GPU's game workload. Captures are taken periodically, at a frequency designed to catch meaningful events rather than running continuously, and the VLM analysis happens server-side, not on your local hardware. You should not see frame rate drops, stuttering, or input latency from having screen vision active.
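The answer above says captures are timed to catch meaningful events without running constantly. One common way to get that behavior, shown here as an assumption rather than Questie's documented method, is to combine a minimum and maximum interval with a simple frame-difference check. `CaptureGate` and its thresholds are hypothetical, and frames are modeled as flat lists of grayscale values to keep the example self-contained.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CaptureGate:
    """Hypothetical policy for deciding whether a frame is worth analyzing.
    All thresholds are illustrative, not Questie's real values."""
    min_interval_s: float = 2.0     # never analyze more often than this
    max_interval_s: float = 10.0    # always refresh at least this often
    change_threshold: float = 12.0  # mean absolute grayscale difference (0-255)
    _last_time: Optional[float] = field(default=None, init=False)
    _last_frame: Optional[List[int]] = field(default=None, init=False)

    def should_send(self, now: float, frame: List[int]) -> bool:
        if self._last_time is None or self._last_frame is None:
            return self._accept(now, frame)          # first frame always goes through
        elapsed = now - self._last_time
        if elapsed < self.min_interval_s:
            return False                             # too soon after the last analysis
        if elapsed >= self.max_interval_s:
            return self._accept(now, frame)          # periodic refresh even if static
        diff = sum(abs(a - b) for a, b in zip(frame, self._last_frame)) / len(frame)
        if diff >= self.change_threshold:
            return self._accept(now, frame)          # screen changed enough to matter
        return False

    def _accept(self, now: float, frame: List[int]) -> bool:
        # Remember the last analyzed frame so later frames are compared to it.
        self._last_time = now
        self._last_frame = list(frame)
        return True
```

The minimum interval bounds how often analysis can run, the difference check responds quickly to sudden changes such as a boss appearing, and the maximum interval keeps context fresh during slow, static scenes.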

Question 4

What games does screen vision work with?

Accepted Answer

Screen vision works with any game running in windowed or borderless windowed mode on your desktop. It reads the visual output, not game-specific data, so no integration with individual games is required. RPGs, survival games, strategy titles, shooters, MOBAs, story-driven games, and simulation games all work. Exclusive fullscreen mode may limit capture depending on your OS configuration, so borderless windowed (windowed fullscreen) is the recommended mode.

Question 5

Can I control when screen vision is active?

Accepted Answer

Yes. Screen vision is opt-in and controlled by a toggle you can use at any point during your session. You can turn it on at the start of a boss fight and off during loading screens or menus. When it's disabled, screen capture stops and your companion switches to voice-only mode, responding from conversation context alone. The toggle takes effect immediately, with no cooldown or restart required.
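A toggle that takes effect immediately can be as simple as a flag the capture loop checks on every tick. The sketch below is hypothetical: `VisionToggle` and `capture_tick` are illustrative names, and clearing screen context when vision is switched off is an assumed design choice. It starts disabled to mirror the opt-in default.

```python
import threading
from typing import Callable, List


class VisionToggle:
    """Hypothetical on/off switch for screen vision. It starts off (opt-in),
    and the capture loop reads it on every tick, so a change applies on the
    next tick without a restart."""

    def __init__(self) -> None:
        self._enabled = threading.Event()   # cleared by default, i.e. off

    def set(self, on: bool) -> None:
        if on:
            self._enabled.set()
        else:
            self._enabled.clear()

    @property
    def enabled(self) -> bool:
        return self._enabled.is_set()


def capture_tick(toggle: VisionToggle,
                 grab_frame: Callable[[], str],
                 visual_context: List[str]) -> str:
    """One iteration of a hypothetical capture loop."""
    if not toggle.enabled:
        # No capture happens at all; the companion works from conversation only.
        visual_context.clear()
        return "voice-only"
    visual_context.append(grab_frame())
    return "vision"
```

Using `threading.Event` lets a UI thread flip the switch while a separate capture thread reads it, without extra locking.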

Question 6

Does Questie store or record my screen captures?

Accepted Answer

Screen captures are not kept beyond the current session's active context window. The VLM processes each capture to extract semantic information and then discards the raw image. Questie does not archive, log, or retain screenshots from your sessions; the analysis is used only to generate your companion's responses in the moment.
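The capture, describe, and discard flow can be sketched as follows. This is an illustration under assumptions, not Questie's code: `SessionVisualMemory`, `ingest`, and `end_session` are hypothetical names, and the `describe` callback stands in for the VLM. Only text descriptions are kept, the caller's raw frame buffer is overwritten once it has been described, and everything is cleared when the session ends.

```python
from typing import Callable, List


class SessionVisualMemory:
    """Hypothetical per-session store that keeps only text descriptions,
    never pixels, and is emptied when the session ends."""

    def __init__(self, describe: Callable[[bytes], str]) -> None:
        self._describe = describe          # stand-in for the VLM call
        self.descriptions: List[str] = []

    def ingest(self, frame: bytearray) -> None:
        text = self._describe(bytes(frame))
        # Overwrite the caller's raw pixel buffer once it has been described.
        frame[:] = bytes(len(frame))
        self.descriptions.append(text)

    def end_session(self) -> None:
        # Drop all extracted descriptions at the end of the session.
        self.descriptions.clear()
```

In Python, overwriting one buffer does not guarantee that no other copy exists in memory; a production system would rely on its runtime and server-side retention policies for that guarantee.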

Question 7

Is screen vision better than describing my gameplay to the AI?

Accepted Answer

Yes, significantly. Describing gameplay adds latency, interrupts your focus, and is inherently incomplete: you're narrating what happened, not what's happening right now. Screen vision removes that delay, so your companion reacts to the moment as it occurs rather than to your after-the-fact description. Responses are also more accurate because nothing is lost in translation between what you saw and what you chose to type or say about it.

Question 8

How does screen vision work for streaming on Twitch?

Accepted Answer

For streamers, screen vision means your AI co-host reacts to the same gameplay your audience is watching — at the same time. The reactions are genuine and timely rather than prompted. Boss kills, clutch moments, funny failures, and plot twists get real-time vocal reactions from your companion as they happen. This creates organic highlight moments that feel like co-op streaming rather than a streamer talking at a chatbot. The audio routes through OBS alongside your microphone for clean broadcast integration.

AI Screen Vision: Your Companion Watches Your Gameplay and Reacts in Real Time

What Is AI Screen Vision?

How Screen Vision Works: Four Steps

The VLM Captures Your Screen

The Model Understands What It Sees

Your Companion Reacts in Real Time

Privacy Toggle — Your Control

What Your Companion Actually Reacts To

Boss Fights and High-Tension Moments

Inventory, Economy, and Resource Management

Clutch Plays and Highlight Moments

Story Moments and Narrative Beats

A Real Session Example

Screen Vision Beyond Games

Watching Movies and Shows Together

Creative Work and Productivity

Streaming Reaction Content

Why Questie Is the Only AI Companion with Real Gaming Screen Vision

Character AI Cannot See Your Screen

Generic AI Assistants See Text, Not Gameplay

Screen Vision plus Memory Creates Actual Game Awareness

AI Screen Vision: Common Questions