ChatGPT image input prompts: guide screenshots, documents, and UI reviews with less guesswork

A practical ChatGPT image input prompt guide for screenshots, document OCR, and UI reviews. It shows how to specify the focus area, output format, and no-guess rules before you ask.

Uploading an image is not the same as asking the right question

When you attach an image or screenshot to ChatGPT, you may get a broad description back instead of the answer you actually need. If the goal is error diagnosis, text extraction, or UI review, the prompt has to say that up front.

OpenAI's ChatGPT Image Inputs FAQ explains that image inputs are useful for analyzing photos, documents, and visual content, but also notes limitations around ambiguous images, very small text, non-Latin scripts, rotated images, and some graph styles. That is why it helps to specify the focus area, output format, and what the model should not guess.

Once an image input prompt pattern works, it is easier to save it by use case than to search for it again every time.

What to decide first in an image input prompt

State the goal in one line. Image description, OCR, UI review, competitor analysis, and error triage need different depth and structure.
Name the area to inspect. Tell ChatGPT whether to read the whole screen, the top-right error message, or a specific table section.
Pin down the output format. Ask for bullets, a table, three key points, suggested fixes, or a verification checklist.
Add no-guess rules. If text is unreadable or a cause cannot be confirmed from the image, tell it to mark the item for verification instead of filling gaps.
Improve the image when needed. Cropping, enlarging small text, or adding markup to the image often makes the answer more reliable.
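If you assemble prompts in a script or notes file before pasting them into ChatGPT, the first four decisions above map naturally onto a small template. The following is a minimal Python sketch under that assumption; the `ImagePrompt` class, its field names, and the wording it emits are hypothetical and not part of ChatGPT or BananaNL. The fifth step, improving the image itself, still happens outside the prompt.

```python
from dataclasses import dataclass


@dataclass
class ImagePrompt:
    """Hypothetical template for the decisions to make before attaching an image."""

    goal: str               # one-line goal, e.g. error triage, OCR, or UI review
    focus_area: str         # the part of the image ChatGPT should inspect
    output_format: str      # bullets, a table, top-N list, verification checklist...
    no_guess: bool = True   # ask ChatGPT to flag items it cannot read or confirm

    def render(self) -> str:
        """Join the decisions into one prompt string, one instruction per line."""
        lines = [
            f"Goal: {self.goal}",
            f"Focus on: {self.focus_area}",
            f"Output format: {self.output_format}",
        ]
        if self.no_guess:
            lines.append(
                "If any text is unreadable or a point cannot be confirmed from "
                "the image, do not guess. Mark it as needs verification."
            )
        return "\n".join(lines)


if __name__ == "__main__":
    prompt = ImagePrompt(
        goal="Triage the error shown in this screenshot.",
        focus_area="The error message in the top-right corner.",
        output_format="Headings 'Visible facts', 'Possible causes', and 'Next checks'.",
    )
    print(prompt.render())
```

Making the no-guess rule a default rather than an option you have to remember means it only disappears from a prompt when you switch it off on purpose.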

Four elements to include first

Image or screenshot
Focus area
Output format
Needs verification

Where this pattern works well

Screenshot checks	When you want to separate what is visible on an error or settings screen from what still needs to be checked.
Document OCR	When you want to extract text from paper notes, whiteboards, or PDF screenshots and then organize it.
UI review	When you want feedback on CTA visibility, spacing, reading order, and friction from a page screenshot.
Upload failure triage	When image upload or analysis is failing and you want a safe checklist for format, size, permissions, or browser issues.

ChatGPT image input prompts to try

Triage an error screenshot

Look at this screenshot and separate the visible error message, the possible causes, and the next things to check. If any text is unreadable or depends on information outside the screenshot, do not guess. Mark it as needs verification. Use the headings 'Visible facts', 'Possible causes', and 'Next checks'.

It separates visible facts from inference, which helps reduce overconfident answers.

Turn image text into a table

Transcribe the text visible in this image as faithfully as possible. Then organize it into a table with the columns 'Item', 'Content', and 'Hard to read'. Do not fill blanks from context. Keep unreadable parts marked as unclear.

It preserves OCR output and uncertainty separately so a person can verify the hard parts later.

Prioritize UI improvements from a screenshot

Review this webpage screenshot from a UI/UX perspective. Look at CTA discoverability, information order, spacing, and hard-to-read sections. List the top five improvements in priority order. Do not guess hidden flows or performance metrics that are not visible in the screenshot.

It pins down both the review lens and the scope, so the output is easier to turn into concrete changes.

Save recurring image input patterns in BananaNL

Image input prompts tend to diverge by use case: one for screenshots, one for OCR, one for UI review, one for competitor analysis. Saving those patterns is faster and more stable than rewriting them from memory.

BananaNL is a Chrome extension that inserts selected prompts into AI chat inputs such as ChatGPT, Gemini, and Grok. It does not auto-send, so you can attach the image, review the wording, and then decide whether to submit. NotebookLM use starts free, while AI Chat integrations are paid features.

FAQ

What should I write first in a ChatGPT image input prompt?

Start with the goal and the focus area. Tell ChatGPT why you are showing the image and what part matters most.

Can ChatGPT reliably read tiny text or Japanese text in images?

Not always. Small text and non-Latin scripts can reduce accuracy. It is safer to crop or enlarge the relevant area and explicitly ask ChatGPT not to guess text it cannot read.

What should I check first if image upload fails?

Check the file format and size first. Then try a new chat, a private browsing window or a browser session with extensions disabled, photo permissions on mobile, and possible network or VPN issues.
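The first two checks in that answer, format and size, can be run locally before you upload. Below is a minimal Python sketch; the `precheck_image` name, the extension list, and the 20 MB limit are assumptions for illustration, not confirmed ChatGPT limits, so compare them with OpenAI's current help pages and adjust the constants.

```python
from __future__ import annotations

from pathlib import Path

# ASSUMPTIONS for illustration only: confirm the current supported formats
# and size limit in OpenAI's help documentation, then adjust these values.
ASSUMED_EXTENSIONS = frozenset({".png", ".jpg", ".jpeg", ".webp", ".gif"})
ASSUMED_MAX_BYTES = 20 * 1024 * 1024  # assumed 20 MB per image


def precheck_image(
    path: str,
    allowed: frozenset[str] = ASSUMED_EXTENSIONS,
    max_bytes: int = ASSUMED_MAX_BYTES,
) -> list[str]:
    """Return locally detectable format/size problems; an empty list means none found.

    Only the file extension and byte size are checked, not the image contents.
    """
    p = Path(path)
    if not p.is_file():
        return [f"File not found: {p}"]

    problems: list[str] = []
    # Compare the extension case-insensitively, so "shot.PNG" counts as PNG.
    if p.suffix.lower() not in allowed:
        problems.append(f"Unexpected format: '{p.suffix or '(no extension)'}'")

    size = p.stat().st_size
    if size > max_bytes:
        problems.append(f"File is {size} bytes, above the assumed limit of {max_bytes}")
    return problems
```

For example, `precheck_image("error.png")` returns an empty list when nothing was flagged. An empty result does not prove the upload will succeed: the sketch cannot tell whether the file is a valid image, and the remaining steps (new chat, extensions, mobile permissions, network) still have to be checked by hand.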

If searching for prompts is the hard part, use BananaNL

Prompts are most useful when they are right next to the input field. Use BananaNL to put them there, then adjust them before sending.