If you're extracting Armenian text from images with an LLM, use
gemini-3-flash-previewwithtemperature: 0. Every other model I tested (claude-haiku-4-5,claude-sonnet-4-6,gpt-5-mini,gpt-5.4-mini) has a categorical weakness that makes it unusable for stylized fonts or specific glyph pairs. And it turns outgemini-3-flash-previewat its default temperature (1.0) silently garbles Armenian ~22% of the time on exactly the same image it handles perfectly at temperature 0.
The setup
Our event pipeline ingests Instagram posts from Armenian event organizers. One post (a concert poster for "Hayk Petrosyan") got stored with a garbled media description — ՀԱՅԱ ՁԵՏՐՈՄՅԱՆ instead of ՀԱՅԿ ՊԵՏՐՈՍՅԱՆ — and I wondered if it was a one-off hallucination or a systematic problem. Answer: both. Case-insensitive whitespace-normalized token matching, scored against 6 tokens on the poster (date, stylized title, performer name, subtitle, tickets-for label, venue name).

The surprising finding: temperature fixes everything
First observation: the same poster image produces different garbled versions across runs — ԱՂՐԷԱ, ԱՃՐԷԱ, ԱՄՐԲԴ instead of ԱՊՐԻԼ. Each failure is different, but not random — they're visually-similar-glyph confusions (Պ↔Ղ, Ի↔Է, Ս↔Մ, Տ↔Ճ), the same mistakes a person would make on stylized fonts.
The tell: every failure is different, but every success is identical and perfect. That's not how "the model can't read Armenian" looks — that would be the same wrong answer every time. It looked like sampling noise.
Which sent me back to a setting I'd honestly forgotten existed: temperature. gemini-3-flash-preview's default is 1.0 — high enough that on tokens where the model is uncertain (small stylized Armenian glyphs, in this case), it rolls the dice between visually-similar candidates instead of committing to its most likely read. Setting temperature: 0 collapses the decoder to its greedy answer.
50/50 perfect runs, 100% on all 6 tokens, no garble.
One line of config. Months of "LLMs are just flaky" chalked up to a default we never reviewed.
The other surprise: newer ≠ better
Everything I'd read online said gpt-5.4-mini should blow gpt-5-mini out of the water — it's the newer model, and the benchmarks back that up. On this task, nope:
gpt-5-miniwithreasoning: minimal— 3.4s latency, $0.81/1k calls, 90% critical passgpt-5.4-miniwithreasoning: low— 8.8s latency, 5% critical pass
At low reasoning, gpt-5.4-mini "hedges" — it ignores the transcription instruction 70% of the time and returns just a visual description. You have to crank up to medium to force it to commit, which costs $17/1k (vs $5/1k for gpt-5-mini medium). The older, cheaper model is strictly better here.
Other notable findings
- Neither OpenAI mini can read stylized Armenian cursive — 0/40 runs combined got the
Մի ձայնhandwritten title. They nail block text but go blind on decorative fonts.gemini-3-flash-previewgets it 100% at temp=0. claude-haiku-4-5ignored the transcription task entirely — returned only visual descriptions ("a man with a guitar"), 0/10.claude-sonnet-4-6tries hard but has systematic glyph confusions thatgemini-3-flash-previewdoesn't share:ՀԱՅԿ→ՀԱԿՈԲ(Hayk → Hakob),Մ→Ս.gemini-3-flash-previewwithmediaResolution: high(1120 tokens/image vs default) did not help — slightly hurt accuracy at n=20.gemini-3-flash-previewwiththinking_level: HIGHalso did not help meaningfully (76% all-6 vs 78% at LOW) and costs 2.85× more.
Full results
| Model | Reasoning | Temp | n | Date Ապրիլ | Title Մի ձայն | Name ՀԱՅԿ ՊԵՏՐՈՍՅԱՆ | Subtitle հեղինակային երգերի երեկո | Tickets Տոմսերի համար | Venue Ակումբ | All 6 | Latency | Cost/1k |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
gemini-3-flash-preview | LOW | 0 | 50 | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 6.0s | $1.05 |
gemini-3-flash-preview | LOW | 1 | 50 | 80% | 94% | 80% | 88% | 80% | 96% | 78% | 6.2s | $1.23 |
gemini-3-flash-preview | HIGH | 1 | 50 | 90% | 90% | 90% | 76% | 90% | 90% | 76% | 9.1s | $3.58 |
gemini-3-flash-preview | LOW · hi-res | 1 | 20 | 70% | 90% | 70% | 80% | 70% | 95% | 70% | 6.5s | $1.33 |
gpt-5-mini | medium | — | 10 | 100% | 0% | 100% | 40% | 100% | 40% | 0% | 27.6s | $4.96 |
gpt-5-mini | minimal | — | 20 | 90% | 0% | 100% | 10% | 85% | 15% | 0% | 3.4s | $0.81 |
gpt-5.4-mini | medium | — | 10 | 100% | 0% | 100% | 70% | 100% | 50% | 0% | 28.6s | $17.21 |
gpt-5.4-mini | low | — | 20 | 15% | 0% | 10% | 0% | 15% | 0% | 0% | 8.8s | $4.44 |
gpt-5.4-mini | none | — | 10 | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 2.3s | $1.74 |
claude-haiku-4-5 | — | — | 10 | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 1.4s | $2.84 |
claude-sonnet-4-6 | — | — | 10 | 100% | 0% | 0% | 90% | 80% | 50% | 0% | 8.9s | $11.07 |