Vahe Hovhannisyan

Best LLM for Armenian OCR: a small investigation


If you're extracting Armenian text from images with an LLM, use gemini-3-flash-preview with temperature: 0. Every other model I tested (claude-haiku-4-5, claude-sonnet-4-6, gpt-5-mini, gpt-5.4-mini) has a categorical weakness that makes it unusable for stylized fonts or specific glyph pairs. And it turns out gemini-3-flash-preview at its default temperature (1.0) silently garbles Armenian ~22% of the time on exactly the same image it handles perfectly at temperature 0.

The setup

Our event pipeline ingests Instagram posts from Armenian event organizers. One post (a concert poster for "Hayk Petrosyan") got stored with a garbled media description, ՀԱՅԱ ՁԵՏՐՈՄՅԱՆ instead of ՀԱՅԿ ՊԵՏՐՈՍՅԱՆ, and I wondered whether it was a one-off hallucination or a systematic problem. Answer: both.

To measure it, I ran each model repeatedly on the same poster and scored every output with case-insensitive, whitespace-normalized token matching against 6 tokens on the poster (date, stylized title, performer name, subtitle, tickets-for label, venue name).
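A minimal sketch of that scoring, to make "case-insensitive whitespace-normalized token matching" concrete. The function names are hypothetical, and treating a token as matched when it appears as a substring of the normalized output is my assumption, not necessarily how the original harness did it. The six expected strings are the ones listed in the results table.

```python
import unicodedata

# The six tokens printed on the poster (taken from the results table headers).
EXPECTED = [
    "Ապրիլ",                      # date
    "Մի ձայն",                    # stylized title
    "ՀԱՅԿ ՊԵՏՐՈՍՅԱՆ",             # performer name
    "հեղինակային երգերի երեկո",   # subtitle
    "Տոմսերի համար",              # tickets-for label
    "Ակումբ",                     # venue name
]


def normalize(text: str) -> str:
    """NFC-normalize, case-fold, and collapse whitespace runs to single spaces."""
    text = unicodedata.normalize("NFC", text)
    return " ".join(text.casefold().split())


def score(transcription: str, expected=EXPECTED) -> dict:
    """Map each expected token to True if it occurs in the normalized output.

    Assumption: substring containment counts as a match.
    """
    haystack = normalize(transcription)
    return {token: normalize(token) in haystack for token in expected}


def all_pass(transcription: str, expected=EXPECTED) -> bool:
    """True only when every expected token was found (the "All 6" column)."""
    return all(score(transcription, expected).values())
```

With this, an output containing `ՀԱՅԱ ՁԵՏՐՈՄՅԱՆ` fails the name token, while `ՀԱՅԿ   ՊԵՏՐՈՍՅԱՆ` with odd spacing or casing still passes.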

Concert poster for Hayk Petrosyan — notice the stylized Armenian text (Մի ձայն) that every model I tested struggled with except for gemini. I was surprised gemini caught it — it would be hard to read this even for some humans.

The surprising finding: temperature fixes everything

First observation: the same poster image produces different garbled versions across runs — ԱՂՐԷԱ, ԱՃՐԷԱ, ԱՄՐԲԴ instead of ԱՊՐԻԼ. Each failure is different, but not random — they're visually-similar-glyph confusions (Պ↔Ղ, Ի↔Է, Ս↔Մ, Տ↔Ճ), the same mistakes a person would make on stylized fonts.
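To see which glyphs get swapped, the garbled reads can be compared position by position against the expected word. This is a small sketch with a hypothetical helper name. It assumes the garbled string has the same length as the expected one, which holds for the three examples above but not for outputs that drop or add characters.

```python
from collections import Counter


def glyph_substitutions(expected: str, observed: str) -> list:
    """Return (expected_glyph, observed_glyph) pairs where the strings differ.

    Assumption: both strings have equal length, so a plain positional
    comparison is enough and no alignment is needed.
    """
    if len(expected) != len(observed):
        raise ValueError("positional comparison needs equal-length strings")
    return [(e, o) for e, o in zip(expected, observed) if e != o]


# The three garbled reads of the month name quoted above.
garbled_reads = ["ԱՂՐԷԱ", "ԱՃՐԷԱ", "ԱՄՐԲԴ"]

# Count how often each substitution shows up across the failed runs.
confusions = Counter(
    pair
    for read in garbled_reads
    for pair in glyph_substitutions("ԱՊՐԻԼ", read)
)

for (expected_glyph, observed_glyph), count in confusions.most_common():
    print(f"{expected_glyph} -> {observed_glyph}: {count}")
```

A tally like this makes it easy to check whether failures cluster on a few look-alike pairs or are spread across the alphabet.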

The tell: every failure is different, but every success is identical and perfect. That's not how "the model can't read Armenian" looks — that would be the same wrong answer every time. It looked like sampling noise.

Which sent me back to a setting I'd honestly forgotten existed: temperature. gemini-3-flash-preview's default is 1.0 — high enough that on tokens where the model is uncertain (small stylized Armenian glyphs, in this case), it rolls the dice between visually-similar candidates instead of committing to its most likely read. Setting temperature: 0 collapses the decoder to its greedy answer.
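A toy illustration of the difference between sampling at temperature 1.0 and greedy decoding. The candidate glyphs and their logits are made up for the example, and real decoders are more involved than this. Only the mechanism is the point: sampling sometimes picks a less likely look-alike, while greedy always picks the top candidate.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities after dividing by the temperature."""
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


def decode(candidates, logits, temperature, rng):
    """Pick one candidate: argmax at temperature 0, weighted sampling otherwise."""
    if temperature == 0:
        best = max(range(len(logits)), key=logits.__getitem__)
        return candidates[best]
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(candidates, weights=probs, k=1)[0]


# Hypothetical numbers: the model slightly prefers the correct glyph.
candidates = ["Պ", "Ղ", "Ճ"]
logits = [2.0, 1.0, 0.5]

rng = random.Random(0)
greedy_reads = {decode(candidates, logits, 0, rng) for _ in range(1000)}
sampled_reads = {decode(candidates, logits, 1.0, rng) for _ in range(1000)}

print("temperature 0:", greedy_reads)
print("temperature 1:", sampled_reads)
```

At temperature 0 the set of reads contains only the top candidate. At temperature 1.0 the look-alikes show up too, which matches the pattern of varied failures and identical successes.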

50/50 perfect runs, 100% on all 6 tokens, no garble.

One line of config. Months of "LLMs are just flaky" chalked up to a default we never reviewed.
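For reference, here is roughly where that one line sits in a request. This sketch only builds a JSON body and sends nothing. The field names follow my understanding of the public Gemini `generateContent` REST shape, so check them against the current API reference. The prompt text and function name are hypothetical.

```python
import base64
import json


def build_ocr_request(image_bytes: bytes, prompt: str,
                      mime_type: str = "image/jpeg") -> dict:
    """Build a generateContent-style request body with temperature pinned to 0.

    Assumption: field names mirror the public REST API as I understand it.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "contents": [
            {
                "parts": [
                    {"inline_data": {"mime_type": mime_type, "data": encoded}},
                    {"text": prompt},
                ]
            }
        ],
        # The one line that matters: override the default of 1.0.
        "generationConfig": {"temperature": 0},
    }


# Hypothetical prompt and placeholder bytes, for illustration only.
payload = build_ocr_request(
    b"placeholder image bytes",
    "Transcribe all text in this image exactly as written.",
)
print(json.dumps(payload["generationConfig"]))
```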

The other surprise: newer ≠ better

Everything I'd read online said gpt-5.4-mini should blow gpt-5-mini out of the water — it's the newer model, and the benchmarks back that up. On this task, nope:

  • gpt-5-mini with reasoning: minimal — 3.4s latency, $0.81/1k calls, 90% critical pass
  • gpt-5.4-mini with reasoning: low — 8.8s latency, 5% critical pass

At low reasoning, gpt-5.4-mini "hedges": it ignores the transcription instruction 70% of the time and returns just a visual description. You have to crank it up to medium reasoning to force it to commit, which costs $17/1k (vs $5/1k for gpt-5-mini at medium). At the cheap settings the older model is far more accurate, and at medium it gets comparable block-text results for less than a third of the price, so it is the better choice here.

Other notable findings

  • Neither OpenAI mini can read stylized Armenian cursive — 0/40 runs combined got the Մի ձայն handwritten title. They nail block text but go blind on decorative fonts. gemini-3-flash-preview gets it 100% at temp=0.
  • claude-haiku-4-5 ignored the transcription task entirely — returned only visual descriptions ("a man with a guitar"), 0/10.
  • claude-sonnet-4-6 tries hard but has systematic glyph confusions that gemini-3-flash-preview doesn't share: ՀԱՅԿ → ՀԱԿՈԲ (Hayk → Hakob), Մ ↔ Ս.
  • gemini-3-flash-preview with mediaResolution: high (1120 tokens/image vs default) did not help — slightly hurt accuracy at n=20.
  • gemini-3-flash-preview with thinking_level: HIGH also did not help meaningfully (76% all-6 vs 78% at LOW) and costs 2.85× more.

Full results

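| Model | Reasoning | Temp | n | Date (Ապրիլ) | Title (Մի ձայն) | Name (ՀԱՅԿ ՊԵՏՐՈՍՅԱՆ) | Subtitle (հեղինակային երգերի երեկո) | Tickets (Տոմսերի համար) | Venue (Ակումբ) | All 6 | Latency | Cost/1k |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| gemini-3-flash-preview | LOW | 0 | 50 | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 6.0s | $1.05 |
| gemini-3-flash-preview | LOW | 1 | 50 | 80% | 94% | 80% | 88% | 80% | 96% | 78% | 6.2s | $1.23 |
| gemini-3-flash-preview | HIGH | 1 | 50 | 90% | 90% | 90% | 76% | 90% | 90% | 76% | 9.1s | $3.58 |
| gemini-3-flash-preview | LOW · hi-res | 1 | 20 | 70% | 90% | 70% | 80% | 70% | 95% | 70% | 6.5s | $1.33 |
| gpt-5-mini | medium | | 10 | 100% | 0% | 100% | 40% | 100% | 40% | 0% | 27.6s | $4.96 |
| gpt-5-mini | minimal | | 20 | 90% | 0% | 100% | 10% | 85% | 15% | 0% | 3.4s | $0.81 |
| gpt-5.4-mini | medium | | 10 | 100% | 0% | 100% | 70% | 100% | 50% | 0% | 28.6s | $17.21 |
| gpt-5.4-mini | low | | 20 | 15% | 0% | 10% | 0% | 15% | 0% | 0% | 8.8s | $4.44 |
| gpt-5.4-mini | none | | 10 | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 2.3s | $1.74 |
| claude-haiku-4-5 | | | 10 | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 1.4s | $2.84 |
| claude-sonnet-4-6 | | | 10 | 100% | 0% | 0% | 90% | 80% | 50% | 0% | 8.9s | $11.07 |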