Project ideas from Hacker News discussions.

How far back in time can you understand English?

Original Article

Hacker News Discussion

📝 Discussion Summary

Four key take‑aways from the discussion

#	Theme	Representative quotes
1	Orthography is the biggest barrier – the shift from þ to th, the long s, the U/V swap, and other archaic glyphs make the text look “unreadable” even if the words are familiar.	“The text doesn’t use an `f`. If you copy from e.g. the 1700 passage you get `ſ` not `f`.” – rhdunn “The long s is really annoying … I had to think every time I saw it.” – BobAliceInATree
2	Vocabulary and semantic drift – many words keep their form but lose or change meaning, and new words appear that have no modern counterpart.	“The language crossed a boundary … the language crossed a boundary. Up to this point, comprehension felt like it was dropping gradually, but now it’s fallen off a cliff.” – dmurray “I could intuit the pronunciation but I didn’t make the connection from ‘wif’ to ‘woman’ … in hindsight I should have.” – antonvs
3	Pronunciation/accents and the Great Vowel Shift – how the spoken language diverges from the written form, and how modern accents can either help or hinder understanding of older speech.	“Accents have diverged a lot over time … American English (particularly the mid‑Atlantic seaboard variety) is closer to what Shakespeare and his cohort spoke.” – dhosek “I can drive a little over an hour from where I live and hardly understand the people working at the petrol station.” – JasonADrury
4	Cross‑lingual knowledge aids comprehension – familiarity with Germanic, Romance, or other related languages (Dutch, German, French, etc.) makes it easier to parse older English.	“Knowing a bit of German or Dutch helps as well.” – antonvs “I read everything truly ancient that I can get my hands on from any culture in any language (translated) and try and make sense of it.” – metalman

These four themes capture the main concerns and strategies that users shared when trying to read English texts from the 12th–17th centuries.

🚀 Project Ideas

Archaic Reader Extension

Summary

A browser extension that automatically replaces archaic characters (thorn, long s, wynn, yogh, etc.) with modern equivalents while providing instant pronunciation and dictionary tooltips.
Core value: eliminates the “letter‑recognition” barrier that stops most readers from engaging with medieval texts.

Details

Key	Value
Target Audience	Scholars, students, hobbyists reading medieval English
Core Feature	Real‑time character replacement + hover‑tooltip dictionary + optional phonetic transcription
Tech Stack	Chrome/Firefox extension API, WebAssembly for fast regex, SQLite for offline dictionary
Difficulty	Medium
Monetization	Revenue‑ready: $4.99/month for premium features (audio playback, offline mode)

Notes

HN commenters lament “I can’t read the long s” and “thorn is a nightmare” (e.g., “ſ” → “s”).
The extension directly addresses the frustration of “I can’t understand 1400” and “I need to replace the long‑s with the standard s.”
Discussion around regex one‑liners shows demand for a ready‑made solution.
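The character replacement at the heart of this idea could be sketched roughly as follows. This is a minimal illustration in Python, not an actual extension (which would run as JavaScript in the browser). The mapping table and the function name `modernize_glyphs` are assumptions made for this example, and it uses a translation table rather than the regex one-liners mentioned in the discussion. Yogh (ȝ) is left out on purpose because its modern equivalent depends on context.

```python
# Assumed one-to-one glyph mapping, for illustration only.
# Real texts need context-aware rules (e.g. the u/v swap, or yogh).
ARCHAIC_TO_MODERN = {
    "ſ": "s",   # long s
    "þ": "th",  # thorn (lowercase)
    "Þ": "Th",  # thorn (uppercase)
    "ð": "th",  # eth (lowercase)
    "Ð": "Th",  # eth (uppercase)
    "ƿ": "w",   # wynn (lowercase)
    "Ƿ": "W",   # wynn (uppercase)
}

# str.maketrans accepts a dict of single characters mapped to replacement strings.
_TABLE = str.maketrans(ARCHAIC_TO_MODERN)


def modernize_glyphs(text: str) -> str:
    """Replace each archaic glyph with its assumed modern spelling."""
    return text.translate(_TABLE)


if __name__ == "__main__":
    print(modernize_glyphs("Þe ſame þing"))
```

A lookup table keeps the rules easy to audit and extend. Anything that depends on neighboring letters would need a second, context-aware pass.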

Middle‑English Immersion Platform

Summary

An interactive, gamified learning platform that guides users through texts from 1200‑1500, with audio, annotations, and progressive difficulty levels.
Core value: turns the tedious “guess‑and‑check” reading exercise into a structured learning path.

Details

Key	Value
Target Audience	ESL learners, literature students, curious readers
Core Feature	Adaptive reading modules, audio narration, word‑level glosses, quizzes
Tech Stack	React + Redux, Node.js backend, PostgreSQL, AWS Polly for TTS
Difficulty	High
Monetization	Revenue‑ready: $9.99/month for full library, free tier with limited content

Notes

Users repeatedly mention “I can read to 1400 but 1300 is hard” and “I need audio to understand pronunciation.”
The platform’s progressive difficulty levels echo the “immersive Ørberg method” praised in the discussion.
Potential for community‑generated content (e.g., user‑created glossaries).

Historical Text Converter API

Summary

A cloud API that accepts any historical English text and returns a modernized version with phonetic transcription, glosses, and optional audio.
Core value: gives developers a plug‑and‑play tool for building reading aids, educational apps, or research tools.

Details

Key	Value
Target Audience	Developers, publishers, educators
Core Feature	Text normalization, phonetic transcription, audio generation
Tech Stack	Python (NLTK, spaCy), FastAPI, Docker, Google Cloud Text‑to‑Speech
Difficulty	High
Monetization	Revenue‑ready: $0.01 per 1,000 words, tiered pricing

Notes

The discussion highlights the lack of “a service that can rewrite old texts through the years.”
The API can power the Archaic Reader extension or the Immersion Platform.
HN’s interest in “history‑llms” shows a market for AI‑augmented historical text processing.
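One way the gloss part of such an API's response might be shaped is sketched below, using only the Python standard library. Everything here is hypothetical: the `Gloss` and `ConversionResult` types, the `annotate` function, and the glossary itself. Only the "wif" → "woman" pair comes from the discussion. A real service would sit behind an HTTP framework and draw on a proper historical dictionary.

```python
import re
from dataclasses import dataclass, field

# Hypothetical glossary; only the "wif" entry is taken from the discussion.
GLOSSARY = {"wif": "woman"}


@dataclass
class Gloss:
    word: str      # the word as it appears in the input
    meaning: str   # modern-English gloss
    start: int     # character offset of the word in the input


@dataclass
class ConversionResult:
    original: str
    glosses: list[Gloss] = field(default_factory=list)


def annotate(text: str, glossary: dict[str, str] = GLOSSARY) -> ConversionResult:
    """Attach a gloss to every word found in the glossary (case-insensitive)."""
    result = ConversionResult(original=text)
    for match in re.finditer(r"\w+", text):
        meaning = glossary.get(match.group().lower())
        if meaning is not None:
            result.glosses.append(Gloss(match.group(), meaning, match.start()))
    return result


if __name__ == "__main__":
    for gloss in annotate("the wif").glosses:
        print(gloss.word, "->", gloss.meaning, "at offset", gloss.start)
```

Returning character offsets instead of rewritten text lets a client such as the reader extension decide how to display each gloss, for example as a hover tooltip.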

Dialect Explorer Web App

Summary

A web app that lets users compare modern and historical dialects side‑by‑side, with audio samples and a visual timeline of linguistic changes.
Core value: satisfies curiosity about how accents evolve and speaks to complaints such as “I can’t understand a thick Spanish” or “I struggle with Indian English.”

Details

Key	Value
Target Audience	Linguists, language enthusiasts, ESL teachers
Core Feature	Interactive timeline, audio playback, word‑change visualizer
Tech Stack	Vue.js, D3.js, Node.js, MongoDB
Difficulty	Medium
Monetization	Hobby (free, open source)

Notes

Comments like “I can’t understand 1400” and “I need to know how to pronounce the unusual symbols” point to a demand for a visual, auditory tool.
The app can incorporate user‑submitted dialect recordings, fostering community engagement.
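A timeline view needs some way to place a text's year into a language period. The sketch below shows one possible lookup in Python. The function name `period_for_year` is hypothetical, and the boundary years are conventional approximations (scholars differ on exact dates), so treat them as assumptions and not as fixed data.

```python
from bisect import bisect_right

# Conventional, approximate start years for each period (assumed for illustration).
PERIOD_STARTS = [450, 1150, 1500, 1700]
PERIOD_NAMES = [
    "Old English",
    "Middle English",
    "Early Modern English",
    "Modern English",
]


def period_for_year(year: int) -> str:
    """Return the name of the period whose start year is the latest one <= year."""
    index = bisect_right(PERIOD_STARTS, year) - 1
    if index < 0:
        raise ValueError(f"year {year} predates the earliest period in the table")
    return PERIOD_NAMES[index]


if __name__ == "__main__":
    for year in (1000, 1400, 1600, 2000):
        print(year, period_for_year(year))
```

Keeping the boundaries in a sorted list means periods can be added or refined without touching the lookup logic.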

Legal Document Translator SaaS

Summary

A SaaS tool that translates historical legal documents (e.g., Magna Carta, 17th‑century contracts) into modern English with annotations and searchable glossaries.
Core value: removes the “I can’t read the 1400 legal text” pain point for lawyers and historians.

Details

Key	Value
Target Audience	Law firms, legal scholars, archivists
Core Feature	Document upload, OCR for old manuscripts, modern‑English output, annotation layer
Tech Stack	Python (Tesseract OCR), Flask, PostgreSQL, ElasticSearch
Difficulty	High
Monetization	Revenue‑ready: $499/month per firm, enterprise licensing

Notes

HN users mention “I can’t read 1400” and “I need to understand old legal texts.”
The tool addresses the practical need for accurate, annotated translations in legal contexts.
Potential integration with the Historical Text Converter API for bulk processing.