Project ideas from Hacker News discussions.

How far back in time can you understand English?

📝 Discussion Summary

Four key take‑aways from the discussion

1. Orthography is the biggest barrier – the shift from þ to th, the long s, the U/V swap, and other archaic glyphs make the text look “unreadable” even if the words are familiar.
  • “The text doesn’t use an f. If you copy from e.g. the 1700 passage you get ſ not f.” – rhdunn
  • “The long s is really annoying … I had to think every time I saw it.” – BobAliceInATree
2. Vocabulary and semantic drift – many words keep their form but lose or change meaning, and new words appear that have no modern counterpart.
  • “The language crossed a boundary. Up to this point, comprehension felt like it was dropping gradually, but now it’s fallen off a cliff.” – dmurray
  • “I could intuit the pronunciation but I didn’t make the connection from ‘wif’ to ‘woman’ … in hindsight I should have.” – antonvs
3. Pronunciation/accents and the Great Vowel Shift – how the spoken language diverges from the written form, and how modern accents can either help or hinder understanding of older speech.
  • “Accents have diverged a lot over time … American English (particularly the mid‑Atlantic seaboard variety) is closer to what Shakespeare and his cohort spoke.” – dhosek
  • “I can drive a little over an hour from where I live and hardly understand the people working at the petrol station.” – JasonADrury
4. Cross‑lingual knowledge aids comprehension – familiarity with Germanic, Romance, or other related languages (Dutch, German, French, etc.) makes it easier to parse older English.
  • “Knowing a bit of German or Dutch helps as well.” – antonvs
  • “I read everything truly ancient that I can get my hands on from any culture in any language (translated) and try and make sense of it.” – metalman

These four themes capture the main concerns and strategies that users shared when trying to read English texts from the 12th–17th centuries.


🚀 Project Ideas

Archaic Reader Extension

Summary

  • A browser extension that automatically replaces archaic characters (thorn, long s, wynn, yogh, etc.) with modern equivalents while providing instant pronunciation and dictionary tooltips.
  • Core value: eliminates the “letter‑recognition” barrier that stops most readers from engaging with medieval texts.

Details

  • Target Audience: Scholars, students, hobbyists reading medieval English
  • Core Feature: Real‑time character replacement + hover‑tooltip dictionary + optional phonetic transcription
  • Tech Stack: Chrome/Firefox extension API, WebAssembly for fast regex, SQLite for offline dictionary
  • Difficulty: Medium
  • Monetization: Revenue‑ready, $4.99/month for premium features (audio playback, offline mode)

Notes

  • HN commenters lament “I can’t read the long s” and “thorn is a nightmare” (e.g., “ſ” → “s”).
  • The extension directly addresses the frustration of “I can’t understand 1400” and “I need to replace the long‑s with the standard s.”
  • Discussion around regex one‑liners shows demand for a ready‑made solution.
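
The character-replacement step can be sketched in a few lines. This is only an illustration under stated assumptions: the names `ARCHAIC_TO_MODERN` and `modernize` are hypothetical, the mapping covers just a handful of letters, and Python is used for brevity even though a real extension would run JavaScript or TypeScript in the browser.

```python
# Hypothetical mapping for illustration; a real tool would need a fuller table.
# Yogh (ȝ) is left out on purpose: its modern equivalent (y, g, gh) depends on context.
ARCHAIC_TO_MODERN = str.maketrans({
    "ſ": "s",   # long s
    "þ": "th",  # thorn (lowercase)
    "Þ": "Th",  # thorn (uppercase)
    "ð": "th",  # eth (lowercase)
    "Ð": "Th",  # eth (uppercase)
    "ƿ": "w",   # wynn (lowercase)
    "Ƿ": "W",   # wynn (uppercase)
})


def modernize(text: str) -> str:
    """Replace archaic letters with modern equivalents, character by character."""
    return text.translate(ARCHAIC_TO_MODERN)


if __name__ == "__main__":
    print(modernize("Þe wif ſaid þat ſhe was ƿel"))
```

A plain character table handles the letter-recognition problem only. Spelling, vocabulary, and the U/V swap need context-aware rules that this sketch does not attempt.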

Old‑English Immersion Platform

Summary

  • An interactive, gamified learning platform that guides users through texts from 1200‑1500 (the Middle English period), with audio, annotations, and progressive difficulty levels.
  • Core value: turns the tedious “guess‑and‑check” reading exercise into a structured learning path.

Details

  • Target Audience: ESL learners, literature students, curious readers
  • Core Feature: Adaptive reading modules, audio narration, word‑level glosses, quizzes
  • Tech Stack: React + Redux, Node.js backend, PostgreSQL, AWS Polly for TTS
  • Difficulty: High
  • Monetization: Revenue‑ready, $9.99/month for full library, free tier with limited content

Notes

  • Users repeatedly mention “I can read to 1400 but 1300 is hard” and “I need audio to understand pronunciation.”
  • The platform’s gamified quizzes echo the “immersive Ørberg method” praised in the discussion.
  • Potential for community‑generated content (e.g., user‑created glossaries).

Historical Text Converter API

Summary

  • A cloud API that accepts any historical English text and returns a modernized version with phonetic transcription, glosses, and optional audio.
  • Core value: gives developers a plug‑and‑play tool for building reading aids, educational apps, or research tools.

Details

  • Target Audience: Developers, publishers, educators
  • Core Feature: Text normalization, phonetic transcription, audio generation
  • Tech Stack: Python (NLTK, spaCy), FastAPI, Docker, Google Cloud Text‑to‑Speech
  • Difficulty: High
  • Monetization: Revenue‑ready, $0.01 per 1,000 words, tiered pricing

Notes

  • The discussion highlights the lack of “a service that can rewrite old texts through the years.”
  • The API can power the Archaic Reader extension or the Immersion Platform.
  • HN’s interest in “history‑llms” shows a market for AI‑augmented historical text processing.
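
As a rough sketch of what one conversion call might return, the following uses only the Python standard library. Everything named here (`ConversionResult`, `convert`, `LETTER_MAP`, `GLOSSARY`) is a hypothetical illustration, not an existing API. The single glossary entry comes from the “wif” example in the discussion, the price constant comes from the Monetization line, and phonetic transcription and audio are left out entirely.

```python
import re
from dataclasses import dataclass, field

# Hypothetical, deliberately tiny tables; a real service would need far more data.
LETTER_MAP = str.maketrans({"ſ": "s", "þ": "th", "Þ": "Th"})
GLOSSARY = {"wif": "woman"}
PRICE_PER_1000_WORDS = 0.01  # USD, taken from the Monetization line


@dataclass
class ConversionResult:
    modernized: str                                          # text with archaic letters replaced
    glosses: dict[str, str] = field(default_factory=dict)    # word -> modern meaning
    word_count: int = 0
    cost_usd: float = 0.0


def convert(text: str) -> ConversionResult:
    """Normalize letters, attach glosses for known words, and price the request."""
    modernized = text.translate(LETTER_MAP)
    # Runs of letters only (no digits, underscores, or punctuation).
    words = re.findall(r"[^\W\d_]+", modernized)
    glosses = {w.lower(): GLOSSARY[w.lower()] for w in words if w.lower() in GLOSSARY}
    cost = len(words) / 1000 * PRICE_PER_1000_WORDS
    return ConversionResult(modernized, glosses, len(words), cost)


if __name__ == "__main__":
    print(convert("Þe wif ſang"))
```

Keeping the result as one structured object would let a web framework such as FastAPI serialize it directly, and would leave room to add transcription or audio fields later.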

Dialect Explorer Web App

Summary

  • A web app that lets users compare modern and historical dialects side‑by‑side, with audio samples and a visual timeline of linguistic changes.
  • Core value: satisfies curiosity about how accents evolve and why “I can’t understand a thick Spanish” or “I struggle with Indian English.”

Details

  • Target Audience: Linguists, language enthusiasts, ESL teachers
  • Core Feature: Interactive timeline, audio playback, word‑change visualizer
  • Tech Stack: Vue.js, D3.js, Node.js, MongoDB
  • Difficulty: Medium
  • Monetization: Hobby (free, open source)

Notes

  • Comments like “I can’t understand 1400” and “I need to know how to pronounce the unusual symbols” point to a demand for a visual, auditory tool.
  • The app can incorporate user‑submitted dialect recordings, fostering community engagement.

Legal Document Translator SaaS

Summary

  • A SaaS tool that translates historical legal documents (e.g., Magna Carta, 17th‑century contracts) into modern English with annotations and searchable glossaries.
  • Core value: removes the “I can’t read the 1400 legal text” pain point for lawyers and historians.

Details

  • Target Audience: Law firms, legal scholars, archivists
  • Core Feature: Document upload, OCR for old manuscripts, modern‑English output, annotation layer
  • Tech Stack: Python (Tesseract OCR), Flask, PostgreSQL, ElasticSearch
  • Difficulty: High
  • Monetization: Revenue‑ready, $499/month per firm, enterprise licensing

Notes

  • HN users mention “I can’t read 1400” and “I need to understand old legal texts.”
  • The tool addresses the practical need for accurate, annotated translations in legal contexts.
  • Potential integration with the Historical Text Converter API for bulk processing.
