Project ideas from Hacker News discussions.

Google unkills JPEG XL?

Discussion Summary

The discussion covers technology choices, standards, and industry dynamics. Much of it was triggered by a brief initial comment about ebook formats.

Here are the three most prevalent themes:

1. PDF vs. Reflowable Formats (EPUB) for Ebooks

A significant portion of the early discussion centered on the merits and drawbacks of fixed-layout PDF versus adaptable EPUB for reading on different devices.

  • Supporting Quote: The core contention is captured by:
    • PaulHoule stating: "I liked the _idea_ of EPUB but when I recently installed an EPUB reader to save some files I was shocked at how awful it looked whereas for 15 years I've been reading PDF files on tablets with relish."
    • Countered by mubou2: "The whole point of PDF is to preserve a page layout as authored. EPUB is meant to adapt to your device."

2. Concerns Over Google's Dominance in Web Standards

A large part of the thread shifts to a broader discussion of the power held by browser-engine implementers, focusing on Google's decisions about formats such as JPEG XL and XSLT. Users expressed concern that Chrome's market share gives Google monopolistic power over web standards, often framing this as a threat to platform diversity and as a power imbalance among browser implementers.

  • Supporting Quote: A user demanding regulatory oversight notes: "As a monopoly, Google should be barred from having standards positions and be legally required to build and support the web standards as determined by other parties."
  • Supporting Quote: Another user sums up the perception of market power: "IE lost the lead to Firefox when IE basically just stopped development and stagnated. Firefox lost to Chrome when Firefox became too bloated and slow. Firefox simply will not win back that market until either Chrome screws up majorly or Firefox delivers some significant value that Google cannot immediately copy."

3. Security Implications of C/C++ Codebases in Browser Vendors

When discussing why browser makers hesitate to adopt new, complex formats like JPEG XL, the low-level implementation language (C++) and associated security surface area emerged as a critical, practical barrier cited by Google and Mozilla.

  • Supporting Quote: A user suggests this is the crux of the support issue: "I think both Mozilla and Google are OK with this - if it is written in Rust in order to avoid that situation."
  • Supporting Quote: This concern is echoed later: "At this point, in 2025, any substantial (non-degenerative) image processing written in C++ is a security issue waiting to happen. That's not specific to JPEG XL."

🚀 Project Ideas

PDF Annotation Interoperability Service (PAIS)

Summary

  • A cloud service that standardizes, aggregates, and syncs PDF annotations (notes, highlights) across different, proprietary reader applications.
  • Solves the pain point of losing or siloing PDF annotations when switching between readers, while respecting the preference some users expressed for annotations embedded in the PDF itself.

Details

Key | Value
Target Audience | Power readers of technical books/academic materials who rely heavily on PDF annotations (e.g., developers, researchers, students).
Core Feature | A REST API that accepts annotations from various client apps (custom PDF readers, document managers) and stores them against a file hash, returning standardized/merged annotations upon request. It prioritizes embedding metadata within its own system rather than exclusively relying on file modification.
Tech Stack | Python/FastAPI or Go (for high performance API), PostgreSQL or an ACID-compliant document store, simple client SDKs/plugins for popular readers (e.g., browser extensions, Calibre plugins).
Difficulty | Medium
Monetization | Hobby
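The core feature above (store annotations against a file hash, return a merged set) can be sketched as follows. This is a minimal in-memory illustration, not a definitive design: the `Annotation` fields, the `AnnotationStore` class, and the choice of SHA-256 are all assumptions made for the example, and a real service would sit behind an HTTP API with a database.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """Hypothetical normalized annotation record (field names are assumptions)."""
    page: int        # 1-based page number
    kind: str        # e.g. "highlight" or "note"
    text: str        # highlighted passage or note body
    source_app: str  # which reader submitted it


def file_key(pdf_bytes: bytes) -> str:
    """Identify a document by the SHA-256 digest of its bytes."""
    return hashlib.sha256(pdf_bytes).hexdigest()


class AnnotationStore:
    """In-memory stand-in for the database behind the API."""

    def __init__(self) -> None:
        self._by_file: dict[str, list[Annotation]] = {}

    def submit(self, key: str, annotations: list[Annotation]) -> None:
        """Record annotations sent by one client for the given file key."""
        self._by_file.setdefault(key, []).extend(annotations)

    def merged(self, key: str) -> list[Annotation]:
        """Return annotations sorted by page, with cross-app duplicates removed."""
        ordered = sorted(
            self._by_file.get(key, []),
            key=lambda a: (a.page, a.kind, a.text, a.source_app),
        )
        seen: set[tuple[int, str, str]] = set()
        result: list[Annotation] = []
        for ann in ordered:
            identity = (ann.page, ann.kind, ann.text)  # ignore the submitting app
            if identity not in seen:
                seen.add(identity)
                result.append(ann)
        return result
```

One design consequence worth noting: a reader that writes annotations into the PDF changes the file's bytes and therefore its hash, so keying on the raw file hash only works if clients hash the unmodified original or the service uses some other stable identifier.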

Notes

  • Why HN commenters would love it: Addresses the strong preference some users have for PDF's native annotation structure: "One thing I like about PDF is the annotations (notes & highlights) are embedded in the PDF itself" (leosanchez). This service respects the PDF for reading while solving the interoperability headache.
  • Potential for discussion or practical utility: Could spark debate on whether embedding metadata in the file or in a centralized service is superior, aligning with the discussion on EPUB vs. PDF annotation storage policies.

Device-Agnostic Ebook Reflow Utility (DAER)

Summary

  • A client-side utility/service that converts fixed-layout ebooks (like aesthetically pleasing PDFs or badly formed EPUBs) into a high-quality, truly adaptive reflowable format (like high-quality EPUB3 or HTML5).
  • Solves the major complaint about PDF's inflexibility on mobile/small screens while acknowledging the desire for good layout ("For novels I want and prefer epubs, but also non-novels if they were released in the last 5 years or so. PDF isn't magic..." - NoMoreNicksLeft).

Details

Key | Value
Target Audience | Users frustrated by PDF readability on phones/small screens, but who require the preserved structure of technical/academic content often locked in PDF format.
Core Feature | Intelligent document parsing (using OCR/layout analysis or leveraging embedded structural hints such as Tagged PDF structure trees) to generate a modern, responsive HTML/EPUB output that maintains figure fidelity while optimizing text flow, with the goal of improving on basic PDF reflow modes.
Tech Stack | Python (for layout analysis/pre-processing), libraries like pdf2htmlEX for base data extraction, client-side rendering built on modern Web Components/React for presentation customization (dark mode, font scaling).
Difficulty | High
Monetization | Hobby
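To give a sense of what "optimizing text flow" involves, here is a minimal sketch of one small step: regrouping positioned text lines into paragraphs and emitting reflowable HTML. The `Line` record, the `gap_factor` threshold, and the assumption that an upstream extractor supplies line positions are all hypothetical; real layout reconstruction (columns, figures, tables) is far harder than this.

```python
from __future__ import annotations

import html
from dataclasses import dataclass
from typing import Optional


@dataclass
class Line:
    """Hypothetical output of an upstream text extractor."""
    text: str
    top: float     # distance from the top of the page, in points
    height: float  # line height, in points


def group_paragraphs(lines: list[Line], gap_factor: float = 1.5) -> list[str]:
    """Join lines into paragraphs; a large vertical gap starts a new one."""
    paragraphs: list[str] = []
    current = ""
    prev: Optional[Line] = None
    for line in lines:
        text = line.text.strip()
        if not text:
            continue  # skip empty lines from the extractor
        # A jump of more than gap_factor line heights is treated as a break.
        if prev is not None and (line.top - prev.top) > gap_factor * prev.height:
            if current:
                paragraphs.append(current)
            current = ""
        if not current:
            current = text
        elif current.endswith("-"):
            # Naive de-hyphenation: also merges genuinely hyphenated compounds.
            current = current[:-1] + text
        else:
            current = current + " " + text
        prev = line
    if current:
        paragraphs.append(current)
    return paragraphs


def to_html(paragraphs: list[str]) -> str:
    """Emit one escaped <p> element per paragraph."""
    return "\n".join(f"<p>{html.escape(p)}</p>" for p in paragraphs)
```

Because the output is plain paragraphs, the reading client is free to apply its own font size, line width, and dark-mode styling, which is the customization the fixed PDF page cannot offer.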

Notes

  • Why HN commenters would love it: Directly attacks the core incompatibility noted: "...PDF doesn't layout well on mobile, and very limited customization (like dark mode, changing text size, etc)." (majora2007). This product offers the customization EPUB provides, but applied to content trapped in PDF.
  • Potential for discussion or practical utility: High utility for anyone reading technical manuals or textbooks on non-tablet devices. Discussion would focus on the difficulty of automated layout reconstruction versus authoring quality.

Metadata Management and Sourcing Tool (Medusa)

Summary

  • An open-source desktop or self-hosted application focused solely on aggregating, validating, and correcting rich metadata (especially Dublin Core) for large ebook collections (PDF/EPUB), filling the observed tooling gap.
  • Solves the frustration that while metadata can exist in PDFs, the tooling to manage it outside of all-in-one library managers (like Calibre) is missing.

Details

Key | Value
Target Audience | Advanced ebook collectors, archival projects, and self-hosters (like Kavita users) who need robust, cross-format metadata handling.
Core Feature | Batch metadata extraction/injection for both PDF (leveraging the embedded document information dictionary and XMP/Dublin Core fields) and EPUB, with an integrated sourcing mechanism leveraging public domain/ISBN databases to enrich incomplete records.
Tech Stack | Electron/Tauri (for cross-platform desktop app), Python libraries for file inspection/metadata writing (e.g., pypdf, dedicated EPUB editors), SQLite for local database management.
Difficulty | Medium
Monetization | Hobby
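The extraction half of the core feature can be sketched for EPUB using only the Python standard library, since an EPUB is a ZIP archive whose `META-INF/container.xml` points at a package document carrying Dublin Core elements. The function names, the `DC_FIELDS` selection, and the default `required` tuple are assumptions for illustration; PDF extraction and the metadata sourcing step are not shown.

```python
from __future__ import annotations

import zipfile
import xml.etree.ElementTree as ET

CONTAINER_NS = "urn:oasis:names:tc:opendocument:xmlns:container"
DC_NS = "http://purl.org/dc/elements/1.1/"
# Subset of Dublin Core elements to collect (an illustrative choice).
DC_FIELDS = ("identifier", "title", "language", "creator", "publisher", "date")


def read_epub_dublin_core(epub: zipfile.ZipFile) -> dict[str, list[str]]:
    """Return the Dublin Core values found in an EPUB's package document."""
    container = ET.fromstring(epub.read("META-INF/container.xml"))
    rootfile = container.find(f".//{{{CONTAINER_NS}}}rootfile")
    if rootfile is None:
        raise ValueError("container.xml has no rootfile entry")
    package = ET.fromstring(epub.read(rootfile.attrib["full-path"]))

    found: dict[str, list[str]] = {}
    for field in DC_FIELDS:
        values = [
            el.text.strip()
            for el in package.iter(f"{{{DC_NS}}}{field}")
            if el.text and el.text.strip()
        ]
        if values:
            found[field] = values  # elements such as creator may repeat
    return found


def missing_fields(
    metadata: dict[str, list[str]],
    required: tuple[str, ...] = ("identifier", "title", "language"),
) -> list[str]:
    """List required fields that are absent, as candidates for enrichment."""
    return [name for name in required if name not in metadata]
```

A batch tool could run `read_epub_dublin_core` over a collection, store the results in SQLite, and pass the output of `missing_fields` to whatever lookup source is chosen to fill the gaps.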

Notes

  • Why HN commenters would love it: Directly responds to the gap identified: "Feels like a very big gap in the OSS world then. The PDF spec supports multiple standards for metadata..." (swiftcoder). It is especially relevant for tools like Kavita: "for cases like Kavita, storing in the file would be problematic if multiple users want their own annotations..." (majora2007). Medusa focuses on file metadata rather than user annotations, but it serves the larger metadata collection need.
  • Potential for discussion or practical utility: Strong potential for community contribution given the OSS focus. Discussion would revolve around best practices for embedding metadata standards (Dublin Core vs. internal database schemas) for lossless collecting.