Project ideas from Hacker News discussions.

Recreating Epstein PDFs from raw encoded attachments

📝 Discussion Summary

1. The technical grind of recovering the hidden PDFs
Many users discuss how to turn the Base64 blobs into usable PDFs, weighing brute‑force search, OCR, and custom tooling.

“You should serially test if each edit decodes to a sane PDF structure…” – wahern
“I wonder if you could leverage some of the fuzzing frameworks tools like Jepsen rely on.” – pimlottc
“It decodes to binary pdf and there are only so many valid encodings.” – pyrolistical

2. The legal/ethical fallout of un‑redacted content
The conversation repeatedly highlights the presence of CSAM and other protected material in the DOJ releases, and the potential liability of anyone who downloads them.

“There’s more than enough credible reports of CSAM in the Epstein Files dump…” – mschuster91
“The DOJ releases contain CSAM, which may fall afoul of 18 U.S.C. 2252–2252A.” – ISL
“If you download a file that contains CSAM you could be prosecuted, even if you took the picture yourself.” – direwolf20

3. Criticism of the DOJ’s handling of the releases
Users accuse the administration of slow, sloppy, and unlawful disclosure, citing missed court orders and inadequate redactions.

“The US administration is, at present, regularly violating the law and ignoring court orders.” – ISL
“The Attorney General was to have produced the entirety of the Epstein files… She has not done so.” – ISL
“They illegally fired the IGs responsible for whistleblowers and fraud in every department.” – mikeyouse

These three threads—technical decoding, legal risk, and institutional criticism—dominate the discussion.


🚀 Project Ideas

Base64 PDF Resolver

Summary

  • A web app that takes a Base64 string containing ambiguous 1/l characters and brute‑forces the substitutions, accepting a candidate only if it decodes to a valid PDF structure and, optionally, its compressed streams inflate cleanly.
  • Provides a live preview, diff view, and manual edit interface for fine‑tuning ambiguous characters.

Details

Key              Value
Target Audience  Researchers, journalists, and hobbyists decoding leaked PDFs.
Core Feature     Automated brute‑force decoding with PDF‑structure validation and interactive correction.
Tech Stack       Node.js + Express, pdf.js (JavaScript PDF parser), Web Workers, React UI.
Difficulty       Medium
Monetization     Hobby

Notes

  • “pimlottc: I wonder if you could leverage some of the fuzzing frameworks tools like Jepsen rely on.” – shows demand for automated fuzzing.
  • “wahern: It should be much easier than that.” – users want a simpler tool.
  • Practical utility: speeds up decoding of PDFs whose Base64 text contains ambiguous characters, reduces manual effort, and can serve as a backend for other projects.
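
The brute‑force idea in the summary can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the only confusable pair is 1/l, every stream in the file is assumed to use FlateDecode (zlib), and the names `candidates`, `streams_inflate`, `looks_like_pdf`, and `resolve` are hypothetical.

```python
import base64
import itertools
import re
import zlib
from typing import Iterator, List

# Assumption: the only confusable glyphs are the digit one and lowercase L.
CONFUSABLE = "1l"

# Captures the raw bytes between the "stream" and "endstream" keywords.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def candidates(text: str, max_ambiguous: int = 16) -> Iterator[str]:
    """Yield every spelling of `text` with each 1/l position tried both ways."""
    positions = [i for i, ch in enumerate(text) if ch in CONFUSABLE]
    if len(positions) > max_ambiguous:
        raise ValueError(f"{len(positions)} ambiguous positions; split the input first")
    chars = list(text)
    for combo in itertools.product(CONFUSABLE, repeat=len(positions)):
        for pos, ch in zip(positions, combo):
            chars[pos] = ch
        yield "".join(chars)


def streams_inflate(data: bytes) -> bool:
    """Optional check: each stream must hold a complete zlib stream (assumes FlateDecode)."""
    for match in STREAM_RE.finditer(data):
        inflater = zlib.decompressobj()
        try:
            inflater.decompress(match.group(1))
        except zlib.error:
            return False
        if not inflater.eof:  # truncated stream
            return False
    return True


def looks_like_pdf(data: bytes, check_streams: bool = True) -> bool:
    """Cheap structural test: header, end-of-file marker, and optionally streams."""
    if not data.startswith(b"%PDF-") or b"%%EOF" not in data[-1024:]:
        return False
    return streams_inflate(data) if check_streams else True


def resolve(text: str) -> List[bytes]:
    """Return the decoded bytes of every candidate that passes validation."""
    hits = []
    for cand in candidates(text):
        try:
            data = base64.b64decode(cand, validate=True)
        except ValueError:  # binascii.Error is a ValueError subclass
            continue
        if looks_like_pdf(data):
            hits.append(data)
    return hits
```

The search doubles with every ambiguous position, which is why the sketch caps the count. Testing edits serially, as suggested in the thread, would scale better on real attachments, and several candidates may survive when the ambiguous characters fall outside any compressed stream.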

CrowdRedact

Summary

  • A gamified, crowdsourced platform where volunteers help decode ambiguous PDFs and verify redaction accuracy, with built‑in version control and reputation scoring.
  • Includes automated redaction pattern detection and suggestions for missing or over‑redacted content.

Details

Key              Value
Target Audience  Volunteer researchers, open‑source communities, law‑enforcement analysts.
Core Feature     Collaborative PDF decoding, redaction verification, reputation system.
Tech Stack       Django, PostgreSQL, Celery, Redis, Vue.js, Docker.
Difficulty       High
Monetization     Revenue‑ready: subscription for premium analytics and API access.

Notes

  • “altairprime: Non‑engineers are perfectly willing to volunteer their time to do drudgery.” – highlights willingness to crowdsource.
  • “fragmede: ...” – need for reliable data entry and verification.
  • Discussion potential: how to incentivize accurate redaction and handle sensitive content responsibly.
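
One way the reputation system could feed into verification is a reputation‑weighted vote on each disputed character. The sketch below is purely illustrative: the function name `weighted_consensus`, the vote format, and the default weight for unknown users are all assumptions, not anything specified in the discussion.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def weighted_consensus(
    votes: Iterable[Tuple[str, str]],
    reputation: Dict[str, float],
    default_weight: float = 1.0,
) -> Tuple[str, float]:
    """Return the winning answer and its share of the total vote weight.

    `votes` is a sequence of (user, answer) pairs; users missing from
    `reputation` count with `default_weight` (an assumed policy).
    """
    totals: Dict[str, float] = defaultdict(float)
    for user, answer in votes:
        totals[answer] += reputation.get(user, default_weight)
    if not totals:
        raise ValueError("no votes cast")
    winner = max(totals, key=totals.get)
    return winner, totals[winner] / sum(totals.values())


# Hypothetical example: one trusted volunteer outweighs two newcomers.
votes = [("alice", "l"), ("bob", "1"), ("carol", "1")]
reputation = {"alice": 5.0}
winner, share = weighted_consensus(votes, reputation)
```

Returning the share alongside the winner lets a platform hold back low‑agreement positions for further review instead of accepting a narrow majority.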

SafeDoc

Summary

  • An open‑source, XML‑based document format (“SafeDoc”) designed to replace PDF for government releases, with built‑in redaction, metadata stripping, and easy conversion from PDF.
  • Provides a CLI and web UI for converting, editing, and publishing SafeDoc files.

Details

Key              Value
Target Audience  Government agencies, NGOs, and any organization needing secure, transparent document releases.
Core Feature     Simple, XML‑based format with mandatory redaction tags, metadata removal, and a converter from PDF.
Tech Stack       Rust for core engine, Python for CLI, React for web UI, OpenXML libraries.
Difficulty       High
Monetization     Revenue‑ready: paid support contracts and consulting.

Notes

  • “legitster: ...” – frustration with PDF’s complexity and redaction issues.
  • “legitster: ...” – desire for a new format that governments can adopt without hurdles.
  • Practical utility: reduces risk of accidental data leaks, simplifies auditing, and encourages transparency.
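
To make the redaction‑tag idea concrete, here is a small sketch of what publishing a SafeDoc file might involve. No SafeDoc schema exists in the discussion, so the element names (`safedoc`, `metadata`, `redacted`), the `reason` attribute, and the `publish` function are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical SafeDoc source document; the schema is invented for illustration.
SAMPLE = """<safedoc>
  <metadata><author>J. Doe</author></metadata>
  <body>
    <p>Meeting with <redacted reason="privacy">Jane Roe</redacted> on Tuesday.</p>
  </body>
</safedoc>"""


def publish(xml_text: str) -> str:
    """Return a release copy with metadata dropped and redacted content emptied."""
    root = ET.fromstring(xml_text)
    # Strip top-level metadata blocks entirely.
    for meta in root.findall("metadata"):
        root.remove(meta)
    # Empty each redacted element but keep the tag and its stated reason,
    # so readers can see that something was withheld and why.
    for node in list(root.iter("redacted")):
        node.text = None
        for child in list(node):
            node.remove(child)
    return ET.tostring(root, encoding="unicode")


released = publish(SAMPLE)
```

Because the redacted text is deleted from the tree and never just covered up, nothing remains in the output for a reader to recover, which is the failure mode the format is meant to prevent.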
