Project ideas from Hacker News discussions.

Optimizing Top K in Postgres

📝 Discussion Summary

Three prevailing themes in the discussion

1. Index design & query‑plan correctness
   Key points: Users debated whether a composite B‑tree index can satisfy a WHERE severity < 3 ORDER BY timestamp DESC LIMIT 10 query, and how Postgres actually walks the tree. The discussion touched on low‑cardinality enums, the skip‑scan optimisation in SQLite, and the btree_gin extension.
   • “Postgres with this index can walk to a part of the tree with severity < 3, but timestamps are sorted only for the same severity.” – jmgimeno
   • “If severity is a low cardinality enum, it still seems acceptable.” – dragon96
   • “The SQLite documentation explains how (and how well) this works: https://www.sqlite.org/optoverview.html#the_skip_scan_optimization.” – Cervisia
   • “Just in case, there is a btree_gin extension which can be used in queries combining gin‑indexable column and btree‑indexable column.” – Vadim_samokhin

2. Row‑based storage limits for filtering
   Key points: The row‑based format of many databases makes filtering on arbitrary columns expensive; users suggested columnar storage or an external warehouse instead.
   • “The issue here is the row based format. You simply can’t filter on arbitrary columns with that. Either use an external warehouse or a columnar plug‑in like Timescale.” – tacone

3. Lucene vs SQL for full‑text and top‑K queries
   Key points: Lucene is praised for its maturity and scalability on top‑K problems, while SQL full‑text search is seen as convenient but less powerful. The consensus is that the two systems serve different purposes and work best together, linked via document IDs.
   • “Lucene really does feel like magic sometimes. It was designed expressly to solve the top K problem at hyper‑scale. It’s incredibly mature technology.” – bob1029
   • “Every time I’ve used Lucene I have combined it with a SQL provider. It’s not necessarily about one or the other.” – bob1029

These themes capture the core technical concerns—index strategy, storage format, and the choice between specialized search engines and relational databases—expressed by the participants.
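jmgimeno’s point in theme 1 can be made concrete: an index on (severity, timestamp) sorts timestamps only within each severity, so the naive query still needs a sort, whereas one index‑ordered top‑10 per severity value plus a small merge — which is what a skip scan effectively does — avoids it. A minimal sketch, using SQLite (also discussed in the thread) so it runs self‑contained; the table and column names simply mirror the query under discussion:

```python
# Why (severity, timestamp) can't serve a global top-K directly, and the
# UNION ALL rewrite that emulates a skip scan by hand.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE benchmark_logs (severity INTEGER, timestamp INTEGER)")
conn.execute("CREATE INDEX sev_ts ON benchmark_logs (severity, timestamp)")
conn.executemany("INSERT INTO benchmark_logs VALUES (?, ?)",
                 [(i % 5, i) for i in range(5000)])

# Naive form: the index groups rows by severity first, so a global
# ORDER BY timestamp needs an extra sort over every matching row.
naive = conn.execute(
    "SELECT severity, timestamp FROM benchmark_logs"
    " WHERE severity < 3 ORDER BY timestamp DESC LIMIT 10"
).fetchall()

# Manual skip scan: each branch is a cheap index-order walk (no sort),
# and the outer ORDER BY only merges 3 x 10 candidate rows.
per_sev = ("SELECT * FROM (SELECT severity, timestamp FROM benchmark_logs"
           " WHERE severity = {} ORDER BY timestamp DESC LIMIT 10)")
rewritten = conn.execute(
    "SELECT severity, timestamp FROM ("
    + " UNION ALL ".join(per_sev.format(s) for s in (0, 1, 2))
    + ") ORDER BY timestamp DESC LIMIT 10"
).fetchall()

assert naive == rewritten
```

The per‑branch LIMIT must be at least K (here 10), since in the worst case all K results share one severity.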


🚀 Project Ideas

TopK Optimizer

Summary

  • Adds skip‑scan support for inequality predicates on composite B‑tree indexes in PostgreSQL, enabling efficient top‑K queries.
  • Provides automatic planner hints and index‑usage recommendations to eliminate manual tuning.

Details

Target Audience: PostgreSQL developers, DBAs, data engineers
Core Feature: Skip‑scan extension for composite indexes, planner hint injection, automated index recommendation
Tech Stack: C extension for PostgreSQL, PL/pgSQL wrappers, optional Rust for safety
Difficulty: Medium
Monetization: Hobby

Notes

  • Why HN commenters would love it: jmgimeno’s frustration about CREATE INDEX ON benchmark_logs (severity, timestamp) not being used; tacone’s point that row‑based format hinders top‑K; hrmtst93837’s note that Postgres handles top‑K well if indexes are right.
  • Potential for discussion: How to balance planner hints vs. automatic optimization; compatibility with Postgres 18+ skip‑scan limitations.
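The index‑recommendation piece could start as a small rule table over the query shape. The sketch below is purely hypothetical (the function name, rules, and placeholder table t are illustrative, not an existing API); it just encodes the advice that came up in the thread:

```python
# Hypothetical rule-based index advisor for queries of the form
#   WHERE <filter_col> <filter_op> ? ORDER BY <order_col> DESC LIMIT k
def recommend(filter_col, filter_op, order_col, low_cardinality):
    if filter_op == "=":
        # Equality pins the index prefix, so the suffix stays sorted:
        # the composite index answers the query with no sort at all.
        return f"CREATE INDEX ON t ({filter_col}, {order_col})"
    if low_cardinality:
        # Inequality over few distinct values: a skip scan over the
        # composite index merges one sorted run per matching value.
        return f"CREATE INDEX ON t ({filter_col}, {order_col})  -- via skip scan"
    # High-cardinality inequality: walk the order column's index in
    # order and filter rows until k matches are found.
    return f"CREATE INDEX ON t ({order_col})"
```

For example, recommend("severity", "<", "timestamp", True) returns the composite‑index suggestion with the skip‑scan note; a real version would pull cardinality estimates from pg_stats instead of taking a flag.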

Columnar Switch

Summary

  • A lightweight tool that automatically converts selected PostgreSQL tables to a columnar format (cstore_fdw, Timescale, or native columnar) for analytics workloads.
  • Minimizes operational overhead by providing a simple CLI and web UI to toggle between row and columnar storage.

Details

Target Audience: PostgreSQL users needing analytics without external warehouses
Core Feature: Automatic table conversion, data migration, query routing, minimal downtime
Tech Stack: Go or Python CLI, Docker, cstore_fdw/Timescale, PostgreSQL hooks
Difficulty: Medium
Monetization: Hobby

Notes

  • Why HN commenters would love it: tacone’s call for an external warehouse or columnar plug‑in; hrmtst93837’s caution about operational overhead; the need for a “plugin instead of having N databases”.
  • Potential for discussion: trade‑offs between row vs. columnar, impact on transactional workloads, integration with existing ORMs.
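tacone’s row‑vs‑columnar point, which this idea builds on, can be illustrated with a toy in‑memory model (plain Python lists, not Timescale or cstore_fdw): in a row layout the predicate drags every column of every row through the scan, while a columnar layout touches only the filtered column and then materialises just the column it needs.

```python
# Same data in two layouts.
rows = [(i % 5, i, f"msg-{i}") for i in range(1000)]   # row store: tuples
cols = {                                               # column store: arrays
    "severity":  [r[0] for r in rows],
    "timestamp": [r[1] for r in rows],
    "message":   [r[2] for r in rows],
}

# Row layout: every tuple is visited, unused columns and all.
row_hits = [r[1] for r in rows if r[0] < 3]

# Columnar layout: scan only the severity array to build the selection,
# then fetch just the timestamp column for the selected positions.
sel = [i for i, s in enumerate(cols["severity"]) if s < 3]
col_hits = [cols["timestamp"][i] for i in sel]

assert row_hits == col_hits
```

The asymptotics are the same; the win comes from I/O and cache behaviour — the columnar scan reads one dense array instead of wide heterogeneous rows — which is why it matters for analytics but can hurt transactional point lookups.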

LuceneSQL Bridge

Summary

  • A unified API service that maps Lucene document IDs to PostgreSQL rows, allowing developers to perform full‑text search and analytics through a single query interface.
  • Handles ID mapping, schema synchronization, and provides a lightweight REST/GraphQL endpoint.

Details

Target Audience: Full‑text search engineers, data scientists, backend developers
Core Feature: ID mapping layer, combined search/analytics queries, schema sync
Tech Stack: Node.js/Express, Elasticsearch/Lucene, PostgreSQL, Docker
Difficulty: Medium
Monetization: Hobby

Notes

  • Why HN commenters would love it: bob1029’s praise for Lucene’s scalability; the discussion about mixing Lucene with SQL; the desire for a single “document id” approach.
  • Potential for discussion: best practices for schema sync, handling updates, latency trade‑offs.
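The ID‑mapping idea bob1029 describes can be sketched in a few lines: a toy inverted index stands in for Lucene (no real Lucene or Elasticsearch here), and its hits are resolved back to relational rows by document ID. All names below are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, author TEXT, body TEXT)")
docs = [(1, "bob", "lucene feels like magic"),
        (2, "ann", "postgres top k queries"),
        (3, "bob", "lucene with a sql provider")]
conn.executemany("INSERT INTO docs VALUES (?, ?, ?)", docs)

# "Search side": term -> doc ids, the shape of what Lucene returns.
index = {}
for doc_id, _, body in docs:
    for term in body.split():
        index.setdefault(term, set()).add(doc_id)

def search_then_fetch(term):
    ids = sorted(index.get(term, ()))
    if not ids:
        return []
    # "SQL side": resolve the ids to full rows for joins/analytics.
    qmarks = ",".join("?" * len(ids))
    return conn.execute(
        f"SELECT id, author FROM docs WHERE id IN ({qmarks}) ORDER BY id",
        ids,
    ).fetchall()

print(search_then_fetch("lucene"))  # [(1, 'bob'), (3, 'bob')]
```

The hard parts the bridge would actually have to solve sit outside this sketch: keeping the index and the table in sync on updates, and deciding which side ranks and which side pages.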
