How close is this to the way MySQL does it with InnoDB? It appears to be about t...

seunosewa · on May 7, 2025

Yep. It's a low hanging fruit they should've picked years ago.

They will eventually figure out using b-trees for tables too.

tehlike · on May 7, 2025

There's a lot more PG needs to do for storage layout / access pov, and they have been working on it for a while. Orioledb has shown what might be opssible, and they have been upstreaming it.

Having the ability to do something LSM as a storage engine would be great - and potentially allow better compression than what we currently get with TOAST - which is not a lot... PG doesn't even have oob page compression...

ARandomerDude · on May 7, 2025

Can you elaborate on this B-tree part of your comment? I know B-tree is the default index type in pg, but it sounds like there’s more to the story that I’m not familiar with.

greenavocado · on May 7, 2025

PostgreSQL uses heap files for the primary table storage, not B-trees. In PostgreSQL table data is primarily stored in heap files (unordered collections of pages/blocks). Indexes (including primary key indexes) use B-trees (specifically B+ trees). When you query a table via an index, the B-tree index points to locations in the heap file

InnoDB uses a clustered index approach. The primary key index is a B-tree. The actual table data is stored in the leaf nodes of this B-tree. Secondary indexes point to the primary key.

One is not better than the other in general terms. InnoDB's clustered B-tree approach shines when:

You frequently access data in primary key order

Your workload has many range scans on the primary key

You need predictable performance for primary key lookups

Your data naturally has a meaningful ordering that matches your access patterns

PostgreSQL's heap approach excels when:

You frequently update non-key columns (less page splits/reorganization)

You have many secondary indexes (they're smaller without primary keys)

Your access patterns vary widely and don't follow one particular field

You need faster table scans when indexes aren't applicable

I personally find PostgreSQL's approach more flexible for complex analytical workloads with unpredictable access patterns, while InnoDB's clustered approach feels more optimized for OLTP workloads with predictable key-based access patterns. The "better" system depends entirely on your specific workload, data characteristics, and access patterns.

Sesse__ · on May 7, 2025

Indexes that point directly to the disk column are also significantly faster to access; it is a persistent pain point for OLAP on InnoDB that all secondary indexes are indirect. You can work around it by adding additional columns to the index to make your lookups covering, but it's kludgy and imprecise and tends to bloat the index even further. (The flip side is that if you have tons of indexes, and update some unrelated column, InnoDB doesn't need to update those indexes to point to the location of the new row. But I'm generally very rarely annoyed by that in comparison.)

sroussey · on May 8, 2025

On commercial databases from Oracle and Microsoft, you can cluster the DB on any index. Really would love for Postresql to do the same.

cwbriscoe · on May 8, 2025

You can but new rows will not be clustered until you re-cluster.

CLUSTER table_name USING index_name;

sroussey · on May 10, 2025

I remember that for MySQL MyISAM (syntax was different) in the olden days. I thought Oracle let you do it though. My bad.

MySQL Innodb and other have real clustered primary key indexes with all the pluses and minuses that entails.

farazbabar · on May 7, 2025

Don't forget high speed committed writes to append only tables (the opposite of scans), postgres approach is better here as well.

saltcured · on May 7, 2025

It's also deeply entwined with the MVCC concurrency control and the ability to do DDL in transactions, right?

jiggawatts · on May 7, 2025

SQL Server supports every combination of heap storage, clustered storage, MVCC, and DDL in transactions.

foobahhhhh · on May 8, 2025

Takes me back 20 years to using SQL Server! It let you choose clustered index or not IIRC.

cryptonector · on May 7, 2025

SQLite3 only has b-trees for files and indices.

PostgreSQL only has heaps for tables, and various other data structures for indices, with b-tree being the default for indices.

There are many cases where having b-trees for tables would make performance better.

GP is basically poking fun at PG for still having nothing but heaps for tables.

ksec · on May 10, 2025

Despite having relatively fewer improvement on the MySQL side, especially the OSS version and not the enterprise version, there are still many areas Postgres have not exceeded MySQL, and doesn't look like will be doing so in next 3 - 4 years.