
JOE was written in the final days of expensive memory, and it was designed to edit files larger than memory. Even today this is sometimes useful: you can edit an 8 GB file on a 32-bit machine.

It uses a doubly linked list of gap buffers. Each gap buffer has a header and a 4K data page. The headers are always in memory, but the data pages can be swapped out to a file in /tmp. The memory usage limit is 32 MB. Possibly this is no longer a good idea: it's quite possible to have more RAM than /tmp space.
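To make the "gap buffer in a 4K page" idea concrete, here is a minimal sketch of a single page. This is not JOE's code: `struct hdr`, `gap_move`, `gap_insert` and `gap_text` are hypothetical names, and the page is assumed to hold its text as `page[0..hole)` followed by `page[ehole..4096)`, with the unused gap in between.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PGSIZE 4096 /* 4K data page */

/* Hypothetical header: one per gap buffer, linked into a doubly linked list. */
struct hdr {
    struct hdr *next, *prev; /* neighbouring gap buffers */
    unsigned char *page;     /* the 4K data page */
    size_t hole;             /* gap start: text is page[0..hole) ... */
    size_t ehole;            /* ... plus page[ehole..PGSIZE) */
};

/* Number of text bytes stored in the page (page size minus gap size). */
static size_t hdr_len(const struct hdr *h) {
    return PGSIZE - (h->ehole - h->hole);
}

/* Move the gap so that it starts at text offset 'at'. */
static void gap_move(struct hdr *h, size_t at) {
    assert(at <= hdr_len(h));
    if (at < h->hole) {            /* shift text right, across the gap */
        size_t n = h->hole - at;
        memmove(h->page + h->ehole - n, h->page + at, n);
        h->hole -= n;
        h->ehole -= n;
    } else if (at > h->hole) {     /* shift text left, across the gap */
        size_t n = at - h->hole;
        memmove(h->page + h->hole, h->page + h->ehole, n);
        h->hole += n;
        h->ehole += n;
    }
}

/* Insert at most as many bytes as the gap can hold; returns bytes inserted.
   A real editor would split the page and link in a new header when full. */
static size_t gap_insert(struct hdr *h, size_t at, const unsigned char *s, size_t n) {
    size_t room = h->ehole - h->hole;
    if (n > room)
        n = room;
    gap_move(h, at);
    memcpy(h->page + h->hole, s, n);
    h->hole += n;
    return n;
}

/* Copy the page's text, without the gap, into out; returns its length. */
static size_t gap_text(const struct hdr *h, unsigned char *out) {
    memcpy(out, h->page, h->hole);
    memcpy(out + h->hole, h->page + h->ehole, PGSIZE - h->ehole);
    return hdr_len(h);
}
```

Inserting only moves the bytes between the old and new gap position, so repeated typing at one spot costs little even inside a full page.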

The header has the data page's offset in the swap file, the link pointers, the gap location and a count of the number of newlines in the gap buffer.
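Because the headers stay resident, whole-buffer totals can be computed without swapping in any data page. The sketch below assumes a hypothetical header with the four fields listed above and a NULL-terminated list (JOE's real list layout may differ, for example a circular list with a sentinel).

```c
#include <assert.h>
#include <stddef.h>

#define PGSIZE 4096

/* Hypothetical header with the fields described above. */
struct hdr {
    long seg;                /* offset of the data page in the swap file */
    struct hdr *next, *prev; /* link pointers */
    unsigned short hole;     /* gap start within the page */
    unsigned short ehole;    /* gap end within the page */
    long nlines;             /* newlines stored in this gap buffer */
};

/* Walk the headers only: byte and line totals need no page access. */
static void buffer_totals(const struct hdr *first, long *bytes, long *lines) {
    *bytes = 0;
    *lines = 0;
    for (const struct hdr *h = first; h; h = h->next) {
        assert(h->ehole >= h->hole);
        *bytes += PGSIZE - (h->ehole - h->hole); /* page size minus gap */
        *lines += h->nlines;
    }
}
```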

When a file is read in, the gap buffers are completely full. So read-in turns into a direct read of the file into memory (or into the swap file). The only extra work is counting the newlines in each 4K data page and generating the headers.
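A minimal sketch of that read-in loop, assuming a hypothetical in-memory `struct hdr` and ignoring the swap file: each 4K chunk read from the stream becomes a full page with an empty gap, and the only per-byte work is the newline count. `read_in` and `free_list` are illustrative names, not JOE's.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define PGSIZE 4096

/* Hypothetical header: links, page, gap position and newline count. */
struct hdr {
    struct hdr *next, *prev;
    unsigned char *page;
    size_t hole, ehole; /* text is page[0..hole) + page[ehole..PGSIZE) */
    long nlines;
};

static void free_list(struct hdr *h) {
    while (h) {
        struct hdr *next = h->next;
        free(h->page);
        free(h);
        h = next;
    }
}

/* Read a stream into a list of full gap buffers.
   Returns NULL for empty input or on allocation failure. */
static struct hdr *read_in(FILE *f) {
    struct hdr *first = NULL, *last = NULL;
    for (;;) {
        unsigned char *page = malloc(PGSIZE);
        if (!page) { free_list(first); return NULL; }
        size_t n = fread(page, 1, PGSIZE, f); /* direct read into the page */
        if (n == 0) { free(page); break; }
        assert(n <= PGSIZE);
        struct hdr *h = malloc(sizeof *h);
        if (!h) { free(page); free_list(first); return NULL; }
        h->page = page;
        h->hole = n;       /* gap is empty for a full page; only the */
        h->ehole = PGSIZE; /* final, short page has a gap at its end */
        h->nlines = 0;
        for (size_t i = 0; i < n; i++)
            if (page[i] == '\n')
                h->nlines++;
        h->next = NULL;
        h->prev = last;
        if (last) last->next = h; else first = h;
        last = h;
    }
    return first;
}
```

A 5000-byte file, for example, becomes one full page plus one page holding the remaining 904 bytes.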

The newline count is there to speed up seeks to specific line numbers. [A long-standing enhancement idea is to generate the newline count on demand and use mmap. This would allow the read-in to be a NOP: just demand-load the pages from the original file as needed and use copy-on-write when any change is made, to preserve the original. But I'm also not sure it's a good idea not to take a snapshot of the original file, so this should probably be optional.]
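Here is a sketch of how a per-page newline count speeds up a line seek: whole pages are skipped by subtracting their counts, and only the page containing the target line is scanned byte by byte. The structure and the `seek_line` helper are hypothetical; lines are assumed to be numbered from 0.

```c
#include <assert.h>
#include <stddef.h>

#define PGSIZE 4096

/* Hypothetical header: only what a line seek needs. */
struct hdr {
    struct hdr *next;
    const unsigned char *page;
    size_t hole, ehole;
    long nlines;
};

static size_t hdr_len(const struct hdr *h) {
    assert(h->ehole >= h->hole);
    return PGSIZE - (h->ehole - h->hole);
}

/* Byte at text offset i, skipping over the gap. */
static unsigned char hdr_byte(const struct hdr *h, size_t i) {
    return i < h->hole ? h->page[i] : h->page[i + (h->ehole - h->hole)];
}

/* Find the start of 0-based line 'line'. Returns 1 and sets *ph and *pofs
   on success, 0 if the buffer has fewer lines. */
static int seek_line(const struct hdr *h, long line,
                     const struct hdr **ph, size_t *pofs) {
    if (!h) return 0;
    if (line == 0) { *ph = h; *pofs = 0; return 1; }
    for (; h; h = h->next) {
        if (h->nlines < line) { /* target is past this page: skip it */
            line -= h->nlines;
            continue;
        }
        size_t len = hdr_len(h); /* scan only this one page */
        for (size_t i = 0; i < len; i++)
            if (hdr_byte(h, i) == '\n' && --line == 0) {
                *ph = h;
                *pofs = i + 1; /* first byte after that newline */
                return 1;
            }
    }
    return 0;
}
```

With the counts in the headers, a seek touches one data page instead of every page before the target.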

JOE uses smart pointers to the edit buffer. Each pointer has the address of the header and a memory pointer to the data page (which is always swapped in if there is a pointer to it). The software virtual memory system has a reference count on each page. Each pointer holds a reference on the data page it's pointing to. If there is no pointer to a page, the reference count is zero, so it can be swapped out.
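The reference-counting rule can be sketched as pin/unpin/evict operations on a page. This is a simplified model, not JOE's VM: `struct vpage`, `vpin`, `vunpin` and `vevict` are hypothetical, the swap file is any `FILE *` opened for update, and every eviction writes the page back (a real system would likely track whether the page is dirty).

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define PGSIZE 4096

/* Hypothetical per-page bookkeeping for a software VM. */
struct vpage {
    long seg;            /* offset of this page in the swap file */
    unsigned char *data; /* resident copy, or NULL when swapped out */
    int refs;            /* pointers currently referencing the page */
};

/* A pointer takes a reference: swap the page in if needed. */
static unsigned char *vpin(struct vpage *v, FILE *swap) {
    if (!v->data) {
        v->data = malloc(PGSIZE);
        if (!v->data) return NULL;
        if (fseek(swap, v->seg, SEEK_SET) != 0 ||
            fread(v->data, 1, PGSIZE, swap) != PGSIZE) {
            free(v->data);
            v->data = NULL;
            return NULL;
        }
    }
    v->refs++;
    return v->data;
}

/* A pointer releases its reference. */
static void vunpin(struct vpage *v) {
    assert(v->refs > 0);
    v->refs--;
}

/* Only a page with a zero reference count may be written out and freed. */
static int vevict(struct vpage *v, FILE *swap) {
    if (v->refs != 0 || !v->data) return 0;
    if (fseek(swap, v->seg, SEEK_SET) != 0 ||
        fwrite(v->data, 1, PGSIZE, swap) != PGSIZE)
        return 0;
    free(v->data);
    v->data = NULL;
    return 1;
}
```

The `fseek` before every read or write is also what the C standard requires when a stream switches between input and output.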

The other purpose of the smart pointers is to automatically stick to the text they are pointing to, even through insert and delete operations. So if you insert at one point in the file, any pointers to further locations are updated (including line number, byte offset, column number and memory offset).
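A minimal sketch of the "sticky pointer" update, tracking only byte offset and line number (column and memory offset are left out for brevity). The names are hypothetical, and two policies are assumptions: a pointer exactly at the insertion point stays before the new text, and pointers inside a deleted range collapse to its start.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical smart pointer, reduced to two of its fields. */
struct ptr {
    long byte; /* absolute byte offset in the buffer */
    long line; /* 0-based line number */
};

/* n bytes containing nl newlines were inserted at offset 'at'. */
static void ptrs_insert(struct ptr *ps, size_t count, long at, long n, long nl) {
    assert(n >= 0 && nl >= 0);
    for (size_t i = 0; i < count; i++)
        if (ps[i].byte > at) { /* pointers past the insert point shift */
            ps[i].byte += n;
            ps[i].line += nl;
        }
}

/* n bytes containing nl newlines were deleted starting at 'at',
   which lies on line 'at_line'. */
static void ptrs_delete(struct ptr *ps, size_t count, long at, long at_line,
                        long n, long nl) {
    assert(n >= 0 && nl >= 0);
    for (size_t i = 0; i < count; i++) {
        if (ps[i].byte >= at + n) {   /* after the range: shift back */
            ps[i].byte -= n;
            ps[i].line -= nl;
        } else if (ps[i].byte > at) { /* inside the range: collapse */
            ps[i].byte = at;
            ps[i].line = at_line;
        }
    }
}
```

For the text "ab\ncd", inserting "x\n" after the 'a' moves a pointer on 'c' from byte 3, line 1 to byte 5, line 2; deleting those two bytes again moves it back.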



Guessing from your user name, are you one of the authors of JOE? That seems rather likely, and is just generally why HN is awesome. Thanks for that very well-written explanation.


Remarkably satisfying treatise. Thank you very much.


Thank you, that was really interesting.



