
One of the shockers I came across with MongoDB is that every occurrence of a key is stored in full. There is no form of a symbol table for the keys, so this means a huge amount of data overhead if each 'row' of data uses keys at all, which they almost certainly do. No one just uses arrays.
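To make the point concrete, here's a rough back-of-the-envelope sketch (field names and counts are made up for illustration): because each document serializes its own key strings, the bytes spent on key names grow with the document count and can easily exceed the bytes spent on the actual values.

```python
import json

# Hypothetical collection: 1000 small documents with verbose field names.
docs = [{"customer_name": "A", "transaction_amount": 1} for _ in range(1000)]

# Key strings are repeated in every serialized document...
key_bytes = sum(len(k) for k in docs[0]) * len(docs)

# ...while the values themselves are tiny by comparison.
value_bytes = len(json.dumps(list(docs[0].values()))) * len(docs)

print(key_bytes, value_bytes)  # key overhead dwarfs the data itself
```

This ignores BSON framing details, but the proportions are the point: with short values and long names, most of each document is field names.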


I'm not really sure what you mean by 'rows', given that the data is JSON, but anyway.

I think what you are referring to is the tokenization of field names: https://jira.mongodb.org/browse/SERVER-863


Right, what I really meant was a document. I was trying to relate it to SQL, but I may have made it more confusing than anything.

That issue is what I was referring to, and it would be a good step forward. Still, it's a shame you have the overhead of field names in the first place. I understand why it is the way it is, being schemaless, and in terms of scaling it isn't a huge issue, since the overhead grows linearly with the data and is therefore manageable. But in most cases it's still huge compared to the size of the data itself.
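Until something like SERVER-863 lands, a common workaround is to abbreviate field names at the application layer. A minimal sketch (the field names and mapping here are hypothetical, not anything MongoDB provides):

```python
# Application-level aliasing: store short keys, expose long ones.
FIELD_MAP = {"customer_name": "cn", "transaction_amount": "ta"}
REVERSE_MAP = {v: k for k, v in FIELD_MAP.items()}

def shrink(doc):
    """Rewrite keys to their short aliases before inserting."""
    return {FIELD_MAP.get(k, k): v for k, v in doc.items()}

def expand(doc):
    """Restore the readable names after reading back."""
    return {REVERSE_MAP.get(k, k): v for k, v in doc.items()}

original = {"customer_name": "Alice", "transaction_amount": 42}
stored = shrink(original)       # {"cn": "Alice", "ta": 42}
assert expand(stored) == original
```

The trade-off is readability: anything that inspects the collection directly (shell queries, indexes, ad-hoc scripts) now sees the cryptic short names.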

I'm not sure how they'll fix it, and I don't know much about other schemaless DBs, but perhaps some sort of pattern recognition over field names would be appropriate. Now that MongoDB has lots of funding for research, it will be interesting to see what they come up with.


As in an INFORMATION_SCHEMA collection?



