Wednesday, October 26, 2011

InfoQ Article: Finding the Right Data Solution for Your Application in the Data Storage Haystack

The article "Finding the Right Data Solution for Your Application in the Data Storage Haystack" just went live on InfoQ and you can also find the slide deck of the NoSQL Now talk in this earlier post.

I firmly believe that scale and some other problems are so hard because we are too lazy to consider specific cases and analyze them in detail. Instead, we try to find general answers that work everywhere. For example, I cannot find a taxonomy of Computer Science use cases/domains anywhere (I will write about this in a later post).

The article takes four parameters of an application/use case, then takes some 40+ cases that arise from different combinations of those parameters, and makes concrete recommendations for each case from the storage solutions haystack (e.g. local memory, relational, files, distributed cache, column family storage, document storage, name-value pairs, graph DBs, service registries, queues, tuple spaces, etc.).

It is intended as a guide to choosing the right storage solution (SQL or NoSQL) for a given use case.

The four parameters are:
  • Types of data stored (structured, unstructured, semi-structured)
  • Scalability requirements (small: 1-4, medium: 10s, very large: 100s)
  • Nature of data retrieval (i.e. types of queries: key lookup, WHERE, JOIN, offline)
  • Consistency requirements (ACID, single atomic operation, loose consistency)
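
To make the shape of that decision space concrete, here is a minimal Python sketch that models the four parameters as enumerations and a use case as one combination of them. All class and function names are my own for illustration, and the single entry in the lookup table is a placeholder, not a recommendation quoted from the article. The full cross product of the listed values is 3 x 3 x 4 x 3 = 108 combinations, of which the article covers a subset.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product
from typing import Optional


class DataType(Enum):
    STRUCTURED = "structured"
    SEMI_STRUCTURED = "semi-structured"
    UNSTRUCTURED = "unstructured"


class Scale(Enum):
    # Values copied from the list above; the unit is not stated there.
    SMALL = "1-4"
    MEDIUM = "10s"
    VERY_LARGE = "100s"


class Query(Enum):
    KEY_LOOKUP = "key lookup"
    WHERE = "WHERE"
    JOIN = "JOIN"
    OFFLINE = "offline"


class Consistency(Enum):
    ACID = "ACID"
    SINGLE_ATOMIC_OP = "single atomic operation"
    LOOSE = "loose consistency"


@dataclass(frozen=True)  # frozen makes instances hashable, so they can be dict keys
class UseCase:
    data_type: DataType
    scale: Scale
    query: Query
    consistency: Consistency


# Hypothetical lookup table. The one entry below is an illustrative
# placeholder; the real recommendations live in the article itself.
RECOMMENDATIONS = {
    UseCase(DataType.STRUCTURED, Scale.SMALL, Query.JOIN, Consistency.ACID): "Relational",
}


def recommend(case: UseCase) -> Optional[str]:
    """Return the stored recommendation for a use case, or None if the table has none."""
    return RECOMMENDATIONS.get(case)


# Every possible combination of the four parameters.
ALL_CASES = [UseCase(*combo) for combo in product(DataType, Scale, Query, Consistency)]

if __name__ == "__main__":
    print(len(ALL_CASES))
    print(recommend(UseCase(DataType.STRUCTURED, Scale.SMALL, Query.JOIN, Consistency.ACID)))
```

Keying the table on an immutable `UseCase` keeps each recommendation tied to one explicit combination of parameters, which is the kind of case-by-case analysis argued for above.
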
I do not consider the second part to be finished by any means. I am sure there is a lot to argue about and analyze there, so please let me know if you have any thoughts on this.
