The relational model takes the information that we want to store and divides it into tuples (rows). A tuple is a limited data structure: it captures a set of values, so you cannot nest one tuple within another to get nested records, nor can you put a list of values or tuples inside a tuple. This simplicity underpins the relational model: it allows us to think of all operations as operating on and returning tuples.
Example: Let’s assume we have to build an e-commerce website. We can use this
example to model the data using a relational data store as well as NoSQL data
stores, and talk about their pros and cons.
As we’re good relational soldiers, everything is properly normalized, so that
no data is repeated in multiple tables. We also have referential integrity, so
we need a set of tables related through foreign keys: Customer, Orders, Product,
BillingAddress, OrderItem, Address, OrderPayment, and so on.
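A minimal sketch of that normalized layout, using Python's built-in sqlite3 module and an in-memory database. The table and column names are assumptions chosen for illustration, and only four of the tables are shown. Reassembling one order takes a join across all of them.

```python
import sqlite3

# In-memory database; table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE product  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(id)
);
CREATE TABLE order_item (
    order_id   INTEGER NOT NULL REFERENCES orders(id),
    product_id INTEGER NOT NULL REFERENCES product(id),
    price      REAL NOT NULL
);
""")

# Each fact is stored once, in its own table.
conn.execute("INSERT INTO customer VALUES (1, 'Martin')")
conn.execute("INSERT INTO product VALUES (27, 'NoSQL Distilled')")
conn.execute("INSERT INTO orders VALUES (99, 1)")
conn.execute("INSERT INTO order_item VALUES (99, 27, 32.45)")

# Reading one order back means joining the tables together again.
rows = conn.execute("""
    SELECT c.name, o.id, p.name, oi.price
    FROM customer c
    JOIN orders o      ON o.customer_id = c.id
    JOIN order_item oi ON oi.order_id = o.id
    JOIN product p     ON p.id = oi.product_id
""").fetchall()
print(rows)
```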
An aggregate-oriented store would instead hold the same data as nested records,
for example with customers and orders kept separately:
// in customers
{
  "id": 1,
  "name": "Martin",
  "billingAddress": [{"city": "Chicago"}]
}
// in orders
{
  "id": 99,
  "customerId": 1,
  "orderItems": [
    {
      "productId": 27,
      "price": 32.45,
      "productName": "NoSQL Distilled"
    }
  ],
  "shippingAddress": [{"city": "Chicago"}],
  "orderPayment": [
    {
      "ccinfo": "1000-1000-1000-1000",
      "txnId": "abelif879rft",
      "billingAddress": {"city": "Chicago"}
    }
  ]
}
The same data with the orders embedded in the customer:
// in customers
{
  "customer": {
    "id": 1,
    "name": "Martin",
    "billingAddress": [{"city": "Chicago"}],
    "orders": [
      {
        "id": 99,
        "customerId": 1,
        "orderItems": [
          {
            "productId": 27,
            "price": 32.45,
            "productName": "Data Modelling"
          }
        ],
        "shippingAddress": [{"city": "Chicago"}],
        "orderPayment": [
          {
            "ccinfo": "1000-1000-1000-1000",
            "txnId": "abelif879rft",
            "billingAddress": {"city": "Chicago"}
          }
        ]
      }
    ]
  }
}
In the first model above, with separate customer and order records, we have
two main aggregates: customer and order. We use composition to show how data
fits into the aggregation structure. The customer contains a list of billing
addresses; the order contains a list of order items, a shipping address, and
payments. The payment itself contains a billing address for that payment.
A single logical address record appears three times in the example
data, but instead of using IDs it’s treated as a value and copied each time.
This fits the domain, where we would not want the shipping address or the
payment’s billing address to change. In a relational database, we would ensure
that the address rows aren’t updated for this case, making a new row instead.
With aggregates, we can copy the whole address structure into the aggregate as
we need to.
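A small sketch of treating the address as a copied value. The place_order helper, the reuse of the billing address as the shipping address, and the later move to "Boston" are all assumptions made up for this illustration.

```python
import copy

customer = {"id": 1, "name": "Martin", "billingAddress": [{"city": "Chicago"}]}

def place_order(order_id, customer, items):
    """Build an order aggregate, copying the address values into it."""
    address = customer["billingAddress"][0]
    return {
        "id": order_id,
        "customerId": customer["id"],
        "orderItems": items,
        # Copies, not references: later edits to the customer do not reach them.
        "shippingAddress": [copy.deepcopy(address)],
        "orderPayment": [{"billingAddress": copy.deepcopy(address)}],
    }

order = place_order(99, customer, [{"productId": 27, "price": 32.45}])

# The customer later moves; the order keeps the address it was placed with.
customer["billingAddress"][0]["city"] = "Boston"
print(order["shippingAddress"][0]["city"])
```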
The link between the customer and the order isn’t within either
aggregate; it’s a relationship between aggregates. Similarly, the link from an
order item would cross into a separate aggregate structure for products. The
product name is included as part of the order item here. This kind of
denormalization is similar to the tradeoffs made with relational databases, but
is more common with aggregates because we want to minimize the number of
aggregates we access during a data interaction.
The important thing to notice here isn’t the particular way we’ve
drawn the aggregate boundary so much as the fact that we have to think about
accessing that data—and make that part of our thinking when developing the
application data model. Indeed we could draw our aggregate boundaries
differently, putting all the orders for a customer into the customer aggregate.
With that alternative boundary, an example Customer with its Orders embedded
looks like the listing above that is nested under the "customer" key.
Like most things in modeling, there’s no universal answer for how
to draw your aggregate boundaries. It depends entirely on how you tend to
manipulate your data. If you tend to access a customer together with all of
that customer’s orders at once, then you would prefer a single aggregate.
However, if you tend to focus on accessing a single order at a time, then you
should prefer having separate aggregates for each order. Naturally, this is
very context-specific; some applications will
prefer one or the other.
The clinching reason for aggregate orientation is that it
helps greatly with running on a cluster, which is the killer argument for the
rise of NoSQL. If we’re running on a cluster, we need to minimize how many
nodes we need to query when we are gathering data. By explicitly including
aggregates, we give the database important information about which bits of data
will be manipulated together, and thus should live on the same node.
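As a rough sketch of why this helps, the snippet below places each aggregate on a node by hashing its key. The node names and the node_for function are hypothetical, and plain modulo hashing is used only for brevity; it is not a claim about how any particular database distributes data.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical node names

def node_for(aggregate_key, nodes=NODES):
    """Pick a node from a stable hash of the aggregate's key."""
    digest = hashlib.sha256(aggregate_key.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]

# Everything inside the aggregate is stored under one key, so reading
# the whole order touches exactly one node.
order_node = node_for("order:99")
print(order_node)
```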
Key-Value and Document Stores - Data Models
Key-value and document databases are strongly
aggregate-oriented. The two models differ in that in a
key-value database, the aggregate is opaque to the database—just some big blob
of mostly meaningless bits. In contrast, a document database is able to see a
structure in the aggregate. The advantage of opacity is that we can store
whatever we like in the aggregate. The database may impose some general size
limit, but other than that we have complete freedom. A document database
imposes limits on what we can place in it, defining allowable structures and
types. In return, however, we get more flexibility in access.
With a key-value store, we can
only access an aggregate by lookup based on its key. With a document database,
we can submit queries to the database based on the fields in the aggregate, we
can retrieve part of the aggregate rather than the whole thing, and the database
can create indexes based on the contents of the aggregate.
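The difference in access paths can be sketched with two toy in-memory classes. KeyValueStore and DocumentStore are hypothetical names invented for this example and do not correspond to any real product's API.

```python
import json

class KeyValueStore:
    """Stores opaque bytes; the only access path is the key."""
    def __init__(self):
        self._data = {}

    def put(self, key, blob):
        self._data[key] = blob

    def get(self, key):
        return self._data[key]

class DocumentStore:
    """Stores parsed documents, so it can filter and project on fields."""
    def __init__(self):
        self._docs = []

    def insert(self, doc):
        self._docs.append(doc)

    def find(self, field, value, fields=None):
        hits = [d for d in self._docs if d.get(field) == value]
        if fields is None:
            return hits
        # Return only the requested fields of each matching document.
        return [{f: d[f] for f in fields if f in d} for d in hits]

order = {"id": 99, "customerId": 1, "shippingAddress": [{"city": "Chicago"}]}

kv = KeyValueStore()
kv.put("order:99", json.dumps(order).encode("utf-8"))
blob = kv.get("order:99")  # the whole aggregate; the caller has to parse it

docs = DocumentStore()
docs.insert(order)
partial = docs.find("customerId", 1, fields=["id"])  # query by field, partial result
print(partial)
```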
When modeling data aggregates, we need to consider how the data is going to be
read as well as the side effects on data related to those aggregates.
Let’s start with the model where all the data for the customer is embedded,
using a key-value store.
In this scenario, the
application can read the customer’s information and all the related data by
using the key. If the requirements are to read the orders or the products sold
in each order, the whole object has to be read and then parsed on the client
side to build the results. When references are needed, we could switch to
document stores and then query inside the documents, or even change the data
for the key-value store to split the value object into
Customer and Order objects and then maintain these objects’
references to each other.
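A sketch of that client-side work, with a plain dictionary standing in for the key-value store. The key format "customer:1" and the products_sold helper are assumptions made for this example.

```python
import json

# Value stored under the customer's key: one opaque JSON blob.
store = {
    "customer:1": json.dumps({
        "customer": {
            "id": 1,
            "name": "Martin",
            "orders": [
                {"id": 99, "orderItems": [
                    {"productId": 27, "price": 32.45,
                     "productName": "Data Modelling"}
                ]}
            ],
        }
    })
}

def products_sold(store, customer_key):
    """Fetch the whole customer value, then pick out the products client side."""
    customer = json.loads(store[customer_key])["customer"]
    return [
        (order["id"], item["productName"])
        for order in customer["orders"]
        for item in order["orderItems"]
    ]

print(products_sold(store, "customer:1"))
```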
With the references, we can now find the orders independently of the
Customer, and with the orderId references in the Customer we can find all
Orders for that Customer. Using aggregates this way allows for read
optimization, but we have to push the orderId reference into the Customer every
time a new Order is created.
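The bookkeeping described here can be sketched with two dictionaries standing in for the stored Customer and Order aggregates. The orderIds field name and both helper functions are assumptions for illustration.

```python
customers = {1: {"id": 1, "name": "Martin", "orderIds": []}}
orders = {}

def add_order(order):
    """Store the order under its own key and record its ID on the customer."""
    orders[order["id"]] = order
    # Second write: the customer aggregate has to learn about the new order.
    customers[order["customerId"]]["orderIds"].append(order["id"])

def orders_for(customer_id):
    """Follow the orderId references held by the customer."""
    return [orders[oid] for oid in customers[customer_id]["orderIds"]]

add_order({"id": 99, "customerId": 1,
           "orderItems": [{"productId": 27, "price": 32.45}]})

print(orders[99]["customerId"])               # order found on its own
print([o["id"] for o in orders_for(1)])       # orders found via the customer
```

Note that add_order performs two separate writes, one per aggregate, so the application is responsible for keeping them in step.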
Aggregates can also be used to obtain analytics; for example, an aggregate
update may fill in information on which Orders have a given Product in them.
This denormalization of the data allows for fast access to the data we are
interested in and is the basis for Real Time BI or Real Time Analytics, where
enterprises don’t have to rely on end-of-the-day batch runs to populate data
warehouse tables and generate analytics; now they can fill in this type of
data, for multiple types of requirements, when the order is placed by the
customer.
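One way to picture such an analytics aggregate is a per-product list of order IDs that is updated as each order is placed. The record_order function, order 100, and product 31 are made up for this sketch.

```python
from collections import defaultdict

orders_by_product = defaultdict(list)  # productId -> IDs of orders containing it

def record_order(order):
    """Update the per-product aggregate at the moment the order is placed."""
    for item in order["orderItems"]:
        orders_by_product[item["productId"]].append(order["id"])

record_order({"id": 99, "orderItems": [{"productId": 27, "price": 32.45}]})
record_order({"id": 100, "orderItems": [{"productId": 27, "price": 32.45},
                                        {"productId": 31, "price": 10.00}]})

# No batch job needed: the answer is already materialized.
print(orders_by_product[27])
```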
References:
1)http://www.amazon.com/Seven-Databases-Weeks-Modern-Movement/dp/1934356921/ref=sr_1_1?ie=UTF8&qid=1361896201&sr=8-1&keywords=seven+databases+in+seven+weeks
2)http://www.amazon.com/NoSQL-Distilled-Emerging-Polyglot-Persistence/dp/0321826620/ref=sr_1_1?s=books&ie=UTF8&qid=1361896233&sr=1-1&keywords=nosql+distilled
3)http://highlyscalable.wordpress.com/2012/03/01/nosql-data-modeling-techniques/










