4 Ways to make Couchbase do the hard work: Part III
In part 2 of this series we looked at using compound keys for more advanced querying. Today we are going to modify our data set slightly so we can explore other querying methods.
Our new documents are going to look like this:
All that has changed is that we've added an 'offers' field, an array containing zero or more offer codes that have been 'claimed' in our fake system.
To modify our data we are going to delete our old data and use the modified script available here to repopulate our bucket.
First, delete the users bucket: click on data buckets in the console, click the drop down on our bucket, select edit, then scroll to the bottom and click delete.
After deleting and recreating the bucket, rerun the Ruby script to populate it with the new documents. (As an aside, if this were a real system we'd write logic to update our documents rather than recreating them, but as this is a demo purely to show the map reduce features we'll do it this way.)
Deleting a bucket will not only remove all your data, it will also remove all your views, so be careful! However, we generally favour deleting a bucket over flushing it. A delete tells Couchbase to remove all documents and related data in one go, whereas a flush sends an individual delete call for each document while retaining our views. On a small dataset such as ours either approach is fine, but we have seen flush commands lock up on larger datasets. Deleting the bucket is also always faster.
See part 1 if you have problems installing or running the Ruby script.
Queries for our new data structure
In our new document structure, each user may have claimed zero or more 'offers', which are held in an array on each user document. Let's start with a simple query (named 'by_offers') to count how many times each offer has been claimed.
We can do that like so:
Once again we'll select the count reduce, then use a group level of one to produce an overall summary of how popular each offer code is.
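To make the map and reduce steps concrete, here is a minimal sketch in plain Ruby that mimics what the 'by_offers' view does, without talking to Couchbase at all. The real view is defined in the console; the sample users, the field names other than 'offers', and the helper names `map_by_offers` and `count_by_key` are assumptions made up for illustration.

```ruby
# In-memory stand-in for the 'by_offers' view; no Couchbase connection needed.
# The user documents below are hypothetical sample data.
USERS = [
  { "id" => "user-1", "offers" => ["343-645-121", "111-222-333"] },
  { "id" => "user-2", "offers" => ["343-645-121"] },
  { "id" => "user-3", "offers" => [] }
].freeze

# Map step: emit one row per claimed offer, keyed by the offer code.
def map_by_offers(users)
  users.flat_map do |doc|
    (doc["offers"] || []).map { |code| { key: code, id: doc["id"], value: nil } }
  end
end

# Reduce step: count the rows for each key, similar to a count reduce
# grouped at level one.
def count_by_key(rows)
  rows.group_by { |row| row[:key] }
      .transform_values(&:size)
      .sort.to_h
end

OFFER_TOTALS = count_by_key(map_by_offers(USERS))
OFFER_TOTALS.each { |code, total| puts "#{code}: #{total}" }
```

A user with an empty 'offers' array emits nothing, so they simply don't appear in the totals.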
Now say we want to select all users that have claimed the offer code "343-645-121". Using the same query, we can enter a start and end key like so (and set reduce to false):
Now let's assume we want to graph the number of offers collected and how this changes over time, with the totals for each of our offer codes captured every five minutes. We could do this by adding a 'collected' date field to each of our codes, emitting them and searching by date, but as we've seen before, compound keys can get complicated. Let's look at another way of achieving the same functionality.
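As a rough illustration of what a start and end key do once reduce is switched off, the sketch below filters some sample rows by key range in plain Ruby. The rows and the helper names `rows_in_range` and `users_with_offer` are hypothetical, and plain Ruby string comparison is used here as a simplification of the ordering Couchbase applies to view keys.

```ruby
# Hypothetical rows, shaped like the non-reduced output of a view keyed by offer code.
SAMPLE_ROWS = [
  { key: "900-000-001", id: "user-4" },
  { key: "343-645-121", id: "user-2" },
  { key: "111-222-333", id: "user-1" },
  { key: "343-645-121", id: "user-1" }
].freeze

# Return the rows whose key lies between start_key and end_key (inclusive),
# ordered by key and then by document id.
def rows_in_range(rows, start_key, end_key)
  rows.select { |row| row[:key] >= start_key && row[:key] <= end_key }
      .sort_by { |row| [row[:key], row[:id]] }
end

# Using the same value for both ends of the range selects a single offer code.
def users_with_offer(rows, code)
  rows_in_range(rows, code, code).map { |row| row[:id] }
end

puts users_with_offer(SAMPLE_ROWS, "343-645-121").inspect
```

Setting the start and end key to the same code is what narrows the result to one offer; widening the range would return every code that sorts between the two keys.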
Persisting view data
We are going to have to dive into a little bit of Ruby to achieve what we want, using the by_offers view we just created. Let's pretend the requirement is that we need to be able to view the claimed offer totals every five minutes, and that we also need to be able to select totals within ranges. Here is how we'll achieve this:
Every 5 minutes we need to query the view and then store that result in a new document. We aren't going to cover the logic to trigger this every X minutes (see cron) but we will write the retrieval and persistence logic.
This is the script that covers the functionality that we need:
Let's cover some of the key things here.
line:4 -> Set up the connection to our 'users' bucket.
line:7 -> Get hold of our design documents.
line:9 to 11 -> Instantiate a hash with the current time and the new document type.
line:13 -> Call our 'by_offers' view, passing the group/group-level options as we did via the console.
line:14 -> For each value produced by the count function, add the name of the offer code to the hash alongside its value.
line:19 -> Increment an atomic counter, which we'll use as part of the id for our new document. We pass the create parameter to tell Couchbase to initialize the counter if it doesn't already exist.
line:20 -> Add our new document to our bucket with the id being formed like so: offers-collected + atomic counter value
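The same steps can be sketched without a Couchbase connection. This is not the original script (the line numbers above refer to that one); it is a minimal in-memory stand-in, and the `FakeBucket` class, the 'date' and 'type' field names, the counter name and the hyphen in the document id are all assumptions for illustration.

```ruby
require "time"

# In-memory stand-in for a bucket: a document store plus named counters.
class FakeBucket
  attr_reader :docs

  def initialize
    @docs = {}
    @counters = Hash.new(0)
  end

  # Increment a named counter, creating it on first use, and return the new value.
  def incr(name)
    @counters[name] += 1
  end

  # Store a document under the given id.
  def set(id, doc)
    @docs[id] = doc
  end
end

# Build one snapshot document from grouped view rows and store it.
# `rows` is an array of { key: offer_code, value: count } hashes.
def persist_offer_totals(bucket, rows, now = Time.now.utc)
  snapshot = { "date" => now.iso8601, "type" => "offers-collected" }
  rows.each { |row| snapshot[row[:key]] = row[:value] }

  # The id is the document type plus the next counter value.
  id = "offers-collected-#{bucket.incr('offers-collected-counter')}"
  bucket.set(id, snapshot)
  id
end

bucket = FakeBucket.new
rows = [{ key: "111-222-333", value: 1 }, { key: "343-645-121", value: 2 }]
puts persist_offer_totals(bucket, rows, Time.utc(2013, 1, 1, 12, 0, 0))
```

Running `persist_offer_totals` on a schedule gives one snapshot document per run, each with a fresh id from the counter.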
The above script produces documents that look like this:
Now we can make a simple query that emits all our collated documents ordered by date. This allows you to graph the results with something like d3.js, R or even Google Charts.
With a simple view like this we can emit our new documents which will be ordered by our key (the date of the document):
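To show the idea of emitting the collated documents keyed by date and then selecting a range, here is a small plain-Ruby sketch. The documents, the 'type' and 'date' field names, and the helpers `snapshots_by_date` and `snapshots_between` are hypothetical; it assumes the date is stored as an ISO 8601 UTC string, which sorts chronologically when compared as text.

```ruby
# Hypothetical bucket contents: snapshot documents mixed with a user document.
COLLATED_DOCS = {
  "offers-collected-2" => { "type" => "offers-collected", "date" => "2013-01-01T12:05:00Z", "343-645-121" => 3 },
  "user-1"             => { "offers" => ["343-645-121"] },
  "offers-collected-1" => { "type" => "offers-collected", "date" => "2013-01-01T12:00:00Z", "343-645-121" => 2 }
}.freeze

# Map step: emit only snapshot documents, keyed by their date, ordered by key.
def snapshots_by_date(docs)
  docs.select { |_id, doc| doc["type"] == "offers-collected" }
      .map { |id, doc| { key: doc["date"], id: id, value: doc } }
      .sort_by { |row| row[:key] }
end

# Select the snapshots whose date falls between from and to (inclusive).
def snapshots_between(docs, from, to)
  snapshots_by_date(docs).select { |row| row[:key].between?(from, to) }
end

snapshots_by_date(COLLATED_DOCS).each do |row|
  puts "#{row[:key]} #{row[:id]} #{row[:value]['343-645-121']}"
end
```

Because the key is the date, a start and end key on this kind of view selects the totals within a time range, ready to hand to a charting library.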
In this article we've seen how we can extract information about our documents, persist higher level overviews of the data, and then pull those back out ready for further analysis or graphing in the style of our choosing.
In the final part of the series, we'll be examining how Couchbase integrates with other solutions, specifically Elasticsearch.
If you have any comments, problems or suggestions, we'd love to hear them!