Couchbase woes on Stack Overflow


I spend quite a lot of time on Stack Overflow, possibly the greatest programming site in internet history. Why? Well, it's a great way to learn from some incredibly talented and passionate people, it usually documents the obvious pain points of a language or framework, and it can be quite a lot of fun!

Over the years I've used the site many times and in the last year or so I've decided to make a concerted effort to give back to the community that has helped me so much.

I've tried to focus my efforts on the Couchbase tag: I really like working with the tech, feel pretty knowledgeable about it and run it in production.

Over time, similar themes keep cropping up on both the Couchbase and NoSQL tags (I'm sure this applies to the MongoDB, Riak and, to a lesser extent, Redis tags too). We are going to cover some of the typical questions that crop up repeatedly. This isn't a bash on anyone; it's more of a chance to expose the flaws behind some of the common themes and point out what Couchbase really excels at!

Oh my gawd, I have to use NoSQL!

Theme Number One

In the distant distant future I'm going to need to scale to 100k writes a second and 500k reads a second, is Couchbase the right choice?

I see this theme crop up a lot: the user has a notion that their app/system is going to need to 'scale'. This is usually based on small tests, no evidence and really just a nagging desire to play with new shiny toys.

I often wonder how many of them have actually pushed MySQL or PostgreSQL really hard. NoSQL isn't a magic bullet, and abstractly defining huge potential requests a second with no data structure or access patterns in place isn't the best way to decide what your data store should be.
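Before quoting numbers like 100k writes a second, it's worth doing the back-of-the-envelope sum. Here's a rough sketch in Python. Every figure in it is a made-up assumption for illustration, not a measurement from any real system:

```python
def peak_requests_per_second(daily_active_users, requests_per_user_per_day, peak_factor):
    """Rough peak load: average requests per second times a peak multiplier."""
    seconds_per_day = 24 * 60 * 60
    average = daily_active_users * requests_per_user_per_day / seconds_per_day
    return average * peak_factor


# Hypothetical app: 1 million daily users, 50 writes each per day,
# with peaks assumed to hit 5x the daily average.
estimate = peak_requests_per_second(1_000_000, 50, 5)
print(round(estimate))
```

Even with those generous made-up inputs the estimate comes out at roughly 2,900 writes a second, a long way short of 100k. Swap in your own numbers before deciding what you need.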

Theme Number Two

How can I do some complex sorting or joins across my distributed Couchbase cluster?

Across the NoSQL spectrum the querying abilities of each system vary greatly. Couchbase is pretty OK for querying: OK, not great. So it really confuses me when people try to push a clearly relational model onto a non-relational system and then complain about its less powerful querying methods. You want joins? Use SQL!

Usually this is delivered in a one-two combo with theme one: "I have to use X because we'll have 100k writes a second but I also need joins."

Map your data structure out. What are your predicted querying patterns? Do they need to change? How complex are they? Do they need to be consistent? Are you dealing with relational data? These are the questions you need to answer before choosing a system.

Just because you don't need to define a schema doesn't mean you shouldn't have a well-defined and well-thought-out data model at the application layer!

Theme Number Three

I want to access the data via a REST API, or the Couchbase console is inadequate

Couchbase has a REST API for management/administration tasks. It does what it says on the tin, but people still want to interact with their data via it. Use the SDKs! Couchbase clearly state that you should use the SDKs, and it's well documented, but still people push on with the REST API. Anyone reading: use the SDKs!

Couchbase also provide a management interface which I personally think is fantastic. It lets you browse your cluster(s) and node(s) and check a whole bunch of statistics. It also allows some rudimentary browsing of the data, but doesn't allow editing of documents over 2.5 KB. Even with this built in, I think people still expect tooling similar to the RDBMS world. I strongly suggest all data interactions go via the SDKs, especially in production (I cannot stress that enough).

Theme Number Four

I need Couchbase to do X, Y and Z and excel in each area!

Theme four is the 8-way button-bashing Tekken mega combo! It usually involves a mixture of each of the above themes. Without sounding too hipster, polyglot data storage is the way forward.

If you truly need features/characteristics X, Y and Z then you need to combine two or more storage engines, be it NoSQL + NoSQL or RDBMS + NoSQL. This brings its own challenges (and rewards). The point I am getting at is that there are no magic bullets if you find yourself in this position.
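To make the combining idea a bit more concrete, here's a minimal sketch in Python. Both "stores" are plain in-memory dicts standing in for real engines, and the class, method and field names are all hypothetical. It isn't how any particular SDK works, just the shape of the idea: each kind of data goes to the store that suits it, and the application stitches the results together.

```python
class GameStorage:
    """Routes each kind of data to the store that suits it.

    Both stores are in-memory stand-ins (assumptions for illustration):
    `documents` plays the role of a document store, `scores` the role of
    a store that is good at ranking.
    """

    def __init__(self):
        self.documents = {}  # user_id -> user document
        self.scores = {}     # user_id -> best score

    def save_user(self, user_id, document):
        self.documents[user_id] = document

    def record_score(self, user_id, score):
        # Keep only each user's best score.
        self.scores[user_id] = max(score, self.scores.get(user_id, score))

    def leaderboard(self, count):
        # Rank in one store, then look names up in the other:
        # the "join" happens in application code.
        ranked = sorted(self.scores.items(), key=lambda item: item[1], reverse=True)
        return [(self.documents[user_id]["name"], score) for user_id, score in ranked[:count]]


storage = GameStorage()
storage.save_user("u1", {"name": "Ann"})
storage.save_user("u2", {"name": "Bob"})
storage.record_score("u1", 300)
storage.record_score("u2", 450)
storage.record_score("u1", 200)  # lower than Ann's best, so ignored
top = storage.leaderboard(2)
```

The challenge mentioned above shows up straight away: keeping the two stores in step is now the application's job.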


My Couchbase experiences

To give a real-world example I'll cover my use of Couchbase, how I arrived there and what I'd change.

I work for a company that makes mobile social games, and before that I worked for a similar company that also made social games. The previous company relied heavily on MySQL and, with some hugely popular games, ran into difficulties with it. It wasn't only scaling problems: changing the schema as the data grew became problematic and development took longer. Redis came to the rescue, allowing the team to offload vital, high-load parts of the system onto something a lot quicker.

When I joined my new company I was (and still am) the sole server developer. The game had already been launched, but with client-side features only, so I already had an idea of the size of the active user base. Based on previous experience, and with some time available to experiment, I started to play around with the various NoSQL engines after seeing the success of Redis.

Couchbase was a great fit for us because:

  • In a social game latency is key: we want to keep the application feeling quick and responsive to the users. Usually you'd approach this with a MySQL/PostgreSQL backend and a memcached layer for caching hot data. Couchbase solves this nicely by holding all data in RAM and, if the data size exceeds RAM, keeping the most recently accessed data in memory for us.
  • Our data fits perfectly into a document store. We have no need for joins and hold all user data in one document; the only separate referenced documents are for receipts from payments. Couchbase's querying model gives us enough flexibility to look up documents on several other IDs and provides some basic analytical capabilities.
  • Our data evolves and we need to be able to accommodate that in a versioned API. Changing huge tables can be problematic in an RDBMS, and the schemaless model gives us fast development and flexibility.
  • Scaling up and down in cluster size is as easy as provision, point, click and automatic rebalancing (Couchbase's automatic sharding is one of its best strengths and something best taken out of developers' hands).
  • Nice plugin integration with Elasticsearch; as analytical needs grow this is crucial.
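To illustrate the "one document per user, receipts referenced separately" and "data evolves" points, here's a minimal sketch in Python. A plain dict stands in for the key-value store, and the key names, fields and the version 2 change are all hypothetical, not our real schema and not the Couchbase SDK. The idea is that old documents get upgraded to the current shape when they are read, so there is never a big table migration.

```python
CURRENT_VERSION = 2


def upgrade_user(document):
    """Bring an older user document up to the current shape on read."""
    upgraded = dict(document)  # copy, so the stored document is left untouched
    if upgraded.get("version", 1) < 2:
        # Hypothetical change: version 2 split "name" into two fields.
        first, _, last = upgraded.pop("name", "").partition(" ")
        upgraded["first_name"] = first
        upgraded["last_name"] = last
        upgraded["version"] = 2
    return upgraded


# A plain dict standing in for the data store (an assumption for illustration).
store = {
    "user::42": {
        "version": 1,
        "name": "Ada Lovelace",
        "coins": 120,
        "receipt_ids": ["receipt::9001"],  # receipts live in their own documents
    },
    "receipt::9001": {"user_id": "user::42", "product": "coin_pack_small", "amount_cents": 199},
}


def load_user(user_id):
    """Fetch a user document, upgrade it, and follow its receipt references."""
    user = upgrade_user(store[user_id])
    receipts = [store[receipt_id] for receipt_id in user["receipt_ids"]]
    return user, receipts


user, receipts = load_user("user::42")
```

Everything about a user comes back from one key lookup, and the only extra reads are the referenced receipts, so there's nothing to join.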

Conclusion

Couchbase works really well for high-load reads and writes, with great built-in caching and a solid scale-up and scale-down model. Just don't expect really flexible querying or ANY sort of data constraints.

P.S. We also use Redis because, well, sorted sets in Redis are awesome and did I mention it's a cool piece of tech...

As always you can reach out to me on twitter or in the comments below!