In the first part of this two-part story (read it here: https://medium.com/@guillaumejacquart/databases-history-and-use-cases-part-1-b79beeb7e9b4), we took a quick look at the evolution of database systems, from flat file storage to SQL engines and their NoSQL alternatives.

In this article, we will look at other database systems designed for specific business use cases.

The databases we are going to cover are the following:

  • Graph databases
  • Cache databases
  • Time series databases

There are other types of databases (see Wikipedia for that), but to my knowledge their usage is restricted to very specific industries.

To conclude, we will also give some tips on how to choose your database architecture based on your business requirements.

Graph databases

Graph databases are a type of NoSQL database that uses graph structures to allow fast and simple querying of complex organisational data. They use the same graph representation of data as the network databases of the 1970s, but offer a much simpler way to query a full traversal chain of graph edges.

They come with their own query languages (Neo4j's Cypher, for instance), which differ from SQL and are better suited to graph modelling. The graph models the data using Nodes, Edges (links between two nodes) and Properties (sets of key/value pairs associated with a node or an edge).
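To make the Nodes / Edges / Properties vocabulary concrete, here is a minimal in-memory property graph in Python. The `PropertyGraph` class and its `add_node`, `add_edge` and `neighbors` methods are hypothetical and only illustrate the data model; a real graph database exposes the same model through its query language rather than through an API like this one.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class PropertyGraph:
    """Toy property graph (hypothetical): nodes and edges both carry key/value properties."""

    nodes: dict[str, dict[str, Any]] = field(default_factory=dict)
    # Each edge is stored as (source id, relationship type, target id, properties).
    edges: list[tuple[str, str, str, dict[str, Any]]] = field(default_factory=list)

    def add_node(self, node_id: str, **properties: Any) -> None:
        self.nodes[node_id] = properties

    def add_edge(self, src: str, rel: str, dst: str, **properties: Any) -> None:
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both endpoints must exist before linking them")
        self.edges.append((src, rel, dst, properties))

    def neighbors(self, node_id: str, rel: str) -> list[str]:
        """Return the targets of the outgoing edges of the given type."""
        return [dst for src, r, dst, _ in self.edges if src == node_id and r == rel]


g = PropertyGraph()
g.add_node("alice", label="Customer", country="FR")
g.add_node("laptop", label="Product", price=999)
g.add_edge("alice", "BOUGHT", "laptop", quantity=1)
print(g.neighbors("alice", "BOUGHT"))  # ['laptop']
```

Storing edges as a flat list keeps the sketch short; an actual graph engine indexes each node's relationships so that following an edge does not require scanning every edge.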

In theory, SQL databases could store and query the same data as a graph database, but querying connected data would be much more verbose in SQL, and probably less performant as well. An obvious example is querying hierarchical relationships between entities (https://neo4j.com/developer/cypher/guide-sql-to-cypher/#_joining_products_with_customers).
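To illustrate the verbosity point, here is how a hierarchy (employees and their managers) can be walked in SQL with a recursive common table expression, run through Python's built-in `sqlite3` module. The `employees` table, its columns and its rows are assumptions made up for the example. In Cypher, the same question can be written as a single variable-length path pattern such as `(e:Employee {name: 'Dmitri'})-[:REPORTS_TO*]->(boss)`, where the label and relationship type are equally hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER);
    INSERT INTO employees VALUES
        (1, 'Ada', NULL),    -- top of the hierarchy
        (2, 'Brian', 1),
        (3, 'Chloe', 2),
        (4, 'Dmitri', 3);
    """
)

# Walk up the management chain of 'Dmitri', one level per recursion step.
chain = conn.execute(
    """
    WITH RECURSIVE managers(id, name, manager_id, depth) AS (
        SELECT id, name, manager_id, 0 FROM employees WHERE name = ?
        UNION ALL
        SELECT e.id, e.name, e.manager_id, m.depth + 1
        FROM employees AS e
        JOIN managers AS m ON e.id = m.manager_id
    )
    SELECT name, depth FROM managers WHERE depth > 0 ORDER BY depth
    """,
    ("Dmitri",),
).fetchall()
print(chain)  # [('Chloe', 1), ('Brian', 2), ('Ada', 3)]
```

The query works, but the anchor member, the recursive member and the self-join all have to be spelled out by hand, which is exactly the kind of boilerplate a graph query language hides.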

Graph databases therefore show their full power when they store highly connected objects with complex connection paths, and when heavy querying and exploration is required. E-commerce (relations between customers and products for recommendations, for instance), supply chain, customer 360 analytics and, of course, social networking are use cases that could be worth implementing with a graph database.
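The e-commerce recommendation case boils down to a two-hop traversal: from a product to the customers who bought it, then on to the other products those customers bought. Below is a minimal sketch with plain Python dictionaries; the purchase data and the `also_bought` helper are made up for illustration. A graph database would run the same pattern natively, for example in Cypher with something like `(p:Product)<-[:BOUGHT]-(c:Customer)-[:BOUGHT]->(other:Product)`.

```python
from collections import Counter, defaultdict

# Hypothetical purchase history: customer -> products bought (outgoing edges).
purchases = {
    "alice": {"laptop", "mouse", "bag"},
    "bob": {"laptop", "mouse"},
    "carol": {"laptop", "monitor"},
    "dave": {"phone"},
}

# Reverse index: product -> customers who bought it (incoming edges).
buyers = defaultdict(set)
for customer, products in purchases.items():
    for product in products:
        buyers[product].add(customer)


def also_bought(product: str, limit: int = 3) -> list[tuple[str, int]]:
    """Rank products co-purchased with `product` by number of shared buyers."""
    counts = Counter()
    for customer in buyers[product]:  # hop 1: product -> customers
        for other in purchases[customer]:  # hop 2: customer -> products
            if other != product:
                counts[other] += 1
    return counts.most_common(limit)


print(also_bought("laptop"))  # 'mouse' ranks first with 2 shared buyers
```

Products tied on the same count (here `bag` and `monitor`) come back in no guaranteed order, since the sketch iterates over Python sets.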

Graph databases are available as both open-source and proprietary products, either self-hosted or managed. The most widespread ones are:

  • Neo4j: offers an open-source, self-hosted Community Edition, but it is restricted to a single node. Clustering comes with the Enterprise Edition
  • Amazon Neptune: the fully managed AWS graph database
  • JanusGraph: open-source graph engine that can use multiple storage backends (like Cassandra or HBase)
  • Dgraph: the new kid on the block. Open source, self-hosted or managed, and fully scalable

Cache databases

Cache databases are a type of NoSQL database that usually offers in-memory key/value storage for fast querying of a limited volume of data. Their main purpose is to improve query performance and reduce the load on regular databases (whether SQL or NoSQL).

For instance, a cache database could be set up between a web server and its SQL database, to avoid querying the same resource repeatedly within a given time window. As long as the data is fresh enough, it can be served from memory, which is faster and uses less computing power.
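This setup is commonly called the cache-aside pattern: check the cache first, fall back to the database on a miss, then store the result with an expiry so that stale data is eventually refreshed. Here is a minimal in-process sketch in Python. The `TTLCache` class, the `fetch_user` stand-in for a SQL query and the 60-second TTL are all hypothetical; in a real deployment, Redis or Memcached would replace the dictionary so that several web servers can share the same cache.

```python
import time
from typing import Any, Callable


class TTLCache:
    """Tiny key/value cache (hypothetical) where every entry expires after `ttl` seconds."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._store: dict[str, tuple[float, Any]] = {}  # key -> (expiry time, value)

    def get_or_load(self, key: str, loader: Callable[[], Any]) -> Any:
        entry = self._store.get(key)
        now = self.clock()
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: the data is still fresh
        value = loader()  # cache miss or expired entry: query the database
        self._store[key] = (now + self.ttl, value)
        return value


db_calls = 0


def fetch_user(user_id: int) -> dict:
    """Stand-in for a slow SQL query (hypothetical)."""
    global db_calls
    db_calls += 1
    return {"id": user_id, "name": f"user{user_id}"}


cache = TTLCache(ttl=60)
for _ in range(5):
    cache.get_or_load("user:42", lambda: fetch_user(42))
print(db_calls)  # 1: only the first call reached the database
```

The TTL is the knob behind "fresh enough": a longer TTL saves more database queries but serves older data.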

Cache databases usually implement a key/value storage model because it is the most flexible: with this model, you can cache the results of SQL queries or whole pages keyed by their URL, for instance.
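Since the value can be anything, the main design decision is how to build the key. One common approach, sketched below with hypothetical `query_cache_key` and `page_cache_key` helpers, is to hash the SQL text together with its parameters so that identical queries map to the same entry, and to key cached pages directly on their URL. The `sql:` and `page:` prefixes and the choice of SHA-256 are assumptions, not conventions of any particular cache.

```python
import hashlib
import json


def query_cache_key(sql: str, params: tuple = ()) -> str:
    """Build a deterministic cache key from a SQL query and its parameters (hypothetical)."""
    normalized = " ".join(sql.split())  # collapse whitespace so formatting doesn't matter
    payload = json.dumps([normalized, list(params)], default=str)
    return "sql:" + hashlib.sha256(payload.encode("utf-8")).hexdigest()


def page_cache_key(url: str) -> str:
    """Page caching can simply key on the URL (hypothetical prefix)."""
    return "page:" + url


k1 = query_cache_key("SELECT * FROM users WHERE id = ?", (42,))
k2 = query_cache_key("SELECT *\n  FROM users\n WHERE id = ?", (42,))
print(k1 == k2)  # True: same query, different formatting
```

Collapsing whitespace is a simplification: it would also merge spaces inside string literals, so a production version would need a proper SQL normalizer or should hash the query text verbatim.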

You should consider using a cache database for the following use cases:

  • You want a fast and cost-effective way to query the same data over and over again
  • You want to reduce the cost of an underlying query engine such as a SQL or NoSQL database
  • You want fast access to common data that rarely changes

The two most widely used cache databases are:

  • Redis: open source and self-hostable, with managed offerings in all the main cloud providers (AWS ElastiCache, GCP Memorystore, …). It is easy to use, offers multiple data structures like lists, sets or hashes, and can persist data to disk to prevent data loss in case of system failure.
  • Memcached: compared to Redis, Memcached can be slightly more performant, but offers fewer features (no persistence, no complex data structures, no replication)

Time series databases

Time series databases are a type of NoSQL database optimized for storing and querying time series data, that is, series of values associated with a time key, like the status of an IoT sensor over time. Even though this type of data could easily be stored in a SQL or key/value database, what time series databases add is built-in support for all the aggregation computations associated with time.
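A typical time-based aggregation is grouping points into fixed time windows and aggregating each window, often called downsampling. The Python sketch below performs that computation by hand on in-memory data, just to show what such an engine does for you; the sensor readings, the one-hour window and the `downsample` helper are illustrative assumptions. InfluxDB or OpenTSDB would express this through their own query languages instead.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone
from statistics import mean


def downsample(points: list[tuple[datetime, float]], bucket: timedelta) -> dict[datetime, float]:
    """Group (timestamp, value) points into fixed windows and average each window."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    buckets = defaultdict(list)
    for ts, value in points:
        # Align each timestamp to the start of its window (timedelta // timedelta -> int).
        offset = (ts - epoch) // bucket
        buckets[epoch + offset * bucket].append(value)
    return {start: mean(values) for start, values in sorted(buckets.items())}


t0 = datetime(2021, 3, 1, 10, 0, tzinfo=timezone.utc)
readings = [
    (t0, 20.0),
    (t0 + timedelta(minutes=30), 22.0),
    (t0 + timedelta(hours=1, minutes=5), 25.0),
]
for start, avg in downsample(readings, timedelta(hours=1)).items():
    print(start.isoformat(), avg)
# 2021-03-01T10:00:00+00:00 21.0
# 2021-03-01T11:00:00+00:00 25.0
```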

For instance, counting the number of times a sensor had an ‘active’ status during a two-month period would be far more efficient with a time series database, because its underlying storage and query engine are built for this kind of use case.
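One reason such a query is cheap is that points are kept sorted by time, so a window query only has to locate the window's boundaries instead of scanning every record. The sketch below imitates this with Python's `bisect` module over a time-sorted list; the status events and the `count_status` helper are hypothetical, and real engines add compression and indexing on top of this idea.

```python
from bisect import bisect_left
from datetime import datetime

# Hypothetical sensor status events, kept sorted by timestamp.
events = [
    (datetime(2021, 1, 5), "active"),
    (datetime(2021, 1, 20), "idle"),
    (datetime(2021, 2, 10), "active"),
    (datetime(2021, 3, 2), "active"),
    (datetime(2021, 4, 15), "active"),
]
timestamps = [ts for ts, _ in events]


def count_status(status: str, start: datetime, end: datetime) -> int:
    """Count events with `status` whose timestamp falls in [start, end)."""
    lo = bisect_left(timestamps, start)  # first event at or after start
    hi = bisect_left(timestamps, end)  # first event at or after end
    return sum(1 for _, s in events[lo:hi] if s == status)


# Two-month window: January and February 2021.
print(count_status("active", datetime(2021, 1, 1), datetime(2021, 3, 1)))  # 2
```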

Monitoring analytics, IoT logging and status querying, and trading history are good use cases for time series databases. Time series databases are also usually highly scalable, because their use cases involve storing huge amounts of simple, time-based data.

Two widely used time series databases are:

  • InfluxDB: free & open-source for a single node, with enterprise licensing for clustering. Self-hosted & managed offerings available
  • OpenTSDB: free & open-source, self-hosted and distributed

Conclusion

Graph, cache & time series databases each serve a specific purpose and have emerged alongside the increasing complexity of modern needs. Oftentimes, mature products' architectures must take advantage of several of these specialized database engines to meet customer needs for performance and customization.

When thinking about which database engine you'll need for your next project, take some time to consider the underlying structure of the data you will store (relational, time-based, highly connected, key/value, document-like, …) and the query model your business requires you to implement (looking up a single key, extracting aggregated values from a time window, finding data that matches a connection pattern, …).

As always, there is no silver bullet when choosing a database engine: it all depends on your business requirements.
