Data Storage

3 minute read

A few articles ago, we delved into using Microsoft Azure for your cloud hosting. We followed-up that article with a discussion of serverless versus static provisioning. In this article, we will cover the important topic of data storage. It is important to understand the data your application will use, the data it will produce and where you will store its data. One common approach is to separate your operational data from your business intelligence data. 

Operational Data

Operational data will be kept in your operational data store. This can be a relational database such as Oracle, SQL Server, PostgreSQL, etc or it can be other storage systems such as document stores like mongoDB

When choosing a data storage solution, it is important to think about hosting. If you are planning to host your application in Azure, choosing SQL Server, more specifically, Azure SQL, is the natural choice. If you are planning to host your application in AWS, Amazon RDS supports most major RDMSs. However, the cost for these database systems in RDS can be quite different, so make sure to consider cost in addition to your team’s comfort level with the database system. 

If you are planning to select a document store such as mongoDB, consider using a managed service such as MONGODB ATLAS, which can host your data in AWS, Azure or Google Cloud.

Business Intelligence / Analytics Data

If your application can generate data, it is often a good idea to plan your application architecture with a goal of maximizing the amount of useful data your application collects and stores. Storing this data in your operational data store can work but, if your application generates a lot of data, eventually you will want to consider data warehousing

You can employ data warehousing using Kimball Techniques and store the data in separate tables within the same database as your operational data. This approach is often chosen if your application data requires more of a data mart(s) rather than a true data warehouse. 

Alternatively, you can opt to periodically (for example daily) push your operational data to a data warehouse such as Snowflake, AWS Redshift or Azure Synapse

When researching warehousing solutions, salespeople might tell you that you can build your application using only a warehouse – so without an operational data store. The performance of modern warehousing solutions is such that building a warehouse-only application is possible. However, data warehousing solutions are significantly more expensive than database or document storage solutions. No matter what data warehousing vendor salespeople tell you, a good approach for keeping your costs under control is to start with a database of some sort as your operational data store. In fact, the high cost of data warehousing solutions is one reason why many companies opt to build a data mart within their database as an entree into business intelligence and analytics.

Up next

Our next long form tip will continue on the data theme. More specifically, we will dive into topics such as: 

  • Sourcing external data to power your application
  • Exporting data out of your application to downstream systems
  • ETL and ELT (no we didn’t just type that wrong;)
    • Common ETL products
    • Common ELT products

Twitter

LinkedIn