Optimize your data warehouses, not just your warehouses
13 July, 2022
The importance of optimizing warehousing, maintaining a structured storage system, and ensuring clear record keeping are tried-and-tested methods of warehouse management. Companies that optimize data warehouses can improve performance, efficiency, and reliability in managing digital assets.
As production companies grow, storage inevitably becomes an issue. How and where should we store parts used in the construction process? How about finished products? What about spare parts? Should they be stored on-site or off-site? Do we want a centralized or decentralized system? How do we even keep track of all this coming from different suppliers who don’t even use the same naming system?
These are issues that are not only present in the physical world, but also in the digital one – data storage. Just like in the physical world, the choices we make on how our data is stored and designing a system that optimizes that storage approach can have a massive impact on your bottom line.
Data is said to be the 21st century’s most valuable asset, and companies that have a means to harness its power have a distinct competitive advantage. Today it’s not just a story of how to optimize our warehouses, whether to store onsite or offsite, and deciding how big of a warehouse to build, it’s also about optimizing your data warehouses, data lakes, and storage locations to store effectively while meeting the needs of your modern data teams and those valuable analytics and machine learning applications that come from them.
Before we delve deeper, let’s first cover a few key terms in today’s ever-changing modern database management system (DBMS):
Databases vs. Data Warehouses vs. Data Lakes
All of these have one thing in common: They all store data! But how they store it and their uses (and costs) are very different.
Let’s start with the basics.
Databases
Loosely defined, a database is a collection of organized data representing something in the real world. The most common type of database is for structured data (data stored in rows and columns) – although not the only type – and is often referred to as a SQL database. Ever wonder where the instruments monitoring wind speed on a turbine or pressure on a gas pipe are keeping the data? The same place your local bank and jeans retailer do when you swipe your credit card to modify the balances – A database. Data is captured live and sent to a database for live storage. As such, databases often need to be able to take high amounts of data on the fly and update frequently.
Ok, so what’s a Data Warehouses then?

While databases are designed to capture data, Data Warehouses are designed for analytical processing (OLAP) on historical data. Data will generally be brought from multiple sources (such as your databases) via an ETL pipeline and summarized into a strict data schema. As a source of historical data, it doesn’t need to be updated as often as your database – perhaps hourly, daily, or even monthly. Your warehouse is where you store primarily structured data optimized for analysis to be accessed by your Analysts and BI teams for later processing.
When to use a Database vs a Data Warehouse?
Ensuring your database’s integrity is vital. You and your customers need to be able to trust the data in your database, just like you need to be able to trust your bank not to double-charge you.
Great, so let’s get our Analysts to analyze our databases and gain insights from them! For the same reason, your databases should generally be optimized for transaction recording (OLTP), it shouldn’t be a source of data analysis. Running frequent SQL or NoSQL queries and updates on your database can cause a significant slowdown in its run time, leading to delayed and outdated data appearing on dashboards or possibly even creating errors, leading to data integrity issues. So how do we solve this issue? Simple – analyze your data warehouse! Once historical data has been sent to the warehouse for storage, it is free to be poked around in without worrying about causing issues to your data or customers. Keeping a rigid schema of data storage in your warehouse is not only something that allows for data processing automation, but may return its value numerous times over to your organization.
Hooray! Then what exactly is this Data Lake you’re talking about?

A Data Lake is a little bit like what it sounds like – A vast area we can store large amounts (often in petabytes) of raw, unprocessed data. Don’t know what to do with your data? Cool, throw it in the data lake. Got sound and image data of your pumps and generators you don’t know why you recorded? Cool, throw it in the data lake. Don’t even know where this data came from? Cool, throw it in the data lake.
Data Lakes store large amounts of data in any format from structured to partially structured to unstructured (think nice tidy tables, to web browser JSON data, to images or sound files), typically coming from a large variety of sources.
When to use a Data Warehouse vs a Data Lake?
So why are we storing all this anyway? While your Data Warehouse stores highly structured data for planned analysis later (typically your predictive analytics sent to your dashboards, etc.), your Data Lake is data that you hope to gain deeper knowledge from. This is the data that is the playground of your Data Scientists, who can use machine learning (ML) techniques on the raw data to gain insight and build models. Maybe that weird sound data you recorded from your pumps could be exactly what you’re looking for! Your Data Scientists may be able to build an ML model to look for patterns in the sound data, indistinguishable to the human ear, which can detect issues ahead of time to optimize maintenance scheduling and avoid downtime.
How to optimize your storage?
Well, this is grand and all…. But how does this help me optimize anything?
Much like when you’re reaching a point of deciding if you are better off building a new physical warehouse or storing off-site and how big of a warehouse to build (because construction isn’t free, right?) – the same issues exist in the world of data.
Speed and frequency vs size.
The key things to take into account to optimize your data storage are balancing the need between speed and frequency vs size.
How often does your data need to be accessed and updated? If the answer is frequently (maybe to the second or more), then you need a system set up for this. Put simply, the more we access our data, the more expensive it is to maintain (particularly if stored off-premise). This data and its databases also need to be maintained to ensure integrity. To minimize costs, you want a database of the right size for your task. Here, size is not an asset. A series of smaller databases may well fit your situation better, be able to update more reliably and quicker, and at a lower cost.
If, however, we are talking about massive amounts of data that will be accessed infrequently (or perhaps never), a storage solution optimized for size is likely a better choice for you. Here, we’re looking for cheap storage, lots of it, and preferably storage that can auto-scale to meet your needs rather than charging for space you’re not using. You are likely going to run into issues on the hardware side, and other issues, such as potential overheating leading to damage and potential data loss, are possible issues to consider. Keeping it onsite may or may not be the best solution here, depending on the cost of hardware and electricity in your location, not even considering reliability issues with having all data in one location.

To store on-site or on a cloud provider
The final main factor to consider is whether it’s better to store your data onsite or off-premise via a cloud provider such as Amazon’s AWS, Microsoft’s Azure, Google’s GCP, or one of a variety of other cloud providers.

The traditional approach to storing data is to store it on-premises. This requires the need for constant server access and ensuring adequate storage capacity. This involves a certain amount of planning and investing ahead of time, as the last thing you want to do is have a server outage or lack of storage space available, leading to lost data. Conversely, it’s also not wise to be paying for and maintaining storage capacity that is not being used.
Storing on the cloud provides an avenue for easy data storage and data access. Instead of planning ahead and estimating how much space will need to be bought, nowadays it is also possible to rent cloud storage space, meaning an easily scalable solution is right at hand and can be automated to only charge you for what you actually use. As well as storage, cloud providers can also provide data analysis and ML solutions with the computing done on the server, at the site of the data.
In many cases, there are tens or even hundreds of types of storage to be chosen from to suit your needs at hand. In the cloud, even more so than with on-site data storage, choosing the right option for size vs frequency of update is even more important to avoid paying excessive fees to maintain systems not suited for your needs
Which solution is right for you?
Finding which solution is correct for you largely depends on your situation. With a smaller company with little data to manage, simply using a few relational databases managed on a MySQL or Postgres server may suit your needs at a very low cost. As your company grows, you will likely find yourself needing to maintain a system of databases, data warehouses, and a data lake to meet your needs and finding an optimal database management system solution to optimize performance and minimize costs. Likewise, the decision to go fully or partly on cloud (and on how many clouds to use, as your jurisdiction may require) is a question that needs to be carefully answered based on your unique situation. Choosing the wrong one may end up costing you tens of thousands of dollars a month, while choosing the optimal solution will not only lead to an efficient, maintainable system, but potentially one that can return its value many times over in valuable BI and ML knowledge.
Need solutions to optimize your data store problems? Our team of data experts at Keel Solution is here to help.
Enjoyed this article? Check out these others from our Keel Data series:
Link – The cost of poor data in a data driven world – Keel Solution
We are ready to help!
Request consultation, ask a question or share your feedback. Just get in touch!



