Data Warehousing

Sushant Kumar Jha
2 min readMay 7, 2021
Data Warehousing

Happy reading to all, In this section, I will give you is the basic idea of the monster word Data Warehousing.

So before we start discussing data warehousing let’s understand transaction is Processing or Analytics?

Transaction Processing or Analytics?

As Data Science comes under the rise the database transaction no longer limits to perform only traditional transactions with the ACID property.

Now the need for database access is changing, you should have to make low-latency reads and writes as opposed to batch processing jobs, which only run periodically.

E.g: Data analytics require very different access patterns and perform algorithms over huge records in low response time.

Here what solution comes with Data Warehousing

Data Warehousing

By contrast, is a separate database that analysts can query to their hearts’ content, without affecting OLTP(Online Transactional Processing) operations.

Ok lets, make it simple

The data warehouse contains a read-only copy of the data in all the various OLTP systems in the company.

In general, Data is extracted from the OLTP database (periodically dump data records), transformed into an analysis-friendly schema, cleaned up, and then loaded to the data warehouse. The process of getting data into the warehouse is known as Extract-Transform-Load(ETL).

Simplified outline of ETL into a data warehouse.

Where we Can use Data Warehousing?

Data warehouses now exist in almost all large enterprises.

Most small companies have a small amount of data that can be queried in a conventional SQL database, or even analyzed in a spreadsheet. In a large company, a lot of heavy lifting is required to do something simple in a small company.

Data warehouse and a relational OLTP database look similar because they both have a SQL query interface.

Microsoft SQL Server and SAP HANA, have support for transaction processing and data warehousing in the same product. However, quite different as both two are separate storage and query engine.

Vendors

Data warehouse vendors such as Teradata, Vertica, SAP HANA, and ParAccel typically sell their systems under expensive commercial licenses.

Amazon RedShift is hosted version of ParAccel.

Conclusion

Rather than querying OLTP systems directly for analytics, is that the data warehouse can be optimized for analytic access patterns.

It works well for answering analytic queries.

I hope, I have given you a very basic idea of what is Data Warehousing is and where and why we should you this.

Give a clap if you like it.

Thanks

--

--

Sushant Kumar Jha

Machine Learning enthusiast |Python |C++|JavaScript|Tensorflow|keras|pytorch