Azure Business Intelligence
Topics Which We Can Cover
In this course, students will learn the various components used to design and develop BI solutions for businesses in the cloud. We will work with several cloud data platform technologies to transform raw data into analytical data that meets business and technical requirements.
Azure Business Intelligence
Azure BI helps you model and visualize data for interactive reporting and business intelligence. It also offers a comprehensive lineup of services to help you ingest, transform, and store aggregated data so that it can be modeled and explored with commonly used visualization tools such as Microsoft Power BI.
Components which are part of Azure BI
Azure Storage
Azure Storage is a storage service provided by Microsoft Azure that offers a highly available data store in the cloud. It allows cloud users to store data as blobs (objects), files, tables, and queues. The storage can be accessed over HTTP or HTTPS. You can create various cloud storage resources, such as:
- Storage Account
- Azure Stack Edge
- Azure Data Lake Storage Gen1
- Azure Data Lake Storage Gen2
- Azure Data Box
- Cloudian HyperCloud for Azure
Azure Storage Account
The storage account is the top-level container for the storage services in Microsoft Azure, such as Blob storage, File storage, Table storage, and Queue storage. Before you configure any of these services, you need to create an Azure storage account.
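As a minimal sketch of how one storage account fronts several services over HTTP, the following Python builds the default endpoint for each service and the URL of a single blob. The account name `contosobi`, the container `raw`, and the helper function names are hypothetical. The `<account>.<service>.core.windows.net` pattern applies to the public Azure cloud; accounts with custom domains or in sovereign clouds use different host names.

```python
from urllib.parse import quote

# Default public-cloud endpoint suffix; sovereign clouds use other suffixes.
ENDPOINT_SUFFIX = "core.windows.net"

# Services hosted under a single storage account.
SERVICES = ("blob", "file", "table", "queue")


def service_endpoints(account: str) -> dict:
    """Return the default HTTPS endpoint for each service in the account."""
    return {s: f"https://{account}.{s}.{ENDPOINT_SUFFIX}" for s in SERVICES}


def blob_url(account: str, container: str, blob_path: str) -> str:
    """Build the URL of one blob; the blob path is percent-encoded."""
    base = service_endpoints(account)["blob"]
    return f"{base}/{container}/{quote(blob_path)}"


if __name__ == "__main__":
    # 'contosobi' and 'raw' are hypothetical names used only for illustration.
    for name, url in service_endpoints("contosobi").items():
        print(f"{name:5} -> {url}")
    print(blob_url("contosobi", "raw", "sales/2024 q1.csv"))
```

Actually reading or writing data at these URLs also requires authorization (for example, an account key, a shared access signature, or Microsoft Entra ID), which this sketch leaves out.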
Azure Data Lake Storage Gen1
Azure Data Lake Storage Gen1 is a specialized cloud storage service optimized for big data analytics. It is essentially an HDFS-compatible (Hadoop Distributed File System) store in the cloud.
Azure Data Lake Storage Gen2
Azure Data Lake Storage Gen2 is an enterprise-wide, hyperscale repository designed for big data analytics. It enables you to ingest data of any size, type, and transfer speed in a single place.
Azure SQL Database
Azure SQL Database is a fully managed platform as a service (PaaS) database provided by Microsoft. It is based on SQL Server database technology and built on the Microsoft Azure cloud computing platform. It enables businesses to store structured data in the cloud and to scale database size as business requirements change.
Azure Synapse Analytics
Azure Synapse is an analytics service that brings together enterprise data warehousing and big data analytics. It gives you the freedom to query data on your terms, using either serverless or provisioned resources, at scale. Azure Synapse brings these two worlds together with a unified experience to ingest, prepare, manage, and serve data for immediate BI and machine learning needs.
PolyBase
PolyBase is a technology for accessing and combining non-relational and relational data from within SQL Server. It enables your SQL Server instance to process Transact-SQL queries that read data from external data sources. SQL Server 2016 and higher can access external data in Hadoop and Azure Blob Storage. Starting with SQL Server 2019, you can also use PolyBase to access external data in SQL Server, Oracle, Teradata, and MongoDB.
PolyBase pushes some computations to the Hadoop node to optimize the overall query. However, PolyBase's external access is not limited to Hadoop. Other non-relational data, such as delimited text files, is also supported.
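To make the idea of querying a delimited text file through T-SQL more concrete, here is a rough Python sketch that renders the three statements a PolyBase setup over Azure Blob Storage typically needs: an external data source, an external file format, and an external table. The object names (`BlobSource`, `CsvFormat`, `dbo.SalesExternal`), the account, the container, and the columns are all hypothetical, and the credential that a private container would require is left out. Treat the generated T-SQL as a starting point to check against the documentation for your SQL Server version.

```python
def polybase_ddl(account, container, folder, table, columns):
    """Render T-SQL for a PolyBase external table over delimited text files.

    Illustration only: the values are interpolated without escaping, so do
    not feed untrusted input into this function.
    """
    col_lines = ",\n".join(f"    [{name}] {sql_type}" for name, sql_type in columns)

    # Where the data lives. A private container also needs a
    # CREDENTIAL = <database scoped credential> option (omitted here).
    data_source = (
        "CREATE EXTERNAL DATA SOURCE BlobSource\n"
        "WITH (\n"
        "    TYPE = HADOOP,\n"
        f"    LOCATION = 'wasbs://{container}@{account}.blob.core.windows.net'\n"
        ");"
    )

    # How the files are laid out: comma-delimited text.
    file_format = (
        "CREATE EXTERNAL FILE FORMAT CsvFormat\n"
        "WITH (\n"
        "    FORMAT_TYPE = DELIMITEDTEXT,\n"
        "    FORMAT_OPTIONS (FIELD_TERMINATOR = ',')\n"
        ");"
    )

    # The table definition that T-SQL queries will reference.
    external_table = (
        f"CREATE EXTERNAL TABLE {table} (\n"
        f"{col_lines}\n"
        ")\n"
        "WITH (\n"
        f"    LOCATION = '{folder}',\n"
        "    DATA_SOURCE = BlobSource,\n"
        "    FILE_FORMAT = CsvFormat\n"
        ");"
    )
    return [data_source, file_format, external_table]


if __name__ == "__main__":
    statements = polybase_ddl(
        account="contosobi",          # hypothetical storage account
        container="raw",              # hypothetical container
        folder="/sales/",
        table="dbo.SalesExternal",
        columns=[("OrderId", "INT"), ("Amount", "DECIMAL(10, 2)")],
    )
    print("\n\n".join(statements))
```

Once such an external table exists, it can be joined with ordinary SQL Server tables in a regular `SELECT` statement, which is how PolyBase combines relational and non-relational data.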
Azure Data Lake Analytics
Azure Data Lake Analytics is a distributed, cloud-based analytics service offered by Microsoft in the Azure cloud. It is an on-demand analytics job service that simplifies big data analytics and is built on Apache YARN, the resource management architecture used in the open-source Hadoop platform. It pairs with Azure Data Lake Store, a specialized cloud storage service optimized for big data analytics.
Unified SQL
U-SQL is a data processing language that blends T-SQL and C#, combining the declarative benefits of SQL with the extensibility of C# and .NET. U-SQL's scalable distributed query capability enables you to efficiently analyze data in Data Lake Store, Azure Storage blobs, and relational stores such as Azure SQL Database and Azure SQL Data Warehouse.
Data Factory
Azure Data Factory is a cloud data integration service used to compose data storage, movement, and processing services into automated data pipelines. It is a data orchestration tool that provides hybrid data integration (ETL) as a service in the cloud.
You can build code-free ELT pipelines within the intuitive Data Factory environment. Pipelines contain activities that integrate and transform data.
- Coding skills are not required to build hybrid ETL pipelines within the Data Factory visual environment.
- It is a cost-efficient, fully managed, serverless cloud data orchestration service that scales on demand.
- Linked services (connections) define the information that Data Factory needs to connect to external data sources.
Azure Data Factory Components
Pipeline – A pipeline is a logical grouping of activities that together perform a particular task. For example, if you want to copy a folder from an Azure file share to Azure Data Lake Storage Gen1, you can do so with the Copy Data activity in Azure Data Factory. Activities are contained inside the pipeline and are connected to form a sequence of events, depending on your requirements.
Linked Service – A linked service establishes a connection to a data source. It is comparable to the connection string used to connect to a database.
Trigger – A trigger determines when a pipeline run is started.
Parameter – A parameter is a placeholder whose value is supplied at run time. Parameterized datasets can be created in Azure Data Factory.
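The sketch below shows how these four components might fit together, written as Python dictionaries that approximate the JSON definitions Data Factory stores. Every name (`SourceFileShare`, `FileShareFolder`, `LakeFolder`, `CopyFolderPipeline`, `DailyTrigger`, `sourceFolder`) is hypothetical. The property layout and type names should be treated as assumptions to verify against the current Data Factory schema, not as an authoritative reference. The helper `missing_parameters` is not part of any SDK; it only illustrates that a trigger must supply the values the pipeline's parameters expect.

```python
import json

# Linked service: connection information for the source file share.
linked_service = {
    "name": "SourceFileShare",
    "properties": {
        "type": "AzureFileStorage",
        "typeProperties": {"connectionString": "<connection string>"},
    },
}

# Pipeline: one Copy activity, with a parameter resolved at run time.
pipeline = {
    "name": "CopyFolderPipeline",
    "properties": {
        "parameters": {"sourceFolder": {"type": "String"}},
        "activities": [
            {
                "name": "CopyFolderToLake",
                "type": "Copy",
                "inputs": [
                    {
                        "referenceName": "FileShareFolder",
                        "type": "DatasetReference",
                        # Pass the pipeline parameter to a parameterized dataset.
                        "parameters": {"folder": "@pipeline().parameters.sourceFolder"},
                    }
                ],
                "outputs": [{"referenceName": "LakeFolder", "type": "DatasetReference"}],
                "typeProperties": {
                    "source": {"type": "BinarySource"},
                    "sink": {"type": "BinarySink"},
                },
            }
        ],
    },
}

# Trigger: runs the pipeline once a day and supplies the parameter value.
trigger = {
    "name": "DailyTrigger",
    "properties": {
        "type": "ScheduleTrigger",
        "typeProperties": {
            "recurrence": {
                "frequency": "Day",
                "interval": 1,
                "startTime": "2024-01-01T02:00:00Z",
                "timeZone": "UTC",
            }
        },
        "pipelines": [
            {
                "pipelineReference": {
                    "referenceName": "CopyFolderPipeline",
                    "type": "PipelineReference",
                },
                "parameters": {"sourceFolder": "incoming"},
            }
        ],
    },
}


def missing_parameters(pipeline: dict, trigger: dict) -> set:
    """Return pipeline parameters that the trigger does not supply."""
    declared = set(pipeline["properties"].get("parameters", {}))
    supplied = set()
    for ref in trigger["properties"].get("pipelines", []):
        if ref["pipelineReference"]["referenceName"] == pipeline["name"]:
            supplied |= set(ref.get("parameters", {}))
    return declared - supplied


if __name__ == "__main__":
    print(json.dumps(pipeline, indent=2))
    print("Missing parameters:", missing_parameters(pipeline, trigger) or "none")
```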
Azure Databricks
Azure Databricks is a managed implementation of Apache Spark in the cloud and a popular platform for big data analytics and machine learning with Apache Spark. Process your data and build machine learning scoring pipelines with Azure Databricks. Set up your Apache Spark environment in minutes, autoscale, and collaborate on shared projects in an interactive workspace. Azure Databricks supports Python, Scala, R, and SQL, as well as data science frameworks and libraries including TensorFlow, PyTorch, and scikit-learn.
- Fast, optimized Apache Spark implementation
- Interactive workspace with built-in support for popular tools, languages, and frameworks
- Supercharged machine learning on big data with native Azure Machine Learning integration
- High-performance modern data warehousing in conjunction with Azure Synapse Analytics