All About Power BI Architecture Design

In the previous article we had a detailed introduction to what Power BI actually is and how it is used. In this section we discuss a new and important topic to kick-start the Power BI journey: the Power BI architecture, its components, and the Power BI Service architecture. So let’s start.

Power BI Architecture

Power BI architecture consists of four major sections that run from data sourcing to the creation of reports and dashboards. Various technologies and processes work together to produce accurate results, which is one reason Power BI is among the market leaders in reporting and dashboarding tools.

Sourcing of Data: Power BI can extract data through various data connectors. Sources can be servers, Excel sheets, CSV files, other databases, and many more. You can even extract live or streaming data. The extracted data is imported directly into Power BI within a few seconds and is compressed up to 1 GB. After sourcing the data, you can perform data transformation operations.

Transforming the data: The golden rule of data analytics is that data must be cleaned before it is analyzed or visualized, so that the insights are accurate. In this step, data cleaning and pre-processing take place. After transformation, the data is loaded into a data warehouse, where further analysis takes place.

Creating Reports or Visualizations: After the data transformation process, reports and visualizations are built according to the business requirements. A single report contains various visualizations of the data with different filters, graphs, charts, diagrams, etc.

Creating Dashboards: Planning and arranging the elements of Power BI reports produces a Power BI dashboard. Dashboards are created after the reports are published to the Power BI service.
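
The four stages above can be mimicked in a few lines of plain Python. This is only an analogy under stated assumptions, not how Power BI itself is implemented: the CSV text, the column names, and the function names source, transform, and report are all hypothetical.

```python
import csv
import io

# Hypothetical raw export; the column names are assumptions for illustration.
RAW_CSV = """region,sales
North,100
South,
north,50
East,abc
"""

def source(text):
    """Stage 1 (sourcing): read rows from a CSV source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Stage 2 (transforming): drop rows with missing or non-numeric sales
    and normalize the region names."""
    clean = []
    for row in rows:
        value = row["sales"].strip()
        if value.isdigit():
            clean.append({"region": row["region"].strip().title(),
                          "sales": int(value)})
    return clean

def report(rows):
    """Stage 3 (reporting): total sales per region, the data behind a chart."""
    totals = {}
    for row in rows:
        totals[row["region"]] = totals.get(row["region"], 0) + row["sales"]
    return totals

totals = report(transform(source(RAW_CSV)))
print(totals)
```

Only the two North rows survive the cleaning step, which is the point of transforming before reporting: the rows with a missing or non-numeric value never reach the report.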

Components of Power BI Architecture

Various components included in Power BI Architecture are as follows:

1) Power Query: This component is used to access, search, and transform data from various data sources.
2) Power Pivot: It provides tools to model data from an in-memory data source for analytics.
3) Power View: This component has various tools to represent data through visuals, which are used for visual analysis.
4) Power Map: It can represent spatial data in the form of maps. An important advantage of Power BI is that maps can be customized in different ways.
5) Power BI Desktop: Power BI Desktop is the heart of the entire Power BI platform. It is the development tool for Power View, Power Query, and Power Pivot. You can import various data sources and perform visualization tasks.
6) Power Q&A: Using the Power Q&A option, you can search your data and find insights by entering queries in natural language. It understands the questions asked and answers them with relevant insights in the form of visualizations.
7) Power BI Service: The Power BI Service helps in sharing workbooks and data views with other users. Data can also be refreshed at regular intervals.
8) Power BI Mobile Apps: Business stakeholders can view and interact with the reports and dashboards published on a cloud service from mobile devices using Power BI Mobile Apps.

Working of Power BI Architecture

The Power BI architecture is mainly divided into two parts:

  1. On-cloud
  2. On-premises

The diagram below, also called the Power BI Data Flow diagram, may help you clearly understand the flow of data from on-premises to on-cloud server applications.

Power BI Gateway Diagram

On-premises

Reports published to Power BI Report Server are distributed to the end users. Power Publisher makes it possible to publish Power BI reports to Power BI Report Server. The Report Server and Publisher tools help create datasets, paginated reports, etc.

On-cloud

In this data flow diagram, the Power BI gateway acts as a bridge that transfers data from on-premises data sources to on-cloud servers. The cloud contains various components such as datasets, reports, dashboards, embedded content, etc.

Power BI Service Architecture

It is mainly based on two clusters:

  1. The Front-end Cluster
  2. The Back-end Cluster

The Front-end Cluster

The front-end cluster acts as an intermediary between the clients and the on-cloud servers. After the initial connection and authentication, the client can interact with the various datasets available.

The Back-end Cluster

The back-end cluster manages datasets, visualizations, data connections, reports, and other services in Power BI. Its components are mainly responsible for authorization, routing, authentication, and load balancing.

Here we have completed the architecture behind Power BI. In the next article we will study the “7 Important Rules” that we need to remember to become a pro in Power BI.

I hope you liked and understood the write-up. Meet you all soon in the next blog. Stay tuned and Happy Learning !! 🚀🚀👋

Let’s get the basics of Python done !!

In our previous blog we saw the basics and some theoretical knowledge of Python. In this blog let’s get some more basics clear.

Python Keywords

Keywords are the reserved words in python. We can’t use a keyword as a variable name, function name or any other identifier. Keywords are case sensitive.

# Get all keywords of python 3.6

import keyword

print(keyword.kwlist)

print("\nTotal number of keywords: ", len(keyword.kwlist))

Identifiers

An identifier is the name given to entities such as classes, functions, variables, etc. in Python. It helps differentiate one entity from another.

Rules for writing identifiers:

  1. An identifier can be a combination of lowercase letters (a to z), uppercase letters (A to Z), digits (0 to 9), or an underscore (_).
  2. An identifier cannot start with a digit. 1variable is invalid, but variable1 is perfectly fine.
  3. Keywords cannot be used as identifiers.

abc12 = 12    # valid identifier
# global = 1  # invalid: global is a keyword, so this line would raise a SyntaxError
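
The three rules can also be checked programmatically. The sketch below combines the built-in str.isidentifier() with keyword.iskeyword(); the helper name is_valid_identifier is made up for this example. Note that isidentifier() also accepts non-ASCII letters, which the rules above do not mention.

```python
import keyword

def is_valid_identifier(name):
    """True if name follows the identifier rules and is not a keyword."""
    return name.isidentifier() and not keyword.iskeyword(name)

for name in ["abc12", "variable1", "1variable", "global", "my_var"]:
    print(name, is_valid_identifier(name))
```

Here 1variable fails because it starts with a digit, and global fails because it is a keyword.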

Python Comments

Comments are lines in computer programs that are ignored by compilers and interpreters.

Including comments in programs makes code more readable for humans as it provides some information or explanation about what each part of a program is doing.

In general, it is a good idea to write comments while you are writing or updating a program, as it is easy to forget your thought process later on, and comments written later may be less useful in the long term.

In python, we use hash(#) symbol to start writing a comment.

# Print Hello World to the console
print("Hello World")

Multi Line Comments

If we have comments that extend multiple lines, one way of doing it is to use hash (#) in the beginning of each line.

#This is long comment
#and it extends
#Multiple lines

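Another way of doing this is to use triple quotes, either ''' or """. Strictly speaking, this creates a string literal that is never used rather than a true comment, but it is commonly used this way.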

"""This is also a 
perfect example of
multi-line comment"""
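
One practical difference between the two styles: a triple-quoted string placed as the first statement of a function is stored as its docstring, while a # comment is discarded. A minimal sketch, where the function greet is made up for illustration:

```python
def greet():
    """Return a greeting."""
    # This is a real comment; it is not stored anywhere
    return "Hello"

print(greet.__doc__)
```

The triple-quoted text is available at run time through greet.__doc__, which is why it is the usual way to document functions.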

Python Indentation

  1. Most programming languages, such as C, C++, and Java, use braces { } to define a block of code. Python uses indentation.
  2. A code block (body of a function, loop, etc.) starts with indentation and ends with the first unindented line. The amount of indentation is up to you, but it must be consistent throughout the block.
  3. Generally, four spaces are used for indentation and are preferred over tabs.

for i in range(10):
    print(i)

Indentation can be ignored in line continuation. But it’s a good idea to always indent. It makes the code more readable.

if True:
    print("Machine Learning")
    c = "AAIC"

# The same block written on one line (valid, but harder to read)
if True: print("Machine Learning"); c = "AAIC"
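
To see what happens when the body of a block is not indented, the sketch below compiles a small piece of source text with the built-in compile() and catches the resulting error. The variable names are just for illustration.

```python
# Source text whose block body is missing its indentation
bad_source = "if True:\nprint('not indented')\n"

try:
    compile(bad_source, "<example>", "exec")
    error_name = None
except IndentationError as exc:
    error_name = type(exc).__name__

print(error_name)
```

Python rejects the code before running any of it, because the line after the colon must be indented.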

Statements

Instructions that a Python interpreter can execute are called statements.

a = 1 #single statement

Multi-Line Statement

In Python, the end of a statement is marked by a newline character. But we can make a statement extend over multiple lines with the line continuation character (\).

a = 1 + 2 + 3 + \
    4 + 5 + 6 + \
    7 + 8
print(a)

a = 10; b = 20; c = 30  # put multiple statements on a single line using ;

# another way is to use parentheses
a = (1 + 2 + 3 +
     4 + 5 + 6 +
     7 + 8)
print(a)

With this we have covered some of the basics of Python. In the next article we are going to study data types and variables.

So stay tuned!!!

Understanding Measures of Dispersion in an easy manner !

Introduction

In statistics we work with both sample and population data. When you have the whole population, you are 100% sure of the measures you are calculating. When you use sample data and compute a statistic, that sample statistic is an approximation of the population parameter. If you have 10 different samples, they can give you 10 different measures.

Measures of dispersion

The mean, median, and mode are usually not sufficient by themselves to reveal the shape of the distribution of a data set. We also need a measure that provides some information about the variation among the data set values.

The measures that help us know the spread of a data set are called “Measures of dispersion”. The Measures of Central Tendency and Measures of dispersion taken together give a better picture of the dataset.

Measures of dispersion are also called Measures of variability. Variability, also called dispersion or spread, refers to how spread out the data is. It helps to compare one data set with others and to determine consistency. Once we know the variation in the data, we can control the causes behind that variation.

Some measures of dispersion are :

  1. Range
  2. Variance
  3. Standard deviation
  4. Interquartile Range (IQR)

Note: In this blog we won’t be discussing IQR, as it has other applications which we will cover in detail later.

Range

The difference between the smallest and largest observations in a sample is called the “Range”. In simple words, the range is the difference between the two extreme values in the dataset.

Let’s say X(max) and X(min) are the two extreme values; then the range will be,

Range = X(max) – X(min)

Example: The minimum and maximum BP are 113 and 170. Find range.

Range = X(max) – X(min)

= 170 – 113

= 57

So, range is 57.

Variance

Now let’s consider two different distributions A and B with the following data sets:

A = {2, 2, 4, 4} and B = {1, 1, 5, 5}

If we compute the mean for both distributions,

Mean of A = (2 + 2 + 4 + 4) / 4 = 3

Mean of B = (1 + 1 + 5 + 5) / 4 = 3

We can see that the mean is 3 for both distributions, but if we observe them there is a difference in the data points. In distribution A the data points are close to each other. In distribution B the data points are far from each other. The greater the distance, the greater the spread, and this spread is called “Variance”.

Variance measures the dispersion of a set of data points around their mean. In statistics, variance is a measure of how far each value in the data set is from the mean.

The formula for variance is different for population and sample data.

Why squaring?

Dispersion is nothing but distance, hence it cannot be negative. If we don’t square the deviations, we get both negative and positive values, which cancel each other out when summed. Squaring makes every term positive and also amplifies the effect of large distances.

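Let us first consider the variance for a population. It is given by the formula

Population variance = Σ (x – μ)² / N

where μ is the population mean and N is the number of data points.

When we computed the mean we saw it was the same for both distributions, but when we compute the variance we observe that the variances are different. The variance of distribution A is 1 and that of distribution B is 4.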

The difference in variance comes from the distance between the data points.

When the distance between the data points is larger, the dispersion or spread is larger, and hence we get a higher variance. When the distance between the data points is smaller, the spread is smaller, and hence we get a lower variance.
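
The numbers for distributions A and B can be checked with a short script. This is a minimal sketch: population_variance is a hand-written helper for this example that follows the usual population formula (squared deviations from the mean, divided by the number of points), and statistics.pvariance is the standard-library equivalent.

```python
import statistics

A = [2, 2, 4, 4]
B = [1, 1, 5, 5]

def population_variance(values):
    """Average of the squared deviations from the mean."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)

for name, values in (("A", A), ("B", B)):
    print(name,
          "mean:", statistics.mean(values),
          "variance:", population_variance(values),
          "pvariance:", statistics.pvariance(values))
```

Both data sets have mean 3, but A has variance 1 and B has variance 4, because the points in B sit twice as far from the mean.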

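For sample variance, there is a small change in the formula: we divide by n – 1 instead of n.

Sample variance = Σ (x – x̄)² / (n – 1)

where x̄ is the sample mean and n is the sample size.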

Why n-1 ?

As we know, we take a sample from the population data, and the sample should let us make inferences about the population.

Now let us consider population data of ages plotted on a graph, increasing along the x-axis, with the population mean in the middle.

If the sample happens to come from around the middle of the population, the sample mean and the population mean are almost equal.

But if the sample comes from one end of the population, the sample mean is far from the population mean. The deviations are measured from the sample mean, which always sits in the middle of the sample, so they come out smaller than the deviations from the true population mean. So sample variance < population variance: we are underestimating the true population variance.

Hence we divide by n – 1 instead of n when computing variance from sample data. Dividing by the smaller number makes the result slightly larger, which compensates for the underestimate. This use of n – 1 is called Bessel’s correction.

Also while discussing further topics we will come across a term Degree of freedom = n – 1.
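
The effect of dividing by n – 1 can be seen in a small simulation. This is a sketch under stated assumptions: the population (the integers 1 to 100), the sample size of 5, the number of trials, and the seed are all arbitrary choices for illustration. Samples are drawn with replacement using random.choices.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

population = list(range(1, 101))  # hypothetical population: 1..100
true_variance = statistics.pvariance(population)

n, trials = 5, 5000
divide_by_n = []
divide_by_n_minus_1 = []
for _ in range(trials):
    sample = random.choices(population, k=n)  # sample with replacement
    divide_by_n.append(statistics.pvariance(sample))         # divides by n
    divide_by_n_minus_1.append(statistics.variance(sample))  # divides by n - 1

avg_n = statistics.fmean(divide_by_n)
avg_n_minus_1 = statistics.fmean(divide_by_n_minus_1)

print("true population variance:", true_variance)
print("average estimate dividing by n:    ", avg_n)
print("average estimate dividing by n - 1:", avg_n_minus_1)
```

Averaged over many samples, the estimate that divides by n comes out noticeably below the true variance, while the estimate that divides by n – 1 lands close to it.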

Importance of Variance

  1. Variance can determine what a typical member of a data set looks like and how similar the points are.
  2. If the variance is high it implies that there are very large dissimilarities among data points in data set.
  3. If the variance is zero it implies that every member of data set is the same.

Standard deviation

Variance is a measure of dispersion, but the figure obtained is sometimes pretty large and hard to compare, because its unit of measurement is squared. The standard deviation, which is the square root of the variance, solves this.

Standard deviation (SD) is a very common measure of dispersion. SD also measures how spread out the values in a data set are around the mean.

More precisely, it is a measure of the average distance between the data values and the mean.

  1. If data values are similar, then the SD will be low (close to zero).
  2. If data values are highly variable, then the SD will be high (far from zero).

  • If SD is small, data has little spread (i.e. majority of points fall near the mean).
  • If SD = 0, there is no spread. This only happens when all data items are of same value.
  • The SD is significantly affected by outliers and skewed distributions.

Coefficient of variation

Standard deviation is the most common measure of variability for a single data set, whereas the coefficient of variation (the SD divided by the mean) is used to compare the variability of two or more data sets.

Example

     

  • If we observe, variance gives the answer in squared units and SD gives it in the original units, hence SD is preferred and easier to interpret.
  • The coefficient of variation does not have a unit of measurement. It is universal across data sets and well suited for comparisons.
  • If the coefficient of variation is the same, we can say that the two data sets have the same variability.
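
A minimal sketch of the coefficient of variation, computed as the sample standard deviation divided by the mean. The two data sets and the helper name coefficient_of_variation are made up for illustration.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (unitless)."""
    return statistics.stdev(values) / statistics.mean(values)

small_scale = [10, 20, 30]     # hypothetical data in one unit
large_scale = [100, 200, 300]  # the same shape at ten times the scale

for values in (small_scale, large_scale):
    print("SD:", statistics.stdev(values),
          "CV:", coefficient_of_variation(values))
```

The standard deviations differ by a factor of ten, but the coefficient of variation is the same for both, which is why it is convenient for comparing data sets measured on different scales.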

Python Implementation 

Python code for finding range

import numpy as np
import statistics as st

data = np.array([4,6,9,3,7])
print(f"The range of the dataset is {max(data)-min(data)}")

The Output will give us the value of range i.e. 6

Python code for finding variance

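import numpy as np
import statistics as st

data = np.array([3,8,6,10,12,9,11,10,12,7])
# Convert to a plain list first: with a NumPy integer array,
# statistics.variance can truncate the result to an integer
var = st.variance(data.tolist())

print(f"The variance of the data is {var}")

The Output will give us the value of the sample variance, approximately 8.18.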

Python code for finding Standard deviation

import numpy as np
import statistics as st

data = np.array([3,8,6,10,12,9,11,10,12,7])
# Convert to a plain list first, for the same reason as with variance
sd = st.stdev(data.tolist())

print(f"The standard deviation of data points is {sd}")

The Output will give us the value of the sample SD, approximately 2.86.

Conclusion

So here we have understood Measures of variability. Measures of Central Tendency and Measures of Variability together are called univariate measures of analysis.

Measures which deal with only one variable are called univariate measures.

In the next section, we are going to discuss more interesting topics such as five-number summary statistics and skewness.

Happy Learning !! 

 

 

All about Descriptive and Inferential Statistics

In the previous article we had a brief introduction to statistics and its importance in the field of analytics. In this article we will move one step forward towards understanding stats.

In this blog we are going to have an overview of types of statistics, Types of data and measurement scale.

Types of Statistics

So basically statistics is divided into 2 major categories i.e. Descriptive and Inferential statistics.

Descriptive statistics:

This is one of the most important parts of stats. Here we deal with numbers, figures, or other information that describe a certain phenomenon. These numbers are known as descriptive statistics.

It helps us to organize and summarize data using numbers and graphs to look for a pattern in the data set.

Some examples of this type of statistics are measures of central tendency, which include mean, median, mode, etc., and measures of variability, such as standard deviation, range, variance, etc.

Example: Reports of production, cricket batting averages, ages, ratings, marks, etc.

Inferential statistics:

Sample data is used to make an inference or draw a conclusion about the population. Inferential statistics is a decision, estimate, prediction, or generalization about a population based on a sample.

Inferential statistics is used to make inferences from the data, whereas descriptive statistics simply describes what’s going on in our data.

Scenario based study:

Suppose a particular college has 1000 students. We are interested in finding out how many of the students prefer eating in the canteen and how many prefer eating in the mess. A random group of 100 students was selected, and it becomes our sample data.

So, population size = 1000 college students

sample size = 100 random students selected

So now we can survey this sample of 100 students. After doing the survey and analyzing the data, we get the following visualizations.

Insights derived:

  1. 72 % of students prefer eating in canteen.
  2. Of the total students who prefer canteen 44.4 % are from 4th year.
  3. Of the total number of students who prefer canteen 72% are from 3rd and 4th year.
  4. 1st year students are more inclined towards eating in mess.

The above statistics give the trends in the sample data. In these insights we are using numbers to describe the data, hence all of this falls under Descriptive Statistics.

Now, suppose we wanted to open a canteen or a mess in the college. From the above insights we can assume that:

  1. 3rd year and 4th year students are the main target for starting the business.
  2. To get more sales, you can provide discounts to 1st year and 2nd year students.
  3. A canteen is a better option than a mess for running a business, since most of the students in the data are inclined towards the canteen rather than the mess.

So here we made inferences/assumptions/estimations for the whole college from the above insights, on the basis of the sample data. Hence this is a crucial part of Inferential statistics.

So here we have discussed the main difference between descriptive and inferential statistics based on the above scenario.
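
The step from sample to population in the scenario can be sketched in code. The figures 72 out of 100 and the population of 1000 come from the scenario above. The 95% interval is an addition for illustration only: it uses the common normal approximation for a proportion, which the scenario itself does not mention.

```python
import math

population_size = 1000
sample_size = 100
prefer_canteen = 72  # students in the sample who prefer the canteen

# Descriptive: summarize the sample itself
p_hat = prefer_canteen / sample_size

# Inferential: generalize from the sample to the whole college
estimated_count = round(p_hat * population_size)

# Assumption: normal-approximation 95% interval for a proportion
standard_error = math.sqrt(p_hat * (1 - p_hat) / sample_size)
low = p_hat - 1.96 * standard_error
high = p_hat + 1.96 * standard_error

print(f"Sample proportion: {p_hat:.2f}")
print(f"Estimated canteen-preferring students in the college: {estimated_count}")
print(f"Approximate 95% interval for the proportion: {low:.3f} to {high:.3f}")
```

The sample proportion is a descriptive statistic. The estimate for all 1000 students, together with its margin of uncertainty, is the inferential part.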

Everything about print() in python

print() function:

Python’s print() function is used to print something on the screen. Strings are collections of characters inside “double quotes” or ‘single quotes’.

If we observe, print is not a statement; it is a built-in Python function.

sep: It is a keyword argument that sets the separator inserted between the printed values. By default it is a single space. Let’s see some examples.

Rather than using \n or \t, we can also use symbols like comma (,) or plus (+) sign.

To display a variable’s value along with a predefined string, all you need to do is add a comma in between the two. Here the position of the predefined string and the variable does not matter.

Similar to a format argument where your print function acts as a template, you have a percentage (%) sign that you can use to print the values of the variables.

Like format argument, this also has a concept of placeholders. However, unlike the format function where you pass in just the index numbers, in this, you also need to specify the datatype the placeholder should expect.

%d is used as a placeholder for integer values. %s is used as a placeholder for strings.
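
The difference between the two placeholders shows up when the value does not match the expected type. A minimal sketch, with variable names chosen only for this example:

```python
a = 3
b = "Datacrux"

as_string = "%s" % a    # %s converts the value with str()
truncated = "%d" % 3.9  # %d keeps only the integer part of a float

try:
    "%d" % b            # %d cannot format a string
    d_accepts_text = True
except TypeError:
    d_accepts_text = False

print(as_string, truncated, d_accepts_text)
```

So %s works with any value because it converts it to a string, while %d requires a number.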

Formatting:

A good way to insert objects into a string for print() is string formatting. Two methods are used here.

1)Format Method

Syntax:

'String here {} then also here {}'.format('something1', 'something2')

2)f-string (formatted string literals)

 

CODE FOR PRACTICE:

print("Hello World")

print('Hello World')

#type() of print
type(print)

print('Python','tutorial','of','data crux')

print('Python','tutorial','of','data crux',sep='\n') #\n will put each word in a new line

print('Python','tutorial','of','data crux',sep=',')

print('Python','tutorial','of','data crux',sep='\n\n')

print('Python','tutorial','of','data crux',sep='+')

a = 3
b = "Datacux"
print(a,"is an integer while",b,"is a string.")

print("{0} is an integer while {1} is a string.".format(a,b))

print("%d is an integer while %s is a string."%(a,b))

print(f'{a} is an integer while {b} is a string')

TEST YOUR KNOWLEDGE !

What is %s used for?

%s is used to represent a string. If you use an integer with %s, it is converted to a string.

How many methods of formatting are there?

There are two methods of formatting: the format() method and f-strings.

Can we use other symbols as the separator (sep) as well?

Yes!!!