Heart Disease Prediction in Tableau | Logistic Regression in Tableau
This content was created solely for experimental purposes. It demonstrates writing a Python script in Tableau and applying machine learning to predict heart disease. It also demonstrates the limitations of the existing Python scripting approach for ML. This was already published on our previous channel, and we are migrating the content to this channel. If you have already viewed it, ignore it and wait for upcoming videos.
Dataset and twbx link:
https://drive.google.com/file/d/1D5yr…
Content
0.899 -> Hello, friends. In today's video, we will
be doing heart disease prediction, and this will
8.25 -> be done in Tableau by integrating
it with Python. I've already created one
15.539 -> dashboard: on the left and the right side are the input
details. On the basis of these, one can
22.02 -> input values inside these
parameters. By inserting specific values inside
28.52 -> the parameters, one can
check what the probability would be of someone
33.83 -> having heart disease or not. Now, let us
understand its features. The very first
42.24 -> feature is SBP, which stands for systolic
blood pressure, and its unit is
47.54 -> millimetres of mercury (mm Hg). Let's have a look at its
range of values. If the value is higher
54.659 -> than 5, then that would
be considered very critical. Next is LDL,
59.639 -> low-density lipoprotein, also known as
bad cholesterol, and its unit is millimoles per litre (mmol/L).
68.23 -> Let's have a look at its range of values.
If the value is greater than 5, that
73.44 -> would be considered very
high. Next is adiposity. On the basis
79.699 -> of the value of adiposity, a person
will be categorised as overweight,
84.29 -> underweight, or obese.
Next is tobacco, measured in kg:
94.41 -> it is the cumulative amount of tobacco
that a person has consumed
99.839 -> throughout their life. Next is alcohol.
Sorry to say this, but the data that I have
107.4 -> collected doesn't have any information about it.
But we have a range of values, and the range
112.769 -> is between 0 and 150.
Next is Type A; its value is a score out of one
121.62 -> hundred. So what is
Type A? It is a behaviour pattern that is
127.379 -> expressed by people who are very
ambitious and more likely
132.869 -> to get irritated. OK, next is age, which is very
much self-explanatory. Next is family
140.82 -> history: whether a person has any
family history of heart disease. One
147.509 -> is for present, zero is for absent.
CHD is our target variable; it stands for coronary
153.899 -> heart disease: present or absent. Now, if you are trying
to build a predictive model
160.38 -> in Tableau, you need to prepare the data.
Rather than passing the values in different
167.039 -> rows, you need to pass all the values
as a list inside a single cell. I will tell
174.179 -> you why we are doing this later on. In order
to prepare the data that way, we will be
180.36 -> making use of Python. So guys, I have
written the code in Python. Let me explain
186.839 -> what it does. In the very first
step, I have imported the pandas library.
192.36 -> In the next step, you need to store
the data inside a DataFrame. OK,
197.429 -> so I have stored it in the DataFrame df.
In the next step, you need to store
202.649 -> the names of the columns inside a list.
df.columns fetches the names of the columns,
208.619 -> and .tolist() converts them into a list, and I have
saved it in a variable named col.
216.149 -> In the next step, we will be preparing
the data. Let me explain the first line.
222.029 -> What it does is first fetch
the data that is in the first column.
229.74 -> That is converted to a list using the function
tolist(). Now this list is stored
236.369 -> inside another list. That's why there
is a square bracket outside. In the same way,
242.399 -> I have stored the data of every
column in the variables that
247.529 -> are on the left side. Let me show you one
of them, SBP. Just take a look: all
255.389 -> the values are stored inside a list, and that
list is inside another list. In the next step,
262.529 -> we will be storing the values inside a
dictionary in the form of key and value pairs.
267.07 -> The very first entry's key
is col[0]: col is our list and zero
273.75 -> is the index. The column name at
index zero is SBP (mm Hg), so that will be
281.64 -> the first key, and its value
will be the SBP list, this one. OK,
290.489 -> in the same way, all the values will
be stored inside this dictionary in the form
294.72 -> of key and value. OK, now let me save that
in the form of a DataFrame. I will be exporting
305.94 -> it as an Excel file to this
folder. So I have successfully exported
314.13 -> the file to this folder. Here
it is. This is the file. OK, so I have passed
324.059 -> the values inside a list, and all
of this is in a single cell.
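The preparation script described above can be sketched roughly as follows. This is a minimal reconstruction under assumptions, not the author's code: the helper name `to_single_row`, the toy values, and the file names `heart.csv` and `prepared.xlsx` are hypothetical.

```python
import pandas as pd


def to_single_row(df: pd.DataFrame) -> pd.DataFrame:
    """Collapse every column of df into one cell holding that column's values as a list."""
    col = df.columns.tolist()  # column names, in file order
    # Wrapping each column's list in another list gives a one-row DataFrame
    # whose cells are whole columns.
    packed = {name: [df[name].tolist()] for name in col}
    return pd.DataFrame(packed)


if __name__ == "__main__":
    # Tiny stand-in for the real dataset (hypothetical values).
    raw = pd.DataFrame({"sbp": [160, 144, 118], "ldl": [5.73, 4.41, 3.48], "chd": [1, 1, 0]})
    prepared = to_single_row(raw)
    print(prepared.shape)          # one row, one cell per column
    print(prepared.loc[0, "sbp"])  # the whole SBP column as a list
    # With the real data you would read the CSV and export it for Tableau, e.g.:
    # to_single_row(pd.read_csv("heart.csv")).to_excel("prepared.xlsx", index=False)
```

Passing `index=False` to `to_excel` skips the DataFrame index, which is presumably the extra first column that has to be deleted before importing into Tableau. Writing `.xlsx` files with pandas also needs an Excel engine such as openpyxl installed.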
329.339 -> Before importing this data into
Tableau, make sure that you delete the first
335.399 -> column. Now we need to establish a
connection between Python and Tableau. For
342.059 -> that, you need to install the TabPy server beforehand. I have
already installed it. So I'm copying
347.369 -> the location and changing the directory
in the command prompt. OK, now I will run
354.329 -> the batch file, startup.bat,
there. OK. Now it will initialize the service
361.2 -> on port 9004. OK, now
I have loaded the file in Tableau. Now we need
369.839 -> to establish the connection: go to
Help > Settings and Performance > Manage External Service
374.25 -> Connection. The server should be
localhost and the port should be 9004. Test your connection.
381.179 -> We have successfully connected to
the external service at port 9004.
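Before testing the connection in Tableau, you can also check from Python that TabPy is listening. A minimal sketch, assuming TabPy runs on localhost with its default port 9004 and answers its `GET /info` endpoint with JSON; `tabpy_is_running` is a hypothetical helper name, not part of TabPy.

```python
import http.client
import json
import urllib.request


def tabpy_is_running(host: str = "localhost", port: int = 9004, timeout: float = 2.0) -> bool:
    """Return True if a server on host:port answers GET /info with JSON (assumed TabPy)."""
    url = f"http://{host}:{port}/info"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            json.loads(resp.read().decode("utf-8"))  # raises ValueError if the body is not JSON
            return resp.status == 200
    except (OSError, ValueError, http.client.HTTPException):
        # Connection refused, timeout, HTTP error status, or a malformed reply.
        return False


if __name__ == "__main__":
    print("TabPy reachable:", tabpy_is_running())
```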
387.619 -> Now let us understand the reason
behind preparing the data. We have loaded
393.359 -> the prepared data source. Let
us look at the original data source. So I have
403.64 -> loaded the original data source. OK,
now, whenever we are about to build any
409.66 -> predictive model, it requires
multiple values for training. But what Tableau does
416.2 -> is aggregate the data into a single value.
So let's say I drag and drop adiposity
425.649 -> here. SUM aggregation
is applied, as you can see. And if I pass this
431.529 -> value to the model as my training input,
only one value will be passed, that is eleven
436.209 -> thousand seven hundred thirty-eight.
Using one value, we can't train a model.
443.17 -> Suppose I pass a string instead; it still needs
to be aggregated, so it would be
449.2 -> aggregated with the ATTR function.
Then * is displayed, which indicates that there is
453.959 -> more than one value. So we can create a model
neither with a single value nor with *. We require
462.22 -> multiple values. OK, now in our prepared
data, let's say I drag and drop
469.48 -> adiposity here: the data is visible,
and even if I aggregate it, let's say with ATTR, the
477.07 -> data will be visible as it is.
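The difference between the two data sources can be imitated in plain Python. This is only an analogy for what the video shows, not how Tableau works internally, and the adiposity values are made up: a SUM over per-patient rows leaves one number, while the prepared cell keeps every value as text that the script can parse back into a list with `ast.literal_eval`.

```python
import ast

# Original layout: one row per patient (hypothetical adiposity values).
rows = [23.11, 28.61, 32.28, 38.03]

# SUM([adiposity]) over these rows collapses them into a single number.
summed = sum(rows)

# Prepared layout: one row whose cell stores the whole column as text,
# so ATTR([adiposity]) can return it unchanged.
prepared_cell = str(rows)  # "[23.11, 28.61, 32.28, 38.03]"

# Inside the script, the text is parsed back into the full list of values.
values = ast.literal_eval(prepared_cell)

print(summed)       # a single value: not enough to train on
print(len(values))  # every original value is still available
```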
This list of multiple values is what is required
482.17 -> for our predictive model. That is
the reason why I have prepared the data.
489.239 -> Now let us create the user inputs, which will
be in the form of parameters. The very
494.38 -> first parameter is adiposity, and in the same
way I will create one for every feature except
499.64 -> CHD. Now I have created all the input
parameters. Let us create the predictive
504.97 -> model. For the model, I will be
using logistic regression. Now,
511.299 -> whenever you're writing Python code, it has to be
encapsulated in a SCRIPT function,
516.969 -> and the Python code is written
inside double quotes. Before that, we need
521.65 -> to pass some inputs:
adiposity, age, and so on. In the same
531.38 -> way, we will be passing all the columns
along with the user input parameters. So guys,
539.919 -> of all the inputs, the first three rows
will be required for training the model
546.76 -> and creating the DataFrame, and these last
three rows will be required for the manual user
553.15 -> input, through which users can check
the probability of having heart disease or not.
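Because the inputs reach Python as `_arg1`, `_arg2`, … in the order they are listed after the code string, it can help to generate the calculated field so the order cannot drift. The helpers below are hypothetical (not part of Tableau or TabPy), and all field and parameter names are placeholders; they assume, as in the video, that columns are wrapped in `ATTR()` and parameters are passed as-is, with the nine columns ending in CHD.

```python
def build_script_call(code: str, columns: list[str], parameters: list[str]) -> str:
    """Assemble the text of a SCRIPT_STR(...) calculated field.

    Columns are wrapped in ATTR() and passed first; parameters follow unchanged.
    """
    if '"' in code:
        # Keep the Python code free of double quotes so it cannot end the string early.
        raise ValueError("use single quotes inside the Python code")
    inputs = [f"ATTR([{c}])" for c in columns] + [f"[{p}]" for p in parameters]
    return f'SCRIPT_STR("{code}", ' + ", ".join(inputs) + ")"


def arg_names(columns: list[str], parameters: list[str]) -> dict[str, str]:
    """Map the _arg1, _arg2, ... names to the field each one receives."""
    return {f"_arg{i}": name for i, name in enumerate(columns + parameters, start=1)}


# Hypothetical field and parameter names, in the order used for training.
columns = ["SBP", "Tobacco", "LDL", "Adiposity", "Famhist", "TypeA", "Alcohol", "Age", "CHD"]
parameters = ["p_SBP", "p_Tobacco", "p_LDL", "p_Adiposity", "p_Famhist", "p_TypeA", "p_Alcohol", "p_Age"]

print(build_script_call("return str(len(_arg1[0]))", columns[:2], parameters[:1]))
mapping = arg_names(columns, parameters)
print(mapping["_arg9"], mapping["_arg10"])  # CHD p_SBP
```

With nine columns, the first parameter lands on `_arg10`, which matches the numbering used later when the parameters are read.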
559.869 -> Now, we need to write the Python code. It has
to start by importing libraries.
565.119 -> These are the libraries that are required.
The first and second are pandas and NumPy, which
570.07 -> are used to handle the data. The third
one is LogisticRegression. Using this,
575.89 -> we will build our predictive model.
Next is StandardScaler. This is used to bring every
583.359 -> independent variable's
values onto the same scale. literal_eval is used
591.039 -> to convert a string into a list. So we have
imported all the libraries. Now we need
598.39 -> to store the data inside a DataFrame. Before that,
I'm storing it inside a dictionary.
604 -> A dictionary stores values in the form of key
and value. I need to store
609.7 -> adiposity, so for that I'm using the key AD,
and I need to pass this value. OK, this value
616.059 -> is argument one, so _arg1,
and the index needs to be specified, which is
621.789 -> zero. OK, once the value is here,
it is in the form of a string. So you need
629.5 -> to convert that string into a list. To convert it,
we need to make use of literal_eval;
635.77 -> literal_eval will convert that string
to a list. Next is age, so for that,
642.299 -> the key is ag: copy, paste, and replace
the argument. In the same way, I will fill in
652.599 -> the rest. We have created
the dictionary. Now let us save it into a DataFrame.
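The dictionary-to-DataFrame step can be tried outside Tableau by simulating the arguments. A sketch, assuming each `_argN` arrives as a one-element list holding the cell's text, which is why the video indexes `_arg1[0]`; `frame_from_args` and the sample values are hypothetical, while `AD` and `ag` are the keys used in the video.

```python
from ast import literal_eval

import pandas as pd


def frame_from_args(names: list[str], *args: list[str]) -> pd.DataFrame:
    """Rebuild the training DataFrame from the script arguments.

    Each argument is assumed to be a one-element list holding the text of a
    prepared cell, e.g. ["[160, 144, 118]"]; literal_eval turns it into a list.
    """
    if len(names) != len(args):
        raise ValueError("expected one name per argument")
    data = {name: literal_eval(arg[0]) for name, arg in zip(names, args)}
    return pd.DataFrame(data)


# Simulated arguments with made-up values, standing in for _arg1 and _arg2.
_arg1 = ["[25.3, 28.87, 29.14]"]  # adiposity
_arg2 = ["[52, 63, 46]"]          # age
df = frame_from_args(["AD", "ag"], _arg1, _arg2)
print(df)
```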
666.159 -> We have successfully saved the values.
Now we need to separate our independent
672.309 -> variables and the dependent variable. These
are the independent variables, and the one which is
678.01 -> to be predicted is our dependent
variable. I will separate them using
683.32 -> slicing. All the independent
variables will be stored inside X, so X
689.679 -> equals df.iloc (integer location).
I will use slicing. I need
694.419 -> all the rows, so for that write a colon,
then a comma; the indexes start from zero.
700.08 -> Let me count here: zero, one,
two, three, four, five, six, seven, eight.
708.09 -> I need all the values before index
eight. So let's write colon,
713.789 -> 8. So I have fetched all those values.
Now I need the value at index eight,
722.19 -> so for that write df.iloc with
colon, comma, 8. Now we need to transform
730.53 -> these values: every
single value has to be
735.57 -> brought onto the same scale, so for
that we need to make use of StandardScaler.
740.039 -> So sc = StandardScaler(). OK, now X_t
is my new variable, where we will have
750.27 -> our scaled values, so sc.fit_transform:
OK, transform my X variable.
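The split and scaling steps look roughly like this. A sketch on random stand-in data, assuming, as in the video, that the eight features sit at indexes 0 to 7 and the CHD target at index 8; the column names are hypothetical. Note that `iloc[:, :8]` stops before index 8, so the target is not included in X.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in: 8 feature columns (indexes 0-7) and the target at index 8.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(20, 8)), columns=[f"f{i}" for i in range(8)])
df["chd"] = rng.integers(0, 2, size=20)

X = df.iloc[:, :8]  # every row, columns 0..7 (index 8 is excluded)
y = df.iloc[:, 8]   # every row, column 8: the CHD target

sc = StandardScaler()
X_t = sc.fit_transform(X)  # learn each column's mean and std, then rescale

print(X_t.mean(axis=0).round(6))  # approximately 0 for every column
print(X_t.std(axis=0).round(6))   # approximately 1 for every column
```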
762.059 -> Now we will build a logistic
regression model. So lr = LogisticRegression().
769.71 -> OK, now train the model. In order
to train it, call the fit function with X_t, and
778.53 -> y is what we need to predict. OK,
now let's check its accuracy.
785.58 -> So return the accuracy, that is,
lr.score(X_t, y).
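Training and scoring can be sketched the same way on toy data (hypothetical values generated with NumPy). Keep in mind that `score` returns the mean accuracy on whatever data it is given; scoring the training data, as in the video, measures training accuracy, so a held-out split (for example with `sklearn.model_selection.train_test_split`) would give a fairer estimate.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: 200 "patients", 8 features; the label mostly follows feature 0.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 8))
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)

sc = StandardScaler()
X_t = sc.fit_transform(X)  # scale the features as in the previous step

lr = LogisticRegression()
lr.fit(X_t, y)  # train the model on the scaled features

# Mean accuracy on the same data the model was trained on (training accuracy).
accuracy = lr.score(X_t, y)
result = str(accuracy)  # the video returns the score as a string
print(result)
```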
793.08 -> Right. And it is returned as a string.
OK, drag and drop it. So our model is 75
805.65 -> percent accurate on the training data. Now let's
identify the result on the basis of the user input
813.09 -> and check what
the probability of having heart disease would be.
818.34 -> So for that, just edit it. OK, so let
us make some changes. Now we need to make use
829.979 -> of these parameters, so store these
parameters in a list, that is input_list,
841.14 -> and store these values. OK, so we
stopped at argument nine. These will be 10, 11, 12,
848.729 -> 13, and so on. So this is argument
10 and index zero. In the same way,
856.38 -> I will store the rest of the data.
But remember that it has to be
861.03 -> in the same sequence. OK, in the same
sequence as our data was stored
867.75 -> in the DataFrame. The sequence must
not change. So I have stored all these values
875.309 -> of the parameters inside a list. Now
we have created this input list. Now,
881.039 -> whenever we are passing a list
consisting of only a single row of values, it has to be
885.9 -> reshaped. So we need to reshape it, using
a variable inp: a NumPy array (np.array)
893.679 -> of the input list, reshaped to
(1, -1). Now we need to transform
907.469 -> this input data, so transform
it using the fitted StandardScaler's .transform
919.44 -> and pass inp. Now store the result
of the prediction: lr.predict,
930.28 -> using inp, so the result of the prediction
will be stored inside it. Now let us store
936.77 -> the probability; for that, lr.
predict_proba. This will fetch the probability,
944.619 -> and the probability of inp is to be determined.
Now we return all the values.
952.25 -> I need to return the prediction, and I also
need to return the probability. So just
966.619 -> have a look: on the basis of all
this input we are giving, where all the values
974.929 -> are basically 1, on the basis of these
values the model makes its prediction,
981.169 -> and the probability of not having
heart disease is 0.99 and of having heart disease
985.849 -> 0.01. Now, I want to show these
3 values separately, so I will be making 3
993.049 -> calculated fields. The first is to fetch
absent, so edit it and just replace this
1002.619 -> and write zero. OK, so that is absent. Then
duplicate it and edit it: here you have to fetch
1017.38 -> the probability, which is inside prob;
just place it in the text. We have a list inside another
1027.069 -> list, so we need to fetch the first
value. OK, I will specify zero and zero,
1040.219 -> so in this way, I will fetch the first
value. Now duplicate it. Here,
1052.14 -> specify 1, and just drag and drop it. In
this way, you can separate these three
1062.219 -> categories. The first is absent,
the second is the absent probability,
1068.01 -> and the third is the yes probability,
that is, having the disease. You can rename
1073.43 -> the fields as you like. Now we need
to convert their data type to numeric and create a
1080.43 -> donut chart out of it.
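The prediction step and the three values pulled into separate calculated fields can be sketched as below. This assumes a tiny made-up training set with two features instead of eight; the variable names follow the video (`input_list`, `inp`, `sc`, `lr`), but the data is hypothetical. The important details are reusing the already-fitted scaler with `transform` rather than refitting it, and keeping the parameter values in the same column order as the training data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Hypothetical toy training data with 2 features (the video uses 8).
X = np.array([[120, 20], [130, 25], [150, 45], [160, 60], [125, 30], [155, 55]], dtype=float)
y = np.array([0, 0, 1, 1, 0, 1])

sc = StandardScaler()
lr = LogisticRegression().fit(sc.fit_transform(X), y)

# Parameter values, listed in the same column order as the training data.
input_list = [145, 50]
inp = np.array(input_list, dtype=float).reshape(1, -1)  # one row, one column per feature
inp = sc.transform(inp)  # reuse the scaling learned from the training data (no refit)

pred = lr.predict(inp)        # predicted class: 0 (absent) or 1 (present)
prob = lr.predict_proba(inp)  # [[P(absent), P(present)]]; columns follow lr.classes_

# The three calculated fields each pull one piece out of these results.
prediction = str(pred[0])
p_absent = str(prob[0][0])
p_present = str(prob[0][1])
print(prediction, p_absent, p_present)
```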
1139.533 -> Now, let us work on the user
input data for the parameters of three features:
1143.88 -> adiposity, LDL and SBP. Using those,
I will be creating a file that users
1149.939 -> can refer to before inserting values. So,
guys, I have created the user input details
1156.089 -> for adiposity, LDL and SBP. Now
we will be creating the other input details
1162.949 -> on the dashboard itself. Now I have
changed the size of the dashboard to
1168.54 -> width 1320 and height 720.
Let us create a title and border.
1175.92 -> The title and border are ready. Now let us create
the input details. So guys, I have created
1181.05 -> the user input data for adiposity,
systolic blood pressure and LDL, and these
1187.709 -> are the texts for the other
columns. Let us insert sheet one, where we have
1193.619 -> created the donut chart. So that inserts
the donut chart. Let us try
1200.579 -> to find the probability of having
the disease by inserting values into the parameters:
1206.819 -> adiposity = 24, LDL = 2.5,
age = 40, alcohol consumption = 50,
1217.68 -> family history = 1, SBP = 140,
tobacco = 10,
1224.4 -> Type A = 50.
1230.43 -> So the person with these values
as input might have a chance of having
1237.13 -> the disease of 39%, and of not having the disease of 61%.
So that's how you could build a predictive
1242.609 -> model, but there are some limitations
to it. OK, but it was worth giving a try.
1249.119 -> I have tried it, and now I want
to share it with you. So I hope that you guys
1254.28 -> try it too. OK, thank you, guys.
Source: https://www.youtube.com/watch?v=R__EeIePba8