Heart Disease Prediction in Tableau | Logistic Regression in Tableau

This content was created solely for experimental purposes. It demonstrates writing a Python script in Tableau and applying machine learning to predict heart disease. It also demonstrates the limitations of the existing Python script approach for ML. This was already published on the previous channel, and we are migrating the content to this channel. If you have already viewed it, ignore it and wait for upcoming videos.

Dataset and twbx link:
https://drive.google.com/file/d/1D5yr


Content

0.899 -> Hello, friends. In today's video, we will be doing heart disease prediction, and this will
8.25 -> be done in Tableau by integrating it with Python. I have already created one.
15.539 -> On the left and the right side are the input details, on the basis of which one can
22.02 -> enter values in these parameters. By inserting a specific value into
28.52 -> a parameter, one can check the probability of a person
33.83 -> having heart disease or not. Now, let us understand the features. The very first
42.24 -> feature is SBP, which stands for systolic blood pressure, and its unit is
47.54 -> mm of mercury. Let's have a look at its range of values. If the value is higher
54.659 -> than 5, then that would be considered very critical. Next is
59.639 -> LDL, low-density lipoprotein, also known as bad cholesterol, and its unit is millimoles per litre.
68.23 -> Let's have a look at its values. If the value is greater than 5,
73.44 -> that would be considered very high. Next is adiposity. On the basis
79.699 -> of the adiposity value, a person is categorised as overweight,
84.29 -> underweight, or obese. Next is tobacco. Tobacco in kg
94.41 -> is the cumulative amount of tobacco that a person has consumed
99.839 -> throughout their life. Next is alcohol. Sorry to say this, but the data that I have
107.4 -> collected doesn't have any info for it. But we have a range of values, and the range
112.769 -> is between 0 and 150. Next is type A. Its value is a score that goes up to one
121.62 -> hundred. So what is type A? Type A is a behaviour pattern
127.379 -> expressed by people who are very ambitious and are more likely to
132.869 -> get irritated. OK, next is age, which is self-explanatory. The other is family
140.82 -> history: whether a person has any family member with a history of heart disease. One
147.509 -> is for present, zero is for absent. CHD is our target variable, and it is coronary
153.899 -> heart disease, present or absent. Now, if you are trying to build a predictive model
160.38 -> in Tableau, you need to prepare the data. Rather than passing the values in different
167.039 -> rows, you need to pass the values as a list inside a single cell. I will tell
174.179 -> you why we are doing this later on. To prepare the data that way, we will be
180.36 -> making use of Python. So guys, I have written the code in Python. Let me explain
186.839 -> what it does. In the very first step, I have imported the pandas library.
192.36 -> In the next step, you need to store the data inside a DataFrame.
197.429 -> I have stored it in the DataFrame df. In the next step, you need to store
202.649 -> the names of the columns inside a list. df.columns fetches the column names,
208.619 -> and .tolist() converts them into a list, which I have saved in a variable named col.
216.149 -> In the next step, we will be preparing the data. Let me explain the first line.
222.029 -> It first fetches the data in the first column,
229.74 -> and that is converted to a list using the tolist function. Now this list is stored
236.369 -> inside another list; that's why there are square brackets outside. In the same way,
242.399 -> I have stored the data of every column in the variables that
247.529 -> are on the left side. Let me show you the value of SBP. Just take a look: all
255.389 -> the values are stored inside a list, and that list is inside another list. In the next step,
262.529 -> we will be storing the values inside a dictionary in the form of key and value pairs.
267.07 -> The very first entry is the key, that is col[0]: col is our list and zero
273.75 -> is the index. The value of col at index zero is SBP mm Hg, so that will be
281.64 -> the first key, SBP mm Hg, and its value will be SBP, that is, this one. OK,
290.489 -> in the same way, all the values will be stored inside this dictionary in the form
294.72 -> of key and value pairs. OK, now let me save that in the form of a DataFrame. I will be exporting
305.94 -> this file in the form of Excel to this folder. So I have successfully exported
314.13 -> the file to this folder. Here it is; this is the file. OK, so I have passed
324.059 -> the values inside a list, and all of this uses a single cell.
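
The data preparation described above can be sketched roughly as follows. This is a minimal illustration, not the author's actual script: the three-row dataset, the column names and the output file name are all made up for the example.

```python
import pandas as pd

# Hypothetical miniature of the heart disease data (values are illustrative only).
df = pd.DataFrame({
    "sbp": [160, 144, 118],
    "ldl": [5.73, 4.41, 3.48],
    "chd": [1, 1, 0],
})

# Column names as a plain Python list.
col = df.columns.tolist()

# Wrap each column's values in an outer list, so the whole column
# becomes a single list-valued cell in a one-row DataFrame.
packed = {name: [df[name].tolist()] for name in col}
prepared = pd.DataFrame(packed)

print(prepared.shape)  # one row, one cell per original column

# Exporting would look like this (needs openpyxl; the file name is hypothetical):
# prepared.to_excel("prepared.xlsx", index=False)
```

Passing index=False when exporting is one way to avoid the extra index column in the exported file.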
329.339 -> So before importing this data inside Tableau, make sure that you delete the first
335.399 -> column. Now we need to establish a connection between Python and Tableau, so for
342.059 -> that you need to install the TabPy server beforehand. I have already installed it. So I'm copying
347.369 -> the location and changing the directory in the command prompt. OK, now I will run
354.329 -> the batch file, startup.bat. OK. Now it will initialize the service
361.2 -> at port 9004. OK, now I have loaded the file in Tableau. Now we need
369.839 -> to establish the connection: go to Help, Settings and Performance, Manage External Service
374.25 -> Connection. The server should be localhost and the port should be 9004. Test your connection.
381.179 -> So we have successfully connected to the external service at port 9004.
387.619 -> Now let us understand the reason behind preparing the data. So we have loaded
393.359 -> the prepared datasource. Let us load the original datasource. So I have
403.64 -> loaded the original datasource. OK, now, whenever we are about to build any
409.66 -> predictive model, it requires multiple values for training. But what Tableau does
416.2 -> is aggregate the data to a single value. Let's say I drag and drop adiposity
425.649 -> here. Some aggregation is applied, which is SUM. And if I pass this
431.529 -> value as my input for training the model, one value will be passed, that is, eleven
436.209 -> thousand seven hundred thirty eight. So using one value, we can't train a model.
443.17 -> Suppose I pass a string and it needs to be aggregated; it would be
449.2 -> aggregated with the ATTR function, so * is displayed. What * indicates is
453.959 -> null. So we can't create a model using either a single value or a null. So we require
462.22 -> multiple values. OK, now in our prepared data, let's say I drag and drop
469.48 -> adiposity here: the data is visible. Even if I aggregate it, let's say with ATTR, the
477.07 -> data will be visible as it is. So this multiple-value data is required
482.17 -> for our predictive model. That is the reason why I have prepared the data.
489.239 -> Let us create the user inputs, and those will be in the form of parameters. The very
494.38 -> first parameter is adiposity, and in the same way I will create one for every column excluding
499.64 -> CHD. Now, I have created all the input parameters. Now let us create the predictive
504.97 -> model, and the model I will be using is logistic regression. Now,
511.299 -> whenever you're writing Python code, it has to be encapsulated in a SCRIPT function,
516.969 -> and the Python code is written inside double quotes. Before that, we need
521.65 -> to pass some inputs. So that is adiposity, age. And in the same way
531.38 -> we will be passing all the columns along with the user input parameters. So guys,
539.919 -> of all the inputs, the first three rows will be required for training the model
546.76 -> and creating the DataFrame, and these three rows will be required for the manual user
553.15 -> input, through which the user can check the probability of having heart disease or not.
559.869 -> Now, we need to write the Python code. It has to start by importing libraries.
565.119 -> These are the libraries that are required. The first and second are pandas and NumPy, which
570.07 -> are used to handle the data. The third one is LogisticRegression. Using this,
575.89 -> we will build our predictive model. Next is StandardScaler. This is just to
583.359 -> bring every value of the independent variables to the same scale. literal_eval is used
591.039 -> to convert a string into a list. So we have imported all the libraries. Now we need
598.39 -> to store the data inside a DataFrame. Before that, I'm storing it inside a dictionary.
604 -> A dictionary stores values in the form of key and value. I need to store
609.7 -> adiposity, so for that I'm using AD : and I need to pass this value. OK, this value
616.059 -> is argument one, so _arg1, and the index needs to be specified as
621.789 -> zero. OK, so once the value is here, it is in the form of a string. So you need
629.5 -> to convert that string into a list. To convert it, we need to make use of literal_eval;
635.77 -> literal_eval will convert that string to a list. Now the next is age. So for that,
642.299 -> ag : copy and paste, and replace the argument. In the same way I will fill it in
652.599 -> for the rest. We have created the dictionary. Now let us save it into a DataFrame.
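
A rough sketch of the dictionary step described above, runnable outside Tableau. Inside a Tableau SCRIPT function, _arg1 and _arg2 are supplied by Tableau; here they are simulated with made-up strings, and the dictionary keys are illustrative.

```python
from ast import literal_eval

import pandas as pd

# Stand-ins for Tableau's arguments (assumed shape): each is a list whose
# first element is a string holding a whole column of values.
_arg1 = ["[23.11, 28.61, 32.28]"]  # adiposity (made-up values)
_arg2 = ["[52, 63, 46]"]           # age (made-up values)

# literal_eval turns the string "[...]" back into a real Python list.
data = {
    "AD": literal_eval(_arg1[0]),
    "ag": literal_eval(_arg2[0]),
}

# Each key becomes a column, each list element a row.
df = pd.DataFrame(data)
print(df)
```

literal_eval only parses Python literals such as lists and numbers, which makes it a safer choice than eval for this conversion.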
666.159 -> So we have successfully saved the values. Now we need to separate our independent
672.309 -> variables and dependent variable. These are the independent variables, and the one which is
678.01 -> to be predicted is our dependent variable. I will separate them using
683.32 -> a slicing technique. All the independent variables need to be stored inside X, so X
689.679 -> equals df.iloc (integer location). I will use the slicing technique. I need
694.419 -> every row, so for that write colon, comma; and the index starts from zero.
700.08 -> So I need up to here: zero, one, two, three, four, five, six, seven, eight.
708.09 -> I need all the values up to index eight. So let's write colon
713.789 -> 8. So I have fetched all those values. Now I need the value at index eight,
722.19 -> so for that write df.iloc, colon, comma, 8. Now, we need to transform
730.53 -> these values: every single value has to be
735.57 -> brought onto the same scale, so for that we need to make use of StandardScaler.
740.039 -> So sc = StandardScaler(). OK, now X_t is my new variable, where we will have
750.27 -> our scaled values. So sc.fit_transform, OK, transforms my X variable.
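
The slicing and scaling steps above might look like this in isolation. The sketch uses a made-up frame with three feature columns and the target in the last column, so the slice boundary is 3 here, where the video uses 8.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical data: three features followed by the target column chd.
df = pd.DataFrame({
    "adiposity": [23.1, 28.6, 32.3, 38.0],
    "age": [52, 63, 46, 58],
    "famhist": [1, 0, 1, 0],
    "chd": [1, 1, 0, 1],
})

# All rows, columns 0..2 (the stop index 3 is excluded by the slice).
X = df.iloc[:, :3]
# All rows, only the column at index 3 (the target).
y = df.iloc[:, 3]

# Standardise each feature to zero mean and unit variance.
sc = StandardScaler()
X_t = sc.fit_transform(X)

print(X_t.shape)
```

Note that a slice such as `:8` selects indices 0 through 7, which is why the target can then be read separately from index 8.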
762.059 -> Now we will build a logistic regression model. So lr =
769.71 -> LogisticRegression(). OK, now train the model. To train it, use the fit function with X_t and
778.53 -> y, the value we need to predict. OK, now let's check what its accuracy is.
785.58 -> So return the accuracy, that is, lr.score of X_t, y.
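
As a minimal sketch of the training and accuracy steps just described, assuming synthetic data in place of the real dataset (the random features and labels below are made up for the example):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the prepared features and target.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
y = (X[:, 0] + 0.5 * rng.normal(size=40) > 0).astype(int)

sc = StandardScaler()
X_t = sc.fit_transform(X)

# Train the model on the scaled features.
lr = LogisticRegression()
lr.fit(X_t, y)

# score() returns the mean accuracy on the data it is given.
accuracy = lr.score(X_t, y)

# A Tableau SCRIPT_STR calculation expects a string back, so convert it.
result = str(accuracy)
print(result)
```

Because the score here is computed on the same rows used for training, it tends to be optimistic compared with accuracy on unseen data.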
793.08 -> Right. And it is in the form of a string. OK, drag and drop it. So our model is 75
805.65 -> percent accurate. Now let's get the result on the basis of the user input
813.09 -> and check what the probability of having heart disease would be.
818.34 -> For that, just edit it. OK, so let us make some changes now. We need to make use
829.979 -> of these parameters, so store these parameters in a list, that is, input_list,
841.14 -> and store these values. OK, so we have stopped at argument nine. These will be 10, 11, 12,
848.729 -> 13 and so on. So this is argument 10 at index zero. In the same way
856.38 -> I will store the rest of the data. But it has to be remembered that it should be
861.03 -> in the same sequence. OK, in the same sequence as our data was stored
867.75 -> in the DataFrame. OK, so the sequence must not change. So I have stored all these parameter
875.309 -> values inside a list. Now we have created this input list. Now,
881.039 -> whenever we are passing in a list consisting of only a single row of values, it has to be
885.9 -> reshaped. So we need to reshape it, using a variable inp: np.array,
893.679 -> store the input list, and reshape it to (1, -1). Now we need to transform
907.469 -> this input data, so transform it using the standard scaler's .transform
919.44 -> and pass inp. Now store the result of the prediction: lr.predict, predict
930.28 -> using inp, so the result of the prediction will be stored inside it. Now let us store
936.77 -> the probability, so for that lr.predict_proba; this will fetch the probability,
944.619 -> and the probability for inp is to be determined. Now return all the values.
952.25 -> I need to return the prediction, and I also need to return the probability. So just
966.619 -> have a look: on the basis of all these inputs that we have, where all the values
974.929 -> are basically 1, the person is predicted not to have heart disease;
981.169 -> the probability of not having heart disease is 0.99, and of having the disease is
985.849 -> 0.01. Now, I want to show these three values separately, so I will be making three
993.049 -> calculated fields. The first is to fetch absent, so edit and just replace this
1002.619 -> and write zero. OK, so this fetches absent. Duplicate and edit here: you have to fetch
1017.38 -> the probability, which is inside prob. Just place it in Text. So we have a list in another
1027.069 -> list, so we need to fetch the first value. Edit it. OK, I will specify zero and zero,
1040.219 -> so in this way I will fetch the first value. Now duplicate it. Here,
1052.14 -> specify 1, and just drag and drop. In this way, you can separate these three
1062.219 -> values. The first is absent, the second is the absent probability,
1068.01 -> and the third is the yes probability, that is, having the disease. You can rename
1073.43 -> the fields as per convenience. Now we need to convert their data type to numeric and create a
1080.43 -> donut chart out of it.
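
The prediction step and the three values pulled out in the calculated fields above can be sketched as follows. The training data and the input values are synthetic placeholders; in Tableau the inputs would come from the parameters via _arg10 onwards.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the training data (three features, binary target).
rng = np.random.default_rng(0)
X = rng.normal(loc=50, scale=10, size=(40, 3))
y = (X[:, 0] + rng.normal(scale=5, size=40) > 50).astype(int)

sc = StandardScaler()
lr = LogisticRegression().fit(sc.fit_transform(X), y)

# Hypothetical user input, in the same column order as the training data.
input_list = [55.0, 40.0, 60.0]

# A single sample must be 2-D: one row, as many columns as features.
inp = np.array(input_list).reshape(1, -1)
# Reuse the scaler fitted on the training data; do not refit it.
inp = sc.transform(inp)

pred = lr.predict(inp)        # shape (1,): predicted class
prob = lr.predict_proba(inp)  # shape (1, 2): a list inside a list

predicted_class = int(pred[0])   # 0 = absent, 1 = present
prob_absent = float(prob[0][0])  # probability of class 0
prob_present = float(prob[0][1]) # probability of class 1
print(predicted_class, prob_absent, prob_present)
```

The columns of predict_proba follow the order of lr.classes_, so with labels 0 and 1 the index [0][0] is the absent probability and [0][1] is the present probability.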
1139.533 -> Now, let us work on the user input data for the parameters. For the three fields
1143.88 -> adiposity, LDL and SBP, I will be creating a reference through which the user
1149.939 -> can look up and insert values. So, guys, I have created the user input details data
1156.089 -> for adiposity, LDL and SBP. Now we will be creating the other input details
1162.949 -> on the dashboard itself. Now I have changed the size of the dashboard to
1168.54 -> width 1320 and height 720. Let us create the title and border.
1175.92 -> So the title and border are ready. Now let us create the input details. So guys, I have created
1181.05 -> the user input data for adiposity, systolic blood pressure and LDL, and these
1187.709 -> are the text for the other columns. Let us insert the sheet where we have
1193.619 -> created the donut chart. So I have inserted the donut chart sheet. Let us try
1200.579 -> to find the probability of having the disease by inserting values into the parameters.
1206.819 -> The value of adiposity is 24, LDL = 2.5, age = 40, alcohol consumption = 50,
1217.68 -> family history = 1, SBP = 140, tobacco = 10,
1224.4 -> typea = 50.
1230.43 -> So a person with these input values has a chance of having the
1237.13 -> disease of 39% and of not having the disease of 61%. So that's how you could build a predictive
1242.609 -> model, but there are some limitations in it. OK, but it was worth giving it a try.
1249.119 -> So I have tried it, and now I want to share it with you. I hope that you guys
1254.28 -> try it too. OK, thank you guys.

Source: https://www.youtube.com/watch?v=R__EeIePba8