Heart Disease Prediction in Tableau | Logistic Regression in Tableau

Aug 26, 2023

This content was created solely for experimental purposes. It demonstrates writing a Python script in Tableau and applying machine learning to predict heart disease. It also demonstrates the limitations of the existing Python scripting for ML. This was already published on the previous channel, and we are migrating the content to this channel. If you have already viewed it, ignore this and wait for upcoming videos.

Dataset and twbx link:
https://drive.google.com/file/d/1D5yr…

Content

0.899 -> Hello, friends. In today's video, we will be doing heart disease prediction, and this will

8.25 -> be done in Tableau by integrating it with Python. I've already created one.

15.539 -> On the left and the right side are the input details, and on the basis of these one can

22.02 -> set the value inside each parameter. By inserting a specific value inside

28.52 -> a parameter, one can check what the probability would be of one

33.83 -> having heart disease or not. Now, let us understand the features. So the very first

42.24 -> feature is SBP, which stands for systolic blood pressure, and its unit is

47.54 -> mm of mercury. Let's have a look at its range of values. So if the value is higher

54.659 -> than 5, then that would be considered very critical. Next is LDL,

59.639 -> also known as the bad cholesterol, and its unit is millimoles per litre.

68.23 -> Let's have a look at its values. If the value is greater than 5,

73.44 -> that would be considered very high. Next is adiposity. On the basis

79.699 -> of the value of adiposity, a person will be categorised as overweight,

84.29 -> underweight, or obese. Next is tobacco. Tobacco, in kg, is

94.41 -> the cumulative amount of tobacco that a person has consumed

99.839 -> throughout their life. Next is alcohol. Sorry to say this, but the data that I have

107.4 -> collected doesn't have any info for it. But we have a range of values, and the range

112.769 -> is between 0 and 150. Next is type A. The value of type A is between one

121.62 -> and one hundred. That is the score. So what is type A? Type A is a behavior pattern that is

127.379 -> expressed by people who are very ambitious and are more likely to

132.869 -> get irritated. OK, next is age, which is very much self-explanatory. The other is family

140.82 -> history: whether a person has any family history related to heart disease. One

147.509 -> is for present, zero is for absent. CHD is our target variable, and it is coronary

153.899 -> heart disease, present or absent. Now, if you are trying to build a predictive model

160.38 -> in Tableau, you need to prepare the data. So rather than passing the values in different

167.039 -> rows, you need to pass the values as a list inside a single cell. I will tell

174.179 -> you why we are doing this later on. So in order to prepare the data that way, we will be

180.36 -> making use of Python. So guys, I have written the code in Python. Let me explain

186.839 -> what it does. In the very first step, I have imported the pandas library.

192.36 -> In the next step, you need to store the data inside a DataFrame. OK,

197.429 -> so I have stored the data in the DataFrame df. In the next step, you need to store

202.649 -> the names of the columns inside a list. So df.columns fetches the column names

208.619 -> and .tolist() converts them into a list, and I have saved it in a variable named col. Now,

216.149 -> in the next step we will be preparing the data. Let me explain the first line.

222.029 -> So what it does is it initially fetches the data that is in the first column. OK.

229.74 -> And it is converted to a list using the function tolist(). Now this list is stored

236.369 -> inside another list. That's why there are square brackets outside. In the same way,

242.399 -> I have stored the data of every column in the variables that

247.529 -> are on the left side. Let me show you the output of SBP. So just take a look: all

255.389 -> the values are stored inside a list, and that list is inside another list. In the next step,

262.529 -> we will be storing the values inside a dictionary in the form of key and value pairs.

267.07 -> So the very first key is col[0]: col is our list and zero

273.75 -> is the index. So the value of col at index zero is SBP MM HG. So that will be

281.64 -> the first key, SBP MM HG, and the value of that will be SBP. That is this one. OK,

290.489 -> in the same way all the values will be stored inside this dictionary, in the form

294.72 -> of key and value pairs. OK, now let me save that in the form of a DataFrame. OK, I will be exporting

305.94 -> this file in the form of Excel in this folder. So I have successfully exported

314.13 -> the file to this folder. Here it is. This is the file. OK, so I have passed

324.059 -> the values inside a list, and each list sits in a single cell.
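
The preparation steps described above can be sketched roughly as follows. This is a minimal illustration, not the code shown in the video: the sample values, the column names `AGE` and `CHD`, and the helper name `pack_columns` are made up, and the cells are converted to strings explicitly, which is an assumption about how the lists look once they are written to Excel.

```python
import pandas as pd


def pack_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a one-row DataFrame whose cells each hold a whole column as list text."""
    col = df.columns.tolist()          # column names, as described in the video
    packed = {}
    for name in col:
        # The outer list makes this a single row; str(list) gives text like "[160, 144, 118]".
        packed[name] = [str(df[name].tolist())]
    return pd.DataFrame(packed)


# Hypothetical sample rows; the real dataset has more columns and rows.
raw = pd.DataFrame({
    "SBP MM HG": [160, 144, 118],
    "AGE": [52, 63, 46],
    "CHD": [1, 1, 0],
})

prepared = pack_columns(raw)
print(prepared.shape)                  # one row, one cell per column
print(prepared.loc[0, "SBP MM HG"])
# prepared.to_excel("prepared.xlsx", index=False) would export it (needs openpyxl);
# index=False leaves out the index column.
```

Keeping each column as list text in one cell is what later lets a single Tableau value carry every training row.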

329.339 -> So before importing this data into Tableau, make sure that you delete the first

335.399 -> column. Now we need to establish a connection between Python and Tableau, so for

342.059 -> that you need to install the TabPy server beforehand. I have already installed it. So I'm copying

347.369 -> the location and changing the directory in the command prompt. OK, now I will run

354.329 -> the batch file, that is startup.bat. OK. Now it will initialize the service

361.2 -> at port 9004. OK, now I have loaded the file in Tableau. Now we need

369.839 -> to establish the connection: go to Help, Settings and Performance, Manage External Service

374.25 -> Connection, and the server should be localhost and the port should be 9004. Test your connection.

381.179 -> So we have successfully connected with the external service at 9004.

387.619 -> Now let us understand the reason behind preparing the data. So we have loaded

393.359 -> the prepared data source. Let us load the original data source. So I have

403.64 -> loaded the original data source. OK, now, whenever we are about to build any

409.66 -> predictive model, it requires multiple values for training. But what Tableau does

416.2 -> is aggregate the data to a single value. So, let's say, if I drag and drop adiposity

425.649 -> here, the SUM aggregation is applied. And if I pass this

431.529 -> value for training the model as my input, one value will be passed, that is eleven

436.209 -> thousand seven hundred thirty eight. So using one value, we can't train a model.
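
To see why a single aggregated value is not enough, here is a small pandas analogy. It is not Tableau itself, and the numbers are made up: summing a column collapses many rows into one number, which gives a model nothing to learn from.

```python
import pandas as pd

# Made-up adiposity values standing in for the real column.
adiposity = pd.Series([23.1, 28.6, 32.3, 38.0])

total = adiposity.sum()          # what a SUM aggregation hands over: one number
values = adiposity.tolist()      # what training actually needs: every row

print(total)
print(len(values))
```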

443.17 -> And suppose I pass a string and it needs to be aggregated; it would be

449.2 -> aggregated with the ATTR function. So * is displayed, and what * indicates is

453.959 -> null. So we can't create a model using either a single value or a null. So we require

462.22 -> multiple values. OK, now in our prepared data, let's say I drag and drop

469.48 -> adiposity here, and the data is visible; even if I aggregate it, let's say with ATTR, the

477.07 -> data will be visible as it is. So these multiple values are required

482.17 -> for our predictive model. So that is the reason why I have prepared the data.

489.239 -> Let us create the user inputs, and those will be in the form of parameters. So the very

494.38 -> first parameter is adiposity, and in the same way I will create one for every column excluding

499.64 -> CHD. Now, I have created all the input parameters. Now let us create the predictive

504.97 -> model, and the model I will be using is logistic regression. Now,

511.299 -> whenever you're writing Python code, it is to be encapsulated in a SCRIPT function,

516.969 -> and the Python code is written inside quotes, " code ". So before that we need

521.65 -> to pass some inputs. So those are adiposity, age, and in the same way

531.38 -> we will be passing all the columns along with the user input parameters. So guys,

539.919 -> of all the inputs, the first three rows will be required for training the model

546.76 -> and creating the DataFrame, and these 3 rows will be required for the manual user

553.15 -> input, through which users can check the probability of having heart disease or not.

559.869 -> Now, we need to write the Python code. So it has to start by importing libraries.

565.119 -> So these are the libraries that are required. The 1st and 2nd are pandas and NumPy, which

570.07 -> are used to handle the data. The third one is LogisticRegression. Using this,

575.89 -> we will build our predictive model. Next is StandardScaler. This is just to

583.359 -> bring every value of the independent variables onto the same scale. literal_eval is used

591.039 -> to convert the string into a list. So we have imported all the libraries. Now we need

598.39 -> to store the data inside a DataFrame. So before that I'm storing it inside a dictionary.

604 -> So a dictionary stores values in the form of key and value. So I need to store

609.7 -> adiposity, so for that I'm using AD : and I need to pass this value. OK, this value

616.059 -> is argument one, so _arg1, and the index needs to be specified as

621.789 -> zero. OK, so once the value is here, it is in the form of a string. So you need

629.5 -> to convert that string into a list. To convert it, we need to make use of literal_eval;

635.77 -> literal_eval will convert that string to a list. Now the next is age. So for that,

642.299 -> ag : copy and paste, and replace the argument. In the same way I will fill in

652.599 -> the rest. We have created the dictionary. Now let us save it into a DataFrame.
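
As a rough sketch of the dictionary step, outside Tableau: TabPy hands each input to the script as a list (`_arg1`, `_arg2`, ...), so index zero picks the single cell, and `literal_eval` turns its text back into a list. The `_arg` values below are hand-written stand-ins rather than real Tableau input, and the key `chd` is an assumed name.

```python
from ast import literal_eval

import pandas as pd

# Hand-written stand-ins for what Tableau would pass in; each is a list with one string cell.
_arg1 = ["[23.1, 28.6, 32.3, 38.0]"]   # adiposity
_arg2 = ["[52, 63, 46, 58]"]           # age
_arg3 = ["[1, 1, 0, 1]"]               # CHD target

data = {
    "AD": literal_eval(_arg1[0]),      # string -> Python list
    "ag": literal_eval(_arg2[0]),
    "chd": literal_eval(_arg3[0]),
}
df = pd.DataFrame(data)                # one row per original observation
print(df)
```

`literal_eval` only parses Python literals, so it is a safer choice here than `eval` for text coming from a spreadsheet cell.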

666.159 -> So we have successfully saved the values. Now we need to separate our independent

672.309 -> variables and dependent variable. These are the independent variables, and the one which is

678.01 -> to be predicted is our dependent variable. So I will separate them using

683.32 -> a slicing technique. All the independent variables are to be stored inside X, so X

689.679 -> equals df.iloc, and I will use the slicing technique. So I need

694.419 -> every row. So for that write colon, comma, and the index starts from zero.

700.08 -> So let me count here: zero, one, two, three, four, five, six, seven, eight.

708.09 -> I need all the values before index eight. So let's write colon,

713.789 -> 8. So I have fetched all the values up to there. Now I need the values at index eight,

722.19 -> so for that write df.iloc, colon, comma, 8. Now, we need to transform

730.53 -> these values: every single value is to be

735.57 -> brought onto the same scale, so for that we need to make use of StandardScaler.

740.039 -> So sc = StandardScaler. OK, now X_t is my new variable, where we will have

750.27 -> our scaled values, so sc.fit_transform. OK, transform my X variable.
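
The slicing and scaling steps can be tried on their own with a small made-up frame. This is only a sketch: it uses two feature columns instead of the eight in the video, so the slice positions are 2 rather than 8, and the column names and values are assumptions.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Made-up frame with two feature columns and the target last;
# the video's frame has eight features, hence its ":8" and "8".
df = pd.DataFrame({
    "AD": [23.1, 28.6, 32.3, 38.0],
    "ag": [52, 63, 46, 58],
    "chd": [1, 1, 0, 1],
})

X = df.iloc[:, :2]          # every row, columns before index 2 (the features)
y = df.iloc[:, 2]           # every row, the column at index 2 (the target)

sc = StandardScaler()
X_t = sc.fit_transform(X)   # each column now has mean 0 and unit variance

print(X_t.mean(axis=0).round(6))
print(X_t.std(axis=0).round(6))
```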

762.059 -> Now, we will build a logistic regression model. So lr = LogisticRegression.

769.71 -> OK, now train the model. So in order to train, use the fit function with X_t and

778.53 -> the value we need to predict, y. OK, now let's check what its accuracy is.

785.58 -> So return the accuracy, that is lr.score of X_t, y.
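
A minimal sketch of the training and scoring step, on made-up data rather than the heart disease dataset. Note that calling `score` on the same rows used for `fit` gives training accuracy, not accuracy on unseen data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Made-up data with one feature: low values are class 0, high values are class 1.
X = np.array([[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]])
y = np.array([0, 0, 0, 1, 1, 1])

X_t = StandardScaler().fit_transform(X)

lr = LogisticRegression()
lr.fit(X_t, y)                       # train on the scaled features

accuracy = lr.score(X_t, y)          # fraction of training rows predicted correctly
result = str(accuracy)               # returned as text, as in the video
print(result)
```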

793.08 -> Right. And it is returned in the form of a string. OK, drag and drop it. So our model is 75

805.65 -> percent accurate. Now let's get the result on the basis of user input

813.09 -> and check what would be the probability of having heart disease.

818.34 -> So for that just edit it. OK, so let us make some changes now. We need to make use

829.979 -> of these parameters, so store these parameters in a list, that is input_list,

841.14 -> and store these values. OK, so we stopped at argument nine. These will be 10, 11, 12,

848.729 -> 13 and so on. So this is argument 10 at index zero, and in the same way

856.38 -> I will store the rest of the data. But it has to be remembered that it should be

861.03 -> in the same sequence. OK, in the same sequence as our data was stored

867.75 -> in the DataFrame. OK, so the sequence must not change. So I have stored all the values

875.309 -> of the parameters inside a list. Now we have created this input list. Now,

881.039 -> whenever we are passing a list consisting of only a single row of values, it is to be

885.9 -> reshaped. So we need to reshape it, using a variable inp: np.array from NumPy,

893.679 -> store the input list and reshape it to (1, -1). Now, we need to transform

907.469 -> this input data, so transform it using the standard scaler, .transform,

919.44 -> and pass in inp. Now store the result of the prediction: lr.predict, predict

930.28 -> using inp, so the result of the prediction will be stored inside it. Now let us store

936.77 -> the probability, so for that lr.predict_proba; this will fetch the probability,

944.619 -> and the probability of inp is to be determined. Now return all the values, so in the return

952.25 -> I need to return the prediction and I even need to return the probability. So just

966.619 -> have a look: on the basis of all these inputs we are having, where all the values

974.929 -> are 1, basically, we check on the basis of these values whether a person can have heart disease,

981.169 -> and the probability of not having heart disease is 0.99 and of having the disease is

985.849 -> 0.01. Now, I want to show these 3 values separately, so I will be making 3

993.049 -> calculated fields. So the first is to fetch absent, so edit and just replace this

1002.619 -> and write zero. OK, so this fetches absent. Duplicate and edit; here, you have to fetch

1017.38 -> the probability, which is inside prob, so just replace the text. So we have a list inside another

1027.069 -> list, so we need to fetch the first value. OK, I will specify zero and zero,

1040.219 -> so in this way, I will fetch the first value. Now duplicate it. Here,

1052.14 -> specify 1, and just drag and drop. So in this way, you can separate these three

1062.219 -> values. The first is absent, the second is the absent probability.

1068.01 -> And the third is the yes probability, that is, having the disease. You can rename

1073.43 -> the fields as per convenience. Now we need to convert their data type to numeric and create a

1080.43 -> donut chart out of it.
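
The prediction steps and the three values pulled out afterwards can be sketched in plain Python as follows. The training data and the user input here are made up and use two features instead of eight. The point is the shape of each result: `predict` gives one label, and `predict_proba` gives a list inside a list, which is why two indices are needed.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Made-up training data with two features (the video uses eight).
X = np.array([[20.0, 30], [22.0, 35], [25.0, 40], [33.0, 55], [36.0, 60], [38.0, 64]])
y = np.array([0, 0, 0, 1, 1, 1])

sc = StandardScaler()
lr = LogisticRegression().fit(sc.fit_transform(X), y)

# User input in the same column order as the training data.
input_list = [24.0, 40]
inp = np.array(input_list).reshape(1, -1)   # one row, as many columns as needed
inp = sc.transform(inp)                     # reuse the scaler fitted on the training data

pred = lr.predict(inp)                      # array holding one class label
prob = lr.predict_proba(inp)                # shape (1, 2): [[P(absent), P(present)]]

prediction = int(pred[0])
p_absent = float(prob[0][0])                # list inside a list, hence two indices
p_present = float(prob[0][1])
print(prediction, p_absent, p_present)
```

The input is scaled with `transform`, not `fit_transform`, so that it is measured against the same mean and spread as the training rows.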

1139.533 -> Now, let us work on the user input data for the parameters. So there are three:

1143.88 -> adiposity, LDL and SBP, and using that, I will be creating a file which the user

1149.939 -> can refer to when inserting values. So, guys, I have created the user input details

1156.089 -> for adiposity, LDL and SBP. Now we will be creating the other input details

1162.949 -> on the dashboard itself. Now I have changed the size of the dashboard to

1168.54 -> width 1320 and height 720. Let us create the title and border.

1175.92 -> So the title and border are ready. Now let us create the input details. So guys, I have created

1181.05 -> the user input data for adiposity, systolic blood pressure and LDL, and these

1187.709 -> are the text for the other columns. Let us insert sheet one, where we have

1193.619 -> created the donut chart. So I have inserted the sheet with the donut chart. Let us try

1200.579 -> to find the probability of having the disease by inserting values into the parameters.

1206.819 -> The value of adiposity is 24, LDL = 2.5, age = 40, alcohol consumption = 50,

1217.68 -> family history = 1, SBP = 140, tobacco = 10,

1224.4 -> typea = 50.

1230.43 -> So a person with these values as input has a chance of having the

1237.13 -> disease of 39% and of not having the disease of 61%. So that's how you could build a predictive

1242.609 -> model, but there are some limitations in it. OK, but it was worth giving a try.

1249.119 -> So I have tried it and now I wanted to share it with you. So I hope that you guys

1254.28 -> try it too. OK, thank you guys.

Source: https://www.youtube.com/watch?v=R__EeIePba8