Heart Failure Clinical Data Analysis

Date	16.12.2020

466 project

Platelets

Creatinine phosphokinase: the level of the creatinine phosphokinase (CPK) enzyme in the blood.

Diabetes: indicates whether the patient has diabetes.

1 = Yes

0 = No

Ejection fraction: the percentage of blood ejected from the heart with each contraction.

High blood pressure: indicates whether the patient has high blood pressure (hypertension).

1 = Yes

0 = No

Platelets: the platelet count in the blood.

Serum creatinine: the level of creatinine, a waste product of muscle metabolism, in the blood.

Serum sodium: the level of sodium in the blood.

Time: the follow-up period for the patient, in days.

Fig. 1 shows a summary of each attribute, including its minimum, maximum, mean, and median.

Figure 1: Summary of Each Attribute

Fig. 2 shows the head of our dataset: five sample records with the attributes described above.

Figure 2: Sample Records
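Output like that in Figs. 1 and 2 is what base R's summary() and head() produce. The following is a minimal sketch, assuming the data are held in a data frame named df as in the code later in this section. The three-column toy data frame and its values are invented for illustration and are not the real records.

```r
# Toy stand-in for the real dataset; all values are invented for illustration.
df <- data.frame(
  age = c(40, 50, 60, 70, 80, 90),
  serum_sodium = c(130, 134, 136, 137, 140, 145),
  DEATH_EVENT = c(0, 0, 0, 1, 0, 1)
)

# Minimum, quartiles, median, mean, and maximum of every column.
attribute_summary <- summary(df)

# head() returns six rows by default, so ask for five explicitly.
sample_records <- head(df, 5)

print(attribute_summary)
print(sample_records)
```

Passing the row count to head() matters here because the default of six rows would not match the five records described in the text.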

The following figures show the data distribution of each attribute in our dataset. The code below was used to plot each distribution; only the attribute name and the color were changed from one plot to the next.

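# Print the current margins, set them to 4 lines on every side, then confirm.
par("mar")
par(mar=c(4,4,4,4))
par("mar")

# type="h" draws one high-density vertical line per record.
# The x axis is the record index and the y axis is the attribute value.
plot(df$Age, type="h", col="blue", xlab="Record index", ylab="Age", main="Data Distribution of Age")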

Figure 3: Age Distribution

Figure 4: Anaemia Distribution

Figure 5: Creatinine Phosphokinase Distribution

Figure 6: Diabetes Distribution

Figure 7: Ejection Fraction Distribution

Figure 8: Blood Pressure Distribution

Figure 9: Platelets Distribution

Figure 10: Sex Distribution

Figure 11: Serum Creatinine Distribution

Figure 12: Serum Sodium Distribution

Figure 13: Smoking Distribution

Figure 14: Time Distribution
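Rather than editing the attribute name and color by hand for each figure, the same plot can be produced in a loop. This is a sketch under assumptions, not the code used for the figures above: plot_distributions is a hypothetical helper, and the toy data frame holds invented values.

```r
# Toy stand-in for the real dataset; all values are invented for illustration.
df <- data.frame(
  age = c(40, 50, 60, 70, 80, 90),
  platelets = c(150000, 210000, 265000, 300000, 180000, 240000),
  smoking = c(0, 1, 0, 0, 1, 0)
)

# Hypothetical helper: draws one high-density line plot per column and
# returns the names of the columns it plotted.
plot_distributions <- function(data, colours) {
  for (i in seq_along(data)) {
    name <- names(data)[i]
    plot(data[[name]], type = "h", col = colours[i],
         xlab = "Record index", ylab = name,
         main = paste("Data Distribution of", name))
  }
  invisible(names(data))
}

pdf(NULL)  # discard graphics output; remove this line to see the plots
par(mar = c(4, 4, 4, 4))
plotted <- plot_distributions(df, c("blue", "red", "darkgreen"))
invisible(dev.off())
```

Taking the axis label and title from the column name keeps each plot's labels in step with the attribute being drawn.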

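# Set the plot margins to 4 lines on every side.
par(mar=c(4,4,4,4))

# Count the records in each DEATH_EVENT class and draw the counts as bars.
# barplot() is base R graphics, so ggplot2 does not need to be loaded.
Numbers <- table(df$DEATH_EVENT)
barplot(Numbers, main='Class Distribution', col=c('red','orange'), legend.text=rownames(Numbers), ylab='count')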

Figure 15: Unbalanced Class Distribution
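The imbalance shown in the bar chart can also be expressed in numbers. This is a minimal sketch using an invented label vector in place of df$DEATH_EVENT, so the counts below are illustrative and are not the dataset's real class counts.

```r
# Invented class labels standing in for df$DEATH_EVENT.
death_event <- c(0, 0, 0, 1, 0, 0, 1, 0, 0, 1)

counts <- table(death_event)        # records per class
proportions <- prop.table(counts)   # share of each class

# Size of the majority class relative to the minority class.
imbalance_ratio <- max(counts) / min(counts)

print(counts)
print(round(proportions, 2))
print(imbalance_ratio)
```

A ratio of 1 would indicate perfectly balanced classes; the further it is above 1, the stronger the imbalance.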

The class distribution shows that the dataset is not balanced, so we need to balance it by generating synthetic samples of the minority class. In classification problems, most machine learning algorithms are sensitive to unbalanced data, which can lead to poor results. Consider an example of how unbalanced data affects a statistical model. Suppose we had 10 malignant and 90 benign samples. A model trained and validated on such a dataset could predict "benign" for every sample and still reach 90% accuracy while detecting no malignant case. An unbalanced dataset can thus bias the prediction model toward the majority class. We used SMOTE [6] to balance the class distribution. Fig. 16 shows the class distribution of the balanced dataset.
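The 10 malignant and 90 benign example can be checked directly. This is a small sketch of a classifier that always answers "benign"; the variable names are illustrative.

```r
# 10 malignant and 90 benign samples, as in the example above.
class_levels <- c("benign", "malignant")
actual <- factor(c(rep("malignant", 10), rep("benign", 90)), levels = class_levels)

# A degenerate model that predicts the majority class for every sample.
predicted <- factor(rep("benign", 100), levels = class_levels)

# Share of all samples that were labelled correctly.
accuracy <- mean(predicted == actual)

# Share of the malignant samples that the model actually found.
recall_malignant <- sum(predicted == "malignant" & actual == "malignant") /
  sum(actual == "malignant")

confusion <- table(Predicted = predicted, Actual = actual)

print(accuracy)
print(recall_malignant)
print(confusion)
```

The accuracy is 0.9 although the recall for the malignant class is 0, which is why accuracy alone is a misleading measure on unbalanced data.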

Figure 16: Balanced Class Distribution
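The core idea of SMOTE is to create each synthetic minority sample by interpolating between a minority record and one of its nearest minority neighbours. The source does not say which SMOTE implementation was used, so the following is a simplified base R sketch of that idea and not the code behind Fig. 16. The helper smote_sketch is hypothetical, it handles numeric features only, and the toy data are invented.

```r
# Hypothetical helper: simplified SMOTE for numeric features only.
# Requires more than k minority records.
smote_sketch <- function(minority, n_new, k = 2) {
  minority <- as.matrix(minority)
  distances <- as.matrix(dist(minority))   # pairwise Euclidean distances
  synthetic <- matrix(NA_real_, nrow = n_new, ncol = ncol(minority),
                      dimnames = list(NULL, colnames(minority)))
  for (i in seq_len(n_new)) {
    base <- sample(nrow(minority), 1)
    # The k nearest minority neighbours of the base record, excluding itself.
    neighbours <- setdiff(order(distances[base, ]), base)[1:k]
    neighbour <- neighbours[sample(length(neighbours), 1)]
    gap <- runif(1)   # random position on the segment between the two records
    synthetic[i, ] <- minority[base, ] +
      gap * (minority[neighbour, ] - minority[base, ])
  }
  synthetic
}

set.seed(1)

# Toy stand-in for the real dataset; all values are invented for illustration.
toy <- data.frame(
  age = c(45, 50, 52, 60, 63, 65, 70, 72, 80, 85),
  serum_sodium = c(140, 138, 137, 136, 139, 135, 130, 128, 132, 127),
  DEATH_EVENT = c(0, 0, 0, 0, 0, 0, 0, 1, 1, 1)
)

features <- c("age", "serum_sodium")
minority <- toy[toy$DEATH_EVENT == 1, features]

# Generate enough synthetic records to match the majority class size.
n_new <- sum(toy$DEATH_EVENT == 0) - nrow(minority)
synthetic <- as.data.frame(smote_sketch(minority, n_new))
synthetic$DEATH_EVENT <- 1

balanced <- rbind(toy, synthetic)
print(table(balanced$DEATH_EVENT))
```

Because every synthetic record lies on a line segment between two existing minority records, its feature values stay within the range already observed for that class. Binary attributes such as sex or smoking would need separate handling, which this sketch leaves out.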
