Heart failure clinical Data Analysis



Date: 16.12.2020
466 project
Creatinine phosphokinase: the level of the CPK enzyme in the blood.

Diabetes: whether the patient has diabetes.

1 = Yes

0 = No

Ejection fraction: the percentage of blood ejected from the heart at each contraction.

High blood pressure: whether the patient has high blood pressure.

1 = Yes

0 = No

Platelets: the platelet count in the blood.

Serum creatinine: the level of creatinine, a waste product of muscle metabolism, in the blood.

Serum sodium: the level of sodium in the blood.

Time: the follow-up period for the patient, in days.
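The 1 = Yes / 0 = No coding used for the binary attributes can be made readable in R by converting those columns to labelled factors. This is a minimal sketch: the column names diabetes and high_blood_pressure and the four toy rows are assumptions for illustration, not values from the dataset.

```r
# Toy values (made up) using the 1 = Yes / 0 = No coding described above.
# Column names are assumed for illustration.
df <- data.frame(
  diabetes = c(0, 1, 1, 0),
  high_blood_pressure = c(1, 0, 1, 0)
)

# Convert each 0/1 column to a factor with readable labels.
binary_cols <- c("diabetes", "high_blood_pressure")
df[binary_cols] <- lapply(df[binary_cols], factor,
                          levels = c(0, 1), labels = c("No", "Yes"))

print(table(df$diabetes))
```

Labelled factors also make plot legends and frequency tables easier to read than raw 0/1 codes.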

Fig 1 shows a summary of each attribute, including its minimum, maximum, mean, and median.



Figure 1: Summary of Each Attribute

Fig 2 shows the head of our dataset: five sample records with the attributes described above.



Figure 2: Sample Records
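Outputs like those in Fig 1 and Fig 2 are what the base R functions summary() and head() return for a data frame. The sketch below assumes the data frame is named df, as in the plotting code in this section; the rows and the three-column subset are made-up illustration values, not records from the dataset.

```r
# Toy stand-in for the dataset: made-up values, hypothetical column subset.
df <- data.frame(
  Age = c(45, 50, 55, 60, 65, 70, 75),
  platelets = c(210000, 250000, 263000, 300000, 180000, 220000, 270000),
  DEATH_EVENT = c(0, 0, 1, 0, 1, 1, 0)
)

# Per-attribute minimum, quartiles, median, mean and maximum (as in Fig 1).
attr_summary <- summary(df)
print(attr_summary)

# First five records (as in Fig 2); head() returns six rows unless n is given.
sample_records <- head(df, n = 5)
print(sample_records)
```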

The following figures show the data distribution of each attribute in our dataset. The code below was used to plot each distribution, changing only the attribute name and color for each plot.

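# Inspect the current plot margins, set them to 4 lines on every side, and confirm.
par("mar")
par(mar = c(4, 4, 4, 4))
par("mar")

# type = "h" draws one high-density vertical line per record.
plot(df$Age, type = "h", col = "blue", xlab = "Record index", ylab = "Age", main = "Data Distribution of Age")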



Figure 3: Age Distribution



Figure 4: Anaemia Distribution



Figure 5: Creatinine Phosphokinase Distribution



Figure 6: Diabetes Distribution



Figure 7: Ejection Fraction Distribution



Figure 8: Blood Pressure Distribution



Figure 9: Platelets Distribution



Figure 10: Sex Distribution



Figure 11: Serum Creatinine Distribution



Figure 12: Serum Sodium Distribution



Figure 13: Smoking Distribution



Figure 14: Time Distribution
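Rather than editing the attribute name and color by hand for each figure, the same plot() call could be wrapped in a loop over the columns. This is a minimal sketch, assuming a data frame named df; the toy values, the plot_colors vector and the use of pdf(NULL) as a null graphics device are illustrative assumptions, not part of the original script.

```r
# Toy stand-in for the dataset (made-up values, hypothetical column subset).
df <- data.frame(
  Age = c(45, 50, 55, 60, 65),
  smoking = c(0, 1, 0, 0, 1),
  time = c(10, 40, 80, 120, 200)
)
plot_colors <- c("blue", "red", "darkgreen")  # one assumed color per attribute

pdf(NULL)  # null device, so the sketch also runs without a display
par(mar = c(4, 4, 4, 4))

plotted <- character(0)
for (i in seq_along(df)) {
  attr_name <- names(df)[i]
  # type = "h" draws one vertical line per record.
  plot(df[[attr_name]], type = "h", col = plot_colors[i],
       xlab = "Record index", ylab = attr_name,
       main = paste("Data Distribution of", attr_name))
  plotted <- c(plotted, attr_name)
}
invisible(dev.off())
```

In an interactive session the pdf(NULL) and dev.off() lines can be dropped so the plots appear in the usual graphics window.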

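par("mar")
par(mar = c(4, 4, 4, 4))
par("mar")

library(ggplot2)  # loaded in the script; barplot() below is base R graphics

# Count the records in each class of DEATH_EVENT and plot the counts.
Numbers <- table(df$DEATH_EVENT)
barplot(Numbers, main = "Class Distribution",
        col = c("red", "orange"), legend.text = rownames(Numbers),
        ylab = "count")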



Figure 15: Unbalanced Class Distribution
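The imbalance visible in the bar plot can also be expressed as numbers with table() and prop.table(). This is a small sketch on a made-up outcome vector; the values and the assumed coding (1 = death event, 0 = no death event) are for illustration only and are not the counts from the dataset.

```r
# Toy outcome column with made-up values (assumed coding: 1 = death event).
df <- data.frame(DEATH_EVENT = c(0, 0, 0, 0, 0, 0, 0, 1, 1, 1))

class_counts <- table(df$DEATH_EVENT)
class_share <- prop.table(class_counts)  # fraction of records per class

# Imbalance ratio: majority class size divided by minority class size.
imbalance_ratio <- max(class_counts) / min(class_counts)

print(class_counts)
print(round(class_share, 2))
cat("Imbalance ratio:", imbalance_ratio, "\n")
```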

The distribution of class instances shows that the dataset is not balanced, so we need to balance it by generating synthetic samples of the minority class. In classification problems, most machine learning algorithms are sensitive to unbalanced data, which can lead to poor results. Consider an example of how unbalanced data affects a model. Suppose we had 10 malignant and 90 benign samples. A model trained and validated on such a dataset could predict "benign" for every sample and still reach 90% accuracy while detecting none of the malignant cases. In other words, an unbalanced dataset can bias the prediction model toward the majority class. We used SMOTE [6] to balance the class distribution. Fig 16 shows the class distribution of the balanced dataset.
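The example with 10 malignant and 90 benign samples can be checked numerically. The sketch below builds those labels, applies a degenerate "model" that always predicts the majority class, and compares overall accuracy with recall on the malignant class; all variable names are chosen for illustration.

```r
# 10 malignant and 90 benign samples, as in the example above.
actual <- factor(c(rep("malignant", 10), rep("benign", 90)),
                 levels = c("benign", "malignant"))

# A degenerate "model" that always predicts the majority class.
predicted <- factor(rep("benign", 100), levels = levels(actual))

# Overall accuracy: share of samples whose prediction matches the label.
accuracy <- mean(predicted == actual)

# Recall on the minority class: share of malignant samples that were detected.
malignant_recall <- mean(predicted[actual == "malignant"] == "malignant")

print(table(Predicted = predicted, Actual = actual))
cat("Accuracy:", accuracy, "\n")                  # 0.9
cat("Malignant recall:", malignant_recall, "\n")  # 0
```

The high accuracy comes entirely from the benign class, which is why accuracy alone is a poor measure on unbalanced data.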





Figure 16: Balanced Class Distribution
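SMOTE creates new minority-class rows by interpolating between a minority sample and one of its nearest minority-class neighbours. The following is a minimal base R sketch of that idea, not the implementation used in this project: the function name smote_sketch, the toy data frame, the two feature columns and the choice k = 2 are all assumptions for illustration.

```r
# Minimal SMOTE-style oversampling in base R (illustrative sketch only).
smote_sketch <- function(minority, n_new, k = 5) {
  minority <- as.matrix(minority)
  n <- nrow(minority)
  stopifnot(n >= 2)                    # need at least one neighbour
  k <- min(k, n - 1)                   # at most n - 1 other points exist
  dists <- as.matrix(dist(minority))   # pairwise Euclidean distances
  synthetic <- matrix(NA_real_, nrow = n_new, ncol = ncol(minority),
                      dimnames = list(NULL, colnames(minority)))
  for (i in seq_len(n_new)) {
    base_idx <- sample.int(n, 1)       # pick a random minority sample
    # Its k nearest minority neighbours, excluding the sample itself.
    neighbours <- order(dists[base_idx, ])
    neighbours <- neighbours[neighbours != base_idx][seq_len(k)]
    nb_idx <- neighbours[sample.int(k, 1)]
    gap <- runif(1)                    # interpolation weight in [0, 1]
    synthetic[i, ] <- minority[base_idx, ] +
      gap * (minority[nb_idx, ] - minority[base_idx, ])
  }
  synthetic
}

set.seed(42)
# Toy data with made-up values: 8 majority rows (0) and 3 minority rows (1).
toy <- data.frame(
  Age = c(50, 55, 60, 62, 65, 58, 70, 53, 72, 80, 75),
  ejection_fraction = c(40, 38, 35, 45, 50, 60, 38, 42, 20, 25, 30),
  DEATH_EVENT = c(rep(0, 8), rep(1, 3))
)
features <- c("Age", "ejection_fraction")
minority <- toy[toy$DEATH_EVENT == 1, features]
n_new <- sum(toy$DEATH_EVENT == 0) - nrow(minority)  # rows needed to balance

new_rows <- as.data.frame(smote_sketch(minority, n_new, k = 2))
new_rows$DEATH_EVENT <- 1
balanced <- rbind(toy, new_rows)
print(table(balanced$DEATH_EVENT))
```

Because each synthetic row lies on the line segment between two real minority rows, its feature values stay within the range of the minority class. In practice, oversampling is usually applied to the training split only, so that synthetic rows do not leak into the evaluation data.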
