March 13, 2021

Fetal Health Classification
Classify fetal health to prevent child and maternal mortality

Twitter image. Credit: Brett Jordan

1 Abstract

The goal of this study is to classify fetal health into 3 categories (Normal, Suspicious and Pathological) in order to prevent child and maternal mortality. Medical cardiotocographic data (from Kaggle) were tested on 3 models (KNN, LinearSVC, C-SVC); the most accurate model for predicting how critical a patient's condition is turned out to be KNN. The PCA and the correlation analysis show that all CTG variables are needed to achieve a good prediction.

2 Motivation

In order to achieve the United Nations Sustainable Development Goals, all countries aim to reduce under‑5 mortality to at least as low as 25 per 1,000 live births. Cardiotocograms (CTGs) are simple and affordable medical devices for assessing fetal health. Using ultrasound, a CTG measures fetal heart rate, fetal movements, uterine contractions… (11 measurements). A good indicator of fetal health is valuable for healthcare professionals.

3 Dataset

The data set is a CSV file downloaded from the Kaggle website. It comprises 2126 samples and 22 variables. The data come from Ayres de Campos et al. (2000) SisPorto 2.0 A Program for Automated Analysis of Cardiotocograms. J Matern Fetal Med 5:311-318.

4 Data preparation and Cleaning

The most important preparation step is to check that the data contain no patient identity at all. With the help of a midwife, we selected the variables available in most CTGs. The study retains 2016 samples and 11 variables.

Variables

baseline value, accelerations, fetal_movement, uterine_contractions, light_decelerations, severe_decelerations, prolongued_decelerations, Abnormal_short_term_variability, mean_short_term_variability, abnormal_long_term_variability, long_term_variability.

5 Research Questions

Main question: Is it possible to predict fetal health with CTG variables? With what precision?

Secondary question: When predicting fetal health, are some CTG variables useless? Could we simplify the CTG interface?

6 Methods

Are some CTG variables useless? a/ Correlation between variables b/ Principal Components Analysis (PCA)

Is it possible to predict fetal health? With what precision? a/ Find the best fitting model, data split b/ Support vector machines: LinearSVC c/ Nearest Neighbors Classification d/ Support vector machines: C-Support Vector Classification

Findings: Are some CTG variables more useful?

heatmap

We can't see any strong correlation between the variables (all coefficients are less than 0.5). Reducing the number of variables will be difficult.
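The correlation check described above can be sketched as follows. This is a minimal illustration, not the code used in the study: the function name `strongly_correlated_pairs`, the use of the absolute value of the coefficient, and the synthetic data are all assumptions made for the example.

```python
import numpy as np

def strongly_correlated_pairs(X, names, threshold=0.5):
    """Return (name_i, name_j, r) for every pair of columns with |r| >= threshold."""
    corr = np.corrcoef(X, rowvar=False)  # columns are variables, rows are samples
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) >= threshold:
                pairs.append((names[i], names[j], float(corr[i, j])))
    return pairs

# Synthetic stand-in data (NOT the CTG measurements):
# c is built from a, so only the pair (a, c) is strongly correlated.
rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = rng.normal(size=500)
c = a + 0.1 * rng.normal(size=500)
X_demo = np.column_stack([a, b, c])

pairs_demo = strongly_correlated_pairs(X_demo, ["a", "b", "c"])
print(pairs_demo)
```

An empty list for the real CTG variables would correspond to the observation that no pair reaches the 0.5 threshold.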

Principal component analysis of the samples

Correlation between variables

Correlation

The PCA shows that the 3 groups are mixed. Generally, the more the groups are mixed, the more variables are needed to achieve an accurate classification.

Principal component analysis of the variables

PCA projection

The PCA of the variables shows the variables spread around the circle. The vectors of most variables are long (>0.5), which means that reducing the number of variables won’t be possible without losing too much information.
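For readers who want to reproduce the vector lengths, here is one common way to compute correlation-circle coordinates: each variable's coordinate on a component is the eigenvector entry scaled by the square root of the eigenvalue of the correlation matrix. The helper name `correlation_circle_coords` and the synthetic data are assumptions; the tool the study actually used to draw the circle is not stated.

```python
import numpy as np

def correlation_circle_coords(X, n_components=2):
    """Correlation of each variable with the leading principal components."""
    corr = np.corrcoef(X, rowvar=False)            # PCA on standardized variables
    eigvals, eigvecs = np.linalg.eigh(corr)        # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    top_vals = np.clip(eigvals[order], 0.0, None)  # guard against tiny negative round-off
    return eigvecs[:, order] * np.sqrt(top_vals)

# Synthetic stand-in data (NOT the CTG measurements).
rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = rng.normal(size=500)
c = a + 0.1 * rng.normal(size=500)

coords = correlation_circle_coords(np.column_stack([a, b, c]))
lengths = np.linalg.norm(coords, axis=1)  # vector length of each variable, at most 1
print(lengths.round(3))
```

A length close to 1 means the variable is well represented in the plane of the first two components; a variable with a long vector cannot be dropped without losing the information it carries in that plane.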

Explained inertia by principal components

Explained inertia

The first component (0) is small: 0.25. So is the second: 0.15. This means that these 2 components explain only 40% of the information; the remaining components together explain 60%. To describe fetal health accurately, the 11 variables are needed.
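The explained inertia can be read from scikit-learn's `PCA.explained_variance_ratio_`; the figures above then add up as 0.25 + 0.15 = 0.40. The sketch below assumes the variables are standardized first (the study does not say whether they were) and uses synthetic data; `explained_inertia` is a hypothetical helper name.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def explained_inertia(X):
    """Share of the total variance carried by each principal component."""
    X_std = StandardScaler().fit_transform(X)  # assumption: unit variance per variable
    return PCA().fit(X_std).explained_variance_ratio_

# Synthetic stand-in data (NOT the CTG measurements): 4 variables, two of them related.
rng = np.random.default_rng(0)
base = rng.normal(size=(300, 3))
X_demo = np.column_stack([base, base[:, 0] + 0.5 * rng.normal(size=300)])

ratios = explained_inertia(X_demo)
cumulative = np.cumsum(ratios)  # cumulative[1] is the share explained by components 0 and 1
print(ratios.round(3), cumulative.round(3))
```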

Findings: Is it possible to predict fetal health?

a/ Find the best fitting model, data split

The samples were split into 2 groups: 40% as training samples and 60% as test samples.

Each model was run 10 times in order to check the stability of the results.
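The split and the repeated runs could look like the sketch below. It assumes that each run uses a different random split (the study does not say how the 10 runs differ) and it uses synthetic data from `make_classification`; `repeated_accuracy` is a hypothetical helper, not a function from the original code.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

def repeated_accuracy(model, X, y, n_runs=10, train_size=0.4):
    """Fit and score the model on n_runs different 40/60 train/test splits."""
    scores = []
    for seed in range(n_runs):
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, train_size=train_size, random_state=seed
        )
        model.fit(X_train, y_train)                    # refit from scratch on each split
        scores.append(model.score(X_test, y_test))     # accuracy on the held-out 60%
    return np.array(scores)

# Synthetic 3-class data with 11 features (NOT the CTG measurements).
X, y = make_classification(
    n_samples=600, n_features=11, n_informative=6, n_classes=3, random_state=0
)
scores = repeated_accuracy(KNeighborsClassifier(n_neighbors=3), X, y)
print(scores.mean(), scores.std())
```

A small standard deviation across runs is what "stable" would mean in practice.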

Support vector machines (SVMs) LinearSVC

LinearSVC with 10000 iterations => accuracy score: 0.786

Nearest Neighbors Classification

KNeighborsClassifier(n_neighbors=3) => accuracy score: 0.874

Support vector machines: C-Support Vector Classification

SVC() => accuracy score: 0.813

KNN is the most accurate of the 3 models.
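The comparison of the three classifiers can be sketched as below. The estimator settings follow the text (`n_neighbors=3`, default `SVC()`); reading "10000 iterations" as `max_iter=10000` is an assumption, as are the synthetic data and the single fixed split, so the scores printed here are not the ones reported above.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC, LinearSVC

# Synthetic 3-class data with 11 features (NOT the CTG measurements).
X, y = make_classification(
    n_samples=600, n_features=11, n_informative=6, n_classes=3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.4, random_state=0
)

models = {
    "LinearSVC": LinearSVC(max_iter=10000, random_state=0),  # max_iter is an assumed reading
    "KNN": KNeighborsClassifier(n_neighbors=3),
    "SVC": SVC(),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    results[name] = model.score(X_test, y_test)  # accuracy on the test samples

best = max(results, key=results.get)
print(results, "best:", best)
```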

Findings: Test of the selected model with midwives usual values

This is a real-life test of the model using the variables recorded by midwives in the patient report: baseline value, accelerations, uterine_contractions, light_decelerations, severe_decelerations, prolongued_decelerations, long_term_variability.

KNeighborsClassifier(n_neighbors=3) => accuracy score: 0.756

Because some values are missing, the model accuracy is lower: 0.756 < 0.874
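A reduced-variable test of this kind can be sketched by training the same KNN on a subset of columns. The variable names are the ones listed in this article; the helper `knn_accuracy`, the column order, and the synthetic data are assumptions, so the printed scores only illustrate the mechanics.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

ALL_VARIABLES = [
    "baseline value", "accelerations", "fetal_movement", "uterine_contractions",
    "light_decelerations", "severe_decelerations", "prolongued_decelerations",
    "Abnormal_short_term_variability", "mean_short_term_variability",
    "abnormal_long_term_variability", "long_term_variability",
]
MIDWIFE_VARIABLES = [
    "baseline value", "accelerations", "uterine_contractions", "light_decelerations",
    "severe_decelerations", "prolongued_decelerations", "long_term_variability",
]

def knn_accuracy(X_train, X_test, y_train, y_test, columns):
    """Accuracy of a 3-nearest-neighbours model restricted to the given columns."""
    model = KNeighborsClassifier(n_neighbors=3)
    model.fit(X_train[:, columns], y_train)
    return model.score(X_test[:, columns], y_test)

all_idx = list(range(len(ALL_VARIABLES)))
subset_idx = [ALL_VARIABLES.index(name) for name in MIDWIFE_VARIABLES]

# Synthetic data with one column per variable (NOT the CTG measurements).
X, y = make_classification(
    n_samples=600, n_features=len(ALL_VARIABLES), n_informative=6,
    n_classes=3, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.4, random_state=0
)

acc_all = knn_accuracy(X_train, X_test, y_train, y_test, all_idx)
acc_subset = knn_accuracy(X_train, X_test, y_train, y_test, subset_idx)
print(acc_all, acc_subset)
```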

Limitations

Because the number of samples is limited, the model is fragile. With a larger sample, the model would be more accurate and trustworthy. Other important variables are missing, such as gestational age and pregnancy pathologies.

Conclusions

PCA and the correlation between variables show that all 11 variables are needed for an accurate diagnosis. In the clinic, midwives do not use all of those variables to estimate patient risk.

The model used to predict fetal health achieves an accuracy of more than 85%. The prediction seems good, but only as a suggested value, not as a medical diagnosis.

Acknowledgements

Data set from Ayres de Campos et al. (2000) SisPorto 2.0 A Program for Automated Analysis of Cardiotocograms. J Matern Fetal Med 5:311-318.

Contribution of a midwife from a private clinic center.

References

UNICEF: Child survival and the SDGs (2020).

Ayres de Campos et al. (2000) SisPorto 2.0: A Program for Automated Analysis of Cardiotocograms. J Matern Fetal Med 5:311-318.

Code

Feel free to use it.

By Benoit Pont