Segmentation of brain regions affected by ischemic stroke helps to tackle important challenges of modern stroke imaging analysis. Unfortunately, current machine learning models for this problem are far from ideal. In this paper, we consider a modified 3D UNet architecture to improve the quality of stroke segmentation on 3D computed tomography images. We use the open ISLES 2018 (Ischemic Stroke Lesion Segmentation Challenge 2018) dataset to train and test the proposed model. The paper includes an interpretation of the obtained results as well as ideas for further experiments. We evaluate the model using the Dice coefficient (f1 score) and the Jaccard index. By selecting appropriate hyperparameters, our architecture can be readily extended to ischemia segmentation and computed tomography image identification. On the test set, our model achieved a Dice/f1 score of 58%, higher than the standard 3D UNet model, with results close to the ground truth, demonstrating that it can segment ischemic stroke accurately. The proposed modified 3D UNet uses an efficient averaging method inside the neural network. Because the ISLES dataset is small, combining data augmentation with neural network regularization methods to prevent overfitting gave the best result. Another advantage is the use of the Intersection over Union loss function, which measures how well the shapes of the predicted and ground-truth regions coincide.

Ischemic stroke is a disturbance of the cerebral circulation that damages brain tissue and impairs its functions because blood flow to a particular region is impeded or stops [

In acute ischemic stroke, computed tomography (CT) and magnetic resonance imaging (MRI) are the most effective methods of early imaging [

As mentioned earlier, to minimize the consequences of a stroke, it is necessary to make a diagnosis as soon as possible. The most effective tool for visualizing the brain in the first hours after a stroke is computed tomography [

In this study, we propose a modified UNet neural network architecture for brain stroke segmentation based on CT images.

U-Net has established itself as one of the leading architectures for medical image segmentation. However, the classical model alone does not produce sufficiently good results on this segmentation problem. Our strategy was therefore to extend the classical U-Net model with additional methods. The insufficient amount of training data is one of the main obstacles, which we addressed with data augmentation. We chose the Adam optimizer with a learning rate of 0.0002 because it is computationally efficient, has low memory requirements (an important issue for us), and is invariant to diagonal rescaling of the gradients. For regularization, we used dropout with a rate of 50% to prevent overfitting. In particular, to prevent co-adaptation of pixels with their neighbors across feature maps, we used Spatial Dropout, which drops entire feature maps of a convolutional layer. Furthermore, we added l2 regularization with a regularization factor of 0.001 to improve the generalization ability of the model, resulting in better segmentation. According to the IoU, Dice/f1 score, recall (sensitivity), and precision criteria, this strategy gave clearly better first results than the classical U-Net model. The strategy and the evaluation criteria will be refined in further studies as needed.
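To make the difference from ordinary dropout concrete, the sketch below drops whole feature maps (channels) instead of individual activations, using the 50% rate from the text. It is a minimal illustration in plain Python, not the authors' code: feature maps are assumed to be flat lists of activations, the function name `spatial_dropout` is hypothetical, and kept channels are rescaled by 1 / (1 - rate) ("inverted dropout"), which is an assumption about the variant used. In Keras, the corresponding layer is `tf.keras.layers.SpatialDropout3D`.

```python
import random


def spatial_dropout(feature_maps, rate, rng, training=True):
    """Zero entire feature maps (channels) with probability `rate`.

    feature_maps: list of channels, each a flat list of activations.
    Kept channels are scaled by 1 / (1 - rate) ("inverted dropout"), so no
    rescaling is needed at inference time.
    """
    if not training or rate == 0.0:
        return [list(ch) for ch in feature_maps]
    scale = 1.0 / (1.0 - rate)
    out = []
    for channel in feature_maps:
        if rng.random() < rate:
            out.append([0.0] * len(channel))          # drop the whole map
        else:
            out.append([v * scale for v in channel])  # keep and rescale
    return out


maps = [[1.0, 2.0, 3.0] for _ in range(4)]
print(spatial_dropout(maps, 0.5, random.Random(0)))
```

Because a whole channel is either kept or zeroed, neighboring voxels in the same map cannot compensate for each other, which is the co-adaptation effect the text aims to prevent.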

From 1970 to 1990, medical image analysis was based on sequential low-level pixel processing and mathematical modeling [

Image analysis has also found its place in medicine. Segmenting organs in medical images provides information about the shape, size, and area of the object under study. Every segmentation method must determine the contours (boundaries) or regions of the image under study.

Although 3D image formats are available, many studies segment CT and MRI scans with 2D convolutional neural networks, because 2D computations are faster and much simpler. However, 2D CNN models ignore 3D information, while 3D CNN models require powerful computing resources. To balance the two, an architecture called dimension-fusion-UNet (D-UNet) was proposed, which combines 2D and 3D convolution at the encoding stage [

In medical image analysis, even correctly applied machine learning methods and algorithms are not always decisive; the lack of sufficient labeled data limits progress in this area. One proposed data augmentation framework combines a conditional generative adversarial network (cGAN) with a segmentation-guided convolutional neural network to generate brain images from specially modified lesion masks, and adds a Similarity Module (FSM) to facilitate the learning process, which leads to better lesion segmentation [

When segmenting medical images, accuracy is also determined differently for classification and segmentation methods. One approach combines a deep convolutional neural network in a cascaded structure with a conditional random field in a joint learning framework, yielding a more efficient model with direct dependencies between the labels of spatially close voxels, which is used in post-segmentation processing. This ensures segmentation accuracy with an appropriate network depth and number of connections [

Deep learning methods demonstrate impressive performance in segmentation tasks largely thanks to manually labeled masks, whose creation is an expensive and time-consuming process. A semi-supervised consistent perception generative adversarial network (CPGAN), with a similarity connection module that captures multiscale feature information and an auxiliary network, reduces this labeling burden [

Another unconventional approach to label annotation was proposed by the authors of [

As a dataset, we use the ISLES 2018 (Ischemic Stroke Lesion Segmentation Challenge), which consists of 3D medical CT images of the brain [

The UNet architecture, developed in 2015 by Olaf Ronneberger, Philipp Fischer and Thomas Brox for cell segmentation in microscopy images, performs well and is widely used to solve image segmentation problems [

In our work, we modify the UNet architecture to achieve higher accuracy in segmenting strokes on CT images. We modified the classical 3D UNet model with methods such as data augmentation, dropout, the Adam optimization algorithm, l2 regularization and instance normalization. The advantages of each method are described below:

Data augmentation. This method allows developers to artificially increase the size of the training set by creating modified copies of the existing data [
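A minimal sketch of one common 3D augmentation, random mirroring, assuming volumes stored as nested lists indexed `volume[z][y][x]`. The function names `flip` and `augment_pair` are hypothetical, and flipping only the in-plane axes with probability 0.5 is an illustrative choice, not the authors' documented augmentation policy. The key point is that the CT volume and its lesion mask must receive exactly the same transformation so that labels stay aligned with voxels.

```python
import random


def flip(volume, axis):
    """Return a copy of a 3D nested-list volume mirrored along axis 0 (z), 1 (y) or 2 (x)."""
    if axis == 0:
        return [[row[:] for row in plane] for plane in reversed(volume)]
    if axis == 1:
        return [[row[:] for row in reversed(plane)] for plane in volume]
    if axis == 2:
        return [[row[::-1] for row in plane] for plane in volume]
    raise ValueError("axis must be 0, 1 or 2")


def augment_pair(image, mask, rng):
    """Apply the same random flips to a CT volume and its lesion mask."""
    for axis in (1, 2):  # hypothetical choice: flip only the in-plane axes
        if rng.random() < 0.5:
            image, mask = flip(image, axis), flip(mask, axis)
    return image, mask


vol = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
print(flip(vol, 2))  # each row reversed
```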

Dropout was designed to counter overfitting, which appears as poor test performance in neural networks with a large number of parameters [

The Adam optimization algorithm adapts the learning rate of each parameter individually. It was specifically designed for training deep neural networks [
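The update rule of Adam (Kingma and Ba) can be sketched in a few lines. This is a minimal plain-Python illustration on lists of floats, using the learning rate of 0.0002 from the text; β1 = 0.9, β2 = 0.999 and ε = 1e-8 are the commonly used defaults and are assumed here, since the paper does not state them. The name `adam_step` is hypothetical; in TensorFlow the optimizer would be created with `tf.keras.optimizers.Adam(learning_rate=0.0002)`.

```python
import math


def adam_step(params, grads, m, v, t, lr=2e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update on lists of floats.

    m, v are the running first/second moment estimates (updated in place);
    t is the 1-based step counter used for bias correction.
    """
    new_params = []
    for i, (p, g) in enumerate(zip(params, grads)):
        m[i] = beta1 * m[i] + (1 - beta1) * g      # moving average of gradients
        v[i] = beta2 * v[i] + (1 - beta2) * g * g  # moving average of squared gradients
        m_hat = m[i] / (1 - beta1 ** t)            # bias correction
        v_hat = v[i] / (1 - beta2 ** t)
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
    return new_params


m, v = [0.0, 0.0], [0.0, 0.0]
print(adam_step([1.0, -1.0], [0.5, -3.0], m, v, t=1))
```

Because each step is divided by the root of the squared-gradient average, multiplying a parameter's gradients by a constant leaves the step essentially unchanged; this is the invariance to diagonal rescaling of gradients mentioned earlier.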

L2 regularization mitigates multicollinearity (high correlation between independent variables) by shrinking the coefficients while keeping all variables in the model [
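A minimal sketch of how an L2 penalty with the factor 0.001 from the text enters training: the penalty is added to the data loss, and its gradient pulls every weight toward zero. The form `factor * sum(w**2)` matches what `tf.keras.regularizers.l2(0.001)` adds to the loss; the helper names are hypothetical and the weights are a plain list for illustration.

```python
def l2_penalty(weights, factor=1e-3):
    """L2 term added to the data loss: factor * sum(w^2)."""
    return factor * sum(w * w for w in weights)


def l2_gradient(weights, factor=1e-3):
    """Gradient of the penalty, 2 * factor * w, which shrinks each weight toward 0."""
    return [2.0 * factor * w for w in weights]


def total_loss(data_loss, weights, factor=1e-3):
    """Objective actually minimized: data loss plus the L2 penalty."""
    return data_loss + l2_penalty(weights, factor)


print(total_loss(0.5, [3.0, 4.0]))
```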

Instance normalization, or contrast normalization, prevents instance-specific shifts in the mean and covariance, thus simplifying the learning process. Intuitively, in a task such as image stylization, normalization removes the instance-specific contrast information from the content image, which simplifies generation [
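A minimal sketch of instance normalization for a single sample, assuming each channel is a flat list of voxel values. Mean and variance are computed per sample and per channel, so they do not depend on the batch size; this is why the technique suits the small batches described later in the architecture section. The ε = 1e-5 value and the omission of the learnable scale and shift parameters are simplifying assumptions, and `instance_norm` is a hypothetical name.

```python
import math


def instance_norm(sample, eps=1e-5):
    """Normalize each channel of ONE sample to zero mean and unit variance.

    sample: list of channels, each a flat list of voxel values.
    Learnable scale/shift parameters are omitted in this sketch.
    """
    out = []
    for channel in sample:
        n = len(channel)
        mean = sum(channel) / n
        var = sum((x - mean) ** 2 for x in channel) / n
        out.append([(x - mean) / math.sqrt(var + eps) for x in channel])
    return out


print(instance_norm([[1.0, 2.0, 3.0, 4.0]]))
```

Doubling the contrast of a channel (for example `[2.0, 4.0, 6.0, 8.0]` instead of `[1.0, 2.0, 3.0, 4.0]`) gives practically the same output, which illustrates how instance-specific contrast is removed.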

The convolution operator is calculated using the formula:
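As an illustration of what a convolutional layer computes, the sketch below implements a discrete 3D convolution for a single input channel, stride 1 and no padding ("valid"). As in most deep learning libraries, the kernel is not flipped, so strictly this is cross-correlation. The function name `conv3d` and the single-channel restriction are assumptions made for brevity; a real layer sums over input channels and has one kernel per output channel.

```python
def conv3d(volume, kernel, bias=0.0):
    """Single-channel 3D 'valid' convolution with stride 1, as used in CNNs.

    out[z][y][x] = bias + sum_{i,j,k} kernel[i][j][k] * volume[z+i][y+j][x+k]
    """
    D, H, W = len(volume), len(volume[0]), len(volume[0][0])
    kd, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for z in range(D - kd + 1):
        plane = []
        for y in range(H - kh + 1):
            row = []
            for x in range(W - kw + 1):
                s = bias
                for i in range(kd):
                    for j in range(kh):
                        for k in range(kw):
                            s += kernel[i][j][k] * volume[z + i][y + j][x + k]
                row.append(s)
            plane.append(row)
        out.append(plane)
    return out


ones = [[[1] * 3 for _ in range(3)] for _ in range(3)]
print(conv3d(ones, ones))  # a 3x3x3 kernel on a 3x3x3 volume gives one output voxel
```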

The activation operation is a nonlinear function that determines the output signal of a neuron. The function most commonly used in modern neural networks, and used in this architecture, is the “rectifier” (by analogy with a half-wave rectifier in electrical engineering) [
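The rectifier and the leaky variant used in the proposed model can be written directly. The negative slope `alpha = 0.01` is an assumed value, since the paper does not state it (Keras `LeakyReLU` and PyTorch use different defaults).

```python
def relu(x):
    """Rectifier: passes positive inputs and clips negative ones to 0 (half-wave)."""
    return max(0.0, x)


def leaky_relu(x, alpha=0.01):
    """Leaky rectifier: keeps a small slope `alpha` for negative inputs."""
    return x if x > 0 else alpha * x


print([relu(v) for v in (-2.0, 3.0)], [leaky_relu(v) for v in (-2.0, 3.0)])
```

The small negative slope keeps a nonzero gradient for negative inputs, so units are less likely to stop learning ("dying ReLU").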

Unlike other architectures, the proposed modified U-Net replaces some convolutions with dilated convolutions. This modification expands the receptive field of the filters, allowing the model to include more contextual information in the computation. We use leaky ReLU activations for all feature maps computed by convolutions across the network. To compensate for the stochasticity caused by the small batch sizes we had to use because of memory constraints, we replace standard batch normalization with instance normalization. Dropout layers with l2 regularization were also added to minimize overfitting, and we use intersection over union as the loss function. In this architecture, each blue square corresponds to a multi-channel feature map. The number of channels is shown at the top of the square. The x-y size is shown in the lower-left corner of the square. The white squares represent copies of feature maps. The arrows indicate various operations. The network consists of a contracting path (left) and an expanding path (right). The contracting path follows the typical convolutional neural network architecture. It consists of the repeated application of two 3 × 3 × 3 convolutions, each followed by a ReLU, and a 2 × 2 × 2 max pooling operation with stride 2 for downsampling. At each downsampling step, the number of feature channels is doubled. Each step in the expanding path consists of an upsampling of the feature map followed by: a 2 × 2 × 2 convolution that reduces the number of feature channels; a concatenation with the correspondingly cropped feature map from the contracting path; and two 3 × 3 × 3 convolutions, each followed by a ReLU. The last layer uses a 1 × 1 × 1 convolution to map each 64-component feature vector to the desired number of classes. In total, the network contains 23 convolutional layers. The data sizes used in the model are as follows: input shape = [5, 128, 128, 32], weight decay = 0.
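Two pieces of bookkeeping from the description above can be checked with simple arithmetic: how far a dilated kernel reaches, and how feature-map sizes and channel counts evolve along the contracting path. In this sketch, 64 base channels are taken from the 64-component feature vector mentioned in the text, while four downsampling steps (as in the original U-Net) and reading 128 × 128 × 32 as the spatial part of the stated input shape are assumptions. Both helper names are hypothetical. In Keras, dilation is set through the `dilation_rate` argument of `tf.keras.layers.Conv3D`.

```python
def effective_kernel(k, dilation):
    """Extent covered by one dilated kernel along an axis: k + (k - 1) * (dilation - 1)."""
    return k + (k - 1) * (dilation - 1)


def encoder_shapes(spatial, base_channels=64, levels=4):
    """(spatial size, channels) per encoder level, assuming 2x2x2 max pooling with
    stride 2 halves every spatial axis and the channel count doubles per level."""
    shapes = []
    size, ch = list(spatial), base_channels
    for _ in range(levels + 1):
        shapes.append((tuple(size), ch))
        size = [s // 2 for s in size]
        ch *= 2
    return shapes


print(effective_kernel(3, 2))  # a 3-tap kernel with dilation 2 spans 5 voxels
for shape in encoder_shapes((128, 128, 32)):
    print(shape)
```

A dilated 3 × 3 × 3 kernel keeps 27 weights but covers a wider neighborhood, which is how the modification adds context without extra parameters.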
We compared two architectures: the classic 3D UNet, trained for 200 epochs, and the proposed 3D UNet model, trained for 650 epochs. On the training data, the classic 3D UNet achieved a Dice/f1 score of 48%, precision of 39%, recall/sensitivity of 99%, and Jaccard index of 35%, while the proposed model achieved 90%, 83%, 93%, and 89%, respectively. On the test data, the classic 3D UNet achieved a Dice/f1 score of 36%, precision of 38%, recall/sensitivity of 37%, and Jaccard index of 32%, while the proposed model achieved 58%, 68%, 60%, and 66%, respectively.

To assess the quality of the prediction, we use the Dice/f1 score similarity coefficient, precision, recall/sensitivity, and the Jaccard index. The Dice/f1 score similarity coefficient measures the similarity (overlap) of two sets. Let A and B be sets of voxels (three-dimensional pixels). Formula (3) gives the calculation of the Dice/f1 score similarity coefficient [
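For binary masks, the standard Dice coefficient is 2|A ∩ B| / (|A| + |B|). The sketch below computes it for masks given as flat 0/1 lists of voxels. Returning 1.0 when both masks are empty is a common convention assumed here, and `dice` is an illustrative name.

```python
def dice(pred, truth):
    """Dice = 2|A & B| / (|A| + |B|) for binary voxel masks given as flat 0/1 lists."""
    inter = sum(p & t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    return 1.0 if total == 0 else 2.0 * inter / total  # both empty: assumed 1.0


print(dice([1, 1, 0, 0], [1, 0, 1, 0]))  # one shared voxel out of 2 + 2
```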

However, because the Dice/f1 score is a non-differentiable metric, it cannot be maximized directly by gradient methods. We therefore minimize the standard cross-entropy loss (log loss), where the class label is one if the voxel belongs to the affected area and zero otherwise. The neural network outputs the probability that a voxel belongs to a particular class.
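A minimal sketch of the voxel-wise binary cross-entropy described above, averaged over voxels. Clipping probabilities to [ε, 1 − ε] with ε = 1e-7 to avoid log(0) is an assumption (Keras applies a similar clip in `tf.keras.losses.BinaryCrossentropy`), and the function name is illustrative.

```python
import math


def binary_cross_entropy(probs, labels, eps=1e-7):
    """Mean log loss over voxels: -[y log p + (1 - y) log(1 - p)].

    probs: predicted probability that each voxel belongs to the lesion;
    labels: 1 for lesion voxels, 0 otherwise.
    """
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)


print(binary_cross_entropy([0.5, 0.5], [1, 0]))  # uninformative prediction: log 2
```

Unlike the thresholded Dice score, this loss changes smoothly with every predicted probability, so gradient methods can minimize it directly.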

To decide whether a prediction is correct with respect to an object or not, the Jaccard index (also called intersection over union) is used. Formula (4) demonstrates the calculation of the Jaccard index [
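The Jaccard index is |A ∩ B| / |A ∪ B|. The paper states that intersection over union is also used as the loss; a common differentiable relaxation, assumed here rather than confirmed as the authors' exact formula, replaces the binary prediction with probabilities p: loss = 1 − Σpy / (Σp + Σy − Σpy). The small `smooth` constant that avoids division by zero is a hypothetical choice.

```python
def jaccard(pred, truth):
    """Jaccard / IoU = |A & B| / |A | B| for flat 0/1 voxel masks."""
    inter = sum(p & t for p, t in zip(pred, truth))
    union = sum(p | t for p, t in zip(pred, truth))
    return 1.0 if union == 0 else inter / union


def soft_iou_loss(probs, truth, smooth=1e-6):
    """Differentiable IoU loss: 1 - sum(p*y) / (sum(p) + sum(y) - sum(p*y))."""
    inter = sum(p * y for p, y in zip(probs, truth))
    union = sum(probs) + sum(truth) - inter
    return 1.0 - (inter + smooth) / (union + smooth)


print(jaccard([1, 1, 0, 0], [1, 0, 1, 0]))  # 1 shared voxel out of 3 in the union
```

With hard 0/1 predictions, the soft loss equals 1 minus the Jaccard index (up to the smoothing term), so minimizing it directly pushes up the overlap of the predicted and ground-truth shapes.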

Recall, or sensitivity, describes the completeness of our positive predictions relative to the ground truth: of all the objects annotated in our ground truth, how many did we capture as positive predictions? [
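Precision, recall (sensitivity) and the f1 score can all be computed from voxel-level true positives (TP), false positives (FP) and false negatives (FN). The sketch also shows why the document writes "Dice/f1 score": f1 = 2TP / (2TP + FP + FN), which is exactly the Dice coefficient of the two masks. Returning 0.0 for undefined ratios (no positives) is an assumed convention, and the helper names are illustrative.

```python
def confusion_counts(pred, truth):
    """Count true positives, false positives and false negatives over voxels."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if t and not p)
    return tp, fp, fn


def precision_recall_f1(pred, truth):
    """Precision, recall (sensitivity) and f1; undefined ratios are returned as 0.0."""
    tp, fp, fn = confusion_counts(pred, truth)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0  # equals Dice
    return precision, recall, f1


print(precision_recall_f1([1, 0, 0, 0], [1, 1, 1, 0]))  # precise but incomplete
```

A model that marks almost every voxel as lesion reaches high recall with low precision, which matches the pattern of the classical 3D UNet on the training data (recall 99%, precision 39%).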

For the experiments, we took the classic U-Net architecture and the ISLES 2018 dataset. The experiments were conducted in the Google Colab environment with the TensorFlow library. Training of the classical UNet model was stopped after 200 epochs because it trained poorly: the validation loss no longer changed and was uninformative. Our modified model was also trained in Google Colab, but because of the platform's limited memory and computing power, its training was stopped at 650 epochs.

Metric | Classical 3D UNet (Training results) | Classical 3D UNet (Test results) | Proposed approach (Training results) | Proposed approach (Test results) |
---|---|---|---|---|
Dice/f1 score | 48% | 36% | 90% | 58% |
Precision | 39% | 38% | 83% | 68% |
Recall/sensitivity | 99% | 37% | 93% | 60% |
Jaccard index | 35% | 32% | 89% | 66% |

Methods | Dice/f1 score | Precision | Recall/sensitivity |
---|---|---|---|
D-UNET [ | 0.53 | 0.63 | 0.52 |
X-NET [ | 0.48 | 0.60 | 0.47 |
U-NET CNN [ | 0.46 | 0.34 | 0.44 |
SegNet [ | 0.27 | 0.19 | 0.25 |
PSPNet [ | 0.35 | 0.25 | 0.33 |
V-NET [ | 0.43 | 0.50 | 0.49 |
DeepLab v3+ [ | 0.46 | 0.34 | 0.44 |
ResUNET [ | 0.47 | 0.35 | 0.45 |
2D Dense-UNET [ | 0.47 | 0.35 | 0.48 |

Introducing into clinical practice a decision support system that significantly accelerates and improves the effectiveness of medical care for brain stroke is an important task. Automatic analysis of neuroimaging data will allow early differential diagnosis in the shortest possible time, prediction of the likely outcome of the disease, and individual recommendations on the most effective treatment for each patient. A representative sample consisting of a large amount of structured and reliable data is the basis for applying various methods of medical image analysis, including machine learning. The effectiveness and accuracy of the models also depend directly on the quality of the initial data (training sample), which requires careful pre-processing [

Research typically requires searching the local repositories of a single medical institution for cases that meet the selected criteria, followed by time-consuming manual annotation of the images by experts. For this reason, the samples used to train such systems most often contain fewer than 100 diagnostic series [

Despite the urgency of automating the diagnosis of ischemic stroke, few suitable open-access image collections exist worldwide, and in many countries such projects are completely absent, which inevitably reduces the accuracy of models obtained on open population data. Publicly available datasets are most commonly organized to help teams create and improve algorithms for automatic segmentation of the lesion volume, and collections often include diagnostic data for both ischemic and hemorrhagic stroke [

For this study, we use the ISLES 2018 open dataset provided by a medical image segmentation challenge [

Segmentation of brain images is still not a completely solved problem in deep learning. The amount of available data, including training and testing data, also plays a key role in image segmentation. In this article, we proposed a modified UNet architecture for segmentation of acute ischemic stroke lesions. The UNet model is one of the best methods for segmenting medical images when only small amounts of data are available, and the improved model we propose increases the accuracy of segmentation of CT images. Specifically, the proposed model achieved a Dice similarity coefficient of 58% and a recall/sensitivity of 60%, which supports the choice of the techniques used: data augmentation to increase the training data; spatial dropout, which removes entire feature maps of a convolutional layer, to prevent co-adaptation of pixels with their neighbors across feature maps; the efficient Adam optimizer; and l2 regularization for multicollinearity problems. As future work, we consider improving the proposed model by fine-tuning and feature extraction. We also plan to train our own weights on the ISLES 2017 dataset, which will improve our model by adjusting the layers that hold the most abstract representations; for more efficient models in the future, feature extraction, one of the key points, will be applied. We believe that these methods will improve the evaluation criteria and bring the segmentation process closer to high accuracy.