U-Net is a convolutional neural network that was developed for biomedical image segmentation at the Computer Science Department of the University of Freiburg.[1] The network is based on the fully convolutional network[2] and its architecture was modified and extended to work with fewer training images and to yield more precise segmentations. Segmentation of a 512 × 512 image takes less than a second on a modern GPU.
The U-Net architecture stems from the so-called “fully convolutional network” first proposed by Long, Shelhamer, and Darrell.[2]
The main idea is to supplement a usual contracting network by successive layers, where pooling operations are replaced by upsampling operators. Hence these layers increase the resolution of the output. A successive convolutional layer can then learn to assemble a precise output based on this information.[1]
One important modification in U-Net is the large number of feature channels in the upsampling part, which allows the network to propagate context information to higher-resolution layers. As a consequence, the expansive path is more or less symmetric to the contracting part, yielding a U-shaped architecture. The network uses only the valid part of each convolution and has no fully connected layers.[2] To predict the pixels in the border region of the image, the missing context is extrapolated by mirroring the input image. Large images are segmented tile by tile from overlapping input tiles; this tiling strategy is important for applying the network to large images, since otherwise the resolution would be limited by the GPU memory.
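To make the effect of valid convolutions and mirroring concrete, here is a minimal sketch in plain Python. It assumes the layout of the original U-Net paper, which the text above does not spell out: four pooling steps, two unpadded 3×3 convolutions per level, and a 572-pixel input tile. The helper names `valid_unet_output_size` and `mirror_pad` are hypothetical and exist only for illustration.

```python
def valid_unet_output_size(size, depth=4, convs_per_level=2, kernel=3):
    """Side length of the output tile of an unpadded ("valid") U-Net.

    depth, convs_per_level and kernel are assumptions taken from the
    original U-Net paper, not from the text above.
    """
    shrink = convs_per_level * (kernel - 1)  # pixels lost per level
    # Contracting path: convolve, then halve the resolution by pooling.
    for _ in range(depth):
        size -= shrink
        if size <= 0 or size % 2:
            raise ValueError("tile size does not fit the pooling steps")
        size //= 2
    size -= shrink  # convolutions at the lowest resolution
    # Expansive path: double the resolution, then convolve.
    for _ in range(depth):
        size = size * 2 - shrink
    if size <= 0:
        raise ValueError("tile is too small for this network")
    return size


def mirror_pad(row, pad):
    """Extend a list by reflecting it about its first and last element."""
    if pad >= len(row):
        raise ValueError("pad must be smaller than the row length")
    left = row[pad:0:-1]           # reflected copy of the first `pad` neighbours
    right = row[-2:-pad - 2:-1]    # reflected copy of the last `pad` neighbours
    return left + row + right


tile_in = 572
tile_out = valid_unet_output_size(tile_in)
border = (tile_in - tile_out) // 2  # context missing on each side
print(tile_in, "->", tile_out, "border per side:", border)
print(mirror_pad([1, 2, 3, 4, 5], 2))
```

Under these assumptions, each output tile is smaller than its input tile by a fixed border, which is why neighbouring input tiles must overlap and why the image edge has to be extended by mirroring.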
Review: 3D U-Net — Volumetric Segmentation (Medical Image Segmentation)
3D U-Net for Dense Volumetric Segmentation from Sparse Segmentation
In this story, 3D U-Net is briefly reviewed. This is a work by the University of Freiburg, the BIOSS Centre for Biological Signalling Studies, University Hospital Freiburg, University Medical Center Freiburg, and Google DeepMind. It was published at MICCAI 2016 and has over 600 citations.
Outline
- 3D U-Net Architecture
- Results
1. 3D U-Net Architecture
- The 3D U-Net architecture is quite similar to the U-Net.
- It comprises an analysis path (left) and a synthesis path (right).
- In the analysis path, each layer contains two 3×3×3 convolutions each followed by a ReLU, and then a 2×2×2 max pooling with strides of two in each dimension.
- In the synthesis path, each layer consists of a 2×2×2 up-convolution with strides of two in each dimension, followed by two 3×3×3 convolutions each followed by a ReLU.
- Shortcut connections from layers of equal resolution in the analysis path provide the essential high-resolution features to the synthesis path.
- In the last layer, a 1×1×1 convolution reduces the number of output channels to the number of labels, which is 3.
- Batch normalization (BN) is applied before each ReLU.
- The network has 19069955 parameters in total.
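The parameter total can be checked with a little arithmetic. The sketch below is only a tally under stated assumptions: the channel widths are the ones shown in the architecture figure of the original paper (they are not listed in this review), every convolution has a bias, the up-convolutions have none, and batch-normalization parameters are left out. The helper names are hypothetical.

```python
def conv_params(c_in, c_out, kernel, bias=True):
    """Weights (plus optional biases) of one 3D convolution with a cubic kernel."""
    return kernel ** 3 * c_in * c_out + (c_out if bias else 0)


# Assumed channel widths (input has 3 channels), not stated in this review.
# Analysis path: (in, out) channels of each 3x3x3 convolution.
ANALYSIS = [(3, 32), (32, 64), (64, 64), (64, 128),
            (128, 128), (128, 256), (256, 256), (256, 512)]
# Synthesis path: (up-convolution channels, shortcut channels, output channels).
SYNTHESIS = [(512, 256, 256), (256, 128, 128), (128, 64, 64)]
NUM_LABELS = 3


def count_parameters():
    total = sum(conv_params(c_in, c_out, 3) for c_in, c_out in ANALYSIS)
    for up, shortcut, out in SYNTHESIS:
        total += conv_params(up, up, 2, bias=False)   # 2x2x2 up-convolution
        total += conv_params(up + shortcut, out, 3)   # after concatenation
        total += conv_params(out, out, 3)
    total += conv_params(SYNTHESIS[-1][2], NUM_LABELS, 1)  # final 1x1x1 conv
    return total


print(count_parameters())
```

With these assumptions the tally comes out to 19069955, the figure quoted above. Giving the up-convolutions a bias term, or counting batch-normalization parameters, would change the result.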
2. Results
2.1. Some Details
- Different structures were given the labels 0: “inside the tubule”, 1: “tubule”, 2: “background”, and 3: “unlabeled”.
- Weighted cross entropy loss is used, where weights are reduced for the frequently seen background and weights are increased for the inner tubule to reach a balanced influence of tubule and background voxels on the loss.
- Voxels with label 3 (“unlabeled”) do not contribute to the loss computation, i.e. have a weight of 0.
- The data are used at half the original resolution, i.e. down-sampled by a factor of two.
- Only 3 samples of Xenopus kidney are used.
- The data sizes used in the experiments are 248×244×64, 245×244×56 and 246×244×59 in x×y×z dimensions for samples 1, 2, and 3, respectively.
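The weighted loss with ignored “unlabeled” voxels described above can be sketched in plain Python. This is an illustration, not the authors' implementation: the weight values are made up, and dividing by the summed weights is an assumed normalization that the text does not specify.

```python
import math

UNLABELED = 3
# Hypothetical weights for labels 0..3: inner tubule up-weighted,
# background down-weighted, unlabeled voxels switched off.
CLASS_WEIGHTS = [2.0, 1.0, 0.5, 0.0]


def weighted_cross_entropy(logits, labels, class_weights=CLASS_WEIGHTS):
    """Weighted softmax cross entropy over a flat list of voxels.

    logits: one list of 3 scores per voxel (inside tubule, tubule, background).
    labels: one integer in 0..3 per voxel; label 3 means "unlabeled".
    """
    total_loss = 0.0
    total_weight = 0.0
    for scores, label in zip(logits, labels):
        weight = class_weights[label]
        if weight == 0.0:
            continue  # unlabeled voxels contribute nothing
        peak = max(scores)  # subtract the maximum for numerical stability
        log_sum_exp = peak + math.log(sum(math.exp(s - peak) for s in scores))
        total_loss += weight * (log_sum_exp - scores[label])
        total_weight += weight
    # Assumed normalization: mean over the weights of contributing voxels.
    return total_loss / total_weight if total_weight else 0.0


logits = [[2.0, 0.5, 0.1], [0.2, 1.5, 0.3], [0.0, 0.0, 3.0], [9.0, 9.0, 9.0]]
labels = [0, 1, 2, UNLABELED]
print(weighted_cross_entropy(logits, labels))
```

Because the last voxel carries label 3, changing its scores leaves the loss untouched, which is what allows training on sparsely annotated volumes.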
2.2. Two Cases
- a) 1st case: Semi-Automatic segmentation: With a sparsely annotated data set, i.e. some slices of the 3D structure are annotated, the network can help to segment the rest.
- The numbers of manually annotated slices in the orthogonal (yz, xz, xy) directions are (7, 5, 21), (6, 7, 12), and (4, 5, 10) for samples 1, 2, and 3, respectively.
- b) 2nd case: Fully-Automatic Segmentation: After training on annotated training data, the network generalizes to a new data set.
2.3. Semi-Automatic Segmentation
- Start by using 1 annotated slice in each orthogonal direction and increase the number of annotated slices gradually.
- The more slices are annotated, the higher the IoU.
- The 77 manually annotated slices from all 3 samples are split into three subsets to perform 3-fold cross validation, both with and without batch normalization (BN).
- 3D U-Net with BN outperforms others.
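For readers less familiar with the evaluation, here is a minimal sketch of the two ingredients mentioned above: intersection over union (IoU) for one label, and a 3-fold split of the annotated slices. The round-robin split is an assumption, since the review does not say how the subsets were formed, and both helper names are hypothetical.

```python
def iou(prediction, truth, label):
    """Intersection over union for one label over flat lists of voxel labels."""
    intersection = 0
    union = 0
    for predicted, expected in zip(prediction, truth):
        in_prediction = predicted == label
        in_truth = expected == label
        intersection += in_prediction and in_truth
        union += in_prediction or in_truth
    # IoU is undefined when the label appears in neither volume.
    return intersection / union if union else float("nan")


def three_fold_splits(items, folds=3):
    """Yield (training, held_out) pairs; round-robin assignment is an assumption."""
    subsets = [items[i::folds] for i in range(folds)]
    for i, held_out in enumerate(subsets):
        training = [x for j, subset in enumerate(subsets) if j != i for x in subset]
        yield training, held_out


print(iou([1, 1, 0, 2], [1, 0, 0, 2], label=1))

slices = list(range(77))  # stand-ins for the 77 annotated slices
for training, held_out in three_fold_splits(slices):
    print(len(training), "training slices,", len(held_out), "held out")
```

Each slice is held out exactly once, so every annotated slice contributes to the reported IoU without being seen during the corresponding training run.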
2.4. Fully-automated Segmentation
- Train on two kidney volumes, segment the third one.
- BN improves the result except for the third setting.
- The authors argue that large differences between the data sets are responsible for this effect. The solution is to have much larger sample sizes.
https://github.com/wolny/pytorch-3dunet
https://paperswithcode.com/paper/3d-u-net-learning-dense-volumetric