Mika Anttonen and Dongwann Kang

A Survey on VR-Based Annotation of Medical Images

Abstract: The use of virtual reality (VR) in the healthcare field has been gaining attention lately. The main use cases revolve around medical imaging and clinical skill training, and healthcare professionals have found great benefits when these tasks are done in VR. While desktop medical imaging is served by a wide range of software with various tools, VR versions are mostly stripped down to basic tools only. One of the tool groups most notably missing is annotation. In this paper, we survey the current situation of medical imaging software both on the desktop and in the VR environment. We discuss general information on medical imaging and provide examples of both desktop and VR applications. We also discuss the current status of annotation in VR, the problems that need to be overcome and possible solutions for them. The findings of this paper should help developers of future medical image annotation tools choose which problems to tackle and which methods to use. The findings will also support our future work on developing annotation tools.

Keywords: Image Annotation, Medical Imaging, Virtual Reality

1. Introduction

Selling and developing virtual reality (VR) equipment has become profitable in recent years after a rocky start. While the availability of lower-priced equipment has helped sell more units, other factors have also boosted sales, including the general improvement of household computers. When Oculus released the original Oculus Rift CV1 in 2016, the requirements for running the equipment smoothly were quite high for computers of that time. The minimum requirements set by the current manufacturer for running the hardware [1] list components that were mainly released in 2015 or 2016.
While VR is still not recommended for low-end computers, current mid-level computers have become powerful enough to run most equipment and software without major problems.

While gaming is still the biggest customer field by a large margin, multiple sources estimate that by 2025 healthcare will be the second or third biggest field, with investments of at least 5 billion dollars [2,3]. Depending on the source, the main competitors of healthcare will be education and engineering. While the idea of using VR for healthcare was discussed over 20 years ago, actual usage is still quite rare.

Some main use cases of VR in healthcare are related to medical imaging. Medical imaging can be used for training, surgery planning and diagnosis. A few pieces of software are available that allow viewing and analyzing medical images in VR. Software like 3D Slicer and SurgeryVision has already shown that there are great benefits to be gained in the speed of analyzing and learning about medical images in VR. The tools available in such software are quite limited, however, especially when it comes to annotation. Annotation in VR is taking its first steps with DIVA Cloud and Medical Imaging XR, but both have limitations in performance and usability. Neither of them can export annotations at the time of writing this survey.

In this paper we survey the current situation of medical imaging on the desktop and in VR. We discuss the main file formats and standards for medical imaging with some examples of available software. We then discuss the requirements for annotation software development in VR and the problems that current pieces of software are facing. We hope that this survey will help developers of medical image annotation tools in locating future problems and fixing them.

This paper is organized as follows: Section 2 discusses medical imaging in general with an introduction to the main file formats and examples of medical imaging software.
Section 3 discusses medical imaging in VR and a few pieces of software that are available to the public. Section 4 shows generalized examples of service scenarios involving medical images. Section 5 discusses some problems that current VR annotation software faces and how to solve them or at least lessen their impact. It also discusses some other requirements that already have solutions. Section 6 concludes this paper with our future plans regarding the subject.

2. Background Knowledge on Medical Imaging

This section discusses some basic knowledge of medical imaging. This includes the main file formats used in medical imaging and some examples of open-source software that can be used to view medical images. Table 1 provides a comparison of the capabilities of the software mentioned.

2.1 File Formats

Most pieces of software used to view medical images work with images that follow either the DICOM [4] or the NIfTI [5] standard. The filename extensions are .dcm and .nii, respectively. NIfTI is further divided into two formats, NIfTI-1 and NIfTI-2. NIfTI-2 is an extension of the NIfTI-1 format that allows larger images and matrices. Other formats worth mentioning are Analyze [6] and Minc [7,8]. NIfTI was originally based on Analyze with the purpose of overcoming Analyze's weaknesses, such as its lack of support for some basic data types, including unsigned 16-bit integers. Minc was created by the Montreal Neurological Institute as a flexible data format, but it has failed to gain traction. It is mainly used by software developed by the same institute. Some software also supports TIFF images.

Both the DICOM and NIfTI standards require files to contain certain information, which includes but is not limited to an image's location and rotation in the body, the image modality (computed tomography [CT], magnetic resonance imaging [MRI], X-ray, etc.) and the real-life distance between pixels. These information fields are commonly called tags [9].
The DICOM standard has its own image transfer protocols that use certain tags as parameters. These tags will be discussed later.

The main difference between the two standards is their number of dimensions [10]. DICOM images are two-dimensional (2D) while NIfTI images are three-dimensional (3D). NIfTI images are generally made by combining a series of DICOM images. Most pieces of medical imaging software that support NIfTI can perform this combination. To make the third dimension accurate, tags included in the DICOM files are used to arrange the images and to give thickness to the original images. Thickness is directly proportional to the distance between the locations that the images represent. If the distance between two subsequent images is large, the thickness also becomes large. Using 2D images with a large thickness results in inaccurate 3D images. Therefore, to achieve the best quality for 3D images, the original image set should include images with small distances between them. Much of the software that supports visualizing 3D images uses either NIfTI files or the method described above to create the image from DICOM files.

Some websites offer free medical images for download. Images on such sites are commonly anonymized due to patient privacy laws. MedDream has images from 13 different modalities available for downloading [11]. The user can view the images in MedDream's online DICOM Viewer before downloading.

According to the DICOM and NIfTI standards, images are divided into databases on four different levels [12]. These databases are commonly called PACS, which stands for picture archiving and communication system. The levels from top to bottom are patient, study, series and image or instance. Each item on a higher level can hold multiple items from lower levels. Each file has tags that show which item it belongs to on each level. These are the tags used as parameters with the image transfer protocol. The patient level identifies whose images they are. The study level tells when the images were taken.
The series level holds images that are directly related to each other. One series holds images that share the same image modality and body part. NIfTI images are made of images that belong to the same series. Instance, the last level, points to a single image or DICOM file.

2.2 Existing Software

There are many pieces of medical imaging software, some of which are open source. While all of them support 2D images, a significant number also support 3D images. Minimalistic medical imaging software allows a user to view images in 2D, the tags included in the file and the value of each pixel. Pixel values are important for medical professionals as they can be used to calculate Hounsfield unit (HU) values, which represent the radiodensity of a material [13]. Similar to grayscale values, very low pixel values are shown as black and very high values as white. Medical imaging professionals generally remember important HU values.

One of the simplest pieces of software available is MicroDicom [14]. MicroDicom supports only 2D images and therefore does not support NIfTI files. A user can open multiple images at once and scroll through them. Images are sorted in the order they would be in the physical world, with the same method that is used when combining DICOM images into a NIfTI image. The software has simple measuring and marking tools. MicroDicom also shows the tags of each file and the value of each pixel.

At the other end, 3D Slicer [15] includes all the functions that MicroDicom does. In addition to 2D images, it also supports 3D and 4D images. Images can be viewed as cross-sections or as 3D models. Cross-sectional imaging allows viewing slices of a 3D image along any of the available axes [16]. Fig. 1 shows an example of a cross-sectional view of a CT image from three different directions. 3D Slicer can communicate with image archives that support the DICOM image transfer protocols. Regarding annotation, the most important additions are the tools for automatic and manual segmentation.
If segments can be exported correctly, they can be used as training material for artificial intelligence (AI). Fig. 2 shows a cross-sectional view of an image where the lungs have been segmented. Currently, 3D Slicer can export segments in the STL and OBJ formats.

Between 3D Slicer and MicroDicom in capability is software like MITK [17]. MITK supports more file formats than MicroDicom, including NIfTI. It supports 2D and 3D images. MITK shows images as cross-sections but not as 3D models. MITK has the tools that MicroDicom has and also multiple manual 2D and 3D segmentation tools. Segments can be viewed within cross-sectional views or as 3D models. MITK is capable of recognizing STL files as segmentations and showing them alongside other medical images. MITK can export both segmentations and medical images in many different file formats such as NIfTI and VTK [18].

Table 1 compares the main capabilities of the software introduced earlier in this section. In Table 1, we only consider the desktop version of 3D Slicer. Its VR capabilities are included in Table 2. The properties compared are the supported input file formats, the maximum number of dimensions of an image, the existence of measuring and annotation/segmentation tools and the capability of communicating with PACS using the DICOM transfer protocols. As there are many kinds of measuring and segmentation tools, the table only shows whether the software has them or not.

Every piece of software introduced has measuring tools. As mentioned, MicroDicom is very simplistic. It only supports one file format and 2D images. It also has no annotation or segmentation tools. However, it can communicate with PACS, which MITK cannot. 3D Slicer and MITK both support many file formats. 3D Slicer supports a maximum of four dimensions while MITK supports three. Both have tools for annotation and segmentation. Like MicroDicom, 3D Slicer is capable of communicating with PACS.

Table 1.
Table 2.
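
As a rough illustration of the slice-combination step described in Section 2.1, the following Python sketch groups 2D images by patient, study and series, orders one series by physical position and derives a thickness for each slice from the distance to its neighbor. The `Slice2D` class, its field names and the helper functions are hypothetical stand-ins for this illustration only; real DICOM files carry the corresponding information in their tags, and real software would read it with a DICOM library rather than from hand-built objects.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Slice2D:
    """Hypothetical stand-in for one DICOM instance (a single 2D image)."""
    patient_id: str
    study_id: str
    series_id: str
    position_mm: float       # assumed: slice location along the stacking axis
    pixels: List[List[int]]  # rows x columns of pixel values


SeriesKey = Tuple[str, str, str]
Volume = List[List[List[int]]]


def group_by_series(slices: List[Slice2D]) -> Dict[SeriesKey, List[Slice2D]]:
    """Group instances by their (patient, study, series) identifiers."""
    groups: Dict[SeriesKey, List[Slice2D]] = defaultdict(list)
    for s in slices:
        groups[(s.patient_id, s.study_id, s.series_id)].append(s)
    return dict(groups)


def stack_series(series: List[Slice2D]) -> Tuple[Volume, List[float]]:
    """Order one series by position and assign a thickness to each slice.

    The thickness of a slice is taken as the distance to the next slice;
    the last slice reuses the previous gap (0.0 if there is only one slice).
    """
    ordered = sorted(series, key=lambda s: s.position_mm)
    volume = [s.pixels for s in ordered]
    gaps = [b.position_mm - a.position_mm for a, b in zip(ordered, ordered[1:])]
    thicknesses = gaps + (gaps[-1:] or [0.0]) if ordered else []
    return volume, thicknesses


# Tiny synthetic example: three 1x1 slices of one series given out of order,
# plus one slice that belongs to a different series.
demo = [
    Slice2D("P1", "ST1", "SE1", 5.0, [[3]]),
    Slice2D("P1", "ST1", "SE1", 0.0, [[1]]),
    Slice2D("P1", "ST1", "SE1", 2.5, [[2]]),
    Slice2D("P1", "ST1", "SE2", 0.0, [[9]]),
]
by_series = group_by_series(demo)
volume, thicknesses = stack_series(by_series[("P1", "ST1", "SE1")])
```

In this sketch a larger gap between neighboring slices directly produces a larger thickness, which is why sparsely sampled series yield coarse 3D images.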
3. VR Technology in Medical Imaging

VR-based medical imaging software is still a relatively new market. In this section we discuss some basic knowledge of medical imaging in VR and its main uses. A few examples of software that allow medical imaging in VR are introduced. As in Section 2, a table comparing the examples is shown at the end of the section.

3.1 Background on Medical Imaging in VR

The use of VR for medical purposes was already theorized in the 1990s. Hoffman and Vu [19] predicted the main uses to be teaching anatomy and clinical skills training. Both cases were said to save a lot of resources. Students would be able to reach a better understanding of the topic they are studying by seeing it from multiple points of view. Students would see parts of the body and their behavior that might not be seen with other methods. Clinical skills could be trained safely away from patients and with less supervision from professionals. Training in VR would provide a chance to train for rarer situations, and the material could be reused. These sentiments were later supported by Uppot et al. [20]. As an additional advantage, they brought up better communication between students.

The main use cases of medical imaging in VR are the visualization and analysis of 3D images. Surgery planning is also done to a certain degree. Unlike on desktop displays, medical images in VR are usually shown exclusively in 3D, with some exceptions. If the images are given in 2D form, they are combined to make a 3D image as mentioned in Section 2.1.

Research suggests that analyzing images in VR is faster and more accurate than on a monitor when the user has little to no experience with medical images. With users that have more experience, the situation is not as clear. Timonen et al. [21] found that when analysis tasks were done by experts, VR was somewhat slower than 2D. There was no difference in accuracy between the two methods.
In the questionnaire on user experience that was filled in after the trials, both novice and expert users rated the VR version as the better option in most categories. The 2D method won in the appearance of tools and in ergonomics. The biggest victories for VR were in depth perception and learning levels, the latter of which was supported by observations made during the trials.

3.2 Existing VR Medical Imaging Software

As medical imaging in VR is a relatively new technique, the number of available pieces of software is lower, especially on the open-source side. Most pieces of software support CT and MR images, but some support more. As with desktop imaging software, most pieces of VR software have measuring tools. Other very common tools are cutting, clipping, slicing or excluder tools. These tools allow the user to see inside 3D medical images. The user drags a tool of some shape into the image and creates a hole that can be looked into. While the tool comes in many different shapes, a planar clipping tool is the most common one.

Another common function found in most software is the ability to exclude certain pixel or HU values from the image. If the user knows what kind of values they are looking for in the image, they can remove the parts that are not needed. If the user has no interest in the bones, for example, they can easily remove them so that it is easier to see the important parts. Some pieces of software allow the user to select multiple ranges for the program to exclude, while others only allow exclusion at the lower and higher ends of the spectrum.

Two pieces of simple VR imaging software are the student version of Medical Imaging XR [22] and 3D Slicer, which was also mentioned in Section 2.2. Medical Imaging XR allows medical students to download a free limited version for their studies. The student version allows the user to view 3D CT and MR images, while the professional versions also support CBCT, STL and OBJ files.
The VR version of 3D Slicer is a stripped-down version of the desktop version. The user can still do the same things on the desktop as usual, but in VR mode the user interactions are limited to moving objects, including the image and tools, and marking areas. 3D Slicer has a store where the user can download extensions for the software. At the time of writing, Slicer Extensions Manager has 86 different extensions. Fig. 3 shows a VR view of 3D Slicer with two simultaneous users. In the image, the user can see another user marking an abnormality on a model created from a medical image.

Another piece of free software is DIVA. DIVA has measuring and clipping tools similar to those of the software mentioned earlier. DIVA's clipping tools come in multiple shapes and can be moved and rotated freely using VR controllers. DIVA only supports the Tagged Image File Format, also known as TIFF. Some other image formats can be converted to TIFF, but the conversion has to be performed by the user [23]. According to DIVA's website, besides CT and MR images the software also supports "all scientific imaging modalities." Instead of only offering an option to remove unwanted HU values, DIVA offers an option to change the color of values. This makes it even easier for the user to find things if they know what they are looking for.

DIVA has an extension called DIVA Cloud. DIVA Cloud is so far the only medical imaging software that allows annotation in VR. Unlike most annotation software, DIVA Cloud trains the AI during runtime instead of exporting the annotations to a file after the process. A research paper on the extension was published in January 2022 [24]. According to the research, annotation in VR was significantly faster than on the desktop. However, no comparison was made between making the same annotation on the desktop and in VR. Fig. 4 shows a part of an annotated image as seen in DIVA Cloud. Most of the original medical image has been hidden using clipping tools.
The annotated area is shown in green.

On the paid side of medical imaging VR software is SurgeryVision [25]. The base version supports CT and MR images in the DICOM format. Like the others, SurgeryVision has measuring and clipping tools and the possibility to exclude certain HU values from the image. Clipping tools are available in multiple shapes and can be moved and rotated freely using controllers in VR. The use of one of the clipping tools is shown in Fig. 5. There is also a marking tool and the possibility to record video. Most interactions involving the image are only available in VR. While a VR simulator is available for users who do not have the needed hardware, it is hard to use accurately. The user can buy additional modules that allow the use of AI, 3D printing and communication with PACS image archives.

Table 2 shows how the VR medical imaging software introduced compare to each other. Table 2 includes all the categories shown in Table 1 with some additions. While their desktop counterparts usually support all medical image modalities, VR medical imaging software might only support a few of them. Another addition to the categories is clipping tools, as they are necessary in cases where the user must see inside the model. As with measuring tools, they come in many forms, so Table 2 only shows whether they exist or not.

Compared to desktop medical imaging software, far fewer file formats are supported in VR. 3D Slicer supports the same formats as it does on the desktop. Medical Imaging XR supports seven different formats while DIVA and SurgeryVision support only one each. The same holds for modalities. 3D Slicer and DIVA support all modalities. Medical Imaging XR supports six different modalities while SurgeryVision only supports CT and MR images. DIVA is the only software that cannot communicate with PACS. Medical Imaging XR and 3D Slicer support 4D images while DIVA and SurgeryVision support three dimensions.
Every piece of software mentioned has measuring and clipping tools. As mentioned earlier in this section, DIVA has an extension that allows annotation in VR but does not allow the annotations to be exported. Medical Imaging XR is capable of segmenting images but cannot export the segments either. It also does not train an AI during runtime; therefore DIVA is better in this category. 3D Slicer does not allow annotation in VR.

4. Common Service Scenarios of Medical Imaging

In this section we discuss some common service scenarios that involve medical imaging. The examples should not be taken as absolute truth, as each healthcare facility has its own protocols. The examples provided are generalized and might differ greatly depending on the location and type of healthcare center. Patients' needs might also affect the scenario, especially in urgent cases. Each scenario is accompanied by a flowchart.

4.1 Surgery Planning for Voluntary Operation

In this scenario, the patient has come for a consultation for a voluntary surgery. The scenario starts when the patient walks into the doctor's office and ends when a surgery plan has been finished. It is assumed that there is no need for the patient to come for a second appointment to take images and that no problems arise during the consultation, imaging or surgery planning. The scenario proceeds as follows:

1. The patient arrives at the doctor's office.
2. A pre-consultation takes place where the operation is discussed. The risks and benefits of the surgery are discussed. Possible existing conditions which could prevent the surgery from happening are discussed to minimize risks.
3. The patient goes to the imaging room. Images related to the surgery planning are taken and transferred to the doctor responsible for planning the surgery.
4. The doctor analyzes the images. The images are segmented either manually or automatically using AI.
5. Based on the images and the information discussed during step 2, a surgery plan is made.
The plan is verified by other professionals familiar with the operation type.

4.2 Medical Image Segmentation Training for Students

In this scenario a student is tested on their understanding of medical images. The scenario is similar to a normal exercise except that the results are evaluated at the end. In this example the images are CT images showing the upper torso of a person. The student must divide the images into segments based on organs. The segments should include the heart, lungs and bones. The scenario ends when the resulting segments have been evaluated. The scenario proceeds as follows:

1. The student opens a new image.
2. The student segments the image either manually with a mouse or by using automated tools, depending on the complexity of the organs seen.
3. The student proceeds to the next image if there are any left. Otherwise, they submit their work to be evaluated.
4. A judge evaluates the segments based on speed and accuracy.
5. The results are sent to the student.

5. Requirements of VR Annotation Software

Annotating medical images in VR requires a powerful computer. Depending on the software, a 2D annotation program can run smoothly on a lower-end computer. However, when the annotation is done on a 3D image, it requires significantly more resources. An average 3D medical image has tens of millions of voxels. Most existing pieces of software are capable of reading pixel or HU values quickly. For mid-tier computers, problems appear when the desired area is made into a 3D model during annotation. Generally, medical image annotation and segmentation programs use either the marching squares or the marching cubes algorithm, depending on the number of dimensions in the image. In marching cubes, the program creates a cube for every coordinate used in the model [26]. Depending on the complexity and size of the model, the number of cubes might reach millions. The program needs to keep information on each of these cubes, which requires a lot of memory.
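
To give a feel for the scale involved, the following Python sketch counts the cubes in a small binary model and estimates the memory needed to store one transform record per cube. This is a back-of-the-envelope illustration under stated assumptions, not a description of how any of the surveyed programs store their data: the record layout (position, rotation and scale stored as ten 4-byte floats) and the names `count_cubes`, `estimate_transform_bytes` and `solid_block` are hypothetical.

```python
from typing import List

Mask3D = List[List[List[int]]]  # mask[z][y][x] is 1 inside the model, else 0

BYTES_PER_FLOAT = 4
# Assumed per-cube record: position (3) + rotation quaternion (4) + scale (3).
FLOATS_PER_CUBE = 3 + 4 + 3


def solid_block(nx: int, ny: int, nz: int) -> Mask3D:
    """Build a fully filled nx x ny x nz mask for testing."""
    return [[[1] * nx for _ in range(ny)] for _ in range(nz)]


def count_cubes(mask: Mask3D) -> int:
    """Count the voxels that belong to the model (one cube per voxel)."""
    return sum(1 for plane in mask for row in plane for value in row if value)


def estimate_transform_bytes(n_cubes: int,
                             floats_per_cube: int = FLOATS_PER_CUBE) -> int:
    """Estimate the memory needed to keep one transform record per cube."""
    return n_cubes * floats_per_cube * BYTES_PER_FLOAT


small_model = solid_block(9, 9, 9)
n_small = count_cubes(small_model)                        # 9 * 9 * 9 cubes
bytes_for_million = estimate_transform_bytes(1_000_000)   # one million cubes
```

Under these assumptions a model of one million cubes already needs about 40 MB for the transform records alone, before any mesh or texture data is counted.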
The cubes also have to be rendered on the screen, which might cause problems for graphics cards. Memory problems can be partly solved by combining the cubes into bigger objects [27]. This way the software can release some of the memory used to remember the scale and rotation of each cube. Graphics card problems are tackled with a method that is also common in video games, where only the objects relevant at any given moment are rendered. This method is called surface rendering [28]. If the model is a solid 9×9×9 cube, the inner 7×7×7 cube is never rendered. While the inner cube still exists in memory, the computer does not use resources to render it. This is very useful when the models are bigger and bulkier. Annotating veins will remain a problem due to their shape.

Both memory and graphics problems can cause problems with the framerate. A stable and high framerate is more important in VR than on desktop displays. While the desktop software MITK suffers from a low framerate when a 3D model of a segment has been created, it is not as serious a problem as it would be in VR. On desktop displays, a low framerate results in a slower pace of work and discomfort. In VR, a low framerate can cause symptoms similar to those of motion sickness. These symptoms were brought up as a limiting factor for VR and AR usage in the study of Uppot et al. [20] mentioned in Section 3.1. Manufacturers of VR equipment consider 90 frames per second as a minimum [29], but some of them recommend at least 120 frames per second. A higher framerate is always considered better. Other factors also contribute to the symptoms, but a high framerate should lower the risk.

DIVA Cloud suffered from a very poor framerate in VR. Most images had a framerate below 60 frames per second. Two of the 12 images tested had suitable framerates of 110 and 115 frames per second, while three images dropped below 20 frames per second.
Most of the same images had a framerate of at least 100 frames per second on a desktop display, with the highest being 1100.

To use annotated images as training material for AI, they have to be in the right format. As was mentioned in Section 3.2, DIVA Cloud trains the AI during runtime, so there is no need to export annotations to a file. This is a problem if the developer wants to change the AI used. There are usually no other limitations on the format except that it needs to follow the DICOM or NIfTI standard. NVIDIA Clara expects the user to provide two NIfTI files for each annotation [30]. NVIDIA provides a data converter that can be used to make sure the files are in the right format. One of the files contains the original NIfTI image without any modifications made during annotation. The second file should have all the same information as the first file, except that the image should only show the part included in the annotation.

One file can include multiple areas of interest. Usually, if the file has only one area of interest, the coordinates of that area have a pixel value of 1 and every other coordinate has a pixel value of 0. If there are more areas of interest, they are assigned different values. For example, if the file is used to show lung cancer, the healthy part of a lung could have a value of 1 while the cancerous tumor would have a value of 2. Everything else would have a value of 0. Fig. 6 shows an example of a cross-sectional view of a NIfTI file representing a liver with two different areas of interest as seen in MITK. The black area has a pixel value of 0, the gray area a value of 1 and the white area a value of 2. Fig. 7 shows a 3D model made from the same file.

6. Conclusion and Future Work

In this paper, we discussed medical imaging on the desktop and in VR with examples. We also discussed the requirements for making a working annotation program.
While annotation of medical images in VR was made possible with DIVA Cloud, there are still obstacles that need to be overcome. DIVA Cloud showed that annotating in VR can save time, but running the software requires such a powerful computer that it is not a valid option for wide use at the time of writing.

In our future work, we will attempt to tackle the problems DIVA Cloud faced. We plan to make an extension to SurgeryVision that will allow us to annotate medical images in VR and export the annotations in a format that is suitable for NVIDIA Clara. We will attempt to reach a framerate that satisfies the standard of 90 frames per second that VR hardware manufacturers consider as a minimum. The framerate should be reached with the current recommended hardware specifications of SurgeryVision or with only small improvements.

Biography

Mika Anttonen
https://orcid.org/0000-0002-3396-0572
He received a Bachelor of Engineering degree in Information and Communications Technology (ICT) from Turku University of Applied Sciences, with a focus on Game Development, in 2019. Since March 2020, he has been with the Department of Computer Science and Engineering at Seoul National University of Science and Technology as a Master's student. His current research interests include virtual reality and medical imaging.

Biography

Dongwann Kang
https://orcid.org/0000-0001-7210-4595
He is currently an assistant professor in the Department of Computer Science and Engineering, Seoul National University of Science and Technology, Seoul, Korea. He received the Ph.D. degree from Chung-Ang University, South Korea, in 2013, where he was a research fellow until 2015.
He was a lecturer in the Undergraduate Interdisciplinary Program in Computational Sciences, Seoul National University, South Korea, from 2014 to 2015; a lecturer with the Department of Multimedia, Sookmyung Women's University, South Korea, in 2014; a visiting researcher, from 2015 to 2018, and a Marie Sklodowska-Curie Fellow of the Faculty of Science and Technology, Bournemouth University, UK, in 2018. His research interests include non-photorealistic rendering and animation, computer vision, affective computing, and computational aesthetics.

References