CLASSIFICATION OF BIOMEDICAL IMAGES USING CONTENT BASED IMAGE RETRIEVAL SYSTEMS

Because of the numerous applications of content-based image retrieval (CBIR) systems in various areas


Content-Based Image Retrieval
Content-based image retrieval (CBIR) is a process in which a query image is given to a system and the complete image dataset is searched for similar images based on features such as color, size, and texture. Histogram equalization, standard deviation, the discrete wavelet transform, HSL (Hue, Saturation, Lightness), and HSV (Hue, Saturation, Value) are some of the parameters that can be used to compare various feature-matching techniques [1].

Histogram
A histogram is a type of bar plot for numeric data that groups the data into bins. In MATLAB, the histogram of an image is calculated using the following syntax [2]:

 imhist(I)
This call calculates the histogram for the intensity image I and displays a plot of the histogram. The number of bins in the histogram is determined by the image type [https://in.mathworks.com/help/images/ref/imhist.html#buo3qek-2_1].
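As a minimal sketch of the call above, the following computes the histogram counts without drawing the plot. It assumes the Image Processing Toolbox is installed and uses a small synthetic uint8 matrix in place of a real image; the variable names are illustrative and do not come from the original work.

```matlab
% Synthetic 4x4 grayscale image (uint8), standing in for a real image.
I = uint8([  0   0  50  50;
             0   0  50  50;
           200 200 255 255;
           200 200 255 255]);

% With output arguments, imhist returns the bin counts and bin
% locations instead of displaying the plot.
[counts, binLocations] = imhist(I);

% For a uint8 grayscale image, imhist uses 256 bins by default,
% one per intensity value 0..255.
numBins = numel(counts);

% Number of pixels with intensity 50. MATLAB indices start at 1,
% so intensity v is stored at index v + 1.
countAt50 = counts(51);

% The counts always add up to the number of pixels in the image.
totalPixels = sum(counts);
```

Calling imhist(I) with no output arguments, as in the syntax shown earlier, displays the same counts as a plot.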

Gray-Level Co-Occurrence Matrix (GLCM)
The GLCM, also known as the gray-level spatial dependence matrix, is a statistical method for examining texture that considers the spatial relationship of pixels. The GLCM functions characterize the texture of an image by calculating how often pairs of pixels with specific values and in a specified spatial relationship occur in an image, creating a GLCM, and then extracting statistical measures from this matrix [3].
In MATLAB, a gray-level co-occurrence matrix is created from an image using the following syntax:

 glcms = graycomatrix(I)
This creates a gray-level co-occurrence matrix (GLCM) from the image I. The function returns one or more gray-level co-occurrence matrices, depending on the name-value arguments supplied. By default, graycomatrix creates the GLCM by calculating how often a pixel with gray-level (grayscale intensity) value i occurs horizontally adjacent to a pixel with the value j.
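The counting described above can be illustrated on a small matrix. The sketch below assumes the Image Processing Toolbox and uses a synthetic 4x4 image with four gray levels; the explicit NumLevels, GrayLimits, and Offset arguments are chosen here only so that each pixel value maps to its own level, and are not settings reported in the original work.

```matlab
% Synthetic 4x4 image with gray levels 0..3.
I = [0 0 1 1;
     0 0 1 1;
     0 2 2 2;
     2 2 3 3];

% Offset [0 1] pairs each pixel with its right-hand neighbour, which is
% also the default. NumLevels and GrayLimits are set so that the values
% 0, 1, 2, 3 map to GLCM indices 1, 2, 3, 4.
glcm = graycomatrix(I, 'NumLevels', 4, 'GrayLimits', [0 3], ...
                       'Offset', [0 1]);

% Entry glcm(i, j) counts how often level i has level j immediately
% to its right. Each of the 4 rows contributes 3 horizontal pairs.
totalPairs = sum(glcm(:));

% Statistical measures extracted from the matrix.
stats = graycoprops(glcm, {'Contrast', 'Correlation', ...
                           'Energy', 'Homogeneity'});
```

For example, the pair (0, 0) occurs twice in this image (once in each of the first two rows), so glcm(1, 1) is 2.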

Correlation Coefficient
To express the match between two images as a percentage, the correlation coefficient between them is calculated. The function for calculating the correlation coefficient in MATLAB is [4]:
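The function name is not reproduced in the text above. One function commonly used for this purpose, assumed here rather than confirmed by the source, is corr2 from the Image Processing Toolbox, which returns the 2-D correlation coefficient between two arrays of the same size. The sketch below uses small synthetic matrices and also writes the computation out by hand for comparison.

```matlab
% Small synthetic "images" of the same size.
A = [1 2; 3 4];
B = 2 * A + 10;   % positive linear transform of A
C = -A;           % sign-flipped copy of A

% 2-D correlation coefficient (assumed to be the intended function).
rAB = corr2(A, B);   % perfectly correlated pair
rAC = corr2(A, C);   % perfectly anti-correlated pair

% The same quantity written out: subtract each mean, then normalise
% the sum of products by the product of the root sums of squares.
a = A - mean(A(:));
b = B - mean(B(:));
rManual = sum(a(:) .* b(:)) / sqrt(sum(a(:).^2) * sum(b(:).^2));

% One possible way to express the result as a matching percentage
% (an assumption, not a formula given in the source).
matchPercent = 100 * rAB;
```

The coefficient lies between -1 and 1, so a percentage derived this way only makes sense as a similarity score for non-negative values.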

Applications of CBIR
1) An information system related to biodiversity can be developed that contains information on various species in the form of images. In such a system, an image can be given as input and similar images can be searched for based on characteristics such as size, texture, and color.
2) CBIR can help in the investigation of criminal cases. Images related to a case can be uploaded to the database, and a person or place can then be searched for and identified.
3) E-commerce can use a CBIR system that recommends relevant products to the customer based on features such as color, shape, and texture, according to the user's liking.
4) CBIR can be developed for medical applications so that the treatment of diseases, the diagnosis of disease at an early stage, and similar tasks can be done more efficiently [5].
5) Remote sensing, weather forecasting, surveillance, and historical research are other areas where content-based applications can be used to automate the search process and find crucial information in vast amounts of data.

Research Paper Work Done
Rani et al. [6]: The main objective of their work is to use the Support Vector Machine more efficiently so that results can be produced in less execution time. In their proposed work, images are pre-processed before they are stored in the database. This step enhances the quality of the images by removing noise. The RGB model is then used for clustering the images, after which the images are re-clustered using the support vector machine (SVM) algorithm.
Chaudhary et al. [7]: In their proposed work, an integrated approach is used to extract color and texture features from images. Multi-feature extraction is used so that image retrieval can be performed efficiently. A higher-order color moment is used for the extraction of color features. Texture extraction and face recognition are done using the local binary pattern (LBP).
Trojacanec et al. [8]: In their proposed work, the two-level CBIR architecture is improved. In the first level, the clinician's voting is included, and in the second level, it is included in the previously computed agents' voting. The aim of these two levels is to increase the efficiency of the proposed work and the accuracy of the retrieval process.
Antani et al. [9]: Their research focuses on developing techniques for hybrid text/image retrieval from the survey of text and image data. Their research also describes various challenges that arise when a CBIR system is developed specifically for biomedical images.
Khan et al. [10]: This research compares three different approaches to CBIR based on the image features and similarity measures used to find the similarity between two images. The main conclusions are as follows:
1. A Novel Fusion Approach (NFA) incorporated both local and global features and fused them for retrieving images, so it has the properties of both local and global image descriptors.
2. A Universal Model (UM) method used only local features to construct a feature vector.
3. A Genetic Programming Framework (GPF) considered only global image features.
Aloni et al. [11]: In their proposed work, a novel classification-driven, learning-based framework is developed for the retrieval of medical images of different modalities. A direct link between the classification and retrieval processes is established for efficient working. A relevance feedback (RF) based technique is used to update the feature weights based on feedback given by the user. The authors have also used a multiclass support vector machine (SVM) classifier for category prediction.
Ramos et al. [12]: In their proposed work, radiology reports are used to supervise CBIR. The main phases are inferring the relationships between patients and subsequently applying them to supervise metric learning algorithms. Their method calculates the text distances between exam reports and uses them in the exam image space to supervise a metric learning algorithm. Using this method, there is a consistent increase in CBIR performance.

Dataset Characterization
In the present work, a dataset from the National Biomedical Imaging Archive (NBIA) is used. NBIA is a searchable repository of in vivo images that provides the biomedical research community, industry, and academia with access to image archives. The dataset of breast cancer images with the protocol name Bilat Breast Lesion Vib is used in the present work for observations [13].

Workflow of Proposed Work
1) Collect the images from the repository of the National Biomedical Imaging Archive (NBIA).
2) Select an image, say X, for which similar images are to be searched.
3) Draw the histogram of this image.
4) Input the percentage against which matching is to be done.
5) Select the first image from the repository and calculate its GLCM and histogram.
6) Calculate the correlation coefficient of these two images.
7) If the value of the correlation coefficient satisfies the matching percentage entered in Step 4, the two images are considered the same.
8) Load the next image and repeat from Step 5.
A diagrammatic representation of the workflow is shown in Fig. 1.
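The steps above can be sketched as a short MATLAB script. This is an illustrative outline under stated assumptions, not the authors' implementation: the repository is replaced by three synthetic images so that the script runs without any files, the correlation coefficient is assumed to be computed with corr2 between the GLCMs (and, separately, between the histograms), and a match is assumed to mean that the GLCM score, expressed as a percentage, is at least the entered threshold. All variable names are hypothetical.

```matlab
% Step 2: query image X (synthetic horizontal intensity ramp).
X = uint8(repmat(0:4:252, 64, 1));

% Step 1 (stand-in): hypothetical repository of grayscale images.
% Real images would be loaded with imread instead.
repository = {X, X', uint8(magic(64) / 16)};

% Step 4: matching percentage (assumed value).
thresholdPercent = 90;

% Step 3: histogram and GLCM of the query image.
queryHist = imhist(X);
queryGlcm = graycomatrix(X);

numImages  = numel(repository);
glcmScores = zeros(numImages, 1);
histScores = zeros(numImages, 1);
isMatch    = false(numImages, 1);

for k = 1:numImages
    % Steps 5 and 8: take the next image and compute its features.
    candidate = repository{k};
    candHist  = imhist(candidate);
    candGlcm  = graycomatrix(candidate);

    % Step 6: correlation coefficients. Comparing the GLCMs and the
    % histograms (rather than raw pixels) is an assumption.
    glcmScores(k) = corr2(queryGlcm, candGlcm);
    histScores(k) = corr2(queryHist, candHist);

    % Step 7: assumed rule, match if the GLCM score reaches the
    % entered percentage.
    isMatch(k) = 100 * glcmScores(k) >= thresholdPercent;
end
```

Because the first repository entry is the query image itself, its scores are 1 and it is reported as a match; the remaining scores depend on how similar the texture statistics of each candidate are to those of X.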

Conclusion and Future Scope
In the present work, CBIR is implemented using GLCM and is used to search for similar images among various images of breast cancer. The proposed system can be helpful to physicians. By matching the images of the present patient with those of previous patients, diagnosis can be done at an early stage, and the most effective medicine can be prescribed accordingly. The present work implements sequential matching of each image one by one. In future work, other programming languages with multithreading support [14] can be used so that parallel matching can be started at the same time and execution time can be decreased. Cache-efficient algorithms can be used for faster execution [15] [16]. Hadoop-based image processing techniques are also effective for parallelizing the tasks [17].