Hadoop Optimization for Massive Image Processing: Case Study Face Detection
Keywords:
Hadoop, MapReduce, Cloud Computing, Face Detection
Abstract
Face detection applications are widely used for searching, tagging, and classifying people in very large image databases. Such applications must process a large number of relatively small images. The Hadoop Distributed File System (HDFS), however, was originally designed for storing and processing large files. A huge number of small images slows HDFS down by increasing the total initialization time of jobs, the scheduling overhead of tasks, and the memory usage of the file system manager (Namenode). This paper presents two approaches to improve the small image file processing performance of HDFS: (1) merging the images into a single large file, and (2) combining many images into a single task without merging them. We also introduce novel Hadoop file formats and record generation methods (for reading image content) in order to develop these techniques.
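The two approaches can be illustrated with a minimal sketch in plain Python. It does not use the Hadoop API and is not the file format or record reader introduced in the paper; the function names (merge_images, read_records, combine_splits), the length-prefixed record layout, and the greedy grouping rule are all assumptions made for illustration. The first pair of functions packs many small images into one container and reads them back as records; the last function groups files into per-task splits by size without merging them.

```python
import io
import struct


def merge_images(images, out):
    """Approach (1): append each (name, data) pair to `out` as one record.

    Hypothetical layout: two big-endian uint32 lengths, then name, then data.
    """
    for name, data in images:
        name_bytes = name.encode("utf-8")
        out.write(struct.pack(">II", len(name_bytes), len(data)))
        out.write(name_bytes)
        out.write(data)


def read_records(src):
    """Yield (name, data) pairs back from a merged container."""
    while True:
        header = src.read(8)
        if len(header) < 8:  # end of container
            return
        name_len, data_len = struct.unpack(">II", header)
        name = src.read(name_len).decode("utf-8")
        data = src.read(data_len)
        yield name, data


def combine_splits(file_sizes, max_split_bytes):
    """Approach (2): group (name, size) pairs into splits without merging.

    Files are taken in order; a split is closed when adding the next file
    would exceed max_split_bytes. An oversized file gets its own split.
    """
    splits, current, current_size = [], [], 0
    for name, size in file_sizes:
        if current and current_size + size > max_split_bytes:
            splits.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        splits.append(current)
    return splits
```

In this sketch each split would correspond to one map task, so the number of tasks depends on the total data size rather than on the number of image files. That is the effect both approaches aim for.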
License
ONLINE OPEN ACCESS: Access to the full text of each article and each issue is free under the Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license.
You are free to:
-Share: copy and redistribute the material in any medium or format;
-Adapt: remix, transform, and build upon the material.
The licensor cannot revoke these freedoms as long as you follow the license terms.
DISCLAIMER: The author(s) of each article appearing in International Journal of Computers Communications & Control is/are solely responsible for the content thereof; the publication of an article shall not constitute or be deemed to constitute any representation by the Editors or Agora University Press that the data presented therein are original, correct or sufficient to support the conclusions reached or that the experiment design or methodology is adequate.