Performing MapReduce on Data Centers with Hierarchical Structures
Keywords:
MapReduce, Data Center, distributed hash table (DHT)
Abstract
Data centers are distributed information systems built for massive data storage and processing. The structure of a data center determines how its servers, links, and switches are interconnected. Several hierarchical structures have been proposed to improve the topological performance of data centers. By using recursively defined topologies, these structures support general applications and services with high scalability and reliability. However, they overlook the requirements of specific applications that run on data centers, such as MapReduce, a well-known distributed data processing application. The communication and control mechanisms used to perform MapReduce on the traditional structure cannot be applied to the hierarchical structures. In this paper, we propose a methodology for performing MapReduce on data centers with hierarchical structures. Our methodology is based on the distributed hash table (DHT), an efficient data retrieval approach for distributed systems. We use the advantages of DHT, including decentralization, fault tolerance, and scalability, to address the main problems that hierarchical data centers face in supporting MapReduce. A comprehensive evaluation demonstrates the feasibility and excellent performance of our methodology.
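To make the DHT idea concrete, the following is a minimal sketch of a consistent-hashing ring that assigns intermediate MapReduce keys to servers without a central lookup table. It is an illustration under stated assumptions, not the methodology evaluated in the paper: the class `HashRing`, the helper `ring_position`, the server names, and the choice of SHA-1 are all hypothetical.

```python
import bisect
import hashlib


def ring_position(name: str) -> int:
    """Map a string to a point on the ring (SHA-1 is an illustrative choice)."""
    return int(hashlib.sha1(name.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical consistent-hashing ring; not the structure used in the paper."""

    def __init__(self, servers):
        # Each server owns the arc of the ring that ends at its own position.
        self._points = sorted((ring_position(s), s) for s in servers)

    def lookup(self, key: str) -> str:
        """Return the server whose position is the first at or after the key's."""
        if not self._points:
            raise LookupError("ring has no servers")
        positions = [p for p, _ in self._points]
        i = bisect.bisect_left(positions, ring_position(key))
        # Wrap around to the first server when the key hashes past the last one.
        return self._points[i % len(self._points)][1]

    def remove(self, server: str) -> None:
        """Drop a failed server; its keys fall through to the next server."""
        self._points = [(p, s) for p, s in self._points if s != server]


ring = HashRing(["server-0", "server-1", "server-2", "server-3"])
owner = ring.lookup("word-17")  # server responsible for this intermediate key
```

In this sketch, removing a server reassigns only the keys that server owned, while every other key keeps its owner. That property is one way to read the decentralization and fault tolerance the abstract attributes to DHT.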
License
ONLINE OPEN ACCESS: Access to the full text of each article and each issue is free under the Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license.
You are free to:
-Share: copy and redistribute the material in any medium or format;
-Adapt: remix, transform, and build upon the material.
The licensor cannot revoke these freedoms as long as you follow the license terms.
DISCLAIMER: The author(s) of each article appearing in International Journal of Computers Communications & Control is/are solely responsible for the content thereof; the publication of an article shall not constitute or be deemed to constitute any representation by the Editors or Agora University Press that the data presented therein are original, correct or sufficient to support the conclusions reached or that the experiment design or methodology is adequate.