Optimal Data File Allocation for All-to-All Comparison in Distributed System: A Case Study on Genetic Sequence Comparison
Keywords:
distributed system, all-to-all comparison, mixed integer linear programming (MILP), file allocation, load balancing
Abstract
To address the unbalanced load of data files in large-scale all-to-all comparison in a distributed system environment, the differences among the files themselves are fully considered. This paper aims to fully utilize the advantages of distributed systems to improve file allocation for all-to-all comparison between the data files in a large dataset. For this purpose, the author formally describes the all-to-all comparison problem and constructs a data allocation model via mixed integer linear programming (MILP). A data allocation algorithm was then developed in MATLAB using the intlinprog function, which applies the branch-and-bound method. Finally, the model and algorithm were verified through several experiments. The results show that the proposed file allocation strategy achieves a basic load balance across the nodes of the distributed system without exceeding the storage capacity of any node, and completely localizes the data files. The research findings can be applied to such fields as bioinformatics, biometrics and data mining.
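As an illustration of the kind of model the abstract describes, the following is a minimal MILP sketch, not the formulation from the paper itself. All symbols are assumptions introduced here: files $i = 1,\dots,M$ with sizes $s_i$, nodes $k = 1,\dots,N$ with storage capacities $C_k$, an estimated cost $c_{ij}$ for comparing files $i$ and $j$, a binary $x_{ik}$ that is 1 when file $i$ is stored on node $k$, a binary $y_{ijk}$ that is 1 when the comparison of $i$ and $j$ runs on node $k$, and $T$ as the largest per-node load.

```latex
\begin{align}
\min_{x,\,y,\,T} \quad & T \\
\text{s.t.} \quad
& \sum_{k=1}^{N} y_{ijk} = 1, && 1 \le i < j \le M
  && \text{(each pair is compared exactly once)} \\
& y_{ijk} \le x_{ik}, \quad y_{ijk} \le x_{jk}, && 1 \le i < j \le M,\ 1 \le k \le N
  && \text{(both files are local to the node)} \\
& \sum_{i=1}^{M} s_i\, x_{ik} \le C_k, && 1 \le k \le N
  && \text{(storage capacity)} \\
& \sum_{1 \le i < j \le M} c_{ij}\, y_{ijk} \le T, && 1 \le k \le N
  && \text{(load on node } k \text{ is at most } T\text{)} \\
& x_{ik},\, y_{ijk} \in \{0,1\}, \quad T \ge 0 .
\end{align}
```

Under these assumptions, minimizing $T$ balances the comparison load across nodes, the capacity constraint keeps every node within its storage limit, and the coupling constraints between $y$ and $x$ ensure that every comparison reads only local files. A problem in this form, with linear objective and constraints and integer-restricted variables, is the type that MATLAB's intlinprog accepts.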
License
ONLINE OPEN ACCESS: Access to the full text of each article and each issue is free under the Attribution-NonCommercial 4.0 International license (CC BY-NC 4.0).
You are free to:
-Share: copy and redistribute the material in any medium or format;
-Adapt: remix, transform, and build upon the material.
The licensor cannot revoke these freedoms as long as you follow the license terms.
DISCLAIMER: The author(s) of each article appearing in International Journal of Computers Communications & Control is/are solely responsible for the content thereof; the publication of an article shall not constitute or be deemed to constitute any representation by the Editors or Agora University Press that the data presented therein are original, correct or sufficient to support the conclusions reached or that the experiment design or methodology is adequate.