TY - JOUR
T1 - Hadoop-MCC
T2 - Efficient multiple compound comparison algorithm using hadoop
AU - Hua, Guan Jie
AU - Hung, Che Lun
AU - Tang, Chuan Yi
N1 - Publisher Copyright:
© 2018 Bentham Science Publishers.
PY - 2018
Y1 - 2018
N2 - Aim and Objective: In the past decade, the drug design technologies have been improved enormously. The computer-aided drug design (CADD) has played an important role in analysis and prediction in drug development, which makes the procedure more economical and efficient. However, computation with big data, such as ZINC containing more than 60 million compounds data and GDB-13 with more than 930 million small molecules, is a noticeable issue of time-consuming problem. Therefore, we propose a novel heterogeneous high performance computing method, named as Hadoop-MCC, integrating Hadoop and GPU, to copy with big chemical structure data efficiently. Materials and Methods: Hadoop-MCC gains the high availability and fault tolerance from Hadoop, as Hadoop is used to scatter input data to GPU devices and gather the results from GPU devices. Hadoop framework adopts mapper/reducer computation model. In the proposed method, mappers response for fetching SMILES data segments and perform LINGO method on GPU, then reducers collect all comparison results produced by mappers. Due to the high availability of Hadoop, all of LINGO computational jobs on mappers can be completed, even if some of the mappers encounter problems. Results: A comparison of LINGO is performed on each the GPU device in parallel. According to the experimental results, the proposed method on multiple GPU devices can achieve better computational performance than the CUDA-MCC on a single GPU device. Conclusion: Hadoop-MCC is able to achieve scalability, high availability, and fault tolerance granted by Hadoop, and high performance as well by integrating computational power of both of Hadoop and GPU. It has been shown that using the heterogeneous architecture as Hadoop-MCC effectively can enhance better computational performance than on a single GPU device.
AB - Aim and Objective: In the past decade, the drug design technologies have been improved enormously. The computer-aided drug design (CADD) has played an important role in analysis and prediction in drug development, which makes the procedure more economical and efficient. However, computation with big data, such as ZINC containing more than 60 million compounds data and GDB-13 with more than 930 million small molecules, is a noticeable issue of time-consuming problem. Therefore, we propose a novel heterogeneous high performance computing method, named as Hadoop-MCC, integrating Hadoop and GPU, to copy with big chemical structure data efficiently. Materials and Methods: Hadoop-MCC gains the high availability and fault tolerance from Hadoop, as Hadoop is used to scatter input data to GPU devices and gather the results from GPU devices. Hadoop framework adopts mapper/reducer computation model. In the proposed method, mappers response for fetching SMILES data segments and perform LINGO method on GPU, then reducers collect all comparison results produced by mappers. Due to the high availability of Hadoop, all of LINGO computational jobs on mappers can be completed, even if some of the mappers encounter problems. Results: A comparison of LINGO is performed on each the GPU device in parallel. According to the experimental results, the proposed method on multiple GPU devices can achieve better computational performance than the CUDA-MCC on a single GPU device. Conclusion: Hadoop-MCC is able to achieve scalability, high availability, and fault tolerance granted by Hadoop, and high performance as well by integrating computational power of both of Hadoop and GPU. It has been shown that using the heterogeneous architecture as Hadoop-MCC effectively can enhance better computational performance than on a single GPU device.
KW - Big data
KW - Compound comparision
KW - GPU
KW - Hadoop
KW - High performance computing
KW - LINGO
UR - http://www.scopus.com/inward/record.url?scp=85047808645&partnerID=8YFLogxK
U2 - 10.2174/1386207321666180102120641
DO - 10.2174/1386207321666180102120641
M3 - Article
C2 - 29295690
AN - SCOPUS:85047808645
SN - 1386-2073
VL - 21
SP - 84
EP - 92
JO - Combinatorial Chemistry and High Throughput Screening
JF - Combinatorial Chemistry and High Throughput Screening
IS - 2
ER -