TY - JOUR
T1 - Deep learning for printed document source identification
AU - Tsai, Min-Jen
AU - Tao, Yu Han
AU - Yuadi, Imam
N1 - Publisher Copyright: © 2018 Elsevier B.V.
PY - 2019/2
Y1 - 2019/2
N2 - Owing to the rapid development of information technology and the wide use of the Internet, information is easily obtained in digital form. Because printers are convenient and accessible, digital content can be freely printed into documents. However, printed documents can also be misused for criminal purposes such as document forgery, currency counterfeiting, and copyright infringement. Developing an efficient and appropriate safety testing tool to identify the source of printed documents is therefore an important task. Existing forensic systems based on statistical methods and support vector machines (SVMs) can already identify the source printer of text and image documents. Such approaches belong to the category of shallow machine learning and require human interaction during feature extraction, feature selection, and data pre-processing. In this paper, a deep learning system based on convolutional neural networks (CNNs), which learn features automatically, is developed to solve this complex image classification problem. Systematic experiments have been performed on both systems. For microscopic documents, the feature-based SVM system outperforms the deep learning system by a small margin. For scanned documents, both systems perform equally well, with high accuracy. Both systems should continue to be evaluated and compared in the interest of universal applicability.
AB - Owing to the rapid development of information technology and the wide use of the Internet, information is easily obtained in digital form. Because printers are convenient and accessible, digital content can be freely printed into documents. However, printed documents can also be misused for criminal purposes such as document forgery, currency counterfeiting, and copyright infringement. Developing an efficient and appropriate safety testing tool to identify the source of printed documents is therefore an important task. Existing forensic systems based on statistical methods and support vector machines (SVMs) can already identify the source printer of text and image documents. Such approaches belong to the category of shallow machine learning and require human interaction during feature extraction, feature selection, and data pre-processing. In this paper, a deep learning system based on convolutional neural networks (CNNs), which learn features automatically, is developed to solve this complex image classification problem. Systematic experiments have been performed on both systems. For microscopic documents, the feature-based SVM system outperforms the deep learning system by a small margin. For scanned documents, both systems perform equally well, with high accuracy. Both systems should continue to be evaluated and compared in the interest of universal applicability.
KW - Convolutional neural networks
KW - Deep learning
KW - Machine learning
KW - Printer source identification
UR - http://www.scopus.com/inward/record.url?scp=85055720316&partnerID=8YFLogxK
U2 - 10.1016/j.image.2018.09.006
DO - 10.1016/j.image.2018.09.006
M3 - Article
AN - SCOPUS:85055720316
SN - 0923-5965
VL - 70
SP - 184
EP - 198
JO - Signal Processing: Image Communication
JF - Signal Processing: Image Communication
ER -