Purpose: To evaluate ways to improve the generalizability of a deep learning algorithm for identifying glaucomatous optic neuropathy (GON) from a limited number of fundus photographs, and to identify the key features used for classification.
Methods: A total of 944 fundus images from Taipei Veterans General Hospital (TVGH) were retrospectively collected. Clinical and demographic characteristics, including structural and functional measurements for the images with GON, were recorded. Transfer learning based on VGGNet was used to construct a convolutional neural network (CNN) to identify GON. To avoid missing cases with advanced GON, an ensemble model was adopted in which a support vector machine classifier made the final classification based on cup-to-disc ratio whenever the CNN classifier returned a low confidence score. The CNN classifier was first established using the TVGH dataset and then fine-tuned by combining the training images of the TVGH and Drishti-GS datasets. Class activation mapping (CAM) was used to identify the key features used for CNN classification. The performance of each classifier was assessed by the area under the receiver operating characteristic curve (AUC) and compared with that of the ensemble model by diagnostic accuracy.
Results: In 187 TVGH test images, the accuracy, sensitivity, and specificity of the CNN classifier were 95.0%, 95.7%, and 94.2%, respectively, and the AUC was 0.992; the ensemble model had an accuracy of 92.8%. For the Drishti-GS test images, the accuracy of the CNN, the fine-tuned CNN, and the ensemble model was 33.3%, 80.3%, and 80.3%, respectively. The CNN classifier did not misclassify images with moderate to severe disease. Class-discriminative regions revealed by CAM co-localized with known characteristics of GON.
Conclusions: An ensemble model or a fine-tuned CNN classifier may be a viable design for building a generalizable deep learning model for glaucoma detection when large image databases are not available.
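
The ensemble rule described in the Methods (defer to a cup-to-disc-ratio classifier when the CNN is not confident) can be illustrated with a minimal Python sketch. This is not the study's implementation. The function name `ensemble_predict`, the confidence threshold of 0.8, and the stand-in fallback rule with a cup-to-disc cutoff of 0.6 are all assumptions made for illustration; the abstract does not report the threshold, and the study used a trained support vector machine where this sketch uses a simple cutoff.

```python
from typing import Callable


def ensemble_predict(
    cnn_prob_gon: float,
    cup_to_disc_ratio: float,
    fallback: Callable[[float], bool],
    confidence_threshold: float = 0.8,  # hypothetical value, not from the study
) -> bool:
    """Return True if the image is classified as GON.

    cnn_prob_gon is the CNN's predicted probability of GON, in [0, 1].
    fallback is any classifier that maps a cup-to-disc ratio to a label;
    in the study this role is played by a support vector machine.
    """
    # Confidence is the probability assigned to the CNN's preferred class.
    confidence = max(cnn_prob_gon, 1.0 - cnn_prob_gon)
    if confidence >= confidence_threshold:
        # The CNN is confident, so its own decision is final.
        return cnn_prob_gon >= 0.5
    # Low confidence: the cup-to-disc-ratio classifier decides.
    return fallback(cup_to_disc_ratio)


def cdr_rule(cup_to_disc_ratio: float) -> bool:
    """Hypothetical stand-in for the trained SVM: a fixed cutoff of 0.6."""
    return cup_to_disc_ratio >= 0.6


if __name__ == "__main__":
    # Confident CNN: the cup-to-disc ratio is ignored.
    print(ensemble_predict(0.95, 0.30, cdr_rule))
    # Uncertain CNN with a large cup-to-disc ratio: the fallback decides.
    print(ensemble_predict(0.55, 0.80, cdr_rule))
```

In this design the fallback can only change the label on low-confidence images, which is one way to reduce the chance of missing advanced GON when the CNN is uncertain about an image from an unfamiliar dataset.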