The performance of conventional speaker identification systems degrades severely under interference such as additive or convolutional noise. High-level speaker information provides more robust cues for identifying speakers. This paper proposes an auditory-model-based spectro-temporal modulation filtering (STMF) process that captures such high-level information for robust speaker identification. Text-independent, closed-set speaker identification simulations are conducted on the TIMIT and GRID corpora to evaluate the robustness of Auditory Cepstral Coefficients (ACCs) obtained after the STMF process. Simulation results show that ACCs substantially outperform conventional MFCCs under all SNR conditions. STMF is also shown to suppress noise more effectively than the recently developed Auditory-based Nonnegative Tensor Cepstral Coefficients (ANTCCs) at low SNRs.
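To make the idea of spectro-temporal modulation filtering concrete, the following is a minimal, illustrative sketch: it filters the 2-D modulation spectrum of a log-spectrogram, keeping only slow spectro-temporal modulations, and then derives cepstral coefficients. The function name `stmf_cepstra`, the cutoff parameters, and the simple low-pass mask are assumptions for illustration only; the paper's actual STMF process is built on an auditory model and is not specified here.

```python
import numpy as np
from scipy.fft import fft2, ifft2, dct

def stmf_cepstra(log_spec, rate_cut=0.5, scale_cut=0.5, n_ceps=13):
    """Illustrative spectro-temporal modulation filtering (not the paper's STMF).

    log_spec : log-spectrogram, shape (n_bands, n_frames)
    rate_cut, scale_cut : fraction (0..1) of the temporal-rate and
        spectral-scale modulation bandwidth to retain
    Returns cepstral coefficients, shape (n_frames, n_ceps).
    """
    n_bands, n_frames = log_spec.shape
    # 2-D modulation spectrum of the log-spectrogram
    mod = fft2(log_spec)
    # Low-pass mask over the joint rate-scale plane: slow modulations
    # tend to carry speech/speaker structure; fast ones are noise-like.
    scale = np.fft.fftfreq(n_bands)   # spectral-scale axis (cycles/band)
    rate = np.fft.fftfreq(n_frames)   # temporal-rate axis (cycles/frame)
    mask = (np.abs(scale)[:, None] <= scale_cut * 0.5) & \
           (np.abs(rate)[None, :] <= rate_cut * 0.5)
    filtered = np.real(ifft2(mod * mask))
    # Cepstral coefficients: DCT along the (filtered) spectral axis
    return dct(filtered, axis=0, norm='ortho')[:n_ceps].T

# Usage: 40 mel-like bands, 100 frames of synthetic data
spec = np.random.default_rng(0).standard_normal((40, 100))
ceps = stmf_cepstra(spec)
print(ceps.shape)  # (100, 13)
```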