We propose an embedded multiprocessor architecture and its associated thread-based programming model. Using a cycle-true simulation model of this architecture, we estimate the energy savings for a threaded C program. The savings are obtained by scaling the voltage and frequency of the individual processors. We port a fingerprint minutiae detection application to this architecture and report the resulting performance on single-, dual-, and quad-processor configurations. The energy-scaled quad-processor version achieves a 77% energy reduction relative to the single-processor non-scaled implementation, at only a 2.2% degradation in cycle count.
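
To illustrate why splitting work across voltage- and frequency-scaled processors can save energy, the following is a minimal sketch of the textbook first-order CMOS model, in which dynamic energy per cycle is proportional to the square of the supply voltage. It is not the simulation model used in this work. The function names, the linear voltage-frequency relation, the voltage floor `v_min`, and all numeric values are assumptions chosen for illustration, and the sketch ignores leakage, communication, and synchronization overhead.

```c
#include <assert.h>

/* First-order dynamic energy: E = C_eff * Vdd^2 per cycle, times cycle count.
 * Leakage and interconnect energy are deliberately ignored (assumption). */
static double dynamic_energy(double c_eff, double vdd, double cycles) {
    return c_eff * vdd * vdd * cycles;
}

/* Hypothetical scaling rule: supply voltage tracks clock frequency linearly,
 * but cannot drop below a technology-dependent floor v_min. */
static double scaled_voltage(double v_nom, double freq_ratio, double v_min) {
    double v = v_nom * freq_ratio;
    return (v < v_min) ? v_min : v;
}

/* Energy of `procs` scaled processors relative to one processor at nominal
 * voltage. The workload is assumed to split evenly, so each processor runs
 * 1/procs of the cycles at 1/procs of the nominal frequency and the overall
 * execution time stays the same. */
static double relative_energy(int procs, double v_nom, double v_min) {
    assert(procs > 0);
    const double cycles = 1.0e6; /* arbitrary: cancels out in the ratio */
    const double c_eff = 1.0;    /* arbitrary: cancels out in the ratio */

    double base = dynamic_energy(c_eff, v_nom, cycles);
    double v = scaled_voltage(v_nom, 1.0 / procs, v_min);
    double scaled = procs * dynamic_energy(c_eff, v, cycles / procs);
    return scaled / base;
}
```

Under these assumptions, `relative_energy(2, 1.2, 0.6)` evaluates to 0.25, because halving the supply voltage quarters the energy per cycle while the total cycle count is unchanged. With the same illustrative floor of 0.6 V, adding more processors gives no further reduction in this model, since the voltage cannot scale any lower.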