In this paper, an efficient application-specific architecture is presented for a real-time edge detection system. The architecture is based on the cooperating data-path model, which makes it possible to optimize both the throughput and the area for this recursive algorithm. Careful scheduling of the operations on the partly parallel, partly shared hardware balances the load on each of the four data-paths. In this way, the inherently high degree of concurrency in the algorithm is effectively exploited in the parallel, pipelined hardware. The layout of these data-paths has been generated with powerful CAD tools and a parameterizable library of functional building blocks. The corresponding global controller has been partitioned to shorten the critical path, which raises the achievable clock rate further, up to 10 MHz. The stringent I/O requirements have also been taken into account. The resulting ASIC has been verified by register-transfer simulation and is more than twice as fast as existing designs. This large practical test vehicle thus clearly substantiates the effectiveness of the cooperating data-path model.
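To illustrate what balancing the load over four data-paths means, the sketch below assigns operations to data-paths with the greedy longest-processing-time-first (LPT) heuristic. This is a swapped-in textbook technique, not the scheduling method used for this ASIC: the operation names and cycle counts are hypothetical, and the sketch ignores the data dependencies of the recursive algorithm, which a real scheduler must respect.

```python
import heapq


def lpt_schedule(op_cycles, n_paths=4):
    """Greedy LPT load balancing (illustrative only, ignores dependencies).

    op_cycles: dict mapping a hypothetical operation name to its cycle cost.
    Returns (assignment, loads): the operations placed on each data-path
    and the total cycles per data-path.
    """
    # Min-heap of (current load in cycles, data-path index).
    heap = [(0, p) for p in range(n_paths)]
    assignment = {p: [] for p in range(n_paths)}
    # Take operations from most to least expensive; ties keep input order.
    for name, cycles in sorted(op_cycles.items(), key=lambda kv: -kv[1]):
        load, p = heapq.heappop(heap)  # least-loaded data-path
        assignment[p].append(name)
        heapq.heappush(heap, (load + cycles, p))
    loads = [0] * n_paths
    for load, p in heap:
        loads[p] = load
    return assignment, loads


# Hypothetical operation mix: 4 multiplies, 2 compares, 4 additions.
ops = {"mul0": 3, "mul1": 3, "mul2": 3, "mul3": 3,
       "cmp0": 2, "cmp1": 2,
       "add0": 1, "add1": 1, "add2": 1, "add3": 1}
assignment, loads = lpt_schedule(ops)
print(loads)  # [5, 5, 5, 5]: 20 cycles of work split evenly over 4 data-paths
```

With a perfectly balanced schedule, the longest data-path (5 cycles here) equals the total work divided by the number of data-paths, which is the bound that load balancing aims for.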