Network quality of service (QoS) is essential for network applications. For many applications, obtaining a fair share of the available bandwidth prevents their flows from being starved by flows that do not respond to congestion. Per-flow scheduling at each output port of a switch isolates the flows that compete for the bandwidth of a bottleneck link and thus maintains fair shares among them. However, because per-flow queues are costly to implement, commodity switches on the market rarely provide this capability. To address this need, we design and implement Near Per-flow Scheduling (NPFS), a near-per-flow scheduling scheme, in P4 programmable hardware switches and evaluate its performance. NPFS approximates the effectiveness of per-flow scheduling in commodity switches whose output ports lack per-flow queues. It uses the priority queues that most commodity switches provide and dynamically assigns competing flows to these queues based on their protocol types and current sending rates. Experimental results show that, when the number of competing flows is less than three times the number of queues, NPFS keeps the bandwidths achieved by these flows within 5% of their ideal fair shares.
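
To make the evaluation metric concrete, the following minimal Python sketch computes ideal fair shares and the relative deviation of achieved bandwidths from them. It is an illustration only, not the NPFS implementation: it assumes that "ideal fair share" means the max-min fair allocation, and the function names (max_min_fair_shares, relative_deviation, in_evaluated_regime) are hypothetical.

```python
def max_min_fair_shares(capacity, demands):
    """Water-filling: max-min fair allocation of `capacity` among flows.

    Assumption: the ideal fair share is the max-min fair share.
    """
    shares = [0.0] * len(demands)
    remaining = float(capacity)
    # Serve flows in order of increasing demand.
    pending = sorted(range(len(demands)), key=lambda i: demands[i])
    while pending:
        equal = remaining / len(pending)
        i = pending[0]
        if demands[i] <= equal:
            # The smallest demand fits within an equal split: satisfy it fully.
            shares[i] = float(demands[i])
            remaining -= demands[i]
            pending.pop(0)
        else:
            # All remaining flows are bottlenecked: split the rest equally.
            for j in pending:
                shares[j] = equal
            break
    return shares


def relative_deviation(achieved, ideal):
    """Per-flow |achieved - ideal| / ideal; assumes every ideal share is positive."""
    return [abs(a - s) / s for a, s in zip(achieved, ideal)]


def in_evaluated_regime(num_flows, num_queues):
    """True when the flow count is below three times the queue count."""
    return num_flows < 3 * num_queues


if __name__ == "__main__":
    ideal = max_min_fair_shares(10, [2, 8, 8])
    achieved = [2.0, 4.1, 3.9]  # made-up measurements, for illustration only
    print(ideal)
    print(max(relative_deviation(achieved, ideal)) <= 0.05)
    print(in_evaluated_regime(num_flows=3, num_queues=8))
```

In this sketch, a flow whose demand is below the equal split keeps its full demand, and the capacity it leaves unused is redistributed among the remaining flows.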