It is widely recognized that execution of higher-level protocol software can be an important performance bottleneck in distributed systems that use data communications. This paper examines the use of parallelism to enhance protocol execution performance, in particular the use of a separate concurrent task for each protocol layer. Two layers of the OSI protocol system were implemented and run on a multiprocessor, with one to five processors at each end of the connection. Potentially concurrent entities included user tasks as data source and sink, the OSI session layer (kernel functional unit only), the OSI transport layer (classes 0 and 2), vestigial network tasks, and tasks to buffer data between layers. Three substantially different design architectures, with nine to twelve tasks at each end of the connection, were compared. The design differences centered on different ways to provide interlayer coordination and buffering; the protocol code was kept identical across designs. The implementation used a real-time kernel that provides synchronous (request-reply) interprocess communication. Throughput varied between designs over a range of approximately two to one, and the best design was a symmetrical decentralized two-way pipeline with courier communications.
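The task-per-layer organization with synchronous request-reply communication can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the layer names, the `courier` helper, and the use of Python threads with per-request reply queues (standing in for the real-time kernel's request-reply primitives) are all assumptions made for the example.

```python
import queue
import threading

def layer(name, requests):
    """One protocol layer as its own task: service each request,
    'process' the data unit (here, just tag it), and reply."""
    while True:
        data, reply = requests.get()
        if data is None:          # shutdown sentinel
            break
        reply.put(f"{name}({data})")

def courier(upper, lower, out, n):
    """Courier task: move n data units from the upper layer to the
    lower layer, using one synchronous request-reply exchange with
    each layer per unit (each request carries its own reply queue)."""
    for i in range(n):
        reply = queue.Queue()
        upper.put((f"sdu{i}", reply))   # request-reply with upper layer
        payload = reply.get()
        reply2 = queue.Queue()
        lower.put((payload, reply2))    # deliver result to lower layer
        out.append(reply2.get())

session_q, transport_q = queue.Queue(), queue.Queue()
results = []

tasks = [
    threading.Thread(target=layer, args=("session", session_q)),
    threading.Thread(target=layer, args=("transport", transport_q)),
]
for t in tasks:
    t.start()

courier(session_q, transport_q, results, 3)

for q in (session_q, transport_q):
    q.put((None, None))                 # stop the layer tasks
for t in tasks:
    t.join()

print(results)
```

The point of the courier is that neither layer ever blocks waiting directly on the other: the courier absorbs the synchronous rendezvous on both sides, which is what allows the layers to be scheduled concurrently on separate processors.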