mcoro: an improved coroutine implementation


The mcoro library is a modified implementation of coroutines that can have multiple dispatchers and can therefore take advantage of multiprocessor machines. This overcomes the main limitation of existing coroutine implementations.


For a definition of the term coroutine, see The Art of Computer Programming by Donald E. Knuth. Coroutines provide a very simple cooperative multitasking environment in which the switch from one task to another is done explicitly by a function call. Coroutines require far fewer OS resources than processes or threads, and switching between coroutines is much faster than switching between processes or threads, since the OS kernel is not involved in the operation. For this reason, coroutines are commonly used as the basis of user-level thread management.

Existing coroutine libraries, e.g. the coro library by E. Toernig, libpcl by Davide Libenzi, and Ralf S. Engelschall's signal stack trick, are very interesting and powerful. However, the main limitation of these implementations is that only one thread of control is allowed, so multiprocessors cannot be exploited.


mcoro's coroutines have no internal static state. Instead, each native process (or thread) creates a container that holds the internal state of the group of coroutines it dispatches.


The functions initially provided by mcoro are:

typedef struct s_customalloc {
	void *cookie;	/* e.g. a handle for the allocator */
	void *(*malloc)(void *, unsigned long);
	void (*free)(void *, void *, unsigned long);
} cor_alloc_t;

typedef void * cor_t;
typedef unsigned int corg_t;

extern int cor_group_create(cor_alloc_t const *);
	/* A group of coroutines is controlled by a separate thread.
	 * This function must not be called concurrently from several threads! */

extern cor_t cor_create(unsigned int corg, void *func, void *data, unsigned long stk_size);
extern void cor_yield(unsigned int corg);
extern void cor_exit(unsigned int);
extern cor_t cor_self(unsigned int);
extern void cor_call(cor_t);
extern void cor_delete(cor_t);	/* a coroutine cannot delete itself */


I used Davide Libenzi's cobench to benchmark both his libpcl and my mcoro.

The performance results for libpcl are:
measuring co_create+co_delete performance ... 0.121905 usec
measuring switch performance ... 0.0826117 usec

mcoro's result is:
measuring cor_create+cor_delete performance ... 0.135481 usec
measuring switch performance ... 0.0844271 usec

Later I revised the implementation to cache and recycle deleted coroutine structs. This optimization improves its performance:
measuring cor_create+cor_delete performance ... 0.0893656 usec
measuring switch performance ... 0.0822688 usec

The comparison is reasonable and demonstrates mcoro's efficiency. Note that each mcoro function call has to do a little more work than its libpcl counterpart.

I am writing a custom memory allocator, called XFMalloc, which will further speed up these coroutine implementations.

Source Code

Latest version : .tar.gz


Pothouse's pothread is based on mcoro.
More information about coroutines
