Tuesday, November 9, 2010

Using Multiprocessing in Python

For a newcomer to multi-CPU processing, I have to say that the Python 2.7 documentation is difficult to understand at best and incomplete at worst. Searching the web, I came across two tutorials that I found very helpful.

The first is by Doug Hellmann, in his Python Module of the Week (PyMOTW) series, which gives much clearer examples of how the multiprocessing module works.
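As a rough illustration of the kind of thing those examples cover, here is a minimal sketch of multiprocessing.Pool.map(), which hands the items of an iterable to a pool of worker processes and returns the results in their original order. The names square() and parallel_squares() are hypothetical, made up for this sketch. It uses try/finally with close() and join() rather than a with block so that it also runs on Python 2.7.

```python
import multiprocessing


def square(x):
    """Work done inside a worker process (hypothetical example task).

    It must be a module-level function so it can be pickled and sent
    to the workers.
    """
    return x * x


def parallel_squares(numbers, processes=None):
    """Square each number using a pool of worker processes.

    processes=None lets Pool start one worker per available CPU.
    """
    pool = multiprocessing.Pool(processes=processes)
    try:
        # map() splits `numbers` into tasks, sends them to the workers
        # and collects the results in the same order as the input.
        return pool.map(square, numbers)
    finally:
        # Stop accepting work, then wait for the workers to exit.
        pool.close()
        pool.join()


if __name__ == "__main__":
    # The guard matters where workers start by re-importing this module
    # (the "spawn" start method, the default on Windows); without it the
    # workers would re-run this code and try to start pools of their own.
    print(parallel_squares(range(10)))
```

Run as a script, this should print the squares of 0 through 9 as a list.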

However, I still needed more information. Norman Matloff's tutorial from UC Davis, "Programming on Parallel Machines" (pdf link), gave me enough to get over the hump in solving my problem. Chapter 3, "The Python Threads and Multiprocessing Modules", is not exhaustive, but it is very helpful. I will probably refer to it again when I move the code from a single multiprocessor machine to a cluster.

Information on other types of parallel processing (cluster, cloud, grid) and the related Python libraries can be found on the Python Wiki.

How do you split a list into evenly sized chunks?

This answer was originally posted by Ned Batchelder on Stack Overflow.

Q: How do I split a list of arbitrary length into equal-sized smaller pieces that I can perform operations on?

A: Create a generator that yields successive slices of the list.

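Note that, unlike print, which writes the whole list on one long line, pprint.pprint() "pretty-prints" it to stdout (sys.stdout by default), breaking long structures across lines so they fit within 80 columns. A good thing to remember for the future.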
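A quick way to see the difference between print and pprint is to compare str(), which is the text print writes for a list, with pprint.pformat(), which returns the same text pprint.pprint() would write. The data below is made up purely for illustration.

```python
import pprint

# Seven lists of ten numbers each: far too long for one 80-column line.
data = [list(range(i, i + 10)) for i in range(0, 70, 10)]

one_line = str(data)           # what print writes: one long line
pretty = pprint.pformat(data)  # what pprint.pprint writes, as a string

print(one_line)
pprint.pprint(data)            # writes to sys.stdout by default
```

pformat() is handy when you want the pretty-printed text in a log message or a file rather than on the console.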


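import pprint


def chunks(l, n):
    """Yield successive n-sized chunks from l (the last one may be shorter)."""
    # Step through l in strides of n; each slice holds at most n items.
    # range() works in both Python 2 and 3 (Python 2.7 also offers xrange).
    for i in range(0, len(l), n):
        yield l[i:i + n]


# list(range(75)) makes each chunk a real list in Python 3 as well.
pprint.pprint(list(chunks(list(range(75)), 10)))

Output:

[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
 [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
 [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
 [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
 [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
 [50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
 [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
 [70, 71, 72, 73, 74]]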
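chunks() fixes the size of each piece, so the number of pieces depends on the list length. If you instead want a fixed number of pieces (say, one per worker process), a small variant can spread the remainder so that piece lengths differ by at most one. This is a sketch of my own, not part of the original answer, and split_into() is a hypothetical name.

```python
def split_into(l, n):
    """Yield n consecutive slices of l whose lengths differ by at most one.

    Hypothetical helper: a fixed-count variant of chunks().
    """
    size, extra = divmod(len(l), n)
    start = 0
    for i in range(n):
        # The first `extra` pieces each take one of the leftover items.
        end = start + size + (1 if i < extra else 0)
        yield l[start:end]
        start = end


pieces = list(split_into(list(range(75)), 10))
print([len(p) for p in pieces])
```

For 75 items and 10 pieces, the first five pieces get 8 items and the last five get 7. To get one piece per CPU, you could pass multiprocessing.cpu_count() as n.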