Multiprocessing in Python: with a Windows machine

# -*- coding: utf-8 -*-
"""
Created on Tue Apr 17 08:53:57 2018
@author: swagat
"""
from concurrent.futures import ProcessPoolExecutor
import nltk
from nltk.corpus import inaugural
from nltk.corpus import stopwords
# filter out punctuation / non-alphanumeric tokens and/or stopwords
def filterPunctAndStopwords(word_list, funct):
    if len(word_list) == 0:
        return []
    filtered_list = []
    if funct == "punct":
        filtered_list = [w for w in word_list if w.isalnum()]
    elif funct == "stopwords":
        if type(word_list[0]) == tuple:
            filtered_list = [(w1, w2) for (w1, w2) in word_list
                             if w1 not in stopwords.words('english')
                             and w2 not in stopwords.words('english')]
        else:
            filtered_list = [w for w in word_list if w not in stopwords.words('english')]
    elif funct == "both":
        if type(word_list[0]) == tuple:
            filtered_list = [(w1, w2) for (w1, w2) in word_list
                             if w1.isalnum() and w2.isalnum()
                             and w1 not in stopwords.words('english')
                             and w2 not in stopwords.words('english')]
        else:
            filtered_list = [w for w in word_list
                             if w.isalnum() and w not in stopwords.words('english')]
    return filtered_list
def calc_freq_dist(text, size=3):
    """
    Extract unigrams and two-word tuples from sliding windows and create an
    nltk.FreqDist for each (returned together in a list).
    """
    print(size)
    unigrams = []
    tuples = []
    # scan over windows of the appropriate size
    for center in range(size, len(text) - size):
        # record the co-occurrences of the center word with each other word in the window
        wunis = set()
        wtuples = set()  # tuples in this window; a set so each is counted only once
        thisword = text[center]
        # iterate through the rest of the window
        for i in range(1, size + 1):  # i starts from 1 (center +/- i)
            nextleft = text[center - i]
            nextright = text[center + i]
            # add the neighbouring words to this window's unigram set
            wunis.add(nextleft)
            wunis.add(nextright)
            # create the left tuple (alphabetically ordered)
            if not thisword == nextleft:
                if thisword < nextleft:
                    tup = (thisword, nextleft)
                else:
                    tup = (nextleft, thisword)
                # and add it to this window's tuple set
                wtuples.add(tup)
            # create the right tuple (alphabetically ordered)
            if not thisword == nextright:
                if thisword < nextright:
                    tup = (thisword, nextright)
                else:
                    tup = (nextright, thisword)
                # and add it to this window's tuple set
                wtuples.add(tup)
        # add this window's unigrams to the text-level list
        for wuni in wunis:
            unigrams.append(wuni)
        # add this window's tuples to the text-level list
        for wtup in wtuples:
            tuples.append(wtup)
        if center % 100 == 0:
            print("Processed %d records.........." % (center))
    unigrams = [w for w in unigrams if len(w) > 2]
    unigrams = filterPunctAndStopwords(unigrams, "stopwords")
    tuples = [(w1, w2) for (w1, w2) in tuples if len(w1) > 2 and len(w2) > 2]
    tuples = filterPunctAndStopwords(tuples, "stopwords")
    # create a frequency distribution from the unigrams and the tuples
    ufd = nltk.FreqDist(unigrams)
    cfd = nltk.FreqDist(tuples)
    # and return the distributions in a list
    return [ufd, cfd]
def main():
    # divide the corpus into pre-/post-WWI parts
    preWWIFileIds = []
    postWWIFileIds = []
    for filename in inaugural.fileids():
        if filename.split('-')[0] <= '1913':
            preWWIFileIds.append(filename)
        elif filename.split('-')[0] > '1917':
            postWWIFileIds.append(filename)
    print("Items in preWWI corpus = %d\nItems in postWWI corpus = %d" % (len(preWWIFileIds), len(postWWIFileIds)))
    # create unigrams out of the preWWI text
    unigrams_preWWI = [w.lower() for w in inaugural.words(preWWIFileIds)]
    # create unigrams out of the postWWI text
    unigrams_postWWI = [w.lower() for w in inaugural.words(postWWIFileIds)]
    print("len of preWWI tokens = %d\nlen of postWWI tokens = %d\n" % (len(unigrams_preWWI), len(unigrams_postWWI)))
    list_text = []
    step = 1000
    sliding_win_size = 3
    all_token_list = unigrams_postWWI[:10000]
    # partition the tokens into chunks that overlap by the window size,
    # so no window is lost at the chunk boundaries
    for ind in range(0, len(all_token_list), step):
        start_ind = ind - sliding_win_size if ind > 0 else ind
        end_ind = ind + step + sliding_win_size if ind + step + sliding_win_size < len(all_token_list) else len(all_token_list)
        list_text.append(all_token_list[start_ind:end_ind])
    print("Size of partitions = %d" % (len(list_text)))
    postWWI_unigrams_fd = []
    postWWI_bigrams_fd = []
    with ProcessPoolExecutor(5) as executor:
        for token_list, token_fds in zip(list_text, executor.map(calc_freq_dist, list_text, [sliding_win_size] * len(list_text))):
            print('Len of tokens processed = %d\nLen of unigrams = %d\nLen of tuples = %d' % (len(token_list), len(token_fds[0]), len(token_fds[1])))
            postWWI_unigrams_fd.extend(token_fds[0])
            postWWI_bigrams_fd.extend(token_fds[1])
    print('Len of unigrams processed = %d\nLen of bigrams = %d' % (len(postWWI_unigrams_fd), len(postWWI_bigrams_fd)))

if __name__ == '__main__':
    main()

Parallel execution in Python generally takes one of two approaches: threads or multiple processes. Although multiple processes carry the overhead of combining the results from the separate workers, they make better use of the multiple CPU cores that are commonplace these days. There are three main elements to putting this into practice: dividing the workload into logical chunks, passing the chunks to individual processes, and compiling the output from each of those processes.
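A minimal sketch of how those three elements map onto code, using the standard-library ProcessPoolExecutor covered below (word_count here is just a hypothetical stand-in for the real per-chunk work):

 from concurrent.futures import ProcessPoolExecutor

 def word_count(chunk):
     # stand-in for the real per-process work
     return len(chunk)

 if __name__ == '__main__':   # required on Windows, where worker processes re-import this module
     tokens = ["some", "long", "list", "of", "tokens"] * 2000
     # 1. divide the workload into logical chunks
     chunks = [tokens[i:i + 1000] for i in range(0, len(tokens), 1000)]
     with ProcessPoolExecutor() as executor:
         # 2. pass the chunks to individual processes
         results = executor.map(word_count, chunks)
         # 3. compile the output from each of those processes
         total = sum(results)
     print(total)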

Python has several libraries for achieving real parallel programming, one popular choice being multiprocessing. There are plenty of blog posts describing the nitty-gritty of working with that library (I have listed a couple in the references section), so this post explores two other choices we can just as easily use to achieve the same thing.

Concurrent.futures

This module provides a high-level interface for asynchronously executing callables. A nice bonus is that it ships with the Python standard library, so there is nothing to install. For true process-based parallelism, the module provides the class ProcessPoolExecutor.
It accepts the number of worker processes to spawn as an argument (max_workers). If not given, this defaults to the number of logical processors on the machine (for my quad-core machine with hyper-threading, that translates to 8 workers). Newer versions of the module add a few more arguments, e.g. mp_context (the multiprocessing context), initializer and initargs, which give greater control over how the worker processes are started and initialized. These, however, are not covered in the current post.
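As a quick illustration of the constructor and of submitting a single call (a minimal sketch; cube is just a throwaway function for demonstration):

 import os
 from concurrent.futures import ProcessPoolExecutor

 def cube(x):
     return x ** 3

 if __name__ == '__main__':   # the guard matters on Windows, where workers are spawned
     print("Logical processors:", os.cpu_count())   # the default pool size if max_workers is omitted
     with ProcessPoolExecutor(max_workers=4) as executor:
         future = executor.submit(cube, 3)   # schedule one call asynchronously
         print(future.result())              # blocks until the worker returns 27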

I have used this class in the example code above (PMI_parallel.py), a simple program that extracts unigrams and two-word tuples from a given text and builds frequency distributions from them. The parallel-processing magic, though, happens in this piece:

 with ProcessPoolExecutor(5) as executor:
     for token_list, token_fds in zip(list_text, executor.map(calc_freq_dist, list_text, [sliding_win_size] * len(list_text))):
         print('Len of tokens processed = %d\nLen of unigrams = %d\nLen of tuples = %d' % (len(token_list), len(token_fds[0]), len(token_fds[1])))
         postWWI_unigrams_fd.extend(token_fds[0])
         postWWI_bigrams_fd.extend(token_fds[1])

Here we first initialize a ProcessPoolExecutor with 5 worker processes, and with that pool of workers we run a loop that hands the workload to the individual workers through this statement:

zip(list_text, executor.map(calc_freq_dist, list_text, [sliding_win_size] * len(list_text)))

list_text is simply the list of text segments to be processed (the workload division), and calc_freq_dist is the function that does all the work. The call that distributes the workload across the worker processes is executor.map: its first argument is the callable (calc_freq_dist) and the remaining arguments are iterables of inputs, one element per task. Wrapping the call in zip keeps each element of list_text paired with the result computed for it. Finally, the compilation of output happens inside the for loop, where I simply append each result to two separate lists in the order it is received.
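The same pattern in miniature (a sketch; count_longer_than is a hypothetical stand-in for calc_freq_dist):

 from concurrent.futures import ProcessPoolExecutor

 def count_longer_than(tokens, min_len):
     # one positional parameter per iterable passed to executor.map
     return sum(1 for t in tokens if len(t) > min_len)

 if __name__ == '__main__':
     list_text = [["four", "score", "and", "seven"], ["years", "ago", "our", "fathers"]]
     min_lens = [3] * len(list_text)   # mirrors [sliding_win_size] * len(list_text)
     with ProcessPoolExecutor(2) as executor:
         # map preserves input order, so zip keeps each chunk paired with its own result
         for chunk, result in zip(list_text, executor.map(count_longer_than, list_text, min_lens)):
             print(len(chunk), result)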

Parallel Python (pp)

This is another module built to support parallel processing on SMP machines (systems with multiple processors or cores) or in a clustered environment. It offers several advantages over concurrent.futures, which are listed on the module's home page.
It does, however, require installation; instructions and downloads can be found on the home page. Because of portability issues with Python 3, the authors provide a separate zip file for it, and the instructions can be found here.

I have created a sample program that appends two columns of a dataset, available here. The snippet that uses pp for the parallelization is given below:

 # parallel processing code with pp
 # (pp and time are imported, and data/test_func are defined, earlier in the full program)
 ppservers = ()  # mainly used in a clustered environment, where the node names go here;
                 # included here for demonstration purposes only
 job_server = pp.Server(ncpus=5, ppservers=ppservers)  # start the job server with 5 workers
 print("Starting pp with %d workers" % (job_server.get_ncpus()))
 start_time = time.time()
 x_all = []
 # input load division
 inputs = (data[0:60000], data[60000:120000], data[120000:180000], data[180000:240000],
           data[240000:300000], data[300000:360000], data[360000:len(data)])
 # job submission
 jobs = [(input_x, job_server.submit(test_func, (input_x,), (None,), ("pandas",))) for input_x in inputs]
 # result accumulation
 for input_x, job in jobs:
     x_all.append(job())
 print("Accumulated %d results" % (len(x_all)))
 print("Time elapsed: %f s" % (time.time() - start_time))
 job_server.print_stats()

To set up the worker processes, we create a pp.Server, passing the number of processes as ncpus. The input load is divided with a simple tuple of data slices. Each job is then submitted by calling submit with the following parameters:
  • callable: the function to run (test_func in this case)
  • args: the input for the callable (input_x, one element of the tuple of inputs)
  • depfuncs: functions called internally by the callable (None in this case)
  • modules: modules that must be imported for the function to work ("pandas" in this case)
Finally, we accumulate the results by calling each job object, job(), which blocks until that job has finished and returns its result (a stripped-down sketch of this pattern follows the output below). The output of this code snippet looks like this:

 Time elapsed: 24.610267 s  
 Job execution statistics:  
  job count | % of all jobs | job time sum | time per job | job server  
      7 |    100.00 |   27.4447 |   3.920671 | local  
 Time elapsed since server creation 45.127434968948364  
 0 active tasks, 5 cores  
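For reference, the same submit-and-collect pattern boiled down to a minimal sketch (assuming pp is installed; square is just a stand-in worker function):

 import pp

 def square(x):
     # trivial stand-in for the real worker function
     return x * x

 job_server = pp.Server(ncpus=2)   # local job server with 2 workers
 # submit(callable, args, depfuncs, modules) returns a job object
 jobs = [job_server.submit(square, (n,), (), ()) for n in range(10)]
 results = [job() for job in jobs]   # calling job() blocks until that job finishes
 print(results)
 job_server.print_stats()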

Thanks for reading!

References

Multiprocessing

  • https://sebastianraschka.com/Articles/2014_multiprocessing.html
  • https://www.ploggingdev.com/2017/01/multiprocessing-and-multithreading-in-python-3

Concurrent.futures

  • https://docs.python.org/3/library/concurrent.futures.html
