Python is… slow?

One of the work items that kept me occupied over the past week was writing a computational physics simulation modelling “quantum potential wells using a Monte Carlo simulation”:http://www.phys.soton.ac.uk/teach/year3/notes/ph314/projects/pjtb2.pdf.

I chose to write the simulation in “Python”:http://www.python.org instead of C - my feeling was this would be quicker to get a working simulation with, and I could always rewrite in C if it was very slow. i didn’t expect it to be - I’d read a lot of reports about Scientists using Python, so figured it would fine. I also picked Python as this would force me to learn the language, something that if I pursue “a particular route”:http://www.ses.soton.ac.uk/projects/Comp_Eng_Des/comp_eng_des.html after graduating would be very useful.

Due to some problems, the lecturer overseeing this simulation ended up writing the simulation also, in the departmental version of Basic.

At that point I realised how slow the Python code was. I didn’t expect it to be as fast as C, but was surprised by how much faster the Basic version was. It wasn’t impossibly slow, but running a million random walks of up to 300 steps was taking nearly 2 hours to complete. I tried plugging in Psyco to speed up the Python code, without much luck.

I don’t know if there were ways to get the Python code faster (short of using Pyrex), but I’m posting the code here to see what people think. Given it was the first proper Python program I’ve done I expect it to be a bit slow, but wasn’t expecting it to be this much slower.

One other point that came out was that the Python code produced a greater scatter on the results than the Basic version. I think this has to come down to the random number generator, but thoughts on this would be appreciated as well.

To the code:

* “Python code for a 1 dimensional potential well”:/physics/simulations/rand_walks_qm/1D_0.py
* “New Forest Basic code for a 1 dimensional potential well”:/physics/simulations/rand_walks_qm/1D_0.nfb (slightly more basic than mine - walks aren’t killed after X steps, but continue to arrest). You’ll also need the editor/compiler/interpreter (”Windows”:http://www.phys.soton.ac.uk/teach/year2/notes/lab/software/nfbwin.exe or “Linux”:http://www.phys.soton.ac.uk/teach/year2/notes/lab/software/nfblnx) - which in turn requires “TCL”:http://www.tcl.tk/ to be installed.
* Python code for a 2 dimensional potential well (”ground state”:/physics/simulations/rand_walks_qm/2D_0.py , “first potential”:/physics/simulations/rand_walks_qm/2D_1.py , “second potential”:/physics/simulations/rand_walks_qm/2D_2.py ). Note that this code does not relate well to the real values expected; I didn’t work out what the problem was before I had to hand in the report.

I just ran across dirtSimple’s post “The perception of speed”:http://dirtsimple.org/2004/12/perception-of-speed.html today, so it seems I’ve written this post at a time when the subject’s topical :-)

24 comments ↓

#1 Jimmy Retzlaff on 12.23.04 at 4:44 pm

I tried a few things and 1D_0.py went from 15s to 5s on my machine. The easiest is to put all of the code after the psyco construct into a function before the psyco construct and call it after the psyco contruct. psyco only optimizes code inside functions and methods. That block of code is a big portion of the runtime of your program so you’ll want psyco to see it.

Everything else offers smaller gains, but optimizations like these often lead to more readability as well.

You can replace the two lines that initialize “bins” with “bins = [0] * MAXSTEPS”.

Your implementation of RandomWalk.walk can be replaced with “self.directions[ 0 ] += random.choice((-1, 1))”.

Your implementation of RandomWalk.arrested can be replaced with “return self.distance() == J”.

The loop in the distance method does a bunch of unneeded lookups. Instead try “for coordinate in self.directions: sum += coordinate**2″.

Any calls to _newlist (which doesn’t appear to be used in 1D) can be replaced with “[defaultvalue] * length”.

But these are all incremental improvements. The place where Python performance really shines is in algorithmic experimentation. Once you have a certain level of proficiency in Python, you will save enough development time over lower level languages, that you’ll have time to experiment with alternate algorithms. Instead of getting 10% and 20% speed improvements, this can lead to 5x or 10x improvements or more.

#2 David Brown on 12.23.04 at 7:03 pm

Couple of comments — the previous commenter pointed out some good ideas, and let me add:

Something that is good for pretty much any programming language is storing intermediate results and then referencing the variable rather than calling the function again. You are using len() in a lot of tight loops, and while the call is fast, it adds up.

Look into simplifying the math — sqrt is an expensive call, and unnecessary if all you are doing is comparing. Square the number that you are going to compare against the calculated distance, and then eliminate the call to sqrt.

Learn list comprehensions. Python loves them and optimizes the heck out of them. If you are coding the construct “for i in range(len(foo)): foo[i] = something(foo[i])” a lot, try using “[something(x) for x in foo]”

I’ll repeat myself — Python loves list comprehensions. Learn them, use them, love them.

Learn the standard library — the random.choice() method, mentioned by the previous caller, can replace almost your entire walk() method.

Before attempting to optimize, use the Python profiler to find out where the slowness actually is. It may surprise you. It usually surprises me.

#3 David Brown on 12.23.04 at 7:06 pm

Hmmm. I did a direct basic to python translation of the professors code, and wanted to try sending it to you, but I couldn’t find your address anywhere.

Send me an email if you’d like to see it.

#4 Peter Bowyer on 12.23.04 at 7:22 pm

Thanks David. For your use and others, my email address is

peter
#not this
@
#not this
mapledesign.co.uk

#5 Tom Pollard on 12.23.04 at 8:55 pm

Try using

dirn = random.random() < 0.5

instead of

dirn = random.randint(0,1).

The random() call is much faster than randint(), for some reason.
This, combined with the sorts of changes Jimmy Retzlaff advised, actually makes the python script speedy on my system.

#6 David M. Cooke on 12.23.04 at 10:03 pm

First, some style comments.
* It’s a good practice to avoid using the names of builtins as variable or function names. In particular, your filter function shadows the builtin of that name, and the sum variable in RandomWalk.distance.

* Declaring directions in the class definition (above the __init__) is not necessary.

* You call random.seed() every time you create a RandomWalk object — this is usually a bad idea. You should do this once at the beginning of the program.

* RandomWalk.distance() can be written simpler:
s = 0
for d in self.directions:
    s += d**2
return s
where I return distance**2 instead, and change the compare in RandomWalk.walk() to compare against the (precomputed) J**2.

* Inlining RandomWalk.distance() into RandomWalk.arrested gave me a 10% speed-up.

* Your other programs for the 2D well are all similiar. I find it’s much easier, instead of maintaining several programs that differ by little bits, to make a module that encompasses most of the functionality, and subclass portions as necessary. This is of course a general software practice, and hopefully obvious, but I’ve see *a lot* of copy-and-pasted scientific code!!

* I’d pass J as a parameter to RandomWalk.

* Try moving the loop that walks the walker into the RandomWalk object. I’d probably make it a generator, returning True or False for each step. That won’t necessarily make it faster (now), but it does allow you do optimizations in one place in the loop, like inlining the arrested() test.

To make this even faster, I would seriously consider using Pyrex. I’d probably use the GSL library for the random-number generator, and make sure I’m calling C routines instead of Python ones (C’s sqrt instead of Python’s sqrt, for instance). A quick implementation of 1D_0.py gave a 4x speedup. More work speeds it up to 25x the original.

That’s all I got. Hope it helps.

#7 Phillip J. Eby on 12.23.04 at 10:27 pm

If you’re using Python 2.4, try rewriting the ‘arrested’ method as:

return sqrt(sum(x*x for x in self.directions))==J

Or better yet:

return sum(x*x for x in self.directions)==J2

(where J2 = J*J)

This should give you an enormous speedup. You should also get another small speedup by computing ‘range(MAXSTEPS)’ *once*, and keeping it in a constant, rather than computing it 20,000 times as you are currently doing. There are other inefficient things being done in your code, but these two are probably the most immediate in terms of bang for the buck.

Also, the Basic program isn’t doing the same thing as your program is doing — it’s doing several things differently, including the fact that it’s not doing any floating point math and doesn’t make any function calls, so of course it’s going to be faster! This is an apples-to-oranges comparison. If you wrote a straightforward translation of the Basic program, it would be a hell of a lot faster than your Python program, and would probably produce similar output as well.

#8 Keir Mierle on 12.24.04 at 8:58 am

Scientists who use Python use “SciPy”:http://www.scipy.org/

I rewrote the ‘interesting’ part of the fastwalk using SciPy (mostly Numeric). The 2D case runs in 14 seconds, compared to 40 seconds with your original 1D code. This is n-dimensional. Further questions, try mierle with an at gmail.com.

“fastwalk.py”:http://www.ecf.utoronto.ca/~mierle/fastwalk.py

You just have to learn to think in terms of vectorizing. Think, “How can I randomly walk the drunkards, all at the same time?” Check out my code for more details.

#9 Perry Greenfield on 12.24.04 at 3:09 pm

(Reposting from an email copy. Linefeeds for the code examples may cause the code to be wrapped in strange ways)

Saw your comments referred to on the Python Daily URL

So as best I can tell from your first code sample the problem is related to doing statistics on a histogram of how many steps it takes a random walk to go a certain distance.

First of all, I’m not sure how you could have read about scientific use of Python without also reading about using it with one of the two array packages available for it. For most cases,Python is not nearly as useful for scientific programming without using one of these if you want to write all your code in Python (it still is very useful for scripting codes written in C, C++, and Fortran). As you can see, large numbers of loops over simple operations will kill you.

So run, don’t walk, to either numarray or Numeric (aka numpy). You goal should be to do as much in array manipulations as possible rather than explicit loops. Some problems lend themselves well to that, some don’t. This looks like it lends itself moderately well (it will still be slower than corresponding C code by a factor of at least a few) but should be way faster than the way you did it. The sample I show below is wasteful in that it will compute the random walk for all steps, not just the “arrest”. I’m guessing that as long as the average number for the arrest is large enough, it isn’t too wasteful. There are other ways of coding this that are a little more complex that will approach C speed. Coding things using array manipulations require thinking about the problem in a different way. At some point, the contortions to do that aren’t worth it, but this doesn’t appear to be one of those cases.

So here is an alternate way of coding your example (note that it is a lot shorter too) using numarray (why numarray? Because we wrote it :-). The results don’t seem to agree quite with yours. I probably have an off by 2 error in what you call y at least but that doesn’t account for all of it. This version runs about 10 times faster on my computer as compared to the non psyco version, 3 seconds vs 32 seconds (though I’m not folding the the startup time of starting python and importing numarray). It can be made faster still by performing the calculation on the ensemble of drunkards (as a 1-d array), and only propagating the calculation for those that haven’t been arrested (each iteration results in a smaller ensemble).

You can also look at tools like weave in scipy for speeding up numerical computations (it can convert expressions to C). It depends on how close to C you want to get.

from numarray import *
from numarray.random_array import randint
from numarray.nd_image import histogram

MAXSTEPS = 300
NUM_DRUNKARDS = 20000
J = 12
DIRECTIONS = 1

def rw():
# This will use a lot of memory if maxsteps * num_drunkards is
large.
# In that case just loop over a large ensemble. The overhead of looping
# over large computations is negligible.
rd = randint(0,2, (NUM_DRUNKARDS, MAXSTEPS))
# turn into -1, +1 values
rd = 2*rd - 1
# accumulate for net distance along axis (not hard to generalize
# to n dimensions)
dist = abs(add.accumulate(rd, axis=1))
# select cases where arrest occurs
arrests = where(logical_or.reduce(dist >= J, axis=1))
print “number of arrests = “, len(arrests[0]), ” out of “,
NUM_DRUNKARDS, ” trials”
# Now construct histogram
# First determine location of first arrest
arrestdist = dist[arrests] # selecting only cases with arrests
arreststep = (arrestdist >= J).argmax() # index location of first
dist=j
arresthist = histogram(arreststep,0,MAXSTEPS, MAXSTEPS+1)

# Not quite following what was meant by survivor and the filter, but
# this seems to be the equivalent calculation (but my results don’t
agree)

survivors = arresthist[1::2].astype(Float32) # only even values can
be nonzero
x = survivors.copy()
x[where(x == 0.)] = e
x = log(x) # you did want natural logs, right?
y = arange(len(x))*2
yerr = 0.*survivors
nzs = where(survivors > 0)
yerr[nzs] = 1./sqrt(survivors[nzs])
for i in arange(len(y)): # instead of to a file as the example does
print y[i], x[i], yerr[i]

#10 florian on 12.27.04 at 12:02 pm

I’ve somewhat shortiefied/speedified the 1d example.
see source here http://codeflow.org/files/1d_0

#11 Birdman's land on 12.27.04 at 3:04 pm

Python kinda sorta slow

#12 Wolfgang on 12.27.04 at 4:38 pm

The ‘Python code for a 1 dimensional potential well’ measures nothing more as that python function calls are slow.
Use the python profiler to see this fact. Remove unused code and clean up all for higher readability.

Read:

http://trific.ath.cx/resources/python/optimization/
http://www.python.org/doc/essays/list2str.html
http://manatee.mojam.com/~skip/python/fastpython.html
http://www.python.org/doc/faq/programming.html
Part 1.1.5 My program is too slow. How do I speed it up?

#13 Paul Winkler on 12.28.04 at 2:12 am

Re. a previous comment:

“”"Learn list comprehensions. Python loves them and optimizes the heck out of them. If you are coding the construct “for i in range(len(foo)): foo[i] = something(foo[i])” a lot, try using “[something(x) for x in foo]”
“”"

I don’t believe this is true at all. I have always heard that list comprehensions are simply syntactic sugar for the equivalent for loop, with maybe some allocation-time savings if the size of the list can be guessed. I tried your specific example and measured NO benefit to using list comprehensions for that case. I’d love to be proven wrong, though.

#14 Al Christians on 12.28.04 at 6:15 am

Python is much slower than C, and compiled BASIC with the most popular BASIC compiler is relatively close to C in speed. You only notice the difference rarely, maybe 5% of all programs where it makes sufficient difference that you can even tell how slow it is. This program is one of those. No big deal — even if you notice a speed difference, it doesn’t mean that Python is not the preferred tool. Suppose a program like this runs in 1 minute if written in some bad language and takes 20 minutes if done in Python. Python costs you 19 minutes of run time, but CPU time on a desktop is the closest thing to free that we have. The most expensive scenario is if you spend the 19 minutes watching the program run, and it costs you 19 minutes of people time. Didn’t python save you 19 minutes writing it?

Do the math. Figure out, based on what the total lifetime run time of this program will be, how much the total run time goes up from using python. (I figure that research calculations like this are not re-run many times.) Try to guess how much people time you will save writing and testing this program in python. Figure what the advantage is in time saved of staying with 1 high-productivity language as much as possible.

#15 Florian on 12.28.04 at 10:51 am

Al:
Python of course is slower on a per instruction level then C, period. If you’re not happy with that, use pyrex. If you’re not happy with that use swig/boost-python. If you’re still unhappy read along.

The benefit of python is not onyl implementation time alone, that’s wrong arguing.

The benefit of clear, concise, uncluttered and short code is time-savings on implementation and maintenance, less bugs and greater speed.

Any sufficently large problem is hard to express in C or Assembler alone. Thus projects implement half of a lisp-parser to express the domain problems. By that time they’ve bought in slowness of a parser and uglyness of a low-level language coupled with a half-bred language. Congrats to that.

What you have to realize is that a big factor for speed is algorithmic intelligence. And when your algorithms are expressed as clearly and shortly as possible, it is much easier to archive this goal.

#16 George Paci on 12.28.04 at 9:54 pm

You’re calling random.seed() every time you instantiate a drunkard. Unless you’re running python 2.4 and up, and know that your OS’s source of random numbers is good, you should only call seed() once. Mac OS X’s source is bad (probably the system time), leading to very un-random results. This may have something to do with your scatter mismatch.

#17 Toby Donaldson on 01.01.05 at 6:14 am

Another simple idea that results in a big speed-up is to simply get rid of the RandomWalk class. In the code as given, the RandomWalk class does nothing more than inefficiently maintain a single variable as a list. Replacing the RandomWalk object with an integer variable and in-lining the walk and arrested method result in a run-time of about 1.5s, compared to 15s for the original program. Including psyco gets it to under a second.

#18 Anon Y Mouse on 01.16.05 at 9:58 pm

As mentioned before, refactoring the code a bit to make it work with psyco will speed it up quite a bit. Move the psyco import into the body of the module, rather than in the if __name__ conditional, and move the main loop into its own function. This should give a 300% improvement in speed.

Also, move some of the array initilizations into list comprehensions, replace conditionals with inline funtions where possible, avoid randint() (use random() instead), and dereference array variables once when used multiple times in a loop. Those changes will give you another 30% or so improvement.

How fast was the BASIC version?

#19 Kirby Urner on 04.11.06 at 3:37 pm

None of the original code (Python) seems available when I click on the links. Internal server error.

#20 Peter Bowyer on 04.11.06 at 8:58 pm

Thanks Kirby, this server executed Python unlike my old hosting provider. The files are now accessible again.

#21 sergio on 08.01.06 at 4:30 am

If math speed is your primary concern, a dynamic high-level script language like Python is not a good choice. You can optimize until your blue in the face. Why? Make a Python C module out of the critical code and let Python do the rest.

#22 Eduardo Costa on 02.02.08 at 4:54 am

People keep claiming that Python is slow, but more productive than other languages. This is not true; I mean, it is true that Python is slow, but not true that it is more productive than other languages. For instance, Clean is very productive (if you use things like list comprehension, and array comprehension). The same can be said about OCAML. Notwithstanding, these languages are very fast, almost as fast as C, and faster in a few cases. They are faster than Fortran, for instance. Clean is very good to detect errors too. Errors are detected during compilation, which is much better. Besides this, the claim that Python programs are concise is greatly exagerated; for instance, the program mandelbrot, in the Computer Language Benchmarks Game, takes 32 lines in Python, and 31 lines in Clean; therefore, the programs have roughly the same size. But the program in Clean is 65 times faster. The recursive program has also the same size in both languages, but it is 177 times faster in Clean. Python beats Clean only in the amount of memory used. In fact , Python uses much more memory than Clean; however, there are experts that claim that languages should use little memory; if they are right, then Python will be defeated in all items.

#23 ron_paulite on 05.05.08 at 9:09 pm

i am not really a programmer, but i think python is really slow. zope/pone is written in python. i think Google Apps is written in python and Google Apps is not fast.
facebook and mediawiki are written in PHP and they are fast!
ASP is sloooow
Phython is slooow
Ruby is sloow
Java is slow
PHP is okay
C/C++ is superfast!

#24 Mike on 06.15.08 at 1:26 pm

Of course Python is slow, and it will remain substantially slower than compiled languages, no matter how you tweak the code. Also notice how all the above suggestions detract from Python’s purported strength - productivity. Instead of focusing on your algorithm, they urge you to try all kinds of tweaks and focus on implementation details.

Numeric Python is great for its scope but not so much for its speed.

Leave a Comment