Default behaviour of comparisons in Python

While browsing DrProject’s code base a moment ago, I came across this piece of code in drproject.project.Project (I abbreviated the long expression a bit):

    def get_members(self):
        members = [m.user for m in Membership.query.filter_by(...)]
        members.sort()
        return members

I was curious about what the sort actually did. So, members is a list of drproject.project.User objects. The class User extends Elixir’s Entity class. Neither of these classes appeared to have overridden the comparison operators. EntityMeta didn’t seem to do anything either. And the sort method was being called with no comparator function passed in. What on earth is going on?

The Python language reference page on comparisons was not very helpful, so I had to make some guesses. Since User objects are database rows, maybe they are sorted by their primary keys? But the absence of any comparison code argued against this possibility. Based on my experience with object-oriented languages, the only other thing I could think of was the IDs of the objects, that is, their memory addresses. It was a good candidate, since IDs are comparable for all objects.

A quick experiment in a Python session gives evidence for this behaviour:

In [1]: import random
In [2]: randcmp = lambda x, y: random.choice([-1, 0, +1])

In [3]: things = [object() for i in range(10)]
In [4]: things  # The IDs are in ascending order merely by coincidence.
Out[4]:
[<object object at 0xb7d6d598>,
 <object object at 0xb7d6d5a0>,
 <object object at 0xb7d6d5a8>,
 <object object at 0xb7d6d5b0>,
 <object object at 0xb7d6d5b8>,
 <object object at 0xb7d6d5c0>,
 <object object at 0xb7d6d5c8>,
 <object object at 0xb7d6d5d0>,
 <object object at 0xb7d6d5d8>,
 <object object at 0xb7d6d5e0>]

In [5]: things.sort(randcmp)
In [6]: things  # Now the IDs are in random order.
Out[6]:
[<object object at 0xb7d6d5c8>,
 <object object at 0xb7d6d5a8>,
 <object object at 0xb7d6d5d0>,
 <object object at 0xb7d6d598>,
 <object object at 0xb7d6d5a0>,
 <object object at 0xb7d6d5b0>,
 <object object at 0xb7d6d5d8>,
 <object object at 0xb7d6d5b8>,
 <object object at 0xb7d6d5c0>,
 <object object at 0xb7d6d5e0>]

In [7]: things.sort()
In [8]: things  # My gosh, we have ascending IDs again!
Out[8]:
[<object object at 0xb7d6d598>,
 <object object at 0xb7d6d5a0>,
 <object object at 0xb7d6d5a8>,
 <object object at 0xb7d6d5b0>,
 <object object at 0xb7d6d5b8>,
 <object object at 0xb7d6d5c0>,
 <object object at 0xb7d6d5c8>,
 <object object at 0xb7d6d5d0>,
 <object object at 0xb7d6d5d8>,
 <object object at 0xb7d6d5e0>]
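
For anyone repeating this on Python 3, the session above will not run as written: list.sort no longer accepts a comparator function, and plain objects are not orderable by default. Here is a minimal sketch of the same experiment, assuming CPython (where id() happens to be the memory address). random.shuffle stands in for my random comparator, and key=id asks explicitly for the ordering that Python 2 applied silently:

```python
import random

# Ten plain objects with no comparison methods defined.
things = [object() for _ in range(10)]

# Scramble the list; shuffle replaces the random comparator used above.
random.shuffle(things)

# Python 3 refuses to order plain objects implicitly...
try:
    things.sort()
    default_sort_failed = False
except TypeError:
    default_sort_failed = True

# ...so the ID-based ordering has to be requested explicitly.
things.sort(key=id)
ids = [id(t) for t in things]
```

After this runs, default_sort_failed should be True and ids should be in ascending order, which matches what the Python 2 session showed without being asked.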

Is this behaviour unintuitive, and does it let you shoot yourself in the foot easily? You be the judge.

And going back to the original DrProject code, the sort indeed sorts by IDs, which is probably not very useful.
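
If a meaningful order is wanted, one option is to give sort an explicit key. The sketch below is only an illustration: the User class here is a hypothetical stand-in with an assumed username attribute, and the real drproject.project.User may look quite different.

```python
from operator import attrgetter


class User:
    """Hypothetical stand-in for DrProject's User; real attributes may differ."""

    def __init__(self, username):
        self.username = username


members = [User("carol"), User("alice"), User("bob")]

# Sort by an explicit, meaningful key instead of relying on object identity.
members.sort(key=attrgetter("username"))
names = [m.username for m in members]
```

The key argument has been available since Python 2.4, so something along these lines would also have worked in the original code.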


4 Comments on “Default behaviour of comparisons in Python”

  1. Greg Wilson Says:

    So is the sort useful in this context?

  2. Liam Clarke Says:

    So why are they not passing a custom sort function to the sort method? Seems like a bit of superstitious code to me.


  3. The only reason I can think of is this: you want to show pages with a consistent ordering of items. The sort criterion doesn’t matter, but, for example, if you leave the item list page to go to an edit page and then return to the item list page, you want the items to be in the same positions. I think I’ve even done this in several of my apps (but I usually write a TODO in that code to add real sorting criteria).


  4. Duh, if you don’t provide a comparison method for your class, Python can’t choose a relevant comparison method. Anyway, this changes in Python 3.0:
    >>> class A(object): pass
    ...
    >>> sorted([A(),A()])
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: unorderable types: A() < A()

    (In Python 2.5, it returns [<__main__.A object at 0x82b032c>, <__main__.A object at 0x82b0e4c>], worthlessly.)
    So, the original code would have thrown an error instead of wasting cycles if it had been run under Python 3.0. Obviously, the Python 3 behaviour is more appropriate in general, but unfortunately it can’t be adopted in Python 2, since it would break code that depends on the current behaviour.

