Friday 7 March 2014

Python Equivalents of LINQ Methods

In my last post, I looked at how Python’s list comprehensions and generators allow you to achieve many of the same tasks that you would use LINQ for in C#. In this post, we’ll look at Python equivalents for some of the most popular LINQ extension methods. We’ll mostly be looking at Python’s built-in functions and itertools module.

For these examples, our test data will be a list of fruit, but all of these techniques work with any iterable, including the output of generator functions. Here’s our Python test data:

fruit = ['apple', 'orange', 'banana', 'pear', 
         'raspberry', 'peach', 'plum']

which of course in C# is:

var fruit = new List<string>() { "apple", "orange",
 "banana", "pear", "raspberry", "peach", "plum" };

Any & All

LINQ’s Any method allows you to test whether any of the items in a sequence fulfil a certain requirement, while All checks if all of them do. Python’s built-in functions are named the same, so it’s really straightforward. Let’s see if any of our fruit contain the letter “e”, then see if all of them do:

>>> any("e" in f for f in fruit)
True
>>> all("e" in f for f in fruit)
False

In LINQ:

fruit.Any(f => f.Contains("e"));
fruit.All(f => f.Contains("e"));

Min & Max

Again, Python has built-in functions with the same names as their LINQ counterparts. Let’s find the minimum and maximum fruit lengths:

>>> max(len(f) for f in fruit)
9
>>> min(len(f) for f in fruit)
4

which are the equivalents of:

fruit.Max(f => f.Length);
fruit.Min(f => f.Length);
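
Both built-ins also accept a key function, in which case they return the fruit itself rather than its length, and (from Python 3.4) a default value for empty input. A minimal sketch using the same test data; the variable names are only for illustration:

```python
fruit = ['apple', 'orange', 'banana', 'pear',
         'raspberry', 'peach', 'plum']

# key= makes max/min return the element rather than the measured value
longest = max(fruit, key=len)
shortest = min(fruit, key=len)   # on a tie, the first match wins

# default= avoids a ValueError when the iterable turns out to be empty
longest_q = max((len(f) for f in fruit if f.startswith("q")), default=0)

print(longest, shortest, longest_q)
```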

Take, Skip, TakeWhile & SkipWhile

LINQ’s Take and Skip methods are very useful for paging data, or limiting the amount you process, and TakeWhile and SkipWhile come in handy from time to time as well (TakeWhile can be a good way of checking for user cancellation).

Take and Skip can be implemented using the islice function from the itertools module. We can specify a stop index, or a start and stop index. If the stop index is None, islice keeps going to the end of the iterable. I’d prefer functions actually called “skip” and “take”, as I think that makes for more readable code, but they could easily be created if needed.

Here’s Take(2) and Skip(2) implemented with Python. Since islice returns a lazy iterator rather than a list, I turn it into a list to display the results:

>>> from itertools import islice
>>> list(islice(fruit, 2))
['apple', 'orange']
>>> list(islice(fruit, 2, None))
['banana', 'pear', 'raspberry', 'peach', 'plum']

islice does have the benefit, though, of letting you combine a skip and a take into one step rather than chaining them as you would in C#:

fruit.Skip(2).Take(2);

With islice:

>>> list(islice(fruit, 2, 4))
['banana', 'pear']
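
If you would rather have the LINQ names, thin wrappers around islice are one way to get them. This is only a sketch: “take” and “skip” are hypothetical helper names, not part of itertools.

```python
from itertools import islice

def take(iterable, n):
    """Return an iterator over at most the first n items (like LINQ's Take)."""
    return islice(iterable, n)

def skip(iterable, n):
    """Return an iterator over everything after the first n items (like LINQ's Skip)."""
    return islice(iterable, n, None)

fruit = ['apple', 'orange', 'banana', 'pear',
         'raspberry', 'peach', 'plum']

# Equivalent of fruit.Skip(2).Take(2)
page = list(take(skip(fruit, 2), 2))
print(page)
```

Because both helpers return lazy iterators, they chain without building intermediate lists.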

The itertools module also includes a “takewhile” function, and the equivalent of LINQ’s SkipWhile is “dropwhile”. With these functions you will usually want Python’s lambda syntax, which is a rare example of where the Python is less succinct than the C#.

>>> from itertools import takewhile
>>> list(takewhile(lambda c: len(c) < 7, fruit))
['apple', 'orange', 'banana', 'pear']
>>> from itertools import dropwhile
>>> list(dropwhile(lambda c: len(c) < 7, fruit))
['raspberry', 'peach', 'plum']

Here’s the same TakeWhile and SkipWhile in C#:

fruit.TakeWhile(f => f.Length < 7);
fruit.SkipWhile(f => f.Length < 7);

First, FirstOrDefault, & Last

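With LINQ you can easily get the first item from an IEnumerable using First. This throws an exception if the sequence is empty, so FirstOrDefault can be used instead. In Python, the built-in “next” function does the same job, but it needs an iterator such as a generator expression; calling it directly on a list raises a TypeError unless you wrap the list in iter() first. Without a default value, next raises StopIteration when the iterator is empty. Let’s use Python to get the first fruit starting with “p”, and to return a default value when our generator looking for the first fruit starting with “q” doesn’t find any elements: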

>>> next(f for f in fruit if f.startswith("p"))
'pear'
>>> next((f for f in fruit if f.startswith("q")), "none")
'none'
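
Since next only accepts iterators, a small wrapper can hide the iter() call and the optional filtering. The following is a sketch under that assumption; “first_or_default” is a hypothetical helper name, not a built-in.

```python
def first_or_default(iterable, predicate=None, default=None):
    """Return the first item (optionally the first matching predicate), or default."""
    if predicate is None:
        # iter() turns any iterable, including a plain list, into an iterator
        return next(iter(iterable), default)
    return next((x for x in iterable if predicate(x)), default)

fruit = ['apple', 'orange', 'banana', 'pear',
         'raspberry', 'peach', 'plum']

print(first_or_default(fruit))
print(first_or_default(fruit, lambda f: f.startswith("p")))
print(first_or_default(fruit, lambda f: f.startswith("q"), "none"))
```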

There does not seem to be any built-in Python function to implement LINQ’s “Last” or “LastOrDefault” methods, but you could quite easily create one. Here’s a fairly rudimentary one:

>>> def lastOrDefault(sequence, default=None):
...     lastItem = default
...     for s in sequence:
...         lastItem = s
...     return lastItem
...
>>> lastOrDefault((f for f in fruit if f.endswith("e")))
'orange'
>>> lastOrDefault((f for f in fruit if f.startswith("x")), "no fruit found")
'no fruit found'

You could do the same if you really needed the LINQ “Single” or “SingleOrDefault” methods, which also have no direct equivalent.
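
As a rough illustration, here is one way a SingleOrDefault might look. The name “single_or_default” and the choice of ValueError are assumptions made for this sketch; LINQ itself throws InvalidOperationException when there is more than one element.

```python
def single_or_default(iterable, default=None):
    """Return the only item, default if there are none, or raise if there are several."""
    it = iter(iterable)
    result = next(it, default)
    # A unique sentinel lets us detect a second item even if that item is None
    sentinel = object()
    if next(it, sentinel) is not sentinel:
        raise ValueError("sequence contains more than one element")
    return result

fruit = ['apple', 'orange', 'banana', 'pear',
         'raspberry', 'peach', 'plum']

print(single_or_default(f for f in fruit if f.startswith("r")))
print(single_or_default((f for f in fruit if f.startswith("x")), "no fruit found"))
```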

Count

The LINQ Count extension method lets you count how many items are in a sequence. For example, how many fruit begin with “p”?

fruit.Count(f => f.StartsWith("p"))

You might expect Python’s “len” function to do the same, but len only works on objects that know their own size, such as lists; you can’t call it on a generator. There is a neat trick, though, that you can use with the “sum” built-in function:

>>> sum(1 for f in fruit if f.startswith("p"))
3
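
If you use this a lot, the trick can be wrapped in a function. A minimal sketch, where “count” is a hypothetical helper name:

```python
def count(iterable, predicate=None):
    """Count all items, or only those matching predicate (like LINQ's Count)."""
    if predicate is None:
        return sum(1 for _ in iterable)
    return sum(1 for x in iterable if predicate(x))

fruit = ['apple', 'orange', 'banana', 'pear',
         'raspberry', 'peach', 'plum']

p_count = count(fruit, lambda f: f.startswith("p"))
total = count(f for f in fruit)   # works on a generator, where len() would fail

# Booleans are integers in Python, so summing the tests directly also works
p_count_alt = sum(f.startswith("p") for f in fruit)
print(p_count, total, p_count_alt)
```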

Select & Where

We saw in the last blog post that a list comprehension already includes the capabilities of LINQ’s Select and Where, but there may be times you want them to be available as functions. Python’s “map” and “filter” functions take a function (often a lambda) and an iterable, and return an iterator (in Python 3; in Python 2 they returned lists). Here are a couple of simple examples of them in action, with the output turned into a list to display the results:

>>> list(map(lambda x: x.upper(), fruit))
['APPLE', 'ORANGE', 'BANANA', 'PEAR', 'RASPBERRY', 'PEACH', 'PLUM']
>>> list(filter(lambda x: "n" in x, fruit))
['orange', 'banana']
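
A Where followed by a Select becomes a filter nested inside a map. The sketch below also passes str.upper directly instead of wrapping it in a lambda, which is possible whenever an existing function already has the right shape:

```python
fruit = ['apple', 'orange', 'banana', 'pear',
         'raspberry', 'peach', 'plum']

# Roughly fruit.Where(f => f.Contains("n")).Select(f => f.ToUpper())
shouted = map(str.upper, filter(lambda f: "n" in f, fruit))

# Nothing has been evaluated yet; list() drives the iteration
result = list(shouted)
print(result)
```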

 

GroupBy

At first glance it might appear that the groupby function in itertools behaves the same as LINQ’s GroupBy, but there is a gotcha. Python’s groupby only groups consecutive items that share a key, so the incoming data needs to be sorted by that key first, which means calling sorted. This example shows us first trying to group without sorting (resulting in two “p” groups), and then doing it the right way. We’re grouping by the first letter of the fruit, and I’m using a helper function to print out the contents of the grouped data:

>>> def printGroupedData(groupedData):
...     for k, v in groupedData:
...         print("Group {} {}".format(k, list(v)))
...
>>> from itertools import groupby
>>> keyFunc = lambda f: f[0]
>>> printGroupedData(groupby(fruit, keyFunc))
Group a ['apple']
Group o ['orange']
Group b ['banana']
Group p ['pear']
Group r ['raspberry']
Group p ['peach', 'plum']
>>> sortedFruit = sorted(fruit, key=keyFunc)
>>> printGroupedData(groupby(sortedFruit, keyFunc))
Group a ['apple']
Group b ['banana']
Group o ['orange']
Group p ['pear', 'peach', 'plum']
Group r ['raspberry']
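
LINQ’s GroupBy doesn’t need sorted input and returns groups in the order their keys first appear. If you want that behaviour in Python, one option is to collect the groups in a dictionary instead of using itertools. This is only a sketch: “group_by” is a hypothetical helper name, and unlike groupby it is eager and holds every item in memory.

```python
from collections import defaultdict

def group_by(iterable, key):
    """Group items by key(item), keeping groups in order of first appearance."""
    groups = defaultdict(list)
    for item in iterable:
        groups[key(item)].append(item)
    # Plain dicts preserve insertion order from Python 3.7 onwards
    return dict(groups)

fruit = ['apple', 'orange', 'banana', 'pear',
         'raspberry', 'peach', 'plum']

grouped = group_by(fruit, lambda f: f[0])
for k, v in grouped.items():
    print("Group {} {}".format(k, v))
```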

OrderBy

As we saw above, the “sorted” built-in function in Python can be used to order a sequence. It returns a list, but this is understandable since to implement OrderBy it must iterate through the entire sequence first. Here we sort the fruit by their string length:

>>> sorted(fruit, key=len)
['pear', 'plum', 'apple', 'peach', 'orange', 'banana', 'raspberry']
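
The same function covers OrderByDescending and ThenBy. A short sketch: reverse=True flips the order, and a tuple key sorts by each element in turn.

```python
fruit = ['apple', 'orange', 'banana', 'pear',
         'raspberry', 'peach', 'plum']

# Roughly fruit.OrderByDescending(f => f.Length); the sort is stable,
# so fruit of equal length keep their original relative order
longest_first = sorted(fruit, key=len, reverse=True)

# Roughly fruit.OrderBy(f => f.Length).ThenBy(f => f)
by_length_then_name = sorted(fruit, key=lambda f: (len(f), f))

print(longest_first)
print(by_length_then_name)
```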

Distinct

As far as I can tell there isn’t a built-in function in Python to emit a distinct iterable sequence, but the easiest way is probably to just construct a set, provided you don’t need to preserve the original order. If you want a generator function that keeps the order and lets you stop early before reaching the end of a sequence, you could create your own helper:

def distinct(sequence):
    seen = set()
    for s in sequence:
        if s not in seen:
            seen.add(s)
            yield s
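
Here is a sketch of the helper in use, repeated so that the example runs on its own, alongside dict.fromkeys as an eager alternative that also keeps first-seen order (guaranteed from Python 3.7). The “letters” list is made up for illustration.

```python
from itertools import cycle, islice

def distinct(sequence):
    seen = set()
    for s in sequence:
        if s not in seen:
            seen.add(s)
            yield s

fruit = ['apple', 'orange', 'banana', 'pear',
         'raspberry', 'peach', 'plum']
letters = [f[0] for f in fruit]   # contains 'p' three times

lazy_unique = list(distinct(letters))
eager_unique = list(dict.fromkeys(letters))

# Because distinct is lazy, it can be cut short even on an endless input
first_three = list(islice(distinct(cycle(letters)), 3))
print(lazy_unique, eager_unique, first_three)
```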

Zip

The last example I’ll look at is the Zip method. Python has an equivalent zip function, and it is actually a little simpler because it assumes you want a tuple, whereas with LINQ you need to supply a result selector function explicitly. It also supports zipping more than two sequences together, which is nice. As with LINQ’s Zip, the resulting sequence is the length of the shortest input. Here’s a quick example of the Python zip function in action:

>>> recipes = ['pie','juice','milkshake']
>>> list(zip(fruit,recipes))
[('apple', 'pie'), ('orange', 'juice'), ('banana', 'milkshake')]
>>> list(f + " " + r for f,r in zip(fruit,recipes))
['apple pie', 'orange juice', 'banana milkshake']
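
To illustrate zipping more than two sequences, here is a sketch with a third, made-up “prices” list. It also shows zip_longest from itertools, which pads the shorter inputs instead of stopping at the shortest:

```python
from itertools import zip_longest

fruit = ['apple', 'orange', 'banana', 'pear',
         'raspberry', 'peach', 'plum']
recipes = ['pie', 'juice', 'milkshake']
prices = [2.5, 1.75, 3.0]   # hypothetical data for the example

# Three inputs give three-element tuples; zip stops at the shortest input
triples = list(zip(fruit, recipes, prices))

# zip_longest keeps going and fills the gaps with fillvalue
padded = list(zip_longest(fruit[:4], recipes, fillvalue='?'))

print(triples)
print(padded)
```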

Conclusion

As can be seen, most of the main LINQ extension methods have fairly close Python equivalents, and those that don’t could be quite easily recreated. I don’t pretend to be an expert on Python, so if I’ve missed any cool tricks, let me know in the comments.
