Once again, in this series of posts I look at the parts of the .NET Framework that may seem trivial, but can help improve your code by making it easier to write and maintain. The index of all my past little wonders posts can be found here.
I’ve covered many valuable methods from the System.Linq namespace before, so you already know it’s packed with extension-method goodness. Today I’d like to cover two small families I’ve neglected to mention before: Skip() and Take(). While these methods seem so simple, they are an easy way to create sub-sequences for IEnumerable<T>, much the way GetRange() creates sub-lists for List<T>.
Skip() and SkipWhile()
The Skip() family of methods is used to ignore items in a sequence until either a certain number have been passed, or until a certain condition becomes false. This makes the methods great for starting a sequence at a point possibly other than the first item of the original sequence.
The Skip() family of methods contains the following methods (shown below in extension method syntax):
Skip(int count)
Ignores the specified number of items and returns a sequence starting at the item after the last skipped item (if any).
SkipWhile(Func<T, bool> predicate)
Ignores items as long as the predicate returns true and returns a sequence starting with the first item for which the predicate returns false (if any).
SkipWhile(Func<T, int, bool> predicate)
Same as above, but passes not only the item itself to the predicate, but also the index of the item.
For example:
    var list = new[] { 3.14, 2.72, 42.0, 9.9, 13.0, 101.0 };

    // sequence contains { 2.72, 42.0, 9.9, 13.0, 101.0 }
    var afterFirst = list.Skip(1);
    Console.WriteLine(string.Join(", ", afterFirst));

    // sequence contains { 42.0, 9.9, 13.0, 101.0 }
    var fromFirstDoubleDigit = list.SkipWhile(v => v < 10.0);
    Console.WriteLine(string.Join(", ", fromFirstDoubleDigit));
Note that SkipWhile() stops skipping at the first item for which the predicate returns false and returns everything from that item to the end of the sequence, even if later items would also satisfy the predicate (otherwise, you’d probably be using Where() instead, of course).
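To make the contrast with Where() concrete, here is a minimal sketch using the same values as the example above. The class and method names are hypothetical and exist only for illustration.

```csharp
using System;
using System.Linq;

// Hypothetical demo class; only SkipWhile() and Where() come from System.Linq.
public static class SkipWhileVersusWhere
{
    private static readonly double[] Values = { 3.14, 2.72, 42.0, 9.9, 13.0, 101.0 };

    // Drops only the leading run of values below 10. The predicate is not
    // consulted again once it has returned false, so 9.9 survives.
    public static double[] FromFirstDoubleDigit()
    {
        return Values.SkipWhile(v => v < 10.0).ToArray();
    }

    // Tests every item, so 9.9 is filtered out as well.
    public static double[] AllDoubleDigitOrMore()
    {
        return Values.Where(v => v >= 10.0).ToArray();
    }

    public static void Main()
    {
        Console.WriteLine(FromFirstDoubleDigit().Length);  // 4 items: 42.0, 9.9, 13.0, 101.0
        Console.WriteLine(AllDoubleDigitOrMore().Length);  // 3 items: 42.0, 13.0, 101.0
    }
}
```

SkipWhile() cares about where the sequence starts, while Where() cares about which items qualify.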
If you do use the form of SkipWhile() which also passes an index into the predicate, then you should keep in mind that this is the index of the item in the sequence you are calling SkipWhile() from, not the index in the original collection.
That is, consider the following:
    var list = new[] { 1.0, 1.1, 1.2, 2.2, 2.3, 2.4 };

    // Skip the first two items, then keep skipping
    // while the item is greater than its index
    var whatAmI = list
        .Skip(2)
        .SkipWhile((item, index) => item > index);
In this example the result is { 2.4 }, and not { 1.2, 2.2, 2.3, 2.4 } as some might expect. The key is knowing which index is passed to the predicate in SkipWhile(). In the code above, because Skip(2) skips 1.0 and 1.1, the sequence passed to SkipWhile() begins at 1.2, and thus it considers the “index” of 1.2 to be 0 and not 2. With indexes 0 through 3, the predicate (item greater than index) holds for 1.2, 2.2 and 2.3 and first fails at 2.4, which is not greater than 3. This same logic applies when using any of the extension methods that have an overload that allows you to pass an index into the delegate, such as SkipWhile(), TakeWhile(), Select(), Where(), etc.
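If you do need the position in the original collection, one option is to capture it with the indexed Select() overload before skipping anything. The sketch below is only an illustration under that assumption; the class, method, and property names (Item, OriginalIndex) are made up for the example.

```csharp
using System;
using System.Linq;

// Hypothetical demo class showing which index an indexed delegate receives.
public static class IndexedOverloadDemo
{
    private static readonly double[] Values = { 1.0, 1.1, 1.2, 2.2, 2.3, 2.4 };

    // The indexes an indexed delegate sees after Skip(2): they restart at 0.
    public static int[] IndexesSeenAfterSkip()
    {
        return Values.Skip(2).Select((item, index) => index).ToArray();
    }

    // Record each item's original position first, then skip and test against it.
    public static double[] SkipWhileUsingOriginalIndex()
    {
        return Values
            .Select((item, index) => new { Item = item, OriginalIndex = index })
            .Skip(2)
            .SkipWhile(pair => pair.Item > pair.OriginalIndex)
            .Select(pair => pair.Item)
            .ToArray();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", IndexesSeenAfterSkip()));  // 0, 1, 2, 3
        Console.WriteLine(SkipWhileUsingOriginalIndex().Length);       // 4 items: 1.2, 2.2, 2.3, 2.4
    }
}
```

Because 1.2 is not greater than its original index of 2, nothing further is skipped in the second method, which is the result some readers expect from the shorter query.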
It should also be noted that it’s fine to Skip() more items than exist in the sequence (an empty sequence is the result), or even to Skip(0), which results in the full sequence. So why would it ever be useful to return Skip(0) deliberately? One reason might be to expose a List<T> as a read-only sequence.
Consider this class:
    using System.Collections.Generic;
    using System.Collections.ObjectModel;
    using System.Linq;

    public class MyClass
    {
        private readonly List<int> _myList = new List<int>();

        // works on the surface, but one can cast back to List<int> and mutate the original...
        public IEnumerable<int> OneWay
        {
            get { return _myList; }
        }

        // works, but still implements IList<int>, whose Add() etc. throw at runtime if accidentally called
        public ReadOnlyCollection<int> AnotherWay
        {
            get { return new ReadOnlyCollection<int>(_myList); }
        }

        // read-only, can't be cast back to List<int>, doesn't have methods that throw at runtime
        public IEnumerable<int> YetAnotherWay
        {
            get { return _myList.Skip(0); }
        }
    }
This code snippet shows three (among many) ways to expose an internal list with varying levels of protection. Obviously, if you just return the list as IEnumerable<T> without doing anything more, there’s always the danger the caller could cast it back to List<T> and mutate your internal structure. You could also return a ReadOnlyCollection<T>, but it still implements IList<T>, so the mutating methods are reachable through that interface; they just throw at runtime when called instead of giving compiler errors. Finally, you can return the internal list as a sequence using Skip(0), which skips no items and just runs an iterator through the list. The result is an iterator, which cannot be cast back to List<T>.
Of course, there are many ways to do this (including just cloning the list, etc.), but the point is that it illustrates a potential use of an explicit Skip(0).
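A quick way to see the difference is to check what a caller can cast each result back to. This is a minimal sketch with hypothetical names. It relies on how Skip() is implemented rather than on a documented guarantee, so treat the second result as an observation about the runtimes I’m aware of. Note also that the Skip(0) sequence is a live view, not a snapshot, so later changes to the list show up when it is enumerated.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class; not part of the MyClass example above.
public static class SkipZeroDemo
{
    // True when the caller could cast the sequence back to the mutable list.
    public static bool IsTheListItself(IEnumerable<int> sequence)
    {
        return sequence is List<int>;
    }

    public static void Main()
    {
        var inner = new List<int> { 1, 2, 3 };

        IEnumerable<int> direct = inner;           // the list itself, typed as a sequence
        IEnumerable<int> wrapped = inner.Skip(0);  // an iterator over the list

        Console.WriteLine(IsTheListItself(direct));   // True
        Console.WriteLine(IsTheListItself(wrapped));  // False

        // The iterator is evaluated lazily, so it sees later additions.
        inner.Add(4);
        Console.WriteLine(wrapped.Count());           // 4
    }
}
```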
Take() and TakeWhile()
The Take() and TakeWhile() methods can be thought of as somewhat of the inverse of Skip() and SkipWhile(). That is, while Skip() ignores the first X items and returns the rest, Take() returns a sequence of the first X items and ignores the rest.
Since they are somewhat of an inverse of each other, it makes sense that their calling signatures are identical (beyond the method name obviously):
Take(int count)
Returns a sequence containing up to the specified number of items. Anything after the count is ignored.
TakeWhile(Func<T, bool> predicate)
Returns a sequence containing items as long as the predicate returns true. Anything from the point the predicate returns false and beyond is ignored.
TakeWhile(Func<T, int, bool> predicate)
Same as above, but passes not only the item itself to the predicate, but also the index of the item.
So, for example, we could do the following:
    var list = new[] { 1.0, 1.1, 1.2, 2.2, 2.3, 2.4 };

    // sequence contains 1.0 and 1.1
    var firstTwo = list.Take(2);

    // sequence contains 1.0, 1.1, 1.2
    var underTwo = list.TakeWhile(i => i < 2.0);
The same considerations for SkipWhile() with index apply to TakeWhile() with index, of course.
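As a small sketch of the indexed TakeWhile() overload (class and method names are hypothetical), the first method below combines a value test with an index limit, and the second shows the index restarting at 0 after a Skip().

```csharp
using System;
using System.Linq;

// Hypothetical demo class for the TakeWhile(Func<T, int, bool>) overload.
public static class TakeWhileWithIndexDemo
{
    private static readonly double[] Values = { 1.0, 1.1, 1.2, 2.2, 2.3, 2.4 };

    // Takes items below 2.0, but stops once the index reaches 2.
    public static double[] UnderTwoAtMostTwo()
    {
        return Values.TakeWhile((item, index) => item < 2.0 && index < 2).ToArray();
    }

    // After Skip(1) the index restarts at 0, so "index < 2" keeps 1.1 and 1.2.
    public static double[] TwoAfterSkippingOne()
    {
        return Values.Skip(1).TakeWhile((item, index) => index < 2).ToArray();
    }

    public static void Main()
    {
        Console.WriteLine(UnderTwoAtMostTwo().Length);    // 2 items: 1.0, 1.1
        Console.WriteLine(TwoAfterSkippingOne().Length);  // 2 items: 1.1, 1.2
    }
}
```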
Using Skip() and Take() for sub-sequences
A few weeks back, I talked about The List<T> Range Methods and showed how they could be used to get a sub-list of a List<T>. This works well if you’re dealing with List<T>, or don’t mind converting to List<T>. But if you have a simple IEnumerable<T> sequence and want to get a sub-sequence, you can also use Skip() and Take() to much the same effect:
    var list = new List<double> { 1.0, 1.1, 1.2, 2.2, 2.3, 2.4 };

    // results in List<T> containing { 1.2, 2.2, 2.3 }
    var subList = list.GetRange(2, 3);

    // results in sequence containing { 1.2, 2.2, 2.3 }
    var subSequence = list.Skip(2).Take(3);
I say “much the same effect” because there are some differences. First of all, GetRange() will throw if the starting index and count do not denote a valid range within the list (for example, if index plus count runs past the end), but Skip() and Take() do not; they simply return whatever items are available. Also, GetRange() is a method on List<T>, so it can use direct indexing to get to the items much more efficiently, whereas Skip() and Take() operate on sequences and may actually have to walk through the items they skip to create the resulting sequence.
So each has its pros and cons. My general rule of thumb is if I’m already working with a List<T> I’ll use GetRange(), but for any plain IEnumerable<T> sequence I’ll tend to prefer Skip() and Take() instead.
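The difference in how the two approaches treat an out-of-bounds request can be sketched as follows, using the same six-item list. The class and helper names are hypothetical. The catch clause uses ArgumentException because that is what GetRange() throws for an invalid range, and ArgumentOutOfRangeException (thrown for negative arguments) derives from it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class comparing GetRange() with Skip()/Take().
public static class RangeComparisonDemo
{
    private static List<double> CreateList()
    {
        return new List<double> { 1.0, 1.1, 1.2, 2.2, 2.3, 2.4 };
    }

    // Skip()/Take() never throw for an oversized request; they return what is there.
    public static double[] SubSequence(int skip, int take)
    {
        return CreateList().Skip(skip).Take(take).ToArray();
    }

    // Reports whether GetRange() rejects the requested range.
    public static bool GetRangeThrows(int index, int count)
    {
        try
        {
            CreateList().GetRange(index, count);
            return false;
        }
        catch (ArgumentException)
        {
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(SubSequence(4, 3).Length);   // 2 items: 2.3, 2.4
        Console.WriteLine(SubSequence(10, 3).Length);  // 0 items
        Console.WriteLine(GetRangeThrows(4, 3));       // True
        Console.WriteLine(GetRangeThrows(2, 3));       // False
    }
}
```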
Summary
The Skip() and Take() families of LINQ extension methods are handy for producing sub-sequences from any IEnumerable<T> sequence. Skip() will ignore the specified number of items and return the rest of the sequence, whereas Take() will return the specified number of items and ignore the rest of the sequence.
Similarly, the SkipWhile() and TakeWhile() methods can be used to skip or take items, respectively, until a given predicate returns false.