C#/.NET – Finding an Item’s Index in IEnumerable<T>

Posted by James Michael Hare on Geeks with Blogs See other posts from Geeks with Blogs or by James Michael Hare
Published on Fri, 07 Jan 2011 01:29:13 GMT Indexed on 2011/01/07 1:54 UTC
Read the original article Hit count: 469

Filed under:

Sorry for the long blogging hiatus.  First it was, of course, the holidays hustle and bustle, then my brother and his wife gave birth to their son, so I’ve been away from my blogging for two weeks.

Background: Finding an item’s index in List<T> is easy…

Many times in our day to day programming activities, we want to find the index of an item in a collection.  Now, if we have a List<T> and we’re looking for the item itself this is trivial:

   1: // assume have a list of ints:
   2: var list = new List<int> { 1, 13, 42, 64, 121, 77, 5, 99, 132 };
   3:  
   4: // can find the exact item using IndexOf()
   5: var pos = list.IndexOf(64);

This will return the position of the item if it’s found, or –1 if not.  It’s easy to see how this works for primitive types where equality is well defined.  For complex types, however, it will attempt to compare them using EqualityComparer<T>.Default which, in a nutshell, relies on the object’s Equals() method.

So what if we want to search for a condition instead of equality?  That’s also easy in a List<T> with the FindIndex() method:

   1: // assume have a list of ints:
   2: var list = new List<int> { 1, 13, 42, 64, 121, 77, 5, 99, 132 };
   3:  
   4: // finds index of first even number or -1 if not found.
   5: var pos = list.FindIndex(i => i % 2 == 0);

 

Problem: Finding an item’s index in IEnumerable<T> is not so easy...

This is all well and good for lists, but what if we want to do the same thing for IEnumerable<T>?  A collection of IEnumerable<T> has no indexing, so there’s no direct method to find an item’s index. 

LINQ, as powerful as it is, gives us many tools to get us this information, but not in one step.  As with almost any problem involving collections, there are several ways to accomplish the same goal.  And once again as with almost any problem involving collections, the choice of the solution somewhat depends on the situation.

So let’s look at a few possible alternatives.  I’m going to express each of these as extension methods for simplicity and consistency.

Solution: The TakeWhile() and Count() combo

One of the things you can do is to perform a TakeWhile() on the list as long as your find condition is not true, and then do a Count() of the items it took.  The only downside to this method is that if the item is not in the list, the index will be the full Count() of items, and not –1.  So if you don’t know the size of the list beforehand, this can be confusing.

   1: // a collection of extra extension methods off IEnumerable<T>
   2: public static class EnumerableExtensions
   3: {
   4:     // Finds an item in the collection, similar to List<T>.FindIndex()
   5:     public static int FindIndex<T>(this IEnumerable<T> list, Predicate<T> finder)
   6:     {
   7:         // note if item not found, result is length and not -1!
   8:         return list.TakeWhile(i => !finder(i)).Count();
   9:     }
  10: }

Personally, I don’t like switching the paradigm of not found away from –1, so this is one of my least favorites. 

Solution: Select with index

Many people don’t realize that there is an alternative form of the LINQ Select() method that will provide you an index of the item being selected:

   1: list.Select( (item,index) => do something here with the item and/or index... )

This can come in handy, but must be treated with care.  This is because the index provided is only as pertains to the result of previous operations (if any).  For example:

   1: // assume have a list of ints:
   2: var list = new List<int> { 1, 13, 42, 64, 121, 77, 5, 99, 132 };
   3:  
   4: // you'd hope this would give you the indexes of the even numbers
   5: // which would be 2, 3, 8, but in reality it gives you 0, 1, 2
   6: list.Where(item => item % 2 == 0).Select((item,index) => index);

The reason the example gives you the collection { 0, 1, 2 } is because the where clause passes over any items that are odd, and therefore only the even items are given to the select and only they are given indexes.

Conversely, we can’t select the index and then test the item in a Where() clause, because then the Where() clause would be operating on the index and not the item!

So, what we have to do is to select the item and index and put them together in an anonymous type.  It looks ugly, but it works:

   1: // extensions defined on IEnumerable<T>
   2: public static class EnumerableExtensions
   3: {
   4:     // finds an item in a collection, similar to List<T>.FindIndex()
   5:     public static int FindIndex<T>(this IEnumerable<T> list, Predicate<T> finder)
   6:     {
   7:         // if you don't name the anonymous properties they are the variable names
   8:         return list.Select((item, index) => new { item, index })
   9:             .Where(p => finder(p.item))
  10:             .Select(p => p.index + 1)
  11:             .FirstOrDefault() - 1;
  12:     }
  13: }

 

 

So let’s look at this, because i know it’s convoluted:

  1. First Select() joins the items and their indexes into an anonymous type.
  2. Where() filters that list to only the ones matching the predicate.
  3. Second Select() picks the index of the matches and adds 1 – this is to distinguish between not found and first item.
  4. FirstOrDefault() returns the first item found from the previous clauses or default (zero) if not found.
  5. Subtract one so that not found (zero) will be –1, and first item (one) will be zero.

The bad thing is, this is ugly as hell and creates anonymous objects for each item tested until it finds the match.  This concerns me a bit but we’ll defer judgment until compare the relative performances below.

Solution: Convert ToList() and use FindIndex()

This solution is easy enough.  We know any IEnumerable<T> can be converted to List<T> using the LINQ extension method ToList(), so we can easily convert the collection to a list and then just use the FindIndex() method baked into List<T>.

   1: // a collection of extension methods for IEnumerable<T>
   2: public static class EnumerableExtensions
   3: {
   4:     // find the index of an item in the collection similar to List<T>.FindIndex()
   5:     public static int FindIndex<T>(this IEnumerable<T> list, Predicate<T> finder)
   6:     {
   7:         return list.ToList().FindIndex(finder);
   8:     }
   9: }

This solution is simplicity itself!  It is very concise and elegant and you need not worry about anyone misinterpreting what it’s trying to do (as opposed to the more convoluted LINQ methods above).

But the main thing I’m concerned about here is the performance hit to allocate the List<T> in the ToList() call, but once again we’ll explore that in a second.

Solution: Roll your own FindIndex() for IEnumerable<T>

Of course, you can always roll your own FindIndex() method for IEnumerable<T>.  It would be a very simple for loop which scans for the item and counts as it goes.  There’s many ways to do this, but one such way might look like:

   1: // extension methods for IEnumerable<T>
   2: public static class EnumerableExtensions
   3: {
   4:     // Finds an item matching a predicate in the enumeration, much like List<T>.FindIndex()
   5:     public static int FindIndex<T>(this IEnumerable<T> list, Predicate<T> finder)
   6:     {
   7:         int index = 0;
   8:         foreach (var item in list)
   9:         {
  10:             if (finder(item))
  11:             {
  12:                 return index;
  13:             }
  14:  
  15:             index++;
  16:         }
  17:  
  18:         return -1;
  19:     }
  20: }

Well, it’s not quite simplicity, and those less familiar with LINQ may prefer it since it doesn’t include all of the lambdas and behind the scenes iterators that come with deferred execution.  But does having this long, blown out method really gain us much in performance?

Comparison of Proposed Solutions

So we’ve now seen four solutions, let’s analyze their collective performance.  I took each of the four methods described above and run them over 100,000 iterations of lists of size 10, 100, 1000, and 10000 and here’s the performance results.  Then I looked for targets at the begining of the list (best case), middle of the list (the average case) and not in the list (worst case as must scan all of the list).

Each of the times below is the average time in milliseconds for one execution as computer over the 100,000 iterations:

Searches Matching First Item (Best Case)

  10 100 1000 10000
TakeWhile 0.0003 0.0003 0.0003 0.0003
Select 0.0005 0.0005 0.0005 0.0005
ToList 0.0002 0.0003 0.0013 0.0121
Manual 0.0001 0.0001 0.0001 0.0001

 

Searches Matching Middle Item (Average Case)

  10 100 1000 10000
TakeWhile 0.0004 0.0020 0.0191 0.1889
Select 0.0008 0.0042 0.0387 0.3802
ToList 0.0002 0.0007 0.0057 0.0562
Manual 0.0002 0.0013 0.0129 0.1255

 

Searches Where Not Found (Worst Case)

  10 100 1000 10000
TakeWhile 0.0006 0.0039 0.0381 0.3770
Select 0.0012 0.0081 0.0758 0.7583
ToList 0.0002 0.0012 0.0100 0.0996
Manual 0.0003 0.0026 0.0253 0.2514

 

Notice something interesting here, you’d think the “roll your own” loop would be the most efficient, but it only wins when the item is first (or very close to it) regardless of list size.  In almost all other cases though and in particular the average case and worst case, the ToList()/FindIndex() combo wins for performance, even though it is creating some temporary memory to hold the List<T>

If you examine the algorithm, the reason why is most likely because once it’s in a ToList() form, internally FindIndex() scans the internal array which is much more efficient to iterate over.  Thus, it takes a one time performance hit (not including any GC impact) to create the List<T> but after that the performance is much better.

Summary

If you’re concerned about too many throw-away objects, you can always roll your own FindIndex() method, but for sheer simplicity and overall performance, using the ToList()/FindIndex() combo performs best on nearly all list sizes in the average and worst cases. 

 

© Geeks with Blogs or respective owner