LINQ and ArcObjects

Posted by Marko Apfel on Geeks with Blogs
Published on Wed, 12 Sep 2012 13:20:00 GMT

Motivation

LINQ (Language Integrated Query) has been a component of the Microsoft .NET Framework since version 3.5. It allows SQL-like queries against various data sources such as SQL databases, XML, etc.

Like SQL, LINQ provides a declarative notation for problem solving: you do not need to describe in detail how a task is to be solved; you describe what is to be solved. This frees the developer from error-prone iterator constructs.

Ideally, of course, ArcObjects features could be accessed in the same way. A construct like this would then be conceivable:

var largeFeatures =
	from feature in features
	where (feature.GetValue("SHAPE_Area").ToDouble() > 3000)
	select feature;

or its equivalent as a lambda expression:

var largeFeatures =
	features.Where(feature => 
		(feature.GetValue("SHAPE_Area").ToDouble() > 3000));

This requires an appropriate provider that manages the corresponding iterator logic. This is easier than it might seem at first sight: you only have to deliver the desired entities as IEnumerable<IFeature>.

LINQ automatically sets up a state machine in the background whose execution is deferred. The processing only takes place when you actually request entities (foreach, Count(), ToList(), ...), even though the query was created at a completely different place. During your first debugging sessions, especially when iterating over the entities several times, you will rub your eyes when the execution pointer magically jumps back into the iterator logic.
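To make the deferred execution visible without ArcObjects, here is a minimal sketch. The class DeferredExecutionDemo, its Runs counter and the area values are made up for illustration; they are not part of the prototype.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionDemo
{
    // Counts how often the iterator body has been started.
    public static int Runs;

    // Hypothetical stand-in for a feature source: yields plain area values.
    public static IEnumerable<double> GetAreas()
    {
        Runs++; // executed only when an enumeration actually starts
        yield return 1200.0;
        yield return 4500.0;
        yield return 8000.0;
    }

    // Returns the counter after building the query, after Count()
    // and after ToList().
    public static int[] Observe()
    {
        Runs = 0;
        IEnumerable<double> large = GetAreas().Where(area => area > 3000);
        int afterQuery = Runs;  // the iterator body has not run yet

        large.Count();          // first enumeration starts the iterator
        int afterCount = Runs;

        large.ToList();         // second enumeration starts it again
        int afterToList = Runs;

        return new[] { afterQuery, afterCount, afterToList };
    }
}
```

Observe() returns 0, 1 and 2: building the query runs nothing, and every consuming call starts the iterator from the beginning again.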

Realization

A very concise way to construct an IEnumerable<IFeature> is to run through an IFeatureCursor and return each feature via yield. For easier use, I have put the logic in an extension method GetFeatures() for IFeatureClass:

// requires: System.Collections.Generic, System.Runtime.InteropServices,
// ESRI.ArcGIS.Geodatabase
public static IEnumerable<IFeature> GetFeatures(this IFeatureClass featureClass,
	IQueryFilter queryFilter, RecyclingPolicy policy)
{
	IFeatureCursor featureCursor =
		featureClass.Search(queryFilter, RecyclingPolicy.Recycle == policy);

	try
	{
		IFeature feature;
		while (null != (feature = featureCursor.NextFeature()))
		{
			yield return feature;
		}
	}
	finally
	{
		// finally also runs when the caller stops enumerating early (e.g. First())
		// this is skipped in unit tests with cursor-mock
		if (Marshal.IsComObject(featureCursor))
		{
			Marshal.ReleaseComObject(featureCursor);
		}
	}
}

So you can now easily generate the IEnumerable<IFeature>:

// a null query filter returns all features
IEnumerable<IFeature> features = 
	_featureClass.GetFeatures(null, RecyclingPolicy.DoNotRecycle);

You have to be careful with a recycling cursor. It reuses a single feature object for every row, so features must not be buffered and read again later in the same context. If they are, every entry holds the content of the last (recycled) feature, and all features in the buffered set look the same.

Therefore, this expression would be critical:

largeFeatures.ToList().
	ForEach(feature => Debug.WriteLine(feature.OID));

because ToList() iterates once through the whole sequence, so the cursor has already moved over all features and the list only stores references to the same recycled object. ForEach() therefore always delivers the same feature.

In such situations, you must not use a recycling cursor.
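The effect can be reproduced without ArcObjects. The following sketch only simulates a recycling cursor: FakeFeature and RecyclingDemo are hypothetical names, and the real cursor does its recycling on the COM side.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for IFeature; not part of ArcObjects.
public sealed class FakeFeature
{
    public int OID { get; set; }
}

public static class RecyclingDemo
{
    // Mimics a cursor: with recycling, one instance is overwritten per row.
    public static IEnumerable<FakeFeature> GetFeatures(int count, bool recycling)
    {
        var shared = new FakeFeature();
        for (int oid = 1; oid <= count; oid++)
        {
            FakeFeature feature = recycling ? shared : new FakeFeature();
            feature.OID = oid;
            yield return feature;
        }
    }

    // ToList() buffers references first; with recycling they all point
    // to the same object, which holds the values of the last row.
    public static List<int> BufferedOids(bool recycling)
    {
        return GetFeatures(3, recycling).ToList().Select(f => f.OID).ToList();
    }

    // Reading each feature while the cursor still sits on it is safe
    // with both cursor types.
    public static List<int> StreamedOids(bool recycling)
    {
        return GetFeatures(3, recycling).Select(f => f.OID).ToList();
    }
}
```

With recycling switched on, BufferedOids returns 3, 3, 3, while StreamedOids returns 1, 2, 3. Without recycling both return 1, 2, 3.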

Iterating over largeFeatures itself several times is not a problem, because the state machine is re-instantiated each time and the cursor runs again. That is the magic already mentioned above.

Perspective

Now you can go one step further and write your own implementation of the interface IEnumerable<IFeature>. Only the members for accessing the enumerator have to be programmed. In the enumerator itself, the Reset() method re-executes the search. This can be achieved by passing an appropriate delegate to the constructor:

new FeatureEnumerator<IFeatureClass>(_featureClass,
	featureClass => featureClass.Search(_filter, isRecyclingCursor));

which is called in Reset():

public void Reset() 
{
	_featureCursor = _resetCursor(_t); 
}

In this manner, enumerators for completely different scenarios can be implemented and used on the client side in exactly the way described above. Cursors, selection sets, etc. thus merge into a single concept, and the reusability of code increases immensely.

On top of that, an IEnumerable can be mocked very easily in automated unit tests, a major step towards better software quality.
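A rough sketch of such an enumerator, independent of ArcObjects, could look like this. ISimpleCursor, ListCursor and CursorEnumerator are invented for this illustration and are not the types from the prototype; only the idea of handing a reset delegate to the constructor is taken from above.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical cursor abstraction: Next() returns null when exhausted,
// in the style of IFeatureCursor.NextFeature().
public interface ISimpleCursor<TItem> where TItem : class
{
    TItem Next();
}

// Simple in-memory cursor, e.g. for tests.
public sealed class ListCursor<TItem> : ISimpleCursor<TItem> where TItem : class
{
    private readonly IList<TItem> _items;
    private int _index;

    public ListCursor(IList<TItem> items) { _items = items; }

    public TItem Next()
    {
        return _index < _items.Count ? _items[_index++] : null;
    }
}

public sealed class CursorEnumerator<TSource, TItem> : IEnumerator<TItem>
    where TItem : class
{
    private readonly TSource _source;
    private readonly Func<TSource, ISimpleCursor<TItem>> _resetCursor;
    private ISimpleCursor<TItem> _cursor;

    public CursorEnumerator(TSource source,
        Func<TSource, ISimpleCursor<TItem>> resetCursor)
    {
        _source = source;
        _resetCursor = resetCursor;
        Reset(); // open the first cursor right away
    }

    public TItem Current { get; private set; }

    object IEnumerator.Current { get { return Current; } }

    public bool MoveNext()
    {
        Current = _cursor.Next();
        return Current != null;
    }

    // Re-executes the search by asking the delegate for a fresh cursor.
    public void Reset()
    {
        _cursor = _resetCursor(_source);
        Current = null;
    }

    public void Dispose() { }
}
```

Because the enumerator only knows the delegate, the same class can serve a feature class, a selection set or a plain list; the caller decides how a fresh cursor is obtained.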
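As a small illustration of this testability, consider the following sketch. IAreaFeature, StubFeature and FeatureQueries are hypothetical helpers for this example only; with real ArcObjects code you would stub IFeature instead.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical minimal feature abstraction used only for this sketch.
public interface IAreaFeature
{
    int OID { get; }
    double Area { get; }
}

// Plain in-memory stub; no geodatabase and no cursor involved.
public sealed class StubFeature : IAreaFeature
{
    public StubFeature(int oid, double area) { OID = oid; Area = area; }

    public int OID { get; private set; }
    public double Area { get; private set; }
}

public static class FeatureQueries
{
    // The query logic depends only on IEnumerable<T>, so a test can pass
    // an array or a list instead of a cursor-backed sequence.
    public static IEnumerable<int> LargeFeatureIds(
        IEnumerable<IAreaFeature> features, double minArea)
    {
        return features.Where(f => f.Area > minArea).Select(f => f.OID);
    }
}
```

A unit test then simply passes new[] { new StubFeature(1, 1200), new StubFeature(2, 4500) } and checks that only OID 2 comes back for a minimum area of 3000.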

Conclusion

Nevertheless, caution should be exercised with these constructs in performance-critical queries. Managing a state machine in the background creates considerable overhead; processing takes about 20 to 100 percent more time. In addition, working without a recycling cursor can quickly become a performance bottleneck.

However, declarative LINQ code is much more elegant, less error-prone and easier to maintain than manually iterating, comparing and building a list of results. In my experience, code size is reduced by 75 to 90 percent on average. So I am happy to wait a few milliseconds longer.

As so often, maintainability has to be balanced against performance, and for me maintainability increasingly takes priority. In times of multi-core processors, the processing time of most business processes is in any case dominated not by code execution but by waiting for user input.

Demo source code

You can download the source code for this prototype, including several unit tests, here: https://github.com/esride-apf/Linq2ArcObjects
