What's the best practice for SOA exception handling?
- by sun1991
My colleague and I are having an interesting debate about how to handle exceptions in a service-oriented architecture (SOA):
On one side, I support what Juval Lowy said in Programming WCF Services 3rd Edition:
As stated at the beginning of this chapter, it is a common illusion that clients care about errors or have anything meaningful to do when they occur. Any attempt to bake such capabilities into the client creates an inordinate degree of coupling between the client and the object, raising serious design questions. How could the client possibly know more about the error than the service, unless it is tightly coupled to it? What if the error originated several layers below the service—should the client be coupled to those low-level layers? Should the client try the call again? How often and how frequently? Should the client inform the user of the error? Is there a user?
By having all service exceptions be indistinguishable from one another, WCF decouples the client from the service. The less the client knows about what happened on the service side, the more decoupled the interaction will be.
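To make that position concrete, here is a rough Java sketch of how I read it: the service boundary collapses every internal failure into a single opaque fault. The names OpaqueFaultSketch, ServiceFault and invoke are made up for illustration; this is not WCF's API, only the shape of the idea.

```java
import java.util.function.Supplier;

// Hypothetical names throughout; this only illustrates the idea, it is not WCF.
final class OpaqueFaultSketch {

    /** The only failure type a client ever sees. */
    static final class ServiceFault extends RuntimeException {
        ServiceFault() {
            super("The service could not complete the request.");
        }
    }

    /** Service boundary: every internal exception collapses into ServiceFault. */
    static <T> T invoke(Supplier<T> operation) {
        try {
            return operation.get();
        } catch (RuntimeException e) {
            // The details would be logged on the service side here.
            // Neither the cause nor its message crosses the boundary.
            throw new ServiceFault();
        }
    }

    public static void main(String[] args) {
        try {
            invoke(() -> { throw new IllegalStateException("database unreachable"); });
        } catch (ServiceFault fault) {
            // The client can tell that the call failed, but not why.
            System.out.println(fault.getMessage());
        }
    }
}
```

With this shape the client has exactly one failure path to write, which is the decoupling the quote argues for.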
On the other side, here's what my colleague suggests:
I believe it’s simply incorrect, as it does not align with best practices in building a service-oriented architecture and it ignores the general idea that there are problems that users are able to recover from, such as not keying a value correctly. If we considered only system exceptions, perhaps this idea holds, but system exceptions are only part of the exception domain. User-recoverable exceptions are the other part of the domain and are likely to happen on a regular basis. I believe the correct way to build a service-oriented architecture is to map user-recoverable situations to checked exceptions, then to marshal each checked exception back to the client as a unique exception that client application programmers are able to handle appropriately. Marshal all runtime exceptions back to the client as a system exception, along with the stack trace so that it is easy to troubleshoot the root cause.
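Here is a rough Java sketch of my colleague's proposal as I understand it (Java because he talks about checked exceptions). A user-recoverable problem becomes a checked exception that passes through to the client unchanged, and any runtime exception is wrapped in a generic system fault that carries the stack trace. All names here (MappedFaultSketch, InvalidQuantityException, SystemFault, placeOrder, reserveStock) are invented for illustration, and the quantity rule is an arbitrary example.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Hypothetical names throughout; a sketch of the proposal, not a real framework.
final class MappedFaultSketch {

    /** Checked: a user-recoverable problem the client is expected to handle. */
    static final class InvalidQuantityException extends Exception {
        InvalidQuantityException(String message) {
            super(message);
        }
    }

    /** Catch-all for unexpected failures; carries the stack trace as text. */
    static final class SystemFault extends RuntimeException {
        final String stackTraceText;

        SystemFault(Throwable cause) {
            super("Unexpected service failure");
            StringWriter buffer = new StringWriter();
            cause.printStackTrace(new PrintWriter(buffer, true));
            this.stackTraceText = buffer.toString();
        }
    }

    /** Service boundary: checked exceptions pass through, runtime ones are wrapped. */
    static int placeOrder(String rawQuantity) throws InvalidQuantityException {
        try {
            return reserveStock(rawQuantity);
        } catch (RuntimeException e) {
            throw new SystemFault(e);
        }
    }

    /** Internal logic, unaware of how its failures are marshaled. */
    static int reserveStock(String rawQuantity) throws InvalidQuantityException {
        String trimmed = rawQuantity.trim(); // NullPointerException on null input
        if (!trimmed.matches("[1-9][0-9]{0,4}")) {
            throw new InvalidQuantityException(
                "Quantity must be a whole number from 1 to 99999: " + trimmed);
        }
        return Integer.parseInt(trimmed);
    }

    public static void main(String[] args) {
        for (String input : new String[] {"12", "twelve", null}) {
            try {
                System.out.println("reserved " + placeOrder(input));
            } catch (InvalidQuantityException e) {
                // Recoverable: the client can prompt the user to fix the value.
                System.out.println("ask the user to re-enter: " + e.getMessage());
            } catch (SystemFault e) {
                // Not recoverable by the user: surface it for troubleshooting.
                System.out.println("report to support: " + e.getMessage());
            }
        }
    }
}
```

The trade-off is visible in main: the client gets a useful recovery path for bad input, but it now has to know about InvalidQuantityException, which is exactly the coupling the quote from the book warns about.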
I'd like to know what you think about this. Thank you.