DataSets in SOA

Last November, I engaged Udi Dahan (the Software Simplist) on using DataSets in OLTP situations. It turns out that Udi used a question I posed, one that more or less preceded that dialogue, for his latest podcast at Dr. Dobb's Journal, titled DataSets and Web Services. Yes, I am the Jacob he mentions.

You can tell that he's not a great fan of DataSets in general, though he takes pains to treat them fairly. Which puts him ahead of most of the developers I track, really.

Row State

One of the things that Udi doesn't like about DataSets is that they track row state. Unfortunately, he seems to be wrapping together two different consequences of doing so and it's a little bit confusing. His point actually makes more sense if you break it apart a bit.

The first thing he brings up is that because a DataSet tracks row state, many developers build that tracking into the web service as an unstated expectation. I've seen this, too, and it is hideous. In the most extreme cases, you'll get a web service for each of the CRUD operations for your objects—essentially making your web service a thin proxy for your data layer.

The real problem with using row state, however, is that it muddies the contract for your services. This is where Udi is talking about "passing along the intent." Services should have an explicit intent for the data that they receive (or return). Without an explicit intent, it can be hard to discern what any specific service is supposed to do and therefore what business logic that service will enforce. The narrower you can define your intent the better. If your service only knows that a customer record has changed, that service doesn't know if it's supposed to change the address, the name, the marital status, or what all. Which means that the service will not only have to handle all business logic associated with a customer record every time, it also has to try to reconstruct what has happened if it wants to have any kind of concurrency at all.
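A minimal sketch of the difference, in Python for brevity rather than .NET. Everything here is hypothetical illustration: the `Customer` type, the in-memory `customers` store, and the three service functions are assumptions, not anything from Udi's podcast or a real API. The point is only that a contract which names its intent can limit both the rules it runs and the fields it touches.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Customer:
    customer_id: int
    name: str
    address: str
    marital_status: str


# Hypothetical in-memory store standing in for the real data layer.
customers = {1: Customer(1, "Pat Doe", "1 Old Road", "single")}


def save_customer(changed: Customer) -> None:
    """Row-state style: the contract only says "this record changed"."""
    # The service cannot tell which field the caller meant to change, so it
    # would have to run every customer rule, and it overwrites the whole record.
    customers[changed.customer_id] = changed


def change_address(customer_id: int, new_address: str) -> None:
    """Explicit intent: only address rules run, only the address is touched."""
    if not new_address.strip():
        raise ValueError("address must not be empty")
    customers[customer_id] = replace(customers[customer_id], address=new_address)


def change_marital_status(customer_id: int, new_status: str) -> None:
    """Explicit intent: only marital-status rules run."""
    if new_status not in {"single", "married", "divorced", "widowed"}:
        raise ValueError(f"unknown marital status: {new_status}")
    customers[customer_id] = replace(customers[customer_id], marital_status=new_status)


# Two callers who read the same original record can both succeed, because
# each call names the one thing it intends to change.
change_address(1, "2 New Street")
change_marital_status(1, "married")
```

Calling `save_customer` with a stale copy of the record would, by contrast, quietly put the old address back, which is the reconstruction problem described above.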

Incidentally (and more importantly, I think), this also means that those using that service don't know what all it is supposed to do or what criteria it uses to do it. Developers don't deal well with uncertainty. Before you know it, you'll see business logic creep into the calling code "just to be sure." Or worse, you'll see new services pop up duplicating pieces of the larger service.

Row state isn't inherently bad. Just because relying on that additional information makes it easier to screw up doesn't mean you have to screw it up. In other words, just because DataSets track row state doesn't mean you have to give row state significance in your services.

Too Much Information

Row State is really just a specific example of a wider problem, though—namely that DataSets encode a lot of details beyond the data that your service needs to do its stuff. These details make DataSets unsuited for including as a part of an SOA contract because alterations in those implementation details will break a service contract even when the data structures remain the same.

Udi's conclusion is that DataSets should never be used in SOA. I'm forced to agree with him for this last reason. Versioning objects across an enterprise is a tricky business. Minimizing the number of things that break downstream stuff when changed is important to good architecture. The more that you can isolate, the happier you will be.

So the key question for me becomes: are you actually implementing SOA or just using web services? If you're simply using web services, DataSets have limitations to watch for but can be useful. If you're going SOA, keep DataSets to yourself.

I can live with that.

28. March 2007 20:36 by Jacob | Comments (0) | Permalink

Arguing Data

People have a lot of different reasons for posting blog entries. These reasons vary from financial, to personal, to professional, to I'm afraid to know more. For me, one reason I take the time when I could be doing something else is that I like to put my ideas out there to be tested. I don't really care if a majority of people agree with me so much as I want to see what other people have to say for or against certain things. The downside to this is that I'll sometimes find that an idea isn't as good as I had originally thought it was. The upside is the opportunity to refine something to be better or to discard an idea that turns out simply to be bad.

Which is why I'm glad to see Karl Seguin's response to a post I had made about DataSets. Karl's a bright guy and he has a good background in the problem domain associated with DataSet objects. He displays class, too, even when he feels I've been a bit rough in a point or two.

The School of Hard Knocks

I empathize with his experience where DataSet misuse caused much pain and suffering. I've been in similar situations and it's no fun. In a full-blown business transaction environment, DataSets have some liabilities that make them ill-suited for business-layer usage. The thing is, the opposite problem exists as well, and it's one that is more serious than people want to give it credit for: a layer of specialized, hand-crafted business objects that don't actually do anything.

I'm currently working at a place that has an extreme case of this problem. We have four entirely separate ASP.Net applications for our internal invoice processing. All four of these applications have their own set of substantially similar custom objects that are completely unique for that application. Each object doesn't do anything more than contain a group of properties that are populated from a database and write changes back to it.

I shudder to think how many hours were wasted on this travesty. It's over-complex, can't leverage any type of automated binding, doesn't track row state, and testing and debugging changes is an unmitigated pain. It's like someone attended an n-tier lecture somewhere and never bothered understanding what the point of having separate tiers actually was. Frankly, I'd prefer if the previous developers had simply put all the data access right in each individual page--at least that'd be easier to fix when something blew up.

Learning Your Craft

The thing is, my experience no more proves custom business objects wrong than Karl's experience proves DataSets wrong. That's the trouble with anecdotal experience: it feels more important than it is (it doesn't help that pain is such an efficient teacher).

The trick of learning a craft is in gaining experience that is both specific and broad. This can be tricky in a field that is as immense as software development. You really have no choice but to specialize at some point. Even narrowing it down to ".Net Framework" isn't nearly enough to constitute adequate focus for competence.

Unfortunately, Karl's point that there are a lot of lazy programmers out there is true. Anyone who has had to hire or manage programmers will confirm this. Too many developers don't bother learning enough of their craft to be considered actually competent. Faced with the need to specialize carefully, many simply give up and learn only enough to get by (and sometimes not even that much). They're content to learn the bare minimum needed to get hired. They'll learn enough of the "how" to create a program without ever bothering to learn any of the "why".

Teaching Others

I have a minor problem with Karl's explanation, though. He says, "I advocate against the use of DataSets as a counterbalance to people who blindly use them." While I understand this position, I'm not sure I can be said to appreciate it. It smacks a little of the "for your own good" school of learning, which works well enough in a parent-child or even teacher-student relationship. I'm not sure it works so well in public or general discourse.

It is hard to correct bad habits, particularly habits as widespread as DataSet misuse seems to be. As one who often has the bad habits to be corrected, though, I think that I'd prefer having the problem explained and given the context so I can understand the trade-offs being made. That would give me the opportunity to know why something is wrong, not just that something is wrong.

That'd require discussing DataSets in specific instead of general terms. I'm not sure if Karl would really want to do that, though. I mean, his specialty at CodeBetter is really ASP.Net. Expecting him to tackle ADO.Net is not just unrealistic, it could have the effect of diluting his blog posts and alienating his regular readers or getting him embroiled in things he's less interested in.

I would like to see someone respectable and more widely read than I am take on Strongly-typed DataSets in a more complete fashion, though.

Professor Microsoft

Which is why I have to agree with Karl that the blame for DataSet misuse lies squarely in Microsoft's court. I stopped counting how many official articles and examples from Microsoft included egregious misuse or abuse of DataSets. And I have yet to see any that describe how to do it right or what kinds of things to look for in determining the trade-offs between a Strongly-typed DataSet and a more formal OR/M solution, let alone ameliorating factors for each. The only articles about DataSets that I can remember that don't actually teach bad habits are articles about how bad they are. Which isn't helpful. It'd be nice to have something, somewhere that talks about using them wisely and what their strengths actually are. Maybe that should be a future blog post here...

26. February 2007 18:33 by Jacob | Comments (0) | Permalink

DataSets Suck

First off, a correction. In my recent post on OLTP using DataSets, I gave four methods that would allow you to handle non-conflicting updates of a row using the same initial data state. In reviewing a tangent later I realized that method 2 wouldn't work. Here's why:

The auto-generated Update for a DataTable does a "SET" operation on all the fields of the row and depends on the WHERE clause to make sure that it isn't going to change something that wasn't meant to be changed. Loosen the WHERE clause the way method 2 does and the SET still writes every field, including the stale values for fields this user never touched. Which means that method 2 would not only not be a good OLTP solution, it'd overwrite prior updates without any notice. Much better to simply throw a DBConcurrencyException and let the application handle the discrepancy (or not).
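Here is a rough sketch of that failure, using Python's built-in sqlite3 instead of ADO.NET so it can run anywhere. The `customer` table, the two SQL strings, and the `run` helper are assumptions made up for illustration; they only imitate the shape of a designer-generated "optimistic" update and of the relaxed version from method 2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, address TEXT, status TEXT)")
conn.execute("INSERT INTO customer VALUES (1, '1 Old Road', 'single')")

# Shape of a typical "optimistic" update: SET every column, and let the WHERE
# clause compare every column against the values originally read.
STRICT_UPDATE = """
UPDATE customer SET address = :new_address, status = :new_status
WHERE id = :id AND address = :orig_address AND status = :orig_status
"""

# Method 2: relax the WHERE clause for columns this user did not change.
RELAXED_UPDATE = """
UPDATE customer SET address = :new_address, status = :new_status
WHERE id = :id
  AND (address = :orig_address OR :orig_address = :new_address)
  AND (status = :orig_status OR :orig_status = :new_status)
"""


def run(sql, original, new):
    """Execute one update and return the number of rows it changed."""
    params = {
        "id": 1,
        "orig_address": original[0], "orig_status": original[1],
        "new_address": new[0], "new_status": new[1],
    }
    return conn.execute(sql, params).rowcount


snapshot = ("1 Old Road", "single")  # both users read this state

# User A changes the marital status; the strict update succeeds.
first_strict = run(STRICT_UPDATE, snapshot, ("1 Old Road", "married"))

# User B changes the address from the same snapshot; the strict update touches
# zero rows, which is the condition ADO.NET reports as a concurrency error.
second_strict = run(STRICT_UPDATE, snapshot, ("2 New Street", "single"))

# With the relaxed WHERE clause the update goes through, but the SET writes
# the stale 'single' back over user A's change.
second_relaxed = run(RELAXED_UPDATE, snapshot, ("2 New Street", "single"))
final_row = conn.execute("SELECT address, status FROM customer WHERE id = 1").fetchone()
```

After the relaxed update, `final_row` holds the new address but the old marital status: user A's change is gone and nobody was told.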

Which also answers Udi's question of why it doesn't do that out of the box. It'd be nice if the defaults were implemented with a more robust OLTP scenario in mind, though. It'd be pretty complex, but that's because OLTP has inherent complexities. You would either have to generate the Update statement on the fly (thus breaking the new ADO.NET 2.0 batch option on the adapters) or put the logic at the field level (using an SQL "CASE" statement). I'm not sure how efficient CASE is on the server, but that could potentially fix my 2nd option.
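As a sketch of the field-level idea (again in Python with sqlite3 standing in for the server, and with a made-up `customer` table and `merge_update` helper), each column's SET can be wrapped in a CASE so that a column is only written when this user actually changed it. This assumes non-NULL columns; NULL comparisons would need extra handling. Whether the server executes this efficiently is a separate question the sketch does not answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, address TEXT, status TEXT)")
conn.execute("INSERT INTO customer VALUES (1, '1 Old Road', 'single')")

# Each column keeps its current database value unless this user changed it,
# and the WHERE clause only insists on a match for the columns being changed.
CASE_UPDATE = """
UPDATE customer SET
  address = CASE WHEN :orig_address <> :new_address THEN :new_address ELSE address END,
  status  = CASE WHEN :orig_status  <> :new_status  THEN :new_status  ELSE status  END
WHERE id = :id
  AND (address = :orig_address OR :orig_address = :new_address)
  AND (status = :orig_status OR :orig_status = :new_status)
"""


def merge_update(original, new):
    """Apply one user's edit; returns the number of rows changed (0 = conflict)."""
    params = {
        "id": 1,
        "orig_address": original[0], "orig_status": original[1],
        "new_address": new[0], "new_status": new[1],
    }
    return conn.execute(CASE_UPDATE, params).rowcount


snapshot = ("1 Old Road", "single")  # every user below read this state

status_change = merge_update(snapshot, ("1 Old Road", "married"))      # user A
address_change = merge_update(snapshot, ("2 New Street", "single"))    # user B
merged_row = conn.execute("SELECT address, status FROM customer WHERE id = 1").fetchone()

# A third user editing the address from the same stale snapshot really does
# conflict with user B, and the update changes nothing.
conflicting_change = merge_update(snapshot, ("3 Other Avenue", "single"))
```

The statement stays fixed, so it would not need to be generated on the fly; the cost is a longer statement and two parameters per column.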

But this brings me to my second and broader point again: the disdain that "real" programmers have for datasets. This was refreshed for me recently on a blog post by Karl Seguin at CodeBetter. I liked that post a lot (about using a coding test when evaluating potential hires) until I got to the bit about telltale signs he would look for. Right at the top?

Datasets and SqlDataSource are very bad

He has since amended that so:

Datasets and SqlDataSource are very bad (update: the dataset thing didn't go over too well in the comments ;) )

and added in the comments:

Sorry everyone...I've always had a thing against datasets...

He's not alone here. It's a common feature of highly technical programmers to hold datasets in contempt. Which would be fair enough if they were willing to give reasons or support for the position. If I felt that such statements came from an informed foundation, there wouldn't be much to quibble about. Unfortunately, too often this is simply not the case.

On those rare occasions when I can get one of these gurus to expound a bit, this attitude generally devolves back to a couple of bad experiences where datasets were used poorly or shoved into a situation where they didn't belong. Indeed, Karl goes on to give the kind of thing he doesn't want to see and I have to agree that he has a point. But while his example uses a dataset, the dataset isn't the source of the problem. The problem is actually in his second point after datasets:

Data access shouldn't be in the aspx or codebehind

Since he's looking for strong enterprise-level coding habits, he's right that it'd be better encapsulated in its own class, and better still in its own library.

Again, it isn't the dataset he actually has a quibble with. He's just perpetuating a prejudice when he reflexively includes them as a first strike. To his credit, he's willing to own up to the prejudice. Unfortunately, he does so in a way that indicates that it is a prejudice he has no plans to explore or evaluate. That's what I hate about the whole anti-dataset vibe in the guru set. Particularly since these tend to be people who are proud of their rationality and expect others to listen to them when they expound on technical topics.

 

23. November 2006 16:47 by Jacob | Comments (0) | Permalink

DataSets and Business Logic

Whoa, that was fast. Udi Dahan responded to my post on DataSets and DBConcurrencyException. Cool. Also cool: he has a good point. Two good points, really.

Doing OLTP Better Out of the Box

I'll take his last point first because it's pure conjecture. Why don't DataSets handle OLTP-type functions better? My first two suggestions would, indeed, be better if they were included in the original code generated by the ADO.NET dataset designer. I wish that they were. Frankly, the statements already generated by the "optimistic" updates option are quite complex as-is and adding an additional "OR" condition per field wouldn't really be adding that much in either complexity or readability (which are both beyond repair anyway) and would add to reliability and reduce error conditions.

My guess is that it has to do with my favorite gripe about datasets in general: nobody knows quite what they are for. I suspect that this applies as much to the folks in Redmond as anywhere else. Datasets are obviously a stab at an abstraction layer from the server data and make it easier to do asynchronous database transactions as a regular (i.e. non-database, non-enterprise guru) developer. But that doesn't really answer the question of what they are useful for and when you should use them.

DataSets are, essentially, the red-headed stepchild of the .NET framework. They get enough care and feeding to survive, but hardly the loving care they'd need to thrive. And really, I think that LINQ pretty much guarantees their eventual demise. Particularly with some of the coolness that is DLINQ.

Datasets Alone Make Lousy Business Objects

As much as I am a fan of DataSets in general, you have to admit that they aren't a great answer in the whole business layer architecture domain.

I mean, you can (if you are sufficiently clever) implement some rudimentary data validation by setting facets on your table fields (not that most people do this--or even know you can). You can encode things like min/max, field length, and other relatively straightforward data purity limitations. Anything beyond this, however, (like, say, when orders in Japan have to have an accompanying telephone number to be valid) would involve either some nasty derived class structures (if you even can--are strongly-typed DataTables inheritable? I've never tried. It'd be a mess to do so, I think), or wrapping the poor things in real classes.
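To make the "wrapping" option concrete, here is a small sketch in Python rather than .NET. The `OrderRow` and `Order` names, the `"JP"` country code, and the error message are all assumptions for illustration; `OrderRow` merely stands in for a typed DataSet row, and the wrapper holds the cross-field rule that simple column constraints can't express.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class OrderRow:
    """Plain data holder, standing in for a strongly-typed DataSet row."""
    order_id: int
    country: str
    telephone: Optional[str] = None


class Order:
    """Wrapper class that owns the rules the row itself cannot express."""

    def __init__(self, row: OrderRow) -> None:
        self.row = row

    def validation_errors(self) -> List[str]:
        """Return a list of rule violations; an empty list means valid."""
        errors = []
        # Cross-field rule: the telephone requirement depends on the country.
        if self.row.country == "JP" and not (self.row.telephone or "").strip():
            errors.append("orders in Japan require a telephone number")
        return errors


japan_without_phone = Order(OrderRow(1, "JP")).validation_errors()
japan_with_phone = Order(OrderRow(2, "JP", "03-1234-5678")).validation_errors()
elsewhere_without_phone = Order(OrderRow(3, "US")).validation_errors()
```

The row stays a dumb container and the rule lives in exactly one place, at the cost of one more class per table that needs it.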

One solution to this is to use web services as your business layer and toss DataSets back and forth as the "state" of a broader, mostly-conceptual object. This is something of a natural fit because DataSet objects serialize easily as XML (and do so much better--i.e. less buggy--in .NET 2.0). This de-couples methods from data, so isn't terribly OO. It can work in an environment where complex rules must work in widely disparate environments (like a call center application and a self-serve web sales application) when development speed is a concern (as in, say, a high-growth environment).

I think this leads to the kind of complexity Udi says he has seen with datasets. The main fault line is that the knowledge of which methods to call (and where to find them) lives only in design documents or a developer's head. This can easily lead to a nasty duplication of methods and chaos--problems that functionally don't exist in a stronger object paradigm.

That Said...

Here is where I stick my neck out and reveal my personal preferences (and let all the "real" developers write me off as obviously deluded): although DataSets make admittedly lousy business objects, most non-enterprise level projects just don't need the overhead that a true object data layer represents. For me, it's a case of serious YAGNI.

Take any number of .NET open source software projects I've played with: not one uses DataSets, yet not one needs all the complexity of their custom created classes, either. They aren't doing complex data validation and their CRUD operations are less robust than those produced automatically from the dataset designer. All at a higher expense of resources to produce.

Or take my current place of gainful employ. We have five ASP.NET applications that all have an extremely complex n-tier architecture--all implemented separately in each web application (and nowhere else--they're not even in a separate library). Each of the business objects has a bunch of properties implemented that are straight get/set from an internal field. And that is all they are. Oh, there's a couple of "get" routines that populate the object for different contexts using a separate Data Access Layer object. And an update routine that does the same. And a create... you get the point. It's three layers of abstraction that don't do anything. I shudder to think how much longer all that complexity took to create when a strongly-typed DataSet would have done a much better job and taken a fraction of the time. It makes me want to call the development police to report ORM abuse.

Which is to Say

Don't let all that detract from Udi's point, though. He's right that for seriously complex enterprise-level operations, you can't really get around the fact that you need good architecture for which datasets will likely be inadequate. Relying wholly on DataSets in that case will get you into trouble.

I personally think that you could get away with datasets being the communication objects between web services in most cases even so, but I also realize that there are serious weaknesses in this approach. It works best if the application is confined to a single enterprise domain (like order processing or warehouse inventory management). Once you cross domains with your objects, you incur some serious side-effects, not least of which is that the meaning of your objects (and the operations you want to perform on them) can change with context (sometimes without you knowing it--want an exercise in what I mean? Ask your head of marketing and your head of finance what the definition of a "sale" is--then go ask your board of directors).

So yeah, DataSets aren't always the answer. I'd just prefer if more developers would make that judgement from a standpoint of knowing what DataSets are and what they can do. Too often, their detractors are operating more from faith than from knowledge.*

*Not that this is the case for Udi. For all he has admitted that he isn't personally terribly familiar with datasets, his examples are pretty good at delineating their pressure points and that tends to indicate that he's speaking from some experience with their use in the wild.

 

21. November 2006 19:23 by Jacob | Comments (0) | Permalink

4 Solutions to DBConcurrencyException in DataSets

Following links the other day, I ran across this analysis of DataSets vs. OLTP from Udi Dahan. His clincher in favor of coding OLTP over using datasets is this:

The example that clinched OLTP was this. Two users perform a change to the same entity at the same time – one updates the customer’s marital status, the other changes their address. At the business level, there is no concurrency problem here. Both changes should go through. When using datasets, and those changes are bundled up with a bunch of other changes, and the whole snapshot is sent together from each user, you get a DbConcurrencyException. Like I said, I’m sure there’s a solution to it, I just haven’t heard it yet.

I thought about this for a minute and came up with four solutions for DBConcurrencyException in this scenario using DataSets (though the first two are essentially the same and differ only by who actually implements it). I'm sure there are others, but this should do for starters.

  1. Use stored procedures, created by a competent DBA, that take parameters for the original and new column state. This means that you check each field with an "OR (<ds.originalValue> = <ds.updateValue>)". This solution passes the same two parameters per field as an "optimistic" pre-generated update statement but it makes the update statement larger by adding this new "OR" condition for each field.
  2. You can do the same by altering the raw update statement generated by the DataSet designer. This means sending a longer statement to the database on each update, though this can be offset by setting your batch size higher if you have lots of updates to send (uh, you'd need ADO.NET 2.0 for that). I'd hesitate to use this method, but that's more a matter of personal taste than anything else (I prefer stored procedures, and I recognize that internal network traffic generally isn't the bottleneck in these kinds of transactions, though on-the-fly creation of statement execution plans could be).
  3. Override OnRowUpdating for the adapter (or handle its RowUpdating event) to alter the command sent based on which fields have actually changed. This is probably the closest in effect to the OLTP solution envisioned by Udi. This solution is problematic for me simply because I've never actually tried to do it and I'm not sure you can hook into the base adapter's updates on each execution. If you can't, an alternative (in ADO.NET 2.0) would be to create a base class for the table adapters and create an alternative Update function in derived partial classes. In this case, you'd have "AcceptFineGrainedChanges" or some such function that you'd call. Once the alternative base class was created, custom programming per table adapter would be a matter of a couple of moments. I've done something similar for using the designer for Sybase table adapters and it worked out pretty well. I'd have to actually try this to make sure it'd work though. Call this two half-solutions if you're feeling stern about it.
  4. This last would be useful if you have a relatively well-defined use case that isn't going to morph much or require stringent concurrency resolution. In this one, you deliberately break the one-for-one relationship between your dataset and database (i.e. one database table can be represented by multiple dataset tables). In Udi's concurrency example, the dataset would have a CustomerAddress table and a CustomerStatus table. Creating the dataset with custom selects would generate the tables pretty painlessly with appropriate paranoia. Now, this only really pushes his concern down a little, making it less likely to be an issue. It doesn't eliminate it. It'd probably handle most of the concurrency problems people are likely to run into. Or at least, push them out beyond where most people will ever experience it (not quite the same thing). It could be taken to a ridiculous extreme where each field was its own datatable (which is just silly, but I've seen sillier things happen) so a little balance and logical separation would be needed.
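The third option in the list above is the easiest to sketch outside of .NET. What follows is a rough Python illustration using sqlite3, not ADO.NET code: the `customer` table, the `COLUMNS` whitelist, and the `update_changed_fields` helper are all made up for the example. It builds the UPDATE from only the columns the user changed, so the concurrency check covers those columns and nothing else.

```python
import sqlite3

# Whitelist of updatable columns. Column names are interpolated into the SQL
# text below, so they must never come from user input.
COLUMNS = ("address", "status")


def update_changed_fields(conn, row_id, original, current):
    """Update only the changed columns; return False on a real conflict."""
    changed = [c for c in COLUMNS if original[c] != current[c]]
    if not changed:
        return True  # nothing to write, so nothing can conflict
    set_clause = ", ".join(f"{c} = :new_{c}" for c in changed)
    where_clause = " AND ".join(f"{c} = :orig_{c}" for c in changed)
    sql = f"UPDATE customer SET {set_clause} WHERE id = :id AND {where_clause}"
    params = {"id": row_id}
    for c in changed:
        params[f"new_{c}"] = current[c]
        params[f"orig_{c}"] = original[c]
    # Zero rows changed means someone else already altered one of our columns.
    return conn.execute(sql, params).rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, address TEXT, status TEXT)")
conn.execute("INSERT INTO customer VALUES (1, '1 Old Road', 'single')")

snapshot = {"address": "1 Old Road", "status": "single"}  # read by every user

user_a_ok = update_changed_fields(conn, 1, snapshot, {**snapshot, "status": "married"})
user_b_ok = update_changed_fields(conn, 1, snapshot, {**snapshot, "address": "2 New Street"})
# A third edit to the address from the same stale snapshot is a genuine conflict.
user_c_ok = update_changed_fields(conn, 1, snapshot, {**snapshot, "address": "3 Other Avenue"})

result_row = conn.execute("SELECT address, status FROM customer WHERE id = 1").fetchone()
```

Because the statement text varies with the set of changed columns, this is the variant that gives up a single fixed, pre-generated command per table.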

OLTP may seem more natural as a solution for many, but that's likely an issue of preference and sunk costs (because they've done it before and are comfortable with that solution space). It certainly isn't the only solution, though, nor is it a stumper for datasets.

Finally, I’ll add a caveat that I'm not saying that datasets are necessarily to be preferred over stronger object models. I just know that they get pretty short shrift from "real" developers in these kinds of discussions and want to make sure that the waters remain appropriately muddied. There may be a universal stumper for datasets I don't know about. There are certainly environments where a formal OLTP or ORM tool would be a legitimately preferred solution.

 

21. November 2006 05:35 by Jacob | Comments (0) | Permalink
