Arguing Data

People have a lot of different reasons for posting blog entries. These reasons vary from financial, to personal, to professional, to I'm afraid to know more. For me, one reason I take the time when I could be doing something else is that I like to put my ideas out there to be tested. I don't really care if a majority of people agree with me so much as I want to see what other people have to say for or against certain things. The downside to this is that I'll sometimes find that an idea isn't as good as I had originally thought it was. The upside is the opportunity to refine something to be better or to discard an idea that turns out simply to be bad.

Which is why I'm glad to see Karl Seguin's response to a post I had made about DataSets. Karl's a bright guy and he has a good background in the problem domain associated with DataSet objects. He displays class, too, even when he feels I've been a bit rough in a point or two.

The School of Hard Knocks

I empathize with his experience where DataSet misuse caused much pain and suffering. I've been in similar situations and it's no fun. In a full-blown business transaction environment, DataSets have some liabilities that make them ill-suited for business-layer usage. The thing is, the opposite problem exists as well, and it's one that is more serious than people want to give it credit for: a layer of specialized, hand-crafted business objects that don't actually do anything.

I'm currently working at a place that has an extreme case of this problem. We have four entirely separate ASP.Net applications for our internal invoice processing. All four of these applications have their own set of substantially similar custom objects that are completely unique for that application. Each object doesn't do anything more than contain a group of properties that are populated from a database and write changes back to it.

I shudder to think how many hours were wasted on this travesty. It's over-complex, can't leverage any type of automated binding, doesn't track row state, and testing and debugging changes is an unmitigated pain. It's like someone attended an n-tier lecture somewhere and never bothered understanding what the point of having one actually was. Frankly, I'd prefer if the previous developers had simply put all the data access right in each individual page--at least that'd be easier to fix when something blew up.

Learning Your Craft

The thing is, my experience no more proves custom business objects wrong than Karl's experience proves DataSets wrong. That's the trouble with anecdotal experience: it feels more important than it is (it doesn't help that pain is such an efficient teacher).

The trick of learning a craft is in gaining experience that is both specific and broad. This can be tricky in a field that is as immense as software development. You really have no choice but to specialize at some point. Even narrowing it down to ".Net Framework" isn't nearly enough to constitute adequate focus for competence.

Unfortunately, Karl's point that there are a lot of lazy programmers out there is true. Anyone who has had to hire or manage programmers will confirm this. Too many developers don't bother learning enough of their craft to be considered actually competent. Faced with the need to specialize carefully, many simply give up and learn only enough to get by (and sometimes not even that much). They're content to learn the bare minimum needed to get hired. They'll learn enough of the "how" to create a program without ever bothering to learn any of the "why".

Teaching Others

I have a minor problem with Karl's explanation, though. He says, "I advocate against the use of DataSets as a counterbalance to people who blindly use them." While I understand this position, I'm not sure I can be said to appreciate it. It smacks a little of the "for your own good" school of learning; which works well enough in a parent-child or even teacher-student relationship. I'm not sure it works so well in public or general discourse.

It is hard to correct bad habits, particularly habits as widespread as DataSet misuse seems to be. As one who often has the bad habits to be corrected, though, I think that I'd prefer having the problem explained and given the context so I can understand the trade-offs being made. That would give me the opportunity to know why something is wrong, not just that something is wrong.

That'd require discussing DataSets in specific instead of general terms. I'm not sure if Karl would really want to do that, though. I mean, his specialty at CodeBetter is really ASP.Net. Expecting him to tackle ADO.Net is not just unrealistic, it could have the effect of diluting his blog posts and alienating his regular readers or getting him embroiled in things he's less interested in.

I would like to see someone respectable and wider-read than I am take on Strongly-typed DataSets in a more complete fashion, though.

Professor Microsoft

Which is why I have to agree with Karl that the blame for DataSet misuse lies squarely in Microsoft's court. I stopped counting how many official articles and examples from Microsoft included egregious misuse or abuse of DataSets. And I have yet to see any that describe how to do it right or what kinds of things to look for in determining the trade-offs between a Strongly-typed DataSet and a more formal OR/M solution, let alone ameliorating factors for each. The only articles about DataSets that I can remember that don't actually teach bad habits are articles about how bad they are. Which isn't helpful. It'd be nice to have something, somewhere that talks about using them wisely and what their strengths actually are. Maybe that should be a future blog post here...

26. February 2007 18:33 by Jacob | Comments (0) | Permalink

DataSets and Business Logic

Whoa, that was fast. Udi Dahan responded to my post on DataSets and DbConcurrencyException. Cool. Also cool: he has a good point. Two good points, really.

Doing OLTP Better Out of the Box

I'll take his last point first because it's pure conjecture. Why don't DataSets handle OLTP-type functions better? My first two suggestions would, indeed, be better if they were included in the original code generated by the ADO.NET dataset designer. I wish that they were. Frankly, the statements already generated by the "optimistic" updates option are quite complex as-is and adding an additional "OR" condition per field wouldn't really be adding that much in either complexity or readability (which are both beyond repair anyway) and would add to reliability and reduce error conditions.

My guess is that it has to do with my favorite gripe about datasets in general: nobody knows quite what they are for. I suspect that this applies as much to the folks in Redmond as anywhere else. Datasets are obviously a stab at an abstraction layer from the server data and make it easier to do asynchronous database transactions as a regular (i.e. non-database, non-enterprise guru) developer. But that doesn't really answer the question of what they are useful for and when you should use them.

DataSets are, essentially, the red-headed step child of the .NET framework. They get enough care and feeding to survive, but hardly the loving care they'd need to thrive. And really, I think that LINQ pretty much guarantees their eventual demise. Particularly with some of the coolness that is DLINQ.

Datasets Alone Make Lousy Business Objects

As much as I am a fan of DataSets in general, you have to admit that they aren't a great answer in the whole business layer architecture domain.

I mean, you can (if you are sufficiently clever) implement some rudimentary data validation by setting facets on your table fields (not that most people do this--or even know you can). You can encode things like min/max, field length, and other relatively straight-forward data purity limitations. Anything beyond this, however, (like, say, when orders in Japan have to have an accompanying telephone number to be valid) would involve either some nasty derived class structures (if you even can--are strongly-typed DataTables inheritable? I've never tried. It'd be a mess to do so, I think), or wrapping the poor things in real classes.

One solution to this is to use web services as your business layer and toss DataSets back and forth as the "state" of a broader, mostly-conceptual object. This is something of a natural fit because DataSet objects serialize easily as XML (and do so much better--i.e. less buggy--in .NET 2.0). This de-couples methods from data, so isn't terribly OO. It can work in an environment where complex rules must work in widely disparate environments (like a call center application and a self-serve web sales application) when development speed is a concern (as in, say, a high-growth environment).

I think this leads to the kind of complexity Udi says he has seen with datasets. The main faultline is that what methods to call (and where to find them) are in design documents or a developer's head. This can easily lead to a nasty duplication of methods and chaos--problems that functionally don't exist in a stronger object paradigm.

That Said...

Here is where I stick my neck out and reveal my personal preferences (and let all the "real" developers write me off as obviously deluded): although DataSets make admittedly lousy business objects, most non-enterprise level projects just don't need the overhead that a true object data layer represents. For me, it's a case of serious YAGNI.

Take any number of .NET open source software projects I've played with: not one uses DataSets, yet not one needs all the complexity of their custom created classes, either. They aren't doing complex data validation and their CRUD operations are less robust than those produced automatically from the dataset designer. All at a higher expense of resources to produce.

Or take my current place of gainful employ. We have five ASP.NET applications that all have an extremely complex n-tier architecture--all implemented separately in each web application (and nowhere else--they're not even in a separate library). Each of the business objects has a bunch of properties implemented that are straight get/set from an internal field. And that is all they are. Oh, there's a couple of "get" routines that populate the object for different contexts using a separate Data Access Layer object. And an update routine that does the same. And a create... you get the point. It's three layers of abstraction that don't do anything. I shudder to think how much longer all that complexity took to create when a strongly-typed DataSet would have done a much better job and taken a fraction of the time. It makes me want to call the development police to report ORM abuse.

Which is to Say

Don't let all that detract from Udi's point, though. He's right that for seriously complex enterprise-level operations, you can't really get around the fact that you need good architecture for which datasets will likely be inadequate. Relying wholly on DataSets in that case will get you into trouble.

I personally think that you could get away with datasets being the communication objects between web services in most cases even so, but I also realize that there are serious weaknesses in this approach. It works best if the application is confined to a single enterprise domain (like order processing or warehouse inventory management). Once you cross domains with your objects, you incur some serious side-effects, not least of which is that the meaning of your objects (and the operations you want to perform on them) can change with context (sometimes without you knowing it--want an exercise in what I mean? Ask your head of marketing and your head of finance what the definition of a "sale" is--then go ask your board of directors).

So yeah, DataSets aren't always the answer. I'd just prefer if more developers would make that judgement from a standpoint of knowing what DataSets are and what they can do. Too often, their detractors are operating more from faith than from knowledge.*

*Not that this is the case for Udi. For all he has admitted that he isn't personally terribly familiar with datasets, his examples are pretty good at delineating their pressure points and that tends to indicate that he's speaking from some experience with their use in the wild.


21. November 2006 19:23 by Jacob | Comments (0) | Permalink

4 Solutions to DbConcurrencyException in DataSets

Following links the other day, I ran across this analysis of DataSets vs. OLTP from Udi Dahan. His clincher in favor of coding OLTP over using datasets is this:

The example that clinched OLTP was this. Two users perform a change to the same entity at the same time – one updates the customer’s marital status, the other changes their address. At the business level, there is no concurrency problem here. Both changes should go through.When using datasets, and those changes are bundled up with a bunch of other changes, and the whole snapshot is sent together from each user, you get a DbConcurrencyException. Like I said, I’m sure there’s a solution to it, I just haven’t heard it yet.

I thought about this for a minute and came up with four solutions for DbConcurrencyException in this scenario using DataSets (though the first two are essentially the same and differ only by who actually implements it). I'm sure there are others, but this should do for starters.

  1. Use stored procedures created by a competent DBA that utilizes parameters for the original and new column state. This means that you check each field with a "OR (<ds.originalValue> = <ds.updateValue>)". This solution passes the same two parameters per field as an "optimistic" pre-generated update statement but it makes the update statement larger by adding this new "OR" condition for each field.
  2. You can do the same by altering a raw update generated from the DataSet designer. This means sending a longer select to the database each update though this can be offset by setting your batch size higher if you have lots of updates you're sending (uh, you'd need ADO.NET 2.0 for that). I'd hesitate to use this method but that's mainly a personal taste issue than anything else (because I'd prefer using stored procedures and recognize that internal network traffic generally isn't the bottleneck in these kinds of transactions, though on-the-fly statement execution plan creation could be).
  3. Override the OnUpdating for the adapter to alter the command sent based on which fields have actually changed. This is probably the closest in effect to the OLTP solution envisioned by Udi. This solution is problematic for me simply because I've never actually tried to do it and I'm not sure you can hook into the base adapter updates each execution. If you can't, an alternative (in ADO.NET 2.0) would be to create a base class for the table adapters and create an alternative Update function in derived partial classes. In this case, you'd have "AcceptFineGrainedChanges" or some such function that you'd call. Once the alternative base class was created, custom programming per table adapter would be a matter of a couple moments. I've done something similar for using the designer for SyBase table adapters and it worked out pretty well. I'd have to actually try this to make sure it'd work though. Call this two half-solutions if you're feeling stern about it.
  4. This last would be useful if I have a relatively well-defined use case that isn't going to morph much or require stringent concurrency resolution. In this one, you deliberately break the one-for-one relationship from your dataset and database (i.e. one database table can be represented by multiple dataset tables). In Udi's concurrency example, the dataset would have a CustomerAddress table and a CustomerStatus table. Creating the dataset with custom selects would generate the tables pretty painlessly with appropriate paranoia. Now, this only really pushes his concern down a little, making it less likely to be an issue. It doesn't eliminate it. It'd probably handle most of the concurrency problems people are likely to run into. Or at least, push them out beyond where most people will ever experience it (not quite the same thing). It could be taken to a rediculous extreme where each field was it's own datatable (which is just silly, but I've seen sillier things happen) so a little balance and logical separation would be needed.

OLTP may seem more natural as a solution for many, but that's likely an issue of preference and sunken costs (because they've done it before and are comfortable with that solution space). It certainly isn't the only solution, though, nor is it a stumper for datasets.

Finally, I’ll add a caveat that I'm not saying that datasets are necessarily to be preferred over stronger object models. I just know that they get pretty short shrift from "real" developers in these kinds of discussions and want to make sure that the waters remain appropriately muddied. There may be a universal stumper for datasets I don't know about. There are certainly environments where a formal OLTP or ORM tool would be a legitimately preferred solution.


Technorati tags: , , , , ,
21. November 2006 05:35 by Jacob | Comments (0) | Permalink


<<  April 2017  >>

View posts in large calendar