Automatic Value Objects
Written by Ben Teese   
Tuesday, 11 October 2005 00:00

The Value Object pattern (also known as Transfer Object) is one of the best-known J2EE patterns. Its intent is to allow efficient client-server data transfer. However, it has drawbacks. In the best case it adds an extra layer of simplified data structures that closely mirror the object models in your system. In the worst case these data structures become the object model for your entire architecture. Either way, the pattern can make systems difficult to maintain, especially systems with complex clients and non-clustered servers.

In this article I will discuss such a system in more detail, and then demonstrate a prototype of a tool that uses dynamic proxies and RMI to generate value objects automatically. I'll explain a little about how the tool works, and then discuss some broader issues around optimizing client-server communication. Finally, I'll talk about ways in which the prototype could be extended, including some ideas inspired by Hibernate.

In a Past Life

In a past life I worked on a project involving a large and complex Swing client. It talked to an equally large and complex home-built application server. Because the client used Swing, the operations it could perform on business objects were fine-grained and sophisticated, more so than in a web application with a coarse-grained request-response cycle.

Furthermore, the interactions between these objects on the server side could be quite complex, and in some cases had to be made immediately visible on the client. Sometimes a change to a field in one window required the value of another field in another window to be updated immediately. Even worse, the processing between these updates sometimes required a server interaction, for example a call to a central clock.

Value objects were used in this system for client-server communication. The reason was performance: the system operated primarily over a network, and latency was a concern.

The problem was that the value objects pervaded the entire architecture. They defined the basic building blocks for the entire system.

That was all well and good when adding functionality that needed to operate over the remote interface. However, much of the time little or none of the functionality actually pertained to the remote layer. Consequently, you'd be stuck using these simple data structures when a richer object model would have been far more desirable.

In some ways this was an extreme case: the value objects could have been confined to a layer of their own instead of infecting the whole architecture. But even then there would still have been maintenance overhead. To add a new field, I'd need to add it to both the domain model and the value objects. I've had to do this numerous times on other systems.
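To make that duplication concrete, here is a minimal, hypothetical sketch of a domain class and its hand-written value object. The names `CustomerEntity` and `CustomerValue` and the `email` field are illustrative assumptions, not code from the project. The point is simply that one new field means edits in several places.

```java
import java.io.Serializable;

// Hypothetical domain class: state plus behaviour.
class CustomerEntity {
    private final String name;
    private final String email; // a newly added field...

    CustomerEntity(String name, String email) {
        this.name = name;
        this.email = email;
    }

    String getName() { return name; }
    String getEmail() { return email; }

    // Behaviour that the value object deliberately does not carry.
    boolean canBeEmailed() { return email != null && !email.isEmpty(); }
}

// Hypothetical hand-written value object: a flat, serializable copy for the wire.
class CustomerValue implements Serializable {
    private static final long serialVersionUID = 1L;

    final String name;
    final String email; // ...has to be declared again here...

    CustomerValue(CustomerEntity source) {
        this.name = source.getName();
        this.email = source.getEmail(); // ...and copied across here.
    }
}
```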

The problem is that value objects are essentially just dumb data holders. Furthermore, the data they hold often has to be quite simple: it can't be a complex tree of objects, because passing such a tree over a network is considered undesirable. But sometimes you want them to carry state that only exists on the server side. Sometimes you want something that is half-serializable and half-remote.

Automatic Value Objects

So what I needed was a value object that also retained a reference to a remote object on the server. Furthermore, its internals had to be transparent to the client, so that, at least initially, the client didn't have to worry about whether properties were being accessed locally or remotely.

Another way of putting this is that I wanted to isolate the issue of efficient remote access in a layer of its own as much as possible. I didn't want client programmers to have to worry, at least initially, about changing value object code. Nor did I want this concern to leak out and affect the architecture of the entire application. In short, what I needed was some sort of Automatic Value Object.

Inspired by dynamic proxies and the subsequent arrival of frameworks that let you make objects remote transparently (for example TRMI and Spring Remoting), I wrote a simple prototype that attempts to make objects remotely accessible both transparently and efficiently. I also wrote a test program that demonstrates the prototype in action. The archive automatic value objects.zip contains an Eclipse project with the source; it has been tested with Eclipse 3.1 and J2SE 5.

Here is an excerpt of the core of the test program:

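import java.rmi.Naming;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;

// Server side: start an in-process RMI registry and publish a CustomerFetcher under a name
LocateRegistry.createRegistry(Registry.REGISTRY_PORT);
String name = "CustomerFetcher";
CustomerFetcher customerFetcher = new CustomerFetcher();
Naming.bind(name, customerFetcher);

// Client side: look the fetcher up through the registry and ask it for a customer
ICustomer clientCustomer = ((ICustomerFetcher) Naming.lookup(name)).getCustomer();
clientCustomer.getName();

// Read each property of the customer's address
IAddress address = clientCustomer.getAddress();
address.getPostcode();
address.getStreet();
address.getSuburb();

// Write a property back to the customer
clientCustomer.setName("testName");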
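The excerpt relies on three remote interfaces whose definitions live in the project archive rather than in this article. The sketch below reconstructs them from the method signatures in the RMI call log shown further down, so treat it as an approximation. In particular, the assumption that each interface extends java.rmi.Remote follows from the objects being invoked through RMI, not from the original source, and only the methods visible in the log are listed.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;

// Fetches customers; the test binds an implementation of this in the registry.
interface ICustomerFetcher extends Remote {
    ICustomer getCustomer() throws RemoteException;
}

// Customer methods that appear in the call log.
interface ICustomer extends Remote {
    String getName() throws RemoteException;
    IAddress getAddress() throws RemoteException;
    void setName(String name) throws RemoteException;
}

// Address methods that appear in the call log.
interface IAddress extends Remote {
    int getPostcode() throws RemoteException;
    String getStreet() throws RemoteException;
    String getSuburb() throws RemoteException;
}
```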

This test implements both the client and server sides of a series of calls. First, it creates a CustomerFetcher, binds it in the RMI registry, and looks it up again as a remote ICustomerFetcher. It then fetches further remote objects from it. Finally, it sets a property on one of those remote objects. Because all of these calls are made on an object obtained through the RMI registry, they all go through the RMI transport layer. To demonstrate this, we can set the java.rmi.server.logCalls system property to true and run the test code. It produces the following output (to which I have added newlines for readability):

Sep 30, 2005 3:18:52 PM sun.rmi.server.UnicastServerRef logCall
FINER: RMI TCP Connection(1)-10.0.0.175: [10.0.0.175: sun.rmi.registry.RegistryImpl[0:0:0, 0]:
void bind(java.lang.String, java.rmi.Remote)]
Sep 30, 2005 3:18:52 PM sun.rmi.server.UnicastServerRef logCall
FINER: RMI TCP Connection(2)-10.0.0.175: [10.0.0.175: sun.rmi.transport.DGCImpl[0:0:0, 2]:
java.rmi.dgc.Lease dirty(java.rmi.server.ObjID[], long, java.rmi.dgc.Lease)]
Sep 30, 2005 3:18:52 PM sun.rmi.server.UnicastServerRef logCall
FINER: RMI TCP Connection(1)-10.0.0.175: [10.0.0.175: sun.rmi.registry.RegistryImpl[0:0:0, 0]:
java.rmi.Remote lookup(java.lang.String)]
Sep 30, 2005 3:18:52 PM sun.rmi.server.UnicastServerRef logCall
FINER: RMI TCP Connection(2)-10.0.0.175: [10.0.0.175: CustomerFetcher[0]:
public abstract ICustomer ICustomerFetcher.getCustomer() throws java.rmi.RemoteException]
Sep 30, 2005 3:18:52 PM sun.rmi.server.UnicastServerRef logCall
FINER: RMI TCP Connection(2)-10.0.0.175: [10.0.0.175: sun.rmi.transport.DGCImpl[0:0:0, 2]:
java.rmi.dgc.Lease dirty(java.rmi.server.ObjID[], long, java.rmi.dgc.Lease)]
Sep 30, 2005 3:18:52 PM sun.rmi.server.UnicastServerRef logCall
FINER: RMI TCP Connection(2)-10.0.0.175: [10.0.0.175: Customer[1]:
public abstract java.lang.String ICustomer.getName() throws java.rmi.RemoteException]
Sep 30, 2005 3:18:52 PM sun.rmi.server.UnicastServerRef logCall
FINER: RMI TCP Connection(2)-10.0.0.175: [10.0.0.175: Customer[1]:
public abstract IAddress ICustomer.getAddress() throws java.rmi.RemoteException]
Sep 30, 2005 3:18:52 PM sun.rmi.server.UnicastServerRef logCall
FINER: RMI TCP Connection(2)-10.0.0.175: [10.0.0.175: sun.rmi.transport.DGCImpl[0:0:0, 2]:
java.rmi.dgc.Lease dirty(java.rmi.server.ObjID[], long, java.rmi.dgc.Lease)]
Sep 30, 2005 3:18:52 PM sun.rmi.server.UnicastServerRef logCall
FINER: RMI TCP Connection(2)-10.0.0.175: [10.0.0.175: Address[2]:
public abstract int IAddress.getPostcode() throws java.rmi.RemoteException]
Sep 30, 2005 3:18:52 PM sun.rmi.server.UnicastServerRef logCall
FINER: RMI TCP Connection(2)-10.0.0.175: [10.0.0.175: Address[2]:
public abstract java.lang.String IAddress.getStreet() throws java.rmi.RemoteException]
Sep 30, 2005 3:18:52 PM sun.rmi.server.UnicastServerRef logCall
FINER: RMI TCP Connection(2)-10.0.0.175: [10.0.0.175: Address[2]:
public abstract java.lang.String IAddress.getSuburb() throws java.rmi.RemoteException]
Sep 30, 2005 3:18:52 PM sun.rmi.server.UnicastServerRef logCall
FINER: RMI TCP Connection(2)-10.0.0.175: [10.0.0.175: Customer[1]:
public abstract void ICustomer.setName(java.lang.String) throws java.rmi.RemoteException]

The key parts are the calls to CustomerFetcher, Customer and Address. There are also calls to bind, lookup and the lease method dirty; these are unavoidable and are therefore discounted. For each call to CustomerFetcher, Customer and Address there is a call over the RMI transport layer, seven remote calls in total. In a real system each of these would be a call over a network and would therefore incur network latency.

Now let's tweak the example slightly by modifying CustomerFetcher.getCustomer so that it now uses AVOProxy.newProxyInstance():

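public ICustomer getCustomer() throws RemoteException {
    // Previously: return new Customer();
    // Now the Customer is wrapped in an automatic value object proxy before being returned
    return (ICustomer) AVOProxy.newProxyInstance(new Customer());
}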

Rerunning the example, we get the following:

Sep 30, 2005 3:21:41 PM sun.rmi.server.UnicastServerRef logCall
FINER: RMI TCP Connection(1)-10.0.0.175: [10.0.0.175: sun.rmi.registry.RegistryImpl[0:0:0, 0]:
void bind(java.lang.String, java.rmi.Remote)]
Sep 30, 2005 3:21:41 PM sun.rmi.server.UnicastServerRef logCall
FINER: RMI TCP Connection(2)-10.0.0.175: [10.0.0.175: sun.rmi.transport.DGCImpl[0:0:0, 2]:
java.rmi.dgc.Lease dirty(java.rmi.server.ObjID[], long, java.rmi.dgc.Lease)]
Sep 30, 2005 3:21:41 PM sun.rmi.server.UnicastServerRef logCall
FINER: RMI TCP Connection(1)-10.0.0.175: [10.0.0.175: sun.rmi.registry.RegistryImpl[0:0:0, 0]:
java.rmi.Remote lookup(java.lang.String)]
Sep 30, 2005 3:21:41 PM sun.rmi.server.UnicastServerRef logCall
FINER: RMI TCP Connection(2)-10.0.0.175: [10.0.0.175: CustomerFetcher[0]:
public abstract ICustomer ICustomerFetcher.getCustomer() throws java.rmi.RemoteException]
Sep 30, 2005 3:21:41 PM sun.rmi.server.UnicastServerRef logCall
FINER: RMI TCP Connection(1)-10.0.0.175: [10.0.0.175: sun.rmi.transport.DGCImpl[0:0:0, 2]:
java.rmi.dgc.Lease dirty(java.rmi.server.ObjID[], long, java.rmi.dgc.Lease)]
Sep 30, 2005 3:21:41 PM sun.rmi.server.UnicastServerRef logCall
FINER: RMI TCP Connection(1)-10.0.0.175: [10.0.0.175: $Proxy4[4]:
public abstract void ICustomer.setName(java.lang.String) throws java.rmi.RemoteException]

Disregarding the calls to bind, lookup and dirty, we now see that only two remote calls have been made: getCustomer() and setName(). So what makes AVOProxy.newProxyInstance() so special?

The Gory Details

When AVOProxy.newProxyInstance() is passed the Customer, it reads the values of all of its properties, i.e. it invokes those methods on ICustomer that take no arguments but return a value, and stores the results in a HashMap. In the example above, this happened for ICustomer.getName(), ICustomer.getAddress(), IAddress.getStreet(), IAddress.getPostcode() and IAddress.getSuburb(). The important thing to note is that all of these calls occur on the server side.
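The harvesting step can be sketched with plain reflection. This is a minimal illustration, not the prototype's code: `PropertyHarvester`, the `Address` demo interface and the choice to key results by method name are all assumptions. It selects exactly the methods described above, those that take no arguments and return a value, and records each result.

```java
import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: reads every "property" of target as seen through iface.
final class PropertyHarvester {
    private PropertyHarvester() {}

    static Map<String, Object> harvest(Object target, Class<?> iface) {
        Map<String, Object> values = new HashMap<>();
        for (Method m : iface.getMethods()) {
            // A property here means: no parameters and a non-void return type.
            if (m.getParameterCount() == 0 && m.getReturnType() != void.class) {
                try {
                    values.put(m.getName(), m.invoke(target)); // runs on the server side
                } catch (ReflectiveOperationException e) {
                    // The article says the prototype wraps such failures in a RuntimeException.
                    throw new RuntimeException("Could not read " + m.getName(), e);
                }
            }
        }
        return values;
    }
}

// Demo types for the sketch.
interface Address {
    String getStreet();
    int getPostcode();
    void setStreet(String street); // not a property: takes an argument
}

final class SimpleAddress implements Address {
    private String street;
    private final int postcode;

    SimpleAddress(String street, int postcode) {
        this.street = street;
        this.postcode = postcode;
    }

    public String getStreet() { return street; }
    public int getPostcode() { return postcode; }
    public void setStreet(String street) { this.street = street; }
}
```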

AVOProxy.newProxyInstance() then creates a Proxy that implements ICustomer and holds both the HashMap and a Remote reference to the original Customer, and returns it. So when CustomerFetcher.getCustomer() is called remotely, it returns this proxy. Because the proxy is Serializable (whereas the original Customer was Remote), it is copied across the wire to the client, taking with it the HashMap of stored values and the Remote reference to the original Customer.

When the client invokes a method on this proxy, the proxy first checks whether its HashMap contains a result for that method. If it does, it returns the result immediately and no remote call is made. If it doesn't, it delegates the call to its Remote object reference. Consequently, when the client calls ICustomer.getName(), ICustomer.getAddress(), IAddress.getStreet(), IAddress.getPostcode() and IAddress.getSuburb(), no remote call is made. However, when ICustomer.setName() is called, a remote call does take place.
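The client-side behaviour described here (answer from the map if possible, otherwise forward) maps naturally onto java.lang.reflect.Proxy. The sketch below is a simplification under stated assumptions rather than AVOProxy itself: `CachingHandler`, `wrap` and the demo `Greeter` types are hypothetical, the cache is keyed by method name, and a plain local object stands in for the RMI stub so the example runs without a registry. Because java.lang.reflect.Proxy implements Serializable, making the handler Serializable too is what lets such a proxy travel by value.

```java
import java.io.Serializable;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.HashMap;
import java.util.Map;

// Hypothetical handler: serves cached property values locally, forwards the rest.
final class CachingHandler implements InvocationHandler, Serializable {
    private static final long serialVersionUID = 1L;

    private final HashMap<String, Object> cache; // values harvested on the server
    private final Object delegate;               // stands in for the Remote reference

    CachingHandler(Map<String, Object> cache, Object delegate) {
        this.cache = new HashMap<>(cache);
        this.delegate = delegate;
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        boolean noArgs = args == null || args.length == 0;
        if (noArgs && cache.containsKey(method.getName())) {
            return cache.get(method.getName()); // cache hit: no remote call
        }
        try {
            return method.invoke(delegate, args); // cache miss: delegate the call
        } catch (InvocationTargetException e) {
            throw e.getCause(); // surface the delegate's own exception
        }
    }

    static <T> T wrap(Class<T> iface, Map<String, Object> cache, Object delegate) {
        Object proxy = Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] { iface }, new CachingHandler(cache, delegate));
        return iface.cast(proxy);
    }
}

// Demo types: the counter shows which calls reach the "remote" object.
interface Greeter {
    String getName();
    void setName(String name);
}

final class CountingGreeter implements Greeter {
    int calls;
    private String name = "Ann";

    public String getName() { calls++; return name; }
    public void setName(String name) { calls++; this.name = name; }
}
```

Note that after setName("Bob") a further getName() on the proxy would still return "Ann", because nothing invalidates the cache; that is the trade-off discussed under Correctness Vs. Performance.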

One important property of AVOProxy.newProxyInstance() is that it's recursive. This recursion occurs in three ways:

  1. Greedy Invocation: Whilst getting the value of a property on the server side in order to cache its return value, AVOProxy.newProxyInstance() first looks at the type of the property. If the property value is a Remote object, AVOProxy.newProxyInstance() calls itself recursively to create a proxy for that value and caches the proxy instead. In this way, AVOProxy.newProxyInstance() can be thought of as greedily invoking and caching everything it can.
  2. Method Arguments: Say you invoke a method on the client side that takes arguments, and so won't have had its return value cached. Before invoking this method on the Remote reference, the proxy looks at each of the method arguments. If an argument is itself a Remote object, the proxy uses AVOProxy.newProxyInstance() to create a Proxy for the argument and passes that proxy across instead.
  3. Return Values: If the remote method on the server side returns a Remote object, the server side uses AVOProxy.newProxyInstance() to create a Proxy for that object and passes it back instead.
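The first form of recursion, greedy invocation, can be sketched by combining harvesting with the proxy: whenever a getter's declared return type is an interface extending java.rmi.Remote, the value is snapshotted recursively and the resulting proxy is cached in its place. As before, `GreedySnapshot` and the demo types are hypothetical, calls are local rather than over RMI, and the argument and return-value forms of recursion are left out. There is also no cycle detection, so an object graph that refers back to itself would recurse without end.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.rmi.Remote;
import java.util.HashMap;
import java.util.Map;

// Hypothetical: greedily caches every property, recursing into Remote-typed ones.
final class GreedySnapshot {
    private GreedySnapshot() {}

    static <T> T of(Class<T> iface, T target) {
        return iface.cast(snapshot(iface, target));
    }

    private static Object snapshot(Class<?> iface, Object target) {
        Map<String, Object> cache = new HashMap<>();
        for (Method m : iface.getMethods()) {
            if (m.getParameterCount() != 0 || m.getReturnType() == void.class) {
                continue; // not a property
            }
            Object value = read(m, target);
            Class<?> type = m.getReturnType();
            if (value != null && type.isInterface() && Remote.class.isAssignableFrom(type)) {
                value = snapshot(type, value); // cache a proxy, not the Remote object
            }
            cache.put(m.getName(), value);
        }
        InvocationHandler handler = (proxy, method, args) -> {
            if ((args == null || args.length == 0) && cache.containsKey(method.getName())) {
                return cache.get(method.getName());
            }
            try {
                return method.invoke(target, args); // uncached: forward
            } catch (InvocationTargetException e) {
                throw e.getCause();
            }
        };
        return Proxy.newProxyInstance(iface.getClassLoader(), new Class<?>[] { iface }, handler);
    }

    private static Object read(Method m, Object target) {
        try {
            return m.invoke(target);
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException("Could not read " + m.getName(), e);
        }
    }
}

// Demo types. They extend Remote only so the sketch can detect them; they are
// never exported, so their methods don't need to declare RemoteException.
interface Address extends Remote {
    String getStreet();
}

interface Customer extends Remote {
    String getName();
    Address getAddress();
}

final class CountingAddress implements Address {
    int reads;
    public String getStreet() { reads++; return "1 Main St"; }
}

final class CountingCustomer implements Customer {
    final CountingAddress address = new CountingAddress();
    int reads;
    public String getName() { reads++; return "Ann"; }
    public Address getAddress() { reads++; return address; }
}
```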

An important implication of these last two points is that the concepts of client and server become interchangeable: either side may end up holding a cached proxy for an object that lives on the other side.

Correctness Vs. Performance

Of course, even in this simple example there is a trade-off between absolute correctness and performance. Why? Because the property values have been cached and sent across, there's no guarantee that they'll still be up to date when the client reads them. If the environment is such that those values could change on the server side at any time, our current solution would leave the client unaware of those changes. That's a basic trade-off we have to be aware of: if a property's value must always be up to date, we can't cache it, and thus can't optimize access to it.
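One way to act on that trade-off is to let the caller name the properties that must never be cached, so those are always fetched from the original object while everything else is served from the snapshot. The sketch below is hypothetical (`SelectiveSnapshot`, the `alwaysRemote` set and the clock demo types are not from the prototype) and again uses a local object in place of an RMI stub. Its test shows both sides of the trade-off: the live property reflects a server-side change, while the cached one silently goes stale.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical: snapshot every property except those that must stay live.
final class SelectiveSnapshot {
    private SelectiveSnapshot() {}

    static <T> T of(Class<T> iface, T target, Set<String> alwaysRemote) {
        Map<String, Object> cache = new HashMap<>();
        for (Method m : iface.getMethods()) {
            boolean property = m.getParameterCount() == 0 && m.getReturnType() != void.class;
            if (property && !alwaysRemote.contains(m.getName())) {
                try {
                    cache.put(m.getName(), m.invoke(target));
                } catch (ReflectiveOperationException e) {
                    throw new RuntimeException("Could not read " + m.getName(), e);
                }
            }
        }
        InvocationHandler handler = (proxy, method, args) -> {
            if ((args == null || args.length == 0) && cache.containsKey(method.getName())) {
                return cache.get(method.getName()); // fast, but possibly stale
            }
            try {
                return method.invoke(target, args); // always current
            } catch (InvocationTargetException e) {
                throw e.getCause();
            }
        };
        return iface.cast(Proxy.newProxyInstance(iface.getClassLoader(), new Class<?>[] { iface }, handler));
    }
}

// Demo types: a clock whose zone rarely changes but whose tick count always does.
interface ServerClock {
    String getZone();
    long getTicks();
}

final class MutableClock implements ServerClock {
    String zone = "UTC";
    long ticks;
    public String getZone() { return zone; }
    public long getTicks() { return ticks; }
}
```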

Generalizing, we could say that the business requirements of an object affect how efficiently it can be accessed remotely. In many ways this is an extension of basic data compression theory: the more a compressor knows about the model of the data it is compressing, the better it can compress that data. Likewise, the more the optimization layer knows about the sort of conversation the objects are having, the better it can optimize that conversation.

This observation is especially pertinent when considering sets of objects working together. For example, consider a set of Hibernate domain objects loaded from a Hibernate session. A naive approach to distributing such objects might be to apply a simple wrapper like the one demonstrated above. That's fine for reading the properties of the domain objects, but consider what happens when the client starts setting properties on them: every property modification results in a remote call. Yet we know that the property values don't really need to be set on the remote objects until the transaction is committed. Consequently, we could probably get away with bundling all the changes together at the commit point and making a single remote call. Unfortunately, the framework I've proposed isn't yet smart enough to figure that sort of thing out; we'll talk a little more about this later.
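Commit-time bundling could look something like the following. This is purely a hypothetical sketch of a direction the prototype does not take: `UnitOfWork`, `BatchEndpoint` and `Change` are invented names, setters are recognised by a simple naming convention, and the "remote" endpoint is a local object that counts its round trips. Setter calls on the tracked proxy are queued, and commit() ships them all in a single call.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// One recorded setter call. Method is not Serializable, so a real remote version
// would send the method name and parameter types instead.
final class Change {
    final Method method;
    final Object[] args;

    Change(Method method, Object[] args) {
        this.method = method;
        this.args = args;
    }
}

// Hypothetical server-side endpoint: applies a whole batch per (remote) call.
final class BatchEndpoint {
    private final Object target;
    int roundTrips;

    BatchEndpoint(Object target) { this.target = target; }

    void applyAll(List<Change> changes) {
        roundTrips++;
        for (Change c : changes) {
            try {
                c.method.invoke(target, c.args);
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Could not apply " + c.method.getName(), e);
            }
        }
    }
}

// Hypothetical client-side unit of work: queues setters, forwards everything else.
final class UnitOfWork {
    private final List<Change> pending = new ArrayList<>();
    private final BatchEndpoint endpoint;

    UnitOfWork(BatchEndpoint endpoint) { this.endpoint = endpoint; }

    <T> T track(Class<T> iface, T reader) {
        Object proxy = Proxy.newProxyInstance(iface.getClassLoader(), new Class<?>[] { iface },
                (p, method, args) -> {
                    boolean setter = method.getName().startsWith("set")
                            && method.getParameterCount() == 1
                            && method.getReturnType() == void.class;
                    if (setter) {
                        pending.add(new Change(method, args)); // deferred until commit()
                        return null;
                    }
                    try {
                        return method.invoke(reader, args);
                    } catch (InvocationTargetException e) {
                        throw e.getCause();
                    }
                });
        return iface.cast(proxy);
    }

    void commit() {
        if (!pending.isEmpty()) {
            endpoint.applyAll(new ArrayList<>(pending)); // one round trip for the batch
            pending.clear();
        }
    }
}

// Demo types.
interface Person {
    String getName();
    void setName(String name);
    void setAge(int age);
}

final class SimplePerson implements Person {
    String name = "Ann";
    int age;
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public void setAge(int age) { this.age = age; }
}
```

In this sketch getName() still reads through to the original object, so the client does not see its own uncommitted writes; a fuller version would have to decide how pending changes interact with cached values.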

My key point is that, although I accept that the remote optimization layer does need to be aware of what the business-logic layer is doing, I don't see why the converse should be true. Why should the business-logic layer be aware of what the remoting layer is doing? Should it have to worry about such things, at least initially? If performance issues become apparent, then perhaps the remoting layer will have to become aware of more business rules, or some business rules will even have to be shifted into the remoting layer.

Where to from here?

The intention of this demonstration has not been to provide a comprehensive solution to the problem of transparent, optimized distributed object communication. Rather, the purpose has been to show that these things are possible, and to serve as a starting point for further discussion about where the idea can be taken.

Nor do I believe that the implementation is as simple as it could be. Spring's Remoting framework could probably be used to reduce the amount of RMI-related code, and an AOP framework (for example Spring AOP) could possibly reduce the amount of proxy-related code. I avoided such frameworks initially to make the example easier to understand.

Here are some of the more obvious improvements that I can see, listed in rough order of importance:

  1. Collections: This solution doesn't really address how to deal with collections. Possibly it could take a similar approach to Hibernate, which uses its own collection classes that transparently handle issues such as lazy loading.
  2. Limiting Greedy Invocation: As discussed earlier, this example recursively invokes everything it can on the server side. This greedy approach could rapidly get out of control and result in most of your object tree being invoked, serialized and passed across a remote interface in one hit. Whilst that's great from the perspective of reducing remote calls, it raises the question of whether you have the bandwidth to pass it efficiently across a network, or, more importantly, whether you would even want your whole tree to be invoked immediately. To get around this, the framework could be made configurable, so that you can specify when and where greedy invocation is used and how many levels down your object tree it should go. This configuration could be kept in an XML file.
  3. XML Descriptors: The decision to evaluate all methods that return values but take no arguments was a largely arbitrary one, and more advanced schemes could be introduced. For example, the wrapper could use an XML configuration file that specifies which methods are to be invoked up front and which aren't.
  4. Exception Transport: At the moment, the framework doesn't deal very well with exceptions that occur when getting property values on the server side; it just wraps them in a RuntimeException and throws them. This could confuse a developer who can't understand why a method they haven't called is being invoked anyway. One way around this might be to store and transport any exceptions that occur, then throw them when the client calls the method.
  5. Use an Existing Caching Framework: There are already many Java caching frameworks. I wonder if, instead of reinventing that particular wheel, existing frameworks could be leveraged in this context.
  6. Different remoting frameworks: This solution is RMI-specific, but it needn't be. The RMI part could be abstracted out so that any transport framework can be used, for example CORBA or XML-RPC.
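Improvements 2 and 4 can be combined in one sketch: a depth limit stops greedy invocation after a configurable number of levels, and an exception thrown by a getter during harvesting is stored and rethrown only when the client actually calls that method. Everything here is hypothetical (`BoundedSnapshot`, `maxDepth`, the private `Failure` holder and the demo types), a plain int stands in for the XML configuration suggested above, and calls are local rather than over RMI.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.rmi.Remote;
import java.util.HashMap;
import java.util.Map;

// Hypothetical: greedy snapshot with a depth limit and deferred exceptions.
final class BoundedSnapshot {
    private BoundedSnapshot() {}

    // Stored in place of a value whose getter threw during harvesting.
    private static final class Failure {
        final Throwable cause;
        Failure(Throwable cause) { this.cause = cause; }
    }

    static <T> T of(Class<T> iface, T target, int maxDepth) {
        return iface.cast(snapshot(iface, target, maxDepth));
    }

    private static Object snapshot(Class<?> iface, Object target, int depth) {
        Map<String, Object> cache = new HashMap<>();
        if (depth > 0) { // at depth 0 nothing is harvested and every call is forwarded
            for (Method m : iface.getMethods()) {
                if (m.getParameterCount() != 0 || m.getReturnType() == void.class) {
                    continue;
                }
                Object value;
                try {
                    value = m.invoke(target);
                    Class<?> type = m.getReturnType();
                    if (value != null && type.isInterface() && Remote.class.isAssignableFrom(type)) {
                        value = snapshot(type, value, depth - 1); // one level deeper
                    }
                } catch (InvocationTargetException e) {
                    value = new Failure(e.getCause()); // rethrown if the client calls it
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException(e);
                }
                cache.put(m.getName(), value);
            }
        }
        InvocationHandler handler = (proxy, method, args) -> {
            if ((args == null || args.length == 0) && cache.containsKey(method.getName())) {
                Object cached = cache.get(method.getName());
                if (cached instanceof Failure) {
                    throw ((Failure) cached).cause;
                }
                return cached;
            }
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause();
            }
        };
        return Proxy.newProxyInstance(iface.getClassLoader(), new Class<?>[] { iface }, handler);
    }
}

// Demo types (never exported, so no RemoteException declarations).
interface Address extends Remote {
    String getStreet();
}

interface Customer extends Remote {
    String getName();
    String getNickname();
    Address getAddress();
}

final class DemoAddress implements Address {
    int reads;
    public String getStreet() { reads++; return "1 Main St"; }
}

final class DemoCustomer implements Customer {
    final DemoAddress address = new DemoAddress();
    public String getName() { return "Ann"; }
    public String getNickname() { throw new IllegalStateException("no nickname on file"); }
    public Address getAddress() { return address; }
}
```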

Hibernate

In previous sections, I've referred to Hibernate or to concepts that Hibernate supports. This isn't coincidental: Hibernate has already dealt with many of the transparent-caching issues I've come across (although, to be fair, Hibernate delegates most second-level caching issues to a configurable caching framework). For example, when I presented my ideas to a colleague, he said they reminded him very much of look-ahead caching in databases, a concept that Hibernate supports.

Furthermore, the very notion of transactions could be a very useful one when it comes to optimizing distributed object interactions. As I mentioned earlier, if we know that a group of operations is occurring within a particular transaction, doesn't that give us some scope to bundle that unit-of-work together in a single remote call? I think there would be some benefit in trying to take this concept and apply it to the problem of efficient remote object access.

Conclusion

The problem of efficient distributed object access is extremely old, so I was a little surprised that a brief survey turned up nothing like what I've just presented. It is widely accepted that blindly treating remote objects in the same manner as local objects is a recipe for disaster, and not just in terms of performance (see http://research.sun.com/techrep/1994/smli_tr-94-29.pdf for further information). However, as a result, many people use value objects and throw the benefits of object-oriented programming out the window. Using value objects is not in itself a bad thing, but when they introduce a layer that is labour-intensive to modify, or pervade an entire architecture, I start to wonder whether we've gone too far.

For client-server applications with complex clients and servers that don't need to be clustered, is there a better way? In this article I've presented a prototype of an alternative approach that could take a lot of the effort out of efficient remote access to objects. I've also outlined a number of directions in which I think it could go. I'm interested in what you think of it and whether you've seen anything like it before. Feel free to email me.