An Opinion? Well, if you ask...: 2007

Monday, December 17, 2007

Premature Conclusions with Services and REST

Damn, I wish we were at a white board. Oh well.

JJ, you've drawn a lot of conclusions from talking with a few people in the REST community. I think you've gotten ahead of yourself.

To defend the "REST community": What we are talking about isn't where most of the value is. Not yet anyway. Maybe someday this issue will really be an issue, but right now just GETing info with declarative representation data is so unusual that we're on the fringe.

Really, I believe this holds on the internet at large and in many organizations as well (enterprisey doesn't even get funny when sharing a database is so common). How much value does a definition language really have? Well, some. Just like I appreciate static languages for documentation and _some_ checking, having a WSDL or something is useful. None of the things being thrown around for "interface definition language" are much more useful than being able to organize documentation, generate code (a bad idea), and create pictures (sometimes useful for humans). That last point, the human part, is where you started this conversation... Let's not pretend that WSDL allows machines to process things for us.

To your recent post. I don't have much time tonight, but here is my reaction.

Shared Understanding.
This isn't actually denied by REST. It's just relegated to the media type.

The "contract" that is specific to a particular domain in a RESTful system is buried in the Representation. That means a) we all collectively don't know much about it, b) every system can use the standard VERBS to explore and partially integrate. The first point means we need to have this very conversation, but the value of that struggle is more common ground between all services.

Result Sets aren't Resources
Huh? Why not? I'll need to read up on the posts, but everything interesting is a resource. I don't understand why you conclude this.

Some of your recent posts have indicated what I think may be a confusion regarding the "uniform interface" and resources. Not every resource must support and expose each method. Part of the discovery of REST is learning (at runtime) what methods are available for each resource. Result sets don't need to expose PUT.
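One way a client can learn this at runtime is the standard HTTP Allow header, which a server can send in reply to an OPTIONS request and which accompanies a 405 response. Here is a minimal sketch of the client side. The header values are made up for illustration, and the class and method names are mine, not from any library:

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class AllowedMethods {

    /** Splits an Allow header value such as "GET, HEAD, OPTIONS" into method names. */
    public static Set<String> parseAllow(String allowHeader) {
        Set<String> methods = new LinkedHashSet<String>();
        if (allowHeader == null) {
            return methods; // no header: nothing is known to be allowed
        }
        for (String token : allowHeader.split(",")) {
            String method = token.trim(); // method names are case-sensitive, so no case folding
            if (!method.isEmpty()) {
                methods.add(method);
            }
        }
        return methods;
    }

    /** True when the given method appears in the Allow header value. */
    public static boolean supports(String allowHeader, String method) {
        return parseAllow(allowHeader).contains(method);
    }

    public static void main(String[] args) {
        // Hypothetical header values for two different resources.
        String resultSetAllow = "GET, HEAD, OPTIONS";
        String jobAppAllow = "GET, PUT, DELETE, OPTIONS";

        System.out.println("result set supports PUT? " + supports(resultSetAllow, "PUT"));
        System.out.println("job app supports PUT?    " + supports(jobAppAllow, "PUT"));
    }
}
```

The result set is still a perfectly good resource; it just advertises a smaller set of methods than the job application does.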

How many people are in this discussion?
Not many, I'm afraid. You're assuming that people won't discuss these issues with you, but it's probably that it doesn't matter to most of the RESTful systems out there.

As an example: the Amazon service exposed this weekend. It is people (not machine processes) that are writing the wrapper layers around the new API. It almost doesn't matter what format they exposed it in; people need to spend thought understanding it and wrapping it. Then other people need to consume the wrappers to write applications.

Had Amazon used some magically fantastic definition language it wouldn't change but a part of the total human/computer processing cost and value proposition. Again, I think there is value in understanding and exposing that fantastic definition language. I'm not assuming it will magically compose my systems. (I'm not necessarily claiming you make that declaration either, but the value you place on the formality seems too strong. I will read your book on composition to really understand what you are saying.)

REST and actions
There is something critically important about the distinction between "action interfaces" and "document exchange". You took Subbu to task for trying to remove actions, but I think there is a truly subtle reason that REST motivates that thinking.

I'm not sure I can express it well, but this is my intuitive view:

contract negotiation is an (endless?) cycle of document exchange
reserving a _good_ hotel room is a cycle of question/answer...
bartering is a cycle of bid exchange
resolving a speeding ticket is ...

what these real ways of getting things done tell me is that an interface contract doesn't model the real world.

REST models "action" as one or more transfers of document representations. That is more like the world than interface actions.

REST and versioning.
Really bad at versioning? Seriously? If you'd said "efficiency", or "tool support", or "machine clients" I'd be right there saying "Yeah, maybe this would improve that..."

Versioning is hard because something wants to change - and the rest of the system isn't yet ready to change as well. REST has from the beginning been about supporting that change (across organizations, and between the client and server). The very reason that interface definition languages are so hard to reconcile with REST is because of the radically different view on how to support evolution.

(I'm posting this without all the links...)

Wednesday, December 12, 2007

Shared Understanding and/or Evolvability

(This is belated and too short. All you fast-typing bloggers out there can just be patient :) Sorry for the delay!)

How do distributed systems both cooperate and evolve?

That is a subtly different question than JJ asks:

when no human is in the loop, you need a shared understanding between the resource consumer and the resource (provider)

The only viable choices JJ then describes (WSDL or WADL) do indeed provide a partial solution to coding shared understanding, but not evolution.

The short (flip?) answer to my leading question is to create a new media type that describes the semantics necessary for both consumers and providers to communicate.

In the case of exposing a Job Application Service as a RESTful provider that could mean the following:

for a human provide an HTML response representation with forms for review, cancel, submit interview.
for a machine provide an XML or RDF (or something) response representation.

That machine representation must have a lot of things going for it. It must provide just enough shared information to do useful things, but not prevent service evolution. In particular, the response must:

provide links/conformance to some shared schema type(s) (to share semantics)
be extensible (in the "does not understand" sense)
the shared schema can't pre-define URIs
the shared schema can't pre-constrain all of the transition paths

That last one is really fuzzy, but I'm trying to express the idea with my "signposts" metaphor. Getting something done (i.e. changing the state of something) isn't always a single step, and how many steps have to be followed isn't a very stable property. In the case of job applications, getting from "offered" to "accepted" might take a few loops in the 1.1 release of the service, or maybe the "review" state gets split into 2 steps. What happens then?

The current conception of "shared understanding" is "shared interface". In the Job Application example that means the client service is encoded to expect after "Submit Review" that the service moved the job app to "reviewed". If the client service instead was coded to have a pair of (current job app data, desired state) when the service changed to a 2-stage review process the clients would likely continue to work:

Client: Get job app 123
Service: job app 123, submitted, "Submit Review", "Cancel"
Client: "Submit Review" for job app 123
Service: job app 123, review1, "Submit Review", "Cancel", "Reject"
Client: "Submit Review" for job app 123
Service: job app 123, reviewed, "Submit Interview", "Cancel", "Reject"
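Here is a minimal sketch of that conversation from the client's side. Everything in it is hypothetical (the Representation type, the JobAppService interface, and the in-memory two-stage service stand in for real HTTP exchanges), but it shows a client that holds only a desired state and an action name, and keeps choosing that action for as long as the service offers it:

```java
import java.util.Arrays;
import java.util.List;

public class GoalDrivenClient {

    /** What the service hands back: the current state plus the actions offered right now. */
    public static final class Representation {
        public final String state;
        public final List<String> actions;

        public Representation(String state, String... actions) {
            this.state = state;
            this.actions = Arrays.asList(actions);
        }
    }

    /** Hypothetical service interface; a real one would be HTTP requests and responses. */
    public interface JobAppService {
        Representation get(int id);
        Representation perform(int id, String action);
    }

    /** In-memory stand-in for the service after it changed to a 2-stage review. */
    public static final class TwoStageReviewService implements JobAppService {
        private String state = "submitted";

        public Representation get(int id) {
            return describe();
        }

        public Representation perform(int id, String action) {
            if ("Submit Review".equals(action)) {
                if (state.equals("submitted")) {
                    state = "review1";
                } else if (state.equals("review1")) {
                    state = "reviewed";
                }
            }
            return describe();
        }

        private Representation describe() {
            if (state.equals("submitted")) {
                return new Representation(state, "Submit Review", "Cancel");
            }
            if (state.equals("review1")) {
                return new Representation(state, "Submit Review", "Cancel", "Reject");
            }
            return new Representation(state, "Submit Interview", "Cancel", "Reject");
        }
    }

    /**
     * Repeats the action while the service offers it, until the desired state is reached.
     * Returns the number of steps taken, or -1 if the goal could not be reached.
     */
    public static int driveTo(JobAppService service, int id, String action,
                              String desiredState, int maxSteps) {
        Representation current = service.get(id);
        int steps = 0;
        while (!current.state.equals(desiredState)) {
            if (steps >= maxSteps || !current.actions.contains(action)) {
                return -1; // the signpost we were following is gone, or we looped too long
            }
            current = service.perform(id, action);
            steps++;
        }
        return steps;
    }

    public static void main(String[] args) {
        int steps = driveTo(new TwoStageReviewService(), 123, "Submit Review", "reviewed", 10);
        System.out.println("Reached 'reviewed' in " + steps + " step(s)");
    }
}
```

The client never encodes how many review stages exist. A one-stage service would finish in one step and this two-stage one in two, with no client change.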

How to translate an imperative, static SOA model into a conversational document interchange is not really well understood yet (certainly not by me). I think someday it will be, but right now there are just the quotes from the REST thesis I commented on, a reference to "data reactive programming" from Roy Fielding, and Mimosa, which I just found.

Just need to make sure of something about the Job Application Service example: is this service the provider or consumer of the lifecycle of a job app? It clearly has an internal state machine for job apps, but what other machine process does that get shared with?

[1] Some discussion (still searching for link...) hinted that WADL could be simply returned like an HTML Form, and not used to statically generate code. I think that avoids the coupling of code-gen, but I'm not sure how it solves the problem of shared understanding either.

Thursday, November 29, 2007

Passing and Returning Nulls

Several blog posts about handling null values caught my attention.

The pair of posts from Marty Alchin on "Returning None is Evil" and "Except the Unexpected" offer these opinions:

most methods can return null
Java has lots of APIs that silently return null (java.util.Map.get(Object key))
returned nulls are annoying and hard to debug
exceptions should be preferred when a non-null value isn't available
always consider when/how nulls should(n't) be used

Cedric Otaku in "It's okay to return null" chimes in with these valid points:

don't throw exceptions unless something really is exceptional
use the Null Object pattern if you need it

Neither of those suggestions (exceptions or Null Object) really works out well for my taste. I really don't like these bad choices:

Write clear and direct code -- that neither shows where nulls could be hiding nor handles them well when they occur
Write lots of verbose if/then/else and try/catch blocks to deal with nulls everywhere

So what to do? Borrow a construct from somewhere else of course!

In the Scala programming language I was introduced to the Option class. I'm pretty sure that it comes from Haskell and other places, but I'm not really researching it.

An Option is like a list of zero or one elements; either a Some with a value, or a None.

Functions that return or receive "optional" things use an Option wrapper to both document that fact and promote cleaner code constructs. Here is an example.

Normal Java code with an override variable:


String overrideMessage;
final String defaultMessage="Howdy";

public String sayHello(String name) {
  String s=null;
  if (overrideMessage!=null)
      s = overrideMessage;
  else
      s = defaultMessage;

  return s+" "+name;
}

Yuck! Now let's use Option to clean up that method:


Option<String> overrideMessage=none();
final String defaultMessage="Howdy";

public String sayHello(String name) {
  return overrideMessage.or(defaultMessage)+" "+name;
}

Much nicer. The Java version can accept alternate values (with the or() method) but not code blocks to execute like the Scala version can (unless Java gets closures...). The Scala version can participate in list comprehensions and all sorts of other things, but I'm not talking about those things now.

Here is the full version of my Option port to Java:


public class Options {

  /**
   * An Option can be either full (Some<T>) or empty (None).
   * Instead of passing/returning nulls use an Option
   */
  public abstract static class Option<T> {
      public final boolean some;
      public final boolean none;
   
      private Option(boolean some) {
          this.some = some;
          this.none = !some;
      }
   
      abstract public T val();
   
      public T or(T defaultResult) {
          if (some)
              return val();
          else
              return defaultResult;
      }
  }

  /**
   * Placeholder for empty
   */
  public static class None<T> extends Option<T> {
      private None() {
          super(false);
      }
      public T val() {
          throw new NullPointerException("Can't dereference a None value");
      }
  }

  /**
   * A Holder for some value (never null)
   */
  public static class Some<T> extends Option<T> {
      private final T val;
      private Some(T val) {
          super(true);
          this.val = val;
      }
      public T val() {
          return val;
      }
  }

  /**
   * Use this when not sure if value is present or null
   */
  public static <T> Option<T> option(T value) {
      if (value==null)
          return none();
      else
          return some(value);
  }

  /**
   * Use this to wrap a non-null value
   */
  public static <T> Some<T> some(T value) {
      return new Some<T>(value);
  }

  @SuppressWarnings("unchecked")
  private final static None NONE=new None();

  /**
   * Use this when you don't have a value
   */
  @SuppressWarnings("unchecked")
  public static <T> Option<T> none() {
      return NONE;
  } 
}
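Not part of my port, but worth noting for anyone on Java 8 or later: the standard library's java.util.Optional covers the same ground. Here is a minimal sketch of the sayHello example rewritten against it, plus a wrapper for the Map.get case mentioned above. The class name and the lookup helper are made up for illustration:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class OptionalGreeting {

    private Optional<String> overrideMessage = Optional.empty();
    private final String defaultMessage = "Howdy";

    /** A null argument clears the override, mirroring option(value) in the port above. */
    public void setOverrideMessage(String message) {
        this.overrideMessage = Optional.ofNullable(message);
    }

    public String sayHello(String name) {
        // orElse plays the role of or(defaultResult)
        return overrideMessage.orElse(defaultMessage) + " " + name;
    }

    /** Wraps Map.get so the "might be missing" case shows up in the return type. */
    public static Optional<String> lookup(Map<String, String> map, String key) {
        return Optional.ofNullable(map.get(key));
    }

    public static void main(String[] args) {
        OptionalGreeting greeting = new OptionalGreeting();
        System.out.println(greeting.sayHello("Bob"));

        greeting.setOverrideMessage("Hi");
        System.out.println(greeting.sayHello("Bob"));

        Map<String, String> nicknames = new HashMap<String, String>();
        nicknames.put("Robert", "Bob");
        System.out.println(lookup(nicknames, "Robert").orElse("(no nickname)"));
        System.out.println(lookup(nicknames, "Alice").orElse("(no nickname)"));
    }
}
```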

Tuesday, November 27, 2007

Using a lazy proxy to avoid Spring dependency cycles

Ever gotten a dependency cycle in your Spring configuration?

How about using the Hibernate Annotations and @Configurable together?

Hibernate will instantiate a single instance of each Entity class and ask for the default key value.
But when the first instance is instantiated @Configurable will try to Spring configure that instance.
If that Entity depends on anything that leads back to Hibernate... Bang!

I like this quote from Rod Johnson: "Ouch: this is nastier." That about sums it up. Oh, and that quote is from 2004.

I created a LazyProxyFactoryBean that doesn't break the cycle, but at least resolves it lazily. Combining a Spring FactoryBean with a dynamic proxy does the trick. Hope this can help someone else out there.

Here is a failing example with cycles:


<bean id="serviceA" class="com.ServiceA">
<property name="serviceB" ref="serviceB"/>
</bean>

<bean id="serviceB" class="com.ServiceB">
<property name="serviceA" ref="serviceA"/>
</bean>

Here is a modified version with a lazy proxy inserted (notice that the proxy's property is a value holding the target bean's name, not a ref):


<bean id="serviceA" class="com.ServiceA">
<property name="serviceB" ref="serviceB"/>
</bean>

<bean id="serviceB" class="com.ServiceB">
<property name="serviceA" ref="serviceAProxy"/>
</bean>

<bean id="serviceAProxy" class="com.LazyProxyFactoryBean">
<property name="beanName" value="serviceA"/>
</bean>

Finally, here is the source code:


import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

import org.springframework.beans.BeansException;
import org.springframework.beans.factory.BeanFactory;
import org.springframework.beans.factory.BeanFactoryAware;
import org.springframework.beans.factory.FactoryBean;
import org.springframework.beans.factory.FactoryBeanNotInitializedException;

public class LazyProxyFactoryBean implements FactoryBean, BeanFactoryAware {

private String beanName;
private BeanFactory beanFactory;
private Object proxyObject;
private Object realObject;

public LazyProxyFactoryBean() {
}

public Object getRealObject() throws Exception {
 if (this.realObject == null)
  this.realObject = this.beanFactory.getBean(this.beanName);

 return this.realObject;
}

public Object getObject() throws Exception {
 Class[] ifcs = getProxyInterfaces();
 if (ifcs == null) {
  throw new FactoryBeanNotInitializedException(
    getClass().getName() + " does not support circular references");
 }
 if (this.proxyObject == null) {
  this.proxyObject = Proxy.newProxyInstance(getClass().getClassLoader(), ifcs,
   new InvocationHandler() {
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
     try {
      return method.invoke(getRealObject(), args);
     }
     catch (InvocationTargetException ex) {
      throw ex.getTargetException();
     }
    }
   });
 }
 return this.proxyObject;
}

public Class getObjectType() {
 return beanFactory.getType(beanName);
}

public boolean isSingleton() {
 return false;
}

public void setBeanFactory(BeanFactory beanFactory) throws BeansException {
 this.beanFactory = beanFactory;
}

protected Class[] getProxyInterfaces() {
 Class type = getObjectType();

 if (type != null && type.isInterface()) {
  return new Class[] {type};
 } else if (type != null) {
  return type.getInterfaces();
 } else {
  return null;
 }
}

public void setBeanName(String beanName) {
 this.beanName = beanName;
}
}
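To see the core trick without a Spring container, here is a minimal, self-contained sketch of the same idea using only java.lang.reflect.Proxy. The Greeter interface, the lookups counter, and the Callable standing in for beanFactory.getBean(beanName) are all made up for illustration:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.concurrent.Callable;

public class LazyProxyDemo {

    /** Hypothetical service interface, standing in for ServiceA. */
    public interface Greeter {
        String greet(String name);
    }

    /** Counts how often the real object was looked up. */
    public static int lookups = 0;

    /** Returns a proxy that only calls the lookup the first time a method is invoked. */
    @SuppressWarnings("unchecked")
    public static <T> T lazyProxy(final Class<T> ifc, final Callable<? extends T> lookup) {
        return (T) Proxy.newProxyInstance(ifc.getClassLoader(), new Class<?>[] {ifc},
            new InvocationHandler() {
                private T real;

                public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
                    if (real == null) {
                        real = lookup.call(); // deferred until first use
                    }
                    try {
                        return method.invoke(real, args);
                    } catch (InvocationTargetException ex) {
                        throw ex.getTargetException(); // unwrap so callers see the real exception
                    }
                }
            });
    }

    /** Builds a lazily resolved Greeter whose lookup bumps the counter. */
    public static Greeter countingGreeter() {
        return lazyProxy(Greeter.class, new Callable<Greeter>() {
            public Greeter call() {
                lookups++;
                return new Greeter() {
                    public String greet(String name) {
                        return "Howdy " + name;
                    }
                };
            }
        });
    }

    public static void main(String[] args) {
        Greeter greeter = countingGreeter();
        System.out.println("lookups before first call: " + lookups);
        System.out.println(greeter.greet("Spring"));
        System.out.println("lookups after first call: " + lookups);
    }
}
```

Handing out the proxy costs nothing. The real object is only resolved on the first method call, and by then both beans in the cycle have been fully constructed.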

Notes from NetObjectives Lean Webinar

These are notes from a NetObjectives Webinar on "Using Lean Thinking to Align People, Process and Practices".

This was a really succinct presentation that still covered a lot of ground. If you register it is still available for free viewing!

Lean Software Development:

Focuses on the ability to add value quickly now, while improving the ability to add value quickly in the future.

Edwards Deming:

Plan-Do-Study-Act (PDSA)
System of Profound Knowledge

Gary Hamel. "Management Innovation" in Harvard Business Review. Feb 2006

Only after American carmakers had exhausted every other explanation for Toyota's success - an undervalued yen, a docile workforce, Japanese culture, superior automation - were they finally able to admit that Toyota's real advantage was its ability to harness the intellect of 'ordinary' employees.

Lean - A New Paradigm

Engaged, thinking people

Referenced "Software by Numbers" (better graphics...)

Tuesday, November 20, 2007

Just In: REST can't handle state!!

Or, maybe Jean-Jacques Dubray, Ph.D. doesn't understand hypermedia. :(

In his post on the states and transitions of a resource he claims that REST has no way to let a client know what state transitions are available for a given resource....

Um, what is hypermedia?

Distributed hypermedia provides a uniform means of accessing services through the embedding of action controls within the presentation of information retrieved from remote sites.

This quote is just the introduction to hypermedia and REST, but we already have an answer.

Action controls are embedded. Action controls are embedded.

Embedding the action controls means

The client doesn't need to have generated (and hard-coded) client stubs.
The server can change its rules and expand available states as needed.
The client does need to process a generically extensible content type.

That last point isn't free or trivial, but it enables the first two.

Perhaps more directly on point is this quote from 5.3.3 Data View:

The model application is therefore an engine that moves from one state to the next by examining and choosing from among the alternative state transitions in the current set of representations.

Jean-Jacques' answer to how to share types, constraints, correct states, and transitions is, not surprisingly, a schema and a business interface. Let's take those four goals one at a time:

Types: definitely covered by IDL (until the system changes, then you need to tell everyone to re-compile...)
Constraints: Not even covered a little by today's SOA technologies. Until there is an OCL or Catalysis (or Z or ...) model that gets compiled with WSDL, the best a schema and business interface will have for constraints is a few lines of prose... (in )
States: WSDL is not much help, maybe you can infer the states from the operation names? SSDL I think actually does cover this...
Transitions: Same as states; how do we deduce which operations are transitions?

I really can understand the challenge and barrier to hypermedia state control with generic clients. Machine processing and interoperability in this context is unfamiliar: an "engine ... choosing from among the alternative state transitions" is very declarative and makes a historically imperative programmer (me!) feel out of control.

Static class diagrams (WSDL) and code-generation are so much more in my control and experience. If only they worked for distributed systems without so many negatives...

The complete ignorance of hypermedia and FUD in the blog posting compelled me to respond and put my $.02 in.

Monday, November 19, 2007

Glassbox 2.0 RC1 is available!

The 2.0 release candidate for Glassbox Troubleshooter has been released!

Glassbox is an open source Java monitor and troubleshooting application:

Easy to deploy to many app servers (deploy the war, click "Install", restart server)
Lightweight monitoring: only instruments key layers for efficiency
Extensible monitors (in XML or code) to domain-specific operations for better reporting

For more information I would recommend browsing the slides I presented at the Dallas JUG, or watching Ron Bodkin's Google Tech Talk on Glassbox.

How am I related to this? I'm a committer for the project and have been most responsible for the automated installation support.

Thursday, September 13, 2007

Presenting at New England Software Symposium

I'll be in Boston this weekend presenting at the NFJS New England Software Symposium

I'm presenting on

Glassbox
Adding Behavior to Java Annotations
REST: The basics, and not so basic

Glassbox Presentation at Dallas JUG (yesterday)

I gave a presentation on Glassbox yesterday at the Dallas Java users group (JavaMUG)

Here's a link directly to the presentation entry: Dallas JUG Glassbox presentation

At some point a PDF of the slides will be posted there. These slides are a reconsolidated and personalized version of what Ron Bodkin gave in the Google Video Glassbox presentation.

Glassbox is an Open-Source Automated Java Troubleshooter and Monitor. I'll post a link to the slides when they're available.

Sunday, August 12, 2007

Train-Wreck Management

Wow, this is a well organized essay by Mary Poppendieck.

Train-Wreck Management

The essay explains a bit about how a train accident in 1841 led to the Prussian army becoming a model for today's organization chart!

Here's a quote that might make you look around your environment and feel an eerie sense of historical dread...

Problems are caused by people who don't do their job well, so finding someone to blame is the first step to correcting problems.

I recommend anything on the website, and the books they've authored as well.

Friday, August 10, 2007

What I'm Reading

I try not to post my own blog entries that just point to other blog entries - unless I have something to say that adds to the original.

On my blog page I have a reading list widget, but here is a link in case anyone is interested and hadn't noticed it before:

What I'm Reading

Friday, August 3, 2007

SOA Integration: RPC and Constraints

DevHawk responded to my hard work post. Most important responses first.

Beer.

Let's have a beer together someday and you can make up your own mind! I try to form my own opinion on a topic, but I don't take my view so seriously that other people must be wrong. The back-and-forth of ideas during a technical discussion is a fun way to spend a few hours.

DevHawk said:

I might be going out on a limb here, I'll bet the core of John's problem with SOA is how toolkits like WCF all but force you to build RPC style services that can easily be modeled as method calls. That's certainly one of my problems with SOA.

You're right, that is something that causes me grief. When I see a WSDL with endless request/response operations I first think:

That's going to cause some scalability/extensibility/versioning problems.
It would be easier to build this design in CORBA[1]

Also, your quote from Tim Ewald was spot on. 10 years of SOAP and most systems are just strongly coupled RPC.

Also, I agree that all the tools (and examples and sample code and ...) nearly universally push a view of Web Services that is really just RPC.

[Note: I just switched from SOA to SOAP to WS. I'm trying to formulate another post that will articulate what I think distinguishes those from each other. Right now I'm just rambling...]

There is one other thing about SOA that drives me bonkers. I'm hooked on architecture by constraint - limit the system in certain key areas to promote certain use and benefits. The four tenets of SOA don't do it for me at all.

Tenet 1: Boundaries are Explicit
(Sure, but isn't everything? Ok, so SQL based integration strategies don't fall into this category. How do I build a good boundary? What will version better? What has a lower barrier to mashup/integration?)

Tenet 2: Services are Autonomous
(Right. This is a great goal, but provides no guidance or boundaries to achieve it.)

Tenet 3: Services share schema and contract, not class
(So do all of my OO programs with interface and classes. What is different from OO design that makes SOA something else?)

Tenet 4: Service compatibility is based upon policy
(This is a good start: the types and scope of policy can shape an architecture. The policies are the constraints in a system. They're not really defined though, just a statement that they should be there.)

Ah, I feel better getting that out.

I asked "Where are the SOA Mashups?" and DevHawk responded:

That's easy! They're inside the firewall where you can't see them! ;)

I'm so glad I procrastinated writing this because now I can just refer to this InfoQ article. Gregor Hohpe's comment is pretty much my definition of mashup.
The project you describe is an integration, but multi-schema, star-integration with a huge effort doesn't fit my definition of mashup.

[1] Yes, I'm serious! I used OmniORB with Python to build systems for about 2 years. I think CORBA with C++ might be more work than most WSDL based systems, but this was a very efficient development environment for strongly typed interfaces.

Wednesday, August 1, 2007

REST, Serendipity, and Hard Work

DevHawk found my comment on the "SOA in the Real World" book. DevHawk writes:

Yeah, I'd rather not have to think about integration before hand either. On the other hand, I want integration that actually works. It sounds like John H. is suggesting here that REST somehow eliminates the need to consider integration up front. It doesn't.

I didn't mean to imply that building RESTful system would lead to magical integration without any hard work. I can see how that came across in my post, and I guess I got the reaction I asked for ;)

Let me try again...

I want to build systems using tools, techniques, designs, and architectures that maximize my investment in effort and work to produce value. If I put one unit of effort into building something, and I get more (or many more) than one unit of value for my effort, then I'm working in a high-leverage environment.

REST uses constraints to encourage integration. If I pay the cost of building a RESTful system (uniform interface, single naming system, resource-based design, representation transfer, ...) then any other system can leverage my system - for at least some minimum degree of use. That is serendipity. This is the fundamental reason that mashups on the web can exist.

Where are the SOA mashups?

I think related to that question, DevHawk further says:

...enterprises aren't interested in unexpected or serendipitous reuse. They want their reuse to be systematic and predictable.

I think we disagree here. Enterprises think they want systematic and predictable systems, but I suspect this is the equivalent of developers wishing distributed object systems worked despite latency and partial failures. I'm not suggesting enterprise systems should set "not predictable" as a goal, but I do think that trying to make things too predictable often leads to fragile systems.

Disclaimer: I've only read the first chapter of the book.

Here are some more quotes.

This one leads into the first chapter:

“SOAs are like snowflakes – no two are alike.”
- David Linthicum
Consultant

I'm assuming it's included not as a supporting opinion, but the first chapter doesn't do much to disprove it! Obviously if this is true then every bit of integration will have to be fought for, bit by little tiny bit. When everything is different there are no high-leverage environments for integration.

This next quote is why I don't feel the book (at least the first chapter) does much to discount the previous quote:

For the purposes of this book, we will define SOA as:
A loosely-coupled architecture designed to meet the business needs of the organization.

That loosely-coupled looks good. Loosely-coupled systems should be easier to integrate (as opposed to highly-coupled systems anyway).

How does an SOA become loosely-coupled? The four tenets of services or the summarized SOA process (expose, compose, consume) listed in the first chapter don't tell me anything yet.

Building systems is hard, building RESTful systems is also hard. The question of serendipity is this:

If I build an application with constraint A instead of constraint B, will I get more integration and value elsewhere in the total system?

The constraints that define REST, and inspire the Web, have a significant track record for enabling integration. Are they the only constraints that can lead to value? No, absolutely not.

Can some of the constraints of REST be applied to SOA? Absolutely. I think an asynchronous, message-passing architecture with a uniform interface would be astoundingly interesting! I'm not the only one: see MEST, AMQP, and Erlang.

Thursday, July 26, 2007

SlideShare slides on Web App Scalability

These look very interesting, enjoy: Web App Scalability Slides.

MySQL, Digg, Twitter, Vox, LiveJournal, ...

Wednesday, July 25, 2007

Integration Forethought over Afterthought?

DevHawk linked to a post by John Evdemon announcing an e-book from Microsoft on SOA.

From the SOA in the Real World book page:

SOA is an architectural approach to creating systems built from autonomous services. With SOA, integration becomes forethought rather than afterthought. This book introduces a set of architectural capabilities, and explores them in subsequent chapters.

Now, I haven't read the book, so take this with appropriate disclaimers of ignorance.

I, for one, would rather build on an architecture that promotes integration as an afterthought, so I don't have to think about it before hand!!!

This post to rest-discuss by Nick Gall quotes some important thinkers. Here is the relevant section:

Unexpected reuse is the value of the web
Tim Berners-Lee
Two of the goals of REST: independent evolvability and design-for-serendipity
Roy T. Fielding
Engineer for serendipity
Roy T. Fielding

Tuesday, July 10, 2007

Writings on Scalability Topics

I suggest reading these blogs.

Jeff Darcy (Canned Platypus) has a great blog, and also this article on High Performance Server Architecture. The article talks about the performance and scalability implications of

Data copies
Context switches
Memory allocation
Lock contention

Dan Creswell (author of Blitz JavaSpaces) has collected a Distributed Systems Reading List. The list includes references to key papers from Google, MySpace, eBay, Amazon, and also some distributed theory papers.

Monday, July 9, 2007

Google Scalability Conference on Google Video

It looks like the Seattle Conference on Scalability has made it onto Google Video.

Search for the sessions.

Friday, July 6, 2007

REST vs (sort of) SOAP: How to choose?

Summary: REST supports request/response (client/server) interactions best; message passing systems support asynchronous and peer-to-peer communication best.

The first architectural constraint applied to derive REST is client/server. I don't think this can be overstated: everything that follows tunes the architecture to provide an elegant request/response system with fantastic results. Many, many systems are client/server - and those should take advantage of REST (and often HTTP).

Message passing systems involve sending a message (headers + body) to a destination. That may not involve a response message, or may involve a sophisticated set of response messages from various other parties to the originator. HTTP always pairs a request message with a response message; solidifying the client/server interaction.

SOAP (without WSDL and WS-*) is just a message container, and can be used to implement asynchronous and P2P message passing systems. See MEST and SSDL for good examples of using SOAP infrastructure without the WSDL and WS-* cooked right in.

Another useful message passing system I've recently begun studying is AMQP. This standard is designed to provide the building blocks for message passing systems, an efficient binary protocol for low latency operations, and a protocol based interaction (instead of a programming API). Implementations include: Qpid (Java, C++ and more clients) and RabbitMQ (Erlang).

Of course, request/response can be built on message passing:

like most WSDL services that expose RPC
like many JMS systems with replyTo
everything gets shipped over IP (Internet Protocol) anyway, so everything is message based at the bottom

and message passing can be built on client/server:

an HTTP server can be bundled into a client (two inverted client/server relationships form a P2P connection)
an HTTP server can be polled to simulate a peer message send

but in both of these situations it would be better to pick an architecture that actually matches the problem. (If you can)
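
As a rough illustration of the first case (request/response layered on message passing, in the style of a JMS replyTo header), here is a minimal in-memory sketch. It is not JMS: the Message class, its correlationId field, and the ReplyToDemo names are invented for the example, and a BlockingQueue stands in for a real destination.

```java
import java.util.UUID;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical message: two headers (correlationId, replyTo) plus a body.
final class Message {
    final String correlationId;
    final BlockingQueue<Message> replyTo; // null means "no response expected"
    final String body;

    Message(String correlationId, BlockingQueue<Message> replyTo, String body) {
        this.correlationId = correlationId;
        this.replyTo = replyTo;
        this.body = body;
    }
}

final class ReplyToDemo {
    // The receiving peer: takes one message and answers only if a reply destination was supplied.
    static void startEchoService(BlockingQueue<Message> inbox) {
        Thread worker = new Thread(() -> {
            try {
                Message m = inbox.take();
                if (m.replyTo != null) {
                    m.replyTo.put(new Message(m.correlationId, null, "echo: " + m.body));
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.setDaemon(true);
        worker.start();
    }

    // The sending peer: a one-way send followed by a blocking wait on a private reply queue.
    static String request(BlockingQueue<Message> inbox, String body) {
        BlockingQueue<Message> replies = new LinkedBlockingQueue<>();
        String id = UUID.randomUUID().toString();
        try {
            inbox.put(new Message(id, replies, body));
            Message reply = replies.poll(2, TimeUnit.SECONDS);
            if (reply == null || !reply.correlationId.equals(id)) {
                throw new IllegalStateException("no matching reply");
            }
            return reply.body;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for reply", e);
        }
    }
}
```

The pairing of request and response is something the client has to build (correlation id, private reply queue, timeout). HTTP gives that pairing for free, which is the mismatch the paragraph above is pointing at.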

Finally, when working on the public internet, it is often not the case that remote machines can listen for P2P messages (because of firewalls and NAT). STUN is a notable exception, but requires programming on top of UDP (shouldn't there be some library support for this?).

My conclusion: inside the enterprise pick the right architecture, on the internet expose only a client/server RESTful architecture.

Thursday, July 5, 2007

Intuitive Aspects: null checks

I've been a fan and user of AspectJ for many years. In fact my employer, New Aspects of Software, offers expert consulting on AspectJ and AOP systems.

Many excellent books and articles have been published on the topic. The one thing that seems to be missing is examples of Aspects that solve small problems so simply and clearly they can be "intuitively" understood. I'm setting out to start creating a small set of examples to fill that gap.

Problem: null values are annoying to track down.

Solution: The following policy should be applied to my code:

When a null argument is passed into a method, throw a NullPointerException.
When a null is returned from a method, throw a NullPointerException.

Java Implementation:
Simple and clear, but also verbose, error prone, and annoying.

public Item findItem(Service service, ItemPattern itemPattern) {
    if (service == null || itemPattern == null)
        throw new NullPointerException("null argument!");

    Item result = null;

    if (service.hasItem(itemPattern))
        result = service.getSingleItem(itemPattern);
    else
        result = service.getDefaultItem(); // may be null?

    if (result == null)
        throw new NullPointerException("null return!");

    return result;
}

Now, repeat that in every method. Yuck.

AspectJ Implementation:

public aspect NullChecks {

 /**
  * Before the execution of any method in the com.mycompany packages
  */
 before(): execution(* com.mycompany..*.*(..)) {
  
  // Check for null args
  for (Object arg : thisJoinPoint.getArgs()) {
   if (arg == null) // throw a NullPointerException
    throw new NullPointerException("null argument!");
  }
 }


 /**
  * After returning from the execution of any method in the com.mycompany packages
  */
 after() returning(Object result): execution(Object com.mycompany..*.*(..)) {
  
  // Check for null result
  
  if (result == null) // throw a NullPointerException
   throw new NullPointerException("null return!");
 }
}

Does this seem like a better solution? Have I reached "intuitive"?
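
For code that can't be woven with AspectJ, roughly the same policy can be approximated with a plain JDK dynamic proxy. This is a different technique from the aspect above and it only covers calls made through an interface. NullCheckingProxy and Greeter are names invented for this sketch.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;

// Hypothetical interface used only to demonstrate the wrapper.
interface Greeter {
    String greet(String name);
}

final class NullCheckingProxy {
    // Wraps target so that every call through iface rejects null arguments and null returns.
    static <T> T wrap(Class<T> iface, T target) {
        InvocationHandler handler = (proxy, method, args) -> {
            if (args != null) { // args is null for zero-argument methods
                for (Object arg : args) {
                    if (arg == null) {
                        throw new NullPointerException("null argument!");
                    }
                }
            }
            Object result;
            try {
                result = method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow the target's own exception unchanged
            }
            // void methods always yield null from invoke(), so they are exempt.
            if (result == null && method.getReturnType() != void.class) {
                throw new NullPointerException("null return!");
            }
            return result;
        };
        Object proxy = Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] { iface }, handler);
        return iface.cast(proxy);
    }
}
```

Usage would look like Greeter safe = NullCheckingProxy.wrap(Greeter.class, realGreeter). Unlike the aspect, every object has to be wrapped explicitly, which is part of why the aspect reads as the simpler statement of the policy.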

Tuesday, July 3, 2007

Scalability at Amazon

Some fantastic references from the Google Scalability conference. See Dan Creswell's post.

I can't wait for the video.

Thursday, June 28, 2007

Web3S pushes Secondary Resources

Summary: Web3S doesn't naturally support RESTful primary resource identifiers for EIIs. It can, but only with optional elements and server-specific guarantees about identifiers. A generic Web3S client could never "know" what canonical and primary resources were identified.

Yaron responded to my Web3S posts on identity in hierarchies and graph serialization support. (Yaron's comment is on the second link).

My original concern was that Web3S duplicates the values of a single resource when serializing a tree, and this obscures the identity of resources.

Now, I realize the issue isn't just about obscuring the identity, but that Web3S has no standard way to expose the primary resources in a system.

The properties of a Web3S:ID are

unique only within the containing element,
used to generate URIs, but also path-relative to the containing elements

Then, from the Web3S spec, section 7: The author elements have a single URI, but that is both path (within article) dependent and based on a locally unique object identifier.

This URI (a non-prefixed, short version from the example in section 7):

http://example.net/stuff/morestuff/articles/article(8383)/authors/author(23455)

doesn't provide a primary resource identifier for the author. Why? This path dependent URI is hardly different than the same URI with a fragment identifier (for a secondary resource):

http://example.net/stuff/morestuff/articles/article(8383)#author23455

From Architecture of the World Wide Web, section 2.6 Fragment Identifiers:

The fragment identifier component of a URI allows indirect identification of a secondary resource by reference to a primary resource and additional identifying information. The secondary resource may be some portion or subset of the primary resource, some view on representations of the primary resource, or some other resource defined or described by those representations. The terms "primary resource" and "secondary resource" are defined in section 3.5 of [URI].

The author in all of these examples should be (able to be) identified as a primary resource, not just a secondary one. Whether or not authors are primary or secondary resources is really a question of server implementation. The spec for Web3S makes that decision by default for all services though.
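
A small sketch of why the path-dependent form behaves like a secondary identifier. It only uses java.net.URI and the URI shapes quoted above. The second article id (8384) and the AuthorUris helper names are assumptions made for illustration.

```java
import java.net.URI;

final class AuthorUris {
    static final String BASE = "http://example.net/stuff/morestuff/articles/";

    // Path-dependent form, following the URI shape quoted from section 7 of the spec.
    static URI pathDependent(String articleId, String authorId) {
        return URI.create(BASE + "article(" + articleId + ")/authors/author(" + authorId + ")");
    }

    // Fragment form: the author as a secondary resource of the article.
    static URI asFragment(String articleId, String authorId) {
        return URI.create(BASE + "article(" + articleId + ")#author" + authorId);
    }

    // The primary resource a fragment URI depends on: the same URI with the fragment removed.
    static URI primaryOf(URI uri) {
        String s = uri.toString();
        int hash = s.indexOf('#');
        return hash < 0 ? uri : URI.create(s.substring(0, hash));
    }
}
```

With these helpers, pathDependent("8383", "23455") and pathDependent("8384", "23455") are two different URIs, so a generic client has no way to tell from the identifiers alone whether they name one author or two. In both forms the author can only be reached by way of an article.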

Here is the rest of Yaron's comment, with my additional thinking added in.

Yaron writes: Web3S’s infoset is just the core set of primitive data containers, they don’t define the data models that sit on top. So, for example, one could easily define a graph based data model on top of the tree based infoset that used links to say things like “These two things are the same”. In fact section 10.1 and 10.2 of the Web3S spec define a standard HREF style element for exactly this reason.

True enough, it could be done. However, the Web3S infoset does explicitly define a notion of identity, Web3S:ID, that doesn't provide a way to express those things.

The definition you mentioned of HREF elements makes no statement about what resource is identified, so no client can assume anything about it.

Yaron writes: By Value – If the canonical author entry and the references to author all exist within the same ‘system’ (I’m being vague intentionally but think of examples like a single DB) then likely would one just use by value. The author values would show up where needed and changing one author value would change the other. Astoria does this today but they add the additional guarantee that if two instances of a particular element (e.g. author) are in fact the same underlying object then the ID will be the same. That is completely legal in Web3S. Web3S just says that the caller can only assume that IDs are locally unique. But the server is free to offer a higher guarantee if it wants and then advertise that fact. Heck a server could choose to give every element instance a GUID/UUID and so guarantee global uniqueness.

All based on optional elements and out of band (published schema) communication.

I can't write a general purpose Web3S tool that takes advantage of that, and
Tools that do take advantage of that are more tightly coupled to that one service.

Yaron writes: Also, for whatever it’s worth, Astoria supports both hard and soft linking. Our current thinking about this in Web3S is that we would allow servers to advertise schemas that define object relationships, explain ID guarantees, specify hard versus soft linking, etc. We will also probably provide mechanisms to allow servers to annotate data with this information directly rather than requiring schema look up but given the bandwidth expense I’m not sure how often we would use this.

I hope you would use it all the time, otherwise you will be promoting a code generation solution with published Schemas. See my own blog posts, as well as the recent storm of discourse on WADL and REST.

I'm not exactly sure what hard and soft linking refer to here, can you expand? I don't have experience with Astoria yet, and I'm thinking inode filesystem hard links...

Yaron writes: By Reference – Alternatively there would be a single canonical author entry and anyone who wanted to refer to that entry would just use a URL ala section 10.1.

This should be the default for how Web3S works: the identifier can be a primary or secondary resource identifier in all cases. It should be absolutely server-dependent whether authors can be independently identifiable, but providing canonical and absolute identity for any EII should not require client-server coupling.

I'll think more about how to achieve this, but a first random idea is to allow the HREF element inside (and in place of) the Web3S:ID element.

Yaron writes: In either case you can get there from here.

Yes, you can, but the default, standard, and primary means of identification should directly support a canonical identifier system without resorting to optional elements and shared schemas.

Tuesday, June 26, 2007

Distributed Systems and Consensus

Mark Mc Keown has posted a fantastic summary of "Consensus, 2PC, and Transaction Commit" over the last decades.

I've read some of those materials before, but had certainly never ordered everything so clearly.

I think this is a particularly important reference:

Fischer, Lynch and Paterson showed that distributed consensus was impossible in an asynchronous system with just one faulty process in "Impossibility of distributed consensus with one faulty process" (1985), this famous result is known as the "FLP" result.

In particular, this result means that any system that is distributed will have to deal with failures somehow.

Erlang has a built in model for handling distributed failures.... more reading to do.

Going faster by duplicating work

Dare has posted some excellent summaries of Google's Scalability conference. (I can't wait till the videos are online!)

I am always entertained by counterintuitive results, and the solution to stragglers in MapReduce is no exception.

Google had an issue with "stragglers" performing tasks very slowly in the MapReduce infrastructure. The solution: duplicate the same tasks on multiple machines and throw away the redundant results. Go faster by doing more work! ;)
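
The idea can be sketched on a single JVM with ExecutorService.invokeAny, which returns the result of the first task to complete successfully and cancels the rest. This is only an analogy: threads stand in for machines, BackupTasks is a made-up name, and nothing here reflects how Google's MapReduce actually schedules its duplicate tasks.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class BackupTasks {
    // Runs redundant copies of the same task and returns whichever result arrives first.
    static <T> T firstResult(List<Callable<T>> copies) {
        ExecutorService pool = Executors.newFixedThreadPool(copies.size());
        try {
            // invokeAny returns the first successful result; unfinished copies are cancelled.
            return pool.invokeAny(copies);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("every copy failed", e);
        } finally {
            pool.shutdownNow(); // interrupt any straggler still running
        }
    }
}
```

The redundant work is simply thrown away, so this only makes sense when the task is safe to run more than once.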

Related idea: set-based design. Optimize the throughput of the entire system, instead of each part.

Thoughts, techniques, and references to achieving scalability

I'm going to use this label, scalability, to collect my thoughts and random ideas for how to make systems scale.

I'm going to start with a list of links that I recognize have been formative to my understanding of scalability.

SEDA: An Architecture for Highly Concurrent Server Applications and then C10K

Reading the SEDA thesis and then the C10K site helped me understand that simple imperative programming might not always scale...

REST: Architectural Styles and the Design of Network-based Software Architectures

REST identifies scalability as a key property for internet information architectures. The Stateless Client-Server, Cache, and Layered System constraints all contribute to the scalability of systems.

Life Beyond Distributed Transactions

Most recently this paper by Pat Helland influenced my thinking: Transactions can only bound a single entity, but that entity can contain various pieces of data including historical communication entries to help support idempotent messaging.
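
A minimal sketch of that last point, assuming an entity that remembers which message ids it has already applied. The Account class and its deposit method are hypothetical and not taken from the paper.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical entity: its own state plus a history of the messages it has applied.
final class Account {
    private long balance;
    private final Set<String> seenMessageIds = new HashSet<>();

    // Applies the deposit at most once per message id; returns false for a repeated delivery.
    synchronized boolean deposit(String messageId, long amount) {
        if (!seenMessageIds.add(messageId)) {
            return false; // duplicate delivery: state is left untouched
        }
        balance += amount;
        return true;
    }

    synchronized long balance() {
        return balance;
    }
}
```

Because the history lives inside the same entity as the balance, both change together within the single-entity transaction boundary, and a sender can safely retry a message it is unsure was delivered.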

Monday, June 18, 2007

Graph based serialization examples

My previous post on Web3S hierarchical data vs. graph data didn't provide any solutions.

The explanation in the FAQ about graphs isn't sitting well with me. There is a simple way to represent graph data in serialized formats: name the data the first time it's seen and refer to it subsequent times. Examples include:

XML ID/IDREF has done this for a long time.
YAML has Anchors and Aliases.

This is not a complicated pattern to serialize or write code against.
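
To back that up, here is a small sketch of the pattern: write an object in full the first time it is seen, and write only a reference afterwards. The Author, Article, and GraphWriter classes and the line-oriented output format are invented for the example. They are not Web3S, XML ID/IDREF, or YAML.

```java
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

final class Author {
    final String name;
    Author(String name) { this.name = name; }
}

final class Article {
    final String title;
    final List<Author> authors;
    Article(String title, List<Author> authors) {
        this.title = title;
        this.authors = authors;
    }
}

final class GraphWriter {
    // Serializes articles, naming each author on first sight and referring to it afterwards.
    static String write(List<Article> articles) {
        // Identity-based map: two Author objects with equal names are still distinct nodes.
        Map<Author, Integer> ids = new IdentityHashMap<>();
        StringBuilder out = new StringBuilder();
        for (Article article : articles) {
            out.append("article: ").append(article.title).append('\n');
            for (Author author : article.authors) {
                Integer id = ids.get(author);
                if (id == null) {
                    id = ids.size() + 1;
                    ids.put(author, id);
                    out.append("  author id=").append(id)
                       .append(" name=").append(author.name).append('\n');
                } else {
                    out.append("  author idref=").append(id).append('\n');
                }
            }
        }
        return out.toString();
    }
}
```

The whole trick is one map from object to assigned id, which is why the pattern is cheap on both the writing and the reading side.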

From Dare's posted example, we could add another article with an existing author like this (notice the Web3S:IDREF element):

<articles>
<article>
<Web3S:ID>8383</Web3S:ID>
<title>Manual of Surgery Volume First: General Surgery. Sixth Edition.</title>
<authors>
 <author>
  <Web3S:ID>23455</Web3S:ID>
  <firstname>Alexander</firstname>
  <lastname>Miles</lastname> 
 </author>
 <author>
  <Web3S:ID>88828</Web3S:ID>
  <firstname>Alexis</firstname>
  <lastname>Thomson</lastname> 
 </author>
</authors>
</article>
<article>
<Web3S:ID>8384</Web3S:ID>
<title>...</title>
<authors>
 <author>
  <Web3S:IDREF>88828</Web3S:IDREF>
 </author>
</authors>
</article>
</articles>

But, that won't work in Web3S without many more changes. It's not the data format that's a problem, but

the rule for ID uniqueness is only scoped to the containing XML element. (So that IDREF might not find a single target)
the rules for URI generation and identity of EII are relative to containment.

The problem here is identity, not data formats.
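
The first problem can be shown with a toy model. Eii and IdRefResolver are hypothetical names, and the tree below is an assumption about what a server could legally produce when ids only have to be unique within the containing element.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal element information item: a name, an optional id, and children.
final class Eii {
    final String name;
    final String id; // may be null for elements without an id
    final List<Eii> children = new ArrayList<>();

    Eii(String name, String id) {
        this.name = name;
        this.id = id;
    }

    Eii add(Eii child) {
        children.add(child);
        return this;
    }
}

final class IdRefResolver {
    // Returns every element in the tree that carries the given id.
    static List<Eii> resolve(Eii root, String id) {
        List<Eii> hits = new ArrayList<>();
        collect(root, id, hits);
        return hits;
    }

    private static void collect(Eii element, String id, List<Eii> hits) {
        if (id.equals(element.id)) {
            hits.add(element);
        }
        for (Eii child : element.children) {
            collect(child, id, hits);
        }
    }
}
```

If two different articles each contain an author with id 1, which local uniqueness allows, resolving an IDREF of 1 yields two candidates and the reference is ambiguous.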

Web3S supports hierarchies? Useful ones?

Update: I've added more detailed links and summaries of how this could fail in my next blog post.

Web3S seems clearly to be driven by a need for hierarchies; from the discussions (Dave, Tim, Sam), examples about books and authors, and the FAQ on why not APP:

No Hierarchy – ATOM only supports a two level hierarchy, feed and entry. It is possible, of course, to create an entry that is really a pointer to another feed but that is both painful to handle at a protocol level and inefficient when one actually wants to retrieve an entire tree as one has to make many round trips to pull in all the values as one walks the feeds.

But, later in the FAQ on hierarchies and graphs:

... we will just have to make do with hierarchical rather than graph based data formats.

So, Web3S supports hierarchies, but not graphs. Isn't the Web a graph? What does Web3S do when two articles have the same author?

My assumption was that the author element would be repeated with the same id, so I looked further and found the FAQ on ID uniqueness where we learn that the IDs are only unique within the containing element.

Am I understanding this right? Does Web3S really not have support for maintaining a single author of many articles? Do we really need to track down every occurrence of one author (by data matching instead of URI) in order to correct a spelling error?

I've not done more homework on this than the FAQ; I'll keep looking. Someone please tell me I'm wrong because this would be silly.

Wednesday, June 13, 2007

Web Sites, Web Applications, and Content Types

I couldn't find this reference when I was looking for it the other day.

Good Web APIs are just Web Sites

Excellent!

Both that presentation and an RDF Shopping example use content negotiation to serve different content types to clients. My previous signposts entry used a pretend HTML microformat. These are different strategies for transferring understandable content from server to client.

The question I'm still asking myself is which of these two choices is better and why?:

Separate and negotiated media types
Single extensible media type

I have to admit, I find conneg to be non-Visible. I just don't feel very comfortable with it.

Candidates for extensible media types are HTML microformats and RDF (and Topic Maps and DITA and SGML Architectures).

The HTML Web has flourished (in part) because of Postel's Law:

"Be liberal in what you accept, and conservative in what you send."

Is that enough to encourage using fewer, but extensible, media types over individual crafted media types?

Monday, June 11, 2007

Java ClassLoader trick

I work on the Glassbox Java Troubleshooting Agent, and it uses Ant scripts to automate installation into various Java containers. The version of Ant we use was conflicting with existing libraries (in CruiseControl and JBoss), so I needed to create a ClassLoader sandbox.

The OverridingURLClassLoader.java does the trick. This is just a URLClassLoader with one additional argument: the name of the class to redefine. This ClassLoader will enable one of your application classes to be re-defined in a ClassLoader sandbox and avoid Jar hell when deploying into any random environment.

That one application class could be defined by both the application ClassLoader and this new ClassLoader, and those two classes won't be compatible or assignable. An interface or base class could be shared by both though (and only defined by the application ClassLoader). The AntInstaller.java is using an interface to type the returned newInstance() object.

Here is the constructor that takes the name of a single class that should be re-defined by this ClassLoader (instead of the parent ClassLoader):


class OverridingURLClassLoader extends URLClassLoader {

  String redefineClassName;

  public OverridingURLClassLoader(String redefineClassName, URL[] urls, ClassLoader parentLoader) {
    super(urls, parentLoader);
    this.redefineClassName = redefineClassName;
  }

The loadClass method checks cached classes, then classes defined in the list of URLs. If that fails, then before the parent is called, the method checks if (name.equals(this.redefineClassName)). If that class is being asked for, it is re-defined using this instance of the ClassLoader:


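public Class<?> loadClass(String name) throws ClassNotFoundException {
  // 1. Reuse a class this loader has already defined.
  Class<?> c = findLoadedClass(name);
  if (c == null) {
    try {
      // 2. Look in this loader's own URLs before asking the parent.
      c = findClass(name);
    } catch (ClassNotFoundException e) {
      if (name.equals(this.redefineClassName)) {
        // 3. Re-define the one named class here, using the bytes the parent can see.
        String path = name.replace('.', '/').concat(".class");
        URL resource = getParent().getResource(path);
        if (resource == null) {
          throw new ClassNotFoundException(name);
        }
        try {
          // toBytes is a helper that reads the whole stream (not shown).
          byte[] bytes = toBytes(resource.openStream());
          c = defineClass(name, bytes, 0, bytes.length);
        } catch (IOException e1) {
          throw new IllegalStateException("Can't get class definition", e1);
        }
      } else {
        // 4. Everything else is delegated to the parent as usual.
        c = getParent().loadClass(name);
      }
    }
  }
  return c;
}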
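
The toBytes helper called from loadClass isn't shown in the post. A minimal version, assuming it simply drains the stream into a byte array and closes it, might look like this (the StreamBytes holder class is a made-up name so that the snippet compiles on its own):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

final class StreamBytes {
    // Reads the stream to its end, closes it, and returns everything that was read.
    static byte[] toBytes(InputStream in) throws IOException {
        try (InputStream stream = in) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int read;
            while ((read = stream.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
            return out.toByteArray();
        }
    }
}
```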

Wednesday, June 6, 2007

Why, What, How and programming.

I wanted to record this thought. I was working on some code that set a string value into a java.util.Map that was passed to Ant to do something else that resulted in a startup parameter for another system that.......

Hmm. Let's just say it was a little obtuse.

I've long been a believer and student of "What, not How" when it comes to requirements, design, and so on.

On this particular occasion, I realized the "Why" was really the most important, and missing!

So I now think we should:

Document "Why, not What"
Design for "What, not How"
and hope no one ever looks at "How" (along with sausage and law)

D'oh! REST already had contracts.

The contracts and protocol exist solely in the data. A client that understands a media type can then navigate the web of links based on knowledge gleaned from documents conforming to that media type. (This is exactly hypermedia as engine of application state).

First, Joe Gregorio's excellent post mentioning OpenSearch and offering advice on balancing 1 vs. n media types.

Second, Alan Dean posting this RDF shopping example.

Notice this paragraph in the shopping example:

To validate that the server supports the "Shop" protocol, the User Agent can test the document for the existence of the basket element, e.g. with the following XPath expression count(//shop:Basket)!=0

The shopping protocol is checked with a data expression!
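
For what it's worth, that check is only a few lines with the JDK's built-in XPath support. In this sketch the namespace URI bound to the shop prefix is an assumption (the post doesn't give the real one), and ShopProtocolCheck is a made-up name.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class ShopProtocolCheck {
    // Assumed namespace URI for the "shop" prefix; the real vocabulary's URI may differ.
    static final String SHOP_NS = "http://example.org/shop#";

    // True when the document contains at least one shop:Basket element.
    static boolean supportsShop(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for prefixed XPath names to match
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                public String getNamespaceURI(String prefix) {
                    return "shop".equals(prefix) ? SHOP_NS : XMLConstants.NULL_NS_URI;
                }
                public String getPrefix(String uri) {
                    return SHOP_NS.equals(uri) ? "shop" : null;
                }
                public Iterator<String> getPrefixes(String uri) {
                    return SHOP_NS.equals(uri)
                            ? Collections.singletonList("shop").iterator()
                            : Collections.<String>emptyIterator();
                }
            });
            // The same expression quoted from the shopping example.
            return (Boolean) xpath.evaluate(
                    "count(//shop:Basket)!=0", doc, XPathConstants.BOOLEAN);
        } catch (Exception e) {
            throw new IllegalArgumentException("not a parseable document", e);
        }
    }
}
```

The client's whole knowledge of the protocol is a vocabulary name and an expression over the data, with no generated stub involved.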

My trivial example is just an instance of using data for protocol. I still think there is value in using signposts as a metaphor for self-descriptive data, if only to help lead people from programming APIs to hypermedia.

Friday, June 1, 2007

Microformats, formats, and schemas

Jacek has some true words for me in Tim Bray's Blog:

It seems to me that John, in his eagerness to use microformats, reinvents the idea of a standardized structured format. We had that with SGML, we have that with XML. You don't need to use XML Schema to work with XML. In fact, (and Tim may confirm or deny this), XML seems like it's intended to be usable for these signposts John speaks of - recognized names would be my interpretation. Granted, microformats have their advantages and disadvantages over XML.

Jacek, I do know these things. I've worked with SGML and XML for years at Isogen, and Eliot Kimber taught me many important things about markup, including SGML Architectures (which, interestingly, could provide an avenue to formalize microformats into fully defined markup languages). (Anyone have a link to SGML Architectural Forms?)

Thing is, I'm not really that interested in those details. Yes, they are important, but the distinctions between a DTD, RelaxNG, XML Schema, micro-format, and XLink don't make this issue any different.

On the topic of REST: only some commonly understood format is important. It's my personal opinion that any of the schemas I described in my previous post should be described in RelaxNG, XSD, and an HTML micro-format (with an SGML Architecture for good measure!).

My point is that multiple clients should be able to easily describe what must be followed and inspected, or what must be acted upon and triggered.

Jacek, I think we agree that IDL should not be dismissed. I'm trying to define a declarative definition that supports better machine processing and fits into the constraints of the Web.

The Web has one Interface. Trying to write an IDL against that one interface (URI, HTTP Methods) is either very boring or is trying to overlay domain methods onto HTTP methods.

I'm trying to find a way to describe web interfaces to machines. Hard coding URIs isn't good because of all the stuff people say. So, we add some special data (I'm not ready to say metadata) to some of the HTML forms to let a machine know which form should be used to buy something instead of logging out.

Thursday, May 31, 2007

Does REST need a DL? No, just Signposts.

REST (err, the Web) does not need a Description Language.

But wouldn't it be so nice to have one! Sure, until you want to change anything. A DL couples the client to the server, and forces stability of everything defined by the DL.

The REST style encourages loose coupling, independent evolution, and uniform interfaces. The "hypermedia as engine of application state" is the opposite of a DL.

How is a DL so bad? A DL is used to generate code (stubs/skeletons) at design time. After that, any changes to the URI or data values on the server side will break any clients.

So what's my great idea? Use a micro-format as machine signposts.

Links and forms could be marked with a set of class attributes that convey the common machine processing activities. I'm thinking about common stuff like:

login/logout
parts (list, name, id number, related parts links, specification link)
checkout/ordering (mailing address, billing address, credit card info)

As an example, if I have a shopping site with a list of parts and a checkout form, something like this (in pig-xml for easy web typing):

[div class="part"]
  [div class="part.name"]Super Espresso Machine[/div]
  [div class="part.number"]42[/div]
  [a class="part.specification" href="42/specification"]Part specification[/a]
  [form class="add.to.cart" action="order/42" method="post"]
    [input type="submit" value="Submit"]Add to Cart[/input]
  [/form]
[/div]
[div class="part"]
  ...
[/div]

[form class="checkout" action="buynow" method="post"]
  [input class="fullname" type="text" name="fullname"/]
  [input class="mailing.address" type="text" name="mailing-address"/]
  [input class="billing.address" type="text" name="billing-address"/]
  [input class="creditcard.number" type="text" name="creditcard-number"/]
  [input type="submit" value="Send"/]
[/form]

I could write a personal shopping agent that picked parts by id and followed the "checkout" action (form) filling in anything that was already known (like addresses and cc info).

Why can't I do the same thing with a DL?!? Sure, you could. Until the server added a confirmation page after the first form. Something like:

[form class="checkout" action="buynow.confirm" method="post"]
  [input type="submit" value="Send"/]
[/form]

A human would take that in stride, but a client generated from a DL wouldn't already be coded to handle it. An agent that's just following signposts would easily cope with this change.

A signpost-following agent would, once it decided to follow the "checkout" action, keep on following it, filling in recognized data values as it progressed, until a success or error response is found. Oh, I must add a "success" signpost to my list.
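
Here is a rough sketch of such an agent. Everything in it is hypothetical: Form, Page, and Site are stand-ins for parsed HTML and HTTP, and the success flag plays the part of the "success" signpost. The point is only the shape of the loop.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical parsed form: its signpost (class attribute), action, and input classes.
final class Form {
    final String signpost;
    final String action;
    final List<String> fields;
    Form(String signpost, String action, List<String> fields) {
        this.signpost = signpost;
        this.action = action;
        this.fields = fields;
    }
}

// Hypothetical parsed page: a success marker plus the forms found on it.
final class Page {
    final boolean success;
    final List<Form> forms;
    Page(boolean success, List<Form> forms) {
        this.success = success;
        this.forms = forms;
    }
}

// Stand-in for an HTTP POST: submit values to an action, get the next page back.
interface Site {
    Page submit(String action, Map<String, String> values);
}

final class SignpostAgent {
    private final Map<String, String> known;         // data the agent already has
    final List<String> visited = new ArrayList<>();  // actions it ended up submitting

    SignpostAgent(Map<String, String> known) {
        this.known = known;
    }

    // Keeps following the given signpost until a success page (or the step limit) is reached.
    Page follow(Site site, Page start, String signpost, int maxSteps) {
        Page page = start;
        for (int step = 0; step < maxSteps && !page.success; step++) {
            Form form = find(page, signpost);
            if (form == null) {
                throw new IllegalStateException("no '" + signpost + "' signpost on page");
            }
            Map<String, String> values = new LinkedHashMap<>();
            for (String field : form.fields) {
                String value = known.get(field);
                if (value != null) {
                    values.put(field, value); // fill in only what is recognized
                }
            }
            visited.add(form.action);
            page = site.submit(form.action, values);
        }
        return page;
    }

    private static Form find(Page page, String signpost) {
        for (Form form : page.forms) {
            if (form.signpost.equals(signpost)) {
                return form;
            }
        }
        return null;
    }
}
```

The agent never hard-codes "buynow" or "buynow.confirm". It reads the action from whatever form carries the checkout signpost, so an extra confirmation page is just one more trip around the loop.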

Couldn't a web site just provide a DL exposed service point and the HTML exposed service? Sure, but, that would be more work, right?

Introduction

Hello world!

I'm going to try writing my tech thinking down in a blog. I don't easily write things, so I'm giving myself permission to post raw drafts, incomplete thoughts, incoherent ramblings, and maybe a few good ideas.

I'm going to stick mainly to technology topics like programming languages, distributed concurrent systems, REST and other service oriented architectures, and other limited domains.