Wednesday, November 18, 2009

Webmachine 1.5: virtual host dispatching

We recently tagged and pushed the webmachine-1.5 release, which has a number of minor bugfixes and one major new feature: resource dispatching on Host as well as on URL. There was a healthy discussion on the webmachine mailing list about this, and I think the compromise solution that emerged is a good one. The description from the changeset documents the new feature well:

Dispatch rules can now take two different forms:

The old form: {PathMatchSpec, Module, Parameters}
The new form: {HostMatchSpec, [{PathMatchSpec, Module, Parameters}]}

The former is equivalent to the latter with HostMatchSpec={['*'],'*'}

HostMatchSpec is matched against one of (in order of preference):
X-Forwarded-For, X-Forwarded-Host, X-Forwarded-Server, Host

HostMatchSpec can have two forms:
{[HostPart], PortSpec}
[HostPart]
The latter is equivalent to the former with PortSpec='*'

The list of host parts is matched against the hostname extracted from
a header in much the same way that PathMatchSpec is matched against
the path.


{[], root_resource, [x]}.
{['*'], [{[], root_resource, [x]}]}.
{{['*'],'*'}, [{[], root_resource, [x]}]}.
Will each match the root path of any host.

{["example","com"], [{[], root_resource, [x]},
{["static"], static_resource, [y]}]}.
Will dispatch the root of example.com to root_resource and example.com/static to static_resource.

{['*',"example","com"], [{[], root_resource, [x]},
{["static"], static_resource, [y]}]}.
Will do the same as above, but also for any subdomain of example.com.

{{[host,"local"], 8000}, [{[], res_A, [x]}]}.
{{[host,"local"], 8001}, [{[], res_B, [x]}]}.
Will dispatch requests to ?.local:8000/ to res_A and requests
to ?.local:8001/ to res_B, binding the host part immediately
preceding ".local" to 'host', such that
wrq:get_path_info(host, ReqData) would return the matched string.

Some notable features of this approach include complete backward compatibility (allowing new host-specific rules to be added to old URL-only dispatch lists without rewriting the entire list) and bringing the same simple pattern-matching style of dispatch to the host portion of the problem.
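As a sketch of what that backward compatibility looks like in practice, here is a dispatch list mixing old path-only rules with a new host-scoped rule (all module names here are hypothetical):

```erlang
%% Old-style, path-only rules; these behave as if wrapped in the
%% host spec {['*'],'*'}, so they match any host on any port.
{["static", '*'], static_resource, []}.
{[], root_resource, []}.

%% New-style rule added alongside them: the inner path rules apply
%% only when the request's host matches admin.example.com.
{["admin", "example", "com"],
 [{["users", user_id], user_admin_resource, []}]}.
```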

You may also be interested to see the new site for Webmachine.

That site is currently just a different structure on the documents from the bitbucket wiki, but it is likely to grow over time.

As of this release, we are doing away with private development branches by default, and expect to work against the bitbucket tip by default. Making that same change for Riak has certainly paid off, and we hope that with Webmachine it will also allow people to more easily get involved and work against the active codebase.


Wednesday, August 26, 2009

Webmachine as an application front-end

Webmachine 1.4 is pushed to bitbucket, and is providing the HTTP face for a few interesting new software systems.

The very cool guys at Collecta have built a different sort of search engine, great for watching the flow of social-network sorts of conversation occurring about whatever topics you are interested in. The REST API powering that engine is written in Webmachine, and takes advantage of some of Webmachine's more interesting features while interacting with other components. I'll leave more detailed explanation of their technology to their excellent team and just say that they've built something very deserving of attention.

We have also used Webmachine to provide the HTTP interface to our decentralized document store: Riak. Webmachine's ability to provide the full richness of HTTP's capabilities while not dictating anything else about the shape of your application makes it a natural fit for the front end of a system like Riak. The first incarnations of Riak and Webmachine were each built at about the same time: late 2007. As they grew up together, some internal interface design decisions became more obvious to us as a result. The structure of Riak's core operations maps nicely to the universal interface that includes GET, PUT, POST, and DELETE not just by name but by their essential properties such as idempotency, safety, and defined semantics.

One aspect of the Web that has been essential to its success as the biggest distributed system in the world is the notion of links. This is also one of the things that differentiates Riak from some other data storage systems. It's not exactly a graph database, but it adds some elements of a graph database on top of the great benefits that come with having a decentralized key/value store at the core. Documents in a Riak cluster can have links to other documents in that cluster. Riak itself can take great advantage of this internally, as the MapReduce programming model that we use is ideally suited to walking links in order to build up inputs for the next phase of computation.

These links ought to also be useful to clients, and in the context of HTTP this should be possible in a way that does not assume application knowledge on the part of a client. To that end, the newest release of Riak includes support for the Link header in HTTP responses. This allows clients to explore the link structure of a set of related documents without having to read or understand the body of those documents. Based on our past experience building applications atop Webmachine and Riak, we expect this to be an added bonus for rapid development.
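As a sketch of what that looks like on the wire (the hostname, port, URL layout, and riaktag parameter shown here are illustrative assumptions, not a definitive description of Riak's API), a response carrying such links might include:

```
$ curl -v http://localhost:8098/riak/albums/AbbeyRoad
< HTTP/1.1 200 OK
< Link: </riak/artists/TheBeatles>; riaktag="performer"
< Content-Type: application/json
```

A client can follow the linked URL directly from the header, without parsing the document body at all.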

Monday, June 15, 2009

Webmachine 1.3: streamed bodies, multipart forms, and efficiency

Easily the most requested feature for Webmachine since its release has been the ability to "stream" the request and/or response bodies, instead of having to receive or send them in one potentially-large hunk. As of the most recent version, this feature is now available. See the wiki page for details on the API.

A number of other changes are also in, such as multipart form parsing, improved efficiency by changing a gen_server (per request) into a parameterized module, and so on... but I suspect that the streamed bodies are what people are really looking for most. Enjoy!
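To give a flavor of the streamed-response side of this (a minimal sketch; the wiki page remains the authoritative reference, and the exact shapes shown here are assumptions):

```erlang
%% Sketch of a streamed response body. Instead of returning one
%% potentially-large binary, the body-producing function returns
%% {stream, {Chunk, NextFun}}; each NextFun yields the next
%% {Chunk, NextFun} pair, and 'done' in place of a fun ends the
%% stream.
to_html(ReqData, State) ->
    {{stream, {<<"first chunk">>, fun more/0}}, ReqData, State}.

more() ->
    {<<"last chunk">>, done}.
```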

Monday, June 1, 2009

REST and HTTP services as a business advantage

The advantages of HTTP as an application protocol (not just a transport) as opposed to many other networked service models are not abstract, idealized technical advantages. They directly affect your -- and your partners' -- cost of doing business.

At Basho, our services integrate out of necessity with those of many kinds of partner companies, including CRM, Business Intelligence, Search, and more. We consider ourselves lucky in general when a company we'd like to partner with exposes any consistent and documented interface for this purpose.

However, when those interfaces are SOAP or another RPC-shaped system it means that each integration is a fairly major new project even when the resulting connections between applications are conceptually small. This is because you have to learn the programming model of that other service and work as though you were a developer of that service -- learning their calling conventions, naming schemes, error conditions, and so on.

We recently had the pleasure of integrating with Jigsaw's data service. While they don't quite match up to the ideals of REST just yet, their service is young and the interface is already far better than that of many other business-to-business integration APIs. Not only did they deliver a cleaner and easier service than expected, I suspect that they did so at lower cost than many others. How?

By using HTTP.

Even the coarsest approximation of the Web's uniform interface gives you a much better running start than is possible with, say WSDL and SOAP. Jigsaw's Web interface isn't perfect (GET requests are idempotent but not safe, and a couple of status codes are incorrectly used) but it is simple and it isn't surprising. The fact that there is already a completely interoperable HTTP client in every major programming language means that, instead of using some WSDL to generate 10,000 lines of code to then put a client on, we were able to just jump in and immediately write working client code. The resulting client code was also about 20% as long as the manually written portion of our client code in comparable services that use SOAP.

I'm not talking about ideal systems, and I'm not talking about idealistic academic goals. I'm just talking about the simple realities of how your technical choices affect the level of effort that your partners must apply in order to work with you. That simple reality has a direct and powerful effect on the bottom line.

Wednesday, May 27, 2009

Video Slideshow, Introducing Webmachine

The Webmachine talk at Bay Area Erlang Factory 2009 went quite well. I received useful feedback, and some very interesting and productive conversations spun off after the talk.

For anyone interested who wasn't there, I have recorded a voiceover with the slides and made that video available here. The slides are the same ones used at the conference, but I trimmed the speaking portion a bit. This version is a bit under half an hour; it leaves out a few minor topics but still covers all of the material needed to introduce Webmachine.


Tuesday, May 26, 2009

Webmachine 1.2

There are a few changes in webmachine-1.2 that deserve mention.

We simplified the API to the dispatcher module so that it can be used easily in a standalone fashion. In cases where another application (such as CouchDB) wants to use Webmachine-style dispatching, it is now easy to just call webmachine_dispatcher:dispatch/2 and get a useful result without any of the rest of Webmachine running. A trivial example:

1> webmachine_dispatcher:dispatch("/a", [{["a"], some_resource, []}]).

The other change that is most interesting from a feature point of view is that the request body is not read off the socket until the first time wrq:req_body/1 is called. This means that a resource can (for example) return an error response code without having to wait for the body to be pulled off the wire first.
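For example (a sketch; is_authorized/2 is a standard webmachine callback, but the header check shown is purely illustrative), a resource can now refuse a request based on its headers alone:

```erlang
%% Because wrq:req_body/1 is never called on this path, the
%% request body is never read off the socket before the 401
%% response goes out.
is_authorized(ReqData, State) ->
    case wrq:get_req_header("authorization", ReqData) of
        undefined -> {"Basic realm=webmachine", ReqData, State};  %% becomes a 401
        _Creds    -> {true, ReqData, State}
    end.
```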

There is also a change in the new_webmachine project creation script. Your list of dispatch terms will now by default be in a separate file ("priv/dispatch.conf") instead of directly in your application's _sup file.
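That file holds plain Erlang terms, one dispatch rule per term, each ended by a period (suitable for reading with file:consult/1). A minimal sketch, with hypothetical resource names:

```erlang
%% priv/dispatch.conf
{["static", '*'], myapp_static_resource, []}.
{['*'], myapp_resource, []}.
```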

This version is identified with the "webmachine-1.2" mercurial tag.

In upcoming versions, we hope to add a few much-clamored-for features such as host-based dispatching and incremental request/response body reading and writing.

Tuesday, April 28, 2009

A Simple Webmachine Example

Bryan Fink (of BeerRiot fame and a colleague at Basho) recently posted a great example of how easy it is to make a useful and working Webmachine resource.

He then followed up with more examples, showing how easy it is to add support for PUT, for authorization, and for entity tags.

Yesterday's post wrapped up his short series by not only adding DELETE support but also reflecting on the nature of Webmachine and how it lets you improve the way you think about and use the power of HTTP in your applications.

Bryan knows both Erlang and Web programming well; his examples are worth the read.

Thursday, March 19, 2009

Webmachine One Point Oh!

I am happy to announce the release of Webmachine 1.0.

In its short public life so far, Webmachine has already been used to build a range of Web applications, including a sales productivity tool, an SMS gateway, an HTTP caching intermediary, a content management system, the front end to a decentralized key/value data store, and more. It has been used as a central element in Erlang training courses, enabling students to write working, Web-friendly applications after only a day or two of exposure to the language. Some of its users have been delivering customer-facing Webmachine applications for nearly a year and a half now.

However, the 1.0 release isn't just an acknowledgment of that stability. It also introduces two major changes that are very beneficial to developers. One of these is a new debugging tool that is unlike anything we've seen elsewhere, allowing you to visualize in great detail how your Web resources process requests. Bryan Fink did most of the work on the debugging and tracing tool, and is the best person to explain how it works.

The other major change is to the developer API for resources, helping to make your resource functions referentially transparent. In case that phrase is new to you, it means something quite simple. Resource functions no longer interact with a Req object that manipulates the response via side effects and stored state. Instead, their behavior is defined purely in terms of input parameters and return values: for any given input, a given function ought to return the same output, and any side effects are insignificant from the point of view of Webmachine's execution. This might not sound like a huge deal, but the difference it makes in terms of testability and re-use is huge.

Unit testability is vastly improved because the functions are fully isolated. You can write test inputs for your (e.g.) is_authorized function (and verify that it only returns true in the cases you want it to) without those tests having to interact with anything else in the application. There's no need for anything resembling a "mock object" or other such silliness; when the inputs and outputs are simple records as opposed to gen_servers with side effects, you can just construct whole records as your test inputs and inspect the output records to check test results.
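A sketch of that testing style with eunit (all names here are hypothetical; the point is that pure functions can be exercised with plain terms, no running server and no mocks):

```erlang
-module(auth_logic).
-include_lib("eunit/include/eunit.hrl").
-export([allowed/1]).

%% Pure decision logic that a resource's is_authorized callback
%% might delegate to: same input, same output, no side effects.
allowed("secret-token") -> true;
allowed(_)              -> false.

allowed_test_() ->
    [?_assert(allowed("secret-token")),
     ?_assertNot(allowed("wrong")),
     ?_assertNot(allowed(undefined))].
```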

Referential transparency also enables much more in the way of static analysis. One item on the roadmap for a future version of Webmachine is a type analysis tool that can verify at compile time that the type signatures of your resource functions can only lead to valid HTTP behavior.

It's also worth noting that the backward incompatibility is well-contained and easy to get past. Our production application was switched entirely to the new API by one person in a day, and Bryan also converted all of BeerRiot in under two hours. A handy guide to upgrading shows just how simple it is.

As of the 1.0 release we're also moving Webmachine from Google Code to Bitbucket. The main reason for this is that we vastly prefer Mercurial to Subversion, but a number of other features on Bitbucket have also turned out to be nice. We're leaving the old version up on Google Code for a while, so that people with running applications can access both the old and new versions.


Wednesday, February 25, 2009

Webmachine at Erlang Factory

I will be speaking at Erlang Factory Bay Area 09 on Webmachine.

I'm looking forward to the conference; it seems like there will be a very interesting crowd.

Friday, February 6, 2009

content-negotiation for humans

Simon Willison asked for opinions on how to deliver JSON content properly while also assisting browser-driven exploration and debugging.

Here is a short, simple example of how to use content-negotiation to achieve this.

This is a complete, working webmachine resource with trivial content:

-module(simonw_resource).  %% module name is illustrative; match it in your dispatch rules
-export([init/1, to_json/2, content_types_provided/2]).

init([]) -> {ok, x}.

to_json(_, X) -> {"{\"key\": \"value\"}\n", X}.

content_types_provided(_, X) ->
    {[{"application/json", to_json}, {"text/plain", to_json}], X}.

A typical request/response, with a tiny bit of header noise trimmed:

$ curl -v http://localhost:8000/js/simonw
> GET /js/simonw HTTP/1.1
> Accept: */*
< HTTP/1.1 200 OK
< Vary: Accept
< Server: MochiWeb/1.1 WebMachine/0.20 (There was kicking.)
< Content-Type: application/json
< Content-Length: 17
{"key": "value"}

This is what you generally want. The JSON was delivered with the proper content-type and so on. Note the presence of the "Vary" header.

However, if the same request is made with a typical browser's Accept header, like Firefox's, the result will be different:

> GET /js/simonw HTTP/1.1
> Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
< HTTP/1.1 200 OK
< Vary: Accept
< Server: MochiWeb/1.1 WebMachine/0.20 (There was kicking.)
< Content-Type: text/plain
< Content-Length: 17
{"key": "value"}

This time we got the same JSON content, but in text/plain so it will display nicely in a browser window if requested directly. A Web application loading this content would probably set the Accept header in XHR requests to "Accept: application/json" and get the first response.

This is a straightforward use of content-negotiation that serves a useful purpose and (by using Accept properly and providing the Vary response header) still works well with intermediaries and the rest of the mechanics of the Web.