Roxton Group Blog

Friday, July 22, 2011

News Aggregation

How is it possible that, despite Google, bloggers, and the power of the Fourth Estate, all the news aggregation models suck?

What I really, really want to see is a highly competitive tag-subscription framework. A user is subscribed to a set of plaintext tags. Content-producers tag their content (if they don't, it doesn't get seen, unless some magnanimous user assigns tags for them).

In the naive model*, users see everything that carries their tag. I'm intentionally excluding user-specific heuristics. Of course, this model results in too much content on popular tags, so users respond to poor articles and excessive volume with a "thumbs down" on articles, removing them from the stream**. In this way, users curate the stream in a very cutthroat way.
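To make the naive model concrete, here's a minimal sketch in Python. Everything in it is an assumption made for illustration: the `TagStream` class, its method names, and especially `removal_threshold`, a made-up knob for how many thumbs-down votes it takes to drop an article from the stream.

```python
from collections import defaultdict


class TagStream:
    """Naive tag-subscription model: one globally identical stream per tag."""

    def __init__(self, removal_threshold=3):
        # Hypothetical knob: distinct thumbs-down votes needed to drop an article.
        self.removal_threshold = removal_threshold
        self.articles_by_tag = defaultdict(list)  # tag -> article ids, in posting order
        self.subscriptions = defaultdict(set)     # user -> subscribed tags
        self.downvoters = defaultdict(set)        # article id -> users who voted it down
        self.removed = set()

    def subscribe(self, user, tag):
        self.subscriptions[user].add(tag)

    def publish(self, article_id, tags):
        # Untagged content lands in no stream, so nobody ever sees it.
        for tag in tags:
            self.articles_by_tag[tag].append(article_id)

    def thumbs_down(self, user, article_id):
        # A set means each user counts once per article.
        self.downvoters[article_id].add(user)
        if len(self.downvoters[article_id]) >= self.removal_threshold:
            self.removed.add(article_id)

    def stream(self, user):
        # No user-specific heuristics: removal is global, not per user.
        seen, result = set(), []
        for tag in sorted(self.subscriptions[user]):
            for article_id in self.articles_by_tag[tag]:
                if article_id not in self.removed and article_id not in seen:
                    seen.add(article_id)
                    result.append(article_id)
        return result
```

The point of the sketch is that a thumbs-down changes the stream for every subscriber, not just the voter. That's what makes the curation cutthroat.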

The most popular (and simple) tags come to naturally reflect mainstream tastes, but subcultures and content-producers alike have an interest in getting content into alternate streams, especially given the self-limiting "bandwidth" of popular tag streams. Alternate tags grow, with different user expectations about content. These tags might have weird names, such as "warren-alt" or "deficit-progressive", but they would be promulgated via 1) social networks, and 2) the presence of such tags on popular content.

This approach facilitates the rapid creation and identification of disparate media markets rooted in positions and subcultures. Tags are flexible, in that they not only allow you to get desirable content by category, but they also 1) permit identification with a subculture/market, 2) allow users to selectively subscribe to individual stories, possibly creating a much stronger revenue model for following up on older stories (improving journalism as a whole).

Google+ doesn't do this today, but Google and G+ are great candidates for this model, and it sounds like content-producers like you are in an increasingly better position to negotiate a good model for digital distribution.

Thanks for reading.

*You can imagine a more complicated model where consumption-heavy users or a random subset of users see unvetted content, but I wouldn't want this to be strong enough to interfere with the value of a globally identical stream.

**The ease with which an article could be removed from the stream could be based on content-producer reputation (i.e., how many successful (long-lived) articles a content-producer has published to the stream).

Monday, April 11, 2011

Compulsion

The other day, a coworker echoed the oft-repeated notion that a certain degree of OCD is important for developers. I can see what he meant, but it set me thinking about my own experiences with compulsion.

Let me tell you a story.

When I was a kid, my elementary school had a large asphalt area with four equidistant, elevated buckets. Each bucket had four exits. When you'd toss a dodgeball into one, it would roll randomly out one of the sides.

Recess was chaos. Kids would run around, hoping the ball would come out their side so that they could shoot the next basket. It never occurred to me that this was a desirable property of the system.

Instead, I suffered intense anxiety. Four baskets. Four numbered(!) exits each. Clearly there were some rules. Either these kids were playing wrong, or they knew something I didn't. I was embarrassed by my ignorance, so I didn't ask. Instead, I'd try different patterns. I would try to throw the ball in such a way that it would use the next exit in sequence, or I would assign numbers to baskets and run to a different basket based on which exit the ball came out. The more difficult the problem and the more cleanly it incorporated all the elements of the problem space, the more satisfying the ruleset was. Surely I'd find the Right Way™.

There was also a climbable geodesic half-sphere on the playground. I would construct rules for how I would be allowed to navigate it. Whenever someone interfered with my careful graph walk, it was deeply upsetting.

Why didn't they understand?

Some people have what's called "a failure to compartmentalize." To some extent, almost everyone feels the need to make things line up. For some people, though, everything has to line up with everything else.

Consider Richard Feynman's account of a failing physics program:

"The students had memorized everything, but they didn't know what anything meant. When they heard 'light that is reflected from a medium with an index', they didn't know that it meant a material such as water. They didn't know that the 'direction of the light' is the direction in which you see something when you're looking at it, and so on. Everything was entirely memorized, yet nothing had been translated into meaningful words. So if I asked, 'What is Brewster's Angle?' I'm going into the computer with the right keywords. But if I say, 'Look at the water,' nothing happens - they don't have anything under 'Look at the water'!"

The students had placed their physical studies and their experience of physical reality in separate compartments. The term "compartmentalization" is often used with negative connotations to refer to people who use the scientific method effectively in their work but who also, for example, believe in homeopathy. Compartmentalization is not a purely negative thing, though. It also refers to the ability to frame radically different issues in a different emotional context. Staying on the right side of the road while driving is fundamentally different from, say, crunching autumn leaves under each of your feet in equal proportion.

My horses, let me show you them.

For someone raised in a strong, evangelical tradition, a failure to compartmentalize has its perks. Central to the tenets of that tradition is the idea that God is at the center of everything in your life. And for me, God was. I was among an elite cadre of God-fearing youths who would tussle over how best to reconcile various pieces of scripture to create a cohesive, comprehensive, and uncompromising vision of God's will. We'd debate fiercely both inside and outside bible study, and sometimes we'd even lead those studies.

It was as if we had each been given a shelf, and we had covered it in toy horses. There was a rhyme and a reason to the color, the sizes, the nose shapes, the sequence, and everything else about the horses on our shelves. If someone had a set of horses that didn't clash, we'd condescend to help them improve their collection. If someone had something other than a horse up there, we'd either browbeat them into submission or openly denigrate their heresy. Non-biblical sources of inspiration were deemed worldly; we'd been inoculated against anything but horses. At the center of it all, I had the conviction that my compulsion about my horse collection was, in fact, divine inspiration.

At some point, I became increasingly transfixed by the idea that there must exist Indian Buddhists – I had an evangelist's grasp on world religions at the time – who were as adamant about the collections on their shelves as I was about mine. They were as careful in their curation as I was, and they shared my sense of deep and abiding significance. What a dreadful challenge! Was this deep sense of significance merely a feeling shared across cultural and ideological boundaries, capable of inflecting certainty over a range of mutually conflicting ideas? If so, I would be able to invoke it in neutral contexts. Long story short, either I was right, or I'd somehow managed to convert my Ford Contour's steering wheel into a profound holy artifact.

I withered.

In college, I had a passionate affair with Libertarianism. Here was a framework that was broadly permissive (by my standards), yet still allowed me to exercise my compulsion in critiquing the world in terms of broader principles. Ah, I've found my voice again. But I soon found myself dismissing the genuine, human concerns of others for no other reason than that they failed to align with my rigid framework.

One of my college requirements involved writing a humanities paper. The paper, called a Sufficiency, takes the place of a full course. The student picks a professor. The two decide on a topic. The professor recommends reading material and incrementally reviews, then grades, the student's work. Before our meeting, I drafted an outline that hinted at the use of libertarianism as a launching point for the critical study and analysis of – what, European polity? New England literature? I don't recall. He suggested postmodernism.

Postmodernism? Okay.

I immediately became obsessed with the idea of narrative. The framing of a multitude of individual human experiences under a large analytical framework was termed "meta-narrative." To my disingenuous surprise, my compulsion to frame everything monolithically wasn't particularly novel or interesting; it was painfully common. And it was, I would quickly realize, a problem.

I figured that maybe instead of focusing on self-consistent frameworks directly, I could enumerate the sphere of human concerns, human narratives. I would hunt and collect them, like Pokémon. This took me into an exploration of privilege. I would quickly learn that some narratives were normative and others were not. This distinction first seemed reasonable, then arbitrary, then malicious, then, having spent time enumerating the narratives that resulted in narratives being normalized, I could no longer reduce the problem, but I could still see it as problematic. Early on, I fell into that foolish trap of going into feminist spaces, seeing that the normative view wasn't well-represented, and imagining that I had something to contribute by representing it. Quite the contrary, the value of these spaces is that other views are allowed to flourish in the hard-fought absence of the normative perspective. It was a hard-learned lesson, and I have nothing but admiration for those who left their teeth marks on my flesh.

And so, I'm a compulsive collector of pokéballs, obsessing more over what's missing than what's there. Over time, I've come to focus less on their self-consistency and more on their human authenticity. When people make naked assertions, be they social, technological, political, or culinary, I generally have to suppress the impulse to trot out my collection and demonstrate the concerns or possibilities they've neglected or antagonized; it generally doesn't make for good dinner party conversation, and if you're going to invalidate someone else, it should be for a good reason. The challenge, then, is not to exercise my compulsion to collect for its own sake, but to leverage its capacity in enabling technological progress and social change, ideally in tandem.

So. Does my compulsion make me a better developer? Ask me in a few years. I'm just hitting my stride.

Sunday, April 10, 2011

Constraint-Based Programming

Have you ever played Myst? The premise is that there's an Art for writing books that transport you to another reality. People skilled in the Art jot down coherent details about that reality and, in the process, become so involved with its conceptual visualization, like a radio's phase-locked loop, that they manage to identify one reality uniquely and bridge the gap between semantics and substance.

A similar effect is achieved with constraint programming. You start to sketch out your problem, identify ambiguities, and resolve them until you've described a coherent system.

There's a Java CSP library called Choco. Its developers are involved in the creation of JSR 331, which the 2010 JCP Awards body describes as pretty much the only innovative JSR in the 300s. Hah, faced. Ahem.

I work for a creative company and figured I'd try my hand at constraint programming, applying it to the old problem of text layout.

Line boundaries are determined by the bounds of line elements.
Lines may not exceed the boundaries of the text area.
Characters may be line elements. Their width is determined by font metrics and point size.
The space between characters is determined by font kerning.
The positional difference between adjacent lines is called leading.
If centering is enabled, the distance between a line and its bounding text area is the same on either side.
I want the point size to be as big as possible without exceeding X.

And so on... Then,

I'm interested in the position of individual line elements.
For PDF purposes, I'm also interested in greedily identifying each sequence of characters with identical formatting and no kerning.
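The greedy grouping in that last item can be sketched in a few lines of Python. This is an illustration, not the Choco model I actually wrote: the `Glyph` type, its fields, and `greedy_runs` are hypothetical names, and I'm assuming kerning is stored as an adjustment between a glyph and the one after it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Glyph:
    char: str
    font: str
    size: float
    kern_after: float = 0.0  # assumed: kerning adjustment between this glyph and the next


def greedy_runs(glyphs):
    """Group glyphs into maximal runs of identical formatting with no kerning inside a run."""
    runs = []
    current = []
    for glyph in glyphs:
        if current:
            prev = current[-1]
            same_format = (prev.font, prev.size) == (glyph.font, glyph.size)
            # Close the run on a formatting change or a kerned pair.
            if not same_format or prev.kern_after != 0.0:
                runs.append("".join(g.char for g in current))
                current = []
        current.append(glyph)
    if current:
        runs.append("".join(g.char for g in current))
    return runs
```

Each run can then be written to the PDF as a single string, with positioning handled only at run boundaries.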

The syntax wasn't beautiful*, but it was actually a pleasant programming experience. The kludgiest part was that I couldn't define adjacency constraints on ordered sets. (e.g. "The position of the next line element is equal to the position of the previous element plus the width of the previous element, plus the space between them.") To achieve that effect, I had to create individual constraints between each element. Unless Choco could identify that the constraints between the elements happened to be identical, there's no way it could handle that optimally.
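For the curious, here's what the adjacency rule amounts to when you solve it forward by hand. This is plain Python rather than Choco, and it computes the answer directly instead of declaring constraints, so treat it as a sketch of the relationship only. The function name and parameters (`layout_line`, `widths`, `gaps`, `area_width`) are mine, invented for this example.

```python
def layout_line(widths, gaps, area_width, centered=False):
    """Return the x position of each line element, applying the adjacency rule pairwise."""
    if len(gaps) != max(len(widths) - 1, 0):
        raise ValueError("need exactly one gap between each adjacent pair")
    line_width = sum(widths) + sum(gaps)
    # Lines may not exceed the boundaries of the text area.
    if line_width > area_width:
        raise ValueError("line exceeds the text area")
    # Centering: the distance to the text area is the same on either side.
    start = (area_width - line_width) / 2 if centered else 0.0
    positions = []
    x = start
    for i, width in enumerate(widths):
        positions.append(x)
        if i < len(gaps):
            # position[i + 1] = position[i] + width[i] + gap[i]
            x = x + width + gaps[i]
    return positions
```

The loop is the "one constraint per adjacent pair" workaround in miniature: the same rule, stamped out once for every neighboring pair.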

But it worked. It told me everything I asked for.

Let's face it, you want this analysis happening at compile time, not runtime. In this simple form, the layout engine was usable for one-off layout generation, but its suitability for real-time applications is dubious. Plus, I'm angling for solving well-constrained problems – problems to which constraint inference techniques can be applied to identify a single solution. I don't need something like Choco's incremental solver. I want my constraint compiler to write code that will take any compliant data model and produce the answers I want with as little processing as possible.

I also want it to only generate answers that code further down the pipeline specifically asks for. And I want the constraint compiler to discard constraints and relationships that aren't relevant to what's asked for. And I want the compiler to generate code that optimally handles the kinds of incremental changes to the data model that are possible in the application. And I want the compiler to infer which kinds of changes are possible. And I want to layer my constraint model in a modular, compositional way. And I want the cardinality and relationships of the components to be decided by an XSLT-flavored language that decouples the semantics of output and input data models. And I want useful circular dependencies – state that decides the view layer, and drag events that decide the state, with the compiler demanding the creation of handlers for drag events that over-constrain the problem space. And when the compiler's in debug mode, I want it to analyze the tree decomposition and inject a SillyWalk constraint that demonstrates readily to the developer where they failed to tighten things up.

And I want a pony.

*In all fairness, they created a parser to get around the syntax ugliness.

PS: Blogger HTML formatting is dead to me.

Tuesday, January 11, 2011

Better Services using DSLs

Domain-Specific Languages

I'm a big fan of declarative languages. My employer's largest website, eInvite.com, is largely XSLT-based. My office has a contingent of lesser and greater PLT Scheme fanatics. We've cultivated a lot of expertise in SQL, declarative Spring DI, and declarative Hibernate ORM. As big a fan as I am of declarative languages, I'm an even bigger fan of declarative domain-specific languages.

By expressing the behavior of your application in a language that deals directly with your domain, you can effectively guarantee that the part of the application that expresses your intent is correct, even if the underlying implementation is faulty.

Without a domain-specific language, you can rapidly fall into the trap of having your API centered around specific use-cases. Slightly different use cases involve the creation of new methods, resulting in slower development time, excessive function overloading, and parameter creep. If the full scope of domain functionality is extended to clients in the form of a language, clients can compose the appropriate semantic for their particular use-case without service-level modifications. The trick is preventing clients from doing things they're not supposed to in this highly enabling model. More on that later.

"But Adam," I hear you say. "Imperative semantics don't necessarily imply API creep. You can model your domain with objects, govern transactional behavior with annotations, and keep the methods on those objects tight and relevant."

True. And that's a great way to write libraries. But it's a crappy way to write services.

Client-State-on-Server Antipattern

Consider Facebook, eBay, PayPal. Each of these services provides an API. They're not bad APIs, but they don't fully express the domain exposed on the websites proper. Unlike a traditional client application, the websites don't use the service API. They're tightly integrated at the service level, with service-level presentation and a bagful of use-case-specific AJAX. Since the websites don't need the API, the API can remain impoverished or absent, or, even uglier, simply be a documented, augmented version of the AJAX calls that were originally designed to make the website punchier. I call this the Client-State-on-Server Antipattern.

The problems with server-side MVC approaches extend beyond anemic external APIs. One problem is authorization. What can be done to what by whom is a domain-level concern. Service-coupled approaches, however, allow this concern to bleed into the controller layer, and from what I've seen, most web projects gravitate in this direction. To create a degree of separation, some architectures prohibit the controller layer from directly handling domain objects, leading to the creation of managers with clumped sets of use-case-specific methods, divorced from the problem context that inspired the use-case.

Solution to the Antipattern

1. Make your website use the same API as everybody else.
2. Implement permissions at the domain level.
3. Make your API a DSL interpreter, allowing your clients to exercise their domain intent cleanly and concisely in as few requests as possible.

None of these steps are trivial. There are two approaches to enforce #1. The first is to make a traditional MVC application that makes calls to a separate (different VM, different port) domain service. The second approach, which I'm more interested in, is to have all requests to the domain layer be made from the browser, by way of a fully-fledged client-heavy web application enabled by modern JavaScript engines and, nominally, HTML5 features.
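As a rough illustration of permissions at the domain level combined with an API that interprets a DSL, here's a toy sketch in Python. It's deliberately tiny and entirely hypothetical: the `Domain` class, the three verbs, the list-of-lists program format, and the owner-only update rule are all assumptions made for the example, not a description of any real service.

```python
class PermissionDenied(Exception):
    pass


class Domain:
    """Hypothetical domain layer: owns both the data and the rules about who may touch it."""

    def __init__(self):
        self.records = {}  # record id -> {"owner": user, "fields": {...}}

    def create(self, user, record_id, fields):
        if record_id in self.records:
            raise ValueError(f"record {record_id!r} already exists")
        self.records[record_id] = {"owner": user, "fields": dict(fields)}
        return record_id

    def read(self, user, record_id, field):
        # Assumed rule for this sketch: reads are open to everyone.
        return self.records[record_id]["fields"][field]

    def update(self, user, record_id, field, value):
        record = self.records[record_id]
        # Assumed rule for this sketch: only the owner may modify a record.
        if record["owner"] != user:
            raise PermissionDenied(f"{user} may not modify {record_id}")
        record["fields"][field] = value
        return value


def interpret(domain, user, program):
    """Evaluate a batch of DSL expressions sent in a single request."""
    operations = {"create": domain.create, "read": domain.read, "update": domain.update}
    results = []
    for verb, *args in program:
        if verb not in operations:
            raise ValueError(f"unknown verb: {verb}")
        results.append(operations[verb](user, *args))
    return results
```

A client could send `[["create", "card-1", {"title": "Draft"}], ["update", "card-1", "title", "Final"], ["read", "card-1", "title"]]` as one request. The interpreter never checks permissions itself; the domain methods do, so a website and a third-party client hitting the same interpreter get identical rules.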

In future posts, I'll explore various developer-friendly solutions in this problem space, as well as opportunities for novel architecture projects. I've got some rough design documentation typed up, and many of the concepts therein as well as discourses on relevant extant projects will pop up on this blog in due time. Fans of REST, XPath, cloud computing, parsers, custom editors, GWT, and code generation will all have something to chew on.

Introduction

Here's a little info on what this blog is about.

The Roxton Group is an organizational placeholder for research and work done independently of my employer. It is currently an organization of one, and I hope readers will forgive the hubris of the name, as I intend to continue using the moniker when I leverage the technology discussed on this blog to broader ends. More on that later.

In this blog, I'll discuss what I'm working on in addition to various topics of interest. These posts will reflect my current thinking, and as such, will often contain errors or even reflect an unfortunate confusion of ideas. It's my hope that publishing my thoughts on my work and various technical subjects will improve the quality of my writing, render my ideas more articulate, spur me to improved pacing in implementation, encourage the adoption of improved development techniques and paradigms among my readership, and allow me to benefit from insight imparted by readers.

My interests are centered around the nuts and bolts of website development, social media, and improved forms of web-enabled collaboration and commerce.