Brendan's Braindump: November 2008

Friday 21 November 2008

Some Agile notes from QCon SF 2008

I'm not the most excited person about Agile (hey if it gets the job done better and quicker then great, but that's not going to turn me into a religious fanatic) but I saw some interesting stuff about Agile today so I should report.

Some guys from salesforce.com spoke about how they migrated 50 teams, over 700 people, to Scrum -- all at once! Their rationale was that it was "like burning their boat after they had rowed to the other shore", and if some teams moved and others didn't, it would be a recipe for finger-pointing, missed dependency deadlines, and recriminations. Coupled with the fact that they all commit to the same codebase (!!) it sort of makes sense.

Notes:

Had to spend a lot of time training product owners
Now 75% test coverage
They have an “emergency brake” feature — if the build breaks, fixing it becomes number one priority, they don’t just roll back or deal with less coverage (this comes straight from Toyota's production system, where their assembly-line staff can stop the whole production line if they detect a problem coming from the previous workers, and all focus shifts to fixing the problem -- which usually only delays the production process by a minute or so)
They had lots of pain, people complained a lot — but they got over it in a few months
Phases:
- rollout (introduced tools – own app on force.com; office hours idea for UX/documentation people)
- adoption (release planning, sustainable velocity, reducing overcommitting)
- excellence (moved systems testing to iterative approach, scrum of scrums as issue discussions rather than status reports, dependency tracking between teams)
- expansion
Focus this year is working with customers and partners and helping them go agile
Moving to IT and operations going agile..!
Some teams use TDD, some have QA people, no one fixed type
Systems testing people work with high-risk code first (which is identified at architecture stage)
They have short sprints with "releases" every month, but they only do a release three times a year, and they have a non-agile month at the end of each cycle for testing and deployment (a "release sprint")
They use continuous integration even for db schema changes

What they would do differently:

involve more individuals earlier: eg openspace/unconference meeting
earlier, more intense training
more coaching
giving concrete deliverables to executives during the rollout, get them engaged

Keys to success

exec sponsorship and commitment
focus on principles rather than mechanics (eg not necessarily daily scrum standups etc)
focus on making okay teams excellent to get standard-bearers, rather than working with worst teams - “know which fires to let burn and which to fight”
radical transparency – trust.salesforce.com has uptime figures – accountable to customers
when the heat is on, stick to your principles
we failed – all along the way – we experimented, were patient and expected to make mistakes
having multiple teams helped — it’s not always agile that makes teams fail, so higher number of teams gives more chance of success

Slides (but if you search slideshare.net for "salesforce adm" you'll see many many more presentations -- they've obviously been milking this one for a while!)

Also on Agile, I'll report a few quotes I heard from the silicon valley agile users group, which had some acronym I missed. They were having a meeting in the main room while I was doing my email, and although it started off pretty dull, they soon got animated and were saying some very intelligent things about how people were getting too caught up in the ceremonies of scrum (chickens and pigs, standups, only three questions etc) and they were forgetting that the whole point of the manifesto is to be agile, to change things when you need to, and to deliver customer value and business value. It gladdened my heart!

Some choice quotes:

(When a prominent agile coach was asked how to sell agile to upper management:) "I ran a large agile project for (a very large organisation) for two and a half years... the whole time my boss thought it was a waterfall project"
“agile can be gamed, just like anything else”
"If you measure velocity by how many tasks you complete, people will complete all the easy but useless tasks. We shouldn't be measuring task velocity, we should be measuring business-value velocity... people do what they are measured by" [the mantra our VC professor Terry has drilled into us: structure drives behaviour]
“we’ve gone from one dogma [waterfall] to another dogma [scrum and XP] -- I thought the whole point was that we were supposed to be agile. What happened to thinking?”

Mark Nottingham on "HTTP Status"

From the author of the XML and AtomPub specs to the author of the other half of the picture, the Atom Syndication Format spec -- Mark Nottingham. Seems like a nice guy, he apparently lives in Melbourne now! Smart guy.

Quick notes...

HTTP/1.1 was basically only written to “contain the damage” of 0.9 and 1.0 (vhosting, persistence, caching)
Mark was involved with the WS-* stack -- but he graciously apologised to the room for his sins ;-) An interesting comment regarding SOAP etc was that “having that much extension available in a protocol is socially irresponsible - protocols are all about agreement” and you need to draw lines to make something useful. He was basically saying that WS-* allows you to do too much, giving you enough rope, and making the normal case hard just to make an extreme case possible. (Or something like that, if there's a blog post where he explains himself I'll gladly link to it instead of badly paraphrasing him)

Mark had a neat way of saying that RESTful APIs “use HTTP as protocol construction toolkit”. They're not built on top of HTTP, they're built as part of HTTP (in a way).

HTTP 1.1 bis: With Roy Fielding and others, Mark is working on "HTTP 1.1bis", a rewrite of the HTTP spec to make it much easier to read, to resolve ambiguities, and to define edge cases that were missed in the first version (eg "what happens when you put an ETag on the response to a PUT"?!). All this sounds very esoteric but people are really pushing the boundaries of HTTP these days with streaming services, Comet and Ajax, etc, so it's best to resolve the differences now rather than wait for implementations to define behaviour (and possibly have two versions of what happens in these scenarios)

One question I was wondering is how they will market the new spec: if it's being sold as "just a rewrite to make it easier to understand", then people won't pay much attention, but if people start creating new web servers that are "HTTP/1.1bis compliant" then it's a new standard, not a rewrite, and might as well have some new functionality as well! It's not obvious how this will work.

Compatibility: Mark mentioned an interesting point in passing: that “an http/1.0 server can still possibly take 1.1 directives” -- with squid as the canonical example. Squid officially doesn't support HTTP/1.1 yet, but it actually supports most 1.1 directives and commands.

HTTP methods: conventional wisdom says that intermediaries might reject PUT and DELETE verbs due to security concerns, old gateways etc, but Mark asserted that it doesn’t really happen in practice. Google created a workaround whereby they send everything as a POST and have an extra HTTP header, “X-HTTP-Method-Override”, to "pretend" to do a PUT or DELETE. A bit silly really, signs that things need to change!
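
A rough sketch of how a server might honour that kind of tunnelling. The header name used here, X-HTTP-Method-Override, is the one Google's data APIs document; the function name and the rule that only a POST may be overridden are my own assumptions, not something Mark described.

```python
# Hypothetical helper: work out which method a tunnelled request really meant.
ALLOWED_OVERRIDES = {"PUT", "DELETE"}  # assumption: only these may be tunnelled


def effective_method(method, headers):
    """Return the method the client intended, honouring the override header."""
    # Header names are case-insensitive, so normalise them before the lookup.
    normalised = {name.lower(): value for name, value in headers.items()}
    override = normalised.get("x-http-method-override", "").strip().upper()
    # Only rewrite a POST; a GET carrying the header is left alone.
    if method.upper() == "POST" and override in ALLOWED_OVERRIDES:
        return override
    return method.upper()
```

Restricting the override to POST keeps a "safe" GET from being silently turned into a DELETE by a stray header.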

URI length: IE still limits URIs to 2k in length. Squid limits headers to 20k. HTTPbis is going to recommend at least 8k.

Cache testing: Coadvisor is a test suite for intermediaries

Headers/trailers: Most web programmers know how annoying it is to have to set all HTTP headers before you output any text. So they're thinking of “trailers” as well as headers to the envelope of your payload. This could be really useful.
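
For what it's worth, the chunked transfer coding in HTTP/1.1 already has a slot for trailer fields after the last chunk (the sender is also supposed to announce their names in a "Trailer" header up front). A minimal sketch of what that looks like on the wire; the function name is made up, and the trailer field in the usage note is only an illustration.

```python
def chunked_with_trailer(chunks, trailers):
    """Encode body chunks with chunked transfer coding, then append trailer fields."""
    out = bytearray()
    for chunk in chunks:
        if not chunk:
            continue  # a zero-length chunk would terminate the body early
        # Each chunk is: size in hex, CRLF, the data, CRLF.
        out += b"%x\r\n" % len(chunk) + chunk + b"\r\n"
    out += b"0\r\n"  # last-chunk marker
    for name, value in trailers.items():
        # Trailer fields use the same "Name: value" form as ordinary headers.
        out += name.encode("ascii") + b": " + value.encode("ascii") + b"\r\n"
    out += b"\r\n"  # blank line ends the trailer section
    return bytes(out)
```

Calling chunked_with_trailer([b"hello", b" world"], {"X-Checksum": "abc"}) gives a body whose checksum field arrives after the payload, which is exactly the "set it once you know it" behaviour headers can't offer.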

Something about a 307 redirect for POST – not handled by safari... I kinda missed that bit?

Request-side cache control isn’t well supported – eg the act of posting to a blog should be able to invalidate the cache. HTTP currently has request cache control: eg “I’m okay with this being up to X seconds old”
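
The “up to X seconds old” case corresponds to the max-age directive in a request's Cache-Control header. A simplified sketch of how a cache might check it; the function name is hypothetical, and it deliberately ignores the response's own freshness rules and every directive other than max-age and no-cache.

```python
def acceptable_from_cache(request_cache_control, cached_age_seconds):
    """True if a cached copy of the given age satisfies the request's Cache-Control."""
    for directive in request_cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        name = name.lower()
        if name == "no-cache":
            return False  # client insists on going back to the origin
        if name == "max-age" and value.isdigit():
            if cached_age_seconds > int(value):
                return False  # copy is older than the client will tolerate
    return True
```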

Request pipelining – not supported except in safari, but would be v useful if it worked

Data ranges: need to be better supported. We should be able to jump to a section of a video etc without putting query params in the URI (although one thing you get from a URI is addressability, which shouldn't be overlooked in the quest to make things neat from an architecture perspective...)
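
HTTP already has a Range request header for byte ranges (eg "Range: bytes=0-499"), which is the mechanism this would build on. A minimal sketch of resolving a single byte range against a resource of known length; the function name is my own, and multi-range requests and non-byte units (such as a time offset into a video) are simply rejected here.

```python
def resolve_byte_range(range_header, total_length):
    """Return (start, end), inclusive, for a single 'bytes=' range, or None."""
    unit, _, spec = range_header.partition("=")
    if unit.strip().lower() != "bytes" or "," in spec:
        return None  # this sketch only handles one byte range
    first, dash, last = spec.strip().partition("-")
    if not dash:
        return None
    if first == "":
        # Suffix form "bytes=-N": the final N bytes of the resource.
        if not last.isdigit() or int(last) == 0:
            return None
        start = max(total_length - int(last), 0)
        end = total_length - 1
    else:
        if not first.isdigit() or (last and not last.isdigit()):
            return None
        start = int(first)
        # An open-ended or over-long range is clamped to the last byte.
        end = min(int(last), total_length - 1) if last else total_length - 1
    if start > end or start >= total_length:
        return None  # unsatisfiable
    return start, end
```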

OAuth: the IETF/OAuth BOF working group the other day went well, it could have been a culture clash but the "grey hairs" were visibly excited by the enthusiasm and drive of the OAuth guys (I guess this was Chris Messina, Eran Hammer-Lahav etc) and it ended up being "a bit of a love fest". So OAuth looks like becoming an IETF standard. Let's hope that means HTTP authentication improves a lot as a result.

New transport protocols: Looking at HTTP over SCTP, a streaming protocol I am not really familiar with. Mark is thinking of proxy-to-proxy overlays: one point-to-point many-streamed SCTP connection being muxed/demuxed to TCP at the edges.

Prefer header: this is in an Internet-Draft now. It goes beyond content negotiation, which lets you choose languages, encodings etc: Prefer lets you ask for semantically different content, eg only summaries or only pictures.
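
A small sketch of how a server might read such a header. It assumes the comma-separated "token" or "token=value" syntax that the Prefer header ended up with when it was later standardised; the function name is made up, and per-preference parameters (after a ";") are simply dropped.

```python
def parse_prefer(header_value):
    """Parse a Prefer header value into a {preference: value} dict."""
    prefs = {}
    for item in header_value.split(","):
        token = item.split(";", 1)[0].strip()  # ignore any parameters
        if not token:
            continue
        name, _, value = token.partition("=")
        # If a preference is repeated, keep the first occurrence.
        prefs.setdefault(name.strip().lower(), value.strip().strip('"'))
    return prefs
```

A server is free to ignore anything it doesn't understand, so an unknown preference just ends up as an unused dict entry.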

Typed links: making a comeback: “this invalidates X, the previous one is Y, edit this at Z” - similar to what Atom does with prev/next/edit etc. There will be a controlled list of types based on URIs (very semweb, which is nice), using the registry which already exists for Atom. (Mark didn't mention that he is the author of the internet-draft!)
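
To make that concrete, here is a tiny sketch that serialises typed links into a Link header value. The function name and example URIs are hypothetical; "previous" and "edit" are relation names taken from the Atom registry mentioned above.

```python
def format_link_header(links):
    """Serialise (target_uri, relation_type) pairs as a Link header value."""
    # Each link is written as <uri>; rel="type", and links are comma-separated.
    return ", ".join('<%s>; rel="%s"' % (uri, rel) for uri, rel in links)


# Hypothetical example: a blog entry pointing at its predecessor and its edit URI.
header_value = format_link_header([
    ("/posts/41", "previous"),
    ("/posts/42/edit", "edit"),
])
```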

What does the future hold...?

libraries of “higher-level but still RESTful abstractions,” ie systems that let users - webmachine is an example
Rack::Cache extends HTTP libraries to provide a better cache implementation built in to Ruby; he hopes to see similar libraries in other languages and frameworks soon
Building blocks for intermediaries, so people don't have to extend Squid every time they want to build some kind of intermediary system – eg xLightweb (Java)
The “O2.0” stack -- openid, oauth, comet, html5, gears etc -- “fail to consider overall architecture” - “cowboy development on the web” - “new pseudo-standards” - basically he wasn't very friendly to them! But I think he is much happier now that Messina etc are working with the IETF.

Link: http://tools.ietf.org/wg/httpbis/

Thursday 20 November 2008

QCON SF 08: Tim Bray on storage and persistence trends

The keynote this morning was Tim Bray, a bit of a guru in the unix and web development scene, who helped write the original XML spec among many many other things.

Some quick notes before getting to the meat of his talk:

The Drizzle db project is worth keeping an eye on – a key mysql committer forked mysql's code to focus on the most minimal sql engine possible — one date type, one float type, no triggers, as simple as possible, but fast and reliable. The idea is so compelling that people are apparently running it in production even though it's barely in alpha!

“Column oriented databases” are about to see their day – BigTable in Google AppEngine is probably the best indicator at the moment

CouchDB has just become a top-level apache project – the author is now employed by IBM, and the project is going really well. I know Dirk and the guys back at the ranch have been looking at the product so this is good news. Some quick CouchDB facts:
- REST-based, built in Erlang
- uses the “eventually consistent” model
- it has a nifty way of using MapReduce functions on the server to do views! (which could even be adapted to do "stored procedure" type functionality I guess)
- HTTP is the only access protocol - “the most debugged protocol on the internet”
So it sounds like CouchDB is here to stay. Good news.
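
To give a feel for the MapReduce-views idea, here is a toy sketch in plain Python. CouchDB itself runs view functions written in JavaScript on the server, so this only illustrates the shape of the idea: the document fields, function names and in-memory "database" are all made up.

```python
from collections import defaultdict


def build_view(docs, map_fn, reduce_fn):
    """Run map_fn over every document, group emitted rows by key, reduce each group."""
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    # Views come back ordered by key, so sort before reducing.
    return {key: reduce_fn(values) for key, values in sorted(grouped.items())}


def posts_by_author(doc):
    # Map step: emit one (author, 1) row per blog post, skip everything else.
    if doc.get("type") == "post":
        yield doc["author"], 1


# Hypothetical documents standing in for a database.
docs = [
    {"type": "post", "author": "tim"},
    {"type": "post", "author": "mark"},
    {"type": "comment", "author": "tim"},
    {"type": "post", "author": "tim"},
]
view = build_view(docs, posts_by_author, sum)  # number of posts per author
```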

Atompub vs WebDAV: performance is always questioned, but Bray is building an atompub server apache module, mod_atom that seems to perform pretty well, and he hasn't even started optimising it yet. Sounds like mod_atom is something else to keep an eye on.

Facebook gets 90,000 transactions/sec using memcached! (that's good... very good)

The main guts of the talk was a walkthrough of the different layers of storage required by modern computer systems, in order of performance:

registers on a CPU,
Local cache (l-cache) in the processor,
DRAM on the server,
distributed hash table accessed over a network (eg memcached),
solid-state storage (ie Flash memory),
magnetic disk (or as Tim called it, "spinning rust"),
tape (which as Tim reminded us is used more than ever due to regulations like Sarbanes-Oxley requiring everyone to keep everything practically forever)

The news here is (a) a validation of our approach at the BBC's Forge project, where we use memcached as a critical part of the scaling infrastructure for dynamic publishing, just like a growing set of people, like Facebook, Yahoo, and many more, and (b) the introduction of solid-state storage to the list -- and so high up in the list!
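
The usual way a memcached-style layer sits in front of slower storage is the "cache-aside" pattern, sketched below. This is not how Forge or anyone else named above actually does it: TinyCache is an in-process stand-in I made up so the example runs without a memcached server, and all names and the 60-second expiry are assumptions.

```python
import time


class TinyCache:
    """In-process stand-in for a memcached-style get/set store with expiry."""

    def __init__(self, clock=time.monotonic):
        self._items = {}
        self._clock = clock

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._items[key]  # expired: behave as a miss
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._items[key] = (value, self._clock() + ttl_seconds)


def fetch_page(cache, key, render, ttl_seconds=60):
    """Cache-aside: serve from the cache if present, otherwise render and store."""
    page = cache.get(key)
    if page is None:
        page = render()  # the slow path: database queries, templating, etc
        cache.set(key, page, ttl_seconds)
    return page
```

The point of the pattern is that the expensive render only runs on a miss, so most requests never get further down the storage list than the cache.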

But the thing that really got Tim excited was not just his impressive figures on how much faster solid state could be on the right filesystem (which was a bit of an ad for a new server released by his employer, Sun), but the fact that SSD has Moore's law on its side: as opposed to "spinning rust", SSD is all silicon, so it will only increase in price/performance over time.

As Tim says, "Ladies and gentlemen, you are looking at the future."

Note: For the business students that might stumble upon this blog, here's a reward for reading through all that techy stuff: Sandisk own many many patents in solid-state storage and they were strong enough to shrug off Samsung's offer a couple of months ago, so they could be an interesting stock to watch as solid-state disks become a key part of more and more high-end computer systems... but they're going down right now, and they might not have hit bottom yet as it looks like all the analysts are downgrading them one by one (not to be taken as investment advice blah blah)

Tim's slides (warning: the /tmp/ in the URL gives the indication that they may not be there forever...)

QCon SF 2008, day two

It's a shame that I missed the first day of QCon SF, especially as I enjoyed it so much last year. I was a paid delegate in 07 but this year I qualify for "starving student" rates (ie free), in exchange for volunteering by manning one of the session rooms tomorrow. Luckily that's the room about distributed databases so I probably would have gone to a lot of those sessions anyway. (For people looking for where to go, the "how did they build that?" architectures track is always great as well, but many of the people speaking were there last year as well so I don't really mind not going to that track, but I recommend it if you're into that stuff.)

I couldn't go to day one as I had classes yesterday, but from talking to a few people it seems that the better sessions were all today anyhow. The logistics were all great as usual, the wireless came thick and fast, and the food was pretty good -- ya gotta love that mid-afternoon ice cream run to keep the sugar levels high!

I'll spread the news over a few posts so if people want to comment on a particular section they can do so.

PS I might try to dig up my notes from last year's conference, as they seemed to go down well when I sent them around internally last year. That will probably qualify as the most out-of-date blog post ever!

Reviving the blog

An even two years after my last post, I figure it's about time to get this thing started again. Especially while I'm studying in Berkeley, I have a lot to say -- too bad I've left it so late that I only have about two weeks left... well I'd better hurry up and post some stuff then.

Apologies that there will be a mix of current stuff (eg QCon this week) and old stuff in no particular date order. I'm sure you can work things out, and anyway I don't think I'll be saying anything so cutting edge that the dates actually matter.

Enjoy, here's to at least a few posts before the next two-year gap...
