The Holy Grail of Kickass API Documentation

Disclaimer: this is a bunch of vaporware.

Read/Write Asymmetry

It’s been a while since people solved the problem of making API documentation easily readable, while keeping it closely tied to the code: keep it in comments in the source code and export it to HTML.

One problem that still remains is that developers suck at writing documentation. They have no need for it (it’s just a waste of time: they already know how that stuff works), and they often don’t use the API they develop themselves, or at least not in all possible use cases, so they end up not knowing very well which information is most valuable to the users.

Users, on the other hand, know it all too well. They know a hack around that quirky behavior; they inadvertently found that undocumented feature (or was it a bug?); they have collectively field-tested it in a hundred times as many use cases as the author had originally foreseen; and they know what needs to be known.

They write about it too! You can see countless blog posts, forum answers, even comments appended to the documentation itself taking up where it left off. It just happens that this stuff never makes its way back into the official docs because it’s too much of a hassle!

If you’re just a user of some library, you usually have it installed in some far-away system directory, outside of source control, and read-only. Making a change to the documentation means finding where the repository is hosted, checking it out locally, finding where in the code that bloody method is implemented, making a change and sending a patch (where to, again?). That’s not your job, right?

So why don’t we just make it all a wiki, huh? Easy as pie. No funky markup in comment blocks, no generation step, it’s all HTML. You’re reading it, you spot something wrong or incomplete, edit it right there and you’re back to reading before you can say “back to reading.” And where is the code now? Well, somewhere completely dissociated from the documentation. You’re not even sure which version you’re reading about.

So what we really want is API documentation that:

  • comes directly from the source code;
  • is easily editable as a wiki;
  • and goes back to the source code.

There and Back Again

Rdoc.info takes any arbitrary repository and generates API documentation for you. So far it only works for Ruby repositories on GitHub, but it’s the only one I know, so it’s the one I’ll talk about. The idea is applicable to anything that does the same type of auto-generation though.

That’s really cool then! Our first requirement is covered.

The second requirement is making a wiki, so I think we can borrow from about five million solutions.

Now HOW?! How do you turn that stuff back into source code again?!

chmod +w RDoc

Documentation on rdoc.info is generated per repository, per user, so you can decide, for each repository, if its documentation should be wiki-editable or not (if you made it this far I assume you really want it really bad). In that case, all you have to do is create a public branch called “docs”, and rdoc.info starts tracking that instead of the master branch, generating a wiki from it. Now you just tell everyone to come and edit!

Gathering edits

Rdoc.info forks your repository and commits every wiki edit to that fork. It’s quite simple: it just replaces the comment block next to a method/class/whatever with the edited version. On every commit, you receive a pull request.
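
Here is a rough sketch of that replacement step. It is in Python only to keep it short, and it is not how rdoc.info works today. The function name `replace_doc_comment` is made up, and it assumes the docs are plain `#` comment lines sitting directly above a `def` in a Ruby file.

```python
import re


def replace_doc_comment(source: str, method_name: str, new_doc: str) -> str:
    """Swap the '#' comment block directly above `def method_name` for new_doc.

    Hypothetical helper: assumes Ruby-style '#' comments with no blank line
    between the comment block and the method definition.
    """
    lines = source.split("\n")
    # Match `def name` or `def self.name`, but not a longer name such as `name!`.
    pattern = re.compile(
        r"^\s*def\s+(?:self\.)?" + re.escape(method_name) + r"(?![\w?!=])"
    )
    for i, line in enumerate(lines):
        if pattern.match(line):
            indent = line[: len(line) - len(line.lstrip())]
            start = i
            # Walk upwards over the contiguous comment lines above the def.
            while start > 0 and lines[start - 1].lstrip().startswith("#"):
                start -= 1
            # Re-emit the edited text as comment lines at the same indentation.
            new_block = [f"{indent}# {text}".rstrip() for text in new_doc.split("\n")]
            return "\n".join(lines[:start] + new_block + lines[i:])
    raise KeyError(f"method {method_name!r} not found")


ruby_source = "\n".join([
    "class Job",
    "  # Old text.",
    "  # More old text.",
    "  def run(x)",
    "    x",
    "  end",
    "end",
])
edited = replace_doc_comment(ruby_source, "run", "Runs the job.\n\nReturns x.")
```

Only the comment lines change. The code itself is untouched, which is what should keep the resulting pull request easy to review.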

Displaying

On the website, rdoc.info shows the latest wiki-edited docs by default, with a read-only “canonical” tab showing the last official version that was pushed before the recent edits.

Merging back

When you receive the pull request from rdoc.info, you merge rdoc.info’s branch back into yours, resolving possible conflicts and integrating the changes.

Edit History

What if you push to your “docs” branch when there are wiki commits that you haven’t integrated? Rdoc.info can’t attempt a normal merge, as there might be conflicts, and the whole thing has to be automatic. If it just does a “reset hard”, it’ll lose previous edits. So it does a merge, but with a no-hostages approach: automatically resolve every conflict by choosing your changes. Your push then counts as a normal edit in the wiki, with the previous version still in the history.
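
To make the no-hostages rule concrete, here is a minimal sketch that treats each version of the docs as a dictionary from method name to comment text. The dictionaries and the function `merge_owner_wins` are illustrative assumptions. A real implementation would work on git trees, where merging the owner's branch into the wiki branch with git's `-X theirs` strategy option gets you roughly the same behavior.

```python
def merge_owner_wins(base: dict, owner: dict, wiki: dict) -> dict:
    """Three-way merge of doc comments, resolving every conflict for the owner.

    base  -- docs at the last point where owner and wiki agreed
    owner -- docs in the owner's new push
    wiki  -- docs at the head of the wiki edits
    """
    merged = {}
    for name in set(base) | set(owner) | set(wiki):
        b, o, w = base.get(name), owner.get(name), wiki.get(name)
        if o != b:
            # The owner touched this comment: the owner's text wins,
            # whether or not the wiki also edited it.
            chosen = o
        else:
            # The owner left it alone: keep whatever the wiki has.
            chosen = w
        if chosen is not None:  # None means the comment was removed
            merged[name] = chosen
    return merged


base = {"run": "Runs.", "stop": "Stops."}
owner = {"run": "Runs the job now.", "stop": "Stops."}
wiki = {"run": "Runs teh job.", "stop": "Stops the worker gracefully."}
merged = merge_owner_wins(base, owner, wiki)
```

In this example the conflicting edit to `run` is dropped in favor of the owner's text, while the wiki edit to `stop` survives because the owner never touched it.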

Reverting to a particular version from the wiki interface is the same thing: the contents (only the doc comments, not the code) of the chosen version are committed on top of the current commit, and can be diffed with it.
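
The point is that the history only ever grows. A small sketch of that, assuming a hypothetical `DocHistory` class that keeps each version's doc comments in memory (the real thing would be git commits):

```python
from dataclasses import dataclass, field


@dataclass
class DocHistory:
    """Linear, append-only history of doc comments (illustrative only)."""

    versions: list = field(default_factory=list)  # entries: (label, docs dict)

    def commit(self, label: str, docs: dict) -> None:
        # Store a copy so later changes to `docs` cannot rewrite history.
        self.versions.append((label, dict(docs)))

    def head(self) -> dict:
        return self.versions[-1][1]

    def revert_to(self, index: int) -> None:
        # A revert is just another commit carrying the old contents,
        # so the versions in between stay reachable.
        _, docs = self.versions[index]
        self.commit(f"revert to #{index}", docs)


history = DocHistory()
history.commit("wiki edit", {"run": "Runs the job."})
history.commit("wiki edit", {"run": "Runs the job, badly vandalized."})
history.revert_to(0)
```

After the revert, the head matches version 0 again, and the vandalized version is still sitting in the history at index 1 if anyone wants to diff against it.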

Attribution, Statistics, Reputation

If the user making the edit is logged in, the commit can have her as the author, and statistics can be kept on who helped out the most and that kind of stuff. Methods with the least content can be marked as “stubs”, and editing those can be worth more points.
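
A toy version of the stub idea could look like the sketch below. The word threshold and the point values are numbers I pulled out of thin air, and the function names are made up as well.

```python
STUB_WORD_LIMIT = 10  # assumed: docs shorter than this count as stubs
BASE_POINTS = 1       # assumed: points for any real edit
STUB_BONUS = 4        # assumed: extra points for fleshing out a stub


def is_stub(doc: str) -> bool:
    """A method is a stub when its documentation has very few words."""
    return len(doc.split()) < STUB_WORD_LIMIT


def points_for_edit(old_doc: str, new_doc: str) -> int:
    """Score one edit; edits to stubs are worth more."""
    if old_doc == new_doc:
        return 0  # nothing changed, nothing earned
    return BASE_POINTS + (STUB_BONUS if is_stub(old_doc) else 0)


stub_edit = points_for_edit("", "Runs the job and returns its exit status.")
long_doc = "Runs the job in a worker process and returns the exit status code."
typo_fix = points_for_edit(long_doc, long_doc.replace("exit", "final"))
```

Counting words is crude, of course. It says nothing about whether the edit was any good, so this would only ever be a starting point for the statistics.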

More into metadata territory, the developer could also possibly benefit from indirect usage statistics (through clicks and searches), and people up/downvoting individual methods, maybe even commenting on their design (whether they should be split in two; if the parameters or return values make sense, etc.).

You want it too?

Of the few services I know which host API docs, I think the most likely to implement something like this are either rdoc.info or ApiDock. So write them and ask for it! (Hey, I told you it was vaporware.)

Update: Apparently, something like this has been tried already by DocBox as a Google Summer of Code project. The code on Github is from last year and the website pointed at by RubyFlow seems to be down. Anyone with more info on this?

  • jeffrafter
    Reading through your post this is very similar to something nick and I batted back and forth when we were making the site. In general we had considered taking this even further by doing cross-site commits, which would make use of github's already existing web-based commit system. This was more apparent when we made http://docs.github.com (the rdoc.info mirror on GitHub). Auto-forking is of course a much more in-depth part of that. Also, keep in mind that we store docs for every revision of a project... so it is possible that the default commit (the most recent) may not show your changes until they are accepted and folded in. How would you handle that?
  • The model I mentioned for the edit history above takes care of this. For repositories that choose to allow wiki editing, the default commit shown wouldn't be the most recent from the owner, but the most recent wiki commit (which, by definition, would be ahead of the latest owner commit). The wiki history would be like:
    edit - edit - merge from owner - edit - merge from owner - edit - edit

    And new changes would be instantly visible. The latest owner version will still be reachable from the history, and you can make it somehow visible to the visitor if they're seeing a page with changes that weren't yet folded in, linking to the latest pristine copy.
  • jeffrafter
    I think I am still a little confused on the "no-hostages" approach. Many of the projects hosted on rdoc.info were put there by people other than the project owner. It is likely that the owner will make a change after the wiki edit which will conflict. It seems like choosing the wiki version outright is dangerous... the owner may have changed the documentation because the implementation of a method changed... in which case you wouldn't want to save the wiki edit. It seems like the owner version needs to always be the one that shows.

    In our version, instead of maintaining a separate branch you do exactly what github encourages you to do and maintain a separate fork (which is just an ad-hoc kind of branching). When you make a change, we should do the following:

    (1) If you are a committer on the project (or owner): automatically commit the change that you make to the wiki style interface. This is the easiest thing.

    (2) If you are not a committer: fork the repo, but not as the rdoc.info user... as the current user (we will need to connect the github user to the rdoc.info user). Then make the commit to that fork and redirect from the current docs view to the user's new forked version of the docs (which we auto-generate). Send a pull request.

    (3) Modify the interface to list related forks, revisions, and tags. This is planned already, but would become pretty important.

    Does that workflow make sense?
  • Oh wait. I need to clarify the difference between the "no-hostages" thing and the "which version to display by default" thing.

    When you go to the docs for a repository, you see the *latest possible version*, regardless where it came from. If that's the owner's latest commit, you see that. If it's a wiki edit after that latest owner commit, you see the wiki version. That's the "which version to display by default" thing.

    The "no-hostages" thing is the opposite of what you thought. When the owner commits something, *that* becomes the latest version. The previous wiki edit is moved to the history, and the owner's version is displayed (which is the same as saying that, in the merge, all conflicts are resolved *in favor of the owner*.)

    Of course, for repositories that were put in rdoc.info by someone other than the owner, it would be frustrating for wiki editors to see their changes being clobbered by commits from the oblivious owner all the time (even though still recoverable from the history). That's why I suggested that wiki editing be opt-in by the owner.

    One problem with the workflow you suggested is that it creates a multi-headed wiki, without a clear linear history. And if you don't show the latest edits immediately (waiting instead until the owner folds them in), there's a higher chance that people will step on each other's toes, fixing the same typo or editing the same line multiple times. The longer feedback cycle makes conflicts more likely and integration more difficult. Worse than that, the workflow is incompatible with the way rdoc.info tracks documentation.

    Rdoc.info tracks documentation *per-fork*, independently. Take http://rdoc.info/projects/search?q=delayed_job, for example. I keep a fork of delayed_job because I have a feature there that is not present in other forks. According to your workflow, if I edit the docs for helder-delayed_job to describe my feature, collectiveidea and tobi are going to get pull requests for that change, which doesn't make sense. And if I want to edit *their* docs (maybe I found a typo in the docs for a feature they have and I don't), that change will be committed to *my* fork, which doesn't make sense either. Also, this workflow requires that a user have a github account and fork the repository just to make a simple edit. Unnecessary forking pollutes the network and your fork list -- I'd rather have forks only for projects I'm making significant contributions to.

    What we need is one linear (centralized) doc history *per fork*. So you'd have separate doc histories for helder-delayed_job and tobi-delayed_job, etc, which everyone could individually edit without needing to fork it themselves, or without affecting their own fork in case they had one.

    My original idea actually falls short here as well, as the rdoc.info user can't fork different repositories *of the same project*. So patch my idea with:
    -Rdoc.info forks your repository and commits every wiki edit to that fork.
    +Rdoc.info forks your repository (if it hasn't already), *creates a branch to track your fork* (named, say, "helder-docs") and commits every wiki edit to that branch.

    So if I go to the docs for tobi-delayed_job, what I'm seeing and editing is always the head of the tobi-docs branch in rdoc.info's fork of delayed_job (github.com/docs/delayed_job/tree/tobi-docs).

    If it's the docs for my own fork, edits go to helder/delayed_job/docs, which gets picked up by rdoc.info and merged back into docs/delayed_job/helder-docs.

    I don't know if I made this sound too confusing, but it's actually quite simple :) We can schedule a skype talk if you want so I can explain this better (I'm obvio171).

    Btw, do you guys consider opensourcing rdoc.info? ;)
  • jeffrafter
    Lots to digest here but I see where you are going. I think it will be hard to know which one will work better in practice until we try it :)

    I get what you are saying about over-forking being a problem, and I see how rdoc.info could maintain its own forks (I suppose these would live in the github.com/docs account we have setup).

    In terms of opensourcing... we did it from day one :) http://github.com/zapnap/rdocinfo. Fork away and implement this feature! ;)
  • Damn it. Now I have no excuse to offload this to you guys :P
  • Hi, this is Otto from APIdock. I'm replying here instead of private e-mail.

    When we were initially writing APIdock, having the wiki-style editing seemed like core functionality. However, after we had run the site for a while, it occurred to me that wiki-editing doesn't actually solve the problem.

    The #1 priority is to lower the barrier of contributing - and that's exactly what user-generated notes are good at. However, many of them are not directly ready to be used in official documentation.

    So far it has been easier to pick the good stuff from notes and commit it to lifo's docrails Git repository. We're hoping to improve this in the future.

    There are some plans about opensourcing APIdock, so theoretically it would be possible for anyone to add the wiki editing functionality. But still, I don't think it would be so much better than simply committing to docrails.
  • I see what you mean, but I think just letting visitors add notes and requiring that someone else adapt it to the documentation and commit it is still too much trouble. And even if it weren't, this model basically covers edits which would add _new_ stuff, like "hey, you forgot to mention this and that".

    It still doesn't cover some classes of edits like correcting typos, grammar, reorganizing text to improve how it's structured, and rewriting parts to improve clarity. Notes basically add coverage, but don't improve quality.

    Opensourcing APIdock is great news! I haven't been following it too closely, so I hadn't heard of this. The whole auto-fork thing needed to implement wiki editing is indeed not trivial, but I think it would be a great thing to have.