Happy Flu !

As promised in my previous article, here is the experiment I was talking about. Happy Flu is a mix of viral marketing and diffusion experiment. The idea is to propagate a resource and track down its dissemination. The resource we have chosen is a visual representation of its own diffusion, which makes it quite fun.

If you wish to help, just click Spread it, and copy/paste it on your own blog, I wonder how far it can go !

Edit : Please do not link to this blog, the whole purpose of the experiment is to be able to track from which web page the applet was copied: my own blog is not important at all ;)

Following Trackbacks

It's been a really long time since I wrote an article but here I am again. For the last month, I've been working on something awesome : trying to understand how information spreads on the internet. This basically involves a lot of graph theory, a little sociology and a huge amount of intuition.

What I call information diffusion is barely a tree which nodes are web pages/articles and let u, v be two nodes of the tree, if v is a child of u, then the information posted on v was first read on u. The root being the original source of information (or at least the first observable occurrence).

The difficult part is to try and build such a tree. There are several papers on the subject, but most of them reconstruct the tree, using semantic wizardry : take a blog network, identify subjects (whatever it may mean) and using statistics and temporal information, try to guess the path the rumor/buzz/information took. While this method is OK and gives result, I'm not convinced this is the best way of doing things. I strongly believe that whatever are the results one wants to see, there is a way to reconstruct the diffusion to fit those specific results, and this might influence the measurement.

Then I thought that on blogs we had this wonderful concept called trackback. A couple of weeks ago I firmly believed that you could actually track the diffusion of an information just by looking at the trackbacks. The idea was pretty simple : I made the assumption that if blogger B reads something on blogger B's website, he'll create a trackback.

So I wrote a few lines of python and using the Technorati API I built a few trees. In each of those, the nodes are the articles, and there is an (directed) edge between two nodes if the target has made a trackback to the source.

Here is an example : I looked at how the announcement of YouTube's acquisition by Google spread from two different sources. The obvious thing is that the tree is really large and not deep at all. And this is not an artifact, I have quite a lot of data on other information trackbacks and they all follow the same pattern : almost everybody makes a trackback to the source, and there are extremely rare occurrences of trackbacks to another article.

I started wondering where I made a mistake, and after some reflexion I came to the conclusion that there is a very simple yet enlightening explanation : things do not work that way at all, trackbacks do not capture information diffusion.

I'll give you a real life illustration : let's say you have a friend and that this guy tells you something like "Hey, I saw an awesome video on YouTube yesterday, blablabla". Let's assume that you're interested, you'll go and watch the video and when you'll talk about it, you'll never say : "One of my pals told me he saw this video on YouTube, check it out" ; you'll directly cite the original source. This is exactly what happens on the internet.

Moreover, there is another important thing to consider : when blog A tracks back to blog B, there usually a link on blog B to blog A. And if blog B is TechCrunch (or any other huge blog), it's fair to assume that a little traffic will leak to blog A. This is an incentive to track back to highly visited blogs (which happen to be, in most cases, the source of the information).

You might wonder what I'll be doing now that I realized that the whole trackback stuff doesn't work out so well… I'm planning on launching something quite interesting in the following weeks, which might help understanding how information flows from blog to blog, so stay tuned ;)

Google treasure hunt

Google australia is holding a brain teasing treasure hunt. The first question looks like this:

A robot is located at the top-left corner of a 53 x 32 grid1. The robot can only move either down or right at any point in time. The robot is trying to reach the bottom-right corner of the grid. How many possible unique paths are there?

The first thing I thought about was calculating all the paths using dynamic programming and memoization:

Let P(w,h) be the number of paths the robot can take to go from to top left corner of an w x h grid to its bottom left corner. If the robot goes left, it has P(w-1, h) possible paths left, and if it goes down, it has P(w, h-1). Of course, P(w, 1) = P(1, h) = 1.

The second approach gives the same results, but is obtained in a completely different manner. In the first case, it's mainly algorithmic, whereas the second case is purely combinatorics and — I think — far more elegant.

In order for the robot to go from the top left corner of an w x h grid to its bottom left corner, it must go (w-1) times left, and (h-1) times down, which is a total of w+h-2 movements. The number of paths is then the number of way of choosing which of those movements are (for example) going left. That is, choosing w-1 elements out of a set of w+h-2, best known as the binomial coefficient C(w+h-2, w-1). Of course, to compute it, one would use the recurrence used in Pascal's triangle : C(n, k) = C(n-1, k) + C(n-1, k-1), with C(n,0) = 1 and C(0,k) = 0.

The really nice thing about this is that two different ideas lead to one same result (it's just a question of rewriting the recurrence). This is creativity ;)

Just to give you an idea of the numbers involved here : on a 67×51 grid, there are 2049503709637561751443717895527866 paths. Believe me: it's huge !

  1. the size of the grid is variable []

Namespacing JavaScript

One of the biggest drawbacks in JS is the lack of proper namespaces. Some have long used objects to avoid global pollution, but I think it's possible to go further. That's how I came with those two functions :

(function (global) {
    bundle = function (name, func) {
      var cur = global;
      var name_arr = name.split('.');
      for (i = 0, ilen = name_arr.length; i != ilen; i++) {
        var n = name_arr[i];
        cur = cur[n] = cur[n] || {};
      }
      if (func) func.call(cur);
    };
    fetch = function(imports, func) {
      var ob = {};
      var inject_bundle = function (name, ob) {
        var name_arr = name.split('.');
        var cur = eval(name_arr[0]);
        for (i = 1, ilen = name_arr.length - 1; i != ilen; i++) {
          var n = name_arr[i];
          cur = cur[n];
        }
        var last = name_arr[name_arr.length - 1];
        if (last == '*') for(k in cur) ob[k] = cur[k];
        else ob[last] = cur[last];
      }
      if (typeof imports == 'string') inject_bundle(imports, ob);
      else for (k in imports) inject_bundle(imports[k], ob);
      return func.call(ob);
    };
  })(this)

bundle is used to create a new namespace, given as a string. If a function is passed as a second argument, then it is called, bounded to the namespace. For example, one could use the following :

bundle('net.friggeri.foo.bar', function () {
    this.bleh = "I'm a turtle !";
  });
bundle('net.friggeri.quux', function () {
    this.a = 32;
    this.b = 52;
  });
print(net.friggeri.foo.bar.bleh);
print(net.friggeri.quux.a * net.frigger.quux.b);

You might think (and you'd be right) that having long names can be a pain (I would never use something like net.friggeri.quux.a). However, using the fetch function, you can populate an object with whatever slots in a given namespace. Then, using with, you can emulate some kind of pythonish import statement. An example might be useful at this point:

fetch([
  'net.friggeri.foo.bar',
  'net.friggeri.quux.*'
],
function () { with(this) {
  print(bar.bleh);
  print(a*b);
}})

The code is pretty ugly: I have to use eval at some point to make sure it still works if called in a nested function and I didn't find a way to avoid using with in the function passed to fetch. Of course it's possible to use a this.bar.bleh syntax, but the whole point of fetching the contents of the bundle is to avoid using such long variable names.

If you wish to expand this, or have any suggestions, you're welcome to comment.

Update : I just realized that it could be simplified a lot with having fetch returning directly the object, allowing something like :

with(fetch([
  'net.friggeri.foo.bar',
  'net.friggeri.quux.*'
]) {
  print(bar.bleh);
  print(a*b);
}

This is a very good example where having an all functional approach might not be the path to follow. ;)

JavaScript Shell Scripting

A few days ago, Peter Michaux posted a very promising article concerning xjs. I've long wanted to be able to use JS to write shell scripts (as one would do using bash, python or perl) but it lacked several important features.

Peter has added a File module which wraps Java file operations, and I see this as the first step towards taking JS out of web development. Let's face it, JS is really present in browsers, SSJS is taking off very slowly and even though a few apps use JS for scripting, it's still not mainstream.

The way I dream of JavaScript's future ? As a general purpose scripting language. This means that several things have to be done. First, an easy way of binding whatever library to JS. Second, a CPAN like repository dedicated to JS. There is absolutely no control on how would evolve the latter. The former, however, can be conceived right now.

I'm not really sure if I agree with Peter's choice of Rhino instead of SpiderMonkey, but I guess he has more experience with Java than with C — which is quite the opposite of my own misconceptions. I guess there are pros and cons for each of the choices, but I wonder if it'll be easy to port generic C libraries to JS using Rhino. I must clarify something at this point : I have a deep hatred of Java — it's a kind of zealous religion — and everything that comes near it appears like obscure magic to me.

Nevertheless, I really wish a lot of good things to xjs, and I hope that Peter will take into account the fact that an easy mechanism for porting library is a must-have.