Modelling the Quora topic network

Since its launch in 2010, Quora has been a question-and-answer site that actually works. It has managed to attract interesting, intelligent people to answer questions on everything from technology, design and work to food, travel and fitness. The Data Team at Quora have written a fascinating post about their analysis of the structured topic data that has grown alongside the site itself.

Topics clustering around a Quora question


When users ask a question on Quora, they can add it to multiple ‘topics’, so that it becomes visible to other users who follow those topics. The Data Team looked at how topics overlap around questions, assigning each link a weight based on the likelihood that a question labelled with topic A will also be labelled with topic B. (This likelihood is not the same in both directions, so technically this was a ‘directed’ network.)
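To make the weighting concrete, here is a minimal Python sketch. It assumes the weight from topic A to topic B is simply the fraction of questions tagged A that are also tagged B (a conditional co-occurrence probability). Quora's post may have computed or normalised its weights differently, and the sample questions below are hypothetical.

```python
from collections import Counter
from itertools import permutations


def directed_topic_weights(questions):
    """Return {(a, b): share of questions tagged a that are also tagged b}.

    Assumption: weight(a -> b) = count(a and b) / count(a).
    """
    topic_counts = Counter()
    pair_counts = Counter()
    for topics in questions:
        unique = set(topics)
        topic_counts.update(unique)
        # permutations() yields ordered pairs, so (a, b) and (b, a) are kept apart
        pair_counts.update(permutations(unique, 2))
    return {(a, b): n / topic_counts[a] for (a, b), n in pair_counts.items()}


# Hypothetical questions, each given as the set of topics it was added to
questions = [
    {"Cars and Automobiles", "Car Engines"},
    {"Cars and Automobiles", "Auto Repair"},
    {"Cars and Automobiles"},
    {"Cars and Automobiles", "Car Engines", "Auto Repair"},
]
weights = directed_topic_weights(questions)
print(weights[("Car Engines", "Cars and Automobiles")])  # 1.0
print(weights[("Cars and Automobiles", "Car Engines")])  # 0.5
```

Because the denominator is the count for the source topic, a big topic like Cars and Automobiles ends up with a weaker link to a niche topic than the niche topic has back to it. That asymmetry is why the network is directed.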

As you’d expect, topics that we know to be related (e.g. ‘NASA’ and ‘Moon Landing’) were linked in the network, but a more surprising finding is that the topic network seems to have a hierarchical structure:

a large topic like Cars and Automobiles is more likely to link to smaller topics, such as Car Engines and Auto Repair, than to another big one such as Books… Though these features make sense, they can’t be assumed a priori when building a topic graph based only on question co-occurrence. Instead, they are reflections of the developing hierarchy organically reproducing the relationships that we intuitively expect.

Further:

smaller, more specialized topics, such as Freddie Mercury and Brian May, tend to cluster closely together, while larger topics do not tend to do so.
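One common way to put a number on "clustering closely together" is the local clustering coefficient: the fraction of a topic's neighbours that are also linked to each other. The Quora post doesn't say this is the measure it used, so treat the sketch below as an assumption. It uses an undirected toy graph whose "Music", "Jazz" and "Opera" topics are hypothetical.

```python
from itertools import combinations


def local_clustering(adj, node):
    """Fraction of pairs of node's neighbours that are themselves linked."""
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(neighbours, 2) if v in adj[u])
    return 2 * links / (k * (k - 1))


# Hypothetical undirected topic links
edges = [
    ("Music", "Freddie Mercury"),
    ("Music", "Brian May"),
    ("Music", "Jazz"),
    ("Music", "Opera"),
    ("Freddie Mercury", "Brian May"),
]
adj = {}
for a, b in edges:
    # Store each link in both directions so the graph is symmetric
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

print(local_clustering(adj, "Freddie Mercury"))  # 1.0
print(local_clustering(adj, "Music"))  # about 0.167
```

In this toy graph the small, specialised topic has every neighbour linked to every other, while the broad hub's many neighbours mostly don't link to each other. That is the pattern the quote describes.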

In other words, this user-generated data – created as a by-product of people adding and answering questions on Quora – seems, at least partially, to validate the tree-like structure we traditionally assign to knowledge. This ‘tree of knowledge’ is reflected in everything from the way we structure university departments to the way we organise books in libraries.

I’d also expect this model of the data to reveal new connections and new insights that were invisible or suppressed in a more traditional tree structure. Unfortunately Quora hasn’t released the full data set, but these connections can be glimpsed in their visualization of the strength of links between the top 33 topics.

Link strength between the top 33 topics on Quora


Overall, the Quora team’s analysis supports the way we have intuitively structured knowledge as a hierarchical tree with nested topics, but suggests some ways in which that structure falls short or is being eroded. If you’re interested in these issues, David Weinberger’s Everything Is Miscellaneous and Too Big to Know are great places to explore further.