Modelling the Quora topic network

Since its launch in 2010, Quora has been a question-and-answer site that actually works, and has managed to attract interesting, intelligent people to answer questions on all sorts of issues, from technology, design and work to food, travel and fitness. The Data Team at Quora have written a fascinating post about their analysis of the structured topic data that has grown alongside the site itself.

Topics clustering around a Quora question

When users ask a question on Quora they can add it to multiple ‘topics’, so that it becomes visible to other users who follow those topics. The Data Team looked at how topics overlap around questions, assigning weights based on the likelihood that a question labelled with topic A will also be labelled with topic B. (This likelihood is not the same both ways, so technically this was a ‘directed’ network.)
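To make the weighting concrete, here is a minimal sketch of how such a directed weight could be estimated from co-occurrence counts. It is not Quora's actual code: the function name `topic_link_weights` and the toy question list are my own assumptions for illustration, and the weight is simply the share of questions tagged with topic A that are also tagged with topic B.

```python
from collections import Counter
from itertools import permutations


def topic_link_weights(questions):
    """Estimate P(B | A) for every ordered pair of topics that co-occur.

    `questions` is a list of topic lists, one list per question
    (a hypothetical input format, assumed for this sketch).
    """
    topic_counts = Counter()  # number of questions carrying each topic
    pair_counts = Counter()   # number of questions carrying both A and B
    for topics in questions:
        unique = sorted(set(topics))  # ignore repeated labels on one question
        topic_counts.update(unique)
        pair_counts.update(permutations(unique, 2))  # both (A, B) and (B, A)
    return {(a, b): n / topic_counts[a] for (a, b), n in pair_counts.items()}


# Toy data, invented purely for illustration.
questions = [
    ["NASA", "Moon Landing"],
    ["NASA", "Space Exploration"],
    ["NASA", "Moon Landing", "Space Exploration"],
    ["Moon Landing"],
    ["NASA"],
]
weights = topic_link_weights(questions)
print(weights[("NASA", "Moon Landing")])  # share of NASA questions also tagged Moon Landing
print(weights[("Moon Landing", "NASA")])  # share of Moon Landing questions also tagged NASA
```

In this toy data the two directions differ: half of the ‘NASA’ questions also carry ‘Moon Landing’, but two-thirds of the ‘Moon Landing’ questions also carry ‘NASA’. That asymmetry is why the network has to be treated as directed.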

As you’d expect, topics that we know to be related (eg ‘NASA’ and ‘Moon Landing’) were linked in the network, but a more surprising finding is that the topic network seems to have a hierarchical structure:

a large topic like Cars and Automobiles is more likely to link to smaller topics, such as Car Engines and Auto Repair, than to another big one such as Books… Though these features make sense, they can’t be assumed a priori when building a topic graph based only on question co-occurrence. Instead, they are reflections of the developing hierarchy organically reproducing the relationships that we intuitively expect.


smaller, more specialized topics, such as Freddie Mercury and Brian May, tend to cluster closely together, while larger topics do not tend to do so.

In other words, this user-generated data – created as a by-product of people adding and answering questions on Quora – seems, at least partially, to validate the tree-like structure we traditionally assign to knowledge. This ‘tree of knowledge’ is reflected in everything from the way we structure university departments to the way we organise books in libraries.

I’d also expect this model of the data to reveal new connections and new insights that were invisible or suppressed in a more traditional tree structure. Unfortunately Quora hasn’t released the full data set, but these connections can be glimpsed in their visualization of the strength of links between the top 33 topics.

Link strength between the top 33 topics on Quora

Overall, the Quora team’s analysis supports the way we have intuitively structured knowledge as a hierarchical tree with nested topics, but suggests some ways in which that structure falls short or is being eroded. If you’re interested in these issues, David Weinberger’s Everything Is Miscellaneous (Amazon US | Amazon UK) and Too Big to Know (Amazon US | Amazon UK) are great places to explore further.

Mapping the contraception debate on Twitter

This network analysis of Twitter users talking about contraception reveals a heavily US-dominated conversation, with participants clearly divided into Democrat / liberal and Republican / conservative groups, and little interchange between them.

Around 7,500 tweets mentioning ‘contraception’ or ‘birth control’ were collected during a 24-hour period in November last year. The follow relationships were then worked out between all the accounts that had tweeted.
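As a rough illustration of that second step, the sketch below keeps only the follow relationships where both accounts appear among the tweeters. The function name `follow_network`, the account names and the shape of the `follows` lookup are all hypothetical; the original analysis doesn't say how the follow data was gathered or stored.

```python
def follow_network(tweeters, follows):
    """Return directed edges (a, b), meaning 'a follows b'.

    Only accounts that tweeted about the topic are kept, so the result is
    the follow graph restricted to participants in the conversation.
    `follows` maps an account to the set of accounts it follows
    (an assumed format for this sketch).
    """
    tweeters = set(tweeters)
    return sorted(
        (a, b)
        for a in tweeters
        for b in follows.get(a, ())
        if b in tweeters and b != a
    )


# Invented accounts, for illustration only.
tweeters = {"alice", "bob", "carol"}
follows = {
    "alice": {"bob", "dave"},  # dave never tweeted, so alice -> dave is dropped
    "bob": {"alice"},
    "carol": {"erin"},         # erin never tweeted, so carol ends up isolated
}
edges = follow_network(tweeters, follows)
print(edges)
```

The resulting edge list is what a layout or community-detection tool would take as input to draw a map of the conversation.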

The 'contraception' debate on Twitter

This approach underlines the ability of network analysis to discover online communities and is reminiscent of Lada Adamic’s network map of links between Republican and Democrat blogs in the run-up to the 2004 election. It suggests US politics has grown no less polarised since then, at least around this issue.

Lada Adamic’s famous visual of Democrat and Republican blogs during the 2004 US election

The image below shows the intersection between the Democrat and Republican communities in more detail.

Contraception Twitter network detail

You can download a PDF of the network showing individual account names here (21 MB).