
Visualizing Social Networks… in Excel

In the spirit of attending OPCs – “other people’s conferences”, conferences where you’re invited, but not part of the demographic/professional group the conference is aimed at – I’m now at the Microsoft Research Faculty Summit. I’m not a computer scientist, not university teaching faculty, and I’m not doing any research sponsored by Microsoft… all of which turns out to be okay, as it’s a pretty interesting gathering looking at current research topics in computer science, with a strong emphasis on the study of social networks… something that interests me, even if I’m not doing a ton of active work on the topic.

This emphasis on social network studies helps explain why I’m currently sitting in a packed conference room, learning about an extension to Excel. Even at Microsoft conferences, Excel extensions don’t usually get this type of attention. But the extension, .NetMap, has been developed by Marc A. Smith, a pioneering researcher on social networks who’s done important work on analyzing relationships in Usenet groups in his time at Microsoft Research.

Much of Marc’s recent work has looked at behavioral patterns in technical support newsgroups in Usenet. As it turns out, these groups are still hugely important for people looking for technical support (even in the days of pervasive spam) and Microsoft is interested in cultivating the utility of these networks. Rather than analyzing the content of these newsgroups (hard to do, as they’re huge), Smith and his team looked at structures. They did a great deal of network mapping, graphing the posts and responses, and seeing the structures that emerge. At least three types have emerged:

– Answer people – these people almost never post new threads, but answer the queries of a large number of unconnected people. In network terms, they’ve got high out-degree and low in-degree. These folks are utterly essential to the functioning of technical newsgroups, as they’re the folks that newbies end up getting support from.

– Reply magnets – some people have a gift (or a technique) for posting in a way that gets responses. Reply magnets are the opposite of answer people – they post infrequently and everyone answers. Smith finds that reply magnets make up roughly 0.5% of people in newsgroups, but their posts get 30% of the responses, from roughly 30% of all users. Basically, these folks are specialists in setting the agenda, which has interesting implications for political discussions in newsgroups, as these folks are capable of nominating agenda topics with much more success than the average user.

– Discussion people – these people both post and answer a lot, and have long, sustained connections with lots of people. They’re the classic discussion group user, but they’re less common than we tend to assume.
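To make the degree language concrete, here’s a rough sketch in Python. It isn’t Smith’s method: the function names, the `high` cutoff and the sample data are all made up for illustration, and it assumes each reply is recorded as a (replier, original poster) pair, so an edge points from the person answering to the person being answered.

```python
from collections import defaultdict

def degree_counts(replies):
    """Return {user: (in_degree, out_degree)} from (replier, poster) pairs.

    An edge points from the person replying to the person being answered.
    Each degree counts distinct people, not individual messages.
    """
    answered = defaultdict(set)      # replier -> people they replied to
    answered_by = defaultdict(set)   # poster  -> people who replied to them
    for replier, poster in replies:
        answered[replier].add(poster)
        answered_by[poster].add(replier)
    users = set(answered) | set(answered_by)
    return {u: (len(answered_by[u]), len(answered[u])) for u in users}

def classify(in_degree, out_degree, high=3):
    """Label a user by degree; `high` is an arbitrary illustrative cutoff."""
    if out_degree >= high and in_degree < high:
        return "answer person"        # replies to many, rarely replied to
    if in_degree >= high and out_degree < high:
        return "reply magnet"         # replied to by many, rarely replies
    if in_degree >= high and out_degree >= high:
        return "discussion person"    # heavy traffic in both directions
    return "other"

# Hypothetical sample: "ann" answers three newbies, "mag" draws three replies.
sample = [("ann", "n1"), ("ann", "n2"), ("ann", "n3"),
          ("r1", "mag"), ("r2", "mag"), ("r3", "mag")]
roles = {user: classify(*degrees)
         for user, degrees in degree_counts(sample).items()}
```

A real analysis would need a more careful cutoff than a single fixed number, and would probably want to look at how connected a user’s neighbors are to each other, since the answer people above are described as replying to unconnected people.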

If we can enumerate these discussion types, we can characterize different ecosystems in terms of which types of users live in them. It’s possible that these roles change over time – so far, Marc observes that most people seem to stay in their roles, but attenuate over time, becoming less active. It would be very interesting to see whether there are networks where people become more interactive over time. (Facebook, for instance.)

Smith observes that as social media becomes the dominant media online, we’re moving from the anonymous to the “named” internet – content created generally has an identity, real or pseudonymous, attached to it. As such, we’re getting incredible sets of data that social scientists can study, because “all social media leaves ties”, and “our relationships are increasingly self documenting.”

Screenshot from .NetMap

Here’s the thing – it’s increasingly easy to find this data, but hard to map it in meaningful ways. Smith observes that there are a couple of good Java toolkits for social network mapping but, oddly, no feature in Excel. So he and his group have built one. Their tool, .NetMap, which can be downloaded at CodePlex (Microsoft’s SourceForge-like repository for open source projects), plugs into Excel and lets you enter a list of relationships and get output as a network map. The tool is integrated with Windows to provide one of the coolest demonstration features – it will index your mail and graph your personal social network based on your mail interactions.
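For a sense of what a “list of relationships” might look like, here’s a small sketch in Python that turns mail records into a weighted edge list, one row per pair of people. This is my own illustration, not how .NetMap indexes mail: the `mail_to_edges` function, the (sender, recipients) record format and the sample names are all assumptions.

```python
from collections import Counter

def mail_to_edges(messages):
    """Turn (sender, [recipients]) records into (sender, recipient, count) rows.

    Each row is one directed relationship; the count is how many messages
    the sender addressed to that recipient. Mail sent to yourself is skipped.
    """
    weights = Counter()
    for sender, recipients in messages:
        for recipient in recipients:
            if recipient != sender:
                weights[(sender, recipient)] += 1
    # Sort by (sender, recipient) so the output order is stable.
    return [(s, r, w) for (s, r), w in sorted(weights.items())]

# Hypothetical mailbox: three real messages and one note-to-self.
messages = [("me", ["ana", "bo"]), ("ana", ["me"]),
            ("me", ["ana"]), ("me", ["me"])]
edges = mail_to_edges(messages)
```

Rows like these, one relationship per line with an optional weight, are the kind of thing you could paste into a spreadsheet as an edge list.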

One thing that becomes very clear is that you want to filter these maps – with some pretty simple Excel manipulations, it’s possible to filter a map to the strongest ties and to visualize the vertices in different ways. As Marc gives his talk, one of his collaborators crunches a set of data from Digg and is able to demonstrate that there aren’t small, competing groups within Digg who upvote on only certain topics – what there is instead is a core of highly active users who tend to upvote across different topics.
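Filtering to the strongest ties is simple enough to sketch outside of Excel, too. The Python below is a minimal illustration under my own assumptions: edges are (person, person, weight) rows, “strongest” means a weight at or above a cutoff you pick, and the function name and sample data are invented.

```python
def filter_strongest(edges, min_weight):
    """Keep edges whose weight is at least `min_weight`.

    Returns the surviving edges and the sorted list of vertices that still
    have at least one edge, so isolated people drop off the map.
    """
    kept = [(a, b, w) for a, b, w in edges if w >= min_weight]
    vertices = sorted({person for a, b, _ in kept for person in (a, b)})
    return kept, vertices

# Hypothetical edge list: the weak tie to "bo" falls below the cutoff.
edges = [("me", "ana", 5), ("me", "bo", 1), ("ana", "cy", 2)]
strong_edges, strong_vertices = filter_strongest(edges, min_weight=2)
```

Picking the cutoff is the judgment call: set it too low and the map stays a hairball, too high and the structure you’re looking for disappears.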

I’m looking forward to using the tool, but a bit disappointed that it currently works only on Windows – I suspect a lot of social scientists are using alternative platforms, and hope that as the project moves out of the research space and into the mainstream, it will be more widely supported.