Studying Twitter and the Moldovan protests

Moldova’s parliamentary elections on April 5th appeared to return the governing Communist party to power. Reuters reports that exit polls showed the Communists with 46% of the vote; figures from the board of elections released the following day gave the party a 50% share.

In Moldova, the parliament elects the president. While Vladimir Voronin has served two terms and is ineligible for a third term, there’s a storm brewing over the future presidency – three parties that favor closer ties with the EU say that they won’t form a coalition government with the Communists. After the announcement of preliminary results on April 6th, there were complaints about election fraud, claims that Voronin had packed the voting lists with the names of dead and nonexistent people to keep his party in power.

On Tuesday, April 7th, at least 10,000 protesters took control of the President’s office and of parliament in the capital, Chisinau. Over 100 people were hurt and one killed during demonstrations and the police response. By Wednesday, April 8th, the buildings had been retaken, and almost two hundred opposition protesters were arrested. In the hopes of calming the tense situation, a vote recount will be held Wednesday. But the situation looks difficult to defuse – Moldova is desperately poor, and up to a quarter of the country’s population is working abroad. The result, suggests Mansur Mirovalev in the AP, may be a generation gap between pensioners who traditionally side with Russia in political disputes, and students, who look to the EU and specifically to Romania. Protests continued on Sunday, bringing fewer people into the streets, but suggesting that a vote recount may not be sufficient to calm the situation in Chisinau.


There’s a parallel timeline to this one, focusing on the use of social media in organizing these protests. The Telegraph reported on April 7th that students had used Twitter to organize their protests. My friend and colleague Evgeny Morozov explored the idea in more detail on his blog, net.effect, helping to set off a storm of articles and posts about “the Twitter revolution” in Moldova. Commentators stepped up to debunk the role of Twitter in organizing the protests, leading to an interesting debate between Daniel Bennett – “The Myth of the Moldova Twitter Revolution” – and Morozov – “Moldova’s Twitter Revolution is NOT a Myth”.

While I weighed in on the argument a few days back, I realized that the argument over the role of Twitter in Moldovan activism reminded me of a lot of arguments about new media I’ve had in the past few years, both at Berkman and elsewhere. These arguments tend to be long on examples and storytelling, and very short on data and analysis. That frustration is what led friends and me to start building MediaCloud, a platform designed to study the spread of ideas between the blogosphere and mainstream media. The goal is to help turn media criticism from a largely qualitative to a partly quantitative pursuit.

MediaCloud doesn’t currently track Twitter – we’ve optimized the system to look at blogposts and newspaper articles and make generalizations about the subject matter of those pieces. It’s very hard to perform that sort of text analysis on the very short messages that come through Twitter.

At the same time, Twitter’s a very appealing platform for media research because it promises completeness. When tracking an idea that spreads between blogs (or blogs and mainstream news sources), there will always be sources not tracked by MediaCloud or fully indexed by Google: it’s virtually impossible to assemble a set of all webpages that mention the Moldova protests during the past week. Twitter, by contrast, is a self-contained universe. Assembling a set of all tweets that mention the protests is absolutely possible.

A brutally simple and stupid approach to the problem would involve retrieving every tweet for the past week and throwing away the ones that aren’t about Moldova. I decided to sharpen my large, heavy rock slightly and try something marginally less dumb. I decided to grab every tweet that included the #pman tag between Tuesday morning and yesterday afternoon (April 7th – April 12th).


(What follows is geekery about doing research on Twitter, and surprisingly few conclusions about whether Twitter did or didn’t help organize the revolution. If you’re interested in questions about what quantitative research might tell us on platforms like Twitter, it might be fun. If you’re looking for me to tell you whether Evgeny or Daniel is right, you’re out of luck, for the moment at least.)

Grabbing those 32,000 tweets involves talking Twitter’s search engine into doing something it really doesn’t like doing – giving you more than 1500 search results. The trick involves manipulating the “max_id” field in the twitter search URL. Try a search on twitter. Go to the second page of results. You should see a URL that looks something like this:

http://search.twitter.com/search?max_id=1511783811&page=2&q=%23pman&rpp=100

Picking apart the URL:
max_id=1511783811 – Only return results up to tweet #1511783811 in the database
page=2 – Hand over the second page of results
q=%23pman – The query is for the string #pman, encoded to escape the hash
rpp=100 – Give the user 100 results per page
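
The same breakdown can be done in code. What follows is a minimal Python sketch using only the standard library; the helper name search_params is made up for this example and is not part of any Twitter tooling.

```python
from urllib.parse import urlsplit, parse_qs

def search_params(url):
    """Return the query parameters of a search URL as a flat dict of strings."""
    query = urlsplit(url).query
    # parse_qs decodes percent-escapes (%23 becomes #) and returns a list of
    # values per name; each name appears once here, so keep the first value.
    return {name: values[0] for name, values in parse_qs(query).items()}

params = search_params(
    "http://search.twitter.com/search?max_id=1511783811&page=2&q=%23pman&rpp=100"
)
print(params)
```

Note that every value comes back as a string, so max_id needs an int() before doing arithmetic on it.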

While you can manipulate these variables to your heart’s content, you can’t get more than 100 results per page. And if you retrieve 100 results per page, your results will stop at around 15 pages – the engine, by default, wants to give you only 1500 results on any search. This makes sense from a user perspective – it’s pretty rare that you actually want to read the last 1500 posts that mention the fail whale – but it’s a pain in the ass for researchers.

What you need to do is figure out the approximate tweet ID number that was current when the phenomenon you’re studying was taking place. If you’re a regular twitterer, go to your personal timeline, find a tweet you posted on April 7th, and click on the date to get the ID of the tweet. In the early morning (GMT) of the 7th, the ID for a new tweet was roughly 1468000000 – the URL http://search.twitter.com/search?max_id=1468000000&q=%23pman&rpp=100 retrieves the first four tweets to use the tag #pman, including our Ur-tweet:

evisoft: neata, propun sa utilizam tag-ul #pman pentru mesajele din piata marii adunari nationale

My Romanian’s a little rusty, but Vitalie Eşanu appears to be suggesting we use the tag #pman – short for Piata Marii Adunari Nationale, the main square in Chisinau where the protests were slated to begin – in reference to posts about the protests. His post is timestamped 4:40am GMT, suggesting that there were at least some discussions about promoting the protests on Twitter before protesters took to the streets.

Now the key is to grab URLs from Twitter, increasing the max_id variable in steps so that we’re getting all results from the start tweet ID to the current tweet ID. My perl script to do this steps by 10,000 tweet IDs at a time, scraping the results I get from Twitter (using the Atom feed, not the HTML) and dumping novel results into a database. This seems like a pretty fine-toothed comb to use… but if you want to be comprehensive, it’s important to figure out what the maximum “tweet density” is before running your code.
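
For readers who want to try the stepping idea, here is a rough Python sketch. It is not the perl script described above. The search.atom path is an assumption about where the Atom feed lived, and the three function names are invented for illustration.

```python
from urllib.parse import urlencode

# Assumed endpoint for the Atom version of the search results.
SEARCH_URL = "http://search.twitter.com/search.atom"

def stepped_max_ids(start_id, end_id, step=10_000):
    """Yield max_id values from start_id up to and including end_id."""
    max_id = start_id
    while max_id < end_id:
        yield max_id
        max_id += step
    yield end_id  # always finish exactly on the last ID of interest

def search_url(max_id, query="#pman", rpp=100):
    """Build one search URL; urlencode escapes the hash as %23."""
    return SEARCH_URL + "?" + urlencode({"max_id": max_id, "q": query, "rpp": rpp})

def keep_novel(seen_ids, page_ids):
    """Return the IDs on this page that were not seen before, and remember them."""
    novel = [tweet_id for tweet_id in page_ids if tweet_id not in seen_ids]
    seen_ids.update(novel)
    return novel
```

Consecutive pages overlap whenever a step contains fewer than 100 matching tweets, which is why something like keep_novel is needed before results go into storage.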


Density of tweets charted against blocks of 100,000 tweets

At some point on Friday, we hit a peak tweet density – 410 of 100,000 tweets included the #pman tag. Had I been scraping results by iterating 100,000 tweets at a time, I would have had more than four pages of new results – my script is only looking at the first page, so I’d be dropping results. If I ran the script again, I’d try to figure out the maximum tweet density by looking for the moment where the meme was most hyped, do a back-of-the-envelope calculation of an optimum step size, and then halve it – that would probably have me using steps of 20,000 for this set.
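
The back-of-the-envelope calculation can be written down in a few lines of Python. This is only a sketch: the function name is invented, and it assumes the 410-per-100,000 peak is spread evenly across the block, which real traffic will not do.

```python
def one_page_step(peak_hits, block_size, page_size=100):
    """Largest ID step expected to return at most one page at peak density."""
    # peak_hits matches per block_size IDs means page_size matches are
    # expected in page_size * block_size / peak_hits IDs (rounded down).
    return page_size * block_size // peak_hits

full = one_page_step(peak_hits=410, block_size=100_000)
halved = full // 2  # safety margin for bursts inside the block
print(full, halved)
```

Taken at face value, this gives a step of about 24,000 before halving and about 12,000 after, so the 20,000 figure above is best read as a round estimate rather than the output of this exact formula.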

Some things to think about if you’re interested in scraping Twitter:

– There’s a lot of data. On the days for which I have complete data, I saw roughly 7 million tweets a day on weekdays, 6 million on Saturday. (Wednesday: 7087156, Thursday: 6921776, Friday: 6929399, Saturday: 5977967). You’re going to have more luck choosing good tags or keywords and using the search to limit your set than by grabbing all the data. (Smart monkey sharpens rock first.)

– It’s easy to break Twitter. Twitter breaks Twitter all the time – how many other tools can you think of where their semi-official mascot is the symbol of their dysfunction and downtime? So be really nice. Your spider should identify itself as a spider and include your email address so the admins can tell you to back off if it’s hurting the site. My spider waited two seconds per page and took a minute off if it got a 500.

– Use as big a step size as you can without losing data. Bigger steps mean fewer retrieves.

– Please don’t re-spider the #pman tag. I’ll happily share the data with you – which I’ve already done some preprocessing on. It’s here as an .xls file. If you need it in something else like .csv, let me know and I may be able to accommodate you. I plan on updating the set in a couple more days and will post when I do.
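
The politeness rules in the list above (identify yourself, wait two seconds per page, take a minute off after a 500) might look like this in Python. It is a sketch, not the spider used for this data set: the User-Agent string and contact address are placeholders, and the opener and sleep arguments exist only so the function can be exercised without touching the network.

```python
import time
import urllib.request
from urllib.error import HTTPError

# Placeholder identity: replace with your own project name and address.
USER_AGENT = "pman-research-spider (contact: you@example.org)"

def polite_fetch(url, opener=urllib.request.urlopen, sleep=time.sleep,
                 delay=2, backoff=60, max_tries=3):
    """Fetch one page, pausing after success and backing off on HTTP 500."""
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    for _ in range(max_tries):
        try:
            with opener(request) as response:
                body = response.read()
            sleep(delay)      # wait two seconds per page
            return body
        except HTTPError as err:
            if err.code != 500:
                raise         # only a 500 is treated as "back off and retry"
            sleep(backoff)    # take a minute off after a 500
    raise RuntimeError(f"gave up on {url} after {max_tries} tries")
```

Calling polite_fetch(url) with the defaults performs a real request, so point it only at pages you have a reason to retrieve.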

So what can you do with 32,107 brief messages in a language you don’t speak? I was curious to see the time distribution of the messages – the tweet density histogram above is a little confusing, because the density of tweets can change from moment to moment on twitter. Here’s how the activity breaks up over time:


Tweets versus time, starting around 04:00 GMT on April 7th, and extending through 21:00 GMT on April 12th

There’s very little activity until midday Moldovan time on Tuesday, at which point the protests are in full swing. By 12:30pm (09:30 GMT), we’re seeing tweets like:

@Moscovici: Protesters are injured. The Presidency building is damaged and protesters get inside. Police prepares to fight back. Chisinau, Moldova #pman

@bunelul: Anticomunist Protesters are injured. The Presidency building is damaged and protesters get inside. Police prepares to fight back. #pman

What’s interesting (to me, at least) is how quickly the Twitter analytics community made it onto the scene. An hour after these first posts about the protests, we see:

@whatthetrend: Why is #pman trending? Help explain why at What The Trend? http://wttrend.com/2173

I’d been operating under the theory that there was some Twitter use during the protests, but that the sustaining interest – the peaks on subsequent weekdays – had as much to do with self-congratulatory Twitterers talking about the revolutionary potential of social media as it did with actual discussions concerning people in Moldova and the Moldovan diaspora. There’s some evidence to support that viewpoint – here’s an exchange from Wednesday night:

4/8/09 23:57:24 1480123179 PaulMaior (Paul Maior) RT @guykawasaki Twitter revolution: 10,000 protesters organize in Moldova via FB, Twitter and texts http://adjix.com/2w9d #pman

4/9/09 0:08:21 1480180156 chriskeating (chriskeating) The people of Moldova say with one voice: IT’S NOT ABOUT F*CKING TWITTER! #pman

But the whole point of quantitative analysis is that you might get a different understanding from the numbers than from reading individual tweets. I was surprised to see a “professionalization” of tweeting as time went on. In other words, on the first day of the use of #pman, we saw an average of 5.87 posts per person using the tag – the sort of volume one might expect from a protester with a mobile phone. By Sunday, we’ve got roughly the same volume of tweets, but a much smaller set of people speaking, and an average of 14.74 tweets per author.


Day         Tweets   Authors   Tweets per author
Tuesday       3820       651                5.87
Wednesday     6684      1050                6.37
Thursday      7300       643               11.35
Friday        7003       529               13.24
Saturday      4012       275               14.59
Sunday        3288       223               14.74

These numbers are almost certainly the result of a lot of people posting single tweets, and a small group of people posting tons of tweets, a classic long-tail/Pareto distribution. (For the whole set, the mean/median/mode is 16.22/2/1.) On Friday, the most active Twitterer, Zalmox3s, offers 465 posts with the #pman tag. (His posting volume may explain why he’s got only 51 followers. It might be interesting to try to figure out who had the biggest influence on promoting the #pman conversation – perhaps a product of number of #pman tweets and number of followers.)
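
The per-day figures in the table and the whole-set mean/median/mode need nothing more than the author column of the tweets. A minimal Python sketch follows, with an invented function name and toy data rather than the real #pman set.

```python
from collections import Counter
from statistics import mean, median, mode

def author_stats(authors):
    """Summarise tweets per author, given one author name per tweet."""
    per_author = Counter(authors)          # author -> number of tweets
    counts = list(per_author.values())
    return {
        "tweets": len(authors),
        "authors": len(per_author),
        "mean": round(mean(counts), 2),
        "median": median(counts),
        "mode": mode(counts),
    }

# Toy data (not from the #pman set): one heavy poster, three occasional ones.
sample = ["a"] * 9 + ["b", "b"] + ["c"] + ["d"]
print(author_stats(sample))
```

Even in this toy case the mean (3.25) sits well above the median (1.5) and the mode (1), the same long-tail signature as in the real data.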

There’s also quite a bit of staying power within the group – I saw only 1979 unique Twitterers using this tag over the six-day set. On Wednesday, the second day of the tag’s life, 53% of the people who’ve ever used the tag (in this data set) use the tag. That suggests to me that this isn’t so much a viral phenomenon, which keeps adding members, as a community that formed quickly, got some out-of-group attention (including my four #pman tweets :-) and has retained a small, hard-core membership.
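
The "share of everyone who ever used the tag" figure is simple to compute per day. Here is a small Python sketch, assuming the tweets are available as (day, author) pairs; that layout and the function name are illustrative assumptions, not the format of the shared file.

```python
from collections import defaultdict

def daily_share_of_authors(tweets):
    """Map each day to the fraction of all authors in the set who posted that day."""
    by_day = defaultdict(set)
    everyone = set()
    for day, author in tweets:
        by_day[day].add(author)
        everyone.add(author)
    return {day: len(authors) / len(everyone) for day, authors in by_day.items()}

# Toy data: four authors overall, two active on Tuesday, three on Wednesday.
toy = [("Tue", "a"), ("Tue", "b"), ("Wed", "a"), ("Wed", "c"), ("Wed", "d"), ("Wed", "a")]
print(daily_share_of_authors(toy))
```

With the numbers quoted above, 1050 Wednesday authors out of 1979 overall works out to about 53%.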

My bitter, cynical hope had been to demonstrate that the conversation switched from a small Romanian-language conversation about the actual protest events to a self-congratulation festival in the English-language twittersphere. Good thing we’ve got data to prove me wrong. Using Daniel Steinbock‘s kickass tool TagCrowd, I was able to generate word frequencies for each day’s worth of data. This is a fun hack – unless you remove “stop words” from your data, you’ll get a frequency map telling you that people on Twitter were talking about “I” “me” and “here”. I fed Daniel’s tool a custom English-language stoplist… and his had a Romanian one built in!
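
For anyone without TagCrowd to hand, the stop-word idea is easy to reproduce. The sketch below is plain Python and makes no claim to match TagCrowd's own tokenizing or stoplists; the tiny STOP_WORDS set is a stand-in for a real English or Romanian list.

```python
import re
from collections import Counter

# Tiny illustrative stoplist; a real one would be far longer.
STOP_WORDS = {"i", "me", "here", "the", "a", "to", "in", "of", "and"}

def top_terms(tweets, stop_words=STOP_WORDS, n=20):
    """Return the n most frequent terms across tweets, ignoring stop words."""
    counts = Counter()
    for text in tweets:
        # Lower-case, then take runs of letters/digits, keeping hyphens
        # that sit inside a word (so "anti-communist" stays one term).
        for word in re.findall(r"[^\W_]+(?:-[^\W_]+)*", text.lower()):
            if word not in stop_words:
                counts[word] += 1
    return counts.most_common(n)

toy = ["I am here in Chisinau #pman", "RT protesters in Chisinau", "anti-communist protesters"]
print(top_terms(toy, n=2))
```

This tokenizer also splits URLs into pieces, so fragments of domain names can show up as terms unless links are stripped first.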

On the first day of the tag’s use, the top twenty terms were moldova, rt, chisinau, protesters, live, ro, moscovici, twitter, tv, md, revolution, police, protests, piata, anti-communist, voronin, fost, protest, protv, romania. “rt” – which appears in 908 of the day’s 3820 tweets – is short for “retweet” – it’s a sign that a poster is quoting someone else. On Tuesday, a lot of the tweeting was amplifying reports from people in the square, and “rt” was a common term. By Wednesday, it dropped to fourth in the frequency tables, and to 29th by Thursday – something is clearly changing in the nature of the discourse. Across the board, we see “rt” appear in 1932 of over 32k tweets, or roughly 6% of all posts. On Tuesday, when people were desperate for on-the-ground news, it appeared in almost 24%.
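
Counting how many tweets contain a term such as "rt" has one trap: the term has to match as a whole word, or "start" and "party" get counted too. A short Python sketch with an invented function name:

```python
import re

def share_with_term(tweets, term):
    """Fraction of tweets containing term as a whole word, ignoring case."""
    if not tweets:
        return 0.0
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    hits = sum(1 for text in tweets if pattern.search(text))
    return hits / len(tweets)

toy = ["RT @a: hello", "start of protest", "rt this", "nothing"]
print(share_with_term(toy, "rt"))
```

For the figures quoted above, 908 of 3820 is about 23.8% and 1932 of 32,107 is about 6.0%.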


TagCrowd for tweets containing #pman on 4/8/2009

I’d expected to see “twitter” emerge as one of the most popular terms by Wednesday or Thursday, and to see the conversation shift into English. Twitter ranks third in term incidence on Wednesday and there’s a bit more English in the word cloud. But by Thursday, Twitter’s out of the top 20 entirely and “comunistii” ranks behind Moldova and Chisinau. So yes, the conversation on Wednesday – the day with the most authors, over 1,000 of them – included lots of non-Moldovans. But the conversation quickly shifted back to the political standoff. (The word clouds are really interesting, but take up a lot of space – the downloadable data includes the 40 most frequent words for each day.)

Evgeny made the point that tracking the number of Twitter users in Moldova (reported to be under 200) doesn’t adequately show the impact of social media – how do we know if a tweet is reproduced on a blog or on Facebook? What’s the interaction between different types of social media? I don’t have a good answer – I think Evgeny’s right, but I also think it’s a bit of a cop-out. But I thought it would be interesting to see what URLs were referenced the most often in the #pman tweets. Here’s the top ten, plus their incidence:

92 http://revolutiemoldova.islandjewelers.us/ (Romanian-language blog)
74 http://redkokane.blogspot.com/ (Romanian-language blog)
59 http://tinyurl.com/dlwvtb (http://www.imarin.net/2009/04/moldova-revolution-2009.html, Romanian-language blog)
58 http://tinyurl.com/c (probably a broken version of the following URL)
57 http://tinyurl.com/c6zckl (http://pmfu.blogspot.com/2009/04/tinerii-arestati-in-chisinau.html)
51 http://www.pldm.md/ (Party site for the Liberal Democrats in Moldova)
50 http://www.antena3.ro/live.php (Streaming TV in Romanian)
49 http://www.azi.md/ro/story/2146 (Romanian-language news story)
47 http://tinyurl.com/c88rd5 (Romanian-language news story)
45 http://tinyurl.com/cda895 (English translation of a Romanian blog post)
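
A list like the one above can be produced with a regular expression and a counter. The Python sketch below is illustrative only: the URL pattern is deliberately simple, the function name is invented, and shortened links are counted exactly as written rather than resolved.

```python
import re
from collections import Counter

# Simple pattern: "http://" or "https://" followed by non-space characters.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")

def top_urls(tweets, n=10):
    """Count every URL mentioned in the tweets and return the n most common."""
    counts = Counter()
    for text in tweets:
        for url in URL_PATTERN.findall(text):
            counts[url.rstrip(".,;:!?)")] += 1   # drop trailing punctuation
    return counts.most_common(n)

toy = [
    "see http://www.pldm.md/ #pman",
    "live: http://www.antena3.ro/live.php.",
    "RT http://www.pldm.md/ now",
]
print(top_urls(toy))
```

Because links are counted as written, a truncated link such as http://tinyurl.com/c shows up as its own entry instead of being merged with the URL it was cut from.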

Stories on CNN, the BBC, the New York Times and our battles over whether or not the events of the week are a twitter revolution, an old-fashioned revolution or a riot do make it onto the list of URLs, but they don’t rank very highly – the #pman tag is mostly being used by Romanian speakers to share information, both through Romanian news sites and independent blogs. Again, my cynicism is shattered.

There’s a lot more I’d like to do with this data. I think it would be great to do some social graphs and try to track the spread of information. I’d love to work with a Romanian speaker to understand who was seeding and who was amplifying information in this space, and to look at the accuracy of reportorial information on the day of the protests. I plan on capturing data sets for some related tags and keywords – moldova, moldavia, chisinau, voronin, comunistii – and I’d be grateful for suggestions for keywords to study, especially Russian terms.

I’m most interested in comparing this tag to some other Twitter-reported stories. I’m hoping I can retrieve data on #Madagascar before Twitter expires its archives – while the Twitterverse was much less attuned to the tensions and coup in that country, it might provide an interesting contrasting case.

What I’m hoping to do – with my colleagues at Berkman and with anyone who wants to join the conversation – is try to figure out a set of techniques and tools we can apply to any breaking events on Twitter. As news breaks, we’ve got the opportunity to capture these conversations and study them at length – seems like an opportunity we should take advantage of and get better at.

Please feel free to download and play with the data yourself. It includes the full set of tweets, the tweets broken up per day, the top authors per day, the most popular terms per day, the top authors through the whole set, and the most popular URLs referenced in the set of tweets. Would love to hear what you come up with as well.

32 thoughts on “Studying Twitter and the Moldovan protests”

  1. Really interesting article from both a technical point of view and a sociopolitical one. If I may express my opinion, it was a combination of first-hand witness accounts, gathering & filtering and RT’ing information, at the same time being under constant SIS attack and attempted manipulation. I actually advised the CIA to take a look at the possibilities that twittering may have in informational “warfare”. If you want to contact me feel free to do so, through e-mail or via Twitter.

  2. Incredible work, Ethan.
    Now, you must take a Romanian linguist to compile the messages according to their real impact. Such events are not just about numbers. Sometime, a single sentence can have the impact that the whole world culture doesn’t have. I am also curious who triggered this movement. It appears that it was a dirty manipulation … keep tracking this, please.

  3. He he, Zamolx3s, you are tracking me from the US or what? We were writing here at the same time … don’t answer here, Ethan might not agree to twitt on his blog ..

  4. My respect for you and your work done researching the #pman tag and my country with its reality and paradoxes.

    But, I would think once again about posting here names, names and names. There are a lot of people in a small country posting under real names and note that the posts related to protests were made only for reporting and to inform the people around the world of mass disorders and police maltreatment. These people could be found and intimidated/arrested.

    Did you know that here, in the Republic of Moldova, one young man died already, killed by police. There is talk of 2-3 more. …young people disappeared. There is talk of a large, very large for such a small country, number of arrests and maltreatment by the police and secret services. (300 to 800 arrested and terrorized)

    Check these links (in romanian)
    http://www.jurnaltv.md/?article=2105
    http://www.timpul.md/news/2009/04/13/1580
    There are hundreds of such reports; I don’t want to post a lot here as it will be seen as spam.

    This post+file in the hands of the police and secret services will put pressure on Moldovan twitterers.

    Check again and decide if you want to be an accomplice of the communists, revealing real people’s names to them.

    P.S. Sorry for mistakes – my English is not the best.

  6. What a cool article!! The flowcharts and graphs really brought it to life.

    Watched a video the other day about twitter, it noted that “Twitter was a key tool in terms of ‘mobilizing people and shifting around,’ because it allows people to file and read updates via their mobile phones.” It’s so true. That is perhaps the best use for this new technology.

    Here’s the link: http://www.newsy.com/videos/twittering_a_revolution/

  7. fantastic article – this is the type of writing that keeps me coming back to wired, not the endless mac vs pc debate.

    Thank you for sharing the data set as well, i’ve only just started in the social research industry and this is something very interesting to look into in my spare time.

  8. I did a little experiment on the day of the events. These were posted at a couple of minutes interval…

    – Following the uprising in Moldova, on Twitter #pman

    – The .md uprising seems big on Twitter. I wonder how much is propaganda. Next 2 tweets are fake!!! It’s bate for the media, ignore them.

    – Russia’s 4th army tanks, based in Tiraspol, are moving towards Chisinau. #Moldova #pman

    – Moldavian navy choppers “engaged to restore order in the capital”. Pictures coming soon. #moldova #pman

    – Ok, just seconds later my tweets are being RT-ed. The ball is rolling like a headless chicken.

    – RT @Ceziceu: @gr stupid fake twitts. no army and navy in #Moldova. #pman

    – I take back what I said. Twitter as news does work. My troll/experiment was quickly uncovered. Happy.

    http://twitter.com/gr

  9. Gabriel gives a great example of data poisoning.
    Anyway, Ethan, thanks for the analysis – and glad you are looking into Twitter measurement/monitoring of activism. I took or rather organized to take a deeper dive into the various Twitter analyzers and metrics floating around out there – and started with a brief taxonomy.

    http://beth.typepad.com/beths_blog/2009/03/twitter-analytics-and-measurement-tools-a-taxonomy.html

    It would be interesting to see a social network analysis of the relationships – and through what weak or strong ties the activity on Twitter started to change. Lots to think about here – thanks again

  23. It is interesting that the tag for this post is ‘Africa’ :))

    Otherwise, thanks a lot for this post. I am working on a paper about the interpretations of the Twitter revolution in Moldova and I am trying to shed some light also on how the use of Internet (not Twitter) contributed to such a large turnout in the square. The dataset is definitely very useful, I was waiting for the Library of Congress to make the Twitter archive public, but it turns out your dataset is rich enough. Thanks again.
