“It’s a beautiful thing, the destruction of words,” wrote George Orwell in his classic political satire, Nineteen Eighty-Four. Some 60 years later, censors in China would be drawing lessons from his dystopian masterpiece, and academics would be mining “big data” for clues about censorship and social trends in China.
This intersection of censorship and big data has been the research interest of Dr. Fu King-wa, Assistant Professor at the Journalism and Media Studies Centre and the architect of a just-completed project called “Weiboscope.” The project compiled the profiles of 350,000 Sina Weibo users with more than 1,000 followers and, using software developed in-house, automatically tracked when a post was made, when it was deleted by censors, and what key words the government used to screen content.
The project collected 226 million messages from over 14 million users. Of these, roughly 10 million messages had been deleted by censors. Dr. Fu also found that 17,594 key words or phrases were subject to a “permission denied” message when searched on the Internet between January and June 2012.
“Studying big data from China is important because the country has no free press, making it hard to gauge public opinion and track social trends,” Dr. Fu said. “Traditional media both in China and the West pick up stories late, whereas we can track them almost in real time, particularly protests and government reactions.”
One key finding is that China’s netizens are devising creative ways to get around censors by using code words to mask sensitive subjects on the Internet. Beijing’s response, however, has produced some bizarre instances of Orwellian censorship. In one case, the Chinese word for “tomato” was blocked because of its association with Chongqing and the Bo Xilai corruption scandal. “Head nurse” was also blocked, as it referred to Wang Lijun, the former Chongqing police chief and associate of Bo Xilai.