data mining

Gaydar - MIT Social Network Mining Predicts Sexuality

Using data from the social network Facebook, they made a striking discovery: just by looking at a person’s online friends, they could predict whether the person was gay. They did this with a software program that looked at the gender and sexuality of a person’s friends and, using statistical analysis, made a prediction. The two students had no way of checking all of their predictions, but based on their own knowledge outside the Facebook world, their computer program appeared quite accurate for men, they said. People may be effectively “outing” themselves just by the virtual company they keep.

This sort of data mining is becoming increasingly popular, and the spread of social networks means we give away more and more about ourselves, knowingly or not. I had not even considered that people could be profiled without anyone looking at the contents of their profile, simply by looking at their friends. Of course, the best results came from a hybrid of the two methods, but still, this kind of predictive power is a marketer's or government's dream and a nightmare for anyone concerned with privacy. For most online risks there are simple recommendations for how to conduct yourself that can help protect you, but for this one I cannot really think of any. Does anyone have an idea?
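To make the friends-only idea concrete, here is a minimal Python sketch of how an undisclosed attribute might be guessed from what a person's friends have disclosed. It is not the students' program: the technique (comparing the share of friends with a label against an assumed base rate), the function names, and the toy data are all my own assumptions for illustration.

```python
from collections import Counter


def friend_label_share(friends, disclosed, label):
    """Fraction of a person's friends whose disclosed attribute equals `label`.

    Friends who have not disclosed the attribute are ignored.
    Returns None when none of the friends have disclosed anything.
    """
    known = [disclosed[f] for f in friends if f in disclosed]
    if not known:
        return None
    return Counter(known)[label] / len(known)


def guess_label(friends, disclosed, label, base_rate):
    """Guess True if `label` is over-represented among friends.

    `base_rate` is an assumed population-wide share of the label;
    a real analysis would estimate it from data.
    """
    share = friend_label_share(friends, disclosed, label)
    if share is None:
        return None
    return share > base_rate


# Toy data: every name and value here is made up.
graph = {
    "pat": ["ann", "bob", "cy", "dee"],
    "sam": ["bob", "eve"],
}
disclosed = {"ann": "x", "bob": "y", "cy": "x", "eve": "y"}

pat_share = friend_label_share(graph["pat"], disclosed, "x")
pat_guess = guess_label(graph["pat"], disclosed, "x", base_rate=0.25)
sam_guess = guess_label(graph["sam"], disclosed, "x", base_rate=0.25)
```

Note that "pat" never appears in `disclosed`. The guess is built entirely from the friends list, which is exactly why keeping your own profile empty does not help.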

Full Story

IP blocking

Got an interesting email from Anonymizer today. I used to be a subscriber to their service and this seemed like an interesting offering.

What is IP Blocking?
Because IP addresses are public and attributable, it's easy for Web site administrators to know who visits their site. When you conduct online research, you share potentially confidential information each time you visit a competitor's Web site and reveal your focus of interest.

Furthermore, any target site that recognizes visitors as belonging to a "competitor" can block access, or worse redirect you to cloaked sites designed to display false or outdated information created specifically to mislead and spoil your research.

Even if you are using a non-attributable IP address from Anonymous Surfing™, the volume and pattern of your traffic will raise a red flag of suspect activities to Web administrators who would then be able to block you out.

5 Best Practices for Conducting Competitive Intelligence & Data Harvesting Online

1. Spread traffic across as many days as possible, and at least over a 24-hour period. This keeps the instances of IP addresses seen in the Web analytics logs to a minimum.

2. Spread traffic across many IP addresses. If you are going to connect to the same site repeatedly or use robots to harvest data, you need more than a handful of IP addresses. Web administrators will quickly be able to recognize a pattern and block your IPs from accessing their site.
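The two practices quoted above amount to a scheduling problem. The following is a rough Python sketch, under my own assumptions, of spacing a fixed number of requests evenly over a time window while rotating through a pool of addresses. The function name is hypothetical and the addresses are placeholders from the reserved documentation range; nothing here comes from Anonymizer's email or product.

```python
from itertools import cycle


def spread_requests(n_requests, window_seconds, addresses):
    """Plan requests evenly across a window, rotating through addresses.

    Returns a list of (offset_seconds, address) pairs, where the offset
    is measured from the start of the window.
    """
    if n_requests <= 0 or not addresses:
        return []
    step = window_seconds / n_requests  # gap between consecutive requests
    pool = cycle(addresses)             # round-robin over the address pool
    return [(round(i * step), next(pool)) for i in range(n_requests)]


# Placeholder addresses (192.0.2.0/24 is reserved for documentation).
plan = spread_requests(
    n_requests=6,
    window_seconds=24 * 60 * 60,
    addresses=["192.0.2.10", "192.0.2.11", "192.0.2.12"],
)
```

With six requests over 24 hours, each one is four hours after the last, and each address is used only twice.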

'I've Got Nothing to Hide' and Other Misunderstandings of Privacy

Abstract
In this short essay, written for a symposium in the San Diego Law Review, Professor Daniel Solove examines the nothing to hide argument. When asked about government surveillance and data mining, many people respond by declaring: I've got nothing to hide. According to the nothing to hide argument, there is no threat to privacy unless the government uncovers unlawful activity, in which case a person has no legitimate justification to claim that it remain private. The nothing to hide argument and its variants are quite prevalent, and thus are worth addressing. In this essay, Solove critiques the nothing to hide argument and exposes its faulty underpinnings.

This short, 25-page paper covers more than a year's worth of newspaper articles and blog postings. Rarely do we see such good discourse on a complex topic like privacy. This isn't a sensationalist piece like many articles out there; the author, Professor Solove, even makes a point of not being sensationalist. I would quote the whole article if I were allowed; it was that good. But for the author's sake and for yours, I will just share a few quotes I especially liked.