IP blocking

Got an interesting email from Anonymizer today. I used to be a subscriber to their service and this seemed like an interesting offering.

What is IP Blocking?
Because IP addresses are public and attributable, it's easy for Web site administrators to know who visits their site. When you conduct online research, you share potentially confidential information each time you visit a competitor's Web site and reveal your focus of interest.

Furthermore, any target site that recognizes visitors as belonging to a "competitor" can block access or, worse, redirect you to cloaked sites that display false or outdated information created specifically to mislead you and spoil your research.

Even if you are using a non-attributable IP address from Anonymous Surfing™, the volume and pattern of your traffic can look suspicious to Web administrators, who would then be able to block you.

5 Best Practices for Conducting Competitive Intelligence & Data Harvesting Online

1. Spread traffic across as many days as possible, and over at least a 24-hour period. This keeps the number of times your IP addresses show up in the Web analytics logs at any one time to a minimum.

2. Spread traffic across many IP addresses. If you are going to connect to the same site repeatedly or use robots to harvest data, you need more than a handful of IP addresses. Otherwise, Web administrators will quickly recognize a pattern and block your IPs from accessing their site.

3. Harvesting tools should be configured to extract text data only and to skip unneeded links. If images are required, a custom system may be needed.

4. Limit the amount of data transferred on a per-minute basis. Harvesting activity should be configured to resemble an actual user going through a browser rather than an automated tool.

5. Connect to the target using the actual name of the Web site rather than its IP address. Connections made by IP address stand out when Web site administrators look at their log files, since most people using a browser will put in the site name rather than the IP address.
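
For what it's worth, here is a rough Python sketch of how I read practices 1, 3 and 4. None of this comes from the Anonymizer email: the names `spread_schedule`, `MinuteBudget` and `TextOnly` are my own, hypothetical helpers, and the numbers passed in are placeholders. It does no networking; it only works out when to send requests, how long to pause, and how to keep just the text of a page.

```python
import random
from html.parser import HTMLParser


def spread_schedule(n_requests, window_seconds=24 * 60 * 60, rng=None):
    """Practice 1 (hypothetical helper): spread requests over a time window.

    The window is cut into n_requests equal slots and one request is placed
    at a random point inside each slot. Returns offsets in seconds from the
    start of the window, in ascending order.
    """
    rng = rng or random.Random()
    slot = window_seconds / n_requests
    return [i * slot + rng.uniform(0, slot) for i in range(n_requests)]


class MinuteBudget:
    """Practice 4 (hypothetical helper): cap bytes transferred per minute."""

    def __init__(self, max_bytes_per_minute):
        self.max_bytes = max_bytes_per_minute
        self.window_start = 0.0  # start time of the current one-minute window
        self.used = 0            # bytes counted in the current window

    def wait_needed(self, now, size):
        """Account for a transfer of `size` bytes at time `now` (seconds).

        Returns how many seconds the caller should pause before making the
        transfer so that the per-minute cap is respected.
        """
        if now - self.window_start >= 60:
            # The previous window has expired, so start a fresh one.
            self.window_start = now
            self.used = 0
        if self.used + size > self.max_bytes:
            # Over budget: wait for the window to end, then count this
            # transfer against the next window.
            wait = 60 - (now - self.window_start)
            self.window_start = now + wait
            self.used = size
            return wait
        self.used += size
        return 0.0


class TextOnly(HTMLParser):
    """Practice 3 (hypothetical helper): keep text, ignore images and scripts."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # greater than zero while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip:
            self.parts.append(text)
```

A caller would sleep until each offset from `spread_schedule`, ask `MinuteBudget.wait_needed` how long to pause before each download (passing something like `time.monotonic()` as `now`), and feed the downloaded page to `TextOnly` to keep only the text in `parts`. Practice 5 needs no code beyond always building URLs from the site's host name.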