A. ManageFlitter Click-Through Analytics & Bots
The ManageFlitter PowerPost feature allows our users to schedule Tweets for optimal times and track click-throughs on links in the scheduled tweets.
We have now integrated bot detection mechanisms that will detect and exclude most bots from the click-through analytic metrics, resulting in a more accurate measure for our customers (see analytics graph below).
Let's take a closer look at a real case study to see the impact of bots and crawlers on Tweet analytics.
Following is a Tweet that we recently tweeted out via PowerPost on the @manageflitter Twitter account.
Twitter's Stock Is Going Crazy Again. Here Are A Few Ideas On Why. http://t.co/g8pVs8kaEm— ManageFlitter (@ManageFlitter) December 26, 2013
As you can see, a single link that we Tweeted recently attracted a total of 140 clicks. 118 of those clicks, a whopping 84%, were identified as bots, leaving us with 22 genuine "human" clicks.
Let us drill down further and determine what kind of bots generated those 118 clicks.
Here are the "user agent" strings of a handful of bots that ManageFlitter detected:
| User Agent | Clicks | What is it? |
| --- | --- | --- |
| Empty user agent | 52 | Who knows! |
| Ruby | 4 | A bot written in the Ruby language |
| Twitterbot/1.0 | 3 | Twitter bots checking posted links |
| Apache-HttpClient/4.2.3 (java 1.5) | 2 | A bot written in Java, possibly an Android app |
| Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) | 1 | Googlebot |
| Mozilla/5.0 (compatible; Embedly/0.2; +http://support.embed.ly/) | 1 | Embed.ly |
| MetaURI API/2.0 +metauri.com | 1 | metauri.com |
| Readability/x68xx2 - http://readability.com/about/ | 1 | readability.com |
| python-requests/1.2.0 CPython/2.7.3 Linux/3.2.0-41-virtual | 1 | A bot written in Python using the requests library |
| PycURL/7.19.5 | 1 | A bot written in Python using PycURL |
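The kind of classification shown in the table above can be sketched with a simple User-Agent blocklist. To be clear, this is a minimal illustration rather than ManageFlitter's actual detection logic; the patterns are drawn from the table above, and a real system would combine many more signals (IP reputation, request timing, and so on).

```python
# A minimal sketch of user-agent based bot detection using a
# substring blocklist drawn from the table above. Hypothetical,
# not ManageFlitter's actual implementation.

BOT_PATTERNS = [
    "bot",                 # Googlebot, Twitterbot, and most self-identifying crawlers
    "ruby",
    "python",
    "pycurl",
    "apache-httpclient",
    "embedly",
    "metauri",
    "readability",
]

def is_bot(user_agent: str) -> bool:
    """Classify a click as bot traffic from its User-Agent string."""
    if not user_agent:     # an empty user agent is almost certainly not a browser
        return True
    ua = user_agent.lower()
    return any(pattern in ua for pattern in BOT_PATTERNS)

clicks = [
    "",                                              # empty -> bot
    "Twitterbot/1.0",                                # bot
    "Mozilla/5.0 (Windows NT 6.2; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/32.0.1667.0 Safari/537.36",              # genuine browser
]
human_clicks = sum(1 for ua in clicks if not is_bot(ua))
```

Substring matching is crude, but it already catches every bot in the table above; the harder cases are bots that lie about their user agent, which we return to later.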
Two examples of legitimate user agents are:

- Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1667.0 Safari/537.36 (Chrome version 32 on a 64-bit version of Windows)
- Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:25.0) Gecko/20100101 Firefox/25.0 (Firefox version 25 on Mac OS X 10.6)
Moving forward we are going to continue to improve our bot detection methods to identify more bots and improve accuracy.
B. Bots: What & Why
Humans are not the only ones browsing and clicking on links on the internet! Programmed bots and scripts quietly crawl from one page to another, carefully reading, analysing and collecting information from dozens of pages in the blink of an eye.
In fact, a recent study by Incapsula found that bots (or "non-human" traffic) make up 61.5% of all website traffic. That's a 21% increase from the previous year's figures, with human traffic making up only the remaining 38.5%.
The figures were calculated from a sample data set collected by Incapsula which included 1.45 billion visits to a group of 20,000 websites over a time period of 90 days.
Report image courtesy of Incapsula
C. Friendly Bots - Serve Us Humans!
So what do these bots do and why are there so many of them? "Bot" is a rather general term. Google's bot, or more precisely Googlebot, is undoubtedly the most famous and most powerful bot on the internet. Googlebots constantly browse webpages to keep the Google search index up to date.
Another group of bots consists of those belonging to modern web services or mobile applications. For example, bookmarking services may occasionally visit bookmarked URLs to check whether a link is still available, or simply to fetch the page title.
Some mobile applications have bots that fetch content, then format and display it to the user differently from the original. Some websites use bots to scrape pricing information about various products, either to find bargains or to monitor the price of a particular product.
Another popular use case comes from the SEO-related tools and services that need to frequently check websites for changes in SEO metrics or to monitor website health.
Even though this group of bots contribute to the total volume of "non-human" traffic, they still largely - although indirectly - represent human usage of the web and its corresponding services.
After all, most of these bots work to serve us humans in one way or another. This group of bots is likely the most significant cause of the 21% increase in the percentage of non-human traffic from last year found by Incapsula.
D. Evil Bots - "Great blog post! here's a link to my website ... "
Spammers create bots that constantly search for websites where they can post a spam comment, perhaps placing a link to their own website somewhere on the page. Spam bots have evolved over the last few years, and some are capable of performing very sophisticated tricks to appear as human-like as possible and circumvent spam detection mechanisms.
For example, they may be equipped with technology to automatically solve CAPTCHAs and other anti-spam challenges. They will utilise a large pool of IP addresses scattered across many physical locations to avoid being blocked. Some will even deliberately operate at slower speeds to simulate the usage patterns of actual human users, because some anti-spam techniques rely on the fact that most spam bots operate tens or hundreds of times faster than humans.
The good news is that spam bots have become less prevalent, down to 0.5% of traffic from 2% the previous year ( http://www.incapsula.com/blog/bot-traffic-report-2013.html ). This change has been attributed to Google's efforts in fighting spam by periodically tweaking its algorithms, making it more expensive and less rewarding for spammers to stay in business.
E. Bots - An Even Darker side
If you think spam bots are scary, you probably aren't aware of hacking-tool bots. These bots are not always used for malicious activities; they can serve as tools to improve security, but they are also popular with, and frequently used by, cyber criminals.
These bots scan websites looking for outdated and vulnerable versions of software to exploit. They will most likely generate a report or notify a human when they come across something interesting, but sophisticated ones may be capable of going a step further and automatically exploiting the vulnerabilities they find.
They will peek inside sitemap files and robots.txt looking for pages that you specifically didn't want indexed. They will find login forms and password-protected pages and try common usernames and passwords to see if they can get in, scooping up any email addresses they come across along the way.
Friendly and ethical bots identify themselves as bots using a "User-Agent" string, which is a piece of text that describes what kind of web browser or tool is fetching the page. For example, when Googlebot visits a page, the user-agent string says something like "Googlebot/2.1 (+http://www.google.com/bot.html)".
On the other hand there are malicious bots who feel no obligation to send a truthful user-agent string along with their requests. They will simply lie about who they are.
For example, a malicious bot could use Googlebot's user-agent string to pretend to be Googlebot. Typically these bots have a large list of user agents to pick from, either at random or, in more sophisticated bots, chosen specifically for what they want to achieve.
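A spoofed Googlebot user agent can still be caught, though. Google itself recommends verifying Googlebot with a reverse-DNS check: resolve the visiting IP address to a hostname, confirm the hostname belongs to googlebot.com or google.com, then resolve that hostname back to an IP and make sure it matches. A rough sketch using Python's standard socket module (verify_googlebot performs live DNS lookups; is_google_host is the pure hostname check):

```python
# Sketch of reverse-DNS verification for Googlebot. A bot can fake its
# User-Agent string, but it cannot fake the DNS records for its IP.

import socket

def is_google_host(hostname: str) -> bool:
    """True if a reverse-DNS hostname belongs to Google's crawler domains."""
    host = hostname.rstrip(".").lower()
    return host.endswith(".googlebot.com") or host.endswith(".google.com")

def verify_googlebot(ip: str) -> bool:
    """Confirm that an IP claiming to be Googlebot really is one."""
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)            # reverse lookup
        if not is_google_host(hostname):
            return False
        forward_ips = socket.gethostbyname_ex(hostname)[2]   # forward lookup
        return ip in forward_ips                             # must resolve back
    except OSError:
        return False
```

The same idea works for other crawlers that publish their host domains (such as Bingbot), but it only helps with bots that impersonate a known crawler; a bot pretending to be an ordinary Chrome user leaves nothing to verify against.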
These bots cannot be easily detected and are a source of problems for companies and individuals interested in keeping track of the level of attention their webpages are getting.
As mentioned earlier, the volume of bot activity has now reached a level where you can no longer ignore it. If you see 1,000 clicks on the links you are publishing, 600 of them may very well come from bots, resulting in inflated and inaccurate metrics.
Hopefully the above information helps provide an understanding of the hazy world of machines acting like humans.
Appropriately, we can end with this quote from Eric Schmidt, the executive chairman of Google:
"The race is between computers and people and the people need to win."