Apache Access Logs: Find Spiders by Counting Requests per IP Address

If you would like a quick summary of the IP addresses that are hammering your server with lots of requests, quite possibly for malicious or otherwise nefarious reasons, then try this little bash script:

#!/bin/bash
LOG_FILE=/var/www/vhosts/DOMAIN.co.uk/statistics/logs/access_log
OUT_FILE=/tmp/spider_analysis

# Generate a file with the top 20 IP addresses by number of requests
# (field 1 of each access log line is the client IP)
awk '{print $1}' "$LOG_FILE" | sort | uniq -c | sort -nr | head -n 20 > "$OUT_FILE"

echo "Top 20 IP addresses by number of requests"
cat "$OUT_FILE"

# Read the summary file line by line; each line is "<count> <ip>",
# so read can split it directly and IFS does not need to be changed
while read -r COUNT IP_ADD
do
    echo ""
    echo "---------------------------------"
    echo ""
    echo "$IP_ADD has made $COUNT requests"
    echo "Whois Information"
    whois "$IP_ADD"
    #lynx -dump "http://who.cc/$IP_ADD" # whois was blocked on the server I was using for some reason, so lynx is a workaround
    echo ""
    echo "---------------------------------"
    echo ""
done < "$OUT_FILE"

Use this to get an idea of which IPs are hitting the server most heavily.
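To see what the counting step produces on its own, here is a minimal sketch that wraps the same pipeline in a function and runs it against a tiny made-up log. The function name top_ips, the sample log lines and the documentation-range IP addresses are all assumptions for illustration, not part of the original script.

```bash
#!/bin/bash
# top_ips is a hypothetical helper, not part of the original script.
# Usage: top_ips LOG_FILE [N]   (N defaults to 20)
top_ips() {
    local log_file=$1
    local n=${2:-20}
    # Field 1 of each access log line is the client IP
    awk '{print $1}' "$log_file" | sort | uniq -c | sort -nr | head -n "$n"
}

# Build a tiny made-up log (documentation-range IPs) to try it out
sample_log=$(mktemp)
cat > "$sample_log" <<'EOF'
203.0.113.5 - - [10/Oct/2023:13:55:36 +0000] "GET / HTTP/1.1" 200 2326
198.51.100.7 - - [10/Oct/2023:13:55:37 +0000] "GET /a HTTP/1.1" 200 512
203.0.113.5 - - [10/Oct/2023:13:55:38 +0000] "GET /b HTTP/1.1" 200 512
203.0.113.5 - - [10/Oct/2023:13:55:39 +0000] "GET /c HTTP/1.1" 404 209
198.51.100.7 - - [10/Oct/2023:13:55:40 +0000] "GET /d HTTP/1.1" 200 512
192.0.2.1 - - [10/Oct/2023:13:55:41 +0000] "GET /e HTTP/1.1" 200 512
EOF

# Show the two busiest IPs; each output line is a count followed by an IP
top_ips "$sample_log" 2
rm -f "$sample_log"
```

The count column is padded with leading spaces by uniq -c, so pipe the result through awk '{print $1, $2}' if you need a fixed format.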

Usually you would expect many of these to be search engines, which you likely want to allow. However, if you see any domestic or other unexpected IP addresses, you may choose to block them.
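If you do decide to block some of them, one option is a firewall rule. The sketch below only prints candidate iptables commands for review and runs nothing. The helper name suggest_blocks and the threshold of 1000 requests are assumptions for illustration, and iptables is just one possible way of blocking, so check each address with whois before acting on any of the output.

```bash
#!/bin/bash
# suggest_blocks is a hypothetical helper; the default threshold is an arbitrary assumption.
# Reads "<count> <ip>" lines (the uniq -c format) on stdin and prints,
# without running it, an iptables command for each IP at or above the threshold.
suggest_blocks() {
    local threshold=${1:-1000}
    awk -v min="$threshold" \
        '$1 + 0 >= min + 0 {print "iptables -A INPUT -s " $2 " -j DROP"}'
}

# Example with made-up counts: only the first IP reaches the threshold
printf '%s\n' '   5000 203.0.113.5' '    120 198.51.100.7' | suggest_blocks 1000
# prints: iptables -A INPUT -s 203.0.113.5 -j DROP
```

With the main script you could feed it the summary file, for example suggest_blocks 1000 < /tmp/spider_analysis, then copy across only the lines you are happy with.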


Tags: security, spider, linux, apache, bash, access, block, log, examine, whois, bot, robot, bad