How to count occurrences of text in a file?












I have a log file sorted by IP address, and I want to find the number of occurrences of each unique IP address.
How can I do this with bash? Ideally the count would be listed next to each IP, such as:



5.135.134.16 count: 5
13.57.220.172 count: 30
18.206.226.75 count: 2


and so on.



Here’s a sample of the log:



5.135.134.16 - - [23/Mar/2019:08:42:54 -0400] "GET /wp-login.php HTTP/1.1" 200 2988 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
5.135.134.16 - - [23/Mar/2019:08:42:55 -0400] "GET /wp-login.php HTTP/1.1" 200 2988 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
5.135.134.16 - - [23/Mar/2019:08:42:55 -0400] "POST /wp-login.php HTTP/1.1" 200 3836 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
5.135.134.16 - - [23/Mar/2019:08:42:55 -0400] "POST /wp-login.php HTTP/1.1" 200 3988 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
5.135.134.16 - - [23/Mar/2019:08:42:56 -0400] "POST /xmlrpc.php HTTP/1.1" 200 413 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
13.57.220.172 - - [23/Mar/2019:11:01:05 -0400] "GET /wp-login.php HTTP/1.1" 200 2988 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
13.57.220.172 - - [23/Mar/2019:11:01:06 -0400] "POST /wp-login.php HTTP/1.1" 200 3985 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
13.57.220.172 - - [23/Mar/2019:11:01:07 -0400] "GET /wp-login.php HTTP/1.1" 200 2988 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
13.57.220.172 - - [23/Mar/2019:11:01:08 -0400] "POST /wp-login.php HTTP/1.1" 200 3833 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
13.57.220.172 - - [23/Mar/2019:11:01:09 -0400] "GET /wp-login.php HTTP/1.1" 200 2988 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
13.57.220.172 - - [23/Mar/2019:11:01:11 -0400] "POST /wp-login.php HTTP/1.1" 200 3836 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
13.57.220.172 - - [23/Mar/2019:11:01:12 -0400] "GET /wp-login.php HTTP/1.1" 200 2988 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
13.57.220.172 - - [23/Mar/2019:11:01:15 -0400] "POST /wp-login.php HTTP/1.1" 200 3837 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
13.57.220.172 - - [23/Mar/2019:11:01:17 -0400] "POST /xmlrpc.php HTTP/1.1" 200 413 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
13.57.233.99 - - [23/Mar/2019:04:17:45 -0400] "GET / HTTP/1.1" 200 25160 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
18.206.226.75 - - [23/Mar/2019:21:58:07 -0400] "GET /wp-login.php HTTP/1.1" 200 2988 "https://www.google.com/url?3a622303df89920683e4421b2cf28977" "Mozilla/5.0 (Windows NT 6.2; rv:33.0) Gecko/20100101 Firefox/33.0"
18.206.226.75 - - [23/Mar/2019:21:58:07 -0400] "POST /wp-login.php HTTP/1.1" 200 3988 "https://www.google.com/url?3a622303df89920683e4421b2cf28977" "Mozilla/5.0 (Windows NT 6.2; rv:33.0) Gecko/20100101 Firefox/33.0"
18.213.10.181 - - [23/Mar/2019:14:45:42 -0400] "GET /wp-login.php HTTP/1.1" 200 2988 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
18.213.10.181 - - [23/Mar/2019:14:45:42 -0400] "GET /wp-login.php HTTP/1.1" 200 2988 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
18.213.10.181 - - [23/Mar/2019:14:45:42 -0400] "GET /wp-login.php HTTP/1.1" 200 2988 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"









  • With “bash”, do you mean the plain shell or the command line in general? – dessert, Mar 28 at 21:55

  • Do you have any database software available to use? – SpacePhoenix, 2 days ago

  • Related – Julien Lopez, 2 days ago

  • The log is from an Apache 2 server, not really a database. Bash is what I would prefer, in a general use case. I see the Python and Perl solutions; if they are good for someone else, that is great. The initial sorting was done with sort -V, though I think that wasn't required. I sent the top 10 abusers of the login page to the system admin with recommendations for banning the respective subnets. For example, one IP hit the login page over 9000 times; that IP and its subnet are now blacklisted. I'm sure we could automate this, though that is a different question. – j0h, 7 hours ago
















Tags: command-line, bash, sort, uniq






edited Mar 28 at 22:25 by dessert · asked Mar 28 at 21:51 by j0h

8 Answers






































You can use grep and uniq for the list of addresses, loop over them and grep again for the count:



for i in $(<log grep -o '^[^ ]*' | uniq); do
    printf '%s count %d\n' "$i" $(<log grep -c "$i")
done


grep -o '^[^ ]*' outputs every character from the beginning (^) until the first space of each line, uniq removes repeated lines, thus leaving you with a list of IP addresses. Thanks to command substitution, the for loop loops over this list printing the currently processed IP followed by “ count ” and the count. The latter is computed by grep -c, which counts the number of lines with at least one match.



Example run



$ for i in $(<log grep -o '^[^ ]*'|uniq);do printf '%s count %d\n' "$i" $(<log grep -c "$i");done
5.135.134.16 count 5
13.57.220.172 count 9
13.57.233.99 count 1
18.206.226.75 count 2
18.213.10.181 count 3
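One caveat, as an editorial aside rather than part of the original answer: a bare grep -c "$i" treats the dots in the IP as regex wildcards and counts matches anywhere in the line, so the same digits appearing elsewhere (for example, inside a referrer URL) could inflate a count. A hedged sketch that anchors the pattern to the start of the line, using a throwaway /tmp/log stand-in for the real file:

```shell
# Tiny stand-in log: the first whitespace-separated field is the IP.
cat > /tmp/log <<'EOF'
5.135.134.16 - - "GET /wp-login.php"
5.135.134.16 - - "POST /wp-login.php"
13.57.220.172 - - "GET /wp-login.php"
EOF

# Anchoring with ^ and a trailing space stops one address from being
# counted inside a longer line; the dots are still regex dots, which is
# mostly harmless here because the match is pinned to the IP field.
while read -r ip; do
    printf '%s count %d\n' "$ip" "$(grep -c "^$ip " /tmp/log)"
done < <(grep -o '^[^ ]*' /tmp/log | uniq)
```

With the three-line sample this prints `5.135.134.16 count 2` followed by `13.57.220.172 count 1`. The `< <(...)` process substitution requires bash, which matches the question.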


























  • This solution iterates over the input file repeatedly, once for each IP address, which will be very slow if the file is large. The other solutions using uniq -c or awk only need to read the file once. – David, Mar 29 at 1:56

  • @David this is true, but this would have been my first go at it as well, knowing that grep counts. Unless performance is measurably a problem... don't prematurely optimize? – D. Ben Knoble, 2 days ago

  • I would not call it a premature optimization, given that the more efficient solution is also simpler, but to each their own. – David, 2 days ago

































You can use the cut and uniq tools:



cut -d ' ' -f1 test.txt  | uniq -c
5 5.135.134.16
9 13.57.220.172
1 13.57.233.99
2 18.206.226.75
3 18.213.10.181


Explanation:

  • cut -d ' ' -f1: extract the first field (the IP address)

  • uniq -c: report repeated lines and display the number of occurrences
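As dessert notes in a comment, the uniq -c output can be post-processed into the OP's exact "count:" format. A small sketch (the /tmp/test.txt file is a stand-in for the real log):

```shell
# Stand-in input; the first field is the IP, as in the access log.
cat > /tmp/test.txt <<'EOF'
5.135.134.16 - - "GET /"
5.135.134.16 - - "GET /"
13.57.220.172 - - "GET /"
EOF

# uniq -c emits "  COUNT IP"; awk swaps the two fields around.
cut -d ' ' -f1 /tmp/test.txt | uniq -c | awk '{print $2, "count: " $1}'
```

This prints `5.135.134.16 count: 2` and `13.57.220.172 count: 1`.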






– Mikael Flora
















  • One could use sed, e.g. sed -E 's/ *(\S*) *(\S*)/\2 count: \1/' to get the output exactly like the OP wanted. – dessert, Mar 28 at 22:22

  • This should be the accepted answer, as the one by dessert needs to read the file repeatedly so is much slower. And you can easily use sort file | cut ... in case you're not sure if the file is already sorted. – Guntram Blohm, 2 days ago

































If you don't specifically require the given output format, then I would recommend the already posted cut + uniq based answer.



If you really need the given output format, a single-pass way to do it in Awk would be



awk '{c[$1]++} END{for(i in c) print i, "count: " c[i]}' log


This is somewhat non-ideal when the input is already sorted since it unnecessarily stores all the IPs into memory - a better, though more complicated, way to do it in the pre-sorted case (more directly equivalent to uniq -c) would be:



awk '
NR==1 {last=$1}
$1 != last {print last, "count: " c[last]; last = $1}
{c[$1]++}
END {print last, "count: " c[last]}
'


Ex.



$ awk 'NR==1 {last=$1} $1 != last {print last, "count: " c[last]; last = $1} {c[$1]++} END{print last, "count: " c[last]}' log
5.135.134.16 count: 5
13.57.220.172 count: 9
13.57.233.99 count: 1
18.206.226.75 count: 2
18.213.10.181 count: 3






  • It would be easy to change the cut + uniq based answer with sed to appear in the demanded format. – Peter A. Schneider, 2 days ago

  • @PeterA.Schneider yes it would - I believe that was already pointed out in comments to that answer. – steeldriver, 2 days ago

  • Ah, yes, I see. – Peter A. Schneider, 2 days ago

































Here is one possible solution:





IN_FILE="file.log"
for IP in $(awk '{print $1}' "$IN_FILE" | sort -u)
do
    echo -en "${IP}\tcount: "
    grep -c "$IP" "$IN_FILE"
done



  • replace file.log with the actual file name.

  • the command substitution expression $(awk '{print $1}' "$IN_FILE" | sort -u) will provide a list of the unique values of the first column.

  • then grep -c will count each of these values within the file.




$ IN_FILE="file.log"; for IP in $(awk '{print $1}' "$IN_FILE" | sort -u); do echo -en "${IP}\tcount: "; grep -c "$IP" "$IN_FILE"; done
13.57.220.172 count: 9
13.57.233.99 count: 1
18.206.226.75 count: 2
18.213.10.181 count: 3
5.135.134.16 count: 5
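As an editorial aside: since the question asks for bash specifically, the same one-pass counting can be done in the shell itself with an associative array (bash 4+). A sketch, assuming the IP is the first whitespace-separated field; /tmp/file.log is a throwaway stand-in for the real log:

```shell
# Stand-in log file.
cat > /tmp/file.log <<'EOF'
5.135.134.16 - - "GET /"
5.135.134.16 - - "GET /"
18.213.10.181 - - "GET /"
EOF

# One pass: read the first field of each line and bump its counter.
declare -A count
while read -r ip _; do
    count[$ip]=$(( ${count[$ip]:-0} + 1 ))
done < /tmp/file.log

# Iteration order over an associative array is unspecified.
for ip in "${!count[@]}"; do
    printf '%s count: %d\n' "$ip" "${count[$ip]}"
done
```

Unlike the grep loop above, this reads the file only once and does not require the input to be sorted.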





  • Prefer printf... – D. Ben Knoble, 2 days ago

  • This means you need to process the entire file multiple times. Once to get the list of IPs and then once more for each of the IPs you find. – terdon, 2 days ago

































Some Perl:



$ perl -lae '$k{$F[0]}++; }{ print "$_ count: $k{$_}" for keys(%k)' log 
13.57.233.99 count: 1
18.206.226.75 count: 2
13.57.220.172 count: 9
5.135.134.16 count: 5
18.213.10.181 count: 3


This is the same idea as Steeldriver's awk approach, but in Perl. The -a causes perl to automatically split each input line into the array @F, whose first element (the IP) is $F[0]. So, $k{$F[0]}++ will create the hash %k, whose keys are the IPs and whose values are the number of times each IP was seen. The }{ is funky perlspeak for "do the rest at the very end, after processing all input". So, at the end, the script will iterate over the keys of the hash and print the current key ($_) along with its value ($k{$_}).



And, just so people don't think that perl forces you to write scripts that look like cryptic scribblings, this is the same thing in a less condensed form:



perl -e '
while (my $line = <STDIN>) {
    @fields = split(/ /, $line);
    $ip = $fields[0];
    $counts{$ip}++;
}
foreach $ip (keys(%counts)) {
    print "$ip count: $counts{$ip}\n";
}' < log



















Maybe this is not what the OP wants; however, if we know that the IP address length will be limited to 15 characters, a quicker way to display the counts with unique IPs from a huge log file can be achieved using the uniq command alone:



    $ uniq -w 15 -c log

    5 5.135.134.16 - - [23/Mar/2019:08:42:54 -0400] ...
    9 13.57.220.172 - - [23/Mar/2019:11:01:05 -0400] ...
    1 13.57.233.99 - - [23/Mar/2019:04:17:45 -0400] ...
    2 18.206.226.75 - - [23/Mar/2019:21:58:07 -0400] ...
    3 18.213.10.181 - - [23/Mar/2019:14:45:42 -0400] ...


Options:

  • -w N compares no more than N characters in lines

  • -c will prefix lines by the number of occurrences



Alternatively, for exactly formatted output I prefer awk (it should also work for IPv6 addresses); ymmv.



$ awk 'NF { print $1 }' log | sort -h | uniq -c | awk '{printf "%s count: %d\n", $2,$1 }'

    5.135.134.16 count: 5
    13.57.220.172 count: 9
    13.57.233.99 count: 1
    18.206.226.75 count: 2
    18.213.10.181 count: 3


    Note that uniq won't detect repeated lines in the input file if they are not adjacent, so it may be necessary to sort the file.






– Y. Pradhan
















  • Likely good enough in practice, but worth noting the corner cases. Only 6 probably constant characters after the IP (" - - ["). But in theory the address could be up to 8 characters shorter than the maximum, so a change of date could split the count for such an IP. And as you hint, this won't work for IPv6. – Martin Thornton, 2 days ago

  • I like it, I didn't know uniq could count! – j0h, 13 hours ago

































FWIW, Python 3:

from collections import Counter

with open('sample.log') as file:
    counts = Counter(line.split()[0] for line in file)

for ip_address, count in counts.items():
    print('%-15s count: %d' % (ip_address, count))


Output:

13.57.233.99    count: 1
18.213.10.181   count: 3
5.135.134.16    count: 5
18.206.226.75   count: 2
13.57.220.172   count: 9



















      cut -f1 -d- my.log | sort | uniq -c


Explanation: Take the first field of my.log, splitting on dashes (-), and sort it. uniq needs sorted input; -c tells it to count occurrences.






– PhD





















        8 Answers









        13














        You can use grep and uniq for the list of addresses, loop over them and grep again for the count:



        for i in $(<log grep -o '^[^ ]*' | uniq); do
        printf '%s count %d\n' "$i" $(<log grep -c "$i")
        done


        grep -o '^[^ ]*' outputs every character from the beginning (^) until the first space of each line, uniq removes repeated lines, thus leaving you with a list of IP addresses. Thanks to command substitution, the for loop loops over this list printing the currently processed IP followed by “ count ” and the count. The latter is computed by grep -c, which counts the number of lines with at least one match.



        Example run



        $ for i in $(<log grep -o '^[^ ]*'|uniq);do printf '%s count %d\n' "$i" $(<log grep -c "$i");done
        5.135.134.16 count 5
        13.57.220.172 count 9
        13.57.233.99 count 1
        18.206.226.75 count 2
        18.213.10.181 count 3


























        • 12





          This solution iterates over the input file repeatedly, once for each IP address, which will be very slow if the file is large. The other solutions using uniq -c or awk only need to read the file once.

          – David
          Mar 29 at 1:56






        • 1





          @David this is true, but this would have been my first go at it as well, knowing that grep counts. Unless performance is measurably a problem... don't prematurely optimize?

          – D. Ben Knoble
          2 days ago






        • 3





          I would not call it a premature optimization, given that the more efficient solution is also simpler, but to each their own.

          – David
          2 days ago
















        edited Mar 28 at 23:11

        answered Mar 28 at 22:08 by dessert
        35














        You can use cut and uniq tools:



        cut -d ' ' -f1 test.txt  | uniq -c
        5 5.135.134.16
        9 13.57.220.172
        1 13.57.233.99
        2 18.206.226.75
        3 18.213.10.181


        Explanation:

        • cut -d ' ' -f1 : extract the first field (the IP address)

        • uniq -c : report repeated lines and display the number of occurrences
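A sed stage can reshape the `uniq -c` output into the exact "IP count: N" form the question asks for. A minimal sketch, using a made-up sample file path and shortened log lines:

```shell
# Build a tiny sorted sample log (hypothetical path; real logs carry full request lines).
printf '%s\n' '1.2.3.4 - - "GET /"' '1.2.3.4 - - "POST /"' '5.6.7.8 - - "GET /"' > /tmp/sample.log
# uniq -c prints "  COUNT IP"; sed swaps the columns into "IP count: N".
cut -d ' ' -f1 /tmp/sample.log | uniq -c | sed -E 's/^ *([0-9]+) +(.*)/\2 count: \1/'
```

This prints `1.2.3.4 count: 2` and `5.6.7.8 count: 1`, still reading the file only once.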
















        New contributor




        Mikael Flora is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
        Check out our Code of Conduct.
















        • 6





          One could use sed, e.g. sed -E 's/ *(\S*) *(\S*)/\2 count: \1/' to get the output exactly like the OP wanted.

          – dessert
          Mar 28 at 22:22






        • 2





          This should be the accepted answer, as the one by dessert needs to read the file repeatedly so is much slower. And you can easily use sort file | cut .... in case you're not sure if the file is already sorted.

          – Guntram Blohm
          2 days ago
















        edited Mar 28 at 22:34

        answered Mar 28 at 22:04 by Mikael Flora
        13














        If you don't specifically require the given output format, then I would recommend the already posted cut + uniq based answer.



        If you really need the given output format, a single-pass way to do it in Awk would be



        awk '{c[$1]++} END{for(i in c) print i, "count: " c[i]}' log


        This is somewhat non-ideal when the input is already sorted since it unnecessarily stores all the IPs into memory - a better, though more complicated, way to do it in the pre-sorted case (more directly equivalent to uniq -c) would be:



        awk '
        NR==1 {last=$1}
        $1 != last {print last, "count: " c[last]; last = $1}
        {c[$1]++}
        END {print last, "count: " c[last]}
        '


        Ex.



        $ awk 'NR==1 {last=$1} $1 != last {print last, "count: " c[last]; last = $1} {c[$1]++} END{print last, "count: " c[last]}' log
        5.135.134.16 count: 5
        13.57.220.172 count: 9
        13.57.233.99 count: 1
        18.206.226.75 count: 2
        18.213.10.181 count: 3































        • It would be easy to adapt the cut + uniq based answer with sed to produce the demanded format.

          – Peter A. Schneider
          2 days ago











        • @PeterA.Schneider yes it would - I believe that was already pointed out in comments to that answer

          – steeldriver
          2 days ago













        • Ah, yes, I see.

          – Peter A. Schneider
          2 days ago
















        edited Mar 28 at 22:36

        answered Mar 28 at 22:12 by steeldriver



        8














        Here is one possible solution:





        IN_FILE="file.log"
        for IP in $(awk '{print $1}' "$IN_FILE" | sort -u)
        do
        echo -en "${IP}\tcount: "
        grep -c "$IP" "$IN_FILE"
        done



        • replace file.log with the actual file name.

        • the command substitution expression $(awk '{print $1}' "$IN_FILE" | sort -u) will provide a list of the unique values of the first column.

        • then grep -c will count each of these values within the file.




        $ IN_FILE="file.log"; for IP in $(awk '{print $1}' "$IN_FILE" | sort -u); do echo -en "${IP}\tcount: "; grep -c "$IP" "$IN_FILE"; done
        13.57.220.172 count: 9
        13.57.233.99 count: 1
        18.206.226.75 count: 2
        18.213.10.181 count: 3
        5.135.134.16 count: 5


























        • 1





          Prefer printf...

          – D. Ben Knoble
          2 days ago






        • 1





          This means you need to process the entire file multiple times. Once to get the list of IPs and then once more for each of the IPs you find.

          – terdon
          2 days ago
















        edited Mar 28 at 22:20

        answered Mar 28 at 22:07 by pa4080



        5














        Some Perl:



        $ perl -lae '$k{$F[0]}++; }{ print "$_ count: $k{$_}" for keys(%k)' log 
        13.57.233.99 count: 1
        18.206.226.75 count: 2
        13.57.220.172 count: 9
        5.135.134.16 count: 5
        18.213.10.181 count: 3


        This is the same idea as Steeldriver's awk approach, but in Perl. The -a causes perl to automatically split each input line into the array @F, whose first element (the IP) is $F[0]. So, $k{$F[0]}++ will create the hash %k, whose keys are the IPs and whose values are the number of times each IP was seen. The }{ is funky perlspeak for "do the rest at the very end, after processing all input". So, at the end, the script will iterate over the keys of the hash and print the current key ($_) along with its value ($k{$_}).



        And, just so people don't think that perl forces you to write scripts that look like cryptic scribblings, this is the same thing in a less condensed form:



        perl -e '
        while (my $line=<STDIN>){
        @fields = split(/ /, $line);
        $ip = $fields[0];
        $counts{$ip}++;
        }
        foreach $ip (keys(%counts)){
        print "$ip count: $counts{$ip}\n"
        }' < log

































          5














          Some Perl:



          $ perl -lae '$k{$F[0]}++; }{ print "$_ count: $k{$_}" for keys(%k)' log 
          13.57.233.99 count: 1
          18.206.226.75 count: 2
          13.57.220.172 count: 9
          5.135.134.16 count: 5
          18.213.10.181 count: 3


          This is the same idea as Steeldriver's awk approach, but in Perl. The -a causes perl to automatically split each input line into the array @F, whose first element (the IP) is $F[0]. So, $k{$F[0]}++ will create the hash %k, whose keys are the IPs and whose values are the number of times each IP was seen. The }{ is funky perlspeak for "do the rest at the very end, after processing all input". So, at the end, the script will iterate over the keys of the hash and print the current key ($_) along with its value ($k{$_}).



          And, just so people don't think that perl forces you to write script that look like cryptic scribblings, this is the same thing in a less condensed form:



          perl -e '
          while (my $line=<STDIN>){
          @fields = split(/ /, $line);
          $ip = $fields[0];
          $counts{$ip}++;
          }
          foreach $ip (keys(%counts)){
          print "$ip count: $counts{$ip}n"
          }' < log





          share|improve this answer


























            5












            5








            5







            Some Perl:

            $ perl -lae '$k{$F[0]}++; }{ print "$_ count: $k{$_}" for keys(%k)' log
            13.57.233.99 count: 1
            18.206.226.75 count: 2
            13.57.220.172 count: 9
            5.135.134.16 count: 5
            18.213.10.181 count: 3

            This is the same idea as Steeldriver's awk approach, but in Perl. The -a causes perl to automatically split each input line into the array @F, whose first element (the IP) is $F[0]. So, $k{$F[0]}++ will create the hash %k, whose keys are the IPs and whose values are the number of times each IP was seen. The }{ is funky perlspeak for "do the rest at the very end, after processing all input". So, at the end, the script will iterate over the keys of the hash and print the current key ($_) along with its value ($k{$_}).

            And, just so people don't think that perl forces you to write scripts that look like cryptic scribblings, this is the same thing in a less condensed form:

            perl -e '
            while (my $line = <STDIN>) {
                @fields = split(/ /, $line);
                $ip = $fields[0];
                $counts{$ip}++;
            }
            foreach $ip (keys(%counts)) {
                print "$ip count: $counts{$ip}\n";
            }' < log
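Since the question asks about bash specifically, the same hash-counting idea can also be sketched in pure bash with an associative array (bash 4+). This is a sketch, not anyone's posted answer; the log lines in the here-document are copied from the question's sample purely so the snippet is self-contained — against a real file you would redirect `< log` instead.

```shell
#!/usr/bin/env bash
# Count occurrences of the first whitespace-separated field (the IP)
# using a bash associative array. Requires bash 4 or later.
declare -A counts
while read -r ip _rest; do
    counts[$ip]=$(( ${counts[$ip]:-0} + 1 ))
done <<'EOF'
5.135.134.16 - - [23/Mar/2019:08:42:54 -0400] "GET /wp-login.php HTTP/1.1"
5.135.134.16 - - [23/Mar/2019:08:42:55 -0400] "GET /wp-login.php HTTP/1.1"
13.57.220.172 - - [23/Mar/2019:11:01:05 -0400] "GET /wp-login.php HTTP/1.1"
EOF
# Key order of an associative array is unspecified; pipe through sort if needed.
for ip in "${!counts[@]}"; do
    printf '%s count: %d\n' "$ip" "${counts[$ip]}"
done
```

For large logs the sort | uniq -c pipelines elsewhere on this page will be faster; the loop above is mainly useful when you are already inside a bash script and want no external processes.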





            – answered 2 days ago by terdon





































                Maybe this is not what the OP wants; however, if we know that the IP address length will be limited to 15 characters, a quicker way to display the counts of unique IPs from a huge log file can be achieved using the uniq command alone:

                $ uniq -w 15 -c log

                5 5.135.134.16 - - [23/Mar/2019:08:42:54 -0400] ...
                9 13.57.220.172 - - [23/Mar/2019:11:01:05 -0400] ...
                1 13.57.233.99 - - [23/Mar/2019:04:17:45 -0400] ...
                2 18.206.226.75 - - [23/Mar/2019:21:58:07 -0400] ...
                3 18.213.10.181 - - [23/Mar/2019:14:45:42 -0400] ...

                Options:

                -w N compares no more than N characters in lines

                -c will prefix lines by the number of occurrences

                Alternatively, for exactly formatted output I prefer awk (it should also work for IPv6 addresses); YMMV.

                $ awk 'NF { print $1 }' log | sort -h | uniq -c | awk '{ printf "%s count: %d\n", $2, $1 }'

                5.135.134.16 count: 5
                13.57.220.172 count: 9
                13.57.233.99 count: 1
                18.206.226.75 count: 2
                18.213.10.181 count: 3

                Note that uniq won't detect repeated lines in the input file if they are not adjacent, so it may be necessary to sort the file first.
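That adjacency caveat is easy to demonstrate with toy input (the values a and b below are made up for the demo, not taken from the log):

```shell
# uniq only merges *adjacent* duplicates, so unsorted input undercounts:
printf '%s\n' a b a | uniq -c | wc -l          # 3 groups: a, b, a
# Sorting first makes the duplicates adjacent, giving a true count:
printf '%s\n' a b a | sort | uniq -c | wc -l   # 2 groups: a (x2), b
```

The question's log happens to be sorted by IP already, which is why plain `uniq -w 15 -c` works on it.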






                  • Likely good enough in practice, but worth noting the corner cases: there are only 6 probably-constant characters after the IP (` - - [`), yet in theory the address could be up to 8 characters shorter than the maximum, so a change of date could split the count for such an IP. And, as you hint, this won't work for IPv6.
                    – Martin Thornton, 2 days ago













                • • I like it, I didn't know uniq could count!
                    – j0h, 13 hours ago
















                – answered 2 days ago by Y. Pradhan (new contributor; edited 14 hours ago)



























                FWIW, Python 3:

                from collections import Counter

                with open('sample.log') as file:
                    counts = Counter(line.split()[0] for line in file)

                for ip_address, count in counts.items():
                    print('%-15s count: %d' % (ip_address, count))

                Output:

                13.57.233.99    count: 1
                18.213.10.181   count: 3
                5.135.134.16    count: 5
                18.206.226.75   count: 2
                13.57.220.172   count: 9
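The per-IP lines above come out in whatever order the counter holds them. Whichever tool produces the report, it can be ordered by count with sort; this is a sketch, with made-up sample lines in the `IP count: N` format shown above:

```shell
# Sort report lines numerically and descending on the 3rd field (the count).
printf '%s\n' \
    '13.57.233.99 count: 1' \
    '5.135.134.16 count: 5' \
    '18.206.226.75 count: 2' |
    sort -k3 -rn
```

The busiest IP then comes first, which is usually what you want when hunting for abusive clients.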





                    – answered 9 hours ago by wjandrea (edited 9 hours ago)





































                        cut -f1 -d- my.log | sort | uniq -c

                        Explanation: take the first field of my.log, splitting on dashes (-), and sort it. uniq needs sorted input; -c tells it to count occurrences.
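One caveat worth knowing about this approach: since the first dash in each log line is the one after the IP, splitting on - keeps the trailing space in field 1, while splitting on the space delimiter yields the bare address. A small demo, using a line copied from the question's sample log:

```shell
line='5.135.134.16 - - [23/Mar/2019:08:42:54 -0400] "GET /wp-login.php HTTP/1.1"'
# -d- keeps everything up to the first dash, including the trailing space:
printf '%s\n' "$line" | cut -f1 -d-        # -> '5.135.134.16 '
# -d' ' splits on spaces instead, yielding just the address:
printf '%s\n' "$line" | cut -d' ' -f1      # -> '5.135.134.16'
```

The trailing space is harmless for counting (every line carries it equally), but it matters if you feed the result to something that matches exact strings.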






                            – answered yesterday by PhD (new contributor; edited 9 hours ago by wjandrea)




