How to set up PasteHunter in VirtualBox

I’ve been using this tool for a couple of weeks now and I’ve been amazed at the stuff I’ve found on Pastebin, even just using the default rules that come with the application.

The tool, as the title suggests, is called “PasteHunter”.

The author of the tool describes it as:

“PasteHunter is a python3 application that is designed to query a collection of sites that host publicly pasted data. For all the pastes it finds, it scans the raw contents against a series of Yara rules looking for information that can be used by an organisation or a researcher.”

As a researcher, this interested me somewhat…

Prerequisites – A couple of things to be aware of before you get started:

  • You will need a paid subscription to Pastebin Pro. The usual price for lifetime membership is $49.95, but they run regular promotions where you can get it much cheaper. For example, at the time of writing they are running a promotion to get it for $29.95, and I’ve seen it go as low as $19.95.
  • For the best results, it would be ideal to run this 24/7 on a dedicated system; however, this guide describes how to set it up in a VM as a “Proof of Concept”.

Now, for people who are new to the field of Information Security and Linux systems (predominantly Ubuntu Server for this blog), I thought I’d start with a guide on how to install PasteHunter. I’ve opted to set this up on an Ubuntu Server, hosted in VirtualBox. This allows me to keep everything separate from my main host, and gives me flexibility for migration, backups, snapshotting, etc.

Disclaimer – Pastebin is often utilised by malicious actors who post Personally Identifiable Information (PII), username and password combinations, code segments and other pastes which are not necessarily intended for public consumption. Please be aware of the risks associated with scraping Pastebin; this blog post is purely for research purposes.

1. Set up Ubuntu Server 18.xx

My Virtual Machine (VM) is set up with the default settings, which include the default settings for the network card (NAT).

2. Port Forwarding

This next part is optional, but I would recommend it. It allows you to keep the VM working in the background, while you SSH to it using your favourite SSH client. You can use PuTTY, or I recommend Termius; the free version is suitable for my needs. Under “Adapter 1” in Network settings, click “Advanced” then “Port Forwarding”.

Take note of your Ubuntu Server’s IP (it can be retrieved by running ifconfig on the command line).

Add a forwarding rule for each service you want to reach from the host, and where it says Guest IP, add your server’s IP. If you don’t want to enable SSH, you will still need to add the ElasticSearch and Kibana port forwarding rules (ports 9200 and 5601, matching the settings in steps 7 and 9).
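Once the rules are in place and the services from the later steps are running, a quick way to confirm the forwarding works is to try a TCP connection to each port from the host. The snippet below is a minimal sketch in Python; the host-side port numbers are assumptions (22 is the SSH default, 9200 and 5601 come from steps 7 and 9), so change them to whatever you entered in your own rules.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


# Assumed host-side ports; adjust to match your own forwarding rules.
FORWARDED_PORTS = {"SSH": 22, "Elasticsearch": 9200, "Kibana": 5601}


def report(host: str = "127.0.0.1") -> dict:
    """Map each service name to whether its forwarded port accepts connections."""
    return {name: port_is_open(host, port) for name, port in FORWARDED_PORTS.items()}
```

Calling report() on the host returns a dictionary of service names and True/False values. A False for Elasticsearch or Kibana before you have completed steps 5 to 10 is expected.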

3. Python and Java Installation

Once you have SSH access, run the following commands:

$ sudo apt update && sudo apt upgrade 
$ sudo apt install python3-dev 
$ sudo add-apt-repository universe && sudo apt-get update 
$ sudo apt install python3-pip 
$ sudo pip3 install -U setuptools 
$ sudo apt install yara 
$ sudo apt install git 
$ sudo apt install openjdk-11-jdk

4. Install Yara-Python

$ git clone --recursive 
$ cd yara-python 
$ python3 setup.py build 
$ sudo python3 setup.py install

5. Install Elasticsearch using the official Elastic installation instructions for the Debian package

Start at the steps under the heading “Import the Elastic PGP Key” and follow the instructions from there. The installation will fail if you do not import the PGP key.

6. Install Kibana using the official Elastic installation instructions for the Debian package

As you have already imported the PGP key, you can start at the steps under the heading “Installing from APT Repository”.

7. Configure ElasticSearch

$ sudo nano /etc/elasticsearch/elasticsearch.yml

Ensure that the following settings are specified:

cluster.name: pastescrape-elasticsearch
node.name: pastescrape
# Set the bind address to a specific IP (IPv4 or IPv6):
# Set a custom port for HTTP: 
http.port: 9200 


8. Autostart ElasticSearch

$ sudo /bin/systemctl daemon-reload
$ sudo /bin/systemctl enable elasticsearch.service 

Reboot VM

Try to connect to ElasticSearch from your host on the forwarded port 9200.

If it worked, you should see a short JSON response describing the node.
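If you would rather check this from a script than a browser, the sketch below fetches the root endpoint and prints a one-line summary. It assumes the response carries the usual name, cluster_name and version.number fields; the sample values (including the version number) are made up for illustration, and the address you pass to fetch_json is whatever your host uses to reach port 9200.

```python
import json
from urllib.request import urlopen


def fetch_json(url: str, timeout: float = 5.0) -> dict:
    """GET url and decode the response body as JSON."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def summarise_node(info: dict) -> str:
    """Build a one-line summary from an Elasticsearch root response."""
    version = info.get("version", {}).get("number", "unknown")
    name = info.get("name", "?")
    cluster = info.get("cluster_name", "?")
    return f"node={name} cluster={cluster} version={version}"


# Hypothetical sample holding only the fields summarise_node reads.
sample = {
    "name": "pastescrape",
    "cluster_name": "pastescrape-elasticsearch",
    "version": {"number": "6.5.4"},
}
print(summarise_node(sample))
```

Against a live instance you would call summarise_node(fetch_json(url)), where url is the address your host uses to reach the forwarded port.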

9. Configure Kibana

$ sudo nano /etc/kibana/kibana.yml 

Change the lines below to reflect the following:

server.port: 5601 
elasticsearch.url: "" 

Save (CTRL + O)

10. Autostart Kibana

$ sudo /bin/systemctl daemon-reload
$ sudo /bin/systemctl enable kibana.service 

Reboot VM

If the changes you made worked, you should see the Kibana interface when navigating to the forwarded Kibana port (5601) from your host.

When Kibana asks for the Index Pattern, make sure you enter *. This will ensure that all indexes in Elasticsearch will be available for searching in the “Discover” tab. Click Next.

On Step 2 of 2, ensure you select @timestamp in the Time Filter field name.

11. Download PasteHunter from GitHub

$ git clone
$ cd pastehunter
$ pip3 install -r requirements.txt 
$ cp settings.json.sample settings.json 
$ nano settings.json 

Change the following settings in this file:

       "elastic_host": "",
       "api_host": "",

Save (CTRL + O)
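It is easy to leave one of these values blank by accident. The sketch below is one way to catch that before starting PasteHunter: it searches the whole settings file for the two keys named above and reports any that are missing or still empty. It makes no assumption about where in the JSON structure those keys sit, and the file name settings.json is the one created in this step.

```python
import json


def find_values(node, key):
    """Yield every value stored under key anywhere in a nested JSON structure."""
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key:
                yield value
            yield from find_values(value, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_values(item, key)


def unset_keys(settings, keys):
    """Return the keys that are missing or still set to an empty string."""
    problems = []
    for key in keys:
        values = list(find_values(settings, key))
        if not values or any(value == "" for value in values):
            problems.append(key)
    return problems


def check_file(path="settings.json", keys=("elastic_host", "api_host")):
    """Load the settings file and list any keys that still need a value."""
    with open(path, encoding="utf-8") as handle:
        return unset_keys(json.load(handle), keys)
```

Running check_file() from the pastehunter directory should return an empty list once both values are filled in.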

12. Setup interaction with Pastebin

Lastly, before you can scrape Pastebin, you need to whitelist your IP with Pastebin.

If you’re having issues identifying what your external IPv4 address is, use this command:

$ curl 

If your IP has been successfully whitelisted, a request to the Pastebin scraping API will return some JSON data for you.
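The same check can be scripted. The sketch below treats a body that parses as a JSON list or object as success, on the assumption that a non-whitelisted IP receives a plain-text message instead. The function names are illustrative, and the URL is left as a parameter for you to supply.

```python
import json
from urllib.request import urlopen


def looks_whitelisted(body: str) -> bool:
    """Return True if body parses as a JSON list or object."""
    try:
        parsed = json.loads(body)
    except ValueError:
        # Assumed to be a plain-text refusal rather than paste data.
        return False
    return isinstance(parsed, (list, dict))


def check_scraping_access(url: str, timeout: float = 10.0) -> bool:
    """Fetch url (the scraping endpoint you use) and classify the response."""
    with urlopen(url, timeout=timeout) as response:
        return looks_whitelisted(response.read().decode("utf-8", errors="replace"))
```

Only check_scraping_access touches the network, so looks_whitelisted can be tried out on saved responses first.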

13. Final Housekeeping

Firstly, make sure your script is scraping correctly by using the command below:

$ python3 pastehunter.py

You should see output on the screen like the following:

INFO:Sleeping for 300 Seconds
INFO:Populating Queue
INFO:Fetching paste list from inputs.pastebin
INFO:Added 89 Items to the queue
INFO:Blacklisted paste 6AxKEWZH
INFO:Blacklisted paste aTc3A9nt
INFO:Blacklisted paste 2tCPgutc
INFO:Blacklisted paste 2zxwaLB8
INFO:Blacklisted paste wa5xDWhM
INFO:Sleeping for 300 Seconds

Navigate to your instance of Kibana. Kibana should automatically show the Elasticsearch index; if it doesn’t, go to “Management” in Kibana, then “Index Management”, and the newly created index should be in there. If it is, go to “Discover” in Kibana and look at your first hits! Press CTRL + C in the terminal to stop the script.

Next, you want to create a new service and have this running in the background. Use the following commands:

$ sudo mv ~/pastehunter /opt/
$ sudo nano /etc/systemd/system/pastehunter.service

In this file, add the following:

ExecStart=/usr/bin/python3 /opt/pastehunter/pastehunter.py

Make sure you edit the “WorkingDirectory” line so that it reflects the path that you moved the PasteHunter directory to earlier; the same path should appear on the “ExecStart” line. For User/Group, add whichever user you’d like this service to run under. The user you’re logged in as will be sufficient.
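To show how those lines fit together, here is a small Python sketch that renders a unit file from an install path and a user. The overall layout ([Unit], [Service] and [Install] sections with WantedBy=multi-user.target) is an assumption based on a typical systemd service. Only the WorkingDirectory, ExecStart, User and Group lines and the “PasteHunter” description are taken from this guide, and “localuser” is a placeholder.

```python
import posixpath


def render_unit(install_dir: str, user: str, group: str,
                python: str = "/usr/bin/python3") -> str:
    """Return the text of a systemd unit that runs pastehunter.py from install_dir."""
    script = posixpath.join(install_dir, "pastehunter.py")
    lines = [
        "[Unit]",
        "Description=PasteHunter",
        "",
        "[Service]",
        # WorkingDirectory and ExecStart must point at the same install path.
        f"WorkingDirectory={install_dir}",
        f"ExecStart={python} {script}",
        f"User={user}",
        f"Group={group}",
        "",
        "[Install]",
        # Assumed target; it is the usual choice for services started at boot.
        "WantedBy=multi-user.target",
    ]
    return "\n".join(lines) + "\n"


# "localuser" is a placeholder; use the account the service should run as.
print(render_unit("/opt/pastehunter", "localuser", "localuser"))
```

Generating the file from a single install_dir argument keeps the two paths from drifting apart, which is the mistake the paragraph above warns about.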

Set the correct permissions on your newly added service file with the following command:

$ sudo chmod 644 /etc/systemd/system/pastehunter.service 

Next, let’s see if it works. Use this command to start the service:

$ sudo systemctl start pastehunter.service

Use this command to check if it’s working:

$ sudo systemctl status pastehunter.service

If it worked, you’ll see output like the following:

$ sudo systemctl status pastehunter.service
pastehunter.service - PasteHunter
   Loaded: loaded (/etc/systemd/system/pastehunter.service; enabled; vendor preset: enabled)
   Active: active (running) since Tue 2019-01-29 13:36:41 CET; 1h 57min ago
 Main PID: 726 (python3)
    Tasks: 7 (limit: 2304)
   CGroup: /system.slice/pastehunter.service
           ├─726 /usr/bin/python3 /opt/pastehunter/
           ├─867 /usr/bin/python3 /opt/pastehunter/
           ├─868 /usr/bin/python3 /opt/pastehunter/
           ├─869 /usr/bin/python3 /opt/pastehunter/
           ├─870 /usr/bin/python3 /opt/pastehunter/
           └─871 /usr/bin/python3 /opt/pastehunter/pastehunter.py

Jan 29 15:30:32 [omitted] python3[726]: paste list from inputs.pastebin
Jan 29 15:30:32 [omitted] python3[726]: 92 Items to the queue
Jan 29 15:30:33 [omitted] python3[726]: paste J4NcgWsD
Jan 29 15:30:34 [omitted] python3[726]: paste zcwjPT9c
Jan 29 15:30:35 [omitted] python3[726]: paste sHw9L73E
Jan 29 15:30:35 [omitted] python3[726]: paste VWziDjCY
Jan 29 15:30:35 [omitted] python3[726]: Post Module postprocess.post_email on WST0CH0Z
Jan 29 15:30:36 [omitted] python3[726]: paste TNpEx4kB
Jan 29 15:30:36 [omitted] python3[726]: paste 1aDVg8at
Jan 29 15:30:42 [omitted] python3[726]: for 300 Seconds

Lastly, let’s enable the service to start on boot:

$ sudo systemctl enable pastehunter.service 

And that’s it! Happy scraping! Here’s a Pie Chart of a few of the Yara Rules that hit over a week period. If you have all the Yara Rules enabled, you will likely get hits most times your script runs.

Any issues, please leave a comment below and I’ll do my best to help. I’m also planning a Part 2 to this blog, which involves moving this to the cloud and setting up email alerts.

If you’re using Ubuntu 16.04

If you’re using Ubuntu Server 16.04, it is likely that your distribution does not include Java 11. You will therefore need to use the commands below to include this in order to use this guide.

$ sudo add-apt-repository ppa:openjdk-r/ppa
$ sudo apt-get update -q
$ sudo apt install -y openjdk-11-jdk



Useful Commands

If you’re having issues getting this to work, I recommend first verifying that the processes are running. Use the following command:

$ ps ax | grep pastehunter

If pastehunter is running, you will see its python3 processes listed in the output.
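The same check can be done from Python, which is handy if you want to build it into a monitoring script later. This is a minimal sketch that assumes the default `ps ax` column layout (PID, TTY, STAT, TIME, COMMAND); the sample output in it is illustrative, not captured from a real system.

```python
import subprocess


def matching_processes(ps_output: str, needle: str = "pastehunter") -> list:
    """Return (pid, command) pairs from `ps ax` output whose command mentions needle."""
    matches = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.strip().split(None, 4)  # PID TTY STAT TIME COMMAND
        if len(fields) != 5:
            continue
        pid, command = fields[0], fields[4]
        # Ignore the grep process that a manual `ps ax | grep` would also list.
        if needle in command and not command.startswith("grep"):
            matches.append((int(pid), command))
    return matches


def running_now(needle: str = "pastehunter") -> list:
    """Run `ps ax` and return the processes whose command mentions needle."""
    result = subprocess.run(["ps", "ax"], stdout=subprocess.PIPE,
                            universal_newlines=True, check=True)
    return matching_processes(result.stdout, needle)


# Illustrative sample in the default `ps ax` layout.
SAMPLE = """  PID TTY      STAT   TIME COMMAND
  726 ?        Ssl    0:12 /usr/bin/python3 /opt/pastehunter/pastehunter.py
  901 pts/0    S+     0:00 grep pastehunter
 1002 ?        Ss     0:00 /usr/sbin/sshd -D
"""
print(matching_processes(SAMPLE))
```

An empty list from running_now() means no PasteHunter process was found.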

If you make changes to the settings file, or you would like pastehunter to stop running for any reason, another useful command is the one below:

$ pkill -9 -f pastehunter.py

ElasticSearch Stops Running

After approximately 12 hours, the ElasticSearch service goes down, which stops PasteHunter being able to output the data that matches the Yara Rules. This can be rectified by editing the ElasticSearch service. Use the below command:

$ sudo systemctl edit --full elasticsearch.service

Next, under the [Service] heading, add the following lines (please be aware that the last heading is [Install], and the change will not work if the lines are entered at the bottom under that heading).

# Keep the Service Up 
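To illustrate the placement rule (the lines must land under [Service], not [Install]), here is a small Python sketch that inserts lines directly beneath the [Service] header of a unit file’s text. The Restart=always directive used in the example is an assumption for illustration: it is a standard systemd option that restarts a service whenever it exits, and not necessarily the exact setting used in this guide. The sample unit text is also made up.

```python
def add_under_service(unit_text: str, new_lines: list) -> str:
    """Insert new_lines directly beneath the [Service] header of a systemd unit."""
    result = []
    found = False
    for line in unit_text.splitlines():
        result.append(line)
        if not found and line.strip() == "[Service]":
            result.extend(new_lines)
            found = True
    if not found:
        raise ValueError("unit has no [Service] section")
    return "\n".join(result) + "\n"


# Made-up unit text, much shorter than the real elasticsearch.service file.
EXAMPLE_UNIT = """[Unit]
Description=Elasticsearch

[Service]
Type=notify

[Install]
WantedBy=multi-user.target
"""

# Restart=always is an assumed directive, used here only as an example.
print(add_under_service(EXAMPLE_UNIT, ["# Keep the Service Up", "Restart=always"]))
```

In the printed result the new lines sit between [Service] and [Install], which is the position the note above asks for.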


Ctrl + O to save, then restart the service:

$ sudo systemctl restart elasticsearch.service

Check to make sure the change worked:

$ sudo systemctl status elasticsearch.service

If there are any errors, the service will have stopped, and the error will be displayed. Please check under what heading you made the changes before commenting.
