PhDs.io

Programming
Author: Mingze Gao, PhD

Affiliation: Macquarie University

Published: September 9, 2024

A fast literature search engine.

Give it a try: PhDs.io.

Why I created PhDs.io

When I first started my PhD, one of the biggest challenges I faced was navigating the sheer volume of research papers in finance and related fields. It’s easy to feel overwhelmed when thousands of papers are published every year. The time spent searching for quality references, vetting sources, and organizing findings can feel like a second job on top of research and teaching.

That’s where the idea for PhDs.io came from. I wanted to create a platform where finance researchers—whether they’re PhD students just starting out or seasoned academics—can find and access high-quality research without the headache. I envisioned a tool that makes it easy to discover research papers across finance, economics, accounting, and related fields.

But this project isn’t just for me or my peers. I also want to support the next generation of researchers who may not have the same level of experience. PhDs.io is my way of giving back to the community—creating something that can help those navigating the often confusing world of academic research.

As for the future, my vision for PhDs.io goes beyond a mere repository of papers. I see it growing into a platform that offers personalized recommendations, connects researchers with similar interests, and even provides collaborative tools for teams working on interdisciplinary projects. It’s an ambitious goal, but one that I believe can make a meaningful impact in our field.

This journey isn’t something I can do alone. The platform is currently supported by my own contributions, but as it expands, I hope others will see the value in what we’re building here and join the effort. Whether through financial support, collaborations, or just spreading the word, every little bit helps us move toward a better research experience for everyone.

This post

In what follows, I explain a little of how PhDs.io was built, in case someone would like to build a similar or even better project.

  • This project is built with KerkoApp, a web application that uses Kerko to provide a user-friendly search and browsing interface for sharing a bibliography managed with the Zotero reference manager.
  • KerkoApp is built in Python with the Flask framework. It is just a thin container around Kerko and, as such, inherits most of its features directly from Kerko. However, it adds support for TOML configuration files, allowing a good separation of configuration from code.

In summary, my awesome contributors and I collectively maintain a Zotero group library. We collect and organize high-quality research papers in Zotero, and the library is then synced to my server, which runs Kerko to serve the website to the public.
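
The sync step can be scripted. Kerko provides a flask kerko sync command that pulls the Zotero library into its local cache and search index, and the sketch below wraps it in a small function. The function name sync_library, the install directory (~/kerkoapp) and the virtual environment location (venv inside it) are assumptions about the server layout, not something Kerko prescribes, so adjust them to your setup.

```shell
# Sync the Zotero group library into Kerko's local cache and search index.
# Assumption: KerkoApp lives in ~/kerkoapp with a virtualenv in ./venv.
sync_library() {
    local app_dir="${1:-$HOME/kerkoapp}"
    # Run from the app directory so Flask can locate the application
    # and its configuration.
    ( cd "$app_dir" && ./venv/bin/flask kerko sync )
}
```

Calling sync_library from a cron entry, say hourly, should keep the site in step with the Zotero library.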

Deployment

Website

Start with a fresh AWS EC2 instance. I’m using an ARM-based t4g.small running Ubuntu 24.04 LTS.

sudo apt update

Install pip and venv.

sudo apt install python3-pip python3-venv

Follow the Kerko deployment guide.

Note

I use a modified version of KerkoApp, so I install my fork rather than the original. Clone it with the following instead:

git clone https://github.com/mgao6767/phds.io.git ~/kerkoapp

Enable SSL

Use Let’s Encrypt to obtain a free SSL certificate and configure Nginx to use it.

Install Certbot and its Nginx plugin to automate SSL certificate generation.

sudo apt update
sudo apt install certbot python3-certbot-nginx

Run Certbot to get a certificate for the domain.

sudo certbot --nginx -d phds.io

Certbot will automatically configure Nginx to use SSL. Once Certbot has finished, test the Nginx configuration to make sure there are no errors.

sudo nginx -t

If everything is correct, reload Nginx:

sudo systemctl reload nginx

Let’s Encrypt certificates expire after 90 days, but Certbot takes care of the renewal automatically. Check the Certbot renewal process with:

sudo certbot renew --dry-run

This tests the automatic renewal without actually renewing the certificate.
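
To see how close the live certificate is to expiry, one option is to read its notAfter date with openssl and turn it into a day count. This is a rough sketch: the function names cert_enddate and days_between are made up for illustration, and days_between relies on GNU date, which is the default on Ubuntu.

```shell
# Print the expiry (notAfter) date of the certificate served by a host.
cert_enddate() {
    local host="$1"
    echo | openssl s_client -servername "$host" -connect "$host:443" 2>/dev/null \
        | openssl x509 -noout -enddate \
        | cut -d= -f2
}

# Print the number of whole days between two dates (GNU date syntax).
days_between() {
    local from="$1" to="$2"
    echo $(( ( $(date -u -d "$to" +%s) - $(date -u -d "$from" +%s) ) / 86400 ))
}

# Usage (needs network access):
#   days_between now "$(cert_enddate phds.io)"
```

On Ubuntu, the apt package normally schedules renewals through a systemd timer, which systemctl list-timers should list if it is active.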

Rate limit and IP blocking

Enable rate limiting in Nginx

Nginx has built-in support for rate limiting based on IP addresses. This can help prevent individual users from overwhelming the server with too many requests in a short period.

sudo vim /etc/nginx/sites-available/kerkoapp.conf

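Add the following. On Ubuntu, /etc/nginx/nginx.conf already includes the enabled site files from inside its http block, so limit_req_zone goes at the top level of this file, outside any server block, with no extra http { } wrapper:

# Rate limiting: limit each IP to 5 requests per second with a burst of 12 requests.
# limit_req_zone is only valid in the http context, i.e. outside any server block.
limit_req_zone $binary_remote_addr zone=ip:10m rate=5r/s;

server {
    listen 80;
    server_name phds.io;

    location / {
        # Apply rate limiting to all requests under this location.
        # Of the 12 burst requests, the first 8 pass without delay and the
        # remaining 4 are delayed to hold the 5 r/s rate; anything beyond is rejected.
        limit_req zone=ip burst=12 delay=8;

        # Other configurations, such as proxy_pass, root, etc.
        ...
    }
}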
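
One way to check that the limit is active is to fire a batch of parallel requests and count the status codes that come back. The sketch below is only an illustration: probe and tally are hypothetical helper names, and it assumes curl is installed. Requests rejected by limit_req are answered with 503 unless limit_req_status is set to something else, so a working limit should show a mix of 200 and 503.

```shell
# Send N requests in parallel and print one HTTP status code per line.
probe() {
    local url="$1" n="${2:-40}"
    seq "$n" | xargs -P "$n" -I{} curl -s -o /dev/null -w '%{http_code}\n' "$url"
}

# Count how often each status code appears on stdin ("code count" per line).
tally() {
    sort | uniq -c | awk '{print $2, $1}'
}

# Usage:
#   probe https://phds.io/ 40 | tally
```

Run this before enabling Fail2Ban, or from an address you do not mind seeing banned, since the rejected requests are exactly what the nginx-limit-req jail looks for.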

Fail2Ban for IP blocking

While Nginx’s rate limiting helps to slow down abusive clients, use Fail2Ban to ban IPs that continuously exceed rate limits by monitoring Nginx logs.

sudo apt install fail2ban

Next, create a local configuration file for monitoring the Nginx rate limit. Copy jail.conf to /etc/fail2ban/jail.local and open it:

sudo cp /etc/fail2ban/jail.conf /etc/fail2ban/jail.local
sudo vim /etc/fail2ban/jail.local

Enable the nginx-limit-req jail:

[nginx-limit-req]
enabled = true
port    = http,https
logpath = %(nginx_error_log)s

Edit the filter file for Fail2Ban if necessary:

sudo vim /etc/fail2ban/filter.d/nginx-limit-req.conf
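
Before touching the filter, it can help to see which clients are actually tripping the limit. When Nginx rejects a request it writes a line containing "limiting requests" and "client: <IP>" to its error log, and the sketch below counts those lines per IP. The helper name top_limited_ips is hypothetical, and the default path /var/log/nginx/error.log is an assumption based on the stock Ubuntu layout.

```shell
# List client IPs that hit the Nginx rate limit, most frequent first.
# Output: "<count> <ip>" per line.
top_limited_ips() {
    local log="${1:-/var/log/nginx/error.log}"
    grep 'limiting requests' "$log" \
        | sed -n 's/.*client: \([0-9a-fA-F.:]*\),.*/\1/p' \
        | sort | uniq -c | sort -rn \
        | awk '{print $1, $2}'
}
```

The fail2ban-regex tool that ships with Fail2Ban can then be pointed at the same log and the filter file to report how many lines the filter matches.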

Restart Fail2Ban to apply the changes:

sudo systemctl restart fail2ban

Check the Fail2Ban logs:

tail -f /var/log/fail2ban.log

Manual jail (ban forever). In /etc/fail2ban/filter.d/manual.conf:

[Definition]
failregex = 
ignoreregex = 

In /etc/fail2ban/jail.d/custom.conf:

[manual]
banaction = %(banaction_allports)s
bantime = -1
enabled = true

To ban an IP:

sudo fail2ban-client set manual banip 11.22.33.44
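
Because bantime = -1 never expires, a mistaken ban has to be lifted by hand. The wrappers below are a small convenience sketch around fail2ban-client's banip, unbanip and status subcommands; the function names are made up for illustration, and they need to run as root (for example from a root shell) because fail2ban-client talks to a root-owned socket.

```shell
JAIL="manual"   # jail name defined in jail.d/custom.conf

# Ban an IP permanently in the manual jail.
ban_ip()      { fail2ban-client set "$JAIL" banip "$1"; }

# Lift a ban again.
unban_ip()    { fail2ban-client set "$JAIL" unbanip "$1"; }

# Show the jail's status, including the list of banned IPs.
list_banned() { fail2ban-client status "$JAIL"; }

# Usage:
#   ban_ip 11.22.33.44
#   list_banned
#   unban_ip 11.22.33.44
```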