RPD Bulk Decisions Dataset

Description: This is a bulk open-access dataset in JSON, Parquet and Hugging Face Dataset formats with the full text of all Immigration and Refugee Board (IRB) Refugee Protection Division (RPD) decisions provided by the IRB to the Refugee Law Lab, covering the 2002 to 2020 period. Because the IRB is no longer publishing RPD decisions, we consider this a legacy dataset. For more recent decisions obtained through Access to Information requests, see the RLLR Bulk Decisions Dataset. Documentation of how the data were collected and processed, along with code snippets for loading the data, is available in a repository on the Refugee Law Lab GitHub.

Data: https://github.com/Refugee-Law-Lab/rpd_bulk_data/blob/master/DATA/yearly

Code Repository: https://github.com/Refugee-Law-Lab/rpd_bulk_data

Current Coverage: 2002-2020

Number of Decisions: ~12,500

Languages: English & French

Format: JSON, Parquet, Hugging Face Dataset

License: Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)

Citation: Sean Rehaag, “RPD Bulk Decisions Dataset” (2023), online: Refugee Law Laboratory https://refugeelab.ca/bulk-data/rpd

Programmatic Access in Python (JSON via GitHub):

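import pandas as pd
import requests

start_year = 2002  # First year of data sought (2002 or later)
end_year = 2020  # Last year of data sought (2020 or earlier)
base_url = 'https://raw.githubusercontent.com/Refugee-Law-Lab/rpd_bulk_data/master/DATA/YEARLY/'

# Download each yearly JSON file and collect its decision records in one list
results = []
for year in range(start_year, end_year + 1):
    url = base_url + f'{year}.json'
    response = requests.get(url)
    response.raise_for_status()  # Stop if a yearly file is missing or the request fails
    results.extend(response.json())

df = pd.DataFrame(results)
df  # In a notebook, this displays the DataFrame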
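The loop above can also be packaged as two small functions that check the requested years against the dataset's stated 2002-2020 coverage before any download, and that take the fetching step as a parameter so the logic can be tested offline. This is a sketch, not code from the Refugee Law Lab repository: the helper names `yearly_urls` and `load_records` are hypothetical, and the URL pattern (one `<year>.json` file per year under `DATA/YEARLY/`, each holding a list of decision records) is assumed from the snippet above.

```python
from typing import Callable, Iterable, List

# Assumed URL pattern, copied from the snippet above: one <year>.json file per year
BASE_URL = 'https://raw.githubusercontent.com/Refugee-Law-Lab/rpd_bulk_data/master/DATA/YEARLY/'
FIRST_YEAR, LAST_YEAR = 2002, 2020  # Coverage stated for this legacy dataset


def yearly_urls(start_year: int, end_year: int) -> List[str]:
    """Hypothetical helper: one URL per year, rejecting years outside the coverage."""
    if not FIRST_YEAR <= start_year <= end_year <= LAST_YEAR:
        raise ValueError(
            f'Years must satisfy {FIRST_YEAR} <= start_year <= end_year <= {LAST_YEAR}'
        )
    return [f'{BASE_URL}{year}.json' for year in range(start_year, end_year + 1)]


def load_records(urls: Iterable[str], fetch: Callable[[str], list]) -> list:
    """Hypothetical helper: concatenate the list of records that fetch() returns per URL."""
    records = []
    for url in urls:
        records.extend(fetch(url))  # Assumes each yearly file decodes to a list
    return records
```

To download the data, `fetch` can be a small wrapper that calls `requests.get(url)`, then `raise_for_status()`, then returns `.json()`, as in the snippet above; the DataFrame is then `pd.DataFrame(load_records(yearly_urls(2002, 2020), fetch))`. Passing the fetcher in keeps the network call out of the range-checking and concatenation logic.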