RLLR Bulk Decisions Dataset

Description: A bulk open-access dataset, in JSON format, containing the full text of all Immigration and Refugee Board (Refugee Protection Division) cases included in the Refugee Law Lab Reporter. The process for collecting and updating the data, along with code snippets for loading it, is documented in a repository on the Refugee Law Lab GitHub.

Data: https://github.com/Refugee-Law-Lab/rllr_bulk_data/blob/master/DATA/yearly

Code Repository: https://github.com/Refugee-Law-Lab/rllr_bulk_data

Current Coverage: 2019-Present

Number of Decisions: ~500

Languages: English

Format: JSON (yearly files), Parquet, Hugging Face Dataset

License: Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)

Citation: Sean Rehaag, “RLLR Bulk Decisions Dataset” (2023), online: Refugee Law Laboratory https://refugeelab.ca/bulk-data/rllr

Programmatic Access in Python (JSON via GitHub):

import pandas as pd
import requests

# Set variables
start_year = 2019  # First year of data sought (2019 or later)
end_year = 2022  # Last year of data sought (2022 or earlier)

base_url = 'https://raw.githubusercontent.com/Refugee-Law-Lab/rllr_bulk_data/master/DATA/YEARLY/'

# Load data: each yearly file is a JSON list of decision records
results = []
for year in range(start_year, end_year + 1):
    url = base_url + f'{year}.json'
    response = requests.get(url)
    response.raise_for_status()  # stop early if a yearly file cannot be downloaded
    results.extend(response.json())

# Convert to dataframe
df = pd.DataFrame(results)
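
The loop above assumes that a file exists for every year in the range and that every request succeeds. A slightly more defensive variant might validate the year range, set a request timeout, and skip years for which no file is found. The sketch below is one way to do that; `year_urls`, `fetch_json`, `load_years`, and the `fetch` parameter are hypothetical helpers introduced here for illustration and are not part of the repository's code. It makes no assumptions about the field names inside each record.

```python
import pandas as pd
import requests

BASE_URL = 'https://raw.githubusercontent.com/Refugee-Law-Lab/rllr_bulk_data/master/DATA/YEARLY/'


def year_urls(start_year, end_year, base_url=BASE_URL):
    """Return (year, url) pairs for each yearly JSON file in the range."""
    if start_year > end_year:
        raise ValueError('start_year must not exceed end_year')
    return [(year, f'{base_url}{year}.json')
            for year in range(start_year, end_year + 1)]


def fetch_json(url):
    """Download one yearly file; return None if it does not exist (HTTP 404)."""
    response = requests.get(url, timeout=30)
    if response.status_code == 404:
        return None
    response.raise_for_status()  # surface any other HTTP error
    return response.json()


def load_years(start_year, end_year, fetch=fetch_json):
    """Load all available years into one DataFrame, skipping missing files.

    `fetch` is injectable (an assumption of this sketch) so the function
    can be exercised without network access.
    """
    records = []
    for year, url in year_urls(start_year, end_year):
        data = fetch(url)
        if data is None:
            print(f'No file found for {year}; skipping')
            continue
        records.extend(data)
    return pd.DataFrame(records)
```

Calling `df = load_years(2019, 2022)` would then stand in for the loop above, and `df.columns` lists whichever fields the records actually contain.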