RLLR Bulk Decisions Dataset
Description: This is a bulk open-access dataset in JSON format containing the full text of all Immigration and Refugee Board (Refugee Protection Division) decisions included in the Refugee Law Lab Reporter. The repository on the Refugee Law Lab GitHub documents how the data is collected and updated, and provides code snippets for loading it.
Data: https://github.com/Refugee-Law-Lab/rllr_bulk_data/blob/master/DATA/yearly
Code Repository: https://github.com/Refugee-Law-Lab/rllr_bulk_data
Current Coverage: 2019-Present
Number of Decisions: ~500
Languages: English
Format: JSON (yearly files), Parquet, Hugging Face Dataset
License: Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)
Citation: Sean Rehaag, “RLLR Bulk Decisions Dataset” (2023), online: Refugee Law Laboratory https://refugeelab.ca/bulk-data/rllr
Programmatic Access in Python (via Hugging Face Datasets):
from datasets import load_dataset
import pandas as pd

dataset = load_dataset("refugee-law-lab/canadian-legal-data", "RLLR", split="train")

# convert to dataframe
df = pd.DataFrame(dataset)
df
Programmatic Access in Python (via Parquet):
import pandas as pd
import requests
from io import BytesIO

url = 'https://huggingface.co/datasets/refugee-law-lab/canadian-legal-data/resolve/main/RLLR/train.parquet'

# load data
results = requests.get(url)
results.raise_for_status()  # stop on HTTP errors instead of trying to parse an error page

# convert to dataframe
df = pd.read_parquet(BytesIO(results.content))
df
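The Parquet example above downloads the whole file on every run. If you reload the data often, a minimal sketch like the one below can cache it to disk first. It uses only the standard library; the function name download_once, its fetch hook (added so it can be tested offline), and the local file name are illustrative assumptions, not part of the Refugee Law Lab code.

```python
import os
import urllib.request

# Parquet URL copied from the example above.
PARQUET_URL = ("https://huggingface.co/datasets/refugee-law-lab/"
               "canadian-legal-data/resolve/main/RLLR/train.parquet")


def _default_fetch(url, timeout=60):
    """Download url and return its bytes; urlopen raises HTTPError on 4xx/5xx."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def download_once(url, path, fetch=_default_fetch):
    """Return path, downloading url to it only if the file does not exist yet.

    `fetch` is a hypothetical hook (any callable returning bytes) so the
    caching logic can be exercised without network access.
    """
    if os.path.exists(path):
        return path  # cached copy found; skip the download
    data = fetch(url)
    # Write under a temporary name first, so an interrupted write is never
    # mistaken for a complete cached file on the next run.
    tmp_path = path + ".part"
    with open(tmp_path, "wb") as f:
        f.write(data)
    os.replace(tmp_path, path)
    return path
```

For example: `df = pd.read_parquet(download_once(PARQUET_URL, "rllr_train.parquet"))`. Because the dataset is updated over time, delete the cached file whenever you want a fresh copy.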
Programmatic Access in Python (JSON via GitHub):
import pandas as pd
import requests

# Set variables
start_year = 2019  # First year of data sought (2019 or later)
end_year = 2022  # Last year of data sought (inclusive)
base_url = 'https://raw.githubusercontent.com/Refugee-Law-Lab/rllr_bulk_data/master/DATA/YEARLY/'

# load data: one JSON file per year, each holding a list of decisions
results = []
for year in range(start_year, end_year + 1):
    url = base_url + f'{year}.json'
    results.extend(requests.get(url).json())

# convert to dataframe
df = pd.DataFrame(results)
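The yearly loop above can also be wrapped in small reusable functions. The sketch below assumes, as the loop's use of `extend` implies, that each yearly file is a JSON array of decision records. It uses the standard library's `urllib.request` in place of `requests`, and the function names and the injectable `fetch` parameter are hypothetical, chosen so the logic can be tested without network access.

```python
import json
import urllib.request

# Base URL copied from the loop above (including its DATA/YEARLY capitalization).
BASE_URL = ("https://raw.githubusercontent.com/Refugee-Law-Lab/"
            "rllr_bulk_data/master/DATA/YEARLY/")


def yearly_urls(start_year, end_year, base_url=BASE_URL):
    """Build one URL per year, including both start_year and end_year."""
    if start_year > end_year:
        raise ValueError("start_year must not be later than end_year")
    return [f"{base_url}{year}.json" for year in range(start_year, end_year + 1)]


def fetch_json(url, timeout=60):
    """Download and parse one yearly file; urlopen raises HTTPError on 4xx/5xx."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def load_yearly_records(start_year, end_year, fetch=fetch_json):
    """Concatenate the records from each yearly file into a single list."""
    records = []
    for url in yearly_urls(start_year, end_year):
        records.extend(fetch(url))
    return records
```

The result converts to a dataframe as before, e.g. `df = pd.DataFrame(load_yearly_records(2019, 2022))`. Requesting a year with no published file fails with an HTTP error rather than silently returning partial data.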