Federal Bulk Legislation Dataset

Description: This is a bulk open-access dataset in JSON format with the full text of all consolidated Canadian federal legislation, as maintained by the federal Department of Justice. The full text excludes tables, annexes, schedules, and forms. Details of how the data is collected and processed, together with code snippets for loading it, are available in a repository on the Refugee Law Lab GitHub.

Data: https://github.com/Refugee-Law-Lab/legislation-fed-bulk-data/tree/main/DATA

Code Repository: https://github.com/Refugee-Law-Lab/legislation-fed-bulk-data

Current Coverage: All consolidated acts

Number of Documents: ~1,800

Languages: English & French

Format: JSON, Parquet, Hugging Face Dataset

License: Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)

Citation: Sean Rehaag, “Federal Bulk Legislation Dataset” (2024), online: Refugee Law Laboratory https://refugeelab.ca/bulk-data/legislation-fed

Programmatic Access in Python (via Hugging Face Datasets):

from datasets import load_dataset

dataset = load_dataset("refugee-law-lab/canadian-legal-data", "LEGISLATION-FED", split="train")

# convert to dataframe (pandas must be installed)
df = dataset.to_pandas()
df

Programmatic Access in Python (via Parquet):

import pandas as pd
import requests
from io import BytesIO

url = 'https://huggingface.co/datasets/refugee-law-lab/canadian-legal-data/resolve/main/LEGISLATION-FED/train.parquet'

# download the Parquet file and fail loudly on an HTTP error
response = requests.get(url)
response.raise_for_status()

# convert to dataframe
df = pd.read_parquet(BytesIO(response.content))
df
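Because the snippet above downloads the Parquet file on every run, it can be convenient to fetch it once and reuse a local copy. The sketch below is a minimal illustration, not part of the dataset's tooling: `fetch_cached` and the cache path are hypothetical names, and it uses the standard library's `urllib.request` instead of `requests`. `urlopen` raises `HTTPError` for non-2xx responses, so an error page is never cached as data.

```python
import urllib.request
from pathlib import Path


def fetch_cached(url: str, cache_path: Path) -> bytes:
    """Return the bytes at `url`, downloading them only if `cache_path` does not exist yet."""
    cache_path = Path(cache_path)
    if cache_path.exists():
        return cache_path.read_bytes()  # cache hit: no network access

    # urlopen raises urllib.error.HTTPError on 4xx/5xx responses
    with urllib.request.urlopen(url) as response:
        data = response.read()

    # write to a temporary sibling first, then rename, so an interrupted
    # run never leaves a truncated file that looks like a valid cache
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    tmp_path = cache_path.with_name(cache_path.name + ".part")
    tmp_path.write_bytes(data)
    tmp_path.replace(cache_path)
    return data
```

The return value can be handed to pandas exactly as in the Parquet snippet, e.g. `df = pd.read_parquet(BytesIO(fetch_cached(url, Path("cache/LEGISLATION-FED-train.parquet"))))`. Deleting the cached file forces a fresh download when the dataset is updated.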

Programmatic Access in Python (JSON via GitHub):

import pandas as pd

# load English data (JSON Lines: one record per line)
url_en = 'https://raw.githubusercontent.com/Refugee-Law-Lab/legislation-fed-bulk-data/main/DATA/df_acts_en.json'
df_en = pd.read_json(url_en, orient='records', lines=True)

# load French data
url_fr = 'https://raw.githubusercontent.com/Refugee-Law-Lab/legislation-fed-bulk-data/main/DATA/df_acts_fr.json'
df_fr = pd.read_json(url_fr, orient='records', lines=True)

# combine the English and French dataframes into one
df = pd.concat([df_en, df_fr], ignore_index=True)

df
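The `lines=True` argument reflects that these files are JSON Lines: each line is a separate JSON object holding one record. If you want to stream or inspect records without pandas, a standard-library reader along these lines should work once the files have been downloaded locally. This is a minimal sketch: `iter_json_lines`, `combine_json_lines`, and the added `source_lang` field are hypothetical names chosen for illustration, and no particular field names in the real records are assumed.

```python
import json
from pathlib import Path
from typing import Dict, Iterator, List


def iter_json_lines(path: Path) -> Iterator[dict]:
    """Yield one record per non-blank line of a JSON Lines file."""
    with open(path, encoding="utf-8") as fh:
        for line_no, line in enumerate(fh, start=1):
            line = line.strip()
            if not line:
                continue  # skip blank lines, e.g. a trailing newline at end of file
            try:
                yield json.loads(line)
            except json.JSONDecodeError as exc:
                # point at the offending line instead of a generic parser error
                raise ValueError(f"{path}: invalid JSON on line {line_no}") from exc


def combine_json_lines(paths_by_lang: Dict[str, Path], tag: str = "source_lang") -> List[dict]:
    """Concatenate several JSON Lines files in key order, tagging each record with its key."""
    combined = []
    for lang, path in paths_by_lang.items():
        for record in iter_json_lines(path):
            record[tag] = lang  # hypothetical bookkeeping field, not part of the dataset
            combined.append(record)
    return combined
```

For example, `combine_json_lines({"en": Path("df_acts_en.json"), "fr": Path("df_acts_fr.json")})` returns one list of dicts with the English records first, which `pd.DataFrame` can turn into a dataframe if needed.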