ethnicolr2: Predict Race and Ethnicity From Names
ethnicolr2 is a modern PyTorch-based machine learning package that predicts race and ethnicity from names using LSTM neural networks. It’s trained on US Census data and Florida voter registration data to provide accurate predictions based on:
Last name only (census model or Florida model)
First and last name combined (Florida full name model)
Quick Start
```bash
# Install ethnicolr2
uv add ethnicolr2
# or
pip install ethnicolr2
```

```python
import pandas as pd
from ethnicolr2 import pred_fl_last_name

# Predict from last names
df = pd.DataFrame({'last': ['Smith', 'Zhang', 'Rodriguez']})
result = pred_fl_last_name(df, lname_col='last')
print(result)
```
Key Features
Trained on US Census data and Florida voter registration data for demographic prediction.
Built with PyTorch 2.x for efficient neural network inference with LSTM models.
Offers both a Python API and a command-line interface for integration into your workflow.
Documentation Sections
Installation, quickstart guide, and core concepts to get you up and running quickly.
Detailed tutorials, examples, and best practices for different use cases.
Complete API documentation with all classes, functions, and parameters.
Contributing guidelines, testing, and development setup information.
Supported Prediction Categories
The models predict one of five race/ethnicity categories:
nh_white: Non-Hispanic White
nh_black: Non-Hispanic Black
hispanic: Hispanic
asian: Asian
other: Other
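To illustrate what "predict one of five categories" means in practice, here is a minimal sketch of choosing the highest-scoring label from a set of per-category scores. It assumes each prediction can be represented as a mapping from the five labels above to a score. The helper `top_category` and the example scores are hypothetical and are not part of the ethnicolr2 API or real model output.

```python
# The five category labels listed above.
CATEGORIES = ("nh_white", "nh_black", "hispanic", "asian", "other")


def top_category(scores: dict[str, float]) -> str:
    """Return the label with the highest score (hypothetical helper)."""
    missing = set(CATEGORIES) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    # max() returns the first label in CATEGORIES order when scores tie.
    return max(CATEGORIES, key=lambda label: scores[label])


# Made-up scores for illustration only; not real model output.
example_scores = {
    "nh_white": 0.10,
    "nh_black": 0.05,
    "hispanic": 0.70,
    "asian": 0.05,
    "other": 0.10,
}
print(top_category(example_scores))
```

Keeping the full score mapping alongside the top label can be useful when downstream analysis needs to account for uncertain predictions rather than treating every label as definite.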
Available Models
| Model | Input | Training Data | Use Case |
|---|---|---|---|
| Census Last Name | Last name only | US Census 2000/2010 | General population predictions |
| Florida Last Name | Last name only | FL voter registration | State-specific predictions |
| Florida Full Name | First + Last name | FL voter registration | Highest accuracy predictions |
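The table's guidance can be read as a small decision rule. The sketch below encodes it, returning the model names exactly as they appear in the table. The function `choose_model` and its parameters are hypothetical, shown only to make the selection logic concrete; they are not part of ethnicolr2.

```python
def choose_model(has_first_name: bool, florida_specific: bool = False) -> str:
    """Pick a model name from the table above (hypothetical helper).

    has_first_name: True when both first and last names are available.
    florida_specific: True when state-specific predictions are wanted.
    """
    if has_first_name:
        # First + last name input is listed as the highest-accuracy option.
        return "Florida Full Name"
    if florida_specific:
        return "Florida Last Name"
    # Last name only, general population.
    return "Census Last Name"


print(choose_model(has_first_name=True))
print(choose_model(has_first_name=False, florida_specific=True))
print(choose_model(has_first_name=False))
```

This is only one reading of the table: it prefers the full name model whenever first names are available, and otherwise falls back to a last-name model based on the target population.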