Utilities API Reference

This page documents utility functions and command-line tools.

Command-Line Interface

ethnicolr2 provides several command-line tools for batch processing:

Census Data Lookup

# Look up census data by last name
census_ln input.csv -l last_name -o output.csv

Prediction Commands

# Florida last name model
pred_fl_last_name input.csv -l last_name -o predictions.csv

# Florida full name model
pred_fl_full_name input.csv -l last_name -f first_name -o predictions.csv

# Census last name model
pred_census_last_name input.csv -l last_name -o predictions.csv

Model Download

# Download pre-trained models (if needed)
ethnicolr2_download_models

Command Line Options

The lookup and prediction tools accept these common options:

  • -h, --help: Show help message and exit

  • -o OUTPUT, --output OUTPUT: Output file path

  • -l LAST, --last LAST: Column name or index for last names

  • -f FIRST, --first FIRST: Column name or index for first names (where applicable)
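
As a rough illustration of how the options above map onto an argument list, the sketch below assembles one in Python. build_command is a hypothetical helper written for this page and is not part of ethnicolr2. It assumes only the flags listed above and the positional input file shown in the earlier examples.

```python
from typing import List, Optional, Union


def build_command(
    tool: str,
    input_path: str,
    last: Union[str, int],
    output: str,
    first: Optional[Union[str, int]] = None,
) -> List[str]:
    """Assemble an argument list from the documented options.

    Hypothetical helper for illustration; not part of ethnicolr2.
    """
    # Column arguments may be names or indexes, so convert them to strings.
    argv = [tool, input_path, "-l", str(last), "-o", output]
    if first is not None:
        # -f only applies to tools that use first names, e.g. pred_fl_full_name.
        argv += ["-f", str(first)]
    return argv


print(build_command("pred_fl_full_name", "input.csv", "last_name",
                    "predictions.csv", first="first_name"))
```

Building the list in one place keeps the flag spelling consistent when the same options are passed to several tools.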

Programmatic Usage

You can also invoke the command-line tools from Python through the subprocess module:

import subprocess
import pandas as pd

# Prepare input data
df = pd.DataFrame({'surname': ['Smith', 'Zhang', 'Rodriguez']})
df.to_csv('input.csv', index=False)

# Run prediction via command line
result = subprocess.run([
    'pred_fl_last_name',
    'input.csv',
    '-l', 'surname',
    '-o', 'output.csv'
], capture_output=True, text=True)

# Load results
if result.returncode == 0:
    predictions = pd.read_csv('output.csv')
    print(predictions)
else:
    print(f"Error: {result.stderr}")
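
If the tool is not installed, subprocess.run raises FileNotFoundError before any return code is available. The sketch below is one way to guard against that. run_cli is a hypothetical wrapper, not part of ethnicolr2, and the wording of the error message is an assumption.

```python
import shutil
import subprocess
from typing import List


def run_cli(argv: List[str]) -> subprocess.CompletedProcess:
    """Run a command-line tool, failing early with a clear message.

    Hypothetical wrapper for illustration; not part of ethnicolr2.
    """
    # shutil.which returns None when the executable is not on PATH.
    if shutil.which(argv[0]) is None:
        raise FileNotFoundError(f"{argv[0]} is not on PATH; is ethnicolr2 installed?")
    # check=True raises CalledProcessError on a non-zero exit status.
    return subprocess.run(argv, capture_output=True, text=True, check=True)


if __name__ == "__main__":
    try:
        run_cli(["pred_fl_last_name", "input.csv", "-l", "surname", "-o", "output.csv"])
    except (FileNotFoundError, subprocess.CalledProcessError) as exc:
        print(f"Error: {exc}")
```

With check=True a failed run surfaces as an exception, so the calling code does not need to inspect returncode itself.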

Batch Processing

For large datasets, the command-line tools process the input in batches without extra configuration:

# Process large CSV files efficiently
pred_fl_last_name large_dataset.csv -l lastname -o results.csv

The tools automatically:

  • Process data in chunks to manage memory usage

  • Show progress bars for long-running operations

  • Handle encoding issues gracefully

  • Preserve original column order and additional columns
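
To make the chunking idea concrete, here is a minimal sketch using only the Python standard library. It is not the implementation used by the ethnicolr2 tools. process_in_chunks, iter_chunks, and the appended "label" column are hypothetical names, and str.upper stands in for a real prediction function.

```python
import csv
import io
from itertools import islice
from typing import Callable, Iterator, List, TextIO


def iter_chunks(reader: csv.DictReader, size: int) -> Iterator[List[dict]]:
    """Yield lists of at most `size` rows until the reader is exhausted."""
    while True:
        chunk = list(islice(reader, size))
        if not chunk:
            return
        yield chunk


def process_in_chunks(src: TextIO, dst: TextIO, column: str,
                      label: Callable[[str], str], size: int = 1000) -> int:
    """Copy rows from src to dst chunk by chunk, appending a 'label' column."""
    reader = csv.DictReader(src)
    # Keep the original columns in their original order; add the new one last.
    fields = list(reader.fieldnames or []) + ["label"]
    writer = csv.DictWriter(dst, fieldnames=fields)
    writer.writeheader()
    total = 0
    for chunk in iter_chunks(reader, size):
        for row in chunk:
            row["label"] = label(row[column])
        writer.writerows(chunk)
        total += len(chunk)
    return total


# Demo on an in-memory CSV; str.upper is a placeholder for a real model call.
src = io.StringIO("id,lastname\n1,Smith\n2,Zhang\n3,Rodriguez\n")
dst = io.StringIO()
rows_written = process_in_chunks(src, dst, "lastname", str.upper, size=2)
print(rows_written)
print(dst.getvalue())
```

Only one chunk of rows is held in memory at a time, which is the property the first bullet above describes.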