Getting Started

This guide will help you install naampy and make your first predictions.

Installation

Requirements

Python 3.11
pip or uv package manager

Install from PyPI

We strongly recommend installing naampy inside a Python virtual environment (see venv documentation):

pip install naampy

Or if you’re using uv:

uv pip install naampy

Install from Source

To install the latest development version:

git clone https://github.com/appeler/naampy.git
cd naampy
pip install -e .
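
To confirm that the install succeeded, one option is to ask the standard library for the installed version. This is a minimal sketch; the helper name `installed_version` is ours for illustration and is not part of naampy:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints a version string if naampy is installed in the active environment,
# otherwise prints None.
print(installed_version("naampy"))
```

If this prints None, check that the virtual environment you installed into is the one currently active.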

Quick Start

Basic Usage

import pandas as pd
from naampy import in_rolls_fn_gender, predict_fn_gender

# Create a DataFrame with names
names_df = pd.DataFrame({'name': ['Priyanka', 'Rahul', 'Anjali']})

# Get gender predictions from electoral roll data
result = in_rolls_fn_gender(names_df, 'name')
print(result[['name', 'prop_female', 'prop_male']])

Using the ML Model

You can also call the ML model directly, for example for names not in the electoral roll database:

from naampy import predict_fn_gender

# Use the neural network model for predictions
names = ['Aadhya', 'Reyansh', 'Kiara']
predictions = predict_fn_gender(names)
print(predictions)

Understanding the Output

Electoral Roll Data (`in_rolls_fn_gender`)

The function returns a DataFrame with the original data plus these columns:

prop_female: Proportion of people with this name who are female (0-1)
prop_male: Proportion of people with this name who are male (0-1)
prop_third_gender: Proportion of people with this name who are third gender (0-1)
n_female: Total count of females with this name in the dataset
n_male: Total count of males with this name in the dataset
n_third_gender: Total count of third gender individuals with this name
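
The proportion columns are often turned into a single label. The sketch below shows one way to do that in plain Python. The function name `label_from_proportions` and the 0.8 cutoff are illustrative assumptions, not something naampy provides or recommends:

```python
def label_from_proportions(prop_female: float, prop_male: float,
                           threshold: float = 0.8) -> str:
    """Map electoral-roll proportions to a coarse label.

    `threshold` is an illustrative cutoff, not a value recommended by naampy.
    """
    if prop_female >= threshold:
        return "female"
    if prop_male >= threshold:
        return "male"
    # Neither proportion clears the cutoff (this also covers NaN inputs,
    # because comparisons with NaN are False).
    return "ambiguous"


# Made-up proportions for illustration, not real naampy output.
print(label_from_proportions(0.97, 0.03))
print(label_from_proportions(0.55, 0.45))
```

On a real result DataFrame, the same function could be applied row by row, for example with `result.apply(lambda r: label_from_proportions(r["prop_female"], r["prop_male"]), axis=1)`.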

ML Model Predictions (`predict_fn_gender`)

The function returns a DataFrame with:

name: The input name
pred_gender: Predicted gender (‘male’ or ‘female’)
pred_prob: Confidence score for the prediction (0-1)
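
Because `pred_prob` is a confidence score, a common follow-up is to keep only the predictions above some cutoff. The sketch below works on plain dictionaries that mirror the three columns above. The helper name `confident_predictions` and the 0.9 cutoff are assumptions chosen for illustration:

```python
def confident_predictions(records, min_prob=0.9):
    """Keep only the records whose pred_prob is at least min_prob."""
    return [r for r in records if r["pred_prob"] >= min_prob]


# Made-up records that mirror the output columns; not real model output.
records = [
    {"name": "Example A", "pred_gender": "female", "pred_prob": 0.98},
    {"name": "Example B", "pred_gender": "male", "pred_prob": 0.61},
]

kept = confident_predictions(records)
print([r["name"] for r in kept])
```

With the actual DataFrame, a boolean mask such as `predictions[predictions["pred_prob"] >= 0.9]` does the same job without converting to dictionaries.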

How it Works

The first time you run in_rolls_fn_gender, it downloads the electoral roll data from Harvard Dataverse to a local cache folder. Later runs read from that cache and skip the download.
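
The download-once behaviour follows a familiar caching pattern. The sketch below illustrates that general pattern only; it is not naampy's actual implementation, and `ensure_cached`, `fake_fetch`, and the file name are hypothetical stand-ins:

```python
import tempfile
from pathlib import Path
from typing import Callable


def ensure_cached(filename: str, cache_dir: Path,
                  fetch: Callable[[], bytes]) -> Path:
    """Return the cached file path, calling fetch() only if the file is missing."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / filename
    if not target.exists():
        target.write_bytes(fetch())
    return target


calls = []


def fake_fetch() -> bytes:
    # Stand-in for a network download; records each call so we can count them.
    calls.append(1)
    return b"name,prop_female\n"


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "cache"
    ensure_cached("data.csv", cache, fake_fetch)  # file missing: fetches
    ensure_cached("data.csv", cache, fake_fetch)  # file present: no fetch
    print(len(calls))
```

The second call finds the file already on disk, so the fetch function runs only once.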

The package provides two complementary approaches:

Electoral Roll Data: Statistical data from millions of Indian voters
Machine Learning Model: Neural network trained on name patterns

For names not found in the electoral roll database, the package automatically falls back to the ML model.
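
The fallback described above is a lookup-then-predict pattern. The sketch below shows the idea with a toy table and a toy model. It is an illustration under our own assumptions, not naampy's code: `lookup_with_fallback`, the lowercase normalization, and the sample values are all hypothetical:

```python
from typing import Callable, Mapping, Tuple


def lookup_with_fallback(name: str,
                         table: Mapping[str, float],
                         model: Callable[[str], float]) -> Tuple[float, str]:
    """Return (prop_female, source), preferring the table over the model."""
    key = name.strip().lower()  # assumed normalization, for illustration only
    if key in table:
        return table[key], "electoral_roll"
    return model(key), "model"


# Toy stand-ins: a one-entry table and a model that always returns 0.5.
toy_table = {"priyanka": 0.99}


def toy_model(name: str) -> float:
    return 0.5


print(lookup_with_fallback("Priyanka", toy_table, toy_model))
print(lookup_with_fallback("Kiara", toy_table, toy_model))
```

Returning the source alongside the value makes it easy to tell observed proportions apart from model estimates later on.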

Next Steps

Read the User Guide for more detailed examples
Check the API Reference for all available options
Learn about the methodology and data sources