Getting Started

This guide will help you install naampy and make your first predictions.

Installation

Requirements

  • Python 3.11

  • pip or uv package manager

Install from PyPI

We strongly recommend installing naampy inside a Python virtual environment (see venv documentation):

pip install naampy

Or, if you’re using uv:

uv pip install naampy

Install from Source

To install the latest development version:

git clone https://github.com/appeler/naampy.git
cd naampy
pip install -e .

Quick Start

Basic Usage

import pandas as pd
from naampy import in_rolls_fn_gender, predict_fn_gender

# Create a DataFrame with names
names_df = pd.DataFrame({'name': ['Priyanka', 'Rahul', 'Anjali']})

# Get gender predictions from electoral roll data
result = in_rolls_fn_gender(names_df, 'name')
print(result[['name', 'prop_female', 'prop_male']])

Using the ML Model

For names not in the electoral roll database, call the ML model directly with predict_fn_gender:

# Use the neural network model for predictions
from naampy import predict_fn_gender

names = ['Aadhya', 'Reyansh', 'Kiara']
predictions = predict_fn_gender(names)
print(predictions)

Understanding the Output

Electoral Roll Data (in_rolls_fn_gender)

The function returns a DataFrame with the original data plus these columns:

  • prop_female: Proportion of people with this name who are female (0-1)

  • prop_male: Proportion of people with this name who are male (0-1)

  • prop_third_gender: Proportion of people with this name who are third gender (0-1)

  • n_female: Total count of females with this name in the dataset

  • n_male: Total count of males with this name in the dataset

  • n_third_gender: Total count of third gender individuals with this name
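As a rough illustration of how these columns relate, the sketch below derives proportions from counts and turns them into a label. It assumes each proportion is simply the matching count divided by the total of the three counts, which this guide does not state explicitly. The helper names and the 0.9 threshold are hypothetical and not part of naampy, and the counts are made up.

```python
def proportions_from_counts(n_female, n_male, n_third_gender=0):
    """Return (prop_female, prop_male, prop_third_gender) from raw counts.

    Assumption: each proportion is count / (sum of all three counts).
    """
    total = n_female + n_male + n_third_gender
    if total == 0:
        raise ValueError("at least one count must be positive")
    return n_female / total, n_male / total, n_third_gender / total


def label_from_props(prop_female, prop_male, threshold=0.9):
    """Map proportions to a label, leaving ambiguous names as 'uncertain'.

    The 0.9 threshold is an illustrative choice, not a naampy default.
    """
    if prop_female >= threshold:
        return "female"
    if prop_male >= threshold:
        return "male"
    return "uncertain"


# Made-up counts for illustration only; not real electoral roll figures.
prop_f, prop_m, prop_t = proportions_from_counts(n_female=950, n_male=50)
print(prop_f, prop_m, label_from_props(prop_f, prop_m))
```

With a real result DataFrame, the same labelling rule could be applied row by row to the prop_female and prop_male columns. A stricter threshold trades coverage for fewer mislabelled names.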

ML Model Predictions (predict_fn_gender)

The function returns a DataFrame with:

  • name: The input name

  • pred_gender: Predicted gender ('male' or 'female')

  • pred_prob: Confidence score for the prediction (0-1)
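One common way to use pred_prob is to separate confident predictions from ones that deserve manual review. The sketch below works on plain dictionaries with the three fields listed above. The helper name, the 0.8 cutoff, and the sample records are hypothetical and only for illustration.

```python
def split_by_confidence(records, min_prob=0.8):
    """Split prediction records into (confident, uncertain) lists.

    Each record is a dict with 'name', 'pred_gender', and 'pred_prob'.
    The 0.8 cutoff is an illustrative choice, not a naampy default.
    """
    confident = [r for r in records if r["pred_prob"] >= min_prob]
    uncertain = [r for r in records if r["pred_prob"] < min_prob]
    return confident, uncertain


# Made-up records for illustration only; not real model output.
sample = [
    {"name": "name_a", "pred_gender": "female", "pred_prob": 0.97},
    {"name": "name_b", "pred_gender": "male", "pred_prob": 0.62},
]
confident, uncertain = split_by_confidence(sample)
print([r["name"] for r in confident])
print([r["name"] for r in uncertain])
```

Assuming the predictions are held in a pandas DataFrame as described above, the records could be obtained with predictions.to_dict("records").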

How it Works

The first call to in_rolls_fn_gender downloads the electoral roll data from Harvard Dataverse to a local cache folder. Later calls read from the cache, so they skip the download and run faster.
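The download-once behaviour described above follows a general caching pattern, sketched below. This is not naampy's actual implementation: the get_cached helper, the file name, and the fake_fetch stand-in for a network download are all hypothetical.

```python
from pathlib import Path
import tempfile


def get_cached(path, fetch):
    """Return the bytes stored at path, calling fetch() only if it is missing."""
    path = Path(path)
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(fetch())
    return path.read_bytes()


calls = []


def fake_fetch():
    """Stand-in for a network download; records each invocation."""
    calls.append(1)
    return b"name,n_female,n_male\n"


with tempfile.TemporaryDirectory() as tmp:
    cache_file = Path(tmp) / "cache" / "data.csv"
    first = get_cached(cache_file, fake_fetch)   # triggers the "download"
    second = get_cached(cache_file, fake_fetch)  # served from the cache

print(len(calls))
```

The second call returns the same bytes without invoking the fetch function again, which is why only the first run pays the download cost.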

The package provides two complementary approaches:

  1. Electoral Roll Data: Statistical data from millions of Indian voters

  2. Machine Learning Model: Neural network trained on name patterns

For names not found in the electoral roll database, the package automatically falls back to the ML model.
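The lookup-then-fallback idea can be sketched in a few lines. This is a conceptual illustration only, not how naampy is implemented: the function, the toy table, the toy model, and the lowercase normalisation of names are all assumptions made for the example.

```python
def lookup_with_fallback(name, table, model):
    """Return (result, source): the table entry if present, else model(name)."""
    key = name.strip().lower()  # assumed normalisation, for illustration
    if key in table:
        return table[key], "electoral_roll"
    return model(name), "ml_model"


# Made-up table entry; not real electoral roll figures.
toy_table = {"name_a": {"prop_female": 0.95, "prop_male": 0.05}}


def toy_model(name):
    """Placeholder model that returns a fixed, made-up prediction."""
    return {"pred_gender": "female", "pred_prob": 0.5}


print(lookup_with_fallback("Name_A", toy_table, toy_model))
print(lookup_with_fallback("name_z", toy_table, toy_model))
```

Returning the source alongside the result lets downstream code treat observed proportions and model predictions differently, for example by applying a stricter confidence cutoff to the latter.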

Next Steps