API Reference

This section provides detailed documentation of the parsernaam API.

Core Classes

class parsernaam.parse.ParseNames

Bases: Parsernaam

Main API class for parsing names using machine learning models.

This class provides the primary interface for name parsing. It extends the base Parsernaam class with predefined model file paths and uses LSTM neural networks to classify single names as first or last, or to determine the positional ordering (first_last or last_first) of multi-word names.

Example

>>> import pandas as pd
>>> from parsernaam.parse import ParseNames
>>> df = pd.DataFrame({'name': ['John Smith', 'Kim Yeon']})
>>> results = ParseNames.parse(df)
>>> print(results['parsed_name'][0])
{'name': 'John Smith', 'type': 'first_last', 'prob': 0.998}

MODEL_FN = 'models/parsernaam.pt'
MODEL_POS_FN = 'models/parsernaam_pos.pt'
VOCAB_FN = 'models/parsernaam.joblib'

classmethod parse(df: DataFrame) -> DataFrame

Parse names using the predefined model and vocabulary files.

Return type:

DataFrame

Parameters:

df – DataFrame with a ‘name’ column containing names to parse

Returns:

DataFrame with added ‘parsed_name’ column
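Each parsed entry in the class example is a dictionary with 'name', 'type', and 'prob' keys, so downstream code can filter on the reported probability. The following is a minimal sketch that assumes every entry has that shape and that the entries have already been pulled into a plain list (for instance with `results['parsed_name'].tolist()`). The helper name `confident` and the 0.9 threshold are hypothetical, and the second record is a made-up placeholder rather than real model output.

```python
# First record copied from the ParseNames example in this reference.
# Second record is a made-up placeholder, not real model output.
parsed = [
    {"name": "John Smith", "type": "first_last", "prob": 0.998},
    {"name": "A B", "type": "last_first", "prob": 0.61},
]


def confident(records, threshold=0.9):
    """Keep only the records whose probability meets the threshold."""
    return [r for r in records if r["prob"] >= threshold]


kept = confident(parsed)
print([r["name"] for r in kept])
```

Low-probability entries may be worth routing to manual review instead of being discarded outright.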

parsernaam.parse.parse_names(df: DataFrame) -> DataFrame

Parse names in a DataFrame. The signature matches ParseNames.parse.

Return type:

DataFrame

Parameters:

df – DataFrame with names

Returns:

DataFrame with parsed names

parsernaam.parse.main() -> int | None

Command-line entry point for parsing names.

Return type:

int | None

Returns:

Exit code (None for success)
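Returning None for success works because Python's `sys.exit(None)` exits with status 0, while an integer is used as the status directly. The following sketch illustrates that convention with a hypothetical `main` and `run` helper. It is not the package's implementation, and the exit code 2 for a missing argument is an assumption.

```python
import sys
from typing import Optional


def main(argv) -> Optional[int]:
    """Hypothetical entry point: None on success, an int on failure."""
    if not argv:
        print("usage: prog INPUT", file=sys.stderr)
        return 2  # assumed error code for a missing argument
    return None


def run(argv):
    """Call main the way a console script would and capture the exit code."""
    try:
        sys.exit(main(argv))
    except SystemExit as exc:
        return exc.code


success = run(["input.csv"])
failure = run([])
print(success, failure)
```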

class parsernaam.naam.Parsernaam

Bases: object

Base class for parsing names with explicitly supplied model and vocabulary files.

classmethod parse(df: DataFrame, model_fn: str, model_fn_pos: str, vocab_fn: str) -> DataFrame

Parse names using ML models

Return type:

DataFrame

Parameters:
  • df – DataFrame with ‘name’ column containing names to parse

  • model_fn – Path to single name model file

  • model_fn_pos – Path to positional name model file

  • vocab_fn – Path to vocabulary file

Returns:

DataFrame with added ‘parsed_name’ column

Raises:
  • ValueError – If required ‘name’ column is missing

  • FileNotFoundError – If model files cannot be found

Model Architecture

class parsernaam.model.LSTM(input_size: int, hidden_size: int, output_size: int, num_layers: int = 1)

Bases: Module

LSTM neural network for name classification.

A multi-layer LSTM network with an embedding layer for character-level name classification. Supports both single name classification (first/last) and positional classification (first_last/last_first).

__init__(input_size: int, hidden_size: int, output_size: int, num_layers: int = 1)

Initialize LSTM model.

Parameters:
  • input_size – Size of vocabulary (number of unique characters)

  • hidden_size – Hidden layer dimension

  • output_size – Number of output classes

  • num_layers – Number of LSTM layers

forward(input: Tensor) -> Tensor

Forward pass through the network.

Return type:

Tensor

Parameters:

input – Input tensor of character indices [batch_size, sequence_length]

Returns:

Log-softmax probabilities for each class [batch_size, num_classes]
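Because forward returns log-probabilities, exponentiating a row recovers ordinary probabilities that sum to 1. Below is a minimal sketch of turning one output row into a label and a probability, with a plain Python list standing in for a tensor row. The helper name `decode_row` and the sample values are hypothetical. It also assumes that output index i corresponds to position i in `CATEGORIES_POSITIONAL`, which this reference does not state.

```python
import math

# Label order copied from ModelConfig.CATEGORIES_POSITIONAL in this reference.
# Assumption: output index i corresponds to CATEGORIES_POSITIONAL[i].
CATEGORIES_POSITIONAL = ["last_first", "first_last"]


def decode_row(log_probs, categories):
    """Return (label, probability) for the highest-scoring class."""
    best = max(range(len(log_probs)), key=lambda i: log_probs[i])
    return categories[best], math.exp(log_probs[best])


# Hypothetical log-softmax row for a single name (not real model output).
row = [math.log(0.25), math.log(0.75)]
label, prob = decode_row(row, CATEGORIES_POSITIONAL)
print(label, round(prob, 3))
```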

Utilities

Utilities for processing command-line arguments.

parsernaam.utils.get_args(argv: list[str], description: str, epilog: str, default_out: str) -> Namespace

Parse command line arguments for the parsernaam CLI tool.

Return type:

Namespace

Parameters:
  • argv – List of command line arguments

  • description – Description text for the argument parser

  • epilog – Example usage text shown after help

  • default_out – Default output filename

Returns:

Parsed command line arguments namespace

Example

>>> args = get_args(['input.csv', '-o', 'output.csv'],
...                 'Parse names', 'Example usage', 'out.csv')
>>> args.input
'input.csv'
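The parameters of get_args map naturally onto the standard library's argparse. The sketch below shows how such a parser could be assembled. It is a hypothetical stand-in, not the package's code: the function name `build_parser`, the long option `--output`, and the help strings are assumptions, and only the positional input and the `-o` flag come from the example above.

```python
import argparse


def build_parser(description, epilog, default_out):
    """Hypothetical stand-in for the parser that get_args might construct."""
    parser = argparse.ArgumentParser(
        description=description,
        epilog=epilog,
        # Keep the epilog's line breaks so example commands render as written.
        formatter_class=argparse.RawDescriptionHelpFormatter,
    )
    parser.add_argument("input", help="input file")
    # Assumption: -o falls back to default_out when omitted.
    parser.add_argument("-o", "--output", default=default_out, help="output file")
    return parser


args = build_parser("Parse names", "Example usage", "out.csv").parse_args(
    ["input.csv", "-o", "output.csv"]
)
print(args.input, args.output)
```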

Configuration

Configuration constants for parsernaam.

This module contains all the hardcoded constants used throughout the parsernaam package, including model parameters, file paths, and classification categories.

class parsernaam.config.ModelConfig

Bases: object

Model configuration constants.

Contains all the hyperparameters and settings used by the LSTM models for name parsing, including architecture parameters and file locations.

HIDDEN_SIZE

Dimension of LSTM hidden layers

NUM_LAYERS

Number of LSTM layers in the model

SEQUENCE_LENGTH

Maximum length of input name sequences

CATEGORIES_SINGLE

Classification labels for single names

CATEGORIES_POSITIONAL

Classification labels for multi-word names

MODEL_FILES

Paths to model and vocabulary files

HIDDEN_SIZE: Final[int] = 256
NUM_LAYERS: Final[int] = 2
SEQUENCE_LENGTH: Final[int] = 30
CATEGORIES_SINGLE: Final[list[str]] = ['last', 'first']
CATEGORIES_POSITIONAL: Final[list[str]] = ['last_first', 'first_last']
MODEL_FILES: Final[dict[str, str]] = {'positional': 'models/parsernaam_pos.pt', 'single': 'models/parsernaam.pt', 'vocab': 'models/parsernaam.joblib'}
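SEQUENCE_LENGTH implies that each name reaches the model as a fixed-length sequence of character indices. The sketch below shows one way to truncate or pad a name to that length. The helper name `encode_name`, the right-padding scheme, the use of index 0 for padding and unknown characters, and the toy vocabulary are all assumptions, so the package's actual encoding may differ.

```python
SEQUENCE_LENGTH = 30  # value of ModelConfig.SEQUENCE_LENGTH in this reference


def encode_name(name, vocab, length=SEQUENCE_LENGTH, pad_index=0):
    """Map characters to indices, then truncate or right-pad to `length`."""
    # Assumption: unknown characters share the padding index.
    indices = [vocab.get(ch, pad_index) for ch in name[:length]]
    return indices + [pad_index] * (length - len(indices))


# Hypothetical vocabulary; the real one is stored in models/parsernaam.joblib.
vocab = {ch: i + 1 for i, ch in enumerate("abcdefghijklmnopqrstuvwxyz ")}
encoded = encode_name("kim yeon", vocab)
print(len(encoded), encoded[:8])
```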

Package Information

ParserNaam is a package for parsing names.

class parsernaam.ParseNames

Bases: Parsernaam

Main API class for parsing names using machine learning models.

This class provides the primary interface for name parsing. It extends the base Parsernaam class with predefined model file paths and uses LSTM neural networks to classify single names as first or last, or to determine the positional ordering (first_last or last_first) of multi-word names.

Usage Examples

Basic parsing:

from parsernaam.parse import ParseNames
import pandas as pd

df = pd.DataFrame({'name': ['John Smith', 'Jane Doe']})
results = ParseNames.parse(df)

Model architecture:

from parsernaam.model import LSTM

# Instantiate an untrained network; this call does not load pretrained weights
model = LSTM(input_size=100, hidden_size=128, output_size=2, num_layers=1)

Command line utilities:

from parsernaam.utils import get_args

args = get_args(['input.csv', '-o', 'output.csv', '-n', 'name'],
                'Parse names', 'Example usage', 'out.csv')