API Reference
This section provides detailed documentation of the parsernaam API.
Core Classes
- class parsernaam.parse.ParseNames
Bases: Parsernaam
Main API class for parsing names using machine learning models.
This class provides the primary interface for name parsing, extending the base Parsernaam class with predefined model file paths. It uses LSTM neural networks to classify a single name as first or last, or to determine the ordering (first_last or last_first) of a multi-word name.
Example
>>> import pandas as pd
>>> from parsernaam.parse import ParseNames
>>> df = pd.DataFrame({'name': ['John Smith', 'Kim Yeon']})
>>> results = ParseNames.parse(df)
>>> print(results['parsed_name'][0])
{'name': 'John Smith', 'type': 'first_last', 'prob': 0.998}
- MODEL_FN = 'models/parsernaam.pt'
- MODEL_POS_FN = 'models/parsernaam_pos.pt'
- VOCAB_FN = 'models/parsernaam.joblib'
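Each entry in the parsed_name column is a dictionary like the one printed in the example. The following is a minimal sketch of post-processing such records. The sample records, the helper name split_by_confidence, and the 0.9 threshold are illustrative assumptions, not real model output or part of the parsernaam API.

```python
# Hypothetical records shaped like the documented `parsed_name` entries.
# The values are made up for illustration; they are not real model output.
records = [
    {"name": "John Smith", "type": "first_last", "prob": 0.998},
    {"name": "Kim Yeon", "type": "last_first", "prob": 0.61},
]


def split_by_confidence(parsed, threshold=0.9):
    """Split parsed-name dicts into (confident, uncertain) lists."""
    confident = [r for r in parsed if r["prob"] >= threshold]
    uncertain = [r for r in parsed if r["prob"] < threshold]
    return confident, uncertain


confident, uncertain = split_by_confidence(records)
print([r["name"] for r in confident])
print([r["name"] for r in uncertain])
```

With a real result, results['parsed_name'].tolist() should yield a list that can be passed to a helper like this one.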
- parsernaam.parse.parse_names(df: DataFrame) -> DataFrame
Parse names
- Return type:
DataFrame
- Parameters:
df – DataFrame with names
- Returns:
DataFrame with parsed names
- parsernaam.parse.main() -> int | None
Main method to parse names
- Return type:
int | None
- Returns:
Exit code (None for success)
- class parsernaam.naam.Parsernaam
Bases: object
Parse names
- classmethod parse(df: DataFrame, model_fn: str, model_fn_pos: str, vocab_fn: str) -> DataFrame
Parse names using ML models
- Return type:
DataFrame
- Parameters:
df – DataFrame with ‘name’ column containing names to parse
model_fn – Path to single name model file
model_fn_pos – Path to positional name model file
vocab_fn – Path to vocabulary file
- Returns:
DataFrame with added ‘parsed_name’ column
- Raises:
ValueError – If required ‘name’ column is missing
FileNotFoundError – If model files cannot be found
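Because parse raises on a missing 'name' column or missing model files, it can help to check both before starting a long run. The sketch below uses a hypothetical helper, preflight, that is not part of parsernaam; it only mirrors the two documented exceptions.

```python
from pathlib import Path


def preflight(columns, model_paths):
    """Raise early if the 'name' column or any model file is missing.

    Hypothetical helper, not part of parsernaam. It mirrors the
    exceptions documented for Parsernaam.parse.
    """
    if "name" not in columns:
        raise ValueError("input is missing the required 'name' column")
    missing = [str(p) for p in model_paths if not Path(p).is_file()]
    if missing:
        raise FileNotFoundError("model files not found: " + ", ".join(missing))


# Demonstration with a plain list of column names and a path that does not exist.
try:
    preflight(["name"], ["no_such_dir/parsernaam.pt"])
except FileNotFoundError as exc:
    print("caught:", exc)
```

For a DataFrame, df.columns can be passed as the first argument.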
Model Architecture
- class parsernaam.model.LSTM(input_size: int, hidden_size: int, output_size: int, num_layers: int = 1)
Bases: Module
LSTM neural network for name classification.
A multi-layer LSTM network with embedding layer for character-level name classification. Supports both single name classification (first/last) and positional classification (first_last/last_first).
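An embedding layer consumes integer indices, so a character-level model needs each name converted to a fixed-length sequence of character ids. The sketch below shows one way this might look. The vocabulary, the use of index 0 for padding and unknown characters, right-padding, and the function encode_name are all assumptions; the real vocabulary is stored in the package's vocabulary file and may differ.

```python
import string

SEQUENCE_LENGTH = 30  # same value as ModelConfig.SEQUENCE_LENGTH

# Hypothetical vocabulary: index 0 is reserved for padding and unknown
# characters; lowercase letters and the space character follow.
VOCAB = {ch: i + 1 for i, ch in enumerate(string.ascii_lowercase + " ")}


def encode_name(name, length=SEQUENCE_LENGTH):
    """Map a name to `length` character ids, truncating or right-padding with 0."""
    ids = [VOCAB.get(ch, 0) for ch in name.lower()[:length]]
    return ids + [0] * (length - len(ids))


encoded = encode_name("Kim Yeon")
print(len(encoded))
print(encoded[:8])
```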
Utilities
Helpers for processing command-line arguments.
- parsernaam.utils.get_args(argv: list[str], description: str, epilog: str, default_out: str) -> Namespace
Parse command line arguments for the parsernaam CLI tool.
- Return type:
Namespace
- Parameters:
argv – List of command line arguments
description – Description text for the argument parser
epilog – Example usage text shown after help
default_out – Default output filename
- Returns:
Parsed command line arguments namespace
Example
>>> args = get_args(['input.csv', '-o', 'output.csv'],
...                 'Parse names', 'Example usage', 'out.csv')
>>> args.input
'input.csv'
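A function with this signature could plausibly be built on the standard argparse module. The sketch below is not the package's implementation: the name get_args_sketch, the long option --output, and the help strings are assumptions; only the positional input and the -o flag are taken from the example.

```python
import argparse


def get_args_sketch(argv, description, epilog, default_out):
    """Hypothetical stand-in for get_args, built on argparse."""
    parser = argparse.ArgumentParser(
        description=description,
        epilog=epilog,
        formatter_class=argparse.RawDescriptionHelpFormatter,
    )
    parser.add_argument("input", help="input CSV file")
    # default_out is used when -o is not given on the command line.
    parser.add_argument("-o", "--output", default=default_out, help="output file")
    return parser.parse_args(argv)


args = get_args_sketch(["input.csv"], "Parse names", "Example usage", "out.csv")
print(args.input, args.output)
```

Passing argv explicitly, rather than reading sys.argv inside the function, keeps the parser easy to exercise from tests.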
Configuration
Configuration constants for parsernaam.
This module contains all the hardcoded constants used throughout the parsernaam package, including model parameters, file paths, and classification categories.
- class parsernaam.config.ModelConfig
Bases: object
Model configuration constants.
Contains all the hyperparameters and settings used by the LSTM models for name parsing, including architecture parameters and file locations.
- HIDDEN_SIZE
Dimension of LSTM hidden layers
- NUM_LAYERS
Number of LSTM layers in the model
- SEQUENCE_LENGTH
Maximum length of input name sequences
- CATEGORIES_SINGLE
Classification labels for single names
- CATEGORIES_POSITIONAL
Classification labels for multi-word names
- MODEL_FILES
Paths to model and vocabulary files
- HIDDEN_SIZE: Final[int] = 256
- NUM_LAYERS: Final[int] = 2
- SEQUENCE_LENGTH: Final[int] = 30
- CATEGORIES_SINGLE: Final[list[str]] = ['last', 'first']
- CATEGORIES_POSITIONAL: Final[list[str]] = ['last_first', 'first_last']
- MODEL_FILES: Final[dict[str, str]] = {'positional': 'models/parsernaam_pos.pt', 'single': 'models/parsernaam.pt', 'vocab': 'models/parsernaam.joblib'}
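The category lists suggest how a classifier's raw scores might be turned into a label and probability. The sketch below assumes that each list position corresponds to an output class index and that probabilities come from a softmax; neither is confirmed by this reference, and decode is a hypothetical helper.

```python
import math

CATEGORIES_SINGLE = ["last", "first"]  # as listed in ModelConfig


def decode(logits, categories):
    """Apply a softmax to raw scores and return (label, probability) of the top class.

    Assumes the position in `categories` matches the model's class index.
    """
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return categories[best], probs[best]


label, prob = decode([0.0, 2.0], CATEGORIES_SINGLE)
print(label, round(prob, 3))
```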
Package Information
ParserNaam is a package for parsing names.
- class parsernaam.ParseNames
Bases: Parsernaam
Main API class for parsing names using machine learning models.
This class provides the primary interface for name parsing, extending the base Parsernaam class with predefined model file paths. It uses LSTM neural networks to classify a single name as first or last, or to determine the ordering (first_last or last_first) of a multi-word name.
Example
>>> import pandas as pd
>>> from parsernaam.parse import ParseNames
>>> df = pd.DataFrame({'name': ['John Smith', 'Kim Yeon']})
>>> results = ParseNames.parse(df)
>>> print(results['parsed_name'][0])
{'name': 'John Smith', 'type': 'first_last', 'prob': 0.998}
- MODEL_FN = 'models/parsernaam.pt'
- MODEL_POS_FN = 'models/parsernaam_pos.pt'
- VOCAB_FN = 'models/parsernaam.joblib'
Usage Examples
Basic parsing:
from parsernaam.parse import ParseNames
import pandas as pd
df = pd.DataFrame({'name': ['John Smith', 'Jane Doe']})
results = ParseNames.parse(df)
Model architecture:
from parsernaam.model import LSTM
# Constructs a new LSTM with the given sizes (the constructor takes no model path, so no saved weights are loaded here)
model = LSTM(input_size=100, hidden_size=128, output_size=2, num_layers=1)
Command line utilities:
from parsernaam.utils import get_args
args = get_args(['input.csv', '-o', 'output.csv', '-n', 'name'],
                'Parse names', 'Example usage', 'out.csv')