CSV Database Repository
csv_database_repository
CSV Database Repository - Base Implementation.
Provides a base class for CSV-based database repositories with lazy loading, caching, schema validation, and merge operations.
Classes:
| Name | Description |
|---|---|
| CSVDatabaseRepository | Base class for CSV database operations with caching and validation |
Classes
CSVDatabaseRepository
CSVDatabaseRepository(filepath: Path, encoding: str = 'utf-8', separator: str = ';', required_columns: Optional[list[str]] = None)
Base implementation for CSV-based database repositories.
Provides common functionality for loading, caching, validating, and merging CSV databases. Specific database repositories inherit from this class.
Attributes:
| Name | Type | Description |
|---|---|---|
| filepath | Path | Path to CSV database file |
| encoding | str | File encoding (default: 'utf-8') |
| separator | str | CSV separator (default: ';') |
| required_columns | list[str] | List of required column names for validation |
| _data | Optional[DataFrame] | Cached database data (lazy loaded) |
Methods:
| Name | Description |
|---|---|
| load_data | Load CSV database with caching |
| reload_data | Force reload database from file |
| merge_with_dataset | Merge dataset with database |
| get_column_names | Get column names from database |
| validate_schema | Validate database schema |
| get_stats | Get database statistics |
Notes
- Implements lazy loading with caching for performance
- Optimizes dtypes to reduce memory usage
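The notes above do not say how dtypes are optimized. One common approach, shown here as a minimal sketch and not as the repository's actual code, is to downcast integer columns and convert low-cardinality text columns to pandas categoricals. The helper name `optimize_dtypes` and the `max_category_ratio` threshold are assumptions made for illustration.

```python
import pandas as pd


def optimize_dtypes(df: pd.DataFrame, max_category_ratio: float = 0.5) -> pd.DataFrame:
    """Return a copy of df with smaller dtypes (hypothetical helper)."""
    out = df.copy()
    for col in out.columns:
        series = out[col]
        if pd.api.types.is_integer_dtype(series):
            # Pick the smallest integer dtype that still holds every value.
            out[col] = pd.to_numeric(series, downcast="integer")
        elif len(series) > 0 and (
            pd.api.types.is_string_dtype(series)
            or pd.api.types.is_object_dtype(series)
        ):
            # Text columns with few distinct values are cheaper as categoricals.
            if series.nunique() / len(series) <= max_category_ratio:
                out[col] = series.astype("category")
    return out


# Illustrative data only; the column names are made up for this example.
example = pd.DataFrame({"id": [1, 2, 3, 4], "group": ["a", "a", "b", "b"]})
optimized = optimize_dtypes(example)
```

The values are unchanged; only the storage type differs, which is why this kind of step can run transparently inside a loader.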
Initialize CSV database repository.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| filepath | Path | Path to CSV file. | required |
| encoding | str | File encoding. | 'utf-8' |
| separator | str | CSV separator. | ';' |
| required_columns | Optional[list[str]] | List of required column names for validation. | None |
Source code in src/infrastructure/persistence/csv_database_repository.py
Functions
load_data
Load CSV database into DataFrame with caching.
Returns:
| Type | Description |
|---|---|
| DataFrame | Database data with optimized dtypes |
Raises:
| Type | Description |
|---|---|
| FileNotFoundError | If the CSV file doesn't exist |
| ValueError | If the CSV format is invalid or required columns are missing |
reload_data
Force reload database from file.
Clears cache and reloads data from CSV file.
Returns:
| Type | Description |
|---|---|
| DataFrame | Freshly loaded database data |
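The interplay between `load_data` (read once, then serve from cache) and `reload_data` (clear the cache, read again) can be sketched as follows. `LazyCSV` is a hypothetical stand-in written for this example; it mirrors the documented constructor parameters and exceptions but is not the source of `CSVDatabaseRepository`, and it omits dtype optimization.

```python
from pathlib import Path
from typing import Optional

import pandas as pd


class LazyCSV:
    """Hypothetical stand-in illustrating lazy loading with a cache."""

    def __init__(self, filepath: Path, encoding: str = "utf-8",
                 separator: str = ";",
                 required_columns: Optional[list[str]] = None) -> None:
        self.filepath = Path(filepath)
        self.encoding = encoding
        self.separator = separator
        self.required_columns = required_columns or []
        self._data: Optional[pd.DataFrame] = None  # nothing is read yet

    def load_data(self) -> pd.DataFrame:
        # Serve the cached frame if the file was already read.
        if self._data is not None:
            return self._data
        if not self.filepath.exists():
            raise FileNotFoundError(f"CSV file not found: {self.filepath}")
        try:
            df = pd.read_csv(self.filepath, sep=self.separator,
                             encoding=self.encoding)
        except (pd.errors.ParserError, pd.errors.EmptyDataError) as exc:
            raise ValueError(f"Invalid CSV format: {exc}") from exc
        missing = [c for c in self.required_columns if c not in df.columns]
        if missing:
            raise ValueError(f"Missing required columns: {missing}")
        self._data = df
        return df

    def reload_data(self) -> pd.DataFrame:
        # Drop the cache so the next load hits the file again.
        self._data = None
        return self.load_data()
```

With this shape, repeated `load_data()` calls return the same object, and changes written to the file after the first load only become visible after `reload_data()`.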
merge_with_dataset
Merge dataset with database.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| dataset_df | DataFrame | Input dataset (must have join column) | required |
| on | str | Column name to join on | 'ko' |
| how | str | Join type ('inner', 'left', 'right', 'outer') | 'inner' |
Returns:
| Type | Description |
|---|---|
| DataFrame | Merged DataFrame |
Raises:
| Type | Description |
|---|---|
| ValueError | If the join column is missing from either DataFrame |
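The documented merge behaviour (check the join column on both sides, then join) maps directly onto `pandas.DataFrame.merge`. The sketch below assumes that mapping; `merge_on_key` and the sample frames are hypothetical and only illustrate how `on` and `how` affect the result.

```python
import pandas as pd


def merge_on_key(dataset_df: pd.DataFrame, database_df: pd.DataFrame,
                 on: str = "ko", how: str = "inner") -> pd.DataFrame:
    """Merge two frames after checking the join column (hypothetical helper)."""
    for label, frame in (("dataset", dataset_df), ("database", database_df)):
        if on not in frame.columns:
            raise ValueError(f"Join column '{on}' missing in {label}")
    return dataset_df.merge(database_df, on=on, how=how)


# Illustrative data only.
dataset = pd.DataFrame({"sample": ["s1", "s2"], "ko": ["K00001", "K99999"]})
database = pd.DataFrame({"ko": ["K00001", "K00002"], "name": ["a", "b"]})

inner = merge_on_key(dataset, database)             # keeps matching keys only
left = merge_on_key(dataset, database, how="left")  # keeps every dataset row
```

An inner join drops dataset rows whose key is absent from the database, while a left join keeps them and fills the database columns with missing values.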
get_column_names
Get column names from database.
Returns:
| Type | Description |
|---|---|
| list[str] | List of column names |
validate_schema
Validate database schema.
Checks if all required columns are present in DataFrame.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | Optional[DataFrame] | DataFrame to validate (if None, loads from file) | None |
Returns:
| Type | Description |
|---|---|
| bool | True if all required columns are present, False otherwise |
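The check itself amounts to a set comparison between the required names and the frame's columns. A minimal sketch, assuming nothing beyond the documented return value; `has_required_columns` and `missing_columns` are hypothetical names, and the second helper is an addition for reporting which columns are absent.

```python
import pandas as pd


def missing_columns(df: pd.DataFrame, required_columns: list[str]) -> list[str]:
    """List required columns absent from df, in the order they were required."""
    present = set(df.columns)
    return [col for col in required_columns if col not in present]


def has_required_columns(df: pd.DataFrame, required_columns: list[str]) -> bool:
    """Return True when every required column is present (hypothetical helper)."""
    return not missing_columns(df, required_columns)


# Illustrative data only.
frame = pd.DataFrame({"ko": ["K00001"], "name": ["a"]})
ok = has_required_columns(frame, ["ko", "name"])
absent = missing_columns(frame, ["ko", "pathway"])
```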
get_stats
Get database statistics.
Returns:
| Type | Description |
|---|---|
| dict | Dictionary with keys: 'rows' (number of rows), 'columns' (number of columns), 'memory_mb' (memory usage in MB), 'column_names' (list of column names), 'dtypes' (dictionary of column datatypes) |
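A dictionary with these keys can be assembled from standard pandas attributes. The sketch below is one way to do it, not the method's actual source: `frame_stats` is a hypothetical name, and both the use of `memory_usage(deep=True)` and the 1024 * 1024 bytes-per-MB conversion are assumptions.

```python
import pandas as pd


def frame_stats(df: pd.DataFrame) -> dict:
    """Summarize a DataFrame using the documented keys (hypothetical helper)."""
    return {
        "rows": len(df),
        "columns": len(df.columns),
        # deep=True also counts the memory held by string objects.
        "memory_mb": float(df.memory_usage(deep=True).sum()) / (1024 * 1024),
        "column_names": df.columns.tolist(),
        "dtypes": {col: str(dtype) for col, dtype in df.dtypes.items()},
    }


# Illustrative data only.
stats = frame_stats(pd.DataFrame({"ko": ["K00001", "K00002"], "n": [1, 2]}))
```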