File Types API Reference
This document describes the file type system in r2x-core, which is used to validate and handle different data file formats.
Overview
The FileType class hierarchy provides a type-safe way to work with different file formats. Each file type declares whether it can hold time series data through its supports_timeseries attribute.
Base Class
FileType
@dataclass(slots=True)
class FileType:
    """Base class for file data types."""
    supports_timeseries: bool = False
    model_config = ConfigDict(arbitrary_types_allowed=True)
Attributes:
supports_timeseries (bool): Whether this file type can store time series data. Default is False.
File Type Classes
TableFile
Represents tabular data files (CSV, TSV).
class TableFile(FileType):
    """Data model for tabular data (CSV, TSV, etc.)."""
    supports_timeseries: bool = True
Supported Extensions:
.csv - Comma-separated values
.tsv - Tab-separated values
Time Series Support: Yes
Use Cases:
Component definitions (generators, buses, lines)
Time series profiles (hourly generation, load)
Human-readable data interchange
Example:
from pathlib import Path
from r2x_core import DataFile, FileInfo
from r2x_core.file_types import TableFile

# Component data
components = DataFile(
    name="generators",
    fpath=Path("data/generators.csv"),
)
assert isinstance(components.file_type, TableFile)

# Time series data
profiles = DataFile(
    name="profiles",
    fpath=Path("data/profiles.csv"),
    info=FileInfo(is_timeseries=True),
)
assert isinstance(profiles.file_type, TableFile)
H5File
Represents HDF5 (Hierarchical Data Format) files.
class H5File(FileType):
    """Data model for HDF5 data."""
    supports_timeseries: bool = True
Supported Extensions:
.h5 - HDF5 format
.hdf5 - HDF5 format (alternate extension)
Time Series Support: Yes
Use Cases:
Large time series datasets
Multi-year profiles in hierarchical structure
High-performance data storage
Complex nested data structures
Example:
from pathlib import Path
from r2x_core import DataFile, FileInfo
from r2x_core.file_types import H5File

# Multi-year time series in HDF5
profiles = DataFile(
    name="generation_profiles",
    fpath=Path("data/profiles_2020_2050.h5"),
    info=FileInfo(is_timeseries=True),
)
assert isinstance(profiles.file_type, H5File)
assert profiles.file_type.supports_timeseries
ParquetFile
Represents Apache Parquet columnar storage files.
class ParquetFile(FileType):
    """Data model for Parquet data."""
    supports_timeseries: bool = True
Supported Extensions:
.parquet - Apache Parquet format
Time Series Support: Yes
Use Cases:
Large time series with excellent compression
Wide tables with many columns
Data interchange with analytics tools
Efficient columnar queries
Example:
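from pathlib import Path
from r2x_core import DataFile, FileInfo
from r2x_core.file_types import ParquetFile

# Load profiles in Parquet format
load_data = DataFile(
    name="load_profiles",
    fpath=Path("data/load.parquet"),
    info=FileInfo(is_timeseries=True),
)
assert isinstance(load_data.file_type, ParquetFile)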
JSONFile
Represents JSON (JavaScript Object Notation) files.
class JSONFile(FileType):
    """Data model for JSON data."""
    supports_timeseries: bool = False
Supported Extensions:
.json - JSON format
Time Series Support: No
Use Cases:
Component definitions
Configuration files
Metadata
Hierarchical component relationships
Example:
from pathlib import Path
from r2x_core import DataFile, FileInfo
from r2x_core.file_types import JSONFile

# Component metadata in JSON
metadata = DataFile(
    name="model_metadata",
    fpath=Path("data/metadata.json"),
    info=FileInfo(is_timeseries=False),  # Default
)
assert isinstance(metadata.file_type, JSONFile)
assert not metadata.file_type.supports_timeseries

# Marking a JSON file as time series is rejected with a ValueError
# bad = DataFile(
#     fpath=Path("data/profiles.json"),
#     info=FileInfo(is_timeseries=True),  # ERROR! JSON doesn't support time series
# )
XMLFile
Represents XML (eXtensible Markup Language) files.
class XMLFile(FileType):
    """Data model for XML data."""
    supports_timeseries: bool = False
Supported Extensions:
.xml - XML format
Time Series Support: No
Use Cases:
Legacy model formats
Hierarchical component definitions
Configuration with complex nesting
Example:
from pathlib import Path
from r2x_core import DataFile
from r2x_core.file_types import XMLFile

# Component definitions in XML
components = DataFile(
    name="network",
    fpath=Path("data/network.xml"),
)
assert isinstance(components.file_type, XMLFile)
assert not components.file_type.supports_timeseries
Extension Mapping
The EXTENSION_MAPPING dictionary maps file extensions to their corresponding FileType classes:
EXTENSION_MAPPING: dict[str, type[FileType]] = {
    ".csv": TableFile,
    ".tsv": TableFile,
    ".h5": H5File,
    ".hdf5": H5File,
    ".parquet": ParquetFile,
    ".json": JSONFile,
    ".xml": XMLFile,
}
This mapping is used internally by DataFile.file_type to determine the file type from the file extension.
Type Alias
TableDataFileType
A type alias for file types that represent tabular data:
TableDataFileType: TypeAlias = TableFile | H5File
Usage:
from r2x_core.file_types import H5File, TableDataFileType, TableFile

def process_table_data(file_type: TableDataFileType) -> None:
    """Process tabular data files."""
    match file_type:
        case TableFile():
            # Handle CSV/TSV
            ...
        case H5File():
            # Handle HDF5
            ...
Validation
File types are validated automatically when accessing DataFile.file_type:
from pathlib import Path
from r2x_core import DataFile, FileInfo

# Valid: CSV supports time series
valid = DataFile(
    name="profiles",
    fpath=Path("data/profiles.csv"),
    info=FileInfo(is_timeseries=True),
)
print(valid.file_type)  # TableFile()

# Invalid: Unknown extension
try:
    invalid_ext = DataFile(
        name="data",
        fpath=Path("data/file.xyz"),
    )
    _ = invalid_ext.file_type  # Raises ValueError
except ValueError as e:
    print(e)  # "Unsupported file extension: .xyz"

# Invalid: JSON doesn't support time series
try:
    invalid_ts = DataFile(
        name="profiles",
        fpath=Path("data/profiles.json"),
        info=FileInfo(is_timeseries=True),
    )
    _ = invalid_ts.file_type  # Raises ValueError
except ValueError as e:
    print(e)  # "File type JSONFile does not support time series data..."
Adding New File Types
To add support for a new file format:
Create a new FileType subclass:
@dataclass(slots=True)
class NetCDFFile(FileType):
    """Data model for NetCDF data."""
    supports_timeseries: bool = True  # If it supports time series
Add to EXTENSION_MAPPING:
EXTENSION_MAPPING: dict[str, type[FileType]] = {
    # ... existing mappings ...
    ".nc": NetCDFFile,
    ".netcdf": NetCDFFile,
}
Update TableDataFileType if needed:
# If the new type represents tabular data
TableDataFileType: TypeAlias = TableFile | H5File | NetCDFFile
That's it! The validation and type checking will work automatically.
Best Practices
Set supports_timeseries correctly: This flag determines whether files of this type may be marked as time series data.
Use type hints: When writing functions that work with specific file types, use type hints for better IDE support:
def process_csv(file_type: TableFile) -> None: ...
Pattern matching: Use structural pattern matching to handle different file types:
match datafile.file_type:
    case TableFile():
        ...
    case H5File():
        ...
    case ParquetFile():
        ...
Check supports_timeseries: Before processing time series, verify the file type supports it:
if datafile.info and datafile.info.is_timeseries:
    assert datafile.file_type.supports_timeseries
    # Safe to process as time series
See Also
DataFile Reference - Complete DataFile API
Attaching Time Series - Time series guide
Parser Basics - Using file types in parsers