Versioning and Upgrades: Design Philosophy¶

The Problem We Solve¶

Data models and system schemas evolve. A parser built for v1.0 data won’t work with v2.0. Traditional approaches to this problem fall into two categories:

Monolithic Migrations: One big transformation function that knows about all versions. Problem: unmaintainable, hard to test, risky to deploy.
Database Migrations: Complex tools like Alembic. Problem: overkill for static data files, hard to understand data transformations.

r2x-core uses staged upgrades: incremental transformations that compose together, each targeting a specific version range. This approach balances simplicity, testability, and maintainability.

Staged Upgrades vs Monolithic Upgrades¶

Monolithic Approach (Fragile)¶

def upgrade_all(data, current_version: str):
    if current_version == "1.0.0":
        # Upgrade 1.0.0 → 1.1.0
        data = rename_columns_v1_1(data)
        # Upgrade 1.1.0 → 1.5.0
        data = add_field_v1_5(data)
        # ... 20 more transformations ...
    elif current_version == "1.5.0":
        # Duplicate logic for partial upgrades
        data = add_field_v1_5(data)
        # ... more duplicated code ...
    return data

Problems:

Logic duplication across version branches
Hard to test individual transformations
Single point of failure: if one transformation breaks, entire upgrade fails
Difficult to debug: which transformation failed?
Risky to modify: changing one step might affect others unexpectedly

Staged Approach (Maintainable)¶

upgrades = [
    UpgradeStep(
        name="rename_columns",
        func=rename_columns_v1_1,
        target_version="1.1.0",
        priority=10
    ),
    UpgradeStep(
        name="add_field",
        func=add_field_v1_5,
        target_version="1.5.0",
        priority=20
    ),
    # ... more discrete steps ...
]

# Execute in order
for step in upgrades:
    result = run_upgrade_step(step, data, current_version=current_ver, strategy=strategy)
    if is_err(result):
        print(f"Failed at {step.name}")
        break
    data = result.ok()

Benefits:

Each step is independent and testable
Easy to identify exactly which step failed
New versions add new steps without modifying old ones
Individual transformations are simple and understandable
Easier to rollback or skip problematic versions

Version Comparison Strategies¶

Different projects need different versioning schemes. r2x-core provides a plugin architecture:

Semantic Versioning (Most Common)¶

strategy = SemanticVersioningStrategy()
assert strategy.compare_versions("1.0.0", target="2.0.0") == -1  # 1.0.0 < 2.0.0
assert strategy.compare_versions("2.0.0", target="2.0.0") == 0   # equal
assert strategy.compare_versions("3.0.0", target="2.0.0") == 1   # 3.0.0 > 2.0.0

Semantically versioned data follows major.minor.patch convention. Upgrades typically apply to a range (e.g., versions 1.5.0 to 1.9.9 need this upgrade to reach 2.0.0).

Git-Based Versioning¶

strategy = GitVersioningStrategy(repo_path=Path("."))
# Compares by commit order in git history

Useful for projects where commits are the source of truth. Typical in continuous integration/deployment scenarios.

Custom Strategies¶

class CustomVersionStrategy:
    def compare_versions(self, current: str, *, target: str) -> int:
        # Your comparison logic
        pass

Enables domain-specific versioning (e.g., “v2024-Q1” style versions).

Version Ranges and Upgrade Applicability¶

An upgrade step specifies when it applies:

upgrade = UpgradeStep(
    name="fix_bug_123",
    func=fix_bug_123,
    target_version="1.5.3",
    min_version="1.5.0",    # Don't apply if older than this
    max_version="1.5.2"     # Don't apply if newer than this
)

The logic: “If current version is in [min_version, max_version], and it’s < target_version, run this upgrade.”

This allows:

Targeted Fixes: Apply patches only to affected versions
Conditional Upgrades: Skip upgrades not needed for a particular version
Multi-Path Upgrades: Different transformations for different starting versions

FILE vs SYSTEM Upgrades¶

FILE Upgrades (Most Common)¶

Transform raw data files before parsing. Typical operations:

Rename columns in CSV
Add missing columns with defaults
Reformat dates
Restructure JSON

upgrade = UpgradeStep(
    name="rename_csv_columns",
    func=lambda data: data,  # Modify file representation
    target_version="1.1.0",
    upgrade_type=UpgradeType.FILE  # Raw data file operations
)

Applied in parser.parse() workflow before DataStore/System initialization.

SYSTEM Upgrades (Less Common)¶

Modify System objects loaded from cache. Typical operations:

Update component attributes
Reorganize component hierarchy
Recalculate derived values

upgrade = UpgradeStep(
    name="fix_component_types",
    func=lambda system: system,  # Modify system object
    target_version="2.0.0",
    upgrade_type=UpgradeType.SYSTEM  # System object operations
)

Applied when loading cached systems via System.from_json(upgrader=...).

Priority and Execution Order¶

Upgrades execute in priority order (lower numbers first). This enables dependencies:

# Priority 10: Must run first (prepares data)
step1 = UpgradeStep(name="prep", priority=10, ...)

# Priority 20: Depends on step1
step2 = UpgradeStep(name="transform", priority=20, ...)

# Priority 30: Depends on step2
step3 = UpgradeStep(name="validate", priority=30, ...)

Design Trade-offs¶

Why Not Just Update All Data Upfront?¶

You might wonder: why not upgrade all data once when the new version is released, rather than on-demand?

Reasons:

Optionality: Some users might not upgrade immediately. Supporting multiple versions reduces friction.
Testing: Upgrades can be tested on real data in production, then rolled back if issues arise.
Gradual Migration: Large datasets can be migrated incrementally rather than all-at-once.
Reversibility: You can theoretically downgrade by running inverse transformations.

Why Priority Instead of Dependency Graphs?¶

Dependency graphs are more flexible but harder to understand and debug. Priority ordering is:

Simple to reason about
Easy to visualize (topological sort)
Sufficient for most upgrade chains
Easier for users to configure

Real-World Example¶

Suppose you’re transitioning from per-unit to SI units:

upgrades = [
    # Version 1.5.0: Prepare for transition
    UpgradeStep(
        name="add_unit_field",
        func=add_unit_field_to_csv,
        target_version="1.5.0",
        min_version="1.0.0",
        priority=10
    ),
    # Version 1.8.0: Dual-mode reading (both per-unit and SI)
    UpgradeStep(
        name="enable_unit_detection",
        func=enable_unit_detection,
        target_version="1.8.0",
        min_version="1.5.0",
        priority=20
    ),
    # Version 2.0.0: Require SI units
    UpgradeStep(
        name="convert_to_si",
        func=convert_to_si_units,
        target_version="2.0.0",
        min_version="1.8.0",
        priority=30
    ),
]

A v1.0 user wanting to upgrade to v2.0 would run all three steps in order. A v1.5 user would run the last two. A v1.8 user would run only the last one.