Implementing rule of origin checks in Python

The exact failure this page solves: a Python ETL batch has already classified and value-declared a bill of materials (BOM), and now must decide — deterministically, and with a defensible audit trail — whether each finished good qualifies for preferential origin under an agreement such as USMCA. A naive implementation computes regional value content (RVC) with float arithmetic, treats a missing tariff-shift flag as “pass,” and short-circuits de minimis before the primary rule is even evaluated. The result is a preferential duty claim that looks correct on the invoice but collapses under a CBP Focused Assessment. This page gives you a single runnable evaluation engine that treats origin determination as an explicit state machine, computes RVC with Decimal precision, enforces change-in-tariff-classification (CTC) and de minimis in the legally correct precedence, and emits an immutable per-decision audit record. It sits between classification and duty calculation in the Rule of Origin Logic Engines workflow, and its output is consumed directly by Duty Formula Calculation Frameworks.

Prerequisites

This solution assumes a specific upstream pipeline state and toolchain. Confirm each before applying it:

Python 3.10+, pandas>=2.0, and numpy>=1.24. The engine uses no structural pattern matching (match statements), but it relies on 3.10 union type-hint syntax (Decimal | None) and pandas 2.x nullable coercion behavior.
Classified, value-normalized input. Each BOM line has already passed classification and value declaration. Tariff codes must already be resolved against the HTS Schedule Database Design schema, with any unmapped or ambiguous codes diverted upstream through Fallback Routing for Unmapped Codes — the origin engine must never see a provisional or placeholder code.
Current rule set. The RVC threshold, de minimis percentage, and per-heading tariff-shift rules for the governing agreement must be synchronized from official annexes via the Tariff Update Ingestion Pipelines; a stale threshold silently changes eligibility.
Monetary values as strings or Decimal, never float. Transaction value (TV) and value of non-originating materials (VNM) must arrive as ISO 4217-denominated strings so the engine can construct exact Decimal values. A float upstream has already lost cents before you begin.
Isolated execution. The engine runs inside the read-only reference-data boundary described in Security Boundary & Data Isolation; it makes no network calls and masks supplier identifiers in its audit output.
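
The string-or-Decimal requirement can be enforced at the ingestion boundary. The following is a minimal sketch, not part of the engine: to_money is a hypothetical helper, and it assumes amounts arrive as plain numeric strings with the ISO 4217 currency code carried in a separate field.

```python
from decimal import Decimal, InvalidOperation


def to_money(raw: object) -> Decimal:
    """Build an exact Decimal from a string or Decimal; refuse floats outright.

    Hypothetical helper for illustration; the name and policy are assumptions.
    """
    if isinstance(raw, float):
        # A float has already been through binary rounding; reject it, don't repair it.
        raise TypeError(f"float monetary value rejected: {raw!r}")
    if isinstance(raw, Decimal):
        value = raw
    elif isinstance(raw, str):
        try:
            value = Decimal(raw.strip())
        except InvalidOperation as exc:
            raise ValueError(f"not a decimal amount: {raw!r}") from exc
    else:
        raise TypeError(f"unsupported monetary type: {type(raw).__name__}")
    if not value.is_finite():
        # Decimal("NaN") and Decimal("Infinity") parse, but are not amounts.
        raise ValueError(f"non-finite monetary value: {raw!r}")
    return value


# Constructing from a float exposes the binary approximation;
# constructing from a string keeps the declared amount exactly.
from_float = Decimal(0.1)
from_string = Decimal("0.1")
print(from_float == from_string)  # False
```

Rejecting floats instead of converting them keeps the failure visible at the point where precision was lost, which is usually a loader that did not pin its dtypes.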

Implementation

The engine below isolates regulatory logic from data ingestion. Origin determination is modeled as an explicit state machine: each BOM line starts from an implicit ingested state (not represented in the OriginState enum) and terminates in exactly one of ORIGINATING, NOT_ORIGINATING, or INVALID_VALUE_DATA, with the transition recorded for audit. RVC uses the transaction-value method, RVC = ((TV − VNM) / TV) × 100, computed in Decimal and quantized to two places with ROUND_HALF_UP; confirm that rounding mode against the governing agreement and customs guidance before relying on it. Precedence is enforced in this order: a wholly-obtained material short-circuits before any calculation; missing value data quarantines before any pass; the CTC shift is a gate that must be satisfied, or excused by de minimis, before the RVC test applies; and the RVC threshold is the final test.
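
Before reading the full engine, the transaction-value formula can be checked by hand. The sketch below is a standalone illustration with made-up amounts; rvc_transaction_value is a hypothetical name, and the 40.0 threshold simply mirrors the engine's default.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

TWO_PLACES = Decimal("0.01")


def rvc_transaction_value(tv: Decimal, vnm: Decimal) -> Decimal:
    """RVC = ((TV - VNM) / TV) * 100, quantized half-up to two places."""
    raw = (tv - vnm) / tv * Decimal("100")
    return raw.quantize(TWO_PLACES, rounding=ROUND_HALF_UP)


threshold = Decimal("40.0")  # illustrative; load the real value from the rule set

at_threshold = rvc_transaction_value(Decimal("1000.00"), Decimal("600.00"))
just_below = rvc_transaction_value(Decimal("1000.00"), Decimal("600.10"))

print(at_threshold, at_threshold >= threshold)  # 40.00 True
print(just_below, just_below >= threshold)      # 39.99 False

# The rounding mode only matters at an exact half-way point.
half_way = Decimal("39.985")
print(half_way.quantize(TWO_PLACES, rounding=ROUND_HALF_UP))    # 39.99
print(half_way.quantize(TWO_PLACES, rounding=ROUND_HALF_EVEN))  # 39.98
```

Note that the comparison happens after quantization, so a raw RVC of 39.995% rounds to 40.00% and passes a 40% threshold. Whether that is acceptable depends on the rounding rule of the governing agreement.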

import hashlib
import logging
from decimal import Decimal, ROUND_HALF_UP, InvalidOperation
from datetime import datetime, timezone
from enum import Enum
from typing import Any

import numpy as np
import pandas as pd

logger = logging.getLogger(__name__)


class OriginState(str, Enum):
    ORIGINATING = "ORIGINATING"
    NOT_ORIGINATING = "NOT_ORIGINATING"
    INVALID_VALUE_DATA = "INVALID_VALUE_DATA"


class RuleOfOriginEngine:
    def __init__(
        self,
        rvc_threshold: Decimal = Decimal("40.0"),
        de_minimis_threshold: Decimal = Decimal("10.0"),
        currency: str = "USD",  # ISO 4217
    ) -> None:
        self.rvc_threshold = rvc_threshold
        self.de_minimis = de_minimis_threshold
        self.currency = currency
        self.audit_log: list[dict[str, Any]] = []

    def _log_audit(self, sku: str, rule: str, state: OriginState, detail: str) -> None:
        # No supplier identifiers are written; only the SKU and decision are retained.
        entry = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "sku": sku,
            "rule": rule,
            "state": state.value,
            "detail": detail,
            # Per-entry digest over the decision fields, not the raw payload.
            # Entries are not chained to each other and the timestamp is excluded.
            "checksum": hashlib.sha256(
                f"{sku}|{rule}|{state.value}|{detail}".encode()
            ).hexdigest()[:16],
        }
        self.audit_log.append(entry)
        logger.info("ROO_AUDIT | sku=%s | rule=%s | state=%s", sku, rule, state.value)

    def _rvc(self, tv_raw: Any, vnm_raw: Any) -> Decimal | None:
        """Regional value content, transaction-value method, exact Decimal."""
        try:
            tv = Decimal(str(tv_raw))
            vnm = Decimal(str(vnm_raw))
        except (InvalidOperation, ValueError, TypeError):
            return None
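        # Decimal("NaN") and Decimal("Infinity") parse without error (a missing
        # pandas value stringifies to "nan"), and comparing them would raise.
        if not (tv.is_finite() and vnm.is_finite()):
            return None
        if tv <= 0 or vnm < 0 or vnm > tv:
            return None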
        rvc = (tv - vnm) / tv * Decimal("100")
        # Quantize half-up to two places; confirm the rounding rule for your agreement.
        return rvc.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

    def evaluate(self, bom_df: pd.DataFrame) -> pd.DataFrame:
        """
        Evaluate each BOM line to a single terminal OriginState.

        Precedence (do not reorder — it encodes the legal rule hierarchy):
          1. Wholly obtained / originating material  -> ORIGINATING
          2. Missing or malformed TV/VNM             -> INVALID_VALUE_DATA (quarantine)
          3. Required CTC shift not satisfied        -> ORIGINATING if non-originating
             content is within de minimis, otherwise NOT_ORIGINATING
          4. RVC below threshold                     -> NOT_ORIGINATING
          5. Otherwise                               -> ORIGINATING
        """
        required = {"sku", "transaction_value", "non_originating_value"}
        missing = required - set(bom_df.columns)
        if missing:
            raise ValueError(f"Missing required BOM columns: {sorted(missing)}")

        df = bom_df.copy()
        df["rvc_pct"] = pd.Series(np.nan, index=df.index, dtype="object")
        df["origin_state"] = OriginState.NOT_ORIGINATING.value
        df["failure_reason"] = ""

        for idx, row in df.iterrows():
            sku = str(row.get("sku", "UNKNOWN"))

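            # A missing flag (NaN, None, pd.NA) must not count as True; bool(nan) is True.
            originating_flag = row.get("is_originating_material", False)
            if pd.notna(originating_flag) and bool(originating_flag):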
                df.at[idx, "origin_state"] = OriginState.ORIGINATING.value
                self._log_audit(sku, "WHOLLY_OBTAINED", OriginState.ORIGINATING,
                                "Material flagged originating")
                continue

            rvc = self._rvc(row.get("transaction_value"),
                            row.get("non_originating_value"))
            if rvc is None:
                df.at[idx, "origin_state"] = OriginState.INVALID_VALUE_DATA.value
                df.at[idx, "failure_reason"] = "INVALID_VALUE_DATA"
                self._log_audit(sku, "RVC", OriginState.INVALID_VALUE_DATA,
                                "Missing or malformed TV/VNM")
                continue
            df.at[idx, "rvc_pct"] = rvc

            # Non-originating content within de minimis tolerance is disregarded.
            nom_pct = Decimal("100") - rvc
            within_de_minimis = nom_pct <= self.de_minimis

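            # Missing flags take the strict reading: CTC required, shift not satisfied.
            ctc_required = row.get("ctc_required", True)
            ctc_required = True if pd.isna(ctc_required) else bool(ctc_required)
            shift_flag = row.get("ctc_shift_satisfied", False)
            shift_satisfied = pd.notna(shift_flag) and bool(shift_flag)
            if ctc_required and not shift_satisfied: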
                if within_de_minimis:
                    df.at[idx, "origin_state"] = OriginState.ORIGINATING.value
                    self._log_audit(sku, "DE_MINIMIS", OriginState.ORIGINATING,
                                    f"NOM={nom_pct}% <= {self.de_minimis}%")
                    continue
                df.at[idx, "failure_reason"] = "CTC_SHIFT_NOT_SATISFIED"
                self._log_audit(sku, "CTC", OriginState.NOT_ORIGINATING,
                                "Required tariff shift not satisfied")
                continue

            if rvc >= self.rvc_threshold:
                df.at[idx, "origin_state"] = OriginState.ORIGINATING.value
                self._log_audit(sku, "RVC", OriginState.ORIGINATING,
                                f"RVC={rvc}% >= {self.rvc_threshold}%")
            else:
                df.at[idx, "failure_reason"] = f"RVC_INSUFFICIENT_{rvc}%"
                self._log_audit(sku, "RVC", OriginState.NOT_ORIGINATING,
                                f"RVC={rvc}% < {self.rvc_threshold}%")

        return df

The engine returns a copy of the input frame annotated with origin_state, rvc_pct, and failure_reason, and appends one record per row to audit_log on the engine instance. The log accumulates across calls, so use a fresh engine (or clear the list) per batch. Only rows in state ORIGINATING may carry a preferential-rate claim into the Duty Formula Calculation Frameworks; INVALID_VALUE_DATA rows are quarantined for broker review, and NOT_ORIGINATING rows fall through to the general (MFN) rate column.
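
The three-way routing described above can be sketched as follows. This is an illustration only: the route names are hypothetical, and the rows are hand-written stand-ins for what the annotated frame would yield through to_dict("records").

```python
from collections import defaultdict

# Hypothetical downstream route names; assumptions for illustration.
ROUTES = {
    "ORIGINATING": "preferential_rate",
    "NOT_ORIGINATING": "mfn_rate",
    "INVALID_VALUE_DATA": "broker_review",
}


def route_rows(rows: list[dict]) -> dict[str, list[str]]:
    """Group SKUs by downstream route; an unknown state is an error, never a default."""
    routed: dict[str, list[str]] = defaultdict(list)
    for row in rows:
        state = row["origin_state"]
        if state not in ROUTES:
            raise ValueError(f"unknown origin_state {state!r} for sku={row['sku']}")
        routed[ROUTES[state]].append(row["sku"])
    return dict(routed)


# Stand-in for engine output rows.
annotated = [
    {"sku": "A-100", "origin_state": "ORIGINATING"},
    {"sku": "B-200", "origin_state": "NOT_ORIGINATING"},
    {"sku": "C-300", "origin_state": "INVALID_VALUE_DATA"},
    {"sku": "D-400", "origin_state": "ORIGINATING"},
]

print(route_rows(annotated))
```

Raising on an unrecognized state keeps a future fourth state from silently inheriting the MFN or preferential path.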

Verification steps

Run these checks against a representative BOM before trusting the engine in production:

Threshold boundary parity. Inject synthetic lines whose RVC lands at exactly 40.00%, at 40.01%, and at 39.99%. The engine must return ORIGINATING at exactly the threshold and NOT_ORIGINATING at 39.99%. Because RVC is computed in Decimal, this boundary is exact; a float implementation will occasionally flip it.
De minimis interaction. Feed a line that fails the CTC shift but whose non-originating content is 10.00% or less. Confirm it resolves to ORIGINATING via the DE_MINIMIS rule, and that raising non-originating content to 10.01% flips it to NOT_ORIGINATING. De minimis must never redeem a line whose value data is invalid.
CTC gate precedence. Provide a line whose RVC is well above threshold but with ctc_shift_satisfied=False and non-originating content above de minimis. It must be NOT_ORIGINATING with reason CTC_SHIFT_NOT_SATISFIED: a passing RVC must never override an unmet tariff shift.
Value-data quarantine. Pass rows with transaction_value of None, "", "0", and a vnm > tv. All must land in INVALID_VALUE_DATA, not NOT_ORIGINATING — a malformed value is an unknown, not a failure.
Audit-chain integrity. Export audit_log and confirm exactly one entry per input row, that each checksum recomputes from its fields, and that every SKU in the output frame has a matching log entry. A shortfall means an exception path swallowed a record.
Currency precision. Assert every rvc_pct is a Decimal quantized to two places and that no float appears anywhere in the value path. Round only at the RVC and final duty stages, per the customs authority’s rounding rule.
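
The audit-chain check above can be automated. The sketch below assumes the entry layout and digest recipe used by _log_audit (sku, rule, state, and detail joined with "|", SHA-256, first 16 hex characters); audit_problems is a hypothetical helper name.

```python
import hashlib


def recompute_checksum(entry: dict) -> str:
    """Rebuild the 16-hex-character digest from the entry's own fields."""
    payload = f"{entry['sku']}|{entry['rule']}|{entry['state']}|{entry['detail']}"
    return hashlib.sha256(payload.encode()).hexdigest()[:16]


def audit_problems(audit_log: list[dict], output_skus: list[str]) -> list[str]:
    """Return readable integrity problems; an empty list means the log passes."""
    problems = []
    if len(audit_log) != len(output_skus):
        problems.append(
            f"{len(audit_log)} log entries for {len(output_skus)} output rows"
        )
    for position, entry in enumerate(audit_log):
        if entry["checksum"] != recompute_checksum(entry):
            problems.append(
                f"entry {position}: checksum mismatch (sku={entry['sku']})"
            )
    unlogged = sorted(set(output_skus) - {entry["sku"] for entry in audit_log})
    if unlogged:
        problems.append(f"output SKUs with no audit entry: {unlogged}")
    return problems
```

Typical use would be audit_problems(engine.audit_log, result["sku"].astype(str).tolist()) after a run. The digest is unkeyed, so it catches accidental corruption but not an editor who recomputes it; stronger guarantees would need a keyed or chained scheme, which is outside this sketch.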

Edge cases & gotchas

float leaking through pandas. Reading a CSV without dtype mapping coerces transaction_value to float64, replacing the declared decimal amount with a binary approximation and dropping trailing zeros before the engine sees it. Read monetary columns as strings (dtype={"transaction_value": "string", "non_originating_value": "string"}) and let _rvc build the Decimal. This is the single most common source of one-cent RVC drift.
VNM greater than TV. A supplier feed can report non-originating value exceeding transaction value (a units or currency-mixing error). _rvc rejects vnm > tv as INVALID_VALUE_DATA rather than emitting a negative RVC that would silently fail the threshold — a negative RVC looks like a legitimate non-qualifying result and hides the data bug.
De minimis is value-based here, not weight-based. Several agreements express de minimis by weight for specific chapters (notably textiles under USMCA). This engine models the value tolerance only; a chapter with a weight-based rule needs a separate branch keyed off the finished good’s HS chapter, or it will mis-qualify.
Missing ctc_required defaults to strict. The engine defaults ctc_required to True, so a BOM that omits the column is evaluated under the stricter rule. That is deliberately conservative, but confirm your rule loader sets ctc_required=False for lines governed by an RVC-only product-specific rule, or genuinely qualifying lines will be under-claimed.
iterrows at scale. Row-wise iteration is fine for a single BOM but degrades on multi-million-line batches. Group by finished good and evaluate per BOM, or move the vectorizable RVC computation ahead of the loop and reserve the loop for the state transitions. Never call .apply() with a Python function over the whole frame expecting a speedup — it is the same per-row cost.
Timezone-naive audit timestamps. datetime.now(timezone.utc) is intentional; a naive datetime.now() records host-local time and breaks point-in-time reconstruction when audit logs from multiple regions are merged.
Placeholder codes must never arrive. If a provisional 9999999999 or superseded code reaches the engine, it will be evaluated as if legitimate. Enforce the upstream guarantee that unmapped codes are held in Fallback Routing for Unmapped Codes before origin evaluation runs.
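
The placeholder guarantee can also be asserted cheaply at the engine's own boundary. The sketch below is an assumption-laden pre-flight check, not part of the engine: it presumes codes are normalized to 10-digit strings, and both codes_to_hold and the placeholder set are hypothetical. Detecting superseded codes would need a lookup against the current schedule and is not modeled here.

```python
import re

# Assumed placeholder set; extend it to match whatever the upstream router emits.
PLACEHOLDER_CODES = frozenset({"9999999999"})
TEN_DIGITS = re.compile(r"[0-9]{10}")


def codes_to_hold(codes: list) -> list:
    """Return the codes that must not reach origin evaluation, in input order."""
    held = []
    for code in codes:
        if not isinstance(code, str) or not TEN_DIGITS.fullmatch(code):
            held.append(code)  # malformed, truncated, or missing
        elif code in PLACEHOLDER_CODES:
            held.append(code)  # well-formed but provisional
    return held


batch = ["8471300100", "9999999999", "847130", None]
print(codes_to_hold(batch))  # ['9999999999', '847130', None]
```

If the returned list is non-empty, the safer behavior is to fail the batch before evaluate runs, since the engine itself has no way to tell a placeholder from a legitimate code.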
