Solution 23 of 25PWS · Data Review/ValidationFO · Accounting

Legacy System Data Migration and Data Quality

RD's legacy mainframe systems (PLAS, DLS, CLSS, GLS, LoanServ, AMAS, ECMS) carry decades of data with inconsistent field definitions, partial documentation, and aging extraction interfaces, and any cohort, modification, or audit work depends on extracts that must be rationalized before they are usable.

Why It Matters

Bad data is the upstream cause of most cohort-model staleness, ULO certification gaps, reconciliation breaks, and audit findings. Investing in legacy data quality compounds across every other workstream.

HSG's Approach

1Stand up a data-quality program with profiling, definition-mapping, lineage documentation, and cleansing rules across each legacy system extract.
2Build a canonical data model with field definitions tied to USSGL, OMB DAIMS, and FCRA cohort conventions.
3Use Python / dbt / DuckDB / SQL Server to build an automated data-quality pipeline that surfaces and tracks data-quality exceptions over time.
4Produce a data-quality scorecard per legacy system with monthly trending.
5Document everything in a data dictionary suitable for the upcoming USDA Loan Modernization migration.

Expected Deliverables

Data-quality profiling report per legacy system
Canonical data model and data dictionary
Automated data-quality pipeline
Monthly data-quality scorecard
Cleansing-rule library with lineage documentation

Expected Outcome

Reduce data-quality exception count by 50% within 12 months and produce a migration-ready data dictionary in advance of the loan-modernization cutover.

PreviousAI / Automation Workflow Stack (OMB M-25-21 / M-25-22 Compliance)NextExternal Inquiry Response (Congress, GAO, OMB, OIG)