Turning Legacy Data into Business Value
11 September, 2025
Written by Igor Pyl
In asset-intensive industries, digital transformation is not simply a question of installing new systems – it’s about unlocking and structuring the enormous volume of existing technical and operational information that organizations already have. Much of this data, however, remains trapped in unstructured and semi-structured formats: scanned PDF documents, technical drawings, vendor documentation, inspection reports, construction dossiers, and historical maintenance records. These sources often contain high-value data essential for asset hierarchy design, maintenance planning, inventory optimization, and long-term lifecycle cost control – yet they are inaccessible to traditional ERP or CMMS systems without systematic processing and transformation. This is where KEEL SOLUTION delivers value through a hybrid methodology combining Optical Character Recognition (OCR), Machine Learning (ML), selective use of Large Language Models (LLMs), and deep domain expertise.

«AI is not a silver bullet. It is a powerful enabler that, when paired with standards and expertise, enhances productivity, improves accuracy, and drives efficiency.»
From PDFs to Intelligence: Unlocking Hidden Data with OCR, ML & LLM
The foundation of our approach begins with the digital capture of legacy documentation. This includes MRB (Manufacturer Record Books), TDS (Technical Data Sheets), VDB (Vendor Data Books), certificates of conformity, construction handover files, RBI (Risk-Based Inspection) outputs, inspection checklists, and maintenance logbooks. These files are often delivered as scanned PDFs with no metadata or indexing, making them invisible to transactional systems. Using OCR, we extract key technical elements such as:
- Asset identification (tags, serial numbers, location codes)
- Technical characteristics (dimensions, materials, voltage, pressure, power class, operating ranges)
- Manufacturer data (name, part/model numbers, datasheet versions)
- Equipment BoMs (bills of materials with part-level attributes)
- OEM-specified preventive maintenance procedures
- Historical consumption and repair records
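As a rough illustration of this extraction step, the sketch below pulls a few of these fields out of OCR'd text with regular expressions. The tag format, the `Serial No` label, and the sample sentence are assumptions made for the example, not the patterns used in practice; real documents typically need patterns tuned per site and per vendor.

```python
import re

# Hypothetical patterns; real tag and serial formats vary by site and vendor.
TAG_RE = re.compile(r"\b[A-Z]{1,3}-\d{3,5}[A-Z]?\b")
SERIAL_RE = re.compile(r"Serial\s*(?:No\.?|Number)\s*[:#]?\s*([A-Z0-9-]+)", re.IGNORECASE)
PRESSURE_RE = re.compile(r"(\d+(?:\.\d+)?)\s*(bar|MPa|psi)\b", re.IGNORECASE)


def extract_fields(ocr_text: str) -> dict:
    """Return tags, serial numbers, and pressure readings found in OCR text."""
    return {
        "tags": sorted(set(TAG_RE.findall(ocr_text))),
        "serials": SERIAL_RE.findall(ocr_text),
        "pressures": [(float(v), u) for v, u in PRESSURE_RE.findall(ocr_text)],
    }


sample = "Pump P-1001A, Serial No: 48213X. Design pressure 16 bar."
print(extract_fields(sample))
```

Pattern matching of this kind only covers well-formed text; misread characters and unusual layouts are what the cleansing stage has to catch.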
This raw extraction is then passed through our cleansing and normalization engine, which performs multiple transformations. These include normalization of units (e.g., bar vs. MPa), resolution of duplicated or inconsistent values (e.g., ‘ABB’ vs. ‘Asea Brown Boveri’), semantic mapping to classification systems, detection of incomplete records, and intelligent suggestions for corrections based on data patterns. Unlike rule-only ETL tools, our approach incorporates a balanced mix of rules, ML models, selective LLM support for low-confidence cases, and human-in-the-loop feedback from engineers and ERP consultants to continuously improve accuracy.
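Three of these transformations (unit normalization, alias resolution, and detection of incomplete records) can be sketched in a few lines. This is a minimal example under stated assumptions: the alias table, the list of required fields, and the record layout are hypothetical, and a production engine would draw them from curated reference data.

```python
# Conversion factors to bar (1 MPa = 10 bar exactly; the psi factor is rounded).
TO_BAR = {"bar": 1.0, "mpa": 10.0, "kpa": 0.01, "psi": 0.0689476}

# Hypothetical alias table; in practice this would be curated by engineers.
MANUFACTURER_ALIASES = {"asea brown boveri": "ABB", "abb": "ABB"}

# Hypothetical set of fields a record must carry to count as complete.
REQUIRED_FIELDS = ("tag", "manufacturer", "pressure")


def to_bar(value: float, unit: str) -> float:
    """Convert a pressure value to bar; raises KeyError for unknown units."""
    return value * TO_BAR[unit.strip().lower()]


def canonical_manufacturer(name: str) -> str:
    """Map known spelling variants to one canonical name, else return trimmed input."""
    key = " ".join(name.lower().split())
    return MANUFACTURER_ALIASES.get(key, name.strip())


def normalize(record: dict) -> dict:
    """Return a cleaned copy of the record plus a list of missing required fields."""
    out = dict(record)
    missing = [f for f in REQUIRED_FIELDS if not record.get(f)]
    if record.get("manufacturer"):
        out["manufacturer"] = canonical_manufacturer(record["manufacturer"])
    if record.get("pressure"):
        value, unit = record["pressure"]
        out["pressure"] = (round(to_bar(value, unit), 3), "bar")
    out["missing"] = missing
    return out


raw = {"tag": "P-1001A", "manufacturer": "Asea  Brown Boveri", "pressure": (1.6, "MPa")}
print(normalize(raw))
```

Keeping the missing-field list on the record, rather than rejecting the record outright, lets incomplete entries flow on to review instead of being silently dropped.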

Structuring, Classification, and Standards Alignment
Once the data is extracted and cleaned, the next critical step is structuring. We apply KEEL’s structured methodology to align data with industry-specific classification systems and technical standards, including:
- ISO 14224: for equipment taxonomy, failure codes, and reliability and maintenance data in the petroleum, petrochemical, and natural gas industries
- ISO 81346 / RDS / RDS-PP: for structuring of technical objects and functional locations
- KKS (Kraftwerk-Kennzeichensystem): for power plant-specific object coding
- SFI Group System: for the maritime and offshore sectors
- IEC 62424 and ISO 15926: for process industry data modelling
Our engineers map each asset or component to the appropriate hierarchical level, ensuring relationships between systems, sub-systems, equipment, and spare parts are preserved. We define functional locations, equipment records, and bills of materials in a format directly compatible with the target ERP/CMMS system, whether that is SAP PM, IBM Maximo, IFS, or another modern EAM platform. Wherever applicable, we generate classification structures, populate class types, and assign characteristics for advanced filtering, data segmentation, and digital twin readiness.
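To make the idea of a preserved hierarchy concrete, here is a minimal sketch that builds a tree of functional locations from flat records. It assumes a simple dash-separated coding scheme in which every prefix of a code is its parent; the codes, the `Node` class, and the helper names are hypothetical and do not represent any particular standard or ERP data model.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    code: str
    children: dict = field(default_factory=dict)
    equipment: list = field(default_factory=list)


def build_hierarchy(records: list[tuple[str, str]], sep: str = "-") -> Node:
    """Build a tree from (functional_location, equipment_tag) pairs.

    Assumes each separator-delimited prefix of a location code is its parent.
    """
    root = Node(code="")
    for location, tag in records:
        node = root
        parts = location.split(sep)
        for depth in range(1, len(parts) + 1):
            code = sep.join(parts[:depth])
            # Create intermediate levels on demand so no parent is ever missing.
            node = node.children.setdefault(code, Node(code=code))
        node.equipment.append(tag)
    return root


def paths(node: Node) -> list[str]:
    """List every functional location code in depth-first order."""
    result = []
    for child in node.children.values():
        result.append(child.code)
        result.extend(paths(child))
    return result


tree = build_hierarchy([
    ("PLT1-SYS10-PMP01", "P-1001A"),
    ("PLT1-SYS10-PMP02", "P-1001B"),
    ("PLT1-SYS20", "V-200"),
])
print(paths(tree))
```

Creating parents on demand means a record for a deeply nested location never ends up orphaned, which is one simple way to keep system-to-equipment relationships intact before upload.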
We also work closely with reliability engineers and SMEs to validate structured outputs. This includes reviewing logic behind preventive maintenance plans, mapping RBI results to inspection strategies, and assigning MTBF/MTTR estimates where historical data is available. Where preventive maintenance content is derived from OEM documentation, we structure it into task lists with frequencies, durations, required materials/tools, and personnel qualifications — all of which can be uploaded directly into SAP task lists or maintenance plans.
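For the MTBF/MTTR estimates mentioned above, one common simplified calculation divides uptime and downtime by the number of recorded failures. The sketch below assumes a single asset, a known observation window, and one downtime figure per failure; the function name and inputs are illustrative, and real reliability studies often use more detailed time definitions.

```python
def mtbf_mttr(observed_hours: float, repair_hours: list[float]) -> tuple[float, float]:
    """Estimate MTBF and MTTR from one asset's failure history.

    observed_hours: total calendar hours in the observation window.
    repair_hours: downtime of each recorded failure, in hours.
    """
    failures = len(repair_hours)
    if failures == 0:
        raise ValueError("no failures recorded; MTBF cannot be estimated")
    downtime = sum(repair_hours)
    mtbf = (observed_hours - downtime) / failures  # mean uptime per failure
    mttr = downtime / failures                     # mean downtime per failure
    return mtbf, mttr


# One year of observation with three failures of 10, 20, and 30 hours.
print(mtbf_mttr(8760, [10, 20, 30]))
```

With only a handful of failures the estimates are rough, which is why they are assigned only where historical data is available.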
Verified and Repeatable Process Flow
KEEL’s approach is built around a repeatable and scalable process that has been tested across multiple industries. The typical phases include:
- Document Collection – Gather handover, operational, and maintenance documents; index and classify them
- OCR Extraction – Automatically extract structured content (tags, specs, parts, maintenance tasks)
- Cleansing & Normalization – Apply rules and ML; use LLMs selectively for messy edge cases or suggestions
- Structuring & Classification – Apply industry taxonomies, build asset hierarchies, validate functional logic
- ERP/CMMS Integration – Prepare upload templates, perform technical loads, verify in QA systems
- Continuous Improvement – Analyze user feedback, track data usage, enhance models for future uploads
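The selective use of LLMs in this flow can be pictured as confidence-based routing: high-confidence values pass straight through, mid-confidence values get an LLM suggestion, and everything else goes to an engineer. The thresholds, status labels, and the `llm_suggest` callback below are hypothetical placeholders for illustration, not a description of KEEL's implementation.

```python
from typing import Callable, Optional

# Hypothetical thresholds; real values would be tuned against reviewed samples.
AUTO_ACCEPT = 0.90
LLM_RETRY = 0.60


def route(value: str, confidence: float,
          llm_suggest: Optional[Callable[[str], tuple[str, float]]] = None) -> dict:
    """Decide how an extracted value proceeds through the pipeline."""
    if confidence >= AUTO_ACCEPT:
        return {"value": value, "status": "accepted", "source": "rules_ml"}
    if confidence >= LLM_RETRY and llm_suggest is not None:
        suggestion, llm_conf = llm_suggest(value)
        if llm_conf >= AUTO_ACCEPT:
            return {"value": suggestion, "status": "accepted", "source": "llm"}
    # Anything still uncertain is queued for human review.
    return {"value": value, "status": "needs_review", "source": "human"}


def fake_llm(value: str) -> tuple[str, float]:
    """Stand-in for an LLM call; always proposes 'ABB' with high confidence."""
    return "ABB", 0.97


print(route("A8B", 0.70, fake_llm))
```

Reviewed decisions from the `needs_review` queue are the natural source of feedback for the continuous improvement phase.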

This process supports both greenfield (CAPEX) projects where asset data is first being introduced into the system, and brownfield (existing plant) scenarios where legacy information needs to be cleaned, enriched, and re-integrated without disrupting operations.
Real-World Impact: 3 TB Transformed into ERP-Ready Gold
In a recent engagement with a multinational industrial client, KEEL SOLUTION was tasked with processing over 3 terabytes of legacy construction and commissioning data, consisting of more than 500,000 documents. Using our methodology, we:
- Extracted and structured data for 120,000 equipment assets, including rotating, static, and instrument categories
- Identified and registered 170,000 unique spare parts, mapped to asset-specific BoMs
- Created 40,000 preventive maintenance tasks, aligned with OEM recommendations and localized work instructions
- Delivered a fully populated SAP asset register and maintenance model, integrated with document links to MRBs, TDSs, and certificates
- Enabled operational readiness from Day One of go-live — with full traceability and cross-referenced metadata
The results included a 30% improvement in master data accuracy, a 40% reduction in implementation time and cost, and a significant reduction in manual effort during ERP rollout.
Beyond the Upload: The Strategic Value of Structured Data
The benefit of structured asset data doesn’t end at go-live. Once legacy information is digitized and validated, it becomes a foundation for strategic decision-making and continuous improvement. Organizations gain:
- Reliable inputs for predictive analytics and AI-driven decision support
- Faster and more accurate spare parts procurement
- Automated alerts for asset health and maintenance intervals
- Consistent technical documentation and easier audit readiness
- Full traceability across the asset lifecycle, from engineering to decommissioning
This aligns directly with the principles of ISO 55000 — emphasizing whole-of-life asset value, risk-based decision making, and data-driven governance.
Expertise-Driven Transformation by Keel Solution
Advanced tools like OCR, ML, and LLMs are enablers, not silver bullets.
What delivers long-term impact is how they are applied — with a deep understanding of engineering logic, asset management standards, and real-world maintenance processes. At KEEL SOLUTION, we don’t just extract data — we transform it into structure, strategy, and value.
Whether you are digitizing decades of engineering documents or preparing for a large-scale ERP rollout, our methodology ensures your data becomes an enabler — not a blocker — of operational excellence.
Let’s talk about your legacy data challenge and turn it into a competitive advantage. Contact us.