Turning Legacy Data into Business Value

Written by Igor Pyl

In asset-intensive industries, digital transformation is not simply a question of installing new systems – it’s about unlocking and structuring the enormous volume of existing technical and operational information that organizations already have. Much of this data, however, remains trapped in unstructured and semi-structured formats: scanned PDF documents, technical drawings, vendor documentation, inspection reports, construction dossiers, and historical maintenance records. These sources often contain high-value data essential for asset hierarchy design, maintenance planning, inventory optimization, and long-term lifecycle cost control – yet they are inaccessible to traditional ERP or CMMS systems without systematic processing and transformation.

This is where KEEL SOLUTION delivers value through a hybrid methodology combining Optical Character Recognition (OCR), Machine Learning (ML), selective use of Large Language Models (LLMs), and deep domain expertise.

AI is not a silver bullet. It is a powerful enabler that, when paired with standards and expertise, enhances productivity, improves accuracy, and drives efficiency.

From PDFs to Intelligence: Unlocking Hidden Data with OCR, ML & LLM

The foundation of our approach begins with the digital capture of legacy documentation. This includes MRB (Manufacturer Record Books), TDS (Technical Data Sheets), VDB (Vendor Data Books), certificates of conformity, construction handover files, RBI (Risk-Based Inspection) outputs, inspection checklists, and maintenance logbooks. These files are often delivered as scanned PDFs with no metadata or indexing, making them invisible to transactional systems. Using OCR, we extract key technical elements such as:

  • Asset identification (tags, serial numbers, location codes)
  • Technical characteristics (dimensions, materials, voltage, pressure, power class, operating ranges)
  • Manufacturer data (name, part/model numbers, datasheet versions)
  • Equipment BoMs (bill of materials with part-level attributes)
  • OEM-specified preventive maintenance procedures
  • Historical consumption and repair records
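
As a minimal sketch of how such elements might be pulled out of OCR text, the Python snippet below uses regular expressions. The tag format, serial-number labels, and sample line are hypothetical assumptions for illustration; real formats vary by plant and vendor, and this is not KEEL's actual extraction logic.

```python
import re

# Hypothetical patterns: real tag and serial formats differ per site and vendor.
TAG_PATTERN = re.compile(r"\b\d{2}-[A-Z]{1,3}-\d{3,4}[A-Z]?\b")
SERIAL_PATTERN = re.compile(
    r"(?:Serial\s*(?:No\.?|Number)|S/N)\s*[:#]?\s*([A-Z0-9-]{4,})",
    re.IGNORECASE,
)
PRESSURE_PATTERN = re.compile(r"(\d+(?:\.\d+)?)\s*(bar|MPa|kPa|psi)\b", re.IGNORECASE)


def extract_fields(ocr_text: str) -> dict:
    """Return candidate tags, serial numbers, and pressure values found in OCR text."""
    return {
        "tags": sorted(set(TAG_PATTERN.findall(ocr_text))),
        "serials": [m.group(1) for m in SERIAL_PATTERN.finditer(ocr_text)],
        "pressures": [(float(v), u) for v, u in PRESSURE_PATTERN.findall(ocr_text)],
    }


# Hypothetical line of OCR output from a scanned datasheet.
sample = "Pump 20-P-101A  Serial No: KX-48213  Design pressure 16 bar"
fields = extract_fields(sample)
print(fields)
```

Pattern matching of this kind only produces candidates; values still need the cleansing and validation steps described in this article before they are loaded anywhere.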

This raw extraction is then passed through our cleansing and normalization engine, which performs multiple transformations. These include normalization of units (e.g., bar vs. MPa), resolution of duplicated or inconsistent values (e.g., ‘ABB’ vs. ‘Asea Brown Boveri’), semantic mapping to classification systems, detection of incomplete records, and intelligent suggestions for corrections based on data patterns. Unlike rule-only ETL tools, our approach incorporates a balanced mix of rules, ML models, selective LLM support for low-confidence cases, and human-in-the-loop feedback from engineers and ERP consultants to continuously improve accuracy.
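
Two of these transformations, unit normalization and resolution of inconsistent manufacturer names, can be illustrated with a short Python sketch. The alias table, the `Normalized` type, and the choice of MPa as the target unit are assumptions made for the example, not a description of the actual engine; unknown values are flagged for review rather than guessed.

```python
from dataclasses import dataclass

# Conversion factors to MPa (the psi factor is rounded).
TO_MPA = {"mpa": 1.0, "bar": 0.1, "kpa": 0.001, "psi": 0.00689476}

# Hypothetical alias table; in practice it would be curated by engineers.
MANUFACTURER_ALIASES = {
    "abb": "ABB",
    "asea brown boveri": "ABB",
}


@dataclass
class Normalized:
    value: object
    needs_review: bool  # True when no rule applied and a person should check


def normalize_pressure(value: float, unit: str) -> Normalized:
    """Convert a pressure reading to MPa, or flag it if the unit is unknown."""
    factor = TO_MPA.get(unit.strip().lower())
    if factor is None:
        return Normalized(None, True)
    return Normalized(round(value * factor, 6), False)


def normalize_manufacturer(raw: str) -> Normalized:
    """Map a manufacturer spelling to its canonical name, or flag it."""
    key = " ".join(raw.lower().replace(".", "").split())
    canonical = MANUFACTURER_ALIASES.get(key)
    if canonical is None:
        return Normalized(raw.strip(), True)
    return Normalized(canonical, False)


print(normalize_pressure(16, "bar"))
print(normalize_manufacturer("Asea  Brown Boveri"))
print(normalize_manufacturer("Unknown Vendor"))
```

The `needs_review` flag is where ML models, an LLM suggestion, or an engineer could step in for the cases the rules do not cover.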

Structuring, Classification, and Standards Alignment

Once the data is extracted and cleaned, the next critical step is structuring. We apply KEEL’s structured methodology to align data with industry-specific classification systems and technical standards, including:

  • ISO 14224: for failure codes, equipment taxonomy, and maintenance data for the petroleum and natural gas industries
  • ISO 81346 / RDS / RDS-PP: for the structuring of technical objects and functional locations
  • KKS (Kraftwerk-Kennzeichen-System): for power plant-specific object coding
  • SFI (SFI Group System): for the maritime and offshore sectors
  • IEC 62424 and ISO 15926: for process industry data modelling

Our engineers map each asset or component to the appropriate hierarchical level, ensuring relationships between systems, sub-systems, equipment, and spare parts are preserved. We define functional locations, equipment records, and bill of materials in a format directly compatible with the target ERP/CMMS system, whether that’s SAP PM, IBM Maximo, IFS, or any modern EAM platform. Wherever applicable, we generate classification structures, populate class types, and assign characteristics for advanced filtering, data segmentation, and digital twin readiness.
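
To illustrate what preserving these relationships means in practice, the following Python sketch builds parent-child links from flat records and reports objects whose parent is missing. The record layout and identifiers are hypothetical; a real target system such as SAP PM or IBM Maximo has its own object model and load formats.

```python
from collections import defaultdict

# Hypothetical flat records: (object_id, parent_id, level).
records = [
    ("PLANT-A", None, "system"),
    ("PLANT-A.COOLING", "PLANT-A", "sub-system"),
    ("20-P-101A", "PLANT-A.COOLING", "equipment"),
    ("SEAL-KIT-77", "20-P-101A", "spare part"),
    ("20-P-102B", "PLANT-A.HEATING", "equipment"),  # parent missing on purpose
]


def build_hierarchy(rows):
    """Return (children-by-parent mapping, list of objects with unknown parents)."""
    known = {obj_id for obj_id, _, _ in rows}
    children = defaultdict(list)
    orphans = []
    for obj_id, parent_id, _level in rows:
        if parent_id is None:
            continue  # top-level object
        if parent_id in known:
            children[parent_id].append(obj_id)
        else:
            orphans.append(obj_id)
    return dict(children), orphans


def path_to_root(obj_id, rows):
    """Walk upward from an object to the top of the hierarchy."""
    parent_of = {o: p for o, p, _ in rows}
    path = [obj_id]
    parent = parent_of.get(obj_id)
    while parent is not None and parent not in path:  # guard against cycles
        path.append(parent)
        parent = parent_of.get(parent)
    return path


children, orphans = build_hierarchy(records)
print(children)
print("Orphans:", orphans)
print(path_to_root("SEAL-KIT-77", records))
```

Orphan detection of this kind is one simple way to catch broken relationships before a load, rather than after it.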

We also work closely with reliability engineers and SMEs to validate structured outputs. This includes reviewing logic behind preventive maintenance plans, mapping RBI results to inspection strategies, and assigning MTBF/MTTR estimates where historical data is available. Where preventive maintenance content is derived from OEM documentation, we structure it into task lists with frequencies, durations, required materials/tools, and personnel qualifications, all of which can be uploaded directly into the maintenance system.
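
For the MTBF/MTTR estimates mentioned above, a basic calculation from historical records can be sketched as follows. It assumes a single repairable asset, a known observation window, and that all recorded downtime is repair time; the figures are invented for the example.

```python
from typing import Optional, Sequence, Tuple


def mtbf_mttr(
    observation_hours: float, repair_hours: Sequence[float]
) -> Tuple[Optional[float], Optional[float]]:
    """Estimate MTBF and MTTR in hours.

    MTBF = operating time / number of failures
    MTTR = total repair time / number of failures
    Returns (None, None) when no failures were recorded.
    """
    failures = len(repair_hours)
    if failures == 0:
        return None, None
    downtime = sum(repair_hours)
    if downtime > observation_hours:
        raise ValueError("repair time exceeds the observation window")
    uptime = observation_hours - downtime
    return uptime / failures, downtime / failures


# Invented example: one year of observation with three repairs.
mtbf, mttr = mtbf_mttr(8760.0, [12.0, 30.0, 18.0])
print(f"MTBF: {mtbf:.0f} h, MTTR: {mttr:.0f} h")
```

With only a handful of failures the estimates are rough, which is one reason the text limits them to cases where historical data is available.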

Verified and Repeatable Process Flow

KEEL’s approach is built around a repeatable and scalable process that has been tested across multiple industries. The typical phases include:

  1. Document Collection – Gather handover, operational, and maintenance documents; index and classify them
  2. OCR Extraction – Automatically extract structured content (tags, specs, parts, maintenance tasks)
  3. Cleansing & Normalization – Apply rules and ML; use LLMs selectively for messy edge cases or suggestions
  4. Structuring & Classification – Apply industry taxonomies, build asset hierarchies, and validate functional logic
  5. ERP/CMMS Integration – Prepare upload templates, perform technical loads, and verify in QA systems
  6. Continuous Improvement – Analyze user feedback, track data usage, and enhance models for future uploads
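
The phases above can be pictured as a chain of stages in which low-confidence records are diverted for review instead of being loaded. The Python sketch below is illustrative only: the `Record` type, the stage functions, the placeholder classification rule, and the 0.8 threshold are all assumptions, not KEEL's implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Record:
    data: dict
    confidence: float = 1.0
    history: List[str] = field(default_factory=list)  # stages applied, for traceability


Stage = Callable[[Record], Record]


def cleanse(rec: Record) -> Record:
    # Rule-based step: trim whitespace; lower confidence when the tag is missing.
    rec.data = {k: v.strip() if isinstance(v, str) else v for k, v in rec.data.items()}
    if not rec.data.get("tag"):
        rec.confidence = min(rec.confidence, 0.4)
    return rec


def classify(rec: Record) -> Record:
    # Placeholder rule: a "-P-" segment in the tag is treated as a pump.
    tag = rec.data.get("tag") or ""
    rec.data["class"] = "PUMP" if "-P-" in tag else "UNCLASSIFIED"
    if rec.data["class"] == "UNCLASSIFIED":
        rec.confidence = min(rec.confidence, 0.5)
    return rec


def run_pipeline(records: List[Record], stages: List[Stage], threshold: float = 0.8):
    """Apply each stage in order; split results into load-ready and review queues."""
    ready, review = [], []
    for rec in records:
        for stage in stages:
            rec = stage(rec)
            rec.history.append(stage.__name__)
        (ready if rec.confidence >= threshold else review).append(rec)
    return ready, review


ready, review = run_pipeline(
    [Record({"tag": " 20-P-101A "}), Record({"tag": ""})],
    [cleanse, classify],
)
print(len(ready), "ready;", len(review), "for review")
```

The review queue is the natural place for selective LLM support or engineer feedback, and corrections made there can be fed back into the rules for later uploads.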

This process supports both greenfield (CAPEX) projects, where asset data is first being introduced into the system, and brownfield (existing plant) scenarios, where legacy information needs to be cleaned, enriched, and re-integrated without disrupting operations.

Real-World Impact: 3TB Transformed into ERP-Ready Gold

In a recent engagement with a multinational industrial client, KEEL SOLUTION was tasked with processing over 3 terabytes of legacy construction and commissioning data, consisting of more than 500,000 documents. Using our methodology, we:

  • Extracted and structured data for 120,000 equipment assets, including rotating, static, and instrument categories
  • Identified and registered 170,000 unique spare parts, mapped to asset-specific BoMs
  • Created 40,000 preventive maintenance tasks, aligned with OEM recommendations, and localized work instructions
  • Delivered a fully populated SAP asset register and maintenance model, integrated with document links to MRBs, TDSs, and certificates
  • Enabled operational readiness from Day One of go-live with full traceability and cross-referenced metadata

The results included a 30% improvement in master data accuracy, a 40% reduction in implementation time and cost, and a significant reduction in manual effort during the ERP rollout.

Beyond the Upload: The Strategic Value of Structured Data

The benefit of structured asset data doesn’t end at go-live. Once legacy information is digitized and validated, it becomes a foundation for strategic decision-making and continuous improvement. Organizations gain:

  • Reliable inputs for predictive analytics and AI-driven decision support
  • Faster and more accurate spare parts procurement
  • Automated alerts for asset health and maintenance intervals
  • Consistent technical documentation and easier audit readiness
  • Full traceability across the asset lifecycle, from engineering to decommissioning

This aligns directly with the principles of ISO 55000, emphasizing whole-of-life asset value, risk-based decision making, and data-driven governance.

Expertise-Driven Digital Transformation by Keel Solution

Advanced tools like OCR, ML, and LLMs are enablers, not silver bullets. What delivers long-term impact is how they are applied – with a deep understanding of engineering logic, asset management standards, and real-world maintenance processes. At KEEL SOLUTION, we don’t just extract data – we transform it into structure, strategy, and value.

Whether you are digitizing decades of engineering documents or preparing for a large-scale ERP rollout, our methodology ensures your data becomes an enabler of operational excellence, not a blocker.

Let’s talk about your legacy data challenge and turn it into a competitive advantage. Contact us.
