Source: www.indiumsoftware.com
SUCCESS STORY
Nested Tables & Machine Drawing Text Extraction For An Oil & Gas Company

DOMAIN
Oil & Gas Industry

TECHNOLOGIES
The solution was built using Python and several of its libraries.
OCR: Tesseract, Tesserocr, OCRmyPDF, PyTesseract
Preprocessing and Post Processing Tools: xPDF, Poppler, OpenCV, Pandas, JSON
Table Detection and Extraction: Camelot, OpenCV, LSD (line segment detection), csv, TensorFlow, FCN (Fully Convolutional Networks), CNN (Convolutional Neural Networks)
Application Deployment: Flask, Docker

KEY HIGHLIGHTS
4x faster automated text extraction using teX.ai.
The need for human intervention was reduced by over 80%.
The quality of their process increased by over 75%.

CUSTOMER BACKGROUND
The Client is one of the pioneers in the oil and gas business, with a focus on innovation to find ways to help their customers fuel progress in agriculture, industry, medicine, science, space, technology, and transportation. The combination of engineering disciplines, computer science, geophysics, and metallurgy helps create a winning formula for all stakeholders in such projects.

BUSINESS REQUIREMENTS
Given the document-intensive nature of the business, the client had to deal with numerous PDF documents containing diagrams of complex drilling machine parts and data in nested tables and various other formats. Their requirement was to extract the data and save it in a format that could facilitate further analysis downstream.

CHALLENGES
The client had hundreds of PDF documents, each ranging from 2 to 100 pages. In some cases, the required data was not present on every page of a document.
There were 5 different document formats, consisting of engineering drawings, nested tables, un-demarcated tables, etc. This required a separate model for each document format.

OBJECTIVE
To leverage teX.ai to automate the text extraction process, with an accuracy target of over 80% and taking less than 50% of the time currently required.

SOLUTION OVERVIEW
Quality File Validation
Extract the chemical composition file and convert it to key-value pairs. These chemical composition PDFs are 10 pages long.
Survey Files
Automatically identify survey tables in multi-page documents, then extract them.
Well Schematics
Identify and extract the nested tables as separate entities. These documents combined nested tables with drawings of complex drilling equipment.

APPROACH & IMPLEMENTATION
teX.ai was leveraged to process text for all 3 use cases.
Quality File Validation
The Analysis table, which contained the chemical composition details, was identified in the document and extracted using OCR. Extraction takes just a few seconds, with accuracy of more than 85%.
Public Files (Surveys)
The survey tables were first isolated using a keyword search over the OCR output. Survey details were then extracted using techniques such as Tabula or Camelot.
Well Schematics
All the nested tables were extracted as separate tables and saved in CSV format. The nested tables are extracted in 2 stages, using an FCN model in stage 1 and OpenCV in stage 2 to detect the rows in each table.
Deployment
Once the AI models were built and the required accuracy and performance tuning were complete, Indium deployed teX.ai with an admin interface built using Flask and containerized using Docker.

BUSINESS IMPACT
4x faster text extraction from the source documents, by leveraging teX.ai in the automated process flow.
The need for human intervention was reduced by over 80%.
The quality of their process increased by over 75%.

© 2022 All Rights Reserved

About Indium
Indium is a Digital Engineering Services leader and Full Spectrum Integrator that helps customers embrace and navigate the Cloud-native world with Certainty.
With deep expertise across Applications, Data & Analytics, AI, DevOps, Security and Digital Assurance, we “Make technology work” and accelerate business value, while adding scale and velocity to customers’ digital journey on AWS.

Make Technology Work

USA: Cupertino | Princeton. Toll-free: +1-888-207-5969
INDIA: Chennai | Bengaluru | Mumbai. Toll-free: 1800-123-1191
UK: London. Ph: +441420300014
SINGAPORE: Singapore. Ph: +6568127888

https://www.indiumsoftware.com
For Sales Inquiries: sales@indiumsoftware.com
For General Inquiries: info@indiumsoftware.com
https://www.facebook.com/indiumsoftware/
https://twitter.com/indiumsoft?lang=en
https://www.linkedin.com/company/indiumsoftware/?originalSubdomain=in
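As an illustration of the survey use case described above, where survey tables are first isolated by a keyword search over OCR output and then handed to a table extractor, the following is a minimal sketch rather than the teX.ai implementation. It assumes the OCR text of each page is already available as a string. The keyword list, the `min_hits` threshold, and the function names `find_survey_pages` and `to_page_spec` are hypothetical; the source does not state which keywords or rules were actually used.

```python
import re
from typing import Iterable

# Hypothetical keywords: the case study does not list the terms actually searched for.
SURVEY_KEYWORDS = ("survey", "measured depth", "inclination", "azimuth")


def find_survey_pages(
    page_texts: Iterable[str],
    keywords: tuple = SURVEY_KEYWORDS,
    min_hits: int = 2,
) -> list[int]:
    """Return 1-based numbers of pages whose OCR text contains
    at least `min_hits` distinct keywords."""
    pages = []
    for number, text in enumerate(page_texts, start=1):
        # Collapse whitespace and lowercase, since OCR output often
        # contains stray line breaks and inconsistent casing.
        normalized = re.sub(r"\s+", " ", text).lower()
        hits = sum(1 for keyword in keywords if keyword in normalized)
        if hits >= min_hits:
            pages.append(number)
    return pages


def to_page_spec(pages: list[int]) -> str:
    """Join page numbers into a comma-separated string such as "2,5"."""
    return ",".join(str(page) for page in pages)


# Made-up OCR text for a three-page document.
ocr_pages = [
    "Well summary and general notes",
    "DIRECTIONAL SURVEY\nMeasured Depth  Inclination  Azimuth\n1000  2.5  145.0",
    "Casing and cement details",
]
survey_pages = find_survey_pages(ocr_pages)
print(survey_pages, to_page_spec(survey_pages))
```

The comma-separated string has the shape that Camelot's `read_pdf` accepts in its `pages` argument, so only the pages flagged by the keyword search would need to be passed to the table extractor. Requiring two distinct keywords instead of one is a design choice to reduce false positives from pages that mention a survey only in passing.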