SUCCESS STORY
Nested Tables & Machine Drawing Text Extraction For An Oil & Gas Company

DOMAIN
  Oil & Gas Industry

KEY HIGHLIGHTS
  4x faster automated text extraction using teX.ai.
  The need for human intervention was reduced by over 80%.
  The quality of their process increased by over 75%.

TECHNOLOGIES
The solution was built leveraging Python and several of its libraries.
  OCR: Tesseract, Tesserocr, OCRmyPDF, PyTesseract
  Preprocessing and Post Processing Tools: xPDF, Poppler, OpenCV, Pandas, JSON
  Table Detection and Extraction: Camelot, OpenCV, LSD (line segment detection), csv, TensorFlow, FCN (Fully Convolutional Networks), CNN (Convolutional Neural Networks)
  Application Deployment: Flask, Docker

CUSTOMER BACKGROUND
The client is one of the pioneers in the oil and gas business, with a focus on innovation to find ways to help their customers fuel progress in agriculture, industry, medicine, science, space, technology, and transportation. The combination of engineering disciplines, computer science, geophysics, and metallurgy helps create a winning formula for all stakeholders in such projects.

BUSINESS REQUIREMENTS
Given the document-intensive nature of the business, the client generally had to deal with numerous PDF documents containing complex drilling machine part diagrams, data in nested tables, and various other formats. Their requirement was to extract the data and save it in a format that could facilitate further analysis downstream.

CHALLENGES
  The client had hundreds of PDF documents, each ranging from 2 to 100 pages. In some cases, the required data was present on only some of the pages.
  There were 5 different document formats, consisting of engineering drawings, nested tables, un-demarcated tables, etc. This required a separate model for each document format.

OBJECTIVE
To leverage teX.ai to automate the text extraction process, with an accuracy target of over 80% and less than 50% of the time currently taken.

SOLUTION OVERVIEW
  Quality File Validation
    Extraction of the chemical composition file and its conversion into key-value pairs. These chemical composition PDFs are 10 pages long.
  Survey Files
    Automatic identification of survey tables in multi-page documents, followed by extraction.
  Well Schematics
    Identify and extract the nested tables as separate entities. These documents had a combination of nested tables and complex drilling equipment drawings.

APPROACH & IMPLEMENTATION
teX.ai was leveraged to process text for all 3 use cases.
  Quality File Validation
    The Analysis table, which contained the chemical composition details, was identified in the document and extracted using OCR.
    Extraction takes just a few seconds, with accuracy above 85%.
  Public Files (Surveys)
    The survey tables were first isolated by keyword search, leveraging OCR.
    Survey details were then extracted using tools such as Tabula or Camelot.
  Well Schematics
    All the nested tables were extracted as separate tables and saved in CSV format.
    The nested tables were extracted in 2 stages: an FCN model in stage 1, and OpenCV in the next stage to detect the rows in each table.
  Deployment
    Once the AI models were built and the required accuracy and performance tuning was complete, Indium deployed teX.ai with an admin interface built using Flask and containerized using Docker.

BUSINESS IMPACT
  4x faster text extraction from the source documents by leveraging teX.ai in the automated process flow.
  The need for human intervention was reduced by over 80%.
  The quality of their process increased by over 75%.
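Under Quality File Validation, the Analysis table is extracted with OCR and converted into key-value pairs. The case study does not describe that conversion, so the following is only a minimal sketch. It assumes OCR has already produced the table's header row (element symbols) and value row as plain text lines. The function name `analysis_table_to_dict`, the sample values, and the cleanup rules are illustrative assumptions, not teX.ai internals.

```python
import json


def analysis_table_to_dict(header_line, value_line):
    """Pair each element symbol in an OCR'd header row with the value beneath it."""
    elements = header_line.split()
    values = value_line.split()
    if len(elements) != len(values):
        # A dropped or merged cell means the columns no longer line up;
        # flag the row for review rather than guess.
        raise ValueError(f"{len(elements)} headers but {len(values)} values")
    composition = {}
    for element, raw in zip(elements, values):
        # Hypothetical cleanup rules: letter 'O' misread for digit '0',
        # and a comma used as the decimal separator.
        cleaned = raw.replace("O", "0").replace(",", ".")
        composition[element] = float(cleaned)
    return composition


# Made-up OCR output for one Analysis table row.
result = analysis_table_to_dict("C Si Mn P S", "0.21 0.25 1.2O 0,012 0.004")
print(json.dumps(result))
# {"C": 0.21, "Si": 0.25, "Mn": 1.2, "P": 0.012, "S": 0.004}
```

Raising on a column-count mismatch, instead of silently zipping, is one way to route doubtful rows to the reduced human review the case study mentions.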
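For Public Files (Surveys), the survey tables were isolated by a keyword search over OCR output before any table extraction. A minimal sketch of that page-filtering step follows. It assumes the per-page OCR text is already available (for example from PyTesseract's `image_to_string`). The keyword list, the `min_hits` threshold, and `find_survey_pages` are hypothetical choices for illustration.

```python
import re

# Hypothetical keyword list; the case study does not say which keywords were used.
SURVEY_KEYWORDS = ("survey", "measured depth", "inclination", "azimuth")


def find_survey_pages(page_texts, keywords=SURVEY_KEYWORDS, min_hits=2):
    """Return 1-based numbers of pages whose OCR text contains at least
    `min_hits` distinct keywords (case-insensitive, whole words only)."""
    patterns = [re.compile(r"\b" + re.escape(k) + r"\b", re.IGNORECASE)
                for k in keywords]
    matches = []
    for page_no, text in enumerate(page_texts, start=1):
        # Count how many distinct keywords appear on this page.
        score = sum(1 for p in patterns if p.search(text))
        if score >= min_hits:
            matches.append(page_no)
    return matches


# Made-up OCR output for a four-page document.
pages = [
    "Well header information and operator details",
    "DIRECTIONAL SURVEY  Measured Depth  Inclination  Azimuth",
    "Casing program and cement details",
    "Survey (continued) measured depth 1200 ft inclination 3.5",
]
print(find_survey_pages(pages))  # [2, 4]
```

Only the matching pages would then be handed to a table extractor such as Camelot (whose `read_pdf` takes a `pages` string like `"2,4"`) or Tabula. This addresses the challenge that the required data appeared on only some pages.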
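For Well Schematics, stage 2 used OpenCV to detect the rows in each table, but the case study gives no details. A common OpenCV approach isolates horizontal ruling lines with a wide rectangular kernel (`cv2.getStructuringElement` with `cv2.MORPH_RECT`, then `cv2.morphologyEx` with `cv2.MORPH_OPEN`). To keep this sketch dependency-free, the code below swaps in a plain horizontal projection profile on a binarized image. It assumes stage 1 has already cropped the table region and that rows are separated by horizontal ruling lines. `find_row_bands` and its `min_fill` threshold are hypothetical.

```python
def find_row_bands(binary, min_fill=0.8):
    """Return (top, bottom) pixel ranges of table rows in a binarized table crop.

    binary: list of pixel rows, each a list of 0/1 values (1 = ink).
    A pixel row counts as part of a horizontal ruling line when at least
    `min_fill` of its pixels are ink; table rows are the gaps between lines.
    """
    width = len(binary[0]) if binary else 0
    # Horizontal projection profile: is each pixel row mostly ink?
    is_line = [width > 0 and sum(r) / width >= min_fill for r in binary]

    # Merge runs of adjacent line pixel rows into single ruling lines (start, end).
    lines = []
    for y, flag in enumerate(is_line):
        if flag:
            if lines and lines[-1][1] == y - 1:
                lines[-1] = (lines[-1][0], y)
            else:
                lines.append((y, y))

    # The band between the bottom of one line and the top of the next is a row.
    bands = []
    for (_, prev_end), (next_start, _) in zip(lines, lines[1:]):
        if next_start - prev_end > 1:  # ignore lines with no gap between them
            bands.append((prev_end + 1, next_start - 1))
    return bands


# Synthetic 8x8 binarized crop: 1 = ink, 0 = background.
full = [1] * 8
blank = [0] * 8
text = [0, 1, 1, 0, 0, 1, 0, 0]
table = [full, text, blank, full, full, text, text, full]
print(find_row_bands(table))  # [(1, 2), (5, 6)]
```

Merging adjacent line pixels matters because scanned ruling lines are often several pixels thick (pixel rows 3 and 4 above). Otherwise, each extra pixel row would create an empty "row".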
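In the Well Schematics use case, each nested table was saved as a separate CSV file. Below is a minimal sketch using Python's standard `csv` module. It assumes the extracted tables arrive as a mapping from a table name to a list of rows. The function name `save_tables_as_csv`, the naming scheme, and the sample cells are made up for illustration.

```python
import csv
import tempfile
from pathlib import Path


def save_tables_as_csv(tables, out_dir):
    """Write each extracted table to its own CSV file; return the paths written.

    tables: mapping of table name -> list of rows, each row a list of cell strings.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for name, rows in tables.items():
        path = out / f"{name}.csv"
        # newline="" lets the csv module manage line endings and quoting itself.
        with path.open("w", newline="", encoding="utf-8") as fh:
            csv.writer(fh).writerows(rows)
        paths.append(path)
    return paths


# Hypothetical table names and made-up cell values for a parent and a nested table.
nested_tables = {
    "schematic_p1_table1": [["Component", "OD (in)"], ["Casing", "9.625"]],
    "schematic_p1_table1_nested1": [["Depth", "Note"], ["8,450 ft", "Packer"]],
}
with tempfile.TemporaryDirectory() as tmp:
    for p in save_tables_as_csv(nested_tables, tmp):
        print(p.name, p.read_text(encoding="utf-8").splitlines())
```

Opening the file with `newline=""` follows the `csv` module's documented recommendation. It ensures cells containing commas (such as `8,450 ft`) are quoted and read back correctly.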
© 2022 All Rights Reserved

About Indium
Indium is a Digital Engineering Services leader and Full Spectrum Integrator that helps customers embrace and navigate the Cloud-native world with Certainty. With deep expertise across Applications, Data & Analytics, AI, DevOps, Security and Digital Assurance, we “Make technology work” and accelerate business value, while adding scale and velocity to customers’ digital journey on AWS.
Make Technology Work

USA
  Cupertino | Princeton
  Toll-free: +1-888-207-5969

INDIA
  Chennai | Bengaluru | Mumbai
  Toll-free: 1800-123-1191

UK
  London
  Ph: +441420300014

SINGAPORE
  Singapore
  Ph: +6568127888

https://www.indiumsoftware.com
For Sales Inquiries: sales@indiumsoftware.com
For General Inquiries: info@indiumsoftware.com
https://www.facebook.com/indiumsoftware/
https://twitter.com/indiumsoft?lang=en
https://www.linkedin.com/company/indiumsoftware/?originalSubdomain=in