Prerequisite: To read a PDF file, we first need to install the pdf2json library.
Steps to install pdf2json
Prerequisite: Node.js must be installed on the system.
- Open a command prompt and go to your project directory.
- Type npm i pdf2json and press Enter.
Once the library is installed, npm prints a summary of the added packages in the command prompt.
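A quick way to confirm the installation from Node.js itself is to try resolving the package. The snippet below is only a minimal sketch: isInstalled is a hypothetical helper name (not part of pdf2json), and it relies solely on Node's built-in require.resolve. It assumes the script is run from the same directory in which npm i pdf2json was executed.

```javascript
// Hypothetical helper: reports whether a package can be resolved
// from the directory this script runs in.
function isInstalled(packageName) {
  try {
    // require.resolve throws if the package cannot be found.
    require.resolve(packageName);
    return true;
  } catch (err) {
    return false;
  }
}

console.log("pdf2json installed:", isInstalled("pdf2json"));
```

If this prints false, the install step most likely ran in a different directory.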
Below is the code that shows how to read a PDF file.
function readPDFFile(pdfFilePath) {
  // Load the pdf2json library, which provides the methods for reading a PDF file.
  var PDFParser = require("pdf2json");
  // Load Node's file system module.
  var fs = require("fs");
  // Create a PDFParser object; the second argument (1) makes raw text content available.
  let FileParser = new PDFParser(this, 1);

  // Runs only if an error occurs while reading the PDF file.
  FileParser.on("pdfParser_dataError", errData => console.error(errData.parserError));

  // Runs once the data has been read successfully.
  FileParser.on("pdfParser_dataReady", pdfText => {
    // Print the extracted text to the console.
    console.log(FileParser.getRawTextContent().toString());
    // Write the parsed data to a JSON file; the callback reports any write error.
    fs.writeFile("D:/DataFiles/pdfData.json", JSON.stringify(pdfText), err => {
      if (err) console.error(err);
    });
  });

  FileParser.loadPDF(pdfFilePath);
}

// To execute the function above, provide the path of any PDF file here.
var pdfFilePath = "D:/DataFiles/pdfRead.pdf";
readPDFFile(pdfFilePath);
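The JSON produced by the parser can also be walked directly instead of being printed as raw text. The sketch below is an illustration under assumptions, not the library's own API: extractPageTexts is a hypothetical helper, and it assumes the parsed object exposes pages under Pages (or under formImage.Pages in older pdf2json releases), with each text item holding URI-encoded strings in R[].T. The sample object is hand-made in that assumed shape so the snippet runs without a real PDF.

```javascript
// Hypothetical helper: collects the decoded text items of every page
// from a pdf2json result object.
// Assumption: pages are under `Pages` (or `formImage.Pages` in older releases).
function extractPageTexts(pdfData) {
  const pages =
    pdfData.Pages || (pdfData.formImage && pdfData.formImage.Pages) || [];
  return pages.map(page =>
    (page.Texts || []).map(text =>
      // Each text item has one or more runs in `R`; `T` is URI-encoded.
      (text.R || []).map(run => decodeURIComponent(run.T)).join("")
    )
  );
}

// Hand-made sample in the assumed shape, used in place of a real PDF.
const sample = {
  Pages: [
    {
      Texts: [
        { x: 1, y: 1, R: [{ T: "Profile%20Total" }] },
        { x: 9, y: 1, R: [{ T: "24.99%25" }] }
      ]
    }
  ]
};

console.log(extractPageTexts(sample));
```

Inside the pdfParser_dataReady handler, the same helper could be called with the pdfText argument, provided the installed version matches the assumed structure.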
Great example! Thanks.
How would I go about retrieving individual items from the PDF file? For example, I have tables with a summary. How can the summary be distinguished from the remainder of the document?
The summary is structured as shown here:
Profile Total 24.99% 25.01%
Total Gross Profit Dollars $20,786,455.46 $20,811,422.60
Gross Profit Dollar Diff vs Base $24,977.14
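One possible approach, sketched here under assumptions, is to scan the raw text line by line and keep only the lines that begin with a known summary label. The names SUMMARY_LABELS and extractSummary are hypothetical, and the sketch assumes the text returned by getRawTextContent() keeps each summary row on a single line, which depends on how the PDF is laid out.

```javascript
// Hypothetical labels, taken from the summary lines quoted above.
const SUMMARY_LABELS = [
  "Profile Total",
  "Total Gross Profit Dollars",
  "Gross Profit Dollar Diff vs Base"
];

// Returns an object mapping each found label to the values on its line.
function extractSummary(rawText) {
  const summary = {};
  for (const line of rawText.split(/\r?\n/)) {
    const trimmed = line.trim();
    const label = SUMMARY_LABELS.find(l => trimmed.startsWith(l));
    if (!label) continue; // not a summary line
    // Everything after the label is treated as whitespace-separated values.
    const rest = trimmed.slice(label.length).trim();
    summary[label] = rest ? rest.split(/\s+/) : [];
  }
  return summary;
}

// Sample text standing in for the output of getRawTextContent().
const rawText = [
  "Some other table row 1 2 3",
  "Profile Total 24.99% 25.01%",
  "Total Gross Profit Dollars $20,786,455.46 $20,811,422.60",
  "Gross Profit Dollar Diff vs Base $24,977.14"
].join("\n");

console.log(extractSummary(rawText));
```

Matching on labels keeps the sketch independent of page coordinates. If the rows come out split across lines, filtering the JSON text items by their x and y positions may work better.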
Can you please provide a solution for reading a PDF file in Protractor with JavaScript?