Node.js module for extracting text from HTML, PDF, DOC, DOCX, XLS, XLSX, CSV, PPTX, PNG, JPG, GIF, RTF and more!
HTML
1694
198
MIT License
textract is a Node.js module that extracts text from a wide variety of file types, including HTML, PDF, Word documents, and images. It is designed for developers who need to extract readable text from diverse file formats programmatically in their applications.
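As a rough illustration of what programmatic extraction can look like, here is a minimal sketch built around the module's callback-style `fromFileWithPath(filePath, callback)` function. The `makeExtractor` helper and the `./report.pdf` path are hypothetical names introduced only for this example; the sketch assumes the package has been installed with `npm install textract`.

```javascript
const { promisify } = require('node:util');

// Hypothetical helper (not part of textract): turns a callback-style
// extractor of the form (filePath, callback(error, text)) into a function
// that returns a promise for the trimmed text.
function makeExtractor(fromFileWithPath) {
  const run = promisify(fromFileWithPath);
  return async function extractText(filePath) {
    const text = await run(filePath);
    return text.trim();
  };
}

// Usage sketch, assuming textract is installed and ./report.pdf exists
// (the path is a placeholder):
//   const textract = require('textract');
//   const extractText = makeExtractor((p, cb) => textract.fromFileWithPath(p, cb));
//   extractText('./report.pdf').then(console.log, console.error);
```

Passing the extractor in as an argument keeps the helper independent of the library, so it can be exercised with a stub. Some formats may also depend on external tools being installed on the system; the project's README is the place to check for those requirements.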