textract

Install command:
brew install textract

Extract text from various types of files

https://textract.readthedocs.io/
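
As a rough illustration of what the package does, the sketch below calls `textract.process`, which returns the extracted text as bytes, and decodes the result. It assumes the `textract` Python module is importable from your interpreter; the Homebrew install may not expose the module to your own Python, in which case `pip install textract` is one option. The `extract_text` helper and its `processor` parameter are hypothetical names introduced here so the function can be exercised without the library.

```python
from typing import Callable, Optional


def extract_text(path: str,
                 processor: Optional[Callable[[str], bytes]] = None) -> str:
    """Return the text content of the file at `path` as a str.

    `processor` is a hypothetical hook (not part of textract) that defaults
    to textract.process; pass a stand-in callable to run without textract.
    """
    if processor is None:
        # Imported lazily so this module loads even where textract is absent.
        import textract
        processor = textract.process
    raw = processor(path)  # textract.process returns a byte string
    # Decode defensively: replace undecodable bytes instead of raising.
    return raw.decode("utf-8", errors="replace")
```

With textract available, `extract_text("report.pdf")` would hand the file to whichever backend textract selects for that extension (the file name here is only a placeholder).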

License: MIT

Formula JSON API: /api/formula/textract.json
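
The JSON endpoint can be read programmatically. The sketch below is a minimal example under two assumptions: that the relative path above is served from `https://formulae.brew.sh`, and that the response carries the `name`, `versions.stable`, `license`, and `dependencies` fields. The endpoint may no longer exist if the formula has been removed. `fetch_formula` and `summarize` are hypothetical helper names.

```python
import json
from urllib.request import urlopen

# Assumption: the relative API path on this page resolves against this host.
API_URL = "https://formulae.brew.sh/api/formula/textract.json"


def fetch_formula(url: str = API_URL) -> dict:
    """Download and parse the formula JSON (requires network access)."""
    with urlopen(url, timeout=10) as response:
        return json.load(response)


def summarize(formula: dict) -> str:
    """Build a one-line summary, tolerating missing fields."""
    name = formula.get("name", "?")
    stable = formula.get("versions", {}).get("stable", "?")
    license_name = formula.get("license", "?")
    deps = ", ".join(formula.get("dependencies", [])) or "none"
    return f"{name} {stable} ({license_name}); depends on: {deps}"
```

Calling `summarize(fetch_formula())` would print a summary for the live formula; `summarize` can also be applied to any already-parsed dictionary, which keeps it usable offline.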

Formula code: textract.rb on GitHub

Bottle (binary package) installation support provided for:

Intel: ventura, monterey, big sur, 64-bit linux
Apple Silicon: ventura, monterey, big sur

Current versions:

stable 1.6.5

Depends on:

antiword (0.37): Utility to read Word (.doc) files
flac (1.4.2): Free lossless audio codec
pillow (9.5.0): Friendly PIL fork (Python Imaging Library)
poppler (23.05.0): PDF rendering library (based on the xpdf-3.0 code base)
python@3.11 (3.11.3): Interpreted, interactive, object-oriented programming language
six (1.16.0): Python 2 and 3 compatibility utilities
swig (4.1.1): Generate scripting interfaces to C/C++ code
tesseract (5.3.1): OCR (Optical Character Recognition) engine
unrtf (0.21.10): RTF to other formats converter

Analytics:

Installs (30 days): 0
Installs on Request (30 days): 0
Build Errors (30 days): 0

Analytics (macOS):

Installs (90 days): 1
Installs on Request (90 days): 1
Installs (365 days): 126
Installs on Request (365 days): 126

Analytics (Linux):

Installs (90 days): 0
Installs on Request (90 days): 0
Installs (365 days): 8
Installs on Request (365 days): 8