textract (deprecated)

Install command:
brew install textract
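
After installation, the formula is expected to put a textract command on the PATH that prints the extracted text of a file to standard output. The sketch below wraps that command from Python. The helper names build_command and extract_text are hypothetical, and the invocation form (textract followed by a single file path) is an assumption based on the upstream documentation rather than something stated on this page.

```python
import shutil
import subprocess


def build_command(path):
    """Return the argument list for extracting text from one file."""
    # Assumption: the CLI takes the input file as its only positional argument.
    return ["textract", path]


def extract_text(path):
    """Run the textract CLI on `path` and return its stdout.

    Returns None when no textract executable is found on PATH,
    for example when the formula is not installed.
    """
    if shutil.which("textract") is None:
        return None
    result = subprocess.run(
        build_command(path),
        capture_output=True,  # collect stdout and stderr instead of printing them
        text=True,            # decode output to str
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

Calling extract_text("report.pdf") would return the document text as a string, or None if the command is unavailable; the file name here is only an illustration.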

Extract text from many types of files

https://textract.readthedocs.io/

License: MIT

Formula JSON API: /api/formula/textract.json
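
The JSON endpoint can be read programmatically. The following is a minimal sketch, assuming the relative path above is served from formulae.brew.sh and that the response carries keys named name, versions, license, deprecated, and dependencies. The function names fetch_formula and summarize are hypothetical, and the sample dictionary is hand-written from the values on this page, not a captured API response.

```python
import json
import urllib.request

# Assumption: the relative API path is hosted on formulae.brew.sh.
FORMULA_URL = "https://formulae.brew.sh/api/formula/textract.json"


def fetch_formula(url=FORMULA_URL):
    """Download and decode the formula JSON (needs network access)."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)


def summarize(formula):
    """Pick out a few fields, falling back to defaults when a key is absent."""
    return {
        "name": formula.get("name"),
        "stable": formula.get("versions", {}).get("stable"),
        "license": formula.get("license"),
        "deprecated": formula.get("deprecated", False),
        "dependencies": formula.get("dependencies", []),
    }


# Hand-written sample mirroring this page; not real API output.
sample = {
    "name": "textract",
    "versions": {"stable": "1.6.5"},
    "license": "MIT",
    "deprecated": True,
    "dependencies": [
        "antiword", "flac", "pillow", "poppler",
        "python@3.12", "swig", "tesseract", "unrtf",
    ],
}
summary = summarize(sample)
```

With network access, summarize(fetch_formula()) would apply the same extraction to the live data; using .get with defaults keeps the sketch from failing if a key is missing or named differently.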

Formula code: textract.rb on GitHub

Bottle (binary package) installation support provided for:

macOS on Apple Silicon: sequoia, sonoma, ventura, monterey
macOS on Intel: sonoma, ventura, monterey
Linux: ARM64, x86_64

Current versions:

stable 1.6.5

Depends on:

antiword 0.37 Utility to read Word (.doc) files
flac 1.5.0 Free lossless audio codec
pillow 11.2.1 Friendly PIL fork (Python Imaging Library)
poppler 25.04.0 PDF rendering library (based on the xpdf-3.0 code base)
python@3.12 3.12.10 Interpreted, interactive, object-oriented programming language
swig 4.3.1 Generate scripting interfaces to C/C++ code
tesseract 5.5.0 OCR (Optical Character Recognition) engine
unrtf 0.21.10 RTF to other formats converter

Analytics:

Installs (30 days): 8
Installs on Request (30 days): 8
Build Errors (30 days): 0
Installs (90 days): 24
Installs on Request (90 days): 24
Installs (365 days): 252
Installs on Request (365 days): 252