Extract text from various different types of files
https://textract.readthedocs.io/
License: MIT
Formula JSON API: /api/formula/textract.json
Formula code: textract.rb on GitHub
Bottle (binary package) installation support provided for:
Platform | macOS version | Bottle |
---|---|---|
Apple Silicon | sequoia | ✅ |
Apple Silicon | sonoma | ✅ |
Apple Silicon | ventura | ✅ |
Apple Silicon | monterey | ✅ |
Intel | sonoma | ✅ |
Intel | ventura | ✅ |
Intel | monterey | ✅ |
64-bit linux | | ✅ |
Current versions:
stable | ✅ | 1.6.5 |
Depends on:
Formula | Version | Description |
---|---|---|
antiword | 0.37 | Utility to read Word (.doc) files |
flac | 1.4.3 | Free lossless audio codec |
pillow | 11.0.0 | Friendly PIL fork (Python Imaging Library) |
poppler | 24.12.0 | PDF rendering library (based on the xpdf-3.0 code base) |
python@3.12 | 3.12.8 | Interpreted, interactive, object-oriented programming language |
swig | 4.3.0 | Generate scripting interfaces to C/C++ code |
tesseract | 5.5.0 | OCR (Optical Character Recognition) engine |
unrtf | 0.21.10 | RTF to other formats converter |
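The metadata above is also published at the Formula JSON API path `/api/formula/textract.json`. A minimal Python sketch for reading it is below. It assumes the path is served from `https://formulae.brew.sh` and that the JSON uses the keys `name`, `license`, `versions.stable` and `dependencies`; both the host and the schema are assumptions, not guarantees. The helper names `fetch_formula_json` and `summarize_formula` are hypothetical, and the `SAMPLE` payload is hand-built from the values on this page rather than a captured API response.

```python
import json
import urllib.request

# Assumed host for the "/api/formula/<name>.json" path shown on this page.
API_BASE = "https://formulae.brew.sh/api/formula"


def fetch_formula_json(name: str) -> str:
    """Download the raw JSON for one formula (requires network access)."""
    with urllib.request.urlopen(f"{API_BASE}/{name}.json", timeout=10) as resp:
        return resp.read().decode("utf-8")


def summarize_formula(raw: str) -> dict:
    """Extract a few fields from a formula JSON document.

    The key names used here are assumptions about the API schema.
    """
    data = json.loads(raw)
    return {
        "name": data["name"],
        "license": data.get("license"),
        "stable": data.get("versions", {}).get("stable"),
        "dependencies": sorted(data.get("dependencies", [])),
    }


# Offline sample mirroring this page's values (hypothetical, not a real response).
SAMPLE = json.dumps({
    "name": "textract",
    "license": "MIT",
    "versions": {"stable": "1.6.5"},
    "dependencies": [
        "antiword", "flac", "pillow", "poppler",
        "python@3.12", "swig", "tesseract", "unrtf",
    ],
})

if __name__ == "__main__":
    # Swap SAMPLE for fetch_formula_json("textract") to query the live API.
    summary = summarize_formula(SAMPLE)
    print(f"{summary['name']} {summary['stable']} ({summary['license']})")
    for dep in summary["dependencies"]:
        print(" -", dep)
```

Parsing is kept separate from fetching so the summary logic can be exercised offline; only `fetch_formula_json` touches the network.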
Analytics:
Metric | textract |
---|---|
Installs (30 days) | 11 |
Installs on Request (30 days) | 11 |
Build Errors (30 days) | 0 |
Installs (90 days) | 28 |
Installs on Request (90 days) | 28 |
Installs (365 days) | 837 |
Installs on Request (365 days) | 837 |