Extract text from various different types of files
https://textract.readthedocs.io/
License: MIT
Formula JSON API: /api/formula/textract.json
Formula code: textract.rb on GitHub
Bottle (binary package) installation support provided for:
Architecture | OS | Bottle |
---|---|---|
Apple Silicon | sequoia | ✅ |
Apple Silicon | sonoma | ✅ |
Apple Silicon | ventura | ✅ |
Apple Silicon | monterey | ✅ |
Intel | sonoma | ✅ |
Intel | ventura | ✅ |
Intel | monterey | ✅ |
Intel | 64-bit linux | ✅ |
Current versions:
stable | ✅ | 1.6.5 |
Depends on:
Formula | Version | Description |
---|---|---|
antiword | 0.37 | Utility to read Word (.doc) files |
flac | 1.4.3 | Free lossless audio codec |
pillow | 10.4.0 | Friendly PIL fork (Python Imaging Library) |
poppler | 24.04.0 | PDF rendering library (based on the xpdf-3.0 code base) |
python@3.12 | 3.12.7 | Interpreted, interactive, object-oriented programming language |
swig | 4.2.1 | Generate scripting interfaces to C/C++ code |
tesseract | 5.4.1 | OCR (Optical Character Recognition) engine |
unrtf | 0.21.10 | RTF to other formats converter |
Analytics:
Installs (30 days) | |
---|---|
textract | 10 |
Installs on Request (30 days) | |
textract | 10 |
Build Errors (30 days) | |
textract | 0 |
Installs (90 days) | |
textract | 28 |
Installs on Request (90 days) | |
textract | 28 |
Installs (365 days) | |
textract | 1,142 |
textract --HEAD | 1 |
Installs on Request (365 days) | |
textract | 1,142 |
textract --HEAD | 1 |