Installation with `conda` on Windows

`unstructured` can be installed and run on Windows with `conda`, but the process involves a few extra steps. This section will help you get up and running.
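Install the Microsoft C++ Build Tools first. Without them, you are likely to hit errors when installing the `pycocotools` dependency.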
Run `conda env create -f environment.yml` using the `environment.yml` file in the `unstructured` repo to create a virtual environment. The environment will be named `unstructured`.
Run `conda activate unstructured` to activate the virtual environment.
Run `pip install unstructured` to install the `unstructured` library.
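Before installing anything with `pip`, it can help to confirm that the `unstructured` environment is the active one. The sketch below relies on the `CONDA_DEFAULT_ENV` variable that `conda activate` sets; the helper names `active_conda_env` and `is_unstructured_env_active` are hypothetical and exist only for this illustration.

```python
import os

# Environment name created by `conda env create -f environment.yml` in the steps above.
EXPECTED_ENV = "unstructured"


def active_conda_env(environ=None):
    """Return the name of the active conda environment, or None if none is active."""
    env = os.environ if environ is None else environ
    return env.get("CONDA_DEFAULT_ENV")


def is_unstructured_env_active(environ=None):
    """Return True when the active conda environment is the expected one."""
    return active_conda_env(environ) == EXPECTED_ENV


if __name__ == "__main__":
    if is_unstructured_env_active():
        print("conda environment 'unstructured' is active")
    else:
        print(f"active conda environment: {active_conda_env()!r}")
        print("run `conda activate unstructured` before `pip install unstructured`")
```

Passing a plain dictionary instead of `os.environ` makes the check easy to exercise without touching the real environment.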
Setting up `unstructured` for local inference

Local inference uses `detectron2` for PDF layout parsing. `detectron2` does not officially support Windows, but it is possible to install it there. The installation instructions are based on the instructions that LayoutParser provides.
Run `pip install pycocotools-windows` to install a Windows-compatible version of `pycocotools`. Alternatively, you can run `pip3 install "git+https://github.com/philferriere/cocoapi.git#egg=pycocotools&subdirectory=PythonAPI"` as outlined in this GitHub issue.
Run `git clone https://github.com/ivanpp/detectron2.git`, then `cd detectron2`, then `pip install -e .` to install a Windows-compatible version of the `detectron2` library.
Install `iopath` using the instructions outlined in this GitHub issue. First, run `git clone https://github.com/facebookresearch/iopath --single-branch --branch v0.1.8`. Then, on line 753 in `iopath/iopath/common/file_io.py`, change `filename = path.split("/")[-1]` to `filename = parsed_url.path.split("/")[-1]`. After that, navigate to the `iopath` directory and run `pip install -e .`.
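The one-line `iopath` change can be illustrated with the standard library. The sketch below shows what the two expressions return for a URL that carries a query string, and includes a small helper that applies the same replacement to source text. It assumes that `parsed_url` in `file_io.py` is the result of `urlparse(path)`. The example URL and the helper names (`filename_before`, `filename_after`, `patch_source`) are hypothetical. A plausible motivation for the patch is that `?` is not allowed in Windows file names, but that is an inference rather than something stated above.

```python
from urllib.parse import urlparse

OLD_LINE = 'filename = path.split("/")[-1]'
NEW_LINE = 'filename = parsed_url.path.split("/")[-1]'


def filename_before(url):
    """File name as computed by the original line: the query string is kept."""
    return url.split("/")[-1]


def filename_after(url):
    """File name as computed by the patched line: only the URL path is used."""
    return urlparse(url).path.split("/")[-1]


def patch_source(source):
    """Return `source` with the original assignment replaced by the patched one."""
    if NEW_LINE in source:
        return source  # already patched, nothing to do
    if source.count(OLD_LINE) != 1:
        raise ValueError("expected exactly one occurrence of the line to patch")
    return source.replace(OLD_LINE, NEW_LINE)


if __name__ == "__main__":
    url = "https://example.com/models/weights.pth?dl=1"  # hypothetical URL
    print(filename_before(url))
    print(filename_after(url))
```

To apply the helper to the cloned repository, read `iopath/iopath/common/file_io.py` as text, pass it through `patch_source`, and write the result back before running `pip install -e .`.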
Run `pip install unstructured[local-inference]`. This will install the `unstructured_inference` dependency.
At this point, you can verify the installation from the root of the `unstructured` repo.
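A quick way to confirm that the packages from the local inference steps can be found is to query Python's import system. The sketch below is a stand-in under stated assumptions, not the project's own verification snippet: the helper name `check_installed` is hypothetical, and it only reports whether each package is discoverable, not whether PDF parsing works end to end.

```python
from importlib.util import find_spec

# Import names of the packages installed in the steps above.
PACKAGES = [
    "unstructured",
    "unstructured_inference",
    "detectron2",
    "iopath",
    "pycocotools",
]


def check_installed(packages=PACKAGES):
    """Map each top-level package name to True if the import system can locate it."""
    return {name: find_spec(name) is not None for name in packages}


if __name__ == "__main__":
    for name, found in check_installed().items():
        status = "found" if found else "MISSING"
        print(f"{name}: {status}")
```

Run it inside the activated `unstructured` environment; any package reported as missing points back to the corresponding installation step.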
You can use the following steps to install `paddleocr` in your `unstructured` `conda` environment.
Run `conda install -c esri paddleocr`.
If you have `detectron2` cloned and installed locally, change the name of `detectron2/tools` to `detectron2/detectron2_tools`. Otherwise, you will hit the module name conflict error described in this issue.
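The directory rename can also be scripted. The sketch below assumes `repo_root` points at the local `detectron2` clone; the helper name `rename_tools_dir` is hypothetical. It is safe to run twice, because it leaves the directory alone if it has already been renamed.

```python
from pathlib import Path


def rename_tools_dir(repo_root):
    """Rename <repo_root>/tools to <repo_root>/detectron2_tools and return the new path."""
    root = Path(repo_root)
    old_dir = root / "tools"
    new_dir = root / "detectron2_tools"
    if new_dir.exists():
        return new_dir  # already renamed
    if not old_dir.is_dir():
        raise FileNotFoundError(f"no tools directory found in {root}")
    old_dir.rename(new_dir)
    return new_dir


if __name__ == "__main__":
    # Assumes the script is run from the directory that contains the clone.
    print(rename_tools_dir("detectron2"))
```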
Set the environment variable `KMP_DUPLICATE_LIB_OK` to `"TRUE"`. This prevents the `libiomp5md.dll` linking issue described in this issue on GitHub.
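One way to set the variable for a single Python process is to assign it in `os.environ` at the top of the script. This is a sketch under the assumption that the variable needs to be in place before the libraries that load `libiomp5md.dll` are imported; the helper name `allow_duplicate_openmp` is hypothetical.

```python
import os


def allow_duplicate_openmp(environ=None):
    """Set KMP_DUPLICATE_LIB_OK to "TRUE" in the given mapping (os.environ by default)."""
    env = os.environ if environ is None else environ
    env["KMP_DUPLICATE_LIB_OK"] = "TRUE"
    return env


# Call this first, then import paddleocr (or other affected libraries) afterwards.
allow_duplicate_openmp()
```

Setting the variable in the shell or in the system settings has the same effect for every process started afterwards, which may be preferable to editing each script.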
To test the installation, use a `.jpg` image that contains text.
You can set the log level with the `LOG_LEVEL` environment variable. By default, the log level is set to `WARNING`. For debugging, consider setting the log level to `INFO` or `DEBUG`.
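To make the `LOG_LEVEL` convention concrete, the sketch below shows one way an environment variable of this kind can be mapped onto Python's standard `logging` levels, with `WARNING` as the fallback. It illustrates the idea only and is not the library's internal code; the helper name `resolve_log_level` is hypothetical.

```python
import logging
import os


def resolve_log_level(environ=None, default="WARNING"):
    """Return the numeric logging level named by LOG_LEVEL, or the default level."""
    env = os.environ if environ is None else environ
    name = env.get("LOG_LEVEL", default).upper()
    level = logging.getLevelName(name)
    # getLevelName returns an int for known level names and a string otherwise.
    if not isinstance(level, int):
        raise ValueError(f"unknown log level: {name!r}")
    return level


if __name__ == "__main__":
    logging.basicConfig(level=resolve_log_level())
    logging.getLogger(__name__).info("visible only when LOG_LEVEL is INFO or DEBUG")
```

With `LOG_LEVEL` unset, the message in the example is suppressed, which matches the `WARNING` default described above.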