Installation with conda on Windows
You can install and run unstructured on Windows with conda, but the process involves a few extra steps. This section will help you get up and running.
- Install Anaconda on your Windows machine.
- Install Microsoft C++ Build Tools using the instructions in this Stack Overflow post. The C++ build tools are required for the `pycocotools` dependency.
- Run `conda env create -f environment.yml` using the `environment.yml` file in the `unstructured` repo to create a virtual environment. The environment will be named `unstructured`.
- Run `conda activate unstructured` to activate the virtual environment.
- Run `pip install unstructured` to install the `unstructured` library.
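After the steps above, a short script can confirm that you are on Windows, inside the right conda environment, and able to import the library. This is a minimal sketch, not part of `unstructured`: the function name `check_setup` is hypothetical, and it relies on the fact that `conda activate` sets the `CONDA_DEFAULT_ENV` environment variable to the name of the active environment.

```python
"""Sanity-check the conda setup for unstructured (hypothetical helper, not part of the library)."""
import importlib.util
import os
import sys
from typing import List, Mapping


def check_setup(environ: Mapping[str, str] = os.environ,
                platform: str = sys.platform,
                env_name: str = "unstructured",
                module: str = "unstructured") -> List[str]:
    """Return a list of problems found; an empty list means everything looks fine."""
    problems = []
    # sys.platform is "win32" on Windows, including 64-bit builds.
    if not platform.startswith("win"):
        problems.append(f"expected Windows, got platform {platform!r}")
    # `conda activate <name>` sets CONDA_DEFAULT_ENV to the active environment's name.
    active = environ.get("CONDA_DEFAULT_ENV")
    if active != env_name:
        problems.append(f"active conda env is {active!r}, expected {env_name!r}")
    # find_spec returns None when the module cannot be found on sys.path.
    if importlib.util.find_spec(module) is None:
        problems.append(f"cannot import {module!r}; did the pip install step succeed?")
    return problems


if __name__ == "__main__":
    issues = check_setup()
    print("\n".join(issues) if issues else "setup looks OK")
```

Run it with `python` from the activated environment; any printed line points at the step to revisit.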
Setting up unstructured for local inference
If you need to run model inference locally, there are a few additional steps. The main challenge is installing `detectron2` for PDF layout parsing. `detectron2` does not officially support Windows, but it can be installed on Windows with the steps below, which are based on the installation instructions LayoutParser provides.
- Run `pip install pycocotools-windows` to install a Windows-compatible version of `pycocotools`. Alternatively, you can run `pip3 install "git+https://github.com/philferriere/cocoapi.git#egg=pycocotools&subdirectory=PythonAPI"` as outlined in this GitHub issue.
- Run `git clone https://github.com/ivanpp/detectron2.git`, then `cd detectron2`, then `pip install -e .` to install a Windows-compatible version of the `detectron2` library.
- Install a Windows-compatible version of `iopath` using the instructions outlined in this GitHub issue. First, run `git clone https://github.com/facebookresearch/iopath --single-branch --branch v0.1.8`. Then, on line 753 in `iopath/iopath/common/file_io.py`, change `filename = path.split("/")[-1]` to `filename = parsed_url.path.split("/")[-1]`. After that, navigate to the `iopath` directory and run `pip install -e .`.
- Run `pip install unstructured[local-inference]`. This will install the `unstructured_inference` dependency.
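The manual `iopath` edit described above can also be scripted. The following is a minimal sketch under our own assumptions: `patch_file_io` is a hypothetical helper, not part of `iopath`, and it matches the line by its content rather than by line number 753 so it still works if the line has shifted. Run it from the directory where you cloned `iopath`, before `pip install -e .`.

```python
"""Apply the iopath v0.1.8 file_io.py edit (hypothetical helper, not part of iopath)."""
from pathlib import Path

OLD = 'filename = path.split("/")[-1]'
NEW = 'filename = parsed_url.path.split("/")[-1]'


def patch_file_io(file_io: Path) -> int:
    """Replace OLD with NEW in file_io.py and return how many lines were changed.

    Running it twice is harmless: a patched line no longer contains OLD.
    """
    lines = file_io.read_text(encoding="utf-8").splitlines(keepends=True)
    changed = 0
    for i, line in enumerate(lines):
        if OLD in line:
            # str.replace keeps the line's original indentation intact.
            lines[i] = line.replace(OLD, NEW)
            changed += 1
    if changed:
        file_io.write_text("".join(lines), encoding="utf-8")
    return changed


if __name__ == "__main__":
    # Assumes the clone lives in ./iopath, as produced by the git clone command above.
    target = Path("iopath") / "iopath" / "common" / "file_io.py"
    if target.exists():
        print(f"patched {patch_file_io(target)} line(s) in {target}")
```

If the script reports `0` lines on a fresh clone, inspect the file by hand rather than assuming the patch was applied.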
Installing PaddleOCR
PaddleOCR is another package that is helpful to use in conjunction with `unstructured`. You can use the following steps to install PaddleOCR in your `unstructured` conda environment.
- Run `conda install -c esri paddleocr`.
- If you have the Windows version of `detectron2` cloned and installed locally, rename `detectron2/tools` to `detectron2/detectron2_tools`. Otherwise, you will hit the module name conflict error described in this issue.
- Set the environment variable `KMP_DUPLICATE_LIB_OK` to `"TRUE"`. This prevents the `libiomp5md.dll` linking issue described in this issue on GitHub.
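The last two PaddleOCR steps can be done from Python as well. This is a minimal sketch; `rename_detectron2_tools` and `allow_duplicate_openmp` are hypothetical helper names. Setting `os.environ` only affects the current Python process and the processes it starts, so it has to happen before PaddleOCR is imported. For a persistent, user-level setting you can instead run `setx KMP_DUPLICATE_LIB_OK TRUE` and open a new terminal.

```python
"""Prepare the environment for PaddleOCR on Windows (hypothetical helpers, a sketch)."""
import os
from pathlib import Path
from typing import MutableMapping


def rename_detectron2_tools(repo: Path) -> bool:
    """Rename <repo>/tools to <repo>/detectron2_tools; return True if a rename happened."""
    src, dst = repo / "tools", repo / "detectron2_tools"
    if src.is_dir() and not dst.exists():
        src.rename(dst)
        return True
    return False


def allow_duplicate_openmp(environ: MutableMapping[str, str] = os.environ) -> None:
    """Set KMP_DUPLICATE_LIB_OK=TRUE for this process and its child processes."""
    environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"


if __name__ == "__main__":
    # Assumes the detectron2 clone lives in ./detectron2.
    rename_detectron2_tools(Path("detectron2"))
    allow_duplicate_openmp()
    # Import paddleocr only after this point so it sees the variable.
```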
After completing these steps, you can test the installation by running PaddleOCR on a .jpg image that contains text.
Logging
You can set the logging level for the package with the `LOG_LEVEL` environment variable. By default, the log level is set to `WARNING`. For debugging, consider setting it to `INFO` or `DEBUG`.
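In practice you set the variable in the shell before starting Python, for example `set LOG_LEVEL=DEBUG` in `cmd.exe` or `$env:LOG_LEVEL = "DEBUG"` in PowerShell. The sketch below only illustrates how a `LOG_LEVEL` value can map onto Python's standard `logging` levels with `WARNING` as the fallback; `resolve_log_level` is hypothetical and is not taken from the `unstructured` source.

```python
"""Map a LOG_LEVEL environment variable to a logging level (illustrative sketch only)."""
import logging
import os
from typing import Mapping

# Level names accepted by this sketch; anything else falls back to WARNING.
_LEVELS = {
    "CRITICAL": logging.CRITICAL,
    "ERROR": logging.ERROR,
    "WARNING": logging.WARNING,
    "INFO": logging.INFO,
    "DEBUG": logging.DEBUG,
}


def resolve_log_level(environ: Mapping[str, str] = os.environ) -> int:
    """Return the level named by LOG_LEVEL, defaulting to WARNING."""
    name = environ.get("LOG_LEVEL", "WARNING").strip().upper()
    return _LEVELS.get(name, logging.WARNING)


if __name__ == "__main__":
    logging.basicConfig(level=resolve_log_level())
    logging.getLogger(__name__).debug("visible only when LOG_LEVEL=DEBUG")
```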
Note on Older Versions
For versions of `unstructured` earlier than 0.9.0, installing with the `local-inference` extra was the recommended pattern. While `local-inference` remains supported in newer versions for backward compatibility, it might be deprecated in future releases. It's advisable to transition to the `all-docs` extra for comprehensive support.
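If you pin a specific release, the 0.9.0 cutoff in this note decides which extra to use. The sketch below encodes only that rule; `recommended_extra` and `install_command` are hypothetical helpers, and the version parsing is deliberately simple (it ignores pre-release suffixes). For full version semantics, `packaging.version.Version` is the more robust choice.

```python
"""Choose the unstructured extra for a pinned version (hypothetical helpers, a sketch)."""
import re
from typing import Tuple


def _release_tuple(version: str) -> Tuple[int, ...]:
    """Parse the leading release numbers, padding missing parts: '0.9' -> (0, 9, 0)."""
    m = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", version.strip())
    if m is None:
        raise ValueError(f"unrecognized version: {version!r}")
    return tuple(int(g or 0) for g in m.groups())


def recommended_extra(version: str) -> str:
    """Return 'local-inference' for versions before 0.9.0, otherwise 'all-docs'."""
    return "local-inference" if _release_tuple(version) < (0, 9, 0) else "all-docs"


def install_command(version: str) -> str:
    """Build a pip command that pins the version with the matching extra."""
    return f'pip install "unstructured[{recommended_extra(version)}]=={version}"'


if __name__ == "__main__":
    print(install_command("0.8.7"))
    print(install_command("0.10.1"))
```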

