API Parameters

The only required parameter is `files`, the file you wish to process.

Python & direct call	JavaScript	Description
`files` (shared.Files)	`files` (File, Blob, shared.Files)	The file to process.
`coordinates` (bool)	`coordinates` (boolean)	If true, return bounding box coordinates for each element extracted via OCR. Default: false
`encoding` (str)	`encoding` (string)	The encoding method used to decode the text input. Default: `utf-8`
`extract_image_block_types` (List[str])	`extractImageBlockTypes` (string[])	The types of elements to extract as image blocks. The extracted images are returned as base64-encoded data stored in metadata fields.
`gz_uncompressed_content_type` (str)	`gzUncompressedContentType` (string)	If the file is gzipped, use this content type after unzipping. Example: `application/pdf`
`hi_res_model_name` (str)	`hiResModelName` (string)	The name of the inference model used when strategy is `hi_res`. Example: `chipper`
`include_page_breaks` (bool)	`includePageBreaks` (boolean)	If True, the output will include page breaks if the file type supports them. Default: `false`
`languages` (List[str])	`languages` (string[])	The languages present in the document, for use in partitioning and/or OCR. See the Tesseract documentation for a full list of languages.
`output_format` (str)	`outputFormat` (string)	The format of the response. Supported formats are `application/json` and `text/csv`. Default: `application/json`.
`pdf_infer_table_structure` (bool)	`pdfInferTableStructure` (boolean)	Deprecated. If True and `strategy` is `hi_res`, any Table elements extracted from a PDF will include an additional metadata field, `text_as_html`, whose string value is the table data rendered as an HTML table.
`skip_infer_table_types` (List[str])	`skipInferTableTypes` (string[])	The document types for which table extraction should be skipped. Default: `['pdf', 'jpg', 'png', 'heic']`
`split_pdf_page` (bool)	`splitPdfPage` (boolean)	Whether the PDF file should be split on the client side. Ignored by the backend.
`strategy` (str)	`strategy` (string)	The strategy to use for partitioning PDFs and images. Options are `fast`, `hi_res`, `auto`. Default: `auto`
`unique_element_ids` (bool)	`uniqueElementIds` (boolean)	When True, assign UUIDs to element IDs, which guarantees their uniqueness (useful when using them as primary keys in a database). Otherwise a SHA-256 hash of the element text is used. Default: False
`xml_keep_tags` (bool)	`xmlKeepTags` (boolean)	If True, the XML tags are retained in the output. Otherwise, only the text within the tags is extracted. Only applies to XML documents.
`chunking_strategy` (str)	`chunkingStrategy` (string)	Use one of the supported strategies to chunk the returned elements after partitioning. When `chunking_strategy` is not specified, no chunking is performed and any other chunking parameters provided are ignored. Supported strategies: `"basic"`, `"by_title"`
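
To illustrate the two ID schemes described for `unique_element_ids`, here is a minimal sketch using only the Python standard library. The helper name `element_id` is hypothetical, and hashing only the UTF-8 element text is a simplifying assumption; the service may hash additional data or shorten the digest.

```python
import hashlib
import uuid


def element_id(text: str, unique_element_ids: bool = False) -> str:
    """Return an illustrative element ID (hypothetical helper, not the service's code)."""
    if unique_element_ids:
        # Random UUID: unique even when two elements share the same text.
        return str(uuid.uuid4())
    # Deterministic: SHA-256 of the element text (assumed UTF-8 encoded).
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Two elements with identical text get the same hash-based ID.
hash_first = element_id("Page 1")
hash_second = element_id("Page 1")
print("hash-based IDs match:", hash_first == hash_second)

# With unique_element_ids enabled, each call yields a fresh UUID.
uuid_first = element_id("Page 1", unique_element_ids=True)
uuid_second = element_id("Page 1", unique_element_ids=True)
print("UUID-based IDs match:", uuid_first == uuid_second)
```

In this sketch, repeated text such as a running header would share one hash-based ID, which is why UUIDs are the safer choice when IDs serve as primary keys.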

The following parameters only apply when `chunking_strategy` is specified. Otherwise, they are ignored.

Python & direct call	JavaScript	Description
`combine_under_n_chars` (int)	`combineUnderNChars` (number)	Applies only when `chunking_strategy` is set to `"by_title"`. Use this parameter to combine small chunks until the combined chunk reaches a length of n chars. This can mitigate the appearance of small chunks created when short paragraphs, not intended as section headings, are identified as `Title` elements in certain documents. Default: the same value as `max_characters`
`include_orig_elements` (bool)	`includeOrigElements` (boolean)	When True (the default), the elements used to form a chunk appear in `.metadata.orig_elements` for that chunk.
`max_characters` (int)	`maxCharacters` (number)	Cut off new sections after reaching a length of n chars (hard max). Default: 500
`multipage_sections` (bool)	`multipageSections` (boolean)	Applies only when `chunking_strategy` is set to `by_title`. Determines if a chunk can include elements from more than one page. Default: true
`new_after_n_chars` (int)	`newAfterNChars` (number)	Applies only when `chunking_strategy` is specified. Cut off new sections after reaching a length of n chars (soft max). Default: 1500
`overlap` (int)	`overlap` (number)	When an oversized element is text-split into several chunks, the second and later chunks are prefixed with this many trailing characters from the prior chunk. Default: None
`overlap_all` (bool)	`overlapAll` (boolean)	When True, overlap is also applied to ‘normal’ chunks formed by combining whole elements. Use with caution as this can introduce noise into otherwise clean semantic units. Default: None
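
The interaction between `max_characters` and `overlap` on an oversized element can be sketched as follows. This is a simplified illustration, not the service's chunker: the function name `split_with_overlap` is hypothetical, it cuts at fixed character offsets rather than at word boundaries, and it assumes the overlap prefix counts toward `max_characters`.

```python
def split_with_overlap(text: str, max_characters: int = 500, overlap: int = 0) -> list[str]:
    """Split text into chunks of at most max_characters (illustrative only).

    Every chunk after the first begins with the last `overlap` characters
    of the chunk before it.
    """
    if max_characters <= 0:
        raise ValueError("max_characters must be positive")
    if not 0 <= overlap < max_characters:
        raise ValueError("overlap must be in the range [0, max_characters)")

    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_characters, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        # Step back so the next chunk repeats the tail of this one.
        start = end - overlap
    return chunks


print(split_with_overlap("abcdefghij", max_characters=4, overlap=1))
# ['abcd', 'defg', 'ghij']
```

Each chunk stays within the hard maximum, and the repeated character at each boundary (`d`, then `g`) is the overlap carried over from the prior chunk.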

Need help getting started? Check out the Examples page for some inspiration.
