Configs
-
chunk_elements (Default: False): (Deprecated) Boolean flag whether to run chunking as part of the ingest process. This option is deprecated in favor of thechunking_strategyoption. This option being set True has the same effect aschunking_strategy=by_title. -
chunking_strategy: One of “basic” or “by_title”. When omitted, no chunking is performed. The “basic” strategy maximally fills each chunk with whole elements, up the specified size limits (max_charactersandnew_after_n_charsdescribed below). A single element that exceeds this length is divided into two or more chunks using text-splitting. ATableelement is never combined with any other element and appears as a chunk of its own or as a sequence ofTableChunkelements splitting is required. The “by_title” behaviors are the same except that section and optionally page boundaries are respected such that two consecutive elements from different sections appear in separate chunks. -
combine_text_under_n_chars (Default: max_characters): Combines small elements (for example a series ofTitleelements) until a section reaches a length of n characters. Only operative for the"by_title"chunking strategy. Defaults to max_characters which combines chunks whenever space allows. Specifying 0 for this argument suppresses combining of small chunks. -
include_orig_elements (default: True): Adds the document elements consolidated to form each chunk to thechunk.metadata.orig_elements: list[Element]metadata field. Setting this to false allows for somewhat smaller payloads when you don’t need that metadata. -
max_characters (Default: 500): Combine elements into chunks no larger than n characters (hard max). No chunk with text longer than this value will appear in the output stream. -
multipage_sections (Default: True): When False, in addition to section boundaries, page boundaries are also respected. Only operative for the “by_title” strategy. -
new_after_n_chars (Default: max_characters (off)): Cuts off new chunks once they reach a length of n characters (soft max). Defaults to max_characters when not specified, which effectively disables any soft window. Specifying 0 for this argument causes each element to appear in a chunk by itself (although an element with text longer than max_characters will be still be divided into two or more chunks using text-splitting).

