Commit 65c13bf

Merge remote-tracking branch 'origin/fix/none-model-name-breaks-api' into yuming/table_ocr_factor
2 parents 03da255 + 28cb79c commit 65c13bf

4 files changed, +19 -4 lines changed

CHANGELOG.md

+2 -1
@@ -1,4 +1,4 @@
-## 0.10.25-dev8
+## 0.10.25-dev9

 ### Enhancements

@@ -20,6 +20,7 @@ ocr agent tesseract/paddle in environment variable `OCR_AGENT` for OCRing the en
 * **Map source cli command configs when destination set** Due to how the source connector is dynamically called when the destination connector is set via the CLI, the configs were being set incorrectly, causing the source connector to break. The configs were fixed and updated to take into account Fsspec-specific connectors.
 * **Fix metrics folder not discoverable** Fixes issue where unstructured/metrics folder is not discoverable on PyPI by adding
 an `__init__.py` file under the folder.
+* **Fix a bug when `partition_pdf` gets `model_name=None`** In API usage the `model_name` value is `None`, so the `cast` function in `partition_pdf` would return `None` and lead to an `AttributeError`. Now we use the `str` function to explicitly convert the value to a string, so it is guaranteed to have `startswith` and other string methods as attributes.

 ## 0.10.24

test_unstructured/partition/pdf_image/test_pdf.py

+13
@@ -1035,3 +1035,16 @@ def test_chipper_not_losing_parents(chipper_results, chipper_children):
         [el for el in chipper_results if el.id == child.metadata.parent_id]
         for child in chipper_children
     )
+
+
+def test_partition_model_name_default_to_None():
+    filename = "example-docs/DA-1p.pdf"
+    try:
+        pdf.partition_pdf(
+            filename=filename,
+            strategy="hi_res",
+            ocr_languages="eng",
+            model_name=None,
+        )
+    except AttributeError:
+        pytest.fail("partition_pdf() raised AttributeError unexpectedly!")

unstructured/__version__.py

+1 -1
@@ -1 +1 @@
-__version__ = "0.10.25-dev8"  # pragma: no cover
+__version__ = "0.10.25-dev9"  # pragma: no cover

unstructured/file_utils/filetype.py

+3 -2
@@ -7,7 +7,7 @@
 import os
 import re
 import zipfile
-from typing import IO, Any, Callable, Dict, List, Optional, cast
+from typing import IO, Any, Callable, Dict, List, Optional

 from typing_extensions import ParamSpec

@@ -562,7 +562,8 @@ def wrapper(*args: _P.args, **kwargs: _P.kwargs) -> List[Element]:
             metadata_kwargs = {
                 kwarg: params.get(kwarg) for kwarg in ("filename", "url", "text_as_html")
             }
-            if not cast(str, kwargs.get("model_name", "")).startswith("chipper"):
+            # NOTE (yao): do not use cast here as cast(None) still is None
+            if not str(kwargs.get("model_name", "")).startswith("chipper"):
                 # NOTE(alan): Skip hierarchy if using chipper, as it should take care of that
                 elements = set_element_hierarchy(elements)

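The change in `filetype.py` relies on `typing.cast` being a no-op at runtime: it returns its argument unchanged, so a `None` stays `None`. The `""` default of `kwargs.get` does not help either, because the key is present with a `None` value. The standalone sketch below reproduces both checks outside the library. The two helper names and the `api_kwargs` dict are hypothetical and exist only for illustration; only the two `startswith` expressions are taken from the diff.

```python
from typing import Any, Dict, cast


def skips_hierarchy_with_cast(kwargs: Dict[str, Any]) -> bool:
    """Old check: cast() only informs the type checker and returns its argument as-is."""
    return cast(str, kwargs.get("model_name", "")).startswith("chipper")


def skips_hierarchy_with_str(kwargs: Dict[str, Any]) -> bool:
    """New check: str() really converts, so None becomes the string "None"."""
    return str(kwargs.get("model_name", "")).startswith("chipper")


# Hypothetical kwargs mimicking the API call: the key is present and its value
# is None, so the "" default passed to .get() is never used.
api_kwargs = {"model_name": None}

try:
    skips_hierarchy_with_cast(api_kwargs)
    cast_raised = False
except AttributeError:
    # None has no .startswith attribute
    cast_raised = True

str_result = skips_hierarchy_with_str(api_kwargs)
```

With `str`, a `None` model name becomes the string `"None"`, which does not start with `"chipper"`, so the hierarchy step still runs as it does for any other non-chipper model.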