API Reference
- class itemloaders.ItemLoader(item: Any = None, selector: Selector | None = None, parent: ItemLoader | None = None, **context: Any)
Return a new Item Loader for populating the given item. If no item is given, one is instantiated automatically using the class in default_item_class.
When instantiated with a selector parameter, the ItemLoader class provides convenient mechanisms for extracting data from web pages using parsel selectors.
- Parameters:
item (dict object) – The item instance to populate using subsequent calls to add_xpath(), add_css(), add_jmes() or add_value().
selector (Selector object) – The selector to extract data from when using the add_xpath(), add_css(), add_jmes(), replace_xpath(), replace_css() or replace_jmes() methods.
The item, the selector and the remaining keyword arguments are assigned to the Loader context (accessible through the context attribute).
- item
The item object being parsed by this Item Loader. This is mostly used as a property, so when attempting to override this value, you may want to check out default_item_class first.
- context
The currently active Context of this Item Loader. Refer to the Loader Context section of the documentation for more information.
- default_item_class
An Item class (or factory), used to instantiate items when not given in the __init__ method.
Warning
Currently, this factory/class needs to be callable/instantiated without any arguments. If you are using dataclasses, please consider the following alternative:
    from dataclasses import dataclass, field
    from typing import Optional

    @dataclass
    class Product:
        name: Optional[str] = field(default=None)
        price: Optional[float] = field(default=None)
- default_input_processor
The default input processor to use for those fields which don’t specify one.
- default_output_processor
The default output processor to use for those fields which don’t specify one.
- selector
The Selector object to extract data from. It’s the selector given in the __init__ method. This attribute is meant to be read-only.
- add_css(field_name: str | None, css: str | Iterable[str], *processors: Callable[..., Any], re: str | Pattern[str] | None = None, **kw: Any) → Self
Similar to ItemLoader.add_value() but receives a CSS selector instead of a value, which is used to extract a list of strings from the selector associated with this ItemLoader.
See get_css() for kwargs.
- Parameters:
css (str) – the CSS selector to extract data from
- Returns:
The current ItemLoader instance for method chaining.
- Return type:
Self
Examples:
    # HTML snippet: <p class="product-name">Color TV</p>
    loader.add_css('name', 'p.product-name')
    # HTML snippet: <p id="price">the price is $1200</p>
    loader.add_css('price', 'p#price', re='the price is (.*)')
- add_jmes(field_name: str | None, jmes: str, *processors: Callable[..., Any], re: str | Pattern[str] | None = None, **kw: Any) → Self
Similar to ItemLoader.add_value() but receives a JMESPath selector instead of a value, which is used to extract a list of strings from the selector associated with this ItemLoader.
See get_jmes() for kwargs.
- Parameters:
jmes (str) – the JMESPath selector to extract data from
- Returns:
The current ItemLoader instance for method chaining.
- Return type:
Self
Examples:
    # JSON snippet: {"name": "Color TV"}
    loader.add_jmes('name', 'name')
    # JSON snippet: {"price": "the price is $1200"}
    loader.add_jmes('price', 'price', TakeFirst(), re='the price is (.*)')
- add_value(field_name: str | None, value: Any, *processors: Callable[..., Any], re: str | Pattern[str] | None = None, **kw: Any) → Self
Process and then add the given value for the given field.
The value is first passed through get_value() by giving the processors and kwargs, and then passed through the field input processor and its result appended to the data collected for that field. If the field already contains collected data, the new data is appended to it.
The given field_name can be None, in which case values for multiple fields may be added. In that case, the processed value should be a dict mapping field names to values.
- Returns:
The current ItemLoader instance for method chaining.
- Return type:
Self
Examples:
    loader.add_value('name', 'Color TV')
    loader.add_value('colours', ['white', 'blue'])
    loader.add_value('length', '100')
    loader.add_value('name', 'name: foo', TakeFirst(), re='name: (.+)')
    loader.add_value(None, {'name': 'foo', 'sex': 'male'})
- add_xpath(field_name: str | None, xpath: str | Iterable[str], *processors: Callable[..., Any], re: str | Pattern[str] | None = None, **kw: Any) → Self
Similar to ItemLoader.add_value() but receives an XPath instead of a value, which is used to extract a list of strings from the selector associated with this ItemLoader.
See get_xpath() for kwargs.
- Parameters:
xpath (str) – the XPath to extract data from
- Returns:
The current ItemLoader instance for method chaining.
- Return type:
Self
Examples:
    # HTML snippet: <p class="product-name">Color TV</p>
    loader.add_xpath('name', '//p[@class="product-name"]')
    # HTML snippet: <p id="price">the price is $1200</p>
    loader.add_xpath('price', '//p[@id="price"]', re='the price is (.*)')
- get_collected_values(field_name: str) → List[Any]
Return the collected values for the given field.
- get_css(css: str | Iterable[str], *processors: Callable[..., Any], re: str | Pattern[str] | None = None, **kw: Any) → Any
Similar to ItemLoader.get_value() but receives a CSS selector instead of a value, which is used to extract a list of strings from the selector associated with this ItemLoader.
- Parameters:
css (str) – the CSS selector to extract data from
re (str or Pattern[str]) – a regular expression to use for extracting data from the selected CSS region
Examples:
    # HTML snippet: <p class="product-name">Color TV</p>
    loader.get_css('p.product-name')
    # HTML snippet: <p id="price">the price is $1200</p>
    loader.get_css('p#price', TakeFirst(), re='the price is (.*)')
- get_jmes(jmes: str | Iterable[str], *processors: Callable[..., Any], re: str | Pattern[str] | None = None, **kw: Any) → Any
Similar to ItemLoader.get_value() but receives a JMESPath selector instead of a value, which is used to extract a list of strings from the selector associated with this ItemLoader.
- Parameters:
jmes (str) – the JMESPath selector to extract data from
re (str or Pattern[str]) – a regular expression to use for extracting data from the selected JMESPath
Examples:
    # JSON snippet: {"name": "Color TV"}
    loader.get_jmes('name')
    # JSON snippet: {"price": "the price is $1200"}
    loader.get_jmes('price', TakeFirst(), re='the price is (.*)')
- get_output_value(field_name: str) → Any
Return the collected values parsed using the output processor, for the given field. This method doesn’t populate or modify the item at all.
- get_value(value: Any, *processors: Callable[..., Any], re: str | Pattern[str] | None = None, **kw: Any) → Any
Process the given value by the given processors and keyword arguments.
Available keyword arguments:
- Parameters:
re (str or Pattern[str]) – a regular expression to use for extracting data from the given value using the extract_regex() method, applied before the processors
Examples:
    >>> from itemloaders import ItemLoader
    >>> from itemloaders.processors import TakeFirst
    >>> loader = ItemLoader()
    >>> loader.get_value('name: foo', TakeFirst(), str.upper, re='name: (.+)')
    'FOO'
- get_xpath(xpath: str | Iterable[str], *processors: Callable[..., Any], re: str | Pattern[str] | None = None, **kw: Any) → Any
Similar to ItemLoader.get_value() but receives an XPath instead of a value, which is used to extract a list of strings from the selector associated with this ItemLoader.
- Parameters:
xpath (str) – the XPath to extract data from
re (str or Pattern[str]) – a regular expression to use for extracting data from the selected XPath region
Examples:
    # HTML snippet: <p class="product-name">Color TV</p>
    loader.get_xpath('//p[@class="product-name"]')
    # HTML snippet: <p id="price">the price is $1200</p>
    loader.get_xpath('//p[@id="price"]', TakeFirst(), re='the price is (.*)')
- load_item() → Any
Populate the item with the data collected so far, and return it. The data collected is first passed through the output processors to get the final value to assign to each item field.
- nested_css(css: str, **context: Any) → Self
Create a nested loader with a CSS selector. The supplied selector is applied relative to the selector associated with this ItemLoader. The nested loader shares the item with the parent ItemLoader, so calls to add_xpath(), add_value(), replace_value(), etc. will behave as expected.
- nested_xpath(xpath: str, **context: Any) → Self
Create a nested loader with an XPath selector. The supplied selector is applied relative to the selector associated with this ItemLoader. The nested loader shares the item with the parent ItemLoader, so calls to add_xpath(), add_value(), replace_value(), etc. will behave as expected.
- replace_css(field_name: str | None, css: str | Iterable[str], *processors: Callable[..., Any], re: str | Pattern[str] | None = None, **kw: Any) → Self
Similar to add_css() but replaces collected data instead of adding it.
- Returns:
The current ItemLoader instance for method chaining.
- Return type:
Self
- replace_jmes(field_name: str | None, jmes: str | Iterable[str], *processors: Callable[..., Any], re: str | Pattern[str] | None = None, **kw: Any) → Self
Similar to add_jmes() but replaces collected data instead of adding it.
- Returns:
The current ItemLoader instance for method chaining.
- Return type:
Self
- replace_value(field_name: str | None, value: Any, *processors: Callable[..., Any], re: str | Pattern[str] | None = None, **kw: Any) → Self
Similar to add_value() but replaces the collected data with the new value instead of adding it.
- Returns:
The current ItemLoader instance for method chaining.
- Return type:
Self
- replace_xpath(field_name: str | None, xpath: str | Iterable[str], *processors: Callable[..., Any], re: str | Pattern[str] | None = None, **kw: Any) → Self
Similar to add_xpath() but replaces collected data instead of adding it.
- Returns:
The current ItemLoader instance for method chaining.
- Return type:
Self