itemloaders
itemloaders provides a convenient mechanism for populating data records.
Its design offers a flexible, efficient, and easy way to extend and override field parsing rules, either by raw data or by source format (HTML, XML, etc.), without becoming a nightmare to maintain.
To install itemloaders, run:
pip install itemloaders
Note
Under the hood, itemloaders uses itemadapter as a common interface.
This means you can use any of the types supported by itemadapter here.
Warning
Support for dataclasses and attrs is still experimental.
Please refer to default_item_class in the API Reference for more information.
Getting Started with itemloaders
To use an Item Loader, you must first instantiate it. You can either instantiate it with a dict-like object (item) or without one, in which case an item is automatically instantiated in the Item Loader __init__ method using the item class specified in the ItemLoader.default_item_class attribute.
Then, you start collecting values into the Item Loader, typically using CSS or XPath Selectors. You can add more than one value to the same item field; the Item Loader will know how to “join” those values later using a proper processing function.
Note
Collected data is stored internally as lists, which allows several values to be added to the same field.
If an item argument is passed when creating a loader, each of the item's values is stored as-is if it is already an iterable, or wrapped in a list if it is a single value.
Here is a typical Item Loader usage:
from itemloaders import ItemLoader
from parsel import Selector
html_data = '''
<!DOCTYPE html>
<html>
<head>
<title>Some random product page</title>
</head>
<body>
<div class="product_name">Some random product page</div>
<p id="price">$ 100.12</p>
</body>
</html>
'''
l = ItemLoader(selector=Selector(html_data))
l.add_xpath('name', '//div[@class="product_name"]/text()')
l.add_xpath('name', '//div[@class="product_title"]/text()')
l.add_css('price', '#price::text')
l.add_value('last_updated', 'today') # you can also use literal values
item = l.load_item()
item
# {'name': ['Some random product page'], 'price': ['$ 100.12'], 'last_updated': ['today']}
By quickly looking at that code, we can see the name field is being extracted from two different XPath locations in the page:
//div[@class="product_name"]
//div[@class="product_title"]
In other words, data is being collected by extracting it from two XPath locations, using the add_xpath() method. This is the data that will be assigned to the name field later.
Afterwards, a similar call is used for the price field, using a CSS selector with the add_css() method, and finally the last_updated field is populated directly with a literal value (today) using a different method: add_value().
Finally, when all data is collected, the ItemLoader.load_item() method is called, which returns the item populated with the data previously extracted and collected with the add_xpath(), add_css(), and add_value() calls.
Contents
- Declaring Item Loaders
- Input and Output processors
- Item Loader Context
- Nested Loaders
- Reusing and extending Item Loaders
- Available built-in processors
- API Reference
  - ItemLoader
    - ItemLoader.item
    - ItemLoader.context
    - ItemLoader.default_item_class
    - ItemLoader.default_input_processor
    - ItemLoader.default_output_processor
    - ItemLoader.selector
    - ItemLoader.add_css()
    - ItemLoader.add_jmes()
    - ItemLoader.add_value()
    - ItemLoader.add_xpath()
    - ItemLoader.get_collected_values()
    - ItemLoader.get_css()
    - ItemLoader.get_jmes()
    - ItemLoader.get_output_value()
    - ItemLoader.get_value()
    - ItemLoader.get_xpath()
    - ItemLoader.load_item()
    - ItemLoader.nested_css()
    - ItemLoader.nested_xpath()
    - ItemLoader.replace_css()
    - ItemLoader.replace_jmes()
    - ItemLoader.replace_value()
    - ItemLoader.replace_xpath()
- Release notes
  - itemloaders 1.3.2 (2024-09-30)
  - itemloaders 1.3.1 (2024-06-03)
  - itemloaders 1.3.0 (2024-05-30)
  - itemloaders 1.2.0 (2024-04-18)
  - itemloaders 1.1.0 (2023-04-21)
  - itemloaders 1.0.6 (2022-08-29)
  - itemloaders 1.0.5 (2022-08-25)
  - itemloaders 1.0.4 (2020-11-12)
  - itemloaders 1.0.3 (2020-09-09)
  - itemloaders 1.0.2 (2020-08-05)
  - itemloaders 1.0.1 (2020-07-02)
  - itemloaders 1.0.0 (2020-05-18)