Input and Output processors¶
An Item Loader contains one input processor and one output processor for each
(item) field. The input processor processes the extracted data as soon as it’s
received (through the
add_value() methods) and the result of the input processor is
collected and kept inside the ItemLoader. After collecting all data, the
ItemLoader.load_item() method is called to populate and get the populated
item object. That’s when the output processor is
called with the data previously collected (and processed using the input
processor). The result of the output processor is the final value that gets
assigned to the item.
Let’s see an example to illustrate how the input and output processors are called for a particular field (the same applies for any other field):
l = ItemLoader(selector=some_selector) l.add_xpath('name', xpath1) # (1) l.add_xpath('name', xpath2) # (2) l.add_css('name', css) # (3) l.add_value('name', 'test') # (4) return l.load_item() # (5)
So what happens is:
xpath1is extracted, and passed through the input processor of the
namefield. The result of the input processor is collected and kept in the Item Loader (but not yet assigned to the item).
xpath2is extracted, and passed through the same input processor used in (1). The result of the input processor is appended to the data collected in (1) (if any).
This case is similar to the previous ones, except that the data is extracted from the
cssCSS selector, and passed through the same input processor used in (1) and (2). The result of the input processor is appended to the data collected in (1) and (2) (if any).
This case is also similar to the previous ones, except that the value to be collected is assigned directly, instead of being extracted from a XPath expression or a CSS selector. However, the value is still passed through the input processors. In this case, since the value is not iterable it is converted to an iterable of a single element before passing it to the input processor, because input processor always receive iterables.
The data collected in steps (1), (2), (3) and (4) is passed through the output processor of the
namefield. The result of the output processor is the value assigned to the
namefield in the item.
It’s worth noticing that processors are just callable objects, which are called with the data to be parsed, and return a parsed value. So you can use any function as input or output processor. The only requirement is that they must accept one (and only one) positional argument, which will be an iterable.
Both input and output processors must receive an iterable as their first argument. The output of those functions can be anything. The result of input processors will be appended to an internal list (in the Loader) containing the collected values (for that field). The result of the output processors is the value that will be finally assigned to the item.
The other thing you need to keep in mind is that the values returned by input processors are collected internally (in lists) and then passed to output processors to populate the fields.
Last, but not least,
itemloaders comes with some commonly used processors built-in for convenience.