Comparing Records - normalize.diff

Objects can be compared using the normalize.diff.diff() or normalize.diff.diff_iter() functions, or by calling the equivalent instance methods, normalize.record.Record.diff() or normalize.record.Record.diff_iter().

The iterator versions (diff_iter) yield DiffInfo records, and the eager versions (diff) return a Diff. These objects are instances of Record and RecordList respectively.

All of the diff functions and methods take a single ‘other’ object, plus keyword arguments that customize the diff operation. The keyword arguments are passed to the DiffOptions constructor, and the resulting options object is handed down through the recursive calls so that nested values are compared with the same settings. The exception is the keyword argument options=, which supplies a pre-constructed (perhaps sub-classed) DiffOptions instance.

There are some examples of this in Comparing object structures, and more in the normalize test suite.

Class reference

normalize.diff.diff(base, other, **kwargs)[source]

Eager version of diff_iter(), which takes all the same options and returns a Diff instance.

normalize.diff.diff_iter(base, other, options=None, **kwargs)[source]

Compare a Record with another object (usually a record of the same type), and yield differences as DiffInfo instances.

args:
base=Record
The ‘base’ object to compare against. The enumeration in DiffTypes is relative to this object.
other=Record|<object>
The ‘other’ object to compare with the base. Unless duck_type is true, it must be of the same type as the base.
**kwargs
Specify comparison options: duck_type, ignore_ws, etc. See normalize.diff.DiffOptions.__init__() for the complete list.
options=DiffOptions instance
Pass in a pre-constructed DiffOptions instance. This may not be specified along with **kwargs.
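
The rule that options= and loose keyword arguments are mutually exclusive can be pictured with a small stand-alone sketch. Options and resolve_options below are hypothetical stand-ins written for illustration, not part of normalize, and the exception type raised is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Options:
    """Hypothetical stand-in for DiffOptions, modelling two of its flags."""
    ignore_ws: bool = True
    ignore_case: bool = False


def resolve_options(options=None, **kwargs):
    """Return one options object from either options= or loose kwargs."""
    if options is not None and kwargs:
        # Assumed behaviour: mixing both styles is rejected.
        raise TypeError("pass either options= or keyword arguments, not both")
    if options is None:
        options = Options(**kwargs)
    return options


built = resolve_options(ignore_case=True)   # built from keyword arguments
shared = resolve_options(options=built)     # pre-constructed instance reused
```

Building the options object once and reusing it is what allows the same settings to reach every nested comparison.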
class normalize.diff.Diff(values=None, **kwargs)[source]

Bases: normalize.coll.ListCollection

Container for a list of differences.

base_type_name

Type name of the source (base) object.

other_type_name

Type name of the compared object; normally the same, unless the duck_type option was specified.

itemtype

alias of DiffInfo

class normalize.diff.DiffInfo(init_dict=None, **kwargs)[source]

Container for storing diff information that can be used to reconstruct the values diffed.

diff_type

Enumeration describing the type of difference; a DiffTypes value.

base

A FieldSelector object referring to the location within the base object where the changed field was found. If the diff_type is DiffTypes.ADDED, then this will be the location of the record the field was added in, not the (non-existent) field itself.

other

A FieldSelector object referring to the location within the ‘other’ object where the changed field was found. If the diff_type is DiffTypes.REMOVED, then this will be the location of the record the field was removed from, not the (non-existent) field itself.
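
To make the base/other convention for added and removed fields concrete, here is a minimal sketch over flat dicts. Change and flat_diff_iter are hypothetical names; paths are plain tuples rather than FieldSelector objects, and the diff types are plain strings rather than DiffTypes values.

```python
from collections import namedtuple

# Hypothetical stand-in for DiffInfo; paths are tuples, not FieldSelectors.
Change = namedtuple("Change", ["diff_type", "base", "other"])


def flat_diff_iter(base, other, path=()):
    """Yield a Change for each key that differs between two flat dicts."""
    for key in sorted(set(base) | set(other)):
        here = path + (key,)
        if key not in base:
            # Added: no such field in base, so base points at the parent.
            yield Change("added", path, here)
        elif key not in other:
            # Removed: no such field in other, so other points at the parent.
            yield Change("removed", here, path)
        elif base[key] != other[key]:
            yield Change("modified", here, here)


changes = list(flat_diff_iter(
    {"name": "Ann", "age": 30},
    {"name": "Anne", "email": "a@example.com"},
))
```

Wrapping the generator in list() mirrors the relationship between an iterator-style diff and its eager counterpart.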

class normalize.diff.DiffTypes[source]

A richenum.OrderedRichEnum type to denote the type of an individual difference.

NO_CHANGE = <EnumValue #1: none ('UNCHANGED')>
ADDED = <EnumValue #2: added ('ADDED')>
REMOVED = <EnumValue #3: removed ('REMOVED')>
MODIFIED = <EnumValue #4: modified ('MODIFIED')>
class normalize.diff.DiffOptions(ignore_ws=True, ignore_case=False, unicode_normal=True, unchanged=False, ignore_empty_slots=False, ignore_empty_items=False, duck_type=False, extraneous=False, compare_filter=None, fuzzy_match=True)[source]

Optional data structure to pass diff options down. Some functions are delegated to this object, allowing for further customization of operation, forming the DiffOptions sub-class API.

__init__(ignore_ws=True, ignore_case=False, unicode_normal=True, unchanged=False, ignore_empty_slots=False, ignore_empty_items=False, duck_type=False, extraneous=False, compare_filter=None, fuzzy_match=True)[source]

Create a new DiffOptions instance.

args:

ignore_ws=BOOL
Ignore whitespace in strings (beginning, end and middle). True by default.
ignore_case=BOOL
Ignore case differences in strings. False by default.
unicode_normal=BOOL
Ignore unicode normal form differences in strings by normalizing to NFC before comparison. True by default.
unchanged=BOOL
Yield DiffInfo objects for every comparison, not just those that found a difference. False by default. Useful for testing.
ignore_empty_slots=BOOL
If true, slots containing typical ‘empty’ values (by default, just '' and None) are treated as if they were not set. False by default.
ignore_empty_items=BOOL
If true, items are considered to be absent from collections if all of their primary key fields are None, not set, or '' (in the absence of a primary key definition, all compared fields are checked instead). False by default.
duck_type=BOOL

Normally, types must match or the result will always be normalize.diff.DiffTypes.MODIFIED and the comparison will not descend further.

However, setting this option bypasses this check, and just checks that the ‘other’ object has all of the properties defined on the ‘base’ type. This can be used to check progress when porting from other object systems to normalize.

fuzzy_match=BOOL
Enable approximate matching of items in collections, so that a finer granularity of changes is available.
compare_filter=MULTIFIELDSELECTOR|LIST_OF_LISTS
Restrict comparison to the fields described by the passed MultiFieldSelector (or list of FieldSelector lists/objects).
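
The duck_type check described in the list above can be sketched without the library. Person, LegacyUser, the fields attribute and comparable() are all hypothetical; the sketch only illustrates the idea of "the other object has every property the base type defines".

```python
class Person:
    """Hypothetical record-like class; `fields` stands in for its properties."""
    fields = ("name", "email")

    def __init__(self, name, email):
        self.name = name
        self.email = email


class LegacyUser:
    """Unrelated 'alien' class that happens to expose the same attributes."""

    def __init__(self, name, email, password):
        self.name = name
        self.email = email
        self.password = password


def comparable(base, other, duck_type=False):
    """Decide whether a field-by-field comparison may proceed."""
    if isinstance(other, type(base)):
        return True
    if not duck_type:
        return False  # mismatched types: treated as modified, no descent
    return all(hasattr(other, name) for name in type(base).fields)


ann = Person("Ann", "ann@example.com")
old = LegacyUser("Ann", "ann@example.com", "hunter2")
```

With duck typing enabled, extra attributes on the other object (here, password) do not prevent the comparison.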
items_equal(a, b)[source]

Sub-class hook which performs value comparison. Only called for comparisons which are not Records.
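
As an illustration of how a value-comparison hook like this might be overridden, the sketch below uses a hypothetical BaseOptions class in place of DiffOptions. Only the hook itself is modelled, and the float tolerance is an invented example, not library behaviour.

```python
import math


class BaseOptions:
    """Hypothetical stand-in for DiffOptions; only the hook is modelled."""

    def items_equal(self, a, b):
        return a == b


class TolerantOptions(BaseOptions):
    """Treat floats that differ only by rounding noise as equal."""

    def items_equal(self, a, b):
        if isinstance(a, float) and isinstance(b, float):
            return math.isclose(a, b, rel_tol=1e-9)
        return super().items_equal(a, b)


strict, tolerant = BaseOptions(), TolerantOptions()
total = 0.1 + 0.2  # not exactly 0.3 in binary floating point
```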

normalize_whitespace(value)[source]

Normalizes whitespace; called if ignore_ws is true.

normalize_unf(value)[source]

Normalizes Unicode Normal Form (to NFC); called if unicode_normal is true.

normalize_case(value)[source]

Normalizes case (to upper case); called if ignore_case is true.

value_is_empty(value)[source]

This method decides whether the value is ‘empty’, and hence the same as not specified. Called if ignore_empty_slots is true. Checking the value for emptiness happens after all other normalization.
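
A minimal sketch of the emptiness rule, assuming the default 'empty' values named earlier ('' and None). significant_slots is a hypothetical helper that shows the effect of ignore_empty_slots on plain dicts; it is not a library function.

```python
def value_is_empty(value):
    """Treat '' and None as empty, mirroring the documented defaults."""
    return value is None or value == ""


def significant_slots(slots, ignore_empty_slots=False):
    """Drop empty slots from a dict when ignore_empty_slots is true."""
    if not ignore_empty_slots:
        return dict(slots)
    return {k: v for k, v in slots.items() if not value_is_empty(v)}


a = {"name": "Ann", "nickname": ""}
b = {"name": "Ann"}
```

With the option off, a and b differ by the nickname slot; with it on, the empty slot is treated as not set and the two compare equal.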

normalize_text(value)[source]

This hook is called by DiffOptions.normalize_val() if the value (after slot/item normalization) is a string, and is responsible for calling the various normalize_foo methods which act on text.
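
The text clean-ups can be approximated with the standard library. This is a sketch under assumptions: the order of the steps and the exact whitespace rule (trim the ends, collapse inner runs to one space) are guesses, and the function is a free-standing stand-in rather than the DiffOptions method.

```python
import unicodedata


def normalize_text(value, ignore_ws=True, ignore_case=False, unicode_normal=True):
    """Scrub a string before comparison, one optional step per flag."""
    if unicode_normal:
        value = unicodedata.normalize("NFC", value)
    if ignore_case:
        value = value.upper()
    if ignore_ws:
        # Assumption: trim both ends and collapse inner runs to one space.
        value = " ".join(value.split())
    return value


decomposed = "Cafe\u0301  au   lait "   # 'e' followed by a combining accent
composed = "Caf\u00e9 au lait"          # precomposed accented letter
```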

normalize_val(value=(not set))[source]

Hook which is called on every value before comparison, and should return the scrubbed value or self._nothing to indicate that the value is not set.

normalize_slot(value=(not set), prop=None)[source]

Hook which is called on every record slot; this is a way to perform context-aware clean-ups.

args:

value=nothing|anything
The value in the slot. nothing can be detected in sub-class methods as self._nothing.
prop=PROPERTY
The slot’s normalize.property.Property instance. If this instance has a compare_as method, then that method is called to perform a clean-up before the value is passed to normalize_val.
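
The compare_as mechanism can be pictured as follows. PhoneProp and PlainProp are hypothetical property objects, and this free-standing normalize_slot only models the "call compare_as if present" step; the real hook goes on to pass the result to normalize_val.

```python
class PhoneProp:
    """Hypothetical property object exposing a compare_as clean-up."""

    def compare_as(self, value):
        # Keep digits only, so formatting differences are ignored.
        return "".join(ch for ch in value if ch.isdigit())


class PlainProp:
    """Hypothetical property with no compare_as method."""


def normalize_slot(value, prop=None):
    """Apply the property's compare_as clean-up, if it defines one."""
    compare_as = getattr(prop, "compare_as", None)
    if callable(compare_as):
        value = compare_as(value)
    return value


cleaned = normalize_slot("(555) 010-2030", PhoneProp())
```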
normalize_item(value=(not set), coll=None, index=None)[source]

Hook which is called on every collection item; this is a way to perform context-aware clean-ups.

args:

value=nothing|anything
The value in the collection slot. nothing can be detected in sub-class methods as self._nothing.
coll=COLLECTION
The parent normalize.coll.Collection instance. If this instance has a compare_item_as method, then that method is called to perform a clean-up before the value is passed to normalize_val.
index=HASHABLE
The key of the item in the collection.
record_id(record, type_=None, selector=None)[source]

Retrieve an object identifier from the given record; if it is an alien class, and the type is provided, then use duck typing to get the corresponding fields of the alien class.

Comparison functions

These functions call each other recursively, depending on the values encountered while walking the base object. They are documented here to give insight into the workings of more user-facing APIs such as diff, but should be considered implementation details.

normalize.diff.compare_record_iter(a, b, fs_a=None, fs_b=None, options=None)[source]

This generator function compares a record, slot by slot, and yields differences found as DiffInfo objects.

args:

a=Record
The base object
b=Record|object
The ‘other’ object, which must be the same type as a, unless options.duck_type is set.
fs_a=*FieldSelector*
The current diff context, prefixed to any returned base field in yielded DiffInfo objects. Defaults to an empty FieldSelector.
fs_b=*FieldSelector*
The other object context. This will differ from fs_a in the case of collections, where a value has moved slots. Defaults to an empty FieldSelector.
options=*DiffOptions*
A constructed DiffOptions object; a default one is created if not passed in.
normalize.diff.compare_collection_iter(propval_a, propval_b, fs_a=None, fs_b=None, options=None)[source]

Generator function to compare two collections and yield differences. This function does not currently report moved items in collections. It uses the DiffOptions.record_id() method to decide whether two objects are to be considered the same item, in which case the differences within them are yielded.

Arguments are the same as compare_record_iter().

Note that diff_iter and compare_record_iter will call both this function and compare_record_iter on RecordList types. However, as most RecordList types have no extra properties, compare_record_iter usually yields no differences for them.
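
Matching collection items by identity rather than position can be sketched with plain dicts. Here record_id is a hypothetical free function keyed on an "id" field, and compare_by_id yields simple tuples; neither is the library's implementation.

```python
def record_id(item, key_fields=("id",)):
    """Hypothetical identity function: a tuple of the item's key fields."""
    return tuple(item.get(f) for f in key_fields)


def compare_by_id(base_items, other_items):
    """Yield (diff_type, id) pairs for two lists of dicts matched by identity."""
    base_by_id = {record_id(i): i for i in base_items}
    other_by_id = {record_id(i): i for i in other_items}
    for rid, item in base_by_id.items():
        if rid not in other_by_id:
            yield ("removed", rid)
        elif item != other_by_id[rid]:
            yield ("modified", rid)
    for rid in other_by_id:
        if rid not in base_by_id:
            yield ("added", rid)


base = [{"id": 1, "qty": 2}, {"id": 2, "qty": 5}]
other = [{"id": 2, "qty": 6}, {"id": 1, "qty": 2}, {"id": 3, "qty": 1}]
result = list(compare_by_id(base, other))
```

Item 1 changes position between the two lists but produces no entry, which matches the note above that moved items are not reported.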

normalize.diff.compare_list_iter(propval_a, propval_b, fs_a=None, fs_b=None, options=None)[source]

Generator for comparing ‘simple’ lists when they are encountered. This does not currently recurse further. Arguments are as per other compare_X functions.
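
What "does not recurse further" means can be shown with a positional sketch. This assumes simple lists are compared index by index, which the text does not state; compare_simple_lists is a hypothetical name.

```python
from itertools import zip_longest

_MISSING = object()  # sentinel for "no element at this index"


def compare_simple_lists(a, b):
    """Yield (diff_type, index) pairs, comparing elements with == only."""
    for index, (x, y) in enumerate(zip_longest(a, b, fillvalue=_MISSING)):
        if x is _MISSING:
            yield ("added", index)
        elif y is _MISSING:
            yield ("removed", index)
        elif x != y:
            # Nested values are compared whole; there is no descent into them.
            yield ("modified", index)


found = list(compare_simple_lists([1, [2, 3], 4], [1, [2, 9]]))
```

The nested list at index 1 is reported as a single modification rather than as a change to its second element.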

normalize.diff.compare_dict_iter(propval_a, propval_b, fs_a=None, fs_b=None, options=None)[source]

Generator for comparing ‘simple’ dicts when they are encountered. This does not currently recurse further. Arguments are as per other compare_X functions.

normalize.diff.collection_generator(collection)[source]

This function returns a generator which iterates over the collection, similar to Collection.itertuples(). Collections are viewed by this module, regardless of type, as a mapping from an index to a value. For sets, the “index” is the value itself (i.e., (V, V)). For dicts, it’s a string, and for lists, it’s an int.

In general, this function defers to itertuples and/or iteritems methods defined on the instances; however, when duck typing, this function typically provides the generator.
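
The index-to-value view of collections can be sketched for built-in types. index_value_pairs is a hypothetical stand-in that handles only dict, set and list-like inputs; it does not defer to itertuples or iteritems as the real function is described as doing.

```python
def index_value_pairs(collection):
    """Yield (index, value) pairs for dicts, sets and list-like sequences."""
    if isinstance(collection, dict):
        yield from collection.items()          # index is the key
    elif isinstance(collection, (set, frozenset)):
        for value in collection:
            yield (value, value)               # index is the value itself
        return
    else:
        yield from enumerate(collection)       # index is the position


from_list = list(index_value_pairs(["a", "b"]))
from_dict = list(index_value_pairs({"k": 1}))
from_set = list(index_value_pairs({"x"}))
```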
