How to use Python’s “diff” functionality

Python's standard library includes the difflib module, which can compare two documents and produce a set of patch instructions to get from one to the other. This is useful to 1) get quick insight into the differences between the two (if any), or 2) ship a list of changes from which a later version of a document can be derived, which is typically much lighter than sending the whole updated document when the original is already available on the receiving end.

This is an example of the few lines of code required to generate a list of those instructions, and how to apply them to derive the updated document.

Our documents, for the purpose of this example:

original = """
line1
line2
line3
line4
line5
"""

updated = """
line3
line4
line5
line6
line7
"""

If all you want is to get a list of adds and removes, use ndiff:

from difflib import ndiff

def get_updates(original, updated):
    """Return a 2-tuple of (adds, removes) describing the changes to get from
    ORIGINAL to UPDATED.
    """

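    # ndiff() compares two lists of lines and prefixes every output row with
    # a two-character code: '  ' (unchanged), '+ ' (added), '- ' (removed) or
    # '? ' (an intraline hint, which the loop below ignores).
    diff = ndiff(original.split("\n"), updated.split("\n"))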

    adds = set()
    deletes = set()
    for row in diff:
        diff_type = row[0]
        if diff_type == ' ':
            continue

        entry = row[2:]

        if diff_type == '+':
            adds.add(entry)
        elif diff_type == '-':
            deletes.add(entry)

    return (list(adds), list(deletes))

Run:

updates = get_updates(original, updated)

updates contains a 2-tuple of adds and removes, respectively. Because get_updates() collects the lines in sets, the order within each list is not guaranteed:

(['line7', 'line6'], ['line2', 'line1'])
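For the "quick insight" use case, difflib can also render the differences in the familiar unified format. The following is a minimal sketch, not part of the original example: render_diff is a hypothetical helper name, and the "original" and "updated" file labels are arbitrary.

```python
from difflib import unified_diff

original = "\nline1\nline2\nline3\nline4\nline5\n"
updated = "\nline3\nline4\nline5\nline6\nline7\n"


def render_diff(original, updated):
    """Return a unified diff of two multi-line strings as a single string."""
    diff_lines = unified_diff(
        original.splitlines(keepends=True),
        updated.splitlines(keepends=True),
        fromfile="original",  # arbitrary label for the diff header
        tofile="updated",     # arbitrary label for the diff header
    )
    # unified_diff() is a generator of already newline-terminated rows.
    return "".join(diff_lines)


print(render_diff(original, updated))
```

Removed lines are prefixed with "-" and added lines with "+". Two identical inputs produce an empty string.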

If, on the other hand, you do actually need a full set of patch instructions, use SequenceMatcher:

from difflib import SequenceMatcher

def get_transforms(original, updated):
    """Get a list of patch instructions to get from ORIGINAL to UPDATED."""

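    # Both arguments are strings, so the matcher works character by character
    # and every index below is a character offset.
    s = SequenceMatcher(None, original, updated)

    # Each transform is a tuple whose first element is a tag:
    # '-' (delete), '+' (insert), '>' (replace) or '=' (keep).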

    transforms = []
    for tag, i1, i2, j1, j2 in s.get_opcodes():
        if tag == 'delete':
            transform = ('-', (i1, i2))
        elif tag == 'insert':
            transform = ('+', (i1, i2), updated[j1:j2])
        elif tag == 'replace':
            transform = ('>', (i1, i2), updated[j1:j2])
        else:
            transform = ('=', (i1, i2), (j1, j2))

        transforms.append(transform)

    return transforms

def apply_transforms(original, transforms):
    """Execute the transform instructions returned from get_transforms() to
    derive UPDATED from ORIGINAL.
    """

    updated = []
    for transform in transforms:
        if transform[0] == '-':
            pass
        elif transform[0] == '+':
            updated.append(transform[2])
        elif transform[0] == '>':
            updated.append(transform[2])
        else: # Equals.
            updated.append(original[transform[1][0]:transform[1][1]])

    return ''.join(updated)

Run:

transforms = get_transforms(original, updated)

transforms contains:

[('-', (0, 12)), ('=', (12, 31), (0, 19)), ('+', (31, 31), 'line6\nline7\n')]

To derive updated from original:

updated_derived = apply_transforms(original, transforms)
print(updated == updated_derived)

Which displays:

True
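The transforms above use character offsets because two strings were passed to SequenceMatcher. It accepts any sequences of hashable items, so the same idea works on lists of lines. A sketch under that assumption follows; get_line_transforms and apply_line_transforms are hypothetical names, not part of the original example.

```python
from difflib import SequenceMatcher

original = "\nline1\nline2\nline3\nline4\nline5\n"
updated = "\nline3\nline4\nline5\nline6\nline7\n"


def get_line_transforms(original_lines, updated_lines):
    """Return patch instructions whose indexes count lines, not characters."""
    matcher = SequenceMatcher(None, original_lines, updated_lines)

    transforms = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == 'equal':
            # Unchanged lines are referenced by position only.
            transforms.append(('=', (i1, i2)))
        elif tag == 'delete':
            transforms.append(('-', (i1, i2)))
        else:
            # 'insert' and 'replace' both carry the new lines.
            symbol = '+' if tag == 'insert' else '>'
            transforms.append((symbol, (i1, i2), updated_lines[j1:j2]))
    return transforms


def apply_line_transforms(original_lines, transforms):
    """Rebuild the updated list of lines from the original and the patch."""
    result = []
    for transform in transforms:
        tag, (i1, i2) = transform[0], transform[1]
        if tag == '=':
            result.extend(original_lines[i1:i2])
        elif tag in ('+', '>'):
            result.extend(transform[2])
        # '-' contributes nothing to the result.
    return result


original_lines = original.splitlines(keepends=True)
updated_lines = updated.splitlines(keepends=True)

line_transforms = get_line_transforms(original_lines, updated_lines)
rebuilt = apply_line_transforms(original_lines, line_transforms)
print(rebuilt == updated_lines)
```

Line-based instructions tend to be easier to read, while character-based ones can be smaller when only a few characters within a long line change.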

Writing Your Own Timezone Implementation for Python

Python has the concept of “naive” and “aware” times. The former refers to a timezone-capable date/time object that hasn’t been assigned a timezone, and the latter refers to one that has.
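As a quick illustration of the distinction, the sketch below checks whether a datetime is naive or aware. It assumes Python 3.2 or later, where the standard library's datetime.timezone.utc is available as a ready-made tzinfo. is_aware is a hypothetical helper name.

```python
from datetime import datetime, timezone


def is_aware(dt):
    """Return True if dt carries a usable timezone, False if it is naive."""
    return dt.tzinfo is not None and dt.tzinfo.utcoffset(dt) is not None


naive = datetime(2024, 1, 15, 12, 0)
aware = naive.replace(tzinfo=timezone.utc)

print(is_aware(naive))
print(is_aware(aware))
```

Ordering comparisons between a naive and an aware datetime raise TypeError, which is usually the first sign that the two kinds have been mixed.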

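Historically, Python only provided an interface for “tzinfo” implementations (classes that define a particular timezone), not the implementations themselves. Newer versions add some: datetime.timezone (Python 3.2+) covers fixed offsets, and the zoneinfo module (Python 3.9+) covers named zones. On older versions, or when you need custom rules, you either have to write your own implementations or use something like the widely used “pytz” or “pytzpure” (a pure-Python version).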

This is a quick example of how to write your own, courtesy of Google:

from datetime import tzinfo, timedelta, datetime


class _TzBase(tzinfo):
    def utcoffset(self, dt):
        return timedelta(hours=self.get_offset()) + self.dst(dt)

    def _FirstSunday(self, dt):
        """First Sunday on or after dt."""
        return dt + timedelta(days=(6 - dt.weekday()))

    def dst(self, dt):
        # 2 am on the second Sunday in March
        dst_start = self._FirstSunday(datetime(dt.year, 3, 8, 2))
        # 1 am on the first Sunday in November
        dst_end = self._FirstSunday(datetime(dt.year, 11, 1, 1))

        if dst_start <= dt.replace(tzinfo=None) < dst_end:
            return timedelta(hours=1)
        else:
            return timedelta(hours=0)

    def tzname(self, dt):
        if self.dst(dt) == timedelta(hours=0):
            return self.get_tz_name()
        else:
            return self.get_tz_with_dst_name()

    def get_offset(self):
        """Returns the offset in hours (-5)."""
        
        raise NotImplementedError()

    def get_tz_name(self):
        """Returns the standard acronym (EST)."""
        
        raise NotImplementedError()
    
    def get_tz_with_dst_name(self):
        """Returns the DST version of the acronym ('EDT')."""        
        
        raise NotImplementedError()


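class TzGmt(_TzBase):
    """Implementation of the GMT timezone."""

    def dst(self, dt):
        # GMT never observes DST. Without this override, the US rules in
        # _TzBase.dst() would shift GMT by an hour during the US summer.
        return timedelta(hours=0)

    def get_offset(self):
        return 0

    def get_tz_name(self):
        return 'GMT'

    def get_tz_with_dst_name(self):
        return 'GMT'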


class TzEst(_TzBase):
    """Implementation of the EST timezone."""

    def get_offset(self):
        return -5

    def get_tz_name(self):
        return 'EST'
    
    def get_tz_with_dst_name(self):
        return 'EDT'

Use it, like so:

from datetime import datetime

now_est = datetime.now().replace(tzinfo=TzEst())
now_gmt = now_est.astimezone(TzGmt())

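This produces a datetime object with an EST timezone, and then uses it to produce a GMT time. Note that replace() only attaches the timezone to the local clock reading without converting it, so the first line is only correct on a machine whose local time is US Eastern.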
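To see which dates the dst() method actually compares against, the Sunday arithmetic can be run on its own. This is a standalone sketch that mirrors _FirstSunday; first_sunday_on_or_after and us_dst_window are hypothetical names.

```python
from datetime import datetime, timedelta


def first_sunday_on_or_after(dt):
    """Same arithmetic as _FirstSunday: Monday is weekday 0, Sunday is 6."""
    return dt + timedelta(days=(6 - dt.weekday()))


def us_dst_window(year):
    """Return the (start, end) naive datetimes that dst() compares against."""
    # Second Sunday in March: the first Sunday on or after March 8.
    start = first_sunday_on_or_after(datetime(year, 3, 8, 2))
    # First Sunday in November: the first Sunday on or after November 1.
    end = first_sunday_on_or_after(datetime(year, 11, 1, 1))
    return start, end


start, end = us_dst_window(2024)
print(start)  # 2024-03-10 02:00:00
print(end)    # 2024-11-03 01:00:00
```

Anchoring on March 8 works because the second Sunday of a month always falls between the 8th and the 14th.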

Using “dialog” for Nice, Easy, C-Based Console Dialogs

dialog is a great command-line dialog tool that lets you construct twenty-three types of dialog screens that rival the best of the available dialog utilities.

It’s as simple as running the following from the command-line:

dialog --yesno "Yes or no, please." 6 30

Very few of the users of dialog probably know that it can be statically linked to provide the same functionality in a C application. It doesn’t help that there is almost no documentation on the subject.

This is an example of how to create a “yesno” dialog:

#include <curses.h>
#include <dialog.h>

int main()
{
    int rc;
    init_dialog(stdin, stderr);
    rc = dialog_yesno("title", "message", 0, 0);
    end_dialog();

    return rc;
}

I explicitly pre-include curses.h so dialog.h won’t go looking in the wrong place. It might be different in your situation.

To build:

gcc -o example example.c -L dialogpath -I dialogpath -ldialog -lncurses -lm

Just configure and build your dialog sources, and then substitute that directory for dialogpath in the build command above.

This program will return an integer representing which button was pressed (0 for “Yes”, 1 for “No”), or whether the dialog was cancelled with ESC (255).
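A calling script can branch on those exit statuses. The sketch below does so from Python, driving the command-line tool shown earlier rather than the compiled C program. It assumes the dialog binary is on the PATH and that its statuses match the ones listed above. describe_exit_status and ask_yes_no are hypothetical names.

```python
import shutil
import subprocess

# Exit statuses as described above: 0 for yes, 1 for no, 255 for ESC.
EXIT_MEANINGS = {0: "yes", 1: "no", 255: "escape"}


def describe_exit_status(returncode):
    """Map an exit status to a label; anything unexpected is 'unknown'."""
    return EXIT_MEANINGS.get(returncode, "unknown")


def ask_yes_no(message, height=6, width=30):
    """Run `dialog --yesno` and return a label for the user's choice.

    Returns None when the dialog binary is not installed.
    """
    if shutil.which("dialog") is None:
        return None
    completed = subprocess.run(
        ["dialog", "--yesno", message, str(height), str(width)]
    )
    return describe_exit_status(completed.returncode)
```

Calling ask_yes_no("Yes or no, please.") from a terminal shows the same dialog as the command-line example and returns "yes", "no" or "escape".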