Using HMACs instead of Plain Hashes for Security

We all know that, these days, everyone should be storing passwords as cryptographic hashes (SHA256, for example). The Internet has been around too long for people not to know this. In addition, engineers who write communications code will often employ “message authentication codes” (“MACs”): hashes calculated over data that is passed back and forth to a remote system, which can be used to verify that the data hasn’t been modified in transit (preventing “man in the middle” attacks).

However, it is much less well known that hashes used in this manner are widely vulnerable to attack. For example (in Python):

from hashlib import sha256

salt = b"3qk4yfbgql"
message = b"some data"
salted_data = salt + message
digest_ = sha256(salted_data).hexdigest()

This is no good. Our common hash functions are “iterative hash functions”: they calculate the hash by iterating through the cleartext one chunk at a time. As a result, they suffer from what Bruce Schneier refers to as the “length extension” attack: it’s trivial for a third party to append data to the message, even though it’s salted, and calculate another, completely valid, MAC. If you move the salt to the end of the message, there are different problems.

Enter the HMAC (“keyed-hash message authentication code”). An HMAC implementation takes a key, the data, and a hash function, and computes, roughly: hash((key ⊕ opad) + hash((key ⊕ ipad) + message)), where ipad and opad are fixed padding constants. The nested construction is what defeats length extension.

import hmac
from hashlib import sha256

salt = b"3qk4yfbgql"
message = b"some data"
hmac_ = hmac.new(salt, message, sha256)
digest_ = hmac_.hexdigest()

Problem solved (for now).
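One more detail worth noting on the verifying side: compare MACs with hmac.compare_digest() rather than ==, since compare_digest() runs in constant time and avoids leaking information through timing side-channels. A minimal sketch, reusing the key and message from above:

```python
import hmac
from hashlib import sha256

key = b"3qk4yfbgql"
message = b"some data"

# The MAC we previously computed and sent along with the message.
expected = hmac.new(key, message, sha256).hexdigest()

# On the receiving end, recompute the MAC over the received message and
# compare with compare_digest() instead of a naive ==.
received_mac = hmac.new(key, message, sha256).hexdigest()
is_valid = hmac.compare_digest(expected, received_mac)
print(is_valid)
```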

Scheduling Cron Jobs From Python

Inevitably, you’ll have to write a post-install routine in Python that schedules a Cron job. For this, python-crontab is a great solution.

Create the job using the following. This will not commit the changes yet.

from crontab import CronTab
cron = CronTab()
job = cron.new(command='/usr/bin/tool')

There are at least three ways to configure the schedule:

Using method calls:

job.minute.during(5,50).every(5)
job.hour.every(4)

Using an existing crontab row:

job.parse("* * * * * /usr/bin/tool")

Obviously, in this last case, you could have omitted the command when creating the job.

Using an alias:

job.special = '@daily'

You can then commit your changes by calling:

cron.write()

This is a very limited introduction; it elaborates on using aliases and parsing to schedule jobs (since the documentation doesn’t go into them), but it omits other features, such as:

  • indicating the user to schedule for (the default is “root”)
  • finding existing jobs by command or comment
  • enumerating jobs
  • enumerating the schedule that a particular job will follow
  • enumerating the logs for a particular job
  • removing jobs
  • editing a crontab file directly, or building one in-memory
  • enabling/disabling jobs

The official documentation (follow the link, above) describes all of the features on this list.

A Console-Based Form

I just uploaded a Python tool called “text-prompts” that takes a dictionary of prompt definitions, presents the prompts to the user, and returns a dictionary of responses.

For more detail, go here: text_prompts.txt

Example:

    from text_prompts import text_prompts
    text_prompts({ 'prompt1': ('Prompt 1', True, None), 
                   'prompt2': ('Prompt 2', False, 'default')})

Output:

    Prompt 1 (req): first response
    Prompt 2 [CTRL+D for "default"]: second response

Result:

    {'prompt1': 'first response', 'prompt2': 'second response'}
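A minimal sketch of how such a prompt loop might work (this is hypothetical, not the actual text-prompts implementation; the input_fn parameter is introduced here purely so the sketch can be exercised non-interactively):

```python
def text_prompts_sketch(spec, input_fn=input):
    """Hypothetical sketch of a prompt loop. SPEC maps a key to a
    (prompt-text, is-required, default) tuple.
    """

    results = {}
    for key, (prompt, required, default) in spec.items():
        label = ('%s (req): ' % prompt
                 if required
                 else '%s [CTRL+D for "%s"]: ' % (prompt, default))

        try:
            response = input_fn(label)
        except EOFError:
            response = ''

        if not response:
            if required:
                raise ValueError("A response is required for: %s" % key)
            response = default

        results[key] = response

    return results

# Simulate a user typing two responses, so the sketch is testable.
canned = iter(['first response', 'second response'])
result = text_prompts_sketch(
    {'prompt1': ('Prompt 1', True, None),
     'prompt2': ('Prompt 2', False, 'default')},
    input_fn=lambda label: next(canned))
print(result)
```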

How to use Python’s “diff” functionality

Python natively provides the ability to compare two documents and to produce a set of patch instructions to get from one to the other. This is often useful to 1) provide quick insight into the differences between the two (if any), or 2) provide a list of changes from which to derive a later version of a document, which is typically much lighter than sending the whole updated document if the original is already available on the receiving end.

This is an example of the few lines of code required to generate a list of those instructions, and how to apply them to derive the updated document.

Our documents, for the purpose of this example:

original = """
line1
line2
line3
line4
line5
"""

updated = """
line3
line4
line5
line6
line7
"""

If all you want is to get a list of adds and removes, use ndiff:

from difflib import ndiff

def get_updates(original, updated):
    """Return a 2-tuple of (adds, removes) describing the changes to get from
    ORIGINAL to UPDATED.
    """

    diff = ndiff(original.split("\n"), updated.split("\n"))

    adds = set()
    deletes = set()
    for row in diff:
        diff_type = row[0]
        if diff_type == ' ':
            continue

        entry = row[2:]

        if diff_type == '+':
            adds.add(entry)
        elif diff_type == '-':
            deletes.add(entry)

    return (list(adds), list(deletes))

Run:

updates = get_updates(original, updated)

updates contains a 2-tuple of adds and removes, respectively (the entries come from sets, so their exact order may vary):

(['line7', 'line6'], ['line2', 'line1'])

If, on the other hand, you do actually need a full set of patch instructions, use SequenceMatcher:

from difflib import SequenceMatcher

def get_transforms(original, updated):
    """Get a list of patch instructions to get from ORIGINAL to UPDATED."""

    s = SequenceMatcher(None, original, updated)

    transforms = []
    for tag, i1, i2, j1, j2 in s.get_opcodes():
        if tag == 'delete':
            transform = ('-', (i1, i2))
        elif tag == 'insert':
            transform = ('+', (i1, i2), updated[j1:j2])
        elif tag == 'replace':
            transform = ('>', (i1, i2), updated[j1:j2])
        else:
            transform = ('=', (i1, i2), (j1, j2))

        transforms.append(transform)

    return transforms

def apply_transforms(original, transforms):
    """Execute the transform instructions returned from get_transforms() to
    derive UPDATED from ORIGINAL.
    """

    updated = []
    for transform in transforms:
        if transform[0] == '-':
            pass
        elif transform[0] == '+':
            updated.append(transform[2])
        elif transform[0] == '>':
            updated.append(transform[2])
        else: # Equals.
            updated.append(original[transform[1][0]:transform[1][1]])

    return ''.join(updated)

Run:

transforms = get_transforms(original, updated)

transforms contains:

[('-', (0, 12)), ('=', (12, 31), (0, 19)), ('+', (31, 31), 'line6\nline7\n')]

To derive updated from original:

updated_derived = apply_transforms(original, transforms)
print(updated == updated_derived)

Which displays:

True
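Separately, if what you want is a human-readable patch rather than machine-applicable instructions, difflib also provides unified_diff, which emits output in the familiar unified-diff format:

```python
from difflib import unified_diff

original = "line1\nline2\nline3\n"
updated = "line1\nline3\nline4\n"

# unified_diff() works on sequences of lines; keepends=True preserves
# the newlines so the output can be joined back together directly.
diff = list(unified_diff(original.splitlines(keepends=True),
                         updated.splitlines(keepends=True),
                         fromfile='original',
                         tofile='updated'))
print(''.join(diff))
```

Removed lines are prefixed with "-", added lines with "+", and context lines with a space, beneath the usual "---"/"+++"/"@@" headers.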

Writing Your Own Timezone Implementation for Python

Python has the concept of “naive” and “aware” times. The former refers to a date/time object that hasn’t been assigned a timezone, and the latter to one that has.

However, Python only provides an interface for “tzinfo” implementations: classes that define a particular timezone. It does not provide the implementations themselves. So, you either have to write your own, or use something like the widely used “pytz” (or “pytzpure”, a pure-Python version).

This is a quick example of how to write your own, courtesy of Google:

from datetime import tzinfo, timedelta, datetime


class _TzBase(tzinfo):
    def utcoffset(self, dt):
        return timedelta(hours=self.get_offset()) + self.dst(dt)

    def _FirstSunday(self, dt):
        """First Sunday on or after dt."""
        return dt + timedelta(days=(6 - dt.weekday()))

    def dst(self, dt):
        # 2 am on the second Sunday in March
        dst_start = self._FirstSunday(datetime(dt.year, 3, 8, 2))
        # 1 am on the first Sunday in November
        dst_end = self._FirstSunday(datetime(dt.year, 11, 1, 1))

        if dst_start <= dt.replace(tzinfo=None) < dst_end:
            return timedelta(hours=1)
        else:
            return timedelta(hours=0)

    def tzname(self, dt):
        if self.dst(dt) == timedelta(hours=0):
            return self.get_tz_name()
        else:
            return self.get_tz_with_dst_name()

    def get_offset(self):
        """Returns the offset in hours (-5)."""
        
        raise NotImplementedError()

    def get_tz_name(self):
        """Returns the standard acronym (EST)."""
        
        raise NotImplementedError()
    
    def get_tz_with_dst_name(self):
        """Returns the DST version of the acronym ('EDT')."""        
        
        raise NotImplementedError()


class TzGmt(_TzBase):
    """Implementation of the GMT timezone."""

    def get_offset(self):
        return 0

    def dst(self, dt):
        # GMT does not observe DST.
        return timedelta(hours=0)

    def get_tz_name(self):
        return 'GMT'

    def get_tz_with_dst_name(self):
        return 'GMT'


class TzEst(_TzBase):
    """Implementation of the EST timezone."""

    def get_offset(self):
        return -5

    def get_tz_name(self):
        return 'EST'
    
    def get_tz_with_dst_name(self):
        return 'EDT'

Use it like so:

from datetime import datetime

now_est = datetime.now().replace(tzinfo=TzEst())
now_gmt = now_est.astimezone(TzGmt())

This produces a datetime object in the EST timezone, and then converts it to GMT.

AppEngine Development Environment Module Restrictions

AppEngine has some very tight but obvious restrictions on what types of Python modules can be invoked from application code. The general rule of thumb is that modules that need filesystem access or C code can’t be used. So, which modules are allowed or disallowed? Which modules are partially implemented, or defined and completely empty (yes, there are/were some)?

Unfortunately, the only official list of such modules is very dated.

There was a point, in the not-too-distant past, when the reigning perception of AppEngine’s module support was that the development environment imposed no such restrictions, leaving a dangerous and scary gap between what will definitely run on your system and what you can be sure will run in production.

It turns out that there is some protection in the development environment. Maybe even complete protection.

The google/appengine/tools/devappserver2/python/sandbox.py module appears to be wholly responsible for the loading of modules. At the top, there’s a sys.meta_path assignment. This is what appears as of version 1.8.4:

  sys.meta_path = [
      StubModuleImportHook(),
      ModuleOverrideImportHook(_MODULE_OVERRIDE_POLICIES),
      BuiltinImportHook(),
      CModuleImportHook(enabled_library_regexes),
      path_override_hook,
      PyCryptoRandomImportHook,
      PathRestrictingImportHook(enabled_library_regexes)
      ]

This defines a series of module “finders” responsible for resolving imported modules. This is where restrictions are imposed. The following are descriptions/insights about each one.

StubModuleImportHook: Replaces complete modules with different ones.
ModuleOverrideImportHook: Adjusts partially white-listed modules (symbols may be added, removed, or updated).
BuiltinImportHook: Imposes a white-list on builtin modules. This raises an ImportError on everything else.
CModuleImportHook: Imposes a white-list on C modules.
path_override_hook: An instance of PathOverrideImportHook. This hook appears to look for modules in special override paths.
PyCryptoRandomImportHook: Fixes the loading of Crypto.Random.OSRNG.new.
PathRestrictingImportHook: Makes sure any remaining imports come from an accessible path.

If you have a question about which specific modules are involved, look in the sandbox.py module mentioned above. The first four finders are relatively concrete; most of their modules are expressed in lists.
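These hooks all use Python’s standard sys.meta_path finder machinery. As a rough illustration of the mechanism only (a block-list rather than AppEngine’s white-lists, and not AppEngine’s actual code), a minimal finder that refuses certain imports might look like this:

```python
import sys
from importlib.abc import MetaPathFinder


class BlockListFinder(MetaPathFinder):
    """Illustrative finder that raises ImportError for blocked modules."""

    def __init__(self, blocked):
        self._blocked = set(blocked)

    def find_spec(self, fullname, path=None, target=None):
        if fullname in self._blocked:
            raise ImportError('Module %r is not allowed.' % fullname)

        # Decline, letting the next finder in sys.meta_path try.
        return None


sys.meta_path.insert(0, BlockListFinder({'ftplib'}))
sys.modules.pop('ftplib', None)  # Ensure the import actually consults the finders.

try:
    import ftplib
    blocked = False
except ImportError as exc:
    blocked = 'not allowed' in str(exc)

print(blocked)
```

Because our finder sits first in sys.meta_path, it sees every import before the default machinery does, which is exactly how the sandbox gets to impose its policies.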

A Pure-Python Implementation of “pytz”

There is a problem with the standard “pytz” package: it’s awesome, but it can’t be used on systems that don’t allow direct file access. I created “pytzpure” to account for this. It allows you to build out the timezone data files as Python modules. As long as these modules are in the path, the “pytzpure” module provides the same exports as the original “pytz” package.

For export:

PYTHONPATH=. python pytzpure/tools/tz_export.py /tmp/tzppdata

Output:

Verifying export path exists: /tmp/tzppdata
Verifying __init__.py .
Writing zone tree.
(578) timezones written.
Writing country timezones.
Writing country names.

To use:

from datetime import datetime
from pytzpure import timezone
utc = timezone('UTC')
detroit = timezone('America/Detroit')
datetime.utcnow().replace(tzinfo=utc).astimezone(detroit).\
strftime('%H:%M:%S %z')
'16:34:37 -0400'

Dumping Raw Python from Dictionary

I wrote a simple tool that generates a Python string-representation of the given data. Note that the rendering is very similar to JSON, with the exception of the handling of NULLs.

Example usage:

get_as_python({ 'data1': { 'data22': { 'data33': 44 }},
                'data2': ['aa','bb','cc'],
                'data3': ('dd','ee','ff',None) })

Output (notice that a dict does not carry order, as expected):

data1 = {"data22":{"data33":44}}
data3 = ["dd","ee","ff",None]
data2 = ["aa","bb","cc"]

https://raw.github.com/dsoprea/RandomUtility/master/get_as_python.py
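A minimal sketch of how such a dump might be implemented (hypothetical, not the linked tool’s actual code), leaning on the observation that the output is JSON-compatible except for None:

```python
import json


def get_as_python_sketch(data):
    """Hypothetical sketch: render each top-level key as a Python
    assignment, using JSON-style compact formatting but with Python's
    None in place of JSON's null.

    Caveat: the naive null -> None replacement below would corrupt
    string values that themselves contain "null".
    """

    lines = []
    for key, value in data.items():
        rendered = json.dumps(value, separators=(',', ':'))
        lines.append('%s = %s' % (key, rendered.replace('null', 'None')))

    return '\n'.join(lines)


print(get_as_python_sketch({'data1': {'data22': {'data33': 44}},
                            'data2': ['aa', 'bb', 'cc'],
                            'data3': ('dd', 'ee', 'ff', None)}))
```

Note that tuples come out as lists, since JSON has no tuple type, matching the output shown above.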

PySecure is now Python 3 Compatible

Changes to PySecure for Python 3 compatibility have now been checked in and pushed to PyPI.

A large amount of the labor went into refactoring nearly every use of strings for string/bytes correctness. I also did an internal refactor of all of the tests (which largely just invoke a bunch of the functionality and rely on the right exceptions to fail out when they should).

Unfortunately, I discovered that libssh’s reverse port-forwarding appears to be broken in 0.6.0 (which is incompatible with 0.5.5, for its authentication calls). This has been registered as bug #126 in their tracker.

Progress of GDriveFS (Google Drive FUSE Adapter)

The GDriveFS project has picked up a lot of momentum in the last couple of months. I originally wrote it because there weren’t any other traditional FUSE implementations of a Google Drive client. Due to the massive complexity involved in keeping track of an account’s filesystem organization and integrating a useful amount of Google Drive’s feature set, only a handful of projects were created, and they were mostly very limited.

Needless to say, primary development lagged on for a while, but one year and two hundred commits later, it has a following.

Thanks to all of those who have been involved. I have been getting regular community contributions/inquiries/bug-fixes. All are welcome to get their feet wet in one way or another.