Adding Custom Data to X.509 SSL Certificates

Signed SSL certificates have a feature known as “extensions”. For a requested extension to appear in the signed certificate, it generally has to be present in the CSR (a CA can also add extensions of its own). Therefore, CSRs support them too. Although X.509 certificates are not meant to hold a lot of data and were never meant to act as databases (rather, they describe an identity with associated information), they make a great solution when you need to store secured information alongside your application at a client site. Though the data is viewable, you have the following guarantees:

  • The data (including the extensions) can’t be tampered with without invalidating the certificate’s signature.
  • The certificate will expire at a set time (and can be renewed if need be).
  • A certificate-revocation list (CRL) can be implemented (using a CRL distribution point, or “CDP”) so that you can invalidate a certificate remotely.

As long as you don’t care about keeping the data secret, this makes extensions an ideal solution to a problem like on-site client licenses, where your software needs to regularly check whether the client still has permission to operate. You can also use a CRL to disable them if they stop paying their bill.

These extensions accommodate data that goes beyond the distinguished-name (DN) fields (locality, country, organization, common-name, etc.), the chain of trust, key fingerprints, and the CA’s signature over the certificate contents, which guarantees both the trustworthiness of the certificate and its integrity. Extensions seem relatively easy to add to certificates, whether you’re creating CSRs from code or from the command-line. With OpenSSL-based tools, they’re specified as manageably-sized strings of human-readable text (there doesn’t seem to be an official length limit).

If you own the CA, then you might also create your own extensions. In this case, you’ll refer to your extension with a unique dotted identifier called an “OID” (we’ll go into this in the ASN.1 explanation below). Libraries might have trouble if you refer to your own extension without first registering it with the library. For example, OpenSSL has the ability to register and use custom extensions, but the M2Crypto SSL library doesn’t expose the registration call and, therefore, can’t use custom extensions.

Unsupported extensions might be skipped or omitted from the signed certificate by a CA that doesn’t recognize/support them, so beware that you’ll need to stick to the popular extensions if you can’t use your own CA. Extensions that are mandatory for your requirements can be marked as “critical”, so that signing won’t proceed if any of them aren’t recognized.

The extension that we’re interested in here is “subjectAltName”, which is recognized/supported by practically every CA. This extension can describe the “alternative subjects” (using DNS-type entries) that you might need to specify if your X.509 certificate needs to be used with more than one common-name (more than one hostname). It can also describe email-addresses and other kinds of identity information. Crucially, it can also store custom text.

This is an example of two “subjectAltName” values (a certificate should contain at most one instance of a given extension, but a single subjectAltName extension can carry several entries of different types):

DNS:server1.yourdomain.tld, DNS:server2.yourdomain.tld
otherName:1.3.6.1.4.1.99;UTF8:This is arbitrary data.

However, due to details soon to follow, it’s very difficult to pull the extension text back out again. In order to go further, we have to take a quick diversion into certificate structure. This isn’t strictly required, but the information is obscure enough that, without it, you’ll have little to fall back on if you run into issues.

Certificate Encoding

All of the standard SSL documents, such as the private-key, public-key, CSR, and X.509 certificate, are usually stored in a text-safe format called PEM. This is base64-encoded data with anchors (e.g. “-----BEGIN DSA PRIVATE KEY-----”) on the top and bottom.

Before it can be parsed, a PEM-encoded document must be converted to a DER-encoded document. This just means that it’s stripped of the anchors and newlines, and then base64-decoded. DER is a stricter subset of the “BER” encoding (it allows exactly one encoding for any given value).
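The conversion just described is small enough to write out. This is a minimal sketch (the function name `pem_to_der` is hypothetical), assuming a single PEM block without the legacy RFC 1421-style headers that some encrypted keys carry. For certificates specifically, Python’s standard library already offers `ssl.PEM_cert_to_DER_cert()`.

```python
import base64
import re


def pem_to_der(pem_text):
    """Strip the PEM anchors and whitespace, then base64-decode the body."""
    match = re.search(
        r"-----BEGIN ([A-Z0-9 ]+)-----(.*?)-----END \1-----",
        pem_text,
        re.DOTALL,
    )
    if match is None:
        raise ValueError("no PEM block found")
    # Remove the newlines (and any other whitespace) between the anchors.
    body = "".join(match.group(2).split())
    return base64.b64decode(body)
```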

ASN.1 Encoding

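The DER encoding describes an ASN.1 data structure. ASN.1 combines a tree of data with a tree of grammar specifications, and reduces down to hierarchical sets of DER-encoded data. Each node is encoded as a tag, a length, and a value (“TLV”). The tag identifies the node’s type: either a universal type such as SEQUENCE, INTEGER, or OBJECT IDENTIFIER, or a context-specific number like [0] whose meaning depends on the grammar. Things like algorithms and extensions are identified by the dot-separated identifiers called OIDs (mentioned above). Usually these are officially-assigned OIDs, but you might have some custom ones if you don’t have to pass your certificates to a higher authority that might have a problem with them.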

In order to decode the structure, you must walk it, applying the correct specs as required. Beyond the type tags, there is nothing self-descriptive within the data: what a node means depends entirely on where it sits in the grammar. This makes it fast, but useless until you have enough pre-existing knowledge to descend to the information you require.
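To make this concrete, here is a minimal DER walker sketch (Python 3, hypothetical helper names, not taken from any library). Every DER node is a tag byte, a length, and a value, so the walker can recover the shape of the tree and nothing more. A context-specific tag such as 0xA0 is just “[0]” to it, and an OCTET STRING that happens to contain more DER (like an extension’s value) isn’t descended into, because nothing in the data says it should be. It only handles single-byte tags, which is enough for typical X.509 structures.

```python
def read_tlv(data, offset=0):
    """Read one DER tag-length-value triplet starting at `offset`.

    Returns (tag, value_bytes, next_offset).
    """
    tag = data[offset]
    length = data[offset + 1]
    offset += 2
    if length & 0x80:
        # Long form: the low 7 bits give the number of length bytes.
        num_bytes = length & 0x7F
        length = int.from_bytes(data[offset:offset + num_bytes], "big")
        offset += num_bytes
    value = data[offset:offset + length]
    return tag, value, offset + length


def walk(data, depth=0, out=None):
    """Recursively collect (depth, tag, length) for every node in `data`."""
    if out is None:
        out = []
    offset = 0
    while offset < len(data):
        tag, value, offset = read_tlv(data, offset)
        out.append((depth, tag, len(value)))
        # Bit 0x20 marks a constructed node (SEQUENCE, SET, [n] wrappers),
        # whose value is itself a series of TLVs.
        if tag & 0x20:
            walk(value, depth + 1, out)
    return out
```

Run over a full certificate, `walk()` lists the nested SEQUENCEs and context tags, but deciding that a given node is, say, the validity period still requires the RFC 2459 grammar.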

The ASN.1 specifications for the common grammars (like RFC 2459 for X.509) are so massive that you should avoid getting involved in the mechanics at all costs, and learn how to survive with the limited number of libraries already available. In all likelihood, anything outside the realm of popular usage will require a non-trivial degree of debugging.

ASN.1 has been around… for a while (about thirty years, as of this writing). It’s obtuse, nearly impenetrable, and understood in great detail by very few people. However, it’s going to be here for a while.

Extension Decoding

The reason that extensions are tough to decode is that the encoding depends on the data that you put in the extension: specifically, the “otherName” and “UTF8” parts. OpenSSL can’t present these values when it dumps the certificate, because it just doesn’t have enough information to decode them. M2Crypto, since it uses OpenSSL, has the same problem.

Now that we’ve introduced a little of the conceptual ASN.1 structure, let’s go back to the previous subjectAltName “otherName” example:

otherName:1.3.6.1.4.1.99;UTF8:This is arbitrary data.

The following is the breakdown:

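  1. “otherName”: A classification of the subjectAltName extension that indicates custom data. In the RFC 2459 grammar, it’s one of the alternatives of the “GeneralName” type, identified by its own context-specific tag ([0]) rather than by an OID.
  2. 1.3.6.1.4.1.99: The OID of your company. The first six parts (1.3.6.1.4.1) comprise the common prefix for private enterprises, and they’re followed by a “private enterprise number” (PEN), here 99. You can register for your own with IANA.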
  3. Custom data, prefixed with a type. The “UTF8” prefix determines the encoding of the data, but is not itself included.
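To see how these three parts land in the binary value, the following sketch hand-encodes a subjectAltName value with a single otherName entry. It reflects my reading of the RFC 2459 grammar rather than OpenSSL’s or M2Crypto’s internals, and the helper names are hypothetical. GeneralName lives in an implicitly-tagged module, so otherName’s [0] tag replaces the AnotherName SEQUENCE tag, while AnotherName’s value is wrapped in an explicit [0]. The “UTF8” prefix survives only as the UTF8String tag (0x0C).

```python
def der_length(n):
    """Encode a DER length (short form below 128, long form otherwise)."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def tlv(tag, value):
    return bytes([tag]) + der_length(len(value)) + value


def encode_oid(dotted):
    """DER-encode a dotted OID such as 1.3.6.1.4.1.99."""
    arcs = [int(part) for part in dotted.split(".")]
    # The first two arcs are packed into a single value: 40 * X + Y.
    values = [40 * arcs[0] + arcs[1]] + arcs[2:]
    body = bytearray()
    for value in values:
        # Base-128, most significant group first; every byte except the
        # last one of each value has its high bit set.
        chunk = [value & 0x7F]
        value >>= 7
        while value:
            chunk.append(0x80 | (value & 0x7F))
            value >>= 7
        body.extend(reversed(chunk))
    return tlv(0x06, bytes(body))


def other_name_san(oid, text):
    """Build a subjectAltName value holding one otherName;UTF8 entry."""
    utf8 = tlv(0x0C, text.encode("utf-8"))                      # UTF8String
    explicit_value = tlv(0xA0, utf8)                            # value [0] EXPLICIT
    other_name = tlv(0xA0, encode_oid(oid) + explicit_value)    # otherName [0]
    return tlv(0x30, other_name)                                # GeneralNames SEQUENCE
```

For example, `other_name_san("1.3.6.1.4.1.99", "abc")` produces the bytes `30 11 a0 0f 06 06 2b 06 01 04 01 63 a0 05 0c 03 61 62 63`.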

I used the following calls to M2Crypto to add this extension to the X.509 certificate:

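from M2Crypto import X509

# "cert" is the X509.X509 object being built (before it's signed).
ext = X509.new_extension(
        'subjectAltName',
        'otherName:1.3.6.1.4.1.99;UTF8:This is arbitrary data.'
    )

# Mark the extension as critical so that it can't be silently dropped.
ext.set_critical(1)
cert.add_ext(ext)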

Aside from the extension information itself, I also indicate that it’s to be considered “critical”: signing will fail if the CA doesn’t recognize the extension, rather than the extension simply being omitted. When encoded, the extension consists of three separate “components”:

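  1. The OID of the extension itself (2.5.29.17 for “subjectAltName”).
  2. The “critical” flag.
  3. An OCTET STRING wrapping the DER-encoded extension value: here, a sequence whose “otherName” entry pairs your PEN-based OID with the UTF8-encoded string.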
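The outer layer can be sketched the same way. This hypothetical helper wraps an already-encoded value (such as a subjectAltName value) in the Extension SEQUENCE, emitting the three components in order. In DER, the critical flag is a BOOLEAN that defaults to FALSE, so it’s only written out when it’s TRUE.

```python
def der_length(n):
    """Encode a DER length (short form below 128, long form otherwise)."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def tlv(tag, value):
    return bytes([tag]) + der_length(len(value)) + value


# DER encoding of the subjectAltName extension OID, 2.5.29.17.
SUBJECT_ALT_NAME_OID = bytes.fromhex("0603551d11")


def wrap_extension(extn_value, critical=False, oid_der=SUBJECT_ALT_NAME_OID):
    """Wrap an already-DER-encoded value in an X.509 Extension SEQUENCE."""
    body = oid_der                              # 1. extnID
    if critical:
        body += tlv(0x01, b"\xff")              # 2. critical BOOLEAN (TRUE)
    body += tlv(0x04, extn_value)               # 3. extnValue OCTET STRING
    return tlv(0x30, body)
```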

It turns out that it’s quicker to use a library that specializes in ASN.1 than to try to get the information from OpenSSL. After all, decoding this data is arguably out of OpenSSL’s scope: it’s colocated with cryptographic data, but isn’t cryptographic itself.

I used pyasn1 (along with the RFC 2459 grammar from the pyasn1-modules package).

Decoding Our Extension

To decode the string from the previous extension:

  1. Enumerate the extensions, skipping any that aren’t “subjectAltName”.
  2. Decode the third component (mentioned above) using the RFC 2459 “subjectAltName” grammar.
  3. Descend to the first component of the “SubjectAltName” node: the “GeneralName” node.
  4. Descend to the first component of the “GeneralName” node: the “AnotherName” node.
  5. Match the OID against the OID we’re looking for.
  6. Decode the string using the RFC 2459 UTF8 specification.

This is a dump of the structure using pyasn1 (the OID and value in this dump differ from the example above):

SubjectAltName().
   setComponentByPosition(
       0, 
       GeneralName().
           setComponentByPosition(
               0, 
               AnotherName().
                   setComponentByPosition(
                       0, 
                       ObjectIdentifier(1.3.6.1.5.5.7.1.99)
                   ).
                   setComponentByPosition(
                       1, 
                       Any(hexValue='0309006465616462656566')
                   )
           )
   )

The process might seem easy, but it took some work (and collaboration) to get right, with the primary difficulty coming from obscurity meeting unfamiliarity. However, once worked out, the process should be the same every time.

This is the corresponding code. “cert” is an M2Crypto X.509 certificate:

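from pyasn1.codec.der.decoder import decode
from pyasn1.type import char
from pyasn1_modules import rfc2459

# Re-parse the M2Crypto certificate's DER bytes with pyasn1.
parsed, _ = decode(cert.as_der(), asn1Spec=rfc2459.Certificate())

extensions = parsed['tbsCertificate']['extensions']
for extension in extensions:
    extension_oid = extension.getComponentByPosition(0)
    print("0 [%s]" % (repr(extension_oid)))

    critical_flag = extension.getComponentByPosition(1)
    print("1 [%s]" % (repr(critical_flag)))

    # Only subjectAltName extensions can be decoded with this grammar.
    if extension_oid != rfc2459.id_ce_subjectAltName:
        continue

    # The third component is an OCTET STRING wrapping the DER-encoded value.
    sal_raw = extension.getComponentByPosition(2)
    print("2 [%s]" % (repr(sal_raw)))

    sal, _ = decode(bytes(sal_raw), asn1Spec=rfc2459.SubjectAltName())

    for gn in sal:
        # Skip DNS, email, etc. entries; we only want "otherName".
        if gn.getName() != 'otherName':
            continue

        an = gn.getComponent()
        oid = an.getComponentByPosition(0)
        string = an.getComponentByPosition(1)

        print("[%s]" % (oid))

        # Decode the text.
        s, _ = decode(bytes(string), asn1Spec=char.UTF8String())

        print("Decoded: [%s]" % (s))
        print('')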
    print('')

Wrap Up

I wanted to provide an end-to-end tutorial on adding and retrieving “otherName”-type “subjectAltName” extensions because I couldn’t find one. It’s a good solution for keeping data safe on someone else’s assets (as long as you don’t overburden the certificate with extensions, which makes it less efficient to verify).

Don’t forget to implement the CRL/CDP, or you won’t be able to invalidate the certificate (and its extensions) before it expires.

Writing and Reading 7-Zip Archives From Python

I don’t often need to read or write archives from code. When I do, and I don’t want to call a tool via shell-commands, I’ll use zip-files. Obviously there are better formats out there, but when it comes to library compatibility, tar and zip are the easiest possible formats to manipulate. If you’re desperate, you can even write a quick tar archiver with relative simplicity (the headers are mostly ASCII).
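As a demonstration of how simple tar is, here is a minimal archiver sketch for the ustar format, where each file is preceded by a fixed 512-byte header of mostly-ASCII fields (octal numbers as text, plus a simple checksum). It’s deliberately limited: regular files only, names up to 100 bytes, uid/gid 0, and hypothetical function names. For real work, Python’s own `tarfile` module is the obvious choice.

```python
import io
import tarfile
import time


def _octal(value, width):
    # Numeric fields are zero-padded octal ASCII, terminated by a NUL.
    return ("%0*o" % (width - 1, value)).encode("ascii") + b"\0"


def tar_header(name, size, mtime=None, mode=0o644):
    """Build a minimal 512-byte ustar header for a regular file."""
    if mtime is None:
        mtime = int(time.time())
    name_bytes = name.encode("utf-8")
    if len(name_bytes) > 100:
        raise ValueError("name too long for this minimal sketch")

    header = bytearray(512)
    header[0:len(name_bytes)] = name_bytes
    header[100:108] = _octal(mode, 8)
    header[108:116] = _octal(0, 8)          # uid
    header[116:124] = _octal(0, 8)          # gid
    header[124:136] = _octal(size, 12)
    header[136:148] = _octal(mtime, 12)
    header[148:156] = b" " * 8              # checksum placeholder (spaces)
    header[156:157] = b"0"                  # typeflag: regular file
    header[257:263] = b"ustar\0"            # magic
    header[263:265] = b"00"                 # version

    # The checksum is the byte sum of the header (with the checksum field
    # counted as spaces), stored as six octal digits, a NUL, and a space.
    checksum = sum(header)
    header[148:156] = ("%06o" % checksum).encode("ascii") + b"\0 "
    return bytes(header)


def write_tar(files):
    """Return a tar archive (as bytes) containing {name: data} entries."""
    out = bytearray()
    for name, data in files.items():
        out += tar_header(name, len(data))
        out += data
        out += b"\0" * (-len(data) % 512)   # pad data to a 512-byte boundary
    out += b"\0" * 1024                     # two zero blocks end the archive
    return bytes(out)


if __name__ == "__main__":
    archive = write_tar({"hello.txt": b"hello, tar\n"})
    # Python's own tarfile module should be able to read the result.
    with tarfile.open(fileobj=io.BytesIO(archive)) as tf:
        print(tf.getnames())
```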

Obviously, the emphasis here has been on availability. My preferred format is 7-Zip (which uses LZMA compression). Though you don’t often see 7-Zip archives for download, I’ve been using this format for eight years and haven’t looked back. The compression is good and the tool is every bit as easy as zip.

Unfortunately, there’s limited support for 7-Zip in Python. To the best of my knowledge, only the libarchive Python package can read and write 7-Zip archives. The libarchive Python package is developed and supported separately from the C library that it wraps.

Though the package is structured to support any format that the libarchive library can (all major formats, and probably all of the minor ones), the Python project is explicitly labeled as a work-in-progress. 7-Zip is the only format explicitly supported for both reading and writing. Fortunately, it also supports libarchive’s autodetection functionality, so you can read/expand any archive, as long as you can afford the extra couple of milliseconds that the detection will cost you.
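libarchive’s own detection is far more thorough than this, but the basic idea behind format autodetection can be sketched by checking well-known “magic” signature bytes in the data. The table below covers a few common formats and is illustrative, not exhaustive; the function name is hypothetical.

```python
# Leading "magic" bytes for a few common archive/compression formats.
SIGNATURES = [
    (b"7z\xbc\xaf\x27\x1c", "7z"),
    (b"PK\x03\x04", "zip"),
    (b"\x1f\x8b", "gzip"),
    (b"BZh", "bzip2"),
    (b"\xfd7zXZ\x00", "xz"),
]


def sniff_format(header):
    """Guess a format from the first bytes of a file (pass at least 262)."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    # ustar archives carry their magic at offset 257 rather than at 0.
    if header[257:262] == b"ustar":
        return "tar"
    return None
```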

The focus of this project is to provide elegant archiving routines. Most of the API functions are implemented as generators.

Example

To enumerate the entries in an archive:

import libarchive

with libarchive.reader('test.7z') as reader:
    for e in reader:
        # (The entry evaluates to a filename.)
        print("> %s" % (e))

To extract the entries from an archive to the current directory (like a normal, Unix-based extraction):

import libarchive

for state in libarchive.pour('test.7z'):
    if state.pathname == 'dont/write/me':
        state.set_selected(False)
        continue

    # (The state evaluates to a filename.)
    print("Writing: %s" % (state))

To build an archive from a collection of files (omit the target for stdout):

import libarchive

for entry in libarchive.create(
                '7z', 
                ['/aa/bb', '/cc/dd'], 
                'create.7z'):
    print("Adding: %s" % (entry))

Installing Xcode Command Line Tools for Mavericks (Problems)

I had a perfectly running development environment under Mavericks 10.9.1. I’m not, by nature, someone who would prefer to use a Mac, but sometimes we have to take what we’re given, and, at least, it’s Unix-based.

I was surviving without having to install Xcode, until recently when I had to investigate Apple’s illegally-modified “pngcrush” utility. I required Xcode in order to get the iPhone optimizations. Otherwise, I just got the standard version of the open-source utility. So, I installed it.

Yesterday, I had to install/build the Python “cryptography” module, which requires a C build-environment. Now, I had some cc/gcc discrepancies, and one unsupported command-line argument. Obviously, it’s Xcode. So, I uninstalled it. I innocently also installed the 10.9.1->10.9.2 Mavericks update at the same time.

Catastrophe. Now, the same stuff is broken, and I get warnings every single time I invoke Brew:

Warning: No developer tools installed.
You should install the Command Line Tools.
Run `xcode-select --install` to install them.

When I ran xcode-select, I got a dialog that said the command-line tools were required, with one button for installing them and another for the full Xcode install. When I clicked to install the tools, I got the EULA and then a progress-bar that said “Finding Software”, only to be given this message:

Can't install the software because it is not currently available from the Software Update server.

I had to manually download the dmg package from: http://developer.apple.com/downloads

However, when I dragged the pkg file into Applications and ran it, I consistently ran into the following message:

The installation failed.

The Installer can't locate the data it needs to install the software. Check your install media or Internet connection and try again, or contact the software manufacturer for assistance.

It turns out that it expected to be run directly from the dmg container. It looks like everything is working now (with only the command-line tools, and not requiring the whole Xcode install).

Though I’m still investigating the build errors I’m now having, the emphasis of this post is how to remedy the Xcode/tools errors that I was seeing.

Tool to Quickly Create Upstart Jobs

Upstart is a monumental improvement over the classical SysV mechanism for Unix/Linux process/daemon management. Still, it’s a somewhat manual process to create jobs. I’ve previously written about the Upstart library that provides the ability to start and stop jobs (using D-Bus), as well as build jobs.

However, the Upstart library also provides two command-line tools:

  • upstart-create: Create Upstart jobs using reasonable defaults.
  • upstart-reload: Send a signal to Upstart to reload jobs.

Of particular note is the first tool. It takes a couple of options and writes a new job file (in /etc/init). The example from the project website (which displays to the screen rather than writing a job file):

$ upstart-create test-job /bin/sh -j -d "some description" -a "some author "
description "some description"
author "some author "
exec /bin/sh
start on runlevel [2345]
stop on runlevel [016]
respawn 
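Since a job file is just a handful of text stanzas, the same shape of output can be produced with a few lines of Python. This is a hypothetical helper for illustration, not part of upstart-create. Writing the result to /etc/init/<name>.conf would require root, and values containing double quotes aren’t escaped.

```python
def render_upstart_job(command, description, author,
                       start_runlevels="2345", stop_runlevels="016",
                       respawn=True):
    """Render a simple Upstart job with the same stanzas as the example."""
    lines = [
        'description "%s"' % description,
        'author "%s"' % author,
        "exec %s" % command,
        "start on runlevel [%s]" % start_runlevels,
        "stop on runlevel [%s]" % stop_runlevels,
    ]
    if respawn:
        # Restart the process automatically if it exits.
        lines.append("respawn")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_upstart_job("/bin/sh", "some description", "some author"))
```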

Reading Keypresses Under Python

An elegant solution for reading individual keypresses under Python (on Unix-like systems, via termios).

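import os
import sys
import termios

def read_keys():
    """Yield one keypress (one byte) at a time from the terminal on stdin."""
    fd = sys.stdin.fileno()
    old = termios.tcgetattr(fd)
    new = termios.tcgetattr(fd)

    # Disable line-buffering (ICANON) and echo in the local-mode flags.
    new[3] = new[3] & ~termios.ICANON & ~termios.ECHO

    # Block until at least one byte is available, with no timeout.
    new[6][termios.VMIN] = 1
    new[6][termios.VTIME] = 0
    termios.tcsetattr(fd, termios.TCSANOW, new)
    try:
        while True:
            yield os.read(fd, 1)
    finally:
        # Always restore the original terminal settings.
        termios.tcsetattr(fd, termios.TCSAFLUSH, old)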

Example:

>>> for key in read_keys():
...   print("KEY: %s" % (key))
... 
KEY: g
KEY: i
KEY: f
KEY: d
KEY: s
KEY: w
KEY: e

Inspired by this.

Simplified Protocol Buffers for Socket Communication

Protocol Buffers (“protobuf”) is a Google technology that lets you define messages declaratively, and then build library code for a myriad of different programming-languages. The way that messages are serialized is efficient and effortless, and protobuf allows for simple string assignment (without predefining a length), arrays and optional values, and sub-messages.

The only tough part comes during implementation. As protobuf is only concerned with serialization/deserialization, it’s up to you to deal with the logistics of sending the message, and this means that, for socket communication, you often have to:

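  1. Copy and paste the code to prepend a length (serialized protobuf messages aren’t self-delimiting, so the receiver can’t otherwise tell where one message ends and the next begins).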
  2. Copy/paste/adapt existing code that embeds a type-identifier on outgoing requests, and reads the type-identifier on incoming requests in order to automatically handle/route messages (if this is something that you want, which I often do).
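The boilerplate in question looks roughly like the following sketch. The wire format here is a hypothetical one chosen for illustration (not protobufp’s): a 4-byte big-endian payload length, a 1-byte index into a list of message types, then the serialized bytes (which would come from a message’s `SerializeToString()` and be parsed on the other side with `ParseFromString()`). The reader buffers partial chunks, the way data actually arrives from a socket.

```python
import struct

# Hypothetical header: payload length (uint32, big-endian) + type index (uint8).
HEADER = struct.Struct(">IB")


def frame(type_index, payload):
    """Prefix a serialized message with its length and type index."""
    return HEADER.pack(len(payload), type_index) + payload


class FrameReader:
    """Accumulate incoming chunks and hand back complete frames."""

    def __init__(self):
        self._buffer = bytearray()

    def push(self, chunk):
        self._buffer.extend(chunk)

    def read_frame(self):
        """Return the next (type_index, payload), or None if incomplete."""
        if len(self._buffer) < HEADER.size:
            return None
        length, type_index = HEADER.unpack_from(self._buffer)
        end = HEADER.size + length
        if len(self._buffer) < end:
            return None
        payload = bytes(self._buffer[HEADER.size:end])
        del self._buffer[:end]
        return type_index, payload
```

The type index is what allows the receiving side to look up the right message class and route the message without knowing in advance what’s coming.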

This quickly becomes redundant and mundane, and it’s why we’re about to introduce protobufp (“Protocol Buffers Processor”).

We can’t improve on the explanation on the project-page. Therefore, we’ll just provide the example.

We’re going to build some messages, push them into a StringIO-based byte-stream (later to be whatever type of stream you wish), feed them into the protobufp “processor” object, and retrieve one fully-deserialized message at a time until depleted:

from random import randint

# Module generated by protoc from the test message definition.
from test_msg_pb2 import TestMsg

from protobufp.processor import Processor

def get_random_message():
    rand = lambda: randint(11111111, 99999999)

    t = TestMsg()
    t.left = rand()
    t.center = "abc"
    t.right = rand()

    return t

messages = [get_random_message() for _ in range(5)]

Create an instance of the processor, and give it a list of valid message-types (the order of this list should never change, though you can append new types to the end):

msg_types = [TestMsg]
p = Processor(msg_types)

Use the processor to serialize each message and push them into the byte-stream:

from StringIO import StringIO  # Python 2; use io.BytesIO on Python 3

s = StringIO()

for msg in messages:
    s.write(p.serializer.serialize(msg))

Feed the data from the byte stream into the processor (normally, this might be chunked-data from a socket):

p.push(s.getvalue())

Pop one decoded message at a time:

j = 0
while True:
    in_msg = p.read_message()
    if in_msg is None:
        break

    assert messages[j].left == in_msg.left
    assert messages[j].center == in_msg.center
    assert messages[j].right == in_msg.right

    j += 1

# Every message that went in should have come back out.
assert j == len(messages)

Now there’s one less annoying task to distract you from your critical path.

Creating and Controlling OS Services from Python

One important deployment task for server software is not only to deploy the software and start it, but also to enable it to be automatically started and monitored by the OS at future reboots. At the time of writing, the most modern solution for this type of management (on Ubuntu) is Upstart. You access Upstart every time you call “sudo service apache2 restart”, and whatnot. Upstart is sponsored by Canonical, the company behind Ubuntu.

Upstart configs are located in /etc/init (we’re slowly, slowly approaching the point where we might one day be able to get rid of the System-V init scripts in /etc/init.d). To create a job, you drop an “xyz.conf” file into /etc/init, and Upstart should automatically become aware of it via inotify. To query Upstart (including starting and stopping jobs), you send it a D-Bus message.

So, what about elegantly automating the creation of a job for the service from your Python deployment code? As far as I know, there is exactly one solution for doing so, and it’s a Swiss Army Knife for such a task.

We’re going to use the Python upstart library to build a job and then write it (in fact, we’re just going to share one of their examples, for your convenience). The library also allows for listing the jobs on the system, getting statuses, and starting/stopping jobs, among other things, but we’ll leave it to you to experiment with this, when you’re ready.

Build a job that starts and stops on the normal run-levels, respawns when it terminates, and runs a single command (a non-forking process, otherwise we’d have to add the ‘expect’ stanza as well):

from upstart.job import JobBuilder

jb = JobBuilder()

# Build the job to start/stop with default runlevels to call a command.
jb.description('My test job.').\
   author('Dustin Oprea <dustin@nowhere.com>').\
   start_on_runlevel().\
   stop_on_runlevel().\
   run('/usr/bin/my_daemon')

with open('/etc/init/my_daemon.conf', 'w') as f:
    f.write(str(jb))

Remember to run this as root. The job output looks like this:

description "My test job."
author "Dustin Oprea <dustin@nowhere.com>"
start on runlevel [2345]
stop on runlevel [016]
respawn 
exec /usr/bin/my_daemon