Configuring Python applications with the config module


1   Introduction

This document describes config, a module for configuring Python programs which aims to offer more power and flexibility than the existing ConfigParser module. Python programs which are designed as a hierarchy of components can use config to configure their various components in a uniform way. This module is expected to be used with Python versions >= 2.2.

A complete API is available, and a test suite is included with the distribution; see the Download section below for further details.

2   Simple Usage

The simplest scenario is, of course, "Hello, world". Let's look at a very simple configuration file simple.cfg where a message to be printed is configured:

# The message to print (this is a comment)
message: 'Hello, world!'

and the program which uses it:

from config import Config

# You can pass any file-like object; if it has a name attribute,
# that name is used when file format error messages are printed
f = file('simple.cfg')
cfg = Config(f)
print cfg.message

which results in the expected:

Hello, world!

A configuration file is, at the top level, a list of key-value pairs. Each value, as we'll see later, can be a sequence or a mapping, and these can be nested without any practical limit.

In addition to attribute access (cfg.message in the example above), you can access a value using the configuration's getByPath method: cfg.getByPath('message') is equivalent. The argument to getByPath is the path of the required value, which makes the method useful when the path is variable. It could even be read from a configuration :-) There is also a get method, which acts like the dictionary method of the same name: you can pass a default value which is returned if the value is not found in the configuration. The get method works with dictionary keys or attribute names, rather than paths. Hence, cfg.getByPath('a.b') is equivalent to cfg.a.b, and cfg.a.get('b', 1234) returns cfg.a.b if it is defined and 1234 otherwise.
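To make the distinction between paths and keys concrete, here is a sketch using plain nested dictionaries instead of a Config object. The helper get_by_path is hypothetical; it only mimics the dotted-path behaviour described above and, unlike the module, ignores sequence indices such as [0].

```python
def get_by_path(mapping, path):
    """Follow a dotted path such as 'a.b' through nested dictionaries."""
    value = mapping
    for key in path.split('.'):
        value = value[key]  # raises KeyError if a step is missing
    return value

cfg = {'message': 'Hello, world!', 'a': {'b': 42}}

get_by_path(cfg, 'message')   # 'Hello, world!'
get_by_path(cfg, 'a.b')       # 42, like cfg.a.b
cfg['a'].get('b', 1234)       # 42, because b is defined
cfg['a'].get('c', 1234)       # 1234, the default
```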

3   Evaluating values

So far, so obvious. Now, suppose that we need to print not to stdout, but to stdout or stderr depending on the configuration. Then, the modified configuration file might look like this:

# The message to print (this is a comment)
message: 'Hello, world!'
# The stream to print to (comments are of course optional)
stream: `sys.stderr`

Notice the use of backticks to indicate a special value: one which is evaluated rather than used literally. The corresponding program would be:

from config import Config

# You can pass any file-like object; if it has a name attribute,
# that name is used when error messages are printed
f = file('simple.cfg')
cfg = Config(f)
# The cfg attributes correspond to the keys in the
# configuration file
print >> cfg.stream, cfg.message

with the same result as before:

Hello, world!

Notice that the "sys.stderr" in backticks was apparently correctly evaluated. This is not a special case, but a generalized mechanism; you can provide any dotted-identifier expression in backticks and it will be evaluated against a list of namespaces you specify. The reason for the dotted-identifier mechanism is to provide some security - the system does not perform an unrestricted eval(). By default, the system supports sys and os modules, which gives easy access to environment variables (for example).

If you change the configuration file to:

# The message to print (this is a comment)
message: 'Hello, world!'
# The stream to print to (comments are of course optional)
stream: `sys.stderr`
value: `handlers.DEFAULT_TCP_LOGGING_PORT`

and the program to:

f = file('simple.cfg')
cfg = Config(f)
print >> cfg.stream, cfg.message, cfg.value

then running it as is would give rise to an error:

config.ConfigResolutionError: unable to evaluate `handlers.DEFAULT_TCP_LOGGING_PORT` in the configuration's namespaces

because an appropriate namespace is not in the list. To rectify this, we modify the program to:

f = file('simple.cfg')
cfg = Config(f)
# Add lines to import a namespace and add it to the list of namespaces used
import logging, logging.handlers
cfg.addNamespace(logging)
print >> cfg.stream, cfg.message, cfg.value

with a more satisfactory result:

Hello, world! 9020
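The resolution rule described in this section can be sketched in a few lines of standard-library Python. This is an illustration under assumptions, not the module's code: the names resolve, Namespace and ResolutionError are hypothetical, and the default namespace is modelled as an object whose sys and os attributes are the modules of those names.

```python
import os
import re
import sys
import logging
import logging.handlers

# Only dotted identifiers such as 'sys.stderr' are accepted; calls,
# operators and subscripts are rejected, so nothing arbitrary is run.
DOTTED = re.compile(r'[A-Za-z_]\w*(?:\.[A-Za-z_]\w*)*\Z')

class ResolutionError(Exception):
    pass

class Namespace(object):
    """Plain attribute holder used for the default namespace."""

def resolve(expr, namespaces):
    """Evaluate a dotted identifier against each namespace in turn."""
    if not DOTTED.match(expr):
        raise ResolutionError('not a dotted identifier: %r' % expr)
    for ns in namespaces:
        value = ns
        try:
            for name in expr.split('.'):
                value = getattr(value, name)
        except AttributeError:
            continue  # not found in this namespace; try the next one
        return value
    raise ResolutionError('unable to evaluate %r' % expr)

default_ns = Namespace()
default_ns.sys = sys
default_ns.os = os
namespaces = [default_ns]

stream = resolve('sys.stderr', namespaces)    # the sys.stderr object
namespaces.append(logging)   # analogous to adding logging as a namespace
port = resolve('handlers.DEFAULT_TCP_LOGGING_PORT', namespaces)   # 9020
```

Appending the logging package to the list plays the role of the namespace-adding call above: handlers then resolves as an attribute of logging, because logging.handlers has been imported. Without it, the same lookup raises ResolutionError, much as the module raised ConfigResolutionError.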

4   Dealing with repeating values and mappings

The config module allows you to specify repeating values using syntax which is very similar to Python's list syntax. You can also use syntax which is almost identical to Python's dict syntax to specify mappings in the configuration. If the application is required to print a sequence of messages to corresponding streams, you could use a configuration file like this:

messages:
[
  { stream : `sys.stderr`, message: 'Welcome' },
  { stream : `sys.stdout`, message: 'Welkom' },
  { stream : `sys.stderr`, message: 'Bienvenue' },
]

and the program would look like this:

from config import Config

f = file('simple.cfg')
cfg = Config(f)
for m in cfg.messages:
    print >> m.stream, m.message

Running the above would give what you would expect intuitively:

Welcome
Welkom
Bienvenue

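The preamble to the above example mentioned that the list and dict syntax of the config module is very similar or almost identical to Python's. The main differences, visible in the example below, are that the module is fairly liberal about whitespace and is not indentation-sensitive, that line breaks can be used instead of commas to separate elements, and that quotes can be omitted from string values which are simple words such as Welcome: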

messages:
[
  {
    stream : `sys.stderr`
    message: Welcome
    name: 'Harry'
  }
  {
    stream : `sys.stdout`
    message: Welkom
    name: 'Ruud'
  }
  {
    stream  : `sys.stderr`
    message : Bienvenue
    name    : Yves
  }
]

However, there is one area where whitespace can be significant; see below.

5   Handling cross-references

Sometimes there is a need to cross-reference one part of the configuration from another. Suppose in the above configuration, the third message (the one in French) needs to use the same stream as the English message, whatever stream that might be. This can be expressed as follows:

messages:
[
  {
    stream : `sys.stderr`
    message: 'Welcome'
    name: 'Harry'
  }
  {
    stream : `sys.stdout`
    message: 'Welkom'
    name: 'Ruud'
  }
  {
    stream : $messages[0].stream
    message: 'Bienvenue'
    name: Yves
  }
]

The $ syntax is used because the intent is similar to substitution: $messages[0].stream is replaced with the value to which it refers.
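Following a reference such as $messages[0].stream amounts to walking the configuration one step at a time: a key, then an index, then a key. Here is a sketch of that walk over plain dicts and lists; follow_reference is a hypothetical helper, not the module's implementation, and strings stand in for the stream objects.

```python
import re

FIRST = re.compile(r'[A-Za-z_]\w*')                 # leading name
STEP = re.compile(r'\.([A-Za-z_]\w*)|\[(\d+)\]')    # '.name' or '[index]'

def follow_reference(root, path):
    """Return the value at a reference path such as 'messages[0].stream'."""
    m = FIRST.match(path)
    if m is None:
        raise ValueError('bad reference: %r' % path)
    value = root[m.group(0)]
    pos = m.end()
    while pos < len(path):
        m = STEP.match(path, pos)
        if m is None:
            raise ValueError('bad reference %r at position %d' % (path, pos))
        name, index = m.groups()
        if name is not None:
            value = value[name]          # mapping step
        else:
            value = value[int(index)]    # sequence step
        pos = m.end()
    return value

config = {
    'messages': [
        {'stream': 'stderr', 'message': 'Welcome', 'name': 'Harry'},
        {'stream': 'stdout', 'message': 'Welkom', 'name': 'Ruud'},
    ]
}
first_stream = follow_reference(config, 'messages[0].stream')   # 'stderr'
```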

The above configuration works with the program:

from config import Config

f = file('simple.cfg')
cfg = Config(f)
for m in cfg.messages:
    s = '%s, %s' % (m.message, m.name)
    try:
        print >> m.stream, s
    except IOError, e:
        print e

to give:

Welcome, Harry
Welkom, Ruud
Bienvenue, Yves

However, if you change the file to:

messages:
[
  {
    stream : `sys.stdin`
    message: 'Welcome'
    name: 'Harry'
  }
  {
    stream : `sys.stdout`
    message: 'Welkom'
    name: 'Ruud'
  }
  {
    stream : $messages[0].stream
    message: 'Bienvenue'
    name: Yves
  }
]

and run the program again (note the change to sys.stdin, which is bound to cause an error if we try to write to it), you get two errors:

(0, 'Error')
Welkom, Ruud
(0, 'Error')

This is because the stream for the third message is effectively the same as that for the first message.

Note that in the above expression $messages[0].stream, whitespace is significant before the [. This is so that we can distinguish between [ $a[1] ] (a sequence whose single element is the second element of the sequence referenced as a) and [ $a [1] ] (a two-element sequence whose first element is the value referenced by a and whose second element is a sequence containing the single integer 1).
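The whitespace rule above can be captured by a pattern that lets a reference absorb [index] steps only when they follow with no intervening space. The sketch below assumes a simplified grammar (a name followed by .name or [integer] steps); split_reference is a hypothetical helper, not part of the config module.

```python
import re

# '$', a name, then any number of '.name' or '[index]' steps with no
# whitespace in between; a space ends the reference.
REFERENCE = re.compile(r'\$([A-Za-z_]\w*(?:\.[A-Za-z_]\w*|\[\d+\])*)')

def split_reference(text):
    """Split text beginning with '$' into (reference path, remaining text)."""
    m = REFERENCE.match(text)
    if m is None:
        raise ValueError('no reference at start of %r' % text)
    return m.group(1), text[m.end():]

# Inside [ $a[1] ]: one element, the reference a[1]
inner = split_reference('$a[1] ]')     # ('a[1]', ' ]')
# Inside [ $a [1] ]: the reference a, then a new sequence [1]
outer = split_reference('$a [1] ]')    # ('a', ' [1] ]')
```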

6   Using expressions

Although calculations are not normally the preserve of configuration modules, there are times when it is useful to express configuration values in terms of others. For example, an overall time period may be specified and other configuration values are fractions thereof. It may also be desirable to perform other simple calculations declaratively, e.g. concatenation of numerous file names to a base directory to get a final pathname. To support this, the config module allows expressions involving +, -, *, / and % to be used in a configuration. The + operator can be used for string concatenation. For example, the file:

total_period : 100
header_time: 0.3 * $total_period
steady_time: 0.5 * $total_period
trailer_time: 0.2 * $total_period
base_prefix: '/my/app/'
log_file: $base_prefix + 'test.log'

used with the program:

from config import Config

f = file('simple.cfg')
cfg = Config(f)

print "Header time: %d" % cfg.header_time
print "Steady time: %d" % cfg.steady_time
print "Trailer time: %d" % cfg.trailer_time
print "Log file name: %s" % cfg.log_file

leads to the result:

Header time: 30
Steady time: 50
Trailer time: 20
Log file name: /my/app/test.log
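Values computed by expressions such as 0.3 * $total_period are presumably floating-point numbers, and the %d format used in the program truncates rather than rounds. With the fractions in this example the output is as shown, but a product that lands just below a whole number would print one less than expected. The check below uses plain Python arithmetic; 0.29 is an arbitrary value chosen for illustration, not part of the example.

```python
total_period = 100

# 0.29 is an arbitrary fraction for illustration, not from the example.
product = 0.29 * total_period        # 28.999999999999996, just under 29

truncated = '%d' % product           # '28': %d drops the fraction
rounded = '%d' % round(product)      # '29': round first, then format

log_file = '/my/app/' + 'test.log'   # '+' concatenates strings
```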

7   Including configurations within others

You can include a configuration within another configuration at any point where you would specify a value. The included configuration is treated as if it were a dictionary at the inclusion point. Hence, given the configuration file:

# application configuration
app:
{
  name : MyApplication
  base: '/path/to/app/logs/'
  # support team email address
  support_team: myappsupport
  mail_domain: '@my-company.com'
}
# logging for the app
logging: @"logging.cfg"
test: $logging.handler.email.from

The logging key in this configuration includes another file called logging.cfg, which looks like this:

# root logger configuration
root:
{
  level     : DEBUG
  handlers  : [$handlers.console, $handlers.file, $handlers.email]
}
# logging handlers
handlers:
{
  console:  [
              # the class to instantiate
              StreamHandler,
              # how to configure the instance
              {
                # the logger level
                level : WARNING
                # the stream to use
                stream  : `sys.stderr` }
            ]
  file:     [ FileHandler, { filename: $app.base + $app.name + '.log', mode : 'a' } ]
  socket:   [ `handlers.SocketHandler`, {
                  host: localhost,
                  # use this port for now
                  port: `handlers.DEFAULT_TCP_LOGGING_PORT`} ]
  nt_eventlog: [`handlers.NTEventLogHandler`, { appname: $app.name, logtype : Application } ]
  email:    [ `handlers.SMTPHandler`,
              { level: CRITICAL,
                host: localhost,
                port: 25,
                from: $app.name + $app.mail_domain,
                to: [$app.support_team + $app.mail_domain, 'QA' + $app.mail_domain, 'product_manager' + $app.mail_domain],
                subject: 'Take cover' } ]
}
# the loggers which are configured
loggers:
{
  "input"     : { handlers: [$handlers.socket] }
  "input.xls" : { handlers: [$handlers.nt_eventlog] }
}

Given the above, the program:

from config import Config

cfg = Config(file('app.cfg'))
f = open('test.txt', 'w')
cfg.save(f)
f.close()
f = open('testlog.txt', 'w')
cfg.logging.save(f)
f.close()
f = open('root.txt', 'w')
cfg.logging.root.save(f)
f.close()
import logging, logging.handlers
cfg.addNamespace(logging)
print cfg.logging.loggers['input.xls'].handlers[0][0]
print cfg.logging.handlers.console[1].stream
print cfg['logging']['handlers']['console'][1]['stream']
print cfg.logging.handlers.email[1]['from']
x = cfg.logging.handlers.email[1].to
print x
for a in x:
    print a
print x[0:2]
print cfg.logging.handlers.file[1].filename

Prints the following:

logging.handlers.NTEventLogHandler
<open file '<stderr>', mode 'w' at 0x0088E0A0>
<open file '<stderr>', mode 'w' at 0x0088E0A0>
MyApplication@my-company.com
['myappsupport@my-company.com', 'QA@my-company.com', 'product_manager@my-company.com']
myappsupport@my-company.com
QA@my-company.com
product_manager@my-company.com
['myappsupport@my-company.com', 'QA@my-company.com']
/path/to/app/logs/MyApplication.log

You will see from the code of the above program that there are a number of ways of accessing portions of the configuration, and you will also see that parts of the configuration have been written out. Here is test.txt, which was used when writing out the whole configuration:

# application configuration
app :
{
  name : 'MyApplication'
  base : '/path/to/app/logs/'
  # support team email address
  support_team : 'myappsupport'
  mail_domain : '@my-company.com'
}
# logging for the app
logging :
{
  # root logger configuration
  root :
  {
    level : 'DEBUG'
    handlers :
    [
      $handlers.console
      $handlers.file
      $handlers.email
    ]
  }
  # logging handlers
  handlers :
  {
    console :
    [
      # the class to instantiate
      StreamHandler
      # how to configure the instance
      {
        # the logger level
        level : 'WARNING'
        # the stream to use
        stream : `sys.stderr`
      }
    ]
    file :
    [
      FileHandler
      {
        filename : $app.base + $app.name + '.log'
        mode : 'a'
      }
    ]
    socket :
    [
      `handlers.SocketHandler`
      {
        host : 'localhost'
        # use this port for now
        port : `handlers.DEFAULT_TCP_LOGGING_PORT`
      }
    ]
    nt_eventlog :
    [
      `handlers.NTEventLogHandler`
      {
        appname : $app.name
        logtype : 'Application'
      }
    ]
    email :
    [
      `handlers.SMTPHandler`
      {
        level : 'CRITICAL'
        host : 'localhost'
        port : 25
        from : $app.name + $app.mail_domain
        to :
        [
          $app.support_team + $app.mail_domain
          'QA' + $app.mail_domain
          'product_manager' + $app.mail_domain
        ]
        subject : 'Take cover'
      }
    ]
  }
  # the loggers which are configured
  loggers :
  {
    input :
    {
      handlers :
      [
        $handlers.socket
      ]
    }
    'input.xls' :
    {
      handlers :
      [
        $handlers.nt_eventlog
      ]
    }
  }
}
test : $logging.handler.email.from

You will see that the entire configuration (including the included file) has been written out, and that ordering and comments have been preserved. If we examine testlog.txt, into which the logging part of the configuration was written, we see:

# root logger configuration
root :
{
  level : 'DEBUG'
  handlers :
  [
    $handlers.console
    $handlers.file
    $handlers.email
  ]
}
# logging handlers
handlers :
{
  console :
  [
    # the class to instantiate
    StreamHandler
    # how to configure the instance
    {
      # the logger level
      level : 'WARNING'
      # the stream to use
      stream : `sys.stderr`
    }
  ]
  file :
  [
    FileHandler
    {
      filename : $app.base + $app.name + '.log'
      mode : 'a'
    }
  ]
  socket :
  [
    `handlers.SocketHandler`
    {
      host : 'localhost'
      # use this port for now
      port : `handlers.DEFAULT_TCP_LOGGING_PORT`
    }
  ]
  nt_eventlog :
  [
    `handlers.NTEventLogHandler`
    {
      appname : $app.name
      logtype : 'Application'
    }
  ]
  email :
  [
    `handlers.SMTPHandler`
    {
      level : 'CRITICAL'
      host : 'localhost'
      port : 25
      from : $app.name + $app.mail_domain
      to :
      [
        $app.support_team + $app.mail_domain
        'QA' + $app.mail_domain
        'product_manager' + $app.mail_domain
      ]
      subject : 'Take cover'
    }
  ]
}
# the loggers which are configured
loggers :
{
  input :
  {
    handlers :
    [
      $handlers.socket
    ]
  }
  'input.xls' :
  {
    handlers :
    [
      $handlers.nt_eventlog
    ]
  }
}

which is just the logging configuration. If we examine root.txt, we see the portion relating to the root logger:

level : 'DEBUG'
handlers :
[
  $handlers.console
  $handlers.file
  $handlers.email
]
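The layout of the saved files follows a regular pattern: scalars on the same line as their key, containers opening on the next line, two spaces of indentation per level, and strings written in quotes. The sketch below reproduces that pattern for plain dicts and lists. It is an approximation under assumptions: the function names are hypothetical, it relies on dictionaries preserving insertion order (Python 3.7 and later), and unlike the module it neither keeps comments nor writes references such as $handlers.console unevaluated.

```python
import re

IDENTIFIER = re.compile(r'[A-Za-z_]\w*\Z')

def format_key(key):
    # Plain identifiers are written bare; other keys are quoted, as with
    # 'input.xls' in the listings above.
    if IDENTIFIER.match(key):
        return key
    return repr(key)

def format_value(value, indent):
    """Return the lines for a value written without a key."""
    pad = '  ' * indent
    if isinstance(value, dict):
        return [pad + '{'] + format_mapping(value, indent + 1) + [pad + '}']
    if isinstance(value, list):
        lines = [pad + '[']
        for item in value:
            lines.extend(format_value(item, indent + 1))
        return lines + [pad + ']']
    return [pad + repr(value)]  # strings come out quoted, numbers bare

def format_mapping(mapping, indent=0):
    """Return the lines for each key of mapping, in insertion order."""
    pad = '  ' * indent
    lines = []
    for key, value in mapping.items():
        if isinstance(value, (dict, list)):
            # Containers open on the line after the key, as in test.txt.
            lines.append('%s%s :' % (pad, format_key(key)))
            lines.extend(format_value(value, indent))
        else:
            lines.append('%s%s : %r' % (pad, format_key(key), value))
    return lines

root = {'level': 'DEBUG', 'handlers': ['console', 'file', 'email']}
text = '\n'.join(format_mapping(root))
```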

8   Changing a configuration

There's not much point in being able to save a configuration programmatically if you can't make changes to it programmatically. Changes can be made using standard attribute syntax. For example, given the file:

messages :
[
  {
    stream : `sys.stdin`
    message : 'Welcome'
    name : 'Harry'
  }
  {
    stream : `sys.stdout`
    message : 'Welkom'
    name : 'Ruud'
  }
  {
    stream : $messages[0].stream
    message : 'Bienvenue'
    name : 'Yves'
  }
]

the following program could be used to modify the configuration and save the changes:

from config import Config

f = file('simple.cfg')
cfg = Config(f)

cfg.written = 1234
cfg.messages[2].surname = 'Montand'

f = file('test.txt', 'w')
cfg.save(f)
f.close()
print 'written' in cfg
print 'writen' in cfg
print 'surname' in cfg.messages[2]
print 'xyzzy' in cfg.messages[2]

With the printed output:

True
False
True
False

and the output file:

messages :
[
  {
    stream : `sys.stdin`
    message : 'Welcome'
    name : 'Harry'
  }
  {
    stream : `sys.stdout`
    message : 'Welkom'
    name : 'Ruud'
  }
  {
    stream : $messages[0].stream
    message : 'Bienvenue'
    name : 'Yves'
    surname : 'Montand'
  }
]
written : 1234
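Attribute-style assignment and the in operator can be supported on a dictionary-backed object with __getattr__, __setattr__ and __contains__. The class below, AttrMapping, is a hypothetical minimal sketch of that idea; the config module's own classes do considerably more, such as keeping order, comments and references.

```python
class AttrMapping(object):
    """Minimal mapping with attribute access, in the spirit of cfg.written."""

    def __init__(self, data=None):
        # Bypass our own __setattr__ to create the backing dict.
        object.__setattr__(self, '_data', dict(data or {}))

    def __getattr__(self, name):
        # Called only when normal lookup fails, i.e. for configuration keys.
        try:
            return self._data[name]
        except KeyError:
            raise AttributeError(name)

    def __setattr__(self, name, value):
        self._data[name] = value   # cfg.written = 1234 adds a key

    def __getitem__(self, key):
        return self._data[key]

    def __contains__(self, key):
        return key in self._data   # supports 'written' in cfg

cfg = AttrMapping({'messages': [AttrMapping({'name': 'Yves'})]})
cfg.written = 1234                    # adds a new key
cfg.messages[0].surname = 'Montand'   # adds a key to a nested mapping
'written' in cfg                      # True
'writen' in cfg                       # False
```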

9   Cascading configurations

There may be times when you want to cascade configurations - e.g. at the suite, program and user level. When a value is required, you could check the user configuration first, then the configuration at program level, and finally at program suite level. To do this, you can use the handy ConfigList class, as in the following example:

from config import Config, ConfigList

cfglist = ConfigList()
cfglist.append(Config(file('/path/to/user.cfg')))
cfglist.append(Config(file('/path/to/program.cfg')))
cfglist.append(Config(file('/path/to/suite.cfg')))

To access a configuration value (e.g. verbosity), you can say:

cfglist.getByPath('verbosity')

and the value from the first configuration which defines verbosity will be returned.
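The first-match rule behind ConfigList can be sketched with plain dictionaries ordered from most to least specific. cascade_get is a hypothetical helper; returning a default when no level defines the path is a choice made for this sketch and may differ from what getByPath on a ConfigList does.

```python
def cascade_get(configs, path, default=None):
    """Return the value at path from the first config that defines it."""
    for cfg in configs:
        value = cfg
        try:
            for key in path.split('.'):
                value = value[key]
        except (KeyError, TypeError):
            continue  # not defined at this level; fall back to the next
        return value
    return default

user = {'verbosity': 2}
program = {'verbosity': 1, 'log': {'level': 'INFO'}}
suite = {'verbosity': 0, 'log': {'level': 'WARNING'}, 'timeout': 30}

levels = [user, program, suite]    # most specific first
cascade_get(levels, 'verbosity')   # 2, from the user level
cascade_get(levels, 'log.level')   # 'INFO', from the program level
cascade_get(levels, 'timeout')     # 30, from the suite level
```

For flat keys, Python 3's collections.ChainMap gives the same first-match behaviour: ChainMap(user, program, suite)['verbosity'] looks in user first.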

This technique can also be used where you want to override configuration values with command-line values. See the section below entitled "Integrating with command-line options" for how the config module can be used with the standard library's optparse module.

10   Merging configurations

There are two ways in which configurations can be merged:

To see how to use ConfigMerger, suppose you have two files, merge1.cfg and merge2.cfg, shown below:

value1: True
value3: [1, 2, 3]
value5: [7]
value6: { 'a' : 1, 'c' : 3 }

and:

value2: False
value4: [4, 5, 6]
value5: ['abc']
value6: { 'b' : 2, 'd' : 4 }

The following program:

from config import Config, ConfigMerger

f = file('merge1.cfg')
cfg1 = Config(f)
f = file('merge2.cfg')
cfg2 = Config(f)
merger = ConfigMerger()
merger.merge(cfg1, cfg2)
f = file('test.txt', 'w')
cfg1.save(f)

results in the following file being saved:

value1 : True
value3 :
[
  1
  2
  3
]
value5 :
[
  7
  'abc'
]
value6 :
{
  a : 1
  c : 3
  b : 2
  d : 4
}
value2 : False
value4 :
[
  4
  5
  6
]

As you can see, the keys have been merged, and the sequence elements have been appended.

Starting with V0.3.6, ConfigMerger takes in its constructor an optional resolver argument (a default resolver is provided, so that the behaviour is the same as in earlier versions). The resolver can be any callable which is called with three arguments and returns a string. The arguments are map1, map2 and key, where map1 is the target mapping for the merge, map2 is the merge operand and key is the clashing key. If a clash occurs (key is in both map1 and map2), the resolver is called to try to resolve the conflict. The string it returns names the action to take, for example "merge" (suitable for two mappings) or "append" (suitable for two sequences).

Care should be taken to return a value compatible with the objects being merged. For example, it doesn't make sense to return "merge" when dealing with two sequences, or "append" when dealing with two mappings.
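The merge rules shown above (mappings merged key by key, sequences appended) and the resolver protocol can be sketched with plain dicts and lists. The function names are hypothetical. The "merge" and "append" actions come from the description above; the "overwrite" and "mismatch" actions, and raising ValueError on a mismatch, are assumptions of this sketch and not necessarily what ConfigMerger does.

```python
def default_resolver(map1, map2, key):
    """Choose an action for a key present in both mappings."""
    a, b = map1[key], map2[key]
    if isinstance(a, dict) and isinstance(b, dict):
        return 'merge'
    if isinstance(a, list) and isinstance(b, list):
        return 'append'
    return 'mismatch'

def prefer_operand(map1, map2, key):
    """A custom resolver: like the default, but map2 wins on a mismatch."""
    action = default_resolver(map1, map2, key)
    return 'overwrite' if action == 'mismatch' else action

def merge(map1, map2, resolver=default_resolver):
    """Merge map2 into map1 in place, consulting resolver on clashes."""
    for key, value in map2.items():
        if key not in map1:
            map1[key] = value   # shares the object from map2
            continue
        action = resolver(map1, map2, key)
        if action == 'merge':
            merge(map1[key], value, resolver)
        elif action == 'append':
            map1[key].extend(value)
        elif action == 'overwrite':
            map1[key] = value
        else:
            raise ValueError('cannot merge %r: %r vs %r' % (key, map1[key], value))

# The contents of merge1.cfg and merge2.cfg as plain Python values
cfg1 = {'value1': True, 'value3': [1, 2, 3], 'value5': [7],
        'value6': {'a': 1, 'c': 3}}
cfg2 = {'value2': False, 'value4': [4, 5, 6], 'value5': ['abc'],
        'value6': {'b': 2, 'd': 4}}
merge(cfg1, cfg2)   # cfg1['value5'] is now [7, 'abc']
```

With dictionaries that keep insertion order (Python 3.7 and later), the keys of cfg1 end up in the same order as in the saved file above.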

11   Integrating with command-line options

It's fairly easy to integrate command line options with configurations read from files. We use the standard library's excellent optparse module to parse the command line for options, and make those options available to the application through the config API. Here's an example configuration file (cmdline.cfg):

cmdline_values:
{
  verbose : `cmdline.verbose`
  file: `cmdline.filename`
}
other_config_items:
{
  whatever : 'you want'
}

The program which demonstrates optparse integration is below:

from optparse import OptionParser
from config import Config

parser = OptionParser()
parser.add_option("-f", "--file",
                action="store", type="string", dest="filename",
                help="write report to FILE", metavar="FILE")
parser.add_option("-q", "--quiet",
                action="store_false", dest="verbose", default=1,
                help="don't print status messages to stdout")

(options, args) = parser.parse_args()

cfg = Config(file('cmdline.cfg'))
cfg.addNamespace(options, 'cmdline')
print "The verbose option value is %r" % cfg.cmdline_values.verbose
print "The file name is %r" % cfg.cmdline_values.file

Once we've parsed the command-line options using optparse and loaded the configuration, we add the parsed-options object as a namespace with name cmdline. When we then fetch cfg.cmdline_values.verbose, for example, this causes evaluation of cmdline.verbose against the configuration's namespaces, and fetches the appropriate value from the parsed-option object. The program, when run with arguments -q -f test, will print:

The verbose option value is False
The file name is 'test'
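Registering the parsed options under the name cmdline means that cmdline.verbose is resolved as an attribute chain starting from that object. The sketch below imitates that without the config module: Namespace and resolve are hypothetical stand-ins, and an explicit argument list replaces sys.argv so that the result is predictable.

```python
from optparse import OptionParser

class Namespace(object):
    """Attribute holder standing in for the configuration's namespaces."""

def resolve(expr, root):
    """Follow a dotted name such as 'cmdline.verbose' from root."""
    value = root
    for name in expr.split('.'):
        value = getattr(value, name)
    return value

parser = OptionParser()
parser.add_option("-f", "--file", action="store", type="string",
                  dest="filename", metavar="FILE")
parser.add_option("-q", "--quiet", action="store_false", dest="verbose",
                  default=True)

# An explicit argument list is parsed here instead of sys.argv.
options, args = parser.parse_args(['-q', '-f', 'test'])

namespaces = Namespace()
namespaces.cmdline = options   # register the options under the name 'cmdline'
verbose = resolve('cmdline.verbose', namespaces)    # False
filename = resolve('cmdline.filename', namespaces)  # 'test'
```

The same approach should work with argparse, whose parse_args also returns an object with one attribute per option.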

12   Uniform component configuration

You can use the config module to initialize a component hierarchy in a uniform manner. Typically, in a component, you initialize various attributes, some of which are other components. Suppose you have a component of class NetworkHandler which contains a particular subcomponent which is either of class HTTPHandler or of class FTPHandler. You could have a hierarchical configuration as follows:

netHandler:
{
  host: 'alpha'
  port: 8080
  protocol:
  {
    class: `HTTPHandler`
    config:
    {
      secure: True
      version: '1.1'
      keepAlive: True
    }
  }
}

You could define the initialization of these classes as:

class HTTPHandler:
  def __init__(self, config):
    self.secure = config.get('secure', False)
    self.version = config.get('version', '1.0')
    self.keepAlive = config.get('keepAlive', False)

class NetworkHandler:
  def __init__(self, config):
    self.host = config.get('host', 'localhost')
    self.port = config.get('port', 80)
    protocolClass = config.protocol.get('class')
    if protocolClass is None:
      raise ValueError('NetworkHandler: protocol class not specified')
    protocolConfig = config.protocol.get('config', {})
    # keep a reference to the subcomponent so that it isn't discarded
    self.protocolHandler = protocolClass(protocolConfig)

and then a NetworkHandler could be initialized as follows:

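import sys
from config import Config

def makeNetworkHandler():
  cfg = Config('network.cfg')
  # `HTTPHandler` in the file is evaluated against the configuration's
  # namespaces, so make the classes defined in this module visible to it
  cfg.addNamespace(sys.modules[__name__])
  return NetworkHandler(cfg.netHandler)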

In this scheme, each class has a constructor which takes a single argument - a configuration mapping. Subcomponents can be passed the appropriate mapping without the constructing class needing to know its schema. In the above example, NetworkHandler neither knows nor cares about the exact contents of the mapping with path netHandler.protocol.config. The creator of the configuration file need only ensure that the mapping makes sense to the class being constructed - HTTPHandler in this case. If FTP were to be used instead, the netHandler.protocol mapping might look like this:

protocol:
{
  class: `FTPHandler`
  config:
  {
    maxSize: 1048576
  }
}

which could be used with a class initialized like this:

class FTPHandler:
  def __init__(self, config):
    self.maxSize = config.get('maxSize', 32768)

You can see a more complete example of this in the files logconfig.cfg and logconfig.py, which configure logging using a scheme very like that described above.
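To try the pattern without the config module, plain dictionaries can stand in for configuration mappings, with the class object placed directly in the mapping where the file would use a backtick expression. This is a hypothetical stand-in for illustration, not how the module itself builds objects.

```python
class HTTPHandler(object):
    def __init__(self, config):
        self.secure = config.get('secure', False)
        self.version = config.get('version', '1.0')
        self.keepAlive = config.get('keepAlive', False)

class FTPHandler(object):
    def __init__(self, config):
        self.maxSize = config.get('maxSize', 32768)

class NetworkHandler(object):
    def __init__(self, config):
        self.host = config.get('host', 'localhost')
        self.port = config.get('port', 80)
        protocol = config['protocol']  # a Config would allow config.protocol
        protocolClass = protocol.get('class')
        if protocolClass is None:
            raise ValueError('NetworkHandler: protocol class not specified')
        # Hand the subcomponent its own mapping without inspecting it.
        self.protocolHandler = protocolClass(protocol.get('config', {}))

# Plain dicts standing in for the netHandler mapping; the class object
# replaces the backtick expression `FTPHandler`.
netHandler = {
    'host': 'alpha',
    'port': 8080,
    'protocol': {'class': FTPHandler, 'config': {'maxSize': 1048576}},
}
handler = NetworkHandler(netHandler)
```

Swapping FTPHandler for HTTPHandler in the mapping needs no change to NetworkHandler, which is the point of the scheme.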

13   Unicode support

Unicode support for reading files is provided through the ConfigInputStream class. This is used automatically by defaultStreamOpener. ConfigInputStream automatically recognizes BOMs (byte order marks) for UTF-8, UTF-16LE and UTF-16BE. If a BOM is present, it is used to determine how the stream is to be decoded. If there is no BOM recognized, the stream is treated as a non-Unicode stream, assumed to be in the correct encoding, and read without decoding. Example of use:

from config import ConfigInputStream

for filename in ['ANSI.txt', 'Unicode8.txt', 'UnicodeLE.txt', 'UnicodeBE.txt']:
    pathname = '/temp/' + filename
    stream = file(pathname, 'rb')
    print "- raw contents of %s:" % pathname
    print repr(stream.read(6))
    print repr(stream.readline())
    stream.close()
    stream = ConfigInputStream(file(pathname, 'rb'))
    print "- decoded contents of %s, encoding = %s:" % (pathname, stream.encoding)
    print repr(stream.read(6))
    print repr(stream.readline())
    stream.close()

which produces:

- raw contents of /temp/ANSI.txt:
'Test\r\n'
'Line\r\n'
- decoded contents of /temp/ANSI.txt, encoding = None:
'Test\r\n'
'Line\r\n'
- raw contents of /temp/Unicode8.txt:
'\xef\xbb\xbfTes'
't\r\n'
- decoded contents of /temp/Unicode8.txt, encoding = utf-8:
u'Test\r\n'
u'Line\r\n'
- raw contents of /temp/UnicodeLE.txt:
'\xff\xfeT\x00e\x00'
's\x00t\x00\r\x00\n'
- decoded contents of /temp/UnicodeLE.txt, encoding = utf-16le:
u'Test\r\n'
u'Line\r\n'
- raw contents of /temp/UnicodeBE.txt:
'\xfe\xff\x00T\x00e'
'\x00s\x00t\x00\r\x00\n'
- decoded contents of /temp/UnicodeBE.txt, encoding = utf-16be:
u'Test\r\n'
u'Line\r\n'
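BOM detection of the kind ConfigInputStream performs can be sketched with the BOM constants in the standard codecs module. sniff_bom and decode_if_bom are hypothetical names; as in the description above, only the UTF-8, UTF-16LE and UTF-16BE marks are recognised, and input without a BOM is returned undecoded.

```python
import codecs

# The three marks described above, paired with Python codec names.
BOMS = [
    (codecs.BOM_UTF8, 'utf-8'),
    (codecs.BOM_UTF16_LE, 'utf-16le'),
    (codecs.BOM_UTF16_BE, 'utf-16be'),
]

def sniff_bom(data):
    """Return (encoding, bom_length) for the leading bytes, or (None, 0)."""
    for bom, encoding in BOMS:
        if data.startswith(bom):
            return encoding, len(bom)
    return None, 0

def decode_if_bom(data):
    """Decode the bytes after a BOM; without a BOM, return data unchanged."""
    encoding, skip = sniff_bom(data)
    if encoding is None:
        return data
    return data[skip:].decode(encoding)

sniff_bom(b'\xff\xfeT\x00e\x00')   # ('utf-16le', 2)
```

One limitation of this simple check: a UTF-32LE file would be taken for UTF-16LE, since its BOM begins with the same two bytes.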

Unicode support for writing files is provided through the ConfigOutputStream class. Here is an example of how to use it:

from config import Config, ConfigOutputStream

cfg = Config('app.cfg')
f = open('root.txt', 'wb')
cfg.save(ConfigOutputStream(f, 'utf-16be'))
f.close()

If the encoding is one of utf-8, utf-16le or utf-16be, the appropriate BOM is written to the output. Note that the underlying stream should be opened in binary mode; newlines are automatically written as '\r\n' (Windows), '\r' (Mac) or '\n' (other).
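The writing side can be sketched the same way: translate newlines, encode, and prefix the BOM for the three encodings named above. encode_for_output is a hypothetical helper; it uses Python's utf-16le and utf-16be codecs, which (unlike plain utf-16) do not emit a BOM of their own, so the mark is added explicitly.

```python
import codecs

# Byte order marks for the encodings that get one, per the text above.
BOM_FOR = {
    'utf-8': codecs.BOM_UTF8,
    'utf-16le': codecs.BOM_UTF16_LE,
    'utf-16be': codecs.BOM_UTF16_BE,
}

def encode_for_output(text, encoding, newline='\n'):
    """Return bytes: the BOM (if any), then text with newlines translated."""
    data = text.replace('\n', newline).encode(encoding)
    return BOM_FOR.get(encoding.lower(), b'') + data

# Windows-style line endings, big-endian UTF-16 with its BOM
encoded = encode_for_output(u'level : 1\n', 'utf-16be', '\r\n')
```

Passing os.linesep as the newline argument would pick the current platform's convention.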

14   Download

Here is the current version, 0.3.7, in tarball, zip and Windows formats.

15   Further work

The config module is in a very early state, though it is already quite usable. The syntax is broadly fixed, though adjustments may be made to e.g. backticks, $ notation for references, and the @ notation for file inclusion, depending on feedback. The use of builtin functions - e.g. include("x"), get(an.attribute.chain), evaluate(sys.stderr) - has been considered, and not yet completely ruled out. Unicode support could be improved. No doubt there are bugs in the implementation, awaiting the completion of a more comprehensive test suite. Some minor changes in the API can be expected. All feedback will be gratefully received; please send it to vinay_sajip at red-dove.com or post it on the Python Wiki on the HierConfig page: http://www.python.org/moin/HierConfig.