Repex – Regex Management Simplified

>This post was originally published on DevOps.com.
Our Versioning Complexity
Cloudify is a Python shop.
Our REST service is Python.
Our Workflow Engine is Python.
Our Plugins are Python.
We have different version formats and different dependencies across different types of files which need to be changed when a version is updated.
This is complex, to say the least. With more than 60 repos to take care of, we didn’t want to manage version updates manually.
Why not sed?
- Well... sed is a bitch, isn’t it? It’s unmaintainable and unmanageable. While this is solely my opinion, handling hundreds of files, each changed in a different context and without any safety net, makes sed something you should walk away from.
- sed isn’t configurable.
Why not Jinja?
Using Jinja templates, while safe, was out the window pretty much instantly: we needed the repos to be usable out of the box, or we’d be forcing users to render templates before they could use our product.
The Solution
We turned to Regex. Yes, there is the regex problem. You’re probably familiar with the famous quote:
–Some people, when confronted with a problem, think “I know, I’ll use regular expressions.” Now they have two problems.
While this can be true, it also depends on how you use regex.
I believe we’ve done this well.
Repex
We developed repex.
repex allows you to specify the structure of changes you need to perform in YAML format:
variables:
You define a list of paths you’d like to iterate over, tell repex what to search for, what to replace and what to replace it with, add a few more parameters, and off you go.
For example, the first object in the above configuration will:
- Look for a file in the path repex/tests/resources/mock_VERSION (regex)
- In the file, look for `"version": "3.1.0-m2"` (regex)
- Attempt to replace 3.1.0-m2 with 3.1.0-m3 only if the string was found during the matching stage and if the strings `date`, `commit` and `version` were found in the file.
- Write to the output file repex/tests/resources/mock_VERSION.test
The second object will:
- Look for `setup.py` (regex) files under `cloudify-.*` (regex) and not under `my/excluded/path`.
- Look for `version=('|")(\d+)(\.\d+){1,2}(dev|(\w+\d+)?)('|")` (regex) in the files found.
- Replace `('|")(\d+)(\.\d+){1,2}(dev|(\w+\d+)?)('|")` (regex) with a variable which is not hardcoded and therefore must be provided using the API.
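To make the second object concrete, here is a minimal Python sketch that uses only the standard `re` module. It is not repex’s own code: the two patterns assume the escaped forms `\d`, `\.` and `\w`, and `bump_setup_py` and the sample text are hypothetical names made up for illustration.

```python
import re

# Assumed forms of the two patterns described above.
MATCH = r"""version=('|")(\d+)(\.\d+){1,2}(dev|(\w+\d+)?)('|")"""
REPLACE = r"""('|")(\d+)(\.\d+){1,2}(dev|(\w+\d+)?)('|")"""


def bump_setup_py(text, new_version):
    """Replace the quoted version, but only inside a `version=...` match."""
    def _swap(context):
        # Work on the matched context only; keep the original quote characters.
        return re.sub(
            REPLACE,
            lambda m: m.group(1) + new_version + m.group(6),
            context.group(0),
        )
    return re.sub(MATCH, _swap, text)


setup_py = "name='cloudify-example',\nversion='3.1a2',\napi_level='2.0',\n"
bumped = bump_setup_py(setup_py, "3.1a3")
print(bumped)
```

Note that `api_level='2.0'` fits the replace pattern on its own, yet it is left untouched because it never appears inside a `version=` match. That is the point of matching the context first.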
Layers of protection
repex attempts to provide layers of protection and comfort:
Safe!
- You can first `match` whatever you want to match so that you’re sure you’re only replacing something in the exact context you wish to address. If `validate_before` is true, replacement will only occur if one or more matches were found.
- You can verify that only files containing very specific strings will be addressed by listing them under `must_include`.
- You can exclude files or directories you don’t want to search in by listing them under `excluded`.
Don’t take this the wrong way though – regex is regex. You make a mistake, you pay for it. All repex grants you is the ability to have everything organized in one place, with some layers of protection to keep you from making some trivial mistakes.
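The three safeguards listed under Safe! can be sketched in a few lines of plain Python. This is an illustration of the idea rather than repex’s implementation; `is_excluded` and `safe_replace` are hypothetical helpers, and the sample file content is invented.

```python
import re


def is_excluded(path, excluded):
    """Return True if path is an excluded entry or lies under one."""
    return any(
        path == entry or path.startswith(entry.rstrip("/") + "/")
        for entry in excluded
    )


def safe_replace(text, match, replace, new, must_include=(), validate_before=True):
    """Return (new_text, changed), running the checks before replacing."""
    # Skip files that lack any of the required strings.
    if any(required not in text for required in must_include):
        return text, False
    # With validate_before, require at least one match of the context pattern.
    if validate_before and re.search(match, text) is None:
        return text, False
    # Replace only inside the matched context, not across the whole file.
    result = re.sub(
        match,
        lambda m: re.sub(replace, lambda _: new, m.group(0)),
        text,
    )
    return result, result != text


version_file = '{\n  "date": "",\n  "commit": "",\n  "version": "3.1.0-m2"\n}\n'
updated, changed = safe_replace(
    version_file,
    match=r'"version": "3\.1\.0-m2"',
    replace=r"3\.1\.0-m2",
    new="3.1.0-m3",
    must_include=["date", "commit", "version"],
)
```

A file that lacks one of the `must_include` strings, or one where the match pattern is not found, comes back unchanged with `changed` set to `False`.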
Comfortable
You can declare every single directory or file you’d like to address in one manageable YAML.
Of course, we needed to create some kind of naming convention so that we don’t have to configure each and every file separately.
Accepts Variables, both hardcoded and via an API.
One of repex’s strongest features is that it supports using variables for string replacement. The above example contains a `` variable.
The API allows you to provide a dict of variables which will be placed in placeholders. You can either hardcode a variable in the `variables` section or use the API to send a dict of variables to be used. If a variable is hardcoded in the YAML and you use the API to send a variable with the same name, the one sent via the API overrides the hardcoded one. In addition, validation runs after variable expansion to make sure every placeholder was actually replaced.
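The precedence and validation rules can be sketched as follows. This is a standalone illustration, not repex’s code: the `{{ .name }}` placeholder syntax is an assumption made for the example, and `resolve_variables` and `expand` are hypothetical helpers.

```python
import re

# Assumed placeholder syntax for this sketch: {{ .name }}
PLACEHOLDER = re.compile(r"\{\{\s*\.(\w+)\s*\}\}")


def resolve_variables(hardcoded, provided=None):
    """Merge both sources; variables passed in at call time win."""
    merged = dict(hardcoded)
    merged.update(provided or {})
    return merged


def expand(value, variables):
    """Fill in placeholders, then verify that none were left behind."""
    expanded = PLACEHOLDER.sub(
        # Unknown names are left as-is so the check below can catch them.
        lambda m: str(variables.get(m.group(1), m.group(0))),
        value,
    )
    leftover = PLACEHOLDER.search(expanded)
    if leftover is not None:
        raise ValueError("variable not replaced: %s" % leftover.group(0))
    return expanded


variables = resolve_variables({"version": "3.1.0-m2"}, {"version": "3.1.0-m3"})
replacement = expand("{{ .version }}", variables)
```

Leaving unknown names in place and checking afterwards keeps the validation step separate from the lookup, which mirrors the order described above: replace first, then verify.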
Functional API
The API provides three basic functions. I’d rather not rewrite repex’s documentation, so just refer to it for the details.
Basically, the three functions let you decide at which granularity you’d like to handle the changes, so that you can perform actions at different stages of the replacement process.
For example, we don’t really use the highest level of the API, which just iterates over all objects in the YAML. We wrote a Cloudify-specific wrapper which can be used as a reference implementation. It enables validation after every replacement. So, for instance, after every blueprint-specific replacement, we validate the blueprint against our DSL parser.
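The idea of validating after every replacement can be sketched like this. It is not the Cloudify wrapper and does not call repex: `run_with_validation` and `looks_like_blueprint` are hypothetical, files are held in a dict instead of on disk, and the validator is a trivial stand-in for a real parser.

```python
import re


def run_with_validation(objects, files, validators):
    """Apply each object to an in-memory file map, validating after each one."""
    for obj in objects:
        path = obj["path"]
        new_value = obj["with"]
        updated = re.sub(obj["replace"], lambda _m: new_value, files[path])
        validator = validators.get(obj["type"])
        # Stop at the first replacement that leaves the file invalid.
        if validator is not None and not validator(updated):
            raise ValueError("validation failed for %s" % path)
        files[path] = updated
    return files


def looks_like_blueprint(text):
    # Stand-in check; a real wrapper would call an actual parser here.
    return "node_templates:" in text


files = {"blueprint.yaml": "plugin_version: 3.1a2\nnode_templates: {}\n"}
objects = [{
    "type": "blueprint",
    "path": "blueprint.yaml",
    "replace": r"3\.1a2",
    "with": "3.1a3",
}]
result = run_with_validation(objects, files, {"blueprint": looks_like_blueprint})
```

Because the file map is only updated after the validator passes, a failing replacement leaves the previous content in place.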
Repex is available on PyPI. You can install it by running:
pip install repex
We might, at some point, provide a CLI for repex. A logging feature might also be implemented to generate JSON-formatted log messages that can be shipped elsewhere for analysis.
After using repex throughout our entire build process for the last five months or so, it has proven to be extremely useful. We can now follow our entire version update process and easily identify problems by looking at one YAML file and one execution log file. We’re planning on using repex to replace additional types of data (not only versions).
We would appreciate feedback and, even better, pull requests.
Hope you find this useful.