One Site Subversion
So, we have a more or less working blog application using our friends Merb and DataMapper. That’s great and if you were looking for a Merb/DataMapper tutorial, hopefully the first entry helped. Still, the central goal is to store the articles in a source-control repository. So, let’s get going on that.
For now, I’m going to ignore the multi-site requirement, for two reasons: I want to first focus on just interacting with Subversion, without extra distractions; and I happen to know that I want to write the library that will be covered in this section for other uses.
Before diving into the code, I want to examine three “big” design questions.
-
How much abstraction? I could create a library that abstracts so that it presents a unified API for multiple SCMs. But I won’t. Again, it complicates things. Also, the “Subversion” stuff will not be accessed from many points within the app, so I feel fairly safe with the possibility of future “API changes” if I decide to abstract it later.
-
What SCM? This is an easy call for me. I’m accustomed to Subversion, including having some experience using its SWIG bindings. I’ve played a little with git, but at some point I have to cut off the “learning new things on this one app”.
-
How to interact with Subversion. Here’s some possibilities:
-
Command line/backticks: I’m not entirely opposed to this, but since there are better options, no reason to look here.
-
RSCM: This may no longer be true, but from my memory of RSCM, it more or less uses the command line functions. It offers the benefit of being abstracted, but like the command line, it means working with a working copy. Sure, I could have one checked out in a tmp directory, but I don’t care for that idea.
-
post-commit-hooks: This could be pretty useful, and I could see extending Wistle to accept, say, XML or YAML sent by such a hook (it probably would be fairly simple). One downside is that it requires permissions to modify the hook. I don’t actually anticipate this would be a common problem. The second downside is then I don’t get to play :-( Oh, and the third is getting pre-existing data.
-
CSCM: Theoretically, along the lines of RSCM; so far, it’s only for Subversion, but it uses the SWIG bindings. Downsides are that it’s not under active development (with occassional exceptions on my personal copy, I suppose), and that it’s geared towards a different purpose.
-
Using Subversion SWIG bindings directly: Yay! This allows a bit more control and focus than using one of the libraries, and since we don’t need a lot, I don’t think this is reinventing the wheel. Or, maybe, I’m just using an earlier wheel, down one level of abstraction. The big downside to this is that installing the SWIG binding can be a massive pain unless you have a distro that has a nice package; it may well be impossible on Windows…
-
I’ll throw in one more, which is using the svn or Subversion DeltaV–or whatever the correct name is–protocols directly. Neither protocol is particularly frightening, but that path would still be a lot of extra work for probably minimal gain. It also has the downside that you have be running either svnserve or an http server.
Questions answered, I’ll add a few more design thoughts before diving into the code.
One approach, one which I’ve already tried, is to skip the relational database altogether. This is certainly possible, and with caching of the generated pages would be fast enough for my purposes. However, custom text searches are a problem, requiring loading all the current data, then performing the search in Ruby.
Since this is not scalable, my solution is for the application to actually retrieve data from a relational database which mirrors the current state of repository. Therefore, the main functionality I need to add is to update the database after any commit to the repository. Initially, I had the update procedure run at every request but for most requests, this only checked the current revision number. Other options would be a cron job, an “update” button on the site, etc. My current solution is an action, “/articles/sync_all”, and a post-commit hook that wget’s that page.
To derive this updating functionality, I want to include a Module in the appropriate models. I’ll call it Wistle::Svn because I can’t think of a more useful name. I’ll save the file as “lib/wistle/svn.rb”.
The first thing (yes, finally) I want this module to is add some properties to any model which includes it. So, let’s start with:
module Wistle
module Svn
class << self
def included(klass) # Set a few 'magic' properties
klass.property :path, String
klass.property :svn_created_at, DateTime
klass.property :svn_updated_at, DateTime
klass.property :svn_created_rev, String
klass.property :svn_updated_rev, String
klass.property :svn_created_by, String
klass.property :svn_updated_by, String
end
end
end
end
Path will store the relative path in the repository. It will also serve as a permalink later on (Note, path, and the *_by’s were added in later commits than the others. I just went and missed them).
The others are your basic created/updated timestamps except they will be kept in sync with the Subversion repo. This allows for having an #updated_at in the database without interfering with the auto timestamp functionality, etc. Also, we’ll keep track of the revisions. #svn_created_rev is for information only; #svn_updated_rev will be important to the sync method. So, every model that includes Wistle::Svn gets these properties, stored in the relational database. Of course, I’m now assuming that this will only be included in a class that include DataMapper::Resource.
Next up, I want to be able to specify, in the model, which is the “body” property; that is, what property in the relational db should store the contents of the file in Subversion. So, I need to accept an option to the property class method. But before I get there, this introduces a problem. How should I store this configuration data?
If you check out ActiveRecord::Base, for example, you’ll see a lot of lines like this:
cattr_accessor :table_name_prefix, :instance_writer => false
@@table_name_prefix = ""
I’m no expert on Rails internals, but I’ve spent a decent amount of time going through ActiveRecord in particular and this seems to be the preferred Rails’ method for doing class-wide configuration. cattr_accessor is a Rails addition to Ruby (Merb has it as well). Having spent time in ActiveRecord, my first inclination was to use this. And as a methodology, it works pretty well when your inheriting your functionality. Class variables in an included module doesn’t work (at least not in any way I understand).
Instead, I decided to just use a configuration class. It’s simpler and cleaner, in my opinion, and doesn’t have the inclusion problem mentioned above (I’ll get to how that works in a bit). So, let’s start defining that class:
module Wistle
class Config
attr_accessor :body_property
def initialize
# Set defaults
@body_property = 'body'
end
end
end
All it does, for now, is define an instance variable, @body_property (the name of the property in the database that stores the contents of the file) and use :attr_accessor to create the getter and setter methods.
But our model needs access to the Config data. Again, I could try to make it a class variable, but there’s still the problem with class variables in modules. Fortunately, in Ruby, everything is an Object. So, a class can have instance variables.
module Wistle::Svn
module ClassMethods
def config
@config ||= Config.new
end
end
end
Easy enough? I also need to extend the model class with the methods in ClassMethods when the module is included. This is a popular Rails trick. To the Wistle::Svn.included method, add the line klass.extend(ClassMethods). Now, if Article includes Wistle::Svn, we can access the config via #config (in the class), and self.class.config (from instances). And, I can always add custom methods for configuration options that are more likely to be accessed. Now, then, I can update DataMapper’s property class method to accept an option saying that a particular property stores the file’s contents.
module Wistle::Svn
module ClassMethods
def property(name, type, options = {})
if options.delete(:body_property)
config.body_property = name.to_s
end
super(name, type, options)
end
end
end
Using this would be something like:
class Article
property :contents, :body_property => true
end
I’ll look at what Wistle::Svn does with this information when I discuss syncing the databases. Hopefully, I will get to that point eventually.
As an aside, since I don’t anticipate any instance methods in the Wistle::Svn module, I could drop the ClassMethods module and use extend instead of include in my model. But I’ve chosen the include for consistency with DataMapper.
The wistle_models table
Before I can get to syncing, the database will need to know the version of its “working copy”, as it were. Except, I suppose, for the first update. I reckon I need another table in the database that keeps track of the current revision for each Wistle::Svn model. So, ‘lib/wistle/model.rb’:
module Wistle
class Model # Table is named wistle_models.
include DataMapper::Resource
property :id, Integer, :serial => true
property :name, String
property :revision, Integer
end
end
And this file needs to be required in ‘lib/wistle.rb’. Just for fun, let’s run rake dm:db:autoupgrade. Alas, no luck, the new model doesn’t migrate. There’s a good reason why, none of the Wistle module is required when running Merb (As an aside, it just seems more reasonable to me to include Wistle::Model in the Wistle lib instead of directly in the models directory). Add another depencency in init.db, but there’s a gotcha here. This dependency should not be declared until after use_orm :datamapper, because it depends on DataMapper being loaded.
use_orm :datamapper
dependency 'lib/wistle.rb'
Awesome. I guess. You can run that migration now and it should work. And now let’s get our Subversion-y models talking to this model.
module Wistle::Svn
module ClassMethods
def svn_repository
return @svn_repository if @svn_repository
@svn_repository = Wistle::Model.first(:name => self.name)
@svn_repository ||= Wistle::Model.create(:name => self.name, :revision => 0)
@svn_repository.config = config
@svn_repository
end
end
end
Again, I use the Class instance variable trick. I only want to set up @svn_repository when I have to, so if it’s already available, I just return it. Next, I try to get a row in wistle_models that is set up for the current. If no luck there, I create such a row. Finally, I give this Model instance direct access to the Subversion-ized Models @config. Which means one more update to Wistle::Model: attr_accessor :config.
Before hitting the update code, I want to flesh out the Wistle::Config class. The other three configuration elements I want are
- uri
-
The uri of the folder in the Subversion repository where the model's
contents are stored (file:///path/to/repo/path/to/folder,
svn://example.com/path/to/folder, etc.)
- username
-
The Subversion username to use, if needed.
- password
-
The Subversion password to use, if needed.
- property_prefix
-
This addresses a question I didn't ask above. How to deal with properties
other than the contents. I could, for example, start each file with a bit
of yaml or xml or what have. I'm going to store the other properties using
Subversion's property mechanism. However, I want to minimize the chance of
name conflicts, so I provide a setting for a prefix. As a default, I'll use
"ws:" (for Wistle::Svn, I guess).
- extension
-
The extension of files that will be included in the update. This is
certainly not necessary, but it works for me.
class Wistle::Config
OPTS = [:uri, :username, :password,
:body_property, :property_prefix, :extension]
attr_accessor *OPTS
def initialize
# Set defaults
@body_property = 'body'
@property_prefix = 'ws:'
@extension = 'txt'
end
end
The OPTS constant is because I’ll re-use this list momentarily. I also want to be able to set some of these settings in database.yml, if it’s available. At the end of the initialize method, I add:
if Object.const_defined?("Merb")
f = "#{Merb.root}/config/database.yml"
env = Merb.env.to_sym || :development
end
if f
config = YAML.load(IO.read(f))[env]
OPTS.each do |field|
config_field = config["svn_#{field}"] || config["svn_#{field}".to_sym]
if config_field
instance_variable_set("@#{field}", config_field)
end
end
end
Now, in database.yml, I can add :svn\_username: my\_login. That is, I can prefix any of the fields defined above with ‘svn_’. I’m not sure that sentence made sense.
Revision 42
Updating
Hey, it’s time for the central code, sync the database from the repository. If you’re particularly interesting in using Subversion’s SWIG bindings, one of the more interesting parts of this project might be the Wistle::Fixture library, which I use to generate Subversion repository “test fixtures”, but which I won’t cover here. Incidentally, if you are so inclined, the test cases included in Subversion’s repository. The actual code isn’t commented, but it’s “fairly” readable.
I’m putting the syncing code in its own class, because, well, that’s what my brain says I should do. The only initialization argument it requires is a the appropriate row in Wistle::Model. It only provides one other public method, #run, which runs the updating, going through the following steps
- Connect to the repository. See #connect, #context, and #callbacks private methods. Most of what’s going on here is dealing with different authentication options. Honestly, I don’t have a solid understanding of this bit.
- Check if we have updated to the last revision already. If so, quit.
- Run the repository’s #log method. This gets information about each commit, starting with the most recent; I’ve specified to get revisions only through the last update (stored in Wistle::Model#revision). Store this information in the variable changesets.
- Reverse changesets and run #do_changeset on each element.
SvnSync#do_changeset actually updates the database. For each change in the changeset:
- It determines whether the change was one I’m interested in, and if so, what kind of change. There are three types of interest: moves, modifications/adds, and deletes.
- Moves are the most problematic, mostly because Subversion doesn’t really have a “move” concept. Instead were looking for a node that was copied for another node in the same changeset that the latter node was deleted. In this case, as opposed to “just a copy”, I don’t want to create a new entry in the database, but rather modify the path of the existing entry. Why? To not invalidate foreign keys, i.e. to keep comments listed with the article after it’s renamed.
- Next, do any deletes. It’s possible we won’t find the node to delete, either because it was actually a move, or because it refers to a file we don’t keep track of. In that case, just continue on with the next delete.
- Modify/Add/Replace: In all these cases, what I want is to update the content of the appropriate row, creating a new row if needed. The private method #get is responsible for finding the appropriate row, based on the path. This updates contents and other properties, both those specified by the revision and the actual node properties.
- When all changes have been processed, update the Wistle::Model row with the new current revision.
If you aren’t familiar with the SWIG bindings, the code will probably be a bit confusing, but hopefully the outline above will help clarify what’s going on. More to the point, I hope it illustrates that ORM’s are not the only available storage mechanisms for web apps.
So, the code (yikes):
module Wistle
class SvnSync
def initialize(model_row)
@model_row = model_row
@model = Object.const_get(@model_row.name)
@config = @model_row.config
end
# There is the possibility for uneccessary updates, as a database row may be
# modified several times (if modified in multiple revisions) in a single
# call. This is inefficient, but--for now--not enough to justify more
# complex code.
def run
connect unless @repos
return false if @repos.latest_revnum <= @model_row.revision
changesets = [] # TODO Maybe revision + 1
@repos.log(@path_from_root, @repos.latest_revnum, @model_row.revision, 0, true, false
) do |changes, rev, author, date, msg|
changesets << [changes, rev, author, date]
end
changesets.sort{ |a, b| a[1] <=> b[1] }.each do |c| # Sort by revision
do_changset(*c)
end
return true
end
private
# Get the relative path from config.uri
def short_path(path)
path = path[@path_from_root.length..-1]
path = path[1..-1] if path[0] == ?/
path.sub!(/\.#{@config.extension}\Z/, '') if @config.extension
path
end
# Get an object of the @model, by path.
def get(path)
@model.first(:path => short_path(path))
end
# Create a new object of the @model
def new_record
@model.new
end
# Process a single changset.
# This doesn't account for possible move/replace conflicts (A node is moved,
# then the old node is replaced by a new one). I assume those are rare
# enough that I won't code around them, for now.
def do_changset(changes, rev, author, date)
modified, deleted, copied = [], [], []
changes.each_pair do |path, change|
next if short_path(path).blank?
case change.action
when "M", "A", "R" # Modified, Added or Replaced
modified << path if @repos.stat(path, rev).file?
when "D"
deleted << path
end
copied << [path, change.copyfrom_path] if change.copyfrom_path
end
# Perform moves
copied.each do |copy|
del = deleted.find { |d| d == copy[1] }
if del
# Change the path. No need to perform other updates, as this is an
# "A" or "R" and thus is in the +modified+ Array.
record = get(del)
record.update_attributes(:path => short_path(copy[0])) if record
end
end
# Perform deletes
deleted.each do |path|
record = get(path)
record.destroy if record # May have been moved or refer to a directory
end
# Perform modifies and adds
modified.each do |path|
next if @config.extension && path !~ /\.#{@config.extension}\Z/
record = get(path) || new_record
svn_file = @repos.file(path, rev)
# update body
record.__send__("#{@config.body_property}=", svn_file[0])
# update node props -- just find any props with property_prefix
svn_file[1].each do |name, val|
if name =~ /\A#{@config.property_prefix}(.*)/
record.__send__("#{$1}=", val)
end
end
# update revision props
record.path = short_path(path)
record.svn_updated_at = date
record.svn_updated_rev = rev
record.svn_updated_by = author
if record.new_record?
record.svn_created_at = date
record.svn_created_rev = rev
record.svn_created_by = author
end
record.save
end
# Update model_row.revision
@model_row.update_attributes(:revision => rev)
end
def connect
@ctx = context
# This will raise some error if connection fails for whatever reason.
# I don't currently see a reason to handle connection errors here, as I
# assume the best handling would be to raise another error.
@repos = ::Svn::Ra::Session.open(@config.uri, {}, callbacks)
@path_from_root = @config.uri[(@repos.repos_root.length)..-1]
return true
end
def context
# Client::Context, which paticularly holds an auth_baton.
ctx = ::Svn::Client::Context.new
if @config.username && @config.password
# TODO: What if another provider type is needed? Is this plausible?
ctx.add_simple_prompt_provider(0) do |cred, realm, username, may_save|
cred.username = @config.username
cred.password = @config.password
end
elsif URI.parse(@config.uri).scheme == "file"
ctx.add_username_prompt_provider(0) do |cred, realm, username, may_save|
cred.username = @config.username || "ANON"
end
else
ctx.auth_baton = ::Svn::Core::AuthBaton.new()
end
ctx
end
# callbacks for Svn::Ra::Session.open. This includes the client +context+.
def callbacks
::Svn::Ra::Callbacks.new(@ctx.auth_baton)
end
end
end
Time to hook the pieces together.
An update to Wistle::Svn, to add the .sync class method to including models:
module Wistle::Svn
module ClassMethods
def sync
Wistle::SvnSync.new(svn_repository).run
end
end
end
In Article, after including DataMapper::Resource, include Wistle::Svn.
Run rake dm:db:automigrate to add in Wistle::Svn’s properties to Article.
And, now, to make the sync’s happen. I’m going to go with one sync for every Request, for now. This may prove to be terribly inefficient (the connect code to the Subversion repository is not cheap), but if so, I’ll change it later.
So, a nice before filter in Application should do the trick.
class Application < Merb::Controller
before :sync_articles
protected
def sync_articles
Article.sync
end
end
Finally, I’m going to remove all methods and associated views from Articles that can update an Article, i.e. new, create, edit, update and destroy.
And, well…that’s it. Well, you do need to set up in appropriate Wistle::Config in Article (or in database.yml).
Revision 48