Fri, 2008 Aug 22

Wistle Part 2: Subversion Storage -- Single Site

Posted in Wistle at 09:00 by jmorgan

One Site Subversion

So, we have a more or less working blog application using our friends Merb and DataMapper. That's great and if you were looking for a Merb/DataMapper tutorial, hopefully the first entry helped. Still, the central goal is to store the articles in a source-control repository. So, let's get going on that.

For now, I'm going to ignore the multi-site requirement, for two reasons: I want to first focus on just interacting with Subversion, without extra distractions; and I happen to know that I want to write the library that will be covered in this section for other uses.

Before diving into the code, I want to examine three "big" design questions.

  1. How much abstraction? I could create a library that abstracts so that it presents a unified API for multiple SCMs. But I won't. Again, it complicates things. Also, the "Subversion" stuff will not be accessed from many points within the app, so I feel fairly safe with the possibility of future "API changes" if I decide to abstract it later.

  2. What SCM? This is an easy call for me. I'm accustomed to Subversion, including having some experience using its SWIG bindings. I've played a little with git, but at some point I have to cut off the "learning new things on this one app".

  3. How to interact with Subversion. Here's some possibilities:

  • Command line/backticks: I'm not entirely opposed to this, but since there are better options, no reason to look here.

  • RSCM: This may no longer be true, but from my memory of RSCM, it more or less uses the command line functions. It offers the benefit of being abstracted, but like the command line, it means working with a working copy. Sure, I could have one checked out in a tmp directory, but I don't care for that idea.

  • post-commit-hooks: This could be pretty useful, and I could see extending Wistle to accept, say, XML or YAML sent by such a hook (it probably would be fairly simple). One downside is that it requires permissions to modify the hook. I don't actually anticipate this would be a common problem. The second downside is then I don't get to play :-( Oh, and the third is getting pre-existing data.

  • CSCM: Theoretically, along the lines of RSCM; so far, it's only for Subversion, but it uses the SWIG bindings. Downsides are that it's not under active development (with occassional exceptions on my personal copy, I suppose), and that it's geared towards a different purpose.

  • Using Subversion SWIG bindings directly: Yay! This allows a bit more control and focus than using one of the libraries, and since we don't need a lot, I don't think this is reinventing the wheel. Or, maybe, I'm just using an earlier wheel, down one level of abstraction. The big downside to this is that installing the SWIG binding can be a massive pain unless you have a distro that has a nice package; it may well be impossible on Windows...

  • I'll throw in one more, which is using the svn or Subversion DeltaV--or whatever the correct name is--protocols directly. Neither protocol is particularly frightening, but that path would still be a lot of extra work for probably minimal gain. It also has the downside that you have be running either svnserve or an http server.

Questions answered, I'll add a few more design thoughts before diving into the code.

One approach, one which I've already tried, is to skip the relational database altogether. This is certainly possible, and with caching of the generated pages would be fast enough for my purposes. However, custom text searches are a problem, requiring loading all the current data, then performing the search in Ruby.

Since this is not scalable, my solution is for the application to actually retrieve data from a relational database which mirrors the current state of repository. Therefore, the main functionality I need to add is to update the database after any commit to the repository. Initially, I had the update procedure run at every request but for most requests, this only checked the current revision number. Other options would be a cron job, an "update" button on the site, etc. My current solution is an action, "/articles/sync_all", and a post-commit hook that wget's that page.

To derive this updating functionality, I want to include a Module in the appropriate models. I'll call it Wistle::Svn because I can't think of a more useful name. I'll save the file as "lib/wistle/svn.rb".

The first thing (yes, finally) I want this module to is add some properties to any model which includes it. So, let's start with:

module Wistle
            module Svn
              class << self
                def included(klass) # Set a few 'magic' properties
                  klass.property :path, String
                  klass.property :svn_created_at, DateTime
                  klass.property :svn_updated_at, DateTime
                  klass.property :svn_created_rev, String
                  klass.property :svn_updated_rev, String
                  klass.property :svn_created_by, String
                  klass.property :svn_updated_by, String
                end
              end
            end
          end
          

Path will store the relative path in the repository. It will also serve as a permalink later on (Note, path, and the *_by's were added in later commits than the others. I just went and missed them).

The others are your basic created/updated timestamps except they will be kept in sync with the Subversion repo. This allows for having an #updated_at in the database without interfering with the auto timestamp functionality, etc. Also, we'll keep track of the revisions. #svn_created_rev is for information only; #svn_updated_rev will be important to the sync method. So, every model that includes Wistle::Svn gets these properties, stored in the relational database. Of course, I'm now assuming that this will only be included in a class that include DataMapper::Resource.

Next up, I want to be able to specify, in the model, which is the "body" property; that is, what property in the relational db should store the contents of the file in Subversion. So, I need to accept an option to the property class method. But before I get there, this introduces a problem. How should I store this configuration data?

If you check out ActiveRecord::Base, for example, you'll see a lot of lines like this:

cattr_accessor :table_name_prefix, :instance_writer => false
          @@table_name_prefix = ""
          

I'm no expert on Rails internals, but I've spent a decent amount of time going through ActiveRecord in particular and this seems to be the preferred Rails' method for doing class-wide configuration. cattr_accessor is a Rails addition to Ruby (Merb has it as well). Having spent time in ActiveRecord, my first inclination was to use this. And as a methodology, it works pretty well when your inheriting your functionality. Class variables in an included module doesn't work (at least not in any way I understand).

Instead, I decided to just use a configuration class. It's simpler and cleaner, in my opinion, and doesn't have the inclusion problem mentioned above (I'll get to how that works in a bit). So, let's start defining that class:

module Wistle
            class Config
              attr_accessor :body_property
          
              def initialize
                # Set defaults
                @body_property = 'body'
              end
            end
          end
          

All it does, for now, is define an instance variable, @body_property (the name of the property in the database that stores the contents of the file) and use :attr_accessor to create the getter and setter methods.

But our model needs access to the Config data. Again, I could try to make it a class variable, but there's still the problem with class variables in modules. Fortunately, in Ruby, everything is an Object. So, a class can have instance variables.

module Wistle::Svn
            module ClassMethods
              def config
                @config ||= Config.new
              end
            end
          end
          

Easy enough? I also need to extend the model class with the methods in ClassMethods when the module is included. This is a popular Rails trick. To the Wistle::Svn.included method, add the line klass.extend(ClassMethods). Now, if Article includes Wistle::Svn, we can access the config via #config (in the class), and self.class.config (from instances). And, I can always add custom methods for configuration options that are more likely to be accessed. Now, then, I can update DataMapper's property class method to accept an option saying that a particular property stores the file's contents.

module Wistle::Svn
            module ClassMethods
              def property(name, type, options = {})
                if options.delete(:body_property)
                  config.body_property = name.to_s
                end
          
                super(name, type, options)
              end
            end
          end
          

Using this would be something like:

class Article
            property :contents, :body_property => true
          end
          

I'll look at what Wistle::Svn does with this information when I discuss syncing the databases. Hopefully, I will get to that point eventually.

As an aside, since I don't anticipate any instance methods in the Wistle::Svn module, I could drop the ClassMethods module and use extend instead of include in my model. But I've chosen the include for consistency with DataMapper.


The wistle_models table

Before I can get to syncing, the database will need to know the version of its "working copy", as it were. Except, I suppose, for the first update. I reckon I need another table in the database that keeps track of the current revision for each Wistle::Svn model. So, 'lib/wistle/model.rb':

module Wistle
            class Model # Table is named wistle_models.
              include DataMapper::Resource
          
              property :id, Integer, :serial => true
              property :name, String
              property :revision, Integer
            end
          end
          

And this file needs to be required in 'lib/wistle.rb'. Just for fun, let's run rake dm:db:autoupgrade. Alas, no luck, the new model doesn't migrate. There's a good reason why, none of the Wistle module is required when running Merb (As an aside, it just seems more reasonable to me to include Wistle::Model in the Wistle lib instead of directly in the models directory). Add another depencency in init.db, but there's a gotcha here. This dependency should not be declared until after use_orm :datamapper, because it depends on DataMapper being loaded.

use_orm :datamapper
          dependency 'lib/wistle.rb'
          

Awesome. I guess. You can run that migration now and it should work. And now let's get our Subversion-y models talking to this model.

module Wistle::Svn
            module ClassMethods
                def svn_repository
                return @svn_repository if @svn_repository
          
                @svn_repository = Wistle::Model.first(:name => self.name)
                @svn_repository ||= Wistle::Model.create(:name => self.name, :revision => 0)
                @svn_repository.config = config
                @svn_repository
              end
            end
          end
          

Again, I use the Class instance variable trick. I only want to set up @svn_repository when I have to, so if it's already available, I just return it. Next, I try to get a row in wistle_models that is set up for the current. If no luck there, I create such a row. Finally, I give this Model instance direct access to the Subversion-ized Models @config. Which means one more update to Wistle::Model: attr_accessor :config.

Before hitting the update code, I want to flesh out the Wistle::Config class. The other three configuration elements I want are

uri
The uri of the folder in the Subversion repository where the model's contents are stored (file:///path/to/repo/path/to/folder, svn://example.com/path/to/folder, etc.)
username
The Subversion username to use, if needed.
password
The Subversion password to use, if needed.
property_prefix
This addresses a question I didn't ask above. How to deal with properties other than the contents. I could, for example, start each file with a bit of yaml or xml or what have. I'm going to store the other properties using Subversion's property mechanism. However, I want to minimize the chance of name conflicts, so I provide a setting for a prefix. As a default, I'll use "ws:" (for Wistle::Svn, I guess).
extension
The extension of files that will be included in the update. This is certainly not necessary, but it works for me.
class Wistle::Config
            OPTS = [:uri, :username, :password,
                    :body_property, :property_prefix, :extension]
          
            attr_accessor *OPTS
          
            def initialize
              # Set defaults
              @body_property = 'body'
              @property_prefix = 'ws:'
              @extension = 'txt'
            end
          end
          

The OPTS constant is because I'll re-use this list momentarily. I also want to be able to set some of these settings in database.yml, if it's available. At the end of the initialize method, I add:

if Object.const_defined?("Merb")
            f = "#{Merb.root}/config/database.yml"
            env = Merb.env.to_sym || :development
          end
          
          if f
            config = YAML.load(IO.read(f))[env]
            OPTS.each do |field|
              config_field = config["svn_#{field}"] || config["svn_#{field}".to_sym]
              if config_field
                instance_variable_set("@#{field}", config_field)
              end
            end
          end
          

Now, in database.yml, I can add :svn_username: my_login. That is, I can prefix any of the fields defined above with 'svn_'. I'm not sure that sentence made sense.

Revision 42


Updating

Hey, it's time for the central code, sync the database from the repository. If you're particularly interesting in using Subversion's SWIG bindings, one of the more interesting parts of this project might be the Wistle::Fixture library, which I use to generate Subversion repository "test fixtures", but which I won't cover here. Incidentally, if you are so inclined, the test cases included in Subversion's repository. The actual code isn't commented, but it's "fairly" readable.

I'm putting the syncing code in its own class, because, well, that's what my brain says I should do. The only initialization argument it requires is a the appropriate row in Wistle::Model. It only provides one other public method, #run, which runs the updating, going through the following steps

  1. Connect to the repository. See #connect, #context, and #callbacks private methods. Most of what's going on here is dealing with different authentication options. Honestly, I don't have a solid understanding of this bit.
  2. Check if we have updated to the last revision already. If so, quit.
  3. Run the repository's #log method. This gets information about each commit, starting with the most recent; I've specified to get revisions only through the last update (stored in Wistle::Model#revision). Store this information in the variable changesets.
  4. Reverse changesets and run #do_changeset on each element.

SvnSync#do_changeset actually updates the database. For each change in the changeset:

  1. It determines whether the change was one I'm interested in, and if so, what kind of change. There are three types of interest: moves, modifications/adds, and deletes.
  2. Moves are the most problematic, mostly because Subversion doesn't really have a "move" concept. Instead were looking for a node that was copied for another node in the same changeset that the latter node was deleted. In this case, as opposed to "just a copy", I don't want to create a new entry in the database, but rather modify the path of the existing entry. Why? To not invalidate foreign keys, i.e. to keep comments listed with the article after it's renamed.
  3. Next, do any deletes. It's possible we won't find the node to delete, either because it was actually a move, or because it refers to a file we don't keep track of. In that case, just continue on with the next delete.
  4. Modify/Add/Replace: In all these cases, what I want is to update the content of the appropriate row, creating a new row if needed. The private method #get is responsible for finding the appropriate row, based on the path. This updates contents and other properties, both those specified by the revision and the actual node properties.
  5. When all changes have been processed, update the Wistle::Model row with the new current revision.

If you aren't familiar with the SWIG bindings, the code will probably be a bit confusing, but hopefully the outline above will help clarify what's going on. More to the point, I hope it illustrates that ORM's are not the only available storage mechanisms for web apps.

So, the code (yikes):

module Wistle
            class SvnSync
              def initialize(model_row)
                @model_row = model_row
                @model = Object.const_get(@model_row.name)
                @config = @model_row.config
              end
          
              # There is the possibility for uneccessary updates, as a database row may be
              # modified several times (if modified in multiple revisions) in a single
              # call. This is inefficient, but--for now--not enough to justify more
              # complex code.
              def run
                connect unless @repos
                return false if @repos.latest_revnum <= @model_row.revision
          
                changesets = [] # TODO Maybe revision + 1
                @repos.log(@path_from_root, @repos.latest_revnum, @model_row.revision, 0, true, false
                    ) do |changes, rev, author, date, msg|
                  changesets << [changes, rev, author, date]
                end
          
                changesets.sort{ |a, b| a[1] <=> b[1] }.each do |c| # Sort by revision
                  do_changset(*c)
                end
                return true
              end
          
              private
          
              # Get the relative path from config.uri
              def short_path(path)
                path = path[@path_from_root.length..-1]
                path = path[1..-1] if path[0] == ?/
                path.sub!(/\.#{@config.extension}\Z/, '') if @config.extension
                path
              end
          
              # Get an object of the @model, by path.
              def get(path)
                @model.first(:path => short_path(path))
              end
          
              # Create a new object of the @model
              def new_record
                @model.new
              end
          
              # Process a single changset.
              # This doesn't account for possible move/replace conflicts (A node is moved,
              # then the old node is replaced by a new one). I assume those are rare
              # enough that I won't code around them, for now.
              def do_changset(changes, rev, author, date)
                modified, deleted, copied = [], [], []
          
                changes.each_pair do |path, change|
                  next if short_path(path).blank?
          
                  case change.action
                  when "M", "A", "R" # Modified, Added or Replaced
                    modified << path if @repos.stat(path, rev).file?
                  when "D"
                    deleted << path
                  end
                  copied << [path, change.copyfrom_path] if change.copyfrom_path        
                end
          
                # Perform moves
                copied.each do |copy|
                  del = deleted.find { |d| d == copy[1] }
                  if del
                    # Change the path. No need to perform other updates, as this is an
                    # "A" or "R" and thus is in the +modified+ Array.
                    record = get(del)
                    record.update_attributes(:path => short_path(copy[0])) if record
                  end
                end
          
                # Perform deletes
                deleted.each do |path|
                  record = get(path)
                  record.destroy if record # May have been moved or refer to a directory
                end
          
                # Perform modifies and adds
                modified.each do |path|
                  next if @config.extension && path !~ /\.#{@config.extension}\Z/
          
                  record = get(path) || new_record
                  svn_file = @repos.file(path, rev)
          
                  # update body
                  record.__send__("#{@config.body_property}=", svn_file[0])
          
                  # update node props -- just find any props with property_prefix
                  svn_file[1].each do |name, val|
                    if name =~ /\A#{@config.property_prefix}(.*)/
                      record.__send__("#{$1}=", val)
                    end
                  end
          
                  # update revision props
                  record.path = short_path(path)
                  record.svn_updated_at = date
                  record.svn_updated_rev = rev
                  record.svn_updated_by = author
                  if record.new_record?
                    record.svn_created_at = date
                    record.svn_created_rev = rev
                    record.svn_created_by = author
                  end
                  record.save
                end
          
                # Update model_row.revision
                @model_row.update_attributes(:revision => rev)
              end
          
              def connect
                @ctx = context
          
                # This will raise some error if connection fails for whatever reason.
                # I don't currently see a reason to handle connection errors here, as I
                # assume the best handling would be to raise another error.
                @repos = ::Svn::Ra::Session.open(@config.uri, {}, callbacks)
                @path_from_root = @config.uri[(@repos.repos_root.length)..-1]
                return true
              end
          
              def context
                # Client::Context, which paticularly holds an auth_baton.
                ctx = ::Svn::Client::Context.new
                if @config.username && @config.password
                  # TODO: What if another provider type is needed? Is this plausible?
                  ctx.add_simple_prompt_provider(0) do |cred, realm, username, may_save|
                    cred.username = @config.username
                    cred.password = @config.password
                  end
                elsif URI.parse(@config.uri).scheme == "file" 
                  ctx.add_username_prompt_provider(0) do |cred, realm, username, may_save|
                    cred.username = @config.username || "ANON"
                  end
                else
                  ctx.auth_baton = ::Svn::Core::AuthBaton.new()
                end
                ctx
              end
          
              # callbacks for Svn::Ra::Session.open. This includes the client +context+.
              def callbacks
                ::Svn::Ra::Callbacks.new(@ctx.auth_baton)
              end
            end
          end
          

Time to hook the pieces together.

An update to Wistle::Svn, to add the .sync class method to including models:

module Wistle::Svn
            module ClassMethods
              def sync
                Wistle::SvnSync.new(svn_repository).run
              end
            end
          end
          

In Article, after including DataMapper::Resource, include Wistle::Svn.

Run rake dm:db:automigrate to add in Wistle::Svn's properties to Article.

And, now, to make the sync's happen. I'm going to go with one sync for every Request, for now. This may prove to be terribly inefficient (the connect code to the Subversion repository is not cheap), but if so, I'll change it later.

So, a nice before filter in Application should do the trick.

class Application < Merb::Controller
            before :sync_articles
          
            protected
          
            def sync_articles
              Article.sync
            end
          end
          

Finally, I'm going to remove all methods and associated views from Articles that can update an Article, i.e. new, create, edit, update and destroy.

And, well...that's it. Well, you do need to set up in appropriate Wistle::Config in Article (or in database.yml).

Revision 48


0 Comments

Add a comment