Half-Penny For Your Thoughts

rounded down to the nearest cent







Wistle

Wistle: Plans for 2009

Update 2009-11-29: Well, I’ve completed some of these plans (see below) and have Wistle back to “good enough” for me, so I’ve decided to forego the rest, at least for the time being. I have looked at migrating to hyde, a fork of jekyll (Jekyll is too “opinionated” for my taste), but have concluded that’s more work than I want for the moment. On the other hand, starting down that path produced an interesting library: svn-transform.


So, I’ve been pretty happy with Wistle, the merb app that runs this blog and fromgenesis.org. But I’ve been pondering the future of it. The design is somewhat monolithic, what with several libraries and such just stuck in the lib directory. I’m now thinking about focusing more on the app itself, rather than just the ability to store the articles in Subversion, but the current setup is a bit too interconnected. I would also like to easily re-use several of the bits in the lib directory in other apps.

The driving force here is actually the comments. The views I have so far are not really useful, and there are no anti-spam measures whatsoever. And as I’ve pondered what I want to do there–and now with some time away from the original work–I’m thinking much more along “this is part of the app (the UI)” versus “this is part of the back end”.

So, refactoring is at hand. I want to strike a useful balance between “trash it and start over” and “make the changes that need to happen” (although, since it’s mostly a for-fun project, if I end up trashing a lot, no biggie). I’ve been considering the best order to approach this. The plan is to do the comments updates last, but there are a number of points along the way at which I could stop and jump over to them. So, here goes.

Move to git

Done 6/14/09 - http://github.com/jm81/wistle

Wistle is currently SCM’ed in Subversion. I’d like to move it to git, and host on github. No particular reason, other than for the chance for some more learning. I’ve only done a little bit with git, and it seems to be ‘the thing’ among the merb/rails folks these days. The little I’ve used it, I’ve liked it…

Libraries to gems

This is where the change really begins. I want to move libraries out to gems, because there’s no reason for these things to be tied to the app. I foresee five gems here:

  1. Filters

    Done 7/9/09 - http://github.com/jm81/dm-filters. Unlike with jm81-paginate and jm81-svn-fixture, I did no cleanup on this library, but basically just copy/pasted the code and made it a gem.

    This is the library that allows for filtering text through, e.g. markdown, textile, smartypants, etc. libraries. It will also contain a couple built-in, which I could move out later. One of my favorite features of this lib is the ease of choosing between a variety of implementations for a single filter. For example, for markdown, I could use rdiscount or bluecloth, depending on which is available.

  2. Pagination

    (Mostly) Done 6/23/09 - http://github.com/jm81/paginate (see the TODO file for future plans for this gem)

    I know, there’s bazillions of pagination approaches out there. This gem would just be for adding a method to DataMapper classes and collections: paginate, which is just like all, but it receives a :page option (and receives or assumes a :limit option). The result has two extra methods: #pages is the number of pages given the current settings, and #current_page is the number of the current page (1-indexed).

    I should probably check what else is out there now, although I’ve never completely liked other implementations I’ve seen, mostly because I want to access the page number and total pages through a method on the returned collection. If there’s something better out there, I probably shouldn’t bother with the updates that are needed, such as adding the #paginate method to DataMapper::Collection.

    What I do have is somewhat based on dm-is-paginated.

  3. Pagination-slice

    Update 6/25/09 - I decided this was unneeded. The jm81-paginate gem (above) now has a helper method (Paginate::Helpers::Merb#page_links) that does what I had planned for this slice to do.

    This involves figuring out merb slices, but that’s worth it. I have a helper (might be part of the first library) and a view partial that I use quite a bit, with slight variations. It’s based on someone else’s code, but I’m not sure I remember what.

    Anyway, I think I could create a useful slice, to make this code more easily reusable. I’ll see.

  4. Subversion Fixtures

    Done 7/5/09 - http://github.com/jm81/svn-fixture

    Currently in lib/wistle/fixture.rb, this allows a fairly simple way to set up a Subversion repository (which I use in testing the Subversion-to-Datamapper stuff).

  5. Subversion-to-Datamapper

    Done 7/10/09 - http://github.com/jm81/dm-svn. As with dm-filters, this gem works, but I haven’t put much effort into it.

    The current lib/wistle directory, the stuff that extends a DataMapper class to allow syncing from a Subversion repository.

(A possible sixth gem has to do with “attachments”, along the lines of attachment_fu. It’s not something currently required in Wistle, but I have one more or less together that works more reasonably to me than other plugins currently available)
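To make the pagination idea from item 2 concrete, here is a rough sketch of the behaviour described there, written against a plain Array rather than a DataMapper collection. The method name paginate, the :page and :limit options, and the #pages and #current_page readers come from the description above; the PaginatedResult module and the default limit of 10 are assumptions made for illustration, not the gem's actual code.

```ruby
# Extra readers mixed into the returned collection (hypothetical module name).
module PaginatedResult
  attr_accessor :pages, :current_page
end

# Return one page of items, decorated with #pages and #current_page.
# The default :limit of 10 is an assumption of this sketch.
def paginate(items, options = {})
  limit = options.fetch(:limit, 10)
  page  = [options.fetch(:page, 1).to_i, 1].max   # pages are 1-indexed

  result = items[(page - 1) * limit, limit] || []  # nil when past the end
  result.extend(PaginatedResult)
  result.pages = (items.length.to_f / limit).ceil
  result.current_page = page
  result
end

articles = (1..25).to_a
last_page = paginate(articles, :page => 3, :limit => 10)
puts "page #{last_page.current_page} of #{last_page.pages}: #{last_page.inspect}"
```

The point of the design is that the page number and page count travel with the returned collection, so a view helper needs nothing beyond the collection itself.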

New Wistle lib approach

Aside from the fact that the above needs to happen in order to simplify new updates, what really interests me is this part. I want to break up the process in the Subversion-to-Datamapper sync. In short, I want to place an intermediary, which is the Atom Publishing Protocol. That way, I can use Subversion to store and auto-publish to any blog app that accepts Atom publishing. The blog app, then, can also accept from any client that publishes via Atom. Both of these options are appealing to me, while still allowing the Subversion to blog app as it currently is without any change from the perspective of the article writer.

So, there would be two libraries/programs then, such as it is:

  • Subversion to Atom
  • Atom parser to Ruby model

Ideally, this could go in the opposite direction, so that, for example, I could publish via Atom from Word and it would commit to the Subversion and update the blog app. But I’ve not really investigated Atom to know if this would all be easy enough to be worth it.

Another issue this will create is adding some sort of user privileges to authenticate when users try to post articles, etc.

There will also probably need to be another library for updating the views and public files from Subversion, and (possibly again elsewhere) the methods for allowing the views and assets to be found in the right place in the file system based on which site is currently active.

Update app to new libs

This would actually be an ongoing process, updating the blog app to move from the existing integrated libraries to the new gems, including updating for any modifications to those libraries (which will happen in at least some cases). But at some point, I need to make a concerted effort to ensure this all has happened. Which leads to…

Clear separation of app and libs

By this point, the blog app itself should be a distinct entity. It would accept publishing adds, edits, etc. via Atom Publishing, and use the other libraries as needed. But it would not be tied to Subversion as the storage mechanism, and the gems could be used in other applications.

Update versions of datamapper and merb

Another thing that will need to happen, and I’m not sure when, is to update to the latest and greatest datamapper and merb versions, and the latest svn-client lib, which does have some changes. But those will need to happen at some point, at this point if not before.

Fix comments

Hey, now we get to really fix the comments, because there’s not all that other stuff in the way. Obviously, this could happen sooner. It involves two major pieces:

  1. Nicer looking default views.

  2. Anti-spam. I’m currently looking at reCAPTCHA. Another possibility I’ve considered is that the first time a visitor posts a comment using a given email, they would receive a validation email. Since I have no intention of actually showing commenters’ emails, this should be workable.

    Update 7/6/09 - Minimal work to add recaptcha support. Commit

General app cleanup

Because, there will be stuff to clean up, right?

As a final comment, if anyone else is interested in working on this, please let me know: jmorgan at morgancreative dot net. I don’t know that this project would hold any interest for anyone else, but it would be silly of me not to ask, eh?


Wistle

Wistle Part 5: Multi-Site Views

Multi Site views and public files

Parts 1 2 3 4

Finally, I want to be able to create the views, and do so using haml, erb, etc, and store them in Subversion, and have different views (and that means different stylesheets, etc) for each site. That involves three major actions:

  1. Decide how to organize the per-site files.
  2. Figure out how to get those files updated.
  3. Tell Merb where to find the files.

My organization goes like this

/app
  /sites
    /SITENAME
      /views
      (possibly /helpers here in the future).
/public
  /sites
    /SITENAME

Updating the files is trickier. An easy option would be to use svn:externals. This could be a hassle, though, if the Wistle app is hosting a lot of sites.

Instead, I’m going to update SiteSync to also update the views and public files. This will be done by deleting the current directory (when there has been an update) and exporting the most recent files. First thing, a few more properties in Site:

class Site
  property :views_uri, Text
  property :views_revision, Integer, :default => 0
  property :public_uri, Text
  property :public_revision, Integer, :default => 0
  
  # A URI based off of contents_uri to use as the base for building URI's
  # for public and views
  def base_uri
    ary = contents_uri.split("/")
    ary.pop if ary[-1].blank?
    ary.pop
    ary.join("/") + "/"
  end
  
  def views_uri
    @views_uri || (base_uri + "app/views")
  end
  
  def public_uri
    @public_uri || (base_uri + "public")
  end
end

The additional methods give me some default URIs based on the contents_uri. This is based on my preferred organization.
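As a quick illustration of how those defaults fall out, here is base_uri pulled out into a standalone function. The repository URL is made up for the example, and I use empty? in place of blank? so the sketch runs without Merb's extensions.

```ruby
# Standalone version of Site#base_uri: drop the last path segment of
# contents_uri and keep a trailing slash.
def base_uri(contents_uri)
  parts = contents_uri.split("/")
  parts.pop if parts[-1].empty?
  parts.pop
  parts.join("/") + "/"
end

# Hypothetical repository layout, for illustration only.
contents_uri = "svn://example.com/repo/mysite/articles"

puts base_uri(contents_uri) + "app/views"
puts base_uri(contents_uri) + "public"
```

So with articles stored under mysite/articles, the views default to mysite/app/views and the public files to mysite/public in the same repository.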

SiteSync is where the big updates happen. Basically, I add some methods to check if there are updates to the views or public files. If so, the current files are deleted and an export is done. This means that the files could be inaccessible for a few seconds (depending on connection speed and repository size). I’m also not sure if/when reboots would be required in a production environment.

class SiteSync
  def run
    super
    export_views
    export_public
  end
  
  def export_views
    export("views", File.join(Merb::root, "app", "sites", @model_row.name, "views"))
  end
  
  def export_public
    export("public", File.join(Merb::root, "public", "sites", @model_row.name))    
  end
  
  def export(name, export_path)
    export_path = File.expand_path(export_path)
    uri = @model_row.__send__("#{name}_uri")
    rev = @model_row.__send__("#{name}_revision")
    connect(uri)
    # Nothing to do if the repository has no commits newer than the last export.
    return false if @repos.latest_revnum <= rev
    # Nor if the exported directory itself has not changed since then.
    updated_rev = @repos.stat(uri[(@repos.repos_root.length)..-1], @repos.latest_revnum).created_rev
    return false if updated_rev <= rev
    
    # Make sure the parent directories exist, then clear out the old copy.
    FileUtils.mkdir_p(export_path)
    FileUtils.rm_rf(export_path)
    @ctx.export(uri, export_path)
    @model_row.update_attributes("#{name}_revision" => @repos.latest_revnum)
    true
  end
end

Method #export is the workhorse here, and the bulk is checking if we really need to do any work and that the path is ready for the export. The actual @ctx.export line is anticlimactic.
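The mkdir_p followed by rm_rf pair looks odd at first glance, so here is that step on its own, using only the standard library. The helper name prepare_export_path is made up for this sketch. My reading is that the pair guarantees the parent directories exist while the target itself does not, presumably so the export can create the target directory itself.

```ruby
require 'fileutils'
require 'tmpdir'

# Leave export_path's parents in place but make sure export_path itself
# is gone, whether or not an older copy was there.
def prepare_export_path(export_path)
  FileUtils.mkdir_p(export_path)  # creates any missing parent directories
  FileUtils.rm_rf(export_path)    # removes the leaf directory and old contents
end

Dir.mktmpdir do |root|
  path = File.join(root, "app", "sites", "demo", "views")
  prepare_export_path(path)
  puts "parent exists: #{File.directory?(File.dirname(path))}"
  puts "target exists: #{File.exist?(path)}"
end
```

The gap between the rm_rf and the end of the export is the window in which a site's files are briefly unavailable.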

This does require some updates in Wistle::SvnSync because we may be accessing multiple repositories within one instance. In short, #connect and #context both need to accept a uri option rather than relying on @config.uri. Probably some refactoring is in order (move all connection work to another class, for example).

Telling Merb where to find the files

This turns out to be surprisingly easy, so long as the “correct” helper methods are used. On that note, I’ll look first at the public files. This requires two override methods in GlobalHelpers.

module Merb
  module GlobalHelpers
    def image_tag(img, opts ={})
      opts[:path] ||= "/sites/#{@site.name}/images/"
      super(img, opts)
    end
    
    def asset_path(asset_type, filename, local_path = false)
      path = super(asset_type, filename, local_path)
      "/sites/#{@site.name}#{path}"
    end
  end
end

image_tag generates a :path option to the site-specific image directory, unless path has been set manually. It then calls super to let the original method do the real work.

asset_path is similar but, well, backwards. This is called by js_include_tag and css_include_tag to generate the appropriate path. I call super to let the parent method again do the real work. Then I prepend its result with the site-specific public path.
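The "call super, then prefix the result" pattern can be seen in isolation with plain Ruby modules. Everything named here (FrameworkHelpers, SiteHelpers, ViewContext, and the path format the stand-in returns) is invented for the sketch and is not Merb's real implementation; only the shape of the override mirrors the code above.

```ruby
# Hypothetical stand-in for the framework's own helper.
module FrameworkHelpers
  def asset_path(asset_type, filename)
    "/#{asset_type}s/#{filename}"
  end
end

# Per-site override: let the original build the path, then prefix it.
module SiteHelpers
  def asset_path(asset_type, filename)
    "/sites/#{site_name}#{super}"
  end
end

class ViewContext
  include FrameworkHelpers
  include SiteHelpers # included last, so its asset_path runs first

  attr_reader :site_name

  def initialize(site_name)
    @site_name = site_name
  end
end

puts ViewContext.new("halfpenny").asset_path(:stylesheet, "main.css")
```

Because the override only wraps the result, any helper that goes through asset_path picks up the site prefix without being touched.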

Another option is if I were using Lighttpd or Apache or something similar to serve public files, I could use the web server’s url rewriting capabilities.

The approach I take for the views is fun. In Application, I add this little jewel:

class Application < Merb::Controller  
  before :update_template_roots
  after :revert_template_roots
  
  def update_template_roots
    self.class._template_roots = [
      ["#{Merb.root}/app/views", :_template_location],
      ["#{Merb.root}/app/sites/#{@site.name}/views", :_template_location]
    ]
  end
  
  def revert_template_roots
    self.class._template_roots = [
      ["#{Merb.root}/app/views", :_template_location]
    ]
  end
end

Is that an ugly hack or what? Surely there’s a better way than back and forth modifying a class variable. Please? Well, there probably is, but I don’t know Merb’s internals well enough.

The key is the class method (I believe representing a class variable), _template_roots. If I understand it all correctly, this is used by render to determine possible base paths and what method to use with that path. So, before each request, I tack on the current site’s view path as a possible root, let the action render, then revert to the default afterwards. Why this back and forth? Because one request could be directly followed by a request for a different Site.

I half expect to be beaten in my sleep for that one. But it works.

Revision 79

Conclusion

Of course, this is just the starting point, but it’s met my goals, and I hope it’s illustrated both some basics of Merb and DataMapper as well as how these can be used to interact with data that is not stored in a relational database. After all, great frameworks and libraries can really free us to focus on the important bits, but they can also make it difficult to see all the possibilities.

After that cheesy statement, here’s a few pieces I’d like to expand Wistle with in the future:

  • Tags (now supported)
  • Search
  • Date links (i.e. /2008 gets all articles from 2008)
  • RSS/Atom
  • A sync action (for use by, e.g. subversion hooks; now supported)
  • Pagination
  • Per-site Helpers (maybe; I’ve debated whether there’s any likely value in this)
  • Better support for STI (I’ve played with this a bit)

Finally, it’s worth mentioning that my intent with these articles is illustrative and/or tutorial, rather than to start a “project”. That is, I hope this helps people who are writing their own blog or similar application. However, should you decide to use Wistle, that’s great, and I’d be happy to receive bug reports, feature requests, etc. Whether I will do anything with them probably depends on the day.


Wistle

Wistle Part 4: Multi-Site Subversion Models

Multi Site models

Ah, point #4, multiple sites hosted on one Wistle instance. I’m not going to create additional “library” functionality to support this. After all, this is getting pretty application-specific. But, I am going to take advantage of the existing Wistle library.

The key point here is going to be a Site model, that: a) Articles belong to; and b) takes over storing per-site configuration. In essence, it will replace Wistle::Config. The key is that “site-wide” configuration will substitute for model-wide configuration. So, let’s start with the Site model.

class Site
  include DataMapper::Resource
  
  has n, :articles
  
  property :id, Integer, :serial => true
  property :name, String, :unique => true, :nullable => false
  property :domain_regex, String
  
  # Subversion
  property :contents_uri, Text
  property :contents_revision, Integer, :default => 0
  property :username, String
  property :password, String
  property :property_prefix, String, :default => "ws:"
  property :extension, String, :default => "txt"
  
  # Content Filters
  property :article_filter, String
  property :comment_filter, String
  
  # Timestamps
  property :created_at, DateTime
  property :updated_at, DateTime
end

Here’s the properties (and has n, :articles). Note that the properties under the “Subversion” heading are pretty close to the instance variables of Wistle::Config. Also, notice contents_uri and contents_revision. These match with Config’s uri and revision. Why the prefix? Because I want to be able to use a different uri (possibly in another repo) for views and public files. But that is for the next section. I could set up username, etc. this same way; I won’t for now, because I have no use for doing so. If I did, however, I would probably create yet another model, called “Config” or something, that belongs to a Site, with a role property. Like I said, it’s not needed for now, so I won’t bother.

The contents_* fields could create a problem though, because Wistle::SvnSync expects different names. A simple solution is some (not-quite) aliasing:

class Site
  def uri
    @contents_uri
  end
  
  def revision
    @contents_revision
  end
  
  def revision=(rev)
    attribute_set(:contents_revision, rev)
  end
  
  def body_property
    :body
  end
end

The revision= is also used by Wistle::SvnSync, and body_property is another configuration option that SvnSync expects. With body_property, there’s only one option, at least so long as I only use the one model (Article). So, body_property always returns :body. I’ll show how all this hooks into SvnSync in a moment. Before that, though, a bit about the :domain_regex property.

Wistle is not designed to be user-friendly in the traditional sense, except when the user is defined as me. For example, adding Sites, deleting Comments, etc. must, at this point, be done through a console. That’s great by me, but for someone without programming experience, Wistle would probably not be a good choice. Another example is the domain regex property. It’s used by Site.by_domain (below) to find a site based on a domain. Except, as its name implies, domain_regex is a regular expression. Great for me, might be less attractive to others.

class Site
  class << self
    # Find a Site by domain regex, prefer longest match.
    def by_domain(val)
      possible = []
      
      # Find matching Sites
      Site.all.each do |s|
        r = Regexp.new(s.domain_regex.to_s, true)
        m = r.match(val)
        if m
          possible << [s, m[0].length]
        end
      end
      
      # Sort for longest match.
      possible.sort!{ |a, b| b[1] <=> a[1] }
      possible[0] ? possible[0][0] : nil
    end
  end
end
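To see the longest-match rule at work without a database, here is a small sketch using a Struct in place of the Site model. SiteStub, site_for_domain and the two example sites are made up for the illustration; the matching logic follows by_domain above.

```ruby
# Hypothetical stand-in for the Site model.
SiteStub = Struct.new(:name, :domain_regex)

# Return the site whose regex matches the longest stretch of host,
# or nil when nothing matches.
def site_for_domain(sites, host)
  matches = sites.map do |site|
    m = Regexp.new(site.domain_regex.to_s, Regexp::IGNORECASE).match(host)
    [site, m[0].length] if m
  end.compact

  best = matches.max_by { |_, length| length }
  best && best[0]
end

sites = [
  SiteStub.new("catchall", "example"),
  SiteStub.new("blog", 'blog\.example\.org')
]

puts site_for_domain(sites, "blog.example.org").name
puts site_for_domain(sites, "www.EXAMPLE.com").name
puts site_for_domain(sites, "elsewhere.net").inspect
```

Both regexes match "blog.example.org", but the more specific one covers more of the host and so wins, which lets a broad fallback site coexist with specific ones.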

I no longer need to include Wistle::Svn in the Article model, but I do need to add in the properties that Wistle::Svn took care of.

class Article
    # Subversion-specific properties
    property :path, String
    property :svn_created_at, DateTime
    property :svn_updated_at, DateTime
    property :svn_created_rev, String
    property :svn_updated_rev, String
    property :svn_created_by, String
    property :svn_updated_by, String
end

I also update how Filters works to deal with the *_filter properties. To utilize these properties, in Article and Comment, I change the :filter option of the body property to set :default => :site . This tells the Filters::Resource module to use the Site model to determine default filters. In Comment, I also add a method #site, because Filters may try to call this method.

class Comment
  def site
    @article.site
  end
end

Now, you may have noticed a few weird methods that didn’t do much in SvnSync, particularly get and new_record. Here’s where they come in. To use SvnSync with the new Site model (instead of the Wistle::Model Model), a few things have to change. First, Site doesn’t have a config method, pointing to a Wistle::Config object. It does, however, respond to the same methods as a Config object. Second, when creating or getting the content, we need to scope by Site. What to do? Inherit Wistle::SvnSync and override a few key methods.

class SiteSync < Wistle::SvnSync
  def initialize(model_row)
    @model_row = model_row
    @model = Article
    @config = model_row
  end
  
  # Get an Article by site and path.
  def get(path)
    Article.first(:site_id => @model_row.id, :path => short_path(path))
  end
  
  def new_record
    @model.new(:site_id => @model_row.id)
  end
end

Awesome-sauce.

Now, just hook in Site to SiteSync and all the ugly work is done!

class Site
  def sync
    SiteSync.new(self).run
  end
  
  class << self
    def sync_all
      Site.all.each do |site|
        site.sync if site.contents_uri
      end
    end
  end
end

The controllers need a few updates to filter by Site (and the application view needs one for the list of recent articles, but I’m ignoring views). Application needs updates first:

class Application < Merb::Controller
  before :sync_articles
  before :choose_site
    
  protected
  
  def sync_articles
    Site.sync_all
  end
  
  def choose_site
    @site = Site.by_domain(request.host)
  end
end

I change the sync_articles method to use Site.sync_all. Then, I add a choose_site before filter to assign @site, using Site.by_domain (request.host is the full host name including any port number).

One other bit I want to do that might as well fall in this section is folders as categories. My approach here is definitely a reflection of my personal organizational style; in addition, the code is probably not a good solution.

Anyway, I want each top-level folder under the articles directory to represent a category; I want to be able to add additional subfolders without them creating additional categories. I also prefer to use only one category per article, with additional “categorization” through tags (which I will not be implementing in this already way too long article).

To do so, I need to add a category property, which I’ll update with a before :save hook:

class Article
  property :category, String
  before :save, :update_category

  def update_category
    if attribute_dirty?(:category) || @category.nil?
      attribute_set(:category, @path.split('/')[0]) if @path
    end
  end
end
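The folder-to-category rule itself is tiny, and can be tried on its own. The helper name category_for and the example paths are made up for this sketch.

```ruby
# The top-level folder of a repository path is the category;
# deeper subfolders do not create further categories.
def category_for(path)
  path.split('/')[0] if path
end

puts category_for("ruby/wistle/part-4-multi-site")
puts category_for("theology/genesis-1")
puts category_for(nil).inspect
```

Note that a file sitting directly in the articles directory would end up with its own filename as its category under this rule, so the approach assumes every article lives in at least one folder.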

I then add two new methods to Site, one to get a list of categories, the second to find published articles by category.

class Site
  def categories 
    repository.adapter.query('SELECT category FROM articles WHERE site_id = ? group by category order by category', self.id)
  end
  
  def published_by_category(category = nil, options = {})
    conditions = "datetime(published_at) <= datetime('now') "
    binds = []
    if category
      # Bind the category rather than interpolating it into the SQL,
      # since it comes straight from the request URL.
      conditions << "and path like ? "
      binds << "#{category}/%"
    end
    Article.all(options.merge(
          :conditions => [conditions + "and site_id = ?"] + binds + [self.id],
          :order => [:published_at.desc]))
  end
end

Now is also a nice time for some routing updates, both to take advantage of categories, and for “permalink” paths for the Articles. I’m taking advantage of Merb’s support for regular expressions in routes:

Merb::Router.prepare do |r|  
  r.resources :articles do | article |
    article.resources :comments
  end
  
  r.match('/').to(:controller => 'articles', :action =>'index')
  
  r.match(%r[/categories/(.*)]).to(
     :controller => 'articles', :action => 'index', :category => '[1]')
  
  r.match(%r[/(.*)]).to(
     :controller => 'articles', :action => 'show', :path => '[1]')
end

The articles resource remains to support comments, although it is probably not needed.

The last match is the “permalink” one, so that there’s not “articles” or other prefixes in permalinks; doing this obviously depends on the particular application.
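The ordering of the two regular-expression routes matters, because the permalink pattern would happily swallow /categories/... too. Here is a plain-Ruby sketch of first-match-wins dispatch over the same two patterns. ROUTES and route are invented names, the explicit anchors are a choice made for the sketch, and none of this is how Merb's router is actually implemented.

```ruby
# Patterns are tried in order; the first one that matches wins.
ROUTES = [
  [%r{\A/categories/(.*)\z}, :index, :category],
  [%r{\A/(.*)\z},            :show,  :path]
]

# Return the action and captured parameter for a request path.
def route(request_path)
  ROUTES.each do |pattern, action, key|
    m = pattern.match(request_path)
    return { :action => action, key => m[1] } if m
  end
  nil
end

puts route("/categories/ruby").inspect
puts route("/ruby/wistle/part-4").inspect
```

Swapping the two entries would send every category URL to show, which is why the catch-all permalink route has to come last.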

And the Articles controller gets a couple of updates to take advantage of these routes:

class Articles < Application
  # provides :xml, :yaml, :js
  
  def index
    @articles = @site.published_by_category(params[:category])
    display @articles
  end
  
  def show
    if params[:path]
      @article = Article.first(:path => params[:path], :site_id => @site.id)
    else
      @article = Article.first(:id => params[:id], :site_id => @site.id)
    end

    raise NotFound unless @article
    display @article
  end
end

Revision 73

And, next, the views…


Wistle

Wistle Part 3: Filters

Body Filters

My next big step is to filter the content, so that Article#html, for example, is the body property filtered through Markdown. So, I created a Filters module, in lib/filters.rb. I won’t show the code here, but I am going to discuss my approach. Of course, plenty of other packages, such as Mephisto, have already addressed this issue and done so well. But, a big part of this project is for my own personal enjoyment. And I want to write random code, eh?

The crux of my approach is that all the filtering libraries I’m accustomed to can be used as such: FilterClass.new(content).to_html. So, the Filters module attempts to initialize an object of the specified class and call #to_html. If needed, the module tries to require the appropriate file or gem.

A constant Hash is defined, with each pair in the format: NameSpecifiedInModel => [[require_name, ClassName], [backup_require_name, BackupClassName]]

For example:

{
  'Smartypants' => [['rubypants', 'RubyPants']],
  'Markdown' => [['rdiscount', 'RDiscount'], ['bluecloth', 'BlueCloth']]
}
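Since the real code isn't shown, here is a rough sketch of how such a lookup could work: try each [require_name, ClassName] pair in turn and use the first one that loads. This is not the actual Filters module. ShoutFilter, AVAILABLE_FILTERS, apply_filter and the deliberately missing library are all invented so that the example runs without any gems, and allowing nil as a require name (meaning "already loaded") is an assumption of the sketch.

```ruby
# Hypothetical stand-in for a real filter class such as RDiscount.
class ShoutFilter
  def initialize(content)
    @content = content
  end

  def to_html
    "<p>#{@content.upcase}</p>"
  end
end

# NameSpecifiedInModel => [[require_name, ClassName], [backup...]]
# A nil require_name means "nothing to require" (sketch-only convention).
AVAILABLE_FILTERS = {
  'Shout' => [['no_such_filter_library', 'MissingFilter'], [nil, 'ShoutFilter']]
}

# Run content through the first implementation that can be loaded.
def apply_filter(name, content)
  AVAILABLE_FILTERS.fetch(name).each do |require_name, class_name|
    begin
      require require_name if require_name
      return Object.const_get(class_name).new(content).to_html
    rescue LoadError, NameError
      next # this implementation is unavailable; try the backup
    end
  end
  raise ArgumentError, "no implementation available for #{name}"
end

puts apply_filter('Shout', 'hello')
```

The first candidate fails to load, so the sketch quietly falls back to the second, which is the same idea as preferring rdiscount and falling back to bluecloth.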

In the model, this is set up by include Filters::Resource (probably not the most useful name). Then, properties can be set to be filtered with a :filter option. The syntax when defining a property is:

property :prop_name, :filter => {:to => :filtered_prop, :with => :filter_column, :default => "DefaultFilter"}

(:with and :default are optional, though at least one should be specified.)

If the properties in :to and :with have not yet been defined, they will be defined automatically. Hence, if you want to specify any options with this, they should be defined before the filtered property.

This is similar to Wistle::Svn in that it extends the property methods and stores information in a class instance variable. It also adds a method process_filters, called by a before :save hook, that updates the :to property.

So, in Article and Comment, we will now have:

property :html, Text, :lazy => false
property :body, Text,
         :filter => {:to => :html, :with => :filters, :default => %w{Markdown Smartypants}}

I also update views to use #html instead of #body.

There’s room for design debate here. One of the things I like about DataMapper is that the programmer explicitly declares properties. But, here, Filters is doing a lot behind the scenes, including possibly declaring some properties. Still, the design “feels right” to me.

Revision 54


Wistle

Wistle Part 2: Subversion Storage -- Single Site

One Site Subversion

So, we have a more or less working blog application using our friends Merb and DataMapper. That’s great and if you were looking for a Merb/DataMapper tutorial, hopefully the first entry helped. Still, the central goal is to store the articles in a source-control repository. So, let’s get going on that.

For now, I’m going to ignore the multi-site requirement, for two reasons: I want to first focus on just interacting with Subversion, without extra distractions; and I happen to know that I want to write the library that will be covered in this section for other uses.

Before diving into the code, I want to examine three “big” design questions.

  1. How much abstraction? I could create a library that abstracts so that it presents a unified API for multiple SCMs. But I won’t. Again, it complicates things. Also, the “Subversion” stuff will not be accessed from many points within the app, so I feel fairly safe with the possibility of future “API changes” if I decide to abstract it later.

  2. What SCM? This is an easy call for me. I’m accustomed to Subversion, including having some experience using its SWIG bindings. I’ve played a little with git, but at some point I have to cut off the “learning new things on this one app”.

  3. How to interact with Subversion. Here’s some possibilities:

  • Command line/backticks: I’m not entirely opposed to this, but since there are better options, no reason to look here.

  • RSCM: This may no longer be true, but from my memory of RSCM, it more or less uses the command line functions. It offers the benefit of being abstracted, but like the command line, it means working with a working copy. Sure, I could have one checked out in a tmp directory, but I don’t care for that idea.

  • post-commit-hooks: This could be pretty useful, and I could see extending Wistle to accept, say, XML or YAML sent by such a hook (it probably would be fairly simple). One downside is that it requires permissions to modify the hook. I don’t actually anticipate this would be a common problem. The second downside is then I don’t get to play :-( Oh, and the third is getting pre-existing data.

  • CSCM: Theoretically, along the lines of RSCM; so far, it’s only for Subversion, but it uses the SWIG bindings. Downsides are that it’s not under active development (with occasional exceptions on my personal copy, I suppose), and that it’s geared towards a different purpose.

  • Using Subversion SWIG bindings directly: Yay! This allows a bit more control and focus than using one of the libraries, and since we don’t need a lot, I don’t think this is reinventing the wheel. Or, maybe, I’m just using an earlier wheel, down one level of abstraction. The big downside to this is that installing the SWIG binding can be a massive pain unless you have a distro that has a nice package; it may well be impossible on Windows…

  • I’ll throw in one more, which is using the svn or Subversion DeltaV–or whatever the correct name is–protocols directly. Neither protocol is particularly frightening, but that path would still be a lot of extra work for probably minimal gain. It also has the downside that you have to be running either svnserve or an http server.

Questions answered, I’ll add a few more design thoughts before diving into the code.

One approach, one which I’ve already tried, is to skip the relational database altogether. This is certainly possible, and with caching of the generated pages would be fast enough for my purposes. However, custom text searches are a problem, requiring loading all the current data, then performing the search in Ruby.

Since this is not scalable, my solution is for the application to actually retrieve data from a relational database which mirrors the current state of the repository. Therefore, the main functionality I need to add is to update the database after any commit to the repository. Initially, I had the update procedure run at every request, although for most requests this only checked the current revision number. Other options would be a cron job, an “update” button on the site, etc. My current solution is an action, “/articles/sync_all”, and a post-commit hook that wget’s that page.
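The reason a per-request sync was tolerable is that the common case is a single integer comparison. A toy version of that check, with MirrorStub and its counters invented for the illustration and the latest revision passed in rather than read from a repository:

```ruby
# Toy stand-in for the database mirror of a repository.
class MirrorStub
  attr_reader :revision, :full_syncs

  def initialize
    @revision = 0   # last repository revision mirrored into the database
    @full_syncs = 0 # how many times real work was done
  end

  # latest_revnum would normally be asked of the repository.
  def sync(latest_revnum)
    return false if latest_revnum <= @revision # cheap path: nothing new

    @full_syncs += 1 # stand-in for updating the mirrored rows
    @revision = latest_revnum
    true
  end
end

mirror = MirrorStub.new
puts mirror.sync(5) # a commit has happened, so work is done
puts mirror.sync(5) # nothing new, so only the comparison runs
puts mirror.full_syncs
```

Moving the trigger to a post-commit hook removes even that comparison from ordinary page requests.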

To derive this updating functionality, I want to include a Module in the appropriate models. I’ll call it Wistle::Svn because I can’t think of a more useful name. I’ll save the file as “lib/wistle/svn.rb”.

The first thing (yes, finally) I want this module to do is add some properties to any model which includes it. So, let’s start with:

module Wistle
  module Svn
    class << self
      def included(klass) # Set a few 'magic' properties
        klass.property :path, String
        klass.property :svn_created_at, DateTime
        klass.property :svn_updated_at, DateTime
        klass.property :svn_created_rev, String
        klass.property :svn_updated_rev, String
        klass.property :svn_created_by, String
        klass.property :svn_updated_by, String
      end
    end
  end
end

Path will store the relative path in the repository. It will also serve as a permalink later on (Note, path, and the *_by’s were added in later commits than the others. I just went and missed them).

The others are your basic created/updated timestamps, except they will be kept in sync with the Subversion repo. This allows for having an #updated_at in the database without interfering with the auto timestamp functionality, etc. Also, we’ll keep track of the revisions. #svn_created_rev is for information only; #svn_updated_rev will be important to the sync method. So, every model that includes Wistle::Svn gets these properties, stored in the relational database. Of course, I’m now assuming that this will only be included in a class that includes DataMapper::Resource.

Next up, I want to be able to specify, in the model, which is the “body” property; that is, what property in the relational db should store the contents of the file in Subversion. So, I need to accept an option to the property class method. But before I get there, this introduces a problem. How should I store this configuration data?

If you check out ActiveRecord::Base, for example, you’ll see a lot of lines like this:

cattr_accessor :table_name_prefix, :instance_writer => false
@@table_name_prefix = ""

I’m no expert on Rails internals, but I’ve spent a decent amount of time going through ActiveRecord in particular, and this seems to be the preferred Rails method for doing class-wide configuration. cattr_accessor is a Rails addition to Ruby (Merb has it as well). Having spent time in ActiveRecord, my first inclination was to use this. And as a methodology, it works pretty well when you’re inheriting your functionality. Class variables in an included module don’t work (at least not in any way I understand).
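My best guess at the trouble, shown as a small sketch (the module and class names here are made up, and this may not be the exact problem I ran into): a class variable defined in a module belongs to the module, so every class that includes it reads and writes the same value.

```ruby
# A class variable set in a module lives in the module itself, so all
# including classes share one value instead of getting their own copy.
module Prefixed
  @@prefix = "default"

  def prefix
    @@prefix
  end

  def prefix=(value)
    @@prefix = value
  end
end

class DemoArticle
  include Prefixed
end

class DemoComment
  include Prefixed
end

DemoArticle.new.prefix = "articles_"
puts DemoComment.new.prefix # DemoComment sees DemoArticle's setting
```

For per-model configuration that is the opposite of what is wanted: setting an option for one model would silently change it for every other model.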

Instead, I decided to just use a configuration class. It’s simpler and cleaner, in my opinion, and doesn’t have the inclusion problem mentioned above (I’ll get to how that works in a bit). So, let’s start defining that class:

module Wistle
  class Config
    attr_accessor :body_property
    
    def initialize
      # Set defaults
      @body_property = 'body'
    end
  end
end

All it does, for now, is define an instance variable, @body_property (the name of the property in the database that stores the contents of the file) and use attr_accessor to create the getter and setter methods.

But our model needs access to the Config data. Again, I could try to make it a class variable, but there’s still the problem with class variables in modules. Fortunately, in Ruby, everything is an Object. So, a class can have instance variables.

module Wistle::Svn
  module ClassMethods
    def config
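      # Fully qualified: with "module Wistle::Svn", a bare Config is not
      # looked up inside Wistle.
      @config ||= Wistle::Config.new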
    end
  end
end

Easy enough? I also need to extend the model class with the methods in ClassMethods when the module is included. This is a popular Rails trick. To the Wistle::Svn.included method, add the line klass.extend(ClassMethods). Now, if Article includes Wistle::Svn, we can access the config via #config (in the class), and self.class.config (from instances). And, I can always add custom methods for configuration options that are more likely to be accessed. Now, then, I can update DataMapper’s property class method to accept an option saying that a particular property stores the file’s contents.

module Wistle::Svn
  module ClassMethods
    def property(name, type, options = {})
      if options.delete(:body_property)
        config.body_property = name.to_s
      end
      
      super(name, type, options)
    end
  end
end

Using this would be something like:

class Article
  include DataMapper::Resource
  include Wistle::Svn

  property :contents, Text, :body_property => true
end

I’ll look at what Wistle::Svn does with this information when I discuss syncing the databases. Hopefully, I will get to that point eventually.

As an aside, since I don’t anticipate any instance methods in the Wistle::Svn module, I could drop the ClassMethods module and use extend instead of include in my model. But I’ve chosen the include for consistency with DataMapper.
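To see the pieces of this section working together without DataMapper installed, here is a self-contained sketch. FakeResource is a hypothetical stand-in for the class methods DataMapper::Resource provides, and SketchSvn, Story and Remark are made-up names; only the include hook, the per-class config and the property override follow the code above.

```ruby
# Stand-in for the class methods DataMapper::Resource would provide.
# (Hypothetical; the real DataMapper does far more than record names.)
module FakeResource
  def property(name, type, options = {})
    (@properties ||= []) << name
  end

  def properties
    @properties ||= []
  end
end

module SketchSvn
  # Minimal configuration object, one per including class.
  class Config
    attr_accessor :body_property

    def initialize
      @body_property = 'body'
    end
  end

  def self.included(klass)
    klass.extend(ClassMethods)
  end

  module ClassMethods
    # Stored as an instance variable on the class object itself.
    def config
      @config ||= Config.new
    end

    # Intercept :body_property, then hand off to the underlying #property.
    def property(name, type, options = {})
      config.body_property = name.to_s if options.delete(:body_property)
      super(name, type, options)
    end
  end
end

class Story
  extend FakeResource
  include SketchSvn
  property :title, String
  property :contents, String, :body_property => true
end

class Remark
  extend FakeResource
  include SketchSvn
  property :body, String
end

puts Story.config.body_property   # contents
puts Remark.config.body_property  # body (the default, untouched by Story)
```

The call to super works because ClassMethods is extended after FakeResource, so it sits in front of it in the class's method lookup. That is the same reason Wistle::Svn has to be included after DataMapper::Resource.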


The wistle_models table

Before I can get to syncing, the database will need to know the version of its “working copy”, as it were. Except, I suppose, for the first update. I reckon I need another table in the database that keeps track of the current revision for each Wistle::Svn model. So, ‘lib/wistle/model.rb’:

module Wistle
  class Model # Table is named wistle_models.
    include DataMapper::Resource
    
    property :id, Integer, :serial => true
    property :name, String
    property :revision, Integer
  end
end

And this file needs to be required in ‘lib/wistle.rb’. Just for fun, let’s run rake dm:db:autoupgrade. Alas, no luck, the new model doesn’t migrate. There’s a good reason why: none of the Wistle module is required when running Merb (As an aside, it just seems more reasonable to me to include Wistle::Model in the Wistle lib instead of directly in the models directory). Add another dependency in init.rb, but there’s a gotcha here. This dependency should not be declared until after use_orm :datamapper, because it depends on DataMapper being loaded.

use_orm :datamapper
dependency 'lib/wistle.rb'

Awesome. I guess. You can run that migration now and it should work. And now let’s get our Subversion-y models talking to this model.

module Wistle::Svn
  module ClassMethods
    def svn_repository
      return @svn_repository if @svn_repository
      
      @svn_repository = Wistle::Model.first(:name => self.name)
      @svn_repository ||= Wistle::Model.create(:name => self.name, :revision => 0)
      @svn_repository.config = config
      @svn_repository
    end
  end
end

Again, I use the Class instance variable trick. I only want to set up @svn_repository when I have to, so if it’s already available, I just return it. Next, I try to get a row in wistle_models that is set up for the current model. If no luck there, I create such a row. Finally, I give this Model instance direct access to the Subversion-ized Model’s @config. Which means one more update to Wistle::Model: attr_accessor :config.

Before hitting the update code, I want to flesh out the Wistle::Config class. The other five configuration elements I want are:

uri
The uri of the folder in the Subversion repository where the model's contents are stored (file:///path/to/repo/path/to/folder, svn://example.com/path/to/folder, etc.)
username
The Subversion username to use, if needed.
password
The Subversion password to use, if needed.
property_prefix
This addresses a question I didn't ask above: how to deal with properties other than the contents. I could, for example, start each file with a bit of yaml or xml or what have you. I'm going to store the other properties using Subversion's property mechanism. However, I want to minimize the chance of name conflicts, so I provide a setting for a prefix. As a default, I'll use "ws:" (for Wistle::Svn, I guess).
extension
The extension of files that will be included in the update. This is certainly not necessary, but it works for me.

class Wistle::Config
  OPTS = [:uri, :username, :password,
          :body_property, :property_prefix, :extension]
  
  attr_accessor *OPTS
  
  def initialize
    # Set defaults
    @body_property = 'body'
    @property_prefix = 'ws:'
    @extension = 'txt'
  end
end

The OPTS constant is because I’ll re-use this list momentarily. I also want to be able to set some of these settings in database.yml, if it’s available. At the end of the initialize method, I add:

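# Only look for database.yml when running under Merb.
if Object.const_defined?("Merb")
  f = "#{Merb.root}/config/database.yml"
  # Fall back to :development before calling to_sym, so a nil env is safe.
  env = (Merb.env || :development).to_sym
end

if f && File.exist?(f)
  config = YAML.load(IO.read(f))[env] || {}
  OPTS.each do |field|
    # Accept either "svn_uri" or :svn_uri as the key.
    config_field = config["svn_#{field}"] || config["svn_#{field}".to_sym]
    instance_variable_set("@#{field}", config_field) if config_field
  end
end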

Now, in database.yml, I can add :svn_username: my_login. That is, I can prefix any of the fields defined above with ‘svn_’. I’m not sure that sentence made sense.
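A made-up example may make the prefixing clearer. The YAML below stands in for a database.yml with symbol keys, and the loop mirrors the one in the initialize method. The repository URI, the login and the svn_settings helper are all hypothetical. Recent Ruby versions refuse to load symbols from YAML by default, so this sketch uses safe_load with Symbol permitted, where the original code simply calls YAML.load.

```ruby
require 'yaml'

# Stand-in for config/database.yml (hypothetical values).
DATABASE_YML = <<-EOS
:development:
  :adapter: sqlite3
  :database: dev.db
  :svn_uri: file:///path/to/repo/articles
  :svn_username: my_login
EOS

OPTS = [:uri, :username, :password,
        :body_property, :property_prefix, :extension]

# Pick out the svn_-prefixed settings for one environment.
def svn_settings(yaml_text, env)
  config = YAML.safe_load(yaml_text, permitted_classes: [Symbol])[env] || {}
  OPTS.each_with_object({}) do |field, found|
    value = config["svn_#{field}"] || config["svn_#{field}".to_sym]
    found[field] = value if value
  end
end

settings = svn_settings(DATABASE_YML, :development)
puts settings[:uri]
puts settings[:username]
```

Keys without the svn_ prefix (adapter, database) are ignored, and any option not present in the file keeps the default set earlier in initialize.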

Revision 42


Updating

Hey, it’s time for the central code: syncing the database from the repository. If you’re particularly interested in using Subversion’s SWIG bindings, one of the more interesting parts of this project might be the Wistle::Fixture library, which I use to generate Subversion repository “test fixtures”, but which I won’t cover here. Incidentally, if you are so inclined, the test cases included in Subversion’s repository are worth a look. The actual code isn’t commented, but it’s “fairly” readable.

I’m putting the syncing code in its own class, because, well, that’s what my brain says I should do. The only initialization argument it requires is the appropriate row in Wistle::Model. It only provides one other public method, #run, which runs the updating, going through the following steps:

  1. Connect to the repository. See #connect, #context, and #callbacks private methods. Most of what’s going on here is dealing with different authentication options. Honestly, I don’t have a solid understanding of this bit.
  2. Check if we have updated to the last revision already. If so, quit.
  3. Run the repository’s #log method. This gets information about each commit, starting with the most recent; I’ve specified to get revisions only through the last update (stored in Wistle::Model#revision). Store this information in the variable changesets.
  4. Put changesets in ascending revision order (oldest first) and run #do_changeset on each element.

SvnSync#do_changeset actually updates the database. For each change in the changeset:

  1. It determines whether the change was one I’m interested in, and if so, what kind of change. There are three types of interest: moves, modifications/adds, and deletes.
  2. Moves are the most problematic, mostly because Subversion doesn’t really have a “move” concept. Instead, we’re looking for a node that was copied from another node in the same changeset in which the latter node was deleted. In this case, as opposed to “just a copy”, I don’t want to create a new entry in the database, but rather modify the path of the existing entry. Why? To not invalidate foreign keys, i.e. to keep comments listed with the article after it’s renamed.
  3. Next, do any deletes. It’s possible we won’t find the node to delete, either because it was actually a move, or because it refers to a file we don’t keep track of. In that case, just continue on with the next delete.
  4. Modify/Add/Replace: In all these cases, what I want is to update the content of the appropriate row, creating a new row if needed. The private method #get is responsible for finding the appropriate row, based on the path. This updates contents and other properties, both those specified by the revision and the actual node properties.
  5. When all changes have been processed, update the Wistle::Model row with the new current revision.
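The classification in these steps can be tried out without the SWIG bindings. The following is a sketch using plain Ruby data: Change is a hypothetical stand-in for the change objects the bindings return, and classify is a made-up helper. Unlike the real code, it removes the "from" half of a move from the delete list explicitly, where the real code relies on the lookup by old path coming back empty after the move.

```ruby
# Stand-in for the change objects the bindings hand back (hypothetical).
Change = Struct.new(:action, :copyfrom_path)

# Split one changeset into moves, deletes and modifications.
# Returns [moves, deletes, modified], where moves is a list of [from, to].
def classify(changes)
  modified, deleted, copied = [], [], []

  changes.each_pair do |path, change|
    case change.action
    when "M", "A", "R" then modified << path
    when "D"           then deleted << path
    end
    copied << [change.copyfrom_path, path] if change.copyfrom_path
  end

  # A move is a copy whose source was deleted in the same changeset.
  moves = copied.select { |from, _to| deleted.include?(from) }
  # Deletes that were really the "from" half of a move are not true deletes.
  moved_from = moves.map { |from, _to| from }
  [moves, deleted - moved_from, modified]
end

changeset = {
  "/articles/old-name.txt" => Change.new("D", nil),
  "/articles/new-name.txt" => Change.new("A", "/articles/old-name.txt"),
  "/articles/gone.txt"     => Change.new("D", nil),
  "/articles/edited.txt"   => Change.new("M", nil)
}

moves, deletes, modified = classify(changeset)
p moves     # [["/articles/old-name.txt", "/articles/new-name.txt"]]
p deletes   # ["/articles/gone.txt"]
p modified  # ["/articles/new-name.txt", "/articles/edited.txt"]
```

Note that the moved file still shows up in the modified list, since it arrives as an add. That is why the move step only needs to change the path and can leave the content update to the modify step.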

If you aren’t familiar with the SWIG bindings, the code will probably be a bit confusing, but hopefully the outline above will help clarify what’s going on. More to the point, I hope it illustrates that ORMs are not the only available storage mechanisms for web apps.

So, the code (yikes):

module Wistle
  class SvnSync
    def initialize(model_row)
      @model_row = model_row
      @model = Object.const_get(@model_row.name)
      @config = @model_row.config
    end
    
    # There is the possibility for unnecessary updates, as a database row may be
    # modified several times (if modified in multiple revisions) in a single
    # call. This is inefficient, but--for now--not enough to justify more
    # complex code.
    def run
      connect unless @repos
      return false if @repos.latest_revnum <= @model_row.revision
      
      changesets = [] # TODO Maybe revision + 1
      @repos.log(@path_from_root, @repos.latest_revnum, @model_row.revision, 0, true, false
          ) do |changes, rev, author, date, msg|
        changesets << [changes, rev, author, date]
      end
      
      changesets.sort{ |a, b| a[1] <=> b[1] }.each do |c| # Sort by revision
        do_changeset(*c)
      end
      return true
    end
    
    private
    
    # Get the relative path from config.uri
    def short_path(path)
      path = path[@path_from_root.length..-1]
      path = path[1..-1] if path[0] == ?/
      path.sub!(/\.#{@config.extension}\Z/, '') if @config.extension
      path
    end
    
    # Get an object of the @model, by path.
    def get(path)
      @model.first(:path => short_path(path))
    end
    
    # Create a new object of the @model
    def new_record
      @model.new
    end
    
    # Process a single changeset.
    # This doesn't account for possible move/replace conflicts (A node is moved,
    # then the old node is replaced by a new one). I assume those are rare
    # enough that I won't code around them, for now.
    def do_changeset(changes, rev, author, date)
      modified, deleted, copied = [], [], []
      
      changes.each_pair do |path, change|
        next if short_path(path).blank?
        
        case change.action
        when "M", "A", "R" # Modified, Added or Replaced
          modified << path if @repos.stat(path, rev).file?
        when "D"
          deleted << path
        end
        copied << [path, change.copyfrom_path] if change.copyfrom_path        
      end
          
      # Perform moves
      copied.each do |copy|
        del = deleted.find { |d| d == copy[1] }
        if del
          # Change the path. No need to perform other updates, as this is an
          # "A" or "R" and thus is in the +modified+ Array.
          record = get(del)
          record.update_attributes(:path => short_path(copy[0])) if record
        end
      end
      
      # Perform deletes
      deleted.each do |path|
        record = get(path)
        record.destroy if record # May have been moved or refer to a directory
      end
      
      # Perform modifies and adds
      modified.each do |path|
        next if @config.extension && path !~ /\.#{@config.extension}\Z/
        
        record = get(path) || new_record
        svn_file = @repos.file(path, rev)
        
        # update body
        record.__send__("#{@config.body_property}=", svn_file[0])
    
        # update node props -- just find any props with property_prefix
        svn_file[1].each do |name, val|
          if name =~ /\A#{@config.property_prefix}(.*)/
            record.__send__("#{$1}=", val)
          end
        end
        
        # update revision props
        record.path = short_path(path)
        record.svn_updated_at = date
        record.svn_updated_rev = rev
        record.svn_updated_by = author
        if record.new_record?
          record.svn_created_at = date
          record.svn_created_rev = rev
          record.svn_created_by = author
        end
        record.save
      end
      
      # Update model_row.revision
      @model_row.update_attributes(:revision => rev)
    end
    
    def connect
      @ctx = context
     
      # This will raise some error if connection fails for whatever reason.
      # I don't currently see a reason to handle connection errors here, as I
      # assume the best handling would be to raise another error.
      @repos = ::Svn::Ra::Session.open(@config.uri, {}, callbacks)
      @path_from_root = @config.uri[(@repos.repos_root.length)..-1]
      return true
    end
    
    def context
      # Client::Context, which particularly holds an auth_baton.
      ctx = ::Svn::Client::Context.new
      if @config.username && @config.password
        # TODO: What if another provider type is needed? Is this plausible?
        ctx.add_simple_prompt_provider(0) do |cred, realm, username, may_save|
          cred.username = @config.username
          cred.password = @config.password
        end
      elsif URI.parse(@config.uri).scheme == "file" 
        ctx.add_username_prompt_provider(0) do |cred, realm, username, may_save|
          cred.username = @config.username || "ANON"
        end
      else
        ctx.auth_baton = ::Svn::Core::AuthBaton.new()
      end
      ctx
    end
  
    # callbacks for Svn::Ra::Session.open. This includes the client +context+.
    def callbacks
      ::Svn::Ra::Callbacks.new(@ctx.auth_baton)
    end
  end
end
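Since so much of the sync hinges on the path handling, here is the short_path logic pulled out as a standalone sketch. The folder and extension are passed in as arguments instead of coming from instance state, and the file names are made up. It also uses start_with? and Regexp.escape, which are my substitutions here and not what the class above does.

```ruby
# Standalone version of the short_path logic: strip the folder prefix,
# the leading slash and the file extension from a repository path.
def short_path(path, path_from_root, extension = 'txt')
  path = path[path_from_root.length..-1]
  path = path[1..-1] if path.start_with?('/')
  path = path.sub(/\.#{Regexp.escape(extension)}\z/, '') if extension
  path
end

puts short_path('/articles/2009/wistle-plans.txt', '/articles')
puts short_path('/articles', '/articles').empty?
```

The first call gives the value that ends up in the path property (and so in the permalink). The second shows why the changeset loop skips blank short paths: a change to the configured folder itself reduces to an empty string.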

Time to hook the pieces together.

An update to Wistle::Svn, to add the .sync class method to including models:

module Wistle::Svn
  module ClassMethods
    def sync
      Wistle::SvnSync.new(svn_repository).run
    end
  end
end

In Article, after including DataMapper::Resource, include Wistle::Svn.

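Run rake dm:db:automigrate to add in Wistle::Svn’s properties to Article. (Careful: automigrate drops and recreates tables, so existing rows are lost; the autoupgrade task used earlier is the non-destructive alternative.)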

And, now, to make the syncs happen. I’m going to go with one sync for every request, for now. This may prove to be terribly inefficient (the connect code to the Subversion repository is not cheap), but if so, I’ll change it later.

So, a nice before filter in Application should do the trick.

class Application < Merb::Controller
  before :sync_articles
  
  protected

  def sync_articles
    Article.sync
  end
end

Finally, I’m going to remove all methods and associated views from Articles that can update an Article, i.e. new, create, edit, update and destroy.

And, well…that’s it. Well, you do need to set up an appropriate Wistle::Config in Article (or in database.yml).

Revision 48