Fri, 2008 Sep 12

Wistle Part 5: Multi-Site Views

Posted in Wistle at 09:00 by jmorgan

Multi Site views and public files

Parts 1 2 3 4

Finally, I want to be able to create the views, and do so using haml, erb, etc, and store them in Subversion, and have different views (and that means different stylesheets, etc) for each site. That involves three major actions:

  1. Decide how to organize the per-site files.
  2. Figure out how to get those files updated.
  3. Tell Merb where to find the files.

My organization goes like this:

/app
  /sites
    /SITENAME
      /views
      (possibly /helpers here in the future)
/public
  /sites
    /SITENAME
          

Updating the files is trickier. An easy option would be to use svn:externals. This could be a hassle, though, if the Wistle app is hosting a lot of sites.

Instead, I'm going to update SiteSync to also update the views and public files. This will be done by deleting the current directory (when there has been an update) and exporting the most recent files. First thing, a few more properties in Site:

class Site
            property :views_uri, Text
            property :views_revision, Integer, :default => 0
            property :public_uri, Text
            property :public_revision, Integer, :default => 0
          
            # A URI based off of contents_uri to use as the base for building URI's
            # for public and views
            def base_uri
              ary = contents_uri.split("/")
              ary.pop if ary[-1].blank?
              ary.pop
              ary.join("/") + "/"
            end
          
            def views_uri
              @views_uri || (base_uri + "app/views")
            end
          
            def public_uri
              @public_uri || (base_uri + "public")
            end
          end
          

The additional methods give me some default URIs based on the contents_uri. This is based on my preferred organization.
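To make the defaults concrete, here is a minimal stand-alone sketch of the same derivation in plain Ruby, outside DataMapper. The method name default_uris and the example repository URL are assumptions made up for illustration; only the base_uri logic mirrors the code above.

```ruby
# Stand-alone sketch of Site#base_uri and the default views/public URIs.
# The method names and the example URL are illustrative assumptions.
def base_uri(contents_uri)
  parts = contents_uri.split("/")   # a trailing "/" leaves no empty last element
  parts.pop                         # drop the final folder (e.g. "articles")
  parts.join("/") + "/"
end

def default_uris(contents_uri)
  base = base_uri(contents_uri)
  { :views => base + "app/views", :public => base + "public" }
end

uris = default_uris("svn://example.com/repo/mysite/articles/")
puts uris[:views]    # svn://example.com/repo/mysite/app/views
puts uris[:public]   # svn://example.com/repo/mysite/public
```

So a site whose articles live in .../mysite/articles gets its views from .../mysite/app/views and its public files from .../mysite/public unless the properties are set explicitly.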

SiteSync is where the big updates happen. Basically, I add some methods to check if there are updates to the views or public files. If so, the current files are deleted and an export is done. This means that the files could be inaccessible for a few seconds (depending on connection speed and repository size). I'm also not sure if/when reboots would be required in a production environment.

class SiteSync
            def run
              super
              export_views
              export_public
            end
          
            def export_views
              export("views", File.join(Merb::root, "app", "sites", @model_row.name, "views"))
            end
          
            def export_public
              export("public", File.join(Merb::root, "public", "sites", @model_row.name))    
            end
          
            def export(name, export_path)
              export_path = File.expand_path(export_path)
              uri = @model_row.__send__("#{name}_uri")
              rev = @model_row.__send__("#{name}_revision")
              connect(uri)
              return false if @repos.latest_revnum <= rev
              updated_rev = @repos.stat(uri[(@repos.repos_root.length)..-1], @repos.latest_revnum).created_rev
              return false if updated_rev <= rev
          
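              # Make sure the parent directories exist, then remove the
              # target itself so the export can recreate it from scratch.
              FileUtils.mkdir_p(export_path)
              FileUtils.rm_rf(export_path)
              @ctx.export(uri, export_path)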
              @model_row.update_attributes("#{name}_revision" => @repos.latest_revnum)
              true
            end
          end
          

Method #export is the workhorse here, and the bulk is checking if we really need to do any work and that the path is ready for the export. The actual @ctx.export line is anticlimactic.

This does require some updates in Wistle::SvnSync because we may be accessing multiple repositories within one instance. In short, #connect and #context both need to accept a uri option rather than relying on @config.uri. Probably some refactoring is in order (move all connection work to another class, for example).
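The checking and path preparation can be tried without a repository. This is a minimal sketch under stated assumptions: refresh_directory is a hypothetical name, revisions are plain integers, and the block stands in for the real @ctx.export call.

```ruby
require "fileutils"
require "tmpdir"

# Sketch of the "is there anything to do, and is the path ready?" part of
# #export. The block stands in for @ctx.export; nothing here talks to Subversion.
def refresh_directory(export_path, current_rev, latest_rev)
  return false if latest_rev <= current_rev     # nothing new, leave files alone

  export_path = File.expand_path(export_path)
  FileUtils.mkdir_p(File.dirname(export_path))  # parent directories must exist
  FileUtils.rm_rf(export_path)                  # the target itself must not
  yield export_path                             # stand-in for the real export
  true
end

Dir.mktmpdir do |root|
  target = File.join(root, "app", "sites", "demo", "views")
  done = refresh_directory(target, 0, 7) do |path|
    FileUtils.mkdir_p(path)
    File.write(File.join(path, "index.html.erb"), "<h1>demo</h1>")
  end
  puts done   # true
end
```

Because the old directory is removed before the new files arrive, there is a short window with no files at all, which is the inaccessibility mentioned earlier.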

Telling Merb where to find the files

This turns out to be surprisingly easy, so long as the "correct" helper methods are used. On that note, I'll look first at the public files. This requires two override methods in GlobalHelpers.

module Merb
            module GlobalHelpers
              def image_tag(img, opts ={})
                opts[:path] ||= "/sites/#{@site.name}/images/"
                super(img, opts)
              end
          
              def asset_path(asset_type, filename, local_path = false)
                path = super(asset_type, filename, local_path)
                "/sites/#{@site.name}#{path}"
              end
            end
          end
          

image_tag generates a :path option to the site-specific image directory, unless :path has been set manually. It then calls super to let the original method do the real work.

asset_path is similar but, well, backwards. This is called by js_include_tag and css_include_tag to generate the appropriate path. I call super to let the parent method again do the real work. Then I prepend its result with the site-specific public path.
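The "call super, then prepend" idea works the same way in plain Ruby. A minimal sketch, assuming stand-in modules: BaseHelpers, SiteHelpers and FakeView are hypothetical names, and the base asset_path here is a simplification, not Merb's real implementation.

```ruby
# Hypothetical stand-in for the framework's helper; not Merb's real code.
module BaseHelpers
  EXTENSIONS = { :javascript => "js", :stylesheet => "css" }

  def asset_path(asset_type, filename)
    "/#{asset_type}s/#{filename}.#{EXTENSIONS.fetch(asset_type)}"
  end
end

# Included after BaseHelpers, so its methods are found first and super
# reaches the BaseHelpers version.
module SiteHelpers
  def asset_path(asset_type, filename)
    "/sites/#{@site_name}#{super}"
  end
end

class FakeView
  include BaseHelpers
  include SiteHelpers

  def initialize(site_name)
    @site_name = site_name
  end
end

puts FakeView.new("blog").asset_path(:stylesheet, "master")
# /sites/blog/stylesheets/master.css
```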

Alternatively, if I were using Lighttpd, Apache, or something similar to serve public files, I could use the web server's URL rewriting capabilities.

The approach I take for the views is fun. In Application, I add this little jewel:

class Application < Merb::Controller
            before :update_template_roots
            after :revert_template_roots
          
            def update_template_roots
              self.class._template_roots = [
                ["#{Merb.root}/app/views", :_template_location],
                ["#{Merb.root}/app/sites/#{@site.name}/views", :_template_location]
              ]
            end
          
            def revert_template_roots
              self.class._template_roots = [
                ["#{Merb.root}/app/views", :_template_location]
              ]
            end
          end
          

Is that an ugly hack or what? Surely there's a better way than back and forth modifying a class variable. Please? Well, there probably is, but I don't know Merb's internals well enough.

The key is the class method (I believe representing a class variable), _template_roots. If I understand it all correctly, this is used by render to determine possible base paths and what method to use with that path. So, before each request, I tack on the current site's view path as a possible root, let the action render, then revert to the default. Why this back and forth? Because one request could be directly followed by a request for a different Site.
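Stripped of Merb, the before/after pair is a set-then-restore pattern on class-level state. A minimal sketch of that pattern, assuming made-up names throughout (FakeController, template_roots, with_site_roots and the /srv/wistle paths are not Merb API):

```ruby
# Hypothetical stand-in for a controller class with class-level template roots.
class FakeController
  DEFAULT_ROOTS = ["/srv/wistle/app/views"].freeze

  class << self
    attr_accessor :template_roots
  end
  self.template_roots = DEFAULT_ROOTS.dup

  # Add the site's root for the duration of the block, then restore the
  # default even if rendering raises.
  def self.with_site_roots(site_name)
    self.template_roots = DEFAULT_ROOTS + ["/srv/wistle/app/sites/#{site_name}/views"]
    yield template_roots
  ensure
    self.template_roots = DEFAULT_ROOTS.dup
  end
end

FakeController.with_site_roots("blog") { |roots| puts roots.last }
# /srv/wistle/app/sites/blog/views
puts FakeController.template_roots.length   # 1
```

The ensure clause plays the role of the after filter: whatever happens during the request, the next one starts from the default roots.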

I half expect to be beaten in my sleep for that one. But it works.

Revision 79

Conclusion

Of course, this is just the starting point, but it's met my goals, and I hope it's illustrated both some basics of Merb and DataMapper as well as how these can be used to interact with data that is not stored in a relational database. After all, great frameworks and libraries can really free us to focus on the important bits, but they can also make it difficult to see all the possibilities.

After that cheesy statement, here are a few pieces I'd like to expand Wistle with in the future:

  • Tags (now supported)
  • Search
  • Date links (i.e. /2008 gets all articles from 2008)
  • RSS/Atom
  • A sync action (for use by, e.g. subversion hooks; now supported)
  • Pagination
  • Per-site Helpers (maybe; I've debated whether there's any likely value in this)
  • Better support for STI (I've played with this a bit)

Finally, it's worth mentioning that my intent with these articles is illustrative and/or tutorial, rather than to start a "project". That is, I hope this helps people who are writing their own blog or similar application. However, should you decide to use Wistle, that's great, and I'd be happy to receive bug reports, feature requests, etc. Whether I will do anything with them probably depends on the day.

Fri, 2008 Sep 05

Wistle Part 4: Multi-Site Subversion Models

Posted in Wistle at 09:00 by jmorgan

Multi Site models

Ah, point #4, multiple sites hosted on one Wistle instance. I'm not going to create additional "library" functionality to support this. After all, this is getting pretty application-specific. But, I am going to take advantage of the existing Wistle library.

The key point here is going to be a Site model, that: a) Articles belong to; and b) takes over storing per-site configuration. In essence, it will replace Wistle::Config. The key is that "site-wide" configuration will substitute for model-wide configuration. So, let's start with the Site model.

class Site
            include DataMapper::Resource
          
            has n, :articles
          
            property :id, Integer, :serial => true
            property :name, String, :unique => true, :nullable => false
            property :domain_regex, String
          
            # Subversion
            property :contents_uri, Text
            property :contents_revision, Integer, :default => 0
            property :username, String
            property :password, String
            property :property_prefix, String, :default => "ws:"
            property :extension, String, :default => "txt"
          
            # Content Filters
            property :article_filter, String
            property :comment_filter, String
          
            # Timestamps
            property :created_at, DateTime
            property :updated_at, DateTime
          end
          

Here are the properties (and has n, :articles). Note that the properties under the "Subversion" heading are pretty close to the instance variables of Wistle::Config. Also, notice contents_uri and contents_revision. These match with Config's uri and revision. Why the prefix? Because I want to be able to use a different uri (possibly in another repo) for views and public files. But that is for the next section. I could set up username, etc. this same way; I won't for now, because I have no use for doing so. If I did, however, I would probably create yet another model, called "Config" or something, that belongs to a Site, with a role property. Like I said, it's not needed for now, so I won't bother.

The contents_* fields could create a problem though, because Wistle::SvnSync expects different names. A simple solution is some (not-quite) aliasing:

class Site
            def uri
              @contents_uri
            end
          
            def revision
              @contents_revision
            end
          
            def revision=(rev)
              attribute_set(:contents_revision, rev)
            end
          
            def body_property
              :body
            end
          end
          

The revision= is also used by Wistle::SvnSync, and body_property is another configuration option that SvnSync expects. With body_property, there's only one option, at least so long as I only use the one model (Article). So, body_property always returns :body. I'll show how all this hooks into SvnSync in a moment. Before that, though, a bit about the :domain_regex property.

Wistle is not designed to be user-friendly in the traditional sense, except when the user is defined as me. For example, adding Sites, deleting Comments, etc. must, at this point, be done through a console. That's great by me, but for someone without programming experience, Wistle would probably not be a good choice. Another example is the domain regex property. It's used by Site.by_domain (below) to find a site based on a domain. Except, as its name implies, domain_regex is a regular expression. Great for me, might be less attractive to others.

class Site
            class << self
              # Find a Site by domain regex, prefer longest match.
              def by_domain(val)
                possible = []
          
                # Find matching Sites
                Site.all.each do |s|
                  r = Regexp.new(s.domain_regex.to_s, true)
                  m = r.match(val)
                  if m
                    possible << [s, m[0].length] 
                  end
                end
          
                # Sort for longest match.
                possible.sort!{ |a, b| b[1] <=> a[1] }
                possible[0] ? possible[0][0] : nil
              end
            end
          end
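
The longest-match rule can be tried without a database. A minimal sketch, assuming an in-memory list: FakeSite and site_by_domain are hypothetical stand-ins for the Site model and Site.by_domain, and the domains are examples only.

```ruby
# Hypothetical stand-in for the Site model: just a name and a regex string.
FakeSite = Struct.new(:name, :domain_regex)

# Return the site whose regex matches the longest part of the host,
# or nil when nothing matches. Matching is case-insensitive.
def site_by_domain(sites, host)
  matches = sites.map do |site|
    m = Regexp.new(site.domain_regex.to_s, Regexp::IGNORECASE).match(host)
    [site, m[0].length] if m
  end.compact

  best = matches.max_by { |_site, length| length }
  best && best[0]
end

sites = [
  FakeSite.new("default", nil),                    # empty regex matches anything
  FakeSite.new("main", "example\\.com"),
  FakeSite.new("blog", "blog\\.example\\.com")
]

puts site_by_domain(sites, "blog.example.com").name   # blog
puts site_by_domain(sites, "www.example.com").name    # main
puts site_by_domain(sites, "elsewhere.org").name      # default
```

Note that a site with no domain_regex produces an empty pattern, which matches every host with a length of zero, so it behaves as a catch-all that any real match will beat.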
          

I no longer need to include Wistle::Svn in the Article model, but I do need to add in the properties that Wistle::Svn took care of.

class Article
              # Subversion-specific properties
              property :path, String
              property :svn_created_at, DateTime
              property :svn_updated_at, DateTime
              property :svn_created_rev, String
              property :svn_updated_rev, String
              property :svn_created_by, String
              property :svn_updated_by, String
          end
          

I also update how Filters works to deal with the *_filter properties. To utilize these properties, in Article and Comment, I change the :filter option of the body property to set :default => :site . This tells the Filters::Resource module to use the Site model to determine default filters. In Comment, I also add a method #site, because Filters may try to call this method.

class Comment
            def site
              @article.site
            end
          end
          

Now, you may have noticed a few weird methods that didn't do much in SvnSync, particularly get and new_record. Here's where they come in. To use SvnSync with the new Site model (instead of the Wistle::Model model), a few things have to change. First, Site doesn't have a config method pointing to a Wistle::Config object. It does, however, respond to the same methods as a Config object. Second, when creating or getting the content, we need to scope by Site. What to do? Inherit from Wistle::SvnSync and override a few key methods.

class SiteSync < Wistle::SvnSync
            def initialize(model_row)
              @model_row = model_row
              @model = Article
              @config = model_row
            end
          
            # Get an Article by site and path.
            def get(path)
              Article.first(:site_id => @model_row.id, :path => short_path(path))
            end
          
            def new_record
              @model.new(:site_id => @model_row.id)
            end
          end
          

Awesome-sauce.

Now, just hook in Site to SiteSync and all the ugly work is done!

class Site
            def sync
              SiteSync.new(self).run
            end
          
            class << self
              def sync_all
                Site.all.each do |site|
                  site.sync if site.contents_uri
                end
              end
            end
          end
          

The controllers need a few updates to filter by Site (and the application view needs one for the list of recent articles, but I'm ignoring views). Application needs updates first:

class Application < Merb::Controller
            before :sync_articles
            before :choose_site
          
            protected
          
            def sync_articles
              Site.sync_all
            end
          
            def choose_site
              @site = Site.by_domain(request.host)
            end
          end
          

I change the sync_articles method to use Site.sync_all. Then, I add a choose_site before filter to assign @site, using Site.by_domain (request.host is the full host name including any port number).

One other bit I want to do that might as well fall in this section is folders as categories. My approach here is definitely a reflection of my personal organizational style; in addition, the code is probably not a good solution.

Anyway, I want each top-level folder under the articles directory to represent a category; I want to be able to add additional subfolders without them creating additional categories. I also prefer to use only one category per article, with additional "categorization" through tags (which I will not be implementing in this already way too long article).

To do so, I need to add a category property, which I'll update with a before :save hook:

class Article
            property :category, String
            before :save, :update_category
          
            def update_category
              if attribute_dirty?(:category) || @category.nil?
                attribute_set(:category, @path.split('/')[0]) if @path
              end
            end
          end
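
The category rule itself is just "first path segment". A minimal sketch in plain Ruby, where category_for is a hypothetical helper name and the paths are made-up examples:

```ruby
# Top-level folder of a repository-relative path, used as the category.
def category_for(path)
  return nil if path.nil? || path.empty?
  path.split("/")[0]
end

puts category_for("ruby/merb/wistle-part-5.txt")   # ruby
puts category_for("ruby/notes.txt")                # ruby
puts category_for("about.txt")                     # about.txt
```

The last call shows an edge worth knowing about: a file sitting directly in the articles directory has no folder, so its own name ends up as the category.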
          

I then add two new methods to Site, one to get a list of categories, the second to find published articles by category.

class Site
            def categories 
              repository.adapter.query('SELECT category FROM articles WHERE site_id = ? group by category order by category', self.id)
            end
          
            def published_by_category(category = nil, options = {})
              # Bind every value instead of interpolating it into the SQL,
              # since category comes straight from the request params.
              conditions = "datetime(published_at) <= datetime('now') and site_id = ?"
              binds = [self.id]
              if category
                conditions << " and path like ?"
                binds << "#{category}/%"
              end
              Article.all(options.merge(
                    :conditions => [conditions, *binds],
                    :order => [:published_at.desc]))
            end
          end
          

Now is also a nice time for some routing updates, both to take advantage of categories, and for "permalink" paths for the Articles. I'm taking advantage of Merb's support for regular expressions in routes:

Merb::Router.prepare do |r|
            r.resources :articles do | article |
              article.resources :comments
            end
          
            r.match('/').to(:controller => 'articles', :action =>'index')
          
            r.match(%r[/categories/(.*)]).to(
               :controller => 'articles', :action => 'index', :category => '[1]')
          
            r.match(%r[/(.*)]).to(
               :controller => 'articles', :action => 'show', :path => '[1]')
          end
          

The articles resource remains to support comments, although it is probably not needed.

The last match is the "permalink" one, so that there's not "articles" or other prefixes in permalinks; doing this obviously depends on the particular application.
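To see why the order of the match rules matters, here is a small plain-Ruby simulation. It is only a sketch of the idea (first matching pattern wins, "[1]" is replaced by the first capture); ROUTES and recognize are hypothetical names and this is not how Merb's router is implemented.

```ruby
# Patterns are tried top to bottom; the first one that matches wins.
ROUTES = [
  [%r{\A/\z},                { :controller => "articles", :action => "index" }],
  [%r{\A/categories/(.*)\z}, { :controller => "articles", :action => "index", :category => "[1]" }],
  [%r{\A/(.*)\z},            { :controller => "articles", :action => "show", :path => "[1]" }]
]

# Return the params for a request path, or nil if no route matches.
def recognize(path)
  ROUTES.each do |pattern, params|
    match = pattern.match(path)
    next unless match

    # Replace "[n]" placeholders with the n-th capture of the route pattern.
    return params.each_with_object({}) do |(key, value), out|
      out[key] = value.gsub(/\[(\d+)\]/) { match[$1.to_i].to_s }
    end
  end
  nil
end

p recognize("/categories/ruby")[:category]   # "ruby"
p recognize("/ruby/wistle-part-5")[:path]    # "ruby/wistle-part-5"
```

If the catch-all pattern were listed before the categories pattern, /categories/ruby would be treated as an article path, which is why the permalink route has to come last.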

And the Articles controller gets a couple of updates to take advantage of these routes:

class Articles < Application
            # provides :xml, :yaml, :js
          
            def index
              @articles = @site.published_by_category(params[:category])
              display @articles
            end
          
            def show
              if params[:path]
                @article = Article.first(:path => params[:path], :site_id => @site.id)
              else
                @article = Article.first(:id => params[:id], :site_id => @site.id)
              end
          
              raise NotFound unless @article
              display @article
            end
          end
          

Revision 73

And, next, the views...

Fri, 2008 Aug 29

Wistle Part 3: Filters

Posted in Wistle at 09:00 by jmorgan

Body Filters

My next big step is to filter the content, so that Article#html, for example, is the body property filtered through Markdown. So, I created a Filters module, in lib/filters.rb. I won't show the code here, but I am going to discuss my approach. Of course, plenty of other packages, such as Mephisto, have already addressed this issue and done so well. But, a big part of this project is for my own personal enjoyment. And I want to write random code, eh?

The crux of my approach is that all the filtering libraries I'm accustomed to can be used like this: FilterClass.new(content).to_html. So, the Filters module attempts to initialize an object of the specified class and call #to_html. If needed, the module tries to require the appropriate file or gem.

A constant Hash is defined, with each pair in the format: NameSpecifiedInModel => [[require_name, ClassName], [backup_require_name, BackupClassName]]

For example:

{
            'Smartypants' => [['rubypants', 'RubyPants']],
            'Markdown' => [['rdiscount', 'RDiscount'], ['bluecloth', 'BlueCloth']]
          }
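
Since the Filters code itself isn't shown, here is a minimal sketch of how such a lookup could work. It is an assumption about the approach, not the actual lib/filters.rb: FILTER_LIBRARIES, filter_class, apply_filters and the Shout class are all hypothetical names.

```ruby
# Same shape as the constant described above: name => [[require_name, ClassName], ...]
FILTER_LIBRARIES = {
  "Smartypants" => [["rubypants", "RubyPants"]],
  "Markdown"    => [["rdiscount", "RDiscount"], ["bluecloth", "BlueCloth"]]
}

# Return the first class from the candidate list that can be loaded.
def filter_class(name, libraries = FILTER_LIBRARIES)
  libraries.fetch(name).each do |require_name, class_name|
    begin
      require require_name unless Object.const_defined?(class_name)
      return Object.const_get(class_name)
    rescue LoadError, NameError
      next   # library not installed; try the backup
    end
  end
  raise LoadError, "no filter library available for #{name}"
end

# Run the text through each named filter in order.
def apply_filters(text, names, libraries = FILTER_LIBRARIES)
  names.inject(text) do |output, name|
    filter_class(name, libraries).new(output).to_html
  end
end

# Hypothetical stand-in filter so the sketch runs without any gems installed.
class Shout
  def initialize(text)
    @text = text
  end

  def to_html
    "<p>#{@text.upcase}</p>"
  end
end

demo = { "Shout" => [["no_such_library", "MissingShout"], ["shout", "Shout"]] }
puts apply_filters("hello", ["Shout"], demo)   # <p>HELLO</p>
```

In the demo, the first candidate fails to load, so the lookup falls through to the backup entry, which is the same thing that would happen for Markdown on a machine with BlueCloth but not RDiscount.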
          

In the model, this is set up by include Filters::Resource (probably not the most useful name). Then, properties can be set up for filtering with the :filter option. The syntax when defining a property is:

property :prop_name, :filter => {:to => :filtered_prop, :with => :filter_column, :default => "DefaultFilter"}
          

(:with and :default are optional, though at least one should be specified.)

If the properties in :to and :with have not yet been defined, they will be defined automatically. Hence, if you want to specify any options with this, they should be defined before the filtered property.

This is similar to Wistle::Svn in that it extends the property methods and stores information in a class instance variable. It also adds a method process_filters, called by a before :save hook, that updates the :to property.

So, in Article and Comment, we will now have:

property :html, Text, :lazy => false
          property :body, Text,
                   :filter => {:to => :html, :with => :filters, :default => %w{Markdown Smartypants}}
          

I also update views to use #html instead of #body.

There's room for design debate here. One of the things I like about DataMapper is that the programmer explicitly declares properties. But, here, Filters is doing a lot behind the scenes, including possibly declaring some properties. Still, the design "feels right" to me.

Revision 54

Fri, 2008 Aug 22

Wistle Part 2: Subversion Storage -- Single Site

Posted in Wistle at 09:00 by jmorgan

One Site Subversion

So, we have a more or less working blog application using our friends Merb and DataMapper. That's great and if you were looking for a Merb/DataMapper tutorial, hopefully the first entry helped. Still, the central goal is to store the articles in a source-control repository. So, let's get going on that.

For now, I'm going to ignore the multi-site requirement, for two reasons: I want to first focus on just interacting with Subversion, without extra distractions; and I happen to know that I want to write the library that will be covered in this section for other uses.

Before diving into the code, I want to examine three "big" design questions.

  1. How much abstraction? I could create a library that abstracts so that it presents a unified API for multiple SCMs. But I won't. Again, it complicates things. Also, the "Subversion" stuff will not be accessed from many points within the app, so I feel fairly safe with the possibility of future "API changes" if I decide to abstract it later.

  2. What SCM? This is an easy call for me. I'm accustomed to Subversion, including having some experience using its SWIG bindings. I've played a little with git, but at some point I have to cut off the "learning new things on this one app".

  3. How to interact with Subversion. Here are some possibilities:

  • Command line/backticks: I'm not entirely opposed to this, but since there are better options, no reason to look here.

  • RSCM: This may no longer be true, but from my memory of RSCM, it more or less uses the command line functions. It offers the benefit of being abstracted, but like the command line, it means working with a working copy. Sure, I could have one checked out in a tmp directory, but I don't care for that idea.

  • post-commit-hooks: This could be pretty useful, and I could see extending Wistle to accept, say, XML or YAML sent by such a hook (it probably would be fairly simple). One downside is that it requires permissions to modify the hook. I don't actually anticipate this would be a common problem. The second downside is then I don't get to play :-( Oh, and the third is getting pre-existing data.

  • CSCM: Theoretically, along the lines of RSCM; so far, it's only for Subversion, but it uses the SWIG bindings. Downsides are that it's not under active development (with occasional exceptions on my personal copy, I suppose), and that it's geared towards a different purpose.

  • Using Subversion SWIG bindings directly: Yay! This allows a bit more control and focus than using one of the libraries, and since we don't need a lot, I don't think this is reinventing the wheel. Or, maybe, I'm just using an earlier wheel, down one level of abstraction. The big downside to this is that installing the SWIG bindings can be a massive pain unless you have a distro that has a nice package; it may well be impossible on Windows...

  • I'll throw in one more, which is using the svn or Subversion DeltaV--or whatever the correct name is--protocols directly. Neither protocol is particularly frightening, but that path would still be a lot of extra work for probably minimal gain. It also has the downside that you have to be running either svnserve or an http server.

Questions answered, I'll add a few more design thoughts before diving into the code.

One approach, one which I've already tried, is to skip the relational database altogether. This is certainly possible, and with caching of the generated pages would be fast enough for my purposes. However, custom text searches are a problem, requiring loading all the current data, then performing the search in Ruby.

Since this is not scalable, my solution is for the application to actually retrieve data from a relational database which mirrors the current state of the repository. Therefore, the main functionality I need to add is to update the database after any commit to the repository. Initially, I had the update procedure run at every request, but for most requests, this only checked the current revision number. Other options would be a cron job, an "update" button on the site, etc. My current solution is an action, "/articles/sync_all", and a post-commit hook that wget's that page.

To derive this updating functionality, I want to include a Module in the appropriate models. I'll call it Wistle::Svn because I can't think of a more useful name. I'll save the file as "lib/wistle/svn.rb".

The first thing (yes, finally) I want this module to do is add some properties to any model which includes it. So, let's start with:

module Wistle
            module Svn
              class << self
                def included(klass) # Set a few 'magic' properties
                  klass.property :path, String
                  klass.property :svn_created_at, DateTime
                  klass.property :svn_updated_at, DateTime
                  klass.property :svn_created_rev, String
                  klass.property :svn_updated_rev, String
                  klass.property :svn_created_by, String
                  klass.property :svn_updated_by, String
                end
              end
            end
          end
          

Path will store the relative path in the repository. It will also serve as a permalink later on (Note, path, and the *_by's were added in later commits than the others. I just went and missed them).

The others are your basic created/updated timestamps except they will be kept in sync with the Subversion repo. This allows for having an #updated_at in the database without interfering with the auto timestamp functionality, etc. Also, we'll keep track of the revisions. #svn_created_rev is for information only; #svn_updated_rev will be important to the sync method. So, every model that includes Wistle::Svn gets these properties, stored in the relational database. Of course, I'm now assuming that this will only be included in a class that includes DataMapper::Resource.

Next up, I want to be able to specify, in the model, which is the "body" property; that is, what property in the relational db should store the contents of the file in Subversion. So, I need to accept an option to the property class method. But before I get there, this introduces a problem. How should I store this configuration data?

If you check out ActiveRecord::Base, for example, you'll see a lot of lines like this:

cattr_accessor :table_name_prefix, :instance_writer => false
          @@table_name_prefix = ""
          

I'm no expert on Rails internals, but I've spent a decent amount of time going through ActiveRecord in particular and this seems to be the preferred Rails method for doing class-wide configuration. cattr_accessor is a Rails addition to Ruby (Merb has it as well). Having spent time in ActiveRecord, my first inclination was to use this. And as a methodology, it works pretty well when you're inheriting your functionality. Class variables in an included module don't work (at least not in any way I understand).

Instead, I decided to just use a configuration class. It's simpler and cleaner, in my opinion, and doesn't have the inclusion problem mentioned above (I'll get to how that works in a bit). So, let's start defining that class:

module Wistle
            class Config
              attr_accessor :body_property
          
              def initialize
                # Set defaults
                @body_property = 'body'
              end
            end
          end
          

All it does, for now, is define an instance variable, @body_property (the name of the property in the database that stores the contents of the file) and use attr_accessor to create the getter and setter methods.

But our model needs access to the Config data. Again, I could try to make it a class variable, but there's still the problem with class variables in modules. Fortunately, in Ruby, everything is an Object. So, a class can have instance variables.

module Wistle::Svn
            module ClassMethods
              def config
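                # Fully qualified: with "module Wistle::Svn", a bare Config
                # is not looked up inside Wistle.
                @config ||= Wistle::Config.new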
              end
            end
          end
          

Easy enough? I also need to extend the model class with the methods in ClassMethods when the module is included. This is a popular Rails trick. To the Wistle::Svn.included method, add the line klass.extend(ClassMethods). Now, if Article includes Wistle::Svn, we can access the config via #config (in the class), and self.class.config (from instances). And, I can always add custom methods for configuration options that are more likely to be accessed. Now, then, I can update DataMapper's property class method to accept an option saying that a particular property stores the file's contents.

module Wistle::Svn
            module ClassMethods
              def property(name, type, options = {})
                if options.delete(:body_property)
                  config.body_property = name.to_s
                end
          
                super(name, type, options)
              end
            end
          end
          

Using this would be something like:

class Article
            property :contents, Text, :body_property => true
          end
          

I'll look at what Wistle::Svn does with this information when I discuss syncing the databases. Hopefully, I will get to that point eventually.

As an aside, since I don't anticipate any instance methods in the Wistle::Svn module, I could drop the ClassMethods module and use extend instead of include in my model. But I've chosen the include for consistency with DataMapper.
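The pieces of this section (the included hook, ClassMethods, the class-level @config, and the property override that calls super) can be tried without DataMapper. This is a minimal sketch under stated assumptions: FakeResource is a hypothetical stand-in for DataMapper::Resource that only records property names, and the Sketch namespace and model names are made up so nothing clashes with the real code.

```ruby
# Hypothetical stand-in for DataMapper::Resource: it only records properties.
module FakeResource
  def self.included(klass)
    klass.extend(ClassMethods)
  end

  module ClassMethods
    def properties
      @properties ||= {}
    end

    def property(name, type, options = {})
      properties[name] = type
    end
  end
end

module Sketch
  class Config
    attr_accessor :body_property

    def initialize
      @body_property = "body"   # default
    end
  end

  module Svn
    def self.included(klass)
      klass.extend(ClassMethods)
    end

    module ClassMethods
      # Stored in an instance variable of the class object itself,
      # so every including class gets its own Config.
      def config
        @config ||= Sketch::Config.new
      end

      def property(name, type, options = {})
        config.body_property = name.to_s if options.delete(:body_property)
        super(name, type, options)   # reaches FakeResource's version
      end
    end
  end
end

class SketchArticle
  include FakeResource
  include Sketch::Svn
  property :title, String
  property :contents, String, :body_property => true
end

class SketchComment
  include FakeResource
  include Sketch::Svn
  property :body, String
end

puts SketchArticle.config.body_property   # contents
puts SketchComment.config.body_property   # body
```

Each class keeps its own configuration, which is the behavior that class variables in a shared module would not give.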


The wistle_models table

Before I can get to syncing, the database will need to know the version of its "working copy", as it were. Except, I suppose, for the first update. I reckon I need another table in the database that keeps track of the current revision for each Wistle::Svn model. So, 'lib/wistle/model.rb':

module Wistle
            class Model # Table is named wistle_models.
              include DataMapper::Resource
          
              property :id, Integer, :serial => true
              property :name, String
              property :revision, Integer
            end
          end
          

And this file needs to be required in 'lib/wistle.rb'. Just for fun, let's run rake dm:db:autoupgrade. Alas, no luck, the new model doesn't migrate. There's a good reason why: none of the Wistle module is required when running Merb. (As an aside, it just seems more reasonable to me to include Wistle::Model in the Wistle lib instead of directly in the models directory.) So, add another dependency in init.rb, but there's a gotcha here. This dependency should not be declared until after use_orm :datamapper, because it depends on DataMapper being loaded.

use_orm :datamapper
          dependency 'lib/wistle.rb'
          

Awesome. I guess. You can run that migration now and it should work. And now let's get our Subversion-y models talking to this model.

module Wistle::Svn
            module ClassMethods
              def svn_repository
                return @svn_repository if @svn_repository
          
                @svn_repository = Wistle::Model.first(:name => self.name)
                @svn_repository ||= Wistle::Model.create(:name => self.name, :revision => 0)
                @svn_repository.config = config
                @svn_repository
              end
            end
          end
          

Again, I use the Class instance variable trick. I only want to set up @svn_repository when I have to, so if it's already available, I just return it. Next, I try to get a row in wistle_models that is set up for the current model. If no luck there, I create such a row. Finally, I give this Model instance direct access to the Subversion-ized model's @config. Which means one more update to Wistle::Model: attr_accessor :config.
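If the class instance variable trick is unfamiliar, the following plain-Ruby sketch may help. It swaps the wistle_models table for an in-memory array, and RevisionRow, Tracked, Post and Page are hypothetical names used only for illustration. Each class that extends the module gets its own memoized row, found or created on first use.

```ruby
# In-memory stand-in for the wistle_models table (hypothetical).
class RevisionRow
  attr_reader :name
  attr_accessor :revision, :config

  def self.rows
    @rows ||= []
  end

  def self.first(name)
    rows.find { |row| row.name == name }
  end

  def self.create(name, revision)
    row = new(name, revision)
    rows << row
    row
  end

  def initialize(name, revision)
    @name = name
    @revision = revision
  end
end

module Tracked
  # @svn_repository lives on the class object itself, so every class
  # extending this module memoizes its own row.
  def svn_repository
    return @svn_repository if @svn_repository

    @svn_repository = RevisionRow.first(name) || RevisionRow.create(name, 0)
  end
end

class Post; extend Tracked; end
class Page; extend Tracked; end

Post.svn_repository # creates the row for Post
Post.svn_repository # returns the same object, no second lookup
Page.svn_repository # a separate row for Page
```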

Before hitting the update code, I want to flesh out the Wistle::Config class. The other five configuration elements I want are:

uri
The uri of the folder in the Subversion repository where the model's contents are stored (file:///path/to/repo/path/to/folder, svn://example.com/path/to/folder, etc.)
username
The Subversion username to use, if needed.
password
The Subversion password to use, if needed.
property_prefix
This addresses a question I didn't ask above: how to deal with properties other than the contents. I could, for example, start each file with a bit of yaml or xml or what have you. Instead, I'm going to store the other properties using Subversion's property mechanism. However, I want to minimize the chance of name conflicts, so I provide a setting for a prefix. As a default, I'll use "ws:" (for Wistle::Svn, I guess).
extension
The extension of files that will be included in the update. This is certainly not necessary, but it works for me.
class Wistle::Config
            OPTS = [:uri, :username, :password,
                    :body_property, :property_prefix, :extension]
          
            attr_accessor *OPTS
          
            def initialize
              # Set defaults
              @body_property = 'body'
              @property_prefix = 'ws:'
              @extension = 'txt'
            end
          end
          

The OPTS constant is because I'll re-use this list momentarily. I also want to be able to set some of these settings in database.yml, if it's available. At the end of the initialize method, I add:

if Object.const_defined?("Merb")
            f = "#{Merb.root}/config/database.yml"
            # Fall back to :development before calling to_sym, not after.
            env = (Merb.env || :development).to_sym
          end
          
          # Only read database.yml if it actually exists.
          if f && File.exist?(f)
            config = YAML.load(IO.read(f))[env] || {}
            OPTS.each do |field|
              config_field = config["svn_#{field}"] || config["svn_#{field}".to_sym]
              if config_field
                instance_variable_set("@#{field}", config_field)
              end
            end
          end
          

Now, in database.yml, I can add :svn_username: my_login. That is, I can prefix any of the fields defined above with 'svn_'. I'm not sure that sentence made sense.
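In case that sentence really didn't make sense, here is a small standalone sketch of the lookup. The YAML text, the repository path and the svn_settings helper are made up for illustration; only the svn_ prefix convention and the list of option names come from the code above. It uses YAML.safe_load with permitted_classes, which assumes a reasonably recent Ruby (2.6 or later), because newer versions refuse symbol keys by default.

```ruby
require 'yaml'

SVN_OPTS = [:uri, :username, :password,
            :body_property, :property_prefix, :extension]

# Example database.yml content (hypothetical values).
yaml_text = <<~CONFIG
  :development:
    :adapter: sqlite3
    :database: db/dev.db
    :svn_uri: file:///tmp/example-repo/articles
    :svn_username: my_login
    :svn_extension: txt
CONFIG

# Collect every svn_-prefixed setting for one environment.
def svn_settings(yaml_text, env)
  all = YAML.safe_load(yaml_text, permitted_classes: [Symbol]) || {}
  section = all[env] || {}
  SVN_OPTS.each_with_object({}) do |field, found|
    value = section["svn_#{field}"] || section["svn_#{field}".to_sym]
    found[field] = value unless value.nil?
  end
end

settings = svn_settings(yaml_text, :development)
# settings now holds :uri, :username and :extension; the other options
# are absent, so the defaults from initialize would stay in place.
```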

Revision 42


Updating

Hey, it's time for the central code: syncing the database from the repository. If you're particularly interested in using Subversion's SWIG bindings, one of the more interesting parts of this project might be the Wistle::Fixture library, which I use to generate Subversion repository "test fixtures", but which I won't cover here. Incidentally, if you are so inclined, have a look at the test cases included in Subversion's repository. The actual code isn't commented, but it's "fairly" readable.

I'm putting the syncing code in its own class, because, well, that's what my brain says I should do. The only initialization argument it requires is the appropriate row in Wistle::Model. It only provides one other public method, #run, which runs the updating, going through the following steps:

  1. Connect to the repository. See #connect, #context, and #callbacks private methods. Most of what's going on here is dealing with different authentication options. Honestly, I don't have a solid understanding of this bit.
  2. Check if we have updated to the last revision already. If so, quit.
  3. Run the repository's #log method. This gets information about each commit, starting with the most recent; I've specified to get revisions only through the last update (stored in Wistle::Model#revision). Store this information in the variable changesets.
  4. Reverse changesets and run #do_changeset on each element.

SvnSync#do_changeset actually updates the database. For each change in the changeset:

  1. It determines whether the change was one I'm interested in, and if so, what kind of change. There are three types of interest: moves, modifications/adds, and deletes.
  2. Moves are the most problematic, mostly because Subversion doesn't really have a "move" concept. Instead, we're looking for a node that was copied from another node in the same changeset in which the latter node was deleted. In this case, as opposed to "just a copy", I don't want to create a new entry in the database, but rather modify the path of the existing entry. Why? To not invalidate foreign keys, i.e. to keep comments listed with the article after it's renamed.
  3. Next, do any deletes. It's possible we won't find the node to delete, either because it was actually a move, or because it refers to a file we don't keep track of. In that case, just continue on with the next delete.
  4. Modify/Add/Replace: In all these cases, what I want is to update the content of the appropriate row, creating a new row if needed. The private method #get is responsible for finding the appropriate row, based on the path. This updates contents and other properties, both those specified by the revision and the actual node properties.
  5. When all changes have been processed, update the Wistle::Model row with the new current revision.
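
The sorting of changes into moves, deletes and modifies can be tried without a repository. The sketch below uses a made-up Change struct and a hypothetical classify helper in place of the objects the SWIG bindings hand back, and the paths are invented. One simplification to note: it drops moved paths from the delete list up front, whereas the real code leaves them in and simply finds nothing to delete.

```ruby
# Stand-in for the per-path change objects from the log (hypothetical).
Change = Struct.new(:action, :copyfrom_path)

def classify(changes)
  modified, deleted, copied = [], [], []

  changes.each do |path, change|
    case change.action
    when 'M', 'A', 'R' then modified << path  # Modified, Added, Replaced
    when 'D'           then deleted << path
    end
    copied << [path, change.copyfrom_path] if change.copyfrom_path
  end

  # A copy whose source was deleted in the same changeset is a move.
  moves = copied.select { |_to, from| deleted.include?(from) }
  moved_from = moves.map { |_to, from| from }

  { :moves    => moves.map { |to, from| [from, to] },
    :deletes  => deleted - moved_from,
    :modifies => modified }
end

changeset = {
  '/articles/old.txt'   => Change.new('D', nil),
  '/articles/new.txt'   => Change.new('A', '/articles/old.txt'),
  '/articles/other.txt' => Change.new('M', nil),
  '/articles/gone.txt'  => Change.new('D', nil)
}

result = classify(changeset)
# old.txt -> new.txt is reported as a move, gone.txt as a plain delete,
# and new.txt still appears under :modifies so its contents get updated.
```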

If you aren't familiar with the SWIG bindings, the code will probably be a bit confusing, but hopefully the outline above will help clarify what's going on. More to the point, I hope it illustrates that ORMs are not the only available storage mechanisms for web apps.

So, the code (yikes):

module Wistle
            class SvnSync
              def initialize(model_row)
                @model_row = model_row
                @model = Object.const_get(@model_row.name)
                @config = @model_row.config
              end
          
              # There is the possibility for unnecessary updates, as a database row may be
              # modified several times (if modified in multiple revisions) in a single
              # call. This is inefficient, but--for now--not enough to justify more
              # complex code.
              def run
                connect unless @repos
                return false if @repos.latest_revnum <= @model_row.revision
          
                changesets = [] # TODO Maybe revision + 1
                @repos.log(@path_from_root, @repos.latest_revnum, @model_row.revision, 0, true, false
                    ) do |changes, rev, author, date, msg|
                  changesets << [changes, rev, author, date]
                end
          
                changesets.sort{ |a, b| a[1] <=> b[1] }.each do |c| # Sort by revision
                  do_changset(*c)
                end
                return true
              end
          
              private
          
              # Get the relative path from config.uri
              def short_path(path)
                path = path[@path_from_root.length..-1]
                path = path[1..-1] if path[0] == ?/
                path.sub!(/\.#{@config.extension}\Z/, '') if @config.extension
                path
              end
          
              # Get an object of the @model, by path.
              def get(path)
                @model.first(:path => short_path(path))
              end
          
              # Create a new object of the @model
              def new_record
                @model.new
              end
          
              # Process a single changset.
              # This doesn't account for possible move/replace conflicts (A node is moved,
              # then the old node is replaced by a new one). I assume those are rare
              # enough that I won't code around them, for now.
              def do_changset(changes, rev, author, date)
                modified, deleted, copied = [], [], []
          
                changes.each_pair do |path, change|
                  next if short_path(path).blank?
          
                  case change.action
                  when "M", "A", "R" # Modified, Added or Replaced
                    modified << path if @repos.stat(path, rev).file?
                  when "D"
                    deleted << path
                  end
                  copied << [path, change.copyfrom_path] if change.copyfrom_path        
                end
          
                # Perform moves
                copied.each do |copy|
                  del = deleted.find { |d| d == copy[1] }
                  if del
                    # Change the path. No need to perform other updates, as this is an
                    # "A" or "R" and thus is in the +modified+ Array.
                    record = get(del)
                    record.update_attributes(:path => short_path(copy[0])) if record
                  end
                end
          
                # Perform deletes
                deleted.each do |path|
                  record = get(path)
                  record.destroy if record # May have been moved or refer to a directory
                end
          
                # Perform modifies and adds
                modified.each do |path|
                  next if @config.extension && path !~ /\.#{@config.extension}\Z/
          
                  record = get(path) || new_record
                  svn_file = @repos.file(path, rev)
          
                  # update body
                  record.__send__("#{@config.body_property}=", svn_file[0])
          
                  # update node props -- just find any props with property_prefix
                  svn_file[1].each do |name, val|
                    if name =~ /\A#{@config.property_prefix}(.*)/
                      record.__send__("#{$1}=", val)
                    end
                  end
          
                  # update revision props
                  record.path = short_path(path)
                  record.svn_updated_at = date
                  record.svn_updated_rev = rev
                  record.svn_updated_by = author
                  if record.new_record?
                    record.svn_created_at = date
                    record.svn_created_rev = rev
                    record.svn_created_by = author
                  end
                  record.save
                end
          
                # Update model_row.revision
                @model_row.update_attributes(:revision => rev)
              end
          
              def connect
                @ctx = context
          
                # This will raise some error if connection fails for whatever reason.
                # I don't currently see a reason to handle connection errors here, as I
                # assume the best handling would be to raise another error.
                @repos = ::Svn::Ra::Session.open(@config.uri, {}, callbacks)
                @path_from_root = @config.uri[(@repos.repos_root.length)..-1]
                return true
              end
          
              def context
                # Client::Context, which particularly holds an auth_baton.
                ctx = ::Svn::Client::Context.new
                if @config.username && @config.password
                  # TODO: What if another provider type is needed? Is this plausible?
                  ctx.add_simple_prompt_provider(0) do |cred, realm, username, may_save|
                    cred.username = @config.username
                    cred.password = @config.password
                  end
                elsif URI.parse(@config.uri).scheme == "file" 
                  ctx.add_username_prompt_provider(0) do |cred, realm, username, may_save|
                    cred.username = @config.username || "ANON"
                  end
                else
                  ctx.auth_baton = ::Svn::Core::AuthBaton.new()
                end
                ctx
              end
          
              # callbacks for Svn::Ra::Session.open. This includes the client +context+.
              def callbacks
                ::Svn::Ra::Callbacks.new(@ctx.auth_baton)
              end
            end
          end
          

Time to hook the pieces together.

An update to Wistle::Svn, to add the .sync class method to including models:

module Wistle::Svn
            module ClassMethods
              def sync
                Wistle::SvnSync.new(svn_repository).run
              end
            end
          end
          

In Article, after including DataMapper::Resource, include Wistle::Svn.

Run rake dm:db:automigrate to add in Wistle::Svn's properties to Article.

And, now, to make the syncs happen. I'm going to go with one sync for every Request, for now. This may prove to be terribly inefficient (the connect code to the Subversion repository is not cheap), but if so, I'll change it later.

So, a nice before filter in Application should do the trick.

class Application < Merb::Controller
            before :sync_articles
          
            protected
          
            def sync_articles
              Article.sync
            end
          end
          

Finally, I'm going to remove all methods and associated views from Articles that can update an Article, i.e. new, create, edit, update and destroy.

And, well...that's it. Well, you do need to set up an appropriate Wistle::Config in Article (or in database.yml).

Revision 48

Fri, 2008 Aug 15

Wistle Part 1: A Simple Merb Application

Posted in Wistle at 09:00 by jmorgan

So, I decided to create a blogging application. After all, Typo was pretty nice and I've quite happily used (and abused) Mephisto for some time. And of course, there's a thousand other options out there. But, over the last few years, I've developed a wish list, so:

  1. Store the actual articles in a source code repository, ideally Subversion (or maybe git, but I'm much more comfortable with Subversion).
  2. Store views in the same repo as the articles (or, at least, separate from the app itself. I don't have a particularly good reason for this, but point #6 will take care of that).
  3. Views in anything other than liquid. I mean, I just can't stand it. I understand its purpose and it's great and all, but I wanna' write my view code in Ruby..or PHP..or VBA..or Lisp..or that programming language that's all in whitespace..or... Phhbbt!
  4. By default, set up for multiple sites hosted by a single app.
  5. Easy to add content filters, like Markdown and Textile, but also including my own (This is actually pretty easy in Mephisto, but again, point #6).
  6. I wanted a relatively simple but challenging project to do on my own, so I mostly made up 1-5 to justify it. Hey-o!

I picked Merb for the framework and DataMapper for the ORM, mostly because I've been experimenting with these lately. In addition, they feel more flexible than Rails for doing stuff like points one and two, and because I can't stand the font on the DataMapper site. Hello? WTF is that? "Font-that-looks-like-I-designed-it-in-MS-Paint"? While fending off a horde of rabid chihuahuas? Seriously.

Oh, it's a "humanist sans-serif typeface". Good to know.


Anywho, I thought I'd walk you [insert your name here] through the process, partly because it has some fun, not-normally-tutorial-stuff aspects, partly as yet another intro to the rapidly changing worlds of Merb and DataMapper. Mostly because I just felt like it.

I'm also going to try to do this in discrete stages, adding elements of the above requirements as I go. I think that will work okay. Be agiley and s---.

Repository note

I have set up a project on Google code at http://code.google.com/p/wistle/, for you to actually view the code. I'll also reference particular revisions. However, big note, there's some errors in the code, and some things changed because Merb and DataMapper are changing, and because I'm learning. So, things might not work for you, and what's in these articles may be different from what's in the repository. Hopefully, this won't be too big a problem, because--hopefully--these blog entries will explain enough to start you on the right path to figuring out what's wrong.

Here goes.

Generate the app

NB: If you're trying on Windows...good luck. By the way, if you use a Windows machine, for whatever reason, colinux can be your friend.

  1. Get the gems. See the respective sites (Merb, DataMapper).
  2. merb-gen app wistle (No, I don't know why we're calling this app 'wistle'. It's what I happened to type.)

Hopefully, that worked. If not, do some google searches; depending on the day, everything may work or fail terribly.

We need to pause for some configuration. The file we want is config/init.rb. Not a lot of options, but they're well commented. All I'm going to do is uncomment two lines: use_orm :datamapper and use_test :rspec. This just tells Merb that I'm using DataMapper and RSpec. I assume it uses this knowledge to load appropriate libraries or something. I don't really know.

The other bit-o-configuration we need is database.yml. I like to stick with sqlite for development unless I'm intending to use features specific to a given database.

:development: &defaults
            :adapter: sqlite3
            :database: db/dev.db
          
          :test:
            <<: *defaults
            :database: db/test.db
          
          :production:
            <<: *defaults
            :database: db/pro.db
          

And, before you start scaffolding, create a db directory if there's not one already.

Revision 5

Scaffolding

Yeah, I know, scaffolding sucks, but it's a quick way to get some working code, because this ain't the interesting bit. First, though, what models/resources do I need?

  1. Article, for the actual articles. This one will become interesting later.
  2. Comment, for people to leave comments.
  3. Site, to specify each different site. But I'm going to leave out the multi-site requirement for now.
  4. User? Nope, I'm not going to bother. Since I know that my article editing will ultimately happen on some text editor and be committed to a Subversion repository, I don't have a need for User accounts. I could add it later, say, if I wanted commenters to have accounts or something. Oh, as an aside, after having completed several Rails apps, the only thing interesting about user accounts are the passwords. For some reason, this gets overcomplicated.

So, just Article and Comment. And nothing too fancy. Note that the underscore in date_time matters. Otherwise, you're liable to get a constant missing error. Another gotcha is that an "id" field is not generated automatically.

merb-gen resource Article
            id:integer
            title:string
            body:text
            published_at:date_time
            comments_allowed_at:date_time
            created_at:date_time
            updated_at:date_time
          
          merb-gen resource Comment
            id:integer
            author:string
            email:string
            body:text
            article_id:integer
            parent_id:integer
            created_at:date_time
            updated_at:date_time
          

We have several more things to do before we can really get the app running. The first is routing. I understand that Merb's router is quite powerful. But, I'm not intending to venture there for now.

I want the actual code of router.rb to look like this for now (just using REST routing for the two models just created). I'll update this a bit as time goes on.

Merb::Router.prepare do |r|
            r.resources :articles
            r.resources :comments
          
            r.default_routes
          end
          

Next, specify that id is the primary key for both tables. So, in each model, change the line property :id, Integer to property :id, Integer, :serial => true, thus telling DataMapper that id is an auto-numbering primary key.

Then, migrate the database. Yay, no migration files! This is probably a personal preference, but I really like specifying the tables fields within the model.

rake dm:db:automigrate
          

The next was a surprise to me. Apparently, link_to is now in the "merb-assets" gem and must be required explicitly (thanks to this article for the solution). Likewise, "error_messages_for" is in "merb_helpers" (you may need to gem install merb_helpers). So, add to init.rb: dependencies "merb_helpers", "merb-assets".

To start the app, the command is, well, merb. Add a "-p ####" to specify a port other than 3000.

So, play around, check out the scaffolded code, yadda, yadda.

Revision 11

Clean Up the Scaffolding

The next step is to get the app working like I want it, without messing with the storage in Subversion stuff. One thing to note is that I'm not going to address "look and feel" in this article. (Except sort of at the tail-end). I generally like to start with the models, although I don't really have an "approach". Oh, and I don't plan on going over specs/tests in this article, although I'll be writing some (probably less than some people would prefer).

Anyway, first stop is making the properties in the models work just like I want them.

Validations - I won't validate anything for the Article model because editing will ultimately be done in Subversion, and, well, I generally don't care to validate data that I personally will be inputting. But the Comments will see some changes.

  1. First, we need 'dm-validations'. There's several places you could require it (directly, in the model for example), but I'll add it as a dependency in init.rb. For some reason, I had a version problem, so I specified it explicitly: dependency 'dm-validations', '= 0.9.1'. (Later, I removed the version).
  2. Then, add some options to some of the properties. Add :nullable => false to #body, #author and #article_id; also, add :length => 100 to #author (Because I feel like it); and :format => :email_address for #email. By default, DataMapper validates based on this info. So, a :nullable => false results in a validates_present. Of course, you can use explicit validations if desired or needed.
  3. I'm not sure how to disable the format validation (for email) when no address has been supplied. So, I'll customize the setter.

    def email=(val)
      if val.blank?
        attribute_set(:email, nil)
      else
        attribute_set(:email, val)
      end
    end

Lazy Loading - DM lazy loads Text fields by default. I don't anticipate retrieving Articles or Comments without using their #body fields, so, add :lazy => false option to the #body properties.

Relationships - Comments belong to a) an Article and b) possibly a parent Comment. Associations look a bit different if you're accustomed to ActiveRecord, but nothing too weird. Here's the updates. Some of these associations have some extra options, such as ordering and scope. Note particularly Article#direct_comments.

class Article
            has n, :comments
          
            has n, :direct_comments,
                :class_name => 'Comment',
                :order => [:created_at.asc],
                :parent_id => nil
          end
          
          class Comment
            belongs_to :article
          
            belongs_to :parent,
                :class_name => 'Comment',
                :child_key => [:parent_id]
          
            has n, :replies,
                :class_name => 'Comment',
                :child_key => [:parent_id],
                :order => [:created_at.asc]
          end
          

I want to be able to call @article.comments.count from my views, so I need to add a dependency 'dm-aggregates' in init.rb.

Auto Times - I like the auto-updating created_at and updated_at in AR. To get this in DataMapper, we just need to require "dm-timestamps". dependency 'dm-timestamps' in init.rb is one way to do this.

Timestamp Booleans - One of my favorite little tricks is timestamp columns that can operate as booleans. I have two in Article: #published_at and #comments_allowed_at. I'll want the following methods: #published? and #published=(Boolean) (and similar for #comments_allowed_at). Since I might add similar columns later, I'll do some meta-programming here.

class Article
            %w{published comments_allowed}.each do | col |
              define_method("#{col}=") do |value|
                value = false if (value == '0' || value == 0) # for checkboxes
          
                # update only if the boolean value changed.
                if (!value == __send__("#{col}?"))
                  attribute_set("#{col}_at", value ? Time.now : nil)
                end
              end
          
              define_method("#{col}?") do
                __send__("#{col}_at") ? true : false
              end
            end
          end
          

#attribute_set is preferred to @attribute_name=(value) for "tracking dirtiness".
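
To see how the timestamp booleans behave, here is a standalone sketch of the same idea outside DataMapper. Draft is a made-up class, and its attribute_set is a plain stand-in that only sets an instance variable, so none of the dirty tracking is modelled here.

```ruby
class Draft
  attr_reader :published_at, :comments_allowed_at

  # Stand-in for DataMapper's attribute_set (no dirty tracking here).
  def attribute_set(name, value)
    instance_variable_set("@#{name}", value)
  end

  %w{published comments_allowed}.each do |col|
    define_method("#{col}?") do
      __send__("#{col}_at") ? true : false
    end

    define_method("#{col}=") do |value|
      value = false if value == '0' || value == 0 # for checkboxes

      # Touch the timestamp only when the boolean value actually flips.
      if !value == __send__("#{col}?")
        attribute_set("#{col}_at", value ? Time.now : nil)
      end
    end
  end
end

draft = Draft.new
draft.published = '1'            # sets published_at to the current time
first_stamp = draft.published_at
draft.published = true           # already published, timestamp untouched
draft.comments_allowed = 0       # checkbox-style "off", stays nil
```

Setting the flag to a value it already has leaves the original timestamp alone, which is the whole reason for the comparison against the current boolean.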

Auto Migrate again - rake dm:db:automigrate. This will take care of updating the database with those :nullable => false kind of property options. I think this is destructive. rake dm:db:autoupgrade, according to rake -T, is nondestructive. But I don't have any useful data yet anyway.

Finally, in this "clean up the scaffolding" section, I want to look at the VC side of MVC. There's a few things needed to match up the controllers with the associations specified in the model. I'll also work on the views, although I won't document any of that here. Merb, by the way, supports ERB and HAML. I assume it supports other templating engines; looking at the merb-haml gem, anyway, this doesn't look difficult. I'm going to use HAML for now, because, hey, why not add on something else new. But, the controller/routing changes. (Oh, and I'll ignore the edit and new views for articles; they will after all disappear shortly).

HAML - Add a 'merb-haml' dependency in init.rb

Router - Basically, I just want to use REST routes (for now), with comment routes nested in article routes. Also, add the default route.

Merb::Router.prepare do |r|
            r.resources :articles do | article |
              article.resources :comments
            end
          
            r.match('/').to(:controller => 'articles', :action =>'index')
          end
          

Comments controller - I want to update the Comments controller to scope requests by the article. The key here is a before filter. In this, I'll also assign a parent Comment, if appropriate.

before :assign_article_and_parent
          
          protected
          
          def assign_article_and_parent
            @article = Article.get(params[:article_id])
            raise NotFound if @article.nil?
            @parent = Comment.get(params[:parent_id]) unless params[:parent_id].blank?
          end
          

There's also some updates such as:

@comment = @article.comments.first(:id => params[:id])
          

and

@comment = Comment.new
          @article.comments << @comment
          @parent.replies << @comment if @parent
          

URLs also need to reflect the nested routing of comments. For example, the redirect in #create becomes:

redirect url(:article_comment,
              :article_id => @article.id,
              :comment_id => @comment.id)
          

I also remove the edit, update and destroy actions. The only mechanism I will provide for these for now is the console. This is just to avoid needing an administrative area (even then, though, I'd probably just provide the destroy option).

Articles controller - Finally, I want to limit Articles to those already published. Again, a before filter would work, but I'm just going to create an Article.published method, referenced in index. I could restrict the show action also to only those published, but I'll leave it for previewing, at least for now.

class Article
            class << self
              def published(options = {})
                 Article.all(options.merge(
                     :conditions => ["datetime(published_at) <= datetime('now')"],
                     :order => [:published_at.desc]))
              end
            end
          end
          

Revision 33