Move to Mongo

by Andrew Zeneski  •  published July 30
 

As an experiment, I decided to switch the document store from Redis to MongoDB.

First I wrote a migration script to pull the current blog post data out of Redis and restructure it, since MongoDB supports lists and embedded documents inside a main document. This restructuring let me remove a lot of code, and what remains is much cleaner. Also, instead of several queries to the document store to retrieve a post, it now takes a single query:

entry = self.mongo.posts.find_one({'slug': slug})
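
The migration script itself isn't shown here. As a rough sketch of the restructuring step, assuming the Redis layout from the original Redis version of this blog (a post:<id> hash with comma-joined tags and string timestamps, plus a 'latest' list of promoted post IDs), one hash could be turned into a Mongo document like this. The helper name and the choice of using the old numeric ID as _id are assumptions:

```python
def redis_hash_to_document(post_hash, latest_ids):
    """Hypothetical helper: turn one Redis post:<id> hash into a MongoDB document.

    Redis hashes only hold flat strings, so tags arrive comma-joined and
    timestamps arrive as strings; MongoDB can store a real list and numbers.
    """
    # split the comma-joined tags back into a list, dropping empty entries
    tags = [t for t in post_hash.get("tags", "").split(",") if t]
    return {
        "_id": int(post_hash["id"]),  # assumption: reuse the Redis post ID
        "title": post_hash["title"],
        "slug": post_hash["slug"],
        "content": post_hash["content"],
        "author": post_hash["author"],
        "tags": tags,  # an embedded list replaces the tag:<name> sets
        "created": float(post_hash["created"]),
        "updated": float(post_hash["updated"]),
        # promoted posts used to be tracked in the 'latest' list
        "front_page": post_hash["id"] in latest_ids,
    }
```

Each resulting dict could then be written with PyMongo's insert_one; keeping the conversion a pure function makes it easy to check before touching the database.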

I created several indexes on the posts collection for the common lookups and sorts:

db.posts.ensureIndex({'front_page' : 1})
db.posts.ensureIndex({'created' : -1})
db.posts.ensureIndex({'slug' : 1})

The front_page flag is a boolean that indicates whether a post is promoted to the front page of the blog. The created field is a timestamp used to sort posts in descending order (newest first). The slug field is the "search engine friendly" string used in the URLs.

A simple lookup for the home page and right navigation now looks like this:

posts = self.mongo.posts.find({'front_page': True}).sort("created", -1)[0:15]

Simply put, it finds all posts whose front_page field is True, sorts them by created date in descending order and limits the result to the first 15 entries.
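
To make the semantics of that cursor concrete, here is a plain-Python equivalent over a list of dicts. It is only an illustration of what the query returns; MongoDB itself answers it using the indexes above rather than scanning and sorting in memory, and the function name is hypothetical:

```python
def front_page_posts(posts, limit=15):
    """Hypothetical in-memory equivalent of
    find({'front_page': True}).sort('created', -1)[0:15]."""
    # filter: only promoted posts
    promoted = [p for p in posts if p.get("front_page") is True]
    # sort: newest first, like sort("created", -1)
    promoted.sort(key=lambda p: p["created"], reverse=True)
    # slice: like [0:limit] on the cursor
    return promoted[:limit]
```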

Switching to Mongo was really just an exercise, but it does seem like a more natural way to store blog/article content. I no longer need to populate a bunch of lists and sets with Redis keys to categorize and tag posts. MongoDB has a very powerful query language that supports matching the contents of an embedded list and values on an embedded object, as well as fast sorts and indexed lookups.
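
As an illustration of the two query features mentioned above, the sketch below mimics a tiny subset of MongoDB equality matching in plain Python: a scalar compared against an array field matches if any element equals it (so {'tags': 'redis'} finds tagged posts), and dotted keys such as 'author.name' walk into embedded documents. This is a simplified model, not MongoDB's full matching rules, and the embedded author object is a hypothetical example (the original posts store an author ID):

```python
def matches(doc, query):
    """Simplified model of MongoDB equality matching (illustration only)."""
    for key, expected in query.items():
        # dotted keys walk into embedded documents
        value = doc
        for part in key.split("."):
            if not isinstance(value, dict) or part not in value:
                return False
            value = value[part]
        # a scalar matches an array field if any element equals it
        if isinstance(value, list) and not isinstance(expected, list):
            if expected not in value:
                return False
        elif value != expected:
            return False
    return True
```

With PyMongo, the same filter dicts would simply be passed to find(), e.g. find({'tags': 'redis'}).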

Finally, I added the MarkItUp jQuery editor plugin (http://markitup.jaysalvat.com) to the compose screen. It is a clean editor with Markdown support and a very nice preview pane.

It's almost as if I am now running Blog v2.0.

// ttfn


Tornado Async Twitter Feed

by Andrew Zeneski  •  published February 19, 2011
 

Tornado's AsyncHTTPClient is a very simple way to make non-blocking RESTful API calls. Adding the Twitter feed to my blog took just a couple of lines of code:

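import tornado.escape
import tornado.httpclient
import tornado.web

class HomeHandler(BaseHandler):
    # keeps the request open after get() returns (removed in Tornado 6)
    @tornado.web.asynchronous
    def get(self):
        http = tornado.httpclient.AsyncHTTPClient()
        http.fetch("http://api.twitter.com/1/statuses/user_timeline.json?"
                   "screen_name=username&include_rts=t",
                   callback=self.on_response)

    def on_response(self, response):
        # don't try to decode a failed Twitter call
        if response.error:
            raise tornado.web.HTTPError(500)
        latest_entries = self.redis.lrange('latest', 0, 15)
        tweets = tornado.escape.json_decode(response.body)
        # ... render the page; the request only completes on render()/finish()
        ...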

Could it be any easier?
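
One small refinement: rather than gluing the query string together by hand, the parameters can be encoded with the standard library. A minimal sketch; the helper name is hypothetical and the endpoint is copied as-is from the handler above:

```python
from urllib.parse import urlencode

TIMELINE_URL = "http://api.twitter.com/1/statuses/user_timeline.json"

def timeline_url(screen_name, include_rts=True):
    """Hypothetical helper: build the user_timeline URL used by HomeHandler."""
    params = {"screen_name": screen_name}
    if include_rts:
        params["include_rts"] = "t"
    # urlencode escapes values, e.g. a space becomes '+'
    return TIMELINE_URL + "?" + urlencode(params)
```

The handler would then call http.fetch(timeline_url("username"), callback=self.on_response).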


Easy Syntax Highlighting

by Andrew Zeneski  •  published February 13, 2011
 

My blog was looking pretty good, but while thinking about finishing touches I felt it would be nice to add syntax highlighting to the code blocks.

Since I am using markdown2 for text formatting, I assumed this task would not be trivial. I was wrong...

I added 3 lines to my template:

    <link rel="stylesheet" href="http://yandex.st/highlightjs/5.16/styles/github.min.css"/>
    <script src="http://yandex.st/highlightjs/5.16/highlight.min.js" type="text/javascript"></script>
    <script>hljs.initHighlightingOnLoad();</script>

And just like that, I had syntax highlighting. The highlight.js library handles everything and even works with content generated by Markdown-like text formatters.


Project Py-Tor-Red-Blog

by Andrew Zeneski  •  published February 12, 2011
 

While browsing through the list of open source software used by Facebook, I came across a few projects I wasn't familiar with; Tornado was one of them.

Browsing through the documentation, I found the framework to be very intuitive:

class MainHandler(tornado.web.RequestHandler):
    def get(self):
        self.write("Hello, world")

With a few more lines of code to set up the application, this creates a very simple hello world example. Pretty nifty. Looks a lot like a GAE webapp (or web.py), eh?

Not that long ago, I also came across a key-value NoSQL store, Redis, that caught my attention. Having been focused on OFBiz implementations that depend on transactional SQL databases, I was interested in investigating the NoSQL trend further.

Redis looked very interesting, so as an exercise to learn more about both Tornado and Redis I ported the PHP sample application Retwis to Python using Tornado and the redis-py library.

(My port is available on GitHub.)

I decided I needed a new blog, and with the Retwis port under my belt I realized that a blogging application was a perfect candidate for this technology stack. Storing blog data in a SQL database never really made a whole lot of sense, but a document store or key-value store did seem like a good fit.

I reused a number of patterns and code blocks I found on Bret Taylor's Blog, spent 1.5 days and coded up this tool. My first impression: after spending the last several years writing Java exclusively, I find Python very refreshing and MUCH faster to develop in, and with modern CPUs, running Tornado behind Nginx makes scaling issues feel like a thing of the past. I am very happy with the results.

The meat of the app is in the posting class and the entry module, but first I needed a way to restrict who can post. So I created an author decorator, adapted from the 'authenticated' decorator in Tornado:

import functools
import urllib
import urlparse

import tornado.web

def author(method):
    """Decorate methods with this to require that the user be an author."""
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        if not self.current_user:
            if self.request.method in ("GET", "HEAD"):
                url = self.get_login_url()
                if "?" not in url:
                    if urlparse.urlsplit(url).scheme:
                        # if login url is absolute, make next absolute too
                        next_url = self.request.full_url()
                    else:
                        next_url = self.request.uri
                    url += "?" + urllib.urlencode(dict(next=next_url))
                self.redirect(url)
                return
            raise tornado.web.HTTPError(403)
        elif not self.is_author:
            raise tornado.web.HTTPError(403)
        return method(self, *args, **kwargs)
    return wrapper

The is_author property simply checks the user against a list of approved authors stored in Redis.
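
The property itself isn't shown. A minimal sketch, assuming (hypothetically) that approved author IDs are kept in a Redis set named 'authors' rather than a list, since SISMEMBER is a constant-time membership test, and that current_user is a dict with a 'user_id' key as in the compose code:

```python
class AuthorMixin(object):
    """Hypothetical mixin for BaseHandler; expects self.redis and self.current_user."""

    @property
    def is_author(self):
        user = self.current_user
        if not user:
            return False
        # assumption: approved author IDs live in a Redis set called "authors"
        return bool(self.redis.sismember("authors", user["user_id"]))
```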

Next, I decided to store the blog data as a HASH and reused a block of code I found (I think it was written by Benjamin Golub) to "slugify" the title and tags. I store the post and a link from the slug to the post ID, then add the post to the list of latest posts, the tag sets and the proper monthly archive:

import datetime
import re
import time
import unicodedata

class ComposeHandler(BaseHandler):
    def _slugify(self, slug_str):
        # strip accents, then collapse runs of non-word characters into hyphens
        slug = unicodedata.normalize("NFKD", slug_str).encode(
                "ascii", "ignore")
        slug = re.sub(r"[^\w]+", " ", slug)
        slug = "-".join(slug.lower().strip().split())
        return slug

    @author
    def get(self):
        post_id = self.get_argument('id', None)
        entry = None
        if post_id:
            entry = self.redis.hgetall('post:' + post_id)
        self.render('compose.html', entry=entry)

    @author
    def post(self):
        # current user
        user = self.get_current_user()

        # id: only present if this is an update
        post_id = self.get_argument('id', None)

        # tags: slugified and de-duplicated, empty entries dropped
        tags = set([self._slugify(unicode(tag)) for tag in
            self.get_argument("tags", "").split(",")])
        tags.discard("")

        # start from the stored hash on updates, an empty dict on creates
        post = dict()
        if post_id:
            post = self.redis.hgetall('post:' + post_id)

        # only on creates: allocate the ID first, because the slug
        # falls back to it when the title has no usable characters
        if not post_id:
            post['id'] = self.redis.incr("global:nextPostId")
            post['created'] = time.time()
            post['author'] = user['user_id']

        # title and content (normalize the CRLF line endings browsers
        # submit for textareas; Markdown needs the newlines)
        post['title'] = self.get_argument('title')
        post['content'] = self.get_argument("markdown").replace("\r\n", "\n")

        # slug for urls
        slug = self._slugify(post['title'])
        if not slug:
            slug = str(post['id'])
        post['slug'] = slug

        # set tags on the post
        post['tags'] = ",".join(tags)

        # updated time
        post['updated'] = time.time()

        # begin a redis pipeline: queued commands go out in one round trip
        pipe = self.redis.pipeline()

        # store the post
        pipe.hmset("post:" + str(post['id']), post)

        # store the slug for quick lookup
        pipe.set('slug:' + slug, post['id'])

        # associate post with the tags
        for tag in tags:
            pipe.sadd('tag:' + tag, post['id'])

        # place new posts in the monthly archive; SADD ignores existing
        # members, so no membership check is needed
        if not post_id:
            archive_date = datetime.date.today().strftime("%Y%m")
            pipe.rpush("archive:" + archive_date, post['id'])
            pipe.sadd("archives", int(archive_date))

        # home page posts: keep the 51 most recent promoted posts
        promote = self.get_argument('promote', 'N')
        if not post_id and promote == 'Y':
            pipe.lpush('latest', post['id'])
            pipe.ltrim('latest', 0, 50)

        # execute the pipeline
        pipe.execute()

        # tell redis to save in the background right away
        self.redis.bgsave()

        # redirect to the post page
        self.redirect("/entry/%s" % post['slug'])
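
The _slugify method above relies on Python 2 string semantics (encode returns a str that re.sub can work on directly). A minimal Python 3 sketch of the same steps, written as a standalone function with a hypothetical name:

```python
import re
import unicodedata

def slugify(text):
    # decompose accented characters, then drop anything that isn't ASCII
    ascii_text = (unicodedata.normalize("NFKD", text)
                  .encode("ascii", "ignore").decode("ascii"))
    # turn every run of non-word characters into a space, then hyphenate
    words = re.sub(r"[^\w]+", " ", ascii_text).lower().split()
    return "-".join(words)
```

For example, slugify("Move to Mongo") gives "move-to-mongo", and a title with no usable characters gives an empty string, which is why the handler falls back to the post ID.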

EntryHandler displays a post. It resolves the slug to a post ID, pulls the post hash out of Redis as a dict, converts the timestamps, renders the content to HTML with markdown2 and then renders the template:

import datetime
import markdown2
import tornado.web

class EntryHandler(BaseHandler):
    def get(self, slug):
        # resolve the slug to a post ID
        post_id = self.redis.get('slug:' + slug)
        if not post_id: raise tornado.web.HTTPError(404)

        # HGETALL returns the stored hash as a dict of strings
        entry = self.redis.hgetall("post:" + post_id)
        if not entry: raise tornado.web.HTTPError(404)
        author = self.redis.hgetall("user:" + str(entry['author']))
        entry['created_time'] = datetime.datetime.utcfromtimestamp(float(entry['created']))
        entry['updated_time'] = datetime.datetime.utcfromtimestamp(float(entry['updated']))

        # render the Markdown source to HTML with markdown2
        html = markdown2.markdown(entry['content'])
        self.render("entry.html", is_author=self.is_author, html=html,
                    entry=entry, author=author)
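
The timestamps are stored as time.time() floats, so they come back from the hash as strings. datetime.utcfromtimestamp returns a naive datetime and is deprecated as of Python 3.12; a minimal Python 3 sketch of the same conversion producing a timezone-aware UTC value (the helper name is hypothetical):

```python
from datetime import datetime, timezone

def stored_ts_to_datetime(value):
    """Hypothetical helper: convert a stored time.time() value to an aware UTC datetime."""
    return datetime.fromtimestamp(float(value), tz=timezone.utc)
```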

A few more features, like an Atom feed, a JSON and plain-text API, and integrations with Twitter, Facebook and Disqus, took no time at all to add. As usual, the most time-consuming part was working through the look and feel.

Time spent: half a day coding, one day on design.
