<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:content="http://purl.org/rss/1.0/modules/content/"
    xmlns:atom="http://www.w3.org/2005/Atom">
    <channel>
        <atom:link href="http://omniti.com/shares/seeds" rel="self" type="application/rss+xml" />
        <title>OmniTI ~ Seeds</title>
        <link>http://omniti.com/seeds</link>
        <language>en-us</language>
        <description>Seeds</description>
        <item>
            <title>Writing Readable Code</title>
            <link>http://omniti.com/seeds/writing-readable-code</link>
            <guid>http://omniti.com/seeds/writing-readable-code</guid>
            <description><![CDATA[
  Always code as if the guy who ends up maintaining your code will be a violent psychopath who knows where you live. Code for readability.
   - Martin Golding


Many
 coders have run across this statement and dismissed it as a bit of 
folklore, but it...]]></description>
            <content:encoded><![CDATA[<blockquote>
  <p class="first">Always code as if the guy who ends up maintaining your code will be a violent psychopath who knows where you live. Code for <span class="end-quote">readability.</span></p>
  <p class="attribution"> - Martin Golding</p>
</blockquote>

<p>Many
 coders have run across this statement and dismissed it as a bit of 
folklore, but it rings true. You will almost certainly never be the only
 person to look at code you&#8217;ve written. Even if that time is many months
 or years down the road, you owe it to future developers to write code 
that is easy to read, even if the logic behind it is complex.</p>

<p>Inevitably
 you will run into people who say "who cares, my code works, that&#8217;s the 
bottom line, right?"  Well, yes and no. Of course your code should 
work -- even the most beautiful code is rendered ineffective if it 
contains a crippling bug. Unless you&#8217;re coding in a vacuum though, you 
or someone else will have to look at it again. If you can&#8217;t understand 
what&#8217;s going on because the code is written poorly, you are costing 
yourself (and likely your clients) time. You also increase the chances 
of introducing bugs and/or syntax errors which further slow you down. 
Not good.</p>

<p>So how do we get the job done and still have readable code?</p>

<p><h5>Use a Sensible Indentation Scheme</h5></p>
<p>This
 one might seem obvious, but you should choose an indentation scheme 
that makes sense and stick to it. Typical schemes include hard-tabs, 
two- or four-space soft-tabs, or whacking the space bar a couple of 
times. Whichever method you choose, though, be consistent. Being able to scan a section of code vertically and see which indentation level belongs to which block is extremely useful.</p>

<p>Maintaining
 consistency is easy if your editor of choice supports a macro or 
keybind so that indenting is reduced to a single keystroke. Hard-tabs 
are already done for you. If you go the soft-tabs route, set your editor
 to use your chosen number of spaces whenever you press Tab for maximum 
convenience. Indent each code block or logical function so that things 
visually line up.</p>

<p><h5>Block Separators</h5></p>
<p>Where
 you place block separators is entirely up to you, but again be 
consistent. For example, both of the following are acceptable:</p>

<div style="padding-bottom:1em;"><pre style="font-size:11px; font-family: Courier">
while ($condition) {
    # code
}
</pre></div>

<p>&#8230; or &#8230;</p>

<div style="padding-bottom:1em;"><pre style="font-size:11px; font-family: Courier">
while ($condition)
{
    # code
}
</pre></div>

<p>If your language of choice allows it, you should also spread function calls with a significant number of arguments across multiple lines, such as:</p>

<div style="padding-bottom:1em;"><pre style="font-size:11px; font-family: Courier">
my $result = super_awesome_function(
    color => 'red',
    awesome => 1,
    frogs => [ 'out', 'gonk' ]
);
</pre></div>

<p>You
 might have noticed a tendency to avoid long lines here. This is 
intentional. One stat often trotted out is that "stock" terminal windows
 are 80 characters wide, and therefore you should not go over this for a
 single line of code. That&#8217;s a good starting point, but instead try to 
think in sensible chunks. If you have a long string being passed in as a
 function argument, then don&#8217;t sweat having that wrap the terminal 
window. Just don&#8217;t put all your arguments on one giant line. That&#8217;s hard
 to read, and difficult to deal with.</p>

<p><h5>Crack the Whip Now and Then</h5></p>
<p>So
 you&#8217;re doing your best to write consistently readable code, with a 
sensible indentation scheme and well laid out lines. Your project grows 
and a couple of junior developers come on, and let&#8217;s say they&#8217;re not quite 
as consistent as you are. Consider using a style enforcement tool on 
your code to keep things in line.</p>

<p><a href="http://perltidy.sourceforge.net/"><span>Perltidy</span></a> is a good example. <a href="http://jsbeautifier.org/"><span>JSBeautifier</span></a>
 is handy to drop blocks of JavaScript into. Other languages have their 
own, and if you use a GUI to code, consider using one that auto-indents 
for you so you don&#8217;t have to worry about it. You can get creative here 
too -- source control programs like SVN support commit hooks which could 
auto-prettify your code before check-in, or you could set up a cron job.
 The idea here is that once you embark down the road of maintaining a 
readable codebase, try your best to stick with it. Use tools to your 
advantage.</p>

<p><h5>Use Descriptive Variable Names</h5></p>
<p>Large
 blocks of code with variable names like $x, $y, $thing, and $o will 
certainly work, but they give very little insight into what&#8217;s actually going 
on. You should use variable names that describe the type of value being 
stored.</p>

<p>So instead of these:</p>
<ul>
  <li>$res</li>
  <li>$c</li>
  <li>$rv</li>
  <li>$o</li>
</ul>

<p>Use these:</p>
<ul>
  <li>$reservation</li>
  <li>$row_count</li>
  <li>$return_value</li>
  <li>$customer_object</li>
</ul>

<p>This will make your code significantly more readable and provide in-line, semantic hints when tracing logic.</p>

<p>Similarly,
 if your language supports implicit function arguments and loop 
variables, avoid these. For example, in Perl, the following doesn&#8217;t tell
 you much.</p>

<div style="padding-bottom:1em;"><pre style="font-size:11px; font-family: Courier">
foreach (@stuff) {
    $_->do();
}
</pre></div>

<p>But this does:</p>

<div style="padding-bottom:1em;"><pre style="font-size:11px; font-family: Courier">
foreach my $record (@customer_records) {
    $record->process();
}
</pre></div>

<p>It
 should always be easy to tell what&#8217;s going on, and what is being acted 
upon, from the names alone. Non-obvious variable names that are used consistently 
across your application are fine as long as they are well documented. For
 example, you might want to use $c for a database connection handle to 
save typing, or $u for a user object. Shortcuts are fine as long as 
everyone speaks the same language.</p>

<p><h5>Use Natural Language for Subroutine Names</h5></p>
<p>Similarly,
 your subroutine names should describe the actions being taken as 
clearly as possible. This helps not only to segment program 
functionality, but also to find the relevant code when enhancements need
 to be made. Consider the following code:</p>

<div style="padding-bottom:1em;"><pre style="font-size:11px; font-family: Courier">
foreach my $record (@customer_records) {
    $record->verify_eligibility();
    $record->process_enrollment();
    $record->check_for_and_process_errors();
    $record->finalize_and_archive();
}
</pre></div>

<p>It
 is very easy to follow exactly what&#8217;s going on there. No functionality 
is hidden behind obscure subroutine names, and each action being taken 
on a customer record is clearly defined. If business rules change, for 
example, requiring additional customer data before enrollment, it&#8217;s 
clear where something like 
$record->populate_secondary_customer_data() would have to go.</p>

<p>Making
 defined breaks like this also helps with proper code design -- you 
wouldn&#8217;t put archiving code in $record->process_enrollment() for 
example, but you might be inclined to do so if it were simply named 
"process."</p>

<p><h5>Write Code Comments for Non-Obvious Things</h5></p>
<p>Very
 often, as we&#8217;re in the thick of things, we write complex bits of code 
that seem perfectly fine, but turn out to be incomprehensible blocks of 
logic later. You should get into the habit of writing code comments for 
each logical chunk of code unless it&#8217;s painfully obvious what&#8217;s 
happening. Adding comments to your code costs nothing, but improves the 
readability of your code considerably. Don&#8217;t go overboard though. Too 
many comments will clutter your code, and can actually have a more 
serious drawback -- teaching your colleagues to ignore your comments. 
It&#8217;s sort of like the alarm that keeps going off and just gets 
acknowledged without being looked at. When developers are routinely 
forced to wade through lots of unhelpful comments, they&#8217;ll miss the good
 ones. Quality over quantity is the rule of thumb here.</p>

<p>You
 should definitely comment in situations that seem obvious but have 
their roots in non-obvious business logic. Yes, we can see that you&#8217;re 
adding a dollar to the transaction charge on Fridays. But why? Document 
that business logic; maybe the client has a specific need to do so that 
won&#8217;t be so obvious later on.</p>
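
<p>As a minimal sketch of what such a comment might look like, consider the following. The subroutine name, the day numbering, and the stated business reason are all invented for illustration; the point is only that the comment records the "why," not the "what."</p>

<div style="padding-bottom:1em;"><pre style="font-size:11px; font-family: Courier">
use strict;
use warnings;

# Hypothetical fee calculation, used only to show the shape of a useful comment.
# $day_of_week is assumed to run from 0 (Sunday) to 6 (Saturday).
sub transaction_charge {
    my ($base_charge, $day_of_week) = @_;

    # Business rule (invented for this example): the client's bank bills
    # extra for settlements held over the weekend, so Friday transactions
    # carry a one-dollar surcharge.
    my $friday_surcharge = $day_of_week == 5 ? 1.00 : 0;

    return $base_charge + $friday_surcharge;
}

my $charge = transaction_charge(20, 5);
print "$charge\n";    # prints 21
</pre></div>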

<p><h5>Know When To Write Many Smaller Statements Instead of One Huge One</h5></p>
<p>This
 one is a particular problem in languages like Perl that make it easy to
 chain many things together using implicit parameters into one 
monolithic statement. Consider the following code.</p>

<div style="padding-bottom:1em;"><pre style="font-size:11px; font-family: Courier">
my @result = grep { $_->{priority} > 2 }
map { +{ priority => get_priority($_->id()), customer => $_ } }
grep { $_->last_name() =~ /^H/ } @customers;
</pre></div>

<p>Basically,
 we&#8217;re looking for all customers with a last name beginning with H, jamming each one into a temporary struct along with its priority, and keeping those at priority level 3 or higher. This works great -- but the following 
is much easier to read:</p>

<div style="padding-bottom:1em;"><pre style="font-size:11px; font-family: Courier">
my @h_customers = subset_customers('last_name', 'H', @customers);
my @result = find_customers_at_priority(3, @h_customers);
</pre></div>

<p>Now
 you can hide the complex logic inside your utility functions, and the 
main program flow is much easier to follow. This is marginally slower to
 execute given we&#8217;re storing results and making subroutine calls instead
 of passing through transient loop variables, but the gain in 
readability outstrips the loss.</p>
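
<p>The two utility functions are only named above, so here is one way they might be written. This is a sketch under assumptions: the Customer class, the get_priority() lookup table, and the sample data are hypothetical stand-ins, and a real implementation would depend on how your customer objects are built.</p>

<div style="padding-bottom:1em;"><pre style="font-size:11px; font-family: Courier">
use strict;
use warnings;

# Hypothetical stand-in for the customer objects used in the examples.
package Customer;
sub new       { my ($class, %args) = @_; return bless {%args}, $class }
sub id        { return $_[0]->{id} }
sub last_name { return $_[0]->{last_name} }

package main;

# Hypothetical priority lookup; a real one might query a database.
my %priority_for = (1 => 3, 2 => 1, 3 => 5);
sub get_priority {
    my ($customer_id) = @_;
    return $priority_for{$customer_id} // 0;
}

# Keep customers whose accessor value starts with the given prefix.
sub subset_customers {
    my ($field, $prefix, @customers) = @_;
    return grep { index($_->$field(), $prefix) == 0 } @customers;
}

# Keep customers at or above the minimum priority level.
sub find_customers_at_priority {
    my ($minimum_priority, @customers) = @_;
    return grep { get_priority($_->id()) >= $minimum_priority } @customers;
}

my @customers = (
    Customer->new(id => 1, last_name => 'Hopper'),
    Customer->new(id => 2, last_name => 'Hamilton'),
    Customer->new(id => 3, last_name => 'Lovelace'),
);

my @h_customers = subset_customers('last_name', 'H', @customers);
my @result      = find_customers_at_priority(3, @h_customers);
my $names       = join(', ', map { $_->last_name() } @result);
print "$names\n";    # prints "Hopper"
</pre></div>

<p>Each helper does one thing and says so in its name, which keeps the calling code readable even when the filtering rules grow more complicated.</p>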

<p><h5>Why You Should Do All of This</h5></p>
<p>Writing
 readable code makes you a better team player. The next person who picks
 up your code will be able to tell what&#8217;s going on, how your design 
flows, and will know the reasons behind any tricky bits. You&#8217;ll reduce 
ramp-up time and the inevitable "what&#8217;s going on here" questions. 
Consider that if the next person on your project can&#8217;t figure out what&#8217;s
 going on, they&#8217;re probably going to call you and make you help. It&#8217;s in
 your best interests then to make sure that doesn&#8217;t happen.</p>

<p>And remember: the violent psychopath who comes along next just might be you.</p>]]></content:encoded>
            <pubDate>Wed, 16 May 2012 15:46:48 GMT</pubDate>
        </item>
        <item>
            <title>The Asymptote of DevOps Utopia</title>
            <link>http://omniti.com/seeds/the-asymptote-of-devops-utopia</link>
            <guid>http://omniti.com/seeds/the-asymptote-of-devops-utopia</guid>
            <description><![CDATA[Many technologists that I know are math nerds. I know I am. Just in case you don&#8217;t recall though, here&#8217;s an asymptote.



What does this graph have to do with DevOps? Well, I like to think about the pursuit of perfection of something as a p...]]></description>
            <content:encoded><![CDATA[<p>Many technologists that I know are math nerds. I know I am. Just in case you don&#8217;t recall though, here&#8217;s an asymptote.

<br /><br /><img border="0" src="http://images.omniti.net/omniti.com/i/b/parabola_full_with_dots.png" />

<p>What does this graph have to do with DevOps? Well, I like to think about the pursuit of perfection of something as a parabola and an asymptote. It&#8217;s impossible for the two to meet, just like it is impossible for you to reach perfection, aka, "DevOps Utopia." While we live in a finite world (like our little parabola) and can never reach perfection (again, like our little parabola), there is a positive side to all this. Let&#8217;s look at quadrant one of our graph.

<br /><br /><img border="0" src="http://images.omniti.net/omniti.com/i/b/parabola_quad_1_with_dots_and_labels.png" />

<p>The blue point: that&#8217;s your organization right now. The green point: that&#8217;s a good goal to reach. Surely, you do some "devops-y" things now, but what can we do to improve? Luckily, we can make significant progress when we move along that X axis in a positive direction.

<p><h5>So, What Is This DevOps Thing Anyway?</h5><br />

<p>In order to improve upon something, one must understand it. So, "What is DevOps?" is a great first question. I found some definitions by some really smart people.  These ideas are intertwined throughout this article, so here they are in all their independent glory.

<blockquote>
<p class="first">A Cultural and <span class="end-quote">Professional Movement.</span></p>
<p class="attribution">- Adam Jacob, Opscode</p>
</blockquote>
<blockquote><p class="first">Anything that makes interactions between development and <span class="end-quote">operations, better.</span></p>
<p class="attribution">- Thomas Limoncelli, Google</p></blockquote>

<p>I&#8217;ve noticed that a couple of interesting things are missing from these definitions. There is no mention of it being a new department or job title. Because it is a movement, many people rightfully identify with it; however, I don&#8217;t believe it&#8217;s "you." I view it as the next evolution along the path of "WebOps." I see it as "doing things in a DevOps fashion." WebOps 2.0, if you will.

<p>Well then, how do we learn how to do DevOps? "By their deeds you will know them." DevOps is something you do, personally and professionally. It is this pursuit that I consider here.

<p><h5>The Cultural Movement of DevOps</h5><br />

<p>There is clearly a culture behind DevOps. Some even call it the "Cult of DevOps." Whatever you call it, the cultural portion involves the interactions between you and your peers. Just like some forces in a galaxy far, far away, there are Light and Dark sides; the Inclusives and the Exclusives.

<p><b>The Inclusives.</b> These are the people that you find you want to be around. You all build cool stuff together and help each other out. These are the awesome people who have a passion for what they do; they love to show you and tell you all about it, given a chance. Think about the people you can&#8217;t wait to tell when you have a cool, new project. Chances are, these people are Inclusives.

<p><b>The Exclusives.</b> These people are on the complete opposite side of the spectrum. No one likes to be around them. If you dare share your passion with them, don&#8217;t expect much. They attack you if your choice of tool isn&#8217;t the same as theirs. They fight you if you dare have an opinion of your own, and they are the grumpy "Us vs. Them" bastards. They might be chronic assholes.

<p>It&#8217;s worth mentioning that my first draft of "The Asymptote of DevOps Utopia" pointed to the "No Asshole Rule" as what ultimately distinguished the Inclusives from the Exclusives. I&#8217;ve come to the realization (after much discussion) that a "No Asshole Rule" is simply not enough. It&#8217;s like Google&#8217;s "Don&#8217;t Be Evil" mantra. Everyone has their own definition of what makes an asshole. Therefore, I propose the "Be Fucking Nice" rule instead. Common courtesy, respect and leniency for others, especially those different from you, are things we should all strive toward. The DevOps meetups I&#8217;ve attended always feel like extended family reunions. Extend that same respect you show your meetup members to your coworkers and I&#8217;m betting it will go a long way, indeed.

<p><h5>The Professional Movement of DevOps</h5><br />

<p><b style="font-size:110%;">Organizational Structure</b>

<p>We&#8217;ve all heard about the departmental silos that exist in many organizations. You&#8217;ve got your sysadmins on one team, the developers on another, security and network on another and designers on yet another. Requests have to filter up the chain, through the director that runs the silos, to a VP or other director, who then forwards the requests down to the appropriate team, until they reach someone who actually does the work. This is woefully inefficient and it&#8217;s where we get the "throw it over the wall" cliche. What can we do? Enter, the "Happy Fun Pile."

<p>No, the "Happy Fun Pile" isn&#8217;t a giant, adult-sized ball pit (though I hear some companies have those). The Happy Fun Pile is where you get everyone working together. It&#8217;s a really simple concept, though many companies seem to have trouble embracing it. One misconception is that, without silos, you wouldn&#8217;t need directors. Directors are still needed, but their job is no longer to funnel communications and requests between teams. Their job is to make sure the team has whatever is necessary to get things done and work together well.

<p>So, how does the ambitious director facilitate the Happy Fun Pile? Here are three things to start with (though I&#8217;m sure you can find <b>many</b> more):

<ul>
<li>Optimize for serendipitous interactions</li>
<li>Embrace asynchronous communication</li>
<li>Fix your broken meetings (hint: most of them are broken)</li>
</ul>

<p><b>Optimizing for serendipitous interactions</b> is like fluid physics for ideas. Set up physical work spaces in such a way that people can easily interact: if two people are discussing their ideas, then others can join in. Of course, the culture must support this, but it can help people refine their ideas. Exactly how you do this with your space is up to you. Maybe clusters of desks throughout the room. Maybe one big table with everyone gathered around it. However you do it, mix everyone together and let ideas collide like atoms in a fluid.

<p><b>Asynchronous communications</b> allow collaboration to occur when people "feel like it." Think back to when you did your best work, when you happened upon your "eureka" moments; they probably didn&#8217;t happen at the same time as everyone else&#8217;s moments. Perhaps you were pondering your ideas over a cup of coffee in the early morning. Maybe it was in a late night hacking session. Chances are, you&#8217;ve had more inspired moments away from your desk or outside of meetings than during them. How can your organization leverage these bursts of inspired thinking? Asynchronous communication. Email and chat programs that feature a browsable, searchable history can act as a collaborative "Commonplace Book." This can allow asynchronous work schedules to succeed as well, though that is a topic that warrants its own, dedicated article (or series of articles).

<p><b>Fix your broken meetings.</b> Chances are, most of your meetings are broken. Personally, I think meetings suck. Most of the technical people I know work on what&#8217;s called the "maker&#8217;s schedule." The type of work they do typically involves throwing the entirety of their cognitive abilities at a hard problem over a significant period of time. This clashes with the "manager&#8217;s schedule," upon which most managers operate. The "manager&#8217;s schedule" is best visualized with a day planner. Convenient slots of time every half hour or so, where you can write in what you will be doing. Because of the nature of their work, many managers already have their days sharded into little chunks. Penciling in an afternoon meeting is no problem.

<p>Unfortunately, many businesses attempt to force their makers to operate on a manager&#8217;s schedule too. Little meetings scheduled in the middle of the day are a bane to the maker&#8217;s productivity. The positive side of this is that it&#8217;s easy to change. Here are a few tests you can run before scheduling a meeting, to see if its impact is truly justified:

<ul>
    <b style="font-size:95%;">Calculate and add the following:</b>
    <li>Is the meeting value > human-hours?</li>
        <ul style="padding: 0px;">
            <li>(Y) meeting_value/(#_attending * rate * length_of_meeting); (N) -1</li>
        </ul>
    <li>Is the meeting at the beginning or end of the day? (Y) +1; (N) -1</li>
    <li>Will there be real food provided? (Y) +1; (N) -1</li>
    <li>Is attendance completely optional? (Y) +1; (N) -1</li>
    <li>Is this the most efficient way to convey the information? (Y) +1; (N) -1</li>
        <ul style="padding: 0px;">
            <li>Perhaps, weekly recaps emailed from each team member instead.</li>
            <li>Leverage asynchronous chat logs.</li>
        </ul>
    <li>Is this primarily a social event? (Y) +1; (N) -1</li>
</ul>
<p>Add together your values for each portion. If the value is positive, schedule the meeting. If it&#8217;s negative, don&#8217;t schedule it. Just be honest with your answers.
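
<p>For the arithmetically inclined, the checklist above can be sketched as a small scoring function. The argument names are made up for this example, "rate" is assumed to be the cost per person-hour, the meeting length is assumed to be in hours, and each yes/no answer is passed as 1 or 0. The attendance cost is assumed to be greater than zero.</p>

<div style="padding-bottom:1em;"><pre style="font-size:11px; font-family: Courier">
use strict;
use warnings;

# Score a proposed meeting using the checklist above.
# All hash keys are hypothetical names chosen for this sketch.
sub meeting_score {
    my (%meeting) = @_;

    # Human-hours cost of getting everyone in the room.
    my $cost = $meeting{attendees} * $meeting{rate} * $meeting{hours};

    # First question: the ratio if the value beats the cost, otherwise -1.
    my $score = $meeting{value} > $cost ? $meeting{value} / $cost : -1;

    # Remaining questions: +1 for yes, -1 for no.
    for my $question (qw(start_or_end_of_day real_food optional most_efficient social)) {
        $score += $meeting{$question} ? 1 : -1;
    }
    return $score;
}

my $score = meeting_score(
    value               => 2000,
    attendees           => 5,
    rate                => 100,
    hours               => 1,
    start_or_end_of_day => 1,
    real_food           => 0,
    optional            => 1,
    most_efficient      => 0,
    social              => 0,
);

# 2000 / 500 = 4, then +1 -1 +1 -1 -1, for a total of 3.
print $score > 0 ? "Schedule it\n" : "Skip it\n";
</pre></div>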

<p><b style="font-size:110%;">Operations ALL The Things!</b>

<p>Theo Schlossnagle mentions "*OPs" in his "<a href="http://youtu.be/LAP1zaXUvAE" target="_blank"><span>Web Operations as a Career</span></a>" talk. In it, he explains how you must integrate the operational mindset into every part of your business. Operations is very broad and covers many things, so I&#8217;ll be focusing on "technical operations," like we see in the typical sysadmin career.

<p>Technical operations is responsible for two primary things: system availability and efficiency. System availability is the easier half of the equation and comes down to troubleshooting skills. "Is it down? Get it back up!" Efficiency, though, is much more difficult. When I say efficiency, I&#8217;m not talking about the efficiency of your servers. Rather, I&#8217;m talking about the efficiency of everyone else in your organization. How can you achieve efficiency? Three main things: set standards, enable everyone, and be the fire marshal.

<p><b>Set standards</b> and ensure they are followed. Of course, it is your responsibility to make sure that the standards are highly efficient. An example of an efficient standard might be, "All new server instances must have LDAP credentials and SSH keys set up for all sysadmins and the dev teams that need access to that machine, within 5 minutes of creation." Design a process to accomplish this task (such as through automation, in this example), audit and test your standard process in different cases, and verify it works. Then ensure the standard is upheld.

<p><b>Enabling everyone</b> helps maintain productivity in an obvious way. If people are blocked from doing what they need to be doing, then they can&#8217;t get it done. It&#8217;s not rocket science. Let&#8217;s say a developer comes to you and says "I need a bunch of hardware in the datacenter by the end of the week. I&#8217;ll need it set up to go to production by the following Monday." There are (at least) two ways to handle this. First, you could say, "Sorry mate! I simply can&#8217;t get that done. I can order the hardware and have it express shipped to the datacenter, but there&#8217;s simply no way for me to get that done in time." Here, the developer came to you with a problem because he needed it solved and you&#8217;re the guy who comes to mind. Now, you are turning him down. Perhaps, instead, the conversation could go like this, "Well, I can get the hardware here on express shipping. Tomorrow, while you and your dev team read up on the standards I wrote, I&#8217;ll go rent a bus. Then you, your team and I will go to the datacenter and get this done!" Chances are, your developer will have a change of heart and realize that it&#8217;s not that important. On the other hand, if he says OK, well. . .you get to drive a bus! Also, this lets you push responsibility to the edges.

<p>Being the <b>Fire Marshal</b> is cool. You get to put stickers on stuff and tell people, "please don&#8217;t do that, lest you conflagrate." You get to be like that, but with (hopefully) no burning stuff. Many people consider operations as a sort of "firefighting." Really, though, good operations isn&#8217;t about individual heroics that save the day, and if you rely on that to keep your systems up, then I&#8217;m sorry that your ops team hates their job. Instead, you want to have your ops team be fire marshals. You see, fire marshals set standards. They are also responsible for drilling for disaster. They test procedures to ensure that they work. You can do this too. Break production infrastructure and test your disaster recovery systems. Validate your systems against a "tenth floor test." By doing these things, you prepare people for the worst. Remember, being the fire marshal is cool, and makes your life and organization that much better.

<p><b style="font-size:110%;">Site Reliability Engineering</b>

<p>"Site Reliability Engineering" (SRE) is something you hear more and more these days. It often feels like the elusive "DevOps," in that, when I talk to people about it, no one seems to share the same definition. Personally, I view it as the sophisticated name for WebOps. Let&#8217;s break it down.

<p>With site reliability engineering, we have two goals: <b>high velocity</b> and <b>extreme reliability</b>. High velocity means rapid growth in a positive direction and extreme reliability means record-breaking uptime. These embody the core of web operations too. Shall we go further? Site. Reliability. Engineering.

<p>Your <b>Site</b> is your. . .SaaS; web app; website; service. Whatever it is, this is the objective toward which you direct your efforts. <b>Reliability</b> means that your site is consistently available, operable and fast. Not just fast as in speed, but velocity. We must focus on speed in a positive direction, rather than speed for speed&#8217;s sake. <b>Engineering</b> seeks to build things that make life better.

<p><b style="font-size:110%;">Achieving Reliability</b>

<p>Three key tenets of reliability: Reliability Budgets, Operable Code, and Monitoring. Set standards for these and your site will show it.

<p><b>Reliability budgets</b> are based on your SLA, typically your quarterly SLA. Collect metrics that measure the availability of your site and report on its uptime. This should be reported automatically and visible to anyone in your organization. When it comes time for a new deployment, use a "canary system" to test, deploy and potentially roll back. Automated push and roll back cannot be stressed enough here. Use it to upgrade a single machine and test. If it&#8217;s still good, keep gradually rolling it out until you hit a threshold and upgrade everything. (The threshold is up to you.) If (when) things go bad, roll back. It&#8217;s called a canary system for a reason. It exists to tell you when things are going bad. After a failed deploy, the reliability budget is recalculated and you can try again. The only time you can&#8217;t keep pushing is if you have already reached the limit set in your SLA.
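
<p>As a rough sketch of how the budget check and the canary rollout might fit together, consider the following. Everything here is an assumption made for illustration: the subroutine names, the doubling batch size, the threshold value, and the deploy, health-check and rollback callbacks, which a real system would replace with its own tooling.</p>

<div style="padding-bottom:1em;"><pre style="font-size:11px; font-family: Courier">
use strict;
use warnings;

# Minutes of downtime the SLA still allows this quarter.
sub remaining_budget {
    my ($sla_percent, $minutes_in_quarter, $downtime_minutes) = @_;
    my $allowed_minutes = $minutes_in_quarter * (100 - $sla_percent) / 100;
    return $allowed_minutes - $downtime_minutes;
}

# Roll out in growing batches, starting with a single canary host.
# Once $threshold hosts are upgraded and healthy, upgrade everything left.
# On the first failed health check, roll back every host touched so far.
# $deploy, $healthy and $rollback are hypothetical callbacks.
sub canary_rollout {
    my ($hosts, $threshold, $deploy, $healthy, $rollback) = @_;
    my @pending = @$hosts;
    my @upgraded;
    my $batch_size = 1;

    while (@pending) {
        my @batch = splice(@pending, 0, $batch_size);
        $deploy->($_) for @batch;
        push @upgraded, @batch;

        if (grep { !$healthy->($_) } @batch) {
            $rollback->($_) for @upgraded;
            return 0;
        }
        $batch_size = @upgraded >= $threshold ? scalar(@pending) : $batch_size * 2;
    }
    return 1;
}

my $minutes_in_quarter = 90 * 24 * 60;
if (remaining_budget(99.9, $minutes_in_quarter, 40) > 0) {
    my $succeeded = canary_rollout(
        [qw(web1 web2 web3 web4 web5 web6 web7 web8)],
        3,
        sub { print "deploying to $_[0]\n" },
        sub { return 1 },    # stand-in health check that always passes
        sub { print "rolling back $_[0]\n" },
    );
    print $succeeded ? "rollout complete\n" : "rolled back\n";
}
</pre></div>

<p>Because the decision to push is a comparison of two numbers, nobody has to argue about whether a deploy "feels" safe.</p>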

<p>The key with reliability budgets is that they&#8217;re entirely numbers driven. This helps remove personal bias from the process of deployment. Hopefully, such a system will push the team toward operable code. What is operable code though?

<p><b>Operable code</b> fails gracefully. It reports useful error codes. It has solid and useful documentation. Honestly, if you don&#8217;t document your code, who will? There are many different ways to implement documentation. Whatever you decide, just get with it and stick with it.

<p><b>Monitoring</b> has obvious benefits. You obviously want to know if your site is up and operational. Reliability budgets rely on monitoring. However, your monitoring needs to be more than, "Yup, site is there." You should monitor everything you can. The only thing I will caution here is that, while monitoring everything is good, be selective of what you alert and trend on. If you are watching a particular metric for trending, make sure you can tie it to direct business impact. An example that comes to mind for me is the "average load time." While this is important, do not be distracted by it. Let&#8217;s say you push new code out to your website. You&#8217;ve done your tuning and got everything set. You test it and find that your average load time went from 600ms to 625ms. What if you have a really busy site with millions of users? You probably don&#8217;t flinch at 25ms in difference. However, what if the thing that caused your spike wasn&#8217;t a slight increase in overall average, but a significant increase in outlying cases? What if, for 10% of your users, it&#8217;s actually taking more than a second to load now? Because you get such high traffic on your site, you don&#8217;t even realize it. This is where focusing on the wrong metric can blind you to real problems. Alerts are much simpler. "Can this metric be tied directly to significant financial impact?" If yes, go ahead and page for it at 3 a.m. If not, no one cares that much. No one. If they tell you they do. . .they lie. (You can have them paged instead!)
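
<p>To make the average-versus-outliers point concrete, here is a small sketch with made-up load times chosen to match the numbers above. The subroutine names and sample data are hypothetical. The average moves by only 25ms, while the share of requests slower than one second jumps from none to one in ten.</p>

<div style="padding-bottom:1em;"><pre style="font-size:11px; font-family: Courier">
use strict;
use warnings;
use List::Util qw(sum);

# Average of a list of load times, in milliseconds.
sub mean_ms {
    my (@load_times) = @_;
    return sum(@load_times) / scalar(@load_times);
}

# Fraction of load times slower than a threshold, in milliseconds.
sub fraction_slower_than {
    my ($threshold_ms, @load_times) = @_;
    my $slow_count = grep { $_ > $threshold_ms } @load_times;
    return $slow_count / scalar(@load_times);
}

# Invented samples: every request took 600ms before the push; afterwards
# most requests got slightly faster but one in ten takes over a second.
my @before = (600) x 100;
my @after  = ((580) x 90, (1030) x 10);

printf "before: mean %.0f ms, %.0f%% over 1s\n",
    mean_ms(@before), 100 * fraction_slower_than(1000, @before);
printf "after:  mean %.0f ms, %.0f%% over 1s\n",
    mean_ms(@after), 100 * fraction_slower_than(1000, @after);
</pre></div>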

<p><b style="font-size:110%;">Engineering (aka, Building Cool Stuff)</b>

<p>One of the responsibilities of your operations team is to build and/or implement tools to make other people more efficient and make their lives better. Things like a canary system or automation tools fall under this category. One important point is that there should be a "self service portal" for developers so that they can access SRE knowledge bases. It should also let developers request new server instances, implement monitoring and dashboards, troubleshoot problems and prepare for launch readiness reviews. All of this should be possible without the assistance of a sysadmin. "Push Button/Receive Server" should be the design objective of this portal, literally. After pushing that button, developers should get an email with login information, with a dashboard set up and monitoring automatically in place.

<p>This sort of system makes life much easier when testing code. Instead of the hacked-together blob that is the average developer&#8217;s laptop, you get a known-state system that is identical to what will be faced in production. You&#8217;ll hear no more, "Well, it worked on my system/laptop/workstation."

<p>The documentation available on the portal should include articles, videos and how-to sessions. SRE "open office hours" should be posted so that developers can ask questions of the SREs. Ultimately, the goal is to build up the skills of the developers, so that they can be self-supporting. This way, those issues that actually reach the SREs are due to a true, deeper problem.

<p>Overall, the portal acts as a workforce multiplier. A small SRE team can support many, many developers. You achieve this via automation. SREs touching each and every server is a system that does not scale. Don&#8217;t consider such a situation acceptable. The portal, being such a powerful workforce multiplier, enables our next point.

<p><b style="font-size:110%;">Dedicated SRE Team Support</b>

<p>Sometimes, projects require dedicated SRE team support. Empowering the developers to run all their own stuff only goes so far. So, let&#8217;s lay down some requirements. For something to be eligible, it must be of high importance to the company, have a low operational burden, and pass a hand-off readiness review. Perhaps there are regulations like Sarbanes-Oxley. Of course, above all other requirements is SRE availability.

<p>The hand-off readiness review is paramount here. This checks to see if the project is operable. The volume of alerts is checked first: a high volume would indicate that something is broken in the underlying system, and that needs to be worked out before anything else. Next come monitoring, system architecture and release management. These all revolve around reliability and scalability. Outstanding bugs and a general review of "production hygiene" complete our set. These aim to ensure that there are no underlying issues that may have crippling effects. The review is done with the development team and a couple of SREs. The most successful teams, in order to pass the review with flying colors, work with the SREs during office hours and request consultations as needed, before the review. Such consultations should be accommodated to the best of the SRE team&#8217;s ability.

<p>Once an SRE team takes on dedicated support, the development team is kept up to date and regular communication still occurs. If the system starts to deteriorate (say, crappy code is wrecking things), then the SRE team can&#8202;&#8212;&#8202;and should&#8202;&#8212;&#8202;hand back the operations of the site to the developers. This will allow the developers to fix the code issues and clear things up. Then another, quicker review is done and the SRE team resumes operational support.

<p><h5>Benefits for All!</h5><br />

<p>What are the benefits of making the movement toward "DevOps Utopia"?

<p><b style="font-size:110%;">For SREs</b></p>
<ul>
<li>Developers are committed to fixing issues</li>
<li>SREs are neither expected nor required to support substandard services</li>
<li>SREs can say "yes" to change, yet have a way to encourage stability</li>
</ul>

<p><b style="font-size:110%;">For Developers</b></p>
<ul>
<li>Future designs will reflect the knowledge and experience gained from running their own infrastructure</li>
<li>Access to SRE knowledge, monitoring and tools allows them to do their jobs more efficiently</li>
<li>Developers know what to expect when it comes to deployment and working with SREs</li>
</ul>

<p><b style="font-size:110%;">For Both</b></p>
<ul>
<li>The adversarial relationship that can exist between developers and sysadmins is eliminated</li>
<li>It makes life better for everyone</li>
</ul>

<p>Now, I know there are many other things you can do to move toward "DevOps Utopia," but I hope this gives you a starting point. If it seems overwhelming, pick one point and start on it today. Within a few weeks, it will be second nature and you can work on the next point. All that matters is the positive movement forward. Remember the parabola? Even the small movements make significant differences.

<p>Once you feel you have reached the green point on the graph, congratulations! Now, step back and observe. What else can you do? Challenge yourself. Put your newly reformed organization on the red dot again. How do you get to the green dot? While "DevOps Utopia" is ultimately unobtainable, keep going and you will find you can get pretty damn close.


<br /><br />
=====

<p>Further study and commentary over on my blog, <a href="http://www.liberumvir.com" target="_blank"><span>liberumvir.com</span></a>.

<br />
<p>Thanks and recognition go out to <a href="http://www.opscode.com/blog/" target="_blank"><span>Adam Jacob</span></a>, <a href="http://paulgraham.com/" target="_blank"><span>Paul Graham</span></a>, <a href="http://lethargy.org/~jesus/" target="_blank"><span>Theo Schlossnagle</span></a>, <a href="http://everythingsysadmin.com/" target="_blank"><span>Thomas Limoncelli</span></a>, <a href="http://tom.preston-werner.com/" target="_blank"><span>Tom Preston-Werner</span></a>, and <a href="http://zachholman.com" target="_blank"><span>Zach Holman</span></a> for all their great articles and talks which inspire me daily.]]></content:encoded>
            <pubDate>Wed, 28 Mar 2012 18:38:14 GMT</pubDate>
        </item>
        <item>
            <title>Your Code May Be Elegant</title>
            <link>http://omniti.com/seeds/your-code-may-be-elegant</link>
            <guid>http://omniti.com/seeds/your-code-may-be-elegant</guid>
            <description><![CDATA[
There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies and the other way is to make it so complicated that there are no obvious deficiencies.

                 - C.A.R. Hoare, The...]]></description>
            <content:encoded><![CDATA[<blockquote>
<p>There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies and the other way is to make it so complicated that there are no obvious deficiencies.<br>

                 <span style="text-align: right;">- C.A.R. Hoare, The 1980 ACM Turing Award Lecture</span></p>
</blockquote>

<p>I often get criticized for my mantra toward the development approach.  <em>Your code may be elegant, but mine f***ing works</em>. In response, I hear statements ranging anywhere from "You don&#8217;t understand best practices" to "You hate testing!" In an effort to avoid repeating myself on a regular basis, I decided to write down my point of view on the topic. Adhere to it or not&#8202;&#8212;&#8202;your choice.</p>

<p>First of all, let me make a blanket statement here: the sentence "the project may be late &mdash; but the code [looks better/is easier to maintain/is cleaner]" (pick any that apply) is inherently flawed. If the project is late, it&#8217;s not done. Period. There is no way to justify being late by using code quality/elegance as an argument. If the client needs a Christmas promotion, and you deliver the best product in the history of promotions&#8202;&#8212;&#8202;on December 29th&#8202;&#8212;&#8202;it&#8217;s worthless.</p> 

<p>Now, let&#8217;s address the "best practices" argument, which implies that in order to produce more maintainable code one needs to take more time. I will keep using the phrase "best practices" in quotes because standards vary across the board (despite some common misconceptions), outside of very common Best Practices 101 that every programmer should have imprinted in their brain before the first "Hello World" is written. With that said, "best practices," however you define them, should be a natural coding standard for any decent developer. If you need to bake more time into the project timeline to make your code comply, you are, at best, new to the programming scene. To give a trivialized example, any seasoned programmer should instinctively know not to call the variables <code>$a</code>, <code>$b</code>, <code>$c</code>, etc., and should properly indent code lines. Granted, there are more advanced "best practice" standards that one might mention, but the point stands: "best practices" is not a good excuse for exceeding your development timeline. And taking it a step further, a veteran programmer should know when and, most importantly, <em>how</em> to cut corners, if needed, to meet the deadline. Which brings me to my next point: over-engineering.</p>

<p>Like any experienced engineer, I understand the desire to build the best, most flexible and robust system for every project. I do. But I also understand the common business constraints of every project: time and money. Most projects have a definite deadline and/or a specific budget that must be met and, oftentimes, building something grand is just not feasible within either of them. This is where the developer must make a conscious decision to limit creativity to meet the goals. There is no excuse for spending a week to set up a "proper" caching layer for database queries on a 20-row table that is only used from the administrative panel by three publishers. Understand the use cases. As cool as it may be to build a flexible and expandable XHR framework to support variable simultaneous requests, you don&#8217;t need to invest in it if the only feature that will be using it is an update to a visitor counter on one page. Understand the scope. I cannot stress it enough: a good engineer is not the one who knows how to build the most advanced system, but the one who knows when <i>not to</i> build that system.</p>

<p>In the world of software development, time-to-market is the driving force of the business. In the world of Internet application development, it&#8217;s even more apparent because of the dynamics of the web. When time is of the essence, the simplest solution is not <i>often</i> the best solution; it is <i>always</i> the best solution. And this brings us to our final point of controversy: technical debt.</p>

<p>I often hear the argument that cutting corners anywhere in the development process will accrue irreconcilable technical debt in the long-term, and the cost of that debt should be heavily factored in when making any decision in the process. In reality, this argument supports my point of view. This is exactly the reason the ability to assess when and how to cut corners is crucial when working on deadline-driven projects. There are different types of "technical debt", and the quickest solution, given some thought behind it, will not necessarily add to your technical debt. Similarly, over-engineering will not make you debt-free. The ability to make those decisions, often mid-project, is what separates veterans from rookies.</p>

<p>Additionally, people who speak to the dangers of technical debt oftentimes do not take the business implications into account. Technical debt should be weighed against the actual ROI, because in many cases it is more cost effective to launch early. This way, though you may be accruing technical debt, you are also accruing revenue immediately, and you can reconcile the debt over time. This may be preferable to delaying your time to market so that you end up (arguably) debt-free in tech, but losing market opportunities and/or an immediate revenue opportunity which may be far larger over time than the cost of technical debt itself.</p>

<p>As software developers, we often think our job is to develop software, but, really, that is just the means to an end, and the end is to empower businesses to reach their goals. Your code may be elegant, but if it doesn&#8217;t meet the objectives (be they time or business) it doesn&#8217;t f***ing work.</p>]]></content:encoded>
            <pubDate>Tue, 28 Feb 2012 19:05:21 GMT</pubDate>
        </item>
        <item>
            <title>Bending Forms to Your Will</title>
            <link>http://omniti.com/seeds/bending-forms-to-your-will</link>
            <guid>http://omniti.com/seeds/bending-forms-to-your-will</guid>
            <description><![CDATA[If you&#8217;re developing an application that requires a lot of forms, you&#8217;ll often find yourself repeating the same code. There are good reasons to avoid using a form package: loading a bunch of classes and running through layers of validation ...]]></description>
            <content:encoded><![CDATA[<p>If you&#8217;re developing an application that requires a lot of forms, you&#8217;ll often find yourself repeating the same code. There are good reasons to avoid using a form package: loading a bunch of classes and running through layers of validation will never be as fast as just working with markup and post values.&nbsp; But if your project doesn&#8217;t have to be incredibly fast, and you&#8217;re already taking the standard precautions against resource drain (smart caching, efficient class loading, minimal bootstrapping), a form package can save you coding time, or help you standardize forms across an application without relying on copy/paste.</p>

<p>Once you&#8217;ve decided to use a form package, you&#8217;ll have to figure out which one is best for the task at hand. As with so many other programming decisions, rolling your own is the way to ruthlessly optimize for the things you care about, but it requires the kind of time investment that might not be feasible. Choosing an existing package may be as simple as noticing one that&#8217;s part of a library you&#8217;re already using, or it might take some extensive research on what&#8217;s out there and what the strengths are for each.</p>

<p>For the purposes of this article, we&#8217;ll be focusing upon the two big tasks of form building: display and validation. The examples will use Zend_Form, which is large and complex enough to include most of the common ways of doing each. And we&#8217;ll talk about custom elements, which can be incredibly useful and require attention to both.</p>

<h3>Views</h3>

<p>One important question to ask yourself when choosing or building a form package is how it&#8217;s going to fit into the rest of your architecture, and into the rhythms of development for the project.</p>

<p>Most packages offer you the option of auto-building the form markup from the element objects you&#8217;ve added. This has a lot of appeal for back-end developers, for whom writing markup is the least interesting part of the endeavor, and you can get a lot of mileage out of them on plenty of projects.</p>

<p>An administrative back-end is ideal for this kind of use. It&#8217;s used only by people intimately familiar with the terminology you&#8217;ll be using for labels, so you&#8217;re unlikely to have to do much UI engineering that would require fine-grained control over the layout. Admin areas tend to have a lot of forms that should all share a look and feel. Using auto-build views means you can describe the look in your base class&#8202;&#8212;&#8202;any tweaks to the default view can be applied to every form.</p>

<p>You can also make use of auto-build views for a project where you must do demos while you&#8217;re working on the underlying API: people (particularly people without programming knowledge, but sometimes even programmers do it, too) instinctively associate the design polish of a form with the state of completion of the code underneath it. The raw look of default forms can work in your favor to remind everyone that your API is still under development.</p>

<p>There are, however, good reasons not to let it persist into your production code.  If you&#8217;ve inherited a project that uses a form package in this way within an MVC framework, you&#8217;ll find view blocks that look something like this:</p>

<pre>
<code>&lt;div id="some-form"&gt;
    &lt;?php echo $this-&gt;form-&gt;render(); ?&gt;
&lt;/div&gt;</code>
</pre>

<p>This is not precisely helpful when you or your front-end developer have been asked to add an image next to the CVV field. Where is all the markup? How do you know where to add it? Instead, you need to find your way into the form object to have a look at some abstracted markup:</p>

<pre>
<code>// build the CVV element and attach its validation and display metadata
$cvv = new Zend_Form_Element_Text('cvv');
$cvv-&gt;addValidator(new My_Custom_CVV_Validator());
$cvv-&gt;setDescription('A 3 or 4 digit number located on the back of the card.');
$cvv-&gt;setLabel('CVV/Security Code');
// wrap the rendered element in a span with class "short-entry"
$cvv-&gt;addDecorator('HtmlTag', array('tag' =&gt; 'span', 'class' =&gt; 'short-entry'));
$this-&gt;addElement($cvv);
// group the card fields (each one must already be added to the form)
$this-&gt;addDisplayGroup(
   array('card_name','card_type','card_num','exp_month','exp_year','cvv'),
   'credit_card',
   array('displayGroupClass' =&gt; 'card')
   );</code>
</pre>

<p>This sort of complexity has a tendency to accumulate over time, too. Before long, you may find yourself with code that breaks a form into columns, wraps tags around elements or just shoehorns blocks of markup in wherever they&#8217;ll fit. At that point, I&#8217;d recommend sleeping in locked rooms, because your front-end developer owns several large kitchen knives and knows where you live.</p>

<p>How, then, do you balance your client&#8217;s need for you to bring up forms 
quickly with your front-end developer&#8217;s need to have a form&#8217;s markup all
 in one file?</p>

<p>Personally, I like to keep the overall layout in the view with the important parts plugged in. For example:</p>

<pre>
<code>&lt;form id="new_widget"
    method="&lt;?php echo $this-&gt;form-&gt;getMethod(); ?&gt;"
    action="&lt;?php echo $this-&gt;form-&gt;getAction(); ?&gt;"
    enctype="application/x-www-form-urlencoded"&lt;?php
    if ($this-&gt;form-&gt;isErrors()) { echo ' class="errors"'; } ?&gt;&gt;
&lt;fieldset class="column"&gt;
    &lt;legend&gt;Details&lt;/legend&gt;
    &lt;ul class="elements"&gt;
        &lt;li id="elem_title"&gt;
        &lt;label for="title"&gt;Title:&lt;/label&gt;
        &lt;p class="note"&gt;(will be used to build the url slug)&lt;/p&gt;
        &lt;?php echo $this-&gt;form-&gt;title-&gt;renderViewHelper(); ?&gt;
        &lt;?php echo $this-&gt;form-&gt;title-&gt;renderErrors(); ?&gt;
    &lt;/li&gt;</code>
</pre>

<p>. . .and so on. You can see how easy it would be to loop through the elements 
rather than grabbing them individually, if you were so inclined.</p>

<h3>Validation</h3>

<p>Let&#8217;s consider a simple scenario: you must validate a postal code based on country: U.S. against <code>/^\d{5}(-\d{4})?$/</code>, CA against <code>/^[ABCEGHJKLMNPRSTVXY]\d[A-Z] \d[A-Z]\d$/</code>, etc. There are five common places to handle validation:</p>

<ol>
<li>Using validators attached to the elements</li>
<li>Using a central method for the form</li>
<li>Outside of the form class entirely, in the controller code</li>
<li>Using JavaScript automatically built by the form package</li>
<li>Using JavaScript elsewhere (e.g., at the bottom of the view)</li>
</ol>

<p>Element-level validation is the way to go if your country is coming from somewhere outside the form (if, for example, you&#8217;re using country code subdomains). You can have your form class accept a country code on build, and set the appropriate regex for your element when you add it. (Most packages come with a regex validator, and it&#8217;s a simple one to write if you&#8217;re rolling your own.)</p>

<p>But what if you&#8217;re getting the country from a select box? You&#8217;ll need a set of regexes and a way to switch between them based upon the value of the select. For this, you&#8217;ll need custom validation. Some packages have a built-in "callback" option for element validation, while others use custom classes. Zend Form sends the whole form values array to its validation classes as <code>$context</code>, but this is relatively uncommon. Most packages require you to fall back to the central validation method instead.</p>
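
<p>Whatever package you use, the switch itself boils down to a lookup table keyed by country. Here is a minimal sketch of that logic, written in JavaScript for brevity (on the server the same idea would live in your custom validator). <code>POSTAL_PATTERNS</code> and <code>isValidPostalCode</code> are names made up for this example, and only the two patterns quoted above are included.</p>

<pre>
<code>// Hypothetical lookup table: one regex per supported country code.
var POSTAL_PATTERNS = {
    US: /^\d{5}(-\d{4})?$/,
    CA: /^[ABCEGHJKLMNPRSTVXY]\d[A-Z] \d[A-Z]\d$/
};

// Returns true only when the country is known and the code matches its pattern.
function isValidPostalCode(country, code) {
    if (!Object.prototype.hasOwnProperty.call(POSTAL_PATTERNS, country)) {
        return false; // unknown country: reject rather than guess
    }
    return POSTAL_PATTERNS[country].test(code);
}

isValidPostalCode('US', '12345-6789'); // true
isValidPostalCode('CA', 'K1A 0B1');    // true
isValidPostalCode('CA', '12345');      // false</code>
</pre>

<p>Rejecting unknown countries is a design choice for the sketch. You might prefer to skip the format check for countries you have no pattern for.</p>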

<p>Centralized validation is typically implemented as a method you can extend (the parent version runs all of the element-level validation) to include rules that apply to the form as a whole. You can use it to switch out which fields are required, too, if the package allows you to update that information on the fly. The main disadvantage to making this your primary method of customized validation is that it&#8217;s not very re-use friendly. The logic to test a postal code based upon country will have to be copied between forms, or made into a function or class that can be included. (This is another advantage of Zend Form: the validators extend Zend_Validate rather than anything in the Form package, so they can be used on non-form data, without requiring any of the Zend Form classes.)</p>

<p>For obvious reasons, breaking out of the form to do validation outside of it isn&#8217;t ideal. If it ends up there, it should be an indicator that your package is too hard to use.</p>

<p>Auto-generated JavaScript is often more trouble than it&#8217;s worth: you don&#8217;t usually have much control over how the error messages are displayed, and just like auto-build views, it makes front-end developers sad. Worst of all, it doesn&#8217;t save you much time. If you&#8217;re only using the validators that are already part of the package, it&#8217;s relatively painless, but any custom validation has to be written twice: once for the back end and once for the front end. That gets old pretty fast.</p>

<p>You can get most of the benefit of automatic JavaScript by using an AJAX call on submit, but of course it only saves you the difference between the cost of a page load and the cost of an AJAX call. You don&#8217;t get the benefit of not contacting the server at all, but it&#8217;s usually sufficient for most forms. Using jQuery as an example:</p>

<pre>
<code>var formvals = {}; // grab these with .val() as necessary
$('#myform').submit(function (e) {
    e.preventDefault();
    $.post({
        url: '/path/to/script',
        data: formvals,
        error: function (ret) { showGenericError(ret); },
        success: function (ret) {
            if (ret.success) {
                // go to the next page, show an message, etc
            } else if ('messages&#8217; in ret) {
                // show them on the form
            } else {
                showGenericError(ret);
            }
        }
    });
});</code>
</pre>

<p>You would just need to add something to the backend to pull up your form class, run the validation method on the post data, and return some JSON to indicate success or failure, with error messages keyed by element name. Your front-end dev can do whatever he or she wants with them in the callback. Note that using a real submit button also means it will degrade gracefully; if the user has JavaScript turned off, the form will behave as usual.</p>
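
<p>The exact response format is up to you. As one possible shape, assuming the <code>success</code> and <code>messages</code> keys that the callback above checks for, here is a sketch of a helper that turns per-element error lists into that response. <code>buildValidationResponse</code> is a hypothetical name, and the sketch is in JavaScript only to show the structure; your real backend would build the same thing from the form&#8217;s validation results.</p>

<pre>
<code>// errorsByElement maps element names to arrays of error strings,
// e.g. { title: [], zip: ['Not a valid postal code'] }.
function buildValidationResponse(errorsByElement) {
    var messages = {};
    var failed = false;
    Object.keys(errorsByElement).forEach(function (name) {
        var errors = errorsByElement[name];
        if (errors.length !== 0) {
            messages[name] = errors; // keep only elements that have errors
            failed = true;
        }
    });
    return failed ? { success: false, messages: messages } : { success: true };
}</code>
</pre>

<p>Keying messages by element name lets the front end decide where each one goes without knowing anything about the form class.</p>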

<p>Custom JavaScript is usually best written ad-hoc, per form, with reusable parts included in a central file. Combining the AJAX method above with custom non-AJAX validation (like checking password strength as you type) means the front-end and back-end remain separate, which can be a significant benefit when the front-end and back-end developers must work independently.</p>

<h3>Custom elements</h3>

<p>Most of the time, regular HTML form elements are fine (and if they aren&#8217;t, there&#8217;s usually a jQuery plugin to fix that), but occasionally you&#8217;ll have to do something much more complicated and then include it on multiple forms: for example, an image selector that taps into a pool of user-uploaded images. Often the code to support it will be extensive and involve not only markup but also validation, display, and filtering logic. Rather than try to keep a library of functions to do each piece, you can build a custom element that knows about all those parts and slots itself into the regular form flow.</p>

<p>In Zend_Form, this usually means creating a very simple custom element class:</p>

<pre>
<code>class My_Element extends Zend_Form_Element {
    public $helper = 'myElement';
    public function init () {
        parent::init();
        $this-&gt;addValidator(new My_Validate_MyElement());
    }
}</code>
</pre>

<p>And a view helper (assuming you&#8217;ve already set the view helper path):</p>

<pre>
<code>class My_Helper_MyElement extends Zend_View_Helper_FormElement {
   public function myElement($name, $value='', $attribs=array(), $options=null) {
     $info = $this-&gt;_getInfo($name, $value, $attribs, $options);
     $content = '&lt;div id="my_elem_' . $info['name'] . '"&gt;';
     // show the value in an interesting way&#8230;
     // load a jquery plugin, whatever
     $content .= '&lt;/div&gt;';
     return $content;
   }
}</code>
</pre>

<p>And, if necessary, a validator:</p>

<pre>
<code>class My_Validate_MyElement extends Zend_Validate_Abstract {
    const MYBAD = 'myBad';
    protected $_messageTemplates = array(
        self::MYBAD =&gt; "%value% is wrong because&#8230;",
    );
    public function isValid($value, $context = array()) {
        // record the value so %value% can be filled into the message
        $this-&gt;_setValue($value);
        if (/* some test */) {
            $this-&gt;_error(self::MYBAD);
            return false;
        }
        return true;
    }
}</code>
</pre>

<p>Then you can add this element wherever you need it. . .like so:</p>

<pre>
<code>// every Zend_Form element needs a name; 'my_element' is a placeholder
$this-&gt;addElement(new My_Element('my_element'));</code>
</pre>

<p>Much less painful all around.</p>

<h3>In conclusion</h3>

<p>As you go about bending forms to your will, bear in mind that the choices you make will affect your whole team, and anyone who works on the code after you. Separating layout from your class structure will ensure that your front-end developer doesn&#8217;t have to learn about the Decorator Pattern, and modular validation will make for an easy transition when you have to offer the form as a web service. These packages aren&#8217;t as annoying as they can sometimes seem, and they can provide speed now and flexibility later, provided you use them with care.</p>]]></content:encoded>
            <pubDate>Wed, 08 Feb 2012 18:55:57 GMT</pubDate>
        </item>
        <item>
            <title>Why NoSQL Does Not Mean NoDBA</title>
            <link>http://omniti.com/seeds/why-nosql-does-not-mean-nodba</link>
            <guid>http://omniti.com/seeds/why-nosql-does-not-mean-nodba</guid>
            <description><![CDATA[Whether you like it or not, NoSQL is changing the world. Granted, it&#8217;s not even clear what NoSQL means sometimes, but there is no doubt that, for better or worse, we are in a renaissance of non-relational database systems right now. For me person...]]></description>
            <content:encoded><![CDATA[<p>Whether you like it or not, NoSQL is changing the world. Granted, it&#8217;s not even clear what NoSQL means sometimes, but there is no doubt that, for better or worse, we are in a renaissance of non-relational database systems right now. For me personally, I tend to ignore the hype, study these systems with a critical eye, and then deploy them where traditional RDBMS software struggles. I do occasionally bump into people who babble on about how NoSQL will put DBAs out of business. When I hear this kind of comment, I just nod my head and smile: It&#8217;s hard to convince people that their beloved paradigm shift is just more of the same, and also very seldom worth the effort. However, I was recently talking to an Oracle DBA, and he made some comments about how he was concerned that new companies would have no use for a DBA because they were all switching to NoSQL. This surprised me a little, actually. I figured if there was one group of people who wouldn&#8217;t buy into the NoSQL hype, it would be the stalwart Oracle crowd. Et tu, Brute? If the hype has gotten to them, I guess it&#8217;s time for me to speak up. Whatever you think about NoSQL, the death of the DBA is a ludicrous idea. </p>

<p>One of the little known secrets of NoSQL systems is that they are used to hold data. Most NoSQL systems try to trumpet the ease of pushing data in and out of the system: "just push your JSON object to the server and your data is instantly stored, regardless of structure." It&#8217;s Oh, So Magical. The problem, of course, is that easily dumping data into a system doesn&#8217;t mean much if you can&#8217;t get it back out. This is where a data model comes into play. </p>

<p>In a traditional RDBMS environment there is usually someone who designs a data model ahead of time, breaking down information into a relational design, sometimes even drawing up an ERD diagram for people to reference, so they can find their information. This would be turned into DDL, committed to the database and then enforced rather strictly. Try to insert the wrong type of data, and you&#8217;ll get an error. Try to query for a column that doesn&#8217;t exist, and you&#8217;ll get an error. I&#8217;ve seen many a developer complain that the database is "too strict" because they couldn&#8217;t get their queries right, but the truth is you still need a <a href="http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/forum-application-data-model-conversion-td5210111.html"><span>data</span></a> <a href="http://markmail.org/message/p3h2r5dq5o2hqmod#query:+page:1+mid:5cc7yy5323b5mgkl+state:results"><span>model</span></a> even though you went to a NoSQL system. You won&#8217;t hear much about these problems from NoSQL advocates because the type of shops that use them are often smaller shops, where there are a small number of developers, and everyone knows what everyone else is doing, more or less. Or they are using NoSQL on new projects&#8202;&#8212;&#8202;where the scope of the system still fits in head-space pretty easily. However, as code bases grow, and projects stretch on for years, and developers come and go, not knowing what data is stored where means that you start to wish you had a data model. You recognize this the first time you hit a bug because you assumed that not getting a value back in the "items" key meant the user had no items in their cart, only to realize later that you should have been looking at the "item" key. If you&#8217;re lucky, this type of bug pops up on information retrieval rather than in information storage; but either way, cleaning up such a bug can be painful. 
You&#8217;ll also notice how, often, you have to pull back objects just to see what information is actually stored, because depending upon when the object was inserted, it may have a different opinion on what the data it should be holding looks like. Of course, you can go back and dig through the code that puts the data in, but this assumes you know where that code lives. Maybe you could ask the guy who originally wrote that feature, but suppose he left the company six months ago? Oh well, grep is your friend.  </p>
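
<p>To see how quietly the "items" versus "item" bug slips by, here is a small JavaScript sketch. The cart document and both function names are invented for illustration. The first reader treats a missing key as an empty cart; the second refuses to guess and reports which keys the document really has.</p>

<pre>
<code>// A document written by older code, which stored the cart under "item".
var cart = { user: 'alice', item: ['book', 'lamp'] };

// Naive reader: a missing "items" key looks exactly like an empty cart.
function naiveItemCount(doc) {
    return (doc.items || []).length;
}

// Stricter reader: fail loudly when the expected key is absent.
function checkedItemCount(doc) {
    if (!Object.prototype.hasOwnProperty.call(doc, 'items')) {
        throw new Error('cart has no "items" key; found: ' +
            Object.keys(doc).join(', '));
    }
    return doc.items.length;
}

naiveItemCount(cart); // 0, even though the cart holds two things
// checkedItemCount(cart) throws: cart has no "items" key; found: user, item</code>
</pre>

<p>A check like this is a stand-in for the data model, not a replacement for one. It only tells you something is wrong at read time, after the odd-shaped document has already been stored.</p>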

<p>So let&#8217;s pretend you&#8217;ve taken the time to discuss what data you must hold up front, and someone keeps a diagram of that posted on the wall. . . awesome. Sadly, you must still implement it in a physical system. In all the hype about "schema free" databases, the fact that in order to get good performance you still have to make adjustments for physical layout, or build things like indexes into your system to make queries against the server fast enough, is often overlooked. Yes, even on NoSQL you still need to know about <a href="http://mongly.com/Speedig-Up-Queries-Understanding-Query-Plans/"><span>explain tools and indexing</span></a>.</p>

<p>Think about what this means: Someone has to recognize that a certain piece of data is going to be requested a lot or notice a performance problem on the existing server. Once you figure out the right index to build, someone has to build it in production. This means locking, potentially, and certainly means an IO hit. Is your application developer going to be responsible for this? Does he know if your index build will require backfilling, like when you build a secondary index on existing data in Riak? Don&#8217;t get me wrong, it&#8217;s not that developers aren&#8217;t capable of doing this work; it&#8217;s just important to realize that this type of work must be done.</p>

<p>OK, you have a data model, and are managing the physical implementation; that&#8217;s good. But did you know that your NoSQL system still must interact with the disks on your server? It does. The better question is, do you know <em>how</em> it interacts with the disks? Actually, before we talk about the disks, do you know how it interacts with RAM? Some NoSQL systems absolutely fall over when they hit thresholds larger than RAM. For some, it&#8217;s total data set size, for others it might be the size of all index pointers in the system. Of course, maybe you have a system that doesn&#8217;t fall over, it just becomes slower, perhaps unacceptably slower. In either case, you need to be aware of these limitations. Have monitors in place for them, and then perform capacity planning accordingly. Now, let&#8217;s get back to disks. How crash-safe is your NoSQL server? Does it give you single node durability? Are writes automatically synced to disk with each put, or are they batched up and pushed out occasionally? Maybe it&#8217;s configurable; do you know how your systems are set up? If you were using Postgres, you could tune the durability guarantees for all of these cases. Your DBA knows this, and whoever is in charge of your NoSQL system needs to know this. Even if you think you are storing data you can afford to lose, chances are your business model must be aware of just how much exposure it has. Oh, but yours is a start-up, so you don&#8217;t have business concerns yet. . .? Still, the level of durability is going to have significant impact on your IO needs, and that, in turn, will impact your performance&#8202;&#8212;&#8202;and you can&#8217;t post your <a href="http://www.google.com/url?q=http%3A%2F%2Farstechnica.com%2Fbusiness%2Fnews%2F2011%2F09%2Fgoogle-devops-and-disaster-porn.ars&sa=D&sntz=1&usg=AFQjCNHwMY0pCnCygkWB0rqAtvk0gZjk4Q"><span>devops data-porn</span></a> if you can&#8217;t get decent systems performance.</p>

<p>Of course, disks are kind of unimportant these days, given that everyone runs multiple nodes, and you can have a distributed hash table running across multiple nodes with just a handful of Chef commands. That said, have you ever managed a complex distributed system? You know who probably has? Your DBA. By far the most common answer to the failover problem is to stick up a replicated database slave. It&#8217;s also common to see people putting up slaves for horizontal read scaling. DBAs understand the tradeoffs in consistency guarantees that come with these types of systems, not just at the node level, but from the application&#8217;s point of view as well. You&#8217;ll need a solid understanding of this on your dev team if you are going to build apps against a distributed data system. In addition, someone has to manage all of these servers and make sure they perform well. If your NoSQL system uses master-slave replication, someone with experience in this area might be handy. If you&#8217;ve ever built a Master-Master pair with individual Slave systems, you probably know what I am talking about. Oh, do you think running a clustered hash table system is easier? Just because you can add a new node to the ring doesn&#8217;t mean it&#8217;s free. You need both server level and cluster level monitoring in place. You need to make sure you can afford the IO and network strains as data is copied around, and you need to know under what circumstances locking will be involved. These things really do happen. </p>

<p>I remember when the MySQL documentation had a section devoted to explaining why foreign keys weren&#8217;t needed. Of course, once MySQL finally implemented foreign keys, it became a major headline for their release announcements. This is what happens as systems mature. Most NoSQL systems can cut down on overhead by eliminating (or more accurately, not implementing) many of the features people have come to expect from an RDBMS. Of course, which features are eliminated differs across systems. </p><p class="blockquote">Did you know you can write triggers for Riak in either Javascript or Erlang? Exactly which language you can use when differs depending upon the type of trigger.</p><p>To wrap your head around this, you need to have a good understanding of how triggers work, how asynchronous calls affect transaction semantics (or the lack thereof), and what types of work you might want to do on the server side. Some triggers are used to enforce data integrity or do data manipulation at the server level; these are the types I think work fine within a vertically scaled system. Others really are extensions of the application, and while they are sometimes frowned upon for adding overhead into a centralized resource like a typical RDBMS, in a decentralized system that scales out the arguments against them aren&#8217;t as clear cut. One thing I do know though: this is probably not something your SA wants to be involved with at all.</p>

<p>If all of this isn&#8217;t enough to make you think twice, let me mention one more thing. While you may not have a query language in your NoSQL system, that doesn&#8217;t mean you don&#8217;t query against it. Whether you are writing <a href="http://browsertoolkit.com/fault-tolerance.png"><span>distributed map-reduce queries</span></a>, trying to balance link-walking vs. secondary indexes, or trying to figure out whether the code you&#8217;ve written to pull back every key in the system is going to be a problem, there are going to be times when you will have to make these queries more efficient. This is probably going to be a more application-centric type of tuning than the traditional RDBMS, but watch as someone in your dev team becomes known as "the go to guy" for making your map/reduce query run more efficiently. And incidentally, you should also be aware that many of today&#8217;s NoSQL systems are trying to bolt on <a href="https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-ExampleQueries"><span>SQL</span></a> and <a href="http://code.google.com/appengine/docs/python/datastore/gqlreference.html"><span>SQL-like</span></a> interfaces into their systems. Who fails to get excited when thinking about rewriting queries with subselects into join clauses against a Hadoop cluster?</p>

<p>If you think that managing all of this sounds like an impossible task, you&#8217;re welcome. This is the job that DBAs have been doing for years. . .and yes, it can be incredibly challenging. Of course, it doesn&#8217;t have to work this way. You can draw the lines of responsibility differently right now. Make the application developers manage the data model, design the schema, and tune the queries. Let your ops people be in charge of building new nodes, managing replication, and ensuring you have valid backups. Maybe 10 years ago, you had to have a DBA to work with Oracle, but nowadays just about anyone doing software engineering can put up a Postgres database and tune their way to usability with about three wiki links; you don&#8217;t need a DBA to design good operational habits. </p>

<p>Also, some of you might think of this as some type of doom and gloom piece against NoSQL; it is not meant that way at all. It&#8217;s not that switching to NoSQL is a bad idea necessarily; there are some things that RDBMS software can&#8217;t do as well as a more dedicated solution. But, if you think that switching to NoSQL will just let you hand-wave away all of the challenges of running a database, you are terribly misguided. If you&#8217;re a DBA and you are worried about a future with NoSQL, take heart; study your product less and focus on these key architectural design points more. Those skills are critical now, and they will remain so in the future, NoSQL or not.  </p>]]></content:encoded>
            <pubDate>Wed, 18 Jan 2012 21:15:00 GMT</pubDate>
        </item>
        <item>
            <title>Infrastructure Cost Reduction</title>
            <link>http://omniti.com/seeds/infrastructure-cost-reduction</link>
            <guid>http://omniti.com/seeds/infrastructure-cost-reduction</guid>
            <description><![CDATA[Everyone would like to spend less money on their server infrastructure, but it can be difficult to figure out where money can be saved, and whether reducing the amount that you spend on infrastructure will result in a revenue drop, caused by outages an...]]></description>
            <content:encoded><![CDATA[<p>Everyone would like to spend less money on their server infrastructure, but it can be difficult to figure out where money can be saved, and whether reducing the amount that you spend on infrastructure will result in a revenue drop, caused by outages and reduced service quality. By looking at your costs, revenue, system metrics and task/time management system together, it&#8217;s possible to avoid these pitfalls, and reduce the amount of money that you spend on your infrastructure while simultaneously improving the quality of the services that you provide to your customers.</p>

<p>One of the first steps to reducing your infrastructure cost is to make certain that you have a monitoring solution in place that can help you identify underutilized parts of your infrastructure, as well as what parts of your infrastructure are costing you money.  </p>

<p>Most shops already have metrics in their monitoring system that can be used to identify underutilized equipment, but if you do not: CPU usage, disk IO, network IO, and memory usage are all good places to start looking&#8202;&#8212;&#8202;and they are all easy to monitor. When you find underutilized equipment, a money saving solution is usually pretty obvious. If the system is older, and is a good candidate for virtualization, then it should probably be virtualized. If an expensive network link is not being heavily utilized, or it&#8217;s being over utilized, reconsider whether there are cheaper (and better) options available from another vendor. Reducing costs by virtualizing, moving off of old hardware and getting network connections that are suited to your business should be second nature at this point, but it can still be difficult to isolate exactly where you can do this. Simple systems monitoring can help.</p>
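<p>As a rough illustration of the idea above, the following sketch flags hosts whose busiest resource stays under a threshold. The data shape, host names and the 20% threshold are all hypothetical; in practice the numbers would come from whatever your monitoring system exports.</p>
<pre><code>// Hypothetical input: one object per host, with average utilization
// over the review period expressed as a fraction between 0 and 1.
function findUnderutilized(hosts, threshold) {
    return hosts.filter(function (h) {
        // Flag a host only when every tracked resource is under the threshold.
        var busiest = Math.max(h.cpu, h.diskIo, h.netIo, h.memory);
        return threshold > busiest;
    }).map(function (h) { return h.name; });
}

var sample = [
    { name: 'web1',    cpu: 0.62, diskIo: 0.30, netIo: 0.45, memory: 0.70 },
    { name: 'legacy1', cpu: 0.04, diskIo: 0.02, netIo: 0.01, memory: 0.15 },
    { name: 'db1',     cpu: 0.10, diskIo: 0.85, netIo: 0.20, memory: 0.90 }
];

console.log(findUnderutilized(sample, 0.20));
</code></pre>
<p>With this sample data only <code>legacy1</code> is flagged: <code>db1</code> has an idle CPU but busy disks and memory, so it is not a virtualization candidate on these numbers alone.</p>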

<p>Figuring out which parts of your infrastructure are costing you money can be more difficult. Most businesses are not monitoring the number of customers they lose as a result of unplanned downtime, or what the cost of support on old hardware is over time. By integrating business information (like number of customer sign ups over a period of time, amount of money refunded or number of support cases opened) into your technical monitoring, error detection or trending system, you can immediately see what the results of an outage or change are; and, you&#8217;ll know how much money should be spent to fix a problem, or to build a more fault-tolerant infrastructure.</p>

<p>There are some things that are difficult to automate in monitoring, but should still be reviewed on a regular basis. Support contracts, rack space/colocation bills, bandwidth overages (or underutilized contracts for bandwidth) and power bills all fall into this category. As equipment and environments age, fixed costs become taken for granted. When this happens, you&#8217;ll frequently forget that you are paying money for rack space that is no longer in use, bandwidth that is no longer necessary, and expensive support contracts on systems that could be virtualized onto an under-utilized new system. A newer system is often already in place, but the legacy system is left running for months, if not years, in case the new system fails. When you replace something and keep the old system as a backup, set up a reminder to revisit the decision to keep the old system after a month or two. If you don&#8217;t, the legacy equipment can end up staying in use for years before someone remembers it.</p>

<p>Finally, review your own time tracking system to see how you are spending your time. It&#8217;s easy to get into the rut of documenting a manual way of taking care of a task, and then doing it that way every time. If you can automate a process (or even parts of a process), or make the documentation simple enough for anyone else to follow, you can reduce the amount of time you spend on things and have more time for setting up new clients, investigating new software and helping your users. </p>

<p>One of the things to look for in your time tracking system would be who is spending time on tasks, and what those tasks are; if senior people are using their time to do the same task over and over again, it can be a sign that the task should be better documented, so that more people in your group can take care of it (and the more expensive time of senior administrators can be used for more difficult work).</p>
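<p>A small sketch of that review, assuming your time tracking system can export entries with a person, a seniority level, a task name and hours (all of these field names are made up for the example):</p>
<pre><code>// Group entries by person and task, and report senior staff who have
// performed the same task at least minRepeats times.
function repeatedSeniorTasks(entries, minRepeats) {
    var counts = {};
    entries.forEach(function (e) {
        if (e.level !== 'senior') { return; }
        var key = e.person + '|' + e.task;
        if (!counts[key]) {
            counts[key] = { person: e.person, task: e.task, times: 0, hours: 0 };
        }
        counts[key].times += 1;
        counts[key].hours += e.hours;
    });
    return Object.keys(counts)
        .map(function (k) { return counts[k]; })
        .filter(function (c) { return c.times >= minRepeats; });
}
</code></pre>
<p>Anything this returns is a candidate for better documentation or automation; the total hours give a rough idea of how much senior time is at stake.</p>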

<p>To summarize:</p>
<ol>
<li>Monitor everything and look for under- or over-utilized resources.</li>
<li>Track your time; if you are spending lots of time on the same procedures, automate them. If you are responding to the same problems over and over again, find a way to permanently fix them.</li>
<li>Watch your invoices. It&#8217;s easy to pay support, bandwidth and power bills month after month, or year after year, without reviewing them to see if you can get a better deal elsewhere, or to see if non-critical infrastructure is costing you more money than you would like.</li>
</ol>

<p>These ideas are all simple, but by considering them during your day-to-day operations, as well as during periodic reviews, you may find yourself spending less money, and using less time, on keeping your infrastructure running well. Reducing the cost of your existing infrastructure gives you more time and money to spend on improvements and new projects, rather than merely on maintaining what is already in place.</p>]]></content:encoded>
            <pubDate>Thu, 01 Dec 2011 20:37:47 GMT</pubDate>
        </item>
        <item>
            <title>Sometimes &#34;Sexy&#34; Can Be the Right Choice</title>
            <link>http://omniti.com/seeds/sometimes-sexy-can-be-the-right-choice</link>
            <guid>http://omniti.com/seeds/sometimes-sexy-can-be-the-right-choice</guid>
            <description><![CDATA[
Hardly a month goes by these days without some exciting new technology hitting the blogosphere, filling the imagination of CTOs all over. At OmniTI, we are often approached by people asking about the "razor&#8217;s edge" technology of the week. Freque...]]></description>
            <content:encoded><![CDATA[
<p>Hardly a month goes by these days without some exciting new technology hitting the blogosphere, filling the imagination of CTOs all over. At OmniTI, we are often approached by people asking about the "razor&#8217;s edge" technology of the week. Frequently, they are convinced that this is the technology that they need for their business, and often will try to shoe-horn their requirements to fit the new toy. We typically have to convince people of things like their dusty old relational database actually handling their data needs just fine, even if it isn&#8217;t <a href="http://www.mongodb-is-web-scale.com/"><span>web scale</span></a>. Tried and true typically works better than shiny and new.</p>

<p>Sometimes however, a client&#8217;s requirements really do lend themselves nicely to the newer technologies and we are justified in playing with them during business hours instead of at home! We love our <a href="http://omniti.com/is/hiring"><span>jobs</span></a> at OmniTI.</p>

<p>The request we&#8217;ll review here was fairly simple. The client needed a highly scalable and fast web service to provide geo-location data, based upon the ip address of the requester. They also had to serve a small static file. The service would run for a few months and then would be discontinued. It didn&#8217;t require true high availability, but we had to be able to fix it quickly if something went wrong.</p>

<p>Using technologies we were already employing on the project, we wrote a simple <a href="http://labs.omniti.com/labs/mungo"><span>Mungo-based</span></a> perl script to look up the information in the <a href="http://www.maxmind.com/app/ip-location"><span>MaxMind</span></a> city level database and return the data inside of a JSON object. Once placed on the existing Apache httpd servers along with the static document, we had a working prototype for their third-party to develop against, while we looked at the more complex issues in the request.</p>

<p>In this case, there were two immediate concerns:</p>
<ul>
<li>The service had to be fast and handle a <strong>lot</strong> of requests.</li>
<li>This component should not endanger the availability of the rest of the web services.</li>
</ul>

<p>The web farm already deployed to handle the client&#8217;s business used the <a href="http://httpd.apache.org/"><span>Apache httpd server</span></a> and, leveraging the platform flexibility, it grew to support a number of legacy web services. As this setup was already tweaked for these particular needs, we didn&#8217;t really want to reconfigure it. However, we needed to know where we stood from a performance point of view, to find out how much traffic we could handle.  A quick <a href="http://httpd.apache.org/docs/2.0/programs/ab.html"><span>Apache bench</span></a> testing revealed:</p>
<pre><code>Document Length:        188 bytes
Requests per second:    1306.85 [#/sec] (mean)
Time per request:       38.260 [ms] (mean)
Time per request:       0.765 [ms] (mean, across all concurrent requests)
Transfer rate:          520.79 [Kbytes/sec] received
</code></pre>
<p>Our goal was to be able to safely handle about 5,000 requests per second. While we could sustain that traffic by scaling out across our client&#8217;s multiple web servers, when traffic volume reached worst-case expectations, there would be an unsafe likelihood of the servers becoming saturated, followed by service degradation. Worse yet, all of the web services could become completely unavailable. Needless to say, either case would be unacceptable. We had to isolate this service from the rest; however, isolating it on hardware similar to what we were using for the web farm would have been prohibitively expensive, especially considering the transient nature of a project designed to last only a few months.</p>

<p>With such requirements, a cloud deployment was the obvious choice. While there <a href="http://joyeur.com/2011/04/22/on-cascading-failures-and-amazons-elastic-block-store/ "><span>are</span></a> <a href="http://stu.mp/2011/04/the-cloud-is-not-a-silver-bullet.html"><span>plenty</span></a> of <a href="http://joyeur.com/2011/04/25/network-storage-in-the-cloud-delicious-but-deadly/"><span>reasons</span></a> to stay away from the cloud, there are some really good reasons to use it, as well. The cloud would let us use exactly as much CPU and bandwidth as we needed, and provide an easy and quick way to get more if we required it. Our service did not store persistent data, even at a session level, so if a cloud instance went "poof," there was nothing we couldn&#8217;t afford to lose. When no longer needed, we could just shut down or scale back the servers, without worrying about excess hardware&#8202;&#8212;&#8202;the exact benefit cloud supporters always want. EC2, here we come!</p>

<p>With the move to EC2, we had the option to deploy the prototype code that we had written already. However, as that code leveraged an existing ecosystem designed to service a much wider spectrum of needs, duplicating the environment would have been overkill, and attempting to strip it down to the minimum necessary would have been a rather daunting challenge with little long-term benefit. With the luxury of exploring a green field approach, we turned our attention to Node.js. At OmniTI, we had the advantage of having seen Node.js used already a few times for production services, and we had even incorporated it into a few solutions we had developed, so we knew that the type of light-weight, fast response code that we were looking to develop for this project was very well suited for Node.js. Through a bit of serendipity, <a href="http://omniti.com/is/theo-schlossnagle"><span>Theo Schlossnagle</span></a> had just recently branched, and then finished, a new version of <a href="https://github.com/postwait/node-geoip"><span>node-geoip</span></a> that was capable of reading the MaxMind City database. Add to that my personal joy for getting the chance to use Node.js in production, for a customer project, and the decision was clear.</p>

<p>Plan in hand, the perl script was quickly converted to Node.js and placed on a small Apache EC2 instance for load testing (thanks to <a href="http://omniti.com/is/zach-malone"><span>Zach Malone</span></a> for assistance with all of the cloud benchmarking work). The entire code follows.</p>

<pre><code>var http = require('http'),
    sys  = require('sys'),
    geoip= require('geoip');

var con = new geoip.Connection('/www/geodata/GeoIPCity.dat', 0, function(){});

http.createServer(function (req, res) {
    if( req.url == '/get_city' ) {
        res.writeHead(200, {'Content-Type': 'text/plain'});
        var ip = req.headers['x-forwarded-for'] || 
                     req.connection.remoteAddress;
        con.query( ip, function(result) {
            var obj = new Object();
            if(!result){ obj.city = 'Unknown';   }
            else       { obj.city = result.city; }
            res.end(JSON.stringify(obj) + "\n");
        });
    } else if( req.url == '/crossdomain.xml' ) {
          res.writeHead(200, {'Content-Type': 'text/xml'});
          res.end("<?xml version=\"1.0\"?>\n<!DOCTYPE cross-domain-policy SYSTEM \"http://www.adobe.com/xml/dtds/cross-domain-policy.dtd\">\n<cross-domain-policy>\n<allow-access-from domain=\"*\" />\n</cross-domain-policy>\n");
    } else {
          res.writeHead(404, {'Content-Type': 'text/plain'});
          res.end("File not found.\n");
    }
}).listen(80);
</code></pre>

<p>Slightly more than twenty lines of code. This will return a JSON object with the name of the nearest city, based upon client IP address, or "Unknown" if it does not resolve. It will also serve a crossdomain.xml file to any flash objects that need one, and return a 404 to any other requests. Where&#8217;s the web server, one may ask? Node.js takes care of all of that for you.</p>
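<p>One detail worth hedging against: an <code>X-Forwarded-For</code> header can carry a comma-separated chain of addresses when a request passes through more than one proxy, so passing the raw header straight to a lookup may not always work. The helper below is a sketch, not part of the deployed service; the name <code>clientIp</code> is an assumption.</p>
<pre><code>// Pick a single address for the lookup. Takes the first entry of
// X-Forwarded-For when present, otherwise the socket address.
function clientIp(headers, remoteAddress) {
    var forwarded = headers['x-forwarded-for'];
    if (typeof forwarded === 'string') {
        var first = forwarded.split(',')[0].trim();
        if (first !== '') { return first; }
    }
    return remoteAddress;
}
</code></pre>
<p>The first entry is conventionally the originating client, but the client can set it itself, so which entry to trust depends on how the proxies in front of the service are configured.</p>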

<p>Simple Apache bench testing of this code gives between 300 and 600 requests/second on a Small EC2 instance with a single virtual CPU. </p>
<pre><code>Document Length:        223 bytes
Requests per second:    344.75 [#/sec] (mean)
Time per request:       145.033 [ms] (mean)
Time per request:       2.901 [ms] (mean, across all concurrent requests)
Transfer rate:          96.62 [Kbytes/sec] received
</code></pre>
<p>Yes, this is much slower than what we were benchmarking on our web farm, but it is much cheaper to scale out, and it gave us the service separation we wanted. To scale out, we had to load balance our instances; <a href="http://aws.amazon.com/elasticloadbalancing/"><span>Amazon&#8217;s Elastic Load Balancing</span></a> was used here.</p>

<p>It was expected that a small decline in service would occur due to the overhead, but we were pleasantly surprised to see slightly BETTER performance. Apparently, getting the IP address in Node.js from the request header is faster than getting it from the connection object, so having a load balancer in the middle actually improved performance.</p>
<pre><code>Document Length:        223 bytes
Requests per second:    367.87 [#/sec] (mean)
Time per request:       135.918 [ms] (mean)
Time per request:       2.718 [ms] (mean, across all concurrent requests)
Transfer rate:          112.40 [Kbytes/sec] received
</code></pre>
<p>In repeating the test with two, and then three, server instances behind the load balancer, each new instance added continued to scale the volume of requests-per-second we could handle by another ~350-600 requests/second. So, with only three small EC2 instances, we were able to crank out between 1200-1500 requests/sec of GeoIP lookups.</p>
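<p>For a back-of-the-envelope capacity estimate against the goal of about 5,000 requests per second, the arithmetic can be sketched as follows. It assumes throughput keeps scaling linearly with instance count, which the tests above suggest but do not guarantee at larger sizes.</p>
<pre><code>// Instances needed to reach a target rate, assuming linear scaling.
function instancesNeeded(targetRps, perInstanceRps) {
    return Math.ceil(targetRps / perInstanceRps);
}

var target = 5000;
console.log(instancesNeeded(target, 350)); // 15 (slow instances)
console.log(instancesNeeded(target, 600)); // 9  (fast instances)
</code></pre>
<p>Planning around the slower figure leaves headroom if some instances turn out to be underperformers.</p>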

<p>350 to 600 requests/second is a pretty large window, and it means that some of our EC2 instances do much more work than others. This is something that you have to deal with when you are deploying a cloud-based solution. Thankfully, EC2 gives you a lot of flexibility to rapidly create and destroy instances, so if you get an especially slow instance, it can be worth throwing the instance away and creating a new one. As a bonus, if needed, it takes fewer than 15 minutes to manually get a new instance provisioned, set up, and running, without using Amazon EBS. Not relying on EBS enabled us to dodge the infamous <a href="http://aws.amazon.com/message/65648/"><span>EC2 outage</span></a>.  Our service was unaffected despite running in the unfortunate Virginia data center cloud.</p>

<p>Now, just because we were using Node.js, and we were deploying to the cloud, doesn&#8217;t mean we toss away due diligence. In order to make certain that the EC2 solution offered good performance for the money, we decided to benchmark the same code on a Joyent <a href="http://www.joyent.com/products/smartmachines/ "><span>SmartMachine</span></a> that we had available.  A single Joyent system had the performance of ~3.5 small EC2 instances:</p>
<pre><code>Document Length:        223 bytes
Requests per second:    1564.30 [#/sec] (mean)
Time per request:       31.963 [ms] (mean)
Time per request:       0.639 [ms] (mean, across all concurrent requests)
Transfer rate:          438.43 [Kbytes/sec] received
</code></pre>
<p>The cost of the Joyent system was, however, twice as much as three small EC2 instances, plus an Elastic Load Balancer. Joyent includes a generous amount of bandwidth with any instance (Amazon does not), but their large, fixed monthly cost meant that we would not have as much flexibility to scale up and down as we did with EC2, which has hourly billing.</p>

<p>So, we had a working solution at this point, but we still had to make sure it would continue to work; in short, it had to be monitored. Normal end-to-end monitors and request timing monitors were put in place on the load balancer, as well as checks on each individual server instance. But, we also wanted to know how much traffic we were serving without any more fuss. Node.js could keep track of that for us as well. By simply adding:</p>
<pre><code>var cities = 0, xmls = 0, fnf = 0, status = 0;
</code></pre>
<p>&#8230; some variable++'s in the appropriate spots, and &#8230;</p>
<pre><code>
    } else if( req.url == '/status' ) {
        res.writeHead(200, {'Content-Type': 'text/plain'});
        status++;
        var obj = new Object();
        obj.cities = cities;
        obj.xmls   = xmls;
        obj.fnf    = fnf;
        obj.status = status;
        res.end(JSON.stringify(obj) + "\n");
</code></pre>

<p>. . .we could see exactly how much traffic, of each type, that each Node.js instance had served; along with whether any of them had crashed (as evidenced by a reset counter).  This was set up to be pulled by <a href="http://circonus.com/"><span>Circonus</span></a> which can consume the JSON data and graph the usage trends over time.</p>
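<p>The "reset counter" check can be sketched as a comparison of two successive <code>/status</code> snapshots. This is an illustration of the idea only, not a description of how Circonus processes the data; the function names are assumptions.</p>
<pre><code>// Counters only increase while the process is alive, so any counter
// that went down between two polls implies the process restarted.
function detectRestart(previous, current) {
    return Object.keys(previous).some(function (name) {
        return previous[name] > current[name];
    });
}

// Traffic served between two polls. Only meaningful when
// detectRestart() is false for the same pair of snapshots.
function delta(previous, current) {
    var out = {};
    Object.keys(current).forEach(function (name) {
        out[name] = current[name] - (previous[name] || 0);
    });
    return out;
}
</code></pre>
<p>Polling at a fixed interval and dividing each delta by the interval length gives a requests-per-second figure per instance and per request type.</p>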

<p>Also worth noting: all of this was done almost a year ago.  "A few months" turned into much longer. The client&#8217;s required utilization has gone up and down with a corresponding number of EC2 nodes added or removed.  But this simple script hasn&#8217;t had to be modified or touched since.  It has happily run a production service without any problems for a minimal amount of time invested.</p>

<p>To be fair, this was a rather simple problem that could have been solved in a number of different ways, perhaps even more effectively. But sometimes it behooves you to explore those sexy new technologies and learn how they behave. In this way, you&#8217;ll understand the trade-offs involved, and you can feel comfortable deploying them for critical components of an architecture. While it&#8217;s essential to remember that sexy doesn&#8217;t mean good, it&#8217;s a pleasant reminder that sometimes good can be sexy.</p>
]]></content:encoded>
            <pubDate>Tue, 22 Nov 2011 20:03:27 GMT</pubDate>
        </item>
        <item>
            <title>Thoughts on Web Application Deployment</title>
            <link>http://omniti.com/seeds/thoughts-on-web-application-deployment</link>
            <guid>http://omniti.com/seeds/thoughts-on-web-application-deployment</guid>
            <description><![CDATA[Abstract: A quick overview of various strategies to ease deployment of web applications, and some common pitfalls and failure modes to avoid.  Intended to be broadly technology-agnostic.

Introduction
Over the course of my career, I&#8217;ve worked in ...]]></description>
            <content:encoded><![CDATA[<p><i>Abstract: A quick overview of various strategies to ease deployment of web applications, and some common pitfalls and failure modes to avoid.  Intended to be broadly technology-agnostic.</i></p>

<h3>Introduction</h3>
<p>Over the course of my career, I&#8217;ve worked in a number of different environments, each with their own particular processes and procedures for deploying systems, from development to production. Over time, a number of best practice patterns and common anti-patterns have emerged, which this article will attempt to enumerate and explain. I hope this information will give you pointers and direction to improve your processes so that deployment is made both easier and less error-prone. As much as is possible, I will be technology agnostic, so of course your particular environment may vary or require additional steps, but following along the broader themes listed here should be helpful.</p>

<h3>Step Zero: Intelligent Use of Source Control</h3>
<p>This seems like something that is almost forehead-slap obvious, but you should be using source control during development. Working without it is like doing high-wire acrobatics without a safety net. There is no project so small, either in scope or staff, that it cannot benefit from having some sort of source configuration management (SCM) system in place. Selecting which SCM to use (e.g., Subversion, git, Perforce) depends upon your team&#8217;s development style and environment choice, and is beyond the scope of this article. In general, the Pragmatic Programmers, O&#8217;Reilly and APress books covering particular systems tend to be good resources.</p>
<p>Further, source control must be used intelligently to be of much use. Common anti-patterns include things like having the development environment exist in only one shared space that uses one source control checkout so people are colliding over editing the same physical bits for any given file, or never branching/tagging so that trying to determine the exact state of your system at a given point in the past is an exercise in frustration.  If you ever have to ask "is anyone else editing this file?," you are either using a supremely broken SCM or you are doing something gravely wrong.</p> 
<p>Your SCM setup should enable you to work concurrently and in isolation with a minimum of hassles, allowing easy integration of work done concurrently on the same module or set of modules; easy reproduction of the system as it was at any point in time in the past; and, ideally, easy searching of commit history because all of these scenarios will come up repeatedly in any project of significant scale. For example, if you find yourself trying to diagnose a recurrence of an issue with a particular ticket number, your diagnosis will be vastly sped up if it is easy to find all of the commits related to that ticket number in the past. Similarly, being able to tell easily who exactly added a particular feature months or years ago might make it much faster to track down the organizational knowledge required to extend or fix it in the present.</p> 
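<p>As a sketch of the ticket-number search described above, assume commit messages reference tickets as <code>#1234</code> and that the commit history has already been exported into simple objects. Both the convention and the record shape are assumptions; most SCMs offer their own log search that would be used in practice.</p>
<pre><code>// Return the commits whose message references the given numeric ticket.
function commitsForTicket(commits, ticketId) {
    // The lookahead ensures ticket 123 does not also match #1234.
    var pattern = new RegExp('#' + Number(ticketId) + '(?!\\d)');
    return commits.filter(function (c) { return pattern.test(c.message); });
}

// Who has touched this ticket before? Useful for tracking down
// the people with the relevant organizational knowledge.
function authorsForTicket(commits, ticketId) {
    var seen = {};
    commitsForTicket(commits, ticketId).forEach(function (c) {
        seen[c.author] = true;
    });
    return Object.keys(seen).sort();
}
</code></pre>
<p>This only works if the team is disciplined about putting ticket references in commit messages, which is a cheap habit to adopt early.</p>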
<p>The more isolation feasible in the environment, the less coordination overhead is required to work together as a team on a given workload, meaning that your productivity scales more linearly with additional developer resources. Good usage of a SCM can aid this by making it easy to keep individual development environments in sync; a common best practice is to make sure that each developer can run their own stand-alone copy of the execution environment based upon a frequently-updated SCM checkout. As a specific example, with the common LAMP (or similar) technology stack, it is easy for each developer to have an account on a shared machine, with a checkout in their home directory that is used as the root for a vhost/distinct port so that each developer may work in isolation talking to different ports on the same host.</p>
<p>Having a particular "branch" or "tag" devoted to staging/testing and production environments is a common best practice. The particulars of how this is done vary by SCM, but every modern SCM should have some facility corresponding to one or the other of the above if not both. Branching/merging tend to be slightly more complex operations in most SCM systems, but the effort expended in learning these operations will repay itself many times over as you begin to be able to take a more sophisticated approach to the various states and stages of your system&#8217;s evolution over time. See steps two and three below for further discussion along this theme.</p>

<h3>Step One: Deployment Planning</h3>
<p>Take the time to write out a deployment plan, even if it&#8217;s just a brief one. At a minimum, your deployment plan should include:</p>
<ul>
<li>Name and short purpose description of the project (seems obvious, but depending upon how widely this information is distributed and how big your organization is, your readers may not automatically know what you&#8217;re working on in order to tie your authorship back to what this proposed production deployment is all about)</li>
<li>Names of and contact information for the staff responsible for its development (particularly tech leads and project managers)</li>
<li>Source location (e.g., links into the SCM&#8217;s web interface or descriptions of how to retrieve the source for SCMs that don&#8217;t have a web interface)</li>
<li>List of affected systems/what resources will be used (i.e., which servers this code will be pushed to, are there any extra steps that have to take place, like running of database modification scripts or setting up of extra server software/new configuration for existing resources?)</li>
<li>Deployment and Rollback procedures (this may reference standard operating procedures in other documents if there is nothing out of the ordinary for the given deployment)</li>
</ul>
<p>A wiki is a common tool for this, but even an email to the right people or a mailing list can suffice. One advantage of a persistent document is that it can "grow" over time as the project evolves rather than being reconstructed from scratch on each deployment (i.e., things like project staff or the particular branch of the project&#8217;s source being used may change only infrequently, but dated/versioned logs might be kept of which revisions were rolled out when, or which revisions required an extra step, etc.). This is particularly important for continuous deployment environments (see next section). It is also easy to maintain a template like this, so that each one follows a common layout and helps staff remember frequently overlooked steps (e.g., helping developers remember to mention system software config changes required or database schema changes).</p>
<p>This may seem like unnecessary overhead (developers who truly enjoy writing documentation are not common), but retaining organizational knowledge about what is deployed where, when and why is crucial to keeping non-trivial systems running over time. Even if the original staff who deployed a system are still with the company (staff turnover being a fact of life for any organization), remembering precisely what was done and why potentially years after the fact is not an easy feat. The effort invested in this now will repay itself in the future.</p>
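<p>As a rough illustration of the template idea, the minimum fields listed above could be kept in a small structure that reports what has not been filled in yet. This is only a sketch: the class name, field names and sample values are all hypothetical, and a wiki page template would serve the same purpose.</p>

<pre>
<code>
from dataclasses import dataclass, field, fields


@dataclass
class DeploymentPlan:
    """Minimum deployment plan fields; every name here is illustrative."""
    project: str = ""
    purpose: str = ""
    contacts: list = field(default_factory=list)
    source_location: str = ""
    affected_systems: list = field(default_factory=list)
    deploy_procedure: str = ""
    rollback_procedure: str = ""


def missing_fields(plan):
    """Return the names of the fields that are still empty, in template order."""
    return [f.name for f in fields(plan) if not getattr(plan, f.name)]


# Hypothetical, partially completed plan.
plan = DeploymentPlan(
    project="storefront",
    purpose="Checkout redesign",
    contacts=["tech-lead@example.com"],
    source_location="prod-Q3-2011",
)
print(missing_fields(plan))
</code>
</pre>

<p>Running a check like this before a push is one cheap way to catch the frequently overlooked steps, such as a missing rollback procedure.</p>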

<h3>Step Two: Continuous Deployment<br/> or Phased Deployment?</h3>
<p>There are two common modes of deployment for web applications. The first models traditional software engineering by using phased deployment, where phases of release correspond to planned or scheduled bundles of additional features and new bug fixes. A variant of this is the "boxcar" or "feature train" model that ships a defined release on some set schedule ("If it&#8217;s ready in this six week window, it goes on the shipment train, if not, it waits for the next one."). This is common for environments that have rigorous quality assurance or change control requirements, as it allows a built in time period prior to each phase&#8217;s release for those processes to execute in a regular, repeatable fashion on a known schedule. In these environments it is common to "snapshot" the particular phase for deployment in some fashion, via something like a "branch" or "tag" as discussed in step zero. For example, a Q3 phased release for a system might have "prod-Q3-2011" as its source branch/tag. The state of the system so denoted might then further be used as the base for issue remediation hotfixes that must go live in between regular phased releases. For an automated deployment environment (see next section), the system would need to be aware of the correct current branch/tag to use as its deployment source (or perhaps offer the option of the currently available sources that match a given pattern to the user making the deployment).</p>
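<p>Offering the user only those sources that match a given pattern might look something like the sketch below. The "prod-*" pattern and the list of tag names are assumptions for illustration; how you obtain the list of available branches or tags depends entirely on your SCM.</p>

<pre>
<code>
import fnmatch


def matching_sources(available, pattern="prod-*"):
    """Return the branch/tag names matching a shell-style pattern, sorted by name."""
    return sorted(name for name in available if fnmatch.fnmatchcase(name, pattern))


# Hypothetical list, as it might be returned by the SCM.
tags = ["qa-Q3-2011", "prod-Q2-2011", "prod-Q3-2011", "prod-Q3-2011-hotfix1", "trunk"]
print(matching_sources(tags))
</code>
</pre>

<p>Plain name sorting is enough here only because the example names happen to sort chronologically; a real naming convention may need its own ordering rule.</p>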
<p>The second, and more recent development, is continuous deployment. With continuous deployment, new features or bug fixes may go live at any time. Some environments push live to every user at the same time, and others use a "feature flag" approach where a given user must have a given flag or set of flags in their active session or profile to be exposed to the new code.  Care must be taken for "feature flag" setups to ensure that tests of the system (see monitoring and verification section below) are using the correct flag or sets of flags to accurately capture the state of the system as the end users see it.</p>
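<p>In its simplest form, a "feature flag" check is just a membership test against the flags in the user&#8217;s session or profile. The sketch below assumes flags are stored as a set of strings; the flag name and function names are made up for illustration.</p>

<pre>
<code>
def is_exposed(feature, user_flags):
    """A user sees the new code path only if the flag is present for them."""
    return feature in user_flags


def checkout_page(user_flags):
    """Pick the code path for one request based on the user's flags."""
    if is_exposed("new_checkout", user_flags):
        return "new checkout flow"
    return "current checkout flow"


beta_user = {"new_checkout"}
regular_user = set()
print(checkout_page(beta_user))
print(checkout_page(regular_user))
</code>
</pre>

<p>The testing caveat falls straight out of this: a test that always runs with an empty flag set never exercises the new flow, so verification needs to run once per flag combination that real users can have.</p>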
<p>What I will say next has proven to be one of the more contentious parts of this essay in internal discussion, so I freely admit that this is a point worthy of further thought, particularly as time provides more evidence on the ways that continuous deployment works or fails.  I do not believe that continuous deployment systems should be configured such that the source for pushes to production machines (e.g. a branch or trunk or whichever nomenclature is appropriate for your environment) is the same space that developers initially check code into.  I am willing to stipulate that things like developer mindset and discipline in concert with automated checking scripts within the commit process may eliminate many sources of error that could be introduced in such an "insta-live" system, but I&#8217;m also a big believer in the power of Murphy.  Having some separation here, however low friction, should help prevent many errors (maybe something like a code review queue or holding pen that things go through before going to production, or development in branches with deploys drawn from trunk with many and small merges vs fewer large ones).  In my mind continuous deployment is more about release automation, scope of work per released quantum, and democratization of the release process combining to empower individuals to release quickly than about any particular SCM configuration litmus test.</p>
<p>Which of these approaches works best for your team may be dictated by the business/regulatory needs your application must satisfy, or may be limited only by the consensus of personal preferences involved. Generally speaking more conservative environments will trend toward phased deployment out of necessity.</p>

<h3>Step Three: Deployment Automation</h3>
<p>The easier you can make it to do the "right things" for your environment, the more likely people are to do them. What these steps might be will of course vary widely, but common examples may include things like moving files into place (static assets, interpreted code), compilation and movement/packing of resultant binaries (for compiled language environments), application of database changes, and so forth. Almost all of these various steps may be automated via some mechanism (e.g., via scripting languages either directly on a command line, or perhaps in more sophisticated environments an actual deployment manager standalone application). Particularly large, multi-server systems may choose to roll deployments out in stages to increasingly larger subsets of their total infrastructure, effectively using A/B testing with progressively larger portions of their active user base to check for problems as scale increases or for negative user experience feedback. This is much better done in as automated a fashion as possible, because the chance of simple errors grows with every manual interaction.</p>
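<p>A staged rollout of this kind can be sketched as follows. The stage fractions (5%, 25%, 100%) and the <code>deploy</code> and <code>healthy</code> callables are assumptions made for the example; in practice they would wrap whatever push mechanism and health checks your environment already has.</p>

<pre>
<code>
def rollout_stages(servers, fractions=(0.05, 0.25, 1.0)):
    """Split servers into batches so each stage reaches a larger cumulative share."""
    stages = []
    done = 0
    for fraction in fractions:
        # Always advance by at least one server, never past the end of the list.
        target = max(done + 1, round(len(servers) * fraction))
        target = min(target, len(servers))
        if target > done:
            stages.append(servers[done:target])
            done = target
    return stages


def deploy_in_stages(servers, deploy, healthy):
    """Deploy batch by batch; stop after the first batch that fails its health check."""
    completed = []
    for batch in rollout_stages(servers):
        for server in batch:
            deploy(server)
        completed.extend(batch)
        if not all(healthy(server) for server in batch):
            return completed, False
    return completed, True


servers = ["s%d" % i for i in range(20)]
print([len(batch) for batch in rollout_stages(servers)])
</code>
</pre>

<p>Stopping at the first unhealthy batch limits the exposure of a bad release to the servers already touched, which is the whole point of staging the rollout.</p>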
<p>As a quick example of a simple implementation of this kind of setup, several years ago I worked in a PHP-based environment which used the "qa" and "prod" CVS tags to tag particular revisions of various files as being suitable for deployment to a particular environment. When a developer (with the right access privileges) accessed the deployment manager web application, he could select which tag to deploy; the system would do a CVS checkout of that tag to a scratch area and then do an rsync command to move all of the code (and associated static assets) to the appropriate server(s). This radically reduced the overhead of deploying code, although it was not perfect in the sense that database and other system config changes still required the involvement of the relevant systems teams. A variant of this in use at another organization similarly depended on conventional branches for "qa" and "prod", but instead of using rsync would use ssh to invoke svn up commands directly on each affected machine (Apache was configured to deny access to .svn directories).</p>
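<p>The checkout-then-rsync flow described above might be scripted roughly as follows. This is a reconstruction for illustration, not the original system: the module name, scratch path and target hosts are hypothetical, the function only builds the command lines rather than running them, and it assumes CVSROOT is already set in the environment.</p>

<pre>
<code>
def deploy_commands(tag, module, scratch_dir, targets):
    """Build the commands for a tag-based deploy: one checkout, then one rsync per target."""
    # Check the tagged revisions out into a scratch area.
    commands = [["cvs", "checkout", "-r", tag, "-d", scratch_dir, module]]
    for target in targets:
        # Mirror the scratch area to each server, leaving CVS metadata behind.
        commands.append(
            ["rsync", "-a", "--delete", "--exclude", "CVS", scratch_dir + "/", target]
        )
    return commands


for command in deploy_commands(
    "qa", "website", "/tmp/deploy-qa", ["web1:/var/www/site/", "web2:/var/www/site/"]
):
    print(" ".join(command))
</code>
</pre>

<p>Each command list could then be handed to something like <code>subprocess.run(command, check=True)</code>. Keeping the command construction separate from execution makes it easy to offer a "dry run" view before anything touches a server.</p>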
<p>One tool that seems often overlooked in this area is use of OS-native software packaging mechanisms to distribute content and execute scripts required for the given change set. These scripts may be either tailored to the particular release, or may be general standard scripts that by convention draw data from named portions of the source being deployed (e.g. a "db/001.sql &#8230; N.sql" file set might be iteratively applied in order if they exist, or a "etc/001.patch &#8230; N.patch" set of patch files might be applied in a similar fashion). Use of this sort of packaging system will make it much quicker to verify that a given app is installed, what files are associated with it, whether any of those files have been modified, and so forth, and also makes installation/upgrade/removal far more automated. Another example for a Java-based system might be an OS-level package that contains the compiled WAR file and pre-/post-install scripts to invoke the correct application server steps to install or update the application.</p>
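<p>The "apply numbered files in order if they exist" convention is simple to implement in a standard script. The sketch below assumes the "db/NNN.sql" layout mentioned above and only works out the order; actually applying each file is left to whatever database client you use.</p>

<pre>
<code>
import re

# Matches db/001.sql, db/002.sql, ... and captures the number.
NUMBERED_SQL = re.compile(r"^db/(\d+)\.sql$")


def ordered_migrations(paths):
    """Return the db/NNN.sql paths sorted by number, ignoring everything else."""
    numbered = []
    for path in paths:
        match = NUMBERED_SQL.match(path)
        if match:
            numbered.append((int(match.group(1)), path))
    return [path for _, path in sorted(numbered)]


package_files = ["db/010.sql", "db/002.sql", "etc/001.patch", "db/001.sql", "db/README"]
print(ordered_migrations(package_files))
</code>
</pre>

<p>Sorting on the parsed number rather than the raw name means the order stays correct even if someone forgets to zero-pad a file name.</p>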
	
<h3>Step Four: Monitoring and Verification</h3>
<p>Being able to keep a real-time watch on your system&#8217;s performance and user behavior is extremely important during and after a deployment. If server errors surge after a push, clearly the deployment will need to roll back, but other more subtle failure modes may also be important (a change that leads to increased latency on the site might hurt conversion/activity rates of users, for example). Having a system in place to collect and monitor these technical and business metrics will go a long way toward increasing your assurance that a given deployment has not introduced any issues.</p>
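<p>The "errors surge after a push" case can be reduced to a comparison of error rates before and after the deployment. The sketch below is deliberately naive: the two-percentage-point threshold and the sample counts are invented for illustration, and a real monitoring system would look at more than one metric.</p>

<pre>
<code>
def error_rate(errors, requests):
    """Errors as a fraction of requests; no traffic counts as a zero rate."""
    return errors / requests if requests else 0.0


def should_roll_back(before, after, max_increase=0.02):
    """True when the post-deploy error rate exceeds the baseline by more than max_increase."""
    return error_rate(*after) - error_rate(*before) > max_increase


# Hypothetical (errors, requests) samples from equal windows around a push.
baseline = (12, 10000)
print(should_roll_back(baseline, (450, 10000)))
print(should_roll_back(baseline, (15, 10000)))
</code>
</pre>

<p>The same shape of check works for the subtler failure modes too, such as latency or conversion rate, as long as you collect a baseline before the push.</p>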
<p>In a related vein, having a suite of integration tests that you can run on production to quickly verify that all expected functionality is working at any given point in time can be extremely handy (so that you don&#8217;t have to wait for a user to stumble on the one out of the way use case that happens to now throw an error). This becomes particularly powerful in systems large enough that manual testing of the entire API/UI is inefficient. These integration tests must be distinguished from unit tests which are likely also part of your testing and deployment strategy, albeit at a more granular source-code level. In all cases, designing for modularity and testability will make your life much easier when it comes to verifying the behavior of your software, but that is a matter for another article.</p>
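<p>A minimal version of such a production check is a list of pages paired with text that must appear on each. In the sketch below the paths, expected text and the <code>fake_fetch</code> stand-in are all hypothetical; in real use <code>fetch</code> would wrap an HTTP client or a browser-driving tool.</p>

<pre>
<code>
def run_smoke_checks(checks, fetch):
    """Run each (path, expected_text) check through fetch and return the failing paths."""
    failures = []
    for path, expected_text in checks:
        status, body = fetch(path)
        if status != 200 or expected_text not in body:
            failures.append(path)
    return failures


# Stand-in for a real HTTP client so the sketch runs anywhere.
FAKE_SITE = {
    "/": (200, "Welcome"),
    "/login": (200, "Sign in"),
    "/reports": (500, "Internal Server Error"),
}


def fake_fetch(path):
    """Return (status, body) for a path, or a 404 for unknown paths."""
    return FAKE_SITE.get(path, (404, ""))


checks = [("/", "Welcome"), ("/login", "Sign in"), ("/reports", "Monthly totals")]
print(run_smoke_checks(checks, fake_fetch))
</code>
</pre>

<p>Passing <code>fetch</code> in as a parameter keeps the checks themselves independent of how the site is reached, so the same list can run against staging and production.</p>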
<p>The resources and further reading section below has links to a few different tools for both areas listed above. There are many other options, of course, so finding the best fit for your environment would be a matter of further research.</p>

<h3>Conclusion</h3>
<p>I hope this article has given you some insight into how to improve your deployment processes, with the goal being reduction in complexity and uncertainty related to making your system evolve to fit ever-changing business needs. The steps outlined above may be adopted/adapted to your organization in stages, but the more fully you adopt them the more synergistic benefits you will see. In all cases, the guiding principle should be to make it easier to do the right things for your environment and minimize end-user complexity. No matter what technology stack you are using, and no matter what type of application you are writing, getting deployment right can make the difference between going crazy from stress and having a happy, productive work day.</p>

<h3>Resources and Further Reading</h3>
<h4>Source Configuration Management Systems</h4>
<ul>
<li><a href="http://subversion.apache.org/"><span>Subversion</span></a> -- A common centralized SCM used by many organizations; free software.  Quality books covering "svn" are available from several publishers, and some are available online freely as well, e.g. <a href="http://svnbook.red-bean.com/"><span>the red bean svn book</span></a></li>
<li><a href="http://git-scm.com/"><span>Git</span></a> --  An increasingly popular distributed SCM, used by large projects such as the Linux kernel; free software.  As with svn above, git has several good texts in print and some are available online e.g. <a href="http://progit.org/book/"><span>Pro Git</span></a></li>
<li><a href="http://trac.edgewall.org/"><span>trac</span></a> -- A web interface to several common SCMs (svn, git, etc.); integrates a ticket management system and wiki as well as source browser, free.</li>
<li><a href="http://mtrack.wezfurlong.org/"><span>mtrack</span></a> -- Similar to trac but with several enhancements, e.g. native ability to handle multiple projects per single install (trac as shipped is intended to have one instance per managed project)</li>
</ul>

<h4>Deployment Planning</h4>
<ul>
<li><a href="http://www.dokuwiki.org/dokuwiki"><span>dokuWiki</span></a> --  A common and full-featured wiki; free. PHP based so anything supporting that (apache or similar on Unix, IIS on Windows, etc.) should at least have a good chance of running it.</li>
</ul>

<h4>Deployment Automation</h4>
<ul>
<li> Scripting Languages&#8202;&#8212;&#8202;This will greatly depend on your environment, but almost any enterprise computing platform these days will have some sort of scripting mechanism, e.g. <a href="http://perl.org"><span>perl</span></a>, <a href="http://python.org"><span>python</span></a>, <a href="http://ruby-lang.org"><span>ruby</span></a>, etc.  (Windows versions in particular of things like perl and python may be obtained from <a href="http://activestate.com"><span>ActiveState</span></a> both freely and with support contracts.)</li>
<li><a href="http://rsync.samba.org/"><span>rsync</span></a> -- an intelligent method of syncing files between two computers, free software.</li>
<li><a href="http://rubyhitsquad.com/Vlad_the_Deployer.html"><span>Vlad the Deployer</span></a> --  a free, ruby-based deployment automation system. I&#8217;ve seen this used in-house in concert with additional development in ruby to produce Solaris and CentOS packages automatically as well as rolling them out to the target systems.</li>
<li><a href="http://rubyonrails.org"><span>Ruby on Rails</span></a> as a system deserves credit for thinking about deployment automation more than many other frameworks, e.g. database migrations, deployment managers like Capistrano, and dependency management via Bundler.</li>
<li>Your chosen operating system&#8217;s package management documentation; generally speaking any enterprise grade server operating system will have some sort of package management and documentation/guides will exist for how to make/maintain packages for that system.</li>
<li>Cloud-based deployments are another special case, as many "cloud" infrastructures are themselves scriptable to allocate/deallocate additional resources, making another level of potential automation as well as simply managing the deployment of code and config changes.  An example of this is <a href="https://github.com/nimbul"><span>Nimbul</span></a> from the New York Times (centered around Amazon&#8217;s set of elastic/cloud services).</li>
</ul>

<h4>Monitoring and Verification</h4>
<ul>
<li><a href="http://seleniumhq.org/"><span>Selenium</span></a> --  Selenium is a way to record and then play back web application interactions via browser, and is useful when constructing behavioral/integration tests to verify a site&#8217;s functioning.</li>
<li><a href="http://nagios.org"><span>Nagios</span></a> -- Commonly used infrastructure monitoring tool, can be a bit of a bear to set up the first time; free.</li>
<li><a href="http://www.cacti.net/"><span>Cacti</span></a> -- A graphing and trending application, free.</li>
<li><a href="http://circonus.com/"><span>Circonus</span></a> --  Circonus takes the setup and maintenance hassles out of monitoring and trending, available as a service.</li></ul>
]]></content:encoded>
            <pubDate>Tue, 01 Nov 2011 18:58:04 GMT</pubDate>
        </item>
        <item>
            <title>Your ORM Sucks</title>
            <link>http://omniti.com/seeds/your-orm-sucks</link>
            <guid>http://omniti.com/seeds/your-orm-sucks</guid>
            <description><![CDATA[I don&#8217;t like frameworks. Web application frameworks, ORMs, whatever.

I don&#8217;t mean that as harshly as it probably sounds. It&#8217;s something like saying, "I don&#8217;t like cooking with microwaves." They have their uses, certainly - I&#8...]]></description>
            <content:encoded><![CDATA[<p>I don&#8217;t like frameworks. Web application frameworks, ORMs, whatever.</p>

<p>I don&#8217;t mean that as harshly as it probably sounds. It&#8217;s something like saying, "I don&#8217;t like cooking with microwaves." They have their uses, certainly - I&#8217;m not going to scrub out a pan in the morning because I want to make oatmeal, for example - but there are limits to what they can do, and I think there&#8217;s a reluctance or inability to recognize that. I&#8217;m certainly not above nuking a pile of Bagel Bites, but I don&#8217;t tell myself that it&#8217;s haute cuisine.</p>

<p>Granted, like any framework, ORMs certainly have their uses, and most projects will benefit from using them in some capacity. No one likes writing the same boring INSERT, UPDATE and DELETE statements for every table. They enforce consistency - you essentially don&#8217;t have a choice about naming conventions or class structure anymore, so you can&#8217;t screw them up. They usually maintain relationships from the database as part of the code. Some have their own internal query cache. It&#8217;s usually easy to extend them. Unfortunately, based upon my cursory research, there appears to be at least one attempt at a MongoDB ORM, so I can&#8217;t use "don&#8217;t support nosql" as a point in their favor anymore.</p>

<p>So by all means, use an ORM every time. By which I mean that every repository should probably have one, and emphatically not that it should be used for every query. Because whatever the tool at hand: Zend Framework, Class::ReluctantORM, a microwave&#8202;&#8212;&#8202;there always ends up being a place where it doesn&#8217;t work, or doesn&#8217;t work very well, and you&#8217;re forced to do things the old-fashioned way. Sometimes the simple solution really is the best. Why would you bootstrap Zend and load up a bunch of model classes just to import some records? You can do that with a DBI handle and a perl script. Or flat files and sed/awk, probably. To some extent this is just a matter of opinion, and that&#8217;s fair, but there are situations where my way - the ugly, hacky way - is objectively and demonstrably better. Not always. But sometimes.</p>

<p>By Way of Example</p>
<ul>
<li>In what might be the canonical case of "Why Would You Do This", consider a listing of articles with headlines and perhaps publication dates, with the titles linking to individual article pages containing the full text. To render this list, the ORM-written query was selecting all fields from the table. Because that&#8217;s all it knew how to do; you ask for a list of articles and you get those articles, with no thought put into why you want them, or which fields you need, because it&#8217;s a generic tool and that&#8217;s all it knows how to do. Sometimes I think it helps to think of ORMs as the dumbest programmer you&#8217;ve ever worked with. Think of the query that guy would write, and that&#8217;s probably similar to the inefficient unreadable gloop you&#8217;re getting from the machine generation.</li>

<li>A three-layer navigation menu, with almost all the items on it determined by what was, or wasn&#8217;t in the database. After spending a few hours untangling what the thing was doing, it was something like this:

<pre>
<code>
select('e.event_id, e.name, e.url_name, i.url, 
 i.title, tr.title,  a.article_id, a.title, 
 ae.article_event_id, rg.title, rv.title,
 rm1.related_media_id, rm2.related_media_id, i.sort_order')
->from(__CLASS__ . ' e')
->leftJoin('e.info_page i ON e.event_id = i.event_id 
 AND i.is_deployed IS TRUE AND i.pub_date <= NOW()')
->leftJoin('e.tour_results tr')
->leftJoin('e.articles ae')
->leftJoin('ae.article a ON a.article_id = ae.article_id 
 AND a.pub_date <= NOW() AND a.is_highlight IS TRUE 
 AND a.is_deployed IS TRUE')
->leftJoin('e.related_media rm1')
->leftJoin('rm1.photo_galleries rg 
 ON rg.photo_gallery_id = rm1.media_id 
 AND rm1.media_type = 'photo_gallery' 
 AND rg.is_highlight IS TRUE 
 AND rg.status = 1 AND rg.pub_date <= NOW()')
->leftJoin('e.related_media rm2')
->leftJoin('rm2.videos rv ON rv.video_id = rm2.media_id 
 AND rm2.media_type = 'video' AND rv.is_highlight IS TRUE 
 AND rv.status = 1 AND rv.pub_date <= NOW()')
->where('e.instance_id = ?', array(&#8230;))
->andWhere('e.deployed IS TRUE')
->orderBy('e.start_date ASC');
</code>
</pre>

This thing took around a second and a half to build and run the query, and returned 250 or so rows from the database. Then it took <i>30 more seconds</i> to parse it all into a nested structure of PHP objects. And for all that, the developer still had to write most of the SQL themselves. Given that it made an entire section of the site unusable, and that the replacement, hand-wrought query (for all of its faults) didn&#8217;t, I&#8217;m content to throw our ORM under the bus here.</li>

<li>A page to view poll results in a CMS admin. Either the ORM didn&#8217;t support anything as simple as "SELECT COUNT(*) FROM answers GROUP BY answer_id", or the person who wrote it didn&#8217;t think it was a problem to select 80,000 rows and then have PHP parse them into objects. Frankly I&#8217;m not thrilled by either alternative, and as you can probably guess, this thing ran out of memory and barfed on a pretty regular basis.</li>
</ul>
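<p>The poll results case is easy to reproduce in miniature. The sketch below uses an in-memory SQLite database with a made-up <code>answers</code> table (the column names and rows are assumptions) to contrast counting in application code with letting the database do the aggregation.</p>

<pre>
<code>
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE answers (answer_id INTEGER, voter TEXT)")
conn.executemany(
    "INSERT INTO answers VALUES (?, ?)",
    [(1, "a"), (1, "b"), (2, "c"), (1, "d"), (3, "e")],
)

# Counting in application code: every row is fetched and handled one by one.
in_app = {}
for (answer_id,) in conn.execute("SELECT answer_id FROM answers"):
    in_app[answer_id] = in_app.get(answer_id, 0) + 1

# Counting in the database: only one row per answer comes back.
in_sql = dict(
    conn.execute("SELECT answer_id, COUNT(*) FROM answers GROUP BY answer_id")
)

print(in_app == in_sql)
</code>
</pre>

<p>Both approaches give the same totals on five rows. The difference only shows up at something like 80,000 rows, where the first version has to build an object per vote and the second still returns a handful of rows.</p>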

<p>The root of the problem (as with most problems) is not thinking critically, not being aware that all this magical query dust doesn&#8217;t come cheap.</p>

<p>You have to use the right tool for the job. It&#8217;s not uncommon for the balance between generic, easy to use and quick to develop, and bespoke, laborious and highly performant, to tilt sharply towards the latter. The pain in the ass here is that it&#8217;s not unusual for this sort of problem to lie dormant on a dev dataset (a dozen rows per table and just enough information to test out edge cases), and then one day rear up and slow your pages to a crawl or blow them up entirely, as soon as you hit real data. It&#8217;s up to the developer to have some notion about seeing this coming. Even then, everyone gets bitten by this from time to time.</p>

<p>What it boils down to is that if you write a bad query, one that does "SELECT * FROM tbl_huge LEFT OUTER JOIN tbl_big_mclarge" and returns an unnecessarily wide data set full of BLOBs, or that joins across a dozen tables when it only really needs 3, or that has a big stupid slow "SELECT &#8230; FROM &#8230; WHERE NOT IN (SELECT &#8230;)", or that tries to run a SQL "COUNT" in PHP, and it becomes a problem, it is your fault. I don&#8217;t care if you wrote the thing yourself, or if you used an ORM and <i>it</i> wrote the query, <i>it&#8217;s still your fault, and you are going to have to fix it</i>. "But that&#8217;s the way the product does it" is not an acceptable response. Ever. For anything. Code is running on your servers. You are responsible for it. A microwave makes wretched chicken, so I guess it&#8217;s time you learned how to work the stove, because I&#8217;m not eating that crap.</p>

<p>So by all means, use ORMs for your trivial cases, for basic stuff or where performance isn&#8217;t an issue. But it&#8217;s eventually going to hit a wall and you&#8217;ll have to do your own dirty work. And when that happens, you can&#8217;t say you weren&#8217;t warned.</p>]]></content:encoded>
            <pubDate>Fri, 21 Oct 2011 20:06:52 GMT</pubDate>
        </item>
        <item>
            <title>The Opportunity of Crises</title>
            <link>http://omniti.com/seeds/the-opportunity-of-crises</link>
            <guid>http://omniti.com/seeds/the-opportunity-of-crises</guid>
            <description><![CDATA[Nobody likes a crisis; they are difficult, troubling and sometimes dangerous. For most of us in web operations, the chances are slim that a crisis will be truly life-threatening, but when millions of dollars are on the line it can feel like a pressure ...]]></description>
            <content:encoded><![CDATA[<p>Nobody likes a crisis; they are difficult, troubling and sometimes dangerous. For most of us in web operations, the chances are slim that a crisis will be truly life-threatening, but when millions of dollars are on the line it can feel like a pressure cooker and have a negative impact on lifestyle, relationships and mental health. Most companies go to extreme lengths to avoid crises, and even when one does occur, the typical response is to first deal with it, and subsequently pretend it never happened. As if the memories are too painful to discuss, we avoid the topic altogether; talking to your customers about it is quite risky.  It is probably best not to mention it at all; you should just move on. Unfortunately, for most organizations, reacting like this means missing a grand opportunity to make your company better.</p>

<p>Like any organization, we are always <a href="http://omniti.com/is/hiring"><span>on the lookout for new talent</span></a>. Of course, you want people who are "smart and get things done", but beyond that, I have found one particular personality trait to be critical to long term success at OmniTI: the ability to stay calm in a crisis. While I tend to think OmniTI does well in avoiding them, we do have a tendency to <a href="http://omniti.com/does"><span>attract customers with a lot on the line</span></a> and, with a critical mass of such customers, we are no strangers to crises. While it is pretty clear to me that composure under stress is a fundamental requirement in high-stakes jobs (like large-scale web operations), I think it is generally helpful in any situation.  Contingency planning can only get you so far, and when your packets are spilling all over the floor you need to keep a clear head about you to make sure you can assess and remediate as if you&#8217;ve done so since kindergarten. If you can&#8217;t remain calm, the situation can deteriorate quickly. Turning to the blame game before solving the problem at hand is a sure sign of such deterioration. You must fight that urge. If you can&#8217;t, your team can&#8217;t be as open with communications as you need them to be, and your recovery time will suffer. Be upfront about how you want your teams to respond, ideally before problems arise. There are real crises in the world where people die at the hands of companies, and walking through one of these exercises can be humbling and enlightening. James Lukaszewski takes us through a "Death by Burger" scenario in his <a href="http://www.e911.com/monos/A001.html"><span>Seven Dimensions of Crisis Communication Management</span></a>, and outlines positive and negative ways that a company can respond to such an incident.</p>

<p>That said, resolving a crisis should not only be about solving the problem at hand. When calamities occur, it&#8217;s important to recognize that your company has an opportunity for introspection. What is it about your processes that led to the crisis you&#8217;ve just survived? Do your process and tool chains do everything they need to do? Don&#8217;t just determine whether they work; ask whether they do the job in the way you would like it to be done. Seldom will you arrive at good answers to these questions through the normal course of business. Even if you think failure is human (perhaps especially when you think so), it&#8217;s important to understand what processes failed or what information was unavailable that led to this human error. That information is crucial because, in most cases, the people on your team are acting in a manner they think is safe and appropriate&#8202;&#8212;&#8202;and in the best interest of the company. The knee-jerk reaction in these cases is often a summary dismissal, but that will often leave you with the core issue unaddressed: they thought their actions were acceptable. If you fail to gain an understanding of the underlying causes, this bleak episode is likely to become a rerun; either with a new employee, or perhaps with an existing team member who also doesn&#8217;t understand where the appropriate lines need to be drawn.</p>

<p>One thing I believe is very helpful is to look at how others handle these things. The Internet is new, Web Operations even newer; but crisis management and postmortem analysis are not. Quite often I see people lay blame on the wrong people or processes during (and even after) a failure. Ideally you should not be trying to lay blame at all, but instead figure out where improvements are needed. Many people mistakenly assume that crises are born out of mistakes; often they are not. As businesses grow over time, it&#8217;s easy for plans that were once appropriate to become inadequate. You need to look at your systems holistically. For folks in Web Operations, a healthy understanding of <a href="http://www.ctlab.org/documents/How%20Complex%20Systems%20Fail.pdf"><span>why complex systems fail</span></a> can help you gain a better perspective.</p>
 
<p>If you are running a team, you owe it to yourself to turn crises into dialogue. If your customers were affected, be honest with them about where things went wrong, and why what you did was the appropriate thing or how you plan to adjust course going forward. Be careful not to overreact; the goal should not be to add more process, but rather to improve process. Your next crisis is your next great opportunity to learn more about your organization and to strengthen it for the future. Don&#8217;t miss it!</p>]]></content:encoded>
            <pubDate>Mon, 12 Sep 2011 21:56:46 GMT</pubDate>
        </item>
        <item>
            <title>Security Is Not a Feature, It&#039;s a State of Mind</title>
            <link>http://omniti.com/seeds/security-is-not-a-feature-its-a-state-of-mind</link>
            <guid>http://omniti.com/seeds/security-is-not-a-feature-its-a-state-of-mind</guid>
            <description><![CDATA[ "Is our site secure?"

That is never a question you want to hear when launching a new website. And it is also an impossible question to answer. The technical definition of security is "the state of being free from danger or injury"; but you can never ...]]></description>
            <content:encoded><![CDATA[ <h3>"Is our site secure?"</h3>

<p>That is never a question you want to hear when launching a new website. And it is also an impossible question to answer. The technical definition of security is "the state of being free from danger or injury"; but you can never protect anything perfectly all the time. So the question should be, "Is our site secure enough?"</p>

<p>And it is never a question that should be asked at launch time. Security must be part of the planning, part of the programming, part of the testing, part of the deployment process and, finally, part of the monitoring and upkeep of a site. It should be part of every stage of development. However, "security at every level" doesn&#8217;t have to cost extra time or money, and in fact it shouldn&#8217;t. If "site security" means millions of dollars or weeks of extra work, then the problem lies with the website development process and its creators. When you start treating security as a "feature" instead of a necessary part of your development process, it becomes a resource-eating monster.</p>

<p>Far too many IT professionals, managers and others involved in creating new sites view security as a last-minute feature in the push to "get it out fast." In fact, generally the argument for not attending to it is: "We can&#8217;t waste time or money on security features; we have to get this site launched!"</p>

<p>You won&#8217;t have a site, or a business, for long if someone manages to retrieve your entire database of personal information or credit cards. Like most things in life, a balancing act is required. Defining what security is exactly, and what it means for a site is always the hard part. What does "security" mean in a website or a web application? It means being defensive.</p>

<p>"Defensive driving" is a term thrown at every student in driver training. It means to drive as if every other person on the road were an idiot trying to hit you, because the majority of them are less-than-fantastic drivers and being aware of the danger is half of the solution.</p>

<p>Any developer working on a website should be thinking in the same manner: Every user is an idiot trying to break the site. However, the reaction to that constant danger should be equal to the needs of the website. When driving on a sunny, dry road in broad daylight, a driver can be far less diligent than when driving on a wet road, in a blizzard or in the dark. The conditions of the road are going to affect stopping distance, maneuverability and the ability to avoid hazards. The amount of diligence needed for a website should be equally tailored to environmental conditions. An e-commerce site has far different needs than a social networking site, or a fan site for an author or artist.</p>

<p>Having a plan&#8202;&#8212;&#8202;from the beginning&#8202;&#8212;&#8202;for the important issues with the site is a necessary first step. Implementing the plan as part of your general process shouldn&#8217;t be the end of the line, however. The other critical piece of the puzzle is ongoing maintenance. Sites and audiences change, and those changes will mean new challenges. Proper monitoring and maintenance of a site is part of the process of security.</p>

<p>Knowing a site&#8217;s operating environment and type of users will help to define what security measures are needed up front, eliminating the problems inherent with trying to "bolt on" security after the fact. Even a general overview of what kind of information a site is going to collect and distribute is enough to have an idea of what kind of audience that site will attract. It is far easier to leave room for future security enhancements than to try to plug holes in an existing system.</p>

<p>So take the time to sit down before you start creating the site and answer some of the following questions.  Record the answers in a document and put it with your code so you can refer back to the answers.</p>
<ol>
<li>What kind of data am I going to be collecting and storing?
<ol style="list-style-type: lower-alpha">
<li>Basic Information (Names and email addresses)</li>
<li>Personal Information (Phone numbers, physical addresses)</li>
<li>Asset Information (Credit Card numbers, bank information)</li>
<li>Identifying Information (SS#, Driver&#8217;s license numbers)</li>
<li>Business Information</li>
<li>Medical Information</li>
</ol>
</li>
<li>What kind of physical system am I going to be using and who has access?  This includes backups.</li>
<li>What kind of software am I going to be using and how will it be maintained?</li>
<li>What kind of ongoing system will be put in place to maintain the system and data?</li>
</ol>

<p>These questions will give you an excellent idea of how much concern for security your site will warrant.  The higher the level of information collected, the greater security you&#8217;ll need.  The less control you have over the physical systems in place, the more diligent your security measures need to be.  The less control you have over the software in place, the more security measures you may need to put in place. If you have little budget for ongoing monitoring, you&#8217;ll need to invest more in automating more security measures up front.</p>

<p>Remember that no matter what kind of site you are creating, the basics can never be ignored.</p>
<ol>
<li>Keep your software up to date with security fixes</li>
<li>Validate all input</li>
<li>Escape all output</li>
<li>If you&#8217;re dealing with something sensitive, use SSL for logins (the industry is showing signs of adopting SSL for everything)</li>
<li>Use sftp or scp or at the very least ftps for transferring files from your server</li>
<li>Regenerate a user&#8217;s session when access permissions change</li>
<li>Validation should always be done server side, even if you have JavaScript checks</li>
</ol>
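<p>As a rough illustration of "validate all input" and "escape all output" from the list above, here is a minimal Python sketch. The username rule and the function names are assumptions made up for this example, not part of any particular framework; a real site would define its own rules for each field.</p>

```python
import html
import re

# Hypothetical rule for this sketch: 3-20 letters, digits, or underscores.
USERNAME_PATTERN = re.compile(r"[A-Za-z0-9_]{3,20}")


def validate_username(raw):
    """Return the username if it matches the allow-list pattern, else raise ValueError."""
    if not isinstance(raw, str) or USERNAME_PATTERN.fullmatch(raw) is None:
        raise ValueError("invalid username")
    return raw


def render_comment(author, comment):
    """Escape untrusted values before they are placed in an HTML page."""
    return "{}: {}".format(html.escape(author), html.escape(comment))
```

<p>The validation runs on the server regardless of any checks in the browser, and the escaping happens at output time, so a value that slipped past validation still cannot inject markup.</p>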

<p>If security becomes part of your state of mind at every step along the way, instead of a last-minute, add-on feature, you&#8217;ll never have to answer the question "is our site secure?" because you&#8217;ll always be aware that it is secure enough.</p>]]></content:encoded>
            <pubDate>Thu, 11 Aug 2011 16:53:49 GMT</pubDate>
        </item>
        <item>
            <title>When things go wrong - a case study</title>
            <link>http://omniti.com/seeds/when-things-go-wrong-a-case-study</link>
            <guid>http://omniti.com/seeds/when-things-go-wrong-a-case-study</guid>
            <description><![CDATA[Theo Schlossnagle is very fond of pointing out that in operations, you can
never succeed in fulfilling expectations.

"Operations crews are responsible for the impossible: it must be up and
functioning all the time. This is an expectation that one can ...]]></description>
            <content:encoded><![CDATA[<p>Theo Schlossnagle is very fond of pointing out that in operations, you can
never succeed in fulfilling expectations.</p>

<p>"Operations crews are responsible for the impossible: it must be up and
functioning all the time. This is an expectation that one can never exceed."
(<a href="http://omniti.com/seeds/instrumentation-and-observability"><span>Instrumentation and observability</span></a>)</p>

<p>So, this article is about a time when things went wrong. It&#8217;s not about an
emergency situation where services were down, but more a subtle issue that
almost went unnoticed. We will review how the issue was detected, how it was
fixed&mdash;and most importantly&mdash;how a root cause was determined.</p>

<p>This issue affected a production website for one of OmniTI&#8217;s clients. They
had three web servers, all connected through a front-end load balancer
appliance.  (There were database servers as well, but they aren&#8217;t relevant for
this story.) Like any good (or even half decent) load balancer, it checks
often (every few seconds) to ensure that the web servers are up and
serving web pages. If the web server appears down or isn&#8217;t responding, then
the load balancer stops directing traffic to that web server. This gives you a
measure of redundancy in addition to load balancing, if you have enough web
servers to cover the incoming requests even with some out of commission.</p>
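<p>The health-check behavior described above can be sketched in a few lines of Python. This is a simplification under stated assumptions: it only tests whether a TCP connection can be opened, whereas the appliance in this story ran its own HTTP monitor, and the function names are invented for the example.</p>

```python
import socket


def is_serving(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def servers_in_rotation(servers, check=is_serving):
    """Return only the (host, port) pairs whose most recent check passed."""
    return [server for server in servers if check(*server)]
```

<p>Calling <code>servers_in_rotation</code> every few seconds and directing traffic only to the servers it returns gives the redundancy described above: a server that fails a check drops out until a later check passes.</p>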

<p>The problem was uncovered by chance when working on the load balancer. We
spotted that the load balancer was misdetecting that a server was down, taking
it out of service, and a few seconds later on the next check, it would bring
the server back into the rotation. This cycle repeated over and over, with
each of the web servers being taken out of service for a short period. During
this time, the site was still available: at least one of the servers was
continually in service. In addition, our monitoring showed that everything
appeared to be OK, both the external checks against the main website, and the
checks against the individual web servers.</p>

<p>Having discovered the issue, we began troubleshooting. One of the first
things to look at with any issue is the log files. When set up properly, logs go
a long way in telling you what is going on; the hard part is figuring out
which logs have the information you need.</p>

<p>The first log file we checked was the load balancer log. It had entries
that looked like the following, that corresponded with the service failures:
</p>

<p><code>Monitor_http302_of_foowidgets-www1:http(192.168.1.51:80): DOWN; Last
response: Failure - TCP syn sent bad ack received with fin</code></p>

<p>So, according to the load balancer, the reason for the failure is 'TCP syn
sent bad ack received with fin'. The error message is highly technical and not
particularly helpful.</p>

<p>Here&#8217;s a quick (and incomplete) overview of TCP to explain what that
means:</p>

<p>When you open a connection, packets are sent back and forth with various
flags set - the relevant flags here are SYN, ACK, and FIN. The opening
sequence goes something like:</p>

<img alt="when-things-go-wrong_diagram-1.png" src="http://images.omniti.net/omniti.com/i/b/when-things-go-wrong_diagram-1.png" width="448" style="margin-bottom: 1em;" />

<p>And to close the connection:</p>

<img alt="when-things-go-wrong_diagram-2.png" src="http://images.omniti.net/omniti.com/i/b/when-things-go-wrong_diagram-2.png" width="448" height="288" style="margin-bottom: 1em;" />
<p>The explanation for the "syn sent bad ack received with fin" error is
likely to be:</p>

<img alt="when-things-go-wrong_diagram-3.png" src="http://images.omniti.net/omniti.com/i/b/when-things-go-wrong_diagram-3.png" width="448" height="288" style="margin-bottom: 1em;" />

<p>At this point, the load balancer gets very upset and sulks in the corner
(well, it prints the weird log message). I&#8217;m guessing that the bad ACK/FIN is
probably from a previous connection, but at a high level: "Something weird is
going on with networking".</p>

<p>When things are screwy with the networking, you need to look in detail at
what is going on across the network and try to work out what&#8217;s going wrong.
The tools to do this are tcpdump and wireshark.</p>

<p>The load balancer is an appliance with its own custom software, but
underneath it&#8217;s just Unix. You can get a shell and run tcpdump to see what is
going across the network. Wireshark is essentially a graphical version of
tcpdump and is used here to analyze the network traffic.</p>

<p>I grabbed all of the traffic, opened it with wireshark, and limited the
view to just the traffic going to the web servers exhibiting the problem. The
wireshark filter is:</p>

<pre><code>
ip.addr == 192.168.1.254 &&
    ( ip.addr == 192.168.1.51 ||
      ip.addr == 192.168.1.52 ||
      ip.addr == 192.168.1.53 ) &&
    tcp.port == 80
</code></pre>

<p>The 192.168.1.254 IP is the load balancer, and 51-53 are the web servers.
</p>

<p>The following is the output of two complete HTTP transactions on the
monitors:</p>

<img alt="when-things-go-wrong_screenshot.png" src="http://images.omniti.net/omniti.com/i/b/when-things-go-wrong_screenshot.png" width="989" style="margin-bottom: 1em;" />

<p>There is no SYN sent with bad FIN/ACK; everything is green and looks just
fine. So I rechecked the load balancer console, and it showed eight checks
went out in the same time frame as the tcpdump. However, I saw only two in the
previous tcpdump. Something wasn&#8217;t right.</p>

<p>At the same time that this was going on, I was in touch with support for
the load balancer vendor. They were very helpful (in the sense that they did
everything they could without actually getting to the bottom of the problem),
asking for several tcpdump traces, and even escalating to their engineering
team. At this point we were convinced that the monitor that checks whether the
server is down was broken, and reporting the server down, when it wasn&#8217;t.
Unfortunately, none of this shed any light on the underlying issue.</p>

<p>We had also checked the usual culprits:</p>

<ul>
    <li>Checked that the backend servers were up</li>
    <li>Checked the physical cables. Other services were transiting the same physical path and they were fine.</li>
    <li>Ran tcpdump on the server&#8217;s network interface that linked it to the load balancer; this showed the same thing as the dump from the load balancer.</li>
    <li>Checked the configuration of the monitor on the load balancer. Other services were using an identical configuration without issues.</li>
</ul>


<p>Then, the crucial discovery:</p>

<ul>
    <li>Not enough checks were being sent out (we spotted this before)</li>
    <li>The support representative casually mentioned seeing traffic going out
    of the load balancer through another MIP. He thought maybe some of the
    checks were going out of this other IP.</li>
    <li>None of us realized the significance of this at the time, and a couple
    of days went by&mdash;with support convinced there was an issue with the
    backend servers, and me running as many checks as I could to try to
    prove/disprove that the backend servers were an issue.</li>
</ul>


<p>Here&#8217;s a bit of explanation of what was going on:</p>

<p>The load balancer has 3 different types of IPs (simplifying a little):</p>
<ul>
    <li>VIP - Virtual IP</li>
    <li>MIP - Mapped IP</li>
    <li>SNIP - Subnet IP</li>
</ul>


<p>A VIP is an IP upon which you run your virtual servers. These are what
client traffic hits.</p>

<p>A MIP is described as: "You use MIP addresses to connect to the backend
servers." (from the vendor&#8217;s knowledge base).</p>

<p>A SNIP is described as: ". . .an IP address that enables you to access a
load balancer appliance from an external host that exists on another subnet."
(from the vendor&#8217;s knowledge base).</p>

<p>From the explanation above, it makes sense that you would configure a MIP
to connect to the backend server. This is what we did when originally setting
up the load balancer, and it turned out to be completely the wrong thing to
do, although it did work for a while.</p>

<p>Some more explanation - Multihomed networking 101:</p>

<ol>
    <li>A server has two interfaces on two different subnets - 192.168.1.0/24 and 192.168.2.0/24.</li>
    <li>The server wants to send a packet out to 192.168.2.10. To do this, it looks up the address in the routing table and sees that it should send the packet out of the second interface.</li>
    <li>Also, the source IP of the packet sent out is the IP address that is associated with the second interface. For example: 192.168.2.2.</li>
</ol>

<img alt="when-things-go-wrong_diagram-4.png" src="http://images.omniti.net/omniti.com/i/b/when-things-go-wrong_diagram-4.png" width="448" height="270" style="margin-bottom: 1em;" />
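<p>The conventional source-address selection in the three steps above can be sketched in Python with the standard <code>ipaddress</code> module. The address 192.168.1.2 for the first interface is an assumption (the text only names 192.168.2.2), and the function is a simplified illustration, not how any particular operating system implements routing.</p>

```python
import ipaddress

# 192.168.2.2 comes from the example above; 192.168.1.2 is an assumed
# address for the first interface, which the text does not name.
INTERFACES = [
    ipaddress.ip_interface("192.168.1.2/24"),
    ipaddress.ip_interface("192.168.2.2/24"),
]


def pick_source_ip(destination, interfaces=INTERFACES):
    """Return the local address on the same subnet as destination, or None."""
    dest = ipaddress.ip_address(destination)
    for iface in interfaces:
        if dest in iface.network:
            return str(iface.ip)
    return None  # a real host would fall back to its default route here
```

<p>With these interfaces, a packet for 192.168.2.10 is given the source address 192.168.2.2, matching step 3.</p>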

<p>If you&#8217;re familiar with networking, the above explanation should sound
pretty straightforward. The problem is, the MIPs on the load balancer don&#8217;t
work like that. Things work the same for steps 1 and 2 above, but instead of
matching the source IP address of the packet to the interface it&#8217;s sending out
on, it just picks one IP from the list of available IP addresses:</p>

<img alt="when-things-go-wrong_diagram-5.png" src="http://images.omniti.net/omniti.com/i/b/when-things-go-wrong_diagram-5.png" width="448" height="270" style="margin-bottom: 1em;" />

<p>Now, to be fair to the load balancer vendor, this is the correct behavior
for MIPs when you read more into what they are for. They&#8217;re 'last resort'
source IPs when nothing else is suitable (i.e., you don&#8217;t have a matching IP
on the same subnet). Because it&#8217;s a 'last resort' IP, it just picks one.</p>

<p>We had 3 different MIPs at the time, one for an external network, one for
the client network, and one for our internal network. This meant that fully
two-thirds of traffic from the load balancer was getting sent out from the
wrong IP:</p>

<ul>
    <li>192.168.1.254</li>
    <li>192.168.2.254 - wrong network</li>
    <li>1.2.3.254 - external network</li>
</ul>

<p>Believe it or not, this shouldn&#8217;t have mattered, and in fact didn&#8217;t matter
for most of our services. The default route of the server was through the load
balancer - it had to be to answer client requests, which came from external IP
addresses.</p>

<p>However, by a horrible quirk of routing, on the backend web servers,
192.168.2.X was set to go out on a different interface, and traffic wasn&#8217;t
getting sent back to the load balancer, meaning 1 in 3 monitor responses
weren&#8217;t getting sent back.</p>

<p>This also meant that each web server was not serving traffic 33 percent of
the time and we were effectively running off of two web servers. If the right
combination of monitors went off, all three servers could be taken out of the
rotation.</p>

<p>The temporary fix was to make sure that 192.168.2.0/24 went out via the
load balancer. A single-command fix gave a 50 percent capacity boost. The fix
was just in the nick of time, too &mdash; just four days after I made the fix, the
site was featured on the front page of msn.com and we got the biggest traffic
spike ever in its history.</p>
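<p>The figures quoted above (1 in 3 checks lost, and a 50 percent capacity boost) can be checked with a short Python sketch. It assumes the load balancer cycled through the three MIPs in strict rotation, which is a simplification of "it just picks one", and the function names are invented for the example.</p>

```python
from fractions import Fraction
from itertools import cycle, islice

MIPS = ["192.168.1.254", "192.168.2.254", "1.2.3.254"]
# Per the article, only replies to 192.168.2.x left via the wrong interface.
MISROUTED_PREFIX = "192.168.2."


def failed_check_fraction(mips, checks=300):
    """Fraction of checks whose reply is lost, assuming strict rotation of MIPs."""
    sources = islice(cycle(mips), checks)
    lost = sum(1 for src in sources if src.startswith(MISROUTED_PREFIX))
    return Fraction(lost, checks)


def capacity_gain(servers_before, servers_after):
    """Relative capacity increase between two effective server counts."""
    return Fraction(servers_after - servers_before, servers_before)
```

<p>Under that assumption, <code>failed_check_fraction(MIPS)</code> is one third, so three servers behave like two, and <code>capacity_gain(2, 3)</code> is one half: going from two effective servers back to three is the 50 percent boost.</p>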

<p>This fix was only temporary, as we were still having checks originate from
the wrong IP, and the real fix was to use the Subnet IPs which, as their name
suggests, actually respect subnetting.</p>

<p>As with all complex systems, the problem was caused by a number of
different issues combined:</p>

<ul>
    <li>The documentation was misleading, which led to the wrong IP type
    being configured on the load balancer.</li>
    <li>Multiple networks on the backend server combined with virtualization,
    which led to incorrect routing when combined with the check originating
    from the wrong address.</li>
    <li>Fault-tolerant systems, combined with the tiny outage duration for
    individual web servers, masked the issue and greatly increased time to
    detection.</li>
</ul>

<p>Experience is always earned the hard way. Having things go wrong leads to
a deeper understanding of how complex systems work, as shown in the example
above&mdash;at the end, we had a much better grasp of the inner workings of the
system than before. Gaining this understanding is essential to prepare you for
new technologies and production troubleshooting.</p>

<p>When a technology "just works," it is pretty much guaranteed that you
don&#8217;t know "how" it works (at least deeply). The real challenge is building
architectures where usual, run-of-the-mill mistakes cause no disruption of
service. That is an art and the artwork is an invaluable resource to the
organization: it provides a canvas for learning and gaining hard-won
experience.</p>
]]></content:encoded>
            <pubDate>Mon, 20 Jun 2011 18:02:33 GMT</pubDate>
        </item>
        <item>
            <title>Write a Better FM</title>
            <link>http://omniti.com/seeds/write-a-better-fm</link>
            <guid>http://omniti.com/seeds/write-a-better-fm</guid>
            <description><![CDATA[If you&#8217;ve been around software for any time at all, you&#8217;ve encountered the type. You ask what seems to you a reasonable question, and the belligerent sorts fall over themselves to be unhelpful; calling you lazy, an idiot, or worse, and t...]]></description>
            <content:encoded><![CDATA[<p>If you&#8217;ve been around software for any time at all, you&#8217;ve encountered the type. You ask what seems to you a reasonable question, and the belligerent sorts fall over themselves to be unhelpful, calling you lazy, an idiot, or worse, and telling you to RTFM. If you&#8217;re lucky, they&#8217;ll tell you where the FM is. If you&#8217;re not, they&#8217;ll tell you to STFW for it.</p>

<p>For those not up on the acronyms, RTFM and STFW stand for Read The Manual, and Search The Web, respectively.</p>

<p>The trouble is that there&#8217;s a direct correlation between the probability that you&#8217;ll be told to RTFM, and the probability that the FM is rubbish. That&#8217;s because humility, patience and a willingness to help beginners go hand-in-hand with producing good documentation.</p>

<p>The burden falls on those of us within the software community to write a better FM.</p>

<p>For the last ten years or so, I&#8217;ve been involved in several efforts to write better documentation on several Open Source projects. I&#8217;ve noticed some trends in documentation. While some projects have stuck to the tried-and-not-so-true, RTFM, "the newbie is a loser" style of customer support, an increasing number of projects have moved to customer-centered documentation and customer-centered support.</p>

<p>If we want customers to RTFM, we are obliged to write a better FM.</p>

<h3>I&#8217;m Not a "User"</h3>

<p>Respect is very important. If you are unable to treat beginners with patience and respect, you shouldn&#8217;t be doing customer support, or writing documentation. While there are places for banter and inside jokes, technical documentation is not one of them.</p>

<p>Thinking of your audience as customers, rather than as "users", or, worse yet, "lusers",  will greatly influence how you write.</p>

<h3>Listen To the Questions</h3>

<p>While this may seem blindingly obvious, it&#8217;s clear from the documentation of many products, and in particular their so-called "Frequently Asked Questions", that they have no idea whatsoever what questions actual users of their product are asking.</p>

<p>To know what questions are being asked, you should frequent the places where these frustrated users hang out after not finding the answers in your documentation. These places tend to be web forums full of bad advice and broken examples. When you feel your irritation building, remember that they exist because your documentation wasn&#8217;t filling the need, and so these third-party sites sprang up to fill it.</p>

<p>You will quickly find that they tend to be full of people asking the same questions, again and again, and getting a variety of answers of varying quality. It is now your job to make sure that the official documentation answers these questions correctly, showing best-practice solutions to the real-world problems, and makes them easy to find. Failure to do so simply drives people to these question-and-answer sites where they will continue to get bad advice.</p>

<h3>Ask Smart Questions</h3>

<p>Several years ago, Eric Raymond wrote a document about how to ask smart questions. While this seemed like a good idea at the time, it has since become a lengthy tome that no beginner will ever actually read, and which drips with condescension.</p>

<p>The document states three things:</p>

<ol>
<li>Try to find the answer yourself before asking.</li>
<li>Provide all relevant supporting data with your question.</li>
<li>If you don&#8217;t understand the answer, it&#8217;s probably because you&#8217;re too stupid to live.</li>
</ol>

<p>Points 1 and 2 are good, right and important. Unfortunately, point 3 colors the tone of the whole document. Indeed, the word "idiots" appears first in the second paragraph. Although it seems that Eric thinks he&#8217;s being funny, instead he insults everyone who doesn&#8217;t know as much as he does.</p>

<p>Down at the very bottom of the document is a section titled "How To Answer Questions in a Helpful Way." This is the most (and perhaps the only) useful part of the document, and well worth reading.</p>

<p>While it is indeed important to ask smart questions, and not expect that someone is going to hold your hand every step of the way, as documentation authors, it&#8217;s important to cast our minds back to when we first started&#8202;&#8212;&#8202;how lost we felt, and how we didn&#8217;t even know what questions to ask. If that was too long ago, you can readily refresh your mind by picking a software product you&#8217;re unfamiliar with, in a language you don&#8217;t know, and trying to install it and get it running. It will all come back to you.</p>

<p>As Donald Rumsfeld famously remarked, there are also unknown unknowns, the ones we don&#8217;t know we don&#8217;t know. Start by assuming that your customer isn&#8217;t an imbecile, but that they may not know what questions they should be asking.</p>

<p>Help your customers know what questions to ask by structuring your documentation in terms of how they are going to use it. Segment the documentation by audience (Developer, User, Administrator), and then further by task (Installation, Reporting, Upgrading) rather than in seemingly arbitrary groupings like "How-To" and "Other Topics".</p>

<h3>No Stupid Questions</h3>

<p>You&#8217;ve often heard it said that there are no stupid questions. While this is obviously false, much documentation seems to start with the assumption that all questions are stupid. There is a middle ground.</p>

<p>You must start by assuming that questions are smart and useful.</p>

<p>Frequently when I&#8217;m watching in an IRC channel, someone will ask "How do I do X?", and the immediate response is "You don&#8217;t do X, you idiot! That&#8217;s a stupid thing to do! Did your mother ever drop you on your head?"</p>

<p>Rather than treating them like a teenager, try to imagine that the person asking the question is a professional, like yourself, working on a project that might not have been their idea, but that nevertheless they need to get working.</p>

<p>Likewise, when writing documentation, keep in mind the real-world problems that are faced when using your product. Remember that not everyone has the in-depth knowledge of the inner workings that you do. And, most importantly, remember to treat your customers with respect, all the time.</p>

<p>In practical terms, this means avoiding words like "obviously", "simply", and "just", while providing many immediately usable examples with detailed explanations of each point.</p>

<h3>Laziness, Impatience and Hubris</h3>

<p>Larry Wall once declared that the virtues of a programmer are laziness, impatience and hubris. This is often taken as license to be a jerk. I would assert, to the contrary, that the virtues of a documentation writer are laziness, patience and humility.</p>

<p>Yes, it&#8217;s important to be lazy. When someone asks a good question, answer them thoroughly, in exhaustive detail, and then publish the response so that the next time the question is asked, you can answer with a URL. Doing something well once beats doing it poorly, again and again.</p>

<p>You must be patient.  Being impatient with a customer implies that you think they are either being intentionally obtuse, or that they are just too stupid to understand what you are saying. Being patient with them shows respect. The patient, respectful answer will stick with them, while the impatient rude answer will be remembered only as an unpleasant experience to put behind them.</p>

<p>And you should be humble. I find it useful to remember that the person I&#8217;m talking to is probably an expert on something of which I am completely ignorant. I also find it helpful to remember the first time I was asking questions, and the way that I was treated at the time.</p>]]></content:encoded>
            <pubDate>Fri, 22 Apr 2011 01:51:05 GMT</pubDate>
        </item>
        <item>
            <title>Integrating Search</title>
            <link>http://omniti.com/seeds/integrating-search-into-your-application</link>
            <guid>http://omniti.com/seeds/integrating-search-into-your-application</guid>
            <description><![CDATA[

 Why Do It?

This may seem like a silly question at first, as search seems to be everywhere and its usefulness is apparent in so many contexts. The context you should consider though
is your application. Ask yourself how search might benefit your app...]]></description>
            <content:encoded><![CDATA[<img alt="search-image.jpg" src="http://images.omniti.net/omniti.com/i/b/search-image.jpg" width="448" height="220" class="mt-image-none" style="" />

 <h2>Why Do It?</h2>

<p>This may seem like a silly question at first, as search seems to be everywhere and its usefulness is apparent in so many contexts. The context you should consider though
is your application. Ask yourself how search might benefit your application and your users. Keep in mind that search is not a replacement for a good user interface. A good user interface should make it easy for the user to locate what they&#8217;re looking for on your site, without typing anything into a search box.</p>

<p>There are many <a href="http://usability.gov/guidelines/"><span>browse-dominant users out there that prefer to click</span></a> rather than put their hands to the keyboard, no matter how prominently you display a search box or how well your search performs.</p>

<p>It shouldn&#8217;t be a chore for the user to find the contact information on your web site. I&#8217;ve looked at thousands of crappy Flash restaurant web sites. If you have a search input box on your website and I have to type "address" into it because I can&#8217;t find it on the site, there is a problem.</p>

<p>A bad search implementation can hurt the usability of your website. If your search functionality is very unfriendly or unaccommodating when it comes to search terms, people will become frustrated and potentially give up on what they wanted to do at your site. This is especially true of search-dominant users of your site. Your search should be able to handle commas, apostrophes, hyphens and other punctuation.</p>
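<p>One way to make search tolerant of punctuation is to normalize the query before matching it. The Python sketch below is a minimal illustration under assumed rules (drop apostrophes, treat all other punctuation as a word break); the function name is hypothetical, and the same normalization would need to be applied to the indexed text for the two to line up.</p>

```python
import re
import unicodedata


def normalize_query(query):
    """Lowercase a query, drop apostrophes, and treat other punctuation as spaces."""
    text = unicodedata.normalize("NFKC", query).lower()
    text = re.sub(r"['\u2019]", "", text)   # straight and curly apostrophes
    text = re.sub(r"[^\w\s]", " ", text)    # commas, hyphens, and the like
    return text.split()
```

<p>With these rules, a query typed with or without the apostrophe, comma, or hyphen produces the same list of terms.</p>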

<h2>Deciding What You Need</h2>

<p>When implementing search capabilities for a web application, many developers rush into integrating a known solution without asking a few key questions first. While these questions may seem obvious and may lead you to the same tool, many people don&#8217;t think deeply about them, and the result is adequate, but not optimal, search functionality for the application. Let&#8217;s take a look at some initial considerations.</p>

<h3>1. What should be searchable in my application?</h3>

<p>Let&#8217;s say I have an online shopping cart. It seems reasonable that I would want my users to search the products. Which products exactly? Not products that haven&#8217;t been published from admin yet or discontinued products; and perhaps I don&#8217;t want to show products that are out of stock. Lay it all out. Everything that you expect to be searchable and the conditions that have to be met.</p>
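<p>Writing the conditions down as code is one way to "lay it all out". The following Python sketch encodes the example conditions from the shopping cart above; the field names (<code>published</code>, <code>discontinued</code>, <code>stock</code>) are assumptions for illustration and would map to whatever your product records actually contain.</p>

```python
def is_searchable(product):
    """Apply the example conditions: published, not discontinued, and in stock."""
    return (
        product.get("published", False)
        and not product.get("discontinued", False)
        and product.get("stock", 0) > 0
    )


def searchable_products(products):
    """Return only the products that should be added to the search index."""
    return [p for p in products if is_searchable(p)]
```

<p>Keeping the rules in one function means the indexing script and any live add-to-index code apply exactly the same conditions.</p>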

<h3>2. How do I want my users to interact with the search functionality?</h3>

<p>How flexible will the search be with what users type in? Will it show automatic suggestions as the user types? Will it attempt to fix misspellings?</p>

<h3>3. Direct linking?</h3>

<p>Should particular search terms go straight to a particular page instead of showing results? If there was only one result, it seems obvious to advance the user directly to their goal.  Sometimes, in the case of certain search terms, you may still want to lead the user on a very specific journey.</p>

<h3>4. How current do search results have to be?</h3>

<p>Is it imperative that the product/blog post I just added be immediately available in the search results? If not, how much lag time is acceptable?</p>

<h3>5. What kind of search options do I want to provide the user?</h3>

<p>Are there advanced search options, including date ranges, sort order and relevancy? Note that you shouldn&#8217;t overload the user with advanced search options from the start. There should be a simplified version of search that is the default; however, the advanced options should be easily accessible.</p>

<h3>6. Do certain results rank or weigh higher than others?</h3>

<p>For example, if I search for "tomato", does your blog post about your grandma&#8217;s spaghetti recipe come up before the contact page that lists your address as 123 Tomato St?</p>
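<p>Field weighting is one common way to get this kind of ordering. The Python sketch below is a toy illustration, not the scoring used by any particular search tool; the field names and weights are assumptions chosen so that a match in a post body outranks a match in an address.</p>

```python
# Hypothetical weights: a title match counts most, a body match less,
# and a match in address or contact details counts very little.
FIELD_WEIGHTS = {"title": 3.0, "body": 1.0, "address": 0.2}


def score(document, term):
    """Sum the weights of every field in which the term appears as a whole word."""
    term = term.lower()
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        words = document.get(field, "").lower().split()
        if term in words:
            total += weight
    return total


def rank(documents, term):
    """Return matching documents, highest score first."""
    scored = [(score(doc, term), doc) for doc in documents]
    matches = [pair for pair in scored if pair[0] > 0]
    matches.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in matches]
```

<p>With these weights, a recipe post that mentions tomato in its body ranks above a contact page whose only match is the street name.</p>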

<h2>Initial Setup</h2>

<p>Once you&#8217;ve decided what is best for your application and its users with regard to the application&#8217;s search functionality, you can start looking around at the available tools to implement it. You will want to map your search functionality needs to the capabilities provided by the search tools. See how well these tools perform. Users expect search to be fast; they really don&#8217;t care how much information you have to go through to find what they want. More than likely, the information you want to search is stored in a database. Ideally, one does not want to do full text searches against the database. It is expensive for the database, and if you have a high-traffic site where searches are going to be performed fairly often, steer clear of it.</p>

<p>Instead, many popular search tools bring search outside of the database scope by indexing the data you want searched in your database. This usually means that I will choose the table columns that have the information I want available for search, such as product name and product description. I would then create a script that pulls this data and adds it to my index (note that index size is usually 20-30% of the initial data being indexed, depending on the search tool). You will more than likely want to run this script from a cron job to refresh your index on an interval that is dependent upon your needs. Note that you can add and delete from the index as items are inserted and deleted from your database. This means that your need to refresh the entire index may change if you use this approach.</p>
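
<p>The indexing step can be pictured with a toy in-memory inverted index. This is only a sketch: the product rows, <code>buildIndex</code> and <code>search</code> are made up for illustration, and a real search tool keeps its index on disk and does far more (stemming, ranking, incremental updates).</p>

<pre><code>// Rows as they might come back from a query on the chosen columns (made-up data).
var rows = [
    { id: 1, name: 'Red Kettle', description: 'A kettle for tea' },
    { id: 2, name: 'Tea Cups', description: 'Set of four cups' }
];

// Split text into lowercase word tokens, dropping empty strings.
function tokenize(text) {
    return text.toLowerCase().split(/\W+/).filter(Boolean);
}

// Build an inverted index: each word maps to the ids of the rows that contain it.
function buildIndex(rows) {
    var index = Object.create(null);
    rows.forEach(function (row) {
        tokenize(row.name + ' ' + row.description).forEach(function (word) {
            if (!index[word]) { index[word] = []; }
            if (index[word].indexOf(row.id) === -1) { index[word].push(row.id); }
        });
    });
    return index;
}

// Look up a single word without touching the database.
function search(index, word) {
    return index[word.toLowerCase()] || [];
}

var index = buildIndex(rows);
</code></pre>

<p>A script run from cron would rebuild <code>index</code> from fresh rows on whatever interval your freshness requirements call for.</p>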

<p>A big chunk of the work you will face is the initial index setup required to support the features you want in your search. Examples include wild-card queries, sorting, field weighting, multiple merged indexes, multi-faceting, ranking, result clustering and date ranges.</p>

<h2>Presenting the Results</h2>

<p>Depending upon your application and how advanced you want your search to be, you can make assumptions and educated guesses about the intended results. Amazon is a good example. As of the date of this article, typing "Black Swan" into the search box returns results for "The King&#8217;s Speech", "The Fighter", and "True Grit"--all within the top 10 search results. Amazon is making the assumption that I may be interested in other Academy Award-winning movies. Their end goal is that I will purchase those as well. This makes sense for Amazon, but does it make sense for your application? How search-centered is your application? Factoring in these types of assumptions adds a lot of complexity to your search application. How long before those Academy Award-related search results fade away and I am left with only "Black Swan" results?</p>

<p>Most results are displayed by relevancy, which makes sense, but sorting by date can be a helpful option for those perusing the result set. Providing match context in your results can also be helpful. Consider a user whose search term matches an exact phrase in an article on the site, while the search results display only the article title. Even though it may be the very article the user was looking for, the user may not realize it. Showing the text surrounding the matched terms in your results, where applicable, makes that clear.</p>
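
<p>Match context can be as simple as a short excerpt around the first hit. The <code>matchContext</code> helper below is a hypothetical sketch of that idea, not the API of any search product; many search tools can generate such excerpts for you.</p>

<pre><code>// Return up to `radius` characters on each side of the first match of `term`,
// or null when the term does not occur in the text.
function matchContext(text, term, radius) {
    var pos = text.toLowerCase().indexOf(term.toLowerCase());
    if (pos === -1) { return null; }
    var start = Math.max(0, pos - radius);
    var end = Math.min(text.length, pos + term.length + radius);
    var prefix = start > 0 ? '...' : '';
    var suffix = text.length > end ? '...' : '';
    return prefix + text.slice(start, end) + suffix;
}

var article = 'Closures bind a scope to a function so the variables stay available.';
var snippet = matchContext(article, 'scope', 10);
</code></pre>

<p>Displaying <code>snippet</code> under the article title lets the user see why the article matched.</p>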

<p>Often, a user will know meta information about what they want, such as "I know it was in a blog post" or "I know it was posted around Christmas time." While we are not mind readers, we can still make it easier for our users to find what they want, based upon the metadata they know.</p>

<p>One way to do this is by clustering your search results into relevant groups. Instead of showing the results for everything mixed together, show them grouped by a strong meta identifier. Example groups might be "Articles", "Products", and "Users", or perhaps years: 2009, 2010, 2011.</p>
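
<p>Clustering of this kind amounts to grouping an already-ranked result list by one metadata field. Here is a minimal sketch; the result objects and the <code>clusterBy</code> helper are assumptions for the example.</p>

<pre><code>// Group already-ranked results by a metadata field,
// keeping their relative order within each group.
function clusterBy(results, field) {
    var groups = Object.create(null);
    results.forEach(function (result) {
        var key = result[field];
        if (!groups[key]) { groups[key] = []; }
        groups[key].push(result.title);
    });
    return groups;
}

// Made-up results, listed in relevancy order.
var results = [
    { title: 'Writing Readable Code', type: 'Articles', year: 2010 },
    { title: 'Site Search Toolkit', type: 'Products', year: 2011 },
    { title: 'Search Done Right', type: 'Articles', year: 2011 }
];

var byType = clusterBy(results, 'type');
var byYear = clusterBy(results, 'year');
</code></pre>

<p>The same ranked list can back both views, so offering a second grouping costs very little.</p>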

<p>Is there a mobile version of your application? How do your search results look there?</p>

<p>Many times, site searches are implemented with Google site search. This is not a bad idea, depending upon your site content and search requirements. However, keeping the search results on your own site, as opposed to sending the user to a Google result page, keeps the user engaged and is less confusing than a redirect. Google site search provides for this functionality.</p>

<p>Avoid styling your search results so that they look like Google text ads. Many people will assume that is what they are and ignore them.</p>

<h2>Post Implementation</h2>

<p>What are people actually searching for? Are you monitoring the queries that are coming through your search form? What are some of the top queries? What results are being given to the user for these queries? Are they what is expected? Most developers will implement search functionality and make sure that it functions, but fail to monitor or provide tests to ensure the search is actually useful after being implemented.</p>
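
<p>Monitoring can start with something as small as tallying the logged queries. The sketch below assumes you already record each query as a string; the <code>topQueries</code> helper and the sample log are hypothetical.</p>

<pre><code>// Tally normalized queries and return the `limit` most frequent ones.
function topQueries(log, limit) {
    var counts = Object.create(null);
    log.forEach(function (query) {
        var key = query.trim().toLowerCase();
        counts[key] = (counts[key] || 0) + 1;
    });
    return Object.keys(counts)
        .map(function (key) { return { query: key, count: counts[key] }; })
        .sort(function (a, b) { return b.count - a.count; })
        .slice(0, limit);
}

// Made-up log entries, as users might have typed them.
var queryLog = ['tomato', 'Tomato ', 'spaghetti', 'contact', 'tomato', 'contact'];
var topTwo = topQueries(queryLog, 2);
</code></pre>

<p>Running the top queries through your own search and checking the results by hand is a cheap first test of whether the search is actually useful.</p>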

<p>Search is an important part of the web, and the technology behind it is becoming smarter and faster. Take advantage of it, but first take the time to discover your application needs and how you can best serve your users.</p>]]></content:encoded>
            <pubDate>Tue, 12 Apr 2011 01:19:32 GMT</pubDate>
        </item>
        <item>
            <title>The Web Developer&#039;s Guide to Writing Native iOS Apps</title>
            <link>http://omniti.com/seeds/the-web-developers-guide-to-writing-native-ios-apps</link>
            <guid>http://omniti.com/seeds/the-web-developers-guide-to-writing-native-ios-apps</guid>
            <description><![CDATA[ Ever since the release of the first iPhone and the first official iPhone/iOS SDK, mobile computing has taken a huge leap into the handheld realm. Where only a few years ago, it was normal to see hipsters hunched over laptops while sitting at Starbucks...]]></description>
            <content:encoded><![CDATA[ <p>Ever since the release of the first iPhone and the first official iPhone/iOS SDK, mobile computing has taken a huge leap into the handheld realm. Where only a few years ago, it was normal to see hipsters hunched over laptops while sitting at Starbucks sipping their lattes, and surfing the web on the free wifi; nowadays they&#8217;ve broken free from their local coffee franchises (although still addicted to the caffeine) and roam the outside world, still surfing websites but now on their iPhones over 3G connections.</p>

<p>So how do you, as a veteran web developer, take advantage of this phenomenon to write really cool mobile apps that engage the user on a whole new level? Well, there are several ways. You can bite the bullet and learn Objective-C, CocoaTouch, and the iOS SDK, and spend several months writing a true, native iPhone app. You can stay in your comfort zone and write mobile-sized HTML5 web apps that sit, hands-tied to your servers with very limited device-centric capabilities. You can cheat a little bit and use PhoneGap or some other library to write HTML5 mobile web pages that can be submitted as apps to the App Store. Or, you can cheat a lot and use Appcelerator to write your code in a language you&#8217;re very familiar with, on a cross-platform API with practically all the device capabilities available to you&#8202;&#8212;&#8202;all resulting in native apps that can be submitted to the App Store <em>and</em> the Android Market.</p>

<h2>What is Appcelerator?</h2>
<p><a href="http://www.appcelerator.com/"><span>Appcelerator</span></a> is hard to define. It&#8217;s an open source framework that acts partially like a compiler and partially like a runtime. Without going into the nitty-gritty about how it does its thing, let&#8217;s just say Appcelerator will take Javascript code built on top of its API and turn it into a native application for the iOS and the Android platforms. For the iOS platform, Appcelerator&#8217;s Javascript API maps to the Objective-C/CocoaTouch equivalents; and for the Android platform it maps to the Android Java framework. After the code is compiled, you end up with the respective platform&#8217;s native binary app. Also, Appcelerator has a third mode which lets you write cross-platform WebKit-based desktop apps. We&#8217;ll only be concentrating on the mobile side of things in this article.</p>

<p>Appcelerator comes in two parts, Titanium Mobile SDK and Titanium Developer. The Titanium Mobile SDK is the heart of the mobile app writing framework. Titanium Developer is a fancy, front-end GUI that lets you set up various environment settings, run the code in a simulator, compile it for a real mobile device and even package it for distribution to the App Store. As a side note, Titanium Developer itself is written using the Appcelerator Desktop SDK.</p>

<h2>Installing Appcelerator</h2>
<h3>Required Prerequisites:</h3>
<ul>
<li>Very good working knowledge of Javascript (knowledge of <a href="http://jibbering.com/faq/notes/closures/"><span>closures</span></a> is a plus)</li>
<li>An Intel-based Mac of some sort</li>
<li>Xcode with the iOS SDK installed (<a href="http://developer.apple.com/ios/"><span>http://developer.apple.com/ios/</span></a>)</li>
<li>Android SDK installed (<a href="http://developer.android.com/sdk"><span>http://developer.android.com/sdk</span></a>)</li>
</ul>
<h3>Prerequisites if you want to run on an actual iOS device:</h3>
<ul>
<li>Pay Apple the $99/yr fee and register as an <a href="http://developer.apple.com/programs/ios/"><span>iOS developer</span></a> </li>
<li>Create/download all the keys/certificates (just follow the directions they provide on <a href="http://developer.apple.com/devcenter/ios/"><span>http://developer.apple.com/devcenter/ios/</span></a> under the iOS Provisioning Portal)</li>
<li>Add your mobile devices to the developer account</li>
<li>Create an AppID for your app</li>
<li>Create a Provisioning Profile that binds your AppID to the various devices you registered (this is the way to get adhoc distribution to developer iPhones for testing)</li>
<li>Download all those certificates and provisioning profiles, install them and configure your Appcelerator app to use them when compiling, so the code can be signed correctly</li>
</ul>

<p>Running the gauntlet of getting the environment set up to test on an actual iOS device requires another article entirely. However, once you get through it the first time, it gets easier. Just follow the instructions on Apple&#8217;s site. Here, we will just stick with running things in the iOS Simulator.</p>

<p>You&#8217;ll notice that an Android SDK must be installed even though we&#8217;re only doing iOS development. This seems to be a nagging requirement in order for Titanium Developer to get going (things may get fixed in later releases so try it out without it first, but if it complains, go ahead and install the Android SDK).</p>

<h2>Making a New Project:</h2>
<ol>
<li>Create a New Project</li>
  <ol>
  <li>Open up Titanium Developer </li>
  <li>Select New Project</li>
  <li>Select Mobile as the Project Type</li>
  <li>Fill in the other fields (make sure App Id exactly matches the AppID you registered in Apple&#8217;s developer website)</li>
  <li>Click "Create Project"</li>
  </ol>

<li>You&#8217;ll be taken to the Project Settings window (the Edit tab)</li>
  <ol>
  <li>Just click "Save Changes"</li>
  </ol>

<li>Now click the "Test & Package" tab. You&#8217;ll see 3 sub tabs:</li>
   <ol>
   <li>Run Emulator - Runs your program in the emulator</li>
   <li>Run on Device - Runs it on a developer iPhone/iPod Touch (you&#8217;ll need provisioning profiles set up)</li>
   <li>Distribute - Packages up the app for submission to the App Store (or adhoc distribution to non-developer profiled iPhones/iPod Touches) </li>
   </ol>

<li>Click on the "Run Emulator" tab and then click on the "iPhone" subtab. Ensure that the SDK is the latest iOS SDK version and click "Launch".</li>
</ol>

<p>If all goes well you&#8217;ll see a bunch of messages scroll past while the project code is compiling, and then the iOS Simulator launches. </p>

<p>Congratulations, you&#8217;ve successfully run the skeleton code!</p>


<h2>Modifying the Code</h2>
<p>Open up the directory you told Titanium Developer to create. All your code and assets will be stored in the <code>Resources/</code> subdirectory. You&#8217;ll notice that it already contains the <code>app.js</code> skeletal code; this is essentially the entry point into your app (i.e., your <code>main()</code> routine). You&#8217;ll also notice <code>iphone/</code> and <code>android/</code> subfolders. This is where you keep assets that override the default assets in your <code>Resources</code> folder when you need to target a particular platform.</p>

<p>Now let&#8217;s actually write some code. Rename/move the existing <code>app.js</code> file to <code>app.js.old</code> (be aware that any file with the <code>.js</code> extension will get compiled by Titanium Developer by default even if it isn&#8217;t used in your end product). Create a new file called <code>app.js</code> and enter the following code:</p>

<pre><code>Ti.API.info('Creating a new root window');

var w = Titanium.UI.createWindow({ backgroundColor: 'white' });
w.open();


Ti.API.info('Creating Label');

var label1 = Titanium.UI.createLabel({
    text: "Name Please",
    backgroundColor: 'gray',
    color: '#000000',
    top: 20,
    height: 'auto'
});
w.add(label1);   // add the label to the window


Ti.API.info('Creating Text Input Field');

var textfield1 = Titanium.UI.createTextField({
    backgroundColor: 'green',
    hintText: "Type Here",
    height: 35,
    top: 100,
    left: 10,
    right: 10,
    borderStyle: Titanium.UI.INPUT_BORDERSTYLE_ROUNDED
}); 
w.add(textfield1);  // add the textfield to the window


Ti.API.info('Creating Button');

var button1 = Titanium.UI.createButton({
    title: 'Click Me',
    width: 150,
    height: 30
});
w.add(button1);   // add the button to the window


Ti.API.info('Adding an eventListener to the button');


// Here&#8217;s the button 'click' event listener, 
// notice the second parameter is an anonymous function with
// a parameter 'e', this is the event dictionary.

button1.addEventListener('click', function(e) {
    Ti.API.info('Button was clicked, e is ');
    Ti.API.info(e);
    Ti.API.info('Text field has value ' + textfield1.value);

    if(textfield1.value.length <= 0) {
        alert("Enter your name please");
    }
    else {
        label1.text = "Hello " + textfield1.value;
    }
});
</code></pre>

<p>Click on the "Run Emulator" tab and launch the app. If everything works, you&#8217;ll be shown a white window with a gray background label that says <samp>"Name Please"</samp>, a text field and a button that says <samp>"Click Me"</samp>. Go ahead and try clicking the button. You&#8217;ll see an alert popup message telling you to enter a name. If you look through the <code>button1.addEventListener</code> callback function, you&#8217;ll see where that <code>alert()</code> is coming from. Now enter your name in the text field and click the button again. This time you&#8217;ll notice the gray label up top change to say <samp>"Hello xxxx"</samp> (where xxxx is what you entered in the text field).</p>

<p>You&#8217;ll notice that we don&#8217;t have any <code>main()</code> functions or event loops; all that stuff is handled by the underlying iOS SDK. We&#8217;re developing on essentially an asynchronous event-driven model (similar to the Javascript model in browsers).</p>

<p>Congratulations on writing your first app! Your next steps should be to browse through and experiment with the various API methods available in the Titanium Mobile SDK found at <a href="http://developer.appcelerator.com/documentation"><span>http://developer.appcelerator.com/documentation</span></a>. 

Also be sure to read the Getting Started guides:
<a href="http://wiki.appcelerator.org/display/guides/Getting+Started+with+Titanium"><span>http://wiki.appcelerator.org/display/guides/Getting+Started+with+Titanium</span></a>

as well as the Getting Started with the KitchenSink demo app:
<a href="http://wiki.appcelerator.org/display/guides/Getting+Started+with+Kitchen+Sink"><span>http://wiki.appcelerator.org/display/guides/Getting+Started+with+Kitchen+Sink</span></a>
</p>

<h2>Programming Notes</h2>
<h3>Old-School Debugging</h3>
<p>The best way to debug is to throw <code>Titanium.API.info('some debugging message')</code> all over your code. You can also use the name <code>Ti</code> rather than <code>Titanium</code> (i.e., <code>Ti.API.info('some message')</code>) to save you from having to type so much. It&#8217;s also worth noting that you can usually pass arbitrary objects/variables by themselves to <code>Ti.API.info</code> and it will try to print out the string equivalent of the object (if available). <code>Ti.API.warn()</code> and <code>Ti.API.error()</code> are also available for logging purposes (they&#8217;ll show up in the Titanium Developer console in different colors).</p>
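
<p>When the default string form of an object is not informative enough, one option is to build the message yourself before logging it. The sketch below is an illustration under assumptions: <code>describe</code> is a made-up helper, and the console-based <code>logger</code> fallback exists only so the snippet also runs outside Titanium, where no global <code>Ti</code> is defined.</p>

<pre><code>// Outside Titanium there is no global `Ti`, so fall back to a console-based
// stand-in (an assumption for this sketch) that mimics the three log levels.
var logger = (typeof Ti !== 'undefined') ? Ti.API : {
    info:  function (msg) { console.log('[INFO] ' + msg); },
    warn:  function (msg) { console.log('[WARN] ' + msg); },
    error: function (msg) { console.log('[ERROR] ' + msg); }
};

// Hypothetical helper: turn a value into a readable string before logging it.
function describe(label, value) {
    var text = (typeof value === 'string') ? value : JSON.stringify(value);
    return label + ': ' + text;
}

logger.info(describe('textfield value', 'Alice'));
logger.warn(describe('event', { type: 'click', x: 10 }));
</code></pre>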

<h3>Subtleties of <code>createWindow()</code></h3>
<p>You can pass in a Javascript filename to <code>createWindow()</code> using the dictionary key <code>url</code>; this way you can modularize your code into smaller chunks. The <code>createWindow()</code> method will essentially fire off the new window in a separate thread and you won&#8217;t really have access to any variables or functions within it after it&#8217;s launched. You can send initial data into the new window by adding arbitrary key/values to the dictionary parameter of <code>createWindow()</code>. From within the new window script you can get access to those initial variables via the <code>Titanium.UI.currentWindow</code> variable. For example, if you had code like:</p>

<pre><code>var w = Titanium.UI.createWindow({
    url: 'newWindow.js',
    foo: 'bar', baz: ['zab', 'rab']
});
</code></pre>

<p>then, within the <code>newWindow.js</code> file you&#8217;ll be able to get access to <code>foo</code> and <code>baz</code> like so:</p>

<pre><code>Titanium.UI.currentWindow.foo      // gives us 'bar'
Titanium.UI.currentWindow.baz     // gives us ['zab','rab']
</code></pre>

<p>The only way to get data back out of the window is to use the event handling facilities of the framework. And on that note&#8230;</p>


<h3>Event Handling is powerful asynchronous communications stuff</h3>
<p>Make liberal use of the event mechanisms of the framework to communicate between various threads and subsystems in your app. In addition to the built-in events, <code>addEventListener</code> can listen to any arbitrary event name you want. The complementary function, <code>fireEvent</code>, allows you to fire any arbitrary event name you like, with any parameters you like, in a dictionary that gets passed on to the event callback function. Practically every object in the Titanium SDK has the ability to fire or listen to events. The most globally accessible object is the <code>Titanium.App</code> object: you can register an event listener with <code>Titanium.App.addEventListener</code> and do a <code>Titanium.App.fireEvent</code> from completely different areas of your app (see the example below). This allows you a great deal of power for inter-thread communications (i.e., sending messages between windows and the like). Any number of event callback functions can be added via <code>addEventListener</code>, and they&#8217;ll all be called in turn when the event is fired. You can also remove a callback function from an event listener by using <code>removeEventListener</code>. If you fire an event that isn&#8217;t being listened to, it will just be ignored, no harm no foul. Events aren&#8217;t queued forever, so if you aren&#8217;t listening for an event when it&#8217;s fired, you&#8217;ve missed it. Too bad.</p>

<p>Here&#8217;s a quick example (we&#8217;ll attach the event listener to the global <code>Titanium.App</code> object):</p>

<pre><code>Titanium.App.addEventListener('fooEvent', function(e) {
     // e will hold { bar: 'rab', zab: 'baz', 
     // and some other core event fields }
     Ti.API.info('fooEvent was called and ' +
                      'we got this event dictionary:');
     Ti.API.info(e);   
});

// &#8230; meanwhile somewhere else &#8230;
Titanium.App.fireEvent('fooEvent', { bar: 'rab', zab: 'baz' });
</code></pre>
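
<p>The listener rules described above (callbacks run in turn, removed callbacks stop firing, unheard events are dropped) can be tried outside Titanium with a tiny stand-in object. To be clear, <code>createEmitter</code> is an assumption written for this illustration and is not how Titanium implements its events; only the three method names mirror the real API.</p>

<pre><code>// A tiny stand-in for an object with addEventListener/removeEventListener/fireEvent.
function createEmitter() {
    var listeners = Object.create(null);
    return {
        addEventListener: function (name, callback) {
            (listeners[name] = listeners[name] || []).push(callback);
        },
        removeEventListener: function (name, callback) {
            listeners[name] = (listeners[name] || []).filter(function (cb) {
                return cb !== callback;
            });
        },
        fireEvent: function (name, data) {
            // No listeners registered: the event is simply dropped, not queued.
            (listeners[name] || []).forEach(function (cb) { cb(data); });
        }
    };
}

var app = createEmitter();
var seen = [];
function first(e) { seen.push('first:' + e.bar); }
function second(e) { seen.push('second:' + e.bar); }

app.fireEvent('fooEvent', { bar: 'missed' });   // nobody is listening yet
app.addEventListener('fooEvent', first);
app.addEventListener('fooEvent', second);
app.fireEvent('fooEvent', { bar: 'rab' });      // both callbacks run, in order
app.removeEventListener('fooEvent', first);
app.fireEvent('fooEvent', { bar: 'zab' });      // only `second` runs now
</code></pre>

<p>Note that <code>removeEventListener</code> needs the same function reference that was added, which is one reason to keep callbacks in named variables rather than writing them inline.</p>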


<h3>Closures will be your best friends</h3>
<p>As your applications get more complicated, you&#8217;ll be writing a lot of <code>addEventListener</code> type code throughout. It is essential that you keep in mind the context in which that event callback function will get called, and what variables are available in that scope. If you create an event listener function within a scope and make use of a variable from that scope, then by the time the listener is actually called (some time later, long after the surrounding code has finished running) that variable may have been reassigned, so the callback may not see the value you expected. This is a common enough thing in Javascript and the workaround is using <a href="http://jibbering.com/faq/notes/closures/"><span>Javascript Closures</span></a>. There are plenty of articles out there that explain it in great detail, but let&#8217;s just say it&#8217;s a nice way to bind a scope to a function so that when the function actually does get called, the scope is available along with the variables. Here&#8217;s a way of writing that <code>button1 'click'</code> event listener more robustly using a closure:</p>

<pre><code>button1.addEventListener('click', (function (lbl1, txt1) {
    return function(e) {
         Ti.API.info('Button was clicked, e is ');
         Ti.API.info(e);
         Ti.API.info('Text field has value ' + txt1.value);

         if(txt1.value.length <= 0) {
             alert("Enter your name please");
         }
         else {
             lbl1.text = "Hello " + txt1.value;
         }
    };
}) (label1, textfield1) );
</code></pre>

<p>I realize that it looks a little weird, but if you study it carefully you&#8217;ll see that we are actually creating an anonymous function that takes two parameters (<code>lbl1</code> and <code>txt1</code>). Then, we&#8217;re immediately executing that anonymous function, passing in the variables (<code>label1</code> and <code>textfield1</code>) as the parameters. We do all that in one shot. Within the anonymous function, all we do is return another anonymous function (this will be the actual event callback function that <code>addEventListener</code> will get). Notice the inner anonymous function has the event <code>e</code> parameter. Remember that functions are first-class citizens in Javascript and are treated like any other object (there actually is a built-in <code>Function</code> type in Javascript), and that&#8217;s why we&#8217;re able to do this.</p>

<p>Notice that within the inner anonymous function we&#8217;re using <code>txt1</code> and <code>lbl1</code> as the variables instead of the <code>textfield1</code> and <code>label1</code> we used before. The outer function creates a scope having variables <code>lbl1</code> and <code>txt1</code>, and the inner function binds to those variables; now, no matter where or when the button 'click' event callback function is called, <code>txt1</code> and <code>lbl1</code> will refer to the right objects. It&#8217;s tricky at first, but if you get into the habit of using closures for such event-handling assignments it could save you a lot of painstaking debugging later.</p>

<p>On a somewhat important side note, as long as the callback holds a reference to those objects, the garbage collector won&#8217;t reclaim their memory until that reference goes away (i.e., the anonymous function with the closure goes away); this may not seem like a big deal until you realize you&#8217;re developing for a small embedded system with very tight memory constraints.</p>
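
<p>A loop is the classic case where capturing values through an immediately executed outer function makes a visible difference. The snippet below is plain Javascript with no Titanium calls, written as an illustration rather than taken from the app above; all variable names are made up.</p>

<pre><code>// Three callbacks that share one `i`: by the time they run, the loop has finished.
var shared = [];
for (var i = 0; i < 3; i++) {
    shared.push(function () { return i; });
}

// The same loop, but each callback gets its own copy of the current value,
// captured by an immediately executed outer function.
var captured = [];
for (var j = 0; j < 3; j++) {
    captured.push((function (n) {
        return function () { return n; };
    })(j));
}

// Call every stored callback and collect what each one returns.
var sharedValues = shared.map(function (fn) { return fn(); });
var capturedValues = captured.map(function (fn) { return fn(); });
</code></pre>

<p>Every function in <code>shared</code> reports the final value of <code>i</code>, while the functions in <code>captured</code> each report the value that was current when they were created.</p>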

<h2>Need More Help?</h2>
<p>The API documentation can be found at <a href="http://developer.appcelerator.com/apidoc/mobile/latest"><span>http://developer.appcelerator.com/apidoc/mobile/latest</span></a> . Be forewarned that the documentation is usually a few steps behind the actual code base, and quite possibly may be incomplete or incorrect. Experimentation and educated guesses are usually required when trying out new functionality. Alternatively you can search in the "Q&A" section of the developer site to see if anyone else encountered the same problems <a href="http://developer.appcelerator.com/questions"><span>http://developer.appcelerator.com/questions</span></a>. If you get really desperate you can start digging through the actual framework Objective-C code located in <code>YourApplicationDirectory/build/iphone/Classes/</code>.</p>

<p>Here&#8217;s a big hint that took me a very long time to figure out. Almost every class that takes the form <code>Titanium.UI.*View</code> is based off of the <code>Titanium.UI.View</code> class so they inherit all of <code>View&#8217;s</code> properties and methods. They can be used interchangeably whenever a View is required. Views are the workhorses of the Titanium GUI framework, they provide the rectangular regions upon which the GUI widgets are drawn.</p>

<p>One of the best places to get working examples for the Titanium Mobile API is the KitchenSink demo app. As its name implies, it has demo code for practically every feature offered by the API. You should be able to download it from the same servers you downloaded the SDK from (<a href="http://developer.appcelerator.com/get_started"><span>http://developer.appcelerator.com/get_started</span></a>). Simply uncompress the downloaded archive and import the <code>KitchenSink</code> directory containing the <code>tiapp.xml</code> file (the Titanium project configuration file) into Titanium Developer. Then, launch it in the Simulator. Browse through the <code>KitchenSink/Resources/examples/</code> subfolder to find the actual Javascript code. There&#8217;s quite a bit of undocumented code/features in there.</p>

<h2>Troubleshooting Tips</h2>
<p>If you ever come across an annoying bug in code (and you will) that you are fully certain should work (especially obscure low-level Exceptions that are thrown before crashing), and you can&#8217;t seem to get past it, try clearing out the build directory. For iPhones, the build directory you want to delete will be <code>YourAppsDirectory/build/iphone/build/</code> . Do NOT delete the outer <code>build</code> directory or you&#8217;ll have to create the project from scratch again. </p>

<p>Worst case scenario: You may have to create a new project from scratch and copy the <code>Resources</code> and all the assets into it. I&#8217;ve had this happen a few times, especially when the Mobile SDK was upgraded.</p>

<h2>iPhone App Development&#8202;&#8212;&#8202;Only Faster</h2>
<p>Appcelerator is a fickle beast that&#8217;s definitely rough around the edges and ever-evolving, but when it works (and when you get used to its quirky ways), it&#8217;ll help you throw together a working iPhone app much faster than if you had to write it all from scratch in Objective-C. It&#8217;s especially useful for the lazy web developer.</p>
]]></content:encoded>
            <pubDate>Mon, 04 Apr 2011 20:53:29 GMT</pubDate>
        </item>
        <item>
            <title>Aimless Social Media: your brand deserves better</title>
            <link>http://omniti.com/seeds/aimless-social-media-your-brand-deserves-better</link>
            <guid>http://omniti.com/seeds/aimless-social-media-your-brand-deserves-better</guid>
            <description><![CDATA[

The state of social media integration today can be likened to the early stages of "Web 2.0", eight years ago. At that time many confused the phrase, assuming it meant either: a formalized change to the technology powering the World Wide Web, or a visu...]]></description>
            <content:encoded><![CDATA[<img alt="Toy plane in a tree" src="http://images.omniti.net/omniti.com/i/b/social-article-bw.png" width="448" height="220" style="margin-top: 1em;" />

<p>The state of social media integration today can be likened to the early stages of "Web 2.0", eight years ago. At that time many confused the phrase, assuming it meant either: a formalized change to the technology powering the World Wide Web, or a visual design movement that focused on glossy headers, cute icons, crayola colors and gradients galore (cringe). Lucky for us, it represented a paradigm shift in how we (marketers, developers, designers and end-users) viewed interaction on the web&#8212;a shift that ultimately spawned the first social networks. Fast forward eight years: the organic development of user-generated media through these social networks and emerging mobile technologies spawned Web 2.0's progeny&#8212;Social Media. HOORAY! But, like many of those misguided early adopters in 2005 (who added RSS icons to sites containing no feeds), similar mistakes are being made integrating social media into sites today.</p>

<p>Curious about social media&#8217;s "RSS icon"? Well, I&#8217;m sure you have seen it on sites before: the ubiquitous "Like" or "Follow Us" button. It seems like a great feature; it suggests, "these guys are cool, they are on Facebook or Twitter!" But what do these features do for your site? Do they spread the word? Sure. But to what gain? What happens when users "follow" your brand, or "like" your product and then find little substance once back on your site&#8212;finding only more "like" buttons? Odds are they will quickly move on. The web is chock-full of options, and you are not the only person dabbling in social media. So, how can your brand stand out over the "other guys"?  How can you avoid the pitfalls of aimless social media integration?</p>

<p>The trick is to create a user experience that speaks to your audience; leveraging the best that social media has to offer, while avoiding any inherited flaws. The modern web is an ever-changing beast, mercurial in its habits and powered by the "ADD generation." Social media is a marketer&#8217;s weapon for targeting users in this fickle medium. With a deeper understanding of the technologies behind social media applications and a little creativity from your team, you can create a rich, unique experience that will not only keep your users coming back, but will pave the way for a growing audience. Using these sites simply as viral launch pads is a near-sighted use of their true power&#8230;The users who give you their time&#8212;and your brand, in which you invest so much&#8212;deserve something more interesting.</p>

<p><em>How can you get started?</em></p>

<h2 class="section-head"><span>S</span>tep 1:  Understand your user base, then create features for them.</h2>

<p>Often social features are added to sites as a Hail Mary pass, fueled by notions like, "it&#8217;s what <em>they</em> did, and <em>they</em> increased conversions by 8%". But <em>they</em> are not you, and <em>their</em> users are not necessarily the same as your users. So, when starting any social media project, you must first ask yourself: Are my users into social media, and if not, why not/could they be?</p>  

<p>What if your user base is not filled with savvy social media junkies? Expecting a novice web user to leverage the full stack of social networking sites means tough sledding. For most businesses, the safe bet is to start with Facebook. The core functionality behind the massive social network has proven to appeal to a <a href="http://www.digitalsurgeons.com/facebook-vs-twitter-infographic/"><span>wide demographic</span></a>. Facebook allows businesses to set up a <a href="https://facebook-inc.box.net/shared/9e5jiyl843"><span>branded Facebook Page</span></a> with little risk or investment. Once your page is set up, users can subscribe to it by "liking" it. This is where the "like" button is good: It&#8217;s an easy-to-adopt viral feature that users understand. This is why it&#8217;s so widely used. However, you still must have something to offer&#8212;something enticing to engage your users enough for them to share with others. If you don&#8217;t, the virus dies. You must first create a compelling experience on your own site before you add in the "like" button and share content with the world; otherwise, the effect is minimal. It is as if you are yelling into a giant wind tunnel. With social media, you have to yell with a purpose in order to make an impact.</p>

<p>How do you yell with purpose? How do you take your branding and turn it into social media features? Here&#8217;s where you must understand the value proposition for your users. Explore which products/services/experiences you offer that could be repurposed as a social media vehicle. For example, if you operate a travel site, media (photos/videos) or shared trip diaries may be an approach to take. If you operate a product site, sales and offers would provide an enticing reason to visit. The beauty of the medium is that social networks are as diverse as the sites they serve. Knowing who you are as a brand, and what your users want, will help to solidify your social media strategy and ultimately secure your place in the space.</p>


<h2 class="section-head"><span>S</span>tep 2: The technology exists. Understand it, then use it.</h2>

<p>Under the hood, social network platforms, such as Facebook, Twitter, Flickr, Foursquare and many more, are powerful web applications. As they grew over the latter part of this past decade, so did the adoption of <a href="http://en.wikipedia.org/wiki/Web_service"><span>Web Services, particularly REST APIs</span></a>, in web application development. Through the use of these Web APIs, the modern web has become an integrated platform of shared functionality, services and data. Applications have emerged that use multiple external APIs to create a more robust feature set. In development terms, this approach is referred to as a web application hybrid, or "mashup." These mashups make up today&#8217;s most popular applications including, surprisingly enough, Facebook: great for the social network. Even better for your site.</p>

<p>With a mashup of your own, you have the ability to leverage the power of multiple APIs when designing a rich user experience on your own site. It is an experience open to functionality that is as diverse as the web itself and includes applications like: <a href="http://code.google.com/apis/maps/index.html"><span>Google Maps</span></a>, <a href="http://www.last.fm/api"><span>Last.fm</span></a>, <a href="http://www.bbc.co.uk/programmes/developers"><span>BBC</span></a>, <a href="http://developer.netflix.com/"><span>Netflix</span></a>, <a href="https://affiliate-program.amazon.com/gp/advertising/api/detail/main.html"><span>Amazon</span></a>, <a href="http://developer.etsy.com/"><span>Etsy</span></a>, <a href="http://developer.usatoday.com/"><span>USAToday</span></a>, <a href="http://weather.weatherbug.com/desktop-weather/api.html"><span>WeatherBug</span></a>, <a href="http://www.salesforce.com/us/developer/docs/api/index.htm"><span>SalesForce</span></a>, <a href="http://code.google.com/apis/youtube/overview.html"><span>YouTube</span></a>, <a href="http://instagr.am/developer/"><span>Instagram</span></a>, <a href="http://developer.ebay.com/common/api/"><span>eBay</span></a>, and the aforementioned social networks, to name a few. With all these options, we can be kids in a functionality candy store. In order to gain any traction, you must find the service that best suits your needs.</p>


<h3>Let&#8217;s start with Facebook:</h3>

<p>First, design features that take full advantage of the API to create an integrated, enhanced experience on your site with segmented features and functionality. Sort of a: "Hey, if you clicked 'like' on our Facebook page, wait until you visit our site now&#8212;it&#8217;s cooler&#8212;more intuitive and better because you are now in the community. Cool, huh?" This approach not only promotes conversion from Facebook back to your site, it also promotes sharing&#8212;viral sharing&#8212;and we all know how much fun that can be.</p>

<p>One way you can achieve this is by designing a community feature for your site. Use the objects available via the API to leverage Facebook features such as wall posts, likes, notes, comments and user profile information. You can then create an experience where users will be able to communicate on your site&#8212;about your products&#8212;while seamlessly sharing back on Facebook. This is how you gain the power of Facebook&#8217;s social sharing features without affecting the user experience on your site. This community experience is one that you control; one that you can test, modify and enhance to better suit your own user base. Your users will appreciate the extra effort and care&#8212;something you can&#8217;t achieve with a Facebook page alone.</p>

<p>This community would have to use Facebook&#8217;s authentication methods in order to gain access to user data. By allowing users to sign in using Facebook credentials, you remove the need for a registration process, making adoption easier for new users. In addition, you control the data collected, improving your business intelligence and paving the way for tightly targeted campaigns.</p>

<p>Now, say you&#8217;ve implemented this Facebook mashup; what happens if Facebook goes down? No worries! If you (or your engineers) have designed this feature intelligently, all objects will be generated and saved in your architecture, then pushed back to Facebook. In case Facebook goes down, the users on your site will remain unaware. <a href="http://omniti.com/seeds/breaking-social-dependency"><span>Breaking your dependency</span></a> on Facebook to handle operations avoids a potentially embarrassing situation for your brand.</p>
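
<p>A minimal sketch of that save-locally-then-push idea, in Python. Everything here is hypothetical: <code>LocalStore</code>, <code>network_down</code> and the shape of a post are illustrative names, not part of any Facebook API. The only point is the ordering: the local write happens first, and a failed push leaves the item queued for a later retry.</p>

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class LocalStore:
    """Hypothetical local storage: saved posts plus a queue of pending pushes."""
    posts: List[Dict[str, str]] = field(default_factory=list)
    pending: List[Dict[str, str]] = field(default_factory=list)

    def save_post(self, post: Dict[str, str]) -> None:
        # The local write always happens first, so the user's action succeeds
        # on our site whether or not the social network is reachable.
        self.posts.append(post)
        self.pending.append(post)

    def flush(self, push: Callable[[Dict[str, str]], None]) -> int:
        """Try to push every pending post; keep the ones that fail.

        Returns the number of posts pushed successfully.
        """
        still_pending = []
        pushed = 0
        for post in self.pending:
            try:
                push(post)
                pushed += 1
            except Exception:
                # The network is down or slow: keep the post for a later retry.
                still_pending.append(post)
        self.pending = still_pending
        return pushed


def network_down(post):
    # Stand-in for a real API call that fails during an outage.
    raise ConnectionError("social network unavailable")


sent = []
store = LocalStore()
store.save_post({"user": "alice", "text": "Love the new spring line!"})
store.flush(network_down)   # outage: nothing is lost, the post stays queued
store.flush(sent.append)    # service is back: the queued post goes out
```

<p>In a real application the queue would live in your database and the flush would run in a background job, so a slow or failed push never blocks a page load.</p>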

<h2 class="section-head"><span>I</span> have tried Facebook and I need more. What about other social networks?</h2>

<p>That answer really depends upon the user knowledge you gained through the discovery process done before you designed your strategy. For example, if you are a media company, the content/brand exposure vehicles, such as Twitter and Flickr, could be appealing options. However, if you are a restaurateur, then the geo-location features in Foursquare are a logical choice. Not all businesses can make best use of the same solution. If you keep that in mind, then this exercise becomes vastly easier.</p>

<div class="seeds-cs">

<h3>Retail Store Case Study:</h3>

<p>Let&#8217;s say you are a boutique retailer with seven physical storefronts in the northeast corridor:</p>

<p>In the past, your website has been used as a brochure site to introduce new product lines, present new sales and promotions, and sell your "high-end" brand image to prospective customers. It&#8217;s sexy and well designed, but it doesn&#8217;t "work" to achieve your business goals. Your promotions are falling flat and your users don&#8217;t know they are there. How do you spread the word, and ultimately get more feet in the stores? Luckily for you, this problem is at the core of some of today&#8217;s most interesting and powerful applications, like Foursquare.</p>

<p>Using <a href="http://foursquare.com/business/venues"><span>Foursquare&#8217;s Merchant Platform</span></a>, you can create your own badges, campaigns and offers for users who check into your physical locations. This allows you to incentivize customers to come into your stores with exclusive, geo-targeted offers.</p>

<p>You could create a "Super Shopper Badge" using the Merchant Platform. Set the unlock requirements to: "Check in at all seven of our locations, or at one location 20 times." Then, you would create a series of location-<a href="http://support.foursquare.com/entries/195165-what-is-a-special"><span>specific offers, that are only available to users with this achievement unlocked</span></a>. If I were shopping in a mall where one of your stores is located, I would see a call to action pointing out an offer nearby and noting that this offer is for anyone with a "Super Shopper Badge." I would want to see what type of product I could get if I qualified for this offer. You can modify the terms of these offers to target first-time shoppers (first check-in), or the "Mayor" (most check-ins at one location). These features elicit buying behavior almost immediately. They not only prompt return visits from current customers (in hopes of unlocking achievements), they also give potential customers a powerful incentive and reward your most devoted customers.</p>
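
<p>To make the unlock rule concrete, here is a small Python sketch of the same condition. It is an illustration only: the function name, the store ids and the idea of holding check-ins as a plain list are assumptions, and on Foursquare the platform itself would evaluate the rule. A check like this could be useful if you mirror check-in data on your own site.</p>

```python
from collections import Counter
from typing import Iterable, Set

VISITS_AT_ONE_LOCATION = 20   # from the rule: "one location 20 times"


def has_super_shopper_badge(checkins: Iterable[str], all_locations: Set[str]) -> bool:
    """Return True if the check-ins satisfy the hypothetical badge rule.

    checkins holds one location id per check-in; ids that are not ours are ignored.
    """
    counts = Counter(c for c in checkins if c in all_locations)
    visited_everywhere = len(counts) == len(all_locations)
    loyal_to_one = any(n >= VISITS_AT_ONE_LOCATION for n in counts.values())
    return visited_everywhere or loyal_to_one


# Seven storefronts, as in the case study; the ids are made up.
stores = {"store-%d" % i for i in range(1, 8)}

tourist = sorted(stores)            # one check-in at each of the seven stores
regular = ["store-3"] * 20          # twenty check-ins at a single store
newcomer = ["store-1", "store-2"]   # not there yet
```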

</div>

<p>Once you have customers in the store, buying your product, the next step is to enable users to share with others&#8212;spread the word to non-Foursquare users. How do you go about enabling users to share? Achieving this would, once again, require creative implementation of social media within your own architecture. Possibly a leader-board style user-tracking feature on your site, or a mobile mashup application that is a <a href="http://foursquare.com/apps/"><span>branded tool to engage new users</span></a>. Then, you could design campaigns and other social media initiatives around this "Super Shopper" campaign. Once you have a stable platform in place, the "like," "share" and "follow" buttons are back in the conversation. Now they have a job&#8212;a purpose that will give your campaign "legs" in the bigger social media market.</p>

<h2 class="section-head"><span>O</span>kay, I have some ideas, but what&#8217;s next?</h2>

<p>Unlike the "like" button (no pun intended), more robust social features are not a 20-minute task. Ultimately, this is all a pipe dream without the technical know-how for implementation. These social platform APIs can evolve rapidly, and the changes range from incremental updates to a switch in the underlying technology (for example, <a href="http://developers.facebook.com/docs/guides/upgrade/"><span>Facebook changing from REST to Graph API</span></a>) or, in the case of <a href="http://www.guardian.co.uk/technology/blog/2011/mar/14/twitter-developers-client-warning"><span>Twitter&#8217;s recent announcement</span></a>, a drastic change to the terms of service for API use. To stay ahead, you need an experienced team (in-house or vendor) that understands the functionality available, as well as the landscape for third-party applications on these social platforms.</p>

<p>To be successful, you must have creative thinking on both the code and the design side. You need it from marketing (sets the company marketing goal) to the designer (creates the experience to realize the goal); and from the designer to the developer (tech know-how to implement). Marketers have to engage their designers and developers to be successful. At the end of the day, you are adding social media to your site to drive new business. Without that goal in mind there is no ROI and the whole initiative falls flat. It&#8217;s not a cakewalk, but the technology is out there. In this article, we have only skimmed the surface of what can be done. Marketers have to get educated about everything social media has to offer before beginning their three-way conversations with designers and developers.</p>

<p>Once you have done your homework, <em>the fun can begin!</em></p>
]]></content:encoded>
            <pubDate>Tue, 22 Mar 2011 15:17:56 GMT</pubDate>
        </item>
        <item>
            <title>Breaking Social Dependency</title>
            <link>http://omniti.com/seeds/breaking-social-dependency</link>
            <guid>http://omniti.com/seeds/breaking-social-dependency</guid>
            <description><![CDATA["OMG, Facebook is DOWN!" That was the cry from millions when Facebook was unavailable for about three hours because of network issues. Given the nature of Facebook&#8217;s service, the downtime did not have any long-lasting effects on its user base. In...]]></description>
            <content:encoded><![CDATA[<p>"OMG, Facebook is DOWN!" That was the cry from millions when Facebook was unavailable for about three hours because of network issues. Given the nature of Facebook&#8217;s service, the downtime did not have any long-lasting effects on its user base. In fact, some say that productivity significantly increased during the three-hour window without access to Facebook. The bottom line is: the unavailability of the social networking service doesn&#8217;t negatively impact its users (ego and reputation of the service aside). Does this also hold true for the companies leveraging Facebook, or other social networks such as Twitter, Flickr and Foursquare, in their daily operations?</p>

<p>Today, more and more companies operating online businesses try to break into the social media realm by leveraging existing services to increase visibility and loyalty to their brand and bring more people to their sites (and consequently, increase the conversions, visits, purchases or participation). I&#8217;ve seen many incarnations of social networking implementations, from the basic, simplified authentication with Facebook Connect augmenting the regular process (for ease of registration/login), to full-blown applications relying heavily upon multiple features available from these services&#8217; APIs. Now, personally, I am all for having these services available and used strategically throughout the applications. It provides a tremendous benefit not only in brand familiarity and content, but also in cost saving&#8202;&#8212;&#8202;you&#8217;re leveraging years of someone else&#8217;s work for your gain. Consider Flickr. The storage, CDN and REST APIs to present the assets have all been developed and tested for you by a number of smart engineers for a number of years; all you need to do is to integrate the functionality within the content of your site. The same services are available to everyone, and you make the business decision about which features would be beneficial to your company&#8217;s strategy. The implementation of the features, however, varies significantly.</p>

<p>One of the major risks when implementing a third-party service is the reliance upon the availability of that service&#8202;&#8212;&#8202;one that you have no control over. And, no matter how large or successful that service is, it will go down at one point or another. Twitter, as an example, is well-known for intermittent service degradation, often followed by noticeable outages. Now, imagine what happens during the Twitter downtime if your site&#8217;s content relies heavily on the Twitter API alone.</p>

<p>Let&#8217;s examine a situation where a large online media company decided to switch to Facebook Connect as the exclusive authentication method for their site. (To prevent the discussion about the viability of this choice, let me just note that there were legitimate business reasons for choosing this approach.) This is where the fun starts. The graph below represents HTTP load time for the pages on the site at every stage of the process. Even without the captions on the graph, everyone should be able to pinpoint the exact time when the new code was deployed and the load time of the pages tripled. The project owners were notified, but since the load times were extremely low to begin with (thank you, properly implemented caching), the load speed was deemed acceptable, and the changes remained in production. Time passed. And then some more time passed. And then the dark day came: the day when Facebook went down. The page load times on the media site tripled again for a very brief period of time (while Facebook servers were just lagging), and then dropped to 0, i.e. "users are unable to see the site." Just like that, Facebook&#8217;s problem became the company&#8217;s problem.</p>

<img src="http://images.omniti.net/omniti.com/i/b/facebook-connect.jpg"/>

<p>Upon closer code investigation, the problem was identified and resolved quickly, and the fix also brought the page load time back to its original level as a byproduct. Still, the incident shows how dependent your site can become upon third-party service availability if the features are not implemented correctly.</p>

<p>How can these issues be avoided? There are a few common sense rules that, for some reason, are often ignored during development, which should help with the integration of external services without affecting your site&#8217;s performance.</p>

<ol>
<li><em>Only connect to a third-party service where needed.</em>
<p>Don&#8217;t try to connect to Facebook on every page load to validate that the user is still the user to whom you displayed the previous page. Cache the results locally.</p>
</li>

<li><em>Don&#8217;t make connections to a third-party service in the critical path of the page load.</em>
<p>Don&#8217;t load Google Analytics as the first thing on your page; you will delay the display of the content that actually matters. Make the connections after your content is loaded, or better yet, connect asynchronously.</p>
</li>

<li><em>Trap time-outs and errors.</em>
<p>You do it with your database connections; why would you treat external connections differently?</p>
</li>

<li><em>Create a fallback plan.</em>
<p>You have no control over external services, but you do have control over the content presented to your users. If a Flickr feed is an essential feature of your site, store the displayed history locally, so you can fall back to the latest available content in case Flickr is unavailable. Remember, sometimes stale content is better than no content at all.</p>
</li>
</ol>
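
<p>Rules 1, 3 and 4 can be sketched together in a few lines of Python. This is an illustration under assumptions, not a drop-in implementation: <code>CachedFeed</code> and <code>flaky_fetch</code> are hypothetical names, the cache is in memory only, and the fetch function is assumed to honor the timeout it is given and to raise an exception on failure. Rule 2 (keeping the call out of the critical path) is not shown.</p>

```python
import time
from typing import Any, Callable, Optional, Tuple


class CachedFeed:
    """Wraps an unreliable fetch function with a TTL cache and a stale fallback."""

    def __init__(self, fetch: Callable[[float], Any], ttl_seconds: float = 300.0,
                 timeout_seconds: float = 2.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch              # hypothetical: called as fetch(timeout_seconds)
        self._ttl = ttl_seconds
        self._timeout = timeout_seconds
        self._clock = clock              # injectable so the demo can control time
        self._cached: Optional[Tuple[float, Any]] = None   # (fetched_at, value)

    def get(self, default: Any = None) -> Any:
        now = self._clock()
        # Rule 1: only connect when needed; serve the cached copy while it is fresh.
        if self._cached is not None and self._ttl > now - self._cached[0]:
            return self._cached[1]
        try:
            # Rule 3: the call is bounded by a timeout and any error is trapped.
            value = self._fetch(self._timeout)
        except Exception:
            # Rule 4: fall back to stale content, or to a default if we have nothing.
            return self._cached[1] if self._cached is not None else default
        self._cached = (now, value)
        return value


calls = {"n": 0}


def flaky_fetch(timeout):
    # Stand-in for a remote photo feed: works once, then times out.
    calls["n"] += 1
    if calls["n"] > 1:
        raise TimeoutError("photo service did not answer in time")
    return ["photo-1", "photo-2"]


fake_now = [0.0]
feed = CachedFeed(flaky_fetch, ttl_seconds=60, clock=lambda: fake_now[0])
first = feed.get()      # real fetch
fake_now[0] = 30.0
second = feed.get()     # served from cache, no connection made
fake_now[0] = 120.0
third = feed.get()      # cache expired and the service is down: stale copy
```

<p>With a real service, the fetch function might wrap <code>urllib.request.urlopen(url, timeout=timeout)</code>; the important part is that a slow or failed call costs your users at most one bounded wait and never the page itself.</p>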

<p>To make a blanket statement&#8202;&#8212;&#8202;don&#8217;t jump into using social media features without first identifying a need for them, and use them to support your primary business model. At the end of the day, when integrating any third-party service, you are trying to leverage the benefits of the available functionality to enliven the experience for your own users, not to inherit the services&#8217; availability problems. Integrate smartly, not blindly.</p>]]></content:encoded>
            <pubDate>Mon, 14 Mar 2011 15:47:24 GMT</pubDate>
        </item>
        <item>
            <title>On the Engineering of SaaS</title>
            <link>http://omniti.com/seeds/from-making-software-to-running-saas</link>
            <guid>http://omniti.com/seeds/from-making-software-to-running-saas</guid>
            <description><![CDATA[Software has been around for a long time in various forms: open and closed, commercial and non-commercial. The one thing that holds true about software products is that you, as a consumer, have to acquire them, install them and operate them. &nbsp;For ...]]></description>
            <content:encoded><![CDATA[<p>Software has been around for a long time in various forms: open and closed, commercial and non-commercial. The one thing that holds true about software products is that you, as a consumer, have to acquire them, install them and operate them. &nbsp;For the past several years, there has been an industry movement away from providing software in this traditional sense and instead providing the use of the software as a service (SaaS). SaaS has been around in many forms. Many companies (and investors) have recognized the <a href="http://www.cooley.com/files/uploads/KippsCooley/kipps0909.html"><span>opportunities that SaaS provides as a business model</span></a>, but transitioning to it from a standard software development model requires a lot more than an executive decision. Herein I&rsquo;ll try to lend some insight into what&rsquo;s in store for you as you transition from a software company into a SaaS company.</p> 
<h3>1. A customer of one.</h3> 
<p>Typical software engineering processes are well-evolved and quite rigorous. They are designed to ensure that the product you release and ship around the world will boast minimal defects and incur as little as possible in the way of defect handling via patching or upgrading. While it may not be extraordinarily difficult to package the next version of your product, you must deal with making the installation/upgrade process as fool-proof as possible or you risk leaving customers stranded mid-upgrade. Getting the entire customer-base to upgrade to the latest version in a reasonable fashion is intractable and the more rapidly you release your product, the more frustrated customers become and the more unique versions you have to support &ldquo;in the wild&rdquo;.</p> 
<p>SaaS engineering couldn&rsquo;t be more different. Why? The typical software product driving a SaaS architecture has exactly one customer: you. You have one version of the product in production and it has to work all the time. An upgrade process, for example, is an entirely different beast. Making it robust and repeatable is far less important than making it quick and reversible. This is because the upgrade only ever happens once: on your install. Also, it only ever has to work right in one exact variant of the environment: yours. And while typical customers of software can schedule an outage to perform an upgrade, scheduling downtime in SaaS is nearly impossible. So, you must be able to deploy new releases quickly, if not entirely seamlessly &#8212; and in the event of failure, roll back just as rapidly.</p> 
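<p>One way to picture "quick and reversible" is a switch that only records which already-installed release is live. The Python sketch below is a toy model under that assumption: <code>ReleaseSwitch</code> and the release names are hypothetical, and in a real system the switch might be a symlink or a load-balancer setting rather than a list in memory.</p>

```python
from typing import List, Optional


class ReleaseSwitch:
    """Tracks which installed release is live; going live is a pointer change."""

    def __init__(self) -> None:
        self._history: List[str] = []   # releases that have been live, oldest first

    @property
    def live(self) -> Optional[str]:
        return self._history[-1] if self._history else None

    def activate(self, release: str) -> None:
        # The release is assumed to be fully installed beforehand, so making it
        # live is this single cheap step rather than a long upgrade procedure.
        self._history.append(release)

    def rollback(self) -> Optional[str]:
        # Reverting is as cheap as activating: drop the live release and the
        # previous one is live again. The first release is never dropped.
        if len(self._history) > 1:
            self._history.pop()
        return self.live


switch = ReleaseSwitch()
switch.activate("release-41")
switch.activate("release-42")        # deploy
after_deploy = switch.live
after_rollback = switch.rollback()   # release-42 misbehaves: revert at once
```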
<p>You will find that your needs in operating the product will have a tremendous impact on the engineering roadmap. Interestingly, you will likely find that the features you now incorporate into the product should have been on the roadmap in the first place, but you lacked the insight or foresight because you were not responsible for operating the product in a production setting. From here on out, while you build the service for your users, you build the underlying software products for a customer of one.</p> 
<h3>2. You aren&rsquo;t a software company anymore.</h3> 
<a href="http://omniti.com/writes/web-operations"><span><img style="margin-left:0.5em;margin-bottom:0.5em;float:right;width:198px" src="http://s.omniti.net/i/content/books/web-operations-198.gif"></span></a> 
<p>You aren&rsquo;t a software company anymore; you are an operations company. Software as a Service is much more about service than software. In fact, the users of your service will be just as satisfied thinking that magic pixies power the service they use as some complex software system. With this change comes a rather intimidating shift in expectations. Users expect software to have bugs; they expect to schedule downtime to upgrade, install, back up or otherwise manage the software product they are operating. With a service, however, there is a strong predisposition of users to expect things to be &ldquo;always on.&rdquo; As a simple analogy, if you sell a user a diesel generator, they will expect it to need maintenance, need refueling and have the occasional service issue. Sell them electrical service and watch them come with pitchforks demanding refunds if you have an outage of any sort.</p> 
<p>While this may seem silly at first, the expectation isn&rsquo;t out of line. It&rsquo;s a simple bit of economies of scale. Your job as a SaaS company is to operate the software, so logically you should do a better job than they would. Additionally, you are operating it for a large set of users, so it is a reasonable expectation that you have refined your operational techniques. Lastly, they pay you for one thing: to operate the service &#8212; so you had better get it right.</p> 
<p>Working as an engineering company with an operations focus rather than a product focus can be a significant challenge for traditional software engineering companies. &nbsp;You should expect to see roles removed, roles introduced and organizational structure changed to add accountability for operating your service as your users expect.</p> 
<h3>3. Continuous Deployment</h3> 
<p>One of the greatest advantages of being a customer of one for your software is that you don&rsquo;t have to worry about the oddball deployment or &ldquo;that guy&rdquo; that refuses to upgrade. &nbsp;It means that once you&rsquo;ve deployed the latest version of code into production, you have no legacy copies, no troubleshooting of version differences and a definitively less complicated error reporting process. This, however, can cause a paradigm shift in development and deployment processes. It means that you can have a bug report at 8 a.m., a fix by 8:15 a.m. and a deployment by 8:20 a.m. Traditional software engineering companies have no other word to describe this but &ldquo;insane.&rdquo; It might seem reasonable to simply elect not to subscribe to that pattern of behavior due to the risks involved, but there is weakness in that stance.</p> 
<p>In the era of SaaS, companies have engineered processes to successfully manage the risks of rapid deployment schedules (<a href="http://omniti.com/seeds/online-application-deployment-reducing-risk"><span>OmniTI</span></a>, <a href="http://timothyfitz.wordpress.com/2009/02/10/continuous-deployment-at-imvu-doing-the-impossible-fifty-times-a-day/"><span>IMVU</span></a>). What was once a patch release every two weeks can now be managed as hundreds of patch releases per day (in the extreme case). By carefully engineering risk out of the deployment process, a SaaS company gains agility to launch fixes, improvements and features into production at any time. If your competition can do this and you cannot, you are disadvantaged.</p> 
<p>While it may take considerable effort to redefine your engineering processes to adequately limit risk and allow for continuous deployments, the advantages are significant. Due to the velocity of deployments that must be supported, the process of deploying itself must be engineered to be non-disruptive to services. This alone has the side effect of enabling feature launches, upgrades and triage without consequential downtime. It is the first step toward an &ldquo;always on&rdquo; architecture.</p> 
<h3>4. Quality Assurance is now a continuous process.</h3> 
<p>Quality assurance has a strong role in software engineering. While there is much effort expended automating QA, an automated QA process is sufficient only if your service is used only by automated systems. If humans consume your service, you must also have human-driven QA processes. So, while much of the QA process can be automated and performed rapidly, automation will never replace human usage of the application to detect both errors and perceived errors. In SaaS systems, the velocity of user-facing change is (at least) an order of magnitude higher than in traditional software engineering. It is inevitable that bugs will not only appear, but that they will reappear. Performing a full QA regression prior to each release is often infeasible.</p> 
<p>Your users are just as much members of your QA team as your employees. By making your users aware of that, by treating their feedback, complaints, bug reports and feature requests as first-class items, you enable them to improve your QA process and, more importantly, increase their tolerance for your mistakes. John Martin has a short, but enlightening, diatribe about the <a href="http://buildingsaas.typepad.com/blog/2006/08/highmetabolism_.html"><span>quintessential difference between QA in traditional environments and SaaS</span></a>.</p> 
<p>Perhaps the single most significant change to embrace is that of QA&rsquo;s place. What was once an engineering phase, a deliverable, or a series of bars on a project manager&rsquo;s Gantt chart (ultimately leading to a celebratory day of shipping a product release) is now a continuous and critical operational role within a continually delivered and continually used service.</p> 
<h3>5. Multi-tenancy design.</h3> 
<a href="http://omniti.com/writes/scalable-internet-architectures"><span><img style="margin-left:0.5em;margin-bottom:0.5em;float:right;width:198px" src="http://s.omniti.net/i/content/books/scalable-internet-architectures-198.gif" /></span></a> 
<p>So far, we&rsquo;ve discussed mostly process changes that enable transforming from a builder of software to an operator of software. The last paradigm shift is perhaps the hardest as it relates to design philosophy and design goals rather than design processes.</p> 
<p>When traditional software is designed, it runs on a system or set of systems for a single user. While a &ldquo;user&rdquo; in this sense can be an individual, or a business unit or perhaps even a whole organization, it is clearly not &ldquo;all users.&rdquo; It is the difference between engineering a car and engineering a complete metropolitan transit system. It is an issue of designing at scale.</p> 
<a href="http://www.amazon.com/gp/product/0137030428?tag=akpa-20"><span><img src="http://images.omniti.net/omniti.com/i/b/the-art-of-scalability-188.jpg" style="margin-right:0.5em;margin-bottom:0.5em;width:188px;height:250px;float:left" /></span></a>
<p>Not only does this mean designing and building software that can handle thousands of times the load that your previous design enabled, but also engineering the solution to malfunction elegantly. Malfunction elegantly? Yes. All human-engineered products will malfunction; it is a simple fact of life. In a SaaS, it is essential that when this happens, the malfunction is isolated to the smallest possible component of the service or to a specific customer. Back to our transit metaphor: the failure of a single bus, subway train or taxi must adversely affect as few users as possible; ideally, only those physically on the failed unit. This consideration is simply (and obviously) not present in the design of a single car.</p> 
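<p>As a rough illustration of isolating a malfunction to one customer, the Python sketch below runs each tenant&rsquo;s work behind its own failure counter. The names (<code>TenantIsolator</code>, the tenant ids, the failure threshold) are hypothetical, and a production design would also need recovery, alerting and per-tenant resource limits; the sketch only shows that one tenant&rsquo;s repeated failures do not touch another tenant.</p>

```python
from typing import Callable, Dict, Set


class TenantIsolator:
    """Runs work per tenant and quarantines a tenant after repeated failures."""

    def __init__(self, max_failures: int = 3) -> None:
        self._max_failures = max_failures
        self._failures: Dict[str, int] = {}   # consecutive failures per tenant
        self.quarantined: Set[str] = set()

    def run(self, tenant: str, work: Callable[[], str]) -> str:
        if tenant in self.quarantined:
            # Fail fast for this tenant only; everyone else is unaffected.
            return "unavailable"
        try:
            result = work()
        except Exception:
            count = self._failures.get(tenant, 0) + 1
            self._failures[tenant] = count
            if count >= self._max_failures:
                self.quarantined.add(tenant)
            return "error"
        self._failures[tenant] = 0   # a success resets the counter
        return result


def broken():
    # Stand-in for a request that hits corrupt data belonging to one tenant.
    raise RuntimeError("corrupt data for this tenant")


isolator = TenantIsolator(max_failures=2)
results = [
    isolator.run("acme", broken),
    isolator.run("globex", lambda: "ok"),
    isolator.run("acme", broken),
    isolator.run("acme", lambda: "ok"),     # acme is quarantined by now
    isolator.run("globex", lambda: "ok"),   # globex never noticed
]
```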
<p>The engineering paradigm shift from a single-user product to a multi-tenancy product is the most challenging metamorphosis required by a software company that intends to adapt and survive in the SaaS era. Two books that talk about the underlying mechanics of these challenges are <a href="http://www.amazon.com/Scalable-Internet-Architectures-Theo-Schlossnagle/dp/067232699X"><span>Scalable Internet Architectures</span></a> (written by me) and <a href="http://www.amazon.com/gp/product/0137030428?tag=akpa-20"><span>The Art of Scalability</span></a> by Abbott and Fisher.</p> 
<h3>Making good on a promise</h3> 
<p>While you may not have made a promise about what your SaaS offering will provide, the industry has set some undeniable expectations about what SaaS generally delivers. &nbsp;At a minimum, you must meet these expectations or your users will abandon you. These expectations are naturally derived from the key drivers for adopting SaaS: no maintenance, no upgrades, always current, always available, no commitment (or the desire for operational expenses over capital expenses). It is imperative that you understand where the bar is set for those wishing to shift into a SaaS delivery model.</p> 

 
<p>With the exception of software companies that produce software that powers SaaS, most traditional software companies must evolve into a SaaS delivery model or suffer death at the hands of competition. &nbsp;Evolving into a SaaS delivery model without addressing the above key points will lead to substandard service, artificially high operating costs, user attrition and eventual collapse. &nbsp;You have to do it. You have to do it right. Are you ready?</p> 
]]></content:encoded>
            <pubDate>Tue, 01 Mar 2011 16:15:28 GMT</pubDate>
        </item>
        <item>
            <title>Maintainable Stylesheets: Can CSS Be Object-Oriented?</title>
            <link>http://omniti.com/seeds/maintainable-stylesheets-can-css-be-object-oriented</link>
            <guid>http://omniti.com/seeds/maintainable-stylesheets-can-css-be-object-oriented</guid>
            <description><![CDATA[How can CSS be object-oriented? In short: it can&rsquo;t. But if we stopped there, then this would be a pretty short article! Let&rsquo;s take a look at what is meant by the term &ldquo;Object-Oriented CSS&rdquo; and how it can help improve your styles...]]></description>
            <content:encoded><![CDATA[<p>How can CSS be object-oriented? In short: it can&rsquo;t. But if we stopped there, then this would be a pretty short article! Let&rsquo;s take a look at what is meant by the term &ldquo;Object-Oriented CSS&rdquo; and how it can help improve your stylesheets.</p>

<h3>What&rsquo;s the Problem?</h3>

<p>First, we should take a step back and look at the problems that lead to CSS being difficult to maintain. Most projects start out easily enough. We write some styles for page structure, some default content styles, then some specialty content styles, right? We always ensure that our styles are separate from our markup and that our markup is clean and semantic, thanks to the web standards movement. When the project launches, we feel good about our work and don&rsquo;t think much about code maintainability. After all, it&rsquo;ll be easy to maintain because we wrote the code.</p>

<p>The first round of updates comes and it&rsquo;s not too difficult. The stylesheets are still fairly small and easy to navigate, so the appropriate styles are found and updated. Some new styles are added because the new content is a little different than the existing content. Soon, a second round of updates is made, then a third. Each time, a few more styles are added to the stylesheets. Then we&rsquo;re taken off the project for one of a hundred reasons, and someone else begins maintaining the site. Or perhaps we simply don&rsquo;t touch the site for a year and then have to come back and make some updates; after such a long interval, we don&rsquo;t remember as much about the code as we think we do. In either case, it&rsquo;s difficult to understand the mix of ids, classes and element names, so more style declarations are added, overriding previous styles where necessary to get the desired effect.</p>

<p>Before long, some styles become defunct and other styles become unnecessarily complex as the stylesheets become overly complicated. We&rsquo;ve ended up with spaghetti code, and the site&rsquo;s stylesheets now take longer to download as they get more bloated. How do we get out of this mess? Is there a way to create truly maintainable stylesheets?</p>

<p>Yes, there is a way, and it&rsquo;s recently come to be known by the term &ldquo;<a href="http://www.stubbornella.org/content/category/general/geek/css/oocss-css-geek-general//"><span>Object-Oriented CSS</span></a>,&rdquo; thanks to Nicole Sullivan. Style declarations are not objects in the programming sense of the word, but the term &ldquo;Object-Oriented&rdquo; refers to a way of thinking about how styles should be written and applied.</p>

<h3>Keep Your Styles Where They Belong</h3>

<p>Before we begin, we need to lay down a ground rule for keeping our styles maintainable: keep styles in the stylesheets. This means two things:</p>
<ol>
  <li><h4>Styles don&rsquo;t belong in markup.</h4> Nowadays, we usually are pretty good about this, but it&rsquo;s always a good reminder. Don&rsquo;t succumb to the temptation to insert a quick style attribute here and there. It seems innocuous now, but we may regret it later.</li>
  <li><h4>Styles don&rsquo;t belong in Javascript.</h4> This may or may not be up to us depending upon how the division of labor happens where we work. But if JS is on our plate along with CSS&hellip;great! That means we have the power to ensure that each site&rsquo;s behavior (JS) is separated properly from its styling. Javascript should <em>almost never</em> be used to manually modify styles on an element&mdash;the lone exception to this is when a style value has to be calculated based upon other information. If we set styles via JS, what happens when we need to change those styles in the future? We&rsquo;ll have to go dig through the JS and find every place those styles are being changed, instead of just making some simple stylesheet edits.</li>
</ol>

<p>If we can&rsquo;t edit styles with Javascript, how <em>should</em> JS interact with styles? The best method is using JS to modify elements&rsquo; classes. As we&rsquo;ll see below, classes are the keys to elements&rsquo; identities and run-time states. By using Javascript only for modifying class names, all of the actual styling is handled by the stylesheets, making styles easy to find and modify.</p>
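
<p>As a minimal sketch of that approach, the helpers below change only an element&rsquo;s <code>className</code> and never its <code>style</code>. The names <code>addClass</code> and <code>removeClass</code> are hypothetical, the <code>active</code> class is assumed to be defined in the stylesheet, and the code assumes a browser with ES5 array methods; most JS libraries ship equivalents.</p>

<pre><code>// Hypothetical helper: return the element's class names as an array,
// splitting on whitespace and dropping empty entries.
function classNames(el) {
  return (el.className || '').split(/\s+/).filter(function (name) {
    return name !== '';
  });
}

// Add a class name if it is not already present.
function addClass(el, name) {
  var names = classNames(el);
  if (names.indexOf(name) === -1) {
    names.push(name);
  }
  el.className = names.join(' ');
}

// Remove every occurrence of a class name.
function removeClass(el, name) {
  el.className = classNames(el).filter(function (existing) {
    return existing !== name;
  }).join(' ');
}

// Usage in a browser (the element ID is illustrative):
// addClass(document.getElementById('zoom_out'), 'active');
</code></pre>

<p>Because the script only names a state, changing what &ldquo;active&rdquo; looks like later is a stylesheet edit, not a JS edit.</p>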

<h3>And Now For Something Completely Different&hellip;Sort of</h3>

<p>When we discuss this new, &ldquo;Object-Oriented&rdquo; mindset, there&rsquo;s one key word that will be our mantra: &ldquo;reusability.&rdquo; OK, there are really two key words: &ldquo;reusability&rdquo; and &ldquo;patterns,&rdquo; so I guess you could say we&rsquo;re looking for &ldquo;reusable patterns&rdquo; here.</p>

<p>If we start thinking about creating &ldquo;reusable patterns&rdquo; at the beginning of a project, it&rsquo;s a lot easier than going back and fixing a site that has already launched. But it still may be worthwhile to revisit existing sites&rsquo; code. Putting in the work now to make our stylesheets maintainable will not only save us future headaches, but it will also speed up our sites since the CSS files will be smaller and will download more quickly.</p>

<p>Before writing any code, we should pull out our layout comps and review them, looking for patterns. The patterns could be large and easy to spot like column arrangements or page block arrangements, or the patterns could be small like a box for entering login information. The patterns could be structural like column or block layouts, or the patterns could be stylistic like font and color choices. But finding the patterns is only the first step. When thinking about how to style these patterns, we need a paradigm shift in our approach. Instead of writing style selectors using IDs and element names, we will use classes to describe the elements&rsquo; identities and states. As a simple example of this, we&rsquo;ll look at a small toolbar of zoom controls.</p>

<p>When thinking about how to style a zoom toolbar, our first inclination might be to style the entire toolbar using its ID <code>#zoom_bar</code> and then style the individual controls using their IDs or element names, either <code>#zoom_bar #zoom_out</code> or <code>#zoom_bar button</code>. But we should stop and think about this for a moment. If we use IDs, then these styles are only applicable to this particular instance of the toolbar. If we want to reuse the styles we have to add another set of selectors, giving us <code>#zoom_bar button, #another_bar button</code>. Now we have two sets of selectors and will probably need to add more in the future. And what if we&rsquo;re styling link states? <code>#zoom_bar a:hover, #zoom_bar a:active</code> would turn into <code>#zoom_bar a:hover, #zoom_bar a:active, #another_bar a:hover, #another_bar a:active</code>. Writing styles this way quickly results in bloated stylesheets with way too many selectors.</p>

<h3>Decouple Your Styles</h3>

<p>Instead of styling elements according to &ldquo;what we call them,&rdquo; let&rsquo;s shift our thinking to &ldquo;reusable patterns&rdquo; and style elements according to two things: their <em>identities</em> and their <em>run-time states</em>. Now, what is the core identity of the zoom toolbar? Zooming is what it &ldquo;does,&rdquo; not what it &ldquo;is.&rdquo; Its core identity is that of a toolbar, or perhaps more generically, &ldquo;a group of buttons.&rdquo; So what if we give it the class <code>.button_set</code>? This refers to its generic identity and is very reusable. As for the controls inside, we don&rsquo;t use their element names either (like &ldquo;button&rdquo;), but instead we create a class based on their identity, which could be as simple as <code>.button</code>. This decouples an element&rsquo;s styling from its markup; now we can use <code>.button_set .button</code> as our selector, avoiding the high specificity of an instance ID that cannot be reused. This also gives us the flexibility to use buttons in our zoom toolbar or to change the elements in other (or future) instances of <code>.button_set</code>. What if we encounter an issue that makes us change all the buttons to anchors? By using an &ldquo;identity class&rdquo; and decoupling the styles from the markup, all we need to do is change the markup, not the stylesheets.</p>

<p>The other type of class we will create for our <code>.button_set</code> describes the elements&rsquo; <em>run-time states</em>. In this instance we may have three different states: the default state, an active state and an inactive state. Creating classes for these states is as easy as creating <code>.active</code> and <code>.inactive</code> classes. Notice that these class names only describe the elements&rsquo; states, not what the elements look like or where they are positioned. Again, this decouples the markup from the styles, making the class selectors generic and extremely reusable.</p>

<p>When making this paradigm shift, we need to be careful to only use classes when defining our patterns&mdash;don&rsquo;t fall back into the trap of using IDs for styling. It&rsquo;s easy to relapse, but because of IDs&rsquo; <a href="http://www.stuffandnonsense.co.uk/archives/css_specificity_wars.html"><span>high specificity</span></a>, we should be especially vigilant about not using them for styling except to define occasional exceptions. And even then we should always ask ourselves, &ldquo;Can I take this exception and make it reusable?&rdquo; If we have to make an exception once, chances are good that we&#39;ll have to make it again later, and then we&#39;ll have two exceptions instead of one reusable pattern.</p>

<p>Please note that one ramification of decoupling our styles is that we may have to use multiple classes on a single element, making Internet Explorer 6 tricky to work with. In general, we should be able to work around this issue by adding concurrent classes onto different elements. But if we <em>must</em> support IE6 with multi-class selectors, we have two options: we can either use a Javascript fix (like <a href="http://code.google.com/p/ie7-js/"><span>IE7-JS</span></a>), or we can create &quot;joining classes&quot; which join together the styles from the individual declarations (e.g. create <code>.one_two</code> instead of using <code>.one.two</code>).</p>

<h3>Our Future Selves Will Thank Us</h3>

<p>So to recap: Object-Oriented CSS is a phenomenal approach to writing stylesheets, even if it has very little to do with traditional Object-Oriented-ness. However, one thing that all CSS <em>does</em> have in common with object-oriented programming is inheritance. Always remember to take advantage of inheritance! If we find ourselves wanting to add the same class or style to several elements, we should make sure it is absolutely necessary; we may be able to get the same results by adding the class or style to a parent element.</p>

<p>Taking advantage of inheritance&hellip;creating reusable patterns&hellip;decoupling our styles&hellip;it&#39;s pretty easy to see the benefits of using this approach on an entire site. Gone are the layered styles, on top of styles, on top of styles. Gone is the need to overwrite some previous style to get the effect we need. With a little careful planning, we all can have lean, fast stylesheets that make maintenance a breeze.</p>]]></content:encoded>
            <pubDate>Thu, 24 Feb 2011 16:53:29 GMT</pubDate>
        </item>
        <item>
            <title>Instrumentation and Observability</title>
            <link>http://omniti.com/seeds/instrumentation-and-observability</link>
            <guid>http://omniti.com/seeds/instrumentation-and-observability</guid>
            <description><![CDATA[

There has been considerable momentum established behind a movement called devops. This momentum is good.  There does not appear to be anyone coming out and saying "this whole devops movement is bad and ignorant." So, as one can assume with no notable...]]></description>
            <content:encoded><![CDATA[<img src="http://s.omniti.net/i/content/seasons/engineer-gears.png" width="450" height="250" alt="Gears and stuff"/>

<p>There has been considerable momentum established behind a movement called devops. This momentum is good.  There does not appear to be anyone coming out and saying "this whole devops movement is bad and ignorant." So, as one can assume with no notable adversaries, it stands to reason that the movement is a "good thing."</p>

<p>The devops movement is often thought of as an effort to bring the operations world into the development (software engineering) world. Statements like "A into B" are vague.  Let&#8217;s be clear on the concept: introduce the wisdom and experience from software engineering into the operations realm.  The software engineering world, over its brief history, has established many excellent paradigms including testing, version control, release management, quality control, quality assurance and code review (just to name a few).  These concepts, while they exist in good operations groups, are admittedly far less formalized and could stand some rigor.</p>

<p>So, while one might think we&#8217;d discuss the merits of software engineering principles in operations in this seed, we&#8217;re happy to disappoint. There are plenty of people talking about this already; they are making excellent points and getting their points across.</p>

<p>We&#8217;re here to speak to the other side of the coin. This is Theo at <a href="http://www.devopsdays.org/2010-us/"><span>DevOps Days 2010-US</span></a>, hosted by <a href="http://www.linkedin.com"><span>LinkedIn</span></a> in Mountain View, California:</p>

<video id="video_player" style="margin-bottom:20px;" src="http://images.omniti.net/omniti.com/video/media-assets/infrastructureascode-opt-1.mp4" controls=""  name="media" height="338" width="450"></video>

<p>Operations is not, and has never been, a janitorial service.  Operations crews are responsible for the impossible: it must be up and functioning all the time.  This is an expectation that one can never exceed.  One can argue that we establish SLAs (service level agreements) to bring these expectations within reason, but SLAs are legal terms that articulate allowable downtime, not desired downtime.  Users want services available all the time.  As a result, operations is faced with an impossible task and, amazingly, makes good on unpromised availability more often than not.  Let&#8217;s talk about the <em>not</em>.</p>

<p>Operations is, by definition, the group that operates things.  These "things" encompass the entire technology stack: networking and systems hardware, operating systems, COTS (commercial, off-the-shelf) and open source application software and in-house tools.  Consider the following statement, which seems obvious, but is commonly overlooked: "It is easy to operate software and hardware that is operable."  Many common components in the information technology stack are simply inoperable by our definition.  This is where we get into the meat of things: how does one define operable?</p>

<p>Defining a component as operable is quite simple.  Inevitably, things go wrong.  When things go wrong, they must be understood to be repaired.  Troubleshooting is a zetetic process. To progress, one must ask questions.  These questions must be answered.  This should be plain and obvious to anyone who has ever experienced an unexpected outcome to a situation (technical or not).  So, why is this complicated? To be effective, one must not change the situation while asking the question. This caveat is where things get complicated and fortunes are made.</p>

<p>To observe a situation without changing it is the ultimate achievement. While Heisenberg believed this to be impossible (and we agree), one can achieve a reasonably small disturbance during observation. An excellent example is the classic philosophical question, "If a tree falls and no one is there to hear it, does it make a sound?"  Let&#8217;s think about that question for a moment to better understand impact and side-effect.  Is the sound of the tree falling more or less likely to affect the overall situation than the actual destruction and subsequent felling of the tree?  The problem with many observation systems is that, in order to observe the sound of the tree, they must hew a tree during every instance of observation. We suggest a different approach.</p>

<p>Many systems have critical metrics, which are diverse and specific to the business in question.  For the purposes of this discussion, consider a system where advertisements are shown.  We, of course, track every advertisement displayed in the system and that information is available for query.  Herein the problem lies.  Most systems put that information in a data store that is designed to answer marketing-oriented questions: who clicked on what, what was shown where, etc.  Answering the question, "How many were shown?" is possible but is not particularly efficient.  In order to answer the question, one must hew the tree and wait to hear the sound of its fall.</p>

<p>Instead of asking analytic questions, applications should expose this information as a consequence of normal behavior.  Just as the sound of the tree falling is a natural consequence of the act of hewing, the ad serving system is responsible for tallying the total impressions and exposing that information to those that care.  No significant work need be performed by the application to answer this question, just a pre-calculated response to a simple question. This enables a new way of application observation where witnessing metrics and their changes requires no substantial work by the application.  This paves the way to new types of application monitors (for example high-frequency monitors) that need not worry about altering the situation by observing it.</p>
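
<p>A rough sketch of the idea, assuming a hypothetical ad-serving function named <code>serveAd</code>: the tally is updated as a side effect of serving, so reading it later costs the application nothing more than returning a number.</p>

<pre><code>// Hypothetical in-process tally for an ad server.
var counters = { impressions: 0 };

// Called on the normal serving path; the bookkeeping is a single increment.
function serveAd(adId) {
  counters.impressions += 1;
  return 'ad:' + adId; // stand-in for the real ad markup
}

// Answering "How many were shown?" is just a read of the running total.
function impressionsShown() {
  return counters.impressions;
}
</code></pre>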

<p>Not all questions can be asked before a problem occurs.  This is where observation ends and instrumentation begins.  Instrumenting code allows new questions to be asked and subsequently answered in a running environment.  A system admin or developer may look at a malfunctioning system and think, "How do I recreate this situation in a test environment?"  We ask that question because debugging in production is taboo. If a developer instruments code well, profound knowledge of the problem may be derived without the risk of altering its state.  <a href="http://dtrace.org"><span>DTrace</span></a> is the king of these systems and its adoption across various operating environments is growing.  Nevertheless, no one should argue that they should throw in the towel just because they don&#8217;t have DTrace available to them.  While powerful instrumentation might elude those without DTrace, we&#8217;ve found that we can get most of the way there with careful logging (a poor man&#8217;s instrumentation) and continuously exploring critical metrics to expose for observation.</p>

<p>Many architectural components today provide an HTTP interface, primarily via a REST API.  Use it! Extend the HTTP server to expose critical component metrics via HTTP.  Use JSON, or use the <a href="https://labs.omniti.com/resmon/trunk/resources/resmon.dtd"><span>Resmon XML DTD</span></a>.  In Java, expose metrics via a Bean accessible via JMX. This can be a bit frustrating because Java-centric tools must be used to observe it, so instead, just expose those metrics via a servlet. There is even some free code for that: <a href="http://labs.omniti.com/labs/reconnoiter/browser/trunk/src/java/com/omniti/jezebel"><span>Resmon Java Servlet</span></a> (see Resmon.java and ResmonResult.java).  Exposed metrics can be tracked, trended and alerted on easily using tools like <a href="http://labs.omniti.com/labs/reconnoiter"><span>Reconnoiter</span></a> or <a href="http://circonus.com/"><span>Circonus</span></a>.</p>
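
<p>For components that do not already speak HTTP, a small endpoint is enough. The following is only a sketch using Node&rsquo;s built-in <code>http</code> module; the metric names, the helper names and the port are assumptions for illustration and are not part of Resmon or any tool mentioned above.</p>

<pre><code>var http = require('http');

// Hypothetical metrics, kept up to date by the application as it works.
var metrics = { impressions: 0, requests_in_flight: 0 };

// Serialize the current values; no extra work is done at request time.
function metricsBody() {
  return JSON.stringify(metrics);
}

// Build an HTTP server that answers every request with the metrics as JSON.
function createMetricsServer() {
  return http.createServer(function (req, res) {
    res.writeHead(200, { 'Content-Type': 'application/json' });
    res.end(metricsBody());
  });
}

// To serve the metrics (the port is an arbitrary choice):
// createMetricsServer().listen(8080);
</code></pre>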

<p>Making applications operable means that never again should operations personnel be stuck on the question, "The application appears hung, I wonder what it is doing?" All production code should be prepared to answer questions such as these at any time. "What are you doing?" and "How long is it taking?" are perfectly reasonable questions to ask of any piece of production code and you should demand a prompt and accurate answer.  The resulting metric data is consumable by both dev and ops teams, and even by those teams&#8217; managers.  After all, trending metrics is not just about detecting problems.  It is also fundamental to quantifying success.  This is what it means to be operable.  Software engineers everywhere, please make your software operable!</p>]]></content:encoded>
            <pubDate>Wed, 08 Sep 2010 20:30:52 GMT</pubDate>
        </item>
        <item>
            <title>Fast by default?</title>
            <link>http://omniti.com/seeds/fast-by-default</link>
            <guid>http://omniti.com/seeds/fast-by-default</guid>
            <description><![CDATA[ I recently attended the O&#8217;Reilly Velocity 2010 conference in Santa Clara, CA. For the past two years this conference attracted some of the smartest minds in web performance and web operations; this year did not disappoint.


 ~ James Duncan Davi...]]></description>
            <content:encoded><![CDATA[ <p>I recently attended the O&#8217;Reilly <a href="http://en.oreilly.com/velocity2010"><span>Velocity 2010</span></a> conference in Santa Clara, CA. For the past two years this conference has attracted some of the smartest minds in web performance and web operations; this year did not disappoint.</p>

<img src="http://images.omniti.net/omniti.com/i/b/circonus-at-velocity.jpg" alt="Velocity 2010 Exhibit Floor"/>
<p class="cite" style="text-align:right;font-size:0.75em"> ~ James Duncan Davidson (<a href="http://www.flickr.com/photos/oreillyconf/4729116486/"><span><span>original</span></span></a>)</p>

<p>Several exciting things debuted including <a href="http://opscode.com"><span>Opscode</span></a>'s hosted platform, Yahoo!'s <a href="http://hacks.bluesmoon.info/boomerang/doc/"><span>Boomerang</span></a> and our very own <a href="http://circonus.com/"><span>Circonus Enterprise Platform</span></a>.</p>

<p>Each year, the conference has adopted the mantra: "fast by default." This statement, largely applying to the web operations track, is an excellent theme.  The concept is that speed is feature number one and that your success as an online company is intrinsically tied to how users perceive the performance of your online presence.  This is true; the numbers tell us so.</p>

<p>The interesting part about web performance is that user-perceived performance comes from three separate elements: computation done by the service, computation done by the user and the act of getting data between the two.  Velocity really focuses on the latter two: how do I optimize how content is delivered to my users and optimize how it performs once they&#8217;ve got it? This perspective is incomplete.  Should Velocity change to address all three elements? I say no. The audiences are different, the problems are different and there is no need to mess with a good thing. <a href="http://omniti.com/surge"><span>Surge</span></a>, on the other hand, only concentrates on the first element: server side performance and scalability.</p>

<p>Let&#8217;s face it, the server-side architectures that power today&#8217;s web services are as unique as the services they power.  Each site has its own unique challenges that come with its size, technologies, audience, offering and promises. Not to trivialize the web performance challenge, but the techniques used to increase user-perceived performance in transit and on the client side are largely the same from site to site (clean, small and effective DOM, CSS and Javascript, correct caching, image sprites, HTTP compression, etc.).  However, the server side is where the unique magic happens.</p>

<p>Do you really think that the technology powering Google&#8217;s new Caffeine search indexer could be leveraged easily to help your internal service delivery platform? No. For a user to use your service, in an over-simplified form, they provide some input and receive some output.  Each time they ask a question and expect a result, you must "do some work."  Herein lies the challenge.</p>

<p>In a previous installment "<a href="http://omniti.com/seeds/yslow-to-yfast-in-45-minutes"><span>YSlow to YFast in 45 Minutes</span></a>", I explored reaping low-hanging fruit to achieve user-perceived speed-ups on this very site. The main effort there was to shorten the event horizon to render by removing, shortening and/or parallelizing various assets on which a page depends.  The obvious, but often ignored, part is that it all starts with a single request: "the page."</p>

<p>On the OmniTI web site, there is very little going on and as such, you&#8217;d expect that very little time is spent on our end "doing work" to give you the page content.  If you <a href="http://omniti.com/i/b/yfast-visit1.png"><span>look at the details</span></a>, you can see that to be true: 44 milliseconds waiting for data to start and 25 additional milliseconds waiting for the data to come down the pipe. This is relatively fast. This is not always the case; in fact, it often is not the case.</p>

<p>I was quite interested in this division of time and asked my helpful friends at <a href="http://keynote.com/"><span>Keynote</span></a> for some aggregate information. That information paints a rather interesting picture.  The average speed of a "web page load" comes in at over 2 seconds.  Obviously, these 2 seconds are split in some fashion amongst our three buckets.  What may be quite surprising is that, on average, 290ms is spent server-side.  I speculate this is due to one of two reasons.  Most commonly, it is due to a lack of attention to how the architecture internally operates resulting in sloppy code and data architecture. To me, this is the better of the two reasons.  The other reason is a focus on "scale-out" with a blatant disregard for a maximum acceptable service time.</p>

<p>One web performance company, who shall remain nameless, actually spends as much as 2 seconds "thinking" before sending data to the client, producing an awful waterfall.  Note that the client-side performance is quite excellent, but still the user waits uncomfortably long.</p>

<a href="http://omniti.com/i/b/site-slow.png"><span><img src="http://omniti.com/i/b/site-slow.png" alt="Waterfall of Painful Initial Asset"></span></a>

<p style="margin-top: 1em">To put this in some perspective, a processor today can operate around 2.9GHz (that&#8217;s 2.9 billion cycles per second). 290ms sans a conservative 90ms of round-trip latency is 200ms of operating time or 580 million CPU cycles. The disturbing part of this is that most of what Keynote monitors is landing pages or specific hot paths, so many other pages on these websites are slower.  We all know that most websites today are more complicated than a single machine serving information, so a direct correlation of service time to CPU cycles is deeply flawed; however, I still believe it is illustrative, useful and compelling.</p>

<p>Furthermore, if your system is spending 200ms servicing a single request, you can do the simple math to find that even on an 8-way system, you can still only serve 40 requests/second. As your demand increases, you must add more and more machines. While provisioning these machines used to be challenging, the cloud has played on general performance-optimization delinquency and made this approach seem acceptable by making massive machine provisioning easy.  I&#8217;m here to tell you it is not acceptable.  Not only is it environmentally wasteful (using power and generating unnecessary heat), it is also wasteful of shareholder investment.  Faster sites running more optimally generate shareholder value.</p>
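
<p>The back-of-the-envelope numbers above can be reproduced in a few lines. This is only a sketch of the arithmetic: the function names are made up for illustration, and it keeps the same simplification the text makes, namely that each request occupies one core for its whole service time.</p>

<pre><code>// Clock cycles available in a time slice at a given clock rate.
function cyclesIn(clockHz, milliseconds) {
  return clockHz * milliseconds / 1000;
}

// Upper bound on requests per second when every request
// occupies one core for serviceMs milliseconds.
function maxRequestsPerSecond(cores, serviceMs) {
  return cores * 1000 / serviceMs;
}

var serviceMs = 290 - 90;                          // server time minus round-trip latency
var cycles = cyclesIn(2.9e9, serviceMs);           // 580000000
var ceiling = maxRequestsPerSecond(8, serviceMs);  // 40
</code></pre>

<p>Halving the service time doubles the ceiling on the same hardware, which is the whole argument for optimizing before provisioning.</p>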

<p>The pervasive focus on front-end performance is explained by the easy gains that can be seen from relatively little investment.  However, as the numbers show, for most sites this simply isn&#8217;t enough to compete. <a href="http://www.scribd.com/doc/16877317/Shopzillas-Site-Redo-You-Get-What-You-Measure"><span>Shopzilla recently completed a 12-month engineering effort</span></a> to rearchitect their application because the server-side was too slow (pushing 8 seconds).  Now that it is blazingly fast, they have less infrastructure to maintain per dollar of revenue and an increase in revenue of 7-12%.</p>

<p>Attention to internal performance is fundamental to the success of online businesses.   Many of the larger web-based companies have smart people on staff that take performance seriously.  If you need help, this is what we work on for <a href="http://omniti.com/does"><span>our clients</span></a> every day at <a href="http://omniti.com/"><span>OmniTI</span></a>.</p>]]></content:encoded>
            <pubDate>Tue, 13 Jul 2010 14:00:00 GMT</pubDate>
        </item>
        <item>
            <title>The cloud is great. Stop the hype.</title>
            <link>http://omniti.com/seeds/the-cloud-is-great-stop-the-hype</link>
            <guid>http://omniti.com/seeds/the-cloud-is-great-stop-the-hype</guid>
            <description><![CDATA[Cloud computing isn&#8217;t new, though I&#8217;m sure you&#8217;ve heard more about
it in the last few months than you did previously. The cloud is an
amazing thing, but one that is poorly understood. I believe this lack
of understanding stems from te...]]></description>
            <content:encoded><![CDATA[<p>Cloud computing isn&#8217;t new, though I&#8217;m sure you&#8217;ve heard more about
it in the last few months than you did previously. The cloud is an
amazing thing, but one that is poorly understood. I believe this lack
of understanding stems from technology confusion which is trumpeted by
corporations that have identified "the cloud"
as <a href="http://www.wikinvest.com/concept/Cloud_Computing"><span>a medium
for expansion and profit</span></a>. Don&#8217;t get me wrong, the cloud is useful
&mdash; but I hear some of the dumbest reasons why.</p>

<p>Before I launch my rant, I&#8217;ll qualify that <abbr title="Software as a Service">SaaS</abbr> existed before "the
cloud," yet in many definitions (like the link above) it is considered
a cloud service. I consider the cloud to be <em>only the infrastructure</em> because the software and the platform
have been provided by third parties successfully before the term
"cloud" arrived. It isn&#8217;t fair to legitimize your concept by
repackaging two successfully proven technologies under your brand.</p>

<h2>The Cloud</h2>

<p>The cloud&#8230; what is it? A cloud is an infrastructure in which I
can provision computing systems. What makes this different from a
rack of servers? Very little, actually. The most important
difference is that provisioning of these systems is made convenient.
When a system is needed, the requester can programmatically start a
new one and need not be concerned with network infrastructure,
machine specifications, power, cooling, etc. The cloud is built by
someone who cares about all of those things, but then it is packaged
in an easily consumable fashion. How does this happen? Well, this is
where people get confused.</p>

<h3>Virtualization</h3>

<p>This simple provisioning is empowered by some sort of
virtualization technology like <a href="http://www.xen.org/"><span>Xen</span></a>
(likely one of the commercial
implementations), <a href="http://vmware.com/"><span>VMware</span></a>, <a href="http://www.sun.com/software/solaris/containers/index.jsp"><span>Solaris
Containers</span></a>
(Zones), <a href="http://www.parallels.com/products/pvc45/"><span>Virtuozzo</span></a>/<a href="http://wiki.openvz.org/Main_Page"><span>OpenVZ</span></a>,
etc. Why is this confusing? Beats me, but I see people listing the
advantages of virtualization as advantages of the cloud. As with most
technologies, you inherit the advantages of your foundation.
Virtualization brings a lot to the table, but you don&#8217;t need "the
cloud" to get it. Period.</p>

<p>The concept of private and public clouds is also poorly defined.
Some people hate the two terms, while others define them in useless
ways. I&#8217;ll define them in a very practical way in which the
differences have deep business meaning.</p>

<h3>Public Clouds</h3>

<p>The public cloud is Amazon&#8217;s EC2 and other similar "cloud
providers" where the owner of the underlying physical infrastructure
and the owner of the services running on the provisioned systems are
not the same. In this environment, your services run on someone
else&#8217;s equipment. What does this mean?</p>

<p>If they don&#8217;t pay their bills, the equipment can be seized. Other
companies may be running virtual environments on the same hardware,
same disks, same network. This means bugs in virtualization and
data isolation could result in information disclosure &mdash; the
really bad kind. At this point in time, I can&#8217;t envision a way to
make public cloud
infrastructure <a href="https://www.pcisecuritystandards.org/security_standards/pci_dss.shtml"><span>PCI-DSS</span></a>
compliant &mdash; and even if you could, I believe it increases the
possibility of compromise.</p>

<p>No virtualization is perfect (yet) in resource provisioning. This
means that defining a reliable performance expectation for a node in
the cloud can be very challenging.</p>

<p>It&#8217;s not all negative though. Because public clouds are popular,
they tend to have ample resources, which means more room for growth,
and a provisioning request is "less likely" to result in a message
that says "better luck next time, I&#8217;m flat out of horses."</p>

<h3>Private Clouds</h3>

<p>Private clouds are not shared. A private cloud is deployed by an
organization that wants the benefits of a cloud, but wants the
processes and premise controls over the infrastructure that powers it.
The key differences between a private cloud and a public one are control
and size.</p>

<p>In a private cloud, you have fine-grained control over geographic
location. This can be important for meeting data availability and/or
redundancy guarantees made to clients. It can also be useful for
ensuring that at least part of your infrastructure is in a country
whose laws more closely align with your business needs.</p>

<p>There are clearly enormous advantages in the private cloud: data
security is under your control, and the design and operation of the
cloud can be made congruent with business requirements, providing
better-aligned availability and more consistent performance. The downside is that it is
likely to have more limited resources &mdash; provisioning 1000 new
instances is far more likely to result in a failure due to
insufficient resources.</p>

<p>So how big is big? In my experience, when you hit a run rate of 40
instances, build yourself a private cloud. That&#8217;s the point at which
it becomes undeniably cheaper.</p>

<h3>Resources</h3>

<p>One distinct and measurable difference between how private and
public clouds can be run is seen in the choice of virtualization
technology. Public clouds, by their nature, must isolate resources
between customers as extensively as possible to achieve acceptable
quality of service. There is no trust or cooperation between
virtualized customers.</p>

<p>No virtualization technology does this perfectly, but some do a
better job than others. Xen-based and VMware-like solutions are some of the
more capable in this arena. Because both implementations run
completely separate operating system environments from a hypervisor,
they tend to segregate the guests more thoroughly by sharing fewer
resources.</p>

<p>This is good for guests, but bad for resource utilization. If I
need as much as 16GB of RAM for my instance and I&#8217;d like to run 8 of
them, that means I need 128GB of RAM in my host machine &mdash; that&#8217;s
an expensive box. On the other hand, if I need very little RAM (say
256MB on average of which 128MB is kernel and OS related processes)
hypervisor-based virtualization becomes quite bulky.</p>
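
<p>As a rough worked version of the arithmetic above, the sketch below
computes the host RAM needed for fully isolated guests and the share of
a small guest&#8217;s memory that goes to its own kernel and OS processes.
The function names are made up for illustration; the figures are the
ones used in the text.</p>

```python
def host_ram_gb(guests, ram_per_guest_gb):
    """RAM the host needs when every guest gets a dedicated allocation."""
    return guests * ram_per_guest_gb


def os_overhead_fraction(guest_ram_mb, kernel_and_os_mb):
    """Fraction of a guest's memory spent on its private kernel and OS."""
    return kernel_and_os_mb / guest_ram_mb


print(host_ram_gb(8, 16))              # 128
print(os_overhead_fraction(256, 128))  # 0.5
```

<p>In the small-guest case, half of each instance is overhead that every
hypervised guest pays again.</p>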

<p>On the other side of the virtualization field are technologies like
OpenVZ and Solaris Containers (a.k.a. Zones). These technologies
share a kernel (and usually a filesystem buffer cache) across
guests. CPU resources can be sliced up, but memory (as it is shared)
is a challenge to dedicate cleanly to individual guests. While this
is clearly a bad (or at least challenging) thing for public cloud
providers, it is often completely acceptable for private cloud
needs.</p>

<p>The advantage of this "lightweight" virtualization is that you can
pack more guests onto a single host. We regularly run 40 Solaris
Zones on a single commodity server without issue. It is particularly
useful for applications that are low-powered, but in need of multiple
instances to meet their availability commitments.</p>

<h2>Burning the Straw Man</h2>

<p>Now that we know what clouds are, what&#8217;s the problem? The hype. The
hype is the problem. With hype come straw man arguments that delay or
hold back the healthy evolution and incorporation of this
technological paradigm.</p>

<h3>Argument 1</h3>

<blockquote><p>I need the cloud. In the cloud, if I need to deploy 50
machines, I can just do it. Without the cloud, I have to buy servers
and wait weeks for install and spend hours <span class="end-quote">installing
them.</span></p></blockquote>

<p>Deploying 50 new instances in a cloud is easier than 50 new
physical machines. But just because you can, doesn&#8217;t mean you should.
If it takes hours to install new machines, then you are doing your job
wrong. If it takes weeks to get your machines, then you are using the
wrong vendor. And <em>most importantly</em> if you suddenly realize
that you need 50 new machines, then you simply didn&#8217;t do your job
well. The cloud is not an excuse to avoid a business model. A
business model includes a budget and a solid, implementable plan for
growth based on thorough capacity planning. With that, you should see
it coming.</p>

<p>There are two reasons I hear when people justify the need to deploy
a large number of new machines, and both arguments fall apart when you
take a closer look.</p>

<h4>Argument 1a</h4>

<blockquote><p>Holy cow! Look at that traffic! I need fifty new instances. <span class="end-quote">Now!</span></p></blockquote>

<p><a href="http://omniti.com/seeds/dissecting-todays-internet-traffic-spikes"><span>I
know a bit about sudden traffic spikes.</span></a> If you need 50 machines suddenly to
handle a traffic spike, then, in all likelihood, you have built something wrong
and no amount of
provisioning will help. I&#8217;ve had the privilege of working with some
of the largest sites on the planet. I&#8217;ve seen traffic spikes of
10000% happen inside 30 seconds, but then again I&#8217;ve also seen more
than a gigabit of production traffic served to the masses off two $3k
USD boxes. If you are in that situation, you need a plan &mdash; and
it likely shouldn&#8217;t include "Oh shit! Bring 50 more instances
online!"</p>

<p>If you are providing a service that is unavoidably computationally
intensive, you actually have a solid argument. This is rare and I&#8217;ll
touch on that later.</p>

<h4>Argument 1b</h4>

<blockquote><p>I have a lot of developers and they each need their own
instance <span class="end-quote">quickly and easily.</span></p></blockquote>

<p>This is actually an awesome argument for the cloud. However, since
these are development instances, they don&#8217;t consume resources in the
same way that production instances do. We give out instances like
candy at OmniTI and typically can sustain about 40 instances on a
single $3k USD box using lightweight virtualization. CapEx and OpEx on
that are basically non-existent compared to an EC2 bill for the
same. As you can see, this is an argument for virtualization, not the
cloud.</p>

<h3>Argument 2</h3>

<blockquote><p>I want to use the cloud because that way I don&#8217;t have
to worry about networking and hardware <span class="end-quote">management.</span></p></blockquote>

<p>Network management has to happen. Hardware management has to
happen. You pay for it one way or you pay for it another. I&#8217;ve heard
people say that it takes countless hours per month to run 40 systems
including servers, switching equipment, routing, firewalls, etc. We
<a href="http://omniti.com/does/architecture-and-infrastructure"><span>manage
around 1000 servers at OmniTI</span></a> and from our immaculately maintained
time tracking system I can tell you that less than 35 hours per month
are spent on hardware provisioning, systems installation and concerns
of space/power/cooling. That comes out to about 2 minutes per machine
per month. Furthermore, I don&#8217;t have any reason to believe
that a cloud provider can do a significantly better job.</p>
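
<p>The per-machine figure is simple division. A minimal sketch, using
the 35 hour upper bound and the round 1000 server count quoted above
(the function name is only illustrative):</p>

```python
def minutes_per_machine(hours_per_month, machines):
    """Average hardware-management minutes per machine per month."""
    return hours_per_month * 60 / machines


print(minutes_per_machine(35, 1000))  # 2.1
```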

<p>So, if so little time is spent on hardware and infrastructure
management, why does OmniTI have a busy ops team? Because
we&#8217;re doing all the <em>other</em> stuff. Configuring software,
performance tuning, and monitoring systems; monitoring systems to an
egregious and offensive level. I&#8217;m not speaking of CPU temperature
and disk failures (everyone monitors those). I&#8217;m talking about
realized I/O ops per spindle, network packets per interface, HTTP
response times, SSH keys, ICMP response latency, DNS, database health,
application-level correctness and, most importantly, business level
metrics. If you find this intimidating, look
at <a href="http://circonus.com"><span>Circonus</span></a> as an enablement
platform. If you like the cloud and/or SaaS, you&#8217;ll love this
service.</p>

<p>The operations team is the one place with access to data and
traffic that is "real-time enough" to detect business issues before
they manifest in significant monetary loss. Traffic anomalies,
chargeback rates, visitor retention&#8230; all these translate into money.
This is what ops does; they make things work; they make the business
work. And they spend a lot more time trending, investigating
and analyzing than they do replacing hard drives and network
cards.</p>

<h3>Argument 3</h3>

<blockquote><p>I can provision quickly in <span class="end-quote">the cloud.</span></p></blockquote>

<p>Yes. Yes you can. This is due to virtualization, not the cloud.
Download a virtualization technology and provision quickly outside the
cloud. I suppose that if my OS natively supports virtualization (like
all modern OSs do), and my operations team leverages that to deploy
new instances quickly and easily, then we&#8217;ve created a cloud whether
we like it or not. Damn terminology. While it is now called a
"private cloud," I tend to just call it infrastructure operations.</p>

<h3>Argument 4</h3>

<blockquote><p>Operating in the cloud makes your environment more
resilient because you have to accommodate <span class="end-quote">unexpected
failures.</span></p></blockquote>

<p>What? This has to be the most back-assward statement I&#8217;ve heard on
cloud computing. Eagerly adopting an environment with a higher
failure rate because it forces you to be a better engineer? Well,
that&#8217;s not an engineer I&#8217;d hire. Good engineers have always known
that things can fail and have always had to design to accommodate that
truth &mdash; incessant reinforcement by some public cloud providers
is unwelcome and unneeded in this case. Assuming a well engineered
system (which should be an expected outcome of any engineering group)
the goal should always be to minimize the likelihood of failure within
budget.</p>

<h2>What the Cloud Lacks</h2>

<p>In addition to dismantling poorly constructed arguments for the
cloud, I thought I&#8217;d detail some of the things I find completely
missing in the cloud.</p>

<p>Generalization is the root of all evil when it comes to
performance. Just because you know how to use MySQL or PostgreSQL
doesn&#8217;t mean it is the right tool for every data storage need. People
have learned this lesson fairly well. In cloud infrastructures, there
is a goal to make systems alike to improve price points for capital
expenditure, reduce operation expenditure (slightly) by learning one
type of system well, and make the provisioning system simplistic. This
leads to the abomination that is "small," "large," and "huge" instance
sizes at some cloud providers.</p>

<p>As an engineer, when I have to build a system for a purpose I
specify as much as possible. AMD vs. Intel vs. Sparc? How many gigs
of RAM? What <em>speed</em> should the RAM be? How much storage
do I need? How many I/O operations per second are required? Should I
use SSDs? How many networks must the system be on? Should we be using
link aggregation or not? VLANs? No VLANs? These are all important
things. If you need these things sometimes and everything has to be
the same, then you get these things all the time &mdash; paying for it
when you don&#8217;t need it.</p>

<p>It is a reality that when systems are specified, compromises are
made due to vendor relationships and part availability. However, the
requirements that drive these specifications still exist and are at
the root of the decisions: for instance, I need 16GB of non-disk-buffer
memory for working sets and 10,000 I/O operations per second. That
simply doesn&#8217;t translate to three cookie-cutter sizes.</p>

<p>Data is a big issue. There are a lot of companies out there working
on solving the data security issues that exist in public clouds
&mdash; let&#8217;s assume for a second that this is no longer an issue. A
follow-on issue is that the cloud is "out there" and the only way to
get data into and out of it is via the drinking-straw that is its
uplink. Drinking-straw you ask? Yes. The internet is, even today, not
as fast as a tractor-trailer full of tapes. If I have 10 TB of data
(which is extremely reasonable for any business intelligence system
these days), how do I back it up? I need a copy of that data off-site
and secure. We have some creative solutions around this using ZFS, but
still &mdash; I am contractually obligated to have my tapes (or some
other off-site and <em>off-line</em> storage medium). Private clouds
do not have this issue.</p>
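
<p>To put a rough number on the drinking straw, the sketch below
estimates how long 10 TB takes to cross an uplink. The link speeds are
assumptions picked for illustration, not measurements, and the
calculation assumes decimal terabytes and a fully saturated link with
no protocol overhead, so real transfers would be slower.</p>

```python
def transfer_days(terabytes, link_mbit_per_s):
    """Days needed to push the data through a fully saturated link."""
    bits = terabytes * 1e12 * 8               # decimal terabytes to bits
    seconds = bits / (link_mbit_per_s * 1e6)  # megabits per second to bits
    return seconds / 86400


for mbit in (100, 1000):                      # hypothetical uplink speeds
    print(mbit, round(transfer_days(10, mbit), 2))
# 100 9.26
# 1000 0.93
```

<p>Even under these generous assumptions, a full off-site copy over a
100 Mbit/s uplink is a job measured in days.</p>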

<h3>Scaling Out or Scaling Up</h3>

<p>So many people talk about scaling out. Scaling out. Scaling out.
Scaling out. Scaling out is an excellent approach to tackling
requirements that cannot be easily accomplished on today&#8217;s hardware.
Not everything needs to be scaled out. I hear people say "I&#8217;m going to
have millions of records, I need to make sure my design can operate on
many machines." Millions? You&#8217;re going to go through the effort of
tackling distributed systems problems for a million rows? You have
priority issues. A single machine (with failover) is enough to do most
jobs. People lose sight of this too often. Making things redundant
(hot failover) is a lot easier than making them actively
distributed. So, if you can get away with scaling something
vertically, do it.</p>

<p>There are many cases where the growth of a specific system
component simply outpaces the availability of reasonably priced
hardware to scale it vertically. In these cases, you should make your
problems smaller. (You&#8217;d be surprised what can be accomplished over
beers with an expert in the field.) If that fails, then you roll up
your sleeves and design your system to scale horizontally. Very few
systems require horizontal scalability from soup to nuts.</p>

<h2>Where It Works</h2>

<p>I said before that if you need to spin up 50 instances you clearly
didn&#8217;t do a good job planning. I&#8217;ll recant that and better qualify
where that is acceptable. That is acceptable when that is your well
thought-out plan. When would you need to spin up 50 new instances?
Let&#8217;s say you need to transcode a ton of video, let&#8217;s say you need to
sequence some DNA, let&#8217;s say you need to use a lot of computational
resources for a brief period of time and that is essential to your
business model. This is where the cloud shines like a super-star.</p>

<p>For computationally intensive tasks that are irregular, the idea of
batching work into a cloud of compute nodes is an excellent one.
Here, the advantages are clear. Given that each job can really gobble
up CPU resources, you can&#8217;t leverage the consolidation that
virtualization offers. At this point, the disadvantages are purely
the outcome of an equation of economics. How much does a CPU-second
cost and how much does it cost me to move the input for my job into
the cloud and extract the output from the cloud: instance costs and
bandwidth costs.</p>
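
<p>That equation can be sketched in a few lines. Everything here is
hypothetical: the function name, the parameters, and especially the
prices, which are placeholders rather than any provider&#8217;s actual
rates.</p>

```python
def batch_job_cost(cpu_seconds, price_per_cpu_second,
                   gb_in, gb_out, price_per_gb):
    """Cost of one batch run: instance time plus data moved in and out."""
    instance_cost = cpu_seconds * price_per_cpu_second
    bandwidth_cost = (gb_in + gb_out) * price_per_gb
    return instance_cost + bandwidth_cost


# Placeholder figures, not real provider prices:
# 50 instances for one hour each, 500 GB of input, 50 GB of output.
cost = batch_job_cost(
    cpu_seconds=50 * 3600,
    price_per_cpu_second=0.00003,
    gb_in=500,
    gb_out=50,
    price_per_gb=0.10,
)
print(round(cost, 2))
```

<p>With placeholder numbers like these, moving the data costs far more
than the compute, which is why the bandwidth term belongs in the
equation.</p>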

<h2>The Honest Truth</h2>

<p>While it may appear that I hate the cloud, it simply isn&#8217;t so. I
hate the half-baked arguments for it. I hate the hype. It is a
perfectly legitimate tool in the already large arsenal of engineering
tools. Use the cloud where it makes sense, but please stop bludgeoning
me with the hype.</p>]]></content:encoded>
            <pubDate>Tue, 23 Mar 2010 00:00:00 GMT</pubDate>
        </item>
        <item>
            <title>Online Application Deployment: Reducing Risk</title>
            <link>http://omniti.com/seeds/online-application-deployment-reducing-risk</link>
            <guid>http://omniti.com/seeds/online-application-deployment-reducing-risk</guid>
            <description><![CDATA[Version control systems are nothing new to the world of software
development.  I&#8217;ll take the time now to unapologetically call you
an idiot if you don&#8217;t already have all your code and configurations in
a version control system. Once you sta...]]></description>
            <content:encoded><![CDATA[<p>Version control systems are nothing new to the world of software
development.  I&#8217;ll take the time now to unapologetically call you
an idiot if you don&#8217;t already have all your code and configurations in
a version control system. Once you start using version control, there
are several approaches available and, interestingly, work on online
applications turns out to be profoundly different from work on
shrink-wrapped software.</p>

<h2>Traditional Software Development</h2>

<p>With shrink-wrapped software, you have features and fixes that are
integrated into the product and effectively queued up into what is
called a release.  Development of features is performed in version
control on what is commonly called a "branch" that allows isolation of
the developed feature until it is in an acceptable state to be
"integrated" back into a main line of development (also a branch) that
is used for integration testing.  Eventually, the features are merged
into a release branch and find their way to clients.  Bug fixes and
security-related issues are addressed in a similar fashion (sometimes
backwards in the process to fast-track their release to consumers of
the product).  This is a fly-by, over-simplified description of the
typical software development life-cycle.</p>

<p>A lot of people believe that how one manages to a release can
profoundly affect the product; two common strategies being "agile" and
"waterfall."  I&#8217;ll argue that both are valid, both have their place,
and both work in traditional software development.  The end goal is
the same: ship a quality product within the bounds of expectations set
by product management with the clients.  Typically, product releases
are made available to clients at regular intervals.  I&#8217;ve commonly
seen three-, six-, twelve- and even eighteen-month release cycles.  Bug fixes,
security updates, patches, hot-fixes (they have many names) are
released more frequently (monthly, or for problematic products
weekly).  The client is responsible for upgrading their systems and,
if either feature or fix releases happen too frequently, the process
can become overly burdensome.</p>

<p>I&#8217;ll come out and make a rather unconventional claim: the approach
described thus far only works well when the number of clients using
the software is larger than one.  The larger the user-base, the better
this model works.  It might seem at first that large web sites that
have millions of users would be able to use this model to develop
their service, but now we&#8217;ve just exposed the crux of the issue.
Millions of users use their service, not their product. In fact, in
most cases, the only user of the actual software product is that
single web site. This alone shakes the foundation of traditional
development paradigms. Online environments have many parameters that
make this approach untenable.</p>

<h2>Get it on(line)</h2>

<p>Developing software for an online service differs from developing
traditional software in some fundamental ways.</p>

<p>In the online world, a software product drives a service to which
users have access. There is, most often, a single copy of the actual
software product in use. There is one consumer of the software:
you. The users are consumers of the service built atop the
software.</p>

<p>Most online services have thousands, if not millions, of users, and
as such the tolerance for disruptive upgrades is reduced (often
eliminated). You are forced into an environment where each production
upgrade happens only once; there are no practice runs and it simply
has to work.</p>

<p>In a traditional software model, new features can be distributed to
clients that are less risk-averse as a part of an early-adopter
program (a.k.a. beta program or tech preview program).  This approach
allows varied real-world tests of the new features so that when they
are made generally available in the product, the confidence in their
correctness and performance is sufficiently high. This simply doesn&#8217;t
work when the software you write for your service is only used by one
client.</p>

<p>Perhaps most challenging is the pace at which competition moves.
In the online world, I can have an idea this morning, an
implementation this afternoon and every client of my service that
shows up tomorrow will see it.  In fact, things can and do happen much
faster than that.  You might think that rapid concept-to-availability
push is reckless.  You might be right. But, your competition is doing
it.</p>

<p>The question is, how do you maintain a competitive pace and conquer
all these challenges, when the odds are stacked against you? The real
problem here is that the traditional software model bundles many
changes into a release and even the tiniest mistake can result in a
failure of the entire release (one mistake can break the whole
product). Each change should always be accompanied by a reversion
plan. Sometimes those plans are as easy as redeploying the product
sans the change, sometimes they are more involved. When hundreds of
changes are combined into a single release, the reversion to a
previous release becomes the intricate mess of hundreds of change
reversion procedures. When posed in these terms, the answer becomes a
bit more clear.</p>

<p>Each change could contain a mistake that could cripple the product.
However, if we make each change its own release, then the failure is
isolated to a micro-release that can be reverted with much less
disruption.</p>

<p>This leads to the very controversial technique of "deploying from
trunk."  Trunk (or HEAD or tip) is a version control term describing
the bleeding edge of the product. As people work on fixing regressions
and other bugs, as well as add new features into the product, they are
adding, modifying and removing code and configuration from version
control. If these changes are applied continuously and micro-releases
are done continuously, when the inevitable mistake occurs the
reversion process is isolated and prevents rollback casualties.</p>

<p>What&#8217;s a rollback casualty? If I make change A and you make change
B and they make their way into a single release, we have a casualty if
either (but not both) change has a bug requiring reversion. Due to my
mistake in A, we need to downgrade the product to the previous release
inducing a rollback of your perfectly functioning work on B. What&#8217;s
worse is that you could have put a lot of work into B ensuring that it
was done perfectly because you know that rolling it back would be
painful, but I knew that rolling back A would not be disruptive so I
was much less careful. This is just a nasty mess all around.</p>
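
<p>The definition can be made concrete with a small sketch. It models a
release as a list of change names and reports which healthy changes get
reverted along with a buggy one. The function and data layout are
illustrative assumptions, not a description of any particular
deployment tool.</p>

```python
def rollback_casualties(releases, buggy):
    """Return healthy changes that get reverted alongside a buggy one.

    releases: list of releases, each a list of change names.
    buggy: set of change names that turn out to need reversion.
    """
    casualties = []
    for release in releases:
        if any(change in buggy for change in release):
            # The whole release is rolled back, good changes included.
            casualties.extend(c for c in release if c not in buggy)
    return casualties


bundled = [["A", "B"]]        # one release carrying both changes
micro = [["A"], ["B"]]        # each change is its own release

print(rollback_casualties(bundled, {"A"}))  # ['B']
print(rollback_casualties(micro, {"A"}))    # []
```

<p>With micro-releases, reverting A never touches B.</p>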

<p>Big changes are scary: there&#8217;s a lot to test and a lot to plan. By
making micro-releases you amortize the risk by investing in deployment
efforts in a highly granular fashion.</p>

<p>So the real question is: How do you make this safe? Online
applications are not just a piece of code being run. They consist of
many moving parts that each change (often independently), but all
depend upon each other for correct operation; this is what makes
rolling back certain failed deployments so challenging. It might be challenging, but success is sweet: <a href="http://docs.google.com/viewer?a=v&q=cache:M3l1zbSSaRkJ:qconlondon.com/london-2008/file%3Fpath%3D/qcon-london-2008/slides/RandyShoup_eBaysArchitecturalPrinciples.pdf+ebay+%22wired+off%22&hl=en&gl=us&pid=bl&srcid=ADGEESiS2p2wu7dq6DMLDdaX0wqQtSXFRiDRUiWVJ8awjF3V4tm5pch8g5YKIOaIu675YRNrn0HtYxOzvfc82SKJwsY8uvmRTE8z_1MywgSCcB2FQM2VxXhIs2lCV9cF9bJ_ZXeEUaDd&sig=AHIEtbSXi5BdmtnIMI-fcx9RhwkNsKRpFg"><span>eBay</span></a>, <a href="http://codeascraft.etsy.com/category/operations/"><span><span>Etsy</span></span></a>, and <a href="http://code.flickr.com/blog/2009/12/02/flipping-out/"><span>flickr</span></a>.  It&#8217;s a tricky
balance that combines various philosophies:</p>

<ul>
 <li>"devops": engineering and operations are married and need to
     collaborate</li>
 <li>micro-releases: releases must never get too large, instead amortize
     risk with small, controlled releases</li>
 <li>dark launching features: building the feature out over time in a
     deployed and operational form to be simply "turned on" when
     properly qualified</li>
 <li>wired off: the approach that features should have on/off
     switches to provide an alternative to rolling back
     deployments</li>
 <li>fail forward: when things go wrong, have a solid plan to work
     forward to success (within your SLAs) instead of rolling back and
     trying again later.</li>
</ul>
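
<p>As one minimal interpretation of the "dark launching" and "wired off"
items above, the sketch below guards a new code path with an on/off
switch. The class, the function and the feature name
<em>new_checkout</em> are all hypothetical, and a real service would
need switches stored somewhere every server can read rather than in
process memory.</p>

```python
class FeatureSwitches:
    """In-memory on/off switches; unknown features default to off."""

    def __init__(self):
        self._enabled = {}

    def turn_on(self, name):
        self._enabled[name] = True

    def turn_off(self, name):
        self._enabled[name] = False

    def is_on(self, name):
        return self._enabled.get(name, False)


def render_checkout(switches):
    # "new_checkout" is a hypothetical feature name for illustration.
    if switches.is_on("new_checkout"):
        return "new checkout flow"
    return "old checkout flow"


switches = FeatureSwitches()
print(render_checkout(switches))   # old checkout flow (deployed but dark)
switches.turn_on("new_checkout")
print(render_checkout(switches))   # new checkout flow
switches.turn_off("new_checkout")
print(render_checkout(switches))   # old checkout flow (wired off, no rollback)
```

<p>The new code ships in every state; only the switch changes, so
backing out a misbehaving feature does not require reverting a
deployment.</p>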

<p>Each of these techniques requires its own in-depth description,
so we&#8217;ll leave that for future Seeds articles.  For now, just consider
that a traditional software engineering mindset can put you at a
desperate disadvantage in the world of online software
engineering.</p>]]></content:encoded>
            <pubDate>Wed, 17 Mar 2010 13:30:08 GMT</pubDate>
        </item>
        <item>
            <title>Marketing Malware</title>
            <link>http://omniti.com/seeds/marketing-malware</link>
            <guid>http://omniti.com/seeds/marketing-malware</guid>
            <description><![CDATA[
Internet registrar GoDaddy.com is notorious for two things: domain names and risque super bowl commercials. The infamy began in 2005, when GoDaddy paid $4.8 Million for two 30 second spots during Superbowl XXXIX. The commercial featured WWE wrestler C...]]></description>
            <content:encoded><![CDATA[<p class="first">
Internet registrar GoDaddy.com is notorious for two things: domain names and risqu&#233; Super Bowl commercials. The infamy began in 2005, when GoDaddy paid $4.8 million for two 30-second spots during Super Bowl XXXIX. The commercial featured WWE wrestler Candice Michelle experiencing a &#8220;Wardrobe Malfunction,&#8221; a clear parody of the half-time show fiasco involving Janet Jackson the previous year. After 16 variations of the storyboard were filmed and rejected by the Fox Network, version number 17 was pre-approved for broadcast during the first and fourth quarters of the game. The 30-second spot in the first quarter drove a 1,600% increase in traffic to the GoDaddy site, and then a strange thing happened: the commercial never played a second time. NFL executives purportedly pressured Fox to pull the commercial due to its &#8220;inappropriate&#8221; nature, despite the fact that it had already been paid for, pre-approved by Fox, and initially aired. Led by GoDaddy CEO Bob Parsons, the blogosphere screamed censorship like bloody murder, which only served to fuel additional publicity. In the end, GoDaddy deemed the event so successful that they now define their brand around the &#8220;GoDaddy Girls,&#8221; airing annual Super Bowl commercials that tiptoe along the edge of broadcast acceptability.
</p>
<p>
At the center of the GoDaddy controversy is the correlation between advertising and brand identity. As a content provider, the Fox Network understands that their brand will be held responsible for the quality of both the program content and the advertising they deliver. If either aspect of the broadcast is sufficiently offensive or inept, they risk losing viewers to other stations. Consequently, Fox must provide content that is tame enough to avoid outrage from extremely conservative viewers while remaining provocative enough to satisfy the desire of third party companies seeking to push the envelope with their ad strategy. Content providers in any medium will be judged by the quality of the content they provide, and that includes the use and placement of advertising. This can be a tough balancing act, and Fox isn&#8217;t alone in walking the tightrope.
</p>
<p>
When it comes to the Web, the importance of managing advertising content takes on a new dimension. Hackers are increasingly using fraud and social engineering tactics to infiltrate advertising networks, and then utilizing their position within this circle of trust to inject malware, browser redirects, and cross site scripting attacks on unsuspecting visitors. If these attacks are successfully executed, hackers may steal credit cards, social security numbers, banking information, personal photos and anything else that has been digitized on the victim&#8217;s computer. Alternatively, a visitor&#8217;s computer may be turned into one of many &#8220;sleeper cell agents&#8221; in a botnet, ready to respond to a few keystrokes at any time and become an active participant in a worldwide Internet attack. It isn&#8217;t just web site visitors who are vulnerable. This same strategy can be used by black hat marketing consultants to siphon traffic from one web site to a competing web site or even to blacklist an entire site from the Google search index. The worst part? Hackers are able to target their attacks with profound granularity, making it extremely difficult for anyone within the targeted organization to know the attack is even happening.
</p>
<p>
The risk posed by vulnerable advertising mediums is not merely theoretical. In 2009, I documented two separate exploits that successfully penetrated highly trafficked and popular web sites. Both sites had full time IT teams who were previously unaware that the exploit was occurring. Furthermore, in a now highly publicized <a href="http://www.nytimes.com/2009/09/15/technology/internet/15adco.html?_r=2"><span>event</span></a> in September of 2009, the New York Times was the victim of a malware advertiser who legitimately purchased ad space from the Times while pretending to be a representative of Vonage. IT departments and technical staff are trained to watch site visitors for malicious activity, but painfully few are watching the advertisers.
</p>
<h2>Deconstructing the Hack</h2>
<p>
Client-side arbitrary code execution is the primary culprit behind advertising based attacks. Attackers first gain trust with the target by posing as a legitimate advertiser. This process may be as simple as paying for space in an automated advertising system, or as involved as calling a major corporation while posing as a sales rep or marketing executive from another company. After the attacker has been approved as an advertiser, they develop a custom script to exploit the medium being used. They may start off displaying advertising that looks legitimate, but inevitably they switch to the malicious ad that begins to infect or otherwise manipulate site visitors.
</p>
<p>
Depending on the amount of freedom granted to an advertiser, a variety of techniques may be used to deliver the hacker&#8217;s payload. Many setups allow advertisers to automatically submit a combination of HTML, CSS and JavaScript code to be embedded within the layout of the publisher&#8217;s site. In this scenario, hackers can easily embed malicious scripts by using JavaScript or by including an external Flash SWF in the markup. On the off chance that this code is reviewed at all before publication, it is likely reviewed by someone in marketing or sales who is only examining the submission based on the content currently displayed and is unable to analyze the underlying code for potential security vulnerabilities.
</p>
<p>
Allowing advertisers to place custom JavaScript or Flash files inline as part of a web site&#8217;s markup is especially dangerous as the advertiser is no longer restricted by cross-domain access policies. This leaves the advertiser with the power to do virtually anything the web site developers can do, such as posting AJAX requests or altering any element of the DOM. In an attempt to prevent this, some web sites have opted for an alternative setup that links an iframe or traditional frame to a server belonging to the advertiser. However, this approach is also flawed because changes to the advertising code may be made at any time and the publishing site is powerless to implement a pre-approval process or apply any automated content filtering.
</p>
<p>
Regardless of the method utilized, if an attacker gains the ability to execute custom code on the target site, Pandora&#8217;s box has been opened and virtually all the evil of the Internet may be unleashed. A few fictional yet plausible exploit scenarios include the following:
</p>
<h2>You Won (Malware)!</h2>
<p class="first">
Johann is a regular visitor to Finance Magazine&#8217;s web site. Like most Finance Magazine visitors, he is an investor with a diversified portfolio in mutual funds, bonds, and individual stocks. While browsing the latest financial news, a popup branded with the magazine logo suddenly appears and announces that he won a free two-year subscription to FM and a chance to win lunch with Warren Buffett. Lunches with Buffett are normally valued in the millions, and Johann has been wanting a print subscription to the magazine for some time. He clicks &#8220;Accept&#8221; and is asked to download the registration form. At first he is a bit suspicious and doesn&#8217;t understand why the registration form is a downloadable .exe file, <em>but it is a company promotion from the official web site</em>, so he reasons it must be okay. After downloading the application, he launches it and is asked several questions about his financial net worth. He is then asked for his full name, phone number, address, e-mail address and social security number. He is again a bit suspicious and doesn&#8217;t understand why he needs to enter his social security number, but the form does look very professional. He clicks the info button next to the Social Security Number field and is shown this dialog message:
</p>
<blockquote>
<p>
You must enter your social security number in order to ensure that &#8220;Lunch With Warren&#8221; contest participants are limited to one entry per person.
</p>
</blockquote>
<p>
With visions of the &#8220;Sage of Omaha&#8221; in his head and the promise of the next printed issue of Finance Magazine at his door step, Johann fills out the form and clicks submit.
</p>
<p>
Several months later, Johann&#8217;s financial life is in ruins. His personal information was used to register for several credit cards and a bank loan was issued for a Mercedes SLK in California. The executable file he downloaded included a custom Trojan Horse virus that allowed the attackers to log in to his personal machine. They used this access to acquire his banking information and passwords, which they used four months later to wire over $10,000 to an account in the Cayman Islands. Although Johann is now suing for criminal negligence, his life (and his credit) will never be the same.
</p>
<h2>The Black Hat Who Stole Christmas</h2>
<p class="first">
Brianna is a successful small business owner with an online niche retail store that averages $25,000 in sales and 500,000 visitors per month. In addition to a product catalog, her site also contains high quality articles and videos that pull traffic from Google and allow her to further monetize her online presence by selling custom advertising. Advertising is sold on a month-by-month contract basis, and advertising providers are given an account into which they paste HTML, CSS, and JavaScript so it can be included directly in the site markup. Usually, Brianna&#8217;s sales skyrocket for the entire month of December as Christmas approaches. This year, however, she has seen a 60% reduction in traffic and sales are plummeting. After closely analyzing her site analytics, she realizes that traffic from Google has been drastically reduced. She went from being in the top 10 results for her product niche to not appearing anywhere within the top 5 pages, and she can&#8217;t figure out why. What Brianna doesn&#8217;t realize is that she sold an advertising slot for December to a Black Hat SEO specialist working for a competitor. As part of his overall strategy for propelling his client to the top, he decided to cut the legs out from underneath the competitors by running advertising templates on their sites that carefully and skillfully violated nearly every technical SEO guideline required by Google. Brianna made a few hundred dollars for the advertising space, but unknowingly violating Google&#8217;s SEO rules would cost her tens of thousands of dollars in the year ahead.
</p>
<h2>Defending Against Malicious Advertisers</h2>
<p>Unfortunately, a foolproof technical solution to malicious advertising doesn&#8217;t exist. However, this fact is not an excuse for apathy. The risk to both organizations and web site visitors can be mitigated by applying these technical steps:
</p>
<ol>
<li>Preview Ad Submission
<p class="first">
Within your advertising process, a system should be set up to preview all advertising before it is published live. This process should minimally include the ability to preview the way an ad looks and functions, but ideally it will also involve a quick scan of the actual advertising code to check for any unusual pieces of content (e.g. a few lines of obfuscated JavaScript). If ad providers can automatically update their content, you should be sure that each new version is also approved before publication.
</p>
<p>
This tactic cannot be used if ad placement is achieved by embedding frames or iframes that link directly to a third party server.
</p>
</li>
<li>Restrict Dynamic Advertising
<p class="first">
The single most effective method of preventing advertising abuse is to ban your advertisers from executing any dynamic advertising code. This process is conceptually as simple as stripping any scripting tags, inline event handler attributes, and javascript: URLs from advertising content, and it is extremely effective. However, the potency of this tactic comes at the cost of advertising flexibility as it limits all advertising to stylized images and text. In a medium where small games and other interactive content are often used to garner clicks and entertain eyeballs, this is often not a viable tactic. When scripting code is permitted, certain dynamic content should still be banned. For example, anything that involves alert() calls or assignments to document.location in JavaScript could be stripped while leaving other code in place.
</p>
<p>
This tactic cannot be used if ad placement is achieved by embedding frames or iframes that link directly to a third party server.
</p>
</li>
<li>Sandbox Dynamic Advertising
<p class="first">
In scenarios where dynamic content is permitted, it is useful to place ads within frames or iframes to take advantage of cross-domain safety restrictions. Yet as we have seen, using these constructs to link directly to a third party server is a security vulnerability as advertisers can easily change the content displayed without providing the publishers an opportunity to review the changes or strip malicious code. To obtain the best of both worlds, create an advertising sandbox by designating a domain or subdomain specifically to serve advertising content. Then place frames or iframes on your main site that link directly to the new domain. Because you control the ad domain, you will be able to preview ad submission, restrict dynamic advertising, and benefit from the added security that cross-domain security restrictions provide.
</p>
</li>
</ol>
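<p>
The preview and re-approval steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a complete defense: the ApprovalQueue class, the helper names, and the list of suspicious patterns are all hypothetical, and a pattern scan like this only points a human reviewer at code worth a closer look.
</p>

```python
import hashlib
import re

# Hypothetical patterns a reviewer might want flagged; not an exhaustive list.
SUSPICIOUS_PATTERNS = {
    "eval call": re.compile(r"\beval\s*\("),
    "unescape call": re.compile(r"\bunescape\s*\("),
    "redirect": re.compile(r"\b(?:document|window)\.location\b"),
    "alert call": re.compile(r"\balert\s*\("),
}


def fingerprint(ad_code: str) -> str:
    """Return a SHA-256 digest identifying one exact version of an ad."""
    return hashlib.sha256(ad_code.encode("utf-8")).hexdigest()


def flag_suspicious(ad_code: str) -> list[str]:
    """Return the name of every pattern found, for a human reviewer to inspect."""
    return [
        name
        for name, pattern in SUSPICIOUS_PATTERNS.items()
        if pattern.search(ad_code)
    ]


class ApprovalQueue:
    """Tracks which exact ad versions a reviewer has approved."""

    def __init__(self) -> None:
        self._approved: set[str] = set()

    def approve(self, ad_code: str) -> None:
        """Record the digest of a version that passed human review."""
        self._approved.add(fingerprint(ad_code))

    def may_publish(self, ad_code: str) -> bool:
        # Any edit changes the digest, so an updated ad needs fresh approval.
        return fingerprint(ad_code) in self._approved
```

<p>
Because the fingerprint covers the exact submitted text, an advertiser who swaps a legitimate ad for a malicious one after approval would fail the may_publish check until a reviewer approves the new version. As with the tactics above, this only works when the publisher hosts the ad code itself.
</p>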
<p>
Even when implementing the safeguards above, the decision to grant an advertiser space on your web site should not be taken lightly. Doing so inherently confers a degree of trust upon a third party. Ensure that trust is well placed by implementing a screening policy for all new advertising sign-ups. Such a policy could be as simple as calling companies directly to verify that the representative is authorized to purchase advertising on their behalf or as involved as requiring all advertisers to provide copies of their business incorporation license or other government issued identification. Regardless, the threat cannot be delegated to the IT staff and forgotten. Marketing and sales play an equally important role, and the safest organizations are those that view security as the shared responsibility of all the members within them.
</p>   
<p>
There was a time when good advertising meant entertaining video or amusing copy. It could be judged purely on face value and the ability to generate ROI. That time has passed. In an increasingly interactive world, it is now more important than ever for organizations and individuals to understand that advertising consists of both form and function. Ignoring this fact can result in something far worse than assaulting the sensibilities of your audience; it can devastate their lives. Let content providers and audiences beware: the age of badvertising has begun. 
</p>]]></content:encoded>
            <pubDate>Tue, 22 Dec 2009 22:00:00 GMT</pubDate>
        </item>
        <item>
            <title>Business Metrics Too</title>
            <link>http://omniti.com/seeds/business-metrics-too</link>
            <guid>http://omniti.com/seeds/business-metrics-too</guid>
            <description><![CDATA[When I began tinkering around with web services as a hobby, it was common to fiddle with an application for days.  I would curse and grind and sputter with Apache and cobbled-together programs.  This would frequently unearth new challenges: setting up ...]]></description>
            <content:encoded><![CDATA[<p class="first">When I began tinkering around with web services as a hobby, it was common to fiddle with an application for days.  I would curse and grind and sputter with Apache and cobbled-together programs.  This would frequently unearth new challenges: setting up a mail service, creating a database to store user accounts and perhaps pulling content from a third party.  Inevitably these minor distractions would monopolize my attention and the original application would be left to gather dust, without any documentation or monitoring in place.
</p>
<p>This seems to be a common problem with many professional software development shops.  Project managers help to keep the development teams focused, but their goals are still feature-driven with an eye on the next release cycle.  The IT and Operations teams are painfully undermanned, left to maintain their systems and services without any training on the care and feeding of their new pet.  For the hobbyist or Open Source project, this becomes an annoyance.  If you&#8217;re running a business, operational neglect can have a dire impact on your bottom line.
</p>
<p>Systems Administrators are unconsciously trained to look at everything through a boolean filter.  Hosts are up or down.  Services are on or off.  Their understanding of the application stack is often superficial, limited to the same perspective as that of a typical user.  Does the website load?  Can I ping the servers?  This is a completely logical approach.  And yet, it fails to consider those "corner" cases where activity looks normal, but an internal component suffers an unexpected condition.  Stealthy failures like these can be missed for months and result in significant lost revenue or wasted overhead.
</p>
<p>Monitoring systems have improved over the years with "advanced" features like automatic discovery of hosts and services.  Scanning a network, they can identify hosts and differentiate web servers from workstations.  Resources are grouped logically.  It&#8217;s a very turnkey way to add monitoring to your infrastructure.  Unfortunately, for many companies, this is where the story ends (and the pain begins).  Attentions are subsequently focused elsewhere.  Priorities are reestablished.  One of the most important resources, the one that makes sure everything else is operating smoothly, becomes forgotten and orphaned.  It&#8217;s easy to forget about something that doesn&#8217;t make <a href="http://omniti.com/seeds/what-is-web-operations"><span>your job</span></a> easier or offer intrinsic value to your bottom line.

</p>
<p>A poor economy and high unemployment levels remind us how important it is to optimize our existing architecture.  The current trend towards Cloud Computing and <a href="http://omniti.com/seeds/virtualization-zfs-and-zetaback"><span>Virtualization</span></a> makes this even more challenging.  These technologies are useful for creating highly elastic platforms on a budget, but they complicate engineering by <a href="http://omniti.com/seeds/concepts-of-cloudish-storage"><span>outsourcing data storage</span></a> and processing to an external black box.  In turn, we&#8217;re forced to add resiliency in the form of additional processing nodes and redundant storage.  This added complexity introduces countless opportunities for disaster.  It&#8217;s a vicious cycle.
</p>
<p>As the Web has become the obvious target for fresh product development, additional layers of abstraction are introduced into the application stack.  New technologies and components offer exciting ways to communicate with the end user and from one business to another.  The higher we go, the more these layers are decoupled from traditional monitoring proficiencies.  The resulting programs are overly intricate and opaque.  We need new ways to increase visibility and derive useful data from modern business systems.
</p>
<p>Gaining visibility over business operations is probably the easiest improvement any company can make.  Quality analytics require a solid understanding of your IT operations and business processes, which come from transparency into your systems.  Once these have been established we should be equipped with the tools to streamline and simplify any infrastructure.
</p>
<ol style="font-size: 1.143em;">
<li>Key Performance Indicators
<p class="first">First and foremost, identify the external business metrics that directly affect your revenue.  Establish thresholds and put fault-detection monitors into place, just like you would for any server or application.  Alerts on business operations (e.g. new user registrations, orders per hour) are more important than the systems that support them.  Remember that revenue is an asset, and hardware is a cost.  Not the other way around.
</p>
</li>
<li>Review IT Monitors
<p class="first">Evaluate your existing IT monitoring systems.  Ensure that metrics are being gathered for every single host and service.  The breadth and depth of data collected now will directly influence the quality of the information that can be extracted later on.  It&#8217;s paramount to have the metrics to support your decisions, but you won&#8217;t know which they are until you can juxtapose them later.

</p>
</li>
<li>Stockpile Data
<p class="first">Collect as many metrics as possible, for as long as possible.  There are no good excuses for not storing metrics indefinitely.  Storage is inexpensive, and a variety of technologies allow us to scale capacity with ease.  In three years we should be able to look back on data with as much granularity as the information that was collected just yesterday.
</p>
</li>
<li>Highlight Deficiencies
<p class="first">Graph your metrics.  Study their trends and formulate a plan to address the immediate capacity limitations.  When deploying new resources, look for hints in the trends that reveal hidden relationships in your network.  But remember that this goes beyond planning for the future; this data has inherent value in supporting your ongoing decisions.
</p>
</li>
<li>Build Relationships
<p class="first">Correlate graphs in ways that represent your business systems.  Pinpoint metrics that relate towards a common goal (sales per visit, length of visit, average page size, network latency and webserver load).  You might be shocked at the patterns revealed.  If your trending application doesn&#8217;t allow you to correlate incongruent data easily, find a new one.
</p>
</li>
<li>Empower Stakeholders
<p class="first">Share the accumulated knowledge with individuals and teams within your organization that can take action towards positive change.  If possible, give them access to all of the information, not just the data that directly affects them.  For large architectures, there is rarely a single person with a holistic view of the entire stack.  Trust your partners and there&#8217;s a good chance they&#8217;ll unearth something you missed previously.

</p>
</li>
</ol>
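<p>The first step above, putting fault-detection monitors on business metrics, might look something like the following Python sketch. The metric names, limits, and the check function are hypothetical and stand in for whatever your monitoring system provides; the point is only that an order rate can be checked against a threshold exactly as a ping or a disk would be.
</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    """Acceptable range for one business metric (names and limits are made up)."""
    metric: str
    minimum: float
    maximum: float


def check(thresholds, readings):
    """Return one alert message per metric that is missing or out of range."""
    alerts = []
    for t in thresholds:
        value = readings.get(t.metric)
        if value is None:
            # A metric that stops reporting is itself a fault worth alerting on.
            alerts.append(f"{t.metric}: no data received")
        elif value > t.maximum or t.minimum > value:
            alerts.append(f"{t.metric}: {value} outside {t.minimum}-{t.maximum}")
    return alerts
```

<p>A check like this is meant to catch the stealthy failures described earlier: every host can answer a ping while orders per hour quietly fall below the expected floor.
</p>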
<p>Fault-detection and trending solutions should return more on investment than high uptime or speedy notifications.  They should prepare an organization to increase capacity before limits are reached, realign resources to meet <a href="http://omniti.com/seeds/dissecting-todays-internet-traffic-spikes"><span>unexpected traffic spikes</span></a>, help <a href="http://omniti.com/does/design-and-development"><span>Development &#38; Design teams</span></a> to better understand your customers, and decrease the maintenance and staffing necessary for normal IT operations.  Feedback should be real-time and tuned to the needs of your organization.  In a nutshell, it should pay for itself and then some.
</p>
]]></content:encoded>
            <pubDate>Wed, 16 Dec 2009 16:00:00 GMT</pubDate>
        </item>
        <item>
            <title>Transcending the Medium</title>
            <link>http://omniti.com/seeds/transcending-the-medium</link>
            <guid>http://omniti.com/seeds/transcending-the-medium</guid>
            <description><![CDATA[I grew up in Boyd County, Kentucky, where John Deere tractors are practically an indigenous species. This now famous brand, started by none other than Mr. John Deere himself, began life with one sole product: the steel plow. Fortunately for lawn care c...]]></description>
            <content:encoded><![CDATA[<p>I grew up in Boyd County, Kentucky, where John Deere tractors are practically an indigenous species. This now famous brand, started by none other than Mr. John Deere himself, began life with one sole product: the steel plow. Fortunately for lawn care companies and husbands everywhere, Mr. Deere was a man who understood his market. He understood that his company existed to serve the needs of the agriculture industry, and not purely to cut furrows in soil. Because of his focus on the needs of his consumers, Deere &#38; Co. now manufactures hundreds of products and parts in multiple agricultural categories, perhaps the most iconic of which is the quintessential riding lawn mower. Business leaders would do well to take a page from the Deere &#38; Co. playbook, especially when it comes to their technology strategy. If John Deere were alive today, I think he might offer the following advice: place your focus on the consumer, and then find ways for technology to serve them. I&#8217;ll say it even more forcefully: For just a moment, completely forget about the specific technology you&#8217;re using to serve your market. Instead, just think about the lives of those who use it. Why do they use it? What does it help them achieve? Is there a better way for them to reach their goals, even if it means completely stepping outside of the medium you have created? 
</p>
<p>
Consider Facebook, YouTube, and Digg. These are not &#8220;web&#8221; companies. They are, respectively, communications, entertainment, and news companies. If tomorrow a new technology emerged &#40;and inevitably it will&#41; that allowed consumers to better achieve their goals than the services offered by these so-called &#8220;web&#8221; companies, all three organizations would be forced to either embrace that new technology or risk significant loss of market share. The kicker is that they aren&#8217;t alone: what is true for Facebook, YouTube, and Digg is also true for you. 
</p>
<p>
Now for the secret: tomorrow may have come a day early. In April of 2009, Apple announced that the iPhone App Store had surpassed one billion downloads in just 9 months<sup><a href="http://omniti.com#footnotes" style="color: #990000;">1</a></sup>. Consider that for a moment. It took Firefox, the most used web browser in the world<sup><a href="http://omniti.com#footnotes" style="color: #990000;">2</a></sup>, over 4 years to reach one billion downloads<sup><a href="http://omniti.com#footnotes" style="color: #990000;">3</a></sup>, and VOIP/IM service Skype served its one billionth download after 5 years<sup><a href="http://omniti.com#footnotes" style="color: #990000;">4</a></sup>. Apple did it in 9 months, and they did it in an emerging market with an entirely unique distribution model. As impressive as that was, Apple has recently outdone themselves once again, reaching the 2 billion download mark in September of 2009<sup><a href="http://omniti.com#footnotes" style="color: #990000;">5</a></sup>, doubling the total number of downloads served in just 5 months. 
</p>
<p>
This kind of exponential growth is unlikely to slow down anytime soon, and Apple&#8217;s success with both the iPhone hardware as well as the iPhone App Store has revolutionized the pace of innovation in the smart phone market. Verizon is projected to carry 18 different smart phones powered by the Google Android OS by the end of 2009<sup><a href="http://omniti.com#footnotes" style="color: #990000;">6</a></sup>, and BlackBerry now has an &#8220;App World&#8221; to provide consumers of their smart phone line with third party applications. 
</p>
<p>
What does this impressive growth mean for business owners, executives, and IT managers? It means that consumers are becoming increasingly mobile, connected, empowered, and, ultimately, accessible. As large segments of the consumer market begin to adopt mobile technology, the organizations who benefit the most will be those who evolve to service them, finding new distribution or advertising channels for existing products and in some cases utilizing this technological shift to create entirely new offerings. 
</p>
<p>
Yet the most profitable organizations will capitalize on the opportunities created by the mobile market without becoming lost within them. As smart phones are beginning to reach critical mass, it is perhaps now more important than ever to realize that much of the excitement in the mobile space is fueled by HTTP and the Web, and Internet usage certainly isn&#8217;t going anywhere but up in the foreseeable future. In the United States alone, Internet Service Providers have reached a point of market saturation with 71&#37; of all Americans reporting consistent Internet and Web availability at home or work<sup><a href="http://omniti.com#footnotes" style="color: #990000;">7</a></sup>, and 85&#37; of iPhone users report using mobile Safari to browse the web on a regular basis<sup><a href="http://omniti.com#footnotes" style="color: #990000;">8</a></sup>. Savvy managers will apply many of the lessons learned in previous online ventures to future expansion in the mobile space while continuing to build lasting value wherever their consumers are.
</p>
<img alt="Device Penetration" src="http://images.omniti.net/omniti.com/i/b/mark-seeds%28phone%29.png" />
<p>
In the past 10 years, I&#8217;ve had the pleasure of working on a variety of projects that span multiple mediums. In varying capacities, I&#8217;ve directly contributed to television productions, radio broadcasts, print publications, trade show exhibits, stage shows and online advertising campaigns. I&#8217;ve also served as the lead engineer on a variety of desktop, web, and mobile software projects. I consider myself to be a graduate of the school of hard knocks, and when you dance with lady experience long enough, you begin to realize that the principles of success transcend all mediums. The following are just a few &#8220;Transcendent Themes&#8221; that I believe must be applied regardless of the medium an organization chooses to operate within: 
</p>
<ol style="font-size: 1.143em;">
<li>Launching is Not Enough
<p class="first">
&#8220;If you build it, they will come.&#8221; It worked great for Kevin Costner&#8217;s character in Field of Dreams, but in the real world, merely bringing a product or service to market is rarely enough for it to succeed. Good products require good marketing initiatives behind them, and what is true in conventional mediums is also true on the web and the emerging mobile market. 
</p>
<p>
From banner ads to social media, today&#8217;s marketers have more tools at their disposal than ever before. Finding the right mix can be a challenge, but the solution will likely involve a cross-comparison of proven conversion rates against perceived market presence. In English: find out where your market is digitally congregating, look at the proven effectiveness of the tools at your disposal that are able to reach them, and then start a small initial campaign to test your analysis. 
</p>
</li>
<li>Sample It to Sell It
<p class="first">
Shopping mall fast food chains face some of the fiercest competition around. Sales quantity is king in a confined space with direct competition, low profit margins, and a product lifespan measured in hours instead of days or months. The presence of free samples in food courts isn&#8217;t primarily motivated by desperation but by survival. Fast food chains continue to offer free samples month after month in shopping centers across America because they understand that the cost of giving away small amounts of their product is adequately covered by the sales those samples generate. 
</p>
<p>
This same principle is applied in virtually every industry. Clothing stores often allow customers to use in-store fitting rooms to try on outfits before purchasing. Automobile dealers allow qualified leads the famed &#8220;Test Drive,&#8221; and the movie industry produces previews of all new releases to entice the viewer to later enjoy the full experience. 
</p>
<p>
Samples sell, and this age-old axiom may be even more effective when applied to digital products than to physical goods. 
</p>
<p>
Consider the case study of iCombat, a top 100 iPhone application. The month after the makers of iCombat released a free, &#8220;lite&#8221; version of their paid application, they were rewarded with an 8.73&#37; conversion rate. The most impressive part? A conversion rate of 8.73&#37; resulted in a monthly sales revenue increase of 496&#37;<sup><a style="color: #990000;" href="http://omniti.com#footnotes">9</a></sup>. From the iCombat creator:
</p>
<blockquote> 
<p>we had waited months longer than we should have 
to launch a lite version. There was no point to waiting and sacrificing 
the initial new release buzz.
</p></blockquote>
<p>
When it comes to profiting from the power of sampling, iCombat isn&#8217;t alone. Mobile analytics company Flurry released a study of the impact sample applications had on paid application sales and found that there is on average an 85&#37; &#8220;free-to-paid&#8221; sales lift generated by application sampling<sup><a href="http://omniti.com#footnotes" style="color: #990000;">10</a></sup>. From the Flurry report:
</p>
<blockquote>
<p>
Among your strongest marketing plays in the App Store is to offer 
a free trial of your game or application. Not only is the App Store designed 
for this, but also it&#8217;s the best way to reduce consumer risk in trying your 
application, with the goal of eventually getting that user to purchase the 
full version. Think: money. And from our data, it&#8217;s among the most 
effective moves you can make.
</p>
</blockquote>
</li>
<li>Make a Good First Impression
<p class="first">
1/20th of a second. That&#8217;s how long you have to make a good impression on the typical web user according to Dr. Gitte Lindgaard of Carleton University in Ontario, Canada. Dr. Lindgaard published an article in Behaviour and Information Technology in which she describes that a visitor not only forms a cognitive bias in the first 1/20th of a second after visiting a web site, but that this cognitive bias, formally known in psychology as the &#8220;halo effect,&#8221; would also significantly influence the user&#8217;s opinions on the reliability and usability of the web site. 
</p>
<p>
In the words of Dr. Lindgaard:
</p>
<blockquote>
<p>
&#8230;the strong impact of the visual appeal of the site seemed to draw attention away from usability problems. This suggests that aesthetics, or visual appeal, factors may be detected first and that these could influence how users judge subsequent experience&#8230;. Hence, even if a website is highly usable and provides very useful information presented in a logical arrangement, this may fail to impress a user whose first impression of the site was negative.
</p>
</blockquote>
<p>
While no studies have been published to measure the impact of the "halo effect" in mobile applications, it is undoubtedly an important aspect of the user experience. While a thorough discussion of design principles is beyond the scope of this article, consider the fact that proper use of color alone has been proven to increase brand recognition by up to 80&#37;<sup><a style="color: #990000;" href="http://omniti.com#footnotes">11</a></sup>. In visual mediums, color certainly matters. Give your users the right impression by abiding by the principles of color psychology shown below:   
</p>
</li>
</ol>
<img width="500" height="325" src="http://images.omniti.net/omniti.com/i/b/mark-seeds%28color%29.jpg" alt="Color Psychology"/>
<p>
Success is not limited to any single medium, and neither are your consumers. The companies that realize and apply that to their operations today will be the companies taking consumers into the frontiers of tomorrow. Be among them by placing the needs of your consumers first and then utilizing whatever technology is necessary to serve them. 
</p>

<h3>Footnotes</h3>
<ol>
<li><a name="footnotes" href="http://www.apple.com/pr/library/2009/04/24appstore.html">http://www.apple.com/pr/library/2009/04/24appstore.html</a></li>
<li><a href="http://www.w3schools.com/browsers/browsers_stats.asp"><span>http://www.w3schools.com/browsers/browsers_stats.asp</span></a></li>
<li><a href="http://en.wikipedia.org/wiki/Firefox#Market_adoption"><span>http://en.wikipedia.org/wiki/Firefox#Market_adoption</span></a></li>
<li><a href="http://share.skype.com/sites/en/2008/09/celebrating_1_billion_download.html"><span>http://share.skype.com/sites/en/2008/09/celebrating_1_billion_download.html</span></a></li>
<li><a href="http://www.apple.com/pr/library/2009/09/28appstore.html"><span>http://www.apple.com/pr/library/2009/09/28appstore.html</span></a></li>
<li><a href="http://bits.blogs.nytimes.com/2009/05/27/google-expect-18-android-phones-by-years-end/"><span>http://bits.blogs.nytimes.com/2009/05/27/google-expect-18-android-phones-by-years-end/</span></a></li>
<li><a href="http://www.census.gov/compendia/statab/tables/09s1118.pdf"><span>http://www.census.gov/compendia/statab/tables/09s1118.pdf</span></a></li>
<li><a href="http://macdailynews.com/index.php/weblog/comments/16715/"><span>http://macdailynews.com/index.php/weblog/comments/16715/</span></a></li>
<li><a href="http://www.theapplicationfarm.com/2009/07/just-how-much-does-a-lite-version-help-boost-sales/"><span>http://www.theapplicationfarm.com/2009/07/just-how-much-does-a-lite-version-help-boost-sales/</span></a></li>
<li><a href="http://blog.flurry.com/bid/19375/iPhone-App-Store-Marketing-Give-it-Away-to-Get-Paid"><span>http://blog.flurry.com/bid/19375/iPhone-App-Store-Marketing-Give-it-Away-to-Get-Paid</span></a></li>
<li><a href="http://www.colormatters.com/market_whycolor.html"><span>http://www.colormatters.com/market_whycolor.html</span></a></li>
</ol>]]></content:encoded>
            <pubDate>Tue, 13 Oct 2009 21:32:33 GMT</pubDate>
        </item>
        <item>
            <title>When Commodity Makes Sense</title>
            <link>http://omniti.com/seeds/when-commodity-makes-sense</link>
            <guid>http://omniti.com/seeds/when-commodity-makes-sense</guid>
            <description><![CDATA[We&#8217;d all like to spend as little money as possible to get the performance we desire from our computing hardware.  When the term &#8220;commodity&#8221; is used in relation to computing, it typically refers to products that are mass-produced and w...]]></description>
            <content:encoded><![CDATA[<p>We&#8217;d all like to spend as little money as possible to get the performance we desire from our computing hardware.  When the term &#8220;commodity&#8221; is used in relation to computing, it typically refers to products that are mass-produced and widely available, with little to distinguish them other than price.  This is in contrast to &#8220;enterprise&#8221; hardware &#8212; specialized, vertically-integrated product lines such as Sun SPARC and IBM POWER that target a narrower slice of the computing market and differentiate themselves much more on features than on price.  When I say &#8220;commodity&#8221;, I don&#8217;t simply mean standardized hardware, I mean the cheapest, lowest-common-denominator gear that gets the job done. There are legitimate use cases for both types of hardware, but as with all computing solutions, there are tradeoffs that must be understood to make the wisest choices.</p>

<img alt="Disk Array" src="http://images.omniti.net/omniti.com/i/b/array.png" width="500" height="201" />

<p>The choice to design an architecture with commodity hardware in mind comes with some enticing benefits. First, instead of one expensive widget, I can afford a bunch of cheaper widgets and spread out my workload among them, which also helps isolate failures and improves the overall continuity of service to my customers. Second, it allows me to scale my solution as the demands of the business grow.  Third, money saved by avoiding pricey hardware is freed to be spent in other areas.</p>

<p>Data storage is one market where there is a stark difference between enterprise and commodity hardware.  The first time I heard the term <abbr title="redundant array of inexpensive disks">RAID</abbr>, I learned that the &#8220;I&#8221; stood for &#8220;Inexpensive&#8221;. Later, I discovered that it is often given as &#8220;Independent&#8221;. Both make sense in context, but it seems now that the former meaning has been lost when 15K-rpm drives are <em>de rigueur</em> and sit at the top end of the price range.  Lowly 7200-rpm or even 5400-rpm SATA drives occupy the low end.  This is but one area of computing where commodity hardware is not seen as capable of matching more expensive enterprise hardware. However, a holistic systems approach reveals that there are plenty of places where commodity drives make sense in terms of delivering on business goals and ensuring a high quality of service.</p>

<p>At the high end, dedicated storage arrays with custom hardware controllers filled with 15K-rpm drives define enterprise storage. These are speed demons; the high spindle speed means low latency (around 2ms, compared to 4-5ms for 7200 rpm drives). To drive latency even lower, some arrays use a technique called &#8220;short-stroking,&#8221; which utilizes only the outermost area of each disk platter, minimizing the distance that the heads must move. Such luxury comes with a steep price. Short-stroking reduces the usable space of each drive, requiring more drives for a given amount of storage. Not only that, but 15K drives max out at 300GB, so obtaining the kind of storage sizes required by large enterprises requires entire cabinets full of disk shelves. 15K drives are also power-hungry, hot-running monsters. At a time when electricity and carbon footprint concerns are becoming increasingly important, the luxury of high performance is ever more expensive, and that expense outstrips the performance gained by adding more spindles.</p>
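
<p>The latency figures above are consistent with average rotational latency, the time a platter needs to turn half a revolution. A minimal sketch of that arithmetic follows; it assumes the quoted numbers refer to rotational latency only (seek time is extra), and the function name is made up for illustration.</p>

<pre><samp>
def avg_rotational_latency_ms(rpm):
    """Average rotational latency in milliseconds: the time for half a revolution."""
    ms_per_revolution = 60000.0 / rpm   # 60,000 ms in a minute
    return ms_per_revolution / 2


for rpm in (15000, 7200, 5400):
    print(rpm, "rpm:", round(avg_rotational_latency_ms(rpm), 2), "ms")
</samp></pre>

<p>This prints 2.0 ms for 15K drives, 4.17 ms for 7200 rpm drives and 5.56 ms for 5400 rpm drives.</p>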

<p>By contrast, a commodity storage approach seeks to maximize storage space and minimize costs, both initial and ongoing.  The same budget can buy more spindles to keep latencies down and <abbr title="Input/Output Operations Per Second">IOPS</abbr> up.  These additional spindles can fit within the same (or less) space and power budget as well.</p>

<p>The commodity storage picture is not all rosy.  Enterprise drives are manufactured to withstand the higher level of vibration that comes with putting a lot of drives into the same chassis.  They also have lower bit-error rates and higher <abbr title="Mean Time Between Failures">MTBF</abbr> than their commodity cousins.  Simply put, commodity drives fail more often.  &#8220;Failure&#8221; could be anything from silent data corruption to outright mechanical malfunction.  Commodity drives also don&#8217;t spin as fast &#8212; 7200 rpm at most.  That increases latencies, especially on random read loads.  Any solution that employs commodity drives must account for these realities and work around them.</p>

<p>Thanks to some excellent, disruptive technology from Sun, namely <a href="http://www.opensolaris.org/os/community/zfs/"><span>ZFS</span></a>, I can design a system around inexpensive, 7200 rpm drives, bolstered by a few <abbr title="solid-state disk">SSD</abbr>s, which provides more capacity with fewer spindles and <a href="http://blogs.sun.com/brendan/entry/test"><span>improves read/write latencies</span></a> far beyond the capabilities of short-stroked 15K drives. ZFS features such as end-to-end checksums, guaranteed on-disk consistency, intelligent prefetch, and immense scalability make it a good fit for my motley array of cheap disks. If I run low on space, adding more disks is extremely simple and cost-effective.  Likewise, features like <a href="http://www.opensolaris.org/os/community/zfs/demos/selfheal/"><span>self-healing data</span></a> and <a href="http://blogs.sun.com/bonwick/en_US/entry/smokin_mirrors"><span>top-down, metadata-driven resilvering</span></a> mean that silent data corruption and device failures don&#8217;t have to be the ulcer-inducing events they once were.</p>

<p>Storage is just one example of how some careful thought during the design process can yield significant savings during implementation.  The same theory applies to entire server farms when deploying web applications.  An application that is designed to scale horizontally can be run on a large number of cheap servers rather than a few very expensive ones.</p>

<p>Sometimes it <em>doesn&#8217;t</em> make sense to go the commodity route. It is as important to know when you can scale horizontally as it is to know when you should not. An application may require more engineering to rearchitect it to scale out than it would cost to buy a larger single machine to scale it up. I&#8217;m not just talking about high-end RISC gear either &#8212; some relatively large x86-64 configurations are possible. For example, the upper end of <a href="http://www-03.ibm.com/systems/x/hardware/enterprise/index.html"><span>IBM&#8217;s System x</span></a> line can be configured with up to four 4U chassis linked together in an 8-socket, 48-core, 128-DIMM system. That&#8217;s a monster box (cue the Tim Allen grunts). If you can run your app today on one machine, <em>and</em> you can plan its growth to fit into a monster box, then the cost-effective approach may just be to use the larger machine.</p>]]></content:encoded>
            <pubDate>Mon, 21 Sep 2009 15:00:00 GMT</pubDate>
        </item>
        <item>
            <title>YSlow! to YFast! in 45 minutes.</title>
            <link>http://omniti.com/seeds/yslow-to-yfast-in-45-minutes</link>
            <guid>http://omniti.com/seeds/yslow-to-yfast-in-45-minutes</guid>
            <description><![CDATA[The web is a complex beast.  There are many moving parts involved in delivering a complete web application today.  For a significant portion of my career, I have focused primarily on the architecture and implementation of the parts that an end-user nev...]]></description>
            <content:encoded><![CDATA[<p>The web is a complex beast.  There are many moving parts involved in delivering a complete web application today.  For a significant portion of my career, I have focused primarily on the architecture and implementation of the parts that an end-user never sees.  Racks, servers, databases, switches, routers and load-balancers; the list goes on, but you get the point.  The goal of such an architecture, of course, is to receive a user&#8217;s HTTP request and construct and return a complete result as quickly as possible.  To say that there are &#8220;a lot of moving parts&#8221; in today&#8217;s web architectures is an understatement &#8212; they are beasts.</p>

<p>What makes this even more complicated is that once you spew forth the result to the end-user you have a daunting set of user-perceptible performance issues remaining to be addressed.  This performance challenge happens in a hostile environment: one we do not control (the user&#8217;s computer) over a long-haul network we do not control driven via a browser we did not select &#8212; a daunting challenge indeed.</p>

<p>We are very fortunate to have excellent tools at our disposal with which to tackle this challenge.  Two of my favorites are Yahoo&#8217;s <a href="http://developer.yahoo.com/yslow/"><span>YSlow!</span></a> and Google&#8217;s <a href="http://code.google.com/p/page-speed/"><span>Page Speed</span></a> tools.  Both are extensions to the most excellent <a href="http://getfirebug.com/"><span>Firebug</span></a> add-on for <a href="http://www.mozilla.com"><span>Mozilla&#8217;s Firefox web browser</span></a>.  Both tools will help you dissect the various aspects of the content you deliver to end-users and understand how each bit will contribute to perceived slowness.  On the web (as in most other things in life) perception is king.  A user&#8217;s perception drives their response.</p>

<h2>Irony: the not-so-delicious kind.</h2>

<p>I recently attended the <a href="http://en.oreilly.com/velocity2009"><span>Velocity conference</span></a> and the first workshop I attended was Steve Souders&#8217; excellent presentation on <a href="http://en.oreilly.com/velocity2009/public/schedule/detail/8807"><span>Website Performance Analysis</span></a>.  Steve Souders is the original author of YSlow!, which I use on a daily basis.  Steve used YSlow! to show how to analyze website performance (as one might have assumed from his workshop title).  I popped open YSlow! on our corporate website and&#8230; horror!</p>

<p>While OmniTI has enormous breadth in the Internet space, we are primarily known as an Internet performance and scalability company. This made the fact that we received an F on YSlow! all the more embarrassing. This was a case of the right hand not knowing what the left hand was doing &#8212; something we evangelize against. I decided that I would fix that and aim to do it by the end of Steve&#8217;s presentation.  A play-by-play follows.</p>

<h3>No Expires Headers.</h3>

<p>It turns out that our images, JavaScript, and CSS didn&#8217;t have Expires headers.  Our CSS is in a directory /c/, our JavaScript is located in /js/, and all our images are in /i/.  I could set the headers by content type, but a location-based approach gives me the flexibility of serving dynamic/uncacheable content with those content types if I choose to later:</p>

<pre><samp>
&lt;Directory "/www/sites/omniti.com/www/i"&gt;
    ExpiresActive On
    ExpiresDefault "access plus 1 month"
&lt;/Directory&gt;
&lt;Directory "/www/sites/omniti.com/www/c"&gt;
    ExpiresActive On
    ExpiresDefault "access plus 1 month"
&lt;/Directory&gt;
&lt;Directory "/www/sites/omniti.com/www/js"&gt;
    ExpiresActive On
    ExpiresDefault "access plus 1 month"
&lt;/Directory&gt;
</samp></pre>

<h3>Using ETags.</h3>

<p>ETags are on.  This isn&#8217;t really a problem in and of itself, but some of our static content can be served by multiple machines, and Apache&#8217;s default ETag is derived in part from the file&#8217;s inode, so the same file carries a different ETag from machine to machine, which defeats caching:</p>

<pre><samp>
&lt;FilesMatch "\.(js|css|gif|png|jpe?g)$"&gt;
  FileETag None
&lt;/FilesMatch&gt;
</samp></pre>

<h3>Uncompressed content.</h3>

<p>This is even easier.  We run Apache 2.2, so:</p>

<pre><samp>
AddOutputFilterByType DEFLATE \
       text/html text/plain text/xml \
       application/javascript text/css
</samp></pre>
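
<p>To confirm that the three changes above took effect, the response headers of a static asset can be checked mechanically. The following is only a sketch: the function name, the list of compressible types (copied from the directive above) and the sample header values are all made up for illustration, not captured from our servers.</p>

<pre><samp>
# Content types that the DEFLATE directive above is expected to compress.
COMPRESSIBLE = ("text/html", "text/plain", "text/xml",
                "text/css", "application/javascript")


def audit_static_headers(headers):
    """Return a list of problems found in one static asset's response headers."""
    # Header names are case-insensitive, so normalize them first.
    h = {name.lower(): value for name, value in headers.items()}
    problems = []
    if "expires" not in h and "max-age" not in h.get("cache-control", ""):
        problems.append("no Expires or Cache-Control max-age header")
    if "etag" in h:
        problems.append("ETag still present")
    content_type = h.get("content-type", "").split(";")[0].strip()
    if content_type in COMPRESSIBLE and h.get("content-encoding") not in ("gzip", "deflate"):
        problems.append("response is not compressed")
    return problems


# Hypothetical headers for a stylesheet, before and after the fixes.
before = {"Content-Type": "text/css", "ETag": '"1a2b-3c4d"'}
after = {"Content-Type": "text/css",
         "Expires": "Thu, 06 Aug 2009 13:30:00 GMT",
         "Cache-Control": "max-age=2592000",
         "Content-Encoding": "gzip"}

print(audit_static_headers(before))
print(audit_static_headers(after))
</samp></pre>

<p>The first call reports all three problems and the second returns an empty list.</p>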

<h3>No CDN.</h3>

<p>We have a fast CDN-like caching layer residing at s.omniti.net that we can leverage&#8230; so I flipped all the images over to that.  Technically, this is cheating because you have to add s.omniti.net to the YSlow! configuration to be recognized as a CDN.  I was pleased to learn that even without formally moving the images to a known CDN, we still moved to an A rating in YSlow!</p>

<h3>Assets served from a domain with cookies.</h3>

<p>The move of all static assets to s.omniti.net resolved this issue.  This goes to show that even if you don&#8217;t have a CDN, simply putting your static assets in a different domain (that has no cookies) can considerably improve performance in two ways: (1) it allows for more concurrency on the network layer and (2) it reduces the upstream payload for quicker requests.</p>

<h2>The result?</h2>

<p>A noticeably faster web site in under 45 minutes.</p>

<p>Before I fixed things up, it took 486ms to render (over the conference Internet connection).</p>

<a href="http://omniti.com/i/b/yslow-visit1.png"><img alt="yslow-visit1-small" src="http://images.omniti.net/omniti.com/i/b/yslow-visit1-small.png" /></a>

<p>After a bit of work, I was able to drop the time-to-render to 315ms over the same link.  That&#8217;s a 35% reduction and it almost drops the page load time down into the &#8220;so fast it doesn&#8217;t matter&#8221; arena.</p>

<a href="http://omniti.com/i/b/yfast-visit1.png"><img alt="yfast-visit1-small" src="http://images.omniti.net/omniti.com/i/b/yfast-visit1-small.png" /></a>

<p>There are several things I&#8217;d like to do that would further improve page load/render times.  The JavaScript used could be consolidated into a single file (aside from the web analytics parts).  The CSS could also be consolidated from two files to one.  On our <a href="http://omniti.com/is"><span>about page</span></a> we have thumbnail photos of all our staff.  They are all the same size, so we could easily combine them into a single image and use CSS sprites; that would dramatically improve the perceived performance of that page.</p>

<p>Some things we did right?  Our search is wicked fast as we pull the results in via AJAX and make a single DOM manipulation to display them.</p>

<h2>Next steps.</h2>

<p>Go fix your site.  Make it faster.  Make the web a better place.  It took me 45 minutes to make a significant positive impact.  Granted, if I didn&#8217;t know your web application, or if it were more complicated than our corporate site (which I believe all are), it would take a bit longer.  It&#8217;s worth it.  Do it.  Or <a href="http://omniti.com/does/scalability-and-performance"><span>hire us to do it</span></a>.</p>]]></content:encoded>
            <pubDate>Tue, 07 Jul 2009 13:30:00 GMT</pubDate>
        </item>
        <item>
            <title>What is Web Operations?</title>
            <link>http://omniti.com/seeds/what-is-web-operations</link>
            <guid>http://omniti.com/seeds/what-is-web-operations</guid>
            <description><![CDATA[The field of web operations is one with which I am intimately
familiar.  For the last twelve years, I have immersed myself in this
field and have had the distinct privilege of helping define it.  Even
now, writing a job description for a web operations...]]></description>
            <content:encoded><![CDATA[<p>The field of web operations is one with which I am intimately
familiar.  For the last twelve years, I have immersed myself in this
field and have had the distinct privilege of helping define it.  Even
now, writing a job description for a web operations specialist is
nearly impossible and when I speak with colleagues about what web
operations truly is, we all seem to articulate things differently.  I
wrote an article a little over a year ago after attending the first
O&#8217;Reilly Velocity Summit.  I now sit in a hotel room preparing
my workshop for delivery at the second annual Velocity conference and
realize very little has changed.  While I still believe the definition
of web operations is in flux, I truly appreciate a forum in which it
can be explored further.  I strongly encourage anyone in the Bay Area
to swing by and partake.</p>

<p>While attending the summit that helps plan this conference, I had two
epiphanies:</p>
<ol>
<li>a realization of the lack of a career path for people who do what
we do (no standard titles, no standard roles and responsibilities and
certainly a lack of sex appeal);</li>
<li>a clear lack of terminology for the technology requirements
that are so common in these environments.</li>
</ol>
<p>Terminology is easy, in my opinion &#8212; you just argue until
someone wins.  Of course, arguing is a hobby of mine, so I have bias.
On the other hand, defining an industry-accepted career path is
hard.</p>

<h2>The Career: Web Operations</h2>
<p>The term <a href="http://en.wikipedia.org/wiki/Web_operations"><span>Web
Operations</span></a> was used a lot during this event.  While it is not
awful, I really do not like this term.  The hard part is that the
captains, superstars, or heroes in these roles are multidisciplinary
experts.  They have a deep understanding of networks, routing,
switching, firewalls, load-balancing, high availability, disaster
recovery, TCP &amp; UDP services, NOC management, hardware
specifications, several different flavors of UNIX, several web server
technologies, caching technologies, several databases, storage
infrastructure, cryptography, algorithms, trending and capacity
planning.  The issue: how can we expect to find good candidates that
have fluency in such a nimiety of technologies?  In the traditional
enterprise, you have architects who are broad and shallow and their
team of experts who are focused and deep.  However, the
expectation is that your &#8220;web operations&#8221; engineer be both
broad and deep: fix your gigabit switch, optimize your MySQL database
and guide the overall architecture design to meet scalability
requirements.</p>

<p>I struggle with this.  Not everyone can be a superstar.  More
importantly, no one can really start as a superstar.  If we use an
apprentice model (which is common in industries without institutional
support) we limit the total number of able workers in this field.  So,
how do we (re)define the requirements for a junior web operations
person?</p>

<p>We have to have a plan for hiring on people and progressing them
through a career path to make this a legitimate discipline.  During
conversation, one of my colleagues said they just hire people that
they think are agile &#8212; &#8220;If I tell them to know IOS well
enough to configure a router and troubleshoot a problem, I expect them
to show up tomorrow with a basic understanding of IOS and ready to
start typing in commands at a console.&#8221; I agree this sort of
&#8220;no boundaries&#8221; attitude is required for the job, but
where do you start?</p>

<p>Another person mentioned that the lack of sex appeal in
the position was due to popular attitude.  Many people apply for
development positions and &#8220;don&#8217;t quite make the cut&#8221;
and are instead offered system administration positions.  I personally
don&#8217;t subscribe to this philosophy and we certainly do not operate
like that at <a href="http://omniti.com/"><span>OmniTI</span></a>, but I have seen
it in other companies &#8212; I hope it is not prevalent.</p>

<p>Basically, this is one of the few positions in the organization that
has no boundaries of responsibility.  If something breaks,
it <em>is</em> your problem.  Why isn&#8217;t this the case throughout
the organization &#8212; why is it that even the most junior of
developers doesn&#8217;t wake up to fix their code when it breaks and causes
service degradation in the middle of the night?  It is uncommon that
this level of responsibility is expected of developers, while it is a
quite common expectation of the operations crew.</p>

<p>Circling back, I really do not like the term &#8220;web ops.&#8221; I
realize it is not far off, but it isn&#8217;t sexy.  Google has a few
different roles with this level of responsibility.  One I like is
called: &#8220;Site Reliability Engineer.&#8221; However, I would like
a set of job titles and a progression through them that makes this an
appealing career path for young, ambitious geeks.</p>

<p>In order to define these roles, we should think about what they are
responsible for.  In our organization I see this as a few things:</p>

<h3>Junior</h3>
<p>On the junior level, they are responsible for learning.  They are
responsible for deploying new services and documenting such
deployments.  They are responsible for instrumenting deployments to
make sure that faults are detected and trending is possible.</p>

<h3>Mid-level</h3>
<p>On the mid-level, they are responsible for all of the above, and more.
Effective and complete troubleshooting of failures.  Making sense of
trending information.  Understanding work loads that exist.  Tuning
systems to better accommodate current workloads and proactive tuning
to handle known future workloads.  One of the key differences between
mid-level and junior is the ability to correctly prioritize
remediation of issues during incident response.  Staying calm,
collected and executing with clarity of thought during an emergency.</p>

<p>What does &#8220;complete troubleshooting&#8221; mean?  I mean
troubleshooting without boundaries.  I want no shyness in cracking
open developer code and telling them what they did wrong and
why. Finger pointing at people simply doesn&#8217;t work; you have to
point your finger at implementation problems, not people.  To do that
requires the skill to track a performance problem or reliability issue
down to a specific line of code or approach.</p>

<h3>Senior</h3>
<p>On the senior side, technology research and selection is a must.
Additionally, they are responsible for incorporating new technologies
into the architecture to improve availability and reduce costs,
constantly analyzing systems to improve efficiency, and planning
capacity by understanding growth well enough to ensure that provisioning
and deployment outpace need.  Donald Knuth has long said that premature
optimization is the root of all evil; I&#8217;ve long said that the
ability to accurately determine what is premature separates senior
from junior.</p>

<p>One of the core responsibilities that all engineering disciplines share is
assessing the appropriateness of the technologies at hand.  For example,
a &#8220;Web Architect&#8221; must ensure that
technology selection as well as development and deployment strategy
match the business need.  This is &#8220;hard.&#8221;</p>

<h2>Above and Beyond</h2>
<p>Web operations is a special role.  It is in no way a fit for failed
developers; it is for developers and engineers who have outpaced their
career path, people who have a deep understanding of how things work:
&#8220;a complete systemic view of general site architecture.&#8221;
Beyond that, they want <b>more responsibility</b>; they want to make sure
that <b>all of it works all of the time</b>: the app, the stack, the
hardware, the network.  Whatever technology the business needs, it
must work, it must perform and it must be able to meet demand.
Lastly, in their heart of hearts, they must believe that all problems
are equal in their need for resolution and problem prioritization is
dictated by business impact and not by flights of fancy (how cool or
interesting the problem is).</p>

<p>It is an impossible job requirement: &#8220;Knows everything about all
technologies deployed in Internet architectures.&#8221; While no one
fills this requirement, what I want is someone whose career goal is to
find out how close they can get.</p>]]></content:encoded>
            <pubDate>Mon, 22 Jun 2009 00:39:00 GMT</pubDate>
        </item>
        <item>
            <title>Concepts of Cloud(ish) Storage</title>
            <link>http://omniti.com/seeds/concepts-of-cloudish-storage</link>
            <guid>http://omniti.com/seeds/concepts-of-cloudish-storage</guid>
            <description><![CDATA[It&#8217;s rare that I write an article simply to educate.  Most of
the time I am attempting to articulate or justify a position, or
simply rebutting someone&#8217;s nonsensical yammering.  For a
refreshing change, I thought I would take some time to e...]]></description>
            <content:encoded><![CDATA[<p>It&#8217;s rare that I write an article simply to educate.  Most of
the time I am attempting to articulate or justify a position, or
simply rebutting someone&#8217;s nonsensical yammering.  For a
refreshing change, I thought I would take some time to educate you on
the fundamentals of large-scale data storage.  Many people think of
&#8220;storage as a service&#8221;
(now being called &#8220;cloud storage&#8221;) as a magic
black box.  At the end of the day, it is just bits on disks.  And like
all things, if you use enough of it, you can more than cover the cost
of managing it yourself by simply eliminating your vendor&#8217;s
margin (insourcing).</p>

<p>There are more and more services providing outsourced storage.  The
concept is simple: you upload a digital asset to the vendor (via some
sort of API or tool), they return an identifying key of some sort
(sometimes this key is provided by you, the uploader) and they store
the asset for you.  To retrieve the asset, you use a similar method to
the one used to upload.  In the simplest terms, you can think of it as
a mapped network drive to which you can save assets, and later
reconnect to retrieve them.</p>

<p>By no means is this new technology.  However, the idea of managing
one&#8217;s own storage, combined with growing space requirements and
fear of loss due to lack of redundancy, has driven people to want to
make this particular problem someone else&#8217;s.  Making this choice
&#8212; to solve the problem yourself or to outsource &#8212; is
always the outcome of several factors: cost, convenience, and
safety.</p>

<h3>Redundancy: The basics</h3>

<p>Let&#8217;s take a look at the fundamentals of data storage.  We
all want our data to be safe.  It&#8217;s pretty obvious that storing
exactly one copy of the data isn&#8217;t safe, but it&#8217;s actually
more complex than you would think &#8212; storing two copies
doesn&#8217;t buy you much without taking a few extra steps.</p>

<p>Before we dive in and explore methods for keeping data safe across
systems, we need to realize that one of our fundamental assumptions is
invalid.  We assume that when we write data to a disk, it will have no
errors when we read it back.  <a href="http://indico.cern.ch/getFile.py/access?contribId=3&amp;sessionId=0&amp;resId=1&amp;materialId=paper&amp;confId=13797"><span>This
assumption is fundamentally wrong</span></a>.  There&#8217;s this little evil
thing called a bit error (basically, one of the zeros or ones that was
written came back inverted).  How often this type of error occurs is a
probability called bit error rate (BER).  The <abbr title="bit error
rate">BER</abbr> on modern spinning disks is usually around 10<sup>-13</sup> or
10<sup>-14</sup>.  Basically, for every 1 to 10 terabytes you write, one of the
bits &#8212; when read &#8212; won&#8217;t equal what was written.  A
single erroneous bit might not matter for some types of data, but for
others, such an error could be disastrous. We write a lot of data
these days, and bit errors are silent, so the lesson here is: write
checksums with your data.</p>
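
<p>A minimal sketch of &#8220;write checksums with your data&#8221; follows. It assumes a SHA-256 digest stored in front of each payload; the record layout and function names are invented for illustration, and a real storage system would choose its own.</p>

<pre><samp>
import hashlib


def write_with_checksum(data):
    """Return the record to store: a 32-byte SHA-256 digest followed by the data."""
    return hashlib.sha256(data).digest() + data


def read_with_checksum(record):
    """Split a stored record, verify its digest, and return the data or raise."""
    digest, data = record[:32], record[32:]
    if hashlib.sha256(data).digest() != digest:
        raise ValueError("checksum mismatch: stored data is corrupt")
    return data


record = write_with_checksum(b"an asset worth keeping")
assert read_with_checksum(record) == b"an asset worth keeping"

# Simulate a silent bit error by flipping one bit in the stored payload.
damaged = bytearray(record)
damaged[40] ^= 0x01
try:
    read_with_checksum(bytes(damaged))
except ValueError as err:
    print(err)
</samp></pre>

<p>The clean record reads back intact, while the single flipped bit is caught on read instead of being returned as good data.</p>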

<p>The classic method of ensuring that data is safe is to store
multiple copies on different physical media.  Inside a single system,
this can be accomplished with RAID1 (mirroring), which makes sure all
data is on two physically separate disks.  With a bit of (somewhat)
clever math, we can take that same data, split it into a few pieces
and store each piece on a different drive. We can then calculate a
block of parity data, and store that on an additional drive.
Retracing the same math backwards shows that we can lose any single
disk in the set, and we&#8217;ll still be able to reconstruct our
data.  This is the basis for RAID5.  Sometimes systems need to be
resistant to multiple concurrent disk failures (hence the introduction
of RAID6, which uses an erasure code such as <a href="http://en.wikipedia.org/wiki/Reed%E2%80%93Solomon_error_correction"><span>Reed-Solomon</span></a>).</p>
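
<p>The &#8220;(somewhat) clever math&#8221; behind single-parity schemes is XOR. Here is a small sketch under simplifying assumptions: three data disks, one dedicated parity disk and four-byte blocks. Real RAID5 rotates parity across the disks, and the helper name is made up.</p>

<pre><samp>
def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)


data_blocks = [b"AAAA", b"BBBB", b"CCCC"]   # one block per data disk
parity = xor_blocks(data_blocks)             # stored on a fourth disk

# Lose the second disk: XOR the survivors with the parity block to rebuild it.
rebuilt = xor_blocks([data_blocks[0], data_blocks[2], parity])
assert rebuilt == data_blocks[1]
</samp></pre>

<p>Because XOR is its own inverse, the same function computes the parity and recovers whichever single block goes missing.</p>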

<p>None of these scenarios are designed to reduce the risk of data
corruption.  Rather, they were designed to prevent data loss due to
hardware failure of one or more underlying disks.  One issue with
using RAID is that you are storing files on a set of drives, those
files consist of chunks of data (blocks) which map to physical blocks
of bits on the drives, and somewhere along that path we could lose our
way.  If a specific physical block goes bad, or somehow becomes
unreadable, we can&#8217;t easily map it back to a logical object,
such as a file.  We only find out that there&#8217;s a problem when we
try to read the object.  Another problem with this general technique
is that all of these disks live in a single system and if that system
fails, all of the data is unavailable (or worse, lost).</p>

<p>So, RAID is designed to keep our data somewhat safe within a single
system, but it doesn&#8217;t address system failures.  The most
obvious design is to put all of our information on two systems.  There
are pros and cons with this approach.  On the positive side, once
we&#8217;ve identified which system holds a copy of our asset, we only
need to communicate with that single system to retrieve a copy of the
asset &#8212; simplicity.  The downside here is that we&#8217;ve used
half of our storage as redundancy, and yet if two of our nodes fail,
we&#8217;ve necessarily made unavailable (or permanently lost)
2/(N*(N-1)) of our assets, because the failed pair is one of the
N*(N-1)/2 possible pairs of nodes that can hold an asset.  With two
nodes, this works out to 100% (of course), and with 10 nodes, it&#8217;s
around 2%.</p>
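
<p>The 100% and roughly 2% figures can be checked by brute force. This sketch assumes that assets are spread evenly over every possible pair of nodes, which the text implies but does not state; the function name is hypothetical.</p>

<pre><samp>
from itertools import combinations


def fraction_lost_two_copies(nodes, failed):
    """Fraction of two-node placements that lie entirely inside the failed set."""
    placements = list(combinations(range(nodes), 2))
    lost = [pair for pair in placements if set(pair).issubset(failed)]
    return len(lost) / len(placements)


for n in (2, 10):
    share = 100 * fraction_lost_two_copies(n, {0, 1})
    print(n, "nodes:", round(share, 1), "percent of assets lost")
</samp></pre>

<p>With nodes 0 and 1 down, this prints 100.0 percent for two nodes and 2.2 percent for ten.</p>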

<p>Taking a different approach altogether allows us to use half of our
storage for redundancy, while maintaining dramatically greater
availability.</p>

<h3>Erasure codes</h3>

<p>High availability of assets in light of system failures is achieved
by today&#8217;s peer-to-peer systems.  Their technical description is
clear-cut, yet extremely detailed.  By using erasure codes, these
systems are able to split data into many pieces (similar to RAID5),
but instead of calculating simple parity, they calculate unique
erasure codes.</p>

<p>Imagine we split our data into 5 pieces, and then calculate 5
additional pieces of data, any of which could be used to reconstruct
any of the original 5 pieces were they found to be unavailable &#8212;
these are erasure codes.  So, with the data in 5 fragments + 5 erasure
fragments, we&#8217;ve consumed twice the space but can now stand to
lose any five pieces before the data becomes unavailable and/or lost.
The main drawbacks to such a system are that calculating and
distributing erasure codes is much more complicated than simply
storing two copies of the same data, and that retrieving data requires
contacting at least 5 machines to serve an asset.</p>
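
<p>To make the 5 + 5 idea concrete, here is a toy erasure code built on polynomial interpolation, in the spirit of Reed-Solomon. It is a sketch only: it assumes arithmetic modulo the prime 257 (so a coded value may be 256 and not fit in a byte) and needs Python 3.8 or later for the modular inverse. The function names are invented, and production systems use more careful constructions.</p>

<pre><samp>
from itertools import combinations

P = 257  # smallest prime above 255, so every byte value fits in the field


def interpolate(points, x):
    """Lagrange-evaluate, modulo P, the polynomial through the given (x, y) points."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def encode(data):
    """Return 2k fragments: the k data values plus k coded values."""
    k = len(data)
    points = list(enumerate(data))
    coded = [(x, interpolate(points, x)) for x in range(k, 2 * k)]
    return points + coded


def decode(fragments, k):
    """Rebuild the k data values from any k surviving fragments."""
    return [interpolate(fragments[:k], x) for x in range(k)]


data = [72, 101, 108, 108, 111]
fragments = encode(data)

# Every possible way of losing 5 of the 10 fragments still decodes.
for survivors in combinations(fragments, 5):
    assert decode(list(survivors), 5) == data
</samp></pre>

<p>The five data values define a polynomial of degree at most four, and any five of the ten stored points are enough to pin that polynomial down again, which is why any five fragments can go missing.</p>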

<p>This erasure code approach assumes a slightly larger network of
servers.  With two copies and 100 machines we see 99.8% availability
with 5 machine failures. With a 10-fragment (5 data + 5 coded)
scenario, if 5 nodes fail, we maintain 100% availability.  In the
pathological case where 50 of our 100 nodes fail, the two-copy method
would result in an availability of approximately 75.3%, whereas the
erasure code method would achieve approximately 98.7% asset
availability.</p>
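
<p>The two-copy figures can be reproduced with a short calculation. This sketch assumes each asset lives on a random pair of distinct nodes, and the function name is hypothetical. It deliberately covers only the two-copy case, because the erasure-code figure depends on fragment placement details not spelled out here.</p>

<pre><samp>
from math import comb


def two_copy_availability(nodes, failed):
    """Fraction of assets with at least one surviving copy when `failed` nodes are down."""
    return 1 - comb(failed, 2) / comb(nodes, 2)


for failed in (5, 50):
    share = 100 * two_copy_availability(100, failed)
    print(failed, "failed:", round(share, 1), "percent available")
</samp></pre>

<p>This prints 99.8 percent for 5 failed machines and 75.3 percent for 50.</p>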

<h3>Back to reality</h3>

<p>In peer-to-peer systems, where clients enter and leave the network
rapidly, the use of erasure codes for high redundancy is quite
necessary.  However, in a datacenter environment, with redundancy on
each system and maintenance windows that we control, the situation is
entirely different.  Controlling the servers, their configuration and
their region of deployment gives us a landscape on which we can build
a sufficiently redundant system with all sorts of advantages.</p>

<p>Reduced system complexity and simple distributed processing are
significant advantages that result from having whole data objects like
images or documents present on a single node.  With this model,
we can offload some computational processing to the nodes that hold the
data and they can act without consuming additional resources such as the
CPU time and network bandwidth required to reconstitute whole objects 
from their distributed pieces.</p>

<p>At the end of the day, a hybrid/adaptive approach between the two
would yield the best outcome.  I see that being the next thing in
distributed storage.  Most of us who are faced with storing large
amounts of data have already thrown traditional filesystems and
POSIX-compliance to the wind and are looking for fresh, more
appropriate solutions to our specific problems.</p>

<p>For now, until these merge, the approach of redundantly storing
whole assets makes the most sense.  It is simple and easy to build,
deploy and administer.  It is also trivially easy to understand and
troubleshoot.</p>]]></content:encoded>
            <pubDate>Thu, 11 Jun 2009 20:16:48 GMT</pubDate>
        </item>
        <item>
            <title>Virtualization, ZFS and Zetaback</title>
            <link>http://omniti.com/seeds/virtualization-zfs-and-zetaback</link>
            <guid>http://omniti.com/seeds/virtualization-zfs-and-zetaback</guid>
            <description><![CDATA[It used to be the case that when you wanted to deploy a new application
you would need to buy new server hardware to host it on. Today however, there
are many different virtualization technologies to choose from, each allowing
you to have more than one...]]></description>
            <content:encoded><![CDATA[<p>It used to be the case that when you wanted to deploy a new application
you would need to buy new server hardware to host it on. Today however, there
are many different virtualization technologies to choose from, each allowing
you to have more than one virtual server per physical machine. Virtualization
has a number of benefits &#8212; lower cost, power, space, and cooling. Of course,
you need to have a machine powerful enough, but many services, especially
internal ones such as company wikis and instant messaging servers, do not
require the full resources of a physical server, and it makes sense to combine
these using virtualization.</p>

<p>In many cases, web applications can be combined on a single server using
virtual hosting facilities in Apache, but this is an imperfect solution.
Inevitably the situation arises where you have an application that doesn&#8217;t
play well in a virtual hosting situation, be it badly written, or requiring
specific versions of libraries or modules that conflict with another
application. There are also administrative concerns &#8212; anybody who has access
to one application has access to them all.</p>

<p>The virtual hosting method also eliminates one of the biggest benefits of
using virtualization on entire servers. Many virtualization technologies
provide some method of transferring a virtual machine between physical
hardware &#8212; if a particular server is behaving badly, just transfer all the
virtual machines onto replacement hardware with little to no loss of service
and without having to reinstall the operating system/applications.</p>

<p>Here at OmniTI many of our servers run Solaris, giving us two very
powerful features on which we heavily rely when it comes to making
use of virtualization: Solaris containers (Zones), and ZFS.</p>

<h2>Virtualization using Zones</h2>

<p><a href="http://www.sun.com/bigadmin/content/zones/"><span>Zones</span></a> provide
lightweight virtualization for Solaris. Unlike many other virtualization
solutions such as VMware or VirtualBox, Solaris zones don&#8217;t emulate physical
hardware on which several complete operating systems run; rather, there is
one kernel running in the system with multiple partitions (the zones) in
which user programs run.</p>

<p>This type of virtualization doesn&#8217;t force you to pick a set amount of RAM
for each virtual machine, or set up virtual disk images (although <a href="http://www.opensolaris.org/os/community/zones/faq/#rm"><span>resource
    limits</span></a> can be set for each zone). Because there is no hardware
emulation going on, zones are also incredibly <em>fast</em> &#8212; fast enough
that we are able to run multiple production services on a single machine
without any perceptible slowdown. Even for high traffic sites that can
saturate an entire (physical) server, we are still able to make use of zones
(with just one non-global zone per server) without any significant
performance hit. This allows us to benefit from the ease of moving a zone from
one machine to another, either in the event of hardware failure, or to migrate
to a more powerful machine.</p>

<p>Zones can also ease administration of multiple servers by centralizing
package management. By default, any package installed on the global zone is
automatically installed to all non-global zones. You can also specify that
certain paths are inherited from the global zone, reducing disk space
requirements per zone. The inherited paths become read only, forcing them to
be the same across all zones. If all packages are installed
from the global zone, and you make use of inherited paths, then you can be
assured that every zone has the same software configuration.</p>

<p>However, this doesn&#8217;t have to be the case &#8212; you also have the option of
installing packages in the zones themselves if different zones need different
packages installed. To do this, don&#8217;t inherit any directories. This creates a
'large' or 'whole root' zone, and you are free to install whatever is needed
inside the zone itself.</p>

<h2>ZFS and Zones</h2>

<p>ZFS has <a href="http://opensolaris.org/os/community/zfs/whatis/"><span>many
    useful features</span></a> that put it far ahead of most other filesystems that
are available.  Several of them are of particular interest in that they make virtualization better: a pooled storage model, snapshots,
and the ability to transfer filesystems via the <code>zfs send</code>
command.</p>

<p>Pooled storage does away with the idea of having filesystems on individual
partitions, and having to guess how much space will be occupied by individual
filesystems. You just create one pool across the entire disk (or set of disks)
that you want to store your data on. Any filesystems you then create in that
pool will only use up as much space as needed to hold the data.</p>

<p>In practice, this means we can create individual filesystems for each of our
zones without having to worry about how much space to assign to each. Having
each zone on its own filesystem is required to be able to snapshot, backup,
restore and transfer zones individually.</p>

<p>Snapshots give you almost instant point-in-time copies of your filesystem,
each of which takes up only enough space to hold what has changed since the
snapshot was taken. The benefits of this are numerous including the ability to roll
back to an earlier time and consistent backups (take a backup from the snapshot,
and you won&#8217;t have files being modified while the backup is in progress). From
the point of view of virtualization however, one of the biggest benefits of
snapshots is in combination with <code>zfs send</code>.</p>

<p>The <code>zfs send</code> command allows you to send a snapshot of a ZFS
filesystem from one machine to another (or on the same machine, if you so
desire):</p>

<pre><samp># zfs send data/zones/myzone@somesnapshot | \
    ssh remote_machine zfs receive data/zones/myzone
</samp></pre>

<p>This allows you to quickly move (or copy) a zone from one machine to
another: detach your zone, zfs send the filesystem to another machine, attach
the zone, and you have your zone up and running on a completely different
machine.</p>

<p>You can also make use of incremental snapshots to minimize the amount of
time the zone is down (a zone has to be halted in order to detach it):
snapshot the zone&#8217;s filesystem and send it across while the zone is still
running, shut the zone down, detach it, snapshot the zone&#8217;s filesystem once
more and send the incremental snapshot across.</p>
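
<p>As a rough sketch, the sequence might look like the following. It is written in Python only to lay the commands out in order; the snapshot names <code>@premove</code> and <code>@final</code> and the function name are assumptions made for illustration, and the commands are printed rather than executed.</p>

```python
def migration_commands(zone, dataset, remote):
    """Return the shell commands for a low-downtime zone move, in order."""
    first = f"{dataset}@premove"  # hypothetical snapshot names
    final = f"{dataset}@final"
    return [
        # Bulk copy while the zone is still running.
        f"zfs snapshot {first}",
        f"zfs send {first} | ssh {remote} zfs receive {dataset}",
        # Downtime starts here.
        f"zoneadm -z {zone} halt",
        f"zoneadm -z {zone} detach",
        # Only the blocks changed since the first snapshot are sent.
        f"zfs snapshot {final}",
        f"zfs send -i {first} {final} | ssh {remote} zfs receive {dataset}",
    ]


for command in migration_commands("myzone", "data/zones/myzone", "remote_machine"):
    print(command)
```

<p>If the copy on the receiving side has been touched since the first transfer, the incremental receive may be refused; <code>zfs receive -F</code> rolls the target back to its latest snapshot first, at the cost of discarding those changes.</p>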

<p>Until recently there were <a href="http://www.opensolaris.org/os/community/zones/faq/#cfg_zfsboot"><span>issues
    with upgrading zones that live on a ZFS filesystem</span></a>, but this has been
fixed in Solaris 10 10/08 (u6), and Live Upgrade is now supported. There is now
little reason not to use ZFS as the filesystem for zones.</p>

<h2>Backing it all up with <a href="https://labs.omniti.com/trac/zetaback"><span>Zetaback</span></a></h2>

<p>It doesn&#8217;t take much thought to realize that the snapshot/zfs send tools
can also be used to take backups of systems, especially when you make use of
incremental snapshots. At OmniTI we have developed <a href="https://labs.omniti.com/trac/zetaback"><span>Zetaback</span></a>, a backup tool
based on zfs that automates much of the work of taking and managing backups.</p>

<p>With Zetaback, you specify a list of hosts, the retention policy, and how
often to take a full/incremental backup. Then you just let it go.  It connects
to each host via ssh, scans the host for filesystems to back up, and by
default will back up everything, automatically picking up new filesystems. You
can filter the list using regular expressions if you want to limit what is
backed up.</p>

<p>In addition to taking backups themselves, Zetaback provides tools to quickly restore zfs filesystems, view the status
of backups and generate reports showing which filesystems violate the backup
policy (e.g. those that have not had a successful backup in 1 week).</p>
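
<p>The policy check behind such a report is simple to picture. The following Python sketch is not Zetaback&#8217;s code or interface; the function name, the data layout and the filesystem names are all assumptions made for illustration.</p>

```python
from datetime import datetime, timedelta


def policy_violations(last_success, now, max_age=timedelta(weeks=1)):
    """Return filesystems whose last successful backup is older than max_age.

    `last_success` maps a filesystem name to the datetime of its last good
    backup, or None if it has never been backed up.
    """
    overdue = []
    for filesystem, finished in sorted(last_success.items()):
        if finished is None or now - finished > max_age:
            overdue.append(filesystem)
    return overdue


now = datetime(2009, 4, 10, 12, 0)
history = {
    "data/zones/wiki": now - timedelta(days=1),
    "data/zones/mail": now - timedelta(days=9),
    "data/zones/new": None,  # picked up automatically, never backed up yet
}
print(policy_violations(history, now))  # ['data/zones/mail', 'data/zones/new']
```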

<p>The choice to make use of virtualization is often an easy one; the choice
of which solution to go with is somewhat harder. If Solaris meets the needs of
your applications, then it is worth considering Zones. Combined with the
features of ZFS and Zetaback, they provide a flexible and powerful solution.</p>

<h2>Some real-world numbers</h2>

<p>We&#8217;re a web infrastructure and development shop, so we run a lot of development servers.  Each environment needs the flexibility of its own software selection including version.  To accommodate that, we run 37 zones on 2 development servers.  Each development server has 8GB of RAM and two dual-core 64-bit AMD processors &#8212; in financial terms: about $2300 each.  Our production boxes, which serve corporate mail, document management, version control, instant messaging, directory services, etc., all run in zones also.  For that we have two boxes (just like the development ones) on which 17 zones happily reside.  All of our important services run on a rather small set of machines&#8202;&#8212;&#8202;easy to manage, cheap to power and cool.  And for our purposes, it is far more efficient than heavy-weight virtualization like VMware ESX.</p>

<p>We&#8217;ve been running this type of light-weight virtualization for over two years now.  We&#8217;re pretty happy with it.  I suggest you give it a whirl.</p>]]></content:encoded>
            <pubDate>Fri, 10 Apr 2009 14:39:27 GMT</pubDate>
        </item>
        <item>
            <title>ORMs Done Right</title>
            <link>http://omniti.com/seeds/orms-done-right</link>
            <guid>http://omniti.com/seeds/orms-done-right</guid>
            <description><![CDATA[Object-Relational Mapper (ORM) systems are one of the most contentious topics in database application development.   Creating an ORM is notoriously perilous, but using them has pitfalls as well.  Most ORMs provide little protection against misuse; the ...]]></description>
            <content:encoded><![CDATA[<p>Object-Relational Mapper (ORM) systems are one of the most contentious topics in database application development.   Creating an ORM is <a href="http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx"><span>notoriously perilous</span></a>, but using them has pitfalls as well.  Most ORMs provide little protection against misuse; the inexperienced developer can easily create an application that unilaterally imposes awkward database design constraints, hammers the database with innumerable queries, and is very difficult to optimize.</p>

<p>ORMs provide an automated link between the application object model and the database model.  Practically speaking, this generally means that tables become classes and rows become objects.  At the most basic level, this provides per-object persistence services.  Most ORMs also handle relationships between objects, turning compositional relationships between objects into foreign key relationships in the database.</p>

<p>In this entry, we present the benefits and pitfalls of ORMs and introduce a new Perl ORM implementation, <code>Class::ReluctantORM</code>.  <code>Class::ReluctantORM</code> is &#8220;a reluctant ORM for reluctant people.&#8221;  Its design goals are to create a framework that is unambitious, scalable, and easily circumvented.  These goals are not so much technological as philosophical; an approach that has value, as we shall see.</p>


<h2>Benefits of ORMs</h2>

<h3>Why developers love them</h3>
<p>Developers see a tremendous productivity gain: persistence no longer has to be hand-crafted into each class.  Since all model classes are using the same techniques, there is a large consistency gain as well.  Developers also get to keep their head in application-model space, without having to shift gears into database-model space.   Context-switching is expensive, and switching between implementation languages can be especially jarring.</p>

<p>Consider getting a list of pirates on a ship:</p>

<pre><code>  use DBI;
  my $dbh = DBI->connect(&#8230;);
  my $sth = $dbh->prepare(&lt;&lt;EOSQL);
  SELECT p.* 
    FROM highseas.pirates p
      INNER JOIN highseas.ships s ON s.ship_id = p.ship_id
    WHERE s.name = ?
EOSQL
  $sth->execute('Golden Hind');
  my @pirates;
  while (my $row = $sth->fetchrow_hashref()) {
     push @pirates, Pirate->new($row);
  }
  $sth->finish();
  foreach my $pirate (@pirates) {
     # Do something with $pirate
  }
</code></pre>

<p>Compared to:</p>
<pre><code>  my $ship = Ship->fetch_by_name('Golden Hind');
  my @pirates = Pirate->search_by_ship($ship);
  foreach my $pirate (@pirates) {
     # Do something with $pirate
  }
</code></pre>

<p>The second example is much more legible, and remains entirely in the application model&#8201;&#8212;&#8201;you don&#8217;t have to think about how the database is set up, or how the tables interact.  You don&#8217;t even need to know SQL.</p>

<p>Additionally, most modern ORMs can shield the business logic from limited changes in the database schema (such as table or column renames).  While this sort of change is usually better hidden at the database logical layer using a view, organizations that have restrictive database change policies may appreciate the added flexibility.</p>

<h3>Why leads love them</h3>

<p>Team leads find ORMs to be very useful for several reasons.  The most obvious is the reduced amount of time spent wiring up persistence layers; instead, developers can stay focused on the business problems.  This enables new capabilities. For example, using an ORM, it&#8217;s much easier to knock out a quick prototype in response to an <abbr title="request for proposals">RFP</abbr>, or explore an alternative design.  Additionally, since the amount of SQL is dramatically reduced, developers need not have SQL skills to be productive.</p>

<h3>Why project managers love them</h3>

<p>Project managers like ORMs for many of the same reasons that team leads do.  Because productivity is increased, bids can be lower, or more features can be delivered for the same schedule.  This may lead to more contracts.  The reduced skillset needs of an ORM-based project can also help solve staffing problems.</p>

<h2>Pitfalls</h2>

<h3>DBA Gripe #1: ORMs that dictate DB design</h3>

<p>Some ORMs dictate database design.  These constraints typically center around keys.  Commonly, primary keys are required to be single-column, integer, and auto-incrementing.  Foreign keys are often under the same constraints.  This leads to the proliferation of artificial keys.</p>

<p>Naming conventions are another sore point.  Some ORMs require primary keys to be named <code>id</code>, others require them to be named <code>pirate_id</code>, or even <code>pirate</code>.  Tables may be required to be named in the plural, and one ORM&#8217;s notion of pluralization may not match that of another ORM (e.g. <code>staff</code> vs <code>staffs</code> vs <code>staves</code>).</p>

<p>These constraints are annoying but tolerable if the database design is new and the ORM-based application is the only client.  But in most real-world situations, the ORM-based application is only one consumer of the database.  The schema may be a pre-existing design with several legacy apps already using it.  It is possible to use views and rules to appease the ORM&#8217;s requirements, but that trades the developer&#8217;s productivity gain for a busywork task for the DBA.</p>

<p>Finally, there is the issue of schema ownership.  Both the ORM and the DB know about the database structure.  When a change is needed, where do you make the change?  Some ORMs &#8220;own&#8221; the schema, and will execute DDL to modify the database to match changes in the object model.  Others don&#8217;t own the schema but instead mirror the database schema in a configuration file.  Better ORMs read the database at startup, and configure the object model accordingly (though this has problems of its own, especially related to startup speed).</p>

<h4><code>Class::ReluctantORM</code>&#8201;&#8212;&#8201;stay agnostic</h4>

<p><code>Class::ReluctantORM</code> is firmly in the &#8220;read the schema from the database on startup&#8221; camp.  Some configuration is still needed&#8201;&#8212;&#8201;to set up connection handles, declare classes, and to create relationships that cannot be auto-detected.  Pushing more of the configuration into auto-configurators, while maintaining overridability, is an active area of development.</p>


<h3>DBA Gripe #2: Opaque, Baroque Query generators</h3>

<p>An ORM, by its very nature, must contain some kind of query generation mechanism.  SQL is an easy language to generate, but a very difficult language to generate well.  There are many dialects.  An ORM may choose to generate standards-compliant (but slow) queries, or it may attempt to optimize for the particular database engine.  As the optimization increases, the query complexity often increases.  Some ORMs choose to punt, generating many simple queries (see DBA gripe #3); others may generate one massive, multi-<code>JOIN</code> query. In either case, at some point you will get the classic complaint that &#8220;the database is slow.&#8221; A DBA wants to be able to tune and replace these queries with hand-crafted versions.  This may or may not be possible.  Even if it is, the query generator is buried in the ORM code itself, in developer-land, and often requires both developer and DBA to invest time to optimize a query.</p>

<h4><code>Class::ReluctantORM</code>&#8201;&#8212;&#8201;query monitors</h4>

<p>It is important that the SQL generation process be as transparent as possible.  To this end, <code>Class::ReluctantORM</code> provides a unique monitoring facility that provides hooks for several key events in the life of a query, including initiation, SQL generation, execution with bound parameters, result fetching, and teardown of the query.</p>

<p>Monitors may execute arbitrary Perl code at any or all of the events.  A monitor may abort a query if needed, or simply log statistics or debugging data.  Monitors may be attached at compile time or runtime, and may be attached to a particular class, or all classes in the model.</p>

<p><code>Class::ReluctantORM</code> ships with six canned monitors, including those for join count, column count, data volume, timing, diagnostic, and one which executes the query under <code>EXPLAIN ANALYZE</code> to predict performance.  The developer is free to add new monitors.</p>
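
<p>To make the idea concrete, here is a minimal sketch of the monitor pattern. It is written in Python rather than Perl and does not reproduce <code>Class::ReluctantORM</code>&#8217;s actual interface: the class names, the hook names <code>on_sql_generated</code> and <code>on_finish</code>, and the <code>run_query</code> stand-in are all hypothetical.</p>

```python
class JoinCountMonitor:
    """Aborts any query whose generated SQL has more than `limit` JOINs."""

    def __init__(self, limit):
        self.limit = limit

    def on_sql_generated(self, sql):
        joins = sql.upper().split().count("JOIN")
        if joins > self.limit:
            raise RuntimeError(f"{joins} JOINs exceeds limit of {self.limit}")


class LoggingMonitor:
    """Records every event it is told about."""

    def __init__(self):
        self.events = []

    def on_sql_generated(self, sql):
        self.events.append(("sql", sql))

    def on_finish(self, row_count):
        self.events.append(("finish", row_count))


def run_query(sql, rows, monitors):
    """Stand-in for a query lifecycle: notify monitors, then return canned rows."""
    for event, payload in (("on_sql_generated", sql), ("on_finish", len(rows))):
        for monitor in monitors:
            hook = getattr(monitor, event, None)  # monitors may skip events
            if hook:
                hook(payload)
    return rows


log = LoggingMonitor()
sql = "SELECT p.* FROM pirates p INNER JOIN ships s ON s.ship_id = p.ship_id"
rows = run_query(sql, [{"name": "Wesley"}], [JoinCountMonitor(limit=2), log])
print(log.events[-1])  # ('finish', 1)
```

<p>Because each monitor only implements the hooks it cares about, a statistics logger and a query-aborting guard can be attached side by side without knowing about each other.</p>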


<h3>DBA Gripe #3: hidden expensive actions</h3>

<p>Consider this expression:</p>

<pre><code>  my $jewels = $ship->pirates->first->hideaways->find_by_name('Skull Island')->treasures->first->jewel_count();
</code></pre>

<p>While it won&#8217;t win any awards for formatting, it is fairly clear: get the number of jewels that the first pirate on my ship has stashed away on Skull Island.  It&#8217;s easy to imagine a junior programmer writing this, or a journeyman programmer writing a less contrived example.</p>

<p>Does this code, at first glance, look like it is hammering the database?  How many database queries will this result in?  Depending on the ORM, it may range from 1 to 6. And depending on the database, the 1 query might be better or worse than the 6. In almost every case, the queries involved will pull back more information than they need from the database, so even when the ORM gets the queries right, it&#8217;s still likely to have unnecessary overhead.</p>

<p>Or this common case:</p>

<pre><code>  foreach my $ship (@fleet) {
     foreach my $pirate ($ship->pirates()) {
        foreach my $hideaway ($pirate->hideaways()) {
           foreach my $loot ($hideaway->treasures()) {
              tithe_to_queen($loot);
           }
        }
     }
  }
</code></pre>

<p>That should have scared you.</p>
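
<p>A back-of-the-envelope count shows why. The Python sketch below assumes that <code>@fleet</code> is already loaded and that every accessor call issues exactly one query; the fleet sizes are made up for illustration.</p>

```python
def lazy_query_count(ships, pirates_per_ship, hideaways_per_pirate):
    """Queries issued by the nested loops if every accessor hits the database."""
    pirates = ships * pirates_per_ship
    hideaways = pirates * hideaways_per_pirate
    # One pirates() call per ship, one hideaways() call per pirate,
    # and one treasures() call per hideaway.
    return ships + pirates + hideaways


print(lazy_query_count(10, 20, 3))  # 810
```

<p>Ten ships is a small fleet, yet the loops already cost 810 round trips to the database, and the count grows with the product of the collection sizes.</p>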

<h4><code>Class::ReluctantORM</code>&#8201;&#8212;&#8201;mandatory prefetching</h4>

<p>Looking back at this example:</p>

<pre><code>  # 1-6 queries
  my $jewels = $ship->pirates
                    ->first
                    ->hideaways
                    ->find_by_name('Skull Island')
                    ->treasures
                    ->first
                    ->jewel_count();
</code></pre>

<p>This usage is problematic because:</p>
<ul>
  <li>There is no indication that queries are occurring.</li>
  <li>Any performance issues will be detected in production, not in development.</li>
</ul>

<p><code>Class::ReluctantORM</code> does not allow accessors to directly execute queries.  Instead, each accessor looks for a cached value, and returns it if found.  A cache miss throws a <code>FetchRequired</code> exception.  A full-featured prefetching facility is available:</p>

<pre><code>  # One query
  my $ship = Ship->fetch_deep(
    where => 'name' => 'Golden Hind',
    with => {
      pirates => {
        hideaways => {
          treasures => {}
        }
      }
    }
  );
  # Zero queries
  my $jewels = $ship->pirates
                    ->first
                    ->hideaways
                    ->find_by_name('Skull Island')
                    ->treasures
                    ->first
                    ->jewel_count();
</code></pre>

<p>This <code>fetch_deep</code> call executes exactly one SQL <code>SELECT</code> statement, <code>JOIN</code>ing against the related tables.  The results are then processed to create one ship object, which has a collection of pirates, each of which has a collection of hideaways, each of which has a collection of treasures.  This data is now prefetched, and a long, deep chain of method calls like the one above is now permissible.</p>

<p>Importantly, if a programmer adds a method call (say, <code>$pirate->parrots</code>) that is not prefetched, an exception will be thrown the first time it is executed.  The developer will see this immediately in testing, and add the required clause to the prefetch.  This integrates scalability directly into the development process.  This feature, unique to <code>Class::ReluctantORM</code>, is what provides its name: it is <em>reluctant</em> to do database fetches.</p>
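
<p>The mechanism can be sketched in a few lines. This is a Python illustration of the idea only, not <code>Class::ReluctantORM</code>&#8217;s implementation; the <code>Record</code> class is hypothetical, and only the <code>FetchRequired</code> name comes from the text above.</p>

```python
class FetchRequired(Exception):
    """Raised when a relationship is read before it has been prefetched."""


class Record:
    def __init__(self, **prefetched):
        # Only relationships passed in here count as fetched.
        self._cache = dict(prefetched)

    def __getattr__(self, name):
        # Python calls this only when normal attribute lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            return self._cache[name]
        except KeyError:
            # Never fall back to a query: make the cache miss loud instead.
            raise FetchRequired(f"'{name}' was not prefetched") from None


pirate = Record(hideaways=["Skull Island"])
print(pirate.hideaways)  # ['Skull Island']
try:
    pirate.parrots
except FetchRequired as error:
    print(error)  # 'parrots' was not prefetched
```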


<h3>Software engineering: impedance mismatch rabbit hole</h3>

<p>We have a fundamental problem with ORMs: relations aren&#8217;t classes, and tuples aren&#8217;t objects.  This problem is called the &#8220;impedance mismatch&#8221; between the database model and the application model, and is discussed in detail in several places on the Internet.  Some of the more troubling issues include:</p>

<ul>
  <li>Identity&#8201;&#8212;&#8201;Two objects referring to the same record are distinct, though there is only one record.</li>
  <li>Partial fetches&#8201;&#8212;&#8201;Most OOP languages do not have a notion of an object that is only partially populated, but it is perfectly valid (and desirable) to select only a subset of columns from a table.</li>
  <li>Inheritance&#8201;&#8212;&#8201;There is no clear analogue of inheritance in the database world.  Several <a href="http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx"><span>approaches exist</span></a>, but they all have severe drawbacks.</li>
  <li>Caching&#8201;&#8212;&#8201;The object model will be out of date as soon as it leaves the database.  Should results be cached?  How long?</li>
</ul>

<p>ORM developers are faced with a few nasty choices.  Keep it simple, and let the user of the ORM know that the ORM is an unsynchronized, approximate model of the database.  Or gradually add complexity, attempting to patch over the impedance mismatch.  The latter path gets into diminishing returns quickly.</p>

<h4><code>Class::ReluctantORM</code>&#8201;&#8212;&#8201;90% rule</h4>

<p>Some ambitious ORMs try to solve 100% of the object-database problem.  <code>Class::ReluctantORM</code> tries to solve the easiest 90%.  That means it makes the choice that the impedance mismatch is a very hard problem, and the ORM will do its best, but you still need to be aware of its limitations.  This scope limit helps exclude features that would dramatically increase complexity (for example, there is very little support for aggregates).</p>


<h3>Skill atrophy</h3>

<p>One of the big advantages of ORMs is also a major disadvantage: no, or little, use of SQL. We learn skills through exposure and experience, and if we are never exposed to SQL, we&#8217;ll never learn it.  Or, if we know some SQL, then use ORMs <em>exclusively</em>, our skills will likely atrophy.  In almost every application, the ORM will need to be bypassed at some point, and then SQL skills will be sorely missed.</p>

<h4><code>Class::ReluctantORM</code>&#8201;&#8212;&#8201;SQL pass-thru</h4>

<p>For the remaining 10% of problems outside the scope of <code>Class::ReluctantORM</code>, several avenues are provided to bypass the query generator and use SQL directly.  Because it was developed in a shop with a heavy mistrust of ORMs, <code>Class::ReluctantORM</code> is designed to make this bypass as easy as possible.  The documentation explains early on how to bypass the ORM.</p>

<p>Avenues of SQL support, ranging from SQL-centric to object-centric:</p>
<ol>
  <li>Ask the ORM-managed object or class for a database handle, and execute statements on it.  Results are in raw values, not part of the object model.</li>
  <li>As above, but wrap this into a method call on an ORM object or class, thus integrating SQL into the object model.  This is handy for aggregate functions.</li>
  <li>Future releases aim to provide the ability to override specific ORM-generated queries with your own SQL.</li>
  <li>Ask <code>Class::ReluctantORM</code> to interpret the SQL into its own representation, and execute.  If the translation was successful, return values will be ORM-based objects.  This is a <code>Class::ReluctantORM</code>-exclusive feature.</li>
  <li>Write a query directly using <code>Class::ReluctantORM</code>&#8217;s abstract SQL engine.  You&#8217;re no longer writing SQL directly, but performing method calls on <code>FromClause</code> objects, for example.  This is guaranteed to return ORM objects.</li>
  <li>Use ORM methods and pass SQL fragments as arguments (e.g., a <code>WHERE</code> clause for a <code>search()</code> method).</li>
</ol>

<p>In all six cases, the developer must use SQL or SQL concepts.  This may help reduce SQL atrophy.  In many cases, because the SQL can be just &#8220;dropped in,&#8221; you can have a DBA or SQL expert develop SQL for a specific query with no contact with the ORM.</p>
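
<p>As an illustration of the second avenue (raw SQL wrapped in a method on a model class), here is a self-contained Python sketch using the standard <code>sqlite3</code> module. The <code>Ship</code> class, its <code>dbh</code> and <code>pirate_counts</code> methods, and the table layout are hypothetical and are not <code>Class::ReluctantORM</code>&#8217;s interface.</p>

```python
import sqlite3


class Ship:
    """Hypothetical model class that exposes its database handle."""

    connection = sqlite3.connect(":memory:")

    @classmethod
    def dbh(cls):
        return cls.connection

    @classmethod
    def pirate_counts(cls):
        """Hand-written aggregate SQL, exposed as part of the object model."""
        sql = """
            SELECT s.name, COUNT(p.pirate_id)
              FROM ships s
              LEFT JOIN pirates p ON p.ship_id = s.ship_id
             GROUP BY s.name
             ORDER BY s.name
        """
        return dict(cls.dbh().execute(sql).fetchall())


db = Ship.dbh()
db.executescript("""
    CREATE TABLE ships (ship_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE pirates (pirate_id INTEGER PRIMARY KEY, ship_id INTEGER, name TEXT);
    INSERT INTO ships VALUES (1, 'Golden Hind'), (2, 'Revenge');
    INSERT INTO pirates VALUES (1, 1, 'Anne'), (2, 1, 'Mary');
""")
print(Ship.pirate_counts())  # {'Golden Hind': 2, 'Revenge': 0}
```

<p>Callers see an ordinary method, while the query itself stays plain SQL that a DBA can read and tune without touching the rest of the model.</p>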


<h3>Framework lock-in</h3>

<p>As with any framework, adopting an ORM is often irreversible.  It is very difficult to adapt an application to use a different ORM&#8201;&#8212;&#8201;even if the interfaces are similar, there will often be differences among the query specifications, the DB requirements, or the semantics of operations (e.g. do inserts cascade?).  ORMs are especially susceptible to lock-in because their footprint is so ubiquitous in the application code.  Every time you deal with a relationship between your model objects, you interact with the ORM.</p>

<h4><code>Class::ReluctantORM</code>&#8201;&#8212;&#8201;unsurprising interface</h4>

<p>While there is little that can be done to fight lock-in, <code>Class::ReluctantORM</code> tries to reduce the pain of switching to another ORM, or dropping ORM support altogether, by using common conventions for method names (accessors are named directly after the property, for example).  When a new feature is added, the interfaces of other ORMs are studied, and similar conventions are adopted if possible.</p>


<h3>Sometimes it&#8217;s the wrong tool</h3>

<p>ORMs are not good for everything.  ORMs by their nature are weaker at these tasks:</p>

<ul>
  <li>Reporting and summarization&#8201;&#8212;&#8201;ORMs are good at treating rows as objects.  What happens when a column is an aggregate?  In this case, raw SQL is much more convenient.  Aggregate APIs are often inflexible and complex.</li>
  <li>Anything involving fast startup&#8201;&#8212;&#8201;If the ORM queries the database for its schema at startup, there will be a lag before the ORM is ready.  This isn&#8217;t a problem for long-running processes like web servers, but it can be a burden for command-line scripts.</li>
</ul>

<h4><code>Class::ReluctantORM</code>&#8201;&#8212;&#8201;lots of fish in the sea</h4>

<p>If an ORM isn&#8217;t right for your project, <code>Class::ReluctantORM</code> won&#8217;t help you.  Even if an ORM is a good fit, keep in mind it is a slow-startup ORM.  There are others that are fast-startup, large-configuration ORMs, and even some that can cache their configuration.</p>

<h2>Conclusion</h2>

<p>For all their pitfalls, the tremendous productivity advantages of ORMs will continue to tempt developers to use them.  Like any productivity booster, ORMs seem to draw a lot of hype, and it&#8217;s important to see through the hype to the realities and shortcomings of the technology.  Once those shortcomings have been addressed, however, ORMs can be used conscientiously.</p>

<p><code>Class::ReluctantORM</code> is a new ORM implementation that seeks to make it harder to fall into the traps.  It avoids some &#8220;impedance mismatch&#8221; issues by narrowing its scope to the most common 90% of use cases.  For the more complex situations, numerous SQL bypass avenues are available.  Whether queries are ORM-generated or customized, they all pass through the query monitoring system, providing an early warning system for scalability problems.  Finally, mandatory prefetching can reduce bad coding practices early in the development cycle.</p>]]></content:encoded>
            <pubDate>Wed, 18 Mar 2009 14:49:49 GMT</pubDate>
        </item>
        <item>
            <title>Under the Hood</title>
            <link>http://omniti.com/seeds/under-the-hood</link>
            <guid>http://omniti.com/seeds/under-the-hood</guid>
            <description><![CDATA[My perspective on the evolution of OmniTI is somewhat like that of a mechanic on a team of race car designers. As the company changes and becomes more sophisticated, my job has been and still is to ensure we have all the necessary parts to accommodate ...]]></description>
            <content:encoded><![CDATA[<p>My perspective on the evolution of OmniTI is somewhat like that of a mechanic on a team of race car designers. As the company changes and becomes more sophisticated, my job has been and still is to ensure we have all the necessary parts to accommodate those changes and that they are incorporated into the new design. So while the techies are customizing the machine (the glamorous part of the job), I am busy working under the hood. So what changes have taken place over the years to keep the OmniTI racing machine on the track and way out in front?</p>

<p>In the beginning our headquarters was located in Theo&#8217;s house. It was very nice but small. We built out a secure data room there in order to satisfy the security requirements of one of our clients. It had plexiglass windows, rack space and a 200-pound metal door. We personally did the build-out, which was an onerous job, to say the least! It seems mind-boggling to think that we now manage thousands of machines in datacenters all around the globe. From that office we upgraded to a suite of three executive offices in Calverton, Maryland. While staying there we located unfinished office space in Columbia and designed it to accommodate our particular work requirements. We were growing fast and needed office space for our additional staff as well as for meeting with clients.</p>

<p>We started with two people and within three years grew to a staff of four. Our clients required services and support 24/7 so the days were long, as were the nights. Holidays, weekends and vacations were commonly workdays, and working into the wee hours of the morning after a full workday was the norm. The work scaled with the increase of staff from 2 to 4 people, so the workload remained the same across the board. Then we reached a point where we were in complete overload. At that point the engine needed overhauling and refitting to stay in the race. So we interviewed and hired 4 new staff - 3 developers and 1 system administrator. This was a major redesign.</p>

<p>The addition of new staff meant we had to make some serious changes to the shop. We needed an HR department and benefits package that would be both attractive and competitive with other companies&#8202;&#8212;&#8202;yet another upgrade. We also had to have a more comprehensive employment contract and that meant having a labor lawyer to advise us. This was in addition to the corporate counsel who helped us craft our client contracts. The pit crew was growing!</p>

<p>After the move to the new site in Columbia, we immediately increased the size of the team. Before we knew it we were fifteen strong and still growing. During this time we formally defined our product initiatives Ecelerity, Postal Engine and MultiVIP and then proceeded to trademark them. We also began to do business as a separate entity called Message Systems. Now we had two race cars on the track!</p>

<p>As the staff grew we began to understand how instrumental our culture was to our success and started to truly nurture it. This was and remains a unique work atmosphere that permeates the operations of the entire workforce. What exactly is the OmniTI culture? Ask ten people on staff and you probably will get ten different answers. I, on the other hand, have been part of that culture from the get-go and can tell you that, however it is defined, it is the heart and soul of OmniTI and the fuel that feeds the machine. The culture is derived from work principles that were instilled at OmniTI&#8217;s inception. These include providing quality services to meet clients&#8217; needs as the number one priority; standing behind our work and being accountable for our mistakes; being passionate about our work; and using mindshare and brainstorming as working tools. Our office is designed specifically to enable and encourage this sort of interactive environment. As a result, the OmniTI culture has attracted some of the best and the brightest in the industry.</p>

<p>After the Columbia office came a second larger Columbia office, and an office down under the Manhattan Bridge (DUMBO) in Brooklyn, New York. We currently have a staff of 7 working in that office.</p>

<p>What about the crew, you may ask? Over the years we have been fortunate to have a diverse staff representing a collection of ethnic and cultural backgrounds for which we are all richer. And our team has had many sponsors including trade associations, private industry, not-for-profits, political organizations, and government, to name a few. As we zoom around the race track we also take time for the occasional pit stop by having pizza every Thursday, spring cookouts (with serious volleyball games), summer picnics and awesome holiday parties!</p>

<p>So as our race cars become more and more sophisticated they continue to require constant attention to maintain all the working parts. Each day brings new adjustments to the engines and frameworks to keep the motors fine-tuned and the goings smooth.</p>

<p>I&#8217;ll take this opportunity to share some helpful hints with the other mechanics out there:</p>

<ol>
<li>Stay organized and, despite this digital age, keep hard copies of everything.</li>
<li>Retain legal counsel that understands your business and earns your trust.</li>
<li>Set deadlines for everything.</li>
<li>If you are going to do something, always take the time to understand how to do it right. If you don&#8217;t have time to execute it right, take the time to document the corners you cut and how that is likely to bite you later.</li>
</ol>

<p>Zoom-zoom!</p>]]></content:encoded>
            <pubDate>Thu, 12 Mar 2009 01:48:03 GMT</pubDate>
        </item>
        <item>
            <title>Stacking the Deck for Publishers</title>
            <link>http://omniti.com/seeds/stacking-the-deck-for-publishers</link>
            <guid>http://omniti.com/seeds/stacking-the-deck-for-publishers</guid>
            <description><![CDATA[

Newspapers and magazines have a unique opportunity with online publishing. They have the best content. They have the most talented writers, editors, and managers. The industry has survived everything the world has thrown at it since newspapers first ...]]></description>
            <content:encoded><![CDATA[<img alt="Daily Planet" src="http://images.omniti.net/omniti.com/i/b/458-daily-planet.jpg"  />

<p class="first">Newspapers and magazines have a unique opportunity with online publishing. They have the best content. They have the most talented writers, editors, and managers. The industry has survived everything the world has thrown at it since newspapers first emerged in the 16th century. But, making the most of the Web requires specialist help. Whether publications have a dedicated digital team, or integrate print and web together, there is a distinct difference in how editorial, design, and production need to operate. On the one hand, the recent <a href="http://www.thestandard.com/news/2009/01/27/gatehouse-claims-victory-even-its-online-reputation-suffers"><span>furor about &#8220;deep linking&#8221;</span></a>&#8201;&#8212;&#8201;where the legality of sites linking to individual pages was finally put to rest&#8201;&#8212;&#8201;showed the fragility of trying to apply standards that are reasonable in print to the Web. On the other hand <em><a href="http://telepgraph.co.uk/"><span>The Telegraph</span></a></em> has shown with its in-house video and audio studios, evolutionary <a href="http://www.flickr.com/photos/lloyd-davis/425838238/"><span>news room</span></a> and <a href="http://advertising.telegraph.co.uk/"><span>integrated advertising solution</span></a>, that a <a href="http://www.journalism.co.uk/5/articles/531141.php"><span>radical content strategy</span></a> can <a href="http://www.journalism.co.uk/2/articles/531631.php"><span>pay</span></a> <a href="http://blogs.journalism.co.uk/editors/2007/10/04/telegraph-wins-top-aop-award-guardian-wins-three-others/"><span>dividends</span></a>. The whole approach for print media on the Web is evolving. It&#8217;s definitely a brave new world, but not everything has changed: content is still king!</p>

<h2>Doom, gloom, flourish!</h2>

<p>As some commentators, fueled by the news of falling readerships and economic woes, prematurely sound the death knell for print, <em><a href="http://newyorker.com/"><span>The New Yorker</span></a></em> published a beautifully researched article by <a href="http://www.history.fas.harvard.edu/people/faculty/lepore.php"><span>Jill Lepore</span></a> (in print and pixel) entitled <em><a href="http://www.newyorker.com/arts/critics/atlarge/2009/01/26/090126crat_atlarge_lepore"><span>Back Issues: The Day The Newspaper Died</span></a></em>. In it, we relive the last time the death of print loomed large in America: November 1st, 1765. On that day the <a href="http://en.wikipedia.org/wiki/Stamp_Act_1765"><span>Stamp Act</span></a> came into force. It required printers to affix a stamp to each of their pages, pay a halfpenny tax on each half sheet of paper, and a two shilling tax on each advertisement. Ostensibly, the tax was imposed in the &#8220;colonies&#8221; by the British Parliament to fund the French and Indian wars, but the backlash was severe. Printers were better placed than most to vent their frustrations. It has been argued that the Stamp Tax was one of the sparks that provoked revolution, bringing the issue of taxation without consent sharply into focus, with the ire and vitriol of printers like Benjamin Edes of <a href="http://www.loc.gov/rr/news/18th/140.html"><span><em>The Boston Gazette</em></span></a>, and Benjamin Franklin thrown in for good measure. They survived that calamity, and some newspapers like <em><a href="http://en.wikipedia.org/wiki/The_Hartford_Courant"><span>The Hartford Courant</span></a></em> are <a href="http://www.courant.com/"><span>still published</span></a> today both in print and pixel.</p>

<p>Jill Lepore&#8217;s article in January coincided neatly with the month the credit crunch came home to roost. It was looming, it was imminent, then suddenly it was here. About the same time we also heard that the Pulitzer Prize-winning <a href="http://www.nytimes.com/2008/10/29/business/media/29paper.html"><span><em>Christian Science Monitor</em> was going online-only</span></a>. The 27-year-old <a href="http://www.nytimes.com/2008/11/20/business/media/20mag.html"><span><em>PC Magazine</em> followed suit soon after</span></a>. Various teen magazines had previously made the switch, including <em><a href="http://www.missbehavemag.com/"><span>Missbehave</span></a></em>, <em><a href="http://www.cosmogirl.com/"><span>Cosmogirl</span></a></em>, <em><a href="http://ellegirl.elle.com/"><span>Ellegirl</span></a></em>, and <em><a href="http://www.teenmag.com/"><span>Teen</span></a></em>. In December and January other publications also went online-only, including <em><a href="http://www.impre.com/hoynyc/home.php"><span>Hoy Nueva York</span></a></em>, <em><a href="http://www.time.com/time/magazine/asia/"><span>AsiaWeek</span></a></em> (now redirecting to <em>Time</em>), and the <em><a href="http://www.kansascitykansan.com/"><span>Kansas City Kansan</span></a></em>.</p>

<p>Far from &#8220;closing&#8221;, as some sources seem to suggest, publications are <em>evolving</em> just as they always have. The shift is perhaps a little faster and more pronounced than previous changes, but only a few titles are swapping print production for online-only. The vast majority of newspapers and magazines are augmenting print with online production. How well the transition occurs is the rub. Many will struggle with a half-hearted approach; others, like <em><a href="http://thelegraph.co.uk/"><span>The Telegraph</span></a></em>, are embracing the change, investing and flourishing. It&#8217;s also worth affirming that a publication doesn&#8217;t have to appear on paper to be worthy, a fact validated by the Pulitzer Prize Board when they announced in December 2008 that they were <a href="http://www.pulitzer.org/new_eligibility_rules"><span>broadening the competition to allow online-only publications</span></a>.</p>

<h2>Content is almost enough</h2>

<p>Print publishers still have the most important advantage: great content. The Web sometimes can seem packed with titillating dross, but people don&#8217;t thirst just for a quick bit of light-hearted refreshment; they also hunger for the substantial. Satisfying both with a content strategy specifically for the Web, putting the right infrastructure in place to support it, and understanding the behavior of the Web&#8217;s audience are the keys to success.</p>
<p><a href="http://advertising.telegraph.co.uk/new%5Ftcuk/"><span><em>The Telegraph</em> is investing in user experience design</span></a> and using it to sell ad space. Get the user experience right and people will become subscribers and readers more readily than they ever have for print. The potential audience is global. The advertising revenue stream is global. The publishers who grasp the nuances of online publishing first will have the competitive edge to evolve into the first truly international news and feature sites. Those who invested early are already ahead: <em>The Guardian</em> <a href="http://www.journalism.co.uk/2/articles/530672.php"><span>launched in America in 2007</span></a>. The paper predicts its <a href="http://www.journalism.co.uk/2/articles/531498.php"><span>podcasts will be profitable by April this year</span></a>. Today, it has an almost equal split of readers, with a third from the UK, a third from the U.S., and a third from the rest of the world.</p>

<p>International audiences have very different requirements from content and advertising. Designing production and technical infrastructure that can deliver location-specific material is the future of truly international publications. The same digital content delivery channels (web site, RSS feeds, email, podcasts, and video) are still relevant. The material and style will respond to the context and the audience receiving it. Hot topics emerge much faster in the new era, and publications need the right technology in place to know what they are and be able to react. Some publishers are already providing location-specific content, and being very sophisticated in how they understand and serve their global audience. Advertisers are taking notice.</p>

<h2>&#8220;Web 2.0&#8221; and all that jazz</h2>

<p>Print publications provide an enacted narrative. Readers start at the cover, then flip, and read. They observe the story told by the publication in a pre-determined sequence, with only the words and images from the staff to tell the tale. In contrast, the narrative of the Web can be both enacted and emergent. The narrative emerges from both the published material of the professionals and the audience contributions. These contributions can be within the web site, on personal blogs aggregated by services like <a href="http://technorati.com/"><span>Technorati</span></a>, or on social networks like <a href="http://twitter.com/"><span>Twitter</span></a>. When <a href="http://www.oreillynet.com/pub/a/oreilly/tim/news/2005/09/30/what-is-web-20.html"><span>Tim O&#8217;Reilly coined the term &#8220;Web 2.0&#8221;</span></a>, user-generated content was at the core of his thoughts. Whatever we call it&#8201;&#8212;&#8201;user-generated content, Web 2.0, emergent narratives, or reader contributions&#8201;&#8212;&#8201;it requires an approach that is more sophisticated than just opening up a site to comments. There are many ways in which users can and should be able to reuse content, add their own, and participate. There are also many ways in which publications can pull in content from around the Web to help them tell the tale. In fact, editing such content is a valuable service. You only have to look at the success of sites like <em><a href="http://www.newsvine.com/"><span>Newsvine</span></a></em> and <a href="http://ffffound.com/"><span>Ffffound</span></a> to see how citizen journalism can contribute to the industry. It&#8217;s not like the idea is new.  <cite><a href="http://en.wikipedia.org/wiki/James_Franklin_(printer)"><span>James Franklin</span></a></cite>, editor of <em><a href="http://en.wikipedia.org/wiki/The_New-England_Courant"><span>The New England Courant</span></a></em>, had this to say about his editorial policy just before American independence:</p>

<blockquote><p>I hereby invite all Men, who have leisure, Inclination and Ability, to speak their Minds with Freedom, Sense and Moderation, and their Pieces shall be welcome to a Place in <span class="end-quote">my Paper.</span></p></blockquote>

<h2>Connecting the dots</h2>

<p>The overall objective for newspapers or magazines remains the same as it&#8217;s ever been: a large audience to attract and retain high-paying advertisers. Meeting that objective online means combining the content with the best user experience. User experience is not a term found in the print world. Paper is a static medium. It&#8217;s a passive experience. Letters to the editor have been the traditional interaction of the audience with print publications. The Web is neither static nor passive. It&#8217;s dynamic, with content being syndicated, read, and shared in new ways. It&#8217;s interactive, with content being augmented, reused and commented on as it&#8217;s published. Reader behavior has changed. Expectations have changed. People still want to passively read, but they also want to interact, republish, share, and comment&#8201;&#8212;&#8201;and they want these on demand. They want a different experience. It requires a different kind of strategy that understands the audience expectations and technology, and describes exactly how publications can use both to be successful.</p>

<h3>User experience design</h3>

<p>The first step is to have a clear set of business objectives in a reasonable timescale. Publications have to invest. How they invest is the big question. All the solutions are already available to bridge the gap between business objectives and audience behavior. Understanding audience behavior is the first step on the path to profitability. <em>User experience design</em> delivers just that. Rather than asking questions, it observes behavior. From that we can build a clear picture of how the audience experience can be optimized. That may involve more than just tweaking the design of the interface. It can encompass elements like content strategy: what is published, how it&#8217;s delivered to people, and how it&#8217;s written to achieve the business objectives.</p>

<h3>Web application development</h3>

<p>User experience design is nothing without the right applications to support publishing operations and deliver the content through the various channels. They should actively <em>help</em> journalists, editors, and managers do their job. Applications should make interaction for readers quick, easy and fun&#8201;&#8212;&#8201;an adventure. The software has to be secure. It has to scale well as (hopefully) increasing numbers of visitors find the great content, and keep coming back. The experience has to be fast, safe, and helpful. Anything less is a disservice to the reader in much the same way that badly printed text or art would be in print.</p>

<h3>Internet architecture and infrastructure</h3>

<p>Applications need machines to run on. That means intelligent technical architectures and infrastructure. If the audience is international then the infrastructure needs to be. That means multiple locations. It means twenty-four hour monitoring, often using <a href="https://labs.omniti.com/trac/reconnoiter"><span>tools written</span></a> <a href="https://labs.omniti.com/trac/zetaback"><span>specifically for the task</span></a>. It means rapid reaction times to fluctuations in traffic.</p>

<p>After our recent <a href="http://omniti.com/remembers/2009/two-webby-awards-for-national-geographic"><span>work with the award-winning National Geographic</span></a>, their readership went up by 500%. The applications and infrastructure behind the site also had to handle massive traffic spikes as stories spread virally around the Web. Having infrastructure that can perform when the spikes arrive is almost an art. It can get expensive, and as we all become more <a href="http://omniti.com/seeds/using-less-is-green"><span>conscious about environmental impact</span></a>, performance is the answer. <a href="http://friendster.com/"><span>Friendster</span></a>&#8217;s situation gave us a chance to show <a href="http://omniti.com/helps/friendster"><span>how to scale a site properly</span></a>. They were about to launch in China and predicted they would need twice as many servers to do so. Page loading times were already slow at 9 seconds. With a little help from us, they launched in China with the same number of servers they already had. They doubled the number of users to 60 million, but pages loaded more than twice as fast at 3.5 seconds.</p>

<h2>Stacking the deck</h2>

<p>As good as the content might be, or the perception of the brand in the real world, when <a href="http://www.upi.com/Odd_News/2008/12/05/Obamas_Zune_story_crashes_news_site/UPI-96001228524763/"><span>infrastructure or applications fail</span></a>, or are <a href="http://news.cnet.com/8301-1009_3-10041743-83.html"><span>hacked</span></a> or <a href="http://www.wired.com/politics/law/news/2003/03/58200"><span>threatened</span></a>, the brand can be irreparably harmed in the eyes of readers. How this happens is often straightforward: using different vendors for design, application development, and infrastructure turns the gaps between them into critical fault lines. Under the pressure of success or failure, the fault lines are amplified. For example, vendors have to communicate with each other, often having very different processes, and contractual obligations. Trying to fix a problem, innovate, or improve performance becomes expensive because each vendor only understands their specialist area. No single vendor can see the whole story to find the most efficient solution. While all this is going on, the experience fails and the audience falls away in frustration.</p>

<p>The business case for separating different production areas no longer exists. Business objectives are best met with a holistic approach to design, development, and infrastructure. They all fundamentally affect the user experience, which is the single most important factor affecting visitor numbers on the Web. Combining great content with holistic technology will save money and encourage innovation. Publishers who do it early and do it right will give their readers the best possible experience, and stack the deck in their favor for years to come.</p>]]></content:encoded>
            <pubDate>Thu, 12 Feb 2009 15:00:00 GMT</pubDate>
        </item>
        <item>
            <title>Custom Trending and the Benefits of Source Code Availability</title>
            <link>http://omniti.com/seeds/custom-trending-and-the-benefits-of-source-code-availability</link>
            <guid>http://omniti.com/seeds/custom-trending-and-the-benefits-of-source-code-availability</guid>
            <description><![CDATA[One of the self evident truths about system administration is that you need to
know what is going on with your systems. Monitoring - knowing that your
systems are working as expected and, more importantly, knowing when they
aren&#8217;t - is the thing ...]]></description>
            <content:encoded><![CDATA[<p>One of the self evident truths about system administration is that you need to
know what is going on with your systems. Monitoring - knowing that your
systems are working as expected and, more importantly, knowing when they
aren&#8217;t - is the thing most people consider first when they realize that fact.
Equally important, however, is trending - knowing what your systems were doing
in the past. Trending allows you to determine if the current state of your
system is normal, or if something has changed that could signify a problem.
Trending also allows you to predict when your current systems will become
unable to handle what is expected of them. When the amount of traffic to your
website is about to outgrow your systems, you can see this and add more
capacity, either by adding servers or replacing them with something more
powerful, before the capacity problems start to occur.</p>

<p>At OmniTI, we use a number of systems for trending, including
<a href="http://www.cacti.net/"><span>Cacti</span></a> and our very own
<a href="https://labs.omniti.com/labs/reconnoiter"><span>Reconnoiter</span></a>. Each system comes
with a large number of monitors built in, allowing you to trend anything from
network traffic to system load to disk space. Sometimes, however, you need
metrics for which there is nothing currently available. This is where these
systems&#8217; extensibility comes into play.</p>

<p>The following example shows a situation where we needed information that our
current monitoring/trending systems were not able to provide,
and we needed to extend them with a custom trending solution:</p>

<p>One of our clients had a website that had become very popular, and was
suffering performance issues as a result. We suspected that at least part of
the system was I/O bound, and so we wanted to gather metrics on the I/O
performance of the system over time. The systems in question were running
Solaris 10 with the data on
<a href="http://www.sun.com/bigadmin/features/articles/zfs_overview.jsp"><span>ZFS</span></a>. The
normal <code>iostat</code> command, for which a number of monitors exist, does not give
true values for reads and writes performed by ZFS, because <code>iostat</code> can only see
read/write requests from filesystems. True I/O statistics can be obtained on
the command line by running the <code>zpool iostat</code> command. This works in a
similar way, producing output similar to the following:</p>

<pre><samp># zpool iostat rpool 10 5
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
rpool       16.6G  57.7G      1      0  43.5K  2.06K
rpool       16.6G  57.7G      0      0      0      0
rpool       16.6G  57.7G      0      0      0      0
rpool       16.6G  57.7G      0      0      0      0
rpool       16.6G  57.7G      0      0      0      0
</samp></pre>

<p>The statement above that our monitoring systems were not able to provide the
information that we needed isn&#8217;t quite true. Elsewhere, we had a monitor that
obtained zpool I/O statistics using a long-running <code>zpool iostat</code> process,
taking the values out and entering them into a database, with a custom script
that fetched the values from the database and entered them into Cacti. The
system in question was a database server, so this method, while clunky, worked
well enough for its purpose. For monitoring the web servers, however, using the
same method just wasn&#8217;t practical and we needed something better. We needed
something that didn&#8217;t require a long-running process and a
database server on the machine just for trending information.</p>

<p>The obvious choice here was to use <abbr title="Simple Network Management Protocol">SNMP</abbr>. Cacti (as well as pretty much every
monitoring/trending package) has built-in support for obtaining data over
SNMP, and net-snmp (the SNMP agent in use on the server) has various ways of
extending functionality to get custom metrics.</p>

<p>Having chosen SNMP, the next decision was how to get the data we needed and
present it over SNMP. The seemingly obvious choice would be to run <code>zpool
iostat</code> and parse the output as was done previously, presenting those values
over SNMP. However, that requires either the long-running <code>zpool iostat</code>
process, or running it once for a few seconds at a time to get a snapshot of
the I/O over that period, which will lead to inaccurate results (it won&#8217;t tell
us anything about the performance of the system during the time between
checks). One of the things that Cacti (or rather rrdtool, which Cacti makes
use of) is very good at is taking raw data and generating meaningful
statistics from it. If we could somehow get raw I/O values rather than
already aggregated values such as 'n KB over the past m seconds read' and pass
those to Cacti, then Cacti could do the work and we would get accurate values.</p>

<p>Enter open source. The source code to OpenSolaris is available, including the
<a href="http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/cmd/zpool/zpool_main.c#1863"><span>source code to the zpool command</span></a>,
which made it possible to see how the <code>zpool iostat</code> command itself worked.  Once
you trace the various calls made by that function, it turns out that
underneath, the <code>zpool iostat</code> command uses libzfs to fetch the exact raw
values we are looking for. It was then a relatively simple matter to take that
code and print out the raw values:</p>

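<pre><code>#include &#60;stdio.h&#62;
#include &#60;sys/fs/zfs.h&#62;
#include &#60;libzfs.h&#62;

/*
 * Sample code to demonstrate printing of raw zpool io stats.
 * Compile with: cc -lzfs -lnvpair zpoolio.c -o zpoolio
 */

/* zpool_iter() callback: print one pool's cumulative I/O counters. */
int print_stats(zpool_handle_t *zhp, void *data) {
    uint_t c;
    boolean_t missing;

    nvlist_t *nv, *config;
    vdev_stat_t *vs;

    /* Re-read the pool's statistics from the kernel. */
    if (zpool_refresh_stats(zhp, &#38;missing) != 0)
        return (1);

    config = zpool_get_config(zhp, NULL);

    /* The pool-wide counters are stored on the root of the vdev tree. */
    if (nvlist_lookup_nvlist(config,
        ZPOOL_CONFIG_VDEV_TREE, &#38;nv) != 0) {
        return 2;
    }

    if (nvlist_lookup_uint64_array(nv,
        ZPOOL_CONFIG_STATS, (uint64_t **)&#38;vs, &#38;c) != 0) {
        return 3;
    }

    /*
     * All four values are running totals, not rates; the *_bps fields are
     * byte counts. The casts make the arguments match %llu on any build.
     */
    printf(
        "pool:%s read_ops:%llu write_ops:%llu "
            "read_bps:%llu write_bps:%llu\n",
        zpool_get_name(zhp),
        (unsigned long long)vs->vs_ops[ZIO_TYPE_READ],
        (unsigned long long)vs->vs_ops[ZIO_TYPE_WRITE],
        (unsigned long long)vs->vs_bytes[ZIO_TYPE_READ],
        (unsigned long long)vs->vs_bytes[ZIO_TYPE_WRITE]
    );
    return 0;
}

int main(void) {
    libzfs_handle_t *g_zfs;
    int ret;

    /* libzfs_init() returns NULL if the library cannot be initialized. */
    if ((g_zfs = libzfs_init()) == NULL) {
        fprintf(stderr, "failed to initialize libzfs\n");
        return 1;
    }

    /* Call print_stats() once per pool; a non-zero return stops the loop. */
    ret = zpool_iter(g_zfs, print_stats, NULL);
    libzfs_fini(g_zfs);
    return ret;
}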
</code></pre>

<p>Once this was done, the next step was to get the values exported over SNMP so
that Cacti could view them. Net-SNMP has a <code>pass</code> directive that allows you to
delegate an OID to an external program, and have that program print out the
value to return for it. These values are then exported over SNMP, available for any
of the above monitoring tools to make use of.</p>

<p>In Cacti, it was then just a matter of creating an appropriate SNMP Data
Query, adding some Graph Templates, and waiting for the pretty pictures to come
flowing in.</p>

<p>This example shows how you might approach developing code to obtain custom
metrics, and shows the benefits of having the source code available so that
you can learn from tools that do most, but not all, of what you are trying to
achieve. Sometimes, what you need just isn&#8217;t available and you just have to build a solution from the available pieces.</p>

]]></content:encoded>
            <pubDate>Tue, 10 Feb 2009 20:53:17 GMT</pubDate>
        </item>
        <item>
            <title>Increasing the Aperture on Security</title>
            <link>http://omniti.com/seeds/increasing-the-aperture-on-security</link>
            <guid>http://omniti.com/seeds/increasing-the-aperture-on-security</guid>
            <description><![CDATA[Security is good. Security is necessary. Security is someone else&#8217;s
concern. Security is for our CISSP engineers to focus on in their dimly lit
rooms with their lava lamps and empty Red Bulls stacked to the
heavens. It&#8217;s an ugly business, a...]]></description>
            <content:encoded><![CDATA[<p>Security is good. Security is necessary. Security is someone else&#8217;s
concern. Security is for our <abbr
title="Certified Information Systems Security
Professional">CISSP</abbr> engineers to focus on in their dimly lit
rooms with their lava lamps and empty Red Bulls stacked to the
heavens. It&#8217;s an ugly business, and making sense of it requires
professionals with years of experience and dog-eared certificates. It
has become an industry unto itself and frightens politicians into adopting
far-reaching policies that stoke the smoldering furnace of paranoiac
innovation. We attend Hacker conferences and Security summits to
behold the latest zero-day vulnerability and poke fingers at the
developers who fail to create secure software.</p>

<p>We are Systems Administrators. We are Web Developers. We are
Database Engineers and Storage Architects and Pointy-Haired Bosses
with hard copies of <a href="http://www.schneier.com/blog/"><span><span>Schneier&#8217;s
blog</span></span></a> littering our desks. We&#8217;ve been told before that Security is
our mandate, but to what end? We have heeded the vendor patch
announcements and updated our servers. We comply
with <a href="https://www.pcisecuritystandards.org/"><span>PCI</span></a>
requirements and pass the Nessus scans. We&#8217;ve studied the
latest <a href="http://www.owasp.org/index.php/Category:OWASP_Top_Ten_Project"><span>OWASP</span></a>
Top 10 list and applied its principles to combat the cross-site
scripting attacks and SQL injections. Haven&#8217;t we? Sure we have! Or
have we? Have I, as a Systems Administrator, considered the
implications of the AJAX interface written by the dev team? Has the
<abbr title="Database Administrator">DBA</abbr> considered the impact
of an exploit in the <code>chroot</code>ed webserver?  Have our PHP developers been
in touch with the <abbr title="Storage Area Network">SAN</abbr>
administrator to ensure that he has the capacity to withstand
unlimited file uploads, and what effect that might have on our
encrypted volume?</p>

<p>A secure infrastructure is not a zero-sum game. While applications
on the Internet have become more complex and availability has
increased, so have the attack vectors and their impact on the rest of
the application stack. We&#8217;ve long recognized that an <abbr
title="Open Systems Interconnection">OSI</abbr>-centric approach to
security is a losing proposition. Firewalls are no longer considered a
panacea for network attacks. Modern intrusions exercise multiple
layers of the defense perimeter. Engineering secure applications
becomes akin to a round
of <a href="http://www.hasbro.com/jenga/"><span>Jenga</span></a>; how many pieces
can we lose before the structure collapses? Cohesive Web
Application Security mandates a holistic approach across the entire
engineering organization. But how many of us think beyond the edges of
the envelope that defines our professional skills and aptitude?</p>

<p>Most of us gain our skills through a function-oriented approach. We
learn repeatable steps that culminate in the desired result. Often
this includes a period of trial-and-error where we assimilate aspects
of the misstep and adapt to avoid the failure condition. This is only
natural and is reinforced by our trainers or educators in order to
attain the prescribed goals. However, these tactics can also be used
to seek out the stress points within a system. When you increase your
knowledge of the entire application stack, you reveal "opportunities
for efficiency" by understanding the relationship of each component to
the whole.</p>

<p>Unfortunately, the churn of modern software development fosters a
"feature-first" mentality. We&#8217;re all familiar with the
process. Marketing or Project Management determines the feature set
that will drive purchases and upgrades for the client base. Deadlines
are aligned with revenue cycles rather than product maturity. In the
end, Developers feel the squeeze from Project Managers focused on
their calendars and from Customers, frustrated by another release as
unwitting QA test subjects. Engineers are forced into specialization,
becoming cogs with a narrowed focus. Our aperture begins to close,
decreasing our exposure to the application stack and impeding
interoperability with other project teams and departments. Although
we&#8217;ve entered the Information Age, popular software engineering
practices are still rooted in the assembly line mentality.</p>

<img alt="security across the stack" src="http://images.omniti.net/omniti.com/i/b/split-arch-v-3.jpg" style="float: right; width: 230px; height: 288px; margin: 0 0 1em 1em; border: 0;" />

<p>The birth of secure software requires more than a commitment to
correct code and elegant program design. A sort of Renaissance man is
needed, a polymath who familiarizes himself (or herself) with the
belts and pulleys of neighboring components. Someone who has a passion
for the whole system. Orthogonal studies should become a desired
trait, not a distraction. Database Administrators, Network Engineers,
Java Developers and Systems Administrators working in harmony.</p>

<p>Hackers, in the acceptable sense, are a strange breed. By
definition we enjoy the art of deconstruction. We want to know what
makes things tick. Perhaps it&#8217;s something in our biological blueprint
that drives our thirst for knowledge. We have an insatiable curiosity
for understanding the misunderstood or unknown. An expensive hobby,
for certain. Whether that price comes in the form of relationships,
money or spare parts, it matters not. The knowledge of how that system
works is compelling enough to hold our attention for hours and days
and weeks and years.</p>

<p>Or maybe we just like breaking stuff.</p>

<p>After an attack, it&#8217;s only natural to wonder what motivated the
attacker to focus on us. We comb through the evidence looking for the
exposed stress points. But to focus on the bugs is to ignore the
problem and merely serves to reinforce the broken processes. The same
vulnerability will crawl out of a different hole next time. It will
wait for the next hacker. And they will come. The hacker has already
created new tools to make it easier next time. The hacker is
generous. He likes to share his toys with his hacker friends.</p>

<p>Engineering teams with a foundation in "whole system" design stand
a better chance of resisting and recovering from attacks. By studying
different layers of the application stack they gain an understanding
of the operational complexities and attack vectors. They can predict
vulnerabilities in the design and planning phases. They can isolate
exploits faster and pinpoint failures in unfamiliar regions. New code
becomes inoculated by the changes in philosophy. Junior programmers
and administrators pass these principles on to their peers. A new
Renaissance begins.</p>
]]></content:encoded>
            <pubDate>Wed, 04 Feb 2009 21:00:00 GMT</pubDate>
        </item>
        <item>
            <title>Embracing Failure to Rise Above Enterprise-Class Thinking</title>
            <link>http://omniti.com/seeds/embracing-failure-to-rise-above-enterprise-class-thinking</link>
            <guid>http://omniti.com/seeds/embracing-failure-to-rise-above-enterprise-class-thinking</guid>
            <description><![CDATA[Failures in technical
systems are inevitable.  Drives die, network interfaces wink out,
backhoes take out cross-country backbones, data rooms flood. The effects of such failures range from minor inconveniences to
crippling outages, but thoughtful plann...]]></description>
            <content:encoded><![CDATA[<p>Failures in technical
systems are inevitable.  Drives die, network interfaces wink out,
backhoes take out cross-country backbones, <a href="http://www.youtube.com/watch?v=t0gBReKskXQ"><span>data rooms flood</span></a>. The effects of such failures range from minor inconveniences to
crippling outages, but thoughtful planning can greatly increase the
likelihood that the next failure will be the former instead of the
latter.  The fact is that 100% uptime for 100% of users is an
unrealistic goal.  Creating an information technology infrastructure that expects
failures and minimizes user exposure to those failures is critical to
preserving continuity of service to the majority of users.  This is the point of transcendence into carrier-class thinking.</p>

<p>Military planners always
factor in casualties when deciding on a plan of action.  The failure
of an individual component (a soldier, a tank, an airplane) is
expected, but the overall goal will still be achieved.  Many
enterprises do excellent risk management in their business operations
but fail to apply those same principles to their IT infrastructure. 
The mantra of smart investing is diversification; likewise, in the
insurance industry, the goal is to spread the company&#8217;s risk among a
wide population, only a few of whom will actually make a claim in a
given year.  Yet when it comes to IT planning, all the eggs go into
one large (often expensive) basket.  No amount of money can ensure
that a single point of failure will never fail.  That money would be
better spent on engineering around failures, to design systems that
fail gracefully, or that at least fail only partially, limiting the
damage to some subset of users.  Put another way, plan for the
failure of the most critical piece of infrastructure and engineer
service continuity despite that failure.</p>

<p>I like stories.  As a
systems administrator for more than 10 years, I have my fair share of
them, both good and bad, and the really memorable ones have valuable
lessons to teach us about how to construct systems that allow most
users to see little or no interruption to their service.</p>

<p>In my <abbr>ISP</abbr> (Internet Service Provider) days, the company
I worked for had one physical server hosting email.  It was a
relatively large, expensive UNIX server, but it got the job done and
had impressive reliability compared to <abbr title="Personal Computer">PC</abbr> hardware of the day.  As the ISP business expanded, the demand for email grew beyond what the
server could handle, and when the inevitable outage occurred, it
affected every single mail user in the system.  The solution was, of
course, to get more servers.  It was not cost-effective to grow with
more big UNIX servers, due to a number of factors such as rack space
and power, not to mention the capital investment.  We needed more
(and smaller) servers to store the mail, both to absorb our growing
number of users and to reduce the impact of an individual server
going down.  The system we came up with decoupled mail routing from
mail delivery and mailbox access.  This enabled us to deploy
lightweight <abbr>MX</abbr> (Mail Exchanger) servers that didn&#8217;t need much in the way of local
storage, as all incoming mail was delivered to some other host.  The
MX servers were behind a load balancer, so we could scale them
horizontally as required to keep up with demand.  The mail storage
hosts had more local storage, utilizing <abbr>RAID</abbr> (Redundant Array of Inexpensive Disks) to survive disk
failures, and had standby hosts to which all mailbox data was
replicated in case of host failure.  Gluing it all together was a set
of proxy hosts backed by <abbr>LDAP</abbr> (Lightweight Directory Access Protocol) to locate users&#8217; mail
storage host and handle mailbox access.  The directory service was
also used by the MX hosts for inbound delivery, to locate the
appropriate storage host.  Users connected to the proxies instead of
directly to their mail storage host.  We could do quick maintenance
or handle short outages without most customers ever realizing there
was a problem.  For example, <abbr>POP</abbr> (Post Office Protocol) clients checking for new mail would
be given a "no new mail" response when the backing store was
unavailable.    This architecture was much more resilient to
failures, and in the event of a failure (or even a maintenance
event), the existence of the proxy between users and the actual
server allowed us to reduce the users&#8217; exposure to the problem.</p>
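
<p>The "no new mail" behavior can be sketched in a few lines. This is only an illustration of the idea, not the ISP system itself; the exception class and the two stand-in storage functions are hypothetical names made up for the example.</p>

```python
class BackendUnavailable(Exception):
    """Raised by a (hypothetical) mail store client when its host is down."""


def check_mail(user, fetch_new_messages):
    """Proxy-side check that hides a storage outage behind an empty result."""
    try:
        return fetch_new_messages(user)
    except BackendUnavailable:
        # The client sees "no new mail" and simply polls again later.
        return []


def healthy_store(user):
    """Stand-in for a storage host that is up."""
    return ["message-1", "message-2"]


def broken_store(user):
    """Stand-in for a storage host that is down."""
    raise BackendUnavailable("storage host is down")
```

<p>The design choice is that a polling client tolerates a delayed answer far better than an error, so the proxy trades a little freshness for continuity.</p>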

<p>The next illustrative story
comes from a client who operates a large email infrastructure
supporting millions of users.  Their mail storage sits on a <abbr>SAN</abbr> (Storage Area Network), implemented on
three expensive, vertically-integrated systems from a major vendor and interconnected on a costly Fibre Channel switching fabric (which is, as ZFS author <a href="http://blogs.sun.com/bonwick/entry/zfs_end_to_end_data"><span>Jeff Bonwick</span></a> puts it, "a network designed by disk firmware writers. God help you.")  The result is a very high ratio of spindles to control
units, so when there is a problem with one unit, that problem affects
one-third of their customers, which could run well over several
million users.  That&#8217;s a lot of eggs in one basket.  The price of the
basket does not guarantee an absence of problems: the redundant
control heads <i>must</i> run the same firmware version, so a
firmware bug will wipe out both of them.  The cost of the storage
platform is sufficiently high that scaling horizontally becomes
prohibitively expensive, and doesn&#8217;t go very far to address the
spindle-to-control-unit ratio.  What they need is a fundamental shift
in storage planning.  More and cheaper baskets
to hold fewer eggs each, so fewer eggs are lost when a basket fails. 
In this case the baskets are commodity servers and direct-attach
storage running free software and exporting block devices over <abbr>iSCSI</abbr> (Internet Small Computer Systems Interface)
to the servers handling client connections.  For the cost of one of
the vendor-supplied storage systems, we get nine new storage nodes,
each with redundant control heads and data storage.  These nine nodes
provide the same amount of usable space as the three old units, and
have capacity to spare.  The cost savings enable more nodes to be
purchased, and facilitate horizontal scaling to meet demand.  Additional cost savings are realized on the interconnects, which can be standard 10Gb Ethernet.  The
larger number of nodes means a three-fold decrease in the number of
users exposed to a node failure, and future scaling only decreases
this number further.</p>

<p>Turning away from email, my
final story covers data warehousing for a large, web-focused marketing
company.  Their <abbr>OLTP</abbr> (Online Transaction Processing) database that backs the
website runs on <a href="http://www.oracle.com/"><span>Oracle</span></a>.  They need a separate place to run
intensive data-mining queries and transformations that are not appropriate for the
OLTP system, for which the typical solution is an <a href="http://en.wikipedia.org/wiki/Operational_data_store"><span>Operational Data Store</span></a> (<abbr>ODS</abbr>), a type of data warehouse.  Initially this was another Oracle instance on a single server.  When the size
of the dataset grew beyond the capacity of the server, a decision
had to be made.  A server with enough memory and CPU power to
handle the load would have exceeded the Oracle product license, but
purchasing additional licenses was cost-prohibitive.  The solution was
two-fold: convert the ODS to the open source <a href="http://www.postgresql.org/"><span>PostgreSQL</span></a>
server, and put it on two systems instead of one.  The conversion to
PostgreSQL is outside the scope of this article, but the decision to use
two servers provides several distinct advantages.  First, they are
not set up as master/slave, which keeps the setup simple.  They both replicate from Oracle in
parallel, having no awareness of one another.  This works fine since
the data-mining queries are essentially read-only (some jobs do data transformations, but they operate on temporary tables).  Second, both systems are fast enough to handle
the entire operational load, so if one system is down, all its jobs can be
shifted to the other with no degradation of service to users.  Third,
upgrades to PostgreSQL can be tested with live data without disrupting
service, as jobs can again be shifted away from the instance being
upgraded.  Under normal circumstances, both servers are used for
production work, yielding the best return on investment.</p>

<p>These stories illustrate
the advantages of expecting failure and engineering around it to
create robust internet architectures.  Failure is inevitable, but
dire consequences need not be.</p>
]]></content:encoded>
            <pubDate>Tue, 27 Jan 2009 15:04:14 GMT</pubDate>
        </item>
        <item>
            <title>The Irony of Sun Database Technology</title>
            <link>http://omniti.com/seeds/the-irony-of-sun-database-technology</link>
            <guid>http://omniti.com/seeds/the-irony-of-sun-database-technology</guid>
            <description><![CDATA[It&#8217;s been just over a year since Sun announced it had agreed to purchase MySQL, the ever popular open source database technology. At the time most people saw the move as a way for Sun to make it&#8217;s way into the internet space, where MySQL ha...]]></description>
            <content:encoded><![CDATA[<p>It&#8217;s been just over a year since <a href="http://www.sfgate.com/cgi-bin/article.cgi?f=/c/a/2008/01/17/BU77UGDVT.DTL&#38;type=tech"><span>Sun announced it had agreed to purchase MySQL</span></a>, the ever-popular open source database technology. At the time most people saw the move as a way for <a href="http://sun.com/"><span>Sun</span></a> to make its way into the internet space, where <a href="http://mysql.com/"><span>MySQL</span></a> has made a lot of inroads. There was also a thought that Sun might be able to help MySQL overcome some of the technical hurdles it faced when moving into enterprise-level usage.</p>

<p>So after one year, what has this marriage produced? Well, they finally pushed out the 5.1 release, which had been stuck in development for a couple of years, although it didn&#8217;t include any major changes from what was on the road-map before the Sun purchase. They also <a href="http://www.techcrunchit.com/2008/07/23/new-mysql-fork-turns-back-the-clock/"><span>pushed out a fork of the code base</span></a>, which took the interesting step of removing features rather than adding the enterprise technology many people were looking for, leaving Sun with a bill for $1 billion but without an "Enterprise" database to call its own. The irony of all of this is that, even before the MySQL purchase, Sun already had a product containing technologies similar to those of today&#8217;s leading commercial databases; it&#8217;s just that the technology lives in a file system, specifically <abbr title="Zettabyte Filesystem">ZFS</abbr> (Zettabyte Filesystem).</p>

<p>One of the basic tenets of a database system is that you can guarantee that data is safe on disk, and generally that any database will give you a chance to throw away changes if you need to. In the database world, you know this as <code>COMMIT</code> and <code>ROLLBACK</code>, common operations to most people, although missing from the <a href="http://en.wikipedia.org/wiki/MyISAM"><span>MyISAM</span></a> technology that Sun purchased from MySQL. In the ZFS world, while not implemented the same way as in a database, these ideas are embodied in the commands <code>zfs snapshot</code> and <code>zfs rollback</code>. Both commands work with active data partitions, and work so well that you can use them as protection in large batch-style operations against MyISAM: simply <code>zfs snapshot</code> your system beforehand, run your large MyISAM command, and then <code>zfs rollback</code> afterwards if you find you need to go back.</p>

<p>Of course, what good is a system, database or any other, if you cannot back it up? The back-up process for MySQL is straightforward, although its use of <code>LOCK TABLES</code> makes it a second-rate solution at best. Consider that, with any sufficiently large system, <code>LOCK TABLES</code> will keep you from providing five nines uptime almost by definition. ZFS, on the other hand, gives you the ability to make backups with ease. Once you have a snapshot of your system, the ability to clone, promote, or send a snapshot gives you quite a bit of flexibility for backing up your system, and it can all be done on-line.</p>

<p>But it gets even better. One of the things that many databases deal with is caching data files from the file system, using some algorithm to determine what should be kept in memory. In MySQL, the database only caches index files (data files themselves are left to the OS to handle), and it does so using a simple <abbr>LRU</abbr> (least recently used) cache, a caching mechanism in which the least recently used data is evicted whenever room is needed for newly requested data. Again, ZFS contains something more sophisticated. ZFS uses something known as an <abbr>ARC</abbr> (adaptive replacement cache), which improves upon the LRU idea by keeping track of not just how recently something is used, but also how frequently it is used. Again, the nature of the work being done makes for different specifics in implementation, but other databases have looked at and implemented ARC systems and seen significant improvements over the LRU method.</p>
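
<p>For readers who have not met an LRU cache before, here is a minimal sketch of the eviction rule in Python. It illustrates the general technique only; it is not MySQL&#8217;s key cache, and it does not attempt ARC, which additionally tracks how often entries are used.</p>

```python
from collections import OrderedDict


class LRUCache:
    """Keep at most `capacity` entries, evicting the least recently used."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()  # oldest entry first, newest last

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used


cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # "a" is now the most recently used entry
cache.put("c", 3)  # the cache is full, so "b" is evicted
```

<p>Because only recency is tracked, one long scan over blocks that are each touched once can push out entries that are used constantly. Tracking frequency as well as recency is the weakness ARC sets out to address.</p>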

<p>And there are still other examples; take the <a href="http://blogs.sun.com/perrin/entry/the_lumberjack"><span>ZFS intent log</span></a>. The ZFS intent log is used by ZFS to gather system calls in memory and log them, both for performance (system calls can be aggregated together before execution) and for crash recovery (in the event of a crash, ZFS can examine the log and replay any system calls that did not finish execution). Of course, those familiar with databases will recognize this approach, as it is commonly implemented as a transaction log within database systems, for much the same reasons: commits to the database can be aggregated for performance, and in the event of a system crash, the commit log can be replayed to ensure all committed transactions made it to disk. Unfortunately, MyISAM, the storage engine owned by Sun, does not get these benefits.</p>
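
<p>The replay idea can be shown with a toy model. The sketch below is neither the ZFS intent log nor any database&#8217;s transaction log; it assumes writes are simple key/value pairs and that we know how many log entries had reached disk before the crash.</p>

```python
def apply_write(state, op):
    """Apply one logged write, given as a (key, value) pair, to a dict."""
    key, value = op
    state[key] = value


def replay(on_disk, log, flushed):
    """Rebuild state after a crash by redoing log entries past `flushed`."""
    recovered = dict(on_disk)  # work on a copy of what survived
    for op in log[flushed:]:
        apply_write(recovered, op)
    return recovered


log = [("a", 1), ("b", 2), ("a", 3)]  # every write is logged first
on_disk = {"a": 1}                    # only the first write reached disk

recovered = replay(on_disk, log, flushed=1)
```

<p>Here <code>recovered</code> contains the effect of all three writes even though only one had been flushed, which is the guarantee a replayable log is meant to provide.</p>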

<p>Now, we must say that MySQL has been around for some time, so its users have gone through the trouble of finding workarounds for the lack of the functionality we&#8217;ve been seeing in ZFS. Luckily Oracle provides a storage engine for MySQL, known as <a href="http://www.innodb.com/"><span>InnoDB</span></a>, which implements many of the features discussed above. Also, MySQL has simplified replication support built in, which allows users to set up multiple copies of the database without significant effort. In fact, these techniques are encouraged, as you can use the slave database system for taking backups or for crash recovery in case of loss of the primary node. What we think is often overlooked is that here the database, which should be a model of data integrity and robustness, gives you workarounds and tools like <a href="http://dev.mysql.com/doc/refman/5.0/en/repair.html"><span><code>CHECK</code> and <code>REPAIR</code></span></a>, while the filesystem, the very thing you typically expect your database to protect you from, is so carefully designed in ZFS to ensure data integrity that <code>CHECK</code>/<code>REPAIR</code> are unnecessary.</p>

<p>Unfortunately the cynic in us has to wonder if we will ever see some of the more sophisticated ideas from ZFS make their way into MySQL. After all, since the current workarounds tend to require running multiple instances, and Sun is in the business of selling hardware (either multiple servers, or servers large enough to house multiple virtual servers, take your pick), keeping the status quo creates a nice relationship between these two divisions of the company. Given that, maybe there is no irony after all.</p>

]]></content:encoded>
            <pubDate>Thu, 22 Jan 2009 15:39:51 GMT</pubDate>
        </item>
        <item>
            <title>Using Less is Green</title>
            <link>http://omniti.com/seeds/using-less-is-green</link>
            <guid>http://omniti.com/seeds/using-less-is-green</guid>
            <description><![CDATA[Every time I hear about green computing I feel like there is a gap&#8201;&#8212;&#8201;an enormous gap. The same thing is true in most conservation efforts I witness:


I see grand plans to make it easier and cheaper to produce foods, but no trend to a...]]></description>
            <content:encoded><![CDATA[<p>Every time I hear about green computing I feel like there is a gap&#8201;&#8212;&#8201;an enormous gap. The same thing is true in most conservation efforts I witness:</p>

<ul>
<li>I see grand plans to make it easier and <a href="http://www.borealisgroup.com/industry-solutions/base-chemicals/plant-nutrients/precision-farming/"><span>cheaper</span></a> to produce foods, but <a href="http://www.healthatoz.com/healthatoz/Atoz/common/standard/transform.jsp?requestURI=/healthatoz/Atoz/dc/caz/nutr/obes/alert08032004.jsp"><span>no trend to actually eat less</span></a>;</li>
<li>companies make hybrid vehicles and consumers flock to them <a href="http://www.newcarpark.com/blog/?p=68"><span>without regard for the environmental manufacturing costs</span></a>, and even as these issues are solved and hybrids have less per-mile environmental impact, people will still drive too much;</li>
<li>the fast adoption of <a href="http://www.consumersearch.com/light-bulbs/compact-fluorescent-light-bulbs"><span>fluorescent light bulbs</span></a> (and <a href="http://www.consumersearch.com/light-bulbs/led-light-bulbs"><span>now LED</span></a>) and yet people still leave their lights on when they don&#8217;t need them.</li>
</ul>

<p>I tend to argue a point where I believe my point is right and the alternative is wrong. In this unique case, I find that the alternative is right, but just not "right enough." We could do better. We should do better. Think complete.</p>

<h2>Green Computing through Hardware Optimization</h2>

<p>So much focus is placed on making equipment (processors, RAM, storage) more energy efficient that people are losing sight of the bigger picture. Energy efficient equipment is certainly one piece of the puzzle. Unfortunately, too many people see that one piece as a <cite lang="fr">fait accompli</cite> in their energy conservation efforts.</p>

<p>At <a href="http://omniti.com/"><span>OmniTI</span></a> we&#8217;re always careful about fully understanding the power profile of the hardware we install. We are conservative and look for the most power efficient machines we can find that still meet our architectural requirements (which can vary wildly from component to component). Everyone should do this. IBM and HP and Intel are all telling you that you should do it and that they can help. Do it. Let them. But please, don&#8217;t stop there.</p>

<h2>Green Computing through Virtualization</h2>

<p>The next step that is popular in the efforts to save your wallet (and the planet) is consolidation. This is the philosophy that one of today&#8217;s machines is powerful enough to accomplish the goals of many of yesteryear&#8217;s machines. So, virtualize! Take the old machines, turn them into virtual servers and run them on one machine today. Virtualization (of one type or another) has many advantages, including ease of management, simpler disaster recovery, flexibility in technology selection, shorter provisioning times and the opportunity for consolidation.</p>

<p>Many of our engineers run <a href="http://www.virtualbox.org/"><span>VirtualBox</span></a> or <a href="http://vmware.com/"><span>VMware</span></a> to quickly launch the platform of their choice. They are allocated one machine each, so they only have the opportunity to use a certain number of watts. Virtualization makes their job a bit faster and a bit easier despite the user experience being ever-so-slightly slower than running native. This use of virtualization does not reduce energy consumption in any significant way, though it does increase individual productivity.</p>

<p>We have development environments that are managed by the operations team here that must resemble (as closely as is economically feasible) the production environment to which they deploy. We have many of these and they are all distinct, but not heavily loaded. It is feasible that consolidation could be used in this approach. Our actual situation is that we have to operate 40 isolated development environments. We do this on&#8230;</p>

<ul>
<li>Two $2300 1U machines</li>
<li><a href="http://en.wikipedia.org/wiki/Solaris_Containers"><span>Solaris Containers (Zones)</span></a> as the lightweight virtualization technology</li>
<li>at a run rate of about 200W each, which results in about 3.5 MW-hours per year</li>
</ul>

<p>If you considered the alternative naive implementation:</p>

<ul>
<li>40 1U machines</li>
<li>at a run rate of about 180W each, which results in about 63.1 MW-hours per year.</li>
</ul>

<p>We realize a savings of 59.6 megawatt-hours per year. Wow! Now, that is an utterly naive method. Instead, let&#8217;s look at a popular method like VMware ESX:</p>

<p>To run 40 VMware instances&#8230;</p>

<ul>
<li>I need some substantially bigger hardware at 2GB of RAM per instance (Solaris containers and other similar technologies have some memory sharing efficiencies).</li>
<li>We only have 40 instances here, so going the blade center route seems less compelling.</li>
<li>An <a href="http://www-03.ibm.com/systems/x/hardware/rack/x3650/index.html"><span>IBM x3650</span></a> should be able to manage <a href="http://www.google.com/search?q=IBM+vmware+sizing+guide&amp;ie=utf-8&amp;oe=utf-8&amp;aq=t&amp;rls=org.mozilla:en-US:official&amp;client=firefox-a"><span>about six instances</span></a> (which aren&#8217;t peak and can afford some occasional performance degradation).</li>
<li>With seven of these at 230W each, we burn 14.1 MW-hours per year.</li>
<li>This assumes you use local storage. If you need a SAN, you&#8217;ll have to add that into the power profile too.</li>
</ul>

<p>One can say they burn 14 megawatt-hours per year instead of 63! But to me, burning 3.5 MW-hours is even better. Now, for those financially responsible types, I&#8217;ve only spoken to recurring operational costs. If you run the numbers on initial capital investment you&#8217;ll see even more significant savings by simply choosing the right tool for the job (between $80k and $100k by our internal calculations).</p>
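
<p>The annual figures quoted above are just run rate multiplied by the hours in a year. The short Python sketch below reproduces them, assuming each quoted wattage is per machine and that every machine runs around the clock at a constant draw.</p>

```python
HOURS_PER_YEAR = 24 * 365  # 8760


def annual_mwh(machines, watts_each):
    """Energy used in a year, in megawatt-hours, at a constant run rate."""
    return machines * watts_each * HOURS_PER_YEAR / 1_000_000


containers = annual_mwh(2, 200)  # two 1U machines running Solaris Containers
naive = annual_mwh(40, 180)      # one 1U machine per environment
esx = annual_mwh(7, 230)         # seven larger virtualization hosts

print(round(containers, 1), round(naive, 1), round(esx, 1))
print(round(naive - containers, 1), round(esx - containers, 1))
```

<p>Note that the results are energy (megawatt-hours per year), not power. The differences work out to roughly 59.6 MW-hours per year against the naive layout and 10.6 against the seven-host layout.</p>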

<p>This isn&#8217;t to say that you should never use VMware or a similar heavy-weight virtualization technology. Those technologies afford you specific advantages (like the ability to run entirely different operating systems in each instance). You could also consider something slightly lighter-weight like <a href="http://xen.org/"><span>Xen</span></a>. But if you find that your virtualization requirements on Solaris will fit in the Containers model (or your Linux needs would be satisfied by <a href="http://wiki.openvz.org/Main_Page"><span>OpenVZ</span></a>), you stand to gain a lot. We only have 40 instances, and the choice saved us more than 10 megawatt-hours per year over the next best virtualization solution. Imagine if you had 1000.</p>

<p>These concepts are not likely to be foreign to any reader. Most people have considered virtualization approaches along with hardware replacement to reduce energy costs. But please, don&#8217;t stop there.</p>

<h2>Green Computing through Performance Optimization</h2>

<p>When I look to virtualization technologies for consolidation, there is one requirement&#8201;&#8212;&#8201;a single machine has enough horsepower to power more than a single virtual instance. At OmniTI we deal with some large Internet architectures that serve millions upon millions of people. The bottom line is, I can completely saturate any piece of hardware you give me. There is no opportunity for consolidation in many of these architectures. The awful thing is that I see people choose hardware that is more energy efficient and simply leave it at that. The logical conclusion everyone has arrived at is: "if I can get the same CPU cycles and I/O operations for fewer watts, I win!" Yes, you win. No, this is not the conclusion of anything. It is the beginning. I hope your ultimate goal is not to spend CPU cycles; it is to service users. The obvious progression from here is: "if I can serve the same number of users with fewer CPU cycles and I/O operations, I win!" Now we&#8217;re getting somewhere. That statement starts with the end in mind. This is the land of performance optimization.</p>

<p>I usually try to explain concepts through metaphors and analogies, but this multi-resolutioned efficiency concept was a hard one to translate. So hard, that I&#8217;m at a loss. Those who know me well will say: "Theo without a clever analogy at hand?! That&#8217;s like Denis Leary without a vulgar rant." Alas, I&#8217;ll just give some examples.</p>

<ul>
<li>We increased both the functionality and the performance in <a href="https://labs.omniti.com/trac/fastxsl"><span>core XSLT technologies</span></a> for <a href="http://www.friendster.com/"><span>Friendster</span></a> and we enabled them to increase system performance by a factor of over 2.5. That translates to 60% less hardware or 2.5 times as many users. Armed with that, they chose to <a href="http://news.cnet.com/8301-13577_3-9783671-36.html"><span>enter China</span></a>.</li>
<li>We developed a purpose-built content publishing system for <a href="http://ngm.nationalgeographic.com/"><span>National Geographic Magazine</span></a> and were able to deploy an infrastructure of less than half the size (less than half the power) of the leading competitive offering. This architecture was able to sustain several prolonged front-page exposures on <a href="http://msn.com/"><span>msn.com</span></a>&#8201;&#8212;&#8201;delivering, at peak, as many as 3000 <i>new</i> visitors per second.</li>
<li>We developed the <a href="http://messagesystems.com/"><span>Message Systems MTA</span></a> that helps the largest of the large ISPs handle incoming mail volume with as much as 80% reduction in infrastructure when replacing competing commercial incumbents and as much as 95% reduction when replacing open source incumbents.</li>
</ul>
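
<p>The hardware figures above follow from simple arithmetic: if each machine does 2.5 times the work, the same load needs only 1/2.5 of the machines. Here is a minimal sketch in Python, assuming the load stays constant and capacity scales linearly with machine count; the function name is made up for illustration.</p>

```python
def hardware_reduction(speedup):
    """Fraction of machines no longer needed to carry the same load."""
    if not speedup > 0:
        raise ValueError("speedup must be positive")
    return 1.0 - 1.0 / speedup

# A 2.5x performance gain, as in the Friendster example above.
print(f"{hardware_reduction(2.5):.0%}")  # 60%
```

<p>Read the other way, under the same assumptions an 80% reduction in infrastructure corresponds to roughly a five-fold efficiency gain, and a 95% reduction to roughly twenty-fold.</p>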

<p>The goal is to get where you are going while spending less. Less of what? Less money, less power, less heat, less CPU cycles, less, less, less. Less of everything. Not only is it better for our planet, it&#8217;s simply cheaper. Don&#8217;t excessively or wastefully use resources. Be responsible: conserve.</p>
]]></content:encoded>
            <pubDate>Tue, 20 Jan 2009 22:13:50 GMT</pubDate>
        </item>
        <item>
            <title>Dissecting Today&#039;s Internet Traffic Spikes</title>
            <link>http://omniti.com/seeds/dissecting-todays-internet-traffic-spikes</link>
            <guid>http://omniti.com/seeds/dissecting-todays-internet-traffic-spikes</guid>
            <description><![CDATA[Today&#8217;s Internet has changed quite a bit from the Internet I used to know.  The Internet has always been successful because of net neutrality.  What&#8217;s net neutrality?  It&#8217;s complicated, but essentially it means that anyone anywhere ca...]]></description>
            <content:encoded><![CDATA[<p>Today&#8217;s Internet has changed quite a bit from the Internet I used to know.  The Internet has always been successful because of net neutrality.  What&#8217;s net neutrality?  It&#8217;s complicated, but essentially it means that anyone anywhere can publish with equal rights.  These aren&#8217;t the kind of rights people usually talk about&#8230; I&#8217;m not speaking of freedom of speech.  Instead, I&#8217;m talking about content being simply bits.  It doesn&#8217;t matter if it comes from <a href="http://cnn.com/"><span>CNN</span></a> or <a href="http://lethargy.org/"><span>my personal blog</span></a>, you as a reader can download the bits that make up the pages you see without bias or preferential treatment.  This makes it darn easy to be a publisher and leads to a fabulous ecosystem with an overwhelming amount of varied content.  However, with more content it is easy to recognize that much of it is utter trash.  Yes. Yes. I know that one man&#8217;s trash is another man&#8217;s treasure.  However, it presents opportunities for sites that help you navigate the wasteland.</p>

<p>Many popular sites today are popular because they link to articles and news items and photographs and movies all over the Internet; they are "interest aggregation services."  And while the Internet has (for now) a decent preservation of net neutrality when it comes to simple web content, not all publishers are on equal footing.  Not long ago, anyone could run a server anywhere (their basement) with DSL or cable or (gasp) dial-up&#8202;&#8212;&#8202;now, the challenge is coping with unexpected attention.</p>

<p>Years ago, the site <a href="http://slashdot.org/"><span>slashdot</span></a> coined a term "slashdotted" which meant that a site received so much sudden traffic that service degraded beyond an acceptable point and the site was effectively unavailable.  This often happened to sites that were at the end of small pipes (DSL, T1, etc.) and occasionally (though rarely) due to bad engineering.  While slashdot might have coined the term, they simply don&#8217;t have the viewership numbers that other large sites today have.</p>

<p>At <a href="http://omniti.com/"><span>OmniTI</span></a>, I work on sites that aren&#8217;t on the end of T1 lines.  Sites with gigabits or tens of gigabits of connectivity.  Sites with 50 million or more users.  Sites powered by thousands of machines. I also work on sites that service millions of people from just a handful of machines (efficiency certainly has its advantages sometimes).  I find it particularly interesting that already popular sites (with significant baseline bandwidth) are seeing these unexpected surges.  For a long time, my blog has been on this same machine which is a vhost for several other web sites.  I&#8217;ve had traffic spikes from places like slashdot, reddit, digg, etc.  And, no surprise, I couldn&#8217;t actually see the bandwidth jump on the graphs&#8230; 10Mbits to 11Mbits?  That&#8217;s not a spike.</p>

<p>Things are changing.  Sites like <a href="http://digg.com/"><span>Digg</span></a> are becoming ever more popular and people are drawn to them as a means of sifting the waste of the Internet.   This means as more people rely on <a href="http://digg.com/"><span>Digg</span></a> and <a href="http://reddit.com/"><span>Reddit</span></a> and other similar sites, the number of unexpected viewers of your content can rise more sharply.</p>

<p>What does all of this mean?  It means that the old rule of thumb that your infrastructure should see 70% resource utilization at peak is starting to falter.  The typical trends used to look like this (this is last week&#8217;s graph from a retail client with a user base of 3 million):</p>

<div style="text-align: center;"><img style="border: 1px solid rgb(200, 200, 200); padding: 4px; text-align: center; display: block; max-width: 800px;" src="http://images.omniti.net/omniti.com/i/b/boringtrend.png" alt="" /></div>

<p>We see a nice peak, a nice valley.  Thursday afternoon, we see a nice traffic spike.  Well, this used to be what I called a traffic spike.  Now, different services have different spike signatures.  It resembles the traffic model of classic Internet advertising, except that there is genuine interest and thus dramatically higher conversion rates.  It&#8217;s a simple combination of placement, frequency and exposure.  Because content, unlike ad banners, exists for an extended period of time (sometimes forever), the frequency is very high.  Digg and Reddit have excellent placement with very little exposure (things move out quickly).  A site like CNN or NYTimes usually provides mediocre placement (unless you are on the front page) and excellent exposure.</p>

<p>Lately, I see more sudden eyeballs and what used to be an established trend seems to fall into a more chaotic pattern that is the aggregate of different spike signatures around a smooth curve.  This graph is from two consecutive days where we have a beautiful comparison of a relatively uneventful day followed by a long-exposure spike (nytimes.com) compounded by a short-exposure spike (digg.com):</p>

<div style="text-align: center;"><img style="border: 1px solid rgb(200, 200, 200); padding: 4px; text-align: center; display: block; max-width: 800px;" src="http://images.omniti.net/omniti.com/i/b/spikesdissected.png" alt="" /></div>

<p>The disturbing part is that this occurs even on larger sites now due to the sheer magnitude of eyeballs looking at today&#8217;s already popular sites.  Long story short, this makes planning a real bitch.</p>

<p>And the interesting thing is perspective on what is large&#8230;  People think Digg is popular&#8202;&#8212;&#8202;it is.  The <a href="http://nytimes.com/"><span>New York Times</span></a> is too, as are CNN and most other major news networks&#8202;&#8212;&#8202;if they link to your site, you can expect to see a dramatic and very sudden increase in traffic. And this is just in the United States (and some other English-speaking countries)&#8230; there are others&#8230; and they&#8217;re kinda big.</p>

<p>What isn&#8217;t entirely obvious in the above graphs?  These spikes happen inside 60 seconds.  The idea of provisioning more servers (virtual or not) is unrealistic.  Even in a cloud computing system, getting new system images up and integrated in 60 seconds is pushing the envelope and that would assume a zero second response time.  This means it is about time to adjust what our systems architecture should support.  The old rule of 70% utilization accommodating an unexpected 40% increase in traffic is unraveling.  At least eight times in the past month, we&#8217;ve experienced from 100% to 1000% sudden increases in traffic across many of our clients.</p>
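
<p>To put numbers on that rule of thumb, here is a rough sketch in Python. It assumes utilization grows linearly with traffic, which real systems only approximate, and the function names are invented for illustration.</p>

```python
def headroom(peak_utilization):
    """Largest traffic increase (as a fraction of peak) that fits before 100%."""
    return 1.0 / peak_utilization - 1.0

def utilization_after(peak_utilization, increase):
    """Utilization demanded after a sudden increase; above 1.0 means overload."""
    return peak_utilization * (1.0 + increase)

print(f"{headroom(0.70):.0%}")                 # 43%: roughly the "40%" in the old rule
print(f"{utilization_after(0.70, 1.0):.0%}")   # 140% after a 100% increase
print(f"{utilization_after(0.70, 10.0):.0%}")  # 770% after a 1000% increase
```

<p>Under that linear assumption, even the mildest of the spikes described above asks for well over the capacity that the 70% rule leaves in reserve.</p>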

<p>I talk about scalability a lot.  It&#8217;s my job.  It&#8217;s my passion.  I regularly emphasize that scalability and performance are truly different beasts.  One key to scalability is that a "systems design" scales.  Architectures are built to be able to scale, they are not built "at scale."  It&#8217;s just too expensive to build a system to serve a billion people (until you have a billion people).  It&#8217;s cheap to <em>design</em> a system to serve a billion people.  Once you have a billion people accessing your site, you can likely justify executing on your design.  Google is successful for this reason: their ideas scale and they can build into them as demand rises.  On the flip side, traffic anomalies in the form of spikes are unexpected (by their definition) and scaling a system out to meet the <em>unexpected</em> demand is almost unreasonable.  I would even argue that it is more of a performance-centric issue.  I want every asset I serve to be as cheap to serve as possible allowing me to handle larger and larger spikes.</p>

<p>The reason I find all of this stuff interesting is that understanding <a href="http://omniti.com/does/scalability-and-performance"><span>performance and scalability</span></a>, understanding the <a href="http://omniti.com/writes/scalable-internet-architectures"><span>principles of scalable systems design</span></a> and having <a href="http://omniti.com/does/scalability-and-performance/process"><span>sound and efficient processes for handling performance issues</span></a> are becoming crucial for sites regardless of their size.  This takes insight and practice and it reminds me of Knuth&#8217;s famous saying:</p>

<blockquote><p>We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.</p></blockquote>


<p>That&#8217;s all well and good, but which 97% of the time?  My response to Knuth&#8217;s statement (with which I completely agree) is:</p>

<blockquote><p>Understanding what is and isn&#8217;t "premature" is what separates senior engineers from junior engineers.</p></blockquote>


<p>Let&#8217;s add perspective on the word "sudden."  Most network monitoring systems poll SNMP devices (like switches, load-balancers, and hosts) once every five minutes (we do this every 30 seconds in some environments).  Some people say, "my site scales! bring it on." We see these spikes happen inside 60 seconds and they occasionally induce a ten-fold increase over trended peaks.  Often, this spike can be well underway for several minutes before your graphing tools even pick up on it.  Then, before you have time to analyze, diagnose and remediate&#8230; poof&#8230; it&#8217;s gone.  Be careful what you wish for.</p>

<p>This, in many ways, is like a tornado.  Our ability to predict them sucks.  Our responses are crude and they are quite damaging.  However, predicting these Internet traffic events isn&#8217;t even possible&#8202;&#8212;&#8202;there are no building weather patterns or early warning signs.  Instead we are forced to focus on different techniques for stability and safety.  The idea of a DoS, a DDoS or the sometimes similar signature of a sudden popularity spike doesn&#8217;t increase my heart rate anymore&#8202;&#8212;&#8202;it&#8217;s just another day on the job.  However, I thought I&#8217;d share the four guidelines that I believe are key to my sanity in these situations:</p>

<ol>
<li><em>Be Alert</em>: build automated systems to detect and pinpoint the cause of these issues quickly (in less than 60 seconds).</li>
<li><em>Be Prepared</em>: understand the bottlenecks of your service systemically.  Understand your site inside and out.  Contemplate how you would respond if a specific feature or set of features on your site were to get "suddenly popular."</li>
<li><em>Perform Triage</em>: understand the importance of the various services that make up your site.  If you find yourself in a position to sacrifice one part to ensure continued service of another, you should already know their relative importance and not hesitate in the decision.</li>
<li><em>Be Calm</em>: any action that is not analytically driven is a waste of time and energy. Be quick, not rash.</li>
</ol>
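
<p>As an illustration of the first guideline, here is a minimal sketch of an automated spike check in Python. Everything in it is an assumption made for the example: the class name, the window size, the 2x threshold and the ten-second sampling interval. A real system would feed it from SNMP counters or request logs and would go on to pinpoint the cause, which this sketch does not attempt.</p>

```python
from collections import deque

class SpikeDetector:
    """Flags a reading that exceeds a multiple of the recent average."""

    def __init__(self, window=20, threshold=2.0):
        self.samples = deque(maxlen=window)  # recent "normal" readings
        self.threshold = threshold           # multiple of baseline that counts as a spike

    def observe(self, rate):
        """Record one reading (e.g. Mbit/s); return True if it looks like a spike."""
        if self.samples:
            baseline = sum(self.samples) / len(self.samples)
        else:
            baseline = rate  # first reading: nothing to compare against yet
        is_spike = rate > baseline * self.threshold
        if not is_spike:
            # Keep spike readings out of the baseline so a long spike stays flagged.
            self.samples.append(rate)
        return is_spike

detector = SpikeDetector(window=6, threshold=2.0)
readings = [20, 22, 19, 21, 20, 200, 180, 21]  # hypothetical Mbit/s, one every 10 seconds
flags = [detector.observe(r) for r in readings]
print(flags)  # [False, False, False, False, False, True, True, False]
```

<p>Sampling every ten seconds rather than every five minutes is what lets a check like this fire while the spike is still in progress.</p>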

<p>Back to those other countries&#8230; Enter China and their recently lessened censorship and we have a looming tidal wave for smaller sites that achieve sudden popularity.  Spikes of several hundred megabits per second are difficult to account for when your normal trend is around twenty megabits per second.  The following graph is traffic induced from a link from a popular foreign news site (that I can&#8217;t read).  I call it "ouch":</p>

<div style="text-align: center;"><img style="border: 1px solid rgb(200, 200, 200); padding: 4px; text-align: center; display: block; max-width: 800px;" src="http://images.omniti.net/omniti.com/i/b/spikechina.png" alt="Graph showing a sharp rise in traffic with a long tail." /></div>]]></content:encoded>
            <pubDate>Thu, 15 Jan 2009 18:36:51 GMT</pubDate>
        </item>
    </channel>
</rss>

