Archive for the 'Geek' Category

Towards the Semantic Web Friday, March 7th, 2008 by Matt

Imagine a world where much of the data that’s flowing around the net takes on meaning, thereby becoming knowledge. Imagine a world where a system in Germany can infer that I’m the uncle of Wade, based on data from a system in the US indicating that my brother Page is the father of Wade. Imagine a world where an online product review from Steve is displayed for me ahead of the other 500 reviews, because of inferred trust derived from the knowledge that both Steve and I happen to share a common friend, you.

Although we’re still a long way off, recent events — in which MakaluMedia staff have played an important part — have brought us a few steps closer to such a world.

MakaluMedia hacker and researcher Arto Bendiken has long been interested in distributed systems, and “information about information”, and naturally developed an interest in the Resource Description Framework (RDF). RDF, in short, is a technology that allows the representation of data as “knowledge”. If two independent systems store their data in RDF, and share common semantic “vocabularies”, then the two systems can effectively share their “knowledge”. What does RDF look like? A simple example has been taken from this Quick Introduction to RDF.

  @prefix : <http://www.example.org/> .
  :john    a           :Person .
  :john    :hasMother  :susan .
  :john    :hasFather  :richard .
  :richard :hasBrother :luke .
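To make the inference idea concrete, here is a minimal sketch in plain Python (deliberately not an RDF library) that treats the statements above as subject-predicate-object triples and derives an uncle relationship from them. The `hasUncle` predicate and the `infer_uncles` helper are assumptions made up for this illustration; they are not part of the quoted introduction or of any standard vocabulary.

```python
# Triples mirroring the example above, as (subject, predicate, object).
TRIPLES = {
    ("john", "a", "Person"),
    ("john", "hasMother", "susan"),
    ("john", "hasFather", "richard"),
    ("richard", "hasBrother", "luke"),
}

PARENT_PREDICATES = {"hasMother", "hasFather"}


def infer_uncles(triples):
    """Derive (child, 'hasUncle', uncle) triples: a parent's brother is an uncle.

    'hasUncle' is a hypothetical predicate used only for this illustration.
    """
    inferred = set()
    for child, pred, parent in triples:
        if pred not in PARENT_PREDICATES:
            continue
        # Look for statements saying that this parent has a brother.
        for subj, pred2, brother in triples:
            if subj == parent and pred2 == "hasBrother":
                inferred.add((child, "hasUncle", brother))
    return inferred


if __name__ == "__main__":
    print(infer_uncles(TRIPLES))
```

The point is that neither system needs to store the uncle statement explicitly; it follows from the shared vocabulary and the data each side already has.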

As it happens, our Drupal team, led by Arto, has been working for a few years on a project with our colleagues at M.C. Dean and Raincity Studios to develop a sophisticated collaboration and communication platform for the US government, based on Drupal. (It certainly represents one of the largest and most complex Drupal instances in the world.) This platform presently supports more than 60 international clients, servicing use cases ranging from policy definition collaboration, to natural crisis management, to school operations on the African continent.

This project represents a natural fit for RDF technology, given the value realized in sharing “knowledge”, not just “data”, between the various instances of the platform, as well as the growing number of other RDF-enabled systems around the world. Towards this end, the project team has been working intensively during the past months to design, develop and begin to integrate an RDF storage, management and access framework into Drupal. And since a primary objective of this project is to release the developed products as freely available open-source software, much of this RDF work can be tracked and accessed from the Drupal RDF project page.

Given that Drupal forms the core technology of the platform, the project team naturally maintains a close relationship with its founder and leader, Dries Buytaert. In guiding the evolution of the Drupal platform, Dries has always demonstrated a willingness to take bold steps in the direction of progress, and this has been evidenced once again this week. In his keynote speech at the Drupalcon Boston 2008 conference, Dries made the big announcement that future versions of Drupal (beginning with version seven) will be based on RDF.

Drupal presently dominates the market of open-source content management systems, and so this announcement represents a huge step toward building a truly “Semantic Web”. If interested, you can read various reactions from the blogosphere at Network World and SitePoint.

We are tremendously proud to have been a part of this progress, and look forward to continuing work towards a world of networked knowledge.

Catalog Choice registers half a million users! Monday, January 28th, 2008 by Matt

Catalog Choice on the Today Show.

On January 24, Catalog Choice saw its biggest day yet, when it was covered in a fantastic piece on NBC’s “The Today Show”.

Over the course of the day, the catalogchoice.org website saw over two million page views, and registered 60,000 new user accounts, bringing the total number of registered users, three days later, to over 500,000!

In addition, “Catalog Choice” was the number one search term for the day on Google.

Coping with the traffic.

Coping with a sudden increase in traffic, orders of magnitude more than typical, was a challenge. The front-end web application servers quickly became overloaded, and later the back-end DB server became overloaded as well (we were servicing over 2,000 DB queries per second!). Since it’s still not possible (with our hosting providers, at least) to bring on additional servers on demand, we quickly made several modifications to the application:

  1. We made a number of layout modifications in the application that would allow us to cache content to a far greater extent.

  2. These same modifications also targeted the reduction of DB queries.

With these modifications, we were able to cope well with the secondary traffic surge.
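The post does not describe how the caching was implemented (and the site itself ran on Rails), so the following Python sketch only illustrates the general idea behind both modifications: content that is the same for every visitor is computed once and then served from memory for a while, so the database is not queried on each page view. The `TTLCache` class and the `load_catalog_list` function are hypothetical names invented for this example.

```python
import time


class TTLCache:
    """Cache results of an expensive lookup for a fixed number of seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}  # key -> (expiry_time, value)

    def fetch(self, key, compute):
        """Return the cached value for key, calling compute() only on a miss."""
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = compute()
        self._store[key] = (now + self.ttl, value)
        return value


# Hypothetical stand-in for a database query; counts how often it runs.
query_count = 0


def load_catalog_list():
    global query_count
    query_count += 1
    return ["catalog-a", "catalog-b"]


cache = TTLCache(ttl_seconds=60)
for _ in range(1000):  # 1,000 simulated page views
    catalogs = cache.fetch("catalog_list", load_catalog_list)
```

In this sketch the 1,000 simulated page views trigger a single call to the stand-in query. The trade-off is staleness: visitors may see content that is up to 60 seconds old, which is why layout changes that separate shared content from per-user content make caching far more effective.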

Lessons learned.

It’s quite possible that Catalog Choice is now one of the largest Ruby on Rails applications running on the internet, in terms of number of users. Over the past few months of operation, we’ve learned some lessons:

  1. Although not related to Rails, we’ve learned that it’s a good idea, especially for a site with this broad a user base, to be conservative in the use of client-side technology. When originally launched, the site had elegant page transitions, a live type-ahead catalog finder, and other similar UI features, all done with JavaScript (AJAX) in a way that gave the site a desktop-application feel. We considered this acceptable practice, as we were designing for IE 6/7, Safari 2/3 and Firefox 2/3.

    However, when you have 500,000 users, even 1% on older browsers represents quite a large crowd! So we’ve since modified the site to work in a far more traditional manner, relying very little on client-side JavaScript, and where necessary, degrading very gracefully.

  2. For hosting, our infrastructure, like many these days, is based on virtual machines. We have N front-end web application servers, each practically maxed out in terms of CPU and memory. Based on the experience with the Today Show traffic, we’re thinking now that it might be better to have 2N front-end servers, each with half the CPU and memory, since it’s a lot easier to quickly add CPU and memory to an existing server (to meet demand) than it is to bring on additional VMs. (This assumes that 2N front-end servers with half the CPU and memory are roughly comparable in cost to N servers with double the CPU and memory, which might not be the case.)
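The arithmetic behind the second lesson can be sketched as follows. The server count and VM sizes are hypothetical numbers chosen only to show the shape of the argument, and the `burst_capacity` helper is invented for this example; it is not a measurement of the actual infrastructure.

```python
def burst_capacity(vm_count, vm_size, max_vm_size):
    """Return (current, after_resize) total capacity for a fleet of equal VMs.

    Sizes are in arbitrary units of CPU and memory. Resizing in place can
    grow each VM only up to max_vm_size; adding VMs is not modeled here
    because that is the slow step.
    """
    current = vm_count * vm_size
    after_resize = vm_count * max_vm_size
    return current, after_resize


N = 4          # hypothetical number of front-end servers
MAX_SIZE = 8   # hypothetical largest VM size the host offers

# N servers already at the maximum size: no room to grow in place.
print(burst_capacity(N, MAX_SIZE, MAX_SIZE))           # (32, 32)

# 2N servers at half the size: same capacity today, double after resizing.
print(burst_capacity(2 * N, MAX_SIZE // 2, MAX_SIZE))  # (32, 64)
```

Both layouts start with the same total capacity, but only the second can double it without provisioning new machines.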

It has been a very exciting experience to watch the site grow, analyze the usage patterns, and adjust the application and its user interface, not only to improve usability and the user experience, but also to adapt to the changing user profiles (i.e., now that over 500,000 of our visitors are no longer first-timers, and that we have over 1,000 merchants in the system).

How the site is doing.

When the site first launched, the consumer response was (and continues to be) nothing short of amazing. It is clear that this site is meeting a very big need in the United States: the reduction of unwanted paper catalogs. The industry’s response was, as expected, lukewarm, especially after the Direct Marketing Association (the DMA) issued an email to all its members urging them to “Just say no!” to Catalog Choice.

However, with half a million vocal consumers behind it, Catalog Choice has become an influential heavyweight. A website feature we launched last week alerts users to which specific merchants have refused to honor their opt-out requests, and provides each merchant’s customer support telephone number, just in case the consumer would like to give them a call. Within 24 hours, merchants who had been inundated with phone calls from angry customers were changing their minds :-)

A misconception in the industry (promoted by the DMA) is that Catalog Choice seeks to do away with catalogs altogether. That couldn’t be further from the truth. Catalog Choice is about doing away with just those catalogs that are unsolicited and unwanted.

All in all, Catalog Choice has been a fantastic project for MakaluMedia. We’re fortunate to be one of very few companies having the opportunity to build and operate such a large-scale Rails site, and a site that serves such a meaningful social purpose!

Spam Filters, Alien Technology and Ruby on Rails Wednesday, July 5th, 2006 by Arto

Lisp - made with secret alien technology

When Paul Graham’s A Plan for Spam made its dramatic entrance into the anti-spam battle four years ago, it heralded the beginning of the end for spam — as we knew spam back then, anyway. Applying a simple statistical approach, based on word frequency analysis with a naive Bayesian classifier, Graham described how to create a spam filter accurate enough (99+%) that false positives effectively ceased to be an issue.
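For readers who have not seen the technique, the following is a minimal sketch of word-frequency classification with a naive Bayesian classifier. It is a textbook variant with add-one smoothing, not Graham's exact formula (his essay combines per-token probabilities in its own way), and the tiny training messages are made up for illustration. It assumes at least one message has been trained for each of the two classes.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word-like tokens."""
    return re.findall(r"[a-z0-9$'-]+", text.lower())


class NaiveBayesFilter:
    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    def train(self, label, text):
        """Record the word frequencies of one message labeled 'spam' or 'ham'."""
        self.counts[label].update(tokenize(text))
        self.messages[label] += 1

    def spam_probability(self, text):
        """Estimate P(spam | text); requires both classes to have been trained."""
        vocabulary = set(self.counts["spam"]) | set(self.counts["ham"])
        total_messages = sum(self.messages.values())
        log_score = {}
        for label in ("spam", "ham"):
            total_words = sum(self.counts[label].values())
            # Prior from message counts, then add-one smoothed word likelihoods.
            score = math.log(self.messages[label] / total_messages)
            for token in tokenize(text):
                count = self.counts[label][token] + 1
                score += math.log(count / (total_words + len(vocabulary)))
            log_score[label] = score
        # Convert the two log scores into a probability, clamped to avoid overflow.
        diff = max(-700.0, min(700.0, log_score["ham"] - log_score["spam"]))
        return 1.0 / (1.0 + math.exp(diff))


spam_filter = NaiveBayesFilter()
spam_filter.train("spam", "cheap pills buy now")
spam_filter.train("spam", "buy cheap watches now")
spam_filter.train("ham", "meeting notes for the project")
spam_filter.train("ham", "lunch with the project team")

print(spam_filter.spam_probability("buy cheap pills"))        # well above 0.5
print(spam_filter.spam_probability("project meeting notes"))  # well below 0.5
```

The classifier is "naive" because it treats every word as independent of the others, which is false for real language but works surprisingly well in practice once enough mail has been seen.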

The central idea of the Graham Algorithm was quickly adopted en masse by spam filters, and as a result, the spam arms race has in the past few years tipped in favor of the good guys. “Successful” spam has devolved into exactly what Graham predicted it would: “some completely neutral text followed by a URL.” For me personally, the combination of good server- and client-side filters has made spam yesterday’s problem. (Well, that, and using Gmail as a front-end for lower-priority e-mail; spam all you want, it’s Google’s problem and they’re up to the task.)

Recently at MakaluMedia, we’ve succeeded in applying similar text classification principles to another unrelated problem area, with the intent of forcing the computer to do the tedious job it was invented for, allowing us super-apes, in turn, to spend more time under a palm tree on the nearby beach, sipping tinto de verano and working out answers to deep existential questions, or whatever else it is that one does on the beach (note to self: need more practice).

The exact details of this covert project will have to await its escape into the wild, should it ever evolve the capability for that. For the time being, some of the technically more interesting tidbits will have to do as fodder for my ramblings.

First, as with most of our internal development, and an ever-increasing percentage of our client projects, this system was developed using the high-productivity Ruby on Rails framework, and reached the magical 0.1.0 mark (i.e., pre-alpha, but usable enough to solve many of the developers’ own needs) in less than a man-week of intensive coding (not to forget the skimming of a good number of research papers related to the subject).

However, to ensure a permanent gap on the competition (after all, it sometimes seems like half the world has already jumped onto Rails), we also pulled out the big guns: the top-secret alien technology known as Lisp.

(Don’t be fooled by the devious code name, intended to confound us earthlings with spurious ideas relating to speech defects — this is seriously powerful stuff: exposure is guaranteed to subtly but permanently alter your brain structure in ways not yet fully understood. In fact, the aliens have theorized that the Universe may actually be one giant Quantum Lisp machine, explaining how it is possible that lots of irritating, seemingly superfluous parentheses can act as magic incantations conveying an apparently inexhaustible power as per the principle of Clarke’s Third Law. But that’s neither here nor there for our present purposes.)

In our case, we simply integrated into the Rails system an interpreter for a subset of Scheme, a Lisp derivative, thus no doubt confirming Greenspun’s Tenth Rule once again. (Well, to be fair, the Lisp interpreter in question is only a few hundred lines of fairly elegant Ruby code.)

The system’s top-level classification and scoring algorithms are implemented in this Lisp subset, allowing us to easily fine-tune and try out new tweaks at runtime, and perhaps in the future letting us semi-automatically pit various competing implementations against each other in a manner not dissimilar to a genetic algorithm.
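To give a feel for how small such an interpreter can be, here is a minimal sketch of an evaluator for a tiny Scheme-like subset. It is written in Python rather than Ruby and is not the interpreter described above; the supported forms (`define`, `lambda`, `if`, and a handful of arithmetic primitives) and the `score` function in the usage example are assumptions chosen purely for illustration.

```python
import math
import operator


def tokenize(source):
    """Split source text into parentheses and atoms."""
    return source.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens):
    """Build a nested-list syntax tree, consuming tokens from the front."""
    token = tokens.pop(0)
    if token == "(":
        expr = []
        while tokens[0] != ")":
            expr.append(parse(tokens))
        tokens.pop(0)  # discard ")"
        return expr
    try:
        return float(token)  # all numbers are treated as floats
    except ValueError:
        return token  # anything else is a symbol


GLOBAL_ENV = {
    "+": operator.add, "-": operator.sub,
    "*": operator.mul, "/": operator.truediv,
    ">": operator.gt, "<": operator.lt,
    "log": math.log,
}


def evaluate(expr, env):
    """Evaluate a parsed expression in the given environment (a dict)."""
    if isinstance(expr, str):          # symbol lookup
        return env[expr]
    if not isinstance(expr, list):     # number literal
        return expr
    head = expr[0]
    if head == "if":
        _, test, then, otherwise = expr
        return evaluate(then if evaluate(test, env) else otherwise, env)
    if head == "define":
        _, name, value = expr
        env[name] = evaluate(value, env)
        return None
    if head == "lambda":
        _, params, body = expr
        return lambda *args: evaluate(body, {**env, **dict(zip(params, args))})
    func = evaluate(head, env)         # ordinary function application
    return func(*[evaluate(arg, env) for arg in expr[1:]])


def run(source, env):
    """Parse and evaluate a single expression."""
    return evaluate(parse(tokenize(source)), env)


env = dict(GLOBAL_ENV)
run("(define score (lambda (hits total) (/ hits total)))", env)
first = run("(if (> (score 9 10) 0.5) 1 0)", env)

# Swap in a different scoring rule at runtime, without touching the host code.
run("(define score (lambda (hits total) (/ hits (+ total 10))))", env)
second = run("(if (> (score 9 10) 0.5) 1 0)", env)
```

Here `first` evaluates to 1.0 and `second` to 0.0: the same decision expression gives a different answer once the scoring rule is redefined, which is the kind of runtime tweaking that keeping the top-level logic in an embedded language makes cheap.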

Due to the combined RAD-factors of Rails and Lisp, we quickly proceeded through a number of intermediate prototypes along the way, starting from a short-lived Ferret-based implementation, evolving to a hand-rolled tokenizer and SQL-backed corpora storage, and eventually ending up with the current version that delegates the content classification to a Dr. Strangelove-inspired piece of excellent software called the CRM114 Controllable Regex Mutilator.

(Speaking of CRM114, I was surprised to not find any existing Ruby bindings for it, and thus took the time to transcribe a previous Python wrapper into a Ruby version, to be released shortly.)

Anyway, the system appears to work more or less according to spec, but definitely still needs some more tweaking before embarking on world domination. For one thing, all this number crunching is, well, rather heavy (let’s just say it’s a good thing we have A/C in the server room where the development box is located). Although CRM114 itself is pretty light on its feet, we’re dealing with an exponentially growing data set, and the next challenge will be to put some checks on resource consumption.

So, for the time being, we’re not going to let the development box interact with our space systems department, to prevent any non-regulated growth or inadvertent contact with the aliens. More updates to follow as they happen.

Note to self: too many tintos de verano can make you forget there’s a fine line between tongue-in-cheek surrealism and plain-bad geek humor.

Update 2006/11/06: I’ve released the Ruby interface to CRM114 on RubyForge.

Gauging Reactions to the Slashdot Redesign Thursday, June 1st, 2006 by Arto

Alex’s win in the Slashdot CSS redesign contest has been making the rounds on the net.

S/D/R — The Big Three

Mere moments after the official announcement on Slashdot, the story made its way to Digg and Reddit. To date, Slashdot’s original announcement has garnered 852 comments. The Digg story has been “dugg” 1715 times and commented on 203 times, while at Reddit the story has gained 110 points.

While we have been receiving a constant stream of private congrats via e-mail, comments on all three sites cover the full spectrum from “love it” to “hate it”; the latter kind occasionally moving on to some disproportionately extreme reactions that may perhaps be a symptom of an excessive disconnect with Real Life(TM). Goes with the territory, and Alex is taking it all in stride, I hope. This much is obvious: had this been a vote, instead of CmdrTaco’s call, I doubt any single one of the proposed designs could’ve sustained a clear majority.

Unfortunately, what contributed to an initial negative backlash of sorts was the fact that when the story broke, the design preview was missing a number of elements, including the actual Slashdot logo itself. Slashdot staff quickly corrected the situation, but the posted comments show that a significant number of people thought the erroneous version was the final design, and were understandably upset.

The press release

OSTG’s press release was published on MarketWire, eventually being picked up by MSN Money as well. It includes this comment attributed to CmdrTaco:

“Alex Bendiken’s entry was selected because his design improved upon many shortcomings of Slashdot’s original design. His design moves commonly-used functions into positions of prominence, and improves the readability of articles. His entry required only minor changes to our core HTML, and breathes fresh life into a site that has remained aesthetically unaltered over its 8+ year lifespan,” said Slashdot founder and site director Rob Malda (aka Cmdr Taco).

Blogosphere reactions

The blogosphere has received the new Slashdot with open arms and an almost unequivocally favorable opinion:

  • Steve Bryant posts on his eWeek blog that he thinks the Slashdot redesign “looks pretty damn good. Contemporary, but not so much that it’ll be outdated soon”. He also comments that “all of Google-dom is filled with the name Alex Bendiken” (well, actually, it’s only like 600 entries at the moment, but you, dear reader, are more than welcome to add your contribution to the growing number…).
  • John Gruber of Daring Fireball likes the new design, calling it “a big improvement that preserves everything that’s good about the classic Slashdot brand.”
  • Rui Carmo of The Tao of Mac agrees with Gruber that the new design is “very slick indeed”.
  • An editorial in PHP Magazine calls Alex’s design “very nice and well done”, while drawing on their own previous experience to add that it’s no easy task to satisfy everyone. How true…
  • David A. Utter, staff writer at WebProNews, details the differences between the old and new designs, without neglecting the runner-up.
  • Ryan over at CyberNet News blogs: “I believe that Alex really deserved to win. From the bunch of redesigns that I saw his was the best. He kept the integral parts that makeup Slashdot but he also implemented a slick interface.”
  • Philipp Lenssen thinks the design “is cool. The font could be easier to read, tho”.
  • Phil Crissman likes the new look, adding “I’m sure I’ll see a lot of comments along the lines of It’s the same, only different (true), you just added round corners/gradients (no and yes — they already had some round corners), and more such complaints. I’m of the opinion that it retains the characteristic slashdotness of the design, but manages to make it look current. Good job, I say.”
  • Ronald Heft, Jr. states: “I personally love it. I’ve always hated the current design, and while the new one does resemble the current design, it greatly improves upon it. The site feels less jagged and seems like a more calming place.”
  • Michael Angeles thinks the new design “adds a good deal of white space around the margins by removing the black background and increases height between lines of text, which makes the left nav much easier on the eyes. The previous design always felt cramped to me.”, and goes on to ponder the merits of the font selection and the differences between Arial, Tahoma and Verdana.
  • BorkWeb labels Alex’s brainchild “a pretty clean and snazzy design”.
  • Scott Troyan blogs that “it looks nice. Very clean, retaining classic Slashdot elements, while rejecting the classic Slashdot ugly.”
  • Adrian Lee ponders what makes for an effective website and states the redesign “takes that general look, and makes it much smarter and cleaner. Much easier to read and skim over, generally nicer on the eyes and I don’t feel like my attention is pulled around as much. Generally I’m impressed.”
  • S. Shreyas calls the new design “a very decent layout and 100x better than the older one”, while criticizing aspects like the grey color and whitespace usage, and goes on to lament that the design by the runner-up, Peter, was slashdotted and unavailable for review.
  • Darren Foong is short and succinct: “it’s extremely awesome.”

International coverage

Here’s a quick sampling of reactions from the non-English part of the blogosphere:

More buzz in the blogosphere can be found at Technorati and, of course, Google.

Postscript to the Slashdot Effect Thursday, June 1st, 2006 by Arto

The actual slashdotting is now well over, and I’m glad to say we weathered the storm without any incidents. With the help of our sysadmin, Niall, we distributed the load across three dedicated servers in geographically diverse locations, and none of the boxes even broke a sweat.

However, that’s not to suggest the servers were idling; on the contrary, they were each servicing up to hundreds of requests per second. I usually keep a monitoring console open to the servers, and when CmdrTaco originally posted the announcement, it was immediately obvious that something had happened. The staff in our colocation facilities noticed, too: it took only minutes for the first e-mail alert to arrive. (I fancy they heard the dual-CPU fans suddenly spooling up to maximum effect, but of course, their monitoring systems just warned them of a possible DDoS attack.)

Here’s a nice bandwidth graph from one of our servers, covering the first 24 hours. I thought the start of the slashdotting might be obvious enough that I won’t bother to specifically point it out:

Slashdotting bandwidth usage

On the software side of things, I was especially pleased to discover that no adjustments to Lighttpd’s settings were necessary in order for it to handle massive concurrency.

The load averages remained very reasonable, under 15.0 even with the nightly full backup running, except for a human error that resulted in one of the servers becoming momentarily unresponsive with a load average of almost 500.0 (oops); this was remedied within a minute.

Unfortunately, we didn’t get to test our server systems’ capacity to the fullest under a simultaneous, combined Slashdot/Digg/Reddit assault, since both Digg and Reddit linked to the actual design preview, and the direct links pointing here from the comments didn’t bring in enough traffic to constitute a hammering. Oh well, live to fight another day.
