Using PhantomJS at scale

Did you know that SmugMug has a separate Sorcery blog? From time to time our engineers will take you behind the scenes of something we've just built, or share some of the valuable insights we've gained while investigating a particularly tricky fix. If you're interested in the technical nuts and bolts of what powers our site, or if you just want to hear what's making our hearts beat a little faster, check it out! Their latest post deep-dives into The Phantom Renderer, or "how to make sure SEO works in the new SmugMug."

SmugMug Sorcery

About a year ago SmugMug had a dilemma. Our upcoming site-wide redesign and refactor (aka the new SmugMug) moved all of our rendering code into client-side JavaScript using YUI. We had a problem: SEO is critical for our customers, and search engines couldn't index the new site.

Possible solutions were thrown around. Do we duplicate our code in PHP? Use an artificial DOM? What about PhantomJS? Duplicating code would be a monumental effort and a continued burden when writing new features. Initial tests of fake/artificial DOMs proved unreliable. A small prototype Node.js web server that hooked into PhantomJS looked promising, and Node.js's asynchronous model would be perfect for handling work that spends most of its time waiting on I/O, like rendering webpages. We came up with the project name 'The Phantom Renderer' soon after.

The prototype

I spent a few days whipping up a prototype proxy server for testing. It worked like so:

  • Node.js web server accepts a…



SmugMug brings you beautiful, personalized online galleries. We love photography and believe that taking pictures makes life better. We're here to help make it fun, enjoyable, and easy for all.

8 thoughts on “Using PhantomJS at scale”

  1. I am not a technical person, and although I could read this, it would not mean much to me. It is why I am here as a Smug Mug member. I am a lifelong photographer and I have, at times, made a living as a photographer of people. However, I am now (as of yesterday) seventy-one years old. What you have given me with the creation of Smug Mug is the opportunity to share the things and places I love in my images for others to see. Smug Mug is my miracle this Christmas. You have given me a way to share what I love with others by providing me with the ability to show my images in a larger size, an upload system I can understand, and always an answer for what I don't understand when I get into trouble and can't figure something out. The help has always come promptly, without recrimination, and, I think, from people who want me to succeed.

    For all you have done for me, thank you my Smug Mug family. Merry Christmas and a very happy and prosperous New Year to each and every one of you.

    Sincerely yours,

    Dennis Clark Photographer

    1. Thanks so much, Dennis! You’ve warmed our hearts by letting us know that we have helped you connect your life with the people around you. Steve shared your words with the rest of us at SmugMug and we hope that you will always find joy in taking photos. Happy belated birthday!

    1. Hi Patricia! We know that not everyone is a tech geek, but we do have a lot of very technical fans who ask us about the under-the-hood details of SmugMug. For those just tuning in, we wanted to give our Engineering blog a shout-out so we don't get too techy here.

      Stay tuned for more of the usual photo-friendly news.🙂

  2. I have been using a very similar approach for our site. My setup includes: NodeJS, Express, PhantomJS, Forever, TingoDB.

    Once we generate the HTML snapshot, we cache it on the file system and serve it with Node.js/Express for any further requests. I also capture a few stats (e.g. crawler, URL, datetime, response time) in a small in-process DB (TingoDB).

    We haven't been using it for long, but I think it is promising.

    I would be interested to hear from others about how they handled the following issues:

    1. What status do you normally return to the crawler in case of failures, e.g. when the target page throws a 500 or 404?

    2. I have disabled image loading in PhantomJS. Are there any other optimizations you considered, especially to reduce PhantomJS's memory and/or CPU usage?

    1. Hi Ravi! Great to hear other people are taking the same route we did.

      For status codes, we proxy the status code from our target page. We consider The Phantom Renderer a proxy, so it should return the status code it receives.

      We also disabled image loading. Beyond that there isn’t too much you can do to reduce CPU and memory usage. Rendering webpages is pretty intensive. The only other performance improvement we made was keeping PhantomJS processes alive between requests. This reduces time-to-start; initializing a PhantomJS process can take a second or two depending on how busy the server is.
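Keeping PhantomJS processes alive between requests is essentially a small worker pool. The pool reuses an idle process when one exists and pays the start-up cost only when every existing process is busy. Below is a minimal TypeScript sketch of that idea. `RenderWorker` and `WorkerPool` are hypothetical names and do not come from The Phantom Renderer. A real version would wrap spawned PhantomJS child processes rather than plain objects, and would queue requests instead of returning `null` at capacity.

```typescript
// Sketch only: RenderWorker is a hypothetical handle. In a real server it
// would wrap a long-lived PhantomJS child process instead of a plain object.
interface RenderWorker {
  id: number;
}

class WorkerPool {
  private readonly idle: RenderWorker[] = [];
  private readonly busy = new Set<RenderWorker>();
  private nextId = 1;
  private readonly maxWorkers: number;
  private readonly spawnWorker: (id: number) => RenderWorker;

  constructor(maxWorkers: number, spawnWorker: (id: number) => RenderWorker) {
    this.maxWorkers = maxWorkers;
    this.spawnWorker = spawnWorker;
  }

  // Hand out an idle worker if there is one; pay the start-up cost of a new
  // process only when every existing worker is busy and there is room for more.
  acquire(): RenderWorker | null {
    let worker = this.idle.pop();
    if (worker === undefined) {
      if (this.busy.size >= this.maxWorkers) {
        return null; // at capacity: the caller must queue or retry
      }
      worker = this.spawnWorker(this.nextId++);
    }
    this.busy.add(worker);
    return worker;
  }

  // Return the worker to the idle list so its process stays alive for the
  // next request instead of exiting.
  release(worker: RenderWorker): void {
    if (this.busy.delete(worker)) {
      this.idle.push(worker);
    }
  }
}
```

Capping the pool with `maxWorkers` matters because rendering is CPU- and memory-heavy, as the reply notes. The right cap depends on the machine, and the sketch does not try to pick one.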

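The thread describes two ideas that fit together in a few lines: Ravi caches each rendered snapshot by URL and logs per-request stats, and SmugMug's reply says the renderer passes through whatever status code the target page returned. The TypeScript sketch below combines them and is illustrative only. `SnapshotProxy`, `RenderResult`, and the other names are hypothetical. An in-memory `Map` stands in for the file-system cache, and an array stands in for TingoDB. The sketch also makes one assumption the thread does not state: only 200 responses are cached, so a transient error is not served from the cache indefinitely.

```typescript
// Sketch only: names and shapes are hypothetical, not SmugMug's or Ravi's code.

// What one PhantomJS render hands back to the proxy.
interface RenderResult {
  targetStatus: number; // HTTP status the target page returned
  html: string;         // serialized DOM after client-side rendering
}

// One row of request stats (the comment keeps these in TingoDB).
interface RenderStat {
  crawler: string;
  url: string;
  datetime: string;       // ISO-8601 time the request arrived
  responseTimeMs: number;
  cacheHit: boolean;
}

interface ProxyResponse {
  statusCode: number;
  body: string;
}

class SnapshotProxy {
  // In-memory stand-in for a file-system snapshot cache, keyed by URL.
  private readonly snapshots = new Map<string, string>();
  // In-memory stand-in for an in-process stats database.
  readonly stats: RenderStat[] = [];
  private readonly render: (url: string) => RenderResult;
  private readonly now: () => number;

  constructor(render: (url: string) => RenderResult, now: () => number = Date.now) {
    this.render = render;
    this.now = now;
  }

  handle(url: string, crawler: string): ProxyResponse {
    const start = this.now();
    const cached = this.snapshots.get(url);
    let response: ProxyResponse;
    if (cached !== undefined) {
      response = { statusCode: 200, body: cached };
    } else {
      const result = this.render(url);
      // Proxy the target page's status unchanged: 404 stays 404, 500 stays 500.
      response = { statusCode: result.targetStatus, body: result.html };
      // Assumption: cache only successful renders so errors are retried later.
      if (result.targetStatus === 200) {
        this.snapshots.set(url, result.html);
      }
    }
    this.stats.push({
      crawler,
      url,
      datetime: new Date(start).toISOString(),
      responseTimeMs: this.now() - start,
      cacheHit: cached !== undefined,
    });
    return response;
  }
}
```

Passing the status through means a crawler sees the same 404 or 500 a browser would. As a result, the proxy does not present a missing page as a successful, indexable one.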