Putting the Kibosh on Duplicate Content

By Rebecca Stewart, Organic Search team

Recently, the SEO team at Exclusive Concepts hosted an internal “Lunch and Learn” for the company, where we discussed our team’s methodology so other departments can better understand what we do and why we do it. A question that came out of that meeting was one that inspired this blog post: “What is duplicate content, and why is it bad?”

It made me realize that we can get so engrossed in our client work that we end up making assumptions about how much our colleagues (or clients) actually know about what we do and why we do it. Below, I’ll discuss what defines duplicate content, how it is most commonly created, and why it’s become a sticking point when content best practices are scrutinized.

When SEOs at Exclusive Concepts talk about duplicate content, we’re talking about the copy on your site. For our eCommerce clients, this is typically the paragraphs on category or subcategory pages, as well as product descriptions. It can also include buyer’s guides or how-to articles. Basically, if it’s words on the page, we’ll refer to it as content.

Duplicate content, you say?

Duplicate content occurs when part or all of this content is replicated on another webpage. You’ll notice I didn’t say another website. Yes, you can create your own duplicate content. More on that in a minute.

When we look at a site for duplicate content, we’re looking both externally (on another site) and internally (within your own domain). We typically find content externally duplicated as a result of:

  • Using manufacturers’ product descriptions
  • Shopping feeds or affiliates (think The Find, Amazon, eBay)
  • Scraper sites (ShopWiki)
  • Copied content from other sites
  • Reused content on sister sites or blogs
  • Content syndication
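To make the idea of "replicated copy" concrete, here is a minimal, hypothetical Python sketch (not a tool we actually use, and the example texts are invented) that roughly gauges how duplicated two passages are by comparing overlapping word n-grams, often called shingles:

```python
def shingles(text, k=3):
    """Split text into overlapping k-word 'shingles' (word n-grams)."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def similarity(a, b, k=3):
    """Jaccard similarity of two texts' shingle sets
    (0.0 = no overlap, 1.0 = identical wording)."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

# Invented example: a manufacturer description vs. a rewritten one
manufacturer = "Durable stainless steel water bottle keeps drinks cold for 24 hours."
rewritten = "Our stainless steel bottle keeps drinks icy for a full day on the trail."

print(similarity(manufacturer, manufacturer))  # identical copy scores 1.0
print(similarity(manufacturer, rewritten))     # a genuine rewrite scores much lower
```

A score near 1.0 means near-verbatim reuse of the manufacturer's copy, which is exactly the situation the bullets above describe; a genuinely rewritten description scores far lower.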

Internally duplicated content commonly occurs when a site has separate pages for product styles, such as different colors or sizes. It can also stem from indexable URL parameters, such as those generated by filters, pagination, or session IDs.
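To see how URL parameters create internal duplicates, here is a small Python sketch (the IGNORED_PARAMS list and URLs are invented for illustration; real sites usually address this with rel="canonical" tags or parameter handling settings rather than code like this) showing how several parameter variants of the same page can collapse to one canonical form:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical parameters that create duplicate URLs without changing content
IGNORED_PARAMS = {"sessionid", "utm_source", "utm_medium", "utm_campaign"}

def canonicalize(url):
    """Drop duplicate-creating query parameters and sort the rest,
    so URL variants of the same page collapse to one form."""
    scheme, netloc, path, query, _fragment = urlsplit(url)
    params = [(k, v) for k, v in parse_qsl(query) if k.lower() not in IGNORED_PARAMS]
    return urlunsplit((scheme, netloc, path, urlencode(sorted(params)), ""))

# Three different URLs that all serve the exact same page content
urls = [
    "https://example.com/shoes?color=red&sessionid=abc123",
    "https://example.com/shoes?sessionid=xyz789&color=red",
    "https://example.com/shoes?color=red&utm_source=newsletter",
]
print({canonicalize(u) for u in urls})  # all three collapse to one URL
```

Each of the three URLs would be indexed as a separate page of identical copy if left alone; after normalization they resolve to a single address, which is the outcome a canonical tag signals to search engines.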

So, why did all of this become such a big deal? Nik does a great job of explaining the history of the “birth” of Panda in our eCommerce Guide to Google 2014 webinar – I highly recommend taking a look if you haven’t already. For now, though, I’ll summarize:

  • Until 2008, pages with content were placed in the Primary index, while pages without content were placed in the Supplemental index. These were simpler times.
  • In 2008, Google decided the Supplemental index couldn’t just house pages with no content, so it began putting unique content into the Primary index and duplicate content into the Supplemental index. However, this resulted in good content (even though it was duplicated) being demoted to the Supplemental index. The unintended result was that people began investing in content, but the content they were writing wasn’t of high quality (think thin content, or content written by those whose first language isn’t English).
  • In 2009, Caffeine rolled out. This new indexing process made indexing of the web more real-time, so unique content was being found much faster. Businesses noticed that their index size was increasing with the unique content they had, so they kept pushing out this low-quality, unique content, and Google’s Primary index began to fill with lots of bad (but unique!) content.
  • In 2011, Panda was rolled out in an effort to distinguish good unique content from bad unique content (while obviously still looking at pages without content as well). Google’s statement when Panda was first rolled out in February 2011 said:

“This update is designed to reduce rankings for low-quality sites—sites which are low-value add for users, copy content from other websites or sites that are just not very useful. At the same time, it will provide better rankings for high-quality sites—sites with original content and information such as research, in-depth reports, thoughtful analysis and so on.”

Wait, what’s all the hubbub about?

One of the biggest concerns here, and why I think content is such a big topic of conversation with our clients, is that it is an undeniably costly task, in either time or money, to have unique content written for the entirety of a site.

For eCommerce sites, product descriptions obviously are not “low-value add” for users, but what sets your descriptions apart from the hundreds of other sites selling the same product with unchanged manufacturer specifications? You’re doing your site a disservice by supplying customers with the exact same information shoppers can get on any of the other sites that sell these products. In the words of Matt Cutts (tough love), “If you can’t manage to have 1,000 pages and have something unique, something different…why should your 1,000 pages…rank compared to someone else’s 1,000 pages of that same…content?” Obviously, keep the item specs — those are necessary. But why not also use this as an opportunity to cross-sell other products, recommend styling options, give examples of uses, or just tout the value-adds that make people want to buy from you?

I could talk about duplicate content, its causes, specific cases and how to prevent it ad nauseam, but for now I hope this post served as either a refresher as to why we’re constantly combating duplicate content or as a source of new information for you. If there’s anything you think I missed or a topic you’d like to see me dive into further, please feel free to reach out to our blog’s mailbag account at mailbag@exclusiveconcepts.com.

Rebecca has been with EXCLUSIVE for more than seven years, and is currently the Director of Organic Search. She majored in Marketing at Bentley University, with a concentration in Global Studies. Her favorite part of her job is analyzing data to make successful site recommendations. She enjoys cooking (and especially eating), good food and drink, working out, shopping, golf, and travel.