How We Found All Of Optimizely’s Clients

For those who aren’t familiar with it, Optimizely is a leader in the growing A/B testing industry. Amazingly, they’ve managed to get their installation code down to a single line of JavaScript, as pictured below:

image

With one simple query we uncovered a total of 577,395 sites containing that Optimizely JavaScript library:

2017-01-11_1005

That’s a lot of clients! But we wanted to dig even deeper and find every distinct Optimizely CDN URL, each of which contains an Optimizely client number. Using a regular expression search, we were able to extract a list of over 12,000 URLs used on the top 1 million sites.

2017-01-11_1012
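A minimal Python sketch of this kind of regular expression extraction follows. It is not the query shown in the screenshot. It assumes the CDN URL has the shape cdn.optimizely.com/js/<numeric id>.js, and the function name and sample markup are hypothetical.

```python
import re

# Assumed URL shape: cdn.optimizely.com/js/<numeric id>.js
OPTIMIZELY_URL = re.compile(r"cdn\.optimizely\.com/js/(\d+)\.js")


def extract_client_ids(html: str) -> set[str]:
    """Return the distinct numeric IDs found in Optimizely CDN URLs."""
    return set(OPTIMIZELY_URL.findall(html))


# Hypothetical page source, for illustration only.
sample = """
<script src="//cdn.optimizely.com/js/123456789.js"></script>
<script src="https://cdn.optimizely.com/js/123456789.js"></script>
<script src="//cdn.optimizely.com/js/987654.js"></script>
"""

print(sorted(extract_client_ids(sample)))  # ['123456789', '987654']
```

Collecting the matches into a set is what turns "every occurrence" into "every distinct URL", since the same client number can appear many times across pages.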

Try out this and other awesome search tools within our search and regular expression interfaces.

About NerdyData

Our search engine is different from search engines you’ve used before. Traditional search engines are geared towards providing answers, whereas our goal is to give you the best list of results for a query.

Our crawler has visited over 140 million homepages and collected terabytes of HTML, JavaScript, and CSS code. We’ve also designed several search interfaces that allow anybody to query against the source code of webpages, or download a list of sites containing a specific term.

//

How We Found Every Single Vulnerable Website

If you’re a security researcher and you’ve found an exploit in a commonly distributed web application, you may want to find sites that contain that vulnerable application so you can notify them.

The question is how do you find them?

image

Google Hacking Is Now Obsolete

Maybe you’ve heard of Google Hacking, a technique hackers use to find vulnerable websites by searching for a common filename or block of text that appears in a vulnerable piece of software. An example of this would be a Google query like

inurl:administrators.pwd

or

Powered by XOOPS 2.2.3 Final

If you are familiar with this method of vulnerability hunting, or this sort of thing interests you, you’ll be excited to know we’ve taken Google Hacking to another level.

How Does This Method Differ?

Traditional search engines only let you query the text of a webpage, not the markup. You can now find all websites that share a common piece of HTML code or JavaScript, in addition to a block of text. Here are some examples of what can be done:

Websites running WordPress that are using version 3.5

Query: <meta name="generator" content="WordPress 3.5" />


Websites with an upload form on their homepages

Query: name="MAX_FILE_SIZE"


Websites using the Invision Power Board Forum

Query: ipsBadge

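To make the idea of querying markup rather than visible text concrete, here is a small Python sketch. It runs a case-insensitive substring match over a tiny in-memory collection of page sources. The domains, the markup, and the function name are all hypothetical, and a real source-code search engine would use an index rather than a linear scan.

```python
def sites_containing(pages: dict[str, str], snippet: str) -> list[str]:
    """Return the domains whose raw source contains the snippet."""
    needle = snippet.lower()
    return sorted(
        domain for domain, html in pages.items() if needle in html.lower()
    )


# Hypothetical crawl results: domain -> raw homepage source.
pages = {
    "blog.example": '<head><meta name="generator" content="WordPress 3.5" /></head>',
    "upload.example": '<form><input type="hidden" name="MAX_FILE_SIZE" value="100000" /></form>',
    "plain.example": "<p>Nothing interesting in this markup</p>",
}

print(sites_containing(pages, 'name="MAX_FILE_SIZE"'))
print(sites_containing(pages, '<meta name="generator" content="WordPress 3.5" />'))
```

The point of the sketch is that the query strings are pieces of markup, which a text-only search would never see.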

New flaws in web application security measures are constantly being researched, both by hackers and by security professionals. Most of these flaws affect all dynamic web applications whilst others are dependent on specific application technologies.

In both cases, one may observe how the evolution and refinement of web technologies also brings about new exploits which compromise sensitive databases, provide access to theoretically secure networks, and pose a threat to the daily operation of online businesses.

//

Mixpanel Vs. Goliath

In a vast sea of analytics platforms, how many users choose Mixpanel over the competition?

image

It takes just 5 minutes to set up, and once you start watching the real-time data flood in, it’s clear that Mixpanel is not only the most “modern” and sleek analytics platform to date, but also one that provides a unique take on customer-oriented statistics and insight.

This isn’t a blog post about why Mixpanel is better — instead we want to show you some interesting statistics that exemplify the uphill battle Mixpanel faces in competing with the analytics juggernauts.

There is no denying that Google Analytics and Omniture dominate the online analytics industry. But just how big are they?  We researched the topic and found:

After seeing these numbers we thought, “Well, Mixpanel has a low adoption rate among all webmasters, but maybe their target market is larger web companies.” So we narrowed our search to just the top 1 million sites on the internet (based on traffic). Mixpanel appears on just 540 websites out of the top one million.

How hard would it be for Mixpanel to convince Google Analytics users to make the switch?

Upon further inspection we found 87% of domains that have Mixpanel code also use Google Analytics.
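An overlap figure like this is the share of one result set that also appears in another. The Python sketch below shows the arithmetic on made-up domain lists. The sets are hypothetical and are not the data behind the 87% figure.

```python
def overlap_share(primary: set[str], other: set[str]) -> float:
    """Fraction of domains in `primary` that also appear in `other`."""
    if not primary:
        return 0.0
    return len(primary & other) / len(primary)


# Hypothetical result sets, for illustration only.
mixpanel_sites = {"a.example", "b.example", "c.example", "d.example"}
google_analytics_sites = {"a.example", "b.example", "c.example", "z.example"}

share = overlap_share(mixpanel_sites, google_analytics_sites)
print(f"{share:.0%}")  # 75%
```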

It’s tough to get out of Google’s shadow, so how will Mixpanel convince webmasters to pick them as their primary analytics platform?


//

How Facebook Tricks Webmasters To Collect Users’ Web Surfing History

image

With the recent announcement that Facebook will begin selling your web browsing history to advertisers, we thought we’d take a look at how they actually get your web browsing history in the first place.

Most people assume that Facebook tracks them while they are on facebook.com. But you don’t have “Facebook” installed on your computer, and you don’t “open up Facebook” to surf the web. So where does Facebook get the data?

Even without visiting facebook.com, plus.google.com, or twitter.com, you’re likely to encounter elements from these sites almost seven times a day. The trackers come in the shape of cookies, JavaScript, 1-pixel beacons, iframes, and cute-looking widgets.

These elements have the ability to ping Facebook’s servers with:

  • The URL of the page you’re viewing
  • The site that referred you to that page
  • The browser you’re using
  • The OS you’re using
  • Your approximate geographic location
  • The size of your screen
  • Your Facebook profile, if you’re logged into Facebook
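Several items in the list above travel with any ordinary HTTP request for an embedded element. The Python sketch below summarizes what a third-party server could read from such a request. The function name and the sample values are hypothetical. It models only standard request headers and the connecting IP address. Values such as screen size would have to be read by script running in the page, which this sketch does not cover.

```python
from urllib.parse import urlsplit


def summarize_widget_request(headers: dict[str, str], remote_ip: str) -> dict[str, object]:
    """Summarize what a third-party server can read from one widget request."""
    referer = headers.get("Referer", "")
    return {
        "page_url": referer,                        # page that embedded the widget
        "page_host": urlsplit(referer).hostname,    # None when no Referer was sent
        "user_agent": headers.get("User-Agent", ""),  # browser and OS hints
        "ip_address": remote_ip,                    # basis for a rough location
        "has_cookie": "Cookie" in headers,          # can link the visit to a login
    }


# Hypothetical request, for illustration only.
headers = {
    "Referer": "https://news.example/article?id=7",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Cookie": "session=abc",
}
info = summarize_widget_request(headers, "203.0.113.5")
print(info["page_host"], info["has_cookie"])
```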

The Facebook Like Button

One very popular widget on the internet is the Facebook Like button. It has made it easy for hundreds of millions of Web users to share content with their friends on the social networking site. The button appears on more than one-third of the top thousand websites and has been integrated into everything from Bing search results to countless blogs around the ‘net. What users may not realize is that the soft blue thumbs-up is tracking their surfing habits, even if it doesn’t get clicked.

image

Any time the Like button is displayed, information is zapped back to Facebook’s servers.

Facebook Connect and Your Privacy

Facebook Connect is the next iteration of the Facebook Platform that allows users to “connect” their Facebook identity, friends and privacy to any site. Even if you never log in to a site using Facebook Connect, the presence of the Facebook Connect JavaScript snippet on that site means Facebook can see that you are there.

Over 50,000 sites use Facebook Connect, and if you’ve visited one of them, you’ve been tracked.

image

Like Boxes Are Creeping On You Too

image

The Like Box is a special version of the Like Button designed only for Facebook Pages. It allows admins to promote their Pages and embed a simple feed of content from a Page into other sites. As this is a JavaScript widget, every time it is loaded it pings information about you back to Facebook servers.

We found over 1 million websites that have this box (and that additionally show pictures of the followers’ faces).
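For readers who want to check a page themselves, here is a hedged Python sketch that looks for these widgets in raw page source. The marker strings (the connect.facebook.net script host, the fb-like and fb-like-box class names, and the plugins/like.php and plugins/likebox.php iframe paths) are assumptions about how the widgets are commonly embedded, not the queries used for the counts above.

```python
import re

# Assumed embed markers; real pages may use other forms.
WIDGET_PATTERNS = {
    "sdk": re.compile(r"connect\.facebook\.net", re.IGNORECASE),
    "like_button": re.compile(
        r"fb-like(?![\w-])|facebook\.com/plugins/like\.php", re.IGNORECASE
    ),
    "like_box": re.compile(
        r"fb-like-box(?![\w-])|facebook\.com/plugins/likebox\.php", re.IGNORECASE
    ),
}


def facebook_widgets(html: str) -> set[str]:
    """Return the names of the Facebook widgets whose markers appear in html."""
    return {name for name, pattern in WIDGET_PATTERNS.items() if pattern.search(html)}


# Hypothetical page source, for illustration only.
sample = (
    '<div class="fb-like-box" data-href="https://www.facebook.com/example"></div>'
    '<script src="//connect.facebook.net/en_US/all.js"></script>'
)
print(sorted(facebook_widgets(sample)))  # ['like_box', 'sdk']
```

The negative lookahead after fb-like keeps a Like Box from also being counted as a Like button.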

image

What Can You Do About It?

Twitter and Pinterest, which track people with their Tweet and PinIt buttons, offer users the ability to opt out. And Google has pledged it will not combine data from its ad-tracking network DoubleClick with personally identifiable data without users’ opt-in consent. Facebook does not offer an opt-out in its privacy settings.

Instead Facebook asks members to visit an ad industry page, where they can opt out from targeted advertising from Facebook and other companies. The company also says it will let people view and adjust the types of ads they see.

How To Find New Clients For Your SEO Agency

NerdyData is a search engine for source code.  This post outlines some ways an SEO agency can use our tool to discover potential new clients, en masse.

It’s a gold rush out there for SEO agencies. As businesses come online in droves, they quickly discover that simply paying someone to develop a website will not bring in the traffic they need to be profitable. Everyone wants to be at the top of a hot Google search. A criminal attorney in San Francisco who ranks for the query “criminal attorney in san francisco” will likely receive many contacts from people interested in legal representation.

Only a small percentage of websites show up in a top placement in organic search results for popular queries. Millions of websites exist but are not optimized in a way that will make them appear for these frequently searched keywords, so they are displaced by those that are.

image

An SEO agency exists to bridge the gap between Google’s search algorithm and technologically unsavvy business owners.

We have come up with some ways an SEO agency can surface these poorly optimized sites using our search engine. Here are some examples: 


Search for sites that have “niche” and “location” in their <title> tag or on-page text, but DO NOT have a meta description tag

  • If you’re an SEO agency, you could use this type of search to narrow down sites owned by “criminal attorneys” in “san francisco” that most likely don’t have an SEO agency, because they lack a meta description tag on their web pages.
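The check described above can be sketched in Python with the standard library HTML parser. This is an illustration under assumptions, not the search engine's implementation. The class and function names are hypothetical, and for brevity it only inspects the <title> tag, not the on-page text.

```python
from html.parser import HTMLParser


class HeadScanner(HTMLParser):
    """Collect the <title> text and note whether a meta description exists."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.has_meta_description = False
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attr = dict(attrs)
            name = (attr.get("name") or "").lower()
            # An empty description is treated the same as a missing one.
            if name == "description" and (attr.get("content") or "").strip():
                self.has_meta_description = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def is_lead(html: str, niche: str, location: str) -> bool:
    """True if the title mentions niche and location but no description is set."""
    scanner = HeadScanner()
    scanner.feed(html)
    title = scanner.title.lower()
    return (
        niche.lower() in title
        and location.lower() in title
        and not scanner.has_meta_description
    )


# Hypothetical page source, for illustration only.
page = "<html><head><title>Smith Law | Criminal Attorney in San Francisco</title></head></html>"
print(is_lead(page, "criminal attorney", "san francisco"))  # True
```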

Additionally, we’ve made a number of tools that let you search within the <title> and Meta Descriptions of websites.


Search for sites that don’t have Facebook or Twitter badges, buttons, or social links on their pages.

  • There’s a good chance these sites do not have an online social presence.  Why don’t they?  These businesses could find new customers by creating a social media presence, but may not know how to create one.
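A rough Python sketch of this negative search: collect the hosts referenced by href and src attributes and check whether any belong to Facebook or Twitter. The domain list and the names used here are assumptions chosen for illustration. A page can have a social presence that is not linked from its homepage, so a miss is only a hint.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Assumed list of domains that indicate a social badge, button, or link.
SOCIAL_DOMAINS = ("facebook.com", "facebook.net", "twitter.com")


class LinkHosts(HTMLParser):
    """Collect the hostnames referenced by href and src attributes."""

    def __init__(self):
        super().__init__()
        self.hosts = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                try:
                    host = urlsplit(value).hostname
                except ValueError:
                    continue  # skip malformed URLs
                if host:
                    self.hosts.add(host.lower())


def has_social_links(html: str) -> bool:
    """True if any referenced host is one of the social domains or a subdomain."""
    parser = LinkHosts()
    parser.feed(html)
    return any(
        host == domain or host.endswith("." + domain)
        for host in parser.hosts
        for domain in SOCIAL_DOMAINS
    )


# Hypothetical page sources, for illustration only.
with_badge = '<a href="https://twitter.com/acme">Follow us</a>'
without_badge = '<a href="/contact">Contact</a><img src="https://cdn.example/logo.png">'
print(has_social_links(with_badge), has_social_links(without_badge))  # True False
```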


Search for sites that use outdated or poorly optimized software

  • Many small business websites are using a version of a CMS, forum, or blog software that is not optimized for high volume queries in Google.  These sites are likely to already contain content, but are not designed in a way that allows them to capture search traffic for terms relevant to their business.
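One way to spot outdated software is to read the version a site advertises in its generator meta tag and compare it against a cutoff. The Python sketch below does this for WordPress only. The regular expression assumes the tag has the form name="generator" content="WordPress X.Y", and the cutoff of 4.0 is an arbitrary placeholder, not a recommendation.

```python
import re

# Assumed tag form: <meta name="generator" content="WordPress X.Y" />
GENERATOR = re.compile(
    r'<meta\s+name="generator"\s+content="WordPress\s+([\d.]+)"', re.IGNORECASE
)


def wordpress_version(html: str):
    """Return the advertised WordPress version as a tuple of ints, or None."""
    match = GENERATOR.search(html)
    if not match:
        return None
    return tuple(int(part) for part in match.group(1).split(".") if part)


def is_outdated(html: str, minimum=(4, 0)) -> bool:
    """True if the page advertises a WordPress version below `minimum`."""
    version = wordpress_version(html)
    if version is None:
        return False
    # Pad short versions with zeros so that "4" compares equal to (4, 0).
    padded = version + (0,) * (len(minimum) - len(version))
    return padded < minimum


# Hypothetical page source, for illustration only.
old_site = '<meta name="generator" content="WordPress 3.5" />'
print(wordpress_version(old_site), is_outdated(old_site))  # (3, 5) True
```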

If you want to perform searches like these, try out NerdyData, a search engine that indexes the full source code of webpages and lets you query using code snippets as well as keywords.

Additionally, you can submit a request through this form and we can get in touch with you to help you uncover new business leads for your agency.

Or follow us on Twitter!

//