Web 2.0 Expo Conference
Nathan provided a really good example of how not to make your page visible to search engines: the Nike home page.
The page was Flash-based. It looked really slick, but…
- Takes 12.7 seconds to load because it forces you to choose a language first then your country before showing you anything.
- Because it’s totally Flash-based, the page doesn’t provide any details for search engines.
- Nike actually delivers up a totally different page for search engines. This is called “cloaking” and is frowned upon by search engines.
- If you view the source of the page you can see that it isn’t even well-formed (the body tag isn’t closed), so any parsing of the page will fail.
So What. What’s the Big Deal?
Because of this home page and its lack of meta information, a Google search for “lebron james shoes” (LeBron James is sponsored by Nike and is one of its biggest names) doesn’t return a link to Nike until page 50 of the results, if you ignore the sponsored links.
Cloaked pages are also duplicate pages, so they require twice the maintenance.
The proper way to do it is to provide alternate simple HTML within the same page that is visible to search engine robots as well as legacy browsers and, currently, iPhones, which don’t support Flash.
The solution? Make your site viewable by robots.
How do robots work?
- URLs can be submitted.
- Check for headers.
- Number of inbound links.
- Number of “quality” inbound links.
- Quality is defined by a large set of rules. 300+ factors.
- Use HTML semantically. e.g. Use h1 tags rather than CSS to denote a heading. This way the search engine knows that it’s a heading.
- The title tag: use it. Search engines rely on it. Put something meaningful in there as well.
- The meta tag with attributes name="description" content="…": gives search engines a short summary of the page. (Keywords go in a separate meta tag, name="keywords".)
- Anchor tags. Use useful text in the anchor, not “click here”.
- Use the heading levels: h1, h2, and h3. Did you know that there is only supposed to be one h1 per page?
- The noscript element. Within it you provide a dumbed-down version of the page that search engines (and anything else that can’t handle your rich application) will see.
- Monolithic – search engine sees it as a black box.
- Linkable – multiple entry points.
- Crawlable – engine can access deep into the site.
The nike.com site is meant to be crawlable because it’s a public site trying to sell product but it is implemented as a monolith.
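The markup advice above (a meaningful title, a meta description, a single h1, descriptive anchor text) is easy to check mechanically. Here is a rough sketch using Python’s standard html.parser module. The names SeoAudit and audit and the list of “vague” anchor phrases are my own inventions for illustration, not anything from the talk, and a real audit would look at far more than this.

```python
from html.parser import HTMLParser


class SeoAudit(HTMLParser):
    """Collects a few on-page signals from one HTML document."""

    # Illustrative list only; extend it to taste.
    VAGUE_ANCHORS = {"click here", "here", "read more"}

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = None
        self.h1_count = 0
        self.vague_links = []
        self._in_title = False
        self._anchor_text = None  # None while we are outside an <a>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = attrs.get("content")
        elif tag == "h1":
            self.h1_count += 1
        elif tag == "a":
            self._anchor_text = ""

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        if self._anchor_text is not None:
            self._anchor_text += data

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "a" and self._anchor_text is not None:
            # Collapse whitespace before comparing against the vague phrases.
            text = " ".join(self._anchor_text.split())
            if text.lower() in self.VAGUE_ANCHORS:
                self.vague_links.append(text)
            self._anchor_text = None


def audit(html):
    """Return a list of human-readable problems found in the page."""
    parser = SeoAudit()
    parser.feed(html)
    parser.close()
    problems = []
    if not parser.title.strip():
        problems.append("missing or empty <title>")
    if not parser.description:
        problems.append("missing meta description")
    if parser.h1_count != 1:
        problems.append(f"expected one <h1>, found {parser.h1_count}")
    for text in parser.vague_links:
        problems.append(f"vague anchor text: {text!r}")
    return problems
```

Calling audit() on a page’s HTML source returns an empty list when all four checks pass. A Flash-only page with no title, no description and no headings would fail nearly all of them.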
- Search engines parse URLs trying to capture information. Having human readable URLs is good for people but especially useful for search engines. Using hyphens instead of underscores allows the search engine to break up the URL and parse it.
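To make the hyphen point concrete, here is one possible way to turn a page title into a readable, hyphen-separated URL segment. The function name slugify is my own choice for this sketch, and the exact rules (lowercase, ASCII only) are assumptions rather than anything the speaker prescribed.

```python
import re
import unicodedata


def slugify(title):
    """Turn a title into a lowercase, hyphen-separated URL segment."""
    # Fold accented characters to plain ASCII where possible, drop the rest.
    ascii_text = (
        unicodedata.normalize("NFKD", title)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Replace each run of anything that is not a letter or digit
    # (spaces, underscores, punctuation) with a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_text.lower())
    # Trim hyphens left over at either end.
    return slug.strip("-")
```

With this, both "LeBron James Shoes" and "lebron_james_shoes" become lebron-james-shoes, which a search engine can split into three separate words.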
HTTP status codes
- 200 OK
- 404 Not Found
- Some sites use “soft” 404s. They still return 200 even though the page was in fact “not found”.
- 301 Moved permanently (greater than 20 minutes)
- 302 Moved temporarily (less than 20 minutes)
- 304 Not Modified – how do you enable this?
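On the 304 question: one common approach is for the server to send an ETag header with each response and answer 304 when the client sends the same value back in If-None-Match. Most web servers and frameworks can do this for you, so the sketch below only illustrates the decision itself. The function names make_etag and respond are hypothetical, and it skips parts of the real protocol (lists of tags, weak validators, the * wildcard, case-insensitive header names).

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Derive a quoted ETag from the response body (assumption: a content hash)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body: bytes, request_headers: dict):
    """Return (status, headers, payload) for a GET of a resource with this body."""
    etag = make_etag(body)
    # If the client already holds this exact version, send no body at all.
    if request_headers.get("If-None-Match") == etag:
        return 304, {"ETag": etag}, b""
    # Otherwise send the full page along with its current ETag.
    return 200, {"ETag": etag}, body
```

The first request gets a 200 and the ETag. A repeat request that echoes the ETag gets a 304 with an empty body, until the content changes and the hash no longer matches.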
What is the “official” URL for a site?
Is it oreilly.com, oreilly.com/index.csp, www.oreilly.com, or www.oreilly.com/index.csp?
- Pick one and use 301 redirects to it from the others.
- Trim the default filename off of the URL.
- Make all internal links to the one unique URL.
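The three steps above could be sketched roughly like this. I am assuming www.oreilly.com is the host that gets picked as the official one; that choice, the function name canonical_redirect and the constants are mine, purely for illustration. In practice this logic usually lives in the web server’s rewrite rules rather than in application code.

```python
from urllib.parse import urlsplit, urlunsplit

CANONICAL_HOST = "www.oreilly.com"  # assumption: the host chosen as official
ALIAS_HOSTS = {"oreilly.com"}       # other hosts that serve the same site
DEFAULT_FILES = {"index.csp"}       # default filename to trim off the URL


def canonical_redirect(url):
    """Return (301, canonical_url) if url is a variant, or None if it is already canonical."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host in ALIAS_HOSTS:
        host = CANONICAL_HOST
    path = parts.path or "/"
    # Trim a trailing default filename, keeping the directory's slash.
    head, _, last = path.rpartition("/")
    if last in DEFAULT_FILES:
        path = head + "/"
    canonical = urlunsplit((parts.scheme, host, path, parts.query, parts.fragment))
    return None if canonical == url else (301, canonical)
```

All four variants of the home page end up at http://www.oreilly.com/, and only that one URL gets a None (no redirect needed).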
The anchor tag’s nofollow value (rel="nofollow"). Nofollow is supposed to stop robots from drilling into a link, but robots don’t necessarily adhere to it.
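For a sense of what honoring nofollow looks like from the robot’s side, here is a small sketch of a link collector that sorts a page’s links into those it would follow and those marked nofollow. The class name LinkCollector is made up for this example, and it says nothing about how any particular search engine actually treats the value.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Sorts a page's links into followable links and rel="nofollow" links."""

    def __init__(self):
        super().__init__()
        self.follow = []
        self.nofollow = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        if not href:
            return
        # rel holds space-separated tokens, compared case-insensitively.
        rel_tokens = (attrs.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            self.nofollow.append(href)
        else:
            self.follow.append(href)
```

A well-behaved crawler would only queue the URLs in follow. Nothing forces a robot to do this, which is why nofollow can’t be relied on to keep anything private.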