Internet Information Server 4.0

Chapter 14

Analyzing Web Sites

 

Internet Information Server’s most important job is to keep the WWW service running and optimized.  In other words, the main duty of IIS is to make web pages available.  If you’re a web page developer, you need to get all the information you can about your website.  If you’re not a developer but the “webmaster”, you need to be able to inform your developers about problems with individual web pages or the entire web site.

 

This class is not about making web pages.  However, a good understanding of website development will give you a better understanding of the functions and purposes of IIS.  I highly recommend you learn FrontPage 98 or FrontPage 2000.  Eastfield will be offering classes on FrontPage 98 this summer.  If you’re at all interested in learning web page development, these classes will help enormously.

 

IIS 4.0 includes a program called Site Server Express 2.0.  This is a watered-down version of Microsoft’s full retail product called Site Server.  Site Server Express includes programs that will allow you to:

1.      Analyze the content of your website

2.      Create reports based on the results of your analysis

3.      Create logs of web site activity so you can analyze the usage of your website.

 

Site Server Express is really cool.  When you get into web site development, you will think it’s really cool too!

 

Introduction to Site Server Express

 

SSE allows you to:

1.      Publish Content

2.      Manage Content

3.      Analyze Usage

 

You do this stuff via:

1.      Content Analyzer

2.      Usage Import

3.      Report Writer

4.      Posting Acceptor

 

Content Analyzer allows you to see the layout of your entire site.  It will find broken links and create detailed HTML formatted reports of its findings.
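
To get a feel for what “finding broken links” involves, here is a minimal Python sketch.  It is not how Content Analyzer itself is implemented; it only illustrates the idea.  The site is modeled as a small in-memory dictionary, and the page names and the helper names (LinkCollector, find_broken_links) are made up for this example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the targets of <a href> and <img src> tags in one page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "img" and attrs.get("src"):
            self.links.append(attrs["src"])


def find_broken_links(site):
    """Return (page, target) pairs whose target does not exist in the site.

    `site` maps a path to its HTML text, or to None for non-HTML objects.
    """
    broken = []
    for path, html in sorted(site.items()):
        if html is None:
            continue  # images and other objects contain no links
        collector = LinkCollector()
        collector.feed(html)
        for target in collector.links:
            if target not in site:
                broken.append((path, target))
    return broken


# Hypothetical three-object site, used only for illustration.
SAMPLE_SITE = {
    "/default.htm": '<a href="/products.htm">Products</a> <img src="/logo.gif">',
    "/products.htm": '<a href="/default.htm">Home</a> <a href="/prices.htm">Prices</a>',
    "/logo.gif": None,
}

print(find_broken_links(SAMPLE_SITE))
```

In this sample, /products.htm links to /prices.htm, which does not exist, so that is the one broken link reported.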

 

Content Analyzer will also create something called a “WebMap”.  The WebMap is a visual database of your website.  It stores information on all the objects in your website, like graphics, sounds, text, video, and the like.

 

There are two ways that you can view your WebMap:

1.      The Tree View

2.      The Cyberbolic View

We’ll look at these in more detail later.

 

Content Analyzer will provide you with a Site Summary Report.  The report will give you a LOT of information on things like the count and sizes of pages, images, and applications.

 

Usage Import and Report Writer allow you to import your IIS logs and then, using one of the 21 predefined templates, generate a detailed report based on the template you’ve selected.

 

Report Writer generates reports in a variety of formats, including:

1.      HTML

2.      Word

3.      Excel

 

The Posting Acceptor allows your customers to upload files via the HTTP Post command, rather than having to use FTP and a dedicated FTP client program.

 

Using WebMaps

 

As mentioned above, a WebMap can show you every object on your website.  These include:

·        Pages

·        Audio files

·        Video files

·        Java Applets

·        ActiveX Controls

·        RealAudio

·        And more…

 

There are two ways that you can look at this information: the Tree View and the Cyberbolic View.

 

The Tree View is much like what you see in the Explorer, except there is only one pane.  A little icon shows the type of each object it has found.  You’ll see a plus sign next to a page icon, and when you expand it, you see the objects contained on the page, if there are any.

 

The Cyberbolic View is unlike anything you’ve probably seen before (unless you’ve used FrontPage 97/98/2000 before).  It is a graphical layout of your site, and gives you a real sense of why it’s called the “World Wide Web”.  The only way to know how to work with the Cyberbolic View is to look at it, which we’ll do in some of the exercises.

 

Quick Search

Content Analyzer includes a tool called “Quick Search”.  This tool allows you to search for the eight most common errors encountered on a website.  These include:

1.      Broken Links – will find broken links on your web

2.      Home Site Objects – Finds objects located in your domain, and finds problems with these objects

3.      Images without ALT – The ALT attribute is used to describe pictures; it is very useful for people who are using text-only browsers, and for the blind who use screen reading software.

4.      Load Size over 32k – The cutoff for what is considered maximum page size at 28.8kbps is 32k because of the download time involved.  This will change when the majority of people use faster links.

5.      Non-Home Site Objects – This will provide a list of objects that are NOT contained on your website, and therefore represents a list of objects that are not under your direct control.

6.      Not Found Objects (404) – These are objects that could not be located, for whatever reason.

7.      Unavailable Objects – These are like the 404s, but the category is more comprehensive because it also includes objects that are on unavailable servers, behind broken com links, or password protected.

8.      Unverified Objects – Shows objects that have not been checked to determine whether they are accessible
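
Two of these checks are easy to sketch in Python: “Images without ALT” and “Load Size over 32k”.  This is only an illustration, not Content Analyzer’s own code.  It assumes “32k” means 32 kilobytes, and the download time is an ideal figure that ignores latency, protocol overhead, and modem compression.  The helper names are made up for this example.

```python
from html.parser import HTMLParser

MODEM_BPS = 28_800       # a 28.8kbps modem, as in the text
SIZE_LIMIT = 32 * 1024   # assumption: "32k" means 32 kilobytes


class AltChecker(HTMLParser):
    """Records the src of every <img> tag that has no ALT attribute."""

    def __init__(self):
        super().__init__()
        self.missing_alt = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and "alt" not in attrs:
            self.missing_alt.append(attrs.get("src", "(no src)"))


def images_without_alt(html):
    checker = AltChecker()
    checker.feed(html)
    return checker.missing_alt


def download_seconds(size_bytes, bps=MODEM_BPS):
    """Ideal transfer time in seconds: bits to send divided by bits per second."""
    return size_bytes * 8 / bps


def over_load_size(size_bytes, limit=SIZE_LIMIT):
    return size_bytes > limit


# Hypothetical page fragment, used only for illustration.
SAMPLE_HTML = '<img src="logo.gif" alt="Company logo"><img src="banner.gif">'

print(images_without_alt(SAMPLE_HTML))
print(round(download_seconds(SIZE_LIMIT), 1), "seconds for a 32k page")
```

Under these assumptions a page right at the 32k cutoff takes roughly nine seconds to arrive over a 28.8kbps link, which gives a sense of why the limit sits where it does.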

 

Site Summary Reports

 

Content Analyzer will do a complete site summary report on the web site that you point it to.  This report includes:

1.      Counts and sizes of objects on the web site

2.      Count of objects and links that are OK, missing, or in error

3.      The number of levels in use by the site, where the home page is considered level one

4.      The average number of links

5.      And a LOT MORE…
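
The “levels” and “average number of links” figures can be illustrated with a short Python sketch.  It assumes a page’s level is one more than the level of the nearest page that links to it, starting from the home page at level one; the notes do not say exactly how Content Analyzer counts levels, so treat that as an assumption.  The link table is made up for this example.

```python
from collections import deque


def page_levels(links, home):
    """Breadth-first walk from the home page; the home page is level one."""
    levels = {home: 1}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in levels:  # first visit is the shortest route
                levels[target] = levels[page] + 1
                queue.append(target)
    return levels


def average_links(links):
    """Average number of outgoing links per page."""
    return sum(len(targets) for targets in links.values()) / len(links)


# Hypothetical four-page site: each page maps to the pages it links to.
SAMPLE_LINKS = {
    "/default.htm": ["/products.htm", "/about.htm"],
    "/products.htm": ["/widgets.htm", "/default.htm"],
    "/about.htm": [],
    "/widgets.htm": ["/default.htm"],
}

levels = page_levels(SAMPLE_LINKS, "/default.htm")
print("Levels in use:", max(levels.values()))
print("Average links per page:", average_links(SAMPLE_LINKS))
```

Here the home page is level one, the products and about pages are level two, and the widgets page is level three.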

 

On the next few pages there is an explanation of the report findings.  You won’t be tested on these, but keep in mind where these explanations are, because you will want to refer to them after doing your own site reports.

 

Using Log Files

 

As we’ve seen when working with the property sheets of the various components within IIS, we can log information about the WWW Service, FTP Service, SMTP Service, NNTP Service, and so on.

 

With all this logging going on, what can we do with it?  IIS logging is much more detailed than the logging of events that you see in the NT Event Viewer.  IIS logs are kept in one of two forms:

1.      Plain ASCII text files

2.      ODBC database formats

 

What can we do with this information?

·        Find out which users are accessing our site

·        Get information about the information they are requesting to see

·        See what content we have that is the most popular

·        Plan security needs and solutions

·        Detect and troubleshoot potential problems with our WWW or FTP sites

·        And a lot more…

 

There are several types of log formats that we can choose from:

1.      Microsoft IIS Log File Format
This is a fixed ASCII format

2.      NCSA Common Log File Format
Another fixed ASCII format, available for Web sites but not FTP sites

3.      W3C Extended Log File Format
A customizable ASCII format

4.      ODBC Logging
A fixed format logged to a database
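
To show what a fixed ASCII format looks like in practice, here is a minimal Python sketch that parses one line in the NCSA Common Log File Format (host, identity, user, timestamp, request line, status code, bytes sent).  The sample line, address, and file name are made up for illustration, and the function name is my own.

```python
import re
from datetime import datetime

# One NCSA Common Log File Format entry:
# host ident user [timestamp] "request line" status bytes
CLF_PATTERN = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)$'
)


def parse_clf_line(line):
    """Return a dict of fields, or None if the line does not match."""
    match = CLF_PATTERN.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["time"] = datetime.strptime(fields["time"], "%d/%b/%Y:%H:%M:%S %z")
    fields["status"] = int(fields["status"])
    # A "-" in the bytes column means no body was sent.
    fields["bytes"] = 0 if fields["bytes"] == "-" else int(fields["bytes"])
    return fields


# Hypothetical log line, used only for illustration.
SAMPLE_LINE = (
    '192.168.1.10 - - [12/Apr/1999:14:05:31 -0500] '
    '"GET /default.htm HTTP/1.0" 200 4096'
)

entry = parse_clf_line(SAMPLE_LINE)
print(entry["host"], entry["request"], entry["status"], entry["bytes"])
```

Because the format is fixed, every line has the same fields in the same order, which is what makes it easy for a tool to import.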

 

We can use Site Server Express to analyze these logs.  However, we must import them first.  One limitation of SSE is that it will only import the IIS log file format or the NCSA Common log file format.

 

When saving log files, you can instruct IIS to save data for:

1.      The last 24 hours

2.      The last seven days

3.      The last month

4.      Or, until the log reaches a certain size.  After that, IIS starts a new log file.

 

Report Writer

 

After you have imported your logs you can then use Report Writer to produce reports from that information.  When you open Report Writer, you are given the option to:

1.      Choose an analysis from a catalog of standard reports

2.      Create your own from scratch

3.      Select a report you have previously created

 

The report writer has over 20 reports that are grouped as detail or summary reports. You can edit these reports to create your own custom reports.

 

Detail Reports

1.      Bandwidth
Shows byte transfers on an hourly, daily, and weekly basis.  Lets you see trends so you can plan appropriately.

2.      Browser and Operating System
Shows what browsers users have been using to access your site.  This will aid developers in coding for the browsers that most frequently visit your site.  With the demise of Netscape, it will be a lot easier for developers, as they won’t have to worry about coding web pages for non-standard browsers like Netscape Navigator.

3.      Geography
Shows the location of your visitors.  Whois and DNS resolution must be done before this report can be completed.

4.      Hit
Shows the number of hits on the server each hour, day, and week, in addition to the average number of hits for a day of the week and an hour of the day.  This is helpful in planning for capacity.

5.      Organization
Shows the organizations that visit the web site.  Again, Whois and DNS queries must be done in order for this report to be completed.

6.      Referrer
Shows the top external organization names and URLs that users linked from to reach your web site.  This lets you know if your advertisements are working from the web sites that you’ve placed them on.  This requires that you’ve configured your logs to keep referrer data, and that you have done DNS queries first.

7.      Request
Shows the most and least requested documents over time and by folder.  If no one is looking at some of your pages, you might delete them and regain space on your server for other content.

8.      User
Lists the number of overall and first time visitors to the site, and the average number of visits per user, users per organization, and requests and length of visit per user.  The chart will also give trends in usage by registered and unregistered users over time.  This will let you know if first time users are coming back, or if you’re only getting new users.  You want people to come back, and ideally, make your site their homepage, so that every time they open up their browser, yours is the first page they see.

9.      Visits
Lists the number of requests made per visit by those users who do the most requesting, the average length of a visit, and the pages of your site that users are likely to access first during a visit or last just before leaving your site.
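
The Request and Hit reports boil down to counting.  Here is a minimal Python sketch, assuming the log lines have already been reduced to (hour of day, requested path) pairs.  The data is made up, and the real Report Writer templates produce far more than this.

```python
from collections import Counter

# Hypothetical parsed log entries: (hour of day, requested path).
ENTRIES = [
    (9, "/default.htm"),
    (9, "/products.htm"),
    (10, "/default.htm"),
    (14, "/default.htm"),
    (14, "/products.htm"),
    (14, "/about.htm"),
]


def request_report(entries):
    """Documents ordered from most to least requested."""
    return Counter(path for _, path in entries).most_common()


def hits_by_hour(entries):
    """Number of hits for each hour of the day that had any traffic."""
    counts = Counter(hour for hour, _ in entries)
    return sorted(counts.items())


for path, count in request_report(ENTRIES):
    print(f"{count:3d}  {path}")

for hour, count in hits_by_hour(ENTRIES):
    print(f"{hour:02d}:00  {count} hits")
```

The bottom of the first list is where you would look for pages nobody is reading, and the second list shows when the server is busiest.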

 

Summary Reports

1.      Bandwidth
Lists the average amount of bandwidth used each day, and offers analyses by day of week, hour of day, and work versus non-work hours.

2.      Browser and Operating System
Indicates which browsers and operating systems the visitors to your site are using, and whether you should adjust your content to accommodate them.

3.      Executive
Highlights the information on the detail reports

4.      Executive Summary for Extended Logs
Highlights the information in the Detail reports for users of extended logs.

5.      Geography
Gives a summary of the geographic analysis of visitors to your site.

 

Posting Acceptor

The Posting Acceptor allows users to upload content to their web sites using HTTP Post, so they do not need a dedicated FTP program to transfer their files.
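
To make “uploading with HTTP Post” a little more concrete, here is a minimal Python sketch that builds (but does not send) a file-upload request.  It assumes the common multipart/form-data encoding that browsers use for form-based file uploads.  The URL, the form field name, and the function name are all hypothetical; the notes do not give the address or settings a real Posting Acceptor installation expects.

```python
import urllib.request
import uuid


def build_upload_request(url, filename, data, field="file"):
    """Build an HTTP POST request carrying one file as multipart/form-data."""
    boundary = uuid.uuid4().hex  # separator that will not appear in the file
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n"
        "\r\n"
    ).encode("ascii")
    tail = f"\r\n--{boundary}--\r\n".encode("ascii")
    return urllib.request.Request(
        url,
        data=head + data + tail,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )


# Hypothetical target URL and page content; nothing is sent over the network.
req = build_upload_request(
    "http://www.example.com/upload", "index.htm", b"<html></html>"
)
print(req.get_method(), len(req.data), "bytes in the request body")
```

The point is that the file travels inside an ordinary web request, so any program that can speak HTTP can do the upload.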

 

Programs that use the HTTP Post command include:

·        Microsoft Web Publishing Wizard

·        Internet Explorer

·        Netscape Navigator 2.02

·        Microsoft FrontPage, using the HTTP Post method for uploading files to web sites that are not FrontPage-enabled.