Internet Information Server 4.0
Chapter 14
Analyzing Web Sites
Internet Information Server’s most important job is to make sure the WWW service is running and optimized. This means that the main duty of IIS is to make web pages available. If you’re a web page developer, you need to get all the information you can about your website. If you’re not a developer, but the “webmaster”, you need to be able to inform your developers about problems with the web pages or the entire web site.
This class is not about making web pages. However, a good understanding of website development will give you a better understanding of the functions and purposes of IIS. I highly recommend you learn FrontPage 98 or FrontPage 2000. Eastfield will be offering classes on FrontPage 98 this summer. If you’re at all interested in learning web page development, these classes will help enormously.
IIS 4.0 includes a program called Site Server Express 2.0. This is a watered-down version of Microsoft’s full retail product called Site Server. Site Server Express includes programs that will allow you to:
1. Analyze the content of your website
2. Create reports based on the results of your analysis
3. Create logs of web site activity so you can analyze the usage of your website
Site Server Express is really cool. When you get into web site development, you will think it’s really cool too!
SSE allows you to:
1. Publish Content
2. Manage Content
3. Analyze Usage
You do this stuff via:
1. Content Analyzer
2. Usage Import
3. Report Writer
4. Posting Acceptor
Content Analyzer allows you to see the layout of your entire site. It will find broken links and create detailed HTML-formatted reports of its findings.
Content Analyzer will also create something called a “WebMap”. The WebMap is a visual database of your website. It stores information on all the objects in your website, like graphics, sounds, text, video, and the like.
There are two ways that you can view your WebMap:
1. The Tree View
2. The Cyberbolic View
We’ll look at these in more detail later.
Content Analyzer will provide you with a Site Summary Report. The report will give you a LOT of information on things like the count and sizes of pages, images, and applications.
Usage Import and Report Writer allow you to import your IIS logs and then, using one of the 21 predefined templates, have a nice detailed report written for you based on the template you’ve selected.
Report Writer generates reports in a variety of formats, including:
1. HTML
2. Word
3. Excel
The Posting Acceptor allows your customers to upload files via the HTTP Post command, rather than having to use FTP and a dedicated FTP client program.
As mentioned above, a WebMap can show you every object on your website. These include:
· Pages
· Audio files
· Video files
· Java Applets
· ActiveX Controls
· RealAudio
· And more…
There are two ways that you can look at this information: the Tree View and the Cyberbolic View.
The Tree View is much like what you see in the Explorer, except there is only one pane. A small icon shows the type of each object it has found. You’ll see a plus sign next to a page icon, and when you expand it, you see the objects contained on the page, if there are any.
The Cyberbolic View is unlike anything you’ve probably seen before (unless you’ve used FrontPage 97/98/2000 before). It is a graphical layout of your site, and gives you a real sense of why it’s called the “World Wide Web”. The only way to know how to work with the Cyberbolic View is to look at it, which we’ll do in some of the exercises.
Content Analyzer includes a tool called “Quick Search”. This tool allows you to search for the eight most common errors encountered on a website. These include:
1. Broken Links – Finds broken links on your web site.
2. Home Site Objects – Finds objects located in your domain, and finds problems with these objects.
3. Images without ALT – The ALT attribute is used to describe pictures; it is very useful for people who are using text-only browsers, and for the blind who use screen reading software.
4. Load Size over 32k – The cutoff for what is considered maximum page size at 28.8kbps is 32k because of the download time involved. This will change when the majority of people use faster links.
5. Non-Home Site Objects – Lists objects that are NOT contained on your website and are therefore not under your direct control.
6. Not Found Objects (404) – These are objects that could not be located, for whatever reason.
7. Unavailable Objects – These are like the 404s, but the list is more comprehensive because it includes objects on unavailable servers, broken com links, and password-protected objects.
8. Unverified Objects – Shows objects that have not been checked to determine whether they are accessible.
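To get a feel for what a few of these searches involve, here is a minimal Python sketch. It is not how Content Analyzer itself works; the class name `PageScanner`, the function `scan_page`, and the sample page are all made up for illustration. It flags images without ALT text, separates home site links from non-home site links by host name, and checks the HTML against the 32k cutoff (only the page text is measured here, not the images it loads).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

MAX_LOAD_BYTES = 32 * 1024  # the 32k load-size cutoff from the list above


class PageScanner(HTMLParser):
    """Collects link targets and images that have no ALT text."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.links = []
        self.images_without_alt = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            # Resolve relative links against the page's own URL.
            self.links.append(urljoin(self.page_url, attrs["href"]))
        elif tag == "img" and not attrs.get("alt"):
            # A missing or empty ALT attribute is treated as "without ALT".
            self.images_without_alt.append(
                urljoin(self.page_url, attrs.get("src") or "")
            )


def scan_page(page_url, html):
    scanner = PageScanner(page_url)
    scanner.feed(html)
    home_host = urlparse(page_url).netloc
    return {
        "images_without_alt": scanner.images_without_alt,
        "home_site_links": [
            u for u in scanner.links if urlparse(u).netloc == home_host
        ],
        "non_home_site_links": [
            u for u in scanner.links if urlparse(u).netloc != home_host
        ],
        "over_load_size": len(html.encode("utf-8")) > MAX_LOAD_BYTES,
    }


# Hypothetical page used only to exercise the sketch.
SAMPLE = """<html><body>
<a href="/about.htm">About</a>
<a href="http://www.example.org/">Elsewhere</a>
<img src="logo.gif">
<img src="photo.jpg" alt="Campus photo">
</body></html>"""

result = scan_page("http://www.example.com/default.htm", SAMPLE)

# Rough download time for a 32k page at 28.8kbps, ignoring protocol overhead.
seconds_at_28_8 = MAX_LOAD_BYTES * 8 / 28_800
```

The last line shows where the 32k figure comes from: at 28.8kbps a page of that size takes roughly nine seconds to arrive, before any modem or protocol overhead is counted.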
Content Analyzer will do a complete site summary report on the web site that you point it to. This report includes:
1. Counts and sizes of objects on the web site
2. Count of objects and links that are OK, missing, or in error
3. The number of levels in use by the site, where the home page is considered level one
4. The average number of links
5. And a LOT MORE…
On the next few pages there is an explanation of the report findings. You won’t be tested on these, but keep in mind where these explanations are, because you will want to refer to them after doing your own site reports.
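The “levels” figure in the site summary is easier to picture with a small example. The Python sketch below is only an illustration, not Content Analyzer’s own method: it walks a made-up site (the `SITE` dictionary and the function name `page_levels` are hypothetical) outward from the home page, counting the home page as level one, and then works out an average number of links per page.

```python
from collections import deque


def page_levels(links, home):
    """Return {page: level}, with the home page at level one."""
    levels = {home: 1}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in levels:
                # First time we reach this page: one level below its parent.
                levels[target] = levels[page] + 1
                queue.append(target)
    return levels


# Hypothetical site: each page maps to the pages it links to.
SITE = {
    "default.htm": ["news.htm", "classes.htm"],
    "news.htm": ["default.htm"],
    "classes.htm": ["iis.htm", "frontpage.htm"],
    "iis.htm": ["default.htm"],
}

levels = page_levels(SITE, "default.htm")
depth = max(levels.values())
average_links = sum(len(targets) for targets in SITE.values()) / len(levels)
```

For this made-up site the deepest pages sit at level three. Whether the real report averages links per page in exactly this way is an assumption.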
As we’ve seen when working with the property sheets of the various components within IIS, we can log information about the WWW Service, FTP Service, SMTP Service, NNTP Service, and so on.
With all this logging going on, what can we do with it? IIS logging is much more detailed than the logging of events that you see in the NT Event Viewer. The IIS logs are maintained in one of two formats:
1. Plain ASCII text files
2. ODBC database formats
What can we do with this information?
· Find out which users are accessing our site
· Get information about the content they are requesting to see
· See which of our content is the most popular
· Plan security needs and solutions
· Detect and troubleshoot potential problems with our WWW or FTP sites
· And a lot more…
There are several types of log formats that we can choose from:
1. Microsoft IIS Log File Format
   This is a fixed ASCII format
2. NCSA Common Log File Format
   Another fixed ASCII format, available for web sites but not FTP sites
3. W3C Extended Log File Format
   A customizable ASCII format
4. ODBC Logging
   A fixed format logged to a database
We can use Site Server Express to analyze these logs. However, we must import them first. One limitation of SSE is that it will only import the IIS log file format or the NCSA Common log file format.
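To see what an import has to deal with, here is a minimal Python sketch that reads one line of an NCSA Common log. This is not what Usage Import does internally; the function name `parse_ncsa` and the sample line are made up for illustration. The field order (host, identity, user, timestamp, request, status, bytes) is the standard one for this format.

```python
import re
from datetime import datetime

# host ident user [timestamp] "request" status bytes
NCSA_LINE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def parse_ncsa(line):
    """Return a dict for one log line, or None if the line does not match."""
    match = NCSA_LINE.match(line)
    if match is None:
        return None
    record = match.groupdict()
    # Month names are parsed with the current locale; English is assumed.
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    record["bytes"] = 0 if record["bytes"] == "-" else int(record["bytes"])
    parts = record["request"].split()
    record["method"] = parts[0] if parts else ""
    record["path"] = parts[1] if len(parts) > 1 else ""
    return record


# Hypothetical log line used only to exercise the sketch.
SAMPLE_LINE = (
    '192.168.1.10 - jsmith [12/Apr/1999:14:05:31 -0500] '
    '"GET /default.htm HTTP/1.0" 200 4521'
)
record = parse_ncsa(SAMPLE_LINE)
```

Once each line is a record like this, the questions listed earlier (who is visiting, what they request, what is popular) become simple counting exercises.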
When saving log files, you can instruct IIS to save data for:
1. The last 24 hours
2. The last seven days
3. The last month
4. Or, until the log reaches a certain size. After that, IIS starts a new log file.
After you have imported your logs, you can then use Report Writer to produce reports from that information. When you open Report Writer, you are given the option to:
1. Choose an analysis from a catalog of standard reports
2. Create your own from scratch
3. Select a report you have previously created
Report Writer has over 20 reports that are grouped as detail or summary reports. You can edit these reports to create your own custom reports. The detail reports are:
1. Bandwidth
   Shows byte transfers on an hourly, daily, and weekly basis. Lets you see trends so you can plan appropriately.
2. Browser and Operating System
   Shows what browsers users have been using to access your site. This will aid developers in coding for the browsers that most frequently visit your site. With the demise of Netscape, it will be a lot easier for developers, as they won’t have to worry about coding web pages for non-standard browsers like Netscape Navigator.
3. Geography
   Shows the location of your visitors. This requires Whois and DNS resolution before the report can be completed.
4. Hit
   Shows the number of hits on the server each hour, day, and week, in addition to the average number of hits for a day of the week and an hour of the day. This is helpful in planning for capacity.
5. Organization
   Shows the organizations that visit the web site. Again, Whois and DNS queries must be done before this report can be completed.
6. Referrer
   Shows the top external organization names and URLs that users linked from to reach your web site. This lets you know if your advertisements are working on the web sites where you’ve placed them. This requires that you’ve configured your logs to keep referrer data, and that you have done DNS queries first.
7. Request
   Shows the most and least requested documents over time and by folder. If no one is looking at some of your pages, you might delete them and regain space on your server for other content.
8. User
   Lists the number of overall and first-time visitors to the site, and the average number of visits per user, users per organization, and requests and length of visit per user. The chart will also give trends in usage by registered and unregistered users over time. This will let you know if first-time users are coming back, or if you’re only getting new users. You want people to come back and, ideally, make your site their homepage, so that every time they open their browser, yours is the first page they see.
9. Visits
   Lists the number of requests made per visit by those users who do the most requesting, the average length of a visit, and the pages of your site that users are likely to access first during a visit or last just before leaving your site.
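The Hit and Request reports in this list are, at bottom, counts over the imported log records. The Python sketch below shows the idea on a handful of made-up records; the `RECORDS` list and the variable names are hypothetical, and the real reports include far more than this.

```python
from collections import Counter
from datetime import datetime

# Hypothetical, already-parsed log records: (timestamp, requested document).
RECORDS = [
    (datetime(1999, 4, 12, 9, 15), "/default.htm"),
    (datetime(1999, 4, 12, 9, 40), "/classes.htm"),
    (datetime(1999, 4, 12, 14, 5), "/default.htm"),
    (datetime(1999, 4, 13, 9, 2), "/default.htm"),
    (datetime(1999, 4, 13, 16, 30), "/old/news97.htm"),
]

# Hit report: how busy is the server by hour of day and day of week?
hits_by_hour = Counter(ts.hour for ts, _ in RECORDS)
hits_by_weekday = Counter(ts.weekday() for ts, _ in RECORDS)  # 0 = Monday

# Request report: which documents are asked for most and least often?
requests = Counter(path for _, path in RECORDS)
most_requested = requests.most_common(1)[0]
fewest = min(requests.values())
least_requested = [path for path, count in requests.items() if count == fewest]
```

Pages that keep showing up in `least_requested` are the candidates for the cleanup the Request report suggests.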
The summary reports are:
1. Bandwidth
   Lists the average amount of bandwidth used each day, and offers analyses by day of week, hour of day, and work versus non-work hours.
2. Browser and Operating System
   Indicates which browsers and operating systems the visitors to your site are using, and whether you should adjust your content to accommodate them.
3. Executive
   Highlights the information in the detail reports.
4. Executive Summary for Extended Logs
   Highlights the information in the detail reports for users of extended logs.
5. Geography
   Gives a summary of the geographic analysis of visitors to your site.
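As a rough picture of what the Bandwidth summary adds up, the Python sketch below totals bytes per day and splits them into work and non-work hours. The `TRANSFERS` data is made up, and the choice of 8:00 to 16:59 on weekdays as “work hours” is an assumption for the example, not something taken from Report Writer.

```python
from collections import defaultdict
from datetime import datetime

WORK_HOURS = range(8, 17)  # assumption: 8:00 to 16:59 counts as work hours

# Hypothetical, already-parsed log records: (timestamp, bytes sent).
TRANSFERS = [
    (datetime(1999, 4, 12, 9, 15), 4521),
    (datetime(1999, 4, 12, 20, 40), 10240),
    (datetime(1999, 4, 13, 14, 5), 2048),
    (datetime(1999, 4, 13, 23, 2), 512),
]

bytes_per_day = defaultdict(int)
work_bytes = 0
non_work_bytes = 0
for ts, sent in TRANSFERS:
    bytes_per_day[ts.date()] += sent
    # Weekdays are 0 (Monday) through 4 (Friday).
    if ts.weekday() < 5 and ts.hour in WORK_HOURS:
        work_bytes += sent
    else:
        non_work_bytes += sent

average_per_day = sum(bytes_per_day.values()) / len(bytes_per_day)
```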
The Posting Acceptor allows users to upload content to their web sites with HTTP Post, so they do not need a dedicated FTP program to transfer their files.
Programs that use the HTTP Post command include:
· Microsoft Web Publishing Wizard
· Internet Explorer
· Netscape Navigator 2.02
· Microsoft FrontPage, using the HTTP Post method to upload files to web sites that are not FrontPage-enabled
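To show what an HTTP Post upload looks like on the wire, here is a minimal Python sketch that builds the body of a form-based file upload (multipart/form-data). It assumes the receiving server accepts standard form-based uploads; the function name `build_multipart`, the field name `upload`, and the file contents are made up for illustration, and nothing is actually sent.

```python
import uuid


def build_multipart(field_name, filename, content, content_type="text/html"):
    """Return (headers, body) for a single-file multipart/form-data POST."""
    boundary = uuid.uuid4().hex  # random marker that separates the parts
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; '
        f'filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n"
        "\r\n"
    ).encode("ascii")
    tail = f"\r\n--{boundary}--\r\n".encode("ascii")
    body = head + content + tail
    headers = {
        "Content-Type": f"multipart/form-data; boundary={boundary}",
        "Content-Length": str(len(body)),
    }
    return headers, body


# Hypothetical file to upload.
headers, body = build_multipart("upload", "default.htm", b"<html>Hello</html>")
```

To send it, you could pass `body` and `headers` to `urllib.request.Request` with `method="POST"` and the address of the server’s upload page. That address depends on how the server is set up, so none is given here.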