
Stateful -vs- Stateless
HTTP is a stateless protocol. This means that an HTTP server has no information in a request
to tie it to any other request. The data in a response is based only on
the information the client sends in the request. It's like doing a math
problem in high school -- you are only allowed to use the facts
given in the problem plus mathematical logic to derive an answer.
HTTP stands out from most of the other protocols you are probably
familiar with. Those protocols are all "stateful" or "stated", which
means information divulged in one request can be used to modify future
requests. In fact, these protocols have a concept of a "session"
wherein a batch of requests is sent and responses received. FTP
(File Transfer Protocol) has many states, including
"the current directory". SMTP (Simple Mail Transfer Protocol)
and POP (Post Office Protocol)
both include a concept of "who you are" which is used for all
requests. NNTP (Network News Transfer Protocol) allows you to "change
Usenet groups" to direct where future requests for articles will be
retrieved from.
Stateless protocols generally have the advantage that they require
fewer resources on the server -- the resources are pushed into
the client. But the disadvantage is that the client needs to tell the
server enough information on each request to be able to get the proper
answer. Cookies are a method for a server to ask the client to store
arbitrary data for use in future connections. The server is asking the
client to keep state information.
The hardest part of personalizing a Web page is maintaining
state -- tracking users as they click through
your site. Web browsers and servers have no built-in mechanisms to keep
tabs on and remember users as they go from page to page. That is, after
a user sends a request to the server and a Web page is returned, the
server forgets all about the user and the page she has just downloaded.
If a user clicks on a link, the server doesn’t have background
information about what page the user is coming from and, more
importantly, if the user returns to the page at a later date, there is
no information available to the server about the user’s previous
actions on the page.
Maintaining state can be important to developing complex interactive
applications. Several sites work around this problem using complex
server-side CGI scripts. But there is a simpler solution: newer browsers
address this problem with cookies, a method of storing information
locally in the browser and sending it to the server whenever the
appropriate pages are requested by the user. Because cookies allow Web
builders to ask a user for personal information, store the data on the
user's computer, and retrieve that knowledge when the user returns, they
are the most common way to track visitors.
The cookie mechanism allows servers to personalize pages
for each client, or remember selections the client has made when
browsing through various pages of a site -- all without having to
use a complicated (or more time-consuming) CGI/database system on the
server's side.
Cookies work in the following way: When a CGI program
identifies a new user, it adds an extra header to its response
containing an identifier for that user and other information that the
server may glean from the client's input. This header instructs the
cookie-enabled browser to add this information to the client's cookies
file. After this, all requests to that URL from the browser will include
the cookie information as an extra header in the request. The CGI
program uses this information to return a document tailored to that
specific client. The cookies are stored on the client user's hard
drive, so the information remains even when the browser is closed and
reopened.
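The exchange described above can be sketched in a few lines of JavaScript. This is only an illustration under stated assumptions: `handleRequest` stands in for the CGI program, `browse` stands in for a cookie-enabled browser with a one-entry cookie jar, and the cookie name `userId` is made up for the example.

```javascript
// Hypothetical server side: decide which headers to send back.
function handleRequest(requestHeaders, nextUserId) {
  const responseHeaders = { "Content-Type": "text/html" };
  const cookie = requestHeaders["Cookie"];
  if (cookie === undefined) {
    // New visitor: ask the browser to remember an identifier.
    responseHeaders["Set-Cookie"] = "userId=" + nextUserId + "; path=/";
    return { headers: responseHeaders, body: "Welcome, new visitor!" };
  }
  // Returning visitor: the browser sent the stored pair back.
  return { headers: responseHeaders, body: "Welcome back, " + cookie };
}

// Hypothetical browser side: a one-entry "cookie jar".
function browse(jar, nextUserId) {
  const requestHeaders = {};
  if (jar.cookie !== undefined) {
    requestHeaders["Cookie"] = jar.cookie;
  }
  const response = handleRequest(requestHeaders, nextUserId);
  const setCookie = response.headers["Set-Cookie"];
  if (setCookie !== undefined) {
    // Keep only NAME=VALUE; attributes such as path are not sent back.
    jar.cookie = setCookie.split(";")[0];
  }
  return response.body;
}

const jar = {};
console.log(browse(jar, 42)); // first visit: no Cookie header is sent
console.log(browse(jar, 43)); // second visit: the stored pair is sent back
```

The server keeps nothing between the two calls; everything it knows about the visitor on the second request arrives in the Cookie header.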
JavaScript provides the capability to work with client-side
information stored as cookies.
Cookies provide a method to store information at the client side and
have the browser provide that information to the server along with a
page request. A cookie always includes the address of the server that
sent it. That's the primary idea behind a cookie: Identification.
Where did the term cookies come from?
"Lou Montulli, (currently?) the
protocols manager in Netscape's client product division, wrote the
cookies specification for Navigator 1.0, the first browser to use the
technology. Montulli says there's nothing particularly amusing about
the origin of the name: 'A cookie is a well-known computer science
term that is used when describing an opaque piece of data held by an
intermediary. The term fits the usage precisely; it's just not a
well-known term outside of computer science circles.'"
The Truth about Cookies
- Cookies just identify the computer being used, not the individual
using the computer.
- A cookie is not a script. A cookie may be written by a
script (either a CGI or JavaScript) but the cookies themselves are
simply passive text strings.
- The Netscape specification limits a cookie to 4K
of text. Most cookies, however, rarely exceed 20-30
characters (a fraction of a kilobyte). The number of cookies
on your machine is limited to 20 per site visited, up to a maximum
of 300 in total. When a limit is reached, the oldest cookies are deleted.
- Cookie security is such that only the originating domain can ever
use the contents of your cookie. The trick that companies such as doubleclick
use is to embed a graphic from their domain on a page from another
domain. When the graphic (usually a banner) is loaded the doubleclick
domain sets a cookie.
- The specifications allow for cookies to be set with or without an expiry
date. The former are called 'Persistent Cookies' and the
latter 'Non-persistent'. A cookie without a valid (future)
expiry date will not be stored on your machine but will be available
for the duration of the current session (i.e., until you quit the browser).
- Cookie files stored on the client computer are easily read by any
word processing program, text editor or web browsing software. If a
merchant actually stores sensitive information in a Cookie, that
information can be read by any Cookie savvy person with access to
the computer storing the Cookie. Most web merchants sophisticated
enough to use Cookie based shopping programs will take steps to
protect any information transmitted and stored via Cookie
technology. For example:
- The merchant could use only Secure Sockets Layer (SSL) or other
encryption-enabled Web pages to send and receive sensitive Cookie
information to protect that information from Web miscreants
sniffing that merchant's web correspondence. Any Cookie containing
sensitive information could be created using the
"secure" attribute so that it can be retrieved only by a
computer running SSL enabled software. Additionally, any sensitive
information actually stored in the Cookie should be encrypted to
hide it from others with access to the web surfer's computer.
- Better yet, the merchant can use a "short form Cookie"
that does not store the actual data but instead contains a pointer
that the merchant's computer can use to locate the file on the
merchant's machine where the information collected is stored.
The bottom line is that an unsuspecting Web consumer, using current
Cookie-enabled browsers in their default mode and ignorant of the fact
or content of a cookie, must rely on the merchant to "do the
right thing".
What Cookies cannot do
- Cookies CANNOT be used to get a person's e-mail address.
They can save the e-mail address after a user types it into a
form, but they can't GET anything. A cookie is just a holder.
- Cookies do not steal credit card numbers, passwords or any other
information. Rather they allow a web site to store information a
visitor voluntarily submits to that web site on that visitor's
machine. In this regard, Cookies are no different than the
traditional databases maintained by retail stores, mail order
houses, and other merchants that so many of us trust implicitly with the
same information, except that the Cookie is stored only on the machine
used to supply that information.
- Cookies cannot be accessed by any computer other than the computer
that created the cookie. Yes, if a web surfer goes to Company Y's
web page and orders a product, Company Y can store whatever
information that surfer is required to provide to complete that sale
as a Cookie on the surfer's machine. Equally true is that only
Company Y can retrieve that information. Companies A, B, C etc.,
running on a different computer, cannot access any of the data
stored in the Company Y Cookie. Bottom line, storing the information
in a Cookie poses no greater risk of Company Y misconduct than
providing Company Y access to that same information via mail,
telephone, fax or a Cookie-less web page.
- Web sites that send Cookies cannot, by virtue of creating that
Cookie, access any information stored on the system housing the
Cookie that does not appear in that Cookie. The Cookie at most
allows the web site creating it to retrieve from a visitor's system
information that visitor has already submitted to that web site.
Cookies can be used for a multitude of tasks including:
- Reminder calendars that use cookies to store
appointments and other messages.
- Country tours that users can take during several
visits to a Website; cookies are used to remember where the user
left off.
- Adventure games that use cookies to keep track of
pertinent character data and the current state of the game.
- Storing data as you move from one page (or frame) to another, for
example shopping carts.
- Saving user preferences.
- Greeting people by name.
- Notifying visitors of what has changed since their
last visit.
- Using CGI you can use a cookie to identify repeat visitors to your
site and their movement patterns.
The last point and others like it cause concern for some users. What
you should realize is that tracking of visitors existed long before
cookies. Using CGI and server-side scripts you can be tracked much
more efficiently than by the humble cookie.
cookies.txt
During a browsing session Netscape stores your cookies in memory, but
when you quit they go into a file called cookies.txt (e.g., in C:\Program
Files\Netscape\Users\Username). On a Macintosh the cookie jar is
called MagicCookie and resides in the preferences folder. Every time you
open your browser, your cookies are read in from disk, and every time
you close your browser, your cookies are re-saved to disk. As a cookie
expires, it is discarded from memory and it is no longer saved to the
hard drive.
www.sislands.com FALSE / FALSE 856869067 headCount 5
.sislands.com TRUE /javascript/week7/html FALSE 959145732 counter 3
www.sislands.com FALSE / FALSE 856869067 userName Frank%20Peter
Each line represents a single piece of stored information. A tab is
inserted between each of the fields.
- domain - The domain of the "originating" cookie. The domain
parameter takes the flexibility of the path parameter one step
further. If a site uses multiple servers within a domain, then it is
important to make the cookie accessible to pages on any of these
servers.
domain=www.sislands.com
Cookies can be assigned to individual machines, or to an entire
Internet domain or sub-domain. The only restriction on this value
is that it must contain at least two dots (.sislands.com, not
sislands.com) for the normal top-level domains, or three dots for
the "extended" domains (.ecom.sislands.com, not
ecom.sislands.com).
- flag - A TRUE/FALSE value indicating whether all
machines within a given domain can access the variable. This value is
set automatically by the browser, depending on the value you set for domain.
- path - If you provide a cookie path attribute, the
browser will check it against your script's URL before returning the
cookie. For example, if you specify the path "/cgi-bin",
then the cookie will be returned to each of the scripts
"/cgi-bin/tally.pl", "/cgi-bin/order.pl", and
"/cgi-bin/customer_service/complain.pl", but not to the
script "/cgi-private/site_admin.pl". The path "/foo"
would match "/foobar" and "/foo/bar.html". The
path "/" is the most general path. By default, path is set
to "/", which causes the cookie to be sent to any CGI
script on your site.
The two examples below should help explain the path and what it
means.
Cookies created in Week 1 & read in Week 1 (annotated version)
Cookies created in Week 1 & read in Week 7 (annotated version)
- secure - If the second boolean ("secure")
attribute is set, the cookie will only be sent to your script if the
CGI request is occurring on a secure channel, such as SSL
(default is false).
- expiry date - The large number before the cookie name. It
represents the number of seconds since Jan 1, 1970 00:00:00 GMT
(the epoch; JavaScript dates count milliseconds from the same instant).
Hence, there are no Y2K issues with Cookies.
- name and value - At the end of each line are the cookie name and the
cookie value: the cookie itself.
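As a rough illustration of the layout described above, the following sketch splits one tab-separated line of cookies.txt into its seven fields. The function name `parseCookieLine` and the property names are invented for the example. The expiry number is read as seconds since the epoch, which is what the sample values (for example 856869067) imply, and is multiplied by 1000 because JavaScript dates count milliseconds.

```javascript
// Field order as described above: domain, flag, path, secure, expiry, name, value.
const FIELD_COUNT = 7;

// Hypothetical helper: parse one line of a Netscape-style cookies.txt file.
function parseCookieLine(line) {
  const parts = line.split("\t");
  if (parts.length !== FIELD_COUNT) {
    return null; // comment, blank, or malformed line
  }
  return {
    domain: parts[0],
    allHosts: parts[1] === "TRUE", // the "flag" field
    path: parts[2],
    secure: parts[3] === "TRUE",
    expires: new Date(Number(parts[4]) * 1000), // seconds -> milliseconds
    name: parts[5],
    value: parts[6], // left encoded, exactly as stored in the file
  };
}

const sampleLine = "www.sislands.com\tFALSE\t/\tFALSE\t856869067\tuserName\tFrank%20Peter";
console.log(parseCookieLine(sampleLine));
```

The value is deliberately returned as stored; decoding it is a separate step that depends on how the cookie was encoded when it was set.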
Setting Cookies
To set a cookie it is only necessary to specify a name-value pair.
The domain will be set automatically and the path will be "/".
A cookie set without an expiry date will not be written to the cookie
file as it cannot persist beyond the current session.
Cookie values may not include semicolons, commas, or
whitespace. For this reason, you may want to use the JavaScript escape()
function to encode the value before storing it in the cookie. If you do
this you’ll have to use the corresponding unescape()
function when you read the cookie value.
escape() creates and
returns a new string that contains an encoded version of the string. The
string is encoded as follows: spaces, most punctuation, accented
characters, and any other characters that are not ASCII letters or numbers are
converted to the form %xx, where xx is the two hexadecimal digits
that represent the ISO-8859-1 (Latin-1) encoding of the character. For
example, the ! character has the Latin-1
encoding of 33, which is 21 hexadecimal, so escape()
replaces this character with the sequence %21.
Thus the expression:
escape("Hello World!");
yields the string:
Hello%20World%21
while
unescape("Hello%20World%21");
yields the string:
Hello World!
The purpose of the escape()
encoding is to ensure that the string is portable to all computers and
transmittable across all networks, regardless of the character encodings
the computer or networks support (as long as they support ASCII).
The encoding performed by escape()
is like the URL encoding used to encode query strings and other
portions of a URL that might include spaces, punctuation, or characters
outside the standard ASCII character set.
The only real difference is that in URL encoding
spaces are replaced with a ‘+’
character, while escape()
replaces spaces with the %20 sequence.
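A short sketch of the round trip described above. It also shows encodeURIComponent() and decodeURIComponent(), which are not mentioned in the text but are the standard functions generally preferred over escape() and unescape() in current JavaScript; note that the two encoders do not treat every character the same way.

```javascript
const raw = "Hello World!";

// Legacy pair discussed in the text.
const legacy = escape(raw);
console.log(legacy);           // Hello%20World%21
console.log(unescape(legacy)); // Hello World!

// Standard alternative: "!" is left as-is, but spaces,
// semicolons, and commas are all encoded.
const modern = encodeURIComponent(raw);
console.log(modern);                     // Hello%20World!
console.log(decodeURIComponent(modern)); // Hello World!
console.log(encodeURIComponent("a;b,c d")); // a%3Bb%2Cc%20d
```

Whichever pair is used, the same pair has to be used for both writing and reading the cookie value.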
Here is the syntax used to set a cookie using JavaScript:
document.cookie = "NAME=VALUE; expires=DATE; path=PATH; domain=DOMAIN; secure";
and from the server:
Set-Cookie: NAME=VALUE; expires=DATE; path=PATH; domain=DOMAIN; secure
Optional Attributes for Set-Cookie
| NAME | DESCRIPTION |
| NAME=VALUE | Both name and value can be any strings that do not contain a semicolon, space, or tab. An encoding such as URL encoding may be used if these characters are required in the name or value, as long as your script is prepared to handle it. |
| domain=DOMAIN | This attribute specifies a domain name range for which the cookie will be returned. The domain name must contain at least two dots (.), e.g., ".microsoft.com". This value would cover both "www.microsoft.com" and "msdn.microsoft.com", and any other server in the microsoft.com domain. When searching the cookie list for valid cookies, the domain attribute of each cookie is compared with the Internet domain name of the host from which the URL will be fetched. If there is a tail match, then the cookie will go through path matching to see if it should be sent. "Tail matching" means that the domain attribute is matched against the tail of the fully qualified domain name of the host. Only hosts within the specified domain can set a cookie for a domain, and domains must have at least two (2) or three (3) periods in them to prevent domains of the form ".com", ".edu", and ".us". Any domain that falls within one of the seven special top-level domains listed below requires only two periods. Any other domain requires at least three. The seven special top-level domains are "COM", "EDU", "NET", "ORG", "GOV", "MIL", and "INT". The default value of domain is the host name of the server that generated the cookie response. |
| expires=DATE | Specifies the expiry date of a cookie. After this date the cookie will no longer be stored by the client or sent to the server. DATE takes the form Wdy, DD-Mon-YY HH:MM:SS GMT; dates are only stored in GMT. By default, the cookie expires at the end of the browser session. |
| path=PATH | The path attribute is used to specify the subset of URLs in a domain for which the cookie is valid. If a cookie has already passed domain matching, then the pathname component of the URL is compared with the path attribute, and if there is a match, the cookie is considered valid and is sent along with the URL request. The path "/foo" would match "/foobar" and "/foo/bar.html". The path "/" is the most general path. If the path is not specified, it is assumed to be the same path as the document being described by the header which contains the cookie. NOTE: The more specific the path, the higher in the cookie "order" it will be read from the cookies.txt file. However, all the cookies from that domain will also be sent in the HTTP header. |
| secure | If a cookie is marked secure, it will only be transmitted if the communications channel with the host is a secure one. Currently this means that secure cookies will only be sent to HTTPS (HTTP over SSL) servers. If secure is not specified, a cookie is considered safe to be sent in the clear over unsecured channels. So mark it as secure if you are, for instance, running a JavaScript shopping cart with SSL. |
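The domain, path, and secure checks described in the table can be sketched as follows. This is a deliberately naive reading of the rules as stated here (plain tail match on the host name, plain prefix match on the path); the function names are invented for the example, and current browsers apply stricter matching rules than this.

```javascript
// Tail match: the cookie's domain must match the end of the host name.
function domainMatches(cookieDomain, host) {
  return host.toLowerCase().endsWith(cookieDomain.toLowerCase());
}

// Path match: the cookie's path must match the start of the request path.
function pathMatches(cookiePath, requestPath) {
  return requestPath.startsWith(cookiePath);
}

// Hypothetical helper combining the three checks.
function shouldSend(cookie, host, requestPath, isSecureChannel) {
  if (!domainMatches(cookie.domain, host)) return false;
  if (!pathMatches(cookie.path, requestPath)) return false;
  // A cookie marked secure is only sent over a secure channel.
  return !cookie.secure || isSecureChannel;
}

const exampleCookie = { domain: ".sislands.com", path: "/cgi-bin", secure: false };
console.log(shouldSend(exampleCookie, "www.sislands.com", "/cgi-bin/tally.pl", false)); // true
console.log(shouldSend(exampleCookie, "www.sislands.com", "/cgi-private/site_admin.pl", false)); // false
```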
In contrast to Set-Cookie, the Cookie field in a request header contains
only a set of NAME=VALUE pairs for the requested URL:
Cookie: name1=VALUE1; name2=VALUE2 …
Multiple Set-Cookie fields can be sent in a single response header
from the server.
Note: a cookie that has the same path and
name as an existing cookie will overwrite the old one. This can be
used as a way of erasing cookies: write a new one with an expiry
date that has already passed.
For a cookie to persist beyond the current session a valid expiry
date must be set. This is a number or date with a value greater than the
current time/date value. The best way to set an expiry date is to take
the current date value, add a set time period and convert to GMT
(remember we're on a global network). Future cookie standards may allow
setting a duration rather than a fixed date.
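One way to follow this advice is sketched below: take the current time, add a number of days, and format the result in GMT with Date.prototype.toUTCString(). The helper name `buildCookie` and its options (`days`, `now`, `path`, `domain`, `secure`) are assumptions made for this example. toUTCString() produces a form such as "Fri, 02 Jan 1970 00:00:00 GMT", which differs slightly from the Wdy, DD-Mon-YY form but is widely accepted by browsers. A negative number of days yields a date in the past, which erases the cookie.

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Hypothetical helper: build a cookie string for document.cookie or Set-Cookie.
function buildCookie(name, value, options = {}) {
  let cookie = name + "=" + encodeURIComponent(value);
  if (options.days !== undefined) {
    // "now" can be supplied for testing; it defaults to the current time.
    const now = options.now !== undefined ? options.now : new Date();
    const expires = new Date(now.getTime() + options.days * MS_PER_DAY);
    cookie += "; expires=" + expires.toUTCString();
  }
  if (options.path) cookie += "; path=" + options.path;
  if (options.domain) cookie += "; domain=" + options.domain;
  if (options.secure) cookie += "; secure";
  return cookie;
}

const keepForAWeek = buildCookie("userName", "Frank Peter", { days: 7, path: "/" });
const eraseIt = buildCookie("userName", "", { days: -1, path: "/" });
console.log(keepForAWeek);
console.log(eraseIt);

// In a browser, the strings would be assigned to document.cookie.
if (typeof document !== "undefined") {
  document.cookie = keepForAWeek;
}
```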
The cookie(s) that you set or accept are only accessible to pages
with a matching domain name and a matching path. The cookies must also not
have reached or passed their expiry date. When these criteria are met
the cookies become available to JavaScript via the document.cookie
property.
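Reading cookies back means splitting the NAME=VALUE pairs that document.cookie returns. The sketch below does this for any such string; `parseCookies` is a name invented for the example, and it assumes the values were encoded with encodeURIComponent() (or a compatible scheme) when they were set. A value containing a malformed % sequence would make decodeURIComponent() throw.

```javascript
// Hypothetical helper: turn "name1=VALUE1; name2=VALUE2" into an object.
function parseCookies(cookieString) {
  const cookies = {};
  for (const pair of cookieString.split(";")) {
    const trimmed = pair.trim();
    if (trimmed === "") continue; // skip empty segments
    const eq = trimmed.indexOf("=");
    const name = eq === -1 ? trimmed : trimmed.slice(0, eq);
    const value = eq === -1 ? "" : trimmed.slice(eq + 1);
    cookies[name] = decodeURIComponent(value);
  }
  return cookies;
}

// In a browser, pass document.cookie; a literal string is used here.
const source = typeof document !== "undefined"
  ? document.cookie
  : "headCount=5; userName=Frank%20Peter";
console.log(parseCookies(source));
```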
Where are the Cookies stored?
Where does MSIE keep its cookies?
Internet Explorer keeps its cookies in different locations depending on
the version of Windows. You will find
your cookies in the folder C:\windows\cookies in Windows 9X and
C:\WinNT\profiles\username\cookies in Win NT.
Each individual domain's cookies are stored in their own file, along
with the username that accessed the site. For example, if I went to
Yahoo, I would get a cookie that is stored in the file frank@yahoo.txt.
Note that the username is not sent with the cookie.
Where does Netscape keep its cookies?
You will find your cookies file, cookies.txt, in the folder C:\Program
Files\Netscape\Users\YourName.
Controlling Cookies within your Browsers
If You Want to Control Which Cookies You Accept:
You can order your browser to accept all cookies or to alert you every
time a cookie is offered. Then you can decide whether to accept one or
not.
If you're using Internet Explorer 4.0:
1. Choose View, then
2. Internet Options.
3. Click the Advanced tab,
4. Scroll down to the yellow exclamation icon under Security and
choose one of the three options to regulate your use of cookies.
If you're using Netscape Communicator 4.0:
On your menu bar, click:
1. Edit, then
2. Preferences, then
3. click on Advanced.
4. Set your options in the box labeled "Cookies".
How to See Cookies You've Accepted:
If you're using Internet Explorer 4.0:
On your menu bar, click:
1. View, then
2. Internet Options.
3. Under the tab General (the default tab) click
4. Settings, then
5. View Files.
Stopping Cookies
The options to allow all or deny all cookies are relatively clear.
The option to warn before accepting cookies is useful when you are
developing a site that uses cookies but becomes annoying when you are
browsing the internet. Some servers are able to use cookies to gather
information about visitor behavior. When these are incorrectly
configured a single page can set a cookie for every graphic.
The intermediate option is to block cookies that do not originate
from the current domain. This means that if you are at http://www.foo.com/
and a server at http://www.bar.com/ tries to set a cookie through a
banner graphic on the page, that cookie will not be accepted.
Another popular method is to replace your cookies.txt file with a
folder of the same name. This prevents any cookies from being accepted.
HTTP and how it works
When a user requests a page, an HTTP request is sent to the server.
The request includes a header that defines several pieces of
information, including the page being requested.
The server returns an HTTP response that also includes a header. The
header contains information about the document being returned, including
its MIME type (such as text/html for a standard HTML page or image/gif
for a GIF file).
Cookies and HTTP Headers
Cookie information is shared between the client browser and a server
using fields in the HTTP headers.
When the user requests a page for the first time, a cookie (or more
than one cookie) can be stored in the browser by a Set-Cookie entry in
the header of the response from the server. The Set-Cookie field
includes the information to be stored in the cookie along with several
optional pieces of information, including an expiry date, path, and
server information, and whether the cookie requires a secure channel.
Then, when the user requests a page in the future, if a matching
cookie is found among all the stored cookies, the browser sends a Cookie
field to the server in the request header. The header will contain the
information stored in that cookie.
Cookies and CGI scripts
In order for cookies to be useful, it is necessary for the server to
be able to take advantage of the cookie information it receives and for
the server to be able to generate cookie headers if they are needed.
This is done primarily by CGI scripts.
For instance, if you want to provide a custom search tool that would
search WWW indices selected by the user, you would need to develop a
system that follows this basic pattern:
- User calls the site using a URL that requests a CGI script.
- The script checks whether it is the user’s first time at the
site by checking whether there is a cookie field in the HTTP request
header.
- If there is no cookie, the script sends back a new search page
with all choices unselected and an empty search field.
- If there is a Cookie field, the script interprets the cookie and
returns a page with all the user’s previous choices selected.
- When the user conducts a search, the script returns the search
results along with a Set-Cookie field in the header to reset the
cookie to the newly selected values that the user used for the
search.
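The pattern in the list above can be sketched as a single function. Everything specific here is an assumption made for illustration: the cookie name `indices`, the function names, and the index names `indexA` and `indexB`. The function receives the value of the Cookie field (or undefined on a first visit) and, when a search was submitted, the indices the user selected.

```javascript
const COOKIE_NAME = "indices"; // hypothetical cookie name

// Read the previously saved selections out of the Cookie field, if any.
function readSavedIndices(cookieHeader) {
  if (!cookieHeader) return [];
  for (const pair of cookieHeader.split(";")) {
    const [name, value = ""] = pair.trim().split("=");
    if (name === COOKIE_NAME && value !== "") {
      return decodeURIComponent(value).split(",");
    }
  }
  return [];
}

// Hypothetical script logic: decide what to preselect and which headers to send.
function handleSearch(cookieHeader, submittedIndices) {
  const headers = { "Content-Type": "text/html" };
  if (submittedIndices !== undefined) {
    // A search was run: reset the cookie to the newly selected values.
    const encoded = encodeURIComponent(submittedIndices.join(","));
    headers["Set-Cookie"] = COOKIE_NAME + "=" + encoded + "; path=/";
    return { headers, preselected: submittedIndices };
  }
  // No search yet: an empty form on a first visit, saved choices otherwise.
  return { headers, preselected: readSavedIndices(cookieHeader) };
}

console.log(handleSearch(undefined, undefined).preselected);          // []
console.log(handleSearch(undefined, ["indexA", "indexB"]).headers);   // includes Set-Cookie
console.log(handleSearch("indices=indexA%2CindexB", undefined).preselected);
```

The selections are joined with commas and then encoded, since a raw comma is not allowed in a cookie value.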
Implementing this type of server-side processing for cookies may
significantly increase the load on a Web server. With this
model, most pages are being built dynamically based on receiving cookie
information in the header.
This is in contrast to typical Web pages, which are static, and all
the server needs to do is send the current file to the client without
any additional processing.