.pl example. The twist here is that I use an associative array in the Cookie value instead of the regular array you saw previously. This structure makes it easy to increment vote totals as new ones come in. The movie votes are kept current on a client-by-client basis for up to two hours.

The second part, the dynamic graphing capability, is an interesting example of how GD.pm makes your life easy. All you need to do is take the current Cookie value, parse it in some sensible manner, and feed it to the simple GD.pm graphics methods, such as filledRectangle. The available methods are discussed in depth in the documentation and take an intuitive set of parameters. The filledRectangle method, for example, takes five parameters: the four coordinates of two opposite corners, plus a color. Granted, the graph I drew in Figure 24.14 is not on the order of the Sistine Chapel, but the reader should get a sense of the cost (time to write the code) versus the return (an efficient means to provide graphics to the user).

Retrieving Web Data from Other Servers

Chapter 19 featured a discussion of TCP/IP as the fundamental building block on which the HyperText Transfer Protocol stands. By exploiting this concept, developers can create their own client programs that perform automated or semi-automated transfer protocol requests. These programs commonly are known as robots, spiders, crawlers, and so on. (See note)

Robots operate by opening a connection to the target server's port (traditionally, 80 for HTTP requests), sending a proper request, and waiting for a response. To understand how this works, Listing 24.21 shows opening a regular Telnet connection to a server's port 80 and making a simple GET request (recall the discussion of the HEAD method in Chapter 20).

Listing 24.21. A Telnet session to the HTTP port 80.

/users/ebt 47 : telnet edgar.stern.nyu.edu 80
Trying 128.122.197.196...
Connected to edgar.stern.nyu.edu.
Escape character is '^]'.
GET /
<TITLE> NYU EDGAR Development Site </TITLE>
<A HREF="http://edgar.stern.nyu.edu/team.html"><img src="http://edgar.stern.nyu.edu/icons/nyu_edgar.trans.gif"></a>
<h3><A HREF="http://edgar.stern.nyu.edu/tools.shtml">Get Corporate SEC Filings using NYU </a> or
<A HREF="http://www.town.hall.org/edgar/edgar.html">IMS </a> Interface</A></h3>
<h3><A HREF="http://edgar.stern.nyu.edu/mgbin/ticker.pl"><! img src="http://edgar.stern.nyu.edu/icons/ticker.gif">What's New - Filing Retrieval by Ticker Symbol!</A></h3>
<h3><A HREF="http://edgar.stern.nyu.edu/profiles.html">Search and View Corporate Profiles</A></h3>
Connection closed by foreign host.
/users/ebt 48 :

Note that Martijn Koster has developed a set of robot policies, which are not official standards but allow a server to request that certain types of robots not visit certain areas on the server. His proposal is available at http://info.webcrawler.com/mak/projects/robots/robots.html. It is considered good Web netiquette to follow Koster's guidelines; a poorly behaved robot can generate vociferous complaints from sites that are affected adversely.

Assuming that the requested file exists, the data is sent back, after which the connection closes. Note that it is unformatted data; formatting is the job of the client software and, in this case, there is none.

This is amusing but hardly automated. Although most programming languages include networking functions that the developer could use to build automated tools, the developer does not need to start from scratch. A number of URL retrieval libraries are readily available for Perl. (See note)

Listing 24.22 uses the familiar http_get (See note) program again. The purpose of this Perl script using http_get is to do the following:

   Retrieve a URL requested by the user (the root page)

   Parse the data returned and attempt to identify all <A HREF=HTTP:> links within the root page

   Retrieve each of the HTTP links found in the root page that have an .html extension or no extension, parse those pages, and display the links found

It is interesting to compare this program with the related robot.cgi presented in Chapter 22, "Gateway Programming II: Text Search and Retrieval Tools." When run against http://www.hydra.com/, the output shown in Figure 24.15 was returned. The <HR> tag is used to separate each of the links found on the root page, with each of the second-level links indented.

Figure 24.15 : Output produced by executing LinkTree with the URL http://www.hydra.com/.

Listing 24.22. The http_get Perl script.

#!/usr/local/bin/perl
#linktree.pl v.1
require "cgi-lib
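The associative-array Cookie structure described earlier, holding per-client vote totals, can be sketched in a few lines of Perl. The "movie=count" pair encoding and the function names here are assumptions for illustration, not the book's actual Cookie format.

```perl
#!/usr/local/bin/perl
# Sketch: keep per-client movie vote totals in an associative array
# parsed from, and re-serialized into, a Cookie value.
# The "movie=count" pair encoding is an assumed format for illustration.

# Parse a Cookie value such as "Casablanca=3&Vertigo=1" into a hash.
sub parse_votes {
    my ($cookie) = @_;
    my %votes;
    foreach my $pair (split /&/, $cookie) {
        my ($movie, $count) = split /=/, $pair;
        $votes{$movie} = $count;
    }
    return %votes;
}

# Increment the total for one movie and re-serialize for Set-Cookie.
sub add_vote {
    my ($cookie, $movie) = @_;
    my %votes = parse_votes($cookie);
    $votes{$movie}++;    # autovivifies to 1 for a first-time movie
    return join '&', map { "$_=$votes{$_}" } sort keys %votes;
}

print add_vote("Casablanca=3&Vertigo=1", "Vertigo"), "\n";
# Casablanca=3&Vertigo=2
```

The hash makes the increment a one-line operation, which is exactly the convenience the associative array buys over a positional array.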
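The link-identification step that linktree.pl performs, finding <A HREF=HTTP:> anchors in a retrieved page, can be sketched with a simple pattern match. The regular expression below is an illustrative simplification, not the listing's actual code; it ignores relative URLs and malformed tags, and a production robot would use a proper HTML parser.

```perl
#!/usr/local/bin/perl
# Sketch: pull absolute HTTP links out of a page's HTML, the way
# linktree.pl identifies <A HREF=...> anchors before fetching each one.

sub extract_http_links {
    my ($html) = @_;
    my @links;
    # Match <A HREF="http://..."> case-insensitively; quotes optional.
    while ($html =~ /<A\s+HREF\s*=\s*"?(http:\/\/[^"\s>]+)"?/gi) {
        push @links, $1;
    }
    return @links;
}

my $page = '<h3><A HREF="http://edgar.stern.nyu.edu/tools.shtml">Filings</A></h3>'
         . '<A HREF="http://www.town.hall.org/edgar/edgar.html">IMS</A>';
print "$_\n" for extract_http_links($page);
```

To honor the ".html extension or no extension" rule for second-level retrieval, the caller could filter the returned list with a grep such as `grep { /\.html?$/ or m{/[^/.]*$} } @links` before fetching.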