Web Documentation

by Paul Morie


Contents

  1. Overview
  2. Database
    1. DB Schema
    2. Making Changes
    3. Adding a tool
    4. Making Backups
    5. Caveats
  3. Publications Section
    1. Group Bibfile
    2. Abstracts
    3. Using the Update Scripts
    4. Adding a publication
  4. Server Monitor System
    1. How does it work?
    2. Where are the scripts located?
    3. Adding a server

Overview

The purpose of this section is to give the reader the background knowledge necessary to effectively understand the rest of this document. As the author, I am assuming that you, as a reader, bring a basic knowledge of UNIX and relational databases to the table. If you are reading this because you have been tasked with maintaining the site, and you don't feel confident in one or both of these areas, then you will need to familiarize yourself before going further.

Technologies

Our website is served by Apache and uses two key technologies. The first is PHP, which generates dynamic content for presentation, and the second is MySQL, a free database that holds most of the content for the site. A typical page in the website makes several calls to embedded PHP, which in turn fetches the relevant content for that page from the MySQL database and generates the HTML that presents it. The key point is that the PHP files hold the presentation logic, while the MySQL database holds the content being presented.

Resources

There are several good resources for PHP and MySQL available online. For PHP, a good place to start is http://php.net. For MySQL, http://www.mysql.com is a good source of documentation. Both of these sites have documentation with user comments, which I have found to be an invaluable resource in the course of implementing and maintaining the site.



Database

The web database is currently hosted on the MySQL installation running on bashful.cs.uiuc.edu. The database is named web. The relevant passwords are available from the master password list.

DB Schema

The database schema is probably best described in terms of which tables are used in which sections of the site. Here is a rough list:

This is not meant to be a comprehensive list. If you are unsure of which tables a page uses, you can find out by reading the PHP code directly. This must be done from the command line -- the PHP code itself is not output to the browser. So, viewing the source of a rendered HTML page will not show the generating PHP.
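One way to do that reading from the command line is to search a page's PHP source for SQL keywords. The sketch below assumes the queries are written inline in the PHP files; the function name list_sql_lines and the file name tools.php are hypothetical, and the keyword list is only a heuristic.

```shell
# Print each line of a PHP file that appears to contain an SQL statement,
# prefixed with its line number. This is a keyword search, not an SQL parser,
# so expect the occasional false positive.
list_sql_lines() {
    grep -n -i -E '(select|insert|update|delete)[[:space:]]' "$1"
}

# Hypothetical usage:
#   list_sql_lines tools.php
```

The table names then appear after FROM, INTO, or UPDATE in the matching lines.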

Making Changes

The easiest way to make changes to the web database is to use a GUI tool such as mysqlcc, which can be downloaded for free from the MySQL site. Consult this link for instructions on installing mysqlcc. For the tasks I describe here, I'll assume that you are using some kind of graphical MySQL client, which is much easier than writing INSERT statements by hand.

HOWTO: Adding a Tool

Adding a tool to the tools page is relatively easy. The Tools table is the only table you'll have to make any changes to. The columns and how they're rendered are described here:

All you need to do is create a new entry in the Tools table, and fill in those fields. The PHP files on the site will handle generating the pages.

This section wouldn't be complete without a plug for the excellent prog-metal band Tool.

Making Backups

One of the most essential tasks in maintaining the web database is making regular backups. Backups of MySQL databases can be made very easily with the mysqldump command. For more information, see the /home/roth/cogcomp/web_backup/README file on bashful. A good strategy is to make a backup of the database after every session in which changes are made. This way, you can recover easily if you mess something up (you will).

Recovering from a backup made with mysqldump is easy. Say that your backup is in a file called backup. All you need is a user with CREATE and INSERT privileges for the database. At the command line, type:

cat backup | mysql -u <user> -p

NOTE: This information is now somewhat out of date. This command should work for mias.cs.uiuc.edu:

mysqldump --opt -u root web -p > webdb.<date>

where <date> is the current date. 
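The dated file name can be generated rather than typed. The sketch below wraps the command above in two small shell functions; the function names and the YYYY-MM-DD date format are assumptions, since nothing here fixes how <date> should be written.

```shell
# Build the backup file name for a given date (defaults to today).
# The YYYY-MM-DD format is an assumption; any unambiguous format works.
backup_name() {
    printf 'webdb.%s\n' "${1:-$(date +%Y-%m-%d)}"
}

# Dump the web database into a dated file in the current directory.
# mysqldump prompts for the password because -p is given without a value.
backup_web_db() {
    mysqldump --opt -u root -p web > "$(backup_name)"
}

# Usage: run `backup_web_db` after each session in which you changed the database.
```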

Caveats

There are two different PHP APIs for interfacing with MySQL. The mysql API is used for communicating with MySQL versions before 4.1, while the mysqli API is designed for versions 4.1 and above. Currently, the web database runs on a MySQL 3.23 server, so we use the mysql API. If there comes a time when we must migrate to a newer MySQL server and switch to mysqli, I think the best thing to do would be to provide functions that hide the underlying API calls, e.g.:

function db_fetch_assoc($result) {
  return mysqli_fetch_assoc($result);
}

In this way, we can reduce the number of changes that have to be made in order to switch APIs.


Publications

The publications section of the website is maintained in the group bibfile. The bibfile has extensions which can be used to indicate that a particular publication is related to a certain project or grant, is available online in a particular location, etc. The bibfile is used with a set of scripts which update the MySQL database for the group website. There is a CVS module, bib, which holds all of the relevant files and scripts for the publications section of the site, including the group bibfile. It can be checked out by setting your CVSROOT environment variable to :ext:flake.cs.uiuc.edu:/home/cvs and issuing the command: cvs co bib.

Group Bibfile

As previously mentioned, the group bibfile has a number of extensions that are designed to work with the DB update scripts. Here is a list of the extensions and what they do:

Here is an example which illustrates the use of the extensions:

@article{LiMoRo05,
 author   = {X. Li and P. Morie and D. Roth},
 title    = {Semantic Integration in Text: From Ambiguous Names to Identifiable Entities},
 journal  = {AI Magazine. Special Issue on Semantic Integration},
 year     = {2005},
 comment  = {Named Entity Recognition; coreference resolution; Matching Entities Mentions within and across documents.},
 url      = "http://l2r.cs.uiuc.edu/~danr/Papers/LiMoRo05.pdf",
 projects = {MIRROR},
 pages    = {45--48},
 funding  = {MURI,TRECC,CLUSTER},
}

Abstracts

BibTeX aficionados may notice the absence of the abstract attribute from the example bibfile entry. The abstracts from papers are stored in files in the Abstracts directory in the bib CVS module. There is a file, the abstract manifest, which associates bibitem keys with files in the directory. This is currently called abstracts.txt.new. I will change the name of the file to something that makes more sense at some point in the next couple of weeks. The format for the abstract manifest is:

BibitemKey Filename

For example:

LiMoRo05 LiMoRo05.abs

I have been using the .abs suffix to denote that a particular file contains the body of an abstract, but it is not strictly necessary. The file containing the abstract should be written in HTML; the text is sent directly into the database and then displayed as HTML, with no additional formatting happening in the PHP. I admit that the system for managing the abstracts is moderately annoying, but I believe it is much less annoying than maintaining the text of the abstracts in the bibfile.
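Because each manifest line is just a key and a file name, the manifest can be checked mechanically before the database is touched. The sketch below is a hypothetical helper, not part of the bib module; it assumes the two fields are separated by whitespace and that file names are relative to the Abstracts directory.

```shell
# Report manifest entries that are malformed or point at a missing file.
#   $1: path to the abstract manifest
#   $2: directory holding the abstract files
# Returns non-zero if any problem was found.
check_manifest() {
    manifest=$1
    abstracts_dir=$2
    status=0
    lineno=0
    # The `|| [ -n "$key" ]` part handles a final line with no trailing newline.
    while read -r key file extra || [ -n "$key" ]; do
        lineno=$((lineno + 1))
        # Skip blank lines.
        [ -z "$key" ] && continue
        if [ -z "$file" ] || [ -n "$extra" ]; then
            echo "line $lineno: expected 'BibitemKey Filename'"
            status=1
        elif [ ! -f "$abstracts_dir/$file" ]; then
            echo "line $lineno: missing abstract file $file for $key"
            status=1
        fi
    done < "$manifest"
    return $status
}

# Hypothetical usage, assuming both paths are relative to the current directory:
#   check_manifest abstracts.txt.new Abstracts
```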

The Update Scripts

There are currently two scripts used to update the database from the bibfile. The first, UpdateDB.pl, is used to produce the files with the contents of the tables to be inserted into the database. The second, net_update.pl, issues the commands to load the files into the database. This functionality used to be in one script, but I divided it so that you can see if there's some kind of error with the bibfile or abstract set-up before you run the database operation. It's probably not the most convenient arrangement, but it's also inconvenient to restore the database from a backup because you messed something up. You did have a backup, didn't you?

HOWTO: Adding a publication

Here's a simple checklist for adding a publication:

  1. Add the publication's bibitem to the bibfile.
  2. Add a line to the abstract manifest associating the bibitem key with a file.
  3. Put the HTML for the publication's abstract in said file.
  4. Add the file to CVS.
  5. Commit the new bibfile to CVS.
  6. Run the update scripts.
    1. UpdateDB.pl ccg.bib
    2. net_update.pl
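
Steps 4 through 6 of the checklist can be sketched as a shell function. This is only an illustration under several assumptions: that it is run from the top of the bib checkout, that the scripts are invoked with perl, and that the manifest and the abstract file are committed together with the bibfile. The function names and the commit message are hypothetical. Commands are only printed unless DRY_RUN is set to 0.

```shell
# Print a command (dry run, the default) or execute it (DRY_RUN=0).
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Add the abstract file to CVS, commit, then run the two update scripts.
#   $1: path to the new abstract file, e.g. Abstracts/LiMoRo05.abs
publish_publication() {
    abstract_file=$1
    run cvs add "$abstract_file" &&
    run cvs commit -m "Add publication" ccg.bib abstracts.txt.new "$abstract_file" &&
    run perl UpdateDB.pl ccg.bib &&
    run perl net_update.pl
}

# Usage: publish_publication Abstracts/LiMoRo05.abs
```

Because the steps are chained with &&, a failure in UpdateDB.pl stops the run before net_update.pl touches the database.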


The Server Monitor System

The server monitor system was created to help manage the several servers that provide functionality for the group's web-viewable demos. I am including this under the web documentation because it makes sense to me -- the demos are arguably the most important part of the website, and thus are documented here. In this section, I'll discuss how the server monitor works, where the different scripts are, and how to add a server into the system.

How does it work?

The operation of the server monitor is fairly simple. There is a central script, connect_to_server_monitor.pl, in the cgi-bin directory of l2r (located in /mounts/l2r/disks/0/www/cgi-bin), that connects to the monitor scripts running on the various machines (at this point, only flake.cs.uiuc.edu and snow.cs.uiuc.edu). When you visit http://l2r.cs.uiuc.edu/cgi-bin/connect_to_server_monitor.pl, the script connects to each machine that has a server monitor running on it; these machines are specified in the script body. The monitor script on each host checks which of its processes are running and sends a report to the script running on l2r, which then creates an HTML page for you to view. From this page, you can start or stop servers managed under the server monitor system. Note that a server can stop responding while its process is still running, so the system doesn't tell you everything you need to know, but it is a handy tool for most of the basic administration tasks of the demos.

On the machines where the servers run, I have developed a system that makes it easy to add most types of programs to the server monitor without modifying them (before I was responsible for maintaining this system, a program had to write its PID to a file in a well-known location). Each server monitor server (for lack of a better phrase) has a directory called lbin. This directory contains symbolic links to the binaries used to run the servers. There is a one-to-one correspondence between the symlinks and the servers -- i.e., if you have two snow servers you want to manage under the system, you create a uniquely named symlink for each of them. For example:

ln -s /path/to/snow my_snow1
ln -s /path/to/snow my_snow2

If you are not familiar with the ln command, I suggest that you consult the manpage. Note that the -s flag is critical: it makes the link you create symbolic instead of hard. The difference is (basically) this: a hard link is a second name for the same file, while a symbolic link is a small separate file that stores the path of its target. Say you have a file, foo, and two links to it: hard, which is a hard link, and soft, which is a symbolic link. Deleting either link leaves foo unchanged. However, if you delete foo itself, the contents remain reachable through hard, while soft is left dangling. There are other subtle differences, but this is the most important one.
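
To see what happens to each kind of link when foo itself is removed, the following sketch can be run in a scratch directory. The function name link_demo is hypothetical; the file names follow the example in the text.

```shell
# Create foo with one hard link and one symbolic link, remove foo,
# then report what is left behind each link.
link_demo() {
    dir=$(mktemp -d) || return 1
    (
        cd "$dir" || exit 1
        echo "data" > foo
        ln foo hard        # hard link: a second name for the same file
        ln -s foo soft     # symbolic link: stores the path "foo"
        rm foo
        # The contents are still reachable through the hard link.
        if [ -f hard ]; then echo "hard: $(cat hard)"; fi
        # The symbolic link still exists but its target is gone.
        if [ -L soft ] && [ ! -e soft ]; then echo "soft: dangling"; fi
    )
    rm -rf "$dir"
}

# Usage: link_demo
# prints:
#   hard: data
#   soft: dangling
```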

Each server gets its own symlink because symlinks are nice. Say I have a symlink to a binary, and I run the binary by invoking the symlink. When I use ps to see which commands are running, it reports the name of the link instead of the name of the binary. This behavior makes it very easy to see which snow is running when you have 50 snow servers.
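
The effect is easy to reproduce. The sketch below uses a throwaway shell script as a stand-in for a real server, so the names real_server and ps_name_demo are hypothetical; my_snow1 is the link name from the example above. For a script, ps shows the interpreter followed by the path used to invoke it, which is the link.

```shell
# Start a stand-in "server" through a symlink and show what ps reports for it.
ps_name_demo() {
    dir=$(mktemp -d) || return 1
    printf '#!/bin/sh\nwhile :; do sleep 1; done\n' > "$dir/real_server"
    chmod +x "$dir/real_server"
    ln -s "$dir/real_server" "$dir/my_snow1"

    "$dir/my_snow1" > /dev/null 2>&1 &     # invoke via the link, in the background
    pid=$!
    sleep 1                                # give the process time to start
    ps -o args= -p "$pid"                  # command line as ps sees it
    kill "$pid"
    wait "$pid" 2> /dev/null || true       # reap the stand-in quietly
    rm -rf "$dir"
}

# Usage: ps_name_demo
```

The reported command line names my_snow1, not real_server, which is what lets the monitor tell otherwise identical servers apart.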

Where are the scripts located?

On both flake and snow, the scripts are located in /home/roth/cogcomp/demos/scripts.

HOWTO: Administering the server monitor

Administering the server monitor is generally a pretty easy task. It's good to check once a day that everything is running, but generally everything runs smoothly, barring out-of-the-ordinary amounts of usage. There are a couple of 'problem' demos, however. The SRL demo is one of these. Concurrent access to this demo can be a problem. We have added some additional locking in the Fex and SNoW servers the demo uses to help alleviate this problem, but there is still a stability issue. I have also noticed that some of the named-entity tagging servers like to crash. It is important to note that the system requires a human to do the actual monitoring. It will not send e-mails or raise a flag or anything like that. The system only checks which processes are running when you ask it to. I started working on a replacement for this system that autonomously monitors the servers and restarts them when they crash, but never got around to completing it.

You will need to restart the server monitor script on a machine when it is rebooted. This is very simple. All you need to do is log in as cogcomp, and issue the command server_monitor. This should work on both snow and flake. If you are adding a new machine, you should add a shell alias in the cogcomp account so that restarting the script is done uniformly over all the machines in the group.

HOWTO: Adding a server

If a server monitor is set up on the machine you'll be running your server on, then adding it into the system is fairly easy. There are two important parts of the server monitor script that you'll have to edit. But first, you need to be sure which script it is that needs the editing. Doing which server_monitor on a machine will show you where the script is. Once you've got that, here is a checklist to go through:

  1. Create a symlink to the binary/script: this should go into the previously mentioned lbin directory.
  2. Create a script to start the server: this should go into the same directory as the server_monitor script. This is the script that will be called to start the server, so it must start the server in the background. This script should invoke your server via the symlink you made in step 1.
  3. Pick a key for the server: each server needs a unique 'key' to identify it in the server_monitor script. This is also the key that will identify the server on the start/stop page, so it should be something that you can look at and get a rough idea of what it is. For instance "myserver123" is not a good key, but "SRL - SNoW - Phrase" is.
  4. Add a line for the starting script to the server_monitor script: A template for this is:
    $start{"serverKey"} = "/path/to/start/script";
  5. Add a line for the name of the script to the server_monitor script: Remember how you made a symlink to your server? You need to add a line telling the script what the name of your link is. A template is:
    $processName{"serverKey"} = "name_of_link";
  6. Restart the server monitor: run killall server_monitor.pl and then server_monitor.
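
The start script from step 2 can be very small. The sketch below shows one way to start a server in the background through its symlink; the function name start_server and the example path are hypothetical, and discarding the server's output is an assumption that a real start script may want to replace with a log file.

```shell
# Start a server through its symlink, detached from the calling terminal.
#   $1: path to the symlink in the lbin directory
#   remaining arguments: passed through to the server
start_server() {
    link=$1
    shift
    # nohup keeps the server alive after the caller exits; output is discarded.
    nohup "$link" "$@" > /dev/null 2>&1 &
    echo "started $link (pid $!)"
}

# A start script would then consist of a single call, e.g.:
#   start_server /path/to/lbin/my_snow1
```

Invoking the server through the link, rather than through the binary's real path, is what makes the name given in $processName visible to the monitor.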