Below is an overview of the inner workings of Bomjpacket. See INSTALL for administrative questions.

The conversion is performed in two steps:

HTML -> XHTML -> WML

The differences between HTML and XHTML are:

* tags are matched and properly nested
* all attribute values are in quotes
* no empty attributes: an attribute without a value is unacceptable
* the case of the opening and closing tags matches, i.e. a tag opened in one case and closed in another is invalid

The main difference between XHTML and WML is that the latter supports only a subset of the former's tags.

We describe the two steps in greater detail below. First, however, we need to do a few preprocessing actions: get rid of JavaScript and CSS, as they are different languages, and then get rid of HTML comments. After that, we need to take care of encoded characters because the parser does not understand them. For example, an encoded trademark sign (™) gets converted to TM.

HTML to XHTML
--------------------------------------------------------------------

Function need_quotes() adds quotes around attribute values that lack them.
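The quoting step can be pictured with a small sketch. It is written in Python purely for illustration and is not Bomjpacket's own code: the name need_quotes_sketch and both regular expressions are assumptions, and the sketch further assumes that no attribute value contains a ">" character.

```python
import re

# Hypothetical stand-in for need_quotes(); an illustration only.
# A tag is assumed to be anything between "<" and the next ">".
_TAG = re.compile(r"<[^<>]+>")

# name=value inside a tag. The quoted alternatives come first so that
# already-quoted values are consumed whole and left untouched.
_ATTR = re.compile(
    r"""(\s[^\s=<>"']+)\s*=\s*(?:("[^"]*"|'[^']*')|([^\s"'<>]+))"""
)


def need_quotes_sketch(html: str) -> str:
    """Wrap unquoted attribute values in double quotes."""

    def fix_attr(match: re.Match) -> str:
        name, quoted, bare = match.groups()
        if quoted is not None:
            return match.group(0)  # value is already quoted
        return f'{name}="{bare}"'

    def fix_tag(match: re.Match) -> str:
        return _ATTR.sub(fix_attr, match.group(0))

    # Only text inside tags is rewritten; page text is left alone.
    return _TAG.sub(fix_tag, html)


if __name__ == "__main__":
    print(need_quotes_sketch('<a href=page.html class="x">Link</a>'))
```

Restricting the substitution to the inside of tags keeps ordinary page text such as "x = y" from being mistaken for an attribute.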
Then we need to fix unclosed quotes in tags and remove tags from strings. Function fix_quotes_tags1() takes care of that. For example, an unclosed "Page1 that runs into a tag is closed as "Page1" before that tag. The following rules are used:

"a<..." -> "a"<.....
"a<.../>b" -> "a"<.../>"b"
".../>a" -> ..../>"a"

Imagine a string with quotes inside:

title="this is "OK" button"

We need to get rid of the internal quotes. The first quote is extended until the next tag. Function fix_quotes_text() takes care of that.

Function tags_toupper($data) converts tag names to uppercase.

Function fix_tags($data) does a number of things:

* replace certain tags with line breaks
* eliminate all the tags except the WML ones. We only need the following tags: 'NOP', 'HTML', 'HEAD', 'TITLE', 'BODY', 'H1', 'H2', 'H3', 'H4', 'H5', 'H6', 'A', 'BR', 'B', 'I', 'EM', 'LI', 'ADDRESS', 'DIV', 'CODE', 'BLOCKQUOTE', 'TT', 'PRE', 'STRONG', 'SMALL', 'SUP', 'SUB'
* add the essential opening and closing HTML tags if they are missing, as we need them in each HTML document

We have already taken care of the attributes by adding quotes around them, but function fix_attrs() does more:

* eliminate the following attributes, as they might contain JavaScript: "onclick", "onchange", "onfocus", "onmouseover", "onmouseout", "onmousedown", "onmouseup", "onkeyup", "onkeydown", "onkeypress", "onsubmit"
* remove empty attributes
* for the A, IMG, and DIV tags, keep only certain attributes. We need to filter attributes because the parser gets confused when the same attribute is repeated, which might happen.

Certain tags do not allow other tags nested inside them. Function filter_content($data, $tag) takes care of that.

Finally, we add closing tags where they are missing and remove redundant ones. Function pair_tags($data) does that.

XHTML to WML
--------------------------------------------------------------------

Now we have a nice XHTML file, which is what the cellphone wants. Presumably this is necessary because cellphones do not have the processing power to fix an HTML file that is not properly formatted, so the server-side component has to do it.

But an XHTML file on its own is not enough either. Typically it is a very long web page, while a mobile's screen is very short. Therefore, we need to break the XHTML file into a number of smaller files, and we also convert them into WML, which for our purposes is a subset of HTML that cellphones understand. A few of them understand XHTML too, but WML is simpler.

The breakdown algorithm takes into account the layout of the original HTML page. A typical page includes a number of DIV elements nested inside each other. For example, DIV id="wrapper" might have DIV id="menu" and DIV id="content" inside. The straightforward approach is to place two links on the first WML page:

MENU
CONTENT

When a user clicks either of them, (s)he goes to the WML page of that section, so there are 3 WML pages in total.

Unfortunately, the names or IDs of the sections are not always self-explanatory, so guessing what hides behind a given link is often difficult. Instead of plain links to the sections, we therefore give their previews: we include a few lines from each section and insert a link at the end that allows the user to expand that section:

HOME - NEWS - ARTICLES-...
This is a very interest...

Only when the user clicks on the "..." link does the cellphone go to the appropriate section.
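
The preview idea can be sketched as follows. This is an illustrative Python sketch under several assumptions, not the actual breakdown algorithm: the function names, the 23-character cut-off, and the "<id>.wml" file naming are all hypothetical, and the input is assumed to be well-formed XHTML with uppercase tag names and a DIV id="wrapper" element.

```python
import xml.etree.ElementTree as ET
from html import escape
from typing import List, Tuple

PREVIEW_CHARS = 23  # hypothetical cut-off; the real preview length may differ


def top_sections(xhtml: str) -> List[Tuple[str, str]]:
    """Return (id, text) for each DIV directly inside DIV id="wrapper"."""
    root = ET.fromstring(xhtml)
    wrapper = root.find(".//DIV[@id='wrapper']")
    if wrapper is None:
        return []
    return [
        (div.get("id", ""), "".join(div.itertext()))
        for div in wrapper.findall("DIV")  # direct children only
    ]


def preview(text: str, limit: int = PREVIEW_CHARS) -> Tuple[str, bool]:
    """Collapse whitespace and cut the text; report whether it was cut."""
    flat = " ".join(text.split())
    if len(flat) <= limit:
        return flat, False
    return flat[:limit].rstrip(), True


def index_page(sections: List[Tuple[str, str]], limit: int = PREVIEW_CHARS) -> str:
    """Build the body of the first page: one preview line per section."""
    lines = []
    for section_id, text in sections:
        head, truncated = preview(text, limit)
        line = escape(head)
        if truncated:
            # The "..." is the link that expands the section.
            line += f'<a href="{escape(section_id)}.wml">...</a>'
        lines.append(line)
    return "<br/>\n".join(lines)


if __name__ == "__main__":
    page = (
        '<HTML><BODY><DIV id="wrapper">'
        '<DIV id="menu">HOME - NEWS - ARTICLES - CONTACT</DIV>'
        '<DIV id="content">This is a very interesting page.</DIV>'
        "</DIV></BODY></HTML>"
    )
    print(index_page(top_sections(page)))
```

A section that already fits within the limit is shown in full and gets no "..." link, since there is nothing left to expand.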