The Silicon Underground
  Welcome to Dave Farquhar's Silicon Underground Tuesday, November 24 2009 @ 04:45 PM CST  
Increase the speed of your Web pages   
Monday, November 25 2002 @ 10:00 AM CST
By David L. Farquhar

There are commercial utilities that will optimize your HTML and your images, cutting the size down so your stuff loads faster and you save bandwidth. But I like free.

I found free.

Back in the day, I told you about two programs, one for Windows and one for Unix, that will crunch down your JPEGs by eliminating metadata that's useless to Web browsers. The Unix program will also optimize the Huffman tables and optionally resample the JPEG into a lossier image, which can net you tremendous savings but might also lower image quality unacceptably.

Yesterday I stumbled across a program on Freshmeat called htmlcrunch that strips extraneous whitespace from HTML and XML files. Optionally, it will also remove comments. The program works in DOS--including under a command prompt in Windows 9x/NT/2000/XP, and it knows how to handle long filenames--or Unix.

It's not advertised as such, but I suspect it ought to also work on PHP and ASP files.

How much it will save you depends on your coding style, of course. If you tend to put each tag on one line with lots of pretty indentation like they teach in computer science classes, it will probably save you a ton. If you code HTML like me, it'll save you somewhat less. If you use a WYSIWYG editor, it'll probably save you a fair bit.

It works well in conjunction with other tools. If you use a WYSIWYG editor, I suggest you run the code through HTML Tidy first. HTML Tidy, unlike htmlcrunch, actually interprets the HTML and removes some troublesome information. In some cases HTML Tidy will add characters, but this is usually a good thing--its changes improve browser compatibility. If you feed HTML Tidy a bunch of broken HTML, it'll fix it for you.

You can further optimize your HTML with the help of a pair of Unix commands. But you run Windows? No sweat. You can grab native Windows command-line versions of a whole slew of Unix tools in one big Zip file here.

I've found that these HTML tools sometimes leave spaces between HTML elements. Whether this is intentional or a bug in the code, who knows. But it's easy to fix with the Unix sed command. (The tr command looks tempting, but it maps single characters to single characters and can't replace one string with another.)

sed 's/> </></g' index.html > indexopt.html
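
If the leftover gaps are runs of several spaces or tabs rather than a single space, a slightly more general sed expression can help, since sed substitutes whole patterns where tr only maps one character to another. The following is a minimal sketch, not a tested part of my own toolchain; the function name strip_between_tags is made up for illustration.

```shell
# Hypothetical helper: delete any run of spaces or tabs that sits directly
# between a closing ">" and the next "<". Reads stdin, writes stdout.
# sed works line by line, so whitespace spanning a line break is left alone.
strip_between_tags() {
    sed 's/>[[:space:]]*</></g'
}

# Demonstration on an inline sample line.
printf '%s\n' '<tr> <td>one</td>   <td>two</td> </tr>' | strip_between_tags
# prints: <tr><td>one</td><td>two</td></tr>

# On a real file it would be used like this:
#   strip_between_tags < index.html > indexopt.html
```

Two cautions. A space between inline elements, such as two adjacent links, is visible on the rendered page, so removing it changes how the page looks. And the expression knows nothing about pre blocks, so check the result in a browser before you upload it.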

Some people believe that Web browsers parse 255-character lines faster than any other line length. I've never seen this demonstrated. And in my experience, any Web browser parses straight-up HTML plenty fast no matter what, unless you're running a seriously, seriously underpowered machine, in which case optimizing the HTML isn't going to make a whole lot of difference. Also in my experience, every browser I've looked at parses CSS entirely too slowly. It takes most browsers longer to render this page than it takes for my server to send it over my pokey DSL line. I've tried mashing my stylesheets down, and multiple 255-character lines versus no line breaks whatsoever made little, if any, difference.

But if you want to try it yourself, pass your now-optimized HTML file(s) through the standard Unix fmt command, like so:

fmt -w 255 index.html > index255.html
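
To see how much any of these passes actually saves, compare byte counts before and after. This is a rough sketch using only wc and shell arithmetic; the helper names bytes and report_savings are hypothetical, and the file names are just the ones used above.

```shell
# Hypothetical helper: print a file's size in bytes.
# The arithmetic expansion strips the padding some wc versions add.
bytes() {
    echo $(( $(wc -c < "$1") ))
}

# Hypothetical helper: report how many bytes the optimized copy ($2)
# saves compared with the original ($1).
report_savings() {
    before=$(bytes "$1")
    after=$(bytes "$2")
    echo "$1: $before bytes, $2: $after bytes, saved $(( before - after ))"
}

# Example, assuming both files exist:
#   report_savings index.html indexopt.html
```

If the saved figure is only a few hundred bytes, the page probably didn't have much whitespace to begin with.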

Optimizing your HTML files to the extreme will take a little time, but it's probably something you only have to do once, and your page visitors will thank you for it.

Increase the speed of your Web pages | 6 comments
Dev T
Authored by: ImportedComment on Monday, November 25 2002 @ 11:51 AM CST
Another program to add to the Interesting DOS programs list :-)

Dev T
Trinidad and Tobago Computer Society at http://www.ttcsweb.org and http://www.ttcs.net

Steve DeLassus
Authored by: ImportedComment on Monday, November 25 2002 @ 02:00 PM CST
A small question, then a long diatribe. :) Is htmlcrunch smart enough to keep internal spaces in empty XHTML tags? e.g.
That space is highly suggested to keep "some" browsers from getting lost.

To quote Dennis Miller, I may be getting off on a rant here, but when I see programs of this sort, I have to ask "how much benefit am I gaining vs. the cost of the effort?" It seems like using htmlcrunch is a very low effort exercise, so I wouldn't lobby against using it. But unless you've got some nasty whitespace problems, I doubt it'll help you much. (HTML with volumes of commentary is a different story.) Let me 'splain.

HTML is text. Text is highly compressible. With HTTP 1.1, a server can compress your HTML for its journey across the wire, and a compliant browser will very nicely decompress it on arrival. Whitespace compresses especially well.
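
A rough way to check this for yourself is to gzip an indented page and a crunched copy of it and compare sizes. The sketch below assumes that command-line gzip is a fair stand-in for what a Web server's compression does; the generated table rows and the helper name sizes are made up for illustration.

```shell
# Build two versions of the same hypothetical page: one indented, one crunched.
tmp=$(mktemp -d)
i=0
while [ "$i" -lt 200 ]; do
    printf '        <tr>\n            <td>row %s</td>\n        </tr>\n' "$i" >> "$tmp/pretty.html"
    printf '<tr><td>row %s</td></tr>' "$i" >> "$tmp/crunched.html"
    i=$(( i + 1 ))
done

# Hypothetical helper: print raw and gzip-compressed sizes for a file.
sizes() {
    raw=$(( $(wc -c < "$1") ))
    gz=$(( $(gzip -c "$1" | wc -c) ))
    echo "$1: $raw bytes raw, $gz bytes gzipped"
}

sizes "$tmp/pretty.html"
sizes "$tmp/crunched.html"
```

The raw sizes should differ by several kilobytes, while the gzipped sizes should land much closer together. That gap is the part of the savings that compression would have given you anyway.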

Unless you've got a vastly underpowered machine, raw parsing speed is a non-issue; it's what the browser does in reaction to tokens that takes time. That's a big distinction. I wouldn't say that CSS *parses* slowly, but that browsers take some time setting up and applying styles.

The same applies to the server side of things - PHP, ASP, etc. Parsers zip right past (i.e. do not interpret) whitespace and comments because they're not tokens. From peeking at the source, it looks like htmlcrunch is rather HTML grammar-specific, so I'm not sure what it'd do to a server script file. In any case, if you're a freak of nature and verbosely comment your PHP :), I'd consider using the Apache mod that will pre-compile and cache your scripts in lieu of munging source code and keeping multiple copies around. Still, for moderately commented code, I doubt you'll see much difference - we're still talking about interpreted code here, just minus the tokenizing. (Disclaimer: I haven't used this mod myself, but plan on trying it for grins soon.)

I think the core issue here is the perception of round-trip time. You make a page request, it goes over the wire, a server may execute a script and will hit memory and/or disk, data flows back over the wire, and your browser parses and renders it. The wire is orders of magnitude slower than memory and the primary bottleneck; disk access is relatively slow. But parsing the data isn't, though *acting upon* the data - either in a browser or a server-side script - may be. Ultimately, having 500 extra bytes of (compressed) whitespace isn't going to change a user's perception of a three- to five-second round-trip time, IMHO.
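
To put a number on that, here is a back-of-the-envelope sketch. The link speeds (a 56 kbps modem and a 1.5 Mbps DSL line) are assumptions chosen for illustration, and the calculation ignores latency, protocol overhead and compression; transfer_seconds is a hypothetical helper name.

```shell
# Hypothetical helper: seconds needed to move N bytes over a link of a
# given speed in bits per second, ignoring latency and protocol overhead.
transfer_seconds() {
    LC_ALL=C awk -v bytes="$1" -v bps="$2" 'BEGIN { printf "%.3f\n", bytes * 8 / bps }'
}

transfer_seconds 500 56000      # 500 bytes over an assumed 56 kbps modem
# prints: 0.071
transfer_seconds 500 1500000    # the same bytes over an assumed 1.5 Mbps DSL line
# prints: 0.003
```

Even on the modem, 500 bytes costs well under a tenth of a second.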

Glaurung
Authored by: ImportedComment on Friday, December 13 2002 @ 09:35 PM CST
Another Windows-based program to look at for optimizing HTML is Webtrimmer. Formerly shareware, now freeware, available at http://www.glostart.com/webtrimmer/webtrimmer.html.

Unlike HTML Tidy, it doesn't reparse any code. It removes whitespace, lengthens lines, and replaces tokens like &nbsp or &quot with their shortest possible equivalent. Then it goes through and removes do-nothing HTML commands, like empty tag pairs, or redundant tags (like specifying the font for each and every paragraph) of the sort that MS Word and FrontPage love to put in.

The program has its drawbacks -- it's old and only knows about HTML 3.2, it doesn't remove redundant code inside tables -- but it can significantly reduce Frontpage bloat, for those who don't have the time or ability to code pages by hand.

Anti-MS aside: A small online catalog I did for a client had its largest page at 30k when I was done hand coding it. Someone else uploaded the catalog, and that person stupidly ran the pages through MS Frontpage. The 30k page more than tripled in size, and the entire site gained hundreds of K of bloat. And in return for all that extra bandwidth, Frontpage took the aesthetic formatting I had very carefully tested on Netscape, Opera, and IE, and made it look like crap.

As to the question of how much reward such tweaking gives... there are still a lot of HTTP 1.0 browsers out there, and a lot of people using dialup (I'd still be using dialup myself if my partner's company hadn't offered to pay for our DSL so she can work from home - long story). I vividly remember watching the modem lights blink, drumming my fingers, checking that I had load images turned off, waiting for a simple HTML page to download. Anything that can reduce file sizes is a definite plus in the world of dialup.

Richard
Authored by: ImportedComment on Monday, February 03 2003 @ 03:02 PM CST
: : TO ALL THOSE OF KNOWLEDGE : :

I have been reading posts on this site and others regarding optimizing HTML and images.

Like many others, I have found that programs like FrontPage have created a balloon of redundant tags on my pages. I am also certain most of my images can be reduced without losing ANY quality.

Though I have knowledge of the problem, I do not have the expertise or confidence to accurately carry out the repair without causing new problems. I also don't know which areas need to be addressed and to what extent, i.e., blank spaces, browser compatibility, white space, etc.

So in short, I am looking for either (and preferred) a user-friendly Windows application to address these issues, optimizing my code and images, as well as advice on what should or should not be done to the code.... or, (though not good for long run self maintenance) find one of you to optimize the code and images by your own means.

Thanks for your time and help.. the assistance is MUCH MUCH appreciated!

Please contact me by email.. flux21@hotmail.com

~Richard

John Robot
Authored by: ImportedComment on Thursday, July 03 2003 @ 03:21 AM CDT
please visit our site

johnbrown1024
Authored by: ImportedComment on Friday, July 18 2003 @ 12:33 PM CDT
Check out monitorcentral (GPL'd).

Multi-Page Functions:

-Spell checks
-w3c problems (using TIDY)
-multi-page cleaning (also using tidy)
-section 508 problems
-web page size totals (including scripts and pics)
-find broken links
-age (modified)
