Get DOM CSS properties by using headless Mozilla 
Mozilla July 10th, 2009
Project repository: http://git.lazytech.info/?p=dom-traversal.git
This program runs without an X environment by using a headless Mozilla back end, and outputs the CSS properties of each DOM node in a web page.
For headless Mozilla, see: http://chrislord.net/files/fosdem-09-slides.odp
For build instructions, see: http://center.lazytech.info/wiki/OffscreenMozilla, http://center.lazytech.info/wiki/DOMTraversal
Output snippet for parsing www.google.com:
    <html>
      Style:
        html, div, map, dt, isindex, form { display: block; }
    <head>
      Style:
        area, base, basefont, head, meta, script, style, title, noembed, param { display: none; }
    <meta>
      Style:
        area, base, basefont, head, meta, script, style, title, noembed, param { display: none; }
    <title>
      Style:
        area, base, basefont, head, meta, script, style, title, noembed, param { display: none; }
    <script>
      Style:
        area, base, basefont, head, meta, script, style, title, noembed, param { display: none; }
    <style>
      Style:
        area, base, basefont, head, meta, script, style, title, noembed, param { display: none; }
    <script>
      Style:
        area, base, basefont, head, meta, script, style, title, noembed, param { display: none; }
    <style>
      Style:
        area, base, basefont, head, meta, script, style, title, noembed, param { display: none; }
    <style>
      Style:
        area, base, basefont, head, meta, script, style, title, noembed, param { display: none; }
    <body>
      Style:
        body { display: block; margin: 8px; }
        body, td, a, p, .h { font-family: arial,sans-serif; }
    <textarea>
      Style: display: none;
        textarea { margin: 1px 0pt; border: 2px inset threedface; background-color: -moz-field; color: -moz-fieldtext; font: medium -moz-fixed; text-rendering: optimizelegibility; text-align: start; text-transform: none; word-spacing: normal; letter-spacing: normal; vertical-align: text-bottom; cursor: text; -moz-binding: url("chrome://global/content/platformHTMLBindings.xml#textAreas"); -moz-appearance: textfield-multiline; text-indent: 0pt; -moz-user-select: text; text-shadow: none; word-wrap: break-word; }
        input:-moz-read-write, textarea:-moz-read-write { -moz-user-modify: read-write ! important; }
    <iframe>
      Style: display: none;
        iframe { border: 2px inset; }
    <div>
      Style:
        html, div, map, dt, isindex, form { display: block; }
        #gbar { height: 22px; }
        #gbar, #guser { font-size: 13px; padding-top: 1px ! important; }
        #gbar { float: left; }
    <nobr>
      Style:
        nobr { white-space: nowrap; }
    <b>
      Style:
        b, strong { font-weight: bolder; }
        .gb1, .gb3 { height: 22px; margin-right: 0.5em; vertical-align: top; }
    ......
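The dump above is easy to post-process. Here is a minimal Python sketch of one way to group it by element. It is not part of the dom-traversal project. The names parse_dump, ENTRY and RULE are made up for illustration, and the assumed format (a tag, then "Style:", then optional inline declarations, then zero or more "selector { declarations }" rules) is inferred only from the snippet shown here. The parser ignores line breaks, so it does not depend on how the dump is wrapped.

```python
import re
from typing import List, Tuple

# Assumption: every element entry starts with "<tag>" followed by "Style:".
ENTRY = re.compile(r"<(\w+)>\s*Style:")
# Assumption: a matched rule looks like "selector { declarations }".
# Excluding ';' from the selector keeps inline declarations out of it.
RULE = re.compile(r"([^{};]+)\{([^{}]*)\}")

Entry = Tuple[str, str, List[Tuple[str, str]]]


def parse_dump(text: str) -> List[Entry]:
    """Return (tag, inline_style, [(selector, declarations), ...]) per element."""
    entries: List[Entry] = []
    matches = list(ENTRY.finditer(text))
    for i, m in enumerate(matches):
        # An entry's body runs until the next "<tag> Style:" or end of text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        body = text[m.end():end]
        # Anything before the first rule is treated as the inline style.
        first = RULE.search(body)
        inline = body[:first.start()].strip() if first else body.strip()
        rules = [
            (selector.strip(), " ".join(declarations.split()))
            for selector, declarations in RULE.findall(body)
        ]
        entries.append((m.group(1), inline, rules))
    return entries


# A short excerpt of the dump shown above, used as sample input.
sample = (
    "<body> Style: body { display: block; margin: 8px; } "
    "body, td, a, p, .h { font-family: arial,sans-serif; } "
    "<iframe> Style: display: none; iframe { border: 2px inset; }"
)

for tag, inline, rules in parse_dump(sample):
    print(tag, "| inline:", inline or "(none)")
    for selector, declarations in rules:
        print("   ", selector, "=>", declarations)
```

Regular expressions are enough here because the dump shown has no nested braces. A dump that included @media blocks would need a real CSS parser instead.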
It used more than 20 MB of memory while running. To reduce the memory usage, we could perhaps build it using only Mozilla’s DOM and Layout components, but that would be hard work…