Project repository: http://git.lazytech.info/?p=dom-traversal.git
This program runs without an X environment by using a headless Mozilla back-end, and outputs the CSS properties of each DOM node in a web page.

For headless Mozilla, see: http://chrislord.net/files/fosdem-09-slides.odp
For compiling: http://center.lazytech.info/wiki/OffscreenMozilla, http://center.lazytech.info/wiki/DOMTraversal

Output snippet for parsing www.google.com:

<html> Style:
html, div, map, dt, isindex, form { display: block; }
    <head> Style:
    area, base, basefont, head, meta, script, style, title, noembed, param { display: none; }
        <meta> Style:
        area, base, basefont, head, meta, script, style, title, noembed, param { display: none; }
        <title> Style:
        area, base, basefont, head, meta, script, style, title, noembed, param { display: none; }
        <script> Style:
        area, base, basefont, head, meta, script, style, title, noembed, param { display: none; }
        <style> Style:
        area, base, basefont, head, meta, script, style, title, noembed, param { display: none; }
        <script> Style:
        area, base, basefont, head, meta, script, style, title, noembed, param { display: none; }
        <style> Style:
        area, base, basefont, head, meta, script, style, title, noembed, param { display: none; }
        <style> Style:
        area, base, basefont, head, meta, script, style, title, noembed, param { display: none; }
    <body> Style:
    body { display: block; margin: 8px; }
    body, td, a, p, .h { font-family: arial,sans-serif; }
        <textarea> Style:
        display: none;
        textarea { margin: 1px 0pt; border: 2px inset threedface; background-color: -moz-field; color: -moz-fieldtext; font: medium -moz-fixed; text-rendering: optimizelegibility; text-align: start; text-transform: none; word-spacing: normal; letter-spacing: normal; vertical-align: text-bottom; cursor: text; -moz-binding: url("chrome://global/content/platformHTMLBindings.xml#textAreas"); -moz-appearance: textfield-multiline; text-indent: 0pt; -moz-user-select: text; text-shadow: none; word-wrap: break-word; }
        input:-moz-read-write, textarea:-moz-read-write { -moz-user-modify: read-write ! important; }
        <iframe> Style:
        display: none;
        iframe { border: 2px inset; }
        <div> Style:
        html, div, map, dt, isindex, form { display: block; }
        #gbar { height: 22px; }
        #gbar, #guser { font-size: 13px; padding-top: 1px ! important; }
        #gbar { float: left; }
            <nobr> Style:
            nobr { white-space: nowrap; }
                <b> Style:
                b, strong { font-weight: bolder; }
                .gb1, .gb3 { height: 22px; margin-right: 0.5em; vertical-align: top; }
 
......
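
The dump above appears to be indentation-structured: each node starts with a `<tag> Style:` header, nested four spaces deeper than its parent, followed by the rules that apply to it. A minimal sketch for reading it back, assuming that layout holds for the whole output (this is inferred from the snippet, not documented by the tool; `parse_dump` and `SAMPLE` are hypothetical names):

```python
import re

# Header lines look like "    <head> Style:"; the leading spaces encode depth.
HEADER = re.compile(r"^(?P<indent> *)<(?P<tag>[^>]+)> Style:\s*$")

def parse_dump(text, indent_width=4):
    """Return a list of (depth, tag, rules) tuples from a style dump."""
    nodes = []
    for line in text.splitlines():
        if not line.strip():
            continue
        match = HEADER.match(line)
        if match:
            depth = len(match.group("indent")) // indent_width
            nodes.append((depth, match.group("tag"), []))
        elif nodes:
            # Any other line is treated as a rule of the most recent node.
            nodes[-1][2].append(line.strip())
    return nodes

SAMPLE = """\
<html> Style:
html, div, map, dt, isindex, form { display: block; }
    <head> Style:
    area, base, basefont, head, meta, script, style, title, noembed, param { display: none; }
    <body> Style:
    body { display: block; margin: 8px; }
    body, td, a, p, .h { font-family: arial,sans-serif; }
"""

for depth, tag, rules in parse_dump(SAMPLE):
    print("  " * depth + tag, len(rules))
```

For the three nodes in `SAMPLE` this prints the tag names indented by depth, each followed by its rule count.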

It used a bit more than 20 MB of memory when running. To reduce memory usage, we could try building it with Mozilla’s DOM and Layout components only, but that would be hard work.


CHM Reader is a great extension that lets Firefox open .chm files. The project is hosted at: http://sourceforge.net/projects/chmreader/

When I compiled it on OS X 10.5.7, the following error occurred:

g++ -o components/mozCHMModule.os -c -fPIC -I/Users/duo/xulrunner-sdk/include -I/Users/duo/xulrunner-sdk/sdk/include components/mozCHMModule.cpp
/Users/duo/xulrunner-sdk/sdk/include/nsStringAPI.h:1053: error: size of array 'arg' is negative
scons: *** [components/mozCHMModule.os] Error 1

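The failing line in nsStringAPI.h is PR_STATIC_ASSERT(sizeof(wchar_t) == 2). PR_STATIC_ASSERT declares an array whose size becomes negative when the condition is false, which is why the compiler reports a negative array size. On Macs wchar_t is 4 bytes by default, so we need to add -fshort-wchar to the compile flags.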
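
As a quick way to see which platforms would hit this assertion, the default wchar_t size can be read through Python's ctypes module. This is only a sketch: it reports the wchar_t size of the C runtime the Python interpreter was built against, not what a particular g++ invocation uses, and the helper names `wchar_size` and `needs_short_wchar` are made up for illustration.

```python
import ctypes

def wchar_size():
    """Size in bytes of wchar_t as seen by this Python build's C runtime."""
    return ctypes.sizeof(ctypes.c_wchar)

def needs_short_wchar(expected=2):
    """True when the default wchar_t differs from the 2 bytes the SDK header asserts."""
    return wchar_size() != expected

print("sizeof(wchar_t) =", wchar_size())
print("needs -fshort-wchar:", needs_short_wchar())
```

On a platform where the first line reports 4, the static assertion would fail unless -fshort-wchar is passed.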

The other issue occurred at the link stage:

g++ -o platform/Darwin_x86-gcc3/components/libchm.dylib -dynamiclib components/chm_lib.os components/lzx.os components/mozCHMModule.os components/mozCHMFile.os components/mozCHMUnitInfo.os components/mozCHMInputStream.os -L/Users/duo/xulrunner-sdk/lib -L/Users/duo/xulrunner-sdk/sdk/lib -lxpcom -lxpcomglue_s -lnspr4 -lplds4 -lplc4
ld: file not found: @executable_path/libsmime3.dylib
collect2: ld returned 1 exit status
scons: *** [platform/Darwin_x86-gcc3/components/libchm.dylib] Error 1

To fix it, add -Wl,-executable_path -Wl,/path/to/gecko/sdk/bin to the link flags, so that the linker knows which directory to substitute for @executable_path.
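
Since SConscript files are plain Python, the two fixes can be collected in one small helper. The following is a standalone sketch rather than the project's actual build code: `darwin_build_flags` is a hypothetical name, and the SDK path is a placeholder.

```python
import platform

def darwin_build_flags(geckosdk, system=None):
    """Return (cxxflags, linkflags) with the two Mac-specific fixes applied.

    `system` defaults to platform.system(); it is a parameter only so the
    Darwin branch can be exercised on other hosts.
    """
    system = system or platform.system()
    cxxflags, linkflags = [], []
    if system == "Darwin":
        # Make wchar_t 2 bytes so PR_STATIC_ASSERT(sizeof(wchar_t) == 2) holds.
        cxxflags.append("-fshort-wchar")
        # Tell ld where @executable_path points while linking.
        linkflags += ["-Wl,-executable_path", "-Wl,%s/bin" % geckosdk]
    return cxxflags, linkflags

print(darwin_build_flags("/path/to/gecko/sdk", system="Darwin"))
# (['-fshort-wchar'], ['-Wl,-executable_path', '-Wl,/path/to/gecko/sdk/bin'])
```

Both values are lists here, so they can be passed straight to an SCons Environment as CXXFLAGS and LINKFLAGS.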

The complete svn diff:

Index: components/SConscript
===================================================================
--- components/SConscript       (revision 114)
+++ components/SConscript       (working copy)
@@ -7,6 +7,8 @@
 
 libs = ['xpcom', 'xpcomglue_s', 'nspr4', 'plds4', 'plc4']
 
+linkflags = ''
+
 # We use firefox development files instead for geckosdk on FreeBSD
 if system() != 'FreeBSD':
        try:
@@ -38,7 +40,8 @@
     cpppath.append('/usr/include/nspr')
 
 elif system() == 'Darwin':
-    cxxflags = []
+    cxxflags = ['-fshort-wchar']
+    linkflags = ['-Wl,-executable_path', '-Wl,%s/bin' % geckosdk]
 
 elif system() == 'Windows':
     cxxflags = ['/D', 'WIN32', '/D', 'XP_WIN', '/nologo', '/MT', '/O2']
@@ -70,7 +73,7 @@
                '/usr/local/lib/firefox3/sdk/lib']
 
 env = Environment(CPPPATH = cpppath, LIBPATH = libpath, LIBS = libs,
-                  CXXFLAGS= cxxflags)
+                  CXXFLAGS= cxxflags, LINKFLAGS = linkflags)
 
 bxpt = Builder(
     action = 'xpidl -w -m typelib -Icomponents -I%s -I%s -e $TARGET $SOURCE' \


I wrote a blog post about how to call a JavaScript function from a C++ XPCOM component (XPCOM: Javascript function call), but I just found another way to achieve this, using the observer mechanism.

C++ XPCOM code:

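nsCOMPtr<nsIObserverService> observerService = do_GetService("@mozilla.org/observer-service;1");
// NotifyObservers() does not take ownership of the data string, so pass a temporary instead of leaking a ToNewUnicode() copy.
if (observerService)
    observerService->NotifyObservers(NULL, "ping", NS_ConvertASCIItoUTF16("www.google.com").get());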

JavaScript code:

const Cc = Components.classes;
const Ci = Components.interfaces;
 
var aObserver = { 
    observe: function(subject, topic, data) {
        if (topic == "ping") {
            alert("Ping: " + data);
        }   
    }   
};
 
var observerService = Cc["@mozilla.org/observer-service;1"].getService(Ci.nsIObserverService);
 
observerService.addObserver(aObserver, "ping", false);
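
To make the mechanism itself easier to see, here is a tiny in-process imitation of an observer service in Python. It is not XPCOM and does not model nsIObserverService in any detail; `ObserverService` and `PingObserver` are hypothetical classes that only mirror the add/notify/observe shape used above.

```python
from collections import defaultdict

class ObserverService:
    """Keeps a list of observers per topic and fans notifications out to them."""

    def __init__(self):
        self._observers = defaultdict(list)

    def add_observer(self, observer, topic):
        self._observers[topic].append(observer)

    def notify_observers(self, subject, topic, data):
        # Copy the list so observers may register others while being notified.
        for observer in list(self._observers[topic]):
            observer.observe(subject, topic, data)

class PingObserver:
    """Records the data of every "ping" notification it receives."""

    def __init__(self):
        self.seen = []

    def observe(self, subject, topic, data):
        if topic == "ping":
            self.seen.append(data)

service = ObserverService()
observer = PingObserver()
service.add_observer(observer, "ping")

# The notifying side only knows the topic name, not who is listening.
service.notify_observers(None, "ping", "www.google.com")
service.notify_observers(None, "pong", "ignored")
print(observer.seen)  # ['www.google.com']
```

The sender and the receiver share nothing but the topic string, which is what lets the C++ side reach JavaScript without holding a reference to any JavaScript object.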

In multithreaded XPCOM code, we sometimes need to call the JavaScript function through nsIProxyObjectManager, because JavaScript and the UI run on a single thread (see nsISupports proxies and nsProxiedService.h for details).
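
The idea behind such a proxy can be sketched in plain Python: a background thread never calls the handler directly, it posts the call to a queue that only the main thread drains. This is an analogy under simplifying assumptions, not how nsIProxyObjectManager is implemented; `main_queue`, `on_ping` and `worker` are hypothetical names.

```python
import queue
import threading

main_queue = queue.Queue()   # calls waiting to run on the main thread
received = []                # (thread name, data) pairs recorded by the handler

def on_ping(data):
    # Stands in for the JavaScript observer: must only run on the main thread.
    received.append((threading.current_thread().name, data))

def worker():
    # Background thread: post the call instead of invoking on_ping directly.
    main_queue.put((on_ping, "www.google.com"))
    main_queue.put(None)  # sentinel: nothing more to deliver

thread = threading.Thread(target=worker, name="worker")
thread.start()

# Main-thread loop: run posted calls until the sentinel arrives.
while True:
    item = main_queue.get()
    if item is None:
        break
    func, arg = item
    func(arg)

thread.join()
print(received)
```

The recorded thread name is that of the thread running the loop, not "worker", which is the property a proxy is meant to guarantee.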
