transform

Class: ExtractHTMLContents

Source Location: /ExtractHTMLContents.inc

Class Overview


Parses HTML and automatically selects the content to be embedded in a Word document


Author(s):

Copyright:

  • Copyright (c) 2009-2011 Narcea Producciones Multimedia S.L. (http://www.2mdc.com)




Class Details

[line 15]
Parses HTML and automatically selects the content to be embedded in a Word document



Tags:

version:  
copyright:  Copyright (c) 2009-2011 Narcea Producciones Multimedia S.L. (http://www.2mdc.com)
link:  http://www.phpdocx.com
since:  File available since Release ?
license:  http://www.phpdocx.com/wp-content/themes/lightword/pro_license.php




Class Variables

static $HTMLContent =

[line 60]



Tags:

access:  public

Type:   string





Class Methods


constructor __construct [line 74]

ExtractHTMLContents __construct( $html, [ $threshold = 50])

Class constructor



Tags:

access:  public


Parameters:

   $html  
   $threshold  


destructor __destruct [line 124]

void __destruct( )

Class destructor



Tags:

access:  public



method countFullChars [line 268]

int countFullChars( DOMNode $node)

Counts the full number of characters inside an HTML node



Tags:

access:  public


Parameters:

DOMNode   $node  


method countTextChars [line 254]

int countTextChars( DOMNode $node)

Counts the number of text characters inside an HTML node



Tags:

access:  public


Parameters:

DOMNode   $node  
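
The page does not say how "full" and "text" characters differ. The sketch below is one plausible reading, not the library's implementation: it assumes "text characters" means the length of the node's text content and "full characters" means the length of the node serialized as HTML, markup included. The function names `sketchCountTextChars` and `sketchCountFullChars` are hypothetical.

```php
<?php
// Hypothetical helpers illustrating one reading of the two counters.
// They are NOT taken from ExtractHTMLContents.inc.

/** Assumption: text characters = length of the text content, whitespace collapsed. */
function sketchCountTextChars(DOMNode $node): int
{
    $text = preg_replace('/\s+/', ' ', $node->textContent);
    // strlen() counts bytes; swap in mb_strlen() for multibyte content.
    return strlen(trim($text));
}

/** Assumption: full characters = length of the node serialized as HTML, tags included. */
function sketchCountFullChars(DOMNode $node): int
{
    // A DOMDocument has no ownerDocument, so use the node itself in that case.
    $doc = $node instanceof DOMDocument ? $node : $node->ownerDocument;
    return strlen($doc->saveHTML($node));
}

$doc = new DOMDocument();
$doc->loadHTML('<div id="main"><p>Hello <a href="#">world</a></p></div>');
$div = $doc->getElementsByTagName('div')->item(0);

echo sketchCountTextChars($div), "\n"; // text only: "Hello world"
echo sketchCountFullChars($div), "\n"; // text plus all markup
```

Under these assumptions the ratio of the two counts gives a rough measure of how much of a node is readable text rather than markup.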


method evaluateNode [line 154]

float evaluateNode( DOMNode $node)

Associates a numeric value with an HTML node



Tags:

access:  public


Parameters:

DOMNode   $node  


method parseHTML [line 132]

void parseHTML( string $WordML)

This is the main method to extract the "valid content"



Tags:

access:  public


Parameters:

string   $WordML  


method pruneLinks [line 282]

DOMNode pruneLinks( DOMNode $node)

Removes the links of a given node



Tags:

access:  public


Parameters:

DOMNode   $node  
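
As an illustration of what removing links from a node can look like with PHP's DOM extension, here is a minimal sketch. It assumes that "links" means `<a>` elements and that they are dropped together with their text; it also works on a deep clone so the original node stays untouched, which the documented method may or may not do. The name `sketchPruneLinks` is hypothetical.

```php
<?php
// Hypothetical sketch; not the library's pruneLinks().
function sketchPruneLinks(DOMNode $node): DOMNode
{
    // Deep clone so the caller's node is left intact (an assumption of this sketch).
    $copy = $node->cloneNode(true);
    if (!$copy instanceof DOMElement) {
        return $copy; // text nodes, comments, etc. contain no links
    }
    // getElementsByTagName() returns a live list, so snapshot it before removing.
    foreach (iterator_to_array($copy->getElementsByTagName('a')) as $link) {
        $link->parentNode->removeChild($link);
    }
    return $copy;
}

$doc = new DOMDocument();
$doc->loadHTML('<div><p>Read the <a href="/x">manual</a> first.</p></div>');
$div = $doc->getElementsByTagName('div')->item(0);

$pruned = sketchPruneLinks($div);
echo $pruned->textContent, "\n"; // link text is gone, surrounding text remains
```

Comparing the text length of the pruned copy with that of the original gives the share of text that sits outside links.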


method purgeHTML [line 237]

boolean purgeHTML( DOMNode $node)

Removes "unwanted" div or ul block elements



Tags:

access:  public


Parameters:

DOMNode   $node  
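
The page does not state what makes a block "unwanted". The sketch below assumes one common heuristic: a `div` or `ul` whose text is mostly link text (a navigation menu, for example) is removed. The functions `sketchLinkDensity` and `sketchPurge` and the 0.5 cutoff are hypothetical and are not taken from the library.

```php
<?php
// Hypothetical sketch of a purge step; the real criterion is not documented.

/** Share of an element's text that sits inside <a> elements (0.0 to 1.0). */
function sketchLinkDensity(DOMElement $el): float
{
    $all = strlen(trim($el->textContent));
    if ($all === 0) {
        return 0.0; // empty blocks are left alone in this sketch
    }
    $linked = 0;
    foreach ($el->getElementsByTagName('a') as $a) {
        $linked += strlen(trim($a->textContent));
    }
    return min(1.0, $linked / $all);
}

/** Removes link-heavy div/ul descendants of $root; returns true if anything was removed. */
function sketchPurge(DOMElement $root, float $maxDensity = 0.5): bool
{
    $removed = false;
    foreach (['div', 'ul'] as $tag) {
        // Snapshot the live node list before modifying the tree.
        foreach (iterator_to_array($root->getElementsByTagName($tag)) as $block) {
            if ($block->parentNode !== null && sketchLinkDensity($block) > $maxDensity) {
                $block->parentNode->removeChild($block);
                $removed = true;
            }
        }
    }
    return $removed;
}

$doc = new DOMDocument();
$doc->loadHTML(
    '<body><div id="nav"><ul><li><a href="/">Home</a></li></ul></div>'
    . '<div id="content"><p>Article text with one <a href="/ref">link</a>.</p></div></body>'
);
$body = $doc->getElementsByTagName('body')->item(0);

var_dump(sketchPurge($body)); // the navigation block is link-only, so it is removed
```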



Class Constants

RAW_CHARS_WEIGHT =  0

[line 18]



RAW_POSITION_WEIGHT =  5

[line 24]



RAW_UNLINK_CHARS_WEIGHT =  0

[line 19]



SIBLINGS_WEIGHT =  8

[line 17]



TEXT_CHARS_WEIGHT =  20

[line 20]



TEXT_POSITION_WEIGHT =  30

[line 25]



TEXT_RATIO_WEIGHT =  5

[line 22]



TEXT_UNLINK_CHARS_WEIGHT =  50

[line 21]



TEXT_UNLINK_RATIO_WEIGHT =  25

[line 23]
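
The constant names suggest per-feature weights, presumably combined by evaluateNode into a single score, though the page does not confirm this or give the formula. A minimal sketch of a weighted average follows. The weights are copied from the constants listed here; everything else is an assumption: the function `sketchWeightedScore`, the idea that each feature is normalized to the range 0 to 1, and the scaling of the result to 0 to 100.

```php
<?php
// Weights copied from the documented class constants.
const SKETCH_WEIGHTS = [
    'SIBLINGS'          => 8,
    'RAW_CHARS'         => 0,
    'RAW_UNLINK_CHARS'  => 0,
    'TEXT_CHARS'        => 20,
    'TEXT_UNLINK_CHARS' => 50,
    'TEXT_RATIO'        => 5,
    'TEXT_UNLINK_RATIO' => 25,
    'RAW_POSITION'      => 5,
    'TEXT_POSITION'     => 30,
];

/**
 * Hypothetical scoring: weighted average of features, scaled to 0..100.
 * Assumes each feature value is already normalized to 0..1; missing features count as 0.
 */
function sketchWeightedScore(array $features): float
{
    $total = 0.0;
    $weightSum = 0;
    foreach (SKETCH_WEIGHTS as $name => $weight) {
        $value = (float) ($features[$name] ?? 0.0);
        $total += $weight * max(0.0, min(1.0, $value)); // clamp to 0..1
        $weightSum += $weight;
    }
    return $weightSum > 0 ? 100 * $total / $weightSum : 0.0;
}

// A node with lots of unlinked text near the top of the page (made-up feature values).
echo sketchWeightedScore([
    'TEXT_UNLINK_CHARS' => 0.9,
    'TEXT_UNLINK_RATIO' => 0.8,
    'TEXT_POSITION'     => 1.0,
]), "\n";
```

With these weights, features tied to unlinked text dominate the result, while the two raw character counts (weight 0) contribute nothing.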





Documentation generated on Mon, 13 Jan 2014 13:44:23 +0100 by phpDocumentor 1.4.4