PHP Function to remove the HTML tags along with their contents - PHP function to remove HTML tags only
Usually strip_tags() function is used for removing tags from an html string. but there are some issues with this function

1. It does not validate the HTML, partial or broken tags can result in the removal of more text/data than expected.
2. It does not modify any attributes on the tags that you provide as allowable_tags parameter.
3. It may give different outputs for different versions of the same tag. For example for <br> and <br />
4. For a badly formated HTML string like " PHP guys <b<b>> rocks </b<b>> ", it may give unexpected results.
Here are some work arounds:
PHP function to remove the HTML tags along with their contents:
PHP function to remove the HTML tags and the line control characters
Illustration
Input Text
$text = '<b>PHP</b> <b<b>>guys</b<b>> are <div>rocking</div>';
Result for strip_tags($text) :
PHP <b>guys</b> are rocking
Result for strip_tags_content($text) :
PHP are
Result for strip_tags_content($text, '<div>'):
PHP are <div>rocking</div>
Result for strip_tags_content($text, '<b>', TRUE);
text with <div>tags</div>
Result for remove_all_tags($text):
PHP guys are rocking
Use of this Strip functions
1. Can be used for validating User inputs for html elements
2. Can be used to check the GET parameters to find the presence of html elements like script tags which hackers use to include unauthorised JS scripts into a web page ( *Cross-Site Scripting vulnerabilities [ XSS ] ) .
Sample Vulnerable URL for the above mentioned scenario:
https://www.yourdomain.com/test.php?op=1&place=Kerala-India<Script>alert(\"You are hacked\")</Script>
If this Get parameters are not validated and user is printing the $_GET['place'] variable. The when the page is loaded it will alert the message "You are hacked".
For more details visit http://php.net/strip_tags
*Cross-Site Scripting [ XSS ] Attack
A target system is identified with XSS which occurs when dynamically generated web pages display user input, such as login information, that is not properly validated, allowing an attacker to embed malicious scripts into the generated page which is then executed by the browser on the machine of any user that views the page with the malicious content.
If successful, Cross-Site Scripting vulnerabilities can be exploited to manipulate or steal cookies, create requests which appear to come from a valid user, compromise confidential information, or execute malicious code on end user systems.
XSS attack can also be prevented using the .htaccess file. Click here for more details. Here it checks for the presence of script or iframe tags in the url or the query string and if found , hacker will be redirected to a custom error page.

1. It does not validate the HTML, partial or broken tags can result in the removal of more text/data than expected.
2. It does not modify any attributes on the tags that you provide as allowable_tags parameter.
3. It may give different outputs for different versions of the same tag. For example for <br> and <br />
4. For a badly formated HTML string like " PHP guys <b<b>> rocks </b<b>> ", it may give unexpected results.
Here are some work arounds:
PHP function to remove the HTML tags along with their contents:
<?php
function strip_tags_content($text, $tags = '', $invert = FALSE) {
$op_string = "";
preg_match_all('/<(.+?)[\s]*\/?[\s]*>/si', trim($tags), $tags);
$tags = array_unique($tags[1]);
if(is_array($tags) AND count($tags) > 0) {
if($invert == FALSE) {
$op_string = preg_replace('@<(?!(?:'. implode('|', $tags) .')\b)(\w+)\b.*?>.*?</\1>@si', '', $text);
}
else {
$op_string = preg_replace('@<('. implode('|', $tags) .')\b.*?>.*?</\1>@si', '', $text);
}
}
elseif($invert == FALSE) {
$op_string = preg_replace('@<(\w+)\b.*?>.*?</\1>@si', '', $text);
}
// ----- remove multiple spaces -----
$op_string = trim(preg_replace('/ {2,}/', ' ', $op_string));
return $op_string;
}
?>
function strip_tags_content($text, $tags = '', $invert = FALSE) {
$op_string = "";
preg_match_all('/<(.+?)[\s]*\/?[\s]*>/si', trim($tags), $tags);
$tags = array_unique($tags[1]);
if(is_array($tags) AND count($tags) > 0) {
if($invert == FALSE) {
$op_string = preg_replace('@<(?!(?:'. implode('|', $tags) .')\b)(\w+)\b.*?>.*?</\1>@si', '', $text);
}
else {
$op_string = preg_replace('@<('. implode('|', $tags) .')\b.*?>.*?</\1>@si', '', $text);
}
}
elseif($invert == FALSE) {
$op_string = preg_replace('@<(\w+)\b.*?>.*?</\1>@si', '', $text);
}
// ----- remove multiple spaces -----
$op_string = trim(preg_replace('/ {2,}/', ' ', $op_string));
return $op_string;
}
?>
PHP function to remove the HTML tags and the line control characters
<?php
function remove_all_tags($string) {
// ----- remove HTML TAGs -----
$string = preg_replace ('/<[^>]*>/', ' ', $string);
// ----- remove control characters -----
$string = str_replace("\r", '', $string); // --- replace with empty space
$string = str_replace("\n", ' ', $string); // --- replace with space
$string = str_replace("\t", ' ', $string); // --- replace with space
// ----- remove multiple spaces -----
$string = trim(preg_replace('/ {2,}/', ' ', $string));
return $string;
}
?>
function remove_all_tags($string) {
// ----- remove HTML TAGs -----
$string = preg_replace ('/<[^>]*>/', ' ', $string);
// ----- remove control characters -----
$string = str_replace("\r", '', $string); // --- replace with empty space
$string = str_replace("\n", ' ', $string); // --- replace with space
$string = str_replace("\t", ' ', $string); // --- replace with space
// ----- remove multiple spaces -----
$string = trim(preg_replace('/ {2,}/', ' ', $string));
return $string;
}
?>
Illustration
Input Text
$text = '<b>PHP</b> <b<b>>guys</b<b>> are <div>rocking</div>';
Result for strip_tags($text) :
PHP <b>guys</b> are rocking
Result for strip_tags_content($text) :
PHP are
Result for strip_tags_content($text, '<div>'):
PHP are <div>rocking</div>
Result for strip_tags_content($text, '<b>', TRUE);
text with <div>tags</div>
Result for remove_all_tags($text):
PHP guys are rocking
Use of this Strip functions
1. Can be used for validating User inputs for html elements
2. Can be used to check the GET parameters to find the presence of html elements like script tags which hackers use to include unauthorised JS scripts into a web page ( *Cross-Site Scripting vulnerabilities [ XSS ] ) .
Sample Vulnerable URL for the above mentioned scenario:
https://www.yourdomain.com/test.php?op=1&place=Kerala-India<Script>alert(\"You are hacked\")</Script>
If this Get parameters are not validated and user is printing the $_GET['place'] variable. The when the page is loaded it will alert the message "You are hacked".
For more details visit http://php.net/strip_tags
*Cross-Site Scripting [ XSS ] Attack
A target system is identified with XSS which occurs when dynamically generated web pages display user input, such as login information, that is not properly validated, allowing an attacker to embed malicious scripts into the generated page which is then executed by the browser on the machine of any user that views the page with the malicious content.
If successful, Cross-Site Scripting vulnerabilities can be exploited to manipulate or steal cookies, create requests which appear to come from a valid user, compromise confidential information, or execute malicious code on end user systems.
XSS attack can also be prevented using the .htaccess file. Click here for more details. Here it checks for the presence of script or iframe tags in the url or the query string and if found , hacker will be redirected to a custom error page.