Skip to main content

PHP Function to remove the HTML tags along with their contents - PHP function to remove HTML tags only

Usually strip_tags() function is used for removing tags from an html string. but there are some issues with this function


1. It does not validate the HTML, partial or broken tags can result in the removal of more text/data than expected.
2. It does not modify any attributes on the tags that you provide as allowable_tags parameter.
3. It may give different outputs for different versions of the same tag. For example for <br> and <br />
4. For a badly formated HTML string like " PHP guys <b<b>> rocks </b<b>> ", it may give unexpected results.



Here are some work arounds:

PHP function to remove the HTML tags along with their contents:


<?php
function strip_tags_content($text, $tags = '', $invert = FALSE) {
$op_string = "";

preg_match_all('/<(.+?)[\s]*\/?[\s]*>/si', trim($tags), $tags);
$tags = array_unique($tags[1]);

if(is_array($tags) AND count($tags) > 0) {
if($invert == FALSE) {
$op_string = preg_replace('@<(?!(?:'. implode('|', $tags) .')\b)(\w+)\b.*?>.*?</\1>@si', '', $text);
}
else {
$op_string = preg_replace('@<('. implode('|', $tags) .')\b.*?>.*?</\1>@si', '', $text);
}
}
elseif($invert == FALSE) {
$op_string = preg_replace('@<(\w+)\b.*?>.*?</\1>@si', '', $text);
}

// ----- remove multiple spaces -----
$op_string = trim(preg_replace('/ {2,}/', ' ', $op_string));

return $op_string;
}
?>



PHP function to remove the HTML tags and the line control characters

<?php
function remove_all_tags($string) {

// ----- remove HTML TAGs -----
$string = preg_replace ('/<[^>]*>/', ' ', $string);

// ----- remove control characters -----
$string = str_replace("\r", '', $string); // --- replace with empty space
$string = str_replace("\n", ' ', $string); // --- replace with space
$string = str_replace("\t", ' ', $string); // --- replace with space

// ----- remove multiple spaces -----
$string = trim(preg_replace('/ {2,}/', ' ', $string));

return $string;

}
?>




Illustration


Input Text
$text = '<b>PHP</b> <b<b>>guys</b<b>> are <div>rocking</div>';

Result for strip_tags($text) :
PHP <b>guys</b> are rocking

Result for strip_tags_content($text) :
PHP are

Result for strip_tags_content($text, '<div>'):
PHP are <div>rocking</div>

Result for strip_tags_content($text, '<b>', TRUE);
text with <div>tags</div>

Result for remove_all_tags($text):
PHP guys are rocking




Use of this Strip functions
1. Can be used for validating User inputs for html elements
2. Can be used to check the GET parameters to find the presence of html elements like script tags which hackers use to include unauthorised JS scripts into a web page ( *Cross-Site Scripting vulnerabilities [ XSS ] ) .

Sample Vulnerable URL for the above mentioned scenario:
https://www.yourdomain.com/test.php?op=1&place=Kerala-India<Script>alert(\"You are hacked\")</Script>
If this Get parameters are not validated and user is printing the $_GET['place'] variable. The when the page is loaded it will alert the message "You are hacked".


For more details visit http://php.net/strip_tags


*Cross-Site Scripting [ XSS ] Attack
A target system is identified with XSS which occurs when dynamically generated web pages display user input, such as login information, that is not properly validated, allowing an attacker to embed malicious scripts into the generated page which is then executed by the browser on the machine of any user that views the page with the malicious content.
If successful, Cross-Site Scripting vulnerabilities can be exploited to manipulate or steal cookies, create requests which appear to come from a valid user, compromise confidential information, or execute malicious code on end user systems.

XSS attack can also be prevented using the .htaccess file. Click here for more details. Here it checks for the presence of script or iframe tags in the url or the query string and if found , hacker will be redirected to a custom error page.

Popular posts from this blog

Strange problem occured while trying to create a CSV file using PHP Script - The file is not seen on FTP but can download using file's absolute path url

Strange problem occured while trying to create a CSV file - The file is not seen on FTP but can download using file's absolute path url Last day I came across a strange problem when I tried to create a csv file on therver using a PHP script. the script was simply writing a given content as a csv file. The file will be created runtime. What happened was, The script executed fine, file handler for new file was created and contents was wrote into the file using fwrite and it returned the number of bytes that was written.

How to get the Query string of a URL using the Javascript (JS)?

JS function get the Query string of a URL or value of each parameter using the Javascript(JS)? If you want to get your current page's url var my_url=document.location; to get the query string part of the url use like this: var my_qry_str= location.search; this will return the part of the url starting from "?" following by query string Lets assume that your current page url is http://www.crozoom.com/2013/page.html?qry1=A&qry2=B then the location.search function will return " ?qry1=A&qry2=B " to exclue "?", do like this:


Urgent Openings for PHP trainees, Andriod / IOS developers and PHP developers in Kochi Trivandrum Calicut and Bangalore. Please Send Your updated resumes to recruit.vo@gmail.com   Read more »
Member
Search This Blog