How to Extract a website from a string / plain text? – PHP


Q(Question):

I need to extract website names from a plain-text string. I searched most of the forums but could not find a good solution.
Since it is plain text, it will not contain any HTML links or anchor tags to find and extract. The website may or may not contain "www."; for example, the website name can be "learnwell.com" instead of "www.learnwell.com". There are also website names like main.cool.edu.
Here is an example string: "visit our webiste gravesfab.com"

A(Answer):

It may be me, but your question doesn’t seem to make sense.
What are you trying to do?

A(Answer):

It's simple: I have a large number of plain-text files, and I just need to extract all the websites from them.

A(Answer):

So you mean you want to find all the domain names in a text file?

There may be a regex that validates a domain name structure.
I know they exist for email addresses, so try googling for "regex" and "domain name" together with "validate" or "check".

A(Answer):

Hello Mr Code Green, it's not exactly the domain name. It may also contain subdomains, for example "support.domain.com".

And regarding googling: I did not find a regex that matches my criteria while googling, and "try googling" is not the answer I expect from Bytes. If that were the case, I would not have posted this topic here, ha ha.

A(Answer):

I am not good with regex; I always look for somebody else's solution with Mr Google, and that is the only reason I suggested it.
Like I said, I did find numerous versions that validate an email structure.
I am surprised you did not find something similar for web addresses.

I will happily show my email regex.
Maybe it will give you something to build on, or hopefully prompt a regex guru to suggest something better:

if(preg_match('/^[[:alnum:]][a-z0-9_\.\-]*@[a-z0-9\.\-]+\.[a-z]{2,4}$/i',$email))
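
As a rough usage sketch of the pattern above (the looksLikeEmail wrapper, the EMAIL_PATTERN constant and the sample addresses are assumptions for illustration, not part of the original post):

```php
<?php
// The pattern from the post above, unchanged.
const EMAIL_PATTERN = '/^[[:alnum:]][a-z0-9_\.\-]*@[a-z0-9\.\-]+\.[a-z]{2,4}$/i';

// Hypothetical wrapper: true when the whole string has an email-like shape.
function looksLikeEmail(string $email): bool
{
    return preg_match(EMAIL_PATTERN, $email) === 1;
}

var_dump(looksLikeEmail('info@gravesfab.com'));  // bool(true)
var_dump(looksLikeEmail('gravesfab.com'));       // bool(false): there is no "@"
var_dump(looksLikeEmail('user@example.museum')); // bool(false): last label is longer than 4 letters
```

Because of the ^ and $ anchors, the pattern validates one complete string rather than finding addresses inside a longer text, and the {2,4} limit rejects longer top-level domains. Both points matter when adapting it to pull website names out of plain text.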

A(Answer):

Thanks, buddy. I can start from here. Some more work on your regex should get me to the actual code.

To be frank, I found many regexes but could not find a perfect one. Most of them failed in odd conditions.

Hopefully a regex that omits the "@" symbol can be derived from your code. I am on it. Thanks again.
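
A minimal sketch of that direction, assuming PHP 7 or later. The function name extractHostnames and the pattern itself are illustrative assumptions, not a solution posted in this thread. It drops the ^ and $ anchors and the "@" part, accepts one or more dot-separated labels followed by an alphabetic top-level label, and uses lookarounds so that pieces of email addresses are skipped.

```php
<?php
// Hypothetical helper: returns every host-like token found in $text,
// lowercased and without duplicates.
function extractHostnames(string $text): array
{
    $pattern = '~(?<![@a-z0-9.\-])'                 // not preceded by "@", ".", "-" or an alphanumeric
             . '(?:[a-z0-9](?:[a-z0-9\-]*[a-z0-9])?\.)+' // one or more labels, each followed by a dot
             . '[a-z]{2,}'                          // alphabetic top-level label, 2+ letters
             . '(?![a-z0-9\-@]|\.[a-z0-9])~i';      // must not continue into more of a name or an "@"

    preg_match_all($pattern, $text, $matches);

    return array_values(array_unique(array_map('strtolower', $matches[0])));
}

// Example string from the question; the only match is gravesfab.com.
print_r(extractHostnames('visit our webiste gravesfab.com'));
```

A pattern like this only checks the shape of a token, so file names such as report.pdf will match as well. Comparing the last label against a list of known top-level domains would be one way to tighten it.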
