What is the best way to find 3-word groups in text? – PHP



I am writing a little script that will improve authors' writing skills by
finding repeated phrases in the text.

The text of a chapter will average about 10,000 words. However, I could
reduce the size of the files if it is better to do so.

So the idea is to search through a string and find repeats of any 3 or 4 word group.

So if the author has repeated the phrase "then I went" 6 times in the text, then this would be found and highlighted.

I am not sure where to start with this 😮

Maybe it is best to start by converting the string into an array of all the words?

$word_list = explode(" ", $text);

But I still don't know the best way to find these repeated 3 or 4 word phrases.

The other thing I want to provide is a list of all the words used (maybe I will exclude words like and, the, a, etc.) and the number of times they are used.

Any good ideas on how I should proceed?



Maybe using regular expressions?
Something like this (to show the general idea):

// matches 3 or 4 word groups up to 5 letters per word
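The pattern that went with this comment is not shown above, so the following is only a guess at the general idea, not the original poster's regex. It assumes a "word" means 1 to 5 word characters (`\w`) and that words are separated by whitespace; `$text` and `$pattern` are illustrative names.

```php
<?php
// Hypothetical sketch: match groups of 3 or 4 consecutive words,
// each word being at most 5 characters long.
$text = "then I went home and then I went out";

// (?:\w{1,5}\s+){2,3} = two or three "word + whitespace" units,
// followed by one final word, giving 3 or 4 words in total.
$pattern = '/\b(?:\w{1,5}\s+){2,3}\w{1,5}\b/';

preg_match_all($pattern, $text, $matches);
print_r($matches[0]);
```

Note that preg_match_all() returns non-overlapping matches, so on its own this only chops the text into groups; it does not yet say which groups are repeated.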


I guessed it might require regex, but I left the question
open in case there is a method that is less CPU intensive.

Thanks for your example, it will be useful as I am still not all that
good with regex.

What would be the best approach to count up all the different words?



even if there is, what if the follow-up processes eat up that saved memory/workload/whatever?


get all single words into an array
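Once the words are in an array, one possible way to find repeated phrases without a heavy regex is to slide a window of n words over the array and count each window. This is a minimal sketch, not a tested solution for 10,000-word chapters; the function name `repeated_phrases` is made up for illustration, and it assumes PHP 7.4 or later for the arrow function.

```php
<?php
// Sketch: count every run of $n consecutive words and keep those seen more than once.
function repeated_phrases(string $text, int $n): array
{
    // Lowercase, then split on anything that is not a letter, digit or apostrophe.
    $words = preg_split("/[^a-z0-9']+/", strtolower($text), -1, PREG_SPLIT_NO_EMPTY);

    $counts = [];
    for ($i = 0; $i + $n <= count($words); $i++) {
        // Join the window of $n words back into a phrase and use it as the array key.
        $phrase = implode(' ', array_slice($words, $i, $n));
        $counts[$phrase] = ($counts[$phrase] ?? 0) + 1;
    }

    // Keep only phrases that occur at least twice, most frequent first.
    $repeats = array_filter($counts, fn($c) => $c > 1);
    arsort($repeats);
    return $repeats;
}

$text = "Then I went home. Then I went out, and then I went to bed.";
print_r(repeated_phrases($text, 3));
```

Calling it once with 3 and once with 4 covers both phrase lengths. Because punctuation is thrown away, a phrase can span a sentence boundary; splitting the text into sentences first would avoid that if it matters.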


Thanks for the pointers 🙂

I will follow them up and get some code down.



I have been playing about with the resulting word list for a while, but I cannot work out how to get the number of times each word occurs in an array.

For example

$words = "Mary Had A Little Lamb and She LOVED It So much she had a fit and killed the lamb. She also loved lamb chops you see";

First I would do this:

$words = strtolower($words);
$list = explode(" ", $words);

From here what would you recommend I do to get this:

mary 1
little 1
it 1
so 1
much 1
fit 1
killed 1
also 1
chops 1
you 1
see 1

a 2
had 2
and 2
loved 2

lamb 3
she 3

Any ideas?


array_count_values() (did I mention that searching the manual is the first step?)
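Applied to the example sentence, that could look roughly like the sketch below. Two things here are assumptions rather than part of the question: str_word_count() is used instead of explode() so that "lamb." is counted as "lamb", and the stop-word list containing only "the" is a placeholder to be replaced with whatever words should be excluded.

```php
<?php
$words = "Mary Had A Little Lamb and She LOVED It So much she had a fit and killed the lamb. She also loved lamb chops you see";

// Format 1 makes str_word_count() return the words as an array, without punctuation.
$list = str_word_count(strtolower($words), 1);

// Hypothetical stop-word list: remove words that should not be counted.
$stop_words = ['the'];
$list = array_diff($list, $stop_words);

// word => number of occurrences
$counts = array_count_values($list);

// Sort ascending by count while keeping the word keys.
asort($counts);

foreach ($counts as $word => $count) {
    echo $word . ' ' . $count . "\n";
}
```

With plain explode(" ", ...) the last "lamb." would keep its full stop and be counted as a separate word, which is why the punctuation is stripped first.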