I am writing a little script to improve authors' writing skills by
finding repeated phrases in the text.
The text of a chapter will average about 10,000 words; however, I could
reduce the size of the files if it is better to do so.
So the idea is to search through a string and find repeats of any 3 or 4 word group.
So if the author has repeated the phrase "then I went" 6 times in the text, then this would be found and highlighted.
I am not sure where to start with this 😮
Maybe it is best to start by converting the string into an array of all the words?
$word_list = explode(" ", $text);
But I still don't know the best way to find these repeated 3 or 4 word phrases.
The other thing I want to provide is a list of all the words used (maybe I will exclude words like and, the, a, etc.) and the number of times each is used.
Any good ideas on how I should proceed?
Maybe using regular expressions?
Something like this (to show the general idea):
// matches 3 or 4 word groups up to 5 letters per word
I guessed it might require regex, but I left the question
open in case there is a method that is less CPU intensive.
Thanks for your example, it will be useful as I am still not all that
good with regex.
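For comparison, here is a minimal non-regex sketch of the phrase search: slide a window of N words across the word array and count each window. The helper name `count_phrases` and the sample sentence are made up for illustration, and the splitting rule (letters, digits and apostrophes count as word characters) is an assumption. Note that it happily counts phrases that run across sentence boundaries.

```php
<?php
// Hypothetical helper: counts every run of $n consecutive words in $text
// and returns only the phrases that occur more than once.
function count_phrases(string $text, int $n): array
{
    // Lowercase, then split on anything that is not a letter, digit or apostrophe.
    $words = preg_split("/[^a-z0-9']+/", strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    $total = count($words);
    $counts = [];

    // Sliding window: words $i .. $i + $n - 1 form one phrase.
    for ($i = 0; $i + $n <= $total; $i++) {
        $phrase = implode(' ', array_slice($words, $i, $n));
        $counts[$phrase] = ($counts[$phrase] ?? 0) + 1;
    }

    // Drop phrases seen only once, then sort most frequent first.
    $counts = array_filter($counts, function ($c) {
        return $c > 1;
    });
    arsort($counts);

    return $counts;
}

$text = "Then I went home. Then I went out. Later, then I went to bed.";
$repeats = count_phrases($text, 3);
print_r($repeats); // Array ( [then i went] => 3 )
```

Calling it once with 3 and once with 4 would cover both phrase lengths. On a 10,000 word chapter this is a single pass per phrase length, so it should be cheap, though I have not timed it against a regex.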
What would be the best approach to count up all the different words?
Even if there is, what if the follow-up processes eat up that saved memory/workload/whatever?
get all single words into an array
Thanks for the pointers 🙂
I will follow them up and get some code down.
I have been playing about with the resulting word list for a while, but I cannot work out how to get the number of times each word occurs in the array.
$words = "Mary Had A Little Lamb and She LOVED It So much she had a fit and killed the lamb. She also loved lamb chops you see";
First I would do this:
$words = strtolower($words);
$list = explode(" ", $words);
From here what would you recommend I do to get this:
Any ideas?
array_count_values() (did I mention that searching the manual is the first step?)
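A short sketch of how that could look with the sentence from the question. One assumption here: it uses `str_word_count()` instead of `explode(" ", ...)`, because exploding on spaces leaves "lamb." (with the full stop) as a different word from "lamb". The stop-word list is only an example.

```php
<?php
$words = "Mary Had A Little Lamb and She LOVED It So much she had a fit and killed the lamb. She also loved lamb chops you see";

// Format 1 makes str_word_count() return the words as an array,
// which also strips the full stop after "lamb".
$list = str_word_count(strtolower($words), 1);

// word => number of occurrences
$counts = array_count_values($list);

// Optionally drop common words (example list only).
$stop_words = ['and', 'the', 'a'];
$counts = array_diff_key($counts, array_flip($stop_words));

// Most frequent first, keeping the word keys.
arsort($counts);

print_r($counts);
```

With this input, "lamb" and "she" each come out at 3, and "had" and "loved" at 2.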