Parsing challenge… – PHP

  php

Q(Question):

I thought I ask the scripting guru about the following.

I have a file containing records of data with the following format(first
column is the label):

CODE#1^DESCRIPTION^CODE#2^NOTES
NN-110^an info of NN-001^BRY234^some notes
NN-111^1st line data
2nd line data
3rd line data^BRT345^another notes
NN-112^description of NN-112^BBC23^multiline
notes blah
blah
blah
NN-113^info info^MNO12^some notes here

How do I parse so I can insert them in the database, e.g. MySQL/Access?

Perhaps there are an advanced scripting language can do this easily.

Thanks

A(Answer):

On Tue, 7 Oct 2003, Artco News wrote:

I thought I ask the scripting guru about the following.

I have a file containing records of data with the following format(first
column is the label):

CODE#1^DESCRIPTION^CODE#2^NOTES
NN-110^an info of NN-001^BRY234^some notes
NN-111^1st line data
2nd line data
3rd line data^BRT345^another notes
NN-112^description of NN-112^BBC23^multiline
notes blah
blah
blah
NN-113^info info^MNO12^some notes here

How do I parse so I can insert them in the database, e.g. MySQL/Access?

Perhaps there are an advanced scripting language can do this easily.

ruby is one of the more advanced 🙂

~/eg/ruby > cat ./parse.rb

#!/usr/bin/env ruby

txt = <<-txt
CODE#1^DESCRIPTION^CODE#2^NOTES
NN-110^an info of NN-001^BRY234^some notes
NN-111^1st line data
2nd line data
3rd line data^BRT345^another notes
NN-112^description of NN-112^BBC23^multiline
notes blah
blah
blah
NN-113^info info^MNO12^some notes here
txt
pat = %r{([^^]+)\^([^^]+)\^([^^]+)\^([^^]+)\n}mox
tuples = txt.scan pat

tuples.map{|tuple| p tuple}
~/eg/ruby > ./parse.rb

[" CODE#1", "DESCRIPTION", "CODE#2", "NOTES"]
[" NN-110", "an info of NN-001", "BRY234", "some notes"]
[" NN-111", "1st line data\n 2nd line data\n 3rd line data", "BRT345", "another notes"]
[" NN-112", "description of NN-112", "BBC23", "multiline\n notes blah\n blah\n blah"]
[" NN-113", "info info", "MNO12", "some notes here"]

-a
====================================
| Ara Howard
| NOAA Forecast Systems Laboratory
| Information and Technology Services
| Data Systems Group
| R/FST 325 Broadway
| Boulder, CO 80305-3328
| Email: ah*****@noaa.gov
| Phone: 303-497-7238
| Fax: 303-497-7259
| The difference between art and science is that science is what we understand
| well enough to explain to a computer. Art is everything else.
| — Donald Knuth, "Discover"
| ~ > /bin/sh -c ‘for lang in ruby perl; do $lang -e "print \"\x3a\x2d\x29\x0a\""; done’
====================================

A(Answer):

"Artco News" <ar*******@verizon.net> wrote in message
news:CT*******************@nwrdny01.gnilink.net…

I thought I ask the scripting guru about the following.

I have a file containing records of data with the following format(first
column is the label):

CODE#1^DESCRIPTION^CODE#2^NOTES
NN-110^an info of NN-001^BRY234^some notes
NN-111^1st line data
2nd line data
3rd line data^BRT345^another notes
NN-112^description of NN-112^BBC23^multiline
notes blah
blah
blah
NN-113^info info^MNO12^some notes here

How do I parse so I can insert them in the database, e.g. MySQL/Access?

<?
//Assuming we use file to read the file we’ll get each line in an array, so
we’ll use $testdata as our sample file
$testdata = array ();
$testdata[] = "CODE#1^DESCRIPTION^CODE#2^NOTES\r\n";
$testdata[] = "NN-110^an info of NN-001^BRY234^some notes\r\n";
$testdata[] = "NN-111^1st line data\r\n";
$testdata[] = "2nd line data\r\n";
$testdata[] = "3rd line data^BRT345^another notes\r\n";
$testdata[] = "NN-112^description of NN-112^BBC23^multiline\r\n";
$testdata[] = "notes blah\r\n";
$testdata[] = "blah\r\n";
$testdata[] = "blah\r\n";
$testdata[] = "NN-113^info info^MNO12^some notes here\r\n";

$dbdata = array ();
$row = "";
$cnt = 0;
foreach ($testdata as $line) {
$delimiters = preg_match_all ("/\^/", $line, $waste);
if (($cnt + $delimiters) > 3) {
$dbdata[] = $row;
$cnt = $delimiters;
$row = $line;
} else {
$row .= $line;
$cnt += $delimiters;
}
}
$dbdata[] = $row;
print_r ($dbdata);
?>

…. produces …

Array (
[0] => CODE#1^DESCRIPTION^CODE#2^NOTES
[1] => NN-110^an info of NN-001^BRY234^some notes
[2] => NN-111^1st line data 2nd line data 3rd line data^BRT345^another
notes
[3] => NN-112^description of NN-112^BBC23^multiline notes blah blah blah
[4] => NN-113^info info^MNO12^some notes here
)

You can then easily iterate through this array, exploding each line by the ^
and creating the INSERT INTO table VALUES (); bits of SQL.

Paulus

A(Answer):

this script failed if any of the cell is blank/no-value,
e.g:

CODE#1^DESCRIPTION^CODE#2^NOTES
NN-110^^^some notes
NN-111^1st line data
2nd line data
3rd line data^BRT345^another notes
NN-112^description of NN-112^BBC23^multiline
notes blah
blah
blah
NN-113^info info^MNO12^some notes here
"Ara.T.Howard" <ah*****@fsl.noaa.gov> wrote in message
news:Pi*******************************@eli.fsl.noa a.gov…

On Tue, 7 Oct 2003, Artco News wrote:

I thought I ask the scripting guru about the following.

I have a file containing records of data with the following format(first
column is the label):

CODE#1^DESCRIPTION^CODE#2^NOTES
NN-110^an info of NN-001^BRY234^some notes
NN-111^1st line data
2nd line data
3rd line data^BRT345^another notes
NN-112^description of NN-112^BBC23^multiline
notes blah
blah
blah
NN-113^info info^MNO12^some notes here

How do I parse so I can insert them in the database, e.g. MySQL/Access?

Perhaps there are an advanced scripting language can do this easily.
ruby is one of the more advanced 🙂

~/eg/ruby > cat ./parse.rb

#!/usr/bin/env ruby

txt = <<-txt
CODE#1^DESCRIPTION^CODE#2^NOTES
NN-110^an info of NN-001^BRY234^some notes
NN-111^1st line data
2nd line data
3rd line data^BRT345^another notes
NN-112^description of NN-112^BBC23^multiline
notes blah
blah
blah
NN-113^info info^MNO12^some notes here
txt
pat = %r{([^^]+)\^([^^]+)\^([^^]+)\^([^^]+)\n}mox
tuples = txt.scan pat

tuples.map{|tuple| p tuple}
~/eg/ruby > ./parse.rb

[" CODE#1", "DESCRIPTION", "CODE#2", "NOTES"]
[" NN-110", "an info of NN-001", "BRY234", "some notes"]
[" NN-111", "1st line data\n 2nd line data\n 3rd line data",

"BRT345", "another notes"] [" NN-112", "description of NN-112", "BBC23", "multiline\n notes
blah\n blah\n blah"] [" NN-113", "info info", "MNO12", "some notes here"]

-a
====================================
| Ara Howard
| NOAA Forecast Systems Laboratory
| Information and Technology Services
| Data Systems Group
| R/FST 325 Broadway
| Boulder, CO 80305-3328
| Email: ah*****@noaa.gov
| Phone: 303-497-7238
| Fax: 303-497-7259
| The difference between art and science is that science is what we
understand | well enough to explain to a computer. Art is everything else.
| — Donald Knuth, "Discover"
| ~ > /bin/sh -c ‘for lang in ruby perl; do $lang -e "print
\"\x3a\x2d\x29\x0a\""; done’ ====================================

A(Answer):

Got it! I just have to replace the (+) sign with (*) for blank or any
string.

Next, how do I insert those values into MySQL database, assuming I have
those table defined. Thanks.

"Useko Netsumi" <us*****@nyc.rr.com> wrote in message
news:bm************@ID-159205.news.uni-berlin.de…

this script failed if any of the cell is blank/no-value,
e.g:

CODE#1^DESCRIPTION^CODE#2^NOTES
NN-110^^^some notes
NN-111^1st line data
2nd line data
3rd line data^BRT345^another notes
NN-112^description of NN-112^BBC23^multiline
notes blah
blah
blah
NN-113^info info^MNO12^some notes here
"Ara.T.Howard" <ah*****@fsl.noaa.gov> wrote in message
news:Pi*******************************@eli.fsl.noa a.gov…

On Tue, 7 Oct 2003, Artco News wrote:

I thought I ask the scripting guru about the following.

I have a file containing records of data with the following
format(first column is the label):

CODE#1^DESCRIPTION^CODE#2^NOTES
NN-110^an info of NN-001^BRY234^some notes
NN-111^1st line data
2nd line data
3rd line data^BRT345^another notes
NN-112^description of NN-112^BBC23^multiline
notes blah
blah
blah
NN-113^info info^MNO12^some notes here

How do I parse so I can insert them in the database, e.g.
MySQL/Access?
Perhaps there are an advanced scripting language can do this easily.

ruby is one of the more advanced 🙂

~/eg/ruby > cat ./parse.rb

#!/usr/bin/env ruby

txt = <<-txt
CODE#1^DESCRIPTION^CODE#2^NOTES
NN-110^an info of NN-001^BRY234^some notes
NN-111^1st line data
2nd line data
3rd line data^BRT345^another notes
NN-112^description of NN-112^BBC23^multiline
notes blah
blah
blah
NN-113^info info^MNO12^some notes here
txt
pat = %r{([^^]+)\^([^^]+)\^([^^]+)\^([^^]+)\n}mox
tuples = txt.scan pat

tuples.map{|tuple| p tuple}
~/eg/ruby > ./parse.rb

[" CODE#1", "DESCRIPTION", "CODE#2", "NOTES"]
[" NN-110", "an info of NN-001", "BRY234", "some notes"]
[" NN-111", "1st line data\n 2nd line data\n 3rd line data",

"BRT345", "another notes"]

[" NN-112", "description of NN-112", "BBC23", "multiline\n notes

blah\n blah\n blah"]

[" NN-113", "info info", "MNO12", "some notes here"]

-a
====================================
| Ara Howard
| NOAA Forecast Systems Laboratory
| Information and Technology Services
| Data Systems Group
| R/FST 325 Broadway
| Boulder, CO 80305-3328
| Email: ah*****@noaa.gov
| Phone: 303-497-7238
| Fax: 303-497-7259
| The difference between art and science is that science is what we

understand

| well enough to explain to a computer. Art is everything else.
| — Donald Knuth, "Discover"
| ~ > /bin/sh -c ‘for lang in ruby perl; do $lang -e "print

\"\x3a\x2d\x29\x0a\""; done’

====================================

A(Answer):

"Artco News" <ar*******@verizon.net> schrieb im Newsbeitrag
news:CT*******************@nwrdny01.gnilink.net…

I thought I ask the scripting guru about the following.

I have a file containing records of data with the following format(first
column is the label):

CODE#1^DESCRIPTION^CODE#2^NOTES
NN-110^an info of NN-001^BRY234^some notes
NN-111^1st line data
2nd line data
3rd line data^BRT345^another notes
NN-112^description of NN-112^BBC23^multiline
notes blah
blah
blah
NN-113^info info^MNO12^some notes here

How do I parse so I can insert them in the database, e.g. MySQL/Access?

Perhaps there are an advanced scripting language can do this easily.

Ruby:

#!/usr/bin/ruby

def process(rec)
while rec.size > 4
dbRec = rec.slice!( 0..3 )
# db insertion here
p dbRec
end
end

rec = []

while ( line = gets )
line.chomp!
rec.concat( line.split(‘^’) )
process rec
end

process rec

A(Answer):

On Wed, 8 Oct 2003, Useko Netsumi wrote:

Got it! I just have to replace the (+) sign with (*) for blank or any
string.

Next, how do I insert those values into MySQL database, assuming I have
those table defined. Thanks.

file: parse.rb
—-CUT—-
#!/usr/bin/env ruby
require ‘mysql’

# command line args
host, user, passwd, db, relation = ARGV
db ||= ‘test’
relation ||= ‘test’

# connect to db
mysql = Mysql.connect host, user, passwd
mysql.select_db db

# parse
txt = DATA.read
pat = %r{([^^]*)\^([^^]*)\^([^^]*)\^([^^]*)\n}mox
tuples = txt.scan pat

# insert tuples
sql = "insert into %s values(‘%s’,’%s’,’%s’,’%s’)"
tuples.each do |tuple|
begin
insert = sql % [relation, *tuple]
mysql.query insert
rescue Exception => e
p e
end
end

# show results
res = mysql.query(‘select * from %s’ % [relation])
while((row = res.fetch_row))
p row
end
# sample input is embedded below – can be read via DATA object
__END__
CODE#1^DESCRIPTION^CODE#2^NOTES
NN-110^an info of NN-001^BRY234^some notes
NN-111^1st line data
2nd line data
3rd line data^BRT345^another notes
NN-112^description of NN-112^BBC23^multiline
notes blah
blah
blah
NN-113^info info^MNO12^some notes here
—-CUT—-

running it looks like:

~/eg/ruby > ./parse.rb
["CODE#1", "DESCRIPTION", "CODE#2", "NOTES"]
["NN-110", "an info of NN-001", "BRY234", "some notes"]
["NN-111", "1st line data\n2nd line data\n3rd line data", "BRT345", "another notes"]
["NN-112", "description of NN-112", "BBC23", "multiline\nnotes blah\nblah\nblah"]
["NN-113", "info info", "MNO12", "some notes here"]
i created a database named ‘test’, and a table named ‘test’ using ‘create
table test(f0 text,f1 text,f2 text,f3 text)’

hope that gets you going.

-a

"Useko Netsumi" <us*****@nyc.rr.com> wrote in message
news:bm************@ID-159205.news.uni-berlin.de…

this script failed if any of the cell is blank/no-value,
e.g:

CODE#1^DESCRIPTION^CODE#2^NOTES
NN-110^^^some notes
NN-111^1st line data
2nd line data
3rd line data^BRT345^another notes
NN-112^description of NN-112^BBC23^multiline
notes blah
blah
blah
NN-113^info info^MNO12^some notes here
"Ara.T.Howard" <ah*****@fsl.noaa.gov> wrote in message
news:Pi*******************************@eli.fsl.noa a.gov…

On Tue, 7 Oct 2003, Artco News wrote:

> I thought I ask the scripting guru about the following.
>
> I have a file containing records of data with the following format(first > column is the label):
>
> CODE#1^DESCRIPTION^CODE#2^NOTES
> NN-110^an info of NN-001^BRY234^some notes
> NN-111^1st line data
> 2nd line data
> 3rd line data^BRT345^another notes
> NN-112^description of NN-112^BBC23^multiline
> notes blah
> blah
> blah
> NN-113^info info^MNO12^some notes here
>
> How do I parse so I can insert them in the database, e.g. MySQL/Access? >
> Perhaps there are an advanced scripting language can do this easily.

ruby is one of the more advanced 🙂

~/eg/ruby > cat ./parse.rb

#!/usr/bin/env ruby

txt = <<-txt
CODE#1^DESCRIPTION^CODE#2^NOTES
NN-110^an info of NN-001^BRY234^some notes
NN-111^1st line data
2nd line data
3rd line data^BRT345^another notes
NN-112^description of NN-112^BBC23^multiline
notes blah
blah
blah
NN-113^info info^MNO12^some notes here
txt
pat = %r{([^^]+)\^([^^]+)\^([^^]+)\^([^^]+)\n}mox
tuples = txt.scan pat

tuples.map{|tuple| p tuple}
~/eg/ruby > ./parse.rb

[" CODE#1", "DESCRIPTION", "CODE#2", "NOTES"]
[" NN-110", "an info of NN-001", "BRY234", "some notes"]
[" NN-111", "1st line data\n 2nd line data\n 3rd line data",

"BRT345", "another notes"]

[" NN-112", "description of NN-112", "BBC23", "multiline\n notes

blah\n blah\n blah"]

[" NN-113", "info info", "MNO12", "some notes here"]

-a
====================================
| Ara Howard
| NOAA Forecast Systems Laboratory
| Information and Technology Services
| Data Systems Group
| R/FST 325 Broadway
| Boulder, CO 80305-3328
| Email: ah*****@noaa.gov
| Phone: 303-497-7238
| Fax: 303-497-7259
| The difference between art and science is that science is what we

understand

| well enough to explain to a computer. Art is everything else.
| — Donald Knuth, "Discover"
| ~ > /bin/sh -c ‘for lang in ruby perl; do $lang -e "print

\"\x3a\x2d\x29\x0a\""; done’

====================================

====================================
| Ara Howard
| NOAA Forecast Systems Laboratory
| Information and Technology Services
| Data Systems Group
| R/FST 325 Broadway
| Boulder, CO 80305-3328
| Email: ar**********@noaa.gov
| Phone: 303-497-7238
| Fax: 303-497-7259
| The difference between art and science is that science is what we understand
| well enough to explain to a computer. Art is everything else.
| — Donald Knuth, "Discover"
| ~ > /bin/sh -c ‘for lang in ruby perl; do $lang -e "print \"\x3a\x2d\x29\x0a\""; done’
====================================

LEAVE A COMMENT