
Sri (03-02-20, 06:25 PM)
Hi experts,

Thanks for helping me out earlier with parsing an XML file. But I was not able to install the XML module successfully. So, I am trying it alternatively as follows:

The files have 1000+ entries of the following string (the number, such as 10 in the string below, changes in some lines)

maxCategories&lt;/key&gt;&lt;selectedValue&gt;&lt;string&gt;10&lt;

I need to count the embedded number in the above string (sort by the highest number on the top) and the output has to be in the following format.
=========

500 - 20 times
200 - 40 times
150 - 100 times
100 - 200 times
20 - 250 times
10 - 400 times

Here is the snippet I am writing, but I need your help in modifying to get the above output. Thanks in advance for your kind help.
+++++

while (<DATA>)
{
$max = "maxCategories";
$_ = ~s/$max.*?\(\d+)/$1/;
print;
}

__DATA__
blah blah;maxCategories&lt;/key&gt;&lt;selectedValue&gt;&lt;string&gt;10&lt; blah blah

blah blah;maxCategories&lt;/key&gt;&lt;selectedValue&gt;&lt;string&gt;100&lt; blah blah

blah blah;maxCategories&lt;/key&gt;&lt;selectedValue&gt;&lt;string&gt;500&lt; blah blah
Andreas Karrer (03-02-20, 07:35 PM)
* Sri <schimata>:
> Hi experts,
> Thanks for helping me out earlier with parsing an XML file. But I was not able to install the XML module successfully. So, I am trying it alternatively as follows:
> The files 1000+ entries of the following string (the number such as 10 in the below string changes in some lines)
> maxCategories&lt;/key&gt;&lt;selectedValue&gt;&lt;string&gt;10&lt;


Really with &lt; &gt; and not with < > ?

> I need to count the embedded number in the above string (sort by the highest number on the top) and the output has to be in the following format.
>=========
> 500 - 20 times
> 200 - 40 times
> 150 - 100 times
> 100 - 200 times
> 20 - 250 times
> 10 - 400 times


The typical perl idiom is to use the number as the key of an
associative array (here, %linecount) and increment the value each time you
encounter a line:

1: my %linecount;
2:
3: while (<DATA>) {
4:     if (m/maxCategories.*\&gt;(\d+)\&lt;/) {
5:         $linecount{$1}++;
6:     }
8: }
9:
10: for (sort { $b <=> $a } keys %linecount) {
11:     print "$_ - $linecount{$_} times\n";
12: }

Line 4 uses a regex to extract the number (10, 20, 100 etc) from the
line. What is matched in the first (here: the only) pair of parentheses
of the regex is available in $1 afterwards. You don't need a s///, a match
is enough.

Line 5 uses this number in $1 as the key into the associative array and
takes advantage of the fact that the undefined value evaluates to 0 in
a numerical context. It's short for the much more verbose, but probably
clearer:

if (defined $linecount{$1}) {
    $linecount{$1} = $linecount{$1} + 1;
} else {
    $linecount{$1} = 1;
}

Line 10: ``keys %linecount'' returns a list of the keys of the associative
array %linecount. ``sort { $b <=> $a }'' sorts this list in descending
numerical order.
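
On the &lt; versus < question: if it is not certain which form the files contain, the pattern can be written to accept both. A minimal sketch, assuming the value always sits directly inside the string element as in the sample; extract_value is a name made up for this illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: return the number that follows maxCategories,
# or undef if the line has none. Accepts both the entity-encoded
# (&lt; &gt;) and the literal (< >) form of the markup.
sub extract_value {
    my ($line) = @_;
    my $open  = qr/(?:&lt;|<)/;
    my $close = qr/(?:&gt;|>)/;
    return $line =~ /maxCategories.*?${open}string${close}(\d+)${open}/
        ? $1
        : undef;
}

print extract_value('x;maxCategories&lt;/key&gt;&lt;selectedValue&gt;&lt;string&gt;10&lt; y'), "\n";
print extract_value('maxCategories</key><selectedValue><string>500<'), "\n";
```

The returned value can be used as the hash key in place of $1 in the loop above.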
Henry Law (03-02-20, 07:36 PM)
On 02/03/2020 16:25, Sri wrote:
> Hi experts,
> But I was not able to install the XML module successfully.


Does that mean you didn't have the right /authority/ to install it?
That happens, and it's a pain to work round. But if you mean that the
installation failed then you should work on that, I think.

> maxCategories&lt;/key&gt;&lt;selectedValue&gt;&lt;string&gt;10&lt;
> I need to count the embedded number in the above string (sort by the highest number on the top) and the output has to be in the following format.
> =========
> 500 - 20 times


> Here is the snippet I am writing, but I need your help in modifying to get the above output. Thanks in advance for your kind help.
> +++++


This "snippet" is a long long way from any likelihood of producing the
output described above. For a start it doesn't compile, even if you
omit "use strict", which is never a good idea.

Unmatched ) in regex; marked by <-- HERE in m/maxCategories.*?\(\d+) <--
HERE / at ./sri.pl line 9, <DATA> line 1

Did you copy and paste your code direct? I'd guess not.

In the hope of being helpful I mended your regex (taking the escape
out before the left paren) and this happened:

henry@eris:~/wip$ ./sri.pl
1844674407370955161418446744073709551615184467440737095516141844674407370955161518446744073709551614

I can't even imagine how your three data lines produced that and I'm not
going to spend time trying to find out. It's probably something to do
with the fact that you've coded "= ~" instead of "=~".
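
For what it's worth, the space does seem to explain it. "$_ = ~s/.../.../" parses as an assignment of the bitwise complement (~) of the substitution's return value: s/// still runs on $_, returns 1 when it matched and a false value when it did not, and the complement of that then overwrites $_. On a perl with 64-bit integers ~1 is 18446744073709551614 and ~0 is 18446744073709551615, which are the two numbers repeated in the output above (the ones ending in 615 presumably come from the blank lines between the data lines). A small sketch; with_space and with_binding are names made up for this illustration:

```perl
use strict;
use warnings;

# What the posted line does: s/// runs on $_, then ~ complements its
# return value and the result is assigned to $_.
sub with_space {
    local $_ = shift;
    $_ = ~s/maxCategories.*?(\d+)/$1/;
    return $_;
}

# What was meant: =~ binds the substitution to $_, which is modified in place.
sub with_binding {
    local $_ = shift;
    $_ =~ s/maxCategories.*?(\d+)/$1/;
    return $_;
}

my $line = 'blah blah;maxCategories&lt;/key&gt;&lt;selectedValue&gt;&lt;string&gt;10&lt; blah blah';
print with_space($line), "\n";    # the complement of 1, not the line
print with_binding($line), "\n";  # the line with the match replaced by the number
```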

I think you need to do a bit more work and get something that more or
less does what you want and then maybe someone here will help you to
complete it.

Hint: use a hash to accumulate instances of your number, with something
like $counts{$the_number}++. Then print the contents of the hash at the
end of the program.
Rainer Weikusat (03-02-20, 09:24 PM)
Sri <schimata> writes:
[..]
> 100 - 200 times
> 20 - 250 times
> 10 - 400 times


You should really provide more information about (or have a clearer idea
of) your data format. Depending on what that looks like, the code below
may work for it. It works for the example.

---------
my %freqs;

/maxCat\D+(\d+)/ and ++$freqs{$1} while <DATA>;
print("$_ - $freqs{$_} times\n") for sort {$b <=> $a } keys %freqs;

__DATA__
blah blah;maxCategories&lt;/key&gt;&lt;selectedValue&gt;&lt;string&gt;10&lt; blah blah
blah blah;maxCategories&lt;/key&gt;&lt;selectedValue&gt;&lt;string&gt;100&lt; blah blah
blah blah;maxCategories&lt;/key&gt;&lt;selectedValue&gt;&lt;string&gt;500&lt; blah blah
blah blah;maxCategories&lt;/key&gt;&lt;selectedValue&gt;&lt;string&gt;500&lt; blah blah
blah blah;maxCategories&lt;/key&gt;&lt;selectedValue&gt;&lt;string&gt;500&lt; blah blah
blah blah;maxCategories&lt;/key&gt;&lt;selectedValue&gt;&lt;string&gt;10&lt; blah blah
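
For anyone who finds the two statement-modifier lines hard to read, they can be unrolled roughly as below. This is a sketch only: count_values and report are names made up here, and the pattern carries the same assumption as the original, namely that the first run of digits after "maxCat" on a line is the wanted value.

```perl
use strict;
use warnings;

# Count how often each number occurs. Takes a list of lines and
# returns a hash reference mapping number => count.
sub count_values {
    my %freqs;
    for my $line (@_) {
        # \D+ skips everything up to the first digit after "maxCat"
        if ($line =~ /maxCat\D+(\d+)/) {
            $freqs{$1}++;
        }
    }
    return \%freqs;
}

# One "N - M times" string per distinct number, highest number first.
sub report {
    my ($freqs) = @_;
    return map { "$_ - $freqs->{$_} times" }
           sort { $b <=> $a } keys %$freqs;
}

my @lines = (
    'blah blah;maxCategories&lt;/key&gt;&lt;selectedValue&gt;&lt;string&gt;10&lt; blah blah',
    'blah blah;maxCategories&lt;/key&gt;&lt;selectedValue&gt;&lt;string&gt;500&lt; blah blah',
    'blah blah;maxCategories&lt;/key&gt;&lt;selectedValue&gt;&lt;string&gt;500&lt; blah blah',
);
print "$_\n" for report(count_values(@lines));
```

To run it against real files instead of the inline list, the lines could be read with <> and passed to count_values.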
Henry Law (03-03-20, 12:55 AM)
On 02/03/2020 19:24, Rainer Weikusat wrote:
> my %freqs;
> /maxCat\D+(\d+)/ and ++$freqs{$1} while <DATA>;
> print("$_ - $freqs{$_} times\n") for sort {$b <=> $a } keys %freqs;


I knew we were in the presence of a master. My code would have been
twice as long.