Discussion:
[rt.cpan.org #106142] [Patch] Preload dependencies for PDL and PDL::NiceSlice
Roderich Schupp via RT
2015-07-29 15:23:40 UTC
Wed Jul 29 11:23:33 2015: Request 106142 was acted upon.
Transaction: Correspondence added by RSCHUPP
Queue: Module-ScanDeps
Subject: [Patch] Preload dependencies for PDL and PDL::NiceSlice
Broken in: (no value)
Severity: (no value)
Owner: Nobody
Requestors: ***@cpan.org
Status: new
Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=106142 >
use PDL;
use PDL::NiceSlice;
my $x = pdl [[2,3,4],[1,2,3]];
print $x(1,);
print $x;
I agree on the %Preload rule for PDL/NiceSlice.pm, it can be made more robust as

'PDL/NiceSlice.pm' => 'sub',

but the rule for PDL.pm needs further investigation. Somehow utf8_heavy.pl is
needed... Maybe this was caused by rev 1501 in https://www.openfoundry.org/svn/par
when I removed the scan rule that says

Foo::Bar::quux(...)

or

Foo::Bar->quux(...)

implies we should add a dependency on Foo::Bar.
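A minimal sketch of what such a scan rule amounts to. The helper name guess_modules and both regexes are illustrative assumptions, not the code that was removed from Module::ScanDeps:

```perl
use strict;
use warnings;

# Hypothetical helper (not Module::ScanDeps code): guess package names from
# class-method calls (Foo::Bar->quux) and fully qualified sub calls
# (Foo::Bar::quux(...)) found in one line of source.
sub guess_modules {
    my ($line) = @_;
    my %seen;

    # Foo::Bar->quux : a bareword not preceded by a sigil, an arrow,
    # a colon or another word character.
    $seen{$1}++
        while $line =~ /(?<![\$\@%&>\w:])([A-Za-z_]\w*(?:::\w+)*)\s*->/g;

    # Foo::Bar::quux( : everything before the last "::" is the package.
    $seen{$1}++
        while $line =~ /(?<![\$\@%&>\w:])([A-Za-z_]\w*(?:::\w+)*)::\w+\s*\(/g;

    # Barewords that are commonly followed by "->" but are not packages.
    delete @seen{qw(shift __PACKAGE__)};

    return sort keys %seen;
}

print join(', ', guess_modules('Foo::Bar->quux(1); Baz::helper(2);')), "\n";
```

A rule like this is only a heuristic: it treats any bareword in front of "->" as a package name and cannot tell whether that package is already loaded by some other file.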

Cheers, Roderich
Shawn Laffan via RT
2015-07-29 21:50:59 UTC
Wed Jul 29 17:50:53 2015: Request 106142 was acted upon.
Transaction: Correspondence added by SLAFFAN
Status: open


Thanks. The PDL::NiceSlice approach is much simpler.

If I can get a chance I'll look into the PDL code to see what sort of calls are used.

Regards,
Shawn.
Shawn Laffan via RT
2015-08-02 04:48:43 UTC
Sun Aug 02 00:48:42 2015: Request 106142 was acted upon.
Transaction: Correspondence added by SLAFFAN
It looks like the change in rev 1501 is not the cause. I modified the regexp to use a negative lookbehind to avoid false positives on $foo->bar->baz, and added it into scan_chunk, but utf8_heavy.pl was not packed. I did not check for Foo::Bar::baz(), though.

return $1 if (/(?<!\W) \b (\w+(?:::\w+)*) \s* (?:->)/x and $1 ne 'Tk' and $1 ne 'shift' and $1 ne '__PACKAGE__');

A bit of further searching through the PDL code indicated File::Map was a pinch-point, so I added some more Preload rules as a check (see below). These seem to work, but need further investigation since they are not generalised.

File::Map is loaded in PDL::Core, PDL::IO::FlexRaw and PDL::IO::FastRaw using a string eval. From PDL::Core:

eval 'use File::Map 0.47 qw(:all)';


As a quick experiment, I added the following preload rules to Module::ScanDeps. The explicit listing of unicore/Heavy.pl is needed because it is not found through the utf8.pm preload sub. I am sure there is a better way, but maybe this helps locate the source of the problem?


'File/Map.pm' => ['utf8.pm', 'unicore/Heavy.pl'],
'PDL' => ['PDL/Core.pm'],
'PDL/Core.pm' => ['File/Map.pm'],


When I then run pp on the script below, explicitly loading File::Map, the packed executable works.

use PDL;
use File::Map;
my $x = pdl [[2,3,4],[1, 2, 3]];
print $x;


However, if I comment out the 'use File::Map' line it does not pack utf8_heavy.pl, so the preload rules above are clearly wrong.

Instrumenting _glob_in_inc to print to stdout whenever unico[rd]e is passed as the subdir argument has no effect, so I assume the utf8.pm preload sub is not being run for the above preload rules.


Hopefully the above is helpful.

Regards,
Shawn.
Roderich Schupp via RT
2015-08-02 22:02:36 UTC
Sun Aug 02 18:02:33 2015: Request 106142 was acted upon.
Transaction: Correspondence added by RSCHUPP
Post by Shawn Laffan via RT
Instrumenting _glob_in_inc to print to stdout whenever unico[rd]e is
passed as the subdir argument has no effect, so I assume the utf8.pm
preload sub is not being run for the above preload rules.
Thanks for investigating. I tried to figure out at what point utf8_heavy.pl
comes into play. For that I prepended this to your sample script

BEGIN
{
    # insert spy CODE into require's module lookup
    unshift @INC, sub
    {
        my ($self, $pm) = @_;
        print STDERR "# require $pm\n";
        my ($package, $filename, $line) = caller;
        print STDERR "# from $package ($filename:$line)\n";
        return; # i.e. take a pass
    };
}

This intercepts any (explicit or implicit) "require", prints out what is required
and from where and then resumes "normal" processing. Here's the output

# require PDL.pm
# from main (/home/roderich/todo/PAR/Module-ScanDeps/shawn.pl:15)
# require PDL/Core.pm
# from main ((eval 1):6)
# require PDL/Types.pm
# from PDL::Core (/usr/lib/x86_64-linux-gnu/perl5/5.22/PDL/Core.pm:223)
# require Carp.pm
# from PDL::Types (/usr/lib/x86_64-linux-gnu/perl5/5.22/PDL/Types.pm:6)
# require strict.pm
# from Carp (/usr/share/perl/5.22/Carp.pm:4)
# require warnings.pm
# from Carp (/usr/share/perl/5.22/Carp.pm:5)
# require Exporter.pm
# from Carp (/usr/share/perl/5.22/Carp.pm:99)
# require overload.pm
# from PDL::Type (/usr/lib/x86_64-linux-gnu/perl5/5.22/PDL/Types.pm:428)
# require overloading.pm
# from overload (/usr/share/perl/5.22/overload.pm:83)
# require warnings/register.pm
# from overload (/usr/share/perl/5.22/overload.pm:144)
# require Exporter/Heavy.pm
# from Exporter (/usr/share/perl/5.22/Exporter.pm:16)
# require PDL/Exporter.pm
# from PDL::Core (Basic/Core/Core.pm.PL (i.e. PDL::Core.pm):314)
# require DynaLoader.pm
# from PDL::Core (Basic/Core/Core.pm.PL (i.e. PDL::Core.pm):315)
# require Config.pm
# from DynaLoader (/usr/lib/x86_64-linux-gnu/perl/5.22/DynaLoader.pm:21)
# require vars.pm
# from Config (/usr/lib/x86_64-linux-gnu/perl/5.22/Config.pm:11)
# require Scalar/Util.pm
# from PDL::Core (Basic/Core/Core.pm.PL (i.e. PDL::Core.pm):1000)
# require List/Util.pm
# from Scalar::Util (/usr/lib/x86_64-linux-gnu/perl/5.22/Scalar/Util.pm:11)
# require XSLoader.pm
# from List::Util (/usr/lib/x86_64-linux-gnu/perl/5.22/List/Util.pm:21)
# require utf8.pm
# from PDL::Core (Basic/Core/Core.pm.PL (i.e. PDL::Core.pm):1028)
# require utf8_heavy.pl
# from utf8 (/usr/share/perl/5.22/utf8.pm:16)
# require re.pm
# from utf8 (/usr/share/perl/5.22/utf8_heavy.pl:4)
# require unicore/Heavy.pl
# from utf8 (/usr/share/perl/5.22/utf8_heavy.pl:185)
# require unicore/lib/Alpha/Y.pl
# require PDL/Options.pm
# from PDL::Core (Basic/Core/Core.pm.PL (i.e. PDL::Core.pm):3288)
# require Fcntl.pm
# from PDL::Core (Basic/Core/Core.pm.PL (i.e. PDL::Core.pm):4167)
...

utf8.pm and utf8_heavy.pl are actually loaded from PDL::Core.pm.
The funny "Basic/Core/Core.pm.PL (i.e. PDL::Core.pm)" is caused by the fact
that PDL/Core.pm is a generated file with some

# line 123 "Basic/Core/Core.pm.PL (i.e. PDL::Core.pm)"

lines in it. And the offending line is

if $value =~ /e\p{IsAlpha}/ or $value =~ /\p{IsAlpha}e/;

There's no explicit mention of utf8.pm here - the code uses a Unicode property
in a regular expression. utf8.pm (at least in Perl 5.22) doesn't do anything
except set up an AUTOLOAD sub that will require utf8_heavy.pl when run.
(If you check $utf8::AUTOLOAD when our @INC spy is called, its value is "utf8::SWASHNEW".)

So the whole utf8_heavy.pl + unico[dr]e shebang is triggered on demand whenever
some Unicode feature of Perl is requested, e.g. a Unicode property in a regex,
probably lots of others.
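The effect can be isolated from PDL with a reduced script along these lines. This is a sketch only: the variable names and the input value are made up for illustration, and what ends up in the list depends on the perl version. The trace above is from 5.22. Newer perls may resolve the property internally and record nothing.

```perl
use strict;
use warnings;

our @loaded;    # file names passed to require while the hook is active

BEGIN {
    # Installed at compile time so it also sees lookups made while the
    # regex below is compiled. Returning an empty list means "not mine",
    # so perl carries on with the remaining @INC entries.
    unshift @INC, sub {
        my ($self, $file) = @_;
        push @loaded, $file;
        return;
    };
}

my $value   = 'exp';    # hypothetical input: an "e" followed by a letter
my $matched = $value =~ /e\p{IsAlpha}/ ? 1 : 0;

print "matched: $matched\n";
print "# require $_\n" for @loaded;
```

Nothing in the script names utf8.pm, so any file listed was pulled in implicitly by the regex engine, which is exactly what a static scanner cannot see.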

I don't think it's feasible to try to detect this by static analysis.
Should we just add this stuff (at least 4 MB spread over more than 400 files)
to _every_ packed executable?

Cheers, Roderich
Shawn Laffan via RT
2015-08-03 00:04:00 UTC
Sun Aug 02 20:04:00 2015: Request 106142 was acted upon.
Transaction: Correspondence added by SLAFFAN
Thanks Roderich,

The size issue rears its head once more...

It would also be a Herculean task to get static scanning to detect all such cases (although maybe PPI could be leveraged if someone ever has the tuits - https://metacpan.org/pod/PPI::Token::Regexp ).

Perhaps another flag could be added to pp for the cases where the code does not explicitly call for unicode, but it is needed for a packed executable to work. pp --unicode?


I also now think that this is the root cause of an issue I've been working around for a while using the code below. I use the pp -x flag when building, and set an environment variable in my script before calling pp.

if ($ENV{BDV_PP_BUILDING}) {
    use 5.016;
    use feature 'unicode_strings';
    my $string = "sp_self_only() and \N{WHITE SMILING FACE}";
    $string =~ /\bsp_self_only\b/;
}

Given that, it should be possible to statically scan for the various permutations of /use feature 'unicode_/ to detect unicode_strings and unicode_eval. If someone is using those features in their code then they need the extra libraries.
https://metacpan.org/pod/feature#The-unicode_strings-feature
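A rough sketch of what such a per-line check could look like. The helper name wants_unicode_feature and the regex are assumptions for illustration, not existing Module::ScanDeps code:

```perl
use strict;
use warnings;

# Hypothetical helper: true if a source line enables one of the unicode_*
# features through an explicit "use feature" statement.
sub wants_unicode_feature {
    my ($line) = @_;
    return $line =~ /^\s*use\s+feature\s+[^;#]*\bunicode_(?:strings|eval)\b/
        ? 1
        : 0;
}

for my $line (
    q{use feature 'unicode_strings';},
    q{use feature qw(say unicode_eval);},
    q{use feature 'say';},
    q{# use feature 'unicode_strings';},
) {
    printf "%-40s %s\n", $line, wants_unicode_feature($line) ? 'unicode' : '-';
}
```

This only catches the features when they are named explicitly. A bundle such as use feature ':5.12' or a plain use v5.12 also turns on unicode_strings and would need its own rule.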

Such scanning would not detect multiline chunks, as per the documentation caveats. A "pp --unicode" style flag would still be needed in such cases.
https://metacpan.org/pod/Module::ScanDeps#CAVEATS


WRT the pp flag, maybe a more general approach would be something that parallels the feature pragma, e.g.
pp --feature=unicode_strings,unicode_eval
pp --feature=":5.12"


Regards,
Shawn.
Roderich Schupp via RT
2016-12-19 13:35:46 UTC
Mon Dec 19 08:35:35 2016: Request 106142 was acted upon.
Transaction: Correspondence added by RSCHUPP


Better late than never...

The %Preload rule for PDL::NiceSlice was added in Module::ScanDeps 1.20.
The --unicode option for pp was added in PAR::Packer 1.29.


Cheers, Roderich
