Discussion:
PAR + tmpwatch = mess
Markus Jansen
2015-05-12 15:32:04 UTC
Hi,

On Linux, tmpwatch(8) is very often run from cron.daily to clean /tmp and /var/tmp periodically (on other systems, e.g. Solaris 10,
this is not the default). Whether atime, mtime, ctime, or a combination of these timestamps is used is a matter of local configuration.

Unfortunately, this does not remove a complete PAR deployment tree in one of these directories, but only single files from it.
As PAR restores the original mtime of most files after unpacking, functionality may "slowly deteriorate"; in the worst case, how quickly
depends on the mtimes the files had when the PAR executable was packed.
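
A minimal sketch of this effect, written in Python purely for illustration (it is neither tmpwatch nor PAR code). The function `clean_older_than`, the file names and the 10-day threshold are all made up; the sketch only assumes a cleaner that judges each file by its own mtime.

```python
import os
import tempfile
import time


def clean_older_than(root, max_age_seconds, now=None):
    """Hypothetical stand-in for an mtime-based cleaner: delete every
    regular file under root whose mtime is older than the cutoff."""
    now = time.time() if now is None else now
    removed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if now - os.stat(path).st_mtime > max_age_seconds:
                os.remove(path)
                removed.append(name)
    return sorted(removed)


def demo():
    day = 24 * 60 * 60
    now = time.time()
    with tempfile.TemporaryDirectory() as cache:
        # Two "extracted" files: one keeps an old packing-time mtime,
        # the other carries the time of extraction.
        ages = {"Old/Module.pm": 400 * day, "fresh.xml": 0}
        for rel, age in ages.items():
            path = os.path.join(cache, rel)
            os.makedirs(os.path.dirname(path), exist_ok=True)
            with open(path, "w") as fh:
                fh.write("x")
            os.utime(path, (now - age, now - age))
        removed = clean_older_than(cache, 10 * day, now=now)
        remaining = sorted(
            name for _d, _s, files in os.walk(cache) for name in files
        )
        return removed, remaining
```

Calling `demo()` removes only the file with the old restored mtime and leaves the rest of the tree in place, which is the partial-cache situation described above.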

While long-running servers could apply a number of countermeasures (such as using a different location, or periodically "refreshing" their PAR cache),
I fail to see an easy solution for "ordinary" client programs other than setting PAR_CLEAN, which slows startup down.

IMHO the cleanest solution would be to provide tmpwatch with some sort of "wipe all or nothing of a tree" marker.
A possible PAR approach would be to unpack all files without setting the mtime, except for a special "canary bird" file, which would be artificially aged
by a little more than a day. Removal of the "canary bird" would then invalidate the cache.

Any opinions are highly appreciated.

Best regards,
Markus



[Ericsson]<http://www.ericsson.com/>

MARKUS JANSEN Dipl.-Ing.
Aachen Engineering Hub ClearCase/Git Expert
ITTE Hub Services / CM Automation Components
EDD/IFT/E

Ericsson
Ericsson Allee 1
52134, Herzogenrath, Germany
Phone +49 2407 575 5157
Mobile +49 172 2742003
Exchange +49 2407 575 0
Fax +49 2407 575 14721
***@ericsson.com
www.ericsson.com

Legal entity: Ericsson GmbH, registered office in Düsseldorf, Germany, Trade Register: Amtsgericht Düsseldorf (HRB 33012). Managing Directors: Stefan Koetz (Chairman), Cecilia Wachtmeister, Bernd Mellinghaus. Supervisory Board: Valter D'Avino (Chairman). This Communication is Confidential. We only send and receive email on the basis of the terms set out at www.ericsson.com/email_disclaimer<http://www.ericsson.com/email_disclaimer>
Shawn Laffan
2015-05-12 22:00:22 UTC
Hello Markus,

This looks like RT ticket 101800.
https://rt.cpan.org/Public/Bug/Display.html?id=101800

I submitted a patch to implement the canary approach in that ticket, but
it has not yet had a response.

It could do with some review, so maybe try that patch on your system
and let me know whether it works or needs any modifications?

Regards,
Shawn.
--
Assoc Prof Shawn Laffan
School of Biological, Earth and Environmental Sciences
UNSW, Sydney 2052, Australia
Tel +61 2 9385 8093
http://www.bees.unsw.edu.au/staff/shawn-laffan
http://www.purl.org/biodiverse (free diversity analysis software)
http://www.tandf.co.uk/journals/ijgis

UNSW CRICOS Provider Code 00098G
Roderich Schupp
2015-05-17 15:40:49 UTC
Post by Shawn Laffan
It could do with some review,
Shawn,
sorry for not having looked at this earlier: your patch doesn't solve the
problem at all.
It adds a canary file, alright, but the real problem is that Archive::Zip
(method extractMember) extracts files with their original last modified
timestamp restored.
That way you will always have extracted files that are older than the
canary, and hence they will be removed by cleaning programs before these
ever catch up with the canary.

So first order of business would be to prevent Archive::Zip from doing that.
Unfortunately this behaviour is hard coded, so we must resort to resetting
the last modified time (to "now") ourselves _after_ files have been
extracted (at least in PAR::_extract_inc(), but there may be other callers
of Archive::Zip::extract* methods).

Also I don't see the need for PAR::Packer to include the canary file in the
.par archive.
Just create it after the initial extraction phase and set its last modified
time
to something like "24 hours ago".
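
The two steps above (reset the mtimes after extraction, then create an aged canary) could be sketched roughly as follows. This is a Python illustration under stated assumptions, not the PAR patch itself: the function `stamp_cache` and the canary file name `canary` are hypothetical, and only the "24 hours ago" figure comes from the message.

```python
import os
import time

CANARY_NAME = "canary"       # hypothetical file name
CANARY_AGE = 24 * 60 * 60    # "24 hours ago", as suggested above


def stamp_cache(cache_dir, now=None):
    """Set the mtime of every file under cache_dir to `now`, then create
    the canary (if needed) and age it by CANARY_AGE seconds."""
    now = time.time() if now is None else now
    for dirpath, _dirnames, filenames in os.walk(cache_dir):
        for name in filenames:
            path = os.path.join(dirpath, name)
            os.utime(path, (now, now))  # drop the restored archive mtime
    canary = os.path.join(cache_dir, CANARY_NAME)
    with open(canary, "a"):
        pass  # create an empty canary if it does not exist yet
    aged = now - CANARY_AGE
    os.utime(canary, (aged, aged))  # the canary becomes the oldest file
    return canary
```

With this ordering the canary is always older than every extracted file, so a cleaner that works by mtime would reach the canary first.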

Cheers, Roderich
Shawn Laffan
2015-05-18 22:49:35 UTC
OK. I'll change it to update the mtime after extraction, and create the
canary file at extraction time (which is simpler in any case).

I'll also separate out the other components of the patch as separate
issues (the meta.yml quoting and verbosity levels needed for -add-files).

Is there an alternate repo? svn.openfoundry.org has not been responding
for the last 24 hours, perhaps more. Otherwise I'll wait for it to come
back online to update my repo to the latest par sources before starting.

Shawn.
Roderich Schupp
2015-05-19 09:48:49 UTC
Post by Shawn Laffan
OK. I'll change it to update the mtime after extraction, and create the
canary file at extraction time (which is simpler in any case).
Note (this is mostly a reminder for myself, but other eyes are welcome):
There are places other than PAR::_extract_inc where files are written to
the cache area:

1. files written in stage 1 of the bootstrap process: the custom Perl
   interpreter (extracted with the same name as the packed executable to
   make $0 work), the shared Perl library (if your Perl is built with one,
   always the case on Windows) and all non-system DLLs needed by them
   (libstdc++ etc, the list keeps growing with every release of
   Strawberry Perl)
2. files written in stage 1 of the bootstrap process: essential Perl
   modules (basically anything transitively required by PAR and
   Archive::Zip); these are not in the appended zip and are extracted
   using mangled names
3. cache files for modules and DLLs contained in the zip; these are also
   extracted using mangled names

For each of the categories, one should

- either check if these are already automatically re-extracted when
  missing (might be the case for (3))
- or make sure that:
  - their last modified timestamp is the time of extraction (probably
    true for (1) and (2), since they're *not* extracted by Archive::Zip)
  - they're re-extracted when we find that the canary file is missing

Cheers, Roderich
Markus Jansen
2015-06-19 11:38:33 UTC
Hi,

Sorry for being quite busy and not contributing anything so far.
To ease both the “correction after unpacking” and the support for long-term server processes, it may be a good idea to provide a function which updates all timestamps in the program’s PAR cache
(including resetting the canary file to an older value).
I would not recommend the canary mode as a default, unless we have found that it is reasonable to do so.
Hope to be able to contribute some code during summer.

Best regards,
Markus

Roderich Schupp
2015-06-22 12:22:25 UTC
Post by Markus Jansen
I would not recommend the canary mode as a default, unless we have found
that it is reasonable to do so.
Why not? If a user is still invoking the packed executable once in a while,
the canary prevents them from ending up with an incomplete cache area.
If they don't invoke the packed executable anymore, the cache area will be
cleaned up eventually by tmpwatch or whatever, no harm done.

Sorry to pick on this: pp has too many options already.

Cheers, Roderich
Markus Jansen
2015-06-22 13:25:16 UTC
Hi,

Regarding the intended behaviour, that is, the canary file being the oldest file after unpacking, I don’t have any objections at all.
My point was simply that the unpacking modification might break some code.

Cheers,
Markus

Roderich Schupp
2015-06-22 13:43:56 UTC
Post by Markus Jansen
My point was simply that the unpacking modification might break some code.
Yes, let's break some code then :)

Cheers, Roderich
Philip Kime
2015-06-22 13:55:47 UTC
Please break some code - fixing this is the number one request from biber
users, as biber is packed with pp and this issue bites many Windows users
because some supporting XML files disappear from the cache after automatic
tmp cleanup ...

PK

--
Dr P Kime
Markus Jansen
2015-07-01 15:49:48 UTC
Hi,

Here's my first try ... please find the unified diffs for PAR/Heavy.pm and PAR.pm attached (diffed against the 1.010 Subversion trunk).

I have also implemented a function PAR::refresh_file_cache(), which allows long-term servers to cope with tmpwatch - if called at least once a day.
This reflects one of my use cases.
The mechanism should also be race condition proof, assuming a weekly cleanup in e.g. /tmp . All time values should be adjustable.

Otherwise - tests should exist, but probably mainly on the PAR-Packer side, and I have not touched that one yet.

Best regards,

Markus


Roderich Schupp
2015-05-20 06:31:56 UTC
Post by Shawn Laffan
svn.openfoundry.org has not been responding for the last 24 hours,
perhaps more. Otherwise I'll wait for it to come back online to update my
repo to the latest par sources before starting.
The Subversion services at OpenFoundry are back online; the URL has changed
to

https://www.openfoundry.org/svn/par/

(I'll update the module metadata for the next release).

Cheers, Roderich