Joined: 20 Dec 2010
Location: Berlin, Germany
|Posted: Wed Dec 22, 2010 3:26 pm Post subject: a common file renaming task
|One very common file renaming task is still waiting for a software tool that can solve it. At the moment, the user is the tool.
I think PFrank is the most advanced tool for helping the user with this task.
If you download many scientific or other important publications from publishers, institutions, agencies or governments, you get files with useless names.
This is one of the current issues of the Official Journal of the European Union
Nobody likes regulations, but you have to know the text of the regulation. If you download one of the PDF files, you get file names like this
These file names follow clear naming rules. Unfortunately, those rules are useless outside the European institutions. Only some of the digits are useful.
The original file name includes the date of publication 201012-22, but you need three search-and-replace operations with regular expressions to keep the information you need.
Where is the document title? Of course at the top of the first page of the document, and on the website to the left of the PDF symbol. And since 2010 you can find the document title in the corresponding field of the file's metadata.
Therefore your plugin
Insert Title MetaData From PDF files
should not be just a plugin; it should be part of the standard configuration of PFrank. It is very useful and necessary.
Then the user could integrate the document title into the file name. As you can see on the website of the Official Journal, the document titles are always too long.
Not really a problem for your software: four, sometimes five search-and-replace operations with regular expressions, and the document title is short enough to understand the content of the file at a glance.
This is an easy renaming task.
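To make the idea concrete, here is a minimal Python sketch of such a chain of search-and-replace passes. The rules and the `shorten_title` helper are made up for illustration; they are not PFrank settings, and a real rule set would have to be tuned to the titles of each website.

```python
import re

# Hypothetical rule set: each (pattern, replacement) pair is one
# search-and-replace pass, applied in order.
RULES = [
    (r"\s+", " "),                                # collapse runs of whitespace
    (r"^Commission Regulation \(EU\) ", "Reg "),  # shorten a boilerplate prefix
    (r" of \d{1,2} \w+ \d{4}", ""),               # drop a spelled-out date
    (r'[\\/:*?"<>|]', "-"),                       # characters Windows forbids in names
]

def shorten_title(title, rules=RULES, max_len=80):
    """Apply every rule in turn, then trim the result to max_len characters."""
    for pattern, replacement in rules:
        title = re.sub(pattern, replacement, title)
    title = title.strip()
    if len(title) > max_len:
        title = title[:max_len].rstrip()
    return title

# Invented sample title, only to show the call:
print(shorten_title(
    "Commission Regulation (EU) No 1234/2010 of 21 December 2010  amending Annex I"))
```

The order of the rules matters: the forbidden-character pass runs last, so the earlier patterns can still match the slashes and parentheses of the original title.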
Here is a current regulation of the German parliament
The file name of this regulation also follows a naming rule. The information in it is useful and should be part of the final file name. Unfortunately, the metadata is empty. You have to copy the document title and the date of publication from the website to the clipboard.
The task for the renamer is then to integrate name parts from the clipboard into the final file name.
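As a rough sketch of that step, assuming the title and the date have already been read from the clipboard as plain strings (the clipboard access itself is left out), the final name could be assembled like this. The `build_name` helper, the DD/MM/YYYY date format and the output layout are assumptions for illustration, not something PFrank provides.

```python
import os
import re

def build_name(original_name, title, date_text):
    """Combine a pasted date (DD/MM/YYYY), a pasted title and the original stem."""
    stem, ext = os.path.splitext(original_name)
    match = re.fullmatch(r"\s*(\d{1,2})/(\d{1,2})/(\d{4})\s*", date_text)
    if not match:
        raise ValueError("expected a date like 17/12/2010, got %r" % date_text)
    day, month, year = (int(group) for group in match.groups())
    iso_date = "%04d-%02d-%02d" % (year, month, day)  # sorts chronologically
    clean_title = re.sub(r'[\\/:*?"<>|]', "-", title)  # characters Windows forbids
    clean_title = re.sub(r"\s+", " ", clean_title).strip()
    # Keep the original stem in brackets so the useful part of it is not lost.
    return "%s %s (%s)%s" % (iso_date, clean_title, stem, ext)

# "doc123.pdf" is an invented file name, only to show the call:
print(build_name("doc123.pdf", "Some document title", "17/12/2010"))
```

Putting the date first in year-month-day order is just one possible naming rule; it has the side effect that the files sort by publication date.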
Here is a current publication of the European Medicines Agency
You see a PDF link with a really good link title, which is identical to the document title and should become part of the file name. The date of publication, 17/12/2010, is shown next to it. If you download the PDF, you get
as the file name. Such a file name is perhaps useful inside the agency. The users of these files need file names they can understand at a glance.
There is a big demand for a file renamer that can handle the task of renaming files during the download. Renaming files manually costs time and is a task nobody likes.
Most of this manual renaming can be automated.
PFrank is the most advanced tool to handle this task.
PFrank is like a washing machine:
The original file names are the dirty clothes.
The text snippets copied to the clipboard are the washing powder.
The sets of search-and-replace operations with regular expressions are the washing programs, one designed for every popular website with downloadable files
(of course, the washing programs depend on the naming rules for the final file names).
And finally, the clean clothes are the final file names.
Thanks for your attention.
Joined: 09 Mar 2007
|Posted: Sun Jan 30, 2011 6:15 pm Post subject:
|Sorry I have taken so long to respond.
I started to research adding renaming of pdf files using pdf meta data but I got side-tracked and I forgot to send a reply.
There is a Python module that can read PDF metadata. It is called pyPdf.
I made a test program to try reading some of the files from the links you included and got the following:
test1.pdf title is: 'DUMMY'.
test1.pdf author is: 'Publications Office'.
test1.pdf subject is: ' '.
test1.pdf creator is: 'Arbortext Advanced Print Publisher 9.1.500/W Unicode'.
test1.pdf producer is: '3-Heights(TM) PDF Producer 188.8.131.52 (http://www.pdf-tools.com)'.
test1.pdf date created is: 'D:20101221124613+01'00''.
test1.pdf date modified is: 'D:20101221125042+01'00''.
test1.pdf category is: 'None'.
test2.pdf title is: 'Xeplion - SMOP'.
test2.pdf author is: 'Administrator'.
test2.pdf subject is: 'None'.
test2.pdf creator is: 'Acrobat PDFMaker 8.1 for Word'.
test2.pdf producer is: 'Acrobat Distiller 8.1.0 (Windows)'.
test2.pdf date created is: 'D:20101216143006Z'.
test2.pdf date modified is: 'D:20101216143010Z'.
test2.pdf category is: 'None'.
The test1 file is from the Official Journal of the European Union site. Notice that it doesn't have a real title in the metadata, only the placeholder 'DUMMY'.
The test2 file is from the European Medicines Agency site and it has a proper title.
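Whatever reads the metadata, the raw values in the listing above would still need some cleaning before they can go into a file name. Below is a small Python sketch using only the standard library. The helper names and the list of placeholder titles are assumptions for illustration; they are not part of pyPdf or PFrank.

```python
import re
from datetime import date

# Assumed set of values that should not be treated as a real title.
PLACEHOLDER_TITLES = {"", "none", "dummy", "untitled"}

def usable_title(title):
    """Return the stripped title, or None if it is empty or a known placeholder."""
    if title is None:
        return None
    cleaned = title.strip()
    return None if cleaned.lower() in PLACEHOLDER_TITLES else cleaned

def pdf_date_to_iso(raw):
    """Turn a PDF date string such as D:20101216143006Z into '2010-12-16'.

    Only the leading year, month and day are used; time and time zone are
    ignored. Strings without a full YYYYMMDD part give None.
    """
    match = re.match(r"(?:D:)?(\d{4})(\d{2})(\d{2})", raw or "")
    if not match:
        return None
    year, month, day = (int(group) for group in match.groups())
    return date(year, month, day).isoformat()

print(usable_title("DUMMY"), usable_title("Xeplion - SMOP"))
print(pdf_date_to_iso("D:20101216143006Z"))
```

A renamer could fall back to the clipboard or to the original file name whenever `usable_title` returns None.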
I think what you propose is a good idea. Unfortunately I do not have time to add this capability to PFrank ... at least not in the foreseeable future, as family and work life leave no time for anything else.
The pyPdf module can be converted to a plugin ... again I do not have time to do this. If you have any programming experience, perhaps you could try it or maybe one of the other PFrank forum members would like to take this on.