From version 1.2.3, the switches
--doc2bib set the cb2Bib to work in console mode. The inexact nature of the extractions involved makes logging necessary. On Windows, graphic or console mode must be decided not at run time, but when the application is built. So far, logging and globbing were missing. This release adds the convenience wrapper
c2bconsole --txt2bib i*.txt out.bib, for instance, will work as it does on the other platforms.
Lists of references are now sorted in a case- and diacritic-insensitive way. For some languages such a choice is not the expected one, and some operating systems offer locale-aware collation. Due to the usual divergences and inaccuracies in references, this decision was taken to group together 'Density Matrix' with 'Density-matrix', and Møller with Moller, which, in a personal collection, most probably refer to the same concept and to the same person. Additionally, strings converted from document to text are now free of extraneous, non-textual symbols. Therefore, recreating cache files is recommended.
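The grouping described above can be illustrated with a small sort key. This is a minimal sketch in Python, not the cb2Bib implementation: the function name `sort_key` and the explicit letter table are assumptions made for illustration. Letters such as 'ø' have no Unicode decomposition, so they need an explicit mapping.

```python
import unicodedata

# Letters without a Unicode decomposition need an explicit mapping
# (this small table is an assumption, not taken from cb2Bib).
_EXTRA = str.maketrans({"ø": "o", "Ø": "O", "ł": "l", "Ł": "L", "đ": "d", "Đ": "D"})


def sort_key(text: str) -> str:
    """Return a case-, diacritic- and punctuation-insensitive key."""
    text = text.translate(_EXTRA)
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop combining marks (accents) left over after decomposition.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Treat punctuation such as hyphens as word separators.
    words = "".join(ch if ch.isalnum() else " " for ch in stripped).split()
    return " ".join(words).casefold()


titles = ["Møller", "density-matrix", "Moller", "Density Matrix"]
ordered = sorted(titles, key=sort_key)
```

With such a key, 'Density Matrix' and 'Density-matrix' compare as equal, and so do Møller and Moller, so they end up next to each other in the sorted list.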
Finally, this release introduces a new module, named
c2bciter, aimed at easing the insertion of citation IDs into documents. The module should ideally stay idle in the system tray, and be recalled as needed by pressing a global desktop shortcut. This functionality, while desirable and usual in dictionaries, is platform and desktop dependent. On KDE there are currently known issues when switching among virtual desktops.
For the latter reason, and for not knowing a priori how such a tool would be designed, the cb2Bib internals had been interlaced with its graphical interface. At the time of version 0.7.0, when the graphical libraries changed and a major refactoring was required, the code started moving toward better modularization and structure. The current release pushes code organization further. As a result, it adds two new command line switches:
The new cb2Bib module is named after the BibTeX key 'annote'. Annote is not meant for annotating a single reference, though. Instead, Annote is for short notes that interrelate several references. Annote takes a plain text note, with minimal or no markup, inserts the bibliographic citations, and converts it to an HTML page with links to the referenced documents.
From within the cb2Bib, to write your notes, type Alt+A, enter a filename, either new or existing, and once in Annote, type E to launch your default text editor. For help, type F1. Each time you save the document the viewer will be updated. To display mathematical notations, install jsMath locally. And, remember, code refactoring introduces bugs.
document to its bibliographic reference, in a handy way, by dragging the file to the main (at that time, single) panel. Now, in version 1.0.0, when a file is dropped, the cb2Bib scans the document for metadata packets, and checks, in a rather experimental way, whether or not they contain relevant bibliographic information.
Publishers' metadata might or might not be accurate. Some, for instance, assign the DOI to the key Title. The cb2Bib extracts possibly relevant key-value pairs and adds them to the clipboard panel. Whenever the key-value pairs are found to be accurate, just pressing Alt+G imports them to the line edits. If keys with the prefix
bibtex are found, then, most probably, the data was written by JabRef or the cb2Bib itself, and the values are automatically imported.
The preparsed metadata that is added to the clipboard panel begins with
[Bibliographic Metadata and ends with
/Bibliographic Metadata]. Therefore, if you are using PDFImport together with a set of regular expressions that contain the begin (^) or end ($) anchors, you can safely replace the anchors with the above tags. In this manner, existing regular expressions remain useful with this minor change, with the added advantage that, if recognition fails for a given document, metadata might provide the hardest fields to extract from a PDF article, which are author and title.
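The anchor replacement can be sketched as follows. This is a hedged illustration using Python's `re` module rather than the cb2Bib regular expression engine. The sample clipboard text, its key-value layout, and the assumption that the metadata block precedes the document text are all made up for the example.

```python
import re

BEGIN_TAG = "[Bibliographic Metadata"
END_TAG = "/Bibliographic Metadata]"

# Made-up clipboard text: a metadata block followed by the converted PDF text.
# The layout of the block is an assumption for illustration only.
clipboard = (
    "[Bibliographic Metadata\n"
    "Title: 10.1000/example\n"
    "/Bibliographic Metadata]\n"
    "A Title Here\n"
    "A. Author and B. Author\n"
)

# Pattern written for text that started directly with the title.
old_pattern = re.compile(r"^(?P<title>.+)\n(?P<author>.+)\n")

# Same pattern with the begin anchor replaced by the closing metadata tag.
new_pattern = re.compile(re.escape(END_TAG) + r"\n(?P<title>.+)\n(?P<author>.+)\n")

old_match = old_pattern.search(clipboard)
new_match = new_pattern.search(clipboard)
```

In this sketch the anchored pattern captures the metadata lines instead of the title, while the pattern that starts at the closing tag captures 'A Title Here' and the author line as before.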
--conf [full_path]cb2bib.conf to specify the settings location. This feature was intended, mainly, as a clean way to run the program on a host computer from a removable drive. The work done focused on arranging the command line and settings-related code. Solving some requirements regarding the management of file pathnames and temporary files was left for a later release.
This release addresses these two points. Now, when the cb2Bib is launched as
cb2bib --conf (without a configuration filename), it treats filenames as being relative to the actual location of the cb2Bib. Temporary files, if needed, will be placed at this location as well. Therefore, no data is written on the host, and the cb2Bib works independently of the actual address that the host assigns to the removable drive.
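The idea of treating filenames as relative to the program location can be sketched in a few lines. This is only an illustration of the principle, not the cb2Bib code: the helper name `resolve_portable` and the example paths are hypothetical.

```python
from pathlib import PurePosixPath


def resolve_portable(app_dir: str, filename: str) -> PurePosixPath:
    """Resolve a stored filename against the directory holding the program.

    Absolute filenames are returned unchanged; relative ones are joined to
    app_dir, so they stay valid whatever mount point the host assigns.
    """
    path = PurePosixPath(filename)
    if path.is_absolute():
        return path
    return PurePosixPath(app_dir) / path


# Hypothetical mount point and stored filename.
resolved = resolve_portable("/media/usb/cb2bib", "refs/papers.bib")
```

Because only the relative part is stored, the same settings resolve correctly if the drive is later mounted elsewhere.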
The Windows installer and uninstaller set and clean configuration data in the registry. With this in mind, it might be better not to install the program directly to the USB drive. Just copy the cb2Bib base directory from a home/own computer to the removable drive, and then run it on the host computer as
cb2bib tmp_ref permits importing references from the browser, whenever a 'download to reference manager' choice is available. In addition, the command
cb2bib --bibedit ref.bib directly launches the BibTeX editor for file browsing and editing.
This release adds the command line option
--conf [full_path]cb2bib.conf to specifically set a file where all internal settings are retrieved and stored. This has two interesting applications. On the one hand, it easily permits switching among several sets of extraction rules, since the files
netqinf.txt are all stored in the cb2Bib's settings. On the other hand, it allows installing the program on a USB flash drive, and cleanly running it on any (e.g., library) computer. Settings can be stored and kept on the external device, and therefore no data will be written to the registry or settings directory of the host computer.
So far, however, this feature should be regarded as experimental. The Qt library to which the cb2Bib is linked performs read/write access to system settings in a few places (concretely, in file and color dialogs). On Unix and Mac OS systems this access can be modified by setting the environment variable XDG_CONFIG_HOME. No such workaround is presently available on Windows.
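On Unix, the settings lookup can be redirected through the XDG_CONFIG_HOME environment variable. The following is a minimal sketch of preparing such an environment before launching the program. The helper name, the drive path, and the 'config' subdirectory are hypothetical, and the launch line is left commented out.

```python
import os
import posixpath


def portable_environment(drive_dir, base_env=None):
    """Return a copy of the environment with XDG_CONFIG_HOME on the drive."""
    env = dict(os.environ if base_env is None else base_env)
    # The 'config' subdirectory is an arbitrary choice for this sketch.
    env["XDG_CONFIG_HOME"] = posixpath.join(drive_dir, "config")
    return env


env = portable_environment("/media/usb/cb2bib", base_env={"PATH": "/usr/bin"})

# Launching would then look roughly like this (not executed here):
# import subprocess
# subprocess.run(["cb2bib", "--conf", "/media/usb/cb2bib/cb2bib.conf"], env=env)
```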
See The cb2Bib Command Line for a detailed syntax description.
~/.config/MOLspaces/cb2Bib.conf. This file can be removed, or renamed. On Windows, it is recommended to uninstall previous versions before upgrading.
Second, cb2Bib tags are not shown by default. Instead, plain, raw clipboard data is shown, as it is easier to identify with the original source. To write a regular expression, right click, check 'View Tagged Clipboard Data' on the menu, and perform the extraction from this view.
And finally, the cb2Bib adds the tag <<excerpt>> for network queries. It takes a simplified version of the clipboard contents and sends it to, e.g., Google Scholar. From there, one can easily import BibTeX references related to those contents. Therefore, in most cases one should uncheck the 'Perform Network Queries after automatic reference extractions' box.
To ease pattern writing, cb2Bib preprocesses the raw input data. This can involve format conversion by external tools and general substitutions, in addition to the inclusion of some special tags. The resulting preprocessed data is usually less readable. A particularly illustrative case is when input data comes from a PDF article.
The cb2Bib now optionally presents input data as raw, unprocessed data. This preserves the block text format of the source, and thus identifying the relevant bibliographic fields by visual inspection is more straightforward. In this raw mode view panel, interaction works in a similar manner, except that no conversions or substitutions are seen there, and no regular expression tags are written.
All known regressions in the 0.6.9x series have been fixed. Also, a few minor improvements have been included. In particular, file selection dialogs display navigation history, and the BibTeX output file can be conveniently selected from the list of '*.bib' files in the current directory. Such a feature will be especially useful to users who sort references into thematic files located in a given directory.
network capabilities. Networking, and hence querying, was erratic, both for the internal HTTP routines and for external clients. In addition to this fix, the
netqinf.txt has been updated. PubMed is working again. Queries are also extended to include DOIs. A possible application is indexing a set of PDF articles with PDFImport. If the article contains its DOI number, and 'Perform Network Queries after automatic reference extractions' is checked, chances are that automatic extractions will work smoothly.
Upgrading to Qt4 is not a "plug and recompile" game. Thorough refactoring and rewriting were required. The resulting cb2Bib code is cleaner and more suitable for further development. As one might expect, major upgrades introduce new bugs that must be fixed. The cb2Bib 0.6.90 is actually a preview version. It has approximately the same functionality as its predecessor, so no additions were considered at this point. Its use, bug reporting, and feedback are encouraged. This will help to reach a stable cb2Bib 0.7 sooner.
To compile it, type
./configure as usual. The
configure script calls the
qmake tool to generate an appropriate
Makefile. To make sure the right Qt4
qmake is invoked, you can set the
QTDIR environment variable prior to
configure's call statement will then be
'$QTDIR/bin/qmake'. E. g., type
'setenv QTDIR /usr' if
qmake happens to be at the directory
<<Tab_n>> to ease the creation of regular expressions for reference extraction. Newline and tab codes from the input stream are substituted by these numbered tags. Numbering newlines and tabs gives extra safety when writing down a regular expression. E.g., suppose the title field is 'anything' between '
<<NewLine2>>'. We can then easily write 'anything' as '.+' without the risk of overextending the capture across several '\n' codes. On the other hand, one can still use
<<NewLine\d>> if not interested in a specific numbering. All these internal tags are later removed, once cb2Bib postprocesses the entry fields.
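The effect of the numbered tags can be sketched as follows. This is an illustration in Python, not the cb2Bib preprocessing code: the function name `tag_breaks`, the sample text, and the assumption that numbering simply counts up from 1 are made for the example.

```python
import re
from itertools import count


def tag_breaks(text: str) -> str:
    """Replace each newline and tab by a numbered tag (numbering from 1 assumed)."""
    newline_numbers = count(1)
    tab_numbers = count(1)

    def replace(match):
        if match.group() == "\n":
            return f"<<NewLine{next(newline_numbers)}>>"
        return f"<<Tab{next(tab_numbers)}>>"

    return re.sub(r"[\n\t]", replace, text)


sample = "J. Chem. Phys.\nA Title Here\nA. Author\tB. Author\n"
tagged = tag_breaks(sample)

# The title is 'anything' between two specific, numbered tags.
title = re.search(r"<<NewLine1>>(.+)<<NewLine2>>", tagged).group(1)

# If the numbering does not matter, match any tag number instead.
generic = re.findall(r"<<NewLine\d>>", tagged)
```

Since each numbered tag occurs only once, '.+' between two of them cannot run past the intended line break.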
So far the cb2Bib identified new lines by checking for '\n' codes. I was unaware that this was a platform-dependent, as well as a not completely accurate, way of detecting new lines. McKay Euan reported that
<<NewLine_n>> tags were not appearing as expected in the MacOSX version. I later learned that MacOSX uses '\r' codes, and that Windows uses '\r\n', instead of '\n' for new line encoding.
This release addresses this issue. The cb2Bib regular expressions are now expected to be more transferable among the different platforms. Extraction from plain text sources is expected to be completely platform independent. Extraction from web pages will still remain browser dependent. In fact, each browser adds its own peculiar interpretation of a given HTML source. For example, in Wiley webpages we see the sectioning header 'Abstract' in the source and in several browsers, but we see, and get, 'ABSTRACT' if using Konqueror.
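A platform-independent treatment of line endings can be sketched by mapping the three encodings mentioned above onto a single one before any tagging takes place. This is a minimal illustration, not the actual cb2Bib code, and the function name is an assumption.

```python
def normalize_newlines(text: str) -> str:
    """Map '\\r\\n' and lone '\\r' line endings onto '\\n'."""
    # Replace the two-character form first, so it is not counted twice.
    return text.replace("\r\n", "\n").replace("\r", "\n")


unix_text = "Title\nAuthor\n"
windows_text = "Title\r\nAuthor\r\n"
old_mac_text = "Title\rAuthor\r"
```

The order of the two replacements matters: handling '\r' first would turn each '\r\n' into two line breaks, which is the kind of tag duplication a uniform approach avoids.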
What we pay for this more uniform approach is, however, a break in compatibility with previous versions of cb2Bib. Unix/Linux users should not expect many differences, though. Only one of the nine regular expressions in the examples needed to be modified, and the two contributed regular expressions work perfectly without any change. Windows users will no longer see a duplication of
<<NewLine_n>> tags. To update previous expressions it should be enough to shift the
<<NewLine_n>> numbering. And, of course, any working regular expression that does not use
<<NewLine_n>> tags will still work in this new version.
Finally, I should mention that I do not have a MacOSX machine to test any of the cb2Bib releases on this particular platform. I am therefore assuming that these changes will fix the problem at hand. If not, please let me know. Also, let me know if release 0.6.0 breaks your own expressions. I consider this release a sort of experimental or beta version, and the previous version, 0.5.3, will still be available during this testing period.
First, if you encounter a 'nothing to install'-error during installation on MacOSX 10.4.x using the cb2bib binary installer available at
please delete the cb2bib-receipts from
/Library/Receipts and then rerun the installer. See also M. Bongard's clarifying note 'MACOSX 10.4.X "NOTHING TO INSTALL"-ERROR' for details.
Second, and this also applies to other cb2Bib platform versions, if PDFImport issues the error message 'Failed to call some_format_to_text' tool, make sure such a tool is installed and available. Go to Configure->PDFImport, click on the 'Select External Convert Tool' button, and navigate to set its full path. Since version 0.5.0 the default full path for MacOSX is already set, pointing to
Release Note cb2Bib 0.2.1, cb2Bib started checking the clipboard periodically. This checking was later disabled by default, and a few lines of code needed to be uncommented to activate it. Without such checking, the cb2Bib appears unresponsive when selecting/copying from, e.g., acroread or Mozilla. This release includes the class
clipboardpoll written by L. Lunak for KDE's Klipper. Checking is performed in a very optimized way, and it is enabled by default. If you experience problems with this feature, or if the required X11 headers aren't available, consider disabling it by typing
./configure --disable_cbpoll prior to compilation. This will disable checking completely. If the naive, old checking is preferred, uncomment the four usual lines,
./configure --disable_cbpoll, and compile.
network files. Queries were then implemented as user customizable HTML posts to journal databases. In addition, these arrangements permitted defining convenience, dynamic bookmarks that were placed at the cb2Bib's 'About' panel.
cb2Bib contains three viewing panels: 'About', 'Clipboard' and 'View BibTeX', with the 'Clipboard' panel being the main working area. To keep cb2Bib simple, only two buttons, 'About' and 'View BibTeX', are provided to navigate through the panels. They are toggle buttons for momentarily displaying their corresponding panels. Guidance was so far provided by enabling/disabling the buttons.
After the introduction of bookmarks, the 'About' panel has become considerably more useful. Button functionality has now been slightly redesigned to avoid as many keystrokes and mouse clicks as possible. The buttons remain switchable, but they no longer disable the other buttons. The user is guided by icon changes instead. Hopefully these changes will not be confusing or counterintuitive.
Bookmarks and querying functionality are customizable through the
netqinf.txt file, which is editable by pressing the
Alt+B keys. Supported queries are of the form 'Journal-Volume-First Page'. cb2Bib parses
netqinf.txt each time a query is performed. It looks for
journal=Full_Name|[code] to obtain the required information for a specific journal. Empty, '
journal=' entries mean 'any journal'. New in this release, cb2Bib will test all possible queries for a given journal instead of giving up at the first
No article found message. The query process stops at the first successful hit or, otherwise, once
netqinf.txt is parsed completely (in the same way as the automatic pattern recognition works). This permits querying multiple, and possibly incomplete, journal databases.
Users should order the
netqinf.txt file in whatever way is most convenient. E.g., put PubMed in front of JACS if an automatic extraction is desired, or JACS in front of PubMed, extracting from the journal web page, if accented characters in author names are wanted.
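The query order described above, trying every matching entry in file order and stopping at the first hit, can be sketched as follows. This is a simplified illustration and not the cb2Bib parser: only the 'journal=Full_Name|code' form comes from the text, while the pairing of each entry with a source label, the journal code 'jacs', and the fake lookup function are hypothetical.

```python
def parse_journal(line: str):
    """Split 'journal=Full_Name|code' into (name, code); empty name means any journal."""
    value = line.split("=", 1)[1].strip()
    name, _, code = value.partition("|")
    return name, code


def first_hit(entries, journal, lookup):
    """Try every matching entry in file order and stop at the first success."""
    for line, source in entries:
        name, code = parse_journal(line)
        if name and name != journal:
            continue  # entry is for a different journal
        result = lookup(source, code)
        if result is not None:
            return source, result
    return None  # whole file parsed without a hit


# Hypothetical entries, in the order they would appear in the file.
entries = [
    ("journal=", "PubMed"),
    ("journal=Journal of the American Chemical Society|jacs", "JACS"),
]


def fake_lookup(source, code):
    # Stand-in for a network query: here only the JACS query succeeds.
    return "reference from JACS" if source == "JACS" else None


hit = first_hit(entries, "Journal of the American Chemical Society", fake_lookup)
```

In this sketch the file order decides which database answers first, which is why the ordering of entries is left to the user.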
So far, this querying functionality is still tagged as experimental. Both the querying itself and its syntax seem quite successful. However, downloading of PDF files, on Windows with a T1 network, was found to freeze once progress reaches 30-50%. Any feedback on this issue will be greatly appreciated. Also, information on
kfmclient equivalent tools for non-KDE desktops would be worth including in the cb2Bib documentation.
There are situations, however, where several author-strings are required. The following box shows one of these cases. Authors are grouped according to their affiliations. Selecting from 'F. N. First' to 'F. N. Fifth' would include 'First Affiliation' within the author string. Cleaning up whatever wording 'First Affiliation' may contain is a rather ill-posed problem. Instead, cb2Bib includes an
Add Authors option. The way of operation is then to select 'F. N. First, F. N. Second, F. N. Third' and choose
Authors and, right after, select 'F. N. Fourth and F. N. Fifth' and choose
At this point in the manual extraction, the user was faced with a red
<<moreauthors>> tag in the cb2Bib clipboard panel. The
<<moreauthors>> tag was intended to warn the user about the fact that cb2Bib would not be able to consider the resulting extraction pattern as a valid, general regular expression. Usual regular expressions are built up from an a priori known level of nesting. In these cases, however, the level of nesting is variable. It depends on the number of different affiliations occurring in a particular reference.
So far the
<<moreauthors>> tag has become a true FAQ about cb2Bib and a source of much confusion. There is no real need, however, for such a user warning. The
<<moreauthors>> tag has therefore been removed, and cb2Bib has taken a step further, to its 0.3.0 version.
The cb2Bib 0.3.0 manual extraction works as usual. By clicking
Authors the Authors edit line is reset and the selection contents are moved there. Alternatively, if
Add Authors is clicked, the selection contents are added to the author field. In this version, however, both operations are tagged as
<<author>> (singular form, as it is the BibTeX keyword for Authors). The generated extraction pattern can now contain any number of
In automatic mode, cb2Bib now adds all
author captures to Authors. In this way, cb2Bib can treat interlaced author-affiliation cases. Obviously, users needing such extractions will have to write particular regular expressions for cases with one set of authors, for two sets, and so on. Even though a work with a hundred authors is not rare, it would be quite improbable that they were working at so many different institutions. Therefore, few regular expressions should actually be required in practice. Although not elegant, this removes what was a cb2Bib limitation and broadens its use when extracting from PDF sources. Remember here to sort these regular expressions in decreasing order of the number of author sets, since at present cb2Bib stops at the first hit. Also, consider
Any Pattern to get rid of the actual affiliation contents, as you might not want to extract authors' addresses.
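The need for one expression per number of author sets, tried in decreasing order, can be sketched with Python's `re` module. This is an illustration under stated assumptions, not cb2Bib's own pattern syntax: the patterns, the sample texts, and the function name are made up, and the uncaptured '.+' stands in for the affiliation lines that one does not want to extract.

```python
import re

# One pattern per number of author sets, most sets first.
PATTERNS = [
    re.compile(r"(.+)\n.+\n(.+)\n.+"),  # two author sets, two affiliations
    re.compile(r"(.+)\n.+"),            # one author set, one affiliation
]


def extract_author_sets(text: str):
    """Return the author captures of the first pattern that matches."""
    for pattern in PATTERNS:
        match = pattern.match(text)
        if match:
            return list(match.groups())
    return []


two_sets = (
    "F. N. First, F. N. Second, F. N. Third\n"
    "First Affiliation\n"
    "F. N. Fourth and F. N. Fifth\n"
    "Second Affiliation"
)
one_set = "A. Author\nSome Institute"

authors_two = extract_author_sets(two_sets)
authors_one = extract_author_sets(one_set)

# Tried first, the one-set pattern would also hit and miss the second group.
wrong_order = list(PATTERNS[1].match(two_sets).groups())
```

Because matching stops at the first hit, the shorter pattern must come last; otherwise it would match the beginning of the two-set text and drop 'F. N. Fourth and F. N. Fifth'.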
cb2Bib is intended to help update personal databases of papers. It is a tool focused on what is left behind in database retrieval. Cases such as email alerts, or references and PDFs shared among colleagues, are example situations. Though in an electronic format, such sources are not standardized or not widely enough used to permit the habitual import filters of reference managers. cb2Bib is designed to rely on direct user intervention, either by creating one's own useful filters or through simple copy-paste assistance when typing by hand.
Hopefully someday cb2Bib will be able to take that old directory, with perhaps a few hundred papers, and automatically index the references and rename the files by author, in a consistent manner. The required mechanism is already there, in this version. But I guess that this new feature will expose some present limitations of cb2Bib. For instance, most printed and PDF papers interlace author names and affiliations. cb2Bib doesn't have the means to automatically discern an author name from a department or street name. So far one needs to manually use the 'Add to Authors' feature to deal with these situations. Also, the management of regular expressions needs development, especially considering the wide variety of design patterns in publications.
In summary, the current version is already useful for classifying and extracting the reference of that couple of papers that someone sends right before one submits a work. A completely unsupervised extraction is still far away, however.
The cb2Bib 0.2.1 continues to listen to system clipboard change notifications, whenever they are received and whenever cb2Bib is in connected mode. Additionally, the cb2Bib 0.2.1 periodically checks for changes in the system clipboard. Checks are performed approximately every second. This permits cb2Bib to work as usual, although one could experience delays of 1-2 seconds on systems where the automatic notification is broken.
If the 'select-and-catch' functionality appears 'sticky', possibly happening while using non KDE applications from where text is selected, check the source file
.cpp, look for
'Setting timer', and set variable
interval to 1000. This is the interval of time in ms that cb2Bib will use to check for clipboard changes.
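The periodic check can be pictured as a comparison between the current clipboard contents and the last contents seen, repeated at a fixed interval. The following is a minimal sketch of that idea, not the cb2Bib source: the class name, the `read_clipboard` callable, and the way the timer would call `poll` are assumptions.

```python
class ClipboardPoller:
    """Detect clipboard changes by comparing against the last contents seen."""

    def __init__(self, read_clipboard, interval_ms=1000):
        self.read_clipboard = read_clipboard  # callable returning current text
        self.interval_ms = interval_ms        # time between checks, in ms
        self.last_seen = read_clipboard()

    def poll(self):
        """Return the new contents if they changed since the last check, else None."""
        current = self.read_clipboard()
        if current != self.last_seen:
            self.last_seen = current
            return current
        return None


# A list stands in for the system clipboard in this sketch.
fake_clipboard = ["old selection"]
poller = ClipboardPoller(lambda: fake_clipboard[0])

unchanged = poller.poll()
fake_clipboard[0] = "new selection"
changed = poller.poll()
repeated = poller.poll()
```

In a real application a timer would call `poll` every `interval_ms` milliseconds, so a change is noticed at most about one interval after it happens.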