[OS X] accents, etc., in EndNote exported to Bibtex

Andy Jacobson andyj at splash.princeton.edu
Thu Oct 13 19:38:55 EDT 2005


Howdy,

     I use EndNote to import and store my references.  Occasionally I  
use the apple-K feature to copy a formatted reference to put into an  
email or something.  But principally, I write in Latex.  Therefore I  
export the references to Bibtex format.

     With EN7, I used a perl script to notice accented characters and  
replace them with tex equivalents.  This involved opening the  
exported data file in emacs, looking for an accented character and  
using C-x =  to find the hex code of the special character.  Then I'd  
put a line into my perl filter that would replace instances of that  
hex code with the desired tex code (e.g. s/\x88/{\\\'a}/g; would
replace hex character 0x88 with {\'a}, the tex code for a with an
acute accent).
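
     For anyone who would rather not hunt through the file by hand in emacs, here is a rough sketch of the same lookup done in perl. It is not part of en2bib.pl; the sub name count_high_bytes and the sample line are made up for illustration, and it assumes a single-byte encoding like the one EN7 produced.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of en2bib.pl): return a hash mapping each
# non-ASCII byte found in $text to the number of times it occurs.
sub count_high_bytes {
    my ($text) = @_;
    my %count;
    $count{$1}++ while $text =~ /([\x80-\xff])/g;
    return %count;
}

# Demo on a made-up line that contains the byte 0x88 once.
my %seen = count_high_bytes("Title = {b\x88r},");
printf "0x%02x occurs %d time(s)\n", ord($_), $seen{$_} for sort keys %seen;
# prints: 0x88 occurs 1 time(s)
```

     Each hex code it reports is a candidate for one more s/\x../.../g; line in the filter.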

     I also used this perl script to replace instances of "co2" with  
{CO}$_2$, etc.

     EN8 now outputs Unicode utf-8 for most special characters.  My  
understanding is that these are multi-byte codes.  I started to  
replace all my filter lines with new multibyte codes, but rapidly  
tired of it.  I discovered, however, that TeX can handle utf-8  
input.  If you put the right magic in the tex preamble, you can use  
the output of endnote export directly!  Without further ado, the  
magic is:

\usepackage[utf8]{inputenc}
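
     To see why the old one-byte filter lines stopped matching, a small sketch using perl's core Encode module prints the utf-8 bytes for a given character. The sub name utf8_hex is hypothetical and the two characters are just examples.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: return the UTF-8 bytes of a character as a hex string.
sub utf8_hex {
    my ($char) = @_;
    return join ' ', map { sprintf('%02x', ord($_)) } split //, encode('UTF-8', $char);
}

print utf8_hex("\x{e1}"), "\n";   # a with acute accent, prints: c3 a1
print utf8_hex("\x{b0}"), "\n";   # degree sign, prints: c2 b0
```

     Each of these characters becomes two bytes, so a filter written for EN8 output has to match the whole sequence (for example \xc2\xb0) rather than a single hex code.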

     While this deals with just about everything, I noticed that the  
degrees symbol as exported by EN8 isn't recognized as valid utf-8,  
and causes a tex error.  For this reason, and because I still need to  
format things like {CO}$_2$, I still use the perl filter.  I attach  
it to this email.  Usage instructions are on the first (commented)  
line.  It is called "en2bib.pl".
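
     As a quick illustration of what the filter does with these two cases, here is a cut-down sketch that applies only the CO2 rule and the degrees rule to a single string. The sub name fix_title and the sample title are made up; the two substitutions mirror the ones in en2bib.pl.

```perl
use strict;
use warnings;

# Hypothetical wrapper: apply two of the en2bib.pl rules to one string.
sub fix_title {
    my ($s) = @_;
    $s =~ s/[cC][oO]2/{CO}\$_2\$/g;            # CO2 -> {CO}$_2$
    $s =~ s/\xc2\xb0(.)/\$^\\circ\$\{$1\}/g;   # UTF-8 degree sign -> $^\circ$
    return $s;
}

print fix_title("Air-sea co2 flux south of 40\xc2\xb0S"), "\n";
# prints: Air-sea {CO}$_2$ flux south of 40$^\circ${S}
```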

     Best,

         Andy


---cut here-----------------------------------------------------
# perl en2bib.pl < en.bib > arj.bib

while(<>) {
     s/\xef\xbb\xbf//g;  # strip the UTF-8 byte-order mark (BOM) at the start of the file
     s/\r/\n/g;
     s/DOI/Doi/g;  # use this if the bst supports a DOI field
#    s/DOI/Note/g;  # uncomment this instead if the bst does not support DOI
     s/\{Manuscript submitted to (.*)\}/{{S}ubmitted to {\\it $1}}/g;
     s/DC\*/\${\\Delta}\${C}*/ig;
     s/p[cC][oO]2/\$p\$\{CO\}\$_2\$/g;
     s/[cC][oO]2/\{CO\}\$_2\$/g;
     s/[oO]2/\{O\}\$_2\$/g;
     s/[nN]2/\{N\}\$_2\$/g;
     s/d13C/\$\\delta\^\{13\}\\text\{C\}\$/g;
     # braces around the mass number so TeX superscripts both digits
     s/13C/\$^\{13\}\\text\{C\}\$/g;
     s/12C/\$^\{12\}\\text\{C\}\$/g;
     s/14C/\$^\{14\}\\text\{C\}\$/g;
     # degrees; ensure uppercase of N,E,W,S with {}
     s/\xc2\xb0(.)/\$^\\circ\${$1}/g;
     # match "Nino", a one-byte n-tilde, or the two-byte utf-8 n-tilde
     s/[eE]l [nN]i(?:\xc3\xb1|.)o/{El}~{Ni{\\~n}o}/g;
     s/Transcom/{TransCom}/ig;
     s/Pacific/{Pacific}/g;
     s/Atlantic/{Atlantic}/g;
     s/Indian/{Indian}/g;
     s/Antarctic/{Antarctic}/g;
     s/Arctic/{Arctic}/g;
     s/Europ/{Europ}/g;
     s/Amazon/{Amazon}/g;
     s/Asia/{Asia}/g;
     s/Americ/{Americ}/g;
     # keep the 1st letter of a sentence after : . or ? uppercase with {}
     s/([\:\?\.]) *([A-Z])/$1  \{\U$2\E}/g;
     # protect upper-case acronyms (two or more letters long)
     s/([A-Z]{2,})/{$1}/g;
     # remove {} from letters in DOI field
     s/(Doi = \{[0-9]{2}\.[0-9]{4}\/[0-9]{4})\{([A-Z]{2,})\}([0-9]*\},)/$1$2$3/g;
     print $_;
}
---cut here-----------------------------------------------------

--
Andy Jacobson

andy.jacobson at noaa.gov

Program in Atmospheric and Oceanic Sciences
Sayre Hall, Forrestal Campus
Princeton University
PO Box CN710 Princeton, NJ 08544-0710 USA

Tel: 609/258-5260  Fax: 609/258-2850





