Bibliometrics: Columbus' egg?

About 10 years ago the use of bibliometrics to evaluate scientific publishing and academic merit became increasingly popular, driven by the growing need to evaluate research achievements objectively. With the widespread diffusion of modern communication facilities, the scientific community grew in size and so did the number of published papers. Therefore, funding agencies, academic institutions and the scientific community in general felt that it was necessary to measure the value of scientific achievements by adopting systematic and coherent solutions. The idea was that citations provide an objective measure of the popularity of a research paper and hence of its value. Perhaps the most popular bibliometric index is the H index, which was proposed by J.E. Hirsch in 2005 (amazing, isn't it? The H index is only 10 years old... how could we measure scientific productivity without it??).

The effectiveness of the H index has been discussed in several papers and it is not my intention to discuss it further. I would just like to emphasize one of its major limitations: it does not account for the number of co-authors of a paper, so every co-author gets full credit for the citations that the paper receives. I am convinced that the increased popularity of the H index is the main reason for the (perhaps corresponding) increased frequency of multi-author papers. Some scientific fields are accustomed to carrying out their research through the cooperation of several individuals and therefore traditionally write papers with many co-authors. But for other fields, including hydrology, the average number of authors per paper has increased significantly in the recent past. This evolution may have been encouraged by the widespread diffusion of bibliometrics.

The limitations of the H index and other bibliometric indexes are a major problem nowadays, as bibliometrics is increasingly used to assign academic positions and to allocate research funding. These limitations became evident very soon after bibliometric indexes were first used, and therefore several other indexes, besides the H index, were developed to provide a comprehensive synthesis of the value, measured through citations, of the scientific production of an individual, an institution and so on. For a recent reference see, for instance, Lando and Bertoli-Barsotti, A New Bibliometric Index Based on the Shape of the Citation Distribution, PLoS ONE, December 2014, DOI:10.1371/journal.pone.0115962 and references therein.

In my opinion, bibliometric indexes are very helpful for measuring impact. However, since a single number cannot provide a comprehensive overview, one cannot rely on a single index only. Rather, the value of scientific research should be evaluated by using a set of indexes. The problem is that it is not easy to compute them. Web of Science automatically calculates the H-index, the H-index without self-citations, the average citations per paper and the total number of citations, but does not provide any other index. This is indeed a major problem, as calculating the other indexes requires extensive work to monitor the citations from an ever-growing body of published papers.

An interesting alternative to Web of Science for computing bibliometric indexes is the Publish or Perish software. Publish or Perish can search for author impact, journal impact and citations of individual articles. The software sends a sequence of queries to Google Scholar, thereby browsing the web for citations. It comes with an exhaustive help file that guides users efficiently. Publish or Perish computes an extended set of bibliometric indexes including:

  • Total number of papers
  • Total number of citations
  • Average number of citations per paper
  • Average number of citations per author
  • Average number of citations per author per year
  • Average number of papers per author
  • Average number of authors per paper
  • H index
  • Zhang's e-index
  • Egghe's g-index
  • The contemporary h-index
  • Three variations of the individual h-index, namely, the hI-index, the hI,norm, and the hm-index
  • The average annual increase in the individual h-index
  • The age-weighted citation rate
  • An analysis of the number of authors per paper
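To give an idea of what two of the indexes listed above measure, here is a minimal Python sketch of Egghe's g-index and Zhang's e-index, written from their commonly cited definitions. It is not taken from Publish or Perish, whose implementation may differ in details; the function names and the sample citation counts are made up for illustration, and the g-index is assumed to be capped at the number of papers.

```python
import math


def g_index(citations):
    """Largest g such that the g most cited papers total at least g^2 citations.

    Assumption: g cannot exceed the number of papers in the list.
    """
    ranked = sorted(citations, reverse=True)
    total = 0
    g = 0
    for rank, cites in enumerate(ranked, start=1):
        total += cites
        if total >= rank * rank:
            g = rank
    return g


def e_index(citations):
    """Square root of the citations in the h-core in excess of h^2."""
    ranked = sorted(citations, reverse=True)
    # h is the number of papers whose citation count is at least their rank.
    h = sum(1 for rank, cites in enumerate(ranked, start=1) if cites >= rank)
    excess = sum(ranked[:h]) - h * h
    return math.sqrt(excess)


# Hypothetical citation counts for five papers.
sample = [10, 8, 5, 4, 3]
print(g_index(sample))
print(round(e_index(sample), 2))
```

For this sample the h-index is 4, so the e-index looks only at the citations that the four most cited papers collected beyond the 16 that the h-index already accounts for.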

Please refer to the Publish or Perish help for more details on the above indexes. It is just important to note here that they provide a comprehensive overview of authors' impact. In particular, the normalized H-index (hI,norm) addresses the bias due to multi-author papers mentioned above. I am quoting here from the Publish or Perish help: "... it first normalizes the number of citations for each paper by dividing the number of citations by the number of authors for that paper, then calculates hI,norm as the h-index of the normalized citation counts. This approach is much more fine-grained than Batista et al.'s; we believe that it more accurately accounts for any co-authorship effects that might be present and that it is a better approximation of the per-author impact, which is what the original h-index sets out to provide."

Unfortunately, a search in Publish or Perish is limited to 1000 items, which can be a significant problem if an author has a common family name. For instance, in my case the search for "Montanari A" returns about 1500 entries, so the search gives only a partial overview, limited to the first 1000 entries that Google Scholar finds on the web.
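The normalization quoted above from the Publish or Perish help can be sketched in a few lines of Python. This is only an illustration of the quoted description, not the software's actual code: the function names and the sample data are invented, and I am assuming that the normalized citation counts are used without any rounding.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def hi_norm(papers):
    """h-index of the per-author citation counts.

    `papers` is a list of (citations, number_of_authors) pairs.
    Assumption: normalized counts are not rounded before ranking.
    """
    normalized = [cites / n_authors for cites, n_authors in papers]
    return h_index(normalized)


# Hypothetical publication list: (citations, number of authors).
papers = [(30, 1), (24, 6), (12, 3), (9, 3), (4, 2)]
print(h_index([cites for cites, _ in papers]))  # plain h-index
print(hi_norm(papers))                          # normalized h-index
```

With these invented numbers the plain h-index is 4, while the normalized one drops to 3, because the paper with 24 citations and 6 authors counts for only 4 citations per author.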

I could bypass the 1000-item limit by splitting the search for my name into three sub-searches covering consecutive periods. That is, I first searched for "Montanari A" in the period 1995-2005, then I made two more searches for the periods 2006-2010 and 2011-2015, respectively. To put the three searches together, I exported the results of each search as a CSV file, and then I merged the files (be careful to include the headings just once). The merged CSV file was subsequently imported into Publish or Perish, which then showed all of the (about) 1500 entries together.
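The merge can be done by hand in a text editor or a spreadsheet, but it is also easy to script. Below is a minimal Python sketch that concatenates several exported CSV files and keeps the header row only once. The file names in the usage comment are hypothetical, and the sketch assumes that every export has the same columns and starts with a header row.

```python
import csv


def merge_csv_exports(input_paths, output_path):
    """Concatenate CSV files, writing the header row only once.

    Returns the number of data rows written.
    Assumption: all files share the same columns and begin with a header.
    """
    header_written = False
    rows_written = 0
    with open(output_path, "w", newline="", encoding="utf-8") as out_file:
        writer = csv.writer(out_file)
        for path in input_paths:
            with open(path, newline="", encoding="utf-8") as in_file:
                reader = csv.reader(in_file)
                header = next(reader, None)
                if header is None:
                    continue  # skip empty files
                if not header_written:
                    writer.writerow(header)
                    header_written = True
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written


# Example usage (hypothetical file names):
# merge_csv_exports(
#     ["search_1995_2005.csv", "search_2006_2010.csv", "search_2011_2015.csv"],
#     "merged.csv",
# )
```

Using the csv module rather than plain line concatenation keeps titles that contain commas or quotes intact.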

The next step for me was to browse the whole list of the 1500 items to isolate those (about 150) that referred to me. Fortunately Publish or Perish allows one to sort the entries by cites, year of publication, publisher and article type. By using different sorting keys in sequence, the selection turned out not to be very time consuming. Indeed, I could complete it in about half an hour. At the end of the selection the software immediately provided the above bibliometric indexes for me.

One may say that bibliometric indexes computed from Google Scholar citation counts are not comparable with those derived from ISI or Scopus. However, this problem can be easily bypassed. What I did was to edit the CSV file to be imported into Publish or Perish, substituting the Google Scholar citation counts with the ISI ones. I limited the substitution to the 74 most cited entries, as I found that the remaining items (I have 84 ISI publications at the moment) had only a few citations and therefore did not significantly affect the computation of my indexes (they were papers with 0 or 1 citations). The resulting CSV file can be downloaded here. By importing it into Publish or Perish I could finally get my figures from the software based on the ISI citation counts. The results for me are as follows (the query was performed on Dec 26th, 2015):

  • Papers: 74
  • Citations: 2446
  • Years: 19
  • Cites/year: 128.74
  • Cites/paper: 33.05
  • Cites/author: 1026.05
  • Cites/author/year: 54.00
  • Papers/author: 27.83
  • Authors/paper: 3.04
  • h-index: 28
  • g-index: 48
  • hc-index: 23
  • hI-index: 10.05
  • hI-norm: 18
  • hI,annual: 0.95
  • AWCR: 466.05
  • AW-index: 21.59
  • AWCRpA: 173.81
  • e-index: 34.19
  • hm-index: 15.90
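Some of the figures in the list above are simple ratios of the others, so they can be checked by hand. The short Python sketch below recomputes four of them from the values reported in the list. The relations used for hI,annual (hI,norm divided by the number of years) and for the AW-index (square root of AWCR) are the definitions as I understand them, so please take them as assumptions and refer to the Publish or Perish help for the authoritative ones.

```python
import math

# Values copied from the list of results above.
papers = 74
citations = 2446
years = 19
hi_norm = 18
awcr = 466.05

cites_per_year = citations / years
cites_per_paper = citations / papers
# Assumption: hI,annual is hI,norm divided by the number of years.
hi_annual = hi_norm / years
# Assumption: the AW-index is the square root of AWCR.
aw_index = math.sqrt(awcr)

print(f"Cites/year:  {cites_per_year:.2f}")   # 128.74
print(f"Cites/paper: {cites_per_paper:.2f}")  # 33.05
print(f"hI,annual:   {hi_annual:.2f}")        # 0.95
print(f"AW-index:    {aw_index:.2f}")         # 21.59
```

The recomputed values agree with the ones reported by the software.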

At the end of the story, I found that Publish or Perish is a very useful tool for moving toward a more objective evaluation of scientific productivity. I hope my experience is useful to others who want a comprehensive overview of authors' impact. In conclusion, my opinion is that bibliometric indexes are very helpful for evaluating research impact, but we should use an extended set of indexes (like the set provided by Publish or Perish) rather than a single one.

Cheers,
Alberto (Dec 27, 2015)

P.S.: on Jan 3, 2016, I received a very useful comment from Anne-Wil Harzing, which I am copying below. I wish to thank Anne-Wil for her kind email and suggestions.

****

Hello Professor Montanari, dear Alberto, Thanks for your interesting description of Publish or Perish. You are clearly a sophisticated user. Few PoP users manage to achieve the export/import process properly, even though it is not particularly difficult and well described in the helpfile. Your website/blog seems to be read quite widely, so I thought I'd write to you with some tips that you might want to include for Italian readers.

  • You may wish to search with your full given name "Alberto Montanari" (see http://www.harzing.com/poptips/tip06.htm). This produces a much cleaner result. There are other ways for better author disambiguation as well, described in tips 4-11.
  • If you have access to ISI (which your blog suggests you do), there is a much simpler way of getting ISI statistics. (see http://www.harzing.com/poptips/tip43.htm)

--

Best wishes,
Anne-Wil

The Publish or Perish Book:
Your guide to effective and responsible citation analysis
http://www.harzing.com/popbook.htm

Prof. Anne-Wil Harzing
Professor of International Management
Middlesex University London, Business School
website: www.harzing.com Email: anne@harzing.com