Results of data needs survey

Gary Wiggins wiggins at indiana.edu
Thu Jan 23 13:50:25 GMT 1997


This is being sent to CHMINF-L, CHEMWEB, and CHEMIND-L.
(It's long.)
------------>
       Data Needs of Academic Research on the Internet
                              
                        Gary Wiggins
            Indiana University Chemistry Library
                     wiggins at indiana.edu

                       Data on the Web

"All in all, the chemical data now available on
 the web is in a different class from the data
 found in refereed journals, critical reviews and
 books from reputable publishers."
          -    David Lide (CHMINF-L, 30 October 1996)

The quotation above was one of several replies to questions
sent to three chemistry-oriented discussion lists in the
fall of 1996.  The questions were posed in preparation for a
lecture and demonstration delivered at the National
Institute of Standards and Technology (NIST) on December 4,
1996.  Most of the information in this paper was included in
that presentation.
                              
Questions were sent to CHMINF-L, CHEMWEB, & CHEMIND-L in
late October 1996.  They were designed to:

    - Gauge the extent of inaccurate data in Web databases
    - Define the characteristics of data on the Web
      >> Sources of data
      >> Need for standardization of data formats
    - Determine the best guides to data.

Respondents to the survey noted these problems with the
accuracy of data on the Web:

    - Units are frequently omitted
    - Transcription errors are often encountered
    - As a result, users must seek redundant sources to
      cross-check values
    - Very few sources have quality assurance statements
    - Few of the Web data sites give the source of the data
    - When they do, the data are likely to have been copied
      from outdated sources.
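The cross-checking that these problems force on users can be
sketched briefly.  The Python below is a minimal sketch, not a
tool any respondent described: it assumes each reported value
arrives with an explicit unit label, rejects values whose unit
is missing, converts temperatures to kelvin, and flags any
source that disagrees with the median by more than a tolerance.
The function names, unit labels, 1% tolerance, and readings are
all hypothetical.

```python
from statistics import median

# Hypothetical unit table: each label maps to a function that
# converts a temperature in that unit to kelvin.
TO_KELVIN = {
    "K": lambda v: v,
    "degC": lambda v: v + 273.15,
    "degF": lambda v: (v - 32.0) * 5.0 / 9.0 + 273.15,
}


def normalize(value, unit):
    """Convert a temperature to kelvin; refuse missing or unknown units."""
    if unit not in TO_KELVIN:
        raise ValueError(f"missing or unrecognized unit: {unit!r}")
    return TO_KELVIN[unit](value)


def flag_outliers(readings, rel_tol=0.01):
    """Return sources whose value differs from the median by more than rel_tol.

    readings is a list of (source_name, value, unit) tuples for one property.
    """
    kelvins = [(src, normalize(v, u)) for src, v, u in readings]
    mid = median(k for _, k in kelvins)
    return [src for src, k in kelvins if abs(k - mid) > rel_tol * mid]


# Hypothetical readings of one property from four sites.
readings = [
    ("site A", 100.0, "degC"),
    ("site B", 373.15, "K"),
    ("site C", 212.0, "degF"),
    ("site D", 310.0, "K"),
]
print(flag_outliers(readings))  # ['site D']
```

Comparing against the median rather than the mean keeps a single
bad transcription from dragging the reference value toward it.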

                    Other Survey Results

Several people commented on efforts or practices that will
likely improve the quality of data on the Internet,
including:

    - Standardization efforts:
      >> CLIC, Chemical MIME, CML
      >> Roles for IUPAC, CODATA: certification?

      (One person, however, questioned whether standardization
       efforts were worthwhile.)

    - Efforts to share data or to cooperatively compile data
      sources
      >> Open Molecule Foundation
      >> Molecule of the Month
      >> Reciprocal Net
      >> Structure and Reactivity Across the Periodic Table

    - Provision of a minimal level of auxiliary information
      (metadata)
      >> authorship
      >> units
      >> conditions of measurement
      >> references to primary and secondary sources of data
   
    - Use of standard symbols and terminology
    
    - Guidelines on how to handle special characters.
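The minimal auxiliary information listed above maps naturally
onto a small record type.  The Python sketch below shows one
hypothetical way to represent it, with a helper that reports
which items a record leaves out.  The class and field names are
illustrative assumptions; the sketch is not based on CML,
Chemical MIME, or any other actual standard.

```python
from dataclasses import dataclass, field


@dataclass
class DataPoint:
    """A reported value with the minimal metadata suggested by respondents.

    The class and field names are hypothetical, chosen for illustration.
    """
    property_name: str
    value: float
    units: str = ""
    authors: list = field(default_factory=list)
    conditions: dict = field(default_factory=dict)  # e.g. {"pressure": "101.325 kPa"}
    references: list = field(default_factory=list)  # primary and secondary sources

    def missing_metadata(self):
        """Return the names of metadata items that were left empty."""
        checks = {
            "authorship": self.authors,
            "units": self.units,
            "conditions of measurement": self.conditions,
            "references": self.references,
        }
        return [name for name, value in checks.items() if not value]


# A hypothetical record that gives units and a reference but nothing else.
point = DataPoint("density", 1.23, units="g/cm3",
                  references=["hypothetical handbook"])
print(point.missing_metadata())  # ['authorship', 'conditions of measurement']
```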

             General Comments on Data on the Web

"While some might argue that the Internet is designed to make 
information in a single location accessible to users around the 
world, the large number of mirrored sites already in existence
points out the Net's inadequacy."
          -    Byte, December 1996

Several steps are needed to improve the quality and
usability of data found on the Web.  Among them are:

    - Mechanisms to synchronize changes made at multiple
      sites
    - Faster access to resources
    - More secure transactions
    - Progress on chemical metadata standards
    - Interoperability of chemical plug-in programs.

          Some Goals for Improving Data on the Web

    - Assemble the most reliable data available
    - Arrange data for easy retrieval
    - Provide a "SuperIndex" of available data sources
    - Establish criteria for evaluation of data sources:
      >> descriptions of physical theories on which data are
         based
      >> full references to literature
      >> format of the database
      >> search capabilities
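A "SuperIndex" combined with the evaluation criteria might,
under one set of assumptions, look like the minimal Python
sketch below: each source is recorded with the properties it
covers and the criteria it meets, and the index lists, for each
property, the sources meeting the most criteria first.  Ranking
by a simple count of criteria met is an assumption made for
illustration (the survey proposed no scoring scheme), as are the
shorthand labels and the catalog entries.

```python
from collections import defaultdict

# Hypothetical shorthand labels for the four evaluation criteria:
#   "theory"     - describes the physical theories behind the data
#   "references" - gives full references to the literature
#   "format"     - documents the format of the database
#   "search"     - offers search capabilities
CRITERIA = ("theory", "references", "format", "search")


def score(features):
    """Count how many of the evaluation criteria a source meets."""
    return sum(1 for c in CRITERIA if c in features)


def build_super_index(sources):
    """Build a property -> ranked list of source names.

    sources maps a source name to a pair (properties covered, criteria met).
    Sources meeting more criteria come first; ties are broken by name.
    """
    index = defaultdict(list)
    for name, (properties, features) in sources.items():
        for prop in properties:
            index[prop].append(name)
    return {
        prop: sorted(names, key=lambda n: (-score(sources[n][1]), n))
        for prop, names in index.items()
    }


# Hypothetical catalog entries, for illustration only.
catalog = {
    "source A": ({"boiling point", "density"}, {"references", "search"}),
    "source B": ({"boiling point"}, {"theory", "references", "format", "search"}),
}
index = build_super_index(catalog)
print(index["boiling point"])  # ['source B', 'source A']
```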

                    How to Find Data Now

A second part of the NIST presentation was a look at how to
find data on the Web today.  One person pointed me
toward Alexander Lebedev's "Best Search Engines for Finding
Scientific Information in the Web"
(http://www.chem.msu.su/eng/comparison.html).  He compared
11 Web search engines and concluded:

     - Excite retrieves about as many documents as
       AltaVista
     - MetaCrawler is the most powerful search engine for SATI
     - Two of the search engines are not being updated.

Lebedev also compared the Web search results with INSPEC
results for 1994 and 1995 on the same topics.  He found:

     - Only 5-10% of the relevant information is on the Net
     - The Web is particularly good for supplemental
       information:
       >> on authors
       >> on their work and research projects
       >> on foundations supporting them.

Besides using search engines, these are some other ways to
find data using the Internet:

    - Submit the question to a knowledgeable source
    - Consult lists of sources (guides)
    - Try known sources
    - Try comprehensive chemistry guides.

                  Lists of Sources (Guides)
    
CIS-IU (Chemical Information Sources from Indiana
University)
  http://www.indiana.edu/~cheminfo/ca_accc.html
  http://www.indiana.edu/~cheminfo/ca_ppi.html

Databases for Atomic and Plasma Physics
  http://plasma-gate.weizmann.ac.il/DBfAPP.html

IOP's Software and Data Page
  http://www.iop.org/Physics/Resources/phsoft.html

                        Known Sources

NIST Physics Laboratory
  http://physics.nist.gov/PhysRefData/contents.html
    
Sheffield ChemPuter
  http://www.shef.ac.uk/~chem/chemputer/

Biocatalysis/Biodegradation Database
  http://dragon.labmed.umn.edu/~lynda/index.html

               Comprehensive Chemistry Guides

Chemfinder
  http://chemfinder.camsoft.com/
    
WWW Chemical Structures Database
  http://schiele.organik.uni-erlangen.de/services/webmol.html
    
SpaceCrunch
  http://www.tripos.com/spacecrunch/

                       Other Examples
    
University of Texas's ThermoDex
  http://www.lib.utexas.edu/Libs/Chem/info/thermodex/
    
Table of the Properties of 200 Linear Macromolecules
and Small Molecules
  http://funnelweb.utcc.utk.edu/~athas/databank/intro.html

Chemical errors found on WWW sites: a discussion of
problems encountered while creating the ChemFinder WebServer
database
  http://www.camsoft.com/chemfinder/errorsfound.html


                   Internet Demos at NIST

CIS-IU ca_accc.html

Go to Anal Chem page, then to MS Links at SIS, then Dave's
Math Tables
     www.sisweb.com/math/tables.htm

NMR Information Server at U of Florida
     micro.ifas.ufl.edu/
     playing Happy Birthday to You on an NMR Spectrometer

Database of Core-Edge (Inner-Shell) Excitation Spectra of
Gas Phase Atoms and Molecules
     xray.uu.se/hypertext/corexdb.html
          SEARCH naphthalene

Spin trap Data Base
     alfred.niehs.nih.gov/LMB/stdb
          ENTER THE DATABASE doesn't work, but HIPPO does

Electron Paramagnetic Resonance at Bristol
     emrs.chm.bris.ac.uk/
          Beautiful background!
          In "About the Database" in the Introduction,
          Spectra examples,
          Show the example Cu(II) (nothing else works!)

Look at IU Molecular Structure Center's Reciprocal Net
     www.cica.indiana.edu/~recip/
     www.indiana.edu/ReciprocalNet.html

Molecules R Us
     molbio.info.nih.gov/cgi-bin/pdb
     Search dehalogenase (EC 3.8.1.5)

NIST Chemistry WebBook
     webbook.nist.gov/chemistry
     Look up CAS Registry Number 91-56-5

AIRSITE
     ozone.sph.unc.edu
     Has "Environmental Data," but it's "under construction"

THERMODEX
     www.lib.utexas.edu/Libs/Chem/info/thermodex/
     Search Gibbs Free Energy and organic

Chemfinder
     chemfinder.camsoft.com
     Search MEK

WWW Chemical Structures Database
     schiele.organik.uni-erlangen.de/services/webmol.html
     Search MEK, then 2-butanone

SpaceCrunch
     www.tripos.com/spacecrunch/

Molecule of the Month
     www.bris.ac.uk/MOTM/motm.html


-----
chemweb: A list for Chemical Applications of the Internet.
Archived as: http://www.ch.ic.ac.uk/hypermail/chemweb/
To unsubscribe, send to listserver at ic.ac.uk the following message:
unsubscribe chemweb
List coordinator, Henry Rzepa (rzepa at ic.ac.uk)


