CINF 1 :
Ultra High Throughput Screening using THINK on the
Internet
E Keith Davies, Department of Chemistry,
Oxford University, Central Chemistry Laboratory, South Parks Road, Oxford,
United Kingdom, Fax: +44 1865 275905, Keith.Davies@Chem.ox.ac.uk, and Catherine
J Davies, Treweren Consultants Ltd
Abstract
The growth in the
collections of small molecules available for experimental testing prompts
selection of subsets and has stimulated the question "how many drug-like
molecules are there?". In the CAN-DDO project we harnessed the power of over 1
million volunteered PCs to screen 3.5 billion drug-like molecules against 12
protein targets of relevance to cancer therapy. The development of the THINK
software, its adaptation to run as a screen-saver and some of the data management
issues will be described in this paper.
CINF 2 : Next
steps for virtual screening and massively distributed
computing
Davin M. Potts, United Devices, Inc, 12675
Research Blvd., Building A, Austin, TX 78759, Fax: 512-331-6235, davin@ud.com
Abstract
The recent massive
distributed computing project led by W. Graham Richards' team (Oxford) to
perform virtual screening of 3.5 billion drug-like molecules against a series of
protein targets has identified a significant number of promising, novel small
molecules which warrant further investigation and refinement into prospective
drug candidates. The successful involvement of the general public (1.5 million
PCs participating on the internet to date capable of producing a sustained
compute power in excess of 60 teraflops) in this pioneering scientific endeavor
has demonstrated the magnitude and viability of available untapped compute power
for drug discovery efforts. With the recently announced availability of state of
the art screening tools (e.g. LigandFit) on such distributed computing platforms
comes the opportunity and challenge for pharmaceutical and drug discovery
companies to apply this combination of tools to their internal development
efforts. We will discuss the next steps in improving the quality of the findings
from the first stage of the project led by Oxford, the continuing need for and
role that distributed computing will play, and the relevance to commercial
pharmaceutical discovery.
CINF 3 :
Evaluating protein-ligand interactions through flexible
docking
Tad Hurst, ChemNavigator, 6166 Nancy Ridge Drive,
San Diego, CA 92121, Fax: 858-625-2377, thurst@chemnavigator.com
Abstract
As the global
research emphasis shifts from genomics to proteomics, the question of how
copious amounts of bioinformation will ultimately be used to accelerate the
discovery of therapeutic compounds becomes more prominent. At the same time, the
number of commercially accessible compounds that can be tested for
pharmaceutical efficacy has exploded into the millions.
ChemNavigator is addressing this need by offering advanced docking technology that more efficiently evaluates protein-ligand interactions. ChemNavigator has developed ultra-fast 3-D technology that will allow millions of structures to be docked into thousands of protein targets. In addition to rapid analysis, this technology will allow flexible ligand docking against the entire surface of a protein, not requiring specification of an active site.
This presentation details how ChemNavigator’s novel 3-D flexible docking technology can assist life science researchers by allowing them to quickly and efficiently filter millions of structures in their search for novel therapeutic compounds.
CINF 4 : Docking
of diverse ligands to diverse protein sites: six degrees of
application
Teresa A. Lyons1, Michael
Dooley2, Anne-Goupil Lamy1, Sunil Patel3, Remy
Hoffmann4, Hughes-Olivier Bertrand4, and Marguerita
Lim-Wilby5. (1) Accelrys Inc, 200 Wheeler Road, South Tower, 2nd
Floor, Burlington, MA 01803-5501, Fax: (781) 229-9899, txl@accelrys.com, (2)
Accelrys KK, (3) Accelrys Ltd, (4) Accelrys, (5) Lead Identification and
Optimization, Accelrys Inc
Abstract
The utility of a
docking application in the virtual screening of libraries prior to biological
assay or custom synthesis is dependent on its ability to predict ligand affinity
over a wide pKi range at a specific binding site. The definition of the binding
site of interest thus becomes the most critical step in setting the stage for
docking.
Examples will be presented of straightforward docking/screening cases, as well as difficult cases, such as proteins with extremely large potential binding sites, proteins with induced fit or allosterism, and protein/ligand complexes with alternate ligand conformations from X-ray crystal structures. In between are the “tunable” cases where user settings and protein preparation are critical: highly flexible ligands, steric problems or clashes, and local flexibility in the binding site. Finally we will summarize the classes of proteins and types of ligands for which LigandFit will perform suitably as a vHTS tool.
CINF 5 : eHiTS:
Novel algorithm for fast, exhaustive flexible ligand docking and
scoring
Zsolt Zsoldos1, A. Peter
Johnson2, Aniko Simon1, Irina Szabo1, and Zsolt
Szabo1. (1) Research and Development, SimBioSys Inc, 135 Queen's
Plate Dr, Unit 355, Toronto, ON M9W 6V1, Canada, Fax: 416-741-5083,
zsolt@simbiosys.ca, (2) ICAMS, School of Chemistry, University of Leeds
Abstract
The flexible ligand
docking problem is often divided into two subproblems: pose/conformation search
and scoring function. For virtual screening the search algorithm must be fast;
must provide a manageable number of candidates; and must be able to find the optimal
pose/conformation of the complex. Algorithms employing stochastic elements or
crude rotamer samplings fail to satisfy the last criterion. The eHiTS
(electronic High Throughput Screening) software offers new approaches to both
subproblems. The search algorithm is based on exhaustive graph matching that
rapidly enumerates all possible mappings of interacting atoms between receptor
and ligand. Then dihedral angles of rotatable bonds are computed
deterministically as required by the positioning of the interacting atoms.
Consequently, the algorithm can find the optimal conformation even if unusual
rotamers are required. The scoring function contains novel treatment of weak
hydrogen bonds, aromatic pi-stacking and penalties for conflicting interactions.
Validation results on over 300 complexes will be presented.
CINF 6 : Effect
of electrostatic models on the accuracy of ligand
docking
Philip W. Payne, Consultant in Computational
Chemistry, 660 Santa Paula Avenue, Sunnyvale, CA 94085-3416, Fax: none,
PAYNES@PACBELL.NET
Abstract
Clustered ensembles
of various ligands bound to the estrogen receptor were built by systematic
search for energetically favored ligand positions. Four different electrostatic
models were employed: point charge with constant dielectric, point charge with
cubic spline cutoffs in the range 8-10 Å, point charge with cubic spline cutoffs
in the range 10-12 Å, and a sigmoidal dielectric screening model.
Compared to the constant dielectric model without cutoffs, dramatic shifts in the energy spectra of ligand clusters and the positions of bound ligands were observed when calculations were done with either Coulomb distance cutoffs or the sigmoidal screening model. Subsequent analysis demonstrated that the distance cutoffs or sigmoidal screening cause chaotic instability of the electric field in the binding site. Distance cutoffs for Coulomb interactions should therefore be avoided in the study of protein-ligand interactions unless such cutoffs are uniformly applied to all atoms in each polar bond.
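The kinds of electrostatic models compared above can be sketched roughly as follows. This is an illustration only: the abstract does not give the functional forms, so the cubic switching function, the logistic dielectric and every parameter value below are assumptions. The last function shows, for a hypothetical probe charge and polar bond, what can happen when a cutoff region splits the two atoms of the bond.

```python
import math

COULOMB_K = 332.0637  # kcal*Angstrom/(mol*e^2), commonly used conversion constant


def coulomb(qi, qj, r, eps=1.0):
    """Point-charge Coulomb energy with a constant dielectric."""
    return COULOMB_K * qi * qj / (eps * r)


def cubic_switch(r, r_on, r_off):
    """Assumed cubic switching function: 1 below r_on, 0 above r_off."""
    if r <= r_on:
        return 1.0
    if r >= r_off:
        return 0.0
    x = (r_off - r) / (r_off - r_on)
    return x * x * (3.0 - 2.0 * x)


def coulomb_switched(qi, qj, r, r_on=8.0, r_off=10.0):
    """Coulomb energy smoothly cut off between r_on and r_off."""
    return coulomb(qi, qj, r) * cubic_switch(r, r_on, r_off)


def sigmoidal_eps(r, eps_inner=2.0, eps_bulk=78.0, r_mid=8.0, steepness=0.5):
    """Assumed logistic distance-dependent dielectric (illustrative parameters)."""
    return eps_inner + (eps_bulk - eps_inner) / (1.0 + math.exp(-steepness * (r - r_mid)))


def coulomb_sigmoidal(qi, qj, r):
    """Coulomb energy screened by the sigmoidal dielectric."""
    return coulomb(qi, qj, r, eps=sigmoidal_eps(r))


def polar_bond_energy(pair_energy, probe_q=1.0, q=0.4, r_pos=9.2, r_neg=10.4):
    """Energy of a probe charge with a +q/-q bond whose atoms straddle the cutoff."""
    return pair_energy(probe_q, q, r_pos) + pair_energy(probe_q, -q, r_neg)


if __name__ == "__main__":
    for name, model in [("constant", coulomb),
                        ("switched 8-10", coulomb_switched),
                        ("sigmoidal", coulomb_sigmoidal)]:
        print(name, round(polar_bond_energy(model), 3))
```

In this toy case the switched model drops the more distant atom of the bond entirely, so the probe sees an unbalanced partial charge rather than a dipole. Treating both atoms of the bond identically would avoid that imbalance.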
CINF 7 :
Integration Continuum...different strokes for different
folks
Kirk Schwall, Manager, Authority Database Operations,
Chemical Abstracts Service, 2540 Olentangy River Road, Columbus, OH 43202, Fax:
(614) 447-5471, kschwall@cas.org, and Eileen M. Shanbrom, Manager, CAS and Web
Content, Chemical Abstracts Service
Abstract
Scientists are
challenged in today’s environment to locate the right information in a sea of
information that incorporates both traditional and web services offered by
government agencies and others. Information providers should move away from
viewing this changing environment as a conflict of "traditional" resources
versus the web. Information consumers want both available in integrated
services. These services must provide access to the right information at the
right time in the context of scientific research. This is especially relevant
for producers of STM database and search-and-retrieval services, because
scientists will be the biggest users of services that integrate web content with
professionally built databases. Recent developments by a number of information
providers offer good examples of what is now possible and necessary. BLAST
searching for identifying sequences was originally a free service provided by
the U.S. government but BLAST has now been incorporated into proprietary
services that integrate the identification of genes, proteins and other
biological entities with the retrieval of related literature and patents. To an
increasing extent, a solid foundation for building the new "digital research
environment" rests on three building blocks: professionally produced databases,
value-added search tools, and the web.
CINF 8 :
Hindsight is an exact science
Jeremy N
Potter1, Chris Hardy1, Robert D. Brown2,
and Julian Hayward2. (1) Accelrys, Inc, 9685 Scranton Road, San
Diego, CA 92121-3752, jeremyp@accelrys.com, (2) Accelrys Inc
Abstract
The value of
knowing about work done by others in the field of organic synthesis goes without
saying, and providing compilations of such information is the basis of the many
reaction databases available on the market today. These range in size from a few
hundred to over 10 million reactions and can typically be characterised as
selective, thematic or comprehensive in nature. Almost without exception,
however, such databases focus only on those reactions that have a successful
outcome, with the goal of allowing chemists to search for tried and trusted
methods that have been presented in the literature. However, our own experience
of life tells us that it can be just as valuable to know in advance that
something will not work. This paper will outline the ways in which such
knowledge of synthetic 'failures' is used in the pharmaceutical industry, and
will introduce a database of such reactions.
CINF 9 :
Bridging the gap between published and proprietary spectroscopic
databases: an informatics system case study
Gregory M.
Banik, Informatics Division, Sadtler Software & Databases, Bio-Rad
Laboratories, 3316 Spring Garden Street, Philadelphia, PA 19104-2596, Fax:
215-662-0585, gregory_banik@bio-rad.com, and Ty Abshear, Informatics Division,
Sadtler Software and Databases, Bio-Rad Laboratories
Abstract
A new informatics
system, the KnowItAll™ Informatics System, is described that bridges
the gap between published and proprietary spectroscopic information. KnowItAll
offers the world's largest published collection of analytical information and
the ability to manage multiple spectra and chromatograms, including
¹³C and ¹H NMR, IR, Raman, MS, GC, and UV/Vis, along with
the corresponding chemical structures and related property information.
KnowItAll allows users to create their own databases and search them seamlessly
with databases of published spectra as well as databases of reference spectra.
Cross referencing from one analytical technique to another is also seamless, as
is the use of user-assigned NMR spectra in the database-based prediction of NMR
spectra. Finally, web addresses or file names can be added, either to published
databases or proprietary databases, to permit linking to related documents that
are outside the KnowItAll system.
CINF 10 :
Developing value-added organic chemistry databases from traditional print
products
Darla Henderson, and Colleen Finley, John Wiley
& Sons, Inc, 605 Third Avenue, New York, NY 10158, dhenders@wiley.com
Abstract
Various chemical
databases, primarily abstracted databases, have existed in the chemical
information business for three-plus decades. This presentation discusses the
development and features offered by John Wiley & Sons in their newly
released and developing chemical reaction databases. Wiley chemical reaction
databases focus on offering the full content of a product, as opposed to
abstracted data found in most other reaction databases, yet including the
value-added features customers prefer, such as reaction searching and
interoperability among databases. Critical issues, such as developing a product
amenable to both academic and corporate customers, are discussed.
CINF 11 :
Linking reaction information from different
sources
Guenter Grethe1, Peter Loew2,
Hans Kraut2, and Josef Eiblmaier2. (1)
Marketing/Scientific Applications, MDL Information Systems, Inc, 14600 Catalina
Street, San Leandro, CA 94577, Fax: 510-614-3616, guenter@mdli.com, (2) InfoChem
GmbH
Abstract
Collecting required
relevant information to solve a synthetic problem is a formidable task. Unless
the chemist is interested in the preparation of a known compound, it almost
never is straightforward. The process usually involves consulting more than one
source and going back and forth between different sources to find the most
relevant answers. This can be a very time-consuming process in the hardcopy
world and confusing when available electronic sources require the use of
different programs. Today’s technology allows linking of information using
point-and-click rather than cut-and-paste methodology. In the reaction world,
the linking must foremost be based on reaction type rather than the similarity
of participating molecules. As a first step in this direction we have developed
a system that seamlessly links information from reactions of similar type
described in reaction databases and major reference works. The latter provide
important complementary information, including discussions about reaction
mechanism, stereochemistry, the most suitable reagent or catalyst, and others.
Linking the references from both databases and books to the primary literature
augments the integration. We will describe the underlying concept of the system
and demonstrate the usefulness with examples from the recent literature.
CINF 12 :
Reoptimization of MDL keys for use in drug
discovery
Keith T Taylor, Joseph L. Durant Jr., Burton
A Leland, Douglas R Henry, and James G Nourse, MDL Information Systems, 14600
Catalina Street, San Leandro, CA 94577, keitht@mdli.com
Abstract
The use of keysets
based on a variety of different descriptors has an established place within the
drug discovery workflow. MDL’s keysets were optimized for substructure
searching; however, their performance in clustering and diversity
analysis is comparable with that of keysets based on feature trees. We will present an
overview of the underlying technology supporting the definition of features in
MDL’s keysets, and encoding them into keysets. Construction of a keyset
containing all possible combinations of our set of defined features with
occurrence counts of one or more has been carried out. Standard deviations of a
few percent were observed in the clustering performance of populations of
similarly sized keysets. Additionally, performance is seen to be relatively
insensitive to keyset size, especially for keysets larger than 1000 bits. We
have also examined a variety of strategies to construct keysets; the performance
and relative merits of these strategies will be discussed.
CINF 13 :
Strategies for Lead Discovery Oriented Virtual
Screening
Tudor I. Oprea, EST Chemical Computing,
AstraZeneca R&D Molndal, Molndal S-43183, Sweden, Fax: 46 (0)31-776-3792,
tudor.oprea@astrazeneca.com
Abstract
Large numbers of
virtual compounds can be evaluated in silico via Virtual Screening (VS). Some
properties can be readily evaluated prior to enumeration from reactants.
However, binding affinity estimates require enumerated structures. The Lipinski
rule of five, the standard property filtering protocol for VS, was derived from
drugs (not leads). For lead discovery oriented VS, this protocol needs to be
shifted toward lower molecular weight, lower hydrophobicity and higher
solubility, in order to capture high quality leads. Possible VS strategies with
respect to optimizing binding affinity and pharmacokinetic properties are
discussed.
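The property-filtering step described above can be sketched as a simple threshold check. The rule-of-five cutoffs (molecular weight 500, logP 5, 5 hydrogen-bond donors, 10 acceptors) are the published ones. The tolerance of one violation is a common convention, and the stricter "lead-like" limits are hypothetical values chosen only to show the direction of the shift, since the abstract gives no numbers. Property values are assumed to be precomputed elsewhere.

```python
from dataclasses import dataclass


@dataclass
class Props:
    """Precomputed properties of one (virtual) compound."""
    mol_weight: float
    logp: float
    h_donors: int
    h_acceptors: int


def rule_of_five_violations(p: Props) -> int:
    """Count how many of the four rule-of-five limits are exceeded."""
    return sum([
        p.mol_weight > 500,
        p.logp > 5,
        p.h_donors > 5,
        p.h_acceptors > 10,
    ])


def passes_rule_of_five(p: Props, max_violations: int = 1) -> bool:
    """Drug-like filter; tolerating one violation is a common convention."""
    return rule_of_five_violations(p) <= max_violations


def passes_lead_like(p: Props, max_mw: float = 350.0, max_logp: float = 3.0) -> bool:
    """Stricter filter shifted toward smaller, less hydrophobic compounds.

    The 350 / 3.0 limits are illustrative assumptions, not values from the abstract.
    """
    return (rule_of_five_violations(p) == 0
            and p.mol_weight <= max_mw
            and p.logp <= max_logp)


if __name__ == "__main__":
    drug_sized = Props(mol_weight=480.0, logp=4.5, h_donors=2, h_acceptors=6)
    lead_sized = Props(mol_weight=300.0, logp=2.0, h_donors=1, h_acceptors=4)
    print(passes_rule_of_five(drug_sized), passes_lead_like(drug_sized))
    print(passes_rule_of_five(lead_sized), passes_lead_like(lead_sized))
```

A solubility limit could be added in the same way once a predicted solubility is available for each compound.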
CINF 14 :
Application of pharmacophore fingerprint keys to structure-based design
and data mining
Marvin Waldman, Moises Hassan, Chien-Ting
Lin, Shashidhar N. Rao, and C. M. Venkatachalam, Accelrys, 9685 Scranton Road,
San Diego, CA 92121, Fax: 858-799-5100, marvin@accelrys.com
Abstract
By combining
technology from Ludi and Catalyst, we are conducting studies on the use of
active site based pharmacophores for mining databases of compound collections
for the purpose of lead identification. In contrast to more conventional
approaches using 3D pharmacophore searching techniques, we explore the use of
similarity comparisons of 3D fingerprint maps of the active site and candidate
ligands as a means of prioritizing ligands for real or virtual high throughput
screening. Various alternative approaches will be examined including the effects
of using binary vs. occurrence counts representation for pharmacophore keys, the
use of different similarity metrics, and the use of different pharmacophoric
feature types including donor and acceptor projected points. Data mining studies
conducted on several protein systems will be presented and analyzed in terms of
the effectiveness of recovering known seeded actives from a larger ligand pool
using the various approaches outlined above.
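The difference between binary and occurrence-count key representations can be illustrated with two standard forms of the Tanimoto coefficient. This is a sketch under assumptions: the key names are invented, and the abstract does not say which similarity metrics were actually compared.

```python
def tanimoto_binary(a, b):
    """Tanimoto on key presence/absence: |A and B| / |A or B|."""
    set_a = {k for k, v in a.items() if v > 0}
    set_b = {k for k, v in b.items() if v > 0}
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 1.0


def tanimoto_counts(a, b):
    """Count-based (min/max) Tanimoto: sum of minima over sum of maxima."""
    keys = set(a) | set(b)
    total_min = sum(min(a.get(k, 0), b.get(k, 0)) for k in keys)
    total_max = sum(max(a.get(k, 0), b.get(k, 0)) for k in keys)
    return total_min / total_max if total_max else 1.0


if __name__ == "__main__":
    # Hypothetical pharmacophore keys: feature pair plus distance bin -> occurrences.
    site_map = {"donor-acceptor@3": 2, "donor-hydrophobe@5": 1, "acceptor-hydrophobe@4": 3}
    ligand = {"donor-acceptor@3": 1, "acceptor-hydrophobe@4": 3, "hydrophobe-hydrophobe@6": 1}
    print(tanimoto_binary(site_map, ligand))
    print(tanimoto_counts(site_map, ligand))
```

Ranking a ligand pool by either score against the active-site map, and then checking where the seeded actives fall in the ranking, gives the kind of recovery comparison the abstract describes.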
CINF 15 : Quasi2:
Virtual site model derivation and application to lead
identification
David G. Lloyd, Nicholas C. Perry,
Nikolay P. Todorov, Iwan J. P. de Esch, and Ian L. Alberts, De Novo
Pharmaceuticals, Compass House, Vision Park, Histon, Cambridge CB4 9ZR, United
Kingdom, Fax: +44-(0)1223-238088, david.lloyd@denovopharma.com
Abstract
Traditional
pharmacophore models define the minimum requirements for activity, but not
necessarily the optimum conditions. Quasi2 produces virtual site models by
optimising the molecular similarity within a set of ligands with respect to
those features known to be important in binding to biomolecular targets as a
function of ligand conformation, ionisation state and tautomeric state. The use
of virtual site models in database searching bridges the gap between
pharmacophore screening and high-throughput docking for targets on which
structural information is limited or unavailable. Quasi2 virtual site models
have been validated experimentally, through the design of active compounds
‘tailored’ to the virtual site features and computationally, through accurate
binding mode predictions for known actives and enhanced hit-rates in
high-throughput database screening.
CINF 16 :
Identification of Potent and Novel α4β1 Antagonists using In Silico
Screening
Juswinder Singh1, Steve
Adams1, Wen-Cherng Lee1, and Herman van
Vlijmen2. (1) Structural Informatics, Biogen Inc, 12 Cambridge,
Cambridge, MA 02142, Fax: 617-679-2616, juswinder_singh@biogen.com, (2) Biogen,
Inc
Abstract
α4β1 (VLA-4) plays an important role in the
migration of white blood cells to sites of inflammation, and has been implicated
in the pathology of a variety of diseases. We describe a series of potent
inhibitors of α4β1 that were discovered using
computational-based screening for replacements of the peptide region of an
existing tetrapeptide-based α4β1 inhibitor (1;
4-[N'-(2-methylphenyl)ureido]phenylacetyl-Leu-Asp-Val) derived from fibronectin.
The search query was constructed using a model of 1 that was based upon the
X-ray conformation of the related integrin-binding region of VCAM-1. The 3D
search query consisted of the N-terminal cap and the carboxyl side chain of 1
since based upon existing structure-activity data on this series, these were
known to be critical for high-affinity binding to α4β1. The computational screen identified 12
reagents from a database of 8624 molecules as satisfying the model and our
synthetic filters. All of the synthesized compounds tested inhibit α4β1
association with VCAM-1, with the most potent compound having an IC50 of 1 nM,
comparable to the starting compound. Using CATALYST, a 3-D QSAR was generated
that rationalizes the variation in activities of these α4β1
antagonists. The most potent compound was evaluated in a sheep model of asthma,
and a 30 mg nebulized dose was able to inhibit early and late airway responses in
allergic sheep following antigen challenge, and prevented the development of
nonspecific airway hyper-responsiveness to carbachol. Our results demonstrate
that it is possible to rapidly identify non-peptidic replacements of integrin
peptide antagonists. This approach should be useful in identification of
non-peptidic α4β1 inhibitors with improved pharmacokinetic properties relative
to their peptidic counterparts.
CINF 17 :
Unified virtual ADME/Tox using a hierarchy of machine learning
models
Guido Lanza, and William Mydlowec, Pharmix Corp, 200
Twin Dolphin Drive, Suite F, Redwood Shores, CA 94065, guido@pharmix.com
Abstract
We present a
unified virtual ADME/Tox system based on a hierarchy of machine learning models.
All compounds are initially subjected to 3-D multi-conformer analysis, and
numerous molecular descriptors are calculated, both conformationally-specific
and not. A hierarchy of models based on these descriptors is then used to
predict various physicochemical and pharmacokinetic properties. We describe a
series of models, including: solubility, octanol/water partition coefficient,
human intestinal passive absorption, intestinal transporter binding, P450 and
related enzyme interactions, blood-brain barrier permeability, plasma protein
binding, and serum transporter binding. We also describe predictive models of
oral bioavailability, volume of distribution, and clearance, as well as limited
models involving mechanism-of-action.
CINF 18 :
Application of 1D-similarity analysis to predict plausible modes of
CYP-450 metabolism
Chaya Duraiswami, Molecular Modeling,
Pharmacopeia, Inc, CN 5350, Princeton, NJ 08543, Fax: 732-422-0156,
cduraisw@pharmacop.com, Steven L. Dixon, ADMET R&D Group, Accelrys, and John
J. Baldwin, Concurrent Pharmaceuticals, Inc
Abstract
A computationally
fast, semi-quantitative, visual method to predict the plausible mode of CYP 450
metabolism based on 1D-Similarity Analysis to known inhibitors, inducers and
substrates of CYP-3A4, CYP-2C9 and CYP-2D6 will be presented. The advantages of
this method include rapid detection of the possibility of drug-drug
interactions, as well as predicting a plausible mode of metabolic degradation
for each test compound. Since this method is semi-quantitative and fast,
predictions for large combinatorial libraries as well as virtual libraries can
be made in a predictive and timely fashion, making this approach a useful
computational ADME filter. The results of this method as applied to a set of
chemokine inhibitors will be presented.
CINF 19 : Exact
chemical structure batch mode searches
Christopher A.
Lipinski, Exploratory Medicinal Sciences, Pfizer Global Research and
Development, Groton Laboratories, Eastern Point Road, mail stop 8200-36, Groton,
CT 06340, Fax: 860-715-3149, christopher_a_lipinski@groton.pfizer.com
Abstract
Chemistry structure
searching tools lag behind those of biology and genomics. Specifically, chemical
structures can easily be searched within corporate databases but it is very
difficult for chemists to perform structure searches on the external literature.
Currently a chemist cannot simply copy a structure from an ISIS/Base corporate
database and use it to search Chemical Abstracts Service (CAS) SciFinder. The
same holds for a chemical structure from a virtual library sdf file. The search
has to be performed by manually drawing in a chemical structure as a search
query. Exact chemical structure searches cannot be done in batch mode. For
example, one cannot search SciFinder for twenty-five chemically unrelated
structures at a time. It is generally unrecognized that the tools are in place
for chemists to solve this problem. Three software licenses are required:
SciFinder from CAS; Accord for Excel from Accelrys and Name from Advanced
Chemistry Development.
CINF 20 :
Integration of disparate data sources from genomics to
chemistry
Robert D. Brown, and David Benham, Accelrys Inc,
9685 Scranton Road, San Diego, CA 92121, rbrown@accelrys.com
Abstract: Abstract text not available.
CINF 21 : How to
build and deploy chemoinformatics applications
Louis J. Culot
Jr., CambridgeSoft Corporation, 100 Cambridge Park Drive, Cambridge, MA
02140, lculot@cambridgesoft.com
Abstract
Rapid development
tools and practices have been used by many industries to develop and deploy
informatics applications. However, the chemical community has been slow to adopt
these tools because of dependencies on specialized technology for handling
chemical data. With the recent availability of new technologies for chemistry,
such as Java and Active-X clients, ODBC chemical drivers, and chemical Oracle
Cartridges, these barriers are removed, and the chemical community can take
advantage of the rapid development tools and practices available to the broader
market. I review the technologies and practices, provide a framework for
managing rapid-development projects, and provide a case study and example of
building such an application.
CINF 22 : Hybrid
methodologies for pKa prediction and database selection
Mark J.
Rice1, Ryan T. Weekley1, William K.
Ridgeway2, and Paul A. Sprengeler1. (1) Structural Group,
Celera Therapeutics, 180 Kimball Way, South San Francisco, CA 94080, Fax:
650-866-6654, mark.rice@celera.com, (2) University of California, Berkeley
Abstract
We have developed a
new methodology for pKa prediction combining empirical prediction methods with
an experimental database. For any compound, the nearest experimental values from
the database are used to correct the predicted value. In order to quantify
similarity, we have developed a novel site-specific fingerprint based on
chemical graph theory. We believe this approach offers a trainable pKa predictor
especially suited to series of compounds.
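One way to read the correction step described above is as a similarity-weighted residual correction. The sketch below assumes that reading. The set-based fingerprints, the Tanimoto measure, the neighbour count `k` and the similarity floor `min_sim` are all hypothetical stand-ins, not details taken from the abstract.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity between two fingerprints stored as sets of features."""
    union = fp_a | fp_b
    return len(fp_a & fp_b) / len(union) if union else 1.0


def corrected_pka(query_fp, raw_prediction, database, k=3, min_sim=0.5):
    """Shift an empirical pKa prediction by the errors seen on similar known sites.

    database: list of (fingerprint, experimental_pKa, empirical_prediction) tuples.
    Returns the raw prediction unchanged when no neighbour is similar enough.
    """
    scored = sorted(
        ((tanimoto(query_fp, fp), exp - pred) for fp, exp, pred in database),
        reverse=True,
    )
    neighbours = [(sim, resid) for sim, resid in scored[:k] if sim >= min_sim]
    if not neighbours:
        return raw_prediction
    weight_sum = sum(sim for sim, _ in neighbours)
    correction = sum(sim * resid for sim, resid in neighbours) / weight_sum
    return raw_prediction + correction


if __name__ == "__main__":
    # Hypothetical database entries: (site fingerprint, measured pKa, empirical estimate).
    db = [
        ({"COOH", "aryl", "ortho-Cl"}, 4.8, 4.2),
        ({"NH2", "alkyl"}, 9.0, 9.5),
    ]
    print(corrected_pka({"COOH", "aryl", "ortho-Cl"}, 4.0, db))
    print(corrected_pka({"SH"}, 10.3, db))
```

Because the correction comes from database entries, adding measured values for a new compound series immediately changes the predictions for that series, which is one sense in which such a predictor is trainable.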
CINF 23 :
Rule-based two-layer model for virtual high throughput
screening
Ruediger M. Flaig1, Thomas F.
Kochmann2, and Roland Eils2. (1) Institute for
Pharmaceutical Technology and Biopharmacy, University of Heidelberg, Im
Neuenheimer Feld 366, Fax: +49 4075110-17171, flaig@sanctacaris.net, (2)
Intelligent Bioinformatics Systems, German Cancer Research Center, Im
Neuenheimer Feld 280, Fax: +49-6221-42-3620, t.kochmann@dkfz-heidelberg.de
Abstract
Science is
producing vast amounts of data from which relevant knowledge has to be
extracted, a process for which suitable tools still have to be developed. A
universal tool to this end would use a set of rules which it can extend on its
own. It requires two layers of processing: (1) subsymbolic processing
(implemented in C, C++ or Java) for transforming raw data into information, (2)
symbolic processing (implemented in Haskell, Miranda or ML) for extracting
knowledge from the “predigested” information. Subsymbolic processing consists
largely of deconstructing source data into patterns to be distributed over
multiprocessor systems, yielding an array of summary lists (abstraction).
Symbolic processing evaluates these lists by further application of the
underlying rules. To start, we need a primary set of rules, the bootstrap rules
(Kant: “a priori”), as opposed to the deduced rules (“a posteriori”) identified
by the system. The rules can be extended by employing the knowledge gathered
before, leading to a “rising spiral”.
CINF 24 : DNA
decompiler for the establishment of bootstrapping rules
Thomas
F. Kochmann1, Ruediger M. Flaig2, Christian
Busold3, and Roland Eils1. (1) Intelligent Bioinformatics
Systems, German Cancer Research Center, Im Neuenheimer Feld 280, D-69120
Heidelberg, Germany, Fax: +49-6221-42-3620, t.kochmann@dkfz-heidelberg.de, (2)
Institute for Pharmaceutical Technology and Biopharmacy, University of
Heidelberg, Im Neuenheimer Feld 366, D-69120 Heidelberg, Germany, Fax: +49
4075110-17171, flaig@sanctacaris.net, (3) Functional Genome Analysis, German
Cancer Research Center
Abstract
Generally,
DNA analysis is based on empirically gathered knowledge (“deduced rules”),
especially sequence-sequence comparisons. By contrast, the possibility of
identifying rules from single sequences without resorting to empirical knowledge
has not been fully exploited yet. We propose a tool for extracting knowledge
purely from a single DNA sequence. In the decompiler algorithm, any dependency
is estimated stochastically, by calculating its relative information content.
Such a dependency may consist of specific nucleotide arrangements and
neighborhood relationships. It can be determined for any given sequence, thus
providing a universal mechanism for bootstrapping autonomous knowledge systems.
This knowledge can be extended by deductive evolutionary algorithms,
self-organizing into higher-level systems. Here categorical dependency relations
between subparts determine Darwinian selection of the most relevant
interactions. These autonomous virtual systems can be integrated into the actual
scientific process, thus initializing a superimposed knowledge extraction
spiral.
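The abstract does not define "relative information content", so the sketch below uses mutual information between nucleotides a fixed distance apart as one plausible stand-in for scoring a neighborhood dependency within a single sequence. The function name and the choice of measure are assumptions.

```python
import math
from collections import Counter


def mutual_information(seq, lag=1):
    """Mutual information (bits) between seq[i] and seq[i + lag] within one sequence."""
    pairs = [(seq[i], seq[i + lag]) for i in range(len(seq) - lag)]
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((left[a] / n) * (right[b] / n)))
    return mi


if __name__ == "__main__":
    # A strictly alternating sequence: each base determines its neighbour.
    print(mutual_information("ACACACACAC", lag=1))
    # A homopolymer carries no dependency information.
    print(mutual_information("AAAAAAAA", lag=1))
```

Scanning a range of `lag` values and keeping the dependencies with the highest scores would give a candidate set of bootstrap rules derived from the sequence alone.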
CINF 25 :
Application of chemometric and QSAR approaches to scoring ligand-receptor
binding affinity
Alexander Tropsha1, Jun
Feng2, Alexander Golbraikh1, Curt Breneman3,
Wei Deng4, and Nagamani Sukumar3. (1) Laboratory for
Molecular Modeling, School of Pharmacy, University of North Carolina, CB # 7360,
Beard Hall, School of Pharmacy, Chapel Hill, NC 27599-7360, Fax: 919-966-0204,
alex_tropsha@unc.edu, (2) Laboratory for Molecular Modeling, School of Pharmacy,
University of North Carolina at Chapel Hill, (3) Department of Chemistry,
Rensselaer Polytechnic Institute, (4) Department of Chemistry, RPI
Abstract
59 diverse ligand
receptor complexes have been analyzed in multidimensional chemical descriptor
space. TAE/RECON descriptors of steric and electronic properties were calculated
for active site atoms and ligand atoms independently. For all pairs of ligand
receptor complexes, the Euclidean distances between active sites in TAE/RECON
descriptor space correlated linearly with the distances between complementary
ligands (R² = 0.8). Concurrently, a k-nearest-neighbor (kNN) variable
selection QSAR procedure was applied to ligands only using binding affinity as a
target property and normalized MolconnZ descriptors as independent variables.
Training and test sets of different size were generated, and multiple models
have been built. The best model afforded leave-one-out cross-validated
R² (q²) = 0.74 for the training set of 50 compounds and
predictive R² = 0.85 for the test set of 9 compounds. Chemometric and
QSAR approaches to the analysis of ligand-receptor interactions provide an
important addition to current methodologies that rely on direct use of 3D
molecular structures.
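The leave-one-out cross-validated q² quoted above follows the usual definition, q² = 1 - PRESS / SS_total. A minimal sketch with a plain kNN predictor is given below. It omits the variable-selection step of the actual procedure, and the toy descriptor data are invented for illustration.

```python
import math


def knn_predict(x, training, k):
    """Average the activities of the k training points nearest to x."""
    nearest = sorted(training, key=lambda item: math.dist(x, item[0]))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def loo_q2(data, k=1):
    """Leave-one-out q^2 = 1 - PRESS / SS_total for a kNN model.

    data: list of (descriptor_tuple, activity) pairs with non-constant activities.
    """
    activities = [y for _, y in data]
    mean_y = sum(activities) / len(activities)
    press = 0.0
    for i, (x, y) in enumerate(data):
        rest = data[:i] + data[i + 1:]  # hold out compound i
        press += (y - knn_predict(x, rest, k)) ** 2
    ss_total = sum((y - mean_y) ** 2 for y in activities)
    return 1.0 - press / ss_total


if __name__ == "__main__":
    # Toy one-descriptor data set with a linear trend.
    toy = [((0.0,), 1.0), ((1.0,), 2.0), ((2.0,), 3.0), ((3.0,), 4.0), ((4.0,), 5.0)]
    print(loo_q2(toy, k=2))
```

The predictive R² for an external test set uses the same formula, with predictions made from a model built on the training set only.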
CINF 26 :
Evaluation of ligand-receptor binding affinity with a novel statistical
scoring function derived from Delaunay tessellation of protein-ligand
interface
Alexander Tropsha, Laboratory for Molecular
Modeling, School of Pharmacy, University of North Carolina, CB # 7360, Beard
Hall, School of Pharmacy, Chapel Hill, NC 27599-7360, Fax: 919-966-0204,
alex_tropsha@unc.edu, and Jun Feng, Laboratory for Molecular Modeling, School of
Pharmacy, University of North Carolina at Chapel Hill
Abstract
A novel statistical
contact scoring function for calculating ligand receptor binding affinity has
been derived by means of Delaunay tessellation. Given the full atom
representation of the protein-ligand interface, Delaunay tessellation generates a set
of non-overlapping, space-filling tetrahedra or simplices, which rigorously
define nearest neighbor atoms in sets of four vertices. For every quadruplet
composition of ligand and receptor atom types found at the protein-ligand
interface, a log likelihood factor is obtained from the statistical geometry
analysis of 317 complexes. For 67 diverse protein-ligand complexes, the linear
regression correlation between four-body scoring function and experimental
binding affinity is characterized by R of 0.67. The combination of four-body
contact scoring and two-body distance dependent potential of mean force affords
R of 0.84. This novel scoring function can be used for rapid evaluation of
binding affinity of ligand orientations obtained with various docking
algorithms.
CINF 27 :
Massive Virtual Library (MVL) Screening at Biogen: An Integrated Approach
From Medicinal Chemistry Design to Decision
Donovan N.
Chin, Claudio Chuaqui, Herman van Vlijmen, Xin Zhang, Russell Petter, and
Juswinder Singh, Structural Informatics, Biogen, 14 Cambridge Center, Cambridge,
MA 02142, donovan_chin@biogen.com
Abstract
This talk will
describe our integrated approach to virtual chemistry design, screening, and
analysis of very large small-molecule libraries. We are developing an enterprise
wide system that puts virtual-chemistry design capabilities on the desktops of
medicinal chemists; links these designs with high throughput parallel computing
methods for docking, shape-based screening, and statistical modeling; and
finally presents the promising “hits” on the web through a series of custom
pattern recognition methods and binding mode visualizations. By integrating the
medicinal chemist into the virtual screening process, we are combining their
ability to design new drug like compounds with molecular modeling and high
performance computing. While throughput can be increased with more compute
resources, we have also designed a system to handle the massive amount of
information from the virtual screens and arrive at decisions quickly, which is
essential for impacting projects with tight timelines. As the system evolves, we
are developing “smart” library design rules that further enhance the value of
the MVL at Biogen. The MVL is a key component that integrates and maximizes
information and technologies from medicinal chemistry, structural biology,
screening, and pharmacology. We will discuss the successes and failures, and the
lessons learned in developing the MVL system in a pharmaceutical environment.
CINF 28 : Fuzzy
logic based focused libraries (FL/FL) for HTS screening: application to
anti-carcinogenic compounds
Jacques R.
Chretien1, Marco Pintore1, Nadège Piclin1,
and Frederic Ros2. (1) BioChemics Consulting, Centre d'Innovation,
16, rue Leonard de Vinci, Orleans cedex 2 45074, France, Fax: + 33 2 38 41 72
21, jacques.chretien@univ-orleans.fr, (2) Chemometrics & BioInformatics,
University of Orleans
Abstract
A global strategy
of Database Mining was applied for classifying a data set of 1294
anti-carcinogenic compounds, divided into 8 classes according to their mechanism of
action. After computing a set of 165 molecular descriptors, the most relevant
parameters were selected with the help of a procedure combining Genetic Algorithm
concepts and a Stepwise method. Subsequently, an Adaptive Fuzzy Partition
algorithm was implemented on the training set, distributed in the hyperspace of
the most relevant descriptors, to build robust structure-activity
relationships. The best model was able to correctly predict the
anti-carcinogenic activity of the test set molecules, with a very satisfactory
ratio of about 85%. Finally, this model was employed to screen three types of
different data bases: (i) commercially available compounds of synthetic origin,
(ii) natural substances derived from the Dictionary of Plant Toxins and (iii)
natural substances potentially active as anti-carcinogenic agents. Statistics of
these virtual HTS will be given.
CINF 29 :
Moore’s Law and the future of virtual screening
William
Mydlowec, Pharmix Corp, 200 Twin Dolphin Drive, Suite F, Redwood Shores, CA
91898, Fax: (650) 637-0199, bill@pharmix.com
Abstract
This talk discusses
virtual screening in the context of Moore’s Law, which projects that the number
of transistors on an integrated circuit will double approximately every 18
months. We first discuss the implications of exponentially-increasing computing
power on current-generation virtual screening technologies. For example,
computers are more than 1000x faster than they were in 1987, yet algorithms of
that era continue to dominate in simulation, optimization, and modeling in
computational chemistry. We propose future directions and new algorithms based
on recent advances in computer science and electrical engineering. We then
project the impact of Moore’s Law on virtual screening several decades into the
future, using metrics such as: cost/time/number of screens,
volume/complexity/duration/accuracy of atomistic simulations, etc. We also
consider relevant engineering issues, including development of multimillion-line
software codebases, construction of >10,000 CPU supercomputers and
multi-petabyte databases, and other large-scale issues.
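The "more than 1000x" figure can be checked with a short calculation. This is a sketch under stated assumptions: it takes the 18-month doubling period quoted in the abstract, treats transistor count as a proxy for speed (as the abstract does), and assumes a 15-year span such as 1987 to 2002. The function name is hypothetical.

```python
def moore_speedup(years, doubling_months=18.0):
    """Growth factor after `years` if capacity doubles every `doubling_months`."""
    doublings = years * 12.0 / doubling_months
    return 2.0 ** doublings

# 15 years at one doubling per 18 months is 10 doublings.
print(moore_speedup(15))  # 1024.0
```

The same function can be used to project forward: each further 15 years multiplies the factor by roughly another thousand, if the doubling period holds.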
CINF 30 : 100
years Houben–Weyl Methods of Organic Chemistry: Entering the New
Millennium
Guido F. Herrmann, Rolf Hoppe, and Kristina
Kurz, Thieme Publishers, Rüdigerstrasse 14, Stuttgart 70469, Germany, Fax: +49
711 9831 777, guido.herrmann@thieme.de
Abstract
The availability of
scientific information in electronic format has significantly changed the way we
select relevant information sources. Time matters! Information that is not
accessible at the researcher’s desk-top will be overlooked simply because the
library is a walk away and other resources compete for a researcher's attention.
But a highly competitive environment in industry and academia makes knowledge
and efficient access to it an important performance driver. Houben-Weyl has been the
standard reference work in synthetic chemistry since 1909 and comprises four
editions, 140 volumes and roughly 160,000 pages.
Thieme (www.thieme-chemistry.com) chose to accept the challenge to convert 100 years of methodology information in the field of organic chemistry into a convenient and user-friendly online system. The complete series is now available in electronic format, featuring an interactive table of contents, key word search, using a controlled vocabulary, as well as a graphical interface.
CINF 31 :
Building digital archives for scientific information
Leah
R. Solla, Physical Sciences Library, Cornell University, 293 Clark Library,
Cornell University, Ithaca, NY 14853-2501, Fax: 607-255-5288, lrm1@cornell.edu
Abstract
Researchers,
librarians, and publishers have valid concerns about the long-term preservation
of digital information. There are many issues to be addressed in the formation
of a trusted digital archive. Some parallel the more familiar preservation of
print material, such as duplication and sustainability. LOCKSS (Lots Of Copies
Keeps Stuff Safe) is a new acronym for an old practice in the print world of
independently maintained and widely distributed collections. Digital
preservation requires duplication; managed and distributed duplication is even
better. Effective digital preservation models need to be self-sustaining, and
adhere to format standards. The digital world does not respect traditional
borders (political, corporate, publisher, content, etc.). The roles of
stakeholders are changing in the digital realm. Publishers have often been the
sole controllers of information, but increasingly there are authors, government
agencies and other players in control. Until recently the library has been the
archive and access provider, but publishers and other players are now active
participants in digital preservation and access. The academic research library
community is investigating a digital preservation role akin to their traditional
role in print, subject based archiving. Archiving across subject areas in the
academic environment complements the archiving approach of publishers in the
competitive market environment. This paper will review a variety of digital
preservation projects in the sciences.
CINF 32 :
Digital Archiving: Experiences of a major commercial publishing
house
C. Amanda Spiteri, ScienceDirect, Elsevier Science,
Molenwerf 1 1014 AG, Amsterdam, Netherlands, c.spiteri@elsevier.com
Abstract
Assuring the
preservation of digital information is one of the highest priorities for
libraries and publishers alike, particularly as more and more libraries go
"electronic only" and the accessibility of traditional paper copies is reduced.
For part of the life cycle of scientific information, commercial publishing
practices support the most cost efficient means of maintaining electronic access
to current information. For other parts of the cycle, digital preservation and
access responsibilities must be supported by a designated agent. Elsevier
Science has been a leader in the digital archiving of electronic journals
through development of services like ScienceDirect. We continue to develop our
experience in archiving issues such as policy, partnership relations, technology
and creation of the digital archive itself. This presentation will cover some
leading initiatives in these areas and give examples of how Elsevier Science
currently addresses these challenges.
CINF 33 :
DSpace: MIT's Digital Repository
Margret
Branschofsky, MIT Libraries, Massachusetts Institute of Technology, Bldg.
10-500, MIT, Cambridge, MA 02139, Fax: 617-452-3000, margretb@mit.edu
Abstract
DSpace, an MIT
Libraries project sponsored by Hewlett-Packard Labs, is a digital repository
that captures, stores and distributes the various digital products of MIT
faculty and researchers. The repository will collect preprints, articles,
working papers, technical reports, datasets, images, video and audio content.
This web-based system provides 1) a flexible submission process for MIT
contributors that captures both metadata and content files, 2) storage and
preservation services for a variety of file formats, and 3) powerful search and
retrieval capabilities for end users. The presentation will review DSpace design
features, organizational issues surrounding development of the system in an
institutional setting, and policy issues arising from implementation of the
system. A review of the beta-testing experience with early adopters will also be
provided.
CINF 34 :
Implementing the Physical Review Online Archive
(PROLA)
Mark D. Doyle, Journal Information Systems,
American Physical Society, 1 Research Road, P. O. Box 9000, Ridge, NY 11961,
Fax: 631-591-4147, doyle@aps.org
Abstract
The American
Physical Society has recently completed digitizing all of our journal content
back to its start in 1893. This content is available online as the Physical
Review Online Archive (PROLA) at http://prola.aps.org/. The archive contains 1.6
million scanned pages for almost 300,000 articles. All bibliographic information
and all reference sections have been captured in XML allowing PROLA to offer all
of the features expected in a modern electronic journal. We describe the
history, building, and implementation of the archive as well as some of the
business concerns in making it available.
CINF 35 :
Journey from books to analytical informatics
Marie
Scandone, Informatics Division, Bio-Rad Laboratories, Inc, 3316 Spring
Garden Street, Philadelphia, PA 19104, Fax: 215-662-0585,
marie_scandone@bio-rad.com, and Deborah Kernan, Informatics Division, Bio-Rad
Laboratories
Abstract
Taking spectral
information from a number of different analytical instruments, presenting it in
a digital format and archiving it can be an enormous undertaking. Sadtler
Research Laboratories has been producing quality spectral information for the
analytical laboratory since 1947. The history is fantastic and the process is
unusual. The journey that Sadtler Research Laboratories has taken to become
Bio-Rad Laboratories, Informatics Division is a part of the history of chemical
information. Along the way, Bio-Rad changed their method of spectral data
delivery but always focused on the quality of the analytical information. This
paper examines that history and the transition from print to digital media.
CINF 36 :
LOCKSS: Lots of copies keeps stuff safe
Vicky Reich, and
Grace Baysinger, HighWire Press, Stanford University Libraries, 1454 Page
Mill Rd, Stanford, CA 94305-8400, Fax: 650-725-4902, vreich@stanford.edu,
graceb@stanford.edu
Abstract
LOCKSS has the
potential to become a sustainable, affordable, preservation tool and archiving
system for web delivered information. LOCKSS software systematically caches
content in a self-correcting P2P network. The current beta test has demonstrated
that the underlying LOCKSS technology works, and in a production environment is
likely to allow libraries to maintain high integrity persistent caches of
electronic content from journal subscriptions. The beta test includes 60 caches
at 50 libraries and two scholarly journals. The system has been in continuous
operation for over ten months. The fault-tolerance of the system has been amply
demonstrated: two beta caches suffered catastrophic disk failures. Both were
able to restart with new, empty disks and recover their content automatically.
41 publishers have expressed strong support for the LOCKSS project. The system
shows the potential to preserve digital materials with current publishing
systems; the cost of entry is low and the payoffs promise to be high.
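The idea of a self-correcting set of caches can be sketched as follows. This is a deliberately simplified illustration, not the LOCKSS protocol itself: it assumes that each cache holds one copy of a document, that copies are compared by SHA-256 digest, and that a damaged or empty cache is repaired from the version held by a strict majority. All names are hypothetical.

```python
import hashlib
from collections import Counter

def digest(content: bytes) -> str:
    """Fingerprint a cached copy so that copies can be compared cheaply."""
    return hashlib.sha256(content).hexdigest()

def repair_by_majority(caches):
    """Repair caches from the majority copy.

    `caches` maps a cache name to its content (bytes), or None for an empty disk.
    Returns (repaired_caches, names_of_repaired_caches).
    """
    votes = Counter(digest(c) for c in caches.values() if c is not None)
    if not votes:
        raise ValueError("no cache holds any content")
    winning_digest, count = votes.most_common(1)[0]
    if count * 2 <= len(caches):
        raise ValueError("no strict majority; cannot decide which copy is good")
    good = next(c for c in caches.values()
                if c is not None and digest(c) == winning_digest)
    repaired_caches, repaired_names = {}, []
    for name, content in caches.items():
        if content is None or digest(content) != winning_digest:
            repaired_caches[name] = good      # overwrite the empty or damaged copy
            repaired_names.append(name)
        else:
            repaired_caches[name] = content   # copy already agrees with the majority
    return repaired_caches, repaired_names

# Five hypothetical caches: one empty disk, one corrupted copy.
caches = {"a": b"article", "b": b"article", "c": None, "d": b"artXcle", "e": b"article"}
fixed, repaired = repair_by_majority(caches)
print(repaired)
```

Requiring a strict majority is one possible design choice; it means the sketch refuses to act when too many caches disagree rather than guessing.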
CINF 37 :
Combining heterogeneous physical property data sets
Peter
J. Linstrom, Physical and Chemical Properties Division, NIST, Building 221,
Room A111, 100 Bureau Drive, Stop 8380, Gaithersburg, MD 20899-0830, Fax:
301-896-4020
Abstract
The lack of
standards for electronic storage of physical property data often makes it
difficult to merge data from different data sets. Data sets often employ
different conventions for identifying chemical systems, data accuracy, and data
quality. It is a challenge for the archivist to ensure that the combined data
set represents all data in an appropriate manner.
This talk will discuss lessons learned from the development of the NIST Chemistry WebBook (http://webbook.nist.gov/). The data set for this archive consists of the combination of work from several independent contributors. Efforts were made to produce a set that appears homogeneous to users despite its origins. This required the design of a database that was flexible enough to support the various conventions used by contributors. Examples of problems encountered and their solutions will be discussed.
CINF 38 :
Evaluation, Comparison and Successful Application of Virtual Screening
Tools
Romano T. Kroemer1, Joe
McDonald2, Douglas Rohrer3, Anna Vulpetti1,
Jean-Yves Trosset1, Shashidhar Rao4, John
Irwin5, Brian Shoichet6, Colin McMartin7, and
Pieter Stouten1. (1) Molecular Modelling & Design, Pharmacia,
Discovery Research Oncology, Viale Pasteur, 10, Nerviano 20014, Italy, Fax:
++39-02 4838 3965, romano.kroemer@pharmacia.com, (2) Discovery Research,
Pharmacia, (3) Computer-Aided Drug Discovery, Pharmacia, (4) Accelrys Inc, (5)
Department of Molecular Pharmacology and Biological Chemistry, Northwestern
University, (6) Department of Pharmacology and Biological Chemistry,
Northwestern University, (7) Thistlesoft
Abstract
The latest
Pharmacia efforts in validating and comparing virtual screening tools are
presented. Two studies were carried out in order to assess the performance of
docking programs with respect to reproducing correct binding modes. The first of
these studies contained 20 publicly available crystal structures of
protein-inhibitor complexes belonging to different protein classes. The second
study focused on 20 complexes with the same protein (CDK2/Cyclin A). The docking
programs evaluated and compared comprise the latest versions of DOCK (Brian
Shoichet’s NWU incarnation), Colin McMartin's QXP, Tripos’ FlexX, CCDC’s Gold,
Accelrys’ LigandFit, MolSoft's ICM and the in-house Mosaic2 program. We also
present a case study where docking was used in order to identify hits for a
project in the absence of HTS. After pre-selection, 3,000 compounds were docked
to the target. The top-scoring compounds were inspected visually and 22
molecules were selected. The best binding compound, as verified by NMR screening
and isothermal titration calorimetry, had a Kd of 450 nM.
CINF 39 :
Assessing the quality of virtual screening results for combinatorial
libraries
Dennis G. Sprous, Robert D. Clark, Joseph M.
Leonard, and Trevor W. Heritage, Research, Tripos, Inc, 1699 South Hanley Road,
St. Louis, MO 63144, Fax: 314-647-9241, dsprous@tripos.com
Abstract
Recent developments
in virtual screening tools now make it possible to do enough experiments on the
same library to allow critical evaluation of the quality of the results.
CombiFlexX incorporates both OptiDock and FlexX(c) methods, and takes advantage
of structural redundancies in combinatorial libraries to dramatically speed up
docking. Numerous computational experiments can be done in a reasonable period
of time, allowing an investigation of the thoroughness of conformational and
positional sampling under different protocols and parameters. Metrics and
strategies for assessing the quality of the virtual screening results will be
presented.
CINF 40 :
Virtual high throughput screening using LigandFit as an accurate and very
fast tool for docking, scoring, and ranking
Marguerita
Lim-Wilby1, Jeff Jiang2, Marvin Waldman2,
and C. M. Venkatachalam2. (1) Lead Identification and Optimization,
Accelrys Inc, 9685 Scranton Rd, San Diego, CA 92121, rwilby@accelrys.com, (2)
Rational and Combinatorial Drug Design, Accelrys Inc
Abstract
The imperative for
virtual high throughput screening arises from the availability of multiple
targets, millions of compounds in screening libraries, and limited resources for
even the best-endowed pharmaceutical enterprises. The docking application
LigandFit has been developed to address this need. A suite of algorithms is
provided that (1) aids the user in the detection and definition of binding
sites, (2) provides various docking modes with user-defined options, and (3)
scores dock poses using proprietary and published scoring functions. We will
present considerations that affect accuracy in docking and in scoring, as well
as the effects of disproportionately large binding sites, extremely flexible
ligands, metal ions, and the presence of flexible protein side chains. Recent
advances have allowed reasonably large (~50k) ligand libraries to be screened in
a matter of hours, such that the bottleneck in virtual screening is no longer
docking, but the preparation and analysis of the datasets.
CINF 41 :
EasyDock: a new docking program for high-throughput screening and
binding-mode search
Nikolay P. Todorov1, Ricardo L.
Mancera1, Per Kallblad1, and Philippe
Monthoux2. (1) De Novo Pharmaceuticals, Compass House, Vision Park,
Chivers Way, Histon, Cambridge CB4 9ZR, United Kingdom, Fax: 1223-238088,
nikolay.todorov@denovopharma.com, ricardo.mancera@denovopharma.com, (2)
Department of Physics, University of Cambridge
Abstract
We have implemented
the stochastic tunneling global optimization method within a ligand docking
application software, easyDock. By using a novel multiple ligand copy approach
and adding a new hydration penalty function, we have optimized various scoring
functions and have achieved excellent results in the prediction of
protein-ligand binding modes. We have run easyDock on the GOLD data set of
protein-ligand complexes and nearly always found the correct ligand binding mode
as observed in the corresponding crystal structures. Furthermore, we have
achieved a 76% success rate when searching for the correct binding mode using an
energy score criterion. These results show that easyDock can be used effectively
both for the high-throughput screening of large datasets of compounds and for
searching for the correct binding mode of a given ligand.
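For readers unfamiliar with stochastic tunneling, the core idea can be sketched on a one-dimensional test function. The sketch assumes the commonly published form of the transformation, 1 - exp(-gamma * (E - E_best)), combined with a Metropolis walk on the transformed surface. It is not the easyDock implementation: the multiple ligand copy approach and the hydration penalty are not represented, and the function names, parameters and test function are hypothetical.

```python
import math
import random

def stun_transform(energy, best_energy, gamma=1.0):
    """Compress energies into the range 0 to 1, with 0 at the best value seen so far.

    High barriers are flattened, so the walker can 'tunnel' past them.
    """
    return 1.0 - math.exp(-gamma * (energy - best_energy))

def stun_minimize(func, x0, steps=5000, step_size=0.5, beta=10.0, gamma=1.0, seed=0):
    """Metropolis search on the transformed surface; returns (best_x, best_energy)."""
    rng = random.Random(seed)
    x, e = x0, func(x0)
    best_x, best_e = x, e
    for _ in range(steps):
        x_new = x + rng.uniform(-step_size, step_size)
        e_new = func(x_new)
        if e_new < best_e:
            best_x, best_e = x_new, e_new   # the reference energy follows the best value
        delta = stun_transform(e_new, best_e, gamma) - stun_transform(e, best_e, gamma)
        if delta <= 0 or rng.random() < math.exp(-beta * delta):
            x, e = x_new, e_new
    return best_x, best_e

def rugged(x):
    """Toy objective with several local minima (hypothetical, for illustration)."""
    return x * x + 2.0 * math.sin(5.0 * x)

best_x, best_e = stun_minimize(rugged, x0=3.0)
print(best_x, best_e)
```

In a docking setting the scalar `x` would be replaced by ligand position, orientation and torsion variables, and `func` by a scoring function.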
CINF 42 :
Glide: a new paradigm for rapid, accurate docking and scoring in database
screening
Thomas A. Halgren1, Robert B.
Murphy2, Jay Banks1, Daniel Mainz1, Jasna
Klicic2, Jason K. Perry2, and Richard A.
Friesner3. (1) Schrödinger, 120 West Forty-Fifth Street, New York, NY
10036, Fax: 646-366-9550, halgren@schrodinger.com, (2) Schrodinger, Inc, (3)
Department of Chemistry, Columbia University
Abstract
Glide uses a novel
algorithm for rapid conformation generation that allows an efficient systematic
search of conformational space to be performed during docking. A second key to
Glide's efficiency is a series of "filters" that rapidly reduce the possible
ligand positions and orientations in the search space to a manageable number for
detailed examination. In addition, Glide uses a novel GlideScore function for
scoring that ensures chemical sensibility by penalizing docked poses that
include non-physical juxtapositions of polar and nonpolar groups.
In tests of docking accuracy, Glide achieves root-mean-square deviations between docked and co-crystallized ligand geometries that are half those reported for Gold and FlexX for test sets of 100-200 co-crystallized complexes defined by the developers of these methods. In addition, Glide achieves enrichment factors ranging from 12 to 91 in database screens for 9 diverse receptor systems. Such a high level of reliability is not typical of current-generation docking programs and scoring functions.
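The two metrics mentioned above can be written down compactly. This is a sketch under assumptions: the RMSD is computed over identically ordered atoms without any superposition, and the enrichment factor uses one common definition (hit rate in the top-ranked fraction divided by the hit rate in the whole database), which may differ from the definition used in the reported tests. Function names and data are hypothetical.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally ordered lists of (x, y, z) atoms."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))

def enrichment_factor(ranked_is_active, top_fraction):
    """Hit rate among the top-ranked fraction divided by the overall hit rate.

    `ranked_is_active` lists True/False per compound, best score first.
    """
    n_total = len(ranked_is_active)
    hits_all = sum(ranked_is_active)
    if n_total == 0 or hits_all == 0:
        raise ValueError("need at least one compound and one active")
    n_top = max(1, int(round(n_total * top_fraction)))
    hits_top = sum(ranked_is_active[:n_top])
    return (hits_top / n_top) / (hits_all / n_total)

# Docked pose shifted by 1 unit along z relative to the crystal pose.
print(rmsd([(0, 0, 0), (1, 0, 0)], [(0, 0, 1), (1, 0, 1)]))  # 1.0

# Ten ranked compounds, three actives, two of them in the top 20%.
ranking = [True, True, False, False, False, False, False, False, True, False]
print(enrichment_factor(ranking, 0.2))
```

An enrichment factor of 1 corresponds to random selection under this definition, so larger values indicate that actives are concentrated near the top of the ranking.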
CINF 43 :
RACHEL: A new tool for structure-based lead
optimization
Chris M.W. Ho, Drug Design Methodologies, LLC,
700 S. Euclid Ave., St. Louis, MO 63110
Abstract
Lead optimization
is still something of an art. Structural modifications that logically should
enhance affinity can decrease it. The time lines can be long, the process
uncertain and frustrating, and the progress hit-or-miss. RACHEL is software
designed to streamline lead optimization by automated combinatorial optimization
of substituents on a lead scaffold. Starting from a ligand/receptor structure,
substitutions are systematically done at user-defined points on the ligand core.
Custom substituent databases based on in-house sources can be used, allowing the
incorporation of enterprise and project experience. The impact of these
substitutions on affinity is assessed using RACHEL's general scoring function or
a custom scoring function generated by PLS analysis of user-supplied
ligand/receptor affinity data. This presentation will discuss RACHEL's unique
capabilities along with specific applications that demonstrate its value in lead
optimization.
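The systematic substitution step can be pictured as a combinatorial enumeration over the user-defined attachment points. The sketch below is an assumption-laden illustration, not RACHEL's algorithm: the site names, substituent lists and the additive toy scoring function are all hypothetical, and a real scoring function would evaluate each candidate in the receptor site.

```python
from itertools import product

def enumerate_and_rank(sites, score):
    """Enumerate every combination of substituents and rank by score, best first.

    `sites` maps an attachment-point name to its list of candidate substituents.
    `score` is any callable that takes an assignment dict and returns a number.
    """
    site_names = sorted(sites)
    candidates = []
    for combo in product(*(sites[name] for name in site_names)):
        assignment = dict(zip(site_names, combo))
        candidates.append((score(assignment), assignment))
    candidates.sort(key=lambda pair: pair[0], reverse=True)
    return candidates

# Hypothetical per-substituent contributions standing in for a scoring function.
TOY_CONTRIBUTION = {"H": 0.0, "CH3": 0.5, "OH": 1.0, "Cl": 0.7}

def toy_score(assignment):
    return sum(TOY_CONTRIBUTION[group] for group in assignment.values())

sites = {"R1": ["H", "CH3", "OH"], "R2": ["H", "Cl"]}
ranked = enumerate_and_rank(sites, toy_score)
print(len(ranked), ranked[0][1])
```

The number of candidates is the product of the list lengths, which is why full enumeration becomes expensive as attachment points and substituent databases grow.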
CINF 44 :
HostDesigner: a program for the de novo structure-based design of
molecular receptors with binding sites that complement metal ion
guests
Timothy K. Firman, and Benjamin P. Hay, W. R. Wiley
Environmental Molecular Sciences Laboratory, Pacific Northwest National
Laboratory, PO BOX 999, Richland, WA 99352, Fax: 509-375-6631,
Timothy.Firman@pnl.gov
Abstract
To bring the
powerful concepts embodied in de novo structure-based drug design to the field
of coordination chemistry, we have devised computer algorithms for building
millions of potential host structures from molecular fragments and rapid methods
for prioritizing the resulting candidates with respect to their complementarity
for a targeted metal ion guest. The result is HostDesigner, the first
structure-based design software that is specifically created for the discovery
of novel metal ion receptors. In this talk we describe the molecular structure
building and scoring algorithms, and provide several examples to demonstrate
their usage.
CINF 45 :
Collaborative eR&D - what is it and how do electronic notebooks fit
into it?
Rich Lysakowski Jr., The Collaborative Electronic
Notebook Systems Association, 800 West Cummings Park, Suite 5400, Woburn, MA
01801, Fax: 781-935-3113, rich@censa.org
Abstract
"Collaborative
eR&D" is a new computing paradigm for scientific research, engineering,
product development, and testing. Collaborative eR&D has two major aspects
to it: 1) collaborative software environments, and 2) cultural support for
collaboration with these tools. The software infrastructure or environmental
aspect of Collaborative eR&D is that software applications have
standardized, intelligently self-integrating interfaces. Software components in
this paradigm may require some configuration, but no extra programming, to
integrate into new business processes. Integration becomes a dynamic, end-user
driven process, rather than one that requires custom coding. The cultural aspect
of Collaborative eR&D beckons R&D teams and enterprises to use
collaborative tools (collaborative electronic notebooks, meeting tools, and
others) to be more effective and efficient. This talk will define and explain
CENSA’s new work beyond “Collaborative Electronic Notebooks” to catalyze the
markets to deliver “Collaborative eR&D” environments and their huge impact
on the practice and productivity of Research and Development.
CINF 46 :
Components of Research Laboratory Notebooks Policy
Sylvia
C. Diaz, Knowledge Integration Resources, Bristol-Myers Squibb, P.O. Box
4000, Princeton, NJ 08543-4000, Fax: 609-252-6743, sylvia.diaz@bms.com
Abstract
Records Management
has long been a core function in a pharmaceutical company's management of records. The
management of research laboratory notebooks and their ancillary supporting
data is essential for establishing priority of invention, upholding the validity of
a patent, and memorializing scientific practices and work. A good laboratory
notebook policy sets the boundaries for preparing, signing, witnessing,
protecting and storing the research notebooks. A thorough policy establishes the
fundamentals and standards of good records management practices for the storage
of the paper, hardcopy version of the research notebook. These same principles
translate into the electronic laboratory notebook world.
This presentation will outline the essential parts of a good research laboratory notebook management policy.
CINF 47 : An
E-Notebook success story, a roadmap for future
trips
Christopher J. Ruggles1, Jim
Rizzi2, and Jorge Manrique1. (1) CambridgeSoft Corp, 100
CambridgePark Drive, Cambridge, MA 02140, Fax: 617-588-9190,
cruggles@cambridgesoft.com, (2) Array BioPharma
Abstract
A successful
Electronic Laboratory Notebook has been deployed in a drug discovery company
that invents new small-molecule drugs through the integration of chemistry,
biology and informatics. We report a methodology in which legal,
technological, and scientific issues were addressed.
Through the use of directed discussion, needs analysis, and process abstraction, many seemingly insurmountable problems were resolved. The result is that a fully functional Electronic Notebook has been deployed throughout the enterprise, and is acting as the primary repository of scientific data for Array BioPharma Inc., dovetailing appropriately with preexisting protocols. We believe that this methodology, when properly applied, is scalable to organizations of varied sizes and complexities. We report here the results of our implementation of this methodology, and explore suggestions for modifications to optimize the methodology for future implementation.
CINF 48 :
LabBook incorporated's eLabBook knowledge management
solution
Tom Zupancic, LabBook, Inc, 2501 9th Street,
Suite 102, Berkeley, CA 94710, Fax: 614-846-2243, thomas.zupancic@labbook.com
Abstract
LabBook's eLabBook
solution is a flexible integration system designed to facilitate knowledge
management within an organization by simplifying the processes required to
access, capture, organize and manipulate information. This capability creates an
environment within the organization where knowledge is generated at an enhanced
rate and captured with a high degree of efficiency. The eLabBook environment
provides a versatile computer interface between people and information so that
it becomes much easier to create a layer of "knowledge" (understanding,
interpreted information, rationales for decisions, actions and plans) and to
superimpose this layer on an organized information collection. The accessible,
user configurable presentation and delivery of this knowledge integrates the
intellectual assets of the organization and accelerates knowledge transfer. That
is, the system by design makes organized, interpreted information widely and
effectively available and actionable.
CINF 49 :
Roundtable discussion focused on implementation successes and issues for
collaborative electronic notebooks and collaborative eR&D
environments
Rich Lysakowski Jr., Executive Director,
Collaborative Electronic Notebook Systems Association, 800 West Cummings Park,
Suite 5400, Woburn, MA 01801, Fax: 781-935-3113, rich@censa.org
Abstract
This last session
will be a Facilitated Roundtable Discussion focused on implementation successes
and open issues using electronic notebooks, collaborative applications,
standardized software application interfaces, agents, component integration
tools and frameworks to tie together the many software packages in common usage
in constantly changing R&D environments. This roundtable discussion will
raise issues, identify the problems and introduce prudent paths forward for
their elimination. It will provide a panel of experts for the audience to get
many of their questions answered.
CINF 50 : CINF
Division Business Meeting
Andrew Berks, Patent Dept, Merck
& Co, RY 60-35, 126 E. Lincoln Ave, Rahway, NJ 07065, Fax: 732-594-5832,
andrew_berks@merck.com
Abstract
This is the open
meeting for discussion of CINF business.
CINF 51 : Open
Meeting: Committees on Publications and on Chemical Abstracts
Service
Robert J. Massie, and Robert D.
Bovenschulte, Director, Chemical Abstracts Service, American Chemical
Society, 2540 Olentangy River Road, Columbus, OH 43202-1505, Fax: (614)
447-3713, rmassie@cas.org, rbovenschulte@acs.org
Abstract
This is an open
meeting for the Committee on Publications and for the Chemical Abstracts
Service.
CINF 52 :
Development of a polymer property database from traditional print
products
Maggie Johnson, Science and Engineering Libraries,
University of Kentucky, 150 C/P Bldg, Lexington, KY 40506-0055, Fax:
859-257-4074, mjohnson@uky.edu, and Darla Henderson, John Wiley & Sons, Inc
Abstract
The polymer
community has for years depended on the value and reliability of data found in
The Polymer Handbook, a print product containing data about polymers and their
properties. Moving forward with Wiley’s chemical databases, we have developed a
polymer property database from The Polymer Handbook, adding features such as the
capability to search by full text or fielded searches, search the entire
database for properties by polymer name, and search the entire database for
polymers by property ranges. Additionally, cross-reference and linking
capabilities have been added. This presentation will focus on the development
and usability of this database for the academic and corporate polymer
communities.
CINF 53 :
Teaching and learning of structural organic chemistry with
nomenclature/structure software
Bert Ramsay1,
Antony John Williams2, Andrey Erin2, and Robin
Martin2. (1) Department of Chemistry, Eastern Michigan University,
Ypsilanti, MI 48197, Fax: 734-487-1496, Bert.Ramsay@emich.edu, (2) Scientific
Development, Advanced Chemistry Development
Abstract
Many organic
chemistry students have difficulty in determining and "seeing" the configuration
about a stereogenic carbon presented in 2-D structures. A true understanding
comes when these diagrams are converted to 3-D pictures or models that can be
rotated to correspond to the diagram's perspective. Much of this confusion could
be avoided if students used Nomenclature/Structure software programs to
compare 2- and 3-D renderings and names of chemical structures. A Student Guide
to the Use of Nomenclature/Structure software has been developed for inclusion
with ACD's ChemSketch and ACD/Name software. The Guide also helps students
recognize the location and naming of functional groups.
CINF 54 :
Homogenizing analytical data from multiple vendors into a unified
workspace
Antony John Williams, Scientific Development,
Advanced Chemistry Development, 90 Adelaide Street West, Suite 600, Toronto, ON
M5H 3V9, Canada, Fax: 416-368-5596, tony@acdlabs.com
Abstract
Today a plethora of
analytical techniques are used to characterize a particular chemical compound or
material as it migrates from research and discovery through scale-up to
manufacturing. These techniques include the multiple forms of spectroscopy and
chromatography, hyphenated techniques and other analytical techniques that
produce “curves” including electrochemistry and thermal analysis. The lifecycle
of any particular compound can originate with spectra to identify the structure,
chromatograms to separate the material and other technologies to characterize
its performance. To date it has not been possible to manage all this associated
analytical data, together with associated chemical structure information, in a
single unifying interface and the need for an integrated system for processing
and management of all associated data persists. This talk will provide an
overview of how to address the diverse needs in processing and data management
for multiple forms of analytical data and make the results available across an
enterprise.
CINF 55 :
Aventis Competitor Tracking Database
Christine
Rudolph, DI & A Lead Generation Chemoinformatics, Aventis Pharma
Deutschland GmbH, Industrial Park, Building G879, D-65926 Frankfurt/Main,
Germany, Holger Heitsch, DI & A, Medicinal Chemistry, Aventis Pharma
Deutschland GmbH, and Raul Munoz-Sanz, DI & A Information Solutions, Aventis
Pharma Deutschland GmbH
Abstract
Aventis Competitor
Tracking Database
A Competitor Tracking Database has been designed to help disease program chemistry experts track the activities of Aventis' competitors more quickly. The arduous task of extracting information from online publications, manually selecting the interesting details (text and structure), and putting them into report documents has been replaced by a simple procedure for flagging relevant competitor records in a central raw data pool, which is fed by our selected news providers (currently: IDDB3, Prous).
The system has been designed to be flexible enough not only to store and annotate the information from various providers but also to store the knowledge about our own compounds such that we can inspect them in a common view with the structures of our competitors. Annotations by mechanism, target, and target families with controlled vocabularies enable us to link this data repository with other sources of information within Aventis.
Currently, this database covers the following Aventis Pharma Frankfurt disease programs: thrombosis, osteoarthritis, heart failure, vascular disease, arrhythmia, diabetes, obesity and lipid disorders. We estimate that we may be covering up to 80% of the relevant competitor information with this tool, and we expect to include more information providers in the future.
This application has been designed with standard client/server tools (ISIS/Oracle). In a second phase, the content of the database will be made available through a web front-end to enable the integration into Aventis information portals.
CINF 56 :
Knowledge management in the spectral laboratory
Marie
Scandone, Informatics Division, Bio-Rad Laboratories, Inc, 3316 Spring
Garden Street, Philadelphia, PA 19104, Fax: 215-662-0585,
marie_scandone@bio-rad.com, and Gregory M. Banik, Bio-Rad Laboratories,
Informatics Division
Abstract
In a spectral
laboratory, knowledge management is the identification, collection and active
management of analytical information. The goal is to make existing knowledge
resources available to everyone and to manage that data effectively. In
managing analytical data, we have moved from the archiving and warehousing of
spectral data to tools that help identify and evaluate information. This
approach is necessitated by the business need to effectively analyze all
available data as rapidly as possible to facilitate decision-making and to
provide required information for regulatory compliance. There has been strong
impetus, especially from the pharmaceutical industry, to share information from
diverse analytical disciplines. This need has arisen from the realization that
escalating costs for drug development dictate a new “fail early, fail often”
paradigm. Some companies have come to realize that parallel efforts in
analytical chemistry, for instance, the use of NMR and Mass Spectrometry, could
have yielded earlier, more cost effective decisions on drug candidates if these
data types could have been combined earlier into a single knowledge management
system. As the amount of spectral data increases, so does the need for
accessing, processing, and examining that data.
CINF 57 :
Molecular docking for generating peptide inhibitors for
thrombin
Cristina C. Clement1, Julian
Gingold2, and Manfred Philipp1. (1) Chemistry Department,
Lehman College and Biochemistry Ph.D. Program, City University of New York, 365
Fifth Avenue, New York City, NY 10016-4309, Fax: 212-817-1503,
cclement_us@yahoo.com, (2) New Rochelle H.S
Abstract
A promising method
of rational drug design involves the molecular modeling of peptides or small
molecules that might bind to the active site of a target protein. The goal of
this investigation is to discover peptides that reversibly inhibit thrombin. The
approach combines in silico docking using Sculpt (from MDL) with automated
chemical synthesis of candidate compounds using standard Fmoc chemistry. Initial
molecular docking experiments were used to generate candidate compounds (with
both L- and D- amino acids) that were characterized by predicted free
interaction energies that range from -20 to -50 kcal/mol. Candidate competitive
inhibitors were selected from two classes of sequences: X-Pro-Arg-dPro-Y and
X-dPhe-Pro-dArg-Y. The experimental results showed that D-Phe-Pro-D-Arg-Gly-Asp
and D-Phe-Pro-D-Arg-Gly-Asn have Ki values of 156 µM and 112 µM, respectively.
D-Phe-Pro-D-Arg-Gly has a Ki of 6 µM. A library of tetrapeptides with other L-
and D-amino acids at the P1’ position (Y=P1’) is under study.
CINF 58 :
Visualization of results in Markush structure database
searches
Andrew H. Berks, Merck & Co, 126 E. Lincoln
Ave RY60-35, Rahway, NJ 07065-0900, Fax: 732-594-5832
Abstract
Visualization of
Markush structures in Markush database search results is problematic because
results are often complex and difficult to interpret. This talk presents a
method for representing Markush structures in database search results, involving
overlaying a representation of the query structure on the search results, and
providing a Markush analysis for each database hit, so that each substituent in
the database record that corresponds to a part of the query structure is
displayed in a distinctive manner, for example by using colors, in the overlaid
query structure.
CINF 59 :
Digging Deeper: from holes in cards to whole structures - indexing
chemistry at Derwent
Peter Norton, (retired), 17 Woodstock
Road, Balby, DN4 0UF Doncaster, England
Abstract
This paper gives
the author’s personal reminiscences about the trials and tribulations involved
in the evolution of the various Derwent retrieval systems, beginning with the
Farmdoc codes, which provided simple manual and punch card retrieval of
Pharmaceutical and Veterinary patents. It then moves on to the extension of
coverage to non-patent pharmaceutical literature (RINGDOC) and the various
patent services (Agdoc, Plasdoc, Chemdoc, CPI and WPI). The paper concludes with
the author’s involvement in the start-up of the Markush DARC graphics retrieval
system.
CINF 60 :
Polymer searching: a capability in progress
Stuart M.
Kaback, Information Research & Analysis, Research Support Services,
ExxonMobil Research & Engineering Co, 1545 Route 22 East, Annandale, NJ
08801, Fax: 908-730-3230, stuart.m.kaback@exxonmobil.com
Abstract
From time to time
this speaker has had the privilege of reporting to a session of the ACS Division
of Chemical Information on the capabilities and shortcomings of systems used to
search for information about polymers. One notable instance was the 1984 Herman
Skolnik Award Symposium honoring Monty Hyams. Another was a 1991 symposium on
Polymer Information Storage and Retrieval. This presentation examines progress
that has been made, and points to areas in which further advances would be
desirable.
CINF 61 :
Polymer indexing by IFI – past, present, and future
Harry M.
Allcock, and Darlene Slaughter, IFI CLAIMS Patent Services, 102
Eastwood Road, Wilmington, NC 28403, Fax: 910-392-0240, allcock@ificlaims.com,
darlene.slaughter@aspenpubl.com
Abstract
IFI has been
indexing polymer chemistry in US patents since 1955, and since that time has
developed a powerful retrieval system for polymers. Patent searchers currently
use the IFI polymer indexing and associated search tools to locate polymers by
structure, modification, and component monomers. Future enhancements to the
system will be driven by searchers’ needs, and IFI’s intellectual and
technological solutions to those needs.
CINF 62 :
Broadening horizons, sharpening the focus: The challenges of searching
multiple datasets to obtain focused recall
Richard W
Neale1, Steve Hajkowski2, Linda Clark3, and
Gez Cross1. (1) Product Development Group Chemistry & Life
Sciences, Derwent Information UK, 14 Great Queen Street, Holborn, London, United
Kingdom, Fax: +44 207 344 2911, richard.neale@derwent.co.uk, (2) Online Training
Department, Derwent Information, (3) IT R&D Group, Derwent Information UK
Abstract
The chemical
industry continues to be one of the largest investors in R&TD. In today’s
marketplace the R&TD budget can exceed £1 million per day.
It is therefore imperative that patented inventions are not duplicated.
With R&TD spending continuing to spiral upwards, the industry has become
dependent on the provision of precise patent information to aid the development
of effective R&TD strategies.
As the Information Professional’s requirements broaden, the information provider must evolve to meet the user needs. Searching chemical structure and text data in combination has become a necessity, for accurate retrieval and to limit results within larger databases. This paper will examine combination search approaches currently used in the chemical information industry and investigate how Thomson Scientific as an information vendor is developing future products and content with the combination search in mind.
CINF 63 :
Chemical patent indexing and Gresham's Law
Edlyn S.
Simmons, SourceOne-Business Information Services, Procter & Gamble Co,
5299 Spring Grove Ave., Cincinnati, OH 45217, Fax: 513-627-6854,
simmons.es@pg.com
Abstract
For many years,
fragmentation coding was the gold standard of patent indexing. Fragmentation
coding schemes, such as the one applied to Derwent's Chemical Patents Index, are
applied to both specific and generic chemical structures and serve as keys to
retrieval of documents through searches for chemical structures or
substructures. By providing codes for structural fragments, they allow the
searcher to find molecular structures rather than chemical names.
In recent years, value-added patent databases have been joined by many databases for which indexing is generated automatically from the original text. Gresham's Law tells us that bad money drives out good money. As searchers begin to substitute full text searching for the use of value-added indexing, they lose the capacity to search for chemistry expressed in Markush structures and other structural diagrams. If this is true, Gresham's Law may also tell us that bad indexing drives out good indexing.
CINF 64 :
Chemical structures and reactions in CAS databases – searching for prior
art
Matthew J. Toussant, Chemical Abstracts Service, 2540
Olentangy River Road, Columbus, OH 43202-150, Fax: 614-447-3906,
mtoussant@cas.org
Abstract
Chemical
information in CAS databases takes many forms. Structure information is one form
that links many databases through a connection table identifier system, the CAS
Registry Number. The nature of prior art information in the CAS Registry,
CASREACT, and CHEMCATS databases will be described, and the pivotal role of the
CAS chemical identifier system in linking those collections will be detailed.
Further, the MARPAT database will be examined. CAS approaches to covering
chemical information and the effect of these approaches on efforts to create
exhaustive prior art collections, including from patents, journals, chemical
supply catalogs, and web disclosures, will be assessed.
CINF 65 :
Biotechnology patent searching: past, present and
future
Sandy Burcham, Service Is Our Business, Inc, 111
Lincoln Terrace, Norristown, PA 19403-3317, Fax: 610-630-0863,
cass123@earthlink.net
Abstract
In the last 2
decades, the importance of biotechnology has increased dramatically, moving from
straightforward enzyme catalysed reactions to the complexities of the human
genome project. Similarly, the application of biotechnology has spread from
simple fermentation processes to many complex previously non-biological
technologies. During this time the number of biotechnology patents has also
increased dramatically as organisations have sought to protect their research
and discoveries.
To cope with the increasing importance of biotech and the increasing volume of patent and journal literature, various abstracting and indexing services, together with software suppliers and online hosts, have developed resources providing increasingly powerful retrieval and display capabilities.
This paper will discuss the searching of biotech patents - where we were, where we are and where we seem to be going.
CINF 66 :
Back for the future: making coding cool
Gez Cross,
and Katharine Hancox, Product Development Group Chemistry & Life Sciences,
Derwent Information UK, 14 Great Queen Street, Holborn, London, United Kingdom,
Fax: +44 207 344 2911, gez.cross@derwent.co.uk
Abstract
The chemical
indexing systems introduced at Derwent by Peter Norton have been used for many
years by Information Professionals to retrieve chemical information from the
patent literature. When structure indexing of Markush compounds was made
available, discontinuation of the structural codes was proposed – and strongly
opposed by professional searchers. However, despite the introduction of software
to help generate the strategies for searching these codes, they remain a tool
used mainly by experienced, professional patent searchers.
With the advent of in-house and online browser-based information retrieval tools, a new generation of information users has arisen – scientists, who formerly relied on IPs for their searching requirements. To encourage these new users, intuitive, user-friendly interfaces have been created, which have further raised the expectations of both old and new users. This paper will examine attempts to bring the older, code-based systems into the internet era with new user-friendly tools and interfaces.
CINF 67 :
Developing HT Information Systems, a modular
design
Steve Coles, Database Applications Developer, Tripos
Receptor Research, Bude-Stratton Business Park, Bude EX23 8LY, United Kingdom,
Fax: +44 1288 359222, stcoles@tripos.com
Abstract
It is possible to
develop information systems for high-throughput design, chemistry, analysis and
purification by incorporating a modular approach using best of breed scientific
and information technologies. Working iteratively in close collaboration with
users of the system it is possible to streamline integration projects, reconcile
process issues, and provide customer-facing support. A modular approach
encapsulates domain knowledge, permits easier introduction of new modules and
increments, and can be shared between different applications.
CINF 68 :
Automating Library Design
Mark J. Duffield, and
Kevin Daniels, EST Lead Informatics, AstraZeneca R&D Boston, 35 Gatehouse
Drive, Waltham, MA 02451, Fax: 781-839-4580, mark.duffield@astrazeneca.com
Abstract
The library design
process is generally performed differently by every participant. Each chemist
has a number of "favorite" parameters with which to evaluate a potential
library. The process usually involves a large number of manual steps including
the reformatting, collating, and integration of data from disparate sources.
This process is time consuming and requires the chemist to perform complex
computing tasks, often across multiple environments. The end result is that the
chemist must spend significant time away from the bench planning their library.
This session will summarize our work in the area of streamlining the library design process through automation. We will describe our library design workflow and present the details of how we have automated many of the steps in the process. The chemist is now able to get the computational aspects done side by side with the actual synthetic work, while maintaining control over the end result.
CINF 69 :
On a new model for cheminformatics: Learning the classes of
compounds
Dmitry Korkin, Faculty of Computer Science,
University of New Brunswick, 540 Windsor St., Fredericton, NB E3B 5A3, Canada,
dkorkin@unb.ca
Abstract
We have outlined a
radically new approach to cheminformatics called the ChemETS model. It is based on
the first general formalism for structural (or symbolic) object representation
and classification proposed by us, called the evolving transformations system
(ETS) framework. The central features of the ETS framework are: 1) a new
structural form of class representation that can be constructed (and
modified) inductively and 2) a new structural form of object representation,
which incorporates the constructive (or synthetic) history of the
object and is directly related to the above representation of the corresponding
class of objects (containing this object).
I will first outline the basic principles of the ChemETS model, together with the central problem of the inductive approach to cheminformatics and computer-aided drug design (CADD). Then I will discuss the application of the ChemETS model to basic problems in cheminformatics and CADD, such as virtual lead discovery and the design and screening of virtual combinatorial libraries of compounds, among others. In particular, I will discuss the construction of the class of androgen-like compounds (based on a small set of known androgens), the construction of new androgen-like compounds (based on the above class representation), and the resulting classification of compounds as either belonging to this class or not.
CINF 70 :
Choosing the proper grid resolution for cell-based diversity
estimation
Dmitrii N. Rassokhin, and Dimitris K.
Agrafiotis, 3-Dimensional Pharmaceuticals, Inc, 665 Stockton Drive, Exton, PA
19341, rassokhin@3dp.com
Abstract
Although cell-based
methods are becoming increasingly popular for diversity analysis, the choice of
grid resolution is still guided primarily by intuition and lacks any theoretical
or empirical support. Here we present a systematic analysis of several typical
chemical data sets, and propose a simple technique for identifying a suitable
bin size for cell-based diversity estimation using an algorithm inspired by
the field of fractal analysis. We demonstrate that the relative variance of the
diversity score as a function of resolution exhibits a characteristic bell shape
that depends on the size, distribution and dimensionality of the data set under
consideration, and whose maximum represents the optimum resolution for a given
data set. Even though box counting can be performed in an algorithmically
efficient manner, the ability of cell-based methods to distinguish between
subsets of different spread falls sharply with dimensionality, and the method
becomes useless beyond a few dimensions.
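The cell-based idea in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes descriptors already scaled to the unit hypercube, uses the number of occupied cells as the diversity score, and computes a relative variance (variance divided by squared mean) over a set of subsets at one resolution. The function names `occupied_cells` and `relative_variance` are hypothetical.

```python
from statistics import mean, pvariance


def occupied_cells(points, bins_per_axis):
    """Count distinct grid cells hit by points lying in [0, 1]^d.

    Each axis is cut into `bins_per_axis` equal bins (an assumption of
    this sketch); a point exactly at 1.0 is placed in the last bin.
    """
    cells = set()
    for p in points:
        cell = tuple(min(int(x * bins_per_axis), bins_per_axis - 1) for x in p)
        cells.add(cell)
    return len(cells)


def relative_variance(subsets, bins_per_axis):
    """Variance of the occupied-cell score across subsets, over mean squared."""
    scores = [occupied_cells(s, bins_per_axis) for s in subsets]
    m = mean(scores)
    return pvariance(scores) / (m * m)


# Hypothetical toy data: two clustered points and one distant point.
points = [(0.05, 0.05), (0.06, 0.07), (0.95, 0.95)]
coarse = occupied_cells(points, 2)    # clustered points share a cell
fine = occupied_cells(points, 100)    # every point gets its own cell
```

Scanning `bins_per_axis` over a range and plotting `relative_variance` for subsets of differing spread would be one way to look for the peak the abstract describes. At a very coarse or very fine resolution the score stops distinguishing tight subsets from spread-out ones.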
CINF 71 :
Quantification of drug-likeness and similarity for combinatorial
follow-on libraries
Mark J. Rice, Ryan T. Weekley, and Paul
A. Sprengeler, Structural Group, Celera Therapeutics, 180 Kimball Way, South San
Francisco, CA 94080, Fax: 650-866-6654, mark.rice@celera.com
Abstract
Striking a balance
between good physical properties and similarity to the initial hit often poses a
problem in the design of follow-on libraries. Good physical properties are
needed to improve both ADME characteristics and drug-likeness, while similarity
is needed to maintain an adequate pharmacophore for binding. These requirements
are often at odds and difficult to quantify. Therefore, we have developed a
site-specific fingerprint based on chemical graph theory as a basis for
sidechain similarity. We have also developed a continuous drug-likeness metric,
using multivariate statistical analysis. We combine these measures to suggest
sidechain selection and more efficiently develop follow-on libraries.
CINF 72 :
Predicting generic methods and retention times for high-throughput
chromatography
Daria Jouravleva, Scott Macdonald, Michael
McBrien, and Eduard Kolovanov, Advanced Chemistry Development, Inc, 90 Adelaide
St. West, Suite 600, Toronto, ON M5H 3V9, Canada, daria@acdlabs.com
Abstract
In experimental
validation of combinatorial libraries, speed and high throughput are key.
For chromatographic separation or LCMS of the newly synthesized compounds,
generic chromatographic methods have been designed to accommodate the widest
possible diversity of samples. However, when the sample is not suited to the
method, costly instrument downtime slows the analytical process, and often
results in rejection of the whole plate or series of compounds. The new
ACD/ChromGenius software will advise whether methods are viable and select among
multiple available methods. This presentation describes the retention time and
method selection algorithms that power the new software tool,
as well as physicochemical parameters used to model the chromatographic
separation.
CINF 73 :
Copyright and the EU Database Directive: Issues for
chemistry
John R. Rumble Jr., Office of Measurement
Services, National Institute of Standards and Technology, 100 Bureau Drive MS
2310, Gaithersburg, MD 20899-2310, Fax: 301-926-0416, john.rumble@nist.gov
Abstract
The computerization
of scientific information continues to change the scientific communication
process. As we approach the end of the first decade of the Internet era,
ownership issues still loom large with respect to the communication process
itself as well as the economics of the process. In this presentation, I will
review some of the issues related to traditional ownership of authored material
(copyright) as well as new ownership rights (sui generis) as created by the
European Union. Both rights are under review, and possible changes could affect
the communication process in many ways. This talk also provides an introduction
to more detailed talks on this subject later in this session.
CINF 74 :
Pressures on the public domain in scientific data and
information
Paul F. Uhlir, Office of International S&T
Information, The National Academies, 2101 Constitution Avenue NW, Washington, DC
20418, Fax: 202-334-2231, puhlir@nas.edu
Abstract
The public domain
in scientific and technical data and information (STI) is massive and has played
a major role in the success of the research enterprise in the United States. The
"public domain" may be defined in legal terms as sources and types of data and
information whose uses are not restricted by statutory intellectual property
regimes or by other legal constraints, and that are accordingly available to the
public without authorization. Various legal, economic, and technological
pressures in recent years have narrowed the scope of the public domain in STI,
with poorly understood and perhaps significantly under-appreciated consequences
to our nation's preeminent research capabilities. This presentation will discuss
the background of public-domain information in research and review some of the
many constraints that are being placed on open access to and use of such
resources.
CINF 75 :
IPR and modern scientific society publishing
Eric S.
Slater, Publications Division, Copyright Office, American Chemical Society,
1155 Sixteenth Street, NW, Washington, DC 20036, Fax: 202-776-8112,
e_slater@acs.org
Abstract
This presentation
will provide basic information about United States Copyright Law and its
application to modern scientific publishing. Included will be the major issues
surrounding publishing today such as protecting content against piracy,
protecting works that appear online, and how recent court decisions have shaped
the copyright landscape.
CINF 76 :
Copyright and the information industry
Dan Duncan,
Executive Director, NFAIS, 1518 Walnut Street, Suite 307, Philadelphia, PA
19102, Fax: 215-893-1564, danduncan@nfais.org
Abstract
A review of major
developments in copyright and related law, with particular emphasis on U.S.
activities, that are of special importance to information database providers. The
presentation will focus on how policy developments may affect the delivery and
use of online information databases.
CINF 77 :
Database protection and academic research
Harlan J.
Onsrud, Department of Spatial Information Science and Engineering,
University of Maine, 5711 Boardman Hall, Room 340, Orono, ME 04469-5711, Fax:
207-581-2206, onsrud@spatial.maine.edu
Abstract
Many economic and
legal scholars argue that the current, relatively open data-access
environment in the United States is beneficial to advancing knowledge and the
economy. If so, the traditional method of scientific advancement by extending
from and building upon the data and works of others may be substantially
burdened if the U.S. moves to a database protection legal environment similar to
that instituted recently throughout much of Europe. This talk explores evidence
to date of the effect of the European Database Directive including its effect on
scientific and technical databases. Provisions of the Directive and the
implications for expanding or constraining scientific discourse are discussed.
Likely responses of the scientific community to similar legislation in the U.S.
are hypothesised. Several alternatives for working around such a default law are
suggested and several illustrative examples already being pursued are
highlighted.
CINF 78 :
An academic chemist looks at copyright
S. Scott
Zimmerman, Department of Chemistry and Biochemistry, Brigham Young
University, C205 BNSN, Provo, UT 84602-5700, Fax: 801-422-5474,
scott_zimmerman@byu.edu
Abstract
Most academic
chemists think little about copyright issues. They treat copyrighted materials
like their mentors and colleagues do, often without questioning the legality of
their actions. But academicians should know the answers to a few common
copyright questions, for example: Can I photocopy book chapters and research
papers for my personal files? Can I photocopy these materials, include them in a
course packet, and pass them out to my classes? Can I use copyrighted materials in
my PowerPoint presentations at meetings and in classes? When my students write a
paper describing research done in my laboratory, who owns the copyright? Can my
students publish research results in theses and dissertations, and then publish
the same materials in a journal? If I prepare and publish a graph in a journal
article, can I re-publish the same graph in another journal or review article?
Can I post my published research papers on my Web page? This presentation will
try to answer these and other questions about copyright in academia.
CINF 79 :
Integration of Combinatorial Chemistry Analyses with Other Relevant
Information
Jeff Saffer, OmniViz, Inc, Two Clock Tower
Place, Suite 600, Maynard, MA 01754, saffer@omniviz.com
Abstract
Today's chemist
deals with very large collections of information from diverse sources.
Integration of the analysis of textual information (patents or scientific
literature), high throughput screening results, structures, descriptors and
fingerprints is prerequisite for the comprehensive understanding required for
improved decision-making. One of the best instruments for this integration is
the human mind, but this can only be fully engaged when the diverse information
is presented in a context that is easy to assimilate. To this end, we have
developed a visualization framework that integrates analysis of experimental and
computational data with conceptual analysis of textual information while
maintaining the data in the context in which it was generated. Tools enabling
exploration across the multiple data types and detailed exploration within
specific data types increase understanding and decrease the time required to
reach decisions. The application of these approaches to very large (hundreds of
millions of data points) chemistry data sets will be discussed in the context of
discovery research.
CINF 80 :
Barriers to effective integration in chemical experiment management
software
J. Christopher Phelan, Marketing, MDL, 1550 Bryant
St., Suite 739, San Francisco, CA 94103, Fax: 415-252-8610, phelan@mdli.com
Abstract
During the past
twenty years, computers have become ubiquitous in chemical research, for
instrument control and for data collection, management, and analysis. However,
despite a pressing need, general software solutions that integrate these
functions are not yet widely available. We present an analysis of several
significant obstacles to the implementation of effective integrated software
solutions in the chemical experiment management arena. Specific topics will
include: compartmentalization of domain specific expertise, lack of a consistent
data model for chemical information beyond simple structure data, idiosyncratic
workflows in the research environment, and complexity issues in design and
architecture.
CINF 81 :
Application of statistical design tools for improved efficiency in
chemistry development for high-throughput parallel
synthesis
Jean E. Patterson, and Robb Nicewonger,
Department of Library Optimization, ArQule, 19 Presidential Way, Woburn, MA
01801, Fax: 781-994-0677, jpatterson@arqule.com
Abstract
Although there are
multiple techniques to select structurally diverse subsets of virtual library
products, there remains a need for a practical method to identify reagents that
represent the range of reactivity needed to build a library. Chemical intuition
has been the predominant driver for selection of such reagents, but it has a
number of shortfalls. Chemical intuition is not consistently predictive, it is
not an automated process, and it is not possible to quantitatively describe the
process to enhance the chemistry development of future projects. This
presentation will focus on ArQule’s statistics-based approach to the selection
of experimental test reactions using a commercially available software package
from Umetrics. Identification of chemical descriptors that most closely describe
reagent reactivity using multivariate statistics followed by experimental design
techniques to choose a diverse sampling of reagents that represents the
reactivity of the entire virtual library will be described.
CINF 82 :
Library design using multi-dimensional SAR analysis: Incorporating
structure-based predictions
Carleton Sage, Kevin Holme, and
Manish Sud, Cheminformatics Research, LION Bioscience Inc, 9880 Campus Point
Drive, San Diego, CA 92121, carleton.sage@lionbioscience.com
Abstract
After screening
results are available for a compound library, SAR analysis is often used to
determine which R-Groups add favorably to activity. After a chemical core and
R-Group positions are specified, SAR analysis involves identifying R-Groups and
generating a SAR table. We have implemented a system that takes this analysis
one step further. In addition to activity data, we have integrated
structure-based models to predict the ADME and specificity properties of
compounds and have developed methods to simultaneously consider multiple
properties in R-Group analysis. A critical component of these analyses is the
number and weighting of the properties when they are combined and how changes in
these parameters affect the final prioritization of compounds and R-Groups. We
will present results from using different strategies for simultaneous parameter
combination.
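To make the effect of property weighting concrete, here is a minimal sketch assuming a plain weighted average of pre-scaled property scores. The abstract does not say which combination strategies were used, so the scoring rule, the property names, the R-group labels, and the function rank_r_groups are all illustrative assumptions.

```python
def rank_r_groups(properties, weights):
    """Rank R-groups by a weighted average of property scores (higher is better)."""
    total = sum(weights.values())
    scores = {
        name: sum(weights[p] * values[p] for p in weights) / total
        for name, values in properties.items()
    }
    # Sort by descending score; break ties alphabetically for reproducibility.
    return sorted(scores, key=lambda name: (-scores[name], name))

# Hypothetical scores, each already scaled to the range 0..1.
r_groups = {
    "R1": {"activity": 0.9, "adme": 0.2, "specificity": 0.5},
    "R2": {"activity": 0.6, "adme": 0.8, "specificity": 0.6},
    "R3": {"activity": 0.4, "adme": 0.9, "specificity": 0.9},
}
activity_led = rank_r_groups(r_groups, {"activity": 3, "adme": 1, "specificity": 1})
balanced = rank_r_groups(r_groups, {"activity": 1, "adme": 1, "specificity": 1})
```

In this toy data set the ranking reverses when the activity weight drops from 3 to 1, which is the kind of sensitivity to weighting that the abstract describes.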
CINF 83
: Use of recursive partitioning/simulated annealing (RP/SA) for mining
combinatorial libraries
Paul Blower, LeadScope, Inc, 1245
Kinnear Rd, Columbus, OH 43212, pblower@leadscope.com, and Petr Kocis, Enabling
Science & Technology, Chemistry, AstraZeneca R&D Boston
Abstract
Recursive
partitioning is a powerful tool for mining large, diverse data sets encountered
in drug discovery. It is useful for explaining a complex, nonlinear response,
and it can handle very large descriptor sets with continuous, discrete, or
categorical variables. At each node, we use simulated annealing to optimize
several variables simultaneously and find good combinations of descriptors. The
search is incorporated into a recursive partitioning design to produce a
regression tree on the space of descriptors. We used RP/SA for mining
combinatorial libraries to identify combinations of structural features and
reaction parameters that give superior yields. In this talk, we will describe
statistical techniques used in this new method and illustrate its application in
mining a combinatorial library.
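The basic step of recursive partitioning can be sketched as follows. This is a simplified illustration only: it searches exhaustively for the best split on a single descriptor by reduction in the response sum of squares, and it does not include the simulated annealing search over several variables that RP/SA adds. The descriptor names and yield values are hypothetical.

```python
def sum_sq(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(rows, response):
    """Return (descriptor, threshold, reduction) for the binary split
    'descriptor <= threshold' that most reduces the response sum of squares."""
    parent = sum_sq(response)
    best = (None, None, 0.0)
    for name in rows[0]:
        # Every observed value except the largest is a candidate threshold.
        for threshold in sorted({row[name] for row in rows})[:-1]:
            left = [y for row, y in zip(rows, response) if row[name] <= threshold]
            right = [y for row, y in zip(rows, response) if row[name] > threshold]
            reduction = parent - sum_sq(left) - sum_sq(right)
            if reduction > best[2]:
                best = (name, threshold, reduction)
    return best

# Hypothetical reaction records: one reaction parameter, one structural flag.
rows = [
    {"temp_C": 25, "has_ortho_sub": 0},
    {"temp_C": 25, "has_ortho_sub": 1},
    {"temp_C": 60, "has_ortho_sub": 0},
    {"temp_C": 60, "has_ortho_sub": 1},
]
yields = [80.0, 30.0, 85.0, 35.0]
split = best_split(rows, yields)
```

A full regression tree would apply best_split again to each of the two resulting subsets until a stopping rule is met.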
CINF 84
: NMR Prediction Software and Applications to the Screening of
Combinatorial Libraries
Antony John Williams, and Sergey
Golotvin, Scientific Development, Advanced Chemistry Development, 90 Adelaide
Street West, Suite 600, Toronto, ON M5H 3V9, Canada, Fax: 416-368-5596,
tony@acdlabs.com
Abstract
Coupling automation
with flow NMR technology now allows NMR spectra to be acquired on materials
populating a combinatorial plate in only a few hours. This routine acquisition
of large amounts of spectral data can indeed increase the rate of throughput for
such analyses, but the technology can lead to an inordinate amount of data with
no appropriate way to track the information and store it in a database.
We will present software which allows the user to process NMR data directly from
the spectrometer and display it in a 96-well plate format. 1H NMR prediction
algorithms allow spectra to be generated for each of the suggested structures
and displayed on screen for direct visual comparison with the experimental
spectra. Verification of the match between experimental and predicted spectra
can be performed based on the differences in shifts, integrals and
multiplicities between the spectra.
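A comparison based on shift differences might look like the sketch below. It is an assumption-laden illustration, not the ACD software's algorithm: it pairs peaks greedily by chemical shift only, ignores integrals and multiplicities, and uses an arbitrary tolerance of 0.2 ppm. The function name match_shifts and the peak lists are hypothetical.

```python
def match_shifts(predicted, experimental, tolerance=0.2):
    """Greedily pair each predicted shift (ppm) with the closest unused
    experimental shift within `tolerance`; return (pairs, fraction_matched)."""
    unused = sorted(experimental)
    pairs = []
    for shift in sorted(predicted):
        if not unused:
            break
        closest = min(unused, key=lambda e: abs(e - shift))
        if abs(closest - shift) <= tolerance:
            pairs.append((shift, closest))
            unused.remove(closest)  # each experimental peak is used once
    fraction = len(pairs) / len(predicted) if predicted else 0.0
    return pairs, fraction

# Hypothetical peak lists for one well of a plate.
predicted = [1.20, 3.65, 7.30]
experimental = [1.25, 3.60, 7.90]
pairs, fraction = match_shifts(predicted, experimental)
```

Here two of the three predicted peaks find a partner, so the well would score 2/3 and could be flagged for visual inspection.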
CINF 85 :
Computational proteomics: Genome-scale analysis of protein structure,
function, & evolution
Mark Gerstein, P Harrison, J
Qian, R Jansen, V Alexandrov, P Bertone, R Das, D Greenbaum, W Krebs, Y Liu, H
Hegyi, N Echols, J Lin, C Wilson, A Drawid, Z Zhang, Y Kluger, N Lan, N
Luscombe, and S Balasubramanian, MB&B Department, Yale University, Bass
Building, 266 Whitney Avenue, New Haven, CT 06520, Fax: 360-838-7861,
Mark.Gerstein@yale.edu
Abstract
My talk will
address two major post-genomic challenges: trying to predict protein function on
a genomic scale and interpreting intergenic regions. I will approach both of
these through analyzing the properties and attributes of proteins in a database
framework. The work on predicting protein function will discuss the strengths
and limitations of a number of approaches: (i) using sequence similarity; (ii)
using structural similarity; (iii) clustering microarray experiments; and (iv)
data integration. The last approach involves systematically combining
information from the other three and holds the most promise for the future. For
the sequence analysis, I will present a similarity threshold above which
functional annotation can be transferred, and for the microarray analysis, I
will present a new method of clustering expression timecourses that finds
"time-shifted" relationships. In the second part of the talk, I will survey the
occurrence of pseudogenes in several large eukaryotic genomes, concentrating on
grouping them into families and functional categories and comparing these
groupings with those of existing "living" genes. In particular, we have found
that duplicated pseudogenes tend to have a very different distribution than one
would expect if they were randomly derived from the population of genes in the
genome. They tend to lie on the end of chromosomes, have an intermediate
composition between that of genes and intergenic DNA, and, most importantly,
have environmental-response functions. This suggests that they may be
resurrectable protein parts, and there is a potential mechanism for this in
yeast.
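The idea of a "time-shifted" relationship between expression timecourses, mentioned above for the microarray analysis, can be illustrated with a lagged Pearson correlation. This is a swapped-in simple technique, not the new clustering method the talk presents; the gene profiles and the function names are hypothetical.

```python
def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5

def best_lag(a, b, max_lag):
    """Return (lag, r) maximising the correlation of a[t] with b[t + lag].
    Only non-negative lags are tried; max_lag must be below len(a) - 1."""
    best = (0, pearson(a, b))
    for lag in range(1, max_lag + 1):
        r = pearson(a[:-lag], b[lag:])
        if r > best[1]:
            best = (lag, r)
    return best

# Hypothetical timecourses: gene_b repeats gene_a's pulse two time points later.
gene_a = [0.0, 1.0, 3.0, 1.0, 0.0, 0.0, 0.0]
gene_b = [0.0, 0.0, 0.0, 1.0, 3.0, 1.0, 0.0]
lag, r = best_lag(gene_a, gene_b, max_lag=3)
```

The unshifted profiles correlate poorly, but at a lag of two time points they coincide, which is the kind of relationship a same-time comparison would miss.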
CINF 86
: Federated databases: The next level
Peter M. Smith,
Discovery Research Applications, Wyeth Ayerst Research, CN 8000, Princeton, NJ
08543, Fax: 732-274-4733, smithp@war.wyeth.com
Abstract
Accessing data
across diverse databases is a major issue in Pharmaceutical research, and
several solutions have been proposed. They range from the creation of large data
warehouses to a federation of separate databases. In this talk we will present a
new approach to the federated data model, based on distributed computing and a
network-centric applications server engine. It is based on Java components, J2EE
servers, and Oracle data sources. By moving the business logic to a middle tier,
a new level of generalization can be realized which provides flexible,
adaptable, and richly functional access to the various data sources. For
example, in scientific areas, the databases we need to federate can include
chemical structures, reactions, biological activity results, proteins, and
genetic sequences. A set of “rich objects” in the middle tier can map these
complex data types and be queried to provide a cross-database view. The
practical implementation of this model will be discussed in the cheminformatics
domain. A demo of such a system will be given.
This technology is also the foundation of the next generation of scientific applications. It provides modular, “plug-and-play” functionality. The implications of this new approach for scientific software development will be discussed.
CINF 87 :
Practical meta data solutions for the large data
warehouse
Tom Gransee, Paul Vosters, and Ronda
Duncan, Knightsbridge Solutions LLC, 500 W. Madison Street, Suite 3100, Chicago,
IL 60661, Fax: 413-669-2358, rduncan@knightsbridge.com
Abstract
For enterprises
with large data warehouses, implementing a comprehensive meta data solution can
seem like a formidable task. There are no industry standards and no
off-the-shelf tool suites that can meet all of an enterprise's meta data
objectives. However, by carefully gathering requirements, mapping them to meta
data sources, and choosing a solution that achieves the right balance between
standardization and customization, an enterprise can develop an approach to
meta data that meets its business and technical needs. Enterprises that
implement successful meta data solutions will benefit from reduced development
costs, user acceptance of the data warehouse, and the ability to make faster
business decisions.
CINF 88
: So you have a data warehouse - Now What?
William
Langton, Ramesh Durvasula, and Julie Pitney, Software Consultant Manager,
Tripos, Inc, 1699 South Hanley Road, St. Louis, MO 63144, Fax: 314 647 9241,
jamih@tripos.com
Abstract
In the current
informatics-enabled research environment, drug companies often pursue data
warehousing as a one-stop solution to data management. However, data warehousing
alone may not return the value expected. We believe proper tools and processes
are essential for leveraging the warehouse to extract meaningful relationships
and create knowledge. In this talk, several examples of such tools and processes
will be presented based on Tripos' experience in designing and deploying
successful informatics systems.
CINF 89 :
Using OLAP and data mining technologies for trending, knowledge discovery,
and collaborative commerce
Jane Griffin, Data
Management Group, Arthur Andersen, LLP, 225 Peachtree Street, Suite 1800,
Atlanta, GA 30303, Fax: 404-954-7980, s.jane.griffin@us.andersen.com
Abstract
Data mining tools
can facilitate knowledge discovery and construction of predictive models that
reveal new opportunities across the value-chain and facilitate greater knowledge
of customers. This presentation will cover how to extend Business Intelligence
beyond traditional boundaries to: 1) discover and pursue new business
opportunities; 2) enhance the relationships between value-chain partners; and
3) use real-time intelligence to monitor the value chain. It will also cover the
architectures required to build predictive models to: a) enhance revenue growth
and customer and product profitability; and b) recognize fraud and take
immediate action.
CINF 90
: Challenges of information provision in a dynamic genomic
landscape
Rachel V. Buckley, Head of Product Development
(Life Sciences), Derwent Information, 14 Great Queen Street, London WC2B 5DF,
United Kingdom, rachel.buckley@derwent.co.uk, and Giles Stokes, Product Manager
(Life Sciences), Derwent Information
Abstract
Significant
intellectual and monetary investment in therapeutic and diagnostics research
means that keeping completely up to date with technology trends is critical to
both scientific and commercial success. As well as the publication of
increasingly large sequence patents and the questions raised about the patenting
strategies that lie behind these activities, another challenge posed by the
growth of genomics is simply the new kinds of information available and the
ability to track and monitor business information in such a highly dynamic
industry. Another impact has been the changes to searching skills required as
the “-omics” has meant the introduction of a new scientific language - new
technical terms, new subject areas and new relationships. The post genomic
landscape continues to change. We believe there will be a need for more
attention to detail - greater focus on qualitative information, and the need for
the interrelation of different data sources. Information providers will need to
accept the requirement for better linking between sources to allow thought
processes to flow more naturally in research. Information provision will need to
keep up with the developments of a rapidly evolving discipline.
CINF 91
: Integration of genomic, biological, and chemical data in drug
discovery
Thomas Laz, Bioinformatics, Schering-Plough, 2015
Galloping Hill Rd, Kenilworth, NJ 07033, thomas.laz@spcorp.com
Abstract
There are many
database mining strategies in place at the Schering-Plough Research Institute
(SPRI) that are being used to support programs in the therapeutic areas. The
common element in these mining strategies is that they generate lists of human
genes that, while satisfying the initial criteria of the mining strategy,
require further prioritization before the initiation of detailed biological
evaluation. To accurately evaluate these projects, scientists must be able to
analyze large amounts of disparate information relevant to the gene sequences
under investigation. To facilitate the accumulation and analysis of this wide
range of data at SPRI, the Bioinformatics group has developed the Discovery Data
Library (DDL). We began with the optimized catalog of human genes and then
developed a strategy to locate and integrate genomic, chemical, and biological
information. We have developed a Web Browser-based user interface that allows
the database to be queried in a variety of ways and delivers the results in a
concise and comprehensive set of reports. The DDL allows SPRI scientists to
rapidly obtain information on human genes in a context that is most relevant to
their drug discovery programs.
CINF 92
: Genomics gorilla....handling sequence overload
Dr.
Bernard French, Manager, Molecular Biology and Genetics, Chemical Abstracts
Service, 2540 Olentangy River Road, Columbus, OH 43202, Fax: 614-447-3713,
bfrench@cas.org, Dr. Balvinder Sidhu, Product Manager, Life Sciences, Chemical
Abstracts Service, and Eileen M. Shanbrom, Marketing, CAS
Abstract
Advances in the
genomics arena have spurred an impassioned debate on the daunting challenge of
managing sequence information overload. Questions concern how to process the
explosion of genomic data, including intellectual analysis, packaging, and
delivering the information in an efficient manner. Information providers should
not only keep adapting to ever-evolving bioinformatic needs, but also design
and maintain databases which can handle complex and large sequence data sets.
CAS has created a digital research environment for genomic information, and
accomplishes this by creating algorithmic sequence feeds from public data
sources including patents, integrating sequence data from multiple data sources,
creating intellectual content, and providing value-added search tools. More and
more, a solid foundation for the building of a comprehensive digital research
environment rests on bridging biology, chemistry and information technology.
CINF 93
: Managing and providing biosequence information in the STN host
environment
Ilka Schindler, and Rainer Stuike-Prill,
FIZ-Karlsruhe, Karlsruhe, Germany, ilka.schindler@FIZ-karlsruhe.de
Abstract
Biosequence
information as part of intellectual property has become a rapidly developing
field recently. The number of patent publications containing sequence
information has increased substantially as well as the number of sequences
published with a single publication. The almost exponential increase of
published biosequence data poses a challenge for any supplier of such
information. FIZ Karlsruhe in its role as STN Europe has established itself as
provider of biosequence information. Our objectives as a host are to provide
comprehensive, high-quality information implemented in a unified and integrated
way. The provision of biosequence information in a host environment requires
specific solutions with respect to the retrieval system, the user interfaces,
and the integration with other related databases. In particular the
sophisticated homology search functionality needed to be enhanced to provide the
excellent performance required by our customers. Furthermore, questions related
to the appropriate hardware equipment, database systems and update procedures
need to be addressed.
CINF 94
: Multiscale hierarchical classifications of genes for genomics HTS
analysis
Chihae Yang, and Limin Yu, LeadScope, Inc,
Columbus, OH 43212, cyang@leadscope.com
Abstract
Although the
application of high-throughput technologies to genomics has greatly increased
the amount of available information, it has not yet led to dramatic increases in
productivity in the drug discovery process. The challenge of inferring
biological target information is formidable given the size of the data set.
Conventional data handling techniques include clustering of the gene sets for
sub-categorization and mapping the classifications for visualization. In this
paper, a unique gene hierarchy, based on annotations from the Gene Ontology
Consortium, is used to differentiate gene expression patterns of various cell
types. The gene hierarchy dynamically queries individual genes for
classification and annotation based on a relational database for gene, EST,
mRNA, clones, protein, enzyme, receptor, and pathways. The gene family classes
by hierarchical classification correlate gene functions to expression levels.
The result from the biological hierarchy analysis is compared to other
computational methods for extracting subsets of genes for differentiation. For
example, the same expression patterns were also differentiated using
recursive partitioning (RP), a well-known tree splitting method for classifying
complex non-linear data. A novel approach is presented in which the hierarchical
classification is used to provide a rational gene order for subsequent
multiscale principal component analysis (MSPCA), which is essentially an
integration of wavelet and PCA methods. Identifying subsets of genes is
discussed in the context of identifying specific targets in the drug discovery
process.
CINF 95
: Management, integration and cross-referencing of genomic
information
Anthony Caruso, LION bioscience Inc, 141
Portland Street, 10th floor, Cambridge, MA 02139, Fax: 617 245 5401,
acaruso@lbri.lionbioscience.com
Abstract
Through the
genomics revolution the quantity of data generated in the biological sciences
has been clearly overwhelming. However, these data are in many cases redundant,
ambiguous and of varying quality. In addition, with the accelerated data output
they quite often end up in data graves. Some data sources are known
for their high quality, but they also tend to be of low relative quantity,
whereas other sources are of lower quality but are much more plentiful. When
such data are parsed and organized with various filtering and management
techniques, all of these data can be merged, presented and interpreted in a much
more useful manner. For example, low-throughput, high-quality PCR-based
expression data can be used to help validate lower-quality, high-throughput
gene-chip expression data. Another, much simpler example is associating
the highly regarded SwissProt protein data with the first-pass sequence reads of
dbEST to look for alternatively spliced forms of proteins of interest. We've
developed an integration scaffold coupled with a management and decision support
system to make better sense of the data in the early, yet crucial stages of the
drug discovery pipeline: target identification and target validation.
CINF 96
: Life after the lab (or how to never leave
university)
Patricia E. Meindl, Department of Chemistry,
University of Toronto, 80 St George St., Toronto, ON M5S 3H6, Canada, Fax:
416-946-8059, pmeindl@chem.utoronto.ca
Abstract
Are you scared of
the librarians at your university library? You shouldn't be! The main job of a
good librarian is to make sure you find the information you need. The stereotype of
the prissy old librarian who only goes "Shhhhhhh..." doesn't fit our new
electronic age. No longer is it just a matter of pointing people to the stacks.
With such a wealth of information, the librarians must also have some subject
knowledge to help guide them. My chemistry degree has proven invaluable to me in
answering questions on subjects as diverse as clinical medicine and high-energy
physics. As an academic librarian the focus is slightly different than in
industry. We are teaching students how to find the information they need. The
types of patrons range from high school students with vague requests to
undergrads and graduate students with more specific needs and short timetables
to faculty who have very detailed queries and no time at all. Learning to meet
all these needs and keep up with all the new resources available is a very
challenging but rewarding task.
CINF 97 : What
to expect in a small corporate R&D library
Scott C.
Boito, North American Library, Rhodia, Inc, 259 Prospect Plains Rd,
Cranbury, NJ 08512-7500, Fax: 609-860-0165, scott.boito@us.rhodia.com
Abstract
Are you adept at
finding literature references that your colleagues can't? Do you love chemistry,
but feel as comfortable in the library as you do in the lab? Making the jump
from bench chemist to information specialist is a dramatic transition, but the
rewards are many and your career can be very fulfilling if you commit to the
change. I will discuss a little about my conversion to the information
profession, including some of the reasons I chose to make it and how I did it
successfully. I will also try to highlight some of the differences between
academic and corporate libraries to help you decide on your correct path. The
goal of the talk is to give some idea of what to expect in your new exciting
career and how to prepare for it.
CINF 98
: Chemical information careers in the government
John R.
Rumble Jr., Office of Measurement Services, National Institute of Standards
and Technology, 100 Bureau Drive MS 2310, Gaithersburg, MD 20899-2310, Fax:
301-926-0416, john.rumble@nist.gov
Abstract
The United States
Government is intimately involved in virtually every aspect of the chemical
sciences. It funds research and development, operates laboratories,
manufactures a wide variety of chemical substances, issues chemical-related
regulations and maintains large chemical databases. In all these efforts, modern
chemical informatics plays an important role. The need for chemical information
specialists within the government has never been higher. In this talk, I will
describe some of the career opportunities that are available to chemists and
chemical informatics experts within the government, with emphasis on emerging
needs for the future.
CINF 99 :
Searching Patents: Background, careers and the
future
Ron Kaminecki, Dialog, Suite 2930, 180 North
LaSalle Street, Chicago, IL 60601, Fax: 312-726-3550
Abstract
Patent information
involves the best of chemistry and the law. Patents are legal documents that are
made to be defended in court and are thus written with that intent in mind.
Patents also contain in-depth technical discussions that incorporate the leading
edge of chemistry though written under the rules of statutory and case law.
Thus, searching the prior art involves the best of chemical and legal skills to
find the appropriate information to obtain, enforce, or invalidate a patent.
This session will cover the skills and background that are needed in the search
profession, the career path and typical role in industry, and the typical salary
and expectations of patent search professionals.
CINF
100 : Creating content and selling it: a career in
publishing
Kristina Kurz, Thieme Publishers, 333 7th
Avenue, New York, NY 10001, Fax: 212 947 1112, kkurz@thieme.com
Abstract
Chemists have many
skills that are in high demand in various industries. Publishing is one of them.
Scientific information is valuable only when it is read and used by fellow scientists. Being part of the process of capturing, selecting, editing, archiving and distributing information is challenging and fun. Especially at smaller publishing houses you are exposed to many aspects of the business, and every skill you picked up at graduate school will be used. On the editorial side, a thorough understanding of the scientific content of the publication and a deep insight into the scientific community, with a good working network, are crucial. They allow the selection of material that is of high quality and of special interest to the readers. On the business development side, out-of-the-box thinking, healthy skepticism and analytical thinking are the most sought-after skills.
CINF 101 :
Some novel perspectives with a computational chemistry
degree
Jeffrey L. Nauss, Accelrys, Inc, 9685 Scranton
Road, San Diego, CA 92121-3752, Fax: 858-799-5100, jnauss@accelrys.com
Abstract
A degree in
computational chemistry is often considered to be narrow and specialized. In
some regards, it is; yet there are many other opportunities with such a
background in commercial, academic, government, and non-profit organizations.
This talk will examine several of these opportunities. Drawing from personal
experience spanning nearly two decades, the speaker will paint a story of a
multifaceted career and one that is still evolving. The goal for the
presentation is to show that varied opportunities are out there; you just need
to be open-minded when searching.
CINF 102 :
Teaching and learning of structural organic chemistry with
nomenclature/structure software
Bert
Ramsay (1), Antony John Williams (2), Andrey
Erin (2), and Robin Martin (2). (1) Department of Chemistry,
Eastern Michigan University, Ypsilanti, MI 48197, Fax: 734-487-1496,
Bert.Ramsay@emich.edu, (2) Scientific Development, Advanced Chemistry
Development
Abstract
Many organic
chemistry students have difficulty in determining and "seeing" the configuration
about a stereogenic carbon presented in 2-D structures. A true understanding
comes when these diagrams are converted to 3-D pictures or models that can be
rotated to correspond to the diagram's perspective. Much of this confusion can
be avoided if students use Nomenclature/Structure software programs to
compare 2- and 3-D renderings and names of chemical structures. A Student Guide
to the Use of Nomenclature/Structure software has been developed for inclusion
with ACD's ChemSketch and ACD/Name software. The Guide also helps students
recognize the location and naming of functional groups.
CINF 103 :
Application integration: Providing coherent drug discovery
solutions
Mitchell Miller, and Manish Sud,
Cheminformatics Research, LION Bioscience Inc, 9880 Campus Point Drive, San
Diego, CA 92121, Fax: 858-410-6501, mmiller@netgenics.com
Abstract
Over the last
couple of decades, the number of computational tools available for drug
discovery has undergone rapid growth. Most of these tools are designed to
address a specific drug discovery task. In addition to the need to learn
multiple software packages with very different user interfaces, transferring
data between the various applications can be difficult or impossible. To address
these issues, we have developed an application integration framework that
interconnects a variety of third-party and in-house applications to support drug
discovery efforts. This allows users in a single application to perform a
variety of tasks and seamlessly transfer data from one to another. We will
present solutions developed to support lead identification and optimization
efforts which help discovery scientists identify analogs and optimize their ADME
properties using structure-based models.
CINF
104 : The APRILS™ (Automated Plate Re-Mapping and Integrated Library
Services) System: Using Open Source Tools to Solve Thorny Informatics Problems
Inexpensively
Manton R Frierson III (1), Boliang
Lou (2), and Shawn Beltz (1). (1) Computational Chemistry and
Informatics, Advanced SynTech, LLC, Louisville, KY 40299, Fax: 561-258-5783,
m.frierson@advsyntech.com, (2) Department of Chemistry, Advanced SynTech, LLC
Abstract
Within many small
companies (and even large ones), the expense of proprietary software solutions
to cheminformatics problems can often be prohibitive. The "Open Source"
revolution has provided many tools to give highly functional and robust systems
on inexpensive hardware platforms. In our own organization, we have used an
Apache webserver in conjunction with the open source scripting languages Perl
and PHP to develop many tools accessible to both our informatics group and our
bench chemists for the purpose of constructing and manipulating the data of
their combinatorial library syntheses. This paper will discuss the construction
and capabilities of the APRILS system which integrates functions like plate
re-mapping (generating a variety of formats for different HTS instrumentation),
dispense lists for automated synthesizers, as well as filters for tracking
"drug-like" properties of proposed or newly synthesized libraries.
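Plate re-mapping of the kind mentioned above can be sketched in a few lines. This is not APRILS code (the abstract says APRILS is built with Perl and PHP and does not list its formats); it is an illustrative Python sketch that assumes one common convention, interleaving four 96-well source plates into the quadrants of a 384-well plate. The function name remap_96_to_384 and the quadrant numbering are hypothetical.

```python
import string

def remap_96_to_384(well, quadrant):
    """Map a 96-well position such as 'B3' into a 384-well plate,
    interleaving four source plates (quadrant 0, 1, 2 or 3)."""
    if quadrant not in (0, 1, 2, 3):
        raise ValueError("quadrant must be 0, 1, 2 or 3")
    row = string.ascii_uppercase.index(well[0].upper())  # A -> 0 ... H -> 7
    col = int(well[1:]) - 1                               # 1 -> 0 ... 12 -> 11
    if not (0 <= row < 8 and 0 <= col < 12):
        raise ValueError(f"not a 96-well position: {well}")
    # Each source well expands to a 2 x 2 block; the quadrant picks the corner.
    new_row = 2 * row + quadrant // 2
    new_col = 2 * col + quadrant % 2
    return f"{string.ascii_uppercase[new_row]}{new_col + 1}"

# Source plate 0 keeps the top-left corner of each block; plate 3 the bottom-right.
first = remap_96_to_384("A1", 0)
last = remap_96_to_384("H12", 3)
```

Under this convention every one of the 4 x 96 source wells lands on a distinct destination well, so the mapping can be inverted to trace a hit back to its source plate.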
CINF 105 :
Homogenizing analytical data from multiple vendors into a unified
workspace
Antony John Williams, Scientific Development,
Advanced Chemistry Development, 90 Adelaide Street West, Suite 600, Toronto, ON
M5H 3V9, Canada, Fax: 416-368-5596, tony@acdlabs.com
Abstract
Today a plethora of
analytical techniques are used to characterize a particular chemical compound or
material as it migrates from research and discovery through scale-up to
manufacturing. These techniques include the multiple forms of spectroscopy and
chromatography, hyphenated techniques and other analytical techniques that
produce “curves” including electrochemistry and thermal analysis. The lifecycle
of any particular compound can originate with spectra to identify the structure,
chromatograms to separate the material and other technologies to characterize
its performance. To date it has not been possible to manage all this associated
analytical data, together with associated chemical structure information, in a
single unifying interface and the need for an integrated system for processing
and management of all associated data persists. This talk will provide an
overview of how to address the diverse needs in processing and data management
for multiple forms of analytical data and make the results available across an
enterprise.
CINF 106 :
Effective chemical information
Jonathan M
Goodman, Department of Chemistry, Cambridge University, Lensfield Road,
Cambridge CB2 1EW, United Kingdom, Fax: +44 1223 336362, jmg11@cam.ac.uk
Abstract
We have more
chemical information than we can handle well. How can we use it most
effectively? Databases are hard work to create and maintain, because: (i) they
need constant curation to keep up to date, (ii) the information within them
needs to be validated, (iii) a rationale for trusting it must be available, and
(iv) the information must be accessible. A series of information sources will be
presented, which break some of these rules, but remain useful. Most of the data
are available on the Cambridge Department of Chemistry web site
(http://www.ch.cam.ac.uk/MMRG/CIL/ ; http://www.ch.cam.ac.uk/c2k/ ;
http://www.ch.cam.ac.uk/magnus/ ; http://www.ch.cam.ac.uk/today/)
CINF 107 :
Snapshot of content, retrieval, and quality of some chemical information
systems
Dieter Rehm, Department of Chemical and
Pharmaceutical Sciences, Johann Wolfgang GOETHE University, Marie-Curie-Strasse
11, Frankfurt am Main D - 60439, Germany, Fax: ++49-69-798-29248,
REHM@chemie.uni-frankfurt.de
Abstract
Traditional primary
printed information is rapidly being supplemented, and will in future be replaced, by
primary e-information. Secondary e-information systems provide access to
primary information through multidimensional retrieval profiles. Despite the
possibilities of full-text searching and chemical compound searching by structure,
precise procedures to excerpt and index the primary information (print and/or e)
can be improved at the programming level by taking into account the actual content
of the database to increase the precision of a search. The quality of the content as
well as the experience of the searcher ultimately determine the result of a retrieval
session. Examples of deficits in chemical information systems are given.
The necessary consequences are shown: improving the quality of secondary
information is the obligation not only of producers but also of editors and,
last but not least, of the publishing scientists. This in turn feeds back into the
education of students.
CINF 108 :
Battling the data avalanche – a chemical data management solution for the
smallcap company
Kevin K Turnbull, Advanced Chemistry
Development, 90 Adelaide St. W., Suite 702, Toronto, ON M5H 3V9, Canada, Fax:
416-368-5596, kevin@acdlabs.com
Abstract
The pharmaceutical
industry is well acquainted with the challenges of managing various forms of
chemical data across an organization. These challenges are augmented when
considering the plight of smallcap companies, whose monetary and human resources
are often severely out of sync with the volumes of chemical data they are
generating.
This talk will discuss the emergence of a novel database software system designed for standardizing and consolidating chemical information company-wide. The software integrates chemical structures with images, reaction diagrams, documents, and text in a manner that is customizable to the user, and thus is malleable to the specific data management needs of an organization. Databases that are built in this system are searchable by chemical structure, sub-structure, text, and other user-defined data fields. Such databases are easily accessible by all beneficiaries in the company, and can be connected to commercial tools for physical property prediction, chemical naming, and analytical data management (NMR, MS, IR, UV, HPLC, and GC).
CINF 109 :
Command and control of the drug discovery factory: Putting chemists
in the driver's seat
David Hadfield, Chemistry, Spotfire,
212 Elm Street, Somerville, MA 02144
Abstract
The last decade has
seen an abundance of novel technologies, methodologies, and research content
coming into the domain of drug discovery. High throughput technologies have the
possibility of significantly improving the results of pharmaceutical research.
However, such improvements have not yet been shown. The output of novel products in
the marketplace has decreased rather than increased while these new
technologies have been implemented in current processes. Much of the blame for
this has been placed on research organizations not being ready to deal
with the data explosion from novel technologies. Researchers have had to deal
with 100x more data, in terms of both the number of compounds and the number of
properties. Novel visualization and analytic technologies have been successful
in battling this explosion, allowing researchers who would otherwise be
confined to spreadsheets to rapidly browse data, searching for trends and
outliers. While these novel visualization and analytic technologies have had a big
impact, I will argue that to see real improvements in research productivity we
need to see a discontinuous change in how research organizations deal with data
and decision-making. Chemists need to be able to see their results in the
context of biology; biologists need to be able to see their results in the
context of chemistry, etc. Decisions need to be made cross-functionally, taking
every aspect of chemistry and biology into consideration. Every decision needs to
be continuously monitored and updated as new data becomes available. This is
easier said than done. Just as such decision-making would itself be a
discontinuous change, a discontinuous change in software infrastructure for
decision-making will be needed to enable the change in methodology and put
researchers in the driver's seat. I will outline a novel architecture for
analytical software for the world of drug discovery, building on previous
success in data visualization, and show how integrated decision-making can
be made possible through improvements at every level from the UI to the
database. The presentation will cover architecture as well as user interface
issues and discuss the impact on pharmaceutical research.