CINF 1:
Dixel modeling of gene expression
N Sukumar1, Curt M. Breneman1, Kristin
P. Bennett2, Charles Lawrence3, and Inna Vitol3.
(1) Department of Chemistry, Rensselaer Polytechnic Institute, Cogswell
Laboratory, 110 8th Street, Troy, NY 12180-3590, Fax: 518-276-4045, nagams@rpi.edu,
brenec@rpi.edu, (2) Department of Mathematics, Rensselaer Polytechnic
Institute, (3) Wadsworth Center
Abstract
Sequence-specific binding of proteins to DNA is arguably the most important
foundation of cellular function, since it exerts fundamental control over
the abundance of virtually all cellular functional macromolecules.
Identification of promoter sequences and transcription factor binding sites
in the genome thus represents one of the grand challenges of the
post-genomic era. The most successful bioinformatics methods today are based
on models that represent DNA by sequences of letters (motif methods).
Unfortunately, the sequence data used for training and validation is quite
limited. Motif models are thus hampered both by small sample sizes and by an
abstract representation that has little to do with the energetics of
binding. It is here that cheminformatics can supply additional information
and introduce a more accurate and sensitive chemical representation of
DNA-protein interactions. Drawing upon our experience with E. coli
transcription factors and sigma factors, we show how characterization of DNA
through features of electron densities sampled on the vdW surfaces of the
major and minor grooves (“Dixels”) captures the effects of environmental
perturbations of neighboring base pairs, without requiring additional
sequence data for training.
CINF 2:
Integration of biological and chemical information: Faster decisions from
linked data and visualizations
Gavin M Fischer, Application Scientist, OmniViz Inc, 2 Clocktower
Place, Suite 600, Maynard, MA 01754, gfischer@omniviz.com
Abstract
Visualizations are the best way for people to understand data. Presenting
anyone with long lists of numbers rarely helps the understanding of the
data, never mind the interconnectedness within that data. This is even more
true when crossing between domains, such as between chemistry and biology.
Both sides understand, in theory if not practice, what the other is doing.
However, the lack of a common language between them necessitates new
approaches for integrating analysis; visualizations are a key to this. The
understanding of HTS data, with linked biological pathways illustrating the
context in which the target is being tested, and microarrays showing how
responses map against the genome, allows for more rapid decisions. Both
chemists and biologists have analysis techniques that can, and should, aid
the other. I will show some examples of this integration working, and talk
about linking this with literature analysis to understand the BIG picture,
whilst not losing sight of the details on either side.
CINF 3:
The BioPrint®
pharmaco-informatics platform: A large profile database for the development
of relevant predictive models
Frédérique Barbosa, Molecular Modelling, Cerep, 128, rue Danton,
92500 Rueil Malmaison, France, Fax: 33 1 55 94 84 10, F.Barbosa@cerep.fr
Abstract
Linking biological and chemical information for use in computational
approaches in order to predict biological activity, ADME profiles and
adverse drug reactions (ADR) is critical for enhancing the drug discovery
process. However, modeling approaches have been hampered by the lack of
large, robust and standardized training datasets. In an extensive effort to
build such a dataset, the BioPrint® database is continuously constructed by
systematic profiling of drugs available on the market, as well as numerous
reference compounds (at present, BioPrint includes more than 2,200 compounds
and 172 different assays). The database is composed of several large
datasets: compound pharmacology profiles, and complementary clinical data
including therapeutic use information, pharmacokinetics profiles and ADR
profiles. These data have allowed the development of predictive QSPR and
QSAR models. Models based on chemical structure are strengthened by in vitro
results that can be used as additional compound descriptors to predict
complex in vivo endpoints.
CINF 4:
Keeping up with the
changing face of Medline and MeSH - 3 keys to improving searches
Soaring Bear, MeSH, NLM/NIH, 8600 Rockville Pike B2E17, Bethesda, MD
20894, Fax: 301-402-2002, bears@mail.nlm.nih.gov
Abstract
The National Library of Medicine provides dozens of medical, chemical, sequence,
and structural databases which can all be searched at one time with the new
Entrez interface (http://www.ncbi.nlm.nih.gov/gquery/gquery.fcgi). The
information explosion requires prudent search strategies to find more quickly
the data gems you are seeking in the growing haystack of science results.
Ambiguities of word meanings confound and frustrate. To help, the MeSH group
of the National Library of Medicine is continually updating the terms and
concept structure of the MeSH indexing vocabulary (http://www.nlm.nih.gov/mesh/2003/MBrowser.html)
used for Medline (http://Pubmed.gov). Some recent examples of these changes
in biology and chemistry are described, along with how you can keep up with
and use them for better search results. Three easy steps to better Medline
searches will be presented by an NLM expert. A balance of widening (with OR
terms) and narrowing (with NOT terms) can be facilitated with three tools
provided by PubMed: Details, Display Citation and MeSH Browser.
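
The balance of widening and narrowing described above can be sketched as a small query builder. This is only an illustration, not an NLM tool: the function name `build_query` and the example terms are hypothetical, and the only search syntax assumed is the uppercase Boolean operators OR and NOT plus quoted phrases.

```python
def build_query(any_of, none_of=()):
    """Combine search terms into a Boolean query string.

    any_of  -- synonyms joined with OR (widens the search)
    none_of -- terms excluded with NOT (narrows the search)
    """
    if not any_of:
        raise ValueError("at least one search term is required")

    def quote(term):
        # Quote multi-word terms so they are searched as phrases.
        return f'"{term}"' if " " in term else term

    query = "(" + " OR ".join(quote(t) for t in any_of) + ")"
    for term in none_of:
        query += " NOT " + quote(term)
    return query


if __name__ == "__main__":
    # Hypothetical example terms, chosen only to show the shape of the query.
    print(build_query(["aspirin", "acetylsalicylic acid"], ["veterinary"]))
```

The resulting string can be pasted into the search box, and a tool such as Details can then be used to check how the terms were actually interpreted.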
CINF 5:
Steric and
electronic requirements of enzyme reactions
Johann Gasteiger1, Martin Reitz1, and Oliver
Sacher2. (1) Computer-Chemie-Centrum and Institute of Organic
Chemistry, University of Erlangen-Nuremberg, Naegelsbachstr. 25, Erlangen
91052, Germany, Fax: +49-9131-85 26566, Gasteiger@chemie.uni-erlangen.de,
(2) Molecular Networks GmbH
Abstract
Genes express proteins, among them enzymes, that govern biochemical reactions. A more
detailed understanding of these reactions requires an analysis of how the
substrates fit into the enzymes and of the physicochemical effects
influencing the bond breaking and making in enzyme reactions. In order to
advance such studies we have built a database of biochemical pathways that
represents chemical structures and reactions on the atomic level giving
access to each atom and bond of the substrates of enzyme reactions. This
database allows the study of transition state hypotheses of enzyme
reactions. Furthermore, the analysis of the physicochemical effects
operating at the reaction site allows a classification of enzyme reactions
that goes beyond the traditional EC code for enzymes.
CINF 6:
Linking chemical
scaffolds to gene families to help elucidate molecular mechanisms
Chihae Yang1, Paul E. Blower1, Kevin Cross1,
Glenn Myatt1, Wolfgang Sadée2, and Ying Huang2.
(1) Leadscope, Inc, Columbus, OH 43212, Fax: 614-675-3732, cyang@leadscope.com,
(2) College of Medicine and Public Health, The Ohio State University
Abstract
The significant investment in “omics” technologies and the large amount of
information generated by these new paradigms have not yet led to dramatic
productivity increases in the drug discovery process. Linking biology to
chemistry still remains the bottleneck. To link the vast amount of genomics
information to small molecule discovery, we previously correlated the gene
expression profiles of 60 NCI cancer cell lines to compound activity
patterns of the same cell lines, resulting in many possible gene-compound
pairs. In this paper, genes in specific biological process pathways were
correlated with active chemical scaffolds, whose associations were used to
build molecular hypotheses. Gene hierarchical classifications, based on
biological process, were used to differentiate gene expression patterns of
various cell types. The results from the gene hierarchy analysis are
compared to other computational methods for extracting subsets of
differentiating genes. This methodology allows us to extend our hypotheses
from individual gene-compound pair mappings to a systems approach of linking
gene families to compound scaffolds.
CINF 7:
Streamlining drug
discovery informatics: Accelerating the flow from gene to structure to
pre-clinical candidate
Dean R. Artis, Informatics, Plexxikon Inc, 91 Bolivar Drive,
Berkeley, CA 94710, Fax: (510) 548-4785, drartis@plexxikon.com
Abstract
Plexxikon’s Scaffold-Based Drug Discovery™ platform relies on a unique
combination of low-affinity biochemical screening of a proprietary
target-neutral compound library and structural characterization via
high-throughput x-ray crystallography, coupled to a powerful infrastructure
for computational analysis and design that bridges traditional
bioinformatics and cheminformatics. Use of these integrated systems has
resulted in the identification of many novel chemical starting points with
facile synthetic approaches and a target structure-directed optimization
path. This has enabled the efficient synthesis of lead compounds with
compelling bioactivity against proteins of interest in the kinase,
phosphodiesterase and nuclear receptor families. Examples highlighting the
role of Informatics approaches in Plexxikon’s efforts will be discussed,
including efforts leading to the rapid development of a new class of
anti-diabetic compounds with excellent potency, selectivity, pharmaceutical
properties and in vivo efficacy.
CINF 8:
Linking
bioinformatics to cheminformatics in biological networks
Barbara A. Eckman, Life Sciences, IBM, 1475 Phoenixville Pike, West
Chester, PA 19380, baeckman@us.ibm.com, and Julia E. Rice, IBM Almaden
Research Center
Abstract
As high-throughput biology generates large volumes of data about the
“parts list” of living organisms, the need grows for robust,
efficient systems to manage metabolic and signaling pathways, chemical
reaction networks, protein interaction networks, etc. Network data is
arguably best represented as graphs, which are not well supported by
standard relational database management systems. IBM Research is extending
DB2 with advanced graph operations, to support such queries as: “Find
all proteins related to protein A (i.e. within a given path length of A) in
a protein interaction graph, and retrieve related assay results and compound
structures.” “Find all pathways where compound x inhibits or slows a
reaction, and retrieve Gene Ontology classifications for all proteins
involved in the reaction.” “Find a subgraph of a large pathway that has
the same structure and involves the same enzyme as the subgraph that I have
circled, and retrieve associated protein and compound annotations.”
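
The first query above ("all proteins within a given path length of A") amounts to a bounded breadth-first search. The sketch below is a plain in-memory illustration, not the DB2 graph extension itself: the function name, the adjacency-mapping representation and the toy interaction data are all assumptions made for the example.

```python
from collections import deque


def within_path_length(graph, start, max_hops):
    """Return {node: hops} for nodes reachable from start in at most max_hops edges."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distances[node] == max_hops:
            continue  # do not expand beyond the requested path length
        for neighbor in graph.get(node, ()):
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    del distances[start]  # the start protein is not "related to" itself
    return distances


# Hypothetical undirected protein interaction graph as an adjacency mapping.
INTERACTIONS = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A"],
    "D": ["B", "E"],
    "E": ["D"],
}

if __name__ == "__main__":
    print(within_path_length(INTERACTIONS, "A", 2))
```

In a database setting the returned protein identifiers would then be joined against assay and compound tables, which is the part a relational system already handles well.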
CINF 9:
Technical and
people disconnects hindering knowledge exchange between chemistry and
biology
Christopher A. Lipinski, Exploratory Medicinal Sciences, Pfizer
Global Research and Development, Groton Laboratories (retired), Eastern
Point Road, mail stop 8200-36, Groton, CT 06340, Fax: 860-715-3149,
christopher_a_lipinski@groton.pfizer.com
Abstract
Both technical and people factors hinder knowledge exchange between
chemistry and biology. For both disciplines software effort is expended on
data with little value. For example, capture and subsequent analysis of
large volumes of primary HTS data is difficult because of the very high
noise factor and hence is not very useful. Public access to primary
literature data is very different between the disciplines. Much of
searchable biology data is in the public domain while most of chemistry
structural data is not. Batch-mode data searching is feasible in biology, but
in chemistry batch-mode searching capability is primitive. Chemistry's need for
batch-mode chemical structure searching is poorly met, for example by CAS
SciFinder, a leading software search tool. The time
course of data capture and the very different complexity levels of gene and
protein structure representation compared to chemical structure
representation contribute to this issue. On the people side, software lags
in capturing high-level metadata, i.e. why decisions are made. Metadata
capture is complicated by people issues, particularly those between chemists
and biologists. Discipline based disconnects occur distressingly often and
are frequently overlooked as a cause of lost productivity. Many of the
problems between chemists and biologists are directly traceable to
differences in training and hence in attitudes and outlook. Most synthetic
chemists are math-averse, and any communication to chemists that relies
on mathematical equations will be underappreciated or even ignored.
Chemists are superb at pattern recognition but biologists are not. This
causes confusion and conflict with biology when a medicinal chemist makes a
judgment in just a few seconds as to the quality of a compound structure.
Expert systems that could capture the pattern recognition skills of
medicinal chemists are badly needed.
CINF 10:
Relating chemical
and biological space: An in-silico platform technology approach to
accelerate the discovery of novel medicinally relevant small molecules
Stephan C. Schürer, Director, Content Development, Sertanty, Inc,
1735 N. First Street, Suite 102, San Jose, CA 95112, Fax: 408 487 4011,
sschurer@sertanty.com
Abstract
In the post-genomic era of drug discovery, a promising approach appears to
be the systematic exploration of target families. It is critical in this
process to utilize all available and relevant SAR data and consider various
synthetic methodologies to most efficiently arrive at novel molecules that
have desired properties and are also amenable to further optimization.
Sertanty, Inc. has developed a discovery informatics platform – LUCIA™
– that facilitates archival, sharing, integration, and exploration of
synthetic methods and biological activity data. Using LUCIA, novel small
molecules can be generated in-silico and prioritized against computationally
efficient eScreens™ and ADMET models. eScreens are derived from
an integrated gene family-wide SAR knowledge base and can improve as new
experimental data is generated. Successful application of the technology has
resulted in the identification of novel ABL Kinase inhibitors in a four
month project and offers promise in both accelerating and enriching the
success-rate of collaborative hit identification and lead optimization. Our
next generation ChIP (Chemical Intelligence Platform) system explores
chemical space in-silico based on forward analysis of synthetic pathways.
Utilizing dynamic transforms that are generated from common representations
of chemical reactions, ChIP prospectively “mix-n-matches” compatible
synthetic strategies to generate novel compositions of matter with probable
improvements in potency, selectivity and ADMET profiles.
CINF 11:
Critical
assessment of chemo- and bio-informatics applications development, or,
"It's the infrastructure, stupid"
Doron Chema, Department of Medicinal Chemistry, Hebrew University of
Jerusalem, School of Pharmacy, Jerusalem 91120, Israel, doron_chema@md.huji.ac.il
Abstract
The increasing need for bridging chemo- and bio-informatics is an excellent
opportunity to reassess the development of applications in these fields and
the expected consequences of bridging together these disciplines.
Examination of the current situation may lead to the conclusion that both
fields currently suffer from a software crisis. This crisis involves several
aspects of the application development process. The data format
standardization problem is a well-known aspect of this crisis, as many
similar file and database formats co-exist, sharing similar goals. Another
aspect of this crisis may be called “too many tools for too small
missions.” Even a modest project usually requires
developers to manage several code environments, each of which was designed
and implemented with specific scientific goals in mind. Ironically, the
existence of many niche tools effectively causes a lack of appropriate
development tools. As a result, much of the development work is often
done from scratch, causing a huge waste of resources. It
is our belief that these major difficulties, which occur frequently
in both fields, are already causing major bottlenecks that have
even higher potential to block or delay any significant progress of the
integrated field. In this talk an approach for overcoming these barriers at
the infrastructure level will be described, followed by an introduction to a
new infrastructure technology.
CINF 12:
Cross-discipline
analysis made possible with data pipelining
J.R. Tozer, SciTegic, Inc, 9665 Chesapeake Dr. #401, San Diego, CA
92123, Fax: 858 279 8804, jtozer@scitegic.com
Abstract
While cheminformatics and bioinformatics use completely different data
formats and analysis tools, the data pipelining approach makes it possible
to apply them together. Chemical compound structures and activities can be
processed in the same computing environment that analyzes gene expression
profiles or protein sequences. We will discuss some interesting research
questions that can only be addressed by the coordinated analysis in
bioinformatics and cheminformatics (e.g., clustering gene targets using the
correlation of their expression levels in a series of cells with the
biological activity on those cells of a set of test compounds).
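
The example question above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a SciTegic pipeline: the data are invented, Pearson correlation is one plausible choice of measure, and grouping each gene under its most strongly correlated compound is a crude stand-in for a real clustering method.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def correlation_profiles(expression, activity):
    """Map each gene to its correlation with every compound, across cell lines."""
    return {
        gene: {cpd: pearson(levels, act) for cpd, act in activity.items()}
        for gene, levels in expression.items()
    }


def group_by_best_compound(profiles):
    """Group genes by the compound whose activity they track most strongly."""
    groups = {}
    for gene, row in profiles.items():
        best = max(row, key=lambda cpd: abs(row[cpd]))
        groups.setdefault(best, []).append(gene)
    return groups


# Invented values: one number per cell line, same cell-line order in both tables.
EXPRESSION = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [4.0, 3.0, 2.0, 1.0],
    "geneC": [1.0, 3.0, 2.0, 4.0],
}
ACTIVITY = {
    "cpd1": [2.0, 4.0, 6.0, 8.0],
    "cpd2": [1.0, 3.0, 2.0, 5.0],
}

if __name__ == "__main__":
    print(group_by_best_compound(correlation_profiles(EXPRESSION, ACTIVITY)))
```

The point of the sketch is that the expression table (a bioinformatics artifact) and the activity table (a cheminformatics artifact) only need a shared cell-line axis to be analyzed together.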
CINF 13:
Informatics
integration at Arena Pharmaceuticals
Gareth Jones, Arena Pharmaceuticals, Inc, 6166 Nancy Ridge Drive, San
Diego, CA 92121, Fax: 8584537210, gjones@arenapharm.com
Abstract
The development of platform-independent web-based computing allows ordinary
users unprecedented access to corporate information. At Arena we have
developed a web-based informatics system that allows all employees access to
chemical, screening, genomic and gene-expression data. This system was
designed specifically to allow users with little or no computing experience
the ability to browse, analyze, update and edit chemical and biological
data. This results in real-time distribution of experimental data and allows
on-the-fly analysis and search of information. Additionally, communication
between disparate groups working on the same project has been greatly
facilitated.
The data system is based on a three-tier architecture with an Oracle database at the back end. The middle tier comprises a web server with Perl CGI and Java programs. Extensive use has been made of Java applets in the client web browser. A separate Linux cluster provides cheminformatics services to the middle tier, which are accessed using the XML-RPC protocol.
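
To give a feel for the XML-RPC link between the middle tier and the cheminformatics cluster, the sketch below encodes and decodes one call using Python's standard `xmlrpc.client` module. The method name `cheminfo.similarity_search` and its arguments are hypothetical; the abstract does not describe the actual service interface, and the original system used Perl and Java rather than Python.

```python
import xmlrpc.client

# Hypothetical remote method and arguments; the real service API is not described.
METHOD = "cheminfo.similarity_search"
ARGS = ("c1ccccc1O", 0.8)  # query SMILES string and similarity cutoff

# Encode the call as the XML document an XML-RPC client would POST to the server.
request_xml = xmlrpc.client.dumps(ARGS, methodname=METHOD)

# Decode it again, as the receiving server would before dispatching the call.
decoded_args, decoded_method = xmlrpc.client.loads(request_xml)

if __name__ == "__main__":
    print(request_xml)
    print(decoded_method, decoded_args)
```

Because the payload is plain XML over HTTP, the middle tier and the cluster can be written in different languages, which fits the mixed Perl and Java environment described above.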
CINF 14:
Systematic
bioactivity classification of ligands onto a protein target ontology:
Application for library design and virtual profiling of a compound
collection
Mark A. Hermsmeier1, Dora Schnur2, and Bradley
C. Pearce1. (1) New Leads Chemistry, Bristol-Myers Squibb, P.O.
Box 4000, Princeton, NJ 08543, Fax: 609-252-7446, (2) Computer Assisted Drug
Design, Bristol-Myers Squibb
Abstract
Profiling the in-silico biological content of our screening deck and the
ability to create target class libraries are greatly facilitated using a
data platform that integrates ligand databases and a protein target
ontology. The data platform that has been developed integrates the
non-proprietary Gene Ontology from the GO Consortium with three commercially
available ligand databases. The structures in these ligand databases have in
turn been linked to the screening compounds by atom-pair similarity. The
activity associations and similarity results are stored in a relational
database for rapid retrieval of results. A web interface has been deployed
that allows browsing the Protein Target Ontology and drilling down to view
associated ligands in the commercial databases and similar structures in the
screening deck. The data platform also allows rapid in-silico profiling of
the screening compounds.
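
The linking step can be sketched as a set comparison. The example assumes each structure has already been reduced to a set of atom-pair descriptors; the descriptor strings, compound identifiers and the cutoff are invented, and the Tanimoto coefficient is an assumed similarity measure, since the abstract does not say which formula was used.

```python
def tanimoto(a, b):
    """Tanimoto coefficient |A & B| / |A | B| of two descriptor sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def link_to_ligands(screening, ligands, cutoff):
    """For each screening compound, list reference ligands at or above the cutoff."""
    links = {}
    for cpd_id, cpd_pairs in screening.items():
        hits = [
            (lig_id, tanimoto(cpd_pairs, lig_pairs))
            for lig_id, lig_pairs in ligands.items()
        ]
        # Keep only sufficiently similar ligands, most similar first.
        links[cpd_id] = sorted(
            (h for h in hits if h[1] >= cutoff), key=lambda h: -h[1]
        )
    return links


# Invented atom-pair descriptors of the form "atom-distance-atom".
LIGANDS = {
    "lig1": {"C-2-N", "C-3-O", "N-4-O"},
    "lig2": {"C-2-C", "C-5-S"},
}
SCREENING = {"cpd1": {"C-2-N", "C-3-O", "C-2-C"}}

if __name__ == "__main__":
    print(link_to_ligands(SCREENING, LIGANDS, cutoff=0.4))
```

Storing the resulting (compound, ligand, similarity) rows in a relational table is what makes the later drill-down from a target class to similar screening compounds a simple lookup.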
CINF 15:
Proteomica™ –
An integrated system for analysis of biological and chemical data
Michael Farnum1, Sergei Izrailev1, and Dimitris
Agrafiotis2. (1) 3-Dimensional Pharmaceuticals, Inc, 665 Stockton
Dr, Exton, PA 19341, Fax: 610-458-8249, michael.farnum@3dp.com, (2)
Research Informatics, 3-Dimensional Pharmaceuticals, Inc
Abstract
In recent years, there has been an explosion in the amount of chemical and
genomic data. Chemical information has been driven by high-throughput
screening and analysis of large libraries of chemical compounds, both
physical and virtual, while genomic information has been generated through
full genome sequencing and annotation as well as by DNA microarray and other
high-throughput experiments. The number of protein crystal structures
deposited in the Protein Data Bank has also grown at an unprecedented rate.
Much effort has been made to relate the structure and properties of chemical
compounds to the structure and function of genes and proteins. However,
chemical and protein sequence information has been largely analyzed
separately, in part because very few databases and software packages provide
the connectivity required for analyzing and browsing the data
simultaneously. Proteomica™ is an architecture designed to integrate both
types of information. It is leveraged by advanced dimensionality reduction
techniques and provides the capability to visualize similarity in both the
property space of small molecules and the sequence space of target proteins.
Proteomica™ enables scientists to ask iterative questions about
biochemical experiments by combining information from external and in-house
sources. This presentation will demonstrate both the principles and
implementation of the system.
CINF 16:
Fedora: Federated
access to chemical and biological data
Scott Dixon, Vera Povolna, and David Weininger, Metaphorics, 441 Greg
Ave, Santa Fe, NM 87501, scott@metaphorics.com, vera@metaphorics.com
Abstract
Fedora is a technology which enables the rapid development of special
purpose HTTP servers designed for the analysis and integration of biological
and chemical information. These servers, which contain seemingly disparate data,
can communicate with one another via a web browser and provide the
capability to mine data for complex relationships. The Fedora servers
include a metabolic pathway network (Empath), Protein-Ligand Association
Network (Planet), Traditional Chinese Medicines (TCM), the World Drug Index
(WDI), and others.
CINF 17:
Case study of IP
information management at a small pharmaceutical company
Susan Wollowitz, Wollowitz Associates, 455 Moraga Rd, Suite C,
Moraga, CA 94556, Fax: 925-247-1289, sue@wollowitz.com
Abstract
A case study will be presented of how a small pharmaceutical company
addressed their intellectual property information acquisition and document
management needs. The situation was initially evaluated, including the demand
for IP creation and prosecution, the current capabilities and the
operational constraints. Issues identified were a need for an improved
document tracking system, better access to patent information and an ability
to proactively monitor the competitive landscape. The presentation will
discuss the options considered and selected as well as a retrospective
evaluation of the decision success.
CINF 18:
Low-income patent
management
John Santacruz, Division of Small Chemical Businesses, 1263 Fulton
Street, Rahway, NJ 07065, santacr2@aol.com
Abstract
Patent management on a low-income budget is a growing concern for Small
Chemical Businesses due to limited resources and multitasking of personnel.
Two methods of legal representation that significantly reduce the annual
costs of patent management will be discussed. The two methods will be
compared to the traditional method of private law firm representation. The
literature and laws in this area will be briefly reviewed.
CINF 19:
Minimizing
intellectual property cost - maximizing intellectual property return
Gianna Arnold, and Corinne Marie Pouliquen, Epstein Becker and Green,
1227 25th Street, NW, Suite 700, Washington, DC 20037-1175, Fax:
202-296-2882, garnold@ebglaw.com
Abstract
Today’s small business owner faces a vast array of decisions related to
the appropriate protection, utilization, and management of intellectual
assets. This discussion will focus on tools and strategies to maximize the
use of intellectual property dollars, by minimizing actual cost, and by
maximizing return. Topics addressed include establishing a scientific
advisory board; establishing process and screening criteria to
obtain/maintain patents; promoting and easing the burden of invention
disclosure; reducing costs associated with use of outside counsel;
capitalizing on intellectual property as a business asset; and aligning
intellectual property resources with corporate strategy.
CINF 20:
Patent searching
for small chemical businesses
Barbara Hurwitz, Barbara Hurwitz, consulting, 36 Waverly Street,
Portland, ME 04103, Fax: 207-228-6418
Abstract
Patent searches are run for small chemical companies either directly for the company or through the company’s outside counsel. Using three small businesses as case studies, we can see how interacting with these small companies differs from working with the staff of a large chemical and pharmaceutical company.
CINF 21:
Information
sources for small companies
Sandy Burcham, Service Is Our Business, Inc, 111 Lincoln Terrace,
Norristown, PA 19403-3317, Fax: 610-630-0863, cass123@earthlink.net
Abstract
This paper will discuss the various information sources available to small
companies, to help them determine how best to spend their
resources.
CINF 22:
Comparison of free
Internet-based intellectual property (IP) tools with contracting IP research
to third party information professionals
Michael I. Montembeau, and Gerri B. Potash, Nerac, Inc, 1 Technology
Drive, Tolland, CT 06084, Fax: 860-872-7856, mmontembeau@nerac.com
Abstract
Chemical businesses, whether large or small, have an enormous need for
intellectual property information. This need is particularly burdensome for
small chemical businesses which often cannot afford to hire full-time
information staff, let alone full-time patent information staff. As a
result, small chemical businesses are left to appoint a lead IP
person, who must juggle new IP duties with research tasks and
other responsibilities.
This presentation will: 1) outline the tools and capabilities of the free internet-based intellectual property resources; 2) compare the internet-based resources with those of a third-party information provider, such as Nerac.com; and 3) discuss the advantages and disadvantages of each resource and how one would make effective use of these resources.
This presentation will also describe how chemical businesses can benefit, not only from the Intellectual Property resources at Nerac, but also from the use of the extensive chemical and engineering related databases Nerac has compiled as a research and analysis tool.
CINF 23:
Professional tools
and services supporting the small to medium enterprise
Anthony J. Trippe, Science IP/Chemical Abstracts Service, 2540 Olentangy
River Rd., Columbus, OH 43210, atrippe@cas.org, and Rebecca A. Wolff,
Product Marketing Management, Chemical Abstracts Service, 2540 Olentangy
River Road, Columbus, OH 43202-1505, Fax: 614-461-7149, rwolff@cas.org
Abstract
Employees at small to medium enterprises must wear many different hats. With
each “hat” that they wear, they also strive to optimize their time,
present a professional image, and add value to their work. CAS provides a
number of tools and services that can assist the multi-hat wearer to not
only meet these needs, but to also meet the needs of both their internal and
external customers.
This presentation will explore how to use the latest STN software to:
1) take advantage of the patent content available on STN, 2) analyze the results to meet business critical needs, and 3) create professional-looking reports and tables.
For smaller organizations in particular, without the benefit of a sizable staff of information professionals, certain projects may require additional expertise or outside assistance to meet a critical deadline. For these situations, CAS has created Science IP, the CAS Search Service. This function is staffed with searching and analysis experts who can assist on a project-by-project basis. During this presentation, examples of searches with legal ramifications will be discussed and details will be provided on the advantages of working with Science IP on these types of requests.
CINF 24:
The Questel-Orbit
alternative for chemical information
Elliott Linder, Questel*Orbit, Inc, 7925 Jones Branch Drive, Fax:
703.873.4701, ELinder@questel.orbit.com, and Joseph M Terlizzi,
Questel-Orbit, 8000 Westpark Drive, jterlizzi@questel.orbit.com
Abstract
For over 25 years, Questel·Orbit has offered information specialists an
extensive collection of online patent databases containing chemical
information. For broad subject searching, the European, International, and
US classifications in our exclusive PlusPat database can be used, with easy
lookup using the ECLA and USPCL dictionary files. Narrower searching can be
conducted using the US, EP, and PCT full-text databases. For specific
chemical searching, our exclusive Merged Markush Service (MMS) for chemical
structure searching is available, as are codes and indexing in databases
produced by Derwent, IFI, CAS, INPI, and others. Special features allow the
creation of “super” display records composed of fields from any database
on the system. The standardization of patent numbers system-wide makes
cross-file searching for complementary information simple. Built-in
statistical analysis tools are easy to use and valuable for competitive
intelligence. This presentation will review how the techniques and features
outlined above are applicable for small chemical businesses.
CINF 25:
Instruments on the
Grid: UK national crystallography grid service
Jeremy G. Frey, Chemistry, University of Southampton, Department of
Chemistry, Highfield, Southampton SO17 1BJ, United Kingdom, Fax: +44 23 8059
3781, j.g.frey@soton.ac.uk
Abstract
We will describe the processes and infrastructure needed to develop and
deploy a grid service for access to and interaction with the UK EPSRC
National Crystallography Service (NCS) developed as part of the CombeChem
e-Science Pilot Project and with the assistance of the Centre of Excellence
in Combinatorial Chemistry, all largely based at the University of
Southampton, UK. Special consideration will be given to a discussion of the
sample tracking database and the implementation needed to run this national
service, the implications for the security of the service, and the system
employed to meet these requirements. The user interface, archiving methods
and notification systems will also be described along with the results of
the initial users' experience.
CINF 26:
Computational
science and engineering online: A web-based grid-computing environment for
research and education in computational science and engineering
Thanh N. Truong, Department of Chemistry, University of Utah, 315 S,
1400 E, Room 2020, Salt Lake City, UT 84112, Fax: 801-581-4354, truong@chemistry.chem.utah.edu
Abstract
We present the development of an integrated extendable web-based simulation
environment called Computational Science and Engineering On-line (CSEO) that
allows computational scientists to perform research using state-of-the-art
tools, query data from personal or public databases, discuss results with
colleagues, and access resources beyond those available locally from a web
browser. Currently, CSEO provides an integrated environment for multi-scale
modeling of complex reacting systems. A unique feature of CSEO is its
framework, which allows data to flow from one application to another in a
transparent manner. An example is presented to show how results
from fundamental quantum chemistry simulations are used to calculate
thermodynamic and kinetic properties of a chemical reaction, which
subsequently are used in the simulation of a combustion reactor. Advantages,
disadvantages, and future prospects of a web-based simulation approach are
then discussed. CSEO can be accessed at http://cseo.net.
![]()
CINF 27:
Grid computing:
How applications are finally catching up to the technology
Chris Crafford, Engineering, United Devices, 12675 Research Blvd.,
Bldg. A, Austin, TX 78759, Fax: 512-331-6235, chris@ud.com, and Seetharamulu
Peddaiahgari, Director, Life Sciences Applications, United Devices
Abstract
The completion of the human genome has transformed drug discovery and
molecular targeting, vastly increasing the potential number of druggable
targets as well as information about their possible binding sites. Computer
power is essential to identifying and learning more about these targets.
With the appropriate grid solution, researchers can explore drug actions,
speed the development cycle and reduce costs, without sacrificing precision.
Several research organizations and top pharmaceutical companies are already
using the technology to gain a competitive edge. Multiple case studies will
be presented illustrating how researchers, with the help of top application
providers, are using grid computing now to achieve success.
![]()
CINF 28:
Virtual screening
using grid computing
W Graham Richards, Central Chemistry Lab, University of Oxford, South
Parks Road, Oxford, OX1 3QH, United Kingdom, graham.richards@chem.ox.ac.uk
Abstract
The screen saver project, a collaboration between the Chemistry Department at the
University of Oxford, United Devices Inc and Accelrys Inc, now involves some
2.5 million PCs in over 220 countries and has provided more than 250,000
years of CPU time: an effective 100 teraflop facility. Such power permits
the virtual screening of billions of drug-like molecules against defined
protein targets within days or weeks. A review of the project, the
results obtained so far, and future opportunities will be presented.
![]()
CINF 29:
OpenMolGRID, a
Grid-based large-scale drug design system
Laszlo Urge1, Ákos Papp1, István Bágyi1,
Géza Ambrus2, and Ferenc Darvas1. (1) ComGenex Inc,
33-34 Bem rpk, Budapest, H-1027, Hungary, Fax: +361-214-2310, laszlo.urge@comgenex.hu,
(2) RecomGenex, Ltd
Abstract
Pharmaceutical companies face the challenge that modern drug
discovery requires precise "high-throughput" in silico systems
that can not only handle millions of structures but also give
accurate predictions for the requested properties. In addition,
mergers in the pharmaceutical industry demand the integration of
geographically distributed information and computation resources. These
challenges make the use of GRID systems indispensable. As a consequence,
chemical applications developed for traditional environments have to be
redesigned to meet the requirements of this new technology. OpenMolGRID is
going to be one of the first realizations of the GRID technology in drug
design. The system is designed to build forward- and reverse-QSAR models,
and generate novel structures with favorable properties. The lecture details
how traditional chemical IT tools are implemented on the GRID to solve
large-scale library design scenarios. The development of OpenMolGRID is
partly funded by the European Commission (IST-2001-37238).
![]()
CINF 30:
BioSimGRID: A
distributed database for biomolecular simulations
Jonathan W Essex1, Kaihsu Tai2, Stuart Murdock1,
Muan Hong Ng3, Bing Wu4, Steve Johnston3,
Hans Fangohr3, Paul Jeffreys4, Simon Cox3,
and Mark Sansom2. (1) School of Chemistry, University of
Southampton, Highfield, Southampton SO17 1BJ, United Kingdom, Fax: +44 (0)23
8059 3781, jwe1@soton.ac.uk, (2) Department of Biochemistry, University of
Oxford, (3) e-Science Centre, University of Southampton, (4) e-Science
Centre, University of Oxford
Abstract
Biomolecular simulations provide data on the conformational dynamics and
energetics of complex biomolecular systems. We aim to exploit the Grid
infrastructure developing in the UK to enable large scale analysis of the
results of such simulations. The BioSimGRID project (www.biosimgrid.org)
will provide a generic database for comparative analysis of simulations of
biomolecules of biological and pharmaceutical interest. The system will have
a service-oriented computing model using Grid-based Web service technology
to deliver analysis. Data mining services will be provided for the
biomolecular simulation and structural biology communities, using a Python
scripting environment. To address the security problem of the heterogeneous
BioSimGRID environment, a Grid certificate-based and a user/password-based
authentication mechanism will be integrated across the system. The back-end
of BioSimGRID is based on a relational database, with appropriate indexing
to optimize performance of the analysis package.
![]()
CINF 31:
Comb-e-Chem:
GRID-enabled chemical crystallography and a new opportunity for structural
chemistry
Michael B. Hursthouse, Department of Chemistry, University of
Southampton, Southampton SO17 1BJ, United Kingdom, Fax: 44-2380-596723,
M.B.Hursthouse@soton.ac.uk
Abstract
We are exploring the feasibility of an e-Science approach to provide an
integrated, GRID-enabled, Chemical Structure and Property Environment,
incorporating a co-ordinated high-throughput crystal structure determination
and property measurement capability, with distributed structure and property
calculations and data-base mining. We are developing new software for automated
pattern searching in crystal structures, with a view to learning more about
crystal structure assembly, polymorphism and materials properties. In a
related E-Bank project, we are developing procedures for automated archiving
and dissemination of fundamental data, subsequent processing and
calculations, and the derived knowledge, so that publications in which the
new information is assessed and presented are not compromised by the
need to carry the data with them. This presentation will report and review the
status of these activities.
![]()
CINF 32:
Semantic Grid
computing - the WorldWideMolecularMatrix
Yong Zhang1, Robert C. Glen2, Peter Murray-Rust3,
Henry S. Rzepa4, and Joe A Townsend2. (1) Unilever
Centre for Molecular Sciences Informatics, University of Cambridge,
Lensfield Road, Cambridge, United Kingdom, yz237@cam.ac.uk, (2) Department
of Chemistry, Unilever Centre for Molecular Science Informatics, (3)
Unilever Centre for Molecular Informatics, University of Cambridge, (4)
Chemistry, Imperial College
Abstract
The Semantic Web is Tim Berners-Lee's vision of knowledge-based computing for the Web. We have shown how this can be adapted to chemistry. Our implementation uses XML-CML for molecules and properties and the new IChI as a unique key calculated directly from the connection table. A molecule can be precisely differentiated from any other and retrieved by conventional database methods.
The NCI database has ca 250,000 molecules which we converted into CML using openbabel. These are stored in a native XML database, Xindice, and searched by the XPath language. We can retrieve molecules within 50 milliseconds.
Molecular properties were calculated using MOPAC2003, using Condor and the spare CPU time on 24 PCs. Times per molecule varied from 0.5 seconds to 500,000 seconds; the calculations took 4 months.
The XML results are Openly available on our WorldWideMolecularMatrix, WWMM. A chemist submits a molecule. If its properties already exist they are returned; otherwise the computation is run. For new molecules the results are provided through an RSS system (CMLRSS).
The system is a peer-to-peer Grid for chemical information and computation. The software can be downloaded and we invite other groups to run servers with varied functions so a Semantic Grid for chemistry becomes possible.
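The return-if-present, compute-if-absent behaviour described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the WWMM software: a small in-memory XML tree stands in for the Xindice store, the element and attribute names are invented rather than taken from the CML schema, the `id` values stand in for the identifier key, and `compute_properties` is a hypothetical placeholder for a real calculation.

```python
import xml.etree.ElementTree as ET

# Toy stand-in for the XML store. Element names, keys and values are
# invented for illustration and do not follow the real CML schema.
STORE = ET.fromstring(
    "<molecules>"
    "<molecule id='key-a'><property name='demo_value'>1.0</property></molecule>"
    "</molecules>"
)


def compute_properties(key):
    """Hypothetical placeholder for a long-running property calculation."""
    return {"demo_value": "0.0"}


def get_properties(key):
    """Return stored properties for key, computing and storing them if absent."""
    # ElementTree supports a limited XPath subset, including attribute predicates.
    node = STORE.find(f"./molecule[@id='{key}']")
    if node is None:
        node = ET.SubElement(STORE, "molecule", {"id": key})
        for name, value in compute_properties(key).items():
            ET.SubElement(node, "property", {"name": name}).text = value
    return {p.get("name"): p.text for p in node.findall("property")}


known = get_properties("key-a")   # already stored, returned directly
fresh = get_properties("key-b")   # not stored, so it is computed and cached
```

A second call with `"key-b"` finds the cached entry and does not trigger another calculation.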
We thank the DTI and Unilever PLC.
![]()
CINF 33:
Adaptive
informatics infrastructure for multi-scale chemical science
James D. Myers1, Larry Rahn2, David Leahy2,
Carmen M. Pancerella2, Gregor von Laszewski3, Branko
Ruscic4, and William H. Green Jr.5. (1) Collaboratory
Group Leader, Battelle / Pacific Northwest National Laboratory, Battelle
Blvd. MS K1-87, Richland, WA 99352, Fax: 509-375-6631, jim.myers@pnl.gov,
(2) Sandia National Laboratories, (3) Mathematics and Computer Science
Division, Argonne National Laboratory, (4) Chemistry Division, Argonne
National Laboratory, (5) Department of Chemical Engineering, Massachusetts
Institute of Technology
Abstract
The Collaboratory for Multi-scale Chemical Sciences (CMCS, cmcs.org) is
enabling the flow of information across physical scales and scientific
disciplines ranging from subatomic quantum chemistry to predictive
simulations of chemical processes such as combustion. CMCS is using advanced
collaboration and metadata-based data management technologies to develop a
portal providing distributed research support, community interactions, and
data discovery, management, and annotation capabilities. The portal assists
in documenting and browsing data pedigree and in communicating dependencies
between data produced at one scale and computations using it at the next. A
variety of standards-based mechanisms for extracting metadata from files,
translating between schema, converting data formats, and integrating
external applications (such as Active Thermochemical Tables) are being
developed to minimize the work required to adopt CMCS capabilities. These
capabilities are being piloted by involving key national chemistry resources
(data and software) and by supporting distributed groups performing
informatics-based chemical research in combustion science.
![]()
CINF 34:
The application of
distributed computing to computer simulations
Jonathan W Essex1, Christopher J. Woods1,
Adrian P. Willey1, Luca A. Fenu1, Andrew C. Good2,
Andrew R. Leach3, Richard A. Lewis4, and Jeremy G.
Frey1. (1) School of Chemistry, University of Southampton,
Highfield, Southampton SO17 1BJ, United Kingdom, Fax: +44 (0)23 8059 3781,
jwe1@soton.ac.uk, (2) Structural Biology and Modeling, Bristol-Myers Squibb,
(3) Computational Chemistry and Informatics, GlaxoSmithKline Research and
Development, (4) Lilly Research Centre
Abstract
Distributed computing is a very popular, and potentially very powerful,
approach for accessing large amounts of computational power. Under the
umbrella of the Comb-e-Chem project, we have examined both freely available
and commercial distributed computing software. In this paper, our
experiences will be described. The performance of coarsely parallel
computations, such as protein-ligand docking, and more tightly coupled
replica-exchange molecular dynamics computer simulations will be assessed.
Issues of security will also be discussed, and in particular how security
determines the availability and utility of computers within a large
organisation.
![]()
CINF 35:
Virtual Research
Parks enable multi-organizational collaboration
Gary G Benesko, Life Sciences, IBM, 755 Cypress Rd., St. Augustine,
FL 32086, Fax: 419-735-6288, benesko@us.ibm.com
Abstract
A Virtual Research Park (VRP) is a secure, state-of-the-art, Web-based
research environment that supports and facilitates joint R&D,
collaboration, and commercial activities among Life Science Communities
whose boundaries extend beyond any one enterprise or geography. Each
Community can consist of multiple related organizations and individuals
united by common interests, such as:
- Accelerating innovation using an advanced set of collaboration tools across an extended team
- Leveraging external expertise through Virtual Consulting services
- Streamlining the R&D process through access to Best Practice applications, a wide range of data sources, and state-of-the-art R&D tools
- Organizing and managing common projects and common resources
- Sharing of common data and applications
- Leveraging external resources "On Demand" (e.g. compute grids, storage grids, external applications)
- Decreasing mutual costs via a common commercial platform with access to external suppliers and vendors of goods and services
![]()
CINF 36:
Structure-activity
relationships for the design of molecules (STARDoM): The development and
implementation of grid-enabled, automated predictive QSAR modeling
Alexander Tropsha1, Scott Oloff2, Alexander
Golbraikh1, Chi-Duen Poon3, Terry O'Brien4,
Michael Blocksome4, Rich Dulaney4, Madhu Gombar4,
and Virinder Batra4. (1) Laboratory of Molecular Modeling, School
of Pharmacy, The University of North Carolina at Chapel Hill, 301 Beard
Hall, CB# 7360, UNC-CH, Chapel Hill, NC 27599, tropsha@email.unc.edu, (2)
Department of Pharmacology, University of North Carolina at Chapel Hill, (3)
Department of Chemistry, University of North Carolina, (4) IBM Life Sciences
Abstract
QSAR models are typically generated with a single modeling technique. Our
research has demonstrated that multiple models should be generated for any
dataset to ensure their statistical significance and predictive power. We
have developed a combinatorial QSAR approach which explores all possible
combinations of various descriptor sets and optimization methods coupled
with external model validation. This approach required integration of
multiple individual protocols dealing with descriptor generation, model
development and validation, and model application to external database
mining to identify potentially active hits. The integration of the protocols
developed at UNC was achieved in collaboration with IBM’s Life
Sciences team using the WebSphere framework and implemented on the North
Carolina BioGrid through the Globus Toolkit. This solution is automated,
efficient, and accessible to users via a web interface. It was successfully
applied to the discovery of novel anticonvulsant agents as well as novel
ligands of the P2Y12 receptor.
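The combinatorial idea, pairing every descriptor set with every modeling method and scoring each pair on an external set, can be sketched as follows. This is a toy illustration, not the STARDoM workflow: the data are made up, each "descriptor set" is a single number, and the two fitting functions are simple stand-ins for real QSAR optimization methods.

```python
from itertools import product

# Toy (descriptors, activity) records; every number here is made up.
TRAIN = [({"mw": 1.0, "logp": 0.5}, 1.1), ({"mw": 2.0, "logp": 0.1}, 2.0),
         ({"mw": 3.0, "logp": 0.9}, 3.1), ({"mw": 4.0, "logp": 0.3}, 3.9)]
EXTERNAL = [({"mw": 2.5, "logp": 0.4}, 2.5), ({"mw": 3.5, "logp": 0.7}, 3.6)]

# Hypothetical descriptor sets: name -> key of the single descriptor used.
DESCRIPTOR_SETS = {"mw": "mw", "logp": "logp"}


def fit_mean(xs, ys):
    """Baseline model that always predicts the mean training activity."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_linear(xs, ys):
    """One-variable least-squares line."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return lambda x: y_bar + slope * (x - x_bar)


METHODS = {"mean": fit_mean, "linear": fit_linear}


def combinatorial_qsar(train, external):
    """Fit every descriptor-set/method pair and rank by external error."""
    results = []
    for (d_name, key), (m_name, fit) in product(DESCRIPTOR_SETS.items(),
                                                METHODS.items()):
        model = fit([d[key] for d, _ in train], [y for _, y in train])
        # External validation: mean absolute error on compounds not used in fitting.
        mae = sum(abs(model(d[key]) - y) for d, y in external) / len(external)
        results.append((d_name, m_name, mae))
    return sorted(results, key=lambda r: r[2])


results = combinatorial_qsar(TRAIN, EXTERNAL)
```

Because each pair is fitted independently, the loop body is the natural unit of work to distribute across grid nodes.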
![]()
CINF 37:
Development of a
personal computing environment for molecular design on Grid
Umpei Nagashima1, Takeshi Nishikawa1, Satoshi
Sekiguchi1, Sumie Tajima2, Toru Yagi2,
Takeshi Kitayama2, and Makoto Haraguchi2. (1) Grid
Technology Research Center, National Institute of Advanced Industrial
Science and Technology, Umezono 1-1-1, Tsukuba, Japan, Fax: +81-29-861-5301,
u.nagashima@aist.go.jp, (2) Bestsystems Inc
Abstract
We are developing a personal computing environment for molecular design on
the Grid as an attempt to carry out computational chemistry in a Grid
environment. In this talk, we introduce two products: MolWorks
(http://www.molworks.com) and Gaussian Portal. MolWorks supports molecular
modeling, input data generation, output analysis and job control for
molecular orbital calculations on the Grid. Estimation of molecular
properties is also supported. Gaussian Portal is an attempt to construct a
framework for a Grid-enabled application service provider. These two products
are expected to realize a desktop virtual laboratory for chemists and to
achieve high throughput by integrating PC clusters, supercomputers and
databases with an intelligent scheduler.
![]()
CINF 38:
Heterojunctions of
nanomaterials and organic-inorganic nanoassemblies
Cengiz S. Ozkan, Electrical and Chemical Engineering, Biomaterials
and Nanotechnology Laboratory, Center for Nanoscience Innovation for
Defense, University of California, Riverside, CA 92521, cengiz.ozkan@ucr.edu
Abstract
Nanomaterials including carbon nanotubes and nanocrystals have considerable
potential as building blocks in future nanoelectronics and
bio-nanotechnology applications. The unique electrical, mechanical, and
chemical properties of CNTs have made them intensively studied materials
in the field of nanotechnology within the last decade. Nanocrystals or
quantum dots provide a remarkable opportunity for designing artificial
solids, since they possess unique and controllable physical and chemical
properties based on composition, structure and size. Another heavily
investigated area includes the conjugation of inorganic nanomaterials with
biomolecules including DNA and protein for various applications in
bio-nanotechnology. In this talk, I will first describe approaches for the
synthesis of nano-assemblies of carbon nanotubes and quantum dots. Such
functional nanostructures could become better alternatives for the
fabrication of nanoscale electronic and photonic devices. They could also be
useful for the bottom-up assembly of nanosystems as part of larger or
microsystem technologies. Detailed chemical and physical characterization of
the nanostructures will be presented via transmission electron microscopy
and Fourier transform infrared spectroscopy. Next, approaches for
encapsulating biological molecules, including DNA, inside carbon nanotubes
will be presented; these could be useful for a number of applications
including novel electronics, DNA sequencing and drug delivery systems.
DNA oligos labeled with nano-colloid particles are encapsulated into
multiwalled carbon nanotubes, and the nanoassemblies are characterized via
transmission electron microscopy and energy dispersive spectroscopy.
![]()
CINF 39:
Effects of the
presence of nanotubes on heat transfer in microfluidics
Nishitha Thummala, and Dimitrios V Papavassiliou, School of Chemical
Engineering and Materials Science, The University of Oklahoma, 100 E Boyd,
SEC T-335, Norman, OK 73019-1004, Fax: 405-325-5813, nishitha@ou.edu
Abstract
The drive for technical advancements in the micro/nano world, emerging from
the desire to manipulate flow fields at smaller and smaller scales, is
indeed challenging. An effective and reliable numerical tool for the
analysis of transport properties in microfluidics is the Lattice Boltzmann
Method (LBM). It can efficiently link the microscopic and macroscopic
phenomena. Our group is using LBM to simulate single-phase flow in
configurations such as parallel plates and porous media. The paper will focus on
simulation of heat transport from surfaces that have nanotubes aligned
vertically as line sources or horizontally as point sources. Lagrangian
Scalar Tracking (LST) methods are used to track the trajectories of heat
particles released in the flow field, and to synthesize the behavior of the
mean temperature profile from the behavior of the instantaneous sources of
heat. The effect of the presence of nanotubes on the heat transfer
characteristics will be discussed.
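The Lagrangian tracking step can be illustrated with a minimal sketch. Several assumptions are made here that are not in the abstract: the Lattice Boltzmann flow solution is replaced by an analytical Poiseuille profile between two plates, the geometry is two-dimensional, heat markers move by convection plus a Gaussian random step of standard deviation sqrt(2·D·dt), and all parameter values are arbitrary.

```python
import random


def track_markers(n_markers=200, n_steps=500, dt=0.01,
                  diffusivity=0.05, u_max=1.0, seed=0):
    """Track heat markers released from a point source on the lower wall.

    The channel spans 0 <= y <= 1. The velocity field is an assumed
    Poiseuille profile, standing in for a Lattice Boltzmann solution.
    """
    rng = random.Random(seed)
    sigma = (2.0 * diffusivity * dt) ** 0.5  # random-walk step size
    positions = []
    for _ in range(n_markers):
        x, y = 0.0, 0.0  # point source at the wall
        for _ in range(n_steps):
            u = 4.0 * u_max * y * (1.0 - y)       # local streamwise velocity
            x += u * dt + rng.gauss(0.0, sigma)    # convection + diffusion
            y += rng.gauss(0.0, sigma)             # wall-normal diffusion
            # Reflect markers that cross a wall back into the channel.
            y = abs(y)
            if y > 1.0:
                y = 2.0 - y
            y = min(max(y, 0.0), 1.0)
        positions.append((x, y))
    return positions


cloud = track_markers(n_markers=50, n_steps=100)
```

Histogramming the marker positions from many such releases gives an estimate of the mean temperature field downstream of the source.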
![]()
CINF 40:
Computational
nanotechnology: Bridging lengthscales with Materials Studio
Amitesh Maiti, Gerhard Goldbeck-Wood, and Scott Kahn, Accelrys
Inc, 9685 Scranton Road, San Diego, CA 92121, Fax: 858-799-5100, amaiti@accelrys.com,
scott@accelrys.com
Abstract
Nanotechnology holds tremendous economic and scientific potential, yet it
will cost industry a considerable amount of time, money, and resources to
research and develop new processes, devices, and synthesis techniques. The
use of rational materials discovery software tools in conjunction with
experimentation can lower this barrier significantly, and lead to new
insights that may not be possible otherwise. Technologically important
nanomaterials come in all shapes and sizes. They can range from small
molecules to complex composites and mixtures. Depending upon the spatial
dimensions of the system and properties under investigation, computer
modeling of such materials can range from first-principles Quantum
Mechanics, to Forcefield-based Molecular Mechanics, to mesoscale simulation
methods, to the prediction of structure-property relationships. All of the
above computational techniques are available in Accelrys’ integrated PC
platform Materials Studio™, as illustrated through a number of recent
applications: (1) carbon nanotubes (CNTs) as nano electromechanical sensors
(NEMS); (2) metal-oxide nanoribbons as chemical sensors; (3) mesoscale
modeling of polymer-CNT nanocomposites; and (4) mesoscale diffusion of drug
molecules across cell membranes.
Another big challenge for the nanotechnologist is the very large space of possible material parameters and processing routes. Recent developments in Materials Informatics provide crucial knowledge management and data mining tools for better, cheaper and faster materials development. Design of Experiment, Combinatorial and High Throughput materials design software help to focus research and development on the most promising areas.
![]()
CINF 41:
Chemical
information resources for nanotechnology
Robert A Stembridge, Global Marketing Services, Thomson Scientific,
14 Great Queen Street, London, United Kingdom, bob.stembridge@thomson.com
Abstract
Nanotechnology is a young area dating back to Richard Feynman's intellectual
demonstration in 1959 of the possibility of placing a facsimile of the
entire Encyclopaedia Britannica on a pin-head. Much information is still in
the realm of research papers published in learned journals and on the web,
but increasingly practical applications of the technology are appearing in
the patent literature, particularly in the area of chemical nanotechnology.
This paper will illustrate these trends, examine the challenges for the user
of tracking multiple sources of this information and discuss possible
solutions to these problems.
![]()
CINF 42:
A method for
estimating the composite solubility vs. pH profile
Michael B. Bolger, Pharmaceutical Sciences, USC School of Pharmacy
and Simulations Plus, Inc, 1985 Zonal Ave. PSC 700, Los Angeles, CA 90089,
Fax: 323-442-1390, bolger@usc.edu, Christel Bergstrom, Department of
Pharmacy, Uppsala University, Robert Fraczkiewicz, Life Sciences Department,
Simulations Plus, Inc, and Per Artursson, Division of Pharmaceutics, Uppsala
University
Abstract
Purpose: To predict the shape of the composite solubility vs. pH
profile by using purely in silico estimation. Method: The
complete solubility vs. pH profile for 25 monobasic drug molecules was
collected and molecular descriptors were generated using QMPRPlus. We then
examined relationships between intrinsic solubility and several other
molecular descriptors to predict the solubility factor (ratio of the solubility
of the ionized form to that of the unionized form). Results: A simple linear relationship
between intrinsic solubility and solubility factor showed that the
solubility factor is inversely proportional to the experimental value of
intrinsic solubility. We then developed a multiple linear regression
equation to predict log of solubility factor using intrinsic solubility and
number of hydrogen bond donors and acceptors as independent variables. Conclusions:
A relationship between log of intrinsic solubility and solubility factor,
when corrected for the number of hydrogen bond donors and acceptors, can
provide a good estimate of salt solubility for a small set of monoprotic
basic drugs.
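As a rough illustration of how a solubility factor shapes the profile, the sketch below uses the common textbook form for a monoprotic base: the Henderson-Hasselbalch expression S0·(1 + 10^(pKa − pH)), capped at the salt solubility S0 × solubility factor. This form is an assumption for illustration and is not necessarily the model used by the authors; the parameter values are made up.

```python
def solubility_profile(ph, intrinsic_solubility, pka, solubility_factor):
    """Composite solubility of a monoprotic base at a given pH.

    Assumed textbook form: Henderson-Hasselbalch growth of solubility as
    the base ionizes, limited by the salt solubility, taken here as
    intrinsic solubility times the solubility factor.
    """
    henderson = intrinsic_solubility * (1.0 + 10.0 ** (pka - ph))
    salt_limit = intrinsic_solubility * solubility_factor
    return min(henderson, salt_limit)


# Illustrative values only: S0 = 0.01 mg/mL, pKa = 8, solubility factor = 1000.
profile = {ph: solubility_profile(ph, 0.01, 8.0, 1000.0) for ph in (2, 8, 12)}
```

At low pH the profile plateaus at the salt solubility, at pH equal to the pKa it is twice the intrinsic solubility, and at high pH it approaches the intrinsic solubility.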
![]()
CINF 43:
A systematic name
generator module for Marvin
Szilveszter Juhos, Gyorgy Pirok, and Ferenc Csizmadia, ChemAxon Ltd,
Maramaros koz 3/a, 1037 Budapest, Hungary, Fax: +36 1 4532659, sjuhos@chemaxon.com
Abstract
Constructing systematic names for single molecules based on IUPAC rules can be rather time-consuming and requires chemists experienced in complex nomenclature. Naming a large number of structures manually is practically impossible so several automatic name generating software tools have been developed.
Our module is a platform-independent Java plugin linked to Marvin to facilitate generating IUPAC names for individual molecule sketches or for whole databases via batch processing. It can be easily integrated into other Java applications or applied over intranet/web pages. The throughput and accuracy of name generation will be demonstrated in the poster.
![]()
CINF 44:
Chemical
information in Medline/PubMed
Beryl M. Benjers, Index section, National Library of Medicine,
Bethesda, MD 20894, Fax: 301-402-2433, benjersb@mail.nlm.nih.gov
Abstract
MEDLINE contains more than 12 million citations from 1966 to present.
Pre-1966 citations are now being added in the OldMEDLINE. More than 4,500
journals in languages from around the world are indexed. Last year over
537,000 indexed citations were added to MEDLINE. Indexers analyze the
article and index at an average rate of four articles/hour, applying 8-10
subject terms from MeSH, NLM’s controlled vocabulary. New indexers attend
a rigorous two-week training course at NLM and then work closely with a
reviser, who reviews their work. An asterisk with a MeSH subject term
indicates the main point of an article, and that the article will be cited
under that term in Index Medicus, the print counterpart of MEDLINE. MEDLINE
citations and abstracts are available as the primary component of NLM’s
PubMed database and retrieval system, which is searchable free-of-charge via
the Internet.
MeSH contains 22,568 descriptors, of which 7,355 are chemical descriptors, supplemented by 138,526 chemical concepts (Supplementary Concept Records). New MeSH descriptors are added annually while Supplementary Concept Records are added daily as they are encountered in the indexed literature. New chemicals are electronically flagged for the chemical specialists, who study, research, update, and/or create new records as needed, and add them to the indexed citation and MeSH Browser. This allows MEDLINE citations to be indexed with the existing terms as well as the new ones.
MEDLINE indexing of chemical concepts includes coordination with a Pharmacological Action (PA) when appropriate. Indexing Information (II) terms may also be added with chemicals (e.g. disease/organism associated with a chemical).
The MeSH Browser is available at http://www.nlm.nih.gov/mesh/2004/MBrowser.html and can be searched by MeSH terms, Supplementary Concepts, ID, II, PA, RN, RR and EC numbers. MEDLINE/PubMed can be searched by MeSH terms, Supplementary concepts, authors, text words, journal, etc.
The National Library of Medicine (NLM) Home pages (http://www.nlm.nih.gov) offer information and links to other databases, such as MEDLINEplus and CHEMIDPlus.
![]()
CINF 45:
Conformational
folding process of a small-peptide predicted by using CONFLEX conformation
search and GRID technology
Hitoshi Goto1, Kazuo Ohta2, Umpei Nagashima3,
Yoshihiro Nakajima4, Mitsuhisa Sato4, and Hiroshi
Chuman5. (1) Department of Knowledge-based Information and
Engineering, Toyohashi University of Technology, Toyohashi 441-8055, Japan,
Fax: 81-532-48-5588, gotoh@cochem2.tutkie.tut.ac.jp, (2) Conflex
Corporation, (3) Grid Technology Research Center, National Institute of
Advanced Industrial Science and Technology, (4) Graduate School of Systems
& Information Engineering, University of Tsukuba, (5) Faculty of
Pharmaceutical Sciences, University of Tokushima
Abstract
Among the fundamental problems in elucidating biomolecular functions with
the aid of theoretical and computational chemistry, the first difficulty to
overcome is conformational flexibility, especially as it relates to
the protein folding problem. To address these challenging problems, we
have begun improving our original conformational space search
method "CONFLEX" using parallel computing and Grid techniques. At the
previous ACS meeting, we reported the master-and-worker parallelization and
world-wide distributed GRID computing techniques used in the CONFLEX
conformation search algorithm, together with performance data for some small
peptides. At this Anaheim meeting, we will present the folding process of a
small polypeptide, predicted by conformational analyses using a clustering
technique based on the conformational distance matrix among backbone
conformations. Some interesting animations and movies will also be
shown.
![]()
CINF 46:
Combining
fingerprints and other descriptors in virtual HTS
Zsuzsanna Szabo, Miklos Vargyas, Ferenc Csizmadia, and Gyorgy Pirok,
ChemAxon Ltd, Maramaros koz 3/a, 1037 Budapest, Hungary, Fax:
+36-1-453-2659, fcsiz@chemaxon.com
Abstract
Various aspects of virtual screening using molecular descriptors of 2-dimensional chemical structures have been investigated over the last two years at ChemAxon. The work involved the implementation of various descriptors and metrics as well as the optimization of some of the parameters. The poster to be presented summarizes our results to date.
When setting up a virtual screening experiment, researchers are faced with the problem of choosing the right combination of the available descriptors. Additionally, some descriptors take several parameters, which dramatically increases the overall degrees of freedom. Finally, when comparing descriptor values one can choose from numerous dissimilarity metrics. To cope with this freedom of choice an automated optimization tool has been implemented.
This tool has proved to be successful in helping chemists to choose suitable descriptors, metrics and parameter values for virtual screening. It will be demonstrated that optimization can increase the enrichment ratio of the screening procedure.
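The two quantities an optimizer of this kind has to evaluate, a dissimilarity metric and an enrichment ratio, can be sketched as below. This is a generic illustration, not ChemAxon's implementation: fingerprints are represented as sets of "on" bit positions, Tanimoto dissimilarity is just one of many possible metrics, and the enrichment ratio follows one common definition (hit rate in the top of the ranked list divided by the hit rate in the whole library).

```python
def tanimoto_dissimilarity(fp_a, fp_b):
    """1 - |A & B| / |A | B| for fingerprints given as sets of on-bits."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    return 0.0 if union == 0 else 1.0 - len(a & b) / union


def rank_by_dissimilarity(query_fp, library):
    """Sort (fingerprint, is_active) pairs from most to least similar."""
    return sorted(library,
                  key=lambda item: tanimoto_dissimilarity(query_fp, item[0]))


def enrichment_ratio(ranked_labels, top_n):
    """Hit rate among the top_n ranked compounds over the overall hit rate."""
    hit_rate_top = sum(ranked_labels[:top_n]) / top_n
    hit_rate_all = sum(ranked_labels) / len(ranked_labels)
    return hit_rate_top / hit_rate_all


# Made-up library: (fingerprint bits, 1 if active else 0).
library = [({7, 8, 9}, 0), ({1, 2, 3}, 1), ({2, 3, 4}, 1), ({5, 6}, 0)]
ranked = rank_by_dissimilarity({1, 2, 3}, library)
ratio = enrichment_ratio([label for _, label in ranked], top_n=2)
```

An optimization tool would repeat this evaluation while varying the descriptor, its parameters and the metric, keeping the combination with the highest enrichment.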
![]()
CINF 47:
Drug discovery
using grid technologies and DrugML
Michiaki Hamada, Science and Technology Group, Fuji Research
Institute Corporation, Tokyo 101-8443, Japan, mhamada@star.fuji-ric.co.jp, Yuichiro
Inagaki, Science and Technology Group, Fuji Research Institute
Corporation, Tokyo 101-8443, Japan, yinagaki@star.fuji-ric.co.jp, Hitoshi
Goto, Toyohashi University of Technology, Umpei Nagashima, National
Institute of Advanced Industrial Science and Technology, Shigenori Tanaka,
Toshiba Research and Development Center, and Hiroshi Chuman, Tokushima
University
Abstract
A number of computer resources, such as CPUs and storage, can be connected
over networks to construct a huge virtual computing environment using grid
technologies. Our project "g-Drug Discovery" aims at developing a
platform for drug design using grid technologies, on which various analyses
and calculations are conducted, such as molecular mechanics, replica
exchange, docking with proteins, molecular orbital calculations, and
3-dimensional quantitative structure-activity relationships. For storing
compound structures, descriptors, and calculation results, we are
developing DrugML by extending CML. These grid technologies can be used with
DrugML for tasks ranging from rough screening by drug likeness or ADMET
properties to screening by very precise calculation.
![]()
CINF 48:
Investigation of
molecular chirality in 3D chemical structure databases
Zengjian Hu1, William M. Southerland1,
and Shaomeng Wang2. (1) Department of Biochemistry and Molecular
Biology, Howard University College of Medicine and the Howard University
Drug Discovery Unit, 520 West Street, Northwest, Room 324, Washington, DC
20059, huzengjian@hotmail.com, wsoutherland@howard.edu, (2) Departments of
Internal Medicine and Medicinal Chemistry, University of Michigan
Abstract
In recent years, virtual screening of chemical databases using molecular
docking has emerged as the most important tool and a well-established method
in drug discovery for finding new leads. The first step in virtual screening
is to create a searchable database of three-dimensional structures of small.
In the past few years, we have created 9 small molecule 3D searchable
databases which contain more than 1,000,000 molecular entries, and could be
used to discover interesting ligands for various pharmaceutical targets.
When production of 3D chemical databases for screening purposes, we found
that there is no information about absolute stereochemistry (R-S) and double
bond geometry (E-Z) of most compounds contained in the 2D chemical database
connection tables. Today more than 50% of marketed drugs are chiral. Chiral
drugs have become a major focus of most pharmaceutical companies, which are
safer, exhibit fewer side effects, and are more potent than the drugs
previously used. As chiral molecules will certainly play a role in the
exploitation of 3D space for the development of new drugs, the creation of a
3D database with the consideration of chirality of molecules will be
beneficial for the discovery of lead compound binding to molecular targets.
As a first step, we analyzed the chirality of molecules in our 10
three-dimensional databases. About 29% of the compounds in these databases
are chiral, with about 62% of the compounds in the CGE database being chiral
but only about 14% of those in the MCC database. Most chiral molecules in
these 3D databases have only one chiral center, but compounds with more than
10 chiral centers are not rare. The maximum number of chiral centers in
a molecule can be more than 60. It is well known that, in general, if a
molecule has n chiral centers, there are 2^n possible stereoisomers.
Therefore, the number of entries in a 3D database that accounts for
chirality will be
doubled for molecules with one chiral center if there are no symmetry
elements in the molecule. The creation of th
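The 2^n relationship quoted in this abstract can be illustrated with a short sketch. The function names are hypothetical and are not taken from the authors' software. The count is an upper bound, since symmetry elements (for example, in meso compounds) reduce the number of distinct stereoisomers.

```python
from itertools import product


def max_stereoisomers(n_centers: int) -> int:
    """Upper bound on stereoisomers for n independent chiral centers (2**n)."""
    if n_centers < 0:
        raise ValueError("n_centers must be non-negative")
    return 2 ** n_centers


def enumerate_assignments(n_centers: int):
    """List every R/S label combination, e.g. ('R', 'S') for two centers."""
    return list(product("RS", repeat=n_centers))


# One chiral center doubles the number of database entries for that molecule.
print(max_stereoisomers(1))
# Ten centers already give over a thousand candidate structures.
print(max_stereoisomers(10))
# Explicit enumeration is only practical for small n.
print(enumerate_assignments(2))
```

For molecules with dozens of chiral centers, explicit enumeration is impractical, which is one reason a database builder might store only the stereoisomers of interest.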
![]()
CINF 49:
Molecular
modelling for organic chemists: A chemical informatics problem
Jonathan M Goodman, Unilever Centre for Molecular Science
Informatics, Cambridge University, Department of Chemistry, Lensfield Road,
Cambridge CB2 1EW, United Kingdom, Fax: +44 1223 336362, J.M.Goodman@ch.cam.ac.uk,
and María A. Silva, Unilever Centre for Molecular Science Informatics,
University of Cambridge
Abstract
Both molecular modelling and organic chemistry generate and use large
amounts of information, which should be mutually beneficial. However, it can
be difficult to persuade experimental organic chemists to use molecular
modelling, as force field methods cannot be applied to many transition
states and molecular orbital methods are too slow to calculate the behaviour
of many reactions before the experimental result makes the calculation of
less immediate interest. We use a combination of molecular mechanics and
molecular orbital methods in a ‘Chemical Information Laboratory’
(http://www.ch.cam.ac.uk/SGTL/gle/) in order to gain information of
experimental relevance quickly enough to be useful. For example, chemical
information has been generated about the molecules illustrated using this
process, so improving our knowledge of structure and reactivity.
![]()
CINF 50:
Chemical education
markup language: An XML namespace for educational chemistry software
Daniel C. Tofan, Department of Chemistry, State University of New
York, Stony Brook, NY 11794-3400, Fax: 631-632-7960, dtofan@mail.chem.sunysb.edu
Abstract
The Chemical Education Markup Language (ChEdML) is being developed as an XML
namespace to allow learning management systems to include chemical content.
ChEdML was initially intended to provide extensions to the current IMS
specifications for question and test item interoperability (QTI) XML
binding. Such extensions allow authors to create items containing responses
that use chemical symbolism. Examples include chemical reactions, electron
configurations, Lewis structures, measures with units etc. Tags were also
developed to format chemical information for display on web pages. A
complete XML tag set is now under development to encompass a full curriculum
of introductory chemistry. ChEdML also provides a mechanism to parameterize
items and to include equations to calculate numeric responses. This allows
the generation of item templates that can be instantiated at runtime with
appropriate parameters. A Java API is being developed to support the
generation and use of ChEdML.
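The idea of an item template instantiated at runtime can be sketched in a few lines. This is not ChEdML syntax and not the Java API mentioned in the abstract. The function `instantiate_item`, the prompt text, and the parameter values are all hypothetical, chosen only to show parameters being drawn and a numeric response being computed from an equation.

```python
import random
from string import Template


def instantiate_item(template, param_ranges, answer_fn, rng=None):
    """Fill a question template with randomly chosen parameters.

    template     -- text with $name placeholders
    param_ranges -- dict mapping each placeholder name to its allowed values
    answer_fn    -- function computing the numeric answer from the parameters
    """
    rng = rng or random.Random()
    params = {name: rng.choice(values) for name, values in param_ranges.items()}
    return {
        "prompt": Template(template).substitute(params),
        "answer": answer_fn(**params),
        "params": params,
    }


# Hypothetical item: moles from mass and molar mass.
item = instantiate_item(
    "How many moles are in $mass g of a compound with molar mass $molar_mass g/mol?",
    {"mass": [9.0, 18.0, 36.0], "molar_mass": [18.0]},
    lambda mass, molar_mass: mass / molar_mass,
    rng=random.Random(0),
)
print(item["prompt"])
print(item["answer"])
```

Each call can yield a different instance of the same item, so one template serves many students or many attempts.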
![]()
CINF 51:
Oligopeptide
transporter (PepT1) homology model based on lactose permease (LacY)
Michael B. Bolger, Pharmaceutical Sciences, USC School of Pharmacy,
1985 Zonal Ave. PSC 700, Los Angeles, CA 90089, Fax: 323-442-1390, bolger@usc.edu
Abstract
Purpose. To build a homology model of the oligopeptide/proton
co-transporter PepT1 based on the crystal structure of the bacterial
lactose/proton co-transporter LacY. Methods. The centers of transmembrane spanning
domains (TMDs) in LacY plus the 22 amino acids that comprise each of the
twelve TMDs were selected. The software package “Proteotoolbox™” was
used to guide the threading of the sequence of PepT1 onto the 3D-structure
of LacY to allow for maximal overlap of the 2D and 3D hydrophobic moments.
Finally, the experimental results for site-directed mutagenesis were
examined in light of this new homology model to identify a structural basis
for those results. Results. Site-directed mutagenesis and
cysteine-scanning results for TMDs 5 and 7 were explained on the basis of the PepT1
model. The new model helps to explain the involvement of key histidine
residues in the proton translocation process. Conclusions. The new 3D
model extends and enhances our previous results (J. Pharm. Sci. 87(11):1286,
1998) and provides additional insight into the structure and function of the
oligopeptide transporter.
![]()
CINF 52:
Multi-conformational
3D databases: Quality assessment and pharmacophore search capabilities in
MOE
Morten Langgaard, Berith Bjornholm, Anne Marie Munk Jorgensen, and
Klaus Gundertofte, Department of Computational Chemistry, H. Lundbeck A/S,
Ottiliavej 9, Dk 2500 Valby, Denmark, Fax: +45 3643 8237, mol@lundbeck.com
Abstract
In this study we report our experiences with the software solution MOE with
respect to building multi-conformational databases and performing
pharmacophore searches. Template pharmacophores derived from crystal
structures of known protein-ligand complexes as well as classically derived
pharmacophore models are used for the evaluation. Conformational coverage
and the quality of each conformation of the developed multi-conformational
3D databases are evaluated thoroughly. The analysis of the search results
focusses on hit rate, quality of hits, and the impact of pharmacophoric
element selections for the query. Practical issues like speed, storage and
management of databases are also addressed. The performance of MOE with
respect to the above-mentioned issues will be discussed and compared to the
more established method Catalyst.
![]()
CINF 53:
A combinatorial
DFT study of how cisplatin binds to purine bases
Leah Sandvoss, and Mu-Hyun Baik, Department of Chemistry, Indiana
University, 1200 Rolling Ridge Way #1311, Bloomington, IN 47403, lsandvos@indiana.edu
Abstract
Cisplatin (cis-diamminedichloroplatinum(II)) continues to attract much
attention because of its therapeutic importance as an anticancer drug. It
binds primarily to the N7 positions of adjacent guanine (G) sites in genomic
DNA, causing intrastrand cross-links, which suppress replication and lead
ultimately to cell death. Previous work showed both kinetic and
thermodynamic preference of G over adenine for the platination reaction. The
goal of this study is to obtain a chemically intuitive explanation for this
selective behavior of cisplatin by systematically comparing the electronic
structures of a diverse set of functionalized purine bases. A computational
combinatorial library of over 1500 purine derivatives was designed based on
density functional theory calculations and the changes of the most important
molecular orbitals as a function of structural variance were examined in
detail. This electronic profile for purine bases reveals how electronic hot
spots control the reactivity at the N7 position (see figure).
![]()
CINF 54:
Study of
selectivity from a pharmacophore perspective
Klaus Gundertofte, Berith Bjørnholm, and Morten Langgård,
Department of Computational Chemistry, H. Lundbeck A/S, Ottiliavej 9, Dk
2500 Valby, Denmark, kgu@lundbeck.com
Abstract
A number of pharmacophore models covering G protein-coupled receptors and
transporters primarily from the monoaminergic families of targets have been
developed. The general methodology will be described as well as performance
of different methods, e.g. MOE and Catalyst, applied in the development. In
order to elucidate selectivity issues across the targets studied, the
models, characterised by their pharmacophoric elements, were compared.
The analysis of the pharmacophore patterns revealed remarkable
resemblances or superpharmacophores. Distinct differences between the models
were also found. The impact of these findings in medicinal chemistry
projects will be discussed.
![]()
CINF 55:
Successful
shape-based virtual screening: The discovery of a potent inhibitor of the
type I TGFβ receptor kinase (TβRI)
Juswinder Singh, and Claudio Chuaqui, Structural Informatics, Biogen,
12 Cambridge St., Cambridge, MA 02142, Fax: 6176792616, Juswinder_Singh@Biogen.com
Abstract
We describe the discovery, using shape-based virtual screening, of a potent,
ATP site-directed inhibitor of the TβRI kinase, an important and novel drug
target for fibrosis and cancer. The first detailed report of a TβRI kinase
small molecule co-complex confirms the predicted binding interactions of our
small molecule inhibitor, which stabilizes the inactive kinase conformation.
Our results validate shape-based screening as a powerful tool to discover
useful leads against a new drug target.
![]()
CINF 56:
HypoRefine:
Automated identification of exclusion volumes in pharmacophore models
Allister J. Maynard, Marvin Waldman, and Jon Sutter, Accelrys, 9685
Scranton Rd., San Diego, CA 92121, Fax: 858 799 5100
Abstract
This presentation provides an overview of the HypoGen pharmacophore
generation algorithm. HypoGen is a ligand-based QSAR tool using
pharmacophoric overlap to predict activity.
A limitation of HypoGen is that activity prediction is based purely on the presence and arrangement of pharmacophoric features – steric effects are unaccounted for. A novel modification to HypoGen is described (HypoRefine). HypoRefine accounts for steric effects on activity, based on the targeted addition of excluded volume features to the pharmacophores. These excluded volumes attempt to penalize molecules occupying steric regions not occupied by active molecules.
Details of the steric detection and excluded volume addition algorithm are presented, along with some examples illustrating how excluded volumes improve the QSAR pharmacophore models.
![]()
CINF 57:
Automatic
generation of multiple pharmacophore hypotheses
Simon Cottrell1, Valerie J. Gillet1, and Robin
Taylor2. (1) University of Sheffield, Western Bank, Sheffield S10
2TN, United Kingdom, s.cottrell@sheffield.ac.uk, v.gillet@sheffield.ac.uk,
(2) Cambridge Crystallographic Data Centre
Abstract
Pharmacophore methods provide a way of establishing a structure-activity
relationship for a series of known active ligands. Often, there are several
plausible hypotheses that could explain the same set of ligands and in such
cases, it is important that the chemist is presented with alternatives that
can be tested with different synthetic compounds. Existing pharmacophore
methods involve either generating an ensemble of conformers and considering
each conformer of each ligand in turn or exploring conformational space
on-the-fly. The ensemble methods tend to produce a large number of
hypotheses and require considerable effort to analyse the results, whereas
methods that vary conformation on-the-fly typically generate a single
solution that represents one possible hypothesis even though several might
exist. We will describe a new method for generating multiple pharmacophore
hypotheses with full conformational flexibility being explored on-the-fly.
The method is based on multiobjective evolutionary algorithm techniques and
generates a manageable number of different yet plausible hypotheses.
![]()
CINF 58:
PepT1 substrate
transport pharmacophore determinants: Refinement with data from a single
consistent functional assay
Terry R Stouch1, Teresa Faria2, and Julita
Timoszyk2. (1) Computer-Assisted Drug Design, Bristol-Myers
Squibb Pharmaceutical Research Institute, MS H23-07, PO Box 4000, Princeton,
NJ 08543-4000, Fax: 609-252-6030, terry.stouch@bms.com, (2) Exploratory
Biopharmaceutics and Stability, Bristol-Myers Squibb, Pharmaceutical
Research Institute
Abstract
PepT1 is a primary intestinal transporter of di- and tripeptides. It also
transports large quantities of important pharmaceuticals, such as beta-lactams
and ACE inhibitors. The ability to function as a substrate for this channel
can appreciably increase the absorption of drugs whose passive permeation
rates might be low or nil. Data were collected on a series of ligands using
a recently developed single fluorescent functional assay. The ligands were
specifically chosen to elucidate the important determinants of transport. A
wide range of different rates of transport was evidenced, even for
dipeptides. Coupled with conformational analysis and molecular overlays, a
fairly simple pharmacophore of five elements was developed that can be used
to retrieve known substrates.
![]()
CINF 59:
Structure and
information theory derived pharmacophores as pre- and post-filters for
docking
Kenneth E. Lind, Erik Evensen, Hans Purkey, Robert McDowell, and Erin
K. Bradley, Computational Sciences, Sunesis Pharmaceuticals Inc, 341 Oyster
Point Blvd., South San Francisco, CA 94080, klind@sunesis.com
Abstract
Screening virtual compound collections has been a valuable method for
finding starting points in the drug discovery process. This is often done
through structure-based docking or ligand-based pharmacophore searching.
These methods are more effective than random searching, but both have
inherent limitations. It would be useful to have methods that make optimal
use of both techniques to improve the selection of active molecules. In this
study we compare standard docking and pharmacophore search techniques to
methods that combine the two in different orders, such as
docking as a pre-filter for a pharmacophore search, or vice versa. The
methods are evaluated against CDK-2 for their ability to select known
inhibitors and their overall enrichment rates.
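A common way to quantify enrichment, and to chain two filters, is sketched below. The abstract does not state which enrichment definition the authors use, so the formula here (hit rate in the selected subset divided by hit rate in the whole library) is an assumption. The function names and the toy compound records are hypothetical.

```python
def enrichment_factor(n_selected, actives_selected, n_total, actives_total):
    """Hit rate among selected compounds relative to the whole library."""
    if n_selected <= 0 or n_total <= 0 or actives_total <= 0:
        raise ValueError("counts must be positive")
    # Written as a single ratio of products to limit rounding error.
    return (actives_selected * n_total) / (n_selected * actives_total)


def cascade(compounds, pre_filter, post_filter):
    """Keep compounds passing both filters; post_filter only sees survivors."""
    return [c for c in compounds if pre_filter(c) and post_filter(c)]


# Made-up numbers: 20 of 500 selected compounds are active,
# against 50 actives in a 10,000-compound library.
print(enrichment_factor(500, 20, 10_000, 50))

# Toy records with hypothetical docking scores and pharmacophore flags.
library = [
    {"id": "c1", "dock_score": -9.1, "matches_pharmacophore": True},
    {"id": "c2", "dock_score": -4.0, "matches_pharmacophore": True},
    {"id": "c3", "dock_score": -8.7, "matches_pharmacophore": False},
]
kept = cascade(
    library,
    pre_filter=lambda c: c["matches_pharmacophore"],
    post_filter=lambda c: c["dock_score"] < -8.0,
)
print([c["id"] for c in kept])
```

Swapping the two filter arguments gives the reverse ordering, which changes the cost of the cascade but not the final set when both filters are simple pass/fail tests.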
![]()
CINF 60:
A new method for
pharmacophore identification
S. Stanley Young, Jun Feng, and Ashish Sanil, National Institute of
Statistical Sciences, 19 T.W. Alexander Dr, Research Triangle Park, NC
27709, young@niss.org, feng@niss.org
Abstract
The binding of a small molecule to a protein is inherently a 3D matching problem. As crystal structures are not available for most drug targets, there is a need to be able to infer key binding features and their disposition in space, the pharmacophore, from bioassay data. We use fingerprints of 3D features and a new approach to uncover the common pharmacophore for a set of compounds. We describe the algorithm and basic benchmarking. Knowing the 3D pharmacophore for a target should allow better database searching and more efficient compound design.
![]()
CINF 61:
A 3DPL case study:
Finding new active molecules for the inhibition of calcineurin
Tad Hurst, Scientific Software, ChemNavigator, 6126 Nancy Ridge
Drive, Suite 117, San Diego, CA 92121, Fax: 858-625-2377, thurst@chemnavigator.com
Abstract
The 3DPL Database Docking system has been demonstrated to be effective at
extracting known active molecules from sets of inactive compounds in many
test cases. The 3DPL technology can dock structures into a receptor
structure at a rate of up to 30 per second, thus allowing in silico investigation
of millions of database structures. In this paper, we detail the application
of 3DPL to select from over 11 million chemical structures in the
ChemNavigator iResearch Library to find 25 screening candidates. Samples of
these 25 compounds were acquired and tested for calcineurin inhibition. Four
of the compounds were found to be micromolar inhibitors. Three of these
compounds share a common core structure, and represent a new area for
possible lead development.
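The figures quoted in this abstract imply a rough screening time and hit rate, worked through below. The calculation assumes a single docking process running at the quoted peak rate with no overhead, which the abstract does not state. The function names are hypothetical.

```python
SECONDS_PER_DAY = 86_400


def docking_time_days(n_structures, rate_per_second):
    """Wall-clock days to dock n_structures at a constant rate."""
    return n_structures / rate_per_second / SECONDS_PER_DAY


def hit_rate(n_hits, n_tested):
    """Fraction of tested compounds that were active."""
    return n_hits / n_tested


# 11 million structures at 30 structures per second (figures from the abstract).
print(round(docking_time_days(11_000_000, 30), 2))
# 4 inhibitors among 25 tested compounds.
print(hit_rate(4, 25))
```

Because the abstract says "over 11 million" and "up to 30" per second, the time from this sketch is best read as a lower bound for one process.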
![]()
CINF 62:
Facilitating
virtual screening workflows: The PyFlexX/E/S/-Pharm and PyFTrees modules
Sally Ann Hindle1, Frank Sonnenburg1, Marcus
Gastreich2, and Christian Lemmen1. (1)
Chemoinformatics, BioSolveIT GmbH, An der Ziegelei 75, 53757 St. Augustin,
Germany, Sally.Hindle@biosolveit.de, (2) BioSolveIt GmbH
Abstract
Virtual screening usually requires several programs. This entails file
format conversions, conceptually superfluous I/O, manual selection of data,
consideration of interim results, and so on.
Python, a widespread, cross-platform, open-source and easy-to-read scripting language, allows native C applications to be wrapped in a Python layer, thus generating a modular world of applications which may easily be "plugged" together within a single Python script.
We have recently taken this step with our cheminformatics tools: FlexX/-E/C/-Pharm (docking), FlexS (small molecule alignment), and Feature Trees (similarity comparisons) may now be used within this scripting environment, sharing information instead of transferring it. An instant benefit is the availability of open-source Python packages for analysis and visualisation.
This concept greatly facilitates virtual screening experiments; moreover, it allows rapid prototyping of virtual screening protocols and parameter studies, which will be demonstrated in an application example.
![]()
CINF 63:
Fast Lead
Identification Protocol (FLIP) for structure based data mining using 3D
fingerprints
Amit S. Kulkarni, Scientific Services, Accelrys Inc, 9685 Scranton
Road, San Diego, CA 92121
Abstract
Structure based drug design is the method used to identify and optimize
pharmaceutical leads when the crystal, NMR structure or homology model of a
specific target protein is known. Virtual screening of corporate libraries,
external compound collections and virtual compounds using various docking
methods is routine in the drug discovery process. We are proposing a new
virtual high throughput screening approach that we term “FLIP” (Fast
Lead Identification Protocol) that uses the potential protein-ligand
interaction sites in the active site of the target protein to data-mine
compound collections. This proposed approach has the advantage of being
extremely fast and can potentially be used for any target protein structure.
![]()
CINF 64:
Conformation
mining: Shrinking chemical space to find biologically-active molecules
Santosh Putta, Gregory A. Landrum, and Julie E. Penzotti, Rational
Discovery LLC, 555 Bryant St. #467, Palo Alto, CA 94301, sputta@rationaldiscovery.com
Abstract
Discovering the essential three-dimensional steric and chemical features
shared by active compounds is an important step in designing drug
candidates. However, the flexibility of actives often allows them to adopt
several low-energy conformations, some of which are not important for
biological activity. Conformational flexibility complicates the task of
finding important features by forcing a search through a conformational
space with dimensions that increase exponentially with the number of
actives. Model building approaches typically address this problem either by
using a small subset of conformations (e.g. most extended or lowest energy)
or by encoding all of a compound’s conformations in a single fingerprint.
The first approach may miss biologically-important conformations while the
second risks masking critical information available only from individual
conformations.
Here we explore techniques for efficiently mining the conformational space of multiple compounds. Our goal is to find a subset of biologically-important conformations and understand and exploit their commonalities.
![]()
CINF 65:
Hit-directed
nearest neighbor searching
Veerabahu Shanmugasundaram, Computer-Assisted Drug Discovery, Pfizer
Global Research & Development, 2800 Plymouth Road, Ann Arbor, MI 48105,
Fax: 734-622-2782, Veerabahu.Shanmugasundaram@pfizer.com, and Gerald M
Maggiora, Department of Pharmacology and Toxicology, University of Arizona
Abstract
Follow-up of initial hits resulting from HTS is crucial if the hits are
ultimately to give rise to useful lead compounds. Several approaches may be
employed to select compounds from the Research Compound Collection or from
commercially available collections for follow-up screening. Similarity
searching based on the molecular fragments that molecules share
yields compounds that are similar in structure to the hits.
Nearest-neighbor searching of BCUT Chemistry Space identifies compounds that
have similar BCUT values and hence similar electrostatic, hydrophobic and
hydrogen bonding properties. In contrast to molecular fingerprint-based
similarity searching, which looks for similar scaffolds, nearest-neighbor
searching identifies isobiological molecular structures with
significantly different molecular scaffolds. Several examples illustrating
the application and the success of this methodology will be presented.
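Nearest-neighbor searching in a low-dimensional descriptor space can be sketched as follows. The coordinates are made-up placeholders, not real BCUT values, which are eigenvalue-based descriptors that would normally come from a cheminformatics toolkit. Euclidean distance is assumed as the metric, and the function name is hypothetical.

```python
import math


def nearest_neighbors(hit, library, k=3):
    """Return names of the k library compounds closest to `hit`.

    hit     -- tuple of descriptor values for the screening hit
    library -- dict mapping compound name to a descriptor tuple
    """
    ranked = sorted(library.items(), key=lambda item: math.dist(hit, item[1]))
    return [name for name, _ in ranked[:k]]


# Placeholder 3D descriptor coordinates (illustrative only).
library = {
    "cmpd_A": (0.10, 0.20, 0.30),
    "cmpd_B": (0.90, 0.80, 0.70),
    "cmpd_C": (0.15, 0.25, 0.35),
    "cmpd_D": (0.50, 0.50, 0.50),
}
hit = (0.12, 0.22, 0.32)

print(nearest_neighbors(hit, library, k=2))
```

Because the ranking depends only on descriptor values, two compounds with different scaffolds can still be neighbors, which is the contrast with fragment-based similarity that the abstract draws.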
![]()
CINF 66:
AGENT: A program
generating tautomers for computer-aided drug design
Patrick Ballmer, Pavel Pospisil, Gerd Folkers, and Leonardo Scapozza,
Department of Chemistry and Applied Biosciences, Swiss Federal Institute of
Technology (ETH), Winterthurerstr. 190, 8057 Zurich, Switzerland, Fax:
01141-1-6356884, patrick.ballmer@ethz.ch
Abstract
Several cases documenting the impact of ligand tautomerism on protein-ligand
binding are described in the literature. AGENT has been developed to provide
a tool to study this phenomenon. AGENT can be used to create chemically
(energetically) reasonable tautomers of molecules stored in a 3D-input file.
The created tautomeric forms can be directly used for molecular docking
studies. The purpose of AGENT is thus to enrich a given small-molecule
database with tautomeric forms that could plausibly
exist in a protein active site. The number of tautomers created by AGENT
is restricted either by chemical rules or by a user-defined energy threshold
limiting the tolerated, semiempirically calculated Gibbs free energy of
tautomer formation.
![]()
CINF 67:
Accord enabling
technologies for developing data mining software applications
Shikha Varma, and Tim Aitken, Accelrys, Inc, 9685 Scranton Road, San
Diego, CA 92121, Fax: 858-799-5100, shikha@accelrys.com
Abstract
We present here examples to illustrate how a chemistry-aware spreadsheet
with several developer toolkit components can be used to create customized
chemistry solutions for industrial and academic R&D. These chemistry-enabled
components are very efficient in facilitating chemistry workflow in
a research environment. Accord chemistry components can be used to build
applications that include data generation, chemical calculations and
manipulations as well as data management according to a given set of rules.
![]()
CINF 68:
Eli Lilly's
chemistry-focused approach to an ELN: A tale of two pilot projects
Mike Kopach1, Daniel Koch1, Keith
DeVries2, Jeffrey Christoffersen2, Will Prowse2,
and Chavali Balagopalakrishna2. (1) Chemical Process Research
& Development, Eli Lilly and Company, Eli Lilly and Company, Lilly
Corporate Center, Indianapolis, IN 46285, Fax: 317-276-4507, Kopach_Michael@lilly.com,
koch_daniel_j@lilly.com, (2) Eli Lilly & Company
Abstract
Lilly's rationale for pursuing an electronic laboratory notebook (ELN)
system differs little from that of other Pharma companies – namely
improving records quality, collaboration, productivity, and knowledge
management. However, to manage the project scope, Lilly's effort focused on
areas having the highest probable return on investment - areas where
experimental protocols, results and/or data were not currently captured
electronically. Based in part on this criterion, the chemistry functions
within the Discovery and Development organizations began exploring ELN
solutions. Two distinct approaches were employed in evaluating and selecting
potential vendors/tools in an attempt to address the differing tasks and
workflow processes of these organizations. An overview of Lilly's strategy
and, more specifically, the Discovery and Development efforts, will be
further elaborated.
![]()
CINF 69:
ELN development
and global deployment for Schering AG
Charles S Sodano, Information Services, Berlex Laboratories, 2600
Hilltop Drive, Richmond, CA 94804-0099, charlie_sodano@berlex.com
Abstract
Berlex Laboratories, a subsidiary of Schering AG, launched an Electronic Lab
Notebook (ELN) system in 1998 that was adopted by all 250 Berlex discovery
researchers in 2001. In July of 2003, global implementation was completed
for 900 users (US, Europe and Japan). This hybrid system utilizes Microsoft
Word and Excel as authoring tools that communicate with Documentum
software (document management) via Visual Basic add-ins. The legal, archived
version of completed experiments is printed to paper, where it is signed and
witnessed. There is not as yet significant case history in the US to support
e-records used in patent litigation. The system has more than 40,000
completed experiments and is growing at a rate of 2,000 a month.
![]()
CINF 70:
ELN perspectives:
from the multinational to the startup
Simon Coles, President & COO, Amphora Research Systems, PO Box
3940, Bracknell, Berkshire RG42 2XN, United Kingdom, simonc@amphora-research.com
Abstract
The ELN industry is developing quickly, with departmental and even
enterprise-scale deployments, bringing successes and some inevitable
failures. Recently, ELN technology has evolved to a point where ELNs are
practical for even small companies. In the process, valuable new lessons have
been learned that apply to any ELN deployment, delivering significant
benefits to users and a faster, more dependable ROI for the
organisation. Critical success factors include the size and scope of any
initial ELN deployment, the overall system architecture, and the early and
continued involvement of the user community.
Amphora principals have been working with ELNs since 1996, starting as the architect and project manager of a leading-edge, enterprise-wide ELN for Kodak. This work grew into Amphora Research Systems, which worked with several other multinationals before merging with PatentPad in 2003. In addition to enterprise-scale ELN solutions for large companies, Amphora delivers practical, affordable ELNs to smaller companies (e.g., Biotechs) and small R&D departments of larger companies.
These experiences delivering ELNs provide a unique overview of the issues involved and the approaches required to deliver good value in ELN implementations. Drawing on experience of successes in both small & large organisations, and lessons from ELN projects that are struggling to meet their potential, the paper will examine the critical success factors for any ELN project and the best practices which can assure a thriving ELN deployment.
![]()
CINF 71:
GenSys' electronic
lab notebook and Collaborative R&D platform: Current status and progress
Prem Mohan, Product Development, Gensys Software, 2434 Main Street,
Santa Monica, CA 90405, Fax: 310-309-6715, pmohan@gensys.com
Abstract
The GenSys electronic lab notebook (GenSys/ELN(TM)) product is the
culmination of 4 years of design and development. The GenSys/ELN was
designed to be intuitive and easy to use, to support multiple
laboratory workflows, and to serve as a scalable Collaborative R&D platform
offering enterprise-wide, secure IP protection and integration with in-house
tools and systems. Leading R&D, manufacturing, and testing organizations
recognize the requirements for Collaborative R&D platforms and ELNs and
are starting to use them in place of paper. GenSys has responded to this
demand by competing in numerous industrial and government-initiated RFPs and
has been selected as the ELN vendor of choice for a variety of
organizations. This presentation will discuss the carefully thought out
elements that comprise the latest version of the GenSys/ELN software and its
success as the solution of choice for large and small end user companies in
the marketplace. A progress update on current customer needs, deployment
projects, and future needs determined through deployment experience will
also be discussed.
![]()
CINF 72:
Integrated data
and knowledge management systems supporting real drug discovery processes:
Strategy, implementation, and examples
Trevor Heritage, Discover Informatics, Product Development, Tripos,
Inc, 1699 South Hanley Road, St. Louis, MO 63144, Fax: 314 647 9241, trevor@tripos.com
Abstract
The Pharmaceutical Industry is faced with an ever increasing number of
biological targets and must translate these targets into effective and
sustainable lead candidates. This requires that leads can be optimized to
incorporate the relevant drug-like characteristics so as to address
industry-wide high attrition rates. The success of this process depends on
using much more efficient lead generation and medicinal chemistry-based
optimization processes than are currently employed.
In order to meet these challenges and to increase inherent efficiency and so decrease attrition, an informatics driven discovery process has been implemented at Tripos Receptor Research Ltd., Tripos’ own chemistry research operation in the UK. There, the Tripos Electronic Notebook™ is being integrated with corporate information systems, including a proprietary LIMS and Tripos’ own reagent registration, inventory, and ordering system, ChemCoreRIO™.
This informatics driven approach to chemistry concentrates on two key points, do-ability and desirability. Through innovative materials availability assessments and materials management, do-ability is assessed at the earliest stage of the process. Unique design technology is used to ensure the desirability and applicability of potential lead molecules. Applied at each stage of lead finding and lead optimization, the process enhances all aspects of decision-making and facilitates “learning” across different discovery projects.
This presentation will give the underlying concepts and strategies of implementing novel informatics solutions in drug discovery environments. Issues that arise as a consequence of this integrated system approach will be discussed along with lessons learned. This approach to informatics driven chemistry will be illustrated with other real-life examples.
![]()
CINF 73:
Electronic
laboratory notebooks: Lessons learned in three generations of development
Jorge Manrique, and Chris J. Ruggles, Professional Services Dept,
CambridgeSoft Corp, 100 CambridgePark Drive, Cambridge, MA 02140, Fax:
650-286-9931, jmanrique@cambridgesoft.com
Abstract
While the principal role of a classic lab notebook is to be the official
record of a researcher’s technical work, this is not the main reason why
scientists would consider electronic notebooks. The principal objectives of
an electronic lab notebook are capturing the findings of individuals, and
making those results readily available to the team and the organization.
Essentially, investigators want to remedy what they perceive as a
fatal flaw of the classic lab notebook system: they want to avoid reinventing
something already accomplished and to further leverage work already
performed. One direct way to succeed in this goal is to pool the information
developed, and devise a mechanism to share findings with those that need to
access them. This presentation will discuss legal, technical and regulatory
issues driving the successful implementation of an electronic notebook
system, the factors that impede and facilitate acceptance, and lessons
learned in three generations of development.
![]()
CINF 74:
Making tea: A
human-centered approach to designing a pervasive lab book
Jeremy G. Frey1, Gareth V. Hughes2, Hugo R.
Mills2, Terry R. Payne2, Monica m.c. Schraefel2,
and Graham M. Smith2. (1) Chemistry, University of Southampton,
Department of Chemistry, Highfield, Southampton SO17 1BJ, United Kingdom,
Fax: +44 23 8059 3781, j.g.frey@soton.ac.uk, (2) Electronics and Computer
Science, University of Southampton
Abstract
The eScience community in the UK seeks to provide access to experimental
data, annotations and provenance information, as this information is
captured. Currently, most of that data is recorded manually into a
paper-based lab book. Previous efforts, both commercial and research-based,
to translate the lab book into digital form have struggled for widespread
acceptance. We present both an overview of the elicitation/design process
and the resulting model and services we developed to create a useable lab
book replacement for a pervasive chemistry lab. We describe the approach we
developed, the prototype we designed based on our technique, and
the results of a formative study of the artefact in real use. We show that
our design elicitation method strongly contributed to the success of our
prototype’s take-up. The positive results take us one step closer to the
eScience goal of "Publish at Source".
![]()
CINF 75:
Moving to
next-generation informatics for collaborative eR&D
Rich Lysakowski Jr., Executive Director / Chief Science and
Technology Officer, Collaborative Electronic Notebook Systems Association,
800 West Cummings Park, Suite 5400, Woburn, MA 01801, Fax: 781-935-3113,
rich@censa.org
Abstract
Next-generation informatics for Collaborative eR&D and advanced R&D
knowledge management (KM) will take advantage of most, if not all, tools in
the arsenals for R&D information capture, management, processing,
reporting, and sharing. By definition, the next generation has not
yet arrived. Discussing a move to future software systems requires
projecting future needs onto the generation of products being developed by
suppliers and end users now. Collaborative eR&D and advanced R&D KM
are maturing from conceptual processes into repeatable, automatable business
process knowledge. However, most R&D businesses design and operate
Collaborative eR&D and advanced R&D KM processes differently from
one another. They use different arsenals of tools, different
infrastructures, and even different corporate standards; how these processes
and technologies differ is what provides R&D companies with significant
competitive advantages in the marketplace. So questions always arise. If
these processes, infrastructures, and tools are all different, to what
extent can software suppliers help end users reach their process automation
and informatics goals? What new classes of informatics and automation tools
are needed? What modifications to existing tools are needed? What new
standards (formal and de facto) are needed to build or apply these tools?
What new infrastructure components are process enablers for Collaborative eR&D
and advanced R&D KM? This paper will discuss and provide answers to
these and related questions.
![]()
CINF 76:
Electronic
Notebooks: An interface component for semantic records systems
James D. Myers1, Michael Peterson1, K Prasad
Saripalli1, and Tara Talbott2. (1) Mathematics and
Computational Science, Battelle / Pacific Northwest National Laboratory,
Battelle Blvd. MS K1-87, Richland, WA 99352, Fax: 509-375-6631, jim.myers@pnl.gov,
(2) Mathematics and Computational Science, Battelle / Pacific Northwest
National Laboratory, George Fox University
Abstract
Stand-alone electronic notebooks are limited in their ability to interact
with other producers, curators, and consumers of annotations such as
workflow and data provenance mechanisms, digital libraries, and autonomous
feature-detection agents. The Scientific Annotation Middleware (SAM) project
is developing a new generation of semantic middleware providing capabilities
to view, query, translate, and extend the corpus of metadata generated by
multiple applications, environments, and agents to provide integrated data
discovery, annotation, provenance tracking, and records capabilities. A
notebook services layer being developed within the project allows this
information to be viewed and manipulated from within an electronic notebook
interface. An initial integration of the SAM software and open source
Electronic Laboratory Notebook (ELN) highlights the potential of this
approach and demonstrates how e-notebooks will be able to evolve to support
a richer composite research record while scaling to support increasing
experiment complexity.
![]()
CINF 77:
Capturing
chemistry in XML
Joe A Townsend, Department of Chemistry, Unilever Centre for
Molecular Science Informatics, University of Cambridge, Lensfield Road,
Cambridge CB2 1EW, United Kingdom, jat45@cam.ac.uk, and Peter Murray-Rust,
Unilever Centre for Molecular Informatics, Cambridge University, UK
Abstract
Chemical Markup Language (CML) is an XML-conformant Schema that describes
molecules, spectra, reactions, and computational chemistry. It is capable of
capturing the chemistry in a variety of current publications and is being
adopted by many organizations.
We have developed tools for batch conversion of current chemical documents such as primary journal publications and theses into conformant CML. The parser reads many text and molecular formats and extracts chemical concepts into CML that are combined to give a single XML file.
The process works well for methodology and analytical data in organic synthesis. The results are stored in an XML database where they can be queried on molecular identity and numeric quantities.
Parsers can also capture the output of computational chemistry to extract essentially all of the information in the logfile. XML stylesheets can then be used to filter and display the results in an interactive manner.
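As a rough illustration of querying chemistry captured in CML, the sketch below reads a small hand-written CML fragment with Python's standard library and lists its atoms and bonds. The fragment, the `summarize` helper, and the variable names are assumptions made for this example and are not the authors' conversion tools; only the `molecule`, `atomArray`, `atom`, `bondArray`, and `bond` elements and the CML namespace come from the CML schema itself.

```python
import xml.etree.ElementTree as ET

CML_NS = "http://www.xml-cml.org/schema"

# Hand-written illustrative fragment (heavy atoms of methanol only);
# it is not output from the tools described in the abstract.
CML_DOC = f"""<molecule xmlns="{CML_NS}" id="m1">
  <atomArray>
    <atom id="a1" elementType="C"/>
    <atom id="a2" elementType="O"/>
  </atomArray>
  <bondArray>
    <bond atomRefs2="a1 a2" order="1"/>
  </bondArray>
</molecule>"""


def summarize(cml_text):
    """Return ({atom id: element symbol}, [(atom id, atom id), ...])."""
    root = ET.fromstring(cml_text)
    ns = {"cml": CML_NS}
    atoms = {
        atom.get("id"): atom.get("elementType")
        for atom in root.findall("cml:atomArray/cml:atom", ns)
    }
    bonds = [
        tuple(bond.get("atomRefs2").split())
        for bond in root.findall("cml:bondArray/cml:bond", ns)
    ]
    return atoms, bonds


atoms, bonds = summarize(CML_DOC)
print(atoms)
print(bonds)
```

Because the markup is plain XML, the same namespace-aware lookups could be run by any XML database or stylesheet processor; the Python parser is used here only for brevity.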
![]()
CINF 78:
OpenScienceAlliance
and expML - opening the road to data interchange and archiving for
Collaborative eR&D for science and technology
Rich Lysakowski Jr., Executive Director / Chief Science and
Technology Officer, Collaborative Electronic Notebook Systems Association,
800 West Cummings Park, Suite 5400, Woburn, MA 01801, Fax: 781-935-3113,
rich@censa.org
Abstract
In late 2003, CENSA initiated a new XML standards program called "The
Open Science Alliance(TM)" whose purpose is to create de facto standards
for lab informatics, a new discipline we call "datakeeping" (i.e.,
applying archival science principles and practices to data). Many companies
decided as a group that traditional voluntary consensus standards processes
for lab informatics systems are badly broken and that they need this fast
track process. Our approach creates de facto standards, similar to what the
Portable Document Format (PDF) de facto standard has done to advance
high-quality publishing and the Internet. However, our targets are
scientific, engineering, technical, medical, and other technical data. We
know now how to circumvent vendor in-fighting, end users not wanting or
knowing how to standardize, plus the lack of a good rapid consensus process
that works. We are creating new technologies and tools, education for end
users and suppliers, and innovative policies for buyers to follow. The first
standard is called expML(TM) for enabling any software that produces
experiment data and records with the "Scientific Method" to
interchange data and interoperate with other vendors' electronic notebooks,
LIMS, CDMS, SDMS, instruments, data processing tools, etc. Our design
includes integrating extremely diverse and rapidly changing XML standards
from the many concerned communities in science, engineering, product
development, and elsewhere. The design for research includes rapid evolution
of the standards per se so R&D and innovation are accelerated by
standardizing. This talk will explain the OpenScienceAlliance(TM), the
expML(TM), the progress of the OpenScienceAlliance(TM), and how people can
join to help accelerate its progress.
![]()
CINF 79:
Building
spreadsheet or database software for GLP/GMP applications? Why you care
about FDA's new guidance
Jay S. Kunin, Symbion Research International, Inc. & U.C. San
Diego, San Diego, CA 92130
Abstract
Chemical scientists engaged in laboratory or manufacturing activities as
part of drug development remain subject to 21 CFR Part 11, which defines the
FDA’s criteria for accepting electronic records and signatures. While the
September 2003 “Scope and Application” guidance reduces the number of
systems that must meet the requirements, and promises “enforcement
discretion” for some sections of the regulation, it continues to require
substantive compliance. Most computer-based systems developed, maintained,
or managed for GLP (& GMP) applications must be compliant, especially in
the areas of security, training, SOPs, validation, documentation, and
electronic signatures. These requirements can be particularly troublesome in
relation to spreadsheet and other locally-developed software, which should
be built and deployed following a rigorous methodology and operated
according to SOPs. In all cases, a documented risk assessment should now be
part of any procedures to design, use, change or validate software for these
applications.
![]()
CINF 80:
How the software
vendor can assist with compliant-ready products or "Why should you care
about structural validation?"
Virginia L. Corbin, Business Development, Waters Corporation, 34
Maple Street, Milford, MA 01757, Fax: 508-482-2773, ginni_L_corbin@waters.com
Abstract
Assuring government agencies that data supporting the production of
regulated products meet quality and consistency requirements is a
time-consuming and expensive challenge. How are you positioned to meet these
challenges with FDA’s new “Risk Based” approach to CGMPs?
Compliance is something that cannot be bought. It can be achieved, however, through the use of compliant-ready solutions together with owner-managed Standard Operating Procedures (SOPs).
Compliant-ready solutions are systems and software that have a documented system development life cycle (SDLC) and all additional tools and services needed to assist you in meeting your compliance requirements. Ensuring that these solutions are developed in this manner is also part of compliance. We will discuss what your vendors can do to assist you in meeting the newest challenges for compliance from the FDA.
![]()
CINF 81:
Strategies for
long term archiving of electronic records
Charles S Sodano, Information Services, Berlex Laboratories, 2600
Hilltop Drive, Richmond, CA 94804-0099, charlie_sodano@berlex.com
Abstract
Almost all research and development data, documents and records today are
being authored or generated electronically. However, the archive media of
choice is still paper with a disaster recovery copy as microfilm for most
operations. Laboratory and research records need to be retained for anywhere
from a few years to more than 40 years, depending on the importance of
the information to a company’s business. As we move more into electronic
drug applications, patent submissions and e-business transactions there will
be an increased emphasis on long-term storage of electronic records.
Possible strategies will be described that organizations can adopt right now
to assure a smooth transition into electronic record archiving.
![]()
CINF 82:
Information
sources for chemical engineering students
Ann D. Bolek, Science-Technology Library, The University of Akron,
Akron, OH 44325-3907, Fax: 330-972-7033, bolek@uakron.edu
Abstract
Chemistry and chemical engineering students need to find preparations,
properties, reactions, spectra, and safety information for their compounds.
Chemistry students usually need this information for the laboratory, whereas
chemical engineering students usually need this information for pilot plant
and industrial-size applications. Chemical engineering students also need to
find additional information, such as vapor-liquid equilibria and other
thermodynamic data, process flow diagrams, bulk chemical prices, market
share and other types of business, economic, and marketing information. This
paper will give examples of where chemical engineering students can find the
information suitable for their particular needs, such as in databases,
encyclopedias, handbooks, periodicals, and patents.
![]()
CINF 83:
Using chemical
reaction, supplier, and literature information to meet your process and
engineering chemistry needs
Eva M. Hedrick, Robert C. Dana, and Linda S. Toler, Synthetic and
Polymer Chemistry, Chemical Abstracts Service, 2540 Olentangy River Road,
Columbus, OH 43202-1505, Fax: 614-447-5471, ehedrick@cas.org
Abstract
With access to more than seven million single and multi-step reactions,
coupled with the more than six million commercially available chemicals, and
millions of journal and patent references in the area of chemical
engineering and industrial chemistry, SciFinder provides a unique resource
for process chemists. In just a few short steps process chemists can locate
reaction improvements covered in recent patents, suppliers for key starting
materials, or the latest chemical engineering research.
![]()
CINF 84:
Impact of quantity
and quality of critical property data on model reliability: Essential
information for process simulation applications
Xinjian Yan, Qian Dong, and Michael Frenkel, Thermodynamics Research
Center, National Institute of Standards and Technology, 325 Broadway,
Boulder, CO 80305, xjyan@boulder.nist.gov
Abstract
The modeling of physicochemical property data plays a central role in
chemical process simulation; as a result, the capability of process
simulators depends heavily on the reliability and applicability of
predictive models. A recent call from industry for more accurate physical
property data and robust predictive models demonstrated an urgent need for
reliable data and models that impact many industrial applications ranging
from process design to product design in a globally competitive environment.
Nonetheless, there are two major problems in developing reliable
thermodynamic models: (1) the quantity of, and (2) the quality of the data
set used for fitting model parameters, which, consequently, have a direct
impact on the reliability of models. The deficiency of the model is
demonstrated when it is used to predict properties of compounds that are not
involved in developing the model. Limitations in the quantity and quality of
the data set may not be easily resolved by model developers. Nevertheless,
the analysis, report and understanding of these effects of the problems on
the model applicability are crucial issues in guiding the different data
applications for industrial and chemical engineering.
This presentation focuses on a systematic analysis of the applicability and reliability of models for the critical volume (Vc), the MP (Marrero/Pardillo) model in particular, as well as the evaluation of the quantity and quality of the Data Bank supplied in "The Properties of Gases & Liquids", Fourth Edition, 1987, on which MP was developed. The critical temperature (Tc) and critical pressure (Pc) are also briefly discussed. The reference data sources of experimental critical properties selected for the investigation are the IUPAC Project on Critical Compilation of Vapor Liquid Critical Properties and the NIST/TRC SOURCE Data System. This study reveals a fundamental problem in developing reliable models, and the results may serve as guidance for model developers and industrial users in their selection and application of the predictive models for critical properties.
![]()
CINF 85:
Chemical and
industrial engineering information in uniquely focused petrochemical
databases
John Hack, Research Support Services / Information Research &
Analysis, ExxonMobil Research & Engineering Company, 1545 Route 22 East,
Annandale, NJ 08801, Fax: 908-730-3230, jhack@exxonmobil.com
Abstract
While both basic and applied research are exhaustively covered in sources
like Chemical Abstracts and Beilstein, chemical engineering and industrial
information is often difficult to extract. Simple categorization like
"Unit Operations" is insufficient to meet the needs of chemical
engineers who typically require detailed studies from a wide array of
technologies like materials, analytics, environment, and kinetics. To fill
this information gap, the Ei EncompassLit and Ei EncompassPat databases cut
neatly across the chemical literature to focus on chemical engineering and
related disciplines. Since they are produced by Elsevier Engineering
Information for the petrochemical industries, these databases derive their
value in providing information that has pre-determined industrial relevance.
This talk will focus on the types of information covered in these databases and on the sophisticated hierarchical indexing that is designed to identify engineering and chemical substance information at very broad or precise levels.
![]()
CINF 86:
Discovering hidden
value in physicochemical property databases
Qian Dong, Xinjian Yan, and Michael Frenkel, Thermodynamics Research
Center, National Institute of Standards and Technology, 325 Broadway,
Boulder, CO 80305-3328, qdong@boulder.nist.gov
Abstract
With the innovation and success of modern commercial simulators at the
chemical engineer's desktop in recent decades, the further requirements that
chemical engineers have placed on physicochemical data and data quality are
becoming increasingly demanding. The industrial interest now is focused on
requiring highly reliable physical property data and quality-related
information, along with robust predictive models developed and validated
upon such data and information. These changes in industrial needs have
clearly set the directions for physicochemical property research and
database development. Nevertheless, traditionally, the major role of
physicochemical property databases was merely to provide easier and faster
availability of needed data, the majority of which were supplied neither
with uncertainty-related information nor with analytical information
assisting evaluation and modeling of the physicochemical property data.
Thus, it is time for some dramatic changes in developing physicochemical
databases, evolving from the objectives, design, functionalities, and
applications of databases to meet these ever-changing needs.
As a first step in this direction, critical properties are selected from more than 100 properties in the NIST/TRC Source experimental data system as a subject for analyzing and extracting information on data quality, quantity, measurement technology, and industrial requirements, because of their significant impact upon other properties of industrial interest. An attempt is made to discover hidden value in the experimental data of critical properties collected over the period 1822 to 2003. Such information includes distribution of compounds and compound classes experimentally studied, quality analysis on uncertainty and related information, status of duplicate measurements, availability of measurements over evolution of data uncertainty with time and advances in measurement technology, problems and progress in measurement technology, etc. The recommended values of critical properties with assigned uncertainty are a fundamental building block that is used in data quality analysis. Combining such data and the discovered information, this new functionality of physicochemical databases shows great potential to provide guidance on experimental data analysis to industry, and to help scientists and chemical engineers focus on evaluating and developing the data and models of their interest, which, at the same time, will greatly enhance the capability of process simulators.
![]()
CINF 87:
Thermochemical
database for industrial high-temperature applications
Mark D. Allendorf1, Ida M. B. Nielsen1,
Michelle L. Medlin1, Theodore M. Besmann2, and Carl F.
Melius3. (1) Combustion Research Facility, Sandia National
Laboratories, Mail Stop 9052, Livermore, CA 94551-0969, Fax: 925-294-2276,
mdallen@sandia.gov, (2) Metals and Ceramics Division, Oak Ridge National
Laboratory, (3) Lawrence Livermore National Laboratory
Abstract
Thermochemical data (heats of formation, entropies, and heat capacities) are
essential for modeling the high-temperature chemistry occurring in many
industrial processes, including chemical vapor deposition, combustion,
corrosion, and catalysis. Although accurate data are usually available for
traditional combustion environments, they are often lacking for systems
involving heavier (i.e., non-first-row main group) elements and
noncrystalline condensed phases (e.g., glasses and solutions). Since
experimental efforts to measure these data are rare today, often the only
recourse is to obtain them by computational methods. In this presentation,
we will describe a new on-line database in which thermochemical information
for molecular and condensed-phase systems is made available in a practical
format for modeling. Molecular thermochemistry is obtained from ab initio
electronic structure calculations. New methods under development to predict
heats of formation for compounds containing transition metals and fourth-row
main group elements will be discussed. The current database contains data
for roughly 750 molecules involving the elements H, B, C, N, O, F, Al, Si,
and Cl. Condensed-phase data for variable composition liquids and glasses
are derived by application of the associate-species model, which accounts
for the non-ideal behavior of these systems. Data applicable to oxide
systems involving the elements Na, Al, Cr, Mn, Ni, B, and Si are currently
available. All data are available* in the form of polynomial fits as a
function of temperature that can be imported into standard equilibrium and
reacting-flow codes. In addition, extensive information resulting from the
electronic-structure calculations is provided.
*See web site at www.ca.sandia.gov/HiTempThermo/index.html
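To make the idea of temperature-dependent polynomial fits concrete, the sketch below evaluates heat capacity, enthalpy, and entropy from one set of coefficients. The abstract does not state which polynomial form the database uses; the NASA seven-coefficient form is assumed here only because it is widely read by equilibrium and reacting-flow codes. The function names and the `DEMO` coefficients are placeholders invented for illustration, not values from the database.

```python
import math

R = 8.314462618  # molar gas constant, J/(mol K)


def heat_capacity(coeffs, temperature):
    """Cp in J/(mol K) from an assumed NASA seven-coefficient fit."""
    a1, a2, a3, a4, a5 = coeffs[:5]
    t = temperature
    return R * (a1 + a2 * t + a3 * t**2 + a4 * t**3 + a5 * t**4)


def enthalpy(coeffs, temperature):
    """H in J/mol; a6 carries the integration constant."""
    a1, a2, a3, a4, a5, a6 = coeffs[:6]
    t = temperature
    return R * t * (
        a1 + a2 * t / 2 + a3 * t**2 / 3 + a4 * t**3 / 4 + a5 * t**4 / 5 + a6 / t
    )


def entropy(coeffs, temperature):
    """S in J/(mol K); a7 carries the integration constant."""
    a1, a2, a3, a4, a5, _, a7 = coeffs
    t = temperature
    return R * (
        a1 * math.log(t) + a2 * t + a3 * t**2 / 2 + a4 * t**3 / 3
        + a5 * t**4 / 4 + a7
    )


# Placeholder coefficients (a1..a7), chosen so the arithmetic is easy to
# check by hand. They do not describe any real species.
DEMO = (3.5, 0.0, 0.0, 0.0, 0.0, -1000.0, 4.0)

print(heat_capacity(DEMO, 1000.0))
print(enthalpy(DEMO, 1000.0))
print(entropy(DEMO, 1000.0))
```

With only `a1` nonzero among the first five coefficients, the heat capacity is constant at 3.5 R, which makes the sketch easy to verify before real fitted coefficients are substituted.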
![]()
CINF 88:
Infotherm: The new
thermodynamics database on the web
Jost T. Bohlen, FIZ CHEMIE Berlin, Franklinstrasse 11, D-10587
Berlin, Germany, Fax: +49-30-39977135, bohlen@fiz-chemie.de
Abstract
The quality Infotherm database delivers data on pure compounds and on
mixtures, such as PVT properties, phase equilibria, transport and surface
properties, calorific properties, and solid-liquid equilibria. The database
currently contains property data for more than 6,300 pure compounds and more
than 23,000 mixtures. Each piece of data can be accessed via the chemical
name, the trivial name, a formula or via the CAS Registry Number in only a
few seconds. The data originates from journal articles, handbooks and data
collections which are evaluated by FIZ CHEMIE Berlin and which cover the
time-period from 1985 until the present. The database is updated monthly.
![]()
CINF 89:
Chemical patent
information needs in industrial and engineering chemistry
Donald Walter, Customer Training, Thomson Derwent, 1725 Duke Street
Suite 250, Alexandria, VA 22314, Don.Walter@DerwentUS.com
Abstract
Patents are an incredibly rich source of chemical information, since so much
chemical technology appears only in patents and nowhere else. The language
of chemistry poses unique challenges for those who would articulate chemical
information through patents. Furthermore, once the information is obtained,
understanding the language of patents is another challenge still. We will
review some of these issues and explore how they are resolved today, as well
as in the past, for the particular needs of the industrial and engineering
chemist. We will also make special mention of the dialect of polymers.
![]()
CINF 90:
CAS environment
for environmental information
Jan Williams, CAS, 2540 Olentangy River Road, Columbus, OH 43221,
Fax: 614-447-5470, jrw88@cas.org
Abstract
With its broad coverage of chemistry and the related sciences, CAS provides
a unique foundation for environmental information retrieval across multiple
databases. Substances, physical properties, patents, journal articles,
regulatory data, purchasing details, business news, and more can be obtained
from the CAS databases on STN and STN Easy. Although the importance of the
published literature is well known, health and safety concerns call for
maximizing its value by utilizing citations, thesauri, and other content and
functionality capabilities to ensure that “no stone is left unturned.”
In addition, older information is sometimes neglected despite its potential
to provide key details unavailable elsewhere. This presentation will use
polychlorinated biphenyls as a case study in preparing a strategy for
obtaining a cross section of the published resources, analyzing the output,
and reporting the results of the study.
![]()
CINF 91:
Computational
studies on the analysis of organic reactivity
Ingrid M Socorro1, Jonathan M Goodman1, and
Keith T Taylor2. (1) Unilever Centre for Molecular Science
Informatics, Cambridge University, Department of Chemistry, Lensfield Road,
Cambridge CB2 1EW, United Kingdom, Fax: +44 1223 336362, ims28@cam.ac.uk,
(2) MDL Information Systems
Abstract
Our work is focused on the development of computational tools for the study
of organic reactivity with the purpose of predicting and analysing organic
reactions. We are developing a reaction prediction program based, at a first
stage, on general knowledge of organic chemistry. The program uses Java and
the MDL Cheshire chemical scripting language. The system developed arrives
at its conclusions by application of a series of rules designed to consider
different features in molecules for the determination of reactivity. In this
way, the program makes decisions on primary aspects when considering a
reaction such as the determination of reaction sites or which bonds are to
be broken or made. Therefore, new reactivity should be found and analysed
when considering unprecedented reactions. It will also be possible to
predict and to analyse the reactivity of unknown reactions. An example of
what the program does is given in the figure below.
![]()
CINF 92:
Polar surface area
- problems and opportunities
James A. Platts, Department of Chemistry, Cardiff University, P.O.
Box 912, Cardiff, United Kingdom, Fax: 44-2920-874030, platts@cf.ac.uk, and
Robert A. Saunders, Dept of Chemistry, Cardiff University
Abstract
Modifications to the standard definition of polar surface area (PSA) are
reported. It is shown that increasing the flexibility of PSA-based models
can lead to some improvements in accuracy, but that these still fall well
short of previously published methods. To compete with these, PSA-based
descriptors are scaled according to the hydrogen bonding characteristics of
common functional groups. Introducing this scaling markedly improves the
accuracy in validation studies against octanol-water, chloroform-water, and
cyclohexane-water partition coefficients. The methods so developed are then
applied to a range of important industrial applications, including drug
transport, properties of "green solvents", and solvation and
partition of metal complexes.
![]()
CINF 93:
How can generic
reactions be specific? Virtual synthesis with "smart" reactions
Gyorgy Pirok, Nora Mate, Szilard Dorant, Miklos Vargyas, and Ferenc
Csizmadia, ChemAxon Ltd, Maramaros koz 3/a, 1037 Budapest, Hungary
Abstract
Virtual reactions based on generic reaction equations usually produce many chemically non-feasible products. ChemAxon's solution for this problem consists of three major components:
- standard expression syntax providing customizable rule base (Chemical Terms)
- open plugin system for an extensible framework of functions (pKa, logD, etc.)
- expression parser and evaluator

We build a reusable organic reaction library by integrating generic reaction equations with reactivity and selectivity rules. The core reaction engine for the enumeration of "smart" reactions is Reactor. It generates reaction products considering the chemoselectivity, regioselectivity and stereoselectivity issues. Synthesizer evaluates an additional rule layer built into the synthesis definitions of combinatorial libraries to eliminate products outside of the interest areas.
The presentation offers insight into the "smart" reaction technology and its effective use, with some examples.
![]()
CINF 94:
Recursive
Partitioning, models and statistics: What can we extract from categorical
data?
Sean E O'Brien, and Marcel J. de Groot, Molecular Informatics,
Structure and Design, Pfizer Global Research and Development, Sandwich
Laboratories, Ramsgate Road, Sandwich, United Kingdom, obrien_se@sandwich.pfizer.com
Abstract
Recursive Partitioning (RP) is a multivariate data analysis technique
gaining increasing usage in chemo-informatics. It is designed to cope with
categorical data, compounds with multiple mechanisms and many descriptor
types. RP enables fast derivation of decision trees for the prediction of
activities or properties and can provide readily interpretable results.
Here we review typical problems and scenarios one may encounter when utilising RP. We demonstrate how full analysis of a decision tree can extract useful information from what appears, at first, to be a poor model. This is illustrated with examples drawn from our practical experiences with standard RP and multiple-Y models (PUMP-RP). We show how RP is not only useful for modelling activities but also valuable as a tool to stimulate different ways of evaluating data.
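For readers unfamiliar with the technique, the following is a minimal sketch of recursive partitioning on categorical descriptors. It uses Gini impurity as the split criterion, which is one common choice and an assumption here; the abstract does not say which criterion standard RP or PUMP-RP uses. The descriptor names, activity labels, and helper functions are invented toy examples.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Find the binary test 'descriptor == value' with the lowest weighted impurity."""
    best = None
    for key in rows[0]:
        for value in {row[key] for row in rows}:
            yes = [y for row, y in zip(rows, labels) if row[key] == value]
            no = [y for row, y in zip(rows, labels) if row[key] != value]
            if not yes or not no:
                continue  # a split must separate the data into two groups
            score = (len(yes) * gini(yes) + len(no) * gini(no)) / len(rows)
            if best is None or score < best[2]:
                best = (key, value, score)
    return best


def grow(rows, labels, depth=0, max_depth=2):
    """Recursively partition the data; leaves hold the majority class."""
    split = None
    if depth < max_depth and gini(labels) > 0:
        split = best_split(rows, labels)
    if split is None:
        return Counter(labels).most_common(1)[0][0]
    key, value, _ = split
    yes = [(r, y) for r, y in zip(rows, labels) if r[key] == value]
    no = [(r, y) for r, y in zip(rows, labels) if r[key] != value]
    return {
        "test": (key, value),
        "yes": grow([r for r, _ in yes], [y for _, y in yes], depth + 1, max_depth),
        "no": grow([r for r, _ in no], [y for _, y in no], depth + 1, max_depth),
    }


def predict(tree, row):
    """Walk the tree until a leaf (a class label) is reached."""
    while isinstance(tree, dict):
        key, value = tree["test"]
        tree = tree["yes"] if row[key] == value else tree["no"]
    return tree


# Toy data: "ring" separates the classes perfectly, "charge" does not.
rows = [
    {"ring": "aromatic", "charge": "neutral"},
    {"ring": "aromatic", "charge": "cation"},
    {"ring": "aliphatic", "charge": "neutral"},
    {"ring": "aliphatic", "charge": "cation"},
]
labels = ["active", "active", "inactive", "inactive"]

tree = grow(rows, labels)
print(tree)
print(predict(tree, {"ring": "aromatic", "charge": "cation"}))
```

The resulting tree is a nested dictionary, so each test along a path can be read off directly, which reflects the interpretability the abstract attributes to decision trees.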
![]()
CINF 95:
Research patterns
in the Earth System Science Department: An interdisciplinary geoscience
program at the University of California, Irvine
April M. Love, Science Library Reference, University of California,
P. O. Box 19557, Irvine, CA 92623-9557, Fax: 949-824-3114, amlove@uci.edu
Abstract
My analysis of references cited in the publications (from 1999 through 2003)
of the UCI Earth System Science faculty researchers will illustrate the
interdisciplinary nature of the research of this new department.
Additionally, this analysis will provide insights into the research habits
of the Earth System Science (ESS) faculty, including itemizing source
journals consulted. The results presented will demonstrate specialized
collection development experiences in a university library setting as well
as highlight current changes in information seeking habits and usage in the
geosciences. These changes have an impact not only on library users but
also on those responsible for collection development in support of
research. Founded in 1990, the UC Irvine ESS Department took on the
"global change agenda" for both the research and teaching focus.
Incoming faculty members were hired in the atmospheric sciences,
geochemistry, terrestrial and aquatic ecology, oceanography, and hydrology.
The University of California, Irvine (UCI) was founded in 1965. In 1989-90, the School of Physical Sciences examined the possibility of establishing a geosciences program where, up until this time, there had been no geology program included in the UCI campus science curriculum. The Earth System Science Department has its roots in the atmospheric chemistry research of F. Sherwood Rowland's laboratory group in the Department of Chemistry. The focus of the proposed geosciences program was nontraditional and did not emphasize the usual "rock" geology. In 1990 Ralph Cicerone, a specialist in atmospheric chemistry and former director of the National Center for Atmospheric Research's Atmospheric Chemistry Division, joined the UCI faculty. With Dr. Cicerone came a change in the focus for the departmental curriculum; it took on the "global change agenda," and the founding faculty members were hired in the atmospheric sciences, geochemistry, terrestrial and aquatic ecology, oceanography and hydrology.
![]()
CINF 96:
Pharmaceutical
decision making using LeadDecision™
Barry J. Wythoff, Research and Development, Scientific Reasoning, 23B
Congress St, Newburyport, MA 01950
Abstract
The drug discovery process proceeds iteratively and discontinuously. At each
iteration, a fateful decision must be made which might be phrased as
"Out of the molecules that we can ? which should we ?", wherein
the action denoted by the question mark might be "order",
"test", "synthesize", "carry forward",
etc. This is a question then, of selecting among alternatives. Increasingly,
this selection must be made in the face of manifold dimensions that we wish
to optimize. Lead Decision™ is designed to aid the scientist in rapidly
accomplishing such selections using a combination of calculation,
visualization and interaction. The calculation methods that will be
described are adapted from economics, statistics, mathematics, artificial
intelligence and operations research.
![]()
CINF 97:
New ways to
integrate data and information
Carl S. Ewig, Life Sciences Solutions, IBM, Suite 300, 4660 La Jolla
Village Dr, San Diego, CA 92122, Fax: 858-587-4835
Abstract
Productivity in research, especially in the life sciences, depends strongly
on the efficient retrieval and integration of data and information from a
variety of sources and in a number of different forms. The most powerful
computational tools for performing the integration are federated data
sources and database engines, which integrate multiple, heterogeneous data
sources into a single virtual database. However, recent developments have
taken the federation process well beyond simple virtual federated queries of
cheminformatics and bioinformatics data files, and now encompass several
additional ways of accessing information, including algorithms such as HMMER,
repositories such as those accessed through Entrez, and data in an XML
representation. Libraries of user-defined functions allow pre-processing to
be performed, and "extended search" procedures allow data to be obtained
from unstructured sources such as web sites. This talk summarizes
recent developments at IBM, and provides examples of their use with the DB2
Information Integrator in research applications.
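The idea of querying several independent sources as one virtual database can be sketched with a small stand-in. The example below uses two separate in-memory SQLite databases joined in a single SQL statement; it is not DB2 Information Integrator and does not reproduce its API. The table names, column names, and rows are invented placeholders.

```python
import sqlite3

# The main connection stands in for a chemistry source; the attached
# database stands in for a separate biology source.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS bio")

conn.execute("CREATE TABLE main.compounds (compound_id TEXT PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE bio.assays (compound_id TEXT, target TEXT, ic50_nm REAL)")

# Placeholder rows, invented for this sketch.
conn.executemany(
    "INSERT INTO main.compounds VALUES (?, ?)",
    [("C1", "example-A"), ("C2", "example-B")],
)
conn.executemany(
    "INSERT INTO bio.assays VALUES (?, ?, ?)",
    [("C1", "target-X", 12.0), ("C2", "target-X", 850.0)],
)


def potent_compounds(connection, threshold_nm):
    """Run one query that joins both sources as if they were a single database."""
    query = """
        SELECT c.name, a.target, a.ic50_nm
        FROM main.compounds AS c
        JOIN bio.assays AS a ON a.compound_id = c.compound_id
        WHERE a.ic50_nm < ?
        ORDER BY c.name
    """
    return connection.execute(query, (threshold_nm,)).fetchall()


hits = potent_compounds(conn, 100.0)
print(hits)
```

The caller writes one query and never needs to know which source holds which table. A production federation engine would additionally handle heterogeneous back ends and remote access, which this stand-in does not attempt.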