Abstracts, 227th ACS National Meeting
Anaheim, CA, March 28-April 1, 2004

CINF 1:  
Dixel modeling of gene expression

N Sukumar1, Curt M. Breneman1, Kristin P. Bennett2, Charles Lawrence3, and Inna Vitol3. (1) Department of Chemistry, Rensselaer Polytechnic Institute, Cogswell Laboratory, 110 8th Street, Troy, NY 12180-3590, Fax: 518-276-4045, nagams@rpi.edu, brenec@rpi.edu, (2) Department of Mathematics, Rensselaer Polytechnic Institute, (3) Wadsworth Center

Abstract
Sequence-specific binding of proteins to DNA is arguably the most important foundation of cellular function, since it exerts fundamental control over the abundance of virtually all cellular functional macromolecules. Identification of promoter sequences and transcription factor binding sites in the genome thus represents one of the grand challenges of the post-genomic era. The most successful bioinformatics methods today are based on models that represent DNA by sequences of letters (motif methods). Unfortunately, the sequence data used for training and validation is quite limited. Motif models are thus hampered both by small sample sizes and by an abstract representation that has little to do with the energetics of binding. It is here that cheminformatics can supply additional information and introduce a more accurate and sensitive chemical representation of DNA-protein interactions. Drawing upon our experience with E. coli transcription factors and sigma factors, we show how characterization of DNA through features of electron densities sampled on the vdW surfaces of the major and minor grooves (“Dixels”) captures the effects of environmental perturbations of neighboring base pairs, without requiring additional sequence data for training.


CINF 2:  
Integration of biological and chemical information: Faster decisions from linked data and visualizations

Gavin M Fischer, Application Scientist, OmniViz Inc, 2 Clocktower Place, Suite 600, Maynard, MA 01754, gfischer@omniviz.com

Abstract
Visualizations are the best way for people to understand data. Presenting anyone with long lists of numbers rarely helps the understanding of the data, never mind the interconnectedness within that data. This is even more true when crossing between domains, such as between chemistry and biology. Both sides understand, in theory if not practice, what the other is doing. However, the lack of a common language between them necessitates new approaches for integrating analysis; visualizations are a key to this. The understanding of HTS data, with linked biologic pathways illustrating the context in which the target is being tested, and microarrays showing how responses map against the genome, allows for more rapid decisions. Both chemists and biologists have analysis techniques that can, and should, aid the other. I will show some examples of this integration working, and talk about linking this with literature analysis to understand the BIG picture, whilst not losing sight of the details on either side.


CINF 3:  
The BioPrint® pharmaco-informatics platform: A large profile database for the development of relevant predictive models

Frédérique Barbosa, Molecular Modelling, Cerep, 128, rue Danton, 92500 Rueil Malmaison, France, Fax: 33 1 55 94 84 10, F.Barbosa@cerep.fr

Abstract
Linking biological and chemical information for use in computational approaches in order to predict biological activity, ADME profiles and adverse drug reactions (ADR) is critical for enhancing the drug discovery process. However, modeling approaches have been hampered by the lack of large, robust and standardized training datasets. In an extensive effort to build such a dataset, the BioPrint® database is continuously constructed by systematic profiling of drugs available on the market, as well as numerous reference compounds (at present, BioPrint includes more than 2,200 compounds and 172 different assays). The database is composed of several large datasets: compound pharmacology profiles, and complementary clinical data including therapeutic use information, pharmacokinetics profiles and ADR profiles. These data have allowed the development of predictive QSPR and QSAR models. Models based on chemical structure are strengthened by in vitro results that can be used as additional compound descriptors to predict complex in vivo endpoints.


CINF 4:  
Keeping up with the changing face of Medline and MeSH - 3 keys to improving searches

Soaring Bear, MeSH, NLM/NIH, 8600 Rockville Pike B2E17, Bethesda, MD 20894, Fax: 301-402-2002, bears@mail.nlm.nih.gov

Abstract
The National Library of Medicine provides dozens of medical, chemical, sequence, and structural databases which can all be searched at one time with the new Entrez interface (http://www.ncbi.nlm.nih.gov/gquery/gquery.fcgi). The information explosion requires prudent search strategies for quicker finding of the data gems you are seeking in the growing haystack of science results. Ambiguities of word meanings confound and frustrate. To help, the MeSH group of the National Library of Medicine is continually updating the terms and concept structure of the MeSH indexing vocabulary (http://www.nlm.nih.gov/mesh/2003/MBrowser.html) used for Medline (http://Pubmed.gov). Some recent examples of these changes in biology and chemistry are described, along with how you can keep up with and use these changes for better search results. Three easy steps to better Medline searches will be presented by an NLM expert. A balance of widening (with OR terms) and narrowing (with NOT terms) can be facilitated with three tools provided by Pubmed: Details, Display Citation and Mesh Browser.


CINF 5:  
Steric and electronic requirements of enzyme reactions

Johann Gasteiger1, Martin Reitz1, and Oliver Sacher2. (1) Computer-Chemie-Centrum and Institute of Organic Chemistry, University of Erlangen-Nuremberg, Naegelsbachstr. 25, Erlangen 91052, Germany, Fax: +49-9131-85 26566, Gasteiger@chemie.uni-erlangen.de, (2) Molecular Networks GmbH

Abstract
Genes express proteins, enzymes, that govern biochemical reactions. A more detailed understanding of these reactions requires an analysis of how the substrates fit into the enzymes and of the physicochemical effects influencing the bond breaking and making in enzyme reactions. In order to advance such studies we have built a database of biochemical pathways that represents chemical structures and reactions on the atomic level giving access to each atom and bond of the substrates of enzyme reactions. This database allows the study of transition state hypotheses of enzyme reactions. Furthermore, the analysis of the physicochemical effects operating at the reaction site allows a classification of enzyme reactions that goes beyond the traditional EC code for enzymes.


CINF 6:  
Linking chemical scaffolds to gene families to help elucidate molecular mechanisms

Chihae Yang1, Paul E. Blower1, Kevin Cross1, Glenn Myatt1, Wolfgang Sadée2, and Ying Huang2. (1) Leadscope, Inc, Columbus, OH 43212, Fax: 614-675-3732, cyang@leadscope.com, (2) College of Medicine and Public Health, The Ohio State University

Abstract
The significant investment in “omics” technologies and large amount of information generated by these new paradigms have not yet led to dramatic productivity increases in the drug discovery process. Linking biology to chemistry still remains the bottleneck. To link the vast amount of genomics information to small molecule discovery, we previously correlated the gene expression profiles of 60 NCI cancer cell lines to compound activity patterns of the same cell lines, resulting in many possible gene-compound pairs. In this paper, genes in specific biological process pathways were correlated with active chemical scaffolds, whose associations were used to build molecular hypotheses. Gene hierarchical classifications, based on biological process, were used to differentiate gene expression patterns of various cell types. The results from the gene hierarchy analysis are compared to other computational methods for extracting subsets of differentiating genes. This methodology allows us to extend our hypotheses from individual gene-compound pair mappings to a systems approach of linking gene families to compound scaffolds.
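
The gene-compound pairing described above can be illustrated with a minimal sketch. It assumes each gene's expression and each compound's activity is a list of values measured over the same ordered set of cell lines, and it uses the Pearson coefficient as the correlation measure. The function names, the toy data, and the 0.8 threshold are illustrative assumptions, not the authors' actual procedure.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no correlation signal
    return cov / sqrt(var_x * var_y)


def gene_compound_pairs(expression, activity, threshold=0.8):
    """Return (gene, compound, r) tuples with |r| >= threshold.

    expression: gene name -> values across cell lines (hypothetical layout)
    activity:   compound name -> values across the same cell lines
    """
    pairs = []
    for gene, expr in expression.items():
        for compound, act in activity.items():
            r = pearson(expr, act)
            if abs(r) >= threshold:
                pairs.append((gene, compound, r))
    # Strongest associations first, regardless of sign.
    return sorted(pairs, key=lambda t: -abs(t[2]))


# Toy data over four cell lines (invented for illustration only).
expression = {"geneA": [1, 2, 3, 4], "geneB": [4, 3, 2, 1], "geneC": [1, 3, 1, 3]}
activity = {"cpdX": [2, 4, 6, 8]}
pairs = gene_compound_pairs(expression, activity)
```

In this toy case geneA correlates positively and geneB negatively with cpdX, while geneC falls below the threshold and is dropped.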


CINF 7:  
Streamlining drug discovery informatics: Accelerating the flow from gene to structure to pre-clinical candidate

Dean R. Artis, Informatics, Plexxikon Inc, 91 Bolivar Drive, Berkeley, CA 94710, Fax: (510) 548-4785, drartis@plexxikon.com

Abstract
Plexxikon’s Scaffold-Based Drug Discovery™ platform relies on a unique combination of low-affinity biochemical screening of a proprietary target-neutral compound library and structural characterization via high-throughput x-ray crystallography, coupled to a powerful infrastructure for computational analysis and design that bridges traditional bioinformatics and cheminformatics. Use of these integrated systems has resulted in the identification of many novel chemical starting points with facile synthetic approaches and a target structure-directed optimization path. This has enabled the efficient synthesis of lead compounds with compelling bioactivity against proteins of interest in the kinase, phosphodiesterase and nuclear receptor families. Examples highlighting the role of Informatics approaches in Plexxikon’s efforts will be discussed, including efforts leading to the rapid development of a new class of anti-diabetic compounds with excellent potency, selectivity, pharmaceutical properties and in vivo efficacy.


CINF 8:  
Linking bioinformatics to cheminformatics in biological networks

Barbara A. Eckman, Life Sciences, IBM, 1475 Phoenixville Pike, West Chester, PA 19380, baeckman@us.ibm.com, and Julia E. Rice, IBM Almaden Research Center

Abstract
As high-throughput biology generates large volumes of data about the “parts list” of living organisms, the need grows for robust, efficient systems to manage metabolic and signaling pathways, chemical reaction networks, protein interaction networks, etc. Network data is arguably best represented as graphs, which are not well supported by standard relational database management systems. IBM Research is extending DB2 with advanced graph operations, to support such queries as: “Find all proteins related to protein A (i.e. within a given path length of A) in a protein interaction graph, and retrieve related assay results and compound structures.” “Find all pathways where compound x inhibits or slows a reaction, and retrieve Gene Ontology classifications for all proteins involved in the reaction.” “Find a subgraph of a large pathway that has the same structure and involves the same enzyme as the subgraph that I have circled, and retrieve associated protein and compound annotations.”
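
The first query type above (all proteins within a given path length of protein A) corresponds to a bounded breadth-first search. The following is a minimal in-memory sketch of that idea, assuming the interaction graph is held as a plain adjacency dictionary. It does not represent the DB2 graph extensions themselves, and the function name and toy graph are hypothetical.

```python
from collections import deque


def within_path_length(graph, start, max_hops):
    """Return {protein: hop_count} for proteins within max_hops of start.

    graph is an adjacency mapping: protein -> iterable of interacting proteins.
    The start protein itself is excluded from the result.
    """
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand beyond the allowed path length
        for neighbor in graph.get(node, ()):
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    del hops[start]
    return hops


# Toy interaction graph (invented for illustration only).
interactions = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A"],
    "D": ["B", "E"],
    "E": ["D"],
}
related = within_path_length(interactions, "A", 2)
```

The returned protein identifiers could then serve as keys for retrieving assay results or compound structures from other tables, which is the join the quoted query describes.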


CINF 9:  
Technical and people disconnects hindering knowledge exchange between chemistry and biology

Christopher A. Lipinski, Exploratory Medicinal Sciences, Pfizer Global Research and Development, Groton Laboratories (retired), Eastern Point Road, mail stop 8200-36, Groton, CT 06340, Fax: 860-715-3149, christopher_a_lipinski@groton.pfizer.com

Abstract
Both technical and people factors hinder knowledge exchange between chemistry and biology. For both disciplines software effort is expended on data with little value. For example, capture and subsequent analysis of large volumes of primary HTS data is difficult because of the very high noise factor and hence is not very useful. Public access to primary literature data is very different between the disciplines. Much of searchable biology data is in the public domain while most of chemistry structural data is not. Batch mode data searching is feasible in biology but in chemistry batch mode searching capability is primitive. A problem exists with chemistry's need for batch mode chemical structure searching capability, for example with CAS SciFinder, a leading software search tool. The time course of data capture and the very different complexity levels of gene and protein structure representation compared to chemical structure representation contribute to this issue. On the people side, software lags in capture of high-level metadata, i.e. why decisions are made. Metadata capture is complicated by people issues, particularly those between chemists and biologists. Discipline-based disconnects occur distressingly often and are frequently overlooked as a cause of lost productivity. Many of the problems between chemists and biologists are directly traceable to differences in training and hence in attitudes and outlook. Most synthetic chemists are math-averse and any type of communication to chemists relying on mathematical equations will be underappreciated or even ignored. Chemists are superb at pattern recognition but biologists are not. This causes confusion and conflict with biology when a medicinal chemist makes a judgment in just a few seconds as to the quality of a compound structure. Expert systems that could capture the pattern recognition skills of medicinal chemists are badly needed.


CINF 10:  
Relating chemical and biological space: An in-silico platform technology approach to accelerate the discovery of novel medicinally relevant small molecules

Stephan C. Schürer, Director, Content Development, Sertanty, Inc, 1735 N. First Street, Suite 102, San Jose, CA 95112, Fax: 408 487 4011, sschurer@sertanty.com

Abstract
In the post-genomic era of drug discovery, a promising approach appears to be the systematic exploration of target families. It is critical in this process to utilize all available and relevant SAR data and consider various synthetic methodologies to most efficiently arrive at novel molecules that have desired properties and are also amenable to further optimization. Sertanty, Inc. has developed a discovery informatics platform – LUCIA™ – that facilitates archival, sharing, integration, and exploration of synthetic methods and biological activity data. Using LUCIA, novel small molecules can be generated in-silico and prioritized against computationally efficient eScreens™ and ADMET models. eScreens are derived from an integrated gene family-wide SAR knowledge base and can improve as new experimental data is generated. Successful application of the technology has resulted in the identification of novel ABL Kinase inhibitors in a four-month project and offers promise in both accelerating and enriching the success-rate of collaborative hit identification and lead optimization. Our next generation ChIP (Chemical Intelligence Platform) system explores chemical space in-silico based on forward analysis of synthetic pathways. Utilizing dynamic transforms that are generated from common representations of chemical reactions, ChIP prospectively “mix-n-matches” compatible synthetic strategies to generate novel compositions of matter with probable improvements in potency, selectivity and ADMET profiles.


CINF 11:  
Critical assessment of chemo- and bio-informatics applications development, or, "It's the infrastructure, stupid"

Doron Chema, Department of Medicinal Chemistry, Hebrew University of Jerusalem, School of Pharmacy, Jerusalem 91120, Israel, doron_chema@md.huji.ac.il

Abstract
The increasing need for bridging chemo- and bio-informatics is an excellent opportunity to reassess the development of applications in these fields and the expected consequences of bridging these disciplines. Examination of the current situation may lead to the conclusion that both fields currently suffer from a software crisis. This crisis involves several aspects of the application development process. The data format standardization problem is a well-known aspect of this crisis, as many similar file and database formats co-exist, sharing similar goals. Another aspect of this crisis may be called “too many tools for too small missions.” Even a modest project usually requires developers to manage several code environments, each of which was designed and implemented with specific scientific goals in mind. Ironically, the existence of many niche tools effectively causes a lack of appropriate development tools. This often leads to a situation in which much of the development work is done from scratch, causing a huge waste of resources. It is our belief that these major difficulties, which occur frequently in both fields, are already causing major bottlenecks that have even higher potential to block or delay any significant progress of the integrated field. In this talk an approach for overcoming these barriers at the infrastructure level will be described, followed by an introduction to a new infrastructure technology.


CINF 12:  
Cross-discipline analysis made possible with data pipelining

J.R. Tozer, SciTegic, Inc, 9665 Chesapeake Dr. #401, San Diego, CA 92123, Fax: 858 279 8804, jtozer@scitegic.com

Abstract
While cheminformatics and bioinformatics use completely different data formats and analysis tools, the data pipelining approach makes it possible to apply them together. Chemical compound structures and activities can be processed in the same computing environment that analyzes gene expression profiles or protein sequences. We will discuss some interesting research questions that can only be addressed by the coordinated analysis in bioinformatics and cheminformatics (e.g., clustering gene targets using the correlation of their expression levels in a series of cells with the biological activity on those cells of a set of test compounds).


CINF 13:  
Informatics integration at Arena Pharmaceuticals

Gareth Jones, Arena Pharmaceuticals, Inc, 6166 Nancy Ridge Drive, San Diego, CA 92121, Fax: 8584537210, gjones@arenapharm.com

Abstract
The development of platform-independent web-based computing allows ordinary users unprecedented access to corporate information. At Arena we have developed a web-based informatics system that allows all employees access to chemical, screening, genomic and gene-expression data. This system was designed specifically to allow users with little or no computing experience to browse, analyze, update and edit chemical and biological data. This results in real-time distribution of experimental data and allows on-the-fly analysis and search of information. Additionally, communication between disparate groups working on the same project has been greatly facilitated.

The data system is based on a three-tier system with an Oracle database in the back-end. The middle tier comprises a web-server with perl CGI and Java programs. Extensive use has been made of Java applets on the client web-browser. A separate Linux cluster provides cheminformatics services to the middle tier, which are accessed using XML/RPC protocols.


CINF 14:  
Systematic bioactivity classification of ligands onto a protein target ontology: Application for library design and virtual profiling of a compound collection

Mark A. Hermsmeier1, Dora Schnur2, and Bradley C. Pearce1. (1) New Leads Chemistry, Bristol-Myers Squibb, P.O. Box 4000, Princeton, NJ 08543, Fax: 609-252-7446, (2) Computer Assisted Drug Design, Bristol-Myers Squibb

Abstract
Profiling the in-silico biological content of our screening deck and the ability to create target class libraries are greatly facilitated using a data platform that integrates ligand databases and a protein target ontology. The data platform that has been developed integrates the non-proprietary Gene Ontology from the GO Consortium with three commercially available Ligand databases. The structures in these ligand databases have in turn been linked to the screening compounds by atom pairs similarity. The activity associations and similarity results are stored in a relational database for rapid retrieval of results. A web interface has been deployed that allows browsing the Protein Target Ontology and drilling down to view associated ligands in the commercial databases and similar structures in the screening deck. The data platform also allows rapid in-silico profiling of the screening compounds.
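
The similarity linking step above can be sketched as follows. The abstract does not state which similarity coefficient was applied to the atom pairs, so this sketch assumes a set-based Tanimoto coefficient. The atom-pair feature strings, compound names, and the 0.5 cutoff are hypothetical placeholders rather than real descriptor output.

```python
def tanimoto(features_a, features_b):
    """Set-based Tanimoto coefficient: |A & B| / |A | B|."""
    a, b = set(features_a), set(features_b)
    if not a and not b:
        return 0.0  # convention chosen here for two empty feature sets
    return len(a & b) / len(a | b)


def link_ligands_to_deck(ligands, deck, cutoff=0.5):
    """Map each ligand to screening compounds with similarity >= cutoff.

    ligands, deck: name -> set of atom-pair features (hypothetical encoding).
    Returns ligand name -> list of (compound name, similarity), best first.
    """
    links = {}
    for ligand, ligand_features in ligands.items():
        hits = []
        for compound, compound_features in deck.items():
            score = tanimoto(ligand_features, compound_features)
            if score >= cutoff:
                hits.append((compound, score))
        links[ligand] = sorted(hits, key=lambda hit: -hit[1])
    return links


# Toy feature sets (invented for illustration only).
ligands = {"ligand1": {"C-3-N", "C-2-O", "N-4-O"}}
deck = {
    "deck1": {"C-3-N", "C-2-O", "C-5-S"},  # shares 2 of 4 distinct features
    "deck2": {"C-7-F"},                    # shares none
}
links = link_ligands_to_deck(ligands, deck)
```

Storing the resulting (ligand, compound, score) rows in a relational table would allow the kind of rapid retrieval the abstract mentions.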


CINF 15:  
Proteomica™ – An integrated system for analysis of biological and chemical data

Michael Farnum1, Sergei Izrailev1, and Dimitris Agrafiotis2. (1) 3-Dimensional Pharmaceuticals, Inc, 665 Stockton Dr, Exton, PA 19341, Fax: 610-458-8249, michael.farnum@3dp.com, (2) Research Informatics, 3-Dimensional Pharmaceuticals, Inc

Abstract
In recent years, there has been an explosion of the amount of chemical and genomic data. Chemical information has been driven by high-throughput screening and analysis of large libraries of chemical compounds, both physical and virtual, while genomic information has been generated through full genome sequencing and annotation as well as by DNA microarray and other high-throughput experiments. The number of protein crystal structures deposited in the Protein Data Bank has also grown at an unprecedented rate. Much effort has been made to relate the structure and properties of chemical compounds to the structure and function of genes and proteins. However, chemical and protein sequence information has been largely analyzed separately, in part because very few databases and software packages provide the connectivity required for analyzing and browsing the data simultaneously. Proteomica™ is an architecture designed to integrate both types of information. It is leveraged by advanced dimensionality reduction techniques and provides the capability to visualize similarity in both the property space of small molecules and the sequence space of target proteins. Proteomica™ enables scientists to ask iterative questions about biochemical experiments by combining information from external and in-house sources. This presentation will demonstrate both the principles and implementation of the system.


CINF 16:  
Fedora: Federated access to chemical and biological data

Scott Dixon, Vera Povolna, and David Weininger, Metaphorics, 441 Greg Ave, Santa Fe, NM 87501, scott@metaphorics.com, vera@metaphorics.com

Abstract
Fedora is a technology which enables the rapid development of special purpose HTTP servers designed for the analysis and integration of biological and chemical information. These servers containing seemingly disparate data can communicate with one another via a web browser and provide the capability to mine data for complex relationships. The Fedora servers include a metabolic pathway network (Empath), Protein-Ligand Association Network (Planet), Traditional Chinese Medicines (TCM), the World Drug Index (WDI), and others.


CINF 17:  
Case study of IP information management at a small pharmaceutical company

Susan Wollowitz, Wollowitz Associates, 455 Moraga Rd, Suite C, Moraga, CA 94556, Fax: 925-247-1289, sue@wollowitz.com

Abstract
A case study will be presented of how a small pharmaceutical company addressed their intellectual property information acquisition and document management needs. The situation was initially evaluated, including the demand for IP creation and prosecution, the current capabilities and the operational constraints. Issues identified were a need for an improved document tracking system, better access to patent information and an ability to proactively monitor the competitive landscape. The presentation will discuss the options considered and selected as well as a retrospective evaluation of the decision success.


CINF 18:  
Low-income patent management

John Santacruz, Division of Small Chemical Businesses, 1263 Fulton Street, Rahway, NJ 07065, santacr2@aol.com

Abstract
Patent management on a low-income budget is a growing concern for Small Chemical Businesses due to limited resources and multitasking of personnel. Two methods of legal representation that significantly reduce the annual costs of patent management will be discussed. The two methods will be compared to the traditional method of private law firm representation. The literature and laws in this area will be briefly reviewed.


CINF 19:  
Minimizing intellectual property cost - maximizing intellectual property return

Gianna Arnold, and Corinne Marie Pouliquen, Epstein Becker and Green, 1227 25th Street, NW, Suite 700, Washington, DC 20037-1175, Fax: 202-296-2882, garnold@ebglaw.com

Abstract
Today’s small business owner faces a vast array of decisions related to the appropriate protection, utilization, and management of intellectual assets. This discussion will focus on tools and strategies to maximize the use of intellectual property dollars, by minimizing actual cost, and by maximizing return. Topics addressed include establishing a scientific advisory board; establishing process and screening criteria to obtain/maintain patents; promoting and easing the burden of invention disclosure; reducing costs associated with use of outside counsel; capitalizing on intellectual property as a business asset; and aligning intellectual property resources with corporate strategy.


CINF 20:  
Patent searching for small chemical businesses

Barbara Hurwitz, Barbara Hurwitz, consulting, 36 Waverly Street, Portland, ME 04103, Fax: 207-228-6418

Abstract

Patent searches are run for small chemical companies either directly for the company or through the company’s outside counsel. Using three small businesses as case studies, we can see how interacting with these small companies differs from working with the staff of a large chemical and pharmaceutical company.


CINF 21:  
Information sources for small companies

Sandy Burcham, Service Is Our Business, Inc, 111 Lincoln Terrace, Norristown, PA 19403-3317, Fax: 610-630-0863, cass123@earthlink.net

Abstract
This paper will discuss the various sources available to small companies - in order to aid in the determination of the ways to best spend their resources.


CINF 22:  
Comparison of free Internet-based intellectual property (IP) tools with contracting IP research to third party information professionals

Michael I. Montembeau, and Gerri B. Potash, Nerac, Inc, 1 Technology Drive, Tolland, CT 06084, Fax: 860-872-7856, mmontembeau@nerac.com

Abstract
Chemical businesses, whether large or small, have an enormous need for intellectual property information. This need is particularly burdensome for small chemical businesses, which often cannot afford to hire full-time information staff, let alone full-time patent information staff. As a result, small chemical businesses are left to appoint a lead IP person, who must juggle their new IP duties with their research tasks and other duties.

This presentation will: 1) outline the tools and capabilities of the free internet-based intellectual property resources; 2) compare the internet-based resources with those of a third-party information provider, such as Nerac.com; and 3) discuss the advantages and disadvantages of each resource and how one would make effective use of these resources.

This presentation will also describe how chemical businesses can benefit, not only from the Intellectual Property resources at Nerac, but also from the use of the extensive chemical and engineering related databases Nerac has compiled as a research and analysis tool.


CINF 23:  
Professional tools and services supporting the small to medium enterprise

Anthony J. Trippe, Science IP/Chemical Abstracts Service, 2540 Olentangy River Rd., Columbus, OH 43210, atrippe@cas.org, and Rebecca A. Wolff, Product Marketing Management, Chemical Abstracts Service, 2540 Olentangy River Road, Columbus, OH 43202-1505, Fax: 614-461-7149, rwolff@cas.org

Abstract
Employees at small to medium enterprises must wear many different hats. With each “hat” that they wear, they also strive to optimize their time, present a professional image, and add value to their work. CAS provides a number of tools and services that can assist the multi-hat wearer to not only meet these needs, but to also meet the needs of both their internal and external customers.

This presentation will explore how to use the latest STN software to:

1) take advantage of the patent content available on STN, 2) analyze the results to meet business critical needs, and 3) create professional-looking reports and tables.

For smaller organizations in particular, without the benefit of a sizable staff of information professionals, certain projects may require additional expertise or outside assistance to meet a critical deadline. For these situations, CAS has created Science IP, the CAS Search Service. This function is staffed with searching and analysis experts who can assist on a project-by-project basis. During this presentation, examples of searches with legal ramifications will be discussed and details will be provided on the advantages of working with Science IP on these types of requests.


CINF 24:  
The Questel-Orbit alternative for chemical information

Elliott Linder, Questel*Orbit, Inc, 7925 Jones Branch Drive, Fax: 703.873.4701, ELinder@questel.orbit.com, and Joseph M Terlizzi, Questel-Orbit, 8000 Westpark Drive, jterlizzi@questel.orbit.com

Abstract
For over 25 years, Questel·Orbit has offered information specialists an extensive collection of online patent databases containing chemical information. For broad subject searching, the European, International, and US classifications in our exclusive PlusPat database can be used, with easy lookup using the ECLA and USPCL dictionary files. Narrower searching can be conducted using the US, EP, and PCT full-text databases. For specific chemical searching, our exclusive Merged Markush Service (MMS) for chemical structure searching is available, as are codes and indexing in databases produced by Derwent, IFI, CAS, INPI, and others. Special features allow the creation of “super” display records composed of fields from any database on the system. The standardization of patent numbers system-wide makes cross-file searching for complementary information simple. Built-in statistical analysis tools are easy-to-use and valuable for competitive intelligence. This presentation will review how the techniques and features outlined above are applicable for small chemical businesses.


CINF 25:  
Instruments on the Grid: UK national crystallography grid service

Jeremy G. Frey, Chemistry, University of Southampton, Department of Chemistry, Highfield, Southampton SO17 1BJ, United Kingdom, Fax: +44 23 8059 3781, j.g.frey@soton.ac.uk

Abstract
We will describe the processes and infrastructure needed to develop and deploy a grid service for access to and interaction with the UK EPSRC National Crystallography Service (NCS) developed as part of the CombeChem e-Science Pilot Project and with the assistance of the Centre of Excellence in Combinatorial Chemistry, all largely based at the University of Southampton, UK. Special consideration will be given to a discussion of the sample tracking database and the implementation needed to run this national service, the implications for the security of the service, and the system employed to meet these requirements. The user interface, archiving methods and notification systems will also be described along with the results of the initial users' experience.


CINF 26:  
Computational science and engineering online: A web-based grid-computing environment for research and education in computational science and engineering

Thanh N. Truong, Department of Chemistry, University of Utah, 315 S, 1400 E, Room 2020, Salt Lake City, UT 84112, Fax: 801-581-4354, truong@chemistry.chem.utah.edu

Abstract
We present the development of an integrated extendable web-based simulation environment called Computational Science and Engineering On-line (CSEO) that allows computational scientists to perform research using state-of-the-art tools, query data from personal or public databases, discuss results with colleagues, and access resources beyond those available locally from a web browser. Currently, CSEO provides an integrated environment for multi-scale modeling of complex reacting systems. A unique feature of CSEO is its framework, which allows data to flow from one application to another in a transparent manner. A particular example is demonstrated to show how results from fundamental quantum chemistry simulations are used to calculate thermodynamic and kinetic properties of a chemical reaction, which subsequently are used in the simulation of a combustion reactor. Advantages, disadvantages, and future prospects of a web-based simulation approach are then discussed. CSEO can be accessed at http://cseo.net.


CINF 27:  
Grid computing: How applications are finally catching up to the technology

Chris Crafford, Engineering, United Devices, 12675 Research Blvd., Bldg. A, Austin, TX 78759, Fax: 512-331-6235, chris@ud.com, and Seetharamulu Peddaiahgari, Director, Life Sciences Applications, United Devices

Abstract
The completion of the human genome has transformed drug discovery and molecular targeting, vastly increasing the potential number of druggable targets as well as information about their possible binding sites. Computing power is essential to identifying and learning more about these targets. With the appropriate grid solution, researchers can explore drug actions, speed the development cycle and reduce costs without sacrificing precision. Several research organizations and top pharmaceutical companies are already using the technology to gain a competitive edge. Multiple case studies will be presented illustrating how researchers, with the help of top application providers, are using grid computing now to achieve success.


CINF 28:  
Virtual screening using grid computing

W Graham Richards, Central Chemistry Lab, University of Oxford, South Parks Road, Oxford, OX1 3QH, United Kingdom, graham.richards@chem.ox.ac.uk

Abstract
The screen saver project run by the Chemistry Department at the University of Oxford, United Devices Inc and Accelrys Inc now involves some 2.5 million PCs in over 220 countries and has provided more than 250,000 years of CPU time: an effective 100 teraflop facility. Such power permits the virtual screening of billions of drug-like molecules against defined protein targets within days or weeks. A review of the project, the results obtained so far and future opportunities will be presented.


CINF 29:  
OpenMolGRID, a Grid-based large-scale drug design system

Laszlo Urge1, Ákos Papp1, István Bágyi1, Géza Ambrus2, and Ferenc Darvas1. (1) ComGenex Inc, 33-34 Bem rpk, Budapest, H-1027, Hungary, Fax: +361-214-2310, laszlo.urge@comgenex.hu, (2) RecomGenex, Ltd

Abstract
Pharmaceutical companies face the challenge that modern drug discovery requires precise "high-throughput" in silico systems that are not only able to handle millions of structures, but can also give accurate predictions for the requested properties. In addition, mergers in the pharmaceutical industry demand the integration of geographically distributed information and computation resources. These challenges make the use of GRID systems indispensable. As a consequence, chemical applications developed for traditional environments have to be redesigned to meet the requirements of this new technology. OpenMolGRID is going to be one of the first realizations of GRID technology in drug design. The system is designed to build forward- and reverse-QSAR models, and to generate novel structures with favorable properties. The lecture describes how traditional chemical IT tools are being implemented to handle large-scale library design scenarios. The development of OpenMolGRID is partly funded by the European Commission (IST-2001-37238).


CINF 30:  
BioSimGRID: A distributed database for biomolecular simulations

Jonathan W Essex1, Kaihsu Tai2, Stuart Murdock1, Muan Hong Ng3, Bing Wu4, Steve Johnston3, Hans Fangohr3, Paul Jeffreys4, Simon Cox3, and Mark Sansom2. (1) School of Chemistry, University of Southampton, Highfield, Southampton SO17 1BJ, United Kingdom, Fax: +44 (0)23 8059 3781, jwe1@soton.ac.uk, (2) Department of Biochemistry, University of Oxford, (3) e-Science Centre, University of Southampton, (4) e-Science Centre, University of Oxford

Abstract
Biomolecular simulations provide data on the conformational dynamics and energetics of complex biomolecular systems. We aim to exploit the Grid infrastructure developing in the UK to enable large scale analysis of the results of such simulations. The BioSimGRID project (www.biosimgrid.org) will provide a generic database for comparative analysis of simulations of biomolecules of biological and pharmaceutical interest. The system will have a service-oriented computing model using Grid-based Web service technology to deliver analysis. Data mining services will be provided for the biomolecular simulation and structural biology communities, using a Python scripting environment. To address the security problem of the heterogeneous BioSimGRID environment, a Grid certificate-based and a user/password-based authentication mechanism will be integrated across the system. The back-end of BioSimGRID is based on a relational database, with appropriate indexing to optimize performance of the analysis package.


CINF 31:  
Comb-e-Chem: GRID-enabled chemical crystallography and a new opportunity for structural chemistry

Michael B. Hursthouse, Department of Chemistry, University of Southampton, Southampton SO17 1BJ, United Kingdom, Fax: 44-2380-596723, M.B.Hursthouse@soton.ac.uk

Abstract
We are exploring the feasibility of an e-Science approach to provide an integrated, GRID-enabled, Chemical Structure and Property Environment, incorporating a co-ordinated high-throughput crystal structure determination and property measurement capability, with distributed structure and property calculations and database mining. We are developing new software for automated pattern searching in crystal structures, with a view to learning more about crystal structure assembly, polymorphism and materials properties. In a related E-Bank project, we are developing procedures for automated archiving and dissemination of fundamental data, subsequent processing and calculations, and the derived knowledge, so that publications, in which the new information can be assessed and presented, are not compromised by the need to carry the data with them. This presentation will report and review the status of these activities.


CINF 32:  
Semantic Grid computing - the WorldWideMolecularMatrix

Yong Zhang1, Robert C. Glen2, Peter Murray-Rust3, Henry S. Rzepa4, and Joe A Townsend2. (1) Unilever Centre for Molecular Sciences Informatics, University of Cambridge, Lensfield Road, Cambridge, United Kingdom, yz237@cam.ac.uk, (2) Department of Chemistry, Unilever Centre for Molecular Science Informatics, (3) Unilever Centre for Molecular Informatics, University of Cambridge, (4) Chemistry, Imperial College

Abstract

The Semantic Web is Tim Berners-Lee's vision of knowledge-based computing for the Web. We have shown how this can be adapted to chemistry. Our implementation uses XML-CML for molecules and properties and the new IChI as a unique key calculated directly from the connection table. A molecule can be precisely differentiated from any other and retrieved by conventional database methods.

The NCI database has ca 250,000 molecules which we converted into CML using openbabel. These are stored in a native XML database, Xindice, and searched by the XPath language. We can retrieve molecules within 50 milliseconds.
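
As a rough illustration of XPath-style retrieval over CML, the sketch below uses Python's standard-library ElementTree rather than Xindice. The two-molecule document, the ids, and the helper names `find_molecule` and `element_counts` are made up for illustration, and the namespace URI is assumed to be the commonly used CML schema namespace.

```python
import xml.etree.ElementTree as ET

# Toy CML document (illustrative data, not taken from the NCI database).
CML = """<cml xmlns="http://www.xml-cml.org/schema">
  <molecule id="m1" title="water">
    <atomArray>
      <atom id="a1" elementType="O"/>
      <atom id="a2" elementType="H"/>
      <atom id="a3" elementType="H"/>
    </atomArray>
  </molecule>
  <molecule id="m2" title="methane">
    <atomArray>
      <atom id="a1" elementType="C"/>
      <atom id="a2" elementType="H"/>
      <atom id="a3" elementType="H"/>
      <atom id="a4" elementType="H"/>
      <atom id="a5" elementType="H"/>
    </atomArray>
  </molecule>
</cml>"""

# Prefix-to-URI map used in the XPath expressions (assumed namespace).
NS = {"cml": "http://www.xml-cml.org/schema"}


def find_molecule(root, mol_id):
    """Return the <molecule> element with the given id, or None if absent."""
    return root.find(f".//cml:molecule[@id='{mol_id}']", NS)


def element_counts(molecule):
    """Count atoms per element type in one <molecule> element."""
    counts = {}
    for atom in molecule.findall("./cml:atomArray/cml:atom", NS):
        symbol = atom.get("elementType")
        counts[symbol] = counts.get(symbol, 0) + 1
    return counts


root = ET.fromstring(CML)
water = find_molecule(root, "m1")
print(water.get("title"), element_counts(water))
```

ElementTree supports only a subset of XPath, so a native XML database would accept richer queries than this sketch can express.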

Molecular properties were calculated using MOPAC2003, using Condor and the spare cpu time on 24 PCs. Times per molecule varied from 0.5 sec to 500,000 seconds; the calculations took 4 months.

The XML results are Openly available on our WorldWideMolecularMatrix, WWMM. A chemist submits a molecule. If its properties already exist they are returned; otherwise the computation is run. For new molecules the results are provided through a RSS system (CMLRSS).
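
The return-or-compute behaviour described above can be sketched as a keyed cache. This is only an illustration under stated assumptions: `make_property_service` and `fake_compute` are hypothetical names, the key strings are placeholders for a unique molecular identifier, and the lookup is synchronous, whereas the abstract says new results are delivered through an RSS feed.

```python
def make_property_service(compute):
    """Build a lookup function backed by an in-memory store.

    `compute` stands in for an expensive calculation (hypothetical here);
    it is called only for keys that have not been seen before.
    """
    store = {}

    def lookup(key):
        if key in store:
            return store[key], "cached"
        store[key] = compute(key)
        return store[key], "computed"

    return lookup


calls = []


def fake_compute(key):
    """Stand-in for a semi-empirical calculation; returns a dummy record."""
    calls.append(key)
    return {"key": key, "n_chars": len(key)}


lookup = make_property_service(fake_compute)
first = lookup("placeholder-identifier-1")   # triggers the calculation
second = lookup("placeholder-identifier-1")  # served from the store
print(first[1], second[1], len(calls))
```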

The system is a peer2peer Grid for chemical information and computation. The software can be downloaded and we invite other groups to run servers with varied functions so a Semantic Grid for chemistry becomes possible.

We thank the DTI and Unilever PLC.


CINF 33:  
Adaptive informatics infrastructure for multi-scale chemical science

James D. Myers1, Larry Rahn2, David Leahy2, Carmen M. Pancerella2, Gregor von Laszewski3, Branko Ruscic4, and William H. Green Jr.5. (1) Collaboratory Group Leader, Battelle / Pacific Northwest National Laboratory, Battelle Blvd. MS K1-87, Richland, WA 99352, Fax: 509-375-6631, jim.myers@pnl.gov, (2) Sandia National Laboratories, (3) Mathematics and Computer Science Division, Argonne National Laboratory, (4) Chemistry Division, Argonne National Laboratory, (5) Department of Chemical Engineering, Massachusetts Institute of Technology

Abstract
The Collaboratory for Multi-scale Chemical Sciences (CMCS, cmcs.org) is enabling the flow of information across physical scales and scientific disciplines ranging from subatomic quantum chemistry to predictive simulations of chemical processes such as combustion. CMCS is using advanced collaboration and metadata-based data management technologies to develop a portal providing distributed research support, community interactions, and data discovery, management, and annotation capabilities. The portal assists in documenting and browsing data pedigree and in communicating dependencies between data produced at one scale and computations using it at the next. A variety of standards-based mechanisms for extracting metadata from files, translating between schema, converting data formats, and integrating external applications (such as Active Thermochemical Tables) are being developed to minimize the work required to adopt CMCS capabilities. These capabilities are being piloted by involving key national chemistry resources (data and software) and by supporting distributed groups performing informatics-based chemical research in combustion science.


CINF 34:  
The application of distributed computing to computer simulations

Jonathan W Essex1, Christopher J. Woods1, Adrian P. Willey1, Luca A. Fenu1, Andrew C. Good2, Andrew R. Leach3, Richard A. Lewis4, and Jeremy G. Frey1. (1) School of Chemistry, University of Southampton, Highfield, Southampton SO17 1BJ, United Kingdom, Fax: +44 (0)23 8059 3781, jwe1@soton.ac.uk, (2) Structural Biology and Modeling, Bristol-Myers Squibb, (3) Computational Chemistry and Informatics, GlaxoSmithKline Research and Development, (4) Lilly Research Centre

Abstract
Distributed computing is a very popular, and potentially very powerful, approach for accessing large amounts of computational power. Under the umbrella of the comb-e-chem project, we have examined both freely available and commercial distributed computing software. In this paper, we describe our experiences. The performance of coarsely parallel computations, such as protein-ligand docking, and of more tightly coupled replica-exchange molecular dynamics computer simulations will be assessed. Issues of security will also be discussed, in particular how security determines the availability and utility of computers within a large organisation.


CINF 35:  
Virtual Research Parks enable multi-organizational collaboration

Gary G Benesko, Life Sciences, IBM, 755 Cypress Rd., St. Augustine, FL 32086, Fax: 419-735-6288, benesko@us.ibm.com

Abstract
A Virtual Research Park (VRP) is a secure, state-of-the-art, Web-based research environment that supports and facilitates joint R&D, collaboration, and commercial activities among Life Science Communities whose boundaries extend beyond any one enterprise or geography. Each Community can consist of multiple related organizations and individuals united by common interests, such as:

- Accelerating innovation using an advanced set of collaboration tools across an extended team
- Leveraging external expertise through Virtual Consulting services
- Streamlining the R&D process through access to Best Practice applications, a wide range of data sources, and state-of-the-art R&D tools
- Organizing and managing common projects and common resources
- Sharing of common data and applications
- Leveraging external resources "On Demand" (e.g. compute grids, storage grids, external applications)
- Decreasing mutual costs via a common commercial platform with access to external suppliers and vendors of goods and services


CINF 36:  
Structure-activity relationships for the design of molecules (STARDoM): The development and implementation of grid-enabled, automated predictive QSAR modeling

Alexander Tropsha1, Scott Oloff2, Alexander Golbraikh1, Chi-Duen Poon3, Terry O'Brien4, Michael Blocksome4, Rich Dulaney4, Madhu Gombar4, and Virinder Batra4. (1) Laboratory of Molecular Modeling, School of Pharmacy, The University of North Carolina at Chapel Hill, 301 Beard Hall, CB# 7360, UNC-CH, Chapel Hill, NC 27599, tropsha@email.unc.edu, (2) Department of Pharmacology, University of North Carolina at Chapel Hill, (3) Department of Chemistry, University of North Carolina, (4) IBM Life Sciences

Abstract
QSAR models are typically generated with a single modeling technique. Our research has demonstrated that multiple models should be generated for any dataset to ensure their statistical significance and predictive power. We have developed a combinatorial QSAR approach which explores all possible combinations of various descriptor sets and optimization methods, coupled with external model validation. This approach required integration of multiple individual protocols dealing with descriptor generation, model development and validation, and model application to external database mining to identify potentially active hits. The integration of the protocols developed at UNC was achieved in collaboration with IBM’s Life Sciences team using the WebSphere framework and implemented on the North Carolina BioGrid through a Globus Toolkit. This solution is automated, efficient, and accessible to users via a web interface. It was successfully applied to the discovery of novel anticonvulsant agents as well as novel ligands of the P2Y12 receptor.
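
The idea of crossing every descriptor set with every modeling method and scoring each pair on an external set can be sketched as below. Everything here is an assumption for illustration: the data are synthetic, `setA` and `setB` are invented descriptor sets, and the two methods (a mean predictor and a 1-nearest-neighbour predictor) are simple stand-ins, not the descriptors or optimization methods used in STARDoM.

```python
from itertools import product


def make_compound(x):
    """Synthetic compound: two descriptor sets and an activity linear in x."""
    descriptors = {"setA": (x,), "setB": ((x * 7) % 5,)}  # setB is scrambled
    return descriptors, 2.0 * x + 1.0


train = [make_compound(x) for x in (0, 1, 2, 3, 4, 5, 6, 7)]
external = [make_compound(x) for x in (1.5, 3.5, 5.5)]


def fit_mean(X, y):
    """Baseline method: always predict the training mean."""
    mean = sum(y) / len(y)
    return lambda x: mean


def fit_1nn(X, y):
    """1-nearest-neighbour method: copy the activity of the closest compound."""
    def predict(x):
        dists = [sum((a - b) ** 2 for a, b in zip(x, xi)) for xi in X]
        return y[dists.index(min(dists))]
    return predict


def external_r2(model, X, y):
    """Coefficient of determination on the external (held-out) set."""
    mean = sum(y) / len(y)
    ss_res = sum((yi - model(xi)) ** 2 for xi, yi in zip(X, y))
    ss_tot = sum((yi - mean) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


methods = {"mean": fit_mean, "1nn": fit_1nn}
results = {}
for set_name, (method_name, fit) in product(("setA", "setB"), methods.items()):
    X_train = [d[set_name] for d, _ in train]
    y_train = [a for _, a in train]
    X_ext = [d[set_name] for d, _ in external]
    y_ext = [a for _, a in external]
    model = fit(X_train, y_train)
    results[(set_name, method_name)] = external_r2(model, X_ext, y_ext)

best = max(results, key=results.get)
print(best, round(results[best], 3))
```

Only the combinations that pass external validation would be carried forward to database mining; here that filter is reduced to picking the best-scoring pair.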


CINF 37:  
Development of a personal computing environment for molecular design on Grid

Umpei Nagashima1, Takeshi Nishikawa1, Satoshi Sekiguchi1, Sumie Tajima2, Toru Yagi2, Takeshi Kitayama2, and Makoto Haraguchi2. (1) Grid Technology Research Center, National Institute of Advanced Industrial Science and Technology, Umezono 1-1-1, Tsukuba, Japan, Fax: +81-29-861-5301, u.nagashima@aist.go.jp, (2) Bestsystems Inc

Abstract
We are developing a personal computing environment for molecular design on the Grid, as an attempt at computational chemistry in a Grid environment. In this talk, we introduce two products: MolWorks (http://www.molworks.com) and Gaussian Portal. MolWorks supports molecular modeling, input data generation, output analysis and job control for molecular orbital calculations on the Grid. Estimation of molecular properties is also supported. Gaussian Portal is an attempt to construct a framework for a Grid-enabled application service provider. These two products are expected to realize a desktop virtual laboratory for chemists and to achieve high throughput by integrating PC clusters, supercomputers and databases with an intelligent scheduler.


CINF 38:  
Heterojunctions of nanomaterials and organic-inorganic nanoassemblies

Cengiz S. Ozkan, Electrical and Chemical Engineering, Biomaterials and Nanotechnology Laboratory, Center for Nanoscience Innovation for Defense, University of California, Riverside, CA 92521, cengiz.ozkan@ucr.edu

Abstract
Nanomaterials including carbon nanotubes (CNTs) and nanocrystals have considerable potential as building blocks in future nanoelectronics and bio-nanotechnology applications. The unique electrical, mechanical, and chemical properties of CNTs have made them intensively studied materials in the field of nanotechnology within the last decade. Nanocrystals or quantum dots provide a remarkable opportunity for designing artificial solids, since they possess unique and controllable physical and chemical properties based on their composition, structure and size. Another heavily investigated area is the conjugation of inorganic nanomaterials with biomolecules, including DNA and proteins, for various applications in bio-nanotechnology. In this talk, I will first describe approaches for the synthesis of nano-assemblies of carbon nanotubes and quantum dots. Such functional nanostructures could become better alternatives for the fabrication of nanoscale electronic and photonic devices. They could also be useful for the bottom-up assembly of nanosystems as part of larger or microsystem technologies. Detailed chemical and physical characterization of the nanostructures will be presented via transmission electron microscopy and Fourier transform infrared spectroscopy. Next, I will present approaches for encapsulating biological molecules, including DNA, inside carbon nanotubes, which could be useful for a number of applications including novel electronics, DNA sequencing and drug delivery systems. DNA oligos labeled with nano-colloid particles are encapsulated into multiwalled carbon nanotubes, and the nanoassemblies are characterized via transmission electron microscopy and energy dispersive spectroscopy.


CINF 39:  
Effects of the presence of nanotubes on heat transfer in microfluidics

Nishitha Thummala, and Dimitrios V Papavassiliou, School of Chemical Engineering and Materials Science, The University of Oklahoma, 100 E Boyd, SEC T-335, Norman, OK 73019-1004, Fax: 405-325-5813, nishitha@ou.edu

Abstract
The drive for technical advancements in the micro/nano world, emerging from the desire to manipulate flow fields at smaller and smaller scales, is indeed challenging. An effective and reliable numerical tool for the analysis of transport properties in microfluidics is the Lattice Boltzmann Method (LBM), which can efficiently link microscopic and macroscopic phenomena. Our group is using LBM to simulate single-phase flow in configurations such as parallel plates and porous media. The paper will focus on simulation of heat transport from surfaces that have nanotubes aligned vertically as line sources or horizontally as point sources. Lagrangian Scalar Tracking (LST) methods are used to track the trajectories of heat particles released in the flow field, and to synthesize the behavior of the mean temperature profile from the behavior of the instantaneous sources of heat. The effect of the presence of nanotubes on the heat transfer characteristics will be discussed.


CINF 40:  
Computational nanotechnology: Bridging lengthscales with Materials Studio

Amitesh Maiti, Gerhard Goldbeck-Wood, and Scott Kahn, Accelrys Inc, 9685 Scranton Road, San Diego, CA 92121, Fax: 858-799-5100, amaiti@accelrys.com, scott@accelrys.com

Abstract
Nanotechnology holds tremendous economic and scientific potential, yet it will cost industry a considerable amount of time, money, and resources to research and develop new processes, devices, and synthesis techniques. The use of rational materials discovery software tools in conjunction with experimentation can lower this barrier significantly, and lead to new insights that may not be possible otherwise. Technologically important nanomaterials come in all shapes and sizes. They can range from small molecules to complex composites and mixtures. Depending upon the spatial dimensions of the system and properties under investigation, computer modeling of such materials can range from first-principles Quantum Mechanics, to Forcefield-based Molecular Mechanics, to mesoscale simulation methods, to the prediction of structure-property relationships. All of the above computational techniques are available in Accelrys’ integrated PC platform Materials Studio™, as illustrated through a number of recent applications: (1) carbon nanotubes (CNTs) as nano electromechanical sensors (NEMS); (2) metal-oxide nanoribbons as chemical sensors; (3) mesoscale modeling of polymer-CNT nanocomposites; and (4) mesoscale diffusion of drug molecules across cell membranes.

Another big challenge for the nanotechnologist is the very large space of possible material parameters and processing routes. Recent developments in Materials Informatics provide crucial knowledge management and data mining tools for better, cheaper and faster materials development. Design of Experiment, Combinatorial and High Throughput materials design software help to focus research and development on the most promising areas.


CINF 41:  
Chemical information resources for nanotechnology

Robert A Stembridge, Global Marketing Services, Thomson Scientific, 14 Great Queen Street, London, United Kingdom, bob.stembridge@thomson.com

Abstract
Nanotechnology is a young area dating back to Richard Feynman's intellectual demonstration in 1959 of the possibility of placing a facsimile of the entire Encyclopaedia Britannica on a pin-head. Much information is still in the realm of research papers published in learned journals and on the web, but increasingly practical applications of the technology are appearing in the patent literature, particularly in the area of chemical nanotechnology. This paper will illustrate these trends, examine the challenges for the user of tracking multiple sources of this information and discuss possible solutions to these problems.


CINF 42:  
A method for estimating the composite solubility vs. pH profile

Michael B. Bolger, Pharmaceutical Sciences, USC School of Pharmacy and Simulations Plus, Inc, 1985 Zonal Ave. PSC 700, Los Angeles, CA 90089, Fax: 323-442-1390, bolger@usc.edu, Christel Bergstrom, Department of Pharmacy, Uppsala University, Robert Fraczkiewicz, Life Sciences Department, Simulations Plus, Inc, and Per Artursson, Division of Pharmaceutics, Uppsala University

Abstract
Purpose: To predict the shape of the composite solubility vs. pH profile by using purely in silico estimation. Method: The complete solubility vs. pH profile for 25 monobasic drug molecules was collected and molecular descriptors were generated using QMPRPlus. We then examined relationships between intrinsic solubility and several other molecular descriptors to predict the solubility factor (ratio of solubility for ionized over unionized). Results: A simple linear relationship between intrinsic solubility and solubility factor showed that the solubility factor is inversely proportional to the experimental value of intrinsic solubility. We then developed a multiple linear regression equation to predict log of solubility factor using intrinsic solubility and number of hydrogen bond donors and acceptors as independent variables. Conclusions: A relationship between log of intrinsic solubility and solubility factor, when corrected for the number of hydrogen bond donors and acceptors can provide a good estimate of salt solubility for a small set of monoprotic basic drugs.
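
To make the role of the solubility factor concrete, the sketch below builds a composite solubility vs. pH curve for a monoprotic base. It assumes the textbook Henderson-Hasselbalch form, S(pH) = S0 (1 + 10^(pKa - pH)), capped at the salt solubility SF × S0; the abstract does not state the equation used, and the function name and all numerical values are illustrative, not data from the study.

```python
def composite_solubility(pH, s0, pKa, sol_factor):
    """Composite solubility of a monoprotic base at a given pH.

    s0         -- intrinsic solubility of the un-ionized form
    pKa        -- pKa of the conjugate acid
    sol_factor -- ratio of ionized (salt) to un-ionized solubility

    Assumes Henderson-Hasselbalch behaviour up to the salt-solubility cap.
    """
    henderson_hasselbalch = s0 * (1.0 + 10.0 ** (pKa - pH))
    salt_limit = sol_factor * s0
    return min(henderson_hasselbalch, salt_limit)


# Illustrative parameters only (not measured values).
S0, PKA, SF = 0.01, 8.0, 1000.0

profile = [(pH, composite_solubility(pH, S0, PKA, SF)) for pH in range(1, 13)]
for pH, s in profile:
    print(pH, f"{s:.4g}")
```

With these assumed values the curve is flat at the salt limit at low pH, falls by roughly a factor of ten per pH unit below the pKa, and levels off at the intrinsic solubility at high pH.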


CINF 43:  
A systematic name generator module for Marvin

Szilveszter Juhos, Gyorgy Pirok, and Ferenc Csizmadia, ChemAxon Ltd, Maramaros koz 3/a, 1037 Budapest, Hungary, Fax: +36 1 4532659, sjuhos@chemaxon.com

Abstract

Constructing systematic names for single molecules based on IUPAC rules can be rather time-consuming and requires chemists experienced in complex nomenclature. Naming a large number of structures manually is practically impossible so several automatic name generating software tools have been developed.

Our module is a platform-independent Java plugin linked to Marvin to facilitate generating IUPAC names for individual molecule sketches or for whole databases via batch processing. It can be easily integrated into other Java applications or applied over intranet/web pages. The throughput and accuracy of name generation will be demonstrated in the poster.


CINF 44:  
Chemical information in Medline/PubMed

Beryl M. Benjers, Index section, National Library of Medicine, Bethesda, MD 20894, Fax: 301-402-2433, benjersb@mail.nlm.nih.gov

Abstract
MEDLINE contains more than 12 million citations from 1966 to present. Pre-1966 citations are now being added in the OldMEDLINE. More than 4,500 journals in languages from around the world are indexed. Last year over 537,000 indexed citations were added to MEDLINE. Indexers analyze the article and index at an average rate of four articles/hour, applying 8-10 subject terms from MeSH, NLM’s controlled vocabulary. New indexers attend a rigorous two-week training course at NLM and then work closely with a reviser, who reviews their work. An asterisk with a MeSH subject term indicates the main point of an article, and that the article will be cited under that term in Index Medicus, the print counterpart of MEDLINE. MEDLINE citations and abstracts are available as the primary component of NLM’s PubMed database and retrieval system, which is searchable free-of-charge via the Internet.

MeSH contains 22,568 descriptors, of which 7,355 are chemical descriptors, supplemented by 138,526 chemical concepts (Supplementary Concept Records). New MeSH descriptors are added annually while Supplementary Concept Records are added daily as they are encountered in the indexed literature. New chemicals are electronically flagged for the chemical specialists, who study, research, update, and/or create new records as needed, and add them to the indexed citation and MeSH Browser. This allows MEDLINE citations to be indexed with the existing terms as well as the new ones.

MEDLINE indexing of chemical concepts includes coordination with a Pharmacological Action (PA) when appropriate. Indexing Information (II) terms may also be added with chemicals (e.g. disease/organism associated with a chemical).

The MeSH Browser is available at http://www.nlm.nih.gov/mesh/2004/MBrowser.html and can be searched by MeSH terms, Supplementary Concepts, ID, II, PA, RN, RR and EC numbers. MEDLINE/PubMed can be searched by MeSH terms, Supplementary concepts, authors, text words, journal, etc.

The National Library of Medicine (NLM) Home pages (http://www.nlm.nih.gov) offer information and links to other databases, such as MEDLINEplus and CHEMIDPlus.


CINF 45:  
Conformational folding process of a small-peptide predicted by using CONFLEX conformation search and GRID technology

Hitoshi Goto1, Kazuo Ohta2, Umpei Nagashima3, Yoshihiro Nakajima4, Mitsuhisa Sato4, and Hiroshi Chuman5. (1) Department of Knowledge-based Information and Engineering, Toyohashi University of Technology, Toyohashi 441-8055, Japan, Fax: 81-532-48-5588, gotoh@cochem2.tutkie.tut.ac.jp, (2) Conflex Corporation, (3) Grid Technology Research Center, National Institute of Advanced Industrial Science and Technology, (4) Graduate School of Systems & Information Engineering, University of Tsukuba, (5) Faculty of Pharmaceutical Sciences, University of Tokushima

Abstract
Among the fundamental problems in elucidating biomolecular functions with the aid of theoretical and computational chemistry, the first difficulty to overcome is the conformational flexibility problem, especially as it relates to the folding problem of proteins. To resolve these challenging problems, we have started to improve our original conformational space search method, “CONFLEX”, using parallel computing and Grid techniques. At the previous ACS meeting, we reported the master-and-worker parallelization and GRID world-wide distributed computing techniques used in the CONFLEX conformation search algorithm, along with performance data for some small peptides. At this Anaheim meeting, we will present a folding process of a small polypeptide, predicted by conformational analyses using a clustering technique based on the conformational distance matrix among backbone conformations. Some interesting animations and movies will also be shown.


CINF 46:  
Combining fingerprints and other descriptors in virtual HTS

Zsuzsanna Szabo, Miklos Vargyas, Ferenc Csizmadia, and Gyorgy Pirok, ChemAxon Ltd, Maramaros koz 3/a, 1037 Budapest, Hungary, Fax: +36-1-453-2659, fcsiz@chemaxon.com

Abstract

Various aspects of virtual screening using molecular descriptors of 2-dimensional chemical structures have been investigated over the last two years at ChemAxon. The work involved the implementation of various descriptors and metrics as well as the optimization of some of the parameters. The poster to be presented summarizes our results to date.

When setting up a virtual screening experiment, researchers are faced with the problem of choosing the right combination of the available descriptors. Additionally, some descriptors may allow several parameters which overall increases the degree of freedom dramatically. Finally, when comparing descriptor values one can choose from numerous dissimilarity metrics. To cope with this freedom of choice an automated optimization tool has been implemented.

This tool has proved to be successful in helping chemists to choose suitable descriptors, metrics and parameter values for virtual screening. It will be demonstrated that optimization can increase the enrichment ratio of the screening procedure.
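
As a minimal sketch of the quantities involved, the code below ranks a toy library by Tanimoto dissimilarity to a query fingerprint and computes the enrichment of actives in the top of the list. The fingerprints, activity labels and function names are invented for illustration; Tanimoto is only one of many possible metrics and is not claimed to be the one selected by the optimization tool.

```python
def tanimoto_dissimilarity(fp_a, fp_b):
    """1 - |A & B| / |A | B| for fingerprints stored as sets of on-bits."""
    union = fp_a | fp_b
    if not union:
        return 0.0  # two empty fingerprints are treated as identical
    return 1.0 - len(fp_a & fp_b) / len(union)


def enrichment(ranked_ids, actives, top_n):
    """Hit rate in the top_n ranked compounds divided by the overall hit rate."""
    hits = sum(1 for cid in ranked_ids[:top_n] if cid in actives)
    return (hits * len(ranked_ids)) / (top_n * len(actives))


# Toy library: compound id -> set of on-bit positions (illustrative only).
library = {
    "a1": {1, 2, 3, 4},
    "a2": {1, 2, 3, 5},
    "d1": {6, 7, 8},
    "d2": {1, 7, 9},
    "d3": {8, 9, 10},
    "d4": {2, 6, 10},
}
actives = {"a1", "a2"}
query = {1, 2, 3}

# Most similar (least dissimilar) compounds first; ids break ties.
ranked = sorted(library, key=lambda cid: (tanimoto_dissimilarity(query, library[cid]), cid))
print(ranked)
print(enrichment(ranked, actives, top_n=2))
```

An optimizer of the kind described would vary the descriptor, its parameters and the metric, and keep the combination that maximizes this enrichment on a set with known actives.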


CINF 47:  
Drug discovery using grid technologies and DrugML

Michiaki Hamada, Science and Technology Group, Fuji Research Institute Corporation, Tokyo 101-8443, Japan, mhamada@star.fuji-ric.co.jp, Yuichiro Inagaki, Science and Technology Group, Fuji Research Institute Corporation, Tokyo 101-8443, Japan, yinagaki@star.fuji-ric.co.jp, Hitoshi Goto, Toyohashi University of Technology, Umpei Nagashima, National Institute of Advanced Industrial Science and Technology, Shigenori Tanaka, Toshiba Research and Development Center, and Hiroshi Chuman, Tokushima University

Abstract
A number of computer resources, such as CPUs and storage, can be connected over networks to construct a huge virtual computing environment using grid technologies. Our project "g-Drug Discovery" aims at developing a platform for drug design using grid technologies, on which various analyses and calculations are conducted, such as molecular mechanics, replica exchange, docking with proteins, molecular orbital methods, and 3-dimensional quantitative structure-activity relationships. For storing structures of compounds, descriptors, and calculation results, we are creating DrugML by extending CML. These grid technologies can be used with DrugML for tasks ranging from rough screening by drug likeness or ADMET properties to screening by very precise calculation.


CINF 48:  
Investigation of molecular chirality in 3D chemical structure databases

Zengjian Hu1, William M. Southerland1, and Shaomeng Wang2. (1) Department of Biochemistry and Molecular Biology, Howard University College of Medicine and the Howard University Drug Discovery Unit, 520 West Street, Northwest, Room 324, Washington, DC 20059, huzengjian@hotmail.com, wsoutherland@howard.edu, (2) Departments of Internal Medicine and Medicinal Chemistry, University of Michigan

Abstract
In recent years, virtual screening of chemical databases using molecular docking has emerged as one of the most important tools and a well-established method in drug discovery for finding new leads. The first step in virtual screening is to create a searchable database of three-dimensional structures of small molecules. In the past few years, we have created 9 small-molecule 3D searchable databases, which contain more than 1,000,000 molecular entries and could be used to discover interesting ligands for various pharmaceutical targets. When producing 3D chemical databases for screening purposes, we found that the 2D chemical database connection tables contain no information about the absolute stereochemistry (R-S) or double bond geometry (E-Z) of most compounds. Today more than 50% of marketed drugs are chiral. Chiral drugs, which are safer, exhibit fewer side effects, and are more potent than the drugs previously used, have become a major focus of most pharmaceutical companies. As chiral molecules will certainly play a role in the exploitation of 3D space for the development of new drugs, creating a 3D database that takes the chirality of molecules into account will be beneficial for the discovery of lead compounds binding to molecular targets. As the first step, we analyzed the chirality of molecules in our 10 three-dimensional databases. About 29% of the compounds in these databases were chiral, ranging from about 62% of the compounds in the CGE database to only about 14% in the MCC database. Most chiral molecules in these 3D databases have only one chiral center, but compounds with more than 10 chiral centers are not rare, and the maximum number of chiral centers in a molecule can exceed 60. It is well known that, in general, a molecule with n chiral centers has 2^n possible stereoisomers. 
Therefore, the entries in a 3D database that considers chirality will be doubled for molecules with one chiral center if there are no symmetry elements in the molecule. The creation of th
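
As a small illustration of the stereoisomer count discussed in this abstract, the following Python sketch enumerates R/S assignments for a molecule with n chiral centers. The function names are hypothetical and are not part of the authors' database pipeline. The count is an upper bound: symmetry in the molecule (for example, meso forms) reduces the number of distinct stereoisomers.

```python
from itertools import product


def max_stereoisomers(n_centers: int) -> int:
    """Upper bound on stereoisomers for n chiral centers (no symmetry)."""
    if n_centers < 0:
        raise ValueError("n_centers must be non-negative")
    return 2 ** n_centers


def enumerate_labels(n_centers: int) -> list[str]:
    """List every R/S assignment, e.g. 'RS' for two centers."""
    return ["".join(labels) for labels in product("RS", repeat=n_centers)]


if __name__ == "__main__":
    for n in (1, 2, 3, 10):
        print(n, max_stereoisomers(n))
    print(enumerate_labels(2))  # ['RR', 'RS', 'SR', 'SS']
```

With one chiral center the entry count doubles, as the abstract states; with 10 centers a single compound could expand to as many as 1,024 entries.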


CINF 49:  
Molecular modelling for organic chemists: A chemical informatics problem

Jonathan M Goodman, Unilever Centre for Molecular Science Informatics, Cambridge University, Department of Chemistry, Lensfield Road, Cambridge CB2 1EW, United Kingdom, Fax: +44 1223 336362, J.M.Goodman@ch.cam.ac.uk, and María A. Silva, Unilever Centre for Molecular Science Informatics, University of Cambridge

Abstract
Both molecular modelling and organic chemistry generate and use large amounts of information, which should be mutually beneficial. However, it can be difficult to persuade experimental organic chemists to use molecular modelling, as force field methods cannot be applied to many transition states and molecular orbital methods are too slow to calculate the behaviour of many reactions before the experimental result makes the calculation of less immediate interest. We use a combination of molecular mechanics and molecular orbital methods in a ‘Chemical Information Laboratory’ (http://www.ch.cam.ac.uk/SGTL/gle/) in order to gain information of experimental relevance quickly enough to be useful. For example, chemical information has been generated about the molecules illustrated using this process, so improving our knowledge of structure and reactivity.


CINF 50:  
Chemical education markup language: An XML namespace for educational chemistry software

Daniel C. Tofan, Department of Chemistry, State University of New York, Stony Brook, NY 11794-3400, Fax: 631-632-7960, dtofan@mail.chem.sunysb.edu

Abstract
The Chemical Education Markup Language (ChEdML) is being developed as an XML namespace to allow learning management systems to include chemical content. ChEdML was initially intended to provide extensions to the current IMS specifications for question and test item interoperability (QTI) XML binding. Such extensions allow authors to create items containing responses that use chemical symbolism. Examples include chemical reactions, electron configurations, Lewis structures, measures with units etc. Tags were also developed to format chemical information for display on web pages. A complete XML tag set is now under development to encompass a full curriculum of introductory chemistry. ChEdML also provides a mechanism to parameterize items and to include equations to calculate numeric responses. This allows the generation of item templates that can be instantiated at runtime with appropriate parameters. A Java API is being developed to support the generation and use of ChEdML.


CINF 51:  
Oligopeptide transporter (PepT1) homology model based on lactose permease (LacY)

Michael B. Bolger, Pharmaceutical Sciences, USC School of Pharmacy, 1985 Zonal Ave. PSC 700, Los Angeles, CA 90089, Fax: 323-442-1390, bolger@usc.edu

Abstract
Purpose. To build a homology model of the oligopeptide / proton co-transporter PepT1 based on the crystal structure of the bacterial lactose / proton co-transporter. Methods. The centers of transmembrane spanning domains (TMDs) in LacY plus the 22 amino acids that comprise each of the twelve TMDs were selected. The software package “Proteotoolbox™” was used to guide the threading of the sequence of PepT1 onto the 3D-structure of LacY to allow for maximal overlap of the 2D and 3D hydrophobic moments. Finally, the experimental results for site-directed mutagenesis were examined in light of this new homology model to identify the structural basis for those results. Results. Site-directed mutation results and cysteine-scanning for TMD 5 and 7 were explained on the basis of the PepT1 model. The new model helps to explain the involvement of key histidine residues in the proton translocation process. Conclusions. The new 3D model extends and enhances our previous results (J. Pharm. Sci. 87(11):1286 1998) and provides additional insight into the structure and function of the oligopeptide transporter.


CINF 52:  
Multi-conformational 3D databases: Quality assessment and pharmacophore search capabilities in MOE

Morten Langgaard, Berith Bjornholm, Anne Marie Munk Jorgensen, and Klaus Gundertofte, Department of Computational Chemistry, H. Lundbeck A/S, Ottiliavej 9, Dk 2500 Valby, Denmark, Fax: +45 3643 8237, mol@lundbeck.com

Abstract
In this study we report our experiences with the software solution MOE with respect to building multi-conformational databases and performing pharmacophore searches. Template pharmacophores derived from crystal structures of known protein-ligand complexes as well as classically derived pharmacophore models are used for the evaluation. Conformational coverage and the quality of each conformation of the developed multi-conformational 3D databases are evaluated thoroughly. The analysis of the search results focusses on hit rate, quality of hits, and the impact of pharmacophoric element selections for the query. Practical issues like speed, storage and management of databases are also addressed. The performance of MOE with respect to the above-mentioned issues will be discussed and compared to the more established method Catalyst.


CINF 53:  
A combinatorial DFT study of how cisplatin binds to purine bases

Leah Sandvoss, and Mu-Hyun Baik, Department of Chemistry, Indiana University, 1200 Rolling Ridge Way #1311, Bloomington, IN 47403, lsandvos@indiana.edu

Abstract
Cisplatin (cis-diamminedichloroplatinum(II)) continues to attract much attention because of its therapeutic importance as an anticancer drug. It binds primarily to the N7 positions of adjacent guanine (G) sites in genomic DNA, causing intrastrand cross-links, which suppress replication and lead ultimately to cell death. Previous work showed both kinetic and thermodynamic preference of G over adenine for the platination reaction. The goal of this study is to obtain a chemically intuitive explanation for this selective behavior of cisplatin by systematically comparing the electronic structures of a diverse set of functionalized purine bases. A computational combinatorial library of over 1500 purine derivatives was designed based on density functional theory calculations and the changes of the most important molecular orbitals as a function of structural variance were examined in detail. This electronic profile for purine bases reveals how electronic hot spots control the reactivity at the N7 position (see figure).


CINF 54:  
Study of selectivity from a pharmacophore perspective

Klaus Gundertofte, Berith Bjørnholm, and Morten Langgård, Department of Computational Chemistry, H. Lundbeck A/S, Ottiliavej 9, Dk 2500 Valby, Denmark, kgu@lundbeck.com

Abstract
A number of pharmacophore models covering G protein-coupled receptors and transporters primarily from the monoaminergic families of targets have been developed. The general methodology will be described as well as performance of different methods, e.g. MOE and Catalyst, applied in the development. In order to elucidate selectivity issues across the targets studied, a comparison of the models characterised by their pharmacophoric elements was done. The analysis of the pharmacophore patterns revealed remarkable resemblances or superpharmacophores. Distinct differences between the models were also found. The impact of these findings in medicinal chemistry projects will be discussed.


CINF 55:  
Successful shape-based virtual screening: The discovery of a potent inhibitor of the type I TGFb receptor kinase (TbRI)

Juswinder Singh, and Claudio Chuaqui, Structural Informatics, Biogen, 12 Cambridge St., Cambridge, MA 02142, Fax: 6176792616, Juswinder_Singh@Biogen.com

Abstract
We describe the discovery, using shape-based virtual screening, of a potent, ATP site-directed inhibitor of the TbRI kinase, an important and novel drug target for fibrosis and cancer. The first detailed report of a TbRI kinase small molecule co-complex confirms the predicted binding interactions of our small molecule inhibitor, which stabilizes the inactive kinase conformation. Our results validate shape-based screening as a powerful tool to discover useful leads against a new drug target.


CINF 56:  
HypoRefine: Automated identification of exclusion volumes in pharmacophore models

Allister J. Maynard, Marvin Waldman, and Jon Sutter, Accelrys, 9685 Scranton Rd., San Diego, CA 92121, Fax: 858 799 5100

Abstract
This presentation provides an overview of the HypoGen pharmacophore generation algorithm. HypoGen is a ligand-based QSAR tool using pharmacophoric overlap to predict activity.

A limitation of HypoGen is that activity prediction is based purely on the presence and arrangement of pharmacophoric features – steric effects are unaccounted for. A novel modification to HypoGen is described (HypoRefine). HypoRefine accounts for steric effects on activity, based on the targeted addition of excluded volume features to the pharmacophores. These excluded volumes attempt to penalize molecules occupying steric regions not occupied by active molecules.

Details of the steric detection and excluded volume addition algorithm are presented, along with some examples illustrating how excluded volumes improve the QSAR pharmacophore models.
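
The abstract does not give the HypoRefine algorithm, so the following is only a geometric sketch of what an excluded volume feature does. It assumes each excluded volume is a sphere (center and radius) and that a molecule is penalized when any of its atoms falls inside one. All names and coordinates are hypothetical.

```python
import math
from typing import NamedTuple

Point = tuple[float, float, float]


class ExcludedVolume(NamedTuple):
    center: Point
    radius: float


def clashing_atoms(atoms: list[Point], volumes: list[ExcludedVolume]) -> int:
    """Count atoms that lie inside at least one excluded-volume sphere."""
    return sum(
        any(math.dist(atom, v.center) < v.radius for v in volumes)
        for atom in atoms
    )


if __name__ == "__main__":
    volumes = [ExcludedVolume(center=(0.0, 0.0, 0.0), radius=1.5)]
    molecule = [(0.5, 0.0, 0.0), (3.0, 0.0, 0.0)]  # invented coordinates
    print(clashing_atoms(molecule, volumes))  # 1
```

A scoring scheme could then subtract a penalty proportional to the clash count; how HypoRefine actually weights such clashes is not stated in the abstract.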


CINF 57:  
Automatic generation of multiple pharmacophore hypotheses

Simon Cottrell1, Valerie J. Gillet1, and Robin Taylor2. (1) University of Sheffield, Western Bank, Sheffield S10 2TN, United Kingdom, s.cottrell@sheffield.ac.uk, v.gillet@sheffield.ac.uk, (2) Cambridge Crystallographic Data Centre

Abstract
Pharmacophore methods provide a way of establishing a structure-activity relationship for a series of known active ligands. Often, there are several plausible hypotheses that could explain the same set of ligands and in such cases, it is important that the chemist is presented with alternatives that can be tested with different synthetic compounds. Existing pharmacophore methods involve either generating an ensemble of conformers and considering each conformer of each ligand in turn or exploring conformational space on-the-fly. The ensemble methods tend to produce a large number of hypotheses and require considerable effort to analyse the results, whereas methods that vary conformation on-the-fly typically generate a single solution that represents one possible hypothesis even though several might exist. We will describe a new method for generating multiple pharmacophore hypotheses with full conformational flexibility being explored on-the-fly. The method is based on multiobjective evolutionary algorithm techniques and generates a manageable number of different yet plausible hypotheses.
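
Multiobjective evolutionary algorithms commonly keep the set of solutions that no other solution beats on every objective (the Pareto front), which is one way to obtain several different yet plausible hypotheses. The sketch below shows only that selection step, assuming every objective is to be maximized. The hypothesis names and scores are invented, and the abstract does not say which objectives the authors use.

```python
def dominates(a: tuple[float, ...], b: tuple[float, ...]) -> bool:
    """True if a is at least as good as b on every objective and better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores: dict[str, tuple[float, ...]]) -> list[str]:
    """Names of hypotheses that no other hypothesis dominates."""
    return [
        name
        for name, s in scores.items()
        if not any(
            dominates(other, s)
            for other_name, other in scores.items()
            if other_name != name
        )
    ]


if __name__ == "__main__":
    # Two hypothetical objectives per hypothesis, both maximized.
    scores = {"H1": (0.9, 0.2), "H2": (0.5, 0.8), "H3": (0.4, 0.1)}
    print(pareto_front(scores))  # ['H1', 'H2']
```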


CINF 58:  
PepT1 substrate transport pharmacophore determinants: Refinement with data from a single consistent functional assay

Terry R Stouch1, Teresa Faria2, and Julita Timoszyk2. (1) Computer-Assisted Drug Design, Bristol-Myers Squibb Pharmaceutical Research Institute, MS H23-07, PO Box 4000, Princeton, NJ 08543-4000, Fax: 609-252-6030, terry.stouch@bms.com, (2) Exploratory Biopharmaceutics and Stability, Bristol-Myers Squibb, Pharmaceutical Research Institute

Abstract
PepT1 is a primary intestinal transporter of di- and tripeptides. It also transports large quantities of important pharmaceuticals, such as beta-lactams and ACE inhibitors. The ability to function as a substrate for this channel can appreciably increase the absorption of drugs whose passive permeation rates might be low or nil. Data was collected on a series of ligands using a recently developed single fluorescent functional assay. The ligands were specifically chosen to elucidate the important determinants of transport. A wide range of different rates of transport was evidenced, even for dipeptides. Coupled with conformational analysis and molecular overlays, a fairly simple pharmacophore of five elements was developed that can be used to retrieve known substrates.


CINF 59:  
Structure and information theory derived pharmacophores as pre- and post-filters for docking

Kenneth E. Lind, Erik Evensen, Hans Purkey, Robert McDowell, and Erin K. Bradley, Computational Sciences, Sunesis Pharmaceuticals Inc, 341 Oyster Point Blvd., South San Francisco, CA 94080, klind@sunesis.com

Abstract
Screening virtual compound collections has been a valuable method for finding starting points in the drug discovery process. This is often done through structure-based docking or ligand-based pharmacophore searching. These methods are more effective than random searching, but both have inherent limitations. It would be useful to have methods that make optimal use of both techniques to improve the selection of active molecules. In this study we compare standard docking and pharmacophore search techniques to methods that use different permutations to combine both methods, such as docking as a pre-filter for a pharmacophore search, or vice versa. The methods are evaluated against CDK-2 for their ability to select known inhibitors and their overall enrichment rates.
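
The abstract does not define its enrichment measure. A common choice is the enrichment factor: the hit rate in the selected subset divided by the hit rate in the whole collection. The sketch below assumes that definition and also shows one method used as a pre-filter for another. The compound identifiers and hit sets are invented for illustration.

```python
def enrichment_factor(selected: set[str], actives: set[str], library_size: int) -> float:
    """Hit rate among selected compounds divided by hit rate in the library."""
    if not selected or not actives or library_size <= 0:
        raise ValueError("need a non-empty selection, actives, and a positive library size")
    hits = len(selected & actives)
    # (hits / len(selected)) / (len(actives) / library_size), rearranged
    return (hits * library_size) / (len(selected) * len(actives))


def combine(pre_filter_hits: set[str], second_method_hits: set[str]) -> set[str]:
    """Keep only compounds that pass the pre-filter and the second method."""
    return pre_filter_hits & second_method_hits


if __name__ == "__main__":
    actives = {f"a{i}" for i in range(10)}  # 10 known inhibitors
    docking = {f"a{i}" for i in range(5)} | {f"d{i}" for i in range(95)}
    pharmacophore = {f"a{i}" for i in range(4)} | {f"d{i}" for i in range(6)}
    print(enrichment_factor(docking, actives, 1000))
    print(enrichment_factor(combine(docking, pharmacophore), actives, 1000))
```

In this toy data the combined selection is smaller but richer in actives, which is the kind of trade-off such a comparison is meant to quantify.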


CINF 60:  
A new method for pharmacophore identification

S. Stanley Young, Jun Feng, and Ashish Sanil, National Institute of Statistical Sciences, 19 T.W. Alexander Dr, Research Triangle Park, NC 27709, young@niss.org, feng@niss.org

Abstract
The binding of a small molecule to a protein is inherently a 3D matching problem. As crystal structures are not available for most drug targets, there is a need to be able to infer key binding features and their disposition in space, the pharmacophore, from bioassay data. We use fingerprints of 3D features and a new approach to uncover the common pharmacophore for a set of compounds. We describe the algorithm and basic benchmarking. Knowing the 3D pharmacophore for a target should allow better database searching and more efficient compound design.


CINF 61:  
A 3DPL case study: Finding new active molecules for the inhibition of calcineurin

Tad Hurst, Scientific Software, ChemNavigator, 6126 Nancy Ridge Drive, Suite 117, San Diego, CA 92121, Fax: 858-625-2377, thurst@chemnavigator.com

Abstract
The 3DPL Database Docking system has been demonstrated to be effective at extracting known active molecules from sets of inactive compounds in many test cases. The 3DPL technology can dock structures into a receptor structure at a rate of up to 30 per second, thus allowing in silico investigation of millions of database structures. In this paper, we detail the application of 3DPL to select from over 11 million chemical structures in the ChemNavigator iResearch Library to find 25 screening candidates. Samples of these 25 compounds were acquired and tested for calcineurin inhibition. Four of the compounds were found to be micromolar inhibitors. Three of these compounds share a common core structure, and represent a new area for possible lead development.
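
A quick back-of-the-envelope check of the stated throughput, assuming a single docking process running continuously at the peak rate of 30 structures per second (the abstract does not say how many processors were used, and "over 11 million" makes this a lower bound):

```python
def screening_time_days(n_structures: int, rate_per_second: float) -> float:
    """Wall-clock days needed to dock n_structures at a constant rate."""
    if rate_per_second <= 0:
        raise ValueError("rate_per_second must be positive")
    seconds = n_structures / rate_per_second
    return seconds / 86_400  # 86,400 seconds per day


if __name__ == "__main__":
    days = screening_time_days(11_000_000, 30)
    print(f"{days:.1f} days")  # about 4.2 days
```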


CINF 62:  
Facilitating virtual screening workflows: The PyFlexX/E/S/-Pharm and PyFTrees modules

Sally Ann Hindle1, Frank Sonnenburg1, Marcus Gastreich2, and Christian Lemmen1. (1) Chemoinformatics, BioSolveIT GmbH, An der Ziegelei 75, 53757 St. Augustin, Germany, Sally.Hindle@biosolveit.de, (2) BioSolveIt GmbH

Abstract
Virtual screening usually requires several programs. This entails file format conversions, conceptually superfluous I/O, manual selection of data, consideration of interim results and so on.

Python - a widespread, cross-platform, open-source and easy-to-read scripting language - allows native C applications to be wrapped in a Python layer, thus generating a modular world of applications which may easily be "plugged" together within a single Python script.

We have recently taken this step with our cheminformatics tools: FlexX/-E/C/-Pharm (docking), FlexS (small molecule alignment), and Feature Trees (similarity comparisons) may now be used within this scripting environment, sharing information instead of transferring it. An instant benefit is the availability of open-source Python packages for analysis and visualisation.

This concept greatly facilitates virtual screening experiments; moreover, it allows for rapid prototyping of virtual screening protocols and parameter studies, which will be demonstrated in an application example.


CINF 63:  
Fast Lead Identification Protocol (FLIP) for structure based data mining using 3D fingerprints

Amit S Kulkarni, Scientific Services, Accelrys Inc, 9685 Scranton Road, San Diego, CA 92121

Abstract
Structure based drug design is the method used to identify and optimize pharmaceutical leads when the crystal structure, NMR structure or homology model of a specific target protein is known. Virtual screening of corporate libraries, external compound collections and virtual compounds using various docking methods is routine in the drug discovery process. We propose a new virtual high throughput screening approach that we term “FLIP” (Fast Lead Identification Protocol), which uses the potential protein-ligand interaction sites in the active site of the target protein to data-mine compound collections. This proposed approach has the advantage of being extremely fast and can potentially be used for any target protein structure.


CINF 64:  
Conformation mining: Shrinking chemical space to find biologically-active molecules

Santosh Putta, Gregory A. Landrum, and Julie E. Penzotti, Rational Discovery LLC, 555 Bryant St. #467, Palo Alto, CA 94301, sputta@rationaldiscovery.com

Abstract
Discovering the essential three-dimensional steric and chemical features shared by active compounds is an important step in designing drug candidates. However, the flexibility of actives often allows them to adopt several low-energy conformations, some of which are not important for biological activity. Conformational flexibility complicates the task of finding important features by forcing a search through a conformational space with dimensions that increase exponentially with the number of actives. Model building approaches typically address this problem either by using a small subset of conformations (e.g. most extended or lowest energy) or by encoding all of a compound’s conformations in a single fingerprint. The first approach may miss biologically-important conformations while the second risks masking critical information available only from individual conformations.

Here we explore techniques for efficiently mining the conformational space of multiple compounds. Our goal is to find a subset of biologically-important conformations and understand and exploit their commonalities.


CINF 65:  
Hit-directed nearest neighbor searching

Veerabahu Shanmugasundaram, Computer-Assisted Drug Discovery, Pfizer Global Research & Development, 2800 Plymouth Road, Ann Arbor, MI 48105, Fax: 734-622-2782, Veerabahu.Shanmugasundaram@pfizer.com, and Gerald M Maggiora, Department of Pharmacology and Toxicology, University of Arizona

Abstract
Follow-up of initial hits resulting from HTS is crucial if the hits are ultimately to give rise to useful lead compounds. Several approaches may be employed to select compounds from the Research Compound Collection or from commercially available collections for follow-up screening. Similarity searching based on the molecular fragments that the molecules possess yields compounds that are similar in structure to the hits. Nearest-neighbor searching of BCUT Chemistry Space identifies compounds that have similar BCUT values and hence similar electrostatic, hydrophobic and hydrogen bonding properties. In contrast to molecular fingerprint based similarity searching, which looks for similar scaffolds in molecules, nearest-neighbor searching identifies isobiological molecular structures with significantly different molecular scaffolds. Several examples illustrating the application and the success of this methodology will be presented.
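
A minimal sketch of nearest-neighbor searching around a hit in a low-dimensional descriptor space. It assumes each compound is already represented by a short vector of BCUT-like values and uses plain Euclidean distance. The compound names and coordinates are invented, and computing real BCUT descriptors is outside the scope of this sketch.

```python
import heapq
import math


def nearest_neighbors(
    hit: tuple[float, ...],
    collection: dict[str, tuple[float, ...]],
    k: int = 3,
) -> list[tuple[str, float]]:
    """Return the k compounds closest to the hit as (name, distance) pairs."""
    distances = (
        (name, math.dist(hit, coords)) for name, coords in collection.items()
    )
    return heapq.nsmallest(k, distances, key=lambda pair: pair[1])


if __name__ == "__main__":
    collection = {  # hypothetical descriptor vectors
        "cmpd_A": (0.0, 0.0, 0.0),
        "cmpd_B": (1.0, 0.0, 0.0),
        "cmpd_C": (5.0, 5.0, 5.0),
    }
    for name, dist in nearest_neighbors((0.9, 0.0, 0.0), collection, k=2):
        print(name, round(dist, 2))
```

Because the distance depends only on descriptor values, two neighbors can share similar properties while having different scaffolds, which is the contrast with fragment-based similarity drawn in the abstract.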


CINF 66:  
AGENT: A program generating tautomers for computer-aided drug design

Patrick Ballmer, Pavel Pospisil, Gerd Folkers, and Leonardo Scapozza, Department of Chemistry and Applied Biosciences, Swiss Federal Institute of Technology (ETH), Winterthurerstr. 190, 8057 Zurich, Switzerland, Fax: 01141-1-6356884, patrick.ballmer@ethz.ch

Abstract
Several cases documenting the impact of ligand tautomerism on protein-ligand binding are described in the literature. AGENT has been developed to provide a tool to study this phenomenon. AGENT can be used to create chemically (energetically) reasonable tautomers of molecules stored in a 3D-input file. The created tautomeric forms can be directly used for molecular docking studies. The purpose of AGENT is thus to enrich a given small-molecule database with tautomeric forms that could plausibly exist in a protein active site. The number of tautomers created by AGENT is restricted either by chemical rules or by a user-defined energy threshold limiting the tolerated, semiempirically calculated Gibbs free energy of tautomer formation.
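
The energy-threshold restriction can be pictured as a simple filter. The sketch below assumes each candidate tautomer already carries a calculated free energy of formation relative to the input structure (in kcal/mol). The data structure, names, and numbers are hypothetical and do not reflect AGENT's actual interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tautomer:
    name: str
    delta_g: float  # free energy of formation relative to the input form, kcal/mol


def within_threshold(candidates: list[Tautomer], max_delta_g: float) -> list[Tautomer]:
    """Keep tautomers at or below the energy threshold, lowest energy first."""
    kept = [t for t in candidates if t.delta_g <= max_delta_g]
    return sorted(kept, key=lambda t: t.delta_g)


if __name__ == "__main__":
    candidates = [  # invented values for illustration
        Tautomer("form_1", 0.0),
        Tautomer("form_2", 4.2),
        Tautomer("form_3", 15.0),
    ]
    for t in within_threshold(candidates, max_delta_g=5.0):
        print(t.name, t.delta_g)
```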