United States National Library of Medicine National Institutes of Health

Sample NLM® Data

INSTRUCTIONS FOR FTP OF SAMPLE RECORDS

FTP to NLM's anonymous FTP server: ftp://ftp.nlm.nih.gov/nlmdata/sample/
(log in as a non-fee/anonymous user; use your e-mail address as the password)
You will see a directory for each NLM database. Go to the directory you want and get the desired files.
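
The steps above can be sketched with Python's standard ftplib module. This is a minimal sketch, not an NLM-provided client: the host and path come from the URL above, while the function names, the local file path, and the placeholder e-mail address are assumptions made for illustration.

```python
from ftplib import FTP

HOST = "ftp.nlm.nih.gov"         # taken from the URL above
SAMPLE_ROOT = "/nlmdata/sample"  # taken from the URL above


def list_sample_directories(ftp, email):
    """Log in anonymously and return the entries under the sample root."""
    # Anonymous FTP convention: user "anonymous", e-mail address as password.
    ftp.login(user="anonymous", passwd=email)
    ftp.cwd(SAMPLE_ROOT)
    return ftp.nlst()


def fetch_file(ftp, directory, filename, local_path):
    """Download one file in binary mode from a database directory."""
    ftp.cwd(f"{SAMPLE_ROOT}/{directory}")
    with open(local_path, "wb") as out:
        ftp.retrbinary(f"RETR {filename}", out.write)


# Example use (placeholder address; substitute your own):
#
#     with FTP(HOST) as ftp:
#         for name in list_sample_directories(ftp, "user@example.org"):
#             print(name)
```

Passing the connection object into each function keeps the login, directory change, and download steps separate, so the same connection can be reused for several files.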

  1. MEDLINE®/PubMed® (includes approximately 98% of all records in PubMed)

    NLM distributes MEDLINE/PubMed data in XML format.

    2009 Production Year Data

    The NLMMEDLINE DTD used for the 2009 production year data is available at http://www.nlm.nih.gov/databases/dtd/nlmmedline_090101.dtd. This DTD references the NLMMedlineCitation DTD at http://www.nlm.nih.gov/databases/dtd/nlmmedlinecitation_090101.dtd, which in turn references the new NLMSharedCatCit DTD at http://www.nlm.nih.gov/databases/dtd/nlmsharedcatcit_090101.dtd, which in turn references the NLMCommon DTD at http://www.nlm.nih.gov/databases/dtd/nlmcommon_090101.dtd.
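
    The reference chain described above can be written down as a small lookup table, which may help when collecting every DTD needed for a local copy. This is an illustrative sketch only: the file names come from the URLs above, while the names BASE, REFERENCES, and dtd_chain are assumptions.

```python
BASE = "http://www.nlm.nih.gov/databases/dtd/"

# Each 2009 DTD mapped to the DTD it references (None ends the chain).
REFERENCES = {
    "nlmmedline_090101.dtd": "nlmmedlinecitation_090101.dtd",
    "nlmmedlinecitation_090101.dtd": "nlmsharedcatcit_090101.dtd",
    "nlmsharedcatcit_090101.dtd": "nlmcommon_090101.dtd",
    "nlmcommon_090101.dtd": None,
}


def dtd_chain(start):
    """Return the URLs of `start` and every DTD it references, in order."""
    chain = []
    current = start
    while current is not None:
        chain.append(BASE + current)
        current = REFERENCES[current]
    return chain


# Print the four URLs that make up the MEDLINE/PubMed DTD chain.
for url in dtd_chain("nlmmedline_090101.dtd"):
    print(url)
```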

    Seven large sample files of 30,000 records each, named medsamp2009a.xml through medsamp2009g.xml, are available for FTP in both .gz and .zip format (see access instructions at the top of this page). These files contain records in MEDLINE, PubMed-not-MEDLINE, and OLDMEDLINE statuses. Please note that maintained versions of all sample records may reside in PubMed during the year.
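
    The naming pattern above lends itself to a short helper that lists the seven file names. This is a sketch under one stated assumption: that the compressed copies are named by appending .gz to the .xml name, which this page does not spell out. The function and constant names are likewise illustrative.

```python
from string import ascii_lowercase

RECORDS_PER_FILE = 30_000  # stated above


def sample_file_names(year=2009, count=7, suffix=".xml"):
    """Build names such as medsamp2009a.xml for the first `count` letters."""
    return [f"medsamp{year}{letter}{suffix}" for letter in ascii_lowercase[:count]]


names = sample_file_names()
total_records = len(names) * RECORDS_PER_FILE  # 7 files x 30,000 records

# Assumed naming of the gzip copies (not confirmed on this page).
gz_names = sample_file_names(suffix=".xml.gz")
```

    Python's standard gzip module can read a downloaded .gz copy directly with gzip.open(path, "rb"), so the file does not need to be decompressed on disk first.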

    A small sample file using the 2009 DTDs and covering each of the five status categories of records distributed to MEDLINE/PubMed licensees (i.e., MEDLINE, In-Data-Review, In-Process, PubMed-not-MEDLINE, and OLDMEDLINE) is available.
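
    One way to check which status categories a sample file covers is to tally records by status while streaming through the XML. The sketch below assumes each record is a MedlineCitation element carrying a Status attribute; confirm this against the DTDs listed above before relying on it. The function name count_statuses is illustrative.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_statuses(source):
    """Tally records by Status attribute.

    `source` is a file name or a binary file object. Assumes each record
    is a <MedlineCitation> element with a Status attribute.
    """
    counts = Counter()
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "MedlineCitation":
            counts[elem.get("Status", "unknown")] += 1
            # Drop the record's children and text once counted,
            # so large files do not accumulate in memory.
            elem.clear()
    return counts
```

    Because the function accepts a file object, it can be combined with gzip.open(path, "rb") to read a compressed sample file without unpacking it.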

    Documentation

    Descriptions of the MEDLINE/PubMed data elements (including definitions of the record status categories) are available at http://www.nlm.nih.gov/bsd/licensee/data_elements_doc.html. (Note: minor changes for 2009 are not yet represented in this document.)

  2. CCRIS, GENE-TOX and HSDB®
    Sample CCRIS, GENE-TOX and HSDB data in an abbreviated XML format are available for FTP. See instructions at the top of this page for obtaining the abbreviated DTDs, sample records in XML format, and two files of documentation for each database from NLM's FTP server. The two documentation files are a .readme file that defines the elements using legacy-format element names, and a conversion table that maps legacy-format data element names to the new XML element names.

  3. TOXLINE® Subset
    Sample TOXLINE Subset data in XML format are available for FTP. See instructions at the top of this page for obtaining sample records and DTDs from NLM's FTP server. Multiple DTDs and sample files are available for TOXLINE Subset: toxspec.dtd defines the XML for the entire TOXLINE Subset, and archival.dtd defines the XML for the archival subfiles only. (Note that licensees must have special arrangements with BIOSIS and IPA before NLM will distribute their data.) Other DTDs and sample files are present for each individual subfile of the database. Updates for the various subfiles comprising this database, if available, will be placed on the NLM server for licensees at the end of each month. Update frequency will be irregular because NLM depends on outside suppliers whose schedules are not fixed. Each update file will be a complete replacement for that specific subfile.
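
    Because each update file is a complete replacement for its subfile, a licensee's loader can swap the file rather than merge records. The following is a minimal sketch of that idea using only the Python standard library; the function name and the local paths are assumptions, and nothing here is an NLM-specified procedure.

```python
import os
import shutil
import tempfile


def replace_subfile(update_path, local_path):
    """Replace the local copy of one subfile with a downloaded update."""
    directory = os.path.dirname(os.path.abspath(local_path))
    # Stage the copy in the same directory so the final rename
    # stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    os.close(fd)
    try:
        shutil.copyfile(update_path, tmp_path)
        # os.replace overwrites the destination in a single step, so
        # readers never see a half-written subfile.
        os.replace(tmp_path, local_path)
    except BaseException:
        # Clean up the staged copy if anything went wrong.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

    Only the subfile named by the update is touched; other subfiles keep their existing local copies until their own updates arrive.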

  4. ChemIDplus Subset and DIRLINE®
    Sample ChemIDplus and DIRLINE data in XML format are available for FTP. See instructions at the top of this page for obtaining the DTDs and sample records in XML format from NLM's FTP server. Note that licensees must contact U.S. Pharmacopeia Convention, Inc. (USP), for possible special arrangements before NLM will distribute ChemIDplus.

  5. Catfile, CatfilePlus, and Serfile
    Catfile is available in MARC 21 format only; CatfilePlus and Serfile are also available in XML format. Sample files of MARC 21 and XML-formatted products are available per access instructions at the top of this page.

    CatfilePlus in XML and Serfile in XML are defined by three NLM DTDs:
    The 2009 NLMCatalogRecord DTD is available at http://www.nlm.nih.gov/databases/dtd/nlmcatalogrecord_090101.dtd. This DTD references the NLMSharedCatCit DTD at http://www.nlm.nih.gov/databases/dtd/nlmsharedcatcit_090101.dtd, which in turn references the NLMCommon DTD at http://www.nlm.nih.gov/databases/dtd/nlmcommon_090101.dtd.

    Data element descriptions applicable to CatfilePlus in XML and Serfile in XML are available at http://www.nlm.nih.gov/bsd/licensee/catrecordxml_element_desc2.html. A description of attributes for these elements is available at http://www.nlm.nih.gov/bsd/licensee/catrecordxml_attributevalues_alpha2.html.

    General information on the MARC 21 record structure is available from the Library of Congress at http://lcweb.loc.gov/marc/marcdocz.html.
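
    As a brief orientation to that record structure, the sketch below reads the 24-character leader and the directory of a single MARC 21 record held as bytes. It relies on the standard MARC 21 layout (record length in leader positions 00-04, base address of data in positions 12-16, 12-character directory entries); the function names are illustrative, and a maintained MARC library is a better choice for real processing.

```python
LEADER_LENGTH = 24  # every MARC 21 record starts with a 24-character leader
ENTRY_LENGTH = 12   # directory entry: tag (3) + field length (4) + start (5)


def parse_leader(record):
    """Extract a few leader values from one MARC 21 record (bytes)."""
    leader = record[:LEADER_LENGTH]
    return {
        "record_length": int(leader[0:5]),
        "record_status": chr(leader[5]),
        "record_type": chr(leader[6]),
        "base_address": int(leader[12:17]),
    }


def directory_entries(record):
    """Return (tag, field_length, start) tuples from the record directory."""
    base = parse_leader(record)["base_address"]
    # The directory runs from the end of the leader up to the field
    # terminator that sits just before the base address of data.
    directory = record[LEADER_LENGTH:base - 1]
    entries = []
    for i in range(0, len(directory), ENTRY_LENGTH):
        entry = directory[i:i + ENTRY_LENGTH]
        tag = entry[0:3].decode("ascii")
        entries.append((tag, int(entry[3:7]), int(entry[7:12])))
    return entries
```

    Each directory entry gives a field's length and its offset from the base address, so the field data can be sliced as record[base + start : base + start + length].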

Last reviewed: 01 June 2009
Last updated: 01 June 2009
First published: 01 January 1999
Permanence level: Permanence Not Guaranteed