Tinderbox
  News:
IMPORTANT MESSAGE! This forum has now been replaced by a new forum at http://forum.eastgate.com and no further posting or member registration is allowed. The forum is still accessible via read-only access for reference purposes. If you wish to discuss content here, please use the new forum. N.B. - posting in the new forum requires a fresh registration in the new forum (sorry - member data can't be ported).
Re: Sente and Tinderbox (Read 8590 times)
Mark Anderson
YaBB Administrator
*
Offline

User - not staff!

Posts: 5689
Southsea, UK
Re: Sente and Tinderbox
Dec 13th, 2009, 3:50pm
 
Sente will export to XML. It doesn't drag/drop into TB, but here's the sort of XML data available:
Code:
<?xml version="1.0" encoding="UTF-8" ?>
<tss:senteContainer version="1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.thirdstreetsoftware.com/SenteXML-1.0 SenteXML.xsd" xmlns:tss="http://www.thirdstreetsoftware.com/SenteXML-1.0" xmlns="http://www.thirdstreetsoftware.com/SenteXML-1.0" >
	<tss:library>
		<tss:references>
			<tss:reference>
				<tss:publicationType name="Journal Article"/>
				<tss:authors>
					<tss:author role="Author">
						<tss:surname>Shatkay</tss:surname>
						<tss:forenames>Hagit</tss:forenames>
						<tss:initials>H</tss:initials>
					</tss:author>
					<tss:author role="Author">
						<tss:surname>Chen</tss:surname>
						<tss:forenames>Nawei</tss:forenames>
						<tss:initials>N</tss:initials>
					</tss:author>
					<tss:author role="Author">
						<tss:surname>Blostein</tss:surname>
						<tss:forenames>Dorothea</tss:forenames>
						<tss:initials>D</tss:initials>
					</tss:author>
				</tss:authors>
				<tss:dates>
					<tss:date type="Publication" day="15" month="7" year="2006"/>
					<tss:date type="Entry" day="30" month="9" year="2009"/>
					<tss:date type="Modification" day="30" month="9" year="2009"/>
				</tss:dates>
				<tss:characteristics>
					<tss:characteristic name="articleTitle">Integrating image data into biomedical text categorization.</tss:characteristic>
					<tss:characteristic name="publicationTitle">Bioinformatics</tss:characteristic>
					<tss:characteristic name="abstractText">Categorization of biomedical articles is a central task for supporting various curation efforts. It can also form the basis for effective biomedical text mining. Automatic text classification in the biomedical domain is thus an active research area. Contests organized by the KDD Cup (2002) and the TREC Genomics track (since 2003) defined several annotation tasks that involved document classification, and provided training and test data sets. So far, these efforts focused on analyzing only the text content of documents. However, as was noted in the KDD'02 text mining contest-where figure-captions proved to be an invaluable feature for identifying documents of interest-images often provide curators with critical information. We examine the possibility of using information derived directly from image data, and of integrating it with text-based classification, for biomedical document categorization. We present a method for obtaining features from images and for using them-both alone and in combination with text-to perform the triage task introduced in the TREC Genomics track 2004. The task was to determine which documents are relevant to a given annotation task performed by the Mouse Genome Database curators. We show preliminary results, demonstrating that the method has a strong potential to enhance and complement traditional text-based categorization methods.</tss:characteristic>
					<tss:characteristic name="affiliation">School of Computing, Queen's University, Kingston, Ontario, Canada. shatkay@cs.queensu.ca</tss:characteristic>
					<tss:characteristic name="issue">14</tss:characteristic>
					<tss:characteristic name="language">eng</tss:characteristic>
					<tss:characteristic name="pages">e446-53</tss:characteristic>
					<tss:characteristic name="publicationCountry">England</tss:characteristic>
					<tss:characteristic name="publicationStatus">Published</tss:characteristic>
					<tss:characteristic name="UUID">2B97B24F-6329-46C8-8C30-1FC46AAB3EAE</tss:characteristic>
					<tss:characteristic name="volume">22</tss:characteristic>
					<tss:characteristic name="DOI">10.1093/bioinformatics/btl235</tss:characteristic>
					<tss:characteristic name="ISSN">1460-2059</tss:characteristic>
					<tss:characteristic name="PII">22/14/e446</tss:characteristic>
					<tss:characteristic name="PMID">16873506</tss:characteristic>
					<tss:characteristic name="Web data source">PubMed</tss:characteristic>
					<tss:characteristic name="US NLM ID">9808944</tss:characteristic>
					<tss:characteristic name="publicationStatus">Published</tss:characteristic>
				</tss:characteristics>
				<tss:keywords>
					<tss:keyword assigner="Medline">Artificial Intelligence</tss:keyword>
					<tss:keyword assigner="Medline">Computer Graphics</tss:keyword>
					<tss:keyword assigner="Medline">Database Management Systems</tss:keyword>
					<tss:keyword assigner="Medline">Periodicals as Topic</tss:keyword>
					<tss:keyword assigner="Medline">research support, non-u.s. gov't</tss:keyword>
					<tss:keyword assigner="Medline">Systems Integration</tss:keyword>
					<tss:keyword assigner="Sente User Michael Cinkosky">Biomedical</tss:keyword>
					<tss:keyword assigner="Sente User Michael Cinkosky">Document Classification</tss:keyword>
					<tss:keyword assigner="Sente User Michael Cinkosky">Text Mining</tss:keyword>
				</tss:keywords>
				<tss:attachments>
					<tss:attachmentReference>
						<type>PDF Document</type>
						<URL>file://localhost/Users/mwra/Documents/Sente/Sample%20Library.sente6lib/Contents/Attachments/2B/2B97B24F-6329-46C8-8C30-1FC46AAB3EAE/36645D04-7333-42AF-BEC3-27696605E909.pdf</URL>
					</tss:attachmentReference>
				</tss:attachments>
			</tss:reference>
		</tss:references>
	</tss:library>
</tss:senteContainer>
 


A bit of command-line magic plus Explode could probably massage this into TB.
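
As a rough idea of what that command-line step might look like, here is a minimal sketch in Python using only the standard library. The namespace URI is copied from the export above; the function name `read_references` and the shape of the returned dictionaries are my own assumptions, not anything Sente or TB defines.

```python
import xml.etree.ElementTree as ET

# Namespace URI copied from the tss:senteContainer element of the export.
NS = {"tss": "http://www.thirdstreetsoftware.com/SenteXML-1.0"}


def read_references(xml_text):
    """Return one dict per tss:reference found in a Sente XML export.

    Hypothetical helper: the dict layout is an illustration only.
    """
    root = ET.fromstring(xml_text)
    records = []
    for ref in root.iterfind(".//tss:reference", NS):
        pub_type = ref.find("tss:publicationType", NS)
        records.append({
            "type": pub_type.get("name", "") if pub_type is not None else "",
            # (surname, forenames) pairs, in document order
            "authors": [
                (a.findtext("tss:surname", "", NS),
                 a.findtext("tss:forenames", "", NS))
                for a in ref.iterfind("tss:authors/tss:author", NS)
            ],
            # name -> text; a repeated name simply keeps its last value
            "characteristics": {
                c.get("name"): (c.text or "")
                for c in ref.iterfind(
                    "tss:characteristics/tss:characteristic", NS)
            },
            "keywords": [
                k.text or ""
                for k in ref.iterfind("tss:keywords/tss:keyword", NS)
            ],
        })
    return records
```

Reading an exported file into a string and passing it to `read_references` should give one dictionary per reference, which can then be written out in whatever form suits TB.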

[edit - typo]
« Last Edit: Mar 13th, 2012, 6:09pm by Mark Anderson »  

--
Mark Anderson
TB user and Wiki Gardener
aTbRef v6
(TB consulting - email me)
Mark Anderson
Reply #1 - Dec 14th, 2009, 3:53am
 
Sente can import data in many different formats, including:
  • Sente XML (the best format for importing data from another copy of Sente)
  • EndNote XML (EndNote 7 and later)
  • Refer (both generic and EndNote)
  • BibTeX
  • CSA
  • MARC21
  • PubMed/Medline XML
  • Reference Manager (RIS)
  • RefWorks
  • Tagged records (Medlars, Ovid, Toxline)
  • Web of Science


Sente can export to these formats:
  • EndNote XML
  • BibTeX
  • BibTeX (Unicode) (BibTeX format with Unicode characters)
  • Refer
  • Refer (EndNote) (A variant of Refer format that EndNote handles)
  • Sente XML (The only format that can handle all of the information found in a Sente reference)
For TB/Sente transfers, Sente XML seems to be the best format.

~~~~~~
Edit: fixed list formatting
« Last Edit: Feb 9th, 2012, 5:42am by Mark Anderson »  

Mark Anderson
Reply #2 - Dec 14th, 2009, 4:23am
 
The Sente XML layout seems to boil down to this:
Code:
<?xml version="1.0" encoding="UTF-8" ?>
<tss:senteContainer version="1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.thirdstreetsoftware.com/SenteXML-1.0 SenteXML.xsd" xmlns:tss="http://www.thirdstreetsoftware.com/SenteXML-1.0" xmlns="http://www.thirdstreetsoftware.com/SenteXML-1.0" >
	<tss:library>
		<tss:references>
			<tss:reference>
				<tss:publicationType name="Journal Article"/>
				<tss:authors>
					<!-- one or many -->
					<tss:author role="Author">
						<tss:surname></tss:surname>
						<tss:forenames></tss:forenames>
						<tss:initials></tss:initials>
					</tss:author>
				</tss:authors>
				<tss:dates>
					<!-- Ref publication date -->
					<tss:date type="Publication" day="" month="" year=""/>
					<!-- Sente item creation  date -->
					<tss:date type="Entry" day="" month="" year=""/>
					<!-- Sente item last edited date -->
					<tss:date type="Modification" day="" month="" year=""/>
				</tss:dates>
				<tss:characteristics>
					<!-- include a characteristic only if populated; below are some choices -->
					<tss:characteristic name="articleTitle"></tss:characteristic>
					<tss:characteristic name="publicationTitle"></tss:characteristic>
					<tss:characteristic name="abstractText"></tss:characteristic>
					<tss:characteristic name="affiliation"></tss:characteristic>
					<tss:characteristic name="issue"></tss:characteristic>
					<tss:characteristic name="language"></tss:characteristic>
					<tss:characteristic name="pages"></tss:characteristic>
					<tss:characteristic name="publicationCountry"></tss:characteristic>
					<tss:characteristic name="publicationStatus"></tss:characteristic>
					<tss:characteristic name="UUID"></tss:characteristic>
					<tss:characteristic name="volume"></tss:characteristic>
					<tss:characteristic name="DOI"></tss:characteristic>
					<tss:characteristic name="ISSN"></tss:characteristic>
					<tss:characteristic name="PII"></tss:characteristic>
					<tss:characteristic name="PMID"></tss:characteristic>
					<tss:characteristic name="Web data source"></tss:characteristic>
					<tss:characteristic name="US NLM ID"></tss:characteristic>
					<tss:characteristic name="publicationStatus"></tss:characteristic>
				</tss:characteristics>
				<tss:keywords>
					<!-- one or many -->
					<tss:keyword assigner="person"></tss:keyword>
				</tss:keywords>
				<tss:attachments>
					<tss:attachmentReference>
						<!-- one or many -->
						<type></type>
						<URL></URL>
					</tss:attachmentReference>
				</tss:attachments>
			</tss:reference>
		</tss:references>
	</tss:library>
</tss:senteContainer> 


Considerations:
  • Multiple authors. You might need to do some data munging on the TB side to cope with these.
  • Dates. The inputs are numbers: month and day are 1 or 2 digits, year is 4 digits. TB date formats can cope with this.
  • Characteristics. Use as many or as few as you require.
  • Keywords. Use as many or as few as you require.
  • Attachments. You might need to experiment with data here.
Exporting from Sente at any point will give you a specimen XML file to reverse engineer.
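
The author and date points above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function names are hypothetical, the ISO-style `YYYY-MM-DD` output is a convenience choice rather than something TB requires, and the `"; "` separator is a guess at what is easiest to split again on the TB side.

```python
def sente_date_to_iso(day, month, year):
    """Build YYYY-MM-DD from Sente's day/month/year attribute strings.

    Missing parts are dropped from the right, so a year-only date
    gives 'YYYY'. Returns '' when the year itself is absent.
    """
    if not year:
        return ""
    parts = [f"{int(year):04d}"]
    if month:
        parts.append(f"{int(month):02d}")
        if day:
            parts.append(f"{int(day):02d}")
    return "-".join(parts)


def join_authors(authors, separator="; "):
    """Flatten (surname, forenames) pairs into one string."""
    names = []
    for surname, forenames in authors:
        names.append(f"{surname}, {forenames}" if forenames else surname)
    return separator.join(names)


print(sente_date_to_iso("15", "7", "2006"))   # 2006-07-15
print(join_authors([("Shatkay", "Hagit"), ("Chen", "Nawei")]))
# Shatkay, Hagit; Chen, Nawei
```

Whether TB reads the resulting date string directly may depend on your date settings, so treat the output format as something to test with a specimen note first.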

Sente to TB drag-drop doesn't work. Copy/paste of a single Sente record to TB gives a minimal data format as note $Text (with no title set), e.g.:

Shatkay, Hagit, Nawei Chen, and Dorothea Blostein. "Integrating Image Data Into Biomedical Text Categorization." Bioinformatics 22, no. 14 (2006): doi:10.1093/bioinformatics/btl235.

Copy/pasting multiple items seems not to work.

Dragging a Sente XML file onto TB imports the whole file as a note. Explode could be used to tease out items, but some pre-processing of the data via the command line beforehand might help.
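
One possible form of that pre-processing, sketched in Python with the standard library: flatten each reference into a block of plain text and put a marker line between blocks, so that Explode has something simple to split on. The `#####` marker, the function names, and the "Name: value" line layout are all assumptions of mine; check the Explode dialog for how your version of TB handles custom delimiters and note titles.

```python
import xml.etree.ElementTree as ET

NS = {"tss": "http://www.thirdstreetsoftware.com/SenteXML-1.0"}
DELIMITER = "#####"  # arbitrary marker; not a Sente or TB convention


def reference_to_text(ref):
    """Render one tss:reference as a title line plus 'Name: value' lines."""
    chars = {
        c.get("name"): (c.text or "").strip()
        for c in ref.iterfind("tss:characteristics/tss:characteristic", NS)
    }
    authors = "; ".join(
        a.findtext("tss:surname", "", NS)
        for a in ref.iterfind("tss:authors/tss:author", NS)
    )
    # Title goes first; the remaining characteristics follow as labelled lines.
    lines = [chars.pop("articleTitle", "Untitled")]
    if authors:
        lines.append(f"Authors: {authors}")
    lines.extend(f"{name}: {value}" for name, value in chars.items() if value)
    return "\n".join(lines)


def sente_xml_to_text(xml_text):
    """Turn a Sente XML export into delimiter-separated text records."""
    root = ET.fromstring(xml_text)
    blocks = [reference_to_text(r)
              for r in root.iterfind(".//tss:reference", NS)]
    return f"\n{DELIMITER}\n".join(blocks)
```

The resulting text could be saved to a file, dragged into TB as a single note, and then exploded on the marker line.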

I hope that helps!
« Last Edit: Feb 9th, 2012, 5:44am by Mark Anderson »  
