- FAQ - 2011-03-25
This page is published periodically on the Scholarly HTML website here: http://scholarlyhtml.org/faq/
You can edit it here: http://okfnpad.org/schtml-faqs
(Where things aren’t clear please ask a clarifying question, as per the Socratic Method http://en.wikipedia.org/wiki/Socratic_Method as this will help expose missing concepts or implicit, but dubious, assumptions.)
Q: Does ScHTML have a blog and/or Twitter feed?
The current bloggers are (at least):
- 2011-03-04 peter murray-rust: Scholarly HTML: hackfest and visit of Peter Sefton and Martin Fenner http://blogs.ch.cam.ac.uk/pmr/2011/03/04/scholarly-html-hackfest-and-visit-of-peter-sefton-and-martin-fenner/
- 2011-03-07 martin fenner: http://blogs.plos.org/mfenner/2011/03/07/the-trouble-with-bibliographies/
- 2011-03-09 peter sefton: Hacking towards Scholarly HTML http://ptsefton.com/2011/03/09/hacking-towards-scholarly-html.htm
- 2011-03-11 claudia koltzenburg: “HTML is completely suitable for all forms of modern scientific publication” finds pmr http://blogs.ch.cam.ac.uk/pmr… http://friendfeed.com/claudiakoltzenburg/a4e61e20/html-is-completely-suitable-for-all-forms-of
- 2011-03-15 martin fenner: http://blogs.plos.org/mfenner/2011/03/15/discussing-science-with-microformats/ (one part of “There is currently no standard microformat for scholarly citations” links to http://blogs.ch.cam.ac.uk/pmr/2011/03/14/scholarly-html-%E2%80%93-major-progress/ )
- 2011-03-15 chris tennant: “Scholarly” HTML http://blog.libraryjournal.com/tennantdigitallibraries/2011/03/15/scholarly-html/
- 2011-03-17 peter sefton: Scholarly HTML: Fraglets of progress, http://ptsefton.com/2011/03/18/scholarly-html-fraglets-of-progress.htm
- 2011-03-18 peter murray-rust: Things that Scholarly HTMLers do, http://blogs.ch.cam.ac.uk/pmr/2011/03/18/things-that-scholarly-htmlers-do/
- 2011-03-19 martin fenner: A very brief history of Scholarly HTML http://blogs.plos.org/mfenner/2011/03/19/a-very-brief-history-of-scholarly-html/
- [please add if you HAVE blogged about #scholarlyhtml]
The twitter tag is #scholarlyhtml
Q. Who are the end-users/author-readers/literate agents of ScHTML?
ScHTML is primarily aimed at education (all levels, both students and educators), research and related activities. Everyone is potentially a contributor to the specs, and to the toolbase.
Q. What is in scope for ScHTML?
Most activities in education and research that involve scholarly content are in-scope.
- Grant applications
- Research reports
- Slide presentations
- Lecture notes and novel teaching/learning objects
- Student work
- Lab notebooks
- Letters to the Editor(s)
- Encyclopedia pages
- Reference works
- Data sheets
Q. What is it about ScHTML that makes it “scholarly”?
It’s developed by people involved in the process of education and research to enhance and validate those activities. It is designed to be applicable to content in academia that is the basis of scholarly communication and re-use. It’s bottom-up web-democratic so it embodies a non-authoritative approach which (should) be part of modern education and scholarship.
While ScHTML is being developed for and by scholars, we hope it will have wide applicability outside of this field.
Q. How will ScHTML facilitate the peer review process?
ScHTML does not specifically address ANY processes. However ScHTML supports the continuous editing and transformation of the content without semantic loss. It can support a reviewer to:
- edit the manuscript directly. CAVEAT: if the author has used a one-way tool chain to produce the document (e.g. save as ScHTML from Word), then editing produces a document fork. So for review, prefer annotation (which could eventually produce a change-set that can be applied at source).
- annotate the manuscript
- search the document (e.g. with XSLT/XPath, Solr, or bespoke tools such as chemical search)
- filter the document (extract sub-portions, e.g. for plagiarism checking)
- submit the text to a language translator
- validate numeric values in the document
- rerun simulations, calculations, analyses etc.
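Two of the reviewer tasks above, searching with XPath and validating numeric values, can be sketched as follows. This is only an illustration under stated assumptions: the manuscript is well-formed XHTML, and the class name "measurement" is a made-up marker for numeric values, not part of any ScHTML convention.

```python
import xml.etree.ElementTree as ET

# Hypothetical review copy: well-formed XHTML, with a made-up class name
# ("measurement") marking numeric values. Not part of any ScHTML spec.
MANUSCRIPT = """
<html>
  <body>
    <p>The yield was <span class="measurement">87.5</span> percent.</p>
    <p>The melting point was <span class="measurement">12O</span> C.</p>
  </body>
</html>
"""


def find_measurements(xhtml_text):
    """Return the text of every span marked as a measurement."""
    root = ET.fromstring(xhtml_text)
    # ElementTree supports a limited XPath subset, enough for this query.
    spans = root.findall(".//span[@class='measurement']")
    return [span.text for span in spans]


def invalid_numbers(values):
    """Return the values that cannot be parsed as numbers."""
    bad = []
    for value in values:
        try:
            float(value)
        except (TypeError, ValueError):
            bad.append(value)
    return bad


values = find_measurements(MANUSCRIPT)
print(values)
print(invalid_numbers(values))
```

The second value contains the letter O instead of a zero, which is the kind of slip a machine check can catch before a human reviewer reads the text.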
Q. How does ScHTML help me answer “big picture” questions, e.g. “Are chemical reactions in the literature getting greener?” (cf. http://scienceonlinelondon.wikidot.com/topics:green-chain-reaction )?
By making information easier to extract and manage it is possible to collect large amounts and ask large questions. If documents were available in semantic form it would be much easier to extract information from them. (an example or citation re: scholarly publishing/querying would be helpful here for novices)
Q. How do I make notes about a ScHTML article/page? And/or how do I annotate?
ScHTML will develop an “annotation” convention. This will define what is being annotated (e.g. points or spans in the document), by whom, when, etc. This is declarative and different user agents could display in different ways (popup, strikethrough, etc.)
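As a rough sketch of what such a declarative annotation might record, the following models the "what, by whom, when" as a plain data record. Every field name here is an assumption made for illustration; no annotation convention has been defined yet.

```python
from dataclasses import dataclass, asdict
import json


@dataclass(frozen=True)
class Annotation:
    """A hypothetical annotation record; all field names are assumptions."""
    target_id: str   # id of the element being annotated
    start: int       # character offset where the span starts
    end: int         # character offset where the span ends (exclusive)
    author: str      # who made the annotation
    created: str     # when, as an ISO 8601 date
    body: str        # the note itself

    def is_point(self):
        """A zero-length span marks a point rather than a range."""
        return self.start == self.end


note = Annotation(
    target_id="para-3",
    start=10,
    end=24,
    author="reviewer-1",
    created="2011-03-25",
    body="Please give units here.",
)

# The record says nothing about display; a user agent decides that.
print(json.dumps(asdict(note), sort_keys=True))
```

Because the record carries no display instructions, one user agent could show it as a popup and another as a margin note.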
Q. What work is being done to integrate ScHTML annotation conventions with existing work?
Q. Is ScHTML a data format or a “text-only” format?
A text-based (non-binary) data format: ScHTML is intended to carry any information object. All characters in an ScHTML document must be Unicode characters in a Unicode encoding (e.g. UTF-8, which is recommended and is the default encoding for XML documents).
The incorporation of “data” and “metadata” can be done in several ways:
- a typed hyperlink to a separate well-defined addressable object (e.g. a PNG)
- use of a well-supported inline language (e.g. SVG)
- embedding a well-defined XML vocabulary (e.g. CML)
- linking to a well-defined XML vocabulary
- linking to an unknown object (e.g. foo.binary.dat). ScHTML deliberately makes no guarantee about the semantics of linked objects.
Embedded binary data is forbidden. There are no plans to embed Base64 or other ASCII-compatible translations of binary.
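A tool could check for one common form of embedded binary. The sketch below assumes that a "data:" URI in an attribute value counts as embedded binary, which is an interpretation and not something the text defines; the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class EmbeddedDataFinder(HTMLParser):
    """Collect attributes whose value is a data: URI."""

    def __init__(self):
        super().__init__()
        self.found = []  # list of (tag, attribute) pairs

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if value and value.strip().lower().startswith("data:"):
                self.found.append((tag, name))


def find_embedded_data(html_text):
    """Return (tag, attribute) pairs that carry a data: URI."""
    finder = EmbeddedDataFinder()
    finder.feed(html_text)
    finder.close()
    return finder.found


SAMPLE = """
<p>Linked image: <img src="figure1.png" alt="Figure 1"></p>
<p>Embedded image: <img src="data:image/png;base64,iVBORw0KGgo=" alt="Figure 2"></p>
"""

print(find_embedded_data(SAMPLE))
```

The linked image passes, while the second image is reported because its content travels inside the document rather than behind a hyperlink.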
Q. Why is embedded binary forbidden?
Binary as such does not embed well in running ASCII text. Where embedding is the only way, it is safer to encode it as ASCII (say Base64). Even then the semantics of such binary are not always well agreed (e.g. an encoded PNG embedded in running text may not be recognised as an image).
Q. Is there a “right way” to represent a given piece of knowledge, assertion, argument? What are the best practices?
The best practice is to find or develop a community that carries enough weight to make a representation which is widely shareable and understandable in practice. Thus, for example, representation of mathematics should be informed by MathML, LaTeX practice, Donald Knuth, AMS, Wikipedia, W3C etc. It’s possible that presentational MathML is best for some content, semantic MathML for others and that some require LaTeX. It’s possible that an authority such as AMS might create a protocol that satisfies all requirements. Alternatively the community might separate into 3 camps, each producing their own tools.
The “right” way has the minimal requirement that the sub-community identifies a convention and represents the convention in ScHTML. This will require a vocabulary and possibly semantics.
ScHTML will define a very small core – the absolute minimum infrastructure (perhaps author and date) and how to define conventions. Extensions (following these conventions) will be the responsibility of subcommunities. If, for example, you need annotations you create an annotation subcommunity; for bibliography a separate bibliography subcommunity; for chemistry another and so on. This creates an ecology rather similar to Wikipedia subcommunities. (<– I don’t know what you mean about Wikipedia subcommunities. What are you referring to? -Jodi) If enough people care about a topic they will define the components and build enough tools for the world to find their contribution useful.
Q. Does ScHTML place any restriction on the size of documents?
Not specifically. It’s unlikely that ScHTML will be significantly larger than unannotated HTML documents. Since XML or HTML compresses well it shouldn’t matter after compression.
Q. What about reading and writing ScHTML on mobile documents?
Aren’t size constraints (or hierarchical substructures) useful in that context?
We believe that the flow layout of HTML makes it well suited for mobiles and other devices (unlike PDF, which has a fixed layout).
Q. What is out-of-scope for ScHTML?
ScHTML is not a universal approach to documents on the Web and is not intended as the primary tool for managing processes (e.g. student registration), regulated activities, e-commerce, sport, entertainment, etc. However, ScHTML could be contained within these documents.
Q. What is the minimal set of tools and/or infrastructure I need in order to author ScHTML?
For one thing you don’t necessarily require network access, but without this certain features will be lacking. [Agreed, but it's based on the likelihood that internet will be pervasive]. ScHTML should degrade gracefully if the Internet fails.
The minimal tools are a text editor and a human brain (this is what HTML had in 1992). You also require access (local or Internet) to the vocabularies used in a convention.
It helps to have an HTML-aware editor that has tag-balancing. It also helps if this has tag completion.
It helps to have immediate access to lookup from community vocabularies.
<–This section needs review — Jodi
Q. What does a ScHTML citation look like?
It takes a variety of forms, depending on the purpose. A simple href anchor element is an example, but there are other, e.g. so-called typed citations that use the Citation Typing Ontology.
<– Do you mean a citation inside ScHTML? Or a citation to a ScHTML document? I thought citations were for a subcommunity to deal with??? -Jodi
<!– we mean an in-text ReferencePointer (e.g. a <span + href>) pointing to a resource. This resource might be part of the document or a separate document.
NONE, by design. ScHTML is declarative. It defines the content and may indicate a human-readable set of possible behaviours. ScHTML would not expect to support <a href="http://a.com/foo.js">bar</a> in any specific way. It is simply a statement that there is a link to a given URI.
Q. Could you give an example of the declarative nature of a ScHTML document?
HTML has several of these.
- <address> is an address;
- <cite> is a citation
- <table> is a table
There is no required rendering of these (a machine can consume them as meaningful objects). An aural user-agent might speak particular words.
ScHTML will stress the semantic nature of the markup and not how code should process it.
The intention is that if documents are created with tool A then they can be processed by reader B with whatever toolset they have.
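To illustrate a machine consuming declarative markup without rendering it, here is a minimal sketch that collects the contents of <cite> and <address> elements. The sample text and the class name are invented for the example, and nested elements are not handled.

```python
from html.parser import HTMLParser


class SemanticCollector(HTMLParser):
    """Collect the text found inside selected declarative elements."""

    WANTED = {"address", "cite"}

    def __init__(self):
        super().__init__()
        self.current = None  # wanted element currently open, if any
        self.buffer = []     # text pieces seen inside that element
        self.collected = []  # (tag, text) pairs in document order

    def handle_starttag(self, tag, attrs):
        if tag in self.WANTED and self.current is None:
            self.current = tag
            self.buffer = []

    def handle_data(self, data):
        if self.current is not None:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self.current:
            self.collected.append((tag, "".join(self.buffer).strip()))
            self.current = None


SAMPLE = """
<p>As argued in <cite>An Example Monograph</cite>, the method works.</p>
<address>Department of Chemistry, Example University</address>
"""

collector = SemanticCollector()
collector.feed(SAMPLE)
collector.close()
print(collector.collected)
```

Nothing here decides how a citation or address looks; the markup only says what each piece of text is, and the consumer chooses what to do with it.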
Q. Isn’t presentation important? People spend a lot of time dealing with and massaging presentation since it matters to the human eye and human understanding? How will ScHTML be displayed?
This will depend on the browsers and printers. There is a tension between semantics and presentation. It may be useful to think about how you would communicate with an unsighted human or a machine. If the semantics are well represented it should be possible to render the document in a meaningful manner, whereas if the semantics are weak no amount of postprocessing will help.
Q. Is ScHTML Turing Complete?
Doubtful. Declarative languages such as XSLT can be Turing Complete. It is possible that the combination of ScHTML with a given user agent might be Turing Complete, but since ScHTML is declarative it needn’t “do” anything.
Q: What will be possible ways for non-technical end users to create ScHTML?
There is a balance between using simple text editors and using specific tools. Non-technical users could use desktop tools or Web-based forms. Additionally, certain applications could emit ScHTML directly. It will also be possible to convert some documents into ScHTML using generic tools (e.g. an envisioned Word2ScHTML transformation tool, not yet written). Most tools will either have a specific applicability or produce somewhat generic ScHTML.
Q. How do I cite works that are not ScHTML?
Currently many citations are simply text strings that identify a citable resource. ScHTML will continue to use these strings as the identifier (e.g. “Phil Trans. Vol I Anno 1665”).
One method could be to create a URI for that work, hosting a page about it somewhere appropriate, and linking to that work via the URI or page URL.
<–Isn’t this for a subcommunity to decide? -Jodi
Q. How long is it envisaged ScHTML will be around before it is replaced by a newer technology or set of principles?
HTML is 20 years old. It shows every sign of being viable indefinitely. ScHTML should have the same lifetime.
Assuming humans maintain a global collaborative community, ScHTML principles should continue to flourish and evolve.
Q. Does ScHTML address issues of scalability? For instance, should ScHTML authoring tools support hundreds or thousands of “authors”?
Is there any technical problem? If not then it should support them.
Technical problems include: real-time editing/pingbacks about changes, etc.
Q. Is a Wikipedia article ScHTML?
Not currently because it is not marked as ScHTML, but it could probably be recast into conformant ScHTML. Wikipedia could be an excellent starting place – as there are many micro-conventions. <–I’m unclear what Wikipedia has to do with ScHTML -Jodi
Q. What is the minimal ScHTML document?
Conformant HTML5 with a formal indicator that the document is ScHTML. NOT definitive:
Other approaches might be:
The DOCTYPE html is required for the document to be HTML5 as outlined in section 8.1.1 of the Working Draft of 13 January 2011. (link to draft)
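The DOCTYPE requirement is the one part of this answer that can be checked mechanically today. The sketch below tests only for the short "<!DOCTYPE html>" form at the start of the text; it is a simplification (it ignores, for example, a leading byte order mark or comment), and it does not look for any ScHTML indicator because none has been defined. The function name is hypothetical.

```python
import re

# Short HTML5 doctype: case-insensitive, optional surrounding whitespace.
DOCTYPE_PATTERN = re.compile(r"\s*<!doctype\s+html\s*>", re.IGNORECASE)


def has_html5_doctype(document_text):
    """Return True if the text begins with the short HTML5 doctype."""
    return DOCTYPE_PATTERN.match(document_text) is not None


MINIMAL = """<!DOCTYPE html>
<html>
  <head><title>Untitled</title></head>
  <body><p>Hello</p></body>
</html>
"""

print(has_html5_doctype(MINIMAL))
print(has_html5_doctype("<html><body></body></html>"))
```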
Q. What if my browser cannot support all the features of HTML5?
The desired behaviour is graceful degradation, which depends on the capabilities of a given browser. ScHTML consumers are expected to use a variety of browser tools, each suited for a particular use case.
We assume that modern browsers can support a reasonable subset of HTML5 features. (We do not seek backwards compatibility with old HTML1 etc.) A typical graceful degradation for an SVG object could be (a) a replacement PNG or (b) a message: “this is an SVG object with the following metadata [title, description, etc.]”.
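The SVG fallback chain described here can be sketched as a simple selection function. The format names, parameter names and return shape are assumptions made for this illustration, not part of any ScHTML specification.

```python
def degrade_svg(supported, png_url=None, title=None, description=None):
    """Pick a representation for an SVG object.

    supported: set of format names the user agent can render,
               e.g. {"svg", "png"} (names are assumptions for this sketch).
    Returns a (kind, payload) pair.
    """
    if "svg" in supported:
        return ("svg", None)  # no degradation needed
    if "png" in supported and png_url:
        return ("png", png_url)  # (a) replacement PNG
    # (b) fall back to a message built from whatever metadata exists
    parts = [p for p in (title, description) if p]
    metadata = "; ".join(parts) if parts else "none available"
    return ("text", "This is an SVG object with the following metadata: " + metadata)


print(degrade_svg({"svg", "png"}, png_url="fig1.png"))
print(degrade_svg({"png"}, png_url="fig1.png"))
print(degrade_svg(set(), title="Figure 1", description="Reaction scheme"))
```

The order of the checks encodes the preference: the richest form first, then the image, then a description that any text-capable agent can present.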
Q. What is the difference between ScHTML and x?
e.g. nanopublications, PDF, XML, RDF
ScHTML is conformant HTML5.
PDF is not HTML and does not interoperate with it. (Can we generate PDF from ScHTML? Semantically-enriched PDF?)
RDF is not HTML but can interoperate with ScHTML.
XML can interoperate with HTML5 and many objects will be represented in XML (e.g. MathML, GML, SVG, CML, etc.)
Nanopublications are an approach to annotation or subgraphs and (presumably) are normally represented in RDF (this does not answer the question)
Q. How do I “Save as..” ScHTML?
For humans using browsers this may depend on the browser (and we are addressing this under Packaging).
For machines we simply save to disk.
Q. Where do you discuss Packaging?
Q. Can ScHTML be printed?
Yes, and much more. ScHTML should provide facilities to transform an underlying, semantically rich format into a variety of formats useful in different circumstances.
ScHTML should provide a series of fallbacks for print (e.g. printing an audiostream might print the metadata). For chemistry we can have a series of fields that are fallen-through. (What does ‘fallen-through mean? – Jodi) Choose between names and identifiers as available.
Q. Can ScHTML have an API or tools that make it easier for machine processing?
ScHTML is fundamentally designed for machine-processing. The processing could be generic (e.g. HTML tools or XML tools) or specific to a domain. How it is processed would be a domain decision – molecules can be visualised or indexed or computed…
Q. How do I pronounce ScHTML?
As with everything in ScHTML it’s up to the community.
“Scuttle-mol” is one interpretation (which English accent do I imagine here? could I have an audio file for this one? – not joking…http://18.104.22.168/tts/speech/2795298eb9f0c75a19ed998761d14c6b.wav alternatively http://22.214.171.124/tts/speech/e33cb328a9e0322b42ce1118fa765310.wav both will definitely need some practise… I’d like to suggest the two .wavs (or at least the first be added to every post on [whatever it will finally be] if possible
“scholarly HTML” => Scollum is quick to pronounce?
Q. What is the abbreviation for Scholarly HTML?
We are still struggling. “scHTML” (case as shown) is a sourceforge tool so maybe we need another
The weekend participants seem to have converged on ScHTML (Capitalized as shown)
What about ScholTML?-J
Q. What are the unique selling points (USP) for ScHTML?
- It’s Web-democratic.
- It allows the majority of users to become more webauthor-literate *in the process of writing scholarly stuff*
- It allows addressing INTO parts of documents
- It allows collaborative authoring
- It guides subcommunities to develop and formalize their vocabularies
- It reuses existing approaches (especially W3C and other Open tools)
- Copied– these need cleanup–from LaTeX
- 1) it produces “beautiful” equations/typography, and 2) the textual form of input is natural and quick once you’ve learnt it
- * Dan Hagon: So ScHTML must directly address these concerns or it will fail in these specific communities
- * Egon Willighagen: LaTeX is also easy to reuse, because textual but the killer to me is still that I don’t have to care about the numbering and placement of figures and tables
- * Egon Willighagen: the auto-numbering of figures, tables, and citations in ScHTML might be a challenge
- * ptsefton – why do they need to be numbered? To refer to them, right? We can look at how to handle this in a similar way to reference handling, i.e. refer to doc parts by name and let post processors generate numbering schemes that suit readers.
- * Egon Willighagen: any discussions on that yet?
- * Claudia Koltzenburg: Designing tables is more fun in LaTeX than in html/xml
- * Pablo Echenique: Using BibTeX for bibliography is very powerful and reusable
- * Pablo Echenique: It is the fastest way I know for writing math in a computer
- * Pablo Echenique: Almost any journal accepts it
- * Pablo Echenique: A rich and powerful number of packages that provide extra capabilities
[see right column re USP:LaTeX communities with remarks by Claudia, Dan and Egon on March 12 between 19:17 and 21:23]
[SUGGEST Copying these into the body as they may get lost, done, see here: http://okfnpad.org/schtml-how-to-win-over-latex-users]
Q. How does ScHTML facilitate collaborative authoring? What tools are required for this?
The collaborative environment is the hardest part.
Small Native ScHTML documents could be authored in Etherpads immediately. Not ideal for production but useful for developing the approach. Perhaps good for mentoring of newcomers.
Google Docs could be used
Most wikis will support this.
Diffs (ideally really smart ones!)
It will require some work at present to enforce ScHTML compatibility. Initially we shall need to take a snapshot and run it through a validator.
Q. Who owns ScHTML?
Currently no individual or organization. We expect it to evolve as an identified community. We would expect to have a web page. In some respects it resembles Wikipedia, but without the corporate umbrella.
Q. By what process will ScHTML be developed in the future?
We expect to produce a version 0.2 by next Thursday. <–non-relative date please (Jodi) We hope that the world (not just humans in Cambridge) will all contribute. We are gathering a set of early stakeholders (e.g. funders, publishers) and will extend this to any others who are interested. We hope that it will overlap with W3C interests and processes.
Q. What about copyright? Are there any aspects of ScHTML that make copyright in ScHTML formatted works irrelevant?
We will try to find an appropriate strategy for ScHTML specs, tools etc. that protects the emerging material. This might be CC-BY or CC0/PDDL.
ScHTML cannot override legal IPR (i.e. if content or part of content is copyright, then it will retain its copyright in ScHTML). This might lead to microIPR within a ScHTML document. (<– please add a link/cite for microIPR – Jodi)
Q. Will there be converters between ScHTML and other formats?
We hope that conversion TO ScHTML will have lossless explicit semantics. Conversion FROM ScHTML will depend on the capacity of the target.
Thus conversion of Word to ScHTML should be possible without loss of semantic content but will probably destroy formatting and some styling. Conversion of ScHTML TO Word (perhaps via RTF) should be possible but will lose semantics (e.g. the roles of the microformats).
Q. Should I convert my documents from their current format into ScHTML or leave them in their original format and simply link?
Depends on what you want and what tools you have. Thus if you have a PDF document it’s very difficult to convert it to anything useful. If you have an Excel spreadsheet you will lose a lot of information. If you have a .DOCX you may be able to capture most of what is in it.
Q. If Henry Oldenburg were still around today would he use ScHTML and why?
If he were at the start of his career (or young at heart) almost certainly Yes. Why – because it would be the best way to change the way people think.
Q. I’m a graduate student. How will ScHTML help me?
It will be possible to devise a “thesis” convention that guides tools to collect and assemble material for the thesis and then indexes and aggregates it into the final document. This should solve the problems of renumbering references, figures, etc. and of making sure that all components are present.
Q. I’m a researcher. How will ScHTML help me?
Q. I’m a reviewer. How will ScHTML help me?
Q. I’m a journal editor. How will ScHTML help me?
It will help you attract authors who are willing and able to go ahead in open standard-based scientific publishing and possibly also in open science.
Q. I’m a publisher. How will ScHTML help me?
It will help you attract authors who are willing and able to go ahead in open standard-based scientific publishing and possibly also in open science. We’re keen to involve publishers and had invaluable support from Brian McMahon (IUCr) at the hackfest.
If the ecosystem supports ScHTML for submission, review processing and dissemination then all processes should be easier and many will take much less time and cost less. If you are interested in your material being re-used (either externally or internally) then ScHTML will be a huge improvement.
Q. I’m a reader. How will ScHTML help me?
Q. I’m a librarian. How will ScHTML help me?
Q. I’m a lecturer. How will ScHTML help me?
Q. I’m an undergraduate. How will ScHTML help me?
Q. I’m a funder of research. How will ScHTML help me?
It will help you attract researchers who are willing and able to go ahead in open standard-based scientific publishing and possibly also in open science.
Q. I run a University. How will ScHTML help me?
Q. I’m an industrialist. How will ScHTML help me?
Q. I’m a tax payer. How will ScHTML help me?
Q. How does ScHTML manage versions?
There is initial agreement that it is valuable to have revision dates and authors. It’s beyond the current scope to require that ScHTML docs carry all the diffs from past versions (e.g. track changes). However if wikis and similar tools are used it may be possible to extract their content.
Q. How does ScHTML automatically renumber figures, tables, references etc.?
We shall need to build tools.
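One such tool could follow the suggestion made in the LaTeX discussion on this page: refer to document parts by name and let a post-processor generate the numbers. The sketch below is purely illustrative; the placeholder syntax ({fig:NAME} and {ref:NAME}) and the function name are invented for this example and are not an ScHTML convention.

```python
import re

# Hypothetical placeholder syntax for this sketch only:
#   {fig:NAME}  declares a figure called NAME
#   {ref:NAME}  refers to it
DECLARE = re.compile(r"\{fig:([A-Za-z0-9_-]+)\}")
REFER = re.compile(r"\{ref:([A-Za-z0-9_-]+)\}")


def number_figures(text):
    """Replace named figure placeholders with numbers in order of appearance."""
    numbers = {}
    for match in DECLARE.finditer(text):
        name = match.group(1)
        if name not in numbers:
            numbers[name] = len(numbers) + 1

    def declare(match):
        return "Figure %d" % numbers[match.group(1)]

    def refer(match):
        name = match.group(1)
        if name not in numbers:
            raise KeyError("reference to undeclared figure: " + name)
        return "Figure %d" % numbers[name]

    return REFER.sub(refer, DECLARE.sub(declare, text))


DRAFT = (
    "See {ref:spectrum} and {ref:setup}.\n"
    "{fig:setup} Apparatus.\n"
    "{fig:spectrum} Measured spectrum."
)

print(number_figures(DRAFT))
```

Because authors only ever write names, moving a figure changes its number everywhere on the next run, and a reference to a missing figure is reported instead of silently producing a wrong number.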