Making the collective knowledge of chemistry open and machine actionable


Date: 2022-4-27


  The open-science platform we propose in this Perspective provides a central hub for all the synthetic or analytical work of a chemist or materials scientist. Underpinning this platform are two common principles we feel are essential to make it truly open science, such that it can benefit data-intensive research and address reproducibility problems (thesis 1 in Fig. 1). First, FAIR data should be at the core; all data that enter the platform need to be converted into an open, structured and standardized form with the appropriate linked metadata—this is the main functionality that an ELN should provide (thesis 2). Second, open science also implies ensuring that other researchers can reproduce and build on the results. Therefore, the platform should be able to export the data in a form that is machine-readable and interpretable and that can easily be reused by other groups (thesis 3). In addition, in an open-science vision the tools used to analyse the data should be made available to anyone in the world who might be interested in reproducing the results or reinterpreting the data. This leads to the notion that such a platform is ideally developed as a modular open-source infrastructure in which the analysis code can be scrutinized, reused and improved by the community (thesis 4).
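The conversion step that thesis 2 describes, taking raw data into an open, structured form with linked metadata, can be illustrated with a minimal sketch. All field names here are hypothetical and do not correspond to any real ELN schema or community standard:

```python
import json

def to_fair_record(sample_id, technique, data, operator, instrument):
    """Wrap a raw measurement in a structured, metadata-rich record.

    The schema is purely illustrative; a real ELN would follow a
    community-agreed standard (e.g. a domain ontology) rather than
    these ad hoc keys.
    """
    return {
        "sample_id": sample_id,
        "technique": technique,
        "metadata": {
            "operator": operator,
            "instrument": instrument,
        },
        "data": data,  # raw values, kept in an open format
    }

record = to_fair_record("S-001", "IR", [3450.0, 1710.5],
                        "A. Chemist", "FTIR-2000")
exported = json.dumps(record, indent=2)  # machine-readable export (thesis 3)
```

The point of the sketch is only that every datum leaves the instrument already wrapped in metadata, so the export step (thesis 3) becomes a serialization of records that are structured from the start.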

  If such a platform becomes widely used and supported by the community, the possibilities are unlimited. The way we assess scientific work and credit scientific outputs has the potential to change. Trusted time stamps can provide unique proofs of discovery, going beyond the compressed and delayed priority claim that preprints can provide38, and peers can continuously provide feedback about the raw data, the analyses and the conclusions. An interesting form of making the full research record public, and hence open for feedback, has already been proposed in the context of open notebook science39. If this information is shared with the community, one can build a community-driven version of the Organic Syntheses journal in which the verification of the results is done continuously by the community and not (only) in a lab of one of the members of the editorial board. Importantly, this version would also contain information about the attempts that did not work and in this way document the process, and the learnings, that led to the final result. If data are available in digital form, the peer-review process can be supported with automated checks, for example, to verify the consistency of NMR assignments, and so highlight potential issues for peer reviewers.
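An automated consistency check of the kind mentioned above can be very simple. The sketch below, a deliberately naive stand-in for real validation software, compares the hydrogen count of a molecular formula against the total reported 1H NMR integration; real tools would also handle exchangeable protons, solvent peaks and symmetry:

```python
import re

def h_count(formula):
    """Count hydrogens in a simple molecular formula such as 'C2H6O'.

    Naive on purpose: it would misread element symbols containing 'H'
    (e.g. 'Ho'); a real parser would tokenize the formula properly.
    """
    m = re.search(r"H(\d*)", formula)
    if not m:
        return 0
    return int(m.group(1) or 1)

def check_nmr_integration(formula, integrals):
    """Flag a mismatch between the formula H count and total 1H integration."""
    return h_count(formula) == sum(integrals)

# ethanol: CH3 (3H) + CH2 (2H) + OH (1H)
assert check_nmr_integration("C2H6O", [3, 2, 1])
```

Even a check this crude, run automatically on deposited data, would surface integration mismatches for peer reviewers before a human ever looks at the spectrum.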

  The most important reason for embracing the approach described in this Perspective is that it can change the way we do chemistry. Many of us were educated before the digital era, with the idea that if we publish all the data that we generate, any human being will become lost in the sheer volume of data. Data-intensive science, however, has fundamentally changed this point of view. With machine learning, we have the tools to analyse orders of magnitude more data than a human being can process, discover correlations in millions of data points and build predictive models40. For example, if we aim to synthesize a compound, a simple query in the collective ELN database might show that for one synthesis route there are 100 ‘failed’ reactions and two successful ones, whereas another route shows 90 successful and ten ‘failed’ attempts—which clearly indicates which synthesis route should be tried first. Undoubtedly, a very experienced chemist might have very good intuitions about what works and what does not. However, for a new student in the field, this collective knowledge now becomes accessible. Clearly, we can go beyond this simple search and try to harvest the collective knowledge generated by all chemists, using machine-learning techniques to capture subtle correlations across the chemical space of the millions of reactions that have been carried out in the world. In this respect, machine learning is not different from the experienced chemist; most probably, it can learn even more from ‘failed’ and partially successful experiments than from the successful ones. However, in contrast with the chemist, it typically needs large amounts of structured data—which we could easily generate in chemistry.
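The route-ranking query described above can be sketched in a few lines. The data layout is an assumption, a list of (route, outcome) records standing in for a query against a collective ELN database:

```python
from collections import defaultdict

def rank_routes(reactions):
    """Rank synthesis routes by success rate.

    `reactions` is a list of (route_name, succeeded) tuples -- a
    hypothetical stand-in for the result of an ELN database query.
    """
    tally = defaultdict(lambda: [0, 0])  # route -> [successes, attempts]
    for route, ok in reactions:
        tally[route][1] += 1
        if ok:
            tally[route][0] += 1
    rates = {route: s / n for route, (s, n) in tally.items()}
    return sorted(rates, key=rates.get, reverse=True)

# the example from the text: route A has 2 successes in 102 attempts,
# route B has 90 successes in 100 attempts
data = ([("A", True)] * 2 + [("A", False)] * 100
        + [("B", True)] * 90 + [("B", False)] * 10)
assert rank_routes(data)[0] == "B"
```

The value of the collective database is precisely that the ‘failed’ attempts are recorded at all; without them, both routes would look equally attractive.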

  Another issue that the chemistry community faces with open data is that everyone agrees that there are benefits in making data reusable and in reporting ‘failed’ experiments, but often there is hesitation from individual researchers to adopt this behaviour until all members of the community do so. The social sciences give us a range of possible solutions to this problem setting35,41. One approach is some kind of compulsion. For example, the fact that the submission of DNA sequences is a condition for publication in the leading scientific journals of the field is seen as one of the reasons for the success of the GenBank database42. This, in turn, opened many doors for bioinformatics research. We also witnessed that for small groups, which include leaders of the field, agreements such as the ‘Bermuda Principles’, which require that DNA sequence data are automatically released in publicly accessible databases directly after the measurement, can be achieved. In chemistry, we have observed similar dynamics in crystallography, in which crystallographic information files must be deposited with the Cambridge Structural Database, where they are made freely accessible (and searchable) on publication. This led the European Commission to conclude that “the requirement from academic journals that authors provide data in support to their papers has proven to be potentially culture-changing, as has been the case in crystallography”31. What we can also learn from crystallography is that once some standards are adopted, automatic checks (such as checkCIF) can be implemented.
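The lesson drawn from checkCIF, that once a standard exists, compliance can be tested automatically, can be illustrated with a minimal completeness check. The required field names below are invented for illustration and are not taken from any real deposition standard:

```python
# Hypothetical required fields for a deposition entry; a real standard
# (such as the CIF dictionary behind checkCIF) defines many more.
REQUIRED_FIELDS = {"formula", "space_group", "cell_parameters", "author"}

def deposition_report(entry):
    """Return the sorted list of required fields missing from an entry.

    This only checks completeness; real validators also test the
    physical plausibility of the deposited values.
    """
    return sorted(REQUIRED_FIELDS - entry.keys())

entry = {"formula": "C6H6", "space_group": "P2_1/c", "author": "Doe"}
missing = deposition_report(entry)
```

Run at submission time, such a check rejects incomplete depositions before they ever reach a curator, which is exactly what made standard-plus-validator combinations culture-changing in crystallography.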

  From the Structural Genomics Consortium and related initiatives (for example, Open Source Malaria43 and COVID Moonshot44) we can learn that openness can also be enforced at the level of a consortium, for example, by requesting that members openly publish the protein structures and not file patents for the research outputs. This public–private partnership model seems to be successful because the private sector, which provides the funding and ‘chemical probes’ (potent inhibitors of protein function), can guide the research—that is, prioritize structures that should be solved—without disclosing the companies’ research and development priorities, as the consortium anonymizes the ‘wish lists’45. The utility of such a consortium can best be seen at the precompetitive stage (that is, the early stages of drug discovery), during which it can share risks, enhance collective learning and avoid duplication in new areas of (basic) science46. This is particularly interesting in the case of ‘chemical probes’, which are best produced by experienced industrial medicinal chemists. However, industry would profit enormously if academia could use such probes to validate drug targets47. For this reason, the Structural Genomics Consortium makes them available as ‘open access’ reagents—under the condition that the research outputs are made available in the public domain. A similar ‘physical open access’ approach is pursued by the Molecule Archive of the Compound Platform at the Karlsruhe Institute of Technology, which acts as a mediator for compound exchange: synthetic chemists can ‘archive’ their compounds (which increases their visibility), which can then be requested for biological screenings48.

  Beyond these measures, we need to change incentive structures by creating better ways to give researchers credit for curating data. ELNs could help in this regard by storing the ‘credit’ chain when data are imported and automatically appending the citation when datasets are prepared for publication.
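A minimal sketch of such a ‘credit’ chain follows; the ELN is modelled as a plain dictionary, and the record layout is an assumption rather than any real ELN’s provenance format:

```python
def import_dataset(eln, dataset_id, source_doi):
    """Record provenance when a dataset enters the ELN.

    A real system would store this chain in its database and
    cryptographically sign the entries; here it is a plain list.
    """
    eln.setdefault("provenance", []).append(
        {"dataset": dataset_id, "source": source_doi}
    )

def export_citations(eln):
    """Collect the accumulated credit chain at publication time."""
    return [entry["source"] for entry in eln.get("provenance", [])]

eln = {}
import_dataset(eln, "ds-42", "10.0000/example.doi")  # hypothetical DOI
```

Because the chain is appended automatically at import, the citation of the original data curator travels with the dataset and requires no extra effort at publication time.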

  Beyond that, the adoption of this data-centric approach to chemistry requires changes in the curriculum at universities to raise awareness of such new developments, as well as of the need for, and the promises of, data curation. Ideally, open-science solutions, such as the infrastructure we describe here, should already be introduced in the undergraduate curriculum. Students can record the results of their lab courses in ELNs, harvest the data in machine-learning classes, predict the infrared spectrum they just measured in computational chemistry classes49 and use open notebooks to comment on and improve each other’s work. Towards this goal, we define commonly used technical terms in a glossary in the Supplementary Information.

  The question that might still be open at this point is how realistic the widespread adoption of such an open-data platform across the chemistry community is. We argue that we have all the basic tools and technology in place. For many of the key design aspects, here we use examples from our own work, which is openly available, can be tried out by the community and can be reused in other implementations. There are also several initiatives (Supplementary Table 3) that work on some of the aspects we emphasize in this Perspective. One example is the German NFDI4Chem consortium50,51, which is embedded in the larger German initiative for the creation of National Research Data Management Infrastructures (which also includes NFDI4Cat52 for catalysis research and NFDI4Ing for the engineering sciences), and aims to ‘FAIRify’ the full data life cycle in chemistry. However, we, as a community, also have to realize that we are in a phase in which there is an overwhelming number of initiatives, proposed data schemas and ELNs. The task we as a community face is to embrace and connect these efforts. Only if we succeed in making these tools interoperable will we be able to leverage the full potential of data and the digital age. One promising way forward is the formation of data communities53, in which experimentalists and ELN developers work together to develop a domain-specific (for example, porous materials or batteries) open-science infrastructure by combining, extending and polishing the existing building blocks.

  From our perspective, there are a few concrete steps that need to be implemented to reach this goal.

  To conclude, we emphasize that the technology is here not only to facilitate the publishing of data in a FAIR format to satisfy sponsors, but also to ensure that the combination of chemical data, FAIR principles and openness lets scientists harvest all data, so that every chemist can access the collective knowledge of everybody’s successful, partly successful and even ‘failed’ experiments.

